Why Do Root Cause Analysis?©

Arthur M. Schneiderman

I've heard it said: "You have to go slow to go fast." In no case is this truer than in incremental process improvement. Yet, give any good supervisor or manager a description of a problem in their area and they will most likely come up with a solution. That's what is expected of them in our traditional style of management. Make a suggestion to them that's consistent with their instincts and they'll usually accept it. But most problems don't have obvious solutions. That's why they remain problems. Let me demonstrate the risks associated with the leap from problem to solution vs. the apparently slower path of data collection, root cause analysis, solution generation, and verification.

Let's first brainstorm all of the possible causes of a given problem. We can usually generate literally hundreds of candidates. Now let's assume that the 80/20 rule holds so that only 20% of the candidates are what Juran called the "vital few." In other words, there is only one chance in five that a randomly selected candidate will be a real root cause of the problem. Now put on your statistician's hat for a moment and you can easily show that the probability of getting zero root causes by randomly selecting n of the candidates (and assuming the list of possible candidates is very large compared to n) is 0.8ⁿ. Here's what that looks like graphically:

As you can see, the probability of getting no root causes in a random selection of three is around 50%. Even if we randomly select 10 of the candidates, we still have a 10% chance of coming up with none.
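
The figures quoted above can be checked with a few lines of Python. This is only a minimal sketch under the same assumptions as the text: each pick is independent and has a 20% chance of being a root cause. The function name `prob_no_root_cause` and its parameter `p` are illustrative choices, not part of the original argument.

```python
def prob_no_root_cause(n, p=0.2):
    """Probability that n independent random picks contain no root cause.

    p is the assumed fraction of candidates that are real root causes
    (20% under the 80/20 rule used in the text).
    """
    return (1 - p) ** n


# Tabulate the curve 0.8**n for a few values of n.
for n in (1, 3, 5, 10):
    print(f"n = {n:2d}: {prob_no_root_cause(n):.1%} chance of no root cause")
```

For n = 3 this gives 51.2%, and for n = 10 it gives 10.7%, in line with the "around 50%" and "10%" figures above.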

In other words, without verifying that you really have identified the root causes, the deck is heavily stacked against you. Yet there are many reasons to believe that random selection is a fair description of the all-too-common practice of trial-and-error problem solving. Many of the non-value-adding steps eliminated in process simplification had their origins in this approach to process improvement.

Now let's consider an alternative approach. We collect and analyze data about the problem. Furthermore, let's design our experiments with the goal of improving our odds, so that we now have a short list on which each candidate has an 80% probability of being a root cause. If we randomly select one from the short list, the chance that it is a root cause is 80%. Select two, and the probability that at least one is a root cause rises to 96%! I'll take those odds over trial-and-error any day.
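
To make the comparison concrete, here is a small sketch that computes the chance of at least one root cause among k picks, and counts how many trial-and-error picks it would take to reach the same odds. It assumes independent picks with a fixed hit rate (80% from the short list, 20% from the raw brainstormed list); the function names are hypothetical and chosen only for illustration.

```python
def prob_at_least_one(k, p):
    """Probability that at least one of k independent picks is a root cause,
    when each pick has probability p of being one."""
    return 1 - (1 - p) ** k


def picks_needed(target, p):
    """Smallest number of picks whose chance of at least one root cause
    reaches the target probability."""
    k = 1
    while prob_at_least_one(k, p) < target:
        k += 1
    return k


# Short list built from data: each candidate is a root cause with p = 0.8.
print(f"{prob_at_least_one(1, 0.8):.0%}")  # one pick from the short list
print(f"{prob_at_least_one(2, 0.8):.0%}")  # two picks from the short list

# Trial-and-error on the raw list: each candidate is a root cause with p = 0.2.
print(picks_needed(0.96, 0.2))
```

Under these assumptions, two picks from the short list give 96%, while trial-and-error needs 15 picks to reach the same level.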

Much of the literature on experimental design assumes much higher confidence levels than 80%. That's because the goal is scientific validity and publishability. But if we're simply trying to improve our odds of selecting a root cause, our experiments can be far simpler. For example, we rarely need a sample size greater than 50, even though the sampling literature often calls for much larger samples.

Having identified a probable root cause, it's generally straightforward to select a countermeasure. But we don't stop there. By piloting the solution, we can verify its effectiveness. The combination of data collection, analysis, and verification will greatly reduce the number of iterations needed to identify an appropriate corrective action.

It's easy to get caught up in the enthusiasm of a newly formed improvement team and its desire to "do something now" rather than proceed methodically. But it is critical to remember that the real objective is to improve the process at the fastest possible rate. Doing this often requires what Admiral Hyman Rickover, the father of the US nuclear navy, called "courageous impatience," and it helps to remember the fabled victory of the tortoise over the hare.

Last modified: August 13, 2006
