Wednesday, August 24, 2011

The Cult of the Root Cause

Tip of the Month: August 2011

“Why?” is my favorite question because it illuminates relationships between cause and effect. And when we ask this question more than once we expose even deeper causal relationships. Unfortunately, my favorite question has been hijacked by the Cult of the Root Cause and transformed into the ritual of “The Five Whys”. The concept behind this ritual is simple: when trying to solve a problem, ask “Why” at least five times. Each “Why” will bring you closer to the ultimate cause of the problem. Finally, you will arrive at the root cause, and once there, you can fix the real problem instead of merely treating symptoms.

The wisdom of this approach seems obvious. After all, fixing problems is like weeding a garden. If you only remove the visible top of the weed, it can grow back; if you remove the root, then the weed is gone forever. Why not trace problems back to their root cause and fix them there? The logic seems flawless – that is, unless you stop to think about it.

Invisibly embedded in this approach are two important assumptions. First, the approach assumes that causality progresses from root cause to final effect through a linear chain of stages. Second, it assumes that the best location to intervene in this chain of causality is at its source: the root cause. Certainly there are many simple cases where both these assumptions are true; in such cases, it is indeed desirable to intervene at the root cause. However, these two assumptions are frequently wrong, and in such cases the five “Whys” can lead us astray.

Upstream Isn’t Always Best
Let’s look at the second assumption first. Is it always most desirable to intervene at the beginning of the chain, at the root cause? There are two important circumstances that can make it undesirable to intervene at the level of the root cause. First, when speed of response is important, attacking an intermediate stage may produce faster results. For example, you turn on your computer and see smoke rising from the cabinet. You brilliantly deduce that the smoke is probably a symptom of a deeper problem. Should you treat the symptom or fix the root cause? Most of us would treat the symptom by shutting off the power, even though we realize this does not address the root cause. Thus, we commonly attack symptoms instead of root causes when response time is important.

The second reason to attack a symptom is when this is a more cost-effective solution. For example, people who type produce spelling errors; in many cases the root cause of these errors is that they never learned to spell. We could address the root cause by sentencing bad spellers to long hours in spelling reeducation camps. While this may appeal to our sense of orthographic justice, it is more efficient to use spell checkers to treat the symptoms. Thus, we often choose to attack symptoms when it is more cost-effective to fix an intermediate cause than the root cause.

Networks Are Not Chains
Now let’s look at the first assumption: root cause and final effect are linked in a linear chain of causality. In many cases it is more correct to think of causes generating effects through a causal network rather than a linear chain. In such networks the paths that lead from cause to effect are much more complex than the linear sequence found in the root cause model. There are often multiple causes for an effect, and there can be multiple effects branching out from a single cause.

In such cases it is very misleading to focus on a single linear path. Doing so causes us to ignore the other paths that are entering and exiting the chain, paths that connect the chain to ancillary causes and effects. When we ignore these ancillary paths, we miscalculate the economics of our choices, and this in turn leads us to make bad economic decisions.

Consider, for example, problems with multiple causes. When you view such problems as having a single cause you cannot access the full range of options available to fix the problem. For example, every schoolchild learns that fires require a combination of heat, fuel, and oxygen. Which one is the root cause of fire? There is no one root cause; we can intervene in three different places to prevent fires, and each of these places can be attractive under specific circumstances. When we can’t control heat, we might choose to remove fuel. When we can’t eliminate fuel, we might eliminate heat. When we can’t eliminate either heat or fuel, we might eliminate sources of oxygen. The point is that by fixating on a single cause we lose access to a broader range of solutions.

Now, consider an intermediate stage with multiple effects. For example, diabetes is a complicated disease that affects many systems within the body. One of its key symptoms is high blood glucose levels. Some patients with Type II diabetes can bring their blood glucose levels under control with careful exercise and diet, but it takes time to do this. Meanwhile, a patient’s high blood glucose levels can lead to conditions like blindness, kidney disease, and heart disease. While high blood glucose is indeed a symptom, it is actually quite sensible to treat this symptom by using insulin. Treating the symptom alleviates the multiple effects of the symptom. If we only focused on a single effect we would underestimate the full benefits of treating the symptom. When selecting interventions it is important to consider the multitude of effects that can fan out from a node in the causal network.
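For readers who like to see the idea in code, here is a minimal sketch of that fan-out. It stores a tiny causal network as a plain dictionary and lists every effect downstream of a chosen node. The network contents simply restate the diabetes example above; the names `CAUSAL_NETWORK` and `downstream_effects` are hypothetical and chosen only for illustration, and the sketch says nothing about how large each effect is.

```python
from collections import deque

# Illustrative network only: each key is a cause, each list holds its
# direct effects. The entries restate the diabetes example in the text.
CAUSAL_NETWORK = {
    "type II diabetes": ["high blood glucose"],
    "high blood glucose": ["blindness", "kidney disease", "heart disease"],
}


def downstream_effects(network, node):
    """Return every effect reachable from `node`, in breadth-first order."""
    found = []
    queue = deque(network.get(node, []))
    while queue:
        effect = queue.popleft()
        if effect in found:
            continue  # already reached through another path
        found.append(effect)
        queue.extend(network.get(effect, []))
    return found


if __name__ == "__main__":
    for effect in downstream_effects(CAUSAL_NETWORK, "high blood glucose"):
        print(effect)
```

Following only one path out of "high blood glucose", say the path to blindness, would credit the intervention with just one of the three effects it alleviates.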

Opening New Horizons
Once we have broken the spell of root cause fixation, we unlock two additional insights. First, the optimum intervention point may change with time. For example, let’s say that while sailing you get an alarm indicating high water levels in the bilge of your sailboat. Your immediate intervention may be to pump water out of the bilge. After the water is pumped down you may observe a crack in the hull which you can temporarily plug until you return to port. When you return to port you can have the crack in the hull investigated and repaired. The optimum place to intervene has shifted from pumping, to plugging, to hull repair; thus, it is time dependent. Such dynamic solutions will exist in both causal chains and causal networks.

Second, because there are multiple possible intervention points, we can consider intervening at multiple stages simultaneously. For example, despite our best attempt to plug the crack in our sailboat, there may still be water coming in. If we want to return to port safely we may have to patch the leak and run our bilge pump. The idea that interventions should only take place at the “one best place” is an illusion.

So, ask the question “Why,” but use the answers with care. Don’t assume you will only encounter problems that can be reduced to a simple single chain of causality where the best intervention lies at the start of the chain. Be open to the possibility that you are dealing with a causal network that has multiple starting points and endpoints. You might even consider adding a few more questions to your toolkit:

1. Why do I think the root cause is the best place to fix this problem?
2. Why do I think I should only intervene at a single location?
3. Why do I think the best intervention point will remain static?
4. What other important causes and effects are entering and exiting my causal chain?

Happy problem solving!

Don Reinertsen
