HOW TO BUILD A BALANCED SCORECARD©
Arthur M. Schneiderman
A balanced scorecard contains a concise set of strategically important measures. They
capture the vital few drivers of the organization’s future success.
I’ve called these scorecard measures “metrics” and defined them as measures of the organization’s strategically most important processes. Once we have
identified those processes, we face the challenge of selecting this subset from
a seemingly endless list of possibilities. Usually this decision is based
on what measures are already available or can easily be obtained, benchmarking
studies, or executive edict. But there is a much better way of doing it.
The measures of a process come in two flavors: I call them “results measures” and
“process measures,” although each has many aliases: outcomes and drivers, lagging
and leading indicators, or dependent and independent measures. Whichever
set of names you choose, there is a very important difference between them:
Results measures characterize the output of the process. They
are the consequences of actions taken within it.
Since they are descriptors of the output, they relate directly or
indirectly to things that a customer of that process can sense or measure.
Process measures, on the other hand, are the internal measures from within the process that
determine these results. In most cases, the customer has little or no interest in or
knowledge of them.
A very useful model for generating candidate measures is called the SIPOC method.
SIPOC stands for Supplier, Input, Process, Output, and Customer. In
using this model, we usually start by identifying all of the customers of the
process and determine their complete sets of requirements. Here, customers
include both the external purchaser of the final product or service as well as
other internal processes that are part of the organization’s value creating
activities (or, as we say in TQM: “The next process step is the customer.”).
Through a process called “Voice of the Customer” we translate these
requirements into results measures that characterize the output of the process
in terms that are both meaningful to and measurable by the process executors.
This translation is necessary because the customer often describes their
requirements in words that do not have a direct process counterpart.
Next, we reverse this procedure by identifying all of the external inputs that we need
in order to execute the process, defining our requirements for each of these
inputs, and, ideally working with our suppliers, translating them back into a set
of specifications that are expressed in the supplier’s own language (“Voice
of the Supplier”).
Output measures and their associated quantifiable customer requirements (Output→Customer)
are clearly results measures. Measures associated with steps internal to
the process (Process) are obviously process
measures. But what about input and supplier measures (Supplier→Input)?
Symmetry would suggest that since they are results measures of the supplier’s
value creation process, they must also be results measures for our process.
But is that necessarily so? In other words, can a measure that is a
results measure for an upstream process be a process measure in a subsequent
step? The answer here is a little bit tricky.
What is different about Supplier→Input
measures is that we cannot improve them directly from within our own process.
We can only do so indirectly by changing specifications, or suppliers, or
through the redesign of our product and/or process (“design for x-ability”).
Their actual improvement is directly controllable only by the supplier of
that input. Often we have a limited ability to affect our supplier’s
control or improvement efforts (through partnering, for example) or to redesign
our products and/or processes. If that is the case, then we need to treat
that measure as a given (that is, a constant) and that measure’s
classification into the results or process category then becomes moot.
Strictly speaking, to change an input measure indirectly requires the exercise of a
different internal process within our organization - the supplier selection
process by which we choose suppliers, and/or the product and process design
processes. Even in that case, it is difficult to argue that they are
anything but results measures. In other words, unless we include within
our process sub-processes for supplier selection and product/process redesign,
we must view these measures as the results measures of other internal or external
processes. Any given process is part of a system of interacting processes. This is one of
the important reasons why it’s critical to have sponsorship of all improvement
efforts by someone who is in a position to set appropriate boundaries and
constraints to that effort.
The Math of Metrics
From a mathematical point of view, the last alias-pair is the traditional choice of
terms. For each results measure, we
can write a symbolic equation that relates this dependent measure (or more
correctly, “dependent variable”) to the independent ones:

yi = fi(x1, x2, ..., xn)
In words, this equation simply states that the dependent measure, yi,
is a function of (i.e. depends on, or is determined by) all of the independent
measures: x1, x2, up to xn,
where n is the total number of independent process measures.
For example, if the process were baking a cake, then one dependent
measure would be the “lightness” (in the Language of the Customer) of the
resulting cake, measured by its density (in the Language of the Process) in
grams per cubic centimeter. Here, y1
would be cake density and the goal is for it to be in a specific range:
not too light, not too heavy, but just right.
What about the x’s? The
list would include oven temperature, cooking time, amounts used of the various
ingredients, freshness of ingredients, etc. These are the measures that
are included or implied in a clear recipe (or Standard Operating Procedure
(SOP)). Other dependent measures would include moistness, sweetness, and
flavor, for example; we could create instruments that would measure each of
them, as well as establish each of their associated target ranges.
Each dependent measure would depend on one or more of the many
drivers of change. In
general, we are trying to limit variation of and/or improve dependent measures
in order to make our product more attractive to its customers.
So let’s look at how this equation changes with changes in the
independent measures:

Δyi = ai1Δx1 + ai2Δx2 + ... + ainΔxn

The symbol “Δ” stands
for a small change in the measure. So this equation says that the change
in a dependent measure is the sum of the weighted changes in all of the
independent measures. For very small changes in the measures,
mathematicians can show that this simple additive relationship holds in most
practical cases. The weights aij
are sometimes called “influence coefficients” or “impact parameters.”1
They represent the effect that a small change in the jth
independent measure has on the ith dependent measure.
If aij is zero, then small changes in its independent measure
have no effect on that dependent measure. If the value of aij
is large compared to the other coefficients, then the dependent measure is very
sensitive to changes in that independent measure. It’s these influential
independent measures that are usually the targets for both process control
and process improvement
and are therefore candidate scorecard metrics.
In process control, they are called “critical nodes.”
By “locking” them, we assure that variation in the dependent measures
that they affect will be maintained within a range that’s acceptable (but not
necessarily satisfactory) to the customer.
For process improvement they indicate the likely root causes of the gap
between current and target results.
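To make this concrete, here is a minimal sketch in Python of how influence coefficients can be estimated numerically when no theoretical equation is known: perturb each independent measure slightly, holding the others fixed, and record the resulting change in the dependent measure. The cake-density function below is invented purely for illustration; a real process model would come from engineering theory or empirical data.

    # Minimal sketch: numerically estimating influence coefficients a_j.
    # The model function is hypothetical, for illustration only.
    def cake_density(temp_c, time_min, flour_g):
        # Invented relationship between settings and density (g/cm^3).
        return (0.5 - 0.001 * (temp_c - 175)
                    - 0.002 * (time_min - 30)
                    + 0.0004 * (flour_g - 250))

    def influence_coefficients(f, x0, rel_step=0.01):
        # a_j ~ (change in y) / (small change in x_j), others held fixed.
        base = f(*x0)
        coeffs = []
        for j, xj in enumerate(x0):
            dx = rel_step * xj if xj else rel_step
            x = list(x0)
            x[j] = xj + dx
            coeffs.append((f(*x) - base) / dx)
        return coeffs

    # Operating point: 175 C, 30 minutes, 250 g of flour.
    print(influence_coefficients(cake_density, (175.0, 30.0, 250.0)))

The coefficient with the largest magnitude flags the setting to which density is most sensitive - the candidate “critical node.”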
In practice, for large changes in the measures, this simple model is often
limited by two phenomena: “non-linearity” and “interaction.”
Non-linearity causes the influence coefficients to change (increase or
decrease) for large changes in their independent measure. Interaction
occurs when interdependencies develop between the various independent measures
(they lose their independence).
The exact mathematical function takes on different forms for different dependent
measures and processes. Here are some examples:
Example 1: The time required to execute a
process from its start to its finish is called its cycle time. If the
various x’s are the cycle times, tj, for the internal
process steps that lie on the “critical path,” then the total cycle time, tT,
is given by:

tT = t1 + t2 + ... + tn
Example 2: The overall yield of a process
depends on the sequential yield of the internal sub-process steps. Let’s
say that if the process were perfect (no internal yield loss), it would produce
100 output units. If the actual yield in the first step is 90%, then only
90 potential outputs survive it to the next step. If that step’s yield
were 80%, then only 80% of those 90 or 72 would make it to the next step, etc.
Therefore, the overall yield is given by:

YT = Y1 × Y2 × ... × Yn
In Example 1 above, all of the influence coefficients have a constant value of 1;
that is, any increase or decrease in a critical path cycle time simply adds or
subtracts that change from the total cycle time. We could include
non-critical path sub-process cycle times, but their influence coefficients
would all be zero (until they became long enough to enter the critical path).
On the other hand, for Example 2 it is straightforward to show that the
influence coefficient is inversely proportional to that sub-process’ yield (aTj=YT/Yj).
What this means is that improving a low yielding process step by 1% (for example
from 25% to 26%) has a greater impact on total yield than that same 1%
improvement in a high yielding process step (going from say 95% to 96%).
In other words, lower yield process steps have larger influence coefficients.
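Both examples are easy to verify numerically. The following Python sketch (with made-up step times and yields) computes the additive total cycle time, the multiplicative total yield, and the yield influence coefficients aTj = YT/Yj, confirming that the lowest-yielding step has the largest coefficient:

    # Example 1: critical-path cycle times simply add.
    step_times = [2.0, 5.0, 3.0]           # days per critical-path step
    total_cycle_time = sum(step_times)     # t_T = t_1 + t_2 + ... + t_n

    # Example 2: sequential step yields multiply.
    step_yields = [0.90, 0.80, 0.95]
    total_yield = 1.0
    for y in step_yields:
        total_yield *= y                   # Y_T = Y_1 x Y_2 x ... x Y_n

    # Influence coefficient of each step on total yield: a_Tj = Y_T / Y_j.
    # The 0.80 step has the largest coefficient, so improving it by a
    # given amount moves total yield the most.
    for y in step_yields:
        print(f"step yield {y:.2f}: influence coefficient {total_yield / y:.3f}")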
In many manufacturing environments, process or manufacturing engineers know the
mathematical relationships between the dependent and independent measures.
Usually they do this based on a physical or chemical theory of what’s
happening in the process. When this is the case, these experts can help in
the selection of those independent measures that are the principal drivers of
change in any given results metric. Once identified, these process metrics
generally represent the primary targets for improvement efforts and are tracked
on the appropriate scorecard.
Determining Process Metrics
As a rule of thumb, low influence coefficient independent measures vastly outnumber
the critical few (see Why
Do Root Cause Analysis?).
So trial-and-error is not a viable option. Finding the process metrics in
practice often ends up requiring a mixture of both art and science.
When a theoretical equation does not exist or is not known, we need to resort to empirical observation. Total Quality Management (TQM) employs teams that apply the scientific methodology (the PDCA Cycle and the 7-Step Method) and basic analysis tools (the 7 QC Tools) for identification of the root causes (process metrics) of undesirable outcomes (results metrics). I’ve explained this process in more detail in my article “Are There Limits to TQM?"
The vast majority of process improvements can be discovered using these simple
scientific tools. For more complex situations, three additional approaches
are sometimes used: heuristic techniques, design of experiments (DOE), and
simulation.

Heuristic Techniques

I once assisted a
team trying to reduce defects in welded pipe used in the oil industry. The
particular defect was called “hook cracks” since they had the shape of a
fishhook. In stratifying defect data by shift, I discovered that one crew
had significantly lower defect levels than the others. I narrowed it down
to the welder operator and interviewed him in the hopes of documenting his
“secret” so that this best practice could be shared with the others.
Each welder setting was specified with a range determined by the industrial
engineers. I asked him how he chose a setting from within these ranges and
his answer was “I can tell by the sound the welder makes.” The other
operators just tried to pick the mid-point. The IE’s response: “Sound
has nothing to do with weld quality.”
A few months
later I visited an identical pipe mill in Japan where the operators relied on an
additional meter to adjust the mill settings. Using a microphone placed
near the weld site and connected to a measuring instrument (a spectrum
analyzer), their IE’s had determined that if the sound frequency was within a
certain range, a perfect weld was produced. Outside that range, the
resulting product was defective. What was the defect? No one
remembered at first since the discovery had been made several years before.
Finding an old-timer they came back with the answer: “Something called hook
cracks.” Why should a good weld
have a certain pitch to the sound it made? There was no accepted
theoretical explanation; it simply worked. The Japanese IE’s were
willing to accept this heuristic observation while their American counterparts
had discarded it as scientifically baseless.
In another example, Kano2
observed an important non-linearity in the independent measures that we call
customer satisfaction. He
classified the independent attributes that drive customer satisfaction (such as
particular product features, price, availability, reliability, etc.) into
three categories: “must-be,” “one-dimensional,” and “attractive” attributes.
To place each independent measure into one of these categories, Kano
developed a structured multiple-choice survey tool. He then created a
heuristic “decoder ring” for determining the measure type from the responses
to paired questions. By understanding current performance and the
type of measure, the user could then rank all of the independent measures by
their improvement’s impact on customer satisfaction.
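A sketch of such a “decoder ring” appears below, using the commonly published Kano evaluation table rather than Kano’s original instrument (which I have not reproduced here). For each attribute, the customer answers a paired question: how they would feel if the attribute were present (the functional form) and if it were absent (the dysfunctional form).

    # Sketch of a Kano classifier based on the commonly published
    # evaluation table (a simplification, not Kano's original tool).
    ANSWERS = ["like", "must-be", "neutral", "live-with", "dislike"]

    # Rows: functional answer; columns: dysfunctional answer.
    # A = attractive, O = one-dimensional, M = must-be,
    # I = indifferent, R = reverse, Q = questionable (contradictory).
    TABLE = [
        ["Q", "A", "A", "A", "O"],
        ["R", "I", "I", "I", "M"],
        ["R", "I", "I", "I", "M"],
        ["R", "I", "I", "I", "M"],
        ["R", "R", "R", "R", "Q"],
    ]

    def kano_category(functional, dysfunctional):
        return TABLE[ANSWERS.index(functional)][ANSWERS.index(dysfunctional)]

    # "I like having it" paired with "I dislike not having it"
    # decodes to a one-dimensional (more-is-better) attribute.
    print(kano_category("like", "dislike"))   # -> "O"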
In general, heuristic methods are based on empirical observation, not on
any underlying mathematical theory. They are often discovered through gut
feel or what I’ve called the “ins”: instinct, intuition, insight,
inspiration, innovation, invention, etc. Their
justification is therefore based on the fact that they simply work in practice.
Although we preach “management by fact” it is important to also acknowledge
that in many instances, and through mechanisms that we do not even understand,
some people are able to see through process complexity and identify the
critical few.

Design of Experiments and the Taguchi Method
Another way to
determine the influence coefficients would be to vary each of the independent
measures over an appropriate range while holding all of the others constant and
observing its effect on the dependent measure. By doing this we could also
identify their range of independence. But in many instances, the number of
required experiments would be impractical in both time and cost.
Fortunately mathematicians have devised efficient experimental sequences in
which we can vary more than one independent measure at the same time. The
first to do this was Euler (1783) in what are called Latin Squares. Today
such experimental schemes go under the name “Design of Experiments” or DOE.
DOE is a popular tool used by six-sigma practitioners, and facility with it is
usually a prerequisite for black-belt certification.
Genichi Taguchi has attempted to demystify DOE by creating a somewhat simplified procedure that,
although not as mathematically rigorous, usually gives an adequate answer.
In doing so, he followed the example set by Walter Shewhart in his pioneering
efforts (c. 1930) to bring statistical techniques to the shop floor environment.
Many statistical software packages now include a DOE and/or Taguchi Method capability (see for
example Minitab, which is used in several
six sigma initiatives). However, even with current software support, their
use is beyond the capability of most improvement team members and requires
expert assistance (e.g. staff statisticians or six sigma black belts).
Fortunately, the vast majority of improvement efforts do not require this level
of analysis in order to uncover the relevant independent measures.
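To give a flavor of the idea, here is a minimal Python sketch of a two-level full factorial experiment on two coded factors. The response function is invented, and real DOE practice adds the replication, randomization, and significance testing that this toy omits.

    from itertools import product

    # Invented response surface; factors are coded -1 (low) / +1 (high).
    def response(temp, speed):
        return 50 + 4 * temp - 2 * speed + 0.5 * temp * speed

    # Full factorial: run every combination of the two levels.
    runs = [(t, s, response(t, s)) for t, s in product([-1, 1], repeat=2)]

    # Main effect of a factor = mean response at its high level
    # minus mean response at its low level.
    def main_effect(index):
        hi = [y for *x, y in runs if x[index] == +1]
        lo = [y for *x, y in runs if x[index] == -1]
        return sum(hi) / len(hi) - sum(lo) / len(lo)

    print("temp main effect: ", main_effect(0))   #  8.0
    print("speed main effect:", main_effect(1))   # -4.0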
Simulation

In a process simulation, we attempt to dynamically reproduce its
important characteristics in a computer model. By “running” the model,
we can understand the complex interrelationships that exist within the process
and test the effect of changes. Simple simulations are often done using
spreadsheets such as Microsoft Excel or Lotus 1-2-3. For example, the
columns in the spreadsheet might represent sequential times (e.g. months or
quarters) while the formulas for each period’s cells depend on several results
calculated for an earlier period. Many software packages have specialized
structures that make them particularly suitable for certain types of process simulation.
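As an illustration of the spreadsheet approach, the short Python sketch below plays the role of a one-row-per-measure spreadsheet: each pass through the loop is one month (one “column”), and each month’s values are computed from the previous month’s results. The order rate, capacity, and starting backlog are invented numbers.

    # Spreadsheet-style simulation: month by month, each period computed
    # from the prior period's results. All numbers are illustrative.
    orders_per_month = 100.0
    capacity_per_month = 90.0
    backlog = 50.0                       # unfilled orders at the start

    for month in range(1, 13):
        shipped = min(capacity_per_month, backlog + orders_per_month)
        backlog += orders_per_month - shipped
        print(f"month {month:2d}: shipped {shipped:5.1f}, backlog {backlog:6.1f}")

Because demand exceeds capacity, the backlog (a results measure) grows steadily, and the model makes the driver obvious.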
Flowcharting is an essential step in
process improvement. Several of the
current flowcharting software packages also include a simulation capability (I
use Scitor Process)
that is very helpful in finding internal leverage points, particularly when
there are complex process flows and/or random variation is important.
A biotech company’s product involved a new medical procedure that required
special approval from the patient’s insurance company for reimbursement. Long
average approval times were having a serious adverse financial impact on the
company. Furthermore, the variation
(standard deviation) in approval times was also unacceptably high.
What could they do to improve this results metric?
There were many theories as to the root cause, most of which involved
problems in someone else’s area. The
process flow was complicated by many alternate paths and frequent “resubmittal
loops.” A simulation of the process (using the Monte Carlo method),
based on probable paths at each process node explained both the average and
variation in approval time and pointed directly at the independent measures
whose improvement would have the greatest impact.
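The sketch below shows the Monte Carlo idea in miniature (Python, with invented durations and a hypothetical 30% resubmittal probability; the company’s actual model was built from its real process flow). The resubmittal loop is what inflates both the mean and the standard deviation.

    import random

    # Monte Carlo sketch of an approval process with a resubmittal loop.
    # All durations and probabilities are invented for illustration.
    def one_approval():
        days = random.uniform(2, 5)          # initial submission and review
        while random.random() < 0.30:        # rejected: resubmit and rework
            days += random.uniform(5, 15)
        days += random.uniform(1, 3)         # final approval handling
        return days

    times = [one_approval() for _ in range(10_000)]
    mean = sum(times) / len(times)
    std = (sum((t - mean) ** 2 for t in times) / len(times)) ** 0.5
    print(f"mean {mean:.1f} days, standard deviation {std:.1f} days")

Cutting the resubmittal probability in the model immediately shrinks both statistics, pointing improvement efforts at the causes of resubmittal rather than at the individual processing steps.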
For complex processes that contain time lags as well as subjective
factors, System Dynamics modeling can also be very
valuable (here I use Vensim).
System Dynamics modeling has the advantage that it can easily accommodate
both non-linearity and interdependencies, although its successful use does take
considerable modeling skill.
To successfully compete in a new market segment, an electronics company needed
major improvements in its delivery performance.
Stratification of late shipment data showed that it was significantly
higher the last week of the quarter. Again,
there were many theories as to why. A
system dynamics model of the entire order fulfillment process (order receipt to
payment by customer) uncovered the answer and it was closely related to a
similar phenomenon known as the end-of-quarter revenue “hockey stick.”
Linearity implies that with constant revenues, one-thirteenth of the quarterly
total accumulates each week. In
many organizations, there is a shortfall and the revenue falls below this linear
goal. Miraculously, in the last
week or two of the quarter, a few heroes appear and through their superhuman
efforts the target is achieved and they are appropriately rewarded.
The shape of the resulting weekly cumulative revenue curve is much like
that of a hockey stick, whence its name.
The model explained what was happening. The
added revenue at the end of the quarter came from early shipments of large
dollar orders not due until the first few weeks of the following quarter.
With limited capacity, this was at the expense of many small orders due
in that hectic end-of-quarter period. Even
worse, once started, this practice triggered a perpetual cycle where only small
quantity unfilled orders were due for shipment at the start of the next quarter
thus creating that initial revenue shortfall.
The solution: just as the cycle was started by a one-time action, it
needed to be ended the same way -- just stop doing it!
Unfortunately, this results in a temporary sales shortfall that only goes
unnoticed if it is hidden by rapid revenue growth.
By phasing the practice out over several quarters, the adverse revenue
impact was minimized.
Without the use of a simulation model, it would have been difficult to identify either
the root cause or a palatable corrective action plan.
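The self-perpetuating nature of the cycle is easy to see even in a back-of-the-envelope sketch (Python, invented numbers): once a quarter’s target is met by pulling in next quarter’s large orders, every subsequent quarter starts short by the same amount and must borrow again.

    # Toy model of the end-of-quarter pull-in cycle (invented numbers).
    target = 1300.0        # quarterly revenue target (13 weeks x 100)
    pulled_in = 200.0      # next-quarter orders shipped early, once

    for q in range(1, 7):
        organic = target - pulled_in   # due this quarter but already shipped
        shortfall = target - organic   # gap visible late in the quarter
        pulled_in = shortfall          # borrow from next quarter, again
        print(f"Q{q}: organic revenue {organic:.0f}, pull-in {pulled_in:.0f}")

The pull-in never decays on its own; ending it requires either a one-time shortfall or the gradual phase-out the company chose.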
Choosing the Scorecard Metric
If improving a particular results measure is a strategic goal, then
improvement efforts should be focused on the process measures that will have the
highest impact on its improvement. They are usually the process measures
with the largest influence coefficients. What does that imply about choosing scorecard metrics?
Most scorecards that I’ve seen are heavily populated with results
metrics. No doubt this results from
the all too common management attitude: “I don’t care how you do it, just do
it!” I strongly believe that ALL
scorecard metrics must be directly actionable by their owner. Therefore, it’s the underlying process metrics, not the
results metrics that belong on a scorecard.
If the improvement goals for the process metrics are achieved, then we
can be assured that the desired results will follow, assuming we have identified
these drivers correctly.
For example, dieters often tend to focus on their body weight (a results
metric) rather than its independent measures: exercise along with calorie,
protein, fat, and carbohydrate consumption.
Nutritionists now believe that successful diets involve lifestyle (aka
process) changes that act on these independent measures.
Get them right and over time you will achieve and sustain your weight
goal. I wonder to what extent this
results focus explains the statistic that 95% of dieters fail to maintain their weight loss.
I would argue that results metrics only belong on a scorecard when their
associated process metrics are on two or more subordinate scorecards. In
this case, the job of the owner of the results metric is not its improvement,
but sponsorship of the subordinate scorecards. That sponsorship includes
guidance, monitoring and diagnosis, organizational troubleshooting, resourcing,
communicating, etc. for the individuals and teams responsible for the
subordinate scorecards. There is an important place for results measures,
but it is mainly in the detection step in process control, not improvement.
The Japanese have a saying “Focus on process, not on results.”
In no case is this truer than in the selection of scorecard metrics. The
key to linking strategy to action is not the balanced scorecard itself; it is
this underlying process focus.
1. The influence coefficients are given by the partial derivative of fi with respect to xj.
2. See, for example: Shoji Shiba, Alan Graham, and David Walden, “A New American TQM: Four Practical Revolutions in Management,” Productivity Press, January 1993, ISBN 1563270323, p. 221.
Last modified: August 13, 2006