
Measuring Metrics January 27, 2010

Posted by Chuck Musciano in Leadership, Technology.

It’s a good bet that most people saw all or part of the Super Bowl, either at home or at a Super Bowl party. Suppose, as the game begins, the cable feed goes out and the television goes dark.  Amid the howls of protest of those watching, you grab the phone and frantically dial your cable provider.  When you finally reach a real person to complain of the interruption, they provide this explanation:

We’re sorry for the interruption, but our records show that this is our first outage in your area in more than two months.  Even though we project that the outage will last for at least four hours, that still means that we provided service 99.72% of the time. This easily exceeds our 99.5% target metric for excellent service! We appreciate your business and thank you for your patience as we work to restore your service. Thanks for calling!

Happy now? Of course not. Yet many folks in IT hide behind metrics in a similar fashion.

It is said that anything you measure will improve.  That provides a strong incentive to measure system availability, since we’d all like to hit that elusive goal of 100% uptime.  But there is a difference between using those metrics to improve our performance and using those metrics to improve our public relations.

Uptime tracking coupled with root cause analysis will help you find and fix many tiny problems that may exist in your environment.  Most mature IT shops figured out long ago how to run their systems without catastrophic failure.  We can all hit availability of about 98 or 99 percent on a regular basis.  Getting much higher than that, however, involves ferreting out deep issues that may only surface under unusual circumstances. It takes discipline and focus to get there, and metrics can really help.
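To put those percentages in concrete terms, here is a minimal sketch of the arithmetic. It assumes a 365-day year and treats "more than two months" in the cable example as 60 days; the function names are only illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def availability_percent(window_hours: float, outage_hours: float) -> float:
    """Percentage of the window during which service was up."""
    return 100.0 * (1.0 - outage_hours / window_hours)


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes of downtime per year implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1.0 - availability_pct / 100.0)


# The cable example: one four-hour outage in roughly 60 days (assumed window).
cable = availability_percent(window_hours=60 * 24, outage_hours=4)
print(f"Cable availability: {cable:.2f}%")

# What common availability levels allow in downtime over a year.
for target in (98.0, 99.0, 99.5, 99.9):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% allows about {minutes:,.0f} minutes ({minutes / 60:.1f} hours) down per year")
```

Under these assumptions the cable figure comes out to 99.72%, and 99 percent availability still leaves room for more than 87 hours of outage a year, which is why the last fraction of a percent takes so much work.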

Metrics should never be used as a defense.  When users are affected by an outage, the last thing they want to hear is how well you’ve been doing prior to the problem.  It doesn’t matter, and you’re only annoying people that are already upset.

Similarly, metrics should never be used to tell the world what a great job you are doing.  When things are running fine, announcing that they are fine just makes you look boastful.  Most users just want IT to work, and they don’t want to think about it beyond that.  Building on our cable analogy, how would you like the cable company to call once a month to tell you that the service is running just fine?

From the user’s perspective, availability is measured as a binary value: yes or no.  There is no average, there is no track record, there is no target goal.  You either provide your service or you don’t.  Metrics matter internally so that we can improve our service.  But they have little bearing on user opinion and can actually do more harm than good.  Use them wisely.



1. Wally Bock - January 28, 2010

Many years ago IBM (I think) ran a full-page ad with 100 clouds pictured. All were white but one, which was dark. The headline: 99% reliability doesn’t matter if you’re the 1.

2. Robert Martin - January 30, 2010

Really great post! Keep them coming!

3. Debbie - February 7, 2010

Really enjoyed this post! I see it used every day. What can I do….to ensure my ‘control’ does not take on this ‘habit’?

Real improvement involves integration and understanding, but not at the expense of user service. I never saw the ad Wally Bock spoke of…but it really put the spotlight on IBM’s focus on real customer service. Too many groups serve only the top few percent to keep them happy and ignore the ‘workers’ of the organization.

4. Steven M. Smith - February 12, 2010

Hi Chuck,

Re: Getting much higher than that (98 or 99 percent on a regular basis), however, involves ferreting out deep issues that may only surface under unusual circumstances. It takes discipline and focus to get there, and metrics can really help.

I agree 100%.

Re: Metrics should never be used as a defense.

I believe it’s important to empathize with the IT client so I agree with your points when talking to clients (terrific analogy BTW).

I think, however, it’s important to use measurements in discussions with the clients’ management to put the availability trade-off into perspective. For instance: for your investment of $X, we can limit unavailability to 1% plus or minus 0.5%, which translates to roughly 2,600 to 7,900 minutes of application unavailability each year. Then add something like, “Here is an example (the measurements) of how that looked last year.” If that’s unacceptable, then for an additional investment of $Y we can reduce unavailability by minutes per year.
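The arithmetic behind that range can be sketched as follows. This is only an illustration, assuming a 365-day year and an application that is expected to be available around the clock; the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def unavailability_range_minutes(center_pct: float, tolerance_pct: float) -> tuple[float, float]:
    """Convert 'center plus or minus tolerance' percent unavailability to minutes per year."""
    low = (center_pct - tolerance_pct) / 100.0 * MINUTES_PER_YEAR
    high = (center_pct + tolerance_pct) / 100.0 * MINUTES_PER_YEAR
    return low, high


# 1% plus or minus 0.5% unavailability, as in the example above.
low, high = unavailability_range_minutes(center_pct=1.0, tolerance_pct=0.5)
print(f"{low:,.0f} to {high:,.0f} minutes per year")
```

With these assumptions, 0.5% to 1.5% unavailability works out to about 2,628 to 7,884 minutes a year, or roughly 44 to 131 hours.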

Availability is truly a trade-off proposition. If the clients’ management is absent from deciding what an acceptable return is for the money it’s willing to pay, then IT management is making that choice for the business. And that’s larceny.

Re: metrics should never be used to tell the world what a great job you are doing.

I like this insight. I think, however, the world is used to organizations constantly bragging and the receivers hear the announcements as noise.

I believe strongly that when metrics show the IT organization has met its goals or is on its path to meeting its goals, a celebration is warranted. Why? Celebrating constantly signals to every member of the organization the importance of availability.

Celebrate both prevention (efforts that prevent previous outages from recurring or mitigate the risks of potential outages) and reaction (efforts that reduced the duration of outages). But don’t overemphasize reaction, which is the typical action. Too often IT organizations only celebrate the reactions, which, in my experience, creates unhealthy internal dynamics.

Wishing you zero outages. Best regards, -Steve

5. technologythatworksblog - November 18, 2016

If you cannot measure it, why do it? This is often said… but I think we also need to have a qualitative and an intuitive approach simultaneously…

Connections, for example, are not just measured in number; quality is more important. If you don’t believe me, try talking on the phone with a bad connection… yes, you are connected, but the conversation is awful.

Geoff Talbot
SharePoint for Intranets
