Measuring Metrics
January 27, 2010
Posted by Chuck Musciano in Leadership, Technology.
It’s a good bet that most people saw all or part of the Super Bowl, either at home or at a Super Bowl party. Suppose, as the game begins, the cable feed goes out and the television goes dark. Amid the howls of protest of those watching, you grab the phone and frantically dial your cable provider. When you finally reach a real person to complain of the interruption, they provide this explanation:
We’re sorry for the interruption, but our records show that this is our first outage in your area in more than two months. Even though we project that the outage will last for at least four hours, that still means that we provided service 99.72% of the time. This easily exceeds our 99.5% target metric for excellent service! We appreciate your business and thank you for your patience as we work to restore your service. Thanks for calling!
Happy now? Of course not. Yet many folks in IT hide behind metrics in a similar fashion.
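For what it's worth, the number in that script holds up. Here is a minimal sketch of the arithmetic in Python, assuming "two months" means 60 days of round-the-clock service; the function name is made up for illustration.

```python
def availability_percent(total_hours, outage_hours):
    """Percentage of the period during which service was available."""
    return 100.0 * (total_hours - outage_hours) / total_hours

# Assumption: "two months" is taken as 60 days of 24-hour service.
total_hours = 60 * 24   # 1440 hours
outage_hours = 4        # the projected outage from the cable company's script

pct = availability_percent(total_hours, outage_hours)
print(f"{pct:.2f}%")    # prints 99.72%
```

The math is right and the customer is still missing the game, which is exactly the problem.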
It is said that anything you measure will improve. That provides a strong incentive to measure system availability, since we’d all like to hit that elusive goal of 100% uptime. But there is a difference between using those metrics to improve our performance and using those metrics to improve our public relations.
Uptime tracking coupled with root cause analysis will help you find and fix many tiny problems that may exist in your environment. Most mature IT shops figured out long ago how to run their systems without catastrophic failure. We can all hit availability of about 98 or 99 percent on a regular basis. Getting much higher than that, however, involves ferreting out deep issues that may only surface under unusual circumstances. It takes discipline and focus to get there, and metrics can really help.
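To see why the last percent or two is so hard, it helps to turn availability percentages into hours of downtime. This is a rough sketch that assumes a 365-day year of continuous service; the function name and the list of targets are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years

def downtime_hours_per_year(availability_percent):
    """Hours of downtime per year permitted by a given availability figure."""
    return HOURS_PER_YEAR * (100.0 - availability_percent) / 100.0

# Compare the "easy" targets with the harder ones.
for pct in (98.0, 99.0, 99.5, 99.9):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% availability allows {hours:.1f} hours of downtime per year")
```

At 98 percent you can be down for about 175 hours a year; at 99 percent, about 88. Each step beyond that shrinks the budget quickly, which is why finding the rare, deep problems matters so much.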
Metrics should never be used as a defense. When users are affected by an outage, the last thing they want to hear is how well you’ve been doing prior to the problem. It doesn’t matter, and you’re only annoying people who are already upset.
Similarly, metrics should never be used to tell the world what a great job you are doing. When things are running fine, announcing that they are fine just makes you look boastful. Most users just want IT to work, and they don’t want to think about it beyond that. Building on our cable analogy, how would you like the cable company to call once a month to tell you that the service is running just fine?
From the user’s perspective, availability is measured as a binary value: yes or no. There is no average, there is no track record, there is no target goal. You either provide your service or you don’t. Metrics matter internally so that we can improve our service. But they have little bearing on user opinion and can actually do more harm than good. Use them wisely.