Skin In The Game November 9, 2009
Posted by Chuck Musciano in Leadership, Technology. Tags: Cloud Computing, Customer Service, Operations
With clouds on everyone’s mind these days, more and more CIOs are beginning to consider cloud-based services. There are still a lot of concerns with this, depending on the system or service you seek to move to the cloud. In particular, what happens when the cloud goes down?
When negotiating with cloud service providers, the conversation inevitably turns to service level agreements. Typically, a vendor will promise some level of availability, with some prorated refund if the service is unavailable for an extended period of time. Thus, if a service is unavailable for more than 24 hours, you might get one-thirtieth of your monthly service fee refunded. Less than 24 hours? You might get nothing at all.
Does anyone, except for the service provider, think this is a good deal?
The cost of an outage is not the actual cost of the underlying service. The cost of an outage is the value of the business impact you suffer. If your e-commerce platform goes down for an hour, costing you $100,000 in sales, you should get $100,000 from your service provider. Needless to say, when you mention this to potential providers, they tend to get a bit defensive. “You can’t expect us to fully reimburse your lost business, can you?” Well, yes. Yes, you can.
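To put numbers on that gap, here is a minimal sketch in Python. The $100,000 per hour comes from the example above; the $36,000 monthly fee is purely hypothetical, and the refund rule (nothing for 24 hours or less, one-thirtieth of the fee per day beyond that) is only an assumption about how a typical prorated SLA might read, not any particular vendor's terms.

```python
def sla_credit(monthly_fee, outage_hours):
    """Refund under an assumed prorated SLA: nothing for outages of 24 hours
    or less, otherwise one-thirtieth of the monthly fee per day of downtime."""
    if outage_hours <= 24:
        return 0.0
    return monthly_fee / 30 * (outage_hours / 24)


def business_impact(loss_per_hour, outage_hours):
    """What the outage actually costs the customer in lost business."""
    return loss_per_hour * outage_hours


MONTHLY_FEE = 36_000      # hypothetical service fee, in dollars
LOSS_PER_HOUR = 100_000   # lost sales per hour, from the example above

for hours in (1, 48):
    credit = sla_credit(MONTHLY_FEE, hours)
    impact = business_impact(LOSS_PER_HOUR, hours)
    print(f"{hours:>2}h outage: credit ${credit:,.0f}, impact ${impact:,.0f}")
```

Under these assumptions, a one-hour outage earns no credit against $100,000 in lost sales, and a two-day outage earns $2,400 against $4.8 million.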
If your service is good enough for a client to bet their business on, they’d expect you to have some skin in the game. If you aren’t willing to put money on the table that says you are as good as you claim to be, why should they be doing business with you? Does anyone want to be the CIO who, while explaining a multimillion-dollar outage to his board, concludes with “but we got a check for $1,200!”?
What is baffling is that this would be an easy guarantee for a qualified vendor to make. Hedging risk against failure is an actuarial problem. Why wouldn’t a vendor purchase an insurance policy against just such an occurrence, in an amount that would cover the exposed risk up to a certain point? Roll the insurance costs into the service fee and proudly market your “Million Dollar Guarantee” far and wide. I suspect you’d get some business. I also suspect that you’d get really good at providing exceptional service.
A lot of CIOs are naturally reluctant to deal with service providers who refuse to share the risk equally. Vendors who find a way to put their money where their mouth is will gain the respect, and business, of discriminating CIOs.
Shaking The Mouse July 13, 2009
Posted by Chuck Musciano in Leadership. Tags: Coaching, Operations, Technology
Back in the mid-80s, optical mice made their first appearance. Unlike their roller-ball brethren, optical mice used light reflected off a special mouse pad to detect mouse movement. They were cutting-edge and fun to play with.
In my research group, our Sun workstations used these optical mice. One day, a sales rep was in our lab demonstrating some software package when the mouse stopped responding. Unfazed, she held the mouse upside down, shook it, and resumed the demo. Our slack-jawed stares caught her attention, and she explained how the mouse “got clogged” every now and then, and shaking it “cleared the mouse” and helped it work again.
Now, it was true that the Sun optical mouse driver did hang every so often, but it was due to a small input buffer being overrun with too many mouse events. If you waited a few seconds, the buffer would drain and the mouse would recover, no shaking necessary. This woman, however, believed that the mouse was clogged and that shaking was required to fix it. It clearly worked: every time she shook the mouse, it started working again.
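For the curious, the failure mode can be sketched with a toy model in Python. This is not the actual Sun driver, whose internals are not described here; the `EventBuffer` class, its capacity of four, and the event tuples are all hypothetical, chosen only to show a small fixed-size buffer filling up, refusing new events, and recovering once it drains.

```python
from collections import deque


class EventBuffer:
    """Toy fixed-size input buffer (hypothetical; not the real driver)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()

    @property
    def overrun(self):
        # While the buffer is full, new events cannot be accepted,
        # so the mouse appears dead to the user.
        return len(self.events) >= self.capacity

    def push(self, event):
        """Queue an event; return False if the buffer is full and it is lost."""
        if self.overrun:
            return False
        self.events.append(event)
        return True

    def drain(self, count):
        """Consume up to `count` queued events, as happens over a few seconds."""
        for _ in range(min(count, len(self.events))):
            self.events.popleft()


buf = EventBuffer(capacity=4)
accepted = [buf.push(("move", i)) for i in range(6)]  # a burst of six events
stuck = buf.overrun                 # the buffer is full; the mouse seems hung
buf.drain(4)                        # time passes; nobody shakes anything
recovered = buf.push(("move", 6))   # the next event is accepted again
```

In this model the recovery comes entirely from `drain`, which runs whether or not anyone shakes the mouse in the meantime.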
Determining the root cause of a problem and applying the right solution is a crucial skill, whether you are debugging hardware or solving personnel issues. Our brains are so desperate to correlate cause and effect that we are easily convinced that some action, no matter how odd, really can solve a problem. Even worse, as soon as we find what seems to be the solution, we stop looking for the real problem.
In the case of the mouse, some simple analysis of what could actually clog a device with no moving parts might lead you to conclude that something other than shaking was at the heart of the solution. For larger problems in more complicated systems, it can take weeks and months of digging to find the true root cause. But if we do not find the real problem, we are doomed to experience it again, compounded with the frustration that our “solution” is somehow not working. A technical problem is not fully solved until you can connect the dots from the very first event in the failure to the very last element of the repair.
People problems are far harder to debug. Unlike computers, people are non-deterministic and prone to sudden erratic behavior. For many issues, we may never know why someone really made a particular mistake or acted in a certain way. In many cases, the behavior is not repeatable, so our solution cannot be fully tested. Nonetheless, we are duty-bound to explore as many avenues as possible to make sure we understand why people act in certain ways and how our own behavior can affect others.
Technical or personal, it can be tempting to grab onto the first potential solution and stick with it. It’s certainly easier than digging and digging to prove that you solved the problem. But how many times are we left implementing bad ideas or half-baked systems because we didn’t dig as hard as we should have? Are you really getting to the heart of every issue in your world, or are you just shaking the mouse?
Trust But Verify January 29, 2008
Posted by Chuck Musciano in Leadership. Tags: Best Of 2008, Operations
I once worked in a company where the President had, in a former life, been a nuclear submarine commander. Needless to say, he had many fascinating stories. He also had an uncanny knack for uncovering the one thing that you had missed, or forgotten, or simply done wrong. When there was some sort of operational failure, he would methodically dig and dig until he uncovered the root cause. He then expected you to fix it, and then to implement some plan that would prevent the problem from ever happening again. Invariably, as he went about one of these rock-turning expeditions, he would repeat that fundamental rule: “You get what you inspect, not what you expect.”
A simple rule. Don’t assume that things are done correctly. Confirm, with your own eyes, that they really are right. Ask pointed questions. Look at logs. Pull on wires. Click things that aren’t supposed to be clicked. Make sure it really works, the way you understand it to work.
This makes a lot of sense when you are living in a submarine. One mistake in that environment, and hundreds of men die. Make a big mistake in a nuclear submarine and millions of people could die.
Working for someone with this philosophy can be frustrating. When my boss would come digging into my world, I would cringe and wait for the inevitable “aha!” moment. Over time I realized that the best way to avoid such moments was to adopt a similar attitude and find the problems before he did. I started doing my own inspecting, finding and fixing things before they became visible problems. Our operational metrics really improved. And the lesson had been passed on to another generation.
Obviously, you cannot inspect everything, all the time. But everything can be inspected, at some point, by someone. As a leader, you need to engage in enough inspection to ensure that your people, and their people, will be inspecting everything else. As they inspect more, and you come to trust their inspection, you’ll find yourself inspecting less. Much to their relief, I might add. But don’t ever stop inspecting entirely. The possibility of inspection is often enough to ensure appropriate attention to detail.
Unlike a nuclear submarine commander, most of us do not operate in a world where lives hang in the balance based on our operational decisions. But businesses do, and with them come customers, and employees, and shareholders, all relying on a small group of operations people to get it right, every time. Are you willing to bet your job on what you expect, or are you going to take the time to inspect your world?
I expect you will.
Perfect? Or Good Enough? January 26, 2008
Posted by Chuck Musciano in Leadership. Tags: Best Of 2008, Innovation, Operations
I am, I fear, driving my team crazy. One moment, I am demanding perfection, unwilling to accept even the slightest error. The next moment, I’m encouraging them to experiment, to make mistakes, to get results rapidly to our customers. It all makes sense to me, but I don’t know that they see the method to my (apparent) madness.
In reality, IS organizations suffer from a split personality. Half of our job is to deliver any and all computing services, 24 by 7, without fail, no matter what. The other half is to innovate, find clever new solutions to problems, and get those solutions out to people as quickly as possible. The former role demands unrelenting perfection; the latter will tolerate (and even requires) failure and iteration.
Operations: All Or Nothing
Running computer operations is easy. Make it work, never let it fail, handle every contingency, and keep all the users happy. Users view computer services as a utility, like electricity and phone service, and have every expectation that they will just be there, all the time.
Users have every right to this level of service. There is nothing magic about computer operations. The discipline of reliable computing is a well-understood field. The processes and best practices that support reliable computer service delivery are not difficult to master. If you elect to be in that business, you do so knowing what is expected and what it will take. If you are not up to the task, don’t accept the challenge.
For those who make operations their life, there is great satisfaction in achieving near-perfection on a regular basis. A well-run data center is a thing of beauty, and the practices and procedures provide a comfortable, controlled world in which to operate. Don’t misunderstand: it is hard, thankless work, and the only time you get any attention is when things go wrong. People standing on the bow of the boat, wind in their hair, rarely appreciate the efforts of those shoveling coal down below. Satisfaction in operations almost always comes from within.
I find it frustrating when we have an operational failure. Almost every single operational breakdown can be traced to failure to adhere to best practices, be they in communicating, alerting, responding, preparing, or reacting. Even when things just explode, there is no excuse for not having a plan that will handle such a rare event.
Like the end users, I expect perfection on the operations side of this business. But I also deeply appreciate the effort that it takes to achieve such perfection, and will do everything I can to ensure that my operations staff has the tools needed to achieve perfection.
Innovation: Good Enough
The other half of IS is about building new things and getting them in front of people. In this world, creativity, innovation, and experimentation carry the day. Perfection is often the enemy.
An IS staff drilled in operational perfection will struggle when they try to switch gears and innovate. Schedules slip and projects collapse as teams work to make every new tool perfect. Users get frustrated as they wait for solutions. Instead of delivering something that is good enough, we waste too much time trying to achieve perfection.
In fact, many users are happy to get something that is good enough. Usually, a “good enough” new tool is such a big improvement over the old process or tool that users are glad to get it. If you build good relationships with users, they will come to trust that you will stick with them and keep improving that “good enough” solution over the long term.
The other benefit to “good enough” is that it lets you see where you fell short and how you can improve things with good user input. It is not possible to build a perfect tool on the first try. You need to iterate, to fail, to rebuild things to make them better and closer to perfect. You cannot design perfection right from the start.
I strongly encourage my development teams to try new things, make mistakes, and learn from them. The only bad mistake is one that you repeat. If you learn from a mistake and move forward from it, everyone benefits in the long run. The best systems evolve over time, incorporate all those lessons learned, and leverage those mistakes to improve in unforeseen ways.
What Now?
As IS leaders, we need to enforce both of these disciplines in our teams. We must deliver operational perfection and innovative solutions. We have to help our teams see the difference, give them the tools to achieve perfection where it is needed, and give them the latitude to fail and recover where that makes sense. Achieve both sides of this balance, and your users will truly be well-served.