Skin In The Game November 9, 2009

Posted by Chuck Musciano in Leadership, Technology.
5 comments

With clouds on everyone’s mind these days, more and more CIOs are beginning to consider cloud-based services.  There are still a lot of concerns with this, depending on the system or service you seek to move to the cloud.  In particular, what happens when the cloud goes down?

When you negotiate with cloud service providers, the conversation inevitably turns to service level agreements.  Typically, a vendor will promise some level of availability, with some prorated refund if the service is unavailable for an extended period of time.  Thus, if a service is unavailable for more than 24 hours, you might get one-thirtieth of your monthly service fee refunded.  Less than 24 hours? You might get nothing at all.

Does anyone, except for the service provider, think this is a good deal?

The cost of an outage is not the actual cost of the underlying service.  The cost of an outage is the value of the business impact you suffer.  If your e-commerce platform goes down for an hour, costing you $100,000 in sales, you should get $100,000 from your service provider.  Needless to say, when you mention this to potential providers, they tend to get a bit defensive.  “You can’t expect us to fully reimburse your lost business, can you?”  Well, yes.  Yes, you can.
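To put numbers on that gap, here is a minimal sketch of the arithmetic. The monthly fee is a made-up figure, the function names are hypothetical, and the refund rule is only one reading of the kind of SLA described above; it also assumes sales are lost at a constant hourly rate for the whole outage.

```python
def prorated_refund(monthly_fee, outage_hours, days_in_month=30):
    """Refund under one reading of the SLA described above.

    Nothing is paid unless the outage exceeds 24 hours; after that,
    each full day of downtime earns back one day's share of the fee.
    """
    if outage_hours <= 24:
        return 0.0
    full_days = int(outage_hours // 24)
    return monthly_fee * full_days / days_in_month


def uncovered_loss(business_loss, refund):
    """Portion of the business impact the customer absorbs alone."""
    return max(business_loss - refund, 0.0)


monthly_fee = 36_000.0      # hypothetical monthly service fee
hourly_sales = 100_000.0    # the e-commerce figure from the example above

short_outage = prorated_refund(monthly_fee, outage_hours=1)
long_outage = prorated_refund(monthly_fee, outage_hours=25)

print(short_outage, uncovered_loss(1 * hourly_sales, short_outage))
print(long_outage, uncovered_loss(25 * hourly_sales, long_outage))
```

With these assumed figures, the one-hour outage earns no refund against $100,000 in lost sales, and the 25-hour outage earns $1,200 against $2.5 million.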

If your service is good enough for a client to bet their business on, they’d expect you to have some skin in the game.  If you aren’t willing to put money on the table that says you are as good as you claim to be, why should they be doing business with you? Does anyone want to be the CIO who, while explaining a multi-million dollar outage to the board, concludes with “but we got a check for $1,200!”?

What is baffling is that this would be an easy guarantee for a qualified vendor to make.  Hedging risk against failure is an actuarial problem.  Why wouldn’t a vendor purchase an insurance policy against just such an occurrence, in an amount that would cover the exposed risk up to a certain point?  Roll the insurance costs into the service fee and proudly market your “Million Dollar Guarantee” far and wide.  I suspect you’d get some business.  I also suspect that you’d get really good at providing exceptional service.
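As a rough illustration of why this is an actuarial question, the sketch below estimates what a capped guarantee might cost a vendor per customer. Every number is invented for the example, the function name is hypothetical, and a real insurer would price the policy with far more than a single expected-value calculation.

```python
def expected_annual_payout(outages_per_year, loss_per_outage, coverage_cap):
    """Expected yearly payout per customer for a guarantee capped at coverage_cap."""
    return outages_per_year * min(loss_per_outage, coverage_cap)


# All figures are assumptions chosen only to show the shape of the math.
outages_per_year = 0.25        # one qualifying outage every four years
loss_per_outage = 2_000_000.0  # customer's business impact per outage
coverage_cap = 1_000_000.0     # the "Million Dollar Guarantee"

payout = expected_annual_payout(outages_per_year, loss_per_outage, coverage_cap)
monthly_fee_increase = payout / 12  # cost rolled into the service fee

print(payout, round(monthly_fee_increase, 2))
```

The point of the sketch is only that the exposure is bounded and can be priced: the cap limits the worst case, and a vendor who lowers its outage rate lowers the cost of the guarantee directly.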

A lot of CIOs are naturally reluctant to deal with service providers who refuse to share the risk equally.  Vendors who find a way to put their money where their mouth is will gain the respect, and business, of discriminating CIOs.

Shaking The Mouse July 13, 2009

Posted by Chuck Musciano in Leadership.
6 comments

Back in the mid-80s, optical mice made their first appearance.  Unlike their roller-ball brethren, optical mice used light reflected off a special mouse pad to detect mouse movement.  They were cutting-edge and fun to play with.

In my research group, our Sun workstations used these optical mice.  One day, a sales rep was in our lab demonstrating some software package, when the mouse stopped responding.  Unfazed, she held the mouse upside down, shook it, and resumed the demo.  Our slack-jawed stares caught her attention, and she explained how the mouse “got clogged” every now and then, and shaking it “cleared the mouse” and helped it work again.

Now, it was true that the Sun optical mouse driver did hang every so often, but it was due to a small input buffer being overrun with too many mouse events.  If you waited a few seconds, the buffer would drain and the mouse would recover, no shaking necessary.  This woman, however, believed that the mouse was clogged and that shaking was required to fix it.  It clearly worked: every time she shook the mouse, it started working again.

Determining the root cause of a problem and applying the right solution is a crucial skill, whether you are debugging hardware or solving personnel issues.  Our brains are so desperate to correlate cause and effect that we are easily convinced that some action, no matter how odd, really can solve a problem.  Even worse, as soon as we find what seems to be the solution, we stop looking for the real problem.

In the case of the mouse, some simple analysis of what could actually clog a device with no moving parts might lead you to conclude that something other than shaking was at the heart of the solution.  For larger problems in more complicated systems, it can take weeks and months of digging to find the true root cause.  But if we do not find the real problem, we are doomed to experience it again, compounded with the frustration that our “solution” is somehow not working.  A technical problem is not fully solved until you can connect the dots from the very first event in the failure to the very last element of the repair.

People problems are far harder to debug.  Unlike computers, people are non-deterministic and prone to sudden erratic behavior.  For many issues, we may never know why someone really made a particular mistake or acted in a certain way.  In many cases, the behavior is not repeatable, so our solution cannot be fully tested.  Nonetheless, we are duty-bound to explore as many avenues as possible to make sure we understand why people act in certain ways and how our own behavior can affect others.

Technical or personal, it can be tempting to grab onto the first potential solution and stick with it.  It’s certainly easier than digging and digging to prove that you solved the problem.  But how many times are we left implementing bad ideas or half-baked systems because we didn’t dig as hard as we should have?  Are you really getting to the heart of every issue in your world, or are you just shaking the mouse?

Absolute Guy In A Relative World May 18, 2009

Posted by Chuck Musciano in Leadership.
1 comment so far

I like absolutes. Yes or no. Black or white. Right or wrong. No room for debate or equivocating; the answer is patently obvious to all concerned.

This is why computing is so appealing to me.  Strip away all the layers of abstraction, and computing is about getting a sequence of 1s and 0s in the right order.  If you get the right order, it’s correct.  Drop or flip a bit, and it’s not.  You may think you’re reading this blog; in fact, you are viewing an abstract representation of several billion bits arranged to appear as text on your screen.  If even one bit were wrong, these words would not be correct.  Simple: right or wrong.

Leadership is rarely about such absolutes.  When dealing with people and plans, there are a million shades of gray that must be weighed and blended to reach decisions.  From strategic planning to tactical choices, we have to function within a spectrum of relative values that are open to interpretation.

In many cases, relative judgments make life easier.  We often talk about being “good enough,” about applying the 80/20 rule, about knowing when to quit and move on to the next project.  In these cases, there is often a law of diminishing returns that makes achieving an absolute result more expensive than the benefit derived.  Knowing when to stop is an important aspect of leadership, too.

With so much of our world based on a relative scale, it can be tempting to let everything shift to a relative scale.  I think it’s important to remember that some things are never relative.  Things like ethics, morals, trust, integrity, and reputation should never be viewed on a relative scale.  We should hold ourselves to absolute standards and never relax in our desire to achieve an absolute result in those areas.  Note that this doesn’t mean that we won’t have lapses, but those lapses can take a long time to overcome.  A tarnished reputation may take years to be restored, but the standard of a “good reputation” should not change; we simply need to work harder to achieve that standard.

I also have certain things, related to my IT background, that I always judge on an absolute scale. Data integrity is not a relative issue for me.  Data is either right or wrong, pure or corrupt.  Systems are either up or down, available or not.  Software features either work, or they don’t.  I tend to drive my team crazy with this stuff, but that doesn’t deter me from getting on my soapbox every now and again.

I find that I get a lot of reactions when I express this view.  Some people, it seems, will gauge almost anything on a relative scale.  There seems to be a general aversion to absolute anything. What do you hold to an absolute scale?  What do you shift to relative judgment?  Does it matter?

Trust But Verify January 29, 2008

Posted by Chuck Musciano in Leadership.
1 comment so far

I once worked in a company where the President had, in a former life, been a nuclear submarine commander. Needless to say, he had many fascinating stories. He also had an uncanny knack for uncovering the one thing that you had missed, or forgotten, or simply done wrong. When there was some sort of operational failure, he would methodically dig and dig until he uncovered the root cause. He then expected you to fix it, and then to implement some plan that would prevent the problem from ever happening again. Invariably, as he went about one of these rock-turning expeditions, he would repeat that fundamental rule: “You get what you inspect, not what you expect.”

A simple rule. Don’t assume that things are done correctly. Confirm, with your own eyes, that they really are right. Ask pointed questions. Look at logs. Pull on wires. Click things that aren’t supposed to be clicked. Make sure it really works, the way you understand it to work.

This makes a lot of sense when you are living in a submarine. One mistake in that environment, and hundreds of men die. Make a big mistake in a nuclear submarine and millions of people could die.

Working for someone with this philosophy can be frustrating. When my boss would come digging into my world, I would cringe and wait for the inevitable “aha!” moment. Over time I realized that the best way to avoid such moments was to adopt a similar attitude and find the problems before he did. I started doing my own inspecting, finding and fixing things before they became visible problems. Our operational metrics really improved. And the lesson had been passed on to another generation.

Obviously, you cannot inspect everything, all the time. But everything can be inspected, at some point, by someone. As a leader, you need to engage in enough inspection to ensure that your people, and their people, will be inspecting everything else. As they inspect more, and you come to trust their inspection, you’ll find yourself inspecting less. Much to their relief, I might add. But don’t ever stop inspecting entirely. The possibility of inspection is often enough to ensure appropriate attention to detail.

Unlike a nuclear submarine commander, most of us do not operate in a world where lives hang in the balance based on our operational decisions. But businesses do, and with them come customers, and employees, and shareholders, all relying on a small group of operations people to get it right, every time. Are you willing to bet your job on what you expect, or are you going to take the time to inspect your world?

I expect you will.

Perfect? Or Good Enough? January 26, 2008

Posted by Chuck Musciano in Leadership.

I am, I fear, driving my team crazy.  One moment, I am demanding perfection, unwilling to accept even the slightest error.  The next moment, I’m encouraging them to experiment, to make mistakes, to get results rapidly to our customers.  It all makes sense to me, but I don’t know that they see the method to my (apparent) madness.

In reality, IS organizations suffer from a split personality.  Half of our job is to deliver any and all computing services, 24 by 7, without fail, no matter what.  The other half is to innovate, find clever new solutions to problems, and get those solutions out to people as quickly as possible.  The former role demands unrelenting perfection; the latter will tolerate (and even requires) failure and iteration.

Operations: All Or Nothing

Running computer operations is easy.  Make it work, never let it fail, handle every contingency, and keep all the users happy.  Users view computer services as a utility, like electricity and phone service, and have every expectation that they will just be there, all the time.

Users have every right to this level of service.  There is nothing magic about computer operations.  The discipline of reliable computing is a well-understood field.  The processes and best practices that support reliable computer service delivery are not difficult to master.  If you elect to be in that business, you do so knowing what is expected and what it will take.  If you are not up to the task, don’t accept the challenge.

For those who make operations their life, there is great satisfaction in achieving near-perfection on a regular basis.  A well-run data center is a thing of beauty, and the practices and procedures provide a comfortable, controlled world in which to operate.  Don’t misunderstand: it is hard, thankless work, and the only time you get any attention is when things go wrong.  People standing on the bow of the boat, wind in their hair, rarely appreciate the efforts of those shoveling coal down below.  Satisfaction in operations almost always comes from within.

I find it frustrating when we have an operational failure.  Almost every single operational breakdown can be traced to failure to adhere to best practices, be they in communicating, alerting, responding, preparing, or reacting.  Even when things just explode, there is no excuse for not having a plan that will handle such a rare event.

Like the end users, I expect perfection on the operations side of this business.  But I also deeply appreciate the effort that it takes to achieve such perfection, and will do everything I can to ensure that my operations staff has the tools needed to achieve perfection.

Innovation: Good Enough

The other half of IS is about building new things and getting them in front of people.  In this world, creativity, innovation, and experimentation carry the day. Perfection is often the enemy.

An IS staff drilled in operational perfection will struggle when they try to switch gears and innovate.  Schedules slip and projects collapse as teams work to make every new tool perfect.  Users get frustrated as they wait for solutions.  Instead of delivering something that is good enough, we waste too much time trying to achieve perfection.

In fact, many users are happy to get something that is good enough.  Usually, a “good enough” new tool is such a big improvement over the old process or tool that users are glad to get it.  If you build good relationships with users, they will come to trust that you will stick with them to constantly improve that “good enough” solution over time, delivering continuous improvements over the long term.

The other benefit to “good enough” is that it lets you see where you fell short and how you can improve things with good user input.  It is not possible to build a perfect tool on the first try.  You need to iterate, to fail, to rebuild things to make them better and closer to perfect.  You cannot design perfection right from the start.

I strongly encourage my development teams to try new things, make mistakes, and learn from them.  The only bad mistake is one that you repeat.  If you learn from a mistake and move forward from it, everyone benefits in the long run.  The best systems evolve over time, incorporate all those lessons learned, and leverage those mistakes to improve in unforeseen ways.

What Now?

As IS Leaders, we need to enforce both of these disciplines in our teams.  We must deliver operational perfection and innovative solutions.  We have to help our teams see the difference, give them the tools to achieve perfection where it is needed, and the latitude to fail and recover where that makes sense.  Achieve both sides of this balance, and your users will truly be well-served.