Cloud and Increased Uptime – Is it a Myth?

Some of the hype about (public) cloud services is that they’ll give you increased levels of uptime.  I propose that this is a myth.  There have been lots of headlines about downtime (some of it quite brief) for the likes of GMail and BPOS.  Last night, storms in Dublin caused electrical issues for the Amazon and Microsoft cloud data centres, which led to service outages.  Microsoft claims that the Amsterdam data centre will kick in for the Dublin one during an outage, but it appears that this did not happen last night.  It’s funny because not only are these data centres unbelievably complex, and therefore susceptible to failure, but they can be incredibly simple too, which can also lead to failure.

These data centres may have incredible built-in levels of fault tolerance, but somewhere there is always a single point of failure.  I’ve personally seen them hurt two operators in the past 4 years.  One was a single point of failure in an electrical supply, right where incoming power met the UPS/generator (I’m no electrician).  That one caused an incident that was referred to as “black Friday”, when 1/3 of the Irish internet went offline for less than an hour, but the exponential traffic backlog caused issues for a whole weekend.  The other was a central router in a tier IV data centre that decided to crap itself.  That one lasted just 10 minutes, but this was supposed to be a “zero single-point-of-failure” tier IV data centre that charged its customers like it was a tier IV data centre.  Somewhere deep down, despite all the clustering, despite the redundant diesel generators, despite the international replication, despite the automation, there is usually one or more single points of failure, such as being vulnerable to a lightning strike.  So we understand that even Google, Microsoft, and Amazon have data centre failures from time to time.  Now let’s continue dealing with the uptime comparison myth.

How often does your internal Exchange service fail?  How often are your internal SharePoint/file services offline?  We’re a typical small business with a single Exchange server.  It was off briefly last week when a switch died.  We were on it straight away and replaced it.  Maybe 10 minutes of downtime.  Note: I am not involved in day-to-day internal IT all that much.  I would be very happy to say that in the last 4 months, something like BPOS has had more downtime than our internal Exchange server.  Our file server hasn’t had any downtime since I’ve been here.

Go have a look at the downtime history of those public cloud services.  Then go look at how often your on-premises services have downtime.  I bet your IT folks are doing a better job than you think.
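
If you want to put numbers on that comparison, here is a minimal sketch in Python.  It assumes you have totalled up the minutes of downtime yourself, for your own service or from a provider's published incident history.  The function names and the 120-day window (roughly 4 months) are my own assumptions for illustration, and the 10 minutes is the switch failure mentioned above.

```python
MINUTES_PER_DAY = 24 * 60


def availability_percent(downtime_minutes: float, period_days: float) -> float:
    """Percentage of the period during which the service was up."""
    total_minutes = period_days * MINUTES_PER_DAY
    return 100.0 * (1 - downtime_minutes / total_minutes)


def downtime_budget_minutes(target_percent: float, period_days: float) -> float:
    """Minutes of downtime allowed in the period for a given availability target."""
    total_minutes = period_days * MINUTES_PER_DAY
    return total_minutes * (1 - target_percent / 100.0)


if __name__ == "__main__":
    # Assumed inputs: 10 minutes of downtime over an assumed 120-day window.
    print(f"Availability: {availability_percent(10, 120):.4f}%")
    # Downtime a hypothetical 99.9% target would allow over the same window.
    print(f"99.9% budget: {downtime_budget_minutes(99.9, 120):.1f} minutes")
```

Run the same calculation against the outage minutes you find for a cloud service over the same window and you have a like-for-like comparison, rough as it is.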

I hate it when I hear people saying that the (public) cloud will increase uptimes of your IT services.  To me, it’s a BS myth.  There are other reasons to consider the cloud, but I am not willing to agree that uptime is one of them. 

3 thoughts on “Cloud and Increased Uptime – Is it a Myth?”

  1. When stripped of everything else, IT is risk assessment, mitigation and management. Depending on your needs and your current level of achievement with the above, you could do better, equal or worse in the cloud. If you’re doing badly on uptime now, the cloud might help if the reason for that is fixable by the cloud, but that’s not by definition the case. Walk carefully and evaluate every step you make in the cloud. In itself it is not the destination; it is but another tool in the box to get the job done.

    One of the pain points of internal IT is visibility and appreciation. Often they’re only visible when things go wrong, and the attention they get then is not appreciation. IT done right does matter. IT without a vision doesn’t (Carr was right about that). Like with fire, people/companies need to “get burned” to learn the real value of something, and even after that they need to re-evaluate and re-assess once in a while. There are no golden bullets or magic solutions. Good IT does not happen and is not bought. You build it & you fix it. The aim is to build it as well as you can, which will help when something needs a fix. And it will! No matter how charmed the lives we lead, sooner or later bad things happen to all of our stuff, and then the quality of your people and gear makes the difference, whether that is a 100% on-premises or a 100% cloud solution. Someone has to be there and give a d*.
