Some of the hype about (public) cloud services is that they’ll give you increased levels of uptime. I propose that this is a myth. There have been lots of headlines about downtime (some of it quite brief) for the likes of GMail and BPOS. Last night, storms in Dublin caused electrical issues for the Amazon and Microsoft cloud data centres, which led to service outages. Microsoft claims that the Amsterdam data centre will kick in for the Dublin one during an outage, but it appears that this did not happen last night. It’s funny: these data centres are unbelievably complex, and therefore susceptible to failure, but they can also be incredibly simple, which can lead to failure too.
These data centres may have incredible built-in levels of fault tolerance, but somewhere there is always a single point of failure. I’ve personally seen them hurt two operators in the past 4 years. One was a single point of failure in an electrical supply, right where incoming power met the UPS/generator (I’m no electrician). That one caused an incident that was referred to as “black Friday”, when 1/3 of the Irish internet went offline for less than an hour, but the traffic backlog that built up kept causing issues for a whole weekend. The other was a central router in a tier IV data centre that decided to crap itself. That one lasted just 10 minutes, but this was supposed to be a “zero single-point-of-failure” tier IV data centre, and it charged its customers like it was one. Somewhere deep down, despite all the clustering, despite the redundant diesel generators, despite the international replication, despite the automation, there is usually at least one single point of failure, such as being vulnerable to a lightning strike. So we understand that even Google, Microsoft, and Amazon have data centre failures from time to time. Now let’s continue dealing with the uptime comparison myth.
How often does your internal Exchange service fail? How often are your internal SharePoint/file services offline? We’re a typical small business with a single Exchange server. It was off briefly last week when a switch died. We were on it straight away and replaced the switch. Maybe 10 minutes of downtime. Note: I am not involved in day-to-day internal IT all that much. I would be very happy to say that in the last 4 months, something like BPOS has had more downtime than our internal Exchange server. Our file server hasn’t had any downtime since I’ve been here.
Go have a look at the downtime history of those public cloud services. Then go look at how often your on-premises services have downtime. I bet your IT folks are doing a better job than you think.
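If you want to put numbers on that comparison, here’s a rough sketch in Python. The function name `availability_percent` and the 120-day window (standing in for roughly four months) are my own assumptions for illustration, and it treats a single 10-minute outage as the only downtime in the window. Plug in whatever figures you find in your provider’s published incident history and in your own logs.

```python
def availability_percent(downtime_minutes: float, window_days: float) -> float:
    """Return the percentage of the window during which the service was up."""
    total_minutes = window_days * 24 * 60
    if total_minutes <= 0:
        raise ValueError("window_days must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be between 0 and the window length")
    return 100.0 * (1 - downtime_minutes / total_minutes)


# Assumption: one 10-minute outage over a window of about 120 days.
on_premises = availability_percent(downtime_minutes=10, window_days=120)
print(f"On-premises availability: {on_premises:.3f}%")
```

Under those assumptions the single switch failure works out to roughly 99.994% availability. Run the same function over the outage minutes reported for a cloud service in the same window and you have a like-for-like number to compare.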
I hate it when I hear people saying that the (public) cloud will increase the uptime of your IT services. To me, it’s a BS myth. There are other reasons to consider the cloud, but I am not willing to agree that uptime is one of them.