This post discusses a failed project that I was brought into, and what I observed and learned from it. It’s a real scenario that happened years ago, involving an on-premises deployment.
Too Many Chefs Spoil the Broth
Back in 2010, I joined a Dublin-based services company. The stated intention from the MD was that I was to lead a new Microsoft infrastructure consulting team. As it turned out, not a single manager or salesperson in the company believed that there was any work out there in Microsoft infrastructure technology – really! – and it never really got off the ground. But I was brought into one customer, and this post is the story of that engagement.
It was a sunny, cold day when I drove out to the customer’s campus. They are a large state-owned … hmm … transport company. I had enough experience in IT to know that I was going to be dealing with strong personalities and opinions that were not necessarily based on fact. My brief was that I would be attending a meeting with all the participants of a failing Windows Server 2008 R2 Hyper-V and System Center 2008 R2 project. I came in and met a customer representative – the technical lead of the project. He immediately told me that I was to sit in the corner, observe, and not talk to any of the participants from the other service providers. Note that the last word is plural, very plural.
I sat at the far corner of a long boardroom table and in came everyone: the customer’s IT managers and tech staff, the storage manufacturer (HP – now HPE) and their partner, the networking manufacturer (Cisco) and their partner, a Microsoft Premier Field Engineer, the consultants that implemented Hyper-V, the consultants that implemented the System Center management of Hyper-V, and probably more. For scale, the Hyper-V cluster was something like 6 nodes and maybe 50-100 VMs.
Quickly it became evident that this was the first time that any of the participants in the meeting had talked to each other. I should re-phrase that: this was the first time any of the participants in deploying the next-generation IT infrastructure for running the business had been allowed to talk to each other.
- A new W2008 R2 Hyper-V cluster was built. Although the customer was adamant that this was not true (it was), a 2-site cluster was built as a single-site cluster: a high-latency link ran between the two sites, with no control of VM placement and no third-site witness.
- HP P4000 “LeftHand” module-based iSCSI storage was used without any consideration of SCSI persistent reservations – a common problem in the W2008 R2 era. Volumes in the SAN would “disappear” from the cluster because the scale-out of NICs/CSVs/SAN nodes went beyond the limits of W2008 R2 – a result of poor understanding of storage performance and Hyper-V architecture.
- I remember awful problems with backup. DPM was deployed by a consulting firm but configured by a local staff member. He had a nightmare with VSS providers (HP were awful at this) and with backup job sizing. It did not help that backup was an afterthought in Hyper-V back then – not really resolved until WS2012, when it became software-defined. Combined with how the P4000 worked, the multi-site cluster that wasn’t, and redirected I/O, this caused all sorts of fun.
- VMs would disappear – yup, the security officer insisted that AV was installed on each host and that it scanned every folder, including the CSVs. They even resisted change when presented with the Microsoft documentation on the scan exclusions that must be configured for Windows Server roles/features, including Hyper-V.
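For reference, Microsoft’s documented Hyper-V host exclusions can be expressed as a short config sketch. The snippet below uses the Windows Defender PowerShell cmdlets (`Add-MpPreference`) purely as an illustration – the AV in this story was a third-party product, and those consoles have equivalent settings; the paths shown are the defaults and would differ if VM or CSV storage had been relocated:

```powershell
# Sketch: Microsoft-documented AV exclusions for a Hyper-V host.
# Illustrative only – uses Defender cmdlets; third-party AV products
# expose equivalent settings. Default paths shown; adjust to your layout.

# VM configuration, checkpoint, and virtual hard disk directories,
# plus the CSV mount root on a cluster.
Add-MpPreference -ExclusionPath `
    'C:\ProgramData\Microsoft\Windows\Hyper-V',
    'C:\Users\Public\Documents\Hyper-V\Virtual Hard Disks',
    'C:\ClusterStorage'

# Virtual disk, checkpoint, and saved-state file types.
Add-MpPreference -ExclusionExtension '.vhd', '.vhdx', '.avhd', '.avhdx', '.vsv', '.iso'

# The Hyper-V management service and per-VM worker processes.
Add-MpPreference -ExclusionProcess 'vmms.exe', 'vmwp.exe'
```

Scanning the CSV paths and VM worker processes is exactly what caused the “disappearing VM” symptom above: the scanner locks or quarantines a VHD/AVHD file mid-write and the VM drops off the cluster.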
These were just a few of the technical issues; there were many more – inconsistent or missing patching, NIC teaming issues, and so on. I even created a 2-hour presentation based on this project that I (unofficially) called “How to screw up a Hyper-V project”.
My role was to “observe” but I wanted this thing fixed, so I contributed. I remember I spent a lot of time with the MS PFE on the customer site. He was gathering logs on behalf of support and we shared notes. Together we identified many issues/solutions. I remember one day, the customer lead shouted at me and ordered me back to my desk. I was not there “to talk to people but to observe”. The fact that I was one of two people on site that could solve the issues was lost on him.
The customer’s idea of running a project was to divide it up into little boxes and keep everyone from talking to each other. Part of this was how they funded the project – once it went over a certain monetary level it had to be publicly tendered. They had their preferred vendors and they went with them, even if they were not the best people. This created islands of knowledge/expertise and a lack of vision. The customer thought they could manage this, and they were wrong. Instead, each supplier/vendor did their own thing based on assumptions of what others were doing and on incorrect information shared by the customer’s technical team. And it all blew up in the customer’s face.
In the end, I heard that the customer blamed the software, the implementors, and everyone else involved in the project but themselves. They scrapped the lot and went with VMware, allegedly.
I think that there were three major lessons to be learned from this project. I know that these lessons apply equally today, no matter what sort of IT project you are doing, including on-premises, hybrid, or pure cloud.
IT enables or breaks the business. That’s something that most boards/owners do not understand. They think of IT as the nerds playing Doom in a basement, with their flashing lights and whirring toys. Obviously, that’s a wrong opinion.
When IT works, it can make the business faster, more agile, and more competitive. New practices, be they operational or planning, can change IT; I’ve even read how Scrum/Agile concepts can be brought into business planning.
Any significant IT project that will impact the business must start with the business. Someone at the C-Level must own it, be invested in it, and provide the rails or mission statement that directs it. That oversight will force those involved in the project to operate correctly and give them guidance on how to best serve the business.
Taking some large-impact IT project and treating it as a point solution will not work. For example, building an entirely new IT infrastructure without considering the impact of, or the dependencies on, networking is stupid! You cannot just hand off systems to different vendors and wish them bon voyage. There must be a unified vision. This technical vision starts with the previously mentioned business vision that guide-rails the technical design. All components that interconnect and have direct or indirect involvements must be designed as a whole.
The worst thing one can do is divvy up IT infrastructure between 5 or 6 vendors and say, “you do that, and I will participate in a monthly meeting”. That’s not IT! That’s bailing out on your responsibility! IT vendors can play a role, when chosen well. But they need a complete vision to do their job. And if they cannot get that from you, they must be allowed to help you build it. If your IT department’s role is to manage outsourcing contracts and nothing more, you have already failed the business and should just step aside.
A unified delivery must start with internal guidance, sharing the complete vision with all included parties, internal and external, as early as possible. Revealing to Vendor B, six months into their project, a significant change that you have been working on with Vendor A is a fail. Isolating each of the vendors is a fail. Not giving each vendor clear rules of engagement with orchestrated interaction is a fail. The delivery must be unified under the guidance of an architect who has the complete vision.
Bad IT Starts at The Top
In my years, I’ve done plenty of projects, reviewed many customers’ IT systems, and worked as a part of IT departments. Some of them were completely shocking. A common theme was the CIO/CTO: typically, an accountant or finance officer who was handed the role of supervising IT because … well … it’s just IT and they have a budget to manage. Someone who doesn’t understand IT hires and keeps bad IT managers, and bad IT managers hire bad IT staff, make bad IT decisions, and run bad IT projects. As the saying goes, sh&t rolls downhill. When these bad projects are happening to you, and you run IT, then you must look in the mirror and stop pointing the finger elsewhere.
And before you say it, yes, there are crap consultants too 😊