Hyper-V How To: Balance VM I/O

This article explains how to balance I/O between VMs on hosts that are saturated.  It’s a last-resort action to resolve the issue.  Ideally you’ll be balancing your workload across a cluster, e.g. OpsMgr detects a peak load, your storage PRO tips detect the culprit, and VMM 2008 balances the workload across the Hyper-V cluster.  However, that won’t work if you don’t have a cluster, so these registry edits might be necessary.
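The post doesn’t name the actual registry values, so here’s only a placeholder sketch of how I’d script a batch of such edits rather than typing reg.exe commands by hand.  The key path and value name below are hypothetical stand-ins – substitute the real ones from the referenced article before running anything:

```python
# Build a reg.exe command line for a Hyper-V host registry edit.
# NOTE: the key path and value name used in the example call are
# PLACEHOLDERS -- the article this post points at has the real ones.
def reg_add_command(key, value_name, data, value_type="REG_DWORD"):
    """Return the reg.exe command that sets a single registry value."""
    return f'reg add "{key}" /v {value_name} /t {value_type} /d {data} /f'

cmd = reg_add_command(
    r"HKLM\SYSTEM\CurrentControlSet\Services\StorVsp",  # placeholder key
    "IoBalanceEnabled",                                 # placeholder value
    1,
)
print(cmd)
```

Generating the commands this way means you can review the whole batch before pasting it into an elevated prompt on each host.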

Lots of Hyper-V Updates

Jose Barreto has listed all of the publicly available updates for Hyper-V including the RTM.  You could selectively apply them if they are applicable to your servers.  For future builds, here’s what I’m going to do:

  • I’ve got a WDS captured WIM image of my Hyper-V build.  This has everything done before enabling the Hyper-V role.  It includes the RTM release of Hyper-V.
  • I’ll download the updates.
  • I’ll slipstream the updates into my WIM build and into the WIMs from the installation media that are on my WDS server.  That future-proofs any new builds.

The result will be that any newly built Hyper-V hosts will have all of the updates in place.
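For the slipstreaming step, something like the following sketch generates the offline servicing commands.  It assumes the Server 2008-era tools (imagex from the WAIK, and pkgmgr for injecting packages); the exact switches are from memory, so double-check them against the WAIK documentation, and the KB file name is just an example:

```python
# Generate the commands to mount a WIM read-write, inject update
# packages offline, and commit the changes.  Flag syntax is assumed
# from the WAIK-era tools -- verify before running for real.
def slipstream_commands(wim, index, mount_dir, updates):
    """Return the imagex/pkgmgr command lines for one WIM image index."""
    cmds = [f'imagex /mountrw "{wim}" {index} "{mount_dir}"']
    for cab in updates:
        # pkgmgr installs a package (/ip) into the offline image (/o)
        cmds.append(f'pkgmgr /o:"{mount_dir};{mount_dir}\\Windows" /ip /m:"{cab}"')
    cmds.append(f'imagex /unmount /commit "{mount_dir}"')
    return cmds

for c in slipstream_commands(r"D:\images\hyperv.wim", 1, r"C:\mount",
                             [r"C:\updates\example-update.cab"]):
    print(c)
```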

What’s an SLA Worth?

Imagine a data centre service provider who offers a 100% availability SLA.  That’s pretty impressive.  Most of us aim for the five 9’s, i.e. 99.999%.  Even 99.9% is good, allowing no more than about 43 minutes of outage in a 30-day month.  You’d have to be pretty sure of yourself to offer a 100% SLA.
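For reference, the downtime each of those SLA levels actually permits is easy to compute:

```python
# Maximum outage, in minutes, that an availability SLA permits.
def allowed_downtime_minutes(availability_pct, period_hours):
    """Downtime budget for a given availability over a given period."""
    return (1 - availability_pct / 100) * period_hours * 60

MONTH_HOURS = 30 * 24   # a 30-day month
for sla in (99.9, 99.99, 99.999):
    budget = allowed_downtime_minutes(sla, MONTH_HOURS)
    print(f"{sla}% allows {budget:.2f} min per 30-day month")
```

So five 9’s leaves well under a minute of outage per month – which is why a 100% claim should raise an eyebrow.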

For a data centre, that is actually achievable but you cannot cut any corners.  I’m lucky enough to work with a service provider who can live up to their claim of a 100% SLA.  They invested heavily in the building, people and processes to have a Tier IV facility with no single points of failure.  They haven’t had an outage since they launched in 2001.  As a result their customers can build their brand name on this.  We’re able to say that our service extends their philosophy and our service hasn’t had an outage since we launched earlier this year.

But I know of others who do claim a 100% SLA.  In fact, one of them is having their 4th major outage in 2.5 years … right now as I type this – if you’re in Ireland it’s not hard to guess who I’m talking about.  I’m not even counting the little mishaps that they have on a recurring basis … some we know about through web forums, blogs, word of mouth, etc.  I’m not going to poke fun at them.  There are some good people in there who’ll be stressed out right now through no fault of their own.

However, we do have to look at the people responsible for the SLA being offered.  Their clients are depending on that SLA and have reflected it to their own customers, e.g. if my hosting provider gives me a 99.99% SLA then I can pass that on to my clients.  However, if my data centre is up and down like the proverbial w***e’s knickers then I look like mud to my clients, whether they’re internal or external.

If you’re looking at a hosting service provider then please check out their SLA.  If they claim 100% then that’s very audacious.  I’m not saying it’s impossible, just very hard.  Look at their track record and see if they live up to it.  If not, can you believe their other fantastic claims about senior staff on site 24*7, huge bandwidth, "everything is possible", etc.?  Here’s what I’d do:

  • Ask for the SLA.  Check out blogs.  Webmasters are always quick to point out faults, so their forums are a good place to check.  If they have a status site then check it.  See if the explanation for a fault stays consistent.  If not, don’t deal with them.
  • The 2am Test: Drive up to the data centre and knock on the door.  If anyone is even in, ask to speak to the senior staff the company claims is on site 24*7.  If they lie, walk away from the sales negotiation.
  • Ask for proof of certification claims, e.g. I’m an MCSE.  I have an ID number that people can check out for proof of my claim.  I’d say the same applies for CCIEs.  If they lie, walk out.
  • The real kicker: ask them if they do XYZ.  Let XYZ = something you’ve just made up off the top of your head.  If they say yes then walk out.

Am I sounding harsh?  Honestly, no.  We all expect sales people to stretch a little, but taking things this far is too much.  Imagine if this stings you: it shuts down your business.  If you’re a reseller your clients will blame you, not the hosting company.  You’ve got a business reputation to maintain.  If you’re in the mission-critical world then outages such as this are not tolerable.  They can possibly kill people.

If you’ve suffered a hosting power outage in the last 365 days and it’s affected your business then check out an alternative.

EDIT:

The hosting company having the outage finally came back online after 2.5 hours – sort of.  Some servers are still not responding.  Assuming this was their only outage this year (and it wasn’t), that’d give them 99.97% uptime.
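That 99.97% figure comes from simple arithmetic over a 365-day year:

```python
# Uptime percentage implied by a given amount of outage in a period.
def uptime_pct(outage_hours, period_hours=365 * 24):
    """Availability as a percentage, given total outage hours."""
    return 100 * (1 - outage_hours / period_hours)

print(round(uptime_pct(2.5), 2))  # a single 2.5-hour outage in a year
```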

EDIT #2:

I just checked that company’s web site uptime (they host it themselves in their data centre).  It’s available 99.87% of the time over the last 2 years and they had 97.74% uptime in September 2008.  Not quite 100% or even 99.9%.

Installing OpsMgr 2007 Reporting Successfully

One of the downsides of Microsoft System Center is that you become an accidental SQL administrator.  One of the powerful aspects of Operations Manager 2007 is Reporting.  Having a year of data available for historical health and performance reports is very handy, e.g. SLA reporting, baseline performance comparison, fault trend analysis.

Installing Reporting is tricky and not terribly well documented – is anything from MS these days? – BLOGS AREN’T DOCUMENTATION!

This blog post (sigh!) goes through a checklist of what to look for when installing OpsMgr 2007 Reporting.

Installed Hyper-V Server 2008

I freed up a server after using VMM 2008 P2V, so that gave me a machine to play with for a short while before it’s sent back into production.  I decided to install Hyper-V Server 2008 to see if this free product would be worth considering for us in limited production roles.

The install is easy and quick.  It’s just like installing a Core installation – heck, that’s what it really is.  When you log in you get two command prompts instead of one.  The second is a CMD-based wizard.  That makes configuring the box a breeze.  I really hope the Core guys look at this, copy it, and make it available to the standard Core installation as a free download (or in SP2).

We run HP machines, so that requires installing the ProLiant Support Pack for hardware management.  Unfortunately, that’s where we hit the problems I had before with a Core installation.  HP still haven’t caught up with Core.  We need a way to use their management tools from the command prompt – NO, I DON’T WANT TO LEARN ANOTHER SCRIPTING LANGUAGE.

When I get a chance I want to try managing this box using System Center OpsMgr and VMM 2008 to see what happens.

MS Live: Their Own Worst Enemies

I don’t know what’s been going through their heads lately at Microsoft Live.

  • The first thing was that the Statistics functionality would randomly fail.  You’d click on stats and get a dead link.  Hit refresh enough times and it would load.  That seems to be better since MS deployed an update to the templates/layouts.
  • The Live DNS servers fail intermittently.  My blog goes offline due to DNS failures at Live every now and then.  It’s happened twice in the last couple of weeks that I know of.  I hope the DNS service is geographically dispersed.
  • The worst is the new buttons for managing the sites.  They’re supposed to be drop-down buttons where a menu appears and you select an option.  The menu does appear, but the button is also a link which loads another page before you can select a menu option.  3/4 of the administrator functionality is unavailable because of this.  Someone needs to crack a few developers’ heads together over this one.

It’s pretty annoying.  These are very simple things to sort out.  I know it’s a free service, but if MS wants to generate revenue from advertising then they need to make the service easier and more reliable than alternatives such as WordPress.

Managing Workgroup Hosts Using Virtual Machine Manager 2008

I could see from my blog statistics that there was a lot of interest in managing workgroup hosts using VMM 2008.  The hosts that I manage are in our management network forest so that makes things pretty simple.  However, I just hit a scenario where I might prefer to run workgroup hosts or hosts in an un-trusted domain/forest.

It’s a solution that is possible.  A little judicious searching dug up pages on TechNet that give the host requirements and the step-by-steps.  Here’s a summary:

  • Identify the ports used by your installation of VMM 2008.  You’ll need to open these on the firewall (host outbound and/or network) and enter them on the manual agent installation.
  • Get the IP address of the host.  You’ll either need to add this to the un-trusted network’s DNS or enter it in the manual installation.  You might consider using the local hosts file too (more work).
  • Manually install the agent by running setup.exe.
  • Choose a local agent installation.
  • Accept the default ports if that’s what you used when installing VMM 2008; otherwise enter your non-default ones.
  • On the security folder page, choose "This host is on a perimeter network", even if it isn’t.  This is for workgroup solutions too.
  • Enter and confirm the encryption key.  You’ll want to record this for when you add the host to VMM.
  • Enter the name/IP address of the VMM server.  If you use the name of the VMM server then ensure it can resolve correctly.  This requires either DNS or an updated local hosts file.
  • Copy %SystemDrive%\Program Files\Microsoft System Center Virtual Machine Manager 2008\SecurityFile.txt to a location accessible on the VMM server.  You’ll need it to add the host.
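As a rough sketch of the prep work in the list above, here’s a small helper that generates the firewall rules and the hosts-file line.  The ports, IP address and hostname in the example call are invented values – use whatever your own VMM installation was configured with:

```python
# Generate the Windows Firewall rules and hosts-file entry needed for
# a workgroup/perimeter host.  Ports and names below are EXAMPLES, not
# VMM-mandated values -- substitute your installation's settings.
def firewall_rules(ports):
    """netsh commands to allow inbound TCP on each VMM port."""
    return [
        f'netsh advfirewall firewall add rule name="VMM agent {p}" '
        f"dir=in action=allow protocol=TCP localport={p}"
        for p in ports
    ]

def hosts_entry(ip, hostname):
    """A line for %SystemRoot%\\System32\\drivers\\etc\\hosts."""
    return f"{ip}\t{hostname}"

for rule in firewall_rules([80, 443]):   # example ports
    print(rule)
print(hosts_entry("192.168.10.21", "hvhost01"))  # example host
```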

Now, add the "perimeter" host on the VMM server.  Fire up the Add Host wizard and enter the recorded encryption key and the location of the security file as required.

I’d recommend having a separate host group for these hosts, but that all depends on your security and administration models.  You need to create the host group before starting the wizard.

Microsoft Generation 4 Modular Data Centres

Data centres are a hot topic right now thanks to outsourcing, cloud computing and Software as a Service (or Software + Services).  You might know a computer room but it’s nothing like a data centre in terms of scale, power, cooling, fire suppression, fault tolerance and processes.  It’s a thoroughly different experience.

Microsoft is building a set of new data centres around the world to host their new cloud computing (Azure) infrastructure for Software + Services.  This includes the Grange Castle facility in Dublin that made news headlines a year ago when they turned the first sod.  It appears Microsoft refers to the architecture being used as 3rd generation.

The typical data centre is a large purpose-built building with pre-deployed power and cooling.  There’s a huge cost to building/maintaining/operating this building until it’s fully populated.  The big costs?  Building, rent, water and power.  Electricity is a huge cost and we all know it’s only getting higher.  Green (as in money/taxation, not environment) Party politicians want carbon taxes and we’re likely to see those soon.  Those operating costs are increasing.  This makes it harder to keep computing costs down – more reason to seek cloud/outsourcing services to capitalise on cost savings through shared or bulk buying, i.e. many clients in a managed data centre sharing costs.

Microsoft builds and maintains their own data centres so they are their own client.  They can’t share those costs.  I’ve heard that they buy some staggering number of servers per month so growth is constant and huge.  They could build huge data centres and populate them but the overhead of half empty data centres would be massive.

Their 4th generation architecture is a simple concept.  Instead of building the huge building they will build a spine or backbone.  They defined a modular architecture where they can drop in pre-built and populated building blocks on a just-in-time (JIT) basis.  These blocks look like lorry containers on site.  The blocks are pre-fabricated, so building costs are minimal.  Because the building isn’t one huge block it also simplifies cooling, one of the big draws on power and a major draw on water.  They are looking at using uncooled external air to cool the individual blocks.  Each block has direct access to the external air.  They might not get 19 degrees Celsius internal temperatures but do they really need that?  Nope.  Servers will happily run at 30 degrees.  We only cool beneath that for historical reasons and for human comfort levels.

Using JIT, MS can keep a certain amount of resources free while putting more on order.  This Lego-style approach is simple and a money saver.  Use what you need now, have some in reserve, and have a fixed plan on what/when you will purchase to maintain the reserve.
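The reserve-and-trigger policy above can be sketched in a few lines; the 20% reserve target and the block counts are invented for illustration, not Microsoft’s figures:

```python
import math

# Toy model of a JIT capacity reserve: when free capacity dips below
# the target fraction, order enough blocks to restore the reserve.
def blocks_to_order(total_blocks, used_blocks, reserve_fraction=0.2):
    """Blocks to order so free capacity is back at the target fraction
    of the enlarged total."""
    free = total_blocks - used_blocks
    shortfall = total_blocks * reserve_fraction - free
    if shortfall <= 0:
        return 0  # reserve is still healthy; nothing to order
    # Solve (free + n) >= (total + n) * f for n:
    return math.ceil(shortfall / (1 - reserve_fraction))

print(blocks_to_order(10, 9))  # 9 of 10 blocks in use -> order 2
```

Note the order restores the reserve relative to the *new* total, not the old one – ordering one block for a one-block shortfall would leave you immediately below target again.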

We do something similar at work.  We acquire servers only as required.  We power up and network racks only as required.  We keep a certain percentage of resources free and have a trigger to acquire fresh upgrades to our reserves.  This keeps our operating costs down which we can pass on to our clients.  Of course, we’re not on the same scale as MS’s data centres … yet 🙂