OpsMgr Empirical Performance Data For Server Sizing

We have a customer who has a number of physical machines hosted with us.  They were deployed before we had a virtualised environment.  The specs were defined by the customer based on what they thought they’d need for a new service.

They asked us to look at replacing (not converting) their Windows Server 2003 web servers with Windows Server 2008/2008 R2 virtual web servers.  They also asked if the back end servers could be looked at as virtualisation candidates. Operations Manager to the rescue!

OpsMgr is constantly gathering performance data.  It keeps over a year of it in a reporting database.  I ran some reports.  CPU and memory were the two important ones.

The web servers were simple enough.  Their CPU average utilisation proved to be low with the occasional spike.  The standard deviation was very small and the spikes were very infrequent.  As Hyper-V VM’s on a cluster, this is no problem.  If a spike is detected by OpsMgr, the VMM PRO Tips integration will move the VM using zero-downtime Live Migration to an idle host and allow the VM the CPU resources it needs.  As it turns out, they use exactly 50% of their RAM.  The nice thing here is that we have empirical data to justify a reduction of the RAM by 25%.  If it needs to go up then it’s just a couple of minutes of mouse clicks to do that.

The back end servers were another story.  The average CPU was low, but not quite as low.  I also could see much more frequent CPU spikes.  The standard deviation was much greater.  To be honest, this was what the customer and I both expected.  These machines are not virtualisation candidates.
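
If you want to play with the numbers yourself, here’s a rough sketch of the kind of check you can run once you’ve exported the counter data from the reporting database.  The sample figures and thresholds below are made up for illustration; they are not OpsMgr output.

```python
# Toy virtualisation-candidacy check on exported CPU samples (one list per server).
# The sample data and the thresholds are invented purely for illustration.
from statistics import mean, stdev

samples = {
    "web01":  [8, 7, 9, 6, 45, 8, 7, 9, 8, 7],            # low average, rare spike
    "sqlbe1": [35, 60, 80, 40, 75, 90, 30, 85, 70, 65],    # busier and more variable
}

for server, cpu in samples.items():
    avg, sd, peak = mean(cpu), stdev(cpu), max(cpu)
    candidate = avg < 20 and sd < 15   # illustrative thresholds only
    print(f"{server}: avg={avg:.1f}% stdev={sd:.1f} peak={peak}% -> "
          f"{'virtualisation candidate' if candidate else 'needs a closer look'}")
```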

So instead of doing a blind P2V, or sticking a wet finger in the wind, we went through a scientific decision-making process, courtesy of the reporting database in Operations Manager 2007 R2.  There will be no worrying about any future deployment; we should know what the end result will be.

Download OpsMgr 2007 R2 Documentation

You can now download Microsoft’s TechNet documentation for System Center Operations Manager 2007 R2.

The Operations Manager 2007 R2 technical documentation helps you plan, deploy, operate, and maintain Operations Manager 2007 R2.  The download page lists the specific guides available in the library.

Hardware Monitoring Using System Center Operations Manager

Hardware management is the one thing I am most worried about.  Sure, I could deploy the manufacturer’s management solution.  But do I want a different console for every system I manage?  Really, you don’t.  You want one central point and that can be the Operations Manager console.

I’m most familiar with what HP does so I’ll explain it.  They provide an Insight Manager agent that detects health and performance issues of the hardware.  This includes all of the components, e.g. CPU, fans, disks, network cards, etc.  You can deploy an OpsMgr agent to this server.  If you install the HP Insight Manager management pack then, after discovery, OpsMgr will be aware of the Insight Manager agent.  All data collected by that agent will be detected by OpsMgr.  So now, if a disk fails you learn about it in OpsMgr.  If memory degrades, you learn about it in OpsMgr.  This is so handy – because this is where you also get performance and health alerts for Windows, SQL, Exchange, Red Hat Enterprise Linux, SUSE Linux Enterprise Server, etc.  You can extend with 3rd party solutions to include your Cisco network, etc.  Heck, there’s even a coffee pot management pack!!!

Back in the day, there appeared to be only support from HP and Dell.  But that has changed.

  • HP: Hewlett Packard has management packs for ProLiant servers, BladeSystem, and Integrity.  There is also a management pack for StorageWorks systems (e.g. EVA SAN).
  • Dell: I’ve never managed Dell machines with OpsMgr.  But I am told that Dell did a very nice job.  They are significant Microsoft partners.
  • IBM: I’m not the biggest fan of IBM – we have some X series stuff which I detest.  We had to get an IBM employee to download the management pack because all external links failed.  At the time, it appeared their “shared” download was only available from the IBM corporate network. A Dutch friend had the same issue and I ended up sending him what I was given by IBM.  I’ll be honest, the IBM Director management pack is poor compared to the HP one.  IBM wants you to spend lots of money on consultancy led Tivoli.  IBM Director is pretty poor too.  IBM Ireland employees have been unable to figure out how to monitor IBM DAS or to give me the documentation to do it.
  • Fujitsu: I have not seen a Fujitsu server since 2005.  Back then there was no MOM management pack for the Fujitsu Siemens servers; they wanted you to use a native solution only.  That has changed.  They have ServerView Integration for Microsoft System Center Operations Manager 2007 and System Center Essentials 2007 and ServerView Integration Pack for Microsoft System Center Operations Manager 2007.

That should get you started.  Each of the manufacturers seems to do things differently.  HP, for example, uses the above system for ProLiants.  But blade enclosures require a piece of middleware.  Make sure you read the accompanying documentation from the OEM before you do anything.

Thanks to fellow MVP Mark Wilson for finding the links for Fujitsu. 


Operations Manager 2007 R2 Cross Platforms Cumulative Update 2

Microsoft released an update for OpsMgr 2007 R2 cross platform extensions last night. 

The System Center Operations Manager 2007 R2 Cross Platform Cumulative Update 2 includes System Center Operations Manager 2007 R2 Cross Platform Agent Update (KB973583) and additional bug fixes.

This updated release includes everything from the previous update release (KB973583) plus additional fixes.

The previous release (KB973583) added support for:

  • SUSE Linux Enterprise Server 11 (both 32-bit and 64-bit)
  • Zones (Whole and Sparse Zones) for all supported versions of Solaris

There are a number of fixes included, all available to read on the MS download page.

Monitor CSV Free Space?

This is something that struck me today.  I was doing some checks in Operations Manager to see what free space was like on some of the servers we run online backup services with.  Then I thought – let’s have a look at the cluster shared volume on our Hyper-V cluster.  The problem is that Operations Manager deals with logical drives that have a letter.  It seems to ignore drives such as the CSV: a mounted drive that appears as a folder in C:\ClusterStorage\Volume1, Volume2, etc.

There are two ways to check this manually that I have found so far.  The first is to open up the Failover Clustering MMC and connect to the cluster.  You’ll see the size and free space for the Cluster Shared Volume there. 

[Screenshot: Cluster Shared Volume size and free space in the Failover Clustering console]

You can also do it in VMM by right-clicking on the cluster object and viewing the properties.

[Screenshot: cluster properties in VMM showing the CSV and the witness disk]

You can ignore the witness disk (at the top); I really hope you’re not so desperate for VM storage that you consider that!

I cannot find anything in Operations Manager for tracking this critical function.  It’s not in the Failover Clustering MP (where it probably should be), Hyper-V or VMM management packs.

I’d advise that you keep an eye on this, especially if you are experiencing growth or using self service in VMM.  For example, I’ve switched to using dynamic VHD’s.  Yeah, early on that means I save on storage space.  My C: VHD’s are half the size they were with Windows Server 2008 fixed VHD’s.  But eventually they will grow and consume space on the CSV.  You need to know when to trigger a growth of the LUN on the SAN and expand the NTFS volume before you reach critical levels.  Bad things happen when a growing VHD doesn’t have any space left.
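
In the meantime, a scheduled script on a cluster node can do a crude check for you.  This is just a minimal sketch; the path and the 15% threshold are examples, not recommendations.

```python
# Minimal sketch: check free space on a CSV mount point and warn below a threshold.
# Run it on a cluster node; the path and the 15% threshold are example values.
import shutil

CSV_PATH = r"C:\ClusterStorage\Volume1"
WARN_PERCENT = 15

usage = shutil.disk_usage(CSV_PATH)          # works for mounted folders, not just drive letters
free_percent = usage.free / usage.total * 100
print(f"{CSV_PATH}: {usage.free / 2**30:.1f} GB free of {usage.total / 2**30:.1f} GB "
      f"({free_percent:.1f}%)")
if free_percent < WARN_PERCENT:
    print("WARNING: time to grow the LUN and extend the NTFS volume")
```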

Managing SharePoint 2010 using System Center

I’ve tuned into a webcast aimed at the System Center Influencers and I’m going to try to blog from it live.  Microsoft’s line is that System Center is the way to manage SharePoint because Microsoft understands the requirements.

SharePoint often starts as some ad hoc solution but grows from there to be mission critical and to contain important business data.  Administration is complex: users, file server admins, web admins, database admins and web developers are all involved.

System Center Improves Availability:

  • DPM backs it up the way it should be.
  • Operations Manager monitors health and performance.
  • Virtualisation (VMM managed) can allow for rapid deployment with minimal footprint.

Administration

  • Configuration Manager automates management
  • Service Desk will add more benefits

Centralised Management

This is the norm for System Center.  Centralised management with delegation is how System Center works.  For example, a SharePoint administrator could deploy a front end server in minutes using the VMM 2008 R2 self service portal.  A quota will control sprawl and the network administrators don’t need to be as involved.

OpsMgr Management Pack

  • There is a new monitoring architecture.  There are physical and logical components where the physical entity rolls up to a logical entity.
  • Monitoring is integrated into SharePoint so the SharePoint admins can see the health in SharePoint
  • There will be a unified management pack instead of the current 2007 split management packs.  The discovery process will identify the roles installed on an agent machine and only utilise the required components.

We’re shown an OpsMgr diagram that shows the architecture of a SharePoint deployment.  If you haven’t seen these, they are hierarchical diagrams that give you a visualisation of some system, e.g. HP Blade farm, Hyper-V cluster, SharePoint farm.

The 2010 management pack allows you to monitor a particular web application in SharePoint 2010.  The management pack is more aware of what components are deployed where and the interdependencies – sorry I’m not a SharePoint guru so I’m missing some of the terminology here.

Rules administration has been simplified.  There is a view in the Monitoring pane to view the health of all rules for the SharePoint 2010 management pack.  I like this.  I’ve not seen it in any other management pack.  The SQL guys should have coffee with the SharePoint folks 🙂

There are 300% more discoveries, 1293% more classes, and 300% more monitors than in 2007.  That is a huge increase in automated knowledge being built into OpsMgr to look after SharePoint 2010.  There are 45% fewer rules.  This is a good thing because duplicated effort with the IIS and SQL management packs has been reduced, which cuts noise.  Microsoft assumes you’ll install those other management packs.  Approximately 150 TechNet articles are linked in the pack to guide you to fixing certain detected issues.

Data Protection Manager 2010

DPM 2010 is due out around April 2010.  It is important to Hyper-V admins because it adds support for CSV.  DPM allows you to back up to disk and then optionally stream to tape.  You can also replicate one DPM server to another for disaster recovery.

SharePoint 2003 and WSS 2.0 are backed up basically as SQL.  You need the native SharePoint tool to complete the backup.

SharePoint 2007 and WSS 3.0 are backed up using a SharePoint VSS writer.  Every server (web/content/config/index) gets an agent.  DPM reaches out to “the farm” and can back up everything required.

DPM is designed to know what to back up.  3rd party solutions are generic and don’t have that.  For example, a new server in the farm will be detected.  The DPM administrator needs to authorise this addition.

DPM 2010 does something similar with SharePoint 2010.  However, it is completely automated, allowing your delegated VMM administrators or Configuration Manager administrators (SharePoint administrators) to deploy VM’s or physical machines.

One of the cool things about DPM is that it doesn’t have specialised agents.  It’s using VSS writers.  That means there is 1 agent for all types of protected servers.

We get a demo now and we see the DPM administrator can just select “the farm” and back that up.  There’s no selecting of components or roles.  The speaker only sets up his destination and retention policies.

DPM 2007 is noisy, e.g. data consistency checks.  I’ve seen this when I did some lab work.  The job wizard allows you to perform a heal/check either when a problem is found, on a scheduled basis, or not at all.  This is a self healing feature.

Recoveries can be done at the farm level or for an individual content (SQL) database.  SharePoint 2007 can restore a site collection, a site or a document.  This requires a recovery farm, i.e. a server, consuming resources and increasing costs.  SharePoint 2010 with DPM 2010 does not require a recovery farm.  You can directly recover an item into the production farm.  Trust me, that’s huge.

The release candidate for DPM 2010 comes out next week.

Virtualisation

  • Web role, Render Content: Virtualisation ideal
  • Query role, Process Search, Queries: Virtualisation Ideal
  • Application Role, Excel Forms Services: Virtualisation ideal
  • Index role, Crawl Index: Consider virtualisation – small amount of crawling, and drive space used to store the index (VHD = maximum 2TB, although you can go to pass through disks for more).
  • Database role: Consider virtualisation – OK for smaller farms.

My Take

My advice on top of this: Monitor everything using VMM and Operations Manager.  You’ll soon see if something is a candidate for virtualisation or if a VM needs to be migrated to physical.

If you run everything on a Hyper-V 2008 R2 cluster then enable PRO in VMM.  Any performance issue can trigger an automatic Live Migration (if you allow it) to avoid performance bottlenecks.

If you are going physical for the production environment then consider virtual for the DR site if reduced capacity is OK.  For example, your production site is backed up with DPM.  You keep a Hyper-V farm in the DR site.  Your DPM server replicates to a DR site DPM server.  During a DR you can do a restoration.  Will it work?  Who knows :)  It’s something you can test pretty cheaply with Hyper-V Server 2008 R2.  Money is tight everywhere and this might be an option.

New Operations Manager Management Pack: Power

There’s a glut of Operations Manager 2007 management packs available from Microsoft.  One of them really stood out.  It is the Windows Power Management Pack for System Center Operations Manager 2007 R2.

The short description is:

“The Power Management Pack for Operations Manager 2007 R2 enables you to monitor and manage the power consumption of computers running Windows Server 2008 R2.

This management pack provides:

  • Visibility into power consumption
  • Visibility and control of power policy
  • Ability to lower power consumption during non-business hours to reduce overall power consumption
  • Ability to limit power consumption
  • Ability to detect excessive power consumption”

Microsoft is making a lot of effort in this space.  System Center Configuration Manager 2007 R3 could be considered as “System Center Configuration Manager 2007 Power Management Edition”.  Windows 7 and Windows Server 2008 R2 both include lots of power features.

This new management pack seems to be leveraging some of those features in Windows Server 2008 R2.  The basic concept is that it will retrieve details of power consumption for servers running that operating system.  It allows you to set power plans and use automatic recovery to switch power plans based on server usage.  You can set thresholds and raise alerts when servers consume more than an allocated amount of power.  You can force servers to use no more than a certain amount of power. 

This is a complicated management pack.  There is a Word document with rough instructions.  Please read it thoroughly before importing this management pack.  If you have a physical lab (the management pack won’t do anything with VM’s) then work with it there first.  You should also note that if you use SQL 2008 for your OpsMgr database then you should apply a hotfix first.

System Center Influencers Blog Feed

I’m lucky enough to be a part of Microsoft’s System Center Influencers group.  VMM is one of my core things.  I’m a user of OpsMgr and I’ll blog about what I learn and do with that too.  I used to be a ConfigMgr MVP but I’ve fallen out of step with it because of changes at work; I still like to stay in touch.  We don’t use DPM as a core product but Hyper-V keeps it interesting for me.  And that’s just the start of System Center!

The folks behind System Center Influencers have a blog feed gathering content from the members.  You can see our blog posts in one central point.  Check it out.

Rough Guide To Setting Up A Hyper-V Cluster

EDIT: 18 months after I wrote it, this post continues to be one of my most popular.  Here is some extra reading for you if this topic is of interest:

A lot of people who will be doing this have never set up a cluster before.  They know of clusters from stories dating back to the NT4 Wolfpack, Windows Server 2000 and Windows Server 2003 days, when consultants made a fortune clustering things like Exchange and SQL on five-days-per-cluster projects.

Hyper-V is getting more and more widespread.  And that means setting up highly available virtual machines (HAVM) on a Hyper-V cluster will become more and more common.  This is like Active Directory.  Yes, it can be a simple process.  But you have to get it right from the very start or you have to rebuild from scratch.

So what I want to do here is walk through what you need to do in a basic deployment for a Windows Server 2008 R2 Hyper-V cluster running a single Cluster Shared Volume (CSV) and Live Migration.  There won’t be screenshots – I have a single laptop I can run Hyper-V on and I don’t think work would be too happy with me rebuilding a production cluster for the sake of blog post screenshots 🙂  This will be rough and ready but it should help.

Microsoft’s official step by step guide is here.  It covers a lot more detail but it misses out on some things, like “how many NIC’s do I need for a Hyper-V cluster?”, “how do I set up networking in a Hyper-V cluster?”, etc.  Have a read of it as well to make sure you have covered everything.

P2V Project Planning

Are you planning to convert physical machines to virtual machines using Virtual Machine Manager 2008 R2?  If so, deploy VMM 2008 R2 and Operations Manager 2007 (R2) now (yes, before the Hyper-V cluster!) and start collecting information about your server network.  There are reports in there to help you identify what can be converted and what your host requirements will be.  You can also use the free MAP toolkit for Hyper-V to do this.  If your physical machine uses 50% of a quad core Xeon then the same VM will use 50% of the same quad core Xeon in a Hyper-V host (actually, probably a tiny bit more to be safe).

Buy The Hardware

This is the most critical part.  The requirements for Hyper-V are simple:

  • Size your RAM.  Remember that a VM has a RAM overhead of up to 32MB for the first GB of RAM and up to 8MB for each additional GB of RAM in that VM.  There’s a rough sizing sketch after this list.
  • Size the host machine’s “internal” disk for the parent partition or host operating system.  See the Windows Server 2008 R2 requirements for that.
  • The CPU(s) should be x64 and feature hardware-assisted virtualisation (Intel VT or AMD-V).  All of the CPU’s in the cluster should be from the same manufacturer.  Ideally they should all be the same spec but things happen over time as new hardware becomes available and you’re expanding a cluster.  There’s a tick box for disabling advanced features in a virtual machine’s CPU to take care of that during a VM migration.
  • It should be possible to enable Data Execution Prevention (DEP) in the BIOS and it should work.  Make that one a condition of sale for the hardware.  DEP is required to prevent break out attacks in the hypervisor.  Microsoft took security very, very seriously when it came to Hyper-V.
  • The servers should be certified for Windows Server 2008 R2.
  • You should have shared storage that you will connect to the servers using iSCSI or Fibre Channel.  Make sure the vendor certifies it for Windows Server 2008 R2.  It is on this shared storage (a SAN of some kind) that you will store your virtual machines.  Size it according to your VM’s storage requirements.  If a VM has 2GB of RAM and 100GB of disk then size the SAN to be 102GB plus some space for ISO images (up to 5GB) and some free space for a healthy volume.
  • The servers will be clustered.  That means you should have a private network for the cluster heartbeat.  A second NIC is required in the servers for that.
  • The servers will need to connect to the shared storage.  That means either a fibre channel HBA or a NIC suitable for iSCSI.  The faster the better.  You may go with 2 instead of 1 to allow MPIO in the parent partition.  That allows storage path failover for each physical server.
  • Microsoft recommends a 4th NIC to create another private physical network between the hosts.  It would be used for Live Migration.  See my next page link for more information.  I personally don’t have this in our cluster and have not had any problems.  This is supported AFAIK.
  • Your servers will have virtual machines that require network access.  That requires at least a third NIC in the physical servers.  A virtual switch will be created in Hyper-V and that connects the virtual machines to the physical network.  You may add a 4th NIC for NIC teaming.  You may add many NIC’s here to deal with network traffic.  I’ve talked a good bit about this, including this post.  Just search my blog for more.
  • Try to get the servers to be identical.  And make sure everything has Windows Server 2008 R2 support and support for failover clustering.
  • You can have up to 16 servers in your cluster.  Allow for either N+1 or N+2.  The latter is ideal, i.e. there will be capacity for two hosts to be offline and everything is still running.  Why 2?  (a) stuff happens in large clusters and Murphy is never far away.  (b) if a Windows 8 migration is similar to a Windows Server 2008 R2 migration then you’ll thank me later – it involved taking a host from the old cluster and rebuilding it to be a host in a new cluster with the new OS.  N+1 clusters lost their capacity for failover during the migration unless new hardware was purchased.
  • Remember that a Hyper-V host can scale out to 64 logical processors (cores in the host) and 1TB RAM.
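
Here’s a back-of-the-envelope sizing sketch using the overhead and SAN numbers from the list above.  The VM list is invented and the output is only a starting point for your own sums.

```python
# Back-of-the-envelope host RAM and SAN sizing using the rules of thumb above.
# The VM list is an example; treat the output as a starting point, not gospel.
def vm_ram_overhead_mb(vm_ram_gb):
    """Roughly 32MB for the first GB plus 8MB for each additional GB of VM RAM."""
    return 32 + 8 * (vm_ram_gb - 1)

vms = [  # (name, RAM in GB, disk in GB)
    ("web1", 2, 40),
    ("web2", 2, 40),
    ("app1", 4, 100),
]

total_ram_gb = sum(ram + vm_ram_overhead_mb(ram) / 1024 for _, ram, _ in vms)
total_san_gb = sum(ram + disk for _, ram, disk in vms) + 5   # + RAM per VM, + 5GB for ISOs

print(f"VM RAM incl. overhead: {total_ram_gb:.1f} GB (add the parent partition's RAM on top)")
print(f"Minimum SAN space:     {total_san_gb} GB (leave extra free space for a healthy volume)")
```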

The Operating System

This one will be quick.  Remember that the Web and Standard editions don’t support failover clustering.

  • Hyper-V Server 2008 R2 is free, is based on the Core installation type and adds Failover Clustering for the first time in the free edition.  It also has support for CSV and Live Migration.  It does not give you any free licensing for VM’s.  I’d only use it for VDI, Linux VM’s or for very small deployments.
  • Windows Server 2008 R2 Enterprise Edition supports 8 CPU sockets and 2TB RAM.  What’s really cool is that you get 4 free Windows Server licenses to run on VM’s on the licensed host.  A host with 1 Enterprise license effectively gets 4 free VM’s.  You can over license a host too: 2 Enterprise licenses = 8 free VM’s.  These licenses are not transferable to other hosts, i.e. you cannot license 1 host and run the VM’s on another host.
  • Windows Server 2008 R2 DataCenter Edition allows you to reach the maximum scalability of Hyper-V, i.e. 64 logical processors (cores in the host) and 1TB RAM.  DataCenter edition as a normal OS has greater capacities than this; don’t be fooled into thinking Hyper-V can reach those.  It cannot do that despite what some people are claiming is supported.

All hosts in the cluster should be running the same operating system and the same installation type.  That means all hosts will be either Server Core or full installations.  I’ve talked about Core before.  Microsoft recommends it because of the smaller footprint and less patching.  I recommend a full installation because the savings are a few MB of RAM and a few GB of disk.  You may have fewer patches with Core but they are probably still every month.  You’ll also find it’s harder to repair a Core installation and 3rd party hardware management doesn’t have support for it.

Install The Hardware

First things first, get the hardware installed.  If you’re unsure of anything then get the vendor to install it.  You should be buying from a vetted vendor with cluster experience.  Ideally they’ll also be a reputed seller of enterprise hardware, not just honest Bob who has a shop over the butchers.  Hardware for this stuff can be fiddly.  Firmwares across the entire hardware set all have to be matching and compatible.  Having someone who knows this stuff rather than searches the Net for it makes a big difference.  You’d be amazed by the odd things that can happen if this isn’t right.

As the network stuff is being done, get the network admins to check switch ports for trouble.  Ideally you’ll use cable testers to test any network cables being used.  Yes, I am being fussy but little things cause big problems.

Install The Operating Systems

Make sure they are all identical.  An installation that is done using an answer file helps there.  Now you should identify which physical NIC maps to which Local Area Connection in Windows.  Take care of any vendor specific NIC teaming – find out exactly what your vendor prescribes for Hyper-V.  Microsoft has no guidance on this because teaming is a function of the hardware vendor.  Rename each Local Area Connection to its role, e.g.

  • Parent
  • Cluster
  • Virtual 1

What you’ll have will depend on how many NIC’s you have and what roles you assigned to them.  Disable everything except for the first NIC.  That’s the one you’ll use for the parent partition.  Don’t disable the iSCSI ones.

Patch the hosts for security fixes.  Configure TCP/IP for the parent partition NIC.  Join the machines to the domain.  I strongly recommend setting up constrained delegation for ISO file sharing over the network.

Do whatever antivirus you need to.  Remember you’ll need to disable scanning of any files related to Hyper-V.  I personally advise against putting AV on a Hyper-V host because of the risks associated with this.  Search my blog for more.  Be very sure that the AV vendor supports scanning files on a CSV.  And even if they do, there’s no need to be scanning that CSV.  Disable it.

Enable the Cluster NIC for the private heartbeat network.  This will either be a crossover cable between 2 hosts in a 2 host cluster or a private VLAN on the switch dedicated just to these servers and this task.  Configure TCP/IP on this NIC on all servers with an IP range that is not routed on your production network.  For example, if your production network is 172.16.0.0/16 then use 192.168.1.0/24 for the heartbeat network.  Ping test everything to make sure every server can see every other server.
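
If you want to script the ping test, something as simple as this will do.  The heartbeat addresses are examples; run it from each host in turn.

```python
# Quick-and-dirty reachability check for the heartbeat network.
# The IP list is an example; replace it with your own heartbeat addresses.
import subprocess

heartbeat_ips = ["192.168.1.1", "192.168.1.2", "192.168.1.3"]

for ip in heartbeat_ips:
    # Windows ping: -n 1 sends a single echo request
    result = subprocess.run(["ping", "-n", "1", ip], capture_output=True)
    print(f"{ip}: {'reachable' if result.returncode == 0 else 'NOT reachable'}")
```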

If you have a Live Migration NIC (labelled badly as CSV in my example diagrams) then set it up similarly to the Cluster NIC.  It will have its own VLAN and its own IP range, e.g. 192.168.2.0/24.

Enable the Virtual NIC.  Unbind every protocol you can from it, e.g. if using NIC teaming you won’t unbind that.  This NIC will not have a TCP configuration so IPv4 and IPv6 must be unbound.  You’re also doing this for security and simplicity reasons.

Here’s what we have now:

[Diagram: host network configuration so far – Parent, Cluster and Virtual 1 NIC’s]

Once you have reached this point with all of the hosts, you’re ready for the next step.

Install Failover Clustering

You’ll need to figure out how your cluster will gain a quorum, i.e. be able to make decisions about failover and whether it is operational or not.  This is to do with host failure and how the remaining hosts vote.  It’s done in 2 basic ways.  There are actually 4 ways but it breaks down to 2 ways for most companies and installations:

  1. Node majority: This is used when there are an odd number of hosts in the cluster, e.g. 5 hosts, not 4.  The hosts can vote and there will always be a majority winner, e.g. 3 to 2.
  2. Node majority + Disk: This is used when there are an even number of hosts, e.g. 16.  It’s possible there would be an 8 to 8 vote with no majority winner.  The disk acts as a tie breaker (the worked example below shows the arithmetic).
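
The arithmetic behind that is simple enough to show in a few lines.  A cluster keeps quorum while more than half of the votes are online; the witness disk adds the extra vote that an even-sized cluster needs.

```python
# Why an even node count wants a witness disk: simple majority arithmetic.
def has_quorum(online_votes, total_votes):
    return online_votes > total_votes // 2

# 4-node cluster, 2 nodes down, no witness: 2 of 4 votes -> no majority
print(has_quorum(online_votes=2, total_votes=4))   # False
# Same cluster with a witness disk: 2 nodes + the disk = 3 of 5 votes -> majority
print(has_quorum(online_votes=3, total_votes=5))   # True
```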

Depending on who you talk to or what GUI in Windows you see, this disk is referred to either as a Witness Disk or a Quorum Disk.  I recommend creating it in a cluster no matter what.  Your cluster may grow or shrink to an uneven number of hosts and may need it.  You can quickly change the quorum configuration based on the advice in the Failover Clustering administration MMC console.

The disk only needs to be 500MB in size.  Create it on the SAN and connect the disk to all of your hosts.  Log into a host and format the disk with NTFS.  Label it with a good name like Witness Disk.

I’m ignoring the other 2 methods because they’ll only be relevant in stretch clusters that span a WAN link and I am not talking about that here.

Use Server Manager to install the role on all hosts.  Now you can set up the cluster.  The wizard is easy enough.  You’ll need a computer name/DNS name for your cluster and an IP address for it.  This is on the same VLAN as the Parent NIC in the hosts.  You’ll add in all of the hosts.  Part of this process does a check on your hardware, operating system and configuration.  If this passes then you have a supported cluster.  Save the results as a web archive file (.MHT).  The cluster creation will include the quorum configuration.  If you have an even number of hosts then go with the + Disk option and select the witness disk you just created.  Once it’s done your cluster is built.  It is not hard and only takes about 5 to 10 minutes.  Use the Failover Clustering MMC to check the health of everything.  Pay attention to the networks.  Stray networks may appear if you didn’t unbind IPv4 or IPv6 from the virtual network NIC in the hosts.

If you went with Node Majority then here’s my tip.  Go ahead and launch the Failover Clustering MMC.  Add in the storage for the witness disk.  Label it with the same name you used for the NTFS volume.  Now leave it there should you ever need to change the quorum configuration.  A change is no more than 2 or 3 mouse clicks away.

Now you have:

[Diagram: the cluster created, with the witness disk added to cluster storage]

Install Hyper-V

Enable the Hyper-V role on each of your hosts, one at a time.  Make sure the logs are clean after the reboot.  Don’t go experimenting yet; please!

Cluster Shared Volume

CSV is seriously cool.  Most installations will have most, if not all, VM’s stored on a CSV.  CSV is only supported for Hyper-V and not for anything else as you will be warned by Microsoft.

Set up your LUN on the physical storage for storing your VM’s.  This will be your CSV.  Connect the LUN to your hosts.  Format the LUN with NTFS.  Set it to use GPT so it can grow beyond 2TB.  Label it with a good name, e.g. CSV1.  You can have more than 1 CSV in a cluster.  In fact, a VM can have its VHD files on more than one CSV.  Some are doing this to attempt to maximise performance.  I’m not sold that will improve performance but you can test it for yourself and do what you want here.

DO NOT BE TEMPTED TO DEPLOY A VM ON THIS DISK YET.  You’ll lose it after the next step.

Use the Failover Clustering MMC to add the disk in.  Label it in Failover Clustering using the same name you used when you formatted the NTFS volume.  Now configure the CSV.  When you’re done you’ll find the disk has no drive letter.  In fact, it’ll be “gone” from the Windows hosts.  It’ll actually be mounted as a folder on the C: drive of all of your hosts in the cluster, e.g. C:\ClusterStorage\Volume1.  This can be confusing at first.  It’s enough to know that all hosts will have access to this volume and that your VM’s are not really in your C: drive.  They are really on the SAN.  C:\ClusterStorage\Volume1 is just a mount point to a letterless drive.

Now we have this:

[Diagram: the cluster with the Cluster Shared Volume added]

Virtual Networking

Hopefully you have read the previously linked blog post about networking in Hyper-V.  You should be fully educated about what’s going on here.

Here are the critical things to know:

  • You really shouldn’t put private or internal virtual networks on a Hyper-V cluster when using more than one VM on those virtual networks.  Why?  A private or internal virtual network on host A cannot talk with a private or internal network on host B.  If you set up VM1 and VM2 on such a virtual network on host A what happens when one of those VM’s is moved to another host?  It will not be able to talk to the other VM.
  • If you create a virtual network on one host then you need to create it on all hosts.  You also must use identical names across all hosts.  So, if I create External Network 1 on host 1 then I must create it on host 2.

Create your virtual network(s) and bind them to your NIC’s.  In my case, I’m binding External Network 1 to the NIC we called Virtual 1.  That gives me this:

[Diagram: External Network 1 bound to the Virtual 1 NIC on each host]

All of my VM’s will connect to External Network 1.  An identically named external virtual network exists on all hosts.  The physical Virtual 1 NIC is switched identically on all servers on the physical network.  That means if VM1 moves from host 1 to host 2 it will be able to reconnect to the virtual network (because of the identical name) and be able to reach the same places on the physical network.  What I said for virtual network names also applies to tags and VLAN ID’s if you use them.

Get Busy!

Believe it or not, you have just built a Hyper-V cluster.  Go ahead and build your VM’s.  Use the Failover Clustering MMC as much as possible.  You’ll see it has Hyper-V features in there.  Test live migration of the VM between hosts.  Do continuous pings to/from the VM during a migration.  Do file copies during a migration (pre-Vista OS on the VM is perfect for this test).  Make sure the VM’s have the integration components/integration services/enlightenments (or additions for you VMware people) installed.  You should notice no downtime at all.
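
If you want something a little more rigorous than watching a command prompt, here’s a quick sketch that counts missed replies while you migrate a VM.  The target address and duration are examples.

```python
# Continuous ping against a VM while you live migrate it, counting any missed replies.
# The target address and the duration are example values.
import subprocess
import time

TARGET = "192.168.10.25"     # the VM you're migrating (example address)
DURATION_SECONDS = 120

drops = 0
end = time.time() + DURATION_SECONDS
while time.time() < end:
    # Windows ping: -n 1 sends one echo request, -w 1000 waits up to 1000ms for a reply
    result = subprocess.run(["ping", "-n", "1", "-w", "1000", TARGET], capture_output=True)
    if result.returncode != 0:
        drops += 1
        print(f"{time.strftime('%H:%M:%S')} missed a reply")
    time.sleep(1)

print(f"Done: {drops} missed replies in {DURATION_SECONDS} seconds")
```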

Remember that for Linux VM’s you need to set the MAC in the VM properties to be static or they’ll lose the binding between their IP configuration and the virtual machine NIC after a migration between hosts.

Administration of VM’s

I don’t know why some people can’t see or understand this.  You can enable remote desktop in your VM’s operating system to do administration on them.  You do not need to use the Connect feature in Hyper-V Manager to open the Virtual Machine Connection.  Think of that tool as your virtual KVM.  Do you always use a KVM to manage your physical servers?  You do?  Oh, poor, poor you!  You know there’s about 5 of you out there.

Linux admins always seem to understand that they can use SSH or VNC.

Virtual Machine Manager 2008 R2

VMM 2008 R2 will allow you to manage a Hyper-V cluster(s) as well as VMware and Virtual Server 2005 R2 SP1.  There’s a workgroup edition for smaller clusters.  It’s pretty damned powerful and simplifies many tasks we have to do in Hyper-V.  Learn to love the library because that’s a time saver for creating templates, sharing ISO’s (see constrained delegation above during the OS installation), administration delegation, self service portal, etc.

You can install VMM 2008 R2 as a VM on the cluster but I don’t recommend it.  If you do, then use the Failover Clustering and Hyper-V consoles to manage the VMM virtual machine.  I prefer that VMM be a physical box.  I hate the idea of chicken and egg scenarios.  Can I think of one now?  No, but I’m careful.

To deploy the VMM agent you just need to add the Hyper-V cluster.  All the hosts will be imported and the agent will be deployed.  Now you can do all of your Hyper-V management via PowerShell, the VMM console and the Self Service console.

You also can use VMM to do a P2V conversion as mentioned earlier.  VSS capable physical machines that don’t run transactional databases can be converted using a live or online conversion.  Those other physical machines can be converted using an offline migration that uses Windows PE (pre-installation environment).  Additional network drivers may need to be added to WinPE.

You can enable PRO in your host group(s) to allow VMM to live migrate VM’s around the cluster based on performance requirements and bottlenecks.  I have set it to fully automatic on our cluster.  Windows 2008 quick migration clusters were different: automatic moves meant a VM could be offline for a small amount of time.  Live Migration in Windows Server 2008 R2 solves that one.

Figure out your administration model and set up your delegation model using roles.  Delegated administrators can use the VMM console to manage VM’s on hosts.  Self service users can use the portal.

Populate your library with hardware templates, VHD’s and machine templates.  Add in ISO images for software and operating systems.  An ISO creation and mounting tool will prove very useful.

Operations Manager 2007 R2

My advice is “YES, use it if you can!”.  Using System Center is what makes Hyper-V so much better.  OpsMgr will give you all sorts of useful information on performance and health.  Import your management packs for Windows Server, clustering, your hardware (HP and Dell do a very nice job on this.  IBM don’t do so well at all – big surprise!), etc.  Use the VMM integration to let OpsMgr and VMM work together.  VMM will use performance information from OpsMgr for intelligent placement of VM’s and for PRO.

I leave the OpsMgr agent installation as a last step on the Hyper-V cluster.  I want to know that all my tweaking is done … or hopefully done.  Otherwise there’s lots of needless alerts during the engineering phase.

Backup

Deploy your backup solution.  I’ve talked about this before so check out that blog post.  You will also want to back up VMM.  Remember that DPM 2007 cannot back up VM’s on a CSV.  You will need DPM 2010 for that.  Check with your vendor if you are using backup tools from another company.

Pilot

Don’t go running into production.  Test the heck out of the cluster.  Deploy lots of VM’s using your templates.  Spike the CPU in some of them (maybe a floating point calculator or a free performance tool) to test OpsMgr and VMM PRO.  Run live migrations.  Test P2V.  Test the CSV coordinator failover.  Test CSV path failover by disconnecting a running host from the SAN – the storage path should switch to using the Ethernet and route via another host.  Get people involved and have some fun with this stage.  You can go nuts while you’re not yet in production.
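
If you don’t have a load tool handy, a few lines of script inside a test VM will spike every core for you.  This is a deliberately crude burner; only run it in a VM you don’t care about and kill it with Ctrl+C.

```python
# Crude CPU burner to spike a test VM and see whether OpsMgr and VMM PRO react.
# Run inside a *test* VM only; stop it with Ctrl+C.
import multiprocessing

def burn():
    x = 1.0001
    while True:                      # pointless floating point work to keep a core busy
        x = x * 1.0001 % 97

if __name__ == "__main__":
    workers = [multiprocessing.Process(target=burn)
               for _ in range(multiprocessing.cpu_count())]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```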

Go Into Production

Kick up your feet, relax, and soak in the plaudits for a job well done.

EDIT #1:

I found this post by a Microsoft Failover Clustering program manager that goes through some of this if you want some more advice.

My diagrams do show 4 NIC’s, including the badly named CSV (Live Migration dedicated).  But as I said in the OS installation section, you only need 3 for a reliable system: (1) parent, (2) heartbeat/live migration, and (3) virtual switch.

EDIT #2

There are some useful troubleshooting tips on this page.  Two things should be noted.  Many security experts advise that you disable NTLM in group policy across the domain.  You require NTLM for this solution.  There are quotes out there about Windows Server 2008 failover clusters not needing a heartbeat network. But “If CSV is configured, all cluster nodes must reside on the same non-routable network. CSV (specifically for re-directed I/O) is not supported if cluster nodes reside on separate, routed networks”.

Choosing a Linux to Run on Hyper-V

Become a Hyper-V administrator and sooner or later someone wants you to run Linux.  Hyper-V has support for SUSE Linux Enterprise Server (SLES) 10 SP1, 10 SP2 or 11, x86 or x64, as well as Red Hat Enterprise Linux 5.2 and 5.3 with no IC’s.  Performance is important to me so I want my VM’s to have Integration Components.  That limits me to SLES 10 and 11.

If you are running Hyper-V then management is probably important to you.  You’re probably running some components of Microsoft System Center, even Operations Manager 2007 R2.  OpsMgr 2007 R2 has cross platform extensions, i.e. the ability to monitor Linux and UNIX physical and virtual machines using Microsoft written agents and management packs (optionally supplemented by 3rd party management packs).

OpsMgr 2007 R2 supports the following non-Microsoft operating systems:

  • AIX 5.3 (Power), 6.1 (Power)
  • HP-UX 11iv2 (PA-RISC and IA64), and 11iv3 (PA-RISC and IA64)
  • Red Hat Enterprise Linux 4 (x64 and x86) and 5 (x64 and x86)
  • Solaris 8 (SPARC), 9 (SPARC) and 10 (SPARC and x86 versions later than 120012-14)
  • SUSE Linux Enterprise Server 9 (x86) and 10 SP1 (x86 and x64)

If you draw a Venn diagram then you’ll see your options for an optimal solution are starting to dwindle … rapidly.  The common MS supported operating systems for Hyper-V and Operations Manager 2007 R2 are:

  • SUSE Linux Enterprise Server 10 SP1 (x86 and x64)

Maybe I should have said “is” instead of “are”.
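
If you prefer code to Venn diagrams, the intersection looks like this (the version strings are just simplified labels based on the lists above):

```python
# The Venn diagram in code: distributions with Hyper-V IC's vs OpsMgr 2007 R2 cross platform support.
# Version strings are simplified labels, not exhaustive support statements.
hyperv_with_ics = {"SLES 10 SP1", "SLES 10 SP2", "SLES 11"}
opsmgr_xplat    = {"SLES 9", "SLES 10 SP1", "RHEL 4", "RHEL 5", "AIX 5.3", "AIX 6.1",
                   "HP-UX 11iv2", "HP-UX 11iv3", "Solaris 8", "Solaris 9", "Solaris 10"}

print(hyperv_with_ics & opsmgr_xplat)   # {'SLES 10 SP1'}
```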

So, if you are running Windows Server 2008 R2 Hyper-V and System Center Operations Manager 2007 R2, then I’d recommend that you choose SUSE Linux Enterprise Server 10 SP1 as your Linux of choice.  Yes, it is a bit old.  Hyper-V has kept up to date but OpsMgr has lagged behind a little.

EDIT #1

Microsoft added support for running RHEL with integration components with the version 2 release of the IC’s for Hyper-V.