Cluster Aware Updating (CAU) is a new feature that makes running Windows Update or Automatic Updates on a Hyper-V cluster, or any other WS2012 cluster, easier than ever.
If you currently have a Windows Server 2008/R2 Hyper-V cluster, then you have a few options for patching it with no VM downtime:
- Manually Live Migrate VM workloads (Maintenance Mode in VMM 2008 R2 makes this easier), patch, and reboot each host in turn, which is a time-consuming manual task.
- Use System Center Opalis/Orchestrator to run a runbook against each cluster node in turn that drains the cluster node of its roles (VMs), patches it, and reboots it.
- Use the patching feature of System Center 2012 Virtual Machine Manager – which is limited to Hyper-V clusters and adds more management to your patching process.
CAU is actually pretty simple:
- Have some patching mechanism configured: e.g. enable Automatic Updates on the cluster nodes (e.g. Hyper-V hosts), approve updates in WSUS/ConfigMgr/etc. Make sure that you exempt your cluster nodes from automatic installation/rebooting in your patching policy; CAU will do this work.
- Log into Failover Clustering from a machine that is not a node (Hyper-V host) in the cluster. Run the CAU wizard.
- Here, you can either manually kick off a patching job for the cluster nodes or schedule it to run automatically. The scheduled automatic option requires that you have deployed a CAU role on the cluster in question to orchestrate the patching.
When a patching job runs, the following will happen:
- Determine the patches to install per node.
- Put Node 1 in a paused state (maintenance mode). This drains it of clustered roles – in other words, your Hyper-V VMs will Live Migrate to the “best possible” hosts. Failover Clustering uses the amount of RAM to determine the best possible host. VMM’s advantage is that it uses more information to perform Intelligent Placement.
- CAU will wait, then patch and reboot Node 1.
- Node 1 is removed from the paused state, enabling it to host roles (VMs) once again.
- When Node 1 is safely back online, CAU will move onto Node 2 to repeat the operation.
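To make the pause, drain, patch, reboot, resume cycle concrete, here is a toy Python model of one pass through a cluster. This is only a sketch of the idea, not how CAU is implemented: the Node class, the drain and run_patching_job functions, and the host/VM names and RAM sizes are all made up for illustration, and placement is reduced to "the unpaused host with the most free RAM wins".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node (Hyper-V host)."""
    name: str
    ram_gb: int                                # total RAM in the host
    vms: dict = field(default_factory=dict)    # VM name -> RAM in GB
    paused: bool = False
    patched: bool = False

    def free_ram(self) -> int:
        return self.ram_gb - sum(self.vms.values())


def drain(node, cluster):
    """Move every VM off `node` to the unpaused node with the most free RAM."""
    for vm, ram in list(node.vms.items()):
        targets = [n for n in cluster
                   if n is not node and not n.paused and n.free_ram() >= ram]
        if not targets:
            raise RuntimeError(f"no host has room for {vm}")
        best = max(targets, key=Node.free_ram)
        best.vms[vm] = node.vms.pop(vm)


def run_patching_job(cluster):
    """Patch one node at a time and return a log of the steps taken."""
    log = []
    for node in cluster:
        node.paused = True                 # maintenance mode
        log.append(f"pause {node.name}")
        drain(node, cluster)               # VMs "Live Migrate" away
        log.append(f"drain {node.name}")
        node.patched = True                # stand-in for install + reboot
        log.append(f"patch+reboot {node.name}")
        node.paused = False                # node may host roles again
        log.append(f"resume {node.name}")
    return log


cluster = [
    Node("host1", 64, {"vm1": 8, "vm2": 16}),
    Node("host2", 64, {"vm3": 8}),
    Node("host3", 64, {"vm4": 4}),
]
log = run_patching_job(cluster)
for step in log:
    print(step)
```

Note that only one node is ever paused at a time, so the rest of the cluster always has capacity to take the VMs, and no VM is ever left on a host that is being rebooted.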
VMs are Live Migrated throughout the cluster as the CAU job runs and each host is put into a paused state (automatically Live Migrating VMs off), patched, rebooted, and un-paused. It’s a nice simple operation.
The process is actually quite configurable, enabling you to define variables for decisions, execute scripts at different points, and define a reboot timeout (for those monster hosts).
Something to think about is how long it will take to drain a host of VMs. A 1 GbE Live Migration network will take an eternity to LM (or vMotion, for that matter) 192 GB of VM RAM, even with concurrent LMs (as we have in Windows Server 2012).
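A quick back-of-the-envelope calculation shows why. The sketch below assumes the link runs at its full theoretical line rate and that every byte of VM memory is copied exactly once. Neither is true in practice (there is protocol overhead, and memory pages that change during a Live Migration have to be copied again), so treat the result as a best case. The drain_seconds function and its efficiency parameter are my own illustration.

```python
def drain_seconds(ram_gb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Best-case time to copy `ram_gb` of VM memory over a link.

    `efficiency` is an assumed fraction of line rate actually achieved
    (1.0 = theoretical maximum).
    """
    gigabits = ram_gb * 8          # 1 byte = 8 bits
    return gigabits / (link_gbps * efficiency)


for link in (1, 10):
    secs = drain_seconds(192, link)
    print(f"{link:>2} GbE: {secs / 60:.1f} minutes")
```

At line rate that is about 25.6 minutes per host on 1 GbE versus about 2.6 minutes on 10 GbE, and that drain time is paid again for every node in the cluster.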
Sounds nice, eh? How about you see it in action:
I have edited the video to clip out lots of waiting:
- These were physical nodes (Hyper-V hosts), and a server’s POST takes forever.
- CAU is pretty careful, and seems to deliberately wait for a while when a server changes state before CAU continues with the task sequence.