Webcast: Understanding The Virtual Machine Servicing Tool

This is a webcast for the System Center Influencers.  I’ll do my best to blog as it goes along.  It follows the recent beta release of VMST 3.0.  This is the release I’ve been waiting for.  Prior to this, it really only handled VMs stored in an offline state in the library.  But now there is patching for:

  • Offline virtual machines in a SCVMM library
  • Stopped and saved state virtual machines on a host
  • Virtual machine templates
  • Offline virtual hard disks in a SCVMM library by injecting update packages (DISM)
  • Automated patching of Windows Server 2008 R2 failover cluster hosts running Hyper-V (using Live Migration for zero VM downtime)

Now that’s what I’m talking about!!! That last one (cluster host patching) is a biggie!  We’re also very slowly moving towards some of the cool template patching functionality that is in VMM v.Next.

The Challenges:

  • Dormant VMs miss Patch Tuesday.
  • When they wake up they are non-compliant and vulnerable to network threats.
  • Patching without VMST is a manual process which is a waste of effort.

OVMST 2.1

  • Works with stored VMs in the VMM library
  • Patches via WSUS & ConfigMgr with VMM
  • Move VM to maintenance host, start VM, patch it, shut it down, move it back to the library.
  • Uses VMM PowerShell cmdlets.
  • Supports Hyper-V and Virtual Server 2005 R2 SP1

VMST 3.0 Beta

Note that it is no longer called the “Offline …” tool.  The new features listed at the start of this post explain why.

The offline VM process works as usual, by moving it onto a maintenance host, starting, patching, shutting down and restoring it to the library.

Demo of Configuration and Offline Servicing

We see a VMM library with offline VMs and template VHDs.  There are 2 hosts.  Some VMs are stopped, some are in saved state.  One host is labelled as being a maintenance host.  The VMST GUI is the usual System Center MMC “wunderbar” GUI.  The VMM server is selected, along with ConfigMgr and/or WSUS.  The maintenance host is selected in the wizard.  Credentials for servicing offline VHDs are entered.  Timeouts for copies and updates are also entered (be careful with service pack updates, which can be VERY time consuming – a lesson learned from the SMS updating process back in 2005).

You can create groups of VHDs, of VMs in the library, of VMs in template groups, and of VMs in host groups.  You then create a servicing job for selected VMs from the group(s).  You can also specify whether the VM should use its own configured virtual network or a selected VLAN (maintenance network).  A schedule is entered for the job, e.g. now, later or on a recurring basis.  You can track the job’s progress in VMST or in VMM.

Servicing Stopped VMs on a Host

The VM is moved from the production host to a maintenance host.  Here it is started and patched.  The VM is shut down and returned to the original host.  The configuration is pretty similar, just using a “stopped VM group” instead.  You can include VMs with a saved state, but these VMs will lose their saved state.  This is because the VM is powered (woken) up and then powered down.
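To make the sequence concrete, here is a minimal Python sketch that models one servicing cycle for a stopped or saved-state VM (the library case is the same loop with “library” as the origin). It is a hypothetical model, not VMST’s own code or API (VMST drives the VMM PowerShell cmdlets); the `VM` class, the `service_vm()` function and the host names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VM:
    """Hypothetical stand-in for a VM record; not a VMM object."""
    name: str
    host: str            # where the VM normally lives (a host or "library")
    state: str           # "stopped" or "saved"
    patched: bool = False
    log: List[str] = field(default_factory=list)


def service_vm(vm: VM, maintenance_host: str) -> VM:
    """Model one servicing cycle: move, start, patch, shut down, return."""
    origin = vm.host
    had_saved_state = vm.state == "saved"

    vm.host = maintenance_host
    vm.log.append(f"moved {origin} -> {maintenance_host}")
    vm.state = "running"
    vm.log.append("started")
    vm.patched = True  # the real patching is done by WSUS/ConfigMgr
    vm.log.append("patched")
    vm.state = "stopped"
    vm.log.append("shut down")
    if had_saved_state:
        # Powering the VM up and then down means the old saved state is gone.
        vm.log.append("saved state lost")
    vm.host = origin
    vm.log.append(f"returned to {origin}")
    return vm


if __name__ == "__main__":
    vm = service_vm(VM("web01", "host1", "saved"), "maint01")
    print(vm.host, vm.state, vm.log)
```

A VM that starts out in saved state comes back to `host1` stopped, which is exactly the trade-off described for saved-state VMs.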

Patching Virtual Machine Templates

These are files stored in the VMM library along with metadata in the VMM SQL database.  Patching these requires a different method.  VMST creates a “gold VM” from the template and maintains a mapping to it.  The gold VM is started on the maintenance host and updated.  The gold VM is then cloned (it is not moved, and it is not turned into a new template).  The cloned VM is sysprepped and replaces the template VHD.  The gold VM is left in place for the next patching cycle.

In the demo, you can select a pre-existing VM, deployed from the template, that you are going to maintain.  This means you need to deploy one VM from each template you keep in the library.  You can choose to back up the template in the library (one version only per template), just in case the patching breaks the template.
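The gold VM workflow can be sketched the same way.  Again this is a hypothetical Python model, not VMST’s implementation: the `Template` and `GoldVM` classes, the clone naming scheme and the integer `patch_level` are assumptions used only to show the order of the steps and the one-backup-per-template rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class GoldVM:
    name: str
    patch_level: int  # hypothetical counter standing in for "installed updates"


@dataclass
class Template:
    name: str
    vhd: str
    backup_vhd: Optional[str] = None  # only one backup version is kept


def service_template(template: Template, gold_map: Dict[str, GoldVM],
                     new_patch_level: int, keep_backup: bool = True) -> List[str]:
    """Model template servicing through a gold VM (hypothetical)."""
    steps: List[str] = []
    gold = gold_map[template.name]  # template -> gold VM mapping
    steps.append(f"start {gold.name} on maintenance host")
    gold.patch_level = new_patch_level
    steps.append(f"update {gold.name}")
    clone_vhd = f"{gold.name}-p{new_patch_level}.vhd"
    steps.append(f"clone {gold.name} to {clone_vhd}")
    steps.append(f"sysprep {clone_vhd}")
    if keep_backup:
        template.backup_vhd = template.vhd  # overwrites any older backup
        steps.append(f"back up {template.vhd}")
    template.vhd = clone_vhd
    steps.append(f"replace template VHD with {clone_vhd}")
    # The gold VM is not deleted; it is reused for the next patch cycle.
    return steps


if __name__ == "__main__":
    t = Template("W2008R2", "w2008r2.vhd")
    gold = {"W2008R2": GoldVM("gold-w2008r2", 0)}
    for step in service_template(t, gold, 1):
        print(step)
```

Running it twice shows why only one backup exists: the second run overwrites `backup_vhd` with the output of the first.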

Patching Offline (not template) VHDs

The VHD can be mounted using Diskpart on a maintenance host (not necessarily Hyper-V; W7 or W2008 R2 will do) and DISM is used to inject the update packages into the VHD.
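A minimal sketch of that offline VHD path, assuming a W7 or W2008 R2 machine where the attached VHD’s Windows volume shows up as drive V:.  The Python below only builds the Diskpart script text (run it with `diskpart /s <file>`) and the DISM `/Add-Package` command lines; the file paths, the drive letter and the .cab package name are hypothetical, and this is not necessarily how VMST itself invokes the tools.

```python
from typing import List


def diskpart_script(vhd_path: str, action: str) -> str:
    """Build a Diskpart script that attaches or detaches a VHD."""
    if action not in ("attach", "detach"):
        raise ValueError("action must be 'attach' or 'detach'")
    return f'select vdisk file="{vhd_path}"\n{action} vdisk\n'


def dism_add_package_commands(image_root: str, packages: List[str]) -> List[List[str]]:
    """One DISM /Add-Package call per update package for an offline image."""
    return [
        ["dism", f"/Image:{image_root}", "/Add-Package", f"/PackagePath:{pkg}"]
        for pkg in packages
    ]


if __name__ == "__main__":
    vhd = r"C:\VHDs\web01.vhd"                 # hypothetical VHD path
    print(diskpart_script(vhd, "attach"))      # save, then: diskpart /s attach.txt
    for cmd in dism_add_package_commands("V:\\", [r"C:\Updates\update.cab"]):
        print(" ".join(cmd))
    print(diskpart_script(vhd, "detach"))      # detach once DISM has finished
```

Detaching the VHD afterwards matters: a VHD left attached to the maintenance host cannot be used by a VM.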

Patching W2008 R2 Clustered Hyper-V Hosts

The hosts must be W2008 R2 and must be clustered.  VMST puts a host into VMM maintenance mode, which Live Migrates its VMs to another host.  It then patches the host and takes it out of VMM maintenance mode.  The process repeats for each node in the cluster.

There is no integration with OpsMgr, so you’ll need to schedule maintenance mode there yourself for all of the hosts in the cluster to prevent all sorts of nasty alerts.
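The rolling pattern is easy to get wrong if you script it yourself, so here is a minimal Python simulation of it.  It assumes at least two nodes and a made-up `placement` dictionary of node name to VM names.  VMM decides the real Live Migration targets; this sketch simply spreads VMs round-robin over the other nodes.  The property that matters is that only one node is in maintenance mode at a time and no VM is ever left without a host.

```python
from typing import Dict, List


def rolling_patch(placement: Dict[str, List[str]]) -> List[str]:
    """Patch cluster nodes one at a time, live migrating VMs off each first.

    `placement` maps node name -> VM names and is updated in place.
    """
    nodes = list(placement)
    if len(nodes) < 2:
        raise ValueError("live migration needs at least two nodes")
    log: List[str] = []
    for node in nodes:
        log.append(f"{node}: enter maintenance mode")
        targets = [n for n in nodes if n != node]
        for i, vm in enumerate(placement[node]):
            target = targets[i % len(targets)]  # naive round-robin placement
            placement[target].append(vm)
            log.append(f"{node}: live migrate {vm} -> {target}")
        placement[node] = []
        log.append(f"{node}: patch and reboot")
        log.append(f"{node}: exit maintenance mode")
    return log


if __name__ == "__main__":
    cluster = {"node1": ["vm1", "vm2"], "node2": ["vm3"], "node3": []}
    for entry in rolling_patch(cluster):
        print(entry)
    print(cluster)
```

This sketch ignores host capacity and does not rebalance VMs afterwards; they stay wherever the last migration put them.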

Summary

This was a good presentation – very demo focused, which I like.  The product is now at a point where I think all VMM users should implement it.

MS10-015 / KB977165 Causing BSOD For Some – How To Deal With The Issue

I came home tonight to see reports of blue screens of death and failed boots/reboots on XP machines that had installed the MS10-015 security patch or hotfix.  There is a long thread on Microsoft Answers.

After installing the patch you have to reboot.  You are then greeted with:

PAGE_FAULT_IN_NONPAGED_AREA

Technical Information:
STOP: 0x00000050 (0x80097004, 0x00000001, 0x80515103, 0x00000000).

The fix posted by Microsoft in the thread is:

  1. Boot from your Windows XP CD or DVD and start the recovery console (see this Microsoft article for help with this step).  Once you are in the Repair Screen, continue with the next step.
  2. Type this command: CHDIR $NtUninstallKB977165$\spuninst
  3. Type this command: BATCH spuninst.txt
  4. When complete, type this command: exit

There is what appears to be some misinformation or hysteria about this.  For example:

  • Some news articles are claiming that Windows 2003 and Vista are reported in this thread as being affected.  I saw no mention of those operating systems.
  • I saw one article (a random find) that tried to make it look like this affected Hyper-V.  Pah!  It does not, from everything I have read.  There are no reports of issues with Windows Server 2008 or Windows Server 2008 R2.  Put the Kool-Aid down and step away from the cup.

I cannot claim there are no problems with those systems, but they are not reported in that thread.

EDIT: Overnight, after this blog post was originally written, some people did post about Vista and W2003 suffering blue screens caused by the update.

It is bad that a patch has affected many.  I’m sure MS will be making someone feel very uncomfortable overnight about this.  It’s bad that it happened at all.  But let’s face it: not everyone is affected.  Some combination of factors is contributing to the blue screen, some scenario that MS didn’t test or couldn’t predict.  These things happen.  It could be some niche piece of software or driver that reacts badly to the patch.

EDIT: I’ve read on one site that some people are finding that the ATAPI.SYS file does not look like the genuine file supplied by MS.  They suspect that an old malware infection causes an incompatibility with the fix!!!

This situation (whatever the actual cause of the blue screens) is why I think the approach preached by people like Steve Reilly, that we should all push out security updates immediately and without question, is wrong for me (maybe not wrong for you).  How many zero-day exploits have there ever been?  Not many.  Think of the big bad attacks … Nimda, SQL Slammer, MS Blaster, Conficker.  They all attacked vulnerabilities that were fixed with patches long beforehand.  What’s a couple of weeks?  It’s because of the rare occasion when a patch goes wrong that I run a three-phase process for patches.

I have three groups in WSUS.  I configure my Windows Update agents either via group policy (AD members) or registry edits (.REG files for workgroup members) to be members of 1 of 3 groups:

  1. Testing – contains VM’s with various blends of OS and application
  2. Management – Our production AD, management systems, and online applications
  3. Hosting – Hosted customer servers
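For the workgroup machines, the .reg file sets the WSUS client-side targeting policy values under HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate (the same values Group Policy writes for the AD members).  The Python below is a minimal sketch that generates one such file per group; the server URL `http://wsus01:8530` is hypothetical, scheduling values such as AUOptions are deliberately left out, and client-side targeting only takes effect if the WSUS server is set to use Group Policy or registry settings on computers.

```python
GROUPS = ("Testing", "Management", "Hosting")
WU_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate"


def wsus_targeting_reg(server_url: str, group: str) -> str:
    """Return .reg file text that points a client at WSUS and sets its group."""
    if group not in GROUPS:
        raise ValueError(f"unknown WSUS group: {group}")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{WU_KEY}]",
        f'"WUServer"="{server_url}"',
        f'"WUStatusServer"="{server_url}"',
        '"TargetGroupEnabled"=dword:00000001',
        f'"TargetGroup"="{group}"',
        "",
        f"[{WU_KEY}\\AU]",
        '"UseWUServer"=dword:00000001',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Hypothetical WSUS URL; save each result with encoding="utf-16"
    # and import it with regedit or "reg import <file>".
    for group in GROUPS:
        print(wsus_targeting_reg("http://wsus01:8530", group))
```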

We’re a hosting company.  WSUS has an automatic approval policy for the Testing group.  The machines in that group are VMs on my Hyper-V lab server.  They patch in the late morning/early afternoon (around lunch) so we can see how they react.

Ideally that group would contain samples of the various bits of hardware you have on the network so that drivers are included in the mix.  I was lucky enough to be able to do that with one employer in the past – but we did push out updates in less than a week from release.  However, I need to be cost-conscious and that is not an option now.

When we’re happy we sit and watch the news.  If all is well, change control happens, and then we approve the updates for the management network.  Stealing a line from Microsoft, we eat our own dog food.  Over the 3 nights of the following weekend (Friday, Saturday, Sunday), machines are patched and reboot automatically.  Some services are clustered/replicated and we do them on different nights or time slots.  We have scheduled scripts on the OpsMgr RMS to put machines into maintenance mode.

Now we watch how that went and continue to watch the news wires.  If there are no more problems, we approve the updates for the hosting customers after another change control process.  Patches then deploy according to their pre-agreed time windows.

The end result is that within 2-3 weeks all security updates are deployed.  You could compress this down to a week.  We are totally minimizing the risk of being stung by a “bad” update.  Like I said earlier, MS probably did test the update as far as is realistically possible.  There is always the chance that something bad happens.
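The three-phase process boils down to a simple gate: an update only moves to the next WSUS group once nothing bad has been seen and, after Testing, once change control has signed off.  Here is a hypothetical Python model of that rule; in real life the approvals are clicks (or the automatic approval rule) in the WSUS console, and the `Update` class and `promote()` function are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

RINGS = ["Testing", "Management", "Hosting"]  # the three WSUS groups, in order


@dataclass
class Update:
    kb: str
    approved: List[str] = field(default_factory=list)
    problems_reported: bool = False  # from our own machines or the news wires


def promote(update: Update, change_control_ok: bool) -> Optional[str]:
    """Approve the update for the next group if the gates allow it."""
    pending = [ring for ring in RINGS if ring not in update.approved]
    if not pending or update.problems_reported:
        return None
    ring = pending[0]
    # Testing is covered by an automatic approval rule; the later groups
    # each need their own change control.
    if ring != "Testing" and not change_control_ok:
        return None
    update.approved.append(ring)
    return ring


if __name__ == "__main__":
    u = Update("KB977165")
    print(promote(u, change_control_ok=False))  # Testing: auto-approved
    print(promote(u, change_control_ok=True))   # Management
    u.problems_reported = True
    print(promote(u, change_control_ok=True))   # blocked before Hosting
```

Flagging a problem at any point stops the rollout where it is, which is the whole point of staging.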

Steve Reilly’s argument was that if you get a bad update then you can easily roll back your server farm because it’s probably 90% virtual.  In my opinion you shouldn’t really use snapshots in production on Hyper-V.  They’re supported but they suck the life from your VMs.  DPM or 3rd party solutions that use the Hyper-V VSS writer are cool for this.  But really, do you want to risk your production network going down for hours while you recover (starting at 3am when your patch failed) because of the rush to deploy an update that will likely not have an attack vector for quite some time?

Weigh the various risks and make an informed decision for yourself.  Maybe Steve Reilly’s approach of pushing out updates without testing is right for you.  Maybe my phased and cautious approach is.  Maybe there is a middle ground that you prefer.  Do the research, be sure you know why you are making your decision, and make sure it is based on fact.

EDIT:

There is strong suspicion that the BSODs are actually happening on machines that were already infected by a rootkit called TDSS.  It infects ATAPI.SYS, and replacing that file appears to fix the BSOD issue as well.  Microsoft Security Essentials appears to be able to detect it.


WSUS: The Update Could Not Be Found

I was re-installing the WSUS role on our security server (W2008 R2) today and hit this error as soon as the installation started:

“The update could not be found”.

It’s a bit of a weird one for a role installation.  I hadn’t the foggiest so I did a quick search and found the solution:

  • Delete the “WindowsUpdate” key from the registry at HKLM\Software\Policies\Microsoft\Windows.  I’d recommend you export this to a .reg file first to be safe.
  • Restart the Windows Update service.
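The same two steps, plus the recommended backup, can be expressed as standard Windows command lines: `reg export`, `reg delete`, and `net stop`/`net start` on the `wuauserv` (Windows Update) service.  This is a minimal Python sketch that builds the commands and only executes them on Windows when given a `--run` flag; the backup file path is hypothetical, and it needs an elevated prompt.

```python
import subprocess
import sys
from typing import List

WU_POLICY_KEY = r"HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate"


def wsus_fix_commands(backup_file: str) -> List[List[str]]:
    """Back up and delete the Windows Update policy key, then restart the service."""
    return [
        ["reg", "export", WU_POLICY_KEY, backup_file, "/y"],  # /y: overwrite old backup
        ["reg", "delete", WU_POLICY_KEY, "/f"],               # /f: no confirmation prompt
        ["net", "stop", "wuauserv"],
        ["net", "start", "wuauserv"],
    ]


if __name__ == "__main__":
    for cmd in wsus_fix_commands(r"C:\Temp\WindowsUpdate-policy.reg"):
        print(" ".join(cmd))
        if sys.platform == "win32" and "--run" in sys.argv:
            # Stopping an already-stopped service returns an error, so don't fail on it.
            subprocess.run(cmd, check=cmd[:2] != ["net", "stop"])
```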

Now you can go ahead and install WSUS.

The problem and fix apply to previous versions of Windows too.  The installer checks Windows Update and finds a circular reference: you’ve uninstalled WSUS from the server, but the server is still configured to get its updates from itself.  How can it?  Make sure you do the install before Group Policy applies those settings again during an automatic refresh.
