Azure In-Place VM Migration Eliminates Reboots During Host Maintenance

Microsoft is finally making updates to Azure to reduce downtime for virtual machines when a host is rebooted.

Microsoft sent out the following announcement via the regular pricing and features update email to customers last night:

[Image: Microsoft’s announcement email about in-place VM migration]

That sounds like Quick Migration. So Azure has caught up with Windows Server 2008 Hyper-V ;) And it sounds like later in 2016, we’ll get Live Migration … yay … Windows Server 2008 R2 Hyper-V :P

Seriously, though, Azure was never designed for the kinds of high availability that we put into an on-premises Hyper-V cluster. Azure is cloud scale, with over 1 million physical hosts; a single Azure cluster alone has around 1,000 hosts! When you build at that scale, HA is done in a different way. You encourage customers to design for an army of ants … lots of small deployments where HA is achieved through software design that leverages cloud fabric features, rather than through hardware. But when you have customers (from small to huge) with lots of legacy applications (e.g. a file server) that cannot be clustered in Azure without redesign/re-deployment/expense, you start losing customers.

So Microsoft needed to make changes that acknowledge that many customer workloads are not cloud ready … and, to be honest, in most of the prospects I’ve encountered where code was being written, the developers weren’t cloud ready either – they stick to the one-DB-server-and-one-web-server model that has plagued businesses since the 1990s.

These improvements are great news … and they’re just the tip of last night’s very big and busy iceberg.

KB3172614 To Replace/Fix Hyper-V Installations Broken By KB3161606

Microsoft released a new update rollup to replace the very broken and costly (our time = our money) June rollup, KB3161606, whose issues affected Hyper-V on Windows 8.1 and Windows Server 2012 R2 (WS2012 R2).

It’s sad that I have to write this post but, unfortunately, untested updates are still being released by Microsoft. This is why I advise that updates be delayed by 2 months.

In the case of the issues in the June 2016 update rollup, the fixes are going to require human effort … customers’ human effort … and that means customers are paying for issues caused by a supplier. I’ll let you judge what you think of that (feel free to comment below).

A month after news of the issues became known (the update rollup had already been in the wild for a week or two at that point), Microsoft has issued a superseding update that will fix the issues. At the same time, they have finally publicly acknowledged the issues in the June update:

[Image: Microsoft’s public acknowledgement of the KB3161606 issues]

So it took 1.5 months, from the initial release, for Microsoft to get this update right. That’s why I advise a 2 month delay on approving/deploying updates, and I continue to do so.

What Does Microsoft Need To Fix?

  • Change the way updates are created/packaged. This problem has been going on for years. Support is not good at this stuff, and the work needs to move into the product groups.
  • Microsoft has successfully reacted to market pressure before by placing a special emphasis on change, e.g. The Internet, secure coding, The Cloud. Satya Nadella needs to do the same for quality assurance (QA), something that I learned in software engineering classes is as important as the code. I get that edge scenarios are hard to test, but installing/upgrading ICs in a Hyper-V guest OS is hardly a rare situation.
  • Start communicating. Put your hands up publicly, and say “mea culpa”, show what went wrong and follow it up with progress reports on the fix.


Don’t Deploy KB3161606 To Hyper-V Hosts, VMs, or SOFS

Numerous sources have reported that KB3161606, an update rollup for Windows RT 8.1, Windows 8.1, and Windows Server 2012 R2 (WS2012 R2), is breaking the upgrade of Hyper-V VM integration components. This has been confirmed, and Microsoft is aware of the situation.

As noted in the many comments below, Microsoft eventually released a superseding update to resolve these issues.

The scenario is:

  1. You deploy the update to your hosts, which upgrades the ISO for the Hyper-V ICs.
  2. You deploy the update to your VMs, because it contains many Windows updates, not just the ICs.
  3. You attempt to upgrade the ICs in your VMs to stay current. The upgrade will fail.

Note that if you upgrade the ICs before deploying the update rollup inside of the VM, then the upgrade works.

My advice is the same as it has been for a while now. If you have the means to manage updates, then do not approve them for 2 months (I used to say 1 month, but System Center Service Manager decided to cause havoc a little while ago). Let someone else be the tester that gets burned and fired.
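If WSUS is your patching mechanism, you can go a step beyond not approving the rollup and decline it outright. A minimal sketch, assuming the UpdateServices PowerShell module on the WSUS box, and assuming the KB number appears in the update’s title (that property path is my assumption about the object shape; check before running):

    # Run on the WSUS server; Get-WsusUpdate targets the local server by default.
    Import-Module UpdateServices

    Get-WsusUpdate -Approval AnyExceptDeclined |
        Where-Object { $_.Update.Title -match '3161606' } |  # KB number assumed to be in the title
        Deny-WsusUpdate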

Here’s hoping that Microsoft re-releases the update in a way that doesn’t require uninstalls. Those who have done the deployment already in their VMs won’t want another painful maintenance window that requires uninstall-reboot-install-reboot across all of their VMs.

EDIT (6/7/2016)

Microsoft is working on a fix for the Hyper-V IC issue. After multiple reports of issues on scale-out file servers (SOFS), it’s become clear that you should not install KB3161606 on SOFS clusters either.

Windows 10 Being Pushed Out To Domain-Joined PCs

Brad Sams (my boss at Petri.com) published a story last night about how Microsoft has started to push out Windows 10 upgrades to domain-joined PCs.

Note that the PC doesn’t silently upgrade via Windows Update; the user is prompted whether they want to upgrade, and then a deliberately confusing screen “encourages” them to proceed.

Brad notes that the environment must meet certain requirements:

  • The machine must be running and licensed for Windows 7 Pro or Windows 8.1 Pro (Enterprise doesn’t do this stuff).
  • There is no WSUS, ConfigMgr, etc – the machine gets updates directly from MSFT – this means smaller businesses for the most part.
  • The machine must be a domain member.

As you can see, this affects SMEs with a domain (no WSUS, etc). But I’d be surprised if larger businesses weren’t targeted at a later point in order to help MSFT hit their 1 billion devices goal.

In my opinion, this decision to push upgrades to businesses is exactly the sort of action that gives Microsoft such a bad name with customers. Most SMEs won’t know this is coming. A lot of SMEs run systems that need to be tested or upgraded, or that won’t support or work on newer operating systems. So Microsoft opting to force change and uncertainty on the businesses that are least ready is downright dumb. Brad reports that Microsoft claims that people asked for this upgrade. Right – fine – let those businesses opt into an upgrade via GPO instead of the other way around. Speaking of which …

There is a blocker process. I work in a small business, and I’ve deployed the blocker. An update added new GPO options to our domain controllers, and I enabled the GPO that blocks OS upgrades via Windows Update:

[Image: the Group Policy setting that blocks the Windows 10 upgrade]

As you can see, I’ve deployed this at work. We will upgrade to Windows 10 (it’s already started), but we will continue to do it at our own pace, because we cannot afford to have people offline for 2 hours during the work day while Windows upgrades.
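For reference, a minimal sketch of what I understand the GPO to be doing under the hood – Microsoft documents a DisableOSUpgrade registry value for blocking the upgrade offer. In production you would set this via the GPO, not per machine:

    # Block the "Upgrade to Windows 10" offer via Windows Update on one PC.
    $key = 'HKLM:\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate'
    if (-not (Test-Path $key)) { New-Item -Path $key -Force | Out-Null }
    Set-ItemProperty -Path $key -Name 'DisableOSUpgrade' -Value 1 -Type DWord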

Driver Updates By Windows Update Are Ruining Windows 10 For Me

In previous posts I talked about how Windows Update was breaking the Intel HD graphics adapters in my Lenovo Yoga and Toshiba KIRAbook Ultrabooks, and I also posted a solution that should prevent Windows Update from downloading drivers. Well … nothing has worked, and I regularly face broken graphics drivers on my Ultrabooks.

The only solution that I have to solve the issue is:

  • Uninstall the device in Device Manager.
  • Refresh (scan for hardware changes).
  • Manually install a driver that I downloaded from Intel – I keep this driver on hand for regularly carrying out this process (see the sketch below).
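Parts of this can be scripted from an elevated prompt with pnputil, which ships in-box – a rough sketch only, because pnputil manages the driver store rather than the device instance, so Device Manager may still be needed for the uninstall. The oemNN.inf number and the Intel .inf path below are hypothetical; substitute your own:

    # List staged driver packages; note the oemNN.inf of the broken Intel display driver.
    pnputil.exe -e

    # Remove the staged broken package (number taken from the -e output above).
    pnputil.exe -d oem42.inf

    # Stage and install the known-good driver downloaded from Intel.
    pnputil.exe -i -a 'C:\Drivers\IntelHD\igdlh64.inf'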

I’ve found that Windows Update can silently install the faulty updated driver in the middle of a presentation, and suddenly I am no longer sharing my display with the projector/screen – an interesting problem that requires 5-10 minutes of fixing.

Some folks have suggested that I use the solution in KB3073930, How to temporarily prevent a Windows or driver update from reinstalling in Windows 10. I did, and that worked for 5 days, until Microsoft shipped replacement versions of the driver, the block rule lapsed, and I was back to Square One.

This is the only issue I’m having with Windows 10 … but it is absolutely driving me nuts.

It’s no wonder that Samsung felt like they had to block all Microsoft updates to give customers a stable Windows experience. Please Microsoft, stop shipping frakked up drivers, or give me actual control over these updates on Windows 10, not just the illusion of it!!!

Let me be very clear: the only source of driver updates should be from the PC manufacturer. Microsoft has always sucked at this, and their new “we know best” model with Windows 10 shows how out of touch they are with this subject.

KB2793908 – Leaving WS2012 Server Manager Open Can Cause A Memory Leak

I found this KB article, which is part of a much larger cumulative update (KB2811660) for Windows 8 and Windows Server 2012.

When you leave Server Manager running in Windows Server 2012, a memory leak occurs in the Wmmimgmt.exe process.

This one is annoying because Server Manager opens by default. There’s a tip here to stop that from happening. I haven’t tried the 2008/R2 registry hack yet: go to HKLM\Software\Microsoft\ServerManager and set the value "DoNotOpenServerManagerAtLogon" to 1. There is also a GPO option.
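The same hack as a quick script, if you want to push it out – this just applies the registry tweak described above, so test it before rolling it out widely:

    # Stop Server Manager from opening at logon.
    $key = 'HKLM:\SOFTWARE\Microsoft\ServerManager'
    if (-not (Test-Path $key)) { New-Item -Path $key -Force | Out-Null }
    Set-ItemProperty -Path $key -Name 'DoNotOpenServerManagerAtLogon' -Value 1 -Type DWord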

You can get KB2811660 via Windows Update.

KB2734608: Enable WSUS 3.0 SP2 To Support Windows Server 2012 And Windows 8

Microsoft has released an update for WSUS 3.0 SP2 that enables Windows Server Update Services to provide updates for Windows Server 2012 and Windows 8.  It is available as an x86 and x64 download.

According to the Microsoft SUS blog, this update will fix:

This update lets servers that are running Windows Server Update Services (WSUS) 3.0 SP2 provide updates to computers that are running Windows 8 or Windows Server 2012.

This update fixes the following issues:

  • Installation of update 2720211 may fail if Service Pack 2 was previously uninstalled and then reinstalled.
  • After you install update 2720211, health monitoring may fail if the WSUS server is configured to use SSL.

Additionally, this update includes the following fixes:

  • 2530678 System Center Update Publisher does not publish customized updates to a computer if WSUS 3.0 SP2 and the .NET Framework 4 are installed
  • 2530709 "Metadata only" updates cannot be expired or revised in WSUS 3.0 SP2
  • 2720211 An update for Windows Server Update Services 3.0 Service Pack 2 is available

Patching A Windows Server 2012 Failover Cluster, Including Hyper-V

Cluster Aware Updating (CAU) is a new feature that makes running Windows Update or Automatic Updates on any WS2012 cluster, including a Hyper-V cluster, easier than ever.

If you currently have a Windows Server 2008/R2 Hyper-V cluster, then you have a few options for patching it with no VM downtime:

  • Manually Live Migrate VM workloads (Maintenance Mode in VMM 2008 R2 makes this easier), then patch and reboot each host in turn – a time-consuming manual task.
  • Use System Center Opalis/Orchestrator to run a runbook against each cluster node in turn that drains the node of its roles (VMs), patches it, and reboots it.
  • Use the patching feature of System Center 2012 Virtual Machine Manager – which is limited to Hyper-V clusters and adds more management to your patching process.

CAU is actually pretty simple:

  1. Have some patching mechanism configured: e.g. enable Automatic Updates on the cluster nodes (e.g. Hyper-V hosts), or approve updates in WSUS/ConfigMgr/etc.  Make sure that you exempt your cluster nodes from automatic installation/rebooting in your patching policy; CAU will do this work.
  2. Log into Failover Clustering from a machine that is not a member of the cluster (i.e. not one of the Hyper-V hosts).  Run the CAU wizard.
  3. Here, you can either manually kick off a patching job for the cluster nodes or schedule it to run automatically.  The scheduled automatic option requires that you deploy a CAU role on the cluster in question to orchestrate the patching.  Both options can also be driven from PowerShell – see the sketch after this list.
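Here is roughly what those two options look like in PowerShell – a hedged sketch using the ClusterAwareUpdating module; the cluster name is made up, and you’d use one approach or the other, not both:

    # Run from a management machine, not from a cluster node.
    Import-Module ClusterAwareUpdating

    # Option 1: manually kick off a one-off updating run (cluster name is hypothetical).
    Invoke-CauRun -ClusterName 'HVC1' -MaxFailedNodes 0 -RequireAllNodesOnline -Force

    # Option 2: add the CAU clustered role so runs happen on a schedule,
    # e.g. the second Sunday of every month.
    Add-CauClusterRole -ClusterName 'HVC1' -DaysOfWeek Sunday -WeeksOfMonth 2 -Force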

When a patching job runs, the following happens:

  1. Determine the patches to install per node.
  2. Put node 1 into a paused state (maintenance mode).  This drains it of clustered roles – in other words, your Hyper-V VMs will Live Migrate to the “best possible” hosts.  Failover Clustering uses the amount of available RAM to determine the best possible host; VMM’s advantage is that it uses more information to perform Intelligent Placement.
  3. CAU waits, then patches and reboots node 1.
  4. Node 1 is removed from the paused state, enabling it to host roles (VMs) once again.
  5. When node 1 is safely back online, CAU moves on to node 2 to repeat the operation.

VMs are Live Migrated around the cluster as the CAU job runs: each host in turn is paused (automatically Live Migrating its VMs off), patched, rebooted, and un-paused.  It’s a nice, simple operation.
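You can watch a run from the same management machine – again a sketch, with a made-up cluster name:

    Get-CauRun -ClusterName 'HVC1'                      # live progress of the current run
    Get-CauReport -ClusterName 'HVC1' -Last -Detailed   # per-node report of the last run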

The process is actually quite configurable, enabling you to define variables for decisions, execute scripts at different points, and set a reboot timeout (for those monster hosts).
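Those knobs surface as parameters on Invoke-CauRun. A rough example – the script paths are hypothetical placeholders for whatever checks you want to run:

    Invoke-CauRun -ClusterName 'HVC1' `
        -PreUpdateScript 'C:\Scripts\PreChecks.ps1' `
        -PostUpdateScript 'C:\Scripts\PostChecks.ps1' `
        -RebootTimeoutMinutes 60 `
        -Force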

Something to think about is how long it will take to drain a host of VMs.  A 1 GbE Live Migration network will take an eternity to LM (or vMotion, for that matter) 192 GB RAM of VMs, even with concurrent LMs (as we have in Windows Server 2012).
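Back-of-the-envelope (my numbers, not an official figure): 1 GbE delivers roughly 110-115 MB/s of real throughput, and 192 GB is about 196,608 MB, so a full drain is in the region of 196,608 ÷ 115 ≈ 1,700 seconds – nearly half an hour – before you even count the re-copy passes for memory pages that the VMs dirty mid-migration.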

Sounds nice, eh?  How about you see it in action:

[Video: Cluster Aware Updating patching a WS2012 Hyper-V cluster]

I have edited the video to clip out lots of waiting:

  • These were physical nodes (Hyper-V hosts), and a server’s POST takes forever.
  • CAU is pretty careful, and seems to deliberately wait for a while when a server changes state before continuing with the task sequence.


KB2568088 – Hyper-V VM Won’t Start on AMD CPU with AVX

I just noticed a new patch was added to the list on the TechNet wiki for Hyper-V on Windows Server 2008 R2 Service Pack 1 (W2008 R2 SP1).  There are 2 scenarios:

Issue 1

  • You have an AMD CPU that supports the Advanced Vector Extensions (AVX) feature on a computer that is running Windows Server 2008 R2 RTM.
    Note AMD introduced support for the AVX feature in Bulldozer-based multicore processors.
  • You install the Hyper-V server role on the computer.
  • You create a virtual machine on the computer, and then you try to start the virtual machine.

In this scenario, the virtual machine does not start, and you receive an error message that resembles the following:

<Virtual machine name> could not initialize.

This issue occurs because Windows Server 2008 R2 RTM does not support the AVX feature.

Issue 2

  • You have an AMD CPU that supports the AVX feature on a computer that is running Windows Server 2008 R2 Service Pack 1 (SP1). 
    Note AMD introduced support for the AVX feature in Bulldozer-based multicore processors.
  • You install the Hyper-V server role on the computer.
  • You create a virtual machine on the computer, and then you try to start the virtual machine.

In this scenario, the virtual machine does not start, and you receive the following error message:

Virtual machine could not start because the hypervisor is not running.

Additionally, the following event is added to the Microsoft-Windows-Hyper-V-Worker-Admin log:

Source: Microsoft-Windows-Hyper-V-Worker
Event ID: 3112
Level: Error
Description:
The virtual machine could not be started because the hypervisor is not running

This issue occurs because Windows Server 2008 R2 SP1 does not support the AVX feature on AMD processors.

You can download the hotfix to resolve this issue.

Whitepaper: A Guide to Hyper-V Dynamic Memory

I’ve just published a new guide, subtitled “Understanding, enabling, and configuring Windows Server 2008 R2 Hyper-V Dynamic Memory for virtualised workloads”.

This whitepaper will walk you through:

  • The mechanics of Windows Server 2008 R2 SP1 Hyper-V Dynamic Memory
  • The scenarios that you’ll employ it in
  • The pre-requisites for Dynamic Memory
  • Configuring Dynamic Memory
  • Some of the application workload scenarios

“We normally don’t like it when a service pack includes new features. New features mean changes that need to be tested, possible compatibility issues, and more headaches in between the usual operating system deployment cycles. Windows Server 2008 R2 Service Pack 1 came with a number of new features but we did not complain; in fact, we virtualisation engineers had a mini celebration. This is because those new features were mostly targeted at server and desktop/session virtualisation, and aimed to give us a better return on hardware investment.

Dynamic Memory was one of those new features. Put very simply, this VM memory allocation feature allows us to get more virtual machines on to a Hyper-V host without sacrificing performance.

You can use Dynamic Memory in a few scenarios. The one that gets the most publicity is virtual desktop infrastructure (VDI) where economic PCs are replaced by expensive virtual machines running in the data centre. It’s critical to get as many of them on a host as possible to reduce the cost of ownership. Server virtualisation is the scenario that we techies are most concerned with. We’ve typically found that we tend to run out of memory before we get near to the processor or storage I/O limits of our hardware. And the final scenario is where we use Hyper-V to build an Infrastructure-as-a-Service cloud, where elasticity and greater virtual machine density are required.

The approach that Microsoft took with this new memory optimisation technique ensures that concepts such as over commitment are not possible; that’s because over commitment potentially does cause performance issues. Dynamic Memory does require that you understand how it works, how to troubleshoot it, and how applications may be affected, before you log into your hosts and start enabling it. It will require some planning.

The aim of this document is to teach you how Dynamic Memory works, show you how to configure it, how to monitor it, and how to use it in various application scenarios”.

The document continues …

Credit:

Big shout out to the Hyper-V PMs and my fellow MVPs for the many conversations over the past year that allowed us to learn a lot.