KB2913461 – Hyper-V Live Storage Migration Fails When Moving Files to WS2012 CSV

Keep in mind that one of the features of Live Migration is that the original virtual machine and/or files are not removed until after the entire process is complete. This “burn no bridges” approach ensures that your virtual machine remains running no matter what happens during the migration process. I’ve seen this personally during the preview releases of WS2012 and WS2012 R2 when stress testing Live Migration and other features.

Microsoft has published a KB article for when Hyper-V storage migration fails when you try to migrate VHD and configuration files to CSV volumes in Windows Server 2012.

Symptoms

Consider the following scenario:

  • You install the Hyper-V role on a Windows Server 2012-based two-node failover cluster.
  • You have two Cluster Shared Volumes (CSV) volumes.
  • You create a virtual machine on a cluster node. The virtual machine has a single 60-gigabyte (GB) fixed-size virtual hard disk (VHD).
    Note The virtual machine is not created on a CSV volume.
  • On the cluster node, the available space on drive C is less than 20 GB.
  • In the Hyper-V Manager console, you try to move the VHD file to one CSV volume, and you try to move the configuration files to the other CSV volume.
    Note The CSV volumes have enough space to hold the VHD file and the configuration files.

In this scenario, the migration operation fails, and you receive an error message that resembles the following:

Migration did not succeed. Not enough disk space at ''.

Note This issue still occurs after you install hotfix 2844296. For more information about hotfix 2844296, click the following article number to view the article in the Microsoft Knowledge Base:

2844296 (http://support.microsoft.com/kb/2844296/) Shared Nothing Live Migration fails when you try to migrate a virtual machine to a destination server in Windows Server 2012

Cause

This issue occurs because the target CSV volumes are incorrectly identified as being a system drive volume instead of multiple separate CSV volumes.

A hotfix is available to resolve this issue.

KB2901237 – Hyper-V Replica Is Created Unexpectedly After Restarting VMMS On Replica WS2012 Host

The Virtual Machine Management Service (VMMS) runs in user mode in the management OS of every Hyper-V host. It has nothing to do with SCVMM; that’s just an unfortunate similarity in names. The VMMS provides the WMI or management interface to Hyper-V for all management tools, such as PowerShell, Hyper-V Manager, or Failover Cluster Manager.

Microsoft published a KB article for when a standard Hyper-V replica is created unexpectedly after you restart the VMMS service in Windows Server 2012.

Symptoms

Consider the following scenario:

  • You have Windows Server 2012-based Hyper-V servers that are running in an environment that has Hyper-V Replica deployed.
  • You set more than two recovery points. 
  • You restart the VMMS service on a replica server, or you restart the replica server.
  • You wait about 5 minutes until the first delta arrives from the primary site.

In this scenario, a standard replica (recovery point) is created unexpectedly. 
Note If the time interval between the latest recovery point and the arrival of the delta is less than 60 minutes, a standard replica should not be created.

Cause

This issue occurs because the VMMS service incorrectly compares the time stamp of the earliest recovery point to the latest delta time stamp. Therefore, the system takes a new snapshot every time the VMMS service is restarted.

A hotfix has been published to resolve this issue. It’s not an issue I’d expect to see too often but the fix is there.
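To make the cause easier to picture, here is a minimal sketch of the two comparisons. It assumes the 60-minute threshold from the KB note above; the function names and the list-of-timestamps model are hypothetical and are not how VMMS is actually implemented.

```python
from datetime import datetime, timedelta

# Assumption: the 60-minute threshold is taken from the KB note; everything
# else here (names, data model) is illustrative only.
RECOVERY_POINT_INTERVAL = timedelta(minutes=60)


def should_create_recovery_point(recovery_points, delta_arrival):
    """Intended check: compare the delta against the LATEST recovery point."""
    latest = max(recovery_points)
    return delta_arrival - latest >= RECOVERY_POINT_INTERVAL


def buggy_should_create_recovery_point(recovery_points, delta_arrival):
    """Faulty check as described in the Cause: uses the EARLIEST recovery point."""
    earliest = min(recovery_points)
    return delta_arrival - earliest >= RECOVERY_POINT_INTERVAL


# Three recovery points an hour apart; the first delta after the restart
# arrives 5 minutes after the latest recovery point.
points = [
    datetime(2014, 1, 14, 9, 0),
    datetime(2014, 1, 14, 10, 0),
    datetime(2014, 1, 14, 11, 0),
]
delta = datetime(2014, 1, 14, 11, 5)

print(should_create_recovery_point(points, delta))        # False
print(buggy_should_create_recovery_point(points, delta))  # True
```

With more than two recovery points, the earliest one is always well over 60 minutes old, which would explain why the scenario only appears once several recovery points are configured.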

Microsoft Publishes January 2014 Update Rollup For Windows 8, Windows RT, & WS2012

Time for you to do … exactly nothing for a month, because Microsoft has pushed out another UR for Windows 8, Windows RT, and Windows Server 2012. So make sure this sucker is unapproved and sits like that for a month until some other sucker has tested it for you. If there is a problem (and based on the last 12 months, there probably is one or more) then let that other person find the issue and report it, and let Microsoft re-issue a fixed update rollup.

After digging into the contents of the update, we can see that there are networking fixes and a cluster fix. The latter is KB2876391, "0x0000009E" Stop error on cluster nodes in a Windows Server-based multi-node failover cluster environment.

Symptoms

Assume that you have a Windows Server 2008 R2 Service Pack 1 (SP1) or Windows Server 2012-based multi-node failover cluster that uses the Microsoft Device Specific Module (MSDSM) and Microsoft Multipath I/O (MPIO). The following events occur at almost the same time:

  • A new instance of an existing device arrives. Specifically, a new path to an MPIO disk is generated.
  • MSDSM finishes an I/O request. The request was the last outstanding I/O request.

In this scenario, some cluster nodes crash. Additionally, you receive a Stop error message that resembles the following:

STOP: 0x0000009E (parameter1, parameter2, parameter3, parameter4)

Notes

  • This Stop error describes a USER_MODE_HEALTH_MONITOR issue.
  • The parameters in this Stop error message vary, depending on the configuration of the computer.
  • Not all "Stop 0x0000009E" errors are caused by this issue.

Cause

This issue occurs because a remove lock on a logical unit number (LUN) is obtained two times, but only released one time. Therefore, the Plug and Play (PnP) manager cannot remove the device, and then the node crashes.

The hotfix is included in the UR. Despite what the Premier Sustained Engineering author wrote, this is not just for “Windows Server 2008 R2 SP1-based multi-node failover cluster environment” but it is also for WS2012.
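The "obtained two times, released one time" pattern in the cause is a classic reference-count leak. The following is only a toy model of that idea: the RemoveLock class and its methods are hypothetical and do not represent the actual MSDSM or PnP code.

```python
class RemoveLock:
    """Toy reference count standing in for a device remove lock (hypothetical)."""

    def __init__(self):
        self.count = 0

    def acquire(self):
        self.count += 1

    def release(self):
        if self.count == 0:
            raise RuntimeError("release without matching acquire")
        self.count -= 1

    def can_remove_device(self):
        # Removal may only proceed once every acquire has been released.
        return self.count == 0


lock = RemoveLock()
lock.acquire()   # first acquire
lock.acquire()   # second acquire from the racing code path
lock.release()   # only one release ever happens

print(lock.can_remove_device())  # False: one reference is leaked
```

Because the count never returns to zero, the device can never be removed, which matches the behaviour the KB describes before the node crashes.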

KB2923885 – WS2012 VMs With SR-IOV on WS2012 R2 Hyper-V Incorrectly Say Integration Components Require Upgrade

Microsoft has released a KB article to confirm a problem where Hyper-V Manager incorrectly reports "Update required" for the Hyper-V integration services in Windows Server 2012 guest operating systems that use SR-IOV on Windows Server 2012 R2 Hyper-V hosts.

Symptoms

Assume that you have a Windows Server 2012 R2-based Hyper-V server. A Windows Server 2012-based guest operating system that has integration services up to date and that uses Single Root I/O Virtualization (SR-IOV) is running on the server. After you restart the guest operating system, Hyper-V Manager incorrectly reports the integration services state of the guest operating system as Update required.

Status

This is a known issue of Windows Server 2012 R2. Except for the status report, there are no negative effects to the Hyper-V system.

Microsoft has confirmed that this is a problem.

You can ignore this annoying warning. I suspect that if the warning status appears in VMM then it is really annoying. But very few of you should be affected because SR-IOV is not needed by many VMs.

KB2902821 – A VM On WS2012 Or WS2012 R2 Hyper-V Cannot Use DLC Protocol To Contact SNA Host

This is an odd KB article from Microsoft for Hyper-V. It deals with a virtual machine that is running on Hyper-V Server 2012 or Hyper-V Server 2012 R2. The VM is configured to use the DLC protocol but the VM cannot connect to an SNA host.

Symptoms

You use Microsoft Hyper-V Server 2012 or Microsoft Hyper-V Server 2012 R2 to host a virtual machine such as for Microsoft Host Integration Server 2009. If the virtual machine is configured to use the Data Link Control (DLC) protocol to connect to a Systems Network Architecture (SNA) host such as an IBM Mainframe z/OS system, the connection fails.

Cause

This problem occurs because Hyper-V Server 2012 and Hyper-V Server 2012 R2 do not support 802.3 frame types that do not have a Sub-Network Access Protocol (SNAP) header.

A hotfix is available from Microsoft to resolve this issue.

Hyper-V Virtual NUMA Versus Dynamic Memory

When you are using VMs with a large amount of memory then NUMA topology becomes important. Hyper-V can reveal the underlying physical NUMA topology to the VM so that the guest OS and NUMA-aware apps (such as SQL Server) efficiently assign memory and schedule processes to make the most of the boundaries.

There is something important to note. Enabling Dynamic Memory in the settings of a VM disables virtual NUMA. That means that the vast majority of VMs will not have virtual NUMA. To squeeze the best processor/memory performance out of larger VMs you will need to use static RAM, as noted here under Virtual NUMA:

Virtual NUMA and Dynamic Memory features cannot be used at the same time. A virtual machine that has Dynamic Memory enabled effectively has only one virtual NUMA node, and no NUMA topology is presented to the virtual machine regardless of the virtual NUMA settings.

So you have a balancing act to do:

  • Applications and large VMs that might benefit from virtual NUMA probably should have static memory. Enabling Dynamic Memory would indirectly reduce the potential performance of the services provided by that VM because virtual NUMA would be disabled.
  • Note that workloads that are not NUMA-aware cannot make use of virtual NUMA. Therefore enabling Dynamic Memory will not impact performance, and it makes sense to optimize the RAM assignment.
  • Maybe service performance isn’t a big deal (!?!?!?) but the cost of RAM is. Then you would always (if the app/guest OS support it) enable Dynamic Memory.

This is not ideal. Introducing a human decision into a cloud where uneducated “users” are deploying their own VMs makes things less efficient. Hopefully MSFT will overcome the Dynamic Memory versus virtual NUMA conflict in a future version, but when you think about it, this would be difficult to do.

Memory Page Combining

My reading of the Windows Server 2012 R2 (WS2012 R2) Performance and Tuning Guide continues and I’ve just read about a feature that I didn’t know about. Memory combining is a feature that was added in Windows 8 and Windows Server 2012 (WS2012) to reduce memory consumption. There isn’t too much text on it, but I think memory combining stores a single instance of pages if:

  • The memory is pageable
  • The memory is private

Enabling page combining may reduce memory usage on servers which have a lot of private, pageable pages with identical contents. For example, servers running multiple instances of the same memory-intensive app, or a single app that works with highly repetitive data, might be good candidates to try page combining.

Bill Karagounis talked briefly about memory combining in the old Sinofsky Building Windows 8 blog (where it was easy to be lost in the frequent 10,000 word posts):

Memory combining is a technique in which Windows efficiently assesses the content of system RAM during normal activity and locates duplicate content across all system memory. Windows will then free up duplicates and keep a single copy. If the application tries to write to the memory in future, Windows will give it a private copy. All of this happens under the covers in the memory manager, with no impact on applications. This approach can liberate 10s to 100s of MBs of memory (depending on how many applications are running concurrently).

The feature therefore does not improve things for every server:

Here are some examples of server roles where page combining is unlikely to give much benefit:

  • File servers (most of the memory is consumed by file pages which are not private and therefore not combinable)
  • Microsoft SQL Servers that are configured to use AWE or large pages (most of the memory is private but non-pageable)

You can enable (memory) page combining using Enable-MMAgent and query the status using Get-MMAgent.

You’ll find that memory combining is enabled by default on Windows 8 and Windows 8.1.  That makes these OSs even more efficient for VDI workloads. It is disabled by default on servers – analyse your services to see if it will be appropriate.

There is a processor penalty for using memory combining. The feature is also not suitable for all workloads (see above).  So be careful with it.
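As a rough mental model of what combining does (keep one copy of identical pages, hand a writer its own copy), here is a small sketch. It is an assumption-laden illustration: the content hashing, the combine_pages and write_page helpers, and the dictionary "store" are all hypothetical and say nothing about how the Windows memory manager really works.

```python
import hashlib


def combine_pages(pages):
    """Back identical pages with one stored copy, keyed by content hash.

    Returns (page_table, store): page_table[i] is the key of the copy that
    page i uses; store maps each key to a single bytes object.
    """
    store = {}
    page_table = []
    for content in pages:
        key = hashlib.sha256(content).hexdigest()
        store.setdefault(key, content)
        page_table.append(key)
    return page_table, store


def write_page(page_table, store, index, new_content):
    """Copy-on-write: the writer gets its own copy; other sharers are untouched."""
    key = hashlib.sha256(new_content).hexdigest()
    store.setdefault(key, new_content)
    page_table[index] = key


pages = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]
page_table, store = combine_pages(pages)
combined_copies = len(store)
print(len(pages), "pages backed by", combined_copies, "stored copies")

# One sharer writes to its page; the remaining sharers keep the common copy.
write_page(page_table, store, 2, b"C" * 4096)
print(page_table[0] == page_table[3], page_table[0] == page_table[2])
```

The scan-and-hash step is also a hint at where the processor penalty comes from: finding duplicates means examining page contents.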

KB2868279 – Moving A VM From WS2012 R2 Hyper-V To WS2012 Hyper-V Is Not Supported

I have to admit that I find this KB article and support statement to be quite baffling.  It states that:

Moving a virtual machine (VM) from a Windows Server 2012 R2 Hyper-V host to a Windows Server 2012 Hyper-V host is not a supported scenario under any circumstances. 
When you try to import a VM that is exported from a Windows Server 2012 R2 Hyper-V host into a Windows Server 2012 Hyper-V host, you receive the following error message:

Hyper-V did not find virtual machines to import from the location <folder location>.
The operation failed with error code ‘32784’.

I am going to raise this with the product group. I see it as a genuine issue because anyone doing an upgrade-migration will require a rollback plan that will work and is supported.

You can move a VM from a Windows Server 2012 Hyper-V host to a Windows Server 2012 R2 Hyper-V host. This is a supported scenario and can even be done with zero downtime using Cross-Version Live Migration.

How Much RAM & CPU Does Windows Server Deduplication Optimization Require?

I’ve been asked about resource requirements for the dedupe optimization job before but I did not have the answer before now.

Processor

The CPU side is … not clear.  The dedupe subsystem will schedule one single-threaded job per volume. That means a machine with 8 logical processors is only 1/8th utilized if there is a single data volume. Microsoft says:

To achieve optimal throughput, consider configuring multiple deduplication volumes, up to the number of CPU cores on the file server.

That seems pretty dumb to me. “Go ahead and complicate volume management to optimize the dedupe processing”. Uhhhhh, no thanks.

Memory

Microsoft tells us that 1-2 GB RAM is used per 1 TB of data per volume.  They clarify this with an example:

Volume                  Volume size                    Memory used
Volume 1                1 TB                           1-2 GB
Volume 2                1 TB                           1-2 GB
Volume 3                2 TB                           2-4 GB
Total for all volumes   (1+1+2) * 1 GB up to 2 GB      4-8 GB RAM

By default a server will limit the RAM used by the optimization job to 50% of total RAM in the server.  So if the above server had just 4 GB RAM, then only 2 GB would be available for the optimization job.  You can manually override this:

Start-DedupJob <volume> -Type Optimization -Memory <50 to 80>

There is an additional note from Microsoft:

Machines where very large amount of data change between optimization job is expected may require even up to 3 GB of RAM per 1 TB of diskspace.

So you might see RAM become a bottleneck or increase pressure (in a VM with Dynamic Memory) if the optimization job hasn’t run in a while or if lots of data is dumped into a deduped volume.  Example: you have deployed lots of new personal (dedicated) VMs for new users on a deduped volume.
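The sizing arithmetic above is easy to script. This is a minimal sketch that assumes only the figures quoted in this post (1-2 GB of RAM per TB per volume, and a default cap of 50% of server RAM); the function and variable names are made up for illustration.

```python
# Assumptions: 1-2 GB of RAM per TB per volume and the 50% default cap come
# from the text above; the names used here are illustrative only.
GB_PER_TB_LOW = 1
GB_PER_TB_HIGH = 2


def optimization_memory_estimate(volume_sizes_tb, server_ram_gb, memory_percent=50):
    """Return (low_gb, high_gb, cap_gb) for the optimization job."""
    total_tb = sum(volume_sizes_tb)
    low = total_tb * GB_PER_TB_LOW
    high = total_tb * GB_PER_TB_HIGH
    cap = server_ram_gb * memory_percent / 100
    return low, high, cap


# The three example volumes (1 TB, 1 TB, 2 TB) on a server with 4 GB RAM.
low, high, cap = optimization_memory_estimate([1, 1, 2], server_ram_gb=4)
print(f"Estimated need: {low}-{high} GB, job cap: {cap} GB")
```

For the example server this gives an estimated need of 4-8 GB against a 2 GB cap, which is the shortfall described above.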

KB2878635 – A December 2013 Update To Improve CSV Backup Resiliency On WS2012 Hyper-V

Microsoft released an update in December 2013 that improves the resiliency of the cloud service provider in Windows Server 2012. That’s a little “marketing speak”. In truth, this update is designed to resolve issues with CSV backup on Windows Server 2012 Hyper-V. This update includes two fixes. Please note the post-installation instructions!!!!

Symptoms

This article introduces an update that improves the resiliency of the cloud service provider in Windows Server 2012. This update is dated December 2013.
This update replaces update 2870270, which is used to improve resiliency. Also, this update includes update 2869923 and update 2908415. Additionally, the update resolves several issues that occur in the following scenario: 

  • You have a Hyper-V failover cluster.
  • The Hyper-V resources are saved in .vhd files on Cluster Shared Volumes File System (CSVFS) volumes.
  • You use a backup solution. For example, you use System Center Data Protection Manager (DPM) in the Hyper-V environment.
  • You try to perform a backup, and a snapshot is taken of the CSVFS volume.
  • The current active node encounters an error, and the cluster fails over to another node.
  • DPM may start a consistency check on the volume unexpectedly.

Issue 1

Snapshots that are no longer being used are not cleaned up. Therefore, Volume Shadow Copy Service (VSS) snapshots may accumulate on Cluster Shared Volumes (CSV) and guest virtual machines. This causes a deadlock in the Resource Hosting Subsystem (RHS) process, and causes CSV failures. Additionally, all Hyper-V instances that use the VHD files go down.
Additionally, the following events are logged separately in the Cluster log and in the System log:

Software snapshot creation on Cluster Shared Volume(s) (‘volume location‘) with snapshot set id ‘snapshot id‘ failed with error ‘HrError(0x80042308)(2147754760)’. Please check the state of the CSV resources and the system events of the resource owner nodes.
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: Date and time
Event ID: 5120
Task Category: Cluster Shared Volume
Level: Error
Keywords:
User: SYSTEM
Computer: Computer name
Description: Cluster Shared Volume ‘Volume1‘ (‘name’) is no longer available on this node because of ‘STATUS_IO_TIMEOUT(c00000b5)’. All I/O will temporarily be queued until a path to the volume is reestablished.

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: Date and time
Event ID: 5142
Task Category: Cluster Shared Volume
Level: Error
Keywords:
User: SYSTEM
Computer: Computer name
Description: Cluster Shared Volume ‘Volume3‘ (‘Cluster Disk 4‘) is no longer accessible from this cluster node because of error ‘ERROR_TIMEOUT(1460)’. Please troubleshoot this node’s connectivity to the storage device and network connectivity.

Issue 2

If a CSVFS volume is repeatedly added and removed from a cluster, or if CSVFS snapshots are repeatedly created, the Plug and Play (PnP) hive in the following registry path may grow with many additional registry keys: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\STORAGE\VolumeSnapshot.
Therefore, installation of PnP volumes (which occurs usually during a resource move or failure) may become slow.  This update prevents future unnecessary growth in the volume snapshot registry key, but does not clean up existing registry entries.

Resolution

A supported hotfix is available from Microsoft.

Post-Installation Instructions

After you install this hotfix on a Hyper-V server, you must update the integration components in the virtual machines that are running Windows Server 2012. To do this, use Hyper-V Manager to connect to the virtual machine. This starts the Virtual Machine Connection tool. Then, on the Action menu, click Insert Integration Services Setup Disk. Run the Setup.exe file on the Integration Services Setup disk to update the integration component.