KB2761899 – Hyper-V VMMS Fails When Dynamicportrange Is Changed On WS2012

Microsoft released a KB article for when the Hyper-V Virtual Machine Management Service (VMMS) fails and event ID 14050 is logged after the TCP dynamic port range is changed in Windows Server 2012. Note that the VMMS is a service that runs in the Management OS of a Hyper-V host and provides the WMI API used to manage Hyper-V and the VMs running on that host.

Symptoms

Assume that you have a computer that is running Windows Server 2012 with Hyper-V installed. If you try to manage the Hyper-V host either by using System Center Virtual Machine Manager 2012 Service Pack 1 (SP1) or remotely by using Hyper-V Manager, the attempt fails. Additionally, an event may be logged in the event log that resembles the following:

Log Name: Microsoft-Windows-Hyper-V-VMMS-Admin
Source: Microsoft-Windows-Hyper-V-VMMS
Date: <Date> <Time>
Event ID: 14050
Level: Error
Description: Failed to register service principal name.
Event Xml: …
<Parameter0>Hyper-V Replica Service</Parameter0>

Cause

This problem may occur if the TCP dynamic port range has been moved outside the default range. The Virtual Machine Management Service (Vmms.exe) of Hyper-V uses Windows Service Hardening, which limits it to the dynamic port range.

To determine the TCP dynamic port range, run the following command at an elevated command prompt:

C:\>netsh int ipv4 show dynamicportrange tcp
Protocol tcp Dynamic Port Range
---------------------------------
Start Port      : 49152
Number of Ports : 16384
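
If your range has drifted from the Windows default (start port 49152, 16384 ports) and you would rather restore the default than add a custom range, the standard netsh command below will do it. This is my suggestion, not part of the KB's resolution:

    # Restore the Windows default TCP dynamic port range (elevated prompt)
    netsh int ipv4 set dynamicport tcp start=49152 num=16384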

This problem may also occur if the NTDS port has been restricted to a specific port on your domain controllers. If the selected NTDS port is not within the default range, you must add this port by running the script in the "Resolution" section on every Hyper-V host.
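
If you suspect the restricted NTDS port is the trigger, you can check on a domain controller whether a fixed port has been configured. A minimal PowerShell sketch of mine, not from the KB; I am assuming the port was pinned via the usual "TCP/IP Port" registry value:

    # On a DC: show the pinned NTDS RPC port, if one has been configured
    Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\NTDS\Parameters" |
        Select-Object "TCP/IP Port"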


Resolution

Run the script from the original KB article on each affected host. This script adds a custom port range so that Vmms.exe can communicate over an additional port range of 9000 to 9999.
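
Once the script has run, a quick sanity check (my suggestion, not something the KB asks for) is to restart VMMS and confirm that no new 14050 errors appear:

    # Restart the Hyper-V Virtual Machine Management Service (running VMs
    # are not interrupted), then check the admin log for fresh 14050 errors
    Restart-Service -Name vmms
    Get-WinEvent -LogName "Microsoft-Windows-Hyper-V-VMMS-Admin" -MaxEvents 50 |
        Where-Object { $_.Id -eq 14050 }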

My Most Popular Articles In 2013

I like to have a look at what people are reading on my blog from time to time.  It gives me an idea of what is working and, sometimes, what is not – for example, I still get lots of hits on outdated articles.  Here are the 5 most viewed articles of the last year, from 5 to 1.

5) Windows Server 2012 Hyper-V Replica … In Detail

An oldie kicks off the charts … this trend continues throughout the top 5.  At least this one is a good subject that is based on WS2012 and is still somewhat relevant to WS2012 R2.  Replica is one of the most popular features, if not the most popular, in WS2012 (and later) Hyper-V.

4) Rough Guide To Setting Up A Hyper-V Cluster

I wrote this article in 2010 for Windows Server 2008 R2 and it’s still one of my top draws.  I really doubt you folks are still deploying W2008 R2 Hyper-V; at least, I really hope you are not!  Join us in this decade with a much better version of the product.

Please note that the networking has changed significantly (see converged networks/fabrics).  The quorum stuff has changed a bit too (much simpler now).

3) Windows Server 2012 Licensing In Detail

Licensing!!! Gah!

2) Comparison of Windows Server 2012 Hyper-V Versus vSphere 5.1

There’s nothing like kicking a hornet’s nest to generate some web hits.  We saw VMware’s market share slide in 2013 (IDC) while Hyper-V continued the march forward.  More and more people want to see how these products compare.

And at number one we have … drumroll please …

1) Windows Server 2012 Virtualisation Licensing Scenarios

Wow! I still cannot believe that people don’t understand how easy the licensing of Windows Server on VMware, Xen, Hyper-V, etc., actually is.  Everyone wants to overthink this subject.  It’s really simple: it’s 2 or unlimited Windows Server VMs per license assigned to a host, people!  This page accounted for 2.8% of all views in the last 12 months.

Sadly, not a single post from the last year makes it into the top 10.  I guess that folks aren’t reading about WS2012 R2.  Does this indicate that there is upgrade fatigue?

Linux Integration Services Version 3.5 For Hyper-V Is Released

Microsoft has released version 3.5 of the Hyper-V integration components for Linux.  This download is intended for versions of Linux that do not have the Linux Integration Services (LIS) for Hyper-V already installed in the kernel.

Version 3.5 of the LIS supports:

  • Red Hat Enterprise Linux (RHEL) 5.5-5.8, 6.0-6.3 x86 and x64
  • CentOS 5.5-5.8, 6.0-6.3 x86 and x64

Hyper-V hosts from Windows Server 2008 R2 onwards are supported, including Client Hyper-V in Windows 8 and 8.1.

The matrix below describes which Hyper-V features are supported by each version of the LIS and distro/version of Linux:

[Feature support matrix images]

Notes

  1. Static IP injection might not work if Network Manager has been configured for a given Hyper-V-specific network adapter on the virtual machine. To ensure smooth functioning of static IP injection, ensure that Network Manager is either turned off completely or turned off for the specific network adapter through its ifcfg-ethX file (on RHEL/CentOS, that typically means setting NM_CONTROLLED=no in /etc/sysconfig/network-scripts/ifcfg-ethX).
  2. When you use Virtual Fibre Channel devices, ensure that logical unit number 0 (LUN 0) has been populated. If LUN 0 has not been populated, a Linux virtual machine might not be able to mount Virtual Fibre Channel devices natively.
  3. If there are open file handles during a live virtual machine backup operation, the backed-up virtual hard disks (VHDs) might have to undergo a file system consistency check (fsck) when restored.
  4. Live backup operations can fail silently if the virtual machine has an attached iSCSI device or a physical disk that is directly attached to a virtual machine (“pass-through disk”).
  5. LIS 3.5 only provides Dynamic Memory ballooning support; it does not provide hot-add support. In such a scenario, the Dynamic Memory feature can be used by setting the Startup memory parameter to a value which is equal to the Maximum memory parameter (see the sketch after these notes). This results in all the requisite memory being allocated to the virtual machine at boot time; later, depending upon the memory requirements of the host, Hyper-V can freely reclaim memory from the guest. Also, ensure that Startup Memory and Minimum Memory are not configured below distribution-recommended values.
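
For note 5, here is a minimal PowerShell sketch of that Startup-equals-Maximum configuration; it’s my example, not from the release notes, and the VM name and sizes are hypothetical:

    # Ballooning-only Dynamic Memory for a LIS 3.5 guest: allocate the
    # maximum at boot and let Hyper-V reclaim memory down to the minimum
    Set-VMMemory -VMName "rhel6-vm" -DynamicMemoryEnabled $true `
        -StartupBytes 4GB -MinimumBytes 2GB -MaximumBytes 4GB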

The following features are not available in this version of LIS:

  • Dynamic Memory hot-add support
  • TRIM support
  • TCP offload
  • vRSS

Corrupted Memory Dump When You Obtain Full Memory Dump From A VM On WS2012 Or W2008R2 Cluster

Microsoft has released a KB article about getting a corrupted memory dump file when you try to obtain a full memory dump from a virtual machine that is running in a cluster environment.

Symptoms

You have a virtual machine that is running in a cluster environment in Windows Server 2012 or Windows Server 2008 R2. When you try to obtain a full memory dump file from the virtual machine, a corrupted memory dump file is generated. While the memory dump file is loading, you may receive the following message:

**************************************************************************
THIS DUMP FILE IS PARTIALLY CORRUPT.
KdDebuggerDataBlock is not present or unreadable.
**************************************************************************
GetContextState failed, 0xD0000147
Unable to get program counter
GetContextState failed, 0xD0000147
Unable to get current machine context, NTSTATUS 0xC0000147

Additionally, you may notice that writing a full memory dump file does not finish and that the virtual machine is restarted on another node in the cluster.

Cause

This issue occurs because the Enable heartbeat monitoring for the virtual machine option is selected for the virtual machine. This option resets the clustered virtual machine after one minute (the default value), and the clustered virtual machine requires longer than one minute to finish writing the memory dump.

Note: Heartbeats between the virtual machine and Virtual Machine Manager occur every few seconds. It can take up to one minute to detect that the virtual machine is down because the virtual machine resource checks the heartbeat status from Virtual Machine Manager in its isAlive entry-point function. By default, isAlive runs one time every minute. However, the heartbeats may stop 30 seconds before the one-minute interval elapses. In this case, the cluster can restart the virtual machine on the same server or fail it over to another node.

There are two options for resolving this issue.

Option 1: Change the settings from the GUI
  1. Open Failover Cluster Manager.
  2. Click Roles, and then find the virtual machine resource. 
  3. On the Resources tab, right-click the virtual machine. 
  4. Click Properties, and then click the Settings tab.
  5. In Heartbeat Setting, click to clear the Enable automatic recovery for application health monitoring check box.
  6. Click to clear the Enable heartbeat monitoring for the virtual machine check box, and then click OK.
Option 2: Change the settings by using Windows PowerShell
  1. Start Windows PowerShell.
  2. Check the virtual machine name. To do this, type the following Windows PowerShell command:

    PS C:\> Get-ClusterResource

  3. Check whether the Enable heartbeat monitoring for the virtual machine and Enable automatic recovery for application health monitoring options are selected. To do this, type the following Windows PowerShell command:

    PS C:\> Get-ClusterResource <VirtualMachineName> | Get-ClusterParameter CheckHeartbeat

  4. When the CheckHeartbeat value is 1, both options are selected. To cancel both options, change this value to 0. To do this, type the following Windows PowerShell command:

    PS C:\> Get-ClusterResource <VirtualMachineName> | Set-ClusterParameter CheckHeartbeat 0

    Note: If you want to cancel only the Enable automatic recovery for application health monitoring option, run the following Windows PowerShell command instead:
    PS C:\> (Get-ClusterResource <Object>).EmbeddedFailureAction = 1
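
And if you want to clear heartbeat monitoring for every clustered VM in one go, here is a minimal sketch (mine, not from the KB; run it on a cluster node):

    # Find every clustered VM resource and clear CheckHeartbeat, which
    # clears both of the heartbeat check boxes described above
    Get-ClusterResource | Where-Object { $_.ResourceType -eq "Virtual Machine" } |
        Set-ClusterParameter CheckHeartbeat 0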

KB2779069 – Hotfix To Determine Which Cluster Node Is Blocking GUM Updates In WS2008R2 & WS2012

First some background …

A cluster is made up of (normally) 2 or more servers.  They use a distributed database to keep a synchronised copy of the configuration of the HA resources, e.g. HA VMs on a Hyper-V cluster.  Something called the Global Update Manager (GUM) is used to coordinate consistent updates to resource configurations across the cluster nodes. 

When a node has an update that must be shared with the other nodes, the initiator node first obtains a GUM lock. Then, the node shares the update by sending a Multicast Request Reply (MRR) message to the other nodes. After this update is sent, the initiator node waits for a response from the other nodes before it continues. However, in certain conditions, one of the nodes does not reply to the GUM request in time because it is stuck for some reason. Until now, there was no mechanism to determine which node is stuck and failing to reply to the GUM request.

That’s just changed, thanks to a now-available hotfix that adds two new cluster control codes to help you determine which cluster node is blocking a GUM update in Windows Server 2008 R2 and Windows Server 2012.

After you install this hotfix, two new cluster control codes are added to help the administrator resolve the problem. One of the cluster control codes returns the GUM lock owner, and the other returns the nodes that are stuck. The administrator can then restart the stuck nodes to resolve the problem.

Notes

  • The Cluster service has a facility that is called GUM. GUM is used to distribute a global state throughout the cluster.
  • Only one cluster node can send GUM messages at any time. This node is called the GUM lock owner.
  • The GUM lock owner sends an MRR message to a subset of cluster nodes, and then waits for the nodes to send message receipt confirmations.
  • Run some iterations of these control codes to confirm that the node is stuck.
  • After the CLUSCTL_CLUSTER_GET_GUM_LOCK_OWNER control code is called, you have to close the cluster handle. Then, you reopen the cluster handle by using the GUM lock owner node name that is returned by the control code. If you do not perform this action, the CLUSCTL_NODE_GET_STUCK_NODES control code may return an incorrect result.

You can get this hotfix from here.

KB2905412 – Stop Error 0xD1 On Windows-Based Computer With Multiple Processors

Not strictly a Hyper-V issue, but you’ll understand why I am blogging about this one: a hotfix has been released for when a Stop error 0xD1 occurs on a Windows-based computer with multiple processors.

Symptoms

Your multiprocessor Windows-based computer crashes every two to three days. Additionally, Stop error 0xD1 is generated when the computer crashes.

Cause

This problem occurs because of a race condition that exists in the TCP/IP driver in a multiprocessor environment. If duplicate TCP segments are received on different processors, they may be sequenced incorrectly, and this triggers the crash.

A hotfix is available

KB2908415 – CSVs Go Offline Or Cluster Service Stops During VM Backup On WS2012 Hyper-V

Another hotfix from Microsoft, this one for when Cluster Shared Volumes go offline or the Cluster service stops during VM backup on a Windows Server 2012 Hyper-V host server.

Symptoms

Consider the following scenario:

  • You have a Windows Server 2012 Hyper-V host server.
  • You have the server in a cluster environment, and you use cluster shared volumes.
  • You try to back up a virtual machine (VM).

In this scenario, you may find that the cluster shared volumes go offline, and resource failover occurs on the other cluster nodes. Then, other VMs also go offline, or the Cluster service stops.

Cause

This problem occurs when the VM has many snapshots. These overwhelm the Plug and Play (PnP) functionality on the host, and other critical cluster activity cannot finish.
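
A quick way to spot VMs carrying long snapshot chains before a backup runs is to count the checkpoints per VM. A minimal sketch of mine, not from the KB:

    # Count the snapshots (checkpoints) attached to each VM on this host
    Get-VM | Select-Object Name,
        @{ Name = "Snapshots"; Expression = { (Get-VMSnapshot -VMName $_.Name).Count } }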

A supported hotfix is available from Microsoft Support.

KB2902014 – Guest System Time Incorrect After VM Crashes On Win8 or WS2012 Hyper-V Host

This is a busy month for hotfixes!  Microsoft has released a fix for when the system time of a virtual machine becomes incorrect after it crashes or resets on a 64-bit Windows 8-based or Windows Server 2012-based Hyper-V host.

Symptoms

Consider the following scenario:

  • You create a virtual machine (VM) on a Hyper-V host that runs 64-bit Windows 8 or Windows Server 2012.
  • You disable the Hyper-V time synchronization integration service on the VM.
  • The VM crashes or resets.

In this situation, the system time of the VM is incorrect when it starts again.
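
The scenario hinges on the time synchronization integration service being disabled. A minimal sketch (mine, not from the KB; the VM name is hypothetical) to check its state and, if appropriate, re-enable it:

    # Check the state of the Time Synchronization integration service ...
    Get-VMIntegrationService -VMName "MyVM" -Name "Time Synchronization"
    # ... and re-enable it if the VM should take time from the host
    Enable-VMIntegrationService -VMName "MyVM" -Name "Time Synchronization"

Bear in mind that the scenario assumes the service was disabled deliberately, so re-enabling it may not be appropriate; the hotfix is the real fix.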

Cause

This issue occurs because the time information of the VM is not saved to the VHD as expected when the VM crashes or resets. When the VM starts again, it uses the old, out-of-date time information that was saved previously.

A hotfix is available to prevent this issue.

KB2894485 – Cross-Page Memory Operation Crashes VM on Win8 or WS2012 Hyper-V Host

Microsoft has released a hotfix for when a cross-page memory read or write operation crashes a virtual machine that runs on a 64-bit Windows 8-based or Windows Server 2012-based Hyper-V host.

Symptoms

Assume that you install a Windows virtual machine on a Hyper-V host that runs 64-bit Windows 8 or Windows Server 2012. You have an application that runs in the virtual machine. This application performs a memory read or write operation that touches a Memory-Mapped Input/Output (MMIO) region, and the operation crosses a page boundary. In this situation, the virtual machine crashes.

A hotfix is available to prevent this problem.

KB2898774 – Data Loss Occurs On SCSI Disk That Turns Off In WS2012-Based Failover Cluster

Microsoft has released a KB article about data loss that occurs when a SCSI disk turns off in a Windows Server 2012-based failover cluster.

Symptoms

Consider the following scenario:

  • You deploy a Windows Server 2012-based failover cluster. The cluster contains two nodes (node A and node B).
  • A SCSI disk is used for the failover cluster. The disk is a shared disk and is accessible by both node A and node B.
  • Node A restarts or crashes. Then, the cluster fails over to node B.
  • Node A comes back online.
  • Node B is shut down and the cluster fails over to node A.
  • You write some data to the disk.
  • The disk turns off unexpectedly. For example, the device loses power.

In this scenario, the data that you write to the disk is lost.

Notes

  • This issue also occurs when the cluster contains more than two nodes.
  • This issue does not occur if the SCSI disk supports the SCSI Primary Commands – 4 (SPC-4) standard.

To resolve this issue, install update rollup 2903938.  It’s an update rollup, so update rollup rules apply – either test like nuts in a lab or wait a month before you approve/deploy it.