KB2937634 – Hyper-V Host Unable To Reconnect To WS2012 SOFS After Unplanned Failover

Microsoft has released (another) update overnight (what a week!!!) that deals with a Hyper-V scenario. This one is for when a Hyper-V host may be unable to reconnect to the Windows Server 2012 Cluster Scale-Out File Server (SOFS) share after an unplanned failover of one of the SOFS nodes.

Once again, this is niche. I’ve done many graceful and ungraceful shutdowns of SOFS nodes (both virtual and physical) over the past 18 or so months and not seen this issue.

Symptoms

Consider the following scenarios.

Scenario 1

  • You deploy file storage by using Failover Clustering Scale-Out File Server shares in Windows Server 2012.
  • An unexpected error causes the Cluster service process (clussvc.exe) to stop.

In this scenario, you may receive I/O errors instead of failing over to a working cluster node.

Scenario 2

  • You deploy Windows Server 2012 Hyper-V hosts that run virtual machines that are stored on Failover Clustering Scale-Out File Server shares in Windows Server 2012.
  • An unplanned failover causes the Scale-Out File Server to move to another node.

In this scenario, the Hyper-V host may be unable to reconnect to the share. This causes the virtual machines to become unresponsive and to enter a critical state.

A supported hotfix is available from Microsoft Support.

KB2918371 – Scheduled Backup Of Hyper-V Fails With Event ID 517 & Error 0x80780049

This new article from Microsoft refers to “Windows Server Backup running on the host operating system”, but I cannot say whether this issue also affects DPM or third-party backup tools. REPEAT: DO NOT ASK ME – ASK MICROSOFT. Microsoft has a bad habit of stating that a backup fix is for a scenario featuring a Microsoft backup product when it really affects any tool backing up Hyper-V.

Symptoms

Consider the following scenario:

  • You have a Windows Server 2012 Hyper-V host and a Windows Server 2012 guest virtual machine (VM).
  • You start Windows Server Backup on the host operating system.
  • You click Backup Schedule to start the backup schedule wizard and then click Next.
  • You select Custom on the Select Backup Configuration tab and then click Next.
  • You click Add Items, select host component and the guest VM, and then complete the wizard.
  • You restart the host operating system.

In this scenario, the scheduled backup fails with event ID 517 and error 0x80780049.

“The Update” fixes this issue for Windows Server 2012 R2 Hyper-V and Windows 8.1 Client Hyper-V. A hotfix is available for Windows Server 2012 Hyper-V and Windows 8 Hyper-V.

If the problem is limited to Windows Server Backup then it will typically affect just small installations (1 or maybe even 2 hosts) and labs.

KB2908783 – Data Corruption Occurs On iSCSI LUNs In Windows

Another niche scenario bug is fixed in this update by Microsoft, affecting the following Windows versions/editions:

  • Windows 8 & Windows Server 2012
  • Windows 7 & Windows Server 2008 R2

Symptoms

Consider the following scenario:

  • You have a computer that is running Windows 8, Windows Server 2012, Windows 7 Service Pack 1 (SP1), or Windows Server 2008 R2 SP1.
  • You create iSCSI connections to multiple iSCSI targets which are storage arrays.
  • There are frequent iSCSI session connections and disconnections, such as logical unit number (LUN) arrivals and removals.

In this scenario, a silent read/write data corruption can occur on an iSCSI LUN.

There are several links for downloading updates to resolve the issue, depending on your OS and architecture. See the original post by Microsoft for links.

KB2905249 – "0x8007007A" Error When You Live Migrate A VM On WS2012 Or WS2012 R2 Hyper-V

I’ve done LOTS of live migrations since the beta of WS2012 and through WS2012 R2, and I’ve put the hosts under significant pressure. I can’t say I’ve seen the issue that is discussed & fixed in this new article by Microsoft where a "0x8007007A" error occurs when you migrate a virtual machine that’s running on Windows Server 2012 R2 Hyper-V or Windows Server 2012 Hyper-V.

Symptoms

Consider the following scenario:

  • You have two Hyper-V hosts that are running Windows Server 2012 R2 or Windows Server 2012.
  • You use the Live Migration feature in Hyper-V to migrate a virtual machine from one server to another.

In this scenario, the migration fails. Additionally, a "0x8007007A" error that resembles the following is logged in the System log:

Log Name: System
Source : Microsoft-Windows-Hyper-V-High-Availability
Event ID: 21502
Level : Error
Message : Live migration of 'VM_Name' failed. Virtual machine migration operation for 'VM_Name' failed at migration source 'Node_Name'. (Virtual machine ID VM_GUID) Failed to save the virtual machine partition state: The data area passed to a system call was too small. (0x8007007A). (Virtual machine ID VM_GUID)

To resolve this issue in Windows Server 2012 R2, install update 2919355 (“The Update” via Windows Update). To resolve this issue in Windows Server 2012, install the Microsoft supplied hotfix.
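If you want to check whether a host is affected, something like the following read-only sketch might help. It assumes the failure is logged to the System log with event ID 21502, as in the sample above, and that update 2919355 is reported by Get-HotFix on your build. Not every servicing package shows up there, so treat an empty result as a prompt to check the installed update history rather than as proof.

```powershell
# Read-only checks; run in an elevated PowerShell session on the Hyper-V host.

# 1. Is update 2919355 reported as installed? (Relevant to WS2012 R2 only.)
$update = Get-HotFix -Id 'KB2919355' -ErrorAction SilentlyContinue
if ($update) { "KB2919355 installed on $($update.InstalledOn)" }
else         { 'KB2919355 not reported by Get-HotFix' }

# 2. Any recent live migration failures that carry the 0x8007007A error?
Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 21502 } -MaxEvents 50 -ErrorAction SilentlyContinue |
    Where-Object { $_.Message -like '*0x8007007A*' } |
    Select-Object TimeCreated, MachineName, Message
```

No output from the second command simply means that none of the last 50 matching events contained that error code.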

KB2928439 – VM Network Fails If “Minimum Bandwidth Weight” Is Enabled On WS2012 Hyper-V

Microsoft has published a new KB article for when a Hyper-V virtual machine’s network connection fails if the "minimum bandwidth weight" setting is enabled in Windows Server 2012. The scenario where this happens is very niche (negligent bad practice, one might argue).

Symptoms

Consider the following scenario:

  • You have Virtual Machine Manager for Microsoft System Center 2012 installed on a Windows Server 2012 Hyper-V host.
  • You add a third-party virtual network switch extension to System Center 2012 Service Pack 1 (SP1) Virtual Machine Manager or to System Center 2012 R2 Virtual Machine Manager.
  • One of the following conditions is true: 
    • You apply the MinimumBandwidthWeight setting to the network of a Hyper-V virtual machine.
    • You use the System Center Virtual Machine Manager "high bandwidth adapter" or "medium bandwidth adapter" native port profile.

In this scenario, external communication from the virtual machine network fails.
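To see whether any of your VMs match the conditions above, a read-only query along these lines could be a starting point. This is a sketch that assumes the Hyper-V PowerShell module is available on the host; it only reports settings and changes nothing.

```powershell
# List VM network adapters that have a minimum bandwidth weight configured.
Get-VMNetworkAdapter -VMName * |
    Where-Object { $_.BandwidthSetting -and $_.BandwidthSetting.MinimumBandwidthWeight -gt 0 } |
    Select-Object VMName, Name, SwitchName,
        @{ Name = 'MinimumBandwidthWeight'; Expression = { $_.BandwidthSetting.MinimumBandwidthWeight } }

# List the enabled virtual switch extensions; the KB concerns third-party ones,
# so the Vendor column is the one to look at.
Get-VMSwitch | Get-VMSwitchExtension |
    Where-Object { $_.Enabled } |
    Select-Object SwitchName, Name, Vendor
```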

A supported hotfix is available from Microsoft Support.

KB2935616 – Memory Leak Caused By WMI Scripts On WS2012 Cluster

It’s a busy day for fixes. This one is for when you run a Windows Management Instrumentation (WMI) script in a Windows Server 2012 cluster and the memory usage of the Wmiprvse.exe process increases over time.

The cause is this:

… issue occurs because the cluster WMI provider leaks basic strings or binary strings (BSTRs).

To resolve this issue, install update 2934016, the Windows RT, Windows 8, and Windows Server 2012 update rollup: April 2014.
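If you suspect this leak on a cluster node, one rough way to watch for it is to sample the memory of the WMI provider host processes over time. The sketch below is only an illustration; the one-minute interval and ten samples are arbitrary choices. Several Wmiprvse.exe instances usually run at once, hosting different providers, so steady growth in one of them is a hint rather than a diagnosis.

```powershell
# Sample the private memory of every WmiPrvSE.exe instance once a minute, ten times.
1..10 | ForEach-Object {
    Get-Process -Name WmiPrvSE -ErrorAction SilentlyContinue |
        Select-Object @{ Name = 'Time'; Expression = { Get-Date } }, Id,
            @{ Name = 'PrivateMB'; Expression = { [math]::Round($_.PrivateMemorySize64 / 1MB, 1) } }
    Start-Sleep -Seconds 60
}
```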

KB976424 – Important Update For W2008 Or W2008 R2 DCs If You Have WS2012 Clusters

Microsoft has published an elective hotfix that they want you to know about if you have Windows Server 2008 or Windows Server 2008 R2 domain controllers and you are running Windows Server 2012 clusters.

Symptoms

You perform an authoritative restore on the krbtgt account in a Windows Server 2008-based or in a Windows Server 2008 R2-based domain. After you perform this operation, the kpasswd protocol fails and generates a KDC_ERROR_S_PRINCIPAL_UNKNOWN error code. Additionally, you may be unable to set the password of a user by using the kpasswd protocol. Also, this issue blocks kpasswd protocol interoperability between the domain and a Massachusetts Institute of Technology (MIT) realm. For example, you cannot set the user password by using the Microsoft Identity Lifecycle Manager during user provisioning.

Note The krbtgt account is used for Kerberos authentication. The account cannot be used to log on to a domain.

You may experience additional symptoms in a Windows Server 2012-based server cluster. Assume that you try to set the password for the cluster computer object in a Windows Server 2012-based server cluster. Additionally, assume that there are Windows Server 2008-based or Windows Server 2008 R2-based domain controllers in the environment. In this situation, you receive the following error message:

CreateClusterNameCOIfNotExists (6783): Unable to set password on <ClusterName$>

To resolve this issue, apply this hotfix on the Windows Server 2008-based or Windows Server 2008 R2-based domain controllers, and then create the Windows Server 2012-based server cluster.

Note You do not need to apply this hotfix if you have Windows Server 2008 R2 Service Pack 1 installed.

Cause

When a user requests a ticket for the Kpasswd service, a flag is incorrectly set in the Kerberos ticket-granting service (TGS) request for the Kpasswd service. This behavior causes the Key Distribution Center (KDC) to incorrectly build a new service name. Therefore, an incorrect service name is used, and the KPasswd service fails.

Note The expected behavior is that the Key Distribution Center (KDC) directly copies the correct service name from the Kerberos ticket-granting tickets (TGTs).

A supported hotfix is available from Microsoft.

KB2929078 & KB2929869 – CSV Snapshot Corrupted After File Modifications On WS2012

There are two new KB articles that offer two different, but very similar, hotfixes for this situation.

The first is KB2929078, which deals with a scenario where the Cluster Shared Volumes (CSV) snapshot is corrupted when you delete and then re-create a file on the live volume in Windows Server 2012.

A hotfix is available.

The second article is KB2929869:

Symptoms

Consider the following scenario:

  • You create some files on a Cluster Shared Volume (CSV) in Windows 8 or Windows Server 2012.
  • You take a snapshot of the CSV.
  • You delete the files.
  • You create some files and delete some older snapshots in parallel.

In this situation, the CSV snapshot file is corrupted.

A second hotfix is also available for this issue.

KB2901896 – WS2012 CSV Cache Causing Poor Performance For Hyper-V VMs

Microsoft has released a hotfix for when CSV block cache causes poor performance of virtual machines on Windows Server 2012 Hyper-V.

Symptoms

Consider the following scenario:

  • You have Hyper-V virtual machines (VMs) that are configured on a Windows Server 2012 Hyper-V cluster that uses Scale-Out File Server as the storage solution.
  • The virtual machine .vhdx files are held in Cluster Shared Volume (CSV).
  • CSV Block Cache is enabled on the volume.

In this scenario, virtual machines may experience slow performance.
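To check whether CSV block cache is actually switched on for your volumes, a read-only query such as the one below may be useful. The property names are an assumption on my part and do not come from the KB article: as far as I know, Windows Server 2012 exposes the cache size as the cluster common property SharedVolumeBlockCacheSizeInMB and the per-volume switch as the private property CsvEnableBlockCache, and both were renamed in Windows Server 2012 R2.

```powershell
Import-Module FailoverClusters

# Cache size reserved per node, in MB (assumed WS2012 property name).
(Get-Cluster).SharedVolumeBlockCacheSizeInMB

# Per-volume switch: 1 = block cache enabled, 0 = disabled (assumed WS2012 property name).
Get-ClusterSharedVolume | Get-ClusterParameter -Name CsvEnableBlockCache
```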

A supported hotfix is available from Microsoft.

KB2935810 – CSV Failover Takes Longer Than Expected In Windows Failover Cluster

Microsoft released a hotfix for WS2012 and WS2012 R2 to deal with a scenario where CSV failover time is longer than expected in a Windows failover cluster.

Symptoms

In a Windows failover-cluster that uses Cluster Shared Volumes (CSV), the diff area that is allocated by Volsnap is large and fragmented. In this situation, you encounter the following issues:

  • The failover time on the CSV is longer than expected.
  • The time that Volsnap takes to mount or unmount snapshots is several minutes.

More Information

When an NTFS or ReFS volume is mounted or dismounted, Volsnap iterates through the diff area to mount or unmount the snapshots that belong to that volume. When the diff area allocation becomes large and fragmented, mount or unmount operations in Volsnap could take several minutes. Additionally, failover time can be longer than expected.

The resolution is … hmm … long. It is related to two updates:

Two new cluster Physical Disk resource private properties were added, and they can be manipulated to resolve the issue:

  • SnapshotDiffSize: This property controls the maximum diff area size that Volsnap can consume for a Physical Disk resource configured for CSV. Units: MB (DWORD). Default value: 0. Maximum value: 1 TB. The Physical Disk resource must be taken offline and brought back online for changes to take effect.
  • SnapshotAgeLimit: This is a Resource Type private property of the Physical Disk that controls the maximum age of a snapshot. Long-lived snapshots are a significant contributor to diff area fragmentation. Units: days (DWORD). Default value: 7. Range: 1-60. This is a global property that affects all Physical Disk resources. You do not have to take the resource offline or online for it to take effect.

Get-ClusterSharedVolume <Cluster Disk Name> | Set-ClusterParameter snapshotdiffsize <Snapshot Diff Size in MB>

Get-ClusterResourceType "physical disk" | Set-ClusterParameter snapshotagelimit <Snapshot Age in Days>
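Before anyone changes anything, it may be worth reading the current values first. The sketch below only queries the two properties. 'Cluster Disk 1' is a hypothetical CSV name used for illustration, and the properties will presumably only exist once the relevant updates are installed, so an error here may just mean the hotfix is missing.

```powershell
Import-Module FailoverClusters

# 'Cluster Disk 1' is a placeholder; substitute the name of your own CSV.
Get-ClusterSharedVolume -Name 'Cluster Disk 1' |
    Get-ClusterParameter -Name SnapshotDiffSize

# SnapshotAgeLimit is set on the resource type, so it applies to all Physical Disk resources.
Get-ClusterResourceType -Name 'Physical Disk' |
    Get-ClusterParameter -Name SnapshotAgeLimit
```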

My advice: leave well alone and only manipulate these settings under the advice of Microsoft support (not some local dude, but actual Premier support).