Please Welcome CSVFS

If you’re using Windows Server 2012 Failover Clustering for Scale Out File Server or for HA Hyper-V then you’ve created one or more Cluster Shared Volumes (CSV).  This active-active clustered file system (where orchestration is performed by the cluster nodes rather than the file system to achieve greater scalability) is NTFS based.  But wander into Disk Management and you’ll see a different file system label:

[Image: Disk Management showing the CSVFS file system label]

This label has two purposes:

  1. You can tell from admin tools that this is a CSV volume and is shared across the nodes in the cluster
  2. It allows applications to know that they are working with a CSV rather than a simple single-server volume.  This is probably important for applications that can use the filter extensibility of Windows Server 2012 Hyper-V, e.g. replication or AV.

BTW, this screenshot is taken from the virtualised scale-out file server that I’m building with a HP VSA as the back-end storage.

KB2687646 – A LUN That Is Registered To A CSV Is Invisible & Inaccessible In W2008 R2

Microsoft has just released a hotfix for Windows Server 2008 R2 Failover Clusters where a LUN that is registered to a cluster shared volume is invisible and inaccessible.  The scenario is:

  • You use a cluster shared volume on a Windows Server 2008 R2-based failover cluster that has the Hyper-V 2.0 role installed.
  • You register a logical unit number (LUN) from the host operating system on the cluster shared volume.
  • The Disk Control Manager component detects a STATUS_CONNECTION_DISCONNECTED error and starts a failover.

In this situation, Windows Server 2008 R2 does not detect the LUN.  This issue occurs because the Disk Control Manager component does not remap the cluster shared volume resource after a failover occurs.

A supported hotfix is available from Microsoft.

Windows Server 8 Hyper-V Failover Cluster Failover Startup Priority

There’s a blogger out there who used to claim that the only reason he wouldn’t consider Hyper-V as an enterprise virtualisation solution was because he couldn’t set the ordering of automatic VM startup during a failover scenario, e.g. start up the SQL server, then the middle tier server, then the web server. 

Windows Server 8 Hyper-V Failover Clustering has this feature, enabling you to set VMs into one of 4 buckets and thus order their startup when they fail over from one host to another:

  1. High: These VMs start up first
  2. Medium: The default, and they start up after the high priority ones
  3. Low: These VMs start up after the high and medium priority VMs
  4. No auto start: These VMs fail over but do not start up automatically
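
To make the ordering concrete, here is a minimal Python sketch of the four buckets described above.  It is only a model of the behaviour, not how Failover Clustering implements it: the `StartPriority` values, the `startup_order` function, and the VM names are all made up for illustration.

```python
from enum import IntEnum


class StartPriority(IntEnum):
    # Larger value = started earlier. The numbers are illustrative only.
    NO_AUTO_START = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def startup_order(vms):
    """Return VM names in the order they would be started after a failover.

    `vms` maps VM name -> StartPriority. VMs marked NO_AUTO_START still
    fail over but are left off, so they are excluded from the result.
    """
    startable = [(name, prio) for name, prio in vms.items()
                 if prio is not StartPriority.NO_AUTO_START]
    # sorted() is stable, so VMs in the same bucket keep their input order.
    return [name for name, _ in sorted(startable, key=lambda item: -item[1])]


# Hypothetical three-tier application plus a test VM that should stay off.
vms = {
    "web01": StartPriority.LOW,
    "sql01": StartPriority.HIGH,
    "app01": StartPriority.MEDIUM,
    "test01": StartPriority.NO_AUTO_START,
}
print(startup_order(vms))
```

With these sample values the SQL server comes up first, then the middle tier, then the web server, and the test VM is not started at all.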

How does it work?  Check it out for yourself:

 

See Windows Server 8 Hyper-V Simultaneous Live Migration & Cluster Host Drain In Action

Yesterday I showed you how my Windows Server 8 Hyper-V lab is currently built (I’m in the process of wiping to build something more flexible).  Today, I’m going to show you two things:

  1. Not just Live Migration in action, but simultaneous Live Migration.  I’ll be moving all 66 VMs from Host1 to Host2, and they’ll move 20 at a time.  This is a huge improvement over the 1 at a time that we can do in W2008 R2, and way more than the maximum of 4 (on 1 GbE) or 8 (on 10 GbE) that vSphere 5.0 can handle.  BTW, I was moving all of them at once last night.
  2. I’m going to perform the move by draining Host1 using a new pause function.  This is used for host maintenance (similar to VMM maintenance mode) and will Live Migrate the VMs to the most suitable host (Failover Clustering measures memory, where VMM does Intelligent Placement).  This pause function is used by Windows Server 8 Cluster Aware Updating.
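
The idea of draining a host to "the most suitable host" based on memory can be sketched in a few lines of Python.  This is a guess at the general shape only: the greedy "most free memory wins" rule, the `drain_host` function, and the host and VM names are assumptions for illustration, and the real placement logic in Failover Clustering is not described here.  The limit on simultaneous migrations is not modelled.

```python
def drain_host(vms, free_memory_mb):
    """Plan a destination for each VM on a host that is being drained.

    vms: dict of VM name -> assigned RAM in MB (VMs on the draining host).
    free_memory_mb: dict of remaining host name -> free RAM in MB.
    Returns a dict of VM name -> destination host.
    """
    free = dict(free_memory_mb)  # work on a copy, leave the input untouched
    plan = {}
    # Place the largest VMs first so small ones do not squeeze them out.
    for name, ram in sorted(vms.items(), key=lambda kv: kv[1], reverse=True):
        target = max(free, key=free.get)  # host with the most free memory
        if free[target] < ram:
            raise RuntimeError(f"no host has {ram} MB free for {name}")
        plan[name] = target
        free[target] -= ram
    return plan


# Hypothetical VMs on Host1 and free memory on the other cluster nodes.
plan = drain_host(
    {"vm1": 2048, "vm2": 1024, "vm3": 512},
    {"Host2": 3000, "Host3": 2500},
)
print(plan)
```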

In the demo, you’ll see my 20 GbE NIC team that is used for Live Migration and the 1 GbE file server where the VMs are located:

 

 

How I Currently Have A Windows Server 8 Hyper-V Cluster Configured

The great thing about a lab with lots of NICs is that you can configure in lots of ways.  Today I built out a new Windows Server 8 Hyper-V cluster, using SMB 2.2 as the storage for the VMs.  This is how I configured it:

[Image: diagram of the cluster network configuration]

You might notice that the configuration isn’t all that different from what you’re used to.  You still require certain communication channels, but how you create those channels can vary wildly in Windows Server 8.  In W2008/R2 you require physical NICs.  In the new version of Hyper-V you can still do that, or you can get the same effect with native NIC teams that aggregate bandwidth (as I did here), or with converged fabrics (using as few as 2 physical NICs), or fabrics with isolation, and on and on.  But you still need 2 channels for the cluster, as you can see in the middle of my diagram.

I went a little nuts then.  I used my PowerShell script to create 76 VMs.  Off I went to a meeting, and they were waiting for me when I came back.  And then I did my first stress test of concurrent Live Migration.  You can see that I had a 20 GbE pipe made up of 2 * 10 GbE NICs in a NIC team.  It ran pretty quickly … 38 GB of VM RAM from Host1 to Host2.  I think I might try to script that Live Migration, and run it back and forth again and again to see what happens.

KB2673129 – Slow Shutdown Time For A Cluster Node In A Windows 2008R2-based cluster

And the final patch for clustering (there are more, so you should check out the Failover Clustering patching wiki).  I’m only listing the Hyper-V related ones.

“Consider the following scenario:

  • You set a preferred node in a Windows Server 2008 R2-based failover cluster.
  • You set the failback policy of this node to Allow Failback, select the Failback between option, and then set a failback time interval.
  • You move a cluster resource group from the preferred node to another node.
  • You try to shut down the preferred node.

In this scenario, the shutdown operation stops responding at the Shutting down Cluster Service phase. It takes about 30 minutes for the node to shut down.

This issue occurs because the Cluster service tries to fail back the resource group every 15 minutes when the following conditions are true:

  • A failback time interval is set.
  • The time that is used to shut down the node is longer than the failback time interval.

When the failback process fails, the thread that performs the operation sleeps for 15 minutes. If you try to shut down the Cluster service on this node, the operation waits until the cluster shutdown times out”.

A supported hotfix is available from Microsoft.

KB2639032 – “0x0000003B” Stop Error When Connection To CSV Is Lost On W2008R2-based Failover Cluster

It’s a busy patch download and test day for you Hyper-V admins!  Another patch for CSV (and there are more to come).  This hotfix refers to a "0x0000003B" stop error when a connection to a CSV is lost on a Windows Server 2008 R2-based failover cluster.

“Consider the following scenario:

  • You enable the cluster shared volume (CSV) feature on a Windows Server 2008 R2-based failover cluster.
  • You add some disks to the list of cluster shared volumes.
  • The connection to a disk is lost unexpectedly.

In this scenario, you receive a Stop error message that resembles the following:

STOP: 0x0000003B (parameter1, parameter2, parameter3, parameter4)

Notes

  • This Stop error describes a SYSTEM_SERVICE_EXCEPTION issue.
  • The parameters in this Stop error message vary, depending on the configuration of the computer.
  • Not all "0x0000003B" Stop errors are caused by this issue.

This issue occurs because of a race condition in Partition Manager (Partmgr.sys). Partition Manager does not use a removal lock for the read/write I/O request packet (IRP). This behavior causes Partition Manager to access a nonexistent device object. Therefore, you receive the Stop error message that is mentioned in the "Symptoms" section”.

A supported hotfix is available from Microsoft.

KB2674551 – Redirected Mode Enabled Unexpectedly In CSV When Running 3rd-Party Application In W2008R2-Based Cluster

Another Hyper-V related (clustering this time) hotfix was just released by Microsoft.  The situation is when redirected mode is enabled unexpectedly in a Cluster Shared Volume when you are running a third-party application in a Windows Server 2008 R2-based cluster.

“Consider the following scenario:

  • You are running a third-party application in a Windows Server 2008 R2-based cluster that has the Cluster Shared Volumes (CSV) feature enabled.
  • The third-party application has a mini-filter driver that uses an altitude value to determine the load order of the mini-filter driver.
  • The altitude value contains a decimal point.
  • You set a Cluster Shared Volume to online mode.

In this scenario, the Cluster Shared Volume is set to redirected mode.  This issue occurs because the cluster service assumes that the altitude value is an integer when the cluster service parses the altitude value”.
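
The root cause is easy to picture with a tiny Python sketch.  This is not the Cluster service’s code, just an illustration of the same class of bug: the function names and the altitude values are made up, and the only point is that an integer parser chokes on a value with a decimal point while a decimal parser does not.

```python
from decimal import Decimal


def parse_altitude_as_int(value):
    """Naive parser: fails on fractional altitudes such as '325000.7'."""
    return int(value)


def parse_altitude(value):
    """Parse an altitude string as an exact decimal number."""
    return Decimal(value)


for altitude in ("325000", "325000.7"):
    try:
        parse_altitude_as_int(altitude)
        print(altitude, "parses as an integer")
    except ValueError:
        print(altitude, "is rejected by the integer parser")

# Decimal keeps the fractional part, so relative load order is preserved.
print(parse_altitude("325000.7") > parse_altitude("325000"))
```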

A supported hotfix is available from Microsoft.

Windows Server 8 Hyper-V Management Improvements

Another big investment by Microsoft in Windows Server 8 Hyper-V was how we interact with the product. With lots more functionality, and some of it being very advanced and not required by everyone, they had to decide how to present it in the GUI.  And with huge cluster scale out (up to 64 hosts per cluster) and target markets such as hosting and large enterprise, automation was of great importance.

The GUI – Hyper-V Manager console (HMC)

On the face of it, the GUI has not changed much.  There is no ribbon bar and things can be found where they previously were in the Windows Server 2008 Hyper-V and Windows Server 2008 R2 Hyper-V HMCs.

Often we fire up the HMC to just look for information.  Tabs have been added in the lower centre pane of HMC to show us information, e.g. summary, memory, networking, and Hyper-V Replica (aka Replica).

Nested Nodes

When you open the settings of a VM to change its configuration you will notice that the CPU and Networking nodes on the left are nested.  There are sub-nodes with more settings.  This is done for a few reasons, including:

  • It cleans up the GUI.  Even with newly added scroll bars, there’s only so much you can squeeze into a single screen without making things messy and unusable.
  • It hides away advanced features that should only be used by engineers who know what they do and know that they need them, e.g. the NUMA override settings.

Clustering Interaction

A classic problem for the forums was when a person would edit the settings of a VM in the HMC and live migrate the VM from one host to another in a host cluster.  Their new settings were lost because the cluster database was not updated.  You had to either use Failover Cluster Manager (FCM) to edit the VM settings (auto-update of settings) or remember to manually update the VM resource in FCM after editing the VM in HMC.

Now, HMC will detect that a VM is clustered and prevent you from editing the settings.  You must use the FCM instead, and quite right too!

VMConnect

Have you ever been ticked off when you use VMConnect to get a console connection to a VM and then you fail it over to another node in a Hyper-V Cluster?  Actually, ticked off isn’t the right word the first time you see this: you crap yourself when VMConnect loses the console connection to the VM and confusingly tells you that the VM must have been deleted!  That’s changing in Windows Server 8.  Yes, VMConnect will disconnect – briefly.  The source host for the VM will redirect the VMConnect session to the VM on the destination host.  No more tingling in the left arm or tightening of the heart when working on a VM at midnight.  Hyper-V engineers and their doctors thank you, Microsoft!

PowerShell 3.0 Cmdlets

The big change is that Hyper-V will have built-in PowerShell (POSH) cmdlets for the very first time in Windows Server 8.  Even to a POSH-disabled person like me, the cmdlets looked easy to use and very powerful in the hands-on lab that I did.  For you POSH purists, I’ve been told that the Hyper-V POSH cmdlet specs were written so things would be done the POSH way.

With some help I had created a script that read in some specs from a CSV file, created a bunch of differencing disks, created lots of VMs, and connected them to those disks.  One test lab up and running in a few minutes, that could be recreated at a moment’s notice.  I’m sure with more practice, I could have made the script much more elegant than I had in the limited time window.
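
To show the shape of that kind of script without a Hyper-V host to hand, here is a small Python sketch that reads VM specs from CSV text and prints the PowerShell command lines a lab build might run.  It is not the script from the lab.  The CSV columns (`Name`, `MemoryMB`), the paths, and the `build_commands` function are hypothetical, and the cmdlet parameters shown (`New-VHD -Path -ParentPath -Differencing`, `New-VM -Name -MemoryStartupBytes -VHDPath`) are the ones in the released Windows Server 2012 Hyper-V module, which may differ from pre-release builds.

```python
import csv
import io


def build_commands(csv_text, parent_vhd, vhd_folder):
    """Turn CSV rows (columns: Name, MemoryMB) into PowerShell command lines."""
    commands = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["Name"]
        vhd_path = f"{vhd_folder}\\{name}.vhdx"
        memory_bytes = int(row["MemoryMB"]) * 1024 * 1024
        # Differencing disk: stores only the changes relative to the parent.
        commands.append(
            f'New-VHD -Path "{vhd_path}" -ParentPath "{parent_vhd}" -Differencing'
        )
        # VM attached to the differencing disk created above.
        commands.append(
            f'New-VM -Name "{name}" -MemoryStartupBytes {memory_bytes} '
            f'-VHDPath "{vhd_path}"'
        )
    return commands


# Hypothetical spec file contents and paths.
specs = "Name,MemoryMB\nVM01,512\nVM02,1024\n"
commands = build_commands(specs, r"D:\Parent\base.vhdx", r"D:\VMs")
for line in commands:
    print(line)
```

Generating the commands as text first makes it easy to eyeball the plan before anything is created.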

It’s this sort of thing that the POSH cmdlets are intended to enable.  Big hosting companies can automate deployment from their “control panels”.  Enterprises can automate bulk configuration changes.  We people who demo can deploy a new lab in seconds.  And with POSH 3.0 Workflows you can build complex scripts that work reliably and in an orchestrated manner across many machines and applications.

Just like Exchange, not every admin function is in the GUI.  Some things will have to be done in POSH.  I guess I will have to learn it after all these years of saying “I will have to learn PowerShell”.

Recent KB Hotfixes for Windows Server 2008 R2 Failover Clustering

I’m catching up after my Norway vacation and the recent Intune roadshow in Ireland.

KB2462468: Unable to manage cluster using failover cluster manager. Error Received: "Connection to the cluster is not allowed since you are not an administrator on the cluster node(s) "

While managing cluster using failover cluster management console we receive the following error:
Error
The operation has failed.
An error occurred connecting to the cluster ‘.’.
[Expanded Information]
An error occurred trying to display the cluster information.
Connection to the cluster is not allowed since you are not an administrator on the cluster node(s) (Node name)

or
When you run the Cluster validation you receive the following error:
Unable to determine if you have administrator privileges on server "Node name" . Please ensure sure that the server service and remote registry services are enabled, and that the firewall is properly configured for remote access.
Managing the cluster from the command prompt still works: you can list groups (cluster group), list resources (cluster . res), and even fail over groups (cluster group "cluster group" /move).  Only the GUI (Failover Cluster Management console) returns the error.
Note: The commands to list groups and resources, and to move a group, are given in brackets.

This issue occurs if the Server service is not started on the node that is named in the error. Expand the error to check the node name.
Additionally, you may see this issue if the protocols that are required for Microsoft clustering are not enabled.

Open the Services console and start the Server service.
Ensure that both of the following protocols are checked on the cluster network:
1. Client for Microsoft networks
2. File and printer sharing for Microsoft networks

KB2008795: Unable to access ClusterStorage folder on a passive node in a server 2008 R2 cluster

On a Windows Server 2008 R2 cluster with the Cluster Shared Volume (CSV) feature enabled, a user may be unable to access a CSV volume from a passive (non-coordinator) node. When clicking on a CSV volume, Explorer may hang. One or all of the following events may be displayed:

Event ID: 5120
Source: Microsoft-Windows-FailoverCluster
Level: Error
Description: Cluster Shared Volume "volume_name" is no longer available on this node because of ‘STATUS_BAD_NETWORK_PATH(c00000be)’. All I/O will temporarily be queued until a path to the volume is re-established.

Event ID: 5120
Source: Microsoft-Windows-FailoverCluster
Level: Error
Description: Cluster Shared Volume "volume_name" is no longer available on this node because of ‘STATUS_CONNECTION_DISCONNECTED(c000020c)’. All I/O will temporarily be queued until a path to the volume is reestablished.

Event ID: 5120
Source: Microsoft-Windows-FailoverCluster
Level: Error
Description: Cluster Shared Volume "volume_name" is no longer available on this node because of ‘STATUS_MEDIA_WRITE_PROTECTED(c00000a2)’. All I/O will temporarily be queued until a path to the volume is reestablished.

Event ID generated: 5142
Source: Microsoft-Windows-FailoverCluster
Description: Cluster Shared Volume "volume_name" (‘Cluster Disk #’) is no longer accessible from this cluster node because of error ‘ERROR_TIMEOUT(1460)’. Please troubleshoot this node’s connectivity to the storage device and network connectivity.
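
If you are trawling through a lot of these events, the status name is the useful bit.  Here is a minimal Python sketch that pulls the status name and code out of a description string like the ones above.  The `extract_status` function and the regular expression are my own illustration, written only against the sample descriptions shown here, so treat the pattern as an assumption rather than a documented event format.

```python
import re

# Matches "because of 'NAME(code)'" and "because of error 'NAME(code)'".
# \W accepts whichever quote character surrounds the status.
STATUS_RE = re.compile(r"because of (?:error )?\W(\w+)\((\w+)\)")


def extract_status(description):
    """Return (status_name, code) from a CSV event description, or None."""
    match = STATUS_RE.search(description)
    return (match.group(1), match.group(2)) if match else None


sample = (
    'Cluster Shared Volume "volume_name" is no longer available on this node '
    "because of ‘STATUS_CONNECTION_DISCONNECTED(c000020c)’. All I/O will "
    "temporarily be queued until a path to the volume is reestablished."
)
print(extract_status(sample))
```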

When accessing a CSV volume from a passive (non-coordinator) node, the disk I/O to the owning (coordinator) node is routed through a ‘preferred’ network adapter and requires SMB be enabled on that network adapter. For SMB connections to work on these network adapters, the following protocols must be enabled:

  • Client for Microsoft Networks
  • File and Printer Sharing for Microsoft Networks

Review each cluster node and verify that the following protocols are enabled on the network adapters that are available for cluster use:

  • Client for Microsoft Networks
  • File and Printer Sharing for Microsoft Networks

1. Click Start, click Run, type ncpa.cpl, and then click OK.
2. Right-click the local area connection that is associated with the network adapter, and then click Properties.
3. Verify that the above protocols appear in the This connection uses the following items box. If either is missing, follow these steps:
a. Click Install, click Client, and then click Add.
b. Select the missing protocol, click OK, and then click Yes.
4. Verify that the check box that appears next to Client for Microsoft Networks is selected.

Personal Note: Those two articles are closely related.  It appears that people are incorrectly unbinding the 2 required networking protocols for CSV: Client for Microsoft Networks & File and Printer Sharing for Microsoft Networks.

KB2637197: CSV LUNs fail if you use a VSS hardware provider to back up virtual machines on a Windows Server 2008 R2-based cluster

Consider the following scenario:

  • You configure a failover cluster that consists of servers that are running Windows Server 2008 R2.
  • You create a cluster shared volume (CSV) that includes some virtual machines that are owned by different cluster nodes.
  • You back up the virtual machines by using a Volume Shadow Copy Service (VSS) hardware provider. For example, you back up a protection group in Microsoft System Center Data Protection Manager (DPM) 2010.
  • The owner of the CSV changes during the backup process.

In this scenario, the CSV logical unit numbers (LUNs) enter a failed state, and do not come online on the next cluster node. Therefore, the backup process fails.
Notes

  • This issue does not occur if you do not use a VSS hardware provider to back up the virtual machines.
  • This issue does not occur if the change of owner of the CSV is not triggered during the backup process.

This issue occurs because the Cluster service incorrectly accesses stale information to determine whether a CSV LUN is in the correct state to start the backup process.

A supported hotfix is available from Microsoft. However, this hotfix is intended to correct only the problem that is described in this article. Apply this hotfix only to systems that are experiencing the problem described in this article. This hotfix might receive additional testing. Therefore, if you are not severely affected by this problem, we recommend that you wait for the next software update that contains this hotfix.