CSV Cache Is Not Used With Heat-Map-Tracked Tiered Storage Spaces

I had an email from Bart Van Der Beek earlier this week questioning an aspect of my kit list for a Hyper-V cluster that is using a SOFS with Storage Spaces for the shared cluster storage. I had added RAM to the SOFS nodes to use for CSV Cache. Bart had talked to some MSFT people who told him that CSV Cache would not be used with tiered storage spaces. He asked if I knew about this. I did not.

So I had the chance to ask Elden Christensen (Failover Clustering PM, TechEd speaker, author of many of the clustering blog posts, and all-round clustering guru) about it tonight. Elden explained that:

  • No, CSV Cache is not used with tiered storage spaces where the heat map is used. This is where the usage of 1 MB blocks is tracked and those blocks are automatically promoted to the hot tier, demoted to the cold tier, or left where they are on a scheduled basis.
  • CSV Cache is used when the heat map is not used and you manually pin entire files to a tier. This would normally only be done for VDI. However, enabling dedupe on that volume will offer better performance than CSV Cache.
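For the pinning scenario, here is a hedged sketch of what that looks like in PowerShell on WS2012 R2; the tier name, drive letter, and file path are examples, not anything from my kit list:

```powershell
# Pin a VDI gold image VHDX to the SSD tier (tier/file names are examples)
Set-FileStorageTier -FilePath "E:\Shares\VDI\GoldImage.vhdx" `
    -DesiredStorageTierFriendlyName "SSDTier"

# Trigger the tiering optimization job now instead of waiting for the scheduled task
Optimize-Volume -DriveLetter E -TierOptimize

# Verify the pinning status of files on the volume
Get-FileStorageTier -VolumeDriveLetter E
```

Once a file is pinned, the scheduled heat-map optimization leaves it where you put it.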

So, if you are creating tiered storage spaces in your SOFS, there is no benefit in adding lots of RAM to the SOFS nodes.

Thanks for the heads up, Bart.

Right-Click On Start And Windows+X Fail To Open Power User Menu On Windows 8.X

This one has been bugging me for a couple of weeks and I just managed to find a fix. Right-clicking on the Start button and pressing Windows+X both failed to do anything.

The fix?

1) Open up command prompt and run:

xcopy %SystemDrive%\Users\Default\AppData\Local\Microsoft\Windows\WinX %userprofile%\AppData\Local\Microsoft\Windows\WinX /e /y

2) Log out and log in again. Everything should work as expected.


KB2914974–Cluster Validation Wizard Might Not Discover All LUNs On WS2012 Or WS2012 R2 Failover Cluster

The Failover Cluster Validation Wizard can perform a number of storage tests to determine the suitability and supportability of the shared storage of a potential new cluster. This is important for a Hyper-V cluster that will use directly attached shared storage such as a SAN (not SMB 3.0).

Microsoft has published a KB article for when these storage tests on a multi-site (stretch, cross-campus, or metro) failover cluster may not discover all shared LUNs on Windows Server 2012 or Windows Server 2012 R2.

Symptoms

Consider the following scenario:

  • You have a Windows Server 2012 or Windows Server 2012 R2 multi-site failover cluster.
  • A multi-site storage area network (SAN) is configured to have site-to-site mirroring.
  • You use the Validate a Configuration Wizard to run a set of validation tests on the failover cluster.

In this scenario, storage tests may not detect all logical unit numbers (LUNs) as shared LUNs.

Cause

The storage validation test selects only shared LUNs. A LUN is determined to be shared if its disk signature, device identification number (page 0x83), and storage array serial number are the same on all cluster nodes. When you have site-to-site mirroring configured, a LUN in one site (site A) has a mirrored LUN in another site (site B). These LUNs have the same disk signatures and device identification number (page 0x83), but the storage array serial numbers are different. Therefore, they are not recognized as shared LUNs.

Resolution

To resolve the issue, run all the cluster validation tests before you configure the site-to-site LUN mirroring.
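If you prefer PowerShell over the wizard, the same validation can be run with the FailoverClusters module; node names below are placeholders:

```powershell
# Run the full validation suite before mirroring is configured
Test-Cluster -Node "Node1","Node2"

# Or re-run only the storage tests later
Test-Cluster -Node "Node1","Node2" -Include "Storage"
```

The report is written to an HTML file in the current user's temp folder, same as the wizard's output.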

Note If the validation test is needed afterward for support situations, the LUNs that were not selected for the storage validation tests are still supported by Microsoft and the storage vendor as valid shared LUNs.

KB2916993 – Stop Error 0x9E In WS2012 Or Windows 8

This KB article looks like it affects Windows Server 2012 clusters so I’m including it in today’s posts. The fix is for when a Stop error 0x9E occurs in Windows Server 2012 or Windows 8.

Symptoms

When you have a cluster node that is running Windows Server 2012, you may encounter a 0x9E Stop error.

Cause

This issue occurs because of lock contention between the memory manager and the Cluster service or a resource monitor when a large file is mapped into system cache.

A hotfix is available to resolve this issue.

KB2913695–OffloadWrite Does PrepareForCriticalIo For Whole VHD On WS2012 Or WS2012 R2 Hyper-V

Microsoft has published a hotfix for when OffloadWrite does PrepareForCriticalIo for the whole VHD on a Windows Server 2012 or Windows Server 2012 R2 Hyper-V host.

Symptoms

Consider the following scenario:

  • You have a Hyper-V host that is running Windows Server 2012 or Windows Server 2012 R2.
  • You copy a file in a virtual machine.
  • There is an offload write for the virtual hard disk (VHD) in the host.

In this scenario, NTFS in the host does PrepareForCriticalIo for the whole VHD. This operation may have the following negative consequences for Cluster Shared Volumes:

  • Redirected I/O may time out.
  • Snapshot creation can be stuck until the offload write is complete.
  • Volume dismount will be blocked by in-flight I/O. This can cause the Physical Disk Resource to be detected as deadlocked if dismount takes more than 3 minutes, or the cluster node to be bugchecked if dismount takes more than 20 minutes.

A hotfix is available for this issue.

KB2913461 – Hyper-V Live Storage Migration Fails When Moving Files to WS2012 CSV

Keep in mind that one of the features of Live Migration is that the original virtual machine and/or files are not removed until after the entire process is complete. This “burn no bridges” approach ensures that your virtual machine remains running no matter what happens during the migration process. I’ve seen this personally during the preview releases of WS2012 and WS2012 R2 when stress testing Live Migration and other features.

Microsoft has published a KB article for when Hyper-V storage migration fails when you try to migrate VHD and configuration files to CSV volumes in Windows Server 2012.

Symptoms

Consider the following scenario:

  • You install the Hyper-V role on a Windows Server 2012-based two-node failover cluster.
  • You have two Cluster Shared Volumes (CSV) volumes.
  • You create a virtual machine on a cluster node. The virtual machine has a single 60-gigabyte (GB) fixed-size virtual hard disk (VHD).
    Note The virtual machine is not created on a CSV volume.
  • On the cluster node, the available space on drive C is less than 20 GB.
  • In the Hyper-V Manager console, you try to move the VHD file to one CSV volume, and you try to move the configuration files to the other CSV volume.
    Note The CSV volumes have enough space to hold the VHD file and the configuration files.

In this scenario, the migration operation fails, and you receive an error message that resembles the following:

Migration did not succeed. Not enough disk space at ”.

Note This issue still occurs after you install hotfix 2844296 (Shared Nothing Live Migration fails when you try to migrate a virtual machine to a destination server in Windows Server 2012). For more information, see the article in the Microsoft Knowledge Base: http://support.microsoft.com/kb/2844296/

Cause

This issue occurs because the target CSV volumes are incorrectly identified as being a system drive volume instead of multiple separate CSV volumes.

A hotfix is available to resolve this issue.

KB2901237–Hyper-V Replica Is Created Unexpectedly After Restarting VMMS On Replica WS2012 Host

The Virtual Machine Management Service (VMMS) runs in user mode in the management OS of every Hyper-V host. It has nothing to do with SCVMM; that’s just an unfortunate similarity in names. The VMMS provides the WMI or management interface to Hyper-V for all management tools, such as PowerShell, Hyper-V Manager, or Failover Cluster Manager.

Microsoft published a KB article for when a standard Hyper-V replica is created unexpectedly after you restart the VMMS service in Windows Server 2012.

Symptoms

Consider the following scenario:

  • You have Windows Server 2012-based Hyper-V servers that are running in an environment that has Hyper-V Replica deployed.
  • You set more than two recovery points. 
  • You restart the VMMS service on a replica server, or you restart the replica server.
  • You wait about 5 minutes until the first delta arrives from the primary site.

In this scenario, a standard replica (recovery point) is created unexpectedly. 
Note If the time interval between the latest recovery point and the arrival of the delta is less than 60 minutes, a standard replica should not be created.

Cause

This issue occurs because the VMMS service incorrectly compares the time stamp of the earliest recovery point to the latest delta time stamp. Therefore, the system takes a new snapshot every time the VMMS service is restarted.

A hotfix has been published to resolve this issue. It’s not an issue I’d expect to see too often but the fix is there.

Microsoft Publishes January 2014 Update Rollup For Windows 8, Windows RT, & WS2012

Time for you to do … exactly nothing for a month, because Microsoft has pushed out another UR for Windows 8, Windows RT, and Windows Server 2012. So make sure this sucker is unapproved and sits like that for a month until some other sucker has tested it for you. If there is a problem (and based on the last 12 months, there probably is) then let that other person find the issue, report it, and wait for Microsoft to re-issue a fixed update rollup.

After digging into the contents of the update, we can see that there are networking fixes and a cluster fix. The latter is KB2876391, "0x0000009E" Stop error on cluster nodes in a Windows Server-based multi-node failover cluster environment.

Symptoms

Assume that you have a Windows Server 2008 R2 Service Pack 1 (SP1) or Windows Server 2012-based multi-node failover cluster that uses the Microsoft Device Specific Module (MSDSM) and Microsoft Multipath I/O (MPIO). The following events occur at almost the same time:

  • A new instance of an existing device arrives. Specifically, a new path to an MPIO disk is generated.
  • MSDSM finishes an I/O request. The request was the last outstanding I/O request.

In this scenario, some cluster nodes crash. Additionally, you receive a Stop error message that resembles the following:

STOP: 0x0000009E (parameter1, parameter2, parameter3, parameter4)

Notes

  • This Stop error describes a USER_MODE_HEALTH_MONITOR issue.
  • The parameters in this Stop error message vary, depending on the configuration of the computer.
  • Not all "Stop 0x0000009E" errors are caused by this issue.

Cause

This issue occurs because a remove lock on a logical unit number (LUN) is obtained two times, but only released one time. Therefore, the Plug and Play (PnP) manager cannot remove the device, and then the node crashes.


The hotfix is included in the UR. Despite what the Premier Sustained Engineering author wrote, this is not just for “Windows Server 2008 R2 SP1-based multi-node failover cluster environment” but it is also for WS2012.

KB2923885 – WS2012 VMs With SR-IOV on WS2012 R2 Hyper-V Incorrectly Say Integration Components Require Upgrade

Microsoft has released a KB article to confirm a problem where Hyper-V Manager incorrectly reports "Update required" for the Hyper-V integration services in Windows Server 2012 guest operating systems that use SR-IOV on Windows Server 2012 R2 Hyper-V hosts.

Symptoms

Assume that you have a Windows Server 2012 R2-based Hyper-V server. A Windows Server 2012-based guest operating system that has integration services up to date and that uses Single Root I/O Virtualization (SR-IOV) is running on the server. After you restart the guest operating system, Hyper-V Manager incorrectly reports the integration services state of the guest operating system as Update required.

Status

This is a known issue of Windows Server 2012 R2. Except for the status report, there are no negative effects to the Hyper-V system.

Microsoft has confirmed that this is a problem.

You can ignore this annoying warning. I suspect that if the warning status appears in VMM then it is really annoying. But very few of you should be affected because SR-IOV is not needed by many VMs.

A Kit/Parts List For A WS2012 R2 Hyper-V Cluster With DataOn SMB 3.0 Storage

I’ve had a number of requests to specify the pieces of a solution where there is a Windows Server 2012 R2 Hyper-V cluster that uses SMB 3.0 to store virtual machines on a Scale-Out File Server with Storage Spaces (JBOD). So that’s what I’m going to try to do with this post. Note that I am not going to bother with pricing:

  • It takes too long to calculate
  • Prices vary from country to country
  • List pricing is usually meaningless; work with a good distributor/reseller and you’ll get a bid/discount price.
  • Depending on where you live in the channel, you might be paying distribution price, trade price, or end-customer price, and that determines how much margin has been added to each component.
  • I’m lazy

Scale-Out File Server

Remember that an SOFS is a cluster that runs a special clustered file server role for application data. A cluster requires shared storage. That shared storage will be one or more Mini-SAS-attached JBOD trays (on the Storage Spaces HCL) with Storage Spaces supplying the physical disk aggregation and virtualization (normally done by SAN controller software).
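As a rough sketch of what that role looks like once the cluster and a CSV exist, here is the PowerShell; the role name, share name, path, and accounts are examples only:

```powershell
# Add the Scale-Out File Server role to an existing failover cluster
Add-ClusterScaleOutFileServerRole -Name "SOFS1"

# Create a share on a CSV for Hyper-V hosts to store VMs over SMB 3.0.
# Hyper-V host computer accounts need full control (note the trailing $).
New-SmbShare -Name "VMs1" -Path "C:\ClusterStorage\Volume1\Shares\VMs1" `
    -FullAccess "DOMAIN\Host1$", "DOMAIN\Host2$", "DOMAIN\Hyper-V Admins"

# Continuous availability is on by default for clustered shares; verify it
Get-SmbShare -Name "VMs1" | Select-Object Name, ContinuouslyAvailable
```

You would also set matching NTFS permissions on the folder; the share ACL alone is not enough.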

On the blade versus rack server question: I always go rack server. I’ve been burned by the limited flexibility and high costs of blades. Sure you can get 64 blades into a rack … but at what cost!?!?! FlexFabric-like solutions are expensive, and strictly speaking, not supported by Microsoft – not to mention they limit your bandwidth options hugely. The massive data centres that I’ve seen and been in use 1U and 2U rack servers. I like 2U rack servers over 1U because 1U rack servers such as the R420 have only 1 full-height and 1 half-height PCI expansion slot. That half-height slot makes for tricky expansion.

For storage (and more) networking, I’ve elected to go with RDMA networking. Here you have two good choices:

  • iWARP: More affordable and running at 10 GbE – what I’ve illustrated here. Your vendor choice is Chelsio.
  • Infiniband: Amazing speeds (56 Gbps with faster to come) but more expensive. Your vendor choice is Mellanox.

I’ve ruled out RoCE. It’s too damned complicated – just ask Didier Van Hoye (@workinghardinit).

There will be two servers:

  • 2 x Dell R720: Dual Xeon CPU, 6 GB RAM, rail kits, on-board quad port 1 GbE NICs. The dual CPUs give me scalability to handle lots of hosts/clusters. The 4 x 1 GbE NICs are teamed (dynamic load distribution) for management functionality. I’d upgrade the built-in iDRAC Essentials to the Enterprise edition to get the KVM console and virtual media features. A pair of disks in RAID1 configuration are used for the OS in each of the SOFS nodes.
  • 10 x 1 GbE cables: This is to network the 4 x 1 GbE onboard NICs and the iDRAC management port. Who needs KVM when you’ve already bought it in the form of iDRAC.
  • 2 x Chelsio T520-CR: Dual port 10 GbE SFP+ iWARP (RDMA) NICs. These two rNICs are not teamed (teaming is not compatible with RDMA). They will reside on different VLANs/subnets for SMB Multichannel (a cluster requirement). The role of these NICs is to converge SMB 3.0 storage and cluster communications. I might even use these networks for backup traffic.
  • 4 x SFP+ cables: These are to connect the two servers to the two SFP+ 10 GbE switches.
  • 2 x LSI 9207-8e Mini-SAS HBAs: These are dual port Mini-SAS adapters that you insert into each server to connect to the JBOD(s). Windows MPIO provides the path failover.
  • 2 x Windows Server Standard Edition: We don’t need virtualization rights on the SOFS nodes. Standard edition includes Failover Clustering.

Regarding the JBODs:

Only use devices on the Microsoft HCL for your version of Windows Server. There are hardware features in these “dumb” JBODs that are required. And the testing process will probably lead to the manufacturer tweaking their hardware.

Note that although “any” dual channel SAS drive can be used, some firmwares are actually better than others. DataOn Storage maintains its own HCL of tested HDDs, SSDs, and HBAs. Stick with the list that your JBOD vendor recommends.

How many and what kind of drives do you need? That depends. My example is just that: an example.

How many trays do you need? Enough to hold your required number of drives 😀 Really though, if I know that I will scale out to fill 3 trays then I will buy those 3 trays up front. Why? Because 3 trays is the minimum required for tray fault tolerance with 2-way mirror virtual disks (LUNs). Simply going from 1 tray to 2 and then 3 won’t do because data does not relocate.

Also remember that if you want tiered storage then there is a minimum number of SSDs (STRONGLY) recommended per tray.

Regarding using SATA drives: DON’T DO IT! The available interposer solution is strongly discouraged, even by DataOn.  If you really need SSD for tiered storage then you really need to pay (through the nose).
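To tie the tray fault tolerance and tiering points together, here is a hedged sketch of carving an enclosure-aware, tiered, 2-way mirror space from pooled JBOD disks; pool, tier, and disk names plus sizes are examples, not part of the kit list:

```powershell
# Pool every eligible JBOD disk, defaulting new spaces to enclosure awareness
$disks = Get-PhysicalDisk -CanPool $true
New-StoragePool -FriendlyName "Pool1" -StorageSubSystemFriendlyName "*Storage Spaces*" `
    -PhysicalDisks $disks -EnclosureAwareDefault $true

# Define the SSD and HDD tiers
$ssd = New-StorageTier -StoragePoolFriendlyName "Pool1" -FriendlyName "SSDTier" -MediaType SSD
$hdd = New-StorageTier -StoragePoolFriendlyName "Pool1" -FriendlyName "HDDTier" -MediaType HDD

# A tiered 2-way mirror; enclosure awareness spreads the copies across trays
New-VirtualDisk -StoragePoolFriendlyName "Pool1" -FriendlyName "CSV1" `
    -StorageTiers $ssd,$hdd -StorageTierSizes 200GB,2TB `
    -ResiliencySettingName Mirror -NumberOfDataCopies 2 -IsEnclosureAware $true
```

With only 1 or 2 trays, the enclosure-aware creation would fail or silently lose tray fault tolerance, which is why the 3 trays go in up front.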

Here’s my EXAMPLE configuration:

  • 3 x DataOn Storage DNS-1640D: 24 x 2.5” disk slots in each 2U tray, each with a blank disk caddy for a dual channel SAS SSD or HDD drive. Each has dual boards for Mini-SAS connectivity (A+B for server 1 and A+B for server 2), and A+B connectivity for tray stacking. There is also dual PSU in each tray.
  • 12 x STEC S842E400M2 400GB SSD: Go google the price of these for a giggle! These are not your typical (or even “enterprise”) SSD that you’ll stick in a laptop.  I’m putting 4 into each JBOD, the recommended minimum number of SSDs in tiered storage if doing 2-way mirroring.
  • 48 x Seagate ST900MM0026 900 GB 10K SAS HDD: This gives us the bulk of the storage. There are 20 slots free (after the SSDs) in each JBOD and I’ve put 16 disks into each. That gives me loads of capacity and some wiggle room to add more disks of either type.
  • 18 x Mini-SAS cables: I’m not looking at a diagram and I’m tired, so 18 might not be the right number. There’s a total of 10U of hardware in the SOFS (servers + JBODs), so short Mini-SAS cables will do the trick. These are used to attach the servers to the JBODs and to daisy-chain the JBODs. The connections are fault tolerant – hence the high number of cables.

And that’s the SOFS, servers + JBODs with disks.

Just to remind you: it’s a sample spec. You might have one JBOD, you might have 4, or you might go with the 60 disk slot models. It all depends.

Hyper-V Hosts

My hosting environment will consist of one Hyper-V cluster with 8 nodes. This could be:

  • A few clusters, all sharing the same SOFS
  • One or more clusters with some non-clustered hosts, all sharing the same SOFS
  • Lots of non-clustered hosts, all sharing the same SOFS

One of the benefits of SMB 3.0 storage is that a shared folder is more flexible than a CSV on a SAN LUN. There are more sharing options, and this means that Live Migration can span the traditional boundary of storage without involving Shared-Nothing Live Migration.

Regarding host processors, the L2/L3 cache plays a huge role in performance. Try to get as new a processor as possible. And remember, it’s all Intel or all AMD; do not mix the brands.

There are lots of possible networking designs for these hosts. I’m going to use the design that I’ve implemented in the lab at work, and it’s also one that Microsoft recommends. A pair of rNICs (iWARP) will be used for the storage and cluster networking, residing on the same two VLANs as the cluster/storage networks that the SOFS nodes are on. Then two other NICs are going to be used for host and VM networking. These two NICs could be 1 GbE or 10 GbE or faster, depending on the needs of your VMs. I’ve got 4 pNICs to play with so I will team them.
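Once the rNICs are cabled and addressed, it is worth confirming that SMB Direct is actually being used rather than falling back to TCP; a quick sanity check, assuming nothing about adapter names:

```powershell
# Confirm the rNICs are RDMA-capable and RDMA is enabled on them
Get-NetAdapterRdma

# Which client interfaces SMB considers RDMA-capable
Get-SmbClientNetworkInterface | Where-Object RdmaCapable

# After the first big SMB copy or Live Migration, check the active connections
Get-SmbMultichannelConnection
```

If the multichannel connections show RDMA in use across both subnets, the converged design is doing its job.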

    • 8 x Dell R720: Dual Xeon CPU, 256 GB RAM, rail kits, on-board quad port 1 GbE NICs. These are some big hosts. Put lots of RAM in because that’s the cheapest way to scale. CPU is almost never the 1st or even 2nd bottleneck in host capacity. The 4 x 1 GbE NICs are teamed (dynamic load distribution) for VM networking and management functionality. I’d upgrade the built-in iDRAC Essentials to the Enterprise edition to get the KVM console and virtual media features. A pair of disks in RAID1 configuration are used for the management OS.
    • 40 x 1 GbE cables: This is to network the 4 x 1 GbE onboard NICs and the iDRAC management port in each host. Who needs KVM when you’ve already bought it in the form of iDRAC.
    • 8 x Chelsio T520-CR: Dual port 10 GbE SFP+ iWARP (RDMA) NICs. These two rNICs are not teamed (teaming is not compatible with RDMA). They will reside on the same two VLANs/subnets as the SOFS nodes. The role of these NICs is to converge SMB 3.0 storage, SMB 3.0 Live Migration (you gotta see it to believe it!), and cluster communications. I might even use these networks for backup traffic.
    • 16 x SFP+ cables: These are to connect the eight hosts to the two SFP+ 10 GbE switches.
    • 8 x Windows Server Datacenter Edition: The Datacenter edition gives us unlimited rights to install Windows Server into VMs that will run on these licensed hosts, making it the economical choice. Enabling Automatic Virtual Machine Activation in the VMs will simplify VM guest OS activation.
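For the AVMA piece, the guest-side setup is a one-off; the key below is the published AVMA client key for WS2012 R2 Datacenter guests (check the edition you deploy – other editions have their own published keys), and it only activates against a WS2012 R2 Datacenter host:

```powershell
# Inside a WS2012 R2 Datacenter guest: install the AVMA client key and activate.
# AVMA then re-activates automatically against the licensed host.
cscript //B C:\Windows\System32\slmgr.vbs /ipk Y4TGP-NPTV9-HTC2H-7MGQ3-DV4TW
cscript //B C:\Windows\System32\slmgr.vbs /ato
```

Bake the key into your VM template and every guest activates with no KMS or MAK infrastructure.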

There are no HBAs in the Hyper-V hosts; the storage (SOFS) is accessed via SMB 3.0 over the rNICs.

Other Stuff

Hmm, we’re going to need:

  • 2 x SFP+ 10 GbE switches with DCB support: Data Center Bridging really is required to do QoS of RDMA traffic. You would need PFC (Priority Flow Control) support if using RoCE for RDMA (not recommended – do either iWARP or Infiniband). Each switch needs at least 12 ports – allow for scalability. For example, you might put your backup server on this network.
  • 2 x 1 GbE Switches: You really need a pair of 48 port top-of-rack switches in this design due to the number of 1 GbE ports being used and the need for growth.
  • Rack
  • PDU

And there’s probably other bits. For example, you might run a 2-node cluster for System Center and other management VMs. The nodes would have 32-64 GB RAM each. Those VMs could be stored on the SOFS or even on a JBOD that is directly attached to the 2 nodes with Storage Spaces enabled. You might run a server with lots of disk as your backup server. You might opt to run a pair of 1U servers as physical domain controllers for your infrastructure.

I recently priced up a kit, similar to above. It came in much cheaper than the equivalent blade/SAN configuration, which was a nice surprise. Even better was that the SOFS had 3 times more storage included than the SAN in that pricing!