Comparing The Costs Of WS2012 Storage Spaces With FC/iSCSI SAN

Microsoft has released a report to help you “understand the cost and performance difference between SANs and a storage solution built using Windows Server 2012 and commodity hardware”.  In WS2012 terms, that means Storage Spaces and the Scale-Out File Server (SOFS).

From my own perspective, we’ve found the JBOD + Storage Spaces solution to be much cheaper than SAN storage, both upfront (initial acquisition) and in the long term.  Adding disks and trays is cheaper too – you can use any manufacturer’s disk on the JBOD’s HCL rather than the 60% more expensive Dell/HP/etc disk that comes from the same factory but with a “special” (lockdown) firmware.

ESG Lab tested the performance readiness and cost-effectiveness of Microsoft’s new storage solution and compared the results with two common storage solutions: an iSCSI SAN and an FC SAN. For performance testing, ESG Lab tested a tier-1 virtualized Microsoft SQL Server 2012 application workload and witnessed a negligible performance difference between all the tested storage configurations. In fact, when testing with as close to the exact same storage configuration as possible across each of the tested configurations, ESG Lab witnessed a slight performance benefit with Microsoft’s storage solution over the iSCSI and FC SAN solutions.

ESG Lab also calculated what organizations could expect to spend when initially purchasing each storage configuration. The price difference was impressive. ESG Lab found that Microsoft’s storage solution can save organizations as much as 50% when compared with traditional iSCSI and FC SAN solutions. Another eye-opener for ESG Lab was around a features comparison between the storage configurations. With the upcoming release of Windows Server 2012 R2, Microsoft’s storage configuration is beginning to match traditional storage offerings feature for feature.

With similar performance, a matching feature set, less management complexity, and 50% cost-savings over a SAN, Microsoft’s Windows Server 2012 file server cluster with Storage Spaces over SMB 3.0 introduces a potentially disruptive storage solution to address any customer’s needs.


Yes, You Can Run A Hyper-V Cluster On Your JBOD Storage Spaces

What is the requirement of a cluster?  Shared storage.  What’s supported in WS2012/R2?

  • SAS SAN
  • iSCSI SAN
  • Fibre Channel SAN
  • FCoE
  • PCI RAID (like Dell VRTX)
  • Storage Spaces

What’s the difference between a cluster and a Hyper-V cluster?  You’ve enabled Hyper-V on the nodes.  Picture 2 nodes, each connected to a JBOD.  Storage Spaces is configured on the JBOD to create Cluster Shared Volumes (CSVs).  All that remains is to enable Hyper-V on node 1 and node 2, and you have a valid Hyper-V cluster that stores its VMs on the CSVs.
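
For anyone who wants to see the shape of it, here is a minimal PowerShell sketch of those steps.  The node names, cluster name, IP address, and virtual disk name are all made up for illustration, and it assumes a clustered storage pool and virtual disk have already been created from the JBOD disks:

# Enable Hyper-V and Failover Clustering on both nodes (each node will restart)
Install-WindowsFeature -Name Hyper-V, Failover-Clustering -IncludeManagementTools -ComputerName Node1 -Restart
Install-WindowsFeature -Name Hyper-V, Failover-Clustering -IncludeManagementTools -ComputerName Node2 -Restart

# Create the cluster from the two JBOD-attached nodes
New-Cluster -Name HVC1 -Node Node1, Node2 -StaticAddress 192.168.1.50

# Convert the clustered virtual disk (built on the Storage Spaces pool) into a CSV
# (the resource name below is an example - check yours with Get-ClusterResource)
Add-ClusterSharedVolume -Name "Cluster Virtual Disk (VDisk1)"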

It’s completely supported, and a perfect Hyper-V cluster solution for the small/medium business, with the JBOD costing a fraction (a quick search engine price check makes that obvious) of the equivalent-capacity SAN.

Stupid questions that you should not ask:

  • What file shares do I use to store my VMs on?  Where do you see “file shares” in the above text?  You store the VMs directly on the CSVs, just like in a Hyper-V cluster with a SAN, instead of storing file shares on the CSVs as you would in a SOFS cluster.
  • Can I run other roles on the hosts?  No.  You should never do that … and that includes Exchange Server and SQL Server, for the 2 people who recently asked and who I now hope have resigned from working in IT.
  • What about the required networks?  If you use 10 GbE, go look at converged networks for all possible designs; it’s the same clustered 2012/R2 Hyper-V networking as always.

Getting Started With DataOn JBOD In WS2012 R2 Scale-Out File Server

Yesterday we took delivery of a DataOn DNS-1640D JBOD tray with 8 * 600 GB 10K disks and 2 * 400 GB dual channel SSDs.  This is going to be the heart of V2 of the lab at work, providing me with physical scalable and continuously available storage.

The JBOD

Below you can see the architecture of the setup.  Let’s start with the DataOn JBOD.  It has dual controllers and dual PSUs.  Each controller has some management ports for factory usage (not shown).  In a simple non-stacked solution such as below, you’ll use SAS ports 1 and 2 to connect your servers.  A SAS daisy-chaining port is included to allow you to expand this JBOD to multiple trays.  Note that if scaling out the JBOD is on the cards, then look at the much bigger models – this one takes 24 * 2.5” disks.

I don’t know why people still think that SOFS disks go into the servers – THEY GO INTO A SHARED JBOD!!!  Storage inside a server cannot be HA; there is no replication or striping of internal disks between servers.  In this case we have inserted 8 * 600 GB 10K HDDs (capacity on a budget) and 2 * STEC 400 GB SSDs (speed).  This will allow us to implement WS2012 R2 Storage Spaces tiered storage and write-back cache.
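
Before building anything on it, it’s worth confirming that both nodes can see the JBOD disks and that the media types are detected correctly, because tiering depends on SSD vs HDD being reported properly.  A quick check along these lines (a sketch, nothing more) does the job:

# List the poolable JBOD disks and confirm their detected media types and sizes
Get-PhysicalDisk -CanPool $true |
    Sort-Object MediaType |
    Format-Table FriendlyName, MediaType, Size, CanPool -AutoSize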


The Servers

I’m recycling the 2 servers that I’ve been using as Hyper-V hosts for the last year and a half.  They’re HP DL360 servers.  Sadly, HP Proliants are stuck in the year 2009 and I can’t use them to demonstrate and teach new things like SR-IOV.  We’re getting in 2 Dell rack servers to take over the role as Hyper-V hosts and the HP servers will become our SOFS nodes.

Both servers had 2 * dual port 10 GbE cards, giving me 4 * 10 GbE ports.  One card was full height and the other modified to half height – occupying both PCIe slots in the servers.  We got LSI controllers to connect the 2 servers to the JBOD.  Each LSI adapter is full height and has 2 ports.  Thus we needed 4 SAS cables.  SOFS Node 1 connects to port 1 on each controller on the back of the JBOD, and SOFS Node 2 connects to port 2 on each controller.  The DataOn manual shows you how to attach further JBODs and cable the solution if you need more disk capacity in this SOFS module.

Note that I have added these features:

  • Multipath I/O: To provide MPIO for the SAS controllers.  There are rumblings of performance issues with this enabled.
  • Windows Standards-Based Storage Management: This provides us with integration into the storage, e.g. SCSI Enclosure Services (SES).  A quick installation sketch follows below.
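
Both are Windows features, so they can be added to each SOFS node in one go.  A sketch, assuming the nodes are called SOFS1 and SOFS2 – check the exact feature names with Get-WindowsFeature if in doubt:

# Add MPIO and the Standards-Based Storage Management service to each SOFS node
ForEach ($Node in "SOFS1", "SOFS2")
{
    Install-WindowsFeature -Name Multipath-IO, WindowsStorageManagementService -ComputerName $Node
}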

The Cluster

The network design is what I’ve talked about before.  The on-board 1 GbE NICs are teamed for management.  The servers now have a single dual port 10 GbE card.  These 10 GbE NICs ARE NOT TEAMED – I’ve put them on different subnets for SMB Multichannel (a cluster requirement).  That means they are simple traditional NICs, each with a different IP address.  I’ve used New-NetQosPolicy to apply QoS to those 2 networks on a per-protocol basis.  That means that SMB 3.0, backup, and cluster communications all go across these two networks.
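
For reference, the per-protocol policies look roughly like this; the names and weights below are illustrative rather than my exact settings:

# Guarantee minimum bandwidth shares by protocol on the two 10 GbE SMB networks
New-NetQosPolicy -Name "SMB" -SMB -MinBandwidthWeightAction 50
New-NetQosPolicy -Name "Cluster" -IPDstPortMatchCondition 3343 -MinBandwidthWeightAction 10
# Backup traffic would get its own policy, matched on the backup product's port or executable path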

Hans Vredevoort (Hyper-V MVP colleague) went with a different approach: teaming the 10 GbE NICs and presenting team interfaces that are bound to different VLANs/subnets.  In WS2012 R2, the Dynamic load-balancing mode uses flowlets to truly aggregate bandwidth and spread even a single data stream across the team members (physical interfaces).
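
In rough terms that alternative looks like the following sketch; the team name, member NIC names, and VLAN IDs are invented for the example:

# Team the two 10 GbE NICs and expose two team interfaces on different VLANs/subnets
New-NetLbfoTeam -Name "Team10GbE" -TeamMembers "10GbE-1", "10GbE-2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic
Add-NetLbfoTeamNic -Team "Team10GbE" -Name "SMB1" -VlanID 101
Add-NetLbfoTeamNic -Team "Team10GbE" -Name "SMB2" -VlanID 102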

Storage Spaces

The storage pool is created in Failover Cluster Manager.  While TechEd demos focused on PowerShell, you can create a tiered pool and tiered virtual disks in the GUI.  PowerShell is obviously the better approach for standardization and repetitive work (such as consulting).  I’ve fired up a single virtual disk so far with a nice chunk of SSD tiering and it’s performing pretty well.
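
For the PowerShell route, the general shape is below.  Treat it as a sketch: the pool, tier, and virtual disk names, the tier sizes, and the write-back cache size are examples, not the lab’s exact values:

# Create a clustered pool from all poolable JBOD disks
$Disks = Get-PhysicalDisk -CanPool $true
# The clustered subsystem's friendly name varies by version, hence the wildcard
$SubSystem = Get-StorageSubSystem -FriendlyName "Clustered*"
New-StoragePool -FriendlyName "Pool1" -StorageSubSystemFriendlyName $SubSystem.FriendlyName -PhysicalDisks $Disks

# Define the SSD and HDD tiers in the pool
$SSD = New-StorageTier -StoragePoolFriendlyName "Pool1" -FriendlyName "SSDTier" -MediaType SSD
$HDD = New-StorageTier -StoragePoolFriendlyName "Pool1" -FriendlyName "HDDTier" -MediaType HDD

# Create a mirrored, tiered virtual disk with a 1 GB write-back cache
New-VirtualDisk -StoragePoolFriendlyName "Pool1" -FriendlyName "VDisk1" -ResiliencySettingName Mirror -StorageTiers $SSD, $HDD -StorageTierSizes 100GB, 1TB -WriteCacheSize 1GB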


First Impressions

I wanted to test quickly before the new Dell hosts arrive, so Hyper-V is enabled on the SOFS cluster.  This is a valid deployment scenario, especially for a small/medium enterprise (SME).  What I have built is the equivalent (more, actually) of a 2-node Hyper-V cluster with a SAS-attached SAN … albeit with tiered storage … and that storage was less than half the cost of a SAN from Dell/HP.  In fact, the retail price of the HDDs is around 1/3 the list price of the HP equivalent.  There is no comparison.

I deployed a bunch of VMs with differencing disks last night.  Nice and quick.  Then I pinned the parent VHD to the SSD tier and created a boot storm.  Once again, nice and quick.  Nothing scientific has been done and I haven’t done comparison tests yet.
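
For anyone wondering, pinning a file to a tier in WS2012 R2 is a two-step job: flag the file, then run the tier optimization so the blocks move straight away.  A sketch with made-up paths and names – check your actual tier name with Get-StorageTier:

# Pin the differencing disks' parent VHD to the SSD tier of its volume
Set-FileStorageTier -FilePath "C:\ClusterStorage\Volume1\Parents\WS2012R2-Parent.vhd" -DesiredStorageTierFriendlyName "VDisk1-SSDTier"

# Move the file's blocks now instead of waiting for the scheduled tier optimization task
Optimize-Volume -FileSystemLabel "VDisk1" -TierOptimize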

But it was all simple to set up and way cheaper than traditional SAN.  You can’t beat that!

Deploy Roles Or Features To Lots Of Servers At Once

I’m deploying a large cluster at the moment and I wanted to install the Failover Clustering feature on all the machines without logging in, doing stuff, logging out, and repeating.  This snippet of PowerShell took me 45 seconds to put together.  The feature is installing on 8 machines (Demo-FS1 to Demo-FS8) while I’m writing this blog post :)

# Install Failover Clustering and its management tools on Demo-FS1 through Demo-FS8
For ($i = 1; $i -lt 9; $i++)
{
    Install-WindowsFeature -Name Failover-Clustering, RSAT-Clustering -ComputerName "Demo-FS$i"
}

The variable $i starts at 1, is used as part of the computer name of the machine being remotely updated, and is then incremented for the next loop iteration.  The loop ends after the 8th iteration, i.e. once the 8th server has been updated.

Aint automation be-yoot-eeful?

Constraining SMB Powered Redirected IO Traffic To Selected Networks In A WS2012 Hyper-V Cluster

I was recently doing some System Center work on a customer site when they experienced a peculiar problem with their Windows Server 2012 (WS2012) Hyper-V cluster.  They were recovering a very large virtual machine.  OK – nothing weird there.  What was weird was that they lost management access to the hosts as soon as the restore job started.  VMs were fine – it’s just that they couldn’t remotely manage their hosts any more.

I was told of the problem the following morning.  It wasn’t in my project scope but I decided to dig around; I immediately suspected Redirected IO was to blame and that’s a topic I’ve written a word or two about in the past.

Recap: What is Redirected IO?

Redirected IO is a CSV feature where non-owners of the CSV will redirect their storage reads/writes to the storage via the owner of the CSV.  In other words:

  • The owner temporarily has exclusive read/write access to the CSV
  • All other nodes in the cluster read/write from/to the CSV via the owner (or CSV coordinator)

Why is this done?  There are two reasons in WS2012:

  • There is a metadata operation taking place: this could be a VM start (very very quick), a dynamic virtual hard disk expansion, or even the creation of a fixed virtual hard disk.
  • A node loses direct connectivity to the storage and uses redirected IO to avoid VM storage corruption or a VM outage.

Note that redirected IO was used for Volume Shadow Copy Service (VSS-powered backup) operations in W2008 R2, but that is no longer the case in WS2012; it uses a new distributed snapshot mechanism to simplify the backup.
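
Incidentally, if you have the Failover Clustering PowerShell module handy, you can see whether a CSV is in direct or redirected mode on each node at any moment:

# Show the per-node access mode (Direct / FileSystemRedirected / BlockRedirected) for each CSV
Get-ClusterSharedVolumeState | Format-Table Name, Node, StateInfo, FileSystemRedirectedIOReason -AutoSize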

Why Did Redirected IO Kick In Here?

Restoring the VM was a backup operation, right?  No; it’s the complete opposite – it’s a restore operation and has absolutely nothing to do with VSS.  Effectively the restore is creating files on a CSV … and that is one big mutha of a metadata operation.  I wasn’t there at the time and I don’t know the backup tool, but I suspected that the admin restored the VM to a host that was not the owner of the target CSV.  And that created tonnes of redirected IO:

  1. The backup application restored some 500 GB of VM to the destination node
  2. The node was not the CSV owner so redirected IO kicked in
  3. 500 GB of data was redirected from the “restore” host to the CSV owner across 1 GbE networking and then into the storage
  4. Redirected IO finished up once the metadata operation (the restore) was completed

That’s my Dr. House theory anyway.  I think I have it right – all the pieces fit.  Of course, sometimes the patient died in House.

Didn’t They Control The Redirected IO Network?

They thought they did.  They actually implemented the correct solution … for W2008 R2.  That solution manipulates (if necessary) the network metric value to make the CSV network the one with the lowest metric.  This is a WS2012 cluster and things are different.

The clue to why things are different: there are two layers of redirected IO in WS2012 (the new lower-level block approach is 4 times faster) and both are powered by SMB 3.0.  What’s one of the new features of SMB 3.0?  SMB Multichannel.  And what does SMB Multichannel do?  It finds every connection between the SMB client (the non-CSV owner) and the SMB server (the CSV owner) and uses all those networks to stream data as quickly as possible.

That’s great for getting the SMB transfer done as quickly as possible, but it can be damaging (as I suspected it was here) if left unmanaged.  And that’s why we have features in WS2012 like QoS and SMB Constraints.

The Problem & Solution

In this case the hosts had a number of networks.  The two that are relevant are the cluster network and the management network:

  • Cluster network: A private network with 2 * 1 GbE NICs in a team
  • Management network: 2 * 1 GbE in a team

Two networks and both were eligible for SMB 3.0 to use for SMB Multichannel during Redirected IO.  And that’s what I suspect happened … 500 GB of data went streaming across both NIC teams for the duration of the restore operation.  That flooded the management network.  That made it impossible for administrators to remotely manage or RDP into the hosts in the cluster.

The solution in this case would be to use SMB constraints to control which networks can be used by SMB Multichannel at the client end.  This cluster is simple enough in that regard.  It uses iSCSI storage, so SMB Multichannel should only be used by the cluster team … meaning the cluster team’s interface would be listed in the cmdlet.  Now SMB Multichannel will only use the cluster network.
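
The cmdlet in question is New-SmbMultichannelConstraint.  As a sketch only, assuming the cluster team’s interface is called “ClusterTeam” and two hosts named Host1 and Host2 (run the equivalent on each node, naming the other nodes as the servers):

# On Host1: only use the ClusterTeam interface for SMB connections to Host2
New-SmbMultichannelConstraint -ServerName Host2 -InterfaceAlias "ClusterTeam"

# Verify the constraint
Get-SmbMultichannelConstraint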

I didn’t implement the solution – that is outside my scope on the project – but I’ve made the recommendation.  I’ll update if I get any news.

In these types of clusters I’d recommend this solution as a part of the build process.  It gets a bit more involved if you’re using SMB 3.0 storage: apply constraints that select every NIC that has a valid SMB Multichannel role, e.g. storage and redirected IO.

A Very Important Article About Health Of Virtual DCs On Hyper-V

I strongly urge you (in other words, do it or else) to head over to Hyper-V.nu to read an article that Hans Vredevoort (Virtual Machine MVP) wrote on the July 2013 Update Rollup and what it does to prevent total corruption of Active Directory domain controllers that are running as virtual machines on Hyper-V.

Nuff said … go read it.

KB2872325 – VMs On WS2012 Hyper-V May Not Be Able To Create Or Join Guest Cluster

A new KB article from Microsoft appeared today, covering a scenario where guest cluster nodes on Windows Server 2012 Hyper-V may not be able to create or join a cluster.

Symptoms

If using the “Create Cluster Wizard”, the cluster may fail to be created. Additionally, the report from the wizard may contain the following message:

An error occurred while creating the cluster.
An error occurred creating cluster ‘<clustername>’.
This Operation returned because the timeout period expired

Note: The above errors can also be seen any time that communications between the servers specified for cluster creation do not complete. A known cause is described in this article.

In some scenarios, the cluster nodes are successfully created and joined while the VMs are hosted on the same Hyper-V host, but once the VMs are moved to different hosts the communications between the nodes of the guest cluster start to fail. As a result, nodes may be removed from the guest cluster.

Cause

This can occur when network packets do not reach the virtual machines hosted on Windows Server 2012 failover cluster nodes, because of a failover cluster component that is bound to the network adapters of the hosts.  The component is called the “Microsoft Failover Cluster Virtual Adapter Performance Filter” and it was first introduced in Windows Server 2012.

The problem only affects network packets addressed to cluster nodes that are hosted in virtual machines.

Workaround

If the Windows Server 2012 failover cluster is going to host virtual machines that are part of guest clusters, it is recommended that you unbind the “Microsoft Failover Cluster Virtual Adapter Performance Filter” from all of the virtual switch network adapters on the Windows Server 2012 failover cluster nodes.

Instructions for the workaround (GUI and PowerShell options) are in the KB article.
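
For reference, the PowerShell variant boils down to unbinding the filter with Disable-NetAdapterBinding; the sketch below assumes the virtual switch is bound to a team called “ConvergedTeam” (follow the KB’s exact instructions for your adapters):

# Unbind the cluster performance filter from the adapter/team used by the virtual switch
Disable-NetAdapterBinding -Name "ConvergedTeam" -DisplayName "Microsoft Failover Cluster Virtual Adapter Performance Filter"

# Confirm the binding is now disabled
Get-NetAdapterBinding -Name "ConvergedTeam" -DisplayName "Microsoft Failover Cluster Virtual Adapter Performance Filter"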

VMM 2012 R2 Release Notes

Microsoft has published the release notes for System Center 2012 R2 – Virtual Machine Manager (VMM).  There are some important notes there, but I thought I’d highlight a few that stick out:

  • For file server clusters that were not created by VMM, deploying the VMM agent to file server nodes will enable Multipath I/O (MPIO) and claim devices. This operation will cause the server to restart. Deploying the VMM agent to all nodes in a Scale-out File Server cluster will cause all nodes to restart.
  • Generation 2 virtual machines are not supported
  • If System Center 2012 R2 VMM is installed on Windows Server 2012, you cannot manage Spaces storage devices that are attached to a Scale-out File server. Spaces storage requires an updated SMAPI that is included with Windows Server 2012 R2 release version.
  • The Physical-to-Virtual (P2V) feature will be removed from the System Center 2012 R2 release.
  • Windows Server supports storage tiering with Storage Spaces. However, VMM does not manage tiering policy.
  • Windows Server supports specifying write-back cache amount with Storage Spaces. However, VMM does not manage this.
  • Performing a Hyper-V Replica failover followed by a cluster migration causes the VM refresher to update the wrong virtual machine, putting the virtual machines into an inconsistent state.
  • VMM does not provide centralized management of World Wide Name (WWN) pools.
  • Failing over and migrating a replicated virtual machine on a cluster node might result in an unstable configuration

And there’s more.  Some have workarounds (see the original article).  Some do not, e.g. removal of P2V from VMM 2012 R2 or lack of support for G2 VMs.  In those cases:

  • Use 3rd party tools or Disk2vhd (there is no Disk2vhdx tool) for P2V
  • Continue to use G1 VMs if using VMM.  Remember that there is no conversion between G1 and G2

KB2870270 – A Hotfix Bundle For WS2012 Hyper-V Failover Clusters

Although not referred to as an update rollup, this latest hotfix is a bundle of fixes.  As before, don’t rush out to deploy it unless it looks like it’s going to fix a problem you are having.  Otherwise, wait a few weeks, test if you can, check the news, and then deploy it to prevent those problems.

This bundle is “an update that improves cloud service provider resiliency in Windows Server 2012”.  That title/description sounds like someone didn’t know how to describe it and fell back to marketing jargon.  Please don’t let the title confuse you – this bundle contains important fixes for everyone.  This new KB2870270 replaces the recent KB2848344 (Update that improves cloud service provider resiliency in Windows Server 2012).

The bundle contains:

  • KB2796995: Offloaded Data Transfers fail on a computer that is running Windows 8 or Windows Server 2012
  • KB2799728: Virtual machine enters a paused state or a CSV volume goes offline when you try to create a backup of the virtual machine on a Windows Server 2012-based failover cluster
  • KB2801054: VSS_E_SNAPSHOT_SET_IN_PROGRESS error when you try to back up a virtual machine in Windows Server 2012
  • KB2813630: Virtual machine enters a paused state or a CSV volume goes offline when you try to create a backup of the virtual machine on a Windows Server 2012-based failover cluster
  • KB2848727: "Move-SmbWitnessClient" PowerShell command fails in Windows Server 2012
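
Before deciding anything, it takes a few seconds to check whether the bundle (or the KB2848344 it replaces) is already on your nodes; the host names below are examples:

# Query each cluster node for the new bundle and the update it replaces
Get-HotFix -Id KB2870270, KB2848344 -ComputerName HV01, HV02 -ErrorAction SilentlyContinue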

KB2869923 – VM Crash Caused By Physical Disk Resource Move During WS2012 CSV Backup

An “interesting” week for Hyper-V/clustering hotfixes, and they didn’t stop.  Some more came out yesterday.  Test (if you can), wait a few weeks, and then deploy.  This one addresses a scenario where a Physical Disk resource move during the backup of a Cluster Shared Volume (CSV) may cause a resource outage.

Symptoms

Consider the following scenario:

  • You configure a Windows Server 2012-based Hyper-V failover cluster.
  • The VHD or VHDX files reside on a Cluster Shared Volume (CSV).
  • Backups of the CSV are performed using software snapshots.
  • The Physical Disk resource for the CSV is moved to another node in the cluster.

In this scenario, the Physical Disk resource may fail to come online if the backup of the CSV is in progress. As a result, virtual machines that rely on the CSV may crash.

 

Cause

During a move of the Physical Disk resource, when the Physical Disk resource comes online on the new node it queries the Volume Shadow Copy Service (VSS) to discover the software snapshots associated with that volume. If the move takes place while a software snapshot is in progress, VSS may fail to respond or may respond only after a long delay. Ultimately, this may cause the Physical Disk resource to either fail to come online or take a long time to come online on the new node. As a result, VMs that have VHD files on the CSV may crash.

A supported hotfix is available from Microsoft.