Comparing TCP/IP, Compressed, and SMB WS2012 R2 Hyper-V Live Migration Speeds

I’m building a demo for some upcoming events, blatantly ripping off what Ben Armstrong did at TechEd – copying is the best form of flattery, Ben Smile  In the demo, I have 2 Dell R420 hosts with a bunch of NICs:

  • 2 disabled 1 GbE NICs
  • 2 Enabled 1 GbE NICs teamed for Live Migration
  • 2 10 GbE iWARP (RDMA) NICs not teamed for cluster, SMB Live Migration, and SMB 3.0 storage
  • 2 10 GbE NICs teamed for VM networking and host management

It’s absolutely over the top for the real world, but it gives me demo flexibility, especially for what follows.  In the demo, I have a PowerShell script that performs a measured Live Migration of a VM with 8 GB RAM (statically assigned).  The VM runs a pretty realistic workload: WS2012 R2, SQL Server, and VMM 2012 R2.

The script then does:

  1. Configure the cluster to use the 1 GbE team for Live Migration with TCP/IP Live Migration
  2. Live migrate the VM (measured)
  3. Configure the cluster to use the 1 GbE team for Live Migration with Compressed Live Migration
  4. Live migrate the VM (measured)
  5. Configure the cluster to use a single 10 GbE iWARP NIC for Live Migration with SMB Live Migration (SMB Direct)
  6. Live migrate the VM (measured)
  7. Configure the cluster to use both 10 GbE iWARP NICs for Live Migration with SMB Live Migration (SMB Direct + Multichannel)
  8. Live migrate the VM (measured)
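The switch-and-measure loop above can be sketched in PowerShell.  This is a minimal illustration, not my actual demo script: the VM name (Demo-VM) and host names (Demo-Host1/Demo-Host2) are placeholders, selecting which cluster networks carry Live Migration traffic is done separately in the cluster settings, and moving the VM back between runs is omitted.

```powershell
# Each Live Migration performance option, tested in turn.
# VM and host names are placeholders for this sketch.
$options = "TCPIP", "Compression", "SMB"

foreach ($option in $options) {
    # Switch the Live Migration performance option on both hosts
    Set-VMHost -ComputerName Demo-Host1, Demo-Host2 `
        -VirtualMachineMigrationPerformanceOption $option

    # Time the Live Migration of the clustered VM to the other node
    $time = Measure-Command {
        Move-ClusterVirtualMachineRole -Name "Demo-VM" `
            -Node Demo-Host2 -MigrationType Live
    }
    "{0}: {1:N1} seconds" -f $option, $time.TotalSeconds
}
```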

What I observed in my test runs:

  • TCP/IP: About 95% of a 1 GbE NIC is utilised consistently for the duration.
  • Compressed: The bandwidth utilisation has a sawtooth pattern up to around 98%, as one should expect given the dynamic nature of compression.  CPU utilisation is higher (as expected), but remember that Live Migration will switch back to TCP/IP if compression is contending for resources with the host/VMs.
  • SMB Direct: Smile Nearly 10 Gbps over a single NIC.
  • SMB Direct + SMB Multichannel: Open-mouthed smile Nearly 20 Gbps over the two iWARP rNICs.

And the time taken for each Live Migration?

[Image: chart of the time taken by each type of Live Migration]

Over 78 seconds to move a running VM over a 1 GbE network without optimisations!  Imagine that scaled out to a host with 250 GB of production VM memory needing to be drained for preventative maintenance.  That’s over 40 minutes, and it could be longer.  That’s a long time to wait to get critical services off of a host before a hardware warning becomes a host failure.

As the Live Migrations get faster they get closer to the theoretical minimum time.  There are four operations:

  1. Build the VM on the destination host (that magic 3% point, where the VM’s dependencies are prepared)
  2. Copy RAM
  3. Sync RAM if required
  4. Destroy the VM on the source host

The first and last operations cannot be accelerated, generally taking a couple of seconds each.  In fact, the first operation can take longer if you use Virtual Fibre Channel.

This test was with a more common VM with 8 GB RAM.  Remember that I moved a VM with 56 GB RAM in 35 seconds using SMB Direct + Multichannel?  That test took 33 seconds earlier today on the same preview release.  Hmm, I think that hardware would take around 2.5 minutes to drain 250 GB RAM of VMs, versus 42 minutes of un-optimised Live Migrations.  I hope the point of this post is clear; if you need dense hosts then:

  • Use 10 GbE networking; if you can’t, upgrade to WS2012 R2 Hyper-V and use compression
  • If you’re using rNICs for storage, then leverage that bandwidth and offload to optimise Live Migration, subject to QoS and SMB Bandwidth Constraints
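On that last point, WS2012 R2 adds an SMB bandwidth limit so that Live Migration over SMB cannot starve SMB 3.0 storage traffic on shared rNICs.  A sketch of the idea follows; the 1 GB/second (roughly 8 Gbps) cap is just an example figure, not a recommendation.

```powershell
# Install the SMB Bandwidth Limit feature (WS2012 R2)
Add-WindowsFeature FS-SMBBW

# Cap SMB Live Migration traffic at roughly 8 Gbps (1 GB per second),
# leaving headroom for SMB 3.0 storage traffic on the same rNICs
Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond 1GB
```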

Using WS2012 R2 Hyper-V Storage QoS

Windows Server 2012 R2 Hyper-V brings us a new storage feature called Storage QoS.  You can optionally turn on quality of service management on selected virtual hard disks.  You then have two settings, both of which default to 0 (unmanaged):

  • Minimum: Unlike with networking QoS, this is the one you are least likely to use in WS2012 R2.  This is not a minimum guarantee, like you find with networking.  Instead, this setting is used more as an alerting system, in case a selected virtual hard disk cannot get enough IOPS.  You enter the number of IOPS required.
  • Maximum: Here you can specify the maximum number of IOPS that a virtual hard disk can use from the physical storage.  This is the setting you are most likely to use in Storage QoS in WS2012 R2, because it allows you to limit overly aggressive VM activity on your physical storage.
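Both settings are surfaced in PowerShell on Set-VMHardDiskDrive.  A minimal sketch, assuming a VM named Demo-VM with the managed data disk at SCSI controller 0, location 1 (placeholders for this example):

```powershell
# Alert if the disk cannot get 100 IOPS; cap it at 500 IOPS.
# Both settings default to 0 (unmanaged); IOPS are normalised units.
Set-VMHardDiskDrive -VMName "Demo-VM" `
    -ControllerType SCSI -ControllerNumber 0 -ControllerLocation 1 `
    -MinimumIOPS 100 -MaximumIOPS 500
```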

This is a feature of the host, so the guest OS is irrelevant.  The setting is there for VHD (which you should have stopped deploying) and VHDX (which you should be deploying).

What Storage QoS Looks Like

I’ve set up a test lab to demonstrate this.  A VM has 2 additional 10 GB fixed (for fair comparison) virtual hard disks in the same folder on the host.  I have formatted the drives as P and Q in the guest OS, and created empty files in each volume called testfile.dat.  I then downloaded and installed SQLIO into the guest OS of the VM.  This tool will let me stress/benchmark storage.  I started PerfMon on the host, and added the Read Operations/Sec metric from Hyper-V Virtual Storage Device for the 2 virtual hard disks in question.

[Image: PerfMon with the Hyper-V Virtual Storage Device read counters added for both virtual hard disks]

I opened two command prompt windows and ran:

  • sqlio.exe -s1000 -t10 -o16 -b8 -frandom p:testfile.dat
  • sqlio.exe -s1000 -t10 -o16 -b8 -frandom q:testfile.dat

That gives me 1000 seconds of read activity from the P drive (first data virtual hard disk) and the Q drive (the second data virtual hard disk).  Immediately I saw that both virtual hard disk files had over 300 IOPS of read activity.

[Image: PerfMon showing over 300 read IOPS on each virtual hard disk]

I then configured the second virtual hard disk (containing Q:) to be restricted to 50 IOPS.

[Image: the virtual hard disk’s QoS settings with a 50 IOPS maximum]

There was a response in PerfMon before the settings screen could refresh after I clicked OK.  The read activity on the virtual hard disk dropped to around 50 IOPS (highlighted in black), usually just under and sometimes creeping just over 50 (never for long before QoS clawed it back down).

[Image: PerfMon showing the restricted virtual hard disk dropping to around 50 IOPS]

The non-restricted virtual hard disk immediately benefited from the freed-up bandwidth, seeing its read IOPS increase (highlighted in black); the counter remains on the ceiling of the graph but the metrics rise, now getting up to over 560 IOPS.

[Image: PerfMon showing the unrestricted virtual hard disk climbing to over 560 IOPS]

Usage of Storage QoS

I think this is going to be a weird, woolly area.  The only best practice I know of is that you should know what you are doing first.  Few people understand (a) what IOPS are, and (b) how many IOPS their applications need.  This is why Microsoft added the Hyper-V metrics for measuring the read and write operations per second of a virtual hard disk (see above).  This gives you the ability to gather information (I don’t know if a System Center Operations Manager management pack has been updated) and determine regular usage patterns.
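Those metrics come from Hyper-V resource metering, so you can gather a baseline before setting any limits.  A sketch with a placeholder VM name:

```powershell
# Turn on resource metering for the VM
Enable-VMResourceMetering -VMName "Demo-VM"

# ... let a representative workload period pass, then read the
# averaged normalised IOPS figure for the VM's disks
Measure-VM -VMName "Demo-VM" |
    Select-Object VMName, AggregatedAverageNormalizedIOPS
```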

Once you know what usage is expected then you could set limits to constrain that virtual hard disk from misbehaving.

I personally think that Storage QoS will be a reactive measure for out-of-control virtual machines in traditional virtualization deployments and most private clouds.  However, those who are adopting the hands-off, self-service model of a true cloud (such as public cloud) may decide to limit every virtual hard disk by default.  Who knows!

Anyway, the feature is there, and be sure that you know what you’re doing if you decide to use it.

Putting The Scale Into The Scale-Out File Server

Why did Microsoft call the “highly available file server for application data” the Scale-Out File Server (SOFS)?  The reason might not be obvious unless you have lots of equipment to play with … or you cheat by using WS2012 R2 Hyper-V Shared VHDX as I did on Tuesday afternoon Smile

The SOFS can scale out in 3 dimensions.

0: The Basic SOFS

Here we have a basic example of a SOFS that you should have seen blogged about over and over.  There are two cluster nodes.  Each node is connected to shared storage.  This can be any form of supported storage in WS2012/R2 Failover Clustering.

[Image: diagram of a basic 2-node SOFS attached to shared storage]

1: Scale Out The Storage

The likely bottleneck in the above example is the disk space.  We can scale that out by attaching the cluster nodes to additional storage.  Maybe we have more SANs to abstract behind SMB 3.0?  Maybe we want to add more JBODs to our storage pool, thus increasing capacity and allowing mirrored virtual disks to have JBOD fault tolerance.

[Image: diagram of the SOFS nodes attached to additional storage]

I can provision more disks in the storage, add them to the cluster, and convert them into CSVs for storing the active/active SOFS file shares.
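Those provision/add/convert steps look like this in PowerShell; the cluster disk name is whatever the new disk was assigned when it joined the cluster, so “Cluster Disk 3” is a placeholder:

```powershell
# Add any newly provisioned disks that are eligible for clustering
Get-ClusterAvailableDisk | Add-ClusterDisk

# Convert the new cluster disk into a CSV for the SOFS shares
Add-ClusterSharedVolume -Name "Cluster Disk 3"
```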

2: Scale Out The Servers

You’re really going to have to have a large environment to do this.  Think of the clustered nodes as SAN controllers.  How often do you see more than 2 controllers in a single SAN?  Yup, not very often (we’re excluding HP P4000 and similar cos it’s weird).

Adding servers gives us more network capacity for client (Hyper-V, SQL Server, IIS, etc) access to the SOFS, and more RAM capacity for caching.  WS2012 allows us to use 20% of RAM as CSV Cache and WS2012 R2 allows us to use a whopping 80%!

[Image: diagram of the SOFS scaled out to additional cluster nodes]

3: Scale Out Using Storage Bricks

Go back to the previous example.  There you saw a single failover cluster with 4 nodes, running the active/active SOFS cluster role.  That’s 2-4 nodes + storage.  Let’s call that a block, named Block A.  We can add more of these blocks … into the same cluster.  Think about that for a moment.

EDIT: When I wrote this article I referred to each unit of storage + servers as a block.  I checked with Claus Joergensen of Microsoft and the terms being used in Microsoft are storage bricks or storage scale units.  So wherever you see “block” swap in storage brick or storage scale unit.

[Image: diagram of a single SOFS cluster made up of two storage bricks]

I’ve built it and it’s simple.  Some of you will overthink this … as you are prone to do with SOFS.

What the SOFS does is abstract the fact that we have 2 blocks.  The client servers really don’t know; we just configure them to access a single namespace called \\Demo-SOFS1, which is the Client Access Point (CAP) of the SOFS role.

The CSVs that live in Block A only live in Block A, and the CSVs that live in Block B only live in Block B.  The disks in the storage of Block A are only visible to the servers in Block A, and the same goes for Block B.  The SOFS just sorts out who is running what CSV and therefore knows where share responsibility lies.  There is a single SOFS role in the entire cluster, therefore we have the single CAP and UNC namespace.  We create the shares in Block A in the same place as we create them for Block B … in that same single SOFS role.
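For illustration, creating the role and a share looks something like this; the CAP name matches the namespace above, while the share name, CSV path, and host computer accounts are placeholders:

```powershell
# Create the single active/active SOFS role with its Client Access Point
Add-ClusterScaleOutFileServerRole -Name "Demo-SOFS1"

# A share on a CSV in Block A; shares for Block B are created the same
# way in the same role, just on a Block B CSV path
New-Item -Path "C:\ClusterStorage\Volume1\VMs1" -ItemType Directory
New-SmbShare -Name "VMs1" -Path "C:\ClusterStorage\Volume1\VMs1" `
    -FullAccess 'DEMO\Host1$', 'DEMO\Host2$'
```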

A Real World Example

I don’t have enough machinery to demo/test this so I fired up a bunch of VMs on WS2012 R2 Hyper-V to give it a go:

  • Test-SOFS1: Node 1 of Block A
  • Test-SOFS2: Node 2 of Block A
  • Test-SOFS3: Node 1 of Block B
  • Test-SOFS4: Node 2 of Block B

All 4 VMs are in a single guest cluster.  There are 3 shared VHDX files:

  • BlockA-Disk1: The disk that will store CSV1 for Block A, attached to Test-SOFS1 + Test-SOFS2
  • BlockB-Disk1: The disk that will store CSV1 for Block B, attached to Test-SOFS3 + Test-SOFS4
  • Witness Disk: The single witness disk for the guest cluster, attached to all VMs in the guest cluster

Here are the 4 nodes in the single cluster that make up my logical Blocks A (1 + 2) and B (3 + 4).  There is no “block definition” in the cluster; it’s purely an architectural concept.  I don’t even know if MSFT has a name for it.

[Image: the 4 nodes of the single cluster]

Here are the single witness disk and CSVs of each block:

[Image: the single witness disk and the CSVs of each block]

Here is the single active/active SOFS role that spans both blocks A and B.  You can also see the shares that reside in the SOFS, one on the CSV in Block A and the other in the CSV in Block B.

[Image: the single SOFS role and the shares residing on each block’s CSV]

And finally, here is the end result; the shares from both logical blocks in the cluster, residing in the single UNC namespace:

[Image: the shares from both logical blocks in the single UNC namespace]

It’s quite a cool solution.

Windows Server 2012 R2 Has RTM’d

Brad Anderson has announced that Windows Server 2012 R2 has been released to manufacturing.  He also stated:

Also of note: The next update to Windows Intune will be available at the time of GA, and we are also on track to deliver System Center 2012 R2.

The release of Windows Server 2012 R2 is set to happen on October 18th.

I’ve documented quite a few of the features related to Hyper-V in this new release.  There are some things I’ve not had time to add yet:

  • SMB 3.0/SOFS/Storage Spaces
  • Clustering
  • And a few other things where I’m unsure about the NDA

There is quite a bit of change in this release and plenty for you to digest.

Storage Spaces & Scale-Out File Server Are Two Different Things

In the past few months it’s become clear to me that people are confusing Storage Spaces and Scale-Out File Server (SOFS).  They seem to incorrectly think that one requires the other or that the terms are interchangeable.  I want to make this clear:

Storage Spaces and Scale-Out File Server are completely different features and do not require each other.

 

Storage Spaces

The concept of Storage Spaces is simple: you take a JBOD (a bunch of disks with no RAID) and unify them into a single block of management called a Storage Pool.  From this pool you create Virtual Disks.  Each Virtual Disk can be simple (no fault tolerance), mirrored (2-way or 3-way), or parity (like RAID 5 in concept).  The type of Virtual Disk fault tolerance dictates how the slabs (chunks) of each Virtual Disk are spread across the physical disks included in the pool.  This is similar to how LUNs are created and protected in a SAN.  And yes, a Virtual Disk can be spread across 2, 3+ JBODs.

Note: In WS2012 you only get JBOD tray fault tolerance via 3 JBOD trays.
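As a sketch of the pool-and-carve concept in PowerShell (pool name, virtual disk name, and size are placeholders for this example):

```powershell
# Pool every disk in the JBOD that is eligible for pooling
$disks = Get-PhysicalDisk -CanPool $true
$subsystem = Get-StorageSubSystem -FriendlyName "*Storage Spaces*"
New-StoragePool -FriendlyName "Pool1" `
    -StorageSubSystemFriendlyName $subsystem.FriendlyName `
    -PhysicalDisks $disks

# Carve a 2-way mirrored virtual disk from the pool; its slabs are
# spread across the physical disks in the pool
New-VirtualDisk -StoragePoolFriendlyName "Pool1" -FriendlyName "VDisk1" `
    -ResiliencySettingName Mirror -NumberOfDataCopies 2 -Size 500GB
```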

Storage Spaces can be used as the shared storage of a cluster (note that I did not limit this to a SOFS cluster).  For example, 2 or more (check JBOD vendor) servers are connected to a JBOD tray via SAS cables (2 per server with MPIO) instead of connecting the servers to a SAN.  Storage Spaces is managed via the Failover Cluster Manager console.  Now you have the shared storage requirement of a cluster, such as a Hyper-V cluster or a cluster running the SOFS role.

Yes, the servers in the cluster can be your Hyper-V hosts in a small environment.  No, there is no SMB 3.0 or file shares in that configuration.  Stop overthinking things – all you need to do is provide shared storage and convert it into CSV that is used as normal by Hyper-V.  It is really that simple.

Yes, JBOD + Storage Spaces can be used in a SOFS as the shared storage.  In that case, the virtual disks are active on each cluster node, and converted into CSVs.  Shares are created on the CSVs, and application servers access the shares via SMB 3.0.

Scale-Out File Server (SOFS)

The SOFS is actually an active/active role that runs on a cluster.  The cluster has shared storage between the cluster nodes.  Disks are provisioned on the shared storage, made available to each cluster node, added to the cluster, and converted into CSVs.  Shares are then created on the CSV and are made active/active on each cluster node via the active/active SOFS cluster role. 

SOFS is for application servers only.  For example, Hyper-V can store the VM files (config, VHD/X, etc.) on the SMB 3.0 file shares.  SOFS is not for end user shares; instead use virtual file servers that are stored on the SOFS.

Nowhere in this description of a SOFS have I mentioned Storage Spaces.  The storage requirement of a SOFS is cluster supported storage.  That includes:

  • SAS SAN
  • iSCSI SAN
  • Fibre Channel SAN
  • FCoE SAN
  • PCI RAID (like the Dell VRTX)
  • … and SAS attached shared JBOD + Storage Spaces

Note that I only mentioned Storage Spaces with the JBOD option.  Each of the other storage options for a cluster uses hardware RAID and therefore Storage Spaces is unsupported.

Summary

Storage Spaces works with a JBOD to provide a hardware RAID alternative.  Storage Spaces on a shared JBOD can be used as cluster storage.  This could be a small Hyper-V cluster or it could be a cluster running the active/active SOFS role.

A SOFS is an alternative way of presenting active/active storage to application servers. It requires cluster supported storage, which can be a shared JBOD + Storage Spaces.

Configuring Quorum on Storage Spaces For A 2 Node WS2012 (and WS2012 R2) Cluster

In this post I’m going to talk about building a 2 node Windows Server 2012/R2 failover cluster and what type of witness configuration to choose to achieve cluster quorum when the cluster’s storage is a JBOD with Storage Spaces.

I’ve been messing about in the lab with a WS2012 R2 cluster, in particular, a Scale-Out File Server (SOFS) running on a failover cluster with Storage Spaces on a JBOD.  What I’m discussing applies equally to:

  • A Hyper-V cluster that uses a SAS attached JBOD with Storage Spaces as the cluster storage
  • A SOFS based on a JBOD with Storage Spaces

Consider the build process of this 2 node cluster:

  • You attach a JBOD with raw disks to each cluster member
  • You build the cluster
  • You prepare Storage Spaces in the cluster and create your virtual disks

Hmm, no witness was created to break a tied vote.  In fact, what happens is that the cluster rigs the vote to ensure that there is an uneven result.  If you’ve got just 2 nodes in the cluster with no witness, then one has a quorum vote and the other doesn’t.  Imagine Node1 has a vote and Node2 does not.  Now Node1 goes offline for whatever reason.  Node2 does not have a vote and cannot achieve quorum; you don’t have a cluster until Node1 comes back online.
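You can see this vote-rigging for yourself; the NodeWeight property on each cluster node shows who currently holds a vote:

```powershell
# Show which cluster nodes currently hold a quorum vote
Get-ClusterNode | Format-Table Name, State, NodeWeight
```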

There are 2 simple solutions to this:

1) Create A File Share Witness

Create a file share on another highly available file server – uh … that’ll be an issue for small/medium businesses because all the virtual machines (including the file server) were going to be stored on the JBOD/Storage Spaces.  You can configure the file share as a witness for the cluster.

2) (More realistically) Create a Storage Spaces Virtual Disk As A Witness Disk

Create a small virtual disk (2-way or 3-way mirror for JBOD fault tolerance) and use that disk as the witness disk for quorum.  A 1 GB disk will do; the smallest my Storage Spaces implementation would allow was 5 GB, but that’s still a tiny amount.  This solution is pretty much what you’d do in a single-site cluster with traditional block storage.
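A sketch of that option, assuming the pool is called Pool1 and the new disk surfaces as a cluster resource named “Cluster Virtual Disk (Witness)”; the usual initialise/format/add-to-cluster steps are elided:

```powershell
# A small mirrored virtual disk from the pool, purely for quorum
New-VirtualDisk -StoragePoolFriendlyName "Pool1" -FriendlyName "Witness" `
    -ResiliencySettingName Mirror -NumberOfDataCopies 2 -Size 1GB

# ... initialise it, add it to the cluster, then make it the witness:
Set-ClusterQuorum -NodeAndDiskMajority "Cluster Virtual Disk (Witness)"
```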

We could go crazy talking about quorum options in cluster engineering.  I’ve given you 2 simple options, with the virtual disk as a witness being the simplest.  Now each node has a vote for quorum with a witness to break the vote, and the cluster can survive either node failing.

WS2012 Hyper-V Networking On HP Proliant Blades Using Just 2 Flex Fabric Virtual Connects

On another recent outing I got to play with some Gen8 HP blade servers.  I was asked to come up with a networking design where (please bear in mind that I am not a h/w guy):

  • The blades would have a dual port 10 Gbps mezzanine card that appeared to be doing FCoE
  • There were 2 Flex Fabric virtual connects in the blade chassis
  • They wanted to build a WS2012 Hyper-V cluster using Fibre Channel storage

I came up with the following design:

The 2 FCoE (I’m guessing that’s what they were) adapters were each given a static 4 Gbps slice of the bandwidth from each Virtual Connect (2 * 4 Gbps), which would match 4 Gbps Fibre Channel (FC).  MPIO was deployed to “team” the FC HBAs.

One Ethernet NIC was presented from each Virtual Connect to each blade (2 per blade), with each NIC getting 6 Gbps.  WS2012 NIC teaming was used to team these NICs, and then we deployed a converged networks design in WS2012 using virtual NICs and QoS to dynamically carve up the bandwidth of the virtual switch (attached to the NIC team).
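A sketch of that converged design in PowerShell; the NIC names and the QoS weights are illustrative placeholders, not a recommendation:

```powershell
# Team the two 6 Gbps NICs and attach a virtual switch that uses
# weight-based minimum bandwidth QoS
New-NetLbfoTeam -Name "ConvergedTeam" -TeamMembers "NIC1", "NIC2"
New-VMSwitch -Name "ConvergedSwitch" -NetAdapterName "ConvergedTeam" `
    -MinimumBandwidthMode Weight -AllowManagementOS $false

# Host virtual NICs, each guaranteed a share of bandwidth under contention
"Management", "LiveMigration", "Cluster" | ForEach-Object {
    Add-VMNetworkAdapter -ManagementOS -Name $_ -SwitchName "ConvergedSwitch"
}
Set-VMNetworkAdapter -ManagementOS -Name "Management" -MinimumBandwidthWeight 10
Set-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -MinimumBandwidthWeight 30
Set-VMNetworkAdapter -ManagementOS -Name "Cluster" -MinimumBandwidthWeight 10
```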

Some testing was done and we were running Live Migration at a full 6 Gbps, moving a 35 GB RAM VM via TCP/IP Live Migration in 1 minute and 8 seconds.

For WS2012 R2, I’d rather have 2 * 10 GbE for the 2 cluster & backup networks and 2 * 1 or 10 GbE for the management and VM network.  If the Virtual Connect allowed it (I didn’t have the time to check), I might have tried the below.  This would reduce the demands on the NIC team (actual VM traffic is usually light, but an assessment is required to determine that) and allow an additional 2 non-teamed NICs:

Leaving the 2 new NICs (running at 4 Gbps) non-teamed leaves open the option of using SMB 3.0 storage (without RDMA/SMB Direct) on a Scale-Out File Server.  However, the big plus of SMB 3.0 Multichannel would be that I would now have a potential 8 Gbps to use for Live Migration via SMB 3.0 Open-mouthed smile But this is assuming that I could carve up the networking like this via Virtual Connects … and I don’t know if that is actually possible.

ODX – Not All SANs Are Created Equal

I recently got to play with a very expensive Fibre Channel SAN for the first time in a while (I normally only see iSCSI or SAS in the real world).  This was a chance to play with WS2012 Hyper-V on this SAN, and this SAN supported Offloaded Data Transfer (ODX).

Put simply, ODX is a SAN feature that allows Windows to offload certain file operations to the SAN, such as:

  • Server to server file transfer/copy
  • Creating a VHD file

The latter was of interest to me, because this should accelerate the creation of a fixed VHD/X file, making (self-service) clouds more responsive.

The hosts were fully patched, both hotfixes and update rollups.  Yes, that includes the ODX hotfix that is bundled into the May clustering bundle.  We created a 60 GB fixed size VHDX file … and it took as long as it would without ODX.  I was afraid of this.  The manufacturer of this particular SAN has … a certain reputation for being stuck in the time dilation of an IT black hole since 2009.
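When a result disappoints like that, it’s worth confirming that Windows itself still has ODX turned on before blaming the SAN; the FilterSupportedFeaturesMode registry value controls this (0 = enabled, the default):

```powershell
# 0 = ODX enabled (the default), 1 = ODX disabled
Get-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem" |
    Select-Object FilterSupportedFeaturesMode
```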

If you’re planning on making use of ODX then you need to understand that this isn’t like jumping from 1 Gbps to 10 Gbps, where there’s a predictable 10x improvement.  Far from it; the performance of ODX on one vendor’s top-end SAN can be very different from another’s.  Two of my fellow Hyper-V MVPs have done a good bit of work looking into this stuff.

Hans Vredevoort (@hvredevoort) tested the HP 3PAR P10000 V400 with HP 3PAR OS v3.1.2.  With ODX enabled (it is by default on the SAN and in WS2012), creating a pretty regular 50 GB VHDX went from an unenhanced 6.5 minutes down to 2.5 minutes.  A 1 TB VHDX, however, still took 33 minutes with ODX enabled.

Didier Van Hoye (@workinghardinit) decided to experiment with his Dell Compellent.  Didier created 10 * 50 GB VHDX files and 10 * 475 GB fixed VHDX files in 42 seconds.  That was 5.12 TB of files created nearly 2 minutes faster than the 3PAR could create a single 50 GB VHDX file.  Didier has understandably gone on a video recording craze showing off how this stuff works.  Here is his latest.  Clearly, the Compellent rocks where others waltz.

These comparisons reaffirm what you should probably know: don’t trust the whitepapers, brochures, or sales-speak from a manufacturer.  Evidently not all features are created equally.

The Number 1 Support Call For WS2012 R2 Hyper-V Will Be …

… How do I enable remote desktop (Enhanced Session Mode) into a virtual machine on Windows Server 2012 R2 Hyper-V?

I just set up a host for the first time in a while and was trying to connect to a VM, wondering why the hell it wasn’t working.  I checked the ICs in the guest OS, I patched, I rebooted … and then I realised that I was being an idiot.

Enhanced Session Mode is enabled by default in Windows 8.1 and it is off by default on Windows Server 2012 R2.  I turned it on and got the experience I expected.
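The fix is a single host setting, either in the GUI (Hyper-V Settings > Enhanced Session Mode Policy) or in PowerShell:

```powershell
# Off by default on Windows Server 2012 R2, on by default in Windows 8.1
Set-VMHost -EnableEnhancedSessionMode $true
```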

You’d think an MVP would realise this.  Another well known MVP just confessed to me on Skype that he made the exact same mistake Smile

Windows Server 2012 R2, System Center 2012 R2, Windows 8.1, and Windows Intune Release Dates

I was on vacation for a few days but, as was predicted by some in the media, the release dates of WS2012 R2, System Center 2012 R2, Windows 8.1, and Intune “Wave E” were announced during the week in two announcements, one for the desktop and one for the server & cloud products.

Windows 8.1 will be available online through the Windows Store to Windows 8 customers starting at 00:00 New Zealand time on October 18th – I think that is midday UK/Irish time on October 17th.  October 18th is the GA date, so that’s when you should be able to walk into stores and buy devices with Windows 8.1 already on them.  Ideally those will be designed-for-Windows 8.1 devices.  However, the Windows 8 release was underwhelming in retail stores around the world so I’m not holding my breath this time around – screw the political correctness, the manufacturers (including Microsoft Surface) did a shite job for the release of Windows 8.  The new devices listed in the Windows 8.1 announcement are already on the market (some less so than others).

There is no news of TechNet & MSDN release dates for Windows 8.1 but I suspect Windows 8.1 will be made available universally on Oct 18th.  That’s because that is also the release plan for WS & SC 2012 R2 and Windows Intune “Wave E”.  Everything is happening all at once on Oct 18th.

Note that new VL purchases will be possible on November 1st when the price list is updated.