A Converged Networks Design For Hyper-V On SMB 3.0 Storage With RDMA (SMB Direct)

When you are done reading this post, see the update that I added for SMB Live Migration on Windows Server 2012 R2 Hyper-V.

Unless you’ve been hiding under a rock for the last 18 months, you’ll know that Windows Server 2012 (WS2012) Hyper-V (and IIS and SQL Server) supports storing content (such as virtual machines) on SMB 3.0 (WS2012) file servers (and scale-out file server active/active clusters).  The performance of this goes from matching/slightly beating iSCSI on 1 GbE, to crushing Fibre Channel on 10 GbE or faster.

Big pieces of this design are SMB Multichannel (think simple, configuration-free & dynamic MPIO for SMB traffic) and SMB Direct (RDMA – low latency and low CPU impact with non-TCP SMB 3.0 traffic).  How does one network this design?  RDMA is the driving force in the design.  I’ve talked to a lot of people about this topic over the last year.  They normally overthink the design, looking for solutions to problems that don’t exist.  In my core market, I don’t expect lots of RDMA and InfiniBand NICs to appear.  But I thought I’d post how I might do a network design.  iWARP was in my head for this because I’m hoping I can pitch the idea for my lab at the office.

[Diagram: the converged network design, with the Hyper-V hosts on the left and the SOFS on the right]

On the left we have 1 or more Hyper-V hosts.  There are up to 64 nodes in a cluster, and potentially lots of clusters connecting to a single SOFS – not necessarily 64 nodes in each!

On the right, we have between 2 and 8 file servers that make up a Scale-Out File Server (SOFS) cluster with SAS-attached (SAN or JBOD/Storage Spaces) or Fibre Channel storage.  More NICs would be required for iSCSI storage for the SOFS, probably using physical NICs with MPIO.

There are 3 networks in the design:

  • The Server/VM networks.  They might be flat, but in this kind of design I’d expect to see some VLANs.  Hyper-V Network Virtualization might be used for the VM Networks.
  • Storage Network 1.  This is one isolated and non-routed subnet, primarily for storage traffic.  It will also be used for Live Migration and Cluster traffic.  It’s 10 GbE or faster and it’s already isolated so it makes sense to me to use it.
  • Storage Network 2.  This is a second isolated and non-routed subnet.  It serves the same function as Storage Network 1.

Why 2 storage networks, ideally on 2 different switches?  Two reasons:

  • SMB Multichannel: It requires each multichannel NIC to be on a different subnet when connecting to a clustered file server, which includes the SOFS role.
  • Reliable cluster communications: I have 2 networks for my cluster communications traffic, servicing my cluster design need for a reliable heartbeat.

The NICs used for the SMB/cluster traffic are NOT teamed.  Teaming does not work with RDMA.  Each physical rNIC has its own IP address on the relevant (isolated and non-routed) storage subnet.  These NICs do not go through the virtual switch, so the easy per-vNIC QoS approach I’ve mostly talked about is not applicable.  Note that RDMA is not TCP.  This means that when an SMB connection streams data, the OS packet scheduler cannot see it.  That rules out OS Packet Scheduler QoS rules.  Instead, you will need rNICs that support Data Center Bridging (DCB), and your switches must also support DCB.  You basically create QoS rules on a per-protocol basis and push them down to the NICs to allow the hardware (which sees all traffic) to apply QoS and SLAs.  This also has the side benefit of lower CPU utilization.
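
As a sketch, the per-protocol DCB configuration on a host might look like this (the 802.1p priority value, bandwidth percentage, and NIC names are assumptions – match them to your switch configuration):

```powershell
# Classify SMB Direct (RDMA) traffic into 802.1p priority 3
New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

# Enable Priority Flow Control for the SMB priority only
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

# Guarantee a minimum bandwidth share for the SMB traffic class
New-NetQosTrafficClass "SMB" -Priority 3 -Algorithm ETS -BandwidthPercentage 50

# Apply the DCB settings to the rNICs
Enable-NetAdapterQos -Name "rNIC1"
Enable-NetAdapterQos -Name "rNIC2"
```

The same priority and bandwidth settings must be configured on the DCB-capable switches for the end-to-end SLA to hold.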

Note: SMB traffic is restricted to the rNICs by using the constraint option.
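
A hedged example of that constraint (the file server and interface names are made up):

```powershell
# Pin SMB traffic destined for the SOFS to the two rNICs only
New-SmbMultichannelConstraint -ServerName "SOFS1" -InterfaceAlias "rNIC1","rNIC2"
```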

In the host(s), the management traffic does not go through the rNICs – they are isolated and non-routed.  Instead, the Management OS traffic (monitoring, configuration, remote desktop, domain membership, etc) all goes through the virtual switch using a virtual NIC.  Virtual NIC QoS rules are applied by the virtual switch.
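
A minimal sketch of that management vNIC and its QoS weight (the switch name and weight value are assumptions, and the virtual switch must have been created with -MinimumBandwidthMode Weight):

```powershell
# Create a Management OS vNIC on the converged virtual switch
Add-VMNetworkAdapter -ManagementOS -Name "Management" -SwitchName "ConvergedSwitch"

# Guarantee the management traffic a minimum share of the switch bandwidth
Set-VMNetworkAdapter -ManagementOS -Name "Management" -MinimumBandwidthWeight 10
```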

In the SOFS cluster nodes, management traffic will go through a traditional (WS2012) NIC team.  You probably should apply per-protocol QoS rules on the management OS NIC for things like remote management, RDP, monitoring, etc.  OS Packet Scheduler rules will do because you’re not using RDMA on these NICs and this is the cheapest option.  Using DCB rules here can be done but it requires end-to-end (NIC, switch, switch, etc, NIC) DCB support to work.
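
For example, an OS Packet Scheduler rule to cap RDP traffic on the SOFS management team might look like this (the rate is an assumption – tune it to your needs):

```powershell
# Software QoS via the OS Packet Scheduler - no DCB hardware required on this path
New-NetQosPolicy "RDP" -IPDstPortMatchCondition 3389 -ThrottleRateActionBitsPerSecond 100MB
```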

What about backup traffic?  I can see a number of options.  Remember: with SMB 3.0 traffic, the agent on the hosts causes VSS to create a coordinated VSS snapshot, and the backup server retrieves backup traffic from a permission controlled (Backup Operators) hidden share on the file server or SOFS (yes, your backup server will need to understand this).

  1. Dual/Triple Homed Backup Server: The backup server will be connected to the server/VM networks.  It will also be connected to one or both of the storage networks, depending on how much network resilience you need for backup, and what your backup product can do.  QoS (DCB) rules will be needed for the backup protocol(s).
  2. A dedicated backup NIC or team: A single or teamed physical NIC will be used for backup traffic on the hosts and SOFS nodes.  No QoS rules are required for backup traffic because it is alone on the subnet.
  3. Create a backup traffic VLAN, trunk it through to a second vNIC (bound to the VLAN) in the hosts via the virtual switch.  Apply QoS on this vNIC.  In the case of the SOFS nodes, create a new team interface and bind it to the backup VLAN.  Apply OS Packet Scheduler rules on the SOFS nodes for management and backup protocols.
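
Option 3 on a Hyper-V host might be sketched like this (the switch name, VLAN ID, and weight are assumptions):

```powershell
# A second Management OS vNIC dedicated to backup traffic
Add-VMNetworkAdapter -ManagementOS -Name "Backup" -SwitchName "ConvergedSwitch"

# Bind the vNIC to the backup VLAN that is trunked to the host
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "Backup" -Access -VlanId 100

# Apply a QoS weight to the backup vNIC
Set-VMNetworkAdapter -ManagementOS -Name "Backup" -MinimumBandwidthWeight 20
```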

With this design you get all the connectivity, isolation, and network path fault tolerance that you might have needed with 8 NICs plus fiber channel/SAS HBAs, but with superior storage performance.  QoS is applied using DCB to guarantee minimum levels of service for the protocols over the rNICs.

In reality, it’s actually a simple design.  I think people overthink it, looking for a NIC team or protocol connection process for the rNICs.  None of that is actually needed.  You have 2 isolated networks, and SMB Multichannel figures it out for itself (it makes MPIO look silly, in my opinion).

The networking chapter of Windows Server 2012 Hyper-V Installation And Configuration Guide goes from the basics through to the advanced steps of understanding these concepts and implementing them.

KB2836402 – You Cannot Add VHD/X Files To Hyper-V VMs On WS2012

Microsoft released a hotfix for when you cannot add VHD or VHDX files to Hyper-V virtual machines in Windows Server 2012.

Symptoms

Consider the following scenario:

  • You create some failover cluster nodes on computers that are running Windows Server 2012.
  • You have the Hyper-V server role installed on the cluster nodes.
  • You create virtual machines on one cluster node, and you configure the virtual machines as cluster resources.
  • You create multiple Cluster Shared Volume (CSV) resources and create one Virtual Hard Disk (VHD) file in each CSV.
  • You use Hyper-V Manager to try to add the VHD files to the virtual machines.

In this scenario, you cannot add the VHD files to the virtual machines. Additionally, you receive an error message that resembles the following:

Error applying Hard Drive changes
'virtual machine' failed to add resources to 'virtual machine'.
Cannot add 'C:\ClusterStorage\Volume3\Test3.vhdx'. The disk is already connected to the virtual machine 'virtual machine'. (Virtual machine ID virtual machine ID)
'virtual machine' failed to add resources. (Virtual machine ID virtual machine ID)
Cannot add 'C:\ClusterStorage\Volume3\Test3.vhdx'. The disk is already connected to the virtual machine 'virtual machine'. (Virtual machine ID virtual machine ID)

Cause

This issue occurs because multiple CSV volumes have the same 0000-0000 serial number. Therefore, the VHD files on different volumes are recognized as the same file.

A supported hotfix is available from Microsoft.

KB2838669 – A Big Hotfix Bundle For WS2012 Failover Clustering

The Failover Clustering group also released a big update today.  It solves a range of issues.

Issue 1
Consider the following scenario:

  • You have the Hyper-V server role installed on a Windows Server 2012-based file server.
  • You have lots of virtual machines on a Server Message Block (SMB) share.
  • Virtual hard disks are attached to an iSCSI controller.

In this scenario, you cannot access the iSCSI controller.

Issue 2
Consider the following scenario:

  • You have a two-node failover cluster that is running Windows Server 2012.
  • The cluster is partitioned.
  • There is a Cluster Shared Volume (CSV) on a cluster node, and a quorum resource on the other cluster node.

In this scenario, the cluster becomes unavailable.

Note This issue can be temporarily resolved by restarting the cluster.

Issue 3
Assume that you set up an SMB connection between two Windows Server 2012-based computers. The hardware on the computers does not support Offloaded Data Transfer (ODX). In this situation, the SMB session is closed unexpectedly.

Issue 4
Consider the following scenario:

  • You have a Windows Server 2012-based failover cluster.
  • You have a virtual machine on a CSV volume on the cluster.
  • You try to create a snapshot of the virtual machine. However, the snapshot creation stalls, so the snapshot set is aborted.
  • During the abort of the snapshot, the CSV volume is deleted after the snapshot shares are deleted.

In this scenario, the abort process pauses because of an error that occurs on the cluster.

Issue 5
Assume that you have a Windows Server 2012-based failover cluster. Two specific snapshot state change requests are sent from disk control manager to CSV proxy file system (CSVFS). The requests are present in the same message. In this situation, disk control manager is out-of-sync with CSVFS.

Issue 6
Assume that you create a snapshot for a CSV volume on a Windows Server 2012-based failover cluster. When the snapshot creation is still in progress, another snapshot creation is requested on the same CSV volume. In this situation, the snapshot creation fails and all later snapshot creation attempts on the CSV volume fail.

Note You cannot create a snapshot for the CSV volume until the volume fails over or the volume goes offline and then back online.

Additionally, the update also resolves the issues that are described in the following Microsoft Knowledge Base (KB) articles:

  • KB2799728: Virtual machine enters a paused state or a CSV volume goes offline when you try to create a backup of the virtual machine on a Windows Server 2012-based failover cluster
  • KB2801054: VSS_E_SNAPSHOT_SET_IN_PROGRESS error when you try to back up a virtual machine in Windows Server 2012
  • KB2796995: Offloaded Data Transfers fail on a computer that is running Windows 8 or Windows Server 2012
  • KB2813630: Virtual machine enters a paused state or a CSV volume goes offline when you try to create a backup of the virtual machine on a Windows Server 2012-based failover cluster
  • KB2824600: Virtual machine enters a paused state or goes offline when you try to create a backup of the virtual machine on a CSV volume in Windows Server 2012

A supported hotfix is available from Microsoft.

KB2836988 – May 2013 Update Rollup For WS2012 and Win8

It was a busy day for Microsoft releasing hotfixes today.  This includes another UR bundle of hotfixes.  There are a few in here that are relevant to Hyper-V, etc.  As usual, this update rollup is available via the Windows Catalog (WSUS, etc).

KB2836121 – An update for Storage Spaces in Windows 8 and Windows Server 2012 is available

This article describes an update for Storage Spaces in Windows 8 and Windows Server 2012. After you install the update, Storage Spaces will prioritize regeneration. Specifically, the regeneration time is shorter, but the available bandwidth for the regular I/O buffer is decreased.

KB2833586 – Virtual machine does not come online after you add a pass-through disk to the virtual machine in Windows Server 2012

Consider the following scenario:

  • You have a Windows Server 2012-based failover cluster. 
  • A Hyper-V server role is installed on the cluster node.
  • You create a virtual machine on the cluster node, and then you configure the virtual machine as a cluster resource.
  • You add a pass-through disk to the virtual machine by using Windows Management Instrumentation (WMI).

In this scenario, the virtual machine does not come online.
Note If you add the pass-through disk by using the Failover Cluster Management snap-in, the virtual machine does come online.

IP Assignment Strategies For Hyper-V Replica

What I love about Hyper-V Replica is that (a) it is free, (b) it just works, and (c) it works for a wide variety of customers/partners (large and small).  It’s great that you can get your VMs from site A operational in site B with a maximum RPO of 5 minutes and an RTO of however long it takes to orchestrate the start of your VMs (anything from seconds upwards, depending on how many VMs you have to start and in what order).  But one question remains – how do I address those VMs in the DR site?

Stretched Subnets

I am not a networking guy.  The term I know is stretched VLANs, but network folks have other mechanisms for this.  Basically, the concept is that you enable your subnets to reside and route in both the primary and secondary sites.  That means a VM with the address 192.168.1.20 can operate, route, and be accessible to clients (from anywhere) in either site.  That’s great for networks of a certain size.  Small businesses probably can’t do this, and larger enterprises look at the complexity and laugh.

IP Address Injection

With this approach, the Hyper-V administrator pre-configures DR site IP addresses for the VM.  The address is injected into the VM during failover using Key Value Pairs (KVP).  This allows site A and site B to have different IP ranges.  This solution will work pretty well for smaller customers where they own both the primary and the secondary sites.
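
The failover addresses can be pre-configured per virtual NIC on the replica host; a sketch with made-up names and addresses:

```powershell
# Pre-configure the DR site address that will be injected into the VM on failover
Set-VMNetworkAdapterFailoverConfiguration -VMName "VM01" `
    -IPv4Address 10.10.1.20 -IPv4SubnetMask 255.255.255.0 `
    -IPv4DefaultGateway 10.10.1.1 -IPv4PreferredDNSServer 10.10.1.10
```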

DHCP

I hate using DHCP addresses for static resources like servers (including VMs).  But you can do it.  With this approach, you have DHCP in the primary site to assign reserved IPs to the VMs in the primary site.  You have something similar in the secondary site, but with a scope that is suitable for there.  Note, you must use static MAC addresses for reservations to work – so be sure to use export/import to move VMs out of band.  This is the one solution that I have the least faith in.  You might want to look at WS2012 DHCP failover to ensure your DHCP is highly available because it has become a very important factor in your business continuing to operate.
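
A sketch of fixing a VM’s MAC address so a DHCP reservation can match it (the VM name and MAC value are made up):

```powershell
# A dynamic MAC could change after export/import, which would break the reservation
Set-VMNetworkAdapter -VMName "VM01" -StaticMacAddress "00155D010A01"
```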

Hyper-V Network Virtualization (HNV)

HNV, or software-defined networking (SDN), is a very scalable solution.  It also allows VMs to operate with their normal IP addresses (consumer addresses) while really communicating with the physical network via provider addresses.  The VM simply moves/starts on a predefined VM Network in the DR site and continues to communicate.  For this to work in production, you need VMM 2012 SP1 and a network virtualization gateway (see Iron Networks; F5 also have something coming).

This solution is a nice one for large enterprises that want to use SDN to abstract networks from a central console.  It also allows service providers to support many tenants with overlapping subnets (192.168.1.0/24 or 10.0.0.0).

OK, great, so we get VMs operational in the DR site.  Some of these solutions require the VM to change IP address while some don’t.  If the IP changes, how do clients find the servers?  DNS will be out of date!

DNS TTL

You can reduce the TTL for the A records of your VMs to something small.  If there’s a disaster, is it a big deal if clients can’t resolve the names of servers for 5 minutes?  Keep in mind DNS replication to other sites – so this might become 15, 20, or 60 minutes, depending on TTLs and replication windows.  You can force replication to happen and flush DNS server caches, but those are manual tasks (and prone to not happening in a disaster).
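
A hedged sketch of dropping an A record’s TTL with the WS2012 DnsServer module (the zone and host names are examples):

```powershell
# Fetch the current A record, clone it, and lower the TTL to 5 minutes
$old = Get-DnsServerResourceRecord -ZoneName "corp.local" -Name "app01" -RRType A
$new = $old.Clone()
$new.TimeToLive = [TimeSpan]::FromMinutes(5)
Set-DnsServerResourceRecord -ZoneName "corp.local" -OldInputObject $old -NewInputObject $new
```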

IP Address Abstraction

Imagine this scenario: a large corporate has an offsite data centre.  The business operates across a WAN.  A DR data centre is deployed, also offsite.  Network appliances are deployed and configured to abstract the actual IP addresses of the servers.  This allows servers to use IP-A in site A and IP-B in site B.  However, the servers are known to the network via IP-C, the abstracted IP managed by the appliances.  This solution is for the very largest of businesses.  For clients on the WAN, DNS is simple: there is only one A record and it’s for IP-C, the abstracted IP.

Personally, I find SDN to be the most elegant solution but there are requirements of scale to make it work.  For the smaller biz, maybe DHCP or IP address injection are the way forward.  There are options – it is up to you to choose the right one.  And I am certainly not going to claim that I have presented all options.

You can learn more about Hyper-V DR and Hyper-V Replica from the two chapters on those subjects in the book, Windows Server 2012 Hyper-V Installation And Configuration Guide.

KB2698666 – Can’t Open Hyper-V Settings On W2008 R2 SP1 Hyper-V

Microsoft has released a hotfix for Windows Server 2008 R2 SP1 Hyper-V for when you cannot open the Hyper-V Settings dialog box on a Hyper-V host.

Symptoms

Consider the following scenario:

  • You enable Microsoft RemoteFX on a Hyper-V server that is running Windows Server 2008 R2 Service Pack 1 (SP1).
  • You create a long logon banner on the server by changing the Group Policy settings.
  • You connect to the server by using a Remote Desktop Protocol (RDP) connection.
  • You try to open the Hyper-V Settings dialog box.

In this scenario, you cannot open the Hyper-V Settings dialog box.

Cause

This issue occurs because a worker session cannot display or dismiss long logon banners when RemoteFX is enabled.

A supported hotfix is available from Microsoft.

Tip: Legally Deploying Windows Images To OEM Licensed PCs

As usual, I will not be answering licensing questions.  All emails and comments will be deleted without a response.  Please ask your reseller these questions instead – that’s why they add a margin to the license when they sell it to you, so make them work for it.

You cannot legally deploy an image of an OEM media installation of Windows.  According to a Microsoft licensing brief:

Organizations do not have the right to reimage by using OEM media.

An OEM image can only be preloaded on a PC by the OEM during manufacturing. An image can be individually recovered by the organization (or a service provider it chooses) by using the recovery media. The OEM recovery media should match the product version originally preinstalled on the system; no other image can be used to restore the system to its original state.

That means a company that buys hundreds or thousands of PCs, intent on using the OEM license, cannot create a custom image from OEM media (assuming OEM media can even be acquired!).  Businesses hate OEM builds because they are full of crap-ware and unmanaged security vulnerabilities.  So what can you do to re-image these PCs?  Do you need to buy a VL for every single machine?  There are benefits to doing that, especially with SA attached, but that’s not for everyone.

There is a little known legal trick that you can apply.  According to Microsoft:

Reimaging is the copying of software onto multiple devices from one standard image. Reimaging rights are granted to all Microsoft Volume Licensing customers. Under these rights, customers may reimage original equipment manufacturer (OEM) or full packaged product (FPP) licensed copies using media provided under their Volume Licensing agreement.

These finer points are detailed in the licensing brief.

Basically:

  • Say you buy 2,000 PCs and want to use their OEM licensing for Windows 7/8 Pro
  • You want to deploy a custom build/image to these machines
  • You buy a single volume license for Windows 8 Pro (includes downgrade rights)
  • You use the MAK/KMS key to create and deploy an image of Windows 7/8 Pro
  • You’re legit!

You must be sure that you understand:

  • The OEM and the VL license must be the same edition, e.g. you cannot deploy a Pro VL image to Home OEM licensed PCs using this licensing technique.
  • You must ensure that the versions are matched, e.g. the OEM license entitles you to Windows 7 (including downgrades) if deploying Windows 7 images.  For example, you can’t deploy a Windows 7 VL image to a PC with a Windows Vista OEM sticker/license using this licensing technique.
  • The languages must be matched as well.

What if your company does not have a VL agreement?  You need to buy 5 products to start one.  You can buy a single copy of Windows (to get the ISO download and MAK/KMS keys) and 4 cheap dummy CALs – now you have a VL agreement at minimum cost, and you can re-image your OEM-licensed PCs with an image made from your VL media.

Windows 8 Sales Below The Norm, But Will They Spike in FY2014?

A few people have started to figure out that Windows 8 sales are down.  It’s clear that retail sales are down.  Annuity licensing agreements (you always buy the latest volume license version and can choose to downgrade, e.g. Windows 7) and the cheap upgrade offers have boosted numbers, but the 20 million/month norm appears to have slid quite a bit.

I’m not going to get into the whys of this; that’s been talked to death.

I am wondering if we will see a reverse course caused by business customers.  Windows XP end of support is coming in April of next year.  Businesses, who have mostly clung to Windows XP like a zombie Charlton Heston grips his gun, are starting to look at upgrades to Windows 7. 

There is a general misunderstanding with enterprise licensing.  Not every company has an annuity agreement such as OVS (SMEs) or an Enterprise Agreement (EA – larger enterprises) that includes Software Assurance (SA – one of the benefits is upgrade rights).  And even if they do, they will be choosy about what is included: maybe they’ll include a Core CAL or an Enterprise CAL for servers, but they won’t get licensing for the desktop OS.  That might be because they’ve been happy with Windows XP and stuck with the OEM license that came with the PC.  Some of those PCs have Windows 7 stickers (licenses) and some don’t.  My experience is that business PCs hang around for a lot longer than most retail PCs, well after their hardware support expires.

Let’s summarise for a moment:

  • Businesses are using Windows XP, and XP end of support is April 2014, making XP a security risk to the business.
  • Businesses that do have annuity licensing agreements don’t necessarily have licensing for Windows 7.

That means they need Windows 7 licensing for those machines not covered.  They’ll likely get that through volume licensing.  As I said earlier, you can’t buy a legacy version of Windows via VL.  You always buy the latest version (Windows 8 at the moment) and choose to downgrade (e.g. Windows 7).  The estimate is that somewhere around half of business PCs are running Windows XP.  If a significant percentage of those PCs upgrade to Windows 7 (really Windows 8, license-wise) then we could see a big spike in license sales in the coming year.  Microsoft uses the EA Sports calendar, and their new financial year starts in July, therefore we could see big Windows 8 sales from the enterprise then.

And yes, if “Windows 8.1” includes certain features, it could help both consumer and business adoption of “Windows 8.1” (and therefore “Windows 8” sales) in FY14.

The New Adobe Cloud Distribution Model – The Good & The Bad

You might have noticed a lot of comments about Adobe in the cloud over the last few days.  What’s happening is that Adobe has launched a new way to buy Adobe software.  You can buy it direct online.  Or you can buy a card in a store (it must be activated by the till) and download the software.  I knew Adobe was making a change quite a while ago.  I knew they were very serious about it.  And I like it – for the most part.

I’m a pretty serious photographer.  I’m far from the best, but I take it seriously as a way to get away from work stuff.  But just like in work, when I shoot and process, I like to do it right.  I use Adobe software to edit my photos (all photos, even Ansel Adams’ classic b&w’s, are edited in some way).  And the serious Adobe software is expensive.

Imagine this:

  • You have a Canon 50D.
  • You buy some expensive Adobe software that allows you to convert/edit RAW photos from your camera.
  • You are a happy customer for 12-18 months.
  • Adobe launches a new version (Y) of the software but you don’t need it because version X is just fine as it is.
  • You upgrade to a Canon 60D that has a new RAW format.
  • Whoops!  Version X doesn’t have support for the new RAW format and you have to buy version Y to edit your photos.

We can blame Adobe for not upgrading version X to edit the RAWs from the 60D.  But here’s a cold reality: Adobe is a business that is there to make a profit.  If they continued to support older products then no one would ever buy the new software.  Therefore all their efforts at research and development in new versions would be loss making and Adobe would go out of business.

The switch to cloud distribution changes things.  The really serious graphics editors can subscribe to things like the Creative Cloud suite.  That’s one serious mama-jamma of a package.  Maybe you like Photoshop CS but are put off by the huge price tag?  And that nasty price tag might seem worse when you consider the short life of your product if you change camera bodies every 18-24 months.  Well, have a look at Photoshop through the cloud.  You pay a modest monthly fee (rates vary between month-to-month, annual agreement, and CS3+ upgrade options), you get upgrades, and 20 GB of online storage.

That makes the cloud distribution model look very very nice.

There is a fly in the ointment.  Adobe’s currency calculator must be broken.  The price for Photoshop per month (annual commitment) is (conversion based on pricing on 6/May/2013):

  • US Dollars: $19.99
  • Euro: EUR24.59 ($32.15)
  • UK Pound: £17.58 ($27.32)

That makes the Euro price for Adobe Photoshop through the cloud:

  • 60.8% more expensive than the US price
  • 17.7% more expensive than the UK price

No taxation can be blamed for that price difference.  If retail distribution had a place here, then we could blame that … but one of the perks of cloud computing is that there is a uniform distribution cost – it’s the same data center.  Something doesn’t smell right to me.

If Adobe fixes this stinker then I’m all in favour of the switch.  Until then – I have a problem with it, just like I did with Windows Intune pricing until it was fixed in “Wave D”.

Microsoft Infrastructure-as-a-Service Product Line Architecture Guidance

Microsoft has released guidance on how to design and manage IaaS clouds using Windows Server 2012 Hyper-V, storage, networking, and System Center 2012 SP1.

Infrastructure-as-a-Service Product Line Architecture Fabric Architecture Guide

This document provides customers with the necessary guidance to develop solutions for a Microsoft private cloud infrastructure in accordance with the IaaS PLA patterns that are identified for use with the Windows Server 2012 operating system. This document provides specific guidance for developing fabric architectures (compute, network, storage, and virtualization layers) of an overall private cloud solution.


Infrastructure-as-a-Service Product Line Architecture Fabric Management Architecture Guide

This document provides customers with the necessary guidance to develop solutions for a Microsoft private cloud infrastructure in accordance with the IaaS PLA patterns that are identified for use with Windows Server 2012 and System Center 2012 Service Pack 1 (SP1). This document provides specific guidance for developing a management architecture for an overall private cloud solution.
