Windows Server 2012 Hyper-V Storage Strategies

This article was written just after the beta of WS2012 was launched.  We now know that the performance of SMB 3.0 is -really- good, e.g. 1 million IOPS from a VM good.

WS2012 is bringing a lot of changes in how we design storage for our Hyper-V hosts.  There’s no one right way, just lots of options, which give you the ability to choose the right one for your business.

There were two basic deployments in Windows Server 2008 R2 Hyper-V, and they’re both still valid with Windows Server 2012 Hyper-V:

  • Standalone: The host had internal disk or DAS and the VMs that ran on the host were stored on this disk.
  • Clustered: You required a SAN that was either SAS, iSCSI, or Fibre Channel (FC) attached.


And there’s the rub.  Everyone wants VM mobility and fault tolerance.  I’ve talked about some of this in recent posts.  Windows Server 2012 Hyper-V has Live Migration that is independent of Failover Clustering.  Guest clustering was limited to iSCSI in Windows Server 2008 R2 Hyper-V, but Windows Server 2012 Hyper-V is adding support for Virtual Fibre Channel.

Failover Clustering is still the ideal.  Whereas Live Migration gives proactive migration (move workloads before a problem, e.g. to patch a host), Failover Clustering provides high availability via reactive migration (move workloads automatically in response to a problem, e.g. host failure).  The problem here is that a cluster requires shared storage.  And that has always meant expensive iSCSI, SAS, or FC attached storage.

Expensive?  To whom?  Well, to everyone.  For most SMEs that buy a cluster, the SAN is probably the biggest IT investment that the company will ever make.  Wouldn’t it suck if they got it wrong, or if they had to upgrade/replace it in 3 years?  What about the enterprise?  They can afford a SAN.  Sure, but their storage requirements keep growing and growing.  Storage is not cheap (don’t dare talk to me about $100 1 TB drives).  Enterprises are sick and tired of being held captive by the SAN companies for 100% of their storage needs.

We’re getting new alternatives from Microsoft in Windows Server 2012.  This is all made possible by a new version of the SMB protocol.

SMB 3.0 (Formerly SMB 2.2)

Windows Server 2012 is bringing us a new version of the SMB protocol.  With the additional ability to do multichannel, where file share data transfer automatically spans multiple NICs with fault tolerance, we are now getting support to store virtual machines on a file server, as long as both client (Hyper-V host) and server (file server) are running Windows Server 2012 or above.

If you’re thinking ahead then you’ve already started to wonder about how you will back up these virtual machines using an agent on the host.  The host no longer has “direct” access to the VMs as it would with internal disk, DAS, or a SAN.  Windows Server 2012 VSS appears to be quite clever, intercepting a backup agent’s request to VSS-snapshot a VM stored on a file server, and redirecting that to VSS on the file server.  We’re told that this should all be transparent to the backup agent.

Now we get some new storage and host design opportunities.

Shared File Server – No Hyper-V Clustering

In this example a single Windows Server 2012 file server is used to store the Hyper-V virtual machines.  The Hyper-V hosts can use the same file server, and they are not clustered.  With this architecture, you can do Live Migration between hosts, even without a cluster.


What about performance?  SMB is going to suck, right?  Not so fast, my friend!  Even with a pair of basic 1 Gbps NICs for SMB 3.0 traffic (instead of a pair of NICs for iSCSI), I’ve been told that you can expect iSCSI-like speeds, and maybe even better.  At 10 Gbps … well, you can imagine.  The end result is storage that is cheaper and easier to configure.

With the lack of fault tolerance, this deployment type is probably suitable only for small businesses and lab environments.

Scale Out File Server (SOFS) – No Hyper-V Clustering

Normally we want our storage to be fault tolerant. That’s because all of our VMs are probably on that single SAN (yes, some have the scale and budget for spanning SANs but that’s a whole different breed of organisation).  Normally we would need a SAN made up of fault tolerant disk tray$, switche$, controller$, hot $pare disk$, and $o on.   I think you get the point.

Thanks to the innovations of Windows Server 2012, we’re going to get a whole new type of fault tolerant storage called a SOFS.


What we have in a SOFS is an active/active file server cluster.  The hosts that store VMs on the cluster use UNC paths instead of traditional local paths (even for CSV).  The file servers in the SOFS cluster work as a team.  A role in SMB 3.0 called the witness runs on the Hyper-V host (SMB witness client) and file server (SMB witness server).  With some clever redirection the SOFS can handle:

  • Failure of a file server with just a blip in VM I/O (no outage).  The cluster will allow the new host of the VMs to access the files without the 60-second delay you might see with today’s technology.
  • Live Migration of a VM from one host to another with a smooth transition of file handles/locks.

And VSS works through the above redirection process too.

One gotcha: you might look at this and think it is a great way to replace current file servers.  The SOFS is intended only for large files with little metadata access (few permissions checks, etc).  The currently envisioned scenarios are SQL Server file storage and Hyper-V VM file storage.  End user file shares, on the other hand, feature many small files with lots of metadata access and are not suitable for SOFS.

Why is this?  To make the file servers active/active with smooth VM file handle/lock transition, the storage that the file servers are using consists of 1 or more Cluster Shared Volumes (CSVs).  This uses CSV v2.0, not the version we have in Windows Server 2008 R2.  The big improvements in CSV 2.0 are:

  • Direct I/O for VSS backup
  • Concurrent backup across all nodes using the CSV

Some activity in a CSV does still cause redirected I/O, and an example of that is metadata lookup.  Now you get why this isn’t good for end user data.

When I’ve talked about SOFS many have jumped immediately to think that it was only for small businesses.  Oh you fools!  Never assume!  Yes, SOFS can be for the small business (more later).  But where this really adds value is the larger business that feels like it is held hostage by its SAN vendors.  Organisations are facing a real storage challenge today.  SANs are not getting cheaper, and the storage scale requirements are rocketing.  SOFS offers a new alternative.  For a company that requires certain hardware functions of a SAN (such as replication), SOFS offers an alternative tier of storage.  For a hosting company where every penny spent is a penny that makes them more expensive in the eyes of their customers, SOFS is a fantastic way to provide economic, highly performing, scalable, fault tolerant storage for virtual machine hosting.

The SOFS cluster does require shared storage of some kind.  It can be made up of the traditional SAN technologies such as SAS, iSCSI, or Fibre Channel with the usual RAID suspects.  Another new technology, called PCI RAID, is on the way.  It will allow you to use just a bunch of disks (JBOD) and you can have fault tolerance in the form of mirroring or parity (Windows Server 2012 Storage Spaces and Storage Pools).  It should be noted that if you want to create a CSV on a Storage Space then it must use mirroring, and not parity.

Update: I had previously blogged in this article that I was worried that SOFS was suitable only for smaller deployments.  I was seriously wrong.

Good news for those small deployments: Microsoft is working with hardware partners to create a cluster-in-a-box (CiB) architecture with 2 file servers, JBOD and PCI RAID.  Hopefully it will be economic to acquire/deploy.

Update: And for the big biz that needs big IOPS for LOB apps, there are CiB solutions for you too, based on Infiniband networking, RDMA (SMB Direct), and SSD, e.g. a 5U appliance having the same IOPS as 4 racks of fibre channel disk.

Back to the above architecture, I see this one being useful in a few ways:

  • Hosting companies will like it because every penny of each Hyper-V host is utilised.  Having N+1 or N+2 Hyper-V hosts means you have to add cost to your customer packages and this makes you less competitive.
  • Larger enterprises will want to reduce their every-5-year storage costs, and this offers them a different tier of storage for VMs that don’t require those expensive SAN features such as LUN replication.

SOFS – Hyper-V Cluster

This is the next step up from the previous solution.  It is a fully redundant virtualisation and storage infrastructure without the installation of a SAN.  A SOFS (active-active file server cluster) provides the storage.  A Hyper-V cluster provides the virtualisation for HA VMs.


The Hyper-V hosts are clustered.  If they were direct attached to a SAN then they would place their VMs directly on CSVs.  But in this case they store their VMs on a UNC path, just as with the previous SMB 3.0 examples.  VMs are mobile thanks to Live Migration (as in the previous non-clustered examples) and thanks to failover.  Windows Server 2012 Clustering has had a lot of work done to it; my favourite change being Cluster Aware Updating (easy automated patching of a cluster via Automatic Updates).

The next architectures “up” from this one are Hyper-V clusters that use SAS, iSCSI, or FC.  Certainly SOFS is going to be more scalable than a SAS cluster.  I’d also argue that it could be more scalable than iSCSI or FC purely based on cost.  Quality iSCSI or FC SANs can do things at the hardware layer that a file server cluster cannot, but you can get way more fault tolerant storage per Euro/Dollar/Pound/etc with SOFS.

So those are your options … in a single site.

What About Hyper-V Cluster Networking? Has It Changed?

In a word: no.

The basic essentials of what you need are still the same:

  1. Parent/management networking
  2. VM connectivity
  3. Live Migration network (this should usually be your first 10 GbE network)
  4. Cluster communications network (heartbeat and redirected IO which does still have a place, even if not for backup)
  5. Storage 1 (iSCSI or SMB 3.0)
  6. Storage 2 (iSCSI or SMB 3.0)

Update: We now have two types of redirected IO, and both support SMB Multichannel and SMB Direct.  SMB redirection (high level) is for those short metadata operations, and a block-level redirect (2x faster) is for sustained redirected IO operations such as a storage path failure.

Maybe you add a dedicated backup network, and maybe you add a 2nd Live Migration network.

How you get these connections is another story.  Thanks to native NIC teaming, DCB, QoS, and a lot of other networking changes/additions, there’s lots of ways to get these 6+ communication paths in Windows Server 2012.  For that, you need to read about converged fabrics.

Windows Server 2012 Hyper-V Replica … In Detail

If you asked me to pick the killer feature of WS2012 Hyper-V, then Replica would be high if not at the top of my list (64 TB VHDX is right up there in the competition).  In Ireland, and we’re probably not all that different from everywhere else, the majority of companies are in the small/medium enterprise (SME) space and the vast majority of my customers work exclusively in this space.  I’ve seen how DR is a challenge to enterprises and to the SMEs alike.  It is expensive and it is difficult.  Those are challenges an enterprise can overcome by spending, but that’s not the case for the SME.

Virtualisation should help.  Hardware consolidation reduces the cost, but the cost of replication is still there.  SANs often need licenses to replicate.  SANs are normally outside of the reach of the SME and even the corporate regional/branch office.  Software replication products aimed at this space are not cheap either and, to be honest, some of them are more risky than the threat of disaster.  And let’s not forget the bandwidth that these two types of solution can require.

Isn’t DR Just An Enterprise Thing?

So if virtualisation mobility and the encapsulation of a machine as a bunch of files can help, what can be done to make DR replication a possibility for the SME?

Enter Replica (Hyper-V Replica), a built-in software based asynchronous replication mechanism that has been designed to solve these problems.  This is what Microsoft envisioned for Replica:

  • If you need to replicate dozens or hundreds of VMs then you should be using a SAN and SAN replication.  Replica is not for the medium/enterprise sites.
  • Smaller branch offices or regional offices that need to replicate to local or central (head office or HQ data centre) DR sites.
  • SMEs who want to replicate to another office.
  • Microsoft partners or hosting companies that want to offer a service where SMEs could configure important VMs on their Windows Server 2012 Hyper-V hosts to replicate to their data centre – basically a hosted DR service for SMEs.  The requirements here are that it must have Internet-friendly authentication (not Kerberos) and it must be hardware independent, i.e. the production site storage can be nothing like the replica storage.
  • Most crucially of all: limited bandwidth.  Replica is designed to be used on commercially available broadband without impacting normal email or browsing activity – Microsoft does also want to sell them Office 365, after all.  How much bandwidth will you need?  How long is a piece of string?  Your best bet is to measure how much change there is to your customers’ VMs every 5 minutes, and that’ll give you an idea of what bandwidth you’ll need.

Figure 1  Replicate virtual machines

In short, Replica is designed and aimed at the ordinary business that makes up 95% of the market, and it’s designed to be easy to set up and invoke.

What Hyper-V Replica Is Not Intended To Do

I know some people are thinking of this next scenario, and the Hyper-V product group anticipated this too.  Some people will look at Hyper-V Replica and see it as a way to provide an alternative to clustered Hyper-V hosts in a single site.  Although Hyper-V Replica could do this, it is not intended for this purpose.

The replication is designed for low bandwidth, high latency networks that the SME is likely to use in inter-site replication.  As you’ll see later, there will be a delay between data being written on host/cluster A and being replicated to host/cluster B.

You can use Hyper-V Replica within a site for DR, but that’s all it is: DR.  It is not a cluster where you fail stuff back and forth for maintenance windows – although you probably could shut down VMs for an hour before flipping over – maybe – but then it would be quicker to put them in a saved state on the original host, do the work, and reboot without failing over to the replica.

How It Works

I describe Hyper-V Replica as being a storage log based asynchronous disaster recovery replication mechanism.  That’s all you need to know …

But let’s get deeper.

How Replication Works

Once Replica is enabled, the source host starts to maintain an HRL (Hyper-V Replica Log file) for the VHDs.  Every 1 write by the VM = 1 write to the VHD and 1 write to the HRL.  Ideally, and this depends on bandwidth availability, this log file is replayed to the replica VHD on the replica host every 5 minutes.  This is not configurable.  Some people are going to see the VSS snapshot (more later) timings and get confused by this, but the HRL replay should happen every 5 minutes, no matter what.

The HRL replay mechanism is actually quite clever; it replays the log file in reverse order, which allows it to send only the most recent write to each location.  In other words, it is asynchronous (able to deal with long distances and high latency because data is written in site A and later written in site B) and it replicates just the changes.
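To make that concrete, here is a toy sketch of the reverse-order replay idea.  This is my own illustration of the concept, not Microsoft’s implementation: walking a write log backwards means only the most recent write to each disk offset ever crosses the wire.

```python
# Toy illustration (not Hyper-V code) of reverse-order log replay:
# only the newest write to each offset is sent to the replica.

def replay_log(hrl):
    """hrl: list of (offset, data) tuples, in the order the VM wrote them."""
    replayed = {}                        # offset -> data actually sent to the replica VHD
    for offset, data in reversed(hrl):
        if offset not in replayed:       # a later write already covered this offset?
            replayed[offset] = data      # no: this is the newest write, so send it
    return replayed

# Offset 100 was written three times during the 5 minutes;
# only the final value ("v3") needs to cross the wire.
log = [(100, "v1"), (200, "a"), (100, "v2"), (100, "v3")]
print(replay_log(log))                   # {100: 'v3', 200: 'a'}
```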

Note: I love stuff like this.  Simple, but clever, techniques that simplify and improve otherwise complex tasks.  I guess that’s why Microsoft allegedly asks job candidates why manhole covers are circular.

As I said, replication or replay of the HRL will normally take place every 5 minutes.  That means if a source site goes offline then you’ll lose anywhere from 1 second to nearly 10 minutes of data.

I did say “normally take place every 5 minutes”.  Sometimes the bandwidth won’t be there.  Hyper-V Replica can tolerate this.  After 5 minutes, if the replay hasn’t happened then you get an alert.  The HRL replay will have another 25 minutes (up to 30 minutes in total, including the original 5) to complete before going into a failed state where human intervention will be required.  This now means that, with replication working, a business could lose between 1 second and nearly 1 hour of data.

Most organisations would actually be very happy with this. Novices to DR will proclaim that they want 0 data loss. OK; that is achievable with EUR100,000 SANs and dark fibre networks over short distances. Once the budget face smack has been dealt, Hyper-V Replica becomes very, very attractive.

That’s the Recovery Point Objective (RPO – amount of time/data lost) dealt with.  What about the Recovery Time Objective (RTO – how long it takes to recover)?  Hyper-V Replica does not have a heartbeat.  There is no automatic failover.  There’s a good reason for this.  Replica is designed for commercially available broadband that is used by SMEs.  This is often phone network based, and these networks have brief outages.  The last thing an SME needs is for their VMs to automatically come online in the DR site during one of these 10 minute outages.  Enterprises avoid this split brain by using witness sites and an independent triangle of WAN connections.  Fantastic, but well out of the reach of the SME.  Therefore, Replica will require manual failover of VMs in the DR site, either by the SME’s employees or by a NOC engineer in the hosting company.  You could simplify/orchestrate this using PowerShell or System Center Orchestrator.  The RTO will be short but has implementation specific variables: how long does it take to start up your VMs and for their guest operating systems/applications to start?  How long will it take for you to get your VDI/RDS session hosts (for remote access to applications) up, running and accepting user connections?  I’d reckon this should be very quick, and much better than the 4-24 hours that many enterprises aim for.  I’m chuckling as I type this; the Hyper-V group is giving SMEs a better DR solution than most of the Fortune 1000 can realistically achieve with oodles of money to spend on networks and storage replication, regardless of virtualisation products.

To answer a common question: no, there is no Hyper-V integration component for Replica.  This mechanism works at the storage level, where Hyper-V is intercepting and logging storage activity.

Replica and Hyper-V Clusters

Hyper-V Replica works with clusters.  In fact you can do the following replications:

  • Standalone host to cluster
  • Cluster to cluster
  • Cluster to standalone host

The tricky thing is replicating the configuration and smoothly handing over replication (even with Live Migration and failover) of HA VMs on a cluster.  How can this be done?  You can enable an HA role called the Hyper-V Replica Broker on a cluster (once only).  This is where you can configure replication, authentication, etc., and the Broker replicates this data out to the cluster nodes.  Replica settings for VMs will travel with them, and the Broker ensures smooth replication from that point on.

Configuring Hyper-V Replica

I don’t have my lab up and running yet, but there are already many step-by-step posts out there.  I wanted to focus on how it works and why to use it.  But here are the fundamentals:

On the replica host/cluster, you need to enable Hyper-V Replica.  Here you can control which hosts (or all) can replicate to this host/cluster.  You can do things like have one storage path for all replicas, or create individual policies based on source FQDN, such as storage paths or enabling/pausing/disabling replication.

You do not need to enable Hyper-V Replica on the source host.  Instead, you configure replication for each required VM (a rough sketch of these settings follows the list).  This includes things like:

  • Authentication: HTTP (Kerberos) within the AD forest, or HTTPS (destination provided SSL certificate) for inter-forest (or hosted) replication.
  • Select VHDs to replicate
  • Destination
  • Compressing data transfer: with a CPU cost for the source host.
  • Enable VSS once per hour: for apps requiring consistency – not normally required because of the logging nature of Replica, and it does cause additional load on the source host
  • Configure the number of replicas to retain on the destination host/cluster: Hyper-V Replica will automatically retain X historical copies of a VM on the destination site.  These are actually Hyper-V snapshots on the destination copy of the VM that are automatically created/merged (remember we have hot-merge of the AVHD in Windows 8) with the obvious cost of storage.  There is some question here regarding application support of Hyper-V snapshots and this feature.
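Pulling those options together, here is a purely hypothetical sketch of the per-VM settings as a plain data structure.  The field names and values are mine, invented for illustration; they are not Hyper-V’s own names.

```python
# Hypothetical representation of the per-VM replication settings listed above.
# All names/values are illustrative assumptions, not Hyper-V's actual schema.

replication_settings = {
    "vm": "FileSrv01",
    "replica_server": "dr-host.example.com",
    "same_ad_forest": False,
    "authentication": "HTTPS",          # HTTP (Kerberos) in-forest; HTTPS (certificate) across forests/hosters
    "vhds_to_replicate": ["FileSrv01-OS.vhdx", "FileSrv01-Data.vhdx"],  # leave the paging file VHD out (see later)
    "compress_transfer": True,          # saves bandwidth, costs source host CPU
    "hourly_vss_snapshot": False,       # only for apps that need application consistency
    "recovery_points_to_retain": 4,     # historical copies kept as snapshots on the replica
}

# Hosted or cross-forest replication cannot use Kerberos, so it needs HTTPS.
if not replication_settings["same_ad_forest"]:
    assert replication_settings["authentication"] == "HTTPS", \
        "Use certificate-based HTTPS outside the AD forest"
```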

Initial Replication Method

I’ve worked in the online backup business before and know how difficult the first copy over the wire is.  The SME may have small changes to replicate but might have TBs of data to copy on the first synchronisation.  How do you get that data over the wire?

  • Over-the-wire copy: fine for a LAN, if you have lots of bandwidth to burn, or if you like being screamed at by the boss/customer.  You can schedule this to start at a certain time.
  • Offline media: You can copy the source VMs to some offline media, and import it to the replica site.  Please remember to encrypt this media in case it is stolen/lost (BitLocker-To-Go), and then erase (not format) it afterwards (DBAN).  There might be scope for an R2/Windows 9 release to include this as part of a process wizard.  I see this being the primary method that will be used.  Be careful: there is no time out for this option.  The HRL on the source site will grow and grow until the process is completed (at the destination site by importing the offline copy).  You can delete the HRLs without losing data – it is not like a Hyper-V snapshot (checkpoint) AVHD.
  • Use a seed VM on the destination site: Be very very careful with this option.  I really see it as being a great one for causing calls to MSFT product support.  This is intended for when you can restore a copy of the VM in the DR site, and it will be used in a differencing mechanism where the differences will be merged to create the synch.  This is not to be used with a template or similar VMs.  It is meant to be used with a restored copy of the same VM with the same VM ID.  You have been warned.

And that’s it.  Check out the social media and you’ll see how easy people are saying Hyper-V Replica is to set up and use.  All you need to do now is check out the status of Hyper-V Replica in the Hyper-V Management Console, Event Viewer (Hyper-V Replica log data is in the Microsoft-Windows-Hyper-V-VMMS-Admin log), and maybe even monitor it when there’s an updated management pack for System Center Operations Manager.

Failover

I said earlier that failover is manual.  There are two scenarios:

  • Planned: You are either testing the invocation process or the original site is running but needs to be taken offline.  In this case, the VMs start in the DR site, there is guaranteed zero data loss, and the replication policy is reversed so that changes in the DR site are replicated to the now offline VMs in the primary site.
  • Unplanned: The primary site is assumed offline.  The VMs start in the DR site and replication is not reversed.  In fact, the policy is broken.  To get back to the primary site, you will have to reconfigure replication.

Can I Dispense With Backup?

No, and I’m not saying that as the employee of a distributor that sells two competing backup products for this market.  Replication is just that, replication.  Even with the historical copies (Hyper-V snapshots) that can be retained on the destination site, we do not have a backup with any replication mechanism.  You must still do a backup, as I previously blogged, and you should have offsite storage of the backup.

Many will continue to do off-site storage of tapes or USB disks.  If your disaster affects the area, e.g. a flood, then how exactly will that tape or USB disk get to your DR site if you need to restore data?  I’d suggest you look at backup replication, such as what you can get from DPM.

    The Big Question: How Much Bandwidth Do I Need?

    Ah, if I knew the answer to that question for every implementation then I’d know many answers to many such questions and be a very rich man, travelling the world in First Class.  But I am not.

    There’s a sizing process that you will have to do.  Remember that once the initial synchronisation is done, only changes are replayed across the wire.  In fact, it’s only the final resultant changes of the last 5 minutes that are replayed.  We can guesstimate what this amount will be using approaches such as these (a rough calculation follows the list):

    • Set up a proof of concept with a temporary Hyper-V host in the client site and monitor the link between the source and replica: There’s some cost to this but it will be very accurate if monitored over a typical week.
    • Do some work with incremental backups: Incremental backups, taken over a day, show how much change is done to a VM in a day.
    • Maybe use some differencing tool: but this could have negative impacts.
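Whichever approach you use, turning a measured 5-minute change rate into a required upload speed is simple arithmetic.  A rough back-of-the-envelope sketch, with numbers that are made up purely for illustration:

```python
# Convert a measured 5-minute change rate into the upload speed Replica needs.
# All figures here are invented for illustration.

changed_bytes_per_cycle = 150 * 1024 * 1024   # e.g. 150 MB of VM churn measured over 5 minutes
replay_window_seconds = 5 * 60                # the fixed HRL replay cadence

required_bps = changed_bytes_per_cycle * 8 / replay_window_seconds
print(f"Required upload: {required_bps / 1_000_000:.1f} Mbps")   # ~4.2 Mbps

# Compare against what the line can actually give you upstream, after
# whatever the existing services are already consuming.
usable_uplink_bps = 1_000_000                 # e.g. 1 Mbps of genuinely free upload
if required_bps > usable_uplink_bps:
    print("Replay will fall behind: compress the transfer, reduce churn, or buy more uplink")
```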

    Some traps to watch out for on the bandwidth side:

    • Asymmetric broadband (ADSL):  The customer claims to have an 8 Mbps line but in reality it is 7 Mbps down and 300 kbps up.  It’s the uplink that is the bottleneck because you are sending data up the wire.  Most SMEs aren’t going to need all that much.  My experience with online backup verifies that, especially if compression is turned on (it will consume source host CPU).
    • How much bandwidth is actually available: monitor the customer’s line to see how much of the bandwidth is already being consumed by existing services.  Just because they have a functional 500 kbps upload, it doesn’t mean that they aren’t already using it.

    Very Useful Suggestion

    Think about your servers for a moment.  What’s the one file that has the most write activity?  It is probably the paging file.  Do you really want to replicate it from site A to site B, needlessly hammering the wire?

    Hyper-V Replica works by intercepting writes to VHDs.  It has no idea of what’s inside the files.  You can’t just filter out the paging file.  So the excellent suggestion from the Hyper-V product group is to place the paging file of each VM onto a different VHD, e.g. a SCSI attached D drive.  Do not select this drive for replication.  When the VMs are failed over, they’ll still function without the paging file, just not as well.  You can always add one after if the disaster is sustained.  The benefit is that you won’t needlessly replicate paging file changes from the primary site to the DR.

    Summary

    I love this feature because it solves a real problem that the majority of businesses face.  It is further proof that Hyper-V is the best value virtualisation solution out there.  I really do think it could give many Microsoft Partners a way to offer a new multi-tenant business offering to further reduce the costs of DR.

    EDIT:

    I have since posted a demo video of Hyper-V Replica in action, and I have written a guest post on Mary Jo Foley’s blog.

    EDIT2:

    I have written around 45 pages of text (in Word format) on the subject of Hyper-V Replica for a chapter in the Windows Server 2012 Hyper-V Installation and Configuration Guide book. It goes into great depth and has lots of examples. The book should be out Feb/March of 2013 and you can pre-order it now.

    Windows Server 2012 Hyper-V Live Migration

    Live Migration was the big story in Windows Server 2008 R2 Hyper-V RTM and in WS2012 Hyper-V it continues to be a big part of a much BIGGER story. Some of the headline stuff about Live Migration in Windows Server 2012 Hyper-V was announced at Build in September 2011. The big news was that Live Migration was separated from Failover Clustering. This adds flexibility and agility (2 of the big reasons beyond economics why businesses have virtualised) to those who don’t want to or cannot afford clusters:

    • Small businesses or corporate branch offices where the cost of shared storage can be prohibitive
    • Hosting companies where every penny spent on infrastructure must be passed on to customers in one way or another, and every time the hosting company spends more than the competition they become less competitive.
    • Shared pools of VDI VMs don’t always need clustering. Some might find it acceptable if a bunch of pooled VMs go offline if a host crashes, and the user is redirected to another host by the broker.

    Don’t get me wrong; Clustered Hyper-V hosts are still the British Airways First Class way to travel. It’s just that sometimes the cost is not always justified, even though the SMB 3.0 and Scale Out File Server story brings those costs way down in many scenarios where the hardware functions of SAS/iSCSI/FC SANs aren’t required.

    Live Migration has grown up. In fact, it’s grown up big time. There are lots of pieces and lots of terminology. We’ll explore some of this stuff now. This tiny sample of the improvements in Windows Server 2012 Hyper-V shows how much work the Hyper-V group have done in the last few years. And as I’ll show you next, they are not taking any chances.

    Live Migration With No Compromises

    Two themes have stood out to me since the Build announcements. The first theme is “there will be no new features that prevent Live Migration”. In other words, any developer who has some cool new feature for Microsoft’s virtualisation product must design/write it in such a way that it allows for uninterrupted Live Migration. You’ll see the evidence of this as you read more about the new features of Windows Server 2012 Hyper-V. Some of the methods they’ve implemented are quite clever.

    The second and most important theme is “always have a way back”. Sometimes you want to move a VM from one host to another. There are dependencies such as networking, storage, and destination host availability. The source host has no control over these. If one dependency fails, then the VM cannot be lost, leaving the end users to suffer. For that reason, new features always try to have a fallback plan where the VM can be left running on the source host if the migration fails.

    With those two themes in mind, we’ll move on.

    Tests Are Not All They Are Cracked Up To Be

    The first time I implemented VMware 3.0 was on a HP blade farm with an EVA 8000. Just like any newbie to this technology (and to be honest I still do this with Hyper-V because it’s a reassuring test of the networking configurations that I have done) I created a VM and did a live migration (vMotion) of a VM from one host to another while doing a Ping test. I was saddened to see 1 missed ping during the migration.

    What exactly did I test? Ping is an ICMP tool that is designed to have very little tolerance of faults. Of course there is little to no tolerance; it’s a network diagnostic tool that is used to find faults and packet loss. Just about every application we use (SMB, HTTP, RPC, and so on) is TCP based. TCP (or Transmission Control Protocol) is designed to handle small glitches. So where Ping detects a problem, something like a file copy or streaming media might have a bump in the road that we humans probably won’t perceive. And even applications that use UDP, such as the new RemoteFX in Windows Server 2012, are built to be tolerant of a dropped packet if it should happen (they chose UDP instead of TCP because of this).

    Long story short: Ping is a great test, but bear in mind that the most you are likely to see is one packet with slightly increased latency or even a single missed ping. The eyeball test with a file copy, an RDP session, or a streaming media session is the real end user test.

    Live Migration – The Catchall

    The term Live Migration is used as a bit of a catchall in Windows Server 2012 Hyper-V. To move a VM from one location to another, you’ll start off with a single wizard and then have choices.

    Live Migration – Move A Running VM

    In Windows Server 2008 R2, we had Live Migration built into Failover Clustering. A VM had two components: its storage (VHDs usually) and its state (processes and memory). Failover Clustering would move responsibility for both the storage and state from one host to another. That still applies in a Windows Server 2012 Hyper-V cluster, and we can still do that. But now, at its very core, Live Migration is the movement of the state of a VM … but we can also move the storage, as you’ll see later.

    AFAIK, how the state moves hasn’t really changed because it works very well. A VM’s state is its configuration (think of it as the specification, such as processor and memory), its memory contents, and its runtime state (what’s happening right now).

    The first step is to copy the configuration from the source host to the destination host. Effectively you now have a blank VM sitting on the destination host, waiting for memory and state.


    Now the memory of the VM is copied, one page at a time from the source host while the VM is running. Naturally, things are happening on the running VM and its memory is changing. Any previously copied pages that subsequently change are marked as dirty so that they can be copied over again. Once copied, they are marked as clean.


    Eventually you get to a point where either everything is copied or there is almost nothing left (I’m simplifying for brevity – brevity! – me! – hah!). At this point, the VM is paused on the source host. Start the stopwatch because now we have “downtime”. The state, which is tiny, is copied from the source host to the VM on the destination host. The now complete VM on the destination is not yet running. It is placed “back” into a running state, and the VM in the source site is removed. Stop the stopwatch. Even in a crude lab, the most I miss is one ping here, and as I stated earlier, that’s not enough to impact applications.


    And that’s how Live Migration of a running VM works, without getting too bogged down in the details.
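For those who like pseudocode, here is a deliberately simplified sketch of that iterative pre-copy idea.  It is my own illustration of the concept described above, not Hyper-V’s actual algorithm.

```python
# Simplified pre-copy sketch (not Hyper-V's real code): copy memory pages while
# the VM runs, re-copy the pages it dirties, then pause briefly for the remainder.

def live_migrate(source_pages, dirtied_each_pass):
    """source_pages: dict page_number -> contents on the source host.
    dirtied_each_pass: sets of page numbers the running VM writes to, per copy pass."""
    dest_pages = {}
    to_copy = set(source_pages)                    # first pass: copy everything
    for dirty_this_pass in dirtied_each_pass:
        for page in to_copy:
            dest_pages[page] = source_pages[page]  # copied while the VM keeps running
        to_copy = dirty_this_pass                  # those pages changed, so copy them again
        if len(to_copy) < 10:                      # almost nothing left to copy
            break
    # "Downtime" starts: pause the VM, copy the last dirty pages plus the tiny CPU/device state.
    for page in to_copy:
        dest_pages[page] = source_pages[page]
    return dest_pages                              # the VM resumes on the destination

pages = {n: f"page-{n}" for n in range(100)}
print(len(live_migrate(pages, [{1, 2, 3}, {2}])))  # all 100 pages arrive on the destination
```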

    Live Migration on a Cluster

    The process for Live Migration of a VM is simple enough:

    • The above process happens to get a VM’s state from the source host to the destination host.
    • As a part of the switch over, responsibility for the VM’s files on the shared storage is passed from the source host to the destination host.

    This combined solution is what kept everything pretty simple from our in-front-of-the-console perspective. Things get more complicated with Windows Server 2012 Hyper-V because Live Migration is now possible without a cluster.

    SMB Live Migration

    Thanks to SMB 3.0 with its multichannel support, and added support for high-end hardware features such as RDMA, we can consider placing a VM’s files on a file share.


    The VMs continue to run on Hyper-V hosts, but when you inspect the VMs you’ll find their storage paths are on a UNC path such as \\FileServer1\VMs or \\FileServerCluster1\VMs. The concept here is that you can use a more economic solution to store your VMs on shared storage, with full support for things like Live Migration and VSS backup. I know you’re already questioning this, but by using multiple 1 Gbps or even 10 Gbps NICs with multichannel (SMB 3.0 simultaneously routing file share traffic over multiple NICs without NIC teaming) you can get some serious throughput.

    There are a bunch of different architectures which will make for some great posts at a later point. The Hyper-V hosts (in the bottom of the picture) can be clustered or not clustered.

    Back to Live Migration, and this scenario isn’t actually that different to the Failover Cluster model. The storage is shared, with both the source and destination hosts having file share and folder permissions to the VM storage. Live Migration happens, and responsibility for files is swapped. Job done!

    Shared Nothing Live Migration

    This is one scenario that I love. I wish I’d had it when I was hosting with Hyper-V in the past. It gives you mobility of VMs across many non-clustered hosts without storage boundaries.

    In this situation we have two hosts that are not clustered. There is no shared storage. VMs are stored on internal disk. For example, VM1 could be on the D: drive of HostA, and we want to move it to HostB.

    A few things make this move possible:

    • Live Migration: we can move the running state of the VM from HostA to HostB using what I’ve already discussed above.
    • Live Storage Migration: Ah – that’s new! We had Quick Storage Migration in VMM 2008 R2 where we could relocate a VM with a few minutes of downtime. Now we get something new in Hyper-V with zero downtime. Live Storage Migration enables us to relocate the files of a VM. There are two options: move all the files to a single location, or we can choose to relocate the individual files to different locations (useful if moving to a more complex storage architecture such as Fasttrack).

    The process of Live Storage Migration is pretty sweet. It’s really the first time MSFT has implemented it, and the funny thing is that they created it while, at the same time, VMware was having their second attempt (to get it right) at vSphere 5.0 Storage vMotion.

    Say you want to move a VM’s storage from location A to location B. The first step is to copy the files.


    IO operations to the source VHD are obviously continuing because the VM is still running. We cannot just flip the VM over after the copy, and lose recent actions on the source VHD. For this reason, the VHD stack simultaneously writes to both the source and destination VHDs as the copy process is taking place.


    Once the VHD is successfully copied, the VM can switch IO so it only targets the new storage location. The old storage location is finished with, and the source files are removed. Note that they are only removed after Hyper-V knows that they are no longer required. In other words, there is a fall back in case something goes wrong with the Live Storage Migration.
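A rough sketch of that copy-and-mirror idea, again my own simplification rather than the real VHD stack:

```python
# Sketch of live storage migration (my simplification, not the actual VHD stack):
# copy the disk in the background, mirror new writes to both copies while the
# copy runs, then switch IO to the destination before removing the source.

class LiveStorageMigration:
    def __init__(self, source_disk):
        self.source = source_disk      # dict: block number -> data
        self.dest = {}
        self.mirroring = False

    def write(self, block, data):
        """Called for every guest write; the VM keeps running throughout."""
        self.source[block] = data
        if self.mirroring:
            self.dest[block] = data    # writes are mirrored during the copy

    def migrate(self):
        self.mirroring = True
        for block, data in list(self.source.items()):   # background copy
            self.dest[block] = data
        # Copy finished and mirrored: switch IO to the destination, and only
        # then is it safe to remove the source files (the fallback until now).
        self.mirroring = False
        old = self.source
        self.source = self.dest
        old.clear()

disk = {0: "boot", 1: "data"}
migration = LiveStorageMigration(disk)
migration.write(2, "written before the copy")
migration.migrate()
migration.write(3, "written after the switch")
print(sorted(migration.source))        # [0, 1, 2, 3] -- everything now lives on the destination
```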

    Note that both hosts must be able to authenticate via Kerberos, i.e. domain membership.

    Bear this in mind: Live Storage Migration is copying and synchronising a bunch of files, and at least one of them (VHD or VHDX) is going to be quite big. There is no way to escape this fact; there will be disk churn during storage migration. It’s for that reason that I wouldn’t consider doing Storage Migration (and hence Shared Nothing Live Migration) every 5 minutes. It’s a process that I can use in migration scenarios such as a storage upgrade, obsoleting a standalone host, or planned extended standalone host downtime.

    Back to the scenario of Live Migration without shared storage. We now have the two key components, and all that remains is to combine and order them (sketched after the list):

    1. Live Storage Migration is used to replicate and mirror storage between HostA and HostB. This mirror is kept in place until the entire Shared Nothing Live Migration is completed.
    2. Live Migration copies the VM state from HostA to HostB. If anything goes wrong, the storage of the VM is still on HostA and Hyper-V can fall back without losing anything.
    3. Once the Live Migration is completed, the storage mirror can be broken, and the VM is removed from the source machine, HostA.
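Here is the promised sketch of how those pieces fit together, including the “always have a way back” fallback.  It is a conceptual outline only; the helper classes and functions are placeholders standing in for the mechanisms sketched earlier, not real Hyper-V code.

```python
# Conceptual outline of the Shared Nothing Live Migration ordering (not Hyper-V code).

class MigrationError(Exception):
    pass

class StorageMirror:
    """Placeholder for Live Storage Migration keeping source and destination in sync."""
    def abort(self):
        print("mirror abandoned; the VM's storage on HostA is untouched")
    def finish(self):
        print("mirror broken; HostB's copy is now the only one")

def start_storage_mirror(vm, src, dst):
    print(f"step 1: copy and mirror {vm}'s VHDs from {src} to {dst}")
    return StorageMirror()

def live_migrate_state(vm, src, dst):
    print(f"step 2: pre-copy {vm}'s memory and state from {src} to {dst}")

def remove_vm(vm, host):
    print(f"step 3: remove {vm} from {host}")

def shared_nothing_live_migration(vm, host_a, host_b):
    mirror = start_storage_mirror(vm, host_a, host_b)
    try:
        live_migrate_state(vm, host_a, host_b)
    except MigrationError:
        mirror.abort()                 # anything fails: the VM is still intact on HostA
        raise
    mirror.finish()
    remove_vm(vm, host_a)

shared_nothing_live_migration("VM1", "HostA", "HostB")
```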

    Summary

    There is a lot of stuff in this post. Here are a few things to retain from it:

    • Live Migration is a bigger term than it was before. You can do so much more with VM mobility.
    • Flexibility & agility are huge. I’ve always hated VMware Raw Device Mapping and Hyper-V Passthrough disks. The much bigger VHDX is the way forward (score for Hyper-V!) because
      it offers scale and unlimited mobility.
    • It might read like I’ve talked about a lot of technologies that make migration complex. Most of this stuff is under the covers and is revealed through a simple wizard. You simply want to move/migrate a VM, and then you have choices based on your environment.
    • You will want to upgrade to Windows Server 2012 Hyper-V.

     

     

    Windows Server 2012 Hyper-V Virtual Fibre Channel

    You now have the ability to virtualise a fibre channel adapter in WS2012 Hyper-V.  This synthetic fibre channel adapter allows a virtual machine to directly connect to a LUN in a fibre channel SAN.

    Benefits

    It is one thing to make a virtual machine highly available.  That protects it against hardware failure or host maintenance.  But what about the operating system or software in the VM?  What if they fail or require patching/upgrades?  With a guest cluster, you can move the application workload to another VM.  This requires connectivity to shared storage.  Windows 2008 R2 clusters, for example, require SAS, fibre channel, or iSCSI attached shared storage.  SAS is out for connecting VMs to storage.  iSCSI consumers were OK.  But those who made the huge investment in fibre channel were left in the cold, sometimes having to implement an iSCSI gateway to their FC storage.  Wouldn’t it be nice to allow them to use their FC HBAs in the host to create guest clusters?

    Another example is where we want to provision really large LUNs to a VM.  As I posted a little while ago, VHDX expands out to 64 TB, so really we would need a requirement for LUNs beyond 64 TB to justify providing physical LUNs to a VM and limiting mobility.  But I guess with the expanded scalability of VMs, big workloads like OLTP can be virtualised on Windows 8 Hyper-V, and they require big disk.

    What It Is

    Virtual Fibre Channel allows you to virtualise the HBA in a Windows 8 Hyper-V host, have a virtual fibre channel adapter in the VM with its own WWN (actually, 2 to be precise), and connect the VM directly to LUNs in a FC SAN.

    Windows Server 2012 Hyper-V Virtual Fibre Channel is not intended or supported to do boot from SAN.

    The VM will share bandwidth on the host’s HBA (unless, I guess, you spend extra on additional HBAs) and cross the SAN to connect to the controllers in the FC storage solution.

    The SAN must support NPIV (N_Port ID Virtualization).  Each VM can have up to 4 virtual HBAs.  Each HBA has it’s own identification on the SAN.

    How It Works

    You create a virtual SAN on the host (parent partition) for each HBA on the host that will be virtualised for VM connectivity to the SAN.  This is a 1-1 binding between virtual SAN and physical HBA, similar to the old model of virtual network and physical NIC.  You then create virtual HBAs in your VMs and connect them to virtual SANs.

    And that’s where things can get interesting.  When you get into the FC world, you want fault tolerance with MPIO.  A mistake people will make is that they will create two virtual HBAs and put them both on the same virtual SAN, and therefore on a single FC path on a single HBA.  If that single cable breaks, or that physical HBA port fails, then the VM has pointless MPIO because both virtual HBAs are on the same physical connection.

    The correct approach for fault tolerance will be:

    1. 2 or more HBA connections in the host
    2. 1 virtual SAN for each HBA connection in the host.
    3. A virtual HBA in the VM for each virtual SAN, with each one connected to a different virtual SAN
    4. MPIO configured in the VM’s guest OS.  In fact, you can (and should) use your storage vendor’s MPIO/DSM software in the VM’s guest OS.

    Now you have true SAN path fault tolerance at the physical, host, and virtual levels.
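To make the mistake and the fix concrete, here is a small hypothetical check; the data model and names are mine, not Hyper-V’s, but the rule is the one described above: a VM’s virtual HBAs should land on different physical FC paths.

```python
# Hypothetical data model (names are mine) for checking virtual FC fault tolerance.

vsan_to_physical_hba = {        # the 1:1 bindings created on the host
    "vSAN-A": "HBA-port-1",
    "vSAN-B": "HBA-port-2",
}

vm_virtual_hbas = {             # which virtual SAN each virtual HBA in the VM uses
    "vHBA-1": "vSAN-A",
    "vHBA-2": "vSAN-B",         # the common mistake is to put this on vSAN-A as well
}

physical_paths = {vsan_to_physical_hba[vsan] for vsan in vm_virtual_hbas.values()}
if len(physical_paths) < 2:
    print("Pointless MPIO: every virtual HBA shares one physical FC path")
else:
    print(f"True path fault tolerance across {sorted(physical_paths)}")
```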

    Live Migration

    One of the key themes of Hyper-V is “no new features that prevent Live Migration”.  So how does a VM that is connected to a FC SAN move from one host to another without breaking the IO stream from VM to storage?

    There’s a little bit of trickery involved here.  Each virtual HBA in your VM must have 2 WWNs (either automatically created or manually defined), not just one.  And here’s why.  There is a very brief period where a VM exists on two hosts during live migration.  It is running on HostA and waiting to start on HostB.  The switchover process is that the VM is paused on A and started on B.  With FC, we need to ensure that the VM is able to connect and process IO.

    So in the example below, the VM is connecting to storage using WWN A.  During Live Migration the new instance of the VM on the destination host is set up with WWN B.  When LM un-pauses on the destination host, the VM can instantly connect to the LUN and continue IO uninterrupted.  Each subsequent LM, either to the original host or any other host, will cause the VM to alternate between WWN A and WWN B.  That holds true of each virtual HBA in the VM.  You can have up to 64 hosts in your Hyper-V cluster, but each virtual fibre channel adapter will alternate between just 2 WWNs.

    Alternating WWN addresses during a live migration

    What you need to take from this is that your VM’s LUNs need to be masked or zoned for two WWNs for every VM.
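A toy model of that WWN flip-flop (my own simplification, not the actual implementation):

```python
# Toy model of the WWN alternation described above: each virtual HBA has two
# WWNs, and every Live Migration logs the destination in with the idle one.

class VirtualFibreChannelHba:
    def __init__(self, wwn_a, wwn_b):
        self.wwns = (wwn_a, wwn_b)
        self.active = 0                    # index of the WWN currently logged into the fabric

    def live_migrate(self):
        standby = 1 - self.active          # the destination host connects with the other WWN
        print(f"destination connects with {self.wwns[standby]}, source releases {self.wwns[self.active]}")
        self.active = standby

hba = VirtualFibreChannelHba("WWN-A", "WWN-B")
for _ in range(3):                         # three migrations: the VM alternates between the two WWNs
    hba.live_migrate()
# Which is why both WWNs must be zoned/masked to the VM's LUNs.
```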

    Technical Requirements and Limits

    First and foremost, you must have a FC SAN that supports NPIV.  Your host must run Windows Server 2012.  The host must have a FC HBA with a driver that supports Hyper-V and NPIV.  You cannot use virtual fibre channel adapters to boot VMs from the SAN; they are for data LUNs only.  The only supported guest operating systems for virtual fibre channel at this point are Windows Server 2008, Windows Server 2008 R2, and Windows Server 2012.

    This is a list of the HBAs that have support built into the Windows Server 2012 Beta:

    Vendor Model
    Brocade BR415 / BR815
    Brocade BR425 / BR825
    Brocade BR804
    Brocade BR1860-1p / BR1860-2p
    Emulex LPe16000 / LPe16002
    Emulex LPe12000 / LPe12002 / LPe12004 / LPe1250
    Emulex LPe11000 / LPe11002 / LPe11004 / LPe1150 / LPe111
    QLogic Qxx25xx Fibre Channel HBAs

    Summary

    With supported hardware, Virtual Fibre Channel allows supported Windows Server 2012 Hyper-V guests to connect to and use fibre channel SAN LUNs for data, enabling extremely scalable storage and in-guest clustering without compromising the uptime and mobility of Live Migration.

    Why I Think Windows 8 Will RTM Before July 2012

    You’ll soon see that this is not based on any inside information … so make of it what you want.

    We all know that Mary-Jo Foley reported a little while ago that Windows 8 could RTM as soon as April 2012.  I have 2 reasons to think that she might not be far off.

    The first is a little bit more sensible than the second.

    Timing

    The Build conference is being held earlier than PDC used to be.  That makes me think that we’re working on a schedule with earlier milestones.  RTM for Windows 7 was early Summer with launches later in the year.  So maybe we’ll see a Windows 8 RTM in Spring with launches in the summer time frame?

    Microsoft is Superstitious

    Yes, a 100,000 employee corporate giant is afraid of the number 13.  Was there an Office 13?  Was there an Exchange 13?  No; they skipped a version number and went from 12 to 14 (“Wave 15” is on the way).

    Microsoft’s financial years are from July to June.  For example, Microsoft is currently in financial year 2012.  Come July 2012, Microsoft will be in financial year 2013.  They name their products like EA Sports.  If Microsoft releases Windows Server “8” in June, it could be called Windows Server 2012.  But come July, it’ll more likely (not necessarily) be called Windows Server 2013.

    Remember that they hate black cats, walking under ladders, spilling salt (or is it throwing it?), and the number 13.  I bet the next version of Server is called Windows Server 2012 … and therefore they will aim to RTM it before July of 2012.  Launches will probably be in September … that’s because MSFT is a mess in July with FY planning, and everyone is away on vacation in August.

    That’s my 2 cents, not exactly based on science.