Microsoft News – 2 February 2015

The big news of the last few days was the announcement that the next version of “Windows Server and System Center” won’t be released until 2016. This is quite disappointing.

Windows Server

Windows Client

Azure

Licensing

  • IaaS Gotchas: Compliance gotchas that come with providing infrastructure as a service.

Microsoft News – 14 January 2015

Here are the Microsoft updates from the last few days.

Windows Server

System Center

Azure

Office 365

Windows Server Technical Preview – Distributed Storage QoS

In a modern data centre, more and more resource centralization is happening. Take a Microsoft cloud deployment, for example: what Microsoft does with CPS, or what you can do yourself with Windows Server (and maybe System Center). A chunk of a rack can contain over a petabyte of raw storage in the form of a Scale-Out File Server (SOFS), with the rest of the rack being either hosts or TOR networking. With this type of storage consolidation we have a challenge: how do we ensure that each guest service gets the storage IOPS that it requires?

From a service provider’s perspective:

  • How do we provide storage performance SLAs?
  • How do we price-band storage performance (pay more to get more IOPS)?

Up to now with Hyper-V, you required a SAN (such as Tintri) to do some magic on the backend. WS2012 R2 Hyper-V added a crude storage QoS method (maximum rule only) that was enforced at the host and not at the storage. So:

  • There was no minimum or SLA-type rule, only a cap.
  • QoS rules were not distributed, so host X had no awareness of what hosts A-W were doing to the shared storage system.

Windows Server vNext is adding Distributed Storage QoS, the product of a partnership between Hyper-V hosts and a SOFS. Yes: you need a SOFS – but remember that a SOFS can be 2-8 clustered Windows Servers that are sharing a SAN via SMB 3.0 (no Storage Spaces in that design).

Note: the hosts use a new protocol called MS-SQOS (based on SMB 3.0 transport) to partner with the SOFS.


Distributed Storage QoS is actually driven from the SOFS. There are multiple benefits from this:

  • Centralized monitoring (enabled by default on the SOFS)
  • Centralized policy management
  • Unified view of all storage requirements of all hosts/clusters connecting to this SOFS

Policy is created on the SOFS using PowerShell (System Center vNext will add management and monitoring support for Storage QoS), based on your monitoring or service plans. An IO Scheduler runs on each SOFS node, and the Policy Manager’s data is distributed across them. The Policy Manager (a HA cluster resource on the SOFS cluster) pushes policy (via MS-SQOS) out to the Hyper-V hosts, where Rate Limiters restrict the IOPS of virtual machines or virtual hard disks.
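Here’s a rough feel for that centralized monitoring from the SOFS side. This is a hedged sketch: the cmdlet and property names below are from the Technical Preview and may change before release.

    # Run on the SOFS cluster, where the Policy Manager and monitoring live.
    # Lists every flow (one per virtual hard disk) from every connected host,
    # busiest first.
    Get-StorageQosFlow |
        Sort-Object StorageNodeIOPs -Descending |
        Format-Table InitiatorName, InitiatorNodeName, StorageNodeIOPs, Status, FilePath -AutoSize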


There are two kinds of QoS policy that you can create:

  • Single-Instance: The resources of the rule are distributed or shared between VMs. Maybe a good one for a cluster/service or a tenant, e.g. a tenant gets 500 IOPS that must be shared by all of their VMs
  • Multi-Instance: All VMs/disks get the same rule, e.g. each targeted VM gets a maximum of 500 IOPS. Good for creating VM performance tiers, e.g. bronze, silver, gold with each tier offering different levels of performance for an individual VM

You can create child policies. Maybe you set a maximum for a tenant. Then you create a sub-policy that is assigned to a VM within the limits of the parent policy.
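As a sketch of how this looks in PowerShell (Technical Preview cmdlet names; the policy and VM names are made up, and I haven’t seen the child-policy syntax yet so it’s omitted):

    # On the SOFS: a Multi-Instance policy - every targeted disk gets its own
    # 100-500 IOPS band. Good for bronze/silver/gold tiers.
    $gold = New-StorageQosPolicy -Name Gold -PolicyType MultiInstance -MinimumIops 100 -MaximumIops 500

    # A Single-Instance policy - 500 IOPS shared by everything it is assigned
    # to, e.g. all of one tenant's VMs.
    $tenantA = New-StorageQosPolicy -Name TenantA -PolicyType SingleInstance -MaximumIops 500

    # On a Hyper-V host: stamp the policy ID onto a VM's virtual hard disks.
    Get-VM -Name VM01 | Get-VMHardDiskDrive | Set-VMHardDiskDrive -QoSPolicyID $gold.PolicyId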

Note that some of this feature comes from the Predictable Data Centers effort by Microsoft Research in Cambridge, UK.

Hyper-V storage PM, Patrick Lang, presented the topic of Distributed Storage QoS at TechEd Europe 2014.

MVP Carsten Rachfahl Interviews Me About Windows Server vNext

While at the MVP Summit in Redmond, my friend Carsten Rachfahl (also a Hyper-V MVP) recorded a video interview with me to talk about Windows Server vNext and some of our favourite new features.


Windows Server Technical Preview – Replica Support for Hot-Add of VHDX

Shared VHDX was introduced in WS2012 R2 to enable easier and more flexible deployments of guest clusters; that is, a cluster that is made from virtual machines. The guest cluster allows you to make services highly available, because sometimes a HA infrastructure is just not enough (we are supposed to be all about the service, after all).

We’ve been able to do guest clusters with iSCSI, SMB 3.0, or Fibre Channel/FCoE LUNs/shares, but this crosses the line between guest/tenant and infrastructure/fabric. That causes a few issues:

  • It reduces flexibility (Live Migration of storage, backup, replication, etc)
  • There’s a security/visibility issue for service providers
  • Self-service becomes a near impossibility for public/private clouds

That’s why Microsoft gave us Shared VHDX. Two virtual machines can connect to the same data VHDX. That disk appears in the guest OS of the two VMs as a shared SAS disk, i.e. cluster-supported storage. Now we have moved into the realm of software: we enable easy self-service and flexibility, and we no longer cross the hardware boundary.
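For reference, this is how you attach a shared data VHDX in WS2012 R2 today (paths and VM names here are made up; the VHDX must live on a CSV or a SOFS share):

    # Attach the same data VHDX to both guest cluster nodes. The
    # -SupportPersistentReservations switch is what makes it a shared disk.
    Add-VMHardDiskDrive -VMName GuestNode1 -Path C:\ClusterStorage\Volume1\Shared\data.vhdx -SupportPersistentReservations
    Add-VMHardDiskDrive -VMName GuestNode2 -Path C:\ClusterStorage\Volume1\Shared\data.vhdx -SupportPersistentReservations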

But …

Shared VHDX was a version 1.0 feature in Windows Server 2012 R2. It wasn’t a finished product; Microsoft gave us what they had ready at the time. Feedback was unanimous: we need backup and replication support for Shared VHDX, and we’d also like Live Migration support.

The Windows Server vNext Technical Preview adds support for replicating Shared VHDX files using Hyper-V Replica (HVR). This means that you can add these HA guest clusters to your DR replication set(s) and offer a new level of availability to your customers.
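Enabling HVR itself should look like it always has. Here’s a sketch using the existing WS2012 R2 cmdlets with hypothetical names – whether Shared VHDX guest clusters will need anything extra hasn’t been made public:

    # Enable replication of a guest cluster node to a replica server/broker.
    Enable-VMReplication -VMName GuestNode1 -ReplicaServerName drbroker.demo.internal -ReplicaServerPort 80 -AuthenticationType Kerberos
    Start-VMInitialReplication -VMName GuestNode1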

I cannot talk about how Microsoft is accomplishing this feature yet … all I can report is what I’ve seen announced.

Windows Server Technical Preview – Cluster Compute Resiliency

Imagine a scenario:

  1. You have a cluster of Hyper-V hosts
  2. Some operator pulls the wrong network cables
  3. A host becomes network-isolated and the cluster heartbeat times out before the mistake is noticed
  4. Virtual machines fail over

Great, right? HA kicked in? That’s good … right!?!?!

Ummm maybe not. Let me ask you a question. Which is worse:

  • (A) A virtual machine being offline for a minute or so because its host is network-isolated? OR …
  • (B) Every virtual machine on that host stops executing, fails over to other hosts in the cluster, and takes several minutes to boot and get services responsive on the network?

For most people, option A is more favourable and this is why Microsoft is giving us Cluster Compute Resiliency.

With this new feature, a cluster will become more tolerant (and this is configurable) of transient network errors. In the event of a heartbeat timeout, the host will go into isolation. This allows VMs on that host to continue executing, and prevents additional VMs being placed onto that host. If the host becomes responsive within a certain time frame, then it comes out of isolation. If it does not, then its VMs are failed over to other hosts.

Note that if a host is determined to be “flapping” then it will be put into Cluster Quarantine.
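The behaviour is tunable via cluster common properties. A hedged sketch – these property names come from the Technical Preview, and the values shown are only illustrative defaults:

    # How aggressively to isolate instead of failing over, and for how long.
    (Get-Cluster).ResiliencyLevel = 2           # 2 = always isolate on heartbeat loss
    (Get-Cluster).ResiliencyDefaultPeriod = 240 # seconds a host may stay isolated

    # "Flapping" protection: repeated isolations land the host in quarantine.
    (Get-Cluster).QuarantineThreshold = 3       # isolations before quarantine
    (Get-Cluster).QuarantineDuration = 7200     # seconds the host stays quarantined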

Windows Server Technical Preview – Differential Export

A differential export is an export of the differences of a virtual machine between two points in time. It is used to enable an incremental backup of a virtual machine that is backed up using the new file-based backup system with Resilient Change Tracking. The below image shows the state of a VM and its backup after a full backup. Note that this file-based backup has used Resilient Change Tracking to identify the changes being made to the VM’s storage since the backup.

[Image: the state of a VM and its backup after a full backup]

An incremental backup starts, using the differential export process. A backup checkpoint is created, forking the VM configuration and the VHD (via an AVHD). The existing Resilient Change Tracking ID (T1) is used to determine what has changed in the parent VHD, creating a differential export of the VM on the backup target media (the exported VM configuration, T2, and the differential VHD).


The backup checkpoint is removed and a new RCT ID (T2) is created so we can now do Resilient Change Tracking of the VHD for the time after the backup.


Old reference points (RCT IDs) can be disposed of as required.
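Backup products will drive this through Hyper-V’s WMI API rather than cmdlets, but the sequence can be approximated with today’s standard cmdlets. In this sketch an ordinary checkpoint stands in for a backup checkpoint, and the names/paths are hypothetical:

    # 1. Fork the VM: the checkpoint creates the configuration copy and the AVHD.
    Checkpoint-VM -Name VM01 -SnapshotName "Backup-T2"

    # 2. Export to the backup target. The real differential export API takes the
    #    previous RCT ID (T1) here, so only changed blocks are written out.
    Get-VMSnapshot -VMName VM01 -Name "Backup-T2" | Export-VMSnapshot -Path D:\Backups\VM01

    # 3. Remove the checkpoint: the AVHD merges back and a new RCT ID (T2) takes over.
    Get-VMSnapshot -VMName VM01 -Name "Backup-T2" | Remove-VMSnapshot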

A “synthetic full backup” process is also supported for third-party backup solutions.


Hyper-V PM Taylor Brown talks about Change Tracking in his session at TechEd Europe 2014.

Microsoft News – 19 December 2014

We’re getting close to Christmas and Microsoft is starting to wind down for the year. Here’s a mostly-Azure report for the last few days.

Hyper-V

Azure

Miscellaneous

Windows Server Technical Preview – File-Based Backup

In Microsoft’s endeavours to finally close the book on backup issues, the Hyper-V team is switching to file-based backup and moving away from the non-scalable VSS backup. Let’s face it – most hardware VSS Providers have been like a curse.

When you back up a VM in vNext, a “backup checkpoint” is created. The VM’s configuration is forked, and the virtual hard disk(s) are forked too using an AVHD. This fork exists for a short period of time, allowing changes to continue while the backup is being done. The virtual machine can then be live exported as a backup.


After this operation, a dataless Reference Point is created and the AVHD(s) are merged back into the parent VHD(s). This reference point notes the Resilient Change Tracking ID (per VHD), so we know what changes are made after the AVHD was created – and therefore which blocks must be backed up in a following incremental backup.


Some notes:

  • Incremental and “synthetic full” backups can now follow the full backup, and this is done using a Differential Export.
  • A restore is basically a process of copying the VM files from backup media and importing the VM – see the sketch below.
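A hedged sketch of that restore flow, using today’s Import-VM with made-up paths (the vNext configuration file format may differ from the XML shown here):

    # Copy the exported VM back from backup media, then import it. -Copy and
    # -GenerateNewId register it as a new VM rather than importing in-place.
    Copy-Item -Recurse \\backup01\store\VM01 D:\Restored\
    $config = Get-ChildItem "D:\Restored\VM01\Virtual Machines" -Filter *.xml | Select-Object -First 1
    Import-VM -Path $config.FullName -Copy -GenerateNewId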

SAN-based backup is different. A LUN snapshot will retain the parent VHD and AVHD, and only the VM configuration is exported by Hyper-V. CDS, SMI-S or network providers can be used to create the LUN backup. The LUN snapshot is then removed, and the job is done.

Hyper-V PM Taylor Brown talks about file-based backup in his session at TechEd Europe 2014.


Windows Server Technical Preview – Resilient Change Tracking

Windows Server Hyper-V has had an … interesting … history when it comes to backup. It has been a take-it-personally mission of the Hyper-V team to stop backup being an issue for Hyper-V customers. Backup of CSV in Windows Server 2008 R2 was not fun. Things got better in WS2012, and again in Windows Server 2012 R2. And we might finally be getting there with the next release of Windows Server.

An important change to Hyper-V backup is to enable partners to keep up with the pace of change of Windows Server – we’ve seen some backup vendors take years to catch up with a new version, and this prevents mutual customers from keeping their hosts in step with Microsoft.

In order for a backup product to do incremental backups, it needs to do block-based change tracking. Until now, each vendor has had to create one of these filter drivers that sits in the storage stack. This stuff is hard to do right, and it can cause stability and performance issues if not done correctly. It also slows down the develop/re-test/re-certify cycle of BackupProduct2016 as it tries to keep up with the release of Windows Server 2016.

Some bad change tracking implementations that you may know of lived in memory as bitmaps. If the host had an unplanned outage, then the next backup had to be a full backup. Or if the VM live migrated to another host, that VM would have to do a full backup, because the change tracking was no longer in the memory of the original host.

Resilient Change Tracking is built-in change tracking of changed blocks within virtual hard disks. It is used for incremental backup, and it is the underlying engine for Differential Export. The change tracking bitmap lives both in memory and on disk. The on-disk bitmap is not as granular, because it is the fallback for the much more detailed in-memory bitmap.
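As an aside, the persisted tracking state shows up next to each virtual hard disk; in preview builds these appear to be .rct (the coarser on-disk bitmap) and .mrt (a modified region table used for crash recovery) files. A quick way to spot them, with a hypothetical VM path:

    # List the change tracking companion files next to a VM's virtual hard disks.
    Get-ChildItem "D:\VMs\VM01\Virtual Hard Disks" | Where-Object { $_.Extension -in '.rct', '.mrt' }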

The goal now is that backup vendors should stop writing their own filter drivers to implement change tracking. If they use the built-in Resilient Change Tracking, then they can focus more time on feature development and testing/certification, and keep up with Microsoft’s frequent releases of Windows Server. And hopefully, Microsoft’s change tracking will undergo suitable levels of testing, giving all customers a universally stable and well-performing subsystem.

Hyper-V PM Taylor Brown talks about Change Tracking in his session at TechEd Europe 2014.