Event Notes – What’s New In Windows Server 2012 R2?

Speaker Jeff Woolsey

The Cloud OS Vision

The Private Cloud is Windows Server & System Center.  Virtualisation is not cloud.  P2V didn’t change management.  Look at the traits of a cloud in the NIST definition.  Cloud-centric management layers change virtualisation into a cloud.  That’s what SysCtr 2012 and later do to virtualization layers: create clouds.

Microsoft’s public cloud is Azure, powered by Hyper-V, a huge stress (performance and scalability) on a hypervisor.

Hosting companies can also use Windows Azure Pack on Windows Server & System Center to create a cloud.  That closes the loop … creating 1 consistent platform across public and private, on premise, in Microsoft, and in hosting partners.  The customer can run their workload everywhere.

Performance

The absolute best way to deploy MSFT biz apps is on Hyper-V: test, support, validation, optimization, test, test, test.  They test everything on Hyper-V and Azure, every single day.  25,000 VMs are created every day to do automated unit tests of Windows Server.

In stress tests, Exchange (beyond recommended scale) tested well within Exchange requirements on Hyper-V.  Over 1,000,000 IOPS from a Hyper-V VM in a stress test.

Storage

If you own a SAN, running WS2012 or newer is a no brainer: TRIM, UNMAP, ODX. 

Datacenter without Boundaries

Goal number 1.

They wanted an integrated, high-performance virtualization platform.  Reduce complexity, cost, and downtime.  Ease deployment.  Flexible.

Automatic VM activation.  Live VM export/cloning.  Remote Access via VMBus.  Online VHDX resize.  Live Migration compression.  Live Migration over RDMA.  More robust Linux support.

Ben Armstrong on demo patrol:

Storage QoS.  You can cap the storage IOPS of a VM, on a per hard disk basis.

Linux has full dynamic memory support on WS2012 R2.  Now we can do file system consistent backup of Linux VMs without pausing them.  Don’t confuse it with VSS – Linux does not have VSS.  It’s done using a file system freeze. 

You can do shared VHDX to create 100% virtual production ready guest clusters.  The shared VHDX appears as a SAS connected disk in the guest OSs.  Great for cloud service providers to enable 100% self service.  Store the VHDX on shared storage, e.g. CSV or SMB 3.0 to support Live Migration … best practice is that the guest cluster nodes be on different hosts.

End of Ben in this session.

Demystifying Storage Spaces and SOFS

I’ll recommend you watch the session.  Jeff uses a storage appliance to explain a file server with Storage Spaces.  He’ll probably do the same with classic SAN and scale-out file server.

Matt McSpirit comes up.

He’s using VMM to deploy a new file server cluster.  He’s not using Failover Clustering or Server Manager.  He can provision bare metal cluster members.  Like the process of deploying bare metal hosts.  The shares can be provisioned and managed through VMM, as in 2012 SP1.  You can add new bare-metal hosts.  There is a configurable thin provisioning alert in the GUI – OpsMgr with the MP for VMM will alert on this too.

Back to Jeff.

Changes of Guest Clustering

It’s a problem for service providers because you have previously needed to provide a LUN to the customer.  Hosters just can’t do it because of customisation.  The hoster can’t pierce the hosting boundary, and the customer is unhappy.  With shared VHDX, the shared storage resides outside the hoster boundary, in the tenant domain.  It’s completely virtualised and perfect for self-service.

SDN

The real question should be: Why deploy software defined networking (Hyper-V Network Virtualization)?  The primary answer is “you’re a hosting company that wants multi-tenancy with abstracted networking for seamless network convergence for hybrid clouds”.  Should be a rare deployment in the private cloud – unless you’re friggin huge or in the acquisition business.

WS2012 R2 will feature a built-in multi-tenant NVGRE (Hyper-V Network Virtualisation or Software Defined Networking) gateway.  Now you don’t need F5’s vapourware or the Iron Networks appliance to route between VM Networks and physical networks.  You choose the gateway when creating your VM Network (create VM Network Wizard, Connectivity).  VPN, BGP and NAT are supported.

You can deploy the gateway using a VMM Service Template. 

You can use OMI based rack switches, e.g. Arista, to allow VMM to configure your Top Of Rack (TOR) switches.

Hyper-V Replica

HVR broadens your replication … maybe you keep your synchronous replication for some stuff if you made the investment.  But you can use HVR for everything else – hardware agnostic (both ends).  Customers love it.  Service providers should offer it as a service.  But service providers also want to replicate.

Hyper-V Recovery Manager gives you automation and orchestration of VMM-managed HVR.  You install a provider in the VMM servers in site A and site B.  Then enable replication in VMM console.  Replication goes direct from site A to B.  Hyper-V Recovery Manager gives you the tools to create, implement, and monitor the failover plans.

You can now choose your replica interval, which defaults to every 5 minutes.  The alternatives are 30 seconds and 15 minutes.

Scenario 1: customer replicates from primary hosts (a) to hosts (b) across the campus.  Lots of pipe in the campus so do 30 second replica intervals.  Then the customer replicates from the primary DR site (b) to a secondary and remote DR site (c).  Lots of latency and bandwidth issues, so go for every 15 minutes.

Scenario 2: SME replicates to hosting company every 5 minutes.  Then the hosting company replicates to another location that is far away.
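
To put rough numbers on the interval choices in these scenarios, here is a minimal Python sketch.  It assumes (my simplification, not something stated in the session) that each hop ships changes once per interval and that transfer time is negligible, so the worst-case staleness of the last replica is roughly the sum of the hop intervals.  The function name is made up for illustration.

```python
# Replica intervals named in the session notes, in seconds.
ALLOWED_INTERVALS = (30, 5 * 60, 15 * 60)


def worst_case_lag(hop_intervals):
    """Rough upper bound, in seconds, on how stale the last replica can be.

    Assumption: every hop replicates once per interval and transfer time
    is ignored, so the bound is simply the sum of the hop intervals.
    """
    for seconds in hop_intervals:
        if seconds not in ALLOWED_INTERVALS:
            raise ValueError(f"unsupported replica interval: {seconds}s")
    return sum(hop_intervals)


# Scenario 1: 30 seconds across the campus, then 15 minutes to the remote site.
print(worst_case_lag([30, 15 * 60]))  # 930
```

Real-world lag will be higher once you add the time it takes to push the changes over a slow link, so treat the result as a floor for planning, not a guarantee.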

Michael Leworthy comes up to demo HRM. We get a demo of the new HVR wizards.  Then HRM is shown.  HRM workflows allow you to add manual tasks, e.g. turn on the generator. 

KB2838669–A Big Hotfix Bundle For WS2012 Failover Clustering

The Failover Clustering group also released a big update today.  It solves a range of issues.

Issue 1
Consider the following scenario:

  • You have the Hyper-V server role installed on a Windows Server 2012-based file server.
  • You have lots of virtual machines on a Server Message Block (SMB) share.
  • Virtual hard disks are attached to an iSCSI controller.

In this scenario, you cannot access the iSCSI controller.

Issue 2
Consider the following scenario:

  • You have a two-node failover cluster that is running Windows Server 2012.
  • The cluster is partitioned.
  • There is a Cluster Shared Volume (CSV) on a cluster node, and a quorum resource on the other cluster node.

In this scenario, the cluster becomes unavailable.

Note This issue can be temporarily resolved by restarting the cluster.

Issue 3
Assume that you set up an SMB connection between two Windows Server 2012-based computers. The hardware on the computers does not support Offloaded Data Transfer (ODX). In this situation, the SMB session is closed unexpectedly.

Issue 4
Consider the following scenario:

  • You have a Windows Server 2012-based failover cluster.
  • You have a virtual machine on a CSV volume on the cluster.
  • You try to create a snapshot for the virtual machine. However, the snapshot creation is detected as stuck. Therefore, the snapshot set is aborted.
  • During the abortion process of the snapshot, the CSV volume is deleted after the snapshot shares are deleted.

In this scenario, the abortion process is paused automatically because of an error that occurs on the cluster.

Issue 5
Assume that you have a Windows Server 2012-based failover cluster. Two specific snapshot state change requests are sent from disk control manager to CSV proxy file system (CSVFS). The requests are present in the same message. In this situation, disk control manager is out-of-sync with CSVFS.

Issue 6
Assume that you create a snapshot for a CSV volume on a Windows Server 2012-based failover cluster. When the snapshot creation is still in progress, another snapshot creation is requested on the same CSV volume. In this situation, the snapshot creation fails and all later snapshot creation attempts on the CSV volume fail.

Note You cannot create a snapshot for the CSV volume until the volume fails over or the volume goes offline and then back online.

Additionally, the update also resolves the issues that are described in the following Microsoft Knowledge Base (KB) articles:

  • KB2799728: Virtual machine enters a paused state or a CSV volume goes offline when you try to create a backup of the virtual machine on a Windows Server 2012-based failover cluster
  • KB2801054: VSS_E_SNAPSHOT_SET_IN_PROGRESS error when you try to back up a virtual machine in Windows Server 2012
  • KB2796995: Offloaded Data Transfers fail on a computer that is running Windows 8 or Windows Server 2012
  • KB2813630: Virtual machine enters a paused state or a CSV volume goes offline when you try to create a backup of the virtual machine on a Windows Server 2012-based failover cluster
  • KB2824600: Virtual machine enters a paused state or goes offline when you try to create a backup of the virtual machine on a CSV volume in Windows Server 2012

A supported hotfix is available from Microsoft.

KB2770917 – WS2012 Hyper-V Backup Fails On NetApp

The November 2012 cumulative rollup for Windows 8 and Windows Server 2012 contains a fix for the following error, which occurs when backing up Windows Server 2012 Hyper-V VMs that are stored on NetApp storage:

The number of volumes reverted does not match the number of volumes in the snapshot set for virtual machine.

This cumulative update provided a new version of the Integration Components on the host to fix the issue.  You have (or had if you have already done it) to deploy this new version of the ICs to your guest OSs.  Didier Van Hoye previously (and correctly) blogged that this was only necessary for guest OSs that are Windows Server 2008 & Windows Vista or later.

To get the fix: patch your host (Windows Update – KB2770917) and update the ICs in the guest OSs.

Note: I record this as KB2770917.  That’s the number of the cumulative update that was delivered via Windows Update.  That update includes a number of articles that are not publicly documented.  We just got briefed on this issue so that’s why I’m posting this article 5 months after the update release.

Troublesome KB2799728 (CSV Paused State/Offline Patch) Replaced By KB2813630

I previously blogged that memory leak issues were being reported with KB2799728.  That hotfix repaired an issue where CSVs were going into a paused or offline state during backup due to a free space calculation error by ntfs.sys during a VSS backup.

Microsoft has since released a superseding update, KB2813630, which you can download here.

System Center Data Protection Manager CSV Serialization Tool

I recently blogged about the big changes in WS2012 Cluster Shared Volume (CSV).  The biggest changes are related to backup:

  • Single coordinated VSS snapshot
  • No more redirected IO

In Windows Server 2008 R2 CSV backup, we tried to use a hardware VSS provider to reduce the impacts of redirected IO.  But as it turns out, the multiple-snapshot-per-backup process of the past could cause problems for the hardware VSS provider and the SAN snapshot functionality.  In extreme cases, those problems could even lead to a CSV LUN “disappearing”.

If you had these problems and couldn’t get a better hardware VSS provider then you would switch to using the system VSS provider (using the VSS functionality that is built into Windows Server and does not use SAN snapshot features).  You’d be forced to use the system VSS provider if your SAN did not have support or licensing for a hardware (physical SAN) or software (software SAN) VSS provider.

If you were using the system VSS provider to back up W2008 R2 CSV then Microsoft recommended that you do something called serialization of your CSV backup (see here for DPM 2010 instructions).  This process creates (using PowerShell) and uses an XML file that is read by DPM.  Nice and simple if you have one DPM server for every W2008 R2 Hyper-V cluster.  But what if you had lots of clusters backed up by a single DPM server?  It meant you had to manually merge the XML files, and that would be a nightmare in a cloud where there is nothing but change.
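
To illustrate the idea behind serialization and the merge problem, here is a small Python sketch.  It is a conceptual model only: it does not read or write DPM’s XML file, and the function names and the (cluster, CSV) key are my own assumptions.  The point is that VMs sharing a CSV are grouped so they can be backed up one at a time, and that merging several clusters needs a key that keeps identically named volumes apart.

```python
from collections import defaultdict


def serialize_by_csv(vm_to_csv):
    """Group VM names by the CSV that stores them.

    Each resulting list is a queue: the VMs in it would be backed up
    one after another rather than in parallel.
    """
    queues = defaultdict(list)
    for vm, csv in sorted(vm_to_csv.items()):
        queues[csv].append(vm)
    return dict(queues)


def merge_clusters(*named_queues):
    """Merge per-cluster queues, keyed by (cluster, CSV) to avoid clashes.

    Two clusters can both have a "Volume1", so the cluster name is part
    of the key in this sketch.
    """
    merged = {}
    for cluster_name, queues in named_queues:
        for csv, vms in queues.items():
            merged[(cluster_name, csv)] = list(vms)
    return merged


cluster_a = serialize_by_csv({"vm1": "Volume1", "vm2": "Volume1", "vm3": "Volume2"})
cluster_b = serialize_by_csv({"vm9": "Volume1"})
for key, vms in sorted(merge_clusters(("A", cluster_a), ("B", cluster_b)).items()):
    print(key, vms)
```

Doing this by hand for every cluster change is exactly the chore described above.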

Microsoft has released the System Center Data Protection Manager CSV Serialization Tool to help you in this scenario.  This tool is intended to be used when backing up Windows Server 2008 R2 Hyper-V clusters with one or more CSVs using DPM 2010 with QFE 3 and above or DPM 2012.

You do not need to use this tool with WS2012 CSV.

The downloads include the PS1 PowerShell script to create an XML file for each cluster and a tool to consolidate those XML files for DPM to use. 

Why release this tool?  Lots of people will have W2008 R2 clusters and won’t be in a position to upgrade them now or ever:

  • Change to production systems can be restricted, e.g. pharmaceuticals.
  • They might have licensed without Software Assurance and can’t upgrade their hosts until there is licensing budget.
  • They might build new clusters/hosts using WS2012 and have to leave existing VMs where they are until there is a suitable maintenance window.  For a public cloud, this could have to be scheduled well in advance.

This free tool will allow those sorts of environments to reduce DPM administrative effort.

The Big Changes In WS2012 Cluster Shared Volume (CSV)

Microsoft made lots of changes with CSV 2.0 in Windows Server 2012.  But it seems like that message has not gotten through to people.  I’ve responded to quite a few comments here on the blog and I’m seeing stuff on forums.  What’s really annoying is that when you tell people that X has changed, they don’t listen.

I would strongly recommend that people take some time (I don’t care about excuses) to watch the TechEd presentation, Cluster Shared Volumes Reborn in Windows Server 2012: Deep Dive, by Rob Hindman and Amitabh Tamhane (Microsoft).  There are lots of changes.  But I want to focus on the big ones that people repeatedly question.

OK, what are the major changes?

There IS NO Redirected IO in WS2012 CSV Backup

Let me restate that in another way: Windows Server 2012 does not use Redirected IO to back up CSVs.

This has been made possible thanks to substantial changes in how VSS places VMs that are stored on CSV into a quiescent state.  The backup agent (VSS Requestor) kicks off a backup request with a list of virtual machines.  The Hyper-V Writer identifies the storage location(s) of the VMs’ files.  A new component, the CSV Writer, is responsible for coordinating the Hyper-V nodes in the cluster … meaning all VMs on a CSV that is being backed up are placed into a quiescent state at the same time.  This allows for a single distributed VSS snapshot of each CSV.  That allows the provider (hardware, software or system) to go to work and get the snapshot.

This is much simpler than what CSV did in Windows Server 2008 R2.  [The following does not happen in WS2012] There was no CSV Writer.  There was no coordination, so Redirected IO was required.  The node performing a snapshot needed exclusive access to the volume so all IO went through it for the time being.  A lot of people knew that bit up to there.  The bit that most people didn’t know was that each node (hosting VMs that were being backed up) took snapshots of each CSV that was being backed up.  And that could cause problems.

I’ve heard several times now from people who’ve experienced issues with volumes going offline during backup.  There were two causes that I’ve seen, and both were related to a third party hardware VSS provider:

  • Using a hardware VSS provider that did not support CSV
  • The rapidly rotating and repeated snapshot process caused chaos in the SAN with the hardware snapshots

But, all that is G-O-N-E when backing up CSV on Windows Server 2012:

  • There is no redirected IO
  • There is a single VSS snapshot performed

SCSI3 Reservation Starvation Should Go Away

Every node in a Hyper-V cluster used SCSI3 persistent reservations and SCSI3 reservations to connect to CSVs.  Every SAN has a finite number of those persistent reservations and reservations.  The SCSI3 persistent reservations were a bottleneck.  No manufacturer shares that number, and it’s a hell of a lot smaller than you’d expect – we typically find out about it during a support call.  To compound this, each host required a number of SCSI3 persistent reservations, and that multiplied based on:

  • Number of hosts in the cluster
  • Number of CSVs
  • Number of storage channels per host (possibly even a multiple of the number of physical HBAs/NICs, depending on the SAN)

What happens when you deploy too many nodes, CSVs, or storage channels?  CSVs go offline.  Yup.  The SAN is starved of resources to connect the hosts to the LUNs.  I saw this with small deployments with an entry level SAN, 3 hosts, and 5 CSVs.  And it ain’t pretty.
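
To get a feel for how quickly that multiplication grows, here is a back-of-the-envelope Python sketch.  It treats the three factors listed above as a plain product, which is my simplifying assumption; real SANs count reservations in vendor-specific ways, and the 2 storage channels per host in the example is an invented figure.

```python
def legacy_reservation_estimate(hosts, csvs, channels_per_host):
    """Illustrative estimate only: hosts x CSVs x storage channels per host.

    This is not a vendor formula; it just shows how the three factors
    compound when each host registers per CSV and per channel.
    """
    if min(hosts, csvs, channels_per_host) < 1:
        raise ValueError("all counts must be at least 1")
    return hosts * csvs * channels_per_host


# The small deployment mentioned above: 3 hosts and 5 CSVs,
# with an assumed 2 storage channels per host.
print(legacy_reservation_estimate(3, 5, 2))  # 30
```

Adding a single host or CSV raises the total by a whole multiple, which is why an entry level SAN can run out sooner than you would guess.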

Imagine a cluster with 64 nodes!?!?!  With Windows Server 2012, each node gets a static key instead of using the legacy persistent reservation multiplication.  That means your SAN can support more CSVs and more hosts running Windows Server 2012 than it would have with Windows Server 2008 R2.  Note that the static key is assigned when the node is added to the cluster.

You can find the static keys in the registry of your cluster nodes in HKEY_LOCAL_MACHINE\Cluster\Nodes\<Node Number>\ReserveID (REG_QWORD).  You can identify which node number is which host by the NodeName (REG_SZ) value.

This new system, which replaces persistent reservations, gives you better cluster infrastructure scalability, but it doesn’t eliminate the scalability limits of your SAN.

Memory Leak Issues Being Reported With KB2799728

I mentioned a new KB article/hotfix yesterday that resolves an issue with backup of CSV on WS2012 Hyper-V clustered nodes.

Tim Boothby and Rich Lilly commented on the post that people are reporting memory leak issues after installing this hotfix.  My advice: don’t install the update unless:

  • You really need it OR
  • It appears that the issue is resolved

KB2799728–VM Enters Paused State Or CSV Goes Offline When Backup WS2012 Hyper-V Cluster

Please note the below comments.  There are memory leak issues being reported with this hotfix.

Please pay special attention to this hotfix.  It’s the sort of one I expect to see on forums and be asked about for the next 18 months.  I recommend making this patch a standard part of your install of WS2012 Hyper-V clusters.

The scenario is when a virtual machine enters a paused state or a CSV volume goes offline when you try to create a backup of the virtual machine on a Windows Server 2012-based failover cluster.

Consider the following scenario:

  • You enable the Cluster Shared Volumes (CSV) feature on a Windows Server 2012-based failover cluster.
  • You create a virtual machine on a CSV volume on a cluster node.
  • You start the virtual machine.
  • You try to create a backup of the virtual machine on the CSV volume by using Microsoft System Center Data Protection Manager (DPM) or any backup software that uses the Microsoft Software Shadow Copy Provider.

In this scenario, one of the following issues occurs:

  • The backup is created, and the virtual machine enters a paused state.
  • The CSV volume goes offline. Therefore, the virtual machine goes offline, and the backup is not created.

Additionally, the following events are logged in the Cluster log and System log respectively:

Software snapshot creation on Cluster Shared Volume(s) (‘volume location‘) with snapshot set id ‘snapshot id‘ failed with error ‘HrError(0x80042308)(2147754760)’. Please check the state of the CSV resources and the system events of the resource owner nodes.

 

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: Date and time
Event ID: 5120
Task Category: Cluster Shared Volume
Level: Error
Keywords:
User: SYSTEM
Computer: Computer name
Description: Cluster Shared Volume ‘Volume1’ (‘name’) is no longer available on this node because of ‘STATUS_IO_TIMEOUT(c00000b5)’. All I/O will temporarily be queued until a path to the volume is reestablished.

 

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          Date and time
Event ID:      5142
Task Category: Cluster Shared Volume
Level:         Error
Keywords:
User:          SYSTEM
Computer:      Computer name
Description: Cluster Shared Volume ‘Volume3’ (‘Cluster Disk 4’) is no longer accessible from this cluster node because of error ‘ERROR_TIMEOUT(1460)’. Please troubleshoot this node’s connectivity to the storage device and network connectivity.
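
If you have exported events as text in the layout shown above, a short Python sketch like the following can pick out the two CSV event IDs from this article.  It assumes records are separated by blank lines and that every field is a single "Name: value" line; that matches the samples here but is not a documented export format, and the function names are my own.

```python
CSV_EVENT_IDS = {5120, 5142}  # the two event IDs quoted in this article


def parse_events(text):
    """Split a text export into records, one dict of fields per event.

    Assumption: blank lines separate events and each field sits on one
    "Name: value" line. Only the first colon splits name from value, so
    colons inside the description are preserved.
    """
    events, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                events.append(current)
                current = {}
            continue
        name, sep, value = line.partition(":")
        if sep:
            current[name.strip()] = value.strip()
    if current:
        events.append(current)
    return events


def csv_errors(events):
    """Return only the events whose Event ID is one of the CSV IDs above."""
    return [
        e for e in events
        if e.get("Event ID", "").isdigit() and int(e["Event ID"]) in CSV_EVENT_IDS
    ]


sample = """Log Name: System
Event ID: 5120
Level: Error
Description: Cluster Shared Volume is no longer available: see details.

Log Name: System
Event ID: 1000
Level: Information
"""
for event in csv_errors(parse_events(sample)):
    print(event["Event ID"], event["Description"])
```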

The virtual machine enters a paused state because the Ntfs.sys driver incorrectly reports the available space on the CSV volume when the backup software tries to create a snapshot of the CSV volume. Additionally, the CSV volume goes offline because the CSV volume does not resume from a paused state after an I/O delay issue or an I/O error occurs.
Note The CSV volume is resilient.

A supported hotfix is available from Microsoft.

There is more:

After you install the hotfix, CSV volumes do not enter paused states as frequently. Additionally, a cluster’s ability to recover from expected paused states that occur when a CSV failover does not occur is improved.

To avoid CSV failovers, you may have to make additional changes to the computer after you install the hotfix. For example, you may be experiencing the issue described in this article because of the lack of hardware support for Offloaded Data Transfer (ODX). This causes delays when the operating system queries for the hardware support during I/O requests.

In this situation, disable ODX by changing the FilterSupportedFeaturesMode value for the storage device that does not support ODX to 1. For more information about how to disable ODX, go to the Microsoft website.

Altaro Giving Away 50 Free PC Backup Licenses To All Hyper-V Administrators!

I’ve gotten some very exciting news from Altaro, makers of Altaro Hyper-V Backup (that supports Windows Server 2008 R2 Hyper-V and Windows Server 2012 Hyper-V).  Altaro wants to give away for free 50 copies of their desktop backup product, Oops!Backup, to each Hyper-V administrator that can prove that they run Hyper-V.  Here’s the press release:

Altaro Software, a fast-growing developer of backup solutions for Microsoft Hyper-V, today announced that it is giving every Microsoft Hyper-V administrator 50 free licenses of Oops!Backup, their desktop backup solution.

“Following the success of our Hyper-V Backup solution this year, we wanted to give something back to the Hyper-V community during the holiday season” commented David Vella, CEO of Altaro. “Hyper-V admins can give out these licenses to their colleagues, friends and family, for use at work or at home.”

Oops!Backup is a popular desktop backup solution that allows users to preview & restore versions of their files from different points in time.

Any network administrator who uses Microsoft Hyper-V is eligible for the free license keys, they simply need to visit the Altaro website, send in a screenshot of their Hyper-V Manager and expect an email with their respective keys.

To claim the 50 free licenses go here. Thanks Altaro!

Note: Giveaway expires on Monday December 24th. Licenses are Not-For-Resale (NFR) keys.

Online Backup to Windows Azure Using System Center 2012 SP1 – Data Protection Manager

I blogged about Windows Azure Online Backup in March of this year.  What was announced then was a way to get an offsite backup of files and folders (only) into Windows Azure directly from Windows Server 2012 (including the Essentials edition).

The online backup market is pretty crowded and competitive.  You need to offer something that is different, and preferably, integrated with what the customer already has for onsite backups so that the customer does not have to manage 2 backup systems.

Being a cloud service, Windows Azure Online Backup (WAOB) is something that can be tweaked and extended relatively rapidly.  And Microsoft has extended it.  WAOB will support protecting backup data from SysCtr 2012 SP1 DPM to the cloud.

With the System Center 2012 SP1 release, the Data Protection Manager (DPM) component enables cloud-based backup of datacenter server data to Windows Azure storage.  System Center 2012 SP1 administrators use the downloadable Windows Azure Online Backup agent to leverage their existing protection, recovery and monitoring workflows to seamlessly integrate cloud-based backups alongside their disk/tape based backups. DPM’s short term, local backup continues to offer quicker disk–based point recoveries when business demands it, while the Windows Azure backup provides the peace of mind & reduction in TCO that comes with offsite backups. In addition to files and folders, DPM also enables Virtual Machine backups to be stored in the cloud.

What this means is that you can:

  • Continue to reap the rewards of your investment in DPM for on-premises backups to disk and/or tape
  • Extend this functionality to back up to the cloud from the storage pools in DPM

With WAOB you will be able to:

… transparently recover files, folders and VMs from the cloud

There will be block level incremental backups to reduce the length of backup jobs and reduce the amount of data transfer.  Data is compressed and encrypted before it leaves your network.  And critically important for you to note:

The encryption passphrase is in your control only.  Once the data is encrypted, it stays that way in storage in Microsoft.  They have no way to decrypt your data without your passphrase.  So choose a good one, and document/store it somewhere safe, e.g. with a lawyer or in a deposit box.

There is throttling for bandwidth control.  You can verify data integrity in the cloud without restoring it (but test restores are a good thing).  You can also configure retention policies – you balance regulatory requirements, business needs, and online storage costs.
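
For anyone unfamiliar with the block level incremental idea mentioned above, here is a minimal Python sketch of the general technique.  It is not how WAOB or DPM is implemented; the tiny block size, the use of SHA-256, and the function names are all my assumptions for the sake of a readable example.  The previous backup’s block hashes are kept, and only blocks whose hash is new or different get sent.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 hex digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(previous_hashes, data, block_size=BLOCK_SIZE):
    """Return {block index: block bytes} for blocks that are new or changed.

    Blocks that disappeared because the data shrank are not reported;
    a real product would have to track that too.
    """
    changed = {}
    for index, digest in enumerate(block_hashes(data, block_size)):
        if index >= len(previous_hashes) or previous_hashes[index] != digest:
            start = index * block_size
            changed[index] = data[start:start + block_size]
    return changed


old = b"AAAABBBBCCCC"
new = b"AAAAXXXXCCCCDD"
delta = changed_blocks(block_hashes(old), new)
print(sorted(delta))  # [1, 3]
```

Only two of the four blocks would travel over the wire in this example, which is where the shorter jobs and smaller transfers come from.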

To go with this, the Windows Azure Online Backup portal has been launched (last week).  You can sign up for a free preview with 300 GB of storage space.

It’s still beta so we don’t know:

  • Pricing
  • RTM date
  • How it will be sold, e.g. via partner channel which is critically important (see Office 365).