KB2918371 – Scheduled Backup Of Hyper-V Fails With Event ID 517 & Error 0x80780049

This new article from Microsoft refers to “Windows Server Backup running on the host operating system”, but I cannot say whether this issue also affects third-party backup tools or DPM. REPEAT: DO NOT ASK ME – ASK MICROSOFT. Microsoft has a bad habit of stating that a backup fix is for a scenario featuring a Microsoft backup product when it really affects any tool backing up Hyper-V.

Symptoms

Consider the following scenario:

  • You have a Windows Server 2012 Hyper-V host and a Windows Server 2012 guest virtual machine (VM).
  • You start Windows Server Backup on the host operating system.
  • You click Backup Schedule to start the backup schedule wizard and then click Next.
  • You select Custom on the Select Backup Configuration tab and then click Next.
  • You click Add Items, select host component and the guest VM, and then complete the wizard.
  • You restart the host operating system.

In this scenario, the scheduled backup fails with event ID 517 and error 0x80780049.

“The Update” fixes this issue for Windows Server 2012 R2 Hyper-V and Windows 8.1 Client Hyper-V. A hotfix is available for Windows Server 2012 Hyper-V and Windows 8 Hyper-V.

If the problem is limited to Windows Server Backup then it will typically affect just small installations (1 or maybe even 2 hosts) and labs.

KB2935810 – CSV Failover Takes Longer Than Expected In Windows Failover Cluster

Microsoft released a hotfix for WS2012 and WS2012 R2 to deal with a scenario where CSV failover takes longer than expected in a Windows failover cluster.

Symptoms

In a Windows failover-cluster that uses Cluster Shared Volumes (CSV), the diff area that is allocated by Volsnap is large and fragmented. In this situation, you encounter the following issues:

  • The failover time on the CSV is longer than expected.
  • The time that Volsnap takes to mount or unmount snapshots is several minutes.

More Information

When an NTFS or ReFS volume is mounted or dismounted, Volsnap iterates through the diff area to mount or unmount the snapshots that belong to that volume. When the diff area allocation becomes large and fragmented, mount or unmount operations in Volsnap could take several minutes. Additionally, failover time can be longer than expected.

The resolution is … hmm … long. It is related to two updates:

Two new cluster Physical Disk resource private properties were added, and they can be manipulated to resolve the issue:

  • SnapshotDiffSize: This property controls the maximum diff area size that can be consumed by Volsnap for a Physical Disk resource configured for CSV. Units: MB (DWORD). Default value: 0. Maximum value: 1 TB. The Physical Disk resource must be taken offline and brought back online for changes to take effect.
  • SnapshotAgeLimit: This is a Resource Type private property of the Physical Disk that controls the maximum age of a snapshot. Long-lived snapshots are a significant contributor to diff area fragmentation. Units: days (DWORD). Default value: 7. Range: 1-60. This is a global property that affects all Physical Disk resources. You do not have to take the resource offline or online for it to take effect.

Get-ClusterSharedVolume <Cluster Disk Name> | Set-ClusterParameter snapshotdiffsize <Snapshot Diff Size in MB>

Get-ClusterResourceType "physical disk" | Set-ClusterParameter snapshotagelimit <Snapshot Age in Days>
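
If you are told to change these values, the unit conversion and range checks are easy to get wrong. Here is a minimal Python sketch that converts a size in GB to the MB value SnapshotDiffSize expects, checks both values against the limits quoted above, and prints the two commands with the placeholders filled in. The disk name "Cluster Disk 1", the 200 GB size, and the 14-day age are hypothetical values of mine, and I am assuming that the 1 TB maximum means 1024 * 1024 MB. The helper names are mine too; nothing here talks to a cluster.

```python
# Assumption: "1 TB" in the KB means binary units, i.e. 1024 * 1024 MB.
MAX_DIFF_SIZE_MB = 1024 * 1024


def diff_size_mb(size_gb):
    """Convert a size in GB to the MB value that SnapshotDiffSize expects."""
    size_mb = int(size_gb * 1024)
    if not 0 <= size_mb <= MAX_DIFF_SIZE_MB:
        raise ValueError("SnapshotDiffSize must be between 0 MB and 1 TB")
    return size_mb


def check_age_limit(days):
    """Check a SnapshotAgeLimit value against the documented 1-60 day range."""
    if not 1 <= days <= 60:
        raise ValueError("SnapshotAgeLimit must be between 1 and 60 days")
    return days


def build_commands(disk_name, size_gb, age_days):
    """Fill in the placeholders of the two commands shown above."""
    return [
        f'Get-ClusterSharedVolume "{disk_name}" | '
        f'Set-ClusterParameter SnapshotDiffSize {diff_size_mb(size_gb)}',
        'Get-ClusterResourceType "Physical Disk" | '
        f'Set-ClusterParameter SnapshotAgeLimit {check_age_limit(age_days)}',
    ]


# Hypothetical values, for illustration only.
for line in build_commands("Cluster Disk 1", 200, 14):
    print(line)
```

The point of the checks is simply to catch a value typed in GB instead of MB, or an age outside 1-60, before it gets anywhere near a production cluster.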

My advice: leave well alone and only manipulate these settings under the advice of Microsoft support (not some local dude, but actual Premier support).

KB2878635 – A December 2013 Update To Improve CSV Backup Resiliency On WS2012 Hyper-V

Microsoft released an update in December 2013 that improves the resiliency of the cloud service provider in Windows Server 2012. That’s a little “marketing speak”. In truth, this update is designed to resolve issues with CSV backup on Windows Server 2012 Hyper-V. This update contains two fixes. Please note the post-installation instructions!

Symptoms

This article introduces an update that improves the resiliency of the cloud service provider in Windows Server 2012. This update is dated December 2013.
This update replaces update 2870270, which is used to improve resiliency. Also, this update includes update 2869923 and update 2908415. Additionally, the update resolves several issues that occur in the following scenario: 

  • You have a Hyper-V failover cluster.
  • The Hyper-V resources are saved in .vhd files on Cluster Shared Volumes File System (CSVFS) volumes.
  • You use a backup solution. For example, you use System Center Data Protection Manager (DPM) in the Hyper-V environment.
  • You try to perform a backup, and a snapshot is taken of the CSVFS volume.
  • The current active node encounters an error, and the cluster fails over to another node.
  • DPM may start a consistency check on the volume unexpectedly.

Issue 1

Snapshots that are no longer being used are not cleaned up. Therefore, Volume Shadow Copy Service (VSS) snapshots may accumulate on Cluster Shared Volumes (CSV) and guest virtual machines. This causes a deadlock in the Resource Hosting Subsystem (RHS) process, and causes CSV failures. Additionally, all Hyper-V instances that use the VHD files go down.
Additionally, the following events are logged separately in the Cluster log and in the System log:

Software snapshot creation on Cluster Shared Volume(s) (‘volume location’) with snapshot set id ‘snapshot id’ failed with error ‘HrError(0x80042308)(2147754760)’. Please check the state of the CSV resources and the system events of the resource owner nodes.
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: Date and time
Event ID: 5120
Task Category: Cluster Shared Volume
Level: Error
Keywords:
User: SYSTEM
Computer: Computer name
Description: Cluster Shared Volume ‘Volume1’ (‘name’) is no longer available on this node because of ‘STATUS_IO_TIMEOUT(c00000b5)’. All I/O will temporarily be queued until a path to the volume is reestablished.

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: Date and time
Event ID: 5142
Task Category: Cluster Shared Volume
Level: Error
Keywords:
User: SYSTEM
Computer: Computer name
Description: Cluster Shared Volume ‘Volume3’ (‘Cluster Disk 4’) is no longer accessible from this cluster node because of error ‘ERROR_TIMEOUT(1460)’. Please troubleshoot this node’s connectivity to the storage device and network connectivity.
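
A small aside on the error in the cluster log entry: HrError(0x80042308)(2147754760) is one 32-bit code written twice, in hex and in unsigned decimal. Some tools print the same code as a negative signed number instead, so it helps to be able to convert between the forms when searching logs. This is a minimal Python sketch of that arithmetic; the helper names are my own.

```python
def to_unsigned32(value):
    """Return the unsigned 32-bit view of an error code."""
    return value & 0xFFFFFFFF


def to_signed32(value):
    """Return the signed 32-bit view of the same error code."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


hresult = 0x80042308
print(to_unsigned32(hresult))                 # 2147754760
print(to_signed32(hresult))                   # -2147212536
print(f"0x{to_unsigned32(-2147212536):08X}")  # 0x80042308
```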

Issue 2

If a CSVFS volume is repeatedly added to and removed from a cluster, or if CSVFS snapshots are repeatedly created, the Plug and Play (PnP) hive in the following registry path may grow with many additional registry keys: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\STORAGE\VolumeSnapshot.
Therefore, installation of PnP volumes (which usually occurs during a resource move or failure) may become slow. This update prevents future unnecessary growth in the volume snapshot registry key, but does not clean up existing registry entries.

Resolution

A supported hotfix is available from Microsoft.

Post-Installation Instructions

After you install this hotfix on a Hyper-V server, you must update the integration components in the virtual machines that are running Windows Server 2012. To do this, use Hyper-V Manager to connect to the virtual machine. This starts the Virtual Machine Connection tool. Then, on the Action menu, click Insert Integration Services Setup Disk. Run the Setup.exe file on the Integration Services Setup disk to update the integration component.

How WS2012 R2 Hyper-V Backup Works

You might have heard that WS2012 R2 Hyper-V changed how it performs backups of virtual machines but not heard how.  I’ll give you a glimpse of what that change was in this post.

First some terminology:

  • Volume Shadow Copy Service (VSS) Snapshot: VSS performs a snapshot of a volume to get a consistent backup of a file or files/folders.
  • Checkpoint: Formerly known as a (Hyper-V) snapshot, a checkpoint (matching the SCVMM term) creates a crash consistent copy of a VM at a point in time, using AVHD or AVHDX files that are linked to a parent VHD or VHDX file – or linked to a parent AVHD or AVHDX file in the case of nested checkpoints.

In the past, VSS was used to create a VSS snapshot of the volume(s) containing the files of a VM that was to be backed up.  The snapshot was mounted and the required files (identified by the Hyper-V Writer) were backed up.  The process was pretty complex and could lead to problems for some customers.  The quality of storage hardware VSS writers had too much impact on the process.  Once you had a pretty clean environment (network/storage design, patching, drivers and firmware), backup was the one problem that could hurt.  CSV 2.0 sorted out most of that, but Microsoft wanted to simplify the backup process.

Thanks to live merging of checkpoints (snapshots) that was added in WS2012, and the experience Microsoft has gained by using checkpoints (snapshots) for other features (such as Hyper-V Replica), WS2012 R2 Hyper-V has switched to using checkpoints instead of VSS snapshots.

Now a checkpoint is created of every virtual machine that is to be backed up.  Writes are temporarily redirected to the checkpoint’s AVHDX/AVHD file(s).  This gives the backup requestor a clean & crash consistent copy of the virtual machine’s VHD or VHDX files that is safe to read.  After the backup, the checkpoint is merged and the job is done.
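
To make that sequence concrete, here is a toy Python model of it: writes go to an overlay while the checkpoint exists, the backup reads only the frozen parent, and the merge folds the overlay back in. This is purely an illustration of the idea. The class and method names are mine, it models a disk as a dictionary of blocks, and it is not how you call Hyper-V or what the AVHDX format looks like.

```python
class ToyDisk:
    """Toy stand-in for a VHDX with an optional AVHDX overlay (not a Hyper-V API)."""

    def __init__(self, blocks):
        self.parent = dict(blocks)  # block number -> data in the parent disk
        self.overlay = None         # a dict only while a checkpoint exists

    def create_checkpoint(self):
        """Freeze the parent; later writes are redirected to the overlay."""
        self.overlay = {}

    def write(self, block, data):
        target = self.parent if self.overlay is None else self.overlay
        target[block] = data

    def read(self, block):
        """The running VM sees the overlay first, then the parent."""
        if self.overlay is not None and block in self.overlay:
            return self.overlay[block]
        return self.parent[block]

    def backup_copy(self):
        """The backup reads only the frozen parent, never the overlay."""
        return dict(self.parent)

    def merge_checkpoint(self):
        """Fold the redirected writes back into the parent."""
        if self.overlay is not None:
            self.parent.update(self.overlay)
            self.overlay = None


disk = ToyDisk({0: "boot", 1: "db-v1"})
disk.create_checkpoint()
disk.write(1, "db-v2")        # the VM keeps running during the backup
backup = disk.backup_copy()   # the backup still sees db-v1
disk.merge_checkpoint()
print(backup[1], disk.read(1))  # db-v1 db-v2
```

The backup gets the state as of the checkpoint, and the VM loses nothing that it wrote while the backup was running.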

Note: You might notice that a VSS snapshot is still taken of the relevant volume(s).

We’ve moved from application consistent backup to the next (small) step down with a crash consistent backup.  This isn’t a big deal – not for backup experts anyway.  For products like SQL Server or Exchange, restoring this VM is like someone reset the VM.  The restored database starts up, does a quick cleanup, and carries on operating as it did before the backup operation.  In return, we get a much simpler backup process that should prove to be more resilient and selective.

Windows Azure Backup Is Generally Available & Other Azure News

The following message came in an email overnight:

Windows Azure Backup is now generally available, Windows Azure AD directory is created automatically for every subscription, and Hyper-V Recovery Manager is in preview.

What does that mean?  Some backup plans charge you based on the amount of data that you are protecting.  Personally, I prefer that approach because it is easy to predict – I have 5 TB of data and it’s going to cost me 5 * Y to protect it.  Azure Online Backup has gone with the more commonly used approach of charging you based on how many GB/month of storage that you consume on Microsoft’s cloud.  This is easy for a service provider to create bills, but it’s hard for the consumer to estimate their cost … because you have elements like deduplication and compression to account for.

The pricing of Azure Online Backup looks very competitive to me. 

Windows Azure Backup is billed in units based on your average daily amount of compressed data stored during a monthly billing period.

Some plans get the first 5GB free and then it’s €0.3724 per GB per month.  In the USA, it will be $0.50 per GB per month.  Back when I worked in backup, €1/GB per month was considered economical.
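
Here is a rough Python sketch of how I read that billing model: average the compressed GB stored each day over the month, subtract the free 5 GB, and multiply by the Euro rate. The usage figures are hypothetical, and I am assuming that your plan includes the free allowance and that the allowance is deducted from the monthly average. Check your own bill before trusting this.

```python
FREE_GB = 5               # free allowance that some plans get
RATE_EUR_PER_GB = 0.3724  # Euro price per GB per month quoted above


def monthly_cost(daily_stored_gb, free_gb=FREE_GB, rate=RATE_EUR_PER_GB):
    """Estimate one month's charge from the compressed GB stored each day."""
    average_gb = sum(daily_stored_gb) / len(daily_stored_gb)
    billable_gb = max(0.0, average_gb - free_gb)
    return round(billable_gb * rate, 2)


# Hypothetical month: 50 GB stored for 15 days, then 150 GB for 15 days.
usage = [50] * 15 + [150] * 15
print(monthly_cost(usage))  # 35.38
```

The hard part for the customer is not this sum. It is guessing the compressed size in the first place, which is exactly the estimation problem I described above.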

In other Azure news:

A Windows Azure AD directory is created automatically for every subscription:

Starting today, every Windows Azure subscription is associated with an autocreated directory in Windows Azure Active Directory (AD). By using this enterprise-level identity management service, you can control access to Windows Azure resources.

To accommodate this advancement, every Windows Azure subscription can now host multiple directories. Additionally, Windows Azure SDK will no longer rely on static management certificates but rather on user accounts in Active Directory. Existing Active Directory tenants related to the same user account will be automatically mapped to a single Windows Azure subscription. You can alter these mappings from the Windows Azure Management Portal.

Take advantage of the new Windows Azure Hyper-V Recovery Manager preview.

Windows Azure Hyper-V Recovery Manager helps protect important applications by coordinating the replication of Microsoft System Center clouds to a secondary location, monitoring availability, and orchestrating recovery as needed.

The service helps automate the orderly recovery of applications and workloads in the event of a site outage at the primary data center. Virtual machines are started in an orchestrated fashion to help restore service quickly.

The Euro GA pricing for Hyper-V Recovery Manager was included in the email.  It will cost 11,9152€ per virtual machine per month to use this service.  The website is not updated with GA pricing.

KB2886362 – New Update Rollup For DPM 2012 SP1 If Backing Up Hyper-V VMs

Microsoft released a fix for System Center 2012 SP1 – Data Protection Manager for an issue where DPM consumes too much space to track changes of Hyper-V VMs stored on CSVs:

DPM has express full technology where DPM tracks the changes via the DPM filter driver, and the changed block information is tracked as a bitmap and stored in bitmap files. In some scenarios, DPM bitmap files become very big, leading to higher CSV volume consumption. This issue is fixed in the DPM filter and affects only VM protection scenarios. This fix is made in the DPM filter driver running on the production server.

Please note: It is advised to apply this update only if you are backing up Hyper-V VMs. This upgrade will lead to a consistency check (CC) on all data sources that are affected by this particular DPM server.

This update appears to be called Update Rollup 3.6 and is available via Windows Update.  My advice is:

  • Let some other sucker test this update rollup for Microsoft.  Don’t be the fool who installs this and has to go to the TechNet Forums for help because it breaks something.  Wait one month; if all is well, then consider installing the update.
  • Only rush the install of this update if you are suffering badly from the above problem.

KB2870270 – A Hotfix Bundle For WS2012 Hyper-V Failover Clusters

Although not referred to as an update rollup, this latest hotfix is a bundle of fixes.  As before, don’t rush out to deploy it unless it looks like it’s going to fix a problem you are having.  Otherwise, wait a few weeks, test if you can, check the news, and then deploy it to prevent those problems.

This bundle is an update that improves cloud service provider resiliency in Windows Server 2012.  That title/description sounds like someone didn’t know how to describe it and fell back to marketing jargon.  Please don’t let the title confuse you – this bundle contains important fixes for all.  This new KB2870270 replaces the recent KB2848344 (Update that improves cloud service provider resiliency in Windows Server 2012).

The bundle contains:

  • KB2796995: Offloaded Data Transfers fail on a computer that is running Windows 8 or Windows Server 2012
  • KB2799728: Virtual machine enters a paused state or a CSV volume goes offline when you try to create a backup of the virtual machine on a Windows Server 2012-based failover cluster
  • KB2801054: VSS_E_SNAPSHOT_SET_IN_PROGRESS error when you try to back up a virtual machine in Windows Server 2012
  • KB2813630: Virtual machine enters a paused state or a CSV volume goes offline when you try to create a backup of the virtual machine on a Windows Server 2012-based failover cluster
  • KB2848727: "Move-SmbWitnessClient" PowerShell command fails in Windows Server 2012

KB2869923 – VM Crash Caused By Physical Disk Resource Move During WS2012 CSV Backup

An “interesting” week for Hyper-V/clustering hotfixes, and they didn’t stop.  Some more came out yesterday.  Test (if you can), wait a few weeks, and then deploy.  This one is for when a Physical Disk resource move during the backup of a Cluster Shared Volume (CSV) may cause a resource outage.

Symptoms

Consider the following scenario:

  • You configure a Windows Server 2012-based Hyper-V failover cluster.
  • The VHD or VHDX files reside on a Cluster Shared Volume (CSV).
  • Backups of the CSV are performed using software snapshots.
  • Physical Disk resource for the CSV is moved to another node in the cluster.

In this scenario, the Physical Disk resource may fail to come online if the backup of the CSV is in progress. As a result, virtual machines that rely on the CSV may crash.

Cause

During a move of the Physical Disk resource, when the Physical Disk resource comes online on the new node it queries the Volume Shadow Copy Service (VSS) to discover the software snapshots associated with that volume. If the move takes place while a software snapshot is in progress, VSS may fail to respond or may take a long time to respond. Ultimately, this may cause the Physical Disk resource to either fail to come online or take a long time to come online on the new node. As a result, VMs that have VHD files on the CSV may crash.

A supported hotfix is available from Microsoft.

Altaro Launches Hyper-V Backup v4

Congratulations to the really nice folks at Altaro (that’s been my experience and that of some of my customers) on the release of Altaro Hyper-V Backup v4.  Here are some of the features:

[Image: Altaro Hyper-V Backup v4 feature list]

Altaro, like a few others, are really quick to keep up with Microsoft.  I wouldn’t be surprised if they quickly celebrated the release while installing WS2012 R2 Hyper-V to get working on it … while certain big names in backup still don’t support WS2012.

Hyper-V Backup is available as a nice free solution for the very small business and a fairly priced solution for larger businesses.

Windows Server 2012 R2 Hyper-V – Linux Support Improvements

Yes, Hyper-V supports many Linux distros, architectures, and versions, and that support has been improved in WS2012 R2 Hyper-V.

It’s no secret that there were some changes to the Linux Integration Services that are built into the Linux kernel.  Those changes were intended for and supported on WS2012 R2 Hyper-V (not WS2012 Hyper-V).  Those two changes are:

  • Dynamic Memory: Linux guest OSs can use the balloon driver to have the exact same support for Dynamic Memory as Windows (add and remove).  Bear in mind the constraints of the Linux distro itself.  And remember the recommendations of Linux when assigning large amounts of CPU/RAM to a machine.  These are Linux recommendations/limits, not Hyper-V ones.
  • Online backup: You can now perform a file system freeze in the Linux guest OS to get a file system consistent backup of a Linux guest OS without pausing the VM.  Linux does not have VSS (like Windows) so we cannot get application consistency.  But this is still a huge step forward.  According to Microsoft, WS2012 R2 Hyper-V is now the best way to virtualize and back up Linux; you can use any backup tool that supports Hyper-V to reliably back up your Linux VMs without using some script that does a dumb file copy.

Remember that online VHDX resizing is a host function, so Linux guest OSs support this too.  Don’t ask me how to resize Linux partitions or make use of the new free space :)

There is also a new video driver for Linux.  This gives you a better video experience, as with Windows guest OSs, including better mouse support – but hey, real Linux admins don’t click!

To take advantage of these features, make sure you have an up-to-date Linux kernel in your VMs, and that you’re running them on WS2012 R2 Hyper-V.