Application-Aware Disaster Recovery For VMware, Hyper-V, and Azure IaaS VMs with Azure Site Recovery

Speaker: Abhishek Hemrajani, Principal Lead Program Manager, Azure Site Recovery, Microsoft

There’s a session title!

The Impact of an Outage

The aviation industry has suffered massive outages over the last couple of years costing millions to billions. Big sites like GitHub have gone down. Only 18% of DR investors feel prepared (Forrester July 2017 The State of Business Technology Resiliency). Much of this is due to immature core planning and very limited testing.

Causes of Significant Disasters

  • Forrester says 56% of declared disasters are caused by h/w or s/w.
  • 38% are because of power failures.
  • Only 31% are caused by natural disasters.
  • 19% are because of cyber attacks.

Sourced from the above Forrester research.

Challenges to Business Continuity

  • Cost
  • Complexity
  • Compliance

How Can Azure Help?

The hyper-scale of Azure can help.

  • Reduced cost – OpEx utility computing and benefits of hyper-scale cloud.
  • Reduced complexity: Service-based solution that has the weight of MS development behind it to simplify it.
  • Increased compliance: More certifications than anyone.

DR for Azure VMs

Something that AWS doesn’t have. Some mistakenly think that you don’t need DR in Azure. A region can go offline. People can still make mistakes. MS does not replicate your VMs unless you enable/pay for ASR for selected VMs. ASR is highly certified for compliance including PCI, EU Data Protection, ISO 27001, and many, many more.

  • Ensure compliance: No-impact DR testing. Test every quarter or, at least, every 6 months.
  • Meet RPO and RTO goals: Backup cannot do this.
  • Centralized monitoring and alerting

Cost effective:

  • “Infrastructure-less” DR sites.
  • Pay for what you consume.

Simple:

  • One-click replication
  • One-click application recovery (multiple VMs)

Demo: Typical SharePoint Application in Azure

3 tiers in availability sets:

  • SQL cluster – replicated to a SQL VM in a target region or DR site (async)
  • App – replicated by ASR – nothing running in DR site
  • Web – replicated by ASR – nothing running in DR site
  • Availability sets – built for you by ASR
  • Load balancers – built for you by ASR
  • Public IP & DNS – abstract DNS using Traffic Manager

One-Click Replication is new and announced this week. Disaster Recovery (Preview) is an option in the VM settings. All the pre-requisites of the VM are presented in a GUI. You click Enable Replication and all the bits are built and the VM is replicated. You can pick any region in a “geo-cluster”, rather than being restricted to the paired region.

For more than one VM, you might enable replication in the recovery services vault (RSV) and multi-select the VMs for configuration. The replication policy includes recovery point retention and app-consistent snapshots.

New: Multi-VM consistent groups. In preview now, up to 8 VMs. 16 at GA. VMs in a group do their application consistent snapshots at the same time. No other public cloud offers this.

Recovery Plans

Orchestrate failover. VMs can be grouped, and groups are failed over in order. You can also demand manual tasks to be done, and execute Azure Automation runbooks to do other things like creating load balancer NAT rules, re-configuring DNS abstraction in Traffic Manager, etc. You run the recovery plan to fail over ... and to do test failovers.
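To make the ordering idea concrete, here is a minimal sketch of a recovery plan as ordered groups with optional follow-up actions. This is not the ASR API; the `Group` class, `run_plan` function, and VM names are all hypothetical, and the "runbook" is just a placeholder callable.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Group:
    """One ordered group in a hypothetical recovery plan."""
    name: str
    vms: List[str]
    # Hooks run after the group's VMs start; they stand in for
    # manual tasks or Azure Automation runbooks.
    post_actions: List[Callable[[], str]] = field(default_factory=list)


def run_plan(groups: List[Group]) -> List[str]:
    """Walk the groups in order and return a log of what would happen."""
    log = []
    for group in groups:
        for vm in group.vms:
            log.append(f"start {vm} ({group.name})")
        for action in group.post_actions:
            log.append(action())
    return log


# Hypothetical three-tier application: data first, web last.
plan = [
    Group("data", ["sql-dr"]),
    Group("app", ["app1", "app2"]),
    Group("web", ["web1"],
          post_actions=[lambda: "runbook: update Traffic Manager"]),
]

for line in run_plan(plan):
    print(line)
```

The same plan object can be walked for a test failover or a real one; only what "start" does would differ.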

DR for Hyper-V

You install the Microsoft Azure Recovery Services (MARS) agent on each host. That connects you to the Azure RSV and you can replicate any VM on that host. No on-prem infrastructure required. No connection broker required.

DR for VMware

You must deploy the ASR management appliance in the data centre. MS learned that the setup experience for this is complex. They had a lot of pre-reqs and configurations to install this in a Windows VM. MS will deliver this appliance as an OVF template from now on – familiar format for VMware admins, and the appliance is configured from the Azure Portal. Replicate Linux and Windows VMs to Azure, as with Hyper-V from then on.

Demo: OVF-Based ASR Management Appliance for VMware

A web portal is used to onboard the downloaded appliance:

  1. Verify the connection to Azure.
  2. Select a NIC for outbound replication.
  3. Choose a recovery services vault from your subscription.
  4. Install any required third-party software, e.g. PowerCLI or MySQL.
  5. Validate the configuration.
  6. Configure vCenter/ESXi credentials – this is never sent to Azure, it stays local. The name of the credential that you choose might appear in the Azure portal.
  7. Then you enter credentials for your Windows/Linux guest OS. This is required to install a mobility service in each VMware VM. This is because VMware doesn’t use VHD/X, it uses VMDK. Again, not sent to MS, but the name of the credential will appear in the Azure Portal when enabling VM replication so you can select the right credentials.
  8. Finalize configuration.

This will start rolling out next month in all regions.

Comprehensive DR for VMware

Hyper-V can support all Linux distros supported by Azure. On VMware they’re close to all. They’ve added Windows Server 2016, Ubuntu 14.04 and 16.04, Debian 7/8, managed disks, 4 TB disk support.

Achieve Near-Zero Application Data Loss

Tips:

  • Periodic DR testing of recovery plans – leverage Azure Automation.
  • Invoke BCP before disasters if you know it’s coming, e.g. hurricane.
  • Take the app offline before the event if it’s a planned failover – minimize risks.
  • Failover to Azure.
  • Resume the app and validate.

Achieve 5x Improvement in Downtime

Minimize downtime: https://aka.ms/asr_RTO

He shows a slide. One VM took 11 minutes to failover. Others took around/less than 2 minutes using the above guidance.

Demo: Broad OS Coverage, Azure Features, UEFI Support

He shows Ubuntu, CentOS, Windows Server, and Debian replicating from VMware to Azure. You can failover from VMware to Azure with UEFI VMs now – but you CANNOT failback. The process converts the VM to BIOS in Azure (Generation 1 VMs). OK if there’s no intention to failback, e.g. migration to Azure.

Customer Success Story – Accenture

They deployed ASR. Increased availability. 53% reduction in infrastructure cost. 3x improvement in RPO. Savings in work and personal time. Simpler solution and they developed new cloud skills.

They get a lot of alerts at the weekend when there are any network glitches. Could be 500 email alerts.

Demo: New Dashboard & Comprehensive Monitoring

Brand new RSV experience for ASR. Lots more graphical info:

  • Replication health
  • Failover test success
  • Configuration issues
  • Recovery plans
  • Error summary
  • Graphical view of the infrastructure: Azure, VMware, Hyper-V. This shows the various pieces of the solution, and a line goes red when a connection has a failure.
  • Jobs summary

All of this is on one screen.

He clicks on an error and sees the hosts that are affected. He clicks on “Needs Attention” in one of the errors. A blade opens with much more information.

We can see replication charts for a VM and disk – useful to see if VM change is too much for the bandwidth or the target storage (standard vs. premium). The disk level view might help you ID churn-heavy storage like a page file that can be excluded from replication.
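As a rough illustration of the "is the change rate too much for the bandwidth" question, the sketch below converts a daily churn figure into the average bandwidth needed to keep up. It is a back-of-envelope assumption, not an ASR tool: it uses decimal units, ignores compression and peak-hour bursts, and the function names are hypothetical.

```python
def required_mbps(churn_gb_per_day: float) -> float:
    """Average megabits per second needed to ship one day's changes in one day.

    Uses decimal units: 1 GB = 8,000 megabits.
    """
    seconds_per_day = 24 * 60 * 60
    return churn_gb_per_day * 8000 / seconds_per_day


def fits(churn_gb_per_day: float, link_mbps: float) -> bool:
    """True if the link can carry the average churn rate."""
    return required_mbps(churn_gb_per_day) <= link_mbps


print(required_mbps(108))   # average Mbps for 108 GB of change per day
print(fits(200, 10))        # can a 10 Mbps link keep up with 200 GB/day?
```

A real workload churns unevenly, so treat the average as a floor rather than a sizing answer.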

A message digest will be sent out at the end of the day. This data can be fed into OMS.

Some guest speakers come up from Rackspace and CDW. I won’t be blogging this.

Questions

  • When are things out: News on the ASR blog in October
  • The Hyper-V Planner is out this week, and new cost planners for Hyper-V and VMware are out this week.
  • Failback of managed disks is there for VMware and will be out by end of year for Hyper-V.

Microsoft Makes vSphere Look Like A Toy Once Again

Microsoft has increased the maximums once again for Hyper-V, with the upcoming release of Windows Server 2016. They’re leaving VMware not just in the dust, but somewhere so far behind that they’re over the horizon.

image

How does vSphere 6.0 stack up against the superior Hyper-V?

image

Ouch! Enjoy, vFanboys!

I can’t wait for the angry tweets!

Are VMware Workstation & Fusion Dead?

There’s lots of bad news coming out of VMware lately. The kings of enterprise virtualization (by percentage of incumbent business only) have clung to the past, anticipating the private cloud was the only way forward, and did too little/too late with public cloud. Meanwhile Amazon, Google, and Microsoft attacked on both sides; Amazon with AWS on public cloud, Google to some extent (I reckon it’s overblown) with Apps, and Microsoft on all sides with Hyper-V, WAPack, System Center, Office 365/etc, and Azure.

The first cracks have appeared with some lesser products in the VMware portfolio – VMware made redundant the entire US-based development staff of Fusion and Workstation. To keep sales going, VMware said:

VMware continues to offer and support all of our End-User Computing portfolio offerings …

I work in the channel (how software gets from manufacturer to reseller). I know that line. I know it very well. It’s what companies like Microsoft, VMware, etc say to keep sales going after a decision has been made to stop development of a product, and long before they announce that it is dead. They just want what little revenue there is to keep coming in. When you poke, you’ll be told something like “we continue to sell and support X”. You can hear the crickets and tumbleweeds roll when you ask about development and future versions.

It appears that vCloud Air, the public/private cloud program, was also hit with layoffs.

Meanwhile, you can:

  • Use the free/awful VirtualBox by Oracle.
  • Enable Client Hyper-V in the Pro editions of Windows 8, 8.1, or 10.
  • Use the free and fully functional Hyper-V Server on some “server”
  • Use trial/MSDN or Open/CSP accounts in Azure

In other news, Microsoft has launched the public preview of the Azure that you can download, with Microsoft Azure Stack.

What is it that they say about rolling stones and moss?

The Genuine Need for Disaster Recovery In Ireland/EU

How many times have you watched or read the news, seen some story about an earthquake, hurricane, typhoon, or some other disaster and thought “that will never happen here”? Stop kidding yourself; disasters can happen almost everywhere.

I’ve always considered Ireland to be relatively safe. We don’t have (anything you’d notice) earthquakes, typhoons, or tornadoes; our cattle and sheep don’t need flying licenses. Our weather is dominated by the Gulf Stream, keeping Ireland temperate. It doesn’t get hot here (we are quite northerly) and our winters consist of cloud, rain, and normally about half a day of snow. We get the tail end of some of those hurricanes that hit the east coast US, but there’s not much left by the time they reach us – some trees get knocked over, some tiles knocked off our roofs, but it’s not too bad. Even when we look at our neighbours in England, we see how their more extreme climate causes them disasters that we don’t get. Natural disasters just don’t happen here. Or do they?

The last month or so has revealed that to be a lie. Ireland has been battered by 6 storms in the past month. The latest, Storm Frank, was preceded with warnings that the country was saturated. That means that the ground has absorbed all of the water that it can; any further rainfall will not be absorbed, and it will pool, flow, and flood.

This morning, I woke to these scenes:

image

Enniscorthy, Co. Wexford [Image source: Paddy Banville]

image

Graignamanagh, Co. Kilkenny [Image source: Graignamanagh G.A.A]

image

Midleton, Co. Cork [Image source: Fiona Donnelly]

Frank isn’t finished. It’s still blowing outside my office and more rain is sure to fall. There are stories of communities being evacuated to hotels, and the above photos are just the easy ones for the media to access.

This isn’t just a case of cows trapped in fields, stick a sandbag on it and you’re sorted, or somewhere far away. This is local. And Ireland is a relatively safe place – we’re not Oklahoma, a place that some deity has decided should be subject to cat 5 tornadoes every time you’re not looking. Dorothy, the point is, that disasters happen everywhere, including in the EU where we think it safe.

Let’s bring this back to business. Businesses have been put out of action by these floods. Odds are any computers or servers were either on the ground floor or in the basement. Those machines are dead. That means those businesses are dead. They might be lucky enough to have tapes (let’s leave that for another time) stored offsite but how reliable are they and will bare-metal restore work, or will it take forever? How much money will those businesses lose, or more critically, will those businesses survive loss of customers?

This is exactly why these businesses need a disaster recovery (DR) solution. There are several reasons why they don’t have one now:

  • Fires and other unnatural disasters happen everywhere
  • They couldn’t afford one
  • The business owners didn’t think there was a need for one
  • Some resellers didn’t think there was demand for one so they never brought it up with their customers

The need is there, as we can clearly see above. And thanks to Microsoft Azure, DR has never been so affordable. FYI, it comes in at a price that is a small fraction of the cost of solutions from the likes of Irish companies such as KeepITSafe – I’ve done the competitive pricing – and it opens that customer up to more technical opportunities with hybrid cloud solutions.

Microsoft Azure Site Recovery Services (ASR) is a disaster recovery-as-a-service (DRaaS) or cloud DR site offering from Microsoft. The beauty of it is that it’s there for everyone from the small business to the large enterprise. It works with Hyper-V, vSphere or physical machines, and it works with Windows or Linux as long as the OS is supported by Azure (W2008 R2 or later on the Windows side).

Note: There is a cost overhead for vSphere or physical machines to allow for on-premises conversion and forward and in-cloud management and storage, so you need a certain scale to absorb that cost. This is why I describe ASR as being perfect for SMEs with Hyper-V and mid-large companies with Hyper-V, vSphere or physical machines.

If I had ASR in place, and I had a business on the quayside in Cork, near the Slaney in Enniscorthy, or anywhere else where the rivers were close to bursting their banks, then I would perform a planned failover, requiring about 2 minutes of my time to start a pre-engineered and tested one-click failover. My machines would shut down in the desired order, flush the last bit of replication to Azure, and start up the VMs in the desired order in Azure, and my machines and data would be safe. I can fail back to new equipment or stay in Azure if the disaster wipes out my servers. And if that disaster doesn’t happen, I can easily fail back to my existing equipment, or choose to stay in Azure and not worry about local floods again.

Microsoft News–13 July 2015

I don’t have all that much for you, but the big news is that Azure Site Recovery (ASR, Microsoft’s DR site in the cloud) now supports VMware virtual machines and physical servers, without using System Center. You do need to run some stuff on-prem and in the cloud to make it work though, so there will be a tipping point where the solution becomes affordable.

Azure

System Center

Office 365

Ignite 2015 – Protecting Your VMware and Physical Servers by Using Microsoft Azure Site Recovery

These are my notes from the recording of this session by Gaurav Daga at Microsoft Ignite 2015. In case you don’t know, I’ve become a fan of Azure Site Recovery (ASR) since it dropped SCVMM as a requirement for DR replication to the cloud. And soon it’s adding support for VMware and physical servers … that’s going to be a frakking huge market!

This technology currently is in limited preview (sign-up required). Changes will probably happen before GA.

Note: Replication of Hyper-V VMs is much simpler than all this. See my posts on Petri.com.

What is in Preview

Replication from the following to Azure:

  • vSphere with vCenter
  • ESXi
  • Physical servers

Features

  • Heterogeneous workload support (Windows and Linux)
  • Automated discovery of vSphere ESXi VMs, with or without vCenter
  • Manual discovery of physical machines (based on IP address)
  • Near zero RPOs with Continuous Data Protection (they’ll use whatever bandwidth is available)
  • Multi-VM consistency using Protection Groups. To have consistent failover of n-tier applications.

You get a cold standby site in Azure, consuming storage but not incurring charges for running VMs.

  • Connectivity over the Internet, site-site VPN or ExpressRoute
  • Secure data transfer – no need for inbound ports on the primary site
  • Recovery Plans for single-click failovers and low RTOs
  • Failback possible for vSphere, but not possible for physical machines
  • Events and email notifications for protection and recovery status monitoring

Deployment Architecture

  • An Azure subscription is required
  • A Mobility Service is downloaded and installed onto all required VMware virtual machines (not hosts) and physical servers. This captures changes (data writes in memory before they hit the VMDK) and replicates them to Azure.
  • A Process Server sits on-premises as a DR gateway. This handles compression and caching of replication traffic. It can be a VM or physical machine. If there is a replication n/w outage it will cache data until the connection comes back. Right now, the PS is not HA or load balanced. This will change.
  • A Master Target runs in your subscription as an Azure VM. The changes are being written into Azure VHDs – this is how we get VMDK to VHD … in VM memory to VHD via InMage.
  • The Config(uration) Server is a second Azure VM in your subscription. It does all of the coordination, fix-ups and alerts.
  • When you failover, VMs will appear in your subscription, attach to the VHDs, and power up, 1 cloud service per failed over recovery plan.

image

Demo

The demo environment is a SharePoint server running on vSphere (managed using vSphere Client) that will be replicated and failed over to Azure. He powers down the SP web tier and the SP website times out after a refresh in a browser. He’s using Azure Traffic Manager with 2 endpoints – one on-premises and one in the cloud.

In Azure, he launches the Recovery Plan (RP) – and uses the latest application consistent recovery point (VSS snapshot). AD starts, then SQL, app tier, web tier, and then an automation script will open an endpoint for the Traffic Manager redirection. This will take around 40 minutes end-to-end with human involvement limited to 1 click. The slowness is the time it takes for Azure to create/boot VMs which is considerably slower than Hyper-V or vSphere.

Later on in the session …

The SharePoint site is up and running thanks to the failed over Traffic Manager profile. What’s happened;

Now, back to setting this up:

First you need to create an ASR vault. Then you need to deploy a Configuration Server (the manager or coordinator running in an Azure VM). This is similar to the new VM dialogs – you pick a name, username/password, and a VNET/subnet (requires site-site n/w configuration beforehand). A VM is deployed from a standard template in the IaaS gallery (starts with Azure A3 for required performance and scale). You download a registration key and register it in your Configuration Server (CS). The CS should show up as registered. Then you need to deploy a Master Target Server. You need a Windows MTS to replicate VMs with Windows and you need a Linux MTS to replicate VMs with Linux. There are two choices: Std A4 or Standard D14 (!). And you associate the new MTS with a CS. Again, a gallery image is deployed for you.

Next you will move on-premises to deploy a Process Server. Download this from the ASR vault quick start. It is an installation on WS2012 R2.

Are you going to use a VPN or not? The default is “over the Internet” via a public IP/port (endpoint to the CS). If you select VPN then a private IP address will be used.

Now you must register a vCenter server to the Azure portal in the ASR vault. Enter the private IP, credentials and select the on-premises Process Server. All VMs on vSphere will be discovered after a few minutes.

Create a new Protection Group in the ASR vault, select your source, and configure your replication policy:

  • Multi-VM consistency: enable protection groups for n-tier application consistency.
  • RPO Threshold: Replication will use what bandwidth is made available. Alerts will be raised if any server misses this threshold.
  • Recovery Point Retention: How far back in time might you want to go during a failover? This retains more data.
  • Application consistent snapshot frequency: How often will this be done?
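The four policy settings above can be pictured as a small record plus a check for the RPO alert. This is only a sketch under assumptions: the field names, units, and sample values are hypothetical and are not taken from the ASR portal or any SDK.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical model of a Protection Group's replication policy."""
    multi_vm_consistency: bool
    rpo_threshold_minutes: int
    recovery_point_retention_hours: int
    app_consistent_snapshot_minutes: int


def rpo_alerts(policy: ReplicationPolicy,
               lag_minutes: Dict[str, float]) -> List[str]:
    """Return the servers whose replication lag exceeds the RPO threshold."""
    return sorted(name for name, lag in lag_minutes.items()
                  if lag > policy.rpo_threshold_minutes)


# Illustrative values only.
policy = ReplicationPolicy(
    multi_vm_consistency=True,
    rpo_threshold_minutes=15,
    recovery_point_retention_hours=24,
    app_consistent_snapshot_minutes=60,
)

print(rpo_alerts(policy, {"sql01": 3.0, "web01": 22.5}))
```

Note that the threshold does not throttle or guarantee anything; as the notes say, replication uses whatever bandwidth is available and the threshold only decides when an alert is raised.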

image

Now VMs can be added to the Protection Group. There is some logic for showing which VMs cannot be replicated. The mechanism is guest-based so VMs must be powered on to replicate. Powered off VMs with replication enabled will cause alerts. Select the server, select a Process Server, select a MTS, and a storage account for the replicated VHDs. You then must enter credentials to allow you to push the Mobility Service (the replication agent) to the VMs’ guest OSs. Alternatively, use a tool like SCCM to deploy the Mobility Service in advance.

Monitoring is shown in the ASR events view. You can configure e-mail notifications here.

There’s a walk through of creating a RP.

image

Prerequisites

These Azure components must be in the same region:

  • Azure VNET
  • Geo-redundant storage account
  • ASR vault
  • Standard A3 Configuration Server
  • Standard A4 or Standard D14 Master Target Servers

Source machines must comply with Azure VM requirements:

  • Disk count: maximum of 32 disks per protected source machine
  • Individual disk capacity of no more than 1023 GB
  • Clustered servers not supported
  • UEFI/EFI boot not supported
  • BitLocker encrypted volumes not supported
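The source machine limits listed above lend themselves to a quick pre-flight check before you try to protect anything. The sketch below is an assumption-laden illustration: the `SourceMachine` record and `blockers` function are hypothetical, and you would have to populate the inventory from your own tooling.

```python
from dataclasses import dataclass
from typing import List

MAX_DISKS = 32        # maximum disks per protected source machine
MAX_DISK_GB = 1023    # maximum capacity of an individual disk


@dataclass
class SourceMachine:
    """Hypothetical inventory record for one VM or physical server."""
    name: str
    disk_sizes_gb: List[int]
    clustered: bool = False
    uefi_boot: bool = False
    bitlocker: bool = False


def blockers(machine: SourceMachine) -> List[str]:
    """Return the reasons this machine cannot be protected (empty if none)."""
    problems = []
    if len(machine.disk_sizes_gb) > MAX_DISKS:
        problems.append(f"more than {MAX_DISKS} disks")
    if any(size > MAX_DISK_GB for size in machine.disk_sizes_gb):
        problems.append(f"disk larger than {MAX_DISK_GB} GB")
    if machine.clustered:
        problems.append("clustered server")
    if machine.uefi_boot:
        problems.append("UEFI/EFI boot")
    if machine.bitlocker:
        problems.append("BitLocker encrypted volume")
    return problems


print(blockers(SourceMachine("web01", [127, 500])))
print(blockers(SourceMachine("file01", [2048], uefi_boot=True)))
```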

Make sure your Azure subscription can fire up enough virtual processors for a failover – the limit is quite low by default so you will probably have to open an Azure account support call (free as a part of your subscription).

On-premises you need VMware with:

  • vCenter Server or ESXi 5.1/5.5 with latest updates
  • VMs with VMware tools installed & running
  • All vCenter Server resource names in English

The Process Server:

  • WS2012 R2 physical or virtual machine
  • Same network/subnet as the source VMs
  • VMware vSphere CLI 5.5 for automatic discovery of VMs

Network Connectivity:

  • Bandwidth – don’t you dare ask me how much! That’s what you’re paid to figure out.
  • Firewall rules to allow outbound connections to Azure
  • Upfront decision on whether you’re connecting via public Internet, site-site VPN, or ExpressRoute.

Windows Server Source (Physical or Virtual) Machine Requirements:

  • W2008 R2 SP1, WS2012, WS2012 R2
  • No dynamic disk support
  • Single or multiple NICs (Azure VM spec must support the number of NICs)

Linux Source (Physical or Virtual) Machine Requirements (64-bit only):

  • CentOS 6.4/6.5/6.6
  • Oracle Linux 6.4/6.5
  • SLES 11 SP3
  • File systems: EXT3/EXT4/ReiserFS/XFS
  • Multipath software: Device Mapper-Multipath
  • Volume Manager: LVM2
  • Physical servers with HP CCISS controller storage not supported
  • Only 1 NIC can be assigned to an Azure VNET – working to fix this.

Common Source Machines Requirements

  • They must be turned on
  • Remote Desktop or Secure Shell must be turned on if you need access after failover.
  • English naming for: machine name, system path, mount points, device names, and file/folder names

Capacity Planning

In Azure:

  • An A3 CS will protect around 750 source machine volumes – this assumes 3 volumes per source machine, which is around 250 source machines
  • MTS: They offer a Std A4 (16 data disks) and a Std D14 (32 data disks).

One disk on every MTS reserved for retention. This leaves you with:

  • Maximum protected disks per Std A4 = 15
  • Maximum protected disks per Std D14 = 31

Better to scale out with multiple A4 MTSs. This means you can replicate VMs with 40 volumes to 3 x A4 MTSs. A single source machine cannot replicate to multiple MTSs (N:1 replication only). Only use a D14 if a single source machine has more than 15 total disks. Remember: use Linux MTS for Linux source machines and Windows MTS for Windows source machines.
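The MTS sizing rules above amount to a small packing problem: whole machines go onto servers with 15 or 31 usable disk slots. Here is a minimal sketch using first-fit-decreasing placement. The packing strategy is my own assumption, not something the session prescribed, and `plan_mts` is a hypothetical helper. It also ignores the Windows/Linux split, so you would run it once per OS family.

```python
from typing import Dict, List

A4_CAPACITY = 15    # 16 data disks minus 1 reserved for retention
D14_CAPACITY = 31   # 32 data disks minus 1 reserved for retention


def plan_mts(disks_per_machine: Dict[str, int]) -> List[dict]:
    """Place whole machines onto MTS VMs, largest machines first.

    A machine is never split across servers (N:1 replication only).
    A D14 is created only when one machine needs more than an A4 can hold.
    """
    servers: List[dict] = []
    by_size = sorted(disks_per_machine.items(), key=lambda kv: -kv[1])
    for name, disks in by_size:
        if disks > D14_CAPACITY:
            raise ValueError(f"{name}: {disks} disks will not fit on one MTS")
        for server in servers:
            if server["free"] >= disks:
                server["free"] -= disks
                server["machines"].append(name)
                break
        else:
            if disks > A4_CAPACITY:
                size, capacity = "D14", D14_CAPACITY
            else:
                size, capacity = "A4", A4_CAPACITY
            servers.append({"size": size,
                            "free": capacity - disks,
                            "machines": [name]})
    return servers


# Eight machines with five volumes each: 40 volumes in total.
fleet = {f"vm{i}": 5 for i in range(8)}
for server in plan_mts(fleet):
    print(server["size"], server["machines"])
```

With this fleet the sketch lands on three A4 servers, which lines up with the 40-volume example above. The answer depends on how the volumes are spread: ten machines with four volumes each would need a fourth A4, because only three such machines fit in 15 slots.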

Storage Accounts

  • Single MTS can span multiple storage accounts – one for its OS and retention disks, one or more for replicated data disks
  • ASR replication has approximately a 2.5 IOPS multiplier on the Azure subscription. For every source IO, there are 2 IOs on the replicated data disk and 0.5 IO on the retention disk.
  • Every Azure Storage Account supports a max of 20,000 IOPS. Best practice is to have 1 SA (up to 100 in a subscription) for every 8,000-10,000 source machine IOPS – no additional cost to this because you pay for Azure Storage based on GB used (easy to predict) and transactions (hard to predict micropayment).
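The storage account guidance can be checked with simple arithmetic. Taking the per-IO breakdown in the list (2 IOs on the replicated data disk plus 0.5 on the retention disk) gives a 2.5x multiplier, and 20,000 / 2.5 = 8,000 source IOPS per account, which is the lower end of the quoted best practice. The helper below is a sketch with a hypothetical name, and it assumes you already know your aggregate source IOPS.

```python
import math

IO_MULTIPLIER = 2.0 + 0.5            # replica disk IOs + retention disk IOs
STORAGE_ACCOUNT_MAX_IOPS = 20_000    # per-account limit quoted in the session


def storage_accounts_needed(source_iops: float) -> int:
    """Storage accounts required so that no account exceeds its IOPS limit."""
    azure_iops = source_iops * IO_MULTIPLIER
    return max(1, math.ceil(azure_iops / STORAGE_ACCOUNT_MAX_IOPS))


for iops in (8_000, 8_001, 24_000):
    print(iops, "source IOPS ->", storage_accounts_needed(iops), "account(s)")
```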

On Premises Capacity Planning

This is based on your change rate:

image

Migration from VMware to Azure

Yup, you can use this tool to do it. Perform a planned failover and strip away replication and the on-premises stuff.

Reminder: Webinar on ODX for Hyper-V and VAAI for vSphere Storage Enhancement

Here’s a reminder of the webinar by StarWind that I am co-presenting with Max Kolomyeytsev. We’ll be talking about offloading storage operations to a SAN using ODX for Windows Server & Hyper-V and VAAI for vSphere. It’s a great piece of functionality and there are some things to know before using it. The session starts tomorrow at 19:00 UK/IE time, 20:00 CET, and 14:00 EST. Hopefully we’ll see you there!

Register here.

Microsoft News – 8 April 2015

There’s a lot of stuff happening now. The Windows Server vNext Preview expires on April 15th and Microsoft is promising a fix … the next preview isn’t out until May (maybe with Ignite on?). There’s rumours of Windows vNext vNext. And there’s talk of open sourcing Windows – which I would hate. Here’s the rest of what’s going on:

Hyper-V

Windows Server

Windows Client

Azure

I Am Co-Hosting A Webinar On ODX/VAAI For Optimising Storage

On April 21st at 2pm ET (USA) or 7PM UK/IE, I will be co-hosting a StarWind Software webinar with Max Kolomyeytsev. I will be talking about using ODX in a Hyper-V scenario, and Max will talk about it (VAAI) from the vSphere perspective.

image

Register here.

VMware Increasing Pricing For Partners

I was forwarded an email today from a VMware distributor that informs VMware authorised partners that their prices are going up.

No customer buys software directly from the big software vendors. Typically the path is either:

  • Manufacturer > Distributor > Reseller > Customer
  • Manufacturer > Large account reseller > Large customer

Each link in the chain (or channel) makes a small percentage. There is a “price list” at the top of the chain, but that is often discounted. Discounts are applied to large deals, and that discount can vary depending on sales targets for the product, what is included in the deal (adding more can sometimes reduce the original price), the time in the sales cycle and the size of the deal. In the case of VMware, few ever pay the prices listed on their website.

This is the email sent out to VMware authorised partners:

image

VMware are reducing those discounts, giving VMware more earnings and reducing the profitability of VMware software to partners.

Do note, that any reseller that has a business plan to make profit from licensing needs to sell A LOT of licenses. Real profits for resellers come in services, not in s/w or tin.
