Speaker: Abhishek Hemrajani, Principal Lead Program Manager, Azure Site Recovery, Microsoft
There’s a session title!
The Impact of an Outage
The aviation industry has suffered massive outages over the last couple of years, costing millions to billions. Big sites like GitHub have gone down. Only 18% of DR investors feel prepared (Forrester, July 2017, The State of Business Technology Resiliency). Much of this is due to immature core planning and very limited testing.
Causes of Significant Disasters
- Forrester says 56% of declared disasters are caused by hardware or software failures.
- 38% are because of power failures.
- Only 31% are caused by natural disasters.
- 19% are because of cyber attacks.
Sourced from the above Forrester research.
Challenges to Business Continuity
How Can Azure Help?
The hyper-scale of Azure can help.
- Reduced cost – OpEx utility computing and benefits of hyper-scale cloud.
- Reduced complexity: Service-based solution that has weight of MS development behind it to simplify it.
- Increased compliance: More certifications than anyone.
DR for Azure VMs
Something that AWS doesn’t have. Some mistakenly think that you don’t need DR in Azure, but a region can go offline and people can still make mistakes. MS does not replicate your VMs unless you enable and pay for ASR for selected VMs. ASR is highly certified for compliance, including PCI, EU Data Protection, ISO 27001, and many, many more.
- Ensure compliance: No-impact DR testing. Test every quarter or, at least, every 6 months.
- Meet RPO and RTO goals: Backup cannot do this.
- Centralized monitoring and alerting
- “Infrastructure-less” DR sites.
- Pay for what you consume.
- One-click replication
- One-click application recovery (multiple VMs)
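To make the RPO/RTO point concrete, here is a minimal sketch of how the two goals could be checked. RPO is treated as the age of the last recovery point when the outage starts, and RTO as the time until the application is back. The function name and every number are illustrative assumptions, not figures from the session.

```python
from datetime import datetime, timedelta


def meets_goals(last_recovery_point, outage_start, service_restored,
                rpo_goal, rto_goal):
    """Return True when both the data-loss and the downtime goals are met."""
    data_loss = outage_start - last_recovery_point   # compared against RPO
    downtime = service_restored - outage_start       # compared against RTO
    return data_loss <= rpo_goal and downtime <= rto_goal


# Illustrative goals: lose at most 15 minutes of data, be back within 1 hour.
rpo_goal = timedelta(minutes=15)
rto_goal = timedelta(hours=1)
outage = datetime(2017, 9, 27, 12, 0)

# A nightly backup: the last copy is 12 hours old and the restore takes 6 hours.
backup_ok = meets_goals(outage - timedelta(hours=12), outage,
                        outage + timedelta(hours=6), rpo_goal, rto_goal)

# Continuous replication: a recent recovery point and a quick failover.
replica_ok = meets_goals(outage - timedelta(minutes=5), outage,
                         outage + timedelta(minutes=20), rpo_goal, rto_goal)

print("backup meets goals:", backup_ok)
print("replication meets goals:", replica_ok)
```

With these made-up numbers the backup misses both goals while the replica meets them, which is the gap the bullet above describes.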
Demo: Typical SharePoint Application in Azure
3 tiers in availability sets:
- SQL cluster – replicated to a SQL VM in a target region or DR site (async)
- App – replicated by ASR – nothing running in DR site
- Web – replicated by ASR – nothing running in DR site
- Availability sets – built for you by ASR
- Load balancers – built for you by ASR
- Public IP & DNS – abstract DNS using Traffic Manager
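The DNS abstraction in the last bullet can be pictured as priority-based endpoint selection: clients are sent to the preferred endpoint while it is healthy, and to the DR endpoint otherwise. The sketch below only models that idea. It does not use the Traffic Manager API, and the hostnames and field names are hypothetical.

```python
from typing import Dict, List, Optional


def pick_endpoint(endpoints: List[Dict]) -> Optional[str]:
    """Return the healthy endpoint with the lowest priority number, if any."""
    healthy = [e for e in endpoints if e["healthy"]]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e["priority"])["host"]


# Hypothetical hostnames; priority 1 is preferred while it is healthy.
endpoints = [
    {"host": "primary.example.com", "priority": 1, "healthy": True},
    {"host": "dr.example.com", "priority": 2, "healthy": False},
]
before = pick_endpoint(endpoints)

# Simulate a regional outage followed by a failover to the DR site.
endpoints[0]["healthy"] = False
endpoints[1]["healthy"] = True
after = pick_endpoint(endpoints)

print(before, "->", after)
```

Because users only ever see the abstracted name, nothing on the client side has to change when the application moves to the DR site.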
One-Click Replication is new and was announced this week. Disaster Recovery (Preview) is an option in the VM settings. All the prerequisites of the VM are presented in a GUI. You click Enable Replication, all the bits are built, and the VM is replicated. You can pick any region in a “geo-cluster”, rather than being restricted to the paired region.
For more than one VM, you might enable replication in the recovery services vault (RSV) and multi-select the VMs for configuration. The replication policy includes recovery point retention and app-consistent snapshots.
New: Multi-VM consistency groups. In preview now with up to 8 VMs; 16 at GA. VMs in a group take their application-consistent snapshots at the same time. No other public cloud offers this.
Orchestrate failover. VMs can be grouped, and groups are failed over in order. You can also require manual tasks to be done, and execute Azure Automation runbooks to do other things like creating load balancer NAT rules, re-configuring DNS abstraction in Traffic Manager, etc. You run the recovery plan to fail over, and also to do test failovers.
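The ordering described above can be sketched as a small data model: a plan is a list of groups, each group is a list of steps, and everything runs strictly in sequence. This is only an illustration of the concept. The class names, step names, and the `noop` placeholder are assumptions, and nothing here calls ASR or Azure Automation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One action in a group: a VM failover, a manual task, or a script."""
    name: str
    action: Callable[[], None]


@dataclass
class Group:
    """An ordered group of steps; groups run strictly one after another."""
    name: str
    steps: List[Step] = field(default_factory=list)


def run_plan(groups: List[Group], test: bool = False) -> List[str]:
    """Run every group in order and return a log of what was executed."""
    mode = "test failover" if test else "failover"
    log = []
    for group in groups:
        for step in group.steps:
            step.action()
            log.append(f"{mode}: {group.name} / {step.name}")
    return log


def noop() -> None:
    """Placeholder for real work such as starting a VM or calling a runbook."""


# Hypothetical three-tier plan: data first, then app, then web plus a DNS step.
plan = [
    Group("data", [Step("fail over SQL", noop)]),
    Group("app", [Step("fail over app VMs", noop)]),
    Group("web", [Step("fail over web VMs", noop),
                  Step("runbook: update DNS", noop)]),
]

for line in run_plan(plan, test=True):
    print(line)
```

The same plan object serves both a real failover and a test failover, which mirrors the point that one recovery plan is used for both.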
DR for Hyper-V
You install the Microsoft Azure Recovery Services (MARS) agent on each host. That connects you to the Azure RSV and you can replicate any VM on that host. No on-prem infrastructure required. No connection broker required.
DR for VMware
You must deploy the ASR management appliance in the data centre. MS learned that the setup experience for this is complex. They had a lot of pre-reqs and configurations to install this in a Windows VM. MS will deliver this appliance as an OVF template from now on – familiar format for VMware admins, and the appliance is configured from the Azure Portal. Replicate Linux and Windows VMs to Azure, as with Hyper-V from then on.
Demo: OVF-Based ASR Management Appliance for VMware
A web portal is used to onboard the downloaded appliance:
- Verify the connection to Azure.
- Select a NIC for outbound replication.
- Choose a recovery services vault from your subscription.
- Install any required third-party software, e.g. PowerCLI or MySQL.
- Validate the configuration.
- Configure vCenter/ESXi credentials – this is never sent to Azure, it stays local. The name of the credential that you choose might appear in the Azure portal.
- Then you enter credentials for your Windows/Linux guest OS. This is required to install a mobility service in each VMware VM. This is because VMware doesn’t use VHD/X, it uses VMDK. Again, not sent to MS, but the name of the credential will appear in the Azure Portal when enabling VM replication so you can select the right credentials.
- Finalize configuration.
This will start rolling out next month in all regions.
Comprehensive DR for VMware
Hyper-V can support all Linux distros supported by Azure. On VMware they’re close to all. They’ve added Windows Server 2016, Ubuntu 14.04 and 16.04, Debian 7/8, managed disks, and 4 TB disk support.
Achieve Near-Zero Application Data Loss
- Periodic DR testing of recovery plans – leverage Azure Automation.
- Invoke BCP before disasters if you know it’s coming, e.g. hurricane.
- Take the app offline before the event if it’s a planned failover – minimize risks.
- Failover to Azure.
- Resume the app and validate.
Achieve 5x Improvement in Downtime
Minimize downtime: https://aka.ms/asr_RTO
He shows a slide. One VM took 11 minutes to fail over. Others took around, or less than, 2 minutes using the above guidance.
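As a quick check on the "5x" in the heading, the two figures from the slide can be compared directly. Taking exactly 2 minutes for the tuned case is an assumption, since the notes only say "around" or "less than" 2 minutes.

```python
baseline_minutes = 11   # the slow failover shown on the slide
tuned_minutes = 2       # assumed value for "around/less than 2 minutes"

improvement = baseline_minutes / tuned_minutes
print(f"{improvement:.1f}x faster")  # prints "5.5x faster"
```

Any tuned time below 2 minutes would push the ratio higher, so roughly 5x is a conservative reading of the slide.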
Demo: Broad OS Coverage, Azure Features, UEFI Support
He shows Ubuntu, CentOS, Windows Server, and Debian replicating from VMware to Azure. You can fail over from VMware to Azure with UEFI VMs now, but you CANNOT fail back. The process converts the VM to BIOS in Azure (Generation 1 VMs). OK if there’s no intention to fail back, e.g. migration to Azure.
Customer Success Story – Accenture
They deployed ASR. Increased availability. 53% reduction in infrastructure cost. 3x improvement in RPO. Savings in work and personal time. Simpler solution and they developed new cloud skills.
They get a lot of alerts at the weekend when there are any network glitches. Could be 500 email alerts.
Demo: New Dashboard & Comprehensive Monitoring
Brand new RSV experience for ASR. Lots more graphical info:
- Replication health
- Failover test success
- Configuration issues
- Recovery plans
- Error summary
- Graphical view of the infrastructure: Azure, VMware, Hyper-V. This shows the various pieces of the solution, and a line goes red when a connection has a failure.
- Jobs summary
All of this is on one screen.
He clicks on an error and sees the hosts that are affected. He clicks on “Needs Attention” in one of the errors. A blade opens with much more information.
We can see replication charts for a VM and disk – useful to see if the VM’s rate of change is too much for the bandwidth or the target storage (Standard vs. Premium). The disk-level view might help you identify churn-heavy storage, like a page file, that can be excluded from replication.
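The reasoning behind those charts can be sketched as a simple comparison of data churn against available bandwidth. The disk names, churn rates, and the 50 Mbit/s link below are all made-up values for illustration, and the check ignores compression and protocol overhead.

```python
def replication_keeps_up(churn_mb_per_s: float, bandwidth_mbit_per_s: float) -> bool:
    """Return True when the link can carry the data change rate."""
    required_mbit_per_s = churn_mb_per_s * 8  # megabytes to megabits
    return required_mbit_per_s <= bandwidth_mbit_per_s


# Hypothetical per-disk churn in MB/s, as a disk-level chart might show it.
disks = {"os": 0.5, "data": 3.0, "pagefile": 6.0}
bandwidth = 50.0  # assumed uplink in Mbit/s

all_disks_ok = replication_keeps_up(sum(disks.values()), bandwidth)

# Exclude the churn-heavy page file disk and check again.
without_pagefile = sum(v for k, v in disks.items() if k != "pagefile")
excluded_ok = replication_keeps_up(without_pagefile, bandwidth)

print("all disks:", all_disks_ok)
print("page file excluded:", excluded_ok)
```

With these numbers the VM only fits the link once the page file disk is excluded, which is the kind of decision the disk-level view is meant to support.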
A message digest will be sent out at the end of the day. This data can be fed into OMS.
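The notes do not say how the digest is assembled, so the following is only an assumed illustration of the idea: collapse many repeated alerts into one line per VM and error, with a count. The VM names and error strings are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def build_digest(alerts: List[Tuple[str, str]]) -> List[str]:
    """Collapse repeated (vm, error) alerts into one counted line each."""
    counts = Counter(alerts)
    return [f"{vm}: {error} x{n}" for (vm, error), n in counts.most_common()]


# Hypothetical weekend alerts caused by a network glitch.
alerts = [("web01", "replication lag")] * 3 + [("sql01", "connection lost")]

digest = build_digest(alerts)
for line in digest:
    print(line)
```

Four raw alerts become two digest lines here; at the scale of hundreds of emails the reduction is what makes a daily summary readable.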
Some guest speakers come up from Rackspace and CDW. I won’t be blogging this.
- When are things out: News on the ASR blog in October
- The Hyper-V Planner is out this week, and new cost planners for Hyper-V and VMware are out this week.
- Failback of managed disks is there for VMware and will be out by end of year for Hyper-V.