Backup Versus Resiliency Versus Disaster Recovery

Most of us are no strangers to the backup versus disaster recovery conversation. Each is a different problem, typically (but not always) with different business expectations. Lately, resiliency has crawled into the mix, and a lot of social media commentary isn’t helping. In this post, I’m going to explain how I define backup, resiliency, and disaster recovery, and discuss how they impact my Azure designs for service & data availability.

Essential Terminology

There are two essential terms that we have to understand to discuss these problems/solutions:

  • RPO: The recovery point objective is how much data, measured in time, is lost when our solution kicks in.
  • RTO: The recovery time objective is how long, measured in time, services are offline while the solution kicks in.

Backup/Restore

A backup is when we take a copy of our data and (ideally) store that copy elsewhere, and even in several places. The concept is that we can restore our data from a backup if the original data (files, database, VM files, etc.) is deleted either accidentally or deliberately.

The base product for backup in Azure is Azure Backup, which supports:

  • Azure VMs
  • Managed disks
  • Azure Files
  • SQL Server in Azure VMs
  • SAP HANA databases in Azure VMs
  • Azure Database for PostgreSQL servers
  • Azure Blobs
  • Azure Database for PostgreSQL Flexible server
  • Azure Kubernetes Service
  • Azure Database for MySQL – Flexible Server
  • SAP ASE (Sybase) database on Azure VMs
  • Azure Data Lake Storage
  • Azure Elastic SAN

Quite honestly, that list is much longer than the last time I searched for it! Azure Backup covers a lot, but it doesn’t cover everything. Some solutions, like Azure SQL, feature their own backup solution.

It’s not unusual for people to bring another backup tool to Azure. The one I hear most about is Veeam Backup for Microsoft Azure. While I’ve never used Veeam hands-on, its reputation is excellent, and it has the unique ability to be platform agnostic. Want to restore VMs from Azure to Hyper-V, Nutanix, or VMware if you’re that way inclined ;)? You can with Veeam.

  • RPO: Backup features the longest RPO here. The data loss is dependent on how often your backup jobs run. Daily backups? You can lose up to 24 hours of data. Backups every 15 minutes? You might lose up to 15 minutes of data.
  • RTO: This is where the pain can be; the RTO is how long it takes to copy your data from the backup storage to the production storage. Restoring an Azure Backup snapshot recovery point is a disk-to-disk copy. Restoring a 10 TB VM from blob storage over the network is going to be a long wait.
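To put rough numbers on those two bullets, here is a minimal sketch. The function names and the 200 MB/s throughput figure are my own assumptions for illustration; they are not Azure Backup measurements, so plug in whatever you observe in your own restores.

```python
def worst_case_rpo_minutes(backup_interval_minutes: float) -> float:
    """Worst case: the failure happens just before the next backup job runs."""
    return backup_interval_minutes


def restore_hours(data_tb: float, throughput_mb_per_s: float) -> float:
    """Estimate the copy time of a restore, ignoring job start-up and validation."""
    data_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return data_mb / throughput_mb_per_s / 3600


if __name__ == "__main__":
    # Daily backups versus backups every 15 minutes.
    print("Daily backup, worst-case RPO (minutes):", worst_case_rpo_minutes(24 * 60))
    print("15-minute backup, worst-case RPO (minutes):", worst_case_rpo_minutes(15))
    # 10 TB restored at an assumed sustained 200 MB/s.
    print("10 TB restore (hours):", round(restore_hours(10, 200), 1))
```

Even with an optimistic sustained throughput, the restore of a large VM is measured in hours, which is why the RTO of backup deserves as much attention as the RPO.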

Disaster Recovery (DR)

The purpose of DR is to recover from a disaster. Let’s define what a disaster could be using real examples:

  • Hurricane Katrina was a natural disaster that wiped out huge areas of the USA in 2005.
  • The “black summer” bushfires in Australia destroyed millions of hectares of land in 2019-2020.
  • The Indian Ocean Tsunami in 2004 caused devastation in the coastal areas of many countries.
  • Keeping it local for me: post-storm winter floods have caused widespread damage throughout Ireland in the last few years.
  • Three AWS data centres were hit by drone attacks in the UAE & Bahrain in March of this year.

Disasters can be natural or they can be man-made. Disasters rarely target 1 building; they wipe out an area. They are rare – but they happen. There is another kind of disaster, which few think about:

  • KNP Logistics Group, a 125-year-old UK transport firm with over 700 employees, was put out of business because of a ransomware attack in 2025.

Pending (and passed in some countries) EU regulations (NIS2) consider this a disaster that subject organisations must be prepared for.

For cloud planning, if we need to prepare for disaster recovery, then we must plan for the loss of the Azure region by replicating services/data to another region – typically the paired region. There is no one solution, and there are plenty of complicating factors. Techs that will be in scope include:

  • Azure Site Recovery (ASR) for Azure VMs
  • Geo-redundant storage (GRS) and the various geo-variants
  • PaaS resources that include GRS
  • Database replication
  • DevOps pipelines/workflows to redeploy resources (but not data)

There is a fun grey area here. Veeam is not only a backup solution; it is also a DR solution! You will also find that some people use backup as a budget DR solution – they replicate data from the primary location to the secondary location (Azure Backup Geo-Redundant). The right solution for your organisation is often based on business requirements and budget, with budget being the big elephant in the room.

  • RPO: DR replication is typically based on asynchronous replication. The RPO is often measured in seconds/minutes.
  • RTO: The RTO really is dependent on the complexity of services, the quantity of services to restore, the interdependencies, and how automated the process is once it starts. The RTO should be measured in hours, but a backup solution might be measured in days/weeks.

Resilience

The purpose of resilience is to enable a service to survive a localised issue, such as:

  • A VM crashes.
  • Microsoft are patching an App Service compute instance.
  • An Azure host is getting a firmware update.
  • Microsoft had a networking issue in a single data centre building.

We use resilience to keep the service operational with no perceivable outage to the service consumer. There are many ways to tackle resilience, but they are all based on scaling out:

  • Availability Zones: Most Azure regions have multiple data centre buildings. The buildings are split into what we see as 3 Availability Zones. Each Availability Zone has independent external network connections, power, and cooling. The theory is that if I spread the tier of a service across 3 zones, then that tier can survive 2 zones going offline. Some PaaS services, like Bastion, default to using zones; some have to be opted in. Beware of some PaaS resources, like App Service Environment, that have minimum consumption requirements to be placed across Availability Zones.
  • Availability Sets: If we cannot use Availability Zones (more later on this), then we can place virtual machines in Availability Sets. We can think of Availability Sets as a form of anti-affinity; machines in the same set are placed into different update domains (Azure platform updates) and fault domains (racks) in the same room in the same data centre. Microsoft does this for multi-instance PaaS services that are not using Availability Zones.
  • Zone Redundant Storage (ZRS): Azure storage is based on the concept of storing each block 3 times. ZRS places the replica blocks across 3 different Availability Zones. Your data remains operational even if 2 of the data centres are lost.
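The "spread a tier across 3 zones" idea can be sketched in a few lines. This is only a toy model: the zone labels, instance names, and function names are hypothetical, and it says nothing about whether capacity is actually available in each zone.

```python
from itertools import cycle

ZONES = ("1", "2", "3")


def spread_across_zones(instance_names, zones=ZONES):
    """Assign instances to zones round-robin so the tier is scaled out."""
    return {name: zone for name, zone in zip(instance_names, cycle(zones))}


def survives(placement, failed_zones):
    """A tier survives if at least one instance sits in a zone that has not failed."""
    return any(zone not in failed_zones for zone in placement.values())


if __name__ == "__main__":
    three = spread_across_zones(["web-0", "web-1", "web-2"])
    two = spread_across_zones(["web-0", "web-1"])
    print("3 instances, zones 1 and 2 fail:", survives(three, {"1", "2"}))
    print("2 instances, zones 1 and 2 fail:", survives(two, {"1", "2"}))
```

The second case is the reminder: the tier only survives two zones going offline if there really is an instance in every zone.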

There are many architectural considerations to handle when you start resiliency planning.

The old pain-in-the-a** is the legacy line-of-business app that supports just a single VM. There is no scaling out to gain resiliency. Traditionally, VMs used LRS (locally redundant storage) managed disks. LRS managed disks are stored in a single data centre with the VM. There have been issues in the past where storage in a single room has gone offline, taking all three LRS replicas of the disks’ blocks offline. You can choose to use ZRS managed disks. The VM will continue to primarily use the local replica, but two replicas are stored in other Availability Zones in the same region. If the primary storage cluster goes offline, you can perform a manual process to get the VM back online with another replica.

  • RPO: Depending on the architecture and technologies, there is either a zero-RPO (active/active services) or an RPO of a few seconds (replicated storage).
  • RTO: In most cases, the RTO is 0. The one exception that I can think of is the single VM with a ZRS managed disk, where the RTO is how long it takes you to force-detach the disk and create a new VM with the existing disks in another Availability Zone.

By the way, there are whole areas on networking resiliency that I could type about for hours too!

Confusion

As I have alluded to, I’ve seen some discussions on LinkedIn recently stating that Availability Zones can be used for disaster recovery. They could. Can they? Should they?

What is the disaster that you are planning for? If it’s any of the above natural disasters then I would argue that spreading your services/data across data centres located beside each other is going to lead to a sudden career-ending meeting.

Don’t give me the “Availability Zones are spread apart from each other” line. Suuuuure they are – except any of the ones that I’ve located on Google Maps, such as North Europe, West Europe, or East US to begin with.

Now, let’s get on with the practical realities of following the concept of using Availability Zones for DR. When was the last time you tried to deploy Azure VMs across Availability Zones? What about a firewall? Or App Services? Did you get an “Internal Service Error”, a weird quota error, or at least some helpful message to inform you that there was no capacity in “zone 2”? That’s been my experience for the last 14+ months in any regions that I’ve worked in. So, don’t recommend that I use a technology for emergency DR if I cannot even use it for operational resiliency!

Yes, I know that capacity issues also impact inter-region DR designs. If West Europe were to be flooded, you can be all but sure that you are not getting into North Europe thanks to the instant massive demand from many customers. I know that’s an unlikely scenario – but it’s one that some organisations must plan for. For example, I had a central government customer ask me about Azure region choice. The country in question has an “aggressive” neighbour to the east that likes to wage war on its neighbours. The local Microsoft office asked them to move into the new local Azure region soon after Ukraine was invaded. I asked the customer: “Where would Ukraine be now if all of its IT services were based in a local Azure region under 300 KM from Russia?” I’d extend that with a follow-up question now: “What if you used Availability Zones in that single region for DR?” Yes, the scenario is real – see above. Or consider if a hurricane reached Boydton in Virginia, USA, or a bushfire ran rampant in New South Wales/Victoria, Australia.

Before you go planning, please:

  • Understand the risks you are planning for
  • Have a budget
  • Understand the technologies
  • Comprehend how or if the technologies counter the risks
  • Check whether the technologies are available to you at all!

Microsoft Ignite 2018: Implement Cloud Backup & Disaster Recovery At Scale in Azure

Speakers: Trinadh Kotturu, Senthuran Sivananthan, & Rochak Mittal

Site Recovery At Scale

Senthuran Sivananthan

Real Solutions for Real Problems

Customer example: Finastra.

  1. BCP process: Define RPO/RTO. Document DR failover triggers and approvals.
  2. Access control: Assign clear roles and ownership. Leverage ASR built-in roles for RBAC. Different RS vault for different BU/tenants. They deployed 1 RSV per app to do this.
  3. Plan your DR site: Leveraged region pairs – useful for matching GRS replication of storage. Site connectivity needs to be planned. Pick the primary/secondary regions to align service availability and quota availability – change the quotas now, not later when you invoke the BCP.
  4. Monitor: Monitor replication health. Track configuration changes in environment – might affect recovery plans or require replication changes.
  5. DR drills: Periodically do test failovers.
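Step 3 says to sort out quotas before you invoke the BCP. A minimal sketch of that check is below; the VM family names and the numbers are hypothetical, and in practice you would feed it from whatever quota/usage report you already pull for the secondary region.

```python
def quota_shortfalls(required_vcpus, target_quota, target_usage):
    """Return the VM families where free quota in the DR region is below what a failover needs.

    All three arguments are dicts keyed by VM family name (hypothetical labels here).
    The result maps each short family to the number of extra vCPUs to request.
    """
    shortfalls = {}
    for family, needed in required_vcpus.items():
        free = target_quota.get(family, 0) - target_usage.get(family, 0)
        if free < needed:
            shortfalls[family] = needed - free
    return shortfalls


if __name__ == "__main__":
    needed_on_failover = {"Dsv5": 64, "Esv5": 32}   # what the protected VMs require
    quota_in_dr_region = {"Dsv5": 100, "Esv5": 20}  # current quota limits
    used_in_dr_region = {"Dsv5": 50}                # already consumed there
    print(quota_shortfalls(needed_on_failover, quota_in_dr_region, used_in_dr_region))
```

Anything that comes back in the result is a quota increase to request now, not during the failover.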

Journey to Scale

  • Automation: Do things at scale
  • Azure Policy: Ensure protection
  • Reporting: Holistic view and application breakdown
  • Pre- & Post- Scripts: Lower RTO as much as possible and eliminate human error

Demos – ASR

Rochak for demos of recent features. Azure Policies coming soon.

Will assess if VMs are being replicated or not and display non-compliance.

Expanding the monitoring solution.

Demo – Azure Backup & Azure Policy

Trinadh creates an Azure Policy and assigns it to a subscription. He picks the Azure Backup policy definition. He selects a resource group of the vault, selects the vault, and selects the backup policy from the vault. The result is that any VM within the scope of the policy will automatically be backed up to the selected RSV with the selected policy.
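The effect of that assignment can be modelled very simply: every VM inside the policy scope should end up protected by the selected vault. The sketch below is only a model of that logic with made-up resource IDs and a made-up vault name; it does not call Azure Policy or Azure Backup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VM:
    resource_id: str
    protected_by_vault: Optional[str] = None  # None means the VM is not backed up


def non_compliant(vms, scope):
    """List VMs that are inside the assignment scope but not protected by any vault."""
    scope = scope.lower()  # resource IDs are compared case-insensitively
    return [
        vm.resource_id
        for vm in vms
        if vm.resource_id.lower().startswith(scope) and vm.protected_by_vault is None
    ]


if __name__ == "__main__":
    vms = [
        VM("/subscriptions/sub-a/resourceGroups/app1/vm-01", "rsv-app1"),
        VM("/subscriptions/sub-a/resourceGroups/app1/vm-02"),
        VM("/subscriptions/sub-b/resourceGroups/app9/vm-03"),
    ]
    print(non_compliant(vms, "/subscriptions/sub-a"))
```

Only vm-02 is reported: vm-01 is already protected and vm-03 is outside the assigned subscription.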

Azure Backup & Security

Supports Azure Disk Encryption. KEK and BEK are backed up automatically.

AES 256 protects the backup blobs.

Compliance

  • HIPAA
  • ISO
  • CSA
  • GDPR
  • PCI-DSS
  • Many more

Built-in Roles

Cumulative:

  • Backup Reader: See only
  • Backup Operator: Enable backup & restore
  • Backup Contributor: Policy management and Delete-Stop Backup
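"Cumulative" means each role includes everything the role before it can do. A small sketch of that idea follows; the permission labels are my own shorthand for the notes above and are not real Azure RBAC action strings.

```python
# Hypothetical permission labels, built up cumulatively role by role.
READER = frozenset({"view"})
OPERATOR = READER | {"enable_backup", "restore"}
CONTRIBUTOR = OPERATOR | {"manage_policy", "stop_backup_and_delete"}

ROLES = {
    "Backup Reader": READER,
    "Backup Operator": OPERATOR,
    "Backup Contributor": CONTRIBUTOR,
}


def can(role: str, action: str) -> bool:
    """Check whether a role's cumulative permission set includes the action."""
    return action in ROLES[role]


if __name__ == "__main__":
    for role in ROLES:
        print(role, "restore:", can(role, "restore"), "delete:", can(role, "stop_backup_and_delete"))
```

The point of the model is that only the top role can stop backup and delete data, which is why that role is the one to guard most carefully.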

Protect the Roles

PIM can be used to guard the roles – protect against rogue admins.

  • JIT access
  • MFA
  • Multi-user approval

Data Security

  • PIN protection for critical actions, e.g. delete
  • Alert: Notification on critical actions
  • Recovery: Data kept for 14 days after delete. Working on blob soft delete

Backup Center Demo

Being built at the moment. Starting with VMs now but will include all backup items eventually.

All RSVs in the tenant (doh!) managed in a central place.

Aimed at the large enterprise.

They also have Log Analytics monitoring if you like that sort of thing. I’m not a fan of LA – I much prefer Azure Monitor.

Reporting using Power BI

Trinadh demos a Power BI reporting solution that unifies backup data from multiple tenants into a single report.

Cloud Mechanix – “Starting Azure Infrastructure” Training Coming To Frankfurt, Germany

I have great news. Today I got confirmation that our venue for the next Cloud Mechanix class has been confirmed. So on December 3-4, I will be teaching my Cloud Mechanix “Starting Azure Infrastructure” class in Frankfurt, Germany. Registration Link.

Buy Ticket

About The Event

This HANDS-ON theory + practical course is intended for IT professionals and developers that wish to start working with or improve their knowledge of Azure virtual machines. The course starts at the very beginning, explaining what Azure is (and isn’t), administrative concepts, and then works through the fundamentals of virtual machines before looking at more advanced topics such as security, high availability, storage engineering, backup, disaster recovery, management/alerting, and automation.

Aidan has been teaching and assisting Microsoft partners in Ireland about Microsoft Azure since 2014. Over this time he has learned what customers are doing in Azure, and how they best get results. Combined with his own learning, and membership of the Microsoft Most Valuable Professional (MVP) program for Microsoft Azure, Aidan has a great deal of knowledge to share.

We deliberately keep the class small (maximum of 20) to allow for a more intimate environment where attendees can feel free to interact and ask questions.

Agenda

This course spans two days, running on December 3-4, 2018. The agenda is below.

Day 1 (09:30 – 17:00):

  • Introducing Azure
  • Tenants & subscriptions
  • Azure administration
  • Admin tools
  • Intro to IaaS
  • Storage
  • Networking basics

Day 2 (09:30 – 17:00):

  • Virtual machines
  • Advanced networking
  • Backup
  • Disaster recovery
  • JSON
  • Diagnostics
  • Monitoring & alerting
  • Security Center

The Venue

The location is the Novotel Frankfurt City. This hotel:

  • Has very fast Wi-Fi – an essential requirement for hands-on cloud training!
  • Has reasonably priced accommodation.
  • Has car parking – which we are paying for.
  • Is near the Messe (conference centre) and is beside the Kuhwaldstraße tram station and the Frankfurt Main West train station and S-Bahn.
  • Is just a 25 minute walk or 5 minutes taxi from the Hauptbahnhof (central train station).
  • Was only 15-20 minutes by taxi to/from Frankfurt Airport when we visited the hotel to scout the location.

Costs

The regular cost for this course is €999 per person. If you are registering more than one person, then the regular price will be €849 per person. A limited number of early bird tickets are on sale for €659 each.

You can pay for the course by credit card (handled securely by Stripe) or PayPal on the official event site. You can also pay by invoice/bank transfer by emailing contact@cloudmechanix.com. Payment must be received within 21 days of registration – please allow 14 days for an international (to Ireland) bank transfer. We require the following information for invoice & bank transfer payment:

  • The name and contact details (email and phone) for the person attending the course.
  • Name & address of the company paying the course fee.
  • A purchase order (PO) number, if your company requires this for services & purchases.

The cost includes tea/coffee and lunch. Please inform us in advance if you have any dietary requirements.

Note: Cloud Mechanix is a registered education-only company in the Republic of Ireland and does not charge for or pay for VAT/sales tax.

See the event page for Terms and Conditions.

Windows Server 2019 Announced for H2 2018

Last night, Microsoft announced that Windows Server 2019 would be released, generally available, in the second half of 2018. I suspect that the big bash will be Ignite in Orlando at the end of September, possibly with a release that week, but maybe in October – that’s been the pattern lately.

LTSC

Microsoft is referring to WS2019 as a “long term servicing channel release”. When Microsoft started the semi-annual channel, a Server Core build of Windows Server released every 6 months to Software Assurance customers that opt into the program, they promised that the normal builds would continue every 3 years. These LTSC releases would be approximately the sum of the previous semi-annual channel releases plus whatever new stuff they cooked up before the launch.

First, let’s kill some myths that I know are being spread by “someone I know that’s connected to Microsoft” … it’s always “someone I know” that is “connected to Microsoft” and it’s always BS:

  • The GUI is not dead. The semi-annual channel release is Server Core, but Nano is containers only since last year, and the GUI is an essential element of the LTSC.
  • This is not the last LTSC release. Microsoft views (and recommends) LTSC for non-cloud-optimised application workloads such as SQL Server.
  • No – Windows Server is not dead. Yes, Azure plays a huge role in the future, but Azure Stack and Azure are both powered by Windows, and hundreds of thousands, if not millions, of companies still are powered by Windows Server.

Let’s talk features now …

I’m not sure what’s NDA and what is not, so I’m going to stick with what Microsoft has publicly discussed. Sorry!

Project Honolulu

For those of you who don’t keep up with the tech news (that’s most IT people), Project Honolulu is a huge effort by MS to replace the Remote Server Administration Tools (RSAT) that you might know as “Administrative Tools” on Windows Server or on an admin PC. These ancient tools were built on MMC.EXE, which was deprecated with the release of W2008!

Honolulu is a whole new toolset built on HTML5 for today and the future. It’s not finished – being built with cloud practices, it never will be – but it’s getting there!

Hybrid Scenarios

Don’t share this secret with anyone … Microsoft wants more people to use Azure. Shh!

Some of the features we (at work) see people adopt first in the cloud are the hybrid services, such as Azure Backup (cloud or hybrid cloud backup), Azure Site Recovery (disaster recovery), and soon I think Azure File Sync (seamless tiered storage for file servers) will be a hot item. Microsoft wants it to be easier for customers to use these services, so they will be baked into Project Honolulu. I think that’s a good idea, but I hope it’s not a repeat of what was done with WS2016 Essentials.

ASR needs more than just “replicate me to the cloud” enabled on the server; that’s the easy part of the deployment that I teach in the first couple of hours in a 2-day ASR class. The real magic is building a DR site, knowing what can be replicated and what cannot (see domain controllers & USN rollback, clustered/replicating databases & getting fired), orchestration, automation, and how to access things after a failover.

Backup is pretty easy, especially if it’s just MARS. I’d like MARS to add backup-to-local storage so it could completely replace Windows Server Backup. For companies with Hyper-V, there’s more to be done with Azure Backup Server (MABS) than just download an installer.

Azure File Sync also requires some thought and planning, but if they can come up with some magic, I’m all for it!

Security

In Hyper-V:

  • Linux will be supported with Shielded VMs.
  • VMConnect support is being added to Shielded VMs for support reasons – it’s hard to fix a VM if you cannot log into it via “console” access.
  • Encrypted Network Segments can be turned on with a “flip of a switch” for secure comms – that could be interesting in Azure!

Windows Defender ATP (Advanced Threat Protection) is a Windows 10 Enterprise feature that’s coming to WS2019 to help stop zero-day threats.

DevOps

The big bet on Containers continues:

  • The Server Core base image will be reduced from 5GB by (they hope) 72% to speed up deployment time of new instances/apps.
  • Kubernetes orchestration will be natively supported – the container orchestrator that originated at Google appears to be the industry winner versus Docker and Mesos.
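For a sense of scale on the base image reduction, the arithmetic is sketched below. The function name is mine, and the 72% figure is the hoped-for target quoted above, not a measured result.

```python
def reduced_size_gb(original_gb: float, reduction_percent: float) -> float:
    """Size that remains after shrinking an image by the given percentage."""
    return original_gb * (100 - reduction_percent) / 100


if __name__ == "__main__":
    # A 5 GB Server Core base image reduced by the hoped-for 72%.
    print(round(reduced_size_gb(5, 72), 2), "GB")
```

If the target is hit, each new instance pulls roughly a quarter of the data that it does today, which is where the faster deployment time comes from.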

In the heterogeneous world, Linux admins will be getting Windows Subsystem for Linux (WSL) for a unified scripting/admin experience.

Hyper-Converged Infrastructure (HCI)

Storage Spaces Direct (S2D) has been improved and more changes will be coming to mature the platform in WS2019. In case you don’t know, S2D is a way to use local (internal) disks in 2+ (preferably 4+) Hyper-V hosts across a high speed network (virtual SAS bus) to create a single cluster with fault tolerance at the storage and server levels. By using internal disks, they can use cheaper SATA disks, as well as new flash formats that don’t natively support sharing, such as NVMe.

The platform is maturing in WS2019, and Project Honolulu will add a new day-to-day management UI for S2D that is natively lacking in WS2016.

The Pricing

As usual, I will not be answering any licensing/pricing questions. Talk to the people you pay to answer those questions, i.e. the reseller or distributor that you buy from.

OK; let’s get to the messy stuff. Nothing has been announced other than:

It is highly likely we will increase pricing for Windows Server Client Access Licensing (CAL). We will provide more details when available.

So it appears that User CALs will increase in pricing. That is probably good news for anyone licensing Windows Server via processor (don’t confuse this with Core licensing).

When you acquire Windows Server through volume licensing, you pay for every pair of cores in a server (with a minimum of 16, which matched the pricing of WS2012 R2), PLUS you buy User CALs for every user authenticating against the server(s).

When you acquire Windows Server via Azure or through a hosting/leasing (SPLA) program, you pay for Windows Server based only on how many cores that the machine has. For example, when I run an Azure virtual machine with Windows Server, the per-minute cost of the VM includes the cost of Windows Server, and I do not need any Windows Server CALs to use it (RDS is a different matter).

If CALs are going up in price, then it’s probably good news for SPLA (hosting/leasing) resellers (hosting companies) and Azure where Server CALs are not a factor.

The Bits

So you want to play with WS2019? The first preview build (17623) is available as of last night through the Windows Server Insider Preview program. Anyone can sign up.

Would You Like To Learn About Azure Infrastructure?

If you found this information useful, then imagine what 2 days of training might mean to you. I’m delivering a 2-day course in Amsterdam on April 19-20, teaching newbies and experienced Azure admins about Azure Infrastructure. There’ll be lots of in-depth information, covering the foundations, best practices, troubleshooting, and advanced configurations. You can learn more here.

Azure-to-Azure Site Recovery Fails – Connection Cannot Be Established

In this post, I’ll explain how to fix the following errors when you attempt to replicate an Azure virtual machine from one Azure Region to another:

Error 151072: Connection cannot be established to Azure Site Recovery service endpoints.

And:

Error 539: The requested action couldn’t be performed by the ‘A2A’ Replication Provider.

The Cause

A2ASR (the abbreviation of the ASR service for Azure VMs) uses an extension (guest OS agent) called the Mobility Service to replicate disk contents from a source virtual machine to a target (secondary) region (or DR site). The Mobility Service uses the networking of the virtual machine to talk to the ASR endpoints in the secondary region. That traffic is therefore going over the NIC and virtual network of the VM, and then to the target region via the Azure backbone.

If you have restricted outbound traffic for your virtual machines, then you might have blocked this traffic:

  • Third party firewall appliances
  • Using Network Security Groups (NSGs), as I documented here

The Fix

Woops! Don’t worry, you’ve already created exceptions to allow your virtual machine to boot up. You can create more exceptions to allow the virtual machines to talk to the ASR endpoints (see the below screenshot). Let’s imagine that I am replicating from North Europe to West Europe.

 

image

I’ll need at least one set of rules, enabling outbound traffic from my VNet/NICs in the source region, North Europe, to the two IP addresses of the target region, West Europe.

I will also have to enable inbound traffic from my target region, West Europe, to my source region, North Europe. Why? Isn’t all my traffic going from North Europe to West Europe? That’s true – now. But if you failover to West Europe, you will need to reverse replication afterwards, so you might as well get things right now.

A Script

It all looks messy at first. It probably isn’t too bad. But if you’d like to deploy a canned script to update NSGs, you can. Microsoft has shared a script that you can run. You will need a few pieces of information:

  • NSG name
  • NSG resource group name
  • Subscription ID
  • Source region
  • Target region

Run the script (it will prompt you to log in) from source to target, and then reverse the details, treating the target as the source, and vice versa with the NSG(s) in the DR site.
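The "run it from source to target, then swap the details" idea can be sketched as follows. This is not Microsoft's script: the function, the rule layout, the priorities, and the IP addresses (taken from documentation-only ranges) are all hypothetical, and I am assuming the traffic is HTTPS on TCP 443. Use the real endpoint addresses for your regions.

```python
def outbound_rules(source_region, target_region, endpoint_ips, base_priority=200):
    """Describe allow-outbound rules for the NSG in source_region, one per target endpoint IP."""
    return [
        {
            "name": f"Allow-ASR-{target_region}-{index}",
            "nsg_region": source_region,
            "direction": "Outbound",
            "access": "Allow",
            "protocol": "Tcp",
            "destination_address": ip,
            "destination_port": 443,  # assumption: HTTPS to the ASR endpoints
            "priority": base_priority + index,
        }
        for index, ip in enumerate(endpoint_ips[target_region])
    ]


if __name__ == "__main__":
    # Placeholder addresses only; these are NOT the real ASR endpoint IPs.
    endpoints = {
        "northeurope": ["192.0.2.10", "192.0.2.11"],
        "westeurope": ["198.51.100.10", "198.51.100.11"],
    }
    forward = outbound_rules("northeurope", "westeurope", endpoints)
    reverse = outbound_rules("westeurope", "northeurope", endpoints)  # ready for reverse replication
    for rule in forward + reverse:
        print(rule["nsg_region"], rule["name"], rule["destination_address"], rule["priority"])
```

Generating both directions from the same data is the whole trick: the DR-site NSG gets its rules now, rather than in the middle of a failover.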

Where Are the Service Tags?

Storage accounts and Azure SQL all have service tags, but ASR does not. I believe that ASR should have service tags to avoid all of this IP messiness. If you agree, vote here, or forever stay quiet on the subject.

Replicate VM Managed Disks Between Azure Regions

Last week, Microsoft announced that Azure Site Recovery (ASR) for Azure Virtual Machines (in preview still), the system for replicating Azure virtual machines from one region to another, added support for managed disks. To this I say …

Waaahoooooo!

Managed disks are the best way to deploy Azure VM storage because they’re easier to plan for (performance), have predictable pricing (Standard), and have way more management features. Unfortunately, I still found myself advising some customers to use un-managed disks (disks in storage accounts) because those customers needed to be able to replicate VMs from one region to another, e.g. North Europe to West Europe.

But now we have support for managed disks in the preview replication service.

All is not entirely rosy. I’ve been waiting on this feature for this web server since before a “non-”hurricane hit Ireland late last year. I tried to enable the feature (nice experience in the Azure portal, btw) but the replication fails because of a weird “disk.name” error. I’ve reported the issue and hopefully it’ll be fixed.

I Am Running My “Starting Azure Infrastructure” Course in London on Feb 22/23

I am delighted to announce the dates of the first delivery of my own bespoke Azure training in London, UK, on February 22nd and 23rd. All the details can be found here.

In my day job, I have been teaching Irish Microsoft partners about Azure for the past three years, using training materials that I developed for my employer. I’m not usually one to brag, but we’ve been getting awesome reviews on that training and it has been critical to us developing a fast growing Azure market. I’ve tweeted about those training activities and many of my followers have asked about the possibility of bringing this training abroad.

So a new venture has started, with brand new training, called Cloud Mechanix. With this business, I am bringing brand-new Azure training to the UK and Europe.  This isn’t Microsoft official training – this is my real world, how-to, get-it-done training, written and presented by me. We are keeping the classes small – I have learned that this makes for a better environment for the attendees. And best of all – the cost is low. This isn’t £2,000 training. This isn’t even £1,000 training.

The first course is booked and will be running in London (quite central) on Feb 22-23. It’s a 2-day “Starting Azure Infrastructure” course that will get noobies to Azure ready to deploy solutions using Azure VMs. And experience has shown that my training also teaches a lot to those that think they already know Azure VMs. You can learn all about this course, the venue, dates, costs, and more here.

I’m excited by this because this is my business (with my wife as partner). I’ve had friends, such as Mark Minasi, telling me to do this for years. And today, I’m thrilled to make this happen. Hopefully some of you will be too and register for this training :)

Application-Aware Disaster Recovery For VMware, Hyper-V, and Azure IaaS VMs with Azure Site Recovery

Speaker: Abhishek Hemrajani, Principal Lead Program Manager, Azure Site Recovery, Microsoft

There’s a session title!

The Impact of an Outage

The aviation industry has suffered massive outages over the last couple of years costing millions to billions. Big sites like GitHub have gone down. Only 18% of DR investors feel prepared (Forrester, July 2017, The State of Business Technology Resiliency). Much of this is due to immature core planning and very limited testing.

Causes of Significant Disasters

  • Forrester says 56% of declared disasters are caused by h/w or s/w.
  • 38% are because of power failures.
  • Only 31% are caused by natural disasters.
  • 19% are because of cyber attacks.

Sourced from the above Forrester research.

Challenges to Business Continuity

  • Cost
  • Complexity
  • Compliance

How Can Azure Help?

The hyper-scale of Azure can help.

  • Reduced cost: OpEx utility computing and benefits of hyper-scale cloud.
  • Reduced complexity: Service-based solution that has the weight of MS development behind it to simplify it.
  • Increased compliance: More certifications than anyone.

DR for Azure VMs

This is something that AWS doesn’t have. Some mistakenly think that you don’t need DR in Azure, but a region can go offline and people can still make mistakes. MS does not replicate your VMs unless you enable/pay for ASR for selected VMs. ASR is highly certified for compliance, including PCI, EU Data Protection, ISO 27001, and many, many more.

  • Ensure compliance: No-impact DR testing. Test every quarter or, at least, every 6 months.
  • Meet RPO and RTO goals: Backup cannot do this.
  • Centralized monitoring and alerting

Cost effective:

  • “Infrastructure-less” DR sites.
  • Pay for what you consume.

Simple:

  • One-click replication
  • One-click application recovery (multiple VMs)

Demo: Typical SharePoint Application in Azure

3 tiers in availability sets:

  • SQL cluster – replicated to a SQL VM in a target region or DR site (async)
  • App – replicated by ASR – nothing running in DR site
  • Web – replicated by ASR – nothing running in DR site
  • Availability sets – built for you by ASR
  • Load balancers – built for you by ASR
  • Public IP & DNS – abstract DNS using Traffic Manager

One-Click Replication is new and was announced this week. Disaster Recovery (Preview) is an option in the VM settings. All the pre-requisites of the VM are presented in a GUI. You click Enable Replication and all the bits are built and the VM is replicated. You can pick any region in a “geo-cluster”, rather than being restricted to the paired region.

For more than one VM, you might enable replication in the recovery services vault (RSV) and multi-select the VMs for configuration. The replication policy includes recovery point retention and app-consistent snapshots.

New: Multi-VM consistent groups. In preview now, up to 8 VMs. 16 at GA. VMs in a group do their application-consistent snapshots at the same time. No other public cloud offers this.

Recovery Plans

Orchestrate failover. VMs can be grouped, and groups are failed over in order. You can also require manual tasks to be done, and execute Azure Automation runbooks to do other things like creating load balancer NAT rules, re-configuring DNS abstraction in Traffic Manager, etc. You run the recovery plan to failover … and to do test failovers.
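To make the ordering idea concrete, here is a minimal sketch in Python. It is not the ASR API; the `Group` class, the `run_plan` function, and the VM names are all made up for illustration. It only models the behaviour described above: groups run in sequence, and any extra actions (standing in for manual tasks or runbooks) run after the VMs in their group.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Group:
    """One ordered step of a hypothetical recovery plan (not an ASR type)."""
    name: str
    vms: List[str]
    # Actions to run after this group's VMs, standing in for manual tasks
    # or Azure Automation runbooks. Each returns a description for the log.
    post_actions: List[Callable[[], str]] = field(default_factory=list)


def run_plan(groups: List[Group], test: bool = False) -> List[str]:
    """Walk the groups in order and return a log of what would happen."""
    mode = "test failover" if test else "failover"
    log = []
    for number, group in enumerate(groups, start=1):
        for vm in group.vms:
            log.append(f"group {number} ({group.name}): {mode} {vm}")
        for action in group.post_actions:
            log.append(f"group {number} ({group.name}): {action()}")
    return log


# Hypothetical three-tier plan: data first, then app, then web.
plan = [
    Group("data", ["sql-dr"]),
    Group("app", ["app1", "app2"]),
    Group("web", ["web1", "web2"], [lambda: "update Traffic Manager endpoint"]),
]

for line in run_plan(plan, test=True):
    print(line)
```

The same plan object drives both the real and the test run, which is the point of a recovery plan: what you rehearse is what you execute.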

DR for Hyper-V

You install the Microsoft Azure Recovery Services (MARS) agent on each host. That connects you to the Azure RSV and you can replicate any VM on that host. No on-prem infrastructure required. No connection broker required.

DR for VMware

You must deploy the ASR management appliance in the data centre. MS learned that the setup experience for this is complex. They had a lot of pre-reqs and configurations to install this in a Windows VM. MS will deliver this appliance as an OVF template from now on – familiar format for VMware admins, and the appliance is configured from the Azure Portal. Replicate Linux and Windows VMs to Azure, as with Hyper-V from then on.

Demo: OVF-Based ASR Management Appliance for VMware

A web portal is used to onboard the downloaded appliance:

  1. Verify the connection to Azure.
  2. Select a NIC for outbound replication.
  3. Choose a recovery services vault from your subscription.
  4. Install any required third-party software, e.g. PowerCLI or MySQL.
  5. Validate the configuration.
  6. Configure vCenter/ESXi credentials – this is never sent to Azure, it stays local. The name of the credential that you choose might appear in the Azure portal.
  7. Then you enter credentials for your Windows/Linux guest OS. This is required to install a mobility service in each VMware VM. This is because VMware doesn’t use VHD/X, it uses VMDK. Again, not sent to MS, but the name of the credential will appear in the Azure Portal when enabling VM replication so you can select the right credentials.
  8. Finalize configuration.

This will start rolling out next month in all regions.

Comprehensive DR for VMware

Hyper-V can support all Linux distros supported by Azure. On VMware they’re close to all. They’ve added Windows Server 2016, Ubuntu 14.04 and 16.04, Debian 7/8, managed disks, and 4 TB disk support.

Achieve Near-Zero Application Data Loss

Tips:

  • Periodic DR testing of recovery plans – leverage Azure Automation.
  • Invoke BCP before a disaster if you know it’s coming, e.g. hurricane.
  • Take the app offline before the event if it’s a planned failover – minimize risks.
  • Failover to Azure.
  • Resume the app and validate.

Achieve 5x Improvement in Downtime

Minimize downtime: https://aka.ms/asr_RTO

He shows a slide. One VM took 11 minutes to failover. Others took around/less than 2 minutes using the above guidance.
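As a quick sanity check on the “5x” headline, here is the arithmetic in Python. This assumes the 11-minute VM is the “before” case and roughly 2 minutes is the “after” case; my notes from the slide don’t say that explicitly, so treat it as a rough reading rather than the speaker’s own calculation.

```python
def improvement_factor(before_minutes: float, after_minutes: float) -> float:
    """How many times shorter the failover became."""
    return before_minutes / after_minutes


# Figures from the slide: 11 minutes versus around 2 minutes.
print(improvement_factor(11, 2))  # 5.5
```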

Demo: Broad OS Coverage, Azure Features, UEFI Support

He shows Ubuntu, CentOS, Windows Server, and Debian replicating from VMware to Azure. You can failover from VMware to Azure with UEFI VMs now – but you CANNOT failback. The process converts the VM to BIOS in Azure (Generation 1 VMs). OK if there’s no intention to failback, e.g. migration to Azure.

Customer Success Story – Accenture

They deployed ASR. Increased availability. 53% reduction in infrastructure cost. 3x improvement in RPO. Savings in work and personal time. Simpler solution and they developed new cloud skills.

They get a lot of alerts at the weekend when there are any network glitches. Could be 500 email alerts.

Demo: New Dashboard & Comprehensive Monitoring

Brand new RSV experience for ASR. Lots more graphical info:

  • Replication health
  • Failover test success
  • Configuration issues
  • Recovery plans
  • Error summary
  • Graphical view of the infrastructure: Azure, VMware, Hyper-V. This shows the various pieces of the solution, and a line goes red when a connection has a failure.
  • Jobs summary

All of this is on one screen.

He clicks on an error and sees the hosts that are affected. He clicks on “Needs Attention” in one of the errors. A blade opens with much more information.

We can see replication charts for a VM and disk – useful to see if the VM’s rate of change is too much for the bandwidth or the target storage (Standard vs. Premium). The disk-level view might help you ID churn-heavy storage like a page file that can be excluded from replication.
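A back-of-envelope version of that check can be sketched in Python. Everything here is an assumption for illustration: the function names, the example churn figures, and the idea that only 70% of the uplink is usable for replication. It also works on daily averages, so it ignores peaks, which is exactly what the charts in the portal would show you.

```python
def required_mbps(churn_gb_per_day: float) -> float:
    """Average throughput needed to ship one day's changed data in a day."""
    megabits = churn_gb_per_day * 1000 * 8  # GB to megabits, decimal units
    return megabits / 86_400  # seconds in a day


def can_keep_up(churn_gb_per_day: float, uplink_mbps: float,
                usable: float = 0.7) -> bool:
    """True if average churn fits in the usable share of the uplink."""
    return required_mbps(churn_gb_per_day) <= uplink_mbps * usable


# Hypothetical VM: 200 GB/day of churn, 80 GB of it from a page file disk.
print(round(required_mbps(200), 2))   # all disks replicated
print(can_keep_up(200, uplink_mbps=20))
print(round(required_mbps(120), 2))   # page file disk excluded
print(can_keep_up(120, uplink_mbps=20))
```

With these made-up numbers, the VM only fits on a 20 Mbps link once the page file disk is excluded, which is the kind of decision the disk-level chart is meant to inform.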

A message digest will be sent out at the end of the day. This data can be fed into OMS.

Some guest speakers come up from Rackspace and CDW. I won’t be blogging this.

Questions

  • When are things out: News on the ASR blog in October
  • The Hyper-V Planner is out this week, and new cost planners for Hyper-V and VMware are out this week.
  • Failback of managed disks is there for VMware and will be out by end of year for Hyper-V.

Speaking At European SharePoint, Office 365 & Azure Conference 2017

I will be speaking at this year’s European SharePoint, Office 365, and Azure Conference, which is being held in the National Conference Center in Dublin from 13 to 16 November. I’ll be talking about Azure Site Recovery (ASR).

It’s a huge event with lots of tracks, content and speakers from around the world.

For those of you in Ireland, this is a rare opportunity to attend a Microsoft-focused conference of such a scale here in Ireland.