Microsoft Ignite 2018: Implement Cloud Backup & Disaster Recovery At Scale in Azure

Speakers: Trinadh Kotturu, Senthuran Sivananthan, & Rochak Mittal

Site Recovery At Scale

Senthuran Sivananthan


Real Solutions for Real Problems

Customer example: Finastra.

  1. BCP process: Define RPO/RTO. Document DR failover triggers and approvals.
  2. Access control: Assign clear roles and ownership. Leverage the built-in ASR roles for RBAC. Use a different Recovery Services vault (RSV) for each business unit/tenant; Finastra deployed one RSV per app to do this.
  3. Plan your DR site: Finastra leveraged region pairs, which is useful for matching the GRS replication of storage. Plan site connectivity. Pick primary/secondary regions where the services you need are available and where you have enough quota – raise the quotas now, not later when you invoke the BCP.
  4. Monitor: Monitor replication health. Track configuration changes in the environment; they might affect recovery plans or require replication changes.
  5. DR drills: Periodically do test failovers.
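To make the "define RPO" and "monitor replication health" steps concrete, here's a minimal PowerShell sketch that flags replicated items whose newest recovery point is older than the agreed RPO. `Get-RpoBreach` and the item shape (`Name`, `LastRecoveryPoint`) are hypothetical; in practice the timestamps would come from your ASR monitoring data.

```powershell
# Hypothetical helper: returns the names of replicated items whose newest
# recovery point is older than the RPO agreed in the BCP process.
function Get-RpoBreach {
    param(
        [Parameter(Mandatory)] [object[]] $Items,   # each item: Name, LastRecoveryPoint ([datetime])
        [Parameter(Mandatory)] [timespan] $Rpo,     # agreed recovery point objective
        [datetime] $Now = (Get-Date)                # injectable clock, handy for testing
    )
    foreach ($item in $Items) {
        $age = $Now - $item.LastRecoveryPoint
        if ($age -gt $Rpo) {
            # Emit the breaching item's name so the caller can alert on it
            $item.Name
        }
    }
}

# Usage with made-up sample data: sql01 is 45 minutes behind a 15-minute RPO
$now = [datetime]'2018-09-27T15:00:00'
$items = @(
    [pscustomobject]@{ Name = 'web01'; LastRecoveryPoint = $now.AddMinutes(-5) }
    [pscustomobject]@{ Name = 'sql01'; LastRecoveryPoint = $now.AddMinutes(-45) }
)
Get-RpoBreach -Items $items -Rpo (New-TimeSpan -Minutes 15) -Now $now
```

Injecting `$Now` keeps the check deterministic, which matters if you want to run it as part of your DR drills.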

Journey to Scale

  • Automation: Do things at scale
  • Azure Policy: Ensure protection
  • Reporting: Holistic view and application breakdown
  • Pre- & Post- Scripts: Lower RTO as much as possible and eliminate human error

Demos – ASR

Rochak took over to demo recent features. Azure Policy support for ASR is coming soon.


The policy will assess whether VMs are being replicated and flag any that are not as non-compliant.

Expanding the monitoring solution.

Demo – Azure Backup & Azure Policy

Trinadh creates an Azure Policy and assigns it to a subscription. He picks the Azure Backup policy definition, selects the vault's resource group, selects the vault, and then selects a backup policy from the vault. The result is that any VM within the scope of the policy will automatically be backed up to the selected RSV with the selected backup policy.
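To illustrate what that assignment effectively achieves (not how Azure Policy evaluates it internally), here's a simplified PowerShell model. My understanding is that the built-in definition works in a deploy-if-not-exists style; `Get-VmsToEnroll` and the VM object shape are hypothetical, and in reality the policy's deployment does the enrolment, not a script like this.

```powershell
# Simplified model, not the Azure Policy engine: given the VMs in the
# assignment scope and the VMs already protected in the vault, return the VMs
# that would be enrolled with the chosen backup policy.
function Get-VmsToEnroll {
    param(
        [object[]] $VmsInScope,          # each VM: Name, ResourceGroup
        [string[]] $ProtectedVmNames,    # VMs already backed up to the RSV
        [string]   $BackupPolicyName     # backup policy selected from the vault
    )
    foreach ($vm in $VmsInScope) {
        # Only unprotected VMs need remediation
        if ($ProtectedVmNames -notcontains $vm.Name) {
            [pscustomobject]@{
                Name          = $vm.Name
                ResourceGroup = $vm.ResourceGroup
                BackupPolicy  = $BackupPolicyName
            }
        }
    }
}

# Usage with made-up names: vm1 is already protected, vm2 is not
$vms = @(
    [pscustomobject]@{ Name = 'vm1'; ResourceGroup = 'rg-app1' }
    [pscustomobject]@{ Name = 'vm2'; ResourceGroup = 'rg-app1' }
)
Get-VmsToEnroll -VmsInScope $vms -ProtectedVmNames @('vm1') -BackupPolicyName 'DefaultPolicy'
```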

Azure Backup & Security

Supports Azure Disk Encryption (ADE). The KEK (Key Encryption Key) and BEK (BitLocker Encryption Key) are backed up automatically.

The backup blobs are protected with AES-256 encryption.

Compliance

  • HIPAA
  • ISO
  • CSA
  • GDPR
  • PCI-DSS
  • Many more

Built-in Roles

The roles are cumulative; each role includes the rights of the one above it:

  • Backup Reader: View only
  • Backup Operator: Enable backup & restore
  • Backup Contributor: Policy management and delete/stop backup
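To make "cumulative" concrete, here's a small illustrative model. The right names (`View`, `EnableBackup`, and so on) are made-up labels, not real Azure RBAC action strings, and `Get-RoleRights` is a hypothetical helper.

```powershell
# Illustrative model of the cumulative built-in backup roles. Each role lists
# only what it adds; Get-RoleRights walks the ordered list so a higher role
# inherits everything granted to the roles before it.
$BackupRoles = [ordered]@{
    'Backup Reader'      = @('View')
    'Backup Operator'    = @('EnableBackup', 'Restore')
    'Backup Contributor' = @('ManagePolicy', 'DeleteStopBackup')
}

function Get-RoleRights {
    param([Parameter(Mandatory)] [string] $Role)
    if (-not $BackupRoles.Contains($Role)) { throw "Unknown role: $Role" }
    $rights = @()
    foreach ($name in $BackupRoles.Keys) {
        $rights += $BackupRoles[$name]   # accumulate this role's additions
        if ($name -eq $Role) { break }   # stop once we reach the requested role
    }
    $rights
}

# Usage: an operator can view, enable backup and restore, but not manage policy
Get-RoleRights -Role 'Backup Operator'
```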

Protect the Roles

PIM can be used to guard the roles – protect against rogue admins.

  • JIT access
  • MFA
  • Multi-user approval

Data Security

  • PIN protection for critical actions, e.g. delete
  • Alert: Notification on critical actions
  • Recovery: Backup data is kept for 14 days after deletion. Blob soft delete is being worked on.
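A tiny sketch of that 14-day window, assuming retention starts at the moment of deletion. `Get-RecoveryDeadline` is a hypothetical helper; check the exact retention rules against current Azure Backup documentation.

```powershell
# Hypothetical helper: last point in time at which deleted backup data can
# still be recovered, assuming a fixed retention counted from the deletion.
function Get-RecoveryDeadline {
    param(
        [Parameter(Mandatory)] [datetime] $DeletedOn,
        [int] $RetentionDays = 14
    )
    $DeletedOn.AddDays($RetentionDays)
}

# Usage: data deleted on 27 September 2018 is recoverable until 11 October 2018
Get-RecoveryDeadline -DeletedOn ([datetime]'2018-09-27')
```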

Backup Center Demo

Backup Center is being built at the moment. It starts with VMs but will eventually include all backup items.


All RSVs in the tenant (doh!) are managed in a central place.

Aimed at the large enterprise.

They also have Log Analytics monitoring if you like that sort of thing. I’m not a fan of LA – I much prefer Azure Monitor.

Reporting using Power BI

Trinadh demos a Power BI reporting solution that unifies backup data from multiple tenants into a single report.

Microsoft Ignite 2018: Azure Migrate

I arrived late for this session because I was in a meeting. They were doing a demo of Azure Migrate.

Azure Migrate for Discovery And Assessment

  • Agentless discovery
  • TCO calculation
  • Right-size and suitability
  • Azure Platform

They are “announcing” support for Hyper-V – it’s still in limited private preview.

Third Party Solutions

Cloudamize is just an assessment tool:

  • In-depth performance analysis
  • Right-size compute and storage options
  • TCO calculations
  • Agentless
  • Assessments for migration to Azure SQL
  • Integrates into ASR to do the migration

Migration solutions:

  • ASR
  • Zerto
  • CloudEndure

Azure Site Recovery (ASR)

  • Easy to onboard – appliance wizard for VMware
  • Broad coverage for Windows and Linux
  • UEFI support for VMware and physical machines – converted to BIOS
  • W2008 32-bit support

They do a demo of Zerto for migrations. Then they demo CloudEndure.

Futures

They’re trying to simplify the process. Starting a limited private preview:

Assess > Migrate & modernize > optimize > secure & manage.

It will use the new tabbed UI in the Azure Portal. You can import an assessment into a migration, pick the ready machines that you want to migrate, optionally apply the Hybrid Use Benefit (HUB), and override VM sizing, OS disk, and availability set membership. Ideally, third parties will use this migration experience too.

Windows Server 2019 Announced for H2 2018

Last night, Microsoft announced that Windows Server 2019 would be released, generally available, in the second half of 2018. I suspect that the big bash will be Ignite in Orlando at the end of September, possibly with a release that week, but maybe in October – that’s been the pattern lately.

LTSC

Microsoft is referring to WS2019 as a “long term servicing channel release”. When Microsoft started the semi-annual channel (a Server Core build of Windows Server released every 6 months to Software Assurance customers who opt into the program), they promised that the normal builds would continue every 3 years. These LTSC releases would be approximately the sum of the previous semi-annual channel releases plus whatever new stuff they cooked up before the launch.

First, let’s kill some myths that I know are being spread by “someone I know that’s connected to Microsoft” … it’s always “someone I know” that is “connected to Microsoft” and it’s always BS:

  • The GUI is not dead. The semi-annual channel release is Server Core, and Nano Server has been containers-only since last year, but the GUI is an essential element of the LTSC.
  • This is not the last LTSC release. Microsoft views (and recommends) LTSC for non-cloud-optimised application workloads such as SQL Server.
  • No – Windows Server is not dead. Yes, Azure plays a huge role in the future, but Azure Stack and Azure are both powered by Windows, and hundreds of thousands, if not millions, of companies are still powered by Windows Server.

Let’s talk features now …

I’m not sure what’s NDA and what is not, so I’m going to stick with what Microsoft has publicly discussed. Sorry!

Project Honolulu

For those of you who don’t keep up with the tech news (that’s most IT people), Project Honolulu is a huge effort by MS to replace the Remote Server Administration Tools (RSAT) that you might know as “Administrative Tools” on Windows Server or on an admin PC. These ancient tools were built on MMC.EXE, which was deprecated with the release of W2008!

Honolulu is a whole new toolset built on HTML5 for today and the future. It’s not finished – being built with cloud practices, it never will be – but it’s getting there!

Hybrid Scenarios

Don’t share this secret with anyone … Microsoft wants more people to use Azure. Shh!

Some of the features we (at work) see people adopt first in the cloud are the hybrid services, such as Azure Backup (cloud or hybrid cloud backup), Azure Site Recovery (disaster recovery), and soon I think Azure File Sync (seamless tiered storage for file servers) will be a hot item. Microsoft wants it to be easier for customers to use these services, so they will be baked into Project Honolulu. I think that’s a good idea, but I hope it’s not a repeat of what was done with WS2016 Essentials.

ASR needs more than just “replicate me to the cloud” enabled on the server; that’s the easy part of the deployment that I teach in the first couple of hours in a 2-day ASR class. The real magic is building a DR site, knowing what can be replicated and what cannot (see domain controllers & USN rollback, clustered/replicating databases & getting fired), orchestration, automation, and how to access things after a failover.

Backup is pretty easy, especially if it’s just MARS. I’d like MARS to add backup to local storage so it could completely replace Windows Server Backup. For companies with Hyper-V, there’s more to be done with Azure Backup Server (MABS) than just downloading an installer.

Azure File Sync also requires some thought and planning, but if they can come up with some magic, I’m all for it!

Security

In Hyper-V:

  • Linux will be supported with Shielded VMs.
  • VMConnect support is being added to Shielded VMs for support reasons – it’s hard to fix a VM if you cannot log into it via “console” access.
  • Encrypted Network Segments can be turned on with a “flip of a switch” for secure comms – that could be interesting in Azure!

Windows Defender ATP (Advanced Threat Protection) is a Windows 10 Enterprise feature that’s coming to WS2019 to help stop zero-day threats.

DevOps

The big bet on Containers continues:

  • The Server Core base image will be reduced from 5GB by (they hope) 72% to speed up deployment time of new instances/apps.
  • Kubernetes orchestration will be natively supported – the container orchestrator that originated at Google appears to be the industry winner versus Docker Swarm and Mesos.
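As a rough worked figure, if the hoped-for 72% reduction is fully achieved, the Server Core base image would shrink to about 1.4GB:

```latex
5\,\text{GB} \times (1 - 0.72) = 5\,\text{GB} \times 0.28 = 1.4\,\text{GB}
```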

In the heterogeneous world, Linux admins will be getting the Windows Subsystem for Linux (WSL) on Windows Server for a unified scripting/admin experience.

Hyper-Converged Infrastructure (HCI)

Storage Spaces Direct (S2D) has been improved and more changes will be coming to mature the platform in WS2019. In case you don’t know, S2D is a way to use local (internal) disks in 2+ (preferably 4+) Hyper-V hosts across a high-speed network (virtual SAS bus) to create a single cluster with fault tolerance at the storage and server levels. By using internal disks, S2D can use cheaper SATA disks, as well as new flash formats that don’t natively support sharing, such as NVMe.

The platform is maturing in WS2019, and Project Honolulu will add a new day-to-day management UI for S2D that is natively lacking in WS2016.

The Pricing

As usual, I will not be answering any licensing/pricing questions. Talk to the people you pay to answer those questions, i.e. the reseller or distributor that you buy from.

OK; let’s get to the messy stuff. Nothing has been announced other than:

It is highly likely we will increase pricing for Windows Server Client Access Licensing (CAL). We will provide more details when available.

So it appears that User CALs will increase in pricing. That is probably good news for anyone licensing Windows Server via processor (don’t confuse this with Core licensing).

When you acquire Windows Server through volume licensing, you pay for every pair of cores in a server (with a minimum of 16 cores, which matched the pricing of WS2012 R2), PLUS you buy User CALs for every user authenticating against the server(s).
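A simplified sketch of that per-core arithmetic, assuming only the rules quoted above (cores sold in pairs, a 16-core minimum per server). `Get-CorePackCount` is a hypothetical helper; it ignores editions, per-processor rules, and CALs, so it is not a licensing tool (talk to your reseller!).

```powershell
# Hypothetical helper: number of 2-core packs for one server, assuming a
# 16-core minimum per server. Editions, per-processor rules and CALs ignored.
function Get-CorePackCount {
    param([Parameter(Mandatory)] [ValidateRange(1, 1024)] [int] $PhysicalCores)
    $licensedCores = [Math]::Max(16, $PhysicalCores)   # apply the 16-core floor
    [int]([Math]::Ceiling([double]$licensedCores / 2)) # round odd counts up to a full pack
}

Get-CorePackCount -PhysicalCores 12   # 16 cores licensed, so 8 packs
Get-CorePackCount -PhysicalCores 20   # 20 cores licensed, so 10 packs
```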

When you acquire Windows Server via Azure or through a hosting/leasing (SPLA) program, you pay for Windows Server based only on how many cores the machine has. For example, when I run an Azure virtual machine with Windows Server, the per-minute cost of the VM includes the cost of Windows Server, and I do not need any Windows Server CALs to use it (RDS is a different matter).

If CALs are going up in price, then it’s probably good news for SPLA resellers (hosting companies) and for Azure, where Windows Server CALs are not a factor.

The Bits

So you want to play with WS2019? The first preview build (17623) is available as of last night through the Windows Server Insider Preview program. Anyone can sign up.


Would You Like To Learn About Azure Infrastructure?

If you found this information useful, then imagine what 2 days of training might mean to you. I’m delivering a 2-day course in Amsterdam on April 19-20, teaching newbies and experienced Azure admins about Azure Infrastructure. There’ll be lots of in-depth information, covering the foundations, best practices, troubleshooting, and advanced configurations. You can learn more here.

Replicate VM Managed Disks Between Azure Regions

Last week, Microsoft announced that Azure Site Recovery (ASR) for Azure Virtual Machines (in preview still), the system for replicating Azure virtual machines from one region to another, added support for managed disks. To this I say …

Waaahoooooo!

Managed disks are the best way to deploy Azure VM storage because they’re easier to plan for (performance), have predictable pricing (Standard), and have way more management features. Unfortunately, I still found myself advising some customers to use un-managed disks (disks in storage accounts) because those customers needed to be able to replicate VMs from one region to another, e.g. North Europe to West Europe.

But now we have support for managed disks in the preview replication service.

All is not entirely rosy. I’ve been waiting on this feature for this web server since before a “non-hurricane” hit Ireland late last year. I tried to enable the feature (nice experience in the Azure portal, btw) but the replication fails because of a weird “disk.name” error. I’ve reported the issue and hopefully it’ll be fixed.


Speaking At European SharePoint, Office 365 & Azure Conference 2017

I will be speaking at this year’s European SharePoint, Office 365, and Azure Conference, which is being held in the National Conference Center in Dublin between 13-16 November. I’ll be talking about Azure Site Recovery (ASR):


It’s a huge event with lots of tracks, content and speakers from around the world.

 

For those of you in Ireland, this is a rare opportunity to attend a Microsoft-focused conference of such a scale here in Ireland.

My Experience at Cloud & Datacenter Conference Germany

Last week I was in Munich for the Cloud & Datacenter Germany conference. I landed in Munich on Wednesday for a pre-conference Hyper-V community event, and 2 hours later I was talking to a packed room of over 100 people about implementing Azure Site Recovery with Windows Server 2016 Hyper-V. This talk was very different to my usual “When Disaster Strikes” talk; I wanted to do something different so instead of an hour of PowerPoint, I had 11 slides, half of which were the usual title, who I am, etc, slides. Most of my time was spent doing live demos and whiteboarding using Windows 10 Ink on my Surface Book.


Photo credit: Carsten Rachfahl (@hypervserver)

On Friday I took the stage to do my piece for the conference, and I presented my Hidden Treasures in Windows Server 2016 Hyper-V talk. This was slightly evolved from what I did last month in Amsterdam – I chopped out lots of redundant PowerPoint and spent more time on live demos. As usual with this talk, which I’d previously done on WS2012 R2 for TechEd Europe 2014 and Ignite 2015, I ran all of my demos using PowerShell scripts.


Photo credit: Benedikt Gasch (@BenediktGasch)

 

One of the great things about attending these events is that I get to meet up with some of my Hyper-V MVP friends. It was great to sit down for dinner with them, and a few of us were still around for a quieter dinner on the Friday night. Below you can see me hanging out with Tudor Damian, Carsten Rachfahl, Ben Armstrong (Virtual PC Guy), and Didier Van Hoye.


As expected, CDC Germany was an awesome event with lots of great speakers sharing knowledge over 2 days. Plans have already started for the next event, so if you speak German and want to stay up to speed with Hyper-V, private & public cloud in the Microsoft world, then make sure you follow the news on https://www.cdc-germany.de/

Template For ASR Recovery Plan Runbook

I’m sharing a template PowerShell-based Azure Automation runbook that I’ve written, to enable advanced automation in the Recovery Plans that can be used in Azure Site Recovery (ASR) failover.

ASR allows us to orchestrate the failover and failback of virtual machines. This can be a basic ordering of virtual machine power-up and power-down actions, but we can also inject runbooks from Azure Automation. This allows us to do virtually anything inside of Azure during a failover or failback.

I needed to write a runbook for ASR recently and I found that I needed this runbook to be able to work differently in different scenarios:

  • Planned failover
  • Unplanned failover
  • Test failover
  • Cleanup after a test failover

That last one is tricky, but I’ll come back to it. The key to the magic is a parameter in the runbook called $RecoveryPlanContext. When ASR executes a runbook, it passes some information into the script via this parameter. The parameter has multiple attributes, as documented by Microsoft here. Some of the interesting values we can query for are:

  • FailoverType: test, unplanned, or planned.
  • FailoverDirection: Whether the failover is towards the primary or the secondary site.
  • VmMap: The VMs included in the recovery plan, keyed by VM ID, with details such as each VM’s name.
  • ResourceGroupName: The name of the resource group that the failover VMs are in.
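If you want to test a runbook outside of ASR, you can fake the parameter. This sketch builds a mock $RecoveryPlanContext from JSON. The property names are based on Microsoft's documented example as I understand it, and all of the values (plan name, IDs, resource group, VM name) are made up, so check the current ASR documentation before relying on the exact shape.

```powershell
# Mock of the object ASR passes in $RecoveryPlanContext, for local testing.
# Property names follow Microsoft's documented example (assumption: still
# current); every value below is a made-up placeholder.
$json = @'
{
  "RecoveryPlanName": "demo-recovery-plan",
  "FailoverType": "Test",
  "FailoverDirection": "PrimaryToSecondary",
  "GroupId": "1",
  "VmMap": {
    "00000000-0000-0000-0000-000000000001": {
      "SubscriptionId": "00000000-0000-0000-0000-000000000000",
      "ResourceGroupName": "demo-rg-asr",
      "RoleName": "demo-web01"
    }
  }
}
'@
$RecoveryPlanContext = $json | ConvertFrom-Json

# VmMap is an object keyed by VM ID, so enumerate its properties to reach each VM
$vmNames = foreach ($prop in $RecoveryPlanContext.VmMap.PSObject.Properties) {
    $prop.Value.RoleName
}
"Failover type: $($RecoveryPlanContext.FailoverType)"
"VMs in plan: $($vmNames -join ', ')"
```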

My template currently only queries for FailoverType, because my goal was to automate failover to Azure. However, there is as much value in dealing with FailoverDirection, because the runbook can be used to orchestrate a failback.

Let’s assume that I use a runbook to start up some machine(s) during a failover. This machine could be a jump box, enabling remote access into that machine, and from there we can jump to failed-over machines. That would be useful in all kinds of failover, including a test failover. But what happens if I run a test, am happy with everything, and run a cleanup task? Guess what … our recovery plan does not execute in reverse order, and the runbook is not executed. So what I’ve come up with is a way to tell the runbook that I want to do a cleanup. I typically run the runbook manually before the cleanup, because some tasks must be done before VMs are removed to make scripting easier, e.g. finding the PIPs used by VMs before the VMs and their NICs are deleted. The runbook still expects a parameter; you are prompted to enter a value, so I enter “cleanup”. Once my script sees that value, it runs a cleanup function.

My template has 4 functions, one for each of the above 4 scenarios. There is an if statement to look for cleanup, and if that’s not the value, a switch statement checks $RecoveryPlanContext.FailoverType to see what the scenario is. If some unexpected value is found, the runbook will exit.
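The decision logic can also be pulled out into a function that returns a scenario name instead of doing the work, which makes it testable on your PC without an Automation account. `Get-FailoverScenario` is a hypothetical helper, not part of the template that follows.

```powershell
# Hypothetical refactor of the template's dispatch logic: return the scenario
# name rather than running it, so the branching can be tested locally.
function Get-FailoverScenario {
    param([object] $RecoveryPlanContext)

    # Manual run: the operator typed "cleanup" at the parameter prompt
    if ($RecoveryPlanContext -is [string] -and $RecoveryPlanContext -eq 'cleanup') {
        return 'TestCleanup'
    }
    # Run from a recovery plan: ASR passes an object with a FailoverType.
    # switch compares strings case-insensitively, so "test" matches "Test".
    switch ($RecoveryPlanContext.FailoverType) {
        'Test'      { return 'TestFailover' }
        'Planned'   { return 'PlannedFailover' }
        'Unplanned' { return 'UnplannedFailover' }
        default     { return 'Unknown' }
    }
}

# Usage
Get-FailoverScenario 'cleanup'
Get-FailoverScenario ([pscustomobject]@{ FailoverType = 'Planned' })
```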

param (
    [Object]$RecoveryPlanContext
)

function DoTestCleanup ()
{
    Write-Output ("This is a cleanup after test failover")
    # Do stuff here
}

function DoTestFailover ()
{
    Write-Output ("This is a test failover")
    # Do stuff here
}

function DoPlannedFailover ()
{
    Write-Output ("This is a planned failover")
    # Do stuff here
}

function DoUnplannedFailover ()
{
    Write-Output ("This is an unplanned failover")
    # Do stuff here
}

# The runbook starts here
Start-Sleep -Seconds 10
$connectionName = "AzureRunAsConnection"
try
{
    # Get the Run As connection asset "AzureRunAsConnection"
    $servicePrincipalConnection = Get-AutomationConnection -Name $connectionName

    "Logging in to Azure using $connectionName ..."
    Add-AzureRmAccount -ServicePrincipal -TenantId $servicePrincipalConnection.TenantId -ApplicationId $servicePrincipalConnection.ApplicationId -CertificateThumbprint $servicePrincipalConnection.CertificateThumbprint
}
catch {
    if (!$servicePrincipalConnection)
    {
        $ErrorMessage = "Connection $connectionName not found."
        throw $ErrorMessage
    } else {
        Write-Error -Message $_.Exception
        throw $_.Exception
    }
}

Write-Output ("The failover type parameter is $RecoveryPlanContext")

# A manual run with the value "cleanup" triggers the post-test cleanup;
# otherwise the context object from ASR says which kind of failover is running
if ($RecoveryPlanContext -eq 'cleanup')
{
    DoTestCleanup
}
else
{
    switch ($RecoveryPlanContext.FailoverType)
    {
        "Test" { DoTestFailover }
        "Planned" { DoPlannedFailover }
        "Unplanned" { DoUnplannedFailover }
        default { Write-Output ("Runbook aborted because no failover type was specified") }
    }
}

 

Hurricane Matthew – Start Those Planned Failovers

A hurricane is about to blast its way up the east coast of the USA, making landfall in south Florida probably early on Friday morning, and working its way up to Norfolk, VA, by Sunday morning. We know how much damage these hurricanes can do, especially if tides rise and seawater starts mixing with electric, servers, and storage – we’re talking not just business down, but business offline, and maybe even business dead. I’m sorry, but even a stretch cluster to a nearby location is subject to the same mess.

This is when a true DR solution is required. “But I cannot afford a DR solution”, you say. You can’t afford to not have one, but I do know what you could have deployed (it’s too late now, by the way, if you are in the target zone for Hurricane Matthew). Azure Site Recovery (ASR) is an OPEX-based way to get a DR site in the cloud. The cost is a monthly drip feed instead of the CAPEX big bang that a traditional DR site is:

  • $25 per replicated machine per month, in Azure South Central US.
  • Replicated disk storage starts at $0.05 per GB in the same Azure region.
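To put those numbers in context, here's a back-of-the-envelope sketch using only the two prices quoted above. `Get-AsrMonthlyEstimate` is a hypothetical helper; it ignores compute costs after a failover, storage transactions, networking, and any price changes since this was written.

```powershell
# Hypothetical helper: rough monthly ASR cost from the list prices quoted
# above ($25 per replicated machine, storage from $0.05 per GB). Compute after
# failover, transactions, networking and later price changes are NOT included.
function Get-AsrMonthlyEstimate {
    param(
        [Parameter(Mandatory)] [int]    $Machines,
        [Parameter(Mandatory)] [double] $ReplicatedGB,
        [double] $PerMachine = 25.0,
        [double] $PerGB      = 0.05
    )
    [Math]::Round(($Machines * $PerMachine) + ($ReplicatedGB * $PerGB), 2)
}

# Usage: 10 servers with 2TB (2048GB) of replicated disks, roughly $352.40 a month
Get-AsrMonthlyEstimate -Machines 10 -ReplicatedGB 2048
```

Compare that monthly drip feed with the up-front cost of building and running a second site.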

The solution works with:

  • Hyper-V
  • vSphere
  • Physical servers

And it’s really simple to use and reliable; thousands (if not more) of businesses are deploying and testing ASR failovers on a regular basis. This out-of-the-box shared platform is tested constantly, which makes it way more reliable than some home-baked solution.

You get full orchestration – so if I saw the forecast today, I could start my business continuity plan, start the failover and hit the road. My machines would start a planned failover (ordered and no data loss) to Azure and would be waiting for me when I get to my rendezvous point. Note that my orchestration can also kick off PowerShell scripts (Azure Automation) to do some fancy things, such as redirecting internet traffic that I had routed using Azure Traffic Manager.

If you have ASR and are in one of the areas that will be affected, then do a test failover, do any required remediations, and then start that failover. Hopefully, your business is not damaged and you can do a failback afterwards (if you want to). If you don’t have a DR solution, I hope you survive, and have the sense to look at ASR soon afterwards – it is hurricane season!


Future Decoded: My Session Is “Azure Site Recovery – Be A Super Hero!”

I’m going to be talking about Azure’s DR-as-a-Service or DR-site-in-the-cloud solution, Azure Site Recovery (ASR) at Future Decoded, a fantastic IT event by Microsoft UK beside London City Airport, on November 1/2.

“Remember: when disaster strikes, the time to prepare has passed.” – Stephen Cyros

We all think that disasters never happen near us; bushfires, earthquakes and flying cows are things that happen elsewhere. But the truth is very different, disasters strike every day without making headlines, sometimes wiping out a company or just that one critical server, and the cruel thing about disasters is that they tend to strike those that are unprepared; it’s those times that the business needs a hero. Unfortunately, a hero needs to be prepared, and during a disaster is not the time to prepare. IT Pros know that we need to have DR solutions, but often they’ve proven to be too costly or too difficult to implement. Times have changed; cloud computing has democratized and simplified DR. ASR’s low cost OPEX model makes replication of physical, vSphere, or Hyper-V servers to Azure more … more so now, thanks to recent price cuts. Large and small enterprises benefit from ASR’s orchestration which makes failover easy and reliable – you can order failover of machines and build in scripted extensions, and test your orchestrated failover without impacting production systems.


Future Decoded will have lots of great content from a variety of speakers with different backgrounds. Come along to my session to learn how you can be the super hero and get your business back up and running when everyone else is panicking.