Azure Site Recovery

Azure Infrastructure Announcements – August 2023

This post brings you a summary of the infrastructure announcements from Azure that were made during August 2023. There are lots of announcements from Storage and a few interesting notes for VMs, networking, and ASR.

Storage

Azure Managed Lustre: not your grandparents’ parallel file system

With a few clicks in a web interface, or with an Azure Resource Manager template, Azure Managed Lustre (AMLFS) lets you provision an all-flash Lustre file system in minutes. What’s different is that this Lustre file system is all yours. If someone else in Azure is running a job that creates a million files, you won’t ever know it, because your Lustre servers and SSDs are exclusively yours.

Massively scaled, high-performance file systems for HPC workloads.

General availability | Azure NetApp Files: SMB Continuous Availability (CA) shares

To enhance resiliency during storage service maintenance operations, SMB volumes used by Citrix App Layering, FSLogix user profile containers and Microsoft SQL Server on Microsoft Windows Server can be enabled with Continuous Availability.

SMB Transparent Failover means that clients should not notice maintenance operations.

Public preview: Azure Storage Mover support for SMB and Azure Files

Storage Mover is a fully managed migration service that enables you to migrate on-premises files and folders to Azure Storage while minimizing downtime for your workload. Azure Storage Mover can now migrate your SMB shares to Azure file shares.

To be honest, I’ve not encountered a “replace the file server with Azure Files” scenario yet. Third-party vendors often won’t support it for LOB apps. User data typically ends up in SharePoint/OneDrive. And wouldn’t most Citrix/RDS admins want to start with new profiles?

Generally available: Azure Blob Storage Cold Tier

Azure Blob Storage Cold Tier is now generally available. It is a new online access tier that is the most cost-effective Azure Blob offering for storing infrequently accessed data with long-term retention requirements, while providing instant access. The pricing of the cold tier storage option lies between the cool and archive tiers, and it follows a 90-day early deletion policy. You can seamlessly utilize the cold tier in the same way as the hot and cool tiers.

Cool – Cold. Tell me that isn’t confusing. The scenario is that you want to store data for a long time, but you need it immediately available. Archive can take up to 15 hours to restore (“rehydration”), and that can be accelerated for an extra charge. Cold is one step up from Archive: the data is instantly available, but storing it is not as cheap.
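The 90-day rule is easier to reason about with numbers. Here’s a minimal PowerShell sketch of how an early-deletion charge is prorated, assuming the published minimum retention periods (30 days for Cool, 90 for Cold, 180 for Archive) and that deleting or re-tiering a blob early bills the remaining days at that tier’s storage rate. Get-EarlyDeletionDays is a hypothetical helper; it does not call any Azure API.

```powershell
# Hypothetical helper, not an Azure API: days of early-deletion charge for a blob.
# Assumed minimum retention (in days) for each access tier.
$MinimumRetentionDays = @{
    Hot     = 0
    Cool    = 30
    Cold    = 90
    Archive = 180
}

function Get-EarlyDeletionDays {
    param (
        [Parameter(Mandatory)]
        [ValidateSet('Hot', 'Cool', 'Cold', 'Archive')]
        [string]$Tier,

        [Parameter(Mandatory)]
        [ValidateRange(0, 36500)]
        [int]$DaysInTier
    )
    # Only the unserved part of the minimum retention period is billed.
    $remaining = $MinimumRetentionDays[$Tier] - $DaysInTier
    return [Math]::Max(0, $remaining)
}

# Deleting a Cold blob after 30 days is billed as 60 more days of Cold storage.
Get-EarlyDeletionDays -Tier Cold -DaysInTier 30
```

In other words, Cold only pays off for data that you are confident will stay put for at least three months.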

Public Preview: Azure NetApp Files Cloud Backup for Virtual Machines

With Cloud Backup for Virtual Machines, you can now create VM-consistent snapshot backups of VMs on Azure NetApp Files datastores. The associated virtual appliance installs in the Azure VMware Solution cluster and provides policy-based, automated, and consistent backup of VMs, integrated with Azure NetApp Files snapshot technology for fast backups and restores of VMs, groups of VMs (organized in resource groups), or complete datastores, lowering RTO and RPO and improving total cost of ownership.

General Availability: Incremental snapshots for Premium SSD v2 Disk and Ultra Disk Storage

You can now instantly restore Premium SSD v2 and Ultra Disks from snapshots and attach them to a running VM without waiting for any background copy of data. This new capability allows you to read and write data on disks immediately after creation from snapshots, enabling you to recover your data quickly from accidental deletes or a disaster.

I can see third-party backup making use of this.

Azure Elastic SAN updates: Private Endpoints & Shared Volumes

As we approach general availability of Azure Elastic SAN, we continue improving the service and adding features based on your feedback. Today, we are releasing private endpoint support and volume sharing support via SCSI (Small Computer System Interface) Persistent Reservation.

This sounds like the sort of feature maturity one would expect as the service approaches general availability. I wonder what the actual target market is for this service.

Private Preview – DR for Shared Disks – Azure Site Recovery

We are excited to announce the Private Preview of DR for Azure Shared Disks for workloads running Windows Server Failover Clusters (WSFC) on Azure VMs. Now you can protect, monitor, and recover your WSFC clusters as a single unit across their DR lifecycle, while also generating cluster-consistent recovery points, which are consistent across all the disks (including the Shared Disk) of the cluster.

This feature is long overdue for customers using shared virtual hard disks to create failover clusters.

Networking

Public preview: Support for new custom error pages in Application Gateway

In addition to the response codes 403 and 502, the Azure Application Gateway now lets you configure company-branded error pages for more response codes – 400, 405, 408, 500, 503, and 504. You can configure these error pages at a global level to apply to all the listeners on your gateway or individually for each listener.

These pages can be hosted at any publicly accessible URI.

Azure Firewall: New Monitoring and Logging Updates

Notes:

(Preview) With the Azure Firewall Resource Health check, you can now view the health status of your Azure Firewall and address service problems that may affect your Azure Firewall resource. Resource Health allows IT teams to receive proactive notifications regarding potential health degradations and recommended mitigation actions for each health event type.
(Preview) The Azure Firewall Workbook presents a dynamic platform for analyzing Azure Firewall data. Within the Azure portal, you can utilize it to generate visually engaging reports.
(GA) The Latency Probe metric is designed to measure the overall latency of Azure Firewall and provide insight into the health of the service. IT administrators can use the metric for monitoring and alerting if there is observable latency and diagnosing if the Azure Firewall is the cause of latency in a network.

Resource health should make for a useful alert, especially when enabling DevSecOps – be aware of the dreaded “out of sync” error. I just tried the workbook in a production system – I noticed a couple of things that I might not have otherwise noticed because they didn’t trigger a human response (yet). The latency probe is interesting – I think it originated from customer network performance scenarios where it was suspected that the firewall was the root cause.

Virtual Machines

Public preview: Azure Mv3 Medium Memory (MM) Virtual Machines

Today we are announcing the public preview of the next generation Mv3 Medium Memory (MM) virtual machine series. Powered by the 4th Generation Intel® Xeon® Scalable Processor and DDR5 DRAM technology, the Mv3 medium memory (MM) virtual machines can scale for SAP workloads from 250GB to 4TB. With Azure Boost, Mv3 MM provides a ~25% improvement in network throughput and up to 1.5X improvement in remote storage throughput over the previous M-series families.

These machines start at 12 vCPUs and 240 GB RAM, scaling up to 176 vCPUs and 2,794 GB RAM. That should just about be enough to run Teams.

The Azure IaaS Book Of News – December 2022

Here’s all the news that I thought was interesting for Ops and Security folks working with Azure IaaS from December 2022.

Azure VMware Solution

Azure VMware Solution Advanced Monitoring: This solution add-on deploys a virtual machine running Telegraf in Azure with a managed identity that has contributor and metrics publisher access to the Azure VMware Solution private cloud object. Telegraf then connects to vCenter Server and NSX-T Manager via API and provides responses to API metric requests from the Azure portal.

Azure Kubernetes Service

Microsoft and Isovalent partner to bring next generation eBPF dataplane for cloud-native applications in Azure: Microsoft announces the strategic partnership with Isovalent to bring Cilium’s eBPF-powered networking data plane and enhanced features for Kubernetes and cloud-native infrastructure. Azure Kubernetes Service (AKS) will now be deployed with the Cilium open-source data plane, natively integrated with Azure Container Networking Interface (CNI). Microsoft and Isovalent will make Isovalent Cilium Enterprise available as a Kubernetes container app offering in the Azure Container Marketplace. This will provide a one-click deployment solution to Azure Kubernetes clusters with Isovalent Cilium Enterprise advanced features.
Generally Available: Kubernetes 1.25 support in AKS: AKS support for Kubernetes release 1.25 is now generally available. Kubernetes 1.25 delivers 40 enhancements. This release includes new changes such as the removal of PodSecurityPolicy.

Azure Backup

General Availability of Cross Zonal Restore of Azure Virtual Machines from Azure Backup: With the general availability of Cross Zonal Restore of Azure VMs, Azure Backup offers a compelling set of durability options for your backup data, including ZRS for intra-region high durability. Aidan’s note – you should consider this with regions such as Norway East, where the paired region is unavailable to 99.9% of customers.
How to automate On-Demand Azure Backup for Azure Virtual Machines using PowerShell: Aidan’s note – A solution to enable more frequent VM backups than otherwise possible, but make sure that the interval between backups is longer than the time a backup job takes to complete.
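The overlap warning boils down to simple arithmetic: each on-demand backup has to finish before the next one starts. Here’s a minimal sketch, assuming you know (or have measured) the longest backup job duration for the VM. Test-BackupScheduleFits and its 30-minute safety margin are hypothetical, and the function does not trigger any backups itself.

```powershell
# Hypothetical helper: check that a proposed on-demand backup schedule
# leaves enough time for each job to finish before the next one starts.
function Test-BackupScheduleFits {
    param (
        [Parameter(Mandatory)] [TimeSpan]$Interval,        # time between on-demand backups
        [Parameter(Mandatory)] [TimeSpan]$LongestJobTime,  # worst observed backup job duration
        [TimeSpan]$SafetyMargin = [TimeSpan]::FromMinutes(30)  # assumed buffer, tune to taste
    )
    # The schedule fits only if a job plus the safety margin completes within one interval.
    return ($LongestJobTime + $SafetyMargin) -le $Interval
}

# Every 4 hours with a 2-hour job fits; every 2 hours with a 2-hour job overlaps.
Test-BackupScheduleFits -Interval ([TimeSpan]::FromHours(4)) -LongestJobTime ([TimeSpan]::FromHours(2))
Test-BackupScheduleFits -Interval ([TimeSpan]::FromHours(2)) -LongestJobTime ([TimeSpan]::FromHours(2))
```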

Azure Virtual Desktop

Announcing the Public Preview of AVD Insights at Scale: This update provides the ability to review performance and diagnostic information across multiple host pools in one view. Aidan’s note – no additional diagnostics settings are required.
Confidential Virtual Machine support for Azure Virtual Desktop now in Public Preview: Azure Virtual Desktop has public preview support for Azure Confidential Virtual Machines. Confidential Virtual Machines increase data privacy and security by protecting data in use.
Announcing general availability of RDP Shortpath: RDP Shortpath improves the transport reliability of Azure Virtual Desktop connections by establishing a direct UDP data flow between the Remote Desktop client and session hosts. This feature is enabled by default for all customers. Aidan’s Note – I haven’t looked into this, but there may be networking issues where firewalls/routing are deployed.
Announcing general availability of FSLogix 2210: This latest version is focused on three core features, six bug fixes, and two general updates.

Virtual Machines

Public preview: New Memory Optimized VM sizes – E96bsv5 and E112ibsv5: The new E96bsv5 and E112ibsv5 VM sizes, part of the Azure Ebsv5 VM series, offer the highest remote storage performance of any Azure VMs to date. The new VMs can now achieve even higher VM-to-disk throughput and IOPS performance with up to 8,000 MBps and 260,000 IOPS.
Generally Available: Azure Dedicated Host – Restart: Azure Dedicated Host gives you more control over the hosts you deployed by giving you the option to restart any host. When undergoing a restart, the host and its associated VMs will restart while staying on the same underlying physical hardware.

Governance

Public preview: Use tag inheritance for cost management: You no longer need to ensure that every resource is tagged or rely on resource providers to support and emit tags in their billing pipeline for cost management. Aidan’s Note – Restricted to EA/MCA … which unreasonably sucks. The latest example of “cost management” excluding other customers.

App Services

Generally available: Static Web Apps Diagnostics: Static Web Apps diagnostics will help you diagnose what went wrong and will show you how to resolve the issues.

Storage

Public preview: Azure NetApp Files cross-zone replication: The cross-zone replication feature allows you to replicate your Azure NetApp Files volumes asynchronously from one Azure availability zone (AZ) to another in the same region.

Azure Site Recovery

Public Preview: Azure Site Recovery Higher Churn Support: Azure Site Recovery (ASR) has increased its data churn limit by approximately 2.5x to 50 MB/s per disk. With this, you can configure disaster recovery (DR) for Azure VMs with data churn of up to 100 MB/s per VM. This helps you to enable DR for more I/O-intensive workloads.
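A 2.5x increase to 50 MB/s implies the old per-disk limit was around 20 MB/s. To check whether a VM fits under the new limits, you need both numbers: every disk must stay under the per-disk limit, and the sum must stay under the per-VM limit. This is a minimal sketch, assuming you already have measured peak churn per disk; Test-AsrChurnSupported is hypothetical, and the limits are the preview figures quoted above, which may change.

```powershell
# Limits quoted in the announcement (preview); treat them as assumptions that may change.
$PerDiskLimitMBps = 50
$PerVmLimitMBps   = 100

# Hypothetical helper: compare measured per-disk churn against the ASR limits.
function Test-AsrChurnSupported {
    param (
        [Parameter(Mandatory)]
        [double[]]$DiskChurnMBps
    )
    $disksOverLimit = @($DiskChurnMBps | Where-Object { $_ -gt $PerDiskLimitMBps })
    $totalChurn = ($DiskChurnMBps | Measure-Object -Sum).Sum
    [PSCustomObject]@{
        TotalChurnMBps = $totalChurn
        DisksOverLimit = $disksOverLimit.Count
        Supported      = ($disksOverLimit.Count -eq 0) -and ($totalChurn -le $PerVmLimitMBps)
    }
}

# Three disks at 40, 35 and 20 MB/s: each is under 50 and the total (95) is under 100.
Test-AsrChurnSupported -DiskChurnMBps 40, 35, 20
```

Note that three disks at 45 MB/s each would pass the per-disk test but still fail the per-VM test.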

Networking

General availability: Feature enhancements to Azure Web Application Firewall (WAF): Azure’s global Web Application Firewall (WAF) running on Azure Front Door, and Azure’s regional WAF running on Application Gateway, now support additional features that help organizations improve their security posture and make it easier to manage logging across resources.

Miscellaneous

Public Preview: Introducing Multi-Region Replication for Azure Key Vault Managed HSM: The feature allows you to extend a managed HSM pool from one Azure region to another, enhancing the availability of mission-critical cryptographic keys with automated key replication, and improving read throughput and latency by using the closest available region.

Microsoft Ignite 2018: Implement Cloud Backup & Disaster Recovery At Scale in Azure

Speakers: Trinadh Kotturu, Senthuran Sivananthan, & Rochak Mittal

Site Recovery At Scale

Senthuran Sivananthan

Real Solutions for Real Problems

Customer example: Finastra.

BCP process: Define RPO/RTO. Document DR failover triggers and approvals.
Access control: Assign clear roles and ownership. Leverage ASR built-in roles for RBAC. Use a different Recovery Services vault (RSV) for each BU/tenant; Finastra deployed 1 RSV per app to do this.
Plan your DR site: Leveraged region pairs – useful for matching GRS replication of storage. Site connectivity needs to be planned. Pick the primary/secondary regions to align service availability and quota availability – change the quotas now, not later when you invoke the BCP.
Monitor: Monitor replication health. Track configuration changes in environment – might affect recovery plans or require replication changes.
DR drills: Periodically do test failovers.
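The “define RPO” and “monitor replication health” steps meet in one simple check: is the newest recovery point younger than the RPO you promised the business? Here’s a minimal sketch of that comparison. Test-RpoCompliance is hypothetical, and the timestamp of the latest recovery point would come from your ASR monitoring; fetching it is not shown.

```powershell
# Hypothetical RPO check: is the newest recovery point recent enough?
function Test-RpoCompliance {
    param (
        [Parameter(Mandatory)] [DateTime]$LatestRecoveryPointUtc,
        [Parameter(Mandatory)] [TimeSpan]$RpoTarget,
        [DateTime]$NowUtc = [DateTime]::UtcNow   # injectable so the check can be tested
    )
    $age = $NowUtc - $LatestRecoveryPointUtc
    [PSCustomObject]@{
        RecoveryPointAgeMinutes = [Math]::Round($age.TotalMinutes, 1)
        WithinRpo               = $age -le $RpoTarget
    }
}

# A recovery point taken 10 minutes ago meets a 15-minute RPO target.
$now = [DateTime]::new(2018, 9, 24, 12, 0, 0, [DateTimeKind]::Utc)
Test-RpoCompliance -LatestRecoveryPointUtc $now.AddMinutes(-10) -RpoTarget ([TimeSpan]::FromMinutes(15)) -NowUtc $now
```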

Journey to Scale

Automation: Do things at scale
Azure Policy: Ensure protection
Reporting: Holistic view and application breakdown
Pre- & Post- Scripts: Lower RTO as much as possible and eliminate human error

Demos – ASR

Rochak does demos of recent features. Azure Policies for ASR are coming soon.

The policy will assess whether VMs are being replicated and display any non-compliance.
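The evaluation such a policy performs is conceptually simple: every VM in scope either is protected or is flagged. Here’s a minimal sketch of that logic only, not of Azure Policy itself. The inventory objects, their IsReplicated property, and Get-NonCompliantVm are all hypothetical; in real life that data would come from Azure rather than being typed in.

```powershell
# Hypothetical inventory; in reality this data would come from Azure, not be typed in.
$vms = @(
    [PSCustomObject]@{ Name = 'web01'; ResourceGroup = 'prod-web'; IsReplicated = $true  }
    [PSCustomObject]@{ Name = 'web02'; ResourceGroup = 'prod-web'; IsReplicated = $false }
    [PSCustomObject]@{ Name = 'sql01'; ResourceGroup = 'prod-db';  IsReplicated = $true  }
)

function Get-NonCompliantVm {
    param ([Parameter(Mandatory)] [object[]]$Inventory)
    # A VM in scope is non-compliant when it is not protected by replication.
    $Inventory | Where-Object { -not $_.IsReplicated }
}

# Lists web02 as the only unprotected machine.
Get-NonCompliantVm -Inventory $vms | Select-Object Name, ResourceGroup
```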

Expanding the monitoring solution.

Demo – Azure Backup & Azure Policy

Trinadh creates an Azure Policy and assigns it to a subscription. He picks the Azure Backup policy definition. He selects a resource group of the vault, selects the vault, and selects the backup policy from the vault. The result is that any VM within the scope of the policy will automatically be backed up to the selected RSV with the selected policy.

Azure Backup & Security

Supports Azure Disk Encryption. KEK and BEK are backed up automatically.

AES 256 protects the backup blobs.

Compliance

HIPAA
ISO
CSA
GDPR
PCI-DSS
Many more

Built-in Roles

Cumulative:

Backup Reader: View only
Backup Operator: Enable backup & restore
Backup Contributor: Policy management and Stop/Delete Backup
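“Cumulative” means each role includes everything the previous one grants. This minimal sketch models that inheritance. The capability names are illustrative labels, not real Azure RBAC actions, and Get-EffectiveCapability is hypothetical.

```powershell
# Simplified model of the cumulative built-in Azure Backup roles described above.
# Capability names are illustrative labels, not real Azure RBAC actions.
$RoleCapabilities = [ordered]@{
    'Backup Reader'      = @('View')
    'Backup Operator'    = @('EnableBackup', 'Restore')
    'Backup Contributor' = @('ManagePolicy', 'StopAndDeleteBackup')
}

function Get-EffectiveCapability {
    param (
        [Parameter(Mandatory)]
        [ValidateSet('Backup Reader', 'Backup Operator', 'Backup Contributor')]
        [string]$Role
    )
    $result = @()
    foreach ($name in $RoleCapabilities.Keys) {
        # Each role inherits everything granted by the roles listed before it.
        $result += $RoleCapabilities[$name]
        if ($name -eq $Role) { break }
    }
    return $result
}

# Backup Operator can view, enable backup, and restore, but not manage policies.
Get-EffectiveCapability -Role 'Backup Operator'
```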

Protect the Roles

PIM can be used to guard the roles – protect against rogue admins.

JIT access
MFA
Multi-user approval

Data Security

PIN protection for critical actions, e.g. delete
Alert: Notification on critical actions
Recovery: Data kept for 14 days after delete. Working on blob soft delete

Backup Center Demo

Being built at the moment. Starting with VMs now but will include all backup items eventually.

All RSVs in the tenant (doh!) managed in a central place.

Aimed at the large enterprise.

They also have Log Analytics monitoring if you like that sort of thing. I’m not a fan of LA – I much prefer Azure Monitor.

Reporting using Power BI

Trinadh demos a Power BI reporting solution that unifies backup data from multiple tenants into a single report.

Microsoft Ignite 2018–Azure Migrate

I arrived late for this session because I was in a meeting. They were doing a demo of Azure Migrate.

Azure Migrate for Discovery And Assessment

Agentless discovery
TCO calculation
Right-size and suitability
Azure Platform

They are “announcing” support for Hyper-V – it’s still in limited private preview.

Third Party Solutions

Cloudamize is just an assessment tool

In-depth performance analysis
Right-size compute and storage options.
TCO calculations
Agentless
Assessments for migration to Azure SQL
Integrates into ASR to do the migration

Migration solutions:

ASR
Zerto
CloudEndure

Azure Site Recovery (ASR)

Easy to onboard – appliance wizard for VMware
Broad coverage for Windows and Linux
UEFI support for VMware and physical machines – converted to BIOS
W2008 32-bit support

They do a demo of Zerto for migrations. Then they demo CloudEndure.

Futures

They’re trying to simplify the process. Starting a limited private preview:

Assess > Migrate & modernize > optimize > secure & manage.

Going to use the new tabbed UI in the Azure Portal. You can import an assessment into a migration. Pick the ready machines that you want to migrate, and optionally apply the Hybrid Use Benefit (HUB) and override VM sizing, OS disk, and availability set membership. This migration experience will ideally be used by the 3rd parties too.

Windows Server 2019 Announced for H2 2018

Last night, Microsoft announced that Windows Server 2019 would be released, generally available, in the second half of 2018. I suspect that the big bash will be Ignite in Orlando at the end of September, possibly with a release that week, but maybe in October – that’s been the pattern lately.

LTSC

Microsoft is referring to WS2019 as a “long term servicing channel release”. When Microsoft started the semi-annual channel (a Server Core build of Windows Server released every 6 months to Software Assurance customers who opt into the program), they promised that the normal builds would continue every 3 years. These LTSC releases would be approximately the sum of the previous semi-annual channel releases, plus whatever new stuff they cooked up before the launch.

First, let’s kill some myths that I know are being spread by “someone I know that’s connected to Microsoft” … it’s always “someone I know” that is “connected to Microsoft” and it’s always BS:

The GUI is not dead. The semi-annual channel release is Server Core, Nano Server has been containers-only since last year, and the GUI is an essential element of the LTSC.
This is not the last LTSC release. Microsoft views (and recommends) LTSC for non-cloud-optimised application workloads such as SQL Server.
No – Windows Server is not dead. Yes, Azure plays a huge role in the future, but Azure Stack and Azure are both powered by Windows, and hundreds of thousands, if not millions, of companies still are powered by Windows Server.

Let’s talk features now …

I’m not sure what’s NDA and what is not, so I’m going to stick with what Microsoft has publicly discussed. Sorry!

Project Honolulu

For those of you who don’t keep up with the tech news (that’s most IT people), Project Honolulu is a huge effort by MS to replace the Remote Server Administration Tools (RSAT) that you might know as “Administrative Tools” on Windows Server or on an admin PC. These ancient tools were built on MMC.EXE, which was deprecated with the release of W2008!

Honolulu is a whole new toolset built on HTML5 for today and the future. It’s not finished – being built with cloud practices, it never will be – but it’s getting there!

Hybrid Scenarios

Don’t share this secret with anyone … Microsoft wants more people to use Azure. Shh!

Some of the features we (at work) see people adopt first in the cloud are the hybrid services, such as Azure Backup (cloud or hybrid cloud backup), Azure Site Recovery (disaster recovery), and soon I think Azure File Sync (seamless tiered storage for file servers) will be a hot item. Microsoft wants it to be easier for customers to use these services, so they will be baked into Project Honolulu. I think that’s a good idea, but I hope it’s not a repeat of what was done with WS2016 Essentials.

ASR needs more than just “replicate me to the cloud” enabled on the server; that’s the easy part of the deployment that I teach in the first couple of hours in a 2-day ASR class. The real magic is building a DR site, knowing what can be replicated and what cannot (see domain controllers & USN rollback, clustered/replicating databases & getting fired), orchestration, automation, and how to access things after a failover.

Backup is pretty easy, especially if it’s just MARS. I’d like MARS to add backup-to-local storage so it could completely replace Windows Server Backup. For companies with Hyper-V, there’s more to be done with Azure Backup Server (MABS) than just download an installer.

Azure File Sync also requires some thought and planning, but if they can come up with some magic, I’m all for it!

Security

In Hyper-V:

Linux will be supported with Shielded VMs.
VMConnect support is being added to Shielded VMs for support reasons – it’s hard to fix a VM if you cannot log into it via “console” access.
Encrypted Network Segments can be turned on with a “flip of a switch” for secure comms – that could be interesting in Azure!

Windows Defender ATP (Advanced Threat Protection) is a Windows 10 Enterprise feature that’s coming to WS2019 to help stop zero-day threats.

DevOps

The big bet on Containers continues:

The Server Core base image will be reduced from 5GB by (they hope) 72% to speed up deployment time of new instances/apps.
Kubernetes orchestration will be natively supported – the container orchestrator that originated in Google appears to be the industry winner versus Docker and Mesos.

In the heterogeneous world, Windows Server is getting the Windows Subsystem for Linux (WSL), giving Linux admins a unified scripting/admin experience.

Hyper-Converged Infrastructure (HCI)

Storage Spaces Direct (S2D) has been improved and more changes will be coming to mature the platform in WS2019. In case you don’t know, S2D is a way to use local (internal) disks in 2+ (preferably 4+) Hyper-V hosts across a high-speed network (virtual SAS bus) to create a single cluster with fault tolerance at the storage and server levels. By using internal disks, they can use cheaper SATA disks, as well as new flash formats that don’t natively support sharing, such as NVMe.

The platform is maturing in WS2019, and Project Honolulu will add a new day-to-day management UI for S2D that is natively lacking in WS2016.

The Pricing

As usual, I will not be answering any licensing/pricing questions. Talk to the people you pay to answer those questions, i.e. the reseller or distributor that you buy from.

OK; let’s get to the messy stuff. Nothing has been announced other than:

It is highly likely we will increase pricing for Windows Server Client Access Licensing (CAL). We will provide more details when available.

So it appears that User CALs will increase in pricing. That is probably good news for anyone licensing Windows Server via processor (don’t confuse this with Core licensing).

When you acquire Windows Server through volume licensing, you pay for every pair of cores in a server (with a minimum of 16 cores, which matched the pricing of WS2012 R2), PLUS you buy User CALs for every user authenticating against the server(s).
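The per-core arithmetic is easier to see in code. This is a minimal sketch of the WS2016-style core rules: every physical core is licensed, with at least 8 core licenses per processor and at least 16 per server, sold in 2-core packs. It deliberately ignores CALs, editions, and virtualization rights, and Get-CoreLicensePacks is a hypothetical helper, so check your actual terms with whoever you buy from.

```powershell
# Sketch of Windows Server 2016-style core licensing arithmetic (CALs not included).
function Get-CoreLicensePacks {
    param (
        [Parameter(Mandatory)] [ValidateRange(1, 16)]  [int]$Processors,
        [Parameter(Mandatory)] [ValidateRange(1, 256)] [int]$CoresPerProcessor
    )
    # Each processor counts as at least 8 cores; the whole server as at least 16.
    $perProcessor = [Math]::Max($CoresPerProcessor, 8)
    $coreLicenses = [Math]::Max($perProcessor * $Processors, 16)
    [PSCustomObject]@{
        CoreLicenses = $coreLicenses
        TwoCorePacks = [int][Math]::Ceiling($coreLicenses / 2.0)
    }
}

Get-CoreLicensePacks -Processors 2 -CoresPerProcessor 6    # 16 core licenses, 8 packs
Get-CoreLicensePacks -Processors 2 -CoresPerProcessor 12   # 24 core licenses, 12 packs
```

A small 1 x 4-core server still pays for 16 cores, which is why the minimum matters more for small hosts than for big ones.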

When you acquire Windows Server via Azure or through a hosting/leasing (SPLA) program, you pay for Windows Server based only on how many cores the machine has. For example, when I run an Azure virtual machine with Windows Server, the per-minute cost of the VM includes the cost of Windows Server, and I do not need any Windows Server CALs to use it (RDS is a different matter).

If CALs are going up in price, then it’s probably good news for SPLA (hosting/leasing) resellers (hosting companies) and Azure where Server CALs are not a factor.

The Bits

So you want to play with WS2019? The first preview build (17623) is available as of last night through the Windows Server Insider Preview program. Anyone can sign up.

Would You Like To Learn About Azure Infrastructure?

If you found this information useful, then imagine what 2 days of training might mean to you. I’m delivering a 2-day course in Amsterdam on April 19-20, teaching newbies and experienced Azure admins about Azure Infrastructure. There’ll be lots of in-depth information, covering the foundations, best practices, troubleshooting, and advanced configurations. You can learn more here.

Replicate VM Managed Disks Between Azure Regions

Last week, Microsoft announced that Azure Site Recovery (ASR) for Azure Virtual Machines (in preview still), the system for replicating Azure virtual machines from one region to another, added support for managed disks. To this I say …

Waaahoooooo!

Managed disks are the best way to deploy Azure VM storage because they’re easier to plan for (performance), have predictable pricing (Standard), and have way more management features. Unfortunately, I still found myself advising some customers to use un-managed disks (disks in storage accounts) because those customers needed to be able to replicate VMs from one region to another, e.g. North Europe to West Europe.

But now we have support for managed disks in the preview replication service.

All is not entirely rosy. I’ve been waiting on this feature for this web server since before a “non-“hurricane hit Ireland late last year. I tried to enable the feature (nice experience in the Azure portal, btw) but the replication fails because of a weird “disk.name” error. I’ve reported the issue and hopefully it’ll be fixed.

Would You Like To Learn How To Enable This Feature?

Speaking At European SharePoint, Office 365 & Azure Conference 2017

I will be speaking at this year’s European SharePoint, Office 365, and Azure Conference, which is being held in the National Conference Center in Dublin between 13-16 November. I’ll be talking about Azure Site Recovery (ASR):

It’s a huge event with lots of tracks, content and speakers from around the world.

For those of you in Ireland, this is a rare opportunity to attend a Microsoft-focused conference of such a scale here in Ireland.

My Experience at Cloud & Datacenter Conference Germany

Last week I was in Munich for the Cloud & Datacenter Germany conference. I landed in Munich on Wednesday for a pre-conference Hyper-V community event, and 2 hours later I was talking to a packed room of over 100 people about implementing Azure Site Recovery with Windows Server 2016 Hyper-V. This talk was very different to my usual “When Disaster Strikes” talk; I wanted to do something different so instead of an hour of PowerPoint, I had 11 slides, half of which were the usual title, who I am, etc, slides. Most of my time was spent doing live demos and whiteboarding using Windows 10 Ink on my Surface Book.

Photo credit: Carsten Rachfahl (@hypervserver)

On Friday I took the stage to do my piece for the conference, and I presented my Hidden Treasures in Windows Server 2016 Hyper-V talk. This was slightly evolved from what I did last month in Amsterdam – I chopped out lots of redundant PowerPoint and spent more time on live demos. As usual with this talk, which I’d previously done on WS2012 R2 for TechEd Europe 2014 and Ignite 2015, I ran all of my demos using PowerShell scripts.

Photo credit: Benedikt Gasch (@BenediktGasch)

One of the great things about attending these events is that I get to meet up with some of my Hyper-V MVP friends. It was great to sit down for dinner with them, and a few of us were still around for a quieter dinner on the Friday night. Below you can see me hanging out with Tudor Damian, Carsten Rachfahl, Ben Armstrong (Virtual PC Guy), and Didier Van Hoye.

As expected, CDC Germany was an awesome event with lots of great speakers sharing knowledge over 2 days. Plans have already started for the next event, so if you speak German and want to stay up to speed with Hyper-V, private & public cloud in the Microsoft world, then make sure you follow the news on https://www.cdc-germany.de/

Template For ASR Recovery Plan Runbook

I’m sharing a template PowerShell-based Azure Automation runbook that I’ve written, to enable advanced automation in the Recovery Plans that can be used in Azure Site Recovery (ASR) failover.

ASR allows us to orchestrate the failover and failback of virtual machines. This can be a basic ordering of virtual machine power-up and power-down actions, but we can also inject runbooks from Azure Automation. This allows us to do virtually anything inside of Azure during a failover or failback.

I needed to write a runbook for ASR recently and I found that I needed this runbook to be able to work differently in different scenarios:

Planned failover
Unplanned failover
Test failover
Cleanup after a test failover

That last one is tricky, but I’ll come back to it. The key to the magic is a parameter in the runbook called $RecoveryPlanContext. When ASR executes a runbook, it passes some information into the script via this parameter. The parameter has multiple attributes, as documented by Microsoft. Some of the interesting values we can query for are:

FailoverType: test, unplanned, or planned.
FailoverDirection: whether the failover is heading to the primary or to the secondary site.
VmMap: An array listing the names of the VMs included in the runbook.
ResourceGroupName: The name of the resource group that the failover VMs are in.

My template currently only queries for FailoverType, because my goal was to automate failover to Azure. However, there is as much value in dealing with FailoverDirection, because the runbook can be used to orchestrate a failback.
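To see how those attributes might be read, including FailoverDirection for a failback, here’s a minimal sketch that builds a sample context locally from JSON. The JSON shape and values (for example, “PrimaryToSecondary”, and a VmMap keyed by VM ID with per-VM ResourceGroupName and RoleName) are assumptions modelled on Microsoft’s published sample, so check the documentation before relying on them. Get-FailoverSummary is a hypothetical helper.

```powershell
# Sample context for local testing; field names and values are assumptions.
$sampleJson = @'
{
  "RecoveryPlanName": "web-recovery",
  "FailoverType": "Test",
  "FailoverDirection": "PrimaryToSecondary",
  "GroupId": "1",
  "VmMap": {
    "00000000-0000-0000-0000-000000000001": {
      "ResourceGroupName": "web-rg-asr",
      "RoleName": "web01-test"
    }
  }
}
'@
$context = $sampleJson | ConvertFrom-Json

function Get-FailoverSummary {
    param ([Parameter(Mandatory)] [object]$Context)
    # Assumption: a failback is reported as the opposite direction.
    $isFailback = $Context.FailoverDirection -eq 'SecondaryToPrimary'
    # VmMap is assumed to be an object whose property names are VM IDs.
    $vms = foreach ($prop in $Context.VmMap.PSObject.Properties) {
        '{0}/{1}' -f $prop.Value.ResourceGroupName, $prop.Value.RoleName
    }
    [PSCustomObject]@{
        FailoverType = $Context.FailoverType
        IsFailback   = $isFailback
        Vms          = @($vms)
    }
}

Get-FailoverSummary -Context $context
```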

Let’s assume that I use a runbook to start up some machine(s) during a failover. This machine could be a jump box, enabling remote access into that machine, and from there we can jump to the failed-over machines. That would be useful in all kinds of failover, including a test failover. But what happens if I run a test, am happy with everything, and run a cleanup task? Guess what … our recovery plan does not execute in reverse order, and the runbook is not executed. So what I’ve come up with is a way to tell the runbook that I want to do a cleanup. I run the runbook manually, typically before the cleanup (some tasks must be done before VMs are removed to make scripting easier, e.g. finding the PIPs used by VMs before the VMs and their NICs are deleted). The runbook still expects a parameter; you are prompted to enter a value, so I enter “cleanup”. Once my script sees that value, it runs a cleanup function.

My template has 4 functions, one for each of the above 4 scenarios. There is an if statement to look for cleanup, and if that’s not the value, a switch statement checks $RecoveryPlanContext.FailoverType to see what the scenario is. If some unexpected value is found, the runbook will exit.

param (
    [Object]$RecoveryPlanContext
)

function DoTestCleanup ()
{
    Write-Output ("This is a cleanup after test failover")
    # Do stuff here
}

function DoTestFailover ()
{
    Write-Output ("This is a test failover")
    # Do stuff here
}

function DoPlannedFailover ()
{
    Write-Output ("This is a planned failover")
    # Do stuff here
}

function DoUnplannedFailover ()
{
    Write-Output ("This is an unplanned failover")
    # Do stuff here
}

# The runbook starts here
Start-Sleep -Seconds 10
$connectionName = "AzureRunAsConnection"
try
{
    # Get the connection "AzureRunAsConnection"
    $servicePrincipalConnection = Get-AutomationConnection -Name $connectionName

    "Logging in to Azure using $connectionName ..."
    # AzureRM sign-in with a Run As account. Run As accounts have since been retired;
    # newer runbooks use the Az module with a managed identity (Connect-AzAccount -Identity).
    Add-AzureRmAccount -ServicePrincipal -TenantId $servicePrincipalConnection.TenantId -ApplicationId $servicePrincipalConnection.ApplicationId -CertificateThumbprint $servicePrincipalConnection.CertificateThumbprint
}
catch
{
    if (!$servicePrincipalConnection)
    {
        $ErrorMessage = "Connection $connectionName not found."
        throw $ErrorMessage
    }
    else
    {
        Write-Error -Message $_.Exception
        throw $_.Exception
    }
}

Write-Output ("The failover type parameter is $RecoveryPlanContext")

# A manual run with the value "cleanup" triggers the post-test cleanup
if ($RecoveryPlanContext -eq 'cleanup')
{
    DoTestCleanup
}
else
{
    # switch string matching is case-insensitive by default, so "test" matches "Test"
    switch ($RecoveryPlanContext.FailoverType)
    {
        "Test"      { DoTestFailover }
        "Planned"   { DoPlannedFailover }
        "Unplanned" { DoUnplannedFailover }
        default     { Write-Output ("Runbook aborted because no failover type was specified") }
    }
}
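One way to test the branching without starting a failover is to pull the decision into a pure function that returns the scenario name instead of doing any work. This is a hypothetical refactor of the template above (Get-RunbookScenario is not part of the original runbook); the runbook would then call the matching Do* function based on the returned name.

```powershell
# Hypothetical refactor for local testing: decide the scenario without touching Azure.
function Get-RunbookScenario {
    param (
        [object]$RecoveryPlanContext
    )
    if ($null -eq $RecoveryPlanContext) { return 'Abort' }

    # A manual run for cleanup passes the plain string "cleanup" (-eq ignores case).
    if ($RecoveryPlanContext -is [string]) {
        if ($RecoveryPlanContext -eq 'cleanup') { return 'TestCleanup' }
        return 'Abort'
    }

    $failoverType = $RecoveryPlanContext.FailoverType
    if (-not $failoverType) { return 'Abort' }

    # switch string matching is case-insensitive by default, as in the runbook.
    switch ($failoverType) {
        'Test'      { return 'TestFailover' }
        'Planned'   { return 'PlannedFailover' }
        'Unplanned' { return 'UnplannedFailover' }
        default     { return 'Abort' }
    }
}

Get-RunbookScenario -RecoveryPlanContext 'cleanup'
Get-RunbookScenario -RecoveryPlanContext ([PSCustomObject]@{ FailoverType = 'planned' })
```

The first call returns TestCleanup and the second returns PlannedFailover. Keeping the Azure sign-in out of this function means it can be exercised on any machine with PowerShell.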

Hurricane Matthew – Start Those Planned Failovers

A hurricane is about to blast its way up the east coast of the USA, making landfall in south Florida probably early on Friday morning, and working its way up to Norfolk, VA, by Sunday morning. We know how much damage these hurricanes can do, especially if tides rise and seawater starts mixing with electrics, servers, and storage – we’re talking not just business down, but business offline, and maybe even business dead. I’m sorry, but even a stretch cluster to a nearby location is subject to the same mess.

This is when a true DR solution is required. “But I cannot afford a DR solution”, you say. You can’t afford to not have one, but I do know what you could have deployed (it’s too late now, by the way, if you are in the target zone for Hurricane Matthew). Azure Site Recovery (ASR) is an OPEX-based way to get a DR site in the cloud. The cost is a monthly drip feed instead of the CAPEX big bang that a traditional DR site is:

$25 per replicated machine per month, in Azure South Central US.
Replicated disk storage starts at $0.05 per GB in the same Azure region.
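To turn those two prices into a monthly figure, here’s a minimal sketch. It uses the South Central US prices quoted above as defaults, which are a snapshot in time rather than current rates, and “starts at” means real storage rates may be higher. It leaves out compute charges for failed-over VMs. Get-AsrMonthlyEstimate is hypothetical.

```powershell
# Rough monthly ASR cost using the South Central US prices quoted above.
# Prices change; treat these defaults as placeholders, not current rates.
function Get-AsrMonthlyEstimate {
    param (
        [Parameter(Mandatory)] [int]$ProtectedMachines,
        [Parameter(Mandatory)] [double]$ReplicatedGB,
        [double]$PricePerMachine = 25.0,   # per replicated machine per month
        [double]$PricePerGB      = 0.05    # replicated disk storage, per GB per month
    )
    $machineCost = $ProtectedMachines * $PricePerMachine
    $storageCost = $ReplicatedGB * $PricePerGB
    [PSCustomObject]@{
        MachineCost = $machineCost
        StorageCost = $storageCost
        Total       = $machineCost + $storageCost
    }
}

# 10 machines with 2,000 GB of replicated disks: 250 + 100 = 350 per month.
Get-AsrMonthlyEstimate -ProtectedMachines 10 -ReplicatedGB 2000
```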

The solution works with:

Hyper-V
vSphere
Physical servers

And it’s really simple to use and reliable; thousands (if not more) of businesses are deploying and testing ASR failovers on a regular basis. This out-of-the-box shared platform is tested constantly, which makes it way more reliable than some home-baked solution.

You get full orchestration – so if I saw the forecast today, I could start my business continuity plan, start the failover and hit the road. My machines would start a planned failover (ordered and no data loss) to Azure and would be waiting for me when I get to my rendezvous point. Note that my orchestration can also kick off PowerShell scripts (Azure Automation) to do some fancy things, such as redirecting internet traffic that I had routed using Azure Traffic Manager.

If you have ASR and are in one of the areas that will be affected, then do a test failover, do any required remediations, and then start that failover. Hopefully, your business is not damaged and you can do a failback afterwards (if you want to). If you don’t have a DR solution, I hope you survive, and have the sense to look at ASR soon afterwards – it is hurricane season!
