Virtual Machines

Default Outbound Access For VMs In Azure Will Be Retired

Microsoft has announced that the default route, an implicit public IP address, is being deprecated on 30 September 2025.

Background

Let’s define “Internet” for the purposes of this post. The Internet includes:

The actual Internet.
Azure services, such as Azure SQL or Azure’s KMS for Windows VMs, that are shared with a public endpoint (IP address).

We have had ways to access those services, including:

Public IP address associated with a NIC of the virtual machine
Load Balancer with a public IP address with the virtual machine being a backend
A NAT Gateway
An appliance, such as a firewall NVA or Azure Firewall, being defined as the next hop to Internet prefixes, such as 0.0.0.0/0

If a virtual machine is deployed without having any of the above, it still needs to reach the Internet to do things like:

Activate a Windows license against KMS
Download packages for Ubuntu
Use Azure services such as Key Vault, MySQL, Azure SQL, or storage accounts (diagnostics settings)

For that reason, all Azure virtual machines are able to reach the Internet using an implied public IP address. This is an address that is randomly assigned to SNAT the connection out from the virtual machine to the Internet. That address:

Is random and can change
Offers no control or security

Modern Threats

There are two things that we should have been designing networks to stop for years:

Malware command and control
Data exfiltration

The modern hack is a clever and gradual process. Ransomware is not some dumb bot that gets onto your network and goes wild. Some of the recent variants are manually controlled. The malware gets onto the network and attempts to call home to a “machine” on the Internet. From there, the controllers can explore the network and plan their attack. This is the command and control. This attempt to “call home” should be blocked by network/security designs that block outbound access to the Internet by default, opening only connections that are required for workloads to function.

The controller will discover more vulnerabilities and download more software, taking further advantage of vulnerable network/security designs. Backups are targeted for attack first, data is stolen, and systems are crippled and encrypted.

The data theft, or exfiltration, is to an IP address that a modern network/security design would block.

So you can see that a network design that relies on an implied public IP address is not good practice. This is a primary consideration for Microsoft in making its decision to end the future use of implied public IP addresses.

What Is Happening?

On September 30th, 2025, new virtual machines will no longer be able to use an implied public IP address. Existing virtual machines will be unaffected – but I want to drill into that because it’s not as simple as one might think.

A virtual machine is a resource in Azure. It’s not some disks. It’s not your concept of “I have something called X” that is a virtual machine. It’s a resource that exists. At some point, that resource might be removed. At that point, the virtual machine no longer exists, even if you recreate it with the exact same disks and name.

So keep in mind:

Virtual networks with existing VMs: The existing VMs are unaffected, but new VMs in the VNet will be affected and won’t work.
Scale-out: Let’s say you have a big workload with dozens of VMs with no public IP usage. You add more VMs and they don’t work – it’s because they don’t have an implied IP address, unlike their older siblings.
Restore from backup: You restore a VM to create a new VM. The new VM will not have an implied public IP address.
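
To make the "new versus existing" distinction concrete, here is a minimal sketch of the check you might run over an inventory. The `VmOutbound` record and its field names are hypothetical (they are not an Azure API), and the rule that only VMs created after the retirement date are affected is my reading of the announcement.

```python
from dataclasses import dataclass
from datetime import date

# Retirement date from the announcement.
RETIREMENT_DATE = date(2025, 9, 30)


@dataclass
class VmOutbound:
    """Hypothetical summary of one VM's outbound configuration."""
    name: str
    created: date
    nic_public_ip: bool = False       # public IP address on a NIC
    lb_public_backend: bool = False   # backend of a load balancer with a public IP
    nat_gateway: bool = False         # NAT Gateway associated with the subnet
    next_hop_appliance: bool = False  # route sending Internet prefixes to a firewall/NVA


def explicit_methods(vm: VmOutbound) -> list[str]:
    """List every explicit outbound method the VM has (order is not a precedence)."""
    checks = [
        (vm.nic_public_ip, "public IP on NIC"),
        (vm.lb_public_backend, "load balancer"),
        (vm.nat_gateway, "NAT Gateway"),
        (vm.next_hop_appliance, "next hop appliance"),
    ]
    return [label for present, label in checks if present]


def loses_outbound(vm: VmOutbound) -> bool:
    """Assumption: a VM is affected if it is created after the retirement
    date and has no explicit outbound method."""
    return vm.created > RETIREMENT_DATE and not explicit_methods(vm)


if __name__ == "__main__":
    inventory = [
        VmOutbound("web01", date(2023, 5, 1)),                    # existing VM
        VmOutbound("web01-restored", date(2025, 11, 2)),          # restore creates a new VM
        VmOutbound("web02", date(2025, 11, 2), nat_gateway=True),
    ]
    for vm in inventory:
        print(vm.name, explicit_methods(vm), "affected" if loses_outbound(vm) else "ok")
```

The restored VM is the interesting case: same name and disks as its older sibling, but it is a new resource with no implied address.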

Is This a Money Grab?

No, this is not a money grab. This is an attempt by Microsoft to correct a “wrong” (it was done to be helpful to cloud newcomers) that was done in the original design. Some of the mitigations are quite low-cost, even for small businesses. To be honest, what money could be made here is pennies compared to the much bigger money that is made elsewhere by Azure.

The goal here is to:

Be secure by default by controlling egress traffic to limit command & control and data exfiltration.
Provide more control over egress flows by selecting the appliance/IP address that is used.
Enable more visibility over public IP addresses, for example, what public address should I share with a partner for their firewall rules?
Drive better networking and security architectures by default.

What Is Your Mitigation?

There are several paths that you can choose.

Assign a public IP address to a virtual machine: This is the lowest cost option but offers no egress security. It can get quite messy if multiple virtual machines require public IP addresses. Rate this as “better than nothing”.
Use a NAT Gateway: This allows a single IP address (or a range from an Azure Public IP Address Prefix) to be shared across an entire subnet. Note that NAT Gateway gets messy if you span availability zones, requiring disruptive VNet and workload redesign. Again this is not a security option.
Use a next hop: You can use an appliance (virtual machine or Marketplace network virtual appliance) or the Azure Firewall as a next hop to the Internet (0.0.0.0/0) or specific Internet IP prefixes. This is a security option – a firewall can block unwanted egress traffic. If you are budget-conscious, then consider Azure Firewall Basic. No matter what firewall/appliance you choose, there will be some subnet/VNet redesign and changes required to routing, which could affect VNet-integrated PaaS services such as API Management Premium.
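
The next hop option depends on route selection, so a small illustration may help. This sketch models longest-prefix matching only; the route list, the next hop labels, and the `select_route` helper are all hypothetical, and real Azure route selection has additional tie-breaking rules that are not modelled here.

```python
import ipaddress


def select_route(routes, destination):
    """Return the (prefix, next_hop) pair whose prefix most specifically
    matches the destination, or None if nothing matches."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop in routes:
        network = ipaddress.ip_network(prefix)
        if dest in network:
            matches.append((network, next_hop))
    if not matches:
        return None
    # The longest prefix (largest prefix length) wins.
    network, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), next_hop


if __name__ == "__main__":
    # Hypothetical effective routes for a subnet.
    routes = [
        ("10.0.0.0/16", "virtual network"),
        ("0.0.0.0/0", "firewall at 10.0.1.4"),
    ]
    print(select_route(routes, "10.0.2.5"))       # stays inside the VNet
    print(select_route(routes, "93.184.216.34"))  # Internet traffic goes to the firewall
```

Because 0.0.0.0/0 matches everything, any destination without a more specific route ends up at the firewall, which is exactly where you can block unwanted egress.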

September 2025 is a long time away. But you have options to consider and potentially some network redesign work to do. Don’t sit around – start working.

In Summary

The implied route to the Internet for Azure VMs will stop being available to new VMs on September 30th, 2025. This is not a money grab – you can choose low-cost options to mitigate the effects if you wish. The hope is that you opt to choose better security, either from Microsoft or a partner. The deadline is a long time away. Do not assume that you are not affected – one day you will expand services or restore a VM from backup and be affected. So get started on your research & planning.

Azure Infrastructure Announcements – September 2023

September is a month of storms. There appears to have been lots of activity in the Azure cloud last month too. Everyone working on Azure should pay attention to the PAY ATTENTION! section.

PAY ATTENTION!

Default outbound access for VMs in Azure will be retired – transition to a new method of internet access

On 30 September 2025, default outbound access connectivity for virtual machines in Azure will be retired. After this date, all new VMs that require internet access will need to use explicit outbound connectivity methods such as Azure NAT Gateway, Azure Load Balancer outbound rules, or a directly attached Azure public IP address.

There will be more communications on this from Microsoft. But this is more than a “don’t worry about your existing VMs” situation. What happens when you add more VMs to an existing old network? What happens when you do a restore? What happens when you do an Azure Site Recovery failover? Those are all new VMs in old networks and they are affected. Everyone should do some work to see if they are affected and prepare remediations in advance – not on the day when they are stressed out by a restore or a Black Friday expansion.

App Service Environment version 1 and version 2 will be retired on 31 August 2024

After 31 August 2024, App Service Environment v1 and v2 will no longer be supported and these App Service Environments and the applications running on them will be deleted and any application data associated with them will be lost.

Oh yeah, you’d better start working on migrations now.

Azure Kubernetes Service

Application gateway for Containers vs Application Gateway Ingress Controller – What’s changed?

Application Gateway for Containers is a new application (layer 7) load balancing and dynamic traffic management product for workloads running in a Kubernetes cluster. At the time of writing, this service is in public preview. In this article we will look at the differences between AGIC and Application Gateway for Containers and some of the great new features available through this new offering.

I know little about AKS but this subject seems to have excited some AKS users.

A Bucket Load Of Stuff

Too much for me to get into and I don’t know enough about this stuff:

App Services

Announcing Public Preview of Free Hosting Plan for WordPress on App Service

We announced the General Availability of WordPress on App Service one year ago, in August 2022, with 3 paid hosting plans. We learnt that sometimes you might need to try out the service before you migrate your production applications. So, we are offering you a playground for a limited period – a free hosting plan to explore and experiment with WordPress on App Service. This will help you understand the offering better before you make a long-term investment.

They really want you to try this out – note that this plan is not for production workloads.

Hybrid

Announcing the General Availability of Jumpstart HCIBox

Almost one year ago the Jumpstart team released the public preview of HCIBox, our self-contained sandbox for exploring Azure Stack HCI capabilities without the need for physical hardware. Feedback from the community has been fantastic, with dozens of feature requests and issues submitted and resolved through our open-source community.

Today, the Jumpstart team is excited to announce the general availability of HCIBox!

It’s one thing to test out the software functionality of Azure Stack HCI. But the reality is that this is a hardware-centric solution and there is no simulating the performance, stability, or operations of something this complex.

Generally Available: Windows Server 2012 and 2012 R2 Extended Security Updates enabled by Azure Arc

Windows Server 2012 and 2012 R2 Extended Security Updates (ESUs) enabled by Azure Arc is now Generally Available. Windows Server 2012 and 2012 R2 are going End of Support on October 10, 2023. With ESUs, customers who are running Windows Server 2012 on-premises or in other clouds can get three more years of critical security updates from Microsoft to protect their End of Life infrastructure.

This is not free. This is tied into the news about Azure Update Manager (below).

Miscellaneous

Detailed CSP to EA Migration guidance and crucial considerations

In this blog, I’ve shared insights drawn from real-world migration experiences. This article can help you meticulously plan your own CSP to EA migration, ensuring a smoother transition while incorporating critical considerations into your migration strategy.

One really wishes that CSP, EA, etc were just differences in billing and not Azure APIs. Changing of billing should be like changing a phone plan.

Top 10 Considerations for running your workload successfully on Azure this Holiday Season

Black Friday, Small Business Saturday and Cyber Monday will test your app’s limits, and so it’s time for your Infrastructure and Application teams to ensure that your platforms deliver when it is needed the most. Be it shopping applications on the web and mobile or payment gateways or banking systems supporting payments or inventory systems or billing systems – anything and everything associated with the shopping season should be prepared to face the load for this holiday season.

The “holiday season” starts earlier every year. Tesco Ireland started in August. Amazon has a Prime Day next Tuesday (October 10). These events test systems harder than ever and monolithic on-prem designs will not handle it. It’s time to get ready – if it’s not already too late!

Ungated Public Preview: Azure API Center

We’re thrilled to share that Azure API Center is now open for everyone to try during our ungated public preview! Azure API Center is a new Azure service that is part of the Azure API Management platform. It is the central hub where you can effortlessly keep track of all your APIs company-wide, making them readily discoverable, reusable, and manageable.

Managing a catalog of APIs could be challenging. Tooling is welcome.

Generally available: Secure critical infrastructure from accidental deletions at scale with Policy

We are thrilled to announce the general availability of DenyAction, a new effect in Azure Policy! With the introduction of DenyAction, policy enforcement now expands into blocking requests based on actions to the resource. These deny action policy assignments can safeguard critical infrastructure by blocking unwarranted delete calls.

Can you believe that Azure was designed deliberately to not have a deny permission? Adding it after is not easy. The idea here is that delete locks on resources/resource groups become too easy to remove – and are frequently removed. Something, like a policy, that is enforced in the API (between you and the resources) is always applied and is not easy to remove and can be easily deployed at scale.
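
For a feel of what such a policy looks like, here is a rough sketch of a policy rule built as a Python dictionary and printed as JSON. The `denyAction` effect and the `actionNames` detail are written as I recall them from the Azure Policy documentation, so treat the exact field names as an assumption and verify them before use. The storage account resource type in the condition is only an example.

```python
import json

# Sketch of a policy rule that blocks delete calls on one resource type.
# Assumption: field names follow the Azure Policy definition schema as
# recalled; check the current documentation before deploying.
policy_rule = {
    "if": {
        "field": "type",
        "equals": "Microsoft.Storage/storageAccounts",  # example target only
    },
    "then": {
        "effect": "denyAction",
        "details": {
            "actionNames": ["delete"],
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(policy_rule, indent=2))
```

Unlike a delete lock, a rule like this is assigned at a scope such as a management group or subscription, so it can cover many resources at once.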

Generally available: Azure Premium SSD v2 Disk Storage is now available in more regions

Azure Premium SSD v2 Disk Storage is now available in Australia East, France Central, Norway East and UAE North regions. This next-generation storage solution offers advanced general-purpose block storage with the best price performance, delivering sub-millisecond disk latencies for demanding IO-intensive workloads at a low cost.

Expanded region availability makes this something more interesting. But, Azure Backup support is in very limited preview since the Spring.

Announcing the general availability of new Azure burstable virtual machines

We are announcing the general availability of the latest generations of Azure Burstable virtual machine (VM) series – the new Bsv2, Basv2, and Bpsv2 VMs based on the Intel® Xeon® Platinum 8370C, AMD EPYC™ 7763v, and Ampere® Altra® Arm-based processors respectively.

Faster and cheaper than the previous editions of B-Series VMs and they include ARM support too. The new virtual machines support all remote disk types such as Standard SSD, Standard HDD, Premium SSD and Ultra Disk storage.

Generally Available: Azure Update Manager

We are pleased to announce that Azure Update Manager, previously known as Update Management Center, is now generally available.

The controversial news is that Arc-managed machines will cost $5/month. I’m still not sold on this solution – it still feels less than legacy solutions like WSUS.

Announcing Public Preview of NVMe-enabled Ebsv5 VMs offering 400K IOPS and 10GBps throughput

Today, we are announcing a Public Preview of accelerated remote storage performance using Azure Premium SSD v2 or Ultra disk and selected sizes within the existing NVMe-enabled Ebsv5 family. The higher storage performance is offered on the E96bsv5 and E112ibsv5 VM sizes and delivers up to 400K IOPS (I/O operations per second) and 10GBps of remote disk storage throughput.

Even the largest SQL VM that I have worked with comes nowhere near these specs. The customer(s) that have justified this investment by Microsoft must be huge.

Azure savings plan for compute: How the benefit is applied

Organizations are benefiting from Azure savings plan for compute to save up to 65% on select compute services – and you could too. By committing to spending a fixed hourly amount for either one year or three years, you can save on plans tailored to your budget needs. But you may wonder how Azure applies this benefit.

It’s simple really. The system looks at your VMs, calculates the theoretical savings, and first applies your discount to the machines where you will save the most money, and then repeats until your discount is used.
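
The "biggest saving first" idea can be sketched as a greedy allocation over one hour of usage. This is a toy model, not Microsoft's billing engine: the rates are made up, the `apply_savings_plan` helper is hypothetical, and I am assuming that the hourly commitment is consumed at the discounted (plan) rate.

```python
def apply_savings_plan(commitment, vms):
    """Spread one hour's commitment across VMs, highest savings percentage first.

    vms: list of (name, pay_as_you_go_rate, plan_rate) tuples for one hour.
    Returns (covered, leftover), where covered is a list of
    (name, fraction_of_the_hour_covered) and leftover is unused commitment.
    """
    remaining = commitment
    covered = []
    # Rank by the percentage saved when the plan rate replaces pay-as-you-go.
    ranked = sorted(vms, key=lambda vm: 1 - vm[2] / vm[1], reverse=True)
    for name, _payg_rate, plan_rate in ranked:
        if remaining <= 0:
            break
        spend = min(remaining, plan_rate)  # cannot spend more than is left
        covered.append((name, spend / plan_rate))
        remaining -= spend
    return covered, remaining


if __name__ == "__main__":
    # Hypothetical hourly rates: (name, pay-as-you-go, savings plan rate).
    vms = [
        ("sql01", 1.00, 0.50),  # 50% saving
        ("web01", 1.00, 0.75),  # 25% saving
        ("app01", 2.00, 1.25),  # 37.5% saving
    ]
    covered, leftover = apply_savings_plan(1.00, vms)
    print(covered, leftover)
```

Anything not covered by the commitment is simply billed at the normal pay-as-you-go rate.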

General Availability: Share VM images publicly with community gallery – Azure Compute Gallery feature

With community gallery, a new feature of Azure Compute Gallery, you can now easily share your VM images with the wider Azure community. By setting up a ‘community gallery’, you can group your images and make them available to other Azure customers. As a result, any Azure customer can utilize images from the community gallery to create resources such as virtual machines (VMs) and VM scale sets.

This is a cool idea.

Trusted Launch for Azure VMware Solution virtual machines

Azure VMware Solution proudly introduces Public Preview of Trusted Launch for Virtual Machines. This advanced feature comprises Secure Boot, Virtual Trusted Platform Module (vTPM), and Virtualization-based Security (VBS), collectively forming a formidable defense against modern cyber threats.

A feature that was introduced in Windows Server 2016 Hyper-V.

Infrastructure-As-Code

Introduction to Azure DevOps Workload identity federation (OIDC) with Terraform

Workload identity federation is an OpenID Connect implementation for Azure DevOps that allows you to use short-lived, credential-free authentication to Azure without the need to provision self-hosted agents with managed identity. You configure a trust between your Azure DevOps organisation and an Azure service principal. Azure DevOps then provides a token that can be used to authenticate to the Azure API.

This looks like a more secure way to authenticate your pipelines. No secrets are stored, and a trust between your DevOps organisation and Azure enables short-lived authentication with the desired access rights/scopes.

Quickstart: Automate an existing load test with CI/CD

In this article, you learn how to automate an existing load test by creating a CI/CD pipeline in Azure Pipelines. Select your test in Azure Load Testing, and directly configure a pipeline in Azure DevOps that triggers your load test with every source code commit. Automate load tests with CI/CD to continuously validate your application performance and stability under load.

This is not something that I have played with but I suspect that you don’t want to do this against production systems!

General Availability: GitHub Advanced Security for Azure DevOps

Starting September 20th, 2023, the core scanning capabilities of GitHub Advanced Security for Azure DevOps can now be self-enabled within Azure DevOps and connect to Microsoft Defender for Cloud. Customers can automate security checks in the developer workflow using:

Code Scanning: locates vulnerabilities in source code and provides remediation guidance.
Secret Scanning: identifies high-confidence secrets and blocks developers from pushing secrets into code repositories.
Dependency Scanning: discovers vulnerabilities with open-source dependencies and automates update alerts for developers.

This seems like a good direction to go but I’m told it’s quite pricey.

Networking

General availability: Sensitive Data Protection for Application Gateway Web Application Firewall

WAF running on Application Gateway now supports sensitive data protection through log scrubbing. When a request matches the criteria of a rule, and triggers a WAF action, that event is captured within the WAF logs. WAF logs are stored as plain text for debuggability, and any matching patterns with sensitive customer data like IP address, passwords, and other personally identifiable information could potentially end up in logs as plain text. To help safeguard this sensitive data, you can now create log scrubbing rules that replace the sensitive data with “******”.
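
As an illustration of the masking idea only, here is a minimal sketch that scrubs a log line with regular expressions. The real feature is configured as scrubbing rules on the WAF itself; the patterns, the sample log line, and the `scrub` helper below are hypothetical.

```python
import re

MASK = "******"

# Hypothetical scrubbing rules: (pattern, replacement).
RULES = [
    # Keep the "password=" key but mask its value.
    (re.compile(r"(password=)[^&\s]+", re.IGNORECASE), r"\g<1>" + MASK),
    # Mask anything shaped like an IPv4 address.
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), MASK),
]


def scrub(line):
    """Apply every scrubbing rule to a single log line."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line


if __name__ == "__main__":
    raw = "client=203.0.113.7 uri=/login?user=bob&password=hunter2"
    print(scrub(raw))
```

The point is that the sensitive value never reaches the stored log, while the rest of the line stays useful for debugging.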

Sounds good to me!

General availability: Gateway Load Balancer IPv6 Support

Azure Gateway Load Balancer now supports IPv6 traffic, enabling you to distribute IPv6 traffic through Gateway Load Balancer before it reaches your dual-stack applications.

With this support, you can now add IPv6 frontend IP addresses and backend pools to Gateway Load Balancer. This allows you to inspect, protect, or mirror both IPv4 and IPv6 traffic flows using third-party or custom network virtual appliances (NVAs).

Useful for security architectures where NVAs are being used.

Azure Backup

Preview: Cross Region Restore (CRR) for Recovery Services Agent (MARS) using Azure Backup

We are announcing the support of Cross Region Restore for Recovery Services Agent (MARS) using Azure Backup.

This makes sense. Let’s say I back up my on-prem data, located in Virginia, to Azure East US, in Boydton Virginia. And then there’s a disaster in VA that wipes out my office and Azure East US. Now I can restore to a new location from the paired region replica.

Preview: Save Azure Backup Recovery Services Agent (MARS) passphrase to Azure Key Vault

Now, you can save your Azure Recovery Services Agent encryption passphrase in Azure Key Vault directly from the console, making the Recovery Services Agent installation seamless and secure.

This beats the old default option of saving it as a text file on the machine that you were backing up.

General availability: Selective Disk Backup and Restore in Enhanced Policy for Azure VM Backup

We are adding the “Selective Disk Backup and Restore” capability in Enhanced Policy of Azure VM Backup.

Be careful out there!

Storage

General Availability: Malware Scanning in Defender for Storage

Malware Scanning in Defender for Storage will be generally available September 1, 2023.

Please make sure that you read up on how much this will cost you. The DfC plans changed recently, and the pricing model for Storage plans changed to include this feature.

Azure Monitor

Public preview: Alerts timeline view

Azure Monitor alerts is previewing a new timeline view that simplifies the consumption experience of fired alerts. The new view has the following advantages:

Shows fired alerts on a timeline
Helps identify co-occurrence of alerts
Displays alerts in the context of the resources they fired on
Focuses on showing counts of alerts to better understand impact
Supports viewing alerts by severity
Provides a more intuitive discovery and investigation path

This might be useful if you are getting a lot of alerts.

Azure Virtual Desktop

Announcing general availability of Azure Virtual Desktop Custom Image Templates

Custom image templates allow admins to build a custom “golden image” using the Azure Virtual Desktop management user interface. Leverage a variety of built-in customizations or add your own customization scripts to install applications or configurations.

Why are they not using Azure Image Builder like I do?

Azure Infrastructure Announcements – August 2023

This post brings you a summary of the infrastructure announcements from Azure that were made during August 2023. There are lots of announcements from Storage and a few interesting notes for VMs, networking, and ASR.

Storage

Azure Managed Lustre: not your grandparents’ parallel file system

With a few clicks of a web interface or an Azure Resource Manager template, AMLFS lets you provision an all-flash Lustre file system in minutes. What’s different is that this Lustre file system is all yours. If someone else in Azure is running a job that creates a million files, you won’t ever know it because your Lustre servers and SSDs are exclusively yours.

Massively scaled and high performance file systems for HPC workloads.

General availability | Azure NetApp Files: SMB Continuous Availability (CA) shares

To enhance resiliency during storage service maintenance operations, SMB volumes used by Citrix App Layering, FSLogix user profile containers and Microsoft SQL Server on Microsoft Windows Server can be enabled with Continuous Availability.

SMB Transparent Failover means that clients should not notice maintenance operations.

Public preview: Azure Storage Mover support for SMB and Azure Files

Storage Mover is a fully managed migration service that enables you to migrate on-premises files and folders to Azure Storage while minimizing downtime for your workload. Azure Storage Mover can now migrate your SMB shares to Azure file shares.

To be honest, I’ve not encountered a “replace the file server with Azure Files” scenario yet. Third-party vendors often won’t support it for LOB apps. User data typically ends up in SharePoint/OneDrive. And wouldn’t most Citrix/RDS admins want to start with new profiles?

Generally available: Azure Blob Storage Cold Tier

Azure Blob Storage Cold Tier is now generally available. It is a new online access tier that is the most cost-effective Azure Blob offering for storing infrequently accessed data with long-term retention requirements, while providing instant access. The pricing of the cold tier storage option lies between the cool and archive tiers, and it follows a 90-day early deletion policy. You can seamlessly utilize the cold tier in the same way as the hot and cool tiers.

Cool – Cold. Tell me that isn’t confusing. The scenario is that you want to store data for a long time, but you need it immediately available. Archive requires a restore (“rehydration”) that can take up to 15 hours and can be accelerated for a charge. Cold is one step up, but not as cost-effective.
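
The 90-day early deletion policy is worth a quick worked example. This sketch assumes the charge is prorated over the days remaining in the minimum period and uses a 30-day month; the price per GB is a made-up figure, not a real Azure rate.

```python
COLD_MINIMUM_DAYS = 90  # early deletion period quoted in the announcement


def early_deletion_days(days_stored, minimum_days=COLD_MINIMUM_DAYS):
    """Days still billable if a blob is deleted or moved after days_stored."""
    return max(0, minimum_days - days_stored)


def early_deletion_charge(size_gb, days_stored, price_per_gb_month,
                          minimum_days=COLD_MINIMUM_DAYS):
    """Assumption: the charge is prorated over the remaining days,
    using a 30-day month for simplicity."""
    remaining = early_deletion_days(days_stored, minimum_days)
    return size_gb * price_per_gb_month * remaining / 30


if __name__ == "__main__":
    # Hypothetical price: 0.003 per GB per month.
    print(early_deletion_days(30))                # deleted after 30 days
    print(early_deletion_charge(100, 30, 0.003))  # 100 GB deleted after 30 days
    print(early_deletion_charge(100, 120, 0.003)) # kept past the minimum period
```

In other words, the cold tier only pays off if the data really does sit there for at least three months.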

Public Preview: Azure NetApp Files Cloud Backup for Virtual Machines

With Cloud Backup for Virtual Machines, you can now create VM consistent snapshot backups of VMs on Azure NetApp Files datastores. The associated virtual appliance installs in the Azure VMware Solution cluster and provides policy-based automated and consistent backup of VMs integrated with Azure NetApp Files snapshot technology for fast backups and restores of VMs, groups of VMs (organized in resource groups) or complete datastores lowering RTO, RPO, and improving total cost of ownership.

General Availability: Incremental snapshots for Premium SSD v2 Disk and Ultra Disk Storage

You can now instantly restore Premium SSD v2 and Ultra Disks from snapshots and attach them to a running VM without waiting for any background copy of data. This new capability allows you to read and write data on disks immediately after creation from snapshots, enabling you to recover your data from accidental deletes or a disaster quickly.

I can see third-party backup making use of this.

Azure Elastic SAN updates: Private Endpoints & Shared Volumes

As we approach general availability of Azure Elastic SAN, we continue improving the service and adding features based on your feedback. Today, we are releasing private endpoint support and volume sharing support via SCSI (Small Computer System Interface) Persistent Reservation.

This sounds like the sort of feature maturity one will expect as the service approaches general availability. I wonder what the actual target market is for this service.

Azure Site Recovery

Private Preview – DR for Shared Disks – Azure Site Recovery

We are excited to announce the Private Preview of DR for Azure Shared Disks for workloads running Windows Server Failover Clusters (WSFC) on Azure VMs. Now you can protect, monitor, and recover your WSFC-clusters as a single unit across its DR Lifecycle, while also generating cluster-consistent recovery points – which are consistent across all the disks (including the Shared Disk) of the cluster.

This feature is long overdue for customers using shared virtual hard disks to create failover clusters.

Networking

Public preview: Support for new custom error pages in Application Gateway

In addition to the response codes 403 and 502, the Azure Application Gateway now lets you configure company-branded error pages for more response codes – 400, 405, 408, 500, 503, and 504. You can configure these error pages at a global level to apply to all the listeners on your gateway or individually for each listener.

These pages can be shared on any publicly accessible URI.

Azure Firewall: New Monitoring and Logging Updates

Notes:

(Preview) With the Azure Firewall Resource Health check, you can now view the health status of your Azure Firewall and address service problems that may affect your Azure Firewall resource. Resource Health allows IT teams to receive proactive notifications regarding potential health degradations and recommended mitigation actions for each health event type.
(Preview) The Azure Firewall Workbook presents a dynamic platform for analyzing Azure Firewall data. Within the Azure portal, you can utilize it to generate visually engaging reports.
(GA) The Latency Probe metric is designed to measure the overall latency of Azure Firewall and provide insight into the health of the service. IT administrators can use the metric for monitoring and alerting if there is observable latency and diagnosing if the Azure Firewall is the cause of latency in a network.

Resource health should make for a useful alert, especially when enabling DevSecOps – be aware of the dreaded “out of sync” error. I just tried the workbook in a production system – I noticed a couple of things that I might not have otherwise noticed because they didn’t trigger a human response (yet). The latency probe is interesting – I think it originated from customer network performance scenarios where it was suspected that the firewall was the root cause.

Virtual Machines

Public preview: Azure Mv3 Medium Memory (MM) Virtual Machines

Today we are announcing the public preview of the next generation Mv3 Medium Memory (MM) virtual machine series. Powered by the 4th Generation Intel® Xeon® Scalable Processor and DDR5 DRAM technology, the Mv3 medium memory (MM) virtual machines can scale for SAP workloads from 250GB to 4TB. With Azure Boost, Mv3 MM provides a ~25% improvement in network throughput and up to 1.5X improvement in remote storage throughput over the previous M-series families.

These machines start at 12 vCPUs and 240 GB RAM, scaling up to 176 vCPUs and 2794 GB RAM. That should just about be enough to run Teams.

The Azure IaaS Book Of News – December 2022

Here’s all the news that I thought was interesting for Ops and Security folks working with Azure IaaS from December 2022.

Azure VMware Solution

Azure VMware Solution Advanced Monitoring: This solution add-on deploys a virtual machine running Telegraf in Azure with a managed identity that has contributor and metrics publisher access to the Azure VMware Solution private cloud object. Telegraf then connects to vCenter Server and NSX-T Manager via API and provides responses to API metric requests from the Azure portal.

Azure Kubernetes Service

Microsoft and Isovalent partner to bring next generation eBPF dataplane for cloud-native applications in Azure: Microsoft announces the strategic partnership with Isovalent to bring Cilium’s eBPF-powered networking data plane and enhanced features for Kubernetes and cloud-native infrastructure. Azure Kubernetes Services (AKS) will now be deployed with Cilium open-source data plane and natively integrated with Azure Container Networking Interface (CNI). Microsoft and Isovalent will enable Isovalent Cilium Enterprise as a Kubernetes container App offering onto Azure Container Marketplace. This will provide a one-click deployment solution to Azure Kubernetes clusters with Isovalent Cilium Enterprise advanced features.
Generally Available: Kubernetes 1.25 support in AKS: AKS support for Kubernetes release 1.25 is now generally available. Kubernetes 1.25 delivers 40 enhancements. This release includes new changes such as the removal of PodSecurityPolicy.

Azure Backup

General Availability of Cross Zonal Restore of Azure Virtual Machines from Azure Backup: With the preview of Cross Zonal Restore of Azure VMs, Azure Backup offers a compelling set of durability options for your backup data including ZRS for intra-region high durability. Aidan’s note – you should consider this with regions such as Norway East where the paired region is unavailable to 99.9% of customers.
How to automate On-Demand Azure Backup for Azure Virtual Machines using PowerShell: Aidan’s note – A solution to enable more frequent VM backups than otherwise possible, but make sure frequency doesn’t overlap with backup job time.

Azure Virtual Desktop

Announcing the Public Preview of AVD Insights at Scale: This update provides the ability to review performance and diagnostic information across multiple host pools in one view. Aidan’s note – no additional diagnostics settings are required.
Confidential Virtual Machine support for Azure Virtual Desktop now in Public Preview: Azure Virtual Desktop has public preview support for Azure Confidential Virtual Machines. Confidential Virtual Machines increase data privacy and security by protecting data in use.
Announcing general availability of RDP Shortpath: RDP Shortpath improves the transport reliability of Azure Virtual Desktop connections by establishing a direct UDP data flow between the Remote Desktop client and session hosts. This feature is enabled by default for all customers. Aidan’s Note – I haven’t looked into this but there may be networking issues where firewalls/routing are deployed.
Announcing general availability of FSLogix 2210: This latest version is focused on three core features, six bug fixes, and two general updates.

Virtual Machines

Public preview: New Memory Optimized VM sizes – E96bsv5 and E112ibsv5: The new E96bsv5 and E112ibsv5 VM sizes, part of the Azure Ebsv5 VM series, offer the highest remote storage performance of any Azure VMs to date. The new VMs can now achieve even higher VM-to-disk throughput and IOPS performance with up to 8,000 MBps and 260,000 IOPS.
Generally Available: Azure Dedicated Host – Restart: Azure Dedicated Host gives you more control over the hosts you deployed by giving you the option to restart any host. When undergoing a restart, the host and its associated VMs will restart while staying on the same underlying physical hardware.

Governance

Public preview: Use tag inheritance for cost management: You no longer need to ensure that every resource is tagged or rely on resource providers to support and emit tags in their billing pipeline for cost management. Aidan’s Note – Restricted to EA/MCA … which unreasonably sucks. The latest example of “cost management” excluding other customers.

App Services

Generally available: Static Web Apps Diagnostics: Static Web Apps diagnostics will help you diagnose what went wrong and will show you how to resolve the issues.

Storage

Public preview: Azure NetApp Files cross-zone replication: The cross-zone replication feature allows you to replicate your Azure NetApp Files volumes asynchronously from one Azure availability zone (AZ) to another in the same region.

Azure Site Recovery

Public Preview: Azure Site Recovery Higher Churn Support: Azure Site Recovery (ASR) has increased its data churn limit by approximately 2.5x to 50 MB/s per disk. With this, you can configure disaster recovery (DR) for Azure VMs having data churn up to 100 MB/s. This helps you to enable DR for more IO intensive workloads.

Networking

General availability: Feature enhancements to Azure Web Application Firewall (WAF): Azure’s global Web Application Firewall (WAF) running on Azure Front Door, and Azure’s regional WAF running on Application Gateway, now support additional features that help organizations improve their security posture and make it easier to manage logging across resources.

Miscellaneous

Public Preview: Introducing Multi-Region Replication for Azure Key Vault Managed HSM: The feature allows you to extend a managed HSM pool from one Azure region to another, enhancing the availability of mission-critical cryptographic keys with automated key replication and improving read throughput and latency by using the closest available region.

Understanding the Azure Image Builder Resources

In this post, I will explain the roles of and links/connections between the various resources used by Azure Image Builder.

Background

I enjoy the month of July. My customers, all in the Nordics, are off for the entire month and I am working. This year has been a crazy busy one so far, so there has been almost no time in the lab – noticeable, I’m sure, from my lack of writing. But this month, if all goes to plan, I will have plenty of time in the lab. As I type, a pipeline is deploying a very large lab for me. While that runs, I’ve been doing some hands-on lab work.

Recently I helped develop and use an image building process, based on Packer, to regularly create images for a Citrix farm hosted in Microsoft Azure. It’s a pretty sweet solution that is driven from Azure DevOps and results in a very automated deployment that requires little work to update app versions or add/remove apps. At the time, I quickly evaluated Azure Image Builder (also based on Packer but still in Preview back then) but I thought it was too complicated and would still require the same pieces as our Packer solution. But I did decide to come back to Azure Image Builder when there was time (today) and have another look.

The first mission – figure out the resource complexity (compared to Packer by itself).

The Resources

I believe that one of Microsoft’s failings when documenting these services is their inability to explain the functions of the resources and how they work together. Working primarily in ARM templates, I get to see that stuff (a little). I’ve always felt that understanding the underlying system helps with understanding the solution – it was that way with Hyper-V and that continues with Azure.

Managed Identity – Microsoft.ManagedIdentity/userAssignedIdentities

A managed identity will be used by an Image Template to authorise Packer to use the imaging process that you are building. A custom role is associated with this Managed Identity, granting Packer rights to the resource group that the Shared Image Gallery, Image Definition, and Image Template are stored in.

Shared Image Gallery – Microsoft.Compute/galleries

The Shared Image Gallery is the management resource for images. The only notable attribute in the deployment is the name of the resource, which sadly, is similar to things like Storage Accounts in lacking standardisation with the rest of Microsoft Azure resource naming.

Image Definition – Microsoft.Compute/galleries/images

The Image Definition documents your image as you would like to present it to your “customers”.

The Image Definition is associated with the Shared Image Gallery by naming. If your Shared Image Gallery was named “myGallery” then an image definition called “myImage” would actually be named as “myGallery/myImage”.

The properties document things including:

VM generation
OS type
Generalised or not
How you will brand the images built from the Image Definition

Image Template – Microsoft.VirtualMachineImages/imageTemplates

This is where you will end up spending most of your time while operating the imaging process over time.

The Image Template describes to Packer (hidden by Azure) how it will build your image:

Identity points to the resource ID of the Managed Identity, permitting Packer to sign in as that identity/receiving its rights when using this Image Template to build an Image Version.
Properties:
- Source: The base image from the Azure Marketplace to start the build with.
- Customize: The tasks that can be run, including PowerShell scripts that can be downloaded, to customise the image, including installing software, configuring the OS, patching and rebooting.
- Distribute: Here you associate the Image Template with an Image Definition, referencing the resource ID of the desired Image Definition. Every time you run this Image Template, a new Image Version of the Image Definition will be created.
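To make the shape of an Image Template easier to picture, here is a minimal sketch that builds the resource body as a Python dictionary. The property names follow my understanding of the ARM schema for Microsoft.VirtualMachineImages/imageTemplates and should be checked against the template reference before use. The function name, the Marketplace image values, the task name, and the run output name are placeholders that I made up for illustration, and the API version is deliberately left out.

```python
def build_image_template(name, location, identity_id, image_definition_id, script_uri):
    """Return an ARM-style body for an Image Template (illustrative sketch only)."""
    return {
        "type": "Microsoft.VirtualMachineImages/imageTemplates",
        "name": name,
        "location": location,
        # Identity: the user-assigned Managed Identity that the build runs as.
        "identity": {
            "type": "UserAssigned",
            "userAssignedIdentities": {identity_id: {}},
        },
        "properties": {
            # Source: the Marketplace base image (placeholder values).
            "source": {
                "type": "PlatformImage",
                "publisher": "MicrosoftWindowsServer",
                "offer": "WindowsServer",
                "sku": "2019-Datacenter",
                "version": "latest",
            },
            # Customize: ordered tasks, here a downloaded script and then a reboot.
            "customize": [
                {"type": "PowerShell", "name": "installApps", "scriptUri": script_uri},
                {"type": "WindowsRestart"},
            ],
            # Distribute: each run adds a new Image Version to this Image Definition.
            "distribute": [
                {
                    "type": "SharedImage",
                    "galleryImageId": image_definition_id,
                    "runOutputName": name + "-output",
                    "replicationRegions": [location],
                }
            ],
        },
    }
```

Passing the result to json.dumps gives something close to the resource block you would place in an ARM template, which makes it easy to see that the identity, the source, the customisation tasks, and the distribution target are the only moving parts.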

Image Version – Microsoft.Compute/galleries/images/versions

An Image Version, a resource with a messy resource name that will break your naming standards, is created when you build from an Image Template. The name of the Image Version is based on the name of the Image Definition plus an incremental number. If my Image Definition is named “myGallery/myImage” then the Image Version will be named “myGallery/myImage/<unique number>”.

The properties of this resource include a publishing profile, documenting to what regions an image is replicated and how it is stored.
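The nested naming is easier to see in code. This is a small sketch, using helper names that I invented, of how the child resource names and the full resource ID of an Image Version are composed. The subscription ID, resource group, and version value in the usage example are placeholders.

```python
def gallery_child_name(gallery, image=None, version=None):
    """Compose the ARM resource name of a gallery, image definition, or image version."""
    if version is not None and image is None:
        raise ValueError("an image version needs an image definition name")
    parts = [gallery]
    if image is not None:
        parts.append(image)
    if version is not None:
        parts.append(version)
    return "/".join(parts)


def image_version_id(subscription_id, resource_group, gallery, image, version):
    """Compose the full resource ID of an Image Version."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Compute/galleries/{gallery}"
        f"/images/{image}/versions/{version}"
    )


if __name__ == "__main__":
    print(gallery_child_name("myGallery", "myImage"))
    print(gallery_child_name("myGallery", "myImage", "1.0.0"))
    print(image_version_id("<subscription-id>", "myRg", "myGallery", "myImage", "1.0.0"))
```

The resource ID is the value that a virtual machine deployment references when it should be built from a particular Image Version.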

What Is Not Covered

Packer will create a resource group and virtual machine (and associated resources) to build the new image. The way that the virtual machine is networked (public IP address by default) can normally be manipulated by the Image Template when using Packer.

Summary

There is a lot more here than with a simple run of Packer. But, Azure Image Builder provides a lot more functionality for making images available to “customers” across an enterprise-scale deployment; that’s really where all the complexity comes from and I guess “releasing” is something that Microsoft knows a lot about.

What Impact on You Will AMD EPYC Processors Have?

Microsoft has announced new HB-V2, Das_v3, and Eas_v3 virtual machines based on hosts with AMD EPYC processors. What does this mean to you and when should you use these machines instead of the Intel Xeon alternatives?

A is for AMD

The nomenclature for Azure virtual machines is large. It can be confusing for those unfamiliar with the meanings. When I discussed the A-Series, the oldest of the virtual machine series, I would tell people “A is the start of the alphabet” and discuss these low power machines. The A-Series was originally hosted on physical machines with AMD Opteron processors, a CPU that had lots of cores and required little electricity when compared to the Intel Xeon competition. These days, an A-Series might actually be hosted on hosts with Intel CPUs, but each virtual processor is throttled to offer similar performance to the older hosts.

Microsoft has added the AMD EPYC 7002 family of processors to their range of hosts, powering new machines:

HB_v2: A high performance compute machine with high bandwidth between the CPU and RAM.
Das_v3 (and Da_v3): A new variation on the Ds_v3 that offers fast disk performance that is great for database virtual machines.
Eas_v3 (and Ea_v3): Basically the Das_v3 with extra RAM.

EPYC Versus Xeon

The 7002 or “Rome” family of EPYC processors is AMD’s second generation of this type of processor. From everything I have read, this generation of the processor family firmly returns AMD back into the data centre.

I am not a hardware expert, but some things really stand out about the EPYC, which AMD claims is revolutionary in how it focuses on I/O, which is pretty important for services such as databases (see the Ds_v3/Es_v3 core scenarios). EPYC uses PCIe Gen 4, which has double the performance of the Gen 3 that Intel still uses. That’s double the bus to storage … great for disk performance. The EPYC offers 45% faster RAM access than the Intel option … hence Microsoft’s choice for the HB_v2. If you want to get nerdy, then there are fewer NUMA nodes per socket, which reduces context switches for complex RAM v process placement scenarios.

Why AMD Now?

There have been rumours that Microsoft hasn’t been 100% happy with Intel for quite a while. Everything I heard was in the PC market (issues with 4th generation, battery performance, mobility, etc). I have not heard any rumours of discontent between Azure and Intel – in fact, the DC-Series virtual machine exists because of cooperation between the two giant technology corporations on SGX. But two things are evident:

Competition is good
Everything you read about AMD’s EPYC makes it sound like a genuine Xeon killer. As AMD says, Xeon is a BMW 3-series and EPYC is a Tesla – I hope the AMD build quality is better than the American-built EV!
As is often the case, the AMD processor is more affordable to purchase and to power – both big deals for a hosting/cloud company.

Choosing Between AMD and Xeon

OK, it was already confusing which machine to choose when deploying in Azure … unless you’ve heard me explain the series and specialisation meanings. But now we must choose between AMD and Intel processors!

I was up at 5 am researching so this next statement is either fuzzy or was dreamt up (I’m not kidding!): it appears that for multi-threaded applications, such as SQL Server, AMD-powered virtual machines are superior. However, even in this age-of-the-cloud, single-threaded applications are still running corporations. In that case, (this is where things might be fuzzy) an Intel Xeon-powered virtual machine might be best. You might think that single-threaded applications are a thing of the past but I recently witnessed the negative effect on performance of one of those – no matter what virtual hardware was thrown at it.

The final element of the equation will be cost. I have no idea how the cost of the EPYC-powered machines will compare with the Xeon-powered ones. I do know that the AMD processor is cheaper and offers more threads per socket, and it should require less power. That should make it a cheaper machine to run, but higher consumption of IOs per machine might increase the cost to the hosting company (Azure). I guess we’ll know soon enough when the pricing pages are updated.

Webinar – Getting More Performance From Azure VMs

I will be doing a webinar later today for the European SharePoint Office 365 & Azure Community (from the like-named conference). The webinar is at 14:00 UK/Irish, 15:00 CET, and 09:00 EST. Registration is here.

Title: Getting More Performance from Azure Virtual Machines

Speaker: Aidan Finn, MVP, Ireland

Date and Time: Wed, May 1, 2019 3:00 PM – 4:00 PM CEST

Webinar Description: You’ve deployed your shiny new application in the cloud, and all that pride crashes down when developers and users start to complain that it’s slow. How do you fix it? In this session you’ll learn to understand what Azure virtual machines can offer, how to pick the right ones for the right job, and how to design for the best possible performance, including networking, storage, processor, and GPU.

Key benefits of attending:
– Understand virtual machine design
– Optimise storage performance
– Get more from Azure networking

Azure Availability Zones in the Real World

I will discuss Azure’s availability zones feature in this post, sharing what they can offer for you and some of the things to be aware of.

Uptime Versus SLA

Noobs to hosting and cloud focus on three magic letters: S, L, A or service level agreement. This is a contractual promise that something will be running for a certain percentage of time in the billing period or the hosting/cloud vendor will credit or compensate the customer.

You’ll hear phrases like “three nines”, or “four nines” to express the measure of uptime. The first is a 99.9% measure, and the second is a 99.99% measure. Either is quite a high level of uptime. Azure does have SLAs for all sorts of things. For example, a service deployed in a valid virtual machine availability set has a connectivity (uptime) SLA of 99.95%.
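To put those percentages into perspective, here is a quick sketch that converts an SLA percentage into the downtime it permits. The function name and the 30-day billing period are my own assumptions for the arithmetic; the real SLA terms define how the period and downtime are measured.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Minutes of downtime permitted by an SLA over a billing period of `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


if __name__ == "__main__":
    for sla in (99.9, 99.99):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% allows about {minutes:.1f} minutes of downtime in 30 days")
```

Over 30 days, three nines works out at roughly 43 minutes of permitted downtime and four nines at a little over 4 minutes.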

Why did I talk about noobs? Promises are easy to make. I once worked for a hosting company that offered a ridiculous 100% SLA for everything, including cheap-ass generic Pentium “servers” from eBay with single IDE disks. 100% is an unachievable target because … let’s be real here … things break. Even systems with redundant components have downtime. I prefer to see realistic SLAs and honest statements on what you must do to get that guarantee.

Azure gives us those sorts of SLAs. For virtual machines we have:

99.9% for machines with just Premium SSD disks
99.95% for services running in a valid availability set
99.99% for services running in multiple availability zones

Ah… let’s talk about that last one!

Availability Sets

First, we must discuss availability sets and what they are before we move one step higher. An availability set is anti-affinity, a feature of vSphere and in Hyper-V Failover Clustering (PowerShell or SCVMM); this is a label on a virtual machine that instructs the compute cluster to spread the virtual machines across different parts of the cluster. In Azure, virtual machines in the same availability set are placed into different:

Update domains: Avoiding downtime caused by (rare) host reboots for updates.
Fault domains: Enable services to remain operational despite hardware/software failure in a single rack.
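The idea of spreading machines across domains can be sketched in a few lines. This is only an illustration: the round-robin assignment, the domain counts, and the function names are my assumptions, and Azure decides the real placement.

```python
def place_in_availability_set(vm_names, fault_domains=2, update_domains=5):
    """Assign each VM a fault domain and an update domain in round-robin order."""
    placement = {}
    for index, name in enumerate(vm_names):
        placement[name] = {
            "fault_domain": index % fault_domains,
            "update_domain": index % update_domains,
        }
    return placement


def survivors(placement, failed_fault_domain):
    """Names of the VMs that keep running when one fault domain (rack) fails."""
    return sorted(
        name
        for name, domains in placement.items()
        if domains["fault_domain"] != failed_fault_domain
    )


if __name__ == "__main__":
    layout = place_in_availability_set(["web0", "web1", "web2", "web3"])
    print(layout)
    print(survivors(layout, failed_fault_domain=0))
```

With two or more machines in the set, the loss of any single fault domain still leaves at least one machine running, which is the whole point of the feature.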

The above solution spreads your machines around a single compute (Hyper-V) cluster, in a single room, in a single building. That’s amazing for on-premises, but there can still be an issue. Last summer, a faulty humidity sensor brought down one such room and affected a “small subset” of customers. “Small subset” is OK, unless you are included and some mission critical system was down for several hours. At that point, SLAs are meaningless – a refund for the lost runtime cost of a pair of Linux VMs running network appliance software won’t compensate for thousands or millions of Euros of lost business!

Availability Zones

We can go one step further by instructing Azure to deploy virtual machines into different availability zones. A single region can be made up of different physical locations with independent power and networking. These locations might be close together, as is typically the case in North Europe or West Europe. Or they might be on the other side of a city from each other, as is the case in some regions in North America. There is a low level of latency between the buildings, but this is still higher than that of a LAN connection.

A region that supports availability zones is split into 4 zones. You see three zones (round robin between customers), labeled as 1, 2, and 3. You can deploy many services across availability zones – this is improving:

VNet: Is software-defined so can cross all zones in a single region.
Virtual machines: Can connect to the same subnet/address space but be in different zones. They are not in availability sets but Azure still maintains service uptime during host patching/reboots.
Public IP Addresses: Standard IP supports anycast and can be used to NAT/load balance across zones in a single region.

Other network resources can work with availability zones in one of two ways:

Zonal: Instances are deployed to a specific zone, giving optimal latency performance within that zone, but can connect to all zones in the region.
Zone Redundant: Instances are spread across the zones for an active/active configuration.

Examples of the above are:

The zone-aware VNet gateways for VPN/ExpressRoute
Standard load balancer
WAGv2 / WAFv2

Considerations

There are some things to consider when looking at availability zones.

Regions: The list of regions that supports availability zones is increasing slowly but it is far from complete. Some regions will not offer this highest level of availability.
Catchup: Not every service in Azure is aware of availability zones, but this is changing.

Let me give you two examples. The first is VM Boot Diagnostics, a service that I consider critical for seeing the console of the VM and getting serial console access without a network connection to the virtual machine. Boot Diagnostics uses an agent in the VM to write to a storage account. That storage account can be:

LRS: 3 replicas reside in a single compute cluster, in a single room, in a single building (availability zone).
GRS: LRS plus 3 asynchronous replicas in the paired region, that are not available for write unless Microsoft declares a total disaster for the primary region.

So, if I have a VM in zone 1 and a VM in zone 2, and both write to a storage account that happens to be in zone 1 (I have no control over the storage account location), and zone 1 goes down, there will be issues with the VM in zone 2. The solution would be to use ZRS GPv2 storage for Boot Diagnostics, however, the agent will not support this type of storage configuration. Gotcha!
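The dependency problem can be shown with a tiny model. This is a sketch with names that I made up; it simply checks which virtual machines are hit when a zone fails, either because the machine is in that zone or because its Boot Diagnostics storage account is.

```python
def impact_of_zone_failure(failed_zone, vm_zones, storage_zone):
    """Map each affected VM to the reason it is affected by a zone failure."""
    impact = {}
    for vm, zone in vm_zones.items():
        if zone == failed_zone:
            impact[vm] = "virtual machine is in the failed zone"
        elif storage_zone == failed_zone:
            impact[vm] = "Boot Diagnostics storage is in the failed zone"
    return impact


if __name__ == "__main__":
    vms = {"vm1": 1, "vm2": 2}
    # The storage account happens to live in zone 1.
    print(impact_of_zone_failure(1, vms, storage_zone=1))
    print(impact_of_zone_failure(2, vms, storage_zone=1))
```

When zone 1 fails, both machines appear in the result even though only one of them is in zone 1. When zone 2 fails, only the machine in zone 2 is affected.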

Azure Advisor will also be a pain in the ass. Noobs are told to rely on Advisor (it is several questions in the new Azure infrastructure exams) for configuration and deployment advice. Advisor will see the above two VMs as being not highly available because they are not (and cannot be) in a common availability set, so you are advised to degrade their SLA by migrating them to a single zone for an availability set configuration – ignore that advice and be prepared to defend the decision from Azure noobs, such as management, auditors, and ill-informed consultants.

Opinion

Availability zones are important – I use them in an architecture pattern that I am working on with several customers. But you need to be aware of what they offer and how certain things do not understand them yet or do not support them yet.

Generation 2 Virtual Machines Make Their First Public Appearance in Microsoft Azure

Microsoft has revealed that the new preview series of confidential computing virtual machines, the DC-Series, which went into public preview overnight are based on Generation 2 (Gen 2) Hyper-V virtual machines. This is the first time that a non-Generation 1 (Gen 1) VM has been available in Azure.

Note that ASR allows you to migrate/replicate Generation 2 machines into Azure by converting them into Generation 1 at the time of failover.

These confidential compute VMs use hardware features of the Intel chipset to provide secure enclaves to isolate the processing of sensitive data.

The creation process for a DC-Series is a little different than usual – you have to look for Confidential Compute VM Deployment in the Marketplace and then you work through a (legacy blade-based) customised deployment that is not as complete as a normal virtual machine deployment. In the end a machine appears.

I’ve taken a screenshot from a normal Azure VM including a view of Device Manager from Windows Server 2016 with the OS disk.

Note that both the OS disk and the Temp Drive are IDE drives on a Virtual HD ATA controller. This is typical of a Generation 1 virtual machine. Also note the IDE/ATA controller.

Now have a look at a DC-Series machine:

Note how the OS disk and the Temp Drive are listed as Microsoft Virtual Disk on SCSI controllers? Ah – definitely a Generation 2 virtual machine! Also do you see the IDE/ATA controller is missing from the device listing? If you expand System Devices you will find that the list is much smaller. For example, the Hyper-V S3 Cap PCI bus video controller (explained here by Didier Van Hoye) of Generation 1 is gone.

Did you Find This Post Useful?

If you found this information useful, then imagine what 2 days of training might mean to you. I’m delivering a 2-day course in Frankfurt on December 3-4, teaching newbies and experienced Azure admins about Azure Infrastructure. There’ll be lots of in-depth information, covering the foundations, best practices, troubleshooting, and advanced configurations. You can learn more here.

Microsoft Ignite 2018: Implement Cloud Backup & Disaster Recovery At Scale in Azure

Speakers: Trinadh Kotturu, Senthuran Sivananthan, & Rochak Mittal

Site Recovery At Scale

Senthuran Sivananthan

Real Solutions for Real Problems

Customer example: Finastra.

BCP process: Define RPO/RTO. Document DR failover triggers and approvals.
Access control: Assign clear roles and ownership. Leverage ASR built-in roles for RBAC. Different RS vault for different BU/tenants. They deployed 1 RSV per app to do this.
Plan your DR site: Leveraged region pairs – useful for matching GRS replication of storage. Site connectivity needs to be planned. Pick the primary/secondary regions to align service availability and quota availability – change the quotas now, not later when you invoke the BCP.
Monitor: Monitor replication health. Track configuration changes in environment – might affect recovery plans or require replication changes.
DR drills: Periodically do test failovers.

Journey to Scale

Automation: Do things at scale
Azure Policy: Ensure protection
Reporting: Holistic view and application breakdown
Pre- & Post- Scripts: Lower RTO as much as possible and eliminate human error

Demos – ASR

Rochak for demos of recent features. Azure Policies coming soon.

Will assess if VMs are being replicated or not and display non-compliance.

Expanding the monitoring solution.

Demo – Azure Backup & Azure Policy

Trinadh creates an Azure Policy and assigns it to a subscription. He picks the Azure Backup policy definition. He selects a resource group of the vault, selects the vault, and selects the backup policy from the vault. The result is that any VM within the scope of the policy will automatically be backed up to the selected RSV with the selected policy.
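The effect of the assignment scope can be sketched without calling Azure at all. In this illustration, which uses function names and resource IDs that I invented, a virtual machine is covered when its resource ID sits underneath the scope that the policy was assigned to, and every covered machine gets the same vault and backup policy.

```python
def in_scope(resource_id, scope):
    """True when the resource ID is the scope itself or sits underneath it."""
    resource = resource_id.lower().rstrip("/")
    root = scope.lower().rstrip("/")
    return resource == root or resource.startswith(root + "/")


def plan_backups(vm_ids, scope, vault_name, backup_policy_name):
    """Pair every in-scope VM with the vault and backup policy from the assignment."""
    return {
        vm_id: {"vault": vault_name, "policy": backup_policy_name}
        for vm_id in vm_ids
        if in_scope(vm_id, scope)
    }


if __name__ == "__main__":
    vms = [
        "/subscriptions/sub-a/resourceGroups/app1/providers/Microsoft.Compute/virtualMachines/vm1",
        "/subscriptions/sub-b/resourceGroups/app2/providers/Microsoft.Compute/virtualMachines/vm2",
    ]
    print(plan_backups(vms, "/subscriptions/sub-a", "rsv-demo", "daily-demo"))
```

Only the machine in the first subscription is returned, which mirrors how an assignment made at subscription level leaves machines in other subscriptions untouched.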

Azure Backup & Security

Supports Azure Disk Encryption. KEK and BEK are backed up automatically.

AES 256 protects the backup blobs.

Compliance

HIPAA
ISO
CSA
GDPR
PCI-DSS
Many more

Built-in Roles

Cumulative:

Backup Reader: See only
Backup Operator: Enable backup & restore
Backup Contributor: Policy management and Delete-Stop Backup
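“Cumulative” means that each role includes everything that the role below it can do. A short sketch with sets makes that concrete. The action labels are my own shorthand for the notes above and are not real Azure RBAC action strings.

```python
# Each role is built from the one before it, so permissions accumulate.
BACKUP_READER = {"view backups"}
BACKUP_OPERATOR = BACKUP_READER | {"enable backup", "restore"}
BACKUP_CONTRIBUTOR = BACKUP_OPERATOR | {"manage policy", "delete backup", "stop backup"}

ROLES = {
    "Backup Reader": BACKUP_READER,
    "Backup Operator": BACKUP_OPERATOR,
    "Backup Contributor": BACKUP_CONTRIBUTOR,
}


def can(role, action):
    """True when the named role includes the action."""
    return action in ROLES[role]


if __name__ == "__main__":
    print(can("Backup Operator", "restore"))
    print(can("Backup Operator", "delete backup"))
```

An operator can restore but cannot delete backups, while a contributor can do both along with everything that a reader can do.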

Protect the Roles

PIM can be used to guard the roles – protect against rogue admins.

JIT access
MFA
Multi-user approval

Data Security

PIN protection for critical actions, e.g. delete
Alert: Notification on critical actions
Recovery: Data kept for 14 days after delete. Working on blob soft delete

Backup Center Demo

Being built at the moment. Starting with VMs now but will include all backup items eventually.

All RSVs in the tenant (doh!) managed in a central place.

Aimed at the large enterprise.

They also have Log Analytics monitoring if you like that sort of thing. I’m not a fan of LA – I much prefer Azure Monitor.

Reporting using Power BI

Trinadh demos a Power BI reporting solution that unifies backup data from multiple tenants into a single report.