Azure Infrastructure Announcements – August 2023

This post brings you a summary of the infrastructure announcements from Azure that were made during August 2023. There are lots of announcements from Storage and a few interesting notes for VMs, networking, and ASR.

Storage

Azure Managed Lustre: not your grandparents’ parallel file system

With a few clicks of a web interface or an Azure Resource Manager template, AMLFS lets you provision an all-flash Lustre file system in minutes. What’s different is that this Lustre file system is all yours. If someone else in Azure is running a job that creates a million files, you won’t ever know it because your Lustre servers and SSDs are exclusively yours.

Massively scaled, high-performance file systems for HPC workloads.

General availability | Azure NetApp Files: SMB Continuous Availability (CA) shares

To enhance resiliency during storage service maintenance operations, SMB volumes used by Citrix App Layering, FSLogix user profile containers, and Microsoft SQL Server on Microsoft Windows Server can be enabled with Continuous Availability.

SMB Transparent Failover means that clients should not notice maintenance operations.

Public preview: Azure Storage Mover support for SMB and Azure Files

Storage Mover is a fully managed migration service that enables you to migrate on-premises files and folders to Azure Storage while minimizing downtime for your workload. Azure Storage Mover can now migrate your SMB shares to Azure file shares.

To be honest, I’ve not encountered a “replace the file server with Azure Files” scenario yet. Third-party vendors often won’t support it for LOB apps. User data typically ends up in SharePoint/OneDrive. And wouldn’t most Citrix/RDS admins want to start with new profiles?

Generally available: Azure Blob Storage Cold Tier

Azure Blob Storage Cold Tier is now generally available. It is a new online access tier that is the most cost-effective Azure Blob offering for storing infrequently accessed data with long-term retention requirements, while providing instant access. The pricing of the cold tier storage option lies between the cool and archive tiers, and it follows a 90-day early deletion policy. You can seamlessly utilize the cold tier in the same way as the hot and cool tiers.

Cool – Cold. Tell me that isn't confusing. The scenario is that you want to store data for a long time, but you need it immediately available. Archive requires a restore ("rehydration") that can take up to 15 hours and can be accelerated for a charge. Cold is one step up from Archive: instantly accessible, but not as cost-effective.
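
If you want to try it, here's a minimal PowerShell sketch for uploading a blob straight into the Cold tier – assuming a recent Az.Storage module that recognises the Cold tier value; the storage account, container, and file names are placeholders:

# Get the context of the target storage account (placeholder names)
$ctx = (Get-AzStorageAccount -ResourceGroupName "rg-archive" -Name "salongtermdata").Context

# Upload a file as a block blob directly into the Cold access tier
Set-AzStorageBlobContent -File ".\report-2020.pdf" `
    -Container "reports" `
    -Blob "report-2020.pdf" `
    -StandardBlobTier Cold `
    -Context $ctx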

Public Preview: Azure NetApp Files Cloud Backup for Virtual Machines

With Cloud Backup for Virtual Machines, you can now create VM-consistent snapshot backups of VMs on Azure NetApp Files datastores. The associated virtual appliance installs in the Azure VMware Solution cluster and provides policy-based, automated, and consistent backup of VMs, integrated with Azure NetApp Files snapshot technology for fast backups and restores of VMs, groups of VMs (organized in resource groups), or complete datastores – lowering RTO and RPO, and improving total cost of ownership.

General Availability: Incremental snapshots for Premium SSD v2 Disk and Ultra Disk Storage

You can now instantly restore Premium SSD v2 and Ultra Disks from snapshots and attach them to a running VM without waiting for any background copy of data. This new capability allows you to read and write data on disks immediately after creation from snapshots, enabling you to recover your data from accidental deletes or a disaster quickly.

I can see third-party backup making use of this.
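
As a rough illustration, here's what taking an incremental snapshot and instantly restoring a disk from it looks like in PowerShell – a sketch only, where the resource names are placeholders and PremiumV2_LRS/zone 1 are assumptions for a zonal Premium SSD v2 disk:

# Take an incremental snapshot of an existing Premium SSD v2 data disk
$disk = Get-AzDisk -ResourceGroupName "rg-prod" -DiskName "sql-data-01"
$snapConfig = New-AzSnapshotConfig -SourceUri $disk.Id -Location $disk.Location -CreateOption Copy -Incremental
$snapshot = New-AzSnapshot -ResourceGroupName "rg-prod" -SnapshotName "sql-data-01-snap" -Snapshot $snapConfig

# Create a new disk from the snapshot - it can be attached immediately,
# without waiting for a background copy of the data
$diskConfig = New-AzDiskConfig -Location $disk.Location -Zone "1" -SkuName "PremiumV2_LRS" `
    -CreateOption Copy -SourceResourceId $snapshot.Id
New-AzDisk -ResourceGroupName "rg-prod" -DiskName "sql-data-01-restored" -Disk $diskConfig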

Azure Elastic SAN updates: Private Endpoints & Shared Volumes

As we approach general availability of Azure Elastic SAN, we continue improving the service and adding features based on your feedback. Today, we are releasing private endpoint support and volume sharing support via SCSI (Small Computer System Interface) Persistent Reservation.

This sounds like the sort of feature maturity one would expect as the service approaches general availability. I wonder what the actual target market is for this service.

Azure Site Recovery

Private Preview – DR for Shared Disks – Azure Site Recovery

We are excited to announce the Private Preview of DR for Azure Shared Disks for workloads running Windows Server Failover Clusters (WSFC) on Azure VMs. Now you can protect, monitor, and recover your WSFC clusters as a single unit across their DR lifecycle, while also generating cluster-consistent recovery points – which are consistent across all the disks (including the shared disk) of the cluster.

This feature is long overdue for customers using shared virtual hard disks to create failover clusters.

Networking

Public preview: Support for new custom error pages in Application Gateway

In addition to the response codes 403 and 502, the Azure Application Gateway now lets you configure company-branded error pages for more response codes – 400, 405, 408, 500, 503, and 504. You can configure these error pages at a global level to apply to all the listeners on your gateway or individually for each listener. 

These error pages can be hosted at any publicly accessible URI.
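
For example, adding a branded 502 page at the global level looks something like this in PowerShell – a sketch where the gateway name, resource group, and the URL of the hosted page are placeholders (the newer status codes only apply where the preview is enabled):

# Get the existing Application Gateway
$appGw = Get-AzApplicationGateway -Name "agw-web" -ResourceGroupName "rg-network"

# Add a global custom error page for 502 responses
Add-AzApplicationGatewayCustomError -ApplicationGateway $appGw `
    -StatusCode HttpStatus502 `
    -CustomErrorPageUrl "https://mystorageaccount.blob.core.windows.net/errorpages/502.htm"

# Commit the change to the gateway
Set-AzApplicationGateway -ApplicationGateway $appGw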

Azure Firewall: New Monitoring and Logging Updates

Notes:

  • (Preview) With the Azure Firewall Resource Health check, you can now view the health status of your Azure Firewall and address service problems that may affect your Azure Firewall resource. Resource Health allows IT teams to receive proactive notifications regarding potential health degradations and recommended mitigation actions for each health event type.
  • (Preview) The Azure Firewall Workbook presents a dynamic platform for analyzing Azure Firewall data. Within the Azure portal, you can utilize it to generate visually engaging reports.
  • (GA) The Latency Probe metric is designed to measure the overall latency of Azure Firewall and provide insight into the health of the service. IT administrators can use the metric for monitoring and alerting if there is observable latency and diagnosing if the Azure Firewall is the cause of latency in a network.

Resource health should make for a useful alert, especially when enabling DevSecOps – be aware of the dreaded “out of sync” error. I just tried the workbook in a production system – I noticed a couple of things that I might not have otherwise noticed because they didn’t trigger a human response (yet). The latency probe is interesting – I think it originated from customer network performance scenarios where it was suspected that the firewall was the root cause.
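
If you want to pull the Latency Probe metric yourself, something like the following should do it – the firewall name and resource group are placeholders, and the metric name (FirewallLatencyPng) is my assumption, so check the metric list on your firewall resource:

# Get the firewall and query the average latency probe metric for the last hour
$fw = Get-AzFirewall -Name "azfw-hub" -ResourceGroupName "rg-hub"

Get-AzMetric -ResourceId $fw.Id `
    -MetricName "FirewallLatencyPng" `
    -StartTime (Get-Date).AddHours(-1) `
    -EndTime (Get-Date) `
    -TimeGrain 00:05:00 `
    -AggregationType Average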

Virtual Machines

Public preview: Azure Mv3 Medium Memory (MM) Virtual Machines

Today we are announcing the public preview of the next generation Mv3 Medium Memory (MM) virtual machine series. Powered by the 4th Generation Intel® Xeon® Scalable Processor and DDR5 DRAM technology, the Mv3 medium memory (MM) virtual machines can scale for SAP workloads from 250GB to 4TB. With Azure Boost, Mv3 MM provides a ~25% improvement in network throughput and up to 1.5X improvement in remote storage throughput over the previous M-series families. 

These machines start at 12 vCPUs and 240 GB RAM, scaling up to 176 vCPUs and 2,794 GB RAM. That should just about be enough to run Teams.
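
If you want to check whether the preview sizes have reached a region you care about, a quick size query works – the region is a placeholder and the name filter is my guess at the Mv3 naming pattern:

# List the VM sizes in a region that look like the Mv3 family
Get-AzVMSize -Location "westeurope" |
    Where-Object { $_.Name -like "Standard_M*v3" } |
    Sort-Object NumberOfCores |
    Format-Table Name, NumberOfCores, MemoryInMB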

Microsoft Ignite 2019 – Windows Server on Azure Overview, Lift-and-Shift Migrations for Enterprise Workloads

Speakers:

  • Rob Hindman, Microsoft
  • Elden Christensen, Microsoft

Why Windows + Azure

  • Unmatched security
  • Built-in hybrid
  • Most cost effective
  • Unparalleled innovation and deep trust with enterprises

Weighing Your Options

  • Rehost – lift and shift
  • Refactor, rearchitect or rebuild – modernize/transform

Workloads

Typically dictates your migration options.

Windows Server 2008/R2

Lift-and-shift to Azure offers free extended security updates beyond the normal EOL of Jan 14, 2020, for 3 more years (to Jan 14, 2023).

You can unlock on-prem server extended support (if you buy it) through the Azure Portal.

Hybrid Benefit

If you have Software Assurance then you can reduce your Windows Server cost (built into the cost of Azure VMs with Windows) with the Azure Hybrid Use Benefit (AHUB).

Workloads

A bunch of eye-chart slides that should be downloaded for reference.

S2D with Ultra Disk support “coming soon”.

SLA

You can build HA in infrastructure or app. Focus on App availability – Azure VM SLAs are built on this concept. Other consideration is operational consistency.

Designs

A bunch of designs are shown, including S2D and Storage Spaces clusters. Two Azure options:

  • Premium File Shares (GA) as shared storage.
  • Shared Azure Disks (roadmap) as shared cluster disks – witness & CSV (sign up at https://aka.ms/SharedDiskSignUp). Will support Premium SSD and Ultra SSD – see the sketch below.
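
For reference, here's roughly what creating a shared Azure disk looks like with today's Az PowerShell cmdlets – a sketch only, with resource names, sizes, and the LUN as placeholders:

# Create a Premium SSD that can be shared by up to 2 VMs (for example, a 2-node WSFC)
$diskConfig = New-AzDiskConfig -Location "westeurope" -DiskSizeGB 256 `
    -SkuName "Premium_LRS" -CreateOption Empty -MaxSharesCount 2
$sharedDisk = New-AzDisk -ResourceGroupName "rg-cluster" -DiskName "csv-disk-01" -Disk $diskConfig

# Attach the same disk to both cluster nodes (host caching must be off for shared disks)
foreach ($vmName in "node1", "node2")
{
    $vm = Get-AzVM -ResourceGroupName "rg-cluster" -Name $vmName
    $vm = Add-AzVMDataDisk -VM $vm -Name "csv-disk-01" -CreateOption Attach `
        -ManagedDiskId $sharedDisk.Id -Lun 0 -Caching None
    Update-AzVM -ResourceGroupName "rg-cluster" -VM $vm
}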

Lessons From The Trenches

Download the slide(s) – lots of details.

  1. You are the operator of Azure VMs and must do updates, backups, etc., and test them.
  2. You must practice failure scenarios and have processes to deal with them.
  3. Do not use the VM temporary disk.
  4. Do not attempt to convert from Standard to Premium disks during DR site failover – the conversion is not instant.
  5. Large VMs do not compensate for application architecture – sometimes refactoring cannot be avoided.
  6. Do not hard-code specific VM gallery image names into scripts. Images are retired after two years – use the latest version because it is recently patched.
  7. Use DiskSpd and FIO to measure performance as early as possible. Run multiple tests – because the Azure disk cache will kick in to improve performance. Note that some regions (East US) will perform faster than others. See the sketch after this list.
  8. Use WS2019 to capture Azure host maintenance events in the failover clustering event logs – 1139, 1140, 1136.
  9. S2D on Azure IaaS VM guest clusters can only use 2-way and 3-way mirror resiliency types. S2D caching tiers cannot be used.
  10. Don't let S2D volumes become 100% full. Expanding them when they are full is difficult and requires downtime.
  11. S2D is slower on clouds that enforce VM network QoS.
  12. Mirror repair is faster on WS2019.
  13. WS2019 is better for file server roles, especially the Information Worker workload.
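
On the DiskSpd point (item 7), a typical test run looks something like the line below – a sketch only, where the drive, file size, duration, and read/write mix are placeholders to adjust for your own workload:

# 60-second run with 15s warm-up/cool-down: 8 KB blocks, random IO, 50% writes,
# 4 threads, 32 outstanding IOs per thread, software/hardware caching disabled, latency captured
.\diskspd.exe -d60 -W15 -C15 -c64G -t4 -o32 -b8k -r -w50 -Sh -L F:\diskspd-test.dat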

 

 

Cloud Mechanix – “Starting Azure Infrastructure” Training Coming To Frankfurt, Germany

I have great news. Today I got confirmation that our venue for the next Cloud Mechanix class has been confirmed. So on December 3-4, I will be teaching my Cloud Mechanix “Starting Azure Infrastructure” class in Frankfurt, Germany. Registration Link.

Buy Ticket

About The Event

This HANDS-ON theory + practical course is intended for IT professionals and developers that wish to start working with or improve their knowledge of Azure virtual machines. The course starts at the very beginning, explaining what Azure is (and isn’t), administrative concepts, and then works through the fundamentals of virtual machines before looking at more advanced topics such as security, high availability, storage engineering, backup, disaster recovery, management/alerting, and automation.

Aidan has been teaching and assisting Microsoft partners in Ireland about Microsoft Azure since 2014. Over this time he has learned what customers are doing in Azure, and how they best get results. Combined with his own learning, and membership of the Microsoft Most Valuable Professional (MVP) program for Microsoft Azure, Aidan has a great deal of knowledge to share.

We deliberately keep the class small (maximum of 20) to allow for a more intimate environment where attendees can feel free to interact and ask questions.

Agenda

This course spans two days, running on December 3-4, 2018. The agenda is below.

Day 1 (09:30 – 17:00):

  • Introducing Azure
  • Tenants & subscriptions
  • Azure administration
  • Admin tools
  • Intro to IaaS
  • Storage
  • Networking basics

Day 2 (09:30 – 17:00):

  • Virtual machines
  • Advanced networking
  • Backup
  • Disaster recovery
  • JSON
  • Diagnostics
  • Monitoring & alerting
  • Security Center

The Venue

The location is the Novotel Frankfurt City. This hotel:

  • Has very fast Wi-Fi – an essential requirement for hands-on cloud training!
  • Has reasonably priced accommodation.
  • Has car parking – which we are paying for.
  • Is near the Messe (conference centre) and is beside the Kuhwaldstraße tram station and the Frankfurt Main West train station and S-Bahn.
  • Is just a 25-minute walk or a 5-minute taxi ride from the Hauptbahnhof (central train station).
  • Was only 15-20 minutes by taxi to/from Frankfurt Airport when we visited the hotel to scout the location.


Costs

The regular cost for this course is €999 per person. If you are registering more than one person, the price drops to €849 per person. A limited number of early-bird tickets are on sale for €659 each.

You can pay for the course by credit card (handled securely by Stripe) or PayPal on the official event site. You can also pay by invoice/bank transfer by emailing contact@cloudmechanix.com. Payment must be received within 21 days of registration – please allow 14 days for an international (to Ireland) bank transfer. We require the following information for invoice & bank transfer payment:

  • The name and contact details (email and phone) for the person attending the course.
  • Name & address of the company paying the course fee.
  • A Purchase Order (PO) number, if your company requires this for services & purchases.

The cost includes tea/coffee and lunch. Please inform us in advance if you have any dietary requirements.

Note: Cloud Mechanix is a registered education-only company in the Republic of Ireland and does not charge or pay VAT/sales tax.

See the event page for Terms and Conditions.

Buy Ticket

Physical Disks are Missing in Disk Management

In this post, I’ll explain how I fixed a situation where most of my Storage Spaces JBOD disks were missing in Disk Management and Get-PhysicalDisk showed their OperationalStatus as being stuck on “Starting”.

I’ve had some interesting hardware/software issues with an old lab at work. All of the hardware is quite old now, but I’ve been trying to use it in what I’ll call semi-production. The WS2016 Hyper-V cluster hardware consists of a pair of Dell R420 hosts and an old DataON 6 Gbps SAS Storage Spaces JBOD.

Most of the disks disappeared in Disk Management and thus couldn’t be added to a new Storage Spaces pool. I checked Device Manager and they were listed. I removed the devices and rebooted but the disks didn’t appear in Disk Management. I then ran Get-PhysicalDisk and this came up:

[Image: Get-PhysicalDisk output with OperationalStatus stuck on "Starting" and HealthStatus "Unknown"]

As you can see, the disks were there, but their OperationalStatus was hung on "Starting" and their HealthStatus was "Unknown". If this was a single disk, I could imagine that it had failed. However, this was nearly every disk in the JBOD and spanned HDD and SSD. Something else was up – probably Windows Server 2016 or some firmware had thrown a wobbly and wasn't wrapping up some task.

The solution was to run Reset-PhysicalDisk. The example on docs.microsoft.com was incorrect, but adding a foreach loop fixed things:

# Find every physical disk whose HealthStatus is stuck on "Unknown"
$phydisk = Get-PhysicalDisk | Where-Object -FilterScript { $_.HealthStatus -eq "Unknown" }

# Reset each of those disks so that Storage Spaces re-detects them
foreach ($item in $phydisk)
{
    Reset-PhysicalDisk -FriendlyName $item.FriendlyName
}

A few seconds later, things looked a lot better:

[Image: Get-PhysicalDisk output after the reset – the disks report as healthy]

I was then able to create the new pool and virtual disks (witness + CSVs) in Failover Cluster Manager.

Azure Preview – Standard SSD Disks

Say what?!?! Standard SSD disks? And we thought that Standard = HDD and Premium = SSD! But that’s no longer the case.

In the coming world of Azure (we’re not there yet), Premium will be just that, but Standard will be normal deployments of either HDD or SSD. How do the three tiers of managed disks break down?

  • Premium: For when you need the highest IOPS, throughput and lowest latency.
  • Standard SSD: For production workloads when you need predictable IOPS performance (500 per disk) with lower latency than HDD.
  • Standard HDD: Test and archive storage in VMs, with performance listed as up to 500 IOPS – spinning disks are mechanical (serial access), so bursts of activity lead to access queues and lower performance levels.

Let’s set expectations: Standard SSD is in very limited preview today. Only North Europe (Dublin) is supported today, with several more regions coming online internationally by mid-June:

  • France Central
  • East US 2
  • Central US
  • Canada Central
  • East Asia
  • Korea South
  • Australia East

There is no Azure Portal support today. If you want to deploy Standard SSD disks, you must do it using ARM templates:

  • apiVersion for Microsoft.Compute/virtualMachines must be set as “2018-04-01” (or later)
  • storageAccountType as “StandardSSD_LRS”

[Image: ARM template fragment specifying the StandardSSD_LRS storage account type]
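
Here's a minimal sketch of what that looks like for a standalone Microsoft.Compute/disks resource (inside a virtualMachines resource, StandardSSD_LRS goes into the managed disk's storageAccountType property instead) – the resource group, disk name, size, and location are placeholders, and I'm deploying it with today's cmdlets:

# Minimal ARM template that creates an empty Standard SSD managed disk
$template = @'
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.Compute/disks",
      "apiVersion": "2018-04-01",
      "name": "standardssd-data-01",
      "location": "northeurope",
      "sku": { "name": "StandardSSD_LRS" },
      "properties": {
        "creationData": { "createOption": "Empty" },
        "diskSizeGB": 128
      }
    }
  ]
}
'@

# Save the template and deploy it to a resource group
$template | Out-File -FilePath .\standardssd-disk.json -Encoding utf8
New-AzResourceGroupDeployment -ResourceGroupName "rg-lab" -TemplateFile .\standardssd-disk.json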

Won’t SSD be too expensive compared to HDD? That was my first thought. But check out the RRP pricing (North Europe in Euros). The per-GB pricing for Standard HDD is:

[Image: Standard HDD per-GB pricing table]

But the per-GB pricing for Standard SSD disks is:

[Image: Standard SSD per-GB pricing table]

The per-GB cost of Standard SSD is LOWER than that of Standard HDD. How could that be? Electricity is a huge cost in data centers, and disk arrays eat up a lot of power. SSD is way more efficient, the cost of SSD has been falling, and Microsoft eats the commodity storage hardware dog food.

If you read the pricing small print, you will notice that the micro-cost of storage transactions on Standard SSD is €0.000844 per 10,000 transactions – double the Standard HDD cost of €0.000422 per 10,000 transactions – but that's one of those costs that's tucked away at the bottom of the bill that few ever notice because it's tiny.

Standard SSD is available as managed disks only, and is LRS only, as with other managed disks. They also come with the usual 99.999% availability and Microsoft's claimed ZERO percent annualized failure rate.

I’m looking forward to the day that Standard SSD is GA and I can start telling customers to switch over to it as their normal disk, when Premium isn’t required.

Feedback Required By MS – Storage Replica in WS2019 STANDARD

Microsoft is planning to add Storage Replica into the Standard Edition of Windows Server 2019 (WS2019). In case you weren’t paying attention, Windows Server 2016 (WS2016) only has this feature in the Datacenter edition – a large number of us campaigned to get that changed. I personally wrecked the head of Ned Pyle (@NerdPyle) who, when he isn’t tweeting gifs, is a Principal Program Manager in the Microsoft Windows Server High Availability and Storage group – he’s one of the people responsible for the SR feature and he’s the guy who presents it at conferences such as Ignite.

What is SR? It's volume-based replication in Windows Server Failover Clustering. The main idea was to enable replication of LUNs when companies couldn't afford SAN replication licensing. Some SAN vendors charge a fortune to enable LUN replication for disaster recovery, and SR is a solution for this.

A by-product of SR is a scenario for smaller businesses. With the death of cluster-in-a-box (manufacturers are focused on larger S2D customers), the small-medium business is left looking for a new way to build a Hyper-V cluster. You can do 2-node S2D clusters, but they have single points of failure (4 nodes are required to get over this) and require at least 10 GbE networking. If you use SR, you can create an active/passive 2-node Hyper-V cluster using just internal RAID storage in your Hyper-V hosts. It's a simpler solution … but it requires Datacenter Edition today, and in the SME & branch office scenario, Datacenter only makes financial sense when there are 13+ VMs per host.
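
For context, here's roughly what setting up SR looks like in PowerShell for the simplest server-to-server flavour – a sketch only, with server names, volumes, and replication group names as placeholders; you'd run Test-SRTopology first to validate the log volumes and network:

# Validate the proposed replication topology before committing to it
Test-SRTopology -SourceComputerName "HV1" -SourceVolumeName "D:" -SourceLogVolumeName "L:" `
    -DestinationComputerName "HV2" -DestinationVolumeName "D:" -DestinationLogVolumeName "L:" `
    -DurationInMinutes 30 -ResultPath "C:\Temp"

# Create the replication partnership (synchronous by default)
New-SRPartnership -SourceComputerName "HV1" -SourceRGName "rg01" `
    -SourceVolumeName "D:" -SourceLogVolumeName "L:" `
    -DestinationComputerName "HV2" -DestinationRGName "rg02" `
    -DestinationVolumeName "D:" -DestinationLogVolumeName "L:"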

Ned listened to the feedback. I think he had our backs and understood where we were coming from. So SR has been added to WS2019 Standard in the preview program. Microsoft wants telemetry (people using it) and feedback – there's a survey here. SR in Standard will be limited. Today, those limits are:

  • SR replicates a single volume instead of an unlimited number of volumes.
  • Servers can have one partnership instead of an unlimited number of partners.
  • Volume size limited to 2 TB instead of an unlimited size.

Microsoft really wants feedback on those limitations. If you think those limitations are too low, then TALK NOW. Don’t wait for GA when it is too late. Don’t be the idiot at some event who gives out shite when nothing can be done. ACT NOW.

If you cannot get the hint, complete the survey!

Video–Azure File Sync

I’ve produced and shared a short video (12:33 minutes) to explain what Azure File Sync is, what it will do for you, and there’s a quick demo at the end. If you want to:

  • Synchronise file shares between offices
  • Fix problems with full file servers by using tiered storage in the cloud
  • Use online backup
  • Get a DR solution for file servers, e.g. small business or branch office

… then Azure File Sync is for you!

Was This Post Useful?

If you found this information useful, then imagine what 2 days of training might mean to you. I’m delivering a 2-day course in Amsterdam on April 19-20, teaching newbies and experienced Azure admins about Azure Infrastructure. There’ll be lots of in-depth information, covering the foundations, best practices, troubleshooting, and advanced configurations. You can learn more here.

Faster & Bigger Azure Backup for Azure VMs

Azure Backup recently rolled out an update to their service for protecting Azure VMs to improve backup speed, restore performance, and to add support for larger disks.

Support for Large Disks

Azure Backup didn't support disks that were larger than 1 TiB (1 TB is the marketing measure of 1,000 GB, and 1 TiB is the computer science measure of 1,024 GiB). Those large disks must be popular – I know that people couldn't get their head around the idea of a volume being spread across aggregated disks (they've never heard of RAID, I guess) and wouldn't touch Azure VMs because of this.

Today, Azure Backup, once upgraded by you, does support the large disks that Azure can offer (over 1 TiB).

Snapshot-Based Backup

People who deploy large VMs have seen that the traditional process of protecting their machines has been slow. Historically Azure Backup would:

  1. Create a snapshot of the virtual machine.
  2. Transfer the backup data from the storage cluster to the recovery services vault (standard tier block blob storage) over a network.

The snapshot was then dispensed with.

The backup was slow (the process of calculating changes, the network transfer and the write to standard storage), and restores were just as slow. It’s one thing for a backup to be slow, but when a restore is a 12 hour job, you’ve got a problem!

Azure made some changes, and now the process of a backup is:

  1. Create a snapshot of the virtual machine and keep 7 snapshots (7 backups).
  2. Use the previous snapshot to speed up the process of identifying changes.
  3. Transfer the backup data from the storage cluster to the recovery services vault (standard tier block blob storage) over a network.

Two things to note:

  • The differencing calculation is faster, speeding up the end-to-end process.
  • But after you upgrade Azure Backup, you can do a restore once the snapshot is complete, and while the backup job (transfer) is still happening!


7 snapshots are kept, and you can restore a virtual machine from either:

  • A snapshot from the last 7 backups.
  • A recovery point in the Recovery Services vault – up to 99 years of retention and 9,999 recovery points, depending on your backup policy.

[Image: restoring an Azure VM from either a snapshot or a vault recovery point]
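
To see what that looks like from PowerShell, here's a rough sketch using today's Az.RecoveryServices cmdlets (this post predates the Az module) – the vault, VM, and storage account names are placeholders:

# Find the vault and the protected VM
$vault = Get-AzRecoveryServicesVault -ResourceGroupName "rg-backup" -Name "rsv-prod"
$container = Get-AzRecoveryServicesBackupContainer -ContainerType AzureVM -FriendlyName "sql01" -VaultId $vault.ID
$item = Get-AzRecoveryServicesBackupItem -Container $container -WorkloadType AzureVM -VaultId $vault.ID

# List the recovery points from the last week - snapshot-tier points appear as soon as the snapshot completes
$rp = Get-AzRecoveryServicesBackupRecoveryPoint -Item $item -VaultId $vault.ID `
    -StartDate (Get-Date).AddDays(-7).ToUniversalTime() -EndDate (Get-Date).ToUniversalTime()

# Restore from one of the recovery points (the disks land in the named storage account)
Restore-AzRecoveryServicesBackupItem -RecoveryPoint $rp[0] -VaultId $vault.ID `
    -StorageAccountName "sarestore01" -StorageAccountResourceGroupName "rg-backup"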

Restoring from a snapshot should be much quicker, and this will benefit large workloads, such as database servers, where a restore is usually from as recent a backup as possible.

Distributed Disks Restore

The last new feature is that when you restore a virtual machine with unmanaged disks (storage account disks), you can opt to distribute the disks to different storage accounts.

Accessing the Features

A one-time one-way upgrade must be done in each subscription to access the new Azure Backup for IaaS VM features. When you open a (single) recovery services vault, a banner will appear at the top. Click the banner, and then read the blade that opens. When you understand the process, click Upgrade. A quick task will complete and approximately two hours later, your entire subscription will be upgraded and able to take advantage of the features described above.

Was This Post Useful?

If you found this information useful, then imagine what 2 days of training might mean to you. I’m delivering a 2-day course in Amsterdam on April 19-20, teaching newbies and experienced Azure admins about Azure Infrastructure. There’ll be lots of in-depth information, covering the foundations, best practices, troubleshooting, and advanced configurations. You can learn more here.

First Cloud Mechanix Azure Course Completed

Last week, I delivered my first ever Cloud Mechanix Azure training course, to a full room in the Lancaster Gate area of London, UK.

It was a jam-packed 2 days of Azure storage, networking, virtual machines, backup, DR, security, and management, with lots of hands-on labs. Half the attendees were from the UK, the rest from countries such as Denmark, Netherlands, Belgium, and even Canada! I had a lot of fun teaching the class – there were lots of questions and laughs. And as often happens in these classes, the interactions led me to pick up a couple of ideas from the attendees.

In my class, everyone gets the hands-on labs a few days before the event. That allows them to get their laptops ready. On the day, they get copies of the slides so they can follow along or make notes on their laptops – the labs and slides are updated with the latest information that I have. The goal of the class isn't to teach you where to click, but why to click. In the cloud, things move and get renamed, so detailed instructions age very quickly. But what lasts is understanding the why. Not everyone got to finish the hands-on labs, but I am available to help the attendees complete them.

If this course sounds interesting to you, then we have another class running in Amsterdam in April. Some tweaks are being made to the labs/slides (which the London class will be getting too) and, as always, the April class will be getting the latest that I can share on Azure.