Webinar – Getting More Performance From Azure VMs

I will be doing a webinar later today for the European SharePoint, Office 365 & Azure Community (from the like-named conference). The webinar is at 14:00 UK/Irish time, 15:00 CEST, and 09:00 EDT. Registration is here.

Title: Getting More Performance from Azure Virtual Machines

Speaker: Aidan Finn, MVP, Ireland

Date and Time: Wed, May 1, 2019 3:00 PM – 4:00 PM CEST

Webinar Description:  You’ve deployed your shiny new application in the cloud, and all that pride crashes down when developers and users start to complain that it’s slow. How do you fix it? In this session you’ll learn to understand what Azure virtual machines can offer, how to pick the right ones for the right job, and how to design for the best possible performance, including networking, storage, processor, and GPU.

Key benefits of attending:
– Understand virtual machine design
– Optimise storage performance
– Get more from Azure networking

Microsoft Ignite–Building Enterprise Grade Applications With Azure Networking’s Delivery Suite

Speakers: Daniel Grickholm & Amit Srivastava

I arrived late to this session after talking to some product group people in the expo hall.

Application Gateway Demo

We see the number of instances dynamically scale out and back in – I think there was an app on Kubernetes in the background.

Application Gateway

Application gateway ingress controller for AKS v2.

  • Attach WAG to AKS clusters.
  • Load balance from the Internet to pods
  • Supports features of K8s ingress resource – TLS, multisite and path-based

Demo: we see a containerised K8s app published via the WAG. The backend pool is shown, listing the IPs of the containers. Deleting the app in K8s should remove the backend pool registration from the WAG (this fails in the demo).

Web Application Firewall



Demo – WAF

One app sits behind a WAF with no exclusion parameters; its backend pool is a simple PHP application. A second WAF uses the same backend VM in its backend pool, with a scan exclusion that ignores any field matching the string “comments”. The second WAF allows a comment to be posted; the first one does not.





Azure Front Door

Gets performance closer to the customer. Runs in edge sites, not in the Azure data centers.



Once you hit an edge site via Front Door, you are on the Azure WAN.


ADN = application delivery network


Big focus on SLA, HA, and performance. Built for Office.


5 years old and mature.

Can work in conjunction with WAG, even if there is some overlap, e.g. SSL termination.


What will be in the next demo:


He has an app for the USA in Central US and another for the UK deployed in UK South. He shows the Front Door creation: name and resource group first, and then a configuration screen that is a bit different from the usual Azure create experience. You create a global CNAME and configure session affinity in the frontend hosts. Then you create backends: app services, gateways, etc. You can set up host headers for custom domains, port translation, priority for failover, and weight for load balancing. You can add health probes to the backend pools, targeting a URL path over HTTP or HTTPS, and set the interval. Finally, you create a routing rule, which maps frontend hosts to backend pools; you can set whether it should accept HTTP and/or HTTPS.

He skips to one he created earlier. When he browses to it, he is sent to the closer of the two apps: the instance in Central US. You can also set up rules to block certain countries.

You can implement rate limiting and policies for fairness.

You can implement URL rewrites to map to a different path on the web servers.

This is like Traffic Manager + WAG combined at the edges of the Azure WAN.



Front Door load balances between regions. WAG load balances inside the region – that’s why they work together.
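The priority and weight settings from the demo can be modelled in a few lines to show how they interact. This is a hypothetical sketch, not Front Door's actual routing code: it drops backends that fail their health probes, keeps only the tier with the best (lowest) priority number for failover, and then picks within that tier by weight. Real Front Door also routes on latency, which is why the demo lands on Central US; that step is left out here. The function name and the backend object shape are made up for illustration.

```powershell
# Hypothetical model of Front Door backend selection:
# health probes -> priority (failover) -> weight (load balancing). Latency is ignored.
function Select-FrontDoorBackend {
    param(
        [Parameter(Mandatory)] [object[]] $Backends,
        # A number in [0, 1); pass a fixed value to make the weighted pick repeatable
        [double] $Roll = (Get-Random -Minimum 0.0 -Maximum 1.0)
    )
    # Only backends that pass their health probe are eligible
    $healthy = @($Backends | Where-Object { $_.Healthy })
    if ($healthy.Count -eq 0) { return $null }

    # Failover: priority 1 beats priority 2, so keep the lowest priority number
    $bestPriority = ($healthy | Measure-Object -Property Priority -Minimum).Minimum
    $tier = @($healthy | Where-Object { $_.Priority -eq $bestPriority })

    # Load balancing: walk the cumulative weights until the roll falls inside one
    $totalWeight = ($tier | Measure-Object -Property Weight -Sum).Sum
    $target = $Roll * $totalWeight
    $running = 0
    foreach ($backend in $tier) {
        $running += $backend.Weight
        if ($target -lt $running) { return $backend }
    }
    return $tier[-1]
}
```

With two priority-1 backends weighted 50/50 and a priority-2 standby, traffic splits between the first two, and the standby only receives traffic once both of them fail their probes.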


Adding Azure Monitor Performance Alerts Using PowerShell

Below is a sample script for adding Azure Metrics alerts using Azure Monitor. It is possible to create alerts using the Azure Portal, but that doesn’t scale well because each alert is specific to one VM. For example, if you have 4 alerts per VM, and 10 VMs, then you have to create 40 alerts! One could say: Use Log Analytics, but there’s a cost to that, and I find the OMS Workspace to be immature. Instead, one can continue to use Resource/Azure Monitor metrics, but script the creation of the metrics alerts.

One could use JSON, but again, there’s a scale-out issue there unless you build this into every deployment. The advantage with PowerShell is that you can automatically vary thresholds based on the VM’s spec, as you will see below; some metric thresholds depend on the spec of a machine, e.g. the number of cores.

The magic cmdlet for doing this work is Add-AzureRmMetricAlertRule, and the key to making that cmdlet work is to know the name of the metric. Microsoft’s docs state that you can query for available metrics using Get-AzureRmMetricDefinition, but I found that with VMs, it returned only the host metrics and not the guest metrics. I had to do some experimenting, but I found that the names of the guest metrics are predictable; they’re exactly what you see in the Azure Portal, e.g. \System\Processor Queue Length.

The below script is made up of a start and 2 functions:

  1. The start is where I specify some variables to define the VM, resource group name, and query for the location of the VM. The start can then call a series of functions, one for each metric type. In this example, I call ProcessorQLength.
  2. The ProcessorQLength function takes the VM, queries for its size, and then gets the number of cores assigned to that VM. We need that because the alert should be triggered if the average queue length per core is over 4, e.g. 16 for a 4-core VM. The AddMetric function is called with a configuration for the \System\Processor Queue Length alert.
  3. The AddMetric function is a generic function capable of creating any Azure metrics alert. It is configured by the parameters that are fed into it, in this case by the ProcessorQLength function.

Here’s my example:

#A generic function to create an Azure Metrics alert
function AddMetric ($FunMetricName, $FuncMetric, $FuncCondition, $FuncThreshold, $FuncWindowSize, $FuncTimeOperator, $FuncDescription)
{
    #Look up the resource ID of the VM that the alert will target
    $VMID = (Get-AzureRmVM -ResourceGroupName $RGName -Name $VMName).Id
    Add-AzureRmMetricAlertRule -Name $FunMetricName -Location $VMLocation -ResourceGroup $RGName -TargetResourceId $VMID -MetricName $FuncMetric -Operator $FuncCondition -Threshold $FuncThreshold -WindowSize $FuncWindowSize -TimeAggregationOperator $FuncTimeOperator -Description $FuncDescription
}

#Create an alert for Processor Queue Length being 4x the number of cores in a VM
function ProcessorQLength ()
{
    #Find the VM's size, then the number of cores that size has in this region
    $VMSize = (Get-AzureRMVM -ResourceGroupName $RGName -Name $VMName).HardwareProfile.VmSize
    $Cores = (Get-AzureRMVMSize -Location $VMLocation | Where-Object {$_.Name -eq $VMSize}).NumberOfCores
    $QThreshold = $Cores * 4
    AddMetric "$VMName - CPU Q Length" "\System\Processor Queue Length" "GreaterThan" $QThreshold "00:05:00" "Average" "Created using PowerShell"
}

#The script starts here
#Specify a VM name/resource group
$VMName = "vm-test-01"
$RGName = "test"
$VMLocation = (Get-AzureRMVM -ResourceGroupName $RGName -Name $VMName).Location

#Start running functions to create alerts
ProcessorQLength
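The script above handles a single VM. To avoid the "40 alerts for 10 VMs" problem, one option is to separate the threshold logic from the Azure calls and loop over every VM in a resource group. The sketch below is hypothetical: the function names are made up. The wrapper assumes the AzureRM modules, a signed-in session, and that the VM objects returned by Get-AzureRmVM carry Location and HardwareProfile.VmSize. Only the parameter-building function is pure PowerShell.

```powershell
# Builds the parameter set for a processor queue length alert on one VM.
# The threshold is 4 x the core count, matching the ProcessorQLength function.
function New-QueueLengthAlertParams {
    param(
        [Parameter(Mandatory)] [string] $VMName,
        [Parameter(Mandatory)] [int] $Cores,
        [int] $PerCoreLimit = 4
    )
    @{
        Name                    = "$VMName - CPU Q Length"
        MetricName              = '\System\Processor Queue Length'
        Operator                = 'GreaterThan'
        Threshold               = $Cores * $PerCoreLimit
        WindowSize              = [TimeSpan]::FromMinutes(5)
        TimeAggregationOperator = 'Average'
        Description             = 'Created using PowerShell'
    }
}

# Hypothetical wrapper: creates the alert for every VM in a resource group.
# Needs the AzureRM modules and an authenticated session.
function Add-QueueLengthAlertsForResourceGroup {
    param([Parameter(Mandatory)] [string] $ResourceGroupName)
    foreach ($vm in (Get-AzureRmVM -ResourceGroupName $ResourceGroupName)) {
        # Look up the core count for this VM's size in its region
        $size = Get-AzureRmVMSize -Location $vm.Location |
            Where-Object { $_.Name -eq $vm.HardwareProfile.VmSize }
        $alert = New-QueueLengthAlertParams -VMName $vm.Name -Cores $size.NumberOfCores
        # Splat the per-VM settings, then add the target-specific ones
        Add-AzureRmMetricAlertRule @alert -Location $vm.Location `
            -ResourceGroup $ResourceGroupName -TargetResourceId $vm.Id
    }
}
```

Keeping the threshold calculation in its own function means it can be checked without touching Azure. New metric types can be added as further New-*AlertParams functions called from the same loop.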

Was This Post Useful?

If you found this information useful, then imagine what 2 days of training might mean to you. I’m delivering a 2-day course in Amsterdam on April 19-20, teaching newbies and experienced Azure admins about Azure Infrastructure. There’ll be lots of in-depth information, covering the foundations, best practices, troubleshooting, and advanced configurations. You can learn more here.

Azure IaaS Design & Performance Considerations–Best Practices & Learnings From The Field

Speaker: Daniel Neumann, TSP – Azure Infrastructure, Microsoft (ex-MVP).

Selecting the Best VM Size

Performance of each Azure VM vCPU/core is rated using ACU (Azure Compute Unit), based on 100 for the Standard A-Series. E.g. D_v2 offers 210-250 per vCPU; H offers 290-300. Note that the D_v3 has lower speeds than the D_v2 because it uses hyperthreading on the host – MS matched this by reducing costs accordingly. Probably not a big deal: DB workloads, which are common on the D-family, care more about thread count than GHz.

Network Performance

Documentation has been improved to show actual Gbps instead of low/medium/high. Higher-end machines can be created with Accelerated Networking (SR-IOV), which can offer very high speeds. Announced this week: the M128s VM can hit 30 Gbps.


Receive Side Scaling (RSS) is not always enabled by default for Windows VMs. It is on larger VMs, and it is for all Linux machines. It can greatly improve inbound data transfer performance for multi-core VMs.

Storage Throughput

Listed in the VM sizes. This varies between series, and increases as you go up through the sizes. Watch out when using Premium Storage – lower end machines might not be able to offer the potential of larger disks or storage pools of disks, so you might need a larger VM size to achieve the performance potential of the disks/pool.
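A quick way to sanity-check a design is to treat the VM size's storage limit as a cap on whatever the disks could deliver. The function below is a hypothetical helper. The numbers in the tests are placeholders, not real VM or disk limits; take those from the VM sizes and disk documentation.

```powershell
# Hypothetical helper: what a VM can actually get from a set of data disks.
# The VM size has its own storage cap, so the result is the lower of the two.
function Get-EffectiveIops {
    param(
        [Parameter(Mandatory)] [int]   $VMMaxIops,  # cap for the VM size (see the VM sizes docs)
        [Parameter(Mandatory)] [int[]] $DiskIops    # per-disk IOPS of each attached disk
    )
    # A pool of disks can at best deliver the sum of its members
    $poolIops = ($DiskIops | Measure-Object -Sum).Sum
    [Math]::Min([double]$VMMaxIops, [double]$poolIops)
}
```

If the result is the VM cap rather than the pool total, the disks are being wasted and a larger VM size is needed to reach their potential. The same reasoning applies to throughput in MB/s.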

Daniel uses a tool called PerfInsights from MS Downloads to demo storage throughput.

Why Use Managed Disks

Storage accounts are limited to 50,000 IOPS since 20/9/2017. That limits the number of disks that you can have in a single storage account: if you put too many disks in a single storage account, you cannot get the performance potential of each disk.
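As a worked example of why the account limit matters, divide the account IOPS limit by the per-disk IOPS to see how many disks can run flat out in one account. The 500 IOPS figure for a Standard disk is the one quoted in the SQL Server demo later in this post. The helper name is hypothetical.

```powershell
# Hypothetical helper: how many disks can run at full speed in one storage account.
function Get-MaxFullSpeedDisks {
    param(
        [Parameter(Mandatory)] [int] $AccountIopsLimit,  # per-storage-account IOPS limit
        [Parameter(Mandatory)] [int] $PerDiskIops        # IOPS each disk can drive
    )
    # Round down: leftover headroom smaller than one disk doesn't add a full-speed disk
    [int]([Math]::Floor([double]$AccountIopsLimit / $PerDiskIops))
}
```

For example, 50,000 / 500 gives 100 Standard disks per account, while a 20,000 IOPS limit gives 40. Managed disks take this calculation away entirely.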

Lots of reasons to use managed disks. In short:

  • No more storage accounts
  • Lots more management features
  • FYI: no support yet for Azure-to-Azure Site Recovery (replication to other regions)

If you use un-managed disks with availability sets, it can happen that all 3 copies of storage accounts are in the same fault domain. With managed disks, availability set alignment is mirrored by disk placement.

Storage Spaces

Do not use disk mirroring. Use simple virtual disks/LUNs.

Ensure that the column count = the number of disks for performance.

Daniel says to format the volume with a 64KB allocation unit size. That’s true for almost everything, with one SQL Server exception. For normal transactional databases, stick with a 64KB allocation unit size. For SQL Server data warehouses, go with a 256KB allocation unit size – advice from the SQL Tiger team this week.
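The Storage Spaces advice boils down to a handful of settings: simple (no mirroring), one column per disk, and a chosen interleave and allocation unit size. The hypothetical helper below gathers them into hashtables that can be splatted. The interleave sizes follow the SQL Server VM guidance later in this post (64KB for OLTP, 256KB for data warehouses). The pool and disk names in the comments are placeholders.

```powershell
# Hypothetical helper: settings for a simple (striped) Storage Spaces virtual disk.
function Get-StripedVolumeSettings {
    param(
        [Parameter(Mandatory)] [int] $DiskCount,
        [ValidateSet('OLTP', 'DataWarehouse')] [string] $Workload = 'OLTP',
        # 64KB is the general advice; pass 256KB for a SQL Server data warehouse volume
        [long] $AllocationUnitSize = 64KB
    )
    $interleave = if ($Workload -eq 'DataWarehouse') { 256KB } else { 64KB }
    @{
        VirtualDisk = @{
            ResiliencySettingName = 'Simple'      # no mirroring
            NumberOfColumns       = $DiskCount    # one column per disk: stripe across all of them
            Interleave            = $interleave
            UseMaximumSize        = $true
        }
        AllocationUnitSize = $AllocationUnitSize
    }
}

# Example use on a Windows VM (placeholder names, run as administrator):
# $disks = Get-PhysicalDisk -CanPool $true
# New-StoragePool -FriendlyName 'DataPool' -StorageSubSystemFriendlyName 'Windows Storage*' -PhysicalDisks $disks
# $s = Get-StripedVolumeSettings -DiskCount $disks.Count
# $vd = $s.VirtualDisk
# New-VirtualDisk -StoragePoolFriendlyName 'DataPool' -FriendlyName 'Data' @vd |
#     Initialize-Disk -PassThru | New-Partition -AssignDriveLetter -UseMaximumSize |
#     Format-Volume -FileSystem NTFS -AllocationUnitSize $s.AllocationUnitSize
```

Deriving the column count from the number of disks keeps the "columns = disks" rule in one place.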


Daniel doesn’t appear to be a fan of micro-segmentation of a subnet using an NVA. Maybe the preview DPDK feature for NVA performance might change that.

He shows the Security Group View in Network Watcher, which helps you understand how L4 firewall rules are being applied by NSGs. For a VM, you can also see its effective routes and effective security rules.

Encryption Best Practices

Azure Disk Encryption requires that your key vault and VMs reside in the same Azure region and subscription.

Use the latest version of Azure PowerShell to configure Azure Disk Encryption.

You need an Azure AD Service Principal – the VM cannot talk directly to the key vault, so it goes via the service principal. Best practice is to have 1 service principal for each key vault.

Storage Service Encryption (managed disks) is easier. There is no BYOK at the moment so there’s no key vault function. The keys are managed by Azure and not visible to the customer.

The Test Tools Used In This Session


Comparing Performance with Encryption

There are lots of charts in this section, so it’s best to watch the video on Channel 9/Ignite/YouTube.

In short, ADE causes some throughput performance hits, depending on disk tier, size, and the block size of the data. It costs about 3% CPU utilization, and there is no IOPS performance hit. SSE has no performance impact.

Azure Backup Best Practices

You need a recovery services vault in the same region/subscription as the VM you want to back up.

VMs using ADE encryption must have a Key Encryption Key (KEK).

Best case performance of Azure Backup backups:

  • Initial backup: 20 Mbps.
  • Incremental backup: 80 Mbps.

Best practices:

  • Do not schedule more than 40 VMs to backup at the same time.
  • Make sure you have Python 2.7 in Linux VMs that you are backing up.
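If you have more than 40 VMs, one way to honour that limit is to split them into groups and give each group its own start time (for example, via separate backup policies with different schedules). The helper below is a hypothetical sketch that only does the splitting; it does not call Azure Backup.

```powershell
# Hypothetical helper: split a list of VM names into groups of at most $BatchSize,
# so that each group can be given its own backup start time.
function Split-BackupBatches {
    param(
        [Parameter(Mandatory)] [string[]] $VMNames,
        [int] $BatchSize = 40   # the "no more than 40 at the same time" guidance
    )
    $batches = [System.Collections.Generic.List[object]]::new()
    for ($i = 0; $i -lt $VMNames.Count; $i += $BatchSize) {
        $last = [Math]::Min($i + $BatchSize, $VMNames.Count) - 1
        # Array slice: elements $i through $last inclusive
        $batches.Add($VMNames[$i..$last])
    }
    # Leading comma stops PowerShell from unrolling the outer array on return
    return ,($batches.ToArray())
}
```

For 95 VMs this yields groups of 40, 40, and 15.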

Running Tier 1 Workloads on SQL Server on Microsoft Azure Virtual Machines

Speaker: Ajay Jagannathan, Principal PM Manager, Microsoft Data Platform Group. He leads the @mssqltiger team.

I think that this is the first ever SQL Server session that I’ve attended in person at a TechEd/Ignite. I was going to go to a PaaS session instead, but I’ve got so many customers running SQL Server on Azure VMs that I thought this was important for me to see. I also thought it might be useful for a lot of readers.

Microsoft Data Platform

Starting with SQL 2016, the goal was to make the platform consistent on-premises, with Azure VMs, or in Azure SQL. With Azure, scaling is possible using VM features such as scale sets. You can offload database loads, so analytics can be on a different tier:

  • On-premises: SQL Server and SQL Server (DW) Reference architecture
  • IaaS: SQL Server in Azure VM with SQL Server (DW) in Azure VM.
  • PaaS: Azure SQL database with Azure SQL data warehouse

Common T-SQL surface area. Simple cloud migration. Single vendor for support. Develop once and deploy anywhere.

Azure VM

  • Azure load balancer routes traffic to the VM NIC.
  • Compute is separate from storage.
  • The virtual machine issues operations to the storage.

SQL Server in Azure VM – Deployment Options

  • Microsoft gallery images: SQL Server 2008 R2 – 2017, SQL Web, Std, Ent, Dev, Express. Windows Server 2008 R2 – WS2016. RHEL and Ubuntu.
  • SQL Licensing: PAYG based on number of cores and SQL edition. Pay per minute.
  • Bring your own license: Software Assurance required to move/license SQL to the cloud if not doing PAYG.
  • Creates in ~10 minutes.
  • Connect via RDP, ADO, .NET, OLEDB, JDBC, PHP …
  • Manage via Portal, SSMS, PowerShell, CLI, System Center …

It’s a VM so nothing really changes from on-premises VM in terms of management.

Every time there’s a critical update or service pack, they update the gallery images.

VM Sizes

They recommend the DS_v2- or FS-Series with Premium Storage. For larger loads, they recommend the GS- and LS-Series.

For other options, there’s the ES_v3 series (memory-optimized DS_v3), and the M-Series for huge amounts of RAM.

VM Availability

Availability sets distribute VMs across fault and update domains in a single cluster/data centre. You get a 99.95% SLA on the service for valid configurations. Use this for SQL clusters.

Managed disks offer easier IOPS management, particularly with Premium Disks (storage account has a limit of 20,000 IOPS). Disks are distributed to different storage stamps when the VM is in an availability set – better isolation for SQL HA or AlwaysOn.

High Availability

Provision a domain controller replica in a different availability set to your SQL VMs. This can be in the same domain as your on-prem domain (ExpressRoute or site-to-site VPN).

Use (Get-Cluster).SameSubnetThreshold = 20 to relax Windows Cluster failure detection for transient network failure.

Configure the cluster to ignore storage: there is no shared storage in Azure, which is why they recommend AlwaysOn. New-Cluster -Name $ClusterName -NoStorage -Node $LocalMachineName

Configure the Azure load balancer and backend pool. Register the IP address of the listener.

There are step-by-step instructions on MS documentation.

SQL Server Disaster Recovery

Store database backups in geo-replicated readable storage. Restore backups in a remote region (~30 min).

Availability group options:

  • Configure Azure as a remote region for on-premises
  • Configure on-prem as DR for Azure
  • Replicate to a remote Azure region – failover to the remote region in ~30s. Offload remote reads.

Automated Configuration

Some of these are provided by MS in the portal wizard:

  • Optimization to a target workload: OLTP/DW
  • Automated patching and shutdown. The latter is very new, and it reduces costs for dev/test workloads by shutting them down at the end of the workday.
  • Automated backup to a storage account, including user and system databases. Useful for a few databases, but there’s another option coming for larger collections.

Storage Options

They recommend LRS only, to keep write performance at a maximum. GRS storage is slower, and could lead to the database file being written/replicated before the log.

Premium Storage: high IOPS and low latency. Use Storage Spaces to increase capacity and performance. Enable host-based read caching in data disks for better IOPS/latency.

Backup to Premium Storage is 6x faster. Restore is 30x faster.

Azure VM Connectivity

  • Over the Internet.
  • Over site-site tunnel: VPN or ExpressRoute
  • Apps can connect transparently via a listener, e.g. Load Balancer.

Demo: Deployment

The speaker shows a PowerShell script. Not much point in blogging this. I prefer JSON anyway.

http://aka.ms/tigertoolbox is the script/tools/demos repository.


  • Physical security of the datacenter
  • Infrastructure security: virtual network isolation, and storage encryption including bring-your-own-key self-service encryption with Key Vault. Best practices and monitoring by Security Center.
  • Many certifications
  • SQL Security: auto-patching, database/backup encryption, and more.

VM Configuration for SQL Server

  • Use D-Series or higher.
  • Use Storage Spaces for performance of disks. Use Simple disks: the number of columns should equal the number of disks. For OLTP use 64KB interleave and use 256KB for data warehouse.
  • Do not use the system drive.
  • Put TempDB, logs, and databases on different volumes because of their different write patterns.
  • 64K allocation unit size.
  • Enable read caching on disks for data files and TempDB.
  • Do not use GRS storage.

SQL Configuration

  • Enable instant file initialization
  • Enable locked pages
  • Enable data page compression
  • Disable auto-shrink for your databases
  • Backup to URL with compressed backups – useful for a few VMs/databases. SQL 2016 does this very quickly.
  • Move all databases to data disks, including system databases (separate data and log). Use read caching.
  • Move SQL Server error log and trace file directories to data disks

Demo: Workload Performance of Standard Versus Premium Storage

A scripted demo. 2 scripts doing the same thing – one targeting a DB on a Standard disk (up to 500 IOPS) and the second targeting a DB on a Premium P30 (5,000 IOPS) disk. There’s table creation, 10,000 rows, inserts, more tables, etc. The scripts track the time required.

It takes a while, so he has some stats from previous runs. There’s only a 25% difference in the test. Honestly, that’s not indicative of the differences. He needs a better demo.

An IFI test shows that the bigger the database file is, the bigger the difference is in terms of performance – this makes sense considering the performance nature of flash storage.

Seamless Database Migration

There is a migration guide, and tools/services. http://datamigration.microsoft.com. One-stop shop for database migrations. Guidance to get from source to target. Recommended partners and case studies.


  • Data Migration Assistant: An analysis tool to produce a report.
  • Azure Database Migration Service (free service that runs in a VM): Works with Oracle, MySQL, and SQL Server to SQL Server, Azure SQL, Azure SQL Managed Instance. It works by backing up the DB on the source, moving the backup to the cloud, and restoring the backup.

Azure Backup

Today, SQL Server can back up from the SQL VM (Azure or on-prem) to a storage account in Azure. It’s all managed from SQL Server. Very distributed, no centralized reporting, difficult/no long-term retention. Very cheap.

Azure Backup will offer centralized management of SQL Backup in an Azure VM. In preview today. Managed from the Recovery Services Vault. You select the type of backup, and a discovery will detect all SQL instances in Azure VMs, and their databases. A service account is required for this and is included in the gallery images. You must add this service for custom VMs. You then configure a backup policy for selected DBs. You can define a full backup policy, incremental, and transactional backup policy with SQL backup compression option. The retention options are the familiar ones from Azure Backup (up to 99 years by the looks of it). The backup is scheduled and you can do ad-hoc/manual backups as usual with Azure Backup.

You can restore databases too – there’s a nice GUI for selecting a restore date/time. It looks like quite a bit of work went into this. This will be the recommended solution for centralized backup of lots of databases, and for those wanting long term retention.

Backup Verification is not in this solution yet.

Hyper-V Virtual NUMA Versus Dynamic Memory

When you are using VMs with a large amount of memory then NUMA topology becomes important. Hyper-V can reveal the underlying physical NUMA topology to the VM so that the guest OS and NUMA-aware apps (such as SQL Server) efficiently assign memory and schedule processes to make the most of the boundaries.

There is something important to note. Enabling Dynamic Memory in the settings of a VM disables virtual NUMA. That means that the vast majority of VMs will not have virtual NUMA. To squeeze the best processor/memory performance out of larger VMs you will need to use static RAM, as noted here under Virtual NUMA:

Virtual NUMA and Dynamic Memory features cannot be used at the same time. A virtual machine that has Dynamic Memory enabled effectively has only one virtual NUMA node, and no NUMA topology is presented to the virtual machine regardless of the virtual NUMA settings.

So you have a balancing act to do:

  • Applications and large VMs that might benefit from virtual NUMA probably should have static memory. Enabling Dynamic Memory would indirectly reduce the potential performance of the services provided by that VM because virtual NUMA would be disabled.
  • Note that workloads that are not NUMA-aware cannot make use of virtual NUMA. Therefore enabling Dynamic Memory will not impact performance, and it makes sense to optimize the RAM assignment.
  • Maybe service performance isn’t a big deal (!?!?!?) but the cost of RAM is. Then you would always (if the app/guest OS support it) enable Dynamic Memory.
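The balancing act above can be written down as a small decision helper. It is a hypothetical sketch that encodes the three bullets; it does not inspect a real VM or host. Applying a "Static" answer is done with the Hyper-V module's Set-VMMemory cmdlet, shown in a comment with a placeholder VM name and size.

```powershell
# Hypothetical helper encoding the trade-off above; it does not inspect a real VM.
function Get-VMMemoryRecommendation {
    param(
        [bool] $WorkloadIsNumaAware = $false,  # e.g. SQL Server in a large VM
        [bool] $PerformanceMatters = $true     # $false if the cost of RAM wins
    )
    if ($WorkloadIsNumaAware -and $PerformanceMatters) {
        # Dynamic Memory would leave the VM with a single virtual NUMA node
        'Static'
    }
    else {
        # No virtual NUMA benefit to lose (or cost wins), so optimise the RAM assignment
        'Dynamic'
    }
}

# Applying a 'Static' answer (placeholder VM name and size; the VM must be shut down):
# Set-VMMemory -VMName 'sql01' -DynamicMemoryEnabled $false -StartupBytes 64GB
```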

This is not ideal. Introducing a human decision into a cloud where uneducated “users” are deploying their own VMs makes things less efficient. Hopefully MSFT will overcome the Dynamic Memory versus virtual NUMA conflict in a future version, but when you think about it, this would be difficult to do.

Memory Page Combining

My reading of the Windows Server 2012 R2 (WS2012 R2) Performance and Tuning Guide continues and I’ve just read about a feature that I didn’t know about. Memory combining is a feature that was added in Windows 8 and Windows Server 2012 (WS2012) to reduce memory consumption. There isn’t too much text on it, but I think memory combining stores a single instance of pages if:

  • The memory is pageable
  • The memory is private

Enabling page combining may reduce memory usage on servers which have a lot of private, pageable pages with identical contents. For example, servers running multiple instances of the same memory-intensive app, or a single app that works with highly repetitive data, might be good candidates to try page combining.

Bill Karagounis talked briefly about memory combining in the old Sinofsky Building Windows 8 blog (where it was easy to be lost in the frequent 10,000 word posts):

Memory combining is a technique in which Windows efficiently assesses the content of system RAM during normal activity and locates duplicate content across all system memory. Windows will then free up duplicates and keep a single copy. If the application tries to write to the memory in future, Windows will give it a private copy. All of this happens under the covers in the memory manager, with no impact on applications. This approach can liberate 10s to 100s of MBs of memory (depending on how many applications are running concurrently).

The feature therefore does not improve things for every server:

Here are some examples of server roles where page combining is unlikely to give much benefit:

  • File servers (most of the memory is consumed by file pages which are not private and therefore not combinable)
  • Microsoft SQL Servers that are configured to use AWE or large pages (most of the memory is private but non-pageable)
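To make the mechanics concrete, here is a toy model of the behaviour described above. Only pages that are both private and pageable are candidates, and identical candidates are kept once. It is a hypothetical illustration, not how the Windows memory manager is implemented, and it ignores the copy-on-write step that gives a writer its own page back. On a real server, the switch is Enable-MMAgent -PageCombining, and Get-MMAgent reports the current state.

```powershell
# Toy model only: counts how many pages page combining could free.
# Each page is an object with Content (string), Private and Pageable (bool).
function Measure-PageCombining {
    param([Parameter(Mandatory)] [object[]] $Pages)
    # Only private, pageable pages are candidates; shared file pages and
    # non-pageable memory (e.g. SQL Server large pages) are left alone
    $candidates = @($Pages | Where-Object { $_.Private -and $_.Pageable })
    # Ordinal comparison so that contents differing only by case stay distinct
    $distinct = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::Ordinal)
    foreach ($page in $candidates) { [void]$distinct.Add($page.Content) }
    [pscustomobject]@{
        CandidatePages = $candidates.Count
        PagesKept      = $distinct.Count
        PagesFreed     = $candidates.Count - $distinct.Count
    }
}
```

A server full of shared file pages or non-pageable memory produces few candidates, which matches the examples above of roles that gain little.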

You can enable (memory) page combining using Enable-MMAgent and query the status using Get-MMAgent.

You’ll find that memory combining is enabled by default on Windows 8 and Windows 8.1.  That makes these OSs even more efficient for VDI workloads. It is disabled by default on servers – analyse your services to see if it will be appropriate.

There is a processor penalty for using memory combining. The feature is also not suitable for all workloads (see above).  So be careful with it.

WS2012 Hyper-V – Virtual Hard Disk (VHD) Block Fragmentation

The Performance Tuning Guidelines for Windows Server 2012 document is available and I’m reviewing and commenting on notable text in it.

This is a small but important note in the document:

Just as the allocations on a physical disk can be fragmented, the allocation of the blocks on a virtual disk can be fragmented when two virtually adjacent blocks are not allocated together on a virtual disk file.

The fragmentation percentage is reported for disks. If a performance issue noticed on a virtual disk, you should check the fragmentation percentage. When applicable, defragment the virtual disk by creating a new virtual disk with the data from the fragmented disk by using the Create from Source option.
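Get-VHD reports a FragmentationPercentage for each virtual disk, so checking a host can be scripted. The filter below is a hypothetical helper, and its 20% default is an arbitrary placeholder rather than a Microsoft recommendation. The comment shows Convert-VHD, which writes a new virtual disk from a source disk. Treating it as the scripted equivalent of the Create from Source option is an assumption.

```powershell
# Hypothetical helper: pass through any VHD object above a fragmentation threshold.
function Find-FragmentedVhd {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] [object] $Vhd,
        [int] $ThresholdPercent = 20   # placeholder cut-off
    )
    process {
        # Objects from Get-VHD expose FragmentationPercentage
        if ($Vhd.FragmentationPercentage -gt $ThresholdPercent) { $Vhd }
    }
}

# Example on a Hyper-V host (placeholder VM name and paths):
# Get-VMHardDiskDrive -VMName 'vm01' | ForEach-Object { Get-VHD -Path $_.Path } |
#     Find-FragmentedVhd | Select-Object Path, FragmentationPercentage
# Convert-VHD -Path 'D:\VMs\vm01.vhdx' -DestinationPath 'D:\VMs\vm01-new.vhdx'   # VM shut down
```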

Hmm, I’ve asked MSFT for more information on this one; I would have thought that we could defrag the LUN that the fragmented VHD was on, rather than create a whole new VHD. I’ll update this if I get an answer.

WS2012 Hyper-V – Choosing Between Storage Controllers

The Performance Tuning Guidelines for Windows Server 2012 document is available and I’m reviewing and commenting on notable text in it.

There are 3 types of storage controller:

  • IDE
  • SCSI
  • Virtual HBA


There are 2 IDE controllers (0 and 1), with each one having 2 channels.  In other words, you can have 4 devices attached to IDE.  Two quick notes:

  • Your boot drive must be attached to IDE.  You have no choice in this.  Before the VMware fanboy shite starts, this is a software controller and has no relevance to the hardware.  Your hosts don’t need IDE controllers.  It is a simulated virtual device and performs just as well as SCSI, as Ben Armstrong pointed out years ago.
  • The virtual CD/DVD drive will be mounted to the IDE controllers, usually IDE 1. Microsoft states that you can save host resources by removing this device if it is not used. Be careful: you need it for the usual manual Integration Services upgrade, and to install software from ISO (as you would in a cloud via the VMM library).

Adding or removing devices to IDE requires the VM to be powered down.


The virtual SCSI controller has nothing to do with hardware either.  It is a simulated virtual device.  A benefit is that it allows hot add of storage to a running VM.  WS2012 SCSI attached VHDX enables unmap to save physical disk space.  A single SCSI controller allows up to 64 attached disks.  You can have up to 4 SCSI controllers.  That’s 256 SCSI attached disks. That’s a lot of storage if you use 64 TB VHDX!

Virtual HBA

You can virtualise your host’s physical HBA ports to create virtual HBAs in the VMs on that host.  This allows your VMs to have their own WWNs and directly connect to the SAN, using NPIV (required).  If your SAN vendor supports it, you can run their DSM/MPIO to use multiple virtual HBAs in a VM.  This gives greater storage IO performance and provides fault tolerance if you design the virtual SANs correctly. 

Microsoft says you can do this for large LUNs.  I strongly urge you to use VHDX for this if you need up to 64 TB in your large LUN:

  • More flexible/mobile (storage migration), unlike LUNs that are physically bound to the SAN
  • Can be backed up at the host/storage level, unlike physical LUNs that can only be backed up by an agent in the VM

A really good reason to use virtual HBAs is virtual guest clusters, where you need some shared storage between the (up to 64) nodes in the guest cluster.

WS2012 Hyper-V – Host Memory Sizing

The Performance Tuning Guidelines for Windows Server 2012 document is available and I’m reviewing and commenting on notable text in it.

I have deleted the contents of this post. The information in the Performance and Tuning Guide was incorrect.