Connecting To A Third-Party Network From Azure Using NAT

An unfortunately common scenario is one where you must create a site-to-site network connection from your Azure network to a third-party network using NAT. This post will explain a few solutions.

The Scenario

There are those out there who think that every implementation in The Cloud is 100% under your control and is cloud-ready. But sometimes, you must fit in with other people’s designs and you can’t use cool integrations such as Private Link or an API. Sometimes you need to connect your network to a third party and they dictate the terms of the connection.

The connection is typically a site-to-site connection, usually VPN but I have seen ExpressRoute used. VPN means there are messy bits – you can control that with your own on-premises firewalls but you have no control over the VPN configuration of an externally owned firewall.

Site-to-site connections with a service provider mean that there could be IP address overlap. The only way to handle that is to use NAT – and that is not always natively possible in the platform, or it is really badly documented.

Solution 1: On-Premises Relay

In this scenario, the third-party will make a connection to your on-premises network. NAT is implemented on the on-premises network to translate your private Azure address to some “public address” (it is routed only over the private connections).

The connection between on-premises and Azure could be VPN or ExpressRoute.

This design is useful in two situations:

  1. You are using ExpressRoute – the ExpressRoute Gateway does not offer NAT functionality.
  2. The third-party insists that you use some kind of VPN configuration that is not supported in Azure, for example, GRE.

The downside with this design is that there might be additional latency between the third-party and your Azure network.

Solution 2: AWS Relay

Oh – did this post by an Azure MVP just mention AWS? Sure – there is a time and a place for everything.

This solution is similar to the on-premises relay solution but it replaces on-premises with AWS. This can be useful where:

  1. You want to minimise on-premises resources. AWS does support GRE so a VPN connection to a third-party that requires GRE can be handled in this way.
  2. You can use an AWS region that is close to the third party and/or your Azure region, minimising latency.

Note that the connection between AWS and Azure could be either VPN or ExpressRoute (with an ISP that supports both Azure ExpressRoute and AWS Direct Connect).

The downside is that there is still “more stuff” and a requirement for skills that you might not have. On the plus side, it offers compatibility with reduced latency.

Solution 3: Azure Relay

In this design, the third-party makes a connection to your Azure network(s) using ExpressRoute. But as usual, you must implement a NAT rule. The ExpressRoute Gateway cannot natively implement NAT, so you must deploy “an appliance” (an NVA or a Linux VM with NAT tables).

In the above design, there is a route table associated with the GatewaySubnet of the ExpressRoute Gateway. A user-defined route with a prefix of 40.40.40.4 will forward to the appliance as the next hop. A user-defined route on the VM’s subnet with a prefix of the third-party network(s) will use the appliance as the next hop.
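To make that concrete, here is a minimal Terraform (azurerm) sketch of those two user-defined routes – the appliance IP (10.10.8.68), the third-party prefix (192.0.2.0/24), and the resource group/subnet references are hypothetical placeholders:

# Route table for the GatewaySubnet: send the NAT’d address to the appliance
resource "azurerm_route_table" "gateway_subnet" {
  name                = "rt-gatewaysubnet"
  location            = azurerm_resource_group.network.location
  resource_group_name = azurerm_resource_group.network.name

  route {
    name                   = "nat-address-via-appliance"
    address_prefix         = "40.40.40.4/32"    # the "public" NAT address
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = "10.10.8.68"       # hypothetical appliance private IP
  }
}

# Route table for the VM subnet: send third-party prefixes to the appliance
resource "azurerm_route_table" "vm_subnet" {
  name                = "rt-vm-subnet"
  location            = azurerm_resource_group.network.location
  resource_group_name = azurerm_resource_group.network.name

  route {
    name                   = "third-party-via-appliance"
    address_prefix         = "192.0.2.0/24"     # hypothetical third-party network
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = "10.10.8.68"
  }
}

# Each route table is then associated with its subnet via azurerm_subnet_route_table_association.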

This design allows you to use ExpressRoute to connect to the third-party while also allowing you to implement NAT.

Solution 4: VPN Gateway & NAT

Other than using some modern solution, such as an authenticated API over HTTPS, this is probably “the best” scenario in terms of Azure resource simplicity.

The third-party connects to your Azure network using a site-to-site VPN. The connection is terminated in Azure using a VPN Gateway. The Azure VPN Gateway is capable of supporting NAT rules. Unfortunately, that’s where things begin to fall apart because of the documentation (quality and completeness).

This is a simple scenario where the third-party needs access to an IP address (VM or otherwise) hosted in your Azure network. The internal address of your Azure resource must be translated to a different external IP address.

As long as your VPN Gateway is VpnGw2/VpnGw2Az or higher, you can create NAT rules in the Gateway. The scenario that I have described requires a confusingly-named egress NAT rule – you translate internal IP addresses to external IP addresses to abstract the internal addresses, even for ingress traffic. An ingress NAT rule translates external IP addresses to internal addresses to abstract the external (third-party) addresses for ingress traffic.

The Terraform code for my scenario is shown below: I want to make my Azure resource with 10.10.8.4 available externally as 40.40.40.4 on TCP 443:
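Something like the following is a minimal sketch using the azurerm provider’s virtual network gateway NAT rule resource – the resource group and gateway references are placeholders:

resource "azurerm_virtual_network_gateway_nat_rule" "third_party_egress" {
  name                       = "third-party-egress-nat"
  resource_group_name        = azurerm_resource_group.network.name
  virtual_network_gateway_id = azurerm_virtual_network_gateway.vpn.id
  mode                       = "EgressSnat"   # translate my internal address for the third party
  type                       = "Static"       # 1:1 address mapping

  internal_mapping {
    address_space = "10.10.8.4/32"    # the real Azure address
  }

  external_mapping {
    address_space = "40.40.40.4/32"   # the address the third party will see/target
  }
}

# The TCP 443 restriction is enforced by NSG/firewall rules, not by the NAT rule itself.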

Once you have the NAT rule, you will associate it with the Connection resource for the VPN.
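Continuing the sketch, the association might look like this on the connection – the local network gateway and shared key references are placeholders:

resource "azurerm_virtual_network_gateway_connection" "third_party" {
  name                       = "cn-third-party"
  location                   = azurerm_resource_group.network.location
  resource_group_name        = azurerm_resource_group.network.name
  type                       = "IPsec"
  virtual_network_gateway_id = azurerm_virtual_network_gateway.vpn.id
  local_network_gateway_id   = azurerm_local_network_gateway.third_party.id
  shared_key                 = var.third_party_psk

  # Only this connection will use the NAT rule
  egress_nat_rule_ids = [azurerm_virtual_network_gateway_nat_rule.third_party_egress.id]
}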

And that’s it – 10.10.8.4 will be available as 40.40.40.4 on TCP 443 to the third-party – no other connection can use this NAT rule unless it is associated with it.

Solution 5: NVA & NAT

This is almost the same as the previous example, but an NVA is used instead of the Azure VPN Gateway, maybe because you like the vendor’s P2S VPN solution or you are using SD-WAN. The NAT rules are implemented in the NVA.

Connecting Azure Hub-And-Spoke Architectures Together

In this post, I will explain how you can connect multiple Azure hub-and-spoke (virtual data centre) deployments together using Azure networking, even across different Azure regions.

There is a lot to know here so here is some recommended reading that I previously published:

If you are using Azure Virtual WAN Hub then some things will be different and that scenario is not covered fully here – Azure Virtual WAN Hub has a feature for Any-to-Any routing that is in preview at the time of writing.

The Scenario

In this case, there are two hub-and-spoke deployments:

  • Blue: Multiple virtual networks covered by the CIDR of 10.1.0.0/16
  • Green: Another set of multiple virtual networks covered by the CIDR of 10.2.0.0/16

I’m being strategic with the addressing of each hub-and-spoke deployment, ensuring that a single CIDR will include the hub and all spokes of a single deployment – this will come in handy when we look at User-Defined Routes.

The two hub-and-spoke deployments could be in the same Azure region or in different Azure regions. The desired behaviour is that:

  • If any spoke wishes to talk to another spoke, it will route through the local firewall in the local hub.
  • All traffic coming into a spoke from an outside source, such as the other hub-and-spoke, must route through the local firewall in the local hub.

That would mean that Spoke 1 must route through Hub 1 and then Hub 2 to talk to Spoke 4. The firewall can be a third-party appliance or the Azure Firewall.

Core Routing

Each subnet in each spoke needs a route to the outside world (0.0.0.0/0) via the local firewall. For example:

  • The Blue firewall backend/private IP address is 10.1.0.132
  • A Route Table for each subnet is created in the Blue deployment and has a route to 0.0.0.0/0 via a virtual appliance with an IP address of 10.1.0.132
  • The Green firewall backend/private IP address is 10.2.0.132
  • A Route Table for each subnet is created in the Green deployment and has a route to 0.0.0.0/0 via a virtual appliance with an IP address of 10.2.0.132

Note: Some network-connected PaaS services, e.g. API Management or SQL Managed Instance, require additional routes to the “control plane” that will bypass the local firewall.
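As an illustration, here is a minimal Terraform sketch (azurerm 3.x-style syntax) of the Blue side – the spoke subnet reference is a placeholder, and Green mirrors this with 10.2.0.132:

resource "azurerm_route_table" "blue_spoke" {
  name                          = "rt-blue-spoke1"
  location                      = azurerm_resource_group.blue.location
  resource_group_name           = azurerm_resource_group.blue.name
  disable_bgp_route_propagation = true   # see the BGP note in the ExpressRoute section below

  route {
    name                   = "everywhere-via-firewall"
    address_prefix         = "0.0.0.0/0"
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = "10.1.0.132"   # Blue firewall backend/private IP
  }
}

resource "azurerm_subnet_route_table_association" "blue_spoke1" {
  subnet_id      = azurerm_subnet.blue_spoke1.id   # placeholder spoke subnet
  route_table_id = azurerm_route_table.blue_spoke.id
}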

Site-to-Site VPN

In this scenario, the organisation is connecting on-premises networks to 1 or more of the hub-and-spoke deployments with a site-to-site VPN connection. That connection goes to the hub of Blue and to the hub of Green.

To connect Blue and Green you will need to configure VNet Peering, which can work inside a region or across regions (using Microsoft’s low-latency WAN, the second-largest private WAN on the planet). Each end of the peering needs the following settings (the names of the settings change so I’m not checking their exact naming) – a Terraform sketch of one side follows the list:

  • Enabled: Yes
  • Allow Transit: Yes
  • Use Remote Gateway: No
  • Allow Gateway Sharing: No
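Here is one side of that hub-to-hub peering sketched in Terraform – the virtual network references are placeholders and a mirror peering is needed in the Green hub:

resource "azurerm_virtual_network_peering" "blue_hub_to_green_hub" {
  name                         = "peer-blue-hub-to-green-hub"
  resource_group_name          = azurerm_resource_group.blue.name
  virtual_network_name         = azurerm_virtual_network.blue_hub.name
  remote_virtual_network_id    = azurerm_virtual_network.green_hub.id
  allow_virtual_network_access = true    # Enabled
  allow_forwarded_traffic      = true    # Allow Transit
  allow_gateway_transit        = false   # Allow Gateway Sharing: No
  use_remote_gateways          = false   # Use Remote Gateway: No
}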

Let’s go back and do some routing theory!

That peering connection will add a hidden Default (“system”) route to each of the hub subnets:

  • Blue hub subnets: A route to 10.2.0.0/24
  • Green hub subnets: A route to 10.1.0.0/24

Now imagine you are a packet in Spoke 1 trying to get to Spoke 4. You’re sent to the firewall in Blue Hub 1. The firewall lets the traffic out (if a rule allows it) and now the packet sits in the egress/frontend/firewall subnet and is trying to find a route to 10.2.2.0/24. The peering-created Default route covers 10.2.0.0/24 but not the subnet for Spoke 4. So that means the default route to 0.0.0.0/0 (Internet) will be used and the packet is lost.

To fix this you will need to add a Route Table to the egress/frontend/firewall subnet in each hub:

  • Blue firewall subnet Route Table: 10.2.0.0/16 via virtual appliance 10.2.0.132
  • Green firewall subnet Route Table: 10.1.0.0/16 via virtual appliance 10.1.0.132

Thanks to my clever addressing of each hub-and-spoke, a single route will cover all packets leaving Blue and trying to get to any spoke in Green, and vice-versa.
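In Terraform, the Blue-side route might be sketched like this (the firewall subnet route table reference is a placeholder; the Green side mirrors it):

resource "azurerm_route" "blue_to_green" {
  name                   = "green-via-green-firewall"
  resource_group_name    = azurerm_resource_group.blue.name
  route_table_name       = azurerm_route_table.blue_firewall_subnet.name
  address_prefix         = "10.2.0.0/16"        # covers the Green hub and all Green spokes
  next_hop_type          = "VirtualAppliance"
  next_hop_in_ip_address = "10.2.0.132"         # Green firewall backend/private IP
}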

ExpressRoute

Now the customer has decided to use ExpressRoute to connect to Azure – Sweet! But guess what – you don’t need an expensive circuit for each hub-and-spoke.

You can share a single circuit across multiple ExpressRoute gateways:

  • ExpressRoute Standard: Up to 10 simultaneous connections to Virtual Network Gateways in 1+ regions in the same geopolitical region.
  • ExpressRoute Premium: Up to 100 simultaneous connections to Virtual Network Gateways in 1+ regions in any geopolitical region.

FYI, ExpressRoute connections to the Azure Virtual WAN Hub must be of the Premium SKU.

ExpressRoute is powered by BGP. All the on-premises routes that are advertised propagate through the ISP to the Microsoft edge router (“meet-me”) in the edge data centre. For example, if I want an ExpressRoute circuit to Azure West Europe (Middenmeer, Netherlands – not Amsterdam) I will probably (not always) get a circuit to the POP or edge data centre in Amsterdam. That gets me a physical low-latency connection onto the Microsoft WAN – and my BGP routes get to the meet-me router in Amsterdam. Now I can route to locations on that WAN. If I connect a VNet Gateway to that circuit to Blue in Azure West Europe, then my BGP routes will propagate from the meet-me router to the GatewaySubnet in the Blue hub, and then on to my firewall subnet.

BGP propagation is disabled in the spoke Route Tables to ensure all outbound flows go through the local firewall.

But that is not the extent of things! The hub-and-spoke peering connections allow Gateway Sharing from the hub and Use Remote Gateway from the spoke. With that configuration, BGP routes to the spoke get propagated to the GatewaySubnet in the hub, then to the meet-me router, through the ISP and then to the on-premises network. This is what our solution is based on.
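For contrast with the hub-to-hub peering earlier, here is a sketch of the hub/spoke peering pair that enables this – again, the references are placeholders:

resource "azurerm_virtual_network_peering" "blue_hub_to_spoke1" {
  name                         = "peer-hub-to-spoke1"
  resource_group_name          = azurerm_resource_group.blue.name
  virtual_network_name         = azurerm_virtual_network.blue_hub.name
  remote_virtual_network_id    = azurerm_virtual_network.blue_spoke1.id
  allow_virtual_network_access = true
  allow_forwarded_traffic      = true
  allow_gateway_transit        = true    # Gateway Sharing from the hub
}

resource "azurerm_virtual_network_peering" "blue_spoke1_to_hub" {
  name                         = "peer-spoke1-to-hub"
  resource_group_name          = azurerm_resource_group.blue.name
  virtual_network_name         = azurerm_virtual_network.blue_spoke1.name
  remote_virtual_network_id    = azurerm_virtual_network.blue_hub.id
  allow_virtual_network_access = true
  allow_forwarded_traffic      = true
  use_remote_gateways          = true    # Use Remote Gateway from the spoke
}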

Let’s imagine that the Green deployment is in North Europe (Dublin, Ireland). I could get a second ExpressRoute connection but:

  • It will add cost
  • It will not give me the clever solution that I want – though I could work around that with ExpressRoute Global Reach

I’m going to keep this simple – by the way, if I wanted Green to be in a different geopolitical region such as East US 2 then I could use ExpressRoute Premium to make this work.

In the Green hub, the Virtual Network Gateway will connect to the existing ExpressRoute circuit – no more money to the ISP! That means Green will connect to the same meet-me router as Blue. The on-premises routes will get into Green the exact same way as with Blue. And the routes to the Green spokes will also propagate down to on-premises via the meet-me router. That meet-me router knows all about the subnets in Blue and Green. And guess what BGP routers do? They propagate – so, the routes to all of the Blue subnets propagate to Green and vice-versa with the next hop (after the Virtual Network Gateway) being the meet-me router. There are no Route Tables or peering required in the hubs – it just works!

Now the path from Blue Spoke 1 to Green Spoke 4 is Blue Hub Firewall, Blue Virtual Network Gateway, <the Microsoft WAN>, Microsoft (meet-me) Router, <the Microsoft WAN>, Green Virtual Network Gateway, Green Hub Firewall, Green Spoke 4.

There are ways to make this scenario more interesting. Let’s say I have an office in London and I want to use Microsoft Azure. Some stuff will reside in UK South for compliance or performance reasons. But UK South is not a “hero region”, as Microsoft calls them. There might be more advanced features that I want to use that are only in West Europe. I could use two ExpressRoute circuits, one to UK South and one to West Europe. Or I could set up a single circuit to London to get me onto the Microsoft WAN and connect this circuit to both of my deployments in UK South and West Europe. I have a quicker route going Office > ISP > London edge data centre > Azure West Europe than going Office > ISP > Amsterdam edge data centre > Azure West Europe, because I have reduced the latency between me and West Europe by reducing the length of the ISP circuit and using the more direct Microsoft WAN. Just like with Azure Front Door, you want to get onto the Microsoft WAN as quickly as possible and let it get you to your destination as quickly as possible.

Microsoft Ignite 2019 – Global Transit Network Architectures With Azure Virtual WAN

Speakers:

  • Reshmi Yandapalli (main speaker), Principal Program Manager
  • Ben Peeri, KPMG customer story

Lots more content in the hidden slides in the download.

Scale

Usual stats. Interesting note: a new POP being built almost every day.

Azure WAN: Global Transit Architecture

The Beginning

  • HQ/Bigger Office
  • Branch office(s)
  • Users
  • Private WAN
  • Shared services

Start with HQ. Users multiply. VLANs multiply. Locations multiply. WAN grows. You grow:

  • Need to simplify network
  • Need ease of use
  • Need operational savings.

Azure Virtual WAN

  • Managed hub & spoke architecture, with hub being Azure and spokes being offices.
  • Public (VPN) and private (ExpressRoute) connectivity.
  • Global Scale:
  • 20 Gbps S2S VPN, 20 Gbps ExpressRoute, and 20 Gbps User VPN
  • 10K users per hub
  • 1000 sites per hub
  • 1 hub per region
  • Transit routing
  • Cloud Network Orchestration
  • Automated large-scale branch/SD-WAN CPE connectivity

Connectivity

What if you had many regions – many hubs. And what if you wanted any branch to access any Azure VNet, regardless of local vWAN hub? In other words, connect to a hub, and use the Azure WAN to seamlessly reach the destination. So you build hub/spoke in different Azure regions, each with a vWAN hub. And a branch connects to the closest vWAN hub, and can get to any Azure VNet via transitive routing between vWAN hubs across the Azure WAN.

  • Simplified network
  • Ease of use
  • Operational savings

This is called Global Transit Architecture over Azure Virtual WAN.

Azure Virtual WAN – What’s New

  • Any-to-Any connectivity (Preview, soon GA)
  • ExpressRoute and User VPN GA
  • ExpressRoute encryption
  • Multi-link Azure Path Selection
  • Custom IPsec
  • Connect VNG VPN to Virtual WAN
  • Available in Gov Cloud & China
  • Azure Firewall integration (Preview) – this is the big announcement IMO
  • Pricing – reduced
  • New partnerships coming soon
    • Arista,
    • Aruba
    • Cisco
    • F5
    • OpenSystems
    • VeloCloud

Global Transit Architecture – A Customer Example

  • 4 regions, 70 countries with hundreds of sites. 34 VNets, 2 ExpressRoute Premium circuits.
  • Challenges: scale issues, routing complexity, ER VNet limits

The before and after architecture diagrams are totally different – the “after” is much simpler.

Azure Virtual WAN Types

Basic:

  • VPN only
    • Branch to Azure
    • Branch to Branch
  • Connect VNet
    • DIY VNet peering, VNet to VNet non-transitive via hub
    • Hubs are not connected

Standard = Basic + the following:

  • Stuff

Multi-Link Support in VPN Site

Support dual links of different types/ISPs. Azure sees the link information. The branch partner can do path selection across these links.

Barracuda CloudGen Firewall is the first to support this. You get always-on Azure in the branch.

ExpressRoute

  • GA in Standard Virtual WAN.
  • Up to 20 Gbps aggregate per hub.
  • Private connectivity – requires premium circuit.
  • In Global Reach Location
  • ExpressRoute VPN Interconnect
  • Integrated with Azure Monitor

ExpressRoute + VPN Path Selection

Path selection between ER and VPN. Fortinet can do this.

Customer Story – Ben Peeri, KPMG

No notes here – sales story.

User VPN

  • Available in Standard Virtual WAN
  • Up to 20 Gbps aggregate and 10K users per hub
  • Cloud based secure remote access
    • Works with OpenVPN and IKEv2 clients
    • Cert-based and RADIUS authentication
  • Any-to-Any
    • User to branch, user to Azure VNet
  • More

Azure Firewall

  • Firewall in virtual hub
  • Centralized policy and route management
    • VNet to Internet through Azure Firewall
    • Branch to Internet through Azure Firewall
    • Managed through Azure Firewall Manager

Azure MSP Program

Announced in July. Focused on networking. Offerings in Azure Marketplace.

Pricing

  • Connection Unit
    • Site-to-site VPN / ExpressRoute: No reduced
    • User VPN
  • Scale Unit – aggregate throughput
    • 1 VPN scale unit
    • 1 ER scale unit
  • Virtual Hub (Effective CYQ1 2020)
    • Basic vWAN hub: no charge
    • Standard hub
    • Data processing intra region
    • Data processing inter region

Private Connections to Azure PaaS Services

In this post, I’d like to explain a few options you have to get secure/private connections to Azure’s platform-as-a-service offerings.

ExpressRoute – Microsoft Peering

 

ExpressRoute comes in a few forms but, at a basic level, it’s a “WAN” connection to Azure virtual networks via one or more virtual network gateways; customers use this private peering to connect on-premises networks to Azure virtual networks over an SLA-protected private circuit. However, there is another form of peering that you can do over an ExpressRoute circuit called Microsoft peering. This is where you can use your private circuit to connect to Microsoft cloud services that are normally connected to over the public Internet. What you get:

  • Private access to PaaS services from your on-premises networks.
  • Access to an entire service, such as Azure SQL.
  • A wide array of Azure and non-Azure Microsoft cloud services.

FYI, Office 365 is often mentioned here. In theory, you can access Office 365 over Microsoft peering/ExpressRoute. However, the Office 365 group must first grant you permission to do this – the last I checked, you had to have legal proof of a regulatory need for private access to Cloud services. 

Service Endpoint

Imagine that you are running some resources in Azure, such as virtual machines or App Service Environment (ASE); these are virtual network integrated services. Now consider that these services might need to connect to other services such as storage accounts, Azure SQL, or others. Normally, when a VNet connected resource is communicating with, say, Azure SQL, the packets will be routed to “Internet” via the 0.0.0.0/0 default route for the subnet – “Internet” is everywhere outside the virtual network, not necessarily The Internet. The flow will hit the “public” Azure backbone and route to the Azure SQL compute cluster. There are two things about that flow:

  • It is indirect and introduces latency.
  • It traverses a shared network space.

A growing number of services, including storage accounts, Azure SQL, Cosmos DB, and Key Vault, have service endpoints available to them. You can enable a service endpoint anywhere in the route from the VM (or whatever) to “Internet” and the packets will “drop” through the service endpoint to the required Azure service – make sure that any firewall in the service accepts packets from the private subnet IP address of the source (VM or whatever). Now you have a more direct and more private connection to the platform service in Azure from your VNet. What you get:

  • Private access to PaaS services from your Azure virtual networks.
  • Access to an entire service, such as Azure SQL, but you can limit this to a region.
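As an illustration, here is a minimal Terraform sketch of enabling service endpoints on a workload subnet – the names and address prefix are placeholders, and remember to also allow the subnet on the target service’s firewall:

resource "azurerm_subnet" "workload" {
  name                 = "workload"
  resource_group_name  = azurerm_resource_group.example.name
  virtual_network_name = azurerm_virtual_network.example.name
  address_prefixes     = ["10.10.1.0/24"]

  # Traffic to these services "drops" through the endpoints instead of routing towards "Internet"
  service_endpoints = ["Microsoft.Sql", "Microsoft.Storage", "Microsoft.KeyVault"]
}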

Service Endpoint Trick #1

Did you notice in the previous section on service endpoints that I said:

You can enable a service endpoint anywhere in the route from the VM (or whatever) to “Internet”

Imagine you have a complex network and not everyone enables service endpoints the way that they should. But you manage the firewall, the public IPs, and the routing. Well, my friend, you can force traffic to supported Azure platform services via service endpoints. If you have a firewall, then your routes to “Internet” should direct outbound traffic through the firewall. In the firewall (frontend) subnet, you can enable all the Azure service endpoints. Now when packets egress the firewall, they will “drop” through the service endpoints to the desired Azure platform service, without ever reaching “Internet”.

Service Endpoint Trick #2

You might know that I like Azure Firewall. Here’s a trick that the Azure networking teams shared with me – it’s similar to the above one but is for on-premises clients trying to access Azure platform services.

You’ve got a VPN connection to a complex virtual network architecture in Azure. And at the frontend of this architecture is Azure Firewall, sitting in the AzureFirewallSubnet; in this subnet you have enabled all the available service endpoints. Let’s say that someone wants to connect to Azure SQL using Power BI on their on-premises desktop. Normally that traffic will go over the Internet. What you can do is configure name resolution on your network (or PC) for the database to point at the private IP address of the Azure Firewall. Now Power BI will forward traffic to Azure Firewall, which will relay the traffic to Azure SQL via the service endpoint. What you get:

  • Private access to PaaS services from your on-premises or Azure networks.
  • Access to individual instances of a service, such as an Azure SQL server
  • A growing number of Azure-only services that support service endpoints.

Private Link

In this post, I’m focusing on only one of the 3 current scenarios for Private Link, which is currently in unsupported preview in limited US regions only, for limited platform services – in other words, it’s early days.

This approach aims to give a similar solution to the above “Service Endpoint Trick #2” without the use of trickery. You can connect an instance of an Azure platform service to a virtual network using Private Link. That instance will now have a private IP address on the VNet subnet, making it fully routable on your virtual network. The private link gets a globally unique record in the Microsoft-managed privatelink.database.windows.net DNS zone. For example, your Azure SQL Server would now be resolvable to the private IP address of the private link as yourazuresqlsvr.privatelink.database.windows.net. Now your clients, be they in Azure or on-premises, can connect to this DNS name/IP address to connect to this Azure SQL instance. What you get:

  • Private access to PaaS services from your on-premises or Azure networks.
  • Access to individual instances of a service, such as an Azure SQL server.
  • (PREVIEW LIMITATIONS) A limited number of platform services in limited US-only regions.
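For illustration only, a private endpoint for an Azure SQL server can be sketched in Terraform like this – it assumes the azurerm provider support that arrived after the preview described here, and the names are placeholders:

resource "azurerm_private_endpoint" "sql" {
  name                = "pe-sql"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  subnet_id           = azurerm_subnet.workload.id   # the endpoint gets a private IP in this subnet

  private_service_connection {
    name                           = "sql-connection"
    private_connection_resource_id = azurerm_mssql_server.example.id
    subresource_names              = ["sqlServer"]
    is_manual_connection           = false
  }
}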

Creating an Azure Service for Slow Moving Organisations

In this post, I will explain how you can use Azure’s Public IP Prefix feature to pre-create public IP addresses to access Azure services when you are working with big/government organisations that can take weeks to configure a VPN tunnel, outbound firewall rule, and so on.

In this scenario, I need a predictable IP address so that means I must use the Standard SKU address tier.

The Problem

It normally only takes a few minutes to create a firewall rule, a VPN tunnel, etc. in an on-premises network. But sometimes it seems to take forever! I’ve been in that situation – you’ve set up an environment for the customer to work with, but their on-premises networking team(s) are slow to do anything. And you only wish that you had given them all the details that they needed earlier in the project so their configuration work would end when your weeks of engineering were wrapping up.

But you won’t know the public IP address until you create it. And that is normally only created when you create the virtual network gateway, Azure Firewall, Application Firewall, etc. But what if you had a pool of Azure public IP addresses that were pre-reserved and ready to share with the network team. Maybe they could be used to make early requests for VPN tunnels, firewall rules, and so on? Luckily, we can do that!

Public IP Prefix

An Azure Public IP Prefix is a set of reserved public IP addresses (PIPs). You can create an IP Prefix of a certain size, from /31 (2 addresses) to /24 (256 addresses), in a certain region. The pool of addresses is a contiguous block of predictable addresses. And from that pool, you can create public IP addresses for your Azure resources.

In my example, I want a Standard tier IP address and this requires a Standard tier Public IP Prefix. Unfortunately, the Azure Portal doesn’t allow for this with Public IP Prefix, so we need some PowerShell. First, we’ll define some reused variables:

$rgName = "test"
$region = "westeurope"
$ipPrefixName = "test-ipfx"

Now we will create the Public IP Prefix. Note that the length refers to the subnet mask length. In my example that’s a /30, resulting in a prefix with 4 reserved public IP addresses:

$ipPrefix = New-AzPublicIpPrefix -Name $ipPrefixName -ResourceGroupName $rgName -PrefixLength 30 -Sku Standard -Location $region

You’ll note above that I used Standard in the command. This creates a pool of static Standard tier public IP addresses. I could have dropped the Standard, and that would have created a pool of static Basic tier IP addresses – you can use the Azure Portal to deploy Basic tier Public IP Prefix and public IP addresses from that prefix. The decision to use Standard tier or Basic tier affects what resources I can deploy with the addresses:

  • Standard: Azure Firewall, zone-redundant virtual network gateways, v2 application gateways/firewalls, standard tier load balancers, etc.
  • Basic static: Basic tier load balancers, v1 application gateways/firewalls, etc.

Note that the non-zone redundant virtual network gateways cannot use static public IP addresses and therefore cannot use Public IP Prefix.

Creating a Public IP Address

Let’s say that I have a project coming up where I need to deploy an Application Firewall and I know the on-premises network team will take weeks to allow outbound access to my new web service. Instead of waiting until I build the application, I can reserve the IP address now, tell the on-premises firewall team to allow it, and then work on my project. Hopefully, by the time I have the site up and running and presented to the Internet by my Application Firewall, they will have created the outbound firewall rule from the company network.

Browse to the Public IP Prefix and make sure that it is in the same region as the new virtual network and virtual network gateway. Open the prefix and check Allocated IP Addresses in the Overview. Make sure that there is free capacity in the reserved block.

Now I can continue to use my variables from above and create a new public IP address from one of the reserved addresses in the Public IP Prefix:

New-AzPublicIpAddress -Name "test-vpn-pip" -ResourceGroupName $rgName -AllocationMethod Static -DomainNameLabel "test-vpn" -Location $region -PublicIpPrefix $ipPrefix -Sku Standard

Use the Public IP Address

I now have everything I need to pass onto the on-premises network team in my request. In my example, I am going to create a v2 Application Firewall.

Once I configure the WAF, the on-premises firewall will (hopefully) already have the rule to allow outbound connections to my pre-reserved IP address and, therefore, my new web service.

Azure Availability Zones in the Real World

I will discuss Azure’s availability zones feature in this post, sharing what they can offer for you and some of the things to be aware of.

Uptime Versus SLA

Noobs to hosting and cloud focus on three magic letters: S, L, A or service level agreement. This is a contractual promise that something will be running for a certain percentage of time in the billing period or the hosting/cloud vendor will credit or compensate the customer.

You’ll hear phrases like “three nines”, or “four nines” to express the measure of uptime. The first is a 99.9% measure, and the second is a 99.99% measure. Either is quite a high level of uptime – over a 30-day month, 99.9% allows roughly 43 minutes of downtime, while 99.99% allows only around 4.3 minutes. Azure does have SLAs for all sorts of things. For example, a service deployed in a valid virtual machine availability set has a connectivity (uptime) SLA of 99.95%.

Why did I talk about noobs? Promises are easy to make. I once worked for a hosting company that offered a ridiculous 100% SLA for everything, including cheap-ass generic Pentium “servers” from eBay with single IDE disks. 100% is an unachievable target because … let’s be real here … things break. Even systems with redundant components have downtime. I prefer to see realistic SLAs and honest statements on what you must do to get that guarantee.

Azure gives us those sorts of SLAs. For virtual machines we have:

  • 99.9% for machines with just Premium SSD disks
  • 99.95% for services running in a valid availability set
  • 99.99% for services running in multiple availability zones

Ah… let’s talk about that last one!

Availability Sets

First, we must discuss availability sets and what they are before we move one step higher. An availability set is a form of anti-affinity – a feature found in vSphere and in Hyper-V Failover Clustering (via PowerShell or SCVMM) – a label on a virtual machine that instructs the compute cluster to spread the virtual machines across different parts of the cluster. In Azure, virtual machines in the same availability set are placed into different:

  • Update domains: Avoiding downtime caused by (rare) host reboots for updates.
  • Fault domains: Enable services to remain operational despite hardware/software failure in a single rack.

The above solution spreads your machines around a single compute (Hyper-V) cluster, in a single room, in a single building. That’s amazing for on-premises, but there can still be an issue. Last summer, a faulty humidity sensor brought down one such room and affected a “small subset” of customers. “Small subset” is OK, unless you are included and some mission critical system was down for several hours. At that point, SLAs are meaningless – a refund for the lost runtime cost of a pair of Linux VMs running network appliance software won’t compensate for thousands or millions of Euros of lost business!

Availability Zones

We can go one step further by instructing Azure to deploy virtual machines into different availability zones. A single region can be made up of different physical locations with independent power and networking. These locations might be close together, as is typically the case in North Europe or West Europe. Or they might be on the other side of a city from each other, as is the case with some regions in North America. There is a low level of latency between the buildings, but this is still higher than that of a LAN connection.

A region that supports availability zones is split into 4 zones. You see three zones (round robin between customers), labeled as 1, 2, and 3. You can deploy many services across availability zones – this is improving:

  • VNet: Is software-defined so can cross all zones in a single region.
  • Virtual machines: Can connect to the same subnet/address space but be in different zones. They are not in availability sets but Azure still maintains service uptime during host patching/reboots.
  • Public IP Addresses: Standard IP supports anycast and can be used to NAT/load balance across zones in a single region.

Other network resources can work with availability zones in one of two ways:

  • Zonal: Instances are deployed to a specific zone, giving optimal latency performance within that zone, but can connect to all zones in the region.
  • Zone Redundant: Instances are spread across the zones for an active/active configuration.

Examples of the above are:

  • The zone-aware VNet gateways for VPN/ExpressRoute
  • Standard load balancer
  • WAGv2 / WAFv2
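As a small illustration, here is a zone-redundant Standard public IP versus a zonal one, sketched in Terraform (assuming a recent azurerm provider; the names are placeholders):

# Zone redundant: the frontend survives the loss of any single zone
resource "azurerm_public_ip" "frontend" {
  name                = "pip-frontend"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  allocation_method   = "Static"
  sku                 = "Standard"
  zones               = ["1", "2", "3"]
}

# Zonal: pinned to a single zone, e.g. zones = ["1"]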

Considerations

There are some things to consider when looking at availability zones.

  • Regions: The list of regions that support availability zones is increasing slowly but it is far from complete. Some regions will not offer this highest level of availability.
  • Catchup: Not every service in Azure is aware of availability zones, but this is changing.

Let me give you two examples. The first is VM Boot Diagnostics, a service that I consider critical for seeing the console of the VM and getting serial console access without a network connection to the virtual machine. Boot Diagnostics uses an agent in the VM to write to a storage account. That storage account can be:

  • LRS: 3 replicas reside in a single storage cluster, in a single room, in a single building (availability zone).
  • GRS: LRS plus 3 asynchronous replicas in the paired region, that are not available for write unless Microsoft declares a total disaster for the primary region.

So, if I have a VM in zone 1 and a VM in zone 2, and both write to a storage account that happens to be in zone 1 (I have no control over the storage account location), and zone 1 goes down, there will be issues with the VM in zone 2. The solution would be to use ZRS GPv2 storage for Boot Diagnostics, however, the agent will not support this type of storage configuration. Gotcha!

Azure Advisor will also be a pain in the ass. Noobs are told to rely on Advisor (it features in several questions in the new Azure infrastructure exams) for configuration and deployment advice. Advisor will see the above two VMs as being not highly available because they are not (and cannot be) in a common availability set, so you are advised to degrade their SLA by migrating them to a single zone for an availability set configuration – ignore that advice and be prepared to defend the decision from Azure noobs, such as management, auditors, and ill-informed consultants.

Opinion

Availability zones are important – I use them in an architecture pattern that I am working on with several customers. But you need to be aware of what they offer and how certain things do not understand them yet or do not support them yet.

 

Cannot Create a Basic Tier Virtual Network Gateway in Azure

There is a bug in the Azure Portal that prevents you from selecting a virtual network when you pick the Basic Tier of the virtual network gateway, and you are forced into selecting the more expensive VpnGw1. I’ll show you how to workaround this bug in this post.

Background

I recently ran a hands-on Azure class in London. Part of the class required deploying & configuring a VPN gateway in the West Europe region. I always use the Basic tier because:

  • It’s cheaper – $26.79 for Basic versus $141.36 for VpnGw1 per month
  • That’s what most (by a long shot) of my customers deploy in production because it meets their needs.

I’ve had a customer in Northern Ireland report the same problem in North Europe.

The process goes like this:

  1. You select VPN gateway type
  2. Select Route-Based
  3. Select Basic as the SKU
  4. Then you attempt to select the virtual network that you want to use – it already has a gateway subnet
  5. You cannot continue because the virtual network is greyed out


The error shown is:

The following issues must be fixed to use this virtual network: The VPN gateway cannot have a basic SKU in order for it to coexist with an existing ExpressRoute gateway.

In all cases so far, the subscriptions have been either brand new CSP/trial subscriptions with no previous resources, or my lab subscription where I’ve used a new virtual network to demonstrate this scenario – and I have never deployed ExpressRoute in any subscription.

Workaround

Credit where credit is due – some of my attendees last week figured out how to beat the UI bug.

  1. Close the Choose Virtual Network blade if it is open.
  2. Select the VpnGw1 tier gateway in the Create Virtual Network Gateway blade – don’t worry, you won’t be creating it if you don’t want to pay the price.
  3. Click Choose A Virtual Network
  4. Select your virtual network
  5. Change the SKU of the gateway back to Basic
  6. Finish the wizard


I know – it’s a daft UI bug, but the above workaround works.