Azure Virtual WAN – Connectivity

In this post, I’ll explain how Azure Virtual WAN offers its core service: connections.

SD-WAN

Some of you might be thinking – this is just for large corporations and I’m outta here. Don’t run just yet. Azure Virtual WAN is a rethinking of how to:

  • Connect users to Azure services and on-premises at the same time
  • Connect sites to Azure and (optionally) other sites
  • Replace the legacy hardware-defined WAN
  • Connect Azure virtual networks together.

That first point is quite timely: connecting users to services. Work-from-home (WFH) has forced enterprises to find ways to connect users to services no matter where they are. That connectivity was often limited to a privileged few. The pandemic forced organisations of all sizes to re-think connectivity and to scale it out. Before COVID-19 struck, I was starting to encounter businesses that were considering replacing their legacy MPLS WAN with a software-defined WAN (SD-WAN), and some had even started. In an SD-WAN, media of different types, suitable to different kinds of sites/users/services, are aggregated via appliances. This SD-WAN is lower cost and more flexible and, by leveraging local connectivity, enables smaller locations, such as offices or retail outlets, to have an affordable direct connection to the cloud for better performance. How the on-premises part of the SD-WAN is managed is completely up to you; some will take direct control and some will outsource it to a network service provider.

Connections

Azure Virtual WAN is all about connections. When you start to read about the new Custom Routing model in Azure Virtual WAN, you’ll see how route tables are associated with connections. In summary, a connection is a link between a Hub and either an on-premises location (referred to as a branch, even if it’s HQ) or a spoke virtual network. And now we need to talk about some Azure resources.

Azure Resources

I’ve provided lots more depth on this topic elsewhere so I will keep this to the basics. There are two core resources in Azure Virtual WAN:

  • A Virtual WAN
  • A Hub

The Virtual WAN is a logical resource that provides a global service, although it is actually located in one Azure region. Any hubs that are connected to this Virtual WAN resource can talk to each other (automatically), route their connections to another hub’s connections, and share resources.

A Virtual WAN Hub is similar to a hub in an Azure hub & spoke architecture. It is a central routing point (with a hidden virtual router) that is the meeting point for any connections to that hub. An Azure region can have 1 hub in your tenant. That means I can have 1 Hub in West Europe and 1 Hub in East US. The Hubs must be connected to a Virtual WAN resource; if they share a Virtual WAN resource then their connections can talk to each other. I might have all my branches in Europe connect to the Hub in West Europe, and I will connect all my spoke virtual networks in West Europe to the Hub in West Europe too; this means that by default (and I can control this):

  • The virtual networks can route to each other
  • The virtual networks can route to the branches
  • The branches can route to the virtual networks
  • The branches can route to other branches

We can extend this routing by connecting my branches in North America to the East US Hub and the spoke virtual networks in East US to the East US Hub. Yes; all those North American locations can route to each other. Because the Hubs are connected to a common Virtual WAN, the routing now extends across the Microsoft WAN. That means a retail outlet in the further reaches of northwest rural Ireland can connect to services hosted in East US, via a connection to the Hub in West Europe, and then hopping across the Atlantic Ocean using Microsoft’s low-latency WAN. Nice, right? Even better: it routes just like that automatically if you are using SD-WAN appliances in the branches.

Note that a managed WAN might wire up that retail outlet differently, but still provide a fairly low-latency connection to the local Hub.
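
To make the default any-to-any behaviour described above a little more concrete, here is a minimal Python sketch. It is only a toy model, not how the hub router is implemented: the VirtualWan class, its methods, and the connection names are all made up for illustration, and it assumes the default routing configuration with no custom route tables.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class VirtualWan:
    """Toy model: hub name -> names of the connections attached to that hub."""
    hubs: Dict[str, Set[str]] = field(default_factory=dict)

    def connect(self, hub: str, endpoint: str) -> None:
        # A connection links a branch, user, or spoke virtual network to a hub.
        self.hubs.setdefault(hub, set()).add(endpoint)

    def hub_of(self, endpoint: str) -> Optional[str]:
        for hub, endpoints in self.hubs.items():
            if endpoint in endpoints:
                return hub
        return None

    def path(self, source: str, destination: str) -> Optional[List[str]]:
        src_hub, dst_hub = self.hub_of(source), self.hub_of(destination)
        if src_hub is None or dst_hub is None:
            return None  # one end is not connected to this Virtual WAN
        if src_hub == dst_hub:
            return [source, src_hub, destination]
        # Hubs on the same Virtual WAN route to each other by default.
        return [source, src_hub, dst_hub, destination]


# Hypothetical connection names, loosely based on the example in the text.
wan = VirtualWan()
wan.connect("West Europe", "retail-outlet-ireland")
wan.connect("West Europe", "spoke-vnet-westeurope")
wan.connect("East US", "branch-philadelphia")
wan.connect("East US", "spoke-vnet-eastus")

print(wan.path("retail-outlet-ireland", "spoke-vnet-eastus"))
```

The Irish retail outlet reaches the East US spoke by entering at its local hub and crossing hub-to-hub, which is the path described in the paragraph above.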

Branch Connections

If you have done any Azure networking then you are probably familiar with:

  • Site-to-site VPN: Connecting a location with a cost-effective but no-SLA VPN tunnel to Azure.
  • ExpressRoute: A circuit rented from an ISP for a low-latency, high-bandwidth, SLA-supported private connection to Azure.
  • Point-to-Site VPN: Enabling end-users to create a private VPN tunnel to Azure from their devices while on the move or working from home

Each of the above is enabled in Azure using a Virtual Network Gateway, each running independently. Routing from branch to branch is not an intended purpose. Routing from user to the branch is not an intended purpose. The Virtual Network Gateway’s job is to connect a user to Azure.

The Azure Virtual WAN Hub supports gateways – as hidden resources that must be enabled and configured. All three of the above media types are supported as 3 different types of gateway, sized based on a billing concept called scale units – more scale units means more bandwidth and more cost, with a maximum hub throughput of 40 Gbps (including traffic to/from/between spokes).
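
Sizing a gateway comes down to simple arithmetic: round the bandwidth you need up to a whole number of scale units. The sketch below is a rough illustration only. The bandwidth per scale unit differs by gateway type, so it is passed in as a parameter, and the 500 Mbps used in the example is an assumed figure for illustration; check the current Azure documentation for the real values. The function name is made up.

```python
HUB_MAX_MBPS = 40_000  # the 40 Gbps hub ceiling quoted above


def scale_units_needed(required_mbps: int, mbps_per_unit: int) -> int:
    """Smallest whole number of scale units that covers required_mbps."""
    if required_mbps <= 0 or mbps_per_unit <= 0:
        raise ValueError("bandwidth values must be positive")
    if required_mbps > HUB_MAX_MBPS:
        raise ValueError("requirement exceeds the hub throughput ceiling")
    return -(-required_mbps // mbps_per_unit)  # ceiling division on integers


# ASSUMPTION: 500 Mbps per scale unit is an illustrative figure only.
units = scale_units_needed(required_mbps=1200, mbps_per_unit=500)
print(units)
```

Because billing follows scale units, rounding up means you pay for a little headroom rather than running short of bandwidth.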

Note that a Secured Virtual Hub, featuring the Azure Firewall, has a limit of 30 Gbps if all traffic is routed through that firewall.

You can be flexible with the branch connections. Some locations might be small and have a VPN connection to the Hub. Other locations might require an SLA and use ExpressRoute. Some might require low latency or greater bandwidth and use higher SKUs of ExpressRoute. And of course, some users will be on the move or at home and use P2S VPN. A combination of all 3 connection types can be used at once, providing each location and user the connections and costs that suit them best.

ExpressRoute

You will be using ExpressRoute Standard for Azure Virtual WAN; this is a requirement. I don’t think there’s really too much more to say here: the tech just works once the circuit is up, and a combination of Global Reach and the any-to-any connections/routing of Azure Virtual WAN means that things will just work.

Site-to-Site VPN

The VPN gateway is deployed in an active/active cluster configuration with two public IP addresses. A branch using VPN for connectivity can have:

  • A single VPN connection over a single ISP connection.
  • Resilient VPN connections over two ISP connections, ideally with different physical providers or even media types.

An on-premises SD-WAN appliance is strongly recommended for Azure Virtual WAN, but you can use any VPN appliance that Microsoft Azure supports for route-based VPN. If you are doing the latter, you can route to on-premises using either BGP or the Azure Virtual WAN alternative to Local Network Gateway-provided prefixes.

Point-to-Site (P2S) VPN

The P2S gateway offers a superior service to what you might have observed with the traditional Virtual Network Gateway for VPN. Connectivity from the user device is to a hub with a routing appliance. Any-to-any connectivity treats the user device as a branch, albeit in a dedicated network address space. Once the user has connected the VPN tunnel, they can route to (by default):

  • Any spoke virtual network connected to the Hub
  • Any spoke virtual network connected to another Hub on the same Virtual WAN
  • Any branch office connected to any Hub on the Virtual WAN

In summary, the user is connected to the WAN as a result of being connected to the Hub and is subject to the routing and firewall configurations of that Hub. That’s a pretty nice WFH connectivity solution.

Note that you have support for certificate and RADIUS authentication in P2S VPN, as well as the OpenVPN and Microsoft clients.

The Connectivity Experience

Imagine we’re back in normal times again with common business travel. A user in Amsterdam could sit down at their desk in the office and connect to services in West Europe via VPN. They could travel to a small office in Luxembourg and connect to the same services via VPN with no discernible difference. That user could travel to a conference in London and use P2S VPN from their hotel room to connect via the Amsterdam Hub. Now that user might get a jet to Philadelphia, and use their mobile hotspot to offer connectivity to the Azure Virtual WAN Hub in East US via P2S VPN – and the experience is no different!

One concept I would like to try out and get a support statement on is to abstract the IP addresses and locations of the P2S gateways using Azure Traffic Manager so the user only needs to VPN to a single FQDN and is directed (using the performance profile) to the closest (latency) Hub in the Virtual WAN with a P2S gateway.

Simplicity

So much is done for you with Azure Virtual WAN. If you like to click in the Azure Portal, it’s a pretty simple setup to get things going, although security engineering looks to have a steep learning curve with Custom Routing. By default, everything is connected to everything; that’s what a network should do. You shouldn’t have to figure out how to route from A to B. I believe that Azure Virtual WAN will offer a superior connectivity solution, even for a single-location organisation. That’s why I’ve been spending time figuring this tech out over the last few weeks.

Azure Virtual WAN ARM – The Chicken & Egg Gateway ID Discombobulation

This post will explain how to deal with the gateway ID properties in the Azure Microsoft.Network/virtualhubs resource when using ARM templates.

Background

The Azure Virtual WAN Hub is capable of having 3 gateway sub-resources:

  • Point-to-site VPN: Microsoft.Network/p2sVpnGateways
  • VPN (site-to-site): Microsoft.Network/vpnGateways
  • ExpressRoute: Microsoft.Network/expressRouteGateways, which does not support diagnostic settings in the 2020-04-01 API

As you would expect, when you create these resources, you have to supply them with the resource ID of the Microsoft.Network/virtualhubs resource:

"virtualHub": {
  "id": "<<<<resource ID of the virtual hub>>>>"
},

The surprise is what happens in the Microsoft.Network/virtualhubs resource. After a gateway is associated, a property (of type object, presumably for future-proofing) for the associated gateway type is added to the hub:

"vpnGateway": {
  "id": "<<<< Resource ID of Microsoft.Network/vpnGateways resource>>>>"
},
"expressRouteGateway": {
  "id": "<<<< Resource ID of Microsoft.Network/expressRouteGateways resource>>>>"
},
"p2SVpnGateway": {
  "id": "<<<< Resource ID of Microsoft.Network/p2sVpnGateways resource>>>>"
},

The catch is how these properties behave when the hub template is deployed again.

The Problem

There are 3 possible states in the hub when it comes to each gateway:

  1. The hub exists without a gateway: The above hub properties are not required.
  2. The gateways are being added: The above hub properties cannot be added because the gateway resource ID points to a resource that does not exist yet – the hub must exist and be configured before the gateway(s).
  3. The gateways exist: Any re-run of the ARM template (which might be common to update the hub route tables or configuration via DevOps) must include the above gateway properties in the hub resource with the correct resource IDs for the gateways.

And states 2 and 3 are where the chicken and egg are in an ARM template. You must supply the gateway resource ID in the hub for all updates to the hub after a gateway is deployed, and you must not include the gateway resource ID in the hub when deploying the gateway. This would be easy to deal with if ARM would (finally) give us an “ifexists()” function but there is no sign of that. So we need a hack solution.

The Hack Solution

This one comes from the Well-Architected Framework/Cloud Adoption Framework, Enterprise-Scale Architecture. This way-too-complicated beastie shows how Microsoft’s people are dealing with the issue. The JSON for the Microsoft.Network/virtualhubs template contains these properties:

"properties": {
  "virtualWan": {
    "id": "[variables('vwanresourceid')]"
  },
  "addressPrefix": "[parameters('vHUB').addressPrefix]",
  "vpnGateway": "[if(not(empty(parameters('vHUB').vpnGateway)),parameters('vHUB').vpnGateway, json('null'))]"
}

The key for dealing with vpnGateway is the vHUB parameter, an object that contains a value called vpnGateway.

When they first run the deployment, the value of vHUB.vpnGateway is set to {} or null in the parameters file, stored in GitHub. That means that when the hub is first deployed (and there is no VPN gateway), the if statement in the above snippet will pass json('null') to the vpnGateway property. That is acceptable to the resource provider and the hub will deploy cleanly. Later on in the deployment, the VPN gateway will be created.
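
If the template expression is hard to read, it may help to see the same decision written as ordinary code. This is a Python sketch that mimics the logic of the if(not(empty(...))) expression for the two parameter values discussed above; it is not part of the Enterprise-Scale templates. The function names and the example values are made up, and the gateway ID is left as a placeholder.

```python
from typing import Any, Dict, Optional


def hub_vpn_gateway_property(vhub: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Mimic: if(not(empty(vHUB.vpnGateway)), vHUB.vpnGateway, json('null'))."""
    value = vhub.get("vpnGateway")
    # {} and None both count as "empty", so the hub gets null on the first run.
    return value if value else None


def hub_properties(vwan_id: str, vhub: Dict[str, Any]) -> Dict[str, Any]:
    """Build the hub's properties object the way the template snippet does."""
    return {
        "virtualWan": {"id": vwan_id},
        "addressPrefix": vhub["addressPrefix"],
        "vpnGateway": hub_vpn_gateway_property(vhub),
    }


# Hypothetical parameter values for the first and later deployments.
first_run = {"addressPrefix": "10.0.0.0/23", "vpnGateway": {}}
later_run = {
    "addressPrefix": "10.0.0.0/23",
    "vpnGateway": {"id": "<resource ID of the VPN gateway>"},
}

print(hub_properties("<resource ID of the virtual WAN>", first_run)["vpnGateway"])
print(hub_properties("<resource ID of the virtual WAN>", later_run)["vpnGateway"])
```

The first call yields None (null in the template) and the second passes the object with the gateway ID straight through, which is all the expression does.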

If you were to just re-run the hub template now, you would get an error about not being allowed to change the vpnGateway property in the hub resource. Behind the scenes it has been updated by the VPN gateway deployment. Every execution of the hub template must now include the resource ID of the VPN Gateway. That sucks, right? Now the hack really kicks in.

After the first deployment of the hub (and the VPN Gateway), you must open the resource group in the Azure Portal, enable viewing hidden items, open the VPN Gateway resource, go to properties, and document the resource ID.

Now, you need to open the parameters file for the hub. Edit the vHUB.vpnGateway property and set it to:

"vpnGateway": {
  "id": "<<<< Resource ID of Microsoft.Network/vpnGateways resource>>>>"
},

Now you can cleanly re-run the hub template.
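
The manual edit described above could be scripted. The following Python sketch shows one possible approach and is not something taken from the Enterprise-Scale repository. It assumes the usual Azure resource ID layout and the standard parameters file shape, with vHUB stored under parameters.vHUB.value. The subscription ID, resource group, gateway name, and address prefix are hypothetical, so compare the generated ID with the one shown in the Azure Portal before relying on it.

```python
import copy
from typing import Any, Dict


def vpn_gateway_resource_id(subscription_id: str, resource_group: str,
                            gateway_name: str) -> str:
    """Compose a resource ID for a Microsoft.Network/vpnGateways resource."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/vpnGateways/{gateway_name}"
    )


def set_vpn_gateway(parameters: Dict[str, Any], gateway_id: str) -> Dict[str, Any]:
    """Return a copy of the parameters document with vHUB.vpnGateway filled in."""
    updated = copy.deepcopy(parameters)
    updated["parameters"]["vHUB"]["value"]["vpnGateway"] = {"id": gateway_id}
    return updated


# Hypothetical parameters document as used for the first deployment.
first_run = {
    "parameters": {
        "vHUB": {"value": {"addressPrefix": "10.0.0.0/23", "vpnGateway": {}}}
    }
}

# Hypothetical names; substitute the real values from your environment.
gateway_id = vpn_gateway_resource_id(
    "00000000-0000-0000-0000-000000000000", "rg-vwan-example", "vpngw-example"
)
second_run = set_vpn_gateway(first_run, gateway_id)
print(second_run["parameters"]["vHUB"]["value"]["vpnGateway"])
```

Working on a copy leaves the first-run parameters intact, which is handy if you ever need to deploy a fresh hub that has no gateway yet.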

How Should It Work?

The best solution would be if the gateway ID properties were just documentation for Azure, properties that we humans cannot edit. But I suspect that the ability to configure these settings might have something to do with the newly announced NVA-in-hub preview. Otherwise, ARM needs to finally give us an ifexists() function – vote here now if you agree.