An Introduction to Azure ExpressRoute Architecture

This post will give you an overview of Azure ExpressRoute architecture. This is not a “how to” post; instead, the purpose of this post is to document the options for architecting connectivity with Microsoft Azure in one concise (as much as possible) document.

Introduction to ExpressRoute

Azure ExpressRoute is a form of private Layer-2 or Layer-3 network connectivity between a customer’s on-premises network(s) and a virtual network hosted in Microsoft Azure. ExpressRoute is one of the two Azure-offered solutions (the other being site-to-site VPN) for achieving a private network connection.

There are two provider types that can connect you to Azure using ExpressRoute:

  • Exchange provider: Has an ExpressRoute circuit in their data centre. Either you run your “on-premises” in their data centre or you connect to their data centre.
  • Network service provider: You get a connection to an ISP and they relay you to a Microsoft edge data centre or POP.

The locations of ExpressRoute and Azure are often confused. A connection using ExpressRoute, at a very high level and from your perspective, has three pieces:

  • Circuit: A connection to a Microsoft edge data centre or POP. This can be one of many global locations that often have nothing to do with Azure regions; they are connected to the same Microsoft WAN as Azure (and Microsoft 365) and are a means of relaying you to Azure (or Microsoft 365) using Azure ExpressRoute.
  • Connection: Connecting an Azure Virtual Network (ExpressRoute Gateway) in an Azure region to a circuit that terminates at the edge data centre or POP.
  • Peering: Configuring the routing across the circuit and connection.

For example, a customer in Eindhoven, Netherlands might have an ExpressRoute circuit that connects to “Amsterdam”; this POP or edge data centre is probably in Amsterdam, Netherlands, or its suburbs. The customer might use that circuit to connect to Azure West Europe, colloquially called “Amsterdam”, which is actually in Middenmeer, approximately 60 KM north of Amsterdam.

ExpressRoute Versus VPN

The choice between ExpressRoute and site-to-site VPN isn’t always as clear-cut as one might think: “big organisations go with ExpressRoute and small/mid go with VPN”. Very often, organisations are choosing to access Azure services over the Internet using HTTPS, with small amounts of legacy traffic traversing a private connection. In this case, VPN is perfect. But when you want an SLA or low latency, ExpressRoute is your choice.

Site-to-Site VPN versus ExpressRoute:

  • SLA: VPN – Microsoft covers Azure; no one covers the Internet. ExpressRoute – Microsoft covers Azure; the service provider covers the circuit.
  • Max bandwidth: VPN – aggregate of 10 Gbps. ExpressRoute – 100 Gbps.
  • Routing: VPN – BGP (even if you don’t use/enable it). ExpressRoute – BGP.
  • Latency: VPN – subject to the Internet. ExpressRoute – low.
  • Multi-site: VPN – see SD-WAN (Azure Virtual WAN). ExpressRoute – Global Reach (also see Azure Virtual WAN).
  • Connections: VPN – Azure virtual networks. ExpressRoute – Azure virtual networks, other Azure services, Microsoft 365, Dynamics 365, and other clouds, depending on the service provider.
  • Payment: VPN – outbound data transfer and your regular Internet connection. ExpressRoute – payment to the service provider for the circuit, plus payment to Microsoft for either a metered (outbound data + circuit) or unlimited data (circuit) plan.

Terminology

  • Customer premises equipment (CPE) or customer edge routers (CEs): Ideally two edge devices, connected in a highly available way to two lines linking your network(s) to the service provider.
  • Provider edge routers (PEs), CE facing: Routers or switches operated by the service provider that the customer is connected to.
  • Provider edge routers (PEs), MSEE facing: Routers or switches operated by the service provider that connect to Microsoft’s MSEEs.
  • Microsoft Enterprise Edge (MSEE) routers: Routers in the Microsoft POP or edge data centre that the service provider has connected to.

The MSEE is what:

  • Your ExpressRoute virtual network gateway connects to.
  • Propagates BGP routes to your virtual network.
  • Can connect two virtual networks together (with BGP propagation) if they both connect to the same circuit (MSEE).
  • Can relay you to other Azure services or other Microsoft cloud services.

It is very strongly recommended that the customer deploys two highly available pieces of hardware for the CEs. The ExpressRoute virtual network gateway is also HA, but if the Azure region supports it, spread the two nodes across different availability zones for a higher level of availability.

FYI, these POPs or edge data centres also host other Azure edge services.

Peering

Quite often, the primary use case for Azure ExpressRoute is to connect to Azure virtual networks, and resources connected to those virtual networks such as:

  • Virtual machines
  • VNet integrated SKUs such as App Service Environment, API Management, and SQL Managed Instance
  • Platform services supporting Private Endpoint

That connectivity is provided by Azure Private Peering. However, you can also connect to other Microsoft services, such as Microsoft 365 and Dynamics 365, using Microsoft Peering.

To use Microsoft Peering you will need to configure NAT to convert connections from private IP addresses to public IP addresses before they enter the Microsoft network.
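As a rough illustration of that NAT requirement, Python’s ipaddress module can flag pools that would not be routable on the Microsoft network; the function name and example prefixes are mine, not from any Azure tooling:

```python
import ipaddress

def valid_nat_pool(cidr: str) -> bool:
    """Return True if the NAT pool is publicly routable.

    Microsoft Peering requires connections to enter the Microsoft
    network from public IP addresses, so RFC 1918 pools won't work.
    """
    net = ipaddress.ip_network(cidr, strict=False)
    return not net.is_private

print(valid_nat_pool("20.0.0.0/29"))  # True: public address space
print(valid_nat_pool("10.0.0.0/29"))  # False: private, must be NATed first
```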

ExpressRoute And VPN

There are two scenarios where ExpressRoute and site-to-site VPN can coexist to connect the same on-premises network and virtual network.

The first is for failover. If you deploy a /27 or larger GatewaySubnet then that subnet can contain an ExpressRoute Virtual Network Gateway and a VPN Virtual Network Gateway. You can then configure ExpressRoute and VPN to connect the same on-premises and Azure networks. The scenario here is that the VPN tunnel will be an automated failover connection for the ExpressRoute circuit – failover will happen automatically with less than 10 packets being lost. Two things immediately come to mind:

  • Use a different ISP for Internet/VPN connection than used for ExpressRoute
  • Both connections must propagate the same on-premises networks.
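The /27-or-larger requirement for the coexistence scenario is easy to check programmatically; this small sketch (names are illustrative) uses Python’s ipaddress module:

```python
import ipaddress

def gateway_subnet_ok(cidr: str) -> bool:
    """True if a GatewaySubnet is /27 or larger, i.e. big enough to
    host both an ExpressRoute gateway and a VPN gateway for failover."""
    return ipaddress.ip_network(cidr).prefixlen <= 27

print(gateway_subnet_ok("10.0.255.0/27"))  # True: both gateways fit
print(gateway_subnet_ok("10.0.255.0/28"))  # False: too small for coexistence
```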

An interesting new twist was announced recently for Virtual Network Gateway and Azure Virtual WAN. By default, there is no encryption on your ExpressRoute circuit (more on this later). You will be able to initiate a site-to-site VPN connection across the ExpressRoute circuit to a VPN Virtual Network Gateway that is in the same GatewaySubnet as the ExpressRoute Virtual Network Gateway, encrypting your traffic.

ExpressRoute Tiers

There are three tiers of ExpressRoute circuit that you can deploy in Microsoft Azure. I have not found a good comparison table, so the below will not be complete:

Standard versus Premium:

  • Price: Standard – normal. Premium – more expensive.
  • Azure Virtual WAN support: Standard – announced, not GA. Premium – GA.
  • Azure Global Reach: Standard – limited to the same geo-zone. Premium – all regions.
  • Max connections per circuit: Standard – 10. Premium – 20 to 100, depending on the circuit size (20 for 50 Mbps, 100 for 10 Gbps+).
  • Connections from different subscriptions: Standard – no. Premium – yes.
  • Max routes advertised: Standard – 4,000 (private peering), 200 (Microsoft peering). Premium – up to 10,000 (private peering), 200 (Microsoft peering).
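If you automate circuit management, a lookup like the following can warn before an advertisement exceeds the tier limit; the values are copied from the comparison above, so verify them against current Azure limits before relying on them:

```python
# Advertised-route limits per (tier, peering), per the comparison above.
ROUTE_LIMITS = {
    ("Standard", "private"): 4_000,
    ("Standard", "microsoft"): 200,
    ("Premium", "private"): 10_000,
    ("Premium", "microsoft"): 200,
}

def within_limit(tier: str, peering: str, advertised_routes: int) -> bool:
    """True if the number of advertised routes fits the tier's limit."""
    return advertised_routes <= ROUTE_LIMITS[(tier, peering)]

print(within_limit("Standard", "private", 5_000))  # False: needs Premium
print(within_limit("Premium", "private", 5_000))   # True
```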

I said “three tiers”, right? But the table above only covers two. The third tier, called Local, is very lightly documented. ExpressRoute Local is a subset of ExpressRoute Standard where:

  • The circuit can only connect to 1 or 2 Azure regions in the same metro as the POP or edge data centre. Therefore it is available in fewer locations than ExpressRoute Standard.
  • ExpressRoute Global Reach is not available.
  • It requires an unlimited data plan with at least 1 Gbps, coming in at ~25% of the price of a 1 Gbps Standard tier unlimited data plan.

Service Provider Types

There are three ways that a service provider can connect you to Azure using ExpressRoute, with two of them being:

  • Layer-2: A VLAN is stretched from your on-premises network to Azure
  • Layer-3: You connect to Azure over IP VPN or MPLS VPN. Your on-premises network connects either by BGP or a static default route.

There is a third option, called ExpressRoute Direct.

ExpressRoute Direct

A subset of the Microsoft POPs or edge data centres offer a third kind of connection for Azure ExpressRoute called ExpressRoute Direct. The features of this include:

  • Larger sizes: You can have sizes from 1 Gbps to 100 Gbps for massive data ingestion, for things like Cosmos DB or storage (HPC).
  • Physical Isolation: Some organisations will have a compliance reason to avoid connections to shared network equipment (the CEs and MSEE).
  • Granular control of circuit distribution: Based on business unit

This is a very specialised SKU that you must apply to use.

ExpressRoute FastPath

The normal flow of packets routing into Azure over ExpressRoute is:

  1. Enter Microsoft at the MSEE.
  2. Travel via the ExpressRoute Virtual Network Gateway.
  3. If a route table exists, follow that route, for example, to a hub-based firewall.
  4. Route to the NIC of the virtual machine.

There is a tiny latency penalty incurred by routing through the Virtual Network Gateway. For a small percentage of customers, this latency may cause issues.

The concept of ExpressRoute Fast Path is that you can skip the hop of the virtual network gateway and route directly to the NICs of the virtual machines (in the same virtual network as the gateway).

To use this feature you must be using one of these gateway sizes:

  • Ultra Performance
  • ErGw3AZ

The following are not supported and will force traffic to route via the ExpressRoute Virtual Network Gateway:

  • There is a UDR on the GatewaySubnet
  • Virtual Network Peering is used. An alternative is to connect the otherwise-peered VNets directly to the circuit with their own VNet Gateway.
  • You use a Basic Load Balancer in front of the VMs; use a Standard tier Load Balancer.
  • You are attempting to connect to Private Endpoint.

ExpressRoute Global Reach

I think that ExpressRoute Global Reach is one of the more interesting features in ExpressRoute. You can have two or more offices, each with their own ExpressRoute (not Local tier) circuit to a local POP/edge data center, and enable Global Reach to allow:

  • The offices to connect to Azure/Microsoft cloud resources
  • The offices to connect to each other over the Microsoft WAN instead of deploying their own WAN

Note that ExpressRoute Standard will support connecting locations in the same geo-zone, and ExpressRoute Premium will support all geo-zones. Supported POPs are limited to a small subset of locations.

Encryption

Traffic over ExpressRoute is not encrypted and, as Edward Snowden informed us, various state actors intercept traffic in transit. If you wish to protect your traffic, you will have to “bring your own key”. We have a few options:

  • The aforementioned VPN over ExpressRoute, which is available now for Virtual Network Gateway and Azure Virtual WAN.
  • Implement a site-to-site VPN across ExpressRoute using a third-party virtual appliance hosted in the Azure VNet.
  • IPsec configured on each guest OS, limited to traffic between those machines.
  • MACsec, a Layer-2 feature where you can implement your own encryption from your CE to the MSEE, encrypting all traffic, not just to/from VMs.

The MACsec key is stored securely in Azure Key Vault. From what I can see, MACsec is only available on ExpressRoute Direct. Microsoft claims that it does not cause a performance issue on their routers, but they do warn you to check your CE vendor guidance.

Multi-Cloud

Now you’ll see why I talked about Layer-2 and Layer-3. Depending on your service provider type and their connectivity to non-Microsoft clouds, the same circuit you have with the service provider (from your CEs to their CE-facing PEs) can be used to connect to Azure over ExpressRoute and to other clouds such as AWS. With BGP propagation, you could route from on-premises to/from either cloud, and your deployments in those clouds could route to each other.

Bidirectional Forwarding Detection (BFD)

The circuit is deployed as two connections, ideally connected to 2 CEs in your edge network. Failover is automated, but some will want failover to be as quick as possible. You can reduce the BGP keepalive and hold-time but this will be processor intensive on the network equipment.

A feature called BFD can detect link failure in under a second with low overhead. BFD is enabled on “newly” created ExpressRoute private peering interfaces on the MSEEs – you can reset the peering if required. If you want this feature, you need to enable it on your CEs – the service provider must also enable it on their PEs.
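To see why BFD helps, the detection arithmetic can be sketched in Python. The 180-second hold time is a typical BGP default, and the 300 ms transmit interval with a multiplier of 3 are common BFD defaults; none of these figures come from this article, so check your equipment’s actual timers:

```python
def bgp_detection_seconds(hold_time_s: int = 180) -> float:
    """Worst-case failure detection with plain BGP keepalives:
    the peer is declared down only after the hold time expires."""
    return float(hold_time_s)

def bfd_detection_seconds(tx_interval_ms: int = 300, multiplier: int = 3) -> float:
    """BFD declares the session down after `multiplier` consecutive
    missed hello packets sent every `tx_interval_ms` milliseconds."""
    return tx_interval_ms * multiplier / 1000.0

print(bgp_detection_seconds())  # 180.0 seconds without BFD
print(bfd_detection_seconds())  # 0.9 seconds with common BFD defaults
```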

Monitoring

Azure Monitor provides a bunch of metrics for ExpressRoute that you can visualise or create alerts on.

Azure’s Connection Monitor is the Microsoft-offered solution for monitoring an ExpressRoute connection. The idea is that a Log Analytics agent (Windows or Linux) is deployed onto one or more always-on on-premises machines. A test is configured to run across the circuit measuring availability and performance.

 

Azure Virtual WAN – Connectivity

In this post, I’ll explain how Azure Virtual WAN offers its core service: connections.

SD-WAN

Some of you might be thinking – this is just for large corporations and I’m outta here. Don’t run just yet. Azure Virtual WAN is a rethinking of how to:

  • Connect users to Azure services and on-premises at the same time
  • Connect sites to Azure and (optionally) other sites
  • Replace the legacy hardware-defined WAN
  • Connect Azure virtual networks together.

That first point is quite timely – connecting users to services. Work-from-home (WFH) has forced enterprises to find ways to connect users to services no matter where they are; that connectivity was often limited to a privileged few, and the pandemic forced organisations small and large to re-think productivity connectivity and to scale out. Before COVID-19 struck, I was starting to encounter businesses that were considering (some even starting) replacing their legacy MPLS WAN with a software-defined WAN (SD-WAN), where media of different types, suited to different kinds of sites/users/services, are aggregated via appliances. An SD-WAN is lower cost and more flexible, and by leveraging local connectivity it enables smaller locations, such as offices or retail outlets, to have an affordable direct connection to the cloud for better performance. How the on-premises part of the SD-WAN is managed is completely up to you; some will take direct control and some will outsource it to a network service provider.

Connections

Azure Virtual WAN is all about connections. When you start to read about the new Custom Routing model in Azure Virtual WAN, you’ll see how route tables are associated with connections. In summary, a connection is a link between an on-premises location (referred to as a branch, even if it’s HQ) or a spoke virtual network and a Hub. And now we need to talk about some Azure resources.

Azure Resources

I’ve provided lots more depth on this topic elsewhere so I will keep this to the basics. There are two core resources in Azure Virtual WAN:

  • A Virtual WAN
  • A Hub

The Virtual WAN is a logical resource that provides a global service, although it is actually located in one Azure region. Any hubs that are connected to this Virtual WAN resource can talk to each other (automatically), route their connections to another hub’s connections, and share resources.

A Virtual WAN Hub is similar to a hub in an Azure hub & spoke architecture. It is a central routing point (with a hidden virtual router) that is the meeting point for any connections to that hub. An Azure region can have 1 hub in your tenant. That means I can have 1 Hub in West Europe and 1 Hub in East US. The Hubs must be connected to an Azure WAN resource; if they share a WAN resource then their connections can talk to each other. I might have all my branches in Europe connect to the Hub in West Europe, and I will connect all my spoke virtual networks in West Europe to the Hub in West Europe too; this means that by default (and I can control this):

  • The virtual networks can route to each other
  • The virtual networks can route to the branches
  • The branches can route to the virtual networks
  • The branches can route to other branches

We can extend this routing by connecting my branches in North America to the East US Hub and the spoke virtual networks in East US to the East US Hub. Yes; all those North American locations can route to each other. Because the Hubs are connected to a common Virtual WAN, the routing now extends across the Microsoft WAN. That means a retail outlet in the further reaches of northwest rural Ireland can connect to services hosted in East US, via a connection to the Hub in West Europe, and then hopping across the Atlantic Ocean using Microsoft’s low-latency WAN. Nice, right? Even better – it routes just like that automatically if you are using SD-WAN appliances in the branches.

Note that a managed WAN might wire up that retail outlet differently, but still provide a fairly low-latency connection to the local Hub.

Branch Connections

If you have done any Azure networking then you are probably familiar with:

  • Site-to-site VPN: Connecting a location with a cost-effective but no-SLA VPN tunnel to Azure.
  • ExpressRoute: A circuit rented from an ISP for low-latency, high bandwidth, and an SLA-supported private connection to Azure
  • Point-to-Site VPN: Enabling end-users to create a private VPN tunnel to Azure from their devices while on the move or working from home

Each of the above is enabled in Azure using a Virtual Network Gateway, each running independently. Routing from branch to branch is not an intended purpose. Routing from user to branch is not an intended purpose. The Virtual Network Gateway’s job is to connect a site or user to Azure.

The Azure Virtual WAN Hub supports gateways – as hidden resources that must be enabled and configured. All three of the above media types are supported as 3 different types of gateway, sized based on a billing concept called scale units – more scale units means more bandwidth and more cost, with a maximum hub throughput of 40 Gbps (including traffic to/from/between spokes).
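The scale-unit concept can be sketched as simple arithmetic. The 500 Mbps-per-unit figure for site-to-site VPN scale units is an assumption based on commonly cited Azure documentation, not this article, so verify it before capacity planning; the 40 Gbps ceiling is the hub throughput stated above:

```python
# Assumed bandwidth per site-to-site VPN scale unit; verify current values.
VPN_MBPS_PER_SCALE_UNIT = 500
HUB_MAX_MBPS = 40_000  # stated hub throughput ceiling (40 Gbps)

def vpn_gateway_mbps(scale_units: int) -> int:
    """Aggregate VPN gateway bandwidth for a number of scale units,
    never exceeding the hub's overall throughput limit."""
    return min(scale_units * VPN_MBPS_PER_SCALE_UNIT, HUB_MAX_MBPS)

print(vpn_gateway_mbps(2))    # 1000 Mbps
print(vpn_gateway_mbps(100))  # capped at 40000 Mbps by the hub limit
```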

Note that a Secured Virtual Hub, featuring the Azure Firewall, has a limit of 30 Gbps if all traffic is routed through that firewall.

You can be flexible with the branch connections. Some locations might be small and have a VPN connection to the Hub. Other locations might require an SLA and use ExpressRoute. Some might require low latency or greater bandwidth and use higher SKUs of ExpressRoute. And of course, some users will be on the move or at home and use P2S VPN. A combination of all 3 connection types can be used at once, providing each location and user the connections and costs that suit them best.

ExpressRoute

You will be using ExpressRoute Standard for Azure Virtual WAN; this is a requirement. I don’t think there’s really too much more to say here – the tech just works once the circuit is up, and a combination of Global Reach and the any-to-any connections/routing of Azure WAN means that things will just work.

Site-to-Site VPN

The VPN gateway is deployed in an active/active cluster configuration with two public IP addresses. A branch using VPN for connectivity can have:

  • A single VPN connection over a single ISP connection.
  • Resilient VPN connections over two ISP connections, ideally with different physical providers or even media types.

An on-premises SD-WAN appliance is strongly recommended for Azure Virtual WAN, but you can use any VPN appliance that Microsoft Azure supports for route-based VPN. If you use a plain VPN appliance, you can route to on-premises using BGP or using the Azure WAN equivalent of Local Network Gateway-provided prefixes.

Point-to-Site (P2S) VPN

The P2S gateway offers a superior service to what you might have observed with the traditional Virtual Network Gateway for VPN. Connectivity from the user device is to a hub with a routing appliance. Any-to-any connectivity treats the user device as a branch, albeit in a dedicated network address space. Once the user has connected the VPN tunnel, they can route to (by default):

  • Any spoke virtual network connected to the Hub
  • Any spoke virtual network connected to another Hub on the same Virtual WAN
  • Any branch office connected to any Hub on the Virtual WAN

In summary, the user is connected to the WAN as a result of being connected to the Hub and is subject to the routing and firewall configurations of that Hub. That’s a pretty nice WFH connectivity solution.

Note that you have support for certificate and RADIUS authentication in P2S VPN, as well as both the OpenVPN and native Microsoft clients.

The Connectivity Experience

Imagine we’re back in normal times again with common business travel. A user in Amsterdam could sit down at their desk in the office and connect to services in West Europe via VPN. They could travel to a small office in Luxembourg and connect to the same services via VPN with no discernible difference. That user could travel to a conference in London and use P2S VPN from their hotel room to connect via the Amsterdam Hub. Now that user might get a jet to Philadelphia, and use their mobile hotspot to offer connectivity to the Azure Virtual WAN Hub in East US via P2S VPN – and the experience is no different!

One concept I would like to try out and get a support statement on is to abstract the IP addresses and locations of the P2S gateways using Azure Traffic Manager so the user only needs to VPN to a single FQDN and is directed (using the performance profile) to the closest (latency) Hub in the Virtual WAN with a P2S gateway.

Simplicity

So much is done for you with Azure Virtual WAN. If you like to click in the Azure Portal, it’s a pretty simple setup to get things going, although security engineering looks to have a steep learning curve with Custom Routing. By default, everything is connected to everything; that’s what a network should do. You shouldn’t have to figure out how to route from A to B. I believe that Azure WAN will offer a superior connectivity solution, even for a single-location organisation. That’s why I’ve been spending time figuring this tech out over the last few weeks.

Azure Virtual WAN ARM – The Resources

In this post, I will explain the types of resources used in Azure Virtual WAN and the nature of their relationships.

Note, I have not included any content on the recently announced preview of third-party NVAs. I have not seen any materials on this yet to base such a post on and, being honest, I don’t have any use-cases for third-party NVAs.

As you can see – there are quite a few resources involved … and some that you won’t see listed at all because of the “appliance-like” nature of the deployment. I have not included any detail on spokes or “branch offices”, which would require further resources. The resources below are enough to get a hub operational and connected to on-premises locations and spoke virtual networks.

The Virtual WAN – Microsoft.Network/virtualWans

You need at least one Virtual WAN to be deployed. This is what the hub will connect to, and you can connect many hubs to a common Virtual WAN to get automated any-to-any connectivity across the Microsoft physical WAN.

Surprisingly, the resource is deployed to an Azure region and is not a global resource like Traffic Manager or Azure DNS.

The Virtual Hub – Microsoft.Network/virtualHubs

Also known as the hub, the Virtual Hub is deployed once, and once only, per Azure region where you need a hub. This hub replaces the old hub virtual network (plus gateway(s), plus firewall, plus route tables) deployment you might be used to. The hub is deployed as a hidden resource, managed through the Virtual WAN in the Azure Portal or via scripting/ARM.

The hub is associated with the Virtual WAN through a virtualWAN property that references the resource ID of the virtualWans resource.

In a previous post, I referred to a chicken & egg scenario with the virtualHubs resource. The hub has properties that point to the resource IDs of each deployed gateway:

  • vpnGateway: For site-to-site VPN.
  • expressRouteGateway: For ExpressRoute circuit connectivity.
  • p2sVpnGateway: For end-user/device tunnels.

If you choose to deploy a “Secured Virtual Hub” there will also be a property called azureFirewall that will point to the resource ID of an Azure Firewall with the AZFW_Hub SKU.

Note, the restriction of 1 hub per Azure region does introduce a bottleneck. Under the covers of the platform, there is actually a virtual network. The only clue to this network will be in the peering properties of your spoke virtual networks. A single virtual network can have, today, a maximum of 500 spokes. So that means you will have a maximum of 500 spokes per Azure region.

Routing Tables – Microsoft.Network/virtualHubs/hubRouteTables & Microsoft.Network/virtualHubs/routeTables

These are resources used in custom routing, a feature recently announced as GA but, according to the Azure Portal, not live until August 3rd. These resources control the flow of traffic in your hub and spoke architecture. They are child resources of the virtualHubs resource, so no references to hub resource IDs are required.

Azure Firewall – Microsoft.Network/azureFirewalls

This is an optional resource that is deployed when you want a “Secured Virtual Hub”. Today, this is the only way to put a firewall into the hub, although a new preview program should make it possible for third-parties to join the hub. Alternatively, you can use custom routing to force north-south and east-west traffic through an NVA that is running in a spoke, although that will double peering costs.

The Azure Firewall is deployed with the AZFW_Hub SKU. The firewall is not a hidden resource. To manage the firewall, you must use an Azure Firewall Policy (aka Azure Firewall Manager). The firewall has a property called firewallPolicy that points to the resource ID of a firewallPolicies resource.

Azure Firewall Policy – Microsoft.Network/firewallPolicies

This is a resource that allows you to manage an Azure Firewall, in this case, an AZFW_Hub SKU of Azure Firewall. Although not shown here, you can deploy a parent/child configuration of policies to manage firewall configurations and rules in a global/local way.

VPN Gateway – Microsoft.Network/vpnGateways

This is one of 3 ways (one, two or all three at once) that you can connect on-premises (branch) sites to the hub and your Azure deployment(s). This gateway provides you with site-to-site connectivity using VPN. The VPN Gateway uses a property called virtualHub to point at the resource ID of the associated hub or virtualHubs resource. This is a hidden resource.

Note that the virtualHubs resource must also point at the resource ID of the VPN Gateway using a property called vpnGateway.

ExpressRoute Gateway – Microsoft.Network/expressRouteGateways

This is one of 3 ways (one, two or all three at once) that you can connect on-premises (branch) sites to the hub and your Azure deployment(s). This gateway provides you with site-to-site connectivity using ExpressRoute. The ExpressRoute Gateway uses a property called virtualHub to point at the resource ID of the associated hub or virtualHubs resource. This is a hidden resource.

Note that the virtualHubs resource must also point at the resource ID of the ExpressRoute Gateway using a property called expressRouteGateway.

Point-to-Site Gateway – Microsoft.Network/p2sVpnGateways

This is one of 3 ways (one, two or all three at once) that you can connect on-premises (branch) sites to the hub and your Azure deployment(s). This gateway provides users/devices with connectivity using VPN tunnels. The Point-to-Site Gateway uses a property called virtualHub to point at the resource ID of the associated hub or virtualHubs resource. This is a hidden resource.

The Point-to-Site Gateway inherits a VPN configuration from a VPN configuration resource based on Microsoft.Network/vpnServerConfigurations, referring to the configuration resource by its resource ID using a property called vpnServerConfiguration.

Note that the virtualHubs resource must also point at the resource ID of the Point-to-Site Gateway using a property called p2sVpnGateway.

VPN Server Configuration – Microsoft.Network/vpnServerConfigurations

This configuration for Point-to-Site VPN gateways can be seen in the Azure WAN and is intended as a shared configuration that is reusable with more than one Point-to-Site VPN Gateway. To be honest, I can see myself using it as a per-region configuration because of some values like DNS servers and RADIUS servers that will probably be placed per-region for performance and resilience reasons. This is a hidden resource.

The following resources were added on 22nd July 2020:

VPN Sites – Microsoft.Network/vpnSites

This resource has a similar purpose to a Local Network Gateway for site-to-site VPN connections; it describes the on-premises location, AKA “branch office”.  A VPN site can be associated with one or many hubs, so it is actually connected to the Virtual WAN resource ID using a property called virtualWan. This is a hidden resource.

An array property called vpnSiteLinks describes possible connections to on-premises firewall devices.

VPN Connections – Microsoft.Network/vpnGateways/vpnConnections

A VPN Connections resource associates a VPN Gateway with the on-premises location that is described by an associated VPN Site. The vpnConnections resource is a child resource of vpnGateways, so there is no standalone resource; the vpnConnections resource takes its name from the parent VPN Gateway, and its resource ID is an extension of the parent VPN Gateway resource ID.

By necessity, there is some complexity with this resource type. The remoteVpnSite property links the vpnConnections resource with the resource ID of a VPN Site resource. An array property, called vpnSiteLinkConnections, is used to connect the gateway to the on-premises location using 1 or 2 connections, each linking from vpnSiteLinkConnections to the resource/property ID of 1 or 2 vpnSiteLinks properties in the VPN Site. With one site link connection, you have a single VPN tunnel to the on-premises location. With 2 link connections, the VPN Gateway will take advantage of its active/active configuration to set up resilient tunnels to the on-premises location.

Virtual Network Connections – Microsoft.Network/virtualHubs/hubVirtualNetworkConnections

The purpose of a hub is to share resources with spoke virtual networks. In the case of the Virtual Hub, those resources are gateways, and maybe a firewall in the case of Secured Virtual Hub. As with a normal VNet-based hub & spoke, VNet peering is used. However, the way that VNet peering is used changes with the Virtual Hub; the deployment is done using the hub/VirtualNetworkConnections child resource, whose parent is the Virtual Hub. Therefore, the name and resource ID are based on the name and resource ID of the Virtual Hub resource.

The deployment is rather simple; you create a Virtual Network Connection in the hub specifying the resource ID of the spoke virtual network, using a property called remoteVirtualNetwork. The underlying resource provider will initiate both sides of the peering connection on your behalf – there is no deployment required in the spoke virtual network resource. The Virtual Network Connection will reference the Hub Route Tables in the hub to configure route association and propagation.

More Resources

There are more resources that I’ve yet to document.

Verifying Propagated BGP Routes on Azure ExpressRoute

An important step of verifying or troubleshooting communications over ExpressRoute is checking that all the required routes to get to on-premises or WAN subnets have been propagated by BGP to your ExpressRoute Virtual Network Gateway (and the connected virtual networks) by the on-premises edge router.

The Problem

Routing to Azure is often easy: your network admins allocate you a block of private address space on the “WAN” and you use it for your virtual network(s). They add a route entry for that CIDR block on their VPN/ExpressRoute edge device, and packets can now get to Azure. The other part of that story is that Azure needs to know how to send packets back to on-premises – this affects both responses and requests. I have found that this is often overlooked; people start saying things like “Azure networking is broken” when they haven’t sent a route to Azure so that the Azure resources connected to the virtual network(s) can respond.

The other big cause is that the on-premises edge firewall doesn’t allow the traffic – this is the #1 cause of RDP/SSH to Azure virtual machines not working, in my experience.

I had one such scenario where a system in Azure was “not accessible”. We verified that everything in Azure was correct. When we looked at the BGP routes propagated via ExpressRoute, we saw that the client subnets were not included in the Route Table. The on-premises network admins had not propagated those routes, so the Azure ExpressRoute Gateway did not have a route to send responses back to the clients. Once the route was propagated, things worked as expected.

Finding the Routes

There are two ways you can do this. The first is to use PowerShell:

Get-AzExpressRouteCircuitRouteTable -DevicePath Primary -ExpressRouteCircuitName TheNameOfMyCircuitResourceInAzure -PeeringType AzurePrivatePeering -ResourceGroupName TheNameOfTheResourceGroupTheCircuitResourceIsIn

The command takes quite a while to run. Eventually, it will spit out the full route table. If there are lots of routes (there could be hundreds if not thousands) then they will scroll beyond the buffer of your console. So modify the command to send the output to a text file:

Get-AzExpressRouteCircuitRouteTable -DevicePath Primary -ExpressRouteCircuitName TheNameOfMyCircuitResourceInAzure -PeeringType AzurePrivatePeering -ResourceGroupName TheNameOfTheResourceGroupTheCircuitResourceIsIn > BgpRouteTable.txt

Unfortunately, it does not create CSV output by default, but you can format the output into something that is easier to filter and manipulate.
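For example, assuming each route comes back as an object with properties such as Network and NextHop, you could pipe the output through Export-Csv instead of redirecting to a text file:

```powershell
Get-AzExpressRouteCircuitRouteTable -DevicePath Primary `
    -ExpressRouteCircuitName TheNameOfMyCircuitResourceInAzure `
    -PeeringType AzurePrivatePeering `
    -ResourceGroupName TheNameOfTheResourceGroupTheCircuitResourceIsIn |
    Export-Csv -Path BgpRouteTable.csv -NoTypeInformation
```

The resulting CSV opens cleanly in Excel for filtering and searching.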

You can also use the Azure Portal where you can view routes from the Route Table and export a CSV file with the contents of the Route Table. Open the ExpressRoute Circuit and browse to Peerings.

Click Azure Private, which is the site-to-site ExpressRoute connection.

Now a pop-up blade appears in the Azure Portal called Private Peering. There are three interesting options here:

  • Get ARP Records to see the ARP table for the peering.
  • Get Route Table – more on this in a second.
  • Get Route Table Summary to get a breakdown/summary of the records, including neighbor, version, status, ASN, and a count of routes.

We want to see the Route Table, so click that option. Another pop-up blade appears, and now you wait for several minutes. Eventually, the screen will load up to 200 of the entries from the Route Table. If you want to see the entire list of entries, or you want an export, click Download. A CSV file will download via your browser, with one line per route from the Route Table.

Search the Route Table for a listing that either matches the on-premises/WAN subnet or includes its address space; for example, a route to 10.10.0.0/16 includes a subnet of 10.10.10.0/24.
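If you want to script that containment check, a small helper can test whether an address (or the first address of a subnet) falls inside a route's CIDR block. This is a sketch, not from the original post:

```powershell
# Test whether an IPv4 address is covered by a CIDR block,
# e.g. is 10.10.10.0 covered by the route 10.10.0.0/16?
function Test-InCidr {
    param([string]$Address, [string]$Cidr)

    $network, $length = $Cidr.Split('/')
    # Build the subnet mask bit pattern; 4294967295 is 0xFFFFFFFF.
    $mask = 4294967295 -shl (32 - [int]$length)

    $toInt = {
        param($ip)
        $bytes = [System.Net.IPAddress]::Parse($ip).GetAddressBytes()
        [Array]::Reverse($bytes)   # network byte order to little-endian
        [long][System.BitConverter]::ToUInt32($bytes, 0)
    }

    # The address is in the block if both share the same network bits.
    ((& $toInt $Address) -band $mask) -eq ((& $toInt $network) -band $mask)
}

Test-InCidr -Address "10.10.10.0" -Cidr "10.10.0.0/16"   # True
```

You could loop this over the downloaded CSV to find which propagated route, if any, covers a troublesome subnet.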

Microsoft Ignite 2019 – Global Transit Network Architectures With Azure Virtual WAN

Speakers:

  • Reshmi Yandapalli (main speaker), Principal Program Manager
  • Ben Peeri, KPMG customer story

Lots more content in the hidden slides in the download.

Scale

Usual stats. Interesting note: a new POP being built almost every day.

Azure WAN: Global Transit Architecture

The Beginning

  • HQ/Bigger Office
  • Branch office(s)
  • Users
  • Private WAN
  • Shared services

Start with HQ. Users multiply. VLANs multiply. Locations multiply. WAN grows. You grow:

  • Need to simplify network
  • Need ease of use
  • Need operational savings.

Azure Virtual WAN

  • Managed hub & spoke architecture, with hub being Azure and spokes being offices.
  • Public (VPN) and private (ExpressRoute) connectivity.
  • Global Scale:
  • 20 Gbps S2S VPN, 20 Gbps ER, and 20 Gbps User VPN
  • 10K users per hub
  • 1000 sites per hub
  • 1 hub per region
  • Transit routing
  • Cloud Network Orchestration
  • Automated large-scale branch/SDWAN CPE connectivity

Connectivity

What if you had many regions – many hubs. And what if you wanted any branch to access any Azure VNet, regardless of local vWAN hub? In other words, connect to a hub, and use the Azure WAN to seamlessly reach the destination. So you build hub/spoke in different Azure regions, each with a vWAN hub. And a branch connects to the closest vWAN hub, and can get to any Azure VNet via transitive routing between vWAN hubs across the Azure WAN.

  • Simplified network
  • Ease of use
  • Operational savings

This is called Global Transit Architecture over Azure Virtual WAN.

Azure Virtual WAN – What’s New

  • Any-to-Any connectivity (Preview, soon GA)
  • ExpressRoute and User VPN GA
  • ExpressRoute encryption
  • Multi-link Azure Path Selection
  • Custom IPsec
  • Connect VNG VPN to Virtual WAN
  • Available in Gov Cloud & China
  • Azure Firewall integration (Preview) – this is the big announcement IMO
  • Pricing – reduced
  • New partnerships coming soon
    • Arista,
    • Aruba
    • Cisco
    • F5
    • OpenSystems
    • VeroCloud

Global Transit Architecture – A Customer Example

  • 4 regions, 70 countries with hundreds of sites. 34 VNets, 2 ExpressRoute Premium circuits.
  • Challenges: scale issues, routing complexity, ER VNet limits

The before and after architecture diagrams are totally different – the after is much simpler.

Azure Virtual WAN Types

Basic:

  • VPN only
    • Branch to Azure
    • Branch to Branch
  • Connect VNet
    • DIY VNet peering, VNet to VNet non-transitive via hub
    • Hubs are not connected

Standard = Basic + Following

  • Stuff

Multi-Link Support in VPN Site

Support dual links of different types/ISPs. Azure sees the link information. The branch partner can do path selection across these links.

Barracuda CloudGen Firewall is the first to support this. You get always-on Azure in the branch.

ExpressRoute

  • GA in Standard Virtual WAN.
  • Up to 20 Gbps aggregate per hub.
  • Private connectivity – requires premium circuit.
  • In Global Reach Location
  • ExpressRoute VPN Interconnect
  • Integrated with Azure Monitor

EXPRESSROUTE + VPN Path Selection

Path selection between ER and VPN. Fortinet can do this.

Customer Story – Ben Peeri, KPMG

No notes here – sales story.

User VPN

  • Available in Standard Virtual WAN
  • Up to 20 Gbps aggregate and 10K users per hub
  • Cloud based secure remote access
    • Works with OpenVPN and IKEv2 clients
    • Cert-based and RADIUS authentication
  • Any-to-Any
    • User to branch, user to Azure VNet
  • More

Azure Firewall

  • Firewall in virtual hub
  • Centralized policy and route management
    • VNet to Internet through Azure Firewall
    • Branch to Internet through Azure Firewall
    • Managed through Azure Firewall Manager

Azure MSP Program

Announced in July. Focused on networking. Offerings in Azure Marketplace.

Pricing

  • Connection Unit
    • Site-to-site VPN / ExpressRoute: Now reduced
    • User VPN
  • Scale Unit – aggregate throughput
    • 1 VPN scale unit
    • 1 ER scale unit
  • Virtual Hub (Effective CYQ1 2020)
    • Basic vWAN hub: no charge
    • Standard hub
    • Data processing intra region
    • Data processing inter region

Private Connections to Azure PaaS Services

In this post, I’d like to explain a few options you have to get secure/private connections to Azure’s platform-as-a-service offerings.

Express Route – Microsoft Peering

 

ExpressRoute comes in a few forms, but at a basic level, it’s a “WAN” connection to Azure virtual networks via one or more virtual network gateways; customers use this private peering to connect on-premises networks to Azure virtual networks over an SLA-protected private circuit. However, there is another form of peering that you can enable on an ExpressRoute circuit, called Microsoft peering. This is where you use your private circuit to connect to Microsoft cloud services that are normally reached over the public Internet. What you get:

  • Private access to PaaS services from your on-premises networks.
  • Access to an entire service, such as Azure SQL.
  • A wide array of Azure and non-Azure Microsoft cloud services.

FYI, Office 365 is often mentioned here. In theory, you can access Office 365 over Microsoft peering/ExpressRoute. However, the Office 365 team must first grant you permission to do this – the last I checked, you had to have legal proof of a regulatory need for private access to cloud services.

Service Endpoint

Imagine that you are running some resources in Azure, such as virtual machines or App Service Environment (ASE); these are virtual network integrated services. Now consider that these services might need to connect to other services such as storage accounts, Azure SQL, or others. Normally, when a VNet-connected resource is communicating with, say, Azure SQL, the packets will be routed to “Internet” via the 0.0.0.0/0 default route for the subnet – “Internet” is everywhere outside the virtual network, not necessarily The Internet. The flow will hit the “public” Azure backbone and route to the Azure SQL compute cluster. There are two things about that flow:

  • It is indirect and introduces latency.
  • It traverses a shared network space.

A growing number of services, including storage accounts, Azure SQL, Cosmos DB, and Key Vault, have service endpoints available to them. You can enable a service endpoint anywhere in the route from the VM (or whatever) to “Internet” and the packets will “drop” through the service endpoint to the required Azure service – make sure that any firewall in the service accepts packets from the private subnet IP address of the source (VM or whatever). Now you have a more direct and more private connection to the platform service in Azure from your VNet. What you get:

  • Private access to PaaS services from your Azure virtual networks.
  • Access to an entire service, such as Azure SQL, but you can limit this to a region.

Service Endpoint Trick #1

Did you notice in the previous section on service endpoints that I said:

You can enable a service endpoint anywhere in the route from the VM (or whatever) to “Internet”

Imagine you have a complex network and not everyone enables service endpoints the way that they should. But you manage the firewall, the public IPs, and the routing. Well, my friend, you can force traffic to Azure platform services through service endpoints. If you have a firewall, then your routes to “Internet” should direct outbound traffic through the firewall. In the firewall (frontend) subnet, you can enable all the Azure service endpoints. Now when packets egress the firewall, they will “drop” through the service endpoints to the desired Azure platform service, without ever reaching “Internet”.
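As a sketch (with hypothetical VNet and subnet names), enabling service endpoints on the firewall's frontend subnet looks like this with Az PowerShell:

```powershell
# Hypothetical names: a hub VNet "hub-vnet" with a frontend subnet
# "FirewallFrontendSubnet" that the firewall egresses through.
$vnet = Get-AzVirtualNetwork -Name "hub-vnet" -ResourceGroupName "hub-rg"

# Enable service endpoints on the subnet; add any others you need.
Set-AzVirtualNetworkSubnetConfig -VirtualNetwork $vnet `
    -Name "FirewallFrontendSubnet" `
    -AddressPrefix "10.0.1.0/24" `
    -ServiceEndpoint "Microsoft.Storage", "Microsoft.Sql", "Microsoft.KeyVault"

# Commit the change to the virtual network.
$vnet | Set-AzVirtualNetwork
```

Remember that the target services' firewalls must then allow the subnet, not a public IP, as the source.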

Service Endpoint Trick #2

You might know that I like Azure Firewall. Here’s a trick that the Azure networking teams shared with me – it’s similar to the above one but is for on-premises clients trying to access Azure platform services.

You’ve got a VPN connection to a complex virtual network architecture in Azure. And at the frontend of this architecture is Azure Firewall, sitting in the AzureFirewallSubnet; in this subnet you enabled all the available service endpoints. Let’s say that someone wants to connect to Azure SQL using Power BI on their on-premises desktop. Normally that traffic will go over the Internet. What you can do is configure name resolution on your network (or PC) for the database to point at the private IP address of the Azure Firewall. Now Power BI will forward traffic to Azure Firewall, which will relay you to Azure SQL via the service endpoint. What you get:

  • Private access to PaaS services from your on-premises or Azure networks.
  • Access to individual instances of a service, such as an Azure SQL server
  • A growing number of Azure-only services that support service endpoints.

Private Link

In this post, I’m focusing on only one of the 3 current scenarios for Private Link, which is currently in unsupported preview in limited US regions only, for limited platform services – in other words, it’s early days.

This approach aims to give a similar solution to the above “Service Endpoint Trick #2” without the trickery. You can connect an instance of an Azure platform service to a virtual network using Private Link. That instance will now have a private IP address on the VNet subnet, making it fully routable on your virtual network. The private link gets a globally unique record in the Microsoft-managed privatelink.database.windows.net DNS zone. For example, your Azure SQL server would now be resolvable to the private IP address of the private link as yourazuresqlsvr.privatelink.database.windows.net. Now your clients, be they in Azure or on-premises, can connect to this DNS name/IP address to reach this Azure SQL instance. What you get:

  • Private access to PaaS services from your on-premises or Azure networks.
  • Access to individual instances of a service, such as an Azure SQL server.
  • (PREVIEW LIMITATIONS) A limited number of platform services in limited US-only regions.

Creating an Azure Service for Slow Moving Organisations

In this post, I will explain how you can use Azure’s Public IP Prefix feature to pre-create public IP addresses to access Azure services when you are working with big/government organisations that can take weeks to configure a VPN tunnel, outbound firewall rule, and so on.

In this scenario, I need a predictable IP address so that means I must use the Standard SKU address tier.

The Problem

It normally only takes a few minutes to create a firewall rule, a VPN tunnel, etc. in an on-premises network. But sometimes it seems to take forever! I’ve been in that situation – you’ve set up an environment for the customer to work with, but their on-premises networking team(s) are slow to do anything. And you only wish that you had given them all the details that they needed earlier in the project, so their configuration work would end when your weeks of engineering were wrapping up.

But you won’t know the public IP address until you create it. And that is normally only created when you create the virtual network gateway, Azure Firewall, Application Firewall, etc. But what if you had a pool of Azure public IP addresses that were pre-reserved and ready to share with the network team? Maybe they could be used to make early requests for VPN tunnels, firewall rules, and so on? Luckily, we can do that!

Public IP Prefix

An Azure Public IP Prefix is a set of reserved public IP addresses (PIPs). You can create an IP Prefix of a certain size, from /31 (2 addresses) to /24 (256 addresses), in a certain region. The pool of addresses is a contiguous block of predictable addresses. And from that pool, you can create public IP addresses for your Azure resources.

In my example, I want a Standard tier IP address and this requires a Standard tier Public IP Prefix. Unfortunately, the Azure Portal doesn’t allow for this with Public IP Prefix, so we need some PowerShell. First, we’ll define some reused variables:

$rgName = "test"
$region = "westeurope"
$ipPrefixName = "test-ipfx"

Now we will create the Public IP Prefix. Note that the length refers to the subnet mask length. In my example that’s a /30, resulting in a prefix with 4 reserved public IP addresses:

$ipPrefix = New-AzPublicIpPrefix -Name $ipPrefixName -ResourceGroupName $rgName -PrefixLength 30 -Sku Standard -Location $region

You’ll note above that I used Standard in the command. This creates a pool of static Standard tier public IP addresses. I could have dropped the Standard, and that would have created a pool of static Basic tier IP addresses – you can use the Azure Portal to deploy Basic tier Public IP Prefix and public IP addresses from that prefix. The decision to use Standard tier or Basic tier affects what resources I can deploy with the addresses:

  • Standard: Azure Firewall, zone-redundant virtual network gateways, v2 application gateways/firewalls, standard tier load balancers, etc.
  • Basic static: Basic tier load balancers, v1 application gateways/firewalls, etc.

Note that the non-zone redundant virtual network gateways cannot use static public IP addresses and therefore cannot use Public IP Prefix.

Creating a Public IP Address

Let’s say that I have a project coming up where I need to deploy an Application Firewall and I know the on-premises network team will take weeks to allow outbound access to my new web service. Instead of waiting until I build the application, I can reserve the IP address now, tell the on-premises firewall team to allow it, and then work on my project. Hopefully, by the time I have the site up and running and presented to the Internet by my Application Firewall, they will have created the outbound firewall rule from the company network.

Browse to the Public IP Prefix and make sure that it is in the same region as the new virtual network and virtual network gateway. Open the prefix and check Allocated IP Addresses in the Overview. Make sure that there is free capacity in the reserved block.

Now I can continue to use my variables from above and create a new public IP address from one of the reserved addresses in the Public IP Prefix:

New-AzPublicIpAddress -Name "test-vpn-pip" -ResourceGroupName $rgName -AllocationMethod Static -DomainNameLabel "test-vpn" -Location $region -PublicIpPrefix $ipPrefix -Sku Standard
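You can then read back the address that was allocated from the prefix – this is the value to hand over in your request to the on-premises team:

```powershell
# Returns the reserved static address allocated from the prefix.
(Get-AzPublicIpAddress -Name "test-vpn-pip" -ResourceGroupName $rgName).IpAddress
```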

Use the Public IP Address

I now have everything I need to pass onto the on-premises network team in my request. In my example, I am going to create a v2 Application Firewall.

Once I configure the WAF, the on-premises firewall will (hopefully) already have the rule to allow outbound connections to my pre-reserved IP address and, therefore, my new web service.

Azure Availability Zones in the Real World

I will discuss Azure’s availability zones feature in this post, sharing what they can offer for you and some of the things to be aware of.

Uptime Versus SLA

Noobs to hosting and cloud focus on three magic letters: S, L, A or service level agreement. This is a contractual promise that something will be running for a certain percentage of time in the billing period or the hosting/cloud vendor will credit or compensate the customer.

You’ll hear phrases like “three nines”, or “four nines” to express the measure of uptime. The first is a 99.9% measure, and the second is a 99.99% measure. Either is quite a high level of uptime. Azure does have SLAs for all sorts of things. For example, a service deployed in a valid virtual machine availability set has a connectivity (uptime) SLA of 99.9%.
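To put those numbers in context, here is a quick calculation of how much downtime each level of uptime permits in a 30-day billing month:

```powershell
# Minutes in a 30-day billing month.
$minutesPerMonth = 30 * 24 * 60   # 43,200

foreach ($sla in 99.9, 99.95, 99.99) {
    # Permitted downtime is the complement of the SLA percentage.
    $downtime = [math]::Round($minutesPerMonth * (100 - $sla) / 100, 2)
    "{0}% allows {1} minutes of downtime per month" -f $sla, $downtime
}
# 99.9% allows 43.2 minutes, 99.95% allows 21.6, 99.99% allows 4.32
```

In other words, each extra "nine and a half" roughly halves or better the downtime the provider is allowed before owing you credit.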

Why did I talk about noobs? Promises are easy to make. I once worked for a hosting company that offered a ridiculous 100% SLA for everything, including cheap-ass generic Pentium “servers” from eBay with single IDE disks. 100% is an unachievable target because … let’s be real here … things break. Even systems with redundant components have downtime. I prefer to see realistic SLAs and honest statements on what you must do to get that guarantee.

Azure gives us those sorts of SLAs. For virtual machines we have:

  • 99.9% for single-instance machines with just Premium SSD disks
  • 99.95% for services running in a valid availability set
  • 99.99% for services running in multiple availability zones

Ah… let’s talk about that last one!

Availability Sets

First, we must discuss availability sets and what they are before we move one step higher. An availability set is a form of anti-affinity, a concept also found in vSphere and in Hyper-V Failover Clustering (PowerShell or SCVMM); it is a label on a virtual machine that instructs the compute cluster to spread the virtual machines across different parts of the cluster. In Azure, virtual machines in the same availability set are placed into different:

  • Update domains: Avoiding downtime caused by (rare) host reboots for updates.
  • Fault domains: Enable services to remain operational despite hardware/software failure in a single rack.

The above solution spreads your machines around a single compute (Hyper-V) cluster, in a single room, in a single building. That’s amazing for on-premises, but there can still be an issue. Last summer, a faulty humidity sensor brought down one such room and affected a “small subset” of customers. “Small subset” is OK, unless you are included and some mission critical system was down for several hours. At that point, SLAs are meaningless – a refund for the lost runtime cost of a pair of Linux VMs running network appliance software won’t compensate for thousands or millions of Euros of lost business!

Availability Zones

We can go one step further by instructing Azure to deploy virtual machines into different availability zones. A single region can be made up of different physical locations with independent power and networking. These locations might be close together, as is typically the case in North Europe or West Europe. Or they might be on opposite sides of a city, as is the case with some North American regions. There is a low level of latency between the buildings, but it is still higher than that of a LAN connection.

A region that supports availability zones is split into 4 zones. You see three zones (the logical-to-physical mapping is assigned round-robin between customers), labeled as 1, 2, and 3. You can deploy many services across availability zones – this is improving:

  • VNet: Is software-defined so can cross all zones in a single region.
  • Virtual machines: Can connect to the same subnet/address space but be in different zones. They are not in availability sets but Azure still maintains service uptime during host patching/reboots.
  • Public IP Addresses: Standard IP supports anycast and can be used to NAT/load balance across zones in a single region.
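As a sketch, deploying a virtual machine into a specific zone uses the -Zone parameter of New-AzVM; the resource names here are hypothetical:

```powershell
# Hypothetical names; the simplified New-AzVM parameter set creates
# supporting resources (NIC, public IP) if they do not exist.
New-AzVM -ResourceGroupName "app-rg" -Location "westeurope" `
    -Name "app-vm01" -Zone 1 `
    -VirtualNetworkName "app-vnet" -SubnetName "app-subnet" `
    -Image "Win2019Datacenter"
```

Deploy a second machine with -Zone 2 and the pair qualifies for the multi-zone SLA, with no availability set required.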

Other network resources can work with availability zones in one of two ways:

  • Zonal: Instances are deployed to a specific zone, giving optimal latency performance within that zone, but can connect to all zones in the region.
  • Zone Redundant: Instances are spread across the zones for an active/active configuration.

Examples of the above are:

  • The zone-aware VNet gateways for VPN/ExpressRoute
  • Standard load balancer
  • WAGv2 / WAFv2

Considerations

There are some things to consider when looking at availability zones.

  • Regions: The list of regions that support availability zones is increasing slowly, but it is far from complete. Some regions will not offer this highest level of availability.
  • Catchup: Not every service in Azure is aware of availability zones, but this is changing.

Let me give you two examples. The first is VM Boot Diagnostics, a service that I consider critical for seeing the console of the VM and getting serial console access without a network connection to the virtual machine. Boot Diagnostics uses an agent in the VM to write to a storage account. That storage account can be:

  • LRS: 3 replicas reside in a single compute cluster, in a single room, in a single building (availability zone).
  • GRS: LRS plus 3 asynchronous replicas in the paired region, that are not available for write unless Microsoft declares a total disaster for the primary region.

So, if I have a VM in zone 1 and a VM in zone 2, and both write to a storage account that happens to be in zone 1 (I have no control over the storage account location), and zone 1 goes down, there will be issues with the VM in zone 2. The solution would be to use ZRS GPv2 storage for Boot Diagnostics; however, the agent does not support this type of storage configuration. Gotcha!
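For reference, this is roughly how you point Boot Diagnostics at a specific storage account with Az PowerShell – a sketch with hypothetical names, and remember the storage account can only be LRS/GRS as described above:

```powershell
# Hypothetical names: VM "app-vm01" and diagnostics account "appdiagstore".
$vm = Get-AzVM -ResourceGroupName "app-rg" -Name "app-vm01"

# Enable Boot Diagnostics against the named storage account.
Set-AzVMBootDiagnostic -VM $vm -Enable `
    -ResourceGroupName "app-rg" -StorageAccountName "appdiagstore"

# Commit the configuration change.
Update-AzVM -ResourceGroupName "app-rg" -VM $vm
```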

Azure Advisor will also be a pain in the ass. Noobs are told to rely on Advisor (it is several questions in the new Azure infrastructure exams) for configuration and deployment advice. Advisor will see the above two VMs as being not highly available because they are not (and cannot be) in a common availability set, so you are advised to degrade their SLA by migrating them to a single zone for an availability set configuration – ignore that advice and be prepared to defend the decision from Azure noobs, such as management, auditors, and ill-informed consultants.

Opinion

Availability zones are important – I use them in an architecture pattern that I am working on with several customers. But you need to be aware of what they offer and how certain things do not understand them yet or do not support them yet.

 

Cannot Create a Basic Tier Virtual Network Gateway in Azure

There is a bug in the Azure Portal that prevents you from selecting a virtual network when you pick the Basic tier of the virtual network gateway, and you are forced into selecting the more expensive VpnGw1. I’ll show you how to work around this bug in this post.

Background

I recently ran a hands-on Azure class in London. Part of the class required deploying & configuring a VPN gateway in the West Europe region. I always use the Basic tier because:

  • It’s cheaper – $26.79 for Basic versus $141.36 for VpnGw1 per month
  • That’s what most (by a long shot) of my customers deploy in production because it meets their needs.

I’ve had a customer in Northern Ireland report the same problem in North Europe.

The process goes like this:

  1. You select VPN gateway type
  2. Select Route-Based
  3. Select Basic as the SKU
  4. Then you attempt to select the virtual network that you want to use – it already has a gateway subnet
  5. You cannot continue because the virtual network is greyed out

image

The error shown is:

The following issues must be fixed to use this virtual network: The VPN gateway cannot have a basic SKU in order for it to coexist with an existing ExpressRoute gateway.

In all cases so far, the subscriptions have been either brand new CSP/trial subscriptions with no previous resources, or my lab subscription where I’ve used a new virtual network to demonstrate this scenario – and I have never deployed ExpressRoute in any subscription.

Workaround

Credit where credit is due – some of my attendees last week figured out how to beat the UI bug.

  1. Close the Choose Virtual Network blade if it is open.
  2. Select the VpnGw1 tier gateway in the Create Virtual Network Gateway blade – don’t worry, you won’t be creating it if you don’t want to pay the price.
  3. Click Choose A Virtual Network
  4. Select your virtual network
  5. Change the SKU of the gateway back to Basic
  6. Finish the wizard

image

I know – it’s a daft UI bug, but the above workaround works.