This post will give you an overview of Azure ExpressRoute architecture. This is not a “how to” post; instead, the purpose of this post is to document the options for architecting connectivity with Microsoft Azure in one concise (as much as possible) document.
Introduction to ExpressRoute
Azure ExpressRoute is a form of private Layer-2 or Layer-3 network connectivity between a customer’s on-premises network(s) and a virtual network hosted in Microsoft Azure. ExpressRoute is one of the two solutions that Azure offers for achieving a private network connection; the other is VPN.
There are 2 vendor types that can connect you to Azure using ExpressRoute:
- Exchange provider: Has an ExpressRoute circuit in their data centre. Either you run your “on-premises” in their data centre or you connect to their data centre.
- Network service provider: You get a connection to an ISP and they relay you to a Microsoft edge data centre or POP.
The locations of ExpressRoute and Azure are often confused. A connection using ExpressRoute, at a very high level and from your perspective, has three pieces:
- Circuit: A connection to a Microsoft edge data centre or POP. This can be one of many global locations that often have nothing to do with Azure regions; they are connected to the same Microsoft WAN as Azure (and Microsoft 365) and are a means to relay you to Azure (or Microsoft 365) using Azure ExpressRoute.
- Connection: Connecting an Azure Virtual Network (ExpressRoute Gateway) in an Azure region to a circuit that terminates at the edge data centre or POP.
- Peering: Configuring the routing across the circuit and connection.
For example, a customer in Eindhoven, Netherlands might have an ExpressRoute circuit that connects to “Amsterdam”; this POP or edge data centre is probably in Amsterdam, Netherlands, or its suburbs. The customer might use that circuit to connect to Azure West Europe, which is colloquially called “Amsterdam” but is actually in Middenmeer, approximately 60 km north of Amsterdam.
ExpressRoute Versus VPN
The choice between ExpressRoute and site-to-site VPN isn’t always as clear-cut as one might think: “big organisations go with ExpressRoute and small/mid go with VPN”. Very often, organisations are choosing to access Azure services over the Internet using HTTPS, with small amounts of legacy traffic traversing a private connection. In this case, VPN is perfect. But when you want an SLA or low latency, ExpressRoute is your choice.
Internet: No one
Service Provider: Circuit
||Aggregate of 10 Gbps
||BGP (even if you don’t use/enable it)
||See SD-WAN (Azure Virtual WAN)
Also see Azure Virtual WAN
||Azure Virtual Networks
||Azure Virtual Networks
Other Azure Services
Other clouds, depending on service provider
||Outbound data transfer and your regular Internet connection
||Payment to service provider for the circuit.
Payment for either a metered (outbound data + circuit) or unlimited data (circuit) to Microsoft.
- Customer premises equipment (CPE) or Customer edge routers (CEs): Ideally two edge devices, connected in a highly available way to the two lines that connect your network(s) to the service provider.
- Provider edge routers (PEs), CE facing: Routers or switches operated by the service provider that the customer is connected to.
- Provider edge routers (PEs), MSEE facing: Routers or switches operated by the service provider that connect to Microsoft’s MSEEs.
- Microsoft Enterprise Edge (MSEE) routers: Routers in the Microsoft POP or edge data centre that the service provider has connected to.
The MSEE:
- Is what your ExpressRoute virtual network gateway connects to.
- Propagates BGP routes to your virtual network.
- Can connect two virtual networks together (with BGP propagation) if they both connect to the same circuit (MSEE).
- Can relay you to other Azure services or other Microsoft cloud services.
It is very strongly recommended that the customer deploys two highly available pieces of hardware for the CEs. The ExpressRoute virtual network gateway is also HA, but if the Azure region supports it, spread the two nodes across different availability zones for a higher level of availability.
FYI, these POPs or edge data centres also host other Azure edge services.
Quite often, the primary use case for Azure ExpressRoute is to connect to Azure virtual networks, and resources connected to those virtual networks such as:
- Virtual machines
- VNet integrated SKUs such as App Service Environment, API Management, and SQL Managed Instance
- Platform services supporting Private Endpoint
That connectivity is provided by Azure Private Peering. However, you can also connect to other Microsoft services using Microsoft Peering.
To use Microsoft Peering you will need to configure NAT to convert connections from private IP addresses to public IP addresses before they enter the Microsoft network.
ExpressRoute And VPN
There are two scenarios where ExpressRoute and site-to-site VPN can coexist to connect the same on-premises network and virtual network.
The first is for failover. If you deploy a /27 or larger GatewaySubnet then that subnet can contain an ExpressRoute Virtual Network Gateway and a VPN Virtual Network Gateway. You can then configure ExpressRoute and VPN to connect the same on-premises and Azure networks. The scenario here is that the VPN tunnel will be an automated failover connection for the ExpressRoute circuit; failover will happen automatically with fewer than 10 packets being lost. Two things immediately come to mind:
- Use a different ISP for Internet/VPN connection than used for ExpressRoute
- Both connections must propagate the same on-premises networks.
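The /27 requirement for a shared GatewaySubnet is easy to check before you deploy anything. Here is a minimal sketch using Python's standard `ipaddress` module; the function name and the sample prefixes are hypothetical and only there for illustration:

```python
import ipaddress

# Longest prefix (smallest subnet) that can hold both gateways, per the text above.
MAX_PREFIX_LEN_FOR_COEXISTENCE = 27


def gateway_subnet_supports_coexistence(cidr: str) -> bool:
    """Return True if the IPv4 GatewaySubnet is a /27 or larger.

    "Larger" means a numerically smaller prefix length, e.g. /26 or /24.
    """
    subnet = ipaddress.IPv4Network(cidr, strict=True)
    return subnet.prefixlen <= MAX_PREFIX_LEN_FOR_COEXISTENCE


if __name__ == "__main__":
    # Hypothetical address ranges, not taken from any real deployment.
    for cidr in ("10.0.255.0/27", "10.0.255.0/28", "10.0.254.0/24"):
        print(cidr, gateway_subnet_supports_coexistence(cidr))
```

The check only covers the subnet size; it says nothing about whether both connections propagate the same on-premises networks.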
An interesting new twist was announced recently for Virtual Network Gateway and Azure Virtual WAN. By default, there is no encryption on your ExpressRoute circuit (more on this later). You will be able to initiate a site-to-site VPN connection across the ExpressRoute circuit to a VPN Virtual Network Gateway that is in the same GatewaySubnet as the ExpressRoute Virtual Network Gateway, encrypting your traffic.
There are two main tiers of ExpressRoute circuit that you can deploy in Microsoft Azure: Standard and Premium. I have not found a good comparison table, so the below will not be complete:
|Azure Virtual WAN support
||Announced, not GA
|Azure Global Reach
||Limited to same geo-zone
|Max connections per circuit
||100, depending on the circuit size (Mbps) – 20 for 50 Mbps, 100 for 10 Gbps+
|Connections from different subscriptions
|Max routes advertised
||Private peering: 4,000
Microsoft peering: 200
|Private Peering: Up to 10,000
Microsoft peering: 200
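The private peering route limits above lend themselves to a quick sanity check of how many prefixes you plan to advertise. This is only a sketch: it assumes the 4,000 limit applies to the Standard tier and the 10,000 limit to Premium, and the function and dictionary names are hypothetical:

```python
# Private peering route limits quoted in the comparison above.
# Assumption for this sketch: 4,000 is the Standard limit, 10,000 the Premium limit.
PRIVATE_PEERING_ROUTE_LIMITS = {"Standard": 4_000, "Premium": 10_000}


def routes_within_limit(tier: str, advertised_routes: int) -> bool:
    """Return True if the advertised route count fits the tier's private peering limit."""
    if advertised_routes < 0:
        raise ValueError("advertised_routes must be non-negative")
    return advertised_routes <= PRIVATE_PEERING_ROUTE_LIMITS[tier]


if __name__ == "__main__":
    print(routes_within_limit("Standard", 4_500))  # over the Standard limit
    print(routes_within_limit("Premium", 4_500))   # within the Premium limit
```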
Beyond Standard and Premium, there is also a third tier called Local, which is very lightly documented. ExpressRoute Local is a subset of ExpressRoute Standard where:
- The circuit can only connect to 1 or 2 Azure regions in the same metro as the POP or edge data centre. Therefore it is available in fewer locations than ExpressRoute Standard.
- ExpressRoute Global Reach is not available.
- It requires an unlimited data plan with at least 1 Gbps, coming in at ~25% of the price of a 1 Gbps Standard tier unlimited data plan.
Service Provider Types
There are three ways that a service provider can connect you to Azure using ExpressRoute, with two of them being:
- Layer-2: A VLAN is stretched from your on-premises network to Azure
- Layer-3: You connect to Azure over IP VPN or MPLS VPN. Your on-premises network connects either by BGP or a static default route.
There is a third option, called ExpressRoute Direct.
A subset of the Microsoft POPs or edge data centres offer a third kind of connection for Azure ExpressRoute called ExpressRoute Direct. The features of this include:
- Larger sizes: You can have sizes from 1 Gbps to 100 Gbps for massive data ingestion, for things like Cosmos DB or storage (HPC).
- Physical Isolation: Some organisations will have a compliance reason to avoid connections to shared network equipment (the PEs and MSEE).
- Granular control of circuit distribution: Based on business unit
This is a very specialised SKU that you must apply to use.
The normal flow of packets routing into Azure over ExpressRoute is:
- Enter Microsoft at the MSEE
- Travel via the ExpressRoute Virtual Network Gateway.
- If a route table exists, follow that route, for example, to a hub-based firewall.
- Route to the NIC of the virtual machine
There is a tiny latency penalty by routing through the Virtual Network Gateway. For a tiny percentage of customers, this latency may cause issues.
The concept of ExpressRoute Fast Path is that you can skip the hop of the virtual network gateway and route directly to the NICs of the virtual machines (in the same virtual network as the gateway).
To use this feature you must be using one of these gateway sizes:
- Ultra Performance
The following are not supported and will force traffic to route via the ExpressRoute Virtual Network Gateway:
- There is a UDR on the GatewaySubnet
- Virtual Network Peering is used. An alternative is to connect the otherwise-peered VNets directly to the circuit with their own VNet Gateway.
- You use a Basic Load Balancer in front of the VMs; use a Standard tier Load Balancer.
- You are attempting to connect to Private Endpoint.
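The Fast Path conditions above can be expressed as a simple checklist. The sketch below is not an Azure API: the `Deployment` class, its field names, and `fast_path_blockers` are all hypothetical, and the default set of supported gateway sizes contains only the one size named in this post, so pass your own set if others apply:

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Deployment:
    """Hypothetical description of a deployment; this is not an Azure SDK type."""
    gateway_sku: str
    udr_on_gateway_subnet: bool = False
    uses_vnet_peering: bool = False
    uses_basic_load_balancer: bool = False
    targets_private_endpoint: bool = False


def fast_path_blockers(
    deployment: Deployment,
    supported_skus: FrozenSet[str] = frozenset({"Ultra Performance"}),
) -> List[str]:
    """Return the reasons traffic would still route via the gateway (empty if none)."""
    blockers = []
    if deployment.gateway_sku not in supported_skus:
        blockers.append("gateway size does not support Fast Path")
    if deployment.udr_on_gateway_subnet:
        blockers.append("UDR on the GatewaySubnet")
    if deployment.uses_vnet_peering:
        blockers.append("Virtual Network Peering is used")
    if deployment.uses_basic_load_balancer:
        blockers.append("Basic Load Balancer in front of the VMs")
    if deployment.targets_private_endpoint:
        blockers.append("destination is a Private Endpoint")
    return blockers


if __name__ == "__main__":
    example = Deployment("Ultra Performance", uses_vnet_peering=True)
    print(fast_path_blockers(example))
```

An empty list means none of the conditions listed in this post would push traffic back through the gateway.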
ExpressRoute Global Reach
I think that ExpressRoute Global Reach is one of the more interesting features in ExpressRoute. You can have two or more offices, each with their own ExpressRoute (not Local tier) circuit to a local POP/edge data centre, and enable Global Reach to allow the offices to:
- Connect to Azure/Microsoft cloud resources
- Connect to each other over the Microsoft WAN instead of deploying a WAN
Note that ExpressRoute Standard will support connecting locations in the same geo-zone, and ExpressRoute Premium will support all geo-zones. Supported POPs are limited to a small subset of locations.
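The tier rule in the note above boils down to one comparison. A tiny sketch, with a hypothetical function name and made-up geo-zone labels (use whatever zone names your circuits actually belong to):

```python
def required_tier_for_global_reach(geo_zone_a: str, geo_zone_b: str) -> str:
    """Standard is enough within one geo-zone; crossing geo-zones needs Premium."""
    return "Standard" if geo_zone_a == geo_zone_b else "Premium"


if __name__ == "__main__":
    # The zone labels here are illustrative placeholders only.
    print(required_tier_for_global_reach("Europe", "Europe"))
    print(required_tier_for_global_reach("Europe", "North America"))
```

This does not check whether the POPs involved are among the subset that supports Global Reach.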
Traffic over ExpressRoute is not encrypted and, as Edward Snowden informed us, various countries are doing things to sniff traffic. If you wish to protect your traffic you will have to “bring your own key”. We have a few options:
- The aforementioned VPN over ExpressRoute, which is available now for Virtual Network Gateway and Azure Virtual WAN.
- Implement a site-to-site VPN across ExpressRoute using a third-party virtual appliance hosted in the Azure VNet.
- IPsec configured on each guest OS, limited to virtual machines.
- MACsec, a Layer-2 feature where you can implement your own encryption from your CE to the MSEE, encrypting all traffic, not just to/from VMs.
The MACsec key is stored securely in Azure Key Vault. From what I can see, MACsec is only available on ExpressRoute Direct. Microsoft claims that it does not cause a performance issue on their routers, but they do warn you to check your CE vendor guidance.
Now you’ll see why I talked about Layer-2 and Layer-3. Depending on your service provider type and their connectivity to non-Microsoft clouds, if you have a circuit with the service provider (from your CEs to their CE facing PEs) that same circuit can be used to connect to Azure over ExpressRoute and to other clouds such as AWS or others. With BGP propagation, you could route from on-premises to/from either cloud, and your deployments in those clouds could route to each other.
Bidirectional Forwarding Detection (BFD)
The circuit is deployed as two connections, ideally connected to 2 CEs in your edge network. Failover is automated, but some will want failover to be as quick as possible. You can reduce the BGP keepalive and hold-time but this will be processor intensive on the network equipment.
A feature called BFD can detect link failure in under a second with low overhead. BFD is enabled on “newly” created ExpressRoute private peering interfaces on the MSEEs; you can reset the peering if required. If you want this feature then you need to enable it on your CEs, and the service provider must also enable it on their PEs.
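To see why BFD detects a failure so much faster than BGP timers do, compare the worst-case detection times. The numbers below are assumptions for illustration (a 180 second BGP hold time, and a 300 ms BFD interval with a multiplier of 3); check your own equipment and service provider for the real values:

```python
def bgp_detection_seconds(hold_time_s: float) -> float:
    """BGP declares the peer down once the hold time expires without a keepalive."""
    return hold_time_s


def bfd_detection_seconds(interval_ms: float, multiplier: int) -> float:
    """BFD declares the link down after `multiplier` consecutive missed packets."""
    return interval_ms * multiplier / 1000.0


if __name__ == "__main__":
    # Assumed timer values, for illustration only.
    bgp = bgp_detection_seconds(hold_time_s=180)
    bfd = bfd_detection_seconds(interval_ms=300, multiplier=3)
    print(f"BGP hold-time detection: {bgp:.1f} s")
    print(f"BFD detection:           {bfd:.1f} s")
```

With those assumed values BFD notices the failure in under a second, without having to shrink the BGP keepalive and hold-time.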
Azure Monitor provides a bunch of metrics for ExpressRoute that you can visualise or create alerts on.
Azure’s Connection Monitor is the Microsoft-offered solution for monitoring an ExpressRoute connection. The idea is that a Log Analytics agent (Windows or Linux) is deployed onto one or more always-on on-premises machines. A test is configured to run across the circuit measuring availability and performance.