An Introduction to Azure ExpressRoute Architecture

This post will give you an overview of Azure ExpressRoute architecture. This is not a “how to” post; instead, the purpose of this post is to document the options for architecting connectivity with Microsoft Azure in one concise (as much as possible) document.

Introduction to ExpressRoute

Azure ExpressRoute is a form of private Layer-2 or Layer-3 network connectivity between a customer’s on-premises network(s) and a virtual network hosted in Microsoft Azure. ExpressRoute is one of the two Azure-offered solutions (the other being site-to-site VPN) for achieving a private network connection.

There are two provider types that can connect you to Azure using ExpressRoute:

  • Exchange provider: Has an ExpressRoute circuit in their data centre. Either you run your “on-premises” in their data centre or you connect to their data centre.
  • Network service provider: You get a connection to an ISP and they relay you to a Microsoft edge data centre or POP.

The locations of ExpressRoute and Azure are often confused. A connection using ExpressRoute, at a very high level and from your perspective, has three pieces:

  • Circuit: A connection to a Microsoft edge data centre or POP. This can be one of many global locations that often have nothing to do with Azure regions; they are connected to the same Microsoft WAN as Azure (and Microsoft 365) and are a means of relaying you to Azure (or Microsoft 365) using Azure ExpressRoute.
  • Connection: Connecting an Azure Virtual Network (ExpressRoute Gateway) in an Azure region to a circuit that terminates at the edge data centre or POP.
  • Peering: Configuring the routing across the circuit and connection.

For example, a customer in Eindhoven, Netherlands might have an ExpressRoute circuit that connects to “Amsterdam”; this POP or edge data centre is probably in Amsterdam, Netherlands, or its suburbs. The customer might use that circuit to connect to Azure West Europe, colloquially called “Amsterdam”, which is actually in Middenmeer, approximately 60 KM north of Amsterdam.

ExpressRoute Versus VPN

The choice between ExpressRoute and site-to-site VPN isn’t always as clear-cut as one might think: “big organisations go with ExpressRoute and small/mid go with VPN”. Very often, organisations are choosing to access Azure services over the Internet using HTTPS, with small amounts of legacy traffic traversing a private connection. In this case, VPN is perfect. But when you want an SLA or low latency, ExpressRoute is your choice.

Site-to-Site VPN versus ExpressRoute:

  • SLA: VPN – Microsoft covers Azure; no one covers the Internet. ExpressRoute – Microsoft covers Azure; the service provider covers the circuit.
  • Max bandwidth: VPN – aggregate of 10 Gbps. ExpressRoute – 100 Gbps.
  • Routing: VPN – BGP (even if you don’t use/enable it). ExpressRoute – BGP.
  • Latency: VPN – subject to the Internet. ExpressRoute – low.
  • Multi-site: VPN – see SD-WAN (Azure Virtual WAN). ExpressRoute – Global Reach (also see Azure Virtual WAN).
  • Connections: VPN – Azure virtual networks. ExpressRoute – Azure virtual networks, other Azure services, Microsoft 365, Dynamics 365, and other clouds, depending on the service provider.
  • Payment: VPN – outbound data transfer and your regular Internet connection. ExpressRoute – payment to the service provider for the circuit, plus payment to Microsoft for either a metered (outbound data + circuit) or unlimited data (circuit) plan.

Terminology

  • Customer premises equipment (CPE) or customer edge routers (CEs): Ideally two edge devices, connected in a highly available way to two lines linking your network(s) to the service provider.
  • Provider edge routers (PEs), CE facing: Routers or switches operated by the service provider that the customer is connected to.
  • Provider edge routers (PEs), MSEE facing: Routers or switches operated by the service provider that connect to Microsoft’s MSEEs.
  • Microsoft Enterprise Edge (MSEE) routers: Routers in the Microsoft POP or edge data centre that the service provider has connected to.

The MSEE is what:

  • Your ExpressRoute virtual network gateway connects to.
  • Propagates BGP routes to your virtual network.
  • Can connect two virtual networks together (with BGP propagation) if they both connect to the same circuit (MSEE).
  • Can relay you to other Azure services or other Microsoft cloud services.

It is very strongly recommended that the customer deploys two highly available pieces of hardware for the CEs. The ExpressRoute virtual network gateway is also HA, but if the Azure region supports it, spread the two nodes across different availability zones for a higher level of availability.

FYI, these POPs or edge data centres also host other Azure edge services.

Peering

Quite often, the primary use case for Azure ExpressRoute is to connect to Azure virtual networks, and resources connected to those virtual networks such as:

  • Virtual machines
  • VNet integrated SKUs such as App Service Environment, API Management, and SQL Managed Instance
  • Platform services supporting Private Endpoint

That connectivity is provided by Azure Private Peering. However, you can also connect to other Microsoft services, such as Microsoft 365 and Dynamics 365, using Microsoft Peering.

To use Microsoft Peering you will need to configure NAT to convert connections from private IP addresses to public IP addresses before they enter the Microsoft network.
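As a rough illustration of that NAT requirement, Python’s ipaddress module can flag pools that would not be routable on the Microsoft network; the function name and example prefixes are mine, not from any Azure tooling:

```python
import ipaddress

def valid_nat_pool(cidr: str) -> bool:
    """Return True if the NAT pool is publicly routable.

    Microsoft Peering requires connections to enter the Microsoft
    network from public IP addresses, so RFC 1918 pools won't work.
    """
    net = ipaddress.ip_network(cidr, strict=False)
    return not net.is_private

print(valid_nat_pool("20.0.0.0/29"))  # True: public address space
print(valid_nat_pool("10.0.0.0/29"))  # False: private, must be NATed first
```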

ExpressRoute And VPN

There are two scenarios where ExpressRoute and site-to-site VPN can coexist to connect the same on-premises network and virtual network.

The first is for failover. If you deploy a /27 or larger GatewaySubnet then that subnet can contain an ExpressRoute Virtual Network Gateway and a VPN Virtual Network Gateway. You can then configure ExpressRoute and VPN to connect the same on-premises and Azure networks. The scenario here is that the VPN tunnel will be an automated failover connection for the ExpressRoute circuit – failover will happen automatically with less than 10 packets being lost. Two things immediately come to mind:

  • Use a different ISP for Internet/VPN connection than used for ExpressRoute
  • Both connections must propagate the same on-premises networks.
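The /27-or-larger requirement for the coexistence scenario is easy to check programmatically; this small sketch (names are illustrative) uses Python’s ipaddress module:

```python
import ipaddress

def gateway_subnet_ok(cidr: str) -> bool:
    """True if a GatewaySubnet is /27 or larger, i.e. big enough to
    host both an ExpressRoute gateway and a VPN gateway for failover."""
    return ipaddress.ip_network(cidr).prefixlen <= 27

print(gateway_subnet_ok("10.0.255.0/27"))  # True: both gateways fit
print(gateway_subnet_ok("10.0.255.0/28"))  # False: too small for coexistence
```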

An interesting new twist was announced recently for Virtual Network Gateway and Azure Virtual WAN. By default, there is no encryption on your ExpressRoute circuit (more on this later). You will be able to initiate a site-to-site VPN connection across the ExpressRoute circuit to a VPN Virtual Network Gateway that is in the same GatewaySubnet as the ExpressRoute Virtual Network Gateway, encrypting your traffic.

ExpressRoute Tiers

There are three tiers of ExpressRoute circuit that you can deploy in Microsoft Azure. I have not found a good comparison table, so the below will not be complete:

Standard versus Premium:

  • Price: Standard – normal. Premium – more expensive.
  • Azure Virtual WAN support: Standard – announced, not GA. Premium – GA.
  • Azure Global Reach: Standard – limited to the same geo-zone. Premium – all regions.
  • Max connections per circuit: Standard – 10. Premium – 20 to 100, depending on the circuit size (20 for 50 Mbps, 100 for 10 Gbps+).
  • Connections from different subscriptions: Standard – no. Premium – yes.
  • Max routes advertised: Standard – 4,000 (private peering), 200 (Microsoft peering). Premium – up to 10,000 (private peering), 200 (Microsoft peering).
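If you automate circuit management, a lookup like the following can warn before an advertisement exceeds the tier limit; the values are copied from the comparison above, so verify them against current Azure limits before relying on them:

```python
# Advertised-route limits per (tier, peering), per the comparison above.
ROUTE_LIMITS = {
    ("Standard", "private"): 4_000,
    ("Standard", "microsoft"): 200,
    ("Premium", "private"): 10_000,
    ("Premium", "microsoft"): 200,
}

def within_limit(tier: str, peering: str, advertised_routes: int) -> bool:
    """True if the number of advertised routes fits the tier's limit."""
    return advertised_routes <= ROUTE_LIMITS[(tier, peering)]

print(within_limit("Standard", "private", 5_000))  # False: needs Premium
print(within_limit("Premium", "private", 5_000))   # True
```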

I said “three tiers”, right? But the table above only covers two. The third tier, called Local, is very lightly documented. ExpressRoute Local is a subset of ExpressRoute Standard where:

  • The circuit can only connect to 1 or 2 Azure regions in the same metro as the POP or edge data centre. Therefore it is available in fewer locations than ExpressRoute Standard.
  • ExpressRoute Global Reach is not available.
  • It requires an unlimited data plan with at least 1 Gbps, coming in at ~25% of the price of a 1 Gbps Standard tier unlimited data plan.

Service Provider Types

There are three ways that a service provider can connect you to Azure using ExpressRoute, with two of them being:

  • Layer-2: A VLAN is stretched from your on-premises network to Azure
  • Layer-3: You connect to Azure over IP VPN or MPLS VPN. Your on-premises network connects either by BGP or a static default route.

There is a third option, called ExpressRoute Direct.

ExpressRoute Direct

A subset of the Microsoft POPs or edge data centres offer a third kind of connection for Azure ExpressRoute called ExpressRoute Direct. The features of this include:

  • Larger sizes: You can have sizes from 1 Gbps to 100 Gbps for massive data ingestion, for things like Cosmos DB or storage (HPC).
  • Physical Isolation: Some organisations will have a compliance reason to avoid connections to shared network equipment (the CEs and MSEE).
  • Granular control of circuit distribution: Based on business unit

This is a very specialised SKU that you must apply to use.

ExpressRoute FastPath

The normal flow of packets routing into Azure over ExpressRoute is:

  1. Enter Microsoft at the MSEE.
  2. Travel via the ExpressRoute Virtual Network Gateway.
  3. If a route table exists, follow that route, for example, to a hub-based firewall.
  4. Route to the NIC of the virtual machine.

There is a tiny latency penalty incurred by routing through the Virtual Network Gateway. For a small percentage of customers, this latency may cause issues.

The concept of ExpressRoute Fast Path is that you can skip the hop of the virtual network gateway and route directly to the NICs of the virtual machines (in the same virtual network as the gateway).

To use this feature you must be using one of these gateway sizes:

  • Ultra Performance
  • ErGw3AZ

The following are not supported and will force traffic to route via the ExpressRoute Virtual Network Gateway:

  • There is a UDR on the GatewaySubnet
  • Virtual Network Peering is used. An alternative is to connect the otherwise-peered VNets directly to the circuit with their own VNet Gateway.
  • You use a Basic Load Balancer in front of the VMs; use a Standard tier Load Balancer.
  • You are attempting to connect to Private Endpoint.

ExpressRoute Global Reach

I think that ExpressRoute Global Reach is one of the more interesting features in ExpressRoute. You can have two or more offices, each with their own ExpressRoute (not Local tier) circuit to a local POP/edge data center, and enable Global Reach to allow:

  • The offices to connect to Azure/Microsoft cloud resources
  • The offices to connect to each other over the Microsoft WAN instead of deploying their own WAN

Note that ExpressRoute Standard will support connecting locations in the same geo-zone, and ExpressRoute Premium will support all geo-zones. Supported POPs are limited to a small subset of locations.

Encryption

Traffic over ExpressRoute is not encrypted and, as Edward Snowden informed us, various state actors intercept traffic in transit. If you wish to protect your traffic, you will have to “bring your own key”. We have a few options:

  • The aforementioned VPN over ExpressRoute, which is available now for Virtual Network Gateway and Azure Virtual WAN.
  • Implement a site-to-site VPN across ExpressRoute using a third-party virtual appliance hosted in the Azure VNet.
  • IPsec configured on each guest OS, limited to traffic between those machines.
  • MACsec, a Layer-2 feature where you can implement your own encryption from your CE to the MSEE, encrypting all traffic, not just to/from VMs.

The MACsec key is stored securely in Azure Key Vault. From what I can see, MACsec is only available on ExpressRoute Direct. Microsoft claims that it does not cause a performance issue on their routers, but they do warn you to check your CE vendor guidance.

Multi-Cloud

Now you’ll see why I talked about Layer-2 and Layer-3. Depending on your service provider type and their connectivity to non-Microsoft clouds, the same circuit you have with the service provider (from your CEs to their CE-facing PEs) can be used to connect to Azure over ExpressRoute and to other clouds such as AWS. With BGP propagation, you could route from on-premises to/from either cloud, and your deployments in those clouds could route to each other.

Bidirectional Forwarding Detection (BFD)

The circuit is deployed as two connections, ideally connected to 2 CEs in your edge network. Failover is automated, but some will want failover to be as quick as possible. You can reduce the BGP keepalive and hold-time but this will be processor intensive on the network equipment.

A feature called BFD can detect link failure in under a second with low overhead. BFD is enabled on “newly” created ExpressRoute private peering interfaces on the MSEEs – you can reset the peering if required. If you want this feature, you need to enable it on your CEs – the service provider must also enable it on their PEs.
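To see why BFD helps, the detection arithmetic can be sketched in Python. The 180-second hold time is a typical BGP default, and the 300 ms transmit interval with a multiplier of 3 are common BFD defaults; none of these figures come from this article, so check your equipment’s actual timers:

```python
def bgp_detection_seconds(hold_time_s: int = 180) -> float:
    """Worst-case failure detection with plain BGP keepalives:
    the peer is declared down only after the hold time expires."""
    return float(hold_time_s)

def bfd_detection_seconds(tx_interval_ms: int = 300, multiplier: int = 3) -> float:
    """BFD declares the session down after `multiplier` consecutive
    missed hello packets sent every `tx_interval_ms` milliseconds."""
    return tx_interval_ms * multiplier / 1000.0

print(bgp_detection_seconds())  # 180.0 seconds without BFD
print(bfd_detection_seconds())  # 0.9 seconds with common BFD defaults
```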

Monitoring

Azure Monitor provides a bunch of metrics for ExpressRoute that you can visualise or create alerts on.

Azure’s Connection Monitor is the Microsoft-offered solution for monitoring an ExpressRoute connection. The idea is that a Log Analytics agent (Windows or Linux) is deployed onto one or more always-on on-premises machines. A test is configured to run across the circuit measuring availability and performance.

 

Azure Virtual WAN – Connectivity

In this post, I’ll explain how Azure Virtual WAN offers its core service: connections.

SD-WAN

Some of you might be thinking – this is just for large corporations and I’m outta here. Don’t run just yet. Azure Virtual WAN is a rethinking of how to:

  • Connect users to Azure services and on-premises at the same time
  • Connect sites to Azure and (optionally) other sites
  • Replace the legacy hardware-defined WAN
  • Connect Azure virtual networks together.

That first point is quite timely – connecting users to services. Work-from-home (WFH) has forced enterprises to find ways to connect users to services no matter where they are; that connectivity was often limited to a privileged few, and the pandemic forced organisations small and large to re-think productivity connectivity and to scale out. Before COVID-19 struck, I was starting to encounter businesses that were considering (some even starting) replacing their legacy MPLS WAN with a software-defined WAN (SD-WAN), where media of different types, suited to different kinds of sites/users/services, are aggregated via appliances. An SD-WAN is lower cost and more flexible, and by leveraging local connectivity it enables smaller locations, such as offices or retail outlets, to have an affordable direct connection to the cloud for better performance. How the on-premises part of the SD-WAN is managed is completely up to you; some will take direct control and some will outsource it to a network service provider.

Connections

Azure Virtual WAN is all about connections. When you start to read about the new Custom Routing model in Azure Virtual WAN, you’ll see how route tables are associated with connections. In summary, a connection is a link between an on-premises location (referred to as a branch, even if it’s HQ) or a spoke virtual network and a Hub. And now we need to talk about some Azure resources.

Azure Resources

I’ve provided lots more depth on this topic elsewhere so I will keep this to the basics. There are two core resources in Azure Virtual WAN:

  • A Virtual WAN
  • A Hub

The Virtual WAN is a logical resource that provides a global service, although it is actually located in one Azure region. Any hubs that are connected to this Virtual WAN resource can talk to each other (automatically), route their connections to another hub’s connections, and share resources.

A Virtual WAN Hub is similar to a hub in an Azure hub & spoke architecture. It is a central routing point (with a hidden virtual router) that is the meeting point for any connections to that hub. An Azure region can have 1 hub in your tenant. That means I can have 1 Hub in West Europe and 1 Hub in East US. The Hubs must be connected to an Azure WAN resource; if they share a WAN resource then their connections can talk to each other. I might have all my branches in Europe connect to the Hub in West Europe, and I will connect all my spoke virtual networks in West Europe to the Hub in West Europe too; this means that by default (and I can control this):

  • The virtual networks can route to each other
  • The virtual networks can route to the branches
  • The branches can route to the virtual networks
  • The branches can route to other branches

We can extend this routing by connecting my branches in North America to the East US Hub and the spoke virtual networks in East US to the East US Hub. Yes; all those North American locations can route to each other. Because the Hubs are connected to a common Virtual WAN, the routing now extends across the Microsoft WAN. That means a retail outlet in the further reaches of northwest rural Ireland can connect to services hosted in East US, via a connection to the Hub in West Europe, and then hopping across the Atlantic Ocean using Microsoft’s low-latency WAN. Nice, right? Even better – it routes just like that automatically if you are using SD-WAN appliances in the branches.

Note that a managed WAN might wire up that retail outlet differently, but still provide a fairly low-latency connection to the local Hub.

Branch Connections

If you have done any Azure networking then you are probably familiar with:

  • Site-to-site VPN: Connecting a location with a cost-effective but no-SLA VPN tunnel to Azure.
  • ExpressRoute: A circuit rented from an ISP for low-latency, high bandwidth, and an SLA-supported private connection to Azure
  • Point-to-Site VPN: Enabling end-users to create a private VPN tunnel to Azure from their devices while on the move or working from home

Each of the above is enabled in Azure using a Virtual Network Gateway, each running independently. Routing from branch to branch is not an intended purpose. Routing from user to branch is not an intended purpose. The Virtual Network Gateway’s job is to connect a site or user to Azure.

The Azure Virtual WAN Hub supports gateways – as hidden resources that must be enabled and configured. All three of the above media types are supported as 3 different types of gateway, sized based on a billing concept called scale units – more scale units means more bandwidth and more cost, with a maximum hub throughput of 40 Gbps (including traffic to/from/between spokes).
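The scale-unit concept can be sketched as simple arithmetic. The 500 Mbps-per-unit figure for site-to-site VPN scale units is an assumption based on commonly cited Azure documentation, not this article, so verify it before capacity planning; the 40 Gbps ceiling is the hub throughput stated above:

```python
# Assumed bandwidth per site-to-site VPN scale unit; verify current values.
VPN_MBPS_PER_SCALE_UNIT = 500
HUB_MAX_MBPS = 40_000  # stated hub throughput ceiling (40 Gbps)

def vpn_gateway_mbps(scale_units: int) -> int:
    """Aggregate VPN gateway bandwidth for a number of scale units,
    never exceeding the hub's overall throughput limit."""
    return min(scale_units * VPN_MBPS_PER_SCALE_UNIT, HUB_MAX_MBPS)

print(vpn_gateway_mbps(2))    # 1000 Mbps
print(vpn_gateway_mbps(100))  # capped at 40000 Mbps by the hub limit
```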

Note that a Secured Virtual Hub, featuring the Azure Firewall, has a limit of 30 Gbps if all traffic is routed through that firewall.

You can be flexible with the branch connections. Some locations might be small and have a VPN connection to the Hub. Other locations might require an SLA and use ExpressRoute. Some might require low latency or greater bandwidth and use higher SKUs of ExpressRoute. And of course, some users will be on the move or at home and use P2S VPN. A combination of all 3 connection types can be used at once, providing each location and user the connections and costs that suit them best.

ExpressRoute

You will be using ExpressRoute Standard for Azure Virtual WAN; this is a requirement. I don’t think there’s really too much more to say here – the tech just works once the circuit is up, and a combination of Global Reach and the any-to-any connections/routing of Azure WAN means that things will just work.

Site-to-Site VPN

The VPN gateway is deployed in an active/active cluster configuration with two public IP addresses. A branch using VPN for connectivity can have:

  • A single VPN connection over a single ISP connection.
  • Resilient VPN connections over two ISP connections, ideally with different physical providers or even media types.

An on-premises SD-WAN appliance is strongly recommended for Azure Virtual WAN, but you can use any VPN appliance that Microsoft Azure supports for route-based VPN. If you use a plain VPN appliance, you can route to on-premises using BGP or using the Azure WAN equivalent of Local Network Gateway-provided prefixes.

Point-to-Site (P2S) VPN

The P2S gateway offers a superior service to what you might have observed with the traditional Virtual Network Gateway for VPN. Connectivity from the user device is to a hub with a routing appliance. Any-to-any connectivity treats the user device as a branch, albeit in a dedicated network address space. Once the user has connected the VPN tunnel, they can route to (by default):

  • Any spoke virtual network connected to the Hub
  • Any spoke virtual network connected to another Hub on the same Virtual WAN
  • Any branch office connected to any Hub on the Virtual WAN

In summary, the user is connected to the WAN as a result of being connected to the Hub and is subject to the routing and firewall configurations of that Hub. That’s a pretty nice WFH connectivity solution.

Note that you have support for certificate and RADIUS authentication in P2S VPN, as well as both the OpenVPN and native Microsoft clients.

The Connectivity Experience

Imagine we’re back in normal times again with common business travel. A user in Amsterdam could sit down at their desk in the office and connect to services in West Europe via VPN. They could travel to a small office in Luxembourg and connect to the same services via VPN with no discernible difference. That user could travel to a conference in London and use P2S VPN from their hotel room to connect via the Amsterdam Hub. Now that user might get a jet to Philadelphia, and use their mobile hotspot to offer connectivity to the Azure Virtual WAN Hub in East US via P2S VPN – and the experience is no different!

One concept I would like to try out and get a support statement on is to abstract the IP addresses and locations of the P2S gateways using Azure Traffic Manager so the user only needs to VPN to a single FQDN and is directed (using the performance profile) to the closest (latency) Hub in the Virtual WAN with a P2S gateway.

Simplicity

So much is done for you with Azure Virtual WAN. If you like to click in the Azure Portal, it’s a pretty simple setup to get things going, although security engineering looks to have a steep learning curve with Custom Routing. By default, everything is connected to everything; that’s what a network should do. You shouldn’t have to figure out how to route from A to B. I believe that Azure WAN will offer a superior connectivity solution, even for a single-location organisation. That’s why I’ve been spending time figuring this tech out over the last few weeks.

Azure Virtual WAN ARM – The Resources

In this post, I will explain the types of resources used in Azure Virtual WAN and the nature of their relationships.

Note, I have not included any content on the recently announced preview of third-party NVAs. I have not seen any materials on this yet to base such a post on and, being honest, I don’t have any use-cases for third-party NVAs.

As you can see – there are quite a few resources involved … and some that you won’t see listed at all because of the “appliance-like” nature of the deployment. I have not included any detail on spokes or “branch offices”, which would require further resources. The resources below are enough to get a hub operational and connected to on-premises locations and spoke virtual networks.

The Virtual WAN – Microsoft.Network/virtualWans

You need at least one Virtual WAN to be deployed. This is what the hub will connect to, and you can connect many hubs to a common Virtual WAN to get automated any-to-any connectivity across the Microsoft physical WAN.

Surprisingly, the resource is deployed to an Azure region and is not a global resource like Traffic Manager or Azure DNS.

The Virtual Hub – Microsoft.Network/virtualHubs

Also known as the hub, the Virtual Hub is deployed once, and once only, per Azure region where you need a hub. This hub replaces the old hub virtual network (plus gateway(s), plus firewall, plus route tables) deployment you might be used to. The hub is deployed as a hidden resource, managed through the Virtual WAN in the Azure Portal or via scripting/ARM.

The hub is associated with the Virtual WAN through a virtualWAN property that references the resource ID of the virtualWans resource.

In a previous post, I referred to a chicken & egg scenario with the virtualHubs resource. The hub has properties that point to the resource IDs of each deployed gateway:

  • vpnGateway: For site-to-site VPN.
  • expressRouteGateway: For ExpressRoute circuit connectivity.
  • p2sVpnGateway: For end-user/device tunnels.

If you choose to deploy a “Secured Virtual Hub” there will also be a property called azureFirewall that will point to the resource ID of an Azure Firewall with the AZFW_Hub SKU.

Note, the restriction of 1 hub per Azure region does introduce a bottleneck. Under the covers of the platform, there is actually a virtual network. The only clue to this network will be in the peering properties of your spoke virtual networks. A single virtual network can have, today, a maximum of 500 spokes. So that means you will have a maximum of 500 spokes per Azure region.

Routing Tables – Microsoft.Network/virtualHubs/hubRouteTables & Microsoft.Network/virtualHubs/routeTables

These are resources used in custom routing, a feature recently announced as GA but, according to the Azure Portal, not live until August 3rd. These resources control the flow of traffic in your hub and spoke architecture. They are child resources of the virtualHubs resource, so no references to hub resource IDs are required.

Azure Firewall – Microsoft.Network/azureFirewalls

This is an optional resource that is deployed when you want a “Secured Virtual Hub”. Today, this is the only way to put a firewall into the hub, although a new preview program should make it possible for third-parties to join the hub. Alternatively, you can use custom routing to force north-south and east-west traffic through an NVA that is running in a spoke, although that will double peering costs.

The Azure Firewall is deployed with the AZFW_Hub SKU. The firewall is not a hidden resource. To manage the firewall, you must use an Azure Firewall Policy (aka Azure Firewall Manager). The firewall has a property called firewallPolicy that points to the resource ID of a firewallPolicies resource.

Azure Firewall Policy – Microsoft.Network/firewallPolicies

This is a resource that allows you to manage an Azure Firewall, in this case, an AZFW_Hub SKU of Azure Firewall. Although not shown here, you can deploy a parent/child configuration of policies to manage firewall configurations and rules in a global/local way.

VPN Gateway – Microsoft.Network/vpnGateways

This is one of 3 ways (one, two or all three at once) that you can connect on-premises (branch) sites to the hub and your Azure deployment(s). This gateway provides you with site-to-site connectivity using VPN. The VPN Gateway uses a property called virtualHub to point at the resource ID of the associated hub or virtualHubs resource. This is a hidden resource.

Note that the virtualHubs resource must also point at the resource ID of the VPN Gateway using a property called vpnGateway.

ExpressRoute Gateway – Microsoft.Network/expressRouteGateways

This is one of 3 ways (one, two or all three at once) that you can connect on-premises (branch) sites to the hub and your Azure deployment(s). This gateway provides you with site-to-site connectivity using ExpressRoute. The ExpressRoute Gateway uses a property called virtualHub to point at the resource ID of the associated hub or virtualHubs resource. This is a hidden resource.

Note that the virtualHubs resource must also point at the resource ID of the ExpressRoute Gateway using a property called expressRouteGateway.

Point-to-Site Gateway – Microsoft.Network/p2sVpnGateways

This is one of 3 ways (one, two or all three at once) that you can connect on-premises (branch) sites to the hub and your Azure deployment(s). This gateway provides users/devices with connectivity using VPN tunnels. The Point-to-Site Gateway uses a property called virtualHub to point at the resource ID of the associated hub or virtualHubs resource. This is a hidden resource.

The Point-to-Site Gateway inherits a VPN configuration from a VPN configuration resource based on Microsoft.Network/vpnServerConfigurations, referring to the configuration resource by its resource ID using a property called vpnServerConfiguration.

Note that the virtualHubs resource must also point at the resource ID of the Point-to-Site Gateway using a property called p2sVpnGateway.

VPN Server Configuration – Microsoft.Network/vpnServerConfigurations

This configuration for Point-to-Site VPN gateways can be seen in the Azure WAN and is intended as a shared configuration that is reusable with more than one Point-to-Site VPN Gateway. To be honest, I can see myself using it as a per-region configuration because of some values like DNS servers and RADIUS servers that will probably be placed per-region for performance and resilience reasons. This is a hidden resource.

The following resources were added on 22nd July 2020:

VPN Sites – Microsoft.Network/vpnSites

This resource has a similar purpose to a Local Network Gateway for site-to-site VPN connections; it describes the on-premises location, AKA “branch office”.  A VPN site can be associated with one or many hubs, so it is actually connected to the Virtual WAN resource ID using a property called virtualWan. This is a hidden resource.

An array property called vpnSiteLinks describes possible connections to on-premises firewall devices.

VPN Connections – Microsoft.Network/vpnGateways/vpnConnections

A VPN Connections resource associates a VPN Gateway with the on-premises location that is described by an associated VPN Site. The vpnConnections resource is a child resource of vpnGateways, so there is no standalone resource; the vpnConnections resource takes its name from the parent VPN Gateway, and its resource ID is an extension of the parent VPN Gateway resource ID.

By necessity, there is some complexity with this resource type. The remoteVpnSite property links the vpnConnections resource with the resource ID of a VPN Site resource. An array property, called vpnSiteLinkConnections, is used to connect the gateway to the on-premises location using 1 or 2 connections, each linking from vpnSiteLinkConnections to the resource/property ID of 1 or 2 vpnSiteLinks properties in the VPN Site. With one site link connection, you have a single VPN tunnel to the on-premises location. With 2 link connections, the VPN Gateway will take advantage of its active/active configuration to set up resilient tunnels to the on-premises location.

Virtual Network Connections – Microsoft.Network/virtualHubs/hubVirtualNetworkConnections

The purpose of a hub is to share resources with spoke virtual networks. In the case of the Virtual Hub, those resources are gateways, and maybe a firewall in the case of Secured Virtual Hub. As with a normal VNet-based hub & spoke, VNet peering is used. However, the way that VNet peering is used changes with the Virtual Hub; the deployment is done using the hub/VirtualNetworkConnections child resource, whose parent is the Virtual Hub. Therefore, the name and resource ID are based on the name and resource ID of the Virtual Hub resource.

The deployment is rather simple; you create a Virtual Network Connection in the hub specifying the resource ID of the spoke virtual network, using a property called remoteVirtualNetwork. The underlying resource provider will initiate both sides of the peering connection on your behalf – there is no deployment required in the spoke virtual network resource. The Virtual Network Connection will reference the Hub Route Tables in the hub to configure route association and propagation.

More Resources

There are more resources that I’ve yet to document.

Verifying Propagated BGP Routes on Azure ExpressRoute

An important step of verifying or troubleshooting communications over ExpressRoute is checking that all the required routes to get to on-premises or WAN subnets have been propagated by BGP to your ExpressRoute Virtual Network Gateway (and the connected virtual networks) by the on-premises edge router.

The Problem

Routing to Azure is often easy: your network admins allocate you a block of private address space on the “WAN” and you use it for your virtual network(s). They add a route entry for that CIDR block on their VPN/ExpressRoute edge device, and packets can now get to Azure. The other part of that story is that Azure needs to know how to send packets back to on-premises – this affects both responses and requests. I have found that this is often overlooked; people start saying things like “Azure networking is broken” when they haven’t sent a route to Azure so that the Azure resources connected to the virtual network(s) can respond.

The other big cause is that the on-premises edge firewall doesn’t allow the traffic – this is the #1 cause of RDP/SSH to Azure virtual machines not working, in my experience.

I had one such scenario where a system in Azure was “not accessible”. We verified that everything in Azure was correct. When we looked at the BGP routes propagated via ExpressRoute, we saw that the client subnets were not included in the Route Table. The on-premises network admins had not propagated those routes, so the Azure ExpressRoute Gateway did not have a route to send responses back to the clients. Once the route was propagated, things worked as expected.

Finding the Routes

There are two ways you can do this. The first is to use PowerShell:

Get-AzExpressRouteCircuitRouteTable -DevicePath Primary -ExpressRouteCircuitName TheNameOfMyCircuitResourceInAzure -PeeringType AzurePrivatePeering -ResourceGroupName TheNameOfTheResourceGroupTheCircuitResourceIsIn

The command takes quite a while to run. Eventually, it will spit out the full route table. If there are lots of routes (there could be hundreds if not thousands) then they will scroll beyond the buffer of your console. So modify the command to send the output to a text file:

Get-AzExpressRouteCircuitRouteTable -DevicePath Primary -ExpressRouteCircuitName TheNameOfMyCircuitResourceInAzure -PeeringType AzurePrivatePeering -ResourceGroupName TheNameOfTheResourceGroupTheCircuitResourceIsIn > BgpRouteTable.txt

Unfortunately, it does not create CSV output by default, but you can format the output into something that is easier to filter and manipulate.
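For example, assuming each route comes back as an object with properties such as Network and NextHop, you could pipe the output through Export-Csv instead of redirecting to a text file:

```powershell
Get-AzExpressRouteCircuitRouteTable -DevicePath Primary `
    -ExpressRouteCircuitName TheNameOfMyCircuitResourceInAzure `
    -PeeringType AzurePrivatePeering `
    -ResourceGroupName TheNameOfTheResourceGroupTheCircuitResourceIsIn |
    Export-Csv -Path BgpRouteTable.csv -NoTypeInformation
```

The resulting CSV opens cleanly in Excel for filtering and searching.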

You can also use the Azure Portal where you can view routes from the Route Table and export a CSV file with the contents of the Route Table. Open the ExpressRoute Circuit and browse to Peerings.

Click Azure Private, which is the site-to-site ExpressRoute connection.

Now a pop-up blade appears in the Azure Portal called Private Peering. There are three interesting options here:

  • Get ARP Records to see the ARP table for the peering.
  • Get Route Table – more on this in a second.
  • Get Route Table Summary to get a breakdown/summary of the records, including neighbor, version, status, ASN, and a count of routes.

We want to see the Route Table, so click that option. Another pop-up blade appears, and now you wait for several minutes. Eventually, the screen will load up to 200 of the entries from the Route Table. If you want to see the entire list of entries, or you want an export, click Download. A CSV file will download via your browser, with one line per route from the Route Table.

Search the Route Table for a listing that either matches the on-premises/WAN subnet or includes its address space; for example, a route to 10.10.0.0/16 includes a subnet of 10.10.10.0/24.
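If you want to script that containment check, a small helper can test whether an address (or the first address of a subnet) falls inside a route's CIDR block. This is a sketch, not from the original post:

```powershell
# Test whether an IPv4 address is covered by a CIDR block,
# e.g. is 10.10.10.0 covered by the route 10.10.0.0/16?
function Test-InCidr {
    param([string]$Address, [string]$Cidr)

    $network, $length = $Cidr.Split('/')
    # Build the subnet mask bit pattern; 4294967295 is 0xFFFFFFFF.
    $mask = 4294967295 -shl (32 - [int]$length)

    $toInt = {
        param($ip)
        $bytes = [System.Net.IPAddress]::Parse($ip).GetAddressBytes()
        [Array]::Reverse($bytes)   # network byte order to little-endian
        [long][System.BitConverter]::ToUInt32($bytes, 0)
    }

    # The address is in the block if both share the same network bits.
    ((& $toInt $Address) -band $mask) -eq ((& $toInt $network) -band $mask)
}

Test-InCidr -Address "10.10.10.0" -Cidr "10.10.0.0/16"   # True
```

You could loop this over the downloaded CSV to find which propagated route, if any, covers a troublesome subnet.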

Microsoft Ignite 2019 – Global Transit Network Architectures With Azure Virtual WAN

Speakers:

  • Reshmi Yandapalli (main speaker), Principal Program Manager
  • Ben Peeri, KPMG customer story

Lots more content in the hidden slides in the download.

Scale

Usual stats. Interesting note: a new POP being built almost every day.

Azure WAN: Global Transit Architecture

The Beginning

  • HQ/Bigger Office
  • Branch office(s)
  • Users
  • Private WAN
  • Shared services

Start with HQ. Users multiply. VLANs multiply. Locations multiply. WAN grows. You grow:

  • Need to simplify network
  • Need ease of use
  • Need operational savings.

Azure Virtual WAN

  • Managed hub & spoke architecture, with hub being Azure and spokes being offices.
  • Public (VPN) and private (ExpressRoute) connectivity.
  • Global Scale:
  • 20 Gbps S2S VPN, 20 Gbps ER, and 20 Gbps User VPN
  • 10K users per hub
  • 1000 sites per hub
  • 1 hub per region
  • Transit routing
  • Cloud Network Orchestration
  • Automated large-scale branch/SDWAN CPE connectivity

Connectivity

What if you had many regions – many hubs. And what if you wanted any branch to access any Azure VNet, regardless of local vWAN hub? In other words, connect to a hub, and use the Azure WAN to seamlessly reach the destination. So you build hub/spoke in different Azure regions, each with a vWAN hub. And a branch connects to the closest vWAN hub, and can get to any Azure VNet via transitive routing between vWAN hubs across the Azure WAN.

  • Simplified network
  • Ease of use
  • Operational savings

This is called Global Transit Architecture over Azure Virtual WAN.

Azure Virtual WAN – What’s New

  • Any-to-Any connectivity (Preview, soon GA)
  • ExpressRoute and User VPN GA
  • ExpressRoute encryption
  • Multi-link Azure Path Selection
  • Custom IPsec
  • Connect VNG VPN to Virtual WAN
  • Available in Gov Cloud & China
  • Azure Firewall integration (Preview) – this is the big announcement IMO
  • Pricing – reduced
  • New partnerships coming soon
    • Arista,
    • Aruba
    • Cisco
    • F5
    • OpenSystems
    • VeroCloud

Global Transit Architecture – A Customer Example

  • 4 regions, 70 countries with hundreds of sites. 34 VNets, 2 ExpressRoute Premium circuits.
  • Challenges: scale issues, routing complexity, ER VNet limits

The before and after architecture diagrams are totally different – the after is much simpler.

Azure Virtual WAN Types

Basic:

  • VPN only
    • Branch to Azure
    • Branch to Branch
  • Connect VNet
    • DIY VNet peering, VNet to VNet non-transitive via hub
    • Hubs are not connected

Standard = Basic + Following

  • Stuff

Multi-Link Support in VPN Site

Support dual links of different types/ISPs. Azure sees the link information. The branch partner can do path selection across these links.

Barracuda CloudGen Firewall is the first to support this. You get always-on Azure in the branch.

ExpressRoute

  • GA in Standard Virtual WAN.
  • Up to 20 Gbps aggregate per hub.
  • Private connectivity – requires premium circuit.
  • In Global Reach Location
  • ExpressRoute VPN Interconnect
  • Integrated with Azure Monitor

EXPRESSROUTE + VPN Path Selection

Path selection between ER and VPN. Fortinet can do this.

Customer Story – Ben Peeri, KPMG

No notes here – sales story.

User VPN

  • Available in Standard Virtual WAN
  • Up to 20 Gbps aggregate and 10K users per hub
  • Cloud based secure remote access
    • Works with OpenVPN and IKEv2 clients
    • Cert-based and RADIUS authentication
  • Any-to-Any
    • User to branch, user to Azure VNet
  • More

Azure Firewall

  • Firewall in virtual hub
  • Centralized policy and route management
    • VNet to Internet through Azure Firewall
    • Branch to Internet through Azure Firewall
    • Managed through Azure Firewall Manager

Azure MSP Program

Announced in July. Focused on networking. Offerings in Azure Marketplace.

Pricing

  • Connection Unit
    • Site-to-site VPN / ExpressRoute: Now reduced
    • User VPN
  • Scale Unit – aggregate throughput
    • 1 VPN scale unit
    • 1 ER scale unit
  • Virtual Hub (Effective CYQ1 2020)
    • Basic vWAN hub: no charge
    • Standard hub
    • Data processing intra region
    • Data processing inter region

Private Connections to Azure PaaS Services

In this post, I’d like to explain a few options you have to get secure/private connections to Azure’s platform-as-a-service offerings.

Express Route – Microsoft Peering

 

ExpressRoute comes in a few forms, but at a basic level, it’s a “WAN” connection to Azure virtual networks via one or more virtual network gateways; customers use this private peering to connect on-premises networks to Azure virtual networks over an SLA-protected private circuit. However, there is another form of peering that you can enable on an ExpressRoute circuit, called Microsoft peering. This is where you use your private circuit to connect to Microsoft cloud services that are normally reached over the public Internet. What you get:

  • Private access to PaaS services from your on-premises networks.
  • Access to an entire service, such as Azure SQL.
  • A wide array of Azure and non-Azure Microsoft cloud services.

FYI, Office 365 is often mentioned here. In theory, you can access Office 365 over Microsoft peering/ExpressRoute. However, the Office 365 team must first grant you permission to do this – the last I checked, you had to have legal proof of a regulatory need for private access to cloud services.

Service Endpoint

Imagine that you are running some resources in Azure, such as virtual machines or App Service Environment (ASE); these are virtual network integrated services. Now consider that these services might need to connect to other services such as storage accounts, Azure SQL, or others. Normally, when a VNet-connected resource is communicating with, say, Azure SQL, the packets will be routed to “Internet” via the 0.0.0.0/0 default route for the subnet – “Internet” is everywhere outside the virtual network, not necessarily The Internet. The flow will hit the “public” Azure backbone and route to the Azure SQL compute cluster. There are two things about that flow:

  • It is indirect and introduces latency.
  • It traverses a shared network space.

A growing number of services, including storage accounts, Azure SQL, Cosmos DB, and Key Vault, have service endpoints available to them. You can enable a service endpoint anywhere in the route from the VM (or whatever) to “Internet” and the packets will “drop” through the service endpoint to the required Azure service – make sure that any firewall in the service accepts packets from the private subnet IP address of the source (VM or whatever). Now you have a more direct and more private connection to the platform service in Azure from your VNet. What you get:

  • Private access to PaaS services from your Azure virtual networks.
  • Access to an entire service, such as Azure SQL, but you can limit this to a region.

Service Endpoint Trick #1

Did you notice in the previous section on service endpoints that I said:

You can enable a service endpoint anywhere in the route from the VM (or whatever) to “Internet”

Imagine you have a complex network and not everyone enables service endpoints the way that they should. But you manage the firewall, the public IPs, and the routing. Well, my friend, you can force traffic to Azure platform services through service endpoints. If you have a firewall, then your routes to “Internet” should direct outbound traffic through the firewall. In the firewall (frontend) subnet, you can enable all the Azure service endpoints. Now when packets egress the firewall, they will “drop” through the service endpoints to the desired Azure platform service, without ever reaching “Internet”.
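As a sketch (with hypothetical VNet and subnet names), enabling service endpoints on the firewall's frontend subnet looks like this with Az PowerShell:

```powershell
# Hypothetical names: a hub VNet "hub-vnet" with a frontend subnet
# "FirewallFrontendSubnet" that the firewall egresses through.
$vnet = Get-AzVirtualNetwork -Name "hub-vnet" -ResourceGroupName "hub-rg"

# Enable service endpoints on the subnet; add any others you need.
Set-AzVirtualNetworkSubnetConfig -VirtualNetwork $vnet `
    -Name "FirewallFrontendSubnet" `
    -AddressPrefix "10.0.1.0/24" `
    -ServiceEndpoint "Microsoft.Storage", "Microsoft.Sql", "Microsoft.KeyVault"

# Commit the change to the virtual network.
$vnet | Set-AzVirtualNetwork
```

Remember that the target services' firewalls must then allow the subnet, not a public IP, as the source.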

Service Endpoint Trick #2

You might know that I like Azure Firewall. Here’s a trick that the Azure networking teams shared with me – it’s similar to the above one but is for on-premises clients trying to access Azure platform services.

You’ve got a VPN connection to a complex virtual network architecture in Azure. And at the frontend of this architecture is Azure Firewall, sitting in the AzureFirewallSubnet; in this subnet you enabled all the available service endpoints. Let’s say that someone wants to connect to Azure SQL using Power BI on their on-premises desktop. Normally that traffic will go over the Internet. What you can do is configure name resolution on your network (or PC) for the database to point at the private IP address of the Azure Firewall. Now Power BI will forward traffic to Azure Firewall, which will relay you to Azure SQL via the service endpoint. What you get:

  • Private access to PaaS services from your on-premises or Azure networks.
  • Access to individual instances of a service, such as an Azure SQL server
  • A growing number of Azure-only services that support service endpoints.

Private Link

In this post, I’m focusing on only one of the 3 current scenarios for Private Link, which is currently in unsupported preview in limited US regions only, for limited platform services – in other words, it’s early days.

This approach aims to give a similar solution to the above “Service Endpoint Trick #2” without the trickery. You can connect an instance of an Azure platform service to a virtual network using Private Link. That instance will now have a private IP address on the VNet subnet, making it fully routable on your virtual network. The private link gets a globally unique record in the Microsoft-managed privatelink.database.windows.net DNS zone. For example, your Azure SQL server would now be resolvable to the private IP address of the private link as yourazuresqlsvr.privatelink.database.windows.net. Now your clients, be they in Azure or on-premises, can connect to this DNS name/IP address to reach this Azure SQL instance. What you get:

  • Private access to PaaS services from your on-premises or Azure networks.
  • Access to individual instances of a service, such as an Azure SQL server.
  • (PREVIEW LIMITATIONS) A limited number of platform services in limited US-only regions.

Creating an Azure Service for Slow Moving Organisations

In this post, I will explain how you can use Azure’s Public IP Prefix feature to pre-create public IP addresses to access Azure services when you are working with big/government organisations that can take weeks to configure a VPN tunnel, outbound firewall rule, and so on.

In this scenario, I need a predictable IP address so that means I must use the Standard SKU address tier.

The Problem

It normally only takes a few minutes to create a firewall rule, a VPN tunnel, etc. in an on-premises network. But sometimes it seems to take forever! I’ve been in that situation – you’ve set up an environment for the customer to work with, but their on-premises networking team(s) are slow to do anything. And you only wish that you had given them all the details that they needed earlier in the project, so their configuration work would end when your weeks of engineering were wrapping up.

But you won’t know the public IP address until you create it. And that is normally only created when you create the virtual network gateway, Azure Firewall, Application Firewall, etc. But what if you had a pool of Azure public IP addresses that were pre-reserved and ready to share with the network team? Maybe they could be used to make early requests for VPN tunnels, firewall rules, and so on? Luckily, we can do that!

Public IP Prefix

An Azure Public IP Prefix is a set of reserved public IP addresses (PIPs). You can create an IP Prefix of a certain size, from /31 (2 addresses) to /24 (256 addresses), in a certain region. The pool of addresses is a contiguous block of predictable addresses. And from that pool, you can create public IP addresses for your Azure resources.

In my example, I want a Standard tier IP address and this requires a Standard tier Public IP Prefix. Unfortunately, the Azure Portal doesn’t allow for this with Public IP Prefix, so we need some PowerShell. First, we’ll define some reused variables:

$rgName = "test"
$region = "westeurope"
$ipPrefixName = "test-ipfx"

Now we will create the Public IP Prefix. Note that the length refers to the subnet mask length. In my example that’s a /30, resulting in a prefix with 4 reserved public IP addresses:

$ipPrefix = New-AzPublicIpPrefix -Name $ipPrefixName -ResourceGroupName $rgName -PrefixLength 30 -Sku Standard -Location $region

You’ll note above that I used Standard in the command. This creates a pool of static Standard tier public IP addresses. I could have dropped the Standard, and that would have created a pool of static Basic tier IP addresses – you can use the Azure Portal to deploy Basic tier Public IP Prefix and public IP addresses from that prefix. The decision to use Standard tier or Basic tier affects what resources I can deploy with the addresses:

  • Standard: Azure Firewall, zone-redundant virtual network gateways, v2 application gateways/firewalls, standard tier load balancers, etc.
  • Basic static: Basic tier load balancers, v1 application gateways/firewalls, etc.

Note that the non-zone redundant virtual network gateways cannot use static public IP addresses and therefore cannot use Public IP Prefix.

Creating a Public IP Address

Let’s say that I have a project coming up where I need to deploy an Application Firewall and I know the on-premises network team will take weeks to allow outbound access to my new web service. Instead of waiting until I build the application, I can reserve the IP address now, tell the on-premises firewall team to allow it, and then work on my project. Hopefully, by the time I have the site up and running and presented to the Internet by my Application Firewall, they will have created the outbound firewall rule from the company network.

Browse to the Public IP Prefix and make sure that it is in the same region as the new virtual network and virtual network gateway. Open the prefix and check Allocated IP Addresses in the Overview. Make sure that there is free capacity in the reserved block.

Now I can continue to use my variables from above and create a new public IP address from one of the reserved addresses in the Public IP Prefix:

New-AzPublicIpAddress -Name "test-vpn-pip" -ResourceGroupName $rgName -AllocationMethod Static -DomainNameLabel "test-vpn" -Location $region -PublicIpPrefix $ipPrefix -Sku Standard
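You can then read back the address that was allocated from the prefix – this is the value to hand over in your request to the on-premises team:

```powershell
# Returns the reserved static address allocated from the prefix.
(Get-AzPublicIpAddress -Name "test-vpn-pip" -ResourceGroupName $rgName).IpAddress
```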

Use the Public IP Address

I now have everything I need to pass onto the on-premises network team in my request. In my example, I am going to create a v2 Application Firewall.

Once I configure the WAF, the on-premises firewall will (hopefully) already have the rule to allow outbound connections to my pre-reserved IP address and, therefore, my new web service.

Azure Availability Zones in the Real World

I will discuss Azure’s availability zones feature in this post, sharing what they can offer for you and some of the things to be aware of.

Uptime Versus SLA

Noobs to hosting and cloud focus on three magic letters: S, L, A or service level agreement. This is a contractual promise that something will be running for a certain percentage of time in the billing period or the hosting/cloud vendor will credit or compensate the customer.

You’ll hear phrases like “three nines”, or “four nines” to express the measure of uptime. The first is a 99.9% measure, and the second is a 99.99% measure. Either is quite a high level of uptime. Azure does have SLAs for all sorts of things. For example, a service deployed in a valid virtual machine availability set has a connectivity (uptime) SLA of 99.9%.
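To put those numbers in context, here is a quick calculation of how much downtime each level of uptime permits in a 30-day billing month:

```powershell
# Minutes in a 30-day billing month.
$minutesPerMonth = 30 * 24 * 60   # 43,200

foreach ($sla in 99.9, 99.95, 99.99) {
    # Permitted downtime is the complement of the SLA percentage.
    $downtime = [math]::Round($minutesPerMonth * (100 - $sla) / 100, 2)
    "{0}% allows {1} minutes of downtime per month" -f $sla, $downtime
}
# 99.9% allows 43.2 minutes, 99.95% allows 21.6, 99.99% allows 4.32
```

In other words, each extra "nine and a half" roughly halves or better the downtime the provider is allowed before owing you credit.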

Why did I talk about noobs? Promises are easy to make. I once worked for a hosting company that offered a ridiculous 100% SLA for everything, including cheap-ass generic Pentium “servers” from eBay with single IDE disks. 100% is an unachievable target because … let’s be real here … things break. Even systems with redundant components have downtime. I prefer to see realistic SLAs and honest statements on what you must do to get that guarantee.

Azure gives us those sorts of SLAs. For virtual machines we have:

  • 99.9% for single-instance machines with just Premium SSD disks
  • 99.95% for services running in a valid availability set
  • 99.99% for services running in multiple availability zones

Ah… let’s talk about that last one!

Availability Sets

First, we must discuss availability sets and what they are before we move one step higher. An availability set is a form of anti-affinity, a concept also found in vSphere and in Hyper-V Failover Clustering (PowerShell or SCVMM); it is a label on a virtual machine that instructs the compute cluster to spread the virtual machines across different parts of the cluster. In Azure, virtual machines in the same availability set are placed into different:

  • Update domains: Avoiding downtime caused by (rare) host reboots for updates.
  • Fault domains: Enable services to remain operational despite hardware/software failure in a single rack.

The above solution spreads your machines around a single compute (Hyper-V) cluster, in a single room, in a single building. That’s amazing for on-premises, but there can still be an issue. Last summer, a faulty humidity sensor brought down one such room and affected a “small subset” of customers. “Small subset” is OK, unless you are included and some mission critical system was down for several hours. At that point, SLAs are meaningless – a refund for the lost runtime cost of a pair of Linux VMs running network appliance software won’t compensate for thousands or millions of Euros of lost business!

Availability Zones

We can go one step further by instructing Azure to deploy virtual machines into different availability zones. A single region can be made up of different physical locations with independent power and networking. These locations might be close together, as is typically the case in North Europe or West Europe. Or they might be on opposite sides of a city, as is the case with some North American regions. There is a low level of latency between the buildings, but it is still higher than that of a LAN connection.

A region that supports availability zones is split into 4 zones. You see three zones (the logical-to-physical mapping is assigned round-robin between customers), labeled as 1, 2, and 3. You can deploy many services across availability zones – this is improving:

  • VNet: Is software-defined so can cross all zones in a single region.
  • Virtual machines: Can connect to the same subnet/address space but be in different zones. They are not in availability sets but Azure still maintains service uptime during host patching/reboots.
  • Public IP Addresses: Standard IP supports anycast and can be used to NAT/load balance across zones in a single region.
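As a sketch, deploying a virtual machine into a specific zone uses the -Zone parameter of New-AzVM; the resource names here are hypothetical:

```powershell
# Hypothetical names; the simplified New-AzVM parameter set creates
# supporting resources (NIC, public IP) if they do not exist.
New-AzVM -ResourceGroupName "app-rg" -Location "westeurope" `
    -Name "app-vm01" -Zone 1 `
    -VirtualNetworkName "app-vnet" -SubnetName "app-subnet" `
    -Image "Win2019Datacenter"
```

Deploy a second machine with -Zone 2 and the pair qualifies for the multi-zone SLA, with no availability set required.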

Other network resources can work with availability zones in one of two ways:

  • Zonal: Instances are deployed to a specific zone, giving optimal latency performance within that zone, but can connect to all zones in the region.
  • Zone Redundant: Instances are spread across the zones for an active/active configuration.

Examples of the above are:

  • The zone-aware VNet gateways for VPN/ExpressRoute
  • Standard load balancer
  • WAGv2 / WAFv2

Considerations

There are some things to consider when looking at availability zones.

  • Regions: The list of regions that support availability zones is increasing slowly, but it is far from complete. Some regions will not offer this highest level of availability.
  • Catchup: Not every service in Azure is aware of availability zones, but this is changing.

Let me give you two examples. The first is VM Boot Diagnostics, a service that I consider critical for seeing the console of the VM and getting serial console access without a network connection to the virtual machine. Boot Diagnostics uses an agent in the VM to write to a storage account. That storage account can be:

  • LRS: 3 replicas reside in a single compute cluster, in a single room, in a single building (availability zone).
  • GRS: LRS plus 3 asynchronous replicas in the paired region, that are not available for write unless Microsoft declares a total disaster for the primary region.

So, if I have a VM in zone 1 and a VM in zone 2, and both write to a storage account that happens to be in zone 1 (I have no control over the storage account location), and zone 1 goes down, there will be issues with the VM in zone 2. The solution would be to use ZRS GPv2 storage for Boot Diagnostics; however, the agent does not support this type of storage configuration. Gotcha!
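For reference, this is roughly how you point Boot Diagnostics at a specific storage account with Az PowerShell – a sketch with hypothetical names, and remember the storage account can only be LRS/GRS as described above:

```powershell
# Hypothetical names: VM "app-vm01" and diagnostics account "appdiagstore".
$vm = Get-AzVM -ResourceGroupName "app-rg" -Name "app-vm01"

# Enable Boot Diagnostics against the named storage account.
Set-AzVMBootDiagnostic -VM $vm -Enable `
    -ResourceGroupName "app-rg" -StorageAccountName "appdiagstore"

# Commit the configuration change.
Update-AzVM -ResourceGroupName "app-rg" -VM $vm
```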

Azure Advisor will also be a pain in the ass. Noobs are told to rely on Advisor (it is several questions in the new Azure infrastructure exams) for configuration and deployment advice. Advisor will see the above two VMs as being not highly available because they are not (and cannot be) in a common availability set, so you are advised to degrade their SLA by migrating them to a single zone for an availability set configuration – ignore that advice and be prepared to defend the decision from Azure noobs, such as management, auditors, and ill-informed consultants.

Opinion

Availability zones are important – I use them in an architecture pattern that I am working on with several customers. But you need to be aware of what they offer and how certain things do not understand them yet or do not support them yet.

 

Cannot Create a Basic Tier Virtual Network Gateway in Azure

There is a bug in the Azure Portal that prevents you from selecting a virtual network when you pick the Basic tier of the virtual network gateway, and you are forced into selecting the more expensive VpnGw1. I’ll show you how to work around this bug in this post.

Background

I recently ran a hands-on Azure class in London. Part of the class required deploying & configuring a VPN gateway in the West Europe region. I always use the Basic tier because:

  • It’s cheaper – $26.79 for Basic versus $141.36 for VpnGw1 per month
  • That’s what most (by a long shot) of my customers deploy in production because it meets their needs.

I’ve had a customer in Northern Ireland report the same problem in North Europe.

The process goes like this:

  1. You select VPN gateway type
  2. Select Route-Based
  3. Select Basic as the SKU
  4. Then you attempt to select the virtual network that you want to use – it already has a gateway subnet
  5. You cannot continue because the virtual network is greyed out

image

The error shown is:

The following issues must be fixed to use this virtual network: The VPN gateway cannot have a basic SKU in order for it to coexist with an existing ExpressRoute gateway.

In all cases so far, the subscriptions have been either brand new CSP/trial subscriptions with no previous resources, or my lab subscription where I’ve used a new virtual network to demonstrate this scenario – and I have never deployed ExpressRoute in any subscription.

Workaround

Credit where credit is due – some of my attendees last week figured out how to beat the UI bug.

  1. Close the Choose Virtual Network blade if it is open.
  2. Select the VpnGw1 tier gateway in the Create Virtual Network Gateway blade – don’t worry, you won’t be creating it if you don’t want to pay the price.
  3. Click Choose A Virtual Network
  4. Select your virtual network
  5. Change the SKU of the gateway back to Basic
  6. Finish the wizard

image

I know – it’s a daft UI bug, but the above workaround works.