Virtual WAN Is Not Required For SD-WAN

Did you know that you do not need to use Virtual WAN to implement an SD-WAN with Azure? In fact, contrary to the recommendations from Microsoft, Virtual WAN might be the worst way to add Azure networks to an SD-WAN.

My History With Virtual WAN

You might think that the introduction of this post paints me as a complete hater who has never given Virtual WAN a chance. I have. In fact, I can point out features that some of my 1:1 feedback calls probably contributed to. I’ve implemented Virtual WAN with customers.

However, I’ve seen the problems. I’ve seen that the hype doesn’t always work. I’ve personally experienced the lack of troubleshooting capabilities that depended on my deep understanding of the hidden networking. I’ve seen colleagues struggle with the complexity. I’ve seen how some customers’ routing requirements cannot be met with Virtual WAN. And many architectural features that some organisations require cannot be deployed with Virtual WAN.

I concluded that my time with Virtual WAN was over during a proof of concept that I insisted a customer do. They had previously used Virtual WAN without a firewall. I was asked to build a new multi-region Azure environment (multiple hubs) with firewalls. I was not sure that it would go well – this was before routing intent was in preview. I tested and confirmed that Virtual WAN was not going to work; the customer implemented a Meraki SD-WAN using Virtual Network-based hubs and lost no functionality. In fact, they gained functionality.

In an older case, I convinced a customer to go with Virtual WAN. I regret this one. There was a lot of hype. They used Meraki. There was a solution from Meraki to integrate with the Virtual WAN VPN Gateway. We found bugs in the script and fixed them. But the most annoying thing about that solution was that every time the customer changed anything in the SD-WAN, every VPN tunnel to Azure was torn down and recreated. I heard recently that the customer is looking to remove SD-WAN. I don’t blame them, and I regret ever recommending it to them.

The Microsoft Claims

The Azure Cloud Adoption Framework incorrectly states the following:

Use a Virtual WAN topology if any of the following requirements apply to your organization:

  • Your organization intends to deploy resources across several Azure regions and requires global connectivity between virtual networks in these Azure regions and multiple on-premises locations.
  • Your organization intends to use a software-defined WAN (SD-WAN) deployment to integrate a large-scale branch network directly into Azure, or requires more than 30 branch sites for native IPSec termination.
  • You require transitive routing between a virtual private network (VPN) and Azure ExpressRoute. For example, if you use a site-to-site VPN to connect remote branches or a point-to-site VPN to connect remote users, you might need to connect the VPN to an ExpressRoute-connected DC through Azure.

I will burst those bubbles one by one.

Several Regions & Global Connectivity

Do you want to deploy across multiple regions? Not a problem. You can very easily do that with Virtual Network-based hubs. I’ve done it again and again.

Do you want to connect the spokes in different regions? Yup, also easy:

  • Build each hub-and-spoke from a single IP prefix.
  • Your spokes already route via the hub.
  • Peer the hubs.
  • Create User-Defined Routes in each firewall subnet (you will be using firewalls in this day and age) to route to remote hub-and-spoke IP prefixes via the remote hub firewalls.

Job done! The only additional steps were:

  • Peer the hubs
  • Add UDRs to each firewall subnet for each remote hub-and-spoke IP prefix

You do that once. Once!

How about connecting the remote sites? Simples: you connect them as usual.

There is some marketing material about how we can use the Microsoft WAN as the company WAN using vWAN. Yes, in theory. The concept is that the Microsoft Global WAN is amazing. You VPN from site A (let’s say Oslo, Norway) to a local Azure region and you VPN from site B (let’s say Houston, Texas) to a local Azure region. Then vWAN automatically enables Oslo <> Texas connectivity over the Microsoft Global Network. Yes, it does. And the performance should be amazing. I did a proof-of-concept in 2 hours with a customer. The performance of VPN directly between Oslo <> Houston was much better. Don’t buy the hype! Question it and test. And by the way, we can build this with VNets too – I was told by an MS partner that they did this solution between two sites on different continents years before vWAN existed.

SD-WAN

Microsoft suggests that you can only add Azure networks to an SD-WAN if you use Virtual WAN.

Here’s some truth. Under the covers, vWAN hub is built on a traditional Virtual Network. Then you can use (don’t) a VPN Gateway or a third-party SD-WAN appliance for connectivity.

The list of partners supporting vWAN was greatly increased recently – I remember looking for Meraki support a few months ago, and it was not there (it is now). But guess what, I bet you that everyone one of those partners offers the exact same solution for Virtual Networks via the Marketplace. And I bet:

  • There are more partner options
  • There are no trade-offs
  • The resilience is just the same

I have done Azure/Meraki SD-WAN twice since the above customer X. In both cases, we went with the Azure Marketplace and Virtual Network. And in both cases, it was:

  • Dead simple to set up.
  • It worked the first time.

Transitive Routing

Virtual WAN is powered by a feature that is hidden unless you do an ARM export. That feature is Azure Route Server. Did you know:

  • You can deploy Azure Route Server to a Virtual Network. The deployment is a next-next-net.
  • It can be easily BGP peered with a third-party networking appliance.
  • The Azure Route server will learn remote site prefixes from the networking appliance/SD-WAN.
  • The Azure Route Server will advertise routes to the networking appliance/SD-WAN.

Azure Route Server BGP propagation is managed using the same VNet peering settings as Virtual Network Gateway.

There is a single checkbox (true/false property) to enable transitive routing between VPN/ExpressRoute remote sites. And that setting is amazing.

I signed in to work one day and was asked a question. I had built out the environment for a large customer with an HQ in Oslo:

  • Remote sites around the world with a Meraki SD-WAN.
  • Leased line to Oracle Cloud – the global sites backhauled through Oslo.
  • The VNet-based hub in Azure was added to the SD-WAN. All offices wre connected directly to Azure via VPN.
  • Azure Route Server was added and peered to the Meraki SD-WAN.
  • Azure had an ExpressRoute connection (Oracle Cloud Interconnect) to Oracle Cloud.

An excavator has torn up the leased line to Oracle. The essential services in Oracle Cloud were unavailable. I was asked if the Azure connection to Oracle Cloud coule be leveraged to get the business back online? I thought for 30 seconds and said, “Yes, give me 5 minutes”. Here’s what I did:

  1. I check the box to enable transitive routing in Azure Route Server.
  2. I clicked Save/Apply and waited a few minutes for the update task
  3. I asked the client to test.

And guess what? Contrary to the above CAF text, the client was back online. A few weeks later, I was told that not only did they get back online, but the SD-WAN connection to the VIRUTAL NETWORK-BASED hub in Azure gave the global branch offices lower latency connections than their backhaul through Oslo to Oracle Cloud. Whoda-thunk-it?

vWAN is PaaS

One of the arguments for the vWAN hub is that it pushes complexity down into the platform; it’s a PaaS sub-resource.

Yes, it’s a PaaS sub-resource. Is a well-designed hub complex? A hub should contain very few resources, based around:

  • Remote connectivity resource
  • Firewall
  • Maybe Azure Bastion

There’s not much more to a hub than that if you value security. What exactly am I saving with the more-expensive vWAN?

Limitations of vWAN

Let’s start with performance. A hub in Virtual WAN has a throughput limitation of 50 Gbps. I thought that was a theoretical limit … until I did a network review for a client a few years ago. They had a single workload that pushed 29Gbps through the hub, 1 Gbps shy of the limit for a Standard tier Azure Firewall. I recommended an increase to the 100 Gbps Premium tier, but warned that the bottleneck was always going to be the vWAN hub.

The architectural limitations of vWAN are many – so many that I will miss some:

  • No VNet Flow Logs
  • Impossible to troubleshoot routing/connectivity in a real way
  • No support for Azure Bastion in the hub
  • No support for NAT Gateway for firewall egress traffic (SNAT port exhaustion)
  • Secured traffic between different secured (firewall) hubs requires Routing Intent
  • No Forced Tunnelling in Azure Firewall without Routing Intent
  • Routing Intent is overly simplistic – everything goes through the firewall
  • No support for IP Prefix for the firewall
  • Azure Firewall cannot use Route Server Integration (auto-configuration of non-RFC1918 usage in private networks)
  • Hub Route Tables are a complexity nightmare

Impossible Solution

Anyone who has deployed more than a couple of Azure networks has heard the following statement made regarding failing connections over site-to-site networking:

The Azure network is broken

A new site-to-site appliance or firewall has been placed in Azure, and the root cause of the issue is “never the remote network“.

Proving that the issue isn’t the firewall can be tricky. That’s because firewall appliances are black boxes. I updated my standard hub design last year to assist with this:

  • Add a subnet with identical routing configuration (BGP propagation and user-defined routes) as the (private) firewall subnet.
  • Add a low-spec B-series VM to this subnet with an autoshutdown. This VM is used only for diagnostics.

The design allows an Azure admin to log into the VM. The VM mimics the connectivity of the firewall and allows tests to be done against failing connections. If the test fails from the VM, it proves that the firewall is not at fault.

No other compute resources are placed in the hub.

Here’s the gotcha. I can do this in a VNet hub. I cannot do this in a vWAN hub. The vWAN hub Virtual Network is in a Microsoft-managed tenant/subscription. You have no access, and you cannot troubleshoot it. You are entirely at the mercy of Azure support – and, sadly, we know how that process will go.

Virtual WAN In Summary

You do not need Virtual WAN for connectivity or SD-WAN. So why would one adopt it instead of VNet-based hubs, especially when you consider costs and the loss of functionality? I just do not understand (a) why Microsoft continues to push Virtual WAN and (b) why it continues to exist.

Tracing Packets in Azure Networks

While I’m on the topic of troubleshooting, I thought that I would add some tips on how to trace packets in Microsoft Azure.

The Problem

Here are a few scenarios, in descending order from most common, that I’ve been through over the years:

Remote Desktop To New VMs

You’ve just established a new site-to-site connection between a remote location and a (probably) new Azure network. The remote site admin complains:

No packets are getting through. Your Azure network is broken.

You know that everything in Azure is in good order, and you’re pretty sure the remote site firewall is blocking the traffic. However, many systems administrators jump to “the new Azure network is broken” when something doesn’t work – even if they configured their firewall to block that damned RDP traffic (it’s nearly always RDP in the first tests!).

Connecting to PaaS Services

You’ve deployed some PaaS services in Azure. Something can’t connect to them. The client might be in the same Virtual Network. Maybe it’s in another spoke that must route through a hub? Or maybe the client is in a remote site? The developer or operator is going to say:

We’re getting timeouts when we connect. Your network is broken.

So many things could be wrong here.

SSL Goes Wrong

SSL/PKI feels to me like a dinosaur technology that:

  • Most never learn
  • Few who did learn it never completely mastered it (I’m here, I’d estimate)
  • Those of us who did learn it have forgotten most of it

And modern application/network security is built on this deck of cards (from a knowledge perspective). I’ve seen a few scenarios:

My app gets a weird response from the database when it attempts to connect.

That one’s probably because something is reverse proxying the connection and something is going wrong in the connection – see Application Rules NATing east-west connections in Azure Firewall, causing the client IP to change.

How about this one I saw recently:

My application is failing to connect to a remote server.

When I dug into it, I saw that the TLS handshake was failing and the TCP connection was cleanly terminated. A self-signed certificate was to blame. Other scenarios I’ve seen are where Linux-based appliances fail the same handshake because the server cert doesn’t contain the full keychain. Tip: Windows LIES to you when it shows the whole keychain which it self-builds from the trusted publishers store on the machine. Most appliances require the full keychain in the cert, which many online CAs do not do by default. You’d be amazed how many weeks are wasted and repeated discusssions are had because of this.

But how have I proven this?

Complex Routing

Not everyone builds a simple hub-and-spoke. Sometimes there is a need for complexity. I had one of these a few years ago, where a customer required an ExpressRoute connection to a third-party data provider. The data provider mandated:

  • An ExpressRoute connection
  • The use of SNAT

The ExpressRoute Gateway doesn’t offer SNAT (unlike the VPN Gateway), so I had to conjure an interesting design. Luckily, I know Azure routing pretty well, and I tested this design in a lab. I was sure it would work – it did. But what if something went wrong? I would have had to troubleshoot what was happening.

The Need

What we need is:

  1. The ability to prove that packets are routing to confirm the infrastructure’s ability.
  2. Check how a PaaS resource has responded to connections.
  3. The ability to see inside those packets to investigate application-layer issues.

Folks, most of this is basic logging/querying. But there are a few tricks.

Packet Travel

I want to confirm that a packet reached A, then went to B, then got to the destination. For example:

  1. A packet from a remote client entered the hub and went through the firewall.
  2. It then routed across peering – GAH! More on this in a moment.
  3. And the packet routed through the destination spoke Virtual Network to reach the destination server – double GAH!

Before we proceed, I literally get session audiences to repeat the following 3 times each to enforce some basic knowledge:

Virtual Networks do not exist.

Subnets do not exist

Peering does not exist

Packets go directly from the source to the destination

This is why tracert is useless in Azure.

Understanding the above is halfway to mastering Azure networking. Please read this post before asking me questions or attempting to debate me on this topic of existence.

By the way, if you are using Azure Firewall, then (PowerShell) test-networkconnection is useful only to generate logs. The result may not be the actual result. Azure Firewall feeds “200” results from application rules, even when denying traffic. I always advise: generate the traffic and then check the logs.

Back to the topic …

The basic tool we need is a log of a packet or flow (a series of packets in a “conversation” between the client and server). Fortunately, we have a few sources of those.

The first is your firewall. Azure Firewall’s diagnostics settings send logs to your preferred destination. I prefer Log Analytics. You might prefer Splunk or similar. Potatoe Potahtoh. In Azure Firewall, the “decision making logs” include:

  • Threat Intelligence (an under-appreciated and oh-so useful feature)
  • IDPS
  • Network rules
  • Application rules

Log Analytics has a built-in query to search all those logs in a union. I can search for any combination of source IP, source port (not typically useful), protocol, destination IP, and destination port (very useful).

A third-party firewall has similar logs, often locked away in the previous grip of the firewall administrator. Sorry, I’m binge-watching Lord of the Rings, and I couldn’t help myself, firewall admins đŸ™‚ Some firewalls can make those logs more available to other Azure operators. For example, the Palo Alto Cloud NGFW has the ability to route logs, via Application Insights, into Log Analytics, where queries, dashboards, and workbooks can share that data. Nice!

The firewall logs will show me:

  • If packets entered the firewall
  • If those packets were allowed or denied

The simple mention of a flow from a client to a server in the firewall log means that packets made it there:

  • A spoke routed via the firewall to another spoke or a remote site.
  • Packets from a remote site passed successfully over a site-to-site network connection.

The firewall log is often my first port of call. Sometimes, however, it doesn’t go deep enough. There have been a number of times where I’ve been told something along the lines of:

I can ping VM X in Azure, but I cannot make a HTTPS connection to it.

I know from experience that they have made a successful connectionless ping (ICMP). But they have failed to make a connection-oriented (TCP) HTTPS request. The stateful firewall is blocking a response to the connection request because it never saw the original SYN. Thank you to my 3rd-year networking lecturer – I can picture the guy demonstrating a luggable PC to us around 1993, but I don’t remember his name. Experience has taught me that:

  • A route for the spoke network prefix is missing from the GatewaySubnet, and the request is bypassing the firewall.
  • A Private Endpoint has added a /32 route to the GatewaySunet (see network policies for Private Endpoint), and routing “long prefix match” has chosen that system route over your User-Defined Route for the spoke prefix.

For these crazy situations, you need to dig a little deeper into the firewall logs. I cannot speak for third-party firewalls here. Azure Firewall doesn’t capture dropped connections such as these. For that deep dive, we need Flow Trace logs to be enabled. Note that:

  • Enabling the logs does not enable the feature; this must be enabled using PowerShell.
  • The logs will be very detailed – and expensive to ingest into your monitoring solution. Only leave this feature enabled while troubleshooting the issue – set a calendar entry to unset it.

I haven’t had the opportunity to use this one in the real world personally, but I wonder if I sent those JSON logs to blob storage, could I download them to Copilot and get a reasonable response to my queries? Note to self.

Did the packet traverse a Virtual Network? Now you should know that’s a dumb question. The Azure fabric takes packets from source NICs and drops them into destination NICs. The correct question is: Did a packet reach the destination NIC?

The correct solution to answer that question today is Virtual Network Flow Logs with Traffic Analytics.

The wrong answer is the deprecated NSG Flow Logs. Virtual Network Flow Logs are current and capture much better data, including Private Endpoints.

Flow Logs will tell me about:

  • Outbound flows – did a packet leave a client?
  • Inbound flows – did a packet reach a server?
  • NSG Rules – what rule allowed/denied a connection?

Now I know if a connection:

  • Left an Azure client
  • Reached an Azure server
  • Was allowed or denied by an NSG rule

Flow Logs take time to generate:

  1. The logs will take 30+ seconds to be written to blob storage. Honestly, I’ve seen this take longer during the pandemic. I think MSFT might throttle monitoring when CPU usage is in high demand.
  2. Traffic Analytics is configured to run every 10 or 60 minutes. I prefer the 10-minute option.
  3. Log Analytics will take time to process the data. I was told many years ago to allow up to 15 minutes for NSG Flow Logs to be processed.

Between the firewall logs and the Virtual Network Flow Logs, I have visibility of the traffic. Or some of it.

PaaS Resources

A PaaS resource may be deployed with:

  • Public endpoint: Firewall or Virtual Network Flow Logs will show my traffic leaving my network, but not the last mile.
  • Private Endpoint: Private Endpoint NICs fool us, because the packet is sent directly by the fabric from the client NIC to the NIC of the machine hosting the PaaS resource instance. Virtual Network Flow Logs show us “connectability” but not the full connection.
  • VNet Injection and VNet Integration: The PaaS resources don’t really live in our Virtual Network. I know that it’s confusing.

Let me give you a working example. You have an App Service wth VNet Integration that is attempting to talk to a Key Vault with a Private Endpoint. We can see the flows in the previously discussed logs. But are the packets really getting to the Key Vault? What happens when the App Service attempts to access a secret?

The only answer to this is to enable the diagnostics settings in the Key Vault. Querying those logs in Log Analytics, Splunk, etc, will tell you exactly what’s going on:

  • Was there a connection?
  • Was the connection successful?
  • Why did the connection fail?

Packet Capture

Don’t get scared! I promise that packet capture is easier than ever now. I’ll explain later.

The results of a packet capture show you the contents (as much as encryption allows) of packets in a flow between a client and a server. This is super useful for investigating further. Let me explain two scenarios:

In the first scenario, we have proven that packets get from A to B, but the customer/developer/operator doesn’t accept that because their application is failing. If we know the packets from client to server, then we know the error is further up the stack – it’s an application configuration or authoring issue. The only way to prove to the other person is to show them the actual packets.

Network Watcher provides a feature called Packet Capture. The only place you need the free Wireshark client is on your PC to open the capture. Network Watcher will automatically add an Azure extension (agent) to the client/server VM, based on your Azure rights over that VM. You can capture all or filtered packets and save the resulting .CAP file to blob storage. Unfortunately, this ability is limited to VMs.

The second scenario is where we have a remote admin complaining about their failing RDP connection (it’s always this) over site-to-site networking. You’ve proven the traffic doesn’t reach the firewall/Azure VM. You know their firewall is blocking the outbound connection, but they won’t accept that. You have to prove that the traffic never crossed the site-to-site connection. You can enable packet capture on a VPN Virtual Network Gateway or a Virtual WAN VPN Gateway. This will ultimately prove that packets never got across the tunnel, and the remote admin must face the mirror.

Back to the scary part about packet captures. Who the heck can read those things? Not many of us can. I understand some basics, such as control flags like SYN, SYN-ACK, and RST. But what would I do if I had to really understand a packet capture? Enter Copilot or another AI:

  1. In Wireshark, click File > Export Packet Dissections > As JSON and select “Packet Range: All packets” and “Packet Format: JSON”. JSON is nice for AI to parse.
  2. Upload the capture to your AI and ask it your questions.

You’ll get an answer that you can work with. I used this recently for an application issue to help a (good guy) developer get to the root cause of an issue.

By the way:

  • ExpressRoute does not offer packet capture, but Traffic Collector provides a Flow Log experience.
  • Azure Firewall with a Management NIC (recommended by me for the last 1.5+ years) has packet capture.

Some Other Tricks

Network Watcher can be useful for doing some basic diagnostics:

  • IP flow verify: Checks whether a specific traffic flow would be allowed or denied by NSG rules.
  • NSG diagnostics: Analyse NSG rules across hops to identify which rule permits or blocks traffic.
  • Next hop: Identifies the next routing hop a packet will take from a selected VM.
  • Effective security rules: Displays the combined, active security rules applied to a network interface after all NSGs are merged.
  • VPN troubleshoot: Diagnoses issues with Azure VPN gateways and site‑to‑site or point‑to‑site tunnels.
  • Packet capture: Captures packet data directly from a VM’s network stack for deep traffic analysis.
  • Connection troubleshoot: Tests end‑to‑end connectivity between a VM and a target to identify routing or NSG issues.

Connection Troubleshoot is especially nice:

We can send a bunch of probe packets from a source to a destination and see if the connection was successful. If not, the tool gives you some indication why – keep in mind that remote destinations will result in vague failure reasoning because Azure doesn’t control remote locations.

The sources can be:

  • VMs and VM Scale Sets: Using the Network Watcher extension.
  • Application Gateway: Great for figuring out those pesky backend health issues and proving that the CA-provided cert (lacking the complete trust chain) is the cause of the failure.
  • Bastion Host: Bastion-to-VM connections can be a head-wrecker.

If there is a connection that is working but you consider to be critical, then I recommend using Connection Monitor in Network Watcher:

  • Works with any mix of Arc agents (non-Azure VMs) and Azure VMs – consider those remote site connections!
  • Model application connections.
  • Tests success and speed (latency).
  • Can trigger an alert/Action Group.

I used this a few years ago for a SaaS company that was using Placement Proximity Groups as a part of their need to minimise latency. I wanted proof of the platform performance, just in case. My colleague who wrote the Terraform for modelling the application in Connection Monitor probably didn’t like me for requiring this đŸ˜‰ I started seeing alerts one day, so I let the customer know that I was opening a support ticket with Microsoft. We found out that there was a physical issue with one network appliance, and Microsoft fixed it. Wow – not only were we monitoring our infrastructure and the application’s networking, but we were monitoring Azure’s physical network too!

Last Tool

The last tool is you, not Copilot. Honestly, Azure Copilot is not good at this stuff. I’ve tested it in my build labs, and it hasn’t a clue (thankfully for us IT pros). You need a combination of:

  • Experience: What’s most common?
  • Intuition: Listen to the customer – did they just mention a cert, for example?
  • Knowledge: Understanding how Azure networks function is critical – did you know that not setting the network policies for Private Endpoint in your subnet causes asynchronous routing in the firewall?

Using your tools will better prepare you to use the above Azure tools.

If You Liked This …

Maybe you liked this post and are wondering: “Could Aidan help me?” Maybe I can through my company, Cloud Mechanix. Whether you need a review, design something, figure out some issue, do a large deployment, or figure out why the cloud is not working for your organisation, I can help – and other things too. Cloud Mechanix works with large and small organisations and service providers throughout Europe. Check out the site, and contact me if you are interested.

Azure Virtual WAN ARM – The Resources

In this post, I will explain the types of resources used in Azure Virtual WAN and the nature of their relationships.

Note, I have not included any content on the recently announced preview of third-party NVAs. I have not seen any materials on this yet to base such a post on and, being honest, I don’t have any use-cases for third-party NVAs.

As you can see – there are quite a few resources involved … and some that you won’t see listed at all because of the “appliance-like” nature of the deployment. I have not included any detail on spokes or “branch offices”, which would require further resources. The below diagram is enough to get a hub operational and connected to on-premises locations and spoke virtual networks.

The Virtual WAN – Microsoft.Network/virtualWans

You need at least one Virtual WAN to be deployed. This is what the hub will connect to, and you can connect many hubs to a common Virtual WAN to get automated any-to-any connectivity across the Microsoft physical WAN.

Surprisingly, the resource is deployed to an Azure region and not as a global resource, such as other global resources such as Traffic Manager or Azure DNS.

The Virtual Hub – Microsoft.Network/virtualHubs

Also known as the hub, the Virtual Hub is deployed once, and once only, per Azure region where you need a hub. This hub replaces the old hub virtual network (plus gateway(s), plus firewall, plus route tables) deployment you might be used to. The hub is deployed as a hidden resource, managed through the Virtual WAN in the Azure Portal or via scripting/ARM.

The hub is associated with the Virtual WAN through a virtualWAN property that references the resource ID of the virtualWans resource.

In a previous post, I referred to a chicken & egg scenario with the virtualHubs resource. The hub has properties that point to the resource IDs of each deployed gateway:

  • vpnGateway: For site-to-site VPN.
  • expressRouteGateway: For ExpressRoute circuit connectivity.
  • p2sVpnGateway: For end-user/device tunnels.

If you choose to deploy a “Secured Virtual Hub” there will also be a property called azureFirewall that will point to the resource ID of an Azure Firewall with the AZFW_Hub SKU.

Note, the restriction of 1 hub per Azure region does introduce a bottleneck. Under the covers of the platform, there is actually a virtual network. The only clue to this network will be in the peering properties of your spoke virtual networks. A single virtual network can have, today, a maximum of 500 spokes. So that means you will have a maximum of 500 spokes per Azure region.

Routing Tables – Microsoft.Network/virtualHubs/hubRouteTables & Microsoft.Network/virtualHubs/routeTables

These are resources that are used in custom routing, a recently announced as GA feature that won’t be live until August 3rd, according to the Azure Portal. The resource control the flows of traffic in your hub and spoke architecture. They are child-resources of the virtualHubs resource so no references of hub resource IDs are required.

Azure Firewall – Microsoft.Network/azureFirewalls

This is an optional resource that is deployed when you want a “Secured Virtual Hub”. Today, this is the only way to put a firewall into the hub, although a new preview program should make it possible for third-parties to join the hub. Alternatively, you can use custom routing to force north-south and east-west traffic through an NVA that is running in a spoke, although that will double peering costs.

The Azure Firewall is deployed with the AZFW_Hub SKU. The firewall is not a hidden resource. To manage the firewall, you must use an Azure Firewall Policy (aka Azure Firewall Manager). The firewall has a property called firewallPolicy that points to the resource ID of a firewallPolicies resource.

Azure Firewall Policy – Microsoft.Network/firewallPolicies

This is a resource that allows you to manage an Azure Firewall, in this case, an AZFW_Hub SKU of Azure Firewall. Although not shown here, you can deploy a parent/child configuration of policies to manage firewall configurations and rules in a global/local way.

VPN Gateway – Microsoft.Network/vpnGateways

This is one of 3 ways (one, two or all three at once) that you can connect on-premises (branch) sites to the hub and your Azure deployment(s). This gateway provides you with site-to-site connectivity using VPN. The VPN Gateway uses a property called virtualHub to point at the resource ID of the associated hub or virtualHubs resource. This is a hidden resource.

Note that the virtualHubs resource must also point at the resource ID of the VPN gateway resource ID using a property called vpnGateway.

ExpressRoute Gateway – Microsoft.Network/expressRouteGateways

This is one of 3 ways (one, two or all three at once) that you can connect on-premises (branch) sites to the hub and your Azure deployment(s). This gateway provides you with site-to-site connectivity using ExpressRoute. The ExpressRoute Gateway uses a property called virtualHub to point at the resource ID of the associated hub or virtualHubs resource. This is a hidden resource.

Note that the virtualHubs resource must also point at the resource ID of the ExpressRoute gateway resource ID using a property called p2sGateway.

Point-to-Site Gateway – Microsoft.Network/p2sVpnGateways

This is one of 3 ways (one, two or all three at once) that you can connect on-premises (branch) sites to the hub and your Azure deployment(s). This gateway provides users/devices with connectivity using VPN tunnels. The Point-to-Site Gateway uses a property called virtualHub to point at the resource ID of the associated hub or virtualHubs resource. This is a hidden resource.

The Point-to-Site Gateway inherits a VPN configuration from a VPN configuration resource based on Microsoft.Network/vpnServerConfigurations, referring to the configuration resource by its resource ID using a property called vpnServerConfiguration.

Note that the virtualHubs resource must also point at the resource ID of the Point-to-Site gateway resource ID using a property called p2sVpnGateway.

VPN Server Configuration – Microsoft.Network/vpnServerConfigurations

This configuration for Point-to-Site VPN gateways can be seen in the Azure WAN and is intended as a shared configuration that is reusable with more than one Point-to-Site VPN Gateway. To be honest, I can see myself using it as a per-region configuration because of some values like DNS servers and RADIUS servers that will probably be placed per-region for performance and resilience reasons. This is a hidden resource.

The following resources were added on 22nd July 2020:

VPN Sites – Microsoft.Network/vpnSites

This resource has a similar purpose to a Local Network Gateway for site-to-site VPN connections; it describes the on-premises location, AKA “branch office”.  A VPN site can be associated with one or many hubs, so it is actually connected to the Virtual WAN resource ID using a property called virtualWan. This is a hidden resource.

An array property called vpnSiteLinks describes possible connections to on-premises firewall devices.

VPN Connections – Microsoft.Network/vpnGateways/vpnConnections

A VPN Connections resource associates a VPN Gateway with the on-premises location that is described by an associated VPN Site. The vpnConnections resource is a child resource of vpnGateways, so there is no actual resource; the vpnConnections resource takes its name from the parent VPN Gateway, and the resource ID is an extension of the parent VPN Gateway resource ID.

By necessity, there is some complexity with this resource type. The remoteVpnSite property links the vpnConnections resource with the resource ID of a VPN Site resource. An array property, called vpnSiteLinkConnections, is used to connect the gateway to the on-premises location using 1 or 2 connections, each linking from vpnSiteLinkConnections to the resource/property ID of 1 or 2 vpnSiteLinks properties in the VPN Site. With one site link connection, you have a single VPN tunnel to the on-premises location. With 2 link connections, the VPN Gateway will take advantage of its active/active configuration to set up resilient tunnels to the on-premises location.

Virtual Network Connections – Microsoft.Network/virtualHubs/hubVirtualNetworkConnections

The purpose of a hub is to share resources with spoke virtual networks. In the case of the Virtual Hub, those resources are gateways, and maybe a firewall in the case of Secured Virtual Hub. As with a normal VNet-based hub & spoke, VNet peering is used. However, the way that VNet peering is used changes with the Virtual Hub; the deployment is done using the hub/VirtualNetworkConnections child resource, whose parent is the Virtual Hub. Therefore, the name and resource ID are based on the name and resource ID of the Virtual Hub resource.

The deployment is rather simple; you create a Virtual Network Connection in the hub specifying the resource ID of the spoke virtual network, using a property called remoteVirtualNetwork. The underlying resource provider will initiate both sides of the peering connection on your behalf – there is no deployment required in the spoke virtual network resource. The Virtual Network Connection will reference the Hub Route Tables in the hub to configure route association and propagation.

More Resources

There are more resources that I’ve yet to document, including:

Azure Virtual WAN ARM – Secured Virtual Hub Azure Firewall

I have spent quite a few hours figuring out how to deploy Azure’s new Secured Virtual Hub, an extension of Azure Virtual WAN, deployed using ARM templates (JSON). A lot of the bits are either not documented or incorrectly documented. One of the frustrating bits to deploy was the Azure Firewall resource – and the online examples did not help.

The issue was that the 2 sources I could find did not include public IP addresses on the firewall:

  • The quick start for Secured Virtual Hub on docs.microsoft.com
  • The new Enterprise-Scale “well-architected” Framework, found in Cloud Adoption Framework

Digging to solve that uncovered:

  • The examples used quite an old API version, 2019-08-01, to deploy the Microsoft.Network/azureFirewalls resource.
  • There was no example of how to add a public IP address to the firewall in Secured Virtual Hub because it was not possible with that API – SVH is quite different from a VNet deployment because you do have direct access to the underlying hub virtual network.
  • Being an old API, we lose features such as SNAT for non-RFC1918 addresses (important in universities and public sector) and the newer custom & proxy DNS features.

In my digging, I did uncover that the ARM reference for the Azure Firewall was incorrect, but I did uncover a new, barely-documented property called hubIPAddresses; I knew this property was the key to solving the public IP address issue. So I thought about what was going on and how I was going to solve it.

I ended up doing what I would normally do if I did not have a quick start template to start with:

  1. Deploy the resource(s) by hand in the Azure Portal
  2. Observe the options – there was a slide control for the quantity of firewall public IP addresses
  3. Export the resulting template

And … there was the solution:

  1. There is a new, undocumented API version for the Azure Firewall resource: 2020-05-01
  2. There is a new object property called hubIPAddresses that contains an object sub-property called publicIps. You can set a string value called count to control how many public IP addresses that Azure will assign (on your behalf) to the firewall – you do not need to create the public IP address resources.
        "hubIPAddresses": {
          "publicIPs": {
            "count": "[parameters('firewallPublicIpQuantity')]",
          }
        }

Sorted!

Azure Virtual WAN Introducing A New Kind Of Route Table

In this post, I will quickly introduce you to a new kind of Route Table in Microsoft Azure that has been recently introduced by Azure Virtual WAN – and hence is included in the newly generally available Secured Virtual Hub.

The Old “Subnet” Route Table

This Route Table, which I will call “Subnet Route Table” (derived from the ARM name) is a simple resource that we associate with a subnet. It contains User-Defined Routes that force traffic to flow in desirable directions, typically when we use some kind of firewall appliance (Azure Firewall or third-party) or a third-party routing appliance. route The design is simple enough:

  • Name: A user-friendly name
  • Prefix: The CIDR you want to get to
  • Next Hop Type: What kind of “router” is the next hop, e.g. Virtual Network, Internet, or Virtual Appliance
  • Next Hop IP Address: Used when Next Hop Type is Virtual Appliance (any firewall or third-party router)

Azure Virtual WAN Hub

Microsoft introduced Azure Virtual WAN quite a while ago (by Cloud standards), but few still have heard of it, possibly because of how it was originally marketed as an SD-WAN solution compatible originally with just a few on-prem SD-WAN vendors (now a much bigger list). Today it supports IKEv1 and IKEv2 site-to-site VPN, point-to-site VPN, and ExpressRoute Standard (and higher). You might already be familiar with setting up a hub in a hub-and-spoke: you have to create the virtual network, the Route Table for inbound traffic, the firewall, etc. Azure Virtual WAN converts the hub into an appliance-like experience surfacing just two resources: the Virtual WAN (typically 1 global resource per organisation) and the hub (one per Azure region). Peering, routing, connectivity are all simplified.

A more recent change has been the Secured Virtual Hub, where Azure Firewall is a part of the Virtual WAN Hub; this was announced at Ignite and has just gone GA. Choosing the Secured Virtual Hub option adds security to the Virtual WAN Hub. Don’t worry, though, if you prefer a third-party firewall; the new routing model in Azure Virtual WAN Hub allows you to deploy your firewall into a dedicated spoke virtual network and route your isolated traffic through there.

The New Route Tables

There are two new kinds of route table added by the Virtual WAN Hub, or Virtual Hub, both of which are created in the Virtual Hub as sub-resources.

  • Virtual Wan Hub Route Table
  • Virtual WAN Route Table

Virtual WAN Hub Route Table

The Virtual Hub Hub Route Table affects traffic from the Virtual Hub to other locations.  A possible scenario is when you want to route traffic to a CIDR block of virtual network(s) through a third-party firewall (network virtual appliance/NVA):

AzureVirtualWanHubHubRouteTable

The routing rule setup here is similar to the Subnet Route Table, specifying where you want to get to (CIDR, resource ID, or service), the next hop, and a next hop IP address.

Virtual WAN Route Table

The Virtual WAN Route Table is created as a sub resource of the Virtual Hub but it has a different purpose. The Virtual Hub is assigned to connections and affects routing from the associated branch offices or virtual networks. Whoa, Finn! There is a lot of terminology in that sentence!

A connection is just that; it is a connection between the hub and another network. Each spoke connected directly to the hub has a connection to the hub – a Virtual WAN Route Table can be associated with each connection. A Virtual WAN Route Table can be associated with 1 virtual network connection, a subset of them, or all of them.

The term “branch offices” refers to sites connected by ExpressRoute, site-to-site VPN, or point-to-site VPN. Those sites also have connections that a Virtual WAN Route Table can be associated with.

This is a much more interesting form of route table. I haven’t had time to fully get under the covers here, but comparing ARM to the UI reveals two methodologies. The Azure Portal reveals one way of visualising routing that I must admit that I find difficult to scale in my mind. The ARM resource looks much more familiar to me, but until I get into a lab and fully test (which I hope I will find some hours to do soon), I cannot completely document.

Here are the basics of what I have gleaned from the documentation, which covers the Azure Portal method:

The linked documentation is heavy reading. I’m one of those people that needs to play with this stuff before writing too much in detail – I never trust the docs and, to be honest, this content is complicated, as you can see above.