Networking | Aidan Finn, IT Pro

Azure & Oracle Cloud Interconnect

This post will explain how you can connect your Azure network(s) with Oracle Cloud Infrastructure (OCI) via the Oracle Cloud Interconnect.

Background

Many mid-large organisations run applications that are based on Oracle software. When these organisations move to the cloud, they may choose to use Oracle Cloud for their Oracle workloads and Azure for everything else.

But that raises some interesting questions:

How do we connect Azure workloads to Oracle workloads?
If Oracle is hosting data services, how do we minimise latency?

The answer is the Oracle Cloud Interconnect.

Azure ExpressRoute and Oracle FastConnect

Microsoft and Oracle are inter-connected via their respective private “site-to-site” connection mechanisms:

Azure: ExpressRoute
Oracle: FastConnect

This is achieved by both service providers sharing a “meet me” location where each cloud’s edge networks allow a “cross-connection”. So, there is no need to contact an ISP to lease an ExpressRoute circuit. The circuit already exists. There is no need to sign a circuit contract. The ISP is “Oracle” and you pay for the usage of it – in the case of Azure by paying for the ExpressRoute circuit Azure resource.

Location, Location, Location

The inter-connect mechanism obviously plays a role in where you can deploy your ExpressRoute Circuit and FastConnect resource. But performance also comes into play here – latency must be kept to a minimum. As a result, there is a support restriction on which Azure/Oracle regions can be inter-connected and where the circuit must be terminated.

At the time of writing, the below list was published by Microsoft:

What does this mean?

Let’s imagine that we are using OCI Amsterdam. If we want to connect Azure to it then we must use Azure West Europe.

Now, what about keeping that latency low? The trick there is in selecting a Peering Location that is close by. Note that the Oracle docs do a better job of defining the Azure peering location (see under Availability).

In my scenario, the peering location would be Amsterdam2. According to Microsoft:

Connectivity is only possible where an Azure ExpressRoute peering location is in proximity to or in the same peering location as the OCI FastConnect.

That means you must always keep the following close to be able to use this solution:

The Oracle Cloud Infrastructure region
The Azure region
The peering location of the ExpressRoute circuit & FastConnect circuit

Configuring ExpressRoute

You have a few options to decide between. The first is the SKU of ExpressRoute that you will choose.

Type | Billing | Connections
Local | Unlimited | 1 or 2 Azure regions in the same metro as the peering location.
Standard | Metered or Unlimited | Up to 10 connections in the same geo zone as the peering location.

You also have to choose one of the supported speeds for this solution: 1, 2, 5, or 10 Gbps.

The ISP will be Oracle Cloud FastConnect.

So do you choose Local or Standard? I think that really comes down to balancing the cost. Local has unlimited data transfer but it is billed based on bandwidth. The entry cost per month in Zone 1 is €1,111.27/month with 1 Gbps and unlimited data transfer.

The entry point for a Standard metered plan is €403.76/month. That is €707.51 cheaper than the Local SKU, but that saving has to cover your outbound data transfer cost in Azure. At €0.024/GB, that leaves you with 29,479 GB (707.51/0.024) of outbound data transfer per month before the Local SKU becomes more affordable.
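The break-even arithmetic above can be sketched in a few lines of Python. This is only an illustration using the prices quoted in this post; the function names are made up for the example, and current Azure pricing should be substituted before relying on the result.

```python
# Prices quoted in the post (Zone 1, at the time of writing); treat them as inputs.
LOCAL_MONTHLY_EUR = 1111.27     # Local SKU, 1 Gbps, unlimited data transfer
STANDARD_MONTHLY_EUR = 403.76   # Standard SKU, metered plan entry point
EGRESS_EUR_PER_GB = 0.024       # outbound data transfer on the metered plan


def break_even_gb(local: float, standard: float, per_gb: float) -> float:
    """Outbound GB per month at which metered Standard costs the same as Local."""
    return (local - standard) / per_gb


def cheaper_sku(outbound_gb: float) -> str:
    """Return the cheaper SKU for a given monthly outbound volume."""
    metered_total = STANDARD_MONTHLY_EUR + outbound_gb * EGRESS_EUR_PER_GB
    return "Standard (metered)" if metered_total < LOCAL_MONTHLY_EUR else "Local"


threshold = break_even_gb(LOCAL_MONTHLY_EUR, STANDARD_MONTHLY_EUR, EGRESS_EUR_PER_GB)
print(f"Break-even is roughly {threshold:,.1f} GB of outbound data per month")
print("10,000 GB/month:", cheaper_sku(10_000))
print("40,000 GB/month:", cheaper_sku(40_000))
```

Feeding in the outbound volume that you actually observe on the circuit turns the comparison into a simple check each month.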

The safe tip here is to choose Local, monitor data usage, and consider jumping to Standard if your outbound data transfer is small enough to make the metered Standard SKU more affordable.

Note that you can upgrade from Local but you cannot downgrade to Local.

Getting Connected (From Azure)

I’ll talk about the Azure side of things because that’s what I know. I will cover a little bit about Oracle, from what I have learned.

You will need an ExpressRoute Gateway in the selected Azure region. Then you will create an ExpressRoute Circuit in the same region:

Your chosen SKU/billing model.
The speed from 1, 2, 5, or 10 Gbps.
The Provider is Oracle Cloud FastConnect.
The peering location from the Oracle docs.

Retrieve the service key and then continue the process in the OCI portal. There is one screen that is very confusing: configuring the BGP addresses.

You are going to need two /30 prefixes that are not used in your OCI/Azure networks. I’m going to use 192.168.0.0/30 and 192.168.0.4/30 for my example. You need two prefixes because Azure and Oracle are running highly available resources under the covers. The ExpressRoute Gateway is two active/active compute instances. Each will require an IP address to advertise/receive address prefixes via BGP from the OCI gateway, and vice versa.

What addresses do you need? Oracle requires you to enter:

Customer (Azure) BGP IP Address 1
Oracle BGP IP Address 1
Customer (Azure) BGP IP Address 2
Oracle BGP IP Address 2

Here’s how you calculate them:

Customer (Azure) BGP IP Address 1: Usable IP #2 from Prefix 1
Oracle BGP IP Address 1: Usable IP #1 from Prefix 1
Customer (Azure) BGP IP Address 2: Usable IP #2 from Prefix 2
Oracle BGP IP Address 2: Usable IP #1 from Prefix 2

The below is not the final answer yet! But we’re getting there. That would lead us to calculating:

Customer BGP IP Address 1: 192.168.0.2
Oracle BGP IP Address 1: 192.168.0.1
Customer BGP IP Address 2: 192.168.0.6
Oracle BGP IP Address 2: 192.168.0.5

But the Oracle GUI has an illogical check and will tell you that those addresses are wrong. They are correct – it’s just the Oracle GUI is broken by design! Here is what you need to enter:

Customer BGP IP Address 1: 192.168.0.2/30
Oracle BGP IP Address 1: 192.168.0.1/30
Customer BGP IP Address 2: 192.168.0.6/30
Oracle BGP IP Address 2: 192.168.0.5/30
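If you would rather not work the addresses out by hand, the rule can be sketched with Python's standard ipaddress module. The bgp_pairs function is a name invented for this illustration; it takes Oracle as usable IP #1 and the customer (Azure) side as usable IP #2 of each /30, and returns the values with the /30 suffix that the Oracle GUI insists on.

```python
import ipaddress


def bgp_pairs(prefixes):
    """For each /30, return (customer_ip, oracle_ip) with the prefix length appended."""
    pairs = []
    for cidr in prefixes:
        net = ipaddress.ip_network(cidr, strict=True)
        if net.prefixlen != 30:
            raise ValueError(f"{cidr} is not a /30 prefix")
        first, second = net.hosts()  # a /30 has exactly two usable addresses
        # Oracle takes usable IP #1, the customer (Azure) side takes usable IP #2.
        pairs.append((f"{second}/{net.prefixlen}", f"{first}/{net.prefixlen}"))
    return pairs


example = bgp_pairs(["192.168.0.0/30", "192.168.0.4/30"])
for n, (customer, oracle) in enumerate(example, start=1):
    print(f"Customer (Azure) BGP IP Address {n}: {customer}")
    print(f"Oracle BGP IP Address {n}: {oracle}")
```

The prefix length check is there to catch a mistyped mask (a /32, for instance) before it reaches the portal.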

You finish the process and wait a little bit. The ExpressRoute circuit will eventually change status to Provisioned. Now you can create a connection between the circuit and the ExpressRoute Gateway. When I did it, the Private Peering was automatically configured, using 192.168.0.0/30 and 192.168.0.4/30 as the peering subnets.

Check your ARP records and route tables in the circuit (under Private Peering) and you should see that Oracle has propagated its known addresses to your Azure ExpressRoute Gateway, and on to any subnets that are not blocking propagation from the gateway.

And that’s it!

Other Support Things

The following Oracle services are supported:

E-Business Suite
JD Edwards EnterpriseOne
PeopleSoft
Oracle Retail applications
Oracle Hyperion Financial Management

Naturally, your OCI and Azure networks must not have overlapping prefixes.

You can do transitive routing. For example, you can route through the interconnect to an Oracle network and then on to a peered Oracle network (a hub and spoke).

You cannot use the interconnect to route to on-premises from Azure or from OCI.

Default Outbound Access For VMs In Azure Will Be Retired

Microsoft has announced that the default route to the Internet, an implicit public IP address, is being deprecated on 30 September 2025.

Background

Let’s define “Internet” for the purposes of this post. The Internet includes:

The actual Internet.
Azure services, such as Azure SQL or Azure’s KMS for Windows VMs, that are shared with a public endpoint (IP address).

We have had ways to access those services, including:

Public IP address associated with a NIC of the virtual machine
Load Balancer with a public IP address with the virtual machine being a backend
A NAT Gateway
An appliance, such as a firewall NVA or Azure Firewall, being defined as the next hop to Internet prefixes, such as 0.0.0.0/0

If a virtual machine is deployed without having any of the above, it still needs to reach the Internet to do things like:

Activate a Windows license against KMS
Download packages for Ubuntu
Use Azure services such as Key Vault, Azure SQL, or storage accounts (diagnostics settings)

For that reason, all Azure virtual machines are able to reach the Internet using an implied public IP address. This is an address that is randomly assigned to SNAT the connection out from the virtual machine to the Internet. That address:

Is random and can change
Offers no control or security

Modern Threats

There are two things that we should have been designing networks to stop for years:

Malware command and control
Data exfiltration

The modern hack is a clever and gradual process. Ransomware is not some dumb bot that gets onto your network and goes wild. Some of the recent variants are manually controlled. The malware gets onto the network and attempts to call home to a “machine” on the Internet. From there, the controllers can explore the network and plan their attack. This is the command and control. This attempt to “call home” should be blocked by network/security designs that block outbound access to the Internet by default, opening only connections that are required for workloads to function.

The controller will discover more vulnerabilities and download more software, taking further advantage of vulnerable network/security designs. Backups are targeted for attack first, data is stolen, and systems are crippled and encrypted.

The data theft, or exfiltration, is to an IP address that a modern network/security design would block.

So you can see that a network design where an implied public IP address is used is not good practice. This is a primary consideration for Microsoft in making its decision to end the future use of implied public IP addresses.

What Is Happening?

On September 30th, 2025, new virtual machines will no longer be able to use an implied public IP address. Existing virtual machines will be unaffected – but I want to drill into that because it’s not as simple as one might think.

A virtual machine is a resource in Azure. It’s not some disks. It’s not your concept of “I have something called X” that is a virtual machine. It’s a resource that exists. At some point, that resource might be removed. At that point, the virtual machine no longer exists, even if you recreate it with the exact same disks and name.

So keep in mind:

Virtual networks with existing VMs: The existing VMs are unaffected, but new VMs in the VNet will be affected and won’t work.
Scale-out: Let’s say you have a big workload with dozens of VMs with no public IP usage. You add more VMs and they don’t work – it’s because they don’t have an implied IP address, unlike their older siblings.
Restore from backup: You restore a VM to create a new VM. The new VM will not have an implied public IP address.

Is This a Money Grab?

No, this is not a money grab. This is an attempt by Microsoft to correct a “wrong” (it was done to be helpful to cloud newcomers) that was done in the original design. Some of the mitigations are quite low-cost, even for small businesses. To be honest, what money could be made here is pennies compared to the much bigger money that is made elsewhere by Azure.

The goal here is to:

Be secure by default by controlling egress traffic to limit command & control and data exfiltration.
Provide more control over egress flows by selecting the appliance/IP address that is used.
Enable more visibility over public IP addresses, for example, what public address should I share with a partner for their firewall rules?
Drive better networking and security architectures by default.

What Is Your Mitigation?

There are several paths that you can choose.

Assign a public IP address to a virtual machine: This is the lowest cost option but offers no egress security. It can get quite messy if multiple virtual machines require public IP addresses. Rate this as “better than nothing”.
Use a NAT Gateway: This allows a single IP address (or a range from an Azure Public IP Address Prefix) to be shared across an entire subnet. Note that NAT Gateway gets messy if you span availability zones, requiring disruptive VNet and workload redesign. Again this is not a security option.
Use a next hop: You can use an appliance (virtual machine or Marketplace network virtual appliance) or the Azure Firewall as a next hop to the Internet (0.0.0.0/0) or specific Internet IP prefixes. This is a security option – a firewall can block unwanted egress traffic. If you are budget-conscious, then consider Azure Firewall Basic. No matter what firewall/appliance you choose, there will be some subnet/VNet redesign and changes required to routing, which could affect VNet-integrated PaaS services such as API Management Premium.

September 2025 is a long time away. But you have options to consider and potentially some network redesign work to do. Don’t sit around – start working.

In Summary

The implied route to the Internet for Azure VMs will stop being available to new VMs on September 30th, 2025. This is not a money grab – you can choose low-cost options to mitigate the effects if you wish. The hope is that you opt to choose better security, either from Microsoft or a partner. The deadline is a long time away. Do not assume that you are not affected – one day you will expand services or restore a VM from backup and be affected. So get started on your research & planning.

Azure WAF and False Positives

This post will explain how to override false positives in the (network) Azure Web Application Firewall (WAF), without compromising security, using one of four methods in combination with a tiered WAF Policy architecture:

Managed Rulesets
Custom Rules
Exclusions
Disabled rules

False Positives

A WAF is a rather simple solution, attempting to inspect L7 (application layer) traffic and intercept attacks such as protocol misuse, SQL injection, or cross-site scripting. Unfortunately, false positives can occur.

For example, let’s assume that an API app is securely shared using a WAF. Messages sent to the API might be formatted in JSON, with lots of special characters to format the message. SQL Injection defenses count special characters, trying to find where an attacker is attempting to escape out of a web request to create a database command that will execute. If the defense counts too many special characters (it will!) then an alert will be created and the message will be blocked if Prevention mode is enabled.

One must allow that traffic through because it is expected traffic that the application (and the business) requires. But one must do this without opening up too many holes in the WAF, making the WAF a costly, pointless existence.

Log Analytics Ingestion Charge

There is a side effect to false positives. False positives will vastly outnumber actual attack/probing attempts. Busy workloads can generate huge amounts of logs for false positives. If you use Log Analytics, that data has a cost:

Storage: Not too bad
Ingestion: This one is painful

The way to reduce the cost is to reduce the noise by overriding the detections that create false positives. Organizations that have a lot of web traffic could save a significant amount of money here.

WAF Policies

The WAF functionality of the Azure Application Gateway (AppGw) is managed by a resource called an Application Gateway WAF Policy (WAF Policy). The typical approach is to associate one WAF Policy with a WAF resource. The WAF Policy contains the customizations. For reasons that should become apparent later, I am going to urge you to take a slightly more granular approach to managing your WAF if it is used to securely share more than one workload or listener:

WAF parent policy: A WAF policy will be associated with the WAF. This policy will apply to the WAF and all listeners unless another WAF Policy overrides specific settings.
Per-Listener/Per-Workload policy: This is a policy that is created specifically for a listener or a workload (a set of listeners). Any customisations that apply only to a listener or a workload will be applied here, without affecting any other listener or workload.

Methodology

You will never know what false positives you will encounter. If your WAF goes straight into Prevention mode then you will create a world of pain and be the recipient of a lot of hate-messages/emails.

Here’s the approach that I recommend:

Protect your WAF with an NSG that has Traffic Analytics enabled. The NSG should only allow the necessary HTTP, HTTPS, WAF monitoring (from Azure), and load balancing traffic. Use a custom deny-all rule to block everything else.
Enable monitoring for the Application Gateway, sending all logs to a queryable destination such as Log Analytics.
Monitor traffic for a period of time – enough to allow expected normal usage of the full systems. Your monitoring should detect the false positives.
Verify that Traffic Analytics did not record malicious IP addresses hitting your WAF.
Query your monitoring data to find the false positives for each listener. Identify the hostname, request URI, ruleset, rule group, and rule ID that is causing the issue on a per-listener/workload basis.
Ideally, developers fix any issues that create false positives but this is unlikely – so we’ll move on.
Determine your override strategy (see below).
Deploy your overrides with the policies still in Detection mode.
Monitor traffic for another period of time to ensure that there are no more false positives.
Switch the parent policy to Prevention Mode.
Switch each per-listener/per-workload policy to Prevention Mode.
Monitor.
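The "find the false positives for each listener" step boils down to grouping logged detections by hostname, request URI, ruleset, rule group, and rule ID. Here is a minimal Python sketch of that grouping. The sample rows and field names are hypothetical placeholders, not the real Application Gateway log schema, so map them to whatever your Log Analytics export actually contains.

```python
from collections import Counter

# Hypothetical, simplified detections exported from the WAF logs.
# The field names and values are placeholders for illustration only.
sample_logs = [
    {"hostname": "api.example.com", "requestUri": "/orders", "ruleSet": "OWASP",
     "ruleGroup": "SQLI", "ruleId": "942430"},
    {"hostname": "api.example.com", "requestUri": "/orders", "ruleSet": "OWASP",
     "ruleGroup": "SQLI", "ruleId": "942430"},
    {"hostname": "www.example.com", "requestUri": "/health", "ruleSet": "OWASP",
     "ruleGroup": "PROTOCOL-ENFORCEMENT", "ruleId": "920320"},
]

KEY_FIELDS = ("hostname", "requestUri", "ruleSet", "ruleGroup", "ruleId")


def summarise_detections(rows):
    """Count detections per (hostname, URI, ruleset, rule group, rule ID)."""
    return Counter(tuple(row[field] for field in KEY_FIELDS) for row in rows)


summary = summarise_detections(sample_logs)
for key, hits in summary.most_common():
    print(hits, *key)
```

The noisiest combinations at the top of the output are the first candidates for an override in the matching per-listener/per-workload policy.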

Managed Rule Sets

The WAF today has two rulesets that you can use:

OWASP: Used to detect attacks such as SQL Injection, Cross-site scripting, and so on.
Microsoft Bot Manager Rule Set: Used to prevent malicious bots from browsing/attacking your workloads.

You need the OWASP ruleset – but we will need to manage it (later). The bot ruleset, in my experience, creates a huge amount of noise with no way of creating granular overrides. One can override the bot ruleset using custom rules, but as you’ll see later, that’s a big stick that is not granular at all!

My approach to this is to disable the Microsoft Bot Manager Rule Set (or leave it disabled) in the parent and child policies. If I have a need to enable it somewhere, I can do it in a per-listener or per-workload policy.

Custom Rules

A custom rule is created in a WAF Policy to force traffic that matches certain criteria to be:

Always allowed
Always denied
Logged only without denying it

You can create a sequence of filters based on:

IP Address
Number
String
Geo Location

If the set of filters matches a request then your desired action will apply. For example, if I want to force traffic to be allowed to my API, I can enter the API URI as one of the filters (as above) and all traffic will be allowed.

Yes, all traffic will be allowed, including traffic that is not a false positive. If I only had a few OWASP rules that were blocking the traffic, the custom rule would disable all OWASP rules.

If you must use this approach, then implement it in the child policy so it is limited to the associated listener/workload.

Exclusions

This is the newest of the override types in WAF Policy – and I’ve found it to be the least useful.

The theory is that you can create an exclusion for one or more OWASP rules based on the values of request headers. For example, if a header called RequestHeaderKeys contains a value of X-Scanner you can instruct the affected OWASP rules to be disabled. This sounds really powerful and quite granular. But this starts to fall apart with other scenarios, such as the aforementioned SQL Injection.

Another common rule that alerts on or blocks traffic is Missing User Agent Header. Exclusions work on the value of a header, so if the header is missing, Exclusions cannot evaluate it.

Another gotcha is that you cannot combine header filters to create an exclusion. The Azure Portal experience for creating an Exclusion makes it look like you can. However, the result is two or more Exclusions that work independently.

If Exclusions will work for you, implement them in the per-listener/per-workload policy and specify only the rules that must be overridden. This approach will limit the effect of the exclusion:

The scope is just the listener/workload that is associated with the WAF Policy.
The scope is further limited to just requests where the header matches, allowing all other requests and all OWASP rules to be applied.

Disabled Rules

The final approach that you can use is to disable rules that are creating false positive alerts. A simple workload might only require one or two rules to be disabled. An older & larger workload might require many OWASP rules to be disabled!

If you are going to disable OWASP rules, then do it in the per-listener/per-workload policy. This will limit the effect of the changes to that listener/workload.

This is a fairly easy approach and it is pretty granular, though not as granular as Exclusions. The downside is that you are completely disabling certain protections for an entire listener/workload, leaving the workload vulnerable to attacks of those previously protected types.

Combinations

If you have the time and the data, you can combine different approaches. For example:

A webhook that comes from the same IP address all of the time can be allowed via a Custom Rule based on an IP Address filter. Any other traffic will be subject to the full defenses of the WAF.
If you have certain headers that must be allowed and you want to enable all other protections for all other traffic then use Exclusions.
If traffic can come from anywhere and you need to override OWASP rules, then disable those rules.
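The three cases above amount to a small decision procedure. The sketch below writes it out in Python purely as a thinking aid. The FalsePositive class and its two flags are invented for this illustration, and a real decision will weigh more than two yes/no questions.

```python
from dataclasses import dataclass


@dataclass
class FalsePositive:
    fixed_source_ip: bool    # the legitimate traffic always comes from the same IP address
    matchable_header: bool   # a request header value identifies the legitimate traffic


def choose_override(fp: FalsePositive) -> str:
    """Pick the narrowest override type that fits the false positive."""
    if fp.fixed_source_ip:
        # Only the known sender bypasses the WAF; everyone else gets the full defenses.
        return "Custom Rule with an IP Address filter"
    if fp.matchable_header:
        # Only requests with the matching header skip the affected rules.
        return "Exclusion"
    # Traffic can come from anywhere: disable just the offending rules.
    return "Disabled rule(s)"


print(choose_override(FalsePositive(fixed_source_ip=True, matchable_header=False)))
print(choose_override(FalsePositive(fixed_source_ip=False, matchable_header=True)))
print(choose_override(FalsePositive(fixed_source_ip=False, matchable_header=False)))
```

Whichever branch applies, the override still belongs in the per-listener/per-workload policy so that it does not leak into other workloads.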

No Great Solution

In summary, there is no perfect solution. The best you can do is find the correct override solution for the specific false positive and deploy it to a specific listener or workload. This will limit the holes that you create in the WAF to the absolute minimum while enabling your workloads to function.

Referencing Private Endpoint IP Addresses In Terraform

It is possible to dynamically retrieve the resulting IP address of an Azure Private Endpoint and use it in other resources in Terraform. This post will show you how.

Scenario

You are building some PaaS resources using Private Endpoints. You have no idea what the IP addresses are going to be. But you need to use those IP addresses elsewhere in your Terraform code, for example in an NSG rule. How do you get the IP addresses?

Find The Properties

The trick for this is to use the terraform state command. In my case, I deployed a Cosmos DB resource using azurerm_private_endpoint.cosmosdb-account1. To view the state of the resource, I can run:

terraform state show azurerm_private_endpoint.cosmosdb-account1

That outputs a bunch of code:

You can think of the exposed state as a description of the resource the moment after it was deployed. Everything in that state is addressable. A common use might be to refer to the resource ID (azurerm_private_endpoint.cosmosdb-account1.id) or resource name (azurerm_private_endpoint.cosmosdb-account1.name) properties. But you can also get other properties that you don’t know in advance.

The Solution

Take another look at the above diagram. There is an array property called private_dns_zone_configs that has one item. We can address this property as azurerm_private_endpoint.cosmosdb-account1.private_dns_zone_configs[0].

Inside that item there is another array property, with two items, called record_sets. There is one record set per IP address created for this private endpoint. We can address these properties as azurerm_private_endpoint.cosmosdb-account1.private_dns_zone_configs[0].record_sets[0] and azurerm_private_endpoint.cosmosdb-account1.private_dns_zone_configs[0].record_sets[1].

Cosmos DB creates a private endpoint with multiple different IP addresses. I deliberately chose Cosmos DB for this example because it shows a more complex problem and solution, demonstrating a little bit more of the method.

Dig into record_sets and you’ll find an array property called ip_addresses with 1 item. If I want the two IP addresses of this private endpoint then I will use: azurerm_private_endpoint.cosmosdb-account1.private_dns_zone_configs[0].record_sets[0].ip_addresses[0] and azurerm_private_endpoint.cosmosdb-account1.private_dns_zone_configs[0].record_sets[1].ip_addresses[0].
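To make the nesting easier to picture, here is the same walk over a Python dictionary shaped like the properties described above: private_dns_zone_configs, then record_sets, then ip_addresses. The dictionary is a hand-written stand-in with made-up addresses, not real terraform state output, and the helper name is hypothetical.

```python
# Hand-written stand-in for the private endpoint's attributes.
# The addresses are made up for illustration.
private_endpoint = {
    "private_dns_zone_configs": [
        {
            "record_sets": [
                {"ip_addresses": ["10.0.1.4"]},
                {"ip_addresses": ["10.0.1.5"]},
            ]
        }
    ]
}


def private_endpoint_ips(attrs):
    """Collect every IP address from every record set of every DNS zone config."""
    return [
        ip
        for zone_config in attrs["private_dns_zone_configs"]
        for record_set in zone_config["record_sets"]
        for ip in record_set["ip_addresses"]
    ]


# Index-by-index access mirrors the Terraform addresses used in this post.
first_ip = private_endpoint["private_dns_zone_configs"][0]["record_sets"][0]["ip_addresses"][0]
second_ip = private_endpoint["private_dns_zone_configs"][0]["record_sets"][1]["ip_addresses"][0]
print(first_ip, second_ip)
print(private_endpoint_ips(private_endpoint))
```

The flattened version is handy when you do not know in advance how many record sets a given PaaS service will create.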

Using the Addresses

// Fragment of an NSG rule: the start of the block and its other arguments are not shown
  destination_address_prefixes = [
    azurerm_private_endpoint.cosmosdb-account1.private_dns_zone_configs[0].record_sets[0].ip_addresses[0], // Cosmos DB Private Endpoint IP 1
    azurerm_private_endpoint.cosmosdb-account1.private_dns_zone_configs[0].record_sets[1].ip_addresses[0], // Cosmos DB Private Endpoint IP 2
  ]
}

And now I have code that will deploy an NSG rule with the correct destination IP address(es) of my private endpoint without knowing them. And even better, if something causes the IP address(es) to change, I can rerun my code without changing it, and the rules will automatically update.

Azure Firewall DevSecOps in Azure DevOps

In this post, I will share the details for granting the least-privilege permissions to GitHub action/DevOps pipeline service principals for a DevSecOps continuous deployment of Azure Firewall.

Quick Refresh

I wrote about the design of the solution and shared the code in my post, Enabling DevSecOps with Azure Firewall. There I explained how you could break out the code for the rules of a workload and manage that code in the repo for the workload. Realistically, you would also need to break out the gateway subnet route table user-defined route (legacy VNet-based hub) and the VNet peering connection. All the code for this is shared on GitHub – I did update the repo with some structure and with working DevOps pipelines.

This Update

There were two things I wanted to add to the design:

Detailed permissions for the service principal used by the workload DevOps pipeline, limiting the scope of change that is possible in the hub.
DevOps pipelines so I could test the above.

The Code

You’ll find 3 folders in the Bicep code now:

hub: This deploys a (legacy) VNet-based hub with Azure Firewall.
customRoles: 4 Azure custom roles are defined. This should be deployed after the hub.
spoke1: This contains the code to deploy a skeleton VNet-based (spoke) workload with updates that are required in the hub to connect the VNet and route ingress on-prem traffic through the firewall.

DevOps Pipelines

The hub and spoke1 folders each contain a folder called .pipelines. There you will find a .yml file to create a DevOps pipeline.

The DevOps pipeline uses Azure CLI tasks to:

Select the correct Azure subscription & create the resource group
Deploy each .bicep file.

My design uses 1 sub for the hub and 1 sub for the workload. You are not glued to this, but you would need to make modifications to how you configure the service principal permissions (below).

To use the code:

Create a repo in DevOps for (1 repo) hub and for (1 repo) spoke1 and copy in the required code.
Create service principals in Azure AD.
Grant the service principal for hub owner rights to the hub subscription.
Grant the service principal for the spoke owner rights to the spoke subscription.
Create ARM service connections in DevOps settings that use the service principals. Note that the names for these service connections are referred to by azureServiceConnection in the pipeline files.
Update the variables in the pipeline files with subscription IDs.
Create the pipelines using the .yml files in the repos.

Don’t do anything just yet!

Service Principal Permissions

The hub service principal is simple – grant it owner rights to the hub subscription (or resource group).

The workload is where the magic happens with this DevSecOps design. The workload updates the hub using code in the workload repo that affects the workload:

Ingress route from on-prem to the workload in the hub GatewaySubnet.
The firewall rules for the workload in the hub Azure Firewall (policy) using a rules collection group.
The VNet peering connection between the hub VNet and the workload VNet.

That could be deployed by the workload DevOps pipeline that is authenticated using the workload’s service principal. So that means the workload service principal must have rights over the hub.

The quick solution would be to grant contributor rights over the hub and say “we’ll manage what is done through code reviews”. However, a better practice is to limit what can be done as much as possible. That’s what I have done with the customRoles folder in my GitHub share.

Those custom roles should be modified to change the possible scope to the subscription ID (or even the resource group ID) of the hub deployment. There are 4 custom roles:

customRole-ArmValidateActionOperator.json: Adds the CUSTOM – ARM Deployment Operator role, allowing the ARM deployment to be monitored and updated.
customRole-PeeringAdmin.json: Adds the CUSTOM – Virtual Network Peering Administrator role, allowing a VNet peering connection to be created from the hub VNet.
customRole-RoutesAdmin.json: Adds the CUSTOM – Azure Route Table Routes Administrator role, allowing a route to be added to the GatewaySubnet route table.
customRole-RuleCollectionGroupsAdmin.json: Adds the CUSTOM – Azure Firewall Policy Rule Collection Group Administrator role, allowing a rules collection group to be added to an Azure Firewall Policy.
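To show what "changing the possible scope" of a custom role looks like, here is a minimal Python sketch that builds a role definition in the JSON shape used for Azure custom roles, with AssignableScopes limited to the hub subscription or resource group. The custom_role helper, the placeholder subscription ID, the resource group name, and in particular the list of actions are assumptions made for illustration. They are not copied from the files in the GitHub repo, so check the real role files for the actual permissions.

```python
import json


def custom_role(name, description, actions, hub_subscription_id, resource_group=None):
    """Build a custom role definition scoped to the hub subscription or resource group."""
    scope = f"/subscriptions/{hub_subscription_id}"
    if resource_group:
        scope += f"/resourceGroups/{resource_group}"
    return {
        "Name": name,
        "IsCustom": True,
        "Description": description,
        "Actions": actions,
        "NotActions": [],
        "AssignableScopes": [scope],
    }


peering_role = custom_role(
    name="CUSTOM – Virtual Network Peering Administrator",
    description="Allows a VNet peering connection to be created from the hub VNet.",
    # Assumed action list for illustration; verify against the repo's role file.
    actions=[
        "Microsoft.Network/virtualNetworks/virtualNetworkPeerings/read",
        "Microsoft.Network/virtualNetworks/virtualNetworkPeerings/write",
        "Microsoft.Network/virtualNetworks/virtualNetworkPeerings/delete",
        "Microsoft.Network/virtualNetworks/peer/action",
    ],
    hub_subscription_id="00000000-0000-0000-0000-000000000000",  # placeholder
    resource_group="hub-rg",  # hypothetical resource group name
)
print(json.dumps(peering_role, indent=2, ensure_ascii=False))
```

Scoping to the resource group rather than the whole subscription keeps the role from being assignable anywhere else in the hub subscription.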

Deploy The Hub

The hub is deployed first – this is required to grant the permissions that are required by the workload’s service principal.

Grant Rights To Workload Service Principals

The service principals for all workloads will be added to an Azure AD group (Workloads Pipeline Service Principals in the above diagram). That group is nested into 4 other AAD security groups:

Resource Group ARM Operations: This is granted the CUSTOM – ARM Deployment Operator role on the hub resource group.
Hub Firewall Policy: This is granted the CUSTOM – Azure Firewall Policy Rule Collection Group Administrator role on the Azure Firewall Policy that is associated with the hub Azure Firewall.
Hub Routes: This is granted the CUSTOM – Azure Route Table Routes Administrator role on the GatewaySubnet route table.
Hub Peering: This is granted the CUSTOM – Virtual Network Peering Administrator role on the hub virtual network.

Deploy The Workload

The workload now has the required permissions to deploy the workload and make modifications in the hub to connect the hub to the outside world.

Azure Virtual WAN ARM – The Resources

In this post, I will explain the types of resources used in Azure Virtual WAN and the nature of their relationships.

Note, I have not included any content on the recently announced preview of third-party NVAs. I have not seen any materials on this yet to base such a post on and, being honest, I don’t have any use-cases for third-party NVAs.

As you can see – there are quite a few resources involved … and some that you won’t see listed at all because of the “appliance-like” nature of the deployment. I have not included any detail on spokes or “branch offices”, which would require further resources. The below diagram is enough to get a hub operational and connected to on-premises locations and spoke virtual networks.

The Virtual WAN – Microsoft.Network/virtualWans

You need at least one Virtual WAN to be deployed. This is what the hub will connect to, and you can connect many hubs to a common Virtual WAN to get automated any-to-any connectivity across the Microsoft physical WAN.

Surprisingly, the resource is deployed to an Azure region and not as a global resource like Traffic Manager or Azure DNS.

The Virtual Hub – Microsoft.Network/virtualHubs

Also known as the hub, the Virtual Hub is deployed once, and once only, per Azure region where you need a hub. This hub replaces the old hub virtual network (plus gateway(s), plus firewall, plus route tables) deployment you might be used to. The hub is deployed as a hidden resource, managed through the Virtual WAN in the Azure Portal or via scripting/ARM.

The hub is associated with the Virtual WAN through a virtualWan property that references the resource ID of the virtualWans resource.

In a previous post, I referred to a chicken & egg scenario with the virtualHubs resource. The hub has properties that point to the resource IDs of each deployed gateway:

vpnGateway: For site-to-site VPN.
expressRouteGateway: For ExpressRoute circuit connectivity.
p2sVpnGateway: For end-user/device tunnels.

If you choose to deploy a “Secured Virtual Hub” there will also be a property called azureFirewall that will point to the resource ID of an Azure Firewall with the AZFW_Hub SKU.

Note, the restriction of 1 hub per Azure region does introduce a bottleneck. Under the covers of the platform, there is actually a virtual network. The only clue to this network will be in the peering properties of your spoke virtual networks. A single virtual network can have, today, a maximum of 500 spokes. So that means you will have a maximum of 500 spokes per Azure region.

Routing Tables – Microsoft.Network/virtualHubs/hubRouteTables & Microsoft.Network/virtualHubs/routeTables

These resources are used in custom routing, a feature that was recently announced as GA but won’t be live until August 3rd, according to the Azure Portal. They control the flows of traffic in your hub and spoke architecture. They are child resources of the virtualHubs resource, so no references to hub resource IDs are required.

Azure Firewall – Microsoft.Network/azureFirewalls

This is an optional resource that is deployed when you want a “Secured Virtual Hub”. Today, this is the only way to put a firewall into the hub, although a new preview program should make it possible for third-parties to join the hub. Alternatively, you can use custom routing to force north-south and east-west traffic through an NVA that is running in a spoke, although that will double peering costs.

The Azure Firewall is deployed with the AZFW_Hub SKU. The firewall is not a hidden resource. To manage the firewall, you must use an Azure Firewall Policy (aka Azure Firewall Manager). The firewall has a property called firewallPolicy that points to the resource ID of a firewallPolicies resource.

Azure Firewall Policy – Microsoft.Network/firewallPolicies

This is a resource that allows you to manage an Azure Firewall, in this case, an AZFW_Hub SKU of Azure Firewall. Although not shown here, you can deploy a parent/child configuration of policies to manage firewall configurations and rules in a global/local way.

VPN Gateway – Microsoft.Network/vpnGateways

This is one of 3 ways (one, two or all three at once) that you can connect on-premises (branch) sites to the hub and your Azure deployment(s). This gateway provides you with site-to-site connectivity using VPN. The VPN Gateway uses a property called virtualHub to point at the resource ID of the associated hub or virtualHubs resource. This is a hidden resource.

Note that the virtualHubs resource must also point at the resource ID of the VPN gateway using a property called vpnGateway.
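To make the two-way reference between the hub and a gateway concrete, here is a minimal Python sketch. The property names (virtualWan, vpnGateway, virtualHub) come from this post; the subscription ID, resource group, resource names and the resource_id helper are hypothetical, and the dictionaries are simplified illustrations rather than complete ARM resources. Wrapping each reference in an "id" object is an assumption based on how ARM usually expresses references to other resources.

```python
# Illustrative only: simplified resource bodies, not complete ARM definitions.
SUBSCRIPTION = "00000000-0000-0000-0000-000000000000"  # hypothetical
RESOURCE_GROUP = "rg-vwan"                              # hypothetical


def resource_id(resource_type: str, name: str) -> str:
    """Build a resource ID for a Microsoft.Network resource."""
    return (
        f"/subscriptions/{SUBSCRIPTION}/resourceGroups/{RESOURCE_GROUP}"
        f"/providers/Microsoft.Network/{resource_type}/{name}"
    )


wan_id = resource_id("virtualWans", "wan-global")
hub_id = resource_id("virtualHubs", "hub-westeurope")
vpn_gateway_id = resource_id("vpnGateways", "vpngw-westeurope")

# The hub points up at the Virtual WAN and across at its gateway.
virtual_hub = {
    "type": "Microsoft.Network/virtualHubs",
    "name": "hub-westeurope",
    "properties": {
        "virtualWan": {"id": wan_id},
        "vpnGateway": {"id": vpn_gateway_id},
    },
}

# The gateway points back at the hub, so each resource references the other.
vpn_gateway = {
    "type": "Microsoft.Network/vpnGateways",
    "name": "vpngw-westeurope",
    "properties": {"virtualHub": {"id": hub_id}},
}


def references_are_mutual(hub: dict, gateway: dict, hub_rid: str, gw_rid: str) -> bool:
    """Check that the hub and the gateway reference each other by resource ID."""
    return (
        hub["properties"]["vpnGateway"]["id"] == gw_rid
        and gateway["properties"]["virtualHub"]["id"] == hub_rid
    )
```

The same pattern would apply to the ExpressRoute and Point-to-Site gateways, each with its own property on the hub.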

ExpressRoute Gateway – Microsoft.Network/expressRouteGateways

This is one of 3 ways (one, two or all three at once) that you can connect on-premises (branch) sites to the hub and your Azure deployment(s). This gateway provides you with site-to-site connectivity using ExpressRoute. The ExpressRoute Gateway uses a property called virtualHub to point at the resource ID of the associated hub or virtualHubs resource. This is a hidden resource.

Note that the virtualHubs resource must also point at the resource ID of the ExpressRoute gateway using a property called expressRouteGateway.

Point-to-Site Gateway – Microsoft.Network/p2sVpnGateways

This is one of 3 ways (one, two or all three at once) that you can connect on-premises (branch) sites to the hub and your Azure deployment(s). This gateway provides users/devices with connectivity using VPN tunnels. The Point-to-Site Gateway uses a property called virtualHub to point at the resource ID of the associated hub or virtualHubs resource. This is a hidden resource.

The Point-to-Site Gateway inherits a VPN configuration from a VPN configuration resource based on Microsoft.Network/vpnServerConfigurations, referring to the configuration resource by its resource ID using a property called vpnServerConfiguration.

Note that the virtualHubs resource must also point at the resource ID of the Point-to-Site gateway using a property called p2sVpnGateway.

VPN Server Configuration – Microsoft.Network/vpnServerConfigurations

This configuration for Point-to-Site VPN gateways can be seen in the Virtual WAN and is intended as a shared configuration that is reusable with more than one Point-to-Site VPN Gateway. To be honest, I can see myself using it as a per-region configuration because of some values like DNS servers and RADIUS servers that will probably be placed per-region for performance and resilience reasons. This is a hidden resource.

The following resources were added on 22nd July 2020:

VPN Sites – Microsoft.Network/vpnSites

This resource has a similar purpose to a Local Network Gateway for site-to-site VPN connections; it describes the on-premises location, AKA “branch office”. A VPN site can be associated with one or many hubs, so it is actually connected to the Virtual WAN resource ID using a property called virtualWan. This is a hidden resource.

An array property called vpnSiteLinks describes possible connections to on-premises firewall devices.

VPN Connections – Microsoft.Network/vpnGateways/vpnConnections

A VPN Connections resource associates a VPN Gateway with the on-premises location that is described by an associated VPN Site. The vpnConnections resource is a child resource of vpnGateways, so there is no actual resource; the vpnConnections resource takes its name from the parent VPN Gateway, and the resource ID is an extension of the parent VPN Gateway resource ID.

By necessity, there is some complexity with this resource type. The remoteVpnSite property links the vpnConnections resource with the resource ID of a VPN Site resource. An array property, called vpnSiteLinkConnections, is used to connect the gateway to the on-premises location using 1 or 2 connections, each linking from vpnSiteLinkConnections to the resource/property ID of 1 or 2 vpnSiteLinks properties in the VPN Site. With one site link connection, you have a single VPN tunnel to the on-premises location. With 2 link connections, the VPN Gateway will take advantage of its active/active configuration to set up resilient tunnels to the on-premises location.
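The linking described above can be sketched in Python as follows. The names remoteVpnSite, vpnSiteLinkConnections and vpnSiteLinks come from this post; the site ID, link names, the build_vpn_connection helper and the inner vpnSiteLink reference shape are assumptions made for illustration, and the dictionaries are simplified rather than complete ARM resources.

```python
SITE_ID = (
    "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-vwan"
    "/providers/Microsoft.Network/vpnSites/site-dublin"
)  # hypothetical VPN Site resource ID

# A VPN Site that describes two possible links to the on-premises firewall.
vpn_site = {
    "id": SITE_ID,
    "properties": {
        "vpnSiteLinks": [{"name": "isp-a"}, {"name": "isp-b"}],
    },
}


def build_vpn_connection(name: str, site: dict, max_links: int = 2) -> dict:
    """Build a simplified vpnConnections body with one link connection per site link."""
    links = site["properties"]["vpnSiteLinks"][:max_links]
    if not links:
        raise ValueError("the VPN Site must describe at least one link")
    return {
        "name": name,
        "properties": {
            # Links the connection to the VPN Site that describes the branch office.
            "remoteVpnSite": {"id": site["id"]},
            "vpnSiteLinkConnections": [
                {
                    "name": f"{name}-{link['name']}",
                    # Assumed shape: each link connection references one site link by ID.
                    "properties": {
                        "vpnSiteLink": {
                            "id": f"{site['id']}/vpnSiteLinks/{link['name']}"
                        }
                    },
                }
                for link in links
            ],
        },
    }


connection = build_vpn_connection("conn-dublin", vpn_site)
```

With two site links, the result contains two link connections, which corresponds to the resilient pair of tunnels. Passing max_links=1 gives the single-tunnel case.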

Virtual Network Connections – Microsoft.Network/virtualHubs/hubVirtualNetworkConnections

The purpose of a hub is to share resources with spoke virtual networks. In the case of the Virtual Hub, those resources are gateways, and maybe a firewall in the case of Secured Virtual Hub. As with a normal VNet-based hub & spoke, VNet peering is used. However, the way that VNet peering is used changes with the Virtual Hub; the deployment is done using the hubVirtualNetworkConnections child resource, whose parent is the Virtual Hub. Therefore, the name and resource ID are based on the name and resource ID of the Virtual Hub resource.

The deployment is rather simple; you create a Virtual Network Connection in the hub specifying the resource ID of the spoke virtual network, using a property called remoteVirtualNetwork. The underlying resource provider will initiate both sides of the peering connection on your behalf – there is no deployment required in the spoke virtual network resource. The Virtual Network Connection will reference the Hub Route Tables in the hub to configure route association and propagation.
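As a rough illustration of how the child resource takes its name and resource ID from the parent hub, here is a minimal Python sketch. The remoteVirtualNetwork property and the resource type come from this post; the hub name, the IDs and the hub_vnet_connection helper are hypothetical, and route association and propagation settings are left out.

```python
HUB_NAME = "hub-westeurope"  # hypothetical
HUB_ID = (
    "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-vwan"
    "/providers/Microsoft.Network/virtualHubs/" + HUB_NAME
)
SPOKE_VNET_ID = (
    "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-spoke1"
    "/providers/Microsoft.Network/virtualNetworks/vnet-spoke1"
)  # hypothetical spoke virtual network


def hub_vnet_connection(connection_name: str, spoke_vnet_id: str) -> dict:
    """Build a simplified hubVirtualNetworkConnections child resource."""
    return {
        "type": "Microsoft.Network/virtualHubs/hubVirtualNetworkConnections",
        # Child resource: the name is prefixed with the parent hub's name ...
        "name": f"{HUB_NAME}/{connection_name}",
        # ... and the resource ID extends the parent hub's resource ID.
        "id": f"{HUB_ID}/hubVirtualNetworkConnections/{connection_name}",
        # Only the hub side is declared; nothing is deployed in the spoke.
        "properties": {"remoteVirtualNetwork": {"id": spoke_vnet_id}},
    }


connection = hub_vnet_connection("to-spoke1", SPOKE_VNET_ID)
```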

More Resources

There are more resources that I’ve yet to document, including:

Securing A Storage Account Static Website Using VNet Web Application Firewall

Alternative title: Using the Azure Application Gateway to do content redirection with a storage account static website in a secure way.

I was looking at a scenario where I needed to find a platform method of setting up a website that would:

Be cost-effective
Be able to easily receive content directly from Azure virtual machines
Be secure

This post will describe the solution.

The Storage Account

A resilient storage account is set up with a static website. The content can be uploaded to the $web container. The firewall is enabled and only traffic from the virtual machine subnet and an Azure Application Gateway or Web Application Firewall subnet is allowed. This means that you get a 404 error when you try to access the website from any other address space.

The WAF

A WAFv2 is set up. The WAF subnet is protected by an NSG. The WAF is controlled by a WAF policy. And certificates for custom domains are stored in a Key Vault – the WAF uses a user-managed identity that is granted Get/List rights to secrets/certificates in the Key Vault’s access policy. A multi-site HTTPS Listener is set up for the static website using a custom domain name:

The HTTP setting will handle the name translation from the custom domain name to the default storage account URI.
The Key Vault will store the certificate for the custom domain name.
There is full end-to-end encryption thanks to the storage account using a Microsoft-supplied certificate for the default storage account URI.

The HTTP setting in the WAF will be set up as follows:

HTTPS
Use Well-Known CA Certificate (Yes)
Override with a new hostname: the default URI of the static website

Solution 1 – Service Endpoint

In this case, the WAF subnet has a Microsoft.Storage service endpoint enabled. This will mean that traffic from the WAF to the storage account hosting the static website will fall through a routing “trap door” across the Azure private backbone to reach the storage account. This keeps the traffic relatively private and reduces latency.

The backend pool of the WAF is the FQDN of the static website.

Pros:

Easy to set up.

Cons:

Service Endpoints appear to be a dead-end technology
It will require the Microsoft.Storage Service Endpoint to be configured in every subnet that needs to interact with the website/storage account.

Solution 2 – Private Link/Private Endpoint

In this design, Service Endpoint is dropped and replaced with a Private Endpoint associated with the Web API of the storage account. This Private Endpoint can be in the same VNet as the WAF or even a different (peered) VNet to the WAF.

The only change to the WAF configuration is that the backend pool must now be the private IP address of the Private Endpoint. Now traffic will route from the WAF subnet to the storage account subnet, even across peering connections.

Pros:

Private Link/Private Endpoint is the future for this kind of connectivity.
There is no need to configure subnets with anything – they just need to route to the storage account (to modify content) or the WAF (access content).

Cons:

A little more complex to set up, but the effort is returned in the long-run with less configuration required.

There is no support for inbound NSG rules for the Private Endpoint but:

That is coming in the future
The storage account firewall is rejecting unwanted direct traffic
The NSG in front of the WAF provides Layer-4 security and the WAF provides Layer-7 security

Want to Learn More?

Why not join me for an ONLINE 1-day training course on securing Azure IaaS and PaaS services. Securing Azure Services & Data Through Azure Networking is my newest Azure training course, designed to give Level 400 training to those who have been using Azure for a while. It dives deep on topics that most people misunderstand and covers many topics similar to the above content.

Connecting Azure Hub-And-Spoke Architectures Together

In this post, I will explain how you can connect multiple Azure hub-and-spoke (virtual data centre) deployments together using Azure networking, even across different Azure regions.

There is a lot to know here so here is some recommended reading that I previously published:

If you are using Azure Virtual WAN Hub then some stuff will be different and that scenario is not covered fully here – Azure Virtual WAN Hub has a preview (today) feature for Any-to-Any routing.

The Scenario

In this case, there are two hub-and-spoke deployments:

Blue: Multiple virtual networks covered by the CIDR of 10.1.0.0/16
Green: Another set of multiple virtual networks covered by the CIDR of 10.2.0.0/16

I’m being strategic with the addressing of each hub-and-spoke deployment, ensuring that a single CIDR will include the hub and all spokes of a single deployment – this will come in handy when we look at User-Defined Routes.
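A quick way to sanity-check this kind of addressing plan is with Python's standard ipaddress module. This is only a sketch: the two /16 prefixes come from this post, while the individual virtual network prefixes and the uncovered helper are hypothetical examples.

```python
import ipaddress

BLUE = ipaddress.ip_network("10.1.0.0/16")
GREEN = ipaddress.ip_network("10.2.0.0/16")

# Hypothetical hub and spoke virtual network prefixes for each deployment.
deployments = {
    BLUE: ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24"],
    GREEN: ["10.2.0.0/24", "10.2.1.0/24", "10.2.2.0/24"],
}


def uncovered(supernet: ipaddress.IPv4Network, vnets: list) -> list:
    """Return the virtual network prefixes that fall outside the deployment CIDR."""
    return [v for v in vnets if not ipaddress.ip_network(v).subnet_of(supernet)]


for supernet, vnets in deployments.items():
    print(supernet, "uncovered:", uncovered(supernet, vnets))

# The two deployments must not overlap, or a single route could not tell them apart.
print("overlap:", BLUE.overlaps(GREEN))
```

If every list of uncovered prefixes is empty and the deployments do not overlap, one prefix per deployment is enough to describe it in a route.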

Either of these hub-and-spoke deployments could be in the same region or even in different Azure regions. It is desired that if:

Any spoke wishes to talk to another spoke it will route through the local firewall in the local hub.
All traffic coming into a spoke from an outside source, such as the other hub-and-spoke, must route through the local firewall in the local hub.

That would mean that Spoke 1 must route through Hub 1 and then Hub 2 to talk to Spoke 4. The firewall can be a third-party appliance or the Azure Firewall.

Core Routing

Each subnet in each spoke needs a route to the outside world (0.0.0.0/0) via the local firewall. For example:

The Blue firewall backend/private IP address is 10.1.0.132
A Route Table for each subnet is created in the Blue deployment and has a route to 0.0.0.0/0 via a virtual appliance with an IP address of 10.1.0.132
The Green firewall backend/private IP address is 10.2.0.132
A Route Table for each subnet is created in the Green deployment and has a route to 0.0.0.0/0 via a virtual appliance with an IP address of 10.2.0.132
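The routes above could be generated per subnet along these lines. This is a minimal sketch: the firewall IP addresses come from this post, the subnet names and the spoke_route_table helper are hypothetical, and the dictionaries are simplified route table bodies (addressPrefix, nextHopType and nextHopIpAddress are the usual ARM route properties).

```python
FIREWALLS = {"blue": "10.1.0.132", "green": "10.2.0.132"}

# Hypothetical spoke subnets in each deployment.
SPOKE_SUBNETS = {
    "blue": ["spoke1-web", "spoke1-db"],
    "green": ["spoke4-web"],
}


def spoke_route_table(name: str, firewall_ip: str) -> dict:
    """Build a simplified Route Table that sends 0.0.0.0/0 to the local firewall."""
    return {
        "type": "Microsoft.Network/routeTables",
        "name": name,
        "properties": {
            "routes": [
                {
                    "name": "default-via-firewall",
                    "properties": {
                        "addressPrefix": "0.0.0.0/0",
                        "nextHopType": "VirtualAppliance",
                        "nextHopIpAddress": firewall_ip,
                    },
                }
            ]
        },
    }


# One Route Table per subnet, each pointing at the firewall in its own hub.
route_tables = [
    spoke_route_table(f"rt-{deployment}-{subnet}", FIREWALLS[deployment])
    for deployment, subnets in SPOKE_SUBNETS.items()
    for subnet in subnets
]
```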

Note: Some network-connected PaaS services, e.g. API Management or SQL Managed Instance, require additional routes to the “control plane” that will bypass the local firewall.

Site-to-Site VPN

In this scenario, the organisation is connecting on-premises networks to 1 or more of the hub-and-spoke deployments with a site-to-site VPN connection. That connection goes to the Blue hub and to the Green hub.

To connect Blue and Green you will need to configure VNet Peering, which can work inside a region or across regions (using Microsoft’s low latency WAN, the second-largest private WAN on the planet). Each end of peering needs the following settings (the names of the settings change so I’m not checking their exact naming):

Enabled: Yes
Allow Transit: Yes
Use Remote Gateway: No
Allow Gateway Sharing: No
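For reference, here is a sketch of how those four settings might map onto the properties of a virtual network peering resource. The mapping from the informal names above to allowVirtualNetworkAccess, allowForwardedTraffic, useRemoteGateways and allowGatewayTransit is my assumption, and the remote virtual network ID and the hub_to_hub_peering helper are hypothetical.

```python
GREEN_HUB_VNET_ID = (
    "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-green-hub"
    "/providers/Microsoft.Network/virtualNetworks/vnet-green-hub"
)  # hypothetical


def hub_to_hub_peering(remote_vnet_id: str) -> dict:
    """Build simplified peering properties for one end of a hub-to-hub peering."""
    return {
        "remoteVirtualNetwork": {"id": remote_vnet_id},
        "allowVirtualNetworkAccess": True,  # Enabled: Yes
        "allowForwardedTraffic": True,      # Allow Transit: Yes
        "useRemoteGateways": False,         # Use Remote Gateway: No
        "allowGatewayTransit": False,       # Allow Gateway Sharing: No
    }


blue_to_green = hub_to_hub_peering(GREEN_HUB_VNET_ID)
```

The other end of the peering would use the same values with the Blue hub as the remote virtual network.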

Let’s go back and do some routing theory!

That peering connection will add a hidden Default (“system”) route to each subnet in each hub:

Blue hub subnets: A route to 10.2.0.0/24
Green hub subnets: A route to 10.1.0.0/24

Now imagine you are a packet in Spoke 1 trying to get to Spoke 4. You’re sent to the firewall in Blue Hub 1. The firewall lets the traffic out (if a rule allows it) and now the packet sits in the egress/frontend/firewall subnet and is trying to find a route to 10.2.2.0/24. The peering-created Default route covers 10.2.0.0/24 but not the subnet for Spoke 4. So that means the default route to 0.0.0.0/0 (Internet) will be used and the packet is lost.

To fix this you will need to add a Route Table to the egress/frontend/firewall subnet in each hub:

Blue firewall subnet Route Table: 10.2.0.0/16 via virtual appliance 10.2.0.132
Green firewall subnet Route Table: 10.1.0.0/16 via virtual appliance 10.1.0.132

Thanks to my clever addressing of each hub-and-spoke, a single route will cover all packets leaving Blue and trying to get to any spoke in Green and vice-versa.
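The route selection described here is longest-prefix matching, which can be sketched with Python's ipaddress module. The prefixes and the firewall address come from this post; the destination 10.2.2.4 is a hypothetical address in Spoke 4, the next_hop helper and the next-hop labels are illustrative, and the sketch ignores how Azure breaks ties between routes with the same prefix.

```python
import ipaddress


def next_hop(routes: dict, destination: str) -> str:
    """Return the next hop of the most specific route that contains the destination."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes.items()
        if dest in ipaddress.ip_network(prefix)
    ]
    # Longest prefix wins: /24 beats /16, which beats /0.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Routes seen by the Blue firewall subnet after peering the two hubs.
blue_system_routes = {
    "10.2.0.0/24": "VNetPeering (Green hub)",
    "0.0.0.0/0": "Internet",
}

# The same subnet after the Route Table with the Green /16 route is associated.
blue_with_route_table = {
    **blue_system_routes,
    "10.2.0.0/16": "VirtualAppliance 10.2.0.132",
}

print(next_hop(blue_system_routes, "10.2.2.4"))     # Internet
print(next_hop(blue_with_route_table, "10.2.2.4"))  # VirtualAppliance 10.2.0.132
```

Without the Route Table, only 0.0.0.0/0 matches an address in Spoke 4, so the packet heads for the Internet. With it, the /16 route is the most specific match and the packet is sent to the Green firewall.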

ExpressRoute

Now the customer has decided to use ExpressRoute to connect to Azure – Sweet! But guess what – you don’t need 1 expensive circuit to each hub-and-spoke.

You can share a single circuit across multiple ExpressRoute gateways:

ExpressRoute Standard: Up to 10 simultaneous connections to Virtual Network Gateways in 1+ regions in the same geopolitical region.
ExpressRoute Premium: Up to 100 simultaneous connections to Virtual Network Gateways in 1+ regions in any geopolitical region.
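Those two limits boil down to a simple decision, sketched below in Python. The numbers are the ones quoted above at the time of writing and may have changed since; the required_sku function and its parameters are hypothetical names used only for illustration.

```python
def required_sku(connections: int, same_geopolitical_region: bool) -> str:
    """Pick the ExpressRoute SKU needed for a number of gateway connections."""
    if connections < 1:
        raise ValueError("at least one connection is required")
    # Standard: up to 10 connections, all within one geopolitical region.
    if same_geopolitical_region and connections <= 10:
        return "Standard"
    # Premium: up to 100 connections, in any geopolitical region.
    if connections <= 100:
        return "Premium"
    raise ValueError("more than 100 connections would need another circuit")


print(required_sku(2, same_geopolitical_region=True))   # Standard
print(required_sku(2, same_geopolitical_region=False))  # Premium
```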

FYI, ExpressRoute connections to the Azure Virtual WAN Hub must be of the Premium SKU.

ExpressRoute is powered by BGP. All the on-premises routes that are advertised propagate through the ISP to the Microsoft edge router (“meet-me”) in the edge data centre. For example, if I want an ExpressRoute circuit to Azure West Europe (Middenmeer, Netherlands – not Amsterdam) I will probably (not always) get a circuit to the POP or edge data centre in Amsterdam. That gets me a physical low-latency connection onto the Microsoft WAN – and my BGP routes get to the meet-me router in Amsterdam. Now I can route to locations on that WAN. If I connect a Virtual Network Gateway in Blue (in Azure West Europe) to that circuit, then my BGP routes will propagate from the meet-me router to the GatewaySubnet in the Blue hub, and then on to my firewall subnet.

BGP propagation is disabled in the spoke Route Tables to ensure all outbound flows go through the local firewall.

But that is not the extent of things! The hub-and-spoke peering connections allow Gateway Sharing from the hub and Use Remote Gateway from the spoke. With that configuration, BGP routes to the spoke get propagated to the GatewaySubnet in the hub, then to the meet-me router, through the ISP and then to the on-premises network. This is what our solution is based on.

Let’s imagine that the Green deployment is in North Europe (Dublin, Ireland). I could get a second ExpressRoute connection but:

That will add cost
Not give me the clever solution that I want – but I could work around that with ExpressRoute Global Reach

I’m going to keep this simple – by the way, if I wanted Green to be in a different geopolitical region such as East US 2 then I could use ExpressRoute Premium to make this work.

In the Green hub, the Virtual Network Gateway will connect to the existing ExpressRoute circuit – no more money to the ISP! That means Green will connect to the same meet-me router as Blue. The on-premises routes will get into Green the exact same way as with Blue. And the routes to the Green spokes will also propagate down to on-premises via the meet-me router. That meet-me router knows all about the subnets in Blue and Green. And guess what BGP routers do? They propagate – so, the routes to all of the Blue subnets propagate to Green and vice-versa with the next hop (after the Virtual Network Gateway) being the meet-me router. There are no Route Tables or peering required in the hubs – it just works!

Now the path from Blue Spoke 1 to Green Spoke 4 is Blue Hub Firewall, Blue Virtual Network Gateway, <the Microsoft WAN>, Microsoft (meet-me) Router, <the Microsoft WAN>, Green Virtual Network Gateway, Green Hub Firewall, Green Spoke 4.

There are ways to make this scenario more interesting. Let’s say I have an office in London and I want to use Microsoft Azure. Some stuff will reside in UK South for compliance or performance reasons. But UK South is not a “hero region” as Microsoft calls them. There might be more advanced features that I want to use that are only in West Europe. I could use two ExpressRoute circuits, one to UK South and one to West Europe. Or I could set up a single circuit to London to get me onto the Microsoft WAN and connect this circuit to both of my deployments in UK South and West Europe. I have a quicker route going Office > ISP > London edge data center > Azure West Europe than from Office > ISP > Amsterdam edge data center > Azure West Europe because I have reduced the latency between me and West Europe by reducing the length of the ISP circuit and using the more-direct Microsoft WAN. Just like with Azure Front Door, you want to get onto the Microsoft WAN as quickly as possible and let it get you to your destination as quickly as possible.

I’m Presenting Two Sessions At NIC 20/20 Vision in Oslo

I will be presenting two Azure sessions at the (NICCONF) NIC 20/20 Vision conference in Oslo on February 6th.

The content I’m presenting on is inspired by the work I have been doing with Innofactor Norway for customers in Norway. So it will be kind of cool to stand (once again) on a stage in Oslo and share what I’ve learned. I have two sessions on the afternoon of the 6th.

Secure Azure Network Architecture

Azure networking & security has become my focus area. I enjoy the organic nature of how Azure’s software-defined networking functions. I enjoy the scale, the possibilities, and the variety of options. And most of all, I appreciate how the near-universally overlooked fundamentals play a bigger role in network security than people realise. It’s a huge area to cover, but I will do my best in the hour that I have:

This session will walk you through the components of Azure network security, and how to architect a secure network for Azure virtual machines or platform services, including VNets, network security groups, routing tables, VNet peering, web application gateway, DDoS protection, and firewall appliances.

Auditing Azure – Compliance, Oversight, Governance, and Protection

An important part of governance is recording what is going on in Azure and being able to retain, query, and report on that data. This is an area I had a cool solution for this time last year, but Microsoft blew that up. Recently I revisited this space and found cool new things that I could do. And in preparing for this session, I found more stuff that I could talk about. I’ve enjoyed preparing this session and it has contributed back to my work. This session is late in the day for most Norwegians, but I hope that attendees stick around.

Auditing isn’t the most glamorous subject, but in a self-service environment, it becomes important to protect assets, the company, and even your job. In this session, you’ll learn how Azure provides auditing functionality that you can query, report on, and store securely for as long as you need it in cost-efficient ways.

Hopefully, I will see some of you there at the event!

Back Teaching – Implementing Secure Azure Networks

After a quiet 2019, I am getting back into Azure training starting in March in Brussels, Belgium, with a new hands-on course called Implementing Secure Azure Networks.

2019 was a year of (good) upheaval. I started a new job with big responsibilities and a learning curve. Family-wise, we had a lot of good things going on. So I decided to put our (my wife and I) Cloud Mechanix training on the shelf for a while. All last year, I’ve been putting a lot of cool Azure networking & security things into practice with larger enterprises so I’ve been learning … new things, good practices, what works, what doesn’t, and so on. That put the seed into my head for the next class that I would write. Then along came Workshop Summit and asked if I would like to submit a 1-day practical training course. So I did, and they accepted.

The Course

Security is always number 1 or 2 in any survey on the fears of cloud computing. Networking in The Cloud is very different to traditional physical networking … but in some ways it is quite similar. The goals of this workshop are:

To teach you the fundamentals, the theory, of how Azure networking functions so you can understand the practical design and application
Do hands-on deployments of secure networks

As a result, this workshop takes you all the way back to the basics of Azure networking so you really understand the “wiring” of a secure network in the cloud. Only with that understanding do you understand that small is big. The topics covered in this class will secure small/mid businesses, platform deployments that require regulatory compliance, and large enterprises:

The Microsoft global network
Availability & SLA
Virtual network basics
Virtual network adapters
Peering
Service endpoints
Private Link & Private Endpoints
Public IP Addresses
VNet gateways: VPN & ExpressRoute
Network Security Groups
Application Firewall
Route Tables
Third-Party Firewalls
Azure Firewall
Architectures

Attendees will require an Azure subscription capable of deploying 4 x single-core virtual machines, 1 x Azure Firewall, 1 x Web Application Gateway, and 1 x per-GB Log Analytics Workspace for 1 day.

When

Tuesday, 3rd March

Where

Venue: the Hackages Lab, located at Avenue des Arts 3-4-5 in Brussels

Organisers & Registration

This event is being run by The Workshop Summit. All registration and payments are handled by that event.

Who Should Attend

You don’t need to be a networking guru to attend this class. I always start my Azure networking training by explaining that I have never set up a VLAN; I’m proud of that! But I can out-network most people in Azure. Azure networking requires some learning, especially to do it correctly and securely, and that starts with re-learning some fundamentals. Those who understand basic concepts like a route, a firewall rule, network addressing (CIDR blocks), and so on will do fine on this course.

Who will benefit? Anyone planning on working with Azure. If you’re the person building the first “landing zone” for a migration, setting up the infrastructure for a new cloud-based service, working with IaaS VMs or platform (PaaS – yes network security plays a big role here!) then this course is for you. Get this stuff right early on and you’ll look like a genius. Or maybe you’ve already got an infrastructure and it’s time to learn how to mature it? We will start with the basics, cover them deeply, and then dive deep, focusing on security in ways that a typical Azure introduction course cannot do.