Building A Hub & Spoke Using Azure Virtual Network Manager

In this post, I will show how to use Azure Virtual Network Manager (AVNM) to enforce peering and routing policies in a zero-trust hub-and-spoke Azure network. The goal will be to deliver ongoing consistency of the connectivity and security model, reduce operational friction, and ensure standardisation over time.

Quick Overview

AVNM is a tool that has evolved, and continues to evolve, from something that I considered overpriced and under-featured into something that, with its recently updated pricing, I would want to deploy first in my networking architecture. In summary, AVNM offers:

  • Network/subnet discovery and grouping
  • IP Address Management (IPAM)
  • Connectivity automation
  • Routing automation

There is (and will be) more to AVNM, but I want to focus on the above features because together they simplify the task of building out Azure platform and application landing zones.

The Environment

One can manage virtual networks using static groups, but that ignores the fact that The Cloud is a dynamic and agile place. Developers, operators, and (other) service providers will be deploying virtual networks, and our goal is to discover and manage those networks. A simple organisation might get away with a one-size-fits-all policy, but some of us need to engineer for complexity. We can reduce that complexity by organising:

  • Adopting the Cloud Adoption Framework and Zero Trust recommendation of one subscription/virtual network per workload.
  • Organising subscriptions (workloads) using Management Groups.
  • Designing a Management Group hierarchy based on policy/RBAC inheritance instead of on an organisation chart.
  • Using tags to denote the roles of virtual networks.

I have built a demo lab where I am creating a hub & spoke in the form of a virtual data centre (an old term used by Microsoft). This concept will use a hub to connect and segment workloads in an Azure region. Based on Route Table limitations, the hub will support up to 400 networked workloads placed in spoke virtual networks. The spokes will be peered to the hub.

A Management Group has been created for dub01. All subscriptions for the hub and workloads in the dub01 environment will be placed into the dub01 Management Group.

Each workload will be classified based on security, compliance, and any other requirements that the organisation may have. Three policies have been predefined and named gold, silver, and bronze. Each of these classifications has a Management Group inside dub01, called dub01gold, dub01silver, and dub01bronze. Workloads are placed into the appropriate Management Group based on their classification and are subject to Azure Policy initiatives that are assigned to dub01 (regional policies) and to the classification Management Groups.

You can see two subscriptions above. The platform landing zone, p-dub01, is going to be the hub for the network architecture. It has therefore been classified as gold. The workload (application landing zone) called p-demo01 has been classified as silver and is placed in the appropriate Management Group. Both gold and silver workloads should be networked and use private networking only where possible, meaning that p-demo01 will have a spoke virtual network for its resources. Spoke virtual networks in dub01 will be connected to the hub virtual network in p-dub01.

Keep in mind that no virtual networks exist at this time.

AVNM Resource

AVNM is based on an Azure resource and subresources for the features/configurations. The AVNM resource is deployed with a management scope; this means that a single AVNM resource can be created to manage a certain scope of virtual networks. One can centrally manage all virtual networks. Or one can create many AVNM resources to delegate management (and the cost) of managing various sets of virtual networks.

I’m going to keep this simple and use one AVNM resource, which is what most organisations (other than the very largest) will do. I will place the AVNM resource in a subscription at the top of my Management Group hierarchy so that it can offer centralised management of many hub-and-spoke deployments, even if we only plan to have one now; plans change! This also allows me to have specialised RBAC for managing AVNM.

Note that AVNM can manage virtual networks across many regions so my AVNM resource will, for demonstration purposes, be in West Europe while my hub and spoke will be in North Europe. I have enabled the Connectivity, Security Admin, and User-Defined Routing features.

AVNM has one or more management scopes. This is a central AVNM for all networks, so I’m setting the Tenant Root Group as the top of the scope. In a lab, you might use a single subscription or a dedicated Management Group.
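If you prefer to deploy AVNM as code, the resource is compact. Below is a minimal Bicep sketch, assuming a made-up name and a placeholder management group ID; the API version and the ‘Routing’ scope access value should be verified against the current schema:

  resource avnm 'Microsoft.Network/networkManagers@2024-05-01' = {
    name: 'avnm-central' // hypothetical name
    location: 'westeurope' // the AVNM location is independent of the managed networks
    properties: {
      // The management scope: the Tenant Root Group in my case.
      networkManagerScopes: {
        managementGroups: [
          '/providers/Microsoft.Management/managementGroups/<tenant root group ID>'
        ]
      }
      // The features I enabled: Connectivity, Security Admin, and User-Defined Routing.
      networkManagerScopeAccesses: [
        'Connectivity'
        'SecurityAdmin'
        'Routing'
      ]
    }
  }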

Defining Network Groups

We use Network Groups to assign a single configuration to many virtual networks at once. There are two kinds of membership:

  • Static: You add/remove members to or from the group
  • Dynamic: You use a friendly wizard to define an Azure Policy to automatically find virtual networks and add/remove them for you. Keep in mind that Azure Policy might take a while to discover virtual networks because of how irregularly it runs. However, once added, the configuration deployment is immediately triggered by AVNM.

There are two kinds of members in a group:

  • Virtual networks: The virtual network and contained subnets are subject to the policy. Virtual networks may be static or dynamic members.
  • Subnets: Only the subnet is targeted by the configuration. Subnets are only static members.

Keep in mind that a configuration such as peering targets virtual networks, while User-Defined Routes target subnets.

I want to create a group to target all virtual networks in the dub01 scope. This group will be the basis for configuring any virtual network (except the hub) to be a secured spoke virtual network.

I created a Network Group called dub01spokes with a member type of Virtual Networks.

I then opened the Network Group and configured dynamic membership using this Azure Policy editor:

Any discovered virtual network that is not in the p-dub01 subscription and is in North Europe will be automatically added to this group.

The resulting policy is visible in Azure Policy with a category of Azure Virtual Network Manager.
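Under the covers, the wizard authors an Azure Policy definition that uses the addToNetworkGroup effect. A hedged Bicep sketch of that definition is below; the subscription ID and Network Group resource ID are placeholders, and you should compare the condition against what the wizard actually produces:

  resource dub01spokesMembership 'Microsoft.Authorization/policyDefinitions@2023-04-01' = {
    name: 'dub01spokes-membership' // hypothetical name
    properties: {
      displayName: 'Add dub01 spoke virtual networks to dub01spokes'
      mode: 'Microsoft.Network.Data' // the mode used by AVNM dynamic membership policies
      policyRule: {
        if: {
          allOf: [
            {
              field: 'type'
              equals: 'Microsoft.Network/virtualNetworks'
            }
            {
              field: 'location'
              equals: 'northeurope'
            }
            {
              // Exclude anything in the hub subscription (p-dub01).
              field: 'id'
              notContains: '<p-dub01 subscription ID>'
            }
          ]
        }
        then: {
          effect: 'addToNetworkGroup'
          details: {
            networkGroupId: '<resource ID of the dub01spokes Network Group>'
          }
        }
      }
    }
  }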

IP Address Management

I’ve been using an approach of assigning a /16 to all virtual networks in a hub & spoke for years. This approach blocks the prefix in the organisation and guarantees IP capacity for all workloads in the future. It also simplifies routing and firewall rules. For example, a single route will be needed in other hubs if we need to interconnect multiple hub-and-spoke deployments.

I can reserve this capacity in AVNM IP Address Management. You can see that I have reserved 10.1.0.0/16 for dub01:

Every virtual network in dub01 will be created from this pool.

Creating The Hub Virtual Network

I’m going to save some time/money here by creating a skeleton hub. I won’t deploy a route NVA/Virtual Network Gateway so I won’t be able to share it later. I also won’t deploy a firewall, but the private address of the firewall will be 10.1.0.4.

I’m going to deploy a virtual network to use as the hub. I can use Bicep, Terraform, PowerShell, AZ CLI, or the Azure Portal. The important thing is that I refer to the IP address pool (above) when assigning an address prefix to the new virtual network. A check box called Allocate Using IP Address Pools opens a blade in the Azure Portal. Here you can select the Address Pool to take a prefix from for the new virtual network. All I have to do is select the pool and then use a subnet mask to decide how many addresses to take from the pool (/22 for my hub).

Note that the only time that I’ve had to ask a human for an address was when I created the pool. I can create virtual networks with non-conflicting addresses without any friction.
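The same friction-free allocation is possible in code. Here is a minimal Bicep sketch of a virtual network that draws a /22 from the pool instead of hard-coding a prefix; the names and pool resource ID are placeholders, and the ipamPoolPrefixAllocations property is new enough that you should verify it against your API version:

  resource hubVnet 'Microsoft.Network/virtualNetworks@2024-05-01' = {
    name: 'hub-vnet' // hypothetical name
    location: 'northeurope'
    properties: {
      addressSpace: {
        // Ask the IPAM pool for a block instead of specifying addressPrefixes.
        ipamPoolPrefixAllocations: [
          {
            pool: {
              id: '<resource ID of the dub01 IPAM pool>'
            }
            numberOfIpAddresses: '1024' // a /22 taken from 10.1.0.0/16
          }
        ]
      }
    }
  }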

Create Connectivity Configuration

A Connectivity Configuration is a method of connecting virtual networks. We can implement:

  • Hub-spoke peering: A traditional peering between a hub and a spoke, where the spoke can use the Virtual Network Gateway/Azure Route Server in the hub.
  • Mesh: A mesh using a Connected Group (full mesh peering between all virtual networks). This is used to minimise latency between workloads with the understanding that a hub firewall will not have the opportunity to do deep inspection (performance over security).
  • Hub & spoke with mesh: The targeted VNets are meshed together for interconnectivity. They will route through the hub to communicate with the outside world.

I will create a Connectivity Configuration for a traditional hub-and-spoke network. This means that:

  • I don’t need to add code for VNet peering to my future templates.
  • No matter who deploys a VNet in the scope of dub01, they will get peered with the hub. My design will be implemented, regardless of their knowledge or their willingness to comply with the organisation’s policies.

I created a new Connectivity Configuration called dub01spokepeering.

In Topology I set the type to hub-and-spoke. I select my hub virtual network from the p-dub01 subscription as the hub Virtual Network. I then select the networks that I want to peer with the hub by selecting the dub01spokes group. I can configure the peering connections; normally I would select Hub As Gateway here, but I don’t have a Virtual Network Gateway or an Azure Route Server in the hub, so the box is greyed out.

I am not enabling inter-spoke connectivity in the above configuration – that is one of AVNM’s tricks, where it uses Connected Groups to create a mesh of peering in the fabric. Instead, I will be using routing (later) via a hub firewall for secure transitive connectivity, so I leave Enable Connectivity Within Network Group blank.

Did you notice the checkbox to delete any pre-existing peering configurations? If a peering isn’t to the hub then I’m removing it, so nobody can use their rights to bypass my networking design.
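For reference, here is roughly the same Connectivity Configuration expressed as Bicep. I built mine in the portal, so treat this as a sketch; the hub and Network Group resource IDs are placeholders:

  resource avnm 'Microsoft.Network/networkManagers@2024-05-01' existing = {
    name: 'avnm-central' // hypothetical name
  }

  resource spokePeering 'Microsoft.Network/networkManagers/connectivityConfigurations@2024-05-01' = {
    parent: avnm
    name: 'dub01spokepeering'
    properties: {
      connectivityTopology: 'HubAndSpoke'
      hubs: [
        {
          resourceId: '<resource ID of the hub virtual network>'
          resourceType: 'Microsoft.Network/virtualNetworks'
        }
      ]
      appliesToGroups: [
        {
          networkGroupId: '<resource ID of the dub01spokes Network Group>'
          groupConnectivity: 'None' // no Connected Group mesh; inter-spoke traffic routes via the firewall
          useHubGateway: 'False' // no Virtual Network Gateway/Route Server in my hub
          isGlobal: 'False'
        }
      ]
      deleteExistingPeering: 'True' // remove peerings that bypass the design
      isGlobal: 'False'
    }
  }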

I completed the wizard and executed the deployment against the North Europe region. I know that there is nothing to configure, but this “cleans up” the GUI.

Create Routing Configuration

Folks who have heard me discuss network security in Azure should have learned that the most important part of running a firewall in Azure is routing. We will configure routing in the spokes using AVNM. The hub firewall subnet(s) will have full knowledge of all other networks by design:

  • Spokes: Using system routes generated by peering.
  • Remote networks: Using BGP routes. The VPN Local Network Gateway injects “static routes” into the Azure Virtual Networks as BGP routes when BGP is not used in the VPN tunnels. Azure Route Server will peer with NVA routers (SD-WAN, for example) to propagate remote site prefixes into the Azure Virtual Networks using BGP.

The spokes routing design is simple:

  • A Route Table will be created for each subnet in the spoke Virtual Networks. Route Tables are free resources, so this per-subnet design allows customised routing for specific scenarios, such as VNet-integrated PaaS resources that require dedicated routes.
  • A single User-Defined Route (UDR) forces traffic leaving a spoke Virtual Network to pass through the hub firewall, where firewall rules will deny all traffic by default.
  • Traffic inside the Virtual Network will flow by default (directly from source to destination) and be subject to NSG rules, depending on support by the source and destination resource types.
  • The spoke subnets will be configured not to accept BGP routes from the hub; this is to prevent the spoke from bypassing the hub firewall when routing to remote sites via the Virtual Network Gateway/NVA.

I created a Routing Configuration called dub01spokerouting. In this Routing Configuration I created a Rule Collection called dub01spokeroutingrules.

A User-Defined Route, known as a Routing Rule, was created called everywhere:

The new UDR will override (deactivate) the System route to 0.0.0.0/0 via Internet and set the hub firewall as the new default next hop for traffic leaving the Virtual Network.
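As a hedged Bicep sketch (this schema is new, so verify the property names), the Rule Collection and the everywhere Routing Rule look something like this:

  resource spokeRules 'Microsoft.Network/networkManagers/routingConfigurations/ruleCollections@2024-05-01' = {
    name: 'avnm-central/dub01spokerouting/dub01spokeroutingrules' // parent names from my lab
    properties: {
      appliesTo: [
        {
          networkGroupId: '<resource ID of the dub01spokes Network Group>'
        }
      ]
      disableBgpRoutePropagation: 'True' // stop spokes learning routes that bypass the firewall
    }
  }

  resource everywhere 'Microsoft.Network/networkManagers/routingConfigurations/ruleCollections/rules@2024-05-01' = {
    parent: spokeRules
    name: 'everywhere'
    properties: {
      destination: {
        type: 'AddressPrefix'
        destinationAddress: '0.0.0.0/0' // overrides the system route to Internet
      }
      nextHop: {
        nextHopType: 'VirtualAppliance'
        nextHopAddress: '10.1.0.4' // the planned private IP of the hub firewall
      }
    }
  }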

Here you can see the Routing Collection containing the Routing Rule:

Note that Enable BGP Route Propagation is left unchecked and that I have selected dub01spokes as my target.

And here you can see the new Routing Configuration:

Completed Configurations

I now have two configurations completed and configured:

  • The Connectivity Configuration will automatically peer in-scope Virtual Networks with the hub in p-dub01.
  • The Routing Configuration will automatically configure routing for in-scope Virtual Network subnets to use the p-dub01 firewall as the next hop.

Guess what? We have just created a Zero Trust network! All that’s left is to set up spokes with their NSGs and a WAF/WAFs for HTTPS workloads.

Deploy Spoke Virtual Networks

We will create spoke Virtual Networks from the IPAM block just like we did with the hub. Here’s where the magic is going to happen.

The evaluation-style Azure Policy assignments that are created by AVNM will run approximately every 30 minutes. That means a new Virtual Network won’t be discovered straight after creation – but they will be discovered not long after. A signal will be sent to AVNM to update group memberships based on added or removed Virtual Networks, depending on the scope of each group’s Azure Policy. Configurations will be deployed or removed immediately after a Virtual Network is added or removed from the group.

To demonstrate this, I created a new spoke Virtual Network in p-demo01. I created a new Virtual Network called p-demo01-net-vnet in the resource group p-demo01-net:

You can see that I used the IPAM address block to get a unique address space from the dub01 /16 prefix. I added a subnet called CommonSubnet with a /28 prefix. What you don’t see is that I configured the following for the subnet in the subnet wizard:

As you can see, the Virtual Network has not been configured by AVNM yet:

We will have to wait for Azure Policy to execute – or we can force a scan to run against the resource group of the new spoke Virtual Network:

  • Az CLI: az policy state trigger-scan --resource-group <resource group name>
  • PowerShell: Start-AzPolicyComplianceScan -ResourceGroupName <resource group name>

You could add a command like above into your deployment code if you wished to trigger automatic configuration.

This force process is not exactly quick either! 6 minutes after I forced a policy evaluation, I saw that AVNM was informed about a new Virtual Network:

I returned to AVNM and checked out the Network Groups. The dub01spokes group has a new member:

You can see that a Connectivity Configuration was deployed. Note that the summary doesn’t have any information on Routing Configurations – that’s an oversight by the AVNM team, I guess.

The Virtual Network does have a peering connection to the hub:

The routing has been deployed to the subnet:

A UDR has been created in the Route Table:

Over time, more Virtual Networks are added and I can see from the hub that they are automatically configured by AVNM:

Summary

I have done presentations on AVNM and demonstrated the above configurations in 40 minutes at community events. You could deploy the configurations in under 15 minutes. You can also create them using code! With this setup we can take control of our entire Azure networking deployment – and I didn’t even show you the Admin Rules feature for essential “NSG” rules (they aren’t NSG rules but use the same underlying engine to execute before NSG rules).

Want To Learn More?

Check out my company, Cloud Mechanix, where I share this kind of knowledge through:

  • Consulting services for customers and Microsoft partners using a build-with approach.
  • Custom-written and ad-hoc Azure training.

Together, we can educate your team and bring great Azure solutions to your organisation.

Designing An Azure Hub Virtual Network

In this post, I am going to share a process for designing a hub virtual network for a hub & spoke secured virtual network deployment in Microsoft Azure.

The process I lay out in this document will not work for everyone. However, I think, based on experience, that very few organisations will find exceptions to this process.

What Is And Is Not In This Post

This post is going to focus on the process of designing a hub virtual network. You will not find a design here … that will come in a later post.

You will also not find any mention of Azure Virtual WAN. You DO NOT need to use Azure Virtual WAN to do SD-WAN, despite the claptrap in Microsoft’s documentation on this topic. Virtual WAN also:

  • Restricts your options on architecture, features, and network design.
  • Is a nightmare to troubleshoot because the underlying virtual network is hidden in a Microsoft tenant.

Rules Of Engagement

The hub will be your network core in a network stamp: a hub & spoke. The hub & spoke will contain networks in a single region, following these concepts:

  • Resilience & independence: Workloads in a spoke in North Europe should not depend on a hub in West Europe.
  • Micro-segmentation: Workloads in North Europe trying to access workloads in West Europe should go through a secure route via hubs in each region.
  • Performance: Workload A in North Europe should not go through a hub in West Europe to reach Workload B in North Europe.
  • Cost Management: Minimise global VNet peering to just what is necessary. Enable costs of hubs to be split into different parts of the organisation.
  • Delegation of Duty: If there are different network teams, enable each team to manage their hubs.
  • Minimised Resources: The hub has roles only of transit, connectivity, and security. Do not place compute or other resources into the hub; this is to minimise security/networking complexity and increase predictability.

A Hub Design Process

The core of our Azure network will have very little in the way of resources. What can be (not “must be”) included in that hub can be thought of as functions:

  • Site-to-site networking: VPN, ExpressRoute, and SD-WAN.
  • Point-to-site VPN: Enabling individuals to connect to the Azure networks using a VPN client on their device.
  • Firewall: Providing security for ingress, egress, and inter-workload communications.
  • Virtual Machines: Reduce costs of secured RDP/SSH by deploying Azure Bastion in the hub.

If we are doing a high-level design, we have two questions that we will ask about each of these functions:

  • Is the function required?
  • What technology will be used?

We won’t get into tiers/SKUs, features, or configurations just yet; that’s when we get into low-level or detailed design.

One can use the following flow chart to figure out what to use – it’s a bit of an eye test so you might need to open the image in another tab:

Site-to-Site (S2S) Networking

While it is very commonly used, not every organisation requires site-to-site connectivity to Azure.

For example, I had a migration customer that was (correctly) modernising to the “top tier” of cloud computing by migrating from legacy apps to SaaS. They wanted to re-implement an SD-WAN for over 100 offices to connect their new and small Azure footprint. I was the lead designer so I knew their connectivity requirements – they were going to use Azure Virtual Desktop (AVD) only to connect to their remaining legacy apps. AVD doesn’t need a site-to-site connection. I was able to save that organisation from entering into a costly managed SD-WAN services contract and instead focus on Internet connectivity – not long after, they shut down their Azure footprint when SaaS alternatives were found for the last legacy applications.

If we establish that site-to-site connectivity is required then we must ask the first question:

Are latency and SLA important?

If the answer to either of these items is “yes” then there is no choice: An ExpressRoute Virtual Network Gateway is required.

If the answer is no, then we are looking at some kind of VPN connectivity. We can ask another question to determine the type of solution:

Will there be a small number of VPN connections?

If a small number of VPN connections is required, the Azure VPN Virtual Network Gateway is suitable – consider the SKUs/sizes and complexities of management to determine what “a small number” is.

If you determine that the VPN Virtual Network Gateway is unsuitable then an SD-WAN network virtual appliance (NVA) should be used. Note that it is recommended to deploy Azure Route Server with a third-party VPN/SD-WAN appliance to enable propagation of network prefixes:

  • Azure > SD-WAN
  • SD-WAN > Azure

You may find that you need one or more of the above solutions! For example:

  • Some ExpressRoute customers may opt to deploy a parallel VPN tunnel with an identical routing configuration over a completely different ISP. This enables automatic failover from ExpressRoute to VPN in the event of a circuit failure.
  • An SD-WAN customer may also have ExpressRoute for some offices/workloads where SLA or latency are important. Another consideration may be that one workload has other technical requirements that only ExpressRoute (Direct) can service such as very high throughput.

You have one more question to ask after you have picked the site-to-site component(s):

Will you require site-to-site transit through Azure via the site-to-site network connections?

In other words, should Remote Site A be able to route to Remote Site B using your Azure site-to-site connections? If the answer is yes then you must deploy Azure Route Server to enable that routing.

Point-To-Site (P2S) VPN

I personally have not deployed this solution very much, but I do hear it being discussed quite a bit. Some organisations must enable users (or external suppliers) to create a VPN connection from their individual devices to Azure. If this is required then you must ask:

Is the scenario(s) simple?

I’ve kept that vague because the problem is vague. There are two solutions with one being overly-simplistic in capabilities and the other being more fully-featured.

The Azure VPN Gateway (also used for site-to-site VPN) offers a highly available (it’s an Azure resource) solution for P2S VPN. It offers different configurations for authentication and device support. But it is very limited. For example, it has no routing rules to restrict which users get access to which networks. This means that if you grant network (firewall/NSG) access to one user via the VPN address pool, you must grant the same access to all users, which is clearly pretty poor if you have many types/roles of remote VPN clients (IT, developer of workload X, developer of workload Y, Vendor A, Vendor B, etc.).

In such scenarios, one should consider a third-party NVA for point-to-site networking. Third-party NVAs may offer more features for P2S VPN than the VPN Virtual Network Gateway.

A P2S NVA may reside in the same hub as a VPN Virtual Network Gateway (and other S2S solutions).

It’s not in the diagram but you should also consider Entra Global Secure Access as an alternative to P2S VPN. The Private Network Connector would be deployed in a spoke(s), not the hub.

Firewall

Is a firewall required? The correct answer for anyone considering a hub & spoke architecture should be “of course it is”. But you might not like security, so we’ll ask that question anyway.

Once you determine that security is important to your employer, you must ask yourself:

Shall I use a native PaaS firewall?

The native PaaS solution in Azure is Azure Firewall. I have many technical reasons to prefer Azure Firewall over third-party alternatives. For consultants, a useful attribute of Azure Firewall is that you can skill up on one solution that you can implement/use/manage for many customers and projects (migrations) won’t face repeated delays as you wait on others to implement rules in third-party firewalls.

If you want to use a different firewall then you are free to do so.

If you are using Azure Firewall then there is a follow-up question if there will be S2S network connections:

Are the remote networks using non-RFC1918 address prefixes?

In other words, do the remote networks use address prefixes outside of:

  • 192.168.0.0/16
  • 172.16.0.0/12
  • 10.0.0.0/8

If they do then Azure Firewall requires some configuration because traffic to non-RFC1918 prefixes is forced to the Internet by default – they are Internet addresses after all! You can statically configure the prefixes if they do not change. Or …

  • If you are using Azure Route Server
  • The prefixes can change a lot thanks to scenarios such as acquisition or rapid growth

… you can (in preview today) configure integration between Azure Firewall and Azure Route Server so the firewall dynamically learns the address prefixes from the remote networks.
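As a sketch of the static option, the Firewall Policy resource has a snat block where the “private” prefixes are listed; a minimal example is below, assuming a remote network on 100.64.0.0/10 (verify the property names for your API version):

  resource fwPolicy 'Microsoft.Network/firewallPolicies@2024-05-01' = {
    name: 'hub-firewall-policy' // hypothetical name
    location: 'northeurope'
    properties: {
      snat: {
        // Traffic to these prefixes is treated as private (routed, not SNATed to the Internet).
        privateRanges: [
          'IANAPrivateRanges' // shorthand for the three RFC1918 blocks
          '100.64.0.0/10' // example non-RFC1918 prefix used by remote sites
        ]
        autoLearnPrivateRanges: 'Disabled' // 'Enabled' (preview) learns prefixes via Azure Route Server
      }
    }
  }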

Virtual Machines

Do not put compute in the hub!

This scenario asks:

Will any of the workloads in your spoke virtual networks have virtual machines?

You will have virtual machines even if you “ban” virtual machines – I guarantee that they will eventually appear for things like security solutions, self-hosted agents, Azure Virtual Desktop, AKS, and so on.

Unfortunately, many consider secure remote access (SSH/RDP) to be opening a port in the firewall for TCP 22/3389. That is not considered secure because those protocols can be and have been attacked. In the past, those who took security seriously used a dedicated “jump box” or “bastion host” to isolate vulnerable on-premises machines from assets in the data centre. We can use the same process with Azure Bastion where there is no IaaS requirement – we leverage Entra security features to authenticate the connection request and the guest OS credentials to verify VM access.

One can deploy Bastion in a spoke – that is perfectly valid for some scenarios. However, many important features are only in the paid-for SKUs so you might wish to deploy a shared Azure Bastion. Unfortunately, routing restrictions by Bastion prevent deploying a shared Bastion in a spoke, so we have no choice but to deploy a shared Azure Bastion in a hub. If you wish to share an Azure Bastion across workloads then it will be the final component in the hub.

If/when Azure Bastion supports route tables in the AzureBastionSubnet I will recommend moving shared Bastion deployments to a spoke – yes, I know that we can do that with Azure Virtual WAN but there are many things that we cannot do with Azure Virtual WAN.

You could consider a third-party alternative or a DIY bastion solution. If so, place that into a spoke because it will be compute-based.

Wrapping Up

As you can see, the high-level design of the hub is very simple.

There are few functions in it because when you understand Azure virtual networks, routing, and NSGs, then you understand that designing a secure network should not be complex. Complexity is the natural predator of manageability and dependable security. There is a little more detail when we get into a low-level or detailed design, but that’s a topic for another day.

Manage Existing Azure Firewall With Firewall Policy Using Bicep

In this post, I want to discuss how I recently took over the management of an existing Azure Firewall using Firewall Policy/Azure Firewall Manager and Bicep.

Background

We had a customer set up many years ago using our old templated Azure deployment based on ARM. At the centre of their network is Azure Firewall. That firewall plays a big role in the customer’s micro-segmented network, with over 40,000 lines of ARM code defining the many firewall rules.

The firewall was deployed before Azure Firewall Manager (AFM) was released. AFM is a pretty GUI that enables the management of several Azure networking resource types, including Azure Firewall. But when it comes to managing the firewall, AFM uses a resource called Firewall Policy; you don’t have to touch AFM at all – you can deploy a Firewall Policy, link the firewall to it (via Resource ID), and edit the Firewall Policy directly (Azure Portal or code) to manage the firewall settings and rules.


One of the nicest features of Azure Firewall is a result of it being an Azure PaaS resource. Like almost every other resource type (there are occasional exceptions), Azure Firewall is completely manageable via code. Not only can you deploy the firewall; you can operate it on a day-to-day basis using ARM/Bicep/Terraform/Pulumi if you want: the settings and the firewall rules. That means you can have complete change control and rollback using the features of Git in DevOps, GitHub, etc.


All new features in Azure Firewall have surfaced only via Firewall Policy since the general availability release of AFM. A legacy Azure Firewall that doesn’t have a Firewall Policy is missing many security and management features. The team that works regularly with this customer approached me about adding Firewall Policy to the customer’s deployment and including that in the code.

The Old Code

As I said before, the old code was written in ARM. I won’t get into it here, but we couldn’t add the required code to do the following without significant risk:

  • A module for Firewall Policy
  • Updating the module for Azure Firewall to include the link to the Firewall Policy.

I got a peer to give me a second opinion and he agreed with my original assessment. We should:

  1. Create a new set of code to manage the Azure Firewall using Bicep.
  2. Introduce Firewall Policy via Bicep.
  3. Remove the ARM module for Azure Firewall from the ARM code.
  4. Leave the rest of the hub as is (ARM) because this is a mission-critical environment.

The High-Level Plan

I decided to do the following:

  1. Set up a new repo just for the Azure Firewall and Firewall Policy.
  2. Deploy the new code in there.
  3. Create a test environment and test like crazy there.
  4. The existing Azure Firewall public IP could not change because it was used in DNAT rules and by remote parties in their firewall rules.
  5. We agreed that there should be “no” downtime in the process but I wanted time for a rollback just in case. I would create non-parameterised ARM exports of the entire hub, the GatewaySubnet route table (critical to routing intent and a risk point in this kind of work), and the Azure Firewall. Our primary rollback plan would be to run the unmodified ARM code to restore everything as it was.

The Build

I needed an environment to work in. I did a non-parameterised export of the hub, including the Azure Firewall. I decompiled that to Bicep and deployed it to a dedicated test subscription. This did require some clean-up:

  • The public IP of the firewall would be different so DNAT rules would need a new destination IP.
  • Every rules collection group (many hundreds of them) had a resource ID that needed to be removed – see regex searches in Visual Studio Code.

The deployment into the test environment was a two-stage job – I needed the public IP address to obtain the destination address value for the DNAT rules.

Now I had a clone of the production environment, including all the settings and firewall rules.

The Bicep Code

I’ve been doing a lot of Bicep since the Spring of this year (2024). I’ve been using Azure Verified Modules (AVM) since early Summer – it’s what we’ve decided should be our standard approach, emulating the styling of Azure Verified Solutions.


We don’t use Microsoft’s landing zones. I have dug into them and found a commonality. The code is too impressive. The developer has been too clever. Very often, “customer configuration” is hard-coded into the Bicep. For example, the image template for Azure Image Builder (in the AVD landing zone) is broken up across many variables which are unioned until a single variable is produced. The image template is a file that should be easy to get at and is commonly updated.

A managed service provider knows that architecture (the code) should be separated from customer configuration. This allows the customer configuration to be frequently updated separately from the architecture. And, in turn, it should be possible to update the architecture without having to re-import the customer configuration.


My code design is simple:

  • Main.bicep, which deploys the Azure Firewall (AVM) and the Firewall Policy (AVM).
  • A two-property parameter controls the true/false (bool) condition of whether or not the two resources are deployed.
  • A main.bicepparam supplies parameters to configure the SKUs/features/settings of the Azure Firewall and Firewall Policy using custom types (enabling complete Intellisense in VS Code).
  • A simple module documents the Rules Collections in a single array. This array is returned as an output to main.bicep and fed as a single value to the Firewall Policy module.

I did attempt to document the Rules Collections as ARM and use the Bicep function to load an ARM file. This was my preference because it would simplify producing the firewall rules from the Azure Portal and inputting them into the file, both for the migration and for future operations. However, the Bicep function to load a file is limited to too few characters. The eventual Rules Collection Group module had over 40,000 lines!
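To make that layout concrete, here is a stripped-down, hypothetical sketch of main.bicep; the module paths, parameter names, and AVM versions are illustrative rather than our actual code:

  // The two-property parameter controlling what gets deployed.
  param deploy object = {
    firewallPolicy: true
    firewall: true
  }

  // A simple module that returns the Rules Collection Groups as one array output.
  module rules 'modules/rulesCollectionGroups.bicep' = {
    name: 'rules'
  }

  module firewallPolicy 'br/public:avm/res/network/firewall-policy:<version>' = if (deploy.firewallPolicy) {
    name: 'firewallPolicy'
    params: {
      name: 'hub-firewall-policy'
      ruleCollectionGroups: rules.outputs.ruleCollectionGroups // the single array value
    }
  }

  module firewall 'br/public:avm/res/network/azure-firewall:<version>' = if (deploy.firewall) {
    name: 'firewall'
    params: {
      name: 'hub-firewall'
      firewallPolicyId: firewallPolicy.outputs.resourceId // links the firewall to the policy
    }
  }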

My test process eventually gave me a clean result from start to finish.

The Migration

The migration was scheduled for late at night. Earlier in the afternoon, a freeze was put in place on the firewall rules. That enabled me to:

  1. Use Azure Firewall Manager to start the process of producing a Firewall Policy. I chose the option to import the rules from the existing production firewall. I then clicked the link to export the rules to ARM and saved the file locally.
  2. I decompiled the ARM code to Bicep. I copied and pasted the 3 Rules Collection Groups into my Rules Collection Group module.
  3. I then ran the deployment with no resources enabled. This told me that the pipeline was functioning correctly against the production environment.
  4. When the time came, I made my “backups” of the production hub and firewall.
  5. I updated the parameters to enable the deployment of the Firewall Policy. That was a quick run – the Azure Firewall was not touched so there was no update to the firewall. This gave me one last chance to compare the firewall settings and rules before the final steps began.
  6. I removed the DNS settings from the Azure Firewall. I found in testing that I could not attach a Firewall Policy to an Azure Firewall if both contained DNS settings, so I had to remove those settings from the production firewall. This could have caused some downtime to any clients using the firewall as their DNS server, but that feature had not been rolled out yet.
  7. I updated the parameters to enable management of the Azure Firewall. The code here included the name of the in-place Public IP Address. The parameters also included the resource IDs of the hub Virtual Network and the Log Analytics Workspace (Resource Specific tables in the code). The pipeline ran … this was the key part because the Bicep code was updating the firewall with the resource ID of the Firewall Policy. Everything worked perfectly … almost … the old diagnostics settings were still there and had to be removed because the new code used a new naming standard. One quick deletion and a re-run and all was good.
  8. One of my colleagues ran a bunch of pre-documented and pre-verified tests to confirm that all was well.
  9. I then commented out the code for the Azure Firewall from the old ARM code for the hub. I re-ran the pipeline and cleaned up some errors until we had a repeated clean run.

The technical job was done:

  • Azure Firewall was managed using a Firewall Policy.
  • Azure Firewall had modern diagnostics settings.
  • The configuration is being done using code (Bicep).

You might say “Aidan, there’s a PowerShell script to do that job”. Yes, there is, but it wasn’t going to produce the code that we needed to leave in place. This task did the work and has left the customer with code that is extremely flexible, with every resource property available as a mandatory/optional property through a documented type specific to the resource type. As long as no bugs are found, the code can be used as is to configure any settings/features/rules in Azure Firewall or Azure Firewall Manager, either through the parameters files (SKUs and settings) or the Rules Collection Groups module (firewall rules).

Azure Firewall Deep Dive Training

If you thought that this post was interesting then please do check out my Azure Firewall Deep Dive course that is running on February 12th – February 13th, 2025 from 09:30-16:00 UK/Irish time/10:30-17:00 Amsterdam/Berlin time. I’ve run this course twice in the last two weeks and the feedback has been super.

Azure Firewall Deep Dive Training

I’ll tell you about my new virtual training course on Azure Firewall and share some schedule information in this post.

Background

I’ve been talking about Azure Firewall for years. I’ve done lots of sessions at user groups and conferences. I’ve done countless handovers with customers and colleagues. One of my talking points is that I reckoned that I could teach someone with a little Azure/networking knowledge everything there is to know about Azure Firewall in 2 days. And that’s what I decided to do!

I was updating one of my sessions earlier in the year when I realised that it was pretty much the structure of a training course. Instead of just listing out features or barely discussing architecture to squeeze it into a 45-60 minute-long session, I could take the time to dive deep and share all that I know or could research.

The Course

I produced a 2-day course that could be taught in-person, but my primary vector is virtual/online – it’s hard to get a bunch of people from all over into one place, and there is also a cost to me in hosting a physical event that would increase the cost of the course. I decided that virtual was best, with an option of doing it in person if a suitable opportunity arose.

The course content is delivered using a combination of presentation and demo. Presentation lets me explain the whats, whys, and so on. Demonstration lets me show you how.

The demo lab is built from a Bicep deployment, based on Azure Verified Modules (AVM). A hub & spoke network architecture is created with an Application Gateway, a simple VM workload, and a simple App Services (Private Endpoint) workload. The demonstrations follow a “hands-on guide”; this guide is written as if this was a step-by-step hands-on course, instructing the reader exactly which button to click and what/where to type. Each exercise builds on the last, eventually resulting in a secure network architecture with all of the security, monitoring, and management bells and whistles.

Why did I opt for demo instead of hands-on? Hands-on works for in-person classes. But you cannot assist in the same way when people struggle. In addition, waiting for attendees to complete labs would add another day (and cost) to the class.

Before each class, I share all of the content that I use:

  • System requirements and setup instructions.
  • The Bicep files for the demo lab.
  • The hands-on lab instructions
  • The PowerPoint
  • And a few more useful bits

I always update content – for example, my first run of this class was during Microsoft Ignite 2024 and I added a few bits from the news. Therefore I share the updated content with attendees after the course.

The First Run

I ran the class for the first time earlier this week, November 20-21, 2024. Attendees from all around Europe joined me for 2 days. At first they were quiet. Online is tough for speakers like me because I look for visual feedback on how I’m doing. But then the questions started coming – people were interested in what I was saying. Interaction also makes the class more interesting for me – sometimes you get comments that cover things you didn’t originally include and everyone benefits – I updated the course with one such item at the end of day 1!

I shared a 4-question anonymous survey to learn what people thought. The feedback was awesome.

Feedback

This course was previously run in November 2024 for a European audience. The survey feedback was as follows:

How would you rate this course?

  • Excellent: 83%
  • Good: 17%

Was This Course Worth Your Time?

  • Yes: 100%

Would you recommend this course to others?

  • Yes: 100%

Some of the comments:

“I think it is a very good introduction to Azure Firewall, but it goes beyond foundational concepts so medium- experienced admins will also get value from this. I like the sections on architecture and explanations of routing and DNS. I think this course will enable people to do a good job more than for example az 700 because of the more practical approach. You are good at explaining the material”.

“Just what I wanted from a Deep dive course.”

“Perfectly delivered. Crystal clear content and very well explained”.

Future Classes

I have this class scheduled for two more runs, each timed for different parts of the world:

The classes are ultra-affordable. A few hundred Euros/dollars gets you custom content based on real-world usage. I did find a virtual 2-day course on Palo Alto firewalls that cost $1700! You’ll also find that I run early-bird registration costs and discounts for more than one booking. If you have a large group (5+) then we might be able to figure out a lower rate 🙂

More To Come

More classes are coming! I have an old one to reinvent based on lots of experience over the years and at least 1 new one to write from scratch. Watch out for more!

Network Rules Versus Application Rules for Internal Traffic

This post is about using either Network Rules or Application Rules in Azure Firewall for internal traffic. I’m going to discuss a common scenario, a “problem” that can occur, and how you can deal with that problem.

The Rules Types

There are three kinds of rules in Azure Firewall:

  • DNAT Rules: Control traffic originating from the Internet, directed to a public IP address attached to Azure Firewall, and translated to a private IP Address/port in an Azure virtual network. This is implicitly applied as a Network Rule. I rarely use DNAT Rules – most modern applications are HTTP/S and enter the virtual network(s) via an Application Gateway/WAF.
  • Application Rules: Control traffic going to HTTP, HTTPS, or MSSQL (including Azure database services).
  • Network Rules: Control anything going anywhere.

The Scenario

We have an internal client, which could be:

  • On-premises over private networking
  • Connecting via point-to-site VPN
  • Connected to a virtual network, either the same as the Azure Firewall or to a peered virtual network.

The client needs to connect, with SSL/TLS authentication, to a server. The server is connected to another virtual network/subnet. The route to the server goes through the Azure Firewall. I’ll complicate things by saying that the server is a PaaS resource with a Private Endpoint – this doesn’t affect the core problem but it makes troubleshooting more difficult 🙂

NSG rules and firewall rules have been accounted for and created. The essential connection is either HTTPS or MSSQL and is implemented as an Application Rule.

The Problem

The client attempts a connection to the server. You end up with some type of application error stating that there was either a timeout or a problem with SSL/TLS authentication.

You begin to troubleshoot:

  • Azure Firewall shows the traffic is allowed.
  • NSG Flow Logs show nothing – you panic until you remember/read that Private Endpoints do not generate flow logs – I told you that I’d complicate things 🙂 You can consider VNet Flow Logs to try to capture this data and then you might discover the cause.

You experiment and discover two things:

  • If you disconnect the NSG from the subnet then the connection is allowed. Hmm – the rules are correct so the traffic should be allowed: traffic from the client prefix(es) is permitted to the server IP address/port. The NSG rules match the firewall rules.
  • You change the Application Rule to a Network Rule (with the NSG still associated and unchanged) and the connection is allowed.

So, something is going on with the Application Rules.

The Root Cause

In this case, the Application Rule is SNATing the connection. In other words, when the connection is relayed from the Azure Firewall instance to the server, the source IP is no longer that of the client; the source IP address is a compute instance in the AzureFirewallSubnet.

That is why:

  • The connection works when you remove the NSG
  • The connection works when you use a Network Rule with the NSG – the Network Rule does not SNAT the connection.

Solutions

There are two solutions to the problem:

Using Application Rules

If you want to continue to use Application Rules then the fix is to modify the NSG rule. Change the source IP prefix(es) to be the AzureFirewallSubnet.

The downsides to this are:

  • The NSG rules are inconsistent with the Azure Firewall rules.
  • The NSG rules are no longer restricting traffic to documented approved clients.

Using Network Rules

My preference is to use Network Rules for all inbound and east-west traffic. Yes, we lose some of the “Layer-7” features but we still have core features, including IDPS in the Premium SKU.

Contrary to using Application Rules:

  • The NSG rules are consistent with the Azure Firewall rules.
  • The NSG rules restrict traffic to the documented approved clients.
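To illustrate, a Network Rule for the scenario might look like this in a Firewall Policy Rules Collection Group; the policy name, addresses, and rule names are made up:

  resource eastWest 'Microsoft.Network/firewallPolicies/ruleCollectionGroups@2024-05-01' = {
    name: 'hub-firewall-policy/east-west' // hypothetical policy/group names
    properties: {
      priority: 300
      ruleCollections: [
        {
          ruleCollectionType: 'FirewallPolicyFilterRuleCollection'
          name: 'internal-allow'
          priority: 100
          action: {
            type: 'Allow'
          }
          rules: [
            {
              // A Network Rule does not SNAT private traffic, so the NSG still sees the client IP.
              ruleType: 'NetworkRule'
              name: 'client-to-server-tls'
              ipProtocols: [
                'TCP'
              ]
              sourceAddresses: [
                '10.10.1.0/24' // the documented client prefix
              ]
              destinationAddresses: [
                '10.10.8.4' // the server/Private Endpoint address
              ]
              destinationPorts: [
                '443'
              ]
            }
          ]
        }
      ]
    }
  }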

When To Use Application Rules?

In my sessions/classes, I teach:

  • Use DNAT rules for the rare occasion where Internet clients will connect to Azure resources via the public IP address of Azure Firewall.
  • Use Application Rules for outbound connections to Internet, including Azure resources via public endpoints, through the Azure Firewall.
  • Use Network Rules for everything else.

This approach limits “weird sh*t” errors like what is described above and means that NSG rules are effectively clones of the Azure Firewall rules, with some additional stuff to control stuff inside of the Virtual Network/subnet.
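To round out the picture, here is the outbound-to-Internet case as an Application Rule – again a sketch with made-up names and an example FQDN:

  resource egress 'Microsoft.Network/firewallPolicies/ruleCollectionGroups@2024-05-01' = {
    name: 'hub-firewall-policy/egress' // hypothetical policy/group names
    properties: {
      priority: 400
      ruleCollections: [
        {
          ruleCollectionType: 'FirewallPolicyFilterRuleCollection'
          name: 'internet-allow'
          priority: 100
          action: {
            type: 'Allow'
          }
          rules: [
            {
              // Application Rules SNAT the connection, which is fine for Internet egress.
              ruleType: 'ApplicationRule'
              name: 'ubuntu-updates'
              protocols: [
                {
                  protocolType: 'Https'
                  port: 443
                }
              ]
              targetFqdns: [
                'azure.archive.ubuntu.com' // example FQDN
              ]
              sourceAddresses: [
                '10.10.1.0/24'
              ]
            }
          ]
        }
      ]
    }
  }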

Default Outbound Access For VMs In Azure Will Be Retired

Microsoft has announced that default outbound access, the implicit public IP address, is being deprecated on 30 September 2025.

Background

Let’s define “Internet” for the purposes of this post. The Internet includes:

  • The actual Internet.
  • Azure services, such as Azure SQL or Azure’s KMS for Windows VMs, that are shared with a public endpoint (IP address).

We have had ways to access those services, including:

  • Public IP address associated with a NIC of the virtual machine
  • Load Balancer with a public IP address with the virtual machine being a backend
  • A NAT Gateway
  • An appliance, such as a firewall NVA or Azure Firewall, being defined as the next hop to Internet prefixes, such as 0.0.0.0/0

If a virtual machine is deployed without having any of the above, it still needs to reach the Internet to do things like:

  • Activate a Windows license against KMS
  • Download packages for Ubuntu
  • Use Azure services such as Key Vault, Azure SQL, Azure Database for MySQL, or storage accounts (diagnostics settings)

For that reason, all Azure virtual machines are able to reach the Internet using an implied public IP address. This is an address that is randomly assigned to SNAT the connection out from the virtual machine to the Internet. That address:

  • Is random and can change
  • Offers no control or security

Modern Threats

There are two things that we should have been designing networks to stop for years:

  • Malware command and control
  • Data exfiltration

The modern hack is a clever and gradual process. Ransomware is not some dumb bot that gets onto your network and goes wild. Some of the recent variants are manually controlled. The malware gets onto the network and attempts to call home to a “machine” on the Internet. From there, the controllers can explore the network and plan their attack. This is the command and control. This attempt to “call home” should be blocked by network/security designs that block outbound access to the Internet by default, opening only connections that are required for workloads to function.

The controller will discover more vulnerabilities and download more software, taking further advantage of vulnerable network/security designs. Backups are targeted for attack first, data is stolen, and systems are crippled and encrypted.

The data theft, or exfiltration, is to an IP address that a modern network/security design would block.

So you can see that a network design where an implied public IP address is used is not good practice. This is a primary consideration for Microsoft in making its decision to end the future use of implied public IP addresses.

What Is Happening?

On September 30th, 2025, all future virtual machines will no longer be able to use an implied public IP address. Existing virtual machines will be unaffected – but I want to drill into that because it’s not as simple as one might think.

A virtual machine is a resource in Azure. It’s not some disks. It’s not your concept of “I have something called X” that is a virtual machine. It’s a resource that exists. At some point, that resource might be removed. At that point, the virtual machine no longer exists, even if you recreate it with the exact same disks and name.

So keep in mind:

  • Virtual networks with existing VMs: The existing VMs are unaffected, but new VMs in the VNet will be affected and won’t work.
  • Scale-out: Let’s say you have a big workload with dozens of VMs with no public IP usage. You add more VMs and they don’t work – it’s because they don’t have an implied IP address, unlike their older siblings.
  • Restore from backup: You restore a VM to create a new VM. The new VM will not have an implied public IP address.

Is This a Money Grab?

No, this is not a money grab. This is an attempt by Microsoft to correct a “wrong” (it was done to be helpful to cloud newcomers) that was done in the original design. Some of the mitigations are quite low-cost, even for small businesses. To be honest, what money could be made here is pennies compared to the much bigger money that is made elsewhere by Azure.

The goal here is to:

  • Be secure by default by controlling egress traffic to limit command & control and data exfiltration.
  • Provide more control over egress flows by selecting the appliance/IP address that is used.
  • Enable more visibility over public IP addresses, for example, what public address should I share with a partner for their firewall rules?
  • Drive better networking and security architectures by default.

What Is Your Mitigation?

There are several paths that you can choose.

  1. Assign a public IP address to a virtual machine: This is the lowest cost option but offers no egress security. It can get quite messy if multiple virtual machines require public IP addresses. Rate this as “better than nothing”.
  2. Use a NAT Gateway: This allows a single IP address (or a range from an Azure Public IP Address Prefix) to be shared across an entire subnet. Note that NAT Gateway gets messy if you span availability zones, requiring disruptive VNet and workload redesign. Again this is not a security option.
  3. Use a next hop: You can use an appliance (virtual machine or Marketplace network virtual appliance) or the Azure Firewall as a next hop to the Internet (0.0.0.0/0) or specific Internet IP prefixes – a routing sketch follows this list. This is a security option – a firewall can block unwanted egress traffic. If you are budget-conscious, then consider Azure Firewall Basic. No matter what firewall/appliance you choose, there will be some subnet/VNet redesign and changes required to routing, which could affect VNet-integrated PaaS services such as API Management Premium.
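As promised, here is a minimal Bicep sketch of the routing piece of option 3: a Route Table with a default route pointing at a firewall. The name and next hop address are made up; associate the Route Table with each workload subnet so that egress flows hit the firewall instead of the retired implicit public IP address:

  resource egressRoutes 'Microsoft.Network/routeTables@2024-05-01' = {
    name: 'workload-egress-rt' // hypothetical name
    location: 'northeurope'
    properties: {
      routes: [
        {
          name: 'everywhere'
          properties: {
            addressPrefix: '0.0.0.0/0' // all Internet-bound traffic...
            nextHopType: 'VirtualAppliance'
            nextHopIpAddress: '10.0.0.4' // ...goes to the firewall's private IP
          }
        }
      ]
    }
  }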

September 2025 is a long time away. But you have options to consider and potentially some network redesign work to do. Don’t sit around – start working.

In Summary

The implied route to the Internet for Azure VMs will stop being available to new VMs on September 30th, 2025. This is not a money grab – you can choose low-cost options to mitigate the effects if you wish. The hope is that you opt to choose better security, either from Microsoft or a partner. The deadline is a long time away. Do not assume that you are not affected – one day you will expand services or restore a VM from backup and be affected. So get started on your research & planning.

Azure Infrastructure Announcements – August 2023

This post brings you a summary of the infrastructure announcements from Azure that were made during August 2023. There are lots of announcements from Storage and a few interesting notes for VMs, networking, and ASR.

Storage

Azure Managed Lustre: not your grandparents’ parallel file system

With a few clicks of a web interface or an Azure Resource Manager template, AMLFS lets you provision an all-flash Lustre file system in minutes. What’s different is that this Lustre file system is all yours. If someone else in Azure is running a job that creates a million files, you won’t ever know it because your Lustre servers and SSDs are exclusively yours.

Massively scaled and high performance file systems for HPC workloads.

General availability | Azure NetApp Files: SMB Continuous Availability (CA) shares

To enhance resiliency during storage service maintenance operations, SMB volumes used by Citrix App Layering, FSLogix user profile containers and Microsoft SQL Server on Microsoft Windows Server can be enabled with Continuous Availability

SMB Transparent Failover means that clients should not notice maintenance operations.

Public preview: Azure Storage Mover support for SMB and Azure Files

Storage Mover is a fully managed migration service that enables you to migrate on-premises files and folders to Azure Storage while minimizing downtime for your workload. Azure Storage Mover can now migrate your SMB shares to Azure file shares.

To be honest, I’ve not encountered a “replace the file server with Azure Files” scenario yet. Third-party vendors often won’t support it for LOB apps. User data typically ends up in SharePoint/OneDrive. And wouldn’t most Citrix/RDS admins want to start with new profiles?

Generally available: Azure Blob Storage Cold Tier

Azure Blob Storage Cold Tier is now generally available. It is a new online access tier that is the most cost-effective Azure Blob offering for storing infrequently accessed data with long-term retention requirements, while providing instant access. The pricing of the cold tier storage option lies between the cool and archive tiers, and it follows a 90-day early deletion policy. You can seamlessly utilize the cold tier in the same way as the hot and cool tiers.

Cool – Cold. Tell me that isn’t confusing. The scenario is that you want to store data for a long time, but you need it immediately available. Archive requires a 15-hour restore (“rehydration”) that can be accelerated with a charge. Cold is one step up, but not as cost-effective.

Public Preview: Azure NetApp Files Cloud Backup for Virtual Machines

With Cloud Backup for Virtual Machines, you can now create VM consistent snapshot backups of VMs on Azure NetApp Files datastores. The associated virtual appliance installs in the Azure VMware Solution cluster and provides policy-based automated and consistent backup of VMs integrated with Azure NetApp Files snapshot technology for fast backups and restores of VMs, groups of VMs (organized in resource groups) or complete datastores lowering RTO, RPO, and improving total cost of ownership.

General Availability: Incremental snapshots for Premium SSD v2 Disk and Ultra Disk Storage

You can now instantly restore Premium SSD v2 and Ultra Disks from snapshots and attach them to a running VM without waiting for any background copy of data. This new capability allows you to read and write data on disks immediately after creation from snapshots, enabling you to recover your data from accidental deletes or a disaster quickly

I can see third-party backup making use of this.

Azure Elastic SAN updates: Private Endpoints & Shared Volumes

As we approach general availability of Azure Elastic SAN, we continue improving the service and adding features based on your feedback. Today, we are releasing private endpoint support and volume sharing support via SCSI (Small Computer System Interface) Persistent Reservation.

This sounds like the sort of feature maturity one will expect as the service approaches general availability. I wonder what the actual target market is for this service.

Azure Site Recovery

Private Preview – DR for Shared Disks – Azure Site Recovery

We are excited to announce the Private Preview of DR for Azure Shared Disks for workloads running Windows Server Failover Clusters (WSFC) on Azure VMs. Now you can protect, monitor, and recover your WSFC-clusters as a single unit across its DR Lifecycle, while also generating cluster-consistent recovery points – which are consistent across all the disks (including the Shared Disk) of the cluster.

This feature is long overdue for customers using shared virtual hard disks to create failover clusters.

Networking

Public preview: Support for new custom error pages in Application Gateway

In addition to the response codes 403 and 502, the Azure Application Gateway now lets you configure company-branded error pages for more response codes – 400, 405, 408, 500, 503, and 504. You can configure these error pages at a global level to apply to all the listeners on your gateway or individually for each listener. 

These pages can be hosted on any publicly accessible URI.
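In Bicep, the configuration is a small addition to the Application Gateway resource. This is only a fragment (listeners, pools, and rules are omitted), showing global error pages; the storage URL is a made-up example:

// Fragment of a Microsoft.Network/applicationGateways resource:
properties: {
  // ...gatewayIPConfigurations, listeners, backend pools, rules omitted...
  customErrorConfigurations: [
    {
      statusCode: 'HttpStatus403'
      customErrorPageUrl: 'https://mystorage.blob.core.windows.net/errors/403.html'
    }
    {
      statusCode: 'HttpStatus503'
      customErrorPageUrl: 'https://mystorage.blob.core.windows.net/errors/503.html'
    }
  ]
}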

Azure Firewall: New Monitoring and Logging Updates

Notes:

  • (Preview) With the Azure Firewall Resource Health check, you can now view the health status of your Azure Firewall and address service problems that may affect your Azure Firewall resource. Resource Health allows IT teams to receive proactive notifications regarding potential health degradations and recommended mitigation actions for each health event type.
  • (Preview) The Azure Firewall Workbook presents a dynamic platform for analyzing Azure Firewall data. Within the Azure portal, you can utilize it to generate visually engaging reports.
  • (GA) The Latency Probe metric is designed to measure the overall latency of Azure Firewall and provide insight into the health of the service. IT administrators can use the metric for monitoring and alerting if there is observable latency and diagnosing if the Azure Firewall is the cause of latency in a network.

Resource Health should make for a useful alert, especially when enabling DevSecOps – be aware of the dreaded “out of sync” error. I just tried the workbook in a production system and noticed a couple of things that I might not have otherwise spotted because they hadn’t (yet) triggered a human response. The latency probe is interesting – I think it originated from customer network performance scenarios where the firewall was suspected of being the root cause.
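If you want to turn the Latency Probe metric into an alert, a metric alert rule will do it. A sketch, assuming the metric name FirewallLatencyPng and a threshold that you should tune for your own network:

param firewallId string // resource ID of the Azure Firewall
param actionGroupId string // resource ID of an action group to notify

// Sketch: alert when average firewall latency is high over 15 minutes.
resource latencyAlert 'Microsoft.Insights/metricAlerts@2018-03-01' = {
  name: 'alert-azfw-latency' // hypothetical name
  location: 'global'
  properties: {
    severity: 2
    enabled: true
    scopes: [
      firewallId
    ]
    evaluationFrequency: 'PT5M'
    windowSize: 'PT15M'
    criteria: {
      'odata.type': 'Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria'
      allOf: [
        {
          criterionType: 'StaticThresholdCriterion'
          name: 'HighLatency'
          metricNamespace: 'Microsoft.Network/azureFirewalls'
          metricName: 'FirewallLatencyPng' // the Latency Probe metric
          operator: 'GreaterThan'
          threshold: 20 // milliseconds – an example, not a recommendation
          timeAggregation: 'Average'
        }
      ]
    }
    actions: [
      {
        actionGroupId: actionGroupId
      }
    ]
  }
}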

Virtual Machines

Public preview: Azure Mv3 Medium Memory (MM) Virtual Machines

Today we are announcing the public preview of the next generation Mv3 Medium Memory (MM) virtual machine series. Powered by the 4th Generation Intel® Xeon® Scalable Processor and DDR5 DRAM technology, the Mv3 medium memory (MM) virtual machines can scale for SAP workloads from 250GB to 4TB. With Azure Boost, Mv3 MM provides a ~25% improvement in network throughput and up to 1.5X improvement in remote storage throughput over the previous M-series families. 

These machines start at 12 vCPUs and 240 GB RAM, scaling up to 176 vCPUs and 2,794 GB RAM. That should just about be enough to run Teams.

Azure Firewall Basic – For Small/Medium Business & “Branch”

Microsoft has just announced a lower cost SKU of Azure Firewall, Basic, that is aimed at small/medium business but could also play a role in “branch office” deployments in Microsoft Azure.

Standard & Premium

Azure Firewall launched with a Standard SKU several years ago. The Standard SKU offered a lot of features, but some things deemed necessary for security were missing: IDPS and TLS Inspection were top of the list. Microsoft added a Premium SKU that added those features as well as fuller web category inspection and URL filtering (not just FQDN).

However, some customers didn’t adopt Azure Firewall because of the price. A lot of those customers were small-medium businesses (SMBs). Another scenario that might be affected is a “branch office” in an Azure region – a smaller footprint that is closer to clients that isn’t a main deployment.

Launching The Basic SKU

Microsoft has been working on a lower cost SKU for quite a while. The biggest challenge, I think, that they faced was trying to figure out how to balance features, performance, and availability with price. They know that the target market has a finite budget, but there are necessary feature requirements. Every customer is different, so I guess that, when faced with this conundrum, one needs to satisfy the needs of 80% of customers.

The clues for a new SKU have been publicly visible for quite a while – the ARM reference for Azure Firewall documented that a Basic SKU existed somewhere in Azure (in private preview). Tonight, Microsoft launched the Basic SKU in public preview. A longer blog post adds some details.

Introducing the Azure Firewall

The primary target market for the Basic SKU hasn’t deployed a firewall appliance of any kind in Azure – if they are in Azure, then they are most likely only using NSGs for security, which operate only at the transport protocol (TCP, UDP, ICMP) layer in a decentralised way.

The Azure Firewall is a firewall appliance, allowing centralised control. It should be deployed with NSGs and resource firewalls for layered protection, and where there is a zero-trust configuration (deny all by default) in all directions, even inside of a workload.
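As a sketch of what deny-by-default looks like at the NSG layer – the resource name and priority are my own choices – an explicit catch-all deny rule, above which only the required flows are allowed:

// Sketch: an NSG with an explicit catch-all deny; allow rules are added above it.
resource nsg 'Microsoft.Network/networkSecurityGroups@2023-05-01' = {
  name: 'nsg-workload' // hypothetical name
  location: resourceGroup().location
  properties: {
    securityRules: [
      {
        name: 'DenyAllInbound'
        properties: {
          priority: 4096 // the lowest possible priority, evaluated last
          direction: 'Inbound'
          access: 'Deny'
          protocol: '*'
          sourceAddressPrefix: '*'
          sourcePortRange: '*'
          destinationAddressPrefix: '*'
          destinationPortRange: '*'
        }
      }
    ]
  }
}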

The Azure Firewall is native to Microsoft Azure – you don’t need a third-party license or support contract. It is fully deployable and configurable as code (ARM, Bicep, Terraform, Pulumi, etc.), making it ideal for DevSecOps. Azure Firewall is much easier to learn than NVAs because the firewall is easily available through an Azure subscription and the training (Microsoft Learn) is publicly available – not hidden behind classic training paywalls. Thanks to the community and a platform model, I expect that more people are learning Azure Firewall than any other kind of firewall today – skills are in short supply, so using native tech that is easy to learn, and that many are learning, just makes sense.

Comparing Azure Basic With Standard and Premium

Microsoft helpfully put together a table to compare the 3 SKUs:

Comparing Azure Firewall Basic with Standard and Premium

Another difference with the Basic SKU is that you must deploy the AzureFirewallManagementSubnet in addition to the AzureFirewallSubnet – this additional subnet is often associated with forced tunneling. The result is that the firewall will have a second public IP address that is used only for management tasks.
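A minimal Bicep sketch of a Basic SKU deployment showing the second IP configuration – the resource names are hypothetical, and the VNet must already contain both subnets:

param location string = resourceGroup().location
param vnetName string // must contain AzureFirewallSubnet and AzureFirewallManagementSubnet
param firewallPolicyId string // a Basic-tier Azure Firewall Policy

resource dataPip 'Microsoft.Network/publicIPAddresses@2023-05-01' = {
  name: 'pip-azfw-data'
  location: location
  sku: {
    name: 'Standard'
  }
  properties: {
    publicIPAllocationMethod: 'Static'
  }
}

resource mgmtPip 'Microsoft.Network/publicIPAddresses@2023-05-01' = {
  name: 'pip-azfw-mgmt'
  location: location
  sku: {
    name: 'Standard'
  }
  properties: {
    publicIPAllocationMethod: 'Static'
  }
}

// Sketch: the Basic SKU requires a management IP configuration in addition
// to the usual data-plane IP configuration.
resource firewall 'Microsoft.Network/azureFirewalls@2023-05-01' = {
  name: 'azfw-basic'
  location: location
  properties: {
    sku: {
      name: 'AZFW_VNet'
      tier: 'Basic'
    }
    firewallPolicy: {
      id: firewallPolicyId
    }
    ipConfigurations: [
      {
        name: 'data'
        properties: {
          subnet: {
            id: resourceId('Microsoft.Network/virtualNetworks/subnets', vnetName, 'AzureFirewallSubnet')
          }
          publicIPAddress: {
            id: dataPip.id
          }
        }
      }
    ]
    managementIpConfiguration: {
      name: 'mgmt'
      properties: {
        subnet: {
          id: resourceId('Microsoft.Network/virtualNetworks/subnets', vnetName, 'AzureFirewallManagementSubnet')
        }
        publicIPAddress: {
          id: mgmtPip.id
        }
      }
    }
  }
}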

Pricing

The Basic SKU follows the same price model as the higher SKUs: a base compute cost and a data processing cost. The pricing shared here is for the preview, so it is subject to change.

The Basic SKU base compute (deployment) cost is €300.03 per month in West Europe. That’s less than 1/3 of the cost of the Standard SKU at €947.54 per month. The data processing cost for the Basic SKU is higher at €0.068 per GB. However, the amount of data passing through such a firewall deployment will be much lower so it probably will not be a huge add-on.

Preview Deployment Error

At this time, the Basic SKU is in preview. You must enable the preview in your subscription. If you do not do this, your deployment will fail with this error:

“code”: “FirewallPolicyMissingRequiredFeatureAllowBasic”,

“message”: “Subscription ‘someGuid’ is missing required feature ‘Microsoft.Network/AzureFirewallBasic’ for Basic policies.”

Some Interesting Notes

I’ve not had a chance to do much work with the Basic SKU – work is pretty crazy lately. But here are a few things to note:

  • A hub & spoke deployment is still recommended, even for SMBs.
  • Availability zones are supported for higher availability.
  • You are forced to use Azure Firewall Manager/Azure Firewall Policy – this is a good thing because newer features are only in the new management plane.

Final Thoughts

The new SKU of Azure Firewall should add new customers to this service. I expect that larger enterprises will also be interested – not every deployment needs the full-blown Standard/Premium deployment, but some form of firewall is still required.

Azure Firewall DevSecOps in Azure DevOps

In this post, I will share the details for granting the least-privilege permissions to GitHub action/DevOps pipeline service principals for a DevSecOps continuous deployment of Azure Firewall.

Quick Refresh

I wrote about the design of the solution and shared the code in my post, Enabling DevSecOps with Azure Firewall. There I explained how you could break out the code for the rules of a workload and manage that code in the repo for the workload. Realistically, you would also need to break out the gateway subnet route table user-defined route (legacy VNet-based hub) and the VNet peering connection. All the code for this is shared on GitHub – I did update the repo with some structure and with working DevOps pipelines.

This Update

There were two things I wanted to add to the design:

  • Detailed permissions for the service principal used by the workload DevOps pipeline, limiting the scope of change that is possible in the hub.
  • DevOps pipelines so I could test the above.

The Code

You’ll find 3 folders in the Bicep code now:

  • hub: This deploys a (legacy) VNet-based hub with Azure Firewall.
  • customRoles: 4 Azure custom roles are defined. These should be deployed after the hub.
  • spoke1: This contains the code to deploy a skeleton VNet-based (spoke) workload with updates that are required in the hub to connect the VNet and route ingress on-prem traffic through the firewall.

DevOps Pipelines

The hub and spoke1 folders each contain a folder called .pipelines. There you will find a .yml file to create a DevOps pipeline.

The DevOps pipeline uses Azure CLI tasks to:

  • Select the correct Azure subscription & create the resource group
  • Deploy each .bicep file.

My design uses 1 sub for the hub and 1 sub for the workload. You are not glued to this, but you would need to make modifications to how you configure the service principal permissions (below).

To use the code:

  1. Create one repo in DevOps for the hub and one repo for spoke1, and copy in the required code.
  2. Create service principals in Azure AD.
  3. Grant the hub’s service principal Owner rights to the hub subscription.
  4. Grant the spoke’s service principal Owner rights to the spoke subscription.
  5. Create ARM service connections in DevOps settings that use the service principals. Note that the names for these service connections are referred to by azureServiceConnection in the pipeline files.
  6. Update the variables in the pipeline files with subscription IDs.
  7. Create the pipelines using the .yml files in the repos.

Don’t do anything just yet!

Service Principal Permissions

The hub service principal is simple – grant it owner rights to the hub subscription (or resource group).

The workload is where the magic happens with this DevSecOps design. The workload updates the hub using code in the workload repo that affects the workload:

  • Ingress route from on-prem to the workload in the hub GatewaySubnet.
  • The firewall rules for the workload in the hub Azure Firewall (policy) using a rules collection group.
  • The VNet peering connection between the hub VNet and the workload VNet.

All of that can be deployed by the workload DevOps pipeline, which is authenticated using the workload’s service principal. That means the workload service principal must have rights over the hub.

The quick solution would be to grant contributor rights over the hub and say “we’ll manage what is done through code reviews”. However, a better practice is to limit what can be done as much as possible. That’s what I have done with the customRoles folder in my GitHub share.

Those custom roles should be modified to change the possible scope to the subscription ID (or even the resource group ID) of the hub deployment. There are 4 custom roles (a Bicep sketch of one follows the list):

  • customRole-ArmValidateActionOperator.json: Adds the CUSTOM – ARM Deployment Operator role, allowing the ARM deployment to be monitored and updated.
  • customRole-PeeringAdmin.json: Adds the CUSTOM – Virtual Network Peering Administrator role, allowing a VNet peering connection to be created from the hub VNet.
  • customRole-RoutesAdmin.json: Adds the CUSTOM – Azure Route Table Routes Administrator role, allowing a route to be added to the GatewaySubnet route table.
  • customRole-RuleCollectionGroupsAdmin.json: Adds the CUSTOM – Azure Firewall Policy Rule Collection Group Administrator role, allowing a rules collection group to be added to an Azure Firewall Policy.
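To give a flavour of what one of these looks like, here is a Bicep sketch (the repo uses JSON role definition files; the actions list here is my illustration of the least-privilege idea, not a copy of the file):

targetScope = 'subscription'

// Sketch: a custom role that can manage routes in a route table, but not
// the route table itself.
resource routesAdmin 'Microsoft.Authorization/roleDefinitions@2022-04-01' = {
  name: guid(subscription().id, 'CUSTOM - Azure Route Table Routes Administrator')
  properties: {
    roleName: 'CUSTOM - Azure Route Table Routes Administrator'
    description: 'Can create, update, and delete routes in a route table.'
    type: 'CustomRole'
    permissions: [
      {
        actions: [
          'Microsoft.Network/routeTables/read'
          'Microsoft.Network/routeTables/routes/read'
          'Microsoft.Network/routeTables/routes/write'
          'Microsoft.Network/routeTables/routes/delete'
        ]
        notActions: []
      }
    ]
    assignableScopes: [
      subscription().id // narrow this to the hub subscription/resource group
    ]
  }
}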

Deploy The Hub

The hub is deployed first – its resources must exist before the permissions needed by the workload’s service principal can be granted.

Grant Rights To Workload Service Principals

The service principals for all workloads will be added to an Azure AD group (Workloads Pipeline Service Principals in the above diagram). That group is nested into 4 other AAD security groups (a sketch of one such role assignment follows the list):

  • Resource Group ARM Operations: This is granted the CUSTOM – ARM Deployment Operator role on the hub resource group.
  • Hub Firewall Policy: This is granted the CUSTOM – Azure Firewall Policy Rule Collection Group Administrator role on the Azure Firewall Policy that is associated with the hub Azure Firewall.
  • Hub Routes: This is granted the CUSTOM – Azure Route Table Routes Administrator role on the GatewaySubnet route table.
  • Hub Peering: This is granted the CUSTOM – Virtual Network Peering Administrator role on the hub virtual network.
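Each of those group-to-role grants is just a role assignment scoped to the relevant hub resource. A Bicep sketch for the Hub Routes example – the route table name and parameters are assumptions:

param groupObjectId string // object ID of the 'Hub Routes' security group
param routesAdminRoleId string // resource ID of the custom role definition

resource gwRouteTable 'Microsoft.Network/routeTables@2023-05-01' existing = {
  name: 'rt-gatewaysubnet' // hypothetical name of the GatewaySubnet route table
}

// Sketch: grant the custom role to the group, scoped to the route table only.
resource routesAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(gwRouteTable.id, groupObjectId, routesAdminRoleId)
  scope: gwRouteTable
  properties: {
    roleDefinitionId: routesAdminRoleId
    principalId: groupObjectId
    principalType: 'Group'
  }
}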

Deploy The Workload

The workload now has the required permissions to deploy its resources and make the modifications in the hub that connect the workload to the hub and, through it, to the outside world.

Enabling DevSecOps with Azure Firewall

In this post, I will share how you can implement DevSecOps with Azure Firewall, with links to a bunch of working Bicep files to deploy the infrastructure-as-code (IaC) templates.

This example uses a “legacy” hub and spoke – one where the hub is VNet-based and not based on Azure Virtual WAN Hub. I’ll try to find some time to work on the code for that one.

The Concept

Hold on, because there’s a bunch of things to understand!

DevSecOps

The DevSecOps methodology is more than just IaC. It’s a combination of people, processes, and technology to enable a fail-fast agile delivery of workloads/applications to the business. I discussed here how DevSecOps can be used to remove the friction of IT to deliver on the promises of the Cloud.

The Azure features that this design is based on are discussed in concept here. The idea is that we want to enable Devs/Ops/Security to manage firewall rules in the workload’s Git repository (repo). This breaks the traditional model where the rules are located in a central location. The important thing is not the location of the rules, but the processes that manage the rules (change control through Git repo pull request reviews) and the people who review them (the architects, firewall admins, security admins, etc.).

So what we are doing is taking the firewall rules for the workload and placing them in with the workload’s code. NSG rules are probably already there. Now, we’re putting the Azure Firewall rules for the workload in the workload repo too. This is all made possible thanks to changes that were made to Azure Firewall Policy (Azure Firewall Manager) Rules Collection Groups – I use one Rules Collection Group for each workload and all the rules that enable that workload are placed in that Rules Collection Group. No changes will make it to the trunk branch (deployment action/pipelines look for changes here to trigger a deployment) without approval by all the necessary parties – this means that the firewall admins are still in control, but they don’t necessarily need to write the rules themselves … and the devs/operators might even write the rules, subject to review!
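A minimal Bicep sketch of a per-workload Rules Collection Group – the names, priorities, addresses, and the single rule are examples I made up, not rules from the repo:

param firewallPolicyName string // the hub Azure Firewall Policy

// Sketch: one Rules Collection Group per workload, living in the workload repo.
resource workloadRcg 'Microsoft.Network/firewallPolicies/ruleCollectionGroups@2023-05-01' = {
  name: '${firewallPolicyName}/rcg-spoke1'
  properties: {
    priority: 1000
    ruleCollections: [
      {
        ruleCollectionType: 'FirewallPolicyFilterRuleCollection'
        name: 'spoke1-allow'
        priority: 100
        action: {
          type: 'Allow'
        }
        rules: [
          {
            ruleType: 'NetworkRule'
            name: 'web-to-sql'
            ipProtocols: [
              'TCP'
            ]
            sourceAddresses: [
              '10.10.1.0/24'
            ]
            destinationAddresses: [
              '10.10.2.0/24'
            ]
            destinationPorts: [
              '1433'
            ]
          }
        ]
      }
    ]
  }
}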

This is the killer reason to choose Azure Firewall over NVAs – the ability to not only deploy the firewall resource, but to manage the entire configuration and rule sets as code, and to break that all out in a controlled way to make the enterprise more agile.

Other Bits

If you’ve read my posts on Azure routing (How to Troubleshoot Azure Routing? and BGP with Microsoft Azure Virtual Networks & Firewalls) then you’ll understand that there’s more going on than just firewall rules. Packets won’t magically flow through your firewall just because it’s in the middle of your diagram!

The spoke or workload will also need to deploy:

  • A peering connection to the hub, enabling connectivity with the hub and the firewall. All traffic leaving the spoke will route through the firewall thanks to a user-defined route in the spoke subnet route table. Peering is a two-way connection. The workload will include some Bicep to deploy the spoke-hub and the hub-spoke connections.
  • A route for the GatewaySubnet route table in the hub. This is required to route traffic to the spoke address prefix(es) through the Azure Firewall so that on-premises>spoke traffic is correctly inspected and filtered by the firewall – see the sketch after this list.
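Here is the sketch of that second item – a route in the hub’s GatewaySubnet route table, with an assumed route table name and parameters:

param firewallPrivateIp string // the private IP of the hub Azure Firewall
param spokePrefix string // the spoke address prefix, supplied by the workload repo

// Sketch: route on-premises>spoke traffic through the firewall.
resource spokeRoute 'Microsoft.Network/routeTables/routes@2023-05-01' = {
  name: 'rt-gatewaysubnet/to-spoke1' // hypothetical route table and route names
  properties: {
    addressPrefix: spokePrefix
    nextHopType: 'VirtualAppliance'
    nextHopIpAddress: firewallPrivateIp
  }
}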

The IaC

In this section, I’ll explain the code layout and placement.

My Code

You can find my public repo, containing all the Bicep code, here. Please feel free to download and use it.

The Git Repo Design

You will have two Git repos:

  1. The first repo is for the hub. This repo will contain the code for the hub, including:
    • The hub VNet.
    • The Hub VNet Gateway.
    • The GatewaySubnet Route Table.
    • The Azure Firewall.
    • The Azure Firewall Policy that manages the Azure Firewall.
  2. The second repo is for the spoke. This skeleton example workload contains:
    • The spoke VNet and its route table.
    • The VNet peering connections (spoke-hub and hub-spoke).
    • The GatewaySubnet route for the workload (deployed into the hub).
    • The workload’s Rules Collection Group in the hub Azure Firewall Policy.

Action/Pipeline Permissions

I have written a more detailed update on this section, which can be found here

Each Git repo needs to authenticate with Azure to deploy/modify resources. Each repo should have a service principal in Azure AD. That service principal will be used to authenticate the deployment, executed by a GitHub action or a DevOps pipeline. You should restrict what rights the service principal will require. I haven’t worked out the exact minimum permissions, but the detailed update linked above documents custom roles that lock this down.

Trunk Branch Protection & Pull Requests

Some of you might be worried now – what’s to stop a developer/operator working on Workload A from accidentally creating rules that affect Workload X?

This is exactly why you implement standard practices on the Git repos:

  • Protect the Trunk branch: This means that no one can just update the version of the code that is deployed to your firewall or hub. If you want to make an update, you have to create a branch of the trunk, make your edits in that branch, and submit the changes to be merged into trunk as a pull request.
  • Enable pull request reviews: Select a panel of people that will review changes that are submitted as pull requests to the trunk. In our scenario, this should include the firewall admin(s), security admin(s), network admin(s), and maybe the platform & workload architects.

Now, I can only submit a suggested set of rules (and route/peering) changes that must be approved by the necessary people. I can still create my code without delay, but a change control and rollback process has taken control. Obviously, this means that there should be SLAs on the review/approval process and guidance on pull request, approval, and rejection actions.

And There You Have It

Now you have the design and the Bicep code to enable DevSecOps with Azure Firewall.