There Is More To Azure Networking Than Connectivity & Security

This post explains how a well-designed, secured, governed, and managed network design plays a foundational role in digital transformation and cloud enablement.

Cloud Adoption Versus Cloud Migration

What? Aidan – I thought this was a post about Azure networking!

Yes, it is … but you’ll have to join me on this journey. Lately, I’ve been using the “we need to step back and think about why we’re doing any of this” line quite a bit. The context of that line changes, but the message remains consistent.

Why did we go to The Cloud (Azure in our case)? For many, the reason is something like “I was told to”, “we were leaving our old hosting company”, or “our hardware support ended”. Those reasons triggered what I call a cloud migration project. I’ve done a LOT of those projects – thanks to scope limitations in the engagement, forced either by poorly advised customers (leading to restricted tenders) or by salespeople who refused to have a larger conversation.

Many organisations with internal developers that do a cloud migration end up in the same situation 18-24 months later: developers refuse to deploy into “IT’s cloud”. This is because IT has recreated its old data centre in Azure, along with the restrictions, controls, and lack of trust. We were told that “cloud is how you work, not where you work”, but not many people heard that message. We end up with situations where businesses have paid for Azure, but developers don’t get the Cloud; they get IT-driven and IT-restricted virtualisation in Azure.

Cloud Adoption is a change journey, as documented by the Cloud Adoption Framework. We are supposed to:

  1. Understand why the business (not IT) wants to use the Cloud
  2. Create a cloud strategy for the organisation
  3. Define and enable a new way of delivering cross-functional digital services
  4. Do all the other technical stuff that we focus on, with the architecture based on the above

Steps 1 and 2 (the CAF Strategy and Plan phases) are the keys to cloud adoption success. In theory, if we do everything correctly:

  1. The developers want to adopt the new cloud environment because it enables their mission.
  2. The business sees a return on the investment with faster innovation of digital services.

Where Does Networking Come Into This?

Pretty much every customer I’ve dealt with wants to improve their security for business protection or to meet compliance requirements. That typically results in heavier use of Virtual Networks. Many customers end up recreating their data centre networks in Azure; they create 1 Virtual Network (spoke) for each VLAN:

  • DMZ
  • Regular Zone
  • Secure Zone

Or maybe they have:

  • Dev
  • Test
  • Production

Each of these networks shares various traits:

  • A big virtual network with many subnets
  • Managed by the central IT infrastructure team

I can go into all the security and complexity flaws that result from this too-common design pattern. But my focus is on cloud adoption in this post:

  • Developers are actively prevented from having network access/control. They rely on helpdesk tickets to get anything done – what happened to the essential cloud trait of “on-demand self-service”?
  • Subscriptions are filled with dozens of resource groups. Access is granted at per-resource-group granularity, which complicates and slows things down.
  • The desire for more security is gradually eroded due to operational complexity and constant delegation of rights with complicated granularity.

So, believe it or not, Azure networking is our canary in the coal mine. I have used, and continue to use, this reliable little bird to sniff out operational/security failures in customers’ Azure environments.

Now you know how I can detect adoption problems from the floor up. Next, I want to explain how I can architect the Azure network to solve these issues.

Landing Zones

Let’s bend some minds. 8-ish years ago, I started working on a new “standard design” for my employer (a consulting company) with a fellow principal consultant. We both came to the table with an alternative to the usual subscription strategy. The norm was that each of the above traditional spoke VNets would be aligned with its own subscription. That results in very few subscriptions, with demands for complicated role delegations, tagging, cost management, and so on. We switched to a 1 subscription/workload (application/service) approach; this new level of granularity:

  • Requires just 1 small Virtual Network, where networking is required at all
  • Role delegations for developers/operators are done once per subscription
  • Cost management is done per subscription (Budgets) with much less tagging for metadata
  • Operations are easier, with fewer mistakes, because subscription selection in the Azure Portal/PowerShell/CLI/etc. shows only the resource groups related to that one workload
  • The security boundary is much smaller: the access boundary is the single workload, and any VNet-based workload must route via the hub firewall to reach any other workload, subject to rules and IDPS inspection

Microsoft introduced the concept of landing zones a few years ago; it uses the same subscription/workload approach:

  • Platform landing zone: A subscription that offers shared infrastructure, such as a hub, a shared Application Gateway/WAF, Active Directory Domain Controllers, DNS, etc.
  • Application landing zone: A subscription that hosts a single application/service/workload.

As with my approach, each landing zone has a Virtual Network (if required) that is:

  • Sized according to the workload architecture with some spare capacity.
  • Peered with the hub, with the egress path from the workload being via the hub firewall.
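
Sizing that workload VNet is mostly arithmetic: each planned subnet needs its host count plus the 5 addresses Azure reserves in every subnet, rounded up to a power of two. A rough sketch of that arithmetic in Python – the tier names and NIC counts are invented for illustration:

```python
import math

# Azure reserves 5 addresses in every subnet: the network address,
# the default gateway, two Azure DNS addresses, and broadcast.
AZURE_RESERVED = 5

def smallest_subnet_prefix(hosts_needed: int) -> int:
    """Longest prefix (smallest subnet) that fits the hosts plus
    Azure's reserved addresses. Azure's smallest usable subnet is /29."""
    total = hosts_needed + AZURE_RESERVED
    prefix = 32 - math.ceil(math.log2(total))
    return min(prefix, 29)

# A hypothetical 3-tier workload; NIC counts per tier are invented.
tiers = {"web": 4, "app": 6, "data": 3}
for name, nics in tiers.items():
    print(f"{name}: /{smallest_subnet_prefix(nics)}")
```

Sum the subnet sizes, add the spare capacity you want, and round up to the next power of two to get the VNet prefix.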

Security & Governance

Let’s consider some things:

  • The business requires governance to manage IT and to ensure regulatory compliance.
  • IT security must protect the business, customers, vendors, etc.
  • We have many workloads/subscriptions.

We cannot have 1 policy for everything – sometimes we have business/operational reasons for stricter or more relaxed policies. For example, we might require more Defender for Cloud features in some workloads or allow PaaS public endpoints in others.

Microsoft gave us Enterprise Scale around 5 years ago. This reference architecture (with supplied templated deployments) offers a subscription categorisation approach using Management Groups:

  • Corporate: Workloads that can connect to other networks.
  • Online: Workloads that have an online presence and should not connect to other workloads.

Azure Policy is used to enforce the standards for each Management Group.

I don’t know about you, but I have never seen such a binary requirement in the real world. I’ve seen many people discuss/use a third Management Group called Hybrid; they wonder how to build the policies to enforce the requirements.

In the real world, just about everything is shades of grey when it comes to connectivity. I’ve had ultra-secure workloads with web interfaces. I’ve had low-end workloads with high security. And I can guarantee you that sensitive workloads have compelling business reasons to be both online and integrated with traditional private-protocol connectivity.

I thought about this last year and came up with a different approach. We can use CAF’s operational methodologies to develop a tiered, documented, and implemented policy that aligns with the organisation’s governance, security, and management requirements. I suggested that we would have three tiers (names are irrelevant):

  • Gold: The strictest policies
  • Silver: Medium-level policies, containing the most workloads
  • Bronze: The most relaxed policies

The result is 3 Management Groups (above), each with Azure Policy automatically auditing/enforcing the designed and continuously improved requirements.

The new (CAF Plan) operational model would introduce a step to categorise the workload based on security risks, governance requirements, and management needs. Each workload would be placed in the correct Management Group with policies.

The policies give us automation and guardrails. For example, where appropriate, we can:

  • Restrict regions
  • Ban public IP association with NICs
  • Disable public endpoints
  • Enable Defender for Cloud plans
  • Enable VNet Flow Logs
  • Configure diagnostic settings
  • And much more
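
To make the tiers concrete, here is a toy Python model of the idea: each tier is a set of guardrails, and a workload is checked against its assigned tier. The tier names, rule keys, and regions are invented for illustration – real enforcement is done by Azure Policy assignments at each Management Group:

```python
# Toy model of tiered guardrails. Tier names, rule keys, and regions
# are invented; real enforcement is Azure Policy per Management Group.
TIERS = {
    "gold":   {"regions": {"northeurope"}, "public_endpoints": False},
    "silver": {"regions": {"northeurope", "westeurope"}, "public_endpoints": False},
    "bronze": {"regions": {"northeurope", "westeurope"}, "public_endpoints": True},
}

def violations(workload: dict, tier: str) -> list:
    """Return the guardrails that a workload breaks in its assigned tier."""
    rules = TIERS[tier]
    found = []
    if workload["region"] not in rules["regions"]:
        found.append("region not allowed")
    if workload["public_endpoints"] and not rules["public_endpoints"]:
        found.append("public endpoints banned")
    return found

workload = {"region": "westeurope", "public_endpoints": True}
print(violations(workload, "gold"))    # breaks both guardrails
print(violations(workload, "bronze"))  # compliant in the relaxed tier
```

The categorisation step in the operational model is then just choosing which key of `TIERS` the workload lands under.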

The key to this is momentum. My approach is “minimum viable product” (MVP). For example, I had a 30-minute call with a customer last year and designed their starter policies. Now they (should) run regular reviews to assess the policies/risks/requirements and expand the policies/implementations. We didn’t freeze for 2 years to build a policy. We got some essentials in place and we carried on with getting results for the business.

Now, let’s get back to networking!

At-Scale Network Configuration And Enforcement

Developers, operators, and (rival) service providers are empowered to build in the Azure environment with a new guardrail-protected landing zone approach. How do we ensure that their Virtual Networks are built correctly?

We can use Azure Virtual Network Manager (AVNM).

Note that the horrid per-subscription pricing for AVNM was replaced a long time ago. Please go back and reassess the pricing before you run away.

AVNM gives us policy-driven:

  • Discovery and grouping of Virtual Networks for granular policy assignments
  • Peering with a hub and mesh capabilities
  • Route Table deployment/association with User-Defined Routes (UDRs)
  • Security Admin Rules that are processed before NSG rules with override capabilities
  • IP Address Management (IPAM) to provide approved, non-repeating IP prefixes for new networks and to manage their lifecycle

In short, if you deploy a VNet, I can:

  • Get an approved IP prefix for the Virtual Network
  • Use Azure Policy to automatically configure/enforce things like VNet Flow Logs and DNS settings
  • Use AVNM to correctly connect, route, and secure your VNet
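
The IPAM part of that flow is essentially bookkeeping: hand out a child prefix from an approved pool that does not overlap any previous allocation. A minimal sketch of that logic using Python’s `ipaddress` module – the pool and prefix sizes are arbitrary examples:

```python
import ipaddress

class PrefixPool:
    """Hands out non-overlapping child prefixes from an approved pool,
    mimicking the bookkeeping that AVNM IPAM does for new VNets."""

    def __init__(self, pool):
        self.pool = ipaddress.ip_network(pool)
        self.allocated = []

    def allocate(self, prefix_len):
        # First candidate of the requested size that overlaps nothing.
        for candidate in self.pool.subnets(new_prefix=prefix_len):
            if not any(candidate.overlaps(used) for used in self.allocated):
                self.allocated.append(candidate)
                return candidate
        raise ValueError("pool exhausted")

pool = PrefixPool("10.100.0.0/16")
print(pool.allocate(24))  # 10.100.0.0/24
print(pool.allocate(23))  # 10.100.2.0/23 – skips past the allocated /24
```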

To quote Van Halen: “they got you coming in, and they got you going out”. I always did prefer “Van Hagar” 🙂

Summary

A legacy, cable-oriented, on-prem network in Azure indicates that the organisation has not modernised how digital services are created, operated, and delivered to the business. In short, the business is paying for the cloud but is getting remotely hosted Hyper-V.

We can enable modern collaborative working processes by modernising our designs. Using application landing zones will create a new form of granularity for all aspects of infrastructure, security, governance, and management. We can use the governance features to create the guardrails and some of the automations. We can use Azure Virtual Network Manager (AVNM) to ensure a good Virtual Network deployment.

If You Want To Learn More

Contact me via my consulting company, Cloud Mechanix, if you would like to learn how I can help you with this design pattern.

When To Add Subnets To An Azure Virtual Network

In this post, I want to explain the real reasons to add subnets to an Azure virtual network. This post is born out of frustration. I’ve seen post after post on social media, particularly on LinkedIn, where the poster has “Azure expert” in their description and is sharing advice from the year 2002 for cable-oriented (on-prem) networks.

The BS Advice

Consider the scenario below:

The above diagram shows the commonly advised Virtual Network architecture for a 3-tier web app. The poster will say:

Each tier should have its own subnet for security reasons. Each subnet will have an NSG.

So if we have web servers, app servers, and database servers, the logic is that the subnet + NSG combination provides security. The poster is half right:

  • The NSG does micro-segmentation of the machines.
  • The subnets do nothing.

Back To The Basics … Again

I want you to do this:

  1. Build a VNet with 2 subnets.
  2. Build 2 VMs, each attached to a different subnet.
  3. Log into one of the VMs.
  4. Run tracert to the second VM.

What will you see? The next and only hop is the second VM.

Ping the default gateway. What happens? Timeouts. The default gateway does not exist.

This is easily explained: Virtual Networks do not exist. Subnets do not exist.

Think of a Virtual Network as a Venn diagram. Our two virtual machines are in the same circle. That is an instruction to the Azure fabric to say:

These machines are permitted to route to each other

That’s how Coca-Cola and PepsiCo could both have Virtual Networks with overlapping address spaces in the same Azure room and not be able to talk to each other.

Note: This is VXLAN functionality, implemented through the Hyper-V switch extension capability that was introduced in Windows Server 2012.

Simple Example

Let us fix that simple example. We will first understand that NSGs offer segmentation. No matter how I associate an NSG, the rules are always applied on the host virtual switch port (in Hyper-V, that’s on the NIC). If a rule says “no” then that packet is automatically dropped. If a rule says “yes”, then the packet is permitted.

In the diagram below, we accept that subnets play no role in security segmentation. We have flattened the network to a single subnet. There is a single web server, app server, and database server – we will add complexity later:

This network is much simpler, right? And it offers no less security than the needlessly more complicated first example. An NSG is associated with the subnet. NSG rules allow the required traffic, and a low-priority rule denies all other traffic. Only the permitted traffic can enter any specific NIC.
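
To see why the flattened design loses nothing, it helps to model how NSG rules are processed: in priority order, first match wins, with a low-priority deny catching everything else. A toy evaluator in Python – the rules, tier names, and ports are invented for this 3-tier example:

```python
# First-match-wins evaluation, the way an NSG processes rules at each
# NIC. Priorities, tier names, and ports are invented for the example.
RULES = [  # (priority, source, destination, port, action)
    (100, "Internet", "web", 443, "Allow"),
    (110, "web", "app", 8080, "Allow"),
    (120, "app", "db", 1433, "Allow"),
    (4096, "*", "*", "*", "Deny"),  # low-priority deny-all
]

def evaluate(src, dst, port):
    for _prio, r_src, r_dst, r_port, action in sorted(RULES):
        if r_src in ("*", src) and r_dst in ("*", dst) and r_port in ("*", port):
            return action  # lowest matching priority wins
    return "Deny"

print(evaluate("web", "app", 8080))      # Allow
print(evaluate("Internet", "db", 1433))  # Deny
```

Because evaluation happens at every NIC, the database tier is unreachable from the Internet even though all three tiers share one subnet.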

I’ve seen arguments that this will create complicated rules. Pah! I’ve built/migrated more apps than I care to remember. The rules for these apps are hardly ever that numerous.

Aidan, what if I am going to run a highly available application? Lucky for you if the code supports that (seriously!). Whether you’re using availability sets or availability zones (lucky you, these days), we will make a tiny design change.

We will create a (free) Application Security Group (ASG) for each tier. We will then use the ASG as the source and destination instead of the VM IP addresses.

Aidan, what if I’m going to use Virtual Machine Scale Sets (VMSS)? It’s no different: you add the ASG for the tier to the networking properties of the VMSS. Each created VMSS instance will automatically be associated with the ASG.
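
The reason ASGs work so well with autoscaling is that rules reference group membership rather than addresses. A small Python sketch of that indirection – the group names and IP addresses are invented:

```python
# NSG rules reference ASG names; membership resolves to NICs at runtime.
# Group names and IP addresses are invented for the example.
asg_members = {
    "asg-web": {"10.0.0.4", "10.0.0.5"},
    "asg-app": {"10.0.0.6"},
}

# One rule: the web tier may reach the app tier.
rules = [("asg-web", "asg-app")]

def allowed(src_ip, dst_ip):
    return any(src_ip in asg_members[src] and dst_ip in asg_members[dst]
               for src, dst in rules)

print(allowed("10.0.0.5", "10.0.0.6"))  # True

# A VMSS scale-out joins the new instance to asg-web; no rule changes.
asg_members["asg-web"].add("10.0.0.7")
print(allowed("10.0.0.7", "10.0.0.6"))  # True
```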

When Should I Add Subnets?

There are several reasons why you should add subnets. I’ll list them first before I demonstrate them:

  • Azure requires it
  • Unique routing
  • Remote network sources
  • Scaling

Azure Requires It

There are scenarios when Azure requires a dedicated subnet. Some that I can immediately think of are:

  • Virtual Network Gateway
  • Azure Route Server
  • SQL Managed Instance (MI)
  • App Service Regional Virtual Network Integration
  • App Service Environment (ASE – App Service Isolated Tier) VNet injection

Let’s PaaS-ify the developers (see what I did there :D) and move from VMs to PaaS. We will replace the web servers with App Services and the database with SQL MI:

  • The web servers ran two apps, Web and API. Each will have a Private Endpoint for ingress traffic. The Private Endpoints can remain in the GeneralSubnet.
  • The web servers must talk to the app servers (still VMs) over the VNet, so they will get Regional VNet Integration via the App Service Plan. This will require a dedicated subnet for egress only. This subnet will have no ingress.
  • SQL MI requires a dedicated subnet.

Unique Routing

Next-hop routing is always executed by the source Azure NIC. Every subnet has a collective set of routes to destination prefixes (network addresses). Those routes are propagated to the NICs in that subnet (subnets do not exist). The NICs decide the next hop for each packet, and the Azure network fabric sends the packet (by VXLAN) directly to the NIC of the next hop.

There may be a situation where you want to customise routing.

For the sake of consistency, I’m going to use our web app, but in a little wonky way. My app is expanding. Some more VMs are being added for custom processing. Those VMs are being added to the GeneralSubnet.

My wonky scenario is that the security team have decided that traffic from the App Servers to the SQL VM must go through a firewall – that implies that the return traffic must also go through the firewall. No other traffic inside the app needs to go through the firewall. The firewall is deployed in a peered hub.

That means that I must split the GeneralSubnet into two source routing domains (subnets):

  1. GeneralSubnet: Containing the Private Endpoint NICs and my new custom processing VMs.
  2. AppServerSubnet: Containing only the app server VMs.

We will implement the desired via-firewall routing using User-Defined Routes in Route Tables:

  • AppServerSubnet: Go via the hub firewall to get to SqlMiSubnet.
  • SqlMiSubnet: Go via the hub firewall to get to AppServerSubnet.
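
Route selection itself is longest-prefix match: the most specific route containing the destination wins, which is why a /24 UDR can pull SQL-bound traffic through the firewall while everything else stays direct. A sketch using Python’s `ipaddress` module – the prefixes and the firewall IP are invented, and note that in real Azure a UDR also overrides a system route of the same prefix:

```python
import ipaddress

# Effective routes for AppServerSubnet: system routes plus a UDR that
# sends SQL MI-bound traffic via the hub firewall. The prefixes and
# the firewall's IP address are invented for the example.
routes = [
    ("10.1.0.0/16", "VnetLocal"),  # system route: the whole VNet
    ("0.0.0.0/0", "Internet"),     # system default route
    ("10.1.4.0/24", "10.0.1.4"),   # UDR: SqlMiSubnet via the firewall
]

def next_hop(dst_ip):
    dst = ipaddress.ip_address(dst_ip)
    matches = [(ipaddress.ip_network(prefix), hop)
               for prefix, hop in routes
               if dst in ipaddress.ip_network(prefix)]
    # Longest prefix match: the most specific route wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(next_hop("10.1.4.10"))  # 10.0.1.4 – hairpins through the firewall
print(next_hop("10.1.2.10"))  # VnetLocal – direct to the destination NIC
```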

Remote Network Sources

So far, we have been using Application Security Groups (ASGs) to abstract IP addresses in our NSGs. ASGs are great, but they have restrictions:

  • Firewalls, including Azure Firewall, have no idea what an ASG is. You will have to use IP addresses as the sources in the firewall rules – possibly abstracted as IP Groups (Azure Firewall) or similar in third-party firewalls.
  • ASGs can only be used inside their parent subscription. You’re not going to be able to use them as sources in other workloads if you follow the subscription/workload approach of application landing zones.

Using an IP address (or addresses) as a source is OK if the workload does not autoscale. But what happens if your app tier/role autoscales and its addresses are mixed with addresses from other tiers/roles that should not have access to a remote resource?

There is only one way to solve this: break the source resource(s) out into their own subnet. I recently saw this with a multi-subscription workload that was going to include an Azure-hosted DevOps agent pool. Originally, the autoscaling pool was going to share a subnet with other VMs. However, HTTPS access to the other resources had to be granted to the DevOps pool only, and I couldn’t do that while the pool remained in a shared subnet. I split the pool into its own subnet and was able to use that subnet’s prefix as the source in the various firewall/NSG rules.

Scaling

There are two scaling scenarios that I can think of at the moment. In the first, a workload component autoscales, and that role/tier requires a large number of IPs that you want to dedicate to it. In this case, yes, you may dedicate a subnet to that role/tier.

The second scenario is that you have followed the good practice of deploying a relatively small VNet for your workload, with some spare capacity for additional subnets. However, the scope of the workload has changed significantly and the spare capacity will not be enough. You expand the VNet by adding a second IP prefix, and the new IP capacity is consumed by creating additional subnets from the new prefix.
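
The expansion scenario is easy to model with Python’s `ipaddress` module: once the original prefix is fully carved into subnets, new subnets can only come from the added prefix. All prefixes here are invented for illustration:

```python
import ipaddress

# The workload VNet started as a /24, fully carved into two /25 subnets.
# All prefixes are invented for the example.
vnet_prefixes = [ipaddress.ip_network("10.2.0.0/24")]
subnets = [ipaddress.ip_network("10.2.0.0/25"),
           ipaddress.ip_network("10.2.0.128/25")]

def free_subnet(prefix_len):
    """First unused child prefix of the requested length, if any."""
    for vnet_prefix in vnet_prefixes:
        if prefix_len <= vnet_prefix.prefixlen:
            continue
        for candidate in vnet_prefix.subnets(new_prefix=prefix_len):
            if not any(candidate.overlaps(s) for s in subnets):
                return candidate
    return None

print(free_subnet(26))  # None – no spare capacity left

vnet_prefixes.append(ipaddress.ip_network("10.2.1.0/24"))  # expand the VNet
print(free_subnet(26))  # 10.2.1.0/26 – new subnets come from the new prefix
```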

In Summary

Every diagram for a new VNet in Azure should start very simply: 1 subnet. Do not follow the overly-simple advice from “Azure expert” LinkedIn posts that say “create a subnet for every tier to create a security boundary”. You absolutely do not need to do that! Most workloads, even in large enterprises, are incredibly simple, and one subnet will suffice. You should only add subnets when the need requires it, as documented above.

Designing An Azure Hub Virtual Network

In this post, I am going to share a process for designing a hub virtual network for a hub & spoke secured virtual network deployment in Microsoft Azure.

The process I lay out in this post will not work for everyone. However, based on experience, I think that very few organisations will find exceptions to it.

What Is And Is Not In This Post

This post is going to focus on the process of designing a hub virtual network. You will not find a design here … that will come in a later post.

You will also not find any mention of Azure Virtual WAN. You DO NOT need to use Azure Virtual WAN to do SD-WAN, despite the claptrap in Microsoft documentation on this topic. Virtual WAN also:

  • Restricts your options on architecture, features, and network design.
  • Is a nightmare to troubleshoot because the underlying virtual network is hidden in a Microsoft tenant.

Rules Of Engagement

The hub will be your network core in a network stamp: a hub & spoke. The hub & spoke will contain networks in a single region, following these concepts:

  • Resilience & independence: Workloads in a spoke in North Europe should not depend on a hub in West Europe.
  • Micro-segmentation: Workloads in North Europe trying to access workloads in West Europe should go through a secure route via hubs in each region.
  • Performance: Workload A in North Europe should not go through a hub in West Europe to reach Workload B in North Europe.
  • Cost Management: Minimise global VNet peering to just what is necessary. Enable costs of hubs to be split into different parts of the organisation.
  • Delegation of Duty: If there are different network teams, enable each team to manage their hubs.
  • Minimised Resources: The hub has roles only of transit, connectivity, and security. Do not place compute or other resources into the hub; this is to minimise security/networking complexity and increase predictability.

A Hub Design Process

The core of our Azure network will have very little in the way of resources. What can be (not “must be”) included in that hub can be thought of as functions:

  • Site-to-site networking: VPN, ExpressRoute, and SD-WAN.
  • Point-to-site VPN: Enabling individuals to connect to the Azure networks using a VPN client on their device.
  • Firewall: Providing security for ingress, egress, and inter-workload communications.
  • Virtual Machines: Reduce costs of secured RDP/SSH by deploying Azure Bastion in the hub.

If we are doing a high-level design, we have two questions that we will ask about each of these functions:

  • Is the function required?
  • What technology will be used?

We won’t get into tiers/SKUs, features, or configurations just yet; that’s when we get into low-level or detailed design.

One can use the following flow chart to figure out what to use – it’s a bit of an eye test so you might need to open the image in another tab:

Site-to-Site (S2S) Networking

While it is very commonly used, not every organisation requires site-to-site connectivity to Azure.

For example, I had a migration customer that was (correctly) modernising to the “top tier” of cloud computing by migrating from legacy apps to SaaS. They wanted to re-implement an SD-WAN for over 100 offices to connect their new and small Azure footprint. I was the lead designer, so I knew their connectivity requirements – they were going to use Azure Virtual Desktop (AVD) only to connect to their remaining legacy apps. AVD doesn’t need a site-to-site connection. I was able to save that organisation from entering into a costly managed SD-WAN services contract and instead focus on Internet connectivity – not long after, they shut down their Azure footprint when SaaS alternatives were found for the last legacy applications.

If we establish that site-to-site connectivity is required then we must ask the first question:

Are latency and SLA important?

If the answer to either of these questions is “yes” then there is no choice: an ExpressRoute Virtual Network Gateway is required.

If the answer is no, then we are looking at some kind of VPN connectivity. We can ask another question to determine the type of solution:

Will there be a small number of VPN connections?

If a small number of VPN connections is required, the Azure VPN Virtual Network Gateway is suitable – consider the SKUs/sizes and complexities of management to determine what “a small number” is.

If you determine that the VPN Virtual Network Gateway is unsuitable, then an SD-WAN network virtual appliance (NVA) should be used. Note that it is recommended to deploy Azure Route Server with a third-party VPN/SD-WAN appliance to enable propagation of network prefixes:

  • Azure > SD-WAN
  • SD-WAN > Azure

You may find that you need one or more of the above solutions! For example:

  • Some ExpressRoute customers may opt to deploy a parallel VPN tunnel with an identical routing configuration over a completely different ISP. This enables automatic failover from ExpressRoute to VPN in the event of a circuit failure.
  • An SD-WAN customer may also have ExpressRoute for some offices/workloads where SLA or latency are important. Another consideration may be that one workload has other technical requirements that only ExpressRoute (Direct) can service such as very high throughput.

You have one more question to ask after you have picked the site-to-site component(s):

Will you require site-to-site transit through Azure via the site-to-site network connections?

In other words, should Remote Site A be able to route to Remote Site B using your Azure site-to-site connections? If the answer is yes then you must deploy Azure Route Server to enable that routing.

Point-To-Site (P2S) VPN

I personally have not deployed very much of this solution but I do hear it being discussed quite a bit. Some organisations must enable users (or external suppliers) to create a VPN connection from their individual devices to Azure. If this is required then you must ask:

Is the scenario(s) simple?

I’ve kept that vague because the problem is vague. There are two solutions with one being overly-simplistic in capabilities and the other being more fully-featured.

The Azure VPN Gateway (also used for site-to-site VPN) offers a highly available (Azure resource) solution for P2S VPN. It offers different configurations for authentication and device support. But it is very limited. For example, it has no routing rules to restrict which users get access to which networks. This means that if you grant network (firewall/NSG) access to one user via the VPN address pool, you must grant the same access to all users, which is clearly pretty poor if you have many types/roles of remote VPN clients (IT, developer of workload X, developer of workload Y, Vendor A, Vendor B, etc.).

In such scenarios, one should consider a third-party NVA for point-to-site networking. Third-party NVAs may offer more features for P2S VPN than the VPN Virtual Network Gateway.

A P2S NVA may reside in the same hub as a VPN Virtual Network Gateway (and other S2S solutions).

It’s not in the diagram but you should also consider Entra Global Secure Access as an alternative to P2S VPN. The Private Network Connector would be deployed in a spoke(s), not the hub.

Firewall

Is a firewall required? The correct answer for anyone considering a hub & spoke architecture should be “of course it is”. But you might not like security, so we’ll ask that question anyway.

Once you determine that security is important to your employer, you must ask yourself:

Shall I use a native PaaS firewall?

The native PaaS solution in Azure is Azure Firewall. I have many technical reasons to prefer Azure Firewall over third-party alternatives. For consultants, a useful attribute of Azure Firewall is that you can skill up on one solution that you can implement/use/manage for many customers and projects (migrations) won’t face repeated delays as you wait on others to implement rules in third-party firewalls.

If you want to use a different firewall then you are free to do so.

If you are using Azure Firewall then there is a follow-up question if there will be S2S network connections:

Are the remote networks using non-RFC1918 address prefixes?

In other words, do the remote networks use address prefixes outside of:

  • 192.168.0.0/16
  • 172.16.0.0/12
  • 10.0.0.0/8

If they do, then Azure Firewall requires some configuration because traffic to non-RFC1918 prefixes is forced to the Internet by default – they are Internet addresses, after all! You can statically configure the prefixes if they do not change. Or …

  • If you are using Azure Route Server
  • The prefixes can change a lot thanks to scenarios such as acquisition or rapid growth

… you can (in preview today) configure integration between Azure Firewall and Azure Route Server so the firewall dynamically learns the address prefixes from the remote networks.
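
Checking whether remote prefixes fall inside RFC1918 space is simple to script, for example with Python’s `ipaddress` module (the example remote-site prefixes are invented):

```python
import ipaddress

# The three RFC1918 private ranges from the list above.
RFC1918 = [ipaddress.ip_network(p)
           for p in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_rfc1918(prefix):
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(block) for block in RFC1918)

# Invented remote-site prefixes: the second is CGNAT space, which the
# default behaviour described above would treat as Internet traffic.
print(is_rfc1918("172.20.8.0/24"))   # True
print(is_rfc1918("100.64.10.0/24"))  # False
```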

Virtual Machines

Do not put compute in the hub!

This scenario asks:

Will any of the workloads in your spoke virtual networks have virtual machines?

You will have virtual machines even if you “ban” virtual machines – I guarantee that they will eventually appear for things like security solutions, self-hosted agents, Azure Virtual Desktop, AKS, and so on.

Unfortunately, many consider secure remote access (SSH/RDP) to be opening a port in the firewall for TCP 22/3389. That is not considered secure because those protocols can be and have been attacked. In the past, those who took security seriously used a dedicated “jump box” or “bastion host” to isolate vulnerable on-premises machines from assets in the data centre. We can use the same process with Azure Bastion where there is no IaaS requirement – we leverage Entra security features to authenticate the connection request and the guest OS credentials to verify VM access.

One can deploy Bastion in a spoke – that is perfectly valid for some scenarios. However, many important features are only in the paid-for SKUs, so you might wish to deploy a shared Azure Bastion. Unfortunately, routing restrictions imposed by Bastion prevent deploying a shared Bastion in a spoke, so we have no choice but to deploy it in a hub. If you wish to share an Azure Bastion across workloads, then it will be the final component in the hub.

If/when Azure Bastion supports route tables in the AzureBastionSubnet I will recommend moving shared Bastion deployments to a spoke – yes, I know that we can do that with Azure Virtual WAN but there are many things that we cannot do with Azure Virtual WAN.

You could consider a third-party alternative or a DIY bastion solution. If so, place that into a spoke because it will be compute-based.

Wrapping Up

As you can see, the high-level design of the hub is very simple.

There are few functions in it because when you understand Azure virtual networks, routing, and NSGs, then you understand that designing a secure network should not be complex. Complexity is the natural predator of manageability and dependable security. There is a little more detail when we get into a low-level or detailed design, but that’s a topic for another day.

Micro-Segmentation Security In Azure Networks

In this post, I want to discuss the importance of designing and implementing micro-segmentation in Azure networks.

Repeating The Same Mistakes

In 2002-2003, the world was being hammered by malware. So much so, that Microsoft did a reset on their Windows development processes and effectively built a new version of Windows XP with Windows XP Service Pack 2. The main security feature of that release was the Windows Firewall – the purpose of this was to isolate each Windows machine in the network by default. It’s a pity that nearly every Windows admin then used Group Policy to disable the Windows Firewall!

Times have moved on and so have the bad guys. Malware isn’t just an anarchist or hobby activity. Malware is a billion-dollar business (ransomware/data theft) and a military activity. Naturally, defences have evolved … wait … no … most admins/consultants are still deploying networks that your Daddy/Mommy deployed 22 years ago, but I’ll deal with that in another post.

Instead, I want to discuss a part of the defensive solution: micro-segmentation.

Assume Penetration

We must assume that the attacker will always find a way in. Not every attack will be Sandra Bullock clicking some magical symbol on a website to penetrate the firewall. Most attacks have relatively simple vectors such as stealing a password, hash hijacking, or getting an accountant to open a PDF. Determined attackers aren’t just “driving by”; they will look for an entry. Maybe it’s malware in vendor software that you will deploy! Maybe it’s a vulnerability in open-source software that your developers will deploy via GitHub? Maybe a managed service provider’s Entra ID tenant has been penetrated and they have Lighthouse access to your Azure subscriptions? Each of those examples bypasses your firewall and any advanced scanning features that it may have. How do you stop them?

Micro-Segmentation

Let me conjure an image for you. A submarine is on patrol. It has a wartime mission. The submarine is always under orders to continue that mission. The submarine is detected by the enemy and is attacked. The attack causes damage which creates a flood. If left unchecked, the flood will sink the ship. What happens? The crew is trained to isolate the flood by sealing the leaking compartment – doors are slammed, seals are locked, and the water is contained in that compartment. Sure, the sailors and ship functions in that compartment are dead, but the ship can continue its mission.

That is a way to visualise micro-segmentation.

Microsoft Zero-Trust

Microsoft has a relatively small collection of documentation on zero-trust architecture for Azure. There are 3 useful bullet points:

  • Be ready to handle attacks before they happen.
  • Minimize the extent of the damage and how fast it spreads.
  • Increase the difficulty of compromising your cloud footprint.

Let’s expand on that a little.

Be Ready

You will be ready for an attack because you assume that you already are under attack. You don’t wait to deploy security systems and configurations; you design them with your workloads. You deploy security with your workloads. You maintain security with your workloads.

Increase The Difficulty of Compromising Your Cloud Footprint

You should put in the defences that are appropriate to your actual risks and your ability to install/manage them. A bad example is a medical organisation choosing a more affordable firewall to save a few bucks – this is exactly the sort of organisation that will be targeted.

Minimise The Extent of Damage

This can also be referred to as minimising the blast zone. You want to limit how much damage the bad guys cause, just like the submarine limited flooding to the damaged compartment. This means that we make it harder to get from any one point on the network to the next.

It’s one thing to put in the security defences, but you must also:

  • Enable/configure the security features: it shocks me how many organisations/consultants opt not to or don’t know how to enable essential features in their security solution.
  • Monitor your security systems: If we assume that the attacker will get in, then we should monitor our security features to detect and shut down the attack. Again, I’m shocked every time I see security features in Azure that have no logging or alerting enabled.

Microsoft lays out a path to zero-trust where step number one is network segmentation. The basic pattern is laid out:

Applications are partitioned to different Azure Virtual Networks (VNets) and connected using a hub-spoke model

Microsoft uses the term “application”. I prefer the term “workload”. Some, like ITIL, might use the term “service”. A workload is a collection of resources that work together to provide a service to or for the organisation. Maybe it’s a bunch of Azure resources that create a retail site. Maybe it’s a CRM system. Maybe it’s an identity management & governance workload.

The pattern that Microsoft is recommending is one that I have been promoting through my employer for the last 6 years. Each workload gets a dedicated “small” virtual network. The workload VNet is peered with a hub (and only the hub by default). The hub firewall provides isolation and deeper inspection than NSGs can offer.

Step 4 tells us:

Fully distributed ingress/egress cloud micro-perimeters and deeper micro-segmentation

NSGs micro-segment the single subnet or small set of subnets in the VNet, restricting resource-to-resource connections to just what is required. Isolation is now done centrally (by the firewall) and at the NIC (by NSGs). You should also consider network protections on PaaS resources such as Storage Accounts or Key Vaults.

If we revisit the submarine comparison, the workload-specific virtual network is one of the compartments in the boat. If there is a leak (an attack), the NSGs limit or slow down expansion in the subnet(s). The firewall isolates the workload/compartment from other workloads/compartments and the Internet by default to prevent command and control or downloads by the attacker. Deeper firewall inspection searches for attack patterns.

Don’t Forget Monitoring

Microsoft zero-trust has more than just networking. One other step I want to highlight is monitoring/alerting because it ties into the micro-segmentation features of networking. Consider the mechanisms we can put in place:

  • PaaS resource firewalls with logging
  • NSG with VNet Flow Logging
  • (Azure) Firewall with logging for firewall rules and deep inspection features (Azure Firewall has Threat Intelligence and IDPS).

Each of those barriers or detection systems can be thought of as a string with a bell on it. The attacker will tickle or trip over those strings. If the bell rings, we should be paying attention. When you fail to put in the barriers or configure monitoring then you don’t know that the attacker is there doing something – and we assume that the attacker will get in and do something – so aren’t we failing to do our job?

It’s Not Just Me Telling You

You can say “There goes Aidan, rattling on about micro-segmentation. Why should I listen to him?”. It would be one thing if it were just me sharing my opinion on Azure network security but what if others told you to do the same things?

Microsoft tells you to implement micro-segmentation. The US NSA tells you to do it. The Canadian Centre for Cyber Security tells you to do it. The UK NCSC tells you to do it. I could keep googling (binging, of course) national security agencies and I’d find the same recommendation with each result. If you are not implementing this security technique designed for today’s threats (not for the Blaster worm of 2003) then you are not only not doing your job but you are choosing to leave the door open for attackers; that could be viewed very poorly by employers, by shareholders, or by informed compliance auditors.

How Do Network Security Groups Work?

A Greek Phalanx, protected by a shield wall made up of many individuals working under 1 instruction as a unit – like an NSG.

Yesterday, I explained how packets travel in Azure networking while telling you Azure virtual networks do not exist. The purpose was to get readers closer to figuring out how to design good and secure Azure networks without falling into traps of myths and misbeliefs. The next topic I want to tackle is Network Security Groups – I want you to understand how NSGs work … and this will also include Admin Rules from Azure Virtual Network Manager (AVNM).

Port ACLs

In my previous post, Azure Virtual Networks Do Not Exist, I said that Azure was based on Hyper-V. Windows Server 2012 introduced loads of virtual networking features that would go on to become something bigger in Azure. One of them was a feature, mostly overlooked by customers at the time, called Port ACLs. I liked Port ACLs; the feature was mostly unknown, could only be managed using PowerShell, and made for great demo content in some TechEd/Ignite sessions that I did back in the day.

Remember: Everything in Azure is a virtual machine somewhere in Azure, even “serverless” functions.

The concept of Port ACLs was that they gave you a simple firewall feature controlled through the virtualisation platform – the virtual machine and the guest OS had no control and had to comply. You set up simple rules to allow or deny transport layer (TCP/UDP) traffic on specific ports. For example, I could block all traffic to a NIC by default with a low-priority inbound rule and introduce a high-priority inbound rule to allow TCP 443 (HTTPS). Now I had a web service that could receive HTTPS traffic only, no matter what the guest OS admin/dev/operator did.

Where are Port ACLs implemented? Obviously, it is somewhere in the virtualisation product, but the clue is in the name. Port ACLs are implemented by the virtual switch port. Remember that a virtual machine NIC connects to a virtual switch in the host. The virtual switch connects to the physical NIC in the host and the external physical network.

A virtual machine NIC connects to a virtual switch using a port. You probably know that a physical switch contains several ports with physical cables plugged into them. If a Port ACL is implemented by a switch port and a VM is moved to another host, then what happens to the Port ACL rules? The Hyper-V networking team played smart and implemented the switch port as a property of the NIC! That means that any Port ACL rules that are configured in the switch port move with the NIC and the VM from host to host.

NSG and Admin Rules Are Port ACLs

Along came Azure and the cloud needed a basic rules system. Network Security Groups (NSGs) were released and gave us a pretty interface to manage security at the transport layer; now we can allow or deny inbound or outbound traffic on TCP/UDP/ICMP/Any.

What technology did Azure use? Port ACLs of course. By the way, Azure Virtual Network Manager introduced a new form of basic allow/deny control that is processed before NSG rules called Admin Rules. I believe that this is also implemented using Port ACLs.

A Little About NSG Rules

This is a topic I want to dive deep into later, but let’s talk a little about NSG rules. We can implement inbound (allow or deny traffic coming in) or outbound (allow or deny traffic going out) rules.

A quick aside: I rarely use outbound NSG rules. I prefer using a combination of routing and a hub firewall (deny all by default) to control egress traffic.

When I create an NSG I can associate it with:

  • A NIC: Only that NIC is affected
  • A subnet: All NICs, including VNet-integrated PaaS resources and Private Endpoints, are affected

The association is simply a management scaling feature. When you associate an NSG with a subnet, the rules are not processed at the subnet.

Tip: virtual networks do not exist!

Associating an NSG resource with a subnet propagates the rules from the NSG to all NICs that are connected to that subnet. The processing is done by Port ACLs at the NIC.

This means:

  • Inbound rules prevent traffic from entering the virtual machine.
  • Outbound rules prevent traffic from leaving the virtual machine.

Which association should you choose? I advise you to use subnet association. You can see/manage the entire picture in one “interface” and have an easy-to-understand processing scenario.
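To illustrate subnet association, here is a minimal Bicep sketch (names and address prefixes are hypothetical) of an NSG whose rules propagate to every NIC in the subnet:

```bicep
param location string = resourceGroup().location

// Hypothetical NSG: allow HTTPS in, deny everything else
resource nsg 'Microsoft.Network/networkSecurityGroups@2023-09-01' = {
  name: 'nsg-workload'
  location: location
  properties: {
    securityRules: [
      {
        name: 'Allow-HTTPS-Inbound'
        properties: {
          priority: 100
          direction: 'Inbound'
          access: 'Allow'
          protocol: 'Tcp'
          sourceAddressPrefix: 'VirtualNetwork'
          sourcePortRange: '*'
          destinationAddressPrefix: '*'
          destinationPortRange: '443'
        }
      }
      {
        name: 'Deny-All-Inbound'
        properties: {
          priority: 4096 // lowest possible custom priority
          direction: 'Inbound'
          access: 'Deny'
          protocol: '*'
          sourceAddressPrefix: '*'
          sourcePortRange: '*'
          destinationAddressPrefix: '*'
          destinationPortRange: '*'
        }
      }
    ]
  }
}

resource vnet 'Microsoft.Network/virtualNetworks@2023-09-01' = {
  name: 'vnet-workload'
  location: location
  properties: {
    addressSpace: { addressPrefixes: [ '10.10.0.0/24' ] }
    subnets: [
      {
        name: 'WorkloadSubnet'
        properties: {
          addressPrefix: '10.10.0.0/25'
          // The association: rules are pushed to each NIC's Port ACLs
          networkSecurityGroup: { id: nsg.id }
        }
      }
    ]
  }
}
```

One NSG resource, one association, and every NIC in the subnet enforces the same rules – that’s the management scaling benefit.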

If you want to micro-manage and have an unpredictable future then go ahead and associate NSGs with each NIC.

If you hate yourself and everyone around you, then use both options at the same time:

  • The subnet NSG is processed first for inbound traffic.
  • The NIC NSG is processed first for outbound traffic.

Keep it simple, stupid (the KISS principle).

Micro-Segmentation

As one might grasp, we can use NSGs to micro-segment a subnet. No matter what the resources do, they cannot bypass the security intent of the NSG rules. That means we don’t need to have different subnets for security zones:

  • We zone using NSG rules.
  • Virtual networks and their subnets do not exist!

The only time we need to create additional subnets is when there are compatibility issues, such as NSG/route table association limitations, or when a PaaS resource requires a dedicated subnet.
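As a sketch of zoning within a single subnet, the following Bicep fragment (all names are hypothetical) uses application security groups (ASGs) so that NICs can be grouped into zones and rules written between zones – no extra subnets required:

```bicep
param location string = resourceGroup().location

// Hypothetical zones within one subnet
resource webAsg 'Microsoft.Network/applicationSecurityGroups@2023-09-01' = {
  name: 'asg-web'
  location: location
}

resource dbAsg 'Microsoft.Network/applicationSecurityGroups@2023-09-01' = {
  name: 'asg-db'
  location: location
}

// One NSG zones the whole subnet: web NICs may reach db NICs on 1433 only
resource nsg 'Microsoft.Network/networkSecurityGroups@2023-09-01' = {
  name: 'nsg-zoned'
  location: location
  properties: {
    securityRules: [
      {
        name: 'Allow-Web-To-Db-SQL'
        properties: {
          priority: 200
          direction: 'Inbound'
          access: 'Allow'
          protocol: 'Tcp'
          sourceApplicationSecurityGroups: [ { id: webAsg.id } ]
          sourcePortRange: '*'
          destinationApplicationSecurityGroups: [ { id: dbAsg.id } ]
          destinationPortRange: '1433'
        }
      }
      {
        name: 'Deny-All-Inbound'
        properties: {
          priority: 4096
          direction: 'Inbound'
          access: 'Deny'
          protocol: '*'
          sourceAddressPrefix: '*'
          sourcePortRange: '*'
          destinationAddressPrefix: '*'
          destinationPortRange: '*'
        }
      }
    ]
  }
}
```

NICs are joined to an ASG via their IP configurations, so the zoning follows the NIC rather than an IP range.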

Watch out for more content shortly where I break some myths and hopefully simplify some of this stuff for you. And if I’m doing this right, you might start to look at some Azure networks (like I have) and wonder “Why the heck was that implemented that way?”.

Network Rules Versus Application Rules for Internal Traffic

This post is about using either Network Rules or Application Rules in Azure Firewall for internal traffic. I’m going to discuss a common scenario, a “problem” that can occur, and how you can deal with that problem.

The Rules Types

There are three kinds of rules in Azure Firewall:

  • DNAT Rules: Control traffic originating from the Internet, directed to a public IP address attached to Azure Firewall, and translated to a private IP Address/port in an Azure virtual network. This is implicitly applied as a Network Rule. I rarely use DNAT Rules – most modern applications are HTTP/S and enter the virtual network(s) via an Application Gateway/WAF.
  • Application Rules: Control traffic going to HTTP, HTTPS, or MSSQL (including Azure database services).
  • Network Rules: Control anything going anywhere.

The Scenario

We have an internal client, which could be:

  • On-premises over private networking
  • Connecting via point-to-site VPN
  • Connected to a virtual network, either the same as the Azure Firewall or to a peered virtual network.

The client needs to connect, with SSL authentication, to a server. The server is connected to another virtual network/subnet. The route to the server goes through the Azure Firewall. I’ll complicate things by saying that the server is a PaaS resource with a Private Endpoint – this doesn’t affect the core problem but it makes troubleshooting more difficult 🙂

NSG rules and firewall rules have been accounted for and created. The essential connection is either HTTPS or MSSQL and is implemented as an Application Rule.

The Problem

The client attempts a connection to the server. You end up with some type of application error stating that there was either a timeout or a problem with SSL/TLS authentication.

You begin to troubleshoot:

  • Azure Firewall shows the traffic is allowed.
  • NSG Flow Logs show nothing – you panic until you remember/read that Private Endpoints do not generate flow logs – I told you that I’d complicate things 🙂 You can consider VNet Flow Logs to try to capture this data and then you might discover the cause.

You experiment and discover two things:

  • If you disconnect the NSG from the subnet then the connection is allowed. Hmm – the rules are correct so the traffic should be allowed: traffic from the client prefix(es) is permitted to the server IP address/port. The rules match the firewall rules.
  • You change the Application Rule to a Network Rule (with the NSG still associated and unchanged) and the connection is allowed.

So, something is going on with the Application Rules.

The Root Cause

In this case, the Application Rule is SNATing the connection. In other words, when the connection is relayed from the Azure Firewall instance to the server, the source IP address is no longer that of the client; it is that of a compute instance in the AzureFirewallSubnet.

That is why:

  • The connection works when you remove the NSG
  • The connection works when you use a Network Rule with the NSG – the Network Rule does not SNAT the connection.

Solutions

There are two solutions to the problem:

Using Application Rules

If you want to continue to use Application Rules then the fix is to modify the NSG rule. Change the source IP prefix(es) to be the AzureFirewallSubnet.

The downsides to this are:

  • The NSG rules are inconsistent with the Azure Firewall rules.
  • The NSG rules are no longer restricting traffic to documented approved clients.
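If you go this route, the change is a swap of the source in the NSG rule. A hedged Bicep sketch of the reworked rule – the AzureFirewallSubnet prefix and server IP are assumed values from a hypothetical hub design:

```bicep
param location string = resourceGroup().location

// Assumed values: the hub AzureFirewallSubnet prefix and the server/Private Endpoint IP
var firewallSubnetPrefix = '10.0.1.0/26'
var serverAddress = '10.1.0.4'

resource nsg 'Microsoft.Network/networkSecurityGroups@2023-09-01' = {
  name: 'nsg-server'
  location: location
  properties: {
    securityRules: [
      {
        // Source is the AzureFirewallSubnet, not the documented client prefixes,
        // because the Application Rule SNATs the connection
        name: 'Allow-HTTPS-From-Firewall'
        properties: {
          priority: 150
          direction: 'Inbound'
          access: 'Allow'
          protocol: 'Tcp'
          sourceAddressPrefix: firewallSubnetPrefix
          sourcePortRange: '*'
          destinationAddressPrefix: serverAddress
          destinationPortRange: '443'
        }
      }
    ]
  }
}
```

The rule now works, but it documents the firewall as the client – which is exactly the inconsistency described above.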

Using Network Rules

My preference is to use Network Rules for all inbound and east-west traffic. Yes, we lose some of the “Layer-7” features but we still have core features, including IDPS in the Premium SKU.

Contrary to using Application Rules:

  • The NSG rules are consistent with the Azure Firewall rules.
  • The NSG rules restrict traffic to the documented approved clients.

When To Use Application Rules?

In my sessions/classes, I teach:

  • Use DNAT rules for the rare occasion where Internet clients will connect to Azure resources via the public IP address of Azure Firewall.
  • Use Application Rules for outbound connections to the Internet, including Azure resources via public endpoints, through the Azure Firewall.
  • Use Network Rules for everything else.

This approach limits “weird sh*t” errors like the one described above and means that NSG rules are effectively clones of the Azure Firewall rules, with some additional rules to control traffic inside of the virtual network/subnet.
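As a sketch of the “Network Rules for east-west traffic” guidance, here is a hedged Bicep fragment that adds a rule collection group to an existing Azure Firewall Policy – names, priorities, and prefixes are all assumptions:

```bicep
// Assumed existing hub firewall policy
resource fwPolicy 'Microsoft.Network/firewallPolicies@2023-09-01' existing = {
  name: 'fwpol-hub'
}

resource workloadRcg 'Microsoft.Network/firewallPolicies/ruleCollectionGroups@2023-09-01' = {
  parent: fwPolicy
  name: 'rcg-spoke1'
  properties: {
    priority: 500
    ruleCollections: [
      {
        ruleCollectionType: 'FirewallPolicyFilterRuleCollection'
        name: 'rc-spoke1-network'
        priority: 100
        action: { type: 'Allow' }
        rules: [
          {
            // A Network Rule does not SNAT internal traffic, so the NSG
            // on the destination still sees the real client source IP
            ruleType: 'NetworkRule'
            name: 'Allow-Clients-To-Server'
            ipProtocols: [ 'TCP' ]
            sourceAddresses: [ '192.168.0.0/24' ] // on-prem client prefix (assumed)
            destinationAddresses: [ '10.1.0.4' ]  // server private IP (assumed)
            destinationPorts: [ '443' ]
          }
        ]
      }
    ]
  }
}
```

The same source/destination/port values can then be mirrored in the workload NSG, keeping both layers consistent.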

Azure WAF and False Positives

This post will explain how to override false positives in the (network) Azure Web Application Firewall (WAF), without compromising security, using one of four methods in combination with a tiered WAF Policy architecture:

  1. Managed Rulesets
  2. Custom Rules
  3. Exclusions
  4. Disabled rules

False Positives

A WAF is a rather simple solution, attempting to inspect L7 (application layer) traffic and intercept attacks such as protocol misuse, SQL injection, or cross-site scripting. Unfortunately, false positives can occur.

For example, let’s assume that an API app is securely shared using a WAF. Messages sent to the API might be formatted in JSON, with lots of special characters to format the message. SQL injection defenses count special characters, trying to find where an attacker is attempting to escape out of a web request to create a database command that will execute. If the defense counts too many special characters (it will!) then an alert will be created and the message will be blocked if Prevention mode is enabled.

One must allow that traffic through because it is expected traffic that the application (and the business) requires. But one must do this without opening up too many holes in the WAF, which would make the WAF a costly but pointless exercise.

Log Analytics Ingestion Charge

There is a side effect to false positives. False positives will vastly outnumber actual attack/probing attempts. Busy workloads can generate huge amounts of logs for false positives. If you use Log Analytics, that data has a cost:

  • Storage: Not too bad
  • Ingestion: This one is painful

The way to reduce the cost is to reduce the noise by overriding the detections that create false positives. Organizations that have a lot of web traffic could save a significant amount of money here.

WAF Policies

The WAF functionality of the Azure Application Gateway (AppGw) is managed by a resource called an Application Gateway WAF Policy (WAF Policy). The typical approach is to associate one WAF Policy with a WAF resource; customisations are made in that WAF Policy. For reasons that should become apparent later, I am going to urge you to take a slightly more granular approach to managing your WAF if it is used to securely share more than one workload or listener:

  • WAF parent policy: A WAF policy will be associated with the WAF. This policy will apply to the WAF and all listeners unless another WAF Policy overrides specific settings.
  • Per-Listener/Per-Workload policy: This is a policy that is created specifically for a listener or a workload (a set of listeners). Any customisations that apply only to a listener or a workload will be applied here, without affecting any other listener or workload.
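A minimal Bicep sketch of the tiered approach (all names are hypothetical): the parent policy is associated with the Application Gateway itself, while the per-listener policy is associated with a single listener, so its customisations stay scoped:

```bicep
param location string = resourceGroup().location

// Parent policy: associated with the WAF (Application Gateway) itself
resource parentPolicy 'Microsoft.Network/ApplicationGatewayWebApplicationFirewallPolicies@2023-09-01' = {
  name: 'wafpol-parent'
  location: location
  properties: {
    policySettings: {
      state: 'Enabled'
      mode: 'Detection' // start in Detection; switch to Prevention later
    }
    managedRules: {
      managedRuleSets: [
        {
          ruleSetType: 'OWASP'
          ruleSetVersion: '3.2'
        }
      ]
    }
  }
}

// Per-listener policy: overrides for one workload only,
// associated with that workload's listener on the Application Gateway
resource apiPolicy 'Microsoft.Network/ApplicationGatewayWebApplicationFirewallPolicies@2023-09-01' = {
  name: 'wafpol-api'
  location: location
  properties: {
    policySettings: {
      state: 'Enabled'
      mode: 'Detection'
    }
    managedRules: {
      managedRuleSets: [
        {
          ruleSetType: 'OWASP'
          ruleSetVersion: '3.2'
        }
      ]
    }
  }
}
```

Overrides for a single workload (exclusions, disabled rules, custom rules) go into the per-listener policy so that other listeners keep the full rule set.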

Methodology

You will never know what false positives you will encounter. If your WAF goes straight into Prevention mode then you will create a world of pain and be the recipient of a lot of hate-messages/emails.

Here’s the approach that I recommend:

  1. Protect your WAF with an NSG that has Traffic Analytics enabled. The NSG should only allow the necessary HTTP, HTTPS, WAF monitoring (from Azure), and load balancing traffic. Use a custom deny-all rule to block everything else.
  2. Enable monitoring for the Application Gateway, sending all logs to a queryable destination such as Log Analytics.
  3. Monitor traffic for a period of time – enough to allow expected normal usage of the full systems. Your monitoring should detect the false positives.
  4. Verify that Traffic Analytics did not record malicious IP addresses hitting your WAF.
  5. Query your monitoring data to find the false positives for each listener. Identify the hostname, request URI, ruleset, rule group, and rule ID that is causing the issue on a per-listener/workload basis.
  6. Ideally, developers fix any issues that create false positives but this is unlikely – so we’ll move on.
  7. Determine your override strategy (see below).
  8. Deploy your overrides with the policies still in Detection mode.
  9. Monitor traffic for another period of time to ensure that there are no more false positives.
  10. Switch the parent policy to Prevention Mode.
  11. Switch each per-listener/per-workload policy to Prevention Mode.
  12. Monitor
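Step 2 of the list above can be sketched in Bicep by sending all Application Gateway logs, including the firewall logs, to Log Analytics – the gateway name and workspace ID are assumptions:

```bicep
param logAnalyticsWorkspaceId string // resource ID of your workspace (assumed to exist)

// Assumed existing WAF-enabled Application Gateway
resource appGw 'Microsoft.Network/applicationGateways@2023-09-01' existing = {
  name: 'agw-waf'
}

// Send all log categories (access, performance, firewall) to Log Analytics
resource diag 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: 'diag-to-law'
  scope: appGw
  properties: {
    workspaceId: logAnalyticsWorkspaceId
    logs: [
      {
        categoryGroup: 'allLogs'
        enabled: true
      }
    ]
  }
}
```

With the firewall logs in Log Analytics, you can query for the hostname, request URI, rule group, and rule ID of every match on a per-listener basis.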

Managed Rule Sets

The WAF today has two rulesets that you can use:

  • OWASP: Used to detect attacks such as SQL Injection, Cross-site scripting, and so on.
  • Microsoft Bot Manager Rule Set: Used to prevent malicious bots from browsing/attacking your workloads.

You need the OWASP ruleset – but we will need to manage it (later). The bot ruleset, in my experience, creates a huge amount of noise with no way of creating granular overrides. One can override the bot ruleset using custom rules but, as you’ll see later, that’s a big stick that is not granular at all!

My approach to this is to disable the Microsoft Bot Manager Rule Set (or leave it disabled) in the parent and child rulesets. If I have a need to enable it somewhere, I can do it in a per-listener or per-workload ruleset.

Custom Rules

A custom rule is created in a WAF Policy to force traffic that matches certain criteria to be:

  • Always allowed
  • Always denied
  • Logged only without denying it

You can create a sequence of filters based on:

  • IP Address
  • Number
  • String
  • Geo Location

If the set of filters matches a request then your desired action will apply. For example, if I want to force traffic to be allowed to my API, I can enter the API URI as one of the filters and all traffic will be allowed.

Yes, all traffic will be allowed, including traffic that is not a false positive. If I only had a few OWASP rules that were blocking the traffic, the custom rule would disable all OWASP rules.

If you must use this approach, then implement it in the child policy so it is limited to the associated listener/workload.

Exclusions

This is the newest of the override types in WAF Policy – and I’ve found it to be the least useful.

The theory is that you can create an exclusion for one or more OWASP rules based on the values of request headers. For example, if a header called RequestHeaderKeys contains a value of X-Scanner you can instruct the affected OWASP rules to be disabled. This sounds really powerful and quite granular. But this starts to fall apart with other scenarios, such as the aforementioned SQL Injection.

Another common rule that alerts on or blocks traffic is Missing User Agent Header. Exclusions work on the value of a header, so if the header is missing, Exclusions cannot evaluate it.

Another gotcha is that you cannot combine header filters to create an exclusion. The Azure Portal experience for creating an Exclusion makes it look like you can. However, the result is two or more Exclusions that work independently.

If Exclusions will work for you, implement them in the per-listener/per-workload policy and specify only the rules that must be overridden. This approach will limit the effect of the exclusion:

  1. The scope is just the listener/workload that is associated with the WAF Policy.
  2. The scope is further limited to just requests where the header matches, allowing all other requests and all OWASP rules to be applied.

Disabled Rules

The final approach that you can use is to disable rules that are creating false positive alerts. A simple workload might only require one or two rules to be disabled. An older & larger workload might require many OWASP rules to be disabled!

If you are going to disable OWASP rules, then do it in the per-listener/per-workload policy. This will limit the effect of the changes to that listener/workload.
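For example, a per-listener policy could disable one noisy SQL injection rule while leaving the rest of the OWASP ruleset active. A hedged Bicep sketch – the policy name is hypothetical and the rule ID is chosen for illustration:

```bicep
param location string = resourceGroup().location

// Hypothetical per-listener policy that disables a single OWASP rule
resource listenerPolicy 'Microsoft.Network/ApplicationGatewayWebApplicationFirewallPolicies@2023-09-01' = {
  name: 'wafpol-api'
  location: location
  properties: {
    policySettings: {
      state: 'Enabled'
      mode: 'Detection'
    }
    managedRules: {
      managedRuleSets: [
        {
          ruleSetType: 'OWASP'
          ruleSetVersion: '3.2'
          ruleGroupOverrides: [
            {
              ruleGroupName: 'REQUEST-942-APPLICATION-ATTACK-SQLI'
              rules: [
                {
                  ruleId: '942430' // restricted SQL character anomaly detection (example)
                  state: 'Disabled'
                }
              ]
            }
          ]
        }
      ]
    }
  }
}
```

Because the override lives in the per-listener policy, every other listener on the WAF keeps that rule enabled.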

This is a fairly easy approach and it is pretty granular – though not as granular as Exclusions. The downside is that you are completely disabling certain protections for an entire listener/workload, leaving the workload vulnerable to attacks of those previously protected types.

Combinations

If you have the time and the data, you can combine different approaches. For example:

  • A webhook that comes from the same IP address all of the time can be allowed via a Custom Rule based on an IP Address filter. Any other traffic will be subject to the full defenses of the WAF.
  • If you have certain headers that must be allowed and you want to enable all other protections for all other traffic then use Exclusions.
  • If traffic can come from anywhere and you need to override OWASP rules, then disable those rules.

No Great Solution

In summary, there is no perfect solution. The best you can do is find the correct override solution for the specific false positive and deploy it to a specific listener or workload. This will limit the holes that you create in the WAF to the absolute minimum while enabling your workloads to function.

Azure Firewall Basic – For Small/Medium Business & “Branch”

Microsoft has just announced a lower cost SKU of Azure Firewall, Basic, that is aimed at small/medium business but could also play a role in “branch office” deployments in Microsoft Azure.

Standard & Premium

Azure Firewall launched with a Standard SKU several years ago. The Standard SKU offered a lot of features, but some things deemed necessary for security were missing: IDPS and TLS Inspection were top of the list. Microsoft added a Premium SKU that added those features as well as fuller web category inspection and URL filtering (not just FQDN).

However, some customers didn’t adopt Azure Firewall because of the price. A lot of those customers were small-medium businesses (SMBs). Another scenario that might be affected is a “branch office” in an Azure region – a smaller footprint that is closer to clients that isn’t a main deployment.

Launching The Basic SKU

Microsoft has been working on a lower-cost SKU for quite a while. The biggest challenge, I think, was trying to figure out how to balance features, performance, and availability with price. They know that the target market has a finite budget, but there are necessary feature requirements. Every customer is different, so I guess when faced with this conundrum, one needs to satisfy the needs of 80% of customers.

The clues for a new SKU have been publicly visible for quite a while – the ARM reference for Azure Firewall documented that a Basic SKU existed somewhere in Azure (in private preview). Tonight, Microsoft launched the Basic SKU in public preview. A longer blog post adds some details.

Introducing the Azure Firewall

The primary target market for the Basic SKU hasn’t deployed a firewall appliance of any kind in Azure – if they are in Azure then they are most likely only using NSGs for security, which operate only at the transport layer (TCP, UDP, ICMP) and in a decentralised way.

The Azure Firewall is a firewall appliance, allowing centralised control. It should be deployed with NSGs and resource firewalls for layered protection, with a zero-trust configuration (deny all by default) in all directions, even inside of a workload.

The Azure Firewall is native to Microsoft Azure – you don’t need a third-party license or support contract. It is fully deployable and configurable as code (ARM, Bicep, Terraform, Pulumi, etc), making it ideal for DevSecOps. Azure Firewall is much easier to learn than NVAs because the firewall is easily available through an Azure subscription and the training (Microsoft Learn) is publicly available – not hidden behind classic training paywalls. Thanks to the community and a platform model, I expect that more people are learning Azure Firewall than any other kind of firewall today – skills are in short supply, so using native tech that is easy to learn, and that many are already learning, just makes sense.

Comparing Azure Basic With Standard and Premium

Microsoft helpfully put together a table to compare the 3 SKUs:

Comparing Azure Firewall Basic with Standard and Premium

Another difference with the Basic SKU is that you must deploy the AzureFirewallManagementSubnet in addition to the AzureFirewallSubnet – this additional subnet is often associated with forced tunneling. The result is that the firewall will have a second public IP address that is used only for management tasks.
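A hedged Bicep sketch of a Basic SKU deployment with the extra management configuration – the subnet, public IP, and policy resource IDs are assumed parameters:

```bicep
param location string = resourceGroup().location
param firewallSubnetId string   // AzureFirewallSubnet resource ID (assumed)
param mgmtSubnetId string       // AzureFirewallManagementSubnet resource ID (assumed)
param publicIpId string         // data-path public IP (assumed)
param mgmtPublicIpId string     // second public IP, used only for management (assumed)
param firewallPolicyId string   // a Basic-tier Azure Firewall Policy (assumed)

resource firewall 'Microsoft.Network/azureFirewalls@2023-09-01' = {
  name: 'fw-basic'
  location: location
  properties: {
    sku: {
      name: 'AZFW_VNet'
      tier: 'Basic'
    }
    firewallPolicy: {
      id: firewallPolicyId
    }
    ipConfigurations: [
      {
        name: 'ipconfig'
        properties: {
          subnet: { id: firewallSubnetId }
          publicIPAddress: { id: publicIpId }
        }
      }
    ]
    // Basic SKU requires a dedicated management configuration,
    // which is why the second subnet and public IP exist
    managementIpConfiguration: {
      name: 'mgmt-ipconfig'
      properties: {
        subnet: { id: mgmtSubnetId }
        publicIPAddress: { id: mgmtPublicIpId }
      }
    }
  }
}
```

Note the split: workload traffic flows via the AzureFirewallSubnet configuration, while the management configuration handles only Microsoft’s control-plane traffic.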

Pricing

The Basic SKU follows the same price model as the higher SKUs: a base compute cost and a data processing cost. The shared pricing is for the Preview so it is subject to change.

The Basic SKU base compute (deployment) cost is €300.03 per month in West Europe. That’s less than 1/3 of the cost of the Standard SKU at €947.54 per month. The data processing cost for the Basic SKU is higher at €0.068 per GB. However, the amount of data passing through such a firewall deployment will be much lower, so it probably will not be a huge add-on.

Preview Deployment Error

At this time, the Basic SKU is in preview. You must enable the preview in your subscription. If you do not do this, your deployment will fail with this error:

“code”: “FirewallPolicyMissingRequiredFeatureAllowBasic”,

“message”: “Subscription ‘someGuid’ is missing required feature ‘Microsoft.Network/AzureFirewallBasic’ for Basic policies.”

Some Interesting Notes

I’ve not had a chance to do much work with the Basic SKU – work is pretty crazy lately. But here are a few things to note:

  • A hub & spoke deployment is still recommended, even for SMBs.
  • Availability zones are supported for higher availability.
  • You are forced to use Azure Firewall Manager/Azure Firewall Policy – this is a good thing because newer features are only in the new management plane.

Final Thoughts

The new SKU of Azure Firewall should add new customers to this service. I expect that larger enterprises will also be interested – not every deployment needs the full-blown Standard/Premium deployment, but some form of firewall is still required.

Azure Firewall DevSecOps in Azure DevOps

In this post, I will share the details for granting the least-privilege permissions to GitHub action/DevOps pipeline service principals for a DevSecOps continuous deployment of Azure Firewall.

Quick Refresh

I wrote about the design of the solution and shared the code in my post, Enabling DevSecOps with Azure Firewall. There I explained how you could break out the code for the rules of a workload and manage that code in the repo for the workload. Realistically, you would also need to break out the gateway subnet route table user-defined route (legacy VNet-based hub) and the VNet peering connection. All the code for this is shared on GitHub – I did update the repo with some structure and with working DevOps pipelines.

This Update

There were two things I wanted to add to the design:

  • Detailed permissions for the service principal used by the workload DevOps pipeline, limiting the scope of change that is possible in the hub.
  • DevOps pipelines so I could test the above.

The Code

You’ll find 3 folders in the Bicep code now:

  • hub: This deploys a (legacy) VNet-based hub with Azure Firewall.
  • customRoles: 4 Azure custom roles are defined. This should be deployed after the hub.
  • spoke1: This contains the code to deploy a skeleton VNet-based (spoke) workload with updates that are required in the hub to connect the VNet and route ingress on-prem traffic through the firewall.

DevOps Pipelines

The hub and spoke1 folders each contain a folder called .pipelines. There you will find a .yml file to create a DevOps pipeline.

The DevOps pipeline uses Azure CLI tasks to:

  • Select the correct Azure subscription & create the resource group
  • Deploy each .bicep file.
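As a rough sketch, such a pipeline might look like the following (the service connection name, variable values, and file path are illustrative assumptions, not the exact contents of the repo):

```yaml
# Hypothetical pipeline sketch – names and values are assumptions.
trigger:
  branches:
    include:
      - main

variables:
  azureServiceConnection: 'hub-arm-connection'   # name of the ARM service connection in DevOps
  subscriptionId: '00000000-0000-0000-0000-000000000000'
  resourceGroupName: 'p-hub'
  location: 'westeurope'

steps:
  - task: AzureCLI@2
    displayName: 'Select subscription and create resource group'
    inputs:
      azureSubscription: $(azureServiceConnection)
      scriptType: bash
      scriptLocation: inlineScript
      inlineScript: |
        az account set --subscription $(subscriptionId)
        az group create --name $(resourceGroupName) --location $(location)

  - task: AzureCLI@2
    displayName: 'Deploy hub Bicep'
    inputs:
      azureSubscription: $(azureServiceConnection)
      scriptType: bash
      scriptLocation: inlineScript
      inlineScript: |
        az deployment group create \
          --resource-group $(resourceGroupName) \
          --template-file hub.bicep
```

The real pipelines in the repo follow the same pattern: one Azure CLI task per deployment step, authenticated via the service connection.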

My design uses one subscription for the hub and one for the workload. You are not glued to this, but you would need to modify how you configure the service principal permissions (below).

To use the code:

  1. Create one repo in DevOps for hub and one for spoke1, and copy in the required code.
  2. Create service principals in Azure AD.
  3. Grant the hub service principal owner rights to the hub subscription.
  4. Grant the spoke service principal owner rights to the spoke subscription.
  5. Create ARM service connections in DevOps settings that use the service principals. Note that the names for these service connections are referred to by azureServiceConnection in the pipeline files.
  6. Update the variables in the pipeline files with subscription IDs.
  7. Create the pipelines using the .yml files in the repos.

Don’t do anything just yet!

Service Principal Permissions

The hub service principal is simple – grant it owner rights to the hub subscription (or resource group).

The workload is where the magic happens with this DevSecOps design. The workload updates the hub using code in the workload repo that affects the workload:

  • Ingress route from on-prem to the workload in the hub GatewaySubnet.
  • The firewall rules for the workload in the hub Azure Firewall (policy) using a rules collection group.
  • The VNet peering connection between the hub VNet and the workload VNet.

That code is deployed by the workload DevOps pipeline, which authenticates using the workload’s service principal – so the workload service principal must have rights over the hub.
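For illustration, the hub-side half of that peering could be sketched in Bicep like this (the VNet names and API version are assumptions – the real code is in the GitHub repo):

```bicep
// Hypothetical sketch: the hub side of the VNet peering, deployed by the
// workload pipeline into the hub resource group.
param spokeVnetId string // resource ID of the workload (spoke) VNet

resource hubVnet 'Microsoft.Network/virtualNetworks@2023-04-01' existing = {
  name: 'vnet-hub' // assumed hub VNet name
}

resource hubToSpoke 'Microsoft.Network/virtualNetworks/virtualNetworkPeerings@2023-04-01' = {
  parent: hubVnet
  name: 'peer-hub-to-spoke1'
  properties: {
    remoteVirtualNetwork: {
      id: spokeVnetId
    }
    allowVirtualNetworkAccess: true
    allowForwardedTraffic: true   // let traffic routed via the firewall flow
    allowGatewayTransit: true     // spoke uses the hub's gateway
    useRemoteGateways: false
  }
}
```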

The quick solution would be to grant contributor rights over the hub and say “we’ll manage what is done through code reviews”. However, a better practice is to limit what can be done as much as possible. That’s what I have done with the customRoles folder in my GitHub share.

Those custom roles should be modified to change the possible scope to the subscription ID (or even the resource group ID) of the hub deployment. There are 4 custom roles:

  • customRole-ArmValidateActionOperator.json: Adds the CUSTOM – ARM Deployment Operator role, allowing the ARM deployment to be monitored and updated.
  • customRole-PeeringAdmin.json: Adds the CUSTOM – Virtual Network Peering Administrator role, allowing a VNet peering connection to be created from the hub VNet.
  • customRole-RoutesAdmin.json: Adds the CUSTOM – Azure Route Table Routes Administrator role, allowing a route to be added to the GatewaySubnet route table.
  • customRole-RuleCollectionGroupsAdmin.json: Adds the CUSTOM – Azure Firewall Policy Rule Collection Group Administrator role, allowing a rules collection group to be added to an Azure Firewall Policy.
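As an example of the shape of these files, a routes administrator role might look something like this (the exact actions and the subscription GUID are assumptions – check the repo for the real definitions):

```json
{
  "Name": "CUSTOM - Azure Route Table Routes Administrator",
  "Description": "Can manage routes within a route table, but not the route table itself.",
  "Actions": [
    "Microsoft.Network/routeTables/read",
    "Microsoft.Network/routeTables/routes/read",
    "Microsoft.Network/routeTables/routes/write",
    "Microsoft.Network/routeTables/routes/delete"
  ],
  "NotActions": [],
  "AssignableScopes": [
    "/subscriptions/00000000-0000-0000-0000-000000000000"
  ]
}
```

The AssignableScopes value is where you narrow the role to the hub subscription (or resource group) ID.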

Deploy The Hub

The hub is deployed first – this is required to grant the permissions that are required by the workload’s service principal.

Grant Rights To Workload Service Principals

The service principals for all workloads will be added to an Azure AD group (Workloads Pipeline Service Principals in the above diagram). That group is nested into 4 other AAD security groups:

  • Resource Group ARM Operations: This is granted the CUSTOM – ARM Deployment Operator role on the hub resource group.
  • Hub Firewall Policy: This is granted the CUSTOM – Azure Firewall Policy Rule Collection Group Administrator role on the Azure Firewall Policy that is associated with the hub Azure Firewall.
  • Hub Routes: This is granted the CUSTOM – Azure Route Table Routes Administrator role on the GatewaySubnet route table.
  • Hub Peering: This is granted the CUSTOM – Virtual Network Peering Administrator role on the hub virtual network.

Deploy The Workload

The workload now has the required permissions to deploy the workload and make modifications in the hub to connect the hub to the outside world.

Defending Against Supply Chain Attacks

In this post, I will discuss the concepts of supply chain attacks and some thoughts around defending against them.

What Is A Supply Chain Attack?

The recent ransomware attack through Kaseya made the news, but the concept of a supply chain attack isn’t new at all. Without doing any research, I can think of two other examples:

  • SolarWinds: In December 2020, attackers used compromised code in SolarWinds monitoring solutions to compromise customers of SolarWinds.
  • RSA: In 2011, the Chinese PLA (or hackers sponsored by them) compromised RSA and used that access to attack customers of RSA.

What is a supply chain attack? It’s pretty hard to break into a network, especially one that has hardened itself. Users can be educated – ok, some will never be educated! Networks can be hardened and micro-segmented. Identity protections such as MFA and threat detection can be put in place.

But there remains a weakness – or several of them. There’s always a way into a network – the third party! Even the most secure network deployments require some kind of monitoring system – something where a piece of software is deployed onto “every VM”. Or there’s some software vendor that’s deep into your network with openings all over the place. Those are your threats. If an attacker compromises the software from one of those vendors, then they will get into your network during your next update, and they will use the existing firewall holes and permissions required by that software to probe, spread, and attack.

Protection

You still need to have your first lines of defense, ideally using tools that are designed for protection against advanced persistent threats – not your regular AV package, dummy:

  1. Identity
  2. Email
  3. Firewall
  4. Backup with isolated offline storage protected by MFA

That’s a start, but a supply chain attack bypasses all of that by using existing channels to enter your network as if it came from a trusted source – because the attack is embedded in the code of a trusted source.

Micro-Segmentation

The first step should be micro-segmentation (AKA multi-segmentation). No two nodes on your network should be able to communicate unless:

  1. They have to
  2. They are restricted to the required directions, protocols, and ports.
  3. That traffic passes through a firewall – and ideally several firewalls.

In Microsoft Azure, that means using:

  • A central firewall, in the form of a network firewall and/or web application firewall (Azure or NVA). This firewall controls connections between the outside world and your workloads, between your workloads, and importantly from your workloads to the outside world (prevents malware from talking to its human controller).
  • Network Security Groups at the subnet level that protect the subnet and even isolate nodes inside the subnet (use a custom Deny All rule because the default Deny All rule is useless when you understand the logic of how it works).
  • Resource firewalls – that’s the guest OS firewall and Azure resource firewalls.
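To make the NSG point concrete, here is a minimal Bicep sketch of a subnet NSG with one allow rule and an explicit custom deny-all rule (the names, address prefixes, and ports are assumptions):

```bicep
// Hypothetical sketch: an NSG that allows only the required flow and then
// explicitly denies everything else at a low priority.
resource nsg 'Microsoft.Network/networkSecurityGroups@2023-04-01' = {
  name: 'nsg-workload'
  location: resourceGroup().location
  properties: {
    securityRules: [
      {
        name: 'AllowWebInbound'
        properties: {
          priority: 100
          direction: 'Inbound'
          access: 'Allow'
          protocol: 'Tcp'
          sourceAddressPrefix: 'VirtualNetwork'
          sourcePortRange: '*'
          destinationAddressPrefix: '10.10.1.0/24' // assumed workload subnet
          destinationPortRange: '443'
        }
      }
      {
        // Custom deny-all: unlike the built-in DenyAllInBound default rule,
        // this also blocks VirtualNetwork-to-VirtualNetwork traffic.
        name: 'DenyAllInbound'
        properties: {
          priority: 4000
          direction: 'Inbound'
          access: 'Deny'
          protocol: '*'
          sourceAddressPrefix: '*'
          sourcePortRange: '*'
          destinationAddressPrefix: '*'
          destinationPortRange: '*'
        }
      }
    ]
  }
}
```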

If you have a Windows ADDS domain, use Group Policy to force the use of Windows Firewall – lazy admins and those very same vendors that will be the channel of attack will be the first to attempt to disable the firewall on machines that they are working on.

For Azure resources, consider using Azure Policy to force/audit the use of the firewalls in your resources and a default route (0.0.0.0/0) via your central firewall.
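That default route can be sketched in Bicep as follows (the route table name and the firewall’s private IP are assumptions):

```bicep
// Hypothetical sketch: force all egress traffic through the hub firewall
// by associating this route table with the workload subnets.
resource routeTable 'Microsoft.Network/routeTables@2023-04-01' = {
  name: 'udr-spoke1'
  location: resourceGroup().location
  properties: {
    routes: [
      {
        name: 'DefaultViaFirewall'
        properties: {
          addressPrefix: '0.0.0.0/0'
          nextHopType: 'VirtualAppliance'
          nextHopIpAddress: '10.0.1.4' // hub Azure Firewall private IP (assumed)
        }
      }
    ]
  }
}
```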

An infrastructure-as-code approach to the central firewall (Azure Firewall) and NSGs brings documentation, change control, and rollback to network security.

Security Monitoring

This is where most organisations fail, and even IT security officers really don’t get it.

Syslog is not security monitoring. Your AV is not security monitoring. You need something bigger – something automated that can filter through the noise; I regularly use the phrase “be your Neo to read the Matrix”. Even in a small network there is a lot of noise, and something needs to filter through that noise and identify the threats.

For example, there’s a lot of TCP 445 connection attempts coming from one IP address. Or there are lots of failed attempts to sign in as a user from one IP address. Or there are lots of failed connections logged by NSG rules. Or even better – all of the above. These are the sorts of things that malware that is attempting to spread will do. This is the sort of work that Azure Sentinel is perfect for – Sentinel connects to many data sources, pulls that data to a central place where complex queries can be run to look for threats that a human won’t be able to do. Threats can create incidents, incidents can trigger automated flows to eliminate the noise, and the remaining incidents can create alerts that humans will act upon.
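As a hedged illustration of the kind of query Sentinel can run, the failed sign-in example might look something like this in KQL (the threshold and time window are assumptions):

```kusto
// Hypothetical KQL sketch for Azure Sentinel: flag source IPs with many
// failed Azure AD sign-ins in a short window.
SigninLogs
| where TimeGenerated > ago(1h)
| where ResultType != "0"            // non-zero ResultType = failed sign-in
| summarize FailedAttempts = count(), TargetUsers = dcount(UserPrincipalName) by IPAddress
| where FailedAttempts > 20
| order by FailedAttempts desc
```

A query like this can back an analytics rule, and the resulting incidents can feed the automated triage flows described above.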

But some malware is clever and not so noisy. The malware that hit the HSE (the Irish national health service) used a lot of manual control to quietly spread over a very long time. Restricting outbound access to the Internet to just the connections required for business needs will cripple this control mechanism. But there is still an automated element to this malware.

Other things to implement in Azure will include:

  • IDPS: An intrusion detection & prevention system in the firewall, for example Azure Firewall Premium. When known malware/attack flows pass through the firewall, the firewall can log an alert or alert/deny the flows.
  • Security Center: Enabling Security Center’s “Azure Defender” (the tier previously known as Azure Security Center Standard) provides you with oodles of new features, including some endpoint protections that are very confusingly packaged and licensed by Microsoft.

Managed Services Providers

MSPs are a part of the supply chain for their customers. MSP staff typically have credentials that allow them into many customer networks/services. That makes the identities of those staff very valuable.

A managed service provider should be a leader in identity security process, tooling, and governance. In the Microsoft world, that means using Azure AD Premium with MFA enabled for all staff. In the Azure world, Lighthouse should be used to gain access to customers’ cloud implementations. And that access should be zero-trust, powered by Privileged Identity Management (PIM).

Oh Cr@p!

These attackers are not script kiddies. They are professional organisations with big budgets, very skilled programmers and operators, and a lot of time and will. They know that with some persistent effort targeting a vendor, they can enter a lot of networks with ease. Hitting a systems management company or, more scarily, a security vendor reaps BIG rewards because we invest in these products to secure our entire networks. The other big worry is those vendors that are deeply embedded in certain verticals such as finance or government. Imagine a vendor that is in every branch of a national government – one successful attack could bring down that entire government after a wave of upgrades! Or hitting a well-known payment vendor could open up every bank in the EU.