WAF

Designing Network Security To Combat Modern Threats

In this post, I want to discuss how one should design network security in Microsoft Azure, dispensing with past patterns and combatting threats that are crippling businesses today.

The Past

Network security did not change much for a very long time. The classic network design is focused on an edge firewall.”All the bad guys are trying to penetrate our network from the Internet” so we’ll put up a very strong wall at the edge. With that approach, you’ll commonly find the “DMZ” network; a place where things like web proxies and DNS proxies isolate interior users and services from the Internet.

The internal network might be made up of two/more VLANs. For example, one or more client device VLANs and a server VLAN. While the route between those VLANs might pass through the firewall, it probably didn’t; they really “routed” through a smart core switch stack and there was limited to no firewall isolation between those VLANs.

This network design is fertile soil for malware. Ports usually are not let open to attack on the edge firewall. Hackers aren’t normally going to brute force their way through a firewall. There are easier ways in such as:

Send an “invoice” PDF to the accounting department that delivers a trojan horse.
Impersonate someone, ideally someone that travels and shouts a lot, to convince a helpful IT person to reset a password.
Target users via phishing or spear phishing.
Cimpromise some upstream include that developers use and use it to attack from the servers.
Use a SQL injection attack to open a command prompt on an internal server.
And on and on and …

In each of those cases, the attack comes from within. The spread of the blast (the attack) is unfettered. The blast area (a term used to describe the spread of an attack) is the entire network.

Secure Zones To The Rescue!

Government agencies love a nice secure zone architecture. This is a design where sensitive systems, such as GDRP data or secrets are stored on an isolated network.

Some agencies will even create a whol duplicate network that is isolated, forcing users to have two PCs – one “regular” one on the Internet-connected network and a “secure” PC that is wired onto an isolated network with limited secret services.

Realistically, that isolated network is of little value to most, but if you have that extreme a need – then good luck. By the way, that won’t work in The Cloud 🙂 Back to the more regular secure zone …

A special VLAN will be deployed and firewall rules will block all traffic into and out of that secure zone. The user experience might be to use Citrix desktops, hosted in the secure zone, to access services and data in that secure zone. But then reality starts cracking holes in the firewall’s deny all rules. No line of business app lives alone. They all require data from somewhere. Or there are integrations. Printers must be used. Scanners need to scan and share data. And legacy apps often use:

Domain (ADDS) credentials (how many ports do you need for that!!!)
SMB (TCP 445) for data transfer and integration

Over time, “deny all” becomes a long list of allow * from X to *, and so on, with absolutely no help from the app vendors.

The theory is that if an attack is commenced, then the blast area will be limited to the client network and, if it reaches the servers, it will be limtied to the Internal network. But this design fails to understand that:

An attack can come from within. Consider the scneario where compromised runtimes are used or a SQL injection attack breaks out from a database server.
All the required integrations open up holes between the secure zone and the other networks, including those legacy protocols that things like ransomware live on.
If one workload in the secure zone is compromised, they all are because there is no network segmentation inside of the VLAN.

And eventually, the “secure zone” is no more secure than the Internal network.

Don’t Block The Internet!!!

I’m amazed how many organisations do not block outbound access to the Internet. It’s just such hard work to open up firewall rules for all these applications that have Internet dependencies. I can understand that for a client VLAN. But the server VLAN such be a controlled space – if it’s not known & controlled (i.e. governed) then it should not be permitted.

A modern attack, an advanced persistent threat (APT), isn’t just some dumb blast, grab, and run. It is a sneaky process of:

Penetration
Discovery, often manually controlled
Spread, often manually controlled
Steal
Destroy/encrypt/etc

Once an APT gets in, it usually wants to call home to pull instructions down from a rogue IP address or compromised bot. When the APT wants to steal data, to be used as blackmail and/or to be sold on the Darknet, the malware will seek to upload data to the Internet. Both of these actions are taking advantage of the all-too-common open access to the Internet.

Azure is Different

Years of working with clients has taught me that there are three kinds of people when it comes to Azure networking:

Those who managed on-premises networks: These folks struggle with Azure networking.
Those who didn’t do on-premises networking, but knew what to ask for: These folks take to Azure networking quite quickly.
Everyone else: Irrelevant to this topic

What makes Azure networking so difficult for the network admins? There is no cabling in the fabric – obviously there is cabling in the data centres but it’s all abstracted by the VXLAN software-defined networks. Packets are encapsulated on the source virtual machine’s host, transmitted over the physical network, decapstulated on the destination virtual machine host, and presented to the destination virtual machine’s NIC. In short, packets leave the source NIC and magically arrive on the destination NIC with no hops in between – this is why traceroute is pointless in Azure and why the default gateway doesn’t really exist.

I’m not going to use virtual machines, Aidan. I’m doing PaaS and serverless computing. In Azure, everything is based on virtual machines, unless they are explcitly hosted on physical hosts (Azure VMware Services and some SAP stuff, for example). Even Functions run on a VM somewhere hidden in the platform. Serverless means that you don’t need to manage it.

The software-defined thing is why:

Partitioned subnets for a firewall appliance (front, back, VPN, and management) offer nothing from a security perspective in Azure.
ICMP isn’t as useful as you’d imagine in Azure.
The concept of partitioning workloads for security using subnets is not as useful as you might think – it’s actually counter-productive over time.

Transformation

I like to remind people during a presentation or a project kickoff that going on a cloud journey is supposed to result in transformation. You now re-evaluate everything and find better ways to do old things using cloud-native concepts. And that applies to network security designs too.

Micro-Segmentation Is The Word

Forget “Greece”, get on board with what you need to counter today’s threats: micro-segmentation. This is a concept where:

We protect the edge, inbound and outbound, permitting only required traffic.
We apply network isolation within the workload, permitting only required traffic.
We route traffic between workloads through the edge firewall, , permitting only required traffic.

Yes, more work will be required when you migrate existing workloads to Azure. I’d suggest using Azure Migrate to map network flows. I never get to do that – I always get the “messy migration projects” and I never get to use Azure Migrate – so testing and accessing and understanding NSG Traffic Analytics and the Azure Firewall/firewall logs via KQL is a necessary skill.

Security Classification

Every workload should go through a security classification process. You need to weigh risk verus complexity. If you max the security, you will increase costs and difficulty for otherwise simple operations. For example, a dev won’t be able to connect Visual Studio straight to an App Service if you deploy that App Service on a private or isolated App Service Plan. You also will have to host your own DevOps agents/GitHub runners because the Microsoft-hosted containers won’t be able to reach your SCM endpoints.

Every piece of compute is a potential attack vector: a VM, an App Service, a Function, a Container, a Logic App. The question is, if it is compromised, will the attacker be able to jump to something else? Will the data that is accessible be secret, subject to regulation, or reputational damage?

This measurement process will determine if a workload should use resources that:

Have public endpoints (cheapest and easiest).
Use private endpoints (medium levels of cost, complexity, and security).
Use full VNet integration, such as an App Service Environment or a virtual machine (highest cost/complexity but most secure).

The Virtual Network & Subnet

Imagine you are building a 3-tier workload that will be isolated from the Internet using Azure virtual networking:

Web servers on the Internet
Middle tier
Databases

Not that long ago, we would have deployed that workload on 3 subnets, one for each tier. Then we would have built isolation using Network Security Groups (NSGs), one for each subnet. But you just learned that a SD-network routes packets directly from NIC to NIC. An NSG is a Hyper-V Port ACL that is implemented at the NIC, even if applied at the subnet level. We can create all the isolation we want using an NSG within the subnet. That means we can flatten the network design for the workload to one subnet. A subnet-associated subnet will restrict communications between the tiers – and ideally between nodes within the same tier. That level of isolation should block everything … should 🙂

Tips for virtual networks and subnets:

Deploy 1 virtual network per workload: Not only will this follow Azure Cloud Adoption Framework concepts, but it will help your overall security and governance design. Each workload is placed into a spoke virtual network and peered with a hub. The hub is used only for external connectivity, the firewall, and Azure Bastion (assuming this is not a vWAN hub).
Assign a single prefix to your hub & spoke: Firewall and NSG rules will be easier.
Keep the virtual newtorks small: Don’t waste your address space.
Flatten your subnets: Only deploy subnets when there is a technical need, for example VMs and private endpoints are in one subnet, VNet integration for an App Services plan is in another, a SQL managed instance, is in a third.

Resource Firewalls

It’s sad to see how many people disable operating system firewalls. For example, Group Policy is used to diable Windows Firewall. Don’t you know that Microsoft and Linux added those firewalls to protect machines from internal attacks? Those firewalls should remain operational and only permit required traffic.

Many Azure resources also offer firewalls. App Services have firewalls. Azure SQL has a firewall. Use them! The one messy resource is the storage account. The location of the endpoints for storage clusters is in a weird place – and this causes interesting situations. For example, a Logic App’s storage account with a configured firewall will prevent workflows from being created/working correctly.

Network Security Groups

Take a look at the default inbound rules in an NSG. You’ll find there is a Deny All rule which is the lowest possible priority. Just up from that rule, is a built in rule to allow traffic from VirtualNetwork. VirtualNetwork includes the subnet, the virtual network, and all routed networks, including peers and site-to-site connections. So all traffic from internal networks is … permitted! This is why every NSG that I create has a custom DenyAll rule with a priority of 4000. Higher priority rules are created to permit required traffic and only that required traffic.

Tips with your NSGs:

Use 1 NSG per subnet: Where the subnet resources will support an NSG. You will reduce your overall complexity and make troubleshooting easier. Remember, all NSG rules are actually applied at the source (outbound rules) or target (inbound rules) NIC.
Limit the use of “any”: Rules should be as accurate as possible. For example: Allow TCP 445 from source A to destination B.
Consider the use of Application Security Groups: You can abstract IP addresses with an Application Security Group (ASG) in an NSG rule. ASGs can be used with NICs – virtual machines and private endpoints.
Enable NSG Flow Logs & Traffic Analytics: Great for troubleshooting networking (not just firewall stuff) and for feeding data to a SIEM. VNet Flow Logs will be a superior replacement when it is ready for GA.

The Hub

As I’ve implied already, you should employ a hub & spoke design. The hub should be simple, small and free of compute. The hub:

Makes connections using site-to-site networking using SD-WAN, VPN, and/or ExpressRoute.
Hosts the firewall. The firewall blocks everything in every direction by default,
Hosts Azure Bastion, unless you are running Azure Virtual WAN – then deploy it to a spoke.
Is the “Public IP” for egress traffic for workloads trying to reach the Internet. All egress traffic is via the firewall. Azure Policy should be used to restrict Public IP Addresses to just those requires that require it – things like Azure Bastion require a public IP and you should create a policy override for each required resource ID.

My preference is to use Azure Firewall. That’s a long conversation so let’s move on to another topic; Azure Bastion.

Most folks will go into Azure thinking that they will RDP/SSH straight to their VMs. RDP and SSH are not perfect. This is something that the secure zone concept recognised. It was not unusual for admins/operators to use a bastion host to hop via RDP or SSH from their PC to the required server via another server. RDP/SSH were not open directly to the protected machines.

Azure Bastion should offer the same isolation. Your NSG rules should only permit RDP/SSH from:

The AzureBastionSubnet
Any other bastion hosts that might be employed, typically by developers who will deploy specialist tools.

Azure Bastion requires:

An Entra ID sign-in, ideally protected by features such as conditional access and MFA, to access the bastion service.
The destination machine’s credentials.

Routing

Now we get to one of my favourite topics in Azure. In the on-prem world we can control how packets get from A to B using cables. But as you’ve learned, we can run cables in Azure. But we can control the next hop of a packet.

We want to control flows:

Ingress from site-to-site networking to flow through the hub firewall: A route in the GatewaySubnet to use the hub firewall as the next hop.
All traffic leaving a spoke (workload virtual network) to flow through the hub firewall: A route to 0.0.0.0/0 using the firewall backend/private IP as the next hop.
All traffic between hub & spokes to flow through the remote hub firewall: A route to the remote hub & spoke IP prefix (see above tip) with a next hop of the remote hub firewall.

If you follow my tips, especially with the simple hub, then the routing is actually quite easy to implement and maintain.

Tips:

Keep the hub free of compute.
NSG Traffic Analytics helps to troubleshoot.

Web Application Firewall

The hub firewall shold not be used to present web applications to the Internet. If a web app is classified as requireing network security, then it should be reverse proxied using a Web Application Firewall (WAF). This specialised firewall inspects traffic at the application layer and can block threats.

The WAF will have a lot of false positives. Heavy traffic applications can produce a lot of false positives in your logs; in the case of Log Analytics, the ingestion charge can be huge so get to optimising those false positives as quickly as you can.

My preference is to route the WAF through the hub firewall to the backend applications. The WAF is a form of compte, even the Azure WAF. If you do not need end-to-end TLS, then the firewall could be used to inspect the HTTP traffic from the WAF to the backend using Intrusion Detection Prevention System (IDPS), offering another layer of protection.

Azure offers a couple of WAF options. Front Door with WAF is architecturally interesting, but the default design is that the backend has a public endpoint that limits access to your Front Door instance at the application layer. What if the backend is network connected for max protection? Then you get into complexities with Private Link/Private Endpoint.

A regional WAF is network connected and offers simpler networking, but it sacrifices the performance boosts from Front Door. You can combine Front Door with a regional WAF, but there are more costs with this.

Third party solutions are posisble Services such as Cloud Flare offer performance and security features. One could argue that Cloud Flare offers more features. From the performance perspective, keep in mind that Cloud Flare has only a few peering locations with the Microsoft WAN, so a remote user might have to take a detour to get to your Azure resources, increasing latency.

You can seek out WAF solutions from the likes of F5 and Citrix in the Azure Marketplace. Keep in mind that NVAs can continue skills challenges by siloing the skill – native cloud skills are easier to develop and contract/hire.

Summary

I was going to type something like “this post gives you a quick tour of the micro-segmentation approach/features that you can use in Azure” but then I reaslised that I’ve had keyboard diarrhea and this post is quite Sinofskian. What I’ve tried to explain is that the ways of the past:

Don’t do much for security anymore
Are actually more complex in architecture than Azure-native patterns and solutions that will work.

If you implement security at three layers, assuming that a breach will happen and could happen anywhere then you limit the blast area of a threat:

The edge, using the firewall and a WAF
The NIC, using a Network Security Group
The resource, using a guest OS/resource firewall

This trust-no-one approach that denies all but the minimum required traffic will make life much harder for an attacker. Including logging and the use of a well configured SIEM will create trip wires that an attacker must trip over to attempt an expansion. You will make their expansion harder & slower, and make it easier to detect them. You will also limit how much they can spread and how much the damage that the attack can create. Furthermore, you will be following the guidance the likes of the FBI are recommending.

There is so much more to consider when it comes to security, but I’ve focused on micro-segmentation in a network context. People do think about Entra ID and management solutions (such as Defender for Cloud and/or SIEM) but they rarely think through the network design by assuming that what they did on-prem will still be fine. It won’t because on-prem isn’t fine right now! So take my advice, transform your network, and protect your assets, shareholders, and your career.

Azure WAF and False Positives

This post will explain how to override false positives in the (network) Azure Web Application Firewall (WAF), without compromising security, using one of four methods in combination with a tiered WAF Policy architecture:

Managed Rulesets
Custom Rules
Exclusions
Disabled rules

False Positives

A WAF is a rather simple solution, attempting to inspect L7 (application layer) traffic and intercept attacks such as protocol misuse, SQL injection, or cross-site scripting. Unfortunately, false positives can occur.

For example, let’s assume that an API app is securely shared using a WAF. Messages sent to the API might be formatted in JSON, with lots of special characters to format the message. SQL Inspection defenses count special characters, trying to find where an attacker is trying to escape out of a web request to create a database command that will execute. If the defense counts too many special characters (it will!) then an alert will be created and the message will be blocked if Prevention mode is enabled.

One must allow that traffic through because it is expected traffic that the application (and the business) requires. But one must do this without opening up too many holes in the WAF, making the WAF a costly, pointless existence.

Log Analytics Ingestion Charge

There is a side effect to false positives. False positives will vastly outnumber actual attack/probing attempts. Busy workloads can generate huge amounts of logs for false positives. If you use Log Analytics, that data has a cost:

Storage: Not too bad
Ingestion: This one is painful

The way to reduce the cost is to reduce the noise by overriding the detections that create false positives. Organizations that have a lot of web traffic could save a significant amount of money here.

WAF Policies

The WAF functionality of the Azure Application Gateway (AppGw) is managed by a resource called an Application Gateway WAF Policy (WAF Policy). The typical approach is to associate 1 WAF Policy with a WAF resource. The WAF policy will create customizations. For reasons that should become apparent later, I am going to urge you to take a slightly more granular approach to manage your WAF if your WAF is used to securely share more than one workload or listener:

WAF parent policy: A WAF policy will be associated with the WAF. This policy will apply to the WAF and all listeners unless another WAF Policy overrides specific settings.
Per-Listener/Per-Workload policy: This is a policy that is created specifically for a listener or a workload (a set of listeners). Any customisations that apply only to a listener or a workload will be applied here, without affecting any other listener or workload.

Methodology

You will never know what false positives you will encounter. If your WAF goes straight into Prevention mode then you will create a world of pain and be the recipient of a lot of hate-messages/emails.

Here’s the approach that I recommend:

Protect your WAF with an NSG that has Traffic Analytics enabled. The NSG should only allow the necessary HTTP, HTTPS, WAF monitoring (from Azure), and load balancing traffic. Use a custom deny-all rule to block everything else.
Enable monitoring for the Application Gateway, sending all logs to a queryable destination such as Log Analytics.
Monitor traffic for a period of time – enough to allow expected normal usage of the full systems. Your monitoring should detect the false positives.
Verify that Traffic Analytics did not record malicious IP addresses hitting your WAF.
Query your monitoring data to find the false positives for each listener. Identify the hostname, request URI, ruleset, rule group, and rule ID that is causing the issue on a per-listener/workload basis.
Ideally, developers fix any issues that create false positives but this is unlikely – so we’ll move on.
Determine your override strategy (see below).
Deploy your overrides with the policies still in Detection mode.
Monitor traffic for another period of time to ensure that there are no more false positives.
Switch the parent policy to Prevention Mode.
Swith each per-listener/per-workload policy to Prevention Mode
Monitor

Managed Rule Sets

The WAF today has two rulesets that you can use:

OWASP: Used to detect attacks such as SQL Injection, Cross-site scripting, and so on.
Microsoft Bot Manager Rule Set: Used to prevent malicious bots from browsing/attacking your workloads.

You need the OWASP ruleset – but we will need to manage it (later). The bot ruleset, in my experience, creates a huge amount of noise will no way of creating granular overrides. One can override the bot ruleset using custom rules, but as you’ll see later, that’s a big stick that is not granular at all!

My approach to this is to disable the Microsoft Bot Manager Rule Set (or leave it disabled) in the parent and child rulesets. If I have a need to enable it somewhere, I can do it in a per-listener or per-workload ruleset.

Custom Rules

A custom rule is created in a WAF Policy to force traffic that matches certain criteria to be:

Always allowed
Always denied
Logged only without denying it

You can create a sequence of filters based on:

IP Address
Number
String
Geo Location

If the set of filters matches a request then your desired action will apply. For example, if I want to force traffic to be allowed to my API, I can enter the API URI as one of the filters (as above) and all traffic will be allowed.

Yes, all traffic will be allowed, including traffic that is not a false positive. If I only had a few OWASP rules that were blocking the traffic, the custom rule would disable all OWASP rules.

If you must use this approach, then implement it in the child policy so it is limited to the associated listener/workload.

Exclusions

This is the newest of the override types in WAF Policy – and I’ve found it to be the least useful.

The theory is that you can create an exclusion for one or more OWASP rules based on the values of request headers. For example, if a header called RequestHeaderKeys contains a value of X-Scanner you can instruct the affected OWASP rules to be disabled. This sounds really powerful and quite granular. But this starts to fall apart with other scenarios, such as the aforementioned SQL Injection.

Another common rule that alerts on or blocks traffic is Missing User Agent Header. Exclusions work on the value of a header, so if the header is missing, Exclusions cannot evaluate it.

Another gotcha is that you cannot combine header filters to create an exclusion. The Azure Portal experience for creating an Exclusion makes it look like you can. However, the result is two or more Exclusions that work independently.

If Exclusions will work for you, implement them in the per-listener/per-workload policy and specify only the rules that must be overridden. This approach will limit the effect of the exclusion:

The scope is just the listener/workload that is associated with the WAF Policy.
The scope is further limited to just requests where the header matches, allowing all other requests and all OWASP rules to be applied.

Disabled Rules

The final approach that you can use is to disable rules that are creating false positive alerts. A simple workload might only require one or two rules to be disabled. An older & larger workload might require many OWASP rules to be disabled!

If you are going to disable OWASP rules, then do it in the per-listener/per-workload policy. This will limit the effect of the changes to that listener/workload.

This is a fairly each approach and it is pretty granular – not as much as Exclusions. The downside is that you are completely disabling certain protections for an entire listener/workload, leaving the workload vulnerable to attacks of those previously protected types.

Combinations

If you have the time and the data, you can combine different approaches. For example:

A webhook that comes from the same IP address all of the time can be allowed via a Custom Rule based on an IP Address filter. Any other traffic will be subject to the fill defenses of the WAF.
If you have certain headers that must be allowed and you want to enable all other protections for all other traffic then use Exclusions.
If traffic can come from anywhere and you need to override OWASP rules, then disable those rules.

No Great Solution

In summary, there is no perfect solution. The best you can do is find the correct override solution for the specific false positive and deploy it to a specific listener or workload. This will limit the holes that you create in the WAF to the absolute minimum while enabling your workloads to function.

Azure App Service, Private Endpoint, and Application Gateway/WAF

In this post, I will share how to configure an Azure Web App (or App Service) with Private Endpoint, and securely share that HTTP/S service using the Azure Application Gateway, with the optional Web Application Firewall (WAF) feature. Whew! That’s lots of feature names!

Background

Azure Application (App) Services or Web Apps allows you to create and host a web site or web application in Azure without (directly) dealing with virtual machines. This platform service makes HTTP/S services easy. By default, App Services are shared behind a public/ & shared frontend (actually, load-balanced frontends) with public IP addresses.

Earlier this year, Microsoft released Private Link, a service that enables an Azure platform resource (or service shared using a Standard Tier Load Balancer) to be connected to a virtual network subnet. The resource is referred to as the linked resource. The linked resource connects to the subnet using a Private Endpoint. There is a Private Endpoint resource and a special NIC; it’s this NIC that shares the resource with a private IP address, obtained from the address space of the subnet. You can then connect to the linked resource using the Private Endpoint IPv4 address. Note that the Private Endpoint can connect to many different “subresources” or services (referred to as serviceGroup in ARM) that the linked resource can offer. For example, a storage account has serviceGroups such as file, blob, and web.

Notes: Private Link is generally available. Private Endpoint for App Services is still in preview. App Services Premium V2 is required for Private Endpoint.

The Application Gateway allows you to share/load balance a HTTP/S service at the application layer with external (virtual network, WAN, Internet) clients. This reverse proxy also offers an optional Web Application Firewall (WAF), at extra cost, to protect the HTTP/S service with the OWASP rule set and bot protection. With the Standard Tier of DDoS protection enabled on the Application Gateway virtual network, the WAF extends this protection to Layer-7.

Design Goal

The goal of this design is to ensure that all HTTP/S (HTTPS in this example) traffic to the Web App must:

Go through the WAF.
Reverse proxy to the App Service via the Private Endpoint private IPv4 address only.

The design will result in:

Layer-4 protection by an NSG associated with the WAF subnet. NSG Traffic Analytics will send the data to Log Analytics (and optionally Azure Sentinel for SIEM) for logging, classification, and reporting.
Layer-7 protection by the WAF. If the Standard Tier of DD0S protection is enabled, then the protection will be at Layer-4 (Application Gateway Public IP Address) and Layer-7 (WAF). Logging data will be sent to Log Analytics (and optionally Azure Sentinel for SIEM) for logging and reporting.
Connections directly to the web app will fail with a “HTTP Error 403 – Forbidden” error.

Note: If you want to completely prevent TCP connections to the web app then you need to consider App Service Environment/Isolated Tier or a different Azure platform/IaaS solution.

Design

Here is the design – you will want to see the original image:

There are a number of elements to the design:

Private DNS Zone

You must be able to resolve the FQDNs of your services using the per-resource type domain names. App Services use a private DNS zone called privatelink.azurewebsites.net. There are hacks to get this to work. The best solution is to create a central Azure Private DNS Zone called privatelink.azurewebsites.net.

If you have DNS servers configured on your virtual network(s), associate the Private DNS Zone with your DNS servers’ virtual network(s). Create a conditional forwarder on the DNS servers to forward all requests to privatelink.azurewebsites.net to 168.63.129.16 (https://docs.microsoft.com/en-us/azure/virtual-network/what-is-ip-address-168-63-129-16). This will result in:

A network client sending a DNS resolution request to your DNS servers for *.privatelink.azurewebsites.net.
The DNS servers forwarding the requests for *.privatelink.azurewebsites.net to 168.63.129.16.
The Azure Private DNS Zone will receive the forwarded request and respond to the DNS servers.
The DNS servers will respond to the client with the answer.

App Service

As stated before the App Service must be hosted on a Premium v2 tier App Service Plan. In my example, the app is called myapp with a default URI of https://myapp.azurewebsites.net. A virtual network access rule is added to the App Service to permit access from the subnet of the Application Gateway. Don’t forget to figure out what to do with the SCM URI for DevOps/GitHub integration.

Private Endpoint

A Private Endpoint was added to the App Service. The service/subresource/serviceGroup is sites. Automatically, Microsoft will update their DNS to modify the name resolution of myapp.azurewebsites.net to resolve to myapp.privatelink.azurewebsites.net. In the above example, the NIC for the Private Endpoint gets an IP address of 10.0.64.68 from the AppSubnet that the App Service is now connected to.

Add an A record to the Private DNS Zone for the App Service, resolving to the IPv4 address of the Private Endpoint NIC. In my case, myapp.privatelink.azurewebsites.net will resolve to 10.0.64.68. This in turn means that myapp.azurewebsites.net > myapp.privatelink.azurewebsites.net > 10.0.64.68.

Application Gateway/WAF

Add a new Backend Pool with the IPv4 address of the Private Endpoint NIC, which is 10.0.64.68 in my example.
Create a multisite HTTPS:443 listener for the required public URI, which will be myapp.joeelway.com in my example, adding the certificate, ideally from an Azure Key Vault. Use the public IP address (in my example) as the frontend.
Set up a Custom Probe to test https://myapp.azurewebsites.net:443 (using the hostname option) with acceptable responses of 200-399.
Create an HTTP Setting (the reverse proxy) to forward traffic to https://myapp.azurewebsites.net:443 (using the hostname option) using a well-known certificate (accepting the default cert of the App Service) for end-to-end encryption.
Bind all of the above together with a routing rule.

Public DNS

Now you need to get traffic for https://myapp.joeelway.com to go to the (public, in my example) frontend IP address of the Application Gateway/WAF. There are lots of ways to do this, including Azure Front Door, Azure Traffic Manager, and third-party solutions. The easy way is to add an A record to your public DNS zone (joeelway.com, in my example) that resolves to the public IP address of the Application Gateway.

The Result

A client browses https://myapp.joeelway.com.
The client name resolution goes to public DNS which resolves myapp.joeelway.com to the public IP address of the Application Gateway.
The client connects to the Application Gateway, requesting https://myapp.joeelway.com.
The Listener on the Application Gateway receives the connection.
- Any WAF functionality inspects and accepts/rejects the connection request.
The Routing Rule in the Application Gateway associates the request to https://myapp.joeelway.com with the HTTP Setting and Custom Probe for https://myapp.azurewebsites.net.
- The custom probe is constantly testing the availability of https://myapp.azurewebsites.net and a request is reverse proxied if the service is available.
The Application Gateway routes the request for https://myapp.joeelway.com to https://myapp.azurewebsites.net at the IPv4 address of the Private Endpoint (documented in the Application Gateway Backend Pool).
The App Service receives and accepts the request for https://myapp.azurewebsites.net and responds to the Application Gateway.
The Application Gateway reverse-proxies the response to the client.

For Good Measure

If you really want to secure things:

Deploy the Application Gateway as WAFv2 and store SSL certs in a Key Vault with limited Access Policies
The NSG on the WAF subnet must be configured correctly and only permit the minimum traffic to the WAF.
All resources will send all logs to Log Analytics.
Azure Sentinel is associated with the Log Analytics workspace.
Azure Security Center Standard Tier is enabled on the subscription and the Log Analytics Workspace.
If you can justify the cost, DDoS Standard Tier is enabled on the virtual network with the public IP address(es).

And that’s just the beginning 🙂

Failed to add new rule: IpSecurityRestriction.VnetSubnetResourceId is invalid.

This post is focused on a scenario where you are creating an Access Restriction rule in an Azure App Service to allow client requests from a subnet in a Virtual Network (VNET) and you get this error:

Failed to add new rule: IpSecurityRestriction.VnetSubnetResourceId is invalid. For request GET https://management.azure.com/subscriptions/xxxxxx/resourceGroups/xxxxxx/providers/Microsoft.Network/virtualNetworks/xxxxxx/taggedTrafficConsumers?api-version=2018-01-01 with clientRequestId xxxxxx and correlationRequestId xxxxxx, received a response with status code Forbidden, error code AuthorizationFailed, and response content: {“error”:{“code”:”AuthorizationFailed”,”message”:”The client ‘xxxxxx’ with object id ‘xxxxxx’ does not have authorization to perform action ‘Microsoft.Network/virtualNetworks/taggedTrafficConsumers/read’ over scope ‘/subscriptions/xxxxxx/resourceGroups/xxxxxx/providers/Microsoft.Network/virtualNetworks/xxxxxx’ or the scope is invalid. If access was recently granted, please refresh your credentials.”}}.

The Scenario

The customer wanted to deploy Standard Tier Azure App Services with some level of security in a hub and spoke architecture. The hub is in Subscription A. There a virtual network with an Azure Application Gateway (WAG)/Web Application Firewall(WAF) is deployed into a VNET/subnet. The WAF subnet has the Microsoft.Web Service Endpoint enabled, allowing the WAF to reverse proxy web requests via the direct path of the Service Endpoint to the App Service(s).

The App Service Plan and App Services are in Subscription B. The goal is to only allow traffic to the App Services via the WAF. All the necessary DNS/SSL stuff was done and the WAF was configured to route traffic. Now, the customer wanted to prevent requests from coming in directly to the App Service – an Access Restriction rule would be created with the Virtual Network type. However, when we tried to create that rule, it failed with the above security error.

Troubleshooting

At first, we thought there was an error with Azure Privileged Identity Management (PIM), but we soon ruled that out. The customer had Contributor rights and I had Owner rights over both subscriptions and we verified access. While doing a Teams screen share the customer read an article about Azure Key Vault with a similar error that indicated an issue with Resource Providers. We both had the same idea at the same time.

Solution

In the WAF subscription, enable the Microsoft.Web resource provider. This will allow the App Service to “configure” the integration with the subnet from its own subscription and solves the security issue.

Microsoft Ignite 2019 – Deliver Highly Available Secure Web Application Gateway and Web Application Firewall

Speaker:

Amit Srivastava, Principal Program Manager, Microsoft

Mission Critical HTTP Applications

Always On
Secure
Scalable
Telemetry
Polygot – variety of backed, IaaS, PaaS, on-prem

Many things to think about.

What Azure Pieces Can We Use?

WAG
AFD
CDN
WAF
Azure Load Balancer
Azure Traffic Manager

WAG

Regional ADS as a service. A full reverse proxy. It terminates the incoming connection and creates a new one to the web server.

Platform managed: built-in HA and sclability
Layer 7 load balancing: URL path, host based, round robin, session affinity, redirection
Security and SSL management: WAF, SSL Offload, SSL re-encryption, SSL policy
Public or ILB: Public internet, internal or both.
Flexible backends: VMs, VMSS, AKS, public IP, cloud services, ALB/ILB, On-premises
Rich diagnostics: Azure monitor, log analytics, network watcher, RHC, more

Standard v2 SKU in GA

Available in 26 regions
Built-in zone redundancy
Static VIP
HTTP header/cookies insertion/modification
Increased scale limits 20 -> 100 listeners
Key vault integration and autorenewal of SSL certs (GA)
AKS ingress controller (GA)

Autoscaling and performance improvements:

Grow and shrink based on app traffic requirements
5 x better SSL offloads performance
- 500-50,000 connections/sec with RSA 2048 bit certs
- 30,000, 3,000,000 persistent connections
- 2,500 – 250,0000 HTTP req/sec
75% reduction in provisioning time ~5mins

Key Vault Integration in v2 GA

Front end TLS cert integrated with Azure Key Vault
Utilizes user-assigned management identity for access control on key vault
Use certificate or secrets on Key Vault
Pools every 4 hours to enable automatic cert renewal – you can force a poll if you need to
Manual override or specific certificate version retrieval

WAG v2 Header Rewrites

Manipulate request and response headers and cookies
- Strip port from x-forwarded-for header
- Add security headers like HSTS and X-XSS-Protection
- Common header manipulation ex: HOST, SERVER
Conditional header rewrites … something

Ingress Controller

Ingress controller for 1+ AKS clusters at one time
Deployed using HELM – newer easier options by EOY
Utilized pod-AAD for ARM authentication
Tighter integration with AKS add-on support upcoming
Supports URI-path based, host based, SSL termination, SSL re-encryption, redirection, custom health probes, draining, cookie affinity.
Support for Let’s Encrypt provided TLS certs
WAF fully supported with custom listener policies
Support for multiple AKS as backend
Support for mixed mode- both AKS and other backend types on the same application gateway.

http://aka.ms/appgawks

Application Gateway Wildcard Listener

Managed preview
Support for wildcard characters in listener host name
Supports * and ? characters in host name
Associate wildcard or SAN certs to serve HTTPS

Telemetry Enhancements

GA
Diagnostics Log Enhancements
- TLS protocol version, cipher spec selected.
- Backend target server, response code, latency.
Metrics Enahncements
- Backend response status code
- RPS/healthy node
- End-to-end latency
- Backend latency
- Backend connect, first byte, and last byte latency.

Azure Monitor Insights for Application Gateway

Public Preview
Sign health and metric console for your entire cloud network#
No agent/configuration required
Visualize the structure and functional dependencies
More

AKS Demo

He loads a Helm YAML config to the AKS cluster. Now the AKS cluster can configure listers, backend pools, rules, etc for the containers/services running on the cluster. Pretty cool.

Azure WAF

Cloud native WAF

Unified WAF offering
- Protect your apps at network edge or in region uniformly
Public preview:
- Microsoft threat intelligence
  - Protect apps against automated attacks
  - Manage good/bad bots with Azure BotManager RuleSet
- Site and URI pathc specific WAF policies
  - Customise WAF policies at regional WAF for finer grained protection at each host/listener or URI path level
- Geo-filtering on regional WAF

HA, scalable fully platform managed
Auto-scaling support
New RuleSet CRS 3.1 added, will soon be the default
Integration with Azure Sentinel SIEM
Performance and concurrency enhancements
More

WAF Policy Enhancements

Assign different policies to different sites behind the same WAF
Increased configurability
Per-URI policy

Geo Filtering Public Preview

Block, allow, log countries.
Easily configurable in WAF policy
Geo data refreshed weekly

Only in special Portal URI at the moment – normal Azure Portal soon.

Bot Protection (Public Preview)

Stuff

Microsoft Ignite 2019 – Securing Your Cloud Perimeter With Azure Network Security

Speaker: Sinead O’Donvan (Irish, by the accent)

Zero Trust Architecture document

7 pillars:

Identity
Devices
Data
Apps
Infrastructure
Networking – the focus here

Verify explicitly every access control

Being on the network is not enough

Use least privilege access

IP address is not enough

Assume breach

No one is perfectly secure. Identify the breach. Contain the breach. Do your best to stop breaches in the first place.

You cannot claim success:

It requires constant improvement.

Network Maturity Model

Traditional (most customers)
- Few network security perimeters and flat open network
- Minimal threat protection and static filtering
- Internal traffic is not encrypted
Advanced
- Many ingress/egress cloud micro-perimeters with some micro-segmentation
- Cloud native filtering and protection for known threats
- User to app internal traffic is encrypted
Optimal
- Fully distributed ingress/egress cloud micro-perimeters and deeper micro-segmentation
- ML-based threat protection and filtering with context-based signals
- All traffic is encrypted

Three Cores of Azure Network Security

Segment – prevent lateral movement and data exfiltration
Protect – secure network with threat intelligence
Connect – embrace distributed connectivity … or face revolt from the users/devs

Deploy securely across DevOps process

Azure Features

Azure Firewall
Azure WAF
Azure Private Link
Azure DD0S Protection

Plus:

VNets
NSGs
UDRs
Load Balancer

Network Segmentation

3 approaches:

Host-based: an agent on the VM implements it
Hypervisor: Example, VMware SNX
Network controls

Azure Network Segmentation Controls

Subscription: RABC, logic isolation for all resources
Virtual network: An isolated and highly secure environment to run your VMs and apps. “This is the hero of segmentation”
NSG: Enforce and control network traffic security rules that allow or deny network traffic for a VNet or a VM.
WAF: Define application specific policies to protect web workloads.
Azure Firewall: Create and enforce connectivity policies using application, network and threat intelligence filtering across subscription(s) and VNet(s).

Multi-Level Segmentation

Connectivity:
- Use both public or private IP. Public app interface is public, backend is private.
- Choose cloud transit approach VNet peering or Virtual WAN.
- Carefully control routing
Infrastructure
- Segment across subscription, vnet, and subnet boundaries
- Managed at an org level
Application
- Enable application aware segmentation
- Easily create micro perimeters
- Managed at an application level

Azure Firewall Manager (Preview)

Central deployment and configuration
- Deploy and configure multiple Azure Firewall instances
- Optimized for DevOps with hierarchical policies
Automated Routing
- Easily direct traffic to your secured hub for filtering and logging without UDRs
And more

Azure Web Application Firewall

Preview:

Microsoft threat intelligence
- Protect apps against automated attacks
- Manage good/bad bots with Azure BotManager RuleSet
Site and URI patch specific WAF policies
- Customise WAF policies at regional WAF for finer grained protection at each host/listener or URI path level
Geo-filtering on regional WAF
- Enhanced custom rule matching criterion

MS sees 20/30 DDoS attacks per day.

WAF as a Service

Barracuda
Radware

Both run in Azure.

Connectivity

It’s time to transform your network.

User to app moves to Internet centric connectivity
Application to backend resources use private connectivity
Redesign your network and network security models to optimize user experience for cloud
Continue to extend app delivery models and network security to the edge

Azure Firewall Manager

Easily create multiple secured virtual hubs (DMZ Hubs) in Azure
Use Azure Firewall or 3^rd party security
Create global and local policies
Easy to set up connectivity
Roadmap:
- Split routing – optimized O365 and Azure public PaaS

CheckPoint CloudGuard Connect will debut soon as a partner extension.

Azure Private Link

Highly secure and private connectivity solution for Azure Platform.

Private access from VNet resources, peered networks and on-premises networks
In-built data exfiltration protection
Predictable private IP addresses for PaaS resources
Unified experience across PaaS customer owned and marketplace services

Microsoft taking this very seriously. All new PaaS services “from Spring onwards” must support Private Link.

Azure Bastion

See previous posts on this – it requires more work IMO because it lacks VNet peering support and requires login via the Azure Portal – doesn’t support MSTSC or SSH clients.

Key Takeaways

Embrace zero trust network model
Segment your network and create micro-perimters with Azure Firewall, NSG, etc
Use a defense in depth security strategy with cloud native services
Enable WAF and DDoS
Explore Azure as your secure Internet edge with Azure Firewall Manager

Creating an Azure Service for Slow Moving Organisations

In this post, I will explain how you can use Azure’s Public IP Prefix feature to pre-create public IP addresses to access Azure services when you are working big/government organisations that can take weeks to configure a VPN tunnel, outbound firewall rule, and so on.

In this scenario, I need a predictable IP address so that means I must use the Standard SKU address tier.

The Problem

It normally only takes a few minutes to create a firewall rule, a VPN tunnel, etc in an on-premises network. But sometimes it seems to take forever! I’ve been in that situation – you’ve set up an environment for the customer to work with, but their on-premises networking team(s) are slow to do anything. And you only wish that you had given them all the details that they needed earlier in the project so their configuration work would end when your weeks of engineering was wrapping up.

But you won’t know the public IP address until you create it. And that is normally only created when you create the virtual network gateway, Azure Firewall, Application Firewall, etc. But what if you had a pool of Azure public IP addresses that were pre-reserved and ready to share with the network team. Maybe they could be used to make early requests for VPN tunnels, firewall rules, and so on? Luckily, we can do that!

Public IP Prefix

An Azure Public IP Prefix is a set of reserved public IP addresses (PIPs). You can create an IP Prefix of a certain size, from /31 (2 addresses) to /24 (256 addresses), in a certain region. The pool of addresses is a contiguous block of predictable addresses. And from that pool, you can create public IP addresses for your Azure resources.

In my example, I want a Standard tier IP address and this requires a Standard tier Public IP Prefix. Unfortunately, the Azure Portal doesn’t allow for this with Public IP Prefix, so we need some PowerShell. First, we’ll define some reused variables:

$rgName = "test"
$region = "westeurope"
$ipPrefixName = "test-ipfx"

Now we will create the Publix IP Prefix. Note that the length refers to the subnet mask length. In my example that’s a /30 resulting in a prefix with 4 reserved public IP addresses:

$ipPrefix = New-AzPublicIpPrefix -Name $ipPrefixName -ResourceGroupName $rgName -PrefixLength 30 -Sku Standard -Location $region

You’ll note above that I used Standard in the command. This creates a pool of static Standard tier public IP addresses. I could have dropped the Standard, and that would have created a pool of static Basic tier IP addresses – you can use the Azure Portal to deploy Basic tier Public IP Prefix and public IP addresses from that prefix. The decision to use Standard tier or Basic tier affects what resources I can deploy with the addresses:

Standard: Azure Firewall, zone-redundant virtual network gateways, v2 application gateways/firewalls, standard tier load balancers, etc.
Basic static: Basic tier load balancers, v1 application gateways/firewalls, etc.

Note that the non-zone redundant virtual network gateways cannot use static public IP addresses and therefore cannot use Public IP Prefix.

Creating a Public IP Address

Let’s say that I have a project coming up where I need to deploy an Application Firewall and I know the on-premises network team will take weeks to allow outbound access to my new web service. Instead of waiting until I build the application, I can reserve the IP address now, tell the on-premises firewall team to allow it, and then work on my project. Hopefully, by the time I have the site up and running and presented to the Internet by my Application Firewall, they will have created the outbound firewall rule from the company network.

Browse to the Public IP Prefix and make sure that it is in the same region as the new virtual network and virtual network gateway. Open the prefix and check Allocated IP Addresses in the Overview. Make sure that there is free capacity in the reserved block.

Now I can continue to use my variables from above and create a new public IP address from one of the reserved addresses in the Public IP Prefix:

New-AzPublicIpAddress -Name "test-vpn-pip" -ResourceGroupName $rgName -AllocationMethod Static -DomainNameLabel "test-vpn" -Location $region -PublicIpPrefix $ipPrefix -Sku Standard

Use the Public IP Address

I now have everything I need to pass onto the on-premises network team in my request. In my example, I am going to create a v2 Application Firewall.

Once I configure the WAF, the on-premises firewall will (hopefully) already have the rule to allow outbound connections to my pre-reserved IP address and, therefore, my new web service.

Understanding How Azure Application Gateway Works

In this post, I will explain how things such as frontend configurations, listeners, HTTP settings, probes, backend pools, and rules work together to enable service publication in the Azure Web Application Gateway (WAG)/Web Application Firewall (WAF).

Introduction

The WAF/WAG is a scary beast at first. When you open one up there are just so many settings to be tweaked. If you are publishing just a simple test HTTP server, it’s easy: you populate the default backend pool and things just start to work. But if you want HTTPS, or to service many pools/sites, then things get complicated. And frustratingly slow 🙂 – Things have improved in v1 and v2 is significantly faster to configure, although it has architectural limitations (force public IP address and lack of support for route tables) that prevent me from using v2 in my large network deployments. Hopefully, the above map and following text will simplify things by explaining what all the pieces do and how they work together.

The below is not feature complete, and things will change in the future. But for 99% of you, this should (hopefully) be helpful.

Backend Pool

The backend pool describes a set of machines/services that will work together. The members of a backend pool must be all of the same type from one of these types:

IP address/hostname: a common choice in large Azure deployments – you can span peering connections to other VNets
Virtual machine: Select a machine from the same VNet as the WAG/WAF
VMSS: Virtual machine scale sets in the same VNet as the WAG/WAF
App Services: In the same subscription as the WAG/WAF

From here on out, I’ll be using the term “web server” to describe the above.

Note that this are the machines that host your website/service. They will all run the same website/service. And you can configure an optional custom probe to test the availability of the service on these machines.

(Optional) Health Probe

You can create a HTTP/HTTPS probe to do deeper probe tests of a service running on a backend pool. The probe is configured for HTTP or HTTPS and tests a hostname on the web server. You specify a path on the website, a frequency, timeout and allowed number of retries before designating a web site on a web server as being unhealthy and no longer a candidate for load balancing.

HTTP Setting

The HTTP setting configures how the WAG/WAF will talk to the members of the backend pool. It does not configure how clients talk to the site (Listener). So anything you see below here is for configuring WAG/WAF to web server communications (see HTTPS).

Control cookie-based affinity for load balancing
Configure connection draining when a machine is removed from a backend pool
Specify if this is for a HTTP or a HTTPS connection to the webserver. This is for end-to-end encryption.
- For HTTPS, you will upload a certificate that will match the web servers’ certificate.
The port that the web server is listening on.
Override the path
Override the hostname
Use a custom probe

Remember that the above HTTPS setting is not required for website to be published as SSL. It is only required to ensure that encryption continues from the WAG/WAF to the web servers.

Frontend IP Configuration

A WAG/WAF can have public or private frontend IP addresses – the variation depends on if you are using V1 (you have a choice on the mix) or V2 (you must use public and private). The public front end is a single public IP address used for publishing services publicly. The private frontend is a single virtual network address used for internal service publication, requiring virtual network connectivity (virtual network, VPN, ExpressRoute, etc).

The DNS records for your sites will point at the frontend IP address of the WAG/WAF. You can use third-party or Azure DNS – Azure DNS has the benefit of being hosted in every Azure region and in edge sites around the world so it is faster to resolve names than some DNS hoster with 3 servers in a single continent.

A single frontend can be shared by many sites. http://www.aidanfinn.com, http://www.cloudmechanix.com and http://www.joeeleway.com can all point to the same IP address. The hostname configuration that you have in the Listener will determine what happens to the incoming traffic afterwards.

Listener

A Listener is configured to listen for traffic destined to a particular hostname and port number and forward it, eventually, to the correct backend pool. There are two kinds of listener:

Basic: For very simple configurations where a site has exclusive ownership over a port number on one of the frontends. Typically this is for point solutions where a WAG/WAF is dedicated to a service.
Multi-Site: A listener shares a frontend configuration with other listeners, and is looking for traffic destined to a specific hostname/port/protocol.

Note that the Listner is where you place the certificate to secure client > WAG/WAF communications. This is known as SSL offloading. If you enable HTTPS you will place the “site certificate” on the WAG/WAF via the Listener. You can optionally re-encrypt traffic to the webserver from the WAG/WAF using the previously discussed HTTP Setting. WAGv2/WAFv2 have a no-support preview to use certs that are securely stored in Key Vault.

The configuration of a basic listener is:

Frontend
Frontend port
HTTP or HTTPS protocol
- The certificate for securing client > WAG/WAF traffic
Optional custom error pages

The multi-site listener is adds an extra configuration: hostname. This is because now the listener is sharing the frontend and is only catching traffic for its website. So if I want 3 websites on my WAG/WAF sharing a frontend, I will have 3 x HTTPS listeners and maybe 3 x HTTP listeners.

Rules

A rule glues together the configuration. A basic rule is pretty easy:

Traffic comes into a Listener
The HTTP Setting determines how to forward that traffic to the backend pool
The Backend Pool lists the web servers that host the site

A path-based rule allows you to extend your site across many backend pools. You might have a set of content for /media on pool1. Therefore all http://www.aidanfinn.com/media content is pulled from that pool1. All video content might be on http://www.aidanfinn.com/video, so you’ll redirect /video to pool2. And so on. And you can have individual HTTP settings for each redirection.

My Tips

There’s nothing like actually setting this up at scale to try this out. You will need a few DNS names to be able to work with.
Remember to enable the protection mode of WAF. I have audited deployments and found situations where people thought they had Layer-7 security but only had the default “alert-only” configuration of WAFv1.
In large environments, don’t forget to ensure that the NSGs protecting any webservers allow traffic in from the WAG/WAF’s subnet into the web servers on the port(s) specified in the HTTP Setting(s). Also ensure that any guest OS firewall is similarly configured.
Possibly the biggest issue you will have is with devs not assigning hostnames to websites in their webservers. If you’re using shared WAGs/WAFs you must use multi-site listeners and the websites should be configured with the hostname.
And the biggest tip I can give is to work out a naming standard for each of the above components so you know what piece is associated with what site. I can’t share what we’re using at work, but we have some big configurations and they are very easy to troubleshoot because of how we have named things.

Locking Down Network Access to the Azure Application Gateway/Firewall

In this post, I will explain how you can use a Network Security Group (NSG) to completely lock down network access to the subnet that contains an Azure Web Application Gateway (WAG)/Web Application Firewall (WAF).

The stops are as follows:

Deploy a WAG/WAF to a dedicated subnet.
Create a Network Security Group (NSG) for the subnet.
Associate the NSG with the subnet.
Create an inbound rule to allow TCP 65503-65534 from the Internet service tag to the CIDR address of the WAG/WAF subnet.
Create rules to allow application traffic, such as TCP 443 or TCP 80, from your sources to the CIDR address of the WAG/WAF
Create a low priority (4000) rule to allow any protocol/port from the AzureLoadBlanacer service tag to the CIDR address of the WAG/WAF
Create a rule, with the lowest priority (4096) to Deny All from Any source.

The Scenario

It is easy to stand up a WAG/WAF in Azure and get it up and running. But in the real world, you should lock down network access. In the world of Azure, all network security begins with an NSG. When you deploy WAG/WAF in the real world, you should create an NSG for the WAG/WAF subnet and restrict the traffic to that subnet to what is just required for:

Health monitoring of the WAG/WAF
Application access from the authorised sources
Load balancing of the WAG/WAF instances

Everything else inbound will be blocked.

The NSG

Good NSG practice is as follows:

Tiers of services are placed into their own subnet. Good news – the WAG/WAF requires a dedicated subnet.
You should create an NSG just for the subnet – name the NSG after the VNet-Subnet, and maybe add a prefix or suffix of NSG to the name.

Health Monitoring

Azure will need to communicate with the WAG/WAF to determine the health of the backends – I know that this sounds weird, but it is what it is.

Note: You can view the health of your backend pool by opening the WAG/WAF and browsing to Monitoring > Backend Health. Each backend pool member will be listed here. If you have configured the NSG correctly then the pool member status should be “Healthy”, assuming that they are actually healthy. Otherwise, you will get a warning saying:

Unable to retrieve health status data. Check presence of NSG/UDR blocking access to ports 65503-65534 from Internet to Application Gateway.

OK – so you need to open those ports from “Internet”. Two questions arise:

Is this secure? Yes – Microsoft states here that these ports are “are protected (locked down) by Azure certificates. Without proper certificates, external entities, including the customers of those gateways, will not be able to initiate any changes on those endpoints”.
What if my WAG/WAF is internal and does not have a public IP address? You will still do this – remember that “Internet” is everything outside the virtual network and peered virtual networks. Azure will communicate with the WAG/WAF via the Azure fabric and you need to allow this communication that comes from an external source.

In my example, my WAF subnet CIDR is 10.0.2.4/24:

Application Traffic

Next, I need to allow application traffic. Remember that the NSG operates at the TCP/UDP level and has no idea of URLs – that’s the job of the WAG/WAF. I will use the NSG to define what TCP ports I am allowing into the WAG/WAF (such as TCP 443) and from what sources.

In my example, the WAF is for internal usage. Clients will connect to applications over a VPN/ExpressRoute connection. Here is a sample rule:

If this was an Internet-facing WAG or WAF, then the source service tag would be Internet. If other services in Azure need to connect to this WAG or WAF, then I would allow traffic from either Virtual Network or specific source CIDRs/addresses.

The Azure Load Balancer

To be honest, this one caught me out until I reasoned what the cause was. My next rule will deny all other traffic to the WAG/WAF subnet. Without this load balancer rule, the client could not connect to the WAG/WAF. That puzzled me, and searches led me nowhere useful. And then I realized:

A WAG/WAF is 1+ instances (2+ in v2), each consuming IP addresses in the subnet.
They are presented to clients as a single IP.
That single IP must be a load balancer
That load balancer needs to probe the load balancer’s own backend pool – which are the instance(s) of the WAG/WAF in this case

You might ask: isn’t there a default rule to allow a load balancer probe? Yes, it has priority 65001. But we will be putting in a rule at 4096 to prevent all connections, overriding the 65000 rule that allows everything from VirtualNetwork – which includes all subnets in the virtual network and all peered virtual networks.

The rule is simple enough:

Deny Everything Else

Now we will override the default NSG rules that allow all communications to the subnet from other subnets in the same VNet or peered VNets. This rule should have the lowest possible user-defined priority, which is 4096:

Why am I using the lowest possible priority? This is classic good firewall rule practice. General rules should be low priority, and specific rules should be high priority. The more general, the lower. The more specific, the higher. The most general rule we have in firewalls is “block everything we don’t allow”; in other words, we are creating a white list of exceptions with the previously mentioned rules.

The Results

You should end up with:

The health monitoring rule will allow Azure to check your WAG/WAF over a certificate-secured channel.
Your application rules will permit specified clients to connect to the WAG/WAF, via a hidden load balancer.
The load balancer can probe the WAG/WAF and forward client connections.
The low priority deny rule will block all other communications.

Job done!