Microsoft has just announced a lower cost SKU of Azure Firewall, Basic, that is aimed at small/medium business but could also play a role in “branch office” deployments in Microsoft Azure.
Standard & Premium
Azure Firewall launched with a Standard SKU several years ago. The Standard SKU offered a lot of features, but some things deemed necessary for security were missing: IDPS and TLS Inspection were top of the list. Microsoft added a Premium SKU that added those features as well as fuller web category inspection and URL filtering (not just FQDN).
However, some customers didn’t adopt Azure Firewall because of the price. A lot of those customers were small-medium businesses (SMBs). Another scenario that might be affected is a “branch office” in an Azure region – a smaller footprint that is closer to clients but isn’t a main deployment.
Launching The Basic SKU
Microsoft has been working on a lower cost SKU for quite a while. The biggest challenge that they faced, I think, was figuring out how to balance features, performance, and availability with price. They know that the target market has a finite budget, but there are necessary feature requirements. Every customer is different, so I guess when faced with this conundrum, one needs to satisfy the needs of 80% of customers.
The clues for a new SKU have been publicly visible for quite a while – the ARM reference for Azure Firewall documented that a Basic SKU existed somewhere in Azure (in private preview). Tonight, Microsoft launched the Basic SKU in public preview. A longer blog post adds some details.
Introducing the Azure Firewall
The primary target market for the Basic SKU hasn’t deployed a firewall appliance of any kind in Azure – if they are in Azure then they are most likely only using NSGs for security – which operate only at the transport protocol (TCP, UDP, ICMP) layer in a decentralised way.
The Azure Firewall is a firewall appliance, allowing centralised control. It should be deployed with NSGs and resource firewalls for layered protection, and where there is a zero-trust configuration (deny all by default) in all directions, even inside of a workload.
The Azure Firewall is native to Microsoft Azure – you don’t need a third party license or support contract. It is fully deployable and configured as code (ARM, Bicep, Terraform, Pulumi, etc), making it ideal for DevSecOps. Azure Firewall is much easier to learn than NVAs because the firewall is easily available through an Azure subscription and the training (Microsoft Learn) is publicly available – not hidden behind classic training paywalls. Thanks to the community and a platform model, I expect that more people are learning Azure Firewall than any other kind of firewall today – skills are in short supply so using native tech that is easy to learn and many are learning just makes sense.
Comparing Azure Firewall Basic With Standard and Premium
Microsoft helpfully put together a table to compare the 3 SKUs:
Another difference with the Basic SKU is that you must deploy the AzureFirewallManagementSubnet in addition to the AzureFirewallSubnet – this additional subnet is often associated with forced tunneling. The result is that the firewall will have a second public IP address that is used only for management tasks.
The Basic SKU follows the same price model as the higher SKUs: a base compute cost and a data processing cost. The shared pricing is for the Preview so it is subject to change.
The Basic SKU base compute (deployment) cost is €300.03 per month in West Europe. That’s less than 1/3 of the cost of the Standard SKU at €947.54 per month. The data processing cost for the Basic SKU is higher at €0.068 per GB. However, the amount of data passing through such a firewall deployment will be much lower so it probably will not be a huge add-on.
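To put those numbers together, here is a small PowerShell sketch of the price model (base cost plus data processing). The two base prices and the Basic per-GB rate are the West Europe preview prices quoted above. The 500 GB monthly traffic figure and the function name Get-FirewallMonthlyCost are assumptions made purely for illustration.

```powershell
# Sketch of the Azure Firewall price model: a fixed base cost plus a per-GB charge.
function Get-FirewallMonthlyCost {
    param(
        [decimal]$BaseCostPerMonth,   # fixed deployment cost per month (EUR)
        [decimal]$DataCostPerGB,      # data processing cost per GB (EUR)
        [decimal]$DataProcessedGB     # traffic through the firewall per month (GB)
    )
    return $BaseCostPerMonth + ($DataCostPerGB * $DataProcessedGB)
}

# Preview prices quoted in this post for West Europe (subject to change).
$basicBase     = 300.03d
$standardBase  = 947.54d
$basicDataRate = 0.068d

# Assumption for illustration only: a small deployment processing 500 GB per month.
$assumedMonthlyGB = 500d

# Basic base cost as a percentage of the Standard base cost.
$basicShareOfStandard = [math]::Round(($basicBase / $standardBase) * 100, 1)

$basicTotal = Get-FirewallMonthlyCost -BaseCostPerMonth $basicBase -DataCostPerGB $basicDataRate -DataProcessedGB $assumedMonthlyGB

Write-Host "Basic base cost is $basicShareOfStandard% of the Standard base cost"
Write-Host "Estimated Basic monthly cost at $assumedMonthlyGB GB: EUR $basicTotal"
```

At that assumed volume, the data processing charge adds about a tenth to the base cost, which is why the higher per-GB rate is unlikely to matter much for a small deployment.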
Preview Deployment Error
At this time, the Basic SKU is in preview. You must enable the preview in your subscription. If you do not do this, your deployment will fail with this error:
“message”: “Subscription ‘someGuid’ is missing required feature ‘Microsoft.Network/AzureFirewallBasic’ for Basic policies.”
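One way to enable the preview is with the Az PowerShell module. The sketch below assumes that Az.Resources is installed and that you are already signed in to the right subscription. The feature name and provider namespace are taken from the error message above; the function name Enable-FirewallBasicPreview is hypothetical.

```powershell
# Sketch: register the preview feature named in the error message, if it is not already registered.
# Assumes the Az.Resources module is installed and Connect-AzAccount / Set-AzContext have already run.
function Enable-FirewallBasicPreview {
    param(
        [string]$FeatureName = 'AzureFirewallBasic',
        [string]$ProviderNamespace = 'Microsoft.Network'
    )

    # Check the current state first so that the function is safe to re-run
    $feature = Get-AzProviderFeature -FeatureName $FeatureName -ProviderNamespace $ProviderNamespace

    if ($feature.RegistrationState -eq 'Registered') {
        Write-Host "$ProviderNamespace/$FeatureName is already registered"
        return $feature.RegistrationState
    }

    Write-Host "Registering $ProviderNamespace/$FeatureName (registration is not instant)"
    $result = Register-AzProviderFeature -FeatureName $FeatureName -ProviderNamespace $ProviderNamespace
    return $result.RegistrationState
}
```

Registration is asynchronous, so calling Enable-FirewallBasicPreview again a little later is a simple way to check whether the state has changed to Registered before you retry the deployment.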
Some Interesting Notes
I’ve not had a chance to do much work with the Basic SKU – work is pretty crazy lately. But here are a few things to note:
A hub & spoke deployment is still recommended, even for SMBs.
Availability zones are supported for higher availability.
You are forced to use Azure Firewall Manager/Azure Firewall Policy – this is a good thing because newer features are only in the new management plane.
The new SKU of Azure Firewall should add new customers to this service. I expect that larger enterprises will also be interested – not every deployment needs the full blown Standard/Premium deployment but some form of firewall is still required.
One can use simple deployment tasks in a DevOps pipeline:
A task that runs a deployment
A simple PowerShell/Azure CLI task that runs an inline script
But you might want something that does more. For example, you might want to do some error checking. Or maybe you are going to use a custom container (Azure Container Registry) and execute complex tasks from it. In my case, I wanted to do lots of error checking and give myself the ability to wrap scripts around my deployments.
The First Obstacle: Documentation
Azure DevOps documentation is notorious for being:
Out of date
Hard to find
The article you need to get started on using PowerShell can be found here. There is a Hello World example that shows how to pass in two parameters to a PowerShell script. I used that as the basis of my deployment – but it is not enough! I will fix that here.
The Second Obstacle: Examples
The DevOps world is very much a closed box. There are lots of people doing stuff, but finding working examples is a nightmare. Once again, I will fix that here. The goal is to:
Store your code in an Azure DevOps repo
Create an Azure DevOps pipeline to deploy that code to Azure. It will be authorised against the Azure subscription (or resource groups) using an App Registration that is stored in DevOps as a Service Connection.
The pipeline will execute a PowerShell script to deploy code from the DevOps repo into your subscription (or resource groups).
For this post, I will use the hub deployment from my GitHub AzireFirewall/DevSecOps repo – this deploys a VNet-based (legacy) hub in an Azure hub & spoke architecture. There are a number of things you are going to need.
Afterwards, I will explain how the pipeline calls the PowerShell script.
Set up a repository in Azure DevOps. Copy the required files into the repo. In my example, there are the following folders:
platform: This contains the files to deploy the hub in an Azure subscription. In my example, you will find bicep files with JSON parameter files.
scripts: This folder contains scripts used in the deployment. In my example, deploy.ps1 is a generic script that will deploy an ARM/Bicep template to a selected subscription/resource group.
.pipelines: This contains the files to deploy the code. In my example, you will find a YAML file for a DevOps pipeline called hub.yaml that will execute the script, deploy.ps1.
.github/workflows: This is where you will find YAML files that create workflows in GitHub Actions. Any valid file will automatically create a workflow when there is a successful merge. My example contains hub.yaml to execute the script, deploy.ps1.
You can upload the files into the repo or sync using Git/VS Code.
Azure AD App Registration (Service Principal or SPN)
You will require an App Registration; this will be used by the Azure DevOps pipeline to gain authorised access to the Azure subscription.
Create an App Registration in Azure AD. Create a secret and store that secret (Azure Key Vault is a good location) because you will not be able to see the secret after creation. Grant the App Registration Owner rights to the Azure subscription (as in my example) or to the resource groups if you prefer that sort of deployment.
In your Azure DevOps project, browse to Project Settings > Service Connections. Create a new Service Connection of the Azure Resource Manager type. Select Service Principal (Manual) and enter the required details:
Subscription ID: The ID of the subscription that will be the default subscription when the Service Principal signs in.
Subscription Name: The name of the subscription that will be the default subscription when the Service Principal signs in.
Service Principal ID: The Client ID of the App Registration (see its Overview page).
Service Principal Key: The secret of the App Registration that you should have saved.
Tenant ID: The ID of the Azure AD tenant. You can get this from the Overview of the App Registration.
Service Connection Name: The name of the Service Connection; a naming standard helps here. For example, I name it after the scope of the deployment (the target subscription name). Remember this name because it will be used by the YAML file (a value called azureSubscription) to create the pipeline. In my example, the service connection is called “hub”.
Hit Verify And Save – DevOps will verify that the Service Principal can sign in. You should double-check that it has rights over your desired scope in Azure (subscription in my example).
Create the Pipeline
In my example, the hard work is done. A YAML file defines the pipeline; you just need to import the YAML file to create the pipeline.
Go back to your DevOps project, browse to Pipelines, and click New Pipeline. Choose Azure Repos Git as the code location. Select your repo and choose Existing Azure Pipelines YAML File. Use the dropdown list box to select the YAML file – /.pipelines/devops-hub.yml in my case. Save the pipeline and it will run. If you go into the running job you should see a prompt, asking you to authorise the pipeline to use the “hub” service connection.
The pipeline will execute a task that, in turn, will run a PowerShell script. That PowerShell script takes in several parameters that tell the script what to deploy (Bicep and parameter files), where to deploy it from (the temporary location where the files are downloaded into the pipeline container), and where to deploy it to (the subscription/resource group).
Executing A PowerShell Script
A pipeline has a section called steps; in here, you create a task for each job that you want to run. For example, I can execute an Azure CLI task, a PowerShell task that runs one/a few lines of inline code, or a PowerShell task that executes a PowerShell script from a file. It’s that last one that is interesting.
I can create a PowerShell script that does lots of cool things and store it in my repo. That script can be edited and managed by change control (pull requests) just like my code that I’m deploying. There is an example of this below:
The task is running version 5 of the Azure PowerShell task (see AzurePowerShell@5). That’s an important thing to note. The Microsoft documentation for running PowerShell shows version 2 of the task, and that does not support the Az modules, which is pretty pointless! Version 4 of the task added the Az modules.
The azureSubscription value refers to the Service Connection that we created earlier, authorising the pipeline against the desired target scope.
ScriptType is set to FilePath (not inline) so I can run a PowerShell script from a file. That requires me to use ScriptPath to define where the script is.
When the pipeline runs, it is executed in a container (defined earlier in the YAML file as ubuntu-latest in my example, Linux to speed things up). The files in the repo are downloaded to a working folder. That location is saved as $(System.DefaultWorkingDirectory). I can then add the relative location of the PowerShell script from the repo ( /.pipelines/deploy.ps1 ) to that path so the pipeline can find the script in the container.
My script is pretty generic. I can have:
Multiple Bicep files/JSON parameter files
Multiple target scopes
I can create a PowerShell task for each deployment and use the parameters to specialise the execution of the script.
We wrap up the task by specifying the PowerShell version to use and the display name for the task in the DevOps job console.
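The working folder mentioned above is also visible from inside a script, because Azure Pipelines exposes System.DefaultWorkingDirectory to scripts as the SYSTEM_DEFAULTWORKINGDIRECTORY environment variable. The following is a minimal sketch of building a full path from a repo-relative one. The function name Resolve-RepoPath and the fallback to the current folder (for local testing) are my assumptions, not part of the original script.

```powershell
# Sketch: build the full path to a file in the checked-out repo from inside a script.
function Resolve-RepoPath {
    param(
        [Parameter(Mandatory)] [string]$RelativePath
    )

    # In a pipeline, this environment variable holds the folder that the repo was downloaded to
    $root = $env:SYSTEM_DEFAULTWORKINGDIRECTORY

    # Assumption: fall back to the current folder when running outside a pipeline
    if ([string]::IsNullOrEmpty($root)) {
        $root = (Get-Location).Path
    }

    # Drop any leading slash so that "/.pipelines/deploy.ps1" is treated as relative to the repo root
    return Join-Path -Path $root -ChildPath $RelativePath.TrimStart('/', '\')
}

$scriptPath = Resolve-RepoPath -RelativePath '/.pipelines/deploy.ps1'
Write-Host "The deployment script is expected at $scriptPath"
```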
The PowerShell Script
The full code for the script can be found here. I’m going to focus on a few little things:
You can see in the above example that I passed in several parameters:
subscriptionId: The ID of the subscription to deploy the code to. This does not have to be the same as the default subscription specified in the Service Connection. The Service Principal used by the pipeline must have the required permissions in this subscription.
resourceGroupName: The name of the resource group that the deployment will go into. My script will create the resource group if required.
location: The Azure region of the resource group.
deploymentName: The name of the ARM deployment that will be created in the resource group for the deployment (remember that Bicep deployments become ARM deployments).
templateFile: The path to the template file in the pipeline container.
templateParameterFile: The path to the parameter file for the template in the pipeline container.
Each of those parameters is identically named in param () at the start of the PowerShell script and those values specialise the execution of the generic script.
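A minimal sketch of how those six parameters might be declared and used follows. The parameter names come from the list above. The function name Invoke-TemplateDeployment is hypothetical: in the real deploy.ps1 the param () block sits at the top of the script file rather than inside a function. The sketch assumes that the Az modules are loaded, and that the Bicep CLI is available on the agent if a Bicep file is passed as the template.

```powershell
# Sketch of how the six parameters could drive a generic deployment.
function Invoke-TemplateDeployment {
    param (
        [Parameter(Mandatory)] [string]$subscriptionId,
        [Parameter(Mandatory)] [string]$resourceGroupName,
        [Parameter(Mandatory)] [string]$location,
        [Parameter(Mandatory)] [string]$deploymentName,
        [Parameter(Mandatory)] [string]$templateFile,
        [Parameter(Mandatory)] [string]$templateParameterFile
    )

    # Switch to the target subscription, which may differ from the Service Connection default
    Set-AzContext -Subscription $subscriptionId | Out-Null

    # $location is only needed when the resource group has to be created (not shown in this sketch)

    # Splat the deployment settings; ErrorAction Stop turns a failed deployment into a terminating error
    $deployment = @{
        Name                  = $deploymentName
        ResourceGroupName     = $resourceGroupName
        TemplateFile          = $templateFile
        TemplateParameterFile = $templateParameterFile
        ErrorAction           = 'Stop'
    }
    New-AzResourceGroupDeployment @deployment
}
```

Because nothing in the function is specific to one template, the same code can be called once per Bicep file and target scope, with only the parameter values changing.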
You can use Write-Host to output a value from the script to appear in the console of the running job. If you add -ForegroundColor then you can make certain messages, such as errors or warnings, stand out.
Beware of Manual Inputs
Some PowerShell commands might want a manual input. This is not supported in a pipeline and will terminate the pipeline with an error. Test for this happening and use code logic wrapped around your cmdlets to prevent it from happening – this is why a file-based script is better than a simple/short inline script, even to handle a situation like creating a resource group.
Error handling is a big deal in a hands-off script. You will find that 90% of my script is checking for things and dealing with unwanted scenarios that can happen. A simple example is a resource group.
An ARM deployment (remember this includes Bicep) must go into a resource group. You can just go ahead and write the one-liner to create a resource group. But what happens when you update the code, the script re-runs and sees the resource group is already there? In that scenario, a manual input will appear (and fail the pipeline) to confirm that you want to continue. So I have a more elaborate test-then-create process:
if (!(Get-AzResourceGroup $resourceGroupName -ErrorAction SilentlyContinue)) {
    # The resource group does not exist so create it
    Write-Host "Creating the $resourceGroupName resource group"
    New-AzResourceGroup -Name $resourceGroupName -Location $location -ErrorAction SilentlyContinue
    if (!(Get-AzResourceGroup $resourceGroupName -ErrorAction SilentlyContinue)) {
        # There was an error creating the resource group (this re-check is an assumed reconstruction)
        Write-Host "There was an error creating the $resourceGroupName resource group" -ForegroundColor Red
    }
}
else {
    # The resource group already exists so there is nothing to do
    Write-Host "The $resourceGroupName resource group already exists"
}
Once you know how to do it, executing a script in your pipeline is easy. Then your PowerShell knowledge can take over and your deployments can become more flexible and more powerful. My example executes ARM/Bicep deployments. Yours could do a PowerShell deployment, add scripted configurations to a template deployment, or even run another language like Terraform. The real thing to understand is that now you have a larger scripting toolset available to your automated deployments.
This post will discuss the pros & cons of creating & using Infrastructure-as-Code/IaC Modules – based on 2 years of experience in creating and using a modular approach.
Anyone who has done just a little bit of template work knows that ARM templates can quickly get too big. Even a simple deployment, like a hub & spoke network architecture, can quickly expand out to several hundred lines without very much being added. Heck, when Microsoft first released the Cloud Adoption Framework “Enterprise Scale” example architecture, one of the ARM/JSON files was over 20,000 lines long!
The length of a template file can cause so many issues, including but definitely not limited to:
It becomes hard to find anything
Big code becomes hard to update – one change has many unintended repercussions
Collaboration becomes near impossible
Agility is lost
One of the pain points that really annoyed one of my colleagues is that “big code” usually becomes non-standardised code; that becomes a big issue when a “service organisation” is supporting multiple clients (consulting company, managed services, Operations, or cloud centre of excellence).
The idea of modularisation is that commonly written code is written once as a module. That module is then referred to by other code whenever the functions of the module are required. This is nothing new – the concept of an “include” or “DLL” is very old in the computing world.
For example, I can create a Bicep/ARM/Terraform module for an Azure App Service. My module can deploy an App Service the way that I believe is correct for my “clients” and colleagues. It might even build some governance in, such as a naming standard, by automating the naming of the new resource based on some agreed naming pattern. Any customisations for the resource will be passed in as parameters, and any required values for inter-module dependencies can be passed out as outputs.
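As an illustration of governance built into a module, here is a minimal PowerShell sketch of a naming standard written as code: values are passed in as parameters and the generated name is passed out for other code to consume. The function name Get-ResourceName and the pattern it applies (type, workload, environment, and region joined by hyphens) are made up for this example and are not a recommendation.

```powershell
# Sketch: a naming standard expressed as code.
# The pattern <type>-<workload>-<environment>-<region> is a hypothetical example.
function Get-ResourceName {
    param(
        [Parameter(Mandatory)] [string]$ResourceType,   # short type code, for example "app"
        [Parameter(Mandatory)] [string]$Workload,
        [Parameter(Mandatory)] [string]$Environment,
        [Parameter(Mandatory)] [string]$Region
    )

    # Join the parts in a fixed order and force lower case so every caller gets the same format
    $parts = @($ResourceType, $Workload, $Environment, $Region)
    return ($parts -join '-').ToLowerInvariant()
}

# The caller supplies the customisations and never types the name by hand
$name = Get-ResourceName -ResourceType 'app' -Workload 'Shop' -Environment 'Prod' -Region 'WEU'
Write-Host "Generated name: $name"   # app-shop-prod-weu
```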
Quickly I can build out a library of modules, each deploying different resource types – now I have a module library. All I need now is code to call the modules, model dependencies, pass in parameters, and take outputs from one module and pass them in as parameters to others.
Quickly, the benefits appear:
You write less code because the code is written once and you reuse it.
Code is standardised. You can go from one workload to another, or one client to another, and you know how the code works.
Governance is built into the code. Things like naming standards are taken out of the hands of the human and written as code.
You have the potential to tap into new Azure features such as Template Specs.
Smaller code is easier to troubleshoot.
Breaking your code into smaller modules makes collaboration easier.
Most of the issues are related to the fact that you have now built a software product that must be versioned and maintained. Few of us outside the development world have the know-how to do this. And quite frankly, the work is time-consuming and detracts from the work that we should be doing.
No matter how well you write a module, it will always require updates. There is always a new feature or a previously unknown use case that requires new code in the module.
New code means new versions. No matter how well you plan, new versions will change how parameters are used and will introduce breaking changes with some or all previous usage of the module.
Trying to create a one-size-fits-all module is hard. Azure App Services are a perfect example because there are dozens if not hundreds of different configuration options. Your code will become long.
The code length is compounded by code complexity. Many values require some sort of input, such as NULL. Quickly you will have if-then-elses all over your code.
You will have to create a code release and versioning system that must be maintained. These are skills that Ops people typically do not have.
Changes to code will now be slowed down. If a project needs a previously unwritten module/feature, the new code cannot be used until it goes through the software release mechanism. Now you have lost one of the key features of The Cloud: agility.
So What Is Right?
The answer is, I do not know. I know that “big code” without some optimisation is not the way forward. I think the type of micro-modularisation (one module per resource type) that we normally think of when “IaC Modules” is mentioned doesn’t work either.
One of the reasons that I’ve been working on and writing about Bicep/Azure Firewall/DevSecOps recently is to experiment with things such as the concept of modularisation. I am starting to think that, yes, the modularisation concept is what we need, but how we have implemented the module is wrong.
My biggest concern with the micro-module approach is that it actually slowed me down. I ended up spending more time trying to get the modules to run cleanly than I would have if I’d just written the code myself.
Maybe the module should be a smaller piece of code, but it shouldn’t be a read-only piece of code. Maybe it should be an example that I can take and modify to my own requirements. That’s the approach that I have used in my DevSecOps project. My Bicep code is written into smaller files, each handling a subset of the tasks. That code could easily be shared in a reference library by a “cloud centre of excellence” and a “standard workload” repo could be made available as a starting point for new projects.
Please share below if you have any thoughts on the matter.
In this post, I will share the details for granting the least-privilege permissions to GitHub action/DevOps pipeline service principals for a DevSecOps continuous deployment of Azure Firewall.
I wrote about the design of the solution and shared the code in my post, Enabling DevSecOps with Azure Firewall. There I explained how you could break out the code for the rules of a workload and manage that code in the repo for the workload. Realistically, you would also need to break out the gateway subnet route table user-defined route (legacy VNet-based hub) and the VNet peering connection. All the code for this is shared on GitHub – I did update the repo with some structure and with working DevOps pipelines.
There were two things I wanted to add to the design:
Detailed permissions for the service principal used by the workload DevOps pipeline, limiting the scope of change that is possible in the hub.
hub: This deploys a (legacy) VNet-based hub with Azure Firewall.
customRoles: 4 Azure custom roles are defined. This should be deployed after the hub.
spoke1: This contains the code to deploy a skeleton VNet-based (spoke) workload with updates that are required in the hub to connect the VNet and route ingress on-prem traffic through the firewall.
The hub and spoke1 folders each contain a folder called .pipelines. There you will find a .yml file to create a DevOps pipeline.
The DevOps pipeline uses Azure CLI tasks to:
Select the correct Azure subscription & create the resource group
Deploy each .bicep file.
My design uses 1 subscription for the hub and 1 subscription for the workload. You are not glued to this but you would need to make modifications to how you configure the service principal permissions (below).
To use the code:
Create one repo in DevOps for hub and one repo for spoke1, and copy in the required code.
Create service principals in Azure AD.
Grant the service principal for hub owner rights to the hub subscription.
Grant the service principal for the spoke owner rights to the spoke subscription.
Create ARM service connections in DevOps settings that use the service principals. Note that the names for these service connections are referred to by azureServiceConnection in the pipeline files.
Update the variables in the pipeline files with subscription IDs.
Create the pipelines using the .yml files in the repos.
Don’t do anything just yet!
Service Principal Permissions
The hub service principal is simple – grant it owner rights to the hub subscription (or resource group).
The workload is where the magic happens with this DevSecOps design. The workload updates the hub using code in the workload repo that affects the workload:
Ingress route from on-prem to the workload in the hub GatewaySubnet.
The firewall rules for the workload in the hub Azure Firewall (policy) using a rules collection group.
The VNet peering connection between the hub VNet and the workload VNet.
That code is deployed by the workload DevOps pipeline, which is authenticated using the workload’s service principal. So that means the workload service principal must have rights over the hub.
The quick solution would be to grant contributor rights over the hub and say “we’ll manage what is done through code reviews”. However, a better practice is to limit what can be done as much as possible. That’s what I have done with the customRoles folder in my GitHub share.
Those custom roles should be modified to change the possible scope to the subscription ID (or even the resource group ID) of the hub deployment. There are 4 custom roles:
customRole-ArmValidateActionOperator.json: Adds the CUSTOM – ARM Deployment Operator role, allowing the ARM deployment to be monitored and updated.
customRole-PeeringAdmin.json: Adds the CUSTOM – Virtual Network Peering Administrator role, allowing a VNet peering connection to be created from the hub VNet.
customRole-RoutesAdmin.json: Adds the CUSTOM – Azure Route Table Routes Administrator role, allowing a route to be added to the GatewaySubnet route table.
customRole-RuleCollectionGroupsAdmin.json: Adds the CUSTOM – Azure Firewall Policy Rule Collection Group Administrator role, allowing a rules collection group to be added to an Azure Firewall Policy.
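One possible way to create the roles from those JSON files is with New-AzRoleDefinition. This is a sketch rather than the method used in my repo. It assumes that each file is a valid role definition whose assignable scope has already been edited as described above, and that Az.Resources is installed and signed in. The function name Publish-CustomRole is hypothetical.

```powershell
# Sketch: create the custom roles from the JSON definition files in the customRoles folder.
function Publish-CustomRole {
    param(
        [Parameter(Mandatory)] [string]$FolderPath
    )

    # Only pick up the role definition files, which all start with "customRole-"
    $definitionFiles = Get-ChildItem -Path $FolderPath -Filter 'customRole-*.json' -File

    foreach ($file in $definitionFiles) {
        Write-Host "Creating custom role from $($file.Name)"
        # ErrorAction Stop halts the loop on the first failure instead of continuing silently
        New-AzRoleDefinition -InputFile $file.FullName -ErrorAction Stop
    }
}

# Example call (the path is a placeholder):
# Publish-CustomRole -FolderPath './customRoles'
```

Creating a role that already exists is expected to fail, so a re-runnable version would probably need to check for the role first and update it instead.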
Deploy The Hub
The hub is deployed first – this is required to grant the permissions that are required by the workload’s service principal.
Grant Rights To Workload Service Principals
The service principals for all workloads will be added to an Azure AD group (Workloads Pipeline Service Principals in the above diagram). That group is nested into 4 other AAD security groups:
Resource Group ARM Operations: This is granted the CUSTOM – ARM Deployment Operator role on the hub resource group.
Hub Firewall Policy: This is granted the CUSTOM – Azure Firewall Policy Rule Collection Group Administrator role on the Azure Firewall Policy that is associated with the hub Azure Firewall.
Hub Routes: This is granted the CUSTOM – Azure Route Table Routes Administrator role on the GatewaySubnet route table.
Hub Peering: This is granted the CUSTOM – Virtual Network Peering Administrator role on the hub virtual network.
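The four assignments above could be scripted along these lines. This is a sketch under assumptions: the function name Grant-HubRole is hypothetical, the object IDs and resource IDs are placeholders that you must replace, and the role names are copied from this post, so check them against the names inside the JSON role definitions because they must match exactly.

```powershell
# Sketch: assign each custom role to its AAD security group at the narrowest possible scope.
function Grant-HubRole {
    param(
        [Parameter(Mandatory)] [hashtable[]]$Assignments
    )

    foreach ($assignment in $Assignments) {
        Write-Host "Granting '$($assignment.RoleName)' on $($assignment.Scope)"
        New-AzRoleAssignment -ObjectId $assignment.GroupObjectId -RoleDefinitionName $assignment.RoleName -Scope $assignment.Scope -ErrorAction Stop
    }
}

# Placeholder values: replace the group object IDs and the resource IDs with your own.
$assignments = @(
    @{ GroupObjectId = '<object ID of Resource Group ARM Operations>'; RoleName = 'CUSTOM – ARM Deployment Operator'; Scope = '<resource ID of the hub resource group>' }
    @{ GroupObjectId = '<object ID of Hub Firewall Policy>'; RoleName = 'CUSTOM – Azure Firewall Policy Rule Collection Group Administrator'; Scope = '<resource ID of the Azure Firewall Policy>' }
    @{ GroupObjectId = '<object ID of Hub Routes>'; RoleName = 'CUSTOM – Azure Route Table Routes Administrator'; Scope = '<resource ID of the GatewaySubnet route table>' }
    @{ GroupObjectId = '<object ID of Hub Peering>'; RoleName = 'CUSTOM – Virtual Network Peering Administrator'; Scope = '<resource ID of the hub virtual network>' }
)

# Run this once the placeholders have been replaced:
# Grant-HubRole -Assignments $assignments
```

Scoping each assignment to a single resource, rather than to the hub subscription, is what keeps the workload pipelines limited to the routes, rules, and peerings that they actually need to touch.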
Deploy The Workload
The workload now has the required permissions to deploy the workload and make modifications in the hub to connect the hub to the outside world.
In this post, I will share how you can implement DevSecOps with Azure Firewall, with links to a bunch of working Bicep files to deploy the infrastructure-as-code (IaC) templates.
This example uses a “legacy” hub and spoke – one where the hub is VNet-based and not based on Azure Virtual WAN Hub. I’ll try to find some time to work on the code for that one.
Hold on, because there’s a bunch of things to understand!
The DevSecOps methodology is more than just IaC. It’s a combination of people, processes, and technology to enable a fail-fast agile delivery of workloads/applications to the business. I discussed here how DevSecOps can be used to remove the friction of IT to deliver on the promises of the Cloud.
The Azure features that this design is based on are discussed in concept here. The idea is that we want to enable Devs/Ops/Security to manage firewall rules in the workload’s Git repository (repo). This breaks the traditional model where the rules are located in a central location. The important thing is not the location of the rules, but the processes that manage the rules (change control through Git repo pull request reviews) and who (the reviewers, including the architects, firewall admins, security admins, etc).
So what we are doing is taking the firewall rules for the workload and placing them in with the workload’s code. NSG rules are probably already there. Now, we’re putting the Azure Firewall rules for the workload in the workload repo too. This is all made possible thanks to changes that were made to Azure Firewall Policy (Azure Firewall Manager) Rules Collection Groups – I use one Rules Collection Group for each workload and all the rules that enable that workload are placed in that Rules Collection Group. No changes will make it to the trunk branch (deployment action/pipelines look for changes here to trigger a deployment) without approval by all the necessary parties – this means that the firewall admins are still in control, but they don’t necessarily need to write the rules themselves … and the devs/operators might even write the rules, subject to review!
This is the killer reason to choose Azure Firewall over NVAs – the ability to not only deploy the firewall resource, but to manage the entire configuration and rule sets as code, and to break that all out in a controlled way to make the enterprise more agile.
A peering connection to the hub, enabling connectivity with the hub and the firewall. All traffic leaving the spoke will route through the firewall thanks to a user-defined route in the spoke subnet route table. Peering is a two-way connection. The workload will include some bicep to deploy the spoke-hub and the hub-spoke connections.
A route for the GatewaySubnet route table in the hub. This is required to route traffic to the spoke address prefix(es) through the Azure Firewall so on-premises>spoke traffic is correctly inspected and filtered by the firewall.
In this section, I’ll explain the code layout and placement.
You can find my public repo, containing all the Bicep code here. Please feel free to download and use.
The Git Repo Design
You will have two Git repos:
The first repo is for the hub. This repo will contain the code for the hub, including:
The hub VNet.
The Hub VNet Gateway.
The GatewaySubnet Route Table.
The Azure Firewall.
The Azure Firewall Policy that manages the Azure Firewall.
The second repo is for the spoke. This skeleton example workload contains:
I have written a more detailed update on this section, which can be found here.
Each Git repo needs to authenticate with Azure to deploy/modify resources. Each repo should have a service principal in Azure AD. That service principal will be used to authenticate the deployment, executed by a GitHub action or a DevOps pipeline. You should restrict what rights the service principal will require. I haven’t worked out the exact minimum permissions, but the high-level requirements are documented below:
Trunk Branch Protection & Pull Request
Some of you might be worried now – what’s to stop a developer/operator working on Workload A from accidentally creating rules that affect Workload X?
This is exactly why you implement standard practices on the Git repos:
Protect the Trunk branch: This means that no one can just update the version of the code that is deployed to your firewall or hub. If you want to create an update, you have to create a branch of the trunk, make your edits in that branch, and submit the changes to be merged into trunk as a pull request.
Enable pull request reviews: Select a panel of people that will review changes that are submitted as pull requests to the trunk. In our scenario, this should include the firewall admin(s), security admin(s), network admin(s), and maybe the platform & workload architects.
Now, I can only submit a suggested set of rules (and route/peering) changes that must be approved by the necessary people. I can still create my code without delay, but a change control and rollback process has taken control. Obviously, this means that there should be SLAs on the review/approval process and guidance on pull request, approval, and rejection actions.
And There You Have It
Now you have the design and the Bicep code to enable DevSecOps with Azure Firewall.
In this post, I’m going to discuss how to solve an age-old problem that still hurts us in The Cloud with DevSecOps: the on-going friction between devs and ops and how the adoption of the cloud is making this worse.
Us Versus Them
Let me say this first: when I worked as a sys admin, I was a “b*st*rd operator from hell”. I locked things down as tight as I could for security and to control supportability. And as you can imagine, I had lots of fans in the development teams – not!
Ops and devs have traditionally disliked each other. Ops build the servers perfectly. Devs write awesome code. But when something goes wrong:
Their servers are too slow
Their architecture/code is rubbish
Along Came a Cloud
The cloud was meant to change things. And in some ways, it did. In the early days, when AWS was “the cloud”, devs got a credit card from somewhere and started building. The rush of freedom and bottomless resources oxygenated their creativity and they built and deployed like they were locked in a Lego shop for the weekend.
Eventually, the sober-minded Ops, Security, and Compliance folks observed what was happening and decided to pull the reins back. A “landing zone” was built in The Cloud (now Azure and others are in play) and governance was put in place.
What was delivered in that landing zone? A representation of the on-premises data center that the devs were trying to escape from. Now they are told to work in this locked-down environment and the devs are suddenly slowed down and restricted. Change control, support tickets, and a default answer from Ops of “no” means that agility and innovation die.
But here’s the thing – the technology was a restricting factor when working on-premises: physical hardware and 100% IaaS mean that Ops need to deliver every part of the platform. In the cloud, technology wasn’t the cause of the issue. The Cloud started with self-service, all-you-can-eat capacity, and agility. And then traditional lockdowns were put in place.
A good salesperson might have said that there can be cost optimisations but cost savings should not be a primary motivation to go with the cloud. Real rewards come from agility, which leads to innovation. The ability to build fast, see if it works, develop it if it does, dump it if it doesn’t, and not commit huge budgets to failed efforts is huge to a business. When Ops locks down The Cloud, some of the best features of The Cloud are lost. And then the business is unhappy – there were costly migration projects, actual IT spend might have increased, and they didn’t get what they wanted – IT failed again.
By the way, this is something we (me and my colleagues at work) have started to see as a trend with mid-large organisations that have made the move to Azure. The technology isn’t failing them – people and processes are.
People & Processes
Technology has a role to play but we can probably guesstimate that it’s about 20% of the solution. People and processes must evolve to use The Cloud effectively. But those things are overlooked.
The CAF starts out by analysing what the business wants from The Cloud. You cannot shape anything IT-wise without instruction from above. What does the business want? Do you know who you should not ask? The IT Manager – they want what IT wants. To complete the strategy definition, you need to get to the owners/C-level folks in the business – getting time with them is hard! Once you have a vision from the business you can start looking at how to organise the people and set up the processes.
Think about the structure of IT. There is an Ops team/department with a lead. That group of people has pillars of expertise in a mid-large organisation:
The Windows team
And so on
Even those people don’t work well in collaboration. There is also a Dev department that is made up of many teams (workloads) that may even have their own pillars of expertise – some/many of those are externals. There is no alignment or collaboration between all the parties involved in building, running, and continuously improving a workload.
DevOps is a methodology that brings Ops and Devs together in actual or virtual teams for each workload. For example, let’s say that a workload requires the following skills from many teams/departments:
That might be skills from 4 teams. But in DevSecOps, the workload defines a virtual or actual team of people that will work on that application and its underlying infrastructure together. The application and infrastructure architects will design together. The devs and ops skills will work together to produce the code that will create the underlying platform (PaaS and/or IaaS) that will be continuously developed/improved/deployed using GitHub/DevOps actions/pipelines.
Agile methodologies will be brought in to plan the work:
Work through epics, user stories, features and tasks (backlog)
That are scheduled to sprints (kanban board)
And are assigned to/pulled by members of the DevOps team (resource planning)
What has been accomplished? Now a team works together. They have a single vision through a united team. They share a plan and communicate through daily standup meetings and modern tooling such as Teams. By working as one, they can produce code fast. And that means they can fail fast:
Produce a minimum viable product
Test if it works
If it does, improve on it in sprints
If it doesn’t, tear it down quickly with minimal money lost
In The Cloud, modern workloads are presented to clients over the Internet using TLS. The edge means that there is a security role. And in a good design, micro-segmentation is required, which means an expanded security role. And considering the nature of threats today, the security role should have some developer skills to analyse code and runtimes for security vulnerabilities.
If we don’t change how the security role is done then it can undo everything that DevOps accomplishes – all of a sudden a default “no” appears, halting all the progress towards agility and innovation.
DevSecOps adds the security role to DevOps. Now security personnel are a part of the workload’s team. They will be a part of the design process. They will be the ones that either implement in code and/or review firewall rules in the pull request. Elements of security are moved from a central location out to the repos for the workloads – the result is that the what and who don’t change; all that changes is the where.
Introducing the sort of changes that DevSecOps will require is not going to be easy or quick. We can do the tech pieces in Azure pretty easily, actually, but the people might resist and the processes won’t exist in the organisation. Introducing change will be hard and it will be resisted. That’s why the process must be led from the C-level.
In this post, I will show you how to test IDPS in Azure Firewall Premium, including test exploits and how to search the logs for alerts.
Azure Firewall Setup
You are going to need a few things:
Ideally a hub and spoke deployment of some kind, with a virtual machine in two different spokes. My lab is Azure Virtual WAN, using a VNet as the “compromised on-premises” and a second VNet as the target.
Azure Firewall Premium SKU with logging enabled to a Log Analytics Workspace.
Azure Firewall Policy Premium SKU, with IDPS enabled for Alert & Deny.
Make sure that you have firewall rules and NSG rules open to allow your “attacks” – the point of IDPS is to stop traffic on legitimate protocols/ports.
In this post, I will document the resources used in Azure Virtual Desktop, what they do, and how they interconnect.
This is a work-in-progress, so any updates I discover along the way will be added. You should also check out a similar post on Azure Image Builder.
Host Pool – Microsoft.DesktopVirtualization/hostpools
The host pool documents the configuration of the hosts that will provide the desktops/applications. Note that a Host Pool resource ID is required to create an Application Group.
Note, the VMs themselves are deployed using a linked template when you use the Azure Portal. My deployment used the “managed disks” template. This template deploys the VMs, runs some DSC, joins the machines to your domain. There is also a task to update the host pool.
Deploying Microsoft.DesktopVirtualization/hostpools does not create the VMs – it just manages any VMs added to the Host Pool.
The mandatory properties appear to be:
hostPoolType: BYODesktop, Personal, or Pooled.
loadBalancerType: BreadthFirst, DepthFirst, or Persistent.
preferredAppGroupType: Desktop, None, or RailApplications.
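As a rough illustration of the three mandatory properties, here is a minimal Python sketch that builds the body of a Host Pool resource as it might appear in an ARM template. The helper name, the resource name, the location, and the apiVersion are assumptions made for the example; only hostPoolType and loadBalancerType are validated, using the values listed above.

```python
import json

# Values taken from the list above; everything else in this sketch is hypothetical.
HOST_POOL_TYPES = {"BYODesktop", "Personal", "Pooled"}
LOAD_BALANCER_TYPES = {"BreadthFirst", "DepthFirst", "Persistent"}


def host_pool_resource(name, location, host_pool_type, load_balancer_type,
                       preferred_app_group_type):
    """Return a dict shaped like a Host Pool resource in an ARM template."""
    if host_pool_type not in HOST_POOL_TYPES:
        raise ValueError(f"Unknown hostPoolType: {host_pool_type}")
    if load_balancer_type not in LOAD_BALANCER_TYPES:
        raise ValueError(f"Unknown loadBalancerType: {load_balancer_type}")
    return {
        "type": "Microsoft.DesktopVirtualization/hostpools",
        "apiVersion": "2021-07-12",  # assumed; check the current ARM reference
        "name": name,
        "location": location,
        "properties": {
            "hostPoolType": host_pool_type,
            "loadBalancerType": load_balancer_type,
            "preferredAppGroupType": preferred_app_group_type,
        },
    }


if __name__ == "__main__":
    pool = host_pool_resource("hp-demo", "westeurope", "Pooled",
                              "BreadthFirst", "RailApplications")
    print(json.dumps(pool, indent=2))
```

Note that this only describes the Host Pool itself; the session host VMs are still a separate deployment.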
Application Group – Microsoft.DesktopVirtualization/applicationgroups
The Application Group documents the applications, user associations (the Desktop Virtualization User role is assigned to users/groups), and is associated with a Host Pool; therefore you must deploy a Host Pool resource before you deploy the planned Application Group.
The mandatory values appear to be:
hostPoolArmPath: The resource ID of the associated Host Pool
applicationGroupType: Desktop or RemoteApp
We know that Windows 365 (AKA “Cloud PC”) is built on Azure Virtual Desktop. Proof of that is in ARM, with a true/false property called cloudPcResource.
The Azure Virtual Desktop Workspace is the glue that holds everything together. The Workspace can be associated with no, 1, or many Application Groups via a non-mandatory array value called applicationGroupReferences. You can build a Workspace before your Application Groups and update this value later. Or you can build the (1) Host Pool(s), (2) Application Group(s), followed by the Workspace.
The mandatory values appear to be:
applicationGroupReferences: An array value with 0+ items, each being the resource ID of an Application Group.
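The dependency chain described above (Host Pool, then Application Group, then Workspace) can be sketched in Python as plain dictionaries. This is only an illustration under stated assumptions: the helper names, subscription ID, resource group, and resource names are all hypothetical, and the dictionaries carry only the properties mentioned in this post.

```python
def resource_id(subscription_id, resource_group, resource_type, name):
    """Build a resource ID string in the standard Azure format."""
    return (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
            f"/providers/{resource_type}/{name}")


def application_group(name, host_pool_id, group_type):
    """An Application Group needs the resource ID of an existing Host Pool."""
    if group_type not in {"Desktop", "RemoteApp"}:
        raise ValueError(f"Unknown applicationGroupType: {group_type}")
    return {
        "type": "Microsoft.DesktopVirtualization/applicationgroups",
        "name": name,
        "properties": {
            "hostPoolArmPath": host_pool_id,
            "applicationGroupType": group_type,
        },
    }


def workspace(name, application_group_ids=()):
    """A Workspace can start with zero Application Groups and be updated later."""
    return {
        "type": "Microsoft.DesktopVirtualization/workspaces",
        "name": name,
        "properties": {"applicationGroupReferences": list(application_group_ids)},
    }


if __name__ == "__main__":
    sub, rg = "00000000-0000-0000-0000-000000000000", "rg-avd-demo"  # hypothetical
    pool_id = resource_id(sub, rg, "Microsoft.DesktopVirtualization/hostpools", "hp-demo")
    group = application_group("ag-desktop", pool_id, "Desktop")
    group_id = resource_id(sub, rg, group["type"], group["name"])
    print(workspace("ws-demo", [group_id]))
```

Passing an empty list to workspace() models the alternative order, where the Workspace is built first and the references are added afterwards.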
The Host Pool will require virtual machines; these are created as a separate deployment. There’s nothing special here; they are virtual machines created from the Marketplace or from your own generalised image (captured or Shared Image Gallery). Two actions must be done to the VMs:
Domain Join: Either (legacy) ADDS (including Azure AD DS or Windows Server ADDS) or an Azure AD Join (a recent feature add).
Virtual Desktop agent: DSC will be used to deploy the agent. This will make an outbound connection to the Host Pool and register the VM.
AAD, AADDS, or ADDS? I prefer ADDS. This is because:
Most of the controls that you need are in Group Policy and AAD doesn’t do Group Policy.
AADDS relies on AAD which is a single-region service. If that region has AAD issues (and this happens pretty frequently) then your Azure Virtual Desktop farm is dead.
Third-party applications typically expect ADDS and will not support AADDS/AAD, even if it “works”.
In this post, I will discuss how you can use Rules Collection Groups in Azure Firewall to aggregate your Rules Collections and Rules to be aligned with a service definition or workload definition.
Workload and Service Definitions
In an organised environment, every workload (or service) has a definition. That definition describes all of the components that make up and facilitate the workload. That includes your firewall rules.
So it would make sense if you had a way of grouping firewall rules together if those rules are used to make a workload possible. For example, if I was running an Azure Virtual Desktop pool, I might treat that pool as a workload, document it as a workload, and want all of the rules that make that pool possible to be grouped together and managed as a unit.
The challenges I have faced with Azure Firewall and aligning rules along the model of a workload definition have been:
Realisation & Conceptualisation
I’ve always used some kind of workload-based approach but my approach wasn’t perfect. I decided to try to align rules with NSGs, placing inbound rules with the workload that was the destination. But then some workloads, such as Windows Admin Center or ADDS, require more reach, and then you get messy. And what if you have a dozen or more workloads that are atomic units but are also extremely integrated? Where do the rules go?
I’ve come to realise that rules should go with the service that they empower, regardless of destination. What made me think like that? Documentation of workloads for a forced-ad-hoc migration project that I’ve been working on for the last year. We didn’t get the chance to assess and plan in detail, so everything was an on-the-fly discovery. Our method of “place the rule with the target workload” has created very complicated ARM templates for Azure Firewall; defining a workload from a firewall perspective is very hard with that approach.
If we said that “all network rules that empower a workload go with the workload’s network Rules Collection” then things would get a bit better – a bit.
Rules Collection Groups Limitations
It is a year since Firewall Policy became generally available. Just like last year, I had some hours to experiment on Azure Firewall. I tried out the new tier, Rules Collection Groups. A reminder:
Rules > Rules Collections (typed, based on DNAT, Network, or Application)
Rules Collections > Rules Collection Groups
But at the time, the new tech was immature. Mixing Rules Collection types between Rules Collection Groups was a disaster. The advice I got from the product group was “don’t do it, it’s not ready, stick with the default groups for now”.
So I did just that. That means that the DNAT Rules Collection, the Network Rules Collection, and the Application Rules Collection for any workload were split into 3 deployments, the default Rules Collection Groups for:
Each of those deployments could have dozens or hundreds of rules collections for each workload. And when you combine that with my previous approach to rules placement:
It’s a mess
Deploying a rule change required re-deploying all Rules Collections of a type for all workloads using that type of Rules Collection.
A Better Approach
I have had some time to play and things are better.
Rules Alignment With Workload
I’ve discussed this already – place rules that empower a workload with the workload, not the destination workload.
If Workload X requires TCP 445 access to Workload Y, then I will create a Network Rule to Workload Y on TCP 445 in a Network Rules Collection for Workload X. The result is that all rules that make Workload X function will be in rules collections for Workload X. That makes documentation easier and makes the next step work.
Rules Collection Groups For Workload
This is the big change in Azure from this time last year. I can now create lots of Rules Collection Groups, each with a priority (for processing order). I will create 1 Rules Collection Group per workload. Workload X will get 1 Rules Collection Group.
All rules that make Workload X function go into the Rules Collection Group for Workload X. I might have, depending on rules requirements, up to 6 Rules Collections:
The Rules Collection Group is its own Deployment from an ARM perspective. If I’m managing the firewall as code (I do) then I can have 1 template (or parameters file) that defines the Rules Collection Group (and contained Rules Collections and Rules) for the entire workload and just the workload. Each workload will have its own template or parameter file. A change to a workload definition will affect 1 file only, and require 1 deployment only.
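To make the "one file per workload" idea concrete, here is a minimal Python sketch that generates the Rules Collection Group for Workload X, including the TCP 445 rule to Workload Y from the earlier example. The helper names, the policy name, the address ranges, the priorities, and the apiVersion are assumptions for illustration; the property names follow the ARM schema for Microsoft.Network/firewallPolicies/ruleCollectionGroups as I understand it, so check them against the current reference before use.

```python
import json


def network_rule(name, sources, destinations, ports, protocols=("TCP",)):
    """One Network Rule; it lives with the workload that needs it, not the target."""
    return {
        "ruleType": "NetworkRule",
        "name": name,
        "ipProtocols": list(protocols),
        "sourceAddresses": list(sources),
        "destinationAddresses": list(destinations),
        "destinationPorts": [str(p) for p in ports],
    }


def rule_collection_group(policy_name, workload, priority, network_rules):
    """One Rules Collection Group per workload, deployable on its own."""
    return {
        "type": "Microsoft.Network/firewallPolicies/ruleCollectionGroups",
        "apiVersion": "2021-05-01",  # assumed; check the current ARM reference
        "name": f"{policy_name}/{workload}",
        "properties": {
            "priority": priority,
            "ruleCollections": [{
                "ruleCollectionType": "FirewallPolicyFilterRuleCollection",
                "name": f"{workload}-network-allow",
                "priority": 100,
                "action": {"type": "Allow"},
                "rules": list(network_rules),
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical address ranges for Workload X (source) and Workload Y (destination).
    smb = network_rule("x-to-y-smb", ["10.1.0.0/24"], ["10.2.0.0/24"], [445])
    group = rule_collection_group("fw-policy-demo", "workload-x", 1000, [smb])
    print(json.dumps(group, indent=2))
```

A change to Workload X then means regenerating and deploying this one resource, without touching the groups for other workloads.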
If you want to see an ARM template for deploying one of the workloads in my screenshot, then head on over to my GitHub.
This approach should leave the firewall much better organised, easier to manage in smaller chunks if using infrastructure-as-code, easier to document, and more suitable for organisations that like to create/maintain service definitions.
RSA: In 2011, the Chinese PLA (or hackers sponsored by them) compromised RSA and used that access to attack customers of RSA.
What is a supply chain attack? It’s pretty hard to break into a network, especially one that has hardened itself. Users can be educated – ok, some will never be educated! Networks can be hardened and micro-segmented. Identity protections such as MFA and threat detection can be put in place.
But there remains a weakness – or several of them. There’s always a way into a network – the third party! Even the most secure network deployments require some kind of monitoring system – something where a piece of software is deployed onto “every VM”. Or there’s some software vendor that’s deep into your network that has openings all over the place. Those are your threats. If an attacker compromises the software from one of those vendors then they will get into your network during your next update and they will use the existing firewall holes & permissions that are required by the software to probe, spread, and attack.
You still need to have your first lines of defense, ideally using tools that are designed for protection against advanced persistent threats – not your regular AV package, dummy:
Backup with isolated offline storage protected by MFA
That’s a start, but a supply chain attack bypasses all that by using existing channels to enter your network as if it is from a trusted source – because the attack is embedded in the code from a trusted source.
They are restricted to the required directions, protocols, and ports.
That traffic passes through a firewall – and ideally several firewalls.
In Microsoft Azure, that means using:
A central firewall, in the form of a network firewall and/or web application firewall (Azure or NVA). This firewall controls connections between the outside world and your workloads, between your workloads, and importantly from your workloads to the outside world (prevents malware from talking to its human controller).
Network Security Groups at the subnet level that protect the subnet and even isolate nodes inside the subnet (use a custom Deny All rule because the default Deny All rule is useless when you understand the logic of how it works).
Resource firewalls – that’s the guest OS firewall and Azure resource firewalls.
If you have a Windows ADDS domain, use Group Policy to force the use of Windows Firewall – lazy admins and those very same vendors that will be the channel of attack will be the first to attempt to disable the firewall on machines that they are working on.
For Azure resources, consider the use of Azure Policy to force/audit the use of the firewalls in your resources and a default route to 0.0.0.0/0 via your central firewall.
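The point about the custom Deny All rule in the NSG bullet above comes down to priorities: the platform's default AllowVnetInBound rule (priority 65000) is evaluated before the default DenyAllInBound rule (priority 65500), so traffic inside the virtual network is allowed unless a custom rule denies it first. A minimal Python sketch of such a custom rule follows; the helper name, the rule name, and the chosen priority of 4000 are assumptions for illustration.

```python
# Default rule priorities defined by the Azure platform.
DEFAULT_ALLOW_VNET_INBOUND = 65000
DEFAULT_DENY_ALL_INBOUND = 65500


def custom_deny_all_inbound(priority=4000):
    """Return an NSG security rule that denies all inbound traffic.

    Custom rules must use a priority between 100 and 4096, so the rule is
    always evaluated before the default AllowVnetInBound rule.
    """
    if not 100 <= priority <= 4096:
        raise ValueError("Custom NSG rule priorities must be between 100 and 4096")
    return {
        "name": "DenyAllInbound-Custom",  # hypothetical name
        "properties": {
            "priority": priority,
            "direction": "Inbound",
            "access": "Deny",
            "protocol": "*",
            "sourceAddressPrefix": "*",
            "sourcePortRange": "*",
            "destinationAddressPrefix": "*",
            "destinationPortRange": "*",
        },
    }


if __name__ == "__main__":
    rule = custom_deny_all_inbound()
    print(rule["properties"]["priority"] < DEFAULT_ALLOW_VNET_INBOUND)
```

Allow rules for the traffic that a workload actually needs would then be given lower priority numbers than this deny rule.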
An infrastructure-as-code approach to the central firewall (Azure Firewall) and NSGs brings documentation, change control, and rollback to network security.
This is where most organisations fail, and even where IT security officers really don’t get it.
Syslog is not security monitoring. Your AV is not security monitoring. You need something bigger, that is automated, and can filter through the noise – I regularly use the term “be your Neo to read the Matrix”. That’s because even in a small network, there is a lot of noise. Something needs to filter through that noise and identify the threats.
For example, there are a lot of TCP 445 connection attempts coming from one IP address. Or there are lots of failed attempts to sign in as a user from one IP address. Or there are lots of failed connections logged by NSG rules. Or even better – all of the above. These are the sorts of things that malware that is attempting to spread will do. This is the sort of work that Azure Sentinel is perfect for – Sentinel connects to many data sources, pulls that data to a central place where complex queries can be run to look for threats that a human won’t be able to find. Threats can create incidents, incidents can trigger automated flows to eliminate the noise, and the remaining incidents can create alerts that humans will act upon.
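To show the kind of filtering involved, here is a toy Python sketch that counts TCP 445 connection attempts per source IP address and flags the sources that cross a threshold. It is not how Sentinel works internally; the event format, field names, and threshold are hypothetical and the data is synthetic.

```python
from collections import Counter


def noisy_sources(events, port=445, threshold=20):
    """Return {source_ip: attempts} for sources with many attempts on one port."""
    counts = Counter(e["src"] for e in events if e["dst_port"] == port)
    return {src: n for src, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Synthetic log records: one source hammering TCP 445, plus background noise.
    events = (
        [{"src": "10.1.0.4", "dst_port": 445} for _ in range(25)]
        + [{"src": "10.1.0.5", "dst_port": 445} for _ in range(3)]
        + [{"src": "10.1.0.6", "dst_port": 443} for _ in range(30)]
    )
    print(noisy_sources(events))
```

A real detection would correlate several signals (sign-in failures, NSG denies, firewall logs) over a time window, which is where a SIEM earns its keep.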
But some malware is clever and not so noisy. The malware that hit the HSE (the Irish national health service) uses a lot of manual control to quietly spread over a very long time. Restricting outbound access to the Internet to just the required connections for business needs will cripple this control mechanism. But there’s still an automated element to this malware.
Other things to implement in Azure will include:
IDPS: An intrusion detection & prevention system in the firewall, for example Azure Firewall Premium. When known malware/attack flows pass through the firewall, the firewall can log an alert or alert/deny the flows.
Security Center: Enabling Security Center “Azure Defender” (the tier previously known as the Azure Security Center Standard) provides you with oodles of new features, including some endpoint protections that are very confusingly packaged and licensed by Microsoft.
Managed Services Providers
MSPs are a part of the supply chain for their customers. MSP staff typically have credentials that allow them into many customer networks/services. That makes the identities of those staff very valuable.
A managed service provider should be a leader in identity security process, tooling, and governance. In the Microsoft world, that means using Azure AD Premium with MFA enabled for all staff. In the Azure world, Lighthouse should be used to gain access to customers’ cloud implementations. And that access should be zero-trust, powered by Privileged Identity Management (PIM).
These attackers are not script kiddies. They are professional organisations with big budgets, very skilled programmers and operators, and a lot of time and will. They know that with some persistent effort targeting a vendor, they can enter a lot of networks with ease. Hitting a systems management company, or more scarily, a security vendor, reaps BIG rewards because we invest in these products to secure our entire networks. The other big worry is those vendors that are deeply embedded with certain verticals such as finance or government. Imagine a vendor that is in every branch of a national government – one successful attack could bring down that entire government after a wave of upgrades! Or hitting a well known payment vendor could open up every bank in the EU.