3 Tips For MS Certification Hands-On-Labs

Here are my three tips for improving your results in a hands-on-lab in Microsoft certification exam:

  1. Read the task description carefully.
  2. Optimise your time.
  3. Review your work.

About Hands-On-Labs

If you’ve sat a Microsoft century exam over the past quarter century then you are familiar with the traditional format:

  • Either a simple scenario or a case study.
  • Multiple choice questions where you select one best answer or multiple answers that are correct or part of the best solution, or sometimes ordering the steps.
  • “Correct” answers that are wrong and “wrong” answers that are right depending on feature/update releases and when the question was probably written.
  • Trick questions that are quite unfair.

I sat the AZ-700: Designing and Implementing Microsoft Azure Networking Solutions this week and was surprised to see a hands-on-lab at the very end. Before the lab appeared I had approximately an hour left in the exam. When I was finished, I had 5 minutes left. The exercises were not hard, especially for anyone used to deploying Azure networking resources, but it was time consuming and there were a lot of tasks to complete.

The lab appeared at the end of the exam. It provided a username and password. The instructions do not offer a copy/paste feature in the Azure Portal which is embedded in the test. But you can click the username and password to get them to appear in the log-in screen. The absence of copy/paste means that you need to be careful when you are asked to enter a specific name for a resource.

The lab was made up of a number of exercises. Each exercise was discrete in my exam – no exercise depended on another. Each exercise had a description of differing complexity and clarity. Some were precise and some were vague. Some were short tasks and others were long-running tasks.

I found myself answering a comment on LinkedIn this morning and thought “this would be a nice blog post”. So on with the tips!

Read The Task Description Carefully

Just like the multiple-choice questions, the exercises are probably (I’m guessing) written by Microsoft Certified Trainers (MCTs) who may or may not be experts in the exam content – this sort of comment often makes MCTs angry and defensive. It’s clear from the language that some of the authors don’t have a clue – this is where MCTs say “leave feedback” but I would have spent another 2 hours leaving feedback when I had a family to get home to!

A task might have multiple ways to complete it. I can’t share specifics from the exam, but one of the tasks was very vague and offered no context. Without thinking too much I thought immediately of 3 possible deployments that could solve the issue. Which was right? Were all 3 right? I guessed with the one that would require the least work – it felt right based on some of the language but I was reading between lines and trying to think what the author was thinking.

Read the task instructions carefully. Take notes of things to complete – use the dry marker pens and boards you are given and check steps off as you do them. If a required resource is named, then note the case and duplicate that in your lab – Azure might drop it to lower case because that’s what it might do for that resource type!

Don’t jump to any assumptions. Look for clue words that hunt at requirements or things to avoid. In the unclear question that I had, there was one word that lead me to choose the approach that I took. I’ve no idea if the answer was right or wrong, but it felt right.

  1. Read the task carefully.
  2. Take notes on actions to create a checklist.
  3. Re-read the task looking for clue words.
  4. Verify your checklist is complete.
  5. Tick off items on the checklist as you work.

Optimise Your Time

I had 12 tasks (I think) to complete after answering dozens of multiple choice questions. I had an hour or so left in the exam and that hour flew by. As I said earlier, some tasks were quick. Some tasks required a lot of work. And some tasks were long-running.

In my exam, the tasks were independent of each other. That meant I could start on task 2 while a resource for task 1 was still deploying.

The Azure Portal can offer several ways to accomplish a task. You can build out each resource you require in individual wizards. Sometimes, the last “big” resource that you need has a wizard that can deploy everything – that’s the method that you want. Practice is your friend here, especially if you normally work using infrastructure-as-code (like me) and rarely deploy in the Azure Portal. Find different ways to deploy things and compare which is more efficient for basic deployments.

There are certain resources in Azure networking that take 10-45 minutes. If you have a task such as this then do not wait for the deployment to complete. Jump ahead to the next task and start reading.

You might find yourself working on 2-3 tasks at once if you use this approach. This is where tracking your work becomes critical. Earlier, I stated that you should track the requirements of a task using a checklist. You should also track the completion status of each task – you don’t want to forget to complete a task where you are waiting on a resource to deploy. Each task has a “Mark As Complete” button – use it and don’t consider the lab as complete until all tasks have green check marks.

  1. Practice deployments in the Azure Portal.
  2. Choose the deployment method that will complete more of the task requests in less time.
  3. Do not wait on long-running deployments.
  4. Track task completion using the “Mark As Complete” button.

Review Your Work

In my exam, the tasks normally did not instruct you to use a resource name. So I created names using the naming standard that I am used to. When I had completed all the tasks – all had green check marks – I decided to review my work. I read through the task requirements again and verified the results in the Azure Portal. I found that one task asked me to create a resource with a specific name and I had created it using my normal naming standard. I fixed my error and continued to check everything.

When you have finished your work, go through the exercise descriptions again. Confirm that the checklists are complete. And compare the asks with what you have done in the Azure Portal to verify that everything is done as it is required.

  1. Read the task descriptions again.
  2. Compare with your checklist.
  3. Compare with the results in the Azure Portal.

My Experience

It took me a few minutes to get over the shock of doing a hands-on-lab. I dreaded every minute because I have heard the horror tales of labs being slow or crashing mid-exam. I also was glad of blogging and lab work – in my day job I rarely use the Azure Portal to deploy networking (or any) resources.

But once settled in, I found that the labs were not difficult. The asks were not complicated – they were a mixture of vague and detailed:

  • How to decide what to deploy wasn’t written down.
  • Only one task had a requirement to name 1 resource.

The thing that really shocked me, though, was that I did not learn my result at the end of the exam. The normal Microsoft exam experience is that you confirm that you are finished the exam, you spend a few painful seconds wondering if the program has crashed, and a result appears. Instead, I was told to log into http://www.microsoft.com/learning/dashboard later in the day to see my result! I had to drive home and then take care of my kids so it was 90 minutes later when I finally got to sign in and navigate to see my result – which was a pass with no score shared. So I guess that I did OK in the labs.

Dealing With Azure Capacity Shortages

In this post, I will discuss the recent news that Azure is having some serious capacity issues affecting customers in 2022 that could last into 2023.

The News

Personally, I didn’t think that Azure having capacity issues was news. I thought that thanks to all the jokes on social media, everyone took it for granted that the effects of the pandemic and cargo crises had crippled electronics supplies the world over. Even the recent news that the annual MVP renewal (July 1st) was being postponed to July 5 was jokingly blamed on the capacity issues.

Last Friday, The Information (subscription required) reported:

… more than two dozen Azure data centers in countries around the world are operating with limited server capacity available to customers, according to two current Microsoft managers contending with the issue and an engineer who works for a major customer.

This is not the news we want to hear when there are repeated reports that cloud adoption is increasing, Azure adoption is growing faster than AWS adoption, and Microsoft’s total (including SaaS) cloud business is (marginally) the largest in the industry.

The Effect

Most people who have used Azure for a few years have seen the dreaded deployment fails – you cannot get capacity because the region you have selected doesn’t have it. And the “helpful” support agent tells you to try a less-impacted region on another continent.

That suggestion indicates how little that person cares or the lack of understanding about IT performance or compliance issues.

I work mostly with Norwegian clients. Typically, they need to stay inside the EU (using West Europe, the largest hero region), and often they have to stay inside Norway (Norway East, because Norway West is not publicly available to customers). If one of those clients tries to start something or create something in their chosen region then being told to deploy in Asia or the US isn’t going to work. If they chose Norway East because they need to comply with the local “Arkiv” (archive) law then venturing outside the border is not an option.

Let’s say you’ve built a highly integrated system in West US. You have lots of containers and functions, all integrating with each other, and hitting databases. Your workload is time-sensitive so you need to minimise every millisecond of latency. Heck, you have even started to use VMs to get access to proximity placement groups to control physical distances between tiers of your workloads. And then a support engineer tells you that your additional scale-out capacity should be halfway around the planet? Seriously?

Solutions

Other than some miracle where manufacturing backlogs and shipping mismanagement are solved, you will need to minimise risks of this sort of thing happening to you.

Auto-Scaling

Customers use auto-scaling to reduce costs. You dynamically create or power-up (allocate to a running/billing state) compute instances to stay slightly ahead of demand. And when demand (profits) reduce, you dynamically destroy or power down (deallocate to a stopped/non-billing state) unneeded compute instances.

When capacity is not available, auto-scaling can be a gamble. You want to minimise wastage during non-peak periods, but not having enough compute instances could damage revenue/operations more than having unused compute instances. For example, your Citrix/Azure Virtual Desktop instances are required for people to work, but when people try to log in on Monday morning, Azure cannot supply enough machines to scale-out your worker pools. Whoops! You saved some money over the weekend, but now only a small percentage of your employees can work.

In these times of shortage, you need to hold on to what you have got. It is worth considering the disablement of auto-scaling and going back to old practices of trying to figure out what you need, and keeping that capacity running.

Virtual Machine Migration

After the summer, I will have some involvement in three migration projects. Two of those could result in large numbers of virtual machines being deployed to Azure. All three projects have hard deadlines for completion. The last thing that those clients are going to want to hear is: “Sorry, your business plans are not feasible because there aren’t enough servers in the Cloud that IT has selected”.

Microsoft offers a program where you can reserve virtual machine capacity for selected series of virtual machines, called On-demand Capacity Reservation. In short, you pay for the price of the machine you want, before you deploy it, to guarantee the capacity that you want when you need it later.

Or as some cynics might say: you pay a vig to Microsoft for capacity that you should have expected in a hyper-scale cloud in the first place.

If you are in a scenario where you have an upcoming spike in capacity, such as a migration, and you need to be sure the capacity will be there, then this form of reservation must be strongly considered. However, telling a client that they need to pay for compute weeks before they need it – will be a hard sell.

SKU Choice

When we deploy new tech, we always want the latest and greatest. Why would you deploy a years-old D_v3 when there’s a D_v5 with a faster processor? If you really need the performance of the newer SKU, then I get the choice. But how many workloads genuinely need that kind of sizing. I’ve rarely encountered one.

Under the covers of the platform, that v5 is limited to new hardware with matching physical processors. That is the same capacity that Microsoft (and others) cannot get their hands on. However, the v3 is able to run on the v3 hardware, and thanks to Hyper-V processor management features, it is also able to run on newer hardware. So if you want more potential host capacity to run your compute instances, then you will choose older SKUs that run on older processors.

This advice will apply to any resource type – not just virtual machines. In the end, everything (including so-called serverless computing) is a virtual machine under the platform, except for items such as VMware hosts and SAP Hana machines which are physical servers.

Summary

It sounds like we are facing some issues. Azure might be making the headlines now, but Microsoft uses the same physical components as everyone else – there is only AMD and Intel, pretty much everything electronic is assembled in China, and everything needs months to move by cargo ship. We are going to have to box clever to ensure that we have enough capacity for resources such as virtual machines, app services, containers, databases, and so on – everything that relies on compute instances under the covers.

Deploy Shared Azure Bastion To Virtual WAN Spoke

In this post, I will explain how you can deploy Azure Bastion into a spoke in a hub & spoke architecture with a Virtual WAN hub – and use that Bastion to securely log into virtual machines in other spokes using RDP or SSH. I will also explain why this has limitations in a hub & spoke architecture with a VNet-based hub.

The Need

Even organisations that opt for a PaaS-only Azure implementation need virtual machines. Once you do network security, you need virtual machines for those DevOps build agents or those GitHub runners. And realistically, migrated legacy workloads need VMs. And things like AKS and HPC are based on VMs that you build and troubleshoot. So you need VMs. And therefore, you need a secure way to log into those VMs.

Those of us who want an air gap between the PC and the servers have tried things like RD Gateway and Guacamole. Neither is perfect. Ideally, we want Azure AD integration (for Premium security features) and a platform resource (to minimise maintenance).

And along came Azure Bastion. At first reading, it seemed ideal. And then we started to discover warts. Many of those warts were cleaned up. Bastion got support for a desktop client through a CLI login. A hub deployment was possible – if you use a VNet-based hub – but it gave Bastion users (including external support) staff a map of your entire Azure network because they read access to the hub VNet – and all its peering connections. For many of us, that left us with deploying Bastion in every subnet – both costly and a waste of IP space.

We needed an Azure Bastion that we could deploy once in a spoke. We could log into it, route through the firewall in the hub, and log into VMs running in other spokes.

IP-Based Connection

Microsoft announced a new feature in the Standard tier of Azure Bastion called IP-Based Connection. With this feature, you can log into a virtual machine across an IP network from your Bastion. That means you can log into:

  • Virtual machines in the same subnet or virtual network
  • Virtual machines in other Azure virtual networks
  • On-premises computers across site-to-site network connections such as VPN or ExpressRoute

The assumptions are that:

  • The NSG protecting the Azure virtual machine allows SSH/RDP from AzureBastionSubnet o the virtual machine.
  • The hub firewall (if you have one) allows SSH/RDP from AzureBastionSubnet to the virtual machine.
  • The on-premises firewall, if the virtual machine is on-premises, allows RDP/SSH from AzureBastionSubnet to the computer.
  • The OS of the virtual machine or computer allows RDP/SSH rom AzureBastionSubnet.

VNet-Based Hub

In my first experiment, I tried deploying a shared Azure Bastion in a spoke with a VNet-based hub. I could deploy it no problem but Azure Bastion could not route to other spokes sharing the hub. Why?

There were no routes from the spoke subnets to other spoke subnets. But that’s OK – I know how to fix that.

Let’s say my entire Azure network is in the 10.1.0.0/16 address space. Well, if I want to route to any spoke in that address space, I can:

  1. Create a route table and *cough* associate it with the AzureBastionSubnet
  2. Add a user-defined route (UDR) for 10.1.0.0/16 with a next hop of the hub firewall (or a routing appliance).

Azure Bastion needs to have a route of 0.0.0.0/0 to Internet so you can’t do the usual spoke thing with the UDR so I could leave the default Internet route in place.

Did it work? No – that’s because AzureBastionSubnet has a hard-coded rule to prevent the association of a route table. I guess that Microsoft had too many support calls with people doing bad things with a route table associated with AzureBastionSubnet.

It turns out that the only way to get a route from one spoke to another with a VNet-based hub is:

  1. Use Azure Virtual Network Manager (AVNM – currently in preview) to peer your spokes with transitive peering.
  2. Do not expect spoke-to-spoke traffic to flow through a hub firewall – AVNM does not support configuring a next-hop.

That means the only shared Azure Bastion option for VNet-based hubs is to deploy Bastion in the hub and leave all your peering connections visible. Ick!

vWAN-Based Hub

I really like using Azure Virtual WAN (vWAN) for my hub and spoke – and almost none of my customers use it for SD-WAN (the primary use case). The reasons I like it are:

  • It pushes the hub into the platform, reducing administrative efforts
  • Routing becomes something that you push from the hub using eBGP

“Ah – what’s that you say about eBGP, Aidan?”

You can create route tables in Virtual WAN Hubs – let’s call them hub route tables. Then in the properties of a spoke virtual network connection you can configure:

  • Propagation: Have the route table learn routes from the spoke virtual network using eBGP.
  • Association: Share routes from the route table to the spoke virtual network.

And you can put static routes into a hub route table.

My Scenario

A shared Azure Bastion in the spoke of an Azure Virtual WAN hub

When I deploy a Virtual WAN Hub, I choose the Secured Virtual WAN Hub option. This places an Azure Firewall in the hub. I then add a static route for the 0.0.0.0/0 and private IP address spaces to route via the Azure Firewall in the build-in Default hub route table.

All spokes:

  • Propagate to the built-in None hub route table, so the routes of the spoke are forgotten.
  • Associate with the built-in Default hub route table, so they learn the next hop to 0.0.0.0/0 and the private IP address spaces is via the firewall.

I can deploy Azure Bastion into a spoke but this spoke will require a different route configuration. That is because I use a firewall to isolate the spokes. If I had open spoke-to-spoke traffic then Bastion would probably just work. My scenario is actually simple to fix:

  1. Create a new hub route table, maybe called Bastion.
  2. Add a static route to the rest of the hub and spoke (10.1.0.0/16 in my example) with a next-hop of the hub firewall.
  3. Configure the Bastion spoke connection to associate with the Bastion hub route table and propagate to none.

Now:

  • The Bastion will use the firewall as the next hop to all other spokes.
  • Go directly to Internet for the control plane traffic, using the default route in the subnet.
  • Other spokes will have the same route back to the Bastion using the firewall as the next-hop.

Finally, you need to ensure that Firewall and NSG rules allow RDP/SSH from AzureBastionSubnet in the Bastion spoke to the VMs in other spokes. And it works! All an operator/developer/support staff member needs now is:

  • An Azure AD account with read access to the Azure Bastion resource – no need for read to the hub or even the spoke with the VM!
  • The IP address for the machine they want to sign into – my design is limited to Azure VMs but on-premises static routes could be added to the Bastion hub route table.
  • A username and password with login rights to the virtual machine or computer.

Digital Transformation Is Not Just A Tech Thing

In this post, I want to discuss why many businesses fail to get what they expect from The Cloud – why their “digital transformation” (or “cloud transformation”) fails.

Cloud Concerns

We’ve all seen the surveys. CIOs are scared about lots of things when it comes to cloud migration/adoption. Costs might overrun. Security might be insufficient. Skills are in short supply. I’m afraid to say those are all concerns – manageable ones. The big question is rarely discussed until it is too late: what are we really getting into?

Failure Starts In The Middle

Many cloud adoption projects start at the wrong place in the organisation. Instead of an instruction coming with direction and authority from the C-suite (CEO, CTO, CSO, CIO, etc), IT management, typically in Operations, make the decision to go to The Cloud for mundane reasons such as “try to reduce costs” or “avoid doing another hardware upgrade”.

Operations go ahead and build (or request a consultant to deliver) what they know: a centrally managed, locked-down environment. You know what I mean; a rigid environment that complies with old ITIL-style processes from the early 2000s. If a developer needs something, log a ticket, and we’ll get around to it.

Meanwhile, developers hear that The Cloud is coming and they imagine a world where there is self-service and bottomless pits of compute and storage. Oh! And less waiting on tickets logged with Operations. And the C-suite gets a visit from AWS or Microsoft and is told about how agile and disruptive their businesses will become. Super!

The Age-Old Battle Continues

The Cloud landing zone is built and developers are given “access” (if we can call it that) to their new virtual workspace, only to find that they can create limited quantities and sizes of (breath) virtual machines. Modern platforms are nowhere to be found. The ability to create is locked away behind custom permissions roles. If they want something, to do their job for the business, they need to log a ticket and wait – just how is this any different from the old VMware platform they probably ran on before The Cloud?

And the business is digitally transformed. Well, no, not really. Some SAP stuff might be running on Azure hardware and VMs now. And some databases might have been moved to The Cloud. But that’s about it. None of the agility or disruption happens.

What Went Wrong?

The issue began right at the start. I’ll give you an example. A Microsoft account manager (or whatever they are called this financial year), a Microsoft partner, and an IT operations manager have a meeting. This sounds like a bad joke so far, and I promise that no one will laugh. The Microsoft account manager offers X dollars to perform an assessment. Someone will come in, scan the VMware machines, write a report that says “here is the TCO comparison” and “you’d be silly not to get started now”. So the Operations manager gives the go-ahead and the direction of what to do.

That contrasts greatly with the Microsoft Cloud Adoption Framework (CAF). Phase 1 (Strategy) of the CAF is all about getting business motivations from the C-suite that can be shaped into the direction in phase 2 (Plan) where the tech stuff begins – including the (digital estate) assessment. And the Plan phase is all about people and process. Ahh – people (skills) and process (change). This is where digital transformation begins. We haven’t even gotten to the part where things are deployed (phase 3, Ready) in The Cloud yet because we don’t know what “shape” the organisation will be until business motivations tell the IT departments what is expected from The Cloud.

The Microsoft Azure Cloud Adoption Framework

So What Is Digital Transformation?

In short, digital transformation should change:

  1. Process: Legacy methods of creating & running IT systems may not suite the business, especially if self-service, agility, and disruption and required. Look at the organisations that have shaken up different verticals in different industries and service sectors. Their IT departments are very different to the vertical pillars that you may recognise in your organisation. The order to change came from the top where the authority resides.
  2. People: How people are organised may need to change. The skills they possess must change. Roles must be identified and skills must be developed before the project, not during or even after some handover phase from a consulting company. A budget and time for training, possibly even a budget to recruit additional bodies requires authority that can only come from the top.
  3. Technology: It’s easier to change technology than to change process and people. The problem is what to change it to. The shape of the technology must reflect the people and process patterns, otherwise it is not fit for purpose.

Az Module Scripts in GitHub Actions

In this post, I will show how to run Azure Az module scripts as tasks in a GitHub action workflow. Working examples can be found in my GitHub AzireFirewall/DevSecOps repo which is the content for my DevSecOps articles.

Credit to my old colleague Alan Kinane for getting me started with his post, Azure CI/CD with ARM templates and GitHub Actions.

Why Use A Script?

One can do some simple deployment tasks, but a script offers some advantages:

  • A task that runs a deployment
  • A simple PowerShell/Azure CLI task that runs an inline script

But you might want something that does more. For example, you might want to do some error checking. Or maybe you are going to use a custom container and execute complex tasks from it. In my case, I wanted to do lots of error checking and give myself the ability to wrap scripts around my deployments.

My Example

For this post, I will use the hub deployment from my GitHub AzireFirewall/DevSecOps repo – this deploys a VNet-based (legacy) hub in an Azure hub & spoke architecture.. There are a number of things you are going to need. I’ve just rorganised and updated the main branch to support both Azure DevOps pipelines (/.pipelines) and GitHub actions (/.github).

Afterwards, I will explain how the action workflow calls the PowerShell script.

GitHub Repo

Set up a repository in GitHub. Copy the required files into the repo. In my example, there are four folders:

  • platform: This contains the files to deploy the hub in an Azure subscription. In my example, you will find bicep files with JSON parameter files.
  • scripts: This folder contains scripts used in the deployment. In my example, deploy.ps1 is a generic script that will deploy an ARM/Bicep template to a selected subscription/resource group.
  • .github/workflows: This is where you will find YAML files that create workflows in GitHub actions. Any valid file will automatically create a workflow when there is a sucessful merge. My example contains hub.yaml to execute the script, deploy.ps1.
  • .pipelines: This contains the files to deploy the code. In my example, you will find a YAML file for a DevOps pipeline called hub.yaml that will execute the script, deploy.ps1.

You can upload the files into the repo or sync using Git/VS Code.

Azure AD App Registration (Service Principal or SPN)

You will require an App Registration; this will be used by the GitHub workflow to gain authorised access to the Azure subscription.

Create an App Registation in Azure AD. Create a secret and store that secret (Azure Key Vault is a good location) because you will not be able to see the secret after creation. Grant the App Registration Owner rights to the Azure subscription (as in my example) or to the resource groups if you prefer that sort of deployment.

Repo Secret

One of the features that I like a lot in GitHub is that you can store secrets at different levels (organisation, repo, environment). In my example, I am storing the secrets for using the the Service Principal in the repo, making it available to the workflow(s) that are in this repo only.

Open the repo, browse to Settings, and then go to Secrets. Create a new Repository Secret called AZURE_CREDENTIALS with the following struture:

{
  "tenantId": "<GUID>",
  "subscriptionId": "<GUID>",
  "clientId": "<GUID>",
  "clientSecret": "<GUID>"
}

Create the Workflow

GitHub makes this task easy. You can into your Repo>Actions and create a workflow from a template. Or if you have a YAML file that defines your workflow, you can place it into your repo in /.github/workflows. When that change is merged, GitHub will automatically create a workflow for you.

Tip: To delete a workflow, rename or delete the associated file and merge the change.

Logging In

The workflow must be able to sign into Azure. There are plenty of examples out there but you need to accomplish 3 things:

  1. Log into Azure
  2. Enable the Az modules

The following code in the workflow accomplishes those three tasks:

      - name: Login via Az module
        uses: azure/login@v1
        with:
          creds: ${{secrets.AZURE_CREDENTIALS}}
          enable-AzPSSession: true 

The line uses: azure/login@v1 enables and Azure sign-in. Note that the with statement selects the credentials that we previously added to the repo.

The line enable-AzPSSession: true enables the Az PowerShell modules. With that, we are signed in and have all we need to execute Azure PowerShell cmdlets.

Executing A PowerShell Script

A workflow has a section called jobs; in here, you can create multiple jobs that share something in common, such as a login. Each job is made up of steps. Steps perform actions such as checking out code from the repo, logging into Azure, and executing a deployment task (running a PowerShell script, for example).

I can create a PowerShell script that does lots of cool things and store it in my repo. That script can be edited and managed by change control (pull requests) just like my code that I’m deploying. There is an example of this below:

The PowerShell script step running in a GitHub action
      - name: Deploy Hub
        uses: azure/powershell@v1
        with:
          inlineScript: |
            .\scripts\deploy.ps1 -subscriptionId "${{env.hubSub}}" -resourceGroupName "${{env.hubNetworkResourceGroupName}}" -location "${{env.location}}" -deployment "hub" -templateFile './platform/hub.bicep' -templateParameterFile './platform/hub-parameters.json'
          azPSVersion: "latest"

The task is running “PowerShell v1” which will allow us to run an inline script using the sign-in that was previously created in the job (the login step). The configuration of this step is really simple – you specify the PowerShell cmdlets to run. In my example, it executes the PowerShell script and passes in the parameter values, some of which are stored as environment variables at the workflow level.

My script is pretty generic. I can have:

  • Multiple Bicep files/JSON parameter files
  • Multiple target scopes

I can create a PowerShell step for each deployment and use the parameters to specialise the execution of the script.

Running multiple PowerShell script tasks in a GitHub workflow

The PowerShell Script

The full code for the script can be found here. I’m going to focus on a few little things:

The Parameters

You can see in the above example that I passed in several parameters:

  • subscriptionId: The ID of the subscription to deploy the code to. This does not have to be the same as the default subscription specified in the Service Connction. The Service Principal used by the pipeline must have the required permissions in this subcsription.
  • resourceGroupName: The name of the resource group that the deployment will go into. My script will create the resource group if required.
  • location: The Azure region of the resource group.
  • deploymentName: The name of the ARM deployment that will be created in the resource group for the deployment (remember that Bicep deployments become ARM deployments).
  • templateFile: The path to the template file in the pipeline container.
  • templateParameterFile: The path to the parameter file for the template in the pipeline container.

Each of those parameters is identically named in param () at the start of the PowerShell script and those values specialise the execution of the generic script.

Outputs

You can use Write-Host to output a value from the script to appear in the console of the running job. If you add -ForegroundColor then you can make certain messages, such as errors or warnings, stand out.

Beware of Manual Inputs

Some PowerShell commands might want a manual input. This is not supported in a pipeline and will terminate the pipeline with an error. Test for this happening and use code logic wrapped around your cmdlets to prevent it from happening – this is why a file-based script is better than a simple/short inline script, even to handle a situation like creating a resource group.

Try/Catch

Error handling is a big deal in a hands-off script. You will find that 90% of my script is checking for things and dealing with unwanted scenarios that can happen. A simple example is a resource group.

An ARM deployment (remember this includes Bicep) must go into a resource group. You can just go ahead and write the one-liner to create a resource group. But what happens when you update the code, the script re-runs and sees the resource group is already there? In that scenario, a manual input will appear (and fail the pipeline) to confirm that you want to continue. So I have an elaborate test/to process:

if (!(Get-AzResourceGroup $resourceGroupName -ErrorAction SilentlyContinue))
{ 
  try 
  {
    # The resource group does not exist so create it
    Write-Host "Creating the $resourceGroupName resource group"
    New-AzResourceGroup -Name $resourceGroupName -Location $location -ErrorAction SilentlyContinue
  }
  catch 
  {
    # There was an error creating the resoruce group
    Write-Host "There was an error creaating the $resourceGroupName resource group" -ForegroundColor Red
    Break
  }
}
else
{
  # The resoruce group already exists so there is nothing to do
  Write-Host "The $resourceGroupName resource group already exists"
}

Conclusion

Once you know how to do it, executing a script in your pipeline is easy. Then your PowerShell knowledge can take over and your deployments can become more flexible and more powerful. My example executes ARM/Bicep deployments. Yours could do a PowerShell deployment, add scripted configurations to a template deployment, or even run another language like Terraform. The real thing to understand is that now you have a larger scripting toolset available to your automated deployments.

Az Module Scripts in DevOps Pipelines

In this post, I will show how to run Azure Az module scripts as tasks in an Azure DevOps pipeline. Working examples can be found in my GitHub AzireFirewall/DevSecOps repo which is the content for my DevSecOps articles.

Why Use A Script?

One can use simple deployment tasks in a DevOps pipeline:

  • A task that runs a deployment
  • A simple PowerShell/Azure CLI task that runs an inline script

But you might want something that does more. For example, you might want to do some error checking. Or maybe you are going to use a custom container (Azure Container Registry) and execute complex tasks from it. In my case, I wanted to do lots of error checking and give myself the ability to wrap scripts around my deployments.

The First Obstacle: Documentation

Azure DevOps documentation is notorious for being:

  • Bad
  • Out of date
  • Hard to find
  • Incomplete

The article you need to get started on using PowerShell can be found here. There is a Hello World example that shows how to pass in two parameters to a PowerShell script. I used that as the basis of my deployment – but it is not enough! I will fix that here.

The Second Obstacle: Examples

The DevOps world is very much a closed box. There’s lots of people doing stuff, but finding working examples is a nightmare. Once again, I will fix that here. The goal is to:

  1. Store your code in an Azure DevOps repo
  2. Create an Azure DevOps pipeline to deploy that code to Azure. It will authorised against the Azure subscription (or resource groups) using an App Registration that is stored in DevOps as a Service Connection.
  3. The pipeline will execute a PowerShell script to deploy code from the DevOps repo into your subscription (or resource groups).

My Example

For this post, I will use the hub deployment from my GitHub AzireFirewall/DevSecOps repo – this deploys a VNet-based (legacy) hub in an Azure hub & spoke architecture.. There are a number of things you are going to need.

Afterwards, I will explain how the pipeline calls the PowerShell script.

DevOps Repo

Set up a repository in Azure DevOps. Copy the required files into the repo. In my example, there are two folders:

  • platform: This contains the files to deploy the hub in an Azure subscription. In my example, you will find bicep files with JSON parameter files.
  • scripts: This folder contains scripts used in the deployment. In my example, deploy.ps1 is a generic script that will deploy an ARM/Bicep template to a selected subscription/resource group.
  • .pipelines: This contains the files to deploy the code. In my example, you will find a YAML file for a DevOps pipeline called hub.yaml that will execute the script, deploy.ps1.
  • .github/workflows: This is where you will find YAML files that create workflows in GitHub actions. Any valid file will automatically create a workflow when there is a sucessful merge. My example contains hub.yaml to execute the script, deploy.ps1.

You can upload the files into the repo or sync using Git/VS Code.

Azure AD App Registration (Service Principal or SPN)

You will require an App Registration; this will be used by the Azure DevOps pipeline to gain authorised access to the Azure subscription.

Create an App Registation in Azure AD. Create a secret and store that secret (Azure Key Vault is a good location) because you will not be able to see the secret after creation. Grant the App Registration Owner rights to the Azure subscription (as in my example) or to the resource groups if you prefer that sort of deployment.

Service Connection

In your Azure DevOps project, browse to Project Settings > Service Connections. Create a new Service Connection of the Azure Resource Manager type. Select Service Principal (Manual) and enter the required details:

  • Subscription ID: The ID of the subscription that will be the default subscription when the Service Principal signs in.
  • Subscription Name: The name of the subscription that will be the default subscription when the Service Principal signs in.
  • Service Principal ID: The Client ID of the App Registration (see it’s Overview page).
  • Service Principal Key: The secret of the App Registration that you should have saved.
  • Tenant ID: The ID of the Azure AD tenant. You can get this from the Overview of the App Registration.
  • Service Connection Name: The name of the Service Connection; a naming standard helps here. For example, I name it after the scope of the deployment (the target subscription name). Remember this name because it will be used by the YAML file (a value called
    azureSubscription) to create the pipeline. In my example, the service connection is called “hub”.

Hit Verify And Save – DevOps will verify that the Service Principal can sign in. You should double-check that it has rights over your desired scope in Azure (subscription in my example).

Create the Pipeline

In my example, the hard work is done. A YAML file defines the pipeline; you just need to import the YAML file to create the pipeline.

Go back to your DevOps project, browse to Pipelines, and click New Pipeline. Choose Azure Repos Git as the code location. Select your repo and choose Existing Azure Pipelines YAML File. Use the dropdown list box to select the YAML file – /.pipelines/devops-hub.yml in my case. Save the pipeline and it will run. If you go into the running job you should see a prompt, asking you to authorise the pipeline to use the “hub” service connection.

 

The pipeline will execute a task, that in turn, will run a PowerShell script. That PowerShell script takes in several parameters that tell the script what to deploy (bicep and parameter files), where to deploy it from (the temporary location where the files are downloaded into the pipeline container), and where to deploy it to (the subscription/resource group).

Executing A PowerShell Script

A pipeline has a section called steps; in here, you create a task for each job that you want to run. For example, I can execute an Azure CLI task, a PowerShell task that runs one/a few lines of inline code, or a PowerShell task that executes a PowerShell script from a file. It’s that last one that is interesting.

I can create a PowerShell script that does lots of cool things and store it in my repo. That script can be edited and managed by change control (pull requests) just like my code that I’m deploying. There is an example of this below:

The PowerShell script job running in Azure DevOps
steps:

# Deploy Hub

- task: AzurePowerShell@5
  inputs:
    azureSubscription: $(azureServiceConnection)
    ScriptType: 'FilePath'
    ScriptPath: $(System.DefaultWorkingDirectory)/.pipelines/deploy.ps1
    ScriptArguments: > # Use this to avoid newline characters in multiline string
      -subscriptionId $(hubSub)
      -resourceGroupName $(hubNetworkResourceGroupName)
      -location $(location)
      -deployment "hub"
      -templateFile $(System.DefaultWorkingDirectory)"/platform/hub.bicep"
      -templateParameterFile $(System.DefaultWorkingDirectory)"/platform/hub-parameters.json"
    azurePowerShellVersion: 'LatestVersion'
  displayName: 'Deploy Hub'

The tasks is running “PowerShell v5” (see AzurePowerShell@5). That’s an important thing to note. The Microsoft documentation for running PowerShell shows PowerShell v2, and that does not support the Az modules, which is pretty pointless! PowerShell v4 added the Az modules.

The azureSubscription value refers to the Service Connection that we created earlier, authorising the pipeline against the desired target scope.

ScriptType is set to FilePath (not inline) so I can run a PowerShell script from a file. That requires me to use ScriptPath to define where the script is.

When the pipeline runs, it is executed in a container (defined earlier in the YAML file as ubuntu-latest in my example, Linux to speed things up). The files in the repo are downloaded to a working folder. That location is saved as $(System.DefaultWorkingDirectory). I can then add the relative location of the PowerShell script from the repo ( /.pipelines/deploy.ps1 ) to that path so the pipeline can find the script in the container.

My script is pretty generic. I can have:

  • Multiple Bicep files/JSON parameter files
  • Multiple target scopes

I can create a PowerShell task for each deployment and use the parameters to specialise the execution of the script.

Running multiple PowerShell script tasks in a DevOps pipeline

We wrap up the task by specifying the PowerShell version to use and the display name for the task in the DevOps job console.

The PowerShell Script

The full code for the script can be found here. I’m going to focus on a few little things:

The Parameters

You can see in the above example that I passed in several parameters:

  • subscriptionId: The ID of the subscription to deploy the code to. This does not have to be the same as the default subscription specified in the Service Connction. The Service Principal used by the pipeline must have the required permissions in this subcsription.
  • resourceGroupName: The name of the resource group that the deployment will go into. My script will create the resource group if required.
  • location: The Azure region of the resource group.
  • deploymentName: The name of the ARM deployment that will be created in the resource group for the deployment (remember that Bicep deployments become ARM deployments).
  • templateFile: The path to the template file in the pipeline container.
  • templateParameterFile: The path to the parameter file for the template in the pipeline container.

Each of those parameters is identically named in param () at the start of the PowerShell script and those values specialise the execution of the generic script.

Outputs

You can use Write-Host to output a value from the script to appear in the console of the running job. If you add -ForegroundColor then you can make certain messages, such as errors or warnings, stand out.

Beware of Manual Inputs

Some PowerShell commands might want a manual input. This is not supported in a pipeline and will terminate the pipeline with an error. Test for this happening and use code logic wrapped around your cmdlets to prevent it from happening – this is why a file-based script is better than a simple/short inline script, even to handle a situation like creating a resource group.

Try/Catch

Error handling is a big deal in a hands-off script. You will find that 90% of my script is checking for things and dealing with unwanted scenarios that can happen. A simple example is a resource group.

An ARM deployment (remember this includes Bicep) must go into a resource group. You can just go ahead and write the one-liner to create a resource group. But what happens when you update the code, the script re-runs and sees the resource group is already there? In that scenario, a manual input will appear (and fail the pipeline) to confirm that you want to continue. So I have an elaborate test/to process:

if (!(Get-AzResourceGroup $resourceGroupName -ErrorAction SilentlyContinue))
{ 
  try 
  {
    # The resource group does not exist so create it
    Write-Host "Creating the $resourceGroupName resource group"
    New-AzResourceGroup -Name $resourceGroupName -Location $location -ErrorAction SilentlyContinue
  }
  catch 
  {
    # There was an error creating the resoruce group
    Write-Host "There was an error creaating the $resourceGroupName resource group" -ForegroundColor Red
    Break
  }
}
else
{
  # The resoruce group already exists so there is nothing to do
  Write-Host "The $resourceGroupName resource group already exists"
}

Conclusion

Once you know how to do it, executing a script in your pipeline is easy. Then your PowerShell knowledge can take over and your deployments can become more flexible and more powerful. My example executes ARM/Bicep deployments. Yours could do a PowerShell deployment, add scripted configurations to a template deployment, or even run another language like Terraform. The real thing to understand is that now you have a larger scripting toolset available to your automated deployments.

Pros & Cons: IaC Modules

This post will discuss the pros & cons of creating & using Infrastructure-as-Code/IaC Modules – based on 2 years of experience in creating and using a modular approach.

Why Modules?

Anyone who has done just a little bit of template work knows that ARM templates can get quickly get too big. Even a simple deployment, like a hub & spoke network architecture, can quickly expand out to several hundred lines without very much being added. Heck, when Microsoft first released the Cloud Adoption Framework “Enterprise Scale” example architecture, one of the ARM/JSON files was over 20,000 lines long!

The length of a template file can cause so many issues, including but definitely not limited to:

  • It becomes hard to find anything
  • Big code becomes hard code to update – one change has many unintended repercussions
  • Collaboration becomes near impossible
  • Agility is lost

One of the pain points that really annoyed one of my colleagues is that “big code” usually becomes non-standardised code; that becomes a big issue when a “service organisation” is supporting multiple clients (consulting company, managed services, Operations, or cloud centre of excellence).

Modularisation

The idea of modularisation is that commonly written code is written once as a module. That module is then referred to by other code whenever the functions of the module are required. This is nothing new – the concept of an “include” or “DLL” is very old in the computing world.

For example, I can create a Bicep/ARM/Terraform module for an Azure App Service. My module can deploy an App Service the way that I believe is correct for my “clients” and colleagues. It might even build some governance in, such as a naming standard, by automating the naming of the new resource based on some agreed naming pattern. Any customisations for the resource will be passed in as parameters, and any required values for inter-module dependencies can be passed out as outputs.

Quickly I can build out a library of modules, each deploying different resource types – now I have a module library. All I need now is code to call the modules, model dependencies, pass in parameters, and take outputs from one module and pass them in as parameters to others.

The Benefits

Quickly, the benefits appear:

  • You write less code because the code is written once and you reuse it.
  • Code is standardised. You can go from one workload to another, or one client to another, and you know how the code works.
  • Governance is built into the code. Things like naming standards are taken out of the hands of the human and written as code.
  • You have the potential to tap into new Azure features such as Template Specs.
  • Smaller code is easier to troubleshoot.
  • Breaking your code into smaller modules makes collaboration easier.

The Issues

Most of the issues are related to the fact that you have now built a software product that must be versioned and maintained. Few of us outside the development world have the know-how to do this. And quite frankly, the work is time-consuming and detracts from the work that we should be doing.

  • No matter how well you write a module, it will always require updates. There is always a new feature or a previously unknown use case that requires new code in the module.
  • New code means new versions. No matter how well you plan, new versions will change how parameters are used and will introduce breaking changes with some or all previous usage of the module.
  • Trying to create a one-size-fits-all module is hard. Azure App Services are a perfect example because there are dozens if not hundreds of different configuration options. Your code will become long.
  • The code length is compounded by code complexity. Many values require some sort of input, such as NULL. Quickly you will have if-then-elses all over your code.
  • You will have to create a code release and versioning system that must be maintained. These are skills that Ops people typically do not have.
  • Changes to code will now be slowed down. If a project needs a previously unwritten module/feature, the new code cannot be used until it goes through the software release mechanism. Now you have lost one of the key features of The Cloud: agility.

So What Is Right?

The answer is, I do not know. I know that “big code” without some optimisation is not the way forward. I think the type of micro-modularisation (one module per resource type) that we normally think of when “IaC Modules” is mentioned doesn’t work either.

One of the reasons that I’ve been working on and writing about Bicep/Azure Firewall/DevSecOps recently is to experiment with things such as the concept of modularisation. I am starting to think that, yes, the modularisation concept is what we need, but how we have implemented the module is wrong.

My biggest concern with the micro-module approach is that it actually slowed me down. I ended up spending more time trying to get the modules to run cleanly than I would have if I’d just written the code myself.

Maybe the module should be a smaller piece of code, but it shouldn’t be a read-only piece of code. Maybe it should be an example that I can take and modify to my own requirements. That’s the approach that I have used in my DevSecOps project. My Bicep code is written into smaller files, each handling a subset of the tasks. That code could easily be shared in a reference library by a “cloud centre of excellence” and a “standard workload” repo could be made available as a starting point for new projects.

Please share below if you have any thoughts on the matter.

Azure Firewall DevSecOps in Azure DevOps

In this post, I will share the details for granting the least-privilege permissions to GitHub action/DevOps pipeline service principals for a DevSecOps continuous deployment of Azure Firewall.

Quick Refresh

I wrote about the design of the solution and shared the code in my post, Enabling DevSecOps with Azure Firewall. There I explained how you could break out the code for the rules of a workload and manage that code in the repo for the workload. Realistically, you would also need to break out the gateway subnet route table user-defined route (legacy VNet-based hub) and the VNet peering connection. All the code for this is shared on GitHub – I did update the repo with some structure and with working DevOps pipelines.

This Update

There were two things I wanted to add to the design:

  • Detailed permissions for the service principal used by the workload DevOps pipeline, limiting the scope of change that is possible in the hub.
  • DevOps pipelines so I could test the above.

The Code

You’ll find 3 folders in the Bicep code now:

  • hub: This deploys a (legacy) VNet-based hub with Azure Firewall.
  • customRoles: 4 Azure custom roles are defined. This should be deployed after the hub.
  • spoke1: This contains the code to deploy a skeleton VNet-based (spoke) workload with updates that are required in the hub to connect the VNet and route ingress on-prem traffic through the firewall.

DevOps Pipelines

The hub and spoke1 folders each contain a folder called .pipelines. There you will find a .yml file to create a DevOps pipeline.

The DevOps pipeline uses Azure CLI tasks to:

  • Select the correct Azure subscription & create the resource group
  • Deploy each .bicep file.

My design uses 1 sub for the hub and 1 sub for the workload. You are not glued to this bu you would need to make modifications to how you configure the service principal permissions (below).

To use the code:

  1. Create a repo in DevOps for (1 repo) hub and for (1 repo) spoke1 and copy in the required code.
  2. Create service principals in Azure AD.
  3. Grant the service principal for hub owner rights to the hub subscription.
  4. Grant the service principal for the spoke owner rights to the spoke subscription.
  5. Create ARM service connections in DevOps settings that use the service principals. Note that the names for these service connections are referred to by azureServiceConnection in the pipeline files.
  6. Update the variables in the pipeline files with subscription IDs.
  7. Create the pipelines using the .yml files in the repos.

Don’t do anything just yet!

Service Principal Permissions

The hub service principal is simple – grant it owner rights to the hub subscription (or resource group).

The workload is where the magic happens with this DevSecOps design. The workload updates the hub suing code in the workload repo that affects the workload:

  • Ingress route from on-prem to the workload in the hub GatewaySubnet.
  • The firewall rules for the workload in the hub Azure Firewall (policy) using a rules collection group.
  • The VNet peering connection between the hub VNet and the workload VNet.

That could be deployed by the workload DevOps pipeline that is authenticated using the workload’s service principal. So that means the workload service principal must have rights over the hub.

The quick solution would be to grant contributor rights over the hub and say “we’ll manage what is done through code reviews”. However, a better practice is to limit what can be done as much as possible. That’s what I have done with the customRoles folder in my GitHub share.

Those custom roles should be modified to change the possible scope to the subscription ID (or even the resource group ID) of the hub deployment. There are 4 custom roles:

  • customRole-ArmValidateActionOperator.json: Adds the CUSTOM – ARM Deployment Operator role, allowing the ARM deployment to be monitored and updated.
  • customRole-PeeringAdmin.json: Adds the CUSTOM – Virtual Network Peering Administrator role, allowing a VNet peering connection to be created from the hub VNet.
  • customRole-RoutesAdmin.json: Adds the CUSTOM – Azure Route Table Routes Administrator role, allowing a route to be added to the GatewaySubnet route table.
  • customRole-RuleCollectionGroupsAdmin.json: Adds the CUSTOM – Azure Firewall Policy Rule Collection Group Administrator role, allowing a rules collection group to be added to an Azure Firewall Policy.

Deploy The Hub

The hub is deployed first – this is required to grant the permissions that are required by the workload’s service principal.

Grant Rights To Workload Service Principals

The service principals for all workloads will be added to an Azure AD group (Workloads Pipeline Service Principals in the above diagram). That group is nested into 4 other AAD security groups:

  • Resource Group ARM Operations: This is granted the CUSTOM – ARM Deployment Operator role on the hub resource group.
  • Hub Firewall Policy: This is granted the CUSTOM – Azure Firewall Policy Rule Collection Group Administrator role on the Azure Firewalll Policy that is associated with the hub Azure Firewall.
  • Hub Routes: This is granted the CUSTOM – Azure Route Table Routes Administrator role on the GattewaySubnet route table.
  • Hub Peering: This is granted the CUSTOM – Virtual Network Peering Administrator role on the hub virtual network.

Deploy The Workload

The workload now has the required permissions to deploy the workload and make modifications in the hub to connect the hub to the outside world.

Enabling DevSecOps with Azure Firewall

In this post, I will share how you can implement DevSecOps with Azure Firewall, with links to a bunch of working Bicep files to deploy the infrastructure-as-code (IaC) templates.

This example uses a “legacy” hub and spoke – one where the hub is VNet-based and not based on Azure Virtual WAN Hub. I’ll try to find some time to work on the code for that one.

The Concept

Hold on, because there’s a bunch of things to understand!

DevSecOps

The DevSecOps methodology is more than just IaC. It’s a combination of people, processes, and technology to enable a fail-fast agile delivery of workloads/applications to the business. I discussed here how DevSecOps can be used to remove the friction of IT to deliver on the promises of the Cloud.

The Azure features that this design is based on are discussed in concept here. The idea is that we want to enable Devs/Ops/Security to manage firewall rules in the workload’s Git repository (repo). This breaks the traditional model where the rules are located in a central location. The important thing is not the location of the rules, but the processes that manage the rules (change control through Git repo pull request reviews) and who (the reviewers, including the architects, firewall admins, security admins, etc).

So what we are doing is taking the firewall rules for the workload and placing them in with the workload’s code. NSG rules are probably already there. Now, we’re putting the Azure Firewall rules for the workload in the workload repo too. This is all made possible thanks to changes that were made to Azure Firewall Policy (Azure Firewall Manager) Rules Collection Groups – I use one Rules Collection Group for each workload and all the rules that enable that workload are placed in that Rules Collection Group. No changes will make it to the trunk branch (deployment action/pipelines look for changes here to trigger a deployment) without approval by all the necessary parties – this means that the firewall admins are still in control, but they don’t necessarily need to write the rules themselves … and the devs/operators might even write the rules, subject to review!

This is the killer reason to choose Azure Firewall over NVAs – the ability to not only deploy the firewall resource, but to manage the entire configuration and rule sets as code, and to break that all out in a controlled way to make the enterprise more agile.

Other Bits

If you’ve read my posts on Azure routing (How to Troubleshoot Azure Routing? and BGP with Microsoft Azure Virtual Networks & Firewalls) then you’ll understand that there’s more going on than just firewall rules. Packets won’t magically flow through your firewall just because it’s in the middle of your diagram!

The spoke or workload will also need to deploy:

  • A peering connection to the hub, enabling connectivity with the hub and the firewall. All traffic leaving the spoke will route through the firewall thanks to a user-defined route in the spoke subnet route table. Peering is a two-way connection. The workload will include some bicep to deploy the spoke-hub and the hub-spoke connections.
  • A route for the GatewaySubnet route table in the hub. This is required to route traffic to the spoke address prefix(es) through the Azure Firewall so on-premises>spoke traffic is correctly inspected and filtered by the firewall.

The IaC

In this section, I’ll explain the code layout and placement.

My Code

You can find my public repo, containing all the Bicep code here. Please feel free to download and use.

The Git Repo Design

You will have two Git repos:

  1. The first repo is for the hub. This repo will contain the code for the hub, including:
    • The hub VNet.
    • The Hub VNet Gateway.
    • The GatewaySubnet Route Table.
    • The Azure Firewall.
    • The Azure Firewall Policy that manages the Azure Firewall.
  2. The second repo is for the spoke. This skeleton example workload contains:

Action/Pipeline Permissions

I have written a more detailed update on this section, which can be found here

Each Git repo needs to authenticate with Azure to deploy/modify resources. Each repo should have a service principal in Azure AD. That service principal will be used to authenticate the deployment, executed by a GitHub action or a DevOps pipeline. You should restrict what rights the service principal will require. I haven’t worked out the exact minimum permissions, but the high-level requirements are documented below:

 

Trunk Branch Protection &  Pull Request

Some of you might be worried now – what’s to stop a developer/operator working on Workload A from accidentally creating rules that affect Workload X?

This is exactly why you implement standard practices on the Git repos:

  • Protect the Trunk branch: This means that no one can just update the version of the code that is deployed to your firewall or hub. If you want to create an updated, you have to create a branch of the trunk, make your edits in that trunk, and submit the changes to be merged into trunk as a pull request.
  • Enable pull request reviews: Select a panel of people that will review changes that are submitted as pull requests to the trunk. In our scenario, this should include the firewall admin(s), security admin(s), network admin(s), and maybe the platform & workload architects.

Now, I can only submit a suggested set of rules (and route/peering) changes that must be approved by the necessary people. I can still create my code without delay, but a change control and rollback process has taken control. Obviously, this means that there should be SLAs on the review/approval process and guidance on pull request, approval, and rejection actions.

And There You Have It

Now you have the design and the Bicep code to enable DevSecOps with Azure Firewall.

DevSecOps Resolving IT Friction In The Cloud

In this post, I’m going to discuss how to solve an age-old problem that still hurts us in The Cloud with DevSecOps: the on-going friction between devs and ops and how the adoption of the cloud is making this worse.

Us Versus Them

Let me say this first: when I worked as a sys admin, I was a “b*st*rd operator from hell”. I locked things down as tight as I could for security and to control supportability. And as you can imagine, I had lots of fans in the development teams – not!

Ops and devs have traditionally disliked each other. Ops build the servers perfectly. Devs write awesome code. But when something goes wrong:

  • Their servers are too slow
  • Their architecture/code is rubbish

Along Came a Cloud

The cloud was meant to change things. And in some ways, it did. In the early days, when AWS was “the cloud”, devs got a credit card from somewhere and started building. The rush of freedom and bottomless resources oxygenated their creativity and they build and deployed like they were locked in a Lego shop for the weekend.

Eventually, the sober-minded Ops, Security, and Compliance folks observed what was happening and decided to pull the reigns back. A “landing zone” was built in The Cloud (now Azure and others are in play) and governance was put in place.

What was delivered in that landing zone? A representation of the on-premises data center that the devs were trying to escape from. Now they are told to work in this locked-down environment and the devs are suddenly slowed down and restricted. Change control, support tickets, and a default answer from Ops of “no” means that agility and innovation die.

But here’s the thing – the technology was a restricting factor when working on-premises: physical hardware means and 100% IaaS means that Ops need to deliver every part of the platform. In the cloud, technology wasn’t the cause of the issue. The Cloud started with self-service, all-you-can-eat capacity, and agility. And then traditional lockdowns were put in place.

Business Dissatisfaction

A good salesperson might have said that there can be cost optimisations but cost savings should not be a primary motivation to go with the cloud. Real rewards come from agility, which leads to innovation. The ability to build fast, see if it works, develop it if it does, dump it if it doesn’t, and not commit huge budgets to failed efforts is huge to a business. When Ops locks down The Cloud, some of the best features of The Cloud are lost. And then the business is unhappy – there were costly migration projects, actual IT spend might have increased, and they didn’t get what they wanted – IT failed again.

By the way, this is something we (me and my colleagues at work) have started to see as a trend with mid-large organisations that have made the move to Azure. The technology isn’t failing them – people and processes are.

People & Processes

Technology has a role to play but we can probably guesstimate that it’s about 20% of the solution. People and processes must evolve to use The Cloud effectively. But those things are overlooked.

Microsoft’s Cloud Adoption Framework (CAF) recognises this – the first half of the CAF is all about the soft side of things:

The CAF starts out by analysing the business wants from The Cloud. You cannot shape anything IT-wise without instruction from above. What does the business want? Do you know who you should not ask? The IT Manager – they want what IT wants. To complete the strategy definition, you need to get to the owners/C-level folks in the business – getting time with them is hard! Once you have a vision from the business you can start looking at how to organise the people and set up the processes.

Organisational Failure

Think about the structure of IT. There is an Ops team/department with a lead. That group of people has pillars of expertise in a mid-large organisation:

  • The Windows team
  • Linux
  • Networking
  • SAN
  • And so on

Even those people don’t work well in collaboration. There is also a Dev department that is made up of many teams (workloads) that may even have their own pillars of expertise – some/many of those are externals. There is no alignment or collaboration between all the parties involved in building, running, and continuously improving a workload.

DevOps

DevOps is a methodology that brings Ops and Devs together in actual or virtual teams for each workload. For example, let’s say that a workload requires the following skills from many teams/departments:

  • .NET developers
  • Application architect
  • Infrastructure architect
  • Azure operators

That might be skills from 4 teams. But in DevSecOps, the workload defines a virtual or actual team of people that will work on that application and its underlying infrastructure together. The application and infrastructure architects will design together. The devs and ops skills will work together to produce the code that will create the underlying platform (PaaS and/or IaaS) that will be continuously developed/improved/deployed using GitHub/DevOps actions/pipelines.

Agile methodologies will be brought into plan:

  • Work through epics, user stories, features and tasks (backlog)
  • That are scheduled to sprints (kanban board)
  • And are assigned to/pulled by members of the DevOps team (resource planning)

What has been accomplished? Now a team works together. They have a single vision through a united team. They share a plan and communicate through daily standup meetings and modern tooling such as Teams. By working as one, they can produce code fast. And that means they can fail fast:

  • Produce a minimally viable product
  • Test if it works
  • If it does, improve on it in sprints
  • If it doesn’t, tear it down quickly with minimal money lost

DevSecOps

In The Cloud, modern workloads are presented to clients over the Internet using TLS. The edge means that there is a security role. And in a good design, micro-segmentation is required, which means an expanded security role. And considering the nature of threats today, the security role should have some developer skills to analyse code and runtimes for security vulnerabilities.

If we don’t change how the security role is done then it can undo everything that DevOps accomplishes – all of a sudden a default “no” appears, halting all the progress towards agility and innovation.

DevSecOps adds the security role to DevOps. Now security personnel is a part of the workload’s team. They will be a part of the design process. They will be the ones that either implement in code and/or review firewall rules in the pull request. Elements of security are moved from a central location out to the repos for the workloads – the result is that the what and who don’t change; all that changes is the where.

Influence

Introducing the sort of changes that DevSecOps will require is not going to be easy or quick. We can do the tech pieces in Azure pretty easily, actually, but the people might resist and the processes won’t exist in the organising. Introducing change will be hard and it will be resisted. That’s why the process must be lead from the C-level.

Got Something To Add?

What do you think? Please comment below.