Terraform

Azure Infrastructure Announcements – September 2023

September is a month of storms. There appears to have been lots of activity in the Azure cloud last month too. Everyone working on Azure should pay attention to the PAY ATTENTION! section.

PAY ATTENTION!

Default outbound access for VMs in Azure will be retired— transition to a new method of internet access

On 30 September 2025, default outbound access connectivity for virtual machines in Azure will be retired. After this date, all new VMs that require internet access will need to use explicit outbound connectivity methods such as Azure NAT Gateway, Azure Load Balancer outbound rules, or a directly attached Azure public IP address.

There will be more communications on this from Microsoft. But this is more than a “don’t worry about your existing VMs” situation. What happens when you add more VMs to an existing old network? What happens when you do a restore? What happens when you do an Azure Site Recovery failover? Those are all new VMs in old networks and they are affected. Everyone should do some work to see if they are affected and prepare remediations in advance – not on the day when they are stressed out by a restore or a Black Friday expansion.

App Service Environment version 1 and version 2 will be retired on 31 August 2024

After 31 August 2024, App Service Environment v1 and v2 will no longer be supported and these App Service Environments and the applications running on them will be deleted and any application data associated with them will be lost.

Oh yeah, you’d better start working on migrations now.

Azure Kubernetes Service

Application gateway for Containers vs Application Gateway Ingress Controller – What’s changed?

Application Gateway for Containers is a new application (layer 7) load balancing and dynamic traffic management product for workloads running in a Kubernetes cluster. At the time of writing this service is currently in public preview. In this article we will look at the differences between AGIC and Application Gateway for containers and some of the great new features available through this new offering.

I know little about AKS but this subject seems to have excited some AKS users.

A Bucket Load Of Stuff

Too much for me to get into and I don’t know enough about this stuff:

App Services

Announcing Public Preview of Free Hosting Plan for WordPress on App Service

We announced the General Availability of WordPress on App Service one year ago, in August 2022 with 3 paid hosting plans. We learnt that sometimes you might need to try out the service before you migrate your production applications. So, we are offering you a playground for a limited period – a free hosting plan to and explore and experiment with WordPress on App Service. This will help you understand the offering better before you make a long-term investment.

They really want you to try this out – note that this plan is not for production workloads.

Hybrid

Announcing the General Availability of Jumpstart HCIBox

Almost one year ago the Jumpstart team released the public preview of HCIBox, our self-contained sandbox for exploring Azure Stack HCI capabilities without the need for physical hardware. Feedback from the community has been fantastic, with dozens of feature requests and issues submitted and resolved through our open-source community.

Today, the Jumpstart team is excited to announce the general availability of HCIBox!

It’s one thing to test out the software functionality of Azure Stack HCI. But the reality is that this is a hardware-centric solution and there is no simulating the performance, stability, or operations of something this complex.

Generally Available: Windows Server 2012 and 2012 R2 Extended Security Updates enabled by Azure Arc

Windows Server 2012 and 2012 R2 Extended Security Updates (ESUs) enabled by Azure Arc is now Generally Available. Windows Server 2012 and 2012 R2 are going End of Support on October 10, 2023. With ESUs, customers who are running Windows Server 2012 on-premises or in other clouds can get three more years of critical security updates from Microsoft to protect their End of Life infrastructure.

This is not free. This is tied into the news about Azure Update Manager (below).

Miscellaneous

Detailed CSP to EA Migration guidance and crucial considerations

In this blog, I’ve shared insights drawn from real-world migration experiences. This article can help you meticulously plan your own CSP to EA migration, ensuring a smoother transition while incorporating critical considerations into your migration strategy.

One really wishes that CSP, EA, etc were just differences in billing and not Azure APIs. Changing of billing should be like changing a phone plan.

Top 10 Considerations for running your workload successfully on Azure this Holiday Season

Black Friday, Small Business Saturday and Cyber Monday will test your app’s limits, and so it’s time for your Infrastructure and Application teams to ensure that your platforms delivers when it is needed the most. Be it shopping applications on the web and mobile or payment gateways or banking systems supporting payments or inventory systems or billing systems – anything and everything associated with the shopping season should be prepared to face the load for this holiday season.

The “holiday season” starts earlier every year. Tesco Ireland started in August. Amazon has a Prime Day next Tuesday (October 10). These events test systems harder than ever and monolithic on-prem designs will not handle it. It’s time to get ready – if it’s not already too late!

Ungated Public Preview: Azure API Center

We’re thrilled to share that Azure API Center is now open for everyone to try during our ungated public preview! Azure API Center is a new Azure service that is part of the Azure API Management platform. It is the central hub where you can effortlessly keep track of all your APIs company-wide, making them readily discoverable, reusable, and manageable.

Managing a catalog of APIs could be challenging. Tooling is welcome.

Generally available: Secure critical infrastructure from accidental deletions at scale with Policy

We are thrilled to announce the general availability of DenyAction, a new effect in Azure Policy! With the introduction of Deny Action, policy enforcement now expands into blocking request based on actions to the resource. These deny action policy assignments can safeguard critical infrastructure by blocking unwarranted delete calls.

Can you believe that Azure was designed deliberately to not have a deny permission? Adding it after is not easy. The idea here is that delete locks on resources/resource groups become too easy to remove – and are frequently removed. Something, like a policy, that is enforced in the API (between you and the resources) is always applied and is not easy to remove and can be easily deployed at scale.

Virtual Machines

Generally available: Azure Premium SSD v2 Disk Storage is now available in more regions

Azure Premium SSD v2 Disk Storage is now available in Australia East, France Central, Norway East and UAE North regions. This next-generation storage solution offers advanced general-purpose block storage with the best price performance, delivering sub-millisecond disk latencies for demanding IO-intensive workloads at a low cost.

Expanded region availability makes this something more interesting. But, Azure Backup support is in very limited preview since the Spring.

Announcing the general availability of new Azure burstable virtual machines

we are announcing the general availability of the latest generations of Azure Burstable virtual machine (VM) series – the new Bsv2, Basv2, and Bpsv2 VMs based on the Intel® Xeon® Platinum 8370C, AMD EPYC™ 7763v, and Ampere® Altra® Arm-based processors respectively.

Faster and cheaper than the previous editions of B-Series VMs and they include ARM support too. The new virtual machines support all remote disk types such as Standard SSD, Standard HDD, Premium SSD and Ultra Disk storage.

Generally Available: Azure Update Manager

We are pleased to announce that Azure Update Manager, previously known as Update Management Center, is now generally available.

The controversial news is that Arc-managed machines will cost $5/month. I’m still not sold on this solution – it still feels less than legacy solutions like WSUS.

Announcing Public Preview of NVMe-enabled Ebsv5 VMs offering 400K IOPS and 10GBps throughput

Today, we are announcing a Public Preview of accelerated remote storage performance using Azure Premium SSD v2 or Ultra disk and selected sizes within the existing NVMe-enabled Ebsv5 family. The higher storage performance is offered on the E96bsv5 and E112ibsv5 VM sizes and delivers up to 400K IOPS (I/O operations per second) and 10GBps of remote disk storage throughput.

Even the largest SQL VM that I have worked with comes nowhere near these specs. The customer(s) that have justified this investment by Microsoft must be huge.

Azure savings plan for compute: How the benefit is applied

Organizations are benefiting from Azure savings plan for compute to save up to 65% on select compute services – and you could too. By committing to spending a fixed hourly amount for either one year or three years, you can save on plans tailored to your budget needs. But you may wonder how Azure applies this benefit.

It’s simple really. The system looks at your VMs, calculates the theoretical savings, and first applies your discount to the machines where you will save the most money, and then repeats until your discount is used.

General Availability: Share VM images publicly with community gallery – Azure Compute Gallery feature

With community gallery, a new feature of Azure Compute Gallery, you can now easily share your VM images with the wider Azure community. By setting up a ‘community gallery’, you can group your images and make them available to other Azure customers. As a result, any Azure customer can utilize images from the community gallery to create resources such as virtual machines (VMs) and VM scale sets.

This is a cool idea.

Trusted Launch for Azure VMware Solution virtual machines

Azure VMware Solution proudly introduces Public Preview of Trusted Launch for Virtual Machines. This advanced feature comprises Secure Boot, Virtual Trusted Platform Module (vTPM), and Virtualization-based Security (VBS), collectively forming a formidable defense against modern cyber threats.

A feature that was introduced in Windows Server 2016 Hyper-V.

Infrastructure-As-Code

Introduction to Azure DevOps Workload identity federation (OIDC) with Terraform

Workload identity federation is an OpenID Connect implementation for Azure DevOps that allow you to use short-lived credential free authentication to Azure without the need to provision self-hosted agents with managed identity. You configure a trust between your Azure DevOps organisation and an Azure service principal. Azure DevOps then provides a token that can be used to authenticate to the Azure API.

This looks like a more secure way to authenticate your pipelines. No secrets are stored and a trust between your DevOps organasation and Azure enables short-lived authentication with desired access rights/scopes.

Quickstart: Automate an existing load test with CI/CD

In this article, you learn how to automate an existing load test by creating a CI/CD pipeline in Azure Pipelines. Select your test in Azure Load Testing, and directly configure a pipeline in Azure DevOps that triggers your load test with every source code commit. Automate load tests with CI/CD to continuously validate your application performance and stability under load.

This is not something that I have played with but I suspect that you don’t want to do this against production systems!

General Availability: GitHub Advanced Security for Azure DevOps

Starting September 20th, 2023, the core scanning capabilities of GitHub Advanced Security for Azure DevOps can now be self-enabled within Azure DevOps and connect to Microsoft Defender for Cloud. Customers can automate security checks in the developer workflow using:

Code Scanning: locates vulnerabilities in source code and provides remediation guidance.
Secret Scanning: identifies high-confidence secrets and blocks developers from pushing secrets into code repositories.
Dependency Scanning: discovers vulnerabilities with open-source dependencies and automates update alerts for developers.

This seems like a good direction to go but I’m told it’s quite pricey.

Networking

General availability: Sensitive Data Protection for Application Gateway Web Application Firewall

WAF running on Application Gateway now supports sensitive data protection through log scrubbing. When a request matches the criteria of a rule, and triggers a WAF action, that event is captured within the WAF logs. WAF logs are stored as plain text for debuggability, and any matching patterns with sensitive customer data like IP address, passwords, and other personally identifiable information could potentially end up in logs as plain text. To help safeguard this sensitive data, you can now create log scrubbing rules that replace the sensitive data with “******”.

Sounds good to me!

General availability: Gateway Load Balancer IPv6 Support

Azure Gateway Load Balancer now supports IPv6 traffic, enabling you to distribute IPv6 traffic through Gateway Load Balancer before it reaches your dual-stack applications.

With this support, you can now add IPv6 frontend IP addresses and backend pools to Gateway Load Balancer. This allows you to inspect, protect, or mirror both IPv4 and IPv6 traffic flows using third-party or custom network virtual appliances (NVAs).

Useful for security architectures where NVAs are being used

Azure Backup

Preview: Cross Region Restore (CRR) for Recovery Services Agent (MARS) using Azure Backup

We are announcing the support of Cross Region Restore for Recovery Services Agent (MARS) using Azure Backup.

This makes sense. Let’s say I back up my on-prem data, located in Virginia, to Azure East US, in Boydton Virginia. And then there’s a disaster in VA that wipes out my office and Azure East US. Now I can restore to a new location from the paired region replica.

Preview: Save Azure Backup Recovery Services Agent (MARS) passphrase to Azure Key Vault

Now, you can save your Azure Recovery Services Agent encryption passphrase in Azure Key Vault directly from the console, making the Recovery Services Agent installation seamless and secure.

This beats the old default option of saving it as a text file on the machine that you were backing up.

General availability: Selective Disk Backup and Restore in Enhanced Policy for Azure VM Backup

We are adding the “Selective Disk Backup and Restore” capability in Enhanced Policy of Azure VM Backup.

Be careful out there!

Storage

General Availability: Malware Scanning in Defender for Storage

Malware Scanning in Defender for Storage will be generally available September 1, 2023.

Please make sure that you read up on how much this will cost you. The DfC plans changed recently, and the pricing model for Storage plans changed to include this feature.

Azure Monitor

Public preview: Alerts timeline view

Azure Monitor alerts is previewing a new timeline view that simplifies the consumption experience of fired alerts. The new view has the following advantages:

Shows fired alerts on a timeline
Helps identify co-occurrence of alerts
Displays alerts in the context of the resources they fired on
Focuses on showing counts of alerts to better understand impact
Supports viewing alerts by severity
Provides a more intuitive discovery and investigation path

This might be useful if you are getting a lot of alerts.

Azure Virtual Desktop

Announcing general availability of Azure Virtual Desktop Custom Image Templates

Custom image templates allow admins to build a custom “golden image” using the Azure Virtual Desktop management user interface. Leverage a variety of built-in customizations or add your own customization scripts to install applications or configurations.

Why are they not using Azure Image Builder like I do?

Terrafying Azure – A Tale From The Dark Side

This post is a part of the Azure Back to School 2023 online event. In this post, I will discuss using Microsoft Azure Export for Terraform, also known as Aztfexport and previously known as Azure Terrafy (a great name!), to create Terraform code from existing Azure deployments, why you would do it, and share a few tips.

Terraform is one of a few Infrastructure-as-Code (IaC) languages out there that support Microsoft Azure. You might wonder why I would use it when Azure has ARM and Bicep. I’ll do a quick introduction to Terraform and then explain my reasoning which you are free to disagree with 🙂

Terraform is a product of Hashicorp available as a free-to-use product that is supported with some paid-for services. Like other IaC languages, it describes and desired end result. The major feature that differs from the native Azure languages is the use of state files – a file that describes what is deployed in Azure. This state file has a few nice use cases, including:

The outputs of a resource are documented, enabling effortless integration between resources in the same or even different files – with some effort, outputs from different deployments can be included in another deployment.
A true what-if engine that (mostly) works, unlike the native what-if in Azure, greatly reducing the time required for deployments and the ability to plan (pre-review) a deployment’s expected changes.

My first encounter with Terraform was a government project where the customer wanted to use Terraform over Bicep. Their reasoning was that elected politicians come and go, and suppliers come and go. If they were going to invest in an IaC skillset, they wanted the knowledge to be transferrable across clouds.

That’s the big advantage of Terraform. While the code itself is not cloud portable, the skill is. Terraform uses providers to be able to manage different resource types. Azure is a provider, written by Microsoft. Azure AD is a provider – ARM/Bicep still do not support Azure AD! AWS and GCP have providers. VMware has a provider. GitHub has a provider – the list goes on and on. If a provider does not exist, you can (in theory) write your own.

On that project, I was meant to be hands-off as an architect. But there were staffing and scheduling issues so I stepped up. Having never written a line of Terraform before I had my first workload, with some review help from a teammate, written in under a day. By the way, the same thing in Bicep took three days! Terraform is really well documented, with lots of examples, and the language makes sense.

Unlike Bicep, which is still beholden to a lot of the complexity of ARM. Doing simple things can involve stupidly complicated functions that only a C programmer (I used to be one) could enjoy (and I didn’t). I got hooked on Terraform and convinced my colleagues that it was a better path than Bicep, which was our original plan to replace ARM/JSON.

Aztfexport

Switching Terraform creates a question – what do we do with our existing workloads which are either deploying using Click Ops (Portal), script, or ARM/Bicep?

Microsoft has created a tool called Azure Export for Terraform (Aztfexport) on GitHub. The purpose of this tool is to take an existing resource group/resource/Graph query string and export it as Terraform code.

The code that is produced is intended to be used in some other way. In other words, Microsoft is not exporting code that should be able to immediately deploy new resources. They say that the produced code should be able to pass a terraform plan where the existing resources are compared with the state file and the code and say “the code is clean and there are no changes required”.

The Terraform configurations generated by aztfexport are not meant to be comprehensive and do not ensure that the infrastructure can be fully reproduced from said generated configurations. For details, please see limitations).
Azure/aztfexport: (github.com)

Why Use Aztfexport?

If I can’t use the code to deploy resources then what value is it? Hopefully you will see what aztfexport is a central part of my toolkit. I see it being useful in the following ways:

Learning Terraform: If you’ve not used Terraform before then it’s useful to see how the code can be produced, especially from resources that you are already familiar with.
Creating TF for an existing workload: You need to “terrafy” a resource/resource group and you want a starting point.
Azure-to-Azure migrations: You have a set of existing resources and you want to get a dump of all the settings and configurations.
Learning how a resource type/solution is coded: My favourite learning method is to follow the step-by-step and then inspect the resource(s) as code.
Understand how a resource type/solution works: This is a logical jump from the previous example, now including more resources as a whole solution.
Auditing: Comparing what is there with what should be there – or not there.
Documentation: The best form of resource documentation is IaC – why create lengthy documentation when the code is the resource?

I did use Aztfexport to learn Terraform more. In my current project, I have used it again and again to do Azure-to-Azure migrations, taking legacy ClickOps deployments and rewriting them as new secure/governed deployments. I’ve save countless hours capturing settings and configurations and re-using them as new code.

The Bad Stuff

Nothing is perfect, and Aztfexport has some thorns too. Notice that the expected usage is that the produced code should pass a terraform plan. That is because in many situations (like with ARM exports) the code is not usable to deploy resources. That can be because:

ARM APIs do not expose everything, so how can Terraform get those settings?
The tool or the providers using used do not export everything.

One example I’ve seen includes App Services configurations that do not include the code type details. Another recent one was with WAF Policies overridden WAF rules were not documented. In both cases, the code would pass a plan. But neither would re-produce the resources. I’ve learned that I do need to double-check things with a resource type that I’ve never worked with before – then I know what to go and manually grab either from an ARM export or a visual inspection in the Portal.

Another thing is that the resources are named by a “machine” – there is no understanding of the role. Every resource is res-1, res-2, and so on, no matter the type or the role in the workload. That is a bit anonymous, but I find that useful when inspecting dependencies between resources.

A giant main.tf file is created, which I break up into many smaller files. I can find relationships based on those easy-to-track dependencies and logically group resources where it suits my coding style.

One feature of TF is the easy reuse of resource IDs. One can easy refer to resource_type.resource_name.id in a property and know that the resource ID of that resource will be used. Unfortunately, some Aztfsexport code doesn’t do that so you get static resource IDs that should be replaced – that happens with other properties of resources too, so that all should be cleaned up to make code more reusable.

Installing Aztfexport

You will need to install Terraform – I prefer to use a Package Manager for that – the online instructions for a manual installation are a mess. You will also require Azure CLI.

The full instructions for installing Aztfexport are shared on GitHub, covering Windows, MacOS and Linux. The Windows installation is easy:

winget install aztfexport

You will need to restart your terminal (Windows) to get an updated Path variable so the aztfexport binary can be found.

Before you use aztfexport, you will need to log in using Azure CLI:

Open your terminal

Login:
az login

Change subscription:
az account set -subscription <subscription ID>

Verify the correct subscription was selected by checking the resource groups:
az group list

Create an empty folder on your PC and navigate to that folder in your terminal. The aztfexport tool requires an empty folder, by default, to create an export including all the required provider files and the generated code.

If you want to create an export of a single resource then you can run:

aztfexport resource <resource ID>

If you want to create an export of a resource group, then you can run:

aztfexport resource-group -n <resource group name>

Not the -n above means “don’t bother me with manual confirmation of what resources to include in the export”. In Terraform, sub-resources that can be managed as their own Terraform resources would otherwise need to be confirmed and that gets pretty tiresome pretty fast.

Tips

I’ve got to hammer on this one again, the produced code is not intended for deployment. Take the code, copy and paste it into new files and clean it up.

If your goal is to take over an existing IaC/ClickOps deployment with Terraform then you are going to have some fun. The resources already exist and Terraform is going to be confused because there is no state file. You will have to produce a state file using Terraform export for every resource definition in your code. That means knowing the resource IDs of everything, including Azure AD objects, role assignments, and sub-resources. You’ll need to understand the format of those resource IDs – use an existing state file for that. Often the resource ID is the simple Azure resource ID, or a derivation of a parent resource ID that you can figure out from another state file. Sometimes you need to wander through Azure AD (look at assignments in scopes that you do have access to if you don’t have direct Azure AD rights), use Azure CLI to “list” resources or items, or browse around using Resource Explorer in the Azure Portal.

Do take some time to compare your code with any previous IaC code or with an ARM export. Look for things that are missing – Terraform has many defaults that won’t be included and that code is missing because it is not required. I often include that code because I know that they are settings that Devs/Ops might want to tune later.

If you have the misfortune of having to work an existing Terraform module library then you will have to translate the exported code as parameter/variable files for the new code – I do not envy you 🙂

Summary

This post is an introduction to Microsoft Azure Export for Terraform and a quick how-to-get-started guide. There is much more to learn about, such as how to use a custom backend (if resource names in Terraform are not a big deal and to eliminate the terraform import task) or even how to use a resource map to identify resources to export across many resource groups.

The tool is not perfect but it has saved me countless hours over the last year or so, dating back to when it was called Azure Terrafy. It’s one in my toolkit and I regularly break it out to speed up my work. In my opinion, anyone starting to work with Terraform should install and use this tool.

Referencing Private Endpoint IP Addresses In Terraform

It is possible to dynamically retrieve the resulting IP address of an Azure Private Endpoint and use it in other resources in Terraform. This post will show you how.

Scenario

You are building some PaaS resources using Private Endpoints. You have no idea what the IP addresses are going to be. But you need to use those IP addresses elsewhere in your Terraform code, for example in an NSG rule. How do you get the IP addresses?

Find The Properties

The trick for this is to use the terraform state command. In my case, I deployed a Cosmos DB resource using azurerm_private_endpoint.cosmosdb-account1. To view the state of the resource, I can run:

terraform state show azurerm_private_endpoint.cosmosdb-account1

That outputs a bunch of code:

You can think of the exposed state as a description of the resource the moment after it was deployed. Everything in that state is addressable. A common use might be to refer to the resource ID (azurerm_private_endpoint.cosmosdb-account1.id) or resource name (azurerm_private_endpoint.cosmosdb-account1.name) properties. But you can also get other properties that you don’t know in advance.

The Solution

Take another look at the above diagram. There is an array property called private_dns_zone_configs that has one item. We can address this property as azurerm_private_endpoint.cosmosdb-account1.private_dns_zone_configs[0].

In there there is another array property, with two items, called record_sets. There is one record set per IP address created for this private endpoint. We can address these properties as azurerm_private_endpoint.cosmosdb-account1.private_dns_zone_configs[0].record_sets[0] and azurerm_private_endpoint.cosmosdb-account1.private_dns_zone_configs[0].record_sets[1].

Cosmos DB creates a private endpoint with multiple different IP addresses. I deliberately chose Cosmos DB for this example because it shows a more complex probelm and solution, demonstrating a little bit more of the method.

Dig into record_sets and you’ll find an array property called ip_addresses with 1 item. If I want the two IP addresses of this private endpoint then I will use: azurerm_private_endpoint.cosmosdb-account1.private_dns_zone_configs[0].record_sets[0].ip_addresses[0] and azurerm_private_endpoint.cosmosdb-account1.private_dns_zone_configs[0].record_sets[1].ip_addresses[0].

Using the Addresses

destination_address_prefixes = [
 azurerm_private_endpoint.cosmosdb-account1.private_dns_zone_configs[0].record_sets[0].ip_addresses[0], // Cosmos DB Private Endpoint IP 1
 azurerm_private_endpoint.cosmosdb-account1.private_dns_zone_configs[0].record_sets[1].ip_addresses[0] // Cosmos DB Private Endpoint IP 2
 ]                       
}

And now I have code that will deploy an NSG rule with the correct destination IP address(es) of my private endpoint without knowing them. And even better, if something causes the IP address(es) to change, I can rerun my code without changing it, and the rules will automatically update.

Get The Diagnostics Logs Names For An Azure Resource

This post will show you how to get the ARM (also for Bicep, Terraform, etc) names of the diagnostics logs for an Azure resource.

Problem

When you are deploying Azure resources as code, you might need to enable diagnostics logs. This might require you to know the name of each log. Here’s the issue: the names of the logs in the Azure Portal are usually different from the names that are used in the code. Sure, they’ll remove the spaces and use camel-case, but that’s predictable. Often, the logs have completely different names.

Sometimes the names are documented – thank you App Services! Sometimes you cannot find the log names – boo Azure SQL!

Solution

The tip that I’m going to share is useful – this is the second time in a few weeks that I’ve used this approach.

If you know what you are looking for, diagnostics logs in this case, then do a search online for something like “Azure Diagnostics Settings REST API”. This will bring you to a Microsoft page that shares different methods for the API.

I wanted to see what the log names are for an Azure SQL Database. So I manually created the diagnostic setting. After that, grab the resource ID of the Azure SQL Database.

Then I did the above search. I clicked the Get method and then clicked the Try It button. Put the name of the diagnostic setting (that you created) in name. Put the resource ID of the Azure SQL Database in resourceID. And then click Run. A second later, the ARM for the diagnostic setting is presented on a screen below, including all the diagnostics log names.

Pros & Cons: IaC Modules

This post will discuss the pros & cons of creating & using Infrastructure-as-Code/IaC Modules – based on 2 years of experience in creating and using a modular approach.

Why Modules?

Anyone who has done just a little bit of template work knows that ARM templates can get quickly get too big. Even a simple deployment, like a hub & spoke network architecture, can quickly expand out to several hundred lines without very much being added. Heck, when Microsoft first released the Cloud Adoption Framework “Enterprise Scale” example architecture, one of the ARM/JSON files was over 20,000 lines long!

The length of a template file can cause so many issues, including but definitely not limited to:

It becomes hard to find anything
Big code becomes hard code to update – one change has many unintended repercussions
Collaboration becomes near impossible
Agility is lost

One of the pain points that really annoyed one of my colleagues is that “big code” usually becomes non-standardised code; that becomes a big issue when a “service organisation” is supporting multiple clients (consulting company, managed services, Operations, or cloud centre of excellence).

Modularisation

The idea of modularisation is that commonly written code is written once as a module. That module is then referred to by other code whenever the functions of the module are required. This is nothing new – the concept of an “include” or “DLL” is very old in the computing world.

For example, I can create a Bicep/ARM/Terraform module for an Azure App Service. My module can deploy an App Service the way that I believe is correct for my “clients” and colleagues. It might even build some governance in, such as a naming standard, by automating the naming of the new resource based on some agreed naming pattern. Any customisations for the resource will be passed in as parameters, and any required values for inter-module dependencies can be passed out as outputs.

Quickly I can build out a library of modules, each deploying different resource types – now I have a module library. All I need now is code to call the modules, model dependencies, pass in parameters, and take outputs from one module and pass them in as parameters to others.

The Benefits

Quickly, the benefits appear:

You write less code because the code is written once and you reuse it.
Code is standardised. You can go from one workload to another, or one client to another, and you know how the code works.
Governance is built into the code. Things like naming standards are taken out of the hands of the human and written as code.
You have the potential to tap into new Azure features such as Template Specs.
Smaller code is easier to troubleshoot.
Breaking your code into smaller modules makes collaboration easier.

The Issues

Most of the issues are related to the fact that you have now built a software product that must be versioned and maintained. Few of us outside the development world have the know-how to do this. And quite frankly, the work is time-consuming and detracts from the work that we should be doing.

No matter how well you write a module, it will always require updates. There is always a new feature or a previously unknown use case that requires new code in the module.
New code means new versions. No matter how well you plan, new versions will change how parameters are used and will introduce breaking changes with some or all previous usage of the module.
Trying to create a one-size-fits-all module is hard. Azure App Services are a perfect example because there are dozens if not hundreds of different configuration options. Your code will become long.
The code length is compounded by code complexity. Many values require some sort of input, such as NULL. Quickly you will have if-then-elses all over your code.
You will have to create a code release and versioning system that must be maintained. These are skills that Ops people typically do not have.
Changes to code will now be slowed down. If a project needs a previously unwritten module/feature, the new code cannot be used until it goes through the software release mechanism. Now you have lost one of the key features of The Cloud: agility.

So What Is Right?

The answer is, I do not know. I know that “big code” without some optimisation is not the way forward. I think the type of micro-modularisation (one module per resource type) that we normally think of when “IaC Modules” is mentioned doesn’t work either.

One of the reasons that I’ve been working on and writing about Bicep/Azure Firewall/DevSecOps recently is to experiment with things such as the concept of modularisation. I am starting to think that, yes, the modularisation concept is what we need, but how we have implemented the module is wrong.

My biggest concern with the micro-module approach is that it actually slowed me down. I ended up spending more time trying to get the modules to run cleanly than I would have if I’d just written the code myself.

Maybe the module should be a smaller piece of code, but it shouldn’t be a read-only piece of code. Maybe it should be an example that I can take and modify to my own requirements. That’s the approach that I have used in my DevSecOps project. My Bicep code is written into smaller files, each handling a subset of the tasks. That code could easily be shared in a reference library by a “cloud centre of excellence” and a “standard workload” repo could be made available as a starting point for new projects.

Please share below if you have any thoughts on the matter.