Beware Of The Default Rules In Network Security Groups

The Network Security Group (NSG) is the primary mechanism for segmenting a subnet in Microsoft Azure, and NSGs are commonly implemented. Unfortunately, people assume quite a bit about how NSGs work, and I want to tackle those assumptions by explaining why you need to be aware of the default rules in Network Security Groups.

The Assumption

Let’s say I have an extremely simple workload consisting of:

  • A virtual machine acting as a web server.
  • A virtual machine acting as a database server.

Yes, this could be VNet-connected PaaS services and have the same issues, but I want the example to be as clear as possible.

I want to lock down and protect that subnet so I will create an NSG and associate it with the subnet. Traffic from the outside world is blocked, right? Now, I will create an inbound rule to allow HTTPS/TCP 443 from client external addresses to the web server.

Name     | Source    | Protocol | Port | Destination | Action
AllowWeb | <clients> | TCP      | 443  | Web VM      | Allow

The logic I expect is:

  1. Allow web traffic from the clients to the web server.
  2. Allow SQL traffic from the web server to the database server in the same subnet.
  3. Everything else is blocked.

I check the Inbound Rules in the NSG and I can see my custom rules and the built-in default rules. This confirms my logic, right?

All is well, until one day, every computer in the office has a ransomware demand screen and both of my Azure VMs are offline. Now my boss is screaming at me because the business’s online application is not selling our products to customers.

Where It All Went Wrong

Take a look at the default rules in the above screenshot. Rule 65500 denies all traffic. That’s what we want; block all traffic where a higher priority rule doesn’t allow it. That’s the rule that we were banking on to protect our Azure workload.

But take a look at rule 65000. That rule allows all traffic from VirtualNetwork to VirtualNetwork. We have assumed that VirtualNetwork means the virtual network containing the subnet that the NSG is associated with – in other words, the virtual network that we are working on.

You are in for a bigger surprise than a teddy bears’ picnic in the woods if you research the definition of VirtualNetwork:

The virtual network address space (all IP address ranges defined for the virtual network), all connected on-premises address spaces, peered virtual networks, virtual networks connected to a virtual network gateway, the virtual IP address of the host, and address prefixes used on user-defined routes. This tag might also contain default routes.

In summary, this means that VirtualNetwork contains:

  • The prefixes in your virtual network
  • Any peered virtual networks
  • Any remote networks connected by site-to-site networking
  • Any networks that you have referenced in a user-defined route in your subnets

Or, pretty much every network you can route to/from. And that’s how the ransomware got from someone’s on-premises PC into the virtual network. The on-premises networks were connected with the Azure virtual network by VPN. The built-in 65000 rule allowed all traffic from on-premises. There was nothing to block the ransomware from spreading to the Azure VMs from the on-premises network.
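The failure mode can be sketched in a few lines of Python. The prefixes, rule set, and simplified matching logic below are illustrative assumptions, not Azure's actual implementation, but the priority ordering matches how NSG rules are evaluated:

```python
from ipaddress import ip_address, ip_network

# Hypothetical prefixes for illustration only.
VNET = ["10.0.0.0/16"]
ON_PREM = ["192.168.0.0/16"]          # connected by site-to-site VPN
VIRTUAL_NETWORK_TAG = VNET + ON_PREM  # the tag includes connected on-prem space

def in_prefixes(addr, prefixes):
    return any(ip_address(addr) in ip_network(p) for p in prefixes)

# (priority, name, source prefixes, destination port or None for any, action)
rules = [
    (100,   "AllowWeb",         ["0.0.0.0/0"],       443,  "Allow"),
    (65000, "AllowVnetInBound", VIRTUAL_NETWORK_TAG, None, "Allow"),
    (65500, "DenyAllInBound",   ["0.0.0.0/0"],       None, "Deny"),
]

def evaluate(src, port):
    # Lowest priority number wins; first matching rule decides.
    for _, name, sources, rule_port, action in sorted(rules):
        if in_prefixes(src, sources) and rule_port in (None, port):
            return name, action
    return None, "Deny"

# Ransomware spreading over SMB (TCP 445) from an on-premises PC:
print(evaluate("192.168.1.50", 445))  # hits AllowVnetInBound before the deny
```

The on-premises source never reaches the 65500 deny because the built-in 65000 rule matches it first.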

Solving This Problem

There are a few ways to solve this issue. I’ll show you a couple. I am a believer in true micro-segmentation to create trust-no-one networks. The goal here is that no traffic is allowed anywhere on any Azure network without a specific rule to permit the flows that are required by the business/technology.

The logic of the below is that all traffic will be denied by default, including traffic inside the subnet.

Remember, all NSG rules are processed at the NIC, no matter how the NSG is associated.

I have added a low-priority (4000) rule to deny everything that is not allowed in the higher-priority rules. That will affect all traffic from any source, including sources in the same virtual network or subnet.

By the way, the above is the sort of protection that many national cyber security agencies are telling people to implement to stop modern threats – not just the threats of 2003.

I know that some of you will prefer to treat the NSG as an edge defence, allowing all traffic inside the virtual network. You can do that too. Here’s an example of that:

My rule at 3900 allows all traffic inside the address prefix of the virtual network. The following rule, 4000, denies everything, which means that anything from outside the network (except the traffic allowed by rule 100) will be blocked.
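To contrast the two approaches, here is a sketch (with made-up prefixes and only the rules that matter) of how the micro-segmentation and edge-defence rule sets behave:

```python
from ipaddress import ip_address, ip_network

VNET = "10.0.0.0/16"  # hypothetical virtual network prefix

def in_net(addr, prefix):
    return ip_address(addr) in ip_network(prefix)

# Micro-segmentation: explicit allows, then a 4000 DenyAll that fires
# before the built-in 65000 AllowVnetInBound ever gets a chance.
def micro_seg(src, port):
    if in_net(src, "0.0.0.0/0") and port == 443:  # 100 AllowWeb
        return "Allow"
    return "Deny"                                  # 4000 DenyAll

# Edge defence: 3900 allows the VNet prefix itself (not the whole
# VirtualNetwork tag), 4000 denies everything else.
def edge(src, port):
    if port == 443:                                # 100 AllowWeb
        return "Allow"
    if in_net(src, VNET):                          # 3900 AllowThisVnet
        return "Allow"
    return "Deny"                                  # 4000 DenyAll

print(micro_seg("192.168.1.50", 445))  # Deny: on-prem blocked
print(edge("10.0.1.4", 1433))          # Allow: intra-VNet SQL still works
print(edge("192.168.1.50", 445))       # Deny: on-prem blocked
```

Either way, the on-premises ransomware flow is denied; the difference is whether intra-network traffic also needs explicit allow rules.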

The Lesson

Don’t assume anything. You now know that VirtualNetwork means everything that can route to your virtual network. For example, the Internet service tag includes the Internet and Microsoft Azure!

How Do Network Security Groups Work?

A Greek phalanx, protected by a shield wall made up of many individuals working under one instruction as a unit – like an NSG.

Yesterday, I explained how packets travel in Azure networking while telling you Azure virtual networks do not exist. The purpose was to get readers closer to figuring out how to design good and secure Azure networks without falling into traps of myths and misbeliefs. The next topic I want to tackle is Network Security Groups – I want you to understand how NSGs work … and this will also include Admin Rules from Azure Virtual Network Manager (AVNM).

Port ACLs

In my previous post, Azure Virtual Networks Do Not Exist, I said that Azure was based on Hyper-V. Windows Server 2012 introduced loads of virtual networking features that would go on to become something bigger in Azure. One of them was a feature, mostly overlooked by customers at the time, called Port ACLs. I liked Port ACLs; they were mostly unknown, could only be managed using PowerShell, and made for great demo content in some TechEd/Ignite sessions that I did back in the day.

Remember: Everything in Azure is a virtual machine somewhere in Azure, even “serverless” functions.

The concept of Port ACLs was that they gave you a simple firewall feature controlled through the virtualisation platform – the virtual machine and the guest OS had no control and had to comply. You set up simple rules to allow or deny transport layer (TCP/UDP) traffic on specific ports. For example, I could block all traffic to a NIC by default with a low-priority inbound rule and introduce a high-priority inbound rule to allow TCP 443 (HTTPS). Now I had a web service that could receive HTTPS traffic only, no matter what the guest OS admin/dev/operator did.

Where are Port ACLs implemented? Obviously, it is somewhere in the virtualisation product, but the clue is in the name. Port ACLs are implemented by the virtual switch port. Remember that a virtual machine NIC connects to a virtual switch in the host. The virtual switch connects to the physical NIC in the host and the external physical network.

A virtual machine NIC connects to a virtual switch using a port. You probably know that a physical switch contains several ports with physical cables plugged into them. If a Port ACL is implemented by a switch port and a VM is moved to another host, then what happens to the Port ACL rules? The Hyper-V networking team played smart and implemented the switch port as a property of the NIC! That means that any Port ACL rules that are configured in the switch port move with the NIC and the VM from host to host.
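A toy model of that design decision: the ACL lives on the NIC, not on the host's switch, so a VM move carries its rules with it. The class and property names below are invented for illustration; this is not the Hyper-V API:

```python
# A NIC object that owns its switch-port ACL, so the rules follow the
# VM between hosts. Names are illustrative, not a real Hyper-V API.
class Nic:
    def __init__(self, acl):
        self.acl = acl  # the "switch port" is a property of the NIC

class Host:
    def __init__(self, name):
        self.name = name
        self.nics = []

    def attach(self, nic):
        self.nics.append(nic)

    def detach(self, nic):
        self.nics.remove(nic)

acl = [("inbound", "tcp", 443, "allow"), ("inbound", "any", None, "deny")]
nic = Nic(acl)

host_a, host_b = Host("A"), Host("B")
host_a.attach(nic)
host_a.detach(nic)  # live migration: the NIC (and its ACL) moves...
host_b.attach(nic)  # ...to the new host unchanged

print(host_b.nics[0].acl == acl)  # True: the rules survived the move
```

If the ACL had been a property of the host's switch instead, every migration would have had to copy rules between hosts.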

NSG and Admin Rules Are Port ACLs

Along came Azure and the cloud needed a basic rules system. Network Security Groups (NSGs) were released and gave us a pretty interface to manage security at the transport layer; now we can allow or deny inbound or outbound traffic on TCP/UDP/ICMP/Any.

What technology did Azure use? Port ACLs of course. By the way, Azure Virtual Network Manager introduced a new form of basic allow/deny control that is processed before NSG rules called Admin Rules. I believe that this is also implemented using Port ACLs.

A Little About NSG Rules

This is a topic I want to dive deep into later, but let’s talk a little about NSG rules. We can implement inbound (allow or deny traffic coming in) or outbound (allow or deny traffic going out) rules.

A quick aside: I rarely use outbound NSG rules. I prefer using a combination of routing and a hub firewall (deny all by default) to control egress traffic.

When I create an NSG I can associate it with:

  • A NIC: Only that NIC is affected
  • A subnet: All NICs, including VNet-integrated PaaS resources and Private Endpoints, are affected

The association is simply a management scaling feature. When you associate an NSG with a subnet, the rules are not processed at the subnet.

Tip: virtual networks do not exist!

Associating an NSG resource with a subnet propagates the rules from the NSG to all NICs that are connected to that subnet. The processing is done by Port ACLs at the NIC.

This means:

  • Inbound rules prevent traffic from entering the virtual machine.
  • Outbound rules prevent traffic from leaving the virtual machine.

Which association should you choose? I advise you to use subnet association. You can see/manage the entire picture in one “interface” and have an easy-to-understand processing scenario.

If you want to micro-manage and have an unpredictable future, then go ahead and associate NSGs with each NIC.

If you hate yourself and everyone around you, then use both options at the same time:

  • The subnet NSG is processed first for inbound traffic.
  • The NIC NSG is processed first for outbound traffic.
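The dual-association processing order can be sketched like this; the rule format is simplified and hypothetical, but the ordering follows the bullets above:

```python
# Two rule sets: a subnet NSG and a NIC NSG. Inbound traffic is checked
# against the subnet NSG first, then the NIC NSG; outbound is the
# reverse. Traffic passes only if BOTH NSGs allow it.
def evaluate(nsg, port):
    for rule_port, action in nsg:  # rules in priority order
        if rule_port in (port, "any"):
            return action
    return "deny"

SUBNET_NSG = [(443, "allow"), ("any", "deny")]
NIC_NSG    = [(443, "allow"), (22, "allow"), ("any", "deny")]

def inbound(port):
    if evaluate(SUBNET_NSG, port) == "deny":  # subnet NSG first
        return "deny"
    return evaluate(NIC_NSG, port)            # then NIC NSG

def outbound(port):
    if evaluate(NIC_NSG, port) == "deny":     # NIC NSG first
        return "deny"
    return evaluate(SUBNET_NSG, port)         # then subnet NSG

print(inbound(443))  # allow: both NSGs allow it
print(inbound(22))   # deny: the subnet NSG blocks it before the NIC allow
```

Note how the NIC NSG's SSH allow is useless here: the subnet NSG denies port 22 first. That is exactly the kind of confusion the KISS advice is about.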

Keep it simple, stupid (the KISS principle).

Micro-Segmentation

As one might grasp, we can use NSGs to micro-segment a subnet. No matter what the resources do, they cannot bypass the security intent of the NSG rules. That means we don’t need to have different subnets for security zones:

  • We zone using NSG rules.
  • Virtual networks and their subnets do not exist!

The only time we need to create additional subnets is when there are compatibility issues such as NSG/Route table association or a PaaS resource requires a dedicated subnet.

Watch out for more content shortly where I break some myths and hopefully simplify some of this stuff for you. And if I’m doing this right, you might start to look at some Azure networks (like I have) and wonder “Why the heck was that implemented that way?”.

Azure Virtual Networks Do Not Exist

I see many bad designs where people bring cable-oriented designs from physical locations into Azure. I hear lots of incorrect assumptions when people are discussing network designs. For example: “I put shared services in the hub because they will be closer to their clients”. Or my all-time favourite: people assuming that ingress traffic from site-to-site connections will go through a hub firewall because it’s in the middle of their diagram. All this is because of one common mistake – people don’t realise that Azure Virtual Networks do not exist.

Some Background

Azure was designed to be a multi-tenant cloud capable of hosting an “unlimited” number of customers.

I realise that “unlimited” easily leads us to jokes about endless capacity issues 🙂

Traditional hosting (and I’ve worked there) is based on good old fashioned networking. There’s a data centre. At the heart of the data centre network there is a network core. The entire facility has a prefix/prefixes of address space. Every time that a customer is added, the network administrators carve out a prefix for the new customer. It’s hardly self-service and definitely not elastic.

Azure settled on VXLAN to enable software-defined networking – a process where customer networking could be layered upon the physical networks of Microsoft’s physical data centres/global network.

Falling Into The Myth

Virtual Networks make life easy. Do you want a network? There’s no need to open a ticket. You don’t need to hear from a snarky CCIE who snarls when you ask for a /16 address prefix as if you’ve just asked for Flash “RAID10” from a SAN admin. No; you just open up the Portal/PowerShell/VS Code and you deploy a network of whatever size you want. A few seconds later, it’s there and you can start connecting resources to the new network.

You fire up a VM and you get an address from “DHCP”. It’s not really DHCP. How can it be when Azure virtual networks do not support broadcasts or multicasts? You log into that VM and have networking issues, so you troubleshoot like you learned how to:

  1. Ping the local host
  2. Ping the default gateway – oh! That doesn’t work.
  3. Traceroute to a remote address – oh! That doesn’t work either.

And then you start to implement stuff just like you would in your own data centre.

How “It Works”

Let me start by stating that I do not know how the Azure fabric works under the covers. Microsoft aren’t keen on telling us how the sausage is made. But I know enough to explain the observable results.

When you click the Create button at the end of the Create Virtual Machine wizard, an operator is given a ticket, they get clearance from data center security, they grab some patch leads, hop on a bike, and they cycle as fast as they can to the patch panel that you have been assigned in Azure.

Wait … no … that was a bad stress dream. But really, from what I see/hear from many people, they think that something like that happens. Even if that was “virtual”, the whole thing just would not be scalable.

Instead, I want you to think of an Azure virtual network as a Venn diagram. The process of creating a virtual network instructs the Azure fabric that any NIC that connects to the virtual network can route to any other NIC in the virtual network.

Two things here:

  1. You should take “route” as meaning a packet can go from the source NIC to the destination NIC. It doesn’t mean that it will make it through either NIC – we’ll cover that topic in another post soon.
  2. Almost everything in Azure is a virtual machine at some level. For example, “serverless” Functions run in Microsoft-managed VMs in a Microsoft tenant. Microsoft surfaces the Function functionality to you in your tenant. If you connect those PaaS services (like ASE or SQL Managed Instance) to your virtual network then there will be NICs that connect to a subnet.

Connecting a NIC to a virtual network adds the new NIC to the Venn Diagram. The Azure fabric now knows that this new NIC should be able to route to other NICs in the same virtual network and all the previous NICs can route to it.

Adding Virtual Network Peering

Now we create a second virtual network. We peer those virtual networks and then … what happens now? Does a magic/virtual pipe get created? Nope – it’s fault tolerant so two magic/virtual lines connect the virtual networks? Nope. It’s Venn diagram time again.

The Azure fabric learns that the NICs in Virtual Network 1 can now route to the NICs in Virtual Network 2 and vice versa. That’s all. There is no magic connection. From a routing/security perspective, the NICs in Virtual Network 1 are no different to the NICs in Virtual Network 2. You’ve just created a bigger mesh from (at least) two address prefixes.
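The Venn-diagram idea can be modelled as plain set membership. This is my mental model, not the fabric's real data structures:

```python
# Virtual networks and peerings as set membership, not wires. A NIC can
# route to another NIC if they share a network or their networks are peered.
class Fabric:
    def __init__(self):
        self.networks = {}   # name -> set of NICs
        self.peerings = set()  # frozensets of peered network names

    def create_network(self, name):
        self.networks[name] = set()

    def connect(self, nic, network):
        self.networks[network].add(nic)

    def peer(self, a, b):
        self.peerings.add(frozenset((a, b)))

    def can_route(self, src, dst):
        def net_of(nic):
            return next(n for n, nics in self.networks.items() if nic in nics)
        a, b = net_of(src), net_of(dst)
        return a == b or frozenset((a, b)) in self.peerings

fabric = Fabric()
fabric.create_network("vnet1")
fabric.create_network("vnet2")
fabric.connect("vm1-nic", "vnet1")
fabric.connect("vm2-nic", "vnet2")
print(fabric.can_route("vm1-nic", "vm2-nic"))  # False: not peered yet
fabric.peer("vnet1", "vnet2")
print(fabric.can_route("vm1-nic", "vm2-nic"))  # True: one bigger mesh
```

Peering changes a lookup result, not the plumbing: no pipe is built, the reachability set just gets bigger.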

Repeat after me:

Virtual networks do not exist

How Do Packets Travel?

OK Aidan (or “Joe” if you arrived here from Twitter), how the heck do packets get from one NIC to another?

Let’s melt some VMware fanboy brains – that sends me to my happy place. Azure is built using Windows Server Hyper-V; the same Hyper-V that you get with commercially available Windows Server. Sure, Azure layers a lot of stuff on top of the hypervisor, but if you dig down deep enough, you will find Hyper-V.

Virtual machines, yours or those run by Microsoft, are connected to a virtual switch on the host. The virtual switch is connected to a physical Ethernet port on the host. The host is addressable on the Microsoft physical network.

You come along and create a virtual network. The fabric knows to track NICs that are being connected. You create a virtual machine and connect it to the virtual network. Azure will place that virtual machine on one host. As far as you are concerned, that virtual machine has an address from your network.

You then create a second VM and connect it to your virtual network. Azure places that machine on a different host – maybe even in a different data centre. The fabric knows that both virtual machines are in the same virtual network so they should be able to reach each other.

You’ve probably used a 10.something address, like most other customers, so how will your packets stay in your virtual network and reach the other virtual machine? We can thank software-defined networking for this.

Let’s use the addresses of my above diagram for this explanation. The source VM has a customer IP address of 10.0.1.4. It is sending a packet to the destination VM with a customer address of 10.0.1.5. The packet leaves the source NIC, 10.0.1.4, and reaches the host’s virtual switch. This is where the magic happens.

The packet is encapsulated, changing the destination address to that of the destination virtual machine’s host. Imagine you are sending a letter (remember those?) to an apartment number. It’s not enough to say “Apartment 1”; you have to wrap that in more addressing information, the building and street, so the letter can travel. That’s what the fabric enables by tracking where your NICs are hosted. Encapsulation wraps the customer packet up in an Azure packet that is addressed to the host’s address, capable of travelling over the Microsoft Global Network – supporting single virtual networks and peered (even globally) virtual networks.

The packet routes over the Microsoft physical network unbeknownst to us. It reaches the destination host, and the encapsulation is removed at the virtual switch. The customer packet is dropped into the memory of the destination virtual machine and, bingo, the transmission is complete.
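Here is a minimal sketch of that encapsulation round trip, assuming an invented fabric mapping table and made-up host names:

```python
# The fabric keeps a mapping of customer NIC address -> physical host.
# The virtual switch wraps the customer packet in an outer packet that
# the physical network can route. All addresses/names are made up.
FABRIC_MAP = {
    "10.0.1.4": "host-17.physical.azure",
    "10.0.1.5": "host-92.physical.azure",
}

def encapsulate(packet):
    # The outer destination is the physical host, not the customer VM.
    return {"outer_dst": FABRIC_MAP[packet["dst"]], "inner": packet}

def decapsulate(outer):
    return outer["inner"]  # the destination virtual switch unwraps it

customer_packet = {"src": "10.0.1.4", "dst": "10.0.1.5", "payload": "hello"}
on_the_wire = encapsulate(customer_packet)
delivered = decapsulate(on_the_wire)

print(on_the_wire["outer_dst"])      # the physical host address
print(delivered == customer_packet)  # True: the VM never sees the wrapper
```

The customer addresses never touch the physical network; only the outer host address does.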

From our perspective, the packet routes directly from source to destination. This is why you can’t ping a default gateway – it’s not there because it plays no role in routing because: the virtual network does not exist.

I want you to repeat this:

Packets go directly from source to destination

Two Most Powerful Pieces Of Knowledge

If you remember …

  • Virtual networks do not exist and
  • Packets go directly from source to destination

… then you are a huge way along the road to mastering Azure networking. You’ve broken free from string theory (cable-oriented networking) and into quantum physics (software-defined networking). You’ll understand that segmenting networks into subnets for security reasons makes no sense. You will appreciate that placing “shared services” in the hub offers no performance gain (and either breaks your security model or makes it way more complicated).

What Happens When An Azure Region Is Destroyed?

This is a topic that has been “top of mind” (I sound like a management consulting muppet) recently: how can I recover from an Azure region being destroyed?

Why Am I Thinking About This?

Data centres host critical services. If one of these data centres disappears then everything that was hosted in them is gone. The cause of the disaster might be a forest fire, a flood, or even a military attack – the latter was once considered part of a plot for a far-fetched airport novel but now we have to consider that it’s a real possibility, especially for countries close to now-proven enemies.

We have to accept that there is a genuine risk that an area that hosts several data centres could be destroyed, along with everything contained in those data centres.

Azure Resilience Features

Availability Sets

The first level of facility resilience in Microsoft’s global network (hosting all of their cloud/internal services) is the availability set. This is the default level of high availability, designed to keep highly available services online during a failure in a single fault domain (a rack of computers) or during deployment of changes/reboots to an update domain (a virtual selection of computers) in a single row/room (rooms are referred to as colos). With everything in a single room/building, we cannot consider an availability set to be a disaster resilience feature.

Availability Zones

The next step up is availability zones. Many Azure regions have multiple data centres. Those data centres are split into availability zones. Each availability zone has independent resources for networking, cooling and power. The theory is that if you spread a highly-available service across three zones, then it should remain operational even if two of the zones go down.

Source: https://learn.microsoft.com/en-us/azure/well-architected/reliability/regions-availability-zones

Paired Regions

An Azure region is a collection of data centres that are built close to each other (in terms of networking latency). For example, North Europe (Grangecastle, Dublin Ireland) has many physical buildings hosting Microsoft cloud services. Microsoft has applied to build more data centres in Newhall, Naas, Kildare, which is ~20 miles away but will only be a few milliseconds away on the Microsoft global network. Those new data centres will be used to expand North Europe – the existing site is full and more land would be prohibitively expensive.

Many Azure regions are deployed as pairs. Microsoft has special rules for picking the locations of those paired regions, including:

  • They must be a minimum distance apart from each other
  • They do not share risks of a common natural disaster

For example, North Europe in Dublin, Ireland is paired with West Europe in Middenmeer, Netherlands.

The pairing means that systems that have GRS-based storage are able to replicate to each other. The obvious example of that is a Storage Account. Less obvious examples are things like Recovery Services Vaults and some PaaS database systems that are built on Storage Account services such as blob or file.

Mythbusting

Microsoft Doesn’t Do Your Disaster Recovery By Default

Many people enter the cloud thinking that lots of things are done for them which are not. For example, when one deploys services in Azure, Azure does not replicate those things to the paired region for you unless:

  1. You opt-in/configure it
  2. You pay for it

That means if I put something in West US, I will need to configure and pay for replication somewhere else. If my resources use Virtual Networks, then I will need to have those Virtual Networks deployed in the other Azure region too.

The Paired Region Is Available To Me

In the case of hero regions, such as East US/West US or North Europe/West Europe, the paired region is available to you. But in most cases that I have looked into, that is not the case with local regions.

Several regions do not have a paired region. And if you look at the list of paired regions, look for the * which denotes that the paired region is not available to you. For example:

  • Germany North is not available to customers of Germany West Central
  • Korea South is not available to users of Korea Central
  • Australia Central 2 is not available to customers of Australia Central
  • Norway West is not available to users of Norway East

The Norway case is a painful one. Many Norwegian organisations must comply with national laws that restrict the placement of data outside of Norwegian borders. This means that if they want to use Azure, they have to use Norway East. Most of those customers assume that Norway West will be available to them in the event of a disaster. Norway West is a restricted region; I am led to believe that:

  • Norway West is available to just three important Microsoft customers (3 or 10, it’s irrelevant because it’s not generally available to all customers).
  • It is hosted by a third-party company called Green Mountain near Stavanger, and it is considerably smaller than a typical Azure region. This means that it will be small and offer only a small subset of typical Azure services.

Let’s Burn Down A Region! (Hypothetically)

What will happen when this happens to an Azure region?

The Disaster

We can push the cause aside for one moment – there are many possible causes and the probability of each varies depending on the country that you are talking about. I have certainly discovered that both public and private organisations in some countries genuinely plan for circumstances that one might consider a Tom Clancy fantasy.

I have heard Microsoft staff, first- and second-hand, telling people that we should use availability zones as our form of disaster recovery, not paired regions. What good will an availability zone do me if a missile, fire, flood, chemical disaster, or earthquake takes out my Azure region? Could it be that there might be other motivations for this advice?

Paired Region Failover

Let’s just say that I was using a region with an available pair. In the case of GRS-based services, we will have to wait for Microsoft to trigger a failover. I wonder how that will fare? Do you think that’s ever been tested? Will those storage systems ever have had the load that’s about to be placed on them?

As for your compute, you can forget it. You’re not going to start up/deploy your compute in the paired region. We all know that Azure is bursting at the seams. Everyone has seen quota limits of one kind or another restrict our deployments. The advice from Microsoft is to reserve your capacity – yes, you will need to pre-pay for the compute that you hope you will never need to use. That goes against the elastic and bottomless glass concepts we expect from The Cloud, but reality bites – Azure is a business and Microsoft cannot afford to have oodles of compute sitting around “just in case”.

Non-Available Pair Failover

This scenario sucks! Let’s say that you are in Sweden Central or the new, not GA, region in Espoo, Finland. The region goes up in a cloud of dust & smoke, and now you need to get up and running elsewhere. The good news is that stateless compute is easy to bring online anywhere else – as long as there is capacity. But what about all that data? Your Data Lake is based on blob storage and couldn’t replicate anywhere. Your databases are based on blob/file storage and couldn’t replicate anywhere. Azure Backup is based on blob and you couldn’t enable cross-region restore. Unless you chose your storage very carefully, your data is gone along with the data centres.

Resource Groups

This one is fun! Let’s say I deploy some resources in Korea Central. Where will my resource group be? I will naturally pick Korea Central. Now let’s enable DR replication. Some Azure services will place the replica resources in the same resource group.

Now let’s assume that Korea Central is destroyed. My resources are hopefully up and running elsewhere. But the resource IDs of those resources include the resource group, which is in Korea Central (the destroyed region), so you will have some problems. According to Microsoft:

If a resource group’s region is temporarily unavailable, you might not be able to update resources in the resource group because the metadata is unavailable. The resources in other regions still function as expected, but you might not be able to update them. This condition might also apply to global resources like Azure DNS, Azure DNS Private Zones, Azure Traffic Manager, and Azure Front Door. You can view which types have their metadata managed by Azure Resource Manager in the list of types for the Azure Resource Graph resources table.

The same article mentions that you should pick an Azure region that is close to you to optimise metadata operations. I would say that if disaster recovery is important, maybe you should pick an Azure region that is independent of both your primary and secondary locations and likely to survive the same event that affects your primary region – if your resource types support it.

The Solution?

I don’t have one, but I’m thinking about it. Here are a few thoughts:

  • Where possible, architect workloads where compute is stateless and easy to rebuild (from IaC).
  • Make sure that your DevOps/GitHub/etc solutions will be available after a disaster if they are a part of your recovery strategy.
  • Choose data storage types/SKUs/tiers (if you can) that offer replication that is independent of region pairing.
  • Consider using IaaS for compute. IaaS, by the way, isn’t just simple Windows/Linux VMs. AKS is a form of very complicated IaaS. IaaS has the benefit of being independent of Azure, and can be restored elsewhere.
  • Use a non-Microsoft backup solution. Veeam for example (thank you Didier Van Hoye, MVP) can restore to Azure, on-premises, AWS, or GCP.

What Do You Think?

I know that there are people in some parts of the world who will think I’ve fallen off something and hit my head 🙂 I get that. But I also know, and it’s been confirmed by recent private discussions, that my musings here are already considered by some markets when adoption of The Cloud is raised as a possibility. Some organisations/countries are forced to think along these lines. Just imagine how silly folks in Ukraine would have felt if they’d deployed all their government and business systems in a local (hypothetical) Azure region without any disaster recovery planning; one of the first things to be at the wrong end of a missile would have been those data centres.

Please use the comments or social media and ping me your thoughts.

Azure Virtual Networks Do Not Exist

In this post, I want to share the most important thing that you should know when you are designing connectivity and security solutions in Microsoft Azure: Azure virtual networks do not exist.

A Fiction Of Your Own Mind

I understand why Microsoft has chosen to use familiar terms and concepts with Azure networking. It’s hard enough for folks who have worked exclusively with on-premises technologies to get to grips with all of the (ongoing) change in The Cloud. Imagine how bad it would be if we ripped out everything they knew about networking and replaced it with something else.

In a way, that’s exactly what happens when you use Azure’s networking. It is most likely very different to what you have previously used. Azure is a multi-tenant cloud. Countless thousands of tenants are signed up and using a single global physical network. If we want to avoid all the pains of traditional hosting and enable self-service, then something different has to be done to abstract the underlying physical network. Microsoft has used VXLAN to create software-defined networking; this means that an Azure customer can create their own networks with address spaces that have nothing to do with the underlying physical network. The Azure fabric tracks what is running where, and what NICs can talk to each other, and forwards packets as required.

In Azure, everything is either a physical (rare) or a virtual (most common) machine. This includes all the PaaS resources and even those so-called serverless resources. When you drill down far enough in the platform, you will find a machine with an operating system and a NIC. That NIC is connected to a network of some kind, either an Azure-hosted one (in the platform) or a virtual network that you created.

The NIC Is The Router

The above image is from a slide I use quite often in my Azure networking presentations. I use it to get a concept across to the audience.

Every virtual machine (except for Azure VMware Solution) is hosted on a Hyper-V host, and remember that most PaaS services are hosted in virtual machines. In the image, there are two virtual machines that want to talk to each other. They are connected to a common virtual network that uses a customer-defined prefix of 10.0.0.0/8.

The source VM sends a packet to 10.10.1.5. The packet exits the VM’s guest OS and hits the Azure NIC. The NIC is connected to a virtual switch in the host – did you know that in Hyper-V, the switch port is a part of the NIC to enable consistent processing no matter what host the VM is moved to? The virtual switch encapsulates the packet to enable transmission across the physical network – the physical network has no idea about the customer’s prefix of 10.0.0.0/8. How could it? I’d guess that 80% of customers use all or some of that prefix. Encapsulation allows the packet to hide the customer-defined source and destination addresses. The Azure Fabric knows where the customer’s destination (10.10.1.5) is running, so it uses the physical destination host’s address in the encapsulated packet.

Now the packet is free to travel across the physical Azure network – across the rack, data centre, region, or even the global network – to reach the destination host. There, the packet moves up the stack, is decapsulated, and is dropped into the NIC of the destination VM, where things like NSG rules (how the NSG is associated doesn’t matter) are processed.

Here’s what you need to learn here:

  1. The packet went directly from source to destination at the customer level. Sure, it travelled along a Microsoft physical network, but we don’t see that. We see that the packet left the source NIC and arrived directly at the destination NIC.
  2. Each NIC is effectively its own router.
  3. Each NIC is where NSG rules are processed: source NIC for outbound rules and destination NIC for inbound rules.

The Virtual Network Does Not Exist

Have you ever noticed that every Azure subnet has a default gateway that you cannot ping?

In the above example, no packets travelled across a virtual network. There were no magical wires. Packets didn’t go to a default gateway of the source subnet, get routed to a default gateway of a destination subnet and then to the destination NIC. You might have noticed in the diagram that the source and destination were on different peered virtual networks. When you peer a virtual network, an operator is not sent sprinting into the Azure data centres to install patch cables. There is no mysterious peering connection.

This is the beauty and simplicity of Azure networking in action. When you create a virtual network, you are simply stating:

Anything connected to this network can communicate with each other.

Why do we create subnets? In the past, subnets were for broadcast control. We used them for network isolation. In Azure:

  • We can isolate items from each other in the same subnet using NSG rules.
  • We don’t have broadcasts – they aren’t possible.

Our reasons for creating subnets are greatly reduced, and so are our subnet counts. We create subnets when there is a technical requirement – for example, an Azure Bastion requires a dedicated subnet. We should end up with much simpler, smaller virtual networks.
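To illustrate the simpler end state, here is a hedged Bicep sketch (the names, region, and address prefixes are my own hypothetical examples): a virtual network might need only a workload subnet, plus the one dedicated subnet that a technical requirement – Azure Bastion – forces on us.

```bicep
// Hypothetical example: fewer, simpler subnets, because NSG rules
// (not subnet boundaries) provide isolation within Azure networks.
resource vnet 'Microsoft.Network/virtualNetworks@2023-09-01' = {
  name: 'vnet-workload' // hypothetical name
  location: 'westeurope'
  properties: {
    addressSpace: {
      addressPrefixes: [
        '10.10.0.0/24'
      ]
    }
    subnets: [
      {
        name: 'WorkloadSubnet' // one subnet for the workload; NSGs isolate within it
        properties: {
          addressPrefix: '10.10.0.0/26'
        }
      }
      {
        name: 'AzureBastionSubnet' // dedicated subnet name required by Azure Bastion
        properties: {
          addressPrefix: '10.10.0.64/26'
        }
      }
    ]
  }
}
```

The point of the sketch: there is no subnet-per-tier sprawl here, because the isolation job that subnets used to do is done by NSG rules instead.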

How To Think of Azure Networks

I cannot say that I know how the underlying Azure fabric works. But I can imagine it pretty well. I think of it simply as a mapping system. And I explain it using Venn diagrams.

Here’s an example of a single virtual network with some connected Azure resources.

Connecting these resources to the same virtual network is an instruction to the fabric to say: “Let these things be able to route to each other”. When the app service (with VNet Integration) wants to send a packet to the virtual machine, the NIC of the source machine sends the packets directly to the NIC of the destination VM.

Two more virtual networks, blue and green, are created. Note that none of the virtual networks are connected/peered. Resources in the black network can talk only to each other. Resources in the blue network can talk only to each other. Resources in the green network can talk only to each other.

Now we will introduce some VNet peering:

  • Black <> Blue
  • Black <> Green

As I stated earlier, no virtual cables are created. Instead, the fabric has created new mappings. These new mappings enable new connectivity:

  • Black resources can talk with blue resources
  • Black resources can talk with green resources.

However, green resources cannot talk directly to blue resources – this would require routing to be enabled via the black network with the current peering configuration.

I can implement isolation within the VNets using NSG rules. If I want further inspection and filtering from a firewall appliance then I can deploy one and force traffic to route via it using BGP or User-Defined Routing.
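For example, forcing traffic through a firewall appliance with User-Defined Routing could look like this minimal Bicep sketch (the name, region, and firewall IP address are hypothetical):

```bicep
// Hypothetical route table: override the fabric's direct NIC-to-NIC
// forwarding and send all traffic to a firewall appliance instead.
resource rt 'Microsoft.Network/routeTables@2023-09-01' = {
  name: 'rt-via-firewall' // hypothetical name
  location: 'westeurope'
  properties: {
    routes: [
      {
        name: 'EverythingViaFirewall'
        properties: {
          addressPrefix: '0.0.0.0/0'            // all traffic
          nextHopType: 'VirtualAppliance'        // a firewall appliance
          nextHopIpAddress: '10.0.1.4'           // the firewall's private IP (example)
        }
      }
    ]
  }
}
```

Associate this route table with a subnet and the “NIC is the router” behaviour described earlier is redirected: the source NIC now forwards to the firewall first.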

Wrapping Up

The above simple concept is, I think, the biggest barrier that many people face when it comes to good Azure network design. If you can grasp two facts – that virtual networks do not exist and that packets route directly from source to destination – then you are well on your way to designing well-connected, secure networks and being able to troubleshoot them.

If You Liked This Article

If you liked this article, then why don’t you check out my custom Azure training with my company, Cloud Mechanix? My next course is Azure Firewall Deep Dive, a two-day virtual course where I go through how to design and implement Azure Firewall, including every feature. The course runs on February 12/13, timed for (but not limited to) European attendees.

Will International Events Impact Cloud Computing

You must have been hiding under a rock if you haven’t noticed how cloud computing has become the default in IT. I have started to wonder about the future of cloud computing. Certain international events have the potential to disrupt cloud computing in a major way. I’m going to play out two scenarios in this post and illustrate what the possible problems may be.

Bear In The East

Russia expanded their conflict with Ukraine in February 2022. This was the largest signal so far that the leadership of Russia wants to expand their post-Soviet borders to include some of the former USSR nations. The war in Ukraine is taking much longer than expected and has ground down the Russian military, thanks to the determination of the Ukrainian people. However, we know that Russia has eyes elsewhere.

The Baltic nations (Lithuania, Latvia, and Estonia) provide a potential land link between Russia and the Baltic Sea. North of those nations is Finland, a country with a long & wild border with Russia – and also one with a history of conflict with Russia. Finland and Sweden have recognised the potential of this expanded threat by joining NATO.

If you read “airport thrillers” like me, then you’ll know that Sweden has an island called Gotland in the Baltic Sea. It plays a huge strategic role in controlling that sea. If Russia were to take that island, they could prevent resupply via the Baltic Sea to the Baltic countries and Finland, leaving only air, land, and the long route up North – speaking of which …

Norway also shares a land border with Russia to the north of Finland. The northern Norwegian coast faces the main route from Murmansk (a place I attacked many times when playing the old Microprose F-19 game). Murmansk is the home of the Russian Northern Fleet. Their route to the Atlantic runs north of the Norwegian coast and then south between Iceland and the UK.

In the Arctic is Svalbard, a group of islands that is host to polar bears and some pretty tough people. The archipelago is also eyed up by Russia – I’m told that it’s not unusual to hear stories of some kind of espionage there.

So Russia could move west and attack. What would happen then?

Nordic Azure Regions

There are several Azure regions in the Nordics:

  • Norway East, paired with Norway West
  • Sweden Central, paired with Sweden South
  • One is “being built” in Espoo, Finland, just outside the capital, Helsinki.

Norway West is a small facility that is hosted in a third-party data centre and is restricted to a few customers.

I say “being built” with the Finnish region because I suspect that it’s been active for a while with selected customers. Not long after the announcement of the region (2022), I had a nationally strategic customer tell me that the local Microsoft data centre salesperson was telling them to stop deploying in Azure West Europe (Netherlands) and to start using the new Finnish region.

FYI: the local Microsoft data centre salesperson has a target of selling only the local Azure region. The local subsidiary has to make a usage commitment to HQ before a region is approved. Adoption in another part of Azure doesn’t contribute to this target.

I remember this conversation because it was not long after tanks rolled into Ukraine and talk of Finland joining NATO began heating up. I asked my customer: “Let’s say you place nationally critical services into the new Finnish region. What is one of the first things that Russia will send missiles to?” Yes, they will aim to shut down any technology and communications systems first … including Azure regions. All the systems hosted in Espoo will disappear in a flaming pile of debris. I advised the customer that if I were them, I would continue to use cloud regions that were as far away as possible while still meeting legal requirements.

Norway’s situation is worse. Their local and central governments have to comply with a data placement law, which prevents the placement of certain data outside of Norway. If you’re using Azure, you have no choice: you must use Norway East, which is in urban Oslo (the capital, on the south coast). Private enterprises can choose any of the European regions (they typically take West Europe/Netherlands, paired with North Europe/Ireland) so they have a form of disaster recovery (I’ll come back to this topic later). However, Norway East users cannot replicate into Norway West – the Stavanger-located region is only available to a select (allegedly) three customers and it is very small.

FYI: restricted access paired regions are not unusual in Azure.

Disaster Recovery

So a hypersonic missile just took out my Azure region – what do I do next? In an ideal world, all of your data was replicated in another location. Critical systems were already built with redundant replicas. Other systems can be rebuilt by executing pipelines with another Azure region selected.

Let’s shoot all of that down, shall we?

So I have used Norway East, and I’ve got a bunch of PaaS data storage systems. Many of those storage systems (Azure Backup Recovery Services vaults, for example) are built on blob storage. Blob storage offers geo-redundancy, but it is restricted to the paired region. If my data storage can only replicate to the paired region and there is no paired region available to me, then there is no replication option. You would need to bake your own replication system.
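For context, geo-redundancy is selected via the storage SKU. A minimal Bicep sketch (the name and region are hypothetical) – note that there is no property with which to choose the secondary region, because it is always the fixed pair:

```bicep
// Hypothetical storage account: with a geo-redundant SKU, the secondary
// copy always lands in the fixed paired region. There is no setting to
// replicate to a region of your choosing.
resource logs 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: 'stcriticaldata001' // hypothetical name
  location: 'norwayeast'
  kind: 'StorageV2'
  sku: {
    name: 'Standard_GRS' // geo-redundant, but only to the paired region
  }
}
```

If the paired region is unavailable to you (as with Norway West), this SKU buys you nothing; that is the gap you would have to fill yourself.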

Some compute/data resource types offer replication to any region. For example, Cosmos DB can replicate to other regions, but that comes with potential sync/latency issues. Azure VMs offer Azure Site Recovery, which enables replication to any region. This is where I expect the “cloud native” types to shout “GitOps!”, but they always seem to focus only on compute and forget things like data – no, we won’t be putting massive data stores in an AKS container 🙂

Has anyone not experienced capacity issues in an Azure region in the last few years? There are probably many causes for that so we won’t go down that rabbit hole. But a simple task of deploying a new AVD worker pool or a firewall with zone resilience commonly results in a failure because the region doesn’t have capacity. What would happen if Norway East disappeared and all of the tenants started to failover/redeploy to other European regions? Let’s just say that there would be massive failures everywhere.

Orange Man In The West

Greenland is an autonomous territory of the Kingdom of Denmark, an EU member state. US president-elect, Donald Trump, has been sabre-rattling about Greenland recently. He wants the US to take it over by either economic (trade war) or military means.

If the USA goes into a trade war with Denmark, then it will go into a trade war with all of the EU. Neither side will win. If the tech giants continue to personally support Donald Trump then I can imagine the EU retaliating against them. Considering that Microsoft, Amazon, and Google are American companies, sanctions against those companies would be bad – the cost of cloud computing could rocket and make it unviable.

If the USA invaded Greenland (a NATO ally by virtue of being a Danish territory) then it would lead to a very unpleasant situation between NATO/EU and the USA. One could imagine that American companies would be shunned, not just emotionally but also legally. That would end Azure, AWS, and Google in the EU.

So how would one recover from losing their data and compute platform? It’s not like you can just live migrate a petabyte data lake or a workload based on Azure Functions.

The Answer

I don’t have a good answer. I know of an organisation that had an “only do VMs in Azure” policy. I remember being dumbfounded at the time. They explained that it was for support reasons. But looking back on it, they abstracted themselves from Azure by use of an operating system. They could simply migrate/restore their VMs to another location if necessary – on-prem, another cloud, another country. They are not tied to the cloud platform, the location, or the hardware. But they do lose many of the benefits of using the cloud.

I expect someone will say “use on-prem for DR”. OK, so you’ll build a private cloud, at huge expense, and let it sit there doing nothing on the off-chance that it might be used. If I were in that situation, then I wouldn’t be using Azure/etc at all!

I’ve been wondering for a while if the EU could fund/sponsor the creation of an IT sector in Europe that is independent of the USA. It would need an operating system, productivity software, and a cloud platform. We don’t have any tech giants as big or as cash-rich as Microsoft in the EU, so this would have to be sponsored. I also think that it would have to be a collaboration. My fear is that it would be bogged down in bureaucracy and have a heavy Germany/France-first influence. But I am looking at the news every day and realising that we need to consider a non-USA solution.

Wrapping Up

I’m all doom and gloom today. Maybe it’s all of the negativity in the news that is bringing me down. I see continued war in Ukraine, Russia attacking infrastructure in the Baltic sea, and threats from the USA. The world has changed and we all will need to start thinking about how we act in it.

Manage Existing Azure Firewall With Firewall Policy Using Bicep

In this post, I want to discuss how I recently took over the management of an existing Azure Firewall using Firewall Policy/Azure Firewall Manager and Bicep.

Background

We had a customer set up many years ago using our old templated Azure deployment based on ARM. At the centre of their network is Azure Firewall. That firewall plays a big role in the customer’s micro-segmented network, with over 40,000 lines of ARM code defining the many firewall rules.

The firewall was deployed before Azure Firewall Manager (AFM) was released. AFM is a pretty GUI that enables the management of several Azure networking resource types, including Azure Firewall. But when it comes to managing the firewall, AFM uses a resource called Firewall Policy; you don’t have to touch AFM at all – you can deploy a Firewall Policy, link the firewall to it (via resource ID), and edit the Firewall Policy directly (Azure Portal or code) to manage the firewall settings.


One of the nicest features of Azure Firewall is a result of it being an Azure PaaS resource. Like every other resource type (there are some exceptions), Azure Firewall is completely manageable via code. Not only can you deploy the firewall, you can also operate it on a day-to-day basis using ARM/Bicep/Terraform/Pulumi if you want: the settings and the firewall rules. That means you can have complete change control and rollback using the features of Git in Azure DevOps, GitHub, etc.
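For example, linking a firewall to a Firewall Policy is just a resource ID reference. This is a minimal Bicep sketch with hypothetical names and only the essential properties (the real deployment in this post used AVM modules rather than raw resources):

```bicep
// Hypothetical example: the Firewall Policy owns the settings and rules,
// and the firewall is attached to it by resource ID.
resource fwPolicy 'Microsoft.Network/firewallPolicies@2023-09-01' = {
  name: 'afwp-hub' // hypothetical name
  location: 'westeurope'
  properties: {
    sku: {
      tier: 'Standard'
    }
  }
}

resource firewall 'Microsoft.Network/azureFirewalls@2023-09-01' = {
  name: 'afw-hub' // hypothetical name
  location: 'westeurope'
  properties: {
    sku: {
      name: 'AZFW_VNet'
      tier: 'Standard'
    }
    firewallPolicy: {
      id: fwPolicy.id // the link that puts the policy in charge
    }
    // ipConfigurations and other settings omitted for brevity
  }
}
```

Once that link exists, day-to-day changes are made against the Firewall Policy resource, not the firewall itself.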


All new features in Azure Firewall have surfaced only via Firewall Policy since the general availability release of AFM. A legacy Azure Firewall that doesn’t have a Firewall Policy is missing many security and management features. The team that works regularly with this customer approached me about adding Firewall Policy to the customer’s deployment and including that in the code.

The Old Code

As I said before, the old code was written in ARM. I won’t get into it here, but we couldn’t add the required code to do the following without significant risk:

  • A module for Firewall Policy
  • Updating the module for Azure Firewall to include the link to the Firewall Policy.

I got a peer to give me a second opinion and he agreed with my original assessment. We should:

  1. Create a new set of code to manage the Azure Firewall using Bicep.
  2. Introduce Firewall Policy via Bicep.
  3. Remove the ARM module for Azure Firewall from the ARM code.
  4. Leave the rest of the hub as is (ARM) because this is a mission-critical environment.

The High-Level Plan

I decided to do the following:

  1. Set up a new repo just for the Azure Firewall and Firewall Policy.
  2. Deploy the new code in there.
  3. Create a test environment and test like crazy there.
  4. Keep the existing Azure Firewall public IP – it could not change because it was used in DNAT rules and by remote parties in their firewall rules.
  5. We agreed that there should be “no” downtime in the process, but I wanted a rollback option just in case. I would create non-parameterised ARM exports of the entire hub, the GatewaySubnet route table (critical to routing, and a risk point in this kind of work), and the Azure Firewall. Our primary rollback plan would be to run the unmodified ARM code to restore everything as it was.

The Build

I needed an environment to work in. I did a non-parameterised export of the hub, including the Azure Firewall. I decompiled that to Bicep and deployed it to a dedicated test subscription. This did require some clean-up:

  • The public IP of the firewall would be different so DNAT rules would need a new destination IP.
  • Every rules collection group (many hundreds of them) had a resource ID that needed to be removed – see regex searches in Visual Studio Code.

The deployment into the test environment was a two-stage job – I needed the public IP address to obtain the destination address value for the DNAT rules.

Now I had a clone of the production environment, including all the settings and firewall rules.

The Bicep Code

I’ve been doing a lot of Bicep since the Spring of this year (2024). I’ve been using Azure Verified Modules (AVM) since early Summer – it’s what we’ve decided should be our standard approach, emulating the styling of Azure Verified Solutions.


We don’t use Microsoft’s landing zones. I have dug into them and found a commonality: the code is too impressive. The developer has been too clever. Very often, “customer configuration” is hard-coded into the Bicep. For example, the image template for Azure Image Builder (in the AVD landing zone) is broken up across many variables which are unioned until a single variable is produced. The image template is a file that should be easy to get at and commonly updated.

A managed service provider knows that architecture (the code) should be separated from customer configuration. This allows the customer configuration to be frequently updated separately from the architecture. And, in turn, it should be possible to update the architecture without having to re-import the customer configuration.


My code design is simple:

  • main.bicep deploys the Azure Firewall (AVM) and the Firewall Policy (AVM).
  • A two-property parameter controls the true/false (bool) condition of whether or not each of the two resources is deployed.
  • A main.bicepparam supplies parameters to configure the SKUs/features/settings of the Azure Firewall and Firewall Policy using custom types (enabling complete IntelliSense in VS Code).
  • A simple module documents the Rules Collections in a single array. This array is returned as an output to main.bicep and fed as a single value to the Firewall Policy module.

I did attempt to document the Rules Collections as ARM and use the Bicep function to load an ARM file. This was my preference because it would simplify producing the firewall rules from the Azure Portal and inputting them into the file, both for the migration and for future operations. However, the Bicep file-loading functions (loadTextContent/loadJsonContent) are limited to 131,072 characters – far too few. The eventual Rules Collection Group module had over 40,000 lines!
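A minimal sketch of the Rules Collection module pattern (all names and the single example rule are hypothetical – not the customer’s actual rules):

```bicep
// rulesCollectionGroups.bicep (hypothetical module): documents the firewall
// rules as a plain array and returns it as an output, which main.bicep
// passes as a single value to the Firewall Policy module.
output ruleCollectionGroups array = [
  {
    name: 'rcg-default' // hypothetical name
    priority: 1000
    ruleCollections: [
      {
        name: 'rc-allow-web'
        priority: 1100
        ruleCollectionType: 'FirewallPolicyFilterRuleCollection'
        action: {
          type: 'Allow'
        }
        rules: [
          {
            ruleType: 'NetworkRule'
            name: 'AllowHttpsOut' // example rule only
            ipProtocols: [
              'TCP'
            ]
            sourceAddresses: [
              '10.0.1.0/24'
            ]
            destinationAddresses: [
              '*'
            ]
            destinationPorts: [
              '443'
            ]
          }
        ]
      }
    ]
  }
]
```

Because the rules live in one dedicated module, operators can add or change a rule by editing a single well-known file without touching the architecture code.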

My test process eventually gave me a clean result from start to finish.

The Migration

The migration was scheduled for late at night. Earlier in the afternoon, a freeze was put in place on the firewall rules. That enabled me to:

  1. Use Azure Firewall Manager to start the process of producing a Firewall Policy. I chose the option to import the rules from the existing production firewall. I then clicked the link to export the rules to ARM and saved the file locally.
  2. I decompiled the ARM code to Bicep. I copied and pasted the 3 Rules Collection Groups into my Rules Collection Group module.
  3. I then ran the deployment with no resources enabled. This told me that the pipeline was functioning correctly against the production environment.
  4. When the time came, I made my “backups” of the production hub and firewall.
  5. I updated the parameters to enable the deployment of the Firewall Policy. That was a quick run – the Azure Firewall was not touched, so there was no update to the firewall. This gave me one last chance to compare the firewall settings and rules before the final steps began.
  6. I removed the DNS settings from the Azure Firewall. I found in testing that I could not attach a Firewall Policy to an Azure Firewall if both contained DNS settings, so I had to remove those settings from the production firewall. This could have caused some downtime to any clients using the firewall as their DNS server, but that feature had not been rolled out yet.
  7. I updated the parameters to enable management of the Azure Firewall. The code here included the name of the in-place Public IP Address. The parameters also included the resource IDs of the hub Virtual Network and the Log Analytics Workspace (resource-specific tables in the code). The pipeline ran … this was the key part because the Bicep code was updating the firewall with the resource ID of the Firewall Policy. Everything worked perfectly … almost … the old diagnostics settings were still there and had to be removed because the new code used a new naming standard. One quick deletion and a re-run and all was good.
  8. One of my colleagues ran a bunch of pre-documented and pre-verified tests to confirm that all was well.
  9. I then commented out the code for the Azure Firewall from the old ARM code for the hub. I re-ran the pipeline and cleaned up some errors until we had a repeated clean run.

The technical job was done:

  • Azure Firewall was managed using a Firewall Policy.
  • Azure Firewall had modern diagnostics settings.
  • The configuration is being done using code (Bicep).

You might say, “Aidan, there’s a PowerShell script to do that job”. Yes, there is, but it wasn’t going to produce the code that we needed to leave in place. This task did the work and has left the customer with code that is extremely flexible, with every resource property available as a mandatory/optional property through a documented type specific to the resource type. As long as no bugs are found, the code can be used as-is to configure any settings/features/rules in Azure Firewall or Azure Firewall Manager, either through the parameter files (SKUs and settings) or the Rules Collection Groups module (firewall rules).

Azure Firewall Deep Dive Training

If you thought that this post was interesting then please do check out my Azure Firewall Deep Dive course that is running on February 12th – February 13th, 2025 from 09:30-16:00 UK/Irish time/10:30-17:00 Amsterdam/Berlin time. I’ve run this course twice in the last two weeks and the feedback has been super.

Azure Firewall Deep Dive Training

I’ll tell you about my new virtual training course on Azure Firewall and share some schedule information in this post.

Background

I’ve been talking about Azure Firewall for years. I’ve done lots of sessions at user groups and conferences. I’ve done countless handovers with customers and colleagues. One of my talking points is that I reckoned that I could teach someone with a little Azure/networking knowledge everything there is to know about Azure Firewall in 2 days. And that’s what I decided to do!

I was updating one of my sessions earlier in the year when I realised that it was pretty much the structure of a training course. Instead of just listing out features or barely discussing architecture to squeeze it into a 45-60 minute-long session, I could take the time to dive deep and share all that I know or could research.

The Course

I produced a 2-day course that could be taught in person, but my primary vector is virtual/online – it’s hard to get a bunch of people from all over into one place, and there is also a cost to me in hosting a physical event that would increase the cost of the course. I decided that virtual was best, with an option of doing it in person if a suitable opportunity arose.

The course content is delivered using a combination of presentation and demo. Presentation lets me explain the whats, the whys, and so on. Demonstration lets me show you how.

The demo lab is built from a Bicep deployment, based on Azure Verified Modules (AVM). A hub & spoke network architecture is created with an Application Gateway, a simple VM workload, and a simple App Services (Private Endpoint) workload. The demonstrations follow a “hands-on guide”; this guide is written as if this was a step-by-step hands-on course, instructing the reader exactly which button to click and what/where to type. Each exercise builds on the last, eventually resulting in a secure network architecture with all of the security, monitoring, and management bells and whistles.

Why did I opt for demo instead of hands-on? Hands-on works for in-person classes, but online you cannot assist in the same way when people struggle. In addition, waiting for attendees to complete labs would add another day (and cost) to the class.

Before the class, I share all of the content that I use:

  • System requirements and setup instructions.
  • The Bicep files for the demo lab.
  • The hands-on lab instructions.
  • The PowerPoint.
  • And a few more useful bits.

I always update content – for example, my first run of this class was during Microsoft Ignite 2024 and I added a few bits from the news. Therefore I share the updated content with attendees after the course.

The First Run

I ran the class for the first time earlier this week, November 20-21 2024. Attendees from all around Europe joined me for 2 days. At first, they were quiet. Online is tough for speakers like me because I look for visual feedback on how I’m doing. But then the questions started coming – people were interested in what I was saying. Interaction also makes the class more interesting for me – sometimes you get comments that cover things you didn’t originally include, and everyone benefits – I updated the course with one such item at the end of day 1!

I shared a 4-question anonymous survey to learn what people thought. The feedback was awesome.

Feedback

This course was previously run in November 2024 for a European audience. The survey feedback was as follows:

How would you rate this course?

  • Excellent: 83%
  • Good: 17%

Was This Course Worth Your Time?

  • Yes: 100%

Would you recommend this course to others?

  • Yes: 100%

Some of the comments:

“I think it is a very good introduction to Azure Firewall, but it goes beyond foundational concepts so medium-experienced admins will also get value from this. I like the sections on architecture and explanations of routing and DNS. I think this course will enable people to do a good job more than for example AZ-700 because of the more practical approach. You are good at explaining the material”.

“Just what I wanted from a Deep dive course.”

“Perfectly delivered. Crystal clear content and very well explained”.

Future Classes

I have this class scheduled for two more runs, each timed for different parts of the world:

The classes are ultra-affordable. A few hundred Euros/dollars gets you custom content based on real-world usage. I did find a virtual 2-day course on Palo Alto firewalls that cost $1700! You’ll also find that I run early-bird registration prices and discounts for more than 1 booking. If you have a large group (5+), then we might be able to figure out a lower rate 🙂

More To Come

More classes are coming! I have an old one to reinvent based on lots of experience over the years and at least 1 new one to write from scratch. Watch out for more!

Azure Image Builder Job Fails With TCP 60000, 5986 or 22 Errors

In this post, I will explain how to solve the situation when an Azure Image Builder job fails with the following errors:

[ERROR] connection error: unknown error Post "https://10.1.10.9:5986/wsman": proxyconnect tcp: dial tcp 10.0.1.4:60000: i/o timeout
[ERROR] WinRM connection err: unknown error Post "https://10.1.10.9:5986/wsman": proxyconnect tcp: dial tcp 10.0.1.4:60000: i/o timeout

[ERROR] connection error: unknown error Post "https://10.1.10.8:5986/wsman": context deadline exceeded
[ERROR] WinRM connection err: unknown error Post "https://10.1.10.8:5986/wsman": context deadline exceeded

Note that the second error will probably be the following if you are building a Linux VM:

[ERROR] connection error: unknown error Post "https://10.1.10.8:22/ssh": context deadline exceeded
[ERROR] SSH connection err: unknown error Post "https://10.1.10.8:22/ssh": context deadline exceeded

The Scenario

I’m using Azure Image Builder to prepare a reusable image for Azure Virtual Desktop with some legacy software packages from external vendors. Things like re-packaging (MSIX) will be a total support no-no, so the software must be pre-installed into the image.

I need a secure solution:

  • The virtual machine should not be reachable on the Internet.
  • Software packages will be shared from a storage account with a private endpoint for the blob service.

This scenario requires that I prepare a virtual network and customise the Image Template to use an existing subnet ID. That’s all good. I even looked at the PowerShell example from Microsoft which told me to allow TCP 60000-60001 from Azure Load Balancer to Virtual Network:

I also added my customary DenyAll rule at priority 4000 – the built-in Deny rule doesn’t deny all that much!

I did that … and the job failed, initially with the first of the errors above, related to TCP 60000. Weird!

Troubleshooting Time

Having migrated countless legacy applications with missing networking documentation into micro-segmented Azure networks, I knew what my next steps were:

  1. Deploy Log Analytics and a Storage Account for logs
  2. Enable VNet Flow Logs with Traffic Analytics on the staging subnet (where the build is happening) NSG
  3. Recreate the problem (do a new build)
  4. Check the NTANetAnalytics table in Log Analytics

And that’s what I did. Immediately, I found that there were comms problems between the Private Endpoint (Azure Container Instance) and the proxy VM: TCP 60000 was attempted and denied because the source was not the Azure Load Balancer.

I added a rule to solve the first issue:

I re-ran the test (yes, this is painfully slow) and the job failed again.

This time the logs showed failures from the proxy VM to the staging (build) VM on TCP 5986. If you’re building a Linux VM then this will be TCP 22.

I added a third rule:

Now when I tested, I saw the status switch from Running, to Distributing, to Success.

Root Cause?

Adding my DenyAll rule caused the scenario to vary from the Microsoft docs. The built-in AllowVnetInBound rule is too open because it allows all sources in a routed “network”, including other networks in a hub & spoke. So I micro-segment using a low-priority DenyAll rule.

The default AllowVnetInBound rule would have allowed the container>proxy and proxy>VM traffic, but I had overridden it. So I needed to create rules to allow that traffic.
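The rules described in this post could be sketched as Bicep NSG security rules. This is a hedged sketch only: the names, priorities, and address prefixes are hypothetical examples, and the real production values will differ.

```bicep
// Hypothetical NSG rules for an Azure Image Builder staging subnet that
// also carries a low-priority (high number) DenyAll rule.
var imageBuilderRules = [
  {
    name: 'AllowLoadBalancerToProxy' // Microsoft's documented rule
    properties: {
      priority: 400
      direction: 'Inbound'
      access: 'Allow'
      protocol: 'Tcp'
      sourceAddressPrefix: 'AzureLoadBalancer'
      sourcePortRange: '*'
      destinationAddressPrefix: 'VirtualNetwork'
      destinationPortRange: '60000-60001'
    }
  }
  {
    name: 'AllowContainerToProxy' // private endpoint/ACI -> proxy VM
    properties: {
      priority: 410
      direction: 'Inbound'
      access: 'Allow'
      protocol: 'Tcp'
      sourceAddressPrefix: '10.0.1.0/24' // staging subnet (example prefix)
      sourcePortRange: '*'
      destinationAddressPrefix: '10.0.1.0/24'
      destinationPortRange: '60000'
    }
  }
  {
    name: 'AllowProxyToBuildVm' // TCP 5986 (WinRM), or 22 (SSH) for Linux builds
    properties: {
      priority: 420
      direction: 'Inbound'
      access: 'Allow'
      protocol: 'Tcp'
      sourceAddressPrefix: '10.0.1.0/24'
      sourcePortRange: '*'
      destinationAddressPrefix: '10.0.1.0/24'
      destinationPortRange: '5986'
    }
  }
]
```

With rules like these in place, the DenyAll rule at priority 4000 can stay, and the build traffic that AllowVnetInBound used to silently permit is now explicitly allowed.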

Lots of Speaking Activity

After a quiet few pandemic years with no in-person events and the arrival of twins, my in-person presentation activity was minimal. My activity has started to increase, and there have been plenty of recent events and more are scheduled for the near future.

The Recent Past

Experts Live Netherlands 2024

It was great to return to The Netherlands to present at Experts Live Netherlands. Many European IT workers know the Experts Live brand; Isidora Maurer (MVP) has nurtured & shepherded this conference brand over the years, starting with a European-wide conference and then working with others to branch it out to localised events that give more people a chance to attend. I presented at this event a few years ago but personal plans prevented me from submitting again until this year. And I was delighted to be accepted as a speaker.

Hosted in Nieuwegein, just a short train ride from Schiphol airport in Amsterdam (Dutch public transport is amazing), the conference featured a packed expo hall and many keen attendees. I presented my “Azure Firewall: The Legacy Firewall Killer” session to a standing-room-only crowd.

TechMentor Microsoft HQ 2024

The first conference I attended was WinConnections 2004 in Lake Las Vegas. That conference changed my career. I knew that TechMentor had become something like that – the quality of the people I knew who were presenting at the event in the past was superb. I had the chance to submit some sessions this time around and was happy to have 3 accepted, including a pre-conference all-day seminar.

I worked my tail off on that “pre-con”. It’s an expansion of one of my favourite sessions that many events are scared of, probably because they think it’s too niche or too technical: “Routing – The Virtual Cabling of Secure Azure Networking”. Expanding a 1-hour session to a full day might seem daunting, but I had to limit how much content I included! Plus I had to make this a demo session. I worked endless hours on a Bicep deployment to build a demo lab for the attendees. This was necessary because it would take too long to build by hand. I had issues with Azure randomly failing and with versions changing inside Microsoft. As one might expect, the demo gods were not kind on the day and I quickly had to pivot from hands-on labs to demos. While the questions were few during the class, there were lots of conversations during the breaks and even on the following days.

My second session was “Azure Firewall: The Legacy Firewall Killer” – this is a popular session. I like doing this topic because it gives me a chance to crack a few jokes – my family will groan at that thought!

My final session was the one that I was most worried about. “Your Azure Migration Project Is Doomed To FAIL” had never been accepted by any event before. I think the title might seem negative, but it’s meant to be funny. The content is based on my experience dealing with mid-large organisations that never quite understand the difference between cloud migration and cloud adoption. I explain this through several fictional stories. There is liberal use of images from Unsplash and opportunities for a few laughs. This had been the session that I was least confident in, but it worked.

TechMentor puts a lot of effort into mixing the attendees and the presenters. On the first night, attendees and presenters went to a local pizza place/bar and sat in small booths. We had to talk to each other. The booth that I was at featured people from all over the USA with different backgrounds. People came and went, but we talked and were the last to leave. On the second day, lunch was an organised affair where each presenter was host to a table. Attendees could grab lunch and sit with a presenter to discuss what was on their minds. I knew that migrations were a hot topic. And I also knew that some of those attendees were either doing their first migration or first re-attempt at a migration. I was able to tune my session a tiny bit to the audience and it hit home. I think the best thing about this was the attention I saw in the room, the verbal feedback that I heard just after the session, and the folks who came up to talk to me after.

A Break

I brought my family to the airport the day before I flew to TechMentor. They were going to Spain for 4 weeks and I joined them a few days later after a l-o-n-g Seattle-Los Angeles-Dublin-Alicante journey (I really should have stayed one extra night in Seattle and taken the quicker 1-hop via Iceland).

33+ Celsius heat, sunshine, a pool, and a relaxed atmosphere in a residential town (we didn’t go to a “hotel town”) made it a great place to work for a week and then take two weeks of vacation.

I went running most mornings, doing 5-7 KMs. I enjoy getting up early in places like this, picking a route to explore on a map, and hitting the streets to discover the locality and places to go with my family. It’s so different to home, where I have just two routes with footpaths that I can use.

Coming home was a shock. Ireland isn’t the sunniest or the warmest place in the world, but it feels like mid-winter at the moment. I think I acclimatised to Spain as much as a pasty Irish person can. This morning I even had to put a jacket on and do a couple of KMs to wait for my legs to warm up before picking up the pace.

Upcoming Events

There are three confirmed events coming up:

Nieuwegein (Netherlands) September 11: Azure Fest 2025

I return to this Dutch city in a few days to do a new session, “Azure Virtual Network Manager”. I’ve been watching this product develop since the private preview. It’s not quite ready (pricing is hopefully being fixed), but it could be a complete game-changer for managing complex Azure networks for secure/compliant PaaS and IaaS deployments. I’ll discuss and demo the product, sharing what I like and don’t like.

Dublin (Ireland) October 7: Microsoft Azure Community & AI Day

Organised by Nicolas Chang (MVP), this event will feature a long list of Irish MVPs discussing Azure and AI in a rapid-fire set of short sessions. I don’t think that the event page has gone live yet, so watch out for it. I will be presenting “Azure Virtual Network Manager” again at this event.

TBA: Nordics

I’ve confirmed my speaking slots for 2 sessions at an event that has not publicly announced the agenda yet. I look forward to heading north and sharing some of my experiences.

My Sessions

If you are curious, you can see my Sessionize public profile here, which is where you’ll find my collection of available sessions.