Microsoft recently announced a public preview of User-Defined Route (UDR) management using Azure Virtual Network Manager. I’ve taken some time to play with it, and here are my thoughts.
Azure Virtual Network Manager (AVNM)
AVNM has been around for a while but I have mostly ignored it up to now because:
The connectivity configuration feature (centrally manage VNet connections) was pointless to me without route management – what’s the point of a hub & spoke in a business setting without a firewall?
I liked the Security Admin Rule configuration (same tech as NSG rules in the Hyper-V switch port, but processed before NSG rules) but pricing of AVNM was too much – more on this later.
Connectivity was missing something – the ability to deploy UDRs or BGP routes from a central policy that would force a next hop to a routing/firewall appliance.
AVNM is deployed centrally but can operate potentially across all virtual networks in your tenant (defined by a scope at the time of deployment) and even across other tenants via mutually agreed guest access – the latter would be useful in acquisition or managed services scenarios.
Routing Configuration Preview
Routing Configuration was introduced as a preview on May 2nd. Immediately I was drawn to it. But I needed to spend some time with it – to dig a little deeper and not just start spouting off without really understanding what was happening. I spent quite a bit of time reading and playing last week and now I feel happier about it.
Network Groups
Network Groups power everything in AVNM. A Network Group is a listing of either subnets or virtual networks. It can be a static list that you define or it can be a dynamic query.
At first, dynamic query looks cool. You can build up a dynamic query using one or a number of parameters:
Name
Id
Tags
Location
Subscription Name
Subscription ID
Subscription Tags
Resource Group Name
Resource Group Id
When you add members via a query, an Azure Policy is created.
When that policy (re)evaluates, a notification is sent to AVNM and any policies that target the updated network group are applied to the group members. That creates a possible negative scenario:
You build a workload in code from VNet all the way through to resource/code
You deploy the workload IaC
The VNet is deployed, without any peering/routing configurations because that’s the job of AVNM
The workload components that rely on routing/peering fail and the deployment fails
Azure Policy runs some time later and then you can run your code.
Ick! You don’t want to code peering/routing if it’s being deployed by AVNM – you could end up with a mess when code runs and then AVNM runs and so on.
What do you do? AVNM has a very nice code structure under the covers. The AVNM resource is simple – all the configurations, the rules collections, and rules, and groups are defined as sub-resources. One could build the group membership using static membership and place the sub-resource with the workload code. That will mean that the app registration used by the pipeline will require rights in the central AVNM – that could be an issue because AVNM is supposed to be a governance tool.
Ideally, Azure Policy would trigger much faster than it does (not scientific, but it was taking 15-ish minutes in my tests) and update the group membership with less latency. Once the membership is updated, configurations are deployed nearly instantly – faster than I could measure it.
Routing Configuration
I like how AVNM has structured the configurations for Security Admin Rules and Routing Configurations. It reminds me of how Azure Firewall has handled things.
Rule Collection
A Routing Configuration is deployed to a scope. The configuration is like a bucket – it has little in the way of features – all that happens in the Routing Rule Collections. The configuration contains one or more Rule Collections. Each collection targets a specific group. So I could have three groups defined:
Production
Dev
Secure
Each would have a different set of routes, the rules, defined. I have only one deployment (the configuration) which automatically applies to the correct VNets/subnets based on the group memberships. If I am using dynamic group membership, I can use governance features like tags (which can be controlled from management groups, subscriptions, resource groups or at the resource level) for large-scale automation and control.
There are 3 kinds of Local Route Setting – this configures:
How many Route Tables are deployed per VNet and how they are associated
Whether or not a route to the prefix of the target resource is created
Local Route Setting | How Many Route Tables? | Association | Local_0 Route
None Specified | One per VNet | All subnets in the VNet | N/A
Direct Routing Within Virtual Network | One per VNet | All subnets in the VNet | Yes > VNet Prefix
Direct Routing Within Subnet | One per subnet | With the subnet | Yes > Subnet Prefix
Local Route Setting in a Rule Collection
The Route Tables are created in an AVNM-managed resource group in the target subscription. If you choose one of the “Direct …” Local Route Setting options then a UDR is created for the target prefix:
Setting | Address Prefix | Next Hop Type
Direct Routing Within Virtual Network | The VNet Prefix | Virtual Network
Direct Routing Within Subnet | The subnet prefix | Virtual Network
Using the Direct Routing options
The concept is that you can force routing to stay within the target VNet/Subnet if the destination is local, while routing via a different next hop when leaving the target. For example, force traffic to the local VNet via the firewall while staying in the subnet (Direct Routing Within Subnet). Note that the Default rules for VNet via Virtual Network are not deactivated by default, which you can see below – localRoute_0 is created by AVNM to implement the “Direct …” option.
You have the option to control BGP propagation – which is important when using a firewall to isolate site-to-site connections from your Azure services.
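To make the Local Route Setting behaviour concrete, here is a minimal Python sketch of how the “Direct …” options map to the localRoute_0 entry described above. The function name, the dictionary shape, and the sample prefixes are assumptions for illustration only; this is not how AVNM is implemented.

```python
from typing import Optional


def local_route(setting: str, vnet_prefix: str, subnet_prefix: str) -> Optional[dict]:
    """Return the local route implied by a Local Route Setting, or None.

    Hypothetical helper: it only mirrors the mapping described in the post.
    """
    if setting == "None Specified":
        return None  # no local route is created
    if setting == "Direct Routing Within Virtual Network":
        prefix = vnet_prefix
    elif setting == "Direct Routing Within Subnet":
        prefix = subnet_prefix
    else:
        raise ValueError(f"Unknown Local Route Setting: {setting}")
    return {
        "name": "localRoute_0",
        "address_prefix": prefix,
        "next_hop_type": "Virtual Network",
    }


# Sample prefixes are made up for the example.
print(local_route("Direct Routing Within Subnet", "10.1.0.0/16", "10.1.1.0/24"))
```

With “Direct Routing Within Subnet”, only the subnet prefix stays local, so traffic to the rest of the VNet can still be sent to a different next hop such as the firewall.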
Some Notes
AVNM isn’t meant to be the “I’ll manage all the routes centrally” solution. It manages what is important to the organisation – the governance of the network security model. You have the ability to edit routes in the resulting Route Table. So if I need to create custom routes for PaaS services or for a special network design then I can do that. The resulting Route Tables are just regular Azure Route Tables so I can add/edit/remove routes as I desire.
If you manually create a route in the Route Table and AVNM then tries to create a route to the same destination then AVNM will ignore the new rule – it’s a “what’s the point?” situation.
If someone updates an AVNM-managed rule then AVNM will not correct it until there is a change to the Rule Collection. I do not like this. I deem this to be a failure in the application of governance.
Pricing
This is the graveyard of AVNM. If you run Azure like a small business then you lump lots of workloads into a few subscriptions. If you, like I started doing years ago, have a “1 workload per subscription” model (just like in the Azure Cloud Adoption Framework) then AVNM is going to be pricey!
AVNM costs $0.10/subscription/hour. At 730 hours per average month, AVNM for a single subscription will cost $73/month. Let’s say that I have 100 workloads. That will cost me $7300/month! Azure Firewall Premium (compute only) costs $1277.50/month so how could some policy tool cost nearly 6 times more!?!?!
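The arithmetic above can be checked with a few lines of Python. The rates are the ones quoted in this post; the function name is just for illustration, and real pricing may differ by region or change over time.

```python
HOURLY_RATE_PER_SUBSCRIPTION = 0.10  # USD, as quoted above
HOURS_PER_MONTH = 730                # average month
FIREWALL_PREMIUM_MONTHLY = 1277.50   # USD, compute only, as quoted above


def avnm_monthly_cost(subscriptions: int) -> float:
    """Monthly AVNM cost in USD for a number of managed subscriptions."""
    return round(subscriptions * HOURLY_RATE_PER_SUBSCRIPTION * HOURS_PER_MONTH, 2)


cost = avnm_monthly_cost(100)
ratio = cost / FIREWALL_PREMIUM_MONTHLY
print(f"100 subscriptions: ${cost:,.2f}/month, {ratio:.1f}x Azure Firewall Premium")
```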
Quite honestly, I would have started to use AVNM last year for a customer when we wanted to roll out “NSG rules” to every subnet in Azure. I didn’t want to do an IaC edit and a DevOps pull request for every workload. That would have taken days/hours (and it did take days). I could have rolled out the change using AVNM in minutes. But the cost/benefit wasn’t worth it – so I spent days doing code and pull requests.
I hear it again and again. AVNM is not perfect, but it’s usable (feature improvements will come). But the pricing kills it before customer evaluation can even happen.
Conclusion
If a better triggering system for dynamic member Network Groups can be created then I think the routing solution is awesome. But with the pricing structure that is there today, the product is dead to me, which makes me sad. Come on Microsoft, don’t make me sad!
Have you wondered why an Azure subnet with no route table has so many default routes? What the heck is 25.176.0.0/13? Or what is 198.18.0.0/15? And why are they routing to None?
The Scenario
You have deployed a virtual machine. The virtual machine is connected to a subnet with no Route Table. You open the NIC of the VM and view Effective Routes. You expect to see a few routes for the RFC 1918 ranges (10.0.0.0/8, 172.16.0.0/12, etc) and “quad zero” (0.0.0.0/0) but instead you find this:
What in the nelly is all that? I know I was pretty freaked out when I first saw it some time ago. Here are the weird addresses in text, excluding quad zero and the virtual network prefix:
The first thing that you might notice is the next hop, which is set to None.
Remember that there is no “router” by default in Azure. The network is software-defined so routing is enacted by the Azure NIC/the fabric. When a packet is leaving the VM (and everything, including “serverless”, is a VM in the end unless it is physical) the Azure NIC figures out the next hop/route.
When traffic hits a NIC, the best route is selected. If that route has a next hop set to None then the traffic is dropped like it disappeared into a black hole. We can use this feature as a form of “firewall” – we don’t want the traffic so “Abracadabra – make it go away”.
A Microsoft page (and some googling) gives us some more clues.
RFC-1918 Private Addresses
We know these well-known addresses, even if we don’t necessarily know the RFC number:
10.0.0.0/8
172.16.0.0/12
192.168.0.0/16
These addresses are intended to be used privately. But why is traffic to them dropped? If your network doesn’t have a deliberate route to other address spaces then there is no reason to enable routing to them. So Azure takes a “secure by default” stance and drops the traffic.
Remember that if you do use a subset of one of those spaces in your VNet or peered VNets, then the default routes for those more specific prefixes will be selected ahead of the more general routes that drop the traffic.
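The “more specific prefix wins” behaviour is longest-prefix matching. The following Python sketch shows the idea using only the standard library. The route list and the 10.1.0.0/16 VNet prefix are assumptions for the example, and it deliberately ignores the other tie-breakers (such as user-defined routes taking precedence over system routes for the same prefix).

```python
import ipaddress

# Simplified route list: (prefix, next hop). The VNet prefix is hypothetical.
ROUTES = [
    ("10.0.0.0/8", "None"),
    ("172.16.0.0/12", "None"),
    ("192.168.0.0/16", "None"),
    ("10.1.0.0/16", "Virtual Network"),
    ("0.0.0.0/0", "Internet"),
]


def next_hop(destination: str, routes=ROUTES) -> str:
    """Pick the next hop of the matching route with the longest prefix."""
    address = ipaddress.ip_address(destination)
    matches = []
    for prefix, hop in routes:
        network = ipaddress.ip_network(prefix)
        if address in network:
            matches.append((network.prefixlen, hop))
    if not matches:
        raise LookupError(f"No route matches {destination}")
    # The most specific (longest) prefix wins.
    return max(matches, key=lambda match: match[0])[1]


for destination in ("10.1.2.4", "10.9.9.9", "8.8.8.8"):
    print(destination, "->", next_hop(destination))
```

An address inside the VNet matches both 10.0.0.0/8 and 10.1.0.0/16, and the /16 wins. Any other address in 10.0.0.0/8 only matches the general route and is dropped.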
RFC-6598 Carrier Grade NAT
The subnet, 100.64.0.0/10, is defined as being used for carrier-grade NAT. This block of addresses is specifically meant to be used by Internet service providers (or ISPs) that implement carrier-grade NAT, to connect their customer-premises equipment (CPE) to their core routers. Therefore we want nothing to do with it – so drop traffic to there.
Microsoft Prefixes
20.35.252.0/22 is registered in Redmond, Washington, the location of Microsoft HQ. Other prefixes in 20.235 are used by Exchange Online for the US Government. That might give us a clue … maybe Microsoft is firewalling sensitive online prefixes from Azure? It’s possible someone could hack a tenant, fire up lots of machines to act as bots and then attack sensitive online services that Microsoft operates. This kind of “route to None” approach would protect those prefixes unless someone took the time to override the routes.
104.146.0.0/17 is a block that is owned by Microsoft with a location registered as Boydton, Virginia, the home of the East US region. I do not know why it is dropped by default. The zone that resolves names is hosted on Azure Public DNS. It appears to be used by Office 365, maybe with sharepoint.com.
104.147.0.0/16 is also owned by Microsoft and is also registered in Boydton, Virginia. This prefix is even more mysterious.
Doing a google search for 157.59.0.0/16 on the Microsoft.com domain results in the fabled “google whack”: a single result with no adverts. That links to a whitepaper on Microsoft.com which is written in Russian. The single mention translates to “Redirecting MPI messages of the MyApp.exe application to the cluster subnet with addresses 157.59.x.x/255.255.0.0.” This address is also in Redmond.
23.103.0.0/18 has more clues in the public domain. This prefix appears to be split and used by different parts of Exchange Online, both public and US Government.
The following block is odd:
25.148.0.0/15
25.150.0.0/16
25.152.0.0/14
25.156.0.0/16
25.159.0.0/16
25.176.0.0/13
25.184.0.0/14
25.4.0.0/14
They are all registered to Microsoft in London and I can find nothing about them. But … I have a sneaky tin (aluminum) foil suspicion that I know what they are for.
40.108.0.0/17 and 40.109.0.0/16 both appear to be used by SharePoint Online and OneDrive.
Other Special Purpose Subnets
RFC-5735 specifies some prefixes so they are pretty well documented.
127.0.0.0/8 is the loopback address. The RFC says “addresses within the entire 127.0.0.0/8 block do not legitimately appear on any network anywhere” so it makes sense to drop this traffic.
198.18.0.0/15 “has been allocated for use in benchmark tests of network interconnect devices … Packets with source addresses from this range are not meant to be forwarded across the Internet”.
Adding User-Defined Routes (UDRs)
Something interesting happens if you start to play with User-Defined Routes. Add a table to the subnet. Now add a UDR:
Prefix: 0.0.0.0/0
Next Hop: Internet
When you check Effective Routes, the default route to 0.0.0.0/0 is deactivated (as expected) and the UDR takes over. All the other routes are still in place.
If you modify that UDR just a little, something different happens:
Prefix: 0.0.0.0/0
Next Hop: Virtual Appliance
Next Hop IP Address: {Firewall private IP address}
All the mysterious default routes are dropped. My guess is that the Microsoft logic is “This is a managed network – the customer put in a firewall and that will block the bad stuff”.
The magic appears only to happen if you use the prefix 0.0.0.0/0 – try a different prefix and all the default routes re-appear.
This post contains a script that will find all Azure Private DNS Zones in a tenant and export information on screen and as markdown in a file.
I found myself in a situation where I needed to document a lot of Azure Private DNS Zones. I needed the following information:
Name of the zone
Subscription name
Resource group name
Name of associated virtual networks
The list was long so a copy and paste from the Azure Portal was going to take too long. Instead, I put a few minutes into a script to do the job – it even writes the content as a Markdown table in a .md file, making it super simple to copy/paste the entire piece of text into my documentation in VS Code.
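The Markdown step is easy to picture. The sketch below is not the original script; it is a minimal Python illustration that assumes the zone details have already been collected into dictionaries (the field names and the sample data are made up) and renders them as a Markdown table.

```python
from typing import Iterable, Mapping


def to_markdown_table(zones: Iterable[Mapping]) -> str:
    """Render zone records as a Markdown table, one row per zone."""
    lines = [
        "| Zone | Subscription | Resource Group | Virtual Networks |",
        "| --- | --- | --- | --- |",
    ]
    for zone in zones:
        # Zones with no linked virtual networks get a placeholder.
        vnets = ", ".join(zone["virtual_networks"]) or "(none)"
        lines.append(
            f"| {zone['name']} | {zone['subscription']} "
            f"| {zone['resource_group']} | {vnets} |"
        )
    return "\n".join(lines)


# Hypothetical sample records, standing in for data queried from Azure.
SAMPLE_ZONES = [
    {
        "name": "privatelink.blob.core.windows.net",
        "subscription": "sub-hub",
        "resource_group": "rg-dns",
        "virtual_networks": ["vnet-hub", "vnet-spoke1"],
    },
    {
        "name": "privatelink.database.windows.net",
        "subscription": "sub-hub",
        "resource_group": "rg-dns",
        "virtual_networks": [],
    },
]

print(to_markdown_table(SAMPLE_ZONES))
```

Writing the returned string to a .md file gives a table that can be pasted straight into documentation.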
In this post, I want to discuss how one should design network security in Microsoft Azure, dispensing with past patterns and combatting threats that are crippling businesses today.
The Past
Network security did not change much for a very long time. The classic network design is focused on an edge firewall. “All the bad guys are trying to penetrate our network from the Internet” so we’ll put up a very strong wall at the edge. With that approach, you’ll commonly find the “DMZ” network; a place where things like web proxies and DNS proxies isolate interior users and services from the Internet.
The internal network might be made up of two/more VLANs. For example, one or more client device VLANs and a server VLAN. While the route between those VLANs might pass through the firewall, it probably didn’t; they really “routed” through a smart core switch stack and there was limited to no firewall isolation between those VLANs.
This network design is fertile soil for malware. Ports usually are not left open to attack on the edge firewall. Hackers aren’t normally going to brute force their way through a firewall. There are easier ways in such as:
Send an “invoice” PDF to the accounting department that delivers a trojan horse.
Impersonate someone, ideally someone that travels and shouts a lot, to convince a helpful IT person to reset a password.
Target users via phishing or spear phishing.
Compromise some upstream include that developers use and use it to attack from the servers.
Use a SQL injection attack to open a command prompt on an internal server.
And on and on and …
In each of those cases, the attack comes from within. The spread of the blast (the attack) is unfettered. The blast area (a term used to describe the spread of an attack) is the entire network.
Secure Zones To The Rescue!
Government agencies love a nice secure zone architecture. This is a design where sensitive systems, such as GDPR data or secrets, are stored on an isolated network.
Some agencies will even create a whole duplicate network that is isolated, forcing users to have two PCs – one “regular” one on the Internet-connected network and a “secure” PC that is wired onto an isolated network with limited secret services.
Realistically, that isolated network is of little value to most, but if you have that extreme a need – then good luck. By the way, that won’t work in The Cloud 🙂 Back to the more regular secure zone …
A special VLAN will be deployed and firewall rules will block all traffic into and out of that secure zone. The user experience might be to use Citrix desktops, hosted in the secure zone, to access services and data in that secure zone. But then reality starts cracking holes in the firewall’s deny all rules. No line of business app lives alone. They all require data from somewhere. Or there are integrations. Printers must be used. Scanners need to scan and share data. And legacy apps often use:
Domain (ADDS) credentials (how many ports do you need for that!!!)
SMB (TCP 445) for data transfer and integration
Over time, “deny all” becomes a long list of allow * from X to *, and so on, with absolutely no help from the app vendors.
The theory is that if an attack is commenced, then the blast area will be limited to the client network and, if it reaches the servers, it will be limited to the Internal network. But this design fails to understand that:
An attack can come from within. Consider the scenario where compromised runtimes are used or a SQL injection attack breaks out from a database server.
All the required integrations open up holes between the secure zone and the other networks, including those legacy protocols that things like ransomware live on.
If one workload in the secure zone is compromised, they all are because there is no network segmentation inside of the VLAN.
And eventually, the “secure zone” is no more secure than the Internal network.
Don’t Block The Internet!!!
I’m amazed how many organisations do not block outbound access to the Internet. It’s just such hard work to open up firewall rules for all these applications that have Internet dependencies. I can understand that for a client VLAN. But the server VLAN should be a controlled space – if it’s not known & controlled (i.e. governed) then it should not be permitted.
A modern attack, an advanced persistent threat (APT), isn’t just some dumb blast, grab, and run. It is a sneaky process of:
Penetration
Discovery, often manually controlled
Spread, often manually controlled
Steal
Destroy/encrypt/etc
Once an APT gets in, it usually wants to call home to pull instructions down from a rogue IP address or compromised bot. When the APT wants to steal data, to be used as blackmail and/or to be sold on the Darknet, the malware will seek to upload data to the Internet. Both of these actions are taking advantage of the all-too-common open access to the Internet.
Azure is Different
Years of working with clients has taught me that there are three kinds of people when it comes to Azure networking:
Those who managed on-premises networks: These folks struggle with Azure networking.
Those who didn’t do on-premises networking, but knew what to ask for: These folks take to Azure networking quite quickly.
Everyone else: Irrelevant to this topic
What makes Azure networking so difficult for the network admins? There is no cabling in the fabric – obviously there is cabling in the data centres but it’s all abstracted by the VXLAN software-defined networks. Packets are encapsulated on the source virtual machine’s host, transmitted over the physical network, decapsulated on the destination virtual machine host, and presented to the destination virtual machine’s NIC. In short, packets leave the source NIC and magically arrive on the destination NIC with no hops in between – this is why traceroute is pointless in Azure and why the default gateway doesn’t really exist.
“I’m not going to use virtual machines, Aidan. I’m doing PaaS and serverless computing.” In Azure, everything is based on virtual machines, unless they are explicitly hosted on physical hosts (Azure VMware Services and some SAP stuff, for example). Even Functions run on a VM somewhere hidden in the platform. Serverless means that you don’t need to manage it.
The software-defined thing is why:
Partitioned subnets for a firewall appliance (front, back, VPN, and management) offer nothing from a security perspective in Azure.
ICMP isn’t as useful as you’d imagine in Azure.
The concept of partitioning workloads for security using subnets is not as useful as you might think – it’s actually counter-productive over time.
Transformation
I like to remind people during a presentation or a project kickoff that going on a cloud journey is supposed to result in transformation. You now re-evaluate everything and find better ways to do old things using cloud-native concepts. And that applies to network security designs too.
Micro-Segmentation Is The Word
Forget “Greece”, get on board with what you need to counter today’s threats: micro-segmentation. This is a concept where:
We protect the edge, inbound and outbound, permitting only required traffic.
We apply network isolation within the workload, permitting only required traffic.
We route traffic between workloads through the edge firewall, permitting only required traffic.
Yes, more work will be required when you migrate existing workloads to Azure. I’d suggest using Azure Migrate to map network flows. I never get to do that – I always get the “messy migration projects” and I never get to use Azure Migrate – so testing and accessing and understanding NSG Traffic Analytics and the Azure Firewall/firewall logs via KQL is a necessary skill.
Security Classification
Every workload should go through a security classification process. You need to weigh risk versus complexity. If you max the security, you will increase costs and difficulty for otherwise simple operations. For example, a dev won’t be able to connect Visual Studio straight to an App Service if you deploy that App Service on a private or isolated App Service Plan. You also will have to host your own DevOps agents/GitHub runners because the Microsoft-hosted containers won’t be able to reach your SCM endpoints.
Every piece of compute is a potential attack vector: a VM, an App Service, a Function, a Container, a Logic App. The question is, if it is compromised, will the attacker be able to jump to something else? Will the data that is accessible be secret, subject to regulation, or a source of reputational damage?
This measurement process will determine if a workload should use resources that:
Have public endpoints (cheapest and easiest).
Use private endpoints (medium levels of cost, complexity, and security).
Use full VNet integration, such as an App Service Environment or a virtual machine (highest cost/complexity but most secure).
The Virtual Network & Subnet
Imagine you are building a 3-tier workload that will be isolated from the Internet using Azure virtual networking:
Web servers on the Internet
Middle tier
Databases
Not that long ago, we would have deployed that workload on 3 subnets, one for each tier. Then we would have built isolation using Network Security Groups (NSGs), one for each subnet. But you just learned that a software-defined network routes packets directly from NIC to NIC. An NSG is a Hyper-V Port ACL that is implemented at the NIC, even if applied at the subnet level. We can create all the isolation we want using an NSG within the subnet. That means we can flatten the network design for the workload to one subnet. A subnet-associated NSG will restrict communications between the tiers – and ideally between nodes within the same tier. That level of isolation should block everything … should 🙂
Tips for virtual networks and subnets:
Deploy 1 virtual network per workload: Not only will this follow Azure Cloud Adoption Framework concepts, but it will help your overall security and governance design. Each workload is placed into a spoke virtual network and peered with a hub. The hub is used only for external connectivity, the firewall, and Azure Bastion (assuming this is not a vWAN hub).
Assign a single prefix to your hub & spoke: Firewall and NSG rules will be easier.
Keep the virtual networks small: Don’t waste your address space.
Flatten your subnets: Only deploy subnets when there is a technical need, for example VMs and private endpoints are in one subnet, VNet integration for an App Services plan is in another, and a SQL managed instance is in a third.
Resource Firewalls
It’s sad to see how many people disable operating system firewalls. For example, Group Policy is used to disable Windows Firewall. Don’t you know that Microsoft and Linux added those firewalls to protect machines from internal attacks? Those firewalls should remain operational and only permit required traffic.
Many Azure resources also offer firewalls. App Services have firewalls. Azure SQL has a firewall. Use them! The one messy resource is the storage account. The location of the endpoints for storage clusters is in a weird place – and this causes interesting situations. For example, a Logic App’s storage account with a configured firewall will prevent workflows from being created/working correctly.
Network Security Groups
Take a look at the default inbound rules in an NSG. You’ll find there is a Deny All rule which is the lowest possible priority. Just up from that rule, is a built in rule to allow traffic from VirtualNetwork. VirtualNetwork includes the subnet, the virtual network, and all routed networks, including peers and site-to-site connections. So all traffic from internal networks is … permitted! This is why every NSG that I create has a custom DenyAll rule with a priority of 4000. Higher priority rules are created to permit required traffic and only that required traffic.
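The effect of that custom DenyAll rule is easier to see with a small model of priority-ordered evaluation. This Python sketch is a deliberate simplification (it matches only on a source label and a port, and omits the default load balancer rule). The rule named AllowSmbFromWeb and the “web-tier” label are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int                  # lower number = evaluated first
    action: str                    # "Allow" or "Deny"
    source: Optional[str] = None   # None means any source
    port: Optional[int] = None     # None means any port


def evaluate(rules, source_tags, port) -> Rule:
    """Return the first rule, in priority order, that matches the flow."""
    for rule in sorted(rules, key=lambda r: r.priority):
        source_ok = rule.source is None or rule.source in source_tags
        port_ok = rule.port is None or rule.port == port
        if source_ok and port_ok:
            return rule
    raise LookupError("No rule matched")


# Two of the default inbound rules (simplified).
DEFAULT_RULES = [
    Rule("AllowVnetInBound", 65000, "Allow", source="VirtualNetwork"),
    Rule("DenyAllInBound", 65500, "Deny"),
]

# Custom rules: permit only what is required, then deny at priority 4000.
CUSTOM_RULES = [
    Rule("AllowSmbFromWeb", 100, "Allow", source="web-tier", port=445),
    Rule("DenyAll", 4000, "Deny"),
]

# An internal flow from the (hypothetical) web tier.
flow_tags = {"VirtualNetwork", "web-tier"}

print(evaluate(DEFAULT_RULES, flow_tags, 3389).name)                 # defaults only
print(evaluate(DEFAULT_RULES + CUSTOM_RULES, flow_tags, 3389).name)  # with custom rules
print(evaluate(DEFAULT_RULES + CUSTOM_RULES, flow_tags, 445).name)
```

With only the defaults, internal RDP traffic is caught by the allow rule for VirtualNetwork. Once the custom DenyAll at 4000 is present, it matches before the default allow rule is ever reached, and only the explicitly permitted SMB flow gets through.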
Tips with your NSGs:
Use 1 NSG per subnet: Where the subnet resources will support an NSG. You will reduce your overall complexity and make troubleshooting easier. Remember, all NSG rules are actually applied at the source (outbound rules) or target (inbound rules) NIC.
Limit the use of “any”: Rules should be as accurate as possible. For example: Allow TCP 445 from source A to destination B.
Consider the use of Application Security Groups: You can abstract IP addresses with an Application Security Group (ASG) in an NSG rule. ASGs can be used with NICs – virtual machines and private endpoints.
Enable NSG Flow Logs & Traffic Analytics: Great for troubleshooting networking (not just firewall stuff) and for feeding data to a SIEM. VNet Flow Logs will be a superior replacement when it is ready for GA.
Makes connections using site-to-site networking using SD-WAN, VPN, and/or ExpressRoute.
Hosts the firewall. The firewall blocks everything in every direction by default.
Hosts Azure Bastion, unless you are running Azure Virtual WAN – then deploy it to a spoke.
Is the “Public IP” for egress traffic for workloads trying to reach the Internet. All egress traffic is via the firewall. Azure Policy should be used to restrict Public IP Addresses to just those resources that require it – things like Azure Bastion require a public IP and you should create a policy override for each required resource ID.
My preference is to use Azure Firewall. That’s a long conversation so let’s move on to another topic; Azure Bastion.
Most folks will go into Azure thinking that they will RDP/SSH straight to their VMs. RDP and SSH are not perfect. This is something that the secure zone concept recognised. It was not unusual for admins/operators to use a bastion host to hop via RDP or SSH from their PC to the required server via another server. RDP/SSH were not open directly to the protected machines.
Azure Bastion should offer the same isolation. Your NSG rules should only permit RDP/SSH from:
The AzureBastionSubnet
Any other bastion hosts that might be employed, typically by developers who will deploy specialist tools.
Azure Bastion requires:
An Entra ID sign-in, ideally protected by features such as conditional access and MFA, to access the bastion service.
The destination machine’s credentials.
Routing
Now we get to one of my favourite topics in Azure. In the on-prem world we can control how packets get from A to B using cables. But as you’ve learned, we can’t run cables in Azure. But we can control the next hop of a packet.
We want to control flows:
Ingress from site-to-site networking to flow through the hub firewall: A route in the GatewaySubnet to use the hub firewall as the next hop.
All traffic leaving a spoke (workload virtual network) to flow through the hub firewall: A route to 0.0.0.0/0 using the firewall backend/private IP as the next hop.
All traffic between hub & spokes to flow through the remote hub firewall: A route to the remote hub & spoke IP prefix (see above tip) with a next hop of the remote hub firewall.
If you follow my tips, especially with the simple hub, then the routing is actually quite easy to implement and maintain.
Tips:
Keep the hub free of compute.
NSG Traffic Analytics helps to troubleshoot.
Web Application Firewall
The hub firewall should not be used to present web applications to the Internet. If a web app is classified as requiring network security, then it should be reverse proxied using a Web Application Firewall (WAF). This specialised firewall inspects traffic at the application layer and can block threats.
The WAF will have a lot of false positives. Heavy traffic applications can produce a lot of false positives in your logs; in the case of Log Analytics, the ingestion charge can be huge so get to optimising those false positives as quickly as you can.
My preference is to route the WAF through the hub firewall to the backend applications. The WAF is a form of compute, even the Azure WAF. If you do not need end-to-end TLS, then the firewall could be used to inspect the HTTP traffic from the WAF to the backend using Intrusion Detection and Prevention System (IDPS), offering another layer of protection.
Azure offers a couple of WAF options. Front Door with WAF is architecturally interesting, but the default design is that the backend has a public endpoint that limits access to your Front Door instance at the application layer. What if the backend is network connected for max protection? Then you get into complexities with Private Link/Private Endpoint.
A regional WAF is network connected and offers simpler networking, but it sacrifices the performance boosts from Front Door. You can combine Front Door with a regional WAF, but there are more costs with this.
Third party solutions are possible. Services such as Cloudflare offer performance and security features. One could argue that Cloudflare offers more features. From the performance perspective, keep in mind that Cloudflare has only a few peering locations with the Microsoft WAN, so a remote user might have to take a detour to get to your Azure resources, increasing latency.
You can seek out WAF solutions from the likes of F5 and Citrix in the Azure Marketplace. Keep in mind that NVAs can perpetuate skills challenges by siloing the skill – native cloud skills are easier to develop and contract/hire.
Summary
I was going to type something like “this post gives you a quick tour of the micro-segmentation approach/features that you can use in Azure” but then I realised that I’ve had keyboard diarrhea and this post is quite Sinofskian. What I’ve tried to explain is that the ways of the past:
Don’t do much for security anymore
Are actually more complex in architecture than Azure-native patterns and solutions that will work.
If you implement security at three layers, assuming that a breach will happen and could happen anywhere, then you limit the blast area of a threat:
The edge, using the firewall and a WAF
The NIC, using a Network Security Group
The resource, using a guest OS/resource firewall
This trust-no-one approach that denies all but the minimum required traffic will make life much harder for an attacker. Including logging and the use of a well configured SIEM will create trip wires that an attacker must trip over to attempt an expansion. You will make their expansion harder & slower, and make it easier to detect them. You will also limit how much they can spread and how much damage the attack can create. Furthermore, you will be following the guidance that the likes of the FBI are recommending.
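To make the "deny all but the minimum" idea concrete, here is a tiny rule evaluator in Python. It is only an illustrative model, not how an NSG or firewall is actually implemented: the Rule class, the sample rule, and the addresses are all made up for this sketch. The one convention borrowed from NSGs is that lower priority numbers are evaluated first.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    priority: int   # lower number = evaluated first (NSG convention)
    source: str     # source prefix in CIDR notation
    port: int       # destination port
    allow: bool     # True = allow, False = deny


def is_allowed(rules, source_ip, port):
    """Return the verdict of the first matching rule, or deny if none match."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if ip_address(source_ip) in ip_network(rule.source) and port == rule.port:
            return rule.allow
    return False  # nothing matched: deny by default


# Hypothetical rule set: only the WAF subnet may reach the web tier on 443.
rules = [Rule(priority=100, source="10.0.1.0/24", port=443, allow=True)]

print(is_allowed(rules, "10.0.1.10", 443))   # WAF subnet to HTTPS
print(is_allowed(rules, "10.0.2.10", 443))   # some other subnet to HTTPS
print(is_allowed(rules, "10.0.1.10", 3389))  # WAF subnet to RDP
```

Only the first flow is allowed. Everything that is not explicitly listed falls through to the final deny, which is the behaviour you want at the edge, the NIC, and the resource.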
There is so much more to consider when it comes to security, but I’ve focused on micro-segmentation in a network context. People do think about Entra ID and management solutions (such as Defender for Cloud and/or SIEM) but they rarely think through the network design by assuming that what they did on-prem will still be fine. It won’t because on-prem isn’t fine right now! So take my advice, transform your network, and protect your assets, shareholders, and your career.
This post will explain how you can connect your Azure network(s) with Oracle Cloud Infrastructure (OCI) via the Oracle Cloud Interconnect.
Background
Many mid-large organisations run applications that are based on Oracle software. When these organisations move to the cloud, they may choose to use Oracle Cloud for their Oracle workloads and Azure for everything else.
But that raises some interesting questions:
How do we connect Azure workloads to Oracle workloads?
If Oracle is hosting data services, how do we minimise latency?
The answer is the Oracle Cloud Interconnect.
Microsoft and Oracle are inter-connected via their respective private “site-to-site” connection mechanisms:
Azure: ExpressRoute
Oracle: FastConnect
This is achieved by both service providers sharing a “meet me” location where each cloud’s edge networks allow a “cross-connection”. So, there is no need to contact an ISP to lease an ExpressRoute circuit. The circuit already exists. There is no need to sign a circuit contract. The ISP is “Oracle” and you pay for the usage of it – in the case of Azure by paying for the ExpressRoute circuit Azure resource.
Location, Location, Location
The inter-connect mechanism will obviously play a role in where you can deploy your ExpressRoute Circuit and FastConnect resource. But performance also comes into play here – latency must be kept to a minimum. As a result, there is a support restriction on which Azure/Oracle regions can be inter-connected and where the circuit must be terminated.
Let’s imagine that we are using OCI Amsterdam. If we want to connect Azure to it then we must use Azure West Europe.
Now, what about keeping that latency low? The trick there is in selecting a Peering Location that is close by. Note that the Oracle docs do a better job at defining the Azure peering location (see under Availability).
In my scenario, the peering location would be Amsterdam2. According to Microsoft:
Connectivity is only possible where an Azure ExpressRoute peering location is in proximity to or in the same peering location as the OCI FastConnect.
That means you must always keep the following close to be able to use this solution:
The Oracle Cloud Infrastructure region
The Azure region
The peering location of the ExpressRoute circuit & FastConnect circuit
Configuring ExpressRoute
You have a few options to decide between. The first is the SKU of ExpressRoute that you will choose.
Type | Billing | Connections
Local | Unlimited | 1 or 2 Azure regions in the same metro as the peering location.
Standard | Metered or Unlimited | Up to 10 connections in the same geo zone as the peering location.
You also have to choose one of the supported speeds for this solution: 1, 2, 5, or 10 Gbps.
The ISP will be Oracle Cloud FastConnect.
So do you choose Local or Standard? I think that really comes down to balancing the cost. Local has unlimited data transfer but it is billed based on bandwidth. The entry cost per month in Zone 1 is €1,111.27/month with 1 Gbps and unlimited data transfer.
The entry point for a Standard metered plan is €403.76/month. That is €707.51 cheaper than the Local SKU but that savings has to cover your outbound data transfer cost in Azure. At €0.024/GB, that leaves you with (707.51/0.024) 29,479 GB of outbound data transfer per month until the Local SKU is more affordable.
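If you want to rerun that sum with your own numbers, here is a minimal sketch in Python. The three prices are the ones quoted above; the function names are just ones I made up for the example, and you should swap in current pricing for your zone and currency.

```python
LOCAL_MONTHLY_EUR = 1111.27     # Local SKU, 1 Gbps, unlimited data (price quoted above)
STANDARD_METERED_EUR = 403.76   # Standard metered entry price (quoted above)
EGRESS_EUR_PER_GB = 0.024       # outbound data transfer price (quoted above)


def break_even_gb(local=LOCAL_MONTHLY_EUR,
                  standard=STANDARD_METERED_EUR,
                  per_gb=EGRESS_EUR_PER_GB):
    """Outbound GB per month at which Standard metered costs the same as Local."""
    return (local - standard) / per_gb


def cheaper_sku(outbound_gb):
    """Name the cheaper SKU for a given monthly outbound volume."""
    standard_total = STANDARD_METERED_EUR + outbound_gb * EGRESS_EUR_PER_GB
    return "Standard (metered)" if standard_total < LOCAL_MONTHLY_EUR else "Local"


print(f"Break-even: {break_even_gb():,.0f} GB of outbound data per month")
print("10 TB per month:", cheaper_sku(10_000))
print("40 TB per month:", cheaper_sku(40_000))
```

Below the break-even volume the metered Standard SKU wins; above it, Local wins.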
The safe tip here is to choose Local, monitor data usage, and consider jumping to Standard if you are using a small enough amount of outbound data transfer to make the metered Standard SKU more affordable.
Note that you can upgrade from Local but you cannot downgrade to Local.
Getting Connected (From Azure)
I’ll talk about the Azure side of things because that’s what I know. I will cover a little bit about Oracle, from what I have learned.
You will need an ExpressRoute Gateway in the selected Azure region. Then you will create an ExpressRoute Circuit in the same region:
Retrieve the service key and then continue the process in the OCI portal. There is one screen that is very confusing: configuring the BGP addresses.
You are going to need two /30 prefixes that are not used in your OCI/Azure networks. I’m going to use 192.168.0.0/30 and 192.168.0.4/30 for my example. You need two prefixes because Azure and Oracle are running highly available resources under the covers. The ExpressRoute Gateway is two active/active compute instances. Each will require an IP address to advertise/receive address prefixes via BGP from the OCI gateway, and vice versa.
What addresses do you need? Oracle requires you to enter:
Customer (Azure) BGP IP Address 1
Oracle BGP IP Address 1
Customer (Azure) BGP IP Address 2
Oracle BGP IP Address 2
Here’s how you calculate them:
Customer (Azure) BGP IP Address 1: Usable IP #2 from Prefix 1
Oracle BGP IP Address 1: Usable IP #1 from Prefix 1.
Customer (Azure) BGP IP Address 2: Usable IP #2 from Prefix 2
Oracle BGP IP Address 2: Usable IP #1 from Prefix 2
The below is not the final answer yet! But we’re getting there. That would lead us to calculating:
Customer BGP IP Address 1: 192.168.0.2
Oracle BGP IP Address 1: 192.168.0.1
Customer BGP IP Address 2: 192.168.0.6
Oracle BGP IP Address 2: 192.168.0.5
But the Oracle GUI has an illogical check and will tell you that those addresses are wrong. They are correct – it’s just the Oracle GUI is broken by design! Here is what you need to enter:
Customer BGP IP Address 1: 192.168.0.2/30
Oracle BGP IP Address 1: 192.168.0.1/30
Customer BGP IP Address 2: 192.168.0.6/30
Oracle BGP IP Address 2: 192.168.0.5/30
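If you would rather not do the subnet maths by hand, the calculation above can be sketched with Python's standard ipaddress module. The bgp_pair function is a name I made up for this example; the prefixes are the ones from my example, and the output uses the address/length form that the Oracle GUI accepts.

```python
from ipaddress import ip_network


def bgp_pair(prefix):
    """Return (customer, oracle) BGP addresses for one /30, in 'address/length' form."""
    net = ip_network(prefix)
    if net.prefixlen != 30:
        raise ValueError("expected a /30 prefix")
    first, second = list(net.hosts())  # a /30 has exactly two usable addresses
    # Oracle takes usable IP #1 and the customer (Azure) takes usable IP #2.
    return f"{second}/{net.prefixlen}", f"{first}/{net.prefixlen}"


for n, prefix in enumerate(["192.168.0.0/30", "192.168.0.4/30"], start=1):
    customer, oracle = bgp_pair(prefix)
    print(f"Customer (Azure) BGP IP Address {n}: {customer}")
    print(f"Oracle BGP IP Address {n}: {oracle}")
```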
You finish the process and wait a little bit. The ExpressRoute circuit will eventually change status to Provisioned. Now you can create a connection between the circuit and the ExpressRoute Gateway. When I did it, the Private Peering was automatically configured, using 192.168.0.0/30 and 192.168.0.4/30 as the peering subnets.
Check your ARP records and route tables in the circuit (under Private Peering) and you should see that Oracle has propagated its known addresses to your Azure ExpressRoute Gateway, and on to any subnets that are not blocking propagation from the gateway.
And that’s it!
Other Support Things
The following Oracle services are supported:
E-Business Suite
JD Edwards EnterpriseOne
PeopleSoft
Oracle Retail applications
Oracle Hyperion Financial Management
Naturally, your OCI and Azure networks must not have overlapping prefixes.
You can do transitive routing. For example, you can route through the interconnect to an Oracle network and then on to a peered Oracle network (a hub and spoke).
You cannot use the interconnect to route to on-premises from Azure or from OCI.
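The no-overlap requirement above is easy to check before you order anything. Here is a minimal sketch using Python's standard ipaddress module; the address plans are hypothetical and find_overlaps is just a helper name for this example.

```python
from ipaddress import ip_network
from itertools import product


def find_overlaps(azure_prefixes, oci_prefixes):
    """Return every (azure, oci) pair of prefixes that overlap."""
    return [
        (a, o)
        for a, o in product(azure_prefixes, oci_prefixes)
        if ip_network(a).overlaps(ip_network(o))
    ]


# Hypothetical address plans, for illustration only.
azure = ["10.10.0.0/16", "10.20.0.0/16"]
oci = ["10.20.4.0/24", "172.16.0.0/16"]

for a, o in find_overlaps(azure, oci):
    print(f"Overlap: Azure {a} and OCI {o}")
```

An empty result means the two address plans can be connected without readdressing.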
In this Festive Tech Calendar post, I am going to explain how to get Private Endpoints working in the real world.
Thank you to the team that runs Festive Tech Calendar every year for the work that they do and for raising funds for worthy causes.
Private Endpoints
When The Cloud was first envisioned, it was made as a platform that didn’t really take network security seriously. The resources that developers want to use, Platform-as-a-Service (PaaS), were built to only have public endpoints. In the case of Microsoft Azure, if I deploy an App Service Plan, the compute that is provisioned for me shares public IP address(es) with plans from other tenants. The App Service Plan is accessible directly on the Internet – that’s even true when you enable “firewall rules” in an App Service because those rules only control what HTTP/S requests will be responded to, so raw TCP connections (zero day attacks) are still possible.
If I want to protect that App Service Plan I need to make it truly private by connecting it to a virtual network, using a private IP address, and maybe placing a Web Application Firewall in the flow of the client connection.
The purpose of Private Endpoint is to alter the IP address that is used to connect to a platform resource. The public endpoint is, preferably, disabled for inbound connections and clients are redirected to a private IP address.
When we enable a Private Endpoint for a PaaS resource, a Private Endpoint resource is added and a NIC is created. The NIC is connected to a subnet in a virtual network and obtains or is supplied with an IP address for that subnet. All client connections will be via that private IP address. And this is where it all goes wrong in the real world.
If I browse myapp.azurewebsites.net my PC will resolve that name to the public endpoint IP address – even after I have implemented a Private Endpoint. That means that I have to redirect my client to the new IP address. Nothing on The Internet knows that private IP address mapping. The only way to map the FQDN of the App Service to the private endpoint is to use Private DNS.
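A quick way to see which answer a client is really getting is to resolve the name and check whether the address is private. Here is a minimal sketch in Python using only the standard library. The function names are mine, and the check leans on the ipaddress module's own definition of "private", which covers more than just the 10.x, 172.16.x, and 192.168.x ranges.

```python
import socket
from ipaddress import ip_address


def is_private_answer(ip):
    """True if Python's ipaddress module classifies the address as private (for example 10.0.0.0/8)."""
    return ip_address(ip).is_private


def check_fqdn(fqdn):
    """Resolve an FQDN with this machine's configured DNS and classify the answer."""
    ip = socket.gethostbyname(fqdn)
    kind = "private endpoint" if is_private_answer(ip) else "public endpoint"
    return ip, kind
```

Running check_fqdn("myapp.azurewebsites.net") (a placeholder name) from a VM in the virtual network should report a private address once DNS is set up correctly. If it still reports a public address then the client is not using your Private DNS.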
You might remember this phrase for troubleshooting on-premises networks: “it’s always DNS”. In Azure, “it’s always routing, then it’s always DNS”, but the DNS part is what we need to figure out, not just for this App Service but for all workloads/resource types.
The Problems
There are three main issues:
Microsoft Documentation
Developers don’t do infrastructure
Who does DNS in The Cloud?
Microsoft Documentation
The documentation for Private Endpoint ranges from excellent to awful. That variance depends on the team/resource type that is covered by the documentation. Each resource team is responsible for their own implementation/documentation. And that means some documentation is good and clear, while some documentation should never have made it past a pull request.
The documentation on how to use Private Endpoint focuses on single workloads. You’ll find the same is true in the certification exams on Microsoft networking. In the real world, we have many workloads. Clients need to access those workloads over virtual networks. Those workloads integrate with each other, and that means that they must also resolve each other’s names. This name resolution must work for resources inside of individual workloads, for workload-to-workload communications, and on-premises clients-to-workload communications. You can eventually figure out how to do this from Microsoft documentation but, in my experience, many organisations give up during this journey and assume that Private Endpoint does not work.
Developers Don’t Do Infrastructure
Imagine asking a developer to figure out virtual networks and subnetting! OK, let’s assume you have reimagined IT processes and structures (like you are supposed to) and have all that figured out.
Now you are going to ask a developer to understand how DNS works. In the real world, most devs know their market verticals, language(s) and (quite complex) IDE toolset, and everything else is not important. I’ve had the pleasure of talking devs through running NSLOOKUP (something we IT pros often consider simple) and I basically ran a mini-class.
Assuming that a dev knows how DNS works and should be architected in The Cloud is a path to failure.
Who Does DNS In The Cloud?
I have lost track of how many cloud journeys I have been a part of, either from the start or where I joined a struggling project. A common wish for many of those customers is that they won’t run any virtual machines (some organisations even “ban” VMs) – I usually laugh and promise them some VMs later. Their DNS is usually based on Windows Server/Active Directory and with no VMs in their future, they assume that they don’t need any DNS system.
If there is no DNS architecture, then how will a system, such as Private Endpoint, work?
A Working Architecture
I’m going to jump straight to a working architecture. I’ll start with a high-level design and then talk about some of the low-level design options.
This design works. It might not be exactly what you require but simple changes can be made for specific scenarios.
High-Level Design
Private DNS Zones are created for each resource type and service type in that resource that a Private Endpoint is deployed for. Those zones are deployed centrally and are associated with a virtual network/subnet that will be dedicated to a DNS service.
The DNS service of your choice will be deployed to the DNS virtual network/subnet. Forwarders will be configured on that DNS service to point to the “magic Azure virtual IP address” 168.63.129.16. That is an address that is dedicated to Azure services – if you send DNS requests to it then they will be handled by:
Azure Private DNS zones, looking for a matching zone/record
Azure DNS, which can resolve Azure Public DNS Zones or resolve Internet requests – ah you don’t need proxy DNS servers in a DMZ now because Azure becomes that proxy DNS server!
Depending on the detailed design, your DNS servers can also resolve on-premises records to enable Azure-to-on-premises connections – important for migration windows while services exist in two locations, connections to partners via private connections, and when some services will stay on-premises.
All other virtual networks in your deployment (my design assumes you have a hub & spoke for a mid/large scale deployment) will have custom DNS servers configured to point at the DNS servers in the DNS Workload.
One interesting option here is Azure Firewall in the hub. If you want to enable FQDNs in Network Rules then you will:
Enable DNS Proxy mode in the Azure Firewall.
Configure the DNS server IP addresses in the Azure Firewall.
Use the private IP address of the Azure Firewall (a HA resource type) as your DNS server in the virtual networks.
Low-Level Design
There are different options for your DNS servers:
Azure Private DNS Resolver
Active Directory Domain Services (ADDS) Domain Controllers
Simple DNS Servers
In an ideal world, you would choose Azure Private DNS Resolver. This is a pure PaaS resource that can be managed as code – remember “VMs are banned”. You can forward to Azure Private DNS Zones and forward to on-premises/remote DNS servers. Unfortunately, Azure Private DNS Resolver is a relatively expensive resource and the design and requirements are complex. I haven’t really used Azure Private DNS Resolver in the real world so I cannot comment on compatibility with complex on-premises DNS architectures, but I can imagine there being issues with organisations such as universities where every DNS technology known to mankind since the early 1990’s is probably employed.
Most of the customers that I have worked with have opted to use Domain Controllers (DCs) in Azure as their DNS servers. The DCs store all the on-premises AD-integrated zones and can resolve records independently of on-premises DNS servers. The interface is familiar to Windows admins and easily configured and managed. This increases usability and compatibility. If you choose a modest B-series SKU then the cost will be quite a bit lower than Azure Private DNS Resolver. You’ll also have an ADDS presence in Azure, enabling legacy workloads to use their required authentication/authorisation methods.
The third option is to just use a simple Windows or Linux VM as the DNS server. This is a good choice where ADDS is not required or where Linux DNS is required.
The Private Endpoint
I mentioned that a Private Endpoint/NIC combination would be deployed for each resource/service type that requires private connectivity. For example, a Storage Account can have blob, table, queue, web, file, dfs, afs, and disks services. We need to be able to redirect the client to the specific service – that means creating a DNS record in the correct Azure Private DNS Zone, such as privatelink.blob.core.windows.net. Some workloads, such as Cosmos DB, can require multiple DNS records – how do you know what to create?
Luckily, there is a feature in Private Endpoint that handles auto-registration for you:
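To show what "the correct zone" means for a Storage Account, here is a small lookup sketched in Python. The zone names are the public Azure cloud ones for the common storage sub-resources (sovereign clouds use different suffixes). The account name, the IP address, and the private_record helper are hypothetical.

```python
# Private DNS zone names for common Storage Account sub-resources (public Azure cloud).
STORAGE_ZONES = {
    "blob": "privatelink.blob.core.windows.net",
    "table": "privatelink.table.core.windows.net",
    "queue": "privatelink.queue.core.windows.net",
    "file": "privatelink.file.core.windows.net",
    "web": "privatelink.web.core.windows.net",
    "dfs": "privatelink.dfs.core.windows.net",
}


def private_record(account, subresource, private_ip):
    """Return (zone, record name, IP) for the A record one Private Endpoint needs."""
    zone = STORAGE_ZONES[subresource]
    # The record name inside the zone is simply the storage account name.
    return zone, account, private_ip


# Hypothetical account and address, for illustration only.
zone, name, ip = private_record("mystorage", "blob", "10.1.0.4")
print(f"{name}.{zone} -> {ip}")
```

One Private Endpoint per sub-resource means one record per zone, which is why a single Storage Account can end up with entries in several zones.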
All of the required DNS records are created in the correct DNS zones – you must have the Azure Private DNS Zones deployed beforehand.
If your resource changes IP address, the DNS records will be updated automatically.
Sadly, I could not find any documentation for this feature while writing this article. However, it’s an easy feature to configure. Open your new Private Endpoint and browse to DNS Configuration. There you can see the required DNS records for this Private Endpoint.
Click Add Configuration and supply the requested information. From now on, that Private Endpoint will handle record registration/updates for you. Nice!
With a central handler for DNS name resolution, on-premises clients have the ability to connect to your Private Endpoints – subject to network security rules. On-premises DNS servers should be configured with conditional forwarders (one for each Private Link Azure Private DNS Zone) to point at your Azure DNS servers – they can point at an Azure Firewall if the previously mentioned DNS options are used.
Some Complexities
Like everything, this design is not perfect. Centralised anything comes with authorisation/governance issues. Anyone deploying a Private Endpoint will require the rights to access the Azure Private DNS Zones/records. In the wrong hands, that could become a ticketing nightmare where simple tasks take 6 weeks – far from the agility that we dream of in The Cloud.
Conclusion
The above design is one that I have been using for years. It has evolved a little as new features/resources have been added to Azure but the core design has remained the same. It works and it is scalable. Importantly, once it is built, there is little for the devs to know about – just enable DNS Configuration in the Private Endpoint.
Tweaks can be made. I’ve discussed some DNS server options – some choose to dispense with DNS Servers altogether and use Azure Firewall as the DNS server, which forwards to the default Azure DNS services. On-premises DNS servers can forward to Azure Firewall or to the DNS servers. But the core design remains the same.
Microsoft has announced that the default route, an implicit public IP address, is being deprecated 30 September 2025.
Background
Let’s define “Internet” for the purposes of this post. The Internet includes:
The actual Internet.
Azure services, such as Azure SQL or Azure’s KMS for Windows VMs, that are shared with a public endpoint (IP address).
We have had ways to access those services, including:
Public IP address associated with a NIC of the virtual machine
Load Balancer with a public IP address with the virtual machine being a backend
A NAT Gateway
An appliance, such as a firewall NVA or Azure Firewall, being defined as the next hop to Internet prefixes, such as 0.0.0.0/0
If a virtual machine is deployed without having any of the above, it still needs to reach the Internet to do things like:
Activate a Windows license against KMS
Download packages for Ubuntu
Use Azure services such as Key Vault, MySQL or Azure SQL, or storage accounts (diagnostics settings)
For that reason, all Azure virtual machines are able to reach the Internet using an implied public IP address. This is an address that is randomly assigned to SNAT the connection out from the virtual machine to the Internet. That address:
Is random and can change
Offers no control or security
Modern Threats
There are two things that we should have been designing networks to stop for years:
Malware command and control
Data exfiltration
The modern hack is a clever and gradual process. Ransomware is not some dumb bot that gets onto your network and goes wild. Some of the recent variants are manually controlled. The malware gets onto the network and attempts to call home to a “machine” on the Internet. From there, the controllers can explore the network and plan their attack. This is the command and control. This attempt to “call home” should be blocked by network/security designs that block outbound access to the Internet by default, opening only connections that are required for workloads to function.
The controller will discover more vulnerabilities and download more software, taking further advantage of vulnerable network/security designs. Backups are targeted for attack first, data is stolen, and systems are crippled and encrypted.
The data theft, or exfiltration, is to an IP address that a modern network/security design would block.
So you can see that a network design where an implied public IP address is used is not a good practice. This is a primary consideration for Microsoft in making its decision to end the future use of implied public IP addresses.
What Is Happening?
On September 30th, all future virtual machines will no longer be able to use an implied public IP address. Existing virtual machines will be unaffected – but I want to drill into that because it’s not as simple as one might think.
A virtual machine is a resource in Azure. It’s not some disks. It’s not your concept of “I have something called X” that is a virtual machine. It’s a resource that exists. At some point, that resource might be removed. At that point, the virtual machine no longer exists, even if you recreate it with the exact same disks and name.
So keep in mind:
Virtual networks with existing VMs: The existing VMs are unaffected, but new VMs in the VNet will be affected and won’t work.
Scale-out: Let’s say you have a big workload with dozens of VMs with no public IP usage. You add more VMs and they don’t work – it’s because they don’t have an implied IP address, unlike their older siblings.
Restore from backup: You restore a VM to create a new VM. The new VM will not have an implied public IP address.
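Those three scenarios boil down to one rule, which can be sketched in Python. This is my reading of the announcement, not an official test: the rule keys on when the VM resource was created (I assume anything created after the date is affected) and on whether it has any explicit outbound method. The function name and the sample dates are made up.

```python
from datetime import date

RETIREMENT_DATE = date(2025, 9, 30)  # date given in the announcement


def loses_default_outbound(created, has_explicit_outbound):
    """True if a VM is created after the retirement date with no explicit outbound method.

    'Explicit outbound' means a public IP, a load balancer, a NAT gateway, or a next hop.
    """
    return created > RETIREMENT_DATE and not has_explicit_outbound


# Hypothetical VMs: (description, creation date of the VM resource, explicit outbound?)
vms = [
    ("existing VM", date(2023, 5, 1), False),
    ("scale-out VM", date(2025, 11, 3), False),
    ("VM restored from backup", date(2026, 1, 15), False),
    ("new VM behind a firewall", date(2026, 1, 15), True),
]

for description, created, explicit in vms:
    print(description, "-> affected" if loses_default_outbound(created, explicit) else "-> OK")
```

The restored VM is the one that catches people out: the disks are old but the VM resource is new.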
Is This a Money Grab?
No, this is not a money grab. This is an attempt by Microsoft to correct a “wrong” (it was done to be helpful to cloud newcomers) that was done in the original design. Some of the mitigations are quite low-cost, even for small businesses. To be honest, what money could be made here is pennies compared to the much bigger money that is made elsewhere by Azure.
The goal here is to:
Be secure by default by controlling egress traffic to limit command & control and data exfiltration.
Provide more control over egress flows by selecting the appliance/IP address that is used.
Enable more visibility over public IP addresses, for example, what public address should I share with a partner for their firewall rules?
Drive better networking and security architectures by default.
What Is Your Mitigation?
There are several paths that you can choose.
Assign a public IP address to a virtual machine: This is the lowest cost option but offers no egress security. It can get quite messy if multiple virtual machines require public IP addresses. Rate this as “better than nothing”.
Use a next hop: You can use an appliance (virtual machine or Marketplace network virtual appliance) or the Azure Firewall as a next hop to the Internet (0.0.0.0/0) or specific Internet IP prefixes. This is a security option – a firewall can block unwanted egress traffic. If you are budget-conscious, then consider Azure Firewall Basic. No matter what firewall/appliance you choose, there will be some subnet/VNet redesign and changes required to routing, which could affect VNet-integrated PaaS services such as API Management Premium.
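To illustrate the next hop option, here is a toy route lookup in Python. It models only longest-prefix matching, which is how the most specific route wins; Azure's real route selection also has tie-break rules between user-defined, BGP, and system routes that I have left out. The route table, the firewall address, and the next_hop function are hypothetical.

```python
from ipaddress import ip_address, ip_network

# Hypothetical effective routes for a spoke subnet: (prefix, next hop).
ROUTES = [
    ("10.0.0.0/16", "VNet local"),
    ("0.0.0.0/0", "Firewall 10.0.0.4"),  # UDR forcing egress through the hub firewall
]


def next_hop(destination, routes=ROUTES):
    """Pick the route with the longest matching prefix for the destination."""
    matches = [
        (ip_network(prefix), hop)
        for prefix, hop in routes
        if ip_address(destination) in ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(next_hop("10.0.1.5"))  # stays inside the virtual network
print(next_hop("8.8.8.8"))   # Internet-bound, so it is sent to the firewall
```

With the 0.0.0.0/0 route in place, anything that is not a more specific internal prefix is handed to the firewall, where it can be inspected and blocked.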
September 2025 is a long time away. But you have options to consider and potentially some network redesign work to do. Don’t sit around – start working.
In Summary
The implied route to the Internet for Azure VMs will stop being available to new VMs on September 30th, 2025. This is not a money grab – you can choose low-cost options to mitigate the effects if you wish. The hope is that you opt to choose better security, either from Microsoft or a partner. The deadline is a long time away. Do not assume that you are not affected – one day you will expand services or restore a VM from backup and be affected. So get started on your research & planning.
Something new appeared in recent times: the “Managed Private Endpoint”. What the heck is it? Why would I use it? How is it different from a “Private Endpoint”?
Some Background
As you are probably aware, most PaaS services in Azure have a public endpoint by default. So if I use a Storage Account or Azure SQL, they have a public interface. If I have some security or compliance concerns, I can either:
Switch to a different resource type to solve the problem
Use a Private Endpoint
Private Endpoint is a way to interface with a PaaS resource from a subnet in a virtual network. The resource uses the Private Link service to receive connections and respond – this stateful service does not allow outbound connections providing a form of protection against some data leakage vectors.
Say I want to make a Storage Account only accessible on a VNet. I can set up a Private Endpoint for the particular API that I care about, such as Blob. A Private Endpoint resource is created and a NIC is created. The NIC connects to my designated subnet and uses an IP configuration for that subnet. Name resolution (DNS) is updated and now connections from my VNet(s) will go to the private IP address instead of the public endpoint. To enforce this, I can close down the public endpoint.
The normal process is that this is done from the “target resource”. In the above case, I created the Private Endpoint from the storage account.
Managed Private Endpoint
This is a term I discovered a couple of months ago and, to be honest, it threw me. I had no idea what it was.
So far, Managed Private Endpoints are features of:
The basic concept of a Managed Private Endpoint has not changed. It is used to connect to a PaaS resource, also referred to as the target resource (ah, there’s a clue!) over a private connection.
Microsoft: Azure Data Factory Integration Runtime connecting privately to other PaaS targets
What is different is that you create the Managed Private Endpoint from a client resource. Say, for example, I want Azure Synapse Analytics to connect privately to an Azure Cosmos DB resource. The Synapse Analytics resource doesn’t do normal networking so it needs something different. I can go to the Synapse Analytics resource and create a Managed Private Endpoint to the target Cosmos DB resource. This is a request – because the operator of the Cosmos DB resource must accept the Private Endpoint from their target resource.
Once done, Synapse Analytics will use the private Azure backbone instead of the public network to connect to the Cosmos DB resource.
Managed Virtual Network
Is your head wrecked yet? A Managed Private Endpoint uses a Managed Virtual Network. As I said above, a resource like Synapse Analytics doesn’t do normal networking. But a Managed Private Endpoint is going to require a Virtual Network and a subnet to connect the Managed Private Endpoint and NIC.
These are PaaS resources so the goal is to push IaaS things like networking into the platform to be managed by Microsoft. That’s what happens here. When you want to use a Managed Private Endpoint, a Managed Virtual Network is created for you in the same region as the client resource (Synapse Analytics in my example). That means that data engineers don’t need to worry about VNets, subnets, route tables, peering, and all the stuff when creating integrations.
September is a month of storms. There appears to have been lots of activity in the Azure cloud last month too. Everyone working on Azure should pay attention to the PAY ATTENTION! section.
On 30 September 2025, default outbound access connectivity for virtual machines in Azure will be retired. After this date, all new VMs that require internet access will need to use explicit outbound connectivity methods such as Azure NAT Gateway, Azure Load Balancer outbound rules, or a directly attached Azure public IP address.
There will be more communications on this from Microsoft. But this is more than a “don’t worry about your existing VMs” situation. What happens when you add more VMs to an existing old network? What happens when you do a restore? What happens when you do an Azure Site Recovery failover? Those are all new VMs in old networks and they are affected. Everyone should do some work to see if they are affected and prepare remediations in advance – not on the day when they are stressed out by a restore or a Black Friday expansion.
After 31 August 2024, App Service Environment v1 and v2 will no longer be supported and these App Service Environments and the applications running on them will be deleted and any application data associated with them will be lost.
Oh yeah, you’d better start working on migrations now.
Application Gateway for Containers is a new application (layer 7) load balancing and dynamic traffic management product for workloads running in a Kubernetes cluster. At the time of writing this service is currently in public preview. In this article we will look at the differences between AGIC and Application Gateway for containers and some of the great new features available through this new offering.
I know little about AKS but this subject seems to have excited some AKS users.
A Bucket Load Of Stuff
Too much for me to get into and I don’t know enough about this stuff:
We announced the General Availability of WordPress on App Service one year ago, in August 2022 with 3 paid hosting plans. We learnt that sometimes you might need to try out the service before you migrate your production applications. So, we are offering you a playground for a limited period – a free hosting plan to explore and experiment with WordPress on App Service. This will help you understand the offering better before you make a long-term investment.
They really want you to try this out – note that this plan is not for production workloads.
Almost one year ago the Jumpstart team released the public preview of HCIBox, our self-contained sandbox for exploring Azure Stack HCI capabilities without the need for physical hardware. Feedback from the community has been fantastic, with dozens of feature requests and issues submitted and resolved through our open-source community.
Today, the Jumpstart team is excited to announce the general availability of HCIBox!
It’s one thing to test out the software functionality of Azure Stack HCI. But the reality is that this is a hardware-centric solution and there is no simulating the performance, stability, or operations of something this complex.
Windows Server 2012 and 2012 R2 Extended Security Updates (ESUs) enabled by Azure Arc is now Generally Available. Windows Server 2012 and 2012 R2 are going End of Support on October 10, 2023. With ESUs, customers who are running Windows Server 2012 on-premises or in other clouds can get three more years of critical security updates from Microsoft to protect their End of Life infrastructure.
This is not free. This is tied into the news about Azure Update Manager (below).
In this blog, I’ve shared insights drawn from real-world migration experiences. This article can help you meticulously plan your own CSP to EA migration, ensuring a smoother transition while incorporating critical considerations into your migration strategy.
One really wishes that CSP, EA, etc were just differences in billing and not Azure APIs. Changing of billing should be like changing a phone plan.
Black Friday, Small Business Saturday and Cyber Monday will test your app’s limits, and so it’s time for your Infrastructure and Application teams to ensure that your platforms deliver when they are needed the most. Be it shopping applications on the web and mobile or payment gateways or banking systems supporting payments or inventory systems or billing systems – anything and everything associated with the shopping season should be prepared to face the load for this holiday season.
The “holiday season” starts earlier every year. Tesco Ireland started in August. Amazon has a Prime Day next Tuesday (October 10). These events test systems harder than ever and monolithic on-prem designs will not handle it. It’s time to get ready – if it’s not already too late!
We’re thrilled to share that Azure API Center is now open for everyone to try during our ungated public preview! Azure API Center is a new Azure service that is part of the Azure API Management platform. It is the central hub where you can effortlessly keep track of all your APIs company-wide, making them readily discoverable, reusable, and manageable.
Managing a catalog of APIs could be challenging. Tooling is welcome.
We are thrilled to announce the general availability of DenyAction, a new effect in Azure Policy! With the introduction of Deny Action, policy enforcement now expands into blocking requests based on actions to the resource. These deny action policy assignments can safeguard critical infrastructure by blocking unwarranted delete calls.
Can you believe that Azure was designed deliberately to not have a deny permission? Adding it after is not easy. The idea here is that delete locks on resources/resource groups become too easy to remove – and are frequently removed. Something, like a policy, that is enforced in the API (between you and the resources) is always applied and is not easy to remove and can be easily deployed at scale.
Azure Premium SSD v2 Disk Storage is now available in Australia East, France Central, Norway East and UAE North regions. This next-generation storage solution offers advanced general-purpose block storage with the best price performance, delivering sub-millisecond disk latencies for demanding IO-intensive workloads at a low cost.
We are announcing the general availability of the latest generations of Azure Burstable virtual machine (VM) series – the new Bsv2, Basv2, and Bpsv2 VMs based on the Intel® Xeon® Platinum 8370C, AMD EPYC™ 7763v, and Ampere® Altra® Arm-based processors respectively.
Faster and cheaper than the previous editions of B-Series VMs and they include ARM support too. The new virtual machines support all remote disk types such as Standard SSD, Standard HDD, Premium SSD and Ultra Disk storage.
We are pleased to announce that Azure Update Manager, previously known as Update Management Center, is now generally available.
The controversial news is that Arc-managed machines will cost $5/month. I’m still not sold on this solution – it still feels less than legacy solutions like WSUS.
Today, we are announcing a Public Preview of accelerated remote storage performance using Azure Premium SSD v2 or Ultra disk and selected sizes within the existing NVMe-enabled Ebsv5 family. The higher storage performance is offered on the E96bsv5 and E112ibsv5 VM sizes and delivers up to 400K IOPS (I/O operations per second) and 10GBps of remote disk storage throughput.
Even the largest SQL VM that I have worked with comes nowhere near these specs. The customer(s) that have justified this investment by Microsoft must be huge.
Organizations are benefiting from Azure savings plan for compute to save up to 65% on select compute services – and you could too. By committing to spending a fixed hourly amount for either one year or three years, you can save on plans tailored to your budget needs. But you may wonder how Azure applies this benefit.
It’s simple really. The system looks at your VMs, calculates the theoretical savings, and first applies your discount to the machines where you will save the most money, and then repeats until your discount is used.
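The ordering described above is a greedy allocation, and a small Python sketch makes it concrete. Everything here is an assumption for illustration: the VM names and rates are made up, and ranking by discount percentage is just one reading of "where you will save the most money". It is not Microsoft's billing logic.

```python
from dataclasses import dataclass


@dataclass
class VmUsage:
    name: str
    payg_rate: float  # hypothetical pay-as-you-go price per hour
    plan_rate: float  # hypothetical savings plan price per hour


def apply_commitment(usage, hourly_commitment):
    """Return {vm name: hourly amount of the commitment spent on that VM}."""
    remaining = hourly_commitment
    covered = {}
    # Assumption: "save the most money" means the highest discount percentage.
    ranked = sorted(
        usage,
        key=lambda vm: 1 - vm.plan_rate / vm.payg_rate,
        reverse=True,
    )
    for vm in ranked:
        if remaining <= 0:
            break  # commitment used up; remaining VMs stay on pay-as-you-go
        spend = min(vm.plan_rate, remaining)
        covered[vm.name] = spend
        remaining -= spend
    return covered
```

With a commitment of 1.5 per hour and three VMs discounted by 20%, 50% and 25%, the 50% machine is covered first and whatever is left goes to the 25% machine.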
With community gallery, a new feature of Azure Compute Gallery, you can now easily share your VM images with the wider Azure community. By setting up a ‘community gallery’, you can group your images and make them available to other Azure customers. As a result, any Azure customer can utilize images from the community gallery to create resources such as virtual machines (VMs) and VM scale sets.
Azure VMware Solution proudly introduces Public Preview of Trusted Launch for Virtual Machines. This advanced feature comprises Secure Boot, Virtual Trusted Platform Module (vTPM), and Virtualization-based Security (VBS), collectively forming a formidable defense against modern cyber threats.
A feature that was introduced in Windows Server 2016 Hyper-V.
Workload identity federation is an OpenID Connect implementation for Azure DevOps that allows you to use short-lived, credential-free authentication to Azure without the need to provision self-hosted agents with managed identity. You configure a trust between your Azure DevOps organisation and an Azure service principal. Azure DevOps then provides a token that can be used to authenticate to the Azure API.
This looks like a more secure way to authenticate your pipelines. No secrets are stored and a trust between your DevOps organisation and Azure enables short-lived authentication with desired access rights/scopes.
In this article, you learn how to automate an existing load test by creating a CI/CD pipeline in Azure Pipelines. Select your test in Azure Load Testing, and directly configure a pipeline in Azure DevOps that triggers your load test with every source code commit. Automate load tests with CI/CD to continuously validate your application performance and stability under load.
This is not something that I have played with but I suspect that you don’t want to do this against production systems!
Starting September 20th, 2023, the core scanning capabilities of GitHub Advanced Security for Azure DevOps can now be self-enabled within Azure DevOps and connect to Microsoft Defender for Cloud. Customers can automate security checks in the developer workflow using:
Code Scanning: locates vulnerabilities in source code and provides remediation guidance.
Secret Scanning: identifies high-confidence secrets and blocks developers from pushing secrets into code repositories.
Dependency Scanning: discovers vulnerabilities with open-source dependencies and automates update alerts for developers.
This seems like a good direction to go but I’m told it’s quite pricey.
WAF running on Application Gateway now supports sensitive data protection through log scrubbing. When a request matches the criteria of a rule, and triggers a WAF action, that event is captured within the WAF logs. WAF logs are stored as plain text for debuggability, and any matching patterns with sensitive customer data like IP address, passwords, and other personally identifiable information could potentially end up in logs as plain text. To help safeguard this sensitive data, you can now create log scrubbing rules that replace the sensitive data with “******”.
Sounds good to me!
General availability: Gateway Load Balancer IPv6 Support
Azure Gateway Load Balancer now supports IPv6 traffic, enabling you to distribute IPv6 traffic through Gateway Load Balancer before it reaches your dual-stack applications.
With this support, you can now add IPv6 frontend IP addresses and backend pools to Gateway Load Balancer. This allows you to inspect, protect, or mirror both IPv4 and IPv6 traffic flows using third-party or custom network virtual appliances (NVAs).
Useful for security architectures where NVAs are being used.
We are announcing the support of Cross Region Restore for Recovery Services Agent (MARS) using Azure Backup.
This makes sense. Let’s say I back up my on-prem data, located in Virginia, to Azure East US, in Boydton Virginia. And then there’s a disaster in VA that wipes out my office and Azure East US. Now I can restore to a new location from the paired region replica.
Now, you can save your Azure Recovery Services Agent encryption passphrase in Azure Key Vault directly from the console, making the Recovery Services Agent installation seamless and secure.
This beats the old default option of saving it as a text file on the machine that you were backing up.
Malware Scanning in Defender for Storage will be generally available September 1, 2023.
Please make sure that you read up on how much this will cost you. The DfC plans changed recently, and the pricing model for Storage plans changed to include this feature.
Azure Monitor alerts is previewing a new timeline view that simplifies the consumption experience of fired alerts. The new view has the following advantages:
Shows fired alerts on a timeline
Helps identify co-occurrence of alerts
Displays alerts in the context of the resources they fired on
Focuses on showing counts of alerts to better understand impact
Supports viewing alerts by severity
Provides a more intuitive discovery and investigation path
This might be useful if you are getting a lot of alerts.
Custom image templates allow admins to build a custom “golden image” using the Azure Virtual Desktop management user interface. Leverage a variety of built-in customizations or add your own customization scripts to install applications or configurations.
Why are they not using Azure Image Builder like I do?
This post is a part of the Azure Back to School 2023 online event. In this post, I will discuss using Microsoft Azure Export for Terraform, also known as Aztfexport and previously known as Azure Terrafy (a great name!), to create Terraform code from existing Azure deployments, why you would do it, and share a few tips.
Terraform
Terraform is one of a few Infrastructure-as-Code (IaC) languages out there that support Microsoft Azure. You might wonder why I would use it when Azure has ARM and Bicep. I’ll do a quick introduction to Terraform and then explain my reasoning which you are free to disagree with 🙂
Terraform is a product of HashiCorp available as a free-to-use product that is supported with some paid-for services. Like other IaC languages, it describes a desired end result. The major feature that differs from the native Azure languages is the use of state files – a file that describes what is deployed in Azure. This state file has a few nice use cases, including:
The outputs of a resource are documented, enabling effortless integration between resources in the same or even different files – with some effort, outputs from different deployments can be included in another deployment.
A true what-if engine that (mostly) works, unlike the native what-if in Azure. It greatly reduces the time required for deployments and lets you plan (pre-review) a deployment’s expected changes.
My first encounter with Terraform was a government project where the customer wanted to use Terraform over Bicep. Their reasoning was that elected politicians come and go, and suppliers come and go. If they were going to invest in an IaC skillset, they wanted the knowledge to be transferrable across clouds.
That’s the big advantage of Terraform. While the code itself is not cloud portable, the skill is. Terraform uses providers to be able to manage different resource types. Azure is a provider, written by Microsoft. Azure AD is a provider – ARM/Bicep still do not support Azure AD! AWS and GCP have providers. VMware has a provider. GitHub has a provider – the list goes on and on. If a provider does not exist, you can (in theory) write your own.
On that project, I was meant to be hands-off as an architect. But there were staffing and scheduling issues so I stepped up. Having never written a line of Terraform before I had my first workload, with some review help from a teammate, written in under a day. By the way, the same thing in Bicep took three days! Terraform is really well documented, with lots of examples, and the language makes sense.
Unlike Bicep, which is still beholden to a lot of the complexity of ARM. Doing simple things can involve stupidly complicated functions that only a C programmer (I used to be one) could enjoy (and I didn’t). I got hooked on Terraform and convinced my colleagues that it was a better path than Bicep, which was our original plan to replace ARM/JSON.
Aztfexport
Switching to Terraform raises a question – what do we do with our existing workloads, which were deployed using either ClickOps (Portal), scripts, or ARM/Bicep?
Microsoft has created a tool called Azure Export for Terraform (Aztfexport) on GitHub. The purpose of this tool is to take an existing resource group/resource/Graph query string and export it as Terraform code.
The code that is produced is intended to be used in some other way. In other words, Microsoft is not exporting code that should be able to immediately deploy new resources. They say that the produced code should be able to pass a terraform plan, where the existing resources are compared with the state file and the code, and the result is “the code is clean and there are no changes required”.
The Terraform configurations generated by aztfexport are not meant to be comprehensive and do not ensure that the infrastructure can be fully reproduced from said generated configurations. For details, please see limitations.
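The "passes a plan" expectation can be checked from a script. This is a minimal Python sketch that relies on the `-detailed-exitcode` flag of `terraform plan` (exit code 0 for no changes, 1 for an error, 2 for pending changes). The function names and the wording of the messages are made up for illustration.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_RESULTS = {
    0: "clean: no changes required",
    1: "error: the plan failed",
    2: "drift: the plan found changes",
}


def describe_plan(exit_code: int) -> str:
    """Translate a plan exit code into a short summary."""
    return PLAN_RESULTS.get(exit_code, f"unexpected exit code {exit_code}")


def check_export(folder: str) -> str:
    """Run a plan in the export folder and summarise the result."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=folder,
        capture_output=True,
        text=True,
    )
    return describe_plan(result.returncode)
```

Calling `check_export` on the export folder should report "clean" straight after an export. Remember that a clean plan only says the code matches the state file, not that the code could rebuild the resources.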
If I can’t use the code to deploy resources then what value is it? Hopefully you will see why aztfexport is a central part of my toolkit. I see it being useful in the following ways:
Learning Terraform: If you’ve not used Terraform before then it’s useful to see how the code can be produced, especially from resources that you are already familiar with.
Creating TF for an existing workload: You need to “terrafy” a resource/resource group and you want a starting point.
Azure-to-Azure migrations: You have a set of existing resources and you want to get a dump of all the settings and configurations.
Learning how a resource type/solution is coded: My favourite learning method is to follow the step-by-step and then inspect the resource(s) as code.
Understand how a resource type/solution works: This is a logical jump from the previous example, now including more resources as a whole solution.
Auditing: Comparing what is there with what should be there – or not there.
Documentation: The best form of resource documentation is IaC – why create lengthy documentation when the code is the resource?
I did use Aztfexport to learn Terraform more. In my current project, I have used it again and again to do Azure-to-Azure migrations, taking legacy ClickOps deployments and rewriting them as new secure/governed deployments. I’ve saved countless hours capturing settings and configurations and re-using them as new code.
The Bad Stuff
Nothing is perfect, and Aztfexport has some thorns too. Notice that the expected usage is that the produced code should pass a terraform plan. That is because in many situations (like with ARM exports) the code is not usable to deploy resources. That can be because:
ARM APIs do not expose everything, so how can Terraform get those settings?
The tool or the providers being used do not export everything.
One example I’ve seen includes App Services configurations that do not include the code type details. Another recent one was with WAF Policies, where overridden WAF rules were not documented. In both cases, the code would pass a plan. But neither would reproduce the resources. I’ve learned that I do need to double-check things with a resource type that I’ve never worked with before – then I know what to go and manually grab either from an ARM export or a visual inspection in the Portal.
Another thing is that the resources are named by a “machine” – there is no understanding of the role. Every resource is res-1, res-2, and so on, no matter the type or the role in the workload. That is a bit anonymous, but I find that useful when inspecting dependencies between resources.
A giant main.tf file is created, which I break up into many smaller files. I can find relationships based on those easy-to-track dependencies and logically group resources where it suits my coding style.
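Breaking up main.tf can be partly scripted. The Python sketch below groups resource blocks by resource type, which is only one possible grouping and not necessarily the one described above. It assumes the export is formatted the way `terraform fmt` writes it: each block starts with `resource "type" "name" {` at the start of a line and ends with a lone `}` at the start of a line. A heredoc containing such a line would break it.

```python
import re
from collections import defaultdict

# Matches the first line of a top-level resource block.
BLOCK_START = re.compile(r'^resource "([^"]+)" "([^"]+)" \{')


def group_by_type(main_tf: str) -> dict:
    """Return {resource type: text of all blocks of that type}."""
    groups = defaultdict(list)
    current_type = None
    current_lines = []
    for line in main_tf.splitlines():
        match = BLOCK_START.match(line)
        if current_type is None and match:
            # A new block begins.
            current_type = match.group(1)
            current_lines = [line]
        elif current_type is not None:
            current_lines.append(line)
            if line == "}":
                # A closing brace in column 0 ends the block.
                groups[current_type].append("\n".join(current_lines))
                current_type = None
    return {t: "\n\n".join(blocks) + "\n" for t, blocks in groups.items()}
```

Each value can then be written to its own file, for example one file per resource type, before regrouping by hand where that suits the workload better.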
One feature of TF is the easy reuse of resource IDs. One can easily refer to resource_type.resource_name.id in a property and know that the resource ID of that resource will be used. Unfortunately, some Aztfexport code doesn’t do that, so you get static resource IDs that should be replaced. That happens with other properties of resources too, so all of that should be cleaned up to make the code more reusable.
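Swapping static IDs for references is a mechanical job, so here is a minimal Python sketch of it. It assumes you already have a mapping from each Azure resource ID to the Terraform address of the resource that owns it; the function name and the IDs are made up for illustration. It only replaces whole quoted strings that match an ID exactly, and it does not handle differences in letter case.

```python
def reference_ids(hcl: str, id_to_address: dict) -> str:
    """Replace quoted static resource IDs with <address>.id references."""
    for resource_id, address in id_to_address.items():
        # Only an exact, fully quoted match is replaced, so a longer ID
        # that merely starts with this one is left alone.
        hcl = hcl.replace(f'"{resource_id}"', f"{address}.id")
    return hcl
```

For example, a `subnet_id` holding the full ID of a subnet becomes `subnet_id = azurerm_subnet.res-2.id` when that ID is mapped to `azurerm_subnet.res-2`.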
Installing Aztfexport
You will need to install Terraform – I prefer to use a Package Manager for that – the online instructions for a manual installation are a mess. You will also require Azure CLI.
The full instructions for installing Aztfexport are shared on GitHub, covering Windows, MacOS and Linux. The Windows installation is easy:
winget install aztfexport
You will need to restart your terminal (Windows) to get an updated Path variable so the aztfexport binary can be found.
Before you use aztfexport, you will need to log in using Azure CLI:
Open your terminal.
Log in:
az login
Change subscription:
az account set --subscription <subscription ID>
Verify the correct subscription was selected by checking the resource groups:
az group list
Create an empty folder on your PC and navigate to that folder in your terminal. The aztfexport tool requires an empty folder, by default, to create an export including all the required provider files and the generated code.
If you want to create an export of a single resource then you can run:
aztfexport resource <resource ID>
If you want to create an export of a resource group, then you can run:
aztfexport resource-group -n <resource group name>
Note the -n above means “don’t bother me with manual confirmation of what resources to include in the export”. In Terraform, sub-resources that can be managed as their own Terraform resources would otherwise need to be confirmed and that gets pretty tiresome pretty fast.
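If you run exports often, the steps above can be wrapped in a small script. This is a minimal Python sketch; the function names are made up, and it assumes `az login` has already been done and that both `az` and `aztfexport` are on the Path.

```python
from pathlib import Path


def is_empty_dir(folder: str) -> bool:
    """aztfexport expects an empty folder by default, so check first."""
    path = Path(folder)
    return path.is_dir() and not any(path.iterdir())


def export_steps(subscription_id: str, resource_group: str) -> list:
    """Build the commands for a non-interactive resource group export."""
    return [
        ["az", "account", "set", "--subscription", subscription_id],
        ["aztfexport", "resource-group", "-n", resource_group],
    ]
```

Each step can then be run with `subprocess.run(step, cwd=folder, check=True)` once `is_empty_dir(folder)` returns True.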
Tips
I’ve got to hammer on this one again, the produced code is not intended for deployment. Take the code, copy and paste it into new files and clean it up.
If your goal is to take over an existing IaC/ClickOps deployment with Terraform then you are going to have some fun. The resources already exist and Terraform is going to be confused because there is no state file. You will have to produce a state file by running terraform import for every resource definition in your code. That means knowing the resource IDs of everything, including Azure AD objects, role assignments, and sub-resources. You’ll need to understand the format of those resource IDs – use an existing state file for that. Often the resource ID is the simple Azure resource ID, or a derivation of a parent resource ID that you can figure out from another state file. Sometimes you need to wander through Azure AD (look at assignments in scopes that you do have access to if you don’t have direct Azure AD rights), use Azure CLI to “list” resources or items, or browse around using Resource Explorer in the Azure Portal.
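Once you have gathered the resource IDs, the state entries are created with `terraform import <address> <resource ID>`, one resource at a time. A minimal Python sketch for generating those commands from a mapping follows; the function name, addresses and IDs are made up for illustration.

```python
import shlex


def import_commands(address_to_id: dict) -> list:
    """Build one `terraform import` command line per resource."""
    return [
        shlex.join(["terraform", "import", address, resource_id])
        for address, resource_id in address_to_id.items()
    ]
```

Printing the commands first, rather than running them blindly, gives you a chance to review each address and ID pair before anything is written to the state file.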
Do take some time to compare your code with any previous IaC code or with an ARM export. Look for things that are missing – Terraform has many defaults that won’t be included and that code is missing because it is not required. I often include that code because I know that they are settings that Devs/Ops might want to tune later.
If you have the misfortune of having to work with an existing Terraform module library then you will have to translate the exported code as parameter/variable files for the new code – I do not envy you 🙂
Summary
This post is an introduction to Microsoft Azure Export for Terraform and a quick how-to-get-started guide. There is much more to learn about, such as how to use a custom backend (if resource names in Terraform are not a big deal and to eliminate the terraform import task) or even how to use a resource map to identify resources to export across many resource groups.
The tool is not perfect but it has saved me countless hours over the last year or so, dating back to when it was called Azure Terrafy. It’s one of the tools in my toolkit and I regularly break it out to speed up my work. In my opinion, anyone starting to work with Terraform should install and use this tool.