5 Most Common Azure Review Findings

An Azure architecture review is something I’ve done many times. Some are focused on networking. Some take a broader look at governance, security, and disaster recovery. Some are urgent — a customer has a problem and needs to understand the full picture before they can fix it. Others are scheduled health checks. The nature of each engagement varies, but the findings? They’re remarkably consistent.

After completing several Azure architecture reviews across very different organisations – different sizes, sectors, and levels of Azure maturity – I’ve noticed the same issues surfacing time and again. I thought it was worth documenting them, because if these problems appear this consistently, they’re likely to appear in your environment too.

Here are the five most common findings.

1. Governance Is Either Missing or Broken

This one appears in every single review. Without exception.

The most common anti-pattern is the “everything in one subscription” model. I understand how it happens – an IT manager kicks off a cloud migration, picks up a subscription, and starts deploying things. It works, for a while. Then the environment grows, the resource groups multiply, and suddenly you have a sprawling mess where cost management is a nightmare, RBAC delegations are a headache, and nobody can tell which resources belong to which workload.

The Microsoft Cloud Adoption Framework (CAF) has a clear answer to this: Landing Zones. One subscription per workload. No cost. No catch. The result is a level of granularity that simplifies cost management, role assignment, quota management, naming, and troubleshooting in one move.

Beyond subscriptions, I typically find that Management Groups haven’t been set up correctly – or at all. Azure Policy is either absent or consists of a handful of default assignments nobody has reviewed. Naming standards are inconsistent, making the environment harder to read and operate at scale.
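
To illustrate what an enforceable naming standard can look like, here is a minimal Python sketch. The pattern (resource type, workload, environment, region, instance) is an assumed convention loosely modelled on the CAF naming guidance, and the abbreviation lists are hypothetical; substitute your own.

```python
import re

# Hypothetical convention: <type>-<workload>-<env>-<region>-<instance>
# e.g. "vnet-payroll-prod-weu-001". All allowed values are assumptions.
NAME_PATTERN = re.compile(
    r"^(?P<type>rg|vnet|snet|nsg|rt)-"
    r"(?P<workload>[a-z0-9]{2,20})-"
    r"(?P<env>dev|test|prod)-"
    r"(?P<region>[a-z]{2,6})-"
    r"(?P<instance>\d{3})$"
)


def non_compliant(names):
    """Return the names that do not match the assumed convention."""
    return [n for n in names if NAME_PATTERN.fullmatch(n) is None]


if __name__ == "__main__":
    inventory = ["vnet-payroll-prod-weu-001", "Payroll_VNET", "rg-hr-dev-neu-002"]
    print(non_compliant(inventory))
```

A check like this can run against an exported resource list during a review; in a live environment the same rule is better expressed as an Azure Policy so that it is enforced at deployment time.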

The fix isn’t a multi-year transformation project. The fix is a minimum viable product: get the right structure in place, assign sensible policies, and improve from there. I’ve designed starter governance architectures in a single afternoon that gave organisations a solid foundation to build on. I’ve written previously about how I interpret and apply the CAF with customers, and why it’s never too late to apply it – even if you’ve been in Azure for years.

My company, Cloud Mechanix, offers a Cloud Strategy consulting service built around the CAF that gets the right foundations in place without the overhead of a months-long engagement.

2. The Network Architecture Is Overly Complex and Doesn’t Enforce Zero Trust

The second finding is closely related to the first. When governance is weak, networks tend to be large, flat, and complicated.

The most common pattern I encounter is what I call the “big VNet” design. Everything lives in one or two large Virtual Networks. Multiple workloads share the same address space. Route Tables get bigger and bigger as more exceptions are added. The network becomes unpredictable. Nobody is entirely sure what path traffic takes from A to B.

The security implication of this is significant. Without workload isolation, without proper routing via a central firewall, and without meaningful NSG enforcement, the environment defaults to a “full trust” model. Every workload can, in principle, reach every other workload. That is the opposite of Zero Trust.

The right design is a proper hub-and-spoke architecture, with application Landing Zones providing the granularity needed to enforce isolation. Each workload gets its own small Virtual Network, peered with the hub. The hub contains the firewall, connectivity resources, and nothing else. Traffic between spokes goes through the firewall and is subject to rules and IDPS inspection. I covered this in more depth in There Is More To Azure Networking Than Connectivity & Security.

Azure Virtual Network Manager (AVNM) makes this scalable. Automatic peering, routing, and IP Address Management mean that a new workload Landing Zone can be connected correctly with minimal manual effort. Cloud Mechanix has published a Bicep module for AVNM if you want a head start. We also offer a fixed-price, 5-day review of your (selected) Azure networks.

3. The Firewall Has Significant Limitations

I review a lot of firewalls. Very few of them are doing the job that was intended when they were deployed.

The problems vary. In some environments, the firewall is only inspecting a fraction of the traffic. The rest bypasses it entirely because the Route Tables aren’t configured correctly, or because workloads are co-located in the hub where they communicate directly. In others, the firewall is a single instance in a single Availability Zone with no redundancy. One data centre issue and the organisation loses its primary security control.
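
The routing gap can be spotted mechanically. The Python sketch below audits a simplified, hypothetical inventory (subnet name mapped to its list of routes) and reports subnets with no default route pointing at the firewall. The data shape and the firewall IP are assumptions for illustration. It only looks at the 0.0.0.0/0 route, so more specific routes and system routes are out of scope.

```python
# Hypothetical, simplified inventory shape: subnet name -> list of routes.
# "VirtualAppliance" is the Azure next hop type used for a firewall/NVA.
FIREWALL_IP = "10.0.1.4"  # assumed private IP of the hub firewall


def bypasses_firewall(routes, firewall_ip=FIREWALL_IP):
    """True if the subnet has no default route pointing at the firewall."""
    for route in routes:
        if (
            route.get("prefix") == "0.0.0.0/0"
            and route.get("next_hop_type") == "VirtualAppliance"
            and route.get("next_hop_ip") == firewall_ip
        ):
            return False
    return True


def audit(subnets):
    """Return sorted names of subnets whose egress can bypass the firewall."""
    return sorted(name for name, routes in subnets.items() if bypasses_firewall(routes))


if __name__ == "__main__":
    subnets = {
        "snet-app": [
            {"prefix": "0.0.0.0/0", "next_hop_type": "VirtualAppliance", "next_hop_ip": "10.0.1.4"}
        ],
        "snet-db": [],  # no Route Table associated
        "snet-web": [{"prefix": "0.0.0.0/0", "next_hop_type": "Internet"}],
    }
    print(audit(subnets))
```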

Network Security Groups are another recurring issue. They are either missing from most subnets, configured with overly permissive rules, or duplicated inconsistently across the environment. In several environments I’ve reviewed, a single NSG was associated with just one subnet while all others had open traffic. That’s not a security boundary. That’s a gap.

WAF configurations also warrant attention. It’s not unusual to find a Web Application Firewall deployed in a way that places unnecessary load on the network firewall, or where the WAF itself has no high availability and is restricted to a single Availability Zone.

There is rarely a simple fix here. These issues tend to be symptomatic of a broader architectural problem – the network was built incrementally without a coherent design. The right answer, in most cases, is a rebuild using a proper hub-and-spoke design with a cloud-native, scalable firewall. If your team needs to get up to speed on how to design this correctly, Cloud Mechanix runs a Designing Secure Azure Networks course for exactly that purpose.

4. Disaster Recovery Is Backup-Based and Wouldn’t Survive a Real Incident

This one concerns me the most.

Almost universally, the disaster recovery capability I encounter in reviews is backup-based rather than replication-led. On the surface, this looks like disaster recovery – data is being backed up, and some of those backups are geo-replicated. But look at what would actually happen if a major incident occurred, and the picture changes quickly.

Recovery Time Objectives are measured in days or weeks rather than hours. Recovery Point Objectives are up to 24 hours because backups run once a day. Multiple backup solutions introduce complexity and inconsistency. Retention periods are short, meaning a ransomware attack that went undetected for several weeks could render the backups useless. Active Directory is being restored from backup, which is widely regarded as error-prone and risky. And in several environments, the disaster recovery region hasn’t been pre-built or secured to the same standard as production.
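
The arithmetic behind two of those points is simple enough to sketch in Python. The figures in the example (14 days of retention, an attacker present for 21 days) are hypothetical and only illustrate the reasoning.

```python
from datetime import timedelta


def worst_case_rpo(backup_interval):
    """Worst-case data loss is the full gap between two successive backups."""
    return backup_interval


def has_clean_restore_point(retention, dwell_time):
    """True only if the oldest retained backup predates the compromise."""
    return retention > dwell_time


if __name__ == "__main__":
    rpo = worst_case_rpo(timedelta(days=1))
    print(f"Worst-case RPO: {rpo.total_seconds() / 3600:.0f} hours")
    # Hypothetical figures: 14 days of retention, attacker present for 21 days.
    print(has_clean_restore_point(timedelta(days=14), timedelta(days=21)))
```

With a daily backup the worst case is a full day of lost data, and a retention period shorter than the attacker's dwell time means every retained restore point is already compromised.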

The regulatory stakes are rising. EU NIS2 makes clear that subject organisations must demonstrate tested recovery plans, reasonable recovery objectives, and appropriate governance. Backup-based disaster recovery will be difficult to defend to a regulator following a major incident. I explored the distinction between backup, resiliency, and genuine disaster recovery in Backup Versus Resiliency Versus Disaster Recovery – worth a read if you’re trying to explain the difference to stakeholders.

The right direction is a replication-led strategy with a warm secondary Azure region. Azure Site Recovery handles virtual machine replication. Azure Backup with geo-redundant replication handles retention and clean-room restores. Infrastructure-as-code ensures that the secondary environment stays consistent with production. And critically – it should be tested regularly, with documented, automated recovery plans.

Disaster recovery should be treated as a core business risk management capability, not an IT optimisation exercise.

5. Monitoring and Security Visibility Are Inadequate

The last finding is perhaps the least glamorous, but it enables everything else.

Across the environments I’ve reviewed, visibility is typically poor. Virtual Network Flow Logs are not enabled. Defender for Cloud is either unused or operating with a limited set of plans that don’t reflect the actual risk profile of the workloads. Subscription-level diagnostic logs and activity logs aren’t being forwarded to a central Log Analytics Workspace. Alerts – whether for threat intelligence signals, IDPS events, or operational anomalies – are either absent or minimal.

This matters for two reasons. First, without visibility, security incidents go undetected. The assumption that no alerts means no problems is dangerously wrong. The assumption should be the opposite. Second, troubleshooting complex connectivity issues without Flow Logs, firewall logs, or PaaS diagnostics is genuinely difficult. I’ve helped diagnose problems that should have taken minutes but took hours because the logging was never turned on.

The fix here isn’t particularly expensive. Virtual Network Flow Logs with Traffic Analytics, a centralised Log Analytics Workspace, Defender for Cloud with appropriate plans enabled, and a sensible set of alerts will transform the visibility of an Azure environment. These should be baseline requirements in any well-governed deployment, not optional extras.
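
As a rough sketch, that baseline can be treated as a checklist and compared against what is actually switched on. The key names below are hypothetical labels for the items in the paragraph above, not Azure property names.

```python
# Baseline items taken from the paragraph above; the key names are hypothetical.
BASELINE = (
    "vnet_flow_logs",
    "traffic_analytics",
    "central_log_analytics_workspace",
    "defender_for_cloud_plans",
    "alerts",
)


def visibility_gaps(environment):
    """Return baseline items that are missing or disabled, in baseline order."""
    return [item for item in BASELINE if not environment.get(item, False)]


if __name__ == "__main__":
    reviewed = {"central_log_analytics_workspace": True, "alerts": False}
    print(visibility_gaps(reviewed))
```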

A Pattern Worth Noting

Reading back through that list, there’s a common thread. Each of these findings is a consequence of deploying Azure without a framework. Without a governance strategy, without a landing zone architecture, without a security policy – teams make decisions in isolation, workloads accumulate, and complexity grows in ways that nobody fully intended.

The Cloud Adoption Framework exists precisely to avoid this. It’s not a lengthy consulting exercise. Done right, it provides a practical process for building Azure correctly – one that starts with business motivations, produces a clear architecture, and enables continuous improvement. Cloud Mechanix has developed its own interpretation of the CAF that keeps the process lean and focused on results rather than documentation.

If any of the above findings sound familiar, it may be worth taking stock.

Is Your Azure Environment on the Right Track?

If the findings in this post ring any bells, a structured Azure architecture review is the fastest way to find out where you stand.

Cloud Mechanix offers a Fixed-Rate Cloud Environment Review – an expert-led review of your Azure environment, delivered in five business days. The scope is agreed upfront, access is read-only, and the output is a comprehensive report with clear, prioritised recommendations. No vague observations. No 200-page documents that nobody reads.

Whether the concern is security, governance, network architecture, disaster recovery, or the broader picture – get in touch and we can take it from there.

There Is More To Azure Networking Than Connectivity & Security

This post will explain how a well-designed, secured, governed and managed network design plays a foundational role in digital transformation and cloud enablement.

Cloud Adoption Versus Cloud Migration

What? Aidan – I thought this was a post about Azure networking!

Yes, it is … but you’ll have to join me on this journey. Lately, I’ve been using the “we need to step back and think about why we’re doing any of this” line quite a bit. The context of that line changes, but the message remains consistent.

Why did we go to The Cloud (Azure in our case)? For many, the reason is something like “I was told to”, “we were leaving our old hosting company”, or “our hardware support ended”. Those reasons triggered what I call a cloud migration project. I’ve done a LOT of those projects – thanks to scope limitations in the engagement, forced either by poorly advised customers (which led to restricted tenders) or by salespeople who refused to have a larger conversation.

Many organisations with internal developers that do a cloud migration end up in the same situation 18-24 months later: the developers refuse to deploy into “IT’s cloud”. This is because IT has recreated its old data centre in Azure, along with the restrictions, controls, and lack of trust. We were told “cloud is how you work, not where you work”, but not many people heard that message. We end up with situations where businesses have paid for Azure, but developers don’t get the Cloud; they get IT-driven and IT-restricted virtualisation in Azure.

Cloud Adoption is a change journey, as documented by the Cloud Adoption Framework. We are supposed to:

  1. Understand why the business (not IT) wants to use the Cloud
  2. Create a cloud strategy for the organisation
  3. Define and enable a new way of delivering cross-functional digital services.
  4. Do all the other technical stuff that we focus on, with the architecture based on the above.

Steps 1 and 2 (the CAF Strategy and Plan phases) are the keys to cloud adoption success. In theory, if we do everything correctly:

  1. The developers want to adopt the new cloud environment because it enables their mission.
  2. The business sees a return on the investment with faster innovation of digital services.

Where Does Networking Come Into This?

Pretty much every customer I’ve dealt with wants to improve their security for business protection or to meet compliance requirements. That typically results in larger usage of Virtual Networks. Many customers end up recreating their data centre networks in Azure; they create 1 Virtual Network (spoke) for each VLAN:

  • DMZ
  • Regular zone
  • Secure Zone

Or maybe they have:

  • Dev
  • Test
  • Production

Each of these networks shares various traits:

  • A big virtual network with many subnets
  • Managed by the central IT infrastructure team

I can go into all the security and complexity flaws that result from this too-common design pattern. But my focus is on cloud adoption in this post:

  • Developers are actively prevented from having network access/control. They rely on helpdesk tickets to get anything done – what happened to the essential cloud trait of “on-demand self-service”?
  • Subscriptions are filled with dozens of resource groups. Access is granted on a per-resource group granularity, which complicates and slows things down.
  • The desire for more security is gradually eroded due to operational complexity and constant delegation of rights with complicated granularity.

So, believe it or not, Azure networking is our canary in the coal mine. I have used, and I continue to use, this reliable little bird to sniff out operational/security failures in customers’ Azure environments.

Now, you know how I can detect adoption problems from the floor up. Next I want to explain how I can architect the Azure network to solve these issues.

Landing Zones

Let’s bend some minds. 8-ish years ago, I started working on a new “standard design” for my employer (a consulting company) with a fellow principal consultant. We both came to the table with a different subscription strategy from the usual one. The norm was that each of the above traditional spoke VNets would be aligned with its own subscription. That results in very few subscriptions, with demands for complicated role delegations, tagging, cost management, and so on. We switched to a 1 subscription/workload (application/service) approach; this new level of granularity:

  • Required 1 small Virtual Network where networking is required
  • Developer/operator role delegations are done once per subscription
  • Cost management is done per subscription (Budgets) with much less tagging for metadata
  • Easier operations with fewer mistakes through subscription selection in Azure Portal/PowerShell/CLI/etc. The resource groups in the subscription are related to only that workload.
  • The security boundary is much smaller. The access boundary is the single workload. Any VNet-based workloads must route via the hub firewall to reach any other workload, subject to rules and IDPS inspection.

Microsoft introduced the concept of landing zones a few years ago, which uses the same subscription/workload approach:

  • Platform landing zone: A subscription that offers shared infrastructure, such as a hub, a shared Application Gateway/WAF, Active Directory Domain Controllers, DNS, etc.
  • Application landing zone: A subscription that hosts a single application/service/workload.

Like with my approach, each landing zone has a Virtual Network (if required) that is:

  • Sized according to the workload architecture with some spare capacity.
  • Peered with the hub, with the egress path from the workload being via the hub firewall.
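
Sizing a small Virtual Network for a workload is mostly arithmetic, sketched below in Python. Azure reserves 5 addresses in every subnet and the smallest subnet is a /29; the `spare_factor` of 2.0 is an assumed amount of headroom, not a recommendation.

```python
AZURE_RESERVED_PER_SUBNET = 5  # Azure reserves 5 addresses in every subnet
MIN_HOST_BITS = 3              # the smallest Azure subnet is a /29


def _bits_for(addresses):
    """Number of host bits needed to hold the given count of addresses."""
    return max(MIN_HOST_BITS, (addresses - 1).bit_length())


def subnet_prefix(hosts_needed):
    """Smallest subnet prefix length that fits the hosts plus reserved IPs."""
    return 32 - _bits_for(hosts_needed + AZURE_RESERVED_PER_SUBNET)


def vnet_prefix(subnet_hosts, spare_factor=2.0):
    """Prefix length of a VNet that fits all subnets with some headroom."""
    addresses = sum(2 ** (32 - subnet_prefix(h)) for h in subnet_hosts)
    needed = int(-(-addresses * spare_factor // 1))  # round up to a whole number
    return 32 - _bits_for(needed)


if __name__ == "__main__":
    # Hypothetical workload: three subnets needing 10, 20 and 3 addresses.
    print([subnet_prefix(h) for h in (10, 20, 3)])
    print(vnet_prefix([10, 20, 3]))
```

For the hypothetical workload above the subnets come out as /28, /27 and /29, and the Virtual Network as a /25, which is a long way from the big VNet designs described earlier.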

Security & Governance

Let’s consider some things:

  • The business requires governance to manage IT and to ensure regulatory compliance.
  • IT security must protect the business, customers, vendors, etc.
  • We have many workloads/subscriptions.

We cannot have 1 policy for everything – sometimes we have business/operational reasons to have more-strict policies or less-strict policies. For example, we might require more Defender for Cloud features in some workloads or allow PaaS public endpoints in others.

Microsoft gave us Enterprise Scale around 5 years ago. This reference architecture (with supplied templated deployments) offers a subscription categorisation approach using Management Groups:

  • Corporate: Workloads that can connect to other networks.
  • Online: Workloads that have an online presence and should not connect to other workloads.

Azure Policy is used to enforce the standards for each Management Group.

I don’t know about you, but I have never seen such a binary requirement in the real world. I’ve seen many people discuss/use a third Management Group called Hybrid; they wonder how to build the policies to enforce the requirements.

In the real world, just about everything is shades of grey when it comes to connectivity. I’ve had ultra-secure workloads with web interfaces. I’ve had low-end workloads with high security. And I can guarantee you that sensitive workloads have compelling business reasons to be both online and integrated with traditional private-protocol connectivity.

I thought about this last year and came up with a different approach. We can use CAF’s operational methodologies to develop a tiered, documented, and implemented policy that aligns with the organisation’s governance, security, and management requirements. I suggested that we would have three tiers (names are irrelevant):

  • Gold: The strictest policies
  • Silver: Medium-level policies, containing the most workloads
  • Bronze: The most relaxed policies

The result is 3 Management Groups (above), each with Azure Policy automatically auditing/enforcing the designed and continuously improved requirements.

The new (CAF Plan) operational model would introduce a step to categorise the workload based on security risks, governance requirements, and management needs. Each workload would be placed in the correct Management Group with policies.
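
The categorisation step could be as simple as the following Python sketch. The 1 to 3 rating scale and the "strictest requirement wins" rule are assumptions made for illustration; the actual criteria would come out of your own governance, security, and management requirements.

```python
# Hypothetical scoring model: each dimension is rated 1 (low) to 3 (high).
TIERS = {3: "Gold", 2: "Silver", 1: "Bronze"}


def classify(security_risk, governance, management):
    """Place a workload in the tier demanded by its strictest requirement."""
    ratings = (security_risk, governance, management)
    if any(r not in TIERS for r in ratings):
        raise ValueError("each rating must be 1, 2 or 3")
    return TIERS[max(ratings)]


if __name__ == "__main__":
    # A workload with modest security risk but strict governance needs.
    print(classify(security_risk=2, governance=3, management=1))
```

The returned tier name maps to the Management Group that the workload's subscription is placed in.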

The policies give us automation and guardrails. For example, where appropriate, we can:

  • Restrict regions
  • Ban public IP association with NICs
  • Disable public endpoints
  • Enable Defender for Cloud plans
  • Enforce VNet Flow Logs
  • Configure diagnostics settings
  • And much more
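
To make one of these guardrails concrete, the Python sketch below builds a simplified "restrict regions" rule in the Azure Policy definition JSON structure. The parameter name `allowedLocations`, the display name, and the helper function names are assumptions; Microsoft's built-in "Allowed locations" policy is more complete and is the better choice in practice.

```python
import json


def allowed_locations_rule(parameter_name="allowedLocations"):
    """Deny any resource whose location is not in the allowed list."""
    return {
        "if": {
            "not": {
                "field": "location",
                "in": f"[parameters('{parameter_name}')]",
            }
        },
        "then": {"effect": "deny"},
    }


def policy_definition(display_name, parameter_name="allowedLocations"):
    """Wrap the rule in a minimal policy definition document."""
    return {
        "properties": {
            "displayName": display_name,
            "mode": "Indexed",
            "parameters": {parameter_name: {"type": "Array"}},
            "policyRule": allowed_locations_rule(parameter_name),
        }
    }


if __name__ == "__main__":
    print(json.dumps(policy_definition("Restrict regions (sketch)"), indent=2))
```

Assigning the same definition with a different region list to each Management Group is what turns one definition into tier-specific guardrails.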

The key to this is momentum. My approach is “minimum viable product” (MVP). For example, I had a 30-minute call with a customer last year and designed their starter policies. Now they (should) run regular reviews to assess the policies/risks/requirements and expand the policies/implementations. We didn’t freeze for 2 years to build a policy. We got some essentials in place and we carried on with getting results for the business.

Now, let’s get back to networking!

At-Scale Network Configuration And Enforcement

Developers, operators, and (rival) service providers are empowered to build in the Azure environment with a new guardrail-protected landing zone approach. How do we ensure that their Virtual Networks are built correctly?

We can use Azure Virtual Network Manager (AVNM).

Note that the horrid per-subscription pricing for AVNM was replaced a long time ago. Please go back and reassess the pricing before you run away.

AVNM gives us policy-driven:

  • Discovery and grouping of Virtual Networks for granular policy assignments
  • Peering with a hub and mesh capabilities
  • Route Table deployment/association with User-Defined Routes (UDRs)
  • Security Admin Rules that are processed before NSG rules with override capabilities
  • IP Address Management (IPAM) to provide approved, non-repeating IP prefixes for new networks and to manage their lifecycle
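
The "processed before NSG rules" behaviour is easier to see in code. The Python sketch below is a deliberately simplified model that matches on destination port only. It reflects the three Security Admin Rule actions (Deny, Always Allow, and Allow, where Allow hands the decision to the NSG), while the rule shape and the final default of "deny" are assumptions standing in for real rule matching and the NSG default rules.

```python
def evaluate(port, admin_rules, nsg_rules):
    """Return "allow" or "deny" for traffic to a destination port.

    Rules are dicts with "priority", "port" and "action"; the lowest
    priority number is evaluated first.
    """
    for rule in sorted(admin_rules, key=lambda r: r["priority"]):
        if rule["port"] == port:
            if rule["action"] == "Deny":
                return "deny"   # NSGs are never consulted
            if rule["action"] == "AlwaysAllow":
                return "allow"  # NSGs cannot override this
            break               # "Allow": hand the decision to the NSG
    for rule in sorted(nsg_rules, key=lambda r: r["priority"]):
        if rule["port"] == port:
            return rule["action"].lower()
    return "deny"  # simplification standing in for the NSG default rules


if __name__ == "__main__":
    admin = [
        {"priority": 100, "port": 22, "action": "Deny"},
        {"priority": 200, "port": 443, "action": "AlwaysAllow"},
        {"priority": 300, "port": 80, "action": "Allow"},
    ]
    nsg = [
        {"priority": 100, "port": 22, "action": "Allow"},
        {"priority": 100, "port": 443, "action": "Deny"},
        {"priority": 110, "port": 80, "action": "Deny"},
    ]
    for p in (22, 443, 80):
        print(p, evaluate(p, admin, nsg))
```

In this model a workload team's NSG cannot reopen port 22 once the central rule denies it, and cannot block port 443 once the central rule always allows it.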

In short, if you deploy a VNet, I can:

  • Get an approved IP prefix for the Virtual Network
  • Use Azure Policy to automatically configure/enforce things like VNet Flow Logs and DNS settings
  • Use AVNM to correctly connect, route, and secure your VNet
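
The idea of an approved, non-repeating prefix can be sketched with Python's standard `ipaddress` module. This is not how AVNM's IPAM is implemented; it only illustrates the allocation logic, and the pool and existing allocations are hypothetical.

```python
import ipaddress


def next_free_prefix(pool, allocated, new_prefix_len):
    """Return the first subnet of the pool that overlaps no allocated prefix."""
    pool_net = ipaddress.ip_network(pool)
    taken = [ipaddress.ip_network(a) for a in allocated]
    for candidate in pool_net.subnets(new_prefix=new_prefix_len):
        if not any(candidate.overlaps(t) for t in taken):
            return str(candidate)
    return None  # the pool is exhausted for this prefix size


if __name__ == "__main__":
    # Hypothetical pool and existing spoke allocations.
    existing = ["10.10.0.0/24", "10.10.1.0/25"]
    print(next_free_prefix("10.10.0.0/16", existing, 24))
    print(next_free_prefix("10.10.0.0/16", existing, 26))
```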

To quote Van Halen: “they got you coming in, and they got you going out”. I always did prefer “Van Hagar” 🙂

Summary

A legacy, cable-oriented, on-prem network in Azure indicates that the organisation has not modernised how digital services are created, operated, and delivered to the business. In short, the business is paying for the cloud but is getting remotely hosted Hyper-V.

We can enable modern collaborative working processes by modernising our designs. Using application landing zones will create a new form of granularity for all aspects of infrastructure, security, governance, and management. We can use the governance features to create the guardrails and some of the automations. We can use Azure Virtual Network Manager (AVNM) to ensure a good Virtual Network deployment.

If You Want To Learn More

Contact me via my consulting company, Cloud Mechanix, if you would like to learn how I can help you with this design pattern.