Errors When You Add A Cert To Application Gateway Listener From Key Vault

This post deals with a situation where you attempt to add a certificate from an Azure Key Vault to a v2 Azure Application Gateway/Web Application Firewall (WAG_v2/WAF_v2). The attempt fails, and any further attempt to delete or modify the certificate fails with this error:

Invalid value for the identities ‘/subscriptions/xxxxxxx/resourcegroups/myapp/providers/Microsoft.ManagedIdentity/userAssignedIdentities/myapp-waf-id’. The ‘UserAssignedIdentities’ property keys should only be empty json objects, null or the resource exisiting property.

Application Gateway v2 and Key Vault

Azure Key Vault is the best place to store secrets in Microsoft Azure – particularly SSL certificates. Key Vault has a nice system for abstracting versions of a certificate, so you can add newer versions without changing references to the older one. There is also a feature for automatic renewal of expiring certs from certain issuers. I also like the separation of exposed resource from organisation secrets that you get with this approach; the legacy method was that you had to upload the cert into the WAG/WAF, but WAG_v2/WAF_v2 allow you to store the certs in a Key Vault, with that limited access granted via a user-assigned managed identity (an Azure resource, not an Azure AD resource, which makes it more agile).

The Problem

I was actually going to write a blog post about how to obtain the secret ID of a certificate from the Key Vault so you could add it to the WAG_v2/WAF_v2. But as I was setting up the lab, I realised that, during the day, Microsoft had updated the Azure Portal blade so that certs were presented in a drop-down list box; my planned post was now pointless. But I continued setting things up and hit the above issue.

The Cause/Fix

When you use this architecture, WAF_v2/WAG_v2 requires that you have enabled soft delete on the Key Vault – and that appears to be the only check it performs. The default retention for Key Vault soft delete is 90 days. I was in a lab, I was mucking around, so I set soft delete in my Key Vault to 7 days – a perfectly legit value for Key Vault. However, the Application Gateway (AppGW) requires it to be set to a minimum of 90 days … even though it does not check that!
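
If you are creating the Key Vault yourself, you can avoid the trap by setting the retention up front. A minimal sketch, assuming hypothetical vault/resource group names and that your Az.KeyVault version supports the -SoftDeleteRetentionInDays parameter:

  # Create a Key Vault with soft delete retention at the 90-day value that AppGW expects
  New-AzKeyVault -Name "myapp-kv" -ResourceGroupName "myapp" -Location "WestEurope" -SoftDeleteRetentionInDays 90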

To undo the damage, you can run the following PowerShell cmdlets (a sketch follows the list):

  • Set-AzApplicationGatewayIdentity
  • Remove-AzApplicationGatewaySslCertificate
  • Remove-AzApplicationGatewayHttpListener
  • Set-AzApplicationGateway to update the WAF
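
A minimal sketch of that sequence, assuming hypothetical gateway, listener, and certificate names – the Remove-* cmdlets modify a local copy of the gateway configuration, and Set-AzApplicationGateway commits the change:

  # Get a local copy of the Application Gateway configuration
  $appGw = Get-AzApplicationGateway -Name "myapp-waf" -ResourceGroupName "myapp"

  # Remove the broken listener and the certificate that it references
  $appGw = Remove-AzApplicationGatewayHttpListener -ApplicationGateway $appGw -Name "myapp-listener"
  $appGw = Remove-AzApplicationGatewaySslCertificate -ApplicationGateway $appGw -Name "myapp-cert"

  # Re-set the user-assigned managed identity if required
  $appGw = Set-AzApplicationGatewayIdentity -ApplicationGateway $appGw -UserAssignedIdentityId "/subscriptions/xxxxxxx/resourcegroups/myapp/providers/Microsoft.ManagedIdentity/userAssignedIdentities/myapp-waf-id"

  # Commit the changes to update the WAF
  Set-AzApplicationGateway -ApplicationGateway $appGw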

Thanks to Cat in the Azure network team for the help!

Enabling NSG Traffic Analytics Fails

This post will deal with a scenario where you get this error when attempting to enable NSG Traffic Analytics with a Log Analytics Workspace:

Failed to save flow log settings
Failed to update flow logs settings for ‘NSG-NAME’. Error: An error occurred..

NSG Traffic Analytics

I work mostly in Azure networking these days. My customers are typically larger enterprises that are focused on governance and security. When you build Azure network architecture for these kinds of organisations, the networks have many pieces to make micro-segmented security a reality. And that means you need to be able to troubleshoot NSG rules and routing. I find the troubleshooting tools in Network Watcher to be useless. Instead, I use:

  • My own understanding, building a mental map of the effective routes for the subnet – because effective routes are not visible in Azure unless you have a VM NIC allocated in that subnet (often not the case)
  • Azure Firewall’s logs
  • NSG Traffic Analytics logs in a Log Analytics Workspace

In my architecture, there is a single, central Log Analytics Workspace that is in a different subscription to the virtual networks/NSGs. And this is where the problem is rooted.

Symptoms

When you attempt to enable Traffic Analytics, you get the above error. Interestingly, if you only attempt to enable NSG Flow Logs (data logged to a storage account), there is no problem. So the issue is related to getting the Workspace configured as a part of the solution (NSG Traffic Analytics).

The Problem & Fix

The problem is that the Microsoft.Network resource provider must be registered in the subscription that the Workspace is located in. In my case, as I said, I have a dedicated management subscription, so there are no network resources to register that resource provider automatically.

If you go to Subscriptions > Resource Providers in the Azure Portal, you can register the provider there. Wait (no more than 15 minutes) and things should be OK then.
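
You can also do this with PowerShell. A minimal sketch, assuming a hypothetical name for the management subscription:

  # Switch to the subscription that hosts the Log Analytics Workspace
  Set-AzContext -Subscription "management-sub"

  # Register the provider, then check until RegistrationState reports "Registered"
  Register-AzResourceProvider -ProviderNamespace "Microsoft.Network"
  Get-AzResourceProvider -ProviderNamespace "Microsoft.Network" |
    Select-Object ProviderNamespace, RegistrationState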

Thanks to Dalan in Azure Networking for helping fix this one!

Azure Firewall Improvements – February 2020

Microsoft published a couple of blog posts in the last month announcing some interesting news about Azure Firewall, a resource that I have used with every customer that I dealt with in the last year.

Azure Firewall Manager (Preview)

I first played with Azure Firewall Manager in the Secure Virtual Hub preview. Now the feature is in preview with the “network SKU” of Azure Firewall. The concept starts with Azure Firewall Manager, an Azure Portal GUI that isn’t a resource; it’s a way to centrally manage one or more Azure Firewall resources in one region or in many regions.

Azure Firewall Manager does, however, control a new top-level resource: the firewall policy. Policies move the management of Azure Firewall configuration and rules from the firewall resource to the policy resource. You can create a simple hierarchy of policies.

For example, I find myself creating the same collections/rules in every Azure Firewall. If a customer has 3 network deployments around the world with identical base requirements, you can create a “parent” policy. Then you create a child policy for each firewall instance; each child inherits the current and future configurations of the parent policy. You then associate each child policy with the correct firewall. Network-specific changes are made in the child; any future global changes go into the parent, and they inherit down to each firewall. A sketch of the idea follows.
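
A minimal sketch of that hierarchy, assuming hypothetical names – the child policy links to the parent via the -BasePolicy parameter:

  # The parent policy holds the global configuration and rules
  $parent = New-AzFirewallPolicy -Name "global-fw-policy" -ResourceGroupName "network-rg" -Location "WestEurope"

  # Each child inherits the parent's rules and adds network-specific ones
  $child = New-AzFirewallPolicy -Name "weu-fw-policy" -ResourceGroupName "network-rg" -Location "WestEurope" -BasePolicy $parent.Id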

Cool, right?

IP Groups (Preview)

This is another cool top-level resource. Let’s say I’m managing an Azure Firewall with a site-to-site network connection. There’s a pretty good chance that I am constantly creating rules for specific groups of addresses, sets of networks, or even all the “super-nets” of the WAN. Do I really want to remember/type each of those addresses? Surely a mistake will be made?

IP Groups allow you to create an abstraction. For example, I can put each of my WAN super-nets into an IP Group resource called wan-ipg. Then I can use wan-ipg instead of listing each address. Nice!
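
A minimal sketch, assuming hypothetical names and address ranges – create the IP Group once, then reference it in rules instead of typing each range:

  # One resource for all of the WAN super-nets
  $ipGroup = New-AzIpGroup -Name "wan-ipg" -ResourceGroupName "network-rg" -Location "WestEurope" -IpAddress "10.10.0.0/16","10.20.0.0/16"

  # Reference the group in a network rule by its resource ID
  $rule = New-AzFirewallNetworkRule -Name "Allow-WAN-DNS" -Protocol "UDP" -SourceIpGroup $ipGroup.Id -DestinationAddress "10.0.1.4" -DestinationPort "53"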

Support for TCP/UDP 65535

One of those base configurations that I’m constantly deploying is enabling Active Directory Domain Services (ADDS) domain controllers to replicate through the Azure Firewall. If you look at the TCP/UDP requirements, you’ll find that one of the rules requires a huge range, with the high port being 65535. However, Azure Firewall only supported ports up to TCP/UDP 64000. That did not affect me, but there were reports of issues with ADDS replication. Now you can create rules up to the normal maximum port number, as sketched below.
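
A minimal sketch of such a rule (hypothetical names and addresses), now that the full port range is allowed – the rule would still need to be added to a collection on the firewall:

  # Allow the RPC dynamic range between domain controllers, up to port 65535
  $rule = New-AzFirewallNetworkRule -Name "Allow-ADDS-RPC" -Protocol "TCP" -SourceAddress "10.10.1.4" -DestinationAddress "10.20.1.4" -DestinationPort "49152-65535"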

Forced Tunnelling Support

This is for those of you who live in 1990 or have tinfoil on your heads. Now you can force all outbound traffic to go back to on-premises instead of to the Internet. I guess that this one is for the US government or someone with equally large purchasing power (influence).

Enable Public IP Addresses in Private Networks

I’m working with a customer that has used public IP addressing behind their on-premises firewall. One of my colleagues at work has a similar customer. I know of others with the same sort of customer.

Azure Firewall has not been compatible with that configuration. Imagine this:

  • The customer has a public IP range for their on-premises LAN – no NAT rules on the firewall.
  • They have a site-to-site network connection to Azure.
  • An Azure Firewall sits in the hub of a hub and spoke network – all ingress and all egress traffic must pass through the firewall.
  • A service in an Azure spoke tries to communicate with something on-premises on one of those public IP addresses.

And that’s where it all goes wrong. Azure Firewall sees that the destination is a non-RFC 1918 IP address (not in 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16), forcefully SNATs the packets to the Internet, and the packets never reach the on-premises destination.

With this update, you can use PowerShell/JSON to configure public IP ranges that should route via the AzureFirewallSubnet (propagated routes from the GatewaySubnet) and not to the Internet. A sketch follows.
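
A minimal sketch with the Az module, assuming a hypothetical firewall name and public range – the PrivateRange property tells the firewall which destination ranges should not be SNATed:

  $azFw = Get-AzFirewall -Name "hub-firewall" -ResourceGroupName "network-rg"

  # Keep the default RFC 1918 behaviour and add the on-premises public range
  $azFw.PrivateRange = @("IANAPrivateRanges", "198.51.100.0/24")
  Set-AzFirewall -AzureFirewall $azFw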

ICSA Labs Corporate Firewall Certification

Certifications are good, and some customers probably use these sorts of things when comparing products.

Verifying Propagated BGP Routes on Azure ExpressRoute

An important step in verifying or troubleshooting communications over ExpressRoute is checking that all the required routes to on-premises or WAN subnets have been propagated by BGP from the on-premises edge router to your ExpressRoute Virtual Network Gateway (and the connected virtual networks).

The Problem

Routing to Azure is often easy; your network admins allocate you a block of private address space on the “WAN” and you use it for your virtual network(s). They add a route entry for that CIDR block on their VPN/ExpressRoute edge device and packets can now get to Azure. The other part of that story is that Azure needs to know how to send packets back to on-premises – this affects both responses and requests. I have found that this is often overlooked, and people start saying things like “Azure networking is broken” when they haven’t sent a route to Azure so that the Azure resources connected to the virtual network(s) can respond.

The other big cause is that the on-premises edge firewall doesn’t allow the traffic – this is the #1 cause of RDP/SSH to Azure virtual machines not working, in my experience.

I had one such scenario where a system in Azure was “not accessible”. We verified that everything in Azure was correct. When we looked at the propagated BGP routes (via ExpressRoute), we saw that the client subnets were not included in the Route Table. The on-prem network admins had not propagated those routes, so the Azure ExpressRoute Gateway did not have a route to send responses back to the clients. Once the route was propagated, things worked as expected.

Finding the Routes

There are two ways you can do this. The first is to use PowerShell:
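
A minimal sketch, assuming hypothetical circuit/resource group names – Get-AzExpressRouteCircuitRouteTable pulls the routes learned on the circuit’s private peering:

  Get-AzExpressRouteCircuitRouteTable -ResourceGroupName "network-rg" -ExpressRouteCircuitName "my-er-circuit" -PeeringType AzurePrivatePeering -DevicePath Primary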

The command takes quite a while to run. Eventually, it will spit out the full route table. If there are lots of routes (there could be hundreds, if not thousands), they will scroll beyond the buffer of your console. So modify the command to send the output to a text file:
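
Something like this, using the same hypothetical names:

  # Pipe the route table to a text file so no routes are lost past the console buffer
  Get-AzExpressRouteCircuitRouteTable -ResourceGroupName "network-rg" -ExpressRouteCircuitName "my-er-circuit" -PeeringType AzurePrivatePeering -DevicePath Primary | Out-File .\routes.txt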

Unfortunately, it does not produce CSV format by default, but one can format the output to get something that’s easier to filter and manipulate, as sketched below.
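
For example (same hypothetical names; the returned route objects expose properties such as Network, NextHop, and Path):

  # Export the routes to CSV for easy filtering in Excel or PowerShell
  Get-AzExpressRouteCircuitRouteTable -ResourceGroupName "network-rg" -ExpressRouteCircuitName "my-er-circuit" -PeeringType AzurePrivatePeering -DevicePath Primary | Export-Csv .\routes.csv -NoTypeInformation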

You can also use the Azure Portal, where you can view routes from the Route Table and export its contents as a CSV file. Open the ExpressRoute Circuit and browse to Peerings.

Click Azure Private, which is the site-to-site ExpressRoute connection.

Now a pop-up blade appears in the Azure Portal called Private Peering. There are three interesting options here:

  • Get ARP records to see information on ARP.
  • Get Route Table – more on this in a second.
  • Get Route Table Summary to get a breakdown/summary of the records, including neighbor, version, status, ASN, and a count of routes.

We want to see the Route Table, so click that option. Another pop-up blade appears and now you wait for several minutes. Eventually, the screen will load up to 200 of the entries from the Route Table. If you want to see the entire list of entries, or you want an export, click Download. A CSV file will download via your browser, with one line per route from the Route Table.

Search the Route Table and look for a listing that either lists the on-premises/WAN subnet or includes its address space; for example, a route to 10.10.0.0/16 includes the subnet 10.10.10.0/24.

I’m Presenting Two Sessions At NIC 20/20 Vision in Oslo

I will be presenting two Azure sessions at the NIC 20/20 Vision (NICCONF) conference in Oslo on February 6th.

The content I’m presenting on is inspired by the work I have been doing with Innofactor Norway for customers in Norway. So it will be kind of cool to stand (once again) on a stage in Oslo and share what I’ve learned. I have two sessions on the afternoon of the 6th.

Secure Azure Network Architecture

Azure networking & security has become my focus area. I enjoy the organic nature of how Azure’s software-defined networking functions. I enjoy the scale, the possibilities, and the variety of options. And most of all, I appreciate how the near-universally overlooked fundamentals play a bigger role in network security than people realise. It’s a huge area to cover, but I will do my best in the hour that I have:

This session will walk you through the components of Azure network security, and how to architect a secure network for Azure virtual machines or platform services, including VNets, network security groups, routing tables, VNet peering, web application gateway, DDoS protection, and firewall appliances.

Auditing Azure – Compliance, Oversight, Governance, and Protection

An important part of governance is recording what is going on in Azure and being able to retain, query, and report on that data. This is an area I had a cool solution for this time last year, but Microsoft blew that up. Recently I revisited this space and found cool new things that I could do. And in preparing for this session, I found more stuff that I could talk about. I’ve enjoyed preparing this session and it has contributed back to my work. This session is late in the day for most Norwegians, but I hope that attendees stick around.

Auditing isn’t the most glamorous subject, but in a self-service environment, it becomes important to protect assets, the company, and even your job. In this session, you’ll learn how Azure provides auditing functionality that you can query, report on, and store securely for as long as you need it in cost-efficient ways.

Hopefully, I will see some of you there at the event!

Back Teaching – Implementing Secure Azure Networks

After a quiet 2019, I am getting back into Azure training starting in March in Brussels, Belgium, with a new hands-on course called Implementing Secure Azure Networks.

2019 was a year of (good) upheaval. I started a new job with big responsibilities and a learning curve. Family-wise, we had a lot of good things going on. So I decided to put our (my wife’s and my) Cloud Mechanix training on the shelf for a while. All last year, I was putting a lot of cool Azure networking & security things into practice with larger enterprises, so I was learning … new things, good practices, what works, what doesn’t, and so on. That planted the seed for the next class that I would write. Then along came The Workshop Summit, which asked if I would like to submit a 1-day practical training course. So I did, and they accepted.

The Course

Security is always number 1 or 2 in any survey on the fears of cloud computing. Networking in The Cloud is very different to traditional physical networking … but in some ways it is quite similar. The goals of this workshop are:

  • To teach you the fundamentals, the theory, of how Azure networking functions so you can understand the practical design and application
  • To give you hands-on experience deploying secure networks

As a result, this workshop takes you all the way back to the basics of Azure networking so you really understand the “wiring” of a secure network in the cloud. Only with that understanding do you realise that small is big. The topics covered in this class will secure small/mid businesses, platform deployments that require regulatory compliance, and large enterprises:

  • The Microsoft global network
  • Availability & SLA
  • Virtual network basics
  • Virtual network adapters
  • Peering
  • Service endpoints
  • Private Link & Private Endpoints
  • Public IP Addresses
  • VNet gateways: VPN & ExpressRoute
  • Network Security Groups
  • Application Firewall
  • Route Tables
  • Third-Party Firewalls
  • Azure Firewall
  • Architectures

Attendees will require an Azure subscription capable of deploying 4 x single-core virtual machines, 1 x Azure Firewall, 1 x Web Application Gateway, and 1 x per-GB Log Analytics Workspace for 1 day.

When

Tuesday, 3rd March

Where

Venue: the Hackages Lab, located at Avenue des Arts 3-4-5 in Brussels

Organisers & Registration

This event is being run by The Workshop Summit. All registration and payments are handled by that event.

Who Should Attend

You don’t need to be a networking guru to attend this class. I always start my Azure networking training by explaining that I have never set up a VLAN; I’m proud of that! But I can out-network most people in Azure. Azure networking requires some learning, especially to do it correctly and securely, and that starts with re-learning some fundamentals. Those who understand basic concepts like a route, a firewall rule, network addressing (CIDR blocks), and so on will do fine on this course.

Who will benefit? Anyone planning on working with Azure. If you’re the person building the first “landing zone” for a migration, setting up the infrastructure for a new cloud-based service, working with IaaS VMs or platform (PaaS – yes network security plays a big role here!) then this course is for you. Get this stuff right early on and you’ll look like a genius. Or maybe you’ve already got an infrastructure and it’s time to learn how to mature it? We will start with the basics, cover them deeply, and then dive deep, focusing on security in ways that a typical Azure introduction course cannot do.

Why A Bastion Host Is Necessary For Remote VM Administration (Including Azure)

This post will explain why you should use a “Bastion Host” or a “Jump Box” to securely remote into Linux (SSH) or Windows (Remote Desktop) virtual machines. And this advice also includes machines that you run in a cloud, such as Microsoft Azure.

For the Fundamentalists on Social Media

Some people are going to make some comments like:

“This is why you should use remote Bash|PowerShell scripting”

Or maybe:

“You should be using Windows Admin Center”.

Windows Admin Center – great! Genuinely. But it does not do everything.

There are still many times when you need to directly log into a machine and do something; that’s real life, and not some blogger’s lab life.

Security Center JIT VM Access?

I was a fan of this feature – until they changed how the allow (RDP, SSH, etc.) rules were added to an NSG. In my work, every subnet is micro-segmented. That means that the last user-defined NSG rule is Deny All from * to *. Since JIT VM Access was changed, it moves the last rule (if necessary) and puts the allow-RDP or allow-SSH (or whatever) rule after the DenyAll rule, which is useless. Feedback on this has been ignored.

Why Are SSH and RDP Insecure?

I can’t comment too much on SSH because I’m allergic to penguins. But I can comment on RDP. I can think of 3 security alerts released over the last few months about pre-authentication vulnerabilities found in Remote Desktop. What does that mean?

Let’s say that you have a PC on your WAN that is infected by malware that leverages a known or zero-day pre-authentication remote desktop vulnerability. If that PC has the ability to communicate with a remote VM, such as an Azure Windows/Linux VM, via SSH or RDP then that remote machine is vulnerable to a pre-authentication attack. That means that if malware gets onto your network, and that malware scans the network for open TCP 22 or TCP 3389 ports, it will attempt to use the vulnerability to compromise the remote VM. It does not require the user of the PC to SSH or RDP into the remote VM, or to even have any guest OS access! You can put a firewall in front of the remote virtual machines, but it will do no good; it’s still allowing TCP 3389 or TCP 22 directly into the virtual machines and all it will offer is logging of the attack.

A Bastion Host

You might have heard the term “bastion” in the Azure world recently. However, the terms Bastion Host or Jump Box are far from new. They’re an old concept that allows you to isolate valuable machines and services behind a firewall but still have a way to remote into them.

The valuable remote virtual machines are placed behind a firewall. In Azure, that could be a firewall appliance, such as Azure Firewall, and/or Network Security Groups. Now to connect to the remote VMs, you must first remote into the Bastion Host. And from that machine, you will remote further into the network through the isolation of the firewall/NSGs.

But that’s still not perfect, is it? If we do simple SSH or RDP to the Bastion Host, then it is vulnerable to pre-authentication attacks. And that means once that machine is compromised, it can attack further into the remote network. What we need is some kind of transformation.

Remote Desktop Gateway

My preferred solution is to deploy a Remote Desktop Gateway (RDGW) as the bastion host – this does not require RDP licensing for administrative access to the remote virtual machines! The Bastion Host is deployed as one virtual machine or 2+ load-balanced virtual machines that allow in HTTPS connections via firewall/NSG rules. When an administrator/developer/operator needs to log into a remote VM, their Remote Desktop client is configured to connect to this gateway using HTTPS instead of RDP. Once the connection is authenticated by the RDGW, it reverse proxies the connection through to the desired virtual machine, further protected by firewall/NSG rules. Now the malware that is on the WAN cannot probe any machines in the remote network; there is no opening across the network to TCP 3389 or TCP 22. Instead, the only port open for remote connections is HTTPS which requires authentication. And internally, that transforms to connections from the RDGW to the remote VMs via TCP 3389.

Some sharp-eyed observers might notice that the recently announced CVE-2020-0609 is a pre-authentication attack on RDGW! Yes, unpatched RDGW deployments are vulnerable, but they are smaller in number and easier to patch than a larger number of other machines. Best practice for any secure network is to limit all external ports. Transforming the protocol in some way, as an RDGW does, further reduces the threat of that single opening to a single service that forwards the connection.

If you want to add bells and whistles, you can deploy Network Policy Server(s) to centrally manage RDGW policy and even add multi-factor authentication (MFA) via Azure AD.

This is great for Windows, but what about Linux? I’m told that Guacamole does a nice job there. However, Guacamole is not suitable for recent releases of Windows because of how it must have hardcoded admin credentials for Network Layer Authentication (NLA).

Azure Bastion

Azure Bastion made lots of noise on IT news sites, blogs, and social media when it went into preview last year, and it eventually went GA at Ignite in November. Azure Bastion is a platform-based RDGW. Today (January 2020), I find it way too limited to use in anything but the simplest of Azure deployments:

  • The remote desktop authentication/connection are both driven via the Azure Portal, which assumes that the person connecting into the guest OS even has rights to the Azure resources.
  • It does not support desktop Remote Desktop/SSH clients.
  • It does not offer MFA support for the guest OS login, only for the Azure Portal login (see above).
  • VNet peering is not supported, limiting Azure Bastion to pretty simple Virtual Network designs.

If Azure Bastion adds VNet peering support, it will become usable for many more customers. If it understands that guest OS rights and Azure resource/Azure Portal rights can be different, then it will be ready for mid-large enterprises.

Windows 7 Support Has Ended

You would have to be hiding under an “IT rock” not to know this: today, January 14th, Microsoft is releasing its very last public updates for Windows 7. Yes, after over 10 years of support, Windows 7 is now end-of-life.

Disclaimer: businesses can extend security fix availability for Windows 7 in one of two ways:

  • Run Windows 7 in Azure with appropriate RDS licensing for a VDI solution, with security fix availability for 3 years from today.
  • Subscribe to a year-by-year (maximum three years from today) security fix program, where the price will probably double each year.

It’s hard to believe that Windows 7 became generally available 10 years and 3 months ago. It was still early in my active-in-the-community days. This was a time when Microsoft used to run public events, and technical people would promote their products. I was asked by the DPE/partner teams in Dublin to work with them on their Windows 7 “community launch” roadshow in 4 cities around Ireland: Belfast, Galway, Cork, and Dublin. Each event featured 1 or 2 business-focused shows during the day, and 1 consumer-focused show in the evening. I honestly don’t remember what Windows 7 stuff I talked about back then – it could have been MDT, I don’t recall. But I remember each event had a huge attendance – the free copy of Windows 7 Ultimate (it should have been a Home version but accidentally was announced and supplied as Ultimate at great cost to MS Ireland!) helped. But despite the big freebie, the interest was genuine and there was lots of interaction.

Windows 7 was a great OS. From the first time I used it, whether Beta or Release Candidate, it was stable. I logged a bug where the wi-fi config assumed you were in the USA; it was acknowledged and resulted in a free copy of Windows 7 for me (along with one from the roadshow!). Uptake with businesses was slow – the eventual end-of-life for Windows XP resulted in lots of rushed deployments. Then along came the deeply unpopular Windows 8/8.1, and that meant that people stuck with Windows 7. Even today, businesses have held on tight, fearing the frequent-upgrade model and different management of Windows 10.

I’m actually feeling a little weird. It doesn’t feel like 10 years. On one hand, it feels like yesterday that I was hanging with the Windows 7 & Windows Server 2008 R2 launch team at a hotel in Galway, Belfast, or Cork. That’s us in the blue/black rugby jerseys above, which had a 7 on the back. Dave moved into an enterprise role in Microsoft and has since left in recent years – he’s the one that got me involved in community stuff after I had been blogging for a while. Enda left Microsoft and emigrated with his family to live a great life in Norway. Wilbour moved to Microsoft in his native Canada and has since left the company. There’s me … And Patrick has since passed on. We literally presented that show by the seat of our pants. The demo lab build started the night before in a hotel room in Galway, and I remember Patrick finishing his build behind the curtain while Dave was presenting! And that curry in the Indian Princess in Cork … Wilbour and I dared each other to eat the Chicken Phal. I think I needed 3-4 pints of beer to down it, and maybe some loo roll in the fridge. On the other hand, it feels like life has moved at lightspeed and so much has happened since then.

EDIT:

How could I forget … actually my work in Azure has me rarely signing into a customer’s OS anymore … but today is also the end of support for Windows Server 2008 and Windows Server 2008 R2. Wow! My first community involvement with Microsoft was the launch of W2008. Dave (above) ran a series of events during the beta/RC time period to bring IT pros up to speed on the new server OS. I was working with a “large” Irish hosting company as the senior Microsoft engineer, maintaining what was there and building a new VMware hosting platform – yeah, you read that right. I was invited to attend the sessions. Towards the end, Dave asked if anyone was interested in doing some community work. I volunteered and next thing I know, I was standing on the main stage with Dave and Mark (who now runs the Microsoft data centre tours in Dublin) for the launch of W2008. That was a mad day in the Pod nightclub in Dublin. There were three launch events in 1 day. Each had 3 session slots – a keynote presented by an Irish guy working in Redmond in Server marketing, and then two slots where you could attend different sessions. We were in the main hall and presented W2008 in slots 2 and 3, 3 times that day. I remember we had to time it perfectly … music would literally drown us out after 25 minutes so we had to be quick. That, and the fear of the crashes that plagued the local Vista launch, meant that all demos were recorded and editing was done to make the videos quicker. I think I talked about Server Core. I remember the install demo and saying how quick it was, and getting some laughs when I explained that it wasn’t as quick as the obviously edited video. And the following night was the first time that I hosted/presented at a user group community event in Dublin.

My big memory of the W2008 R2 launch was the roadshow we did while it was still beta/RC to build up interest. By now, I was working for a different hosting company and was building a new hosting platform that would be based on W2008 R2 Hyper-V and System Center. It was another roadshow in Belfast, Galway, Cork, and Dublin, with the same gang as the previous Windows 7 one. I remember Dave built a Hyper-V lab using a couple of laptops and a 1 Gbps switch. He was so proud that he had a demo lab that didn’t rely on dodgy hotel wi-fi or phone signals. It worked fine in rehearsals, but Live Migration failed in every live demo, which Dave insisted on fixing in front of each audience. I was co-presenting with him. The Dublin event, in the Hilton by the Grand Canal, was crazy. Dave put his head down, waved at the audience, and said “I’ll fix this”. Time was passing, so I decided to do “a dance” to entertain the crowd. When I say “dance”, imagine the Oompa-Loompas dancing in Charlie & The Chocolate Factory.

Yes, time has moved on … 10+ years of it! And now Windows 7 is breathing its last hours as a fully supported OS. I sure hope that your desktop OS has moved on too.

Setting Up Azure – The Three Permissions You Will Need

You need to have rights to configure certain things in Microsoft Azure when you are setting it up for the first time. I will list those three permissions and the reasons for them in this post.

1. Global Admin Rights

You are going to need rights to configure things in Azure AD. For example, you should be creating security groups and using those for role-based access control of things like management groups, subscriptions, and maybe even resource groups – the higher in the hierarchy, the better, in my opinion.

This will require that you have Global Admin Rights. This is the equivalent of being a domain admin in Azure AD, and will impact all services attached to your directory such as Office 365. This right should be limited to just a few people. In a very large organisation, someone else might be doing these tasks for you because you will not be granted the necessary rights.

This role is easily added to the user account in Azure AD, either at the time of creation or later by opening the user account and selecting Assigned Roles.

2. Access Management For Azure Resources

This is an easy right to miss! It is also known as Elevated Access. This right gives you access to all subscriptions and management groups in your directory (tenant) and therefore grants you superuser powers that should be limited to a very small group of capable people. Here’s how I learned about the right: I was cleaning up management groups that I created using a service principal. I knew the management groups were there, and I could see them, but my Global Admin user couldn’t remove them. It was only when I elevated my account that I was able to move the subscriptions and remove the management groups.

Part of the reason this right is so hidden is that it is not configured in the user account screen in the Azure Portal. Instead, sign in to the Portal with your Global Admin-enabled user, open Azure AD, and go to Properties. Click “Yes” under Access Management For Azure Resources. Now you will have rights to everything in Azure, even if you weren’t granted them originally – this is why this superpower should be tightly controlled!
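
The same elevation can also be done from the command line while signed in as the Global Admin. A minimal sketch using the documented elevateAccess REST operation via Invoke-AzRestMethod:

  # Grants the signed-in Global Admin the User Access Administrator role at root scope
  Invoke-AzRestMethod -Method POST -Path "/providers/Microsoft.Authorization/elevateAccess?api-version=2016-07-01"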

3. Role-Based Access

The typical person working with Azure should have only the rights that they need to do their job. The two big reasons are:

  • External threats: Prevent someone from compromising a dev/ops employee’s account and using their rights to compromise the entire system.
  • Internal threats: Limit access that a single employee has, either for security or compliance reasons.

For example, one should not be made a subscription owner just “because”. Typically, being made a Contributor will give you more than enough rights to do your job in a subscription. And maybe a lesser right is necessary – an auditor might only be made a Reader or you might use/create a more specialised role.

One should start the RBAC design using management groups. As with organisational units in Active Directory Domain Services, management groups should model the administrative model, not the HR org chart. Permissions and policy association should start at the top and become more granular as you work your way down. Eventually, you will grant dev/ops rights, often at the subscription or even resource group level; a sketch of typical assignments follows.
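
A minimal sketch, assuming hypothetical group names and scopes:

  # Contributor for a dev/ops team at subscription scope
  New-AzRoleAssignment -ObjectId (Get-AzADGroup -DisplayName "devops-team").Id -RoleDefinitionName "Contributor" -Scope "/subscriptions/xxxxxxx"

  # Reader for auditors at management group scope
  New-AzRoleAssignment -ObjectId (Get-AzADGroup -DisplayName "auditors").Id -RoleDefinitionName "Reader" -Scope "/providers/Microsoft.Management/managementGroups/corp-mg"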

Another Consideration: Privileged Identity Management

PIM is a solution in the Azure AD per-user licensing SKUs that is sometimes used in large enterprises. It allows you to deploy just-in-time access to Azure resources/rights. There are a bunch of features in PIM that make it a useful feature to limit any one person’s access to what they need, when they need it, and for only as long as they need it, with MFA, oversight, and auditing.

The Worst IT Project I Was Ever A Part Of

This post will discuss a failed project that I was brought into and what I observed and learned from that project. It’s a real scenario that happened years ago, involving an on-premises deployment.

Too Many Chefs Spoil the Broth

Back in 2010, I joined a Dublin-based services company. The stated intention from the MD was that I was to lead a new Microsoft infrastructure consulting team. As it turned out, not a single manager or salesperson in the company believed that there was any work out there in Microsoft infrastructure technology – really! – and the team never really got off the ground. But I was brought into one customer, and this post is the story of that engagement.

It was a sunny, cold day when I drove out to the customer’s campus. They are a large state-owned … hmm … transport company. I had enough experience in IT to know that I was going to be dealing with strong personalities and opinions that were not necessarily based on fact. My brief was that I would be attending a meeting with all the participants of a failing Windows Server 2008 R2 Hyper-V and System Center 2008 R2 project. I came in and met a customer representative – the technical lead of the project. He immediately told me that I was to sit in the corner, observe, and not talk to any of the participants from the other service providers. Note that the last word is plural, very plural.

I sat at the far corner of a long boardroom table and in came everyone. There were the customer’s IT managers and tech staff, the storage manufacturer (HP – now HPE) and their partner, the networking manufacturer (Cisco) and their partner, a Microsoft Premier Field Engineer, the consultants that implemented Hyper-V, the consultants that implemented the System Center management of Hyper-V, and probably more. Before I continue: I think the Hyper-V cluster was something like 6 nodes and maybe 50-100 VMs.

Quickly it became evident that this was the first time that any of the participants in the meeting had talked to each other. I should re-phrase that: this was the first time any of the participants in deploying the next-generation IT infrastructure for running the business had been allowed to talk to each other.

  • A new W2008 R2 Hyper-V cluster was built. Although the customer was adamant that this was not true (it was), a 2-site cluster was built as a single-site cluster. There was a high-latency link between the two sites, with no control of VM placement and no third-site witness.
  • HP P4000 “Lefthand” module-based iSCSI storage was used without any consideration of persistent iSCSI reservations – a common problem in the W2008 R2 era, where the volumes in the SAN would “disappear” from the cluster because the scale-out of NICs/CSVs/SAN nodes went beyond the limits of W2008 R2. This was the result of a poor understanding of storage performance and Hyper-V architecture.
  • I remember awful problems with backup. DPM was deployed by a consulting firm but configured by a local staff member. He had a nightmare with VSS providers (HP were awful at this) and backup job sizing. It was not helped by the fact that backup was an afterthought in Hyper-V back then – not really resolved until WS2012, when it became software-defined. This, combined with how the P4000 worked, the multi-site cluster that wasn’t, and Redirected IO, caused all sorts of fun.
  • VMs would disappear – yup, the security officer insisted that AV was installed on each host and that it scanned every folder, including the CSVs. They even resisted change when presented with the MS documentation on scan exceptions that must be configured for Windows Server roles/features, including Hyper-V.

These were just a few of the technical issues; there were many more – inconsistent or missing patching, NIC teaming issues, and so on. I even created a 2-hour presentation based on this project that I (unofficially) called “How to screw up a Hyper-V project”.

My role was to “observe” but I wanted this thing fixed, so I contributed. I remember I spent a lot of time with the MS PFE on the customer site. He was gathering logs on behalf of support and we shared notes. Together we identified many issues/solutions. I remember one day, the customer lead shouted at me and ordered me back to my desk. I was not there “to talk to people but to observe”. The fact that I was one of two people on site that could solve the issues was lost on him.

The customer’s idea of running a project was to divide it up into little boxes and keep everyone from talking to each other. Part of this was how they funded the project – once it went over a certain monetary level, it had to be publicly tendered. They had their preferred vendors and they went with them, even if they were not the best people. This created islands of knowledge/expertise and a lack of vision. The customer thought they could manage this, and they were wrong. Instead, each supplier/vendor did their own thing based on assumptions about what others were doing and on incorrect information shared by the customer’s technical team. And it all blew up in the customer’s face.

In the end, I heard that the customer blamed the software, the implementors, and everyone else involved in the project but themselves. They scrapped the lot and went with VMware, allegedly.

Lessons Learned

I think that there were three major lessons to be learned from this project. I know that these lessons apply equally today, no matter what sort of IT project you are doing, including on-premises, hybrid, or pure cloud.

The Business

IT enables or breaks the business. That’s something that most boards/owners do not understand. They think of IT as the nerds playing Doom in a basement, with their flashing lights and whirring toys. Obviously, that’s wrong.

When IT works, it can make the business faster, more agile, and more competitive. New practices, be they operational or planning, can change IT, and I’ve even read how SCRUM/Agile concepts can be brought to business planning.

Any significant IT project that will impact the business must start with the business. Someone at the C-Level must own it, be invested in it, and provide the rails or mission statement that directs it. That oversight will force those involved in the project to operate correctly and give them guidance on how to best serve the business.

Architecture

Taking a large-impact IT project and treating it as a point solution will not work. For example, building an entirely new IT infrastructure without considering the impact of, or the dependencies on, networking is stupid! You cannot just hand off systems to different vendors and wish them bon voyage. There must be a unified vision. This technical vision starts with the previously mentioned business vision that guide-rails the technical design. All components that interconnect and have direct/indirect involvement must be designed as a whole.

Unified Delivery

The worst thing one can do is divvy up IT infrastructure to 5 or 6 vendors and say, you do that, and I will participate in a monthly meeting. That’s not IT! That’s bailing out on your responsibility! IT vendors can play a role, when chosen well. But they need a complete vision to do their job. And if they cannot get that from you, they must be allowed to help you build it. If your IT department’s role is to manage outsourcing contracts and nothing more, you have already failed the business and should just step aside.

A unified delivery must start with internal guidance, sharing the complete vision with all included parties, internal and external, as early as possible. Revealing significant change that you are working on with Vendor A 6 months into a project with Vendor B is a fail. Isolating each of the vendors is a fail. Not giving each vendor clear rules of engagement with orchestrated interaction is a fail. The delivery must be unified under the guidance of the architect who has a complete vision.

Bad IT Starts at The Top

In my years, I’ve done plenty of projects, reviewed many customers’ IT systems, and worked as a part of IT departments. Some of them were completely shocking. A common theme was the CIO/CTO: typically an accountant or finance officer who was handed the role of supervising IT because … well … it’s just IT and they have a budget to manage. Someone who doesn’t understand IT hires/keeps bad IT managers; bad IT managers hire bad IT staff, make bad IT decisions, and run bad IT projects. As the saying goes, sh&t rolls downhill. When these bad projects are happening to you, and you run IT, then you must look in the mirror and stop pointing the finger elsewhere.

And before you say it, yes, there are crap consultants too 😊