The Secret Sauce That Devs Don’t Want IT Pros to Know About

Honesty time: that title is a bit click-baitish, but the dev community is using a tool that most IT pros don't know much, if anything, about, and it can be a real game changer, especially if you write scripts or work with deployment solutions such as Azure Resource Manager (ARM) JSON templates.

Shot Time

As soon as I say “DevOps” you’re already reaching for that X on the top right or saying “Oh, he’s lost it … again”. But that’s what I’m going to talk about: DevOps. Or to be more precise, Azure DevOps.

Methodology

For years, when I've thought about DevOps, I've thought "buggy software with more frequent releases". And that certainly can be the case. But DevOps is born out of the realisation that how we have engineered software (or planned almost anything, to be honest) for the past few decades has not been ideal. Typically, we have some start-middle-end waterfall approach that assumes we know the end state. If this is a big (budget) or long (time) project, then getting halfway to that planned end-state and realising that the preconception was wrong is very expensive – it leads to headlines! DevOps is a way of saying:

  • We don’t know the end-state
  • We’re going to work on smaller scenarios
  • We will evolve what we create based on those scenarios
  • The more we create, the more we’ll learn about what the end state will be
  • There will be larger milestones, which will be our releases

This is where the project management gurus will come out and say "this is Scrum" or some other codswallop that I do not care about; that's the minutiae for the project management nerds to worry about.

Devs started leaning in this direction years ago. It's not universal – most devs that I encountered in the past didn't use platform tools for DevOps such as GitHub or Azure DevOps (previously Visual Studio Team Services). But here's an interesting thing: some businesses are adopting the concepts of "DevOps" for building a business, even if IT isn't involved, because they realised that some business problems are very like tech problems: big, complex, potentially expensive, and with an unknown end-state.

Why I Started

I got curious about Azure DevOps last year when my friend, Damian Flynn, was talking about it at events. Like me, Damian is an Azure MVP/IT Pro but, unlike me, he stayed in touch with development after college. I tried googling and reading Microsoft Docs, but the content was written in that nasty circular way that TechNet content used to be – there was no entry point for a non-dev that I could find.

And then I changed jobs … to work with Damian as it happens. We've been working on a product together for the last 7 months. And on day 1, he introduced me to DevOps. I'll be honest, I was lost at first, but after a few attempts and then actually working with it, I have gotten to grips with it and it gives me a structured way to work, plan, and collaborate on a product that will never have an end-state.

What I’m Working On

I’m not going to tell you exactly what I’m working on but it is Azure and it is IT Pro with no dev stuff … from me, at least. Everything I’ve written or adjusted is PowerShell or Azure JSON. I can work with Damian (who is 2+ hours away by car) on Teams on the same files:

  • Changes are planned as features or tasks in Azure DevOps Boards.
  • Code is stored in an Azure DevOps repo (repository).
  • Major versions are built as branches (changes) of a master copy.
  • Changes to the master copy are peer reviewed when you try to merge a branch.
  • Repos are synchronized to our PCs using Git.
  • VS Code is our JSON and PowerShell editor.

It might all sound complex … but it really was pretty simple to set up. Now behind the scenes, there is some crazy-mad release “pipelines” stuff that Damian built, and that is far from simple, but not mandatory – don’t tell Damian that I said that!

Confusing Terminology

Azure DevOps inherits terminology from other sources, such as Git. And that is fine for devs in that space, but some of it made me scratch my head because it sounded "the wrong way around". Here are some of the terms:

  • Repo: A repository is where you store code.
  • Project: A project might have 1 or more repos. Each repo might be for a different product in that project.
  • Boards: A board is where you do the planning. You can create epics, tasks and issues. Typically, an Epic is a major part of a solution, a task is what you need to do to make that work, and an issue is a bug to be fixed.
  • Sprint: In managed projects, sprints are a predefined period of time that you assign people to. Tasks are pulled into the sprint and assigned to people (or pulled by people to themselves) who have available time and suitable skills.
  • Branch: You always have one branch called the master or trunk. This is the "master copy" of the code. Branches can be made from the master. For example, if I have a task, I might create a branch from the master in VS Code to work on that task. Once I am done, I will sync/push that branch back up to Azure DevOps.
  • Pull Request: This is the one that wrecked my head for years. A pull request is when you want to take the changes that are stored in a branch and merge them back into the parent branch. From Git's or DevOps' point of view, this is a pull, not a push. So you create a pull request to (a) identify the tasks you did, (b) get someone to review/approve your changes, and (c) merge the branch (changes) back into the parent branch – see the sketch after this list.
  • Nested branch: You can create branches from branches. Master is typically pretty locked down. A number of people might want a more flexible space to work in, so they create a branch of master, maybe for a new version – let's call this the second-level branch. Each person then creates their own third-level branches of that second-level branch. Now each person can work away and do pull requests into the more flexible second-level branch. And when they are done with that major piece of work, they can do a pull request to merge the second-level branch back into the master or trunk.
  • Release: What it sounds like – the "code" is ready for production, in the opinion of the creators.
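
To make the branch and pull request flow more concrete, here is a minimal sketch of the Git commands involved. The branch name and commit message are invented for illustration, and the pull request itself is created in the Azure DevOps web portal rather than at the command line:

```powershell
# Start from an up-to-date copy of master/trunk.
git checkout master
git pull

# Create a branch for the task and switch to it.
git checkout -b feature/update-vnet-template

# ... edit the PowerShell/JSON files in VS Code ...

# Stage, commit, and push the branch up to the Azure DevOps repo.
git add .
git commit -m "Update VNet template for new subnets"
git push -u origin feature/update-vnet-template

# In Azure DevOps, create a pull request from feature/update-vnet-template into
# master, link the work items (tasks), and ask a colleague to review/approve
# before the merge completes.
```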

Getting Started

The first two tools that you need are free:

  • Git command line client – you do not need a GitHub account.
  • Visual Studio Code

And then you need Azure DevOps. That's where the free stuff pretty much stops – you need to acquire either per-user/plan licensing or get it via MSDN/Visual Studio licensing.

Opinion

I came into this pretty open minded. Damian’s a smart guy and I had long conversations with one of our managers about these kinds of methodologies after he attended Scrum master training.

Some of the stuff in DevOps is nasty. The terminology doesn’t help, but I hope the above helps. Pipelines is still a mystery to me. Microsoft shared a doc to show how to integrate a JSON release via Pipelines and it’s a big ol’ mess of things to be done. I’ll be honest … I don’t go near that stuff.

I don't think that Damian and I could have collaborated the way we have without DevOps. We've written many thousands of lines of code, planned tasks, and fought bugs. It's been done without a project manager – we discuss/record ideas, prioritize them, and then pull (assign to ourselves) the tasks when we have time. At times, we have worked in the same spaces and been able to work as one. And importantly, when it comes to pull requests, we peer review. The methodology has allowed other colleagues to participate, and we're already looking at how we can grow that more in the organization to bring more skills/experience into the product. Without (Azure) DevOps we could not have done that … certainly storing code on some file storage in the cloud would have been a total disaster and lacked the structure that we have had.

What Impact Will AMD EPYC Processors Have on You?

Microsoft has announced new HB-V2, Das_v3, and Eas_v3 virtual machines based on hosts with AMD EPYC processors. What does this mean to you and when should you use these machines instead of the Intel Xeon alternatives?

A is for AMD

The nomenclature for Azure virtual machines is extensive, and it can be confusing for those unfamiliar with the meanings. When I discussed the A-Series, the oldest of the virtual machine series, I would tell people "A is the start of the alphabet" and discuss these low-power machines. The A-Series was originally hosted on physical machines with AMD Opteron processors, a CPU that had lots of cores and required little electricity when compared to the Intel Xeon competition. These days, an A-Series might actually be hosted on hosts with Intel CPUs, but each virtual processor is throttled to offer similar performance to the older hosts.

Microsoft has added the AMD EPYC 7002 family of processors to their range of hosts, powering new machines:

  • HB_v2: A high performance compute machine with high bandwidth between the CPU and RAM.
  • Das_v3 (and Da_v3): A new variation on the Ds_v3 that offers fast disk performance that is great for database virtual machines.
  • Eas_v3 (and Ea_v3): Basically the Das_v3 with extra memory.

EPYC Versus Xeon

The 7002 or "Rome" family of EPYC processors is AMD's second generation of this type of processor. From everything I have read, this generation of the processor family firmly returns AMD to the data centre.

I am not a hardware expert, but some things really stand out about the EPYC. AMD claims it is revolutionary in how it focuses on I/O, which is pretty important for services such as databases (see the Ds_v3/Es_v3 core scenarios). EPYC uses PCIe Gen 4, which offers double the performance of the Gen 3 that Intel still uses. That's double the bus to storage … great for disk performance. The EPYC also offers 45% faster RAM access than the Intel option … hence Microsoft's choice for the HB_v2. If you want to get nerdy, there are fewer NUMA nodes per socket, which reduces context switches for complex RAM versus process placement scenarios.

Why AMD Now?

There have been rumours that Microsoft hasn't been 100% happy with Intel for quite a while. Everything I heard was in the PC market (issues with 4th generation, battery performance, mobility, etc.). I have not heard any rumours of discontent between Azure and Intel – in fact, the DC-Series virtual machine exists because of cooperation between the two giant technology corporations on SGX. But a few things are evident:

  • Competition is good
  • Everything you read about AMD’s EPYC makes it sound like a genuine Xeon killer. As AMD says, Xeon is a BMW 3-series and EPYC is a Tesla – I hope the AMD build quality is better than the American-built EV!
  • As is often the case, the AMD processor is more affordable to purchase and to power – both big deals for a hosting/cloud company.

Choosing Between AMD and Xeon

OK, it was already confusing which machine to choose when deploying in Azure … unless you’ve heard me explain the series and specialisation meanings. But now we must choose between AMD and Intel processors!

I was up at 5 am researching, so this next statement is either fuzzy or was dreamt up (I'm not kidding!): it appears that for multi-threaded applications, such as SQL Server, AMD-powered virtual machines are superior. However, even in this age-of-the-cloud, single-threaded applications are still running corporations. In that case (this is where things might be fuzzy), an Intel Xeon-powered virtual machine might be best. You might think that single-threaded applications are a thing of the past, but I recently witnessed the negative effect on performance of one of those – no matter what virtual or physical hardware was thrown at it.

The final element of the equation will be cost. I have no idea how the cost of the EPYC-powered machines will compare with the Xeon-powered ones. I do know that the AMD processor is cheaper and offers more threads per socket, and it should require less power. That should make it a cheaper machine to run, but higher consumption of IOs per machine might increase the cost to the hosting company (Azure). I guess we’ll know soon enough when the pricing pages are updated.

Migrating Azure Firewall To Availability Zones

Microsoft recently added support for availability zones to Azure Firewall in regions that offer this higher level of SLA. In this post, I will explain how you can convert an existing Azure Firewall to availability zones.

Before We Proceed

There are two things you need to understand:

  1. If you have already deployed and configured Azure Firewall then there is no easy switch to turn on availability zones. What I will be showing is actually a re-creation.
  2. You should do a “dress rehearsal” – test this process and validate the results before you do the actual migration.

The Process

The process goes as follows:

  1. Plan a maintenance window when the Azure Firewall (and dependent communications) will be unavailable for 1 or 2 hours. Really, this should be very quick but, as Scotty told Geordi La Forge, a good engineer overestimates the effort, leaves room for the unexpected, and hopefully looks like a hero if all goes to the unspoken plan.
  2. Freeze configuration changes to the Azure Firewall.
  3. Perform a backup of the Azure Firewall.
  4. Create a test environment in Azure – ideally a dedicated subscription/virtual network(s) minus the Azure Firewall (see the next step).
  5. Modify the JSON file to include support for availability zones.
  6. Restore the Azure Firewall backup as a new firewall in the test environment.
  7. Validate that the new firewall has availability zones and that the rules configuration matches that of the original.
  8. Confirm & wait for the maintenance window.
  9. Delete the Azure Firewall – yes, delete it.
  10. Restore the Azure Firewall from your modified JSON file.
  11. Validate the restore.
  12. Celebrate – you have an Azure Firewall that supports multiple zones in the region.

Some of the Technical Bits

The processes of backing up and restoring the Azure Firewall are covered in my post here.

The backup is a JSON export of the original Azure Firewall, describing how to rebuild and re-configure it exactly as is – without support for availability zones. Open that JSON and make two changes.

The first change is to make sure that the API for deploying the Azure Firewall is up to date:
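
In the exported template, that means checking the apiVersion on the azureFirewalls resource. A sketch of what it looks like (the rest of the resource's properties are omitted here, and the version string shown is just the one that introduced zones support around the time of writing – use whatever current version Microsoft documents):

```json
"resources": [
    {
        "type": "Microsoft.Network/azureFirewalls",
        "apiVersion": "2019-04-01",
        "name": "MyFirewall",
        "location": "westeurope"
    }
]
```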

The next change is to instruct Azure which availability zones (numbered 1, 2, and 3) you want to use in the region:
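
That is a zones array at the top level of the azureFirewalls resource (a sibling of name and location). A sketch – list only the zone numbers you want if you are targeting fewer zones:

```json
"zones": [
    "1",
    "2",
    "3"
],
```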

And that's that. When you deploy the modified JSON, the new Azure Firewall will exist in all three zones.

Note that you can use this method to place an Azure Firewall into a single specific zone.

Costs Versus SLAs

A single zone Azure Firewall has a 99.95% SLA. Using 2 or 3 zones will increase the SLA to 99.99%. You might argue "what's the point?". I've witnessed a data center (actually, it was a single storage cluster) in an Azure region go down. That can have catastrophic results on a service. It's rare but it's bad. If you're building a network where the Azure Firewall is the centre of security, then it becomes mission critical and should, in my opinion, span availability zones, not for the contractual financial protections in an SLA but for protecting mission critical services. That protection comes at a cost – you'll now incur the micro-costs of data flows between zones in a region. From what I've seen so far, that's a tiny number, and a company that can afford a firewall will easily absorb that relatively low extra cost.

Working From Home – The Half Year Update

It’s July now, and it’s just over half a year (just over 7 months to be more accurate) since I started a new job where I work from home. Here are some thoughts and lessons from my experience.

The Change

In my previous job, I was mostly in the office. I had a certain amount of flexibility where I could work from home to get things done in peace and quiet, away from the noise of an open-plan office, but mostly I commuted (by car, because there is no public transport option from here to the outskirts of Dublin where most private sector employees work) with my wife every morning and evening. We worked together – having met at the office – we were “podmates” for years before she moved desks and we started going out. I was happy in that job, and then along came Innofactor Norway who offered me a job. A big thing for me was not to travel – we have a young family and I don’t want to miss life. It was agreed and I changed jobs, now working as a Principal Consultant, working mostly with Azure networking and security with large clients and some internal stuff too.

Work Environment

This is not my first time having a work-from-home job. I worked with a small company before where they eliminated the office from expenses to keep costs down. It suited everyone from the MD down that we could work from home or on the road – one salesman used to camp out in a hotel in west Dublin, order coffee all day, and use their WiFi while having meetings close to potential customers' offices. I worked from my home in the midlands. I was single then and worked long hours. I remember one evening when I'd ordered a pizza. The poor delivery guy was stuck talking to me – I hadn't spoken to anyone in a week – and he was trying to get away from this overly chatty weirdo. I had a sort-of-office; it was a mess. I ended up getting into the habit of working from the sitting room – which was not good. It was uncomfortable and there were entertainment distractions.

An important part of this job was going to be the work environment. I have an office in the house, but I had rarely used it. A lot of my writing work for Petri or even writing courses for Cloud Mechanix was actually done on the road, sometimes literally while in the car and waiting for my daughter outside ballet or gymnastics classes. The office needed work, so I cleared it out completely, painted and redecorated, and then set up all the compute from scratch. A nice fast PC with dual monitors was installed. I added a smart TV from the new year's sales to the wall for YouTube/conference streams, as well as an extra screen (the MS adapter is much better than built-in screen casting). For work I bought a new Surface Laptop 2 (black, 16 GB RAM, i7) with the Surface Dock. And the folks in my old job gave me a present of a StarTech 2-port KVM switch, to allow me to share the keyboard and monitors between my NUC PC and Surface Laptop. The Laptop is my primary machine, with Microsoft Whiteboard and the pen being hugely important presenting tools. The old desk is gone – it was purchased 3 years ago as a temporary desk when we moved into our house. I replaced it with a white IKEA L-shaped desk because the old one was squeaky and annoying me during Teams calls. Next to be replaced will be the chair.

Teams & Browser

The two tools I use the most are Microsoft Teams and the browser.

Teams is the primary communications tool at work. Very little internal comms use email. Email tends to be for external comms or some sort of broadcast type of message. I work with another Irish MVP, Damian Flynn. We’ve worked on a couple of big projects together, which has been awesome, and it’s not unusual for us to have a window open to each other for most of the day, even if we’re working on different things. It’s like we’re “podmates” but with half the width of Ireland between us. I’ve really noticed the absence this week – Damian is on vacation and most of Norway is on vacation too. I’m starting my second week with barely any conversation during the workday.

Working in the cloud, obviously the browser is important. I started off in Chrome. Then I discovered identity containers in Firefox and I switched. However, I had to use Edge too – our time keeping (how we bill each hour) is done in CRM and it won't work in a non-MS browser. I was having problems with Teams randomly dropping calls. I was even considering switching to Slack/Skype (consumer) for my work with Damian. Then I realised that I was running out of RAM and that was killing Teams calls. The culprit? Firefox was eating many GBs of RAM. I had started to play with the Chromium Edge preview (aka ChrEdge) and was impressed by the smoothness of it (particularly scrolling for reading). When I realised how it had implemented identities (which is not perfect) I made the switch. Now ChrEdge is the only browser running all of the time and Teams is not losing calls.

Of course, I do a lot of JSON, so VS Code is nearly always open too. With that, the Git client, and Azure DevOps, I have a way of doing things at scale, repeatedly, and collaboratively.

Work Habits

I am quite strict with myself, as if I am in an office, even though "my office" is halfway up the west coast of Norway. I've heard presenters on home working talk about getting on a bike or in a car to drive around the block and "come to the office" with a work mentality. I'll be honest, I might not even put on pants in the morning (kidding!) but I do clock in at 8:00 every morning. By the way, Norwegians start work before most of Europe is even awake – I'm the "late riser" in the company. Sure, it's 9am Norwegian time when I start, but they've already been at work for hours by then. I have a routine in the morning, getting the kids ready, having breakfast, and bringing a coffee into my office. I have a morning start routine when there isn't an immediate call, and then I'm straight into it. My lunch routine involves a bike ride or walk, weather permitting, and a quick lunch, and then I work the rest of my hours. I try to work longer hours at the start of the week so I can finish early on Friday – it's amazing how a 2 day weekend feels like 3 days when you can finish just a little bit earlier.

More Than Just Work

Being at home means I'm also here for my kids. I can pick up our youngest earlier from day care. Our eldest can be brought to training or football games – and for checkups after breaking her arm – without taking time off from work. All are nice perks of working from home and having flexible hours and a flexible employer. This week, my wife is away at Microsoft Inspire. Without the time waste of a commute, I can use that time to do work around the house and keep on top of things. Soon enough, our youngest will start pre-school and I will be able to bring her there and back again. And when school starts, she'll be walked to the end of our road, where the school is, and back again. Even when things are normal, there are times when I'll have dinner started before my wife gets home – my culinary knowledge and skills are limited so I won't subject my poor family to me cooking every day!

I've mentioned that I am using my lunch break for exercise. I'm overweight – like many in the business – a byproduct of sitting 8-10 hours a day. I started cycling 2-3 times a week in the spring of last year and put the bike away for the winter. It came out again this spring, and I was on it 2 times per week. I needed more time on it, so I've started going out every day. Sometimes I'll do a long/fast walk to break it up, giving me a chance to catch up on podcasts and audio books.

More Productive & Less Stress

With less time wasted in traffic jams and more time focused on doing productive work, I am sure that I am more productive at home. Study after study has documented how bad open plan offices are. In the quiet of my office, I can focus and get stuff done. In fact, this time led to me doing some business-changing work with Damian. I'm less stressed than ever too.

If you can find a way to work, have reliable broadband, and have an employer that has realized how a happier employee is a more productive one, then I couldn’t recommend this style of working enough.

 


Understanding How Azure Application Gateway Works

In this post, I will explain how things such as frontend configurations, listeners, HTTP settings, probes, backend pools, and rules work together to enable service publication in the Azure Web Application Gateway (WAG)/Web Application Firewall (WAF).

Introduction

The WAF/WAG is a scary beast at first. When you open one up there are just so many settings to be tweaked. If you are publishing just a simple test HTTP server, it's easy: you populate the default backend pool and things just start to work. But if you want HTTPS, or to service many pools/sites, then things get complicated. And frustratingly slow 🙂 – things have improved in v1, and v2 is significantly faster to configure, although it has architectural limitations (a forced public IP address and lack of support for route tables) that prevent me from using v2 in my large network deployments. Hopefully, the following text will simplify things by explaining what all the pieces do and how they work together.

The below is not feature complete, and things will change in the future. But for 99% of you, this should (hopefully) be helpful.

Backend Pool

The backend pool describes a set of machines/services that will work together. The members of a backend pool must all be of one of these types:

  • IP address/hostname: a common choice in large Azure deployments – you can span peering connections to other VNets
  • Virtual machine: Select a machine from the same VNet as the WAG/WAF
  • VMSS: Virtual machine scale sets in the same VNet as the WAG/WAF
  • App Services: In the same subscription as the WAG/WAF

From here on out, I’ll be using the term “web server” to describe the above.

Note that these are the machines that host your website/service. They will all run the same website/service. And you can configure an optional custom probe to test the availability of the service on these machines.
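
For illustration, this is roughly how a backend pool is defined with Azure PowerShell before being attached to the gateway – the pool name and IP addresses are made up:

```powershell
# A sketch only - the pool name and backend IP addresses are placeholders.
$pool = New-AzApplicationGatewayBackendAddressPool `
    -Name "wwwServersPool" `
    -BackendIPAddresses 10.10.1.4, 10.10.1.5, 10.10.1.6
```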

(Optional) Health Probe

You can create an HTTP/HTTPS probe to do deeper probe tests of a service running on a backend pool. The probe is configured for HTTP or HTTPS and tests a hostname on the web server. You specify a path on the website, a frequency, a timeout, and an allowed number of retries before designating a website on a web server as being unhealthy and no longer a candidate for load balancing.
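
Here's a rough Azure PowerShell sketch of such a probe – the hostname, path, and thresholds are examples only:

```powershell
# A custom HTTP probe - tweak the interval/timeout/threshold to suit the service.
$probe = New-AzApplicationGatewayProbeConfig `
    -Name "wwwHealthProbe" `
    -Protocol Http `
    -HostName "www.aidanfinn.com" `
    -Path "/health" `
    -Interval 30 `
    -Timeout 30 `
    -UnhealthyThreshold 3
```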

HTTP Setting

The HTTP setting configures how the WAG/WAF will talk to the members of the backend pool. It does not configure how clients talk to the site (Listener). So anything you see below here is for configuring WAG/WAF to web server communications (see HTTPS).

  • Control cookie-based affinity for load balancing
  • Configure connection draining when a machine is removed from a backend pool
  • Specify if this is for an HTTP or an HTTPS connection to the webserver. This is for end-to-end encryption.
    • For HTTPS, you will upload a certificate that will match the web servers’ certificate.
  • The port that the web server is listening on.
  • Override the path
  • Override the hostname
  • Use a custom probe

Remember that the above HTTPS setting is not required for a website to be published with SSL. It is only required to ensure that encryption continues from the WAG/WAF to the web servers.

Frontend IP Configuration

A WAG/WAF can have public or private frontend IP addresses – the variation depends on whether you are using V1 (you have a choice on the mix) or V2 (you must use public and private). The public frontend is a single public IP address used for publishing services publicly. The private frontend is a single virtual network address used for internal service publication, requiring virtual network connectivity (virtual network, VPN, ExpressRoute, etc.).

The DNS records for your sites will point at the frontend IP address of the WAG/WAF. You can use third-party or Azure DNS – Azure DNS has the benefit of being hosted in every Azure region and in edge sites around the world so it is faster to resolve names than some DNS hoster with 3 servers in a single continent.

A single frontend can be shared by many sites. www.aidanfinn.com, www.cloudmechanix.com and www.joeeleway.com can all point to the same IP address. The hostname configuration that you have in the Listener will determine what happens to the incoming traffic afterwards.

Listener

A Listener is configured to listen for traffic destined to a particular hostname and port number and forward it, eventually, to the correct backend pool. There are two kinds of listener:

  • Basic: For very simple configurations where a site has exclusive ownership over a port number on one of the frontends. Typically this is for point solutions where a WAG/WAF is dedicated to a service.
  • Multi-Site: A listener shares a frontend configuration with other listeners, and is looking for traffic destined to a specific hostname/port/protocol.

Note that the Listener is where you place the certificate to secure client > WAG/WAF communications. This is known as SSL offloading. If you enable HTTPS you will place the "site certificate" on the WAG/WAF via the Listener. You can optionally re-encrypt traffic to the webserver from the WAG/WAF using the previously discussed HTTP Setting. WAGv2/WAFv2 have an unsupported preview to use certs that are securely stored in Key Vault.

The configuration of a basic listener is:

  • Frontend
  • Frontend port
  • HTTP or HTTPS protocol
    • The certificate for securing client > WAG/WAF traffic
  • Optional custom error pages

The multi-site listener adds an extra configuration: hostname. This is because now the listener is sharing the frontend and is only catching traffic for its website. So if I want 3 websites on my WAG/WAF sharing a frontend, I will have 3 x HTTPS listeners and maybe 3 x HTTP listeners.
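
A rough Azure PowerShell sketch of a multi-site HTTPS listener – it assumes the frontend IP configuration, frontend port, and SSL certificate objects ($frontendIp, $port443, $siteCert) were created earlier in the script:

```powershell
# A multi-site listener that only catches HTTPS traffic for www.aidanfinn.com.
$listener = New-AzApplicationGatewayHttpListener `
    -Name "wwwAidanfinnHttpsListener" `
    -Protocol Https `
    -FrontendIPConfiguration $frontendIp `
    -FrontendPort $port443 `
    -HostName "www.aidanfinn.com" `
    -SslCertificate $siteCert
```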

Rules

A rule glues together the configuration. A basic rule is pretty easy:

  1. Traffic comes into a Listener
  2. The HTTP Setting determines how to forward that traffic to the backend pool
  3. The Backend Pool lists the web servers that host the site

A path-based rule allows you to extend your site across many backend pools. You might have a set of content for /media on pool1. Therefore all www.aidanfinn.com/media content is pulled from pool1. All video content might be on www.aidanfinn.com/video, so you'll redirect /video to pool2. And so on. And you can have individual HTTP settings for each redirection.
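
As a sketch, here's what gluing those pieces together with a basic rule looks like in Azure PowerShell – the listener, HTTP setting, and pool variables are assumed to exist already:

```powershell
# A basic rule: listener -> HTTP setting -> backend pool.
$rule = New-AzApplicationGatewayRequestRoutingRule `
    -Name "wwwAidanfinnRule" `
    -RuleType Basic `
    -HttpListener $listener `
    -BackendHttpSettings $httpSetting `
    -BackendAddressPool $pool
```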

My Tips

  • There's nothing like actually setting this up at scale to try this out. You will need a few DNS names to work with.
  • Remember to enable the protection mode of WAF. I have audited deployments and found situations where people thought they had Layer-7 security but only had the default “alert-only” configuration of WAFv1.
  • In large environments, don’t forget to ensure that the NSGs protecting any webservers allow traffic in from the WAG/WAF’s subnet into the web servers on the port(s) specified in the HTTP Setting(s). Also ensure that any guest OS firewall is similarly configured.
  • Possibly the biggest issue you will have is with devs not assigning hostnames to websites in their webservers. If you’re using shared WAGs/WAFs you must use multi-site listeners and the websites should be configured with the hostname.
  • And the biggest tip I can give is to work out a naming standard for each of the above components so you know what piece is associated with what site. I can’t share what we’re using at work, but we have some big configurations and they are very easy to troubleshoot because of how we have named things.

Azure Lighthouse–Enabling Centralized Management of Many Azure Tenants

In this post, I will discuss a new Azure feature called Lighthouse. With this service, you can delegate permissions to "customer" Azure deployments across many Azure tenants to staff in a central organization such as corporate IT or a managed service provider (Microsoft partner).

The wonderful picture of the Hook Head Lighthouse from my home county of Wexford (Ireland) is by Ana & Michal. The building dates back to just after the Norman invasion of Ireland and is the second oldest operating lighthouse in the world.

Here are some useful web links:

In short, what you get with Lighthouse is a way to see/manage, within the scope of the permissions of your assigned role, deployments across many tenants. This solves a major problem with Azure. Microsoft is a partner-driven engine. Without partners, Microsoft is nothing. There's a myth that Microsoft offers managed services in the cloud: that's a fiction created by those that don't know much about the actual delivery of Microsoft services. Partners deliver, manage, and provide the primary forms of support for Microsoft services for Microsoft's customers. However, Azure had a major problem – each customer is a tenant and, until last night, there was no good way to bring those customers under the umbrella of a single management system. You had hacks such as guest user access, which didn't unify management – native management tools were restricted to the boundaries of the single customer tenant. And third-party tools – sorry, but they'll not keep up with the pace of Azure.

Last night, Microsoft made Lighthouse available to everyone (not just partners, as the headlines will suggest!). With a little up-front work, you can quickly and easily grant/request access to deployments or subscriptions in other tenants (internal or external customers) and have easy/quick/secure single-sign-on access from your own tenant. What does that look like? You sign in once, with your work account, ideally using MFA (a requirement now for CSP partners). And then you can see everything – every tenant, every subscription, every resource group that you have been granted access to. You can use Activity Log, Log Analytics workspaces, Security Center, and Azure Monitor across every single resource that you can see.

The mechanics of this are pretty flexible. An “offer” can be made in one of two ways to a customer:

  • JSON: You describe your organization and who will have what access. The JSON is deployed in the customer subscription and access is granted after a few moments – it took a couple of minutes for my first run to work.
  • Azure Marketplace: You can advertise an offer in the Azure Marketplace. Note that a Marketplace offer can be private.

An offer is made up of a description of:

  • The service you are offering: the name, your tenant (service provider)
  • The authorizations: who or what will have access, and what roles (from your tenant) can be used. Owner is explicitly blocked by Microsoft.

Here's a simple example of a JSON deployment where a group from the service provider tenant will be granted Contributor access to the customer subscription.

I need to gather a bit of information:

  • mspName: The name of the managed services provider. Note that this is a label.
  • mspOfferDescription: The name of the service being offered.
  • managedByTenantId: The Directory ID of the managed services provider (Azure Portal > Azure Active Directory > Properties > Directory ID)
  • Authorizations: A description of each entity (user/group/service principal) from the MSP tenant being granted access to the customer deployment
    • principalId: The ID of the user, group, or service principal. Remember – groups are best!
    • principalIdDisplayName: A label for the current principal – what you want to describe this principal as for your customer to see
    • roleDefinitionId: The GUID of the role that will grant permissions to the principal, e.g. Contributor. PowerShell > (Get-AzRoleDefinition -Name '<roleName>').Id

Armed with that information you can populate the fields in a JSON parameters file for delegating access to a subscription. Here’s a simple example:
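
A minimal sketch of such a parameters file – the names, descriptions, and GUIDs are placeholders (the roleDefinitionId shown happens to be the built-in Contributor role), and the exact parameter names should match whichever Microsoft sample template you deploy with:

```json
{
    "$schema": "https://schema.management.azure.com/schemas/2018-05-01/subscriptionDeploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "mspName": {
            "value": "Contoso Managed Azure Service"
        },
        "mspOfferDescription": {
            "value": "Managed networking and security for Azure deployments"
        },
        "managedByTenantId": {
            "value": "00000000-0000-0000-0000-000000000000"
        },
        "authorizations": {
            "value": [
                {
                    "principalId": "11111111-1111-1111-1111-111111111111",
                    "principalIdDisplayName": "Azure Operations Team",
                    "roleDefinitionId": "b24988ac-6180-42a0-ab88-20f7382dd24c"
                }
            ]
        }
    }
}
```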

And then you can deploy the above with the JSON file for delegating access to a subscription:

  1. Sign into the customer tenant using PowerShell
  2. Run the following:
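
A sketch of that deployment, assuming the template and parameters files from Microsoft's Lighthouse samples are in the current folder (the file names, location, and subscription name are all examples):

```powershell
# Sign into the CUSTOMER tenant and select the subscription being delegated.
Connect-AzAccount -Tenant "customer.onmicrosoft.com"
Set-AzContext -Subscription "Customer Production"

# Deploy the delegation at subscription scope.
New-AzDeployment -Name "LighthouseOnboarding" `
    -Location "westeurope" `
    -TemplateFile ".\delegatedResourceManagement.json" `
    -TemplateParameterFile ".\delegatedResourceManagement.parameters.json" `
    -Verbose
```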

Give it a few minutes and things will be in place:

  • The service provider will appear in Service Providers in the Azure Portal for the customer.
  • The customer will appear in My Customers in the Azure Portal for the service provider.
  • Anyone from the service provider's tenant in the scope of the authorization (e.g. a member of a listed group) will have access to the customer's subscription described by the role (roleDefinitionId).
  • Any delegated admins from the service provider can see, operate, and manage the customers' resources in the Azure Portal, Azure tools, CLI/PowerShell, etc., as if they were in the same tenant as the service provider.

Once deployed, things appear to be pretty seamless – but it is early days and I am sure that we will see weirdness over time.

The customer can fire the service provider by deleting the delegation from Service Providers. I have not found a way for the service provider to fire the customer yet.

Backing Up Azure Firewall

In this post, I will outline how you can back up your Azure Firewall, enabling you to rebuild it in case it is accidentally/maliciously deleted or re-configured by an authorized person.

With the Azure Firewall adding new features, we should expect more customers to start using it. And if you are using it like I do with my customers, it’s the centre of everything and it can quickly contain a lot of collections/rules which took a long time to write.

Wait – what new features? Obviously, Threat Detection (using the MS security graph) is killer, but support for up to 100 public IP addresses was announced and is imminent, availability zones are there now for this mission critical service, application rule FQDN support was added for SQL databases, and HDInsight tags are in preview.

So back on topic: how do I back up the Azure Firewall? It's actually pretty simple. You will need to retrieve your firewall's resource ID:
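
For example, with Az PowerShell – the firewall and resource group names are obviously placeholders:

```powershell
# Grab the resource ID of the existing firewall.
$azFirewallId = (Get-AzFirewall -Name "MyFirewall" -ResourceGroupName "MyFirewall-RG").id
```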


Then you will export a JSON copy of the firewall:
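
Something like this – the -SkipAllParameterization switch keeps the original names and values hard-coded in the exported file:

```powershell
# Export the firewall resource as a redeployable JSON template.
Export-AzResourceGroup -ResourceGroupName "MyFirewall-RG" `
    -Resource $azFirewallId `
    -SkipAllParameterization `
    -Path "C:\Backups\MyFirewall.json"
```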


And that’s the guts of it! To do a restore you simply redeploy the JSON file to the resource group:
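
Again, a sketch with placeholder names:

```powershell
# Redeploy the exported template to recreate the firewall as it was at backup time.
New-AzResourceGroupDeployment -Name "RestoreAzureFirewall" `
    -ResourceGroupName "MyFirewall-RG" `
    -TemplateFile "C:\Backups\MyFirewall.json" `
    -Verbose
```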


I’ve tested a delete and restore and it works. The magic here is using -SkipAllParameterization in the resource export to make the JSON file recreate exactly what was lost at the time of the backup/export.

If you wanted to get clever, you could wrap up the backup cmdlets in an Azure Automation script. Add some lines to alter the backup file name (date/time), and copy the backup to blob storage in a GPv2 storage account (with Lifecycle Management for automatic blob tiering and a protection policy to prevent deletion). And then you would schedule the automation to run every day.
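
A rough sketch of that idea – the names, paths, storage account, and container are placeholders, and authentication inside the Automation account (e.g. a Run As account) is assumed to be handled already:

```powershell
# Build a date/time-stamped backup file name.
$fileName = "MyFirewall-$(Get-Date -Format 'yyyyMMdd-HHmm').json"

# Export the firewall (same cmdlets as above).
$azFirewallId = (Get-AzFirewall -Name "MyFirewall" -ResourceGroupName "MyFirewall-RG").id
Export-AzResourceGroup -ResourceGroupName "MyFirewall-RG" -Resource $azFirewallId `
    -SkipAllParameterization -Path ".\$fileName"

# Copy the backup to blob storage in a GPv2 storage account.
$storage = Get-AzStorageAccount -ResourceGroupName "Backups-RG" -Name "fwbackupstorage"
Set-AzStorageBlobContent -File ".\$fileName" -Container "firewall-backups" `
    -Blob $fileName -Context $storage.Context
```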

Azure Bastion For Secure SSH/RDP in Preview

Microsoft has announced a new preview of a platform-based jumpbox called Azure Bastion for providing secure RDP or SSH connections to virtual machines hosted in Azure.

Secure Remote Connections

Most people that are using The Cloud are using virtual machines, and one of the great challenges for them is secure remote access. You need RDP or SSH to be able to run these machines in the real world.

Remember: for 99.9% of customers, servers are not cattle, they are sacred cows.

Just opening up RDP or SSH straight through a public IP address is bad – hopefully you have an NSG in place, but even that’s bad. If you enable Standard Tier Security Center, the alerts will let you know how bad pretty quickly. And if the recent scare about the RDP vulnerability didn’t wake you up to this, then maybe you deserve to have someone else’s bot farm or a bitcoin mine running in your network.

There are ways that you can secure things, but they all have their pluses and minuses.

VPN

The real reason that we have point-to-site VPN in the Azure virtual network gateway is as an admin entry point to the virtual network.

The clue is in the maximum number of simultaneous connections, which is 128 – way too low to consider as an end-user solution for a Fortune 1000, which is who Microsoft really does its planning for.

If you have supported end user VPN then you know that it’s right up there with password resets for helpdesk ticket numbers, even with IT people like developers. Don’t go here – it won’t end well.

Just-in-Time VM Access

JIT VM Access is a feature of Security Center Standard Tier. It modifies your NSG rules to deny managed protocols such as RDP/SSH (the deny rules are stupidly made as low priority so they don’t override any allow rules!).

When you need to remote onto a VM, an NSG rule is added for a managed amount of time to allow remote access via the selected protocol from a specific source IP address.

So, if it’s all set up right, you deny remote access to virtual machines most of the time. But you will open direct access. And the way JIT VM Access manages the rules now is wonky, so I would not trust it.

An RDP Jumpbox

This is an old method – a single virtual machine, or maybe a few of them, are made available for direct access. They are isolated into a dedicated subnet. You remote into a jumpbox, and from there, you remote into one of your application/data virtual machines.

Unfortunately, it's still straight RDP/SSH into a machine that is directly accessible on the Internet. So in the remoting protocol vulnerability scenario, you are still vulnerable at the application layer. You could combine this with JIT VM Access, but now normal daily operations are going to be a drag and I guarantee you that people will invest time to undermine network security. Also, you are limited to 2 RDS connections per jumpbox without investing in a larger RDS (machines + licensing) solution.

Guacamole

This one is relatively new to me. At first it looked awesome. It's an HTTPS-based service that allows you to proxy into Linux or Windows virtual machines via RDP or SSH.

All looked good until you started running Windows Server 2016 or later in your virtual machines and you needed NLA for secure connections via RDP. Then it all fell apart. The solution requires you to either disable NLA in the guest OS (boo!) or to hard-code a username/password with local logon rights for your guest OSs into the Guacamole server (double-boo!).

Azure Bastion

In case you don’t know this, a bastion host is another name for a jumpbox – an isolated machine that you bounce through. In this case, Bastion is a service that is accessible via the Azure Portal. You sign into the portal, click Connect and use the Bastion service to connect to a Linux or Windows virtual machine via SSH/RDP in the Portal. The virtual machine does not require a public IP address or a “NAT rule”, but it’s still SSH/RDP.

On the downside:

  • There’s no multi-factor authentication (MFA)
  • It requires that you sign into the Azure Portal – many people working in the guest OS might not even have those rights!
  • VNet peering is not supported – so larger enterprises are ruled out here … no one in their right mind will deploy 500 bastion hosts (one per VNet) in a large enterprise.

Microsoft did say that these things will be worked on, but when? After GA, which based on the time of year I guess will be just before/after Ignite in early November?

In my opinion, Bastion is the right idea, but more of the backlog should have been included in the minimum viable product.

A Gateway to a Better Solution

If you are a Citrix or an RDS person then you've been screaming for the last 5 minutes. Because you've been using something for years that most people still don't know is possible. Both Citrix and RDS have the concept of an SSL gateway.

In the case of RDS, we can deploy one or more (load balanced) Windows Server virtual machines with the RDS Gateway role. If we combine that with NPS and Azure AD, we can also add MFA. With a simple tweak to the Remote Desktop Connection client (MSTSC.EXE), we can RDP to a Windows machine behind the RDS Gateway. The connection from the client to the gateway is pre-authenticated, x.509 certificate protected, HTTPS traffic encapsulating the RDP stream. That connection terminates at the RDS Gateway and is then forwarded as RDP to the desired Windows Server virtual machine behind it.

Unlike the previous jumpbox solution:

  • This can be a low-end machine, such as a B-Series.
  • It can scale out using a load balancer
  • Many people can relay through a single jumpbox machine.
  • You won’t need RDS licensing at all, not even to scale out to more than 2 users per gateway machine.

There's no SSH here, though, so Linux is a problem.

Opinion

We don’t really have a complete solution right now. Azure Bastion probably will be the best one in the long-run, but it has so many missing features that I couldn’t consider it now. For Windows, an RDS Gateway is probably best, and for Linux, a Guacamole server might be best.

What do you think?

Webinar – Getting More Performance From Azure VMs

I will be doing a webinar later today for the European SharePoint Office 365 & Azure Community (from the like-named conference). The webinar is at 14:00 UK/Irish, 15:00 CET, and 09:00 EST. Registration is here.

Title: Getting More Performance from Azure Virtual Machines

Speaker: Aidan Finn, MVP, Ireland

Date and Time: Wed, May 1, 2019 3:00 PM – 4:00 PM CEST

Webinar Description:  You’ve deployed your shiny new application in the cloud, and all that pride crashes down when developers and users start to complain that it’s slow. How do you fix it? In this session you’ll learn to understand what Azure virtual machines can offer, how to pick the right ones for the right job, and how to design for the best possible performance, including networking, storage, processor, and GPU.

Key benefits of attending:
– Understand virtual machine design
– Optimise storage performance
– Get more from Azure networking