Migrating Azure Firewall To Availability Zones

Microsoft recently added support for availability zones to Azure Firewall in regions that offer this higher level of SLA. In this post, I will explain how you can convert an existing Azure Firewall to availability zones.

Before We Proceed

There are two things you need to understand:

  1. If you have already deployed and configured Azure Firewall then there is no easy switch to turn on availability zones. What I will be showing is actually a re-creation.
  2. You should do a “dress rehearsal” – test this process and validate the results before you do the actual migration.

The Process

The process goes as follows:

  1. Plan a maintenance window when the Azure Firewall (and dependent communications) will be unavailable for 1 or 2 hours. Really, this should be very quick but, as Scotty told Geordi La Forge, a good engineer overestimates the effort, leaves room for the unexpected, and hopefully looks like a hero if all goes to the unspoken plan.
  2. Freeze configuration changes to the Azure Firewall.
  3. Perform a backup of the Azure Firewall.
  4. Create a test environment in Azure – ideally a dedicated subscription/virtual network(s) minus the Azure Firewall (see the next step).
  5. Modify the JSON file to include support for availability zones.
  6. Restore the Azure Firewall backup as a new firewall in the test environment.
  7. Validate that the new firewall has availability zones and that the rules configuration matches that of the original.
  8. Confirm & wait for the maintenance window.
  9. Delete the Azure Firewall – yes, delete it.
  10. Restore the Azure Firewall from your modified JSON file.
  11. Validate the restore.
  12. Celebrate – you have an Azure Firewall that supports multiple zones in the region.

Some of the Technical Bits

The processes of backing up and restoring the Azure Firewall are covered in my post here.

The backup is a JSON export of the original Azure Firewall, describing how to rebuild and re-configure it exactly as it was – without support for availability zones. Open that JSON and make two changes.

The first change is to make sure that the API for deploying the Azure Firewall is up to date:
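Here is a minimal sketch of the firewall resource in the exported template – the exact API version is an assumption on my part, so use a recent Microsoft.Network version that supports the zones property, and note that the name and location are placeholders:

    "resources": [
        {
            "apiVersion": "2019-04-01",
            "type": "Microsoft.Network/azureFirewalls",
            "name": "MyFirewall",
            "location": "westeurope",
            ...
        }
    ]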

The next change is to instruct Azure which availability zones (numbered 1, 2, and 3) you want to use in the region:
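Continuing the sketch above, you add a zones array to the firewall resource. The values below assume you want all three zones; use a single entry (e.g. just "1") to pin the firewall to one zone:

    {
        "apiVersion": "2019-04-01",
        "type": "Microsoft.Network/azureFirewalls",
        "name": "MyFirewall",
        "location": "westeurope",
        "zones": [
            "1",
            "2",
            "3"
        ],
        "properties": {
            ...
        }
    }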

And that’s that. When you deploy the modified JSON, the new Azure Firewall will exist in all three zones.

Note that you can use this method to place an Azure Firewall into a single specific zone.

Costs Versus SLAs

A single zone Azure Firewall has a 99.95% SLA. Using 2 or 3 zones increases the SLA to 99.99%. You might ask, “what’s the point?” I’ve witnessed a data center (actually, it was a single storage cluster) in an Azure region go down. That can have catastrophic results for a service. It’s rare, but it’s bad. If you’re building a network where the Azure Firewall is the centre of a secure architecture, then it becomes mission critical and should, in my opinion, span availability zones – not for the contractual financial protections in an SLA, but for protecting mission critical services. That protection comes at a cost: you’ll now incur the micro-costs of data flows between zones in a region. From what I’ve seen so far, that’s a tiny number, and a company that can afford a firewall will easily absorb that relatively low extra cost.

Working From Home – The Half Year Update

It’s July now, and it’s just over half a year (just over seven months, to be more accurate) since I started a new job where I work from home. Here are some thoughts and lessons from my experience.

The Change

In my previous job, I was mostly in the office. I had a certain amount of flexibility, where I could work from home to get things done in peace and quiet, away from the noise of an open-plan office, but mostly I commuted (by car, because there is no public transport option from here to the outskirts of Dublin, where most private sector employees work) with my wife every morning and evening. We worked together – having met at the office, we were “podmates” for years before she moved desks and we started going out. I was happy in that job, and then along came Innofactor Norway, who offered me a job. A big thing for me was not to travel – we have a young family and I don’t want to miss out on life. It was agreed, and I changed jobs; I now work as a Principal Consultant, mostly on Azure networking and security with large clients, plus some internal stuff too.

Work Environment

This is not my first time having a work-from-home job. I worked for a small company before that eliminated the office from its expenses to keep costs down. It suited everyone from the MD down that we could work from home or on the road – one salesman used to camp out in a hotel in west Dublin, order coffee all day, and use their WiFi while having meetings close to potential customers’ offices. I worked from my home in the midlands. I was single then and worked long hours. I remember one evening when I’d ordered a pizza. The poor delivery guy was stuck talking to me – I hadn’t spoken to anyone in a week – as he tried to get away from this overly chatty weirdo. I had a sort-of office; it was a mess. I ended up getting into the habit of working from the sitting room, which was not good. It was uncomfortable and there were entertainment distractions.

An important part of this job was going to be the work environment. I have an office in the house but had rarely used it. A lot of my writing work for Petri, and even writing courses for Cloud Mechanix, was actually done on the road – sometimes literally in the car while waiting for my daughter outside ballet or gymnastics classes. The office needed work, so I cleared it out completely, painted and redecorated, and then set up all the compute from scratch. A nice fast PC with dual monitors was installed. I added a smart TV from the new year’s sales to the wall for YouTube/conference streams, as well as an extra screen (the MS adapter is much better than built-in screen casting). For work, I bought a new Surface Laptop 2 (black, 16 GB RAM, i7) with the Surface Dock. And the folks in my old job gave me a present of a StarTech 2-port KVM switch to allow me to share the keyboard and monitors between my NUC PC and the Surface Laptop. The Laptop is my primary machine, with Microsoft Whiteboard and the pen being hugely important presenting tools. The old desk – purchased 3 years ago as a temporary desk when we moved into our house – is gone. I replaced it with a white IKEA L-shaped desk because the old one was squeaky and was annoying me during Teams calls. Next to be replaced will be the chair.

Teams & Browser

The two tools I use the most are Microsoft Teams and the browser.

Teams is the primary communications tool at work. Very little internal comms use email. Email tends to be for external comms or some sort of broadcast type of message. I work with another Irish MVP, Damian Flynn. We’ve worked on a couple of big projects together, which has been awesome, and it’s not unusual for us to have a window open to each other for most of the day, even if we’re working on different things. It’s like we’re “podmates” but with half the width of Ireland between us. I’ve really noticed the absence this week – Damian is on vacation and most of Norway is on vacation too. I’m starting my second week with barely any conversation during the workday.

Working in cloud, obviously the browser is important. I started off in Chrome. Then I discovered identity containers in Firefox and I switched. However, I had to use Edge too – our time keeping (how we bill each hour) is done in CRM, and it won’t work in a non-MS browser. I was having problems with Teams randomly dropping calls. I was even considering switching to Slack/Skype (consumer) for my work with Damian. Then I realised that I was running out of RAM, and that was killing Teams calls. The culprit? Firefox was eating many GBs of RAM. I had started to play with the Chromium Edge preview (aka ChrEdge) and was impressed by the smoothness of it (particularly scrolling for reading). When I realised how it had implemented identities (which is not perfect), I made the switch. Now ChrEdge is the only browser running all of the time, and Teams is not losing calls.

Of course, I do a lot of JSON, so VS Code is nearly always open too. With that, a Git client, and Azure DevOps, I have a way of doing things at scale, repeatedly, and collaboratively.

Work Habits

I am quite strict with myself, as if I am in an office, even though “my office” is halfway up the west coast of Norway. I’ve heard presenters on home working talk about getting on a bike or in a car and going around the block to “come to the office” with a work mentality. I’ll be honest, I might not even put on pants in the morning (kidding!), but I do clock in at 8:00 every morning. By the way, Norwegians start work before most of Europe is even awake – I’m the “late riser” in the company. Sure, it’s 9am Norwegian time when I start, but they’ve already been at work for hours by then. I have a morning routine: getting the kids ready, having breakfast, and bringing a coffee into my office. When there isn’t an immediate call, I have a start-of-day routine, and then I’m straight into it. My lunch routine involves a bike ride or walk, weather permitting, and a quick lunch, and then I work the rest of my hours. I try to work longer hours at the start of the week so I can finish early on Friday – it’s amazing how a 2 day weekend feels like 3 days when you can finish just a little bit earlier.

More Than Just Work

Being at home means I’m also here for my kids. I can pick up our youngest earlier from day care. Our eldest can be brought to training or football games – and for checkups after breaking her arm – without taking time off work. All are nice perks of working from home and having flexible hours and a flexible employer. This week, my wife is away at Microsoft Inspire; without time wasted on a commute, I can do work around the house and keep on top of things. Soon enough, our youngest will start pre-school and I will be able to bring her there and back again. And when school starts, she’ll be walked to the end of our road, where the school is, and back again. Even when things are normal, there are times when I’ll have dinner started before my wife gets home – my culinary knowledge and skills are limited, so I won’t subject my poor family to my cooking every day!

I’ve mentioned that I am using my lunch break for exercise. I’m overweight – like many in this business – a byproduct of sitting 8-10 hours a day. I started cycling 2-3 times a week in the spring of last year and put the bike away for the winter. It came out again this spring, and I was on it twice a week. I needed more time on it, so I’ve started going out every day. Sometimes I’ll do a long/fast walk to break it up, giving me a chance to catch up on podcasts and audio books.

More Productive & Less Stress

With less time wasted in traffic jams and more time focused on productive work, I am sure that I am more productive at home. Study after study has documented how bad open plan offices are. In the quiet of my office, I can focus and get stuff done. In fact, this time led to me doing some business-changing work with Damian. I’m less stressed than ever too.

If you can find a way to work like this, have reliable broadband, and have an employer that has realized that a happier employee is a more productive one, then I couldn’t recommend this style of working enough.


Understanding How Azure Application Gateway Works

In this post, I will explain how things such as frontend configurations, listeners, HTTP settings, probes, backend pools, and rules work together to enable service publication in the Azure Web Application Gateway (WAG)/Web Application Firewall (WAF).

Introduction

The WAF/WAG is a scary beast at first. When you open one up, there are just so many settings to be tweaked. If you are publishing just a simple test HTTP server, it’s easy: you populate the default backend pool and things just start to work. But if you want HTTPS, or to service many pools/sites, then things get complicated. And frustratingly slow 🙂 – things have improved in v1, and v2 is significantly faster to configure, although it has architectural limitations (a forced public IP address and a lack of support for route tables) that prevent me from using v2 in my large network deployments. Hopefully, the following text will simplify things by explaining what all the pieces do and how they work together.

The below is not feature complete, and things will change in the future. But for 99% of you, this should (hopefully) be helpful.

Backend Pool

The backend pool describes a set of machines/services that will work together. All members of a backend pool must be of the same type, chosen from one of these:

  • IP address/hostname: a common choice in large Azure deployments – you can span peering connections to other VNets
  • Virtual machine: Select a machine from the same VNet as the WAG/WAF
  • VMSS: Virtual machine scale sets in the same VNet as the WAG/WAF
  • App Services: In the same subscription as the WAG/WAF

From here on out, I’ll be using the term “web server” to describe the above.

Note that these are the machines that host your website/service. They will all run the same website/service. And you can configure an optional custom probe to test the availability of the service on these machines.
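As a sketch, this is how a backend pool of IP addresses can be defined with Azure PowerShell – the pool name and addresses are hypothetical:

    # Define a backend pool of two web servers by IP address
    $pool = New-AzApplicationGatewayBackendAddressPool `
        -Name "pool01" `
        -BackendIPAddresses 10.0.1.4, 10.0.1.5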

(Optional) Health Probe

You can create an HTTP/HTTPS probe to do deeper tests of a service running on a backend pool. The probe is configured for HTTP or HTTPS and tests a hostname on the web server. You specify a path on the website, a frequency, a timeout, and an allowed number of retries before a website on a web server is designated as unhealthy and no longer a candidate for load balancing.
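A sketch of such a probe with Azure PowerShell – the probe name, path, and thresholds are hypothetical values:

    # Probe the site every 30 seconds; 3 failures mark the server unhealthy
    $probe = New-AzApplicationGatewayProbeConfig `
        -Name "probe01" `
        -Protocol Http `
        -HostName "www.aidanfinn.com" `
        -Path "/health" `
        -Interval 30 `
        -Timeout 30 `
        -UnhealthyThreshold 3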

HTTP Setting

The HTTP setting configures how the WAG/WAF will talk to the members of the backend pool. It does not configure how clients talk to the site (Listener). So anything you see below here is for configuring WAG/WAF to web server communications (see HTTPS).

  • Control cookie-based affinity for load balancing
  • Configure connection draining when a machine is removed from a backend pool
  • Specify if this is for an HTTP or an HTTPS connection to the webserver. This is for end-to-end encryption.
    • For HTTPS, you will upload a certificate that will match the web servers’ certificate.
  • The port that the web server is listening on.
  • Override the path
  • Override the hostname
  • Use a custom probe

Remember that the above HTTPS setting is not required for a website to be published with SSL. It is only required to ensure that encryption continues from the WAG/WAF to the web servers.
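As a sketch, a plain HTTP setting that uses the custom probe from the earlier sketch might look like this in Azure PowerShell (names and values are hypothetical):

    # How the WAG/WAF talks to the backend pool members
    $setting = New-AzApplicationGatewayBackendHttpSetting `
        -Name "setting01" `
        -Port 80 `
        -Protocol Http `
        -CookieBasedAffinity Disabled `
        -RequestTimeout 30 `
        -Probe $probe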

Frontend IP Configuration

A WAG/WAF can have public or private frontend IP addresses – the variation depends on whether you are using V1 (you have a choice on the mix) or V2 (you must use public and private). The public frontend is a single public IP address used for publishing services publicly. The private frontend is a single virtual network address used for internal service publication, requiring virtual network connectivity (virtual network, VPN, ExpressRoute, etc).

The DNS records for your sites will point at the frontend IP address of the WAG/WAF. You can use third-party or Azure DNS – Azure DNS has the benefit of being hosted in every Azure region and in edge sites around the world so it is faster to resolve names than some DNS hoster with 3 servers in a single continent.

A single frontend can be shared by many sites. www.aidanfinn.com, www.cloudmechanix.com and www.joeeleway.com can all point to the same IP address. The hostname configuration that you have in the Listener will determine what happens to the incoming traffic afterwards.

Listener

A Listener is configured to listen for traffic destined to a particular hostname and port number and forward it, eventually, to the correct backend pool. There are two kinds of listener:

  • Basic: For very simple configurations where a site has exclusive ownership over a port number on one of the frontends. Typically this is for point solutions where a WAG/WAF is dedicated to a service.
  • Multi-Site: A listener shares a frontend configuration with other listeners, and is looking for traffic destined to a specific hostname/port/protocol.

Note that the Listener is where you place the certificate to secure client > WAG/WAF communications. This is known as SSL offloading. If you enable HTTPS, you will place the “site certificate” on the WAG/WAF via the Listener. You can optionally re-encrypt traffic to the webserver from the WAG/WAF using the previously discussed HTTP Setting. WAGv2/WAFv2 have an unsupported preview for using certs that are securely stored in Key Vault.

The configuration of a basic listener is:

  • Frontend
  • Frontend port
  • HTTP or HTTPS protocol
    • The certificate for securing client > WAG/WAF traffic
  • Optional custom error pages

The multi-site listener adds an extra configuration: hostname. This is because the listener is now sharing the frontend and is only catching traffic for its website. So if I want 3 websites on my WAG/WAF sharing a frontend, I will have 3 x HTTPS listeners and maybe 3 x HTTP listeners.
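A sketch of a multi-site HTTPS listener in Azure PowerShell – $fip, $port, and $cert are assumed to be a previously created frontend IP configuration, frontend port, and SSL certificate:

    # Catch HTTPS traffic for one hostname on a shared frontend
    $listener = New-AzApplicationGatewayHttpListener `
        -Name "listener01" `
        -Protocol Https `
        -FrontendIPConfiguration $fip `
        -FrontendPort $port `
        -SslCertificate $cert `
        -HostName "www.aidanfinn.com"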

Rules

A rule glues the configuration together. A basic rule is pretty easy:

  1. Traffic comes into a Listener
  2. The HTTP Setting determines how to forward that traffic to the backend pool
  3. The Backend Pool lists the web servers that host the site

A path-based rule allows you to extend your site across many backend pools. You might have a set of content for /media on pool1; therefore, all www.aidanfinn.com/media content is pulled from pool1. All video content might be on www.aidanfinn.com/video, so you’ll direct /video to pool2. And so on. And you can have individual HTTP settings for each path.
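Here is a sketch of a basic rule in Azure PowerShell, gluing together the listener, HTTP setting, and backend pool from the earlier sketches:

    # Listener -> HTTP setting -> backend pool
    $rule = New-AzApplicationGatewayRequestRoutingRule `
        -Name "rule01" `
        -RuleType Basic `
        -HttpListener $listener `
        -BackendHttpSettings $setting `
        -BackendAddressPool $pool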

My Tips

  • There’s nothing like actually setting this up at scale to try it out. You will need a few DNS names to work with.
  • Remember to enable the protection mode of WAF. I have audited deployments and found situations where people thought they had Layer-7 security but only had the default “alert-only” configuration of WAFv1.
  • In large environments, don’t forget to ensure that the NSGs protecting any webservers allow traffic in from the WAG/WAF’s subnet into the web servers on the port(s) specified in the HTTP Setting(s). Also ensure that any guest OS firewall is similarly configured.
  • Possibly the biggest issue you will have is with devs not assigning hostnames to websites in their webservers. If you’re using shared WAGs/WAFs you must use multi-site listeners and the websites should be configured with the hostname.
  • And the biggest tip I can give is to work out a naming standard for each of the above components so you know what piece is associated with what site. I can’t share what we’re using at work, but we have some big configurations and they are very easy to troubleshoot because of how we have named things.

Azure Lighthouse–Enabling Centralized Management of Many Azure Tenants

In this post, I will discuss a new feature to Azure called Lighthouse. With this service, you can delegate permissions to “customer” Azure deployments across many Azure tenants to staff in a central organization such as corporate IT or a managed service provider (Microsoft partner).

The wonderful picture of the Hook Head Lighthouse from my home county of Wexford (Ireland) is by Ana & Michal. The building dates back to just after the Norman invasion of Ireland and is the second oldest operating lighthouse in the world.

Here are some useful web links:

In short, what you get with Lighthouse is a way to see/manage, within the scope of the permissions of your assigned role, deployments across many tenants. This solves a major problem with Azure. Microsoft is a partner-driven engine. Without partners, Microsoft is nothing. There’s a myth that Microsoft offers managed services in the cloud: that’s a fiction created by those that don’t know much about the actual delivery of Microsoft services. Partners deliver, manage, and provide the primary forms of support for Microsoft services for Microsoft’s customers. However, Azure had a major problem – each customer is a tenant, and until last night, there was no good way to bring those customers under the umbrella of a single management system. You had hacks such as guest user access, which didn’t unify management – native management tools were restricted to the boundaries of the single customer tenant. And third-party tools – sorry, but they’ll not keep up with the pace of Azure.

Last night, Microsoft made Lighthouse available to everyone (not just partners, as the headlines might suggest!). With a little up-front work, you can quickly and easily grant/request access to deployments or subscriptions in other tenants (internal or external customers) and have easy/quick/secure single-sign-on access from your own tenant. What does that look like? You sign in once, with your work account, ideally using MFA (a requirement now for CSP partners). And then you can see everything – every tenant, every subscription, every resource group that you have been granted access to. You can use the Activity Log, Log Analytics workspaces, Security Center, and Azure Monitor across every single resource that you can see.

The mechanics of this are pretty flexible. An “offer” can be made in one of two ways to a customer:

  • JSON: You describe your organization and who will have what access. The JSON is deployed in the customer subscription and access is granted after a few moments – it took a couple of minutes for my first run to work.
  • Azure Marketplace: You can advertise an offer in the Azure Marketplace. Note that a Marketplace offer can be private.

An offer is made up of a description of:

  • The service you are offering: the name, your tenant (service provider)
  • The authorizations: who or what will have access, and what roles (from your tenant) can be used. Owner is explicitly blocked by Microsoft.

Here’s a simple example of a JSON deployment where a group from the service provider tenant will be granted Contributor access to the customer subscription.

I need to gather a bit of information:

  • mspName: The name of the managed services provider. Note that this is a label.
  • mspOfferDescription: The name of the service being offered.
  • managedByTenantId: The Directory ID of the managed services provider (Azure Portal > Azure Active Directory > Properties > Directory ID)
  • Authorizations: A description of each entity (user/group/service principal) from the MSP tenant being granted access to the customer deployment
    • principalId: The ID of the user, group, or service principal. Remember – groups are best!
    • principalIdDisplayName: A label for the current principal – what you want to describe this principal as for your customer to see
    • roleDefinitionId: The GUID of the role that will grant permissions to the principal, e.g. Contributor. PowerShell > (Get-AzRoleDefinition -Name '<roleName>').id

Armed with that information you can populate the fields in a JSON parameters file for delegating access to a subscription. Here’s a simple example:
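The sketch below follows the format of Microsoft’s published Lighthouse samples – every GUID is a placeholder, except the roleDefinitionId, which is the well-known built-in Contributor role:

    {
        "$schema": "https://schema.management.azure.com/schemas/2018-05-01/subscriptionDeploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "mspName": {
                "value": "Contoso Managed Services"
            },
            "mspOfferDescription": {
                "value": "Managed Azure Networking Service"
            },
            "managedByTenantId": {
                "value": "00000000-0000-0000-0000-000000000000"
            },
            "authorizations": {
                "value": [
                    {
                        "principalId": "11111111-1111-1111-1111-111111111111",
                        "principalIdDisplayName": "Azure Network Admins",
                        "roleDefinitionId": "b24988ac-6180-42a0-ab88-20f7382dd24c"
                    }
                ]
            }
        }
    }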

And then you can deploy the above with the JSON file for delegating access to a subscription:

  1. Sign into the customer tenant using PowerShell
  2. Run the following:
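A sketch of those two steps – the template and parameter file names are hypothetical, and the location is only used to store the deployment metadata:

    # Sign into the customer tenant
    Connect-AzAccount -TenantId "the-customer-tenant-id"

    # Deploy the offer at subscription scope
    New-AzDeployment -Name "LighthouseOffer" `
        -Location westeurope `
        -TemplateFile .\delegatedResourceManagement.json `
        -TemplateParameterFile .\delegatedResourceManagement.parameters.json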

Give it a few minutes and things will be in place:

  • The service provider will appear in Service Providers in the Azure Portal for the customer.
  • The customer will appear in My Customers in the Azure Portal for the service provider.
  • Anyone from the service provider’s tenant in the scope of the authorization (e.g. a member of a listed group) will have access to the customer’s subscription, as described by the role (roleDefinitionId)
  • Any delegated admins from the service provider can see, operate, and manage the customers’ resources in the Azure Portal, Azure tools, CLI/PowerShell, etc., as if they were in the same tenant as the service provider.

Once deployed, things appear to be pretty seamless – but it is early days and I am sure that we will see weirdness over time.

The customer can fire the service provider by deleting the delegation from Service Providers. I have not found a way for the service provider to fire the customer yet.

Backing Up Azure Firewall

In this post, I will outline how you can back up your Azure Firewall, enabling you to rebuild it in case it is accidentally/maliciously deleted or re-configured by an authorized person.

With the Azure Firewall adding new features, we should expect more customers to start using it. And if you are using it like I do with my customers, it’s the centre of everything and it can quickly contain a lot of collections/rules which took a long time to write.

Wait – what new features? Obviously, Threat Detection (using the MS security graph) is killer, but support for up to 100 public IP addresses was announced and is imminent, availability zones are there now for this mission critical service, application rule FQDN support was added for SQL databases, and HD Insight tags are in preview.

So back on topic: how do I back up Azure Firewall? It’s actually pretty simple. You will need to retrieve your firewall’s resource ID:
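A sketch, assuming hypothetical firewall and resource group names:

    # Store the resource ID of the Azure Firewall
    $azureFirewallId = (Get-AzFirewall -Name "MyFirewall" -ResourceGroupName "MyNetworkRG").id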


Then you will export a JSON copy of the firewall:
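Continuing the sketch, Export-AzResourceGroup exports just the firewall resource to a local JSON file:

    $fileName = ".\MyFirewallBackup.json"
    Export-AzResourceGroup -ResourceGroupName "MyNetworkRG" `
        -Resource @($azureFirewallId) `
        -SkipAllParameterization `
        -Path $fileName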


And that’s the guts of it! To do a restore you simply redeploy the JSON file to the resource group:
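Again as a sketch, a restore is a normal resource group deployment of that file:

    New-AzResourceGroupDeployment -Name "FirewallRestore" `
        -ResourceGroupName "MyNetworkRG" `
        -TemplateFile $fileName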


I’ve tested a delete and restore and it works. The magic here is using -SkipAllParameterization in the resource export to make the JSON file recreate exactly what was lost at the time of the backup/export.

If you wanted to get clever, you could wrap up the backup cmdlets in an Azure Automation script. Add some lines to alter the backup file name (date/time) and copy the backup to blob storage in a GPv2 storage account (with Lifecycle Management for automatic blob tiering and a protection policy to prevent deletion). And then you would schedule the automation to run every day.
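A minimal sketch of that idea – the storage account, container, and resource group names are hypothetical, and $azureFirewallId is assumed from the earlier step:

    # Export with a date/time stamped file name
    $backupName = "AzureFirewallBackup-$(Get-Date -Format 'yyyyMMdd-HHmm').json"
    Export-AzResourceGroup -ResourceGroupName "MyNetworkRG" `
        -Resource @($azureFirewallId) `
        -SkipAllParameterization `
        -Path ".\$backupName"

    # Copy the backup to blob storage
    $storage = Get-AzStorageAccount -ResourceGroupName "MyBackupRG" -Name "mybackupstore"
    Set-AzStorageBlobContent -File ".\$backupName" `
        -Container "firewall-backups" `
        -Blob $backupName `
        -Context $storage.Context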

Azure Bastion For Secure SSH/RDP in Preview

Microsoft has announced a new preview of a platform-based jumpbox called Azure Bastion for providing secure RDP or SSH connections to virtual machines running or hosted in Azure.

Secure Remote Connections

Most people that are using The Cloud are using virtual machines, and one of the great challenges for them is secure remote access. You need RDP or SSH to be able to run these machines in the real world.

Remember: for 99.9% of customers, servers are not cattle, they are sacred cows.

Just opening up RDP or SSH straight through a public IP address is bad – hopefully you have an NSG in place, but even that’s bad. If you enable Standard Tier Security Center, the alerts will let you know how bad pretty quickly. And if the recent scare about the RDP vulnerability didn’t wake you up to this, then maybe you deserve to have someone else’s bot farm or a bitcoin mine running in your network.

There are ways that you can secure things, but they all have their pluses and minuses.

VPN

The real reason that we have point-to-site VPN in the Azure virtual network gateway is as an admin entry point to the virtual network.

The clue is in the maximum number of simultaneous connections – 128 – which is way too low to consider as an end-user solution for a Fortune 1000, the kind of customer Microsoft really does its planning for.

If you have supported end user VPN then you know that it’s right up there with password resets for helpdesk ticket numbers, even with IT people like developers. Don’t go here – it won’t end well.

Just-in-Time VM Access

JIT VM Access is a feature of Security Center Standard Tier. It modifies your NSG rules to deny managed protocols such as RDP/SSH (the deny rules are stupidly made as low priority so they don’t override any allow rules!).

When you need to remote onto a VM, an NSG rule is added for a managed amount of time to allow remote access via the selected protocol from a specific source IP address.

So, if it’s all set up right, you deny remote access to virtual machines most of the time. But you will open direct access. And the way JIT VM Access manages the rules now is wonky, so I would not trust it.

An RDP Jumpbox

This is an old method – a single virtual machine, or maybe a few of them, are made available for direct access. They are isolated into a dedicated subnet. You remote into a jumpbox, and from there, you remote into one of your application/data virtual machines.

Unfortunately, it’s still straight RDP/SSH into a machine that is directly accessible on the Internet. So, in the remoting protocol vulnerability scenario, you are still vulnerable at the application layer. You could combine this with JIT VM Access, but now normal daily operations are going to be a drag, and I guarantee you that people will invest time to undermine network security. Also, you are limited to 2 RDS connections per jumpbox without investing in a larger RDS (machines + licensing) solution.

Guacamole

This one is relatively new to me. At first it looked awesome. It’s an HTTPS-based service that allows you to proxy into Linux or Windows virtual machines via RDP or SSH.

All looked good until you started running Windows Server 2016 or later in your virtual machines and you needed NLA for secure connections via RDP. Then it all fell apart. The solution requires you to either disable NLA in the guest OS (boo!) or to hard code a username/password with local logon rights for your guest OS’s into the Guacamole server (double-boo!).

Azure Bastion

In case you don’t know this, a bastion host is another name for a jumpbox – an isolated machine that you bounce through. In this case, Bastion is a service that is accessible via the Azure Portal. You sign into the portal, click Connect and use the Bastion service to connect to a Linux or Windows virtual machine via SSH/RDP in the Portal. The virtual machine does not require a public IP address or a “NAT rule”, but it’s still SSH/RDP.


On the downside:

  • There’s no multi-factor authentication (MFA)
  • It requires that you sign into the Azure Portal – many people working in the guest OS might not even have those rights!
  • VNet peering is not supported – so larger enterprises are ruled out here … no one in their right mind will deploy 500 bastion hosts (one per VNet) in a large enterprise.

Microsoft did say that these things will be worked on, but when? After GA, which based on the time of year I guess will be just before/after Ignite in early November?

In my opinion, Bastion is the right idea, but more of the backlog should have been included in the minimum viable product.

A Gateway to a Better Solution

If you are a Citrix or an RDS person, then you’ve been screaming for the last 5 minutes, because you’ve been using something for years that most people still don’t know is possible. Both Citrix and RDS have the concept of an SSL gateway.

In the case of RDS, we can deploy one or more (load balanced) Windows Server virtual machines with the RDS Gateway role. If we combine that with NPS and Azure AD, we can also add MFA. With a simple tweak to the Remote Desktop Connection client (MSTSC.EXE), we can RDP to a Windows machine behind the RDS Gateway. The connection from the client to the gateway is pre-authenticated, x.509 certificate protected, HTTPS traffic encapsulating the RDP stream. That connection terminates at the RDS Gateway and is then forwarded as RDP to the desired Windows Server virtual machine behind it.
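As an illustration, the client-side tweak can be saved in a .rdp file – the server and gateway names here are hypothetical:

    full address:s:appserver01.internal.contoso.com
    gatewayhostname:s:rdgw.contoso.com
    gatewayusagemethod:i:1
    gatewayprofileusagemethod:i:1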

Unlike the previous jumpbox solution:

  • This can be a low-end machine, such as a B-Series.
  • It can scale out using a load balancer
  • Many people can relay through a single jumpbox machine.
  • You won’t need RDS licensing at all, not even to scale out to more than 2 users per gateway machine.

But there’s no SSH here, so Linux is a problem.

Opinion

We don’t really have a complete solution right now. Azure Bastion probably will be the best one in the long-run, but it has so many missing features that I couldn’t consider it now. For Windows, an RDS Gateway is probably best, and for Linux, a Guacamole server might be best.

What do you think?

Webinar – Getting More Performance From Azure VMs

I will be doing a webinar later today for the European SharePoint Office 365 & Azure Community (from the like-named conference). The webinar is at 14:00 UK/Irish, 15:00 CET, and 09:00 EST. Registration is here.

Title: Getting More Performance from Azure Virtual Machines

Speaker: Aidan Finn, MVP, Ireland

Date and Time: Wed, May 1, 2019 3:00 PM – 4:00 PM CEST

Webinar Description:  You’ve deployed your shiny new application in the cloud, and all that pride crashes down when developers and users start to complain that it’s slow. How do you fix it? In this session you’ll learn to understand what Azure virtual machines can offer, how to pick the right ones for the right job, and how to design for the best possible performance, including networking, storage, processor, and GPU.

Key benefits of attending:
– Understand virtual machine design
– Optimise storage performance
– Get more from Azure networking

Do Not Enable Azure Storage Account Firewall – IaaS

If you read through the security recommendations in Azure Security Center, you do get given out to a lot. A lot of it makes no sense once you understand Azure and the reasoning behind the recommendations. One that appeared to make sense was to enable the relatively new firewall in Azure Storage:

  • Only allow trusted subnets – nice idea to limit the attack surface on the storage account in conjunction with service endpoints.
  • Allow “trusted Microsoft services” to access the storage account (on by default).

Note: A storage account can only be connected to if you know one of the really long random access keys.

But if you do enable this firewall in an Azure deployment, things will break:

  • Boot Diagnostics: Does not know how to write to a secured storage account, even with firewall rules and service endpoints enabled.
  • Serial Console Access: Requires Boot Diagnostics to be working so that’s dead too.
  • NSG Flow Logs/Traffic Analytics: Another feature that doesn’t understand a secured storage account, even with “trusted Microsoft services” marked as enabled (default).

And there might be more!

So you have to ask yourself – do you want maximum security or a usable & manageable system? Storage account firewalls are pretty new – we didn’t need them a few months ago. So we can drop that feature, and maybe use the new Advanced Threat Protection for storage accounts feature instead?

It’s a pity that some joined-up thinking and integration testing weren’t done here.

Global ONLINE Azure Bootcamp

On one day every year, community members all across the planet get together at local events and host/attend sessions on Azure; this is the Global Azure Bootcamp. It’s been running on a Spring Saturday for years, and this year it is on April 27th.

Unfortunately, Microsoft Ireland wasn’t able to provide a venue, so it looked like there would not be a local event in this part of Ireland. While I was at the recent MVP Summit, I threw out the idea of running an online version of the Global Azure Bootcamp … a Global Online Azure Bootcamp. The MVP Lead for UK & Ireland, Claire, loved the idea, ran off to the organisers of the global event, came back, and said “do it!”.

So I did … I reached out to the speaker community and … was blown away by the response. So much so that this will be a truly Global ONLINE Azure Bootcamp, with content for all time zones:

  • We’re starting at 09:00 Perth/Beijing time
  • Finishing at 17:00 Seattle/Los Angeles time

The idea is that sessions will be pre-recorded and made available online on a scheduled basis on April 27th. That means anyone with Internet access anywhere on the planet can join this instance of the Global Azure Bootcamp – some of the presenters will actually be live-presenting elsewhere that day!

The content spans many tracks: dev, infrastructure, devops, data, AI, governance, security, and more. There really is something for everyone that is interested in Azure.

You can learn more here on the official event site.

This event has no sponsorship and it’s all been organized at the very last second. So here’s my ask:

Hopefully we’ll “see” you (so to speak, because we don’t have tracking) there on the day!

Aidan.

Azure Availability Zones in the Real World

I will discuss Azure’s availability zones feature in this post, sharing what they can offer for you and some of the things to be aware of.

Uptime Versus SLA

Noobs to hosting and cloud focus on three magic letters: S, L, A – service level agreement. This is a contractual promise that something will be running for a certain percentage of time in the billing period, or the hosting/cloud vendor will credit or compensate the customer.

You’ll hear phrases like “three nines”, or “four nines” to express the measure of uptime. The first is a 99.9% measure, and the second is a 99.99% measure. Either is quite a high level of uptime. Azure does have SLAs for all sorts of things. For example, a service deployed in a valid virtual machine availability set has a connectivity (uptime) SLA of 99.9%.

Why did I talk about noobs? Promises are easy to make. I once worked for a hosting company that offered a ridiculous 100% SLA for everything, including cheap-ass generic Pentium “servers” from eBay with single IDE disks. 100% is an unachievable target because … let’s be real here … things break. Even systems with redundant components have downtime. I prefer to see realistic SLAs and honest statements on what you must do to get that guarantee.

Azure gives us those sorts of SLAs. For virtual machines we have:

  • 99.9% for machines with just Premium SSD disks
  • 99.95% for services running in a valid availability set
  • 99.99% for services running in multiple availability zones

Ah… let’s talk about that last one!

Availability Sets

First, we must discuss availability sets before we move one step higher. An availability set is anti-affinity, a concept also found in vSphere and in Hyper-V Failover Clustering (via PowerShell or SCVMM): a label on a virtual machine that instructs the compute cluster to spread the virtual machines across different parts of the cluster. In Azure, virtual machines in the same availability set are placed into different:

  • Update domains: Avoiding downtime caused by (rare) host reboots for updates.
  • Fault domains: Enable services to remain operational despite hardware/software failure in a single rack.

The above solution spreads your machines around a single compute (Hyper-V) cluster, in a single room, in a single building. That’s amazing for on-premises, but there can still be an issue. Last summer, a faulty humidity sensor brought down one such room and affected a “small subset” of customers. “Small subset” is OK, unless you are included and some mission critical system was down for several hours. At that point, SLAs are meaningless – a refund for the lost runtime cost of a pair of Linux VMs running network appliance software won’t compensate for thousands or millions of Euros of lost business!

Availability Zones

We can go one step further by instructing Azure to deploy virtual machines into different availability zones. A single region can be made up of different physical locations with independent power and networking. These locations might be close together, as is typically the case in North Europe or West Europe. Or they might be on the other side of a city from each other, as is the case in some North American regions. There is a low level of latency between the buildings, but it is still higher than that of a LAN connection.

A region that supports availability zones is split into 4 zones. You see three zones (assigned round-robin between customers), labeled as 1, 2, and 3. You can deploy many services across availability zones – and the list is improving:

  • VNet: Is software-defined so can cross all zones in a single region.
  • Virtual machines: Can connect to the same subnet/address space but be in different zones (see the sketch after this list). They are not in availability sets, but Azure still maintains service uptime during host patching/reboots.
  • Public IP Addresses: Standard IP supports anycast and can be used to NAT/load balance across zones in a single region.
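As a sketch, here is a zonal virtual machine deployment with Azure PowerShell – all names are hypothetical, and -Zone pins each machine to a numbered zone:

    $cred = Get-Credential

    # One VM in zone 1, a second in zone 2, sharing the same virtual network
    New-AzVM -ResourceGroupName "MyAppRG" -Name "vm01" -Location "westeurope" `
        -VirtualNetworkName "vnet01" -SubnetName "subnet01" -Credential $cred -Zone "1"
    New-AzVM -ResourceGroupName "MyAppRG" -Name "vm02" -Location "westeurope" `
        -VirtualNetworkName "vnet01" -SubnetName "subnet01" -Credential $cred -Zone "2"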

Other network resources can work with availability zones in one of two ways:

  • Zonal: Instances are deployed to a specific zone, giving optimal latency performance within that zone, but can connect to all zones in the region.
  • Zone Redundant: Instances are spread across the zones for an active/active configuration.

Examples of the above are:

  • The zone-aware VNet gateways for VPN/ExpressRoute
  • Standard load balancer
  • WAGv2 / WAFv2

Considerations

There are some things to consider when looking at availability zones.

  • Regions: The list of regions that support availability zones is growing slowly, but it is far from complete. Some regions will not offer this highest level of availability.
  • Catchup: Not every service in Azure is aware of availability zones, but this is changing.

Let me give you two examples. The first is VM Boot Diagnostics, a service that I consider critical for seeing the console of the VM and getting serial console access without a network connection to the virtual machine. Boot Diagnostics uses an agent in the VM to write to a storage account. That storage account can be:

  • LRS: 3 replicas reside in a single compute cluster, in a single room, in a single building (availability zone).
  • GRS: LRS plus 3 asynchronous replicas in the paired region, which are not available for write unless Microsoft declares a total disaster for the primary region.

So, if I have a VM in zone 1 and a VM in zone 2, and both write to a storage account that happens to be in zone 1 (I have no control over the storage account location), and zone 1 goes down, there will be issues with the VM in zone 2. The solution would be to use ZRS GPv2 storage for Boot Diagnostics; however, the agent does not support this type of storage configuration. Gotcha!

Azure Advisor will also be a pain in the ass. Noobs are told to rely on Advisor (it is the subject of several questions in the new Azure infrastructure exams) for configuration and deployment advice. Advisor will see the above two VMs as not being highly available because they are not (and cannot be) in a common availability set, so you are advised to degrade their SLA by migrating them to a single zone for an availability set configuration – ignore that advice and be prepared to defend the decision from Azure noobs, such as management, auditors, and ill-informed consultants.

Opinion

Availability zones are important – I use them in an architecture pattern that I am working on with several customers. But you need to be aware of what they offer and how certain things do not understand them yet or do not support them yet.