I Am Running My “Starting Azure Infrastructure” Course in London on Feb 22/23

I am delighted to announce the dates of the first delivery of my own bespoke Azure training in London, UK, on February 22nd and 23rd. All the details can be found here.

In my day job, I have been teaching Irish Microsoft partners about Azure for the past three years, using training materials that I developed for my employer. I’m not usually one to brag, but we’ve been getting awesome reviews on that training and it has been critical to us developing a fast growing Azure market. I’ve tweeted about those training activities and many of my followers have asked about the possibility of bringing this training abroad.

So a new venture has started, with brand new training, called Cloud Mechanix. With this business, I am bringing brand-new Azure training to the UK and Europe.  This isn’t Microsoft official training – this is my real world, how-to, get-it-done training, written and presented by me. We are keeping the classes small – I have learned that this makes for a better environment for the attendees. And best of all – the cost is low. This isn’t £2,000 training. This isn’t even £1,000 training.

The first course is booked and will be running in London (quite central) on Feb 22-23. It’s a 2-day “Starting Azure Infrastructure” course that will get noobies to Azure ready to deploy solutions using Azure VMs. And experience has shown that my training also teaches a lot to those that think they already know Azure VMs. You can learn all about this course, the venue, dates, costs, and more here.

I’m excited by this because this is my business (with my wife as partner). I’ve had friends, such as Mark Minasi, telling me to do this for years. And today, I’m thrilled to make this happen. Hopefully some of you will be too, and will register for this training Smile

VMQ On Team Interface Breaking Hyper-V Networking

I recently had a situation where virtual machines on a Windows Server 2016 (WS2016) Hyper-V host could not communicate with each other. Ping tests were failing:

  • Extremely high latency
  • Lost packets

In this case, I was building a new Windows Server 2016 demo lab for some upcoming community events in The Netherlands and Germany, an updated version of my Hidden Treasures in Hyper-V talk that I’ve done previously at Ignite and TechEd Europe (I doubt I’ll ever do a real talk at Ignite again because I’m neither an MS employee nor a conference sponsor). The machine I’m planning on using for these demos is an Intel NUC – it’s small, powerful, and is built with lots of flash storage. My lab consists of some domain controllers, storage, and some virtualized (nested) hosts, all originally connected to an external vSwitch. I built my new hosts, but could not join them to the domain. I did a ping from the new hosts to the domain controllers, and the tests resulted in massive packet loss. Some packets got through, but with 3,000+ ms latency.

At first I thought that I had fat-fingered some IPv4 configurations. But I double and triple checked things. No joy there. And that didn’t make sense (did I mention that this was while having insomnia at 4 am after doing a baby feed?). The usual cause of network problems is VMQ, so that was my next suspect. I checked NCPA.CPL for the advanced NIC properties of the Intel NIC and there was no sign of VMQ. That’s not always a confirmation, so I ran Get-NetAdapterAdvancedProperty in PowerShell. My physical NIC did not have VMQ features at all, but the team interface of the virtual switch did.

And then I remembered reading that some people found that the team interface (virtual NIC) of the traditional Windows Server (LBFO) team (not Switch-Embedded Teaming) had VMQ enabled by default and that it caused VMQ-style issues. I ran Set-NetAdapterAdvancedProperty to disable the relevant RegistryKeyword for VMQ while running a ping -t, and the result was immediate; my virtual switch was now working correctly. I know what you’re thinking – how can packets switching from one VM to another on the same host be affected by a NIC team? I don’t know, but they randomly are.

I cannot comment on how this affects 10 GbE networking – the jerks at Chelsio didn’t release WS2016 drivers for the T4 NICs and I cannot justify a spend on new NICs for WinServ work right now (it’s all Azure, all the time these days).  But if you are experiencing weird virtual switch packet issues, and you are using a traditional NIC team, then see if VMQ on the team interface (the one connected to your virtual switch) is causing the issue.
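For anyone who hits this, checking for and disabling VMQ on the team interface looks roughly like this – the adapter name is just an example from my lab, and the exact RegistryKeyword can vary by driver:

# List VMQ-related advanced properties on every adapter, including the team interface (tNIC)
Get-NetAdapterAdvancedProperty | Where-Object {$_.RegistryKeyword -like "*VMQ*"} | Format-Table Name, DisplayName, RegistryKeyword, RegistryValue

# Disable VMQ on the team interface that is bound to the virtual switch, not the physical NIC
Set-NetAdapterAdvancedProperty -Name "NIC Team" -RegistryKeyword "*VMQ" -RegistryValue 0

# Disable-NetAdapterVmq -Name "NIC Team" achieves the same thing if the driver exposes the standard VMQ property

Run a ping -t against another VM while you make the change – if this is your problem, the latency drops immediately.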

WatchGuard Now Supported by Azure for Dynamic/Route-Based VPN

Microsoft now supports WatchGuard’s firewalls with the 11.12 firmware (Fireware) for dynamic or route-based VPN.

There are two kinds of VPN gateway in Azure:

  • Static / policy-based: 1:1 connections; don’t support point-to-site VPN, VNet-to-VNet VPN, or website-to-VNet VPN, and are really only good for the simplest of designs.
  • Dynamic / route-based: Multiple simultaneous connections, supports all of Azure’s VPN features, and enables complicated designs.

I always prefer route-based VPNs, because they don’t restrict what I can do in Azure. Until recently, though, that caused a complication for me at work. My employer distributes WatchGuard’s Firebox (XTM) unified threat management firewall devices, and those devices were restricted to policy-based VPN. Good news!

  • WatchGuard released 11.12 of their software (which works on all devices) and this added route-based (aka dynamic) VPN support.
  • Microsoft just listed WatchGuard’s devices as being supported by Azure for route-based VPN.

You can find WatchGuard’s instructions for configuring a route-based VPN here.

FYI, the notable devices that still don’t have route-based support are:

  • Cisco ASA (!!!)
  • Barracuda NextGen Firewall X-series
  • Brocade Vyatta 5400 vRouter
  • Citrix NetScaler MPX, SDX, VPX

I guess you can get fired for buying Cisco after all!


My Azure Load Balancer NAT Rule Won’t Work (Why & Solution)

I’ve had a bug in Azure bite me in the a$$ every time I’ve run an Azure training course. I thought I’d share it here. The course that I’ve been running recently focuses on VM solutions in a CSP subscription – so it’s all ARM, and the problem might be constrained to CSP subscriptions.

When I create a NAT rule via the portal, most of the time, the NAT rule fails to work. For example, I create a VM, enable an NSG to allow RDP inbound, and create a load balancer NAT rule to enable RDP inbound (TCP 50001 -> 3389 for a VM). It appears that there’s a timing issue behind the portal, because eventually the NAT rule starts to work.

There’s actually a variety of issues with load balancer administration in the Azure Portal:

  • The second step in creating a NAT rule is when the target NIC is updated; this fails a high percentage of the time (note the target being set to “–“ in the rule summary).
  • Creating/updating a backend pool can fail, with some/none of the virtual machines being added to the pool.

These problems are restricted to the Azure Portal. I have no such issues when configuring these settings using PowerShell or deploying a new resource group using a JSON template. That’s great, but not perfect – a lot of general administration is done in the portal, and the GUI is how people learn.
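For reference, this is roughly how I create and attach a NAT rule with the AzureRM PowerShell module when the portal misbehaves – the resource, NIC, and rule names below are hypothetical:

# Get the existing load balancer
$lb = Get-AzureRmLoadBalancer -ResourceGroupName "MyRG" -Name "MyLB"

# Add an inbound NAT rule: TCP 50001 on the front end maps to 3389 on the VM
$lb | Add-AzureRmLoadBalancerInboundNatRuleConfig -Name "RDP-VM01" -FrontendIpConfiguration $lb.FrontendIpConfigurations[0] -Protocol Tcp -FrontendPort 50001 -BackendPort 3389
$lb = $lb | Set-AzureRmLoadBalancer

# Bind the rule to the VM's NIC - the step the portal often fails to complete
$nic = Get-AzureRmNetworkInterface -ResourceGroupName "MyRG" -Name "VM01-NIC"
$rule = $lb.InboundNatRules | Where-Object Name -eq "RDP-VM01"
$nic.IpConfigurations[0].LoadBalancerInboundNatRules.Add($rule)
$nic | Set-AzureRmNetworkInterface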

Ignite 2016 – Discover What’s New In Windows Server 2016 Virtualization

This post is a collection of my notes from Ben Armstrong’s (Principal Program Manager Lead in Hyper-V) session (original here) on the features of WS2016 Hyper-V. The session is an overview of the features that are new, why they’re there, and what they do. There are no deep dives.

A Summary of New Features

Here is a summary of what was introduced in the last 2 versions of Hyper-V. A lot of this stuff still cannot be found in vSphere.

image

And we can compare that with what’s new in WS2016 Hyper-V (in blue at the bottom). There’s as much new stuff in this 1 release as there was in the last 2!

image

Security

The first area that Ben will cover is security. The number of attack vectors is up, attacks are on the rise, and the sophistication of those attacks is increasing. Microsoft wants Windows Server to be the best platform. Cloud is a big deal for customers – some are worried about industry and government regulations preventing adoption of the cloud. Microsoft wants to fix that with WS2016.

Shielded Virtual Machines

Two basic concepts:

  • A VM can only run on a trusted & healthy host – a rogue admin/attacker cannot start the VM elsewhere. A highly secured Host Guardian Service must authorize the hosts.
  • A VM is encrypted by the customer/tenant using BitLocker – a rogue admin/attacker/government agency cannot inspect the VM’s contents by mounting the disk(s).

image

There are levels of shielding, so it’s not an all or nothing.

Key Storage Drive for Generation 1 VMs

Shielding, as above, requires Generation 2 VMs. You can also offer some security for Generation 1 virtual machines: Key Storage Drive. It’s not as secure as shielded virtual machines or virtual TPM, but it does give us a safe way to use BitLocker inside a Generation 1 virtual machine – required for older applications that depend on older operating systems (older OSs cannot be used in Generation 2 virtual machines).

 

image

Virtual Secure Mode (VSM)

We also have Guest Virtual Secure Mode:

  • Credential Guard: protecting ID against pass-the-hash by hiding LSASS in a secured VM (called VSM) … in a VM with a Windows 10 or Windows Server 2016 guest OS! Malware running with admin rights cannot steal your credentials in a VM.
  • Device Guard: Protect the critical kernel parts of the guest OS against rogue s/w, again, by hiding them in a VSM in a Windows 10 or Windows Server 2016 guest OS.

image

Secure Boot for Linux Guests

Secure boot was already there for Windows in Generation 2 virtual machines. It’s now there for Linux guest OSs, protecting the boot loader and kernel against root kits.

image

Host Resource Protection (HRP)

Ben hopes you never see this next feature in action in the field Smile This is because Host Resource Protection is there to protect hosts/VMs from a DoS attack against a host by someone inside a VM. The scenario: you have an online application running in a VM. An attacker compromises the application (example: SQL injection) and gets into the guest OS of the VM. They’re isolated from other VMs by the hypervisor and hardware/DEP, so they attack the host using DoS, and consume resources.

A new feature, from Azure, called HRP will determine that the VM is aggressively consuming resources in certain patterns, and start to starve it of resources, thus slowing down the DoS attack to the point of being pointless. This feature will be of particular interest to:

  • Companies hosting external facing services on Hyper-V/Windows Azure Pack/Azure Stack
  • Hosting companies using Hyper-V/Windows Azure Pack/Azure Stack

image

This is another great example of on-prem customers getting the benefits of Azure, even if they don’t use Azure. Microsoft developed this solution to protect against the many unsuccessful DoS attacks from Azure VMs, and we get it for free for our on-prem or hosted Hyper-V hosts. If you see this happening, the status of the VM will switch to Host Resource Protection.

Security Demos

Ben starts with virtual TPM. The Windows 10 VM has a virtual TPM enabled and we see that the C: drive is encrypted. He shuts down the VM to show us the TPM settings of the VM. We can optionally encrypt the state and live migration traffic of the VM – that means a VM is encrypted at rest and in transit. There is a “performance impact” for this optional protection, which is why it’s not on by default. Ben also enables shielding – and he loses console access to the VM – the only way to connect to the machine is to remote desktop/SSH to it.

Note: if he was running the full host guardian service (HGS) infrastructure then he would have had no control over shielding as a normal admin – only the HGS admins would have had control. And even the HGS admins have no control over BitLocker.

He switches to a Generation 1 virtual machine with Key Storage Drive enabled. BitLocker is running. In the VM settings (Generation 1) we see Security > Key Storage Drive Enabled. Under the hood, an extra virtual hard disk is attached to the VM (not visible in the normal storage controller settings, but visible in Disk Management in the guest OS). It’s a small 41 MB NTFS volume. The BitLocker keys are stored there instead of in a TPM – virtual TPM is only available in Generation 2, but the Key Storage Drive uses the same sorts of tech/encryption/methods to secure its contents. It cannot be as secure as virtual TPM, but it is better than not having BitLocker at all. Microsoft can make the same promises about data-at-rest encryption for Generation 1 VMs, but it’s still not as good as a Generation 2 VM with vTPM or even a shielded VM (requires Generation 2).

Availability

The next section is all about keeping services up and running in Hyper-V, whether downtime is caused by upgrades or infrastructure issues. Everyone has outages, and Microsoft wants to reduce the impact of these. Microsoft studied the common causes, and started to tackle them in WS2016.

Cluster OS Rolling Upgrades

Microsoft is planning 2-3 updates per year for Nano Server, plus there’ll be other OS upgrades in the future. In the past, you could not upgrade the OS of a cluster node, and we could only do cluster-to-cluster migrations to adopt new versions of Windows Server/Hyper-V. Now, we can:

  1. Remove cluster node 1
  2. Rebuild cluster node 1 with the new version of Windows Server/Hyper-V
  3. Add cluster node 1 to the old cluster – the cluster runs happily in mixed-mode for a short period of time (weeks), with failover and Live Migration between the old/new OS versions.
  4. Repeat steps 1-3 until all nodes are up to date
  5. Upgrade the cluster functional level – Update-ClusterFunctionalLevel (see below for “Emulex incident”)
  6. Upgrade the VMs’ version level

Zero VM downtime, zero new hardware – 2 node cluster, all the way to a 64 node cluster.

If you have System Center:

  1. Upgrade to SCVMM 2016.
  2. Let it orchestrate the cluster upgrade (above)

Support starts with WS2012 R2 to WS2016. Re-read that statement: there is no support for W2008/W2008 R2/WS2012. Re-read that last statement. No need for any questions now Smile

image

To avoid an “Emulex incident” (you upgrade your hosts – and a driver/firmware fails even though it is certified, and the vendor is going to take 9 months to fix the issue) then you can actually:

  1. Do the node upgrades.
  2. Delay the upgrade to the cluster functional level for a week or two
  3. Test your hosts/cluster for driver/firmware stability
  4. Roll back the cluster nodes to the older OS if there is an issue –> only possible if the cluster functional level is on the older version.

And there’s no downtime because it’s all leveraging Live Migration.
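The PowerShell for those final steps is only a handful of cmdlets; a rough sketch, run on a cluster node once every node is on WS2016:

# Check the functional level of the mixed-mode cluster
Get-Cluster | Select-Object Name, ClusterFunctionalLevel

# When you are happy with driver/firmware stability, commit - there is no rollback after this
Update-ClusterFunctionalLevel

# Then upgrade the VM configuration versions (the VMs must be shut down, and this is also one-way)
Get-VM | Update-VMVersion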

Virtual Machine Upgrades

Upgrading a VM’s configuration version was done automatically when you moved a VM from a version X host to a version X+1 host. Now you control it (necessary for the above rolling upgrade to work). Version 8 is what a WS2016 host supports.

image

Failover Clustering

Microsoft identified two top causes of outages in customer environments:

  • Brief storage “outages” – crashing the guest OS of a VM when an IO failed. In WS2016, when an IO fails, the VM is put in a paused-critical state (for up to 24 hours, by default). The VM will resume as soon as the storage resumes.
  • Transient network errors – clustered hosts being isolated, causing unnecessary VM failover (reboot), even if the VM was still on the network. A very common 30-second network outage will cause a Hyper-V cluster to panic up to and including WS2012 R2 – attempted failovers on every node and/or quorum craziness! That’s fixed in WS2016 – the VMs will stay on the host (in an unmonitored state) if they are still networked (see network protection from WS2012 R2). Clustering will wait (by default) for 4 minutes before doing a failover of that VM. If a host glitches 3 times in an hour, it will be automatically quarantined for 2 hours after resuming from the 3rd glitch (its VMs are live migrated to other nodes), allowing operator inspection.

image

Guest Clustering with Shared VHDX

Version 1 of this in WS2012 R2 was limited – supported guest clusters but we couldn’t do Live Migration, replication, or backup of the VMs/shared VHDX files. Nice idea, but it couldn’t really be used in production (it was supported, but functionally incomplete) instead of virtual fibre channel or guest iSCSI.

WS2016 has a new abstracted form of Shared VHDX – it’s even a new file format. It supports:

  • Backup of the VMs at the host level
  • Online resizing
  • Hyper-V Replica (which should lead to ASR support) – if the workload is important enough to cluster, then it’s important enough to replicate for DR!

image

One feature that does not work (yet) is Storage Live Migration. Checkpoints can be done “if you know what you are doing” – be careful!!!

Replica Support for Hot-Add VHDX

We could hot-add a VHDX file to a VM, but we could not add that to replication if the VM was already being replicated. We had to re-replicate the VM! That changes in WS2016, thanks to the concept of replica sets. A new VHDX is added to a “not-replicated” set and we can move it to the replicated set for that VM.

image

Hot-Add/Remove VM Components

We can hot-add and hot-remove vNICs to/from running VMs. Generation 2 VMs only, with any supported Windows or Linux guest OS.

We can also hot-add or hot-remove RAM to/from a VM, assuming:

  • There is free RAM on the host to add to the VM
  • There is unused RAM in the VM to remove from the VM

This is great for those VMs that cannot use Dynamic Memory:

  • No support by the workload
  • A large RAM VM that will benefit from guest-aware NUMA

A nice GUI side-effect is that guest OS memory demand is now reported in Hyper-V Manager for all VMs.
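A quick sketch of those operations in PowerShell against a running Generation 2 VM – the VM, switch, and adapter names are made up:

# Hot-add a second vNIC to a running Generation 2 VM
Add-VMNetworkAdapter -VMName "Web01" -SwitchName "External1" -Name "Backup"

# Hot-remove it again
Remove-VMNetworkAdapter -VMName "Web01" -Name "Backup"

# Resize the RAM of a running VM that uses static memory
Set-VMMemory -VMName "Web01" -StartupBytes 8GB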

Production Checkpoints

Referring to what used to be called (Hyper-V) snapshots, but were renamed to checkpoints to stop dumb people from getting confused with SAN and VSS snapshots – yes, people really are that stupid – I’ve met them.

Checkpoints (what are now called Standard Checkpoints) were not supported by many applications in a guest OS because they lead to application inconsistency. WS2016 adds a new default checkpoint type called a Production Checkpoint. This basically uses backup technology (and IT IS STILL NOT A BACKUP!) to create an application-consistent checkpoint of a VM. If you apply (restore) the checkpoint to the VM:

  • The VM will not boot up automatically
  • The VM will boot up as if it was restoring from a backup (hey dumbass, checkpoints are STILL NOT A BACKUP!)

For the stupid people, if you want to backup VMs, use a backup product. Altaro goes from free to quite affordable. Veeam is excellent. And Azure Backup Server gives you OPEX based local backup plus cloud storage for the price of just the cloud component. And there are many other BACKUP solutions for Hyper-V.

Now with production checkpoints, MSFT is OK with you using checkpoints with production workloads …. BUT NOT FOR BACKUP!
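For reference, the checkpoint type is set per VM; a rough example with a hypothetical VM name:

# Use production checkpoints, falling back to a standard checkpoint if the guest cannot quiesce
Set-VM -Name "SQL01" -CheckpointType Production

# Or refuse to fall back - the checkpoint fails rather than producing a standard one
Set-VM -Name "SQL01" -CheckpointType ProductionOnly

# Take a checkpoint (still not a backup!)
Checkpoint-VM -Name "SQL01" -SnapshotName "Before CU install"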

image

Demos

Ben does some demos of the above. His demo rig is based on nested virtualization. He comments that:

  • The impact of CPU/RAM is negligible
  • There is around a 25% impact on storage IO

Storage

The foundation of virtualization/cloud that makes or breaks a deployment.

Storage Quality of Service (QOS)

We had a basic system in WS2012 R2:

  • Set max IOPS rules per VM
  • Set min IOPS alerts per VM that were damned hard to get info from (WMI)

And virtually no-one used the system. Now we get storage QoS that’s trickled down from Azure.

In WS2016:

  • We can set reserves (that are applied) and limits on IOPS
  • Available for Scale-Out File Server and block storage (via CSV)
  • Metrics rules for VHD, VM, host, volume
  • Rules for VHD, VM, service, or tenant
  • Distributed rule application – fair usage, managed at storage level (applied in partnership by the host)
  • PoSH management in WS2016, and SCVMM/SCOM GUI

image

You can do single-instance or multi-instance policies:

  • Single-instance: IOPS are shared by a set of VMs, e.g. a service or a cluster, or this department only gets 20,000 IOPS.
  • Multi-instance: the same rule is applied to a group of VMs, the same rule for a large set of VMs, e.g. Azure guarantees at least X IOPS to each Standard storage VHD.
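Roughly how those policies get created and applied on a WS2016 cluster – the names and IOPS numbers are examples only:

# Single-instance (Aggregated): all VHDs assigned to the policy share one pool of IOPS
$dept = New-StorageQosPolicy -Name "Department20k" -PolicyType Aggregated -MinimumIops 1000 -MaximumIops 20000

# Multi-instance (Dedicated): every VHD assigned to the policy gets its own min/max
$std = New-StorageQosPolicy -Name "StandardDisk" -PolicyType Dedicated -MinimumIops 100 -MaximumIops 500

# Apply a policy to a VM's disks
Get-VM -Name "Web01" | Get-VMHardDiskDrive | Set-VMHardDiskDrive -QoSPolicyID $std.PolicyId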

image

Discrete Device Assignment – NVMe Storage

DDA allows a virtual machine to connect directly to a device. An example is a VM connecting directly to extremely fast NVMe flash storage.

Note: we lose Live Migration and checkpoints when we use DDA with a VM.
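At the PowerShell level, assigning a device looks roughly like this – the device query and location path are entirely hardware-specific, so treat this as a sketch rather than a recipe:

# Find the device and its location path
$dev = Get-PnpDevice -FriendlyName "*NVMe*" | Select-Object -First 1
$location = ($dev | Get-PnpDeviceProperty -KeyName DEVPKEY_Device_LocationPaths).Data[0]

# Disable it on the host, dismount it, and hand it to the VM
Disable-PnpDevice -InstanceId $dev.InstanceId -Confirm:$false
Dismount-VMHostAssignableDevice -LocationPath $location -Force
Add-VMAssignableDevice -LocationPath $location -VMName "FastStorageVM"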

image

Evolving Hyper-V Backup

Lots of work done here. WS2016 has its own block change tracking (Resilient Change Tracking) so we don’t need a buggy 3rd party filter driver running in the kernel of the host to do incremental backups of Hyper-V VMs. This should speed up the support of new Hyper-V versions by the backup vendors (except for you-know-who-yellow-box-backup-to-tape-vendor-X, obviously!).

Large clusters had scalability problems with backup. VSS dependencies have been lessened to allow reliable backups of 64 node clusters.

Microsoft has also removed the need for hardware VSS snapshots (a big source of bugs), but you can still make use of hardware features that a SAN can offer.

ReFS Accelerated VHDX Operations

ReFS is the preferred file system for storing VMs in WS2016. ReFS works using metadata which links to data blocks. This abstraction allows very fast operations:

  • Fixed VHD/X creation (seconds instead of hours)
  • Dynamic VHD/X expansion
  • Checkpoint merge, which impacts VM backup

Note, you’ll have to reformat WS2012 R2 ReFS to get the new version of ReFS.

Graphics

A lot of people use Hyper-V (directly or in Azure) for RDS/Citrix.

RemoteFX Improvements

image

The AVC444 thing is a lossless codec – lossless 3D rendering, apparently … that’s gobbledegook to me.

DDA Features and GPU Capabilities

We can also use DDA to connect VMs directly to GPUs … this is what the Azure N-Series VMs are doing with high-end NVIDIA GFX cards.

  • DirectX, OpenGL, OpenCL, CUDA
  • Guest OS: Server 2012 R2, Server 2016, Windows 10, Linux

The h/w requirements are very specific and detailed. For example, I have a laptop that I can do RemoteFX with, but I cannot use for DDA (SRIOV not supported on my machine).

Headless Virtual Machine

A VM can be booted without display devices. Reduces the memory footprint, and simulates a headless server.

Operational Efficiency

Once again, Microsoft is improving the administration experience.

PowerShell Direct

You can now remote PowerShell into a VM via the VMBus on the host – this means you do not need any network access or domain join. You can do either:

  • Enter-PSSession for an interactive session
  • Invoke-Command for a once-off instruction

Supports:

  • Host: Windows 10/WS2016
  • Guest: Windows 10/WS2016

You do need credentials for the guest OS, and you need to do it via the host, so it is secure.
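A minimal example – the VM name and credential are placeholders:

# A local or domain account that is valid inside the guest OS
$cred = Get-Credential

# Interactive session over the VMBus - no network access or domain join required
Enter-PSSession -VMName "NanoHost01" -Credential $cred

# Or a once-off command
Invoke-Command -VMName "NanoHost01" -Credential $cred -ScriptBlock { Get-NetIPConfiguration }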

This is one of Ben’s favourite WS2016 features – I know he uses it a lot to build demo rigs and during demos. I love it too for the same reasons.

PowerShell Direct – JEA and Sessions

The following are extensions of PowerShell Direct and PowerShell remoting:

  • Just Enough Administration (JEA): An admin has no rights with their normal account to a remote server. They use a JEA config when connecting to the server that grants them just enough rights to do their work. Their elevated rights are limited to that machine via a temporary user that is deleted when their session ends. Really limits what malware/attacker can target.
  • Just-in-Time Administration (JITA): An admin can request rights for a short amount of time from MIM. They must enter a justification, and the company can enforce management approval in the process.

vNIC Identification

Name the vNICs and make that name visible in the guest OS. Really useful for VMs with more than 1 vNIC because Hyper-V does not have consistent device naming.
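Roughly how it’s done – the names are examples, and the guest-side property display name can vary a little with the driver/integration services version:

# On the host: name the vNIC and allow that name to be passed into the guest
Add-VMNetworkAdapter -VMName "Web01" -SwitchName "External1" -Name "Backup" -DeviceNaming On

# Inside the guest: read the name passed down from the host
Get-NetAdapterAdvancedProperty -DisplayName "Hyper-V Network Adapter Name" | Format-Table Name, DisplayValue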

image

Hyper-V Manager Improvements

Yes, it’s the same MMC-based Hyper-V Manager that we got in W2008, but with more bells and whistles.

  • Support for alternative credentials
  • Connect to a host IP address
  • Connect via WinRM
  • Support for high-DPI monitors
  • Manage WS2012, WS2012 R2 and WS2016 from one HVM – HVM in Win10 Anniversary Update (The big Redstone 1 update in Summer 2016) has this functionality.

VM Servicing

MS found that the vast majority of customers never updated the Integration services/components (ICs) in the guest OS of VMs. It was a horrible manual process – or one that was painful to automate. So customers ran with older/buggy versions of ICs, and VMs often lacked features that the host supported!

ICs are updated in the guest OS via Windows Update on WS2016. Problem sorted, assuming proper testing and correct packaging!

MSFT plans to release IC updates via Windows Update to WS2012 R2 in a month, preparing those VMs for migration to WS2016. Nice!

Core Platform

Ben was running out of time here!

Delivering the Best Hyper-V Host Ever

This was the Nano Server push. Honestly – I’m not sold. Too difficult to troubleshoot and a nightmare to deploy without SCVMM.

I do use Nano in the lab. Later, Ben does a demo. I’d not seen VM status in the Nano console before, which Ben shows – the only time I’ve used the console is to verify network settings that I set remotely using PoSH Smile There is also an ability to delete a virtual switch on the console.

Nested Virtualization

Yay! Ben admits that nested virtualization was done for Hyper-V Containers on Azure, but those of us requiring labs or training environments can now run multiple working hosts & clusters on a single machine!
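Enabling nesting for a lab host VM is a couple of host-side settings; a quick sketch with a made-up VM name:

# The VM must be off, and dynamic memory must be disabled
Set-VMProcessor -VMName "LabHost01" -ExposeVirtualizationExtensions $true
Set-VMMemory -VMName "LabHost01" -DynamicMemoryEnabled $false

# Let the nested VMs' traffic out through the virtual host's vNIC
Get-VMNetworkAdapter -VMName "LabHost01" | Set-VMNetworkAdapter -MacAddressSpoofing On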

VM Configuration File

Short story: it’s binary instead of XML, improving performance on dense hosts. Two files:

  • .VMCX: Configuration
  • .VMRS: Run state

Power Management

Client Hyper-V was impacted badly by Windows 8 era power management features like Connected Standby. That included Surface devices. That’s sorted now.

Development Stuff

This looks like a seed for the future (and I like the idea of what it might lead to, and I won’t say what that might be!). There is now a single WMI (Root\HyperVCluster\v2) view of the entire Hyper-V cluster – you see a cluster as one big Hyper-V server. It really doesn’t do much now.

And there’s also something new called Hyper-V sockets for Microsoft partners to develop on. An extension of the Windows Socket API for “fast, efficient communication between the host and the guest”.

Scale Limits

The numbers are “Top Gear stats” but, according to a session earlier in the week, these are driven by Azure (Hyper-V’s biggest customer). Ben says that the numbers are nuts and we normals won’t ever have this hardware, but Azure came to Hyper-V and asked for bigger numbers for “massive scale”. Apparently some customers want massive super computer scale “for a few months” and Azure wants to give them an OPEX offering so those customers don’t need to buy that h/w.

Note Ben highlights a typo in max RAM per VM: it should say 12 TB max for a VM … what’s 4 TB between friends?!?!

image

Ben wraps up with a few demos.

Ignite 2016 – Introducing Windows Server and System Center 2016

This session (original here) introduces WS2016 and SysCtr 2016 at a high level. The speakers were:

  • Mike Neil: Corporate VP, Enterprise Cloud Group at Microsoft
  • Erin Chapple: General Manager, Windows Server at Microsoft

A selection of other people will come on stage to do demos.

20 Years Old

Windows Server is 20 years old. Here’s how it has evolved:

image

The 2008 release brought us the first version of Hyper-V. Server 2012 brought us the same Hyper-V that was running in Azure. And Windows Server 2016 brings us the cloud on our terms.

The Foundation of Our Cloud

The investment that Microsoft made in Azure is being returned to us. Lots of what’s in WS2016 came from Azure, and combined with Azure Stack, we can run Azure on-prem or in hosted clouds.

There are over 100 data centers in Azure across 24 regions. Windows Server is the platform that is used for Azure across all that capacity.

IT is Being Pulled in Two Directions – Creating Stresses

  • Provide secure, controlled IT resources (on prem)
  • Support business agility and innovation (cloud / shadow IT)

By 2017, 50% of IT spending will be outside of the organization.

Stress points:

  • Security
  • Data centre efficiency
  • Modernizing applications

Microsoft’s solution is to use unified management to deliver:

  • Advanced multi-layer security
  • Azure-inspired, software-defined infrastructure
  • A cloud-ready application platform

Security

Mike shows a number of security breach headlines. IT security is a CEO issue – costs to a business of a breach are shown. And S*1t rolls downhill.

Multi-layer security:

  • Protect identity
  • Secure virtual machines
  • Protect the OS on-prem or in the cloud

Challenges in Protecting Credentials

Attack vectors:

  1. Social engineering is the one they see the most
  2. Pass the hash
  3. Admin = unlimited rights. Too many rights given to too many people for too long.

To protect against compromised admin credentials:

image

  • Credential Guard will protect ID in the guest OS
  • JEA limits rights to just enough to get the job done
  • JITA limits the time that an admin can have those rights

The solution closes the door on admin ID vulnerabilities.

Ryan Puffer comes on stage to do a demo of JEA and JITA. The demo is based on PowerShell:

  1. He runs Enter-PSSession to log into a domain controller (DNS server). Local logon rights normally mean domain admin.
  2. He cannot connect to the DC, because his current logon doesn’t have DC rights, so it fails.
  3. He tries again, but adding -ConfigurationName to add a JEA config to Enter-PSSession, and he can get in. The JEA config was set up by a more trusted admin. The JEA authentication is done using a temporary virtual local account on the DC that resides nowhere else. This account exists only for the duration of the login session. Malware cannot use this account because it has limited rights (to this machine) and will disappear quickly.
  4. The JEA configuration has also limited rights – he can do DNS stuff but he cannot browse the file system, create users/groups, etc. His ISE session only shows DNS Get- cmdlets.
  5. He needs some modify rights. He browses to a Microsoft Identity Manager (MIM) portal and has some JITA roles that he can request – one of these will give his JEA temp account more rights so he can modify DNS (via a group membership). He selects one and has to enter details to justify the request. He puts in a time-out of 30 minutes – 31 minutes later he will return to having just DNS viewer rights. MFA via Azure can be used to verify the user, and manager approval can be required.
  6. He logs in again using Enter-PSSession with the JEA config. Now he has DNS modify rights. Note: you can whitelist and blacklist cmdlets in a role.
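For context, a heavily stripped-down version of the kind of JEA endpoint used in this demo might look like the following – the configuration name, group, and visible cmdlets are all made up for illustration:

# Define what the role can see/do - DNS viewing only in this example
New-PSRoleCapabilityFile -Path .\DnsViewer.psrc -VisibleCmdlets "Get-Dns*"
# (the .psrc must then be placed in a RoleCapabilities folder of a module on the target server)

# Create a session configuration that uses a temporary virtual account and that role
New-PSSessionConfigurationFile -Path .\DnsOps.pssc -SessionType RestrictedRemoteServer -RunAsVirtualAccount -RoleDefinitions @{ "CONTOSO\DnsOperators" = @{ RoleCapabilities = "DnsViewer" } }
Register-PSSessionConfiguration -Name DnsOps -Path .\DnsOps.pssc

# The admin then connects with that JEA endpoint
Enter-PSSession -ComputerName DC01 -ConfigurationName DnsOps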

Back to Mike.

Challenges Protecting Virtual Machines

VMs are files:

  • Easy to modify/copy
  • Too many admins have access

Someone can mount a VMs disks or copy a VM to gain access to the data. Microsoft believes that attackers (internal and external) are interested in attacking the host OS to gain access to VMs, so they want to prevent this.

This is why Shielded Virtual Machines was invented – secure the guest OS by default:

  • The VM is encrypted at rest and in transit
  • The VM can only boot on authorised hosts

Azure-Inspired, Software-Defined

Erin Chapple comes on stage.

This is a journey that has been going on for several releases of Windows Server. Microsoft has learned a lot from Azure, and is bringing that learning to WS2016.

Increase Reliability with Cluster Enhancements

  • Cloud means more updates, with feature improvements. OS upgrades weren’t possible in a cluster. In WS2016, we get cluster rolling upgrades. This allows us to rebuild a cluster node within a cluster, and run the cluster temporarily in mixed-version mode. Now we can introduce changes without buying new cluster h/w or incurring VM downtime. Risk isn’t an upgrade blocker.
  • VM resiliency deals with transient errors in storage, meaning a brief storage outage pauses a VM instead of crashing it.
  • Fault domain-aware clusters allows us to control how errors affect a cluster. You can spread a cluster across fault domains (racks) just like Azure does. This means your services can be spread across fault domains, so a rack outage doesn’t bring down a HA service.

image

24 TB of RAM on a physical host and 12 TB RAM in a guest OS are supported. 512 physical LPs on a host, and 240 virtual processors in a VM. This is “driven by Azure” not by customer feedback.

Complete Software-Defined Storage Solution

Evolving Storage Spaces from WS2012/R2. Storage Spaces Direct (S2D) takes DAS and uses it as replicated/shared storage across servers in a cluster, that can either be:

  • Shared over SMB 3 with another tier of compute (Hyper-V) nodes
  • Used in a single tier (CSV, no SMB 3) of hyper-converged infrastructure (HCI)
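Standing up S2D on a WS2016 cluster with eligible local disks comes down to very few commands; a minimal sketch (the volume name and size are examples):

# Enable Storage Spaces Direct on the cluster - it claims the eligible local disks into a pool
Enable-ClusterStorageSpacesDirect

# Carve out a ReFS-formatted CSV from the auto-created pool
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Volume01" -FileSystem CSVFS_ReFS -Size 2TB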

image

Storage Replica introduces per-volume, sync/async, block-level, beneath-the-file-system replication to Windows Server. It doesn’t care what the source/destination storage is (it can be different in both sites) as long as it is cluster-supported.

Storage QoS guarantees an SLA with min and max rules, managed from a central point:

  • Tenant
  • VM
  • Disk

The owner of S2D, Claus Joergensen, comes on stage to do an S2D demo.

  1. The demo uses latest Intel CPUs and all-Intel flash storage on 16 nodes in a HCI configuration (compute and storage on a single cluster, shared across all nodes).
  2. There are 704 VMs run using an open source tool called VMFleet.
  3. They run a profile similar to Azure P10 storage (each VHD has 500 IOPS). That’s 350,000 IOPS – which is trivial for this system.
  4. They change this to Azure P20: now each disk has 2,300 IOPS, summing 1.6 million IOPS in the system – it’s 70% read and 30% write. Each S2D cluster node (all 16 of them) is hitting over 100,000 IOPS, which is about the max that most HCI solutions claim.
  5. Claus changes the QoS rules on the cluster to unlimited – each VM will take whatever IOPS the storage system can give it.
  6. Now we see a total of 2.7 million IOPS across the cluster, with each node hitting 157,000 to 182,000 IOPS, at least 50% more than the HCI vendors claim.

Note the CPU usage for the host, which is modest. That’s under 10% utilization per node to run the infrastructure at max speed! Thank Storage Spaces and SMB Direct (RDMA) for that!

image

  1. Now he switches the demo over to read IO only.
  2. The stress test hits 6.6 million read IOPS, with each node offering between 393,000 and 433,000 IOPS – that’s 16 servers, no SAN!
  3. The CPU still stays under 10% per node.
  4. Throughput numbers will be shown later in the week.

If you want to know where to get certified S2D hardware, then you can get DataON from MicroWarehouse in Dublin (www.mwh.ie):

image

Nano Server

Nano Server is not an edition – it is an installation option. You can install a deeply stripped down version of WS2016, that can only run a subset of roles, and has no UI of any kind, other than a very basic network troubleshooting console.

It consumes just 460 MB of disk space, compared to 5.4 GB for Server Core (command prompt only). It boots in less than 10 seconds and has a smaller attack surface. Ideal scenario: born-in-the-cloud applications.

Nano Server is serviced via the Current Branch for Business model. If you install Nano Server, then you are forced into installing updates as Microsoft releases them, which they expect to do 2-3 times per year. Nano will be the basis of Microsoft’s cloud infrastructure going forward.

Azure-Inspired Software-Defined Networking

A lot of stuff from Azure here. The goal is that you can provision new networks in minutes instead of days, and have predictable/secure/stable platforms for connecting users/apps/data that can scale – the opposite of VLANs.

Three innovations:

  • Network Controller: From Azure, a fabric management solution
  • VXLAN support: Added to NVGRE, making the underlying transport less important and focusing more on the virtual networks
  • Virtual network functions: Also from Azure, getting firewall, load balancing and more built into the fabric (no, it’s not NLB or Windows Firewall – see what Azure does)

Greg Cusanza comes on stage – Greg has a history with SDN in SCVMM and WS2012/R2. He’s going to deploy the following:

image

That’s a virtual network with a private address space (NAT) with 3 subnets that can route and an external connection for end user access to a web application. Each tier of the service (file and web) has load balancers with VIPs, and AD in the back end will sync with Azure AD. This is all familiar if you’ve done networking in Azure Resource Manager (ARM).

  1. A bunch of VMs have been created with no network connections.
  2. He opens a PoSH script that will run against the network controller – note that you’ll use Azure Stack in the real world.
  3. The script runs in just over 29 seconds – all the stuff in the screenshot is deployed, the VMs are networked and have Internet connectivity – he can browse the net from a VM, and can browse the web app from the Internet – he proves that load balancing (virtual network function) is working.

Now an unexpected twist:

  1. Greg browses a site and enters a username and password – he has been phished by a hacker and now pretends to be the attacker.
  2. He has discovered that the application can be connected to using remote desktop and attempts to sign in using the phished credentials. He signs into one of the web VMs.
  3. He uploads a script to do stuff on the network. He browses shares on the domain network. He copies ntds.dit from a DC and uploads it to OneDrive for a brute force attack. Woops!

This leads us to dynamic security (network security groups or firewall rules) in SDN – more stuff that ARM admins will be familiar with. He’ll also add a network virtual appliance (a specialised VM that acts as a network device, such as an app-aware firewall) from a gallery – which we know Microsoft Azure Stack will be able to syndicate from:

image

 

  1. Back in PoSH, he runs another script to configure network security groups, to filter traffic on a TCP/UDP port level.
  2. Now he repeats the attack – and it fails. He cannot RDP to the web servers, he couldn’t browse shared folders if he did, and he prevented outbound traffic from the web servers anyway (stateful inspection).

The virtual appliance is a network device that runs a customized Linux.

  1. He launches SCVMM.
  2. We can see the network in Network Service – so System Center is able to deploy/manage the Network Controller.

Erin finished by mentioning the free WS2016 Datacenter license offer for retiring vSphere hosts “a free Datacenter license for every vSphere host that is retired”, good until June 30, 2017 – see www.microsoft.com/vmwareshift

Cloud-Ready Application Platform

Back to Mike Neil. We now have a diverse set of infrastructure that we can run applications on:

image

WS2016 adds new capabilities for cloud-based applications. Containers are a huge thing for MSFT.

A container virtualizes the OS, not the machine. A single OS can run multiple Windows Server Containers – 1 container per app. So that’s a single shared kernel – that’s great for internal & trusted apps, similar to containers that are available on Linux. Deployment is fast and you can get great app density. But if you need security, you can deploy compatible Hyper-V Containers. The same container images can be used. Each container has a stripped down mini-kernel (see Nano) isolated by a Hyper-V partition, meaning that untrusted or external apps can be run safely, isolated from each other and the container host (either physical or a VM – we have nested Hyper-V now!). Another benefit of Hyper-V Containers is staggered servicing. Normal (Windows Server) Containers share the kernel with the container host – if you service the host then you have to service all of the containers at the same time. Because they are partitioned/isolated, you can stagger the servicing of Hyper-V Containers.

Taylor Brown (ex- of Hyper-V and now Principal Program Manager of Containers) comes on stage to do a demo.

image

  1. He has a VM running a simple website – a sample ASP.NET site in Visual Studio.
  2. In IIS Manager, he does a Deploy > Export Application, and exports a .ZIP.
  3. He copies that to a WS2016 machine, currently using 1.5 GB RAM.
  4. He shows us a “Dockerfile” (above) to configure a new container. Note how EXPOSE publishes TCP ports for external access to the container on TCP 80 (HTTP) and TCP 8172 (management). A PowerShell snap-in will run webdeploy and it will restore the exported ZIP package.
  5. He runs docker build -t mysite … with the location of the Dockerfile.
  6. A few seconds later a new container is built.
  7. He starts the container and maps the ports.
  8. And the container is up and running in seconds – the .NET site takes a few seconds to compile (as it always does in IIS) and the thing can be browsed.
  9. He deploys another 2 instances of the container in seconds. Now there are 3 websites and only .5 GB extra RAM is consumed.
  10. He uses docker run --isolation=hyperv to get an additional Hyper-V Container. The same image is started … it takes an extra second or two because of “cloning technology that’s used to optimize deployment of Hyper-V Containers”.
  11. Two Hyper-V containers and 3 normal containers (that’s 5 unique instances of IIS) are running in a couple of minutes, and the machine has gone from using 1.5 GB RAM to 2.8 GB RAM.
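The commands in the demo map onto the standard Docker CLI, run from a PowerShell prompt on the container host; roughly (the ports and image name are illustrative):

# Build the image from the Dockerfile in the current folder
docker build -t mysite .

# Run Windows Server Containers, publishing the web port
docker run -d -p 80:80 mysite
docker run -d -p 8080:80 mysite

# Run the same image as a Hyper-V Container for stronger isolation
docker run -d -p 8081:80 --isolation=hyperv mysite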

Microsoft has been a significant contributor to the Docker open source project and one MS engineer is a maintainer of the project now. There’s a reminder that Docker’s enterprise management tools will be available to WS2016 customers free of charge.

On to management.

Enterprise-Class Data Centre Management

System Center 2016:

  • 1st choice for Windows Server 2016
  • Control across hybrid cloud with Azure integrations (see SCOM/OMS)

SCOM Monitoring:

  • Best of breed Windows monitoring and cross-platform support
  • N/w monitoring and cloud infrastructure health
  • Best-practice for workload configuration

Mahesh Narayanan, Principal Program Manager, comes on stage to do a demo of SCOM. IT pros struggle with alert noise. That’s the first thing he wants to show us – it’s really a way to find what needs to be overridden or customized.

  1. Tune Management Packs allows you to see how many alerts are coming from each management pack. You can filter this by time.
  2. He clicks the Tune Alerts action. We see the alerts, and a count of each. You can then do an override (object or group of objects).

Maintenance cycles create a lot of alerts. We expect monitoring to suppress these alerts – but it hasn’t yet! This is fixed in SCOM 2016:

  1. You can schedule maintenance in advance (yay!). You could match this to a patching cycle so WSUS/SCCM patch deployments don’t break your heart at 3 am on a Saturday morning.
  2. Your objects/assets will automatically go into maintenance mode and have a not-monitored status according to your schedules.

All those MacGyver solutions we’ve cobbled together for stopping alerts while patching can be thrown out!

That was all for System Center? I am very surprised!

PowerShell

PowerShell is now open source.

  • DevOps-oriented tooling in PoSH 5.1 in WS2016
  • vNext Alpha on Windows, macOS, and Linux
  • Community supported releases

Joey Aiello, Program Manager, comes up to do a demo. I lose interest here. The session wraps up with a marketing video.

Ignite 2016 – Microsoft Azure Networking: New Network Services, Features And Scenarios

This session (original here) from Microsoft Ignite 2016 is looking at new networking features in Azure such as Web Application Firewall, IPv6, DNS, accelerated networking, VNet Peering and more. This post is my collection of notes from the recording of this session.

The speakers are:

  • Yousef Khalidi, Corporate Vice President, Microsoft
  • Jason Carson, Enterprise Architect, Manulife
  • Art Chenobrov, Manager Identity, Access and Messaging, Hyatt Hotels
  • Gabriel Silva, Program Manager, Microsoft

A mix of Microsoft  and non-Microsoft speakers. There will be a breadth overview and some customer testimonials. A chunk of marketing consumes the first 7 minutes. Then on to the good stuff.

High Performance Networking

A number of improvements have been made at no cost to the customer. Honestly, I’ve seen some by accident, and they ruined (in a good way) some of my demos Smile

  • Improved performance of all VMs, seeing VNet performance improve by 33% to 50%
  • More IOPS to storage – I saw IOPS increase in some demo/tests
  • For Linux and Windows VMs
  • The global deployment will be completed in 2016 – phased deployments across the Azure regions.

You have to do nothing to get these benefits. I’m sure that Yousef said that we’ll be able to get up to 21 Gbps down, depending on the VM SKU/size. Some of this is made possible thanks to making better utilization of NIC capacity.

Accelerated Networking

Azure now has SR-IOV (single-root IO virtualization), where a VM can connect directly to a physical NIC without routing traffic via the virtual switch in the host partition. The results are:

image

  • 10 x latency improvement
  • Increased packets per second (PPS)
  • Reduced jitter – great for media/voice

Now Azure has the highest bandwidth VMs in the cloud: DS15v2 and D15v2 can hit 25 Gbps (in preview). The competition can get up to 20 Gbps “on a good day”.

Performance sensitive applications will benefit. There is a 1.5x improvement for Azure SQL DB in memory OLTP transactions.

Microsoft are rolling this out across Azure over this and the next calendar years. Gabe (Gabriel) does a demo, doing a VM-to-VM latency and bandwidth test. You can enable SR-IOV in the Portal (Accelerated Network setting). The demo is done in the West Central US region. You can verify that SR-IOV is enabled for the vNIC in the guest OS – in Windows, look for a virtual function (VF) network adapter in Devices. Interestingly, in the demo, we can tell that the host uses Mellanox ConnectX-3 RDMA NICs. The first demo does 100,000 pings between VMs, and the latency is 10 times lower than current numbers. They run a network stress test between two VMs.

image

 

They get 25 Gbps of connectivity between the 2 VMs:

image

This functionality will be coming “soon” to us.

Next there’s a demo with connection latency tests to a database, from a VM with SRIOV and one without. We see that latency is significantly lower on the accelerated VM. They re-run the test to make the results more tangible. The un-accelerated machine can query 270 rows per second while the accelerated one is hitting 664. Same VMs – just SRIOV is enabled on one of them.

image

The subscription must be enabled for this feature first (still rolling it out) and then all of your VMs can leverage the feature. There is no cost to turning it on and using the feature.

Back to Yousef.

The Network Big Picture

The following is an old slide full of old features:

image

On to the new stuff.

VNet Peering (GA)

A customer can have lots of isolated VNets with duplicated effort.

image

Customers want to consolidate some of this. For example, can we:

 

  • Have one VNet that has load balancing and virtual appliance firewalls/proxies
  • Connect other VNets to this?

The answer is yes, you can now using VNet peering (limited to connections in a single region) which just went GA.

image

Note that VM connections across a VNet peering run at the speed of the VMs’ NICs.
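With the AzureRM module, a peering is created on each VNet; a minimal sketch with hypothetical VNet and resource group names:

$vnet1 = Get-AzureRmVirtualNetwork -ResourceGroupName "Core-RG" -Name "Hub-VNet"
$vnet2 = Get-AzureRmVirtualNetwork -ResourceGroupName "App-RG" -Name "Spoke-VNet"

# A peering must be created in each direction
Add-AzureRmVirtualNetworkPeering -Name "Hub-To-Spoke" -VirtualNetwork $vnet1 -RemoteVirtualNetworkId $vnet2.Id
Add-AzureRmVirtualNetworkPeering -Name "Spoke-To-Hub" -VirtualNetwork $vnet2 -RemoteVirtualNetworkId $vnet1.Id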

Azure DNS (GA)

You can host your records in Azure DNS or elsewhere. The benefit of Azure is that it is global and fast.
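Creating a zone and a record set with the AzureRM module is simple; a sketch with made-up names:

# Create the zone, then delegate your domain to the name servers Azure assigns to it
New-AzureRmDnsZone -Name "example.com" -ResourceGroupName "DNS-RG"

# Add an A record set
New-AzureRmDnsRecordSet -Name "www" -RecordType A -ZoneName "example.com" -ResourceGroupName "DNS-RG" -Ttl 3600 -DnsRecords (New-AzureRmDnsRecordConfig -IPv4Address "40.1.2.3")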

image

IPv6 for Azure VMs

We can create IPv6 IP addresses on the load balancer, and use AAAA DNS records (which you can host in Azure DNS if you want) to access VM services in Azure. This is supported for Linux and Windows. This is a big deal for IoT devices.

image

Load Balancing (Review)

Yousef reviews how load balancing can be done today in Azure. A traffic manager profile (based on DNS records and abstraction) does load balancing/fail over between 2+ Azure deployments (across 1+ regions). A single deployment has an Azure Load Balancer, which uses Layer 4 LB rules to pass traffic through to the VNet. Within the VNet, Azure application gateways can proxy/direct/load balance Layer 7 traffic to web servers (VMs) on the VNet.

image

Web Application Firewall

The web application gateway is still relatively unknown, in my experience, even though it’s been around for 1 year. This is layer 7 handling of traffic to web farms/servers.

image

 

A preview for web application firewall (WAF) has been announced – an extension of the web application gateway.

image

WAF adds security to the WAG. In current preview, it uses a hard set of rules, but custom rules will be coming soon. MSFT hopes to GA it soon (must be ready first).

WAF is an add-on SKU to the gateway. It can run in detection mode (great to watch traffic without intervening – try it out). When you are happy, you switch over to prevention mode so it can intervene.

image

Multiple VIPS for Load Balancer

This is a cost reduction improvement. For example, you needed to run multiple databases behind internal load balancers, with each DB pair requiring a unique VIP. Now we can assign multiple VIPs to a LB, and consolidate the databases to a pair of VMs instead of multiple pairs of VMs.

image

Back end ports can also be reused to facilitate the above.

NIC Enhancements

These improvements didn’t get mentioned in any posts I read or announcements I heard. MAC addresses were not persistent. They have been for a few months now. Also, NIC ordering in a VM is retained after VM start (important for NVAs) – there was a bug where the NICs weren’t in a persistent order.

image

New virtual appliance scenarios are supported by adding functionality to additional NICs in a VM:

  • Load balancing
  • Direct public IP assignment
  • Multiple IPs on a single NIC

A marketing-heavy video is played to discuss how Hyatt Hotels are using Azure networking. I think that the gist of the story is that Hyatt went from a single data center in the USA, to having multiple PoPs around the world thanks to Azure networking (probably ExpressRoute). The speaker from Hyatt comes on stage.

Yousef is back on stage to talk about connecting to Azure. I was ready to skip this piece of the video but Yousef did present some interesting stuff. The first is using the Azure backbone to connect disparate offices. Each office connects over “the last mile” to Azure using secure VPN. Then Azure VNet-VNet VPNs provide the WAN. I’d never thought of this architecture – it’s actually pretty simple to set up with the new VPN UI in the Azure Portal. Azure provides low latency and high bandwidth connections – this is a very cheap way to network sites together with lots of speed and low latency.

image

Highly Available Connections to Azure

We can create more than 1 connection to Azure VPN gateways, solving a concern that people have over reliance on a single link/ISP.

image

Most people don’t know it, but the Azure gateway was an active/passive VM cluster behind the curtain. You can now run the gateway in an active/active configuration, giving you greater HA for your site-to-Azure connections. And additionally, you can aggregate the bandwidth of both VPN tunnels/links/ISPs.

image

If you are interested in the expensive ExpressRoute WAN option, then the PoP locations have increased to 35 around the world – more than any other cloud, with lots of partners offering WAN and connection relay options.

image

ExpressRoute has a new UltraPerformance gateway option: a 5x improvement over the 2 Gbps HighPerformance gateway – up to 10 Gbps through to VNets.

The ExpressRoute gateway SLA is increased to 99.95%.

More insights into ExpressRoute are being added: troubleshooting, BGP/traffic/routing statistics, diagnostics, alerting, monitoring, etc.

There’s a stint by the Manulife speaker to talk about their usage of Azure, which I skipped.

Monitoring And Diagnostics

Customers want visibility into the virtual networks that they are using for production and mission critical applications/services. So Microsoft has given us this in Azure:

image

More stuff will appear in PoSH, log extractions (for 3rd parties), and in the Portal in the future. And the session moved on to a summary.

Azure VNet Peering Is In Preview – But Has Registration Issues

Microsoft has launched the preview of Azure VNet Peering. You can find overview information on it and some how to’s.

You need to register for the VNet Peering preview using PowerShell:

Register-AzureRmProviderFeature -FeatureName AllowVnetPeering -ProviderNamespace Microsoft.Network -Force

It takes up to 30 minutes for this to complete. You can check your registration status by running:

Get-AzureRmProviderFeature -FeatureName AllowVnetPeering -ProviderNamespace Microsoft.Network

However … there do appear to be issues during these early days of the preview. I’ve tried it out with a couple of subscriptions (Open and CSP) and the registration claims to have succeeded, but I cannot peer VNets yet, because I “have not registered yet”.

image

It’s not unusual for a preview to have issues in the first couple of days – this is the first time the feature (which is still preview!) will have widespread rollout and usage. I would expect that Microsoft has detected issues and is working on a fix for this anticipated feature.


VNet Peering To Connect Azure Virtual Networks (Preview)

In a busy night of Azure announcements, Microsoft said that we can now peer two Azure VNets to connect them without using a VNet-to-VNet VPN. This in-preview feature will reduce costs and complexity.

image

I have yet to find any technical details, but this will be a great addition. I like that it supports ASM and ARM connections via different subscriptions – I can run RemoteApp in Open (ASM) to provide remote access to services in CSP (ARM).

As usual, you should carefully plan your VNet address space for scalability – don’t be the idiot that deploys the entirety of 10.0.0.0/8 to a single VNet/subnet!

Note that I have been unable to find technical documentation yet.


RunAsRadio Podcast – Hyper-V in Server 2016

I recently recorded an episode of the RunAsRadio podcast with Richard Campbell on the topic of Windows Server 2016 (WS2016) Hyper-V. We covered a number of areas, including containers, nested virtualization, networking, security, and PowerShell.

image