Ignite 2015 – Harden the Fabric: Protecting Tenant Secrets in Hyper-V

This post is HEAVY reading. It might take a few reads/watches.

This post is my set of notes from the session presented by Allen Marshall, Dean Wells, and Amitabh Tamhane at Microsoft Ignite 2015. Unfortunately it was on at the same time as the “What’s New” session by Ben Armstrong and Sarah Cooley. The focus is on protecting VMs so that fabric administrators:

Can power on or off VMs
Cannot inspect the disks
Cannot inspect the processes
Cannot attach debuggers to the system
Can’t change the configuration

This is to build a strong barrier between the tenant/customer and the administrator … and in turn, the three-letter agencies that are overstepping their bounds.

The Concern

Security concerns are the primary blocker in public cloud adoption. It’s not just the national agencies; people feature the operators and breached admin accounts of the fabric too. Virtual machines make VMs easier to move … and their disks to steal.

The obvious scenario is hosted. The less obvious scenario is a private cloud. A fabric admin is usually the admin of everything, and therefore and see into everything; is this desirable?

Now Hyper-V is defending the VM from the fabric.

What is a Shielded VM?

The data and state of a shielded VM are protected against inspection, theft, and tampering from both malware and data centre administrators)

Who is this for?

The result of shielding is:

Note: BitLocker is used to encrypt the disks of the VM from within the guest OS using a virtual TPM chip.

A service that runs outside of Hyper-V, the Host Guardian Service, is responsible for allowing VMs to boot up. Keys to boot the VM are only granted to the host when it is known and healthy – something that the host must prove.

FAQ

Can Azure do this? It doesn’t have shielding but it encrypts data at rest.
Can it work with Linux? Not yet, but they’re working on it.
What versions of guest OS? WS2012 and later are supported now, and they’re working on W2008 and W2008 R2. There are issues because they only work in Generation 1 VMs and shielding is a Generation 2 feature.

Demo

Scenario is that he has copied the data VHD of an un-shielded vDC and mounted it on his laptop where he has local admin rights. He browses the disk, alters ACLs on the folders and runs a scavenge & brute force attack (to match hashes) to retrieve usernames and passwords from the AD database. This sort of attack could also be done on vSphere or XenServer.

He now deploys a shielded VM from a shielded template using Windows Azure Pack – I guess this will be Azure Stack by RTM. Shielding data is the way that administrator passwords and RDP secrets are passed via a special secure/encyrpted package that the “hoster” cannot access. The template disk is also secured by a signature/hash that is contained in the package to ensure that the “hoster” has not altered the disk.

Another example: a pre-existing -non-shielded VM. He clicks Configure > Shielding and selects a shielding package to be used to protect the VM.

A console connection is not possible to a shielded VM by default.

He now tries to attach a shielded VHD using Disk Manager. The BitLocker protected disk is mounted but is not accessible. There are “no supported protectors”. This is real encryption and the disk is random 1s and 0s for everything but the owner VM.

Now even the most secure of organizations can deploy virtual DCs and sensitive data in virtual machines.

Security Assurances

At rest and in-flight encryption. The disks are encrypted and both VM state and Live Migration are encrypted.
Admin lockout: Host admins have no access to disk contents or VM state.
Attestation of health: VMs can only run on known and “healthy” (safe) hosts via the Host Guardian Service.

Methods of Deployment

There are two methods of deployment. The first is TPM-based and intended for hosters (isolation and mutlti-forest) and extremely difficult (if at all) to break. The second is AD-based and intended for enterprises (integrated networks and single forest) and might be how enterprises dip their tow into Shielded VMs before looking at TPM where all of the assurances are possible.

The latter is AD/Kerberos based. Hosts are added to a group and the Host Guardian Service ensures that the host is a member of the group when the host attempts to power up a shielded VM. Note that the Admin-trusted (AD) model does not have forced code integrity, hardware-rooted trust, or measured boot – the TPM model these features ensure trust of the host code.

TPM v2.0 is required on the host for h/w-trusted model. This h/w is not available yet on servers.

Architecture

Admin-trusted is friction-free with little change required.

A minimum of one WS2016 server is required to be the Host Guardian Service node. This is in it’s own AD forest of it’s own, known as a safe harbour active directory. Joining this service to the existing AD poisons it – it is the keys to the keys of the kingdom.

The more secure hardware-trusted model has special h/w requirements. Note the HSM, TPM 2.0 and UEFU 2.3.1 requirements. The HSM secures the certificates more effectively than software.

The HGS should be deployed with at least 3 nodes. There should be physical security and have limited number of admins. The AD should be dedicated to the HGS – each HGS nodes is a DC. The HGS client is a part of WS2016 Hyper-V. TPM is required and SecureBoot is recommended.

Virtualization Based Security (VBS)

Based on processor extensions in the hardware. VBS may be used by the host OS and the guest OS. This is also used by Device Guard in the Enterprise edition of Windows 10.

The hypervisor is responsible for enforcing security. It’s hardware protected and runs at a higher privilege level (ring -1) than the management OS, and it boots before the management OS (already running before the management OS starts to boot since WS2012). Hypervisor binaries can be measured and protected by Secure Boot. There are no drivers or installable code in Hyper-V, so no opportunity to attack there. The management OS kernel is code protected by the hypervisor too.

Physical presence, hardware and DOS attacks are still possible. The first 2 are prevented by good practice.

Hardware Requirements

SLAT is the key part that enables Hyper-V to enforce memory protection. Any server chipset from the last 8 or so years will have SLAT so there’s likely not an issue in production systems.

Security Boundaries Today (WS2012 R2)

Each VM has a VMWP.EXE (worker process) in the management OS that is under the control of the fabric admin. A rogue admin can misuse this to peer into the VM. The VHD/X files and others are also in the same trust boundary of the fabric admin. The hypervisor fully trusts the management OS. There are a litany of attacks that are possible by a rogue administrator or malware on the management OS.

Changing Security Boundaries in Hyper-V

Virtual Secure Mode: an enlightenment that any partition (host or guest) can take advantage of. There’s a tiny runtime environment in there. In there are trust-lets running on IUM and SMART Secure Kernel, aka SMART or SKERNEL).
The hypervisor now enforces code integrity for the management OS (hypervisor is running first) and for shielded VMs
A hardened VMWP is sued for shielded VMs to protect their state – e.g. prevent attaching a debugger.
A virtual TPM (vTPM) can be offered to a VM, e.g. disk encryption, measurement, etc.
Restrictions on host admin access to guest VMs
Strengthened the boundary to protect the hypervisor from the management OS.

Virtual Secure Mode (VSM)

VSM is the cornerstone of the new enterprise assurance features.

Protects the platform, shielded VMs, and Device Guard
It’s a tiny secure environment where platform secrets are kept safe

It operates based on virtual trust levels (VTLs). Kind of like user/kernel mode for the hypervisor. Two levels now but the design allows for future scalability. The higher the number, the higher the level of protection. The higher levels control access privileges for lower levels.

VTL 0: “normal world”
VTL 1: “secure world”

VTLs provide memory isolation and are created/managed by the hypervisor at the time of page translation. VTLs cannot be changed by the management OS.

Inside the VSM, trustlets execute on the SKERNEL. No third party code is allowed. Three major (but not all) components are:

Local Security Authority Sub System (LSASS) – credentials isolation, defeating “pass the hash”
Kernel code integrity – moving the kernel code integrity checks into the VSM
vTPM – provides a synthetic TPM device to guest VMs, enabling guest disk encryption

There is a super small kernel, meaning there’s a tiny attack surface. The hypervisor is in control of transitions/interactions between the management OS and the VSM.

The VSM is a rich target, so direct memory attacks (DMA) are likely. To protect against it, the IOMMUs in the system (Intel VT-D) prevents arbitrary access.

Protecting VM State

Requires a Generation 2 VM.
Enables secure boot
Support TPM 2.0
Supports WS2012 and later, looking at W2008 and W2008 R2.
Using Virtual Secure Mode in the guest OS requires WS2012 R2 – VSM is a hypervisor facility offered to enlightened guests (WS2016 only and not being backported).

vTPM

It is not backed by a physical TPM. Ensures that the VM is mobile.
Enables BitLocker in the guest OS, e.g. BitLocker in Transparent Mode – no need to sit there and type a key when it boots.
Hardened VMWP hosts the vTPM virtual device for protected VMs.

This hardened VMWP handles other encryption other than just at rest (BitLocker):

Live migration where egress traffic is enrypted
All other at rest files: runtime state file, saved state, checkpoint
Hyper-V Replica Log (HRL) file

There are overheads but they are unknown at this point.

VMWP Hardening

Run as “protected process light” (originally created for DRM)
Disallows debugging and restricts handles access – state and crash dump files are encrypted
Protected by code integrity
New permissions with “just enough access” (JEA)
Removes duplicate handles to VMWP.EXE

Restricted Access to Shielded VMs

Disallowed:

Basic mode of VMConnect
RemoteFX
Insecure WMI calls, screenshot, thumbnail, keyboard, mouse
Insecure KVPs: Host Only items, Host Exchange items, Guest Exchange items
Guest File Copy integration service (out-of band or OOB file copy)
Initial Machine Config registry hive injection – a way to inject a preconfigured registry hive into a new VM.

VM Generation ID is not affected.

Custom Security Configurations

How to dial back the secure-by-default configuration to suit your needs. Maybe the host admin is trusted or maybe you don’t have all of the host system requirements. Three levels of custom operation:

Basic TPM Functionality: Enable vTPM for secure boot, disk encryption, or VSC
Data at Rest Protections: Includes Basic TPM. The hardened VMWP protects VM state and Live Migration traffic. Console mode access still works.
Fully Shielded: Enables all protections, including restrictions of host admin operations.

The WS2016 Hyper-V Security Boundaries

Scenarios

Organizations with strict regulatory/compliance requirements for cloud deployments
Virtualising sensitive workloads, e.g. DCs
Placing sensitive workloads in physically insecure locations (HGS must be physically secure)

Easy, right?

Technorati Tags: Event Notes,Windows Server 2016,Hyper-V,Security