Failover Clustering

Physical Disks are Missing in Disk Management

In this post, I’ll explain how I fixed a situation where most of my Storage Spaces JBOD disks were missing in Disk Management and Get-PhysicalDisk showed their OperationalStatus as being stuck on “Starting”.

I’ve had some interesting hardware/software issues with an old lab at work. All of the hardware is quite old now, but I’ve been trying to use it in what I’ll call semi-production. The WS2016 Hyper-V cluster hardware consists of a pair of Dell R420 hosts and an old DataON 6 Gbps SAS Storage Spaces JBOD.

Most of the disks disappeared in Disk Management and thus couldn’t be added to a new Storage Spaces pool. I checked Device Manager and they were listed. I removed the devices and rebooted but the disks didn’t appear in Disk Management. I then ran Get-PhysicalDisk and this came up:

As you can see, the disks were there, but their OperationalStatus was hung on “Starting” and their HealthStatus was “Unknown”. If this was a single disk, I could imagine that it had failed. However, this was nearly every disk in the JBOD and spanned HDD and SSD. Something else was up – probably Windows Server 2016 or some firmware had threw a wobbly and wasn’t wrapping up some task.

The solution was to run Reset-PhysicalDisk. The example on docs.microsoft.com was incorrect, but adding a foreach loop fixed things:

$phydisk = (Get-Physicaldisk | Where-Object -FilterScript {$_.HealthStatus -Eq “Unknown”})

foreach ($item in $phydisk)
{
Reset-PhysicalDisk -FriendlyName $item.FriendlyName
}

A few seconds later, things looked a lot better:

I was then able to create the new pool and virtual disks (witness + CSVs) in Failover Cluster Manager.

Feedback Required By MS – Storage Replica in WS2019 STANDARD

Microsoft is planning to add Storage Replica into the Standard Edition of Windows Server 2019 (WS2019). In case you weren’t paying attention, Windows Server 2016 (WS2016) only has this feature in the Datacenter edition – a large number of us campaigned to get that changed. I personally wrecked the head of Ned Pyle (@NerdPyle) who, when he isn’t tweeting gifs, is a Principal Program Manager in the Microsoft Windows Server High Availability and Storage group – he’s one of the people responsible for the SR feature and he’s the guy who presents it at conferences such as Ignite.

What is SR? It’s volume based replication in Windows Server Failover Clustering. The main idea what to enable replication of LUNs when companies couldn’t afford SAN replication licensing. Some SAN vendors charge a fortune to enable LUN replication for disaster recovery and SR is a solution for this.

A by product of SR is a scenario for smaller businesses. With the death of cluster-in-a-box (manufacturers are focused on larger S2D customers) the small-medium business is left looking for a new way to build a Hyper-V cluster. You can do 2-node S2D clusters but they have single points of failure (4 nodes are required to get over this) and require at least 10 GBE networking. If you use SR, you can create an active/passive 2-node Hyper-V cluster using just internal RAID storage in your Hyper-V hosts. It’s a simpler solution … but it requires Datacenter Edition today, and in the SME & branch office scenario, Datacenter only makes financial sense when there are 13+ VMs per host.

Ned listened to the feedback. I think he had our backs and understood where we were coming from. So SR has been added to WS2019 Standard in the preview program. Microsoft wants telemetry (people to use it) and to give feedback – there’s a survey here. SR in Standard will be limited. Today, those limits are:

SR replicates a single volume instead of an unlimited number of volumes.
Servers can have one partnership instead of an unlimited number of partners.
Volume size limited to 2 TB instead of an unlimited size.

Microsoft really wants feedback on those limitations. If you think those limitations are too low, then TALK NOW. Don’t wait for GA when it is too late. Don’t be the idiot at some event who gives out shite when nothing can be done. ACT NOW.

If you cannot get the hint, complete the survey!

Talking Hyper-V & Azure At Upcoming Community Events

The last 12 months of my existence have been a steady diet of Azure. My focus at work has been on developing and delivering a set of bespoke Azure training courses aimed at our customers (MS partners) working in the Cloud Solutions Provider (CSP) channel. As of last week, my calendar became a lot more … reasonable. Don’t get me wrong, I’ve got meetings up the hoo-hah, but I’m not under the same deadline pressure as I was. And that frees up some time for some community stuff.

I’ve got three things coming up in April and May.

Lowlands Unite (Netherlands) – April 11th

A collection of MVPs from around Europe will be here for this 2-track event. I’ll be there presenting an updated version of the session that I did at TechEd Europe and Ignite 2015, The Hidden Treasures of Windows Server 2016 Hyper-V. This is a session where I like to talk and demonstrate the features in Hyper-V (and related) that don’t get the same coverage as the big ticket items, such as Storage Spaces Direct or Nano Server. And while these features don’t get those headlines, I often find that they are more useful for more customers.

Hyper-V Community (Munich) – May 3rd

This is a special pre-event day being organized by Hyper-V (Cloud & Datacenter Management) MVP, Carsten Rachfahl. Starting at midday, sessions will be presented by Ben Armstrong, Allesandro Pilotti, Didier Van Hoye, and myself. My session is a progression of the “When Disaster Strikes” session, moving into a more technical session on using Azure as a DR site for Hyper-V. I have a demo rig all set up, and am looking forward to showing it off with lots of practical advice.

Cloud & Datacenter Conference Germany (Munich) May 4th/5th

I spoke at this event last year, and it was easily the best run conference I’ve been to in Europe, the one with the best speakers & content, and the event with the best food (ever & anywhere). If you’re working in the Microsoft space (Windows, Server, Azure, Office, and more) and you can speak German then this is definitely the event for you. It’s an all-star cast of speakers, encouraged to talk and demonstrate tech, over 4 tracks spanning 2 days. I will be speaking on day 2 (Friday) and doing my new The Hidden Treasures of Windows Server 2016 Hyper-V session.

Ignite 2016 – Storage Spaces Direct

Read the notes from the session recording (original here) on Windows Server 2016 (WS2016) Storage Spaces Direct (S2D) and hyper-converged infrastructure, which was one of my most anticipated sessions of Microsoft Ignite 2016. The presenters were:

Claus Joergensen: Program Manager
Cosmos Darwin, Program Manager

Definition

Cosmos starts the session.

Storage Spaces Direct (S2D) is software-defined, shared-nothing storage.

Software-defined: Use industry standard hardware (not proprietary, like in a SAN) to build lower cost alternative storage. Lower cost doesn’t mean lower performance … as you’ll see
Shared-nothing: The servers use internal disks, not shared disk trays. HA and scale is achieved by pooling disks and replicating “blocks”.

Deployment

There’s a bunch of animated slides.

3 servers, each with internal disks, a mix of flash and HDD. The servers are connected over Ethernet (10 GbE or faster, RDMA)
Runs some PowerShell to query the disks on a server. The server has 4 x SATA HDD and 2 x SATA SSD. Yes, SATA. SATA is more affordable than SAS. S2D uses a virtual SAS bus over the disks to deal with SATA issues.
They form a cluster from the 3 servers. That creates a single “pool” of nodes – a cluster.
Now the magic starts. They will create a software-defined pool of virtually shared disks, using Enable-ClusterStorageSpacesDirect. That cmdlet does some smart work for us, identifying caching devices and capacity devices – more on this later.
Now they can create a series of virtual disks, each which will be formatted with ReFS and mounted by the cluster as CSVs – shared storage volumes. This is done with one cmdlet, New-Volume, which is doing all the lifting. Very cool!

There are two ways we can now use this cluster:

We expose the CSVs using file shares to another set of servers, such as Hyper-V hosts, and those servers store data, such as virtual machine files, using SMB 3 networking.
We don’t use any SMB 3 or file shares. Instead, we enable Hyper-V on all the S2D nodes, and run compute and storage across the cluster. This is hyper-converged infrastructure (HCI)

A new announcement. A 3rd scenario is SQL Server 2016 (supported). You install SQL Server 2016 on each node, and store database/log files on the CSVs (no SMB 3 file shares).

Scale-Out

So your S2D cluster was fine, but now your needs have grown and you need to scale out your storage/compute? It’s easy. Add another node (with internal storage) to the cluster. In moments, S2D will claim the new data disks. Data will be re-balanced over time across the disks in all the nodes.

Time to Deploy?

Once you have the servers racked/cabled, OS installed, and networking configured, you’re looking at under 15 minutes to get S2D configured and ready. You can automate a lot of the steps in SCVMM 2016.

Cluster Sizing

The minimum number of required nodes is an “it depends”.

Ideally you have a 4-node cluster. This offers HA, even during maintenance, and supports the most interesting form of data resilience that includes 3-way mirroring.
You could do a 3 node cluster, but that’s limited to 2-way mirroring.
And now, as of Ignite, you can do a 2-node cluster.

Scalability:

2-16 nodes in a single cluster – add nodes to scale out.
Over 3PB of raw storage per cluster – add drives to nodes to scale up (JBODS are supported).
The bigger the cluster gets, the better it will perform, depending on your network.

The procurement process is easy: add servers/disks

Performance

Claus takes over the presentation.

1,000,000 IOPS

Earlier in the week (I blogged this in the WS2016 and SysCtr 2016 session), Claus showed some crazy numbers for a larger cluster. He’s using a more “normal” 4-node (Dell R730xd) cluster in this demo. There are 4 CSVs. Each node has 4 NVMe flash devices and a bunch of HDDs. There are 80 VMs running on the HCI cluster. They’re using a open source stress test tool called VMFleet. The cluster is doing just over 1 million IOPS, over 925,000 read and 80.000 write. That’s 4 x 2U servers … not a rack of Dell Compellent SAN!

Disk Tiering

You can do:

SSD + HDD
All SSD

You must have some flash storage. That’s because HDD is slow at seek/read. “Spinning rust” (7200 RPM) can only do about 75 random IOs per second (IOPS). That’s pretty pathetic.

Flash gives us a built-in, always-on cache. One or more caching device (a flash disk) is selected by S2D. Caching devices are not pooled. The other disks, capacity devices, are used to store data, and are pooled and dynamically (not statically) bound to a caching device. All writes up to 256 KB and all reads up to 64 GB are cached – random IO is intercepted, and later sent it to capacity devices as optimized IO.

Note the dynamic binding of capacity devices to caching devices. If a server has more than one caching device, and one fails, the capacity devices of the failed caching device are dynamically re-bound.

Caching devices are deliberately not pooled – this allows their caching capability to be used by any pool/volume in the cluster –the flash storage can be used where it is needed.

The result (in Microsoft’s internal testing) was that they hit 600+ IOPS per HDD …. that’s how perfmon perceived it … in reality the caching devices were positively greatly impacting the performance of “spinning rust”.

NVMe

WS2016 S2D supports NVMe. This is a PCIe bus-connected form of very fast flash storage, that is many times faster than SAS HBA-connected SSD.

Comparing costs per drive/GB using retail pricing on NewEgg (a USA retail site):

Comparing performance, not price:

If we look at the cost per IOP, NVMe becomes a very affordable acceleration device:

Some CPU assist is require to move data to/from storage. Comparing SSD and NVMe, the NVMe has more CPU for Hyper-V or SQL Server.

The highest IOPS number that Microsoft has hit, so far, is over 6,000,000 read IOPS from a single cluster, which they showed earlier in the week.

1 Tb/s Throughput (New Record)

IOPS are great. But IOPS is much like horsepower in a car, we care more about miles/KMs per hour or amounts of data we can actually push in a second. Microsoft recently hit 1 terabit per second. The cluster:

12 nodes
All Micron NVMe
100 GbE Mellanox RDMA network adapters
336 VMs, stress tested by VMFleet.

Thanks to RDMA and NVMe, the CPU consumption was only 24-27%.

1 terabit per second. Wikipedia (English) is 11.5 GB. They can move English Wikipedia 14 times per second.

Fault Tolerance

Soooo, S2D is cheaper storage, but the performance is crazy good. Maybe there’s something wrong with fault tolerance? Think again!

Cosmos is back.

Failures are not a failure mode – they’re a critical design point. Failures happen, so Microsoft wants to make it easy to deal with.

Drive Fault Tolerance

You can survive up to 2 simultaneous drive failures. That’s because each chunk of data is stored on 3 drives. Your data stays safe and continuously (better than highly) available.
There is automatic and immediate repair (self-healing: parallelized restore, which is faster than classic RAID restore).
Drive replacement is a single-step process.

Demo:

3 node cluster, with 42 drives, 3 CSVs.
1 drive is pulled, and it shows a “Lost Communication” status.
The 3 CSVs now have a Warning health status – remember that each virtual disk (LUN) consumes space from each physical disk in the pool.
Runs: Cluster* | DebugStorageSubSystem …. this cmdlet for S2D does a complete cluster health check. The fault is found, devices identified (including disk & server serial), fault explained, and a recommendation is made. We never had this simple debug tool in WS2012 R2.
Runs: $Volumes | Debug-Volume … returns health info on the CSVs, and indicates that drive resiliency is reduced. It notes that a restore will happen automatically.
The drive is automatically marked as restired.
S2D (Get-StorageJob) starts a repair automatically – this is a parallelized restore writing across many drives, instead of just to 1 replacement/hot drive.
A new drive is inserted into the cluster. In WS2012 R2 we had to do some manual steps. But in WS2016 S2D, the disk is added automatically. We can audit this by looking at jobs.
A rebalance job will automatically happen, to balance data placement across the physical drives.

So what are the manual steps you need to do to replace a failed drive?

Pull the old drive
Install a new drive

S2D does everything else automatically.

Server Fault Tolerance

You can survive up to 2 node failures (4+ node cluster).
Copies of data are stored in different servers, not just different drives.
Able to accommodate servicing and maintenance – because data is spread across the nodes. So not a problem if you pause/drain a node to do planned maintenance.
Data resyncs automatically after a node has been paused/restarted.

Think of a server as a super drive.

Chassis & Rack Fault Tolerance

Time to start thinking about fault domains, like Azure does.

You can spread your S2D cluster across multiple racks or blade chassis. This is to create the concept of fault domains – different parts of the cluster depend on different network uplinks and power circuits.

You can tag a server as being in a particular rack or blade chassis. S2D will respect these boundaries for data placement, therefore for disk/server fault tolerance.

Efficiency

Claus is back on stage.

Mirroring is Costly

Everything so far about fault tolerance in the presentation has been about 3-copy mirror. And mirroring is expensive – this is why we encounter so many awful virtualization deployments on RAID5. So if 2-copy mirror (like RAID 10) gives us the raw storage as usable storage, and only 1/3 with 3-way mirroring, this is too expensive.

2-way and 3-way mirroring give us the best performance, but parity/erasure coding/RAID5 give us the best usable storage percentage. We want performance, but we want affordability too.

We can do erasure coding with 4 nodes in an S2D cluster, but there is a performance hit.

Issues with erasure coding (parity or RAID 5):

To rebuild from one failure, you have to read every column (all the disks), which ties up valuable IOPS.
Every write incurs an update of the erasure coding, which tiers up valuable CPU. Actively written data means calculating the encoding over and over again. This easily doubles the computational work involved in every write!

Local Reconstruction Codes

A product of Microsoft Research. It enables much faster recovery of a single drive by grouping bits. The XO the groups and restore required bits instead of an entire stripe. It reduces the number of devices that you need to touch to do a restore of a disk when using parity/erasure coding. This is used in Azure and in S2D.

This allows Microsoft to use erasure coding on SSD, as do many HCI vendors, but also on HDDs.

The below depicts the levels of efficiency you can get with erasure coding – note that you need 4 nodes minimum for erasure coding. The more nodes that you have, the better the efficiencies.

Accelerated Erasure Coding

S2D optimizes the read-modify-write nature of erasure coding. A virtual disk (a LUN) can combine mirroring and erasure coding!

Mirror: hot data with fast write
Erasure coding: cold data – fewer parity calculations

The tiering is real time, not scheduled like in normal Storage Spaces. And ReFS metadata handling optimizes things too – you should use ReFS on the data volumes in S2D!

Think about it. A VM sends a write to the virtual disk. The write is done to the mirror and acknowledged. The VM is happy and moves on. Underneath, S2D is continuing to handle the persistently stored updates. When the mirror tier fills, the aged data is pushed down to the erasure coding tier, where parity is done … but the VM isn’t affected because it has already committed the write and has moved on.

And don’t forget that we have flash-based caching devices in place before the VM hits the virtual disk!

As for updates to the parity volume, ReFS is very efficient, thanks to it’s way of abstracting blocks using metadata, e.g. accelerated VHDX operations.

The result here is that we get the performance of mirroring for writes and hot data (plus the flash-based cache!) and the economies of parity/erasure coding.

If money is not a problem, and you need peak performance, you can always go all-mirror.

Storage Efficiency Demo (Multi-Resilient Volumes)

Claus does a demo using PoSH.

Note: 2-way mirroring can lose 1 drive/system and is 50% efficient, e.g. 1 TB of usable capacity has a 2 TB footprint of raw capacity.

12 node S2D cluster, each has 4 SSDs and 12 HDDs. There is 500 TB of raw capacity in the cluster.
Claus creates a 3-way mirror volume of 1 TB (across 12 servers). The footprint is 3 TB of raw capacity. 33% efficiency. We can lose 2 systems/drives
He then creates a parity volume of 1 TB (across 12 servers). The The footprint is 1.4 TB of raw capacity. 73% efficiency. We can lose 2 systems/drives
3 more volumes are created, with different mixtures of 3-way mirroring and erasure coding.
The 500 GB mirror + 500 dual parity virtual disk has 46% efficiency with a 2.1 TB footprint.
The 300 GB mirror + 700 dual parity virtual disk has 54% efficiency with a 1.8 TB footprint.
The 100 GB mirror + 900 dual parity virtual disk has 65% efficiency with 1.5 TB footprint.

Microsoft is recommending that 10-20% of the usable capacity in “hybrid volumes” should be 3-way mirror.

If you went with the 100/900 balance for a light write workload in a hybrid volume, then you’ll get the same performance as a 1 TB 3-way mirror volume, but by using half of the raw capacity (1.5 TB instead of 3 TB).

CPU Efficiency

S2D is embedded in the kernal. It’s deep down low in kernel mode, so it’s efficient (fewer context switches to/from user mode). A requirement for this efficiency is using Remote Direct Memory Access (RDMA) which gives us the ultra-efficient SMB Direct.

There’s lots of replication traffic going on between the nodes (east-west traffic).

RDMA means that:

We use less CPU when doing reads/write
But we also can increase the amount of read/write IOPS because we have more CPU available
The balance is that we have more CPU for VM workloads in a HCI deployment

Customer Case Study

I normally hate customer case studies in these sessions because they’re usually an advert. But this quick presentation by Ben Thomas of Datacom was informative about real world experience and numbers.

They switched from using SANs to using 4-node S2D clusters with 120 TB usable storage – a mix of flash/SATA storage. Expansion was easy compared to compute + SAN > just buy a server and add it to the cluster. Their network was all Ethernet (even the really fast 100 Gbps Mellanox stuff is Ethernet-based) so they didn’t need fibre networks for SAN anymore. Storage deployment was easy. In SAN there’s create the LUN, zone it, etc. In S2D, 1 cmdlet creates a virtual disk with the required resilience/tiering, formats it, and it appears as a replicated CSV across all the nodes.

Their storage ended up costing them $0.04 / GB or $4 / 1000 IOPS. The IOPS was guaranteed using Storage QoS.

Manageability

Cosmos is back.

You can use PowerShell and FCM, but mid-large customers should use System Center 2016. SCVMM 2016 can deploy your S2D cluster on bare metal.

Note: I’m normally quite critical of SCVMM, but I’ve really liked how SCVMM simplified Hyper-V storage in the past.

If you’re doing a S2D deployment, do a Hyper-V deployment and check a single box to enable S2D and that’s it, you get a HCI cluster instead of a compute cluster that requires storage from elsewhere. Simple!

SCOM provides the monitoring. They have a big dashboard to visualize alerts and usage of your S2D cluster.

Where is all that SCOM data coming from? You can get this raw data yourself if you don’t have System Center.

Health Service

New in WS2016. S2D has a health service built into the OS. This is the service that feeds info to the SCOM agents. It has:

Always-on monitoring
Alerting with severity, description, and call to action (recommendation)
Root-cause analysis to reduce alert noise
Monitoring software and hardware from SLA down to the drive (including enclosure location awareness)

We actually saw the health service information in an earlier demo when a drive was pulled from an S2D cluster.

It’s not just health. There are also performance, utilization, and capacity metrics. All this is built into the OS too, and accessible via PowerShell or API: Cluster* | Get-StorageHealthReport

DataON MUST

Cosmos shows a new tool from DataON, a manufacturer of Storage Spaces and Storage Spaces Direct (S2D) hardware.

If you are a reseller in the EU, then you can purchase DataON hardware from my employer, MicroWarehouse (www.mwh.ie) to resell to your customers.

DataON has made a new tool called MUST for management and monitoring of Storage Spaces and S2D.

Cosmos logs into a cloud app, must.dataonstorage.com. It has a nice bright colourful and informative dashboard with details of the DataON hardware cluster. The data is live and updating in the console, including animated performance graphs.

There is an alert for a server being offline. He browses to Nodes. You can see healthy node with all it’s networking, drives, CPUs, RAM, etc.

He browses to the dead machine – and it’s clearly down.

Two things that Cosmos highlights:

It’s a browser-based HTML5 experience. You can access this tool from any kind of device.
DataON showed a prototype to Cosmos – a “call home” feature. You can opt in to get a notification sent to DataON of a h/w failure, and DataON will automatically have a spare part shipped out from a relatively local warehouse.

The latter is the sort of thing you can subscribe to get for high-end SANs, and very nice to see in commodity h/w storage. That’s a really nice support feature from DataON.

Cost

So, controversy first, you need WS2016 Datacenter Edition to run S2D. You cannot do this with Standard Edition. Sorry small biz that was considering this with a 2 node cluster for a small number of VMs – you’ll have to stick with a cluster in a box.

Me: And the h/w is rack servers with RDMA networking – you’ll be surprised how affordable the half-U 100 GbE switches from Mellanox are – each port breaks out to multiple cables if you want. Mellanox price up very nicely against Cisco/HPE/Dell/etc, and you’ll easily cover the cost with your SAN savings.

Hardware

Microsoft has worked with a number of server vendors to get validated S2D systems in the market. DataON will have a few systems, including an all-NVME one and this 2U model with 24 x 2.5” disks:

You can do S2D on any hardware with the pieces, Microsoft really wants you to use the right, validated and tested, hardware. you know, you can put a loaded gun to your head, release the safety, and pull the trigger, but you probably shouldn’t. Stick to the advice, and use especially engineered & tested hardware.

Project Kepler-47

One more “fun share” by Claus.

2-nodes are now supported by S2D, but Microsoft wondered “how low can we go?”. Kepler-47 is a proof-of-concept, not a shipping system.

These are the pieces. Note that the motherboard is mini-ITX; the key thing was that it had a lot of SATA connectors for drive connectivity. The installed Windows on a USB3 DOM. 32 GB RAM/node. There are 2 SATA SSDs for caching and 6 HDDs for capacity in each node.

There are two nodes in the cluster.

It’s still server + drive fault tolerant. They use either a file share witness or a cloud witness for quorum. It has 20 TB of usable mirrored capacity. Great concept for remote/branch office scenario..

Both nodes are 1 cubic foot, 45% smaller than 2U of rack space. In other words, you can fit this cluster into one carry-on bag in an airplane! Total hardware cost (retail, online), excluding drives, was $2,190.

The system has no HBA, no SAS expander, and no NIC, switch or Ethernet! They used Thunderbolt networking to get 20 Gbps of bandwidth between the 2 servers (using a PoC driver from Intel).

Summary

My interpretation:

Sooooo:

Faster than SAN
Cheaper than SAN
Probably better fault tolerance than SAN thanks to fault domains
And the same level of h/w support as high end SANs with a support subscription, via hardware from DataON

Why are you buying SAN for Hyper-V?

Technorati Tags: Event Notes,Storage Spaces,Storage,Windows Server 2016,Failover Clustering,Hardware,Hyper-V,Virtualisation

Ignite 2016 – Discover What’s New In Windows Server 2016 Virtualization

This post is a collection of my notes from the Ben Armstrong’s (Principal Program Manager Lead in Hyper-V) session (original here) on the features of WS2016 Hyper-V. The session is an overview of the features that are new, why they’re there, and what they do. There’s no deep-dives.

A Summary of New Features

Here is a summary of what was introduced in the last 2 versions of Hyper-V. A lot of this stuff still cannot be found in vSphere.

And we can compare that with what’s new in WS2016 Hyper-V (in blue at the bottom). There’s as much new stuff in this 1 release as there were in the last 2!

Security

The first area that Ben will cover is security. The number of attack vectors is up, attacks are on the rise, and the sophistication of those attacks is increasing. Microsoft wants Windows Server to be the best platform. Cloud is a big deal for customers – some are worried about industry and government regulations preventing adoption of the cloud. Microsoft wants to fix that with WS2016.

Shielded Virtual Machines

Two basic concepts:

A VM can only run on a trusted & healthy host – a rogue admin/attacker cannot start the VM elsewhere. A highly secured Host Guardian Service must authorize the hosts.
A VM is encrypted by the customer/tenant using BitLocker – a rogue admin/attacker/government agency cannot inspect the VM’s contents by mounting the disk(s).

There are levels of shielding, so it’s not an all or nothing.

Key Storage Drive for Generation 1 VMs

Shielding, as above, required Generation 2 VMs. You can also offer some security for Generation 1 virtual machines: Key Storage Drive. Not as secure as shielded virtual machines or virtual TPM, but it does give us a safe way to use BitLocker inside a Generation 1 virtual machine – required for older applications that depend on older operating systems (older OSs cannot be used in Generation 2 virtual machines).

Virtual Secure Mode (VSM)

We also have Guest Virtual Secure Mode:

Credential Guard: protecting ID against pass-the-hash by hiding LSASS in a secured VM (called VSM) … in a VM with a Windows 10 or Windows Server 2016 guest OS! Malware running with admin rights cannot steal your credentials in a VM.
Device Guard: Protect the critical kernel parts of the guest OS against rogue s/w, again, by hiding them in a VSM in a Windows 10 or Windows Server 2016 guest OS.

Secure Boot for Linux Guests

Secure boot was already there for Windows in Generation 2 virtual machines. It’s now there for Linux guest OSs, protecting the boot loader and kernel against root kits.

Host Resource Protection (HRP)

Ben hopes you never see this next feature in action in the field This is because Host Resource Protection is there to protect hosts/VMs from a DOS attack against a host by someone inside a VM. The scenario: you have an online application running in a VM. An attacker compromises the application (example: SQL injection) and gets into the guest OS of the VM. They’re isolated from other VMs by the hypervisor and hardware/DEP, so they attack the host using DOS, and consume resources.

A new feature, from Azure, called HRP will determine that the VM is aggressively using resources using certain patterns, and start to starve it of resources, thus slowing down the DOS attack to the point of being pointless. This feature will be of particular interest to:

Companies hosting external facing services on Hyper-V/Windows Azure Pack/Azure Stack
Hosting companies using Hyper-V/Windows Azure Pack/Azure Stack

This is another great example of on-prem customers getting the benefits of Azure, even if they don’t use Azure. Microsoft developed this solution to protect against the many unsuccessful DOS attacks from Azure VMs, and we get it for free for our on-prem or hosted Hyper-V hosts. If you see this happening, the status of the VM will switch to Host Resource Protection.

Security Demos

Ben starts with virtual TPM. The Windows 10 VM has a virtual TPM enabled and we see that the C: drive is encrypted. He shuts down the VM to show us the TPM settings of the VM. We can optionally encrypt the state and live migration traffic of the VM – that means a VM is encrypted at rest and in transit. There is a “performance impact” for this optional protection, which is why it’s not on by default. Ben also enables shielding – and he loses console access to the VM – the only way to connect to the machine is to remote desktop/SSH to it.

Note: if he was running the full host guardian service (HGS) infrastructure then he would have had no control over shielding as a normal admin – only the HGS admins would have had control. And even the HGS admins have no control over BitLocker.

He switches to a Generation 1 virtual machine with Key Storage Drive enabled. BitLocker is running. In the VM settings (Generation 1) we see Security > Key Storage Drive Enabled. Under the hood, an extra virtual hard disk is attached to the VM (not visible in the normal storage controller settings, but visible in Disk Management in the guest OS). It’s a small 41 MB NTFS volume. The BitLocker keys are stored there instead of a TPM – virtual TPM is only in Generation 2, but it’s using the same sorts of tech/encryption/methods to secure the contents in the Key Storage Drive, but it cannot be as secure as virtual TPM, but it is better than not having BitLocker. Microsoft can make the same promises with data at rest encryption for Generation 1 VMs, but it’s still not as good as a Generation 2 VM with vTPM or even a shielded VM (requires Generation 2).

Availability

The next section is all about keeping services up and running in Hyper-V, whether it’s caused by upgrades or infrastructure issues. Everyone has outages and Microsoft wants to reduce the impact of these. Microsoft studied the common causes, and started to tackle them in WS2016

Cluster OS Rolling Upgrades

Microsoft is planning 2-3 updates per year for Nano Server, plus there’ll be other OS upgrades in the future. You cannot upgrade a cluster node. And in the past we could only do cluster-cluster migrations to adopt new versions of Windows Server/Hyper-V. Now, we can:

Remove cluster node 1
Rebuild cluster node 1 with the new version of Windows Server/Hyper-V
Add cluster node 1 to the old cluster – the cluster runs happily in mixed-mode for a short period of time (weeks), with failover and Live Migration between the old/new OS versions.
Repeat steps 1-3 until all nodes are up to date
Upgrade the cluster functional level – Update-ClusterFunctionalLevel (see below for “Emulex incident”)
Upgrade the VMs’ version level

Zero VM downtime, zero new hardware – 2 node cluster, all the way to a 64 node cluster.

If you have System Center:

Upgrade to SCVMM 2016.
Let it orchestrate the cluster upgrade (above)

Supports starts with WS2012 R2 to WS2016. Re-read that statement: there is no support for W2008/W2008 R2/WS2012. Re-read that last statement. No need for any questions now

To avoid an “Emulex incident” (you upgrade your hosts – and a driver/firmware fails even though it is certified, and the vendor is going to take 9 months to fix the issue) then you can actually:

Do the node upgrades.
Delay the upgrade to the cluster functional level for a week or two
Test your hosts/cluster for driver/firmware stability
Rollback the cluster nodes to the older OS if there is an issue –> only possible if the cluster functional level is on the older version.

And there’s no downtime because it’s all leveraging Live Migration.

Virtual Machine Upgrades

This was done automatically when you moved a VM from version X to version X+1. Now you control it (for the above to work). Version 8 is WS2016 host support.

Microsoft identified two top causes of outages in customer environments:

Brief storage “outages” – crashing the guest OS of a VM when an IO failed. In WS2016, when an IO fails, the VM is put in a paused-critical state (for up to 24 hours, by default). The VM will resume as soon as the storage resumes.
Transient network errors – clustered hosts being isolated causing unnecessary VM failover (reboot), even if the VM was still on the network. A very common 30 seconds network outage will cause a Hyper-V cluster to panic up to and including WS2012 R2 – attempted failovers on every node and/or quorum craziness! That’s fixed in WS2016 – the VMs will stay on the host (in an unmonitored state) if they are still networked (see network protection from WS2012 R2). Clustering will wait (by default) for 4 minutes before doing a failover of that VM. If a host glitches 3 times in an hour it will be automatically quarantined, after resuming from the 3rd glitch, (VMs are then live migrated to other nodes) for 2 hours, allowing operator inspection.

Guest Clustering with Shared VHDX

Version 1 of this in WS2012 R2 was limited – supported guest clusters but we couldn’t do Live Migration, replication, or backup of the VMs/shared VHDX files. Nice idea, but it couldn’t really be used in production (it was supported, but functionally incomplete) instead of virtual fibre channel or guest iSCSI.

WS2016 has a new abstracted form of Shared VHDX – it’s even a new file format. It supports:

Backup of the VMs at the host level
Online resizing
Hyper-V Replica (which should lead to ASR support) – if the workload is important enough to cluster, then it’s important enough to replicate for DR!

One feature that does not work (yet) is Storage Live Migration. Checkpoint can be done “if you know what you are doing” – be careful!!!

Replica Support for Hot-Add VHDX

We could hot-add a VHDX file to a VM, but we could not add that to replication if the VM was already being replicated. We had to re-replicate the VM! That changes in WS2016, thanks to the concept of replica sets. A new VHDX is added to a “not-replicated” set and we can move it to the replicated set for that VM.

Hot-Add Remove VM Components

We can hot-add and hot-remove vNICs to/from running VMs. Generation 2 VMs only, with any supported Windows or Linux guest OS.

We can also hot-add or hot-remove RAM to/from a VM, assuming:

There is free RAM on the host to add to the VM
There is unused RAM in the VM to remove from the VM

This is great for those VMs that cannot use Dynamic Memory:

No support by the workload
A large RAM VM that will benefit from guest-aware NUMA

A nice GUI side-effect is that guest OS memory demand is now reported in Hyper-V Manager for all VMs.

Production Checkpoints

Referring to what used to be called (Hyper-V) snapshots, but were renamed to checkpoints to stop dumb people from getting confused with SAN and VSS snapshots – yes, people really are that stupid – I’ve met them.

Checkpoints (what are now called Standard Checkpoints) were not supported by many applications in a guest OS because they lead to application inconsistency. WS2016 adds a new default checkpoint type called a Production Checkpoint. This basically uses backup technology (and IT IS STILL NOT A BACKUP!) to create an application consistent checkpoint of a VM. If you apply (restore) the checkpoint the VM:

The VM will not boot up automatically
The VM will boot up as if it was restoring from a backup (hey dumbass, checkpoints are STILL NOT A BACKUP!)

For the stupid people, if you want to backup VMs, use a backup product. Altaro goes from free to quite affordable. Veeam is excellent. And Azure Backup Server gives you OPEX based local backup plus cloud storage for the price of just the cloud component. And there are many other BACKUP solutions for Hyper-V.

Now with production checkpoints, MSFT is OK with you using checkpoints with production workloads …. BUT NOT FOR BACKUP!

Demos

Ben does some demos of the above. His demo rig is based on nested virtualization. He comments that:

The impact of CPU/RAM is negligible
There is around a 25% impact on storage IO

Storage

The foundation of virtualization/cloud that makes or breaks a deployment.

Storage Quality of Service (QOS)

We had a basic system in WS2012 R2:

Set max IOPS rules per VM
Set min IOPS alerts per VM that were damned hard to get info from (WMI)

And virtually no-one used the system. Now we get storage QoS that’s trickled down from Azure.

In WS2016:

We can set reserves (that are applied) and limits on IOPS
Available for Scale-Out File Server and block storage (via CSV)
Metrics rules for VHD, VM, host, volume
Rules for VHD, VM, service, or tenant
Distributed rule application – fair usage, managed at storage level (applied in partnership by the host)
PoSH management in WS2016, and SCVMM/SCOM GUI

You can do single-instance or multi-instance policies:

Single-instance: IOPS are shared by a set of VMs, e.g. a service or a cluster, or this department only gets 20,000 IOPS.
Multi-instance: the same rule is applied to a group of VMs, the same rule for a large set of VMs, e.g. Azure guarantees at least X IOPS to each Standard storage VHD.

Discrete Device Assignment – NVME Storage

DDA allows a virtual machine to connect directly to a device. An example is a VM connects directly to extremely fast NVME flash storage.

Note: we lose Live Migration and checkpoints when we use DDA with a VM.

Evolving Hyper-V Backup

Lots of work done here. WS2016 has it’s only block change tracking (Resilient Change Tracking) so we don’t need a buggy 3rd party filter driver running in the kernel of the host to do incremental backups of Hyper-V VMs. This should speed up the support of new Hyper-V versions by the backup vendors (except for you-know-who-yellow-box-backup-to-tape-vendor-X, obviously!).

Large clusters had scalability problems with backup. VSS dependencies have been lessened to allow reliable backups of 64 node clusters.

Microsoft has also removed the need for hardware VSS snapshots (a big source of bugs), but you can still make use of hardware features that a SAN can offer.

ReFS Accelerated VHDX Operations

Re-FS is the preferred file system for storing VMs in WS2016. ReFS works using metadata which links to data blocks. This abstraction allows very fast operations:

Fixed VHD/X creation (seconds instead of hours)
Dynamic VHD/X expansion
Checkpoint merge, which impacts VM backup

Note, you’ll have to reformat WS2012 R2 ReFS to get the new version of ReFS.

Graphics

A lot of people use Hyper-V (directly or in Azure) for RDS/Citrix.

RemoteFX Improvements

The AVC444 thing is a lossless codec – lossless 3D rendering, apparently … that’s gobbledegook to me.

DDA Features and GPU Capabilities

We can also use DDA to connect VMs directly to CPUs … this is what the Azure N-Series VMs are doing with high-end NVIDIA GFX cards.

DirectX, OpenGL, OpenCL, CUDA
Guest OS: Server 2012 R2, Server 2016, Windows 10, Linux

The h/w requirements are very specific and detailed. For example, I have a laptop that I can do RemoteFX with, but I cannot use for DDA (SRIOV not supported on my machine).

Headless Virtual Machine

A VM can be booted without display devices. Reduces the memory footprint, and simulates a headless server.

Operational Efficiency

Once again, Microsoft is improving the administration experience.

PowerShell Direct

You can now to remote PowerShell into a VM via the VMbus on the host – this means you do not need any network access or domain join. You can do either:

Enter-PSSession for an interactive session
Invoke-Command for a once-off instruction

Supports:

Host: Windows 10/WS2016
Guest: Windows 10/WS2016

You do need credentials for the guest OS, and you need to do it via the host, so it is secure.

This is one of Ben’s favourite WS2016 features – I know he uses it a lot to build demo rigs and during demos. I love it too for the same reasons.

PowerShell Direct – JEA and Sessions

The following are extensions of PowerShell Direct and PowerShell remoting:

Just Enough Administration (JEA): An admin has no rights with their normal account to a remote server. They use a JEA config when connecting to the server that grants them just enough rights to do their work. Their elevated rights are limited to that machine via a temporary user that is deleted when their session ends. Really limits what malware/attacker can target.
Justin-Time Administration (JITA): An admin can request rights for a short amount of time from MIM. They must enter a justification, and company can enforce management approval in the process.

vNIC Identification

Name the vNICs and make that name visible in the guest OS. Really useful for VMs with more than 1 vNIC because Hyper-V does not have consistent device naming.

Hyper-V Manager Improvements

Yes, it’s the same MMC-based Hyper-V Manager that we got in W2008, but with more bells and whistles.

Support for alternative credentials
Connect to a host IP address
Connect via WinRM
Support for high-DPI monitors
Manage WS2012, WS2012 R2 and WS2016 from one HVM – HVM in Win10 Anniversary Update (The big Redstone 1 update in Summer 2016) has this functionality.

VM Servicing

MS found that the vast majority of customers never updated the Integration services/components (ICs) in the guest OS of VMs. It was a horrible manual process – or one that was painful to automate. So customers ran with older/buggy versions of ICs, and VMs often lacked features that the host supported!

ICs are updated in the guest OS via Windows Update on WS2016. Problem sorted, assuming proper testing and correct packaging!

MSFT plans to release IC updates via Windows Update to WS2012 R2 in a month, preparing those VMs for migration to WS2016. Nice!

Core Platform

Ben was running out of time here!

Delivering the Best Hyper-V Host Ever

This was the Nano Server push. Honestly – I’m not sold. Too difficult to troubleshoot and a nightmare to deploy without SCVMM.

I do use Nano in the lab. Later, Ben does a demo. I’d not seen VM status in the Nano console before, which Ben shows – the only time I’ve used the console is to verify network settings that I set remotely using PoSH There is also an ability to delete a virtual switch on the console.

Nested Virtualization

Yay! Ben admits that nested virtualization was done for Hyper-V Containers on Azure, but we people requiring labs or training environments can now run multiple working hosts & clusters on a single machine!

VM Configuration File

Short story: it’s binary instead of XML, improving performance on dense hosts. Two files:

.VMCX: Configuration
.VMRS: Run state

Power Management

Client Hyper-V was impacted badly by Windows 8 era power management features like Connected Standby. That included Surface devices. That’s sorted now.

Development Stuff

This looks like a seed for the future (and I like the idea of what it might lead to, and I won’t say what that might be!). There is now a single WMI (Root\HyperVCluster\v2) view of the entire Hyper-V cluster – you see a cluster as one big Hyper-V server. It really doesn’t do much now.

And there’s also something new called Hyper-V sockets for Microsoft partners to develop on. An extension of the Windows Socket API for “fast, efficient communication between the host and the guest”.

Scale Limits

The numbers are “Top Gear stats” but, according to a session earlier in the week, these are driven by Azure (Hyper-V’s biggest customer). Ben says that the numbers are nuts and we normals won’t ever have this hardware, but Azure came to Hyper-V and asked for bigger numbers for “massive scale”. Apparently some customers want massive super computer scale “for a few months” and Azure wants to give them an OPEX offering so those customers don’t need to buy that h/w.

Note Ben highlights a typo in max RAM per VM: it should say 12 TB max for a VM … what’s 4 TB between friends?!?!

Ben wraps up with a few demos.

Webinar: What’s New in WS2016 Failover Clustering & Storage

We’ve only just wrapped up our webinar on Windows Server 2016 Hyper-V, and we’ve just announced our next one, this time focusing on high availability and storage in Windows Server 2016. This is a huge release for these areas and there’s lots to cover … and show in demos (I am definitely mad to do live demos with a technical preview release).

The webinar is at 14:00 UK/Ireland, 15:00 CET, and 09:00 EST (I think) on September 22nd. You can register here.

In case you’re wondering why I’m taking a 1 month break from webinars … I need some time to update my Azure IaaS training course materials. We’ll share more about that on learn.mwh.ie when we’re ready.

Technorati Tags: Windows Server 2016,Hyper-V,Failover Clustering,Storage,Storage Spaces

Webinar Today: Reducing Costs By Switching From VMware to Hyper-V on DataON Cluster-in-a-Box

I’m presenting in a webinar by DataON Storage later today at 6PM UK/Irish time, 7PM Central Europe, and 1 PM Eastern. The focus is on how small-medium businesses can switch to an all-Microsoft server stack on DataON hardware and greatly reduce costs, while simplifying the deployment and increasing performance.

There are a number of speakers, including me, DataON, a customer that made that jump recently, and HGST (manufacturer of enterprise class flash storage).

You can register here.

Technorati Tags: Events,Hyper-V,Failover Clustering,Storage,Storage Spaces,Hardware

Webinar Recording – Clustering for the Small/Medium Enterprise & Branch Office

I recently did another webinar for work, this time focusing on how to deploy an affordable Hyper-V cluster in a small-medium business or a remote/branch office. The solution is based on Cluster-in-a-Box hardware and Windows Server 2012 R2 Hyper-V and Storage Spaces. Yes, it reduces costs, but it also simplifies the solution, speeds up deployment times, and improves performance. Sounds like a win-win-win-win offering!

We have shared the recording of the webinar on the MicroWarehouse site, and that page also includes the slides and some additional reading & viewing.

The next webinar has been scheduled; On August 25th at 2PM UK/Irish time (there is a calendar link on the page) I will be doing a session on what’s new in WS2016 Hyper-V, and I’ll be doing some live demos. Join us even if you don’t want to learn anything about Windows Server 2016 Hyper-V, because it’s live demos using a Technical Preview build … it’s bound to all blow up in my face.

Technorati Tags: Hyper-V,Windows Server 2012 R2,Virtualisation,Failover Clustering,Hardware,Storage,Storage Spaces

KB3172614 To Replace/Fix Hyper-V Installations Broken By KB3161606

Microsoft released a new update rollup to replace the very broken and costly (our time = our money) June rollup, KB3161606. These issues affected Hyper-V on Windows 8.1 and Windows Server 2012 R2 (WS2012 R2).

It’s sad that I have to write this post, but, unfortunately, untested updates are still being released by Microsoft. This is why I advise that updates are delayed by 2 months.

In the case of the issues in the June 2016 update rollup, the fixes are going to require human effort … customers’ human effort … and that means customers are paying for issues caused by a supplier. I’ll let you judge what you think of that (feel free to comment below).

A month after news of the issues in the update became known (the update rollup was already in the wild for a week or two), Microsoft has issued a superseding update that will fix the issues. At the same time, they finally publicly acknowledge the issues in the June update:

So it took 1.5 months, from the initial release, for Microsoft to get this update right. That’s why I advise a 2 month delay on approving/deploying updates, and I continue to do so.

What Microsoft needs to fix?

Change the way updates are created/packaged. This problem has been going on for years. Support are not good at this stuff, and it needs to move into the product groups.
Microsoft has successfully reacted to market pressure by making a special emphasis to change, e.g. The Internet, secure coding, The Cloud. Satya Nadella needs to do the same for quality assurance (QA), something that I learned in software engineering classes was as important as the code. I get that edge scenarios are hard to test, but installing/upgrading ICs in a Hyper-V guest OS is hardly a rare situation.
Start communicating. Put your hands up publicly, and say “mea culpa”, show what went wrong and follow it up with progress reports on the fix.

Technorati Tags: Windows 8.1,Windows Server 2012 R2,Windows Updates,Hyper-V,Virtualisation,Storage,Failover Clustering