TEE14 – Software Defined Storage in Windows Server vNext

Speaker: Siddhartha Roy

Software-Defined Storage gives you choice. It’s a breadth offering and unified platform for MSFT workloads and public cloud scale. Economical storage for private/public cloud customers.

About 15-20% of the room has used Storage Spaces/SOFS.

What is SDS? Cloud scale storage and cost economics on standard, volume hardware. Based on what Azure does.

Where are MSFT in the SDS Journey Today?

In WS2012 we got Storage Spaces as a cluster supported storage system. No tiering. We could build a SOFS using cluster supported storage, and present that to Hyper-V hosts via SMB 3.0.

  • Storage Spaces: Storage based on economical JBOD h/w
  • SOFS: Transparent failover, continuously available application storage platform.
  • SMB 3.0 fabric: high speed, and low latency can be added with RDMA NICs.

What’s New in Preview Release

  • Greater efficiency
  • More uptime
  • Lower costs
  • Reliability at scale
  • Faster time to value: get customers to adopt the tech

Storage QoS

Take control of the service and offer customers different bands of service.


Enabled by default on the SOFS. 2 metrics used: latency and IOPS. You can define policies around IOPS by using min and max. Can be flexible: on VHD level, VM level, or tenant/service level.

It is managed by System Center and PoSH. You have an aggregated end-end view from host to storage.

Patrick Lang comes on to do a demo. There is a file server cluster with 3 nodes. The SOFS role is running on this cluster. There is a regular SMB 3.0 file share. A host has 5 VMs running on it, stored on the share. One OLTP VM is consuming 8-10K IOPS using IOMETER. Now he uses PoSH to query the SOFS metrics. He creates a new policy with min 100 and max 200 for a bunch of the VMs. The OLTP workload gets a policy with min of 3000 and max of 5000. Now we see its IOPS drop down from 8-10K. He fires up VMs on another host – not clustered – the only commonality is the SOFS. These new VMs can take IOPS. A rogue one takes 2500 IOPS. All of the other VMs still get at least their min IOPS.

Note: when you look at queried data, you are seeing an average for the last 5 minutes. See Patrick Lang’s session for more details.

Rolling Upgrades – Faster Time to Value

Cluster upgrades were a pain. They get much easier in vNext. Take a node offline. Rebuild it in the existing cluster. Add it back in, and the cluster stays in mixed mode for a short time. Complete the upgrades within the cluster, and then disable mixed mode to get new functionality. The “big red switch” is a PoSH cmdlet to increase the cluster functional level.


Cloud Witness

A third site witness for multi-site cluster, using a service in Azure.


Compute Resiliency

Stops the cluster from being over aggressive with transient glitches.


Related to this is quarantine of flapping nodes. If a node is in and out of isolation too much, it is “removed” from the cluster. The default quarantine is 2 hours – give the admin a chance to diagnose the issue. VMs are drained from a quarantined node.

Storage Replica

A hardware agnostic synchronous replication system. You can stretch a cluster with low latency network. You get all the bits in the box to replicate storage. It uses SMB 3.0 as a transport. Can use metro-RDMA to offload and get low latency. Can add SMB encryption. Block-level synchronous requires <5MS latency. There is also an asynchronous connection for higher latency links.


The differences between synch and asynch:


Ned Pyle, a storage PM, comes on to demo Storage Replica. He’ll do cluster-cluster replication here, but you can also do server-server replication.

There is a single file server role on a cluster. There are 4 nodes in the cluster. There is assymetric clustered storage. IE half the storage on 2 nodes and the other half on the other 2 nodes. He’s using iSCSI storage in this demo. It just needs to be cluster supported storage. He right-clicks on a volume and selects Replication > Enable Replication … a wizard pops up. He picked a source disk. Clustering doesn’t do volumes … it does disks. If you do server-server repliction then you can replicate a volume. Picks a source replication log disk. You need to use a GPT disk with a file system. Picks a destination disk to replicate to, and a destination log disk. You can pre-seed the first copy of data (transport a disk, restore from backup, etc). And that’s it.

Now he wants to show a failover. Right now, the UI is buggy and doesn’t show a completed copy. Check the event logs. He copies files to the volume in the source site. Then moves the volume to the DR site. Now the replicated D: drive appears (it was offline) and all the files are there in the DR site ready to be used.

After the Preview?

Storage Spaces Shared Nothing – Low Cost

This is a no-storage-tier converged storage cluster. You create storage spaces using internal storage in each of your nodes. To add capacity you add nodes.

You get rid of the SAS layer and you can use SATA drives. The cost of SSD plummets with this system.


You can grow pools to hundreds of disks. A scenario is for primary IaaS workloads and for storage for backup/replication targets.

There is a prescriptive hardware configuration. This is not for any server from any shop. Two reasons:

  • Lots of components involved. There’s a lot of room for performance issues and failure. This will be delivered by MSFT hardware partners.
  • They do not converge the Hyper-V and storage clusters in the diagram (above). They don’t recommend convergence because the rates of scale in compute and storage are very different. Only converge in very small workloads. I have already blogged this on Petri with regards to converged storage – I don’t like the concept – going to lead to a lot of costly waste.

VM Storage Resiliency

A more graceful way of handling a storage path outage for VMs. Don’t crash the VM because of a temporary issue.


CPS – But no … he’s using this as a design example that we can implement using h/w from other sources (soft focus on the image).


Not talked about but in Q&A: They are doing a lot of testing on dedupe. First use case will be on backup targets. And secondary: VDI.

Data consistency is done by a Storage Bus Layer in the shared notching Storage Spaces system. It slips into Storage Spaces and it’s used to replicate data across the SATA fabric and expands its functionality. MSFT thinking about supporting 12 nodes, but architecturally, this feature has no limit in the number of nodes.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.