The Virtualisation Smackdown – Hyper-V VHDX Scales Out to 64 TB – Yes, I said 64 Terabytes!

I was gobsmacked when I learned this week that the new Windows 8/Windows Server 2012 (WS2012) format for virtual disks, VHDX, would have a maximum size of 64 TB.  64 TB!  Damn, I was impressed with the Build announcement that it would go out to 16 TB.  Even then, it was dwarfing the paltry 2040 GB that vSphere 5.0 VMDK can do.  Wow, Hyper-V has vSphere smacked down on storage scalability; isn’t that a shocker!?!?!

Back to the serious side of things … what does this mean?  One of the big reasons that people have implemented virtualisation (28.19% – Great Big Hyper-V Survey of 2011) was flexibility.  What makes that possible is that virtual machines are normally just files, not bound to the hardware they reside on, unlike legacy physical OS installations and data storage.  A limiting factor on that has been the scalability of virtual disks.  Both VHD (pre-Windows 8 Hyper-V) and VMDK (all current versions of vSphere) are limited to 2040 GB.  The alternative is Raw Device Mapping (vSphere) or Passthrough disk (Hyper-V).

I hate this type of storage.  Because it’s just a raw LUN presented to a VM, it binds the VM to the hardware – a boundary that limits mobility and flexibility, and precludes things we could otherwise do, such as Hyper-V Replica, snapshots, VSS backups of running VMs, etc.  Way too often I see people using Passthrough disks for “performance” reasons (usually with no assessment done and based on pure guesswork) without realising that even VHD has great performance (and I cannot wait for VHDX performance results to be published publicly).  The only real reason to use a Passthrough disk, in my opinion, has been to scale a VM’s LUN beyond 2040 GB.

That changes with Windows Server 2012 Hyper-V.  I am thinking that Passthrough disk will become one of those things that is theoretical to 99.999999% of us.  It’ll be that exam question that no one can answer because they never do it.  Think about it: a 64 TB virtual disk that performs nearly as well as the physical disk it sits on.  Wow!

Question: So which virtualisation platform isn’t scalable or enterprise ready? *sniggers* I cannot wait to see the excuses that the competition come up with next.

There are other benefits to the VHDX format:

  • “Larger block sizes for dynamic and differencing disks, which allows these disks to attune to the needs of the workload.
  • A 4-KB logical sector virtual disk that allows for increased performance when used by applications and workloads that are designed for 4-KB sectors.
  • The ability to store custom metadata about the file that the user might want to record, such as operating system version or patches applied.
  • Efficiency in representing data (also known as “trim”), which results in smaller file size and allows the underlying physical storage device to reclaim unused space. (Trim requires physical disks directly attached to a virtual machine or SCSI disks, and trim-compatible hardware.)”
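To make the headline number concrete, here is a minimal sketch of creating one of these large, 4-KB-sector VHDX files with the WS2012 Hyper-V PowerShell module.  The path and sizes are my own examples, not from any Microsoft material:

```powershell
# Sketch only: create a dynamic (thin) VHDX at the new 64 TB maximum,
# with 4-KB logical sectors for 4-KB-aware workloads.
# The path is hypothetical.
New-VHD -Path "D:\VMs\BigData.vhdx" `
        -SizeBytes 64TB `
        -Dynamic `
        -LogicalSectorSizeBytes 4096
```

Because the disk is dynamic, the file starts small and grows as the guest writes data, so a 64 TB maximum does not mean 64 TB of physical space up front.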

There are other things I’d love to share about VHDX, but I’m not sure of their NDA status at the moment, so I’ll be sitting on those facts until later.  Being an MVP ain’t easy :)

Platform Storage Evolved

“Windows 8 is the most cost effective HA storage solution”

  • Storage Spaces: virtualised storage
  • Offloaded data transfer (ODX)
  • Data deduplication

File System Availability

Confidently deploy 64 TB NTFS volumes with Windows 8 with Online scan and repair:

  • Online repair
  • Online scan and corruption logging
  • Scheduled repair
  • Downtime proportional only to number of logged corruptions: scans don’t mean downtime now
  • Failover clustering & CSV integration
  • Better manageability via Action Center, PowerShell and Server Manager

Note: this means bigger volumes aren’t the big maintenance downtime problem they might have been for Hyper-V clusters. 
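The scan/repair split above maps onto the Repair-Volume cmdlet in the WS2012 storage module.  A sketch of how I’d expect it to be driven (the drive letter is my own example):

```powershell
# Online scan: detect and log corruptions while the volume stays mounted.
Repair-Volume -DriveLetter D -Scan

# Spot fix: briefly dismount and repair only the logged corruptions,
# so downtime is proportional to the number of corruptions, not volume size.
Repair-Volume -DriveLetter D -SpotFix
```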

Operational Simplicity

Extensible storage management API:

  • WMI programmatic interfaces
  • PSH for remote access and scripting – easy E2E provisioning
  • All new in-box applications use one new API
  • Foundational infrastructure for reducing operations expenditure

Multi-vendor interoperability – common interface for IHVs

  • SMI-S standards conformant: proxy service enables broad interoperability with existing SMI-S storage h/w – standards based approach … wonder if the storage manufacturers know that :)
  • Storage Management Provider interface enables host-based extensibility

Basically everything uses one storage management interface to access vendor arrays, SMI-S compliant arrays, and Storage Spaces compatible JBOD.  The Windows 8 admin tools use this single API via WMI and PowerShell.

We are shown a 6 line PSH script to create a disk pool, create a virtual disk, configure the virtual disk, mount it on the server, and format it with NTFS.
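I don’t have the presenters’ actual script, but a plausible reconstruction of that end-to-end provisioning using the WS2012 storage cmdlets would look something like this (pool, space, and size names are mine):

```powershell
# Hedged sketch, not the demo script: pool -> virtual disk -> NTFS volume.
$disks = Get-PhysicalDisk -CanPool $true

New-StoragePool -FriendlyName "Pool1" `
                -StorageSubSystemFriendlyName "*Spaces*" `
                -PhysicalDisks $disks

New-VirtualDisk -StoragePoolFriendlyName "Pool1" `
                -FriendlyName "Space1" `
                -ResiliencySettingName Mirror `
                -Size 1TB

Get-VirtualDisk -FriendlyName "Space1" | Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -UseMaximumSize -AssignDriveLetter |
    Format-Volume -FileSystem NTFS
```

The point of the demo stands either way: the whole stack, from raw disks to a formatted, mounted NTFS volume, is driven through the one storage API.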

Storage Spaces

New category of cost-effective, scalable, available storage, with operational simplicity for all customer segments.  Powerful new platform abstractions:

  • Storage pools: units of aggregation (of disks), administration and isolation
  • Storage spaces (virtual disks): resiliency, provisioning, and performance

Target design point:

  • Industry standard interconnects: SATA or (shared) SAS
  • Industry standard storage: JBODs

You take a bunch of disks and connect them to the server with (shared or direct) SAS (best) or direct SATA (acceptable).  The disks are aggregated into pools.  Pools are split into spaces.  You can do CSV, NFS, or Windows Storage Management.  Supports Hyper-V.

Shared SAS allows a single JBOD to be attached to multiple servers to make a highly available and scalable storage fabric.

Capabilities:

  • Optimized storage utilisation
  • Resiliency and application-driven error correction
  • HA and scale out with Failover Clustering and CSV
  • Operational simplicity

Demo:

Iometer is running to simulate storage workloads.  40 x Intel X25-M 160 GB SSDs connected to a Dell T710 (48 GB RAM, dual Intel CPUs) server with 5 x LSI HBAs.  It gets 880,580.06 read IOPS with this developer preview pre-beta release.

The demo changes to a workload that needs high bandwidth rather than IOPS.  This time he gets 3,311.04 MB per second of throughput.

Next demo is a JBOD with a pool (CSV).  A pair of spaces are created in the pool, each assigned to a virtual machine.  Both VMs have VHDs, and the VHDs are stored in the spaces.  The VMs are running on different Hyper-V nodes, and both nodes access the spaces via CSV.  In the demo, we see that both nodes can see both pools.  The spaces appear in Explorer with drive letters (Note: I do not like that – does it indicate a return to 2008 days?).  For some reason he used Quick Migration – why?!?!?  A space is only visible in Explorer on a host if the VM is running on that host – spaces follow when VMs are migrated between nodes.

Offloaded Data Transfer (ODX)

Co-developed with partners, e.g. Dell EqualLogic.  If we copy large files on the SAN between servers, the source server normally has had to do the work (data in, CPU and SAN utilisation), send the data over a LAN with all of its latency, and then the destination server has to write it to the SAN again (CPU and data out).  ODX offloads the work to a compatible SAN, which can do it more quickly, and we don’t get the needless cross-LAN data transfer or CPU utilisation.  E.g. Host A wants to send data to Host B.  A token is passed between the hosts.  Host A sends the job to the SAN with the token.  The SAN uses this token to sync with Host B, and Host B reads directly from the SAN instead of getting the data from Host A across the LAN.  This will be a magic multi-site cluster data transfer solution.

In a demo, he copies a file from SAN A in Redmond to SAN B in Redmond, driven from his laptop in Anaheim.  With ODX, the copy runs at 250 Mbps with zero data transfer on his laptop, and takes a few minutes.  Without ODX, the data would have to be copied from SAN A to Anaheim and then from Anaheim back to SAN B, which would take over 17 hours.

Thin Provisioning Notifications

Can ID thinly provisioned virtual disks. 

Data Deduplication 

Transparent to the primary server workload.  Can save over 80% of storage for a VHD library and around 50% for a general file share.  The deduplication scope is the volume.  It is cluster aware, and it is integrated with BranchCache for optimised data transfer over the WAN.
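Deduplication is driven per volume with the WS2012 dedup cmdlets.  A minimal sketch, assuming the Data Deduplication role service is installed and using a drive letter of my own choosing:

```powershell
# Sketch only: enable dedup on a volume, kick off an optimisation job,
# then check how much space was reclaimed. E: is a hypothetical volume.
Enable-DedupVolume -Volume "E:"
Start-DedupJob -Volume "E:" -Type Optimization
Get-DedupStatus -Volume "E:"
```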

The speakers ran out of time.  It was a confusing presentation; I think the topics covered need much more time.