GPT Protective Partition Prevents Creation Of Storage Spaces Storage Pool

I was working on a customer site today on a new JBOD & Storage Spaces installation. It should have been a pretty simple deployment, one I’ve done over and over. But one simple step refused to work: when we tried to build a new Storage Pool (an aggregation of disks for Storage Spaces), the primordial pool and the blank disks would not appear in Server Manager or in Failover Cluster Manager. PowerShell was no use either.

My first suspects were the LSI SAS cards, but much troubleshooting found no solution there. And then, while mucking about in Disk Management, I saw something: I could bring the disks online, but they came up behaving very strangely for brand new disks.

The disks came online as GPT disks, without any initialization being done by me. And the disks were … read only. They actually had a status of GPT Protective Partition.

A quick google later and I had a fix:

  • DiskPart
  • List Disk
  • Select Disk X
  • Clean
  • Repeat

With a bit of work I could have probably PowerShelled that up.
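If I had, it would have looked something like this rough sketch using the inbox Storage module cmdlets (the disk numbers are placeholders, and Clear-Disk wipes everything on the disk, so filter carefully):

  # Check partition style and read-only state of each disk
  Get-Disk | Sort-Object Number |
      Format-Table Number, FriendlyName, PartitionStyle, IsReadOnly, OperationalStatus

  # Wipe the affected disks (3..14 is a placeholder range – substitute your own disk numbers)
  3..14 | ForEach-Object {
      Set-Disk -Number $_ -IsReadOnly $false    # may be needed before the clean
      Clear-Disk -Number $_ -RemoveData -RemoveOEM -Confirm:$false
  }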

What do I think the cause was? The JBOD manufacturer supplied the disks. A part of their offer is that they’ll pre-assemble the kit and test the disks – no two disks from the same production run are made equal, and some are a lot less than capable. I think the tests left the disks in a weird state that Windows interpreted as this read-only GPT protective status.

The clean operation fixed things up and we were able to move on.

KB2913766 – Improving JBOD Management For WS2012 R2 Storage Spaces

A very useful update, KB2913766, was released by Microsoft to improve storage enclosure management for Storage Spaces in Windows 8.1 and Windows Server 2012 R2.

This article introduces a hotfix that extends platform support for Storage Spaces in Windows 8.1 and Windows Server 2012 R2. After you install this hotfix, storage enclosure management is improved. The improvement is achieved by adding Storage Management Application Programming Interface (SMAPI) support for enclosure awareness that enables managing and health monitoring of just-a-bunch-of-disks (JBOD) enclosures.

The hotfix is available from Microsoft.

There is no documentation to state what the exact improvements are. I know “some” stuff, but I don’t know how much of it I’m cleared to share. A search based on that “stuff” revealed nothing public.

Dell & Microsoft Announce Support For WS2012 R2 Storage Spaces

Dell, in cooperation with Microsoft, announced the release of their supported hardware for Windows Server 2012 R2 Storage Spaces and Scale-Out File Server.


Microsoft said:

Dell’s announcement is an exciting development which will help more customers take advantage of the performance and availability of virtualized storage with Windows Server.

Dell went on:

Microsoft’s Storage Spaces, a technology in Windows Server 2012 R2, combined with Dell’s PowerEdge servers and PowerVault storage expansion solutions, can help organizations like hosters and cloud-providers that don’t have the feature-set needs for a separate storage array to deliver advanced, enterprise-class storage capabilities, such as continuous availability and scalability, on affordable industry-standard servers and storage.

The HCL has not been updated yet, but it appears that Dell has two appliances that they are pushing:

  • MD1200: a 12 x 3.5″ drive tray
  • MD1220: a 24 drive tray, similar to the DataOn DNS-1640

Dell has also published Deploying Windows Server 2012 R2 Storage Spaces on Dell PowerVault.

Figure: 2 x clustered servers with 2 x MD12xx JBODs

So, one of the big storage companies has blinked. Who is next?

BTW, when I checked out the Irish pricing, the Dell MD1220 was twice the price of the DataOn DNS-1640. After bid pricing, that’ll probably be an even match, so the comparison will come down to the disks.

How The Microsoft Windows Build Team Replaced SANs with JBOD + Windows Server 2012 R2

I’ve heard several times in various presentations about a whitepaper by Microsoft that discusses how the Windows build team in Microsoft HQ replaced traditional SAN storage (from a certain big name storage company) with Scale-Out File Server architecture based on:

  • Windows Server 2012 R2
  • JBOD
  • Storage Spaces

I searched for this whitepaper time and time again and never found it. Then today I was searching for a different storage paper (which I have yet to find) but I did stumble on the whitepaper with the build team details.

The paper reveals that:

  • The Windows Build Team were using traditional SAN storage
  • They needed 2 petabytes of storage to do 40,000 Windows installations per day
  • 2 PB was enough space for just 5 days of data!!!
  • A disk failure could affect dozens of teams in Microsoft

They switched to WS2012 R2 with SOFS architectures:

  • 20 x WS2012 R2 clustered file servers provide the SOFS HA architecture with easy manageability.
  • 20 x  JBODs (60 x 3.5″ disk slots) were selected. Do the maths; that’s 20 x 60 x 4 TB = 4800 TB or > 4.6  petabytes!!! Yes, the graphic says they are 3 TB drives but the text in the paper says the disks are 4 TB.
  • There is an aggregate of 80 Gbps of networking to the servers. This is accomplished with 10 Gbps networking – I would guess it is iWARP.

The result of the switch was:

  • Doubling of the storage throughput via SMB 3.0 networking
  • Tripling of the raw storage capacity
  • Lower overall cost – reduced the cost/TB by 33%
  • In conjunction with Windows Server dedupe, they achieved a 5x increase in capacity with a 45-75% de-duplication rate.
  • This led to data retention going from 5 days to nearly a month.
  • 8 full racks of gear were culled. They reduced the server count by 6x.
  • Each week 720 petabytes of data flows across this network to/from the storage.


Check out the whitepaper to learn more about how Windows Server 2012 R2 storage made all this possible. And then read my content on SMB 3.0 and SOFS here (use the above search control) and on The Petri IT Knowledgebase.

CSV Cache Is Not Used With Heatmap-Tracked Tiered Storage Spaces

I had an email from Bart Van Der Beek earlier this week questioning an aspect of my kit list for a Hyper-V cluster that is using a SOFS with Storage Spaces for the shared cluster storage. I had added RAM to the SOFS nodes to use for CSV Cache. Bart had talked to some MSFT people who told him that CSV Cache would not be used with tiered storage spaces. He asked if I knew about this. I did not.

So I had the chance to ask Elden Christensen (Failover Clustering PM, TechEd speaker, author of many of the clustering blog posts, and all-round clustering guru) about it tonight. Elden explained that:

  • No, CSV cache is not used with tiered storage spaces where the heat map is used. This is when the usage of 1 MB blocks is tracked and those blocks are automatically promoted to the hot tier, demoted to the cold tier, or left where they are, on a scheduled basis.
  • CSV Cache is used when the heat map is not used and you manually pin entire files to a tier (see the sketch below). This would normally only be done in VDI. However, enabling dedupe on that volume will offer better performance than CSV Cache.
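For the manual pinning scenario, this is roughly what it looks like in PowerShell (a sketch; the tier and file names are made up):

  # Find the tier names defined on the pool
  Get-StorageTier | Format-Table FriendlyName, MediaType

  # Pin a file (e.g. a VDI golden image) to the SSD tier
  Set-FileStorageTier -FilePath "C:\ClusterStorage\CSV1\GoldImage.vhdx" -DesiredStorageTierFriendlyName "SSDTier"

  # The move happens on the next run of the Storage Tiers Optimization task,
  # or on demand with: Optimize-Volume -DriveLetter D -TierOptimize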

So, if you are creating tiered storage spaces in your SOFS, there is no benefit in adding lots of RAM to the SOFS nodes.

Thanks for the heads up, Bart.

A Kit/Parts List For A WS2012 R2 Hyper-V Cluster With DataOn SMB 3.0 Storage

I’ve had a number of requests to specify the pieces of a solution where there is a Windows Server 2012 R2 Hyper-V cluster that uses SMB 3.0 to store virtual machines on a Scale-Out File Server with Storage Spaces (JBOD). So that’s what I’m going to try to do with this post. Note that I am not going to bother with pricing:

  • It takes too long to calculate
  • Prices vary from country to country
  • List pricing is usually meaningless; work with a good distributor/reseller and you’ll get a bid/discount price.
  • Depending on where you live in the channel, you might be paying distribution price, trade price, or end-customer price, and that determines how much margin has been added to each component.
  • I’m lazy

Scale-Out File Server

Remember that an SOFS is a cluster that runs a special clustered file server role for application data. A cluster requires shared storage. That shared storage will be one or more Mini-SAS-attached JBOD trays (on the Storage Spaces HCL list) with Storage Spaces supplying the physical disk aggregation and virtualization (normally done by SAN controller software).

On the blade versus rack server question: I always go rack server. I’ve been burned by the limited flexibility and high costs of blades. Sure, you can get 64 blades into a rack … but at what cost?! FlexFabric-like solutions are expensive and, strictly speaking, not supported by Microsoft – not to mention they limit your bandwidth options hugely. The massive data centres that I’ve seen and been in all use 1U and 2U rack servers. I like 2U rack servers over 1U because 1U rack servers such as the R420 have only 1 full-height and 1 half-height PCIe expansion slot. That half-height slot makes for tricky expansion.

For storage (and more) networking, I’ve elected to go with RDMA networking. Here you have two good choices:

  • iWARP: More affordable and running at 10 GbE – what I’ve illustrated here. Your vendor choice is Chelsio.
  • Infiniband: Amazing speeds (56 Gbps with faster to come) but more expensive. Your vendor choice is Mellanox.

I’ve ruled out RoCE. It’s too damned complicated – just ask Didier Van Hoye (@workinghardinit).
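Whichever you choose, it’s worth confirming that SMB Direct really is doing RDMA once everything is cabled up. A quick sanity check with the inbox cmdlets (a sketch):

  # Is RDMA enabled on the rNICs?
  Get-NetAdapterRdma | Format-Table Name, Enabled

  # Does the SMB client see RDMA-capable interfaces?
  Get-SmbClientNetworkInterface | Format-Table FriendlyName, RdmaCapable

  # After driving some traffic to the file server, confirm the connections are RDMA
  Get-SmbMultichannelConnection | Format-Table ServerName, ClientRdmaCapable, ServerRdmaCapable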

There will be two servers:

  • 2 x Dell R720: Dual Xeon CPU, 6 GB RAM, rail kits, on-board quad port 1 GbE NICs. The dual CPU gives me scalability to handle lots of hosts/clusters. The 4 x 1 GbE NICs are teamed (dynamic load distribution – see the sketch after this list) for management functionality. I’d upgrade the built-in iDRAC Essentials to the Enterprise edition to get the KVM console and virtual media features. A pair of disks in a RAID1 configuration is used for the OS in each of the SOFS nodes.
  • 10 x 1 GbE cables: This is to network the 4 x 1 GbE onboard NICs and the iDRAC management port. Who needs KVM when you’ve already bought it in the form of iDRAC.
  • 2 x Chelsio T520-CR: Dual port 10 GbE SFP+ iWARP (RDMA) NICs. These two rNICs are not teamed (not compatible with RDMA). They will reside on different VLANs/subnets for SMB Multichannel (cluster requirement). The role of these NICs is to converge SMB 3.0 storage, and cluster communications. I might even use these networks for backup traffic.
  • 4 x SFP+ cables: These are to connect the two servers to the two SFP+ 10 GbE switches.
  • 2 x LSI 9207-8e Mini-SAS HBAs: These are dual port Mini-SAS adapters that you insert into each server to connect to the JBOD(s). Windows MPIO provides the path failover.
  • 2 x Windows Server Standard Edition: We don’t need virtualization rights on the SOFS nodes. Standard edition includes Failover Clustering.
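The management team mentioned above is a one-liner on each SOFS node (a sketch; the member NIC names depend on how your onboard ports enumerate):

  # Switch-independent team of the 4 onboard 1 GbE NICs with dynamic load distribution
  New-NetLbfoTeam -Name "MgmtTeam" -TeamMembers "NIC1","NIC2","NIC3","NIC4" `
      -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic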

Regarding the JBODs:

Only use devices on the Microsoft HCL for your version of Windows Server. There are hardware features in these “dumb” JBODs that are required. And the testing process will probably lead to the manufacturer tweaking their hardware.

Note that although “any” dual channel SAS drive can be used, some firmwares are actually better than others. DataOn Storage maintain their own HCL of tested HDDs, SSDs, and HBAs. Stick with the list that your JBOD vendor recommends.

How many and what kind of drives do you need? That depends. My example is just that: an example.

How many trays do you need? Enough to hold your required number of drives 😀 Really though, if I know that I will scale out to fill 3 trays then I will buy those 3 trays up front. Why? Because 3 trays is the minimum required for tray fault tolerance with 2-way mirror virtual disks (LUNs). Simply going from 1 tray to 2 and then 3 won’t do because data does not relocate.

Also remember that if you want tiered storage then there is a minimum number of SSDs (STRONGLY) recommended per tray.

Regarding using SATA drives: DON’T DO IT! The available interposer solution is strongly discouraged, even by DataOn.  If you really need SSD for tiered storage then you really need to pay (through the nose).

Here’s my EXAMPLE configuration:

  • 3 x DataOn Storage DNS-1640D: 24 x 2.5” disk slots in each 2U tray, each with a blank disk caddy for a dual channel SAS SSD or HDD drive. Each has dual boards for Mini-SAS connectivity (A+B for server 1 and A+B for server 2), and A+B connectivity for tray stacking. There is also dual PSU in each tray.
  • 12 x STEC S842E400M2 400GB SSD: Go google the price of these for a giggle! These are not your typical (or even “enterprise”) SSD that you’ll stick in a laptop.  I’m putting 4 into each JBOD, the recommended minimum number of SSDs in tiered storage if doing 2-way mirroring.
  • 48 x Seagate ST900MM0026 900 GB 10K SAS HDD: This gives us the bulk of the storage. There are 20 slots free (after the SSDs) in each JBOD and I’ve put 16 disks into each. That gives me loads of capacity and some wiggle room to add more disks of either type.
  • 18 x Mini-SAS Cables: I’m not looking at a diagram and I’m tired so 18 might not be the right number. There’s a total of 10U of hardware in the SOFS (servers + JBOD) so short Mini-SAS cables will do the trick. These are used to attach the servers to the JBODs and to daisy chain the JBODs. The connections are fault tolerant – hence the high number of cables.

And that’s the SOFS, servers + JBODs with disks.

Just to remind you: it’s a sample spec. You might have one JBOD, you might have 4, or you might go with the 60 disk slot models. It all depends.

Hyper-V Hosts

My hosting environment will consist of one Hyper-V cluster with 8 nodes. This could be:

  • A few clusters, all sharing the same SOFS
  • One or more clusters with some non-clustered hosts, all sharing the same SOFS
  • Lots of non-clustered hosts, all sharing the same SOFS

One of the benefits of SMB 3.0 storage is that a shared folder is more flexible than a CSV on a SAN LUN. There are more sharing options, and this means that Live Migration can span the traditional boundary of storage without involving Shared-Nothing Live Migration.

Regarding host processors, the L2/L3 cache plays a huge role in performance. Try to get as new a processor as possible. And remember, it’s all Intel or all AMD; do not mix the brands.

There are lots of possible networking designs for these hosts. I’m going to use the design that I’ve implemented in the lab at work, and it’s also one that Microsoft recommends. A pair of rNICs (iWARP) will be used for the storage and cluster networking, residing on the same two VLANs/subnets as the cluster/storage networks that the SOFS nodes are on. Then two other NICs are going to be used for host and VM networking. These could be 1 GbE or 10 GbE or faster, depending on the needs of your VMs. I’ve got 4 pNICs to play with, so I will team them.

    • 8 x Dell R720: Dual Xeon CPU, 256 GB RAM, rail kits, on-board quad port 1 GbE NICs. These are some big hosts. Put lots of RAM in because that’s the cheapest way to scale; CPU is almost never the 1st or even 2nd bottleneck in host capacity. The 4 x 1 GbE NICs are teamed (dynamic load distribution) for VM networking and management functionality. I’d upgrade the built-in iDRAC Essentials to the Enterprise edition to get the KVM console and virtual media features. A pair of disks in a RAID1 configuration is used for the management OS.
    • 40 x 1 GbE cables: This is to network the 4 x 1 GbE onboard NICs and the iDRAC management port in each host. Who needs KVM when you’ve already bought it in the form of iDRAC.
    • 8 x Chelsio T520-CR: Dual port 10 GbE SFP+ iWARP (RDMA) NICs. These two rNICs are not teamed (not compatible with RDMA). They will reside on the same two VLANs/subnets as the SOFS nodes. The role of these NICs is to converge SMB 3.0 storage, SMB 3.0 Live Migration (you gotta see it to believe it!), and cluster communications. I might even use these networks for backup traffic.
    • 16 x SFP+ cables: These are to connect the eight hosts (two ports each) to the two SFP+ 10 GbE switches.
    • 8 x Windows Server Datacenter Edition: The Datacenter edition gives us unlimited rights to install Windows Server into VMs that will run on these licensed hosts, making it the economical choice. Enabling Automatic Virtual Machine Activation in the VMs will simplify VM guest OS activation.

There are no HBAs in the Hyper-V hosts; the storage (SOFS) is accessed via SMB 3.0 over the rNICs.

Other Stuff

Hmm, we’re going to need:

  • 2 x SFP+ 10 GbE Switches with DCB support: Data Center Bridging really is required to do QoS of RDMA traffic – see the sketch after this list. You would need PFC (Priority Flow Control) support if using RoCE for RDMA (not recommended – do either iWARP or Infiniband). Each switch needs at least 12 ports – allow for scalability. For example, you might put your backup server on this network.
  • 2 x 1 GbE Switches: You really need a pair of 48 port top-of-rack switches in this design due to the number of 1 GbE ports being used and the need for growth.
  • Rack
  • PDU
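For the DCB bullet above, the Windows side of the configuration looks roughly like this (a sketch; priority 3 and the 50% reservation are assumed values, and the switch side is vendor specific):

  # Classify SMB Direct traffic (port 445) with 802.1p priority 3
  New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

  # Reserve bandwidth for that priority with ETS
  New-NetQosTrafficClass "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS

  # Enable DCB on the rNICs (names are placeholders)
  Enable-NetAdapterQos -Name "rNIC1","rNIC2"

  # PFC only matters if you were doing RoCE (not recommended above):
  # Enable-NetQosFlowControl -Priority 3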

And there’s probably other bits. For example, you might run a 2-node cluster for System Center and other management VMs. The nodes would have 32-64 GB RAM each. Those VMs could be stored on the SOFS or even on a JBOD that is directly attached to the 2 nodes with Storage Spaces enabled. You might run a server with lots of disk as your backup server. You might opt to run a pair of 1U servers as physical domain controllers for your infrastructure.

I recently priced up a kit, similar to above. It came in much cheaper than the equivalent blade/SAN configuration, which was a nice surprise. Even better was that the SOFS had 3 times more storage included than the SAN in that pricing!

How Many SSDs Do I Need For Tiered Storage Spaces?

This is a good question.  The guidance I had been given was between 4-8 SSDs per JBOD tray.  I’ve just found guidance that is a bit more precise.  This is what Microsoft says:

When purchasing storage for a tiered deployment, we recommend the following number of SSDs in a completely full disk enclosure of different bay capacities in order to achieve optimal performance for a diverse set of workloads:

Disk enclosure slot count   Simple space   2-way mirror space   3-way mirror space
12 bay                      2              4                    6
24 bay                      2              4                    6
60 bay                      4              8                    12
70 bay                      4              8                    12

Minimum number of SSDs Recommended for Different Resiliency Settings

Migrating Two Non-Clustered Hyper-V Hosts To A Failover Cluster (With DataOn & Storage Spaces)

At work we have a small number of VMs to operate the business.  For our headcount it’s actually a lot of VMs, because distribution requires lots of systems for lots of vendors.  I generally have very little to do with our internal IT, but I’ll get involved with some engineering stuff from time to time.

2 non-clustered hosts (HP DL380 G6) were set up before I joined the company.  I upgraded/migrated those hosts to WS2012 earlier this year (networking = 4 * 1 GbE NIC team with virtualized converged networking for the management OS and Live Migration).

We decided to migrate the non-clustered hosts to create a Hyper-V cluster.  This was made affordable thanks to Storage Spaces running on a shared JBOD.  We distribute DataOn, so we went with a single DNS-1640, attached to both servers using LSI 9207-8e dual port SAS cards.

Yes, we’re doing the small biz option where two Hyper-V hosts are directly connected to a JBOD where Storage Spaces is running.  If we had more than 2 hosts, we would have used the SMB 3.0 architecture of Scale-Out File Server (SOFS).  Here is the process we have followed so far (all going perfectly up to now):

Step 1 – Upgrade RAM

Each host had enough RAM for its solo workload.  In a cluster, a single node must be capable of handling all VMs after a failover.  In our case, we doubled the RAM in each of the two servers.

Step 2 – Drain VMs from Host1

Using Shared-Nothing Live Migration, we moved VMs from Host1 to Host2.  This allows us to operate on a host for an extended period without affecting production VMs.

Note that this only worked because we had already upgraded the RAM (step 1) and we had sufficient free disk space in Host2.

Step 3 – Connect Host1

We added an LSI card into Host1.  We racked the JBOD.  And then we connected Host1 to the JBOD, one SAS cable going to port1/module1 in the JBOD, and the other SAS cable going to port1/module2 in the JBOD (for HA).

Host1 was booted up.  I downloaded the drivers, firmware, and BIOS from LSI for the adapter (never, ever use the drivers for anything that come on the Windows media if there is an OEM driver) and installed them.

Step 4 – Create Cluster

I installed two Windows features on Host1:

  • Failover Clustering
  • MPIO

I added support for SAS devices in MPIO, which required a reboot.
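In PowerShell terms, that step is roughly (a sketch):

  # Add the features needed for this step and the next
  Install-WindowsFeature Failover-Clustering, Multipath-IO -IncludeManagementTools

  # Claim SAS-attached devices for MPIO, then reboot
  Enable-MSDSMAutomaticClaim -BusType SAS
  Restart-Computer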

An additional vNIC called Cluster2 was added to the management OS.  I then renamed the Live Migration vNIC to Cluster1.  QoS was configured so that the virtual switch has 25% in the default bucket, and each of the 3 management OS vNICs has a 25% weight.
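Roughly what that looked like (a sketch; the switch and vNIC names are mine, not the real ones, and the switch must have been created with -MinimumBandwidthMode Weight):

  # Add the second cluster vNIC to the management OS
  Add-VMNetworkAdapter -ManagementOS -Name "Cluster2" -SwitchName "ConvergedSwitch"

  # Rename the existing Live Migration vNIC
  Rename-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -NewName "Cluster1"

  # Weight-based QoS: 25% to the default bucket, 25% to each management OS vNIC
  Set-VMSwitch "ConvergedSwitch" -DefaultFlowMinimumBandwidthWeight 25
  "Management","Cluster1","Cluster2" | ForEach-Object {
      Set-VMNetworkAdapter -ManagementOS -Name $_ -MinimumBandwidthWeight 25
  }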

SMB Multichannel constraints were configured for Cluster1 and Cluster2 on both servers.  That’s to control which NICs are used by SMB Multichannel (and therefore by Redirected IO).
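The constraint itself is one cmdlet per file server name (a sketch; host and interface names are examples):

  # Restrict SMB Multichannel (and therefore redirected IO) to the two cluster vNICs
  New-SmbMultichannelConstraint -ServerName "Host1" -InterfaceAlias "vEthernet (Cluster1)","vEthernet (Cluster2)"
  New-SmbMultichannelConstraint -ServerName "Host2" -InterfaceAlias "vEthernet (Cluster1)","vEthernet (Cluster2)"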

I then created a single node cluster and configured it.  Then it was time for more patching from Windows Update.
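Creating a single-node cluster is a couple of lines (a sketch; the cluster name and IP are placeholders):

  Test-Cluster -Node "Host1"
  New-Cluster -Name "HVCluster1" -Node "Host1" -StaticAddress "192.168.1.50" -NoStorage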

Step 5 – Hotfixes

I downloaded the recommended updates for WS2012 Hyper-V and Failover Clustering (not found on Windows Update) using a handy PowerShell script.  Then I installed them on & rebooted Host1.

Step 6 – Storage Spaces

In Failover Cluster manager I configured a new storage pool.  We’re still on WS2012 so a single hot spare disk was assigned.  Note that I strongly recommend WS2012 R2 and not assigning a hot spare; parallelized restore is a much faster and better option.

3 virtual disks (LUNs) were created:

  • Witness for the cluster
  • CSV1
  • CSV2

Rule of thumb: create 1 CSV per node in the cluster that is connected by SAS to the Storage Pool.
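A rough sketch of this step in PowerShell, assuming the pool name, sizes, and the hot spare disk are placeholders (and remembering this particular cluster is still WS2012):

  # Pool all of the JBOD’s poolable disks
  $disks = Get-PhysicalDisk -CanPool $true
  New-StoragePool -FriendlyName "Pool1" -StorageSubSystemFriendlyName "*Storage Spaces*" -PhysicalDisks $disks

  # WS2012 era: assign a hot spare (skip this on WS2012 R2 – parallelized restore is better)
  Set-PhysicalDisk -FriendlyName "PhysicalDisk12" -Usage HotSpare

  # Witness plus the two CSV-to-be virtual disks, all 2-way mirrors
  New-VirtualDisk -StoragePoolFriendlyName "Pool1" -FriendlyName "Witness" -ResiliencySettingName Mirror -Size 1GB
  New-VirtualDisk -StoragePoolFriendlyName "Pool1" -FriendlyName "CSV1" -ResiliencySettingName Mirror -Size 2TB
  New-VirtualDisk -StoragePoolFriendlyName "Pool1" -FriendlyName "CSV2" -ResiliencySettingName Mirror -Size 2TB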

Step 7 – Configure Cluster Disks

The cluster is still single-node, so configuring a witness disk for quorum will cause alerts.  You can do it, but be aware of the alerts.

The two virtual disks were converted to CSVs and renamed CSV1 and CSV2, including their mount points.
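That conversion and renaming can be scripted as well (a sketch; the default resource and mount point names can differ):

  # Add the data disks as Cluster Shared Volumes
  Add-ClusterSharedVolume -Name "Cluster Virtual Disk (CSV1)"
  Add-ClusterSharedVolume -Name "Cluster Virtual Disk (CSV2)"

  # Rename the mount points under C:\ClusterStorage to match
  Rename-Item "C:\ClusterStorage\Volume1" "CSV1"
  Rename-Item "C:\ClusterStorage\Volume2" "CSV2"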

Step 8 – Test

Using Shared-Nothing Live Migration, a VM was moved to the cluster and placed on a CSV. 

This is where we are now, and we’re observing the performance/health of the new infrastructure.

Step 9 – Shared-Nothing Live Migration From Host2

All of the VMs will be moved from the D: of Host2 to the cluster, running on Host1, and spread evenly across the two CSVs.  This will leave Host2 drained.

Remember to reconfigure backups to backup VMs from the cluster!

Step 10 – Finish The Job

We will:

  1. Reconfigure the networking of Host2 as above (I’ve saved the PowerShell)
  2. Insert the LSI card in Host2 and connect it to the JBOD
  3. Install all the LSI drivers & updates on Host2 as we did on Host1
  4. Add the Failover Cluster and MPIO roles to Host2
  5. Add Host2 as a node in the cluster
  6. Patch up Host2
  7. Test Live Migration
  8. Plan out VM failover prioritization
  9. Configure Cluster Aware Updating self-updating for lunch time on the second Monday of every month – that’s a full month after Patch Tuesday, giving MSFT plenty of time to fix any broken updates (I’m thinking of Cumulative Updates/Update Rollups).
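For that last item, self-updating CAU can be set up roughly like this (a sketch; the cluster name and start time are placeholders):

  # CAU self-updating: second Monday of every month, around lunch time
  Add-CauClusterRole -ClusterName "HVCluster1" -CauPluginName "Microsoft.WindowsUpdatePlugin" `
      -DaysOfWeek Monday -WeeksOfMonth 2 -StartDate "2014-01-13 12:30" `
      -EnableFirewallRules -Force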

And that should be that!

The Effects Of WS2012 R2 Storage Spaces Write-Back Cache On A Hyper-V VM

I previously wrote about a new feature in Windows Server 2012 R2 Storage Spaces called Write-Back Cache (WBC) and how it improved write performance from a Hyper-V host.  What I didn’t show you was how WBC improved performance from where it counts; how does WBC improve the write-performance of services running inside of a virtual machine?

So, I set up a virtual machine.  It has 3 virtual hard disks:

  • Disk.vhdx: The guest OS (WS2012 R2 Preview), and this is stored on SOFS2.  This is a virtual Scale-Out File Server (SOFS) and is isolated from my tests.  This is the C: drive in the VM.
  • Disk1.vhdx: This is on SCSI 0 0 and is placed on \\SOFS1\CSV1.  The share is stored on a tiered storage space (50 GB SSD + 150 GB HDD) with 1 column and a write cache of 5 GB.  This is the D: drive in the VM.
  • Disk2.vhdx: This is on SCSI 0 1 and is placed on \\SOFS1\CSV2.  The share is stored on a non-tiered storage space (200 GB HDD) with 4 columns.  There is no write cache.  This is the E: drive in the VM.
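The two data disks above can be created and attached roughly like this (a sketch; the VM name, sizes, and paths are just examples that follow the layout described above):

  # Create the test VHDX files on the two SOFS shares
  New-VHD -Path "\\SOFS1\CSV1\Disk1.vhdx" -SizeBytes 60GB -Dynamic
  New-VHD -Path "\\SOFS1\CSV2\Disk2.vhdx" -SizeBytes 60GB -Dynamic

  # Attach them to the VM’s SCSI controller at locations 0 and 1
  Add-VMHardDiskDrive -VMName "TestVM" -ControllerType SCSI -ControllerNumber 0 -ControllerLocation 0 -Path "\\SOFS1\CSV1\Disk1.vhdx"
  Add-VMHardDiskDrive -VMName "TestVM" -ControllerType SCSI -ControllerNumber 0 -ControllerLocation 1 -Path "\\SOFS1\CSV2\Disk2.vhdx"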

I set up SQLIO in the VM, with a test file on each of D: (Disk1.vhdx – WBC on the underlying volume) and E: (Disk2.vhdx – no WBC on the underlying volume).  Once again, I ran SQLIO against each test file, one at a time, with random 64 KB writes for 30 seconds – I copied/pasted the scripts from the previous test.  The results were impressive.


Interestingly, these are better numbers than from the host itself!  The extra layer of virtualization is adding performance in my lab!

Once again, Write-Back Cache has rocked, making the write performance 6.27 times faster.  A few points on this:

  • The VM’s performance with the VHDX on the WBC-enabled volume was slightly better than the host’s raw performance with the same physical disk.
  • The VM’s performance with the VHDX on the WBC-disabled volume was nearly twice as good as the host’s raw performance with the same physical disk.  That’s why we see a WBC improvement of 6-times instead of 11-times.  This is a write job so it wasn’t CSV Cache.  I suspect sector size (physical versus logical) might be what’s caused this.

I decided to tweak the scripts to get simultaneous testing of both VHDX files/shares/Storage Spaces virtual disks, and fired up performance monitor to view/compare the IOPS of each VHDX file.  The red bar is the optimised D: drive with higher write operations/second, and the green is the lower E: drive.


They say a picture paints a thousand words.  Let’s paint 2000 words; here’s the same test but over the length of a 60 second run.  Once again, red is the optimised D: drive and green is the E: drive.


Look what just 5 GB of SSD (yes, expensive enterprise class SSD) can do for your write performance!  That’s going to greatly benefit services when they have brief spikes in write activity – I don’t need countless spinning HDDs to build up IOPS for those once-an-hour/day spikes, gobbling up capacity and power.  A few space/power efficient SSDs with Storage Spaces Write-Back Cache will do a much more efficient job.

The Effects Of WS2012 R2 Storage Spaces Write-Back Cache

In this post I want to show you the amazing effect that Write-Back Cache can have on the write performance of Windows Server 2012 R2 Storage Spaces.  But before I do, let’s fill in some gaps.

Background on Storage Spaces Write-Back Cache

Hyper-V, like many other applications and services, does something called write-through.  In other words, it bypasses the write caches of your physical storage.  This is to avoid corruption.  Keep this in mind while I move on.

In WS2012 R2, Storage Spaces introduces tiered storage.  This allows us to mix one tier of HDD (giving us bulk capacity) with one tier of SSD (giving us performance).  Normally a heat map process runs at 1am (via Task Scheduler, and therefore customisable) and moves 1 MB slices of files to the hot SSD tier or to the cold HDD tier, based on demand.  You can also pin entire files (maybe a VDI golden image) to the hot tier.

In addition, WS2012 R2 gives us something called Write-Back Cache (WBC).  Think about this … SSD gives us really fast write speeds.  Write caches are there to improve write performance.  Some applications use write-through to bypass storage caches because they need the acknowledgement to mean that the write really went to disk.

What if abnormal increases in write behaviour led to the virtual disk (a LUN in Storage Spaces) using its allocated SSD tier to absorb that spike, and then demoting the data to the HDD tier later on if the slices are measured as cold?

That’s exactly what WBC, a feature of Storage Spaces with tiered storage, does.  A Storage Spaces tiered virtual disk will use the SSD tier to accommodate extra write activity.  The SSD tier increases the available write capacity until the spike decreases and things go back to normal.  We get the effect of a write cache, but write-through still happens because the write really is committed to disk rather than sitting in the RAM of a controller.

Putting Storage Spaces Write-Back Cache To The Test

What does this look like?  I set up a Scale-Out File Server that uses a DataOn DNS-1640D JBOD.  The 2 SOFS cluster nodes are each attached to the JBOD via dual port LSI 6 Gbps SAS adapters.  In the JBOD there is a tier of 2 * STEC SSDs (4-8 SSDs is a recommended starting point for a production SSD tier) and a tier of 8 * Seagate 10K HDDs.  I created 2 * 2-way mirrored virtual disks in the clustered Storage Space:

  • CSV1: 50 GB SSD tier + 150 GB HDD tier with 5 GB write cache size (WBC enabled)
  • CSV2: 200 GB HDD tier with no write cache (no WBC)

Note: I have 2 SSDs (sub-optimal starting point but it’s a lab and SSDs are expensive) so CSV1 has 1 column.  CSV2 has 4 columns.
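For reference, creating those two virtual disks in PowerShell looks roughly like this (a sketch; the pool and tier names are mine):

  # Define the two tiers on the pool
  $ssd = New-StorageTier -StoragePoolFriendlyName "Pool1" -FriendlyName "SSDTier" -MediaType SSD
  $hdd = New-StorageTier -StoragePoolFriendlyName "Pool1" -FriendlyName "HDDTier" -MediaType HDD

  # CSV1: tiered 2-way mirror, 50 GB SSD + 150 GB HDD, 5 GB write-back cache, 1 column
  New-VirtualDisk -StoragePoolFriendlyName "Pool1" -FriendlyName "CSV1" -ResiliencySettingName Mirror `
      -NumberOfColumns 1 -StorageTiers $ssd,$hdd -StorageTierSizes 50GB,150GB -WriteCacheSize 5GB

  # CSV2: plain 2-way mirror on HDD, 4 columns, no write-back cache
  New-VirtualDisk -StoragePoolFriendlyName "Pool1" -FriendlyName "CSV2" -ResiliencySettingName Mirror `
      -NumberOfColumns 4 -Size 200GB -WriteCacheSize 0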

Each virtual disk was converted into a CSV, CSV1 and CSV2.  A share was created on each CSV and shared as \\Demo-SOFS1\CSV1 and \\Demo-SOFS1\CSV2.  Yeah, I like naming consistency.

Then I logged into a Hyper-V host where I have installed SQLIO.  I configured a couple of params.txt files, one to use the WBC-enabled share and the other to use the WBC-disabled share:

  • Param1.TXT: \\demo-sofs1\CSV1\testfile.dat 32 0x0 1024
  • Param2.TXT: \\demo-sofs1\CSV2\testfile.dat 32 0x0 1024

I pre-expanded the test files that would be created in each share by running:

  • "C:Program Files (x86)SQLIOsqlio.exe" -kW -s5 -fsequential -o4 –b64 -F"C:Program Files (x86)SQLIOparam1.txt"
  • "C:Program Files (x86)SQLIOsqlio.exe" -kW -s5 -fsequential -o4 -b64 -F"C:Program Files (x86)SQLIOparam2.txt"

And then I ran a script that ran SQLIO with the following flags to write random 64 KB blocks (similar to VHDX) for 30 seconds:

  • "C:Program Files (x86)SQLIOsqlio.exe" -BS -kW -frandom -t1 -o1 -s30 -b64 -F"C:Program Files (x86)SQLIOparam1.txt"
  • "C:Program Files (x86)SQLIOsqlio.exe" -BS -kW -frandom -t1 -o1 -s30 -b64 -F"C:Program Files (x86)SQLIOparam2.txt"

That gave me my results:


To summarise the results:

The WBC-enabled share ran at:

  • 2258.60 IOs/second
  • 141.16 Megabytes/second

The WBC-disabled share ran at:

  • 197.46 IOs/second
  • 12.34 Megabytes/second

Storage Spaces Write-Back Cache enabled the share on CSV1 to run 11.44 times faster than the non-enhanced share!!!  Everyone’s mileage will vary depending on number of SSDs versus HDDs, assigned cache size per virtual disk, speed of SSD and HDD, number of columns per virtual hard disk, and your network.  But one thing is for sure, with just a few SSDs, I can efficiently cater for brief spikes in write operations by the services that I am storing on my Storage Pool.

Credit: I got help on SQLIO from this blog post on MS SQL Tips by Andy Novick (MVP, SQL Server).