Ignite 2015 – Spaces-Based, Software-Defined Storage–Design and Configuration Best Practices

Speakers: Joshua Adams and Jason Gerend, Microsoft.

Designing a Storage Spaces Solution

  1. Size your disks for capacity and performance
  2. Size your storage enclosures
  3. Choose how to handle disk failures
  4. Pick the number of cluster nodes
  5. Select a hardware solution
  6. Design your storage pools
  7. Design your virtual disks

Size your disks – for capacity (HDDs)

  1. Identify your workloads and resiliency type: Parity for backups and mirror for everything else.
  2. Estimate how much raw capacity you need: current capacity x % data growth x data copies (if you're using mirrors). Add 12% initially for automatic virtual disk repairs and metadata overhead. Example: 135 TB x 1.1 (10% growth) x 3 data copies + 12% ≈ 499 TB raw capacity.
  3. Size your HDDs: pick big 7200 RPM NL-SAS HDDs. Fast HDDs are not required if you're using an SSD tier.
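The raw-capacity arithmetic above can be sketched as a small calculation (a minimal sketch; the 10% growth and 12% overhead figures are the session's example numbers, not fixed rules):

```python
def raw_capacity_tb(current_tb, growth_factor, data_copies, overhead=0.12):
    """Estimate raw HDD capacity for a mirrored Storage Spaces pool.

    current_tb    -- data you have today, in TB
    growth_factor -- e.g. 1.1 for 10% anticipated growth
    data_copies   -- 2 for a two-way mirror, 3 for a three-way mirror
    overhead      -- extra for automatic repairs and metadata (~12%)
    """
    return current_tb * growth_factor * data_copies * (1 + overhead)

# The session's example: 135 TB, 10% growth, 3-way mirror, 12% overhead
print(round(raw_capacity_tb(135, 1.1, 3)))  # → 499
```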

The Software Defined Storage Calculator allows you to size and design a deployment, and it generates the PowerShell. It works with WS2012 R2 and WS2016, and with both disaggregated and hyper-converged deployments.

Size your disks – for performance (SSDs)

  1. Decide how many SSDs to use. The sweet spot is 1 SSD for every 2-4 HDDs, typically 4-5 SSDs per enclosure per pool. More SSDs = more absolute performance.
  2. Determine the SSD size. 800 GB SSDs are typical. Larger SSD capacity = can handle larger amounts of active data. Anticipate around 10% of SSD capacity for automatic repairs after an SSD failure.

Example: 36 x 800 GB SSDs.
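The SSD-count rule of thumb above is simple enough to sketch (the 2-4 HDDs-per-SSD ratio is from the session; the 144-HDD figure comes from the later 3 x 48 HDD enclosure example):

```python
def ssd_count(hdd_count, hdds_per_ssd=4):
    """Rough SSD count for a tiered pool: 1 SSD per 2-4 HDDs.

    hdds_per_ssd -- pick 2 for more performance, 4 for more economy
    """
    # Round up so every group of HDDs is covered by an SSD
    return -(-hdd_count // hdds_per_ssd)

# The running example: 144 HDDs (3 enclosures x 48) at 1 SSD per 4 HDDs
print(ssd_count(144))  # → 36
```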

Size your Enclosures

  1. Pick the enclosure size (12, 24, 60, etc  disks)
  2. Pick the number of enclosures. If you have 3 or 4, then you can have enclosure awareness/fault tolerance, depending on the type of mirroring.
  3. Each enclosure should have an identical number of disks.

Example: 3 x 60-bay JBODs, each with 48 HDDs and 12 SSDs.
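The "depending on the type of mirroring" point can be made concrete. The minimum-enclosure table below is my reading of Microsoft's WS2012 R2 enclosure-awareness guidance, not something stated in the session, so verify it against current documentation:

```python
# Minimum enclosure counts for enclosure awareness in WS2012 R2
# (assumed from Microsoft's Storage Spaces guidance -- verify for your build)
MIN_ENCLOSURES = {
    "two-way mirror": 3,
    "dual parity": 4,
    "three-way mirror": 5,
}

def enclosure_aware(resiliency, enclosures):
    """True if the deployment can tolerate losing a whole enclosure."""
    return enclosures >= MIN_ENCLOSURES[resiliency]

# The 3-JBOD example is enclosure-aware for two-way mirrors only
print(enclosure_aware("two-way mirror", 3))    # → True
print(enclosure_aware("three-way mirror", 3))  # → False
```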

The column count is identical across the two tiers, so the smaller tier (SSD) limits the column count. 3-4 columns is the sweet spot.
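A simplified sketch of why the smaller tier is the limit: a mirror space consumes (columns x data copies) disks per tier, so the SSD tier usually caps the column count. The one-disk repair reserve here reflects the "one more disk per tier than columns" repair guidance later in these notes:

```python
def max_columns(smallest_tier_disks, data_copies, reserve_disks=1):
    """Upper bound on column count for a tiered mirror space.

    A mirror needs (columns x data_copies) disks per tier; the smaller
    tier (usually SSD) is the limit. reserve_disks keeps a disk's worth
    of capacity per tier available for automatic repair.
    """
    return (smallest_tier_disks - reserve_disks) // data_copies

# 12 SSDs per enclosure, 3-way mirror, 1 disk reserved → at most 3 columns
print(max_columns(12, 3))  # → 3
```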

Expanding pools has an overhead. It's not trivial, but it works. The recommendation is that you fill your JBODs.

Choose how to Handle Disk Failures

  1. Decide how many simultaneous disk failures to tolerate. Use 2 data copies for small deployments and disks, and/or less important data. Use 3 data copies for larger deployments and disks, and for more important data.
  2. Plan to automatically repair disks. Instead of hot spares, set aside pool capacity to automatically replace failed disks. This also affects the column count … more later.

Example: 3-way mirrors.

Pick the number of Cluster Nodes

Start with 1 node per enclosure and scale up/down depending on the amount of compute required. This isn’t about performance; it’s about how much compute you can afford to lose and still retain HA.

Example: 3 SOFS nodes + 3 JBODs.

Select a hardware vendor

  1. DataON
  2. Dell
  3. HP
  4. RAID Inc
  5. Microsoft/Dell CPS

Design your Storage Pools

  1. Management domains: put your raw disks in the pool and manage them as a group. Some disk settings are applied at the pool level.
  2. More pools = more to manage. Pools are fault domains, so more pools = less risk: increased resiliency, but also increased resiliency overhead.

Start with 84 disks per pool.

Divide disks evenly between pools.
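The two pool rules above (at most 84 disks per pool, disks divided evenly) can be sketched as a small layout helper. The 180-disk figure is the running 3 x 60-bay JBOD example; the function itself is my illustration, not the session's calculator:

```python
import math

def pool_layout(total_disks, max_per_pool=84):
    """Split disks evenly across the fewest pools of <= max_per_pool disks."""
    pools = math.ceil(total_disks / max_per_pool)
    base = total_disks // pools
    # Spread any remainder one disk at a time so pools stay near-identical
    return [base + (1 if i < total_disks % pools else 0) for i in range(pools)]

# The running example: 180 disks (3 x 60-bay JBODs) → 3 pools of 60
print(pool_layout(180))  # → [60, 60, 60]
```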

Design your Virtual Disks

  • Where storage tiers, write-back cache and enclosure awareness are set.
  • More VDs = more uniform load balancing, but more to manage.
  • This is where the column count comes in. More columns = more throughput, but more latency. 3-4 columns is best.
  • Load balancing is dependent on identical virtual disks.
  • To automatically repair after a disk failure, you need at least one more disk per tier than the column count, for the smallest tier (which is usually the SSD tier).
  1. Set aside 10% of SSD and HDD capacity for repairs.
  2. Start with 2 virtual disks per node.
  3. Add more to keep virtual disk size to 10 TB or less. Divide SSD and HDD capacity evenly between virtual disks. Use 3-4 columns if possible.
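The virtual-disk sizing steps above can be sketched as follows. Growing the count in multiples of the node count is my assumption, made to keep the virtual disks identical for uniform load balancing; the 2-per-node starting point and 10 TB cap are from the session:

```python
def virtual_disk_count(nodes, usable_tb, max_vd_tb=10):
    """Start with 2 virtual disks per node; add more (in multiples of the
    node count, so the disks stay identical for load balancing) until each
    virtual disk holds max_vd_tb TB or less."""
    vds = 2 * nodes
    while usable_tb / vds > max_vd_tb:
        vds += nodes
    return vds

# e.g. 3 nodes serving 150 TB usable → 15 virtual disks of 10 TB each
print(virtual_disk_count(3, 150))  # → 15
```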

Best Practices for WS2012 R2

  • Scale by adding fully populated clusters. Get used to the concept of storage/compute/networking stamps.
  • Monitor your existing workloads for performance. The more you know about the traits of your unique workloads, the better future deployments will be.
  • Do a PoC deployment. Use DiskSpd and fault injection to stress the solution. Monitor the storage tiers performance to determine how much SSD capacity you need to fit a given scale of your workloads into SSD tiers.

WORK WITH A TRUSTED SOLUTION VENDOR. Not all hardware is good, even if it is on the HCL. Some are better than others, and some suck. In my opinion, Intel and Quanta suck. DataON is excellent. Dell appears to have gone through hell during CPS development to become OK. And some disks, e.g. SanDisk, are the spawn of Satan, in my experience. Note that Dell uses SanDisk and Toshiba, so demand Toshiba-only SSDs from Dell. HGST SSDs are excellent.

Deployment Best Practices

  • Disable TRIM on SSDs. Some drives degrade performance with TRIM enabled.
  • Disable all disk-based caches – if enabled, it degrades performance when write-through is used (Hyper-V).
  • Use LB (least blocks) for MPIO policy. For max performance, set individual SSDs to Round Robin. This must be done on each SOFS node.
  • Optimize the Storage Spaces repair settings on the SOFS. Use fast rebuild: change the setting from Auto to Always on the pool. This means that 5 minutes after a write failure, a rebuild will automatically start. Pulling a disk does not trigger an automatic rebuild – an expensive process.
  • Install the latest updates. Example: repair process got huge improvement in November 2014 update.

Deployment & Management Best Practices

  • Deploy using VMM or PowerShell. FCM is OK for small deployments.
  • VMM is great for some stuff, but in 2012 R2 it doesn’t do tiering etc. It can create the cluster well and manage shares, but for disk creation, use PowerShell.
  • Monitor it using SCOM with the new Storage Spaces management pack.
  • Also use Test-StorageHealth.ps1 to do some checks occasionally. It needs tweaking to size it for your configuration.

Design Closing Thoughts

  • Storage Spaces solutions offer 2-4 cluster nodes and 1-4 JBODs, and can store from 100 to as many as 2,000 VMs.
  • Storage pool design: HDDs provide most of the capacity; SSDs offer performance. Up to 84 disks per pool.
  • Virtual disk design: Set aside 10% of SSD and HDD capacity for repairs. Start with 2 VDs per node. Max 10 TB per virtual disk. 3-4 columns for balanced performance.

Coming in May

  • Storage Spaces Design Considerations Guide (basis of this presentation)
  • Storage Spaces Design Calculator (spreadsheet used in this presentation)

14 thoughts on “Ignite 2015 – Spaces-Based, Software-Defined Storage–Design and Configuration Best Practices”

  1. Hi Aidan, just firing up our first few Storage spaces solutions… Sandisk = spawn of satan is disturbing as that’s what we went for…. 🙂

    Before we break down and cry… any expansion on that? (performance, reliability, failures, all of it?)

    We’d go for refunds but don’t think it will fly!

    Many thanks…

    1. SanDISK = shit on a stick that you point at your enemies, IMO. Anyone I know that used their crap burned it in a fire soon after. They ship disks with old firmware and don’t share the firmware. The disks are unreliable under load. They grind servers to a halt. And generally, are the crap you find under snake shit that makes the shit in question look tasty. In my opinion.

  2. Hi Aidan,

    trying to correctly define the amount of unallocated space in our 3 enclosure (enclosure aware) storage spaces solution.

    Are you able to clarify what, if any, unallocated space should be left in a storage pool if you are planning to ‘fully provision’ i.e. not allow for automatic recovery from a drive failure.

    If you are configuring your V Disks to recover automatically to unallocated space I have read a Microsoft TechNet article specifying that you should leave 1 disk per tier, per storage enclosure (so 3 x SSD and 3 x HDD… one per tier in each enclosure… plus 8GB per drive).

    What is your experience of configuring for disk failure in this manner?

    1. I am not going to answer that question because not allowing for automatic recovery is pure daft.

      1. Thanks Aidan… maybe I should have been clearer. My first question was referring more to dev-oriented environments.

        However, the second is a production environment.

        1. You simply don’t allocate all space to the virtual disks. If the largest disk is 4 TB in the HDD tier, then leave 4 TB free in that tier. That leaves enough room spare in that tier of that pool for parallelized restore to run.

  3. “Disable all disk based caches – if enabled if degrades performance when write-through is used (Hyper-V).”

    Is this the equivalent of setting the IsPowerProctected option for SoFS storage pools housing Hyper-V data to True?

  4. Does anyone know if the Storage Calculator sheet has already been released, or when it will be released? Thx

  5. Hi Aidan!
    At first, thank you for your blog!
    I have a question about backup VirtualDisk.
    We’re using the standard OS backup feature (wbadmin) for making backups of a virtual disk. Can it somehow affect the Heat Map Tracking if we use it with SSD tiering?

  6. Hi, Aidan!
    Could you, please, tell more about your opinion on Quanta’s JBODs? Why do you think they suck?
    Is it related to hardware issues or bad support from the vendor?

    1. Poor stability with Storage Spaces. I’ve no first hand experience, but I’ve heard enough of it in the community to rule them out.
