In this post I’ll tell you about the cluster-in-a-box solution from DataON Storage that allows you to deploy a Hyper-V cluster for a small-to-medium business or branch office in just 2U, at a lower cost than you’ll pay the likes of Dell/HP/EMC, and with more performance.
So you might have noticed on social media that my employers are distributing storage/compute solutions from both DataON and Gridstore. While some might see them as competitors, I see them as complementary solutions in our portfolio that are aimed at two different markets:
- Gridstore: Their hyper-converged infrastructure (HCI) products remove fear and risk by giving you a pre-packaged solution that is easy and quick to scale out.
- DataON: There are two offerings, in my opinion. SMEs want HA but at a budget they can afford – I’ll focus on that area in this article. And then there are the scaled-out Storage Spaces offerings, that with some engineering and knowledge, allow you to build out a huge storage system at a fraction of the cost of the competition – assuming you buy from distributors that aren’t more focused on selling EMC or NetApp 🙂
There is a myth out there that the cloud has or will remove servers from SMEs. The category “SME” covers a huge variety of companies. Outside of the USA, it’s described as a business with 5-250 users. I know that some in Microsoft USA describe it as a company with up to 2,500 users. So, sure, a business with 5-50 users might go server-less pretty easily today (assuming broadband availability), but other organizations might continue to keep their Hyper-V (more likely in SME) or vSphere (less likely in SME) infrastructures for the foreseeable future.
These businesses have the same demands for applications, and HA is no less important to a 50 user business than it is for a giant corporation; in fact, SMEs are hurt more when systems go down because they probably have a single revenue operation that gets shut down when some system fails.
So why isn’t the Hyper-V (or vSphere) cluster the norm in an SME? It’s simple: cost. It’s one thing to go from one host to two, but throw in the cost of a modest SAS/iSCSI SAN and that solution just became unaffordable – in case you don’t know, the storage companies allegedly make 85% margin on the list price of storage. SMEs just cannot justify the cost of SAN storage.
I was at the first Build conference in LA when Microsoft announced Windows 8 and Windows Server 2012. WS2012 gave us Storage Spaces, and Microsoft implored the hardware vendors to invest in this new technology, mainly because Microsoft saw it as the future of cluster storage. A Storage Spaces-certified JBOD can be used instead of a SAN as shared cluster storage, and this could greatly bring down the cost of Hyper-V storage for customers of all sizes. Tiered storage (SSD and HDD, added in WS2012 R2) combines the speed of SSD with the economy of large hard drives (now up to 10 TB), and its transparent, automatic, demand-based block tiering means that economy doesn’t cost you performance – it actually increases performance!
One of the sessions, presented by Microsoft Clustering Principal PM Lead Elden Christensen, focused on a new type of hardware solution that MSFT wanted to see vendors develop. A Cluster-in-a-Box (CiB) would provide a small storage or Hyper-V cluster in a single pre-packaged and tested enclosure. That enclosure would contain:
- Up to 2 or 4 independent blade servers
- Shared storage in the form of a Storage Spaces “JBOD”
- Built-in cluster networking
- Fault-tolerant power supplies
- The ability to expand via SAS connections (additional JBODs)
I loved this idea; here was a hardware solution that was perfect for a Hyper-V cluster in an SME or a remote office/branch office (ROBO), and the deployment could be really simple – there are few decisions to make about the spec, performance would be awesome via storage tiering, and deployment could be really quick.
DataON CiB-9112 V12
This is the second generation of CiBs that I have worked with from DataON, a company that specialises in building state-of-the-art and Microsoft-certified Storage Spaces hardware. My employers, MicroWarehouse Ltd. (an Irish company that has nothing to do with an identically named UK company), distribute DataON hardware to resellers around Europe – everywhere from Galway in west Ireland to Poland so far.
The CiB concept is simple. There are two blade servers in the 2U enclosure. Each has the following spec:
- Dual Intel® Xeon® E5-2600v3 (Haswell-EP)
- DDR4 Reg. ECC memory up to 512GB
- Dual 1G SFP+ & IPMI management “KVM over IP” port
- Two PCI-e 3.0 x8 expansion slots
- One 12Gb/s SAS x4 HD expansion port
- Two 2.5” 6Gb/s SATA OS drive bays
Networking-wise, there are 4 NICs per blade:
- 2 x LAN-facing Intel 1 GbE NICs, which I team for a virtual switch with management OS sharing and QoS enabled.
- 2 x internal Intel 10 GbE NICs, which I use for cluster communications and SMB 3.0 Live Migration. These NICs are internal copper connections so you do not need an external 10 GbE switch. I do not team these NICs, and they should be on 2 different subnets for cluster compatibility.
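The "2 different subnets" rule for the internal 10 GbE NICs is easy to sanity-check before you build the cluster. Here is a minimal sketch using Python's standard `ipaddress` module; the addresses are made-up examples for illustration, not DataON defaults.

```python
import ipaddress

def on_different_subnets(cidr_a: str, cidr_b: str) -> bool:
    """Return True when two interface addresses (CIDR form) sit in different IP networks."""
    net_a = ipaddress.ip_interface(cidr_a).network
    net_b = ipaddress.ip_interface(cidr_b).network
    return not net_a.overlaps(net_b)

# Hypothetical addressing for the two internal 10 GbE NICs on one blade.
cluster_nic_1 = "172.16.1.1/24"
cluster_nic_2 = "172.16.2.1/24"

print(on_different_subnets(cluster_nic_1, cluster_nic_2))        # True
print(on_different_subnets("172.16.1.1/24", "172.16.1.2/24"))    # False
```

The second call shows the configuration to avoid: both NICs in the same /24.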
You can use the PCI-e expandability to add more SAS or NIC interfaces, as required, e.g. DataON work closely with Mellanox for RDMA networking.
The enclosure also has:
- 12-bay 3.5”/2.5” shared drive slots (with caddies)
- 1023W (1+1) redundant power
Typically, the 12 shared drive bays are used as a single storage pool with 4 x SSDs (performance) and 8 x 7200 RPM HDDs (capacity). Tiering in Storage Spaces works very well. Here’s an anecdote I heard while in a pre-sales meeting with one of our resellers:
They put a CiB (6 Gb SAS, instead of 12 Gb as on the CiB-9112) into a customer site last year. That customer had the need to run a regular batch job that would normally take hours, and they had gotten used to working around that dead time. Things changed when the VMs were moved onto the CiB. The batch job ran so quickly that the customer was sure that it hadn’t run correctly. The reseller double-checked everything, and found that Storage Spaces tiering and the power of the CiB blades had greatly improved the performance of the database in question, and everything was actually fine – great actually!
And here was the kicker – that customer got a 2 node Hyper-V cluster with shared storage in the form of a DataON CiB for less than the cost of a SAN, let alone the cost of the 2 Hyper-V nodes.
How well does this scale? I find that CPU/RAM are rarely the bottlenecks in the SME. There are plenty of cores/logical processors in the E5-2600v3, and 512 GB RAM is more than enough for any SME. Disk is usually the bottleneck. With a modest configuration (not the max) of 4 x 200 GB SSDs and 8 x 4 TB HDDs you’re looking at around 14 TB of usable 2-way mirrored (like RAID 10) storage. Or you could have 4 x 1.6 TB SSDs and 8 x 8 TB HDDs and have around 32 TB of usable 2-way mirrored storage. That’s plenty!
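As a rough check on those capacity figures, here is a minimal sketch of the arithmetic. It assumes that drive sizes are quoted in decimal terabytes, that Windows reports binary units (TiB) under the "TB" label, and that a 2-way mirror halves the raw capacity. It ignores pool metadata, the witness disk, and any free space held back in the pool, so real-world numbers will come in a little lower.

```python
TIB = 2**40  # bytes in one tebibyte, the unit Windows labels "TB"

def usable_mirror_tib(ssd_count, ssd_tb, hdd_count, hdd_tb, data_copies=2):
    """Estimate usable capacity of a mirrored pool, in TiB.

    Drive sizes are vendor (decimal) terabytes. Pool overheads are ignored.
    """
    raw_bytes = (ssd_count * ssd_tb + hdd_count * hdd_tb) * 10**12
    return raw_bytes / data_copies / TIB

modest = usable_mirror_tib(4, 0.2, 8, 4)   # 4 x 200 GB SSD + 8 x 4 TB HDD
large = usable_mirror_tib(4, 1.6, 8, 8)    # 4 x 1.6 TB SSD + 8 x 8 TB HDD
print(f"{modest:.1f} TiB, {large:.1f} TiB")  # 14.9 TiB, 32.0 TiB
```

Both results land close to the "around 14 TB" and "around 32 TB" quoted above.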
And if that’s not enough, then you can expand the CiB using additional JBODs.
My Hands-On Experience
Lots of hardware goes through our warehouse that I never get to play with. But on occasion, a reseller will ask for my assistance. A couple of weeks ago, I got to do my first deployment of the 12 Gb SAS CiB-9112. We got it out of the box, and immediately I was impressed. This design indicates that engineers had designed the hardware for admins to manage. It really is a very clever and modular design.
The two side-bezels on the front of the 2U enclosure have a power switch and USB port for each blade server.
On the top, you can easily access the replaceable fans via a dedicated hinged panel. At the back, both fault-tolerant power supplies are in the middle, away from the clutter at the side of a rack. The blades can be removed separately from their SAS controllers. And each of the RAID1 disks for the blades’ OS (the management OS for a Hyper-V cluster) can be replaced without removing the blade.
Racking a CiB is a simple task – the entire Hyper-V cluster is a single 2U enclosure so there are no SAN controllers, SAN switches, SAN cables, or multiple servers. You slide a single 2U enclosure into its rail kit, plug in power, networking, and KVM, and you’re done.
Windows Server is pre-installed and you just need to modify the installation type (from eval) and enter your product key using DISM. Then you prep the cluster – DataON pre-installs MPIO, Hyper-V, and Failover Clustering to make your life easy.
My design is simple:
- The 1 GbE NICs are teamed, connected to a weight-based QoS Hyper-V switch, and shared with the parent. A weight of 50 is assigned to the default bucket QoS rule, and 50 is assigned to the management OS virtual NIC.
- The 10 GbE NICs are on 2 different subnets.
- I enable SMB 3.0 Live Migration on both nodes in Hyper-V Manager.
- MPIO is configured with the Least Blocks (LB) policy.
- I ensure that VMQ is disabled on the 1 GbE NICs and enabled on the 10 GbE NICs.
- I form the cluster with no disks, and configure the 10 GbE NICs for Live Migration.
- A single clustered storage pool is created in Failover Cluster Manager.
- A 1 GB (it’s always bigger) 2-way mirrored virtual disk is created and configured as the witness disk in the cluster.
- I create 2 virtual disks to be used as CSVs in the cluster, with 64 KB interleaves and formatted with 64 KB allocation unit size. The CSVs are tiered with some SSD and some HDD … I always leave free space in the pool to allow expandability of one CSV over the other. HA VMs are balanced between the 2 CSVs.
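To see what the 50/50 QoS weights in the design above mean in practice, here is a minimal sketch of how relative minimum-bandwidth weights translate into guaranteed shares when the link is congested. It assumes a bucket's share is its weight divided by the sum of all weights, and that both 1 GbE team members are available (2000 Mbps in total). The function and bucket names are mine, for illustration only.

```python
def guaranteed_mbps(weights: dict, link_mbps: float) -> dict:
    """Convert relative minimum-bandwidth weights into guaranteed Mbps under contention."""
    total = sum(weights.values())
    return {name: link_mbps * w / total for name, w in weights.items()}

# Weights from the design above; 2000 Mbps assumes both 1 GbE team members are usable.
weights = {"default bucket (VM traffic)": 50, "management OS vNIC": 50}
for name, mbps in guaranteed_mbps(weights, link_mbps=2000).items():
    print(f"{name}: {mbps:.0f} Mbps minimum")
```

These are minimums, not caps: when the link is not congested, either side can use whatever bandwidth is free.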
What about DCs? If the customer is keeping external DCs then everything is done. If they want DCs running on the CiB then I always deploy them as non-HA DCs that are stored on the C: of each CiB blade. I know that since WS2012, we are supposed to be able to run DCs as HA VMs on the cluster, but I’ve experienced issues with that.
With some PowerShell, the above process is very quick, and to be honest, the slowest bit is always the logistics of racking the CiB. I’m usually done in the early afternoon, and that includes some show’n’tell.
If you want a tidy, quick & easy to deploy, and affordable HA solution for an SME or ROBO then the DataOn CiB-9112 V12 is an awesome option. If I was doing our IT from scratch, this is what I would use (we had existing servers and added a DataON JBOD, and recently replaced the servers while retaining the JBOD). I love how tidy the solution is, and how simple it is to set up, especially with some fairly basic PowerShell. So check it out, and see what it can do for you.
25 thoughts on “DataOn CiB-9112 V12 Cluster-in-a-Box”
Sounds pretty much the same as the offering from Fujitsu (CX420-M), which has been available for quite some time. Did a rollout with those boxes across 11 countries in Europe and all are running flawlessly. However I only create 1 big CSV and leave the second node more or less on standby. We are only running 3 VMs per country and do not need to balance. One tip I have is to disable the VMICTimeProvider (Enabled=0) in the w32time service on virtualized domain controllers. Otherwise the host and VM will synchronize with each other and that might cause the time to drift. However, the VMICTime will still sync time with this setting exactly once, when the machine is resumed from a paused state, which is exactly what you want. Had a real hard time finding this info on the internet, but some guy mentioned it.
DataON have been in the space a much longer time than Fujitsu. The CiB I’ve discussed here is a newer version of a long-running line. And it’s way higher spec than what Fujitsu are offering, and at a much lower cost.
The suggestion you’ve made about DCs and the time integration service is a normal recommendation, no matter what kind of Hyper-V deployment you do.
Do you have to use CSVs in the cluster or can you use SMB shares? Trying to wrap my head around this.
No SMB shares. The CSVs are data volumes, nothing more. You’re way over thinking this. Imagine that the “JBOD” replaces a SAN in a 2-node cluster. That’s it. Simple.
I see that you enable SMB 3.0 for live migration. In the scenario where this is used for a Hyper-V cluster (*not* a scale-out file server), do you create an SMB share on the cluster shared volumes? For some reason I thought SMB 3.0 was only used for Scale-Out File Servers.
No, there is no file share. SMB 3.0 is a data movement protocol.
Thanks for the article. A great read.
There is one caveat though, with tiering and storage spaces. You said that the CiB had 4 SSDs and 8 HDDs in a 2-way mirror config. This means that you are limited to only 2 columns (stripes) as both tiers in SS have to have the same number of columns. Basically, when you run out of SSD space you will hit a huge performance penalty of running the VMs on a RAID10 of 4 HDDs.
You can, however, have different tier configurations so you could set the SSD tier as Simple (no parity) and gain 4 columns, but because it is a tier and not a cache if it goes down, you lose data.
One should always be sure to balance the number of SSDs in the JBOD and also look at the size of them.
Incorrect – there is a rebalancing of data every night at 1am. Hot blocks are always moved to the SSD, and cold blocks are moved to the HDD tier. So you never “run out” of space.
Great solution. Do you know if the support levels have been increased (specifically in the UK)? Most businesses require/want more than a next business day warranty. And the Data-On support is only in the US so time zone differences are a potential issue. Also, how do you monitor and get alerts from the system for things like drive failures?
I run these products for clients based in New Zealand. I find DataON’s support and shipping very fast – I have the replacement drive inside a few days. You could always keep a hot spare lying around, and just use their advance replacement to restock it if you like. All the rest of the components are pretty redundant, which gave me the confidence to supply them here.
When talking about affordability, it would be interesting to know what the ballpark starting price is…any clues?
Do you have any idea how much of a performance penalty there is for running Storage Spaces on the Hyper-V host?
Jamie – DataOn is a small company and I recently ran into an issue where their support model would not work for my organization. That being said, they will work with your vendor and train them on break/fix procedures should you need ‘boots on the ground’ style support. However, you will find their hardware is completely modular, so any hardware component that could fail is easily replaceable very quickly and every piece of hardware is redundant.
Steven – Call up DataOn and ask for a quote – they’re very friendly and easy to deal with.
The biggest difference as of right now is DataOn is one of two, maybe three, vendors on the HCL for storage spaces that are using 12Gb SAS. Not even Dell has gotten that far, as their MD3060e enclosures are still 6Gb SAS. The CiB 9XXX is an amazing piece of hardware and it is VERY capable of handling virtual SQL loads.
Looked at these back in 2014, and they were darn good bang for your buck, but I was wary of the support on them if using for production in a small environment where you didn’t have multiple box redundancy, especially outside of the US. I’m sure they’ve beefed up their global support offering since then though.
For a ball park figure for Steven, the older CiB-9220 chassis containing 2 x nodes, each with 2x E5-2600 CPU’s, 128GB RAM and 2x SSD 128GB boot drives, was around US$16k.
Then add your choice of storage configuration:
360GB of usable MLC SSD, 4.3TB of usable 10k rpm SAS, about US$6.5k.
360GB of usable MLC SSD, 14.4TB of usable 7.2k rpm SAS Nearline, about $6k.
I’m sure prices have changed a bit since then, especially the SSD pricing. Found the DataOn guys in LA were easy to get a quote out of without any sales pressure, as were a reseller based in Germany that we contacted.
Is there a uk distributor for data on?
Go to mwh.ie in Dublin.
I get quite strange results in a test with tiered Storage Spaces in Windows Server 2012 R2. I’m using a simple tiered configuration with one 200 GB SSD and two 900 GB HDDs. Of the 200 GB SSD, 180 GB is used in the storage space.
I run Microsoft Diskspd to test the performance. I use 16 KB blocks with random access.
I’m running a PowerShell script to let Diskspd create and test five 50 GB test files. The intention is to use more than the 180 GB of SSD in the storage space and see what performance I get.
To begin with, my script creates all the test files in sequence. I call the test files A, B, C, D and E below. The performance is as expected: files A, B and C are fully serviced by the SSD, file D is serviced by both SSD and HDD, and file E is serviced by the HDD. The results are shown below.
A: 427 Mb/S
B: 427 Mb/S
C: 427 Mb/S
D: 41 Mb/S
E: 15 Mb/S
I then use the Microsoft Task Scheduler to run the PowerShell-based Diskspd test daily on files A and B and hourly on files C, D and E. I run the Storage Tier defrag script every 6 hours. The intention with running these scripts is to force the storage tiers to move blocks around so that files C, D and E are serviced by the SSD, since they are tested and accessed much more frequently than files A and B. The expected result is around 427 Mb/S for files C, D and E. For files A and B, which are only tested daily and not hourly, I expect between 15 and 41 Mb/S. I have run the test for almost 48 hours and I don’t get the expected results. They are much lower, as shown below:
A: 8 – 118 Mb/S
B: 6 – 119 Mb/S
C: 6 – 117 Mb/S
D: 4 – 7 Mb/S
E: 4 – 7 Mb/S
Several of these test results are even lower than running on just HDD. Is storage tiering really as bad as these results suggest, or am I missing something?
Yup, you’re missing some SSDs. Realistically, you’re going to need 4+ SSDs. And if you’re testing, make sure you’ve aligned the interleaves and file system allocation unit size to match the data.
Are these cluster-in-a-box solutions going to be available for Windows Server 2016? Previously Windows Server would have a common feature set but differ only in licensing, but in Server 2016 Storage Spaces appears to have become Storage Spaces Direct and is only available in the Datacenter edition, which is [£$]++ over the Standard edition.
S2D and Storage Spaces are 2 different things. If you want S2D, then CiB is not for you. If you want CiB, then there are WS2016 certified solutions, e.g. the DataOn 9112.
Do you typically recommend 64KB interleave for virtual disks? Also, if I only had 4 HDD in a pool, I would need to use 1 column and keep the pool free space of at least a drive+8GB to handle the retirement of missing disk, correct?
Get-StoragePool VHD_Pool | Set-StoragePool -RetireMissingPhysicalDisks Always
Get-StoragePool VHD_Pool | Set-ResiliencySetting -Name Mirror -NumberOfDataCopiesDefault 2 -NumberOfColumnsDefault 1 -InterleaveDefault 64KB
With 4 HDDs and 2 way mirroring you’ll have 2 columns. Use 64KB Interleave for Hyper-V, with 64KB allocation unit size for NTFS.
Thanks for the info on 64KB.
I was thinking I had to reduce to 1 column, since I wanted the Always retire disks enabled. If a drive failed, 3 remaining wouldn’t allow for 2 columns in a 2-way mirror.
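The column arithmetic in this exchange can be sketched in a few lines. This assumes the rule of thumb used in the comments above: a mirrored space needs one disk per data copy for every column, and both tiers of a tiered virtual disk share the same column count. The function names are mine, for illustration; this is not how Storage Spaces itself picks a default.

```python
def max_mirror_columns(disk_count: int, data_copies: int = 2) -> int:
    """Upper bound on columns for a mirror space: each column needs one disk per data copy."""
    return disk_count // data_copies

def tiered_columns(ssd_count: int, hdd_count: int, data_copies: int = 2) -> int:
    """Both tiers of one virtual disk share a column count, so the smaller tier sets the limit."""
    return min(max_mirror_columns(ssd_count, data_copies),
               max_mirror_columns(hdd_count, data_copies))

print(max_mirror_columns(4))   # 2: four HDDs, 2-way mirror
print(max_mirror_columns(3))   # 1: what is left after one of four disks fails
print(tiered_columns(4, 8))    # 2: the 4 SSD + 8 HDD pool discussed earlier
```

The middle case is the concern raised here: with only 3 healthy disks there is no room to rebuild a 2-column, 2-way mirror.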
I see you configure MPIO with the LB policy, but would the tier not perform better with the SSDs set to Round Robin (leaving the HDDs as LB)?