Today I got “generation 2” of the lab functioning the way I want it to. The hosts are two Dell R420 12th generation servers, each with 2 * 6 core CPUs (24 logical processors per host), 64 GB RAM, and an extra Chelsio T440 CR quad port iWARP SFP+ NIC (for RDMA/SMB Direct). The HP DL360 G7s are now the nodes in my Scale-Out File Server.
Two of the iWARP NICs are used for the vSwitch NIC team. The other two are not teamed (teaming prevents RDMA) and are on different subnets to support Multichannel to the SOFS.
I have a script that tests the migration of a VM using the different WS2012 R2 options and times the movements. I just compared TCP/IP Live Migration (over 1 * 10 GbE, with some CPU impact) with SMB Live Migration, which used 2 * 10 GbE. This was done with a single VM with 56 GB of statically assigned RAM. The results are in:
- TCP Live Migration: using around 9.8 Gbps took 58 seconds (which is excellent)
- SMB Live Migration: using nearly all of the available 20 Gbps took 35 seconds
Think about that … a Linux (did I mention that?) VM with 56 GB RAM moved between two hosts in 35 seconds … with no noticeable CPU impact on the hosts caused by Live Migration!
I actually moved 50 VMs concurrently yesterday and there was no noticeable CPU impact!
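For a rough sense of what those timings mean, here is a small back-of-the-envelope sketch in Python. It is not part of my test script; it just divides the VM's RAM by the elapsed time. It assumes 1 GB = 10^9 bytes and that memory was copied exactly once, so it ignores migration setup, re-sent dirty pages and protocol overhead. That is why the averages come out lower than the peak link utilization quoted above.

```python
def effective_gbps(ram_gb: float, seconds: float) -> float:
    """Average transfer rate in Gbps over the whole migration.

    Assumption: 1 GB = 10**9 bytes and each byte of RAM is sent once.
    """
    return ram_gb * 8 / seconds


# Figures from the results above: 56 GB VM, 58 s over TCP, 35 s over SMB.
tcp = effective_gbps(56, 58)
smb = effective_gbps(56, 35)
speedup = 58 / 35

print(f"TCP: {tcp:.1f} Gbps, SMB: {smb:.1f} Gbps, speedup: {speedup:.2f}x")
# prints: TCP: 7.7 Gbps, SMB: 12.8 Gbps, speedup: 1.66x
```

So on this simplified view, SMB Live Migration finished about 1.66 times faster than TCP/IP for the same VM.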
There was a little engineering required:
- Jumbo Frames were configured on the NICs and (thanks to Didier Van Hoye, aka @workinghardinit) I verified them end-to-end using ping <IP> -l 8400 -f. This gave me 10 Gbps on a single NIC.
- The final piece was to update the driver … the out of box driver refused to use more than 5 Gbps on each NIC via SMB Multichannel, and sat at around 2.4 Gbps most of the time. With the updated driver I had 20 Gbps.
- I verified via PerfMon that RDMA was kicking in almost immediately. Multichannel was kicking in almost immediately too.
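To show why the ping test in the first bullet proves anything, here is a minimal Python sketch of the arithmetic behind it. On Windows, -l sets the ICMP payload size and -f sets the Don't Fragment flag, so the ping only succeeds if payload plus headers fits in the smallest MTU on the path. The 9000 byte jumbo MTU below is an assumption for illustration (I did not state the value configured on my NICs), and the helper names are made up.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def fits_without_fragmenting(payload: int, mtu: int) -> bool:
    """True if an ICMP echo with this payload fits in one packet of size mtu."""
    return payload + IPV4_HEADER + ICMP_HEADER <= mtu


def max_ping_payload(mtu: int) -> int:
    """Largest value for ping -l that still passes with -f at this MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


# An 8400 byte payload cannot pass a standard 1500 byte MTU hop ...
print(fits_without_fragmenting(8400, 1500))  # False
# ... but fits if every hop uses an (assumed) 9000 byte jumbo MTU.
print(fits_without_fragmenting(8400, 9000))  # True
print(max_ping_payload(9000))                # 8972
```

In other words, if any hop between the two hosts were still at the default 1500 bytes, the 8400 byte ping with Don't Fragment set would fail rather than silently fragment.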