The Quality of Service (QoS) feature has been improved in Windows Server 2012, and it is a major ingredient in converged fabrics. QoS rules can be used to:
- Guarantee a minimum amount of bandwidth that can burst higher when there is free capacity (flexible)
- Limit bandwidth to a maximum (inflexible)
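Here is a toy Python model that makes the difference between the two rule types concrete. It is my own simplification, not how Windows actually schedules packets. Each flow has a demand, a minimum guarantee, and an optional maximum. Guarantees are honoured first, and then idle capacity is handed out to flows that still want more, unless a hard limit stops them. The flow names and numbers are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

# (demand, minimum guarantee, optional maximum limit), all in Gbps
Flow = Tuple[float, float, Optional[float]]


def allocate(link_gbps: float, flows: Dict[str, Flow]) -> Dict[str, float]:
    """Toy model of minimum (burstable) vs maximum (hard) bandwidth rules."""

    def ceiling(name: str) -> float:
        demand, _, maximum = flows[name]
        # A flow never takes more than it asks for, nor more than its hard limit.
        return min(demand, maximum if maximum is not None else math.inf)

    # Step 1: every flow first receives its guarantee (or less, if it needs less).
    alloc = {name: float(min(ceiling(name), minimum))
             for name, (_, minimum, _) in flows.items()}

    # Step 2: idle capacity is shared out in equal slices to flows that still
    # want more (the "burstable" part). A hard limit stops a flow here.
    spare = link_gbps - sum(alloc.values())
    while spare > 1e-9:
        hungry = [n for n in flows if alloc[n] < ceiling(n)]
        if not hungry:
            break  # nobody can use the spare capacity, so it stays idle
        share = spare / len(hungry)
        for n in hungry:
            alloc[n] += min(share, ceiling(n) - alloc[n])
        spare = link_gbps - sum(alloc.values())
    return alloc


# Minimum rule: backup is guaranteed 2 Gbps but bursts while the VMs only need 3.
print(allocate(10, {"backup": (8, 2, None), "vm": (3, 5, None)}))
# {'backup': 7.0, 'vm': 3.0}

# Maximum rule: backup is capped at 2 Gbps even though 5 Gbps sits idle.
print(allocate(10, {"backup": (8, 0, 2), "vm": (3, 5, None)}))
# {'backup': 2.0, 'vm': 3.0}
```

The second result shows why a limit is described as inflexible: the capped flow cannot use bandwidth that nobody else wants.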
There are 3 ways to create QoS rules. Which one you use depends on the type of traffic you are dealing with. Each method changes where the rule is applied and what it applies to:
1) The Virtual Switch
This is the simplest of the options, in my opinion. You can create rules to be applied by a QoS-enabled virtual switch. Use this approach when the traffic you are dealing with passes through a virtual switch (highlighted with red “1” above). Each rule is applied to a virtual NIC, either in a virtual machine or in the Management OS. You can also define a default bucket rule. This pool rule applies to any virtual NIC that does not have an explicit rule, and those virtual NICs share the pool’s bandwidth. All of my previous PowerShell (PoSH) examples on my blog have focused on this approach.
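Here is a minimal sketch of how the default bucket works, under some assumptions. The weights stand in for the values you would set with `Set-VMNetworkAdapter -MinimumBandwidthWeight` and `Set-VMSwitch -DefaultFlowMinimumBandwidthWeight`. The guaranteed minimum is modelled as each weight's fraction of the total. The vNIC names are hypothetical. The sketch does not model how the switch divides the pool among the vNICs inside it.

```python
from typing import Dict, List, Tuple


def default_bucket(
    link_gbps: float,
    explicit: Dict[str, int],   # vNIC name -> its own minimum bandwidth weight
    default_flow_weight: int,   # switch-wide weight for the default bucket
    vnics: List[str],           # every vNIC connected to the virtual switch
) -> Tuple[Dict[str, float], List[str]]:
    """Toy model: guaranteed minimum (Gbps) per rule, plus the bucket's members."""
    # Any vNIC without an explicit rule falls into the default bucket.
    members = [v for v in vnics if v not in explicit]
    total = sum(explicit.values()) + default_flow_weight
    # Each explicit rule gets its own slice of the link...
    minimums = {v: link_gbps * w / total for v, w in explicit.items()}
    # ...while all bucket members share one slice between them.
    minimums["default bucket"] = link_gbps * default_flow_weight / total
    return minimums, members


mins, bucket = default_bucket(
    10,
    {"Management": 10, "LiveMigration": 30, "Cluster": 10},
    50,
    ["Management", "LiveMigration", "Cluster", "VM01", "VM02", "VM03"],
)
print(mins)    # {'Management': 1.0, 'LiveMigration': 3.0, 'Cluster': 1.0, 'default bucket': 5.0}
print(bucket)  # ['VM01', 'VM02', 'VM03']
```

New VMs need no rule of their own. They simply land in the bucket, which is why this approach copes well with VMs that move between hosts.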
2) OS Packet Scheduler
Maybe you have physical NICs that will be shared between roles (e.g. SMB 3.0, cluster communications, and Live Migration). These NICs have nothing to do with virtual NICs, so you create per-protocol rules rather than virtual NIC rules. The OS Packet Scheduler applies the rules to network traffic as it passes through the networking stack.
Note that you can also use this approach in the guest OS of a VM.
3) Datacenter Bridging (DCB)
DCB is a hardware function that lets you classify and apply QoS to networking protocols in hardware instead of software. Some protocols are invisible to Windows, such as Remote Direct Memory Access (RDMA), which is used for SMB Direct. You must use DCB to apply QoS to RDMA. In fact, DCB offers the best performance of the three options. Two notes:
- You must have DCB-enabled hardware end to end between the SMB client and the SMB server.
- RoCE networking requires that you enable Priority Flow Control (PFC) for RDMA.
It’s one thing to apply QoS at the NIC or server level, but the network admins might want to control traffic across the LAN. You can classify and tag traffic so that the switches and routers can also apply QoS rules. Classification can identify traffic by protocol, port, destination IP address, and so on. Network admins can then act on the resulting tags, e.g. to stop the combined backup traffic from screwing up the host networking.
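The classify-then-tag idea can be sketched in Python. This models the concept only. The `Rule` type, the first-match-wins ordering, the example subnet, and the DSCP values are all hypothetical, and the sketch is not how the Windows QoS policy engine resolves overlapping rules. The one fixed fact it relies on is that the 6-bit DSCP value sits in the upper bits of the IPv4 ToS (IPv6 Traffic Class) byte, which is the field switches and routers read.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    """Hypothetical classification rule: any field left as None matches anything."""
    name: str
    dscp: int                          # DSCP tag to stamp on matching traffic
    protocol: Optional[str] = None     # "tcp" or "udp"
    dst_port: Optional[int] = None
    dst_network: Optional[str] = None  # CIDR, e.g. "10.0.50.0/24"

    def matches(self, protocol: str, dst_ip: str, dst_port: int) -> bool:
        if self.protocol is not None and self.protocol != protocol:
            return False
        if self.dst_port is not None and self.dst_port != dst_port:
            return False
        if self.dst_network is not None:
            if ipaddress.ip_address(dst_ip) not in ipaddress.ip_network(self.dst_network):
                return False
        return True


def classify(rules: List[Rule], protocol: str, dst_ip: str, dst_port: int) -> int:
    """Return the DSCP value of the first matching rule, or 0 (best effort)."""
    for rule in rules:
        if rule.matches(protocol, dst_ip, dst_port):
            return rule.dscp
    return 0


def tos_byte(dscp: int) -> int:
    """DSCP occupies the upper 6 bits of the IPv4 ToS / IPv6 Traffic Class byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    return dscp << 2


rules = [
    Rule("Backup", dscp=8, dst_network="10.0.50.0/24"),  # traffic to a backup subnet
    Rule("SMB", dscp=26, protocol="tcp", dst_port=445),
]
print(classify(rules, "tcp", "10.0.50.7", 445))  # 8: the backup rule matches first
print(classify(rules, "tcp", "10.0.1.20", 445))  # 26
print(classify(rules, "udp", "10.0.1.20", 53))   # 0
print(tos_byte(26))                              # 104
```

Once backup traffic carries its own tag, the network admins can police that class as a whole on the switches, whichever host it came from.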
QoS, combined with NIC teaming, virtual NICs, VLANs, RSS, DVMQ, etc., makes converged fabrics possible. There’s no one right design, but there are wrong ones, e.g. trying to use RSS and DVMQ on the same physical NICs.
Microsoft has published some common configurations on TechNet, along with a bunch of PowerShell. Here are some thoughts:
2 NICs without NIC Teaming
I’d expect this to be an uncommon configuration, but I could see it being used in very large hosting environments with 1+1 host fault tolerance. In this example you can see:
- Method 2 or 3 being used for the Management OS
- Method 1 being used for the virtual NICs
- Traffic in the guest OS being tagged using DSCP
2 NICs with NIC Teaming
This will be a very common configuration, useful for clusters using SAS or Fibre Channel storage. My previous blog posts have focused on this example. In this example you can see:
- Method 1 being used on the virtual NICs
- The optional note confuses things. There, the author has classified and tagged the protocols of the Management OS and the VMs’ guest OSs so that the physical network can also apply QoS rules at the wider LAN level. The final code snippet shows how to pass this IeeePriorityTag on to the LAN.
A twist on this example is to add 2 physical NICs for non-converged iSCSI so that you can have dedicated iSCSI switches (for manufacturer support).
4 NICs in two NIC teams
This example is basically the “2 NICs without NIC Teaming” example with NIC teaming added. I can see this one being used where SMB 3.0 storage is required but RDMA/SMB Direct is not being used. NIC teaming can’t see RDMA!
The two physical NICs on the left are dedicated to the Management OS. The QoS rules classify and tag protocols and apply QoS using either:
- DCB (more expensive but better performing)
- The OS Packet Scheduler
The NIC team and virtual switch are dedicated to virtual machines.
4 NICs with a standard NIC team and two RDMA NICs
This is a variation on the last example. Two RDMA NICs are used for SMB Direct, so NIC teaming must be dropped for these NICs in the Management OS because teaming can’t see RDMA. DCB must now be used. Rules are created to classify and tag protocols, and DCB is used to apply QoS. PFC must be enabled for RDMA if you are using RoCE NICs.
- The two Management OS NICs are not teamed, and therefore each has its own IP address.
- If you are using Scale-Out File Server (SOFS, a clustered file server for application data), then these two NICs should be on different subnets that match the subnets of the NICs on the clustered file server nodes. This is a requirement of SMB Multichannel when using clustered file shares, such as those in SOFS.
- Do not enable MPIO as the diagram suggests. You get MPIO-like behaviour from SMB Multichannel.
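The SOFS subnet requirement is easy to get wrong, so here is a small Python check you could adapt. The addresses are hypothetical. The check only covers the rule described above: each unteamed SMB NIC sits on its own subnet, and each of those subnets matches one used by the file server nodes. It is not a full SMB Multichannel validator.

```python
import ipaddress
from typing import List


def check_smb_nics(host_nics: List[str], sofs_nics: List[str]) -> List[str]:
    """Return the problems found; an empty list means the layout passes this check.

    Each entry is an interface address in CIDR form, e.g. "10.0.10.21/24".
    """
    problems: List[str] = []
    host_nets = [ipaddress.ip_interface(n).network for n in host_nics]
    sofs_nets = {ipaddress.ip_interface(n).network for n in sofs_nics}
    # The unteamed SMB NICs on the host must each be on a different subnet...
    if len(set(host_nets)) != len(host_nets):
        problems.append("two or more host SMB NICs share a subnet")
    # ...and each of those subnets must match a NIC on the file server nodes.
    for nic, net in zip(host_nics, host_nets):
        if net not in sofs_nets:
            problems.append(f"{nic} is not on any SOFS node subnet")
    return problems


sofs = ["10.0.10.31/24", "10.0.20.31/24", "10.0.10.32/24", "10.0.20.32/24"]
print(check_smb_nics(["10.0.10.21/24", "10.0.20.21/24"], sofs))  # []
print(check_smb_nics(["10.0.10.21/24", "10.0.10.22/24"], sofs))
# ['two or more host SMB NICs share a subnet']
```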
Alternate configuration of 4 NICs with a standard NIC team and two RDMA NICs
This example uses the flexibility of virtual NICs for the Management OS, while dedicating 100% of the bandwidth of the RDMA NICs to SMB 3.0. See the notes on the previous example for the 2 SMB 3.0 NICs (IP addresses, QoS, PFC, and MPIO).
For the Management OS vNICs (Management, Live Migration, Clustering), you’ll use per-virtual NIC QoS rules that are applied by the virtual switch.
Microsoft also published some best practices for QoS.
- When using weight-based rules (my preference), keep the total of the weights near or under 100. I hate using QoS on individual VMs because they move. I prefer to lump them into the virtual switch’s default bucket and reserve specific rules for the Management OS virtual NICs. I keep the weights (vNICs and bucket) totalling 100, so they act as percentages rather than relative weights. Otherwise you might assign 4 vNICs a weight of 20 each and be confused when a query shows each one getting 25% (20 is 1/4 of the 80 total).
- I like the note on cluster communications/heartbeat getting a weight of more than 1 (e.g. 5) on a 10 GbE NIC.
- Don’t use DCB and Minimum Bandwidth rules on the same NICs.
- Pay attention to the notes at the end on QoS and NIC teaming.
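The arithmetic behind the weight notes is shown below. It assumes a 10 GbE NIC and treats a weight's guarantee as its fraction of the total of all weights on that switch:

```latex
\begin{aligned}
\text{share}_i &= \frac{w_i}{\sum_j w_j} \\[4pt]
\text{4 vNICs},\ w_i = 20:\quad \text{share}_i &= \frac{20}{4 \times 20} = \frac{20}{80} = 25\% \\[4pt]
\text{heartbeat},\ w = 5,\ \textstyle\sum_j w_j = 100:\quad \text{minimum} &= \frac{5}{100} \times 10\ \text{Gbps} = 0.5\ \text{Gbps} \\[4pt]
\text{heartbeat},\ w = 1,\ \textstyle\sum_j w_j = 100:\quad \text{minimum} &= \frac{1}{100} \times 10\ \text{Gbps} = 0.1\ \text{Gbps}
\end{aligned}
```

Keeping the total at 100 makes each weight read directly as a percentage of the link.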
If you have reached here without reading the referenced Microsoft pages, then don’t be an idiot: go back and read them. I’ve only covered/explained a tiny fraction of the shared information.