Last year I did a series of posts on converged fabrics. At the time, it was still early days and we had very little information from Microsoft in the public domain. A key piece of the puzzle is the NIC team. It’s clear that NIC teaming is confusing people. I’m seeing loads of questions about bandwidth not being used, concerns about architectures and so on. I thought I’d write a new series of posts on NIC teaming, dealing with one chunk of information in each post.
If you want, you can read the official documentation on NIC teaming from Microsoft. This paper was published during the beta but it’s still valid. It is heavy reading, but it’s pretty complete.
OK, let’s get started:
NIC Teaming in Windows Server 2012
Microsoft has never supported 3rd party NIC teaming, such as the sort you get from HP, Dell, Intel or Broadcom:
Since Network Adapter Teaming is only provided by Hardware Vendors, Microsoft does not provide any support for this technology thru Microsoft Product Support Services. As a result, Microsoft may ask that you temporarily disable or remove Network Adapter Teaming software when troubleshooting issues where the teaming software is suspect.
If the problem is resolved by the removal of Network Adapter Teaming software, then further assistance must be obtained thru the Hardware Vendor.
From my perspective, the 3rd party NIC teaming solutions ripped out the guts of Microsoft networking, threw in a few parts of their own, shoved it all back in, and hoped that this Franken-networking would stay running. As a result, lots of the problems I heard about were caused by NIC teaming. I even heard of a case where a badly configured NIC team (not configured according to the vendor's guidance) caused a network security issue that was not otherwise possible with Hyper-V.
We Hyper-V users begged Microsoft to write their own NIC teaming for Windows Server. Windows Server 2012 delivered, giving us NIC teaming that is built into the OS and fully supported, including with Hyper-V and Failover Clustering. In fact, we can use it to create some very nice network designs that abstract the fabrics of the data centre (converged fabrics), resulting in simpler, cheaper, and more fault tolerant networking.
Load Balancing and Failover (LBFO)
NIC teaming has 2 basic reasons to exist:
Reason 1: Load Balancing
Load balancing will spread the total traffic of a server or host across a number of NICs. This is an aggregation of bandwidth. You can see a crude example of this in the below diagram, where VM1’s traffic passes through pNIC1 and VM2’s traffic passes through pNIC2.
Bandwidth aggregation is one of the most commonly misunderstood aspects of NIC teaming. Teaming four 1 GbE NICs does not necessarily give you a single 4 GbE pipe. It gives you four 1 GbE NICs that the NIC team can load balance traffic across. The design of the NIC team will dictate how the load balancing is done. In the above example, the virtual network adapter in VM1 is constrained to pNIC1 and the virtual network adapter in VM2 is limited to pNIC2.
Note: You can see that the virtual NICs are connected to a virtual switch as usual. They have no visibility of the NIC team underneath.
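To make this concrete, here is a minimal sketch of how you might build this design in PowerShell, using the NetLbfo and Hyper-V cmdlets that ship with Windows Server 2012. The adapter names ("NIC1", "NIC2") and the team and switch names are placeholders; substitute your own. Hyper-V Port load balancing pins each virtual NIC's traffic to a single team member, which is exactly the VM1/pNIC1, VM2/pNIC2 behaviour described above.

```powershell
# Create a switch-independent team from two physical NICs.
# HyperVPort load balancing assigns each virtual NIC to one team member.
New-NetLbfoTeam -Name "ConvergedTeam" -TeamMembers "NIC1","NIC2" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort

# Bind a Hyper-V virtual switch to the team interface. The virtual NICs
# connect to this switch and never see the team underneath.
New-VMSwitch -Name "ConvergedSwitch" -NetAdapterName "ConvergedTeam" `
    -AllowManagementOS $false
```

Note that the team interface takes the team's name by default, which is why New-VMSwitch can bind to "ConvergedTeam" directly.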
Reason 2: Failover
In my experience, the most troublesome part of a computer room or data centre is the network. Switches, no matter how expensive they are, fail, and they tend to fail at the most inappropriate times. In this case, we can design the NIC team with path fault tolerance. A basic example below shows how each NIC in the NIC team is connected to a different access switch. If one access switch fails, then the other NIC(s) in the team pick up the load. This failover happens automatically. The team will also automatically rebalance the workloads when the failed network path comes back online. This solution works in any scenario where the team member (pNIC1) detects a connection failure, i.e. pNIC1 goes offline.
A computer room or data centre that is designed for fault tolerance will always deploy NICs in pairs, access switches in pairs, core switches in pairs, load balancers in pairs, firewalls in pairs, and so on. Of course, "pairs" might be swapped for "teams" in huge environments. And those switches could be standalone or stacked. That's a question for your network admins or architects.
A fairly new concept in the data centre is fault tolerance that is built into the application or service running in the servers or VMs. In this case, the hosts are allowed to fail, because the application always has fault tolerant copies elsewhere. This allows you to dispense with NIC teaming, switch teaming/stacking, and all the additional costs of putting in two of everything instead of one.
There will be more posts on NIC teaming over the coming weeks.
This information has been brought to you by the Windows Server 2012 Hyper-V Installation and Configuration Guide (available for pre-order on Amazon), where you'll find lots of PowerShell like in this script: