I recently had a situation where virtual machines on a Windows Server 2016 (WS2016) Hyper-V host could not communicate with each other. Ping tests were failing:
- Extremely high latency
- Lost packets
In this case, I was building a new Windows Server 2016 demo lab for some upcoming community events in The Netherlands and Germany, an updated version of my Hidden Treasures in Hyper-V talk that I’ve done previously at Ignite and TechEd Europe (I doubt I’ll ever do a real talk at Ignite again because I’m neither an MS employee nor a conference sponsor). The machine I’m planning on using for these demos is an Intel NUC – it’s small, powerful, and built with lots of flash storage. My lab consists of some domain controllers, storage, and some virtualized (nested) hosts, all originally connected to an external vSwitch. I built my new hosts, but could not join them to the domain. I did a ping from the new hosts to the domain controllers, and the tests resulted in massive packet loss. Some packets got through, but with 3,000+ ms latency.
At first I thought that I had fat-fingered some IPv4 configurations. But I double- and triple-checked things. No joy there. And that didn’t make sense (did I mention that this was while having insomnia at 4am after doing a baby feed?). The usual cause of network problems is VMQ, so that was my next suspect. I checked NCPA.CPL for the advanced NIC properties of the Intel NIC and there was no sign of VMQ. That’s not always confirmation, so I ran Get-NetAdapterAdvancedProperty in PowerShell. My physical NIC did not have VMQ features at all, but the team interface of the virtual switch did.
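If you want to run the same check yourself, something like the following will show whether VMQ is exposed on each adapter, including team interfaces (the adapter names on your system will differ from any shown in the output):

```powershell
# Quick check: which adapters expose VMQ and whether it is currently enabled
Get-NetAdapterVmq

# Dig into the advanced property itself - for Microsoft-defined properties
# the registry keyword is literally "*VMQ" (asterisk included)
Get-NetAdapterAdvancedProperty -RegistryKeyword "*VMQ" -ErrorAction SilentlyContinue |
    Format-Table Name, DisplayName, RegistryKeyword, RegistryValue
```

A RegistryValue of 1 means VMQ is enabled on that adapter; 0 means disabled. Note that a team interface can show VMQ enabled even when the underlying physical NIC doesn’t support it at all.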
And then I remembered reading that some people found that the team interface (virtual NIC) of the traditional Windows Server (LBFOADMIN) team (not Switch-Embedded Teaming) had VMQ enabled by default, and that it caused VMQ-style issues. I ran Set-NetAdapterAdvancedProperty to disable the relevant RegistryKeyword for VMQ while running a ping -t, and the result was immediate; my virtual switch was now working correctly. I know what you’re thinking – how can packets switching from one VM to another on the same host be affected by a NIC team? I don’t know, but they randomly are.
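For reference, the fix looks something like this – “NIC Team” is an example name, so substitute the name of your own team interface:

```powershell
# Disable VMQ on the team interface (virtual NIC) of the LBFO team
# by setting the "*VMQ" advanced property to 0 (disabled)
Set-NetAdapterAdvancedProperty -Name "NIC Team" -RegistryKeyword "*VMQ" -RegistryValue 0

# The dedicated cmdlet does the same thing:
# Disable-NetAdapterVmq -Name "NIC Team"
```

Keep a ping -t running against a VM while you do this; if VMQ on the team interface is your problem, the change in latency and packet loss should be visible almost immediately.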
I cannot comment on how this affects 10 GbE networking – the jerks at Chelsio didn’t release WS2016 drivers for the T4 NICs and I cannot justify a spend on new NICs for WinServ work right now (it’s all Azure, all the time these days). But if you are experiencing weird virtual switch packet issues, and you are using a traditional NIC team, then see if VMQ on the team interface (the one connected to your virtual switch) is causing the issue.
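If you’re not sure which adapter is the team interface connected to your virtual switch, a sketch like this can help you trace it (switch and team names here are illustrative):

```powershell
# Which adapter is each external virtual switch bound to?
Get-VMSwitch | Where-Object SwitchType -eq "External" |
    Select-Object Name, NetAdapterInterfaceDescription

# List the team interfaces (virtual NICs) of any traditional LBFO teams
Get-NetLbfoTeamNic | Format-Table Name, Team, Default

# Then check VMQ on the team interface the switch is bound to
Get-NetAdapterVmq -Name "NIC Team" -ErrorAction SilentlyContinue
```

Match the switch’s NetAdapterInterfaceDescription to a team interface, then check (and if necessary disable) VMQ on that interface rather than on the physical NICs underneath.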