Broadcom & Intel Network Engineers Need A Good Beating

Your virtual machines lost network connectivity.

Yeah, Aidan Smash … again.

READ HERE: I’m tired of having to tell people to:

Disable VMQ on 1 GbE NICs … no matter what … yes, that includes you … I don’t care what your excuse is … yes, you.

That’s because VMQ on 1 GbE NICs is:

  • Enabled by default, despite Microsoft’s requests and advice
  • Known to break Hyper-V networking

Here’s what I saw on a brand-new Dell R730, factory fresh and with the latest NIC firmware/driver updates applied:

[Screenshot: the NIC’s Advanced properties dialog, with Virtual Machine Queues set to Enabled]

Now what do you think is the correct action here? Let me give you the answer:

  1. Change Virtual Machine Queues to Disabled
  2. Click OK
  3. Repeat on each 1 GbE NIC on the host (or script it – see the PowerShell sketch below).
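
A minimal PowerShell sketch of the same change – an illustration, assuming the inbox NetAdapter module on WS2012 R2 and an elevated prompt:

    # Find the physical adapters currently linked at 1 Gbps.
    $oneGbNics = Get-NetAdapter -Physical | Where-Object { $_.LinkSpeed -eq '1 Gbps' }

    foreach ($nic in $oneGbNics) {
        # Same effect as setting "Virtual Machine Queues" to Disabled in the
        # driver's Advanced properties; expect the adapter to reset briefly.
        Disable-NetAdapterVmq -Name $nic.Name
    }

    # Verify: Enabled should now read False for every 1 GbE NIC.
    Get-NetAdapterVmq | Format-Table Name, Enabled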

Got any objections to that? Go to READ HERE above. Still got questions? Go to READ HERE above. Want to comment on this post? Go to READ HERE above.

This BS is why I want Microsoft to disable all hardware offloads by default in Windows Server. The OEMs cannot be trusted to ship reliable drivers/firmware, and many of you cannot be trusted to test/configure your hosts correctly. If the offloads are off by default, then enabling them is an opt-in change, and it’s up to you to test it – all the blame lands on your shoulders.

So what modification do you think I’m going to make to these new hosts? See READ HERE above 😀

EDIT:

FYI, basic 1 GbE networking was broken on these hosts when I installed WS2012 R2 with all Windows Updates – the 10 GbE NICs were fine. I had to deploy firmware and driver updates from Dell to get the R730 to reliably talk on the network … before I did what is covered in READ HERE above.
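
To check what driver each NIC is actually running before and after such updates, a quick sketch (assuming the inbox NetAdapter module, whose adapter objects carry DriverVersion and DriverDate properties):

    # List each physical NIC with its driver version and date.
    Get-NetAdapter -Physical | Format-Table Name, InterfaceDescription, DriverVersion, DriverDate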

17 thoughts on “Broadcom & Intel Network Engineers Need A Good Beating”

  1. I’ve always disabled VMQ after reading your previous posts about it; however, lack of testing was never the issue. I have had it where VMQ is enabled and it ends up not showing itself for days or weeks. It almost seems like it had to send a certain amount of traffic before it started failing. What made it worse was that the NICs were in a team, so there was no guarantee that both NICs hit the line at the same time. It ended up showing itself as an “intermittent” issue…

    It is now, however, my #1 item to look at when troubleshooting smaller Hyper-V deployments!

  2. I’ve spent so much time troubleshooting odd behavior of a Hyper-V 2012 R2 cluster, which includes an R620 with the Broadcom NIC. I am puzzled why, after so many posts and warnings, both Broadcom and Microsoft still neglect this problem. I consistently avoid Broadcom NICs solely because of my experiences with VMQ. It’s simply not worth the hassle to save $20 per server by using Broadcom NICs. Even the new Dell R330 is a dud, since they chose onboard Broadcom NICs.

  3. I recently did a Hyper-V install and had similar weird issues. Could ping some hosts, not others. Some VMs had internet access, others not. Couldn’t connect via RDP, etc. Disabled VMQ and rebooted everything, and all was well.
    A couple of days later, the problem was back, but worse than before – everything was down. I couldn’t get in remotely, so I drove across the country to the client. All the VMQ settings were correct, but the network was down. Eventually I isolated it to one office, whereupon the client remembered that he had plugged in an old 10 Mb 3Com hub he had lying around, as he needed a few extra ports in that office!
    The strange thing was that the hub had been there for months, and the client had requested the upgrade from SBS 2003 because the network kept grinding to a halt. So that hub cost him dearly, but at least he had a shiny new network!

  4. Thanks for calling them out. They need to hear it; maybe they’ll start taking their NIC driver QA teams seriously again.

  5. Had this happen at a client a couple of weeks ago. It was very intermittent, but disabling VMQ did the trick. Microsoft say newer drivers fix the issue, but I don’t believe them – this server had them.

  6. Aidan, what do you recommend: disabling VMQ on all physical 1 GbE interfaces, on the virtual team interface, or on both?

  7. We disabled VMQ via PowerShell on one of the DataOn clusters (which has some 10 GbE and some 1 GbE NICs) and it’s been rock solid this last year.
    Thanks for the support you gave us at the time.
    I will never deploy a 1 GbE 2012 R2 cluster with VMQ enabled. Just not worth the aggravation and wasted time.
    Thanks
    Eamonn
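
    (A hypothetical sketch of scripting that across a cluster – not Eamonn’s actual script – assuming the FailoverClusters module and PowerShell remoting are enabled:)

        # Run from one node: disable VMQ on the 1 GbE NICs of every cluster
        # node, leaving the 10 GbE adapters alone.
        $nodes = Get-ClusterNode | Select-Object -ExpandProperty Name
        Invoke-Command -ComputerName $nodes -ScriptBlock {
            Get-NetAdapter -Physical |
                Where-Object { $_.LinkSpeed -eq '1 Gbps' } |
                ForEach-Object { Disable-NetAdapterVmq -Name $_.Name }
        }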

  8. Aidan, I hate to disagree with you, because I have learned a lot from your blog, but this isn’t the case anymore – it is fixed. VMQ on 1 GbE NICs can work; you just have to jump through a few hoops! I too have been affected by this issue a lot. I didn’t find it acceptable to just disable VMQ; I wanted it to work. I have a lot of clients with Dell servers and Broadcom NICs, running Hyper-V on Server 2012 R2. Finally, in March of 2015, Broadcom had a driver available that fixed the issues. However, I was upset to find that just the driver/firmware update on the Broadcom NICs didn’t seem to make a difference. I then created what I call the holy grail of enabling VMQ on servers with 1 GbE NICs.

    I’d like to share my “template” for properly configuring VMQ on Dell servers with 1 GbE NICs. I’ve now successfully accomplished this on two Hyper-V 2012 R2 clusters in production. You must make sure your Dell servers have the latest NIC firmware and drivers.

    READ THIS
    Microsoft article detailing the Broadcom resolution (fixed with the March 2015 driver)
    https://support.microsoft.com/en-us/kb/2986895

    INSTALL THIS
    Microsoft hotfix that fixes VMQ issues
    https://support.microsoft.com/en-us/kb/3031598

    READ THESE
    VMQ Deep Dive articles
    https://blogs.technet.microsoft.com/networking/2013/09/10/vmq-deep-dive-1-of-3/
    https://blogs.technet.microsoft.com/networking/2013/09/24/vmq-deep-dive-2-of-3/
    https://blogs.technet.microsoft.com/networking/2013/10/22/vmq-deep-dive-3-of-3/

    ***In the third article, it explains how to actually enable VMQ on 1 GbE NICs via the registry. Believe it or not, it is off by default. DO NOT enable this until you have the latest firmware and drivers installed, plus the above-mentioned hotfix.

    Here is an excerpt from the vmq deep dive series 3 of 3:

    VMQ and 1G NICs

    The second issue that is reported frequently is the implementation of VMQ on 1G NICs. By default, we do not enable VMQ on 1G NICs because a single processor is usually more than sufficient to handle the networking traffic generated. If your workload requires that you use VMQ on a 1G card you will need to enable it by setting a registry key. The registry key is below:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\VMSMP\Parameters\BelowTenGigVmqEnabled

    DWORD = 1

    Lastly, enable RSS on your virtual machines.
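
    (A sketch of setting that key with PowerShell – the same key and value quoted above. Do it only after the fixed firmware/drivers and the hotfix are in place, and reboot the host so the change is picked up:)

        $path = 'HKLM:\SYSTEM\CurrentControlSet\Services\VMSMP\Parameters'
        # Create the key if it is missing, then set the DWORD to 1.
        if (-not (Test-Path $path)) { New-Item -Path $path -Force | Out-Null }
        New-ItemProperty -Path $path -Name 'BelowTenGigVmqEnabled' -PropertyType DWord -Value 1 -Force | Out-Null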

  9. Does anyone have a good, current guide on which offloads play nice with Hyper-V clusters? Plenty of the offload and other advanced features look nice in theory (and M$ even recommends most of them), but the reality seems to be that many of them break more than they fix.
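
    (No definitive guide seems to exist, but you can at least inventory what each NIC currently has switched on before changing anything – a sketch using the inbox cmdlets:)

        # Show offload-related advanced properties for every adapter.
        Get-NetAdapterAdvancedProperty |
            Where-Object { $_.DisplayName -match 'Offload|VMQ|RSS' } |
            Format-Table Name, DisplayName, DisplayValue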

  10. Jeez, that revelation saved me some serious time … thanks a million. All I did was
    1. Change Virtual Machine Queues to Disabled
    and the mail queue went to zero so quickly. Damn.
