VMM 2008 Agent Refresh Crashes Service and Console

I mentioned a while ago that I had a problem with refreshing the VMM agent on my Hyper-V lab machine in Virtual Machine Manager 2008.  The error produced in the job was:

"Error (1700) The Virtual Machine Manager service on the CINWSVR008 server stopped while this job was running. This may have been caused by a system restart.   

Recommended Action To restart this job, navigate to the Jobs view and select the job in the results pane. Then, in the Actions pane, click Restart."

At the same time I’d get errors 1 and 1999 in the VM Manager application log in Event Viewer.  The key word to note in those events is PostVirtualNetworkUpdate.

It turns out that this is bad.  I’ve previously mentioned how it was a bad idea to ever configure HP’s network card teaming on the BL460C if you intended to run Hyper-V.  Guess what?  This appears to apply to any server.  My lab machine is a DL380G5, which has different NIC hardware from the NICs on the BL460Cs that I’m using for VM networking.  Here’s what MS PSS had to say:

"This is a known bug and it is fixed in the next version.  The issue is caused by Hyper-V reporting 2 NICs with the same name.  Here are the known workarounds:

  1. Update drivers in the hope that that resets names.
  2. Avoid binding VM’s to either of the duplicate NICs.
  3. Disable all but one of the duplicate NICs on the host.
  4. Disable host refresh and only do on-demand refresh (for all hosts but the affected one).

The bug ID is 38264 and will be included in the next product release (R2).  Other than these stated “workarounds” there isn’t much else that can be done".

Eeeek! 

Option 2 is out.  The server has only 2 NICs: one for the parent and one for guests.  That also rules out option 3.  Option 4 is a no-no.  There’s no way I’m disabling automatic refresh on a VMM server that manages my cluster, where VMs can be migrated from A to B.  A bit more searching on my part and I found that the driver upgrade (option 1) seemed to work for some people on HP hardware.

I’ve applied the latest drivers to this machine but no joy with that.  The MS engineer looking at the call has come up with another idea which I’m going to look at ASAP.
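If you want to check whether a host is in the duplicate-name situation that PSS describes, here’s a rough sketch in Python.  It does not query Hyper-V or VMM; it assumes you’ve already copied the adapter names off the host by hand (e.g. from Device Manager) into a list.  The function name and the sample adapter names are made up for illustration.

```python
from collections import Counter


def duplicate_nic_names(names):
    """Return adapter names that occur more than once.

    Names are compared case-insensitively with surrounding whitespace
    removed, and the result is sorted and lower-cased.
    """
    counts = Counter(name.strip().lower() for name in names)
    return sorted(name for name, seen in counts.items() if seen > 1)


# Hypothetical adapter names, typed in by hand for illustration only.
adapters = [
    "Example Gigabit Server Adapter",
    "Example Gigabit Server Adapter",
    "Example Gigabit Server Adapter #2",
]

# Prints a list of any names that are reported more than once.
print(duplicate_nic_names(adapters))
```

An empty list means no two adapters share a name, which by itself doesn’t prove the host is unaffected; it only rules out the specific cause quoted above.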

EDIT #1:

After further work with a very patient and responsive MS engineer we’ve run out of ideas.  This one doesn’t fit previously diagnosed cases.  BTW, the engineer in question is coincidentally publishing a blog post on this issue later this week.  My case is being forwarded to the product group to see if they have further ideas.

EDIT #2:

The MS engineer working the case, Mike Briggs, has documented a workaround that seems to fit most scenarios with these symptoms.

EDIT #3:

I’ve been informed that the developers are now working on a bug fix.  Yep, I found a new one that isn’t covered by the workaround above.  It seems my NIC configuration doesn’t quite fit that profile, i.e. there are no lingering ghost NICs.  There’s no schedule for the release of the fix; the procedure is that your call gets closed when it gets to this part of the process.
