First some background …
A cluster is made up of (normally) 2 or more servers. They use a distributed database to keep a synchronised copy of the configuration of the HA resources, e.g. HA VMs on a Hyper-V cluster. Something called the Global Update Manager (GUM) is used to coordinate consistent updates to resource configurations across the cluster nodes.
When a node has an update that must be shared with the other nodes, that initiator node first obtains a GUM lock. The node then shares the update by sending a Multicast Request Reply (MRR) message to the other nodes. After the update is sent, the initiator node waits for a response from the other nodes before it continues. However, under certain conditions, one of the nodes does not reply to the GUM request in time because it is stuck for some reason. Until now, there was no mechanism to determine which node is stuck and not replying to the GUM request.
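To make the sequence above concrete, here is a toy, in-process simulation of a global update. It is only a sketch of the idea, not how the Cluster service is implemented: the `Node` and `Cluster` classes, the `stuck` flag and the `global_update` method are all invented for illustration, and a real initiator would block waiting for replies rather than get a list back immediately.

```python
import threading


class Node:
    """Hypothetical cluster node holding its own copy of the configuration."""

    def __init__(self, name, stuck=False):
        self.name = name
        self.stuck = stuck   # simulates a node that never replies
        self.config = {}

    def apply_update(self, key, value):
        """Apply the update and acknowledge it; a stuck node never acknowledges."""
        if self.stuck:
            return False
        self.config[key] = value
        return True


class Cluster:
    """Hypothetical cluster with a single GUM lock shared by all nodes."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.gum_lock = threading.Lock()
        self.lock_owner = None

    def global_update(self, initiator, key, value):
        """Send an update from the initiator; return names of nodes that did not reply."""
        with self.gum_lock:                      # only one sender at a time
            self.lock_owner = initiator.name
            initiator.config[key] = value
            others = [n for n in self.nodes if n is not initiator]
            # Stand-in for the MRR message and the wait for replies.
            stuck = [n.name for n in others if not n.apply_update(key, value)]
            if not stuck:
                self.lock_owner = None           # update completed everywhere
            return stuck


nodes = [Node("N1"), Node("N2"), Node("N3", stuck=True)]
cluster = Cluster(nodes)
stuck = cluster.global_update(nodes[0], "vm1.state", "online")
print(cluster.lock_owner, stuck)
```

In this simulation the lock owner and the non-responding node are trivially visible. On a real cluster that information was hidden, which is the gap the hotfix fills.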
That has just changed: a hotfix is now available that adds two new cluster control codes to help you determine which cluster node is blocking a GUM update in Windows Server 2008 R2 and Windows Server 2012.
After you install this hotfix, two new cluster control codes are added to help the administrator resolve the problem. One of the control codes returns the GUM lock owner, and the other returns the nodes that are stuck. The administrator can then restart the stuck nodes to resolve the problem. For more information about the new control codes, go to the following Microsoft pages:
- General information about CLUSCTL_CLUSTER_GET_GUM_LOCK_OWNER control code
- General information about CLUSCTL_NODE_GET_STUCK_NODES control code
A few notes on GUM and on using the control codes:
- The Cluster service has a facility that is called GUM. GUM is used to distribute a global state throughout the cluster.
- Only one cluster node can send GUM messages at any time. This node is called the GUM lock owner.
- The GUM lock owner sends an MRR message to a subset of cluster nodes, and then waits for the nodes to send message receipt confirmations.
- Run some iterations of these control codes to confirm that the node is stuck.
- After the CLUSCTL_CLUSTER_GET_GUM_LOCK_OWNER control code is called, you have to close the cluster handle. Then, you reopen the cluster handle by using the GUM lock owner node name that is returned by the control code. If you do not perform this action, the CLUSCTL_NODE_GET_STUCK_NODES control code may return an incorrect result.
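The last two notes amount to a small procedure: ask for the GUM lock owner, reconnect to that owner, ask it which nodes are stuck, and repeat a few times before trusting the answer. Here is a minimal sketch of that loop. It does not call the Windows cluster API: `query_lock_owner` and `query_stuck_nodes` are hypothetical callables standing in for whatever code issues CLUSCTL_CLUSTER_GET_GUM_LOCK_OWNER and CLUSCTL_NODE_GET_STUCK_NODES, and treating a node as confirmed stuck only when every iteration reports it is my own assumption about what "confirm" should mean.

```python
def find_stuck_nodes(query_lock_owner, query_stuck_nodes, iterations=3):
    """Return (lock_owner, nodes reported stuck in every iteration).

    query_lock_owner()        -> name of the current GUM lock owner (hypothetical)
    query_stuck_nodes(owner)  -> names of stuck nodes, asked via a fresh
                                 connection to that owner (hypothetical)
    """
    owner = None
    confirmed = None
    for _ in range(iterations):
        owner = query_lock_owner()
        # Per the note above: query the stuck nodes through the lock owner,
        # not through whichever node the first handle happened to use.
        reported = set(query_stuck_nodes(owner))
        confirmed = reported if confirmed is None else confirmed & reported
        if not confirmed:
            break  # nothing was reported consistently, so stop early
    return owner, (confirmed or set())


# Usage with canned answers instead of a real cluster:
reports = iter([["N3", "N2"], ["N3"], ["N3"]])
owner, stuck = find_stuck_nodes(lambda: "N1", lambda o: next(reports))
print(owner, sorted(stuck))
```

In real use you would probably pause between iterations so that a node that is merely slow has time to reply.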
You can get this hotfix from here.