First published on MSDN on Nov 23, 2010
A frequent cluster network connection issue we see happens when the cluster cannot use WMI. WMI is Windows Management Instrumentati...
I am currently facing an issue where the failover cluster has disconnected, and attempts to reconnect have failed due to one or more nodes not responding to WMI calls. I have two nodes, and one of them has also disappeared from the Hyper-V Manager interface.
Upon running the Get-Cluster command, I can see that both nodes are up and running, and the VMs on the affected node are still operational. However, when I try to reconnect the node in Hyper-V Manager, there is no response, and nothing happens after clicking "OK" on the connection prompt.
I have checked the network connection, and everything appears to be fine, including DNS resolution.
I have not yet attempted to restart the WMI services because I am concerned it might impact the production VMs.
Do you have any suggestions on how I can resolve this issue?
Based on the details you've provided, it sounds like it's simply Failover Cluster Manager that has disconnected, not the cluster itself. Remember that Failover Cluster Manager and Hyper-V Manager are simply clients used to manage these roles and features.
The bottom of the error message names the node that Failover Cluster Manager failed to connect to. Try connecting directly to another cluster node when prompted for the cluster name. You should be able to do it from a command prompt like this:
cluadmin NODE2
You can also do a basic cluster WMI functionality test with PowerShell like this (use -computer $env:computername to test the local node, or type out another node name to test remotely):
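A minimal sketch of such a test, not necessarily the exact command intended above. It assumes the classic Get-WmiObject cmdlet (Windows PowerShell 5.1 or earlier) and the root\MSCluster namespace that the Failover Clustering WMI provider registers; NODE2 is a placeholder node name:

```powershell
# Query the cluster WMI provider on the local node.
# A returned cluster object means cluster WMI is answering there;
# an error or a long hang points to a WMI problem on that node.
Get-WmiObject -Namespace root\MSCluster -Class MSCluster_Cluster -ComputerName $env:COMPUTERNAME

# Same query against another node (NODE2 is a placeholder name).
Get-WmiObject -Namespace root\MSCluster -Class MSCluster_Cluster -ComputerName NODE2
```

Running it against each node in turn should show which node is not answering cluster WMI calls.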
Since you mentioned that you're also unable to connect to the node in Hyper-V Manager, it's possible that the WMI issue might not be specific to Failover Clustering, so you can also test basic WMI functionality using PowerShell like this (use -computer $env:computername to test the local node, or type out another node name to test remotely):
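One possible form of that basic test, again a sketch and not necessarily the command intended above. It assumes Get-WmiObject is available and uses Win32_OperatingSystem from the default root\cimv2 namespace, chosen here only because it exists on every Windows machine; NODE2 is a placeholder:

```powershell
# Basic WMI check on the local node: read one standard class from root\cimv2.
Get-WmiObject -Class Win32_OperatingSystem -ComputerName $env:COMPUTERNAME

# Same check against another node (NODE2 is a placeholder name).
Get-WmiObject -Class Win32_OperatingSystem -ComputerName NODE2
```

If this query fails as well, the problem is likely with WMI in general on that node and not only with the cluster provider.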
Also look for any WMI warnings or errors in the Application event log on the affected node. Event ID 5612 is a common one you might find if something is overconsuming WMI.
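A rough way to pull those events with PowerShell, assuming Get-WinEvent can reach the node's event log remotely (NODE2 is a placeholder, and the limit of 20 events is arbitrary):

```powershell
# List recent event ID 5612 entries from the Application log on a node.
# Get-WinEvent reports an error when nothing matches the filter,
# so -ErrorAction SilentlyContinue keeps an empty result quiet.
Get-WinEvent -ComputerName NODE2 `
    -FilterHashtable @{ LogName = 'Application'; Id = 5612 } `
    -MaxEvents 20 -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, ProviderName, Id, Message
```

Drop the Id entry from the filter and add Level = 2, 3 to see all errors and warnings in the log instead of just 5612.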