Trouble Connecting to Cluster Nodes? Check WMI!

John Marlin, Microsoft
Mar 15, 2019
First published on MSDN on Nov 23, 2010

A frequent cluster network connection issue we see happens when the cluster cannot use WMI. WMI is Windows Management Instrumentation, an interface through which Windows components can provide information and notifications to each other, often between remote computers. Failover Clustering and System Center Virtual Machine Manager (SCVMM) often use WMI to communicate between cluster nodes, so if there is an issue contacting a cluster node, WMI may be the culprit. We use WMI in most of our wizards, such as ‘Create Cluster Wizard’, ‘Validate a Configuration Wizard’, and ‘Add Node Wizard’, so any of the following messages and warnings could be due to WMI issues:


·         "RPC Server Unavailable" error.


·         Access is Denied.


·         The computer ‘Node1’ could not be reached.


·         Failed to retrieve the maximum number of nodes for ‘{0}’.


·         The computer ‘Node1.contoso.com’ does not have the Failover Clustering feature installed.  Use Server Manager to install the feature on this computer.


o   Note: first confirm you have installed the Failover Clustering feature on this node




Troubleshooting Steps


Work through the following troubleshooting steps so that you can continue connecting to your cluster.



1) Ensure it is not a DNS Issue


You may be unable to contact the other servers because of a DNS issue. Before troubleshooting WMI, try connecting to that cluster, node or server using each of these methods when prompted by the cluster:


a)      Network Name for the cluster or node (example: MyNode)

b)      FQDN for the cluster or node (example: MyNode.contoso.com)

c)      IP Address for the cluster or node (example: 10.10.10.123)



d)      Some wizard pages have a ‘browse’ button which allows you to find other clusters in the domain through Active Directory
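
To rule out name resolution before touching WMI, the checks above can be approximated from a PowerShell prompt. This is a minimal sketch, not part of the original procedure: the node name and the contoso.com suffix are placeholders, and the .NET resolver it calls may also consult the hosts file and other name sources, not DNS alone.

```powershell
# Hypothetical node name and domain suffix; replace with your own values.
$node  = 'MyNode'
$names = @($node, "$node.contoso.com")

foreach ($name in $names) {
    try {
        # Uses the operating system resolver, the same path most tools take.
        $addresses = [System.Net.Dns]::GetHostAddresses($name)
        $list = ($addresses | ForEach-Object { $_.IPAddressToString }) -join ', '
        Write-Host ("{0} resolves to: {1}" -f $name, $list)
    }
    catch {
        Write-Warning ("{0} did not resolve: {1}" -f $name, $_.Exception.Message)
    }
}
```

If the short name fails but the FQDN resolves, the DNS suffix search list on the machine is a likely place to look.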




2) Check that WMI is Running on the Node


Windows Server Failover Clustering supports PowerShell, and earlier versions also come with a lightweight WMI client (WBEMTest). Using either PowerShell or WBEMTest, you can confirm that WMI is up and running. Although you can use WMI remotely, it is better to test this directly on the server to ensure that no other networking or firewall issues are affecting the connection.



WMI Service


First check that the ‘Windows Management Instrumentation’ Service has started on each node by opening the Services console on that node.  Also check that its Startup Type is set to Automatic.
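
The same check can be made from an elevated PowerShell prompt on the node instead of the Services console. This is a sketch that assumes the service's short name is Winmgmt, which is the standard name for the 'Windows Management Instrumentation' service. The last two commands change the system, so they are left commented out.

```powershell
# Show the current state of the WMI service on the local node.
Get-Service -Name Winmgmt | Format-List Name, DisplayName, Status

# Show the configured startup type; START_TYPE should read AUTO_START.
sc.exe qc Winmgmt

# Uncomment to correct the startup type and start the service if needed.
# Set-Service -Name Winmgmt -StartupType Automatic
# Start-Service -Name Winmgmt
```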




Next we will check that Failover Clustering WMI (MSCluster) is running.  These tests would be applicable after the cluster has already been created since we are checking for cluster-specific WMI functionality.


WBEMTest (run directly on the server)


·         Launch CMD


· CMD > WBEMTest


·         The Windows Management Instrumentation Tester will launch


·         Select Connect


·         Namespace: Root\MSCluster


·         Select Connect


o   If you see more options available, it means you are connected and WMI is working


§  Feel free to try a query to confirm, such as selecting ‘Query’ and entering: SELECT * FROM MSCluster_Resource


o   If you see an error, there is a WMI issue


PowerShell (locally, or remotely from another node within the same cluster; 2008 R2 or higher only)


·         Launch Elevated PowerShell


· PS > get-wmiobject mscluster_resourcegroup -computer MyNode -namespace "ROOT\MSCluster"


o   If you see a lot of information displayed, WMI is running


o   If you see an error, there is a WMI or firewall issue
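
When there are several nodes to check, the PowerShell test above can be wrapped in a loop so that one failing node does not hide the others. This is a minimal sketch: the node names are placeholders, and it relies on Get-WmiObject, which is available in Windows PowerShell but not in PowerShell 7.

```powershell
# Hypothetical node names; replace with the nodes of your cluster.
$nodes = @('Node1', 'Node2')

foreach ($node in $nodes) {
    try {
        # -ErrorAction Stop turns WMI and RPC failures into catchable errors.
        $groups = @(Get-WmiObject -Class MSCluster_ResourceGroup `
                                  -Namespace 'root\MSCluster' `
                                  -ComputerName $node `
                                  -ErrorAction Stop)
        Write-Host ("{0}: cluster WMI responded, {1} resource group(s) returned" -f $node, $groups.Count)
    }
    catch {
        Write-Warning ("{0}: {1}" -f $node, $_.Exception.Message)
    }
}
```

A node that fails here but passes the same query when run locally on that node points toward the firewall or network path instead of WMI itself.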




3) Check your Firewall Settings


When a cluster is created, we automatically open up all the firewall settings you need. However, enterprise security policies can make changes over time, so it is worth checking that the firewall on each server is allowing cluster communication. WMI requires a DCOM connection to be made between the nodes, so you need to ensure that the ‘Remote Administration’ setting is enabled on every cluster node. This can be done through the Windows Firewall GUI or by running the elevated command: CMD > netsh firewall set service RemoteAdmin enable. You will see a variety of errors or warnings if your firewall is not properly configured. For more information about how WMI uses the firewall and troubleshooting firewall issues, visit: http://msdn.microsoft.com/en-us/library/aa389286(VS.85).aspx.
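
Newer Windows versions flag the netsh firewall context as deprecated in favor of netsh advfirewall, so on those systems it may be easier to inspect the rules from PowerShell. The sketch below is an assumption-laden alternative, not the article's method: it needs the NetSecurity cmdlets (Windows Server 2012 or later), it looks at the 'Windows Management Instrumentation (WMI)' rule group and not the older 'Remote Administration' setting, and the group name shown is the English display name.

```powershell
# English display name of the built-in WMI firewall rule group (assumed locale).
$group = 'Windows Management Instrumentation (WMI)'

# List the rules in the group and whether each one is enabled.
Get-NetFirewallRule -DisplayGroup $group |
    Select-Object DisplayName, Direction, Enabled, Profile |
    Format-Table -AutoSize

# Uncomment to enable the whole group on this node (run elevated).
# Enable-NetFirewallRule -DisplayGroup $group
```

If a security policy manages the firewall, a locally enabled rule may be switched off again at the next policy refresh, so check the policy as well.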




4) Reboot the Node


This can often fix intermittent issues.  Follow best practices when rebooting the server, such as live migrating VMs and gracefully failing over other services and applications to reduce downtime.  Only do this if the other troubleshooting attempts described above have failed.
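
On Windows Server 2012 and later, draining the node before the reboot can be sketched with the FailoverClusters cmdlets as shown below. This is a hedged outline, not a prescribed procedure: it assumes it is run in an elevated session on the node being rebooted, and on 2008 R2 the -Drain option is not available, so roles would have to be moved by other means.

```powershell
Import-Module FailoverClusters

# Pause the local node and move its roles to other nodes; -Wait blocks until the drain completes.
Suspend-ClusterNode -Drain -Wait

# Confirm that nothing is still owned by this node before restarting.
Get-ClusterGroup | Where-Object { $_.OwnerNode.Name -eq $env:COMPUTERNAME }

# Restart the local machine once the list above is empty.
Restart-Computer

# After the node is back up, resume it and let roles fail back:
# Resume-ClusterNode -Failback Immediate
```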




5) Rebuild a Corrupt WMI Repository


If you continue to see errors after checking that WMI is running, the firewall is properly configured and rebooting, it is possible that your WMI repository has become corrupt so the cluster can no longer read from it.  The following steps will enable you to rebuild your repository so that the other nodes can read from it again.  Rebuilding the repository should be your last troubleshooting step, not your first.



·         In the Services console, manually stop the WMI service to ensure that dependent services are stopped


·         Start the WMI service again


·         Launch an elevated CMD or PowerShell


· CMD/PS > winmgmt /ResetRepository
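
Because a reset is the last resort, it can be worth asking WMI whether it considers the repository damaged before rebuilding it. The sketch below uses two other switches of the same winmgmt tool. Treat it as a suggestion layered on top of the steps above, not as part of the original procedure.

```powershell
# Run from an elevated prompt on the affected node.
# Performs a consistency check and reports whether the repository is consistent.
winmgmt /verifyrepository

# A gentler option than a full reset: checks consistency and rebuilds only if
# an inconsistency is found, merging readable content into the new repository.
# winmgmt /salvagerepository
```

If the repository is reported as consistent, the connection problem probably lies elsewhere, and a reset is unlikely to help.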




6) Patch WMI for Performance Improvements


Your initial connection problems should now be fixed. If you continue to experience intermittent connection issues caused by WMI, it could be due to the performance of your servers. We have released a hotfix for 2008 R2 which improves the speed at which WMI queries are returned, and it is optimized for the most common WMI calls which SCVMM makes. Get it here: http://support.microsoft.com/kb/974930.




Good luck in resolving your cluster connection issues with WMI!



Thanks,


Symon Perriman
Program Manager II
Clustering & High-Availability


Microsoft


  • There's another vector: Azure Stack HCI 22H2 with the Attestation switch enabled and blocking firewall rules for ISDM on the hosts.

  • Aidil
    Copper Contributor

    Hi John,

    I am currently facing an issue where the failover cluster has disconnected, and attempts to reconnect have failed due to one or more nodes not responding to WMI calls. I have two nodes, and one of them has also disappeared from the Hyper-V Manager interface.

    Upon running the Get-Cluster command, I can see that both nodes are up and running, and the VMs on the affected node are still operational. However, when I try to reconnect the node in Hyper-V Manager, there is no response, and nothing happens after clicking "OK" on the connection prompt.

    I have checked the network connection, and everything appears to be fine, including DNS resolution.

    I have not yet attempted to restart the WMI services because I am concerned it might impact the production VMs.

    Do you have any suggestions on how I can resolve this issue?

    Thank you in advance for your help.

     

    • EriqStern
      Microsoft

      Based on the details that you've provided it sounds like it's simply "Failover Cluster Manager" which has disconnected, not the cluster, itself.  Remember that Failover Cluster Manager and Hyper-V Manager are simply clients which are used to manage these roles and features.

       

      Note that the bottom of the error message indicates the node that Failover Cluster Manager failed to connect to. Try connecting directly to another cluster node when prompted for the cluster name. You should be able to do it from a command prompt like this:

      cluadmin NODE2

       

      You can also do a basic cluster WMI functionality test with PowerShell like this (use -computer $env:computername to test the local node, or type out another node name to test remotely):

      get-wmiobject mscluster_resourcegroup -computer $env:computername -namespace "ROOT\MSCluster"

       

      Since you mentioned that you're also unable to connect to the node in Hyper-V Manager, it's possible that the WMI issue might not be specific to Failover Clustering, so you can also test basic WMI functionality using PowerShell like this (use -computer $env:computername to test the local node, or type out another node name to test remotely):

      Get-WmiObject win32_operatingsystem -ComputerName $env:ComputerName

       

      Also look for any WMI warnings or errors in the Application event log on the affected node - event ID 5612 is a common one you might find if something is overconsuming WMI.