What really happens when the HADR_CLUSAPI_CALL wait type is set?
Published Feb 10 2019 05:23 PM
Microsoft

First published on MSDN on Aug 18, 2017
In a customer scenario, we saw a query against AlwaysOn-related system views take a fairly long time:

SELECT *
FROM   sys.availability_databases_cluster adc
INNER JOIN sys.availability_replicas ar
ON adc.group_id = ar.group_id
WHERE  adc.database_name = 'db1'
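To check whether a particular query is affected, one option is to look at its wait statistics. The first query below is a sketch that assumes SQL Server 2016 or later, where sys.dm_exec_session_wait_stats is available; run it in the same session right after the slow query. The second query is an alternative for older versions: run it from another session while the slow query is still executing. Neither is the method used in the original investigation, which the article does not describe.

```sql
-- Option 1 (SQL Server 2016+): run in the SAME session, right after the slow query.
-- sys.dm_exec_session_wait_stats accumulates waits since the session was opened
-- (or since a pooled connection was reset), so a fresh connection gives cleaner numbers.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       max_wait_time_ms
FROM   sys.dm_exec_session_wait_stats
WHERE  session_id = @@SPID
ORDER  BY wait_time_ms DESC;

-- Option 2: run from ANOTHER session while the slow query is still executing.
-- Lists requests whose current or most recent wait is HADR_CLUSAPI_CALL.
SELECT r.session_id,
       r.status,
       r.wait_type,
       r.wait_time          AS wait_time_ms,
       r.last_wait_type,
       r.total_elapsed_time AS elapsed_ms,
       t.text               AS sql_text
FROM   sys.dm_exec_requests AS r
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE  r.wait_type = N'HADR_CLUSAPI_CALL'
   OR  r.last_wait_type = N'HADR_CLUSAPI_CALL';
```

If HADR_CLUSAPI_CALL sits at or near the top of Option 1's output, most of the query's elapsed time was spent waiting on the cluster rather than on SQL Server itself.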

Investigation showed that the query was predominantly waiting on HADR_CLUSAPI_CALL. As the name suggests, this wait comes from the HADR (High Availability and Disaster Recovery) functionality of SQL Server, known as AlwaysOn, and CLUSAPI_CALL is short (but not too short) for "Cluster API call". In other words, whenever AlwaysOn calls into the Windows Server Failover Cluster (WSFC) for some work, it sets this wait type. You may wonder what type of "work" is being requested; that is, what are these API calls?

Here is a list that an hour or two of source code research yielded (these are listed in no particular order):

    1. Check remote cluster
      OpenCluster()
      GetComputerNameEx() (not a cluster API per se)
      OpenClusterNode()

    2. Get the resource name of the virtual server
      ClusterEnum()
      OpenClusterResource()
      ClusterResourceControl(...) using control codes CLUSCTL_RESOURCE_GET_DNS_NAME and CLUSCTL_RESOURCE_GET_NETWORK_NAME

    3. Enumerate cluster resources
      ClusterEnum()
      ClusterNodeEnum()
      ClusterNetInterfaceControl() using control codes CLUSCTL_NETINTERFACE_GET_NODE, CLUSCTL_NETINTERFACE_GET_NETWORK, etc.

    4. Read network information from the cluster
      OpenClusterNetwork()
      ClusterNetworkControl() using control codes CLUSCTL_NETWORK_GET_RO_COMMON_PROPERTIES and CLUSCTL_NETWORK_GET_COMMON_PROPERTIES
      ResUtilFindDwordProperty(... 'role' ...)
      ResUtilFindMultiSzProperty() for the IPv4 and IPv6 addresses and the IPv4 and IPv6 prefix lengths

    5. Close handles on cluster resources (it is unlikely that any of the Close* functions were causing the delay, but it is possible)
      CloseCluster()
      CloseClusterNode()
      CloseClusterGroup()
      CloseClusterResource()
      ClusterResourceCloseEnum()
      ClusterNodeCloseEnum()
      ClusterRegCloseKey()
      ClusterCloseEnum()
      CloseClusterNetwork()
      CloseClusterNetInterface()
      CloseClusterNotifyPort()
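The network items in the list (role, IPv4 and IPv6 addresses, prefix lengths) resemble what SQL Server exposes through sys.dm_hadr_cluster_networks, and the node items resemble sys.dm_hadr_cluster_members. As a rough way to narrow things down from the SQL Server side, the sketch below times a read of each DMV. It rests on an assumption that the source research above does not establish: that these DMVs are populated through the same kind of WSFC API calls, so an unusually slow one hints at which area (network versus node and membership information) deserves attention.

```sql
-- Assumption (not confirmed by the article): reading these DMVs goes through
-- WSFC API calls similar to those listed above.
DECLARE @t0 datetime2(7) = SYSDATETIME();

-- Network information: subnets and prefix lengths per cluster member.
SELECT member_name,
       network_subnet_ip,
       network_subnet_ipv4_mask,
       network_subnet_prefix_length,
       is_public,
       is_ipv4
FROM   sys.dm_hadr_cluster_networks;

DECLARE @t1 datetime2(7) = SYSDATETIME();

-- Node and witness information.
SELECT member_name,
       member_type_desc,
       member_state_desc,
       number_of_quorum_votes
FROM   sys.dm_hadr_cluster_members;

DECLARE @t2 datetime2(7) = SYSDATETIME();

-- Elapsed time of each read, in milliseconds.
SELECT DATEDIFF(millisecond, @t0, @t1) AS cluster_networks_ms,
       DATEDIFF(millisecond, @t1, @t2) AS cluster_members_ms;
```

On a healthy cluster both reads would be expected to return quickly; a read that takes seconds is a hint, not proof, of where to start looking.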


What to do?

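If you encounter this issue, investigate the Windows Server Failover Cluster (WSFC) in more depth. Since most of the calls above open cluster objects and read name and network information, delays there are likely to trace back to the cluster itself or to the network and name resolution it relies on. Run Cluster Validation (but only while SQL Server is not online), and check your network adapters, cables, DNS resolution, the correctness of IP addresses and subnets, and disk resources.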
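Cluster Validation remains the thorough check. As a quick first look from the SQL Server side, a sketch like the following shows the quorum state and any cluster members that this instance currently sees as down. It is not a substitute for validation, and because these DMVs also depend on the cluster, they may themselves respond slowly when the WSFC is unhealthy (an assumption worth keeping in mind rather than documented behavior).

```sql
-- Quorum configuration and state as seen by this instance.
SELECT cluster_name,
       quorum_type_desc,
       quorum_state_desc
FROM   sys.dm_hadr_cluster;

-- Nodes and witnesses that are not up (member_state: 0 = offline/DOWN, 1 = online/UP).
SELECT member_name,
       member_type_desc,
       member_state_desc,
       number_of_quorum_votes
FROM   sys.dm_hadr_cluster_members
WHERE  member_state <> 1;
```

An empty second result set does not mean the cluster is healthy; it only means no member is reported as down.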

Last update: Feb 12 2019 07:17 AM