Connection Timeouts in Multi-subnet Availability Group
Published Jan 15 2019 04:00 PM 51.7K Views
First published on MSDN on Jun 03, 2014


One of the issues that generates a lot of call volume we see on the AlwaysOn team is dealing with connectivity issues to the availability group listener in multi-subnet environments.

A “multi-subnet” environment is defined when the OS cluster used as the backbone for AlwaysOn has server nodes that are located in multiple, different subnets. Usually there are only 2 subnets, however, there can be more.

When the availability group listener (AGL) is configured properly, it will have an IP address for each defined subnet and have an “OR” dependency on each of the IP addresses. By default, when it is brought online it will be registered in DNS by Windows Cluster. The cluster will submit all of the IP addresses that are in the dependency list and the DNS server will generally register an A record for each IP address. (If non Microsoft Windows DNS servers are used, the exact implementation can be different).

When a client operating system (OS) needs to resolve the AGL name to IP by querying the DNS server, the DNS server will return multiple IP addresses – one for each subnet. The listener IP address in the subnet currently hosting the availability group primary replica will be online. The other listener IP address(es) will be offline. Because not all of the IP addresses returned by DNS will be online, client applications can run into problems when attempting to connect to the listener.


By default, the behavior of the SQL client libraries is to try all IP addresses returned by the DNS lookup - one after another (serially) until the all of the IP addresses have been exhausted and either a connection is made, or a connection timeout threshold has been reached. This can be problematic, because depending upon DNS configurations, the “correct” or “online” IP address may not be the first IP address returned. The default timeout for a TCP connection attempt is 21 seconds and if the first IP address attempted is not online, it will wait 21 seconds before attempting the next IP address. For each subsequent IP address, it will again have to wait 21 seconds before moving to the next IP address until the connection attempt times out or it establishes a connection to an IP address that responds.

The default connection timeout period for .NET client libraries is 15 seconds , therefore, some applications may experience intermittent connection timeouts – or delays in connecting – which can cause application delays or performance issues.


Beginning with the native client libraries for SQL 2012 as well as the .NET 4.5 libraries (earlier .NET libraries with hotfixes – see below in Appendix A), Microsoft added a new connection string parameter that can be added to change the connection behavior. This new parameter, MultiSubnetFailover , should be used and set to “TRUE.”  When set to TRUE, the connection attempt behavior changes. It will no longer attempt all of the IP addresses serially, but in parallel. That is, all of the IP addresses that the availability group listener is dependent on will receive a SYN request at the TCP layer “in parallel” (technically one immediately after the other, but not waiting for acknowledgement – so effectively “in parallel”). This means that whichever IP address is online will be attempted immediately rather than waiting for any timeouts on IP addresses that are not online. The server will respond immediately and establish a connection, while the other IP addresses and their respective connection attempts will eventually timeout – but since the application is already connected it does not matter that those connection attempts timeout and fail.


The client libraries by default do not enable this parameter (i.e. it is set to “FALSE”). The connection strings must be modified in order to ensure successful consistent and successful connections to a multi-subnet listener. Sometimes it isn’t possible to modify the connection strings – and so some applications will still encounter timeout issues when trying to connect. In some cases, the connection timeouts can be intermittent or they can be very consistent - depending upon the order in which IP addresses are returned.


One option to resolve the issue if an application cannot use the MultiSubnetFailover parameter is to change the behavior of how the AGL is registered with DNS. This assumes that dynamic updating of DNS is allowed within the environment. If you are unsure, please check with your DNS administrators to determine if dynamic updating of DNS is allowed.

There are two parameters that affect how the AGL is registered with DNS. By modifying these parameters on the server we can transparently change the experience of the client OS in its name resolution caching.

The first parameter of interest is called RegisterAllProvidersIP . This parameter determines whether the Windows Cluster will register all of the IP addresses the AGL is dependent on, or only the one active IP address. When set to 1 (default if the AGL is created from SQL Server), the AGL clustered resource is created with all of the IP addresses the AGL is dependent on, registered in DNS. When set to 0, only the one active IP address is registered in DNS (the IP address in the subnet hosting the primary replica). (NOTE: if a Client Access Point is created using Windows Failover Cluster Manager, the RegisterAllProvidersIP parameter is set to 0 by default.)

The second parameter is called HostRecordTTL . This parameter governs how long (in seconds) before cached DNS entries on a client OS are expired, forcing the client OS to re-query the DNS server again to obtain the current IP address. By default, this value is 1200 (20 minutes). This means that after a client OS makes a call to the DNS server to resolve this name to an IP address, the client OS will cache that value for 20 minutes, only querying the DNS server again after that cached record expires. If this value is reduced to 120 or 60 for example, then the client OS cached copy will expire much more quickly.

This is important because during a failover in which the primary replica moves from one subnet to the other, the old IP address that was online is un-registered, and the new IP address that is brought online is registered. This updates DNS with the new IP address as soon as the AGL comes online, but client operating systems will not resolve the AGL name to the new IP address until the currently cached entry expires, which if it had just re-queried DNS immediately before the failover, the client OS would have to wait up to 20 minutes before expiring its cached copy and querying the DNS server again to get the new IP address. This causes the client OS to continue trying to connect to the OLD IP address until its cached copy has expired. By changing the HostRecordTTL parameter value to a much lower setting than 1200, it will cause the cached value to expire more quickly. So if set to 60, the client OS will only have to wait at most 60 seconds after a failover before acquiring the new IP address – allowing client operating systems to resolve to the new, correct IP address much sooner.

The drawback to setting the value to a lower number is how often the client OS will query the DNS server. If you have a handful of application servers, then changing the value from 1200 to 60 would probably have no perceptible impact on the DNS server(s). However, if there are thousands of client machines that all must resolve the AGL name to IP, this increases the load on the DNS server(s) and could cause problems.

A balance must be drawn between the lowest possible cache expiration time and the increased DNS server load.

The following PowerShell instructions show how to change the RegisterAllProvidersIP and HostRecordTTL settings. It is important to note that these settings cannot take effect until the AGL is brought offline and then online again, forcing it to re-register with DNS. Remember, the availability group is dependent on the AGL. If the AGL goes offline, so will the availability group and the databases. However, this dependency can be temporarily removed, allowing for the OFFLINE and re-ONLINE of the AGL without taking the availability group offline.

If there are applications or users that are actively using the AGL to connect to a replica (primary or secondary with read-only Routing), then the OFFLINE/ONLINE process will cause service interruption regardless of dependencies. Therefore, if the AGL is in use, perform the following steps during a maintenance window.


Availability group listener resource parameters:  RegisterAllProvidersIP and HostRecordTTL

The following steps show how to turn off RegisterAllProvidersIP and reduce the Client OS DNS cache timeouts ( HostRecordTTL ) parameters, in the event that you cannot use the MultiSubnetFailover=True parameter on all connection strings.

If you prefer, there are two sample scripts (one TSQL and one PowerShell) in Appendix B at the end of this document that has all of the required commands already configured.  These scripts can then be executed to perform all of the steps necessary.


  • The “>” symbol at the beginning of each line represents the command prompt and should not be typed.
  • Resource names are listed inside brackets (“<” and “>”) – do not include the brackets when typing the command for execution.

Before making any changes, it is necessary to get the PORT values for each listener defined in the AG.  This is because temporarily removing the cluster resource dependencies for the AG resource on the AGL will eliminate the port assignment for the listener.  If the port assigned was not 1433, the port needs to be specified again for the listener.   Capturing the existing port assignments before making any changes will allow the restoration of the proper port assignments at the end of the script.

1.  Capture the existing port assignment for each listener in the AG.

In SQL Server Management Studio, connect to the AG primary node execute the following TSQL and then keep the results to execute after all of the PowerShell commands have been completed.

SELECT '-- (1) Copy/Paste the results of this query '
+ ' into a query window.'
AS [Generated TSQL script:]
SELECT '-- (2) After all PowerShell scripts/command'
+ ' have been executed,'
SELECT '-- (3) Execute the following TSQL commands'
+ ' to restore PORT settings.'
+ agl.dns_name + ''' (PORT = '
+ CAST(ISNULL(agl.port,1433) AS VARCHAR(5)) + ');'
+ CHAR(13) + CHAR(10)
FROM sys.availability_group_listeners agl
INNER JOIN sys.availability_groups ag
ON agl.group_id = ag.group_id

The above script should yield one or more TSQL statements that can be copied and pasted into a query window in SSMS later to re-configure the port for each listener.

2. On any one of the nodes in the cluster, open an administrative PowerShell window.

3. Get the cluster resource name for the availability group resource and the availability group listener resource using the following commands:

  • >Import-Module FailoverClusters
  • >Get-ClusterResource

This will produce an output similar to the following:

In this list of resources, we will concern ourselves with three different columns:

  • the Name (left most, with heading surrounded in light green box)
  • the OwnerGroup (third column from left, with heading in orange box)
  • the ResourceType (right most column with heading in yellow box)

The scripts below require the use of the resource name (left most column) for the resource on which we will make changes.

To get the correct resource, first find the name of your availability group in the third column (OwnerGroup) (light orange box in picture below). Once you have found the correct group, then find the resources that we need to change. We will look for two types: “ SQL Server Availability Group ” and “ Network Name ”. The resource types will be found in the right most column. In the picture below, the availability group resource is underlined in red, and the network name resource (for the listener) is underlined in yellow.

For the subsequent steps, use the following resource names:

  • TestAG_TestAGList _”, to substitute for <AG Listener Resource Name>
  • TestAG ”, to substitute for <AG RESOURCE NAME>

4. Change the parameters with the following commands:

  • >Get-ClusterResource <AG Listener Resource Name> | Set-ClusterParameter -Name HostRecordTTL -Value 120
  • >Get-ClusterResource <AG Listener Resource Name> | Set-ClusterParameter -Name RegisterAllProvidersIP -Value 0

As can be seen in the example above, the resource name for the “Network Name” type is used, “TestAG_TestAGList”. After the command is executed, a yellow warning message is shown that indicates the parameter change will not take effect until the resource is taken offline and then brought back online. This can be done during a normal availability group cluster failover, or through PowerShell script (later in this document). Similarly, the second command above can be issued to change the parameter RegisterAllProvidersIP. It, too, will return a yellow warning message identical to the one shown – indicating the parameter change will not take effect until the resource is taken offline and brought back online.

5. Temporarily remove dependency between the availability group resource and the listener name resource.

Because the listener name resource has to be taken offline and back online for the above changes to take effect, and the fact that the availability group resource is dependent on the listener name, simply taking the listener name resource offline will also take the availability group (and its databases) offline in the process. To avoid taking the availability group resource offline, the dependency that the availability group has on the listener name can be temporarily removed and then re-applied. This can either be done by the Windows Failover Cluster Manager utility or through PowerShell commands.

To remove the dependency using Windows Failover Cluster Manager:

  • Select the availability group resource.
  • Right click and select properties.
  • On the Properties dialog, navigate to the dependencies tab
  • Select the resource and click the “Delete” button and then “OK” to close the dialog box.

To remove the dependency using PowerShell:

  • >Remove-ClusterResourceDependency -Resource <AG RESOURCE NAME> -Provider <AG Listener Resource Name>

6. Offline and re-online the listener resource to force re-registration with DNS and complete the changes:

  • >Stop-ClusterResource <AG Listener Resource Name>
  • >Start-ClusterResource <AG Listener Resource Name>
To force updating DNS on Windows Server 2012 or 2012 R2:
  • >Get-ClusterResource <AG Listener Resource Name> | Update-ClusterNetworkNameResource

To force updating DNS on Windows Server 2008 or 2008R2:
  • >Cluster.exe RES <AG Listener Resource Name> /registerdns

7. Re-add the dependency of the AG resource on the Listener name resource. (The dependency should exist for proper function of the availability group and access to the databases within the availability group. Failure to re-add the dependency could cause unintended behavior of the availability group and database availability.)

To re-add the dependency using Windows Server Failover Cluster Manager:

  • Select the availability group resource.
  • Right click and select properties.
  • On the Properties dialog, navigate to the dependencies tab
  • Click the “drop down underneath “Resource” and select the listener name resource.
  • Click the “apply” button, then “OK” to close the dialog box.

To re-add the dependency using PowerShell execute the following command:

  • >Add-ClusterResourceDependency -Resource <AG RESOURCE NAME> -Provider <AG Listener Resource Name>

8. Verification – view the dependency to make sure it is re-applied, ensure all cluster resources for this AG are online, and review the parameters to make sure they’re set to the new value:

  • >Get-ClusterResourceDependency <AG RESOURCE NAME>
  • >Get-ClusterResource <AG Listener Resource Name> | Get-ClusterParameter HostRecordTTL, RegisterAllProvidersIP
  • >Get-ClusterResource

9. Re-configure Listener PORT settings.

In step 1 above, a TSQL script was executed that generated additional TSQL commands that will restore the original PORT settings for the AG Listeners.   Copy and paste the results from the query in step 1 into a query window in SSMS – connected to the primary and execute the TSQL.   The TSQL generated from step one should look similar to:

Do not type the TSQL from the above image, use the TSQL that was generated in step 1 on your machine!

After pasting into a query window, the generated TSQL text should look something similar to the following with ALTER AVAILABILITY GROUP statements inside TRY/CATCH blocks.

Upon execution it should return successful completion:

Finally, the following query will return the list of listeners and their port settings – for all availability groups on the machine:

/* this script will obtain the ports defined
* for each availability group listener that
* exists.*/
SELECT AS [Availability Group],
agl.dns_name AS [Listener DNS Name],
agl.port AS [Port]
FROM sys.availability_group_listeners agl
INNER JOIN sys.availability_groups ag
ON agl.group_id = ag.group_id
ORDER BY, agl.dns_name



For older operating systems such as Windows 7 and Windows Server 2008 R2, it is recommended the hotfixes referenced below be applied to ensure connection timeouts do not occur. Because of an issue with TDX/TDI filter drivers, timeouts can still occur when connecting to a name with multiple IP addresses – even when the correct client libraries are used and the MultisubnetFailover=True parameter is specified. These drivers are usually installed as part of older security systems such as anti-virus and intrusion detection. Ensuring these hotfixes have been applied will help prevent connectivity timeouts. Please note however, there is no hotfix for Windows Server 2008, only Windows Server 7 and Windows Server 2008 R2. If using Windows Server 2008 as a client, please refer to the articles below for more options in resolving timeout issues.

Two additional things to consider with respect to registering the listener name with DNS after making changes to either the HostRecordTTL or RegisterAllProvidersIP parameters --- DNS replication and previous settings.

The DNS server that is contacted by the OS cluster when registering or de-registering hostnames may not be the same DNS server that clients are using to resolve names to IP addresses. If this is the case, then it is possible to have additional delays in the client’s ability to get freshly updated information – simply because the client’s DNS server may not have the updated information. DNS replication topology and configuration settings can cause additional delays before all changes are replicated throughout an enterprise network. If significant delays are experienced either after a failover, or when changing these parameters, the network or DNS administrator should be contacted to investigate the DNS replication topology for the enterprise to determine if the time required to replicate across the entire organization can be reduced.

The other item to consider is that the “previous settings” are most likely already cached on client machines. If the parameter settings were “default” prior to making any changes, then any cached entries on client machines will still have the “old” expiration setting (TTL) – which is 20 minutes. That means, that even after changing the RegisterAllProvidersIP and HostRecordTTL settings – and taking the resource offline and back online to take effect – previously cached entries are not automatically expired. The client must wait the for the current TTL setting before it expires its cached copy. This means that it could still be up to 20 minutes before a client will get the new settings.

After the changes have been made and cached entries have been expired, the new settings will take effect and any subsequent TTL expirations will take place based upon the new setting (for example after 60 or 120 seconds) rather than the original default value of 20 minutes. This can be expedited on client machines if necessary by issuing an IPCONFIG /FLUSHDNS command from an elevated command prompt. This will cause the client to expire all cached entries and re-query the DNS server to obtain the new settings.


Section 5.7.1 Client-Connectivity For AlwaysOn Availability Groups from: SQL Server 2012 Release Notes .

The following table summarizes driver support for AlwaysOn Availability Groups:


Multi-Subnet Failover

Application Intent

Read-Only Routing

Multi-Subnet Failover: Faster Single Subnet Endpoint Failover

Multi-Subnet Failover: Named Instance Resolution For SQL Clustered Instances

SQL Native Client 11.0 ODBC






SQL Native Client 11.0 OLEDB






ADO.NET with .NET Framework 4.0 with connectivity patch *






ADO.NET with .NET Framework 3.5 SP1 with connectivity patch **






Microsoft JDBC driver 4.0 for SQL Server






* Download the connectivity patch for ADO .NET with .NET Framework 4.0: .

** Download the connectivity patch for ADO.NET with .NET Framework 3.5 SP1: .

MultiSubnetFailover Keyword and Associated Features

MultiSubnetFailover is a new connection string keyword used to enable faster failover with AlwaysOn availability groups and AlwaysOn Failover Cluster Instances in SQL Server 2012. The following three sub-features are enabled when MultiSubnetFailover=True is set in connection string:

  • Faster multi-subnet failover to a multi-subnet listener for an AlwaysOn Availability Group or Failover Cluster Instances.
    • Named instance resolution to a multi-subnet AlwaysOn Failover Cluster Instance.
  • Faster single subnet failover to a single subnet listener for an AlwaysOn Availability Group or Failover Cluster Instances.
    • This feature is used when connecting to a listener that has a single IP in a single subnet. This performs more aggressive TCP connection retries to speed up single subnet failovers.
  • Named instance resolution to a multi-subnet AlwaysOn Failover Cluster Instance.
    • This is to add named instance resolution support for an AlwaysOn Failover Cluster Instances with multiple subnet endpoints.

MultiSubnetFailover=True Not Supported by NET Framework 3.5 or OLEDB

Issue: If your Availability Group or Failover Cluster Instance has a listener name (known as the network name or Client Access Point in the WSFC Cluster Manager) depending on multiple IP addresses from different subnets, and you are using either ADO.NET with .NET Framework 3.5SP1 or SQL Native Client 11.0 OLEDB, potentially, 50% of your client-connection requests to the availability group listener will hit a connection timeout.


This is the script that should be run first to collect the port assignments for the listeners and generate TSQL code to be executed after the PowerShell script to re-configure the port settings to their original values.

/* this script will obtain the ports defined
* for each availability group listener that
* exists.  If no port is defined, it will
* assign it will use port 1433.
* The output will show the TSQL syntax
* to alter the listeners to apply the
* same port values later, should they
* need to be re-configured to the same
* ports.*/

DECLARE @EndTryCatch VARCHAR(max) = 'END TRY' + @CRLF +
'IF (@@ERROR <> 19468)' + @CRLF +

SELECT '-- (1) Copy/Paste the results of this query '
+ ' into a query window.'
AS [Generated TSQL script:]
SELECT '-- (2) After all PowerShell scripts/command'
+ ' have been executed,'
SELECT '-- (3) Execute the following TSQL commands'
+ ' to restore PORT settings.'
+ agl.dns_name + ''' (PORT = '
+ CAST(ISNULL(agl.port,1433) AS VARCHAR(5)) + ');'
+ @CRLF + @EndTryCatch + @CRLF
FROM sys.availability_group_listeners agl
INNER JOIN sys.availability_groups ag
ON agl.group_id = ag.group_id

PowerShell script to change the HostRecordTTL and RegisterAllProvidersIP settings.

There are five variables that need to be changed before executing the script. They are located toward the top of the script underneath “CHANGE THESE VARIABLES”. It is recommended you become familiar with the script and its options in a test environment before attempting in production. The script is written such that it will affect ALL availability group listener resources for the specified availability group, since if the parameters need to be changed for one listener, then most likely, if the availability group has more than one listener, they should all be changed.

#This script is provided "AS IS" with no warranties, and confers no rights.
#   Use of included script samples are subject to the terms specified at

#$strAGName          the name of the availability group
#$TTLValue           the # of seconds for HostRecordTTL timeout value
#$AllIPs             [0 | 1] 0 = only register one IP, 1 = register all IPs
#$RestartListener    [0 | 1] 1 = restart listener / 0 = do not restart
#$RemoveDependencies [0 | 1] 1 = temporarily remove / 0 = leave alone

#Define Variables
$strAGName = "TestAG"         #<<<<<<<<<<<<<<<<<<<<<<<<<
$TTLValue = "120"             #<<<<<<<<<<<<<<<<<<<<<<<<<
$AllIPs = 0                   #<<<<<<<<<<<<<<<<<<<<<<<<<
$RestartListener = 1          #<<<<<<<<<<<<<<<<<<<<<<<<<
$RemoveDependencies = 1       #<<<<<<<<<<<<<<<<<<<<<<<<<

#   1) Test this script in non-production environments first.
#   2) This script will change the parameters for _all_ listeners
#      for the specified availability group
#   3) This script can optionally restart the listener(s)
#   4) if restaring listeners, it can optionally temporarily
#      remove and restore the dependencies to take the
#      listener(s) offline without taking the availability group
#      itself offline.  If choosing not to temporarily remove
#      and restore dependencies, then when the listener(s) are
#      taken offline, the availability group resource will also
#      go offline - thus making the databases in the AG inaccessible.
#   5) if choosing to remove dependencies, the existing depenedencies
#      are collected and restored after restaring the listener(s)
#   6) Windows Server 2012/2012R2 has a powershell command to
#      re-register listener(s) with DNS.  Server 2008/2008R2 does
#      does not.  there is logic to determine and use the CLUSTER.EXE
#      command for Windows Server 2008/2008R2

#no changes required below this point

#Get OS version
$OSMajor = ([System.Environment]::OSVersion.Version).Major
$OSMinor = ([System.Environment]::OSVersion.Version).Minor

#load cluster module
Import-Module FailoverClusters

#get the cluster role (group) object based on the AG name provided above
$objAGGroup = Get-ClusterGroup $strAGName -ErrorAction SilentlyContinue

if ($objAGGroup -eq $null)
{Write-Host "Error:  Availability Group not found."}
#get the AG resource object in this cluster role (group)
$objAGRes = $objAGGroup | Get-ClusterResource |
Where-Object {$_.ResourceType -match "SQL Server Availability Group*"}
#get the listener(s) object(s) in this cluster role (group)
$objListener = $objAGGroup | Get-ClusterResource |
Where-Object {$_.ResourceType -match "Network Name*"}

#change the parameter settings: HostRecordTTL & RegisterAllProvidersIP
Write-Host "Making changes to Network Name:"  $list.Name
$objListener | Set-ClusterParameter -Name HostRecordTTL -Value $TTLValue
$objListener | Set-ClusterParameter -Name RegisterAllProvidersIP -Value @AllIPs
$objListener | Get-ClusterParameter -Name HostRecordTTL
$objListener | Get-ClusterParameter -Name RegisterAllProvidersIP

if ($RestartListener -eq 1) {
if($RemoveDependencies -eq 1) {
#capture the dependency(ies) that the AG resource depends on
$DepStr = ($objAGRes | Get-ClusterResourceDependency).DependencyExpression
Write-Host "Removing dependecny for " $objAGRes.Name  " on '" $DepStr "'"
Set-ClusterResourceDependency -Resource $objAGRes -Dependency $null
} #if remove dependencies

#restart the listener resource(es)
Write-Host "Restarting Network Name resource:" $list.Name
$objListener | Stop-ClusterResource
$objListener | Start-ClusterResource

#force re-registration in DNS
if ($OSMajor -ge 6 -and $OSMinor -ge 2) {
#Windows Server 2012 and up
$objListener | Update-ClusterNetworkNameResource -Verbose
else {
#for Windows Server 2008/2008R2
ForEach($list in $objListener) {
cluster.exe res $  /registerdns
if($RemoveDependencies -eq 1) {
#restore the dependency(ies) to previous setting
Write-Host "Reapplying dependencies for " $objAGRes.Name
Set-ClusterResourceDependency -Resource $objAGRes -Dependency $DepStr
#show dependency (so it can be compared) / show the settings
$objAGRes | Get-ClusterResourceDependency
} #if remove dependencies
else {
#if we chose not to remove dependencies we need to restart
#the availability group resource
$objAGRes | Start-ClusterResource
} #if restart
}#else - availability group found

Version history
Last update:
‎Jan 15 2019 04:00 PM
Updated by: