Troubleshooting Availability Group Listener in Azure
Published Jan 15 2019 04:50 PM
Microsoft
First published on MSDN on Feb 01, 2016
Configuring an availability group listener in Azure involves additional steps compared with creating one on-premises. This topic helps you troubleshoot your availability group listener, whether your AlwaysOn Availability Groups deployment is in Azure only or in a hybrid IT environment using a site-to-site VPN.

Some steps in the listener configuration involve configuration of Azure itself, such as the load-balanced virtual machine (VM) endpoint and direct server return. However, Azure currently does not provide any tools to help you verify your configuration is working as expected. Therefore, you need a network analyzer to help you verify your configuration as well as troubleshoot any problem. This topic shows you how to use Microsoft Network Monitor to troubleshoot your availability group listener.

Availability Group Listener Configuration Summary


This section provides a list of configuration options to check while troubleshooting your availability group listener.














Load-balanced Endpoint (Configured in Azure)

  • Configured on all VMs that are availability replicas.

  • Public port and local port should be the same.

  • The probe port is used by the Azure load balancer to determine which server is the primary replica.

  • Direct Server Return (DSR) is set to true on the VM load-balanced endpoint.

Configuration inside VMs

  • Configured on all cluster nodes that host availability.

  • Configured on all VMs that are availability replicas:

    • Open probe port in the firewall

    • Open listener port in the firewall

  • Configured on the computer or VM with the primary replica (in hybrid IT, the primary replica should be on-premises):

    • Create client access point for the availability group cluster service

    • Configure IP address resource with cloud service IP, cluster network name, and probe port

    • Configure dependency from availability group resource to listener resource

    • Specify listener port in SQL Server Management Studio

Configuration of Client Connectivity

  • If the client is on an Azure VM, place the VM in a different cloud service.

  • For clients in the same Active Directory domain, connect to the configured listener name and port number.

  • For clients outside of Azure, configure the login timeout to accommodate network latencies.
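
The load-balanced endpoint items in the summary above lend themselves to a simple scripted check. The following Python sketch is illustrative only: the EndpointConfig record and its field names are hypothetical and do not correspond to any Azure API. It assumes you copy the values in by hand from your own endpoint settings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EndpointConfig:
    """Hypothetical record of one VM's load-balanced endpoint settings."""
    vm_name: str
    public_port: int
    local_port: int
    probe_port: Optional[int]
    direct_server_return: bool


def check_endpoint(cfg: EndpointConfig) -> List[str]:
    """Return the problems found; an empty list means none were detected."""
    problems = []
    # Public and local ports must match on the load-balanced endpoint.
    if cfg.public_port != cfg.local_port:
        problems.append(
            f"{cfg.vm_name}: public port {cfg.public_port} "
            f"differs from local port {cfg.local_port}"
        )
    # Without a probe port the load balancer cannot locate the primary replica.
    if cfg.probe_port is None:
        problems.append(f"{cfg.vm_name}: no probe port configured")
    # Direct Server Return must be set to true.
    if not cfg.direct_server_return:
        problems.append(f"{cfg.vm_name}: Direct Server Return is not enabled")
    return problems
```

Running check_endpoint once per availability replica VM gives a quick list of mismatches to investigate before you start a packet capture.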







Verify Probe Pings in Availability Group Listener Configuration


Note: Azure Load Balancers don’t support the ping (ICMP) protocol.


To determine whether the probe port on the load-balanced endpoint is working properly on the Azure VMs, you use Network Monitor to filter your packet capture on the probe port.

When the load-balanced endpoint is configured properly, the Azure load balancer continuously pings each of the VMs to determine if it has the primary replica so that it can route client connections to the correct VM. If the VM is the primary replica, the cluster service is configured with the probe port and responds to the probe pings. This traffic can be seen in Network Monitor by performing packet capture with the following filter applied in the Display Filter pane:

TCP.DstPort == 59999 OR TCP.SrcPort == 59999


The first clause captures the incoming pings from the Azure load balancer and the second clause captures the reply by the primary replica. The screenshot below shows what it looks like when you do a packet capture on the primary replica in Azure.



The Con Id column shows you packets related to the same probe ping (shown in yellow). The response from your VM acknowledges each ping by returning the ping’s Seq value, incremented by one, in its Ack value. You can also see the load balancer’s source IP address (shown in orange).

If you see SynReTransmit messages from the load balancer instead of response messages from the VM, the VM is not responding to the load balancer. This can mean that the VM is not the primary replica or that the probe pings are not working as expected.
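
The reasoning behind the two filter clauses can be sketched in Python. This is only an illustration: the Packet record is a hypothetical, simplified stand-in for rows you might export from a capture, not a Network Monitor data structure.

```python
from typing import Iterable, NamedTuple


class Packet(NamedTuple):
    """Hypothetical simplified capture row: TCP source and destination ports."""
    src_port: int
    dst_port: int


def classify_probe_traffic(packets: Iterable[Packet], probe_port: int = 59999) -> str:
    """Summarize probe traffic the way the display filter's two clauses split it."""
    rows = list(packets)  # materialize so the rows can be scanned twice
    # First clause (DstPort == probe port): pings arriving from the load balancer.
    incoming = sum(1 for p in rows if p.dst_port == probe_port)
    # Second clause (SrcPort == probe port): replies sent by this VM.
    replies = sum(1 for p in rows if p.src_port == probe_port)
    if incoming == 0:
        return "no probe traffic"
    if replies == 0:
        return "probes received but not answered"
    return "probes answered"
```

A result of "probes received but not answered" corresponds to the SynReTransmit case described above.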

Verify Listener Connectivity in Availability Group Listener Configuration


To determine whether the availability group listener is working properly on the Azure VMs, you use Network Monitor to filter your packet capture on the listener port.

When the client tries to connect to the availability group listener using the cloud service’s IP address and the listener port, Azure verifies that the connection port is the same as the one configured in the load-balanced endpoint and lets the TCP connection through to the primary replica, which has been responding to the probe pings. If the VM’s firewall has a corresponding rule, the client access point is configured properly, and the listener port is configured in your availability group, the availability group listener accepts the connection and the client can perform updates and queries. This traffic can be seen in Network Monitor by performing packet capture with the following filter applied in the Display Filter pane (assuming that the listener port is 10000):

TCP.DstPort == 10000 OR TCP.SrcPort == 10000


The screenshot below shows what it looks like when you successfully connect to the availability group listener in Azure and perform a simple query.



The Con Id column shows you packets related to the same client connection (shown in yellow). In the packets sent from the VM, you can see information regarding the client’s hostname, domain, and username (shown in red). You can also see the client’s IP address on the internet (shown in orange). In this case, it is a client VM in a different cloud service, so the IP address shown is the client VM’s cloud service IP address.

Troubleshoot Availability Group Listener Configuration in Azure


The table below lists some of the common symptoms when troubleshooting availability group listeners in Azure, and possible causes for each symptom.

Tip: The Windows Ping.exe command does not work on the availability group listener in Azure. The load-balanced endpoint only accepts TCP connections, while Ping.exe uses ICMP.
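
Because the endpoint accepts only TCP, a TCP connection attempt is a workable substitute for Ping.exe. The following Python sketch times a TCP handshake to a given host and port. The function name is made up for illustration, and the sketch only shows that something accepted the connection, not that SQL Server logins succeed.

```python
import socket
import time
from typing import Optional


def tcp_connect_time(host: str, port: int, timeout: float = 15.0) -> Optional[float]:
    """Return the seconds taken to complete a TCP handshake, or None on failure."""
    start = time.perf_counter()
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return None
```

For example, calling tcp_connect_time with your listener name and port 10000 (the example listener port used in this topic) returns None when the connection fails. For clients outside of Azure, the measured time may also help when you choose a login timeout.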



Symptom: No traffic on probe port (59999)

Possible causes:

  • Load-balanced endpoint is not configured

  • Probe port is not configured for load-balanced endpoint

  • Firewall on VM is not opened for probe port

Symptom: Probe port receives pings and SynReTransmit packets, but no replies

Possible causes:

  • Current VM is not primary replica

  • IP address resource in client access point is not configured with probe port, or a different probe port is specified in IP address resource than in load-balanced endpoint

  • IP address resource in client access point is offline

Comment: This symptom indicates that the probe port is configured properly in the load-balanced endpoint and that the VM firewall has allowed the incoming packet. To test whether the clustered service is listening on the intended port, run netstat -ab in a command prompt on the primary replica and search for rhs.exe in the list.

Symptom: No traffic on listener port

Possible causes:

  • DSR in load-balanced endpoint not set to true

  • Public and local ports on load-balanced endpoint are different (not supported)

  • A network access control list (ACL) is configured on the load-balanced endpoint, but the client’s public IP address is not allowed or not part of an allowed range

  • Client is not using port number in connection string or is using a different port number

  • Firewall on VM is not opened for listener port

Comment: The listener port should match the public/local port specified on the load-balanced endpoint.

Symptom: Listener port receives incoming traffic and SynReTransmit packets, but no replies

Possible causes:

  • Cluster’s client access point is configured with incorrect dependencies

  • IP address resource in client access point contains incorrect cluster network name (“Cluster Network <#>” by default)

  • Listener is not configured with a port number or is configured with an incorrect port number in SQL Server Management Studio

Comment: This symptom indicates that the load-balanced endpoint is configured properly and that the Azure load balancer has successfully routed the client’s connection request to the primary replica, but no listener is actively listening on that port. To test whether the listener is listening on the intended port, run netstat -ab in a command prompt on the primary replica and search for sqlservr.exe. The common mistake in configuring dependencies for the client access point is to set the availability group resource to depend on the IP address resource(s). Instead, you should configure the listener name to depend on the IP address(es) and configure the availability group resource to depend on the listener name.

Symptom: Listener only accessible from the primary replica node itself

Possible causes:

  • Client connection recognized local server as availability group resource owner and bypassed load-balanced endpoint entirely

  • Client does not reside in a separate cloud service (not supported)

Symptom: Client lost connectivity to listener after failover

Possible causes:

  • Firewall on new primary replica is not open for probe port (Azure load balancer cannot find new primary replica) or for listener port (client connection refused by firewall)

  • Client does not reside in a separate cloud service (not supported)

  • An availability group resource is offline

Symptom: All IP address resources in client access point are offline, but listener name is online

Possible causes:

  • Listener name is not configured to depend on IP address resources

Symptom: Listener name is offline, but availability group resource is online

Possible causes:

  • Availability group resource is not configured to depend on listener name

Symptom: At least one IP address is online, but listener name is offline

Possible causes:

  • Listener name is not set to OR for all IP addresses
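
The netstat -ab checks mentioned in the comments above can be scripted instead of searched by eye. The Python sketch below parses netstat text that you have already captured. It assumes the usual Windows layout, in which the owning executable appears in square brackets on a line after its connection entry. The function name is made up for illustration. Note that netstat -ab generally has to be run from an elevated command prompt.

```python
import re
from typing import Dict, Set

# A connection row: protocol, local address:port, foreign address, optional state.
CONN_RE = re.compile(r"^\s*(TCP|UDP)\s+(\S+):(\d+)\s+\S+\s*(\S*)")
# An owner row such as " [sqlservr.exe]".
OWNER_RE = re.compile(r"^\s*\[(.+?)\]\s*$")


def listening_ports_by_process(netstat_text: str) -> Dict[str, Set[int]]:
    """Map each executable name (lowercased) to the TCP ports it is listening on."""
    result: Dict[str, Set[int]] = {}
    pending = None  # port of the most recent LISTENING row awaiting its owner line
    for line in netstat_text.splitlines():
        conn = CONN_RE.match(line)
        if conn:
            proto, _addr, port, state = conn.groups()
            pending = int(port) if proto == "TCP" and state == "LISTENING" else None
            continue
        owner = OWNER_RE.match(line)
        if owner and pending is not None:
            result.setdefault(owner.group(1).lower(), set()).add(pending)
            pending = None
    return result
```

With the output of netstat -ab saved to a string, look up "rhs.exe" in the result to confirm the probe port, and "sqlservr.exe" to confirm the listener port.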


