Unable to access SQL Database connected via Service Endpoint after database failover occurs
Published Feb 15 2021 01:40 AM 4,153 Views
Microsoft

Some customer asked me about the following query around their creating system.

Query

"We are creating some system and using paired regions to secure redundancy of it. Each database instance (primary and replicated) is located on paired regions and these instances belong to a fail over group. When we tested database failover, applications in their environment could not access primary database instance. We don't think network is reachable since global peering is configured between virtual networks in each region. What is the root cause of this issue? How can we fix this issue?"

 

Backgrounds

As backgrounds are not clear to me, I asked the customer to share details about their system and the facing issue with me.

  • Their system is deployed to paired regions to secure the redundancy of their system.
  • Traffic Manager works in front of their system to balance incoming traffic. They use priority-based traffic-routing for load balancing. If some failure occurs in active region, Traffic Manager changes route of incoming traffic to another region. But endpoint monitoring path is context root of web application, and health check API has not been deployed yet.
  • Global peering between virtual networks in both regions is configured.
  • They use App Service instances to host their applications. Their App Service instances are integrated with virtual networks, and service endpoints for SQL Database instances are configured against the subnet where these app services are integrated. Also, service endpoints for App Service instances are configured in order to interact with other App Service instances.
  • They use SQL Database and instances on both regions belong to automatic failover group. As read-write/read-only listener is geo-independent, they don't have to modify database connection string used in applications whenever database failover occurs.
  • As of now, they don't mind that primary database region should be the same as the one where Traffic Manager routes incoming traffic. In other words, they think cross region connection is fine.

The following diagram reflects their comments and our hearing results.

 

image-4[1].png

 

Root cause

If you are familiar with Azure, you can detect the root cause of this issue easily. This is due to service endpoint limitation. For Azure SQL, a service endpoint applies only to Azure service traffic within a virtual network's region.

 

In the following case, applications in Region-A can access database instance in Region-A.

 

image-5[1].png

 

However, In the following case, applications in Region-A cannot access database instance in Region-B even if global peering is configured.

 

image-6[1].png

 

Solutions

There are several solutions for this case, but I provided the following options.

  • Using private link (Use private link along with global peering works, too.)
  • Modifying traffic routing rule

1. Using private link

If cross region connection is still fine, they can fix this issue by using private link instead of service endpoint.

 

Azure Private Link for Azure SQL Database and Azure Synapse Analytics

https://docs.microsoft.com/azure/azure-sql/database/private-endpoint-overview

 

The following document elaborates tips for integration between SQL Database and App Service.

 

Web app private connectivity to Azure SQL database

https://docs.microsoft.com/azure/architecture/example-scenario/private-web-app/private-web-app

 

When using private link, the diagram looks like this.

 

Logico_jp_0-1615336518167.png

 

In case of the following configuration, each Private DNS Zone to resolve SQL Database private endpoint(s) should be deployed in each resource group.

  • Global peering is not configured.
  • Two private endpoints for SQL Database(s) in both regions are located in each region.

If VNet global peering is configured among both regions, just one Private DNS Zone should be deployed in either resource group, and VNet link to the Private DNS Zone should be configured. Anyway, Web application can run in one region, while database can run in another region.

If more detail information for the latter case (global peering) is required, please check the following document.

 

Multi-region web app with private connectivity to database
https://docs.microsoft.com/azure/architecture/example-scenario/sql-failover/app-service-private-sql-...

 

When using private link, the following limitations should be considered.

 

Cost

Performance

2. Modifying traffic routing rule

In some cases, private link does not meet requirements. In this case, we should configure Traffic Manger to match the region to which Traffic Manager routes incoming traffic and database primary region. The diagram looks like this.

 

Logico_jp_1-1614255842673.png

 

To achieve this, the following configuration is required.

  • First of all, priority for active region is set smaller value (e.g. 50) , and the priority for the other region is set much bigger value (e.g. 1000). This configuration allows incoming traffic to be routed to the active region (smaller value is set as priority). For more details, see the following document.

Priority traffic-routing method
https://docs.microsoft.com/azure/traffic-manager/traffic-manager-routing-methods#priority-traffic-ro...

 

  • Then, health check API should be configured. The API checks if access between applications and databases is healthy. If heathy, the API returns HTTP 200, otherwise, it returns other than 200 (e.g. 503).
  • Following the document, traffic Manager is configured in order to use this API to monitor endpoint. If health check API returns other than 200, Traffic Manger directs incoming traffic to another region.

Configure endpoint monitoring

https://docs.microsoft.com/azure/traffic-manager/traffic-manager-monitoring#configure-endpoint-monit...

 

This concept has some limitations listed below.

  • Needless to say, health check API should be created. 
  • It takes some time to change routing region. Precisely, the minimum number of trials (from 0 to 9) to monitor endpoint by health check API and trial interval (default interval is 30 seconds, and 10 second interval is also available, but additional cost is required). For more details, see the following document.

Configure endpoint monitoring
https://docs.microsoft.com/azure/traffic-manager/traffic-manager-monitoring#configure-endpoint-monit...

 

Conclusion

In this case, I suggested both ways and asked this customer to make their decision. And last but not least, in this case, Traffic manager is used, but the concept of this solution is applicable in case of Azure Front Door.

Co-Authors
Version history
Last update:
‎Aug 31 2022 06:43 PM
Updated by: