Some customer asked me about the following query around their creating system.
Query
"We are creating some system and using paired regions to secure redundancy of it. Each database instance (primary and replicated) is located on paired regions and these instances belong to a fail over group. When we tested database failover, applications in their environment could not access primary database instance. We don't think network is reachable since global peering is configured between virtual networks in each region. What is the root cause of this issue? How can we fix this issue?"
Backgrounds
As backgrounds are not clear to me, I asked the customer to share details about their system and the facing issue with me.
- Their system is deployed to paired regions to secure the redundancy of their system.
- Traffic Manager works in front of their system to balance incoming traffic. They use priority-based traffic-routing for load balancing. If some failure occurs in active region, Traffic Manager changes route of incoming traffic to another region. But endpoint monitoring path is context root of web application, and health check API has not been deployed yet.
- Global peering between virtual networks in both regions is configured.
- They use App Service instances to host their applications. Their App Service instances are integrated with virtual networks, and service endpoints for SQL Database instances are configured against the subnet where these app services are integrated. Also, service endpoints for App Service instances are configured in order to interact with other App Service instances.
- They use SQL Database and instances on both regions belong to automatic failover group. As read-write/read-only listener is geo-independent, they don't have to modify database connection string used in applications whenever database failover occurs.
- As of now, they don't mind that primary database region should be the same as the one where Traffic Manager routes incoming traffic. In other words, they think cross region connection is fine.
The following diagram reflects their comments and our hearing results.
Root cause
If you are familiar with Azure, you can detect the root cause of this issue easily. This is due to service endpoint limitation. For Azure SQL, a service endpoint applies only to Azure service traffic within a virtual network's region.
In the following case, applications in Region-A can access database instance in Region-A.
However, In the following case, applications in Region-A cannot access database instance in Region-B even if global peering is configured.
Solutions
There are several solutions for this case, but I provided the following options.
- Using private link (Use private link along with global peering works, too.)
- Modifying traffic routing rule
1. Using private link
If cross region connection is still fine, they can fix this issue by using private link instead of service endpoint.
Azure Private Link for Azure SQL Database and Azure Synapse Analytics
https://docs.microsoft.com/azure/azure-sql/database/private-endpoint-overview
The following document elaborates tips for integration between SQL Database and App Service.
Web app private connectivity to Azure SQL database
https://docs.microsoft.com/azure/architecture/example-scenario/private-web-app/private-web-app
When using private link, the diagram looks like this.
In case of the following configuration, each Private DNS Zone to resolve SQL Database private endpoint(s) should be deployed in each resource group.
- Global peering is not configured.
- Two private endpoints for SQL Database(s) in both regions are located in each region.
If VNet global peering is configured among both regions, just one Private DNS Zone should be deployed in either resource group, and VNet link to the Private DNS Zone should be configured. Anyway, Web application can run in one region, while database can run in another region.
If more detail information for the latter case (global peering) is required, please check the following document.
Multi-region web app with private connectivity to database
https://docs.microsoft.com/azure/architecture/example-scenario/sql-failover/app-service-private-sql-multi-region
When using private link, the following limitations should be considered.
Cost
- Private link is not free of charge.
https://azure.microsoft.com/pricing/details/private-link/ - Costs for network traffic across regions are required.
Performance
- When connecting SQL Database, all connections are proxied via the Azure SQL Database gateways. That leads to poorer throughput than direct connection.
https://docs.microsoft.com/azure/azure-sql/database/connectivity-architecture#connection-policy - Network latency gets longer in case of cross region connection.
2. Modifying traffic routing rule
In some cases, private link does not meet requirements. In this case, we should configure Traffic Manger to match the region to which Traffic Manager routes incoming traffic and database primary region. The diagram looks like this.
To achieve this, the following configuration is required.
- First of all, priority for active region is set smaller value (e.g. 50) , and the priority for the other region is set much bigger value (e.g. 1000). This configuration allows incoming traffic to be routed to the active region (smaller value is set as priority). For more details, see the following document.
Priority traffic-routing method
https://docs.microsoft.com/azure/traffic-manager/traffic-manager-routing-methods#priority-traffic-routing-method
- Then, health check API should be configured. The API checks if access between applications and databases is healthy. If heathy, the API returns HTTP 200, otherwise, it returns other than 200 (e.g. 503).
- Following the document, traffic Manager is configured in order to use this API to monitor endpoint. If health check API returns other than 200, Traffic Manger directs incoming traffic to another region.
Configure endpoint monitoring
This concept has some limitations listed below.
- Needless to say, health check API should be created.
- It takes some time to change routing region. Precisely, the minimum number of trials (from 0 to 9) to monitor endpoint by health check API and trial interval (default interval is 30 seconds, and 10 second interval is also available, but additional cost is required). For more details, see the following document.
Configure endpoint monitoring
https://docs.microsoft.com/azure/traffic-manager/traffic-manager-monitoring#configure-endpoint-monitoring
Conclusion
In this case, I suggested both ways and asked this customer to make their decision. And last but not least, in this case, Traffic manager is used, but the concept of this solution is applicable in case of Azure Front Door.