Overview
One common pattern that we see is when customers want maximum in-region availability coupled with a cross region disaster recovery option. This typically manifests itself as, Zone Redundant HA in the primary region and a Read Replica in another region, as illustrated in the following diagram:
With ZRHA, failover between the Primary and Standby servers is automatically managed by the Azure platform and importantly, the service endpoint name does not change. However, currently with a regional failover, which is a manual process, the service endpoint name does change. A number of customers have expressed an interest in an option to perform regional failover without having to update application connection strings. By using the power and simplicity of DNS we will explain how you can ensure that connection string changes are not required when failing over between regions.
Setup
For the remainder of this article, we will use the following simplified architecture diagram as our starting point:
Service Name |
IP |
Server |
Region |
aue-primary-01.postgres.database.azure.com |
10.0.1.4 |
Primary |
Australia East |
ause-repl-01.postgres.database.azure.com |
192.168.1.4 |
Replica |
Australia Southeast |
We have a single Primary server, located in Australia East, which has a Replica in Australia Southeast. Both servers are setup using Private Access (VNET Injection) which uses RFC 1918 address spaces. In principle, this solution should work with Public Access servers also, but for the purposes of this article, we are focusing on the Private Access networking method.
For the above setup to work there is some plumbing that needs to be in place. Most of the configuration information can be found in the Networking overview - Azure Database for PostgreSQL - Flexible Server document.
Summary Prerequisites
- VNET Injection. Both servers are on separate regional VNETs
- VNET Peering. Those VNETs need to be peered VNET Peering
- Private DNS Zone. Both servers need to be using the same Private DNS Zone Using-a-private-dns-zone
- Replication. For Cross Region Replication to work. Ensure that the following section of documentation has been followed Replication Across Regions
Private DNS Zone
We have created a Private DNS Zone named 'flexserver.private.postgres.database.azure.com':
Both the Primary and Replica have been registered in this zone at server creation time and, as you can see, the servers are given randomly generated ‘Address’ records in the DNS zone:
‘A’ Record / Name |
IP |
Server |
b417c0001567.flexserver.private.postgres.database.azure.com |
10.0.1.4 |
Primary |
d1433fcf5a76.flexserver.private.postgres.database.azure.com |
192.168.1.4 |
Replica |
It is possible to resolve these names from within either VNET. For example, here we have a Linux VM, which happens to be on the Australia East VNET, which can resolve the names of both the service name or the private DNS zone name of each of the servers. For clarity, this Linux VM is simply being used here to host the ‘psql’ binary that we are using as our "application" in this article and is not in any way required for the failover;
Not only name resolution, but courtesy of our VNET peering we can also connect to either database. First, the primary's service name and then the Private DNS Zone alias making use of the PostgreSQL command line application ‘psql’ and requesting full verification of the server certificate:
And next, the replica, again using both the service name and Private DNS Zone alias:
To recap, we have setup a Primary and a Replica in another region using Private Access networking, standard VNET peering and Private DNS Zone features. We then verified that we could connect to either database using the service name, or the name allocated by the Private DNS zone. So the question remains, how can I failover to the replica database, for example in a DR drill, and allow my application to connect to this promoted replica, without making any changes to the application configuration? The answer it turns out, is pretty simple…..
In addition to typical DNS record types of ‘A’ (Address) and ‘PTR’ (Pointer) there is a useful record type that can be used as an “alias” and effectively point to another DNS entry. It is a ‘CNAME’ record type and we will show you how to configure one so that it can point to either database in our setup.
For our example we will create a CNAME record with value ‘prod’ that points at the ‘A’ record for our Primary server. Inside the Private DNS Zone you can ‘+ Record Set’ and add a CNAME like so:
Note that the default TTL will be 1 hour and you may want to reduce this to prevent DNS clients and applications caching the answer for too long. This can be significant during or after a failover. Once the CNAME record has been added the DNS zone looks like this;
Notice that the new ‘prod’ name points to the ‘A’ record for the primary server. Let us now verify that we can use the CNAME record to connect to the primary database;
When we try and connect with full certificate verification this will fail as the certificate does not match the CNAME record 'prod.flexserver.private.postgres.database.azure.com'. You can see what names are provided by the certificate by dumping the text from the certificate using the openssl tool like this:
You can see from the output above that there are 3 acceptable DNS records baked into the certificate, and our CNAME is not one of them. Therefor, using 'sslmode' of 'verify-full' fails. The workaround for this is to downgrade the certificate requirement to anything less than verify-full, for example, using 'verify-ca' results in a successful connection as show below:
The PostgreSQL Document on SSL covers the available options for 'sslmode' and what the difference are between modes. It is available here PostgreSQL: Documentation: 15: 34.19. SSL Support
When you are using Azure Private networking and Azure Private DNS Zones, as we are in this example, it is perfectly acceptable to set sslmode to 'verify-ca'.
It is also possible to edit the CNAME DNS record. In our case we are going to point it to the replica:
After saving the updated CNAME when we connect to ‘prod’, it will actually be the replica, which is in READ-ONLY mode and we can verify that by trying a write operation, such as creating a table:
Sure enough, the CNAME ‘prod’ is now pointing at the replica as we expected.
With what we have learnt so far, we can see that the flexibility of Azure Private DNS and CNAME records is ideal for this use case. The last step will be to perform the failover and complete our testing.
In the Azure portal we can navigate to the ‘Replication’ blade of either the Replica or the Standby and ‘Promote’ the Replica:
After clicking ‘Promote’, this window will appear:
Once the newly promoted Replica is available, we verify the following;
- The CNAME record points to the Replica (now Primary)
- The database is writeable
From an application perspective (our application is psql in this article), we haven’t had to make any changes to connect to our database regardless of which region is hosting the workload. This method can be easily integrated within DR procedures or failover testing. Making use of the Azure CLI to semi-automate these changes is also possible and could possibly reduce the likelihood of human errors associated with changing DNS records. However, DNS changes are, in general, much simpler than making application configuration changes.
For reference, here is a link to a great article showing how to configure the 'sslmode' parameter for Java applications connecting with JDBC to Azure PostgreSQL Flexible Server:
Updated Apr 13, 2023
Version 2.0bmckerr-MSFT
Microsoft
Joined November 30, 2021
Azure Database for PostgreSQL Blog
Follow this blog board to get notified when there's new activity