Failover Between Regions with Azure PostgreSQL Flexible Server without connection string changes
Published Mar 16 2023 07:16 PM
Microsoft

Overview

One common pattern we see is customers wanting maximum in-region availability coupled with a cross-region disaster recovery option. This typically takes the form of Zone Redundant HA (ZRHA) in the primary region and a Read Replica in another region, as illustrated in the following diagram:

 

bmckerrMSFT_0-1681347450461.png

 

With ZRHA, failover between the Primary and Standby servers is managed automatically by the Azure platform and, importantly, the service endpoint name does not change. A regional failover, however, is currently a manual process, and the service endpoint name does change. A number of customers have expressed an interest in an option to perform regional failover without having to update application connection strings. We will explain how, using the power and simplicity of DNS, you can ensure that connection string changes are not required when failing over between regions.

Setup

For the remainder of this article, we will use the following simplified architecture diagram as our starting point:

 

bmckerrMSFT_1-1678833805241.png

 

 

Service Name | IP | Server | Region
aue-primary-01.postgres.database.azure.com | 10.0.1.4 | Primary | Australia East
ause-repl-01.postgres.database.azure.com | 192.168.1.4 | Replica | Australia Southeast

 

We have a single Primary server, located in Australia East, which has a Replica in Australia Southeast. Both servers are set up using Private Access (VNET Injection), which uses RFC 1918 address spaces. In principle, this solution should also work with Public Access servers, but for the purposes of this article we are focusing on the Private Access networking method.

 

For the above setup to work there is some plumbing that needs to be in place. Most of the configuration information can be found in the Networking overview - Azure Database for PostgreSQL - Flexible Server document.

Summary Prerequisites

 

  • VNET Injection. Both servers are on separate regional VNETs.
  • VNET Peering. Those VNETs need to be peered (see "VNET Peering").
  • Private DNS Zone. Both servers need to use the same Private DNS Zone (see "Using a private DNS zone").
  • Replication. For cross-region replication to work, ensure that the "Replication Across Regions" section of the documentation has been followed.

 

Private DNS Zone

We have created a Private DNS Zone named 'flexserver.private.postgres.database.azure.com':

 

bmckerrMSFT_0-1681275146803.png

 

 

Both the Primary and Replica have been registered in this zone at server creation time and, as you can see, the servers are given randomly generated ‘Address’ records in the DNS zone:

 

‘A’ Record / Name | IP | Server
b417c0001567.flexserver.private.postgres.database.azure.com | 10.0.1.4 | Primary
d1433fcf5a76.flexserver.private.postgres.database.azure.com | 192.168.1.4 | Replica

 

These names can be resolved from within either VNET. For example, here we have a Linux VM on the Australia East VNET that can resolve both the service name and the Private DNS Zone name of each server. For clarity, this Linux VM is used here only to host the ‘psql’ binary that serves as our "application" in this article; it is not required for the failover:

 

bmckerrMSFT_2-1678833805246.png
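The lookups in the screenshot were run from the VM's shell. As a rough equivalent, here is a minimal Python sketch of the same check. The host names come from the tables above; the helper name `resolve_ipv4` and the injectable `getaddrinfo` parameter are illustrative choices, not part of the original setup, and port 5432 is assumed as the PostgreSQL default.

```python
import socket

# Host names taken from the tables in this article.
NAMES = [
    "aue-primary-01.postgres.database.azure.com",
    "b417c0001567.flexserver.private.postgres.database.azure.com",
    "ause-repl-01.postgres.database.azure.com",
    "d1433fcf5a76.flexserver.private.postgres.database.azure.com",
]


def resolve_ipv4(name, getaddrinfo=socket.getaddrinfo):
    """Return the de-duplicated IPv4 addresses for a name, or [] if it does not resolve.

    The resolver is injectable so the function can be exercised without DNS access.
    """
    try:
        infos = getaddrinfo(name, 5432, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def resolution_report(names, getaddrinfo=socket.getaddrinfo):
    """Map every name to the list of addresses it resolves to."""
    return {name: resolve_ipv4(name, getaddrinfo) for name in names}
```

Calling `resolution_report(NAMES)` from a machine on either peered VNET should list an address for all four names if the Private DNS Zone is linked correctly; an empty list points at a DNS or VNET link problem rather than a database problem.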

 

Beyond name resolution, VNET peering also lets us connect to either database. First we connect to the primary using its service name and then its Private DNS Zone name, using the PostgreSQL command line application ‘psql’ and requesting full verification of the server certificate:

 

bmckerrMSFT_1-1681276117502.png

 

And next, the replica, again using both the service name and Private DNS Zone alias:

 

bmckerrMSFT_3-1681276381561.png
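The exact psql invocations are only visible in the screenshots, so the following is a sketch of how such a connection could be assembled, not a transcript of them. It builds a libpq keyword/value connection string with an explicit `sslmode`. The user name `myadmin`, the database `postgres`, and the helper names are assumptions made for illustration.

```python
ALLOWED_SSLMODES = {"disable", "allow", "prefer", "require", "verify-ca", "verify-full"}


def build_conninfo(host, user, dbname="postgres", sslmode="verify-full",
                   port=5432, sslrootcert=None):
    """Build a libpq connection string.

    No password is embedded; libpq can read it from PGPASSWORD or ~/.pgpass.
    Values are assumed to contain no spaces, so no quoting is applied.
    """
    if sslmode not in ALLOWED_SSLMODES:
        raise ValueError(f"unknown sslmode: {sslmode}")
    parts = [
        f"host={host}",
        f"port={port}",
        f"dbname={dbname}",
        f"user={user}",
        f"sslmode={sslmode}",
    ]
    if sslrootcert is not None:
        # Path to the CA bundle used to validate the server certificate.
        parts.append(f"sslrootcert={sslrootcert}")
    return " ".join(parts)


def psql_argv(conninfo, sql="SELECT 1"):
    """Argument vector that runs a single statement with psql."""
    return ["psql", conninfo, "-c", sql]
```

For example, `psql_argv(build_conninfo("aue-primary-01.postgres.database.azure.com", "myadmin"))` yields a command that requests full certificate verification against the primary's service name; swapping in the replica's name or either Private DNS Zone name exercises the other three connections.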

 

To recap, we have set up a Primary and a Replica in another region using Private Access networking, standard VNET peering and Private DNS Zone features. We then verified that we could connect to either database using the service name or the name allocated by the Private DNS Zone. So the question remains: how can I fail over to the replica database, for example in a DR drill, and allow my application to connect to the promoted replica without making any changes to the application configuration? The answer, it turns out, is pretty simple.

 

In addition to typical DNS record types of ‘A’ (Address) and ‘PTR’ (Pointer) there is a useful record type that can be used as an “alias” and effectively point to another DNS entry. It is a ‘CNAME’ record type and we will show you how to configure one so that it can point to either database in our setup.

 

For our example we will create a CNAME record named ‘prod’ that points at the ‘A’ record for our Primary server. Inside the Private DNS Zone, select ‘+ Record Set’ and add a CNAME like so:

 

bmckerrMSFT_5-1678833805262.png

 

 

Note that the default TTL is 1 hour; you may want to reduce this to prevent DNS clients and applications from caching the answer for too long, which can be significant during or after a failover. Once the CNAME record has been added, the DNS zone looks like this:

 

bmckerrMSFT_6-1678833805267.png
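The article creates the record in the portal. The same step could plausibly be scripted with the Azure CLI; the sketch below only builds the `az network private-dns record-set cname` argument vectors so they can be reviewed before anything is run. The resource group name and the 30 second TTL are assumptions, not values from the original setup.

```python
import subprocess

ZONE = "flexserver.private.postgres.database.azure.com"


def cname_create_argv(resource_group, record_name, ttl_seconds, zone=ZONE):
    """Command that creates an empty CNAME record set with an explicit TTL."""
    return [
        "az", "network", "private-dns", "record-set", "cname", "create",
        "--resource-group", resource_group,
        "--zone-name", zone,
        "--name", record_name,
        "--ttl", str(ttl_seconds),
    ]


def cname_set_argv(resource_group, record_name, target_fqdn, zone=ZONE):
    """Command that sets the alias target of a CNAME record set."""
    return [
        "az", "network", "private-dns", "record-set", "cname", "set-record",
        "--resource-group", resource_group,
        "--zone-name", zone,
        "--record-set-name", record_name,
        "--cname", target_fqdn,
    ]


def run_all(commands, run=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        run(argv, check=True)
```

A call such as `run_all([cname_create_argv("my-rg", "prod", 30), cname_set_argv("my-rg", "prod", "b417c0001567." + ZONE)])` would create ‘prod’ and point it at the Primary's ‘A’ record, assuming the CLI is installed and logged in. A short TTL trades slightly more DNS traffic for faster client pickup after a failover.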

 

Notice that the new ‘prod’ name points to the ‘A’ record for the primary server. Let us now verify that we can use the CNAME record to connect to the primary database:

 

bmckerrMSFT_1-1681347853721.png

 

When we try and connect with full certificate verification this will fail as the certificate does not match the CNAME record 'prod.flexserver.private.postgres.database.azure.com'. You can see what names are provided by the certificate by dumping the text from the certificate using the openssl tool like this:

 

bmckerrMSFT_2-1681348130198.png

 

You can see from the output above that there are 3 acceptable DNS names baked into the certificate, and our CNAME is not one of them. Therefore, using an 'sslmode' of 'verify-full' fails. The workaround is to lower the certificate requirement to a mode below verify-full. For example, 'verify-ca' still validates the certificate chain against a trusted CA but skips the host name check, and results in a successful connection as shown below:

 

bmckerrMSFT_3-1681348278615.png
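To make the failure mode concrete, here is a simplified sketch of the host name check that 'verify-full' adds on top of 'verify-ca'. It is an approximation of the rule described in the PostgreSQL documentation (a leading `*` matches one label and never a dot), not libpq's actual code. The certificate's real names are only visible in the screenshot, so the `EXAMPLE_SANS` list below is hypothetical.

```python
def hostname_matches(hostname, pattern):
    """Simplified match: '*' may only stand in for the entire left-most label."""
    host_labels = hostname.lower().rstrip(".").split(".")
    pattern_labels = pattern.lower().rstrip(".").split(".")
    if len(host_labels) != len(pattern_labels):
        # A wildcard never spans a dot, so the label counts must agree.
        return False
    if pattern_labels[0] == "*":
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


def verify_full_would_accept(hostname, san_dns_names):
    """True if any DNS name in the certificate covers the host name used to connect."""
    return any(hostname_matches(hostname, pattern) for pattern in san_dns_names)


# Hypothetical certificate names, for illustration only.
EXAMPLE_SANS = [
    "*.postgres.database.azure.com",
    "b417c0001567.flexserver.private.postgres.database.azure.com",
]
```

With this hypothetical list, the service name and the ‘A’ record name are accepted, while `prod.flexserver.private.postgres.database.azure.com` is rejected: it has more labels than the wildcard covers and is not listed by name. That mirrors why the CNAME connection fails under 'verify-full' even though the server behind it is the same.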

 

The PostgreSQL documentation on SSL covers the available options for 'sslmode' and the differences between the modes. It is available here: PostgreSQL: Documentation: 15: 34.19. SSL Support

 

bmckerrMSFT_4-1681348740954.png

 

When you are using Azure Private networking and Azure Private DNS Zones, as we are in this example, it is perfectly acceptable to set sslmode to 'verify-ca'.

 

It is also possible to edit the CNAME DNS record. In our case we are going to point it to the replica:

 

bmckerrMSFT_5-1681349055750.png

 

After saving the updated CNAME when we connect to ‘prod’, it will actually be the replica, which is in READ-ONLY mode and we can verify that by trying a write operation, such as creating a table:

 

bmckerrMSFT_6-1681349152566.png
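The screenshot verifies read-only mode by attempting a write. A less intrusive alternative, and not what the article itself does, is to ask the server whether it is in recovery with the built-in `pg_is_in_recovery()` function. The sketch below assumes psql is on the PATH and that `conninfo` is a working connection string; the helper names are illustrative.

```python
import subprocess


def recovery_check_argv(conninfo):
    """psql command whose only output is 't' (replica) or 'f' (primary).

    -A turns off aligned output and -t suppresses headers and footers.
    """
    return ["psql", conninfo, "-At", "-c", "SELECT pg_is_in_recovery()"]


def parse_is_replica(psql_output):
    """Interpret psql's boolean output; raise if it is anything unexpected."""
    value = psql_output.strip()
    if value == "t":
        return True
    if value == "f":
        return False
    raise ValueError(f"unexpected psql output: {psql_output!r}")


def is_read_only_replica(conninfo, run=subprocess.run):
    """Run the check and return True when the server is still a replica."""
    result = run(recovery_check_argv(conninfo), capture_output=True, text=True, check=True)
    return parse_is_replica(result.stdout)
```

Run against ‘prod’ right after the CNAME edit, this would be expected to return True while the record points at the replica, and False once the replica has been promoted.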

 

Sure enough, the CNAME ‘prod’ is now pointing at the replica as we expected.

 

With what we have learnt so far, we can see that the flexibility of Azure Private DNS and CNAME records is ideal for this use case. The last step will be to perform the failover and complete our testing.

 

In the Azure portal we can navigate to the ‘Replication’ blade of either the Primary or the Replica and ‘Promote’ the Replica:

 

bmckerrMSFT_10-1678833805281.png

 

After clicking ‘Promote’, this window will appear:

bmckerrMSFT_11-1678833805286.png

 

Once the newly promoted Replica is available, we verify the following:

  • The CNAME record points to the Replica (now Primary)
  • The database is writeable

 

bmckerrMSFT_7-1681349463412.png

 

From an application perspective (our application is psql in this article), we haven’t had to make any changes to connect to our database, regardless of which region is hosting the workload. This method can be easily integrated into DR procedures or failover testing. Using the Azure CLI to semi-automate these changes is also possible and could reduce the likelihood of human error when changing DNS records. In general, though, DNS changes are much simpler than application configuration changes.
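As one possible shape for that semi-automation, the sketch below maps each region to its ‘A’ record from this article and builds the Azure CLI command that repoints ‘prod’. It covers only the DNS change; promoting the replica is still done separately. The region keys, the resource group name, and the helper names are assumptions for illustration.

```python
import subprocess

ZONE = "flexserver.private.postgres.database.azure.com"

# 'A' record names from this article, keyed by an illustrative region label.
A_RECORDS = {
    "australiaeast": "b417c0001567." + ZONE,
    "australiasoutheast": "d1433fcf5a76." + ZONE,
}


def repoint_cname_argv(resource_group, region, record_name="prod"):
    """Command that points the CNAME at the server in the given region."""
    if region not in A_RECORDS:
        raise ValueError(f"unknown region: {region}")
    return [
        "az", "network", "private-dns", "record-set", "cname", "set-record",
        "--resource-group", resource_group,
        "--zone-name", ZONE,
        "--record-set-name", record_name,
        "--cname", A_RECORDS[region],
    ]


def repoint_cname(resource_group, region, run=subprocess.run):
    """Execute the repoint; raises if the CLI call fails."""
    return run(repoint_cname_argv(resource_group, region), check=True)
```

In a DR runbook, `repoint_cname("my-rg", "australiasoutheast")` would follow the promotion step, and the same call with `"australiaeast"` would be used when failing back. Restricting the target to a fixed table of known records is a deliberate choice to rule out typos in the alias.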

 

For reference, here is a link to a great article showing how to configure the 'sslmode' parameter for Java applications connecting with JDBC to Azure PostgreSQL Flexible Server:

Secure your Java application connections to Flexible Server via JDBC and SSL - Microsoft Community H...

 

 

 

Last update: Apr 12 2023 06:32 PM