Lesson Learned #408: The Strange Case of the Failover Conundrum
Published Jul 28 2023 05:50 PM 1,729 Views

Our protagonist's application is equipped with a failover group, a mechanism enabling automatic redirection to a secondary database if the primary one becomes unavailable. This group comprises two servers named "XYZWE.database.windows.net" in West Europe and "XZNE.database.windows.net" in North Europe. The designated endpoint for failover is "XYZfog.database.windows.net."

To optimize connection management and enhance performance, we implemented HikariCP, a powerful connection pooler designed for Java applications.

 

Things seemed to be running smoothly as the application connected to the primary database through HikariCP's pool of connections. However, when testing a planned or forced failover event, the database infrastructure initiates the process of changing roles for both servers. 

 

During this transition, a challenge arises. Depending on the speed of the primary-to-secondary database role change and the DNS update to point to the new primary, HikariCP may still rely on cached connections to the secondary. Consequently, the application could mistakenly connect to the secondary database, which might be in a READ_ONLY state, restricting operations to read-only.

 

Checking the network traces we saw:

 

  • HikariCP established the connection to XYZfog.database.windows.net is pointing to IP of XYZWE.database.windows.net (Primary).
  • HikariCP cached the connection and maintain the connection for the lifetime configured
  • We performed a planned / unplanned failover:
    • Azure SQL Database sent the TCP Reset to the connection
    • XYZfog.database.windows.net will point to IP of XYZNE.database.windows.net as a new primary
  • However, HikariCP receives a lot of new connections meanwhile the DNS is updating (a couple of seconds) and is still pointing XYZfog.database.windows.net to XYZWE.database.windows.net the connection is stablished to the secondary in Readonly. 
  • After a couple of minutes depending HikariCP or when the internal DNS cache might be updated, the new connections will be pointing to  XYZNE.database.windows.net.

 

To address this situation in terms of our application: 

 

  1. Check for READ_ONLY mode: Before using the connection acquired from the pool, verify if the connection is in READ_ONLY mode. You can achieve this by executing a simple query on the connection to check the updateability of the database.

  2. Close and Reopen Connection: If the connection is in READ_ONLY mode, close the connection and obtain a new one from the pool. This ensures that HikariCP acquires a new valid connection to the new primary database.

 

Please note that this approach is general and may require adjustments to fit your specific requirements and application context. Implementing failover handling logic may become more complex based on the details and environment of your application.

 

By incorporating this logic, you can ensure that HikariCP always employs a valid connection to the primary database following a planned or forced failover, avoiding any potential issues with READ_ONLY connections.

 

I didn't test this situation in .NET or ODBC, but, perhaps might be the same behavior. 

 

 

 

import java.sql.Connection;
import java.sql.SQLException;

// Get a connection from the pool
Connection connection = dataSource.getConnection();
try {
    // Check if the connection is in READ_ONLY mode
    if (isReadOnlyConnection(connection)) {
        // Close the connection and obtain a new one to get a valid connection
        connection.close();
        connection = dataSource.getConnection();
    }

    // Use the connection for your database operations
    // ...
} catch (SQLException e) {
    // Handle errors
    e.printStackTrace();
} finally {
    // Make sure to close the connection when you are done with it
    if (connection != null) {
        connection.close();
    }
}

// Method to check if the connection is in READ_ONLY mode
private boolean isReadOnlyConnection(Connection connection) throws SQLException {
    try (java.sql.Statement statement = connection.createStatement()) {
        java.sql.ResultSet resultSet = statement.executeQuery("SELECT DATABASEPROPERTYEX(DB_NAME(), 'Updateability')");
        if (resultSet.next()) {
            String updateability = resultSet.getString(1);
            return "READ_ONLY".equals(updateability);
        }
        return false;
    }
}

 

 

 

Enjoy!

Version history
Last update:
‎Jul 28 2023 10:51 AM
Updated by: