Identify and solve connection performance in Azure Postgres

Microsoft

Dec 15, 2022

One challenge with PostgreSQL is that on a busy database, it consumes significant amount of resources when handling large number of concurrent connections. A common symptoms on a busy servers with high connections count is high CPU utilization and connection latency. This is common with applications that have client connections for noticeably short durations (less than 60s). In this scenario, the time spent on processing connections and disconnections can affect the overall database performance. In-fact effective connection management is among the top most discussed issue by our

customer. In my earlier post titled “How to leverage built-in connection pooler (PgBouncer) in Azure PostgreSQL Flexible Server”, we have learned how can we reduce postgres connection overhead by using the connection pooling. In this post we are going learn how to query connections, along with their respective state and duration.

Connection management in Postgres is an exposed nerve. In summary, each connection in Postgres consumes significant amount of resources from your database, whether idle or active. Besides high CPU utilization, this can also cause other performance issues such as disk contention, lock contention et. described in this article Number Of Database Connections

How to find connections and their state?

SELECT count(*),
       state
FROM pg_stat_activity
GROUP BY 2;

 count |             state
-------+-------------------------------
     6 | active
    20 | idle
    16 | idle in transaction

There are 3 common states for a postgres connection. Each of these is useful in deciding what you should do to better manage your connections. All these numbers can be useful to track every 30 seconds and chart on your own monitoring. Let us break them down:

Active - This is currently running queries; in a sense this is truly how many connections you may need at a time.
Idle - This is where you have opened a connection to the database, but nothing is happening. This is the one area where a connection pooler like PgBouncer can help most.
Idle In Transaction - This is where your app has run a BEGIN but it is now waiting somewhere in a transaction and not doing work.

Short-lived Vs Long-lived connections?

A 'short-lived' connection is one that is only open for a brief period (less than a minute). This is typically the case with highly transactional applications, where the client opens a connection, executes a simple query (read/write), and then closes the connection. If your application is designed around 'short-lived' connections and you expect many queries from different client sessions, then may benefit using the connection pooling.

Similarly, a connection opened for more than 10 minutes can be categorized as ‘long-lived’. For example, one might have a web-server or an app-server opening many connections to the Azure Postgres and keeping them open until the client (Web/App server) is stopped.

The maximum number of connections (short-lived or long-lived) a server allows to connect simultaneously is decided by the max_connections server parameter, configurable by the user. When this limit is reached the server will not accept any new connections until one of the existing clients disconnect.

How to find if you have ‘short-lived’ connections in your database?

Kusto Query Language (KQL) is a powerful pipeline-driven, read-only query language that enables querying Azure Service Logs. KQL supports querying Azure logs to quickly analyze a high volume of data. For this article, we will use the KQL to query Azure Postgres Logs and find connections aggregated by their age.

Prerequisites:

Enable Azure Postgres logs to be sent to Azure log analytics - Configure Log Analytics
Adjust “log_line_prefix” server parameter:
- From the Server Parameters blade - set the "log line prefix" to include the escapes "user=%u,db=%d,session=%c,sess_time=%s" in any sequence, in order to get the desired results
- Example
  - Before: log_line_prefix = "%t-%c-"
  - After: log_line_prefix = "%t-%c-user=%u,db=%d,session=%c,sess_time=%s"

1. Query ‘short-lived’ connections in Azure postgres Flexible Server

AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where TimeGenerated >= ago(2d)
| where Message contains "disconnection: session time"
| extend pgmessage = tostring(split(Message, "disconnection: session time: ")[-1])
| extend myuser = tostring(split(tostring(split(pgmessage, " database=")[-2]), " user=")[-1])
|extend hours = todecimal(substring(pgmessage, 0, 1))
| extend minutes = todecimal(substring(pgmessage, 2, 2))
| extend seconds = todecimal(substring(pgmessage, 5, 2))
| extend milliseconds = todecimal(substring(pgmessage, 7, 4))
| extend connection_life_seconds = hours*60*60+minutes*60+seconds+milliseconds
| where myuser != 'azuresu'
| extend connection_type = case(connection_life_seconds < 60 , strcat("Short Live Connection"), connection_life_seconds between (60 .. 1200) , strcat("Normal Live Connection"),connection_life_seconds >1200, strcat("Long Live Connections"), "") 
| summarize max(connection_life_seconds) by TimeGenerated,connection_type,myuser
| render timechart

2. Query ‘short-lived’ connections in Azure postgres Single Server

AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLLogs"
| where TimeGenerated >= ago(2d)
| where Message contains "disconnection: session time"
| extend pgmessage = tostring(split(Message, "disconnection: session time: ")[-1])
| extend myuser = tostring(split(tostring(split(pgmessage, " database=")[-2]), " user=")[-1])
|extend hours = todecimal(substring(pgmessage, 0, 1))
| extend minutes = todecimal(substring(pgmessage, 2, 2))
| extend seconds = todecimal(substring(pgmessage, 5, 2))
| extend milliseconds = todecimal(substring(pgmessage, 7, 4))
| extend connection_life_seconds = hours*60*60+minutes*60+seconds+milliseconds
| where myuser != 'azure_superuser'
| extend connection_type = case(connection_life_seconds < 60 , strcat("Short Live Connection"), connection_life_seconds between (60 .. 1200) , strcat("Normal Live Connection"),connection_life_seconds >1200, strcat("Long Live Connections"), "") 
| summarize max(connection_life_seconds) by TimeGenerated,connection_type,myuser
| render timechart

3. Query connections using PgBouncer connection pooling

AzureDiagnostics
| where ResourceProvider =="MICROSOFT.DBFORPOSTGRESQL"
| where Category == "PostgreSQLFlexPGBouncer"
| where log_s contains "login attempt:" 
| extend pglogs = tostring(split(log_s, "@")[-1])
| extend myip = tostring(split(tostring(split(tostring(split(pglogs, " login attempt")[-2]), " user=")[-1]),":")[-2])
| extend myuser = tostring(split(tostring(split(pglogs, " tls=")[-2]), " user=")[-1])
| where myuser != 'azuresu'
| order by TimeGenerated asc
| project TimeGenerated,myuser,myip
| summarize count(myuser) by myip,myuser
| render timechart

Okay – so I have ‘short-lived’ connections in my database, what should I do next?

First do not panic—we got you covered.

From the explanation above, it is abundantly clear that connection management is a significant overhead in Postgres. Instead of opening and closing connections for each new request, one option is to consider connection pooling. Connection poolers (like pgpool and pgbouncer) use a cache of database connections that can be reused when future requests to the database are required. This smaller number of database connections helps reduce the context switching and resource contention, and therefore improves the overall performance. This allows you to scale your database connections, while keeping the connection latency low.

In Azure Database for PostgreSQL – Flexible Server, we offer a ‘built-in’ PgBouncer, that can be simply enabled using a parameter switch, and updating the app connection string to connect to PgBouncer on port 6432, rather than directly to the database server. For more details on the Azure Postgres PgBouncer configurations, please refer documentation.

Conclusion

This post explains that a higher number of database connections doesn’t scale your application throughput. Instead, as you increase the number of database connections, the context switching and resource contention increases, which negatively impacts performance.

If your application is designed to open lot of short-lived connections, irrespective they’re active or idle, you should consider making changes so that your memory and CPU resources aren’t wasted just to manage these connections. The next step could be limiting the number of concurrent connections to the database. However, if doing that isn’t feasible, then you may consider using a connection pooler. In Azure Database for PostgreSQL Flexible Server, PgBouncer is available as a ‘built-in’ option, making it extremely easy to reap the benefits for benefit connection pooling with a few clicks.