Not a week goes by that I don't have a conversation about connection management in Postgres. Connection management is admittedly a sore spot for Postgres, and a confusing one for many. The cost of connections in Postgres has been talked about a few times before. The short summary is that each connection into Postgres consumes roughly 10 MB of memory on your database. If you have 50 connections, you're using about 500 MB of memory just to hold those connections open. The wasted memory is one thing, but even a connection that isn't doing anything imposes some additional bookkeeping overhead on Postgres itself.
Another detail of connections, especially within the cloud, is security. You want your database secured, and when you connect with SSL required there is a negotiation that takes place on each connection. This can push the cost of connecting to a database to tens or even hundreds of milliseconds. When your application handles many short-lived requests, this adds up quickly. You could easily spend 80 milliseconds establishing a connection, only for your query to execute in 3 ms and the total response to the customer to complete in under 100 ms. Yet most of your time was wasted on simply connecting to the database.
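To make that arithmetic concrete, here is a quick sketch using the numbers above (80 ms to connect, 3 ms for the query itself):

```python
connect_ms = 80.0  # SSL negotiation + connection setup (from the example above)
query_ms = 3.0     # actual query execution time
total_ms = connect_ms + query_ms
overhead = connect_ms / total_ms

print(f"{total_ms:.0f} ms total, {overhead:.0%} spent just connecting")
# → 83 ms total, 96% spent just connecting
```

In this scenario the query itself accounts for less than 4% of the response time; everything else is connection setup.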
The common recommendation to fix this is connection pooling. A connection pool is a cache of connections that get reused by future requests. This means that when a new request comes in, the application server already has a connection sitting there available.
The approach you take to connection pooling will significantly impact how healthy your database is over the long term. Most take the quick approach of setting up connection pooling within their application framework. For example, with Ruby on Rails you can set your connection pool by simply setting the `pool` value in your database configuration. This means each running Rails server has its own fixed pool. When a new request comes in, it takes a connection from the pool and gives it to the running web process; when the request is done, the connection is returned to the pool. There is no expensive connection setup on each request.
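As a sketch, in a standard Rails app this is configured in `config/database.yml`; the host and database names here are placeholders:

```yaml
# config/database.yml — illustrative values only
production:
  adapter: postgresql
  host: db.example.com        # hypothetical database host
  database: myapp_production  # hypothetical database name
  pool: 5                     # each Rails process holds up to 5 connections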
Let’s look at an example. Below is a simplified architecture: we have 4 Rails web servers running, each with a connection pool of 5.
The issue here is that you have a lot of connections open, consuming database resources without ever being used. As you can see in the example above, out of 20 "total connections" only 6 are in use, leaving 14 idle. At roughly 10 MB each, those idle connections waste about 140 MB of memory.
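To see the same breakdown on your own database, a query against `pg_stat_activity` gives active vs. idle counts (this sketch assumes Postgres 10 or later; on older versions, drop the `backend_type` condition):

```sql
-- Count client connections by state; idle ones hold memory without doing work
SELECT count(*) FILTER (WHERE state = 'active') AS active,
       count(*) FILTER (WHERE state = 'idle')   AS idle,
       count(*)                                  AS total
FROM pg_stat_activity
WHERE backend_type = 'client backend';
```

If `idle` dwarfs `active`, you are paying the memory cost described above for connections that are doing nothing.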
Enter pgbouncer. Pgbouncer is a connection pooler that sits between your application and the database. When it needs a new connection to the database it grabs one, then continues to reuse it; after a certain period of idleness it releases that connection. What this means is that when your application grabs a connection and doesn’t use it, that connection isn’t passed on and held as an idle connection on the database. You can see how this changes what your database actually sees below.
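A minimal pgbouncer configuration illustrating this might look like the following; the database name, paths, and sizes are placeholders, not recommendations:

```ini
; pgbouncer.ini — illustrative sketch
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp_production

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction     ; return server connections to the pool at transaction end
default_pool_size = 20      ; max server connections per database/user pair
server_idle_timeout = 600   ; close server connections idle for 10 minutes
```

Your application then connects to port 6432 instead of 5432, and pgbouncer multiplexes those client connections over a much smaller set of actual server connections.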
Many databases can benefit from connection pooling with pgbouncer even if they already have connection pooling set up at the application level. This can start paying off as early as tens of connections to your database, and becomes critical when you have several hundred. You can get a better idea of how many active vs. idle connections you have by executing the query highlighted in this post. And if you’re looking to get started and set up pgbouncer, here are some quick and simple steps for you.
One final note: while this is a limitation of Postgres today, that doesn't mean it will remain this way forever.