Azure Database for PostgreSQL Blog

Postgres 17 Query Performance Improvements

Sarat_Balijepalli
Nov 08, 2024

PostgreSQL 17 brings a wealth of improvements aimed at enhancing query performance, making it a significant upgrade for database administrators and developers. This release focuses on several key areas to boost efficiency and performance. 

This blog covers the following query-related enhancements: 

  1. Streaming I/O: a new read stream API and enhancements to the existing read buffer API speed up sequential reads and ANALYZE. 
  2. B-tree Indexes: better handling of multi-value lookups for queries using the IN clause. 
  3. Correlated Subqueries: transformations that convert correlated IN subqueries into joins, avoiding repeated executions. 
  4. Union Operator: use of Merge Append for better query performance. 
  5. CTE Improvements: improved execution plans and memory usage, with optimized sorting in materialized CTEs. 
  6. IS NULL / IS NOT NULL Handling: eliminates unnecessary checks on columns with NOT NULL constraints when IS NULL or IS NOT NULL predicates are specified. 
  7. JSON Functions: the new JSON_TABLE function converts JSON data directly into a relational table format, making it easier to query and analyze JSON data using standard SQL. 

Let's explore the key query-related enhancements in this release, with practical examples executed on Azure Database for PostgreSQL - Flexible Server.

Note: The practical examples shown in this blog are for demonstration only. Actual query times may vary depending on many factors, including the SKU selected, the environment where the query was executed, and the storage size. This blog should not be considered a benchmark between Postgres 17 and previous versions. 

Streaming I/O: Faster sequential reads and efficient ANALYZE

One of the standout features in PostgreSQL 17 is improved performance for sequential scans and ANALYZE. This is made possible by enhancements to the Read Buffer API and the introduction of a new Read Stream API.

  1. Read Buffer API: The Read Buffer API is fundamental for PostgreSQL's I/O operations. It handles the reading of data pages (buffers) from disk into memory. In Postgres 17 it supports vectored I/O, enabling multiple blocks to be read in a single I/O operation.
  2. Read Stream API: The Read Stream API is a new addition in PostgreSQL 17, designed to facilitate streaming I/O operations. This API allows for more efficient sequential scans and other operations by enabling the system to issue larger, vectored I/O requests. This means that instead of reading one buffer at a time, PostgreSQL can now read multiple buffers in a single call, which can significantly speed up operations like ANALYZE.

Together, these APIs make both sequential scans and ANALYZE more performant on Postgres 17 compared to previous versions.
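To experiment with these improvements yourself, a comparable orders table can be generated. The sketch below matches the column names and indexes used in this post, but the row count is deliberately scaled down and the data distribution is an assumption:

```sql
-- Sketch: a scaled-down orders table for experimentation.
CREATE TABLE public.orders (
    order_id    serial PRIMARY KEY,
    customer_id integer,
    order_date  date,
    status      varchar(50)
);

-- 1 million synthetic rows (the table used in this post holds ~150 million).
INSERT INTO public.orders (customer_id, order_date, status)
SELECT (random() * 100000)::int,
       CURRENT_DATE - (random() * 365)::int,
       (ARRAY['processing', 'delivered', 'shipped'])[1 + (random() * 2)::int]
FROM generate_series(1, 1000000);

CREATE INDEX co_ix ON public.orders (customer_id);
CREATE INDEX idx_status ON public.orders (status);
ANALYZE public.orders;
```

Running the same script on a Postgres 16 and a Postgres 17 server gives two directly comparable datasets, though absolute timings will depend on the scale and SKU.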

Example:

To demonstrate the sequential scan improvements, a sample orders table is used. The data, size, and DDL are identical on both the Postgres 17 and Postgres 16 servers. The size and table definition are shown below:

Size:

 \dt+ public.orders;
                                    List of relations
 Schema |  Name  | Type  |  Owner   | Persistence | Access method |  Size   | Description
--------+--------+-------+----------+-------------+---------------+---------+-------------
 public | orders | table | postgres | permanent   | heap          | 7469 MB |

Table DDL:

\d+ public.orders;
                                                                    Table "public.orders"
   Column    |         Type          | Collation | Nullable |                 Default                  | Storage  | Compression | Stats target | Description
-------------+-----------------------+-----------+----------+------------------------------------------+----------+-------------+--------------+-------------
 order_id    | integer               |           | not null | nextval('orders_order_id_seq'::regclass) | plain    |             |              |
 customer_id | integer               |           |          |                                          | plain    |             |              |
 order_date  | date                  |           |          |                                          | plain    |             |              |
 status      | character varying(50) |           |          |                                          | extended |             |              |
Indexes:
    "orders_pkey" PRIMARY KEY, btree (order_id)
    "co_ix" btree (customer_id)
    "idx_status" btree (status)
Access method: heap

Query:

Let's execute a simple COUNT(*) on the orders table on two separate servers, one running Postgres 17 and the other Postgres 16, and compare execution times using the EXPLAIN ANALYZE output shown below.

PostgreSQL Version 16: 

EXPLAIN ANALYZE SELECT count(*) FROM public.orders;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=1738822.05..1738822.06 rows=1 width=8) (actual time=6618.554..6620.057 rows=1 loops=1)
   ->  Gather  (cost=1738821.83..1738822.04 rows=2 width=8) (actual time=6618.460..6620.051 rows=2 loops=1)
         Workers Planned: 2
         Workers Launched: 1
         ->  Partial Aggregate  (cost=1737821.83..1737821.84 rows=1 width=8) (actual time=6616.859..6616.860 rows=1 loops=2)
               ->  Parallel Seq Scan on orders  (cost=0.00..1581467.67 rows=62541667 width=0) (actual time=0.005..4148.512 rows=75050000 loops=2)
 Planning Time: 0.068 ms
 Execution Time: 6620.097 ms
(8 rows)

PostgreSQL Version 17: 

EXPLAIN ANALYZE SELECT count(*) FROM public.orders;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=1738822.05..1738822.06 rows=1 width=8) (actual time=4821.934..4823.057 rows=1 loops=1)
   ->  Gather  (cost=1738821.83..1738822.04 rows=2 width=8) (actual time=4821.846..4823.051 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=1737821.83..1737821.84 rows=1 width=8) (actual time=4819.456..4819.457 rows=1 loops=3)
               ->  Parallel Seq Scan on orders  (cost=0.00..1581467.67 rows=62541667 width=0) (actual time=0.021..3262.231 rows=50033333 loops=3)
 Planning Time: 0.095 ms
 Execution Time: 4823.091 ms

Along with the API improvements, Postgres 17 can also parallelize sequential scans more efficiently. The table's blocks are divided into ranges and shared among multiple worker processes; each worker completes its assigned range before requesting additional blocks, which leads to better execution times. As evident above, Postgres 17 launches both planned parallel workers, whereas Postgres 16 launched only one.

As you can observe from the above snapshots, Postgres 17 version takes ~4.8 seconds compared to ~6.6 seconds in Postgres 16 version.

Enhanced B-tree Index Handling

PostgreSQL 17 introduces better handling of multi-value searches in B-tree indexes, particularly for IN clause lookups. Previously, PostgreSQL would perform multiple lookups for each value in the IN clause, which could be inefficient. 

  • Multi-value Lookups: During a single scan, PostgreSQL can now consider multiple values and retrieve them together if they fall on the same leaf page. This reduces the number of times the B-tree index needs to be traversed from root to leaf, enhancing performance. 
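When reproducing the comparison in this section, it helps to clear the cumulative statistics counters between runs so idx_scan reflects only the query under test. Note that pg_stat_reset() clears statistics for the entire current database, so this sketch belongs on a test server only:

```sql
-- Reset cumulative statistics (test servers only).
SELECT pg_stat_reset();

SELECT * FROM orders WHERE status IN ('processing', 'delivered');

-- idx_scan: how many separate root-to-leaf index scans the lookup needed.
-- idx_tup_fetch: how many rows those scans fetched.
SELECT idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE relname = 'orders';
```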

Example:

Let’s consider the orders table again and execute the below query.

Query:

SELECT * FROM orders WHERE status IN ('processing', 'delivered');

PostgreSQL Version 16: 

explain(analyze,buffers) SELECT * FROM orders WHERE status IN ('processing', 'delivered');
                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_status on orders  (cost=0.57..2686329.05 rows=50543674 width=20) (actual time=0.024..4187.776 rows=50100000 loops=1)
   Index Cond: ((status)::text = ANY ('{processing,delivered}'::text[]))
   Buffers: shared hit=361937
 Planning Time: 0.072 ms
 Execution Time: 5593.582 ms
(5 rows)

SELECT * from pg_stat_user_tables where relname='orders';
-[ RECORD 1 ]-------+------------------------------
relid               | 1163035734
schemaname          | public
relname             | orders
seq_scan            | 0
last_seq_scan       |
seq_tup_read        | 0
idx_scan            | 2
last_idx_scan       | 2024-11-07 14:58:59.491626+00
idx_tup_fetch       | 50100000
n_tup_ins           | 0
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_tup_newpage_upd   | 0
n_live_tup          | 150100007
n_dead_tup          | 0
n_mod_since_analyze | 0
n_ins_since_vacuum  | 0
last_vacuum         |
last_autovacuum     |
last_analyze        | 2024-11-07 14:58:41.720598+00
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 1
autoanalyze_count   | 0

You can observe that each execution of this query performs 2 index scans, one per value in the IN list, as evident from the idx_scan column, and fetches ~50 million rows, as evident from the idx_tup_fetch column in pg_stat_user_tables.

PostgreSQL Version 17: 

EXPLAIN(ANALYZE, BUFFERS) SELECT * FROM orders WHERE status IN ('processing', 'delivered');
                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_status on orders  (cost=0.57..2683608.10 rows=50033331 width=20) (actual time=0.020..3747.953 rows=50100000 loops=1)
   Index Cond: ((status)::text = ANY ('{processing,delivered}'::text[]))
   Buffers: shared hit=361933
 Planning:
   Buffers: shared hit=12
 Planning Time: 0.117 ms
 Execution Time: 5044.886 ms
(7 rows)

postgres=> SELECT * FROM pg_stat_user_tables WHERE relname='orders';
-[ RECORD 1 ]-------+------------------------------
relid               | 24754
schemaname          | public
relname             | orders
seq_scan            | 0
last_seq_scan       |
seq_tup_read        | 0
idx_scan            | 1
last_idx_scan       | 2024-11-07 15:34:06.555587+00
idx_tup_fetch       | 50100000
n_tup_ins           | 0
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_tup_newpage_upd   | 0
n_live_tup          | 150100007
n_dead_tup          | 0
n_mod_since_analyze | 0
n_ins_since_vacuum  | 0
last_vacuum         |
last_autovacuum     |
last_analyze        | 2024-11-07 15:33:51.694345+00
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 1
autoanalyze_count   | 0

You can observe that each execution of this query now performs just 1 index scan, retrieving both IN values in a single index traversal, as evident from the idx_scan column, while fetching the same ~50 million rows per the idx_tup_fetch column.

From the snapshots above, the number of index scans is 1 in Postgres 17 compared to 2 in Postgres 16, which ultimately helps the query execute faster: ~5 seconds versus ~5.6 seconds.

Correlated Subqueries Optimization

Correlated subqueries have traditionally been a performance bottleneck in SQL queries. PostgreSQL 17 addresses this by transforming correlated IN subqueries into joins, which avoids repeated execution of the subquery for each row of the outer query. 

  • Transformation to Joins: This optimization reduces the execution time of queries involving correlated subqueries, making them more efficient. 
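On versions before 17, a similar speedup can be achieved by rewriting such a query by hand. Because customers.id is unique (it is the primary key in the example schema shown in this section), the transformation the PostgreSQL 17 planner applies is roughly equivalent to this manual join; a sketch:

```sql
-- Correlated form: on PG 16 the subquery is re-executed
-- once per row of the outer scan on orders.
SELECT * FROM public.orders o
WHERE customer_id IN (SELECT id FROM public.customers
                      WHERE id = o.customer_id AND region = 'NORTH');

-- Hand-written join equivalent: customers is scanned once
-- and matched to orders via a join, much as the PG 17 planner does.
SELECT o.*
FROM public.orders o
JOIN public.customers c ON c.id = o.customer_id
WHERE c.region = 'NORTH';
```

If the join column were not unique, the join form could produce duplicates where the IN form would not, so the manual rewrite is only safe against a unique key.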

Example 

The orders and customers tables are used in this example. The table definition (DDL) of the customers table is shown below.

\d+ public.customers;
                                                                   Table "public.customers"
   Column   |            Type             | Collation | Nullable |                Default                | Storage  | Compression | Stats target | Description
------------+-----------------------------+-----------+----------+---------------------------------------+----------+-------------+--------------+-------------
 id         | integer                     |           | not null | nextval('customers_id_seq'::regclass) | plain    |             |              |
 name       | character varying(100)      |           | not null |                                       | extended |             |              |
 region     | character varying(50)       |           | not null |                                       | extended |             |              |
 email      | character varying(100)      |           |          |                                       | extended |             |              |
 created_at | timestamp without time zone |           |          | CURRENT_TIMESTAMP                     | plain    |             |              |
Indexes:
    "customers_pkey" PRIMARY KEY, btree (id)
    "c_ix" btree (id)
    "customers_email_key" UNIQUE CONSTRAINT, btree (email)
Access method: heap

Query:

SELECT * FROM public.orders O 
WHERE customer_id IN (SELECT id FROM public.customers 
WHERE id = O.customer_id AND region = 'NORTH');

PostgreSQL Version 16: 

EXPLAIN ANALYZE SELECT * FROM public.orders O WHERE customer_id IN (SELECT id FROM public.customers WHERE id = O.customer_id AND region = 'NORTH');
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on orders o  (cost=0.00..348625176.00 rows=75050000 width=20) (actual time=100842.023..100842.024 rows=0 loops=1)
   Filter: (SubPlan 1)
   Rows Removed by Filter: 150100000
   SubPlan 1
     ->  Index Scan using c_ix on customers  (cost=0.29..4.31 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=150100000)
           Index Cond: (id = o.customer_id)
           Filter: ((region)::text = 'NORTH'::text)
           Rows Removed by Filter: 0
 Planning Time: 0.188 ms
 Execution Time: 100842.050 ms

PostgreSQL Version 17: 

EXPLAIN ANALYZE SELECT * FROM public.orders O WHERE customer_id IN (SELECT id FROM public.customers WHERE id = O.customer_id AND region = 'NORTH');
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.57..2387.55 rows=2 width=20) (actual time=6.227..6.228 rows=0 loops=1)
   ->  Seq Scan on customers  (cost=0.00..2381.00 rows=1 width=4) (actual time=6.226..6.227 rows=0 loops=1)
         Filter: ((region)::text = 'NORTH'::text)
         Rows Removed by Filter: 100000
   ->  Index Scan using co_ix on orders o  (cost=0.57..6.53 rows=2 width=20) (never executed)
         Index Cond: (customer_id = customers.id)
 Planning Time: 0.320 ms
 Execution Time: 6.248 ms
(8 rows)

The Postgres 17 optimizer transforms the subquery into a join and completes the query in ~6 milliseconds.

As evident above, in Postgres version 16: 

  • The query performs a sequential scan on the orders table, filtering rows based on a condition involving a subquery. 
  • For each row in orders table, it performs an index scan on the customers table to check if the customer_id matches and if the region is 'NORTH'. 
  • The execution time is quite high (~100 seconds), likely due to the large number of rows being scanned and filtered.

Union Operator

PostgreSQL 17 brings significant enhancements to the UNION operator. The optimizer now uses Merge Append to combine sorted input streams, reducing the need for additional sorting and improving overall execution times for UNION queries.
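Merge Append can only be chosen when every input branch already produces rows sorted on the UNION's sort key, which in practice means b-tree indexes on the ordering column. A sketch of the setup this example assumes (the index names match the plans shown later in this section), plus a cheaper alternative when duplicates are acceptable:

```sql
-- Indexes that let each UNION branch return rows pre-sorted by id,
-- which Merge Append can interleave without a full re-sort.
CREATE INDEX idx ON test_table (id);
CREATE INDEX test_table2_id_idx1 ON test_table2 (id);

-- UNION ALL skips duplicate elimination (no Unique node),
-- so it is cheaper than UNION on any version when duplicates are fine.
SELECT id, name FROM test_table WHERE id < 1000000
UNION ALL
SELECT id, name FROM test_table2 WHERE id < 1000000
ORDER BY id;
```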

Example: 

test_table and test_table2 are used to demonstrate the UNION operator. The size of test_table is shown below:

\dt+ test_table;
                                      List of relations
 Schema |    Name    | Type  |  Owner   | Persistence | Access method |  Size   | Description
--------+------------+-------+----------+-------------+---------------+---------+-------------
 public | test_table | table | postgres | permanent   | heap          | 8389 MB |
(1 row)

The size of test_table2 is shown below:

 \dt+ test_table2;
                                       List of relations
 Schema |    Name     | Type  |  Owner   | Persistence | Access method |  Size   | Description
--------+-------------+-------+----------+-------------+---------------+---------+-------------
 public | test_table2 | table | postgres | permanent   | heap          | 2796 MB |
(1 row)

Query:  

The query used in the example is as shown below:

SELECT id, name FROM test_table tt WHERE id < 1000000 
UNION 
SELECT id, name FROM test_table2 tt2 WHERE id < 1000000 
ORDER BY id;

PostgreSQL Version 16: 

EXPLAIN ANALYZE SELECT id, name FROM test_table tt WHERE id < 1000000 UNION SELECT id, name FROM test_table2 tt2 WHERE id < 1000000 ORDER BY id;
                                                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=877648.15..893225.28 rows=2076951 width=40) (actual time=698.354..1249.876 rows=2000000 loops=1)
   ->  Sort  (cost=877648.15..882840.52 rows=2076951 width=40) (actual time=698.353..1019.897 rows=2000000 loops=1)
         Sort Key: tt.id, tt.name
         Sort Method: external merge  Disk: 56608kB
         ->  Append  (cost=0.56..578578.31 rows=2076951 width=40) (actual time=0.014..340.383 rows=2000000 loops=1)
               ->  Index Scan using idx on test_table tt  (cost=0.56..45327.97 rows=1014366 width=20) (actual time=0.013..124.132 rows=1000001 loops=1)
                     Index Cond: (id < 1000000)
               ->  Index Scan using test_table2_id_idx1 on test_table2 tt2  (cost=0.56..522865.58 rows=1062585 width=20) (actual time=0.024..125.115 rows=999999 loops=1)
                     Index Cond: (id < 1000000)
 Planning Time: 0.229 ms
 Execution Time: 1302.346 ms
(11 rows)

PostgreSQL Version 17: 

EXPLAIN ANALYZE SELECT id, name FROM test_table tt WHERE id < 1000000 UNION SELECT id, name FROM test_table2 tt2 WHERE id < 1000000 ORDER BY id;
                                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1.75..681820.49 rows=1898718 width=40) (actual time=0.062..1029.511 rows=2000000 loops=1)
   ->  Merge Append  (cost=1.75..672326.90 rows=1898718 width=40) (actual time=0.061..843.386 rows=2000000 loops=1)
         Sort Key: tt.id, tt.name
         ->  Incremental Sort  (cost=0.61..70554.25 rows=938907 width=20) (actual time=0.030..257.789 rows=1000001 loops=1)
               Sort Key: tt.id, tt.name
               Presorted Key: tt.id
               Full-sort Groups: 31251  Sort Method: quicksort  Average Memory: 26kB  Peak Memory: 26kB
               ->  Index Scan using idx on test_table tt  (cost=0.56..28303.44 rows=938907 width=20) (actual time=0.016..124.878 rows=1000001 loops=1)
                     Index Cond: (id < 1000000)
         ->  Incremental Sort  (cost=1.14..582785.46 rows=959811 width=20) (actual time=0.030..259.652 rows=999999 loops=1)
               Sort Key: tt2.id, tt2.name
               Presorted Key: tt2.id
               Full-sort Groups: 31250  Sort Method: quicksort  Average Memory: 26kB  Peak Memory: 26kB
               ->  Index Scan using test_table2_id_idx1 on test_table2 tt2  (cost=0.56..539593.97 rows=959811 width=20) (actual time=0.020..124.495 rows=999999 loops=1)
                     Index Cond: (id < 1000000)
 Planning Time: 0.254 ms
 Execution Time: 1074.780 ms

In the EXPLAIN ANALYZE output above, the Postgres 17 optimizer chooses a Merge Append node and the query executes in ~1.1 seconds. The Postgres 16 optimizer chooses an altogether different plan, with a full external-merge sort on disk, and execution takes ~1.3 seconds.

CTE Improvements

Materialized Common Table Expressions (CTEs) in PostgreSQL are a way to optimize query performance by storing the result of a CTE temporarily at the beginning of the query execution. This stored result is then used throughout the execution of the entire parent query.

PostgreSQL 17 has introduced several significant improvements to materialized Common Table Expressions (CTEs). 

  1. Materialized CTEs: PostgreSQL 17 has improved the performance of materialized CTEs. Previously, CTEs acted as optimization fences, preventing the planner from pushing down predicates. Now, materialized CTEs can propagate column statistics to the outer query, allowing for better query planning and execution. 
  2. Path Key Propagation: PostgreSQL 17 can also propagate path keys from a CTE to the outer query. A path key describes the order of rows in a result set; knowing it lets the planner preserve the CTE's sort order and avoid unnecessary sorting in the outer query, leading to more efficient plans.
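Since PostgreSQL 12, the MATERIALIZED / NOT MATERIALIZED keywords control whether a CTE is computed once or inlined into the outer query; what changes in 17 is that even the materialized form now shares its column statistics and sort order (path keys) with the outer planner. A minimal sketch of the two spellings:

```sql
-- Computed once and reused; on PG 17 the outer planner can still
-- see the CTE's column statistics and its ORDER BY (path keys).
WITH request AS MATERIALIZED (
    SELECT id FROM public.customers ORDER BY id
)
SELECT count(*) FROM request;

-- Inlined into the outer query; predicates from the outer query
-- can be pushed down into the CTE body.
WITH request AS NOT MATERIALIZED (
    SELECT id FROM public.customers ORDER BY id
)
SELECT count(*) FROM request;
```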

Example: 

Query: 

WITH request AS MATERIALIZED (SELECT id FROM public.customers c ORDER BY id) 
SELECT customer_id, COUNT(order_id) 
FROM public.orders o WHERE customer_id IN (SELECT id FROM request) 
GROUP BY customer_id ORDER BY customer_id;

PostgreSQL Version 16: 

EXPLAIN ANALYZE WITH request AS MATERIALIZED (SELECT id FROM public.customers c ORDER BY id) SELECT customer_id, COUNT(order_id) FROM public.orders o WHERE customer_id IN (SELECT id FROM request) GROUP BY customer_id ORDER BY customer_id;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=21703.05..24620.46 rows=166709 width=12) (actual time=399.829..456.156 rows=100000 loops=1)
   Group Key: o.customer_id
   CTE request
     ->  Index Only Scan using c_ix on customers c  (cost=0.29..2052.29 rows=100000 width=4) (actual time=0.037..7.158 rows=100000 loops=1)
           Heap Fetches: 0
   ->  Sort  (cost=19650.76..20067.53 rows=166709 width=8) (actual time=399.821..422.442 rows=300000 loops=1)
         Sort Key: o.customer_id
         Sort Method: external merge  Disk: 5304kB
         ->  Nested Loop  (cost=2250.57..3561.28 rows=166709 width=8) (actual time=43.411..338.439 rows=300000 loops=1)
               ->  HashAggregate  (cost=2250.00..2252.00 rows=200 width=4) (actual time=43.391..65.002 rows=100000 loops=1)
                     Group Key: request.id
                     Batches: 5  Memory Usage: 10305kB  Disk Usage: 200kB
                     ->  CTE Scan on request  (cost=0.00..2000.00 rows=100000 width=4) (actual time=0.038..22.256 rows=100000 loops=1)
               ->  Index Scan using co_ix on orders o  (cost=0.57..6.53 rows=2 width=8) (actual time=0.002..0.002 rows=3 loops=100000)
                     Index Cond: (customer_id = request.id)
 Planning Time: 0.175 ms
 Execution Time: 459.484 ms
(17 rows)

PostgreSQL Version 17: 

EXPLAIN ANALYZE WITH request AS MATERIALIZED (SELECT id FROM public.customers c ORDER BY id) SELECT customer_id, COUNT(order_id) FROM public.orders o WHERE customer_id IN (SELECT id FROM request) GROUP BY customer_id ORDER BY customer_id;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=2054.84..16699.38 rows=166709 width=12) (actual time=0.036..155.654 rows=100000 loops=1)
   Group Key: o.customer_id
   CTE request
     ->  Index Only Scan using c_ix on customers c  (cost=0.29..2052.29 rows=100000 width=4) (actual time=0.015..7.759 rows=100000 loops=1)
           Heap Fetches: 0
   ->  Merge Semi Join  (cost=2.55..12146.46 rows=166709 width=8) (actual time=0.027..125.817 rows=300000 loops=1)
         Merge Cond: (o.customer_id = request.id)
         ->  Index Scan using co_ix on orders o  (cost=0.57..4897548.71 rows=150100000 width=8) (actual time=0.007..61.174 rows=300001 loops=1)
         ->  CTE Scan on request  (cost=0.00..2000.00 rows=100000 width=4) (actual time=0.017..24.345 rows=100000 loops=1)
 Planning Time: 0.171 ms
 Execution Time: 158.362 ms
(11 rows)

In Postgres 16, the optimizer chooses a Hash Aggregate whose row estimates are far off from the actual rows (200 estimated vs 100,000 actual), because it lacks statistics for the columns used in the CTE. It then chooses a nested loop join between the CTE and the orders table, ignoring the sort that already happened inside the CTE.

In Postgres 17, the optimizer chooses a Merge Semi Join because the CTE's column statistics and sort order are available to it, producing a better plan. As a result, the query in this test ran nearly 3x faster on Postgres 17 (~158 ms) than on Postgres 16 (~459 ms).

Improved Handling of NULL and NOT NULL Columns

Queries involving IS NULL or IS NOT NULL checks on columns with NOT NULL constraints have been optimized in PostgreSQL 17. 

Behavior in PostgreSQL 16 and Earlier

  1. Redundant Checks: Even if a column has a NOT NULL constraint, the query planner would still evaluate IS NULL and IS NOT NULL conditions. This means that the planner did not automatically skip these checks, leading to potentially redundant evaluations.

  2. No Optimization Based on Constraints: The query planner did not leverage the NOT NULL constraint to optimize these checks.

IS NOT NULL Optimization

If IS NOT NULL is specified on a column with a NOT NULL constraint, PostgreSQL 17 removes the redundant filter, since the condition is always true. 

Example: 

For this example, let's consider a table named test_table. The name column has a NOT NULL constraint. The table definition is shown below:

\d test_table;
             Table "public.test_table"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | bigint  |           |          |
 id2    | integer |           |          |
 name   | text    |           | not null |
Indexes:
    "idx" btree (id)

Query:

SELECT * FROM test_table where name IS NOT NULL;

PostgreSQL Version 16: 

EXPLAIN ANALYZE SELECT * FROM test_table where name IS NOT NULL;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on test_table  (cost=0.00..1569212.08 rows=49577608 width=24) (actual time=315.852..5334.884 rows=50000000 loops=1)
   Filter: (name IS NOT NULL)
 Planning Time: 0.043 ms
 Execution Time: 6499.924 ms
(4 rows)

PostgreSQL Version 17: 

EXPLAIN ANALYZE SELECT * FROM test_table where name IS NOT NULL;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Seq Scan on test_table  (cost=0.00..857841.36 rows=49996936 width=24) (actual time=0.006..2956.395 rows=50000000 loops=1)
 Planning Time: 0.057 ms
 Execution Time: 4083.230 ms
(3 rows)

As evident above, in Postgres 16 the optimizer applies a NOT NULL filter even though the column is defined as NOT NULL, whereas in Postgres 17 this redundant check no longer happens, which improves query performance.

IS NULL Optimization:

If IS NULL is specified on a column with a NOT NULL constraint, PostgreSQL 17 skips scanning the table entirely, since no row can match.

Query:

SELECT * FROM test_table where name IS NULL;

PostgreSQL Version 16: 

EXPLAIN ANALYZE SELECT * FROM test_table where name IS NULL;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..1281009.47 rows=1 width=24) (actual time=2149.561..2151.016 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 1
   ->  Parallel Seq Scan on test_table  (cost=0.00..1280009.37 rows=1 width=24) (actual time=2147.850..2147.851 rows=0 loops=2)
         Filter: (name IS NULL)
         Rows Removed by Filter: 25000000
 Planning Time: 0.044 ms
 Execution Time: 2151.035 ms
(8 rows)

As evident above, in Postgres 16 the optimizer scans the entire table and applies a NULL check that filters out all 50 million rows. 

PostgreSQL Version 17: 

 EXPLAIN ANALYZE SELECT * FROM test_table where name IS NULL;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0) (actual time=0.001..0.002 rows=0 loops=1)
   One-Time Filter: false
 Planning Time: 0.039 ms
 Execution Time: 0.014 ms
(4 rows)

In Postgres 17 the optimizer uses a "One-Time Filter: false" node and skips the table scan entirely; the query completes in less than 1 millisecond.

JSON Functions 

PostgreSQL 17 introduces new functions for working with JSON data, making it easier to query and analyze JSON data using standard SQL. 

  • JSON_TABLE: Converts JSON data directly into a relational table format. 
  • Other functions include JSON_EXISTS, JSON_QUERY and JSON_VALUE, which provide more flexibility and power when working with JSON data. 
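The example below covers JSON_TABLE; the scalar companions can be sketched as follows (PostgreSQL 17 syntax, with the literals cast to jsonb explicitly):

```sql
-- JSON_EXISTS: does the path match anything? Returns boolean.
SELECT JSON_EXISTS('{"id": 1, "name": "x"}'::jsonb, '$.name');

-- JSON_VALUE: extract a single scalar as text; ON EMPTY / ON ERROR
-- clauses control behavior when the path yields no scalar.
SELECT JSON_VALUE('{"id": 1, "name": "x"}'::jsonb, '$.name');

-- JSON_QUERY: extract an object or array (returns jsonb by default).
SELECT JSON_QUERY('{"id": 1, "tags": ["a", "b"]}'::jsonb, '$.tags');
```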

Example: 

PostgreSQL Version 17: 

 WITH json_data AS (
    SELECT '[
        {"id": 1, "name": "x", "age": 30},
        {"id": 2, "name": "y", "age": 25}
    ]'::json AS data
)
SELECT *
FROM JSON_TABLE(
    (SELECT data FROM json_data),
    '$[*]'
    COLUMNS (
        id INT PATH '$.id',
        name TEXT PATH '$.name',
        age INT PATH '$.age'
    )
) AS jt;
 id | name | age
----+------+-----
  1 | x    |  30
  2 | y    |  25
(2 rows)

Conclusion 

PostgreSQL 17 is a game-changer for query performance, offering a range of enhancements that make database operations faster and more efficient. Whether you're dealing with large table scans, complex subqueries, or JSON data, PostgreSQL 17 has the tools to optimize your queries and improve overall performance. 

You can find the release notes for PostgreSQL 17 on the PostgreSQL website; they provide a comprehensive overview of all the new features, enhancements, and changes in PostgreSQL 17.

If you have any questions, don't hesitate to reach out to us at Ask Azure DB for PostgreSQL.

 

Updated Nov 10, 2024
Version 2.0