This blog post is a practical guide to speeding up index creation in Azure Database for PostgreSQL - Flexible Server. It covers tips and strategies for streamlining index management, from tuning server parameters for optimal performance to cleaning up excess space on tables, so that index creation runs smoothly and efficiently. It also explains how to monitor and resolve potential blockers that can slow down index creation, and how to schedule these operations to minimize their impact on other database activity.
Reading this post will arm you with the knowledge and tools needed to avoid common pitfalls and keep your databases running efficiently. Whether you are a seasoned database administrator or just starting out, these insights will help you improve database performance through effective index management.
Here are the best practices covered in this post; each is described in detail below.
- Two important parameters to tune: maintenance_work_mem and max_parallel_maintenance_workers
- Cleaning table bloat
- Identifying and resolving blockers
- Scheduling index creation
Index Creation
The following best practices will help you expedite the index creation process.
1. Two important parameters to tune
maintenance_work_mem
maintenance_work_mem controls how much memory PostgreSQL allocates to maintenance tasks such as CREATE INDEX and VACUUM. Its default value in Azure Database for PostgreSQL - Flexible Server varies by SKU, and the upper limit is '2097151kB' (2 GB). You can increase the parameter value through the Azure portal to speed up maintenance activities such as adding indexes or vacuuming.
An example to check the impact of maintenance_work_mem: in this example, you create an index on a 4 GB table with the structure below.
CREATE TABLE IF NOT EXISTS test_data
(
a1 integer,
b1 integer,
c1 integer,
d1 integer,
e1 text ,
f1 text ,
g1 date
);
Use the query below to load roughly 4 GB of data into the table.
INSERT INTO test_data
SELECT n,
n,
n,
n,
'abcdefghijklmnopqrstuvwxyz',
'abcdefghijklmnopqrstuvwxyz',
now() FROM generate_series(1,20000000) n;
Use the query below to check the size of the table.
SELECT relname AS table_name,
pg_size_pretty(pg_relation_size(relid)) as "Actual Size"
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
postgres=# \timing;
Timing is ON
postgres=# SHOW maintenance_work_mem;
maintenance_work_mem
----------------------
274MB
(1 row)
Time: 0.254 ms
postgres=# CREATE INDEX idx_test_data_id ON test_data(a1, b1, c1, d1);
CREATE INDEX
Time: 21987.553 ms (00:21.988)
postgres=> set maintenance_work_mem='2097151kB';
SET
Time: 44.165 ms
postgres=# SHOW maintenance_work_mem;
maintenance_work_mem
----------------------
2097151kB
(1 row)
Time: 0.229 ms
postgres=# CREATE INDEX idx_test_data_id ON test_data(a1, b1, c1, d1);
CREATE INDEX
Time: 19396.686 ms (00:19.397)
Creating an index with maintenance_work_mem='274MB' (the default value for a 4 vCore SKU) took around 21s 988ms, while the same operation with maintenance_work_mem='2097151kB' (2 GB) took 19s 397ms.
In the Azure portal, the server parameter does not accept the value '2097152kB' (2 GB), which is why the upper limit '2097151kB' is used above. If you are setting the parameter within a session, you do not need to convert to these units.
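Within a session, for example, you can use ordinary memory units directly (1 GB here is an illustrative value; the change only affects the current session):

```sql
-- Session-level change; does not alter the server-wide parameter.
SET maintenance_work_mem = '1GB';
SHOW maintenance_work_mem;
```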
Parallelizing the index creation process
Parallel index creation is supported in PostgreSQL versions 11 and above, and is controlled by the max_parallel_maintenance_workers parameter, which lets maintenance tasks use multiple workers in parallel. Increasing this parameter improves index creation time in most cases. The demonstration below shows the runtime with max_parallel_maintenance_workers set to 2 and to 8.
postgres=> SET max_parallel_maintenance_workers=2;
SET
Time: 26.144 ms
postgres=# SHOW max_parallel_maintenance_workers;
max_parallel_maintenance_workers
----------------------------------
2
(1 row)
postgres=# CREATE INDEX idx_test_data_id ON test_data(a1, b1, c1, d1);
CREATE INDEX
Time: 21987.553 ms (00:21.988)
postgres=> SET max_parallel_maintenance_workers=8;
SET
Time: 25.960 ms
postgres=# SHOW max_parallel_maintenance_workers;
max_parallel_maintenance_workers
----------------------------------
8
(1 row)
postgres=# CREATE INDEX idx_test_data_id ON test_data(a1, b1, c1, d1);
CREATE INDEX
Time: 12818.791 ms (00:12.819)
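Note that the number of workers actually launched is also capped by server-wide limits, so raising max_parallel_maintenance_workers beyond those limits has no effect. A quick sketch of the related standard PostgreSQL parameters to check:

```sql
-- CREATE INDEX cannot use more workers than the server-wide pool allows.
SHOW max_parallel_workers;                 -- server-wide parallel worker pool
SHOW max_worker_processes;                 -- upper bound for all background workers
SET max_parallel_maintenance_workers = 8;  -- session-level change only
```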
2. Cleaning table bloat
Before creating an index, make sure your table isn't wasting space, which can happen when rows are deleted or updated. Clearing out this unnecessary space by running a VACUUM operation on the table helps organize its storage better.
VACUUM (ANALYZE, VERBOSE) <tablename>;
You can introduce bloat on the table using the UPDATE script below, which generates about 18% bloat if you followed the data-load example above. Autovacuum runs automatically once the dead-tuple ratio exceeds autovacuum_vacuum_scale_factor; the default is 0.2 (20%).
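As a sketch of how that trigger point works (using the standard PostgreSQL defaults of 50 for autovacuum_vacuum_threshold and 0.2 for autovacuum_vacuum_scale_factor; check your server's actual values first):

```sql
-- Autovacuum vacuums a table when:
--   n_dead_tup > autovacuum_vacuum_threshold
--               + autovacuum_vacuum_scale_factor * reltuples
SHOW autovacuum_vacuum_threshold;     -- 50 by default
SHOW autovacuum_vacuum_scale_factor;  -- 0.2 by default

-- Approximate trigger point for test_data, assuming the defaults above:
SELECT 50 + 0.2 * reltuples AS approx_autovacuum_trigger
FROM pg_class
WHERE relname = 'test_data';
```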
UPDATE test_data SET g1=now() WHERE a1 <= 3600000;
Here is the query to run to check the dead-tuple percentage (dead_pct) for the table.
SELECT schemaname, relname, n_dead_tup, n_live_tup, round(n_dead_tup::float/n_live_tup::float*100) dead_pct ,autovacuum_count , last_vacuum, last_autovacuum ,last_autoanalyze,last_analyze
FROM pg_stat_user_tables
WHERE n_live_tup > 0 and relname= 'test_data'
ORDER BY n_live_tup DESC;
You can now run the CREATE INDEX query with 18% bloat and then again with 0% bloat to compare.
postgres=# SELECT schemaname, relname, n_dead_tup, n_live_tup, round(n_dead_tup::float/n_live_tup::float*100) dead_pct ,autovacuum_count , last_vacuum, last_autovacuum ,
last_autoanalyze,last_analyze
postgres-# FROM pg_stat_user_tables
postgres-# WHERE n_live_tup > 0
postgres-# ORDER BY n_live_tup DESC;
-[ RECORD 1 ]----+------------------------------
schemaname | public
relname | test_data
n_dead_tup | 3672951
n_live_tup | 19927074
dead_pct | 18
autovacuum_count | 5
last_vacuum |
last_autovacuum | 2024-02-26 23:17:24.984945+00
last_autoanalyze | 2024-02-26 23:17:25.538443+00
last_analyze |
Time: 28.900 ms
postgres=# CREATE INDEX idx_test_data_id ON test_data(a1, b1, c1, d1);
CREATE INDEX
Time: 17547.189 ms (00:17.547)
postgres=# VACUUM (ANALYZE, VERBOSE) test_data;
INFO: vacuuming "public.test_data"
INFO: scanned index "idx_test_data_id" to remove 3600000 row versions
DETAIL: CPU: user: 1.93 s, system: 0.15 s, elapsed: 2.97 s
INFO: "test_data": removed 3600000 row versions in 48000 pages
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.67 s
INFO: index "idx_test_data_id" now contains 20000000 row versions in 76901 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: "test_data": found 0 removable, 11903000 nonremovable row versions in 206707 out of 314667 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2640
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 2.75 s, system: 1.18 s, elapsed: 6.36 s.
INFO: vacuuming "pg_toast.pg_toast_17462"
INFO: "pg_toast_17462": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2640
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: analyzing "public.test_data"
INFO: "test_data": scanned 30000 of 314667 pages, containing 1903250 live rows and 0 dead rows; 30000 rows in sample, 19962999 estimated total rows
Now run the CREATE INDEX command again, this time against the table with 0% bloat.
postgres=# SELECT schemaname, relname, n_dead_tup, n_live_tup, round(n_dead_tup::float/n_live_tup::float*100) dead_pct ,autovacuum_count , last_vacuum, last_autovacuum ,
last_autoanalyze,last_analyze
postgres-# FROM pg_stat_user_tables
postgres-# WHERE n_live_tup > 0
postgres-# ORDER BY n_live_tup DESC;
-[ RECORD 1 ]----+------------------------------
schemaname | public
relname | test_data
n_dead_tup | 0
n_live_tup | 19999803
dead_pct | 0
autovacuum_count | 1
last_vacuum |
last_autovacuum | 2024-02-21 22:29:49.513376+00
last_autoanalyze | 2024-02-21 22:29:49.930922+00
last_analyze |
Time: 32.305 ms
postgres=# CREATE INDEX idx_test_data_id ON test_data(a1, b1, c1, d1);
CREATE INDEX
Time: 12873.053 ms (00:12.873)
Executing the CREATE INDEX command with 18% bloat took around 17s 547ms to complete, whereas without bloat the index creation completed in 12s 873ms.
3. Identifying and resolving blockers
Keep an eye on the index creation process to make sure nothing is slowing it down. You can run the following query to see whether any blocking processes exist and which queries they are related to.
To create a blocker, run the INSERT query from the beginning of the post in one session and attempt to create an index at the same time in a second session.
Session1:
INSERT INTO test_data
SELECT n,
n,
n,
n,
'abcdefghijklmnopqrstuvwxyz',
'abcdefghijklmnopqrstuvwxyz',
now() FROM generate_series(1,20000000) n;
Session2:
CREATE INDEX idx_test_data_id ON test_data(a1, b1, c1, d1);
Session3:
postgres=> SELECT activity.pid, activity.usename, activity.query, blocking.pid AS blocking_id, blocking.query AS blocking_query
FROM pg_stat_activity AS activity
JOIN pg_stat_activity AS blocking ON blocking.pid = ANY(pg_blocking_pids(activity.pid));
-[ RECORD 1 ]--+------------------------------------------------------------
pid | 5849
usename | postgres
query | CREATE INDEX idx_test_data_id ON test_data(a1, b1, c1, d1);
blocking_id | 27582
blocking_query | INSERT INTO test_data +
| SELECT n, +
| n, +
| n, +
| n, +
| 'abcdefghijklmnopqrstuvwxyz', +
| 'abcdefghijklmnopqrstuvwxyz', +
| now() FROM generate_series(1,20000000) n;
Time: 30.898 ms
As shown in the output above, you will have to terminate the 'blocking_id' process to allow the CREATE INDEX to finish. Use the query below to terminate that process.
SELECT pg_terminate_backend(27582);
This removes the blocker so the index creation can proceed. Resolving any blockers promptly keeps index creation fast.
To monitor the progress of index creation, you can use the query below.
SELECT * FROM pg_stat_progress_create_index;
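For a quick completion estimate, you can also combine the phase with the block counters from the same view (a sketch; blocks_total can be 0 in some phases, hence the NULLIF):

```sql
-- Approximate progress of the current CREATE INDEX phase.
SELECT pid,
       phase,
       round(100.0 * blocks_done / NULLIF(blocks_total, 0), 1) AS pct_done
FROM pg_stat_progress_create_index;
```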
4. Scheduling index creation
Pick a time when the server is not busy to create the index. Running the CREATE INDEX operation when the workload is low reduces its impact on other database operations and speeds up the index creation itself.
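If you want to automate off-peak builds, one option is the pg_cron extension (supported on Azure Database for PostgreSQL - Flexible Server, and assumed here to be enabled); the job name and 02:00 UTC window below are illustrative:

```sql
-- Schedule the index build for an assumed low-traffic window (02:00 UTC daily).
SELECT cron.schedule(
  'build-idx-test-data',   -- hypothetical job name
  '0 2 * * *',             -- cron schedule: 02:00 every day
  $$CREATE INDEX IF NOT EXISTS idx_test_data_id ON test_data(a1, b1, c1, d1)$$
);

-- Once the index exists, remove the job:
-- SELECT cron.unschedule('build-idx-test-data');
```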
Conclusion
As a database administrator, you can follow the best practices above to create PostgreSQL indexes in an efficient, optimized way, improving the performance of your database and the speed of data retrieval.
If you have any questions, don't hesitate to reach out to our Postgres team at Microsoft at Ask Azure DB for PostgreSQL.
Updated Sep 10, 2024
Version 3.0
Gayathri_Paderla
Microsoft
Azure Database for PostgreSQL Blog