Tips for tuning Postgres LIKE & ILIKE queries via pg_trgm on Azure
Published Mar 04 2024 09:54 AM 1,932 Views

Query tuning plays a crucial role pertaining to database performance. If you are facing issues with query run times while using a pattern search with LIKE or ILIKE operators, you are in the right place. This blog helps you with tuning the sub-optimal query plan which impacts the query performance with LIKE or ILIKE operators. In such scenarios a normal b-tree index may not generate an optimal query plan.

 

This blog will walk you through a use case scenario and show you how to tune your queries with LIKE or ILIKE operators. The Postgres Operator class and pg_trgm extension available on Azure Database for PostgreSQL - Flexible Server can be used to create B-tree and GIN indexes to better optimize the query performance with the LIKE or ILIKE operators.

 

The use case scenario refers to the following steps for step-by-step guide to optimize query performance:

 

Creating sample table and data load

 

Create a table with two columns id and field_text, load 80 million records into the table and VACUUM ANALYZE the table after the data load is complete.

 

You can follow below steps to create and load the data.

 

CREATE TABLE test_like
 (id int,
 field_text text);

INSERT INTO test_like (id, field_text);

SELECT id,'testp' FROM generate_series (1, 80000000) id; 

 

Let’s look at the sample data stored in the table.

 

VACUUM ANALYZE test_like;

SELECT * FROM test_like LIMIT 5;

id | field_text
----+------------
  1 | testp1
  2 | testp2
  3 | testp3
  4 | testp4
  5 | testp5 

 

Tests with b-tree index

 

Let’s execute a query with LIKE operator using EXPLAIN ANALYZE as shown below.

 

EXPLAIN ANALYZE SELECT * FROM test_like WHERE field_text LIKE 'testp123%';

QUERY PLAN
-------------------------------------------------------------------

Gather  (cost=1000.00..927019.11 rows=7999 width=17) (actual time=0.287..1984.567 rows=111111 loops=1)

   Workers Planned: 2

   Workers Launched: 2

   ->  Parallel Seq Scan on test_like  (cost=0.00..925219.21 rows=3333 width=17) (actual time=136.215..1850.600 rows=37037 loops=3)

         Filter: (field_text ~~ 'testp123%'::text)

         Rows Removed by Filter: 26629630

Planning Time: 0.060 ms

Execution Time: 1988.630 ms

(8 rows) 

 

The above screenshot shows the query plan using EXPLAIN ANALYZE command on the SELECT query performing a pattern search.

 

The execution time of the query is 1.9 seconds.

 

Let’s try to improve the performance by adding a b-tree index on field_text  column and execute the query again.

 

CREATE INDEX test_like_ix ON test_like(field_text);

ANALYZE test_like;

EXPLAIN ANALYZE SELECT * FROM test_like WHERE field_text LIKE 'testp123%';

QUERY PLAN
----------------------------------------

Gather  (cost=1000.00..927057.67 rows=8000 width=17) (actual time=14.525..1952.439 rows=111111 loops=1)

   Workers Planned: 2

   Workers Launched: 2

   ->  Parallel Seq Scan on test_like  (cost=0.00..925257.67 rows=3333 width=17) (actual time=108.557..1944.423 rows=37037 loops=3)

         Filter: (field_text ~~ 'testp123%'::text)

         Rows Removed by Filter: 26629630

Planning Time: 0.418 ms

Execution Time: 1956.481 ms

(8 rows) 

 

The above screenshot shows the index creation and query plan using EXPLAIN ANALYZE command on the SELECT query performing a pattern search, however the index is not being used by optimizer.

 

The optimizer did not pick the new b-tree index test_like_ix created and the query still took 1.9 seconds.

 

Tests with a b-tree index created using PG operator class

 

To better handle the scenario explained in previous section Postgres provides operator class. The operator classes text_pattern_ops, varchar_pattern_ops, and bpchar_pattern_ops support B-tree indexes on the data types of text, varchar, and char respectively and help in pattern search.

 

Let’s now drop the old index and create an index using text_pattern_ops operator class and execute the query again.

 

CREATE INDEX test_like_ix ON test_like (field_text text_pattern_ops);

EXPLAIN ANALYZE SELECT * FROM test_like WHERE field_text LIKE 'testp123%';



                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------

Index Scan using test_like_ix on test_like  (cost=0.57..4.59 rows=8000 width=17) (actual time=0.029..23.059 rows=111111 loops=1)

   Index Cond: ((field_text ~>=~ 'testp123'::text) AND (field_text ~<~ 'testp124'::text))

   Filter: (field_text ~~ 'testp123%'::text)

Planning Time: 0.097 ms

Execution Time: 27.052 ms

(5 rows) 

 

The above screenshot depicts after creating index using the PG operator class the EXPLAIN ANALYZE output of the SELECT query.

 

The new index created is picked by the optimizer and the query execution time improved from 1.9 Seconds to 27 milliseconds.

 

Tests with a GIN index created using pg_trgm extension

 

Moving to the next scenario, let’s try to execute a query with following pattern %testp123 using EXPLAIN ANALYZE as shown below.

 

EXPLAIN ANALYZE SELECT * FROM test_like WHERE field_text LIKE '%testp123';

                                                           QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------

Gather  (cost=1000.00..927068.67 rows=8000 width=17) (actual time=2594.731..2596.424 rows=1 loops=1)

   Workers Planned: 2

   Workers Launched: 2

   ->  Parallel Seq Scan on test_like  (cost=0.00..925268.67 rows=3333 width=17) (actual time=1733.854..2592.494 rows=0 loops=3)

         Filter: (field_text ~~ '%testp123'::text)

         Rows Removed by Filter: 26666666

Planning Time: 0.069 ms

Execution Time: 2596.513 ms

(8 rows) 

 

The above screenshot shows the EXPLAIN ANALYZE of the SELECT query using pattern search. The query execution time is found to be ~2.5 seconds and optimizer does not pick up the index test_like_ix created earlier.

 

Fortunately, Postgres also provides pg_trgm extension which supports wildcard searches in queries that use SQL LIKE or ILIKE operators. Using the pg_trgm extension you could create GIN indexes to speed up the queries that use SQL LIKE operators for pattern search. This can be helpful in scenarios where you like to use patterns like %test123% or %test123.

 

To implement the pg_trgm extension approach:

  • Go to Server parameters blade on the Azure Database for PostgreSQL - Flexible Server. 
  • Select azure.extensions parameter. 
  • Add pg_trgm extension and save.

 

Sarat_Balijepalli_1-1709067005416.png

 

Create the GIN index by executing the following commands.

 

CREATE EXTENSION pg_trgm;

CREATE INDEX test_like_ix2 ON test_like USING GIN (field_text gin_trgm_ops); 

 

Execute the query again.

 

EXPLAIN ANALYZE SELECT * FROM test_like WHERE field_text LIKE '%testp123';

                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on test_like  (cost=1968.00..16952.30 rows=8000 width=17) (actual time=1075.015..1076.468 rows=1 loops=1)

   Recheck Cond: (field_text ~~ '%testp123'::text)

   Rows Removed by Index Recheck: 2130

   Heap Blocks: exact=1725

   ->  Bitmap Index Scan on test_like_ix2  (cost=0.00..1966.00 rows=8000 width=0) (actual time=1074.813..1074.813 rows=2131 loops=1)

         Index Cond: (field_text ~~ '%testp123'::text)

Planning Time: 0.085 ms

Execution Time: 1076.562 ms

(8 rows) 

 

The above screenshot depicts the creation of GIN index and the EXPLAIN ANALYZE shows the optimizer using the GIN index created to execute the SELECT query.

 

The performance of the query has improved from ~2.5 seconds to 1 second and the optimizer uses newly created index %test_like_ix2.

 

The pg_trgm extension helps to improve the performance of queries using the ILIKE operator also. For queries using ILIKE all three pattern search types %typep123, %typep123% and type123% are supported by GIN index created using the pg_trgm extension.

 

For example, you could execute the below mentioned query in 1 second with the GIN index test_like_ix2 compared to 12.7 seconds without the GIN index.

 

Output without Gin Index

 

EXPLAIN ANALYZE SELECT * FROM test_like WHERE field_text ILIKE '%testP123';

                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------

Gather  (cost=1000.00..927078.54 rows=8000 width=17) (actual time=12602.114..12603.881 rows=1 loops=1)

   Workers Planned: 2

   Workers Launched: 2

   ->  Parallel Seq Scan on test_like  (cost=0.00..925278.54 rows=3333 width=17) (actual time=8425.930..12599.774 rows=0 loops=3)

         Filter: (field_text ~~* '%testp123'::text)

         Rows Removed by Filter: 26666666

Planning Time: 0.135 ms

Execution Time: 12603.967 ms

(8 rows)  

 

Output with Gin Index

 


                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on test_like  (cost=1968.00..16952.30 rows=8000 width=17) (actual time=1077.749..1080.002 rows=1 loops=1)

   Recheck Cond: (field_text ~~* '%testp123'::text)

   Rows Removed by Index Recheck: 2130

   Heap Blocks: exact=1725

   ->  Bitmap Index Scan on test_like_ix2  (cost=0.00..1966.00 rows=8000 width=0) (actual time=1077.541..1077.541 rows=2131 loops=1)

         Index Cond: (field_text ~~* '%testp123'::text)

Planning Time: 0.237 ms

Execution Time: 1080.093 ms 

 

Conclusion

A combination of pg_trgm extension and GIN index created using the extension helps in improving the performance of the queries that have pattern search using LIKE or ILIKE operators. You could also use Postgres operator classes text_pattern_ops, varchar_pattern_ops, and bpchar_pattern_ops that supports B-tree indexes on the data types of text, varchar, and char respectively. 

 

To learn more about Postgres operator classes or pg_trgm extension please visit:

 

  1. PostgreSQL: Documentation: 16: 11.10. Operator Classes and Operator Families
  2. PostgreSQL: Documentation: 16: F.35. pg_trgm — support for similarity of text using trigram matching

 

If you have any questions, don't hesitate to reach out to us at Ask Azure DB for PostgreSQL.

 

Co-Authors
Version history
Last update:
‎Mar 04 2024 09:54 AM
Updated by: