Query tuning plays a crucial role pertaining to database performance. If you are facing issues with query run times while using a pattern search with LIKE or ILIKE operators, you are in the right place. This blog helps you with tuning the sub-optimal query plan which impacts the query performance with LIKE or ILIKE operators. In such scenarios a normal b-tree index may not generate an optimal query plan.
This blog will walk you through a use case scenario and show you how to tune your queries with LIKE or ILIKE operators. The Postgres Operator class and pg_trgm
extension available on Azure Database for PostgreSQL - Flexible Server can be used to create B-tree and GIN indexes to better optimize the query performance with the LIKE or ILIKE operators.
The use case scenario refers to the following steps for step-by-step guide to optimize query performance:
Create a table with two columns id
and field_text
, load 80 million records into the table and VACUUM ANALYZE the table after the data load is complete.
You can follow below steps to create and load the data.
CREATE TABLE test_like
(id int,
field_text text);
INSERT INTO test_like (id, field_text);
SELECT id,'testp' FROM generate_series (1, 80000000) id;
Let’s look at the sample data stored in the table.
VACUUM ANALYZE test_like;
SELECT * FROM test_like LIMIT 5;
id | field_text
----+------------
1 | testp1
2 | testp2
3 | testp3
4 | testp4
5 | testp5
Let’s execute a query with LIKE operator using EXPLAIN ANALYZE as shown below.
EXPLAIN ANALYZE SELECT * FROM test_like WHERE field_text LIKE 'testp123%';
QUERY PLAN
-------------------------------------------------------------------
Gather (cost=1000.00..927019.11 rows=7999 width=17) (actual time=0.287..1984.567 rows=111111 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on test_like (cost=0.00..925219.21 rows=3333 width=17) (actual time=136.215..1850.600 rows=37037 loops=3)
Filter: (field_text ~~ 'testp123%'::text)
Rows Removed by Filter: 26629630
Planning Time: 0.060 ms
Execution Time: 1988.630 ms
(8 rows)
The above screenshot shows the query plan using EXPLAIN ANALYZE command on the SELECT query performing a pattern search.
The execution time of the query is 1.9 seconds.
Let’s try to improve the performance by adding a b-tree index on field_text
column and execute the query again.
CREATE INDEX test_like_ix ON test_like(field_text);
ANALYZE test_like;
EXPLAIN ANALYZE SELECT * FROM test_like WHERE field_text LIKE 'testp123%';
QUERY PLAN
----------------------------------------
Gather (cost=1000.00..927057.67 rows=8000 width=17) (actual time=14.525..1952.439 rows=111111 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on test_like (cost=0.00..925257.67 rows=3333 width=17) (actual time=108.557..1944.423 rows=37037 loops=3)
Filter: (field_text ~~ 'testp123%'::text)
Rows Removed by Filter: 26629630
Planning Time: 0.418 ms
Execution Time: 1956.481 ms
(8 rows)
The above screenshot shows the index creation and query plan using EXPLAIN ANALYZE command on the SELECT query performing a pattern search, however the index is not being used by optimizer.
The optimizer did not pick the new b-tree index test_like_ix
created and the query still took 1.9 seconds.
To better handle the scenario explained in previous section Postgres provides operator class. The operator classes text_pattern_ops
, varchar_pattern_ops
, and bpchar_pattern_ops
support B-tree indexes on the data types of text, varchar, and char respectively and help in pattern search.
Let’s now drop the old index and create an index using text_pattern_ops
operator class and execute the query again.
CREATE INDEX test_like_ix ON test_like (field_text text_pattern_ops);
EXPLAIN ANALYZE SELECT * FROM test_like WHERE field_text LIKE 'testp123%';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Index Scan using test_like_ix on test_like (cost=0.57..4.59 rows=8000 width=17) (actual time=0.029..23.059 rows=111111 loops=1)
Index Cond: ((field_text ~>=~ 'testp123'::text) AND (field_text ~<~ 'testp124'::text))
Filter: (field_text ~~ 'testp123%'::text)
Planning Time: 0.097 ms
Execution Time: 27.052 ms
(5 rows)
The above screenshot depicts after creating index using the PG operator class the EXPLAIN ANALYZE output of the SELECT query.
The new index created is picked by the optimizer and the query execution time improved from 1.9 Seconds to 27 milliseconds.
pg_trgm
extension
Moving to the next scenario, let’s try to execute a query with following pattern %testp123
using EXPLAIN ANALYZE as shown below.
EXPLAIN ANALYZE SELECT * FROM test_like WHERE field_text LIKE '%testp123';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..927068.67 rows=8000 width=17) (actual time=2594.731..2596.424 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on test_like (cost=0.00..925268.67 rows=3333 width=17) (actual time=1733.854..2592.494 rows=0 loops=3)
Filter: (field_text ~~ '%testp123'::text)
Rows Removed by Filter: 26666666
Planning Time: 0.069 ms
Execution Time: 2596.513 ms
(8 rows)
The above screenshot shows the EXPLAIN ANALYZE of the SELECT query using pattern search. The query execution time is found to be ~2.5 seconds and optimizer does not pick up the index test_like_ix
created earlier.
Fortunately, Postgres also provides pg_trgm
extension which supports wildcard searches in queries that use SQL LIKE or ILIKE operators. Using the pg_trgm
extension you could create GIN indexes to speed up the queries that use SQL LIKE operators for pattern search. This can be helpful in scenarios where you like to use patterns like %test123%
or %test123
.
To implement the pg_trgm
extension approach:
pg_trgm
extension and save.
Create the GIN index by executing the following commands.
CREATE EXTENSION pg_trgm;
CREATE INDEX test_like_ix2 ON test_like USING GIN (field_text gin_trgm_ops);
Execute the query again.
EXPLAIN ANALYZE SELECT * FROM test_like WHERE field_text LIKE '%testp123';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test_like (cost=1968.00..16952.30 rows=8000 width=17) (actual time=1075.015..1076.468 rows=1 loops=1)
Recheck Cond: (field_text ~~ '%testp123'::text)
Rows Removed by Index Recheck: 2130
Heap Blocks: exact=1725
-> Bitmap Index Scan on test_like_ix2 (cost=0.00..1966.00 rows=8000 width=0) (actual time=1074.813..1074.813 rows=2131 loops=1)
Index Cond: (field_text ~~ '%testp123'::text)
Planning Time: 0.085 ms
Execution Time: 1076.562 ms
(8 rows)
The above screenshot depicts the creation of GIN index and the EXPLAIN ANALYZE shows the optimizer using the GIN index created to execute the SELECT query.
The performance of the query has improved from ~2.5 seconds to 1 second and the optimizer uses newly created index %test_like_ix2
.
The pg_trgm
extension helps to improve the performance of queries using the ILIKE operator also. For queries using ILIKE all three pattern search types %typep123
, %typep123%
and type123%
are supported by GIN index created using the pg_trgm
extension.
For example, you could execute the below mentioned query in 1 second with the GIN index test_like_ix2
compared to 12.7 seconds without the GIN index.
Output without Gin Index
EXPLAIN ANALYZE SELECT * FROM test_like WHERE field_text ILIKE '%testP123';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..927078.54 rows=8000 width=17) (actual time=12602.114..12603.881 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on test_like (cost=0.00..925278.54 rows=3333 width=17) (actual time=8425.930..12599.774 rows=0 loops=3)
Filter: (field_text ~~* '%testp123'::text)
Rows Removed by Filter: 26666666
Planning Time: 0.135 ms
Execution Time: 12603.967 ms
(8 rows)
Output with Gin Index
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test_like (cost=1968.00..16952.30 rows=8000 width=17) (actual time=1077.749..1080.002 rows=1 loops=1)
Recheck Cond: (field_text ~~* '%testp123'::text)
Rows Removed by Index Recheck: 2130
Heap Blocks: exact=1725
-> Bitmap Index Scan on test_like_ix2 (cost=0.00..1966.00 rows=8000 width=0) (actual time=1077.541..1077.541 rows=2131 loops=1)
Index Cond: (field_text ~~* '%testp123'::text)
Planning Time: 0.237 ms
Execution Time: 1080.093 ms
A combination of pg_trgm
extension and GIN index created using the extension helps in improving the performance of the queries that have pattern search using LIKE or ILIKE operators. You could also use Postgres operator classes text_pattern_ops
, varchar_pattern_ops
, and bpchar_pattern_ops
that supports B-tree indexes on the data types of text, varchar, and char respectively.
To learn more about Postgres operator classes or pg_trgm
extension please visit:
If you have any questions, don't hesitate to reach out to us at Ask Azure DB for PostgreSQL.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.