Impact of LIKE query performance when working on a subset of a full table

I understand that LIKE queries are slow since they cannot be indexed. However, I'm interested in learning about performance in this situation:

Let's say I have a table like:

 user_id | message
 -------------------
 1       | foo bar baz
 1       | bar buz qux
 .         .
 .         .
 .         .
 2       | bux bar foo
 2       | bar

where I have, say, 1 million rows but only 10,000 users, so each user has about 100 messages.

Obviously, a search like this:

 SELECT * FROM tbl WHERE message LIKE '%ar%';

will be very slow. However, in my application I would always be searching a single user's messages:

 SELECT * FROM tbl WHERE message LIKE '%ar%' AND user_id = 2;

where the user_id column will be indexed.

Am I correct in understanding that in this scenario, Postgres will run the slow LIKE comparison only on that user's ~100 rows, after using the index on user_id, rather than on the full table, thus limiting the performance hit?

And also that such a query will not get significantly slower with 10 or 100 million users, as long as any one user has only ~100 messages?

2 answers

The optimizer determines many things when compiling SQL into a plan.

One of them is how to filter the data (with indexes, etc.) before applying the other conditions row by row.


In your case, if you have a suitable index, LIKE will only be applied to the rows that remain after that filtering.


To understand a little more about this, look at the plan generated for your query (for example with EXPLAIN). You should be able to see where indexes are used to subset / filter the data, and then a separate step applying the LIKE condition.
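As a minimal sketch of that check, assuming the table is called tbl and has a plain B-tree index on user_id (the index name below is made up for illustration):

```sql
-- Hypothetical index name; any B-tree index on user_id will do.
CREATE INDEX tbl_user_id_idx ON tbl (user_id);

-- EXPLAIN shows the chosen plan; adding ANALYZE also executes the query
-- and reports actual row counts and timings.
EXPLAIN ANALYZE
SELECT *
FROM   tbl
WHERE  user_id = 2
AND    message LIKE '%ar%';
```

If the planner uses the index, the plan would typically contain an index scan (or bitmap index scan) node with a condition on user_id, with the LIKE predicate listed separately as a filter on the rows that node returns. The exact plan depends on your data and table statistics.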


@MatBailie has already answered your main question. I want to address your assertion:

I understand that LIKE queries are slow since they cannot be indexed.

This is not entirely true.

First, and this has been true for a long time, left-anchored patterns can use an index. This works for regular expressions (~) as well as LIKE (~~) and SIMILAR TO. I recently wrote a comprehensive overview of this in an answer on dba.SE.

This may not work for you, because the patterns in your question are not anchored. If they were, you could get optimized performance with a multicolumn index that uses the operator class text_pattern_ops for the message column, like this:

 CREATE INDEX tbl_user_id_message_idx ON tbl (user_id, message text_pattern_ops); 

For queries such as:

 SELECT * FROM tbl WHERE user_id = 2 AND message ~~ 'bar%'; -- left anchored LIKE 

Second, since PostgreSQL 9.1 you can use the pg_trgm extension and create a GiST or GIN index with it, which all patterns can use. There are some limitations. Maintaining such an index is more expensive, so it is most useful for read-only or rarely written tables.
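A sketch of the trigram approach, assuming the same tbl table (the index name is hypothetical):

```sql
-- pg_trgm is an additional module shipped with PostgreSQL contrib.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GIN trigram index on message; gin_trgm_ops is the operator class
-- provided by pg_trgm.
CREATE INDEX tbl_message_trgm_idx ON tbl USING gin (message gin_trgm_ops);

-- A pattern that is not anchored, which a trigram index can support.
SELECT * FROM tbl WHERE user_id = 2 AND message LIKE '%bar%';
```

Trigram matching needs at least three consecutive characters in the pattern to extract anything useful, so a very short pattern such as '%ar%' would probably not benefit. Whether the planner prefers this index over one on user_id depends on the estimated selectivity of each condition, so it is worth checking the plan with EXPLAIN.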

Depesz has a related tutorial.


Source: https://habr.com/ru/post/1402855/

