PostgreSQL performance difference between LIKE and regular expression

Can anyone explain such a big performance difference between these SQL?

SELECT count(*) as cnt FROM table WHERE name ~ '\*{3}'; -- Total runtime 12.000 - 18.000 ms SELECT count(*) as cnt FROM table WHERE name ~ '\*\*\*'; -- Total runtime 12.000 - 18.000 ms SELECT count(*) as cnt FROM table WHERE name LIKE '%***%'; -- Total runtime 5.000 - 7.000 ms 

As you can see, the difference more than doubles between the LIKE operator and a simple regular expression (I thought that the LIKE operator would be internally converted to a regular expression and there shouldn't be any difference)

There are almost 13,000 rows here, and the "name" column is of type "text". There are no indexes associated with the "name" column defined in the table.

EDIT:

EXPLAIN ANALYSIS OF EACH OF THEM:

 EXPLAIN ANALYZE SELECT count(*) as cnt FROM datos WHERE nombre ~ '\*{3}'; Aggregate (cost=894.32..894.33 rows=1 width=0) (actual time=18.279..18.280 rows=1 loops=1) -> Seq Scan on datos (cost=0.00..894.31 rows=1 width=0) (actual time=0.620..18.266 rows=25 loops=1) Filter: (nombre ~ '\*{3}'::text) Total runtime: 18.327 ms 

 EXPLAIN ANALYZE SELECT count(*) as cnt FROM datos WHERE nombre ~ '\*\*\*'; Aggregate (cost=894.32..894.33 rows=1 width=0) (actual time=17.404..17.405 rows=1 loops=1) -> Seq Scan on datos (cost=0.00..894.31 rows=1 width=0) (actual time=0.608..17.396 rows=25 loops=1) Filter: (nombre ~ '\*\*\*'::text) Total runtime: 17.451 ms 

 EXPLAIN ANALYZE SELECT count(*) as cnt FROM datos WHERE nombre LIKE '%***%'; Aggregate (cost=894.32..894.33 rows=1 width=0) (actual time=4.258..4.258 rows=1 loops=1) -> Seq Scan on datos (cost=0.00..894.31 rows=1 width=0) (actual time=0.138..4.249 rows=25 loops=1) Filter: (nombre ~~ '%***%'::text) Total runtime: 4.295 ms 
+6
source share
2 answers

The text LIKE text ( ~~ ) operator is implemented using the special C code in like_match.c . This is ad-hoc code that is completely regex-independent. Considering the comments, it is obviously specially optimized to implement only % and _ as wildcards and, if possible, a short circuit at the output, while the regular expression mechanism is more complex by several orders of magnitude.

Note that in your test case, just as the regular expression is suboptimal compared to LIKE , LIKE is probably suboptimal compared to strpos(name, '***') > 0

strpos is implemented using the Boyer-Moore-Horspool algorithm , which is optimized for large substrings with few partial matches in the searched text.

Internally, these functions are reasonably optimized, but when there are several methods for the same purpose, choosing the most probable one is still the callerโ€™s task. PostgreSQL will not analyze a template for us to map and switch regexp to LIKE or LIKE to strpos based on this analysis.

+5
source

I'm not sure I should post it as an answer ... I made a rough comparison by doing something similar in PHP - filtering a huge array using regular expressions and simple strpos (as a replacement for LIKE). Code:

 // regex filter $filteredRegex = array_filter($a,function($item){ return preg_match('/000/',$item); }); // substring search filter $filteredStrpos = array_filter($a,function($item){ return strpos($item,'000')!==FALSE; }); 

So, a comparative analysis of this code leads to the fact that the regular expression filter doubles the result of strpos in time, so I can assume that the cost of the CPU for the regular expression is approximately double that of a simple substring search.

It seems that @zerkms had all the reason :)

0
source

Source: https://habr.com/ru/post/984733/


All Articles