Improving query speed: simple SELECT with LIKE

I have inherited a large legacy code base that runs on Django 1.5, and my current task is to speed up a section of the site that takes about 1 minute to load.

I profiled the application, and the culprit is the following query (abbreviated for brevity):

SELECT COUNT(*)
FROM "entities_entity"
WHERE (
    "entities_entity"."date_filed" <= '2016-01-21'
    AND (
        UPPER("entities_entity"."entity_city_state_zip"::text) LIKE UPPER('%Atherton%')
        OR UPPER("entities_entity"."entity_city_state_zip"::text) LIKE UPPER('%Berkeley%')
        OR -- 34 more of these
        UPPER("entities_entity"."agent_city_state_zip"::text) LIKE UPPER('%Atherton%')
        OR UPPER("entities_entity"."agent_city_state_zip"::text) LIKE UPPER('%Berkeley%')
        OR -- 34 more of these
    )
)

which basically consists of many similar LIKE conditions on two fields, entity_city_state_zip and agent_city_state_zip, both of which are character varying(200) not null.

This query is executed twice (!), taking 18814.02 ms each time, and once more with COUNT replaced by the actual SELECT, taking an extra 20216.49 ms. (I'm going to cache the result of the COUNT.)

The EXPLAIN ANALYZE output is as follows:

Aggregate  (cost=175867.33..175867.34 rows=1 width=0) (actual time=17841.502..17841.502 rows=1 loops=1)
  ->  Seq Scan on entities_entity  (cost=0.00..175858.95 rows=3351 width=0) (actual time=0.849..17818.551 rows=145075 loops=1)
        Filter: ((date_filed <= '2016-01-21'::date) AND ((upper((entity_city_state_zip)::text) ~~ '%ATHERTON%'::text) OR (upper((entity_city_state_zip)::text) ~~ '%BERKELEY%'::text) (..skipped..) OR (upper((agent_city_state_zip)::text) ~~ '%ATHERTON%'::text) OR (upper((agent_city_state_zip)::text) ~~ '%BERKELEY%'::text) OR (upper((agent_city_state_zip)::text) ~~ '%BURLINGAME%'::text) ))
        Rows Removed by Filter: 310249
Planning time: 2.110 ms
Execution time: 17841.944 ms

I tried adding indexes on entity_city_state_zip and agent_city_state_zip in various combinations, for example:

CREATE INDEX ON entities_entity (upper(entity_city_state_zip));
CREATE INDEX ON entities_entity (upper(agent_city_state_zip));

or using varchar_pattern_ops, with no luck: the planner still chooses a sequential scan.

The server uses something like this:

 qs = queryset.filter(Q(entity_city_state_zip__icontains = all_city_list) | Q(agent_city_state_zip__icontains = all_city_list)) 

to generate this query.

I don’t know what else to try.

Thanks!

2 answers

I think the problem is the multiple LIKE conditions and the UPPER(...) calls.

You can use:

WHERE entities_entity.entity_city_state_zip SIMILAR TO '%Atherton%|%Berkeley%'

Or something like this:

WHERE entities_entity.entity_city_state_zip LIKE ANY(ARRAY['%Atherton%', '%Berkeley%'])
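
If you go the LIKE ANY route from Django, something along these lines might work as a starting point. It is only a sketch: build_city_filter and escape_like are names I made up, I switched to ILIKE so the UPPER() calls can go away, and I'm assuming the psycopg2 backend, which adapts a Python list parameter to a PostgreSQL array. Note that this only shortens the query text; by itself it does not give the planner an index to use.

```python
def escape_like(text):
    """Escape LIKE wildcards so city names are matched literally (backslash is the default escape)."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_city_filter(cities, date_filed):
    """Return (sql, params) for a case-insensitive match on either address column.

    Hypothetical helper: the table and column names come from the question,
    everything else is an assumption.
    """
    # One '%city%' pattern per city; the list is passed as a single array parameter.
    patterns = ["%" + escape_like(city) + "%" for city in cities]
    sql = (
        'SELECT COUNT(*) FROM "entities_entity" '
        'WHERE "date_filed" <= %s '
        'AND ("entity_city_state_zip" ILIKE ANY(%s) '
        'OR "agent_city_state_zip" ILIKE ANY(%s))'
    )
    return sql, [date_filed, patterns, patterns]


if __name__ == "__main__":
    sql, params = build_city_filter(["Atherton", "Berkeley"], "2016-01-21")
    print(sql)
    print(params)
    # With Django, roughly:
    #   from django.db import connection
    #   cursor = connection.cursor()
    #   cursor.execute(sql, params)
    #   count = cursor.fetchone()[0]
```

The patterns travel as parameters rather than being pasted into the SQL string, so the % signs in them do not collide with the %s placeholders.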


Edit:

To run a query like this from Django, see the Django documentation on performing raw SQL queries (Manager.raw() or a database cursor).


I watched a course on Pluralsight that addressed a very similar problem. The course was "Postgres for .NET Developers," and the relevant part was "Full Text Search" in the section "Fun With Simple SQL."

To summarize their solution using your example:

Create a new column in your table that will hold your entity_city_state_zip as a tsvector:

create table entities_entity (
  date_filed date,
  entity_city_state_zip text,
  csz_search tsvector not null  -- add this column
);

On an existing table you may need to add the column as nullable first, then fill in the data, and only then make it not null.

 update entities_entity set csz_search = to_tsvector (entity_city_state_zip); 

Then create a trigger that will populate a new field every time you add or change a record:

create trigger entities_insert_update
before insert or update on entities_entity
for each row execute procedure
tsvector_update_trigger(csz_search, 'pg_catalog.english', entity_city_state_zip);

Now your search queries can query the tsvector field, rather than the city / state / zip field:

 select * from entities_entity where csz_search @@ to_tsquery('Atherton') 

Some notes about this:

  • to_tsquery, in case you have not used it, is WAY more sophisticated than the above example suggests. It allows boolean conditions, prefix matches, etc.
  • it is also not case sensitive, so there is no need to call the upper functions that you have in your query
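
Since your query ORs together a few dozen cities, you would want a single tsquery with | between them. A rough sketch of building that string in Python follows; build_city_tsquery is a made-up helper, and joining the words of a multi-word city with & is my assumption (it requires both words to be present, not necessarily next to each other).

```python
import re


def build_city_tsquery(cities):
    """Build a to_tsquery() input string that matches any of the given cities.

    Hypothetical helper: single-word cities become plain terms, multi-word
    cities become an AND group of their words.
    """
    terms = []
    for city in cities:
        # Keep only word characters so tsquery operators cannot sneak in.
        words = re.findall(r"\w+", city)
        if not words:
            continue
        if len(words) == 1:
            terms.append(words[0])
        else:
            terms.append("(" + " & ".join(words) + ")")
    return " | ".join(terms)


if __name__ == "__main__":
    query = build_city_tsquery(["Atherton", "Berkeley", "Palo Alto"])
    print(query)
    # Pass it as a parameter, for example:
    #   select count(*) from entities_entity
    #   where csz_search @@ to_tsquery('pg_catalog.english', %s)
```

Keep in mind that full text search matches whole words rather than arbitrary substrings, so the results may differ slightly from the LIKE '%...%' version.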

As a last step, put a GIN index on the tsvector column:

 create index entities_entity_ix1 on entities_entity using gin(csz_search); 

If I understand the course correctly, this should make your query fly, and it gets around the fact that a btree index cannot be used for a LIKE pattern that starts with a wildcard ('%...').

Here is the EXPLAIN plan for such a query:

Bitmap Heap Scan on entities_entity  (cost=56.16..1204.78 rows=505 width=81)
  Recheck Cond: (csz_search @@ to_tsquery('Atherton'::text))
  ->  Bitmap Index Scan on entities_entity_ix1  (cost=0.00..56.04 rows=505 width=0)
        Index Cond: (csz_search @@ to_tsquery('Atherton'::text))