Improving query speed: simple SELECT with LIKE

Question


I have inherited a large legacy code base that runs on Django 1.5, and my current task is to speed up a section of the site that takes about a minute to load.

I profiled the application. In particular, the culprit is the following query (shortened for brevity):

SELECT COUNT(*)
FROM "entities_entity"
WHERE (
    "entities_entity"."date_filed" <= '2016-01-21'
    AND (
        UPPER("entities_entity"."entity_city_state_zip"::text) LIKE UPPER('%Atherton%')
        OR UPPER("entities_entity"."entity_city_state_zip"::text) LIKE UPPER('%Berkeley%')
        -- 34 more of these
        OR UPPER("entities_entity"."agent_city_state_zip"::text) LIKE UPPER('%Atherton%')
        OR UPPER("entities_entity"."agent_city_state_zip"::text) LIKE UPPER('%Berkeley%')
        -- 34 more of these
    )
)

It basically consists of many similar LIKE conditions on two columns, entity_city_state_zip and agent_city_state_zip. Both are character varying(200) not null.
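The Python sketch below makes the semantics concrete by modelling what the WHERE clause computes for each row. It assumes rows are simple records holding only the three columns the query uses, and that city names contain no LIKE wildcards (% or _). `Entity`, `matches` and `count_matching` are hypothetical names for illustration. A row that passes the date filter but matches no city pays for all 2 × N substring tests, where N is the number of cities.

```python
from dataclasses import dataclass
from datetime import date
from typing import Sequence


@dataclass
class Entity:
    # Hypothetical stand-in for an entities_entity row (only the columns the query uses).
    date_filed: date
    entity_city_state_zip: str
    agent_city_state_zip: str


def matches(row: Entity, cutoff: date, cities: Sequence[str]) -> bool:
    """Mirror the WHERE clause: date filter AND (any city found in either column)."""
    if row.date_filed > cutoff:  # date_filed <= '2016-01-21'
        return False
    entity = row.entity_city_state_zip.upper()
    agent = row.agent_city_state_zip.upper()
    # UPPER(col) LIKE UPPER('%city%') is a case-insensitive substring test;
    # a non-matching row pays for all 2 * len(cities) tests.
    return any(c.upper() in entity for c in cities) or any(
        c.upper() in agent for c in cities
    )


def count_matching(rows: Sequence[Entity], cutoff: date, cities: Sequence[str]) -> int:
    # SELECT COUNT(*) ... WHERE <the predicate above>
    return sum(1 for row in rows if matches(row, cutoff, cities))


rows = [
    Entity(date(2015, 6, 1), "Atherton, CA 94027", "San Jose, CA 95112"),
    Entity(date(2015, 6, 1), "Fresno, CA 93701", "BERKELEY, CA 94704"),
    Entity(date(2016, 2, 1), "Atherton, CA 94027", "Fresno, CA 93701"),  # filed after cutoff
    Entity(date(2015, 1, 1), "Fresno, CA 93701", "Fresno, CA 93701"),
]
print(count_matching(rows, date(2016, 1, 21), ["Atherton", "Berkeley"]))  # 2
```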

This query is executed twice (!), taking 18814.02 ms each time. It then runs once more with SELECT instead of COUNT, taking another 20216.49 ms. (I'm going to cache the result of the COUNT.)

The EXPLAIN ANALYZE output is:

 Aggregate  (cost=175867.33..175867.34 rows=1 width=0) (actual time=17841.502..17841.502 rows=1 loops=1)
   ->  Seq Scan on entities_entity  (cost=0.00..175858.95 rows=3351 width=0) (actual time=0.849..17818.551 rows=145075 loops=1)
         Filter: ((date_filed <= '2016-01-21'::date) AND ((upper((entity_city_state_zip)::text) ~~ '%ATHERTON%'::text) OR (upper((entity_city_state_zip)::text) ~~ '%BERKELEY%'::text) (..skipped..) OR (upper((agent_city_state_zip)::text) ~~ '%ATHERTON%'::text) OR (upper((agent_city_state_zip)::text) ~~ '%BERKELEY%'::text) OR (upper((agent_city_state_zip)::text) ~~ '%BURLINGAME%'::text)))
         Rows Removed by Filter: 310249
 Planning time: 2.110 ms
 Execution time: 17841.944 ms
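The small Python calculation below reads the plan's numbers together, with the figures copied from the output above:

- The sequential scan reads all of the roughly 455k rows.
- About a third of those rows match.
- The planner expected roughly 43× fewer matching rows than it got.
- The scan averages around 39 µs per row.

That per-row cost is consistent with evaluating a long chain of UPPER(...) LIKE conditions on every row. The plan alone does not separate CPU time from I/O, though.

```python
# Numbers copied from the EXPLAIN ANALYZE output above.
rows_returned = 145_075      # actual rows out of the Seq Scan
rows_removed = 310_249       # "Rows Removed by Filter"
rows_estimated = 3_351       # planner's estimate for the Seq Scan
seq_scan_ms = 17_818.551     # actual total time of the Seq Scan node

rows_scanned = rows_returned + rows_removed          # every row is read
match_fraction = rows_returned / rows_scanned        # share of rows that pass the filter
misestimate = rows_returned / rows_estimated         # actual vs. estimated rows
us_per_row = seq_scan_ms * 1000 / rows_scanned       # average microseconds per row

print(f"rows scanned:      {rows_scanned}")          # 455324
print(f"fraction matching: {match_fraction:.1%}")     # 31.9%
print(f"row misestimate:   {misestimate:.1f}x")       # 43.3x
print(f"avg time per row:  {us_per_row:.1f} us")      # 39.1 us
```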

I tried indexing entity_city_state_zip and agent_city_state_zip in various combinations, for example:

 CREATE INDEX ON entities_entity (upper(entity_city_state_zip));
 CREATE INDEX ON entities_entity (upper(agent_city_state_zip));

or with varchar_pattern_ops, with no luck.
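This explains why those indexes don't help. A B-tree, including one built with varchar_pattern_ops, keeps its keys sorted. It can therefore jump straight to the range for a left-anchored pattern such as 'ATHERTON%'. A pattern with a leading wildcard such as '%ATHERTON%' gives it no starting point, so every key has to be examined. The Python sketch below models the idea with a sorted list and `bisect`. It is only an illustration, not how PostgreSQL implements its B-tree, and `prefix_lookup` / `infix_lookup` are hypothetical names.

```python
import bisect


def prefix_lookup(sorted_keys: list[str], prefix: str) -> list[str]:
    """Like an index range scan for `key LIKE 'prefix%'`: binary search finds the start."""
    start = bisect.bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys sharing a prefix are contiguous in sorted order, so stop here
        result.append(key)
    return result


def infix_lookup(sorted_keys: list[str], needle: str) -> list[str]:
    """Like `key LIKE '%needle%'`: sort order is useless, so every key is checked."""
    return [key for key in sorted_keys if needle in key]


keys = sorted([
    "PALO ALTO, CA 94301",
    "ATHERTON, CA 94027",
    "EAST PALO ALTO, CA 94303",
    "BERKELEY, CA 94704",
])
print(prefix_lookup(keys, "PALO ALTO"))  # ['PALO ALTO, CA 94301']
print(infix_lookup(keys, "PALO ALTO"))   # ['EAST PALO ALTO, CA 94303', 'PALO ALTO, CA 94301']
```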

The code uses something like this:

 qs = queryset.filter(Q(entity_city_state_zip__icontains=all_city_list) | Q(agent_city_state_zip__icontains=all_city_list))

to generate the query.
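The filter shown above expands into 2 × N LIKE conditions for N cities, one per city per column, all OR-ed together. The sketch below rebuilds a WHERE clause of that shape in plain Python to show how quickly it grows. `build_where` is a hypothetical helper, not Django's actual query compiler. The 36 cities per column in the question's query give 72 conditions.

```python
def build_where(cutoff: str, cities: list[str]) -> tuple[str, list[str]]:
    """Rebuild a parameterized WHERE clause shaped like the one in the question."""
    conditions = []
    params = [cutoff]
    # One condition per city per column: 2 * len(cities) conditions in total.
    for column in ("entity_city_state_zip", "agent_city_state_zip"):
        for city in cities:
            conditions.append(
                f'UPPER("entities_entity"."{column}"::text) LIKE UPPER(%s)'
            )
            params.append(f"%{city}%")
    sql = (
        '"entities_entity"."date_filed" <= %s AND ('
        + " OR ".join(conditions)
        + ")"
    )
    return sql, params


sql, params = build_where("2016-01-21", [f"City{i}" for i in range(36)])
print(sql.count(" LIKE "))  # 72
```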

I don't know what else to try.

Thanks!


performance django postgresql

NicoSantangelo Jan 21 '16 at 15:28


2 answers

I watched a Pluralsight course that addressed a very similar problem. The course was "Postgres for .NET Developers", and the topic came up in the "Fun With Simple SQL" section, under "Full Text Search".

To summarize their solution using your example:

Add a new column to your table that holds entity_city_state_zip as a tsvector:

 create table entities_entity (
   date_filed date,
   entity_city_state_zip text,
   csz_search tsvector not null -- add this column
 );

Initially you may need to create the column as nullable, then populate the data and make it NOT NULL:

 update entities_entity set csz_search = to_tsvector('pg_catalog.english', entity_city_state_zip);
 alter table entities_entity alter column csz_search set not null;

Then create a trigger that populates the new column every time a row is inserted or updated:

 create trigger entities_insert_update
   before insert or update on entities_entity
   for each row execute procedure
   tsvector_update_trigger(csz_search, 'pg_catalog.english', entity_city_state_zip);

Now your searches can query the tsvector column instead of the city/state/zip column:

 select * from entities_entity where csz_search @@ to_tsquery('Atherton')
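One behavioural difference is worth knowing before switching. `@@` matches whole words (lexemes), not arbitrary substrings, so it is not a drop-in replacement for `LIKE '%...%'`. The sketch below is a deliberately simplified Python model meant only to show that difference: it lower-cases text and splits on non-alphanumerics, with none of PostgreSQL's stemming or stop-word handling. `lexemes`, `word_match` and `substring_match` are hypothetical names.

```python
import re


def lexemes(text: str) -> set[str]:
    """Very rough stand-in for to_tsvector: lower-cased ASCII alphanumeric words only."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def word_match(text: str, term: str) -> bool:
    # Model of `csz_search @@ to_tsquery(term)` for a single-word term.
    return term.lower() in lexemes(text)


def substring_match(text: str, term: str) -> bool:
    # Model of `UPPER(col) LIKE UPPER('%term%')`.
    return term.upper() in text.upper()


print(word_match("Atherton, CA 94027", "Atherton"))   # True
print(substring_match("Berkeley, CA 94704", "Berk"))  # True
print(word_match("Berkeley, CA 94704", "Berk"))       # False: not a whole word
```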

Some notes about this:

If you haven't used to_tsquery before, it is WAY more capable than the example above: it supports boolean operators (&, |, !), prefix matching (:*), and more.
It is also case-insensitive, so there is no need for the UPPER calls in your query.
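Using that query syntax, all the cities can be searched at once by OR-ing them with `|` in a single tsquery. The helper below is a hypothetical sketch that builds such a string from a Python list. It keeps only ASCII alphanumeric words, so punctuation in a city name cannot break the to_tsquery syntax. It joins multi-word names with `<->`, the phrase operator, which assumes PostgreSQL 9.6 or later. On older versions `&` is the closest substitute, though it ignores word order and adjacency. The result would be passed as a query parameter, e.g. `csz_search @@ to_tsquery('english', %s)`.

```python
import re


def cities_to_tsquery(cities: list[str]) -> str:
    """Build e.g. "atherton | berkeley | (san <-> mateo)" for to_tsquery()."""
    parts = []
    for city in cities:
        # Keep plain ASCII words only; operators and punctuation are dropped.
        words = re.findall(r"[a-z0-9]+", city.lower())
        if not words:
            continue  # nothing searchable in this entry
        if len(words) == 1:
            parts.append(words[0])
        else:
            # Phrase operator <-> (adjacent words) requires PostgreSQL 9.6+.
            parts.append("(" + " <-> ".join(words) + ")")
    return " | ".join(parts)


print(cities_to_tsquery(["Atherton", "Berkeley", "San Mateo"]))
# atherton | berkeley | (san <-> mateo)
```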

As a last step, put a GIN index on the tsvector column:

 create index entities_entity_ix1 on entities_entity using gin(csz_search);
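GIN stands for Generalized Inverted Index. Roughly, it maps each lexeme to the rows that contain it, so an OR query only touches the entries for the searched words instead of every row. The toy Python model below shows that lookup pattern with a dict from word to a set of row ids. It ignores everything that makes the real index efficient on disk, and `build_inverted_index` / `search_any` are hypothetical names.

```python
import re
from collections import defaultdict


def build_inverted_index(rows: dict[int, str]) -> dict[str, set[int]]:
    """Map each word to the ids of the rows that contain it (a toy 'GIN index')."""
    index: dict[str, set[int]] = defaultdict(set)
    for row_id, text in rows.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(row_id)
    return dict(index)


def search_any(index: dict[str, set[int]], words: list[str]) -> set[int]:
    # OR query: union of the per-word row sets; other rows are never visited.
    result: set[int] = set()
    for word in words:
        result |= index.get(word.lower(), set())
    return result


rows = {1: "Atherton, CA 94027", 2: "Berkeley, CA 94704", 3: "Fresno, CA 93701"}
idx = build_inverted_index(rows)
print(search_any(idx, ["Atherton", "Berkeley"]))  # {1, 2}
```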

If I understand the course correctly, this should make your query fly. It also gets around the fact that a B-tree index can't be used for a LIKE '%...' query.

Here is the EXPLAIN plan for such a query:

 Bitmap Heap Scan on entities_entity  (cost=56.16..1204.78 rows=505 width=81)
   Recheck Cond: (csz_search @@ to_tsquery('Atherton'::text))
   ->  Bitmap Index Scan on entities_entity_ix1  (cost=0.00..56.04 rows=505 width=0)
         Index Cond: (csz_search @@ to_tsquery('Atherton'::text))


Hambone Jan 22 '16 at 3:52


Volodymyr Matvienko · Accepted Answer · 2016-01-21T15:41:29+0000

I think the problem is the many LIKE conditions and the UPPER(...) calls.

You can use:

WHERE entities_entity.entity_city_state_zip SIMILAR TO '%Atherton%|%Berkeley%'
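In SIMILAR TO, `%` and `_` act as in LIKE and `|` is alternation. The pattern is interpreted as a SQL-standard regular expression that must match the entire string. The Python sketch below translates only those three metacharacters into a Python regex, which makes it easy to check what a generated pattern will match. `similar_to_regex` and `similar_to` are hypothetical helpers, and other SIMILAR TO syntax such as `*`, `+` or brackets is not handled. Note that SIMILAR TO is case-sensitive, unlike the original UPPER(...) LIKE UPPER(...).

```python
import re


def similar_to_regex(pattern: str) -> re.Pattern:
    """Translate a SIMILAR TO pattern that uses only %, _ and | into a Python regex."""
    out = []
    for ch in pattern:
        if ch == "%":
            out.append(".*")           # any sequence of characters
        elif ch == "_":
            out.append(".")            # any single character
        elif ch == "|":
            out.append("|")            # alternation
        else:
            out.append(re.escape(ch))  # everything else is literal
    return re.compile("(?:" + "".join(out) + ")", re.DOTALL)


def similar_to(value: str, pattern: str) -> bool:
    # SIMILAR TO must match the whole string, hence fullmatch.
    return similar_to_regex(pattern).fullmatch(value) is not None


pattern = "|".join(f"%{city}%" for city in ["Atherton", "Berkeley"])
print(pattern)                                      # %Atherton%|%Berkeley%
print(similar_to("Berkeley, CA 94704", pattern))    # True
print(similar_to("ATHERTON, CA 94027", pattern))    # False: case-sensitive
```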

Or something like this:

WHERE entities_entity.entity_city_state_zip LIKE ANY(ARRAY['%Atherton%', '%Berkeley%'])
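In Django, this form is easiest to use through raw SQL with a single array parameter. The sketch below is an illustrative assumption, not code from the answer, and `escape_like` / `build_count_query` are hypothetical helpers.

- It builds the `%city%` patterns in Python, escaping `\`, `%` and `_` so they are matched literally. Backslash is the default LIKE escape character in PostgreSQL.
- It uses `ILIKE ANY`, which keeps the case-insensitive behaviour of the original UPPER(...) LIKE UPPER(...).
- psycopg2 adapts a Python list to a PostgreSQL array, so the pattern list can be passed as one parameter to `cursor.execute`.

```python
def escape_like(value: str) -> str:
    """Escape LIKE/ILIKE wildcards (and the escape character itself) in a literal."""
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_count_query(cutoff: str, cities: list[str]) -> tuple[str, list]:
    """Return (sql, params) for cursor.execute(); the pattern list becomes one array parameter."""
    patterns = [f"%{escape_like(city)}%" for city in cities]
    sql = (
        "SELECT COUNT(*) FROM entities_entity "
        "WHERE date_filed <= %s "
        "AND (entity_city_state_zip ILIKE ANY (%s) "
        "OR agent_city_state_zip ILIKE ANY (%s))"
    )
    return sql, [cutoff, patterns, patterns]


sql, params = build_count_query("2016-01-21", ["Atherton", "Berkeley"])
print(params[1])  # ['%Atherton%', '%Berkeley%']

# Hypothetical usage inside a Django view (works on Django 1.5 and later):
#   from django.db import connection
#   cursor = connection.cursor()
#   cursor.execute(sql, params)
#   count = cursor.fetchone()[0]
```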


About Raw SQL query in Django:
