Optimize date offer BETWEEN

Question

Optimize date offer BETWEEN

I need help optimizing a PostgreSQL query that uses a BETWEEN clause with a timestamp field.

I have 2 tables:

 ONE(int id_one(PK), datetime cut_time, int f1 . . .)

contains about 3394 lines

 TWO(int id_two(PK), int id_one(FK), int f2 . . .)

contains about 4,000,000 lines

There are btree indexes for both PK id_one and id_two , for FK id_one and cut_time .

I want to execute a query like:

 select o.id_one, Date(o.cut_time), o.f1, t.f2 from one o inner join two t ON (o.id_one = t.id_one) where o.cut_time between '2013-01-01' and '2013-01-31';

This query retrieves about 1,700,000 rows in 7 seconds.

The following is a report analysis report:

 "Merge Join (cost=20000000003.53..20000197562.38 rows=1680916 width=24) (actual time=0.017..741.718 rows=1692345 loops=1)" " Merge Cond: (c.coilid = hf.coilid)" " -> Index Scan using pk_coils on coils c (cost=10000000000.00..10000000382.13 rows=1420 width=16) (actual time=0.008..4.539 rows=1404 loops=1)" " Filter: ((cut_time >= '2013-01-01 00:00:00'::timestamp without time zone) AND (cut_time <= '2013-01-31 00:00:00'::timestamp without time zone))" " Rows Removed by Filter: 1990" " -> Index Scan using idx_fk_lf_data on hf_data hf (cost=10000000000.00..10000166145.90 rows=4017625 width=16) (actual time=0.003..392.535 rows=1963386 loops=1)" "Total runtime: 768.473 ms"

The index in the timestamp column is not used. How to optimize this query?

+3

sql indexing postgresql postgresql-performance between

Nko Apr 16 '13 at 14:23

source share

2 answers

The correct DDL script

Not sure which notation you use in your question. This is not Postgres syntax. A proper setup might look like this:
SQL Fiddle

More on this violin further.
Assuming a timestamp data type for a datetime column.

Invalid request

BETWEEN almost always incorrect for a principal with timestamp columns. More on this answer:

Find overlapping date ranges in PostgreSQL

In your request:

 SELECT o.one_id, date(o.cut_time), o.f1, t.f2 FROM one o JOIN two t USING (one_id) WHERE o.cut_time BETWEEN '2013-01-01' AND '2013-01-31';

... the string constants "2013-01-01" and "2013-01-31" are tied to the timestamps "2013-01-01 00:00" and "2013-01-31 00:00", This excludes most January 31st. The time stamp "2013-01-31 12:00" will not qualify, which is certainly incorrect .
If you use "2013-02-01" instead of the upper limit, this will include "2013-02-01 00:00". Still wrong.

To get all the January 2013 timestamps, it must be :

 SELECT o.one_id, date(o.cut_time), o.f1, t.f2 FROM one o JOIN two t USING (one_id) WHERE o.cut_time >= '2013-01-01' AND o.cut_time < '2013-02-01';

Exclude the upper bound.

Optimize request

@Clodoaldo already mentioned a significant drag and drop in performance: it might be pointless to retrieve 1.7 million lines. The totality before you get the result.

Since table two much larger, the decisive are the rows from which you should get. While you are extracting most of the table by more than 5%, the simple index on two.one_id will not be used because it is faster to scan the table sequentially.

The table statistics are out of date or you are faced with cost constants and other parameters (which you obviously see, see below) to force Postgres to use the index anyway.

The only chance I will see for an index on two is the coverage index with PostgreSQL 9.2 . But you did not provide your version number.

 CREATE INDEX two_one_id_f2 on two(one_id, f2);

This way, Postgres can read from the index directly if certain preconditions are met. Maybe a little faster, not so much. Not tested.

Strange numbers in `EXPLAIN` output

Regarding your weird numbers in your EXPLAIN ANALYZE . This SQL Fiddle should explain this.

It looks like you had these debugging settings:

 SET enable_seqscan = off; SET enable_indexscan = off; SET enable_bitmapscan = off;

All should be on , except for debugging. Will cripple performance! Check:

 SELECT * FROM pg_settings WHERE name ~~ 'enable%'

+5

Erwin Brandstetter Apr 16 '13 at 22:23

source share

Clodoaldo Neto · Accepted Answer · 2013-04-16 14:30

The request is completed in less than one second. The remaining 6+ seconds are spent on traffic between the server and the client.

Optimize date offer BETWEEN

The correct DDL script

Invalid request

Optimize request

Strange numbers in EXPLAIN output

More articles:

Strange numbers in `EXPLAIN` output