Efficient PostgreSQL timestamp query using index or raster index scan?

In PostgreSQL, I have an index in the date field in my tickets table. When I compare the field with now() , the query is pretty efficient:

 # explain analyze select count(1) as count from tickets where updated_at > now(); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=90.64..90.66 rows=1 width=0) (actual time=33.238..33.238 rows=1 loops=1) -> Index Scan using tickets_updated_at_idx on tickets (cost=0.01..90.27 rows=74 width=0) (actual time=0.016..29.318 rows=40250 loops=1) Index Cond: (updated_at > now()) Total runtime: 33.271 ms 

It goes down and uses a raster map scan if I try to compare it with now() minus the interval.

 # explain analyze select count(1) as count from tickets where updated_at > (now() - '24 hours'::interval); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=180450.15..180450.17 rows=1 width=0) (actual time=543.898..543.898 rows=1 loops=1) -> Bitmap Heap Scan on tickets (cost=21296.43..175963.31 rows=897368 width=0) (actual time=251.700..457.916 rows=924373 loops=1) Recheck Cond: (updated_at > (now() - '24:00:00'::interval)) -> Bitmap Index Scan on tickets_updated_at_idx (cost=0.00..20847.74 rows=897368 width=0) (actual time=238.799..238.799 rows=924699 loops=1) Index Cond: (updated_at > (now() - '24:00:00'::interval)) Total runtime: 543.952 ms 

Is there a more efficient way to query using date arithmetic?

+2
source share
1 answer

The first query expects to find rows=74 , but actually finds rows=40250 .
The second query expects to find rows=897368 and actually finds rows=924699 .

Of course, processing 23 x as many lines takes significantly longer. Therefore, your actual times are not surprising.

Statistics for data with updated_at > now() deprecated. Run:

 ANALYZE tickets; 

and repeat your requests. And do you seriously have data with updated_at > now() ? That sounds wrong.

However, it is not surprising that statistics are outdated for recently modified data. This is in the logic of things. If your query is dependent on current statistics, you must run ANALYZE before starting the query.

Also check with (only in your session):

 SET enable_bitmapscan = off; 

and repeat the second query to view the time without scanning the raster index.

Why scan a raster index for more rows?

Simple index scan allows you to retrieve rows from the heap sequentially, as indicated in the index. It's simple, stupid and without overhead. Fast for multiple lines, but may be more expensive than scanning a raster index with increasing number of lines.

A raster index scan collects rows from an index before viewing a table. If there are several rows on the same data page, this saves repeated visits and can significantly speed up the work. The more rows, the more likely it is, scanning a raster index will save time.

For an even larger number of rows (about 5% of the table is largely dependent on the actual data), the scheduler switches to sequential table scanning and does not use the index at all.

The best option is to view only the index introduced in Postgres 9.2. This is only possible if certain preconditions are met. If all relevant columns are included in the index, the index type supports it, and the visibility map indicates that all rows on the data page are visible for all transactions, this page should not be extracted from the heap (table) and there is enough information in the index.

The decision depends on your statistics (the number of lines Postgres expects to find and their distribution), and cost parameters , most importantly random_page_cost , cpu_index_tuple_cost and effective_cache_size .

+2
source

Source: https://habr.com/ru/post/1239522/


All Articles