Optimize Postgres timestamp query range

I have the following table and indexes:

 CREATE TABLE ticket (
   wid            bigint NOT NULL DEFAULT nextval('tickets_id_seq'::regclass),
   eid            bigint,
   created        timestamp with time zone NOT NULL DEFAULT now(),
   status         integer NOT NULL DEFAULT 0,
   argsxml        text,
   moduleid       character varying(255),
   source_id      bigint,
   file_type_id   bigint,
   file_name      character varying(255),
   status_reason  character varying(255),
   ...
 )

I created an index on the created column as follows:

 CREATE INDEX ticket_1_idx ON ticket USING btree (created ); 

and here is my query:

 select * from ticket where created between '2012-12-19 00:00:00' and '2012-12-20 00:00:00' 

This worked fine until the number of records began to grow (about 5 million), and now it takes forever to return.

EXPLAIN ANALYZE shows this:

 Index Scan using ticket_1_idx on ticket  (cost=0.00..10202.64 rows=52543 width=1297) (actual time=0.109..125.704 rows=53340 loops=1)
   Index Cond: ((created >= '2012-12-19 00:00:00+00'::timestamp with time zone) AND (created <= '2012-12-20 00:00:00+00'::timestamp with time zone))
 Total runtime: 175.853 ms

So far I have tried setting

 random_page_cost = 1.75
 effective_cache_size = 3

I also ran

 create CLUSTER ticket USING ticket_1_idx; 

Nothing works. What am I doing wrong? Why does it choose a sequential scan? Indexes should make the query fast. Is there anything I can do to optimize it?

indexing postgresql postgresql-performance query-optimization database-partitioning
Dec 21
1 answer

CLUSTER

If you intend to use CLUSTER , the syntax you show is invalid.

 create CLUSTER ticket USING ticket_1_idx;

Run once:

 CLUSTER ticket USING ticket_1_idx; 

This can help a lot with big result sets. Not so much for retrieving a single row.
Postgres remembers which index was used for subsequent calls. If your table isn't read-only, the effect deteriorates over time, and you need to re-run at certain intervals:

 CLUSTER ticket; 

Possibly only on volatile partitions. See below.
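One detail worth adding: CLUSTER does not update planner statistics itself, so the PostgreSQL manual advises running ANALYZE on the freshly clustered table. A minimal sketch:

```sql
-- Re-cluster the table by its remembered index, then refresh
-- planner statistics, which CLUSTER itself does not update:
CLUSTER ticket;
ANALYZE ticket;
```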

However, if you have lots of updates, CLUSTER (or VACUUM FULL ) can actually be bad for performance. The right amount of bloat allows UPDATE to place new row versions on the same data page and avoids the need to physically extend the underlying file in the OS too often. You can use a carefully tuned FILLFACTOR to get the best of both worlds:

  • Fillfactor for a sequential index that is PK
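As a sketch of the fillfactor tuning mentioned above (the value 90 is illustrative, not a recommendation; tune it for your workload):

```sql
-- Leave ~10% of each data page free so UPDATEs can place new row
-- versions on the same page (enabling HOT updates):
ALTER TABLE ticket SET (fillfactor = 90);

-- The setting only applies to newly written pages; rewriting the
-- table (e.g. with CLUSTER) applies it to existing data as well:
CLUSTER ticket USING ticket_1_idx;
```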

pg_repack

CLUSTER takes an exclusive lock on the table, which may be a problem in a multi-user environment. Quoting the manual:

When a table is being clustered, an ACCESS EXCLUSIVE lock is acquired on it. This prevents any other database operations (both reads and writes ) from operating on the table until the CLUSTER is finished.

Bold emphasis mine. Consider the alternative pg_repack :

Unlike CLUSTER and VACUUM FULL it works online, without holding an exclusive lock on the processed tables during processing. pg_repack is efficient to boot, with performance comparable to using CLUSTER directly.

and

pg_repack needs to take an exclusive lock at the end of the reorganization.

Version 1.3.1 works with:

PostgreSQL 8.3, 8.4, 9.0, 9.1, 9.2, 9.3, 9.4

Version 1.4.2 works with:

PostgreSQL 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 10

Query

The query itself is simple enough not to cause any performance problems per se.

However, a word on correctness: BETWEEN includes both boundaries. Your query selects all of Dec 19, plus records from Dec 20, 00:00 hours. That's an extremely unlikely requirement. Chances are, you really want:

 SELECT * FROM ticket WHERE created >= '2012-12-19 0:0' AND created < '2012-12-20 0:0'; 
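A self-contained check of the boundary behavior (not tied to the ticket table): a timestamp of exactly midnight Dec 20 is matched by BETWEEN but not by the half-open range.

```sql
SELECT (ts BETWEEN '2012-12-19' AND '2012-12-20') AS between_matches,   -- true
       (ts >= '2012-12-19' AND ts < '2012-12-20') AS half_open_matches  -- false
FROM  (SELECT timestamptz '2012-12-20 00:00:00' AS ts) t;
```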

Performance

First, you ask:

Why does it choose a sequential scan?

The EXPLAIN output clearly shows an index scan, not a sequential scan of the table. There must be some kind of misunderstanding.

If you are pressed for better performance, you may be able to improve things. But the necessary background information is not in the question. Possible options include:

  • You could query only the required columns instead of * to reduce the cost of data transfer (and possibly other performance benefits).

  • You could look into partitioning and put practical time slices into separate tables. Add indexes to partitions as needed.

  • If partitioning is not an option, another related, but less intrusive technique would be to add one or more partial indexes .
    For example, if you mostly query the current month, you could create the following partial index:

     CREATE INDEX ticket_created_idx ON ticket(created) WHERE created >= '2012-12-01 00:00:00'::timestamp; 

    CREATE a new index right before the start of a new month. You can easily automate the task with a cron job. Optionally, DROP partial indexes for old months later.

  • Keep a total index in addition for CLUSTER (which cannot work with partial indexes). If old records never change, partitioning would help a lot, since you only need to re-cluster newer partitions. Then again, if records never change at all, you probably don't need CLUSTER .

If you combine the last two steps, the performance should be amazing.
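A sketch of the partial-index rotation described above (index names and dates are illustrative):

```sql
-- Shortly before January begins, build the next month's partial index.
-- CONCURRENTLY avoids blocking writes while the index builds, but it
-- cannot run inside a transaction block:
CREATE INDEX CONCURRENTLY ticket_created_2013_01_idx
    ON ticket (created)
    WHERE created >= '2013-01-01 00:00:00'::timestamptz;

-- Optionally drop partial indexes covering months you no longer query:
DROP INDEX IF EXISTS ticket_created_2012_12_idx;
```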

Basics for performance

You might be missing one of the basics. All the usual performance advice applies:

Dec 23 '12 at 1:28
