PostgreSQL Bitmap Heap Scan on an index is very slow, but an Index Only Scan is fast

I created a table with 43 million rows and populated val with random values from 1 to 200, so roughly 220 thousand rows per value, distributed evenly across the table.

    create table foo (id integer primary key, val bigint);
    insert into foo select i, random() * 200 from generate_series(1, 43000000) as i;
    create index val_index on foo(val);
    vacuum analyze foo;
    explain analyze select id from foo where val = 55;

Result: http://explain.depesz.com/s/fdsm

I expect a total runtime under 1 second; is this possible? I have an SSD, an i5 CPU (1.8 GHz), 4 GB RAM, and Postgres 9.3.

If I use an Index Only Scan instead, it works very fast:

 explain analyze select val from foo where val = 55; 

http://explain.depesz.com/s/7hm

But I need to select id, not val, so an Index Only Scan is not suitable in my case.

Thanks in advance!

Additional Information:

 SELECT relname, relpages, reltuples::numeric, pg_size_pretty(pg_table_size(oid)) FROM pg_class WHERE oid='foo'::regclass; 

Result:

 "foo";236758;43800000;"1850 MB" 

Config:

 "cpu_index_tuple_cost";"0.005";"" "cpu_operator_cost";"0.0025";"" "cpu_tuple_cost";"0.01";"" "effective_cache_size";"16384";"8kB" "max_connections";"100";"" "max_stack_depth";"2048";"kB" "random_page_cost";"4";"" "seq_page_cost";"1";"" "shared_buffers";"16384";"8kB" "temp_buffers";"1024";"8kB" "work_mem";"204800";"kB" 
3 answers

I have posted an answer here: http://ask.use-the-index-luke.com/questions/235/postgresql-bitmap-heap-scan-on-index-is-very-slow-but-index-only-scan-is-fast

The trick is to use a composite index on (val, id):

 create index val_id_index on foo(val, id); 

This way an Index Only Scan will be used, and now I can select the id.

 select id from foo where val = 55; 

Result:

http://explain.depesz.com/s/nDt3
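One caveat: index-only scans can skip heap pages only when those pages are marked all-visible in the visibility map, so the table should be vacuumed after bulk loads. A quick check, assuming the setup above:

    -- Keep the visibility map fresh, then confirm the heap is skipped:
    vacuum analyze foo;
    explain (analyze, buffers) select id from foo where val = 55;
    -- "Heap Fetches: 0" in the output means no heap access was needed.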

But this works ONLY in Postgres 9.2+, since that is where index-only scans were introduced. If you are stuck on an earlier version, try other options.
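As an aside for readers on much newer versions: Postgres 11+ can express the same idea with a covering index, keeping id out of the key columns (the index name here is illustrative):

    -- Postgres 11+ only: id is stored as a non-key payload column
    create index val_covering_index on foo(val) include (id);
    select id from foo where val = 55;  -- still satisfiable by an Index Only Scan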


Although you request only ~0.5% of the table (about 10 MB of data out of almost 2 GB), the values of interest are evenly distributed throughout the table.

You can see this in the plan you posted:

  • BitmapIndexScan ends at 123.172ms
  • BitmapHeapScan takes 17055.046ms.

You can try clustering your table on that index, which will put rows with equal val values together on the same pages. On my SATA disks I get the following:

    SET work_mem TO '300MB';
    EXPLAIN (analyze, buffers) SELECT id FROM foo WHERE val = 55;
      Bitmap Heap Scan on foo (...) (actual time=90.315..35091.665 rows=215022 loops=1)
        Heap Blocks: exact=140489
        Buffers: shared hit=20775 read=120306 written=24124

    SET maintenance_work_mem TO '1GB';
    CLUSTER foo USING val_index;

    EXPLAIN (analyze, buffers) SELECT id FROM foo WHERE val = 55;
      Bitmap Heap Scan on foo (...) (actual time=49.215..407.505 rows=215022 loops=1)
        Heap Blocks: exact=1163
        Buffers: shared read=1755

Of course, CLUSTER is a one-time operation: Postgres does not maintain the clustered order, so it degrades as the table is modified over time.
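You can check how well the physical row order matches the index order via the correlation statistic; after a successful CLUSTER it should be close to 1.0 for val (run ANALYZE first so the statistic is current):

    ANALYZE foo;
    -- correlation near 1.0: rows physically ordered by val;
    -- near 0: rows scattered, which is what makes the heap scan slow
    SELECT attname, correlation
    FROM pg_stats
    WHERE tablename = 'foo' AND attname = 'val';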


You can try decreasing random_page_cost; for SSDs it can be 1. Second, you can increase work_mem: 10 MB is a relatively low value for current servers with gigabytes of RAM. Also double-check the effective_cache_size setting; it may be too low as well.

    work_mem * max_connections * 2 + shared_buffers < RAM dedicated to Postgres
    effective_cache_size ~ shared_buffers + file system cache
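A sketch of what that could look like for the 4 GB machine from the question; the numbers are illustrative starting points, not tuned recommendations:

    -- Try at session level first; persist in postgresql.conf once happy.
    SET random_page_cost = 1.1;        -- SSDs: random reads cost nearly the same as sequential
    SET effective_cache_size = '3GB';  -- ~ shared_buffers + expected file system cache
    SET work_mem = '64MB';             -- per sort/hash node, so mind max_connections
    EXPLAIN (analyze, buffers) SELECT id FROM foo WHERE val = 55;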
