I am trying to test the effectiveness of various partitioning schemes in PostgreSQL. I created several databases, each with one table and a different partition size, and then loaded hundreds of millions of records into each.
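(For reference, data of this shape can be bulk-generated with generate_series; the sketch below is illustrative, not the exact load script, and all the modulus constants are made up:)

INSERT INTO testdata (param1id, param2id, param3id, devid, nameid, dt, value)
SELECT i % 100, i % 50, i % 20, i % 1000, i % 300,
       timestamp '2017-06-01' + (i % (30 * 86400)) * interval '1 second',  -- spread over 30 days
       random()
FROM generate_series(1, 100000000) AS s(i);  -- the insert trigger routes rows to the daily partitions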
Now I would like to run some queries on all of them and compare the results. The problem is that I get different actual times when I run the same query against the same database.
For example, the first time I ran EXPLAIN (BUFFERS, ANALYZE, VERBOSE) SELECT count(*) FROM testdata WHERE dt = '2017-06-01 08:50:00', I got:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=356199.96..356199.97 rows=1 width=8) (actual time=155757.190..155757.190 rows=1 loops=1)
   Output: count(*)
   Buffers: shared read=290555
   ->  Append  (cost=0.00..354506.51 rows=677381 width=0) (actual time=165.565..155583.815 rows=665275 loops=1)
         Buffers: shared read=290555
         ->  Seq Scan on public.testdata  (cost=0.00..0.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (testdata.dt = '2017-06-01 08:50:00'::timestamp without time zone)
         ->  Bitmap Heap Scan on public.testdata_2017_06_01  (cost=12682.26..354506.51 rows=677380 width=0) (actual time=165.562..155319.677 rows=665275 loops=1)
               Recheck Cond: (testdata_2017_06_01.dt = '2017-06-01 08:50:00'::timestamp without time zone)
               Rows Removed by Index Recheck: 31006916
               Heap Blocks: exact=25283 lossy=263451
               Buffers: shared read=290555
               ->  Bitmap Index Scan on testdata_2017_06_01_dt_idx  (cost=0.00..12512.91 rows=677380 width=0) (actual time=159.597..159.597 rows=665275 loops=1)
                     Index Cond: (testdata_2017_06_01.dt = '2017-06-01 08:50:00'::timestamp without time zone)
                     Buffers: shared read=1821
 Planning time: 75.419 ms
 Execution time: 155757.564 ms
(17 rows)
and the second time:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=356199.96..356199.97 rows=1 width=8) (actual time=18871.164..18871.164 rows=1 loops=1)
   Output: count(*)
   Buffers: shared read=290555
   ->  Append  (cost=0.00..354506.51 rows=677381 width=0) (actual time=148.208..18731.027 rows=665275 loops=1)
         Buffers: shared read=290555
         ->  Seq Scan on public.testdata  (cost=0.00..0.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (testdata.dt = '2017-06-01 08:50:00'::timestamp without time zone)
         ->  Bitmap Heap Scan on public.testdata_2017_06_01  (cost=12682.26..354506.51 rows=677380 width=0) (actual time=148.204..18489.171 rows=665275 loops=1)
               Recheck Cond: (testdata_2017_06_01.dt = '2017-06-01 08:50:00'::timestamp without time zone)
               Rows Removed by Index Recheck: 31006916
               Heap Blocks: exact=25283 lossy=263451
               Buffers: shared read=290555
               ->  Bitmap Index Scan on testdata_2017_06_01_dt_idx  (cost=0.00..12512.91 rows=677380 width=0) (actual time=143.119..143.119 rows=665275 loops=1)
                     Index Cond: (testdata_2017_06_01.dt = '2017-06-01 08:50:00'::timestamp without time zone)
                     Buffers: shared read=1821
 Planning time: 2.105 ms
 Execution time: 18871.482 ms
(17 rows)
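(An aside: both plans show Heap Blocks: exact=25283 lossy=263451 and over 31 million rows removed by the index recheck, which suggests the bitmap did not fit in work_mem and degraded to lossy pages. Raising work_mem for the session might keep the bitmap exact; the value below is an assumption, not something I have verified:)

SET work_mem = '256MB';  -- session-level; large enough for an exact bitmap is an assumption
EXPLAIN (BUFFERS, ANALYZE, VERBOSE)
SELECT count(*) FROM testdata WHERE dt = '2017-06-01 08:50:00';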
The difference is huge (155 seconds versus 18 seconds), and my first thought was that the result was cached after the first run. But the EXPLAIN output above shows no hit values in the Buffers sections. Also, the estimated cost is identical in both runs, so why do the actual times differ so much? Is there some maintenance work going on that makes the first execution of a query plan take such a considerable amount of time?
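One way to check whether the table pages really were sitting in shared_buffers between runs is the contrib extension pg_buffercache (a sketch, assuming the extension is available; note it only sees PostgreSQL's own cache, not the OS page cache):

CREATE EXTENSION IF NOT EXISTS pg_buffercache;
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE c.relname LIKE 'testdata%'
GROUP BY c.relname
ORDER BY buffers DESC;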
If I execute the same query for a different date, the same thing happens. If PostgreSQL were caching the query plan, would it be regenerated for each new parameter value? That does not make sense to me, because then it would effectively be caching query results, not plans.
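(As far as I know, a plain SQL statement is planned from scratch on every execution; plans are only reused for prepared statements and PL/pgSQL. A quick way to test that distinction:)

PREPARE cnt(timestamp) AS
    SELECT count(*) FROM testdata WHERE dt = $1;
EXECUTE cnt('2017-06-01 08:50:00');  -- later executions may reuse a cached (generic) plan
DEALLOCATE cnt;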
Question: what is the best way to get clean, repeatable query timings for comparing different partitioning schemes? In MySQL I could use SELECT SQL_NO_CACHE to bypass any caching for the query. PostgreSQL has DISCARD ALL, I think, but that does not help with the EXPLAIN output above.
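(For completeness: as I understand it, DISCARD ALL only resets session state such as prepared statements and temp tables; it evicts nothing from shared_buffers or the OS page cache. The only fully cold-cache procedure I know of is a server restart plus dropping the OS cache, sketched below for Linux; paths and commands are assumptions about the environment:)

DISCARD ALL;  -- resets session state only; both caches stay warm
-- For a truly cold run (Linux, as root on the host):
--   pg_ctl stop -D "$PGDATA"
--   sync && echo 3 > /proc/sys/vm/drop_caches
--   pg_ctl start -D "$PGDATA"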
Update #1:
Definition of the parent table testdata:
=> \d+ testdata;
                                       Table "public.testdata"
  Column  |            Type             |   Modifiers   | Storage  | Stats target | Description
----------+-----------------------------+---------------+----------+--------------+-------------
 param1id | integer                     | not null      | plain    |              |
 param2id | integer                     |               | plain    |              |
 param3id | integer                     |               | plain    |              |
 devid    | integer                     |               | plain    |              |
 nameid   | integer                     |               | plain    |              |
 dt       | timestamp without time zone | default now() | plain    |              |
 value    | double precision            |               | plain    |              |
 svalue   | character varying(255)      |               | extended |              |
Triggers:
    testdata_partition_insert_trigger BEFORE INSERT ON testdata FOR EACH ROW EXECUTE PROCEDURE create_partition_and_insert()
Child tables: testdata_2017_06_01,
              testdata_2017_06_02,
              ... other child tables, one per day ...
              testdata_2017_06_30
Definition of the child table testdata_2017_06_01:
\d+ testdata_2017_06_01;
                                 Table "public.testdata_2017_06_01"
  Column  |            Type             |   Modifiers   | Storage  | Stats target | Description
----------+-----------------------------+---------------+----------+--------------+-------------
 param1id | integer                     | not null      | plain    |              |
 param2id | integer                     |               | plain    |              |
 param3id | integer                     |               | plain    |              |
 devid    | integer                     |               | plain    |              |
 nameid   | integer                     |               | plain    |              |
 dt       | timestamp without time zone | default now() | plain    |              |
 value    | double precision            |               | plain    |              |
 svalue   | character varying(255)      |               | extended |              |
Indexes:
    "testdata_2017_06_01_devid_idx" btree (devid)
    "testdata_2017_06_01_dt_idx" btree (dt)
    "testdata_2017_06_01_nameid_idx" btree (nameid)
Check constraints:
    "partition_check" CHECK (dt >= '2017-06-01 00:00:00'::timestamp without time zone AND dt <= '2017-06-01 23:59:59'::timestamp without time zone)
Inherits: testdata
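(Side note, unrelated to the timing question: a partition_check with dt <= '... 23:59:59' would miss sub-second timestamps just before midnight; a half-open range avoids that. This is only a suggestion, not part of the schema under test:)

ALTER TABLE testdata_2017_06_01 DROP CONSTRAINT partition_check;
ALTER TABLE testdata_2017_06_01 ADD CONSTRAINT partition_check
    CHECK (dt >= '2017-06-01 00:00:00' AND dt < '2017-06-02 00:00:00');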
Update #2:
After enabling track_io_timing as suggested by @rvs, I restarted the instance and the picture changed significantly. First, free memory went up from 6 GB to 8 GB (see the screenshot). Second, the planner started using an Index Only Scan instead of a Bitmap Heap Scan. Now all queries take far less time (about 2.4 ms for the first run and 0.041 ms when it hits the cache, versus the initial 155 and 18 seconds). And finally, reads started hitting the cache (which explains why the plans above showed only shared read buffers and no hits).
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4.59..4.60 rows=1 width=8) (actual time=0.014..0.014 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=4
   ->  Append  (cost=0.00..4.58 rows=2 width=0) (actual time=0.012..0.012 rows=0 loops=1)
         Buffers: shared hit=4
         ->  Seq Scan on public.testdata  (cost=0.00..0.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (testdata.dt = '2017-06-15 18:10:10'::timestamp without time zone)
         ->  Index Only Scan using testdata_2017_06_15_dt_idx on public.testdata_2017_06_15  (cost=0.56..4.58 rows=1 width=0) (actual time=0.009..0.009 rows=0 loops=1)
               Index Cond: (testdata_2017_06_15.dt = '2017-06-15 18:10:10'::timestamp without time zone)
               Heap Fetches: 0
               Buffers: shared hit=4
 Planning time: 0.888 ms
 Execution time: 0.044 ms
(13 rows)
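(For anyone reproducing this: track_io_timing can be enabled without editing postgresql.conf by hand; a sketch, requiring superuser privileges:)

ALTER SYSTEM SET track_io_timing = on;
SELECT pg_reload_conf();
-- EXPLAIN (ANALYZE, BUFFERS) output then includes "I/O Timings: read=... write=..." lines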
Could it be that the massive bulk inserts overflowed all the available RAM (8 GB)? We loaded one partition with 40 million rows and then copied it into the other partitions. Should we expect something similar in production (we plan to insert about 1,000 records per second using bulk inserts)? It is pretty scary if we have to restart the database instance from time to time.
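(If restarts are the concern, one mitigation would be warming the cache explicitly after startup; a sketch, assuming the contrib extension pg_prewarm, available since 9.4:)

CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('testdata_2017_06_01');  -- loads the partition into shared_buffers, returns blocks read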
Thanks for the help!