I have a view in my application that renders a lot of data, and in the backend the data is fetched with this query:
DataPoint Load (20394.8ms)
SELECT communities.id as com,
       consumers.name as con,
       array_agg(timestamp ORDER BY data_points.timestamp asc) as tims,
       array_agg(consumption ORDER BY data_points.timestamp ASC) as cons
FROM "data_points"
INNER JOIN "consumers" ON "consumers"."id" = "data_points"."consumer_id"
INNER JOIN "communities_consumers" ON "communities_consumers"."consumer_id" = "consumers"."id"
INNER JOIN "communities" ON "communities"."id" = "communities_consumers"."community_id"
INNER JOIN "clusterings" ON "clusterings"."id" = "communities"."clustering_id"
WHERE ("data_points"."timestamp" BETWEEN $1 AND $2)
  AND "data_points"."interval_id" = $3
  AND "clusterings"."id" = 1
GROUP BY communities.id, consumers.id
[["timestamp", "2015-11-20 09:23:00"], ["timestamp", "2015-11-27 09:23:00"], ["interval_id", 2]]
The query takes about 20 seconds, which seems excessive.
The code that generates the query is as follows:
res = {}
DataPoint.joins(consumer: { communities: :clustering })
         .where('clusterings.id': self,
                timestamp: chart_cookies[:start_date]..chart_cookies[:end_date],
                interval_id: chart_cookies[:interval_id])
         .group('communities.id')
         .group('consumers.id')
         .select('communities.id as com, consumers.name as con',
                 'array_agg(timestamp ORDER BY data_points.timestamp asc) as tims',
                 'array_agg(consumption ORDER BY data_points.timestamp ASC) as cons')
         .each do |d|
  res[d.com] ||= {}
  res[d.com][d.con] = d.tims.zip(d.cons)
  res[d.com]["aggregate"] ||= d.tims.map { |t| [t, 0] }
  res[d.com]["aggregate"] = res[d.com]["aggregate"].zip(d.cons).map { |(a, b), c| [a, b + c] }
end
res
And the corresponding database schema is as follows:
create_table "data_points", force: :cascade do |t| t.bigint "consumer_id" t.bigint "interval_id" t.datetime "timestamp" t.float "consumption" t.float "flexibility" t.datetime "created_at", null: false t.datetime "updated_at", null: false t.index ["consumer_id"], name: "index_data_points_on_consumer_id" t.index ["interval_id"], name: "index_data_points_on_interval_id" t.index ["timestamp", "consumer_id", "interval_id"], name: "index_data_points_on_timestamp_and_consumer_id_and_interval_id", unique: true t.index ["timestamp"], name: "index_data_points_on_timestamp" end create_table "consumers", force: :cascade do |t| t.string "name" t.string "location" t.string "edms_id" t.bigint "building_type_id" t.bigint "connection_type_id" t.float "location_x" t.float "location_y" t.string "feeder_id" t.bigint "consumer_category_id" t.datetime "created_at", null: false t.datetime "updated_at", null: false t.index ["building_type_id"], name: "index_consumers_on_building_type_id" t.index ["connection_type_id"], name: "index_consumers_on_connection_type_id" t.index ["consumer_category_id"], name: "index_consumers_on_consumer_category_id" end create_table "communities_consumers", id: false, force: :cascade do |t| t.bigint "consumer_id", null: false t.bigint "community_id", null: false t.index ["community_id", "consumer_id"], name: "index_communities_consumers_on_community_id_and_consumer_id" t.index ["consumer_id", "community_id"], name: "index_communities_consumers_on_consumer_id_and_community_id" end create_table "communities", force: :cascade do |t| t.string "name" t.text "description" t.bigint "clustering_id" t.datetime "created_at", null: false t.datetime "updated_at", null: false t.index ["clustering_id"], name: "index_communities_on_clustering_id" end create_table "clusterings", force: :cascade do |t| t.string "name" t.text "description" t.datetime "created_at", null: false t.datetime "updated_at", null: false end
How can I make this query faster? Is it possible to restructure the query to simplify it, or to add something to the database schema (such as an index) to reduce the execution time?
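For example, one thing I'm wondering about (an untried idea) is whether a composite index matching the columns the query filters on would help:

# Hypothetical migration (untested): a composite index matching the
# filter columns, so the planner could locate a consumer's rows for the
# requested interval and time range directly from the index.
class AddFilterIndexToDataPoints < ActiveRecord::Migration[5.1]
  def change
    add_index :data_points, [:consumer_id, :interval_id, :timestamp],
              name: "index_data_points_on_consumer_interval_and_timestamp"
  end
end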
Interestingly, a slightly simplified version of the query, which I use in another view, is much faster: just 1161.4 ms for the first request and 41.6 ms for subsequent requests:
DataPoint Load (1161.4ms)
SELECT consumers.name as con,
       array_agg(timestamp ORDER BY data_points.timestamp asc) as tims,
       array_agg(consumption ORDER BY data_points.timestamp ASC) as cons
FROM "data_points"
INNER JOIN "consumers" ON "consumers"."id" = "data_points"."consumer_id"
INNER JOIN "communities_consumers" ON "communities_consumers"."consumer_id" = "consumers"."id"
INNER JOIN "communities" ON "communities"."id" = "communities_consumers"."community_id"
WHERE ("data_points"."timestamp" BETWEEN $1 AND $2)
  AND "data_points"."interval_id" = $3
  AND "communities"."id" = 100
GROUP BY communities.id, consumers.name
[["timestamp", "2015-11-20 09:23:00"], ["timestamp", "2015-11-27 09:23:00"], ["interval_id", 2]]
Running the query with EXPLAIN (ANALYZE, BUFFERS) in dbconsole, I get the following output:
--------------------------------------------------------------------------------------------------------------------------------
GroupAggregate  (cost=12.31..7440.69 rows=246 width=57) (actual time=44.139..20474.015 rows=296 loops=1)
  Group Key: communities.id, consumers.id
  Buffers: shared hit=159692 read=6148105 written=209
  ->  Nested Loop  (cost=12.31..7434.54 rows=246 width=57) (actual time=20.944..20436.806 rows=49728 loops=1)
        Buffers: shared hit=159685 read=6148105 written=209
        ->  Nested Loop  (cost=11.88..49.30 rows=1 width=49) (actual time=0.102..6.374 rows=296 loops=1)
              Buffers: shared hit=988 read=208
              ->  Nested Loop  (cost=11.73..41.12 rows=1 width=57) (actual time=0.084..4.443 rows=296 loops=1)
                    Buffers: shared hit=396 read=208
                    ->  Merge Join  (cost=11.58..40.78 rows=1 width=24) (actual time=0.075..1.365 rows=296 loops=1)
                          Merge Cond: (communities_consumers.community_id = communities.id)
                          Buffers: shared hit=5 read=7
                          ->  Index Only Scan using index_communities_consumers_on_community_id_and_consumer_id on communities_consumers  (cost=0.27..28.71 rows=296 width=16) (actual time=0.039..0.446 rows=296 loops=1)
                                Heap Fetches: 4
                                Buffers: shared hit=1 read=6
                          ->  Sort  (cost=11.31..11.31 rows=3 width=16) (actual time=0.034..0.213 rows=247 loops=1)
                                Sort Key: communities.id
                                Sort Method: quicksort  Memory: 25kB
                                Buffers: shared hit=4 read=1
                                ->  Bitmap Heap Scan on communities  (cost=4.17..11.28 rows=3 width=16) (actual time=0.026..0.027 rows=6 loops=1)
                                      Recheck Cond: (clustering_id = 1)
                                      Heap Blocks: exact=1
                                      Buffers: shared hit=4 read=1
                                      ->  Bitmap Index Scan on index_communities_on_clustering_id  (cost=0.00..4.17 rows=3 width=0) (actual time=0.020..0.020 rows=8 loops=1)
                                            Index Cond: (clustering_id = 1)
                                            Buffers: shared hit=3 read=1
                    ->  Index Scan using consumers_pkey on consumers  (cost=0.15..0.33 rows=1 width=33) (actual time=0.007..0.008 rows=1 loops=296)
                          Index Cond: (id = communities_consumers.consumer_id)
                          Buffers: shared hit=391 read=201
              ->  Index Only Scan using clusterings_pkey on clusterings  (cost=0.15..8.17 rows=1 width=8) (actual time=0.004..0.005 rows=1 loops=296)
                    Index Cond: (id = 1)
                    Heap Fetches: 296
                    Buffers: shared hit=592
        ->  Index Scan using index_data_points_on_consumer_id on data_points  (cost=0.44..7383.44 rows=180 width=24) (actual time=56.128..68.995 rows=168 loops=296)
              Index Cond: (consumer_id = consumers.id)
              Filter: (("timestamp" >= '2015-11-20 09:23:00'::timestamp without time zone) AND ("timestamp" <= '2015-11-27 09:23:00'::timestamp without time zone) AND (interval_id = 2))
              Rows Removed by Filter: 76610
              Buffers: shared hit=158697 read=6147897 written=209
Planning time: 1.811 ms
Execution time: 20474.330 ms
(40 rows)
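Reading this plan, almost all of the time and buffer traffic is in the index scan on index_data_points_on_consumer_id, which discards about 76,610 rows per consumer with the timestamp/interval filter. Besides the index idea above, I'm also considering splitting the work into two simpler queries (an untested sketch):

# Untested sketch: resolve the consumer/community mapping first, then
# aggregate data points with a plain IN list instead of a four-way join.
mapping = Community.where(clustering_id: id)
                   .joins(:consumers)
                   .pluck('communities.id', 'consumers.id', 'consumers.name')

# Grouping by consumer_id alone is enough here, because community ids
# and consumer names can be looked up from `mapping` in Ruby afterwards.
points = DataPoint.where(consumer_id: mapping.map { |_com, consumer_id, _name| consumer_id },
                         timestamp: chart_cookies[:start_date]..chart_cookies[:end_date],
                         interval_id: chart_cookies[:interval_id])
                  .group(:consumer_id)
                  .select('consumer_id',
                          'array_agg(timestamp ORDER BY timestamp) as tims',
                          'array_agg(consumption ORDER BY timestamp) as cons')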
The bullet gem reports the following warnings:
USE eager loading detected
  Community => [:communities_consumers]
  Add to your finder: :includes => [:communities_consumers]
USE eager loading detected
  Community => [:consumers]
  Add to your finder: :includes => [:consumers]
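If I understand those correctly, bullet is suggesting something like the following wherever communities and their consumers are iterated (a sketch; I'm not sure it applies to the aggregate query above, which returns computed rows rather than Community objects):

# Sketch of what bullet suggests: preload the associations so iterating
# over consumers does not fire one extra query per community.
Community.includes(:communities_consumers, :consumers)
         .where(clustering_id: id)
         .each do |community|
  community.consumers.map(&:name) # consumers are preloaded, no N+1 queries
end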
After removing the join to the clusterings table, the new query plan is as follows:
EXPLAIN for:
SELECT communities.id as com,
       consumers.name as con,
       array_agg(timestamp ORDER BY data_points.timestamp asc) as tims,
       array_agg(consumption ORDER BY data_points.timestamp ASC) as cons
FROM "data_points"
INNER JOIN "consumers" ON "consumers"."id" = "data_points"."consumer_id"
INNER JOIN "communities_consumers" ON "communities_consumers"."consumer_id" = "consumers"."id"
INNER JOIN "communities" ON "communities"."id" = "communities_consumers"."community_id"
WHERE ("data_points"."timestamp" BETWEEN $1 AND $2)
  AND "data_points"."interval_id" = $3
  AND "communities"."clustering_id" = 1
GROUP BY communities.id, consumers.id
[["timestamp", "2015-11-29 20:52:30.926247"], ["timestamp", "2015-12-06 20:52:30.926468"], ["interval_id", 2]]

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
GroupAggregate  (cost=10839.79..10846.42 rows=241 width=57)
  ->  Sort  (cost=10839.79..10840.39 rows=241 width=57)
        Sort Key: communities.id, consumers.id
        ->  Nested Loop  (cost=7643.11..10830.26 rows=241 width=57)
              ->  Nested Loop  (cost=11.47..22.79 rows=1 width=49)
                    ->  Hash Join  (cost=11.32..17.40 rows=1 width=16)
                          Hash Cond: (communities_consumers.community_id = communities.id)
                          ->  Seq Scan on communities_consumers  (cost=0.00..4.96 rows=296 width=16)
                          ->  Hash  (cost=11.28..11.28 rows=3 width=8)
                                ->  Bitmap Heap Scan on communities  (cost=4.17..11.28 rows=3 width=8)
                                      Recheck Cond: (clustering_id = 1)
                                      ->  Bitmap Index Scan on index_communities_on_clustering_id  (cost=0.00..4.17 rows=3 width=0)
                                            Index Cond: (clustering_id = 1)
                    ->  Index Scan using consumers_pkey on consumers  (cost=0.15..5.38 rows=1 width=33)
                          Index Cond: (id = communities_consumers.consumer_id)
              ->  Bitmap Heap Scan on data_points  (cost=7631.64..10805.72 rows=174 width=24)
                    Recheck Cond: ((consumer_id = consumers.id) AND ("timestamp" >= '2015-11-29 20:52:30.926247'::timestamp without time zone) AND ("timestamp" <= '2015-12-06 20:52:30.926468'::timestamp without time zone))
                    Filter: (interval_id = 2::bigint)
                    ->  BitmapAnd  (cost=7631.64..7631.64 rows=861 width=0)
                          ->  Bitmap Index Scan on index_data_points_on_consumer_id  (cost=0.00..1589.92 rows=76778 width=0)
                                Index Cond: (consumer_id = consumers.id)
                          ->  Bitmap Index Scan on index_data_points_on_timestamp  (cost=0.00..6028.58 rows=254814 width=0)
                                Index Cond: (("timestamp" >= '2015-11-29 20:52:30.926247'::timestamp without time zone) AND ("timestamp" <= '2015-12-06 20:52:30.926468'::timestamp without time zone))
(23 rows)
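For reference, the finder that produces this plan is roughly the original one with the join stopped at communities, filtering on its clustering_id foreign key directly (a sketch, assuming the associations outlined earlier):

# Sketch: same aggregation, but without joining through to clusterings.
DataPoint.joins(consumer: :communities)
         .where('communities.clustering_id': id,
                timestamp: chart_cookies[:start_date]..chart_cookies[:end_date],
                interval_id: chart_cookies[:interval_id])
         .group('communities.id')
         .group('consumers.id')
         .select('communities.id as com, consumers.name as con',
                 'array_agg(timestamp ORDER BY data_points.timestamp asc) as tims',
                 'array_agg(consumption ORDER BY data_points.timestamp ASC) as cons')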
As requested in the comments, here are the query plans for the simplified query with the restriction on communities.id, and without it:
DataPoint Load (1563.3ms)
SELECT consumers.name as con,
       array_agg(timestamp ORDER BY data_points.timestamp asc) as tims,
       array_agg(consumption ORDER BY data_points.timestamp ASC) as cons
FROM "data_points"
INNER JOIN "consumers" ON "consumers"."id" = "data_points"."consumer_id"
INNER JOIN "communities_consumers" ON "communities_consumers"."consumer_id" = "consumers"."id"
INNER JOIN "communities" ON "communities"."id" = "communities_consumers"."community_id"
WHERE ("data_points"."timestamp" BETWEEN $1 AND $2)
  AND "data_points"."interval_id" = $3
GROUP BY communities.id, consumers.name
[["timestamp", "2015-11-29 20:52:30.926000"], ["timestamp", "2015-12-06 20:52:30.926000"], ["interval_id", 2]]

EXPLAIN for:
SELECT consumers.name as con,
       array_agg(timestamp ORDER BY data_points.timestamp asc) as tims,
       array_agg(consumption ORDER BY data_points.timestamp ASC) as cons
FROM "data_points"
INNER JOIN "consumers" ON "consumers"."id" = "data_points"."consumer_id"
INNER JOIN "communities_consumers" ON "communities_consumers"."consumer_id" = "consumers"."id"
INNER JOIN "communities" ON "communities"."id" = "communities_consumers"."community_id"
WHERE ("data_points"."timestamp" BETWEEN $1 AND $2)
  AND "data_points"."interval_id" = $3
GROUP BY communities.id, consumers.name
[["timestamp", "2015-11-29 20:52:30.926000"], ["timestamp", "2015-12-06 20:52:30.926000"], ["interval_id", 2]]
QUERY PLAN
and
DataPoint Load (1479.0ms)
SELECT consumers.name as con,
       array_agg(timestamp ORDER BY data_points.timestamp asc) as tims,
       array_agg(consumption ORDER BY data_points.timestamp ASC) as cons
FROM "data_points"
INNER JOIN "consumers" ON "consumers"."id" = "data_points"."consumer_id"
INNER JOIN "communities_consumers" ON "communities_consumers"."consumer_id" = "consumers"."id"
INNER JOIN "communities" ON "communities"."id" = "communities_consumers"."community_id"
WHERE ("data_points"."timestamp" BETWEEN $1 AND $2)
  AND "data_points"."interval_id" = $3
GROUP BY communities.id, consumers.name
[["timestamp", "2015-11-29 20:52:30.926000"], ["timestamp", "2015-12-06 20:52:30.926000"], ["interval_id", 2]]

EXPLAIN for:
SELECT consumers.name as con,
       array_agg(timestamp ORDER BY data_points.timestamp asc) as tims,
       array_agg(consumption ORDER BY data_points.timestamp ASC) as cons
FROM "data_points"
INNER JOIN "consumers" ON "consumers"."id" = "data_points"."consumer_id"
INNER JOIN "communities_consumers" ON "communities_consumers"."consumer_id" = "consumers"."id"
INNER JOIN "communities" ON "communities"."id" = "communities_consumers"."community_id"
WHERE ("data_points"."timestamp" BETWEEN $1 AND $2)
  AND "data_points"."interval_id" = $3
GROUP BY communities.id, consumers.name
[["timestamp", "2015-11-29 20:52:30.926000"], ["timestamp", "2015-12-06 20:52:30.926000"], ["interval_id", 2]]
QUERY PLAN