I have a view, vote_pairs, that looks like this:
```sql
CREATE VIEW vote_pairs AS
SELECT v1.name AS name1,
       v2.name AS name2,
       ...
FROM votes AS v1
JOIN votes AS v2 ON v1.topic_id = v2.topic_id;
```
With about 100k rows in the votes table, queries against this view take about 3 seconds to complete.
However, when I add an extra filter on the names:
```sql
... ON v1.topic_id = v2.topic_id AND v1.name < v2.name;
```
the execution time quadruples: queries against vote_pairs take almost 12 seconds to complete.
The runtime stays the same no matter where I put the restriction. For example, the query is just as slow if I move the filter into the WHERE clause of the outer query:
```sql
SELECT * FROM vote_pairs WHERE name1 < name2;
```
What is going on here? Are lexicographic comparisons slow in Postgres, or is it something else? And how can I speed up this query?
The votes table:
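If locale-aware string comparison is the suspect, one way to test it is to check the database collation and rerun the filtered join with byte-wise ordering. This is a diagnostic sketch, not a confirmed fix; it assumes the database uses a non-C locale. "C" ordering can differ from the locale's ordering, so the query may keep the other member of some pairs, but it should still return one row per unordered pair if names are distinct within a topic.

```sql
-- Which collation does this database use for text comparisons?
-- A value other than C or POSIX means comparisons are locale-aware.
SHOW lc_collate;

-- The filtered join again, but comparing names in byte-wise "C" order.
-- COLLATE clauses need PostgreSQL 9.1 or later, so this only runs on the
-- 9.2.3 server; on 8.4 the pattern operator v1.name ~<~ v2.name gives a
-- comparable byte-wise comparison.
EXPLAIN ANALYZE
SELECT v1.name AS name1,
       v2.name AS name2
FROM votes AS v1
JOIN votes AS v2
  ON v1.topic_id = v2.topic_id
 AND v1.name COLLATE "C" < v2.name COLLATE "C";
```

If the runtime drops close to the unfiltered case, the cost is in the collation-aware comparison rather than in the join itself.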
```sql
CREATE TABLE votes (
    topic_id INTEGER REFERENCES topics(id),
    name     VARCHAR(64),
    vote     VARCHAR(12)
);

CREATE INDEX votes_topic_name ON votes (topic_id, name);
CREATE INDEX votes_name ON votes (name);
```
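The row counts in the plans follow from how votes are spread over topics. The hypothetical query below (not part of the original post) predicts them from the table above, assuming names are distinct and non-NULL within each topic; it shows that the hash join has to produce every same-topic pair before any name filter can discard half of them.

```sql
-- A topic with k votes contributes k*k rows to the unfiltered self-join
-- and, if its names are distinct, k*(k-1)/2 rows once v1.name < v2.name
-- is applied. Rows with a NULL topic_id never satisfy the join condition.
SELECT sum(k * k)           AS unfiltered_rows,
       sum(k * (k - 1) / 2) AS filtered_rows,
       max(k)               AS largest_topic
FROM (
    SELECT count(*) AS k
    FROM votes
    WHERE topic_id IS NOT NULL
    GROUP BY topic_id
) AS per_topic;
```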
EXPLAIN ANALYZE output without a name filter:
```
db=# CREATE OR REPLACE VIEW vote_pairs AS
db-#     SELECT
db-#         v1.name as name1,
db-#         v2.name as name2
db-#     FROM votes AS v1
db-#     JOIN votes AS v2
db-#         ON v1.topic_id = v2.topic_id;
CREATE VIEW
db=# EXPLAIN ANALYZE SELECT * FROM vote_pairs;
 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Hash Join (cost=3956.38..71868.56 rows=5147800 width=28) (actual time=51.810..1236.673 rows=5082750 loops=1)
   Hash Cond: (v1.topic_id = v2.topic_id)
   -> Seq Scan on votes v1 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.019..18.358 rows=112950 loops=1)
   -> Hash (cost=1882.50..1882.50 rows=112950 width=18) (actual time=50.671..50.671 rows=112950 loops=1)
         -> Seq Scan on votes v2 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.004..20.306 rows=112950 loops=1)
 Total runtime: 1495.963 ms
(6 rows)
```
And with the filter:
```
db=# CREATE OR REPLACE VIEW vote_pairs AS
db-#     SELECT
db-#         v1.name as name1,
db-#         v2.name as name2
db-#     FROM votes AS v1
db-#     JOIN votes AS v2
db-#         ON v1.topic_id = v2.topic_id AND v1.name < v2.name;
CREATE VIEW
db=# EXPLAIN ANALYZE SELECT * FROM vote_pairs;
 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Hash Join (cost=3956.38..84738.06 rows=1715933 width=28) (actual time=66.688..6900.478 rows=2484900 loops=1)
   Hash Cond: (v1.topic_id = v2.topic_id)
   Join Filter: ((v1.name)::text < (v2.name)::text)
   -> Seq Scan on votes v1 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.023..24.539 rows=112950 loops=1)
   -> Hash (cost=1882.50..1882.50 rows=112950 width=18) (actual time=65.603..65.603 rows=112950 loops=1)
         -> Seq Scan on votes v2 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.004..26.756 rows=112950 loops=1)
 Total runtime: 7048.740 ms
(7 rows)
```
EXPLAIN (ANALYZE, BUFFERS) output, first for the unfiltered view and then with the filter in the outer WHERE clause:
```
db=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM vote_pairs;
 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Hash Join (cost=3956.38..71345.89 rows=5152008 width=28) (actual time=56.230..1204.522 rows=5082750 loops=1)
   Hash Cond: (v1.topic_id = v2.topic_id)
   Buffers: shared hit=129 read=1377 written=2, temp read=988 written=974
   -> Seq Scan on votes v1 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.008..20.492 rows=112950 loops=1)
         Buffers: shared hit=77 read=676
   -> Hash (cost=1882.50..1882.50 rows=112950 width=18) (actual time=55.742..55.742 rows=112950 loops=1)
         Buckets: 2048 Batches: 8 Memory Usage: 752kB
         Buffers: shared hit=52 read=701 written=2, temp written=480
         -> Seq Scan on votes v2 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.004..22.954 rows=112950 loops=1)
               Buffers: shared hit=52 read=701 written=2
 Total runtime: 1499.302 ms
(11 rows)

db=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM vote_pairs WHERE name1 > name2;
 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Hash Join (cost=3956.38..84225.91 rows=1717336 width=28) (actual time=51.214..6422.592 rows=2484900 loops=1)
   Hash Cond: (v1.topic_id = v2.topic_id)
   Join Filter: ((v1.name)::text > (v2.name)::text)
   Rows Removed by Join Filter: 2597850
   Buffers: shared hit=32 read=1477, temp read=988 written=974
   -> Seq Scan on votes v1 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.008..22.605 rows=112950 loops=1)
         Buffers: shared hit=27 read=726
   -> Hash (cost=1882.50..1882.50 rows=112950 width=18) (actual time=50.678..50.678 rows=112950 loops=1)
         Buckets: 2048 Batches: 8 Memory Usage: 752kB
         Buffers: shared hit=2 read=751, temp written=480
         -> Seq Scan on votes v2 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.005..21.337 rows=112950 loops=1)
               Buffers: shared hit=2 read=751
 Total runtime: 6573.308 ms
(13 rows)
```
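The plan numbers from the two EXPLAIN (ANALYZE, BUFFERS) runs already show where the time goes. The arithmetic below reads them off directly; the last line is a rough estimate that attributes the entire runtime difference to the join filter.

```latex
\begin{aligned}
\text{rows kept} + \text{rows removed by join filter}
  &= 2\,484\,900 + 2\,597\,850 = 5\,082\,750 = \text{rows from the hash condition alone} \\
\frac{5\,082\,750 - 112\,950}{2} &= 2\,484\,900 \\
\frac{6573.3\ \text{ms} - 1499.3\ \text{ms}}{5\,082\,750\ \text{comparisons}} &\approx 1.0\ \mu\text{s per comparison}
\end{aligned}
```

So the name filter is evaluated once for every row the hash join produces: it does not shrink the join, it adds one string comparison to each of roughly 5 million pairs. The second line is what you would expect if names are distinct within each topic (each topic with k votes yields k² pairs, of which (k² − k)/2 satisfy the strict inequality).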
Other notes:
- VACUUM FULL and ANALYZE have been run on votes.
- Both 8.4.11 and 9.2.3 behave identically.