My (simplified) request:
SELECT a.id FROM a LEFT JOIN b ON b.id = a.id WHERE b.id IS NULL ORDER BY id;
This query plan works as follows:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Merge Anti Join (cost=0.57..3831.88 rows=128092 width=8) Merge Cond: (a.id = b.id) -> Index Only Scan using a_pkey on a (cost=0.42..3399.70 rows=130352 width=8) -> Index Only Scan using b_pkey on b (cost=0.15..78.06 rows=2260 width=8) (4 rows)
However, sometimes postgresql 9.5.9 switches to sequential scanning if the scheduler thinks it might be better (see Why does PostgreSQL perform sequential scanning on an indexed column? ). However, in my case, this made matters worse.
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Merge Anti Join (cost=405448.22..39405858.08 rows=1365191502 width=8) Merge Cond: (a.id = b.id) -> Index Only Scan using a_pkey on a (cost=0.58..35528317.86 rows=1368180352 width=8) -> Materialize (cost=405447.64..420391.89 rows=2988850 width=8) -> Sort (cost=405447.64..412919.76 rows=2988850 width=8) Sort Key: b.id -> Seq Scan on b (cost=0.00..43113.50 rows=2988850 width=8) (7 rows)
My solution (hack) was to prevent sequential scans:
set enable_seqscan to off;
The postgresql documentation says that the correct way to do this is seq_page_cost using ALTER TABLESPACE. This might be useful when using ORDER BY for indexed columns, but I'm not sure. https://www.postgresql.org/docs/9.1/static/runtime-config-query.html
source share