I'm having trouble getting the query planner to produce good plans for tables with row-level security (RLS) enabled. It seems that all it takes to force a bad plan is a join from an RLS-enabled table to a table without RLS, even when both tables have matching indexes that the planner should use.
Is there a way to help the planner understand this? Or are certain statistics unavailable when RLS is involved?
I also tried enabling RLS on the table that does not need it (adding a wide-open policy with USING (TRUE)), and the effect is the same as having no policy on that table.
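For concreteness, the wide-open policy was along these lines. This is only a sketch: it assumes the setup script below has already been run, and the policy name allow_all_baz is made up for illustration.

```sql
-- Assumes foo.baz and the role "restricted" already exist (see the script below).
-- Turn on RLS for the table that does not actually need it.
ALTER TABLE foo.baz ENABLE ROW LEVEL SECURITY;

-- Hypothetical policy name; USING (TRUE) lets the role see every row.
CREATE POLICY allow_all_baz ON foo.baz
FOR SELECT TO restricted
USING (TRUE);
```

With this in place the join under SET ROLE restricted still gets the same slow plan as without the policy.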
DROP SCHEMA IF EXISTS foo CASCADE;
CREATE SCHEMA foo;
CREATE TABLE foo.bar AS
SELECT generate_series(1,10000000) AS id, md5(random()::text) AS descr, random() * 5 + 1 AS licflag;
CREATE TABLE foo.baz AS
SELECT generate_series(1,10000000) AS id, md5(random()::text) AS descr, random() * 5 + 1 AS licflag;
CREATE UNIQUE INDEX ON foo.bar (id);
CREATE INDEX ON foo.bar (licflag);
CREATE UNIQUE INDEX ON foo.baz (id);
CREATE INDEX ON foo.baz (licflag);
ANALYZE foo.bar;
ANALYZE foo.baz;
ALTER TABLE foo.bar ENABLE ROW LEVEL SECURITY;
DROP ROLE IF EXISTS restricted;
CREATE ROLE restricted NOINHERIT;
REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA foo FROM restricted;
GRANT restricted to current_user;
GRANT USAGE ON SCHEMA foo TO restricted;
GRANT SELECT ON ALL TABLES IN SCHEMA foo TO restricted;
CREATE POLICY restrict_foo ON foo.bar
FOR SELECT TO restricted
USING (licflag < 3);
EXPLAIN ANALYZE
SELECT *
FROM foo.bar f1
JOIN foo.baz f2 ON f1.id = f2.id
WHERE f2.id BETWEEN 500 AND 12000
AND f1.licflag < 3;
SET ROLE restricted;
EXPLAIN ANALYZE
SELECT *
FROM foo.bar f1
JOIN foo.baz f2 ON f1.id = f2.id
WHERE f2.id BETWEEN 500 AND 12000;
This results in:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.87..87677.34 rows=4668 width=90) (actual time=0.091..45.337 rows=4622 loops=1)
   ->  Index Scan using baz_id_idx on baz f2  (cost=0.43..471.90 rows=11573 width=45) (actual time=0.042..4.496 rows=11501 loops=1)
         Index Cond: ((id >= 500) AND (id <= 12000))
   ->  Index Scan using bar_id_idx on bar f1  (cost=0.43..7.53 rows=1 width=45) (actual time=0.003..0.003 rows=0 loops=11501)
         Index Cond: (id = f2.id)
         Filter: (licflag < '3'::double precision)
         Rows Removed by Filter: 1
Planning time: 1.300 ms
Execution time: 45.826 ms
(9 rows)
SET
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=569.62..273628.35 rows=4227 width=90) (actual time=8.317..2074.996 rows=4558 loops=1)
   Hash Cond: (f1.id = f2.id)
   ->  Seq Scan on bar f1  (cost=0.00..218457.95 rows=3967891 width=45) (actual time=0.016..1616.577 rows=3998388 loops=1)
         Filter: (licflag < '3'::double precision)
         Rows Removed by Filter: 6001612
   ->  Hash  (cost=436.47..436.47 rows=10652 width=45) (actual time=8.033..8.033 rows=11501 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 1027kB
         ->  Index Scan using baz_id_idx on baz f2  (cost=0.43..436.47 rows=10652 width=45) (actual time=0.026..4.871 rows=11501 loops=1)
               Index Cond: ((id >= 500) AND (id <= 12000))
Planning time: 0.305 ms
Execution time: 2075.371 ms
(11 rows)
I'm on
psql (9.5.3, server 9.5.4)
Update 1:
I ran the query with the RLS predicate repeated explicitly in the WHERE clause, for example:
EXPLAIN ANALYZE
SELECT *
FROM foo.bar f1
JOIN foo.baz f2 ON f1.id = f2.id
WHERE f1.id BETWEEN 500 AND 12000
AND f1.licflag < 3;
and this led to the better plan. But when I then removed the extra predicate, the planner KEPT the better plan. This makes me think that something is off with the statistics. Does anyone know how to manually trigger a statistics update without resetting the statistics for the whole database? Going through the Postgres docs right now ...
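For reference, a per-table refresh would presumably look like the sketch below. It assumes the setup script above has been run and that the session first switches back to the role that owns the tables, since ANALYZE skips tables the current role does not own. The pg_stats query is only there to check what statistics exist for licflag; I have not confirmed that refreshing them changes the plan.

```sql
-- Switch back to the session's original role; "restricted" does not own
-- the tables, so ANALYZE under that role would skip them with a warning.
RESET ROLE;

-- Recompute planner statistics for the two test tables only,
-- leaving statistics for the rest of the database untouched.
ANALYZE VERBOSE foo.bar;
ANALYZE VERBOSE foo.baz;

-- Inspect the statistics recorded for the filtered column.
SELECT tablename, attname, null_frac, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'foo'
  AND attname = 'licflag';
```

After that, SET ROLE restricted again and re-run the EXPLAIN ANALYZE to compare.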