Why does a Postgres SQL function scan partitions that it should not?

Question


I came across a very strange problem with my SQL functions. They seem to get different execution plans depending on whether they are written in language SQL or language plpgsql, but I can't tell which plan the SQL version uses: the function body is rejected with "Function final statement must be SELECT or INSERT/UPDATE/DELETE RETURNING", so I cannot put EXPLAIN inside it.

As for how I know that the plans differ: the SQL version fails to run, complaining that it cannot connect to one of the foreign servers, which is currently down. The connection is made through foreign tables. The table is partitioned by date (the date_col column), with some of its partitions physically located on the local server and some on a foreign server. The date parameter passed to the function ensures that only one partition needs to be scanned, and that partition is on the local server. This is also visible in the EXPLAIN output below, taken from plain SQL (not inside a function):

    Append  (cost=2.77..39.52 rows=2 width=36)
      CTE ct
        ->  Result  (cost=0.00..0.51 rows=100 width=4)
      InitPlan 2 (returns $1)
        ->  Aggregate  (cost=2.25..2.26 rows=1 width=32)
              ->  CTE Scan on ct  (cost=0.00..2.00 rows=100 width=4)
      ->  Seq Scan on table1  (cost=0.00..0.00 rows=1 width=36)
            Filter: ((date_col = '2017-07-30'::date) AND (some_col = ANY ($1)))
      ->  Seq Scan on "part$_table1_201707"  (cost=0.00..36.75 rows=1 width=36)
            Filter: ((date_col = '2017-07-30'::date) AND (some_col = ANY ($1)))

The foreign partitions hold data from before 2017, and the plan shows that the planner chooses the correct partition and does not bother scanning any others. This is true for plain SQL and for the plpgsql function, but not for the SQL function. Why is that, and can I avoid it without rewriting my functions?

From what I understand, there must be some difference in how parameters are passed to the SQL function, since hard-coding the date in it prevents the query from scanning unnecessary partitions. Perhaps something like this is happening:

    WITH ct AS (SELECT unnest(array[1,2]) AS arr)
    SELECT col1, col2
    FROM table1
    WHERE date_col = (SELECT '2017-07-30'::date)
      AND some_col = ANY((SELECT array_agg(arr) FROM ct)::int[])

This produces the following EXPLAIN output:

    Append  (cost=2.78..183.67 rows=3 width=36)
      CTE ct
        ->  Result  (cost=0.00..0.51 rows=100 width=4)
      InitPlan 2 (returns $1)
        ->  Result  (cost=0.00..0.01 rows=1 width=4)
      InitPlan 3 (returns $2)
        ->  Aggregate  (cost=2.25..2.26 rows=1 width=32)
              ->  CTE Scan on ct  (cost=0.00..2.00 rows=100 width=4)
      ->  Seq Scan on table1  (cost=0.00..0.00 rows=1 width=36)
            Filter: ((date_col = $1) AND (some_col = ANY ($2)))
      ->  Seq Scan on "part$_table1_201707"  (cost=0.00..36.75 rows=1 width=36)
            Filter: ((date_col = $1) AND (some_col = ANY ($2)))
      ->  Foreign Scan on "part$_table1_201603"  (cost=100.00..144.14 rows=1 width=36)
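Another way to look at a parameterized plan, without wrapping the constants in subselects, might be a prepared statement. The following is only a sketch under stated assumptions: it relies on the tables from the reproduction script, the statement name q_table1 is made up for illustration, and the plan_cache_mode setting exists only in PostgreSQL 12 and later, so it is not available on the 9.6.4 server mentioned here. Plain EXPLAIN (without ANALYZE) only plans the statement, so it should not need to reach the broken server.

```sql
-- Assumes table1 and its children from the reproduction script exist.
-- q_table1 is a hypothetical name chosen for this example.
PREPARE q_table1(date, int[]) AS
WITH ct AS (SELECT unnest($2) AS arr)
SELECT t.col1, t.col2
FROM table1 AS t
WHERE date_col = $1
  AND some_col = ANY((SELECT array_agg(arr) FROM ct)::int[]);

-- Custom plan: parameter values are substituted before planning, so the
-- date is visible to the planner as a constant.
SET plan_cache_mode = force_custom_plan;    -- PostgreSQL 12+ only
EXPLAIN EXECUTE q_table1('2017-07-30'::date, array[1,2]);

-- Generic plan: planned with $1 and $2 unknown, so the CHECK constraints
-- cannot be compared against the date at plan time.
SET plan_cache_mode = force_generic_plan;   -- PostgreSQL 12+ only
EXPLAIN EXECUTE q_table1('2017-07-30'::date, array[1,2]);

RESET plan_cache_mode;
DEALLOCATE q_table1;
```

If the parameter-passing guess is right, the first EXPLAIN would be expected to resemble the plan with the literal date, and the second the plan with $1 in the filters and the Foreign Scan included.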

For reference, you can reproduce the problem on PostgreSQL 9.6.4 using the following code:

    CREATE SERVER broken_server FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'broken_server', dbname 'postgres', port '5432');

    CREATE USER MAPPING FOR postgres SERVER broken_server
        OPTIONS (user 'foreign_username', password 'foreign_password');

    CREATE TABLE table1 (id serial PRIMARY KEY, date_col date, some_col int, col1 int, col2 text);

    CREATE TABLE part$_table1_201707 () INHERITS (table1);
    ALTER TABLE part$_table1_201707 ADD CONSTRAINT part$_table1_201707_date_chk
        CHECK (date_col BETWEEN '2017-07-01'::date AND '2017-07-31'::date);

    CREATE FOREIGN TABLE part$_table1_201603 () INHERITS (table1)
        SERVER broken_server OPTIONS (schema_name 'public', table_name 'part$_table1_201603');
    ALTER TABLE part$_table1_201603 ADD CONSTRAINT part$_table1_201603_date_chk
        CHECK (date_col BETWEEN '2016-03-01'::date AND '2016-03-31'::date);

    CREATE OR REPLACE FUNCTION function_plpgsql(param1 date, param2 int[])
      RETURNS TABLE(col1 int, col2 text)
      LANGUAGE plpgsql
      SECURITY DEFINER
    AS $function$
    BEGIN
    --
        RETURN QUERY
        WITH ct AS (SELECT unnest(param2) AS arr)
        SELECT t.col1, t.col2
        FROM table1 AS t
        WHERE date_col = param1
          AND some_col = ANY((SELECT array_agg(arr) FROM ct)::int[]);
    --reasons
    --
    END;
    $function$;

    CREATE OR REPLACE FUNCTION function_sql(param1 date, param2 int[])
      RETURNS TABLE(col1 int, col2 text)
      LANGUAGE SQL
      SECURITY DEFINER
    AS $function$
    --
        WITH ct AS (SELECT unnest(param2) AS arr)
        SELECT t.col1, t.col2
        FROM table1 AS t
        WHERE date_col = param1
          AND some_col = ANY((SELECT array_agg(arr) FROM ct)::int[])
    --
    $function$;

    CREATE OR REPLACE FUNCTION function_sql_hardcoded(param1 date, param2 int[])
      RETURNS TABLE(col1 int, col2 text)
      LANGUAGE SQL
      SECURITY DEFINER
    AS $function$
    --
        WITH ct AS (SELECT unnest(param2) AS arr)
        SELECT t.col1, t.col2
        FROM table1 AS t
        WHERE date_col = '2017-07-30'::date
          AND some_col = ANY((SELECT array_agg(arr) FROM ct)::int[])
    --
    $function$;

    EXPLAIN ANALYZE SELECT * FROM function_sql('2017-07-30'::date, array[1,2]);
    -- ERROR: could not connect to server "broken_server"
    EXPLAIN ANALYZE SELECT * FROM function_plpgsql('2017-07-30'::date, array[1,2]);
    -- works
    EXPLAIN ANALYZE SELECT * FROM function_sql_hardcoded('2017-07-30'::date, array[1,2]);
    -- works, but useless


postgresql partitioning

Łukasz Kamiński 21 sept '17 at 9:17


1 answer

Vao tsun · Answer 1 · 2017-09-21T10:08:37+0000

https://www.postgresql.org/docs/current/static/ddl-partitioning.html

Constraint exclusion only works when the query's WHERE clause contains constants (or externally supplied parameters). For example, a comparison against a non-immutable function such as CURRENT_TIMESTAMP cannot be optimized, since the planner cannot know which partition the function value might fall into at run time.

That explains the scan of unnecessary partitions. I assume plpgsql processes the query before handing it to the optimizer, and the SQL function with a constant should work. The SQL function behaves like a prepared statement, I think, but comparing a column value with a function parameter is probably not a suitable case for constraint exclusion :)
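The documentation passage quoted above can be checked directly against the tables from the question. The sketch below assumes the reproduction script has been run and that constraint_exclusion is at its default value of partition. It uses plain EXPLAIN (no ANALYZE), which only plans the query and so should not need to reach broken_server.

```sql
-- Assumes table1 and its child tables from the question's script exist.
SHOW constraint_exclusion;   -- 'partition' unless it has been changed

-- Constant in WHERE: the planner can compare it with the CHECK constraints
-- at plan time, so only table1 and part$_table1_201707 should be listed.
EXPLAIN
SELECT col1, col2 FROM table1 WHERE date_col = '2017-07-30'::date;

-- current_date is STABLE, not IMMUTABLE: its value is not known at plan
-- time, so every child table, including the foreign part$_table1_201603,
-- should appear in the plan.
EXPLAIN
SELECT col1, col2 FROM table1 WHERE date_col = current_date;
```

The second query is only a stand-in for "a value the planner cannot see at plan time". Whether a function parameter falls into that category depends on how the function's query is planned, which is the open point in the answer above.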
