Why does the planner give different results for functions with different volatilities?

Question

This question arises as a follow-up to "SQL function very slow compared to query without function wrapper". I should note that I do not consider this a duplicate, since that question asked for a solution to a specific problem. Here I am asking for more information about the behavior in general and demonstrating how it can be reproduced. (To see the difference, look at the fairly long comment thread on the accepted answer there, where we discussed the behavior; I felt it was getting off topic, especially given its length.)

I have a function. Here's a sample demonstrating the behavior of interest:

    CREATE OR REPLACE FUNCTION test(INT)
    RETURNS TABLE(num INT, letter TEXT)
    VOLATILE
    LANGUAGE SQL
    AS $$
        SELECT * FROM (VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e')) x
        LIMIT $1
    $$;

When I run this EXPLAIN:

    EXPLAIN ANALYZE SELECT * FROM test(10);

I get this result in psql (with the giant "Query Plan" header trimmed):

     Function Scan on test  (cost=0.25..10.25 rows=1000 width=36) (actual time=0.125..0.136 rows=5 loops=1)
     Total runtime: 0.179 ms
    (2 rows)

Take note of the row estimate. It estimates 1000 rows.

But if I change the function to STABLE or IMMUTABLE:

    CREATE OR REPLACE FUNCTION test(INT)
    RETURNS TABLE(num INT, letter TEXT)
    STABLE
    LANGUAGE SQL
    AS $$
        SELECT * FROM (VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e')) x
        LIMIT $1
    $$;

Then the same EXPLAIN gives me a different plan:

     Limit  (cost=0.00..0.06 rows=5 width=36) (actual time=0.010..0.050 rows=5 loops=1)
       ->  Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=5 width=36) (actual time=0.005..0.018 rows=5 loops=1)
     Total runtime: 0.087 ms
    (3 rows)

Now it correctly estimates 5 rows and shows a plan for the query contained within the function. The estimated cost is far lower (0.06 versus 10.25), and the runtime went down as well (0.087 ms versus 0.179 ms). (The query is so short that this might not be especially significant.)

In light of the related question, which involves a much larger amount of data and a very significant difference in performance, it seems that the planner really does do something different depending on whether the function is VOLATILE or STABLE/IMMUTABLE.

What exactly is the planner doing here, and where can I read documentation on it?

These tests were performed in PG 9.3.

function volatile postgresql sql-execution-plan

jpmc26 Jan 30 '15 at 21:34

1 answer

Daniel Vérité · Accepted Answer · 2015-01-30T22:16:40+0000

It estimates 1000 rows

1000 estimated rows is the default value documented in CREATE FUNCTION:

execution_cost
A positive number giving the estimated execution cost for the function, in units of cpu_operator_cost. If the function returns a set, this is the cost per returned row. If the cost is not specified, 1 unit is assumed for C-language and internal functions, and 100 units for functions in all other languages. Larger values cause the planner to try to avoid evaluating the function more often than necessary.
result_rows
A positive number giving the estimated number of rows that the planner should expect the function to return. This is only allowed when the function is declared to return a set. The default assumption is 1000 rows.

When a function is declared volatile, it asks not to be inlined, so this default value for result_rows applies.
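If the function has to stay VOLATILE, the estimate can also be set by hand. A minimal sketch, assuming the test(INT) function from the question still exists; the figure 5 is picked only because the VALUES list has five rows:

```sql
-- Declare the expected row count explicitly
-- (this is result_rows in the CREATE FUNCTION synopsis).
ALTER FUNCTION test(INT) ROWS 5;

-- The Function Scan node should now be planned with the declared
-- estimate instead of the default of 1000.
EXPLAIN SELECT * FROM test(10);
```

The same ROWS clause can be written directly in CREATE FUNCTION. It remains a fixed guess, so it does not adapt to the argument passed to LIMIT.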

On the other hand, when the function is inlined into a query, as in your second test, the number of rows is estimated as if the function body had been moved into the query and the function declaration did not exist. In the second test this leads to an exact estimate, since the VALUES clause can be evaluated directly.
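To see what this means in practice, the function body can be pasted into a subquery by hand. This is only an illustration of the idea, not the planner's literal rewritten query; the aliases x and t are arbitrary:

```sql
-- Hand-inlined counterpart of SELECT * FROM test(10) for the STABLE version:
-- the function call is replaced by the function body, with $1 set to 10.
EXPLAIN
SELECT *
FROM (
    SELECT *
    FROM (VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e')) x
    LIMIT 10
) t;
```

Because the planner sees the five-row VALUES list directly, it can estimate the row count without relying on result_rows.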

What exactly is the planner doing here, and where can I read documentation on it?

In general, the planner's optimization strategies are not explained in the main documentation. They are discussed on the mailing lists and mentioned in source code comments, which fortunately tend to be exceptionally clear and well written (compared to average source code). In the case of function inlining, I believe the comments on inline_set_returning_functions and inline_set_returning_function reveal most of the rules driving this particular optimization. (Warning: the links above point to the current master branch, which may change or drift at any time.)
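As a rough, non-authoritative aid while reading those comments, the function properties involved are stored in the pg_proc catalog. The sketch below only lists them for the test function from the question; which combinations actually permit inlining is an assumption to be checked against the source comments for your PostgreSQL version:

```sql
SELECT p.proname,
       l.lanname AS language,  -- language the function is written in
       p.provolatile,          -- 'v' = VOLATILE, 's' = STABLE, 'i' = IMMUTABLE
       p.proretset,            -- true for set-returning functions
       p.proisstrict,          -- STRICT flag
       p.prosecdef,            -- SECURITY DEFINER flag
       p.proconfig,            -- per-function SET options, NULL if none
       p.prorows,              -- result_rows estimate
       p.procost               -- execution_cost estimate
FROM pg_proc p
JOIN pg_language l ON l.oid = p.prolang
WHERE p.proname = 'test';
```

Comparing this output before and after changing VOLATILE to STABLE shows that only provolatile differs, which matches the observation that volatility alone flips the plan.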
