Why does PostgreSQL handle my query differently in a function?

I have a very simple query, which is not much more complicated than:

select * from table_name where id = 1234 

... takes less than 50 milliseconds.

I took this query and wrapped it in a function:

    CREATE OR REPLACE FUNCTION pie(id_param integer)
      RETURNS SETOF record AS
    $BODY$
    BEGIN
        RETURN QUERY
        SELECT * FROM table_name where id = id_param;
    END
    $BODY$ LANGUAGE plpgsql STABLE;

Executing this function with select * from pie(123); takes 22 seconds.

If I hardcode the integer instead of using id_param, the function runs in less than 50 milliseconds.

Why does the fact that I use the parameter in the where clause make my function run slower?


Edit to add a specific example:

    CREATE TYPE test_type AS (gid integer, geocode character varying(9));

    CREATE OR REPLACE FUNCTION geocode_route_by_geocode(geocode_param character)
      RETURNS SETOF test_type AS
    $BODY$
    BEGIN
        RETURN QUERY EXECUTE
            'SELECT gs.geo_shape_id AS gid, gs.geocode
             FROM geo_shapes gs
             WHERE geocode = $1 AND geo_type = 1
             GROUP BY geography, gid, geocode'
        USING geocode_param;
    END;
    $BODY$ LANGUAGE plpgsql STABLE;

    ALTER FUNCTION geocode_carrier_route_by_geocode(character) OWNER TO root;

    -- Runs in 20 seconds
    select * from geocode_route_by_geocode('999xyz');

    -- Runs in 10 milliseconds
    SELECT gs.geo_shape_id AS gid, gs.geocode
    FROM geo_shapes gs
    WHERE geocode = '9999xyz' AND geo_type = 1
    GROUP BY geography, gid, geocode;
1 answer

PostgreSQL 9.2 update

This release brought a significant improvement. Quoting the release notes:

Allow the planner to generate custom plans for specific parameter values even when using prepared statements (Tom Lane)

In the past, a prepared statement always had a single "generic" plan that was used for all parameter values, which was frequently much inferior to the plans used for non-prepared statements containing explicit constant values. Now, the planner attempts to generate custom plans for specific parameter values. A generic plan will only be used after custom plans have repeatedly proven to provide no benefit. This change should eliminate the performance penalties formerly seen from use of prepared statements (including non-dynamic statements in PL/pgSQL).
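To see which kind of plan a prepared statement gets on 9.2 or later, one rough check is to prepare the query from the question by hand and look at its plan. This is only a sketch: pie_query is a made-up statement name, and table_name and id are the placeholders from the question.

```sql
-- Hypothetical statement name; table and column are the question's placeholders.
PREPARE pie_query(integer) AS
    SELECT * FROM table_name WHERE id = $1;

-- Shows the plan chosen for this particular execution.
EXPLAIN EXECUTE pie_query(1234);

-- Drop the prepared statement when done.
DEALLOCATE pie_query;
```

If the condition in the EXPLAIN output contains the literal value (id = 1234), the planner built a custom plan for that value. If it contains the placeholder ($1), the generic plan was used.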


Original answer for PostgreSQL 9.1 or earlier

PL/pgSQL functions have a similar effect to the PREPARE statement: queries are parsed and the query plan is cached.

The advantage is that some overhead is saved for each call.
The disadvantage is that the query plan is not optimized for the specific parameter values the function is called with.

For queries on tables with an even data distribution, this is usually not a problem, and PL/pgSQL functions will run slightly faster than raw SQL queries or SQL functions. But if your query can use certain indexes depending on the actual values in the WHERE clause or, more generally, would benefit from a plan chosen for the specific values, you may not get the optimal query plan. Try an SQL function, or use dynamic SQL with EXECUTE to force the query to be re-planned on every call. It might look like this:

    CREATE OR REPLACE FUNCTION pie(id_param integer)
      RETURNS SETOF record AS
    $BODY$
    BEGIN
        RETURN QUERY EXECUTE
            'SELECT * FROM table_name where id = $1'
        USING id_param;
    END
    $BODY$ LANGUAGE plpgsql STABLE;
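The other option, a plain SQL function, could be sketched roughly as follows. The name pie_sql is made up for illustration, and the sketch assumes the function may return the row type of table_name. The parameter is referenced as $1 because referring to SQL-function parameters by name is only supported from 9.2 on.

```sql
-- Hypothetical SQL-language variant of pie(); not taken from the question.
CREATE OR REPLACE FUNCTION pie_sql(id_param integer)
  RETURNS SETOF table_name AS
$BODY$
    SELECT * FROM table_name WHERE id = $1;
$BODY$ LANGUAGE sql STABLE;

-- Called the same way as the PL/pgSQL version.
SELECT * FROM pie_sql(1234);
```

Unlike PL/pgSQL, an SQL function does not keep a cached plan across calls, so it avoids reusing a generic plan at the cost of planning the query more often.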

Edit after comment:

If this variant does not change the runtime, there must be other factors in play that you have missed or not mentioned. A different database? Different parameter values? You will need to post more details.

Here is a quote from the manual to back up my statements above:

An EXECUTE with a simple constant command string and some USING parameters, as in the first example above, is functionally equivalent to just writing the command directly in PL/pgSQL and allowing replacement of PL/pgSQL variables to happen automatically. The important difference is that EXECUTE will re-plan the command on each execution, generating a plan that is specific to the current parameter values; whereas PL/pgSQL normally creates a generic plan and caches it for re-use. In situations where the best plan depends strongly on the parameter values, EXECUTE can be significantly faster; while when the plan is not sensitive to parameter values, re-planning will be a waste.
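To check which plan a query inside a function actually gets, one possible approach is the auto_explain contrib module, which writes the plans of executed statements to the server log. This is a sketch that assumes the module is installed and that the session is allowed to load it (normally a superuser session). The function call is the one from the question.

```sql
-- Load the module for the current session only.
LOAD 'auto_explain';

-- Log the plan of every statement, however fast it is.
SET auto_explain.log_min_duration = 0;

-- Also log statements that run inside functions.
SET auto_explain.log_nested_statements = on;

-- The plan of the query inside the function now appears in the server log.
SELECT * FROM geocode_route_by_geocode('999xyz');
```

Comparing the logged plan with the EXPLAIN output of the standalone query should show whether the two really run with different plans.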


Source: https://habr.com/ru/post/981741/

