SQL function is very slow compared to query without function shell

Question

SQL function is very slow compared to query without function shell

I have this PostgreSQL 9.4 query that works very fast (~ 12 ms):

SELECT auth_web_events.id, auth_web_events.time_stamp, auth_web_events.description, auth_web_events.origin, auth_user.email, customers.name, auth_web_events.client_ip FROM public.auth_web_events, public.auth_user, public.customers WHERE auth_web_events.user_id_fk = auth_user.id AND auth_user.customer_id_fk = customers.id AND auth_web_events.user_id_fk = 2 ORDER BY auth_web_events.id DESC;

But if I embed it in a function, the query is executed very slowly through all the data, it seems that it is executed through each record, what am I missing ?, I have ~ 1M of data, and I want to simplify saving the database level to large queries to functions and submissions.

 CREATE OR REPLACE FUNCTION get_web_events_by_userid(int) RETURNS TABLE( id int, time_stamp timestamp with time zone, description text, origin text, userlogin text, customer text, client_ip inet ) AS $func$ SELECT auth_web_events.id, auth_web_events.time_stamp, auth_web_events.description, auth_web_events.origin, auth_user.email AS user, customers.name AS customer, auth_web_events.client_ip FROM public.auth_web_events, public.auth_user, public.customers WHERE auth_web_events.user_id_fk = auth_user.id AND auth_user.customer_id_fk = customers.id AND auth_web_events.user_id_fk = $1 ORDER BY auth_web_events.id DESC; $func$ LANGUAGE SQL;

Request Plan:

 "Sort (cost=20.94..20.94 rows=1 width=791) (actual time=61.905..61.906 rows=2 loops=1)" " Sort Key: auth_web_events.id" " Sort Method: quicksort Memory: 25kB" " -> Nested Loop (cost=0.85..20.93 rows=1 width=791) (actual time=61.884..61.893 rows=2 loops=1)" " -> Nested Loop (cost=0.71..12.75 rows=1 width=577) (actual time=61.874..61.879 rows=2 loops=1)" " -> Index Scan using auth_web_events_fk1 on auth_web_events (cost=0.57..4.58 rows=1 width=61) (actual time=61.860..61.860 rows=2 loops=1)" " Index Cond: (user_id_fk = 2)" " -> Index Scan using auth_user_pkey on auth_user (cost=0.14..8.16 rows=1 width=524) (actual time=0.005..0.005 rows=1 loops=2)" " Index Cond: (id = 2)" " -> Index Scan using customers_id_idx on customers (cost=0.14..8.16 rows=1 width=222) (actual time=0.004..0.005 rows=1 loops=2)" " Index Cond: (id = auth_user.customer_id_fk)" "Planning time: 0.369 ms" "Execution time: 61.965 ms"

I call funcion this way:

 SELECT * from get_web_events_by_userid(2)

Request plan for function:

 "Function Scan on get_web_events_by_userid (cost=0.25..10.25 rows=1000 width=172) (actual time=279107.142..279107.144 rows=2 loops=1)" "Planning time: 0.038 ms" "Execution time: 279107.175 ms"

EDIT: I just change the parameters and the problem persists.
EDIT2: Erwin Response Plan Request:

 "Sort (cost=20.94..20.94 rows=1 width=791) (actual time=0.048..0.049 rows=2 loops=1)" " Sort Key: w.id" " Sort Method: quicksort Memory: 25kB" " -> Nested Loop (cost=0.85..20.93 rows=1 width=791) (actual time=0.030..0.037 rows=2 loops=1)" " -> Nested Loop (cost=0.71..12.75 rows=1 width=577) (actual time=0.023..0.025 rows=2 loops=1)" " -> Index Scan using auth_user_pkey on auth_user u (cost=0.14..8.16 rows=1 width=524) (actual time=0.011..0.012 rows=1 loops=1)" " Index Cond: (id = 2)" " -> Index Scan using auth_web_events_fk1 on auth_web_events w (cost=0.57..4.58 rows=1 width=61) (actual time=0.008..0.008 rows=2 loops=1)" " Index Cond: (user_id_fk = 2)" " -> Index Scan using customers_id_idx on customers c (cost=0.14..8.16 rows=1 width=222) (actual time=0.003..0.004 rows=1 loops=2)" " Index Cond: (id = u.customer_id_fk)" "Planning time: 0.541 ms" "Execution time: 0.101 ms"

+6

function postgresql postgresql-performance sql-execution-plan

Mmeyer Jan 30 '15 at 0:21

source share

2 answers

You will get better performance by making this query dynamic and using plpgsql.

 CREATE OR REPLACE FUNCTION get_web_events_by_userid(uid int) RETURNS TABLE( id int, time_stamp timestamp with time zone, description text, origin text, userlogin text, customer text, client_ip inet ) AS $$ BEGIN RETURN QUERY EXECUTE 'SELECT auth_web_events.id, auth_web_events.time_stamp, auth_web_events.description, auth_web_events.origin, auth_user.email AS user, customers.name AS customer, auth_web_events.client_ip FROM public.auth_web_events, public.auth_user, public.customers WHERE auth_web_events.user_id_fk = auth_user.id AND auth_user.customer_id_fk = customers.id AND auth_web_events.user_id_fk = ' || uid || 'ORDER BY auth_web_events.id DESC;' END; $$ LANGUAGE plpgsql;

0

pwnyexpress Jan 30 '15 at 14:36

source share

Erwin brandstetter · Accepted Answer · 2015-01-30T17:41:21+0000

user hit>

When rewriting your function, I realized that you added column aliases:

 SELECT ... auth_user.email AS user , customers.name AS customer,

.. which did nothing , since these aliases are invisible outside the function and are not mentioned inside the function. Therefore, they will be ignored. For documentation purposes, better use a comment.

But it also makes your query invalid because user is a fully reserved word and cannot be used as a column alias unless double quoting is specified.

Oddly enough, in my tests, the function seems to work with the wrong alias. Probably because it is ignored (?). But I'm not sure that this could not have any side effects.

Your function is overwritten (otherwise equivalent):

 CREATE OR REPLACE FUNCTION get_web_events_by_userid(int) RETURNS TABLE( id int , time_stamp timestamptz , description text , origin text , userlogin text , customer text , client_ip inet ) AS $func$ SELECT w.id , w.time_stamp , w.description , w.origin , u.email -- AS user -- make this a comment! , c.name -- AS customer , w.client_ip FROM public.auth_user u JOIN public.auth_web_events w ON w.user_id_fk = u.id JOIN public.customers c ON c.id = u.customer_id_fk WHERE u.id = $1 -- reverted the logic here ORDER BY w.id DESC $func$ LANGUAGE sql STABLE;

Obviously, the STABLE keyword has changed the result. Function volatility should not be a problem in the described test situation. Typically, a setting does not provide a single, isolated function call. Read the details in the manual. In addition, the standard EXPLAIN does not display query plans for functions inside , you can use the optional auto-explain module for this:

Postgres UDF call request plan written in pgpsql

But , as we developed in the next question, changing the function volatility to STABLE allows you to embed a simple SELECT into an external operator, thereby effectively removing the function from the image. This explains why we see the query plan after adding STABLE , and also explains why the query plan is different.

Why does the scheduler come up with different results for functions with different volatility?

You have a very odd data distribution :

The auth_web_events table contains 100,000,000 entries, auth_user-> 2 entries, clients-> 1 entry

Since you did not specify otherwise, the function assumes an estimate of 1000 lines . But your function actually only returns 2 rows . If all your calls return (in close proximity) to two lines, just declare it with the added ROWS 2 . It is also possible to change the query plan for the VOLATILE variant (even if STABLE is the right choice anyway here).

SQL function is very slow compared to query without function shell

More articles: