Postgres Materialize node causes poor performance in a DELETE query

I have a DELETE query that I need to run on PostgreSQL 9.0.4. It performs well until the subquery returns 524,289 rows.

For example, at 524,288 rows there is no Materialize node in the plan, and the cost looks pretty good:

 explain DELETE FROM table1 WHERE pointLevel = 0 AND userID NOT IN
 (SELECT userID FROM table2 fetch first 524288 rows only);

                              QUERY PLAN
 ---------------------------------------------------------------
 Delete  (cost=13549.49..17840.67 rows=21 width=6)
   ->  Index Scan using jslps_userid_nopt on table1  (cost=13549.49..17840.67 rows=21 width=6)
         Filter: ((NOT (hashed SubPlan 1)) AND (pointlevel = 0))
         SubPlan 1
           ->  Limit  (cost=0.00..12238.77 rows=524288 width=8)
                 ->  Seq Scan on table2  (cost=0.00..17677.92 rows=757292 width=8)
 (6 rows)

However, as soon as I hit 524,289 rows, a Materialize node appears in the plan and the DELETE query becomes much more expensive:

 explain DELETE FROM table1 WHERE pointLevel = 0 AND userID NOT IN
 (SELECT userID FROM table2 fetch first 524289 rows only);

                              QUERY PLAN
 ---------------------------------------------------------------
 Delete  (cost=0.00..386910.33 rows=21 width=6)
   ->  Index Scan using jslps_userid_nopt on table1  (cost=0.00..386910.33 rows=21 width=6)
         Filter: ((pointlevel = 0) AND (NOT (SubPlan 1)))
         SubPlan 1
           ->  Materialize  (cost=0.00..16909.24 rows=524289 width=8)
                 ->  Limit  (cost=0.00..12238.79 rows=524289 width=8)
                       ->  Seq Scan on table2  (cost=0.00..17677.92 rows=757292 width=8)
 (7 rows)

I worked around the problem by using a JOIN instead of the subselect:

SELECT s.userid FROM table1 s LEFT JOIN table2 p ON s.userid=p.userid WHERE p.userid IS NULL AND s.pointlevel=0 

However, I am still interested in understanding why materialization degrades performance so significantly.

1 answer

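My guess is that at rows=524289 the memory buffer is exhausted, so the subquery can no longer be held in an in-memory hash table and has to be materialized instead, possibly on disk. Hence the dramatic increase in the time required. The two plans are consistent with this: the filter changes from "hashed SubPlan 1" to a plain "SubPlan 1" on top of a Materialize node, so each row of table1 is checked against the materialized subquery output rather than against a hash table.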
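
As a rough sanity check of that guess, the arithmetic below compares the estimated size of the subquery result with a memory limit. It rests on two assumptions that are not stated in the question: that the planner budgets about 24 bytes of per-row overhead on top of the reported width of 8, and that work_mem on this server is 16MB. Under those assumptions, 524,288 rows fit exactly and 524,289 rows do not.

```sql
-- Assumption: ~24 bytes of per-row overhead plus width = 8 from the plan.
-- Assumption: work_mem is 16MB (not stated in the question).
SELECT 524288 * (8 + 24) AS est_bytes_at_524288,    -- 16777216
       524289 * (8 + 24) AS est_bytes_at_524289,    -- 16777248
       16 * 1024 * 1024  AS work_mem_bytes_if_16mb; -- 16777216
```

If SHOW work_mem reports a different value on your server, the threshold would be expected to move accordingly.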

Here you can learn more about tuning memory buffers: http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html
If you experiment with work_mem, you will see the difference in the query's behavior.
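
A minimal sketch of such an experiment, using the query from the question. The value 32MB is an arbitrary assumption chosen only to be larger than the current setting; pick whatever suits your server. EXPLAIN without ANALYZE only plans the statement, so no rows are deleted, and SET LOCAL reverts when the transaction ends.

```sql
SHOW work_mem;                    -- check the current setting first

BEGIN;
SET LOCAL work_mem = '32MB';      -- assumed value, applies to this transaction only

-- Plan only: the DELETE is not executed.
EXPLAIN DELETE FROM table1
WHERE pointLevel = 0
  AND userID NOT IN (SELECT userID FROM table2 FETCH FIRST 524289 ROWS ONLY);

ROLLBACK;
```

If the memory guess is right, the filter line should read "hashed SubPlan 1" again and the Materialize node should disappear.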

However, using a join instead of the subquery is a much better way to speed up the query, since you are limiting the number of rows at the source itself, rather than just picking the first XYZ rows and then performing the checks.
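
The join from the question is written as a SELECT. One way to express the same anti-join as a DELETE is NOT EXISTS, sketched below with the table and column names from the question. Note that this is not an exact equivalent of NOT IN: if table2.userid can contain NULLs, NOT IN matches no rows at all, whereas NOT EXISTS ignores the NULLs. This version also drops the FETCH FIRST limit, as the join in the question does.

```sql
-- Preview the plan first; EXPLAIN alone does not delete anything.
EXPLAIN
DELETE FROM table1 s
WHERE s.pointlevel = 0
  AND NOT EXISTS (SELECT 1 FROM table2 p WHERE p.userid = s.userid);
```

Once the plan looks reasonable, run the same statement without EXPLAIN to perform the delete.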


Source: https://habr.com/ru/post/977007/