PostgreSQL && operator is slower than expected

Sample data

I have two tables, each with a column containing an array of integers, and I need to join them.

    CREATE TABLE table_array_1 (key1 int, values1 int[]);
    CREATE TABLE table_array_2 (key2 int, values2 int[]);

One table has few rows but large arrays:

    DO $function$
    DECLARE
        i int := 0;
    BEGIN
        WHILE i < 100 LOOP
            INSERT INTO table_array_1 (key1, values1)
            VALUES (
                random(1, i),
                (SELECT array_agg(random(3000000, 4000000))
                 FROM (SELECT generate_series(1, random(1, 2000))) lp)
            );
            i := i + 1;
        END LOOP;
    END;
    $function$;

The second table has more rows but small arrays:

    DO $function$
    DECLARE
        i int := 0;
    BEGIN
        WHILE i < 1000 LOOP
            INSERT INTO table_array_2 (key2, values2)
            VALUES (
                random(1, i),
                (SELECT array_agg(random(3000000, 4000000))
                 FROM (SELECT generate_series(1, random(1, 50))) lp)
            );
            i := i + 1;
        END LOOP;
    END;
    $function$;

The random(int, int) helper used above has to be created before running these blocks; it returns a random integer in the given range:

    CREATE OR REPLACE FUNCTION random(int, int) RETURNS int
    LANGUAGE sql AS $function$
        SELECT ($1 + ($2 - $1) * random())::int;
    $function$;

Test

First, I tried to join them as follows:

    SELECT t1.key1, t2.key2
    FROM table_array_1 t1
    JOIN table_array_2 t2 ON t2.values2 && t1.values1;

But it is much slower (about 100x) than this query:

    SELECT DISTINCT t1.key1, t2.key2
    FROM (SELECT key1, unnest(values1) AS values1 FROM table_array_1) t1
    JOIN (SELECT key2, unnest(values2) AS values2 FROM table_array_2) t2
      ON t2.values2 = t1.values1;
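Strictly speaking, the two queries are not identical: key1 and key2 are not unique, so the && join can return the same (key1, key2) pair several times, while the DISTINCT version returns each pair once. A minimal sketch for checking that both at least find the same set of pairs, assuming the tables above (the CTE names overlap_pairs and unnest_pairs are just illustrative):

```sql
-- Sketch: do both formulations find the same set of (key1, key2) pairs?
-- The && join may repeat a pair (key1 and key2 are not unique), so each
-- side is reduced with DISTINCT and the two sets are compared with EXCEPT.
WITH overlap_pairs AS (
    SELECT DISTINCT t1.key1, t2.key2
    FROM table_array_1 t1
    JOIN table_array_2 t2 ON t2.values2 && t1.values1
), unnest_pairs AS (
    SELECT DISTINCT t1.key1, t2.key2
    FROM (SELECT key1, unnest(values1) AS v FROM table_array_1) t1
    JOIN (SELECT key2, unnest(values2) AS v FROM table_array_2) t2
      ON t2.v = t1.v
)
SELECT
    -- pairs found by && but not by the unnest join
    (SELECT count(*) FROM (SELECT * FROM overlap_pairs
                           EXCEPT
                           SELECT * FROM unnest_pairs) a) AS only_overlap,
    -- pairs found by the unnest join but not by &&
    (SELECT count(*) FROM (SELECT * FROM unnest_pairs
                           EXCEPT
                           SELECT * FROM overlap_pairs) b) AS only_unnest;
```

Both counts should come out as 0. Note that this also runs the slow && join once, so it is at least as slow as the first query.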

There are no indexes on these tables: the cost of using GIN is too high for it to be useful, and GiST, which requires the intarray extension, does not improve things either.

Question

Why is unnest + distinct much faster?

Can I improve the performance of the array overlap operator, or use something else that does not involve unnest + DISTINCT? I am looking for a faster query, and common sense tells me that these two extra operations should make things slower, not faster.

1 answer

I believe the performance difference does not really come from JOIN vs. UNNEST, but rather from the condition you are checking.

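The && array operator is fairly expensive, because it has to check whether any element of one array is equal to any element of the other (more so when the arrays may contain NULL values). A plain = between two int values, on the other hand, is a single cheap comparison and therefore much faster. It can also be hashed, so the planner is able to run the unnest version as a hash join instead of testing every pair of rows, which is not possible with &&.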
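A rough back-of-the-envelope estimate illustrates the gap. It assumes, from the data generator in the question, average array lengths of about 1000 (table_array_1) and 25 (table_array_2) and values spread over roughly 10^6 integers, and it assumes that the built-in && compares each element of one array with each element of the other until it finds a match (my understanding of the core array_contain_compare routine, not something verified here). Since most row pairs never overlap, most of them pay the full cost:

```latex
% Back-of-the-envelope estimate under the assumptions stated above:
% mean array lengths ~1000 (table_array_1) and ~25 (table_array_2),
% values spread over ~10^6 integers, and an element-by-element && check.
\begin{aligned}
\text{row pairs tested with } \&\& &= 100 \times 1000 = 10^{5} \\
P(\text{a given pair overlaps}) &\approx \frac{1000 \times 25}{10^{6}} = 0.025 \\
\text{comparisons per non-overlapping pair} &\approx 1000 \times 25 = 2.5 \times 10^{4} \\
\text{comparisons for the } \&\& \text{ join} &\approx 10^{5} \times 2.5 \times 10^{4} = 2.5 \times 10^{9} \\
\text{rows after unnest} &\approx 100 \times 1000 + 1000 \times 25 = 1.25 \times 10^{5} \\
\text{equal element pairs fed to DISTINCT} &\approx \frac{10^{5} \times 2.5 \times 10^{4}}{10^{6}} = 2.5 \times 10^{3}
\end{aligned}
```

So, under these assumptions, the && join performs on the order of a billion element comparisons, while the unnest version hashes roughly a hundred thousand rows and DISTINCT only has to deduplicate a few thousand matches. That is why adding unnest + DISTINCT ends up cheaper rather than more expensive.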

PS: Depending on your version of Postgres, it may also be that the planner does not choose an optimal query plan (there is a pretty striking example of this for 9.0).
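To see which plan the planner actually picks, a sketch: run both queries under EXPLAIN (ANALYZE, BUFFERS) and compare the join nodes. Because && is neither hashable nor mergeable, without an index the first query can only run as a Nested Loop that evaluates && for each of the 100 × 1000 row pairs, while the equality in the second query allows a Hash Join. The exact plans and timings will depend on your version and settings.

```sql
-- Sketch: compare the plans chosen for the two formulations.
-- EXPLAIN ANALYZE really executes each query, so the first one will be slow.

-- Expected: a Nested Loop whose Join Filter applies && to every row pair,
-- since && cannot drive a Hash Join or Merge Join.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t1.key1, t2.key2
FROM table_array_1 t1
JOIN table_array_2 t2 ON t2.values2 && t1.values1;

-- Here the join condition is a plain int = int, which is hashable,
-- so the planner can pick a Hash Join on the unnested values.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT t1.key1, t2.key2
FROM (SELECT key1, unnest(values1) AS values1 FROM table_array_1) t1
JOIN (SELECT key2, unnest(values2) AS values2 FROM table_array_2) t2
  ON t2.values2 = t1.values1;
```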


Source: https://habr.com/ru/post/1269098/
