Sample data
I have two tables that I need to join on a column containing an array of integers.
CREATE TABLE table_array_1 (key1 int, values1 int[]);
CREATE TABLE table_array_2 (key2 int, values2 int[]);
The first table has a small number of rows, but large arrays:
DO $function$
DECLARE
    i int := 0;
BEGIN
    WHILE i < 100 LOOP
        INSERT INTO table_array_1 (key1, values1)
        VALUES (
            random(1, i),
            (SELECT array_agg(random(3000000, 4000000))
             FROM (SELECT generate_series(1, random(1, 2000))) lp)
        );
        i := i + 1;
    END LOOP;
END;
$function$;
The second table has more rows, but small arrays:
DO $function$
DECLARE
    i int := 0;
BEGIN
    WHILE i < 1000 LOOP
        INSERT INTO table_array_2 (key2, values2)
        VALUES (
            random(1, i),
            (SELECT array_agg(random(3000000, 4000000))
             FROM (SELECT generate_series(1, random(1, 50))) lp)
        );
        i := i + 1;
    END LOOP;
END;
$function$;
The random(int, int) helper used above returns a random integer in the given range (it has to exist before the DO blocks are run):
CREATE OR REPLACE FUNCTION random(int, int) RETURNS int
LANGUAGE sql
AS $function$
    SELECT ($1 + ($2 - $1) * random())::int;
$function$;
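As a quick sanity check of the helper, something like the following should show that the generated values stay inside the requested bounds. This is only a sketch: the sample size of 10000 and the aliases r and s are arbitrary choices of mine. On PostgreSQL 17 or later, where a built-in random(integer, integer) exists, an unqualified call may resolve to the built-in instead of this helper.

```sql
-- Draw 10000 samples from random(1, 50) and inspect the observed bounds.
-- Both min and max should fall within [1, 50].
SELECT min(r) AS lowest, max(r) AS highest
FROM (
    SELECT random(1, 50) AS r
    FROM generate_series(1, 10000)
) s;
```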
Test
First, I tried joining them with the array overlap operator (explain):
SELECT t1.key1, t2.key2
FROM table_array_1 t1
JOIN table_array_2 t2 ON t2.values2 && t1.values1;
But it is much slower (about 100x) than this query (explain):
SELECT DISTINCT t1.key1, t2.key2
FROM (SELECT key1, unnest(values1) AS values1 FROM table_array_1) t1
JOIN (SELECT key2, unnest(values2) AS values2 FROM table_array_2) t2
  ON t2.values2 = t1.values1;
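To reproduce the comparison, both queries can be run under EXPLAIN with the ANALYZE and BUFFERS options, which executes them and reports actual timings alongside the chosen plan. A minimal sketch, using only the tables defined above:

```sql
-- Plan and actual run time for the array-overlap join.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t1.key1, t2.key2
FROM table_array_1 t1
JOIN table_array_2 t2 ON t2.values2 && t1.values1;

-- Plan and actual run time for the unnest + DISTINCT variant.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT t1.key1, t2.key2
FROM (SELECT key1, unnest(values1) AS values1 FROM table_array_1) t1
JOIN (SELECT key2, unnest(values2) AS values2 FROM table_array_2) t2
  ON t2.values2 = t1.values1;
```

The join node type in each plan (for example nested loop versus hash join) is the part worth comparing.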
There are no indexes on these tables. The cost of using a GIN index is too high for it to be useful, and GiST does not improve things either. This suggests that I may need the intarray extension.
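For reference, trying intarray would look roughly like the sketch below. The index name table_array_2_values2_gin is made up for illustration, and indexing table_array_2 rather than table_array_1 is an assumption; gin__int_ops is the GIN operator class that intarray provides for int[] columns. Whether the planner actually uses the index for this join, and whether it helps, is exactly what I am unsure about.

```sql
-- Install the extension (requires sufficient privileges).
CREATE EXTENSION IF NOT EXISTS intarray;

-- Hypothetical index name; gin__int_ops comes from intarray.
CREATE INDEX table_array_2_values2_gin
    ON table_array_2 USING gin (values2 gin__int_ops);

-- Refresh statistics so the planner sees the new index and data.
ANALYZE table_array_2;

-- Re-check the overlap join with the index in place.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t1.key1, t2.key2
FROM table_array_1 t1
JOIN table_array_2 t2 ON t2.values2 && t1.values1;
```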
Question
Why is unnest + distinct much faster?
Can I improve the performance of the array overlap operator (&&), or use something else that does not involve unnest + DISTINCT? I am looking for performance improvements, and common sense tells me that these two operations should be slower.