Using a single SQL correlated subquery to get two columns

Question

My problem is represented by the following query:

```
SELECT b.row_id, b.x, b.y, b.something,
       (SELECT a.x FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42) AS source_x,
       (SELECT a.y FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42) AS source_y
FROM my_table b
```

I use the same subquery twice to get both source_x and source_y. Is it possible to do this with only one subquery?

I ask because as soon as I run this query on my real data (millions of rows), it never seems to finish: it runs for hours, if not days, and my connection hangs before it completes.

I am using PostgreSQL 8.4

performance sql indexing postgresql correlated-subquery

Julie FC Nov 06 '11 at 19:11

4 answers

I think you can use this approach:

```
SELECT b.row_id, b.x, b.y, b.something, a.x, a.y
FROM my_table b
LEFT JOIN my_table a
       ON a.row_id = (b.row_id - 1)
      AND a.something != 42
```
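A minimal sketch to see the single join on concrete rows. The table definition, the columns x and y, and all values below are hypothetical sample data (assumptions, not taken from the question), and row_id is assumed to be unique. Where no predecessor row qualifies, the LEFT JOIN returns NULL for both columns, which is also what each scalar subquery returns when it finds no row.

```sql
-- Hypothetical sample data: table shape and values are assumptions.
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table (
    row_id    INTEGER PRIMARY KEY,  -- assumed unique
    x         INTEGER,
    y         INTEGER,
    something INTEGER
);
INSERT INTO my_table (row_id, x, y, something) VALUES (1, 10, 100, 7);
INSERT INTO my_table (row_id, x, y, something) VALUES (2, 20, 200, 42);
INSERT INTO my_table (row_id, x, y, something) VALUES (3, 30, 300, 7);
INSERT INTO my_table (row_id, x, y, something) VALUES (4, 40, 400, 7);

-- One self-join delivers both columns: each row b is paired with its
-- predecessor a (row_id - 1), unless that predecessor has something = 42.
SELECT b.row_id, b.x, b.y, b.something,
       a.x AS source_x,
       a.y AS source_y
FROM my_table b
LEFT JOIN my_table a
       ON a.row_id = b.row_id - 1
      AND a.something != 42
ORDER BY b.row_id;
-- source_x / source_y per row_id:
--   1 -> NULL / NULL  (no row 0)
--   2 -> 10 / 100
--   3 -> NULL / NULL  (predecessor row 2 has something = 42)
--   4 -> 30 / 300
```

Because the filter a.something != 42 sits in the ON clause, it only decides which predecessor may be matched; it never removes rows of b from the result.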

DavidEG Nov 06 '11 at 19:24

Old-fashioned syntax:
```
SELECT b.row_id, b.x, b.y, b.something, a.x AS source_x, a.y AS source_y
FROM my_table b, my_table a
WHERE a.row_id = b.row_id - 1
  AND a.something != 42;
```

JOIN syntax:
```
SELECT b.row_id, b.x, b.y, b.something, a.x AS source_x, a.y AS source_y
FROM my_table b
JOIN my_table a ON (a.row_id = b.row_id - 1)
WHERE a.something != 42;
```
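One behavioral difference is worth checking before choosing between these two forms and a LEFT JOIN: both statements above are inner joins, so a row whose predecessor is missing or has something = 42 drops out of the result, whereas the original subqueries return that row with NULLs. The sketch below counts rows both ways on hypothetical sample data (table shape and values are assumptions).

```sql
-- Hypothetical sample data: table shape and values are assumptions.
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table (
    row_id    INTEGER PRIMARY KEY,
    x         INTEGER,
    y         INTEGER,
    something INTEGER
);
INSERT INTO my_table (row_id, x, y, something) VALUES (1, 10, 100, 7);
INSERT INTO my_table (row_id, x, y, something) VALUES (2, 20, 200, 42);
INSERT INTO my_table (row_id, x, y, something) VALUES (3, 30, 300, 7);
INSERT INTO my_table (row_id, x, y, something) VALUES (4, 40, 400, 7);

-- Inner join: rows 1 and 3 have no qualifying predecessor and vanish (2 rows).
SELECT COUNT(*) AS inner_rows
FROM my_table b
JOIN my_table a ON a.row_id = b.row_id - 1
WHERE a.something != 42;

-- LEFT JOIN with the filter in ON: every row of b survives (4 rows),
-- with NULL source columns where no predecessor qualifies.
SELECT COUNT(*) AS left_rows
FROM my_table b
LEFT JOIN my_table a ON a.row_id = b.row_id - 1 AND a.something != 42;
```

Moving a.something != 42 from the ON clause of a LEFT JOIN into its WHERE clause would discard the NULL-extended rows again and give the inner-join count.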

wildplasser Nov 06 '11 at 19:31

```
SELECT b.row_id, b.x, b.y, b.something, a.x, a.y
FROM my_table b
LEFT JOIN (
    SELECT row_id + 1 AS row_id, x, y
    FROM my_table
    WHERE something != 42
) AS a ON a.row_id = b.row_id;
```

Neil Nov 06 '11 at 19:47

Erwin Brandstetter · Accepted Answer · 2011-11-06T23:45:54+0000

@DavidEG has posted the best syntax for the query.

However, your problem is not only the query technique. Replacing the two subqueries with one JOIN can at best make the query about twice as fast, most likely less. That does not explain hours of runtime. Even with millions of rows, a well-tuned PostgreSQL should finish a simple query like this in seconds, not hours.

The first thing that stands out is the syntax error in your query:
```
 ... WHERE a.row_id = (b.row_id - 1), a.something != 42 
```

You need AND or OR here, not a comma.
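For reference, a minimal sketch of the question's two-subquery query with the comma replaced by AND, run on hypothetical sample data (the table, the x/y columns, and the values are assumptions). With the syntax fixed, it returns the same rows as a LEFT JOIN whose ON clause carries both conditions; the join simply finds each predecessor once instead of once per subquery.

```sql
-- Hypothetical sample data: table shape and values are assumptions.
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table (
    row_id    INTEGER PRIMARY KEY,
    x         INTEGER,
    y         INTEGER,
    something INTEGER
);
INSERT INTO my_table (row_id, x, y, something) VALUES (1, 10, 100, 7);
INSERT INTO my_table (row_id, x, y, something) VALUES (2, 20, 200, 42);
INSERT INTO my_table (row_id, x, y, something) VALUES (3, 30, 300, 7);
INSERT INTO my_table (row_id, x, y, something) VALUES (4, 40, 400, 7);

-- The question's query with the comma replaced by AND.
-- Each scalar subquery must return at most one row; that holds here
-- because row_id is the primary key. No match yields NULL.
SELECT b.row_id, b.x, b.y, b.something,
       (SELECT a.x FROM my_table a
         WHERE a.row_id = b.row_id - 1 AND a.something != 42) AS source_x,
       (SELECT a.y FROM my_table a
         WHERE a.row_id = b.row_id - 1 AND a.something != 42) AS source_y
FROM my_table b
ORDER BY b.row_id;
```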

Next, check your indexes. If row_id is not the primary key, you may not have an index on it at all. For optimal performance of this particular query, create a multicolumn index on (row_id, something) like this:
```
CREATE INDEX my_table_row_id_something_idx ON my_table (row_id, something);
```
If your filter always excludes the same single value (something != 42), you can use a partial index instead for extra speed:
```
CREATE INDEX my_table_row_id_something_idx ON my_table (row_id) WHERE something != 42;
```

This makes a difference if 42 is a common value, or if something is a wider column than a plain integer. (An index on two integer columns usually takes the same space on disk as an index on just one of them, because of data alignment. More about data alignment here.)

When performance is a problem, always check your settings. In many distributions PostgreSQL ships with very conservative defaults that are not meant for processing millions of rows.

Depending on your exact version of PostgreSQL, upgrading to the current version, 9.1, can help a lot.

In the end, it always comes down to hardware, too. Tuning and optimization can only take you so far.