Why is this SQL (join) query slow when joining on a column that uses select max?

I am using Apache Derby 10.8 if that matters.

I have a very simple database with a table full of items and a table full of bids on those items. I want to select each item with the highest bid for that item attached to it. The following is my first attempt, and the performance is terrible:

select item.id as item_id,
       item.name as item_name,
       item.retail_value as item_retail_value,
       item.vendor as item_vendor,
       bid.bid_amount as bid_amount,
       bid.bidder_name as bid_bidder_name,
       bid.bidder_phone as bid_bidder_phone,
       bid.operator_name as bid_operator_name
from item
left outer join bid
  on bid.item_id = item.id
 and bid.bid_amount = (select max(bid.bid_amount)
                       from bid
                       where bid.item_id = item.id
                         and bid.status = 'OK')

I created a test data set with 282 items and 200 bids for each item (a total of 56,400 bids). The above query takes about 30-40 seconds. If I instead select each item and then manually loop over the items, selecting the high bid for each one, it takes less than a second.

I tried indexing bid.bid_amount and bid.status, but that didn't make any noticeable difference. SQL is not my strongest area, so if someone could explain why this query is so slow, I would be very grateful.

+4
4 answers

The query is slow because you are using what is called a correlated subquery: the max is computed again for each row.

Try something like this:

 select item.id as item_id,
        item.name as item_name,
        item.retail_value as item_retail_value,
        item.vendor as item_vendor,
        bid.bid_amount as bid_amount,
        bid.bidder_name as bid_bidder_name,
        bid.bidder_phone as bid_bidder_phone,
        bid.operator_name as bid_operator_name
 from item
 left outer join (
        select item_id, MAX(bid_amount) maxamount
        from bid
        where status = 'OK'
        group by item_id
      ) b1
   on item.id = b1.item_id
 left outer join bid
   on bid.item_id = item.id
  and bid.bid_amount = b1.maxamount

This subquery is run only once, and it will run much faster.

+8

You have created a correlated (sometimes called synchronized) subquery. The subquery is executed once for each row of the outer table (item).
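If you want to confirm this on your own data, one option is to look at Derby's runtime statistics for the statement. The following is a minimal sketch, assuming you run it from ij (or any SQL client) on the same connection that executes the query; the placeholder comment stands for your original select.

```sql
-- Turn on runtime statistics and timing for this connection.
CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);
CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(1);

-- Run the slow query here and fetch all of its rows.

-- Print the plan and timings of the statement that just finished.
VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();

-- Turn statistics off again when done.
CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(0);
```

The output should show how the scan of bid inside the subquery is executed, which you can compare against the plan of a rewritten query.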

+2

The problem is that your nested subquery runs at every step of the JOIN operation. Unsurprisingly, query performance is poor; the processor and disk are probably working hard! Assuming you are trying to get the maximum OK'd bid for each item in the item table, you can try this query:

 SELECT I.id AS item_id,
        I.name AS item_name,
        I.retail_value AS item_retail_value,
        I.vendor AS item_vendor,
        B.bid_amount AS bid_amount,
        B.bidder_name AS bid_bidder_name,
        B.bidder_phone AS bid_bidder_phone,
        B.operator_name AS bid_operator_name
 FROM item AS I
 LEFT OUTER JOIN (
        SELECT item_id, MAX(bid_amount) AS bid_amount
        FROM bid
        WHERE status = 'OK'
        GROUP BY item_id
      ) AS MAX_BID
   ON MAX_BID.item_id = I.id
 LEFT OUTER JOIN bid AS B
   ON B.item_id = MAX_BID.item_id
  AND B.bid_amount = MAX_BID.bid_amount;

+1

You can also improve query performance by adding an index on bid.item_id, since the subquery selects records based on item_id.
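As a rough sketch, the index could be created like this. The index names are made up for illustration, and the composite variant is only an assumption about what might help further; it is worth measuring both on your own data.

```sql
-- Index on the column used to correlate bids with items.
-- (bid_item_id_idx is a hypothetical name.)
CREATE INDEX bid_item_id_idx ON bid (item_id);

-- Optional alternative (an assumption, not something measured here):
-- a composite index that also covers the status filter and the max() column.
CREATE INDEX bid_item_status_amount_idx ON bid (item_id, status, bid_amount);
```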

0

Source: https://habr.com/ru/post/1390102/

