Why are the subquery and the join so slow?

I need to select rows from the BUNDLES table that have one of several SAP_STATE_ID values. These values depend on whether the corresponding SAP state should be exported or not.

This query runs very quickly (there is an index on the SAP_STATE_ID column):

SELECT b.* FROM BUNDLES b WHERE b.SAP_STATE_ID IN (2,3,5,6) 

But ... I would like to get a list of identifiers dynamically, for example:

 SELECT b.* FROM BUNDLES b WHERE b.SAP_STATE_ID IN (SELECT s.SAP_STATE_ID FROM SAP_STATES s WHERE s.EXPORT_TO_SAP = 1) 

And this query suddenly takes far too long. I would expect SQL Server to run the subquery first (it does not depend on anything from the main query) and then run the rest, as in my first example. I tried rewriting it to use a join instead of a subquery:

 SELECT b.* FROM BUNDLES b JOIN SAP_STATES s ON (s.SAP_STATE_ID = b.SAP_STATE_ID) WHERE s.EXPORT_TO_SAP = 1 

but it performs just as poorly. It seems to run the subquery for each row of the BUNDLES table, or something like that. I do not really know how to read execution plans, but I tried. The plan says that 81% of the cost goes to scanning the BUNDLES primary key index (I have no idea why it would do that; there is a BUNDLE_ID field defined as PRIMARY KEY, but it does not appear in the query at all...)

Does anyone have an explanation for why SQL Server is so "stupid"? Is there a way to achieve what I want with good performance, but without having to provide a static list of SAP_STATE_ID values?

script for both tables and associated indexes - http://mab.to/xbYiI0wKj

execution plan for the subquery version - http://mab.to/8Qh6gpdYZ

execution plan for the join version - http://mab.to/YCqeGCUbr

(for some reason, these two plans look the same, and both suggest creating the BUNDLES.SAP_STATE_ID index that already exists)

3 answers

I suspect that the statistics on the tables are out of date. If you need it working in a hurry, I would write the query like this:

 SELECT b.* FROM SAP_STATES s INNER LOOP JOIN BUNDLES b ON s.SAP_STATE_ID = b.SAP_STATE_ID WHERE s.EXPORT_TO_SAP = 1 

This forces a nested loops join that iterates over SAP_STATES and uses each matching state to filter BUNDLES.
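If stale statistics really are the cause, refreshing them is an alternative to forcing the join type. The following is only a sketch: the table names come from the question, while the dbo schema is an assumption, and FULLSCAN may be slow on a large BUNDLES table.

```sql
-- Assumption: both tables live in the dbo schema.
-- Rebuild the statistics from every row so the optimizer sees current row counts.
UPDATE STATISTICS dbo.BUNDLES WITH FULLSCAN;
UPDATE STATISTICS dbo.SAP_STATES WITH FULLSCAN;
```

After that, the original subquery or join version can be run again without the LOOP hint to see whether the plan changes.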


When you use tables (temporary or physical), the SQL engine builds statistics on them and therefore has a very clear idea of the number of rows they contain and of the best way to execute the query. A derived table (subquery), on the other hand, has no statistics.

So although it may seem simple for a person to deduce the number of rows in it, the "stupid" SQL engine knows none of this. As for the query itself, the WHERE s.EXPORT_TO_SAP = 1 condition makes a world of difference here. The clustered index is sorted and built on SAP_STATE_ID, but in order to also check the WHERE clause the engine has no option except to scan the entire table (the final data set)! I bet that if, instead of the clustered index, there were a non-clustered index on the SAP_STATE_ID column that also covers the EXPORT_TO_SAP field, it could do the trick. Since a clustered index scan is generally bad for performance, I suggest you use the approach below:

 SELECT s.SAP_STATE_ID INTO #Sap_State FROM SAP_STATES s WHERE s.EXPORT_TO_SAP = 1;
 SELECT b.* FROM BUNDLES b JOIN #Sap_State a ON a.SAP_STATE_ID = b.SAP_STATE_ID;
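
The non-clustered index this answer speculates about could look roughly like the sketch below. The index name and the dbo schema are assumptions made for illustration; they do not come from the question's script.

```sql
-- Hypothetical index name; assumes SAP_STATES lives in the dbo schema.
-- Keyed on SAP_STATE_ID and carrying EXPORT_TO_SAP, so the filter can be
-- evaluated from the index alone without touching the clustered index.
CREATE NONCLUSTERED INDEX IX_SAP_STATES_STATE_ID_EXPORT
    ON dbo.SAP_STATES (SAP_STATE_ID)
    INCLUDE (EXPORT_TO_SAP);
```

Whether the optimizer actually picks this index depends on the data and the statistics, so it is worth comparing the execution plans before and after.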

Since there are problems with mab.to for some reason, I would suggest the following indexes and query:

 table       index
 sap_states  (export_to_sap, sap_state_id)
 bundles     (sap_state_id)

 select b.* from sap_states ss join bundles b on ss.sap_state_id = b.sap_state_id where ss.export_to_sap = 1
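
The two suggested indexes could be created along these lines. This is a sketch only: the index names are hypothetical, and according to the question an index on bundles (sap_state_id) already exists, so the second statement would be needed only if it did not.

```sql
-- Hypothetical index names, chosen for illustration.
-- Leading on export_to_sap lets the filter export_to_sap = 1 seek directly
-- to the matching states and return sap_state_id from the index itself.
CREATE INDEX IX_sap_states_export_state
    ON sap_states (export_to_sap, sap_state_id);

-- Per the question this index already exists; shown for completeness.
CREATE INDEX IX_bundles_sap_state
    ON bundles (sap_state_id);
```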

Source: https://habr.com/ru/post/976065/

