Why does the subquery inside NOT IN take longer than the full query?

I am using NOT IN inside my SQL query.

For instance:

select columnA from table1 where columnA not in ( select columnB from table2) 

How is it possible that this part of the query

 select columnB from table2 

takes 30 seconds to complete, while the whole query above takes only 0.1 seconds? Shouldn't the full query take at least 30 seconds?

BTW, both queries return valid results.

Thanks!

Comments

Is it because the second query has not actually completed, but has only returned the first "x" rows (out of a very large table)?

No, the query does complete after 30 seconds, and it does not return many rows (about 50, for example).

But Aleksandar was asking why the query containing the supposed performance killer ran so quickly.

That is exactly my point.

Also, how long does columnB from table2 take to execute?

Actually, the original query is "select distinct ..."

+4
6 answers

You seem to think that your main query involves the following steps:

 (1) Run the subquery.
 (2) Check each row in table1 against the result set from the subquery.

Therefore, you think that executing a subquery separately should take less time than executing the entire query.

But SQL is not a procedural language, and the structure of a query does not necessarily imply the steps that will be taken to execute the query.

As Guff says, the optimizer will come up with (what it considers to be) an optimal plan for executing each query. These execution plans are not always obvious from looking at the query, and in some cases they can be quite counterintuitive.

I think that in this case it is more likely that the optimizer came up with a faster way of checking whether a value exists in table2 than simply reading the whole of table2 at once. This may be the transformation that Guff demonstrated (although that still does not tell us which specific execution plan is being used).

My guess is that table1 has significantly fewer rows than table2 and that an index exists on table2.columnB. In that case, all the database has to do is fetch the rows from table1 and then probe the index for each of those values to check for existence. But this is only one possibility.

In addition, as Michael Buen noted, differences in the size of the returned result set can also affect the performance you perceive. My intuition is that this is secondary to the differences in the execution plan, but it can be significant.
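The scenario guessed at above can be sketched as follows. This is only an illustration: the index name is hypothetical, the question does not say which indexes actually exist, and whether the optimizer picks this strategy depends on the database and its statistics.

```sql
-- Hypothetical index; nothing in the question confirms that it exists.
create index ix_table2_columnB on table2 (columnB);

-- Written as NOT EXISTS, the per-row existence check is explicit:
-- for each row of table1, the database can probe the index on
-- table2.columnB instead of reading all of table2.
-- Note: this returns the same rows as the NOT IN query only when
-- neither column contains NULLs.
select t1.columnA
from table1 t1
where not exists (
    select 1
    from table2 t2
    where t2.columnB = t1.columnA
);
```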

+3

This is because the query optimizer turns the query into something completely different. The plan it actually runs is probably the same as the one produced for this query:

 select columnA from table1 left join table2 on ColumnA = ColumnB where ColumnB is null 

If a database can use indexes to join tables, it may not need to query the entire table2 or even touch the table itself.

+4

To make the comparison dramatic, let's say ...

 select columnB from table2 

... returns a billion rows (30 seconds): a lot of data travels over the wire and is displayed to the user.

And this...

 select columnA from table1 

... returns only one row.

The RDBMS will not do the pointless work of shipping table2's data from the server to the client if you are not going to display it. When it only tests for the presence of data, no large amount of network bandwidth or I/O is needed: everything happens on the server, and the only thing pulled from the server to the client is the single row of table1.

 select columnA from table1 where columnA not in ( select columnB from table2) 

And everything will be especially fast if columnA and columnB are indexed.

Two things slow a database down: first, pulling too much data from the server to the client, and second, having no index on the relevant fields.

+2

Perhaps because it can use indexes and the number of returned rows is small. Returning results can account for much of the running time.

0

As an aside, make sure you know the difference between NOT IN and NOT EXISTS.

If columnA is NULL, that row will not be returned by the NOT IN query you are looking at (as long as the subquery returns at least one row), whereas the LEFT JOIN anti-join example above behaves like NOT EXISTS and does return it.

Also, make sure TOAD or SQL Developer is not just showing the first 50 rows, as they like to do (run a count(*) from table1 to check whether the query really returns only 50 rows).

Also, run an EXPLAIN on the query and see whether it contains anything that looks egregious. Check your indexes, and also check whether the columns allow NULLs: missing indexes may be the culprit, but a full table scan caused by NULLs can be a nightmare.
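The difference can be seen on a tiny example. The tables demo1 and demo2 below are hypothetical and exist only for this illustration; the results noted in the comments follow from standard SQL three-valued logic.

```sql
-- Hypothetical demo tables, not the tables from the question.
create table demo1 (columnA int);
create table demo2 (columnB int);

insert into demo1 (columnA) values (1);
insert into demo1 (columnA) values (2);
insert into demo1 (columnA) values (null);
insert into demo2 (columnB) values (1);

-- NOT IN: returns only 2. For the NULL row, "NULL not in (1)" is
-- unknown rather than true, so the row is filtered out.
select columnA
from demo1
where columnA not in (select columnB from demo2);

-- NOT EXISTS: returns 2 and NULL, because no row in demo2 matches NULL.
select d1.columnA
from demo1 d1
where not exists (
    select 1 from demo2 d2 where d2.columnB = d1.columnA
);

-- LEFT JOIN anti-join: also returns 2 and NULL.
select d1.columnA
from demo1 d1
left join demo2 d2 on d1.columnA = d2.columnB
where d2.columnB is null;
```

If demo2 also contained a NULL in columnB, the NOT IN query would return no rows at all, while the other two would be unaffected.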
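A variation on that check, as a sketch: counting the rows of the query itself gives the real result size, regardless of how many rows the client tool chooses to display.

```sql
-- Counts the rows the NOT IN query really produces on the server,
-- independent of any row limit applied by the client tool.
select count(*)
from table1
where columnA not in (select columnB from table2);
```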
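How to ask for the plan depends on the database. The sketch below assumes Oracle, only because TOAD and SQL Developer are mentioned; the question itself does not name the database.

```sql
-- Oracle syntax (an assumption, see above): record the plan for the query.
explain plan for
select columnA
from table1
where columnA not in (select columnB from table2);

-- Display the plan that was just recorded.
select * from table(dbms_xplan.display);

-- In PostgreSQL or MySQL, the rough equivalent is to prefix the query:
-- explain select columnA from table1
-- where columnA not in (select columnB from table2);
```

In the output, look for full scans of table2 and for whether an index on columnB is used.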

0

NOT is a performance killer.

On some SQL engines, the IN (...) part is first evaluated into a temporary table, and then a second query is run that applies the NOT to the data in that temporary table.

If you can, you should use only IN!

-1

Source: https://habr.com/ru/post/1334356/

