Efficient way to select all values from one column not in another column

Question

Efficient way to select all values from one column not in another column

I need to return all values from colA that are not in colB from mytable . I use:

 SELECT DISTINCT(colA) FROM mytable WHERE colA NOT IN (SELECT colB FROM mytable)

It works, but the request takes too long to complete.

Is there a more efficient way to do this?

+6

performance sql distinct notin

Flash Jan 05 '12 at 3:03

source share

3 answers

You can use exists :

 select distinct colA from mytable m1 where not exists (select 1 from mytable m2 where m2.colB = m1.colA)

exists performs a half join to quickly combine values. not in completes the entire result set, and then on it or . exists usually faster for values in tables.

+6

Eric Jan 05 '12 at 3:05

source share

You can use the EXCEPT statement, which effectively distinguishes between two SELECT . EXCEPT DISTINCT will return only unique values. The Oracle MINUS statement is equivalent to EXCEPT DISTINCT .

0

Paul Jan 05 '12 at 3:13

source share

Erwin brandstetter · Accepted Answer · 2012-01-05T03:07:45+0000

In standard SQL, there are no parentheses in DISTINCT colA . DISTINCT not a function.

 SELECT DISTINCT colA FROM mytable WHERE colA NOT IN (SELECT DISTINCT colB FROM mytable);

Added DISTINCT to the subselect. If you have a lot of duplicates, this can speed up the request.

CTE may be faster, depending on your DBMS. In addition, I demonstrate LEFT JOIN as an alternative to excluding values in valB and an alternative way to get different values using GROUP BY :

 WITH x AS (SELECT colB FROM mytable GROUP BY colB) SELECT m.colA FROM mytable m LEFT JOIN x ON x.colB = m.colA WHERE x.colB IS NULL GROUP BY m.colA;

Or, simplified further, and with a simple subquery (possibly the fastest):

 SELECT DISTINCT m.colA FROM mytable m LEFT JOIN mytable x ON x.colB = m.colA WHERE x.colB IS NULL;

basically 4 methods to exclude rows with keys present in another (or the same) table:

Select rows not in another table.

The deciding factor for speed will be the indices . You need to have indices on colA and colB to make this query fast.

Efficient way to select all values ​​from one column not in another column

More articles:

Efficient way to select all values from one column not in another column