I have two tables:
CREATE TABLE test1 (id int);
CREATE TABLE test2 (id int);
INSERT INTO test1 VALUES (1);
INSERT INTO test1 VALUES (2);
INSERT INTO test2 VALUES (1);
I want to see a list of all the ids that are in test1 and not in test2.
Here are at least three ways to do this:
OUTER JOIN:
SELECT a.id FROM test1 a LEFT OUTER JOIN test2 b ON a.id = b.id WHERE b.id IS NULL;
MINUS:
SELECT id FROM test1 MINUS SELECT id FROM test2;
NOT IN:
SELECT id FROM test1 WHERE id NOT IN ( SELECT id FROM test2 );
So far so good. All three of these queries should give me the same results: 1 row with a value of 2.
If I insert a NULL into test2, the OUTER JOIN and MINUS queries continue to return the same result, but the NOT IN query returns no rows.
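For anyone who wants to reproduce this, here is a minimal sketch of the step that triggers it, assuming the two tables were created and populated exactly as shown at the top of the question:

```sql
-- Add a NULL id to test2; test2 now holds the values 1 and NULL.
INSERT INTO test2 VALUES (NULL);

-- Re-run the NOT IN query unchanged.
-- Before the insert it returned one row (id = 2); afterwards it returns no rows.
SELECT id FROM test1 WHERE id NOT IN ( SELECT id FROM test2 );
```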
This really confused me. Then I noticed that if I change it to
SELECT id FROM test1 WHERE id NOT IN ( SELECT id FROM test2 WHERE id IS NOT NULL );
I get the expected result again: one row.
Why is this happening? I suppose this is something completely fundamental to SQL, but I don't understand what it is. I'm fairly sure that in other databases I have used, the three methods above gave equivalent results, although I don't have SQL Server or Postgres to test against right now, so I may be misremembering that behavior.
(I suspect one answer to this question is "Stop worrying and just don't use NOT IN," but that can be costly in terms of code readability: sometimes NOT IN is more elegant than doing everything with outer joins or MINUS.)