SQL SELECT with an "IN" subquery returns no records if the subquery contains NULL

I came across this interesting behavior. I see that a LEFT JOIN is the way to go, but I would still like to understand it. Is this a bug or by design? Any explanation?

When I select records from the left table whose value is missing from the subquery result on the right table, the expected "missing" record is not returned if the subquery result contains NULL values. I expected the two ways of writing this query to be equivalent.

Thanks!

    declare @left table (id int not null primary key identity(1,1), ref int null)
    declare @right table (id int not null primary key identity(1,1), ref int null)

    insert @left (ref) values (1)
    insert @left (ref) values (2)

    insert @right (ref) values (1)
    insert @right (ref) values (null)

    print 'unexpected empty resultset:'
    select * from @left
    where ref not in (select ref from @right)

    print 'expected result - ref 2:'
    select * from @left
    where ref not in (select ref from @right where ref is not null)

    print 'expected result - ref 2:'
    select l.* from @left l
    left join @right r on r.ref = l.ref
    where r.id is null

    print @@version

gives:

    (1 row(s) affected)
    (1 row(s) affected)
    (1 row(s) affected)
    (1 row(s) affected)
    unexpected empty resultset:
    id          ref
    ----------- -----------

    (0 row(s) affected)

    expected result - ref 2:
    id          ref
    ----------- -----------
    2           2

    (1 row(s) affected)

    expected result - ref 2:
    id          ref
    ----------- -----------
    2           2

    (1 row(s) affected)

    Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (X64)
        Apr 2 2010 15:48:46
        Copyright (c) Microsoft Corporation
        Standard Edition (64-bit) on Windows NT 6.0 <X64> (Build 6002: Service Pack 2) (Hypervisor)
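For anyone without SQL Server at hand, the same three queries can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch that assumes SQLite applies the same standard NULL rules to NOT IN as SQL Server 2008 R2 does here; left_t and right_t are hypothetical names standing in for the @left and @right table variables.

```python
import sqlite3

# In-memory database; left_t/right_t are hypothetical stand-ins for @left/@right.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_t  (id INTEGER PRIMARY KEY, ref INTEGER);
    CREATE TABLE right_t (id INTEGER PRIMARY KEY, ref INTEGER);
    INSERT INTO left_t  (ref) VALUES (1), (2);
    INSERT INTO right_t (ref) VALUES (1), (NULL);
""")

# 1. NOT IN against a subquery that returns a NULL: no rows come back.
plain_not_in = conn.execute(
    "SELECT * FROM left_t WHERE ref NOT IN (SELECT ref FROM right_t)"
).fetchall()

# 2. The same query with NULLs filtered out of the subquery.
filtered_not_in = conn.execute(
    "SELECT * FROM left_t WHERE ref NOT IN "
    "(SELECT ref FROM right_t WHERE ref IS NOT NULL)"
).fetchall()

# 3. LEFT JOIN anti-join: keep left rows that found no partner.
left_join = conn.execute(
    "SELECT l.* FROM left_t l LEFT JOIN right_t r ON r.ref = l.ref "
    "WHERE r.id IS NULL"
).fetchall()

print(plain_not_in)     # []
print(filtered_not_in)  # [(2, 2)]
print(left_join)        # [(2, 2)]
```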
4 answers

This is by design. If no match is found and the set contains NULL, the result is NULL, as specified by the SQL standard.

    '1' IN ('1', '3') => true
    '2' IN ('1', '3') => false
    '1' IN ('1', NULL) => true
    '2' IN ('1', NULL) => NULL

    '1' NOT IN ('1', '3') => false
    '2' NOT IN ('1', '3') => true
    '1' NOT IN ('1', NULL) => false
    '2' NOT IN ('1', NULL) => NULL
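The table can be checked mechanically. The sketch below runs each expression through SQLite via Python's sqlite3 module, on the assumption that SQLite follows the same standard rule for IN and NOT IN with NULL; in the output, None is NULL, 1 is true and 0 is false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each expression from the table above, paired with the result it lists.
# None stands for SQL NULL, 1 for true and 0 for false.
cases = {
    "'1' IN ('1', '3')": 1,
    "'2' IN ('1', '3')": 0,
    "'1' IN ('1', NULL)": 1,
    "'2' IN ('1', NULL)": None,
    "'1' NOT IN ('1', '3')": 0,
    "'2' NOT IN ('1', '3')": 1,
    "'1' NOT IN ('1', NULL)": 0,
    "'2' NOT IN ('1', NULL)": None,
}

results = {}
for expr in cases:
    # Evaluate the bare expression in a one-column SELECT.
    (value,) = conn.execute(f"SELECT {expr}").fetchone()
    results[expr] = value
    print(f"{expr:<24} => {value}")
```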

Informally, the logic is that NULL can be thought of as an unknown value. In the next example it does not matter what the unknown value is: '1' is explicitly in the set, so the result is true.

    '1' IN ('1', NULL) => true

In the following example, we cannot be sure that '2' is in the set, but since we do not know all of its values, we also cannot be sure that it is not in the set. So the result is NULL.

    '2' IN ('1', NULL) => NULL
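The same reasoning can be written down as a small model. This is an illustrative sketch, not how any engine implements IN: it assumes x IN (a, b, ...) behaves like x = a OR x = b OR ..., uses Python's None for NULL, and sql_eq and sql_in are hypothetical helper names.

```python
from typing import Iterable, Optional

# None models SQL NULL ("unknown"); True/False are the known truth values.
Tri = Optional[bool]


def sql_eq(a: Optional[str], b: Optional[str]) -> Tri:
    """Equality under three-valued logic: any comparison with NULL is unknown."""
    if a is None or b is None:
        return None
    return a == b


def sql_in(x: Optional[str], values: Iterable[Optional[str]]) -> Tri:
    """Model x IN (v1, v2, ...) as x = v1 OR x = v2 OR ..."""
    saw_unknown = False
    for v in values:
        result = sql_eq(x, v)
        if result is True:
            return True         # one definite match settles it
        if result is None:
            saw_unknown = True  # this element might or might not equal x
    return None if saw_unknown else False


print(sql_in("1", ["1", None]))  # True
print(sql_in("2", ["1", None]))  # None
```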

Another way to look at this is to rewrite X NOT IN (Y, Z) as X <> Y AND X <> Z. Then you can apply the rules of three-valued logic:

    true AND NULL => NULL
    false AND NULL => false
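Applying those rules term by term gives a small model of NOT IN. As before, this is a hypothetical sketch (and3, sql_ne and sql_not_in are illustrative names, None stands for NULL) rather than engine code.

```python
from typing import Iterable, Optional

Tri = Optional[bool]  # None models NULL / unknown


def and3(a: Tri, b: Tri) -> Tri:
    """Three-valued AND: false wins; otherwise any unknown makes it unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def sql_ne(a: Optional[str], b: Optional[str]) -> Tri:
    """x <> y, unknown whenever either side is NULL."""
    if a is None or b is None:
        return None
    return a != b


def sql_not_in(x: Optional[str], values: Iterable[Optional[str]]) -> Tri:
    """Model x NOT IN (Y, Z, ...) as x <> Y AND x <> Z AND ..."""
    result: Tri = True  # AND over zero terms is true
    for v in values:
        result = and3(result, sql_ne(x, v))
    return result


print(sql_not_in("2", ["1", "3"]))   # True
print(sql_not_in("2", ["1", None]))  # None  (true AND NULL)
print(sql_not_in("1", ["1", None]))  # False (false AND NULL)
```

A WHERE clause keeps a row only when its predicate is true, so a row whose NOT IN test comes out as None (NULL) is discarded, which is why the question's first query returns nothing.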

Yes, that's by design. There are also many other considerations when choosing between a LEFT JOIN and NOT IN. This link has a very good explanation of this behavior.


That's how the ANSI committee decided it should work.

You can precede your queries with

    set ansi_defaults OFF

and you will get the result you expected.

Since version 7.0, Microsoft SQL Server has been fairly strict about compliance with the ANSI standards.

EDIT:

Don't fight the defaults. You will lose in the end.


Mark has already explained the root cause of this behavior. It can be worked around in more than one way, to name a few:
- LEFT JOIN
- filtering NULL values out of the inner query, either in its WHERE clause or in its SELECT clause using a correlated subquery
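Here is a sketch of those workarounds against the question's data, again using SQLite through Python's sqlite3 module as a stand-in for SQL Server. The NOT EXISTS query is one assumed reading of the correlated-subquery option; left_t and right_t are hypothetical table names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_t  (id INTEGER PRIMARY KEY, ref INTEGER);
    CREATE TABLE right_t (id INTEGER PRIMARY KEY, ref INTEGER);
    INSERT INTO left_t  (ref) VALUES (1), (2);
    INSERT INTO right_t (ref) VALUES (1), (NULL);
""")

workarounds = {
    # Filter NULLs inside the subquery's WHERE clause.
    "filtered NOT IN": """
        SELECT * FROM left_t
        WHERE ref NOT IN (SELECT ref FROM right_t WHERE ref IS NOT NULL)""",
    # Anti-join: keep left rows that found no partner.
    "LEFT JOIN": """
        SELECT l.* FROM left_t l
        LEFT JOIN right_t r ON r.ref = l.ref
        WHERE r.id IS NULL""",
    # Correlated subquery: r.ref = l.ref is never true for the NULL row,
    # so that row simply does not count as a match.
    "NOT EXISTS": """
        SELECT * FROM left_t l
        WHERE NOT EXISTS (SELECT 1 FROM right_t r WHERE r.ref = l.ref)""",
}

rows = {name: conn.execute(sql).fetchall() for name, sql in workarounds.items()}
for name, result in rows.items():
    print(f"{name:<16} {result}")  # each prints [(2, 2)]
```

They agree on this data, but not in every case: a row with a NULL in left_t.ref would be kept by the LEFT JOIN and NOT EXISTS versions and dropped by the filtered NOT IN.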

The following three short posts are a case study on the same subject:
- NOT IN Subquery returns zero rows - Use
- NOT IN Subquery returns zero rows - Root Cause
- NOT IN Subquery returns zero rows - Workarounds


Source: https://habr.com/ru/post/1333469/

