NULL and IN() give unexpected results

This seems so basic that I am stumped, for lack of a better word. I have two tables, call them albums and artists:

 CREATE TABLE `albums` (
   `album_id` bigint(20) NOT NULL AUTO_INCREMENT,
   `artist_id` bigint(20) DEFAULT NULL,
   `name` varchar(200) NOT NULL,
   PRIMARY KEY (`album_id`)
 );

 CREATE TABLE `artists` (
   `artist_id` bigint(20) NOT NULL AUTO_INCREMENT,
   `name` varchar(250) NOT NULL,
   PRIMARY KEY (`artist_id`)
 );

Each table contains several hundred thousand records. Some of the album rows have a NULL artist_id; this is expected.

However, when I execute the following query to find artists without albums:

SELECT * FROM artists WHERE artist_id NOT IN (SELECT artist_id FROM albums)

... the query returns zero results. I know this is not correct. So I tried the following:

SELECT * FROM artists WHERE artist_id NOT IN (SELECT artist_id FROM albums WHERE artist_id IS NOT NULL)

... and I get back a couple of thousand rows. My question is: why does the first query seem to operate on the assumption that any number = NULL? Or is this an odd effect that NULL has inside an IN() expression? I feel that this is something basic that I missed. I usually do not use NULL in my db tables.

3 answers

This is why NOT EXISTS is semantically correct.

 SELECT * FROM artists ar WHERE NOT EXISTS (SELECT * FROM albums al WHERE ar.artist_id = al.artist_id) 

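The logic:

  • NOT IN (x, y, NULL) is actually
    • NOT (x OR y OR NULL), which is actually
      • (NOT x) AND (NOT y) AND (NOT NULL)

NOT NULL evaluates to NULL (unknown), so the whole AND can never be TRUE. A single NULL in the list therefore invalidates the entire NOT IN.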
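
The difference between the two forms can be reproduced on a tiny data set. The following is only a sketch: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for the MySQL tables in the question, and the three artists and two albums are made-up sample rows.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL schema in the question.
# The sample rows are hypothetical and exist only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (artist_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE albums (album_id INTEGER PRIMARY KEY, artist_id INTEGER, name TEXT NOT NULL);
    INSERT INTO artists (artist_id, name) VALUES (1, 'Artist A'), (2, 'Artist B'), (3, 'Artist C');
    INSERT INTO albums (album_id, artist_id, name) VALUES (1, 1, 'Album X'), (2, NULL, 'Compilation');
""")

# NOT IN: the NULL artist_id in albums makes the predicate unknown for
# every artist that has no album, so those rows are filtered out.
not_in_rows = conn.execute(
    "SELECT artist_id FROM artists "
    "WHERE artist_id NOT IN (SELECT artist_id FROM albums) "
    "ORDER BY artist_id"
).fetchall()

# NOT EXISTS: the correlated comparison simply never matches the NULL row.
not_exists_rows = conn.execute(
    "SELECT ar.artist_id FROM artists ar "
    "WHERE NOT EXISTS (SELECT 1 FROM albums al WHERE al.artist_id = ar.artist_id) "
    "ORDER BY ar.artist_id"
).fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [(2,), (3,)]
```

With this sample data NOT IN returns nothing, while NOT EXISTS returns the two artists that have no album.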


The quick answer is that the IN operator is shorthand for = a OR = b OR ... If the list includes NULLs, then I think that breaks the operator. Your second query is probably the best option.

Alternatively, a join can also work and may be more efficient.
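
One common way to write the join variant is a LEFT JOIN that keeps only the artists with no matching album row. This is a sketch, not a performance claim: it runs against an in-memory SQLite database (via Python's sqlite3) rather than MySQL, the sample rows are invented, and whether it is actually faster depends on the engine and the indexes.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (artist_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE albums (album_id INTEGER PRIMARY KEY, artist_id INTEGER, name TEXT NOT NULL);
    INSERT INTO artists (artist_id, name) VALUES (1, 'Artist A'), (2, 'Artist B'), (3, 'Artist C');
    INSERT INTO albums (album_id, artist_id, name) VALUES (1, 1, 'Album X'), (2, NULL, 'Compilation');
""")

# Anti-join: artists with no matching album come back with NULLs in the
# albums columns, so filtering on the albums primary key isolates them.
# The album whose artist_id is NULL never matches any artist in the ON clause.
left_join_rows = conn.execute(
    "SELECT ar.artist_id, ar.name FROM artists ar "
    "LEFT JOIN albums al ON al.artist_id = ar.artist_id "
    "WHERE al.album_id IS NULL "
    "ORDER BY ar.artist_id"
).fetchall()

print(left_join_rows)  # [(2, 'Artist B'), (3, 'Artist C')]
```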


This is due to how SQL NULLs are interpreted - you should think of them as UNKNOWN.

Let's say you have artist_id = 1

If you run the following:

 artist_id = NULL 

Instead of getting "False", you get "UNKNOWN".

When you run a query such as yours, only rows for which the condition evaluates to TRUE are returned.

 artist_id IN (NULL, NULL, NULL...) = UNKNOWN
 artist_id NOT IN (NULL, NULL, NULL...) = UNKNOWN
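
These comparisons can be checked directly. The sketch below evaluates them in SQLite through Python's sqlite3 module, which is an assumption standing in for the MySQL server in the question; SQLite reports UNKNOWN as NULL, which Python shows as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each expression uses the literal 1 in place of artist_id = 1.
results = conn.execute(
    "SELECT 1 = NULL, "          # comparison with NULL is unknown
    "1 IN (NULL, NULL), "        # no match, NULL present: unknown
    "1 NOT IN (NULL, NULL), "    # NOT unknown is still unknown
    "1 NOT IN (2, NULL), "       # one NULL is enough to make it unknown
    "1 IN (1, NULL)"             # a real match is still true
).fetchone()

print(results)  # (None, None, None, None, 1)

# A WHERE clause keeps only rows whose condition is TRUE, so an unknown
# result filters the row out just as FALSE would.
kept = conn.execute(
    "SELECT COUNT(*) FROM (SELECT 1 AS artist_id) WHERE artist_id NOT IN (2, NULL)"
).fetchone()[0]

print(kept)  # 0
```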

Source: https://habr.com/ru/post/893874/

