First of all, do not use null in your Scala code, unless you need it for compatibility reasons.
As for your question, this is plain SQL semantics. col("c1") === null is interpreted as c1 = NULL, and since NULL marks an undefined value, the result of any comparison with it is undefined, including NULL = NULL itself.
spark.sql("SELECT NULL = NULL").show
+-------------+
|(NULL = NULL)|
+-------------+
|         null|
+-------------+
spark.sql("SELECT NULL != NULL").show
+-------------------+
|(NOT (NULL = NULL))|
+-------------------+
|               null|
+-------------------+
spark.sql("SELECT TRUE != NULL").show
+------------------------------------+
|(NOT (true = CAST(NULL AS BOOLEAN)))|
+------------------------------------+
|                                null|
+------------------------------------+
spark.sql("SELECT TRUE = NULL").show
+------------------------------+
|(true = CAST(NULL AS BOOLEAN))|
+------------------------------+
|                          null|
+------------------------------+
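The same three-valued logic bites in the DataFrame DSL: because the comparison evaluates to NULL for every row, a filter on it keeps nothing. A minimal sketch (the data and the column name c1 are illustrative, not from your code):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[1]").appName("null-demo").getOrCreate()
import spark.implicits._

// One defined value and one NULL:
val df = Seq(Some(1), None: Option[Int]).toDF("c1")

// c1 = NULL is NULL for every row, so the predicate is never true
// and the filter drops everything - even the row where c1 IS NULL:
df.filter($"c1" === null).count  // 0
df.filter($"c1" =!= null).count  // 0 as well
```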
The only valid ways to check for NULL are:
IS NULL:
spark.sql("SELECT NULL IS NULL").show
+--------------+
|(NULL IS NULL)|
+--------------+
|          true|
+--------------+
spark.sql("SELECT TRUE IS NULL").show
+--------------+
|(true IS NULL)|
+--------------+
|         false|
+--------------+
IS NOT NULL:
spark.sql("SELECT NULL IS NOT NULL").show
+------------------+
|(NULL IS NOT NULL)|
+------------------+
|             false|
+------------------+
spark.sql("SELECT TRUE IS NOT NULL").show
+------------------+
|(true IS NOT NULL)|
+------------------+
|              true|
+------------------+
implemented in the DataFrame DSL as Column.isNull and Column.isNotNull respectively.
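For instance, filtering with Column.isNull / Column.isNotNull behaves as expected where === null does not. A short sketch (the data and column name flag are assumptions for illustration):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[1]").appName("isnull-demo").getOrCreate()
import spark.implicits._

val df = Seq(Some(true), None: Option[Boolean]).toDF("flag")

// isNull keeps only the NULL row, isNotNull only the defined one:
df.filter($"flag".isNull).count     // 1
df.filter($"flag".isNotNull).count  // 1
```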
Note:
For NULL-safe comparisons, use IS DISTINCT FROM / IS NOT DISTINCT FROM:
spark.sql("SELECT NULL IS NOT DISTINCT FROM NULL").show
+---------------+
|(NULL <=> NULL)|
+---------------+
|           true|
+---------------+
spark.sql("SELECT NULL IS NOT DISTINCT FROM TRUE").show
+--------------------------------+
|(CAST(NULL AS BOOLEAN) <=> true)|
+--------------------------------+
|                           false|
+--------------------------------+
or not(_ <=> _) / <=>:
spark.sql("SELECT NULL AS col1, NULL AS col2").select($"col1" <=> $"col2").show
+---------------+
|(col1 <=> col2)|
+---------------+
|           true|
+---------------+
spark.sql("SELECT NULL AS col1, TRUE AS col2").select($"col1" <=> $"col2").show
+---------------+
|(col1 <=> col2)|
+---------------+
|          false|
+---------------+
in SQL and DataFrame DSL respectively.
Related:
Enabling null values in Apache Spark Join