However, I often read about performance problems when fields are invalid and it is recommended to use an empty string in cases where NULL is actually semantically correct.
For some time I'm going to make a choice because of the choice of the word:
- Even if it was a significant performance factor, this does not make it semantically correct for using a value instead of NULL. In SQL, NULL has a semantic role to indicate a missing or inapplicable value. NULL performance characteristics in this RDBMS implementation are independent of this. Performance may vary from brand to brand or from version to version, but the NULL target in the language is consistent.
In any case, I have not heard any evidence that NULL works poorly. I would be interested in any references to performance measurements that show nullable columns that perform worse than non-null columns.
I am not saying that I am not mistaken, or that this cannot be true in some cases - simply that it makes no sense to make idle assumptions. Science does not consist of hypotheses; you need to show evidence with repeatable measurements.
Metrics will also tell you how different the performance is, so you can judge whether or not to worry about something. That is, the impact can be measurable and non-zero, but still negligible compared to higher performance factors, such as proper indexing of tables or determining the size of the database cache.
In MySQL, a NULL search can benefit from an index:
mysql> CREATE TABLE foo ( i INT NOT NULL, j INT DEFAULT NULL, PRIMARY KEY (i), UNIQUE KEY j_index (j) ); mysql> INSERT INTO foo (i, j) VALUES (1, 1), (2, 2), (3, NULL), (4, NULL), (5, 5); mysql> EXPLAIN SELECT * FROM foo WHERE i = 3; +----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+ | 1 | SIMPLE | foo | const | PRIMARY | PRIMARY | 4 | const | 1 | | +----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+ mysql> EXPLAIN SELECT * FROM foo WHERE j IS NULL; +----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+ | 1 | SIMPLE | foo | ref | j_index | j_index | 5 | const | 2 | Using where | +----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
Please note that this is not a performance measurement yet. I just showed that you can use the index when searching for NULL. I'm going to argue (though not measured, but it's just StackOverflow) that the advantage of the index overshadows any possible punishment when looking for NULL over an empty string.
It is not the right design decision to select zero or space or any other value to replace NULL. You may need to use these values as significant in the column. That's why NULL exists as a value that, by definition, is outside the value domain of any data type, so you can use the entire range of values of integers or strings or something else and still have something that means “none of the above values . "