When to use NULL in MySQL tables

Question

I appreciate the semantic meaning of a NULL value in a database table, as distinct from both false and the empty string ''. However, I have often read about performance problems when fields are nullable, and have been advised to use an empty string even in cases where NULL is actually semantically correct.

What circumstances are suitable for using nullable fields and NULL values? What are the tradeoffs? Is it wise to simply avoid NULL altogether and use empty strings, false or 0 to indicate the absence of a value?

UPDATE

OK. I understand the semantic difference between '' and NULL, as well as the (performance-agnostic) circumstances in which NULL is the appropriate field value. However, let me expand on the alleged performance problem. This is from the excellent "High Performance MySQL" by Schwartz, Zaitsev et al. (http://www.borders.co.uk/book/high-performance-mysql-optimization-backups-replication-and-more/857673/):

It is harder for MySQL to optimize queries that refer to nullable columns, because they make indexes, index statistics, and value comparisons more complicated. A nullable column uses more storage space and requires special processing inside MySQL. When a nullable column is indexed, it requires an extra byte per entry and can even cause a fixed-size index (for example, an index on a single integer column) to be converted to a variable-sized one in MyISAM.

Read more here: Google Book Preview

This may well be the definitive answer - I was just looking for second opinions and experience from the front line.

mysql

DavidWinterbottom Jan 23 '09 at 0:03

10 answers

Bill Karwin · Answer 1 · 2009-01-23 02:25

"However, I have often read about performance problems when fields are nullable, and have been advised to use an empty string even in cases where NULL is actually semantically correct."

I'm going to be picky for a moment about word choice:

Even if it were a significant performance factor, that would not make it semantically correct to use a value instead of NULL. In SQL, NULL has a semantic role: it denotes a missing or inapplicable value. The performance characteristics of NULL in a given RDBMS implementation are independent of this. Performance may vary from brand to brand or from version to version, but the purpose of NULL in the language is consistent.

In any case, I have not seen any evidence that NULL performs poorly. I would be interested in any references to performance measurements that show nullable columns performing worse than non-nullable columns.

I am not saying that I cannot be wrong, or that this cannot be true in some cases, simply that it is not meaningful to make idle suppositions. Science is not made up of conjecture; you need to show evidence with repeatable measurements.

Measurements will also tell you by how much the performance differs, so you can judge whether it is something to worry about. That is, the impact can be measurable and non-zero, but still negligible compared to bigger performance factors, such as properly indexing tables or sizing your database cache.

In MySQL, a NULL search can benefit from an index:

mysql> CREATE TABLE foo (
  i INT NOT NULL,
  j INT DEFAULT NULL,
  PRIMARY KEY (i),
  UNIQUE KEY j_index (j)
);

mysql> INSERT INTO foo (i, j) VALUES
  (1, 1), (2, 2), (3, NULL), (4, NULL), (5, 5);

mysql> EXPLAIN SELECT * FROM foo WHERE i = 3;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | foo   | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+

mysql> EXPLAIN SELECT * FROM foo WHERE j IS NULL;
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | foo   | ref  | j_index       | j_index | 5       | const |    2 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+

Note that this is still not a performance measurement. I have only shown that MySQL can use an index when searching for NULL. I am going to assert (admittedly without having measured, but hey, this is just StackOverflow) that the benefit of the index overshadows any possible penalty for searching for NULL versus an empty string.

It is not a correct design decision to choose zero, a blank, or any other value as a substitute for NULL. You may need to use those values as significant values in the column. That is why NULL exists: it is, by definition, outside the domain of values of any data type, so you can use the full range of integers or strings or whatever and still have something that means "none of the above values."

Ólafur Waage · Answer 2 · 2009-01-23 00:09

The MySQL manual does have a good article on NULL issues.

Hope this helps.

I also found this other SO question about NULL and performance.

Kezzer · Answer 3 · 2009-01-27 13:32

We do not allow NULL values in our databases except for numeric values and dates. The reason is that numeric values sometimes must not default to zero, as that would be very, very bad. I am a developer for a stockbroker, and there is a big difference between NULL and 0. COALESCE comes in handy when we do want values to fall back to zero on output, even though we do not store them as such.

 MyVal = COALESCE(TheData, 0)
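A minimal sketch of why that difference matters, using Python's built-in sqlite3 module as a stand-in for MySQL (COALESCE and the way aggregates skip NULL are standard SQL, so the same queries should behave the same way there). The trades table, its columns, and the function name are made up for illustration.

```python
import sqlite3


def null_vs_zero():
    """Compare an average over known prices with one where NULL is forced to 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, price REAL)")
    # Trade 2 has no known price yet, so it is stored as NULL, not 0.
    conn.executemany(
        "INSERT INTO trades (id, price) VALUES (?, ?)",
        [(1, 10.0), (2, None), (3, 20.0)],
    )
    # AVG skips NULL: only the two known prices are averaged.
    avg_known = conn.execute("SELECT AVG(price) FROM trades").fetchone()[0]
    # COALESCE turns NULL into 0 at query time, which drags the average down.
    avg_zeroed = conn.execute(
        "SELECT AVG(COALESCE(price, 0)) FROM trades"
    ).fetchone()[0]
    conn.close()
    return avg_known, avg_zeroed


if __name__ == "__main__":
    print(null_vs_zero())
```

Storing NULL and applying COALESCE only where a default is wanted keeps both answers available; storing 0 up front would make the first one unrecoverable.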

As we do bulk inserts of data from flat files, we use format files to define how each record is loaded, and these automatically convert empty values to empty strings anyway.

Dates default to whatever value the server produces, which I believe depends on the collation, but ours default to something like 1900, and again, dates are extremely important. Other plain-text values are not so important, and leaving them blank is usually fine.

Jim Anderson · Answer 4 · 2009-01-23 00:08

Typically, if an attribute is required, it is defined as NOT NULL, and if it can be omitted, it is defined as nullable.

ForYourOwnGood · Answer 5 · 2009-01-23 00:20

An empty string should not be used in place of NULL. NULL represents nothing, whereas an empty string is something with nothing inside it. A comparison between NULL and another value (even NULL) is never true, and NULL values are not counted by the COUNT function when it is applied to a column.
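These rules can be checked with a short sketch. It uses Python's built-in sqlite3 module rather than MySQL, on the assumption that only standard SQL behaviour is involved; the people table and the function name are hypothetical.

```python
import sqlite3


def null_semantics():
    """Return query results showing how NULL compares and how COUNT treats it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT NOT NULL, nickname TEXT)")
    # Bob has an empty nickname (a value); Cy's nickname is unknown (NULL).
    conn.executemany(
        "INSERT INTO people (name, nickname) VALUES (?, ?)",
        [("Ann", "Annie"), ("Bob", ""), ("Cy", None)],
    )

    def scalar(sql):
        return conn.execute(sql).fetchone()[0]

    results = {
        # NULL = NULL evaluates to NULL (None in Python), not to true.
        "null_eq_null": scalar("SELECT NULL = NULL"),
        # COUNT(*) counts rows; COUNT(column) skips rows where it is NULL.
        "count_rows": scalar("SELECT COUNT(*) FROM people"),
        "count_nickname": scalar("SELECT COUNT(nickname) FROM people"),
        # "= NULL" matches nothing; IS NULL is the test that works.
        "where_eq_null": scalar(
            "SELECT COUNT(*) FROM people WHERE nickname = NULL"
        ),
        "where_is_null": scalar(
            "SELECT COUNT(*) FROM people WHERE nickname IS NULL"
        ),
        # The empty string is an ordinary value and compares normally.
        "where_empty": scalar(
            "SELECT COUNT(*) FROM people WHERE nickname = ''"
        ),
    }
    conn.close()
    return results


if __name__ == "__main__":
    for key, value in null_semantics().items():
        print(key, value)
```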

If you need to represent unknown information, use NULL.

user1105491 · Answer 6 · 2013-06-30 20:57

As @ForYourOwnGood said, NULL should be used for "unknown" information. For example, suppose a client fills out many fields during registration and some of them are optional. For some reason you may want to reserve an ID for that particular client before the form is complete. Since you do not yet know whether the optional fields will be left empty by choice, you should set them to NULL, i.e. "unknown", when you first save the row. Once the client submits the form, it passes all your validation, and you save the information, you know that an optional field that is still empty was left empty on purpose.

This is a good example of using NULL.

SquareCog · Answer 7 · 2009-01-23 00:15

The main advantage, of course, is the semantic meaning of NULL, which you talked about.

In addition, and this may depend on your storage engine, so check the documentation as always, in at least some databases NULLs take up much less space than a regular value. For example, if you have a "varchar" column declared as 20 characters and it is rarely populated, you can save a lot of disk space by storing NULL instead of an empty string.

I have never heard of any performance issues with using NULL; if anything, the opposite. I have heard of people messing up their counts because they counted NULLs wrong, but never of performance problems. If this is real, I would love to hear about it!

pilif · Answer 8 · 2009-01-23 00:17

The meaning of NULL in a column is more or less "not applicable in this context." I usually use nullable columns in two cases:

If the field is not applicable. Let's say you have a boolean column is_thirsty and you add two records: a person and a stone. For the person, you set is_thirsty to either true or false, whereas for the stone you would probably set it to NULL.
If I need to flag something and store some data along with the flag. Take an inventory closing date, which you would use to a) indicate that the inventory can no longer be changed, and b) record when the inventory was closed. Instead of two columns (closed_at and is_closed), I simply create a closed_at column and set it to NULL while the inventory set can still be changed, then set the date once it is closed.

It basically comes down to this: I use NULL when the emptiness of a field has its own distinct meaning, beyond the field simply being empty. The absence of a middle initial is just that. The absence of a closing date means that the inventory set is still open for changes.
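The closed_at pattern can be sketched as follows, again with Python's built-in sqlite3 module standing in for MySQL. The inventory_sets table name and the sample timestamp are assumptions; only the closed_at column comes from the answer.

```python
import sqlite3


def inventory_status():
    """Derive open/closed state from a single nullable closed_at column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory_sets (id INTEGER PRIMARY KEY, closed_at TEXT)"
    )
    # Set 1 is still open (NULL); set 2 was closed at a known time.
    conn.executemany(
        "INSERT INTO inventory_sets (id, closed_at) VALUES (?, ?)",
        [(1, None), (2, "2009-01-23 00:17:00")],
    )
    # "Still open" is simply "no closing date recorded".
    open_ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM inventory_sets WHERE closed_at IS NULL ORDER BY id"
        )
    ]
    # The is_closed flag is computed on demand instead of being stored,
    # so it cannot drift out of sync with closed_at.
    flags = conn.execute(
        "SELECT id, closed_at IS NOT NULL AS is_closed "
        "FROM inventory_sets ORDER BY id"
    ).fetchall()
    conn.close()
    return open_ids, flags


if __name__ == "__main__":
    print(inventory_status())
```

The design choice here is that one column carries both the flag and the timestamp, which avoids contradictory rows such as is_closed being true with no closing date.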

NULL values can have nasty side effects, and they make it harder to add data to the table. More often than not, you end up with a mishmash of NULL values and empty strings, for example.

In addition, NULL is not equal to anything, which will break queries all over the place if you are not very careful.
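Two common ways this bites, shown as a hedged sketch with Python's built-in sqlite3 module standing in for MySQL (the accounts table and its values are invented for illustration): an inequality filter silently drops NULL rows, and a NOT IN list that contains NULL matches nothing at all.

```python
import sqlite3


def null_traps():
    """Count rows returned by filters that interact badly with NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO accounts (id, status) VALUES (?, ?)",
        [(1, "active"), (2, "banned"), (3, None)],
    )

    def count(where):
        sql = "SELECT COUNT(*) FROM accounts WHERE " + where
        return conn.execute(sql).fetchone()[0]

    results = {
        # Row 3 is not returned: NULL <> 'banned' is NULL, not true.
        "not_banned": count("status <> 'banned'"),
        # Handling NULL explicitly brings row 3 back.
        "not_banned_or_unknown": count(
            "status <> 'banned' OR status IS NULL"
        ),
        # With NULL in the list, NOT IN can never evaluate to true.
        "not_in_with_null": count("status NOT IN ('banned', NULL)"),
    }
    conn.close()
    return results


if __name__ == "__main__":
    print(null_traps())
```

MySQL also offers the NULL-safe equality operator `<=>`, which treats two NULLs as equal; it is not used above because SQLite does not support it.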

Personally, I use nullable columns only when one of the two cases above applies. I never use NULL to mark empty fields when the emptiness means nothing beyond the absence of a value.

dkretz · Answer 9 · 2009-01-23 00:18

Any self-respecting database engine these days should impose no penalty for using NULL properly, unless your query is badly designed (which is not a problem you will often have with regard to NULL).

You should first pay attention to using the database (including NULL) as intended; then worry about optimization effects if and when they occur.

The cumulative cost of improperly NULLed columns, in both SQL complexity and accuracy, almost certainly outweighs any benefit of trying to outsmart the DBMS. Besides, it will mess with your head, and with that of anyone else who later tries to figure out what you were trying to do.

FerranB · Answer 10 · 2009-01-23 00:22

In some databases, such as Oracle, the following holds, and some of it may be true for MySQL as well:

NULLs are not indexed, so searching for NULL values can be a bottleneck.
Trailing NULLs in a row save space.
