Nullable or non-nullable varchar data types - which is faster for queries?

Question

We usually prefer that all our varchar / nvarchar columns be non-nullable, with an empty string ('') as the default value. Someone on the team suggested that nullable is better because:

A query like this:

 Select * From MyTable Where MyColumn IS NOT NULL

is faster than this:

 Select * From MyTable Where MyColumn = ''

Does anyone have experience that confirms whether this is true?

+9

sql

Randy Minder Jun 19 '10 at 15:08

5 answers

If you want to know that there is no value, use NULL.

Regarding speed, IS NULL should be faster because it does not use string comparison.

+5

Mewp Jun 19 '10 at 15:10

If you need NULL, use NULL. The same goes for the empty string.

Regarding performance, “it depends”.

If you have a varchar, you store an actual value in the row for the length. If you have a char, you store the full fixed length. NULL is not stored in the row, depending on the engine (SQL Server, for example, uses a NULL bitmap).

This means that IS NULL is faster, query for query, but it can add COALESCE / NULLIF / ISNULL complexity.
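To make the COALESCE / NULLIF point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in engine, so it says nothing about SQL Server timings; it only shows the extra wrapping a nullable column can require. MyTable and MyColumn come from the question, while the Id column and the sample rows are assumptions made for illustration. ISNULL is SQL Server specific, so the portable COALESCE is used here.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine (assumption: the thread is about SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Id INTEGER PRIMARY KEY, MyColumn TEXT NULL)")
conn.executemany(
    "INSERT INTO MyTable (Id, MyColumn) VALUES (?, ?)",
    [(1, "abc"), (2, ""), (3, None)],  # hypothetical sample rows
)

def ids(sql):
    """Run a query and return the Id column as a list."""
    return [row[0] for row in conn.execute(sql)]

# IS NULL matches only the row that has no value at all.
null_ids = ids("SELECT Id FROM MyTable WHERE MyColumn IS NULL ORDER BY Id")

# Equality with '' matches only the empty string; NULL = '' evaluates to UNKNOWN.
empty_ids = ids("SELECT Id FROM MyTable WHERE MyColumn = '' ORDER BY Id")

# To treat both as "no value", the predicate needs COALESCE (NULL becomes '').
either_ids = ids("SELECT Id FROM MyTable WHERE COALESCE(MyColumn, '') = '' ORDER BY Id")

# NULLIF goes the other way: '' becomes NULL.
normalized = conn.execute(
    "SELECT Id, NULLIF(MyColumn, '') FROM MyTable ORDER BY Id"
).fetchall()
```

Once a column can hold both NULL and '', every query has to decide which of the two it means, and that is where the added complexity comes from.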

So, your colleague is partially right, but may not fully appreciate it.

Blindly using an empty string means using a sentinel value rather than working through the semantic problem of NULL.

FWIW, and personally:

I would probably use NULL, but not always. I like to avoid dates like December 31, 9999, which is where NULL avoidance leads you.
From Cade Roux's answer ... I also think that discussions about whether a date of death should be nullable are pointless. For a field, in practical terms, there is either a value or there is not.
Sentinel values are worse than NULLs. Magic numbers, anyone?

+4

gbn Jun 19 '10 at 19:15

Tell this guy on your team to get his head out of his ass! (But nicely.)

Developers like this can be poison for a team: full of low-level optimization myths, all of which may be true, or were true at one point in time for a particular vendor or query pattern, or are perhaps true only in theory and never in practice. Acting on these myths is a waste of time and can ruin an otherwise good design.

He probably means well and wants to contribute his knowledge to the team. Unfortunately, he is wrong. Not wrong in the sense of whether a benchmark would prove his statement correct or incorrect. He is wrong in the sense that this is not how you design a database. Whether to make a field nullable is a question about the domain of the data, for the purpose of defining the type of the field. It should be answered in terms of what it means for the field to have no value.

+2

John Jun. 19 '10 at 16:04

In short, NULL = UNKNOWN! Using the date-of-death example, that means the entity may be 1) alive, 2) dead but with an unknown date of death, or 3) of unknown status (dead or alive). For numeric columns I always default to 0 (zero), because somewhere along the line you may have to perform aggregate calculations, and NULL + 123 = NULL. For alphanumerics I use NULL, because it is the least expensive in terms of performance and it is easier to say "... where a IS NULL" than "... where a = ''". Using "... where a = ' '" (a single space) is not a good idea, because a space is not NULL! For dates, if you need to leave the date column NULL, you can add a status indicator column; in the example above: A = Alive; D = Dead; Q = Dead, date of death unknown; N = Alive or dead unknown.
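The NULL + 123 = NULL remark can be checked with a small sketch, again using Python's built-in sqlite3 module as a stand-in engine. The Payments table, its Amount column, and the sample values are hypothetical. One nuance worth seeing: row-level arithmetic propagates NULL, while the standard aggregates skip NULL rows, so defaulting to 0 changes what AVG and COUNT report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Row-level arithmetic: any NULL operand makes the result NULL (None in Python).
null_plus = conn.execute("SELECT NULL + 123").fetchone()[0]

# Hypothetical table with one unknown amount.
conn.execute("CREATE TABLE Payments (Amount INTEGER NULL)")
conn.executemany(
    "INSERT INTO Payments (Amount) VALUES (?)",
    [(100,), (None,), (23,)],
)

# Aggregates skip NULL rows: SUM and AVG see only 100 and 23,
# COUNT(Amount) counts non-NULL values, COUNT(*) counts all rows.
total, average, known, rows = conn.execute(
    "SELECT SUM(Amount), AVG(Amount), COUNT(Amount), COUNT(*) FROM Payments"
).fetchone()

# Treating the unknown amount as 0 pulls the average down.
average_with_zero = conn.execute(
    "SELECT AVG(COALESCE(Amount, 0)) FROM Payments"
).fetchone()[0]
```

Whether the skipped row or the zero row is the right answer depends on what a missing amount means in the data, which is the semantic question the other answers keep coming back to.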

+1

Frank R. Jun 19 '10 at 17:27

Cade Roux · Accepted Answer · 2010-06-19 15:51

On some platforms (and even versions), this depends on how NULLs are indexed.

My basic rule of thumb for NULL:

Do not allow NULLs until justified.
Do not allow NULLs unless the data can really be unknown.

A good example of this is modeling address lines. If you have AddressLine1 and AddressLine2, what does it mean for the first to have data and the second to be NULL? It seems to me that you either know the address or you don't, and partial NULLs in a set of data just cause trouble when somebody concatenates them and gets NULL (ANSI behavior). You can solve this by allowing NULLs and adding a check constraint: either all of the address information is NULL or none of it is.
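A minimal sketch of that all-or-nothing check constraint, using Python's built-in sqlite3 module as a stand-in engine. The Customer table, its Id column, and the sample addresses are hypothetical; AddressLine1 and AddressLine2 come from the answer. SQLite's || operator shows the ANSI concatenation behavior; on SQL Server the + operator behaves the same way under the default CONCAT_NULL_YIELDS_NULL setting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: both address lines are NULL together or present together.
conn.execute("""
    CREATE TABLE Customer (
        Id INTEGER PRIMARY KEY,
        AddressLine1 TEXT NULL,
        AddressLine2 TEXT NULL,
        CHECK (
            (AddressLine1 IS NULL AND AddressLine2 IS NULL)
            OR (AddressLine1 IS NOT NULL AND AddressLine2 IS NOT NULL)
        )
    )
""")

# Known address with an empty second line: accepted.
conn.execute("INSERT INTO Customer VALUES (1, '1 Main St', '')")
# Completely unknown address: accepted.
conn.execute("INSERT INTO Customer VALUES (2, NULL, NULL)")

# Partially NULL address: rejected by the check constraint.
try:
    conn.execute("INSERT INTO Customer VALUES (3, '2 Side St', NULL)")
    partial_rejected = False
except sqlite3.IntegrityError:
    partial_rejected = True

# The problem the constraint prevents: one NULL operand turns the
# whole concatenated address into NULL.
concat_with_null = conn.execute("SELECT '2 Side St' || ', ' || NULL").fetchone()[0]

stored_rows = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
```

With the constraint in place, a query that concatenates the two lines gets either a complete address or NULL for the whole address, never a silently truncated one.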

It is a similar thing with a middle initial or middle name. Some people don't have one. Is that different from it being unknown, and do you care?

Also, date of death - what does NULL mean? Not dead? Unknown date of death? Many times, one column is not enough to encode domain knowledge.

So, for me, whether to allow NULLs depends very strongly on the semantics of the data. Performance comes second, because data that is misinterpreted (potentially by many different people) is usually a far more expensive problem than performance.

This may seem like a small thing (in SQL Server, the implementation is a bitmask stored with the row), but allowing NULLs only after justification is what works best for me. It catches things early in development, forces you to examine your assumptions, and helps you understand your problem domain.
