Why are NULL values mapped to 0 in fact tables?

Why is it that in the measure fields of fact tables (in dimensionally modeled data warehouses), NULL values are usually mapped to 0?

+4
4 answers

It depends on what you are modeling, but in general it is done to avoid complications when computing aggregates. In many scenarios it makes sense to treat NULL as 0 for that purpose.

For example, a customer with NULL orders in a given time period, or a salesperson with NULL sales revenue (shame on him!).
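To make the "complications" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table fact_sales and its columns are made up for illustration, and other database engines may format the results differently. It shows that a SUM over nothing but NULLs is itself NULL and that NULL then propagates through arithmetic, while COALESCE can supply a 0 at query time without storing one.

```python
import sqlite3

# Hypothetical fact table: bob has no known revenue for the period.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table fact_sales (salesperson text, revenue real);
    insert into fact_sales values ('ann', 100.0), ('ann', 50.0), ('bob', null);
""")

# SUM over only-NULL rows is NULL, and NULL propagates through arithmetic.
raw = conn.execute("""
    select salesperson, sum(revenue), sum(revenue) * 2
    from fact_sales
    group by salesperson
    order by salesperson
""").fetchall()

# COALESCE substitutes 0 in the report while the stored value stays NULL.
filled = conn.execute("""
    select salesperson, coalesce(sum(revenue), 0)
    from fact_sales
    group by salesperson
    order by salesperson
""").fetchall()

print(raw)     # bob's totals come back as None
print(filled)  # bob's total comes back as 0
```

Whether the 0 belongs in the stored fact or only in the report is the design question the other answers argue about.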

+4

Although you have already accepted another answer, I would say that using NULL is actually the better choice, for two reasons.

The first reason is that aggregates return the "correct" answer (that is, the one users expect) when the measure is NULL, but a "wrong" answer when it is zero. Compare the results of AVG() in these two queries:

    -- with zero; AVG gives 1.5
    select SUM(measure), AVG(measure)
    from (
        select 1.0 as measure
        union all select 2.0
        union all select 3.0
        union all select 0
    ) dt;

    -- with null; AVG gives 2
    select SUM(measure), AVG(measure)
    from (
        select 1.0 as measure
        union all select 2.0
        union all select 3.0
        union all select null
    ) dt;

If we assume that the measure here is "number of days taken to manufacture an item", and NULL represents an item that is still in production, then zero gives the wrong answer. The same reasoning applies to MIN() and MAX().

The second problem is that if zero is the default value, how do you distinguish a zero that is merely a default from a zero that is a real value? For example, consider a measure "delivery cost in euros", where NULL means the customer picked up the order himself, so no delivery charge applied, and zero means the order was shipped to the customer free of charge. You cannot replace NULL with zero without completely changing the meaning of the data. You could of course argue that the difference should be clear from other dimensions (the delivery method, for example), but that only adds complexity to reporting and to understanding the data.
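As a rough illustration of the delivery-cost example, the sketch below (Python's built-in sqlite3; the table fact_orders, its columns and its four rows are invented for the example) keeps NULL for picked-up orders and 0 for free shipping, then counts each case separately.

```python
import sqlite3

# Hypothetical orders: NULL = picked up by the customer, 0 = shipped for free.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table fact_orders (order_id integer, shipping_eur real);
    insert into fact_orders values
        (1, 4.5),   -- shipped, paid delivery
        (2, 0.0),   -- shipped free of charge
        (3, null),  -- picked up, no delivery at all
        (4, 5.5);   -- shipped, paid delivery
""")

shipped, free_shipped, picked_up, avg_cost = conn.execute("""
    select count(shipping_eur),                                    -- non-NULL values only
           sum(case when shipping_eur = 0 then 1 else 0 end),      -- real zeros
           sum(case when shipping_eur is null then 1 else 0 end),  -- no delivery
           avg(shipping_eur)                                       -- NULL is ignored
    from fact_orders
""").fetchone()

print(shipped, free_shipped, picked_up, avg_cost)
```

With these rows the average delivery cost is taken over the three shipped orders only. Had the picked-up order been stored as 0, it would be counted as a fourth delivery and as a second free shipment.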

+15

The main reason is that the database treats NULLs differently from blanks or zeros, even though they look like blanks or zeros to the human eye.
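A small sketch of that difference, using Python's built-in sqlite3 (the table t and column x are hypothetical, and the behavior shown is SQLite's, although the three-valued comparison logic is standard SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t (x integer);
    insert into t values (0), (null), (7);
""")

# NULL = 0 is neither true nor false; SQLite returns NULL (None in Python).
null_equals_zero = conn.execute("select null = 0").fetchone()[0]

# A filter on x = 0 does not pick up the NULL row...
zero_rows = conn.execute("select count(*) from t where x = 0").fetchone()[0]
# ...and neither does its negation, so the NULL row matches neither filter.
nonzero_rows = conn.execute("select count(*) from t where x <> 0").fetchone()[0]

# count(*) counts rows, count(x) counts only non-NULL values.
all_rows, non_null = conn.execute("select count(*), count(x) from t").fetchone()

print(null_equals_zero, zero_rows, nonzero_rows, all_rows, non_null)
```

The row holding NULL is returned by neither filter, which is exactly the kind of surprise that pushes people to store 0 instead.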

Here is a link to an old Ralph Kimball design tip on the same topic.

This blog post discusses avoiding NULLs in dimensions and offers some suggestions.

+1

NULL should be used instead of 0 if you intend to compute averages over a fact column. This is the only case in which I believe NULLs are fine.

If the fact value is unknown or arrives late, leave it as NULL.

Aggregate functions such as MIN and MAX cope with NULLs by simply ignoring them.

(For the record, one of Ralph Kimball's assistants said this in a course of his that I attended.)

    with goodf as (
        select 1 x
        union all select null
        union all select 4
    )
    select sum(x) sumx, min(x) minx, max(x) maxx, avg(cast(x as float)) avgx
    from goodf;

    with badf as (
        select 1 x
        union all select 0 /* unknown */
        union all select 4
    )
    select sum(x) sumx, min(x) minx, max(x) maxx, avg(cast(x as float)) avgx
    from badf;

In badf above, the average comes out wrong (about 1.67 instead of 2.5) because the zero standing in for an unknown value is treated as a literal 0. The minimum is affected as well: 0 instead of 1.

0

Source: https://habr.com/ru/post/1383491/

