Why are NULL values displayed as 0 in fact tables?

Question

Why are NULL values in the measures of fact tables (in dimensionally modeled data warehouses) usually displayed as 0?

+4

null sql-server ssis data-warehouse dimensional-modeling

jrara Nov 28 '11 at 19:09

4 answers

Although you already accepted a different answer, I would say that using NULL is actually the best choice for several reasons.

The first reason is that aggregates return a “correct” answer (that is, the one users expect) when you use NULL, but give a “wrong” answer when you use zero. Consider the results of AVG() in these two queries:

    -- with zero; AVG gives 1.5
    select SUM(measure), AVG(measure)
    from (
        select 1.0 as measure
        union all select 2.0
        union all select 3.0
        union all select 0
    ) dt

    -- with null; AVG gives 2
    select SUM(measure), AVG(measure)
    from (
        select 1.0 as measure
        union all select 2.0
        union all select 3.0
        union all select null
    ) dt

If we assume that the measure here is the "number of days to manufacture an item", and NULL represents an item that is still being produced, then zero gives the wrong answer. The same considerations apply to MIN() and MAX().

The second problem is that if zero is used as the default value, how do you distinguish between zero as a default and zero as a real value? For example, consider a measure of “delivery cost in euros,” where NULL means that the customer picked up the order in person, so no delivery charge applied, and zero means that the order was shipped to the customer free of charge. You cannot replace NULL with zero without completely changing the meaning of the data. Obviously, you can argue that the difference should be clear from other dimensions (for example, the delivery method), but that adds complexity to reporting and to understanding the data.
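To make the delivery-cost example concrete, here is a minimal sketch in T-SQL. The table `fact_order`, its columns, and the four sample rows are hypothetical and exist only for illustration; the point is how NULL and a real zero behave differently under COUNT and AVG.

```sql
-- Hypothetical fact rows: delivery_cost is NULL for customer pickup,
-- 0 for free shipping, and positive for paid shipping.
WITH fact_order AS (
    SELECT 1 AS order_id, CAST(NULL AS decimal(9,2)) AS delivery_cost  -- picked up
    UNION ALL SELECT 2, 0.00   -- shipped free of charge
    UNION ALL SELECT 3, 4.50   -- shipped, paid
    UNION ALL SELECT 4, 7.50   -- shipped, paid
)
SELECT
    COUNT(*)                                           AS orders_total,        -- 4
    COUNT(delivery_cost)                               AS orders_shipped,      -- 3 (NULL is skipped)
    SUM(CASE WHEN delivery_cost = 0 THEN 1 ELSE 0 END) AS shipped_free,        -- 1
    AVG(delivery_cost)                                 AS avg_cost_shipped,    -- 4.0 = 12 / 3
    AVG(COALESCE(delivery_cost, 0))                    AS avg_cost_if_zeroed   -- 3.0 = 12 / 4
FROM fact_order;
```

Once the NULL is stored as zero, `orders_shipped` and `shipped_free` can no longer be derived from the measure alone, and the average silently drops from 4.0 to 3.0.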

+15

Pondlife Nov 29 '11 at 12:43

The main reason is that the database handles NULLs differently from blanks or zeros, although they look like blanks or zeros to human eyes.
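A small sketch of that difference, assuming SQL Server's default `ANSI_NULLS ON` setting. The CTE `t` and its column names are made up for the demonstration.

```sql
-- One row holding a NULL integer and a NULL string.
WITH t AS (
    SELECT CAST(NULL AS int) AS n, CAST(NULL AS varchar(10)) AS s
)
SELECT
    CASE WHEN n = 0     THEN 'yes' ELSE 'no' END AS n_equals_zero,   -- no  (comparison is UNKNOWN)
    CASE WHEN s = ''    THEN 'yes' ELSE 'no' END AS s_equals_blank,  -- no  (comparison is UNKNOWN)
    CASE WHEN n IS NULL THEN 'yes' ELSE 'no' END AS n_is_null,       -- yes
    n + 1                                        AS n_plus_one,      -- NULL (NULL propagates)
    COALESCE(n, 0)                               AS n_for_display    -- 0   (substituted for display only)
FROM t;
```

The last column is roughly what a report does when it shows 0: the substitution happens at presentation time, while the stored value is still NULL.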

Here is a link to an old Ralph Kimball design tip on the same topic.

This blog post talks about eliminating NULLs in dimensions and gives some suggestions.

+1

Molap Nov 29 '11 at 12:44

NULL should be used instead of 0 if you intend to average a fact column. This is the only case in which I believe NULLs are fine.

If the fact value is unknown or arrives late, leave it as NULL.

Aggregate functions such as MIN and MAX handle NULLs by simply ignoring them.

(For the record, one of Ralph Kimball's associates said this in his course, which I attended.)

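    -- unknown value stored as NULL: sumx = 5, minx = 1, maxx = 4, avgx = 2.5
    with goodf as (
        select 1 x union all select null union all select 4
    )
    select sum(x) sumx, min(x) minx, max(x) maxx, avg(cast(x as float)) avgx
    from goodf;

    -- unknown value stored as 0: sumx = 5, minx = 0, maxx = 4, avgx is about 1.67
    with badf as (
        select 1 x union all select 0 /* unknown */ union all select 4
    )
    select sum(x) sumx, min(x) minx, max(x) maxx, avg(cast(x as float)) avgx
    from badf;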

In badf above, the average (and the minimum) comes out wrong because the zero standing in for an unknown value is treated as a literal 0.

0

Ab bennett Oct 26 '16 at 21:15

Yuck · Accepted Answer · 2011-11-28T19:12:04+0000

It depends on what you are modeling, but in general it is done to avoid complications when computing aggregates. And in many scenarios it makes sense to treat NULL as 0 for these purposes.

For example, a customer with NULL orders for a certain period of time. Or a merchant with NULL sales revenue (shame on him!).
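One way to read this, sketched in T-SQL: keep the fact rows as they are and convert NULL to 0 only in the reporting query. The tables `customer` and `fact_sales`, their columns, and the sample rows are hypothetical, and this is an illustration of the idea rather than the answerer's own code.

```sql
-- Bob has no sales rows for the period, so the outer join yields NULL for him.
WITH customer AS (
    SELECT 1 AS customer_id, 'Alice' AS customer_name
    UNION ALL SELECT 2, 'Bob'
),
fact_sales AS (
    SELECT 1 AS customer_id, 100.00 AS amount
    UNION ALL SELECT 1, 50.00
)
SELECT
    c.customer_name,
    SUM(f.amount)              AS revenue_raw,      -- Alice: 150.00, Bob: NULL
    COALESCE(SUM(f.amount), 0) AS revenue_display   -- Alice: 150.00, Bob: 0
FROM customer c
LEFT JOIN fact_sales f ON f.customer_id = c.customer_id
GROUP BY c.customer_name;
```

Here "no sales" really does mean zero revenue, so showing 0 is harmless. That is not the case for measures where NULL means "unknown" or "not applicable".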
