I am currently doing data exploration using Hive and cannot explain the following behavior. Let's say I have a table (named mytable) with a master_id field.
When I count the number of rows, I get
select count(*) as c from mytable
c
1129563
If I count only the rows with a non-null master_id, I get a larger number:
select count(*) as c from mytable where master_id is not null
c
1134041
In addition, master_id is never null.
select count(*) as c from mytable where master_id is null
c
0
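To rule out the table changing between separate queries, the three counts above could also be computed in one statement. This is only a minimal sketch, assuming the same mytable and master_id; the column aliases (total_rows, non_null_ids, null_ids) are made up for illustration.

```sql
-- All three counts from one query, so they are computed over the same data.
-- count(master_id) counts only rows where master_id is not null.
select
  count(*)                                            as total_rows,
  count(master_id)                                    as non_null_ids,
  sum(case when master_id is null then 1 else 0 end)  as null_ids
from mytable;
```

Within a single result row, total_rows should equal non_null_ids + null_ids, so a mismatch like the one above should not be reproducible this way.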
I cannot explain how adding a where clause can increase the number of rows returned. Does anyone have a hint to explain this behavior?
thanks