I am trying to calculate how many values in a dataset match a filter condition, but I ran into problems when the filter does not match any records.
There are many columns in my data structure, but only three are used for this example: key is the data key for the set (not unique), value is the recorded floating-point value, and nominal_value is a float holding the nominal value.
Our use case now is to find the number of values that are 10% or more below the nominal value.
I am doing something like this:
filtered_data = FILTER data BY value <= (0.9 * nominal_value);
filtered_count = FOREACH (GROUP filtered_data BY key) GENERATE COUNT(filtered_data.value);
DUMP filtered_count;
In most cases, there are no values outside the nominal range, so filtered_data is empty (or null; I do not know how to find out which). As a result, filtered_count is also empty / null, which is undesirable.
How can I build a statement that returns 0 when filtered_data is empty / null? I tried several options that I found on the Internet:
-- Extra parens in COUNT required to avoid syntax error
filtered_count = FOREACH (GROUP filtered_data BY key) GENERATE COUNT((filtered_data.value is null ? {} : filtered_data.value));
that leads to:
Two inputs of BinCond must have compatible schemas. left hand side:
and
filtered_count = FOREACH (GROUP filtered_data BY key) GENERATE (filtered_data.value is null ? 0 : COUNT(filtered_data.value));
leading to empty / null results.
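For reference, here is a minimal sketch of one approach that may avoid the problem, assuming a per-key count is what is wanted and that data is already loaded with the fields key, value and nominal_value. Instead of filtering first, it groups the unfiltered relation so that every key keeps its group, then filters inside each group. The aliases grouped, below, counts and below_count are hypothetical names introduced for illustration.

```pig
-- Assumption: `data` is already loaded with fields key, value, nominal_value.
-- Group the unfiltered relation, so a group exists for every key.
grouped = GROUP data BY key;

-- Filter inside each group; the nested bag `below` may be empty.
counts = FOREACH grouped {
    below = FILTER data BY value <= (0.9 * nominal_value);
    GENERATE group AS key, COUNT(below) AS below_count;
};

DUMP counts;
```

Because the filter runs after grouping, a key with no matching records still produces a row, and COUNT over the empty nested bag should give 0 for it. If data itself contains no records, there are no groups and the result is still empty.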