Two things. First, count should be COUNT. Pig function names are case sensitive, and built-in functions such as COUNT are defined in all caps.
Second, COUNT counts the number of items in a bag; it does not look at their values. Therefore, you should group by the true/false field first and then COUNT each group:
    boolean = FOREACH records GENERATE $3 AS trueORfalse;
    groups = GROUP boolean BY trueORfalse;
    counts = FOREACH groups GENERATE group AS trueORfalse, COUNT(boolean);
So, now the DUMP output for counts will look something like this:
    (true,2)
    (false,1)
If you need the true and false counts in their own relations, you can FILTER the output of counts. However, it would probably be better to SPLIT boolean and then compute two separate counts:
    boolean = FOREACH records GENERATE $3 AS trueORfalse;
    SPLIT boolean INTO alltrue IF trueORfalse == 'true', allfalse IF trueORfalse == 'false';
    tcount = FOREACH (GROUP alltrue ALL) GENERATE COUNT(alltrue);
    fcount = FOREACH (GROUP allfalse ALL) GENERATE COUNT(allfalse);
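For comparison, the FILTER alternative mentioned above might look roughly like the sketch below. It assumes the groups relation from the first snippet and, like the SPLIT version, that the field compares as the strings 'true' and 'false'. The field name n and the relation names tonly and fonly are hypothetical, added here only so the results can be referred to by name.

```pig
-- Rebuild counts with a named count field (n is a hypothetical name).
counts = FOREACH groups GENERATE group AS trueORfalse, COUNT(boolean) AS n;

-- Keep each key's tuple in its own relation (tonly and fonly are hypothetical names).
tonly = FILTER counts BY trueORfalse == 'true';
fonly = FILTER counts BY trueORfalse == 'false';

DUMP tonly;
DUMP fonly;
```

Unlike tcount and fcount from the SPLIT version, each filtered relation still carries the key next to the count, and it will be empty if that value never occurs in the input.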