I'm trying to learn Hive. Surprisingly, I can't find an example of how to write a simple word count job. Is the following correct?
Say I have an input file, input.tsv:
hello, world this is an example input file
I wrote a splitter in Python to turn each line into words:
import sys

# Emit one word per output line, splitting each input line on whitespace.
for line in sys.stdin:
    for word in line.split():
        print(word)
And then I have the following in my Hive script:
CREATE TABLE input (line STRING);
LOAD DATA LOCAL INPATH 'input.tsv' OVERWRITE INTO TABLE input;
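To make the goal concrete, here is a rough local sketch in plain Python of the result I'm after from Hive. It assumes the same whitespace splitting as my splitter (so punctuation stays attached to words), and `count_words` is just a name I made up for illustration, not anything from Hive:

```python
from collections import Counter


def count_words(lines):
    """Count whitespace-separated words across an iterable of lines."""
    counts = Counter()
    for line in lines:
        # Same splitting rule as the splitter script: split on any whitespace.
        counts.update(line.split())
    return counts
```

Calling `count_words(sys.stdin)` and printing each word with its count would give the word/count pairs I'd like the Hive job to produce.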
I'm not sure whether I've missed something or whether it really is this difficult. In particular, do I need a temporary table of words, and do I need to write an external splitter function?