I want to create a Hadoop application that reads words from one file and searches for each of them in another file.
If a word exists in the second file, it must be written to one output file. If it does not exist, it must be written to a different output file.
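To make the requirement concrete, here is a minimal plain-Java sketch of the behaviour I am after, without Hadoop and with everything held in memory. The class name `WordLookupSketch`, the `Result` holder, and the sample words are hypothetical and only there for illustration; the real inputs would be the two files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WordLookupSketch {

    /** Holds the two outputs: words that were found and words that were not. */
    public static final class Result {
        public final List<String> found = new ArrayList<>();
        public final List<String> missing = new ArrayList<>();
    }

    /**
     * Checks every word of queryWords against searchWords.
     * Assumption: searchWords fits in memory as a HashSet, which is
     * exactly the part I am worried about for large files.
     */
    public static Result split(List<String> queryWords, List<String> searchWords) {
        Set<String> lookup = new HashSet<>(searchWords);
        Result result = new Result();
        for (String word : queryWords) {
            if (lookup.contains(word)) {
                result.found.add(word);    // would go to the first output file
            } else {
                result.missing.add(word);  // would go to the second output file
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the two input files.
        Result result = split(
                Arrays.asList("apple", "kiwi", "pear"),
                Arrays.asList("pear", "apple", "plum"));
        System.out.println("found:   " + result.found);
        System.out.println("missing: " + result.missing);
    }
}
```

This is the logic I would like to express as a Hadoop job, with the two lists becoming two separate output files.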
I have tried some of the Hadoop examples, and I have two questions.
The two files are approximately 200 MB each. Checking every word against the other file in memory may run out of memory. Is there an alternative way to do this?
How can I write data to different files? As far as I can tell, the output of the Hadoop reduce phase is written to only one file. Is it possible to add a filter to the reduce phase so that data is written to different output files?
Thanks.