Creating multiple output files using Hadoop 0.20+

I am trying to write the results of my reducer to several files. One file contains all of the data results, and the remaining results are split by category into their respective files. I know that with 0.18 you can do this with MultipleOutputs, and that class has not been removed. However, I am trying to make my application compatible with 0.20+. The existing multiple-output classes still require a JobConf, whereas my application uses Job and Configuration. How can I generate multiple outputs based on the key?

2 answers

Multiple-output support is not available in the new API in 0.20. You will need to use the old API.

It was added in 0.21, which has not yet been released, as org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.

This thread on the mailing list talks about this issue.


You can do this in Hadoop 0.20; as already mentioned, you just have to use the old API.

There is very crude code to do this at http://github.com/orngejaket/Info_Moist_1_Splicer/tree/master/src/contrib/streaming/src/java/org/infochimps/hadoop/mapred/lib/

The resulting jar writes each record to a file named after its (sanitized) key.
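As a rough illustration of what "named after its sanitized key" could look like, here is a minimal Java sketch. It is not taken from the linked code: the class name KeyFileNames, the sanitize method, the set of allowed characters, the "empty-key" placeholder, and the 100-character cap are all assumptions made for this example. It uses only the JDK, so it runs without Hadoop on the classpath.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Hadoop or the linked repository):
// turns an arbitrary record key into a string that is safe to use as a file name.
final class KeyFileNames {
    // Assumption: keep only ASCII letters, digits, underscore and hyphen.
    private static final Pattern UNSAFE = Pattern.compile("[^A-Za-z0-9_-]");
    // Assumption: cap the name length to stay well below file system limits.
    private static final int MAX_LENGTH = 100;

    private KeyFileNames() {}

    static String sanitize(String key) {
        if (key == null || key.isEmpty()) {
            // Placeholder name for records without a usable key.
            return "empty-key";
        }
        // Replace path separators, dots, spaces, etc. with underscores.
        String cleaned = UNSAFE.matcher(key).replaceAll("_");
        // Truncate overly long keys.
        return cleaned.length() > MAX_LENGTH ? cleaned.substring(0, MAX_LENGTH) : cleaned;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("logs/2010-01:errors"));
        System.out.println(sanitize(""));
    }
}
```

With the old API, a helper like this could be called from an override of generateFileNameForKeyValue in a subclass of org.apache.hadoop.mapred.lib.MultipleTextOutputFormat, so that each key ends up in its own output file. Note that two different keys can map to the same sanitized name, in which case their records would share a file.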


Source: https://habr.com/ru/post/1299989/

