Creating multiple output files using Hadoop 0.20+

Question


I am trying to write the results of my reducer to several files. The full results go into one file, and the rest of the results are divided by category into their respective files. I know that in 0.18 you can do this with MultipleOutputs, and it has not been removed. However, I am trying to make my application compatible with 0.20+. The existing multiple-output classes still require JobConf (my application uses Job and Configuration). How can I generate multiple outputs based on the key?

+4

java file-io hadoop

monksy Feb 01 '10 at 21:08


2 answers

You can do this in Hadoop 0.20; as already mentioned, you just have to use the old API.

There is very crude code to do this at http://github.com/orngejaket/Info_Moist_1_Splicer/tree/master/src/contrib/streaming/src/java/org/infochimps/hadoop/mapred/lib/

The resulting jar writes each record to a file named after its (sanitized) key.
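To illustrate what "sanitized" could mean here, the following is a minimal sketch in plain Java. It is not taken from the linked repository; the class name `KeySanitizer`, the method `sanitize`, the `EMPTY_KEY` placeholder, and the `key` prefix are all hypothetical choices made for this example.

```java
// Hypothetical helper, NOT the code from the linked repository.
// It turns an arbitrary record key into a string usable as one file name.
final class KeySanitizer {
    private KeySanitizer() {}

    static String sanitize(String key) {
        // Assumption: null or empty keys are all collected under one placeholder name.
        if (key == null || key.isEmpty()) {
            return "EMPTY_KEY";
        }
        // Keep letters, digits, dot, dash and underscore; replace everything else
        // (path separators, spaces, colons, ...) with an underscore.
        String cleaned = key.replaceAll("[^A-Za-z0-9._-]", "_");
        // Names starting with '.' or '_' are commonly treated as hidden files
        // by Hadoop input formats, so prefix them (assumed convention for this sketch).
        char first = cleaned.charAt(0);
        if (first == '.' || first == '_') {
            cleaned = "key" + cleaned;
        }
        return cleaned;
    }
}
```

With the old API, a subclass of `org.apache.hadoop.mapred.lib.MultipleTextOutputFormat` could return something like `KeySanitizer.sanitize(key.toString())` from its `generateFileNameForKeyValue` override; whether the linked code works that way is an assumption.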

+2

mrflip Feb 03 '10 at 1:06


Binary nerd · Accepted Answer · 2010-02-01T23:41:55+0000

Support for multiple outputs is not in the new API in 0.20. You will need to use the old API.

It was added in 0.21, which has not been released yet, as org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.

This thread on the mailing list talks about this issue.
