Writing to multiple folders in hadoop?

I am trying to separate my reducer output into different folders.

My driver has the following code:

    FileOutputFormat.setOutputPath(job, new Path(output));
    // MultipleOutputs.addNamedOutput(job, namedOutput, outputFormatClass, keyClass, valueClass)
    MultipleOutputs.addNamedOutput(job, "foo", TextOutputFormat.class, NullWritable.class, Text.class);
    MultipleOutputs.addNamedOutput(job, "bar", TextOutputFormat.class, Text.class, NullWritable.class);
    MultipleOutputs.addNamedOutput(job, "foobar", TextOutputFormat.class, Text.class, NullWritable.class);

And then my reducer has the following code:

    mos.write("foo", NullWritable.get(), new Text(jsn.toString()));
    mos.write("bar", key, NullWritable.get());
    mos.write("foobar", key, NullWritable.get());

But in the output, I see:

    output/foo-r-0001
    output/foo-r-0002
    output/foobar-r-0001
    output/bar-r-0001

But what I am trying to get is:

    output/foo/part-r-0001
    output/foo/part-r-0002
    output/bar/part-r-0001
    output/foobar/part-r-0001

How can I do it? Thanks

1 answer

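If you mean Hadoop's MultipleOutputs, the easiest way is to do one of the following:

1. Use a named output together with a base output path: write(namedOutput, key, value, baseOutputPath).
2. Use only a base output path, without a named output: write(key, value, baseOutputPath).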

In your case this is option 1, so change the following:

    mos.write("foo", NullWritable.get(), new Text(jsn.toString()));
    mos.write("bar", key, NullWritable.get());
    mos.write("foobar", key, NullWritable.get());

to

    mos.write("foo", NullWritable.get(), new Text(jsn.toString()), "foo/part");
    mos.write("bar", key, NullWritable.get(), "bar/part");
    mos.write("foobar", key, NullWritable.get(), "foobar/part");

Here "foo/part", "bar/part" and "foobar/part" are the baseOutputPath values. As a result, the directories foo, bar and foobar are created under the job output directory, with the part-r-xxxxx files inside them.
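To make the naming rule concrete, here is a minimal sketch in plain Java that models how a baseOutputPath turns into a file path. It does not call Hadoop at all: the class BaseOutputPathSketch and the helper resolve are hypothetical names made up for illustration. It assumes the usual reducer suffix of "-r-" followed by a zero-padded five-digit partition number.

```java
import java.util.Locale;

public class BaseOutputPathSketch {

    // Hypothetical helper, NOT a Hadoop API: models where a reducer's
    // baseOutputPath ends up relative to the job output directory.
    static String resolve(String jobOutputDir, String baseOutputPath, int partition) {
        // Assumption: reducer files get "-r-" plus a five-digit partition number.
        String fileName = String.format(Locale.ROOT, "%s-r-%05d", baseOutputPath, partition);
        return jobOutputDir + "/" + fileName;
    }

    public static void main(String[] args) {
        // A slash in baseOutputPath produces a subdirectory.
        for (String base : new String[] {"foo/part", "bar/part", "foobar/part"}) {
            System.out.println(resolve("output", base, 0));
        }
        // Without a slash the file stays directly in the output directory,
        // which is the layout described in the question.
        System.out.println(resolve("output", "foo", 0));
    }
}
```

The point of the sketch is only that the slash inside baseOutputPath is what creates the subfolder. The named output alone ("foo") does not.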

You can also try option 2 above, which does not need a named output at all.
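A rough sketch of what that could look like in a reducer follows. The class name FolderReducer and the input/output types are assumptions, not taken from the question, and the code needs the Hadoop MapReduce client library on the classpath. Without a named output, the records are written with the job's own output format and key/value types, so this variant only fits when all folders share those types.

```java
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical reducer: the name and the key/value types are assumptions.
public class FolderReducer extends Reducer<Text, Text, Text, NullWritable> {

    private MultipleOutputs<Text, NullWritable> mos;

    @Override
    protected void setup(Context context) {
        // One MultipleOutputs instance per task, created before any write.
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // No named output: only key, value and baseOutputPath.
        // Files land in <job output>/bar/part-r-xxxxx and
        // <job output>/foobar/part-r-xxxxx.
        mos.write(key, NullWritable.get(), "bar/part");
        mos.write(key, NullWritable.get(), "foobar/part");
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Closing flushes the underlying record writers.
        mos.close();
    }
}
```

With this variant the addNamedOutput calls in the driver are not needed for these two folders.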

Let me know if you need any clarification.


Source: https://habr.com/ru/post/955814/

