Hadoop 0.2: How to read outputs from TextOutputFormat?

My reducer class writes its output using TextOutputFormat (the default OutputFormat set by Job). I would like to use these results after the MapReduce job completes in order to aggregate the outputs. In addition, I would like to write the aggregated information in a form that TextInputFormat can read, so that the result can be used by the next iteration of the MapReduce task. Can someone give me an example of how to write and read using the text formats? By the way, the reason I use the text formats rather than sequence files is interoperability: the outputs must be consumable by any software.

+3
1 answer

Don't rule out sequence files just yet; they make it quick and easy to chain MapReduce jobs, and you can use "hadoop fs -text filename" to print them in text format if you need them for other things.

But back to the original question: to use TextInputFormat, set it as the input format on the job, and then use TextInputFormat.setInputPaths to specify which files it should read as input. The key passed to your mapper should be a LongWritable, and the value should be a Text.
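To make the key and value types concrete: TextInputFormat hands the mapper one record per line, where the key is the byte offset at which the line starts and the value is the line itself. Here is a minimal plain-Java sketch of those pairs. It uses no Hadoop classes, the class name `LineRecords` is made up for illustration, and it only splits on `\n` (a simplification of what Hadoop's own line reader does).

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: mimics the (offset, line) records a mapper receives from TextInputFormat. */
final class LineRecords {

    /** Splits UTF-8 text on '\n' and maps each line's starting byte offset to its contents. */
    static Map<Long, String> split(String content) {
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        Map<Long, String> records = new LinkedHashMap<>();
        int start = 0;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '\n') {
                // The key is where the line starts; the value excludes the line terminator.
                records.put((long) start, new String(bytes, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        if (start < bytes.length) {
            // Last line of a file that does not end with a newline.
            records.put((long) start, new String(bytes, start, bytes.length - start, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        // Two lines shaped like typical text output: key, tab, value.
        for (Map.Entry<Long, String> r : split("apple\t3\nbanana\t5\n").entrySet()) {
            System.out.println(r.getKey() + " -> " + r.getValue());
        }
    }
}
```

In a real mapper the offset key is usually ignored and only the line value is parsed.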

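To use TextOutputFormat, set it as the output format on the job, and then use TextOutputFormat.setOutputPath to specify where the output should go. Your reducer (or mapper, if it is a map-only job) should use NullWritable as the output key if you want only the value written on each line; otherwise the key and the value are both written as text, separated by a tab (you can change the separator by setting "mapred.textoutputformat.separator").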
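Since TextOutputFormat writes one record per line, with key and value separated by a tab by default, the results can be read back and aggregated without any Hadoop classes. Below is a rough sketch of that step under several assumptions that are mine, not the asker's: the job output directory has been copied to the local filesystem, its files match `part-*`, the values are integers, and summing per key is the aggregation you want. The class name `TextOutputAggregator` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: read key<TAB>value lines, sum the values per key, write the totals back as text. */
final class TextOutputAggregator {

    // Default separator; must match "mapred.textoutputformat.separator" if the job overrides it.
    static final String SEPARATOR = "\t";

    /** Sums the numeric value per key; lines without a separator are skipped. */
    static Map<String, Long> aggregate(Iterable<String> lines) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            int sep = line.indexOf(SEPARATOR);
            if (sep < 0) {
                continue; // value-only line (e.g. NullWritable key): no key to group by
            }
            String key = line.substring(0, sep);
            long value = Long.parseLong(line.substring(sep + SEPARATOR.length()).trim());
            totals.merge(key, value, Long::sum);
        }
        return totals;
    }

    /** Collects the lines of every part-* file in a local copy of the job output directory. */
    static List<String> readPartFiles(Path outputDir) throws IOException {
        List<String> lines = new ArrayList<>();
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path part : parts) {
                lines.addAll(Files.readAllLines(part, StandardCharsets.UTF_8));
            }
        }
        return lines;
    }

    /** Writes one key<TAB>total line per entry, which a later job can read line by line. */
    static void write(Map<String, Long> totals, Path target) throws IOException {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            out.add(e.getKey() + SEPARATOR + e.getValue());
        }
        Files.write(target, out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: local copy of the job output directory, args[1]: file for the aggregated result
        write(aggregate(readPartFiles(Paths.get(args[0]))), Paths.get(args[1]));
    }
}
```

The file written by `write` is plain line-oriented text, so it can be uploaded and used as input for the next iteration, and any other tool that splits lines on a tab can consume it as well.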

+5

Source: https://habr.com/ru/post/1740424/

