Can I write the output of both the Mapper and the Reducer in a single Hadoop MapReduce job?

For a given MR job, I need to create two output files. One file should be the output of the Mapper; the other should be the output of the Reducer (which is just an aggregation over the Mapper output).

Can I write both the mapper and the reducer output files in the same job?

EDIT:

In Job 1 (Mapper phase only), each output line contains 20 fields and should be written to HDFS (file1). In Job 2 (Mapper and Reducer), the Mapper takes the output of Job 1 as input, removes several fields to produce the standard format (10 fields in total), and passes the result to the Reducer, which writes file2.

I need both file1 and file2 in HDFS. What I am unsure about is whether the Job 1 mapper can write the data to HDFS as file1 and then modify the same data and pass it on to the reducer.

PS: At the moment I am using two jobs chained together. The first job has only a mapper; the second has a mapper and a reducer.

1 answer

Perhaps you can use the MultipleOutputs class to define a separate named output for the mapper and (optionally) another one for the reducer. In the mapper, you will have to write each record twice: once to the side output file (using MultipleOutputs) and once as a key/value pair emitted to the reducer (as usual).
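A minimal sketch of such a mapper, assuming tab-separated input lines and the new `org.apache.hadoop.mapreduce` API. The class name `DualOutputMapper`, the named output `file1`, the use of the first field as the reducer key, and the "keep the first 10 fields" trimming rule are hypothetical placeholders for the real format. The driver would also need to register the named output, which the hypothetical helper `registerSideOutput` does via `MultipleOutputs.addNamedOutput`.

```java
import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Sketch: writes the full record to a side output ("file1") and
// emits a trimmed record to the reducer in the same map() call.
public class DualOutputMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Hypothetical named output; letters and digits only.
    public static final String FILE1 = "file1";

    private MultipleOutputs<Text, Text> mos;

    // Call from the driver before submitting the job.
    public static void registerSideOutput(Job job) {
        MultipleOutputs.addNamedOutput(job, FILE1, TextOutputFormat.class,
                NullWritable.class, Text.class);
    }

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // 1) Full 20-field line goes to the side output (file1).
        mos.write(FILE1, NullWritable.get(), line);

        // 2) Trimmed line goes to the reducer, keyed by its first field (assumption).
        String trimmed = toStandardFormat(line.toString());
        context.write(new Text(trimmed.split("\t", 2)[0]), new Text(trimmed));
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close(); // flushes and closes the side-output writers
    }

    // Hypothetical trimming rule: keep at most the first 10 tab-separated fields.
    public static String toStandardFormat(String record) {
        String[] fields = record.split("\t", -1);
        return String.join("\t", Arrays.copyOfRange(fields, 0, Math.min(10, fields.length)));
    }
}
```

With this setup the side records should land in files named like `file1-m-00000` in the job's output directory, next to the reducer's regular `part-r-*` files.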

Then you can also use the ChainMapper class to define the following workflow in one job:

Mapper 1 (file 1) → Mapper 2 → Reducer (file 2)
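The workflow above could be wired into one job roughly as follows. This is an untested sketch using the new-API `ChainMapper` and `ChainReducer` from `org.apache.hadoop.mapreduce.lib.chain`; the class names, the per-key counting in the reducer (standing in for the unspecified aggregation), and the "first 10 fields" rule are hypothetical. Whether `MultipleOutputs` behaves correctly inside a chained mapper is an assumption worth checking on a small input first.

```java
import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
import org.apache.hadoop.mapreduce.lib.chain.ChainReducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ChainedJob {

    // Mapper 1: writes the full line to "file1" and passes it on unchanged.
    public static class FullRecordMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        private MultipleOutputs<LongWritable, Text> mos;

        @Override
        protected void setup(Context ctx) { mos = new MultipleOutputs<>(ctx); }

        @Override
        protected void map(LongWritable k, Text v, Context ctx)
                throws IOException, InterruptedException {
            mos.write("file1", NullWritable.get(), v);
            ctx.write(k, v);
        }

        @Override
        protected void cleanup(Context ctx) throws IOException, InterruptedException {
            mos.close();
        }
    }

    // Mapper 2: trims to the standard format, keyed by the first field (assumption).
    public static class TrimMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable k, Text v, Context ctx)
                throws IOException, InterruptedException {
            String trimmed = trim(v.toString());
            ctx.write(new Text(trimmed.split("\t", 2)[0]), new Text(trimmed));
        }
    }

    // Reducer: hypothetical aggregation (record count per key); main output = file2.
    public static class CountReducer extends Reducer<Text, Text, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            long n = 0;
            for (Text ignored : values) n++;
            ctx.write(key, new LongWritable(n));
        }
    }

    // Hypothetical trimming rule: keep at most the first 10 tab-separated fields.
    public static String trim(String line) {
        String[] fields = line.split("\t", -1);
        return String.join("\t", Arrays.copyOfRange(fields, 0, Math.min(10, fields.length)));
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "file1-and-file2");
        job.setJarByClass(ChainedJob.class);
        // Mapper 1 -> Mapper 2 -> Reducer, all inside one job.
        ChainMapper.addMapper(job, FullRecordMapper.class, LongWritable.class, Text.class,
                LongWritable.class, Text.class, new Configuration(false));
        ChainMapper.addMapper(job, TrimMapper.class, LongWritable.class, Text.class,
                Text.class, Text.class, new Configuration(false));
        ChainReducer.setReducer(job, CountReducer.class, Text.class, Text.class,
                Text.class, LongWritable.class, new Configuration(false));
        MultipleOutputs.addNamedOutput(job, "file1", TextOutputFormat.class,
                NullWritable.class, Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Compared with two separate jobs, this avoids writing file1 and then reading it back as the input of a second job; the trade-off is that both outputs share one output directory.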

Honestly, I have never used this approach myself, but you can try it. Good luck!


Source: https://habr.com/ru/post/1239248/

