MapReduce: ChainMapper and ChainReducer

I need to split my MapReduce job into two jobs so that I get two different output files, one from the reducer of each job.

I mean that the first job should produce an output file that will be the input for the second job in the chain.

I have read something about ChainMapper and ChainReducer in version 0.20 (I am currently using 0.18): would they suit my needs?

Can someone suggest some links with examples of how to use these classes? Or is there another way to solve my problem?

Thanks,

Luke

+3
2 answers

There are many ways to do this.

  • Cascading jobs

    JobConf "job1" "input" inputdirectory "temp" . : JobClient.run(job1).

    JobConf "job2" "temp" inputdirectory "output" . : JobClient.run(job2).

  • Two JobConf objects with JobControl

    Create two JobConf objects and set all their parameters just as in (1), except that you don't call JobClient.runJob yourself.

    Then create two Job objects, passing the jobconfs as parameters:

    Job job1=new Job(jobconf1); Job job2=new Job(jobconf2);

    Using a JobControl object, declare the job dependency and run the jobs (a fuller sketch is given after the list):

    JobControl jbcntrl=new JobControl("jbcntrl");
    jbcntrl.addJob(job1);
    jbcntrl.addJob(job2);
    job2.addDependingJob(job1);
    jbcntrl.run();
    
  • ChainMapper and ChainReducer

    If you need a structure of the form Map+ | Reduce | Map*, you can use the ChainMapper and ChainReducer classes that ship with Hadoop 0.19 and later. Note that in this case you can have only one reducer, but any number of mappers before or after it (see the last sketch after this list).
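
To make (1) concrete, here is a minimal sketch using the old org.apache.hadoop.mapred API available in 0.18; FirstMapper, FirstReducer, SecondMapper and SecondReducer are placeholder names for your own classes:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class CascadedJobs {
        public static void main(String[] args) throws Exception {
            // Job 1: reads "input", writes its reducer output to "temp".
            JobConf job1 = new JobConf(CascadedJobs.class);
            job1.setJobName("job1");
            job1.setMapperClass(FirstMapper.class);     // placeholder mapper
            job1.setReducerClass(FirstReducer.class);   // placeholder reducer
            job1.setOutputKeyClass(Text.class);
            job1.setOutputValueClass(Text.class);
            FileInputFormat.setInputPaths(job1, new Path("input"));
            FileOutputFormat.setOutputPath(job1, new Path("temp"));
            JobClient.runJob(job1);                     // blocks until job 1 finishes

            // Job 2: reads "temp", writes its reducer output to "output".
            JobConf job2 = new JobConf(CascadedJobs.class);
            job2.setJobName("job2");
            job2.setMapperClass(SecondMapper.class);    // placeholder mapper
            job2.setReducerClass(SecondReducer.class);  // placeholder reducer
            job2.setOutputKeyClass(Text.class);
            job2.setOutputValueClass(Text.class);
            FileInputFormat.setInputPaths(job2, new Path("temp"));
            FileOutputFormat.setOutputPath(job2, new Path("output"));
            JobClient.runJob(job2);                     // starts only after job 1 is done
        }
    }

Both directories stay on HDFS, so you end up with one set of reducer output files per job, which is what the question asks for.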
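
For (2), keep in mind that JobControl.run() does not return by itself once the jobs finish; the usual pattern (a sketch, assuming jobconf1 and jobconf2 are already configured, with Job and JobControl taken from org.apache.hadoop.mapred.jobcontrol) is to start it on its own thread and poll allFinished():

    JobControl jbcntrl = new JobControl("jbcntrl");
    Job job1 = new Job(jobconf1);
    Job job2 = new Job(jobconf2);
    job2.addDependingJob(job1);           // job2 is submitted only after job1 succeeds
    jbcntrl.addJob(job1);
    jbcntrl.addJob(job2);

    Thread runner = new Thread(jbcntrl);  // JobControl implements Runnable
    runner.start();
    while (!jbcntrl.allFinished()) {
        Thread.sleep(1000);               // poll until both jobs are done
    }
    jbcntrl.stop();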
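
And for (3), a minimal sketch of ChainMapper/ChainReducer with the old API (0.19+); ChainDriver, AMap, BMap, TheReducer and CMap are placeholder classes, giving the pipeline AMap -> BMap -> TheReducer -> CMap inside a single job:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.ChainMapper;
    import org.apache.hadoop.mapred.lib.ChainReducer;

    public class ChainDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(ChainDriver.class);
            conf.setJobName("chain");
            FileInputFormat.setInputPaths(conf, new Path("input"));
            FileOutputFormat.setOutputPath(conf, new Path("output"));

            // Map+ : any number of mappers before the single reducer.
            ChainMapper.addMapper(conf, AMap.class,
                LongWritable.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));
            ChainMapper.addMapper(conf, BMap.class,
                Text.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));

            // Exactly one reducer in the chain.
            ChainReducer.setReducer(conf, TheReducer.class,
                Text.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));

            // Map* : optional mappers that post-process the reducer output.
            ChainReducer.addMapper(conf, CMap.class,
                Text.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));

            JobClient.runJob(conf);
        }
    }

This runs everything as a single job with a single reducer, so it only covers the original question if the second stage is map-only; for two full reduce phases you still need (1) or (2).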

+11

If you want a higher-level way to define such workflows, have a look at Oozie and Cascading.

0

Source: https://habr.com/ru/post/1750444/

