MapReduce: ChainMapper and ChainReducer

I need to split my MapReduce job into two jobs so that I get two different output files, one from the reducer of each job.

I mean that the first job should produce an output file that will be the input for the second job in the chain.

I have read something about ChainMapper and ChainReducer in version 0.20 (I am currently using 0.18): would they suit my needs?

Can someone suggest some links with examples of how to use these classes? Or is there another way to solve my problem?

Thanks,

Luke

+3
2 answers

There are many ways to do this.

  • Cascading jobs

    JobConf "job1" "input" inputdirectory "temp" . : JobClient.run(job1).

    JobConf "job2" "temp" inputdirectory "output" . : JobClient.run(job2).

  • Two JobConf objects with JobControl

    Create two JobConf objects and set all their parameters just as in (1), except that you don't call JobClient.runJob yourself.

    Then create two Job objects, passing the jobconfs as parameters:

    Job job1=new Job(jobconf1); Job job2=new Job(jobconf2);

    Using a JobControl object, declare the job dependency and run the jobs (a fuller sketch is given after the list):

    JobControl jbcntrl=new JobControl("jbcntrl");
    jbcntrl.addJob(job1);
    jbcntrl.addJob(job2);
    job2.addDependingJob(job1);
    jbcntrl.run();
    
  • ChainMapper and ChainReducer

    If you need a structure of the form Map+ | Reduce | Map*, you can use the ChainMapper and ChainReducer classes that ship with Hadoop 0.19 and later. Note that in this case you can have only one reducer, but any number of mappers before or after it (see the last sketch after this list).
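
To make (1) concrete, here is a minimal sketch using the old org.apache.hadoop.mapred API available in 0.18; FirstMapper, FirstReducer, SecondMapper and SecondReducer are placeholder names for your own classes:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class CascadedJobs {
        public static void main(String[] args) throws Exception {
            // Job 1: reads "input", writes its reducer output to "temp".
            JobConf job1 = new JobConf(CascadedJobs.class);
            job1.setJobName("job1");
            job1.setMapperClass(FirstMapper.class);     // placeholder mapper
            job1.setReducerClass(FirstReducer.class);   // placeholder reducer
            job1.setOutputKeyClass(Text.class);
            job1.setOutputValueClass(Text.class);
            FileInputFormat.setInputPaths(job1, new Path("input"));
            FileOutputFormat.setOutputPath(job1, new Path("temp"));
            JobClient.runJob(job1);                     // blocks until job 1 finishes

            // Job 2: reads "temp", writes its reducer output to "output".
            JobConf job2 = new JobConf(CascadedJobs.class);
            job2.setJobName("job2");
            job2.setMapperClass(SecondMapper.class);    // placeholder mapper
            job2.setReducerClass(SecondReducer.class);  // placeholder reducer
            job2.setOutputKeyClass(Text.class);
            job2.setOutputValueClass(Text.class);
            FileInputFormat.setInputPaths(job2, new Path("temp"));
            FileOutputFormat.setOutputPath(job2, new Path("output"));
            JobClient.runJob(job2);                     // starts only after job 1 is done
        }
    }

Both directories stay on HDFS, so you end up with one set of reducer output files per job, which is what the question asks for.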
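
For (2), keep in mind that JobControl.run() does not return by itself once the jobs finish; the usual pattern (a sketch, assuming jobconf1 and jobconf2 are already configured, with Job and JobControl taken from org.apache.hadoop.mapred.jobcontrol) is to start it on its own thread and poll allFinished():

    JobControl jbcntrl = new JobControl("jbcntrl");
    Job job1 = new Job(jobconf1);
    Job job2 = new Job(jobconf2);
    job2.addDependingJob(job1);           // job2 is submitted only after job1 succeeds
    jbcntrl.addJob(job1);
    jbcntrl.addJob(job2);

    Thread runner = new Thread(jbcntrl);  // JobControl implements Runnable
    runner.start();
    while (!jbcntrl.allFinished()) {
        Thread.sleep(1000);               // poll until both jobs are done
    }
    jbcntrl.stop();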
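
And for (3), a minimal sketch of ChainMapper/ChainReducer with the old API (0.19+); ChainDriver, AMap, BMap, TheReducer and CMap are placeholder classes, giving the pipeline AMap -> BMap -> TheReducer -> CMap inside a single job:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.ChainMapper;
    import org.apache.hadoop.mapred.lib.ChainReducer;

    public class ChainDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(ChainDriver.class);
            conf.setJobName("chain");
            FileInputFormat.setInputPaths(conf, new Path("input"));
            FileOutputFormat.setOutputPath(conf, new Path("output"));

            // Map+ : any number of mappers before the single reducer.
            ChainMapper.addMapper(conf, AMap.class,
                LongWritable.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));
            ChainMapper.addMapper(conf, BMap.class,
                Text.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));

            // Exactly one reducer in the chain.
            ChainReducer.setReducer(conf, TheReducer.class,
                Text.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));

            // Map* : optional mappers that post-process the reducer output.
            ChainReducer.addMapper(conf, CMap.class,
                Text.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));

            JobClient.runJob(conf);
        }
    }

This runs everything as a single job with a single reducer, so it only covers the original question if the second stage is map-only; for two full reduce phases you still need (1) or (2).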

+11

If you want a higher-level way to define such workflows, have a look at Oozie and Cascading.

0

Source: https://habr.com/ru/post/1750444/

