How to use the MultithreadedMapper class in Hadoop Mapreduce?

Question

I came across the MultithreadedMapper class in the new version of Hadoop, and the documentation says it can be used in place of the usual (single-threaded) mapper class. However, I could not find any example showing how to use it. In particular, I would like to use the setNumberOfThreads() method. Is there any sample code for this?

Thanks in advance

java mapreduce hadoop

Harsh Mar 18 '12 at 17:26

1 answer

Thomas Jungblut · Accepted Answer · 2012-03-18T19:27:11+0000

A small piece of code for you:

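Configuration conf = new Configuration();
Job job = new Job(conf);
job.setJarByClass(WebGraphMapper.class);
// MultithreadedMapper is the mapper Hadoop runs; it wraps the real mapper.
job.setMapperClass(MultithreadedMapper.class);
// new Job(conf) copies conf, so set these on the job's own configuration.
// getName() (rather than getCanonicalName()) also works for nested mapper classes.
job.getConfiguration().set("mapred.map.multithreadedrunner.class", WebGraphMapper.class.getName());
job.getConfiguration().set("mapred.map.multithreadedrunner.threads", "8");
// rest omitted
job.waitForCompletion(true);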

I think this is fairly self-explanatory. You set MultithreadedMapper as the job's mapper class, and then configure which class (your actual mapper) it should run. There are also convenient static helper methods that set this configuration for you. The calls might look like this:

MultithreadedMapper.setMapperClass(job, WebGraphMapper.class);
MultithreadedMapper.setNumberOfThreads(job, 8);
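
To make the wrapper idea more concrete, here is a small Hadoop-free sketch of the general pattern: several worker threads pull records from one shared input, run the same map function in parallel, and write to one shared output. It is only an illustration under simplifying assumptions, not Hadoop's actual implementation; the class name MultithreadedMapSketch and the method runMap are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical, Hadoop-free illustration of running one map function on several threads.
class MultithreadedMapSketch {

    // Applies mapFn to every record using `threads` worker threads.
    // The order of the returned results is not deterministic.
    static List<String> runMap(List<String> records, int threads,
                               Function<String, String> mapFn) {
        Iterator<String> input = records.iterator();  // stands in for the shared record reader
        List<String> output = new ArrayList<>();      // stands in for the shared output collector
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                while (true) {
                    String record;
                    synchronized (input) {            // reads are serialized
                        if (!input.hasNext()) {
                            return;                   // no more input for this worker
                        }
                        record = input.next();
                    }
                    String result = mapFn.apply(record);  // the map work itself runs in parallel
                    synchronized (output) {           // writes are serialized
                        output.add(result);
                    }
                }
            });
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("map threads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for map threads", e);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> out = runMap(Arrays.asList("a", "b", "c", "d"), 8, String::toUpperCase);
        Collections.sort(out);  // sort because the arrival order varies between runs
        System.out.println(out);
    }
}
```

Because reading and writing are serialized in a design like this, extra threads mainly help when the map function spends its time waiting (for example on network or disk calls) rather than on CPU work. Any state shared between threads, such as static fields in the mapper, needs to be thread-safe.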
