How to use the MultithreadedMapper class in Hadoop MapReduce?

I came across the MultithreadedMapper class in the new version of Hadoop, and the documentation says it can be used in place of the usual (single-threaded) mapper class. However, I could not find any example of how to use it. I would also like to use the setNumberOfThreads() method. Is there any sample code for this?

Thanks in advance

1 answer

A small piece of code for you:

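    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
    Configuration conf = new Configuration();
    Job job = new Job(conf);
    job.setJarByClass(WebGraphMapper.class);
    // The framework runs MultithreadedMapper, which delegates to the real mapper.
    job.setMapperClass(MultithreadedMapper.class);
    // Job copies the Configuration it is given, so set properties on the job's own copy.
    // getName() (not getCanonicalName()) also resolves correctly for nested classes.
    job.getConfiguration().set("mapred.map.multithreadedrunner.class", WebGraphMapper.class.getName());
    job.getConfiguration().set("mapred.map.multithreadedrunner.threads", "8");
    // rest omitted
    job.waitForCompletion(true);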

I think this is pretty self-explanatory. You set the multithreaded mapper as the job's mapper class, and then you configure which class (your real mapper) it should execute. There are also handy static methods that do this configuration for you, so you do not have to hard-code the property names. The call might look like this:

    MultithreadedMapper.setMapperClass(job, WebGraphMapper.class);
    MultithreadedMapper.setNumberOfThreads(job, 8);
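
If the delegation idea is still unclear, here is a rough sketch of the pattern in plain Java, with no Hadoop on the classpath. This is not Hadoop's actual implementation. All names in it (RecordMapper, WordCountMapper, MultithreadedRunnerSketch) are made up for illustration. WordCountMapper plays the role of WebGraphMapper, and the runner plays the role of MultithreadedMapper. The runner does no mapping itself: it starts N threads, gives each thread its own reflectively created instance of the configured mapper class, and lets them all pull from one shared input and write to one shared output.

```java
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a Hadoop Mapper: one record in, (key, value) pairs out.
interface RecordMapper {
    void map(String record, BiConsumer<String, Integer> output);
}

// Hypothetical "real" mapper, playing the role of WebGraphMapper from the answer.
class WordCountMapper implements RecordMapper {
    @Override
    public void map(String record, BiConsumer<String, Integer> output) {
        for (String word : record.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                output.accept(word, 1);
            }
        }
    }
}

// Hypothetical stand-in for MultithreadedMapper: it only fans records out to delegates.
class MultithreadedRunnerSketch {

    // Each worker thread gets its own delegate instance, created from the class object.
    private static RecordMapper newDelegate(Class<? extends RecordMapper> mapperClass) {
        try {
            return mapperClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + mapperClass, e);
        }
    }

    static Map<String, Integer> run(Class<? extends RecordMapper> mapperClass,
                                    int numThreads, List<String> records) {
        Queue<String> input = new ConcurrentLinkedQueue<>(records); // shared, thread-safe input
        // Shared, thread-safe output. Summing here only keeps the sketch easy to check;
        // in a real job the aggregation would be left to the reducer.
        Map<String, Integer> counts = new ConcurrentHashMap<>();

        Thread[] workers = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            RecordMapper delegate = newDelegate(mapperClass);
            workers[i] = new Thread(() -> {
                String record;
                while ((record = input.poll()) != null) {
                    delegate.map(record, (key, value) -> counts.merge(key, value, Integer::sum));
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counts;
    }
}
```

Calling MultithreadedRunnerSketch.run(WordCountMapper.class, 8, lines) corresponds to the two static calls above. Because every thread has its own mapper instance, instance fields are not shared between threads, but anything static or external that your mapper touches still has to be thread-safe.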

Source: https://habr.com/ru/post/1402118/

