Is it possible to run multiple map tasks in one JVM?

Question

I want to share large in-memory static data (a RAM-based Lucene index) across my map tasks in Hadoop. Is there a way for multiple map/reduce tasks to share a single JVM?

+3

jvm hadoop lucene hadoop-plugins

yura Feb 02 '11 at 17:29

4 answers

In $HADOOP_HOME/conf/mapred-site.xml, add the following property:

<property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>#</value>
</property>

Replace # with the number of tasks each JVM may run before it is discarded (the default is 1), or with -1 for no limit.

+4

Nija 02 . '11 18:10

On using static objects together with JVM reuse, see: http://chasebradford.wordpress.com/2011/02/05/distributed-cache-static-objects-and-fast-setup/

, , , . , JVM.

0

Chase 21 . '11 17:04

, (Hadoop) .

Map Reduce. , , , Hadoop . , , JVM. JVM.

I am currently working on a prototype that lets a single JVM spread its work across several cores, so that one JVM is enough to keep several cores busy. This reduces duplication of in-memory data structures without giving up CPU utilization. My next step is to develop a version of Hadoop that can run several map tasks in one JVM, which is exactly what you are asking for.

There is an interesting related issue here: https://issues.apache.org/jira/browse/MAPREDUCE-2123
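To make the idea of several tasks sharing one copy of the data inside one JVM concrete, here is a minimal sketch in plain Java. It is not the prototype described above and uses no Hadoop classes; `SharedDataThreads`, `countHits`, and the sample terms are hypothetical names chosen for illustration. Each "task" runs on a thread-pool thread and only reads from the same map, so the map exists once in memory regardless of the number of threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedDataThreads {
    // Runs one hypothetical "task" per split on a fixed thread pool.
    // Every task reads the same 'shared' map; nothing is copied per task.
    static int countHits(Map<String, Integer> shared, List<List<String>> splits, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> split : splits) {
                Callable<Integer> task = () -> {
                    int hits = 0;
                    for (String term : split) {
                        if (shared.containsKey(term)) {
                            hits++;
                        }
                    }
                    return hits;
                };
                results.add(pool.submit(task));
            }
            // Sum the per-task results once all tasks have finished.
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a large read-only structure such as an in-memory index.
        Map<String, Integer> shared = new HashMap<>();
        shared.put("hadoop", 1);
        shared.put("lucene", 2);

        List<List<String>> splits = Arrays.asList(
                Arrays.asList("hadoop", "spark"),
                Arrays.asList("lucene", "hadoop"));

        System.out.println(countHits(shared, splits, 2));
    }
}
```

The sketch relies on the shared structure being fully built before any task starts and never modified afterwards; a structure that is written to concurrently would need a thread-safe type instead of `HashMap`.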

0

Yunming zhang Sep 27 '13 at 2:30

Joe stein · Accepted Answer · 2011-02-02T18:09:13+0000

Jobs can enable reuse of task JVMs by setting the mapred.job.reuse.jvm.num.tasks job configuration property. If the value is 1 (the default), JVMs are not reused (i.e. 1 task per JVM). If it is -1, there is no limit on the number of tasks (of the same job) that a JVM can run. You can also specify a value greater than 1 using the API.
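With reuse enabled, anything held in a static field stays in memory for later tasks of the same job that land in that JVM, which is what makes it useful for the large shared data in the question. The sketch below illustrates that pattern in plain Java under a few assumptions: `SharedIndexHolder`, `FakeMapTask`, and `JvmReuseDemo` are hypothetical names, a small `HashMap` stands in for the RAM Lucene index, and the tasks are run one after another, which is how I understand a reused JVM to process them. In a real job the call to `get()` would go in the mapper's setup code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder for data that is expensive to build
// (a stand-in for an in-memory index).
final class SharedIndexHolder {
    // Counts how many times the expensive load actually ran in this JVM.
    static final AtomicInteger LOAD_COUNT = new AtomicInteger();

    // Static field: lives as long as the JVM, so it outlives a single task.
    private static Map<String, Integer> index;

    private SharedIndexHolder() {}

    // Builds the index on the first call only; later calls return the cached instance.
    static synchronized Map<String, Integer> get() {
        if (index == null) {
            LOAD_COUNT.incrementAndGet();
            Map<String, Integer> m = new HashMap<>();
            m.put("hadoop", 1);
            m.put("lucene", 2);
            index = m;
        }
        return index;
    }
}

// Hypothetical stand-in for a map task.
final class FakeMapTask {
    int lookup(String term) {
        // Returns -1 when the term is not in the shared index.
        return SharedIndexHolder.get().getOrDefault(term, -1);
    }
}

class JvmReuseDemo {
    public static void main(String[] args) {
        // Two tasks executed sequentially in the same JVM.
        int first = new FakeMapTask().lookup("hadoop");
        int second = new FakeMapTask().lookup("lucene");
        System.out.println(first + " " + second
                + " loads=" + SharedIndexHolder.LOAD_COUNT.get());
    }
}
```

The second task finds the index already built, so the load counter stays at 1. With the default setting of one task per JVM, every task would start with an empty static field and pay the load cost again.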
