Is it possible to run multiple map tasks in one JVM?

I want to share large in-memory static data (a RAM Lucene index) among my map tasks in Hadoop. Is there a way for multiple map/reduce tasks to share a single JVM?

4 answers

Jobs can enable JVM reuse for their tasks by setting the mapred.job.reuse.jvm.num.tasks job configuration property. If the value is 1 (the default), JVMs are not reused (i.e. 1 task per JVM). If it is -1, there is no limit on the number of tasks (of the same job) that a JVM can run. You can also specify a value greater than 1 using the API.
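To show why JVM reuse helps with the static-data case from the question, here is a minimal sketch in plain Java, with no Hadoop classes involved. It assumes reuse is enabled (a value other than 1), so that static fields survive from one task to the next inside the same JVM. The names `SharedIndex`, `FakeMapTask` and `JvmReuseDemo` are hypothetical, and a small map stands in for the RAM Lucene index.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder for large read-only data; a Map stands in for the RAM Lucene index.
final class SharedIndex {
    // Counts how many times the expensive load actually ran in this JVM.
    private static final AtomicInteger LOADS = new AtomicInteger();

    // Initialization-on-demand holder: the JVM runs this initializer once, thread-safely.
    private static final class Holder {
        static final Map<String, Integer> DATA = load();
    }

    private static Map<String, Integer> load() {
        LOADS.incrementAndGet();
        Map<String, Integer> m = new HashMap<>();
        m.put("hadoop", 1);
        m.put("lucene", 2);
        return Collections.unmodifiableMap(m);
    }

    static Map<String, Integer> get() { return Holder.DATA; }
    static int loadCount() { return LOADS.get(); }
}

// Stand-in for a map task (assumption: in a real job this lookup would live in the mapper).
final class FakeMapTask {
    static int run(String key) {
        Integer v = SharedIndex.get().get(key);
        return v == null ? -1 : v;
    }
}

class JvmReuseDemo {
    public static void main(String[] args) {
        // Three "tasks" executed one after another in the same JVM.
        System.out.println(FakeMapTask.run("hadoop"));
        System.out.println(FakeMapTask.run("lucene"));
        System.out.println(FakeMapTask.run("missing"));
        System.out.println("loads = " + SharedIndex.loadCount());
    }
}
```

The data is loaded on first access and every later task in the same JVM sees the already-built structure. With the default value of 1, each task would get a fresh JVM and pay the load cost again.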


In $HADOOP_HOME/conf/mapred-site.xml, add the following property:

<property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>#</value>
</property>

Replace # with the number of tasks each JVM should run before it is discarded (the default is 1), or use -1 for no limit.


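As far as I know, there is no easy way for multiple map tasks (Hadoop) to share static data structures.

This is a known limitation of the current Map Reduce model. The reason the current implementation does not share static data across map tasks is that Hadoop is designed to be highly reliable. As a result, if a task fails, it only crashes its own JVM. It does not affect tasks running in other JVMs.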

I am currently working on a prototype that can distribute the work of a single JVM across several cores (in effect, you need only one JVM to use all of them). This way you can avoid duplicating in-memory data structures without giving up CPU utilization. The next step for me is to develop a version of Hadoop that can run several Map tasks in one JVM, which is exactly what you are asking for.
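The general idea of several workers in one JVM reading a single copy of the data can be illustrated with plain Java threads. This is only a sketch of the concept, not the prototype described above and not Hadoop code; `SharedDataWorkers` and `sumInParallel` are hypothetical names, and an int array stands in for the shared in-memory structure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SharedDataWorkers {
    // Sums one read-only array using several threads; the array is never copied.
    static long sumInParallel(int[] shared, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            // Ceiling division so every element falls into exactly one chunk.
            int chunk = (shared.length + workers - 1) / workers;
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(shared.length, w * chunk);
                final int to = Math.min(shared.length, from + chunk);
                Callable<Long> task = () -> {
                    long s = 0;
                    for (int i = from; i < to; i++) s += shared[i];
                    return s;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : parts) {
                try {
                    total += f.get();
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}

class SharedDataWorkersDemo {
    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(SharedDataWorkers.sumInParallel(data, 4));
    }
}
```

All four workers read the same array, so memory use does not grow with the number of workers. The trade-off is the one mentioned earlier in this answer: a failure that takes down the JVM takes down every worker in it.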

There is an interesting post here https://issues.apache.org/jira/browse/MAPREDUCE-2123


Source: https://habr.com/ru/post/1789559/

