Spark Standalone Mode: multiple shell sessions (applications)

In Spark 1.0.0 standalone mode with multiple worker nodes, I am trying to run the Spark shell from two different machines (as the same Linux user).

The documentation says: "By default, applications submitted to the standalone mode cluster will run in FIFO (first-in-first-out) order, and each application will try to use all available nodes."

The number of cores per worker is set to 4 out of 8 available (via SPARK_JAVA_OPTS="-Dspark.cores.max=4"). Memory is also limited, so there should be enough for both.
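For illustration, limits of this kind could be expressed roughly like this in conf/spark-env.sh on the machine that starts the shell (the executor-memory property here is only an assumed example of how the memory cap might be set, not a quote from the actual configuration):

export SPARK_JAVA_OPTS="-Dspark.cores.max=4 -Dspark.executor.memory=10g"   # per-application cores and executor memory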

However, looking at the Spark Master WebUI, the shell application that was started later always stays in the "WAITING" state until the first one exits. The number of cores assigned to it is 0, and the memory per node is 10G (the same as the application that is already running).

Is there a way to run both shells at the same time without using Mesos?

+4
3 answers

It turned out that memory, not the number of cores, was the problem. In standalone mode an application only gets executors on workers that can satisfy its full memory request. With 5 workers, each offering memory = 10G (the same amount the running shell had claimed), the 2nd shell also asks for 10G per worker, cannot get it anywhere, and so the shell stays waiting. Limiting each shell to 5G per node solves it.

That also explains why the waiting application shows 0 cores in the WebUI even though cores are free: without the memory, no executors are started at all. Once both shells request only part of the worker memory, they run side by side.
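As a concrete sketch of the fix, assuming the limit is applied through the same SPARK_JAVA_OPTS mechanism as the cores setting (the exact values are illustrative):

export SPARK_JAVA_OPTS="-Dspark.cores.max=4 -Dspark.executor.memory=5g"   # each shell now asks for 5G per node
bin/spark-shell --master spark://$MASTER:7077

With each shell requesting only 5G per node, two such requests fit into the 10G offered by every worker, so both applications can get executors at the same time.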

+1

Keep in mind that this is how the standalone scheduler is meant to work. As long as a spark-shell session is open, it holds on to the resources it was given, even while nothing is running in it. The same cores and memory stay blocked for the whole lifetime of the spark-shell, whether it is doing work or not.

Unlike spark-shell, spark-submit runs an application and gives the resources back as soon as it finishes. Several spark-submit jobs can therefore share the cluster over time, while a spark-shell keeps its share until it is closed.

So (where possible) prefer spark-submit.
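For example, a batch job submitted like this only occupies the cluster while it runs and then frees its cores and memory for the next application (the class and jar names are placeholders, and the resource limits are just an illustration):

bin/spark-submit --master spark://$MASTER:7077 \
  --class com.example.MyJob \
  --executor-memory 5g \
  --total-executor-cores 16 \
  my-job.jar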

0

A simple workaround is the total-executor-cores option. For example, to give a shell only 16 cores, start it like this:

bin/spark-shell --total-executor-cores 16 --master spark://$MASTER:7077

In this case each shell will use only 16 cores, so two shells fit on your 32-core cluster. They can then run at the same time, but neither will ever use more than 16 cores :(

This solution is far from ideal, I know. You depend on users to limit themselves and to close their shells, and resources sit idle while a user is not running any code. I created a JIRA issue for this, which you can vote for.

0
