stop-all.sh in Spark's sbin folder does not stop all slave nodes

Hi, I have a standalone Spark cluster, i.e. one Spark master process and three Spark slave processes running on my laptop (a Spark cluster on a single machine).

Starting the master and slaves is just a matter of running the scripts Spark_Folder/sbin/start-master.sh and Spark_Folder/sbin/start-slave.sh.

However, when I run Spark_Folder/sbin/stop-all.sh, it stops only one master and one slave; since I have three slaves started, after running stop-all.sh I still have two slaves running.

I looked into the script "stop-slaves.sh" and found the following:

 if [ "$SPARK_WORKER_INSTANCES" = "" ]; then
   "$sbin"/spark-daemons.sh stop org.apache.spark.deploy.worker.Worker 1
 else
   for ((i=0; i<$SPARK_WORKER_INSTANCES; i++)); do
     "$sbin"/spark-daemons.sh stop org.apache.spark.deploy.worker.Worker $(( $i + 1 ))
   done
 fi

This script seems to stop workers based on the count in "SPARK_WORKER_INSTANCES". But what if I started a slave using a non-numeric name?
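From what I can tell, the numbers matter because of how spark-daemon.sh names its pid files (assuming the defaults: SPARK_PID_DIR=/tmp and the ident string being $USER):

 # spark-daemon.sh locates the process to stop via a pid file whose name
 # embeds the instance argument, roughly:
 #   /tmp/spark-$USER-org.apache.spark.deploy.worker.Worker-<instance>.pid
 # stop-slaves.sh only ever passes the numbers 1..SPARK_WORKER_INSTANCES,
 # so a worker registered under any other instance name is never matched.
 ls /tmp/spark-$USER-org.apache.spark.deploy.worker.Worker-*.pid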

Also, any idea how to shut down the entire Spark cluster with one command? (I know that running "pkill -f spark*" would work, though.)

Thank you very much.

+6
4 answers

I just figured out the solution:

In "/usr/lib/spark/conf/spark-env.sh", add an extra parameter SPARK_WORKER_INSTANCES=3 (or the number of your slave instances), then run "/usr/lib/spark/sbin/stop-all.sh", and all instances are stopped.
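For reference, a minimal sketch of the line to add; spark-env.sh is sourced as an ordinary shell script, so plain export syntax works (adjust the count to your setup):

 # /usr/lib/spark/conf/spark-env.sh
 # tell the stop scripts how many numbered worker instances to loop over
 export SPARK_WORKER_INSTANCES=3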

However, "stop-all.sh" only works for slaves that you started with numeric instance names, for example:

 /usr/lib/spark/sbin/start-slave.sh 1 spark://master-address:7077
 /usr/lib/spark/sbin/start-slave.sh 2 spark://master-address:7077
 /usr/lib/spark/sbin/start-slave.sh 3 spark://master-address:7077

If you started the slaves using arbitrary names, "stop-all.sh" does not work, for example:

 /usr/lib/spark/sbin/start-slave.sh myWorker1 spark://master-address:7077
 /usr/lib/spark/sbin/start-slave.sh myWorker2 spark://master-address:7077
 /usr/lib/spark/sbin/start-slave.sh myWorker3 spark://master-address:7077
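If you are stuck with workers started under arbitrary names, one possible workaround (a sketch, untested here; it assumes spark-daemon.sh resolves its pid file from whatever instance argument you pass, so reusing the same names at stop time should locate them):

 /usr/lib/spark/sbin/spark-daemon.sh stop org.apache.spark.deploy.worker.Worker myWorker1
 /usr/lib/spark/sbin/spark-daemon.sh stop org.apache.spark.deploy.worker.Worker myWorker2
 /usr/lib/spark/sbin/spark-daemon.sh stop org.apache.spark.deploy.worker.Worker myWorker3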
+6

Use the jps command in a terminal.

The output will look something like this:

 5417 NameNode
 8480 Jps
 13311 Elasticsearch
 5602 DataNode
 5134 Worker
 5849 SecondaryNameNode
 4905 Master

Kill the Master and Worker processes (here PIDs 4905 and 5134; NameNode, DataNode and SecondaryNameNode are Hadoop daemons, and Elasticsearch is unrelated, so leave those alone), like this:

 kill 5134
 kill 4905

The master and worker will both be stopped.

If they start up again after being killed, it most likely means the machine was shut down earlier with the master and slaves still running; in that case you need to reboot the system.

+3

I had a similar problem, in that I had to ssh into 8 machines and use kill -9 on all the relevant processes. I used ps -ef | grep spark to find the process IDs. It is tedious, but it worked.
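A rough sketch of scripting that round trip (the hostnames are hypothetical; the bracketed grep pattern stops grep from matching its own command line, and xargs -r skips the kill when nothing matches):

 for host in node1 node2 node3; do
   ssh "$host" "ps -ef | grep '[s]park' | awk '{print \$2}' | xargs -r kill -9"
 done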

0

 kill -9 $(jps -l | grep spark | awk -F ' ' '{print $1}')
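This works because jps -l prints each JVM's PID next to its fully qualified main class. A slightly narrower variant (a sketch; it matches only the org.apache.spark.deploy classes, i.e. Master and Worker, so other JVMs that merely mention spark are spared):

 kill -9 $(jps -l | grep org.apache.spark.deploy | awk '{print $1}')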

0

Source: https://habr.com/ru/post/985380/

