Spark: multiple spark-submit in parallel

I have a general question about Apache Spark:

We have Spark jobs that consume Kafka messages. The problem: they occasionally stop running without any specific error ...

Some scripts appear to be running but do nothing, and when I run them manually, one of them fails with this message:

 ERROR SparkUI: Failed to bind SparkUI
 java.net.BindException: Address already in use: Service 'SparkUI' failed after 16 retries!

So I'm wondering: is there a specific way to run these scripts in parallel?

All of them are in the same jar, and I start them using Supervisor. Spark is installed via Cloudera Manager 5.4 on YARN.

This is how I run the script:

sudo -u spark spark-submit --class org.soprism.kafka.connector.reader.TwitterPostsMessageWriter /home/soprism/sparkmigration/data-migration-assembly-1.0.jar --master yarn-cluster --deploy-mode client 

Thank you for your help!

Update: I changed the command and now run it like this (it still stops without a specific message):

 root@ns6512097:~# sudo -u spark spark-submit --class org.soprism.kafka.connector.reader.TwitterPostsMessageWriter --master yarn --deploy-mode client /home/soprism/sparkmigration/data-migration-assembly-1.0.jar
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/avro-tools-1.7.6-cdh5.4.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 15/09/28 16:14:21 INFO Remoting: Starting remoting
 15/09/28 16:14:21 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@ns6512097.ip-37-187-69.eu:52748]
 15/09/28 16:14:21 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@ns6512097.ip-37-187-69.eu:52748]
 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/avro-tools-1.7.6-cdh5.4.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
+8
3 answers

This problem occurs when several users try to start a Spark session at the same time, or when an existing Spark session has not been closed.

There are two ways to fix this problem.

  • Start a new Spark session on another port, as follows:

     spark-submit --conf spark.ui.port=5051 <other arguments>
     spark-shell --conf spark.ui.port=5051 
  • Find the Spark sessions occupying ports 4040 to 4056 and kill the processes: netstat finds the process holding a port, and kill terminates it. Usage:

     sudo netstat -tunalp | grep LISTEN| grep 4041 

The above command produces output like the following; the last column holds the process identifier, in this case PID 32028:

 tcp 0 0 :::4040 :::* LISTEN 32028/java 

Once you know the process identifier (PID), you can kill the Spark process (spark-shell or spark-submit) with the following command:

 sudo kill -9 32028 
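The find-and-kill steps above can be chained into a single sketch. The following assumes the netstat output format shown in the answer; it parses a sample line so the extraction logic runs without root or a live Spark process (the real invocation is shown in comments):

```shell
# Sketch (assumption: netstat output formatted as in the answer above).
# The PID is the last column, before the slash; awk/cut extract it.
line="tcp 0 0 :::4040 :::* LISTEN 32028/java"
pid=$(printf '%s\n' "$line" | awk '{print $NF}' | cut -d/ -f1)
echo "$pid"

# Real usage (hypothetical port; requires sudo):
#   pid=$(sudo netstat -tunalp | grep LISTEN | grep 4041 | awk '{print $NF}' | cut -d/ -f1)
#   [ -n "$pid" ] && sudo kill -9 "$pid"
```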
+5

You can also increase the value of spark.port.maxRetries .

According to the docs:

Maximum number of retries when binding to a port before giving up. When a port is given a specific value (non 0), each subsequent retry will increment the port used in the previous attempt by 1 before retrying. This essentially allows it to try a range of ports from the start port specified to port + maxRetries.
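In other words, with a start port of 4040 and spark.port.maxRetries=N, Spark will try ports 4040 through 4040+N before failing. A small sketch of that arithmetic (the values below are illustrative, not defaults you must use):

```shell
# Sketch: the port range spark.port.maxRetries allows Spark to try.
START_PORT=4040   # default SparkUI start port
MAX_RETRIES=32    # illustrative value
LAST_PORT=$((START_PORT + MAX_RETRIES))
echo "UI ports tried: $START_PORT through $LAST_PORT"

# Set it at submit time, e.g. (hypothetical invocation):
#   spark-submit --conf spark.port.maxRetries=32 <other arguments>
```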

+2

The above answers are correct.

However, we should not change the value of spark.port.maxRetries , as this increases the load on the same server, which in turn reduces cluster performance and can push the node into a deadlock situation. The load can be checked with the uptime command in your session.

The root cause of this problem is trying to run all Spark applications with --deploy-mode client .

If you have spare distributed capacity in your cluster, it is best to run them with --deploy-mode cluster .

That way, the driver for each Spark application starts on a different node, which eliminates port-binding conflicts on a single node.
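For the asker's job, the switch would look roughly like this (class name and jar path are taken from the question; the command is echoed here rather than executed, since Spark is not assumed to be installed):

```shell
# Sketch: the question's submit command adjusted to cluster deploy mode, so the
# driver (and its SparkUI port) is allocated on a YARN node rather than the
# submitting host.
CMD="sudo -u spark spark-submit --master yarn --deploy-mode cluster \
--class org.soprism.kafka.connector.reader.TwitterPostsMessageWriter \
/home/soprism/sparkmigration/data-migration-assembly-1.0.jar"
echo "$CMD"
```

Note that in cluster mode the driver logs end up in the YARN container logs rather than on the submitting terminal.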

Hope this helps. Hooray!

0

Source: https://habr.com/ru/post/1232437/

