How do I replace Apache Spark's random ports with fixed ones for a strict firewall?

I use Apache Spark to run machine learning algorithms and other big data tasks. Previously, I ran a standalone Spark cluster with the Spark master and worker on the same machine. Now I have added some worker machines, and because of a strict firewall I need to pin down the worker's random ports. Can someone explain how to change Spark's random ports and which configuration file to edit? I read the Spark documentation, and it says that spark-defaults.conf should be configured, but I don't know what to put in this file to fix the random ports.

+5
2 answers

See the Networking section of the Spark configuration documentation: https://spark.apache.org/docs/latest/configuration.html#networking

In the Networking section, you can see that several of the ports are random by default. You can set them to fixed values of your choice, as shown below:

 val conf = new SparkConf()
   .setMaster(master)
   .setAppName("namexxx")
   .set("spark.driver.port", "51810")
   .set("spark.fileserver.port", "51811")
   .set("spark.broadcast.port", "51812")
   .set("spark.replClassServer.port", "51813")
   .set("spark.blockManager.port", "51814")
   .set("spark.executor.port", "51815")
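The same ports can be pinned in spark-defaults.conf instead of in application code, which is the file the question asks about. A minimal sketch (the port numbers are arbitrary examples; note that spark.fileserver.port, spark.broadcast.port, spark.replClassServer.port and spark.executor.port only exist in Spark 1.x and were removed or deprecated in 2.x, as the second answer explains):

```properties
# conf/spark-defaults.conf -- keys and values are whitespace-separated
spark.driver.port           51810
spark.fileserver.port       51811
spark.broadcast.port        51812
spark.replClassServer.port  51813
spark.blockManager.port     51814
spark.executor.port         51815
```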
+6

Update for Spark 2.x


Some subsystems have been rewritten from scratch, and many of the old *.port properties are now deprecated (see SPARK-10997 / SPARK-20605 / SPARK-12588 / SPARK-17678 / etc.)

For Spark 2.1, for example, the port ranges on which the driver listens for executor traffic are:

  • between spark.driver.port and spark.driver.port + spark.port.maxRetries
  • between spark.driver.blockManager.port and spark.driver.blockManager.port + spark.port.maxRetries

And the range of ports on which executors listen for driver traffic and/or traffic from other executors is:

  • between spark.blockManager.port and spark.blockManager.port + spark.port.maxRetries

The spark.port.maxRetries property allows multiple Spark jobs to run in parallel: if the base port is already in use, the new job tries the next port, and so on, until the whole range has been exhausted.
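The retry rule above determines which inbound port ranges a firewall must allow. A minimal sketch of that arithmetic, using hypothetical base-port values (pick any free ports that suit your network):

```scala
object SparkPortRanges {
  // Assumed configuration values -- not Spark defaults, just examples.
  val maxRetries = 16            // spark.port.maxRetries (16 is Spark's default)
  val driverPort = 51810         // spark.driver.port
  val driverBlockMgrPort = 51814 // spark.driver.blockManager.port
  val execBlockMgrPort = 51820   // spark.blockManager.port (on executors)

  // Spark may probe base, base + 1, ..., base + maxRetries before giving up,
  // so the firewall must open this whole inclusive range.
  def range(base: Int): (Int, Int) = (base, base + maxRetries)

  def main(args: Array[String]): Unit = {
    println(s"driver RPC:             ${range(driverPort)}")
    println(s"driver block manager:   ${range(driverBlockMgrPort)}")
    println(s"executor block manager: ${range(execBlockMgrPort)}")
  }
}
```

With maxRetries = 16, the driver RPC range in this example is 51810 through 51826 inclusive.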

Sources:
https://spark.apache.org/docs/2.1.1/configuration.html#networking
https://spark.apache.org/docs/2.1.1/security.html under "Port Settings"

+3

Source: https://habr.com/ru/post/1210207/

