Does this mean that the accumulated memory of all processes running on one node will not exceed this cap?
Yes, if you use Spark in YARN-client mode; otherwise it limits only the JVM.
However, this setting is tricky with YARN. YARN limits the accumulated memory of the whole container to spark.executor.memory, while Spark uses that same value as the heap limit for the executor JVM, so no memory is left for Python within that cap. That is why I had to turn the YARN limits off.
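For illustration, one way to relax those per-container checks is to disable the NodeManager's physical/virtual memory enforcement in yarn-site.xml (this is just one possible approach, not necessarily the exact change I made):

    <!-- yarn-site.xml: illustrative sketch of relaxing YARN memory enforcement -->
    <property>
      <name>yarn.nodemanager.pmem-check-enabled</name>
      <value>false</value>
    </property>
    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
    </property>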
As for an honest answer to your question, given your standalone Spark configuration: no, spark.executor.memory does not limit Python's memory allocation.
BTW, setting this option on SparkConf has no effect on Spark standalone executors, as they are already up by the time your application starts. Read more about conf/spark-defaults.conf.
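For example, a minimal conf/spark-defaults.conf entry could look like this (the value is a placeholder, not a recommendation):

    # conf/spark-defaults.conf -- illustrative value only
    spark.executor.memory   8g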
If so, should I set this number as high as possible?
You should set it to a balanced number. The JVM has a particular trait: it will eventually allocate memory up to spark.executor.memory and never release it. You cannot set spark.executor.memory to TOTAL_RAM / EXECUTORS_COUNT, because eventually the JVM would take all of that memory for Java, leaving nothing for Python.
In my environment, I use spark.executor.memory = (TOTAL_RAM / EXECUTORS_COUNT) / 1.5, which means that 0.6 * spark.executor.memory is used by the Spark cache, 0.4 * spark.executor.memory by the rest of the executor JVM, and 0.5 * spark.executor.memory by Python; those three parts add up to 1.5 * spark.executor.memory, i.e. the node's full TOTAL_RAM / EXECUTORS_COUNT share.
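To make the arithmetic concrete, here is a small Python sketch with made-up numbers (the 48 GB node and 4 executors are assumptions for the example, not my actual setup):

    # Illustrative arithmetic only; node size and executor count are made up.
    TOTAL_RAM = 48.0         # GB of RAM on one node (assumption)
    EXECUTORS_COUNT = 4      # executors per node (assumption)

    executor_memory = (TOTAL_RAM / EXECUTORS_COUNT) / 1.5  # spark.executor.memory -> 8.0 GB
    spark_cache = 0.6 * executor_memory                    # Spark cache inside the JVM heap -> 4.8 GB
    jvm_rest    = 0.4 * executor_memory                    # rest of the executor JVM heap   -> 3.2 GB
    python_mem  = 0.5 * executor_memory                    # headroom left for Python        -> 4.0 GB

    # cache + JVM rest + Python == 1.5 * executor_memory == TOTAL_RAM / EXECUTORS_COUNT
    print(executor_memory, spark_cache + jvm_rest + python_mem)  # 8.0 12.0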
You can also configure spark.storage.memoryFraction, which defaults to 0.6.
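A minimal PySpark sketch of setting both options through SparkConf, with placeholder values (and keeping in mind the standalone caveat above, where conf/spark-defaults.conf is the right place instead):

    from pyspark import SparkConf, SparkContext

    # Placeholder values; on a standalone cluster put these in conf/spark-defaults.conf instead.
    conf = (SparkConf()
            .setAppName("memory-tuning-sketch")
            .set("spark.executor.memory", "8g")
            .set("spark.storage.memoryFraction", "0.5"))
    sc = SparkContext(conf=conf)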