I am looking for a comprehensive reference covering command-line parameters, environment variables, and configuration files, and especially how they relate to each other and which takes precedence.
Thanks :)
Known Resources
Problem example
The official documentation says the following:
The following configuration parameters can be passed to the master and worker:
...
-d DIR, --work-dir DIR Directory to use for scratch space and job output logs (default: SPARK_HOME/work); only on worker
and later
SPARK_LOCAL_DIRS Directory to use for scratch space in Spark
SPARK_WORKER_DIR Directory to run applications in, which will include both logs and scratch space (default: SPARK_HOME/work).
As a Spark beginner, I'm a little confused.
- What is the relationship between SPARK_LOCAL_DIRS, SPARK_WORKER_DIR, and -d?
- If I point them all at different values, which one takes precedence?
- Do variables set in $SPARK_HOME/conf/spark-env.sh take precedence over variables defined in the invoking shell or script?
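To make the question concrete, here is a sketch of the three places the worker directory can apparently be set. The paths and the master URL are hypothetical, and the launch script may be named start-slave.sh rather than start-worker.sh in older Spark releases:

```shell
# 1) Environment variable exported in the invoking shell before starting the worker
export SPARK_WORKER_DIR=/tmp/worker-env

# 2) The same variable set in $SPARK_HOME/conf/spark-env.sh,
#    which is sourced when the daemon starts:
#    export SPARK_WORKER_DIR=/tmp/worker-conf

# 3) Command-line flag passed when launching the worker
#    (master URL spark://master-host:7077 is a placeholder)
$SPARK_HOME/sbin/start-worker.sh --work-dir /tmp/worker-cli spark://master-host:7077
```

If all three point at different directories, it is not clear to me which one the worker actually uses.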
The ideal solution
What I'm looking for is a single reference that
- ranks the various ways of specifying Spark variables by precedence, and
- lists all variables/parameters.
For example, something like this:
Variable          | Cmd-line   | Default           | Description
SPARK_MASTER_PORT | -p --port  | 8080              | Port for master to listen on
SPARK_SLAVE_PORT  | -p --port  | random            | Port for slave to listen on
SPARK_WORKER_DIR  | -d --dir   | $SPARK_HOME/work  | Used as default for worker data
SPARK_LOCAL_DIRS  |            | $SPARK_WORKER_DIR | Scratch space for RDDs
....              | ....       | ....              | ....