How to get applicationId of a Spark application deployed to YARN in Scala?

I use the following Scala code (as a custom spark-submit wrapper) to submit a Spark application to a YARN cluster:

    val result = Seq(spark_submit_script_here).!!

All I have at submission time is the spark-submit script and the Spark application jar (no SparkContext). I would like to get the applicationId out of result, but it is empty.

I can see the applicationId and the rest of the YARN messages on my command line:

    INFO yarn.Client: application report for application_1450268755662_0110

How can I read it in code and get the applicationId?

2 answers

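According to Spark issue 5439, you can either use SparkContext.applicationId or parse the stderr output. Since you are wrapping the spark-submit script/object in your own code, you need to read its stderr and extract the application ID from there.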
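A minimal Scala sketch of that approach, assuming Scala 2.12 or 2.13 and that spark-submit writes its yarn.Client log to stderr as described above. `SparkSubmitWrapper` and `submit` are hypothetical names. The sketch replaces `!!` (which returns only stdout, which is why `result` is empty) with `!` plus a `ProcessLogger` that collects stderr, then searches that text for the first application ID. The regex uses `\d+` for both numeric parts rather than fixed widths, because the sequence number after the cluster timestamp is not limited to four digits.

```scala
import scala.sys.process._

object SparkSubmitWrapper {
  // Matches IDs such as application_1450268755662_0110 (cluster timestamp _ sequence number)
  private val AppIdPattern = """application_\d+_\d+""".r

  /** Runs `cmd` to completion and returns (exit code, first application ID found on stderr). */
  def submit(cmd: Seq[String]): (Int, Option[String]) = {
    val stderr = new StringBuilder
    val logger = ProcessLogger(
      _ => (),                                  // stdout: what `!!` would have returned; ignored here
      line => stderr.append(line).append('\n')  // stderr: where the yarn.Client messages go
    )
    val exitCode = Process(cmd).!(logger)       // blocks until the process exits; no exception on non-zero exit
    (exitCode, AppIdPattern.findFirstIn(stderr.toString))
  }
}
```

For example, `SparkSubmitWrapper.submit(Seq("spark-submit", "--master", "yarn", "--deploy-mode", "cluster", "--class", "com.example.Main", "app.jar"))`, where the class and jar names are placeholders. Passing the arguments as separate `Seq` elements avoids shell quoting problems.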


In Python, for example:

    import re
    import subprocess

    import config  # the answerer's own settings module; app_name is set by the caller

    cmd_list = [{
        'cmd': '/usr/bin/spark-submit --name %s --master yarn --deploy-mode cluster '
               '--executor-memory %s --executor-cores %s --num-executors %s '
               '--class %s %s %s'
               % (
                   app_name,
                   config.SJ_EXECUTOR_MEMORY,
                   config.SJ_EXECUTOR_CORES,
                   config.SJ_NUM_OF_EXECUTORS,
                   config.PRODUCT_SNAPSHOT_SKU_PRESTO_CLASS,
                   config.SPARK_JAR_LOCATION,
                   config.SPARK_LOGGING_ENABLED
               ),
        'cwd': config.WORK_DIR
    }]

    for cmd_obj in cmd_list:
        # spark-submit prints the YARN client log (including the application ID) to stderr
        result = subprocess.run(cmd_obj['cmd'], shell=True, check=True,
                                cwd=cmd_obj['cwd'], stderr=subprocess.PIPE)
        cmd_output = result.stderr.decode("utf-8")
        # \d+ rather than \d{13}_\d{4}: the sequence number is not limited to 4 digits
        yarn_application_ids = re.findall(r"application_\d+_\d+", cmd_output)
        if yarn_application_ids:
            yarn_application_id = yarn_application_ids[0]
            yarn_command = "yarn logs -applicationId " + yarn_application_id
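The Python code above only gets the ID after spark-submit exits, because `subprocess.run` waits for the process. In YARN cluster mode, spark-submit by default keeps running and logging application reports (like the line quoted in the question) until the application finishes. If the wrapper needs the ID earlier, one option is to start the process with `run`, which does not block, and react to the first matching stderr line. This Scala sketch assumes Scala 2.12 or 2.13; `AsyncSubmit` and `submitAsync` are hypothetical names, and the returned future yields `None` if the process exits without printing an ID.

```scala
import scala.concurrent.{blocking, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.sys.process._

object AsyncSubmit {
  // Hypothetical pattern for IDs such as application_1450268755662_0110
  private val AppIdPattern = """application_\d+_\d+""".r

  /** Starts `cmd` without waiting for it. The future completes with the first
    * application ID seen on stderr, or None if the process exits without one. */
  def submitAsync(cmd: Seq[String]): (Process, Future[Option[String]]) = {
    val firstId = Promise[Option[String]]()
    val logger = ProcessLogger(
      _ => (),                                   // stdout is ignored
      line => AppIdPattern.findFirstIn(line).foreach(id => firstId.trySuccess(Some(id)))
    )
    val process = Process(cmd).run(logger)       // returns immediately, unlike ! and !!
    Future {
      blocking(process.exitValue())              // waits for exit and for the stderr reader to finish
      firstId.trySuccess(None)                   // no effect if an ID was already found
    }
    (process, firstId.future)
  }
}
```

The caller can wait on the future with a timeout (for example via `Await.result`) and keep the returned `Process` handle to wait for or destroy the submission later.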

Source: https://habr.com/ru/post/1622706/

