YARN applications cannot start when specifying YARN node labels

I am trying to use YARN node labels to tag worker nodes, but when I run applications on YARN (Spark or a plain YARN application), those applications cannot start:

  • with Spark, when running with --conf spark.yarn.am.nodeLabelExpression="my-label", the job cannot start (stuck on Submitted application [...], see details below);

  • with a YARN application (for example distributedshell), when -node_label_expression my-label is specified, the application cannot start either.

Here are the tests that I have done so far.

YARN node labels

I use Google Dataproc to create my cluster (in this example: 4 workers, 2 of them on preemptible nodes). My goal is to force any YARN Application Master to run on a non-preemptible node, since a preemptible node can be shut down at any time, which would make the whole application fail.

I create the cluster using YARN properties (--properties) to enable node labels:

gcloud dataproc clusters create \
    my-dataproc-cluster \
    --project [PROJECT_ID] \
    --zone [ZONE] \
    --master-machine-type n1-standard-1 \
    --master-boot-disk-size 10 \
    --num-workers 2 \
    --worker-machine-type n1-standard-1 \
    --worker-boot-disk-size 10 \
    --num-preemptible-workers 2 \
    --properties 'yarn:yarn.node-labels.enabled=true,yarn:yarn.node-labels.fs-store.root-dir=/system/yarn/node-labels'

Versions of packaged Hadoop and Spark:

  • Hadoop Version: 2.8.2
  • Spark: 2.2.0

After that, I create a label (my-label) and assign it to the two non-preemptible workers:

yarn rmadmin -addToClusterNodeLabels "my-label(exclusive=false)"
yarn rmadmin -replaceLabelsOnNode "\
    [WORKER_0_NAME].c.[PROJECT_ID].internal=my-label \
    [WORKER_1_NAME].c.[PROJECT_ID].internal=my-label"
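
To double-check the result from the command line, the standard YARN CLI can list the labels and the per-node assignment (shown here only for reference; the node ID and port are placeholders you can get from yarn node -list):

# list the labels registered on the ResourceManager
yarn cluster --list-node-labels
# print a node report, which includes the Node-Labels assigned to that node
yarn node -status [WORKER_0_NAME].c.[PROJECT_ID].internal:[PORT]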

I can see the created label in the YARN web UI:

[Screenshot: label created in YARN]

Spark

When I run a simple example (SparkPi) without specifying any node label information:

spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  /usr/lib/spark/examples/jars/spark-examples.jar \
  10

The job runs and finishes successfully; in the YARN web UI, the application is allocated on <DEFAULT_PARTITION>.root.default.

However, when I add spark.yarn.am.nodeLabelExpression to pin where the Spark Application Master should run:

spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode client \
    --conf spark.yarn.am.nodeLabelExpression="my-label" \
    /usr/lib/spark/examples/jars/spark-examples.jar \
    10

The job hangs forever. In the YARN web UI, I see:

  • YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
  • Diagnostics: Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = my-label ; Partition Resource = <memory:6144, vCores:2> ; Queue Absolute capacity = 0.0 % ; Queue Absolute used capacity = 0.0 % ; Queue Absolute max capacity = 0.0 % ;

I suspect that the queue related to the label partition (not <DEFAULT_PARTITION>, the other one) does not have sufficient resources to run the job:

[Screenshot: Spark job stuck in the ACCEPTED state]

Indeed, Used Application Master Resources is <memory:1024, vCores:1>, but Max Application Master Resources is <memory:0, vCores:0>. That explains why the application cannot start, but I cannot figure out how to change this value.
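
My understanding (this is just my reading of how the CapacityScheduler derives the per-partition AM limit, not something I found spelled out in the documentation) is that the limit comes from the queue's capacity on the labelled partition, roughly:

Max AM Resources ≈ Partition Resource
                   × queue capacity on the my-label partition (here 0.0 %)
                   × yarn.scheduler.capacity.maximum-am-resource-percent (0.1 by default)
                 = <memory:6144, vCores:2> × 0.0 × 0.1
                 = <memory:0, vCores:0>

So as long as the default queue has 0 % capacity on the my-label partition, no Application Master can be allocated there.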

I tried to change various properties of the queue configuration, for example:

yarn.scheduler.capacity.root.default.accessible-node-labels=my-label

as well as the related per-label properties:

yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.capacity
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.maximum-capacity
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.maximum-am-resource-percent
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.user-limit-factor
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.minimum-user-limit-percent

without any success.
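
For completeness, on Dataproc these properties live in /etc/hadoop/conf/capacity-scheduler.xml on the master node, and the CapacityScheduler can reload them without a full restart (standard YARN commands, shown here just for reference):

# edit the queue configuration on the master node
sudo vi /etc/hadoop/conf/capacity-scheduler.xml
# ask the ResourceManager to reload the CapacityScheduler queue configuration
yarn rmadmin -refreshQueues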

YARN application

The problem is the same when launching a plain YARN application (here distributedshell):

hadoop jar \
    /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar \
    -shell_command "echo ok" \
    -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar \
    -queue default \
    -node_label_expression my-label

The application cannot start, and the client log keeps repeating:

INFO distributedshell.Client: Got application report from ASM for, appId=6, clientToAMToken=null, appDiagnostics= Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = my-label ; Partition Resource = <memory:6144, vCores:2> ; Queue Absolute capacity = 0.0 % ; Queue Absolute used capacity = 0.0 % ; Queue Absolute max capacity = 0.0 % ; , appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1520354045946, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, [...]

Without -node_label_expression my-label, the application is allocated on <DEFAULT_PARTITION>.root.default and finishes successfully.

  • Am I doing something wrong?
  • Is this a known issue with Dataproc? I could not find anything about it in the documentation.
  • Is there another way to achieve this, i.e. to force the Spark (or YARN) Application Master to run only on non-preemptible nodes?

Answer

Google support got back to us and suggested a workaround: setting the missing configuration through a Dataproc initialization script, since Dataproc does not configure this part of YARN out of the box. The script adds the following properties to capacity-scheduler.xml to make the node label (my-label) accessible to the queues:

<property>
  <name>yarn.scheduler.capacity.root.accessible-node-labels</name>
  <value>my-label</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.accessible-node-labels.my-label.capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
  <value>my-label</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.capacity</name>
  <value>100</value>
</property>

, script, " accessible-node-labels root ( ) root.default ( )". root.default - , . ​​ 100.

Finally, the YARN ResourceManager has to be restarted (systemctl restart hadoop-yarn-resourcemanager.service).

After that, the applications described above start correctly.
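
For anyone wanting to reproduce this, below is a rough sketch of what such an initialization action could look like. It is not the exact script provided by Google support; it assumes the bdconfig helper and the dataproc-role metadata attribute that Dataproc images normally ship with, so double-check on your image version:

#!/bin/bash
# Sketch of a Dataproc initialization action: make the "my-label" node label
# accessible to the root and root.default queues, then restart the RM.
set -euxo pipefail

ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)

if [[ "${ROLE}" == 'Master' ]]; then
  CONF='/etc/hadoop/conf/capacity-scheduler.xml'
  for QUEUE in root root.default; do
    bdconfig set_property --configuration_file "${CONF}" \
        --name "yarn.scheduler.capacity.${QUEUE}.accessible-node-labels" \
        --value 'my-label' --clobber
    bdconfig set_property --configuration_file "${CONF}" \
        --name "yarn.scheduler.capacity.${QUEUE}.accessible-node-labels.my-label.capacity" \
        --value '100' --clobber
  done
  systemctl restart hadoop-yarn-resourcemanager.service
fi

The script is then passed at cluster creation time, for example with --initialization-actions gs://[BUCKET]/node-labels-init.sh (the bucket path is a placeholder).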

Hope this helps anyone running into the same issue.


Source: https://habr.com/ru/post/1694581/

