How to set up an exclusive FIFO application queue in YARN?

I need to disable parallel execution of YARN applications in a Hadoop cluster. With the default YARN settings, multiple jobs can run in parallel. I see no advantage in this, because both jobs run slower.

I found the setting yarn.scheduler.capacity.maximum-applications, which limits the maximum number of applications, but it affects both submitted and running applications (as stated in the documentation). I would like submitted applications to stay in the queue until the currently running application completes. How can I do that?

2 answers

1) Change the scheduler to FairScheduler

Hadoop uses CapacityScheduler by default (Cloudera uses FairScheduler as its default scheduler). Add this property to yarn-site.xml:

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>

2) Set default Queue

By default, Fair Scheduler creates a queue for each user. That is, if three different users submit jobs, three separate queues are created and resources are shared among them. Disable this by adding the following property to yarn-site.xml:

    <property>
      <name>yarn.scheduler.fair.user-as-default-queue</name>
      <value>false</value>
    </property>

This ensures that all jobs go to the same default queue.

3) Limit maximum applications

Now that all jobs go to one default queue, limit the number of applications that can run in this queue to 1.

Create a file called fair-scheduler.xml under $HADOOP_CONF_DIR and add these entries:

    <allocations>
      <queueMaxAppsDefault>1</queueMaxAppsDefault>
    </allocations>

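Also add this property to yarn-site.xml, replacing $HADOOP_CONF_DIR with the actual directory path (a shell-style variable is not expanded inside the XML value):

    <property>
      <name>yarn.scheduler.fair.allocation.file</name>
      <value>$HADOOP_CONF_DIR/fair-scheduler.xml</value>
    </property>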

Restart the YARN services after adding these properties.


When multiple applications are submitted, the application that is ACCEPTED first becomes the active application, and the rest are queued as pending applications. These pending applications remain in the ACCEPTED state until the RUNNING application is FINISHED. The active application is allowed to use all available resources.
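
If the limit should apply to one queue only rather than to every queue, a per-queue setting can be used instead of the global default. The following is a minimal sketch, assuming all jobs land in the queue named default (as arranged in step 2). The elements maxRunningApps and schedulingPolicy are described in the Fair Scheduler documentation; the queue name and the choice of FIFO ordering are assumptions for illustration, not part of the original answer.

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Assumption: every job is placed in root.default (see step 2). -->
  <queue name="default">
    <!-- Allow at most one application to run at a time in this queue. -->
    <maxRunningApps>1</maxRunningApps>
    <!-- Serve applications inside the queue first in, first out. -->
    <schedulingPolicy>fifo</schedulingPolicy>
  </queue>
</allocations>
```

A per-queue maxRunningApps overrides queueMaxAppsDefault for that queue, so any other queues keep their own limits.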

Link: Hadoop: Fair Scheduler


As I understand your question, the setup above may not help you. You could try the allocation file below with your existing setup; it may give you a solution.

    <!-- Replace every YOUR_... name with your own queue names. -->
    <allocations>
      <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
      <queue name="YOUR_FIFO_QUEUE">
        <weight>40</weight>
        <schedulingPolicy>fifo</schedulingPolicy>
      </queue>
      <queue name="YOUR_PARENT_QUEUE">
        <weight>60</weight>
        <queue name="YOUR_SUB_QUEUE_1" />
        <queue name="YOUR_SUB_QUEUE_2" />
      </queue>
      <queuePlacementPolicy>
        <rule name="specified" create="false" />
        <rule name="primaryGroup" create="false" />
        <rule name="default" queue="YOUR_DEFAULT_QUEUE" />
      </queuePlacementPolicy>
    </allocations>

Source: https://habr.com/ru/post/1266079/

