1) Change the scheduler to FairScheduler
Apache Hadoop uses CapacityScheduler by default (Cloudera uses FairScheduler as its default scheduler). To switch, add this property to yarn-site.xml:
<property> <name>yarn.resourcemanager.scheduler.class</name> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value> </property>
2) Set default Queue
By default, Fair Scheduler places an application that does not name a queue into a queue named after the submitting user, i.e., if three different users submit jobs, three separate queues are created and resources are shared among them. Disable this by adding this property to yarn-site.xml:
<property> <name>yarn.scheduler.fair.user-as-default-queue</name> <value>false</value> </property>
This ensures that all jobs go to the same default queue.
3) Limit maximum applications
Now that all jobs go to a single default queue, limit the number of applications that can run in this queue at the same time to 1.
Create a file called fair-scheduler.xml under $HADOOP_CONF_DIR and add these entries:
<allocations> <queueMaxAppsDefault>1</queueMaxAppsDefault> </allocations>
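A slightly more explicit variant of the allocation file, as a sketch: queueMaxAppsDefault is the fallback limit for every queue that does not set its own, while maxRunningApps inside a queue element sets the limit for that queue only. The explicit entry for the queue named default is an addition for illustration and assumes the single-queue setup from step 2.

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Fallback limit for any queue without its own maxRunningApps -->
  <queueMaxAppsDefault>1</queueMaxAppsDefault>

  <!-- Optional (illustrative): pin the limit on the default queue explicitly -->
  <queue name="default">
    <maxRunningApps>1</maxRunningApps>
  </queue>
</allocations>
```

Either setting alone should be enough for a single-queue setup; the per-queue form is mainly useful if more queues are added later and only some of them should be serialized.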
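Also add this property to yarn-site.xml so the scheduler can find the file. $HADOOP_CONF_DIR stands for your actual configuration directory, so write out the full path. (If the property is omitted, or a relative path is given, the file is looked up on the classpath, which typically includes the Hadoop conf directory.)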
<property> <name>yarn.scheduler.fair.allocation.file</name> <value>$HADOOP_CONF_DIR/fair-scheduler.xml</value> </property>
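Taken together, the three yarn-site.xml properties from steps 1 to 3 would look roughly like this. It is a sketch, not a complete file: the path /etc/hadoop/conf is an assumed example location for the configuration directory, and any properties already present in your yarn-site.xml stay alongside these.

```xml
<configuration>
  <!-- Step 1: use the Fair Scheduler instead of the default scheduler -->
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>

  <!-- Step 2: do not create one queue per user; unqueued jobs go to "default" -->
  <property>
    <name>yarn.scheduler.fair.user-as-default-queue</name>
    <value>false</value>
  </property>

  <!-- Step 3: location of the allocation file (assumed example path) -->
  <property>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/etc/hadoop/conf/fair-scheduler.xml</value>
  </property>
</configuration>
```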
Restart the YARN services after adding these properties.
When multiple applications are submitted, the application that was ACCEPTED first becomes the active application, and the rest are queued as pending applications. These pending applications remain in the ACCEPTED state until the RUNNING application is FINISHED. The active application is allowed to use all available resources.
Link: Hadoop: Fair Scheduler