How to specify uberization of a Hive query in Hadoop 2?

Hadoop 2 introduces a new uberization feature. For example, this link says:

Uberization is the ability to run all tasks of a MapReduce job in the ApplicationMaster's JVM if the job is small enough. That way, you avoid the overhead of requesting containers from the ResourceManager and asking the NodeManagers to start (supposedly small) tasks.

What I can't tell is whether this happens magically behind the scenes or whether something needs to be done for it. For example, when running a Hive query, is there a parameter (or a hint) that makes this happen? Can you specify the threshold for what counts as "small enough"?

In addition, I had trouble finding much about this concept. Does it go by a different name?

2 answers

I found details in the YARN book by Arun Murthy about "uber jobs":

An Uber Job occurs when multiple mappers and reducers are combined to use a single container. Uber Jobs are configured through the mapred-site.xml options presented in Table 9.3.

Here is Table 9.3:

|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.enable     | Whether to enable the small-jobs "ubertask" optimization,  |
|                                   | which runs "sufficiently small" jobs sequentially within a |
|                                   | single JVM. "Small" is defined by the maxmaps, maxreduces, |
|                                   | and maxbytes settings. Users may override this value.      |
|                                   | Default = false.                                           |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxmaps    | Threshold for the number of maps beyond which the job is   |
|                                   | considered too big for the ubertasking optimization.       |
|                                   | Users may override this value, but only downward.          |
|                                   | Default = 9.                                               |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxreduces | Threshold for the number of reduces beyond which           |
|                                   | the job is considered too big for the ubertasking          |
|                                   | optimization. Currently the code cannot support more       |
|                                   | than one reduce and will ignore larger values. (Zero is    |
|                                   | a valid maximum, however.) Users may override this         |
|                                   | value, but only downward.                                  |
|                                   | Default = 1.                                               |
|-----------------------------------+------------------------------------------------------------|
| mapreduce.job.ubertask.maxbytes   | Threshold for the number of input bytes beyond which       |
|                                   | the job is considered too big for the ubertasking          |
|                                   | optimization. If no value is specified, `dfs.block.size`   |
|                                   | is used as a default. Be sure to specify a default value   |
|                                   | in `mapred-site.xml` if the underlying file system is not  |
|                                   | HDFS. Users may override this value, but only downward.    |
|                                   | Default = HDFS block size.                                 |
|-----------------------------------+------------------------------------------------------------|

I don't know yet whether there is a Hive-specific way to set this or whether you just use the settings above with Hive.


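An Uber job is one whose mappers and reducers run inside the Application Master. If the job has at most 9 mappers (MAX Mappers <= 9) and at most 1 reducer (MAX Reducers <= 1), the Resource Manager (RM) creates an Application Master, and the job runs within the Application Master's own JVM.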

SET mapreduce.job.ubertask.enable = TRUE;

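With an Uberised job, the Application Master no longer has to request containers from the Resource Manager (RM) and wait for the RM to allocate them to the Application Master, so that round-trip overhead is avoided.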
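As a minimal sketch of how the thresholds from Table 9.3 might be combined with the `SET` statement above in one Hive session: this assumes Hive is running its queries on MapReduce (for example with `hive.execution.engine=mr`) and that the cluster allows these properties to be overridden per session. The table name `small_table` is hypothetical and used only for illustration.

```sql
-- Enable the ubertask optimization for this session (default is false).
SET mapreduce.job.ubertask.enable=true;

-- Thresholds from Table 9.3; the values shown are the documented defaults.
-- According to the table, users may only lower them, not raise them.
SET mapreduce.job.ubertask.maxmaps=9;
SET mapreduce.job.ubertask.maxreduces=1;

-- Hypothetical small table; the job can only be uberized if it stays
-- within the maxmaps, maxreduces, and maxbytes limits.
SELECT COUNT(*) FROM small_table;
```

If the query needs more mappers or reducers than these limits, or reads more input than `mapreduce.job.ubertask.maxbytes`, it should simply run as a normal job with separate containers.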


Source: https://habr.com/ru/post/1543631/

