It is important to understand that you cannot directly force the number of map tasks. Ultimately, the number of map tasks is determined by the number of input splits, which depends on your InputFormat implementation. Say you have 1 TB of input and an HDFS block size of 64 MB: Hadoop will compute roughly 16 thousand map tasks, and if you manually specify a value lower than that it will be ignored, while a value higher than that will be taken into account.
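To see where that number comes from: 1 TB is 1,048,576 MB, and 1,048,576 MB / 64 MB per block ≈ 16,384 input splits, i.e. about 16 thousand map tasks.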
To pass these through the command line, the easiest way is to use the built-in GenericOptionsParser class (described here), which will directly parse the generic Hadoop-related command-line arguments, which is exactly what you are trying to do. The nice thing is that it lets you pass pretty much any Hadoop parameter you want, without having to write extra code for it later. You would do something like this:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Parses the generic Hadoop options into conf and returns the leftover, application-specific arguments
    String[] extraArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
}
Now, the properties you need to set to change the number of mappers and reducers are mapred.map.tasks and mapred.reduce.tasks, so you can simply launch your job with these parameters:
-D mapred.map.tasks=42 -D mapred.reduce.tasks=10
and they will be parsed directly by your GenericOptionsParser and automatically populate your Configuration object. Note that there is a space between -D and the property; this is important, because otherwise it will be interpreted as a JVM system property instead of a Hadoop configuration option.
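Putting it together, a full invocation might look like this (the jar name, driver class and paths are just placeholders for your own):

hadoop jar myjob.jar MyDriver -D mapred.map.tasks=42 -D mapred.reduce.tasks=10 /path/to/input /path/to/output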
Here is a good link if you want to know more about it.
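If you would rather set these values in code than on the command line, the old-style JobConf API exposes the same two properties; here is a minimal sketch (the class name, method name and task counts are just example placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;

public class JobSetup {
    // Returns a JobConf with the map/reduce task counts set programmatically.
    // setNumMapTasks is only a hint to the split computation, as explained above,
    // while setNumReduceTasks is honored exactly.
    public static JobConf withTaskCounts(Configuration conf) {
        JobConf job = new JobConf(conf);
        job.setNumMapTasks(42);    // same as -D mapred.map.tasks=42
        job.setNumReduceTasks(10); // same as -D mapred.reduce.tasks=10
        return job;
    }
}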