Pig: controlling the number of mappers

I can control the number of reducers by using the PARALLEL clause in the statements that result in reducers.

I want to control the number of mappers. The data source has already been created, and I cannot reduce the number of parts in it. Is it possible to control the number of map tasks spawned by my Pig statements? Can I set a lower and an upper bound on the number of map tasks spawned? Is it a good idea to control this at all?

I tried using pig.maxCombinedSplitSize, mapred.min.split.size, mapred.tasktracker.map.tasks.maximum, etc., but they don't seem to help.

Can someone please help me understand how to control the number of mappers, and maybe share a working example?

+4
2 answers

There is a simple rule of thumb for the number of mappers: there are as many mappers as there are file splits. A file split depends on the size of the block into which HDFS splits your files (64 MB, 128 MB or 256 MB, depending on your configuration). Note that FileInputFormat implementations take the block size into account but can define their own behavior.
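The rule of thumb above can be sketched as simple arithmetic. This is only a rough estimate, assuming one split per block-sized chunk of each file and at least one split per file; the function name `estimate_map_tasks` is made up for illustration, and a real input format may split differently.

```python
MB = 1024 * 1024


def estimate_map_tasks(file_sizes_bytes, block_size_bytes):
    """Rough estimate: one map task per block-sized split of each input file.

    Assumption: every file yields at least one split, and a file is never
    combined with another file. Real input formats may deviate from this.
    """
    tasks = 0
    for size in file_sizes_bytes:
        # Integer ceiling division: number of block-sized chunks in the file.
        chunks = (size + block_size_bytes - 1) // block_size_bytes
        tasks += max(1, chunks)
    return tasks


if __name__ == "__main__":
    # One large file is cut into block-sized splits.
    print(estimate_map_tasks([1024 * MB], 128 * MB))
    # Many small files give one map task each, regardless of block size.
    print(estimate_map_tasks([1 * MB] * 1000, 128 * MB))
```

Under these assumptions a single 1 GB file and a thousand 1 MB files give very different mapper counts, even though the second data set is smaller.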

Splits are important because they are tied to the physical location of the data in the cluster: Hadoop brings the code to the data, not the data to the code.

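The problem arises when a file is smaller than the block size (64 MB, 128 MB, 256 MB): there are then as many splits as there are input files, which is inefficient, because every map task has a startup cost. In that case your best bet is pig.maxCombinedSplitSize, which makes Pig try to read several small files into one Mapper, in effect ignoring the splits. If you make it too large, though, you risk bringing the data to the code and running into network issues. Forcing too few Mappers can hit network limits, because the data has to be streamed from other data nodes. Keep the value close to the block size, or half of it, and you should be fine.

Another option is to merge the small files into one large splittable file, which automatically yields an efficient number of Mappers.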
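To get a feel for how a pig.maxCombinedSplitSize value changes the mapper count for many small files, here is a rough sketch. It is a simplified model, not Pig's actual combining logic: it assumes splits are packed greedily in input order up to the size limit and ignores details such as data locality. The function name `estimate_combined_map_tasks` is hypothetical.

```python
MB = 1024 * 1024


def estimate_combined_map_tasks(split_sizes_bytes, max_combined_bytes):
    """Estimate map tasks when small splits are packed together.

    Simplified model (an assumption, not Pig's real algorithm): splits are
    added to the current task in order until the next one would exceed
    max_combined_bytes, at which point a new task is started.
    """
    tasks = 0
    current = 0  # bytes already assigned to the task being filled
    for size in split_sizes_bytes:
        if current > 0 and current + size > max_combined_bytes:
            tasks += 1  # close the full task and start a new one
            current = 0
        current += size
    if current > 0:
        tasks += 1  # count the last, partially filled task
    return tasks


if __name__ == "__main__":
    small_files = [1 * MB] * 1000
    # Compare a limit equal to a 128 MB block with half of that.
    print(estimate_combined_map_tasks(small_files, 128 * MB))
    print(estimate_combined_map_tasks(small_files, 64 * MB))
```

In a Pig script the limit itself is given in bytes, so 128 MB would be written as 134217728.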

+7

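You can set the mapred.map.tasks property to the number you want. This property holds the default number of map tasks per job. Rather than setting it globally, set it for your session, so that the default is restored once your job is done.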

0

Source: https://habr.com/ru/post/1544710/

