Partitioning: how does Hadoop do it? Does it use a hash function? What is the default function?

Partitioning is the process of determining which reducer instance will receive which intermediate keys and values. Each mapper must determine, for all of its output (key, value) pairs, which reducer will receive them. It is necessary that for any key, regardless of which mapper instance generated it, the destination partition is the same. Problem: how does Hadoop do it? Does it use a hash function? What is the default function?

+6
1 answer

The default partitioner in Hadoop is HashPartitioner, which has a method called getPartition. It takes key.hashCode() & Integer.MAX_VALUE, which clears the sign bit so the value is never negative, and then computes the remainder modulo the number of reduce tasks.

For example, if there are 10 reduce tasks, getPartition will return a value from 0 through 9 for every key.

Here is the code:

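    import org.apache.hadoop.mapreduce.Partitioner;

    public class HashPartitioner<K, V> extends Partitioner<K, V> {
        // Clear the sign bit so the result is non-negative, then take the remainder.
        public int getPartition(K key, V value, int numReduceTasks) {
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }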
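To see what this formula does in practice, here is a small standalone sketch that applies the same expression outside Hadoop. The class name PartitionDemo and the helper partitionFor are made up for illustration and are not part of Hadoop; the keys are plain Java objects rather than Writable types.

```java
import java.util.List;

// Standalone illustration of the hash-partitioning arithmetic; no Hadoop dependency.
// PartitionDemo and partitionFor are hypothetical names used only for this sketch.
class PartitionDemo {

    // Same expression as in getPartition: mask off the sign bit, then take the remainder.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 10;

        // Equal keys always hash to the same partition, whichever mapper emits them.
        for (String key : List.of("apple", "banana", "apple")) {
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }

        // An Integer's hashCode is its own value, so a negative key shows why the mask matters.
        Integer negativeKey = -7;
        System.out.println("without mask: " + (negativeKey.hashCode() % numReduceTasks)); // -7
        System.out.println("with mask:    " + partitionFor(negativeKey, numReduceTasks)); // 1
    }
}
```

Without the mask, Java's % operator would return a negative remainder for a negative hash code, which is not a valid partition index.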

To create a custom partitioner, extend Partitioner, override the getPartition method, and then register your partitioner in the driver code (job.setPartitionerClass(CustomPartitioner.class);). This is especially useful if you are performing secondary sort operations, for example.
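As a rough sketch of the secondary-sort case: when the map output key is a composite of a natural key and a sort field, the partitioner should hash only the natural key so that all records for it reach the same reducer. Everything below is illustrative. The "naturalKey#sortField" layout and the NaturalKeyPartitioner name are assumptions, and a small local Partitioner class stands in for org.apache.hadoop.mapreduce.Partitioner so the example compiles without Hadoop on the classpath. In a real job you would import the Hadoop class and most likely use Text or a custom WritableComparable as the key type.

```java
// Stand-in with the same method shape as org.apache.hadoop.mapreduce.Partitioner,
// declared locally only so this sketch compiles without Hadoop (assumption).
abstract class Partitioner<K, V> {
    public abstract int getPartition(K key, V value, int numReduceTasks);
}

// Hypothetical partitioner for composite keys of the form "naturalKey#sortField".
class NaturalKeyPartitioner extends Partitioner<String, String> {

    @Override
    public int getPartition(String key, String value, int numReduceTasks) {
        // Use only the part before '#', so the sort field does not affect routing.
        int sep = key.indexOf('#');
        String naturalKey = sep >= 0 ? key.substring(0, sep) : key;
        // Same masking and remainder as the default hash partitioner.
        return (naturalKey.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

With this in place, keys such as "user42#2021-01-01" and "user42#2023-06-30" land in the same partition, and the driver would register the class through job.setPartitionerClass(NaturalKeyPartitioner.class).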

+16

Source: https://habr.com/ru/post/952606/