Partitioning: how does Hadoop do it? Does it use a hash function? What is the default function?

Question


Partitioning is the process of determining which reducer instance will receive which intermediate keys and values. Each mapper must determine, for all of its output (key, value) pairs, which reducer will receive them. For any given key, the destination partition must be the same regardless of which mapper instance generated it. Problem: how does Hadoop do it? Does it use a hash function? What is the default function?

+6

hash hadoop partitioning

cherri_zj Aug 27 '13 at 16:23


1 answer

tommy_o · Accepted Answer · 2013-08-27T16:45:48+0000

The default partitioner in Hadoop is the HashPartitioner, which has a method called getPartition. It computes key.hashCode() & Integer.MAX_VALUE (which clears the sign bit, so the result is never negative) and takes that value modulo the number of reduce tasks.

For example, if there are 10 reduce tasks, getPartition will return a value from 0 to 9 for every key.

Here is the code:

    public class HashPartitioner<K, V> extends Partitioner<K, V> {
        public int getPartition(K key, V value, int numReduceTasks) {
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }
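To see why the mask matters, here is a minimal standalone sketch of the same arithmetic, without any Hadoop types. The class name PartitionDemo and the helper partitionFor are made up for illustration; only the expression inside the helper comes from the code above.

```java
public class PartitionDemo {
    // Same arithmetic as getPartition above, written against plain Object
    // so it runs without Hadoop on the classpath (illustrative helper).
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 10;

        // Every key lands in a partition between 0 and numReduceTasks - 1.
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }

        // Integer.hashCode() is the int value itself, so this hash is negative.
        Integer negativeKey = -7;
        // Without the mask, Java's % keeps the sign of the dividend.
        System.out.println("unmasked: " + (negativeKey.hashCode() % numReduceTasks));
        // With the mask, the result is a valid partition index.
        System.out.println("masked:   " + partitionFor(negativeKey, numReduceTasks));
    }
}
```

Because the result depends only on the key's hashCode() and the number of reduce tasks, every mapper computes the same partition for the same key, which is the property the question asks about.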

To create a custom partitioner, you must extend Partitioner, implement the getPartition method, and then register your partitioner in the driver code ( job.setPartitionerClass(CustomPartitioner.class); ). This is especially useful if you are performing a secondary sort, for example.
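As a rough sketch of what such a partitioner could look like for a secondary sort, the example below partitions on only the first part of a composite key. Everything here is an assumption for illustration: the nested Partitioner class is a stand-in that mirrors the getPartition signature shown above so the sketch compiles without Hadoop, the "naturalKey#secondaryKey" string format is hypothetical, and NaturalKeyPartitioner is an invented name. In a real job you would extend org.apache.hadoop.mapreduce.Partitioner and use Writable key types instead of String.

```java
public class CustomPartitionerSketch {
    // Stand-in with the same getPartition signature as Hadoop's Partitioner,
    // declared locally so this sketch runs without Hadoop (assumption).
    public abstract static class Partitioner<K, V> {
        public abstract int getPartition(K key, V value, int numReduceTasks);
    }

    // Hypothetical partitioner for composite keys of the form
    // "naturalKey#secondaryKey": only the natural key decides the partition,
    // so all records for one natural key reach the same reducer.
    public static class NaturalKeyPartitioner extends Partitioner<String, String> {
        @Override
        public int getPartition(String key, String value, int numReduceTasks) {
            int sep = key.indexOf('#');
            String naturalKey = sep >= 0 ? key.substring(0, sep) : key;
            // Same masking and modulo as the default HashPartitioner.
            return (naturalKey.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }

    public static void main(String[] args) {
        NaturalKeyPartitioner partitioner = new NaturalKeyPartitioner();
        int numReduceTasks = 4;
        // Both records share the natural key "user1", so they share a partition.
        System.out.println(partitioner.getPartition("user1#2013-08-27", "a", numReduceTasks));
        System.out.println(partitioner.getPartition("user1#2013-08-28", "b", numReduceTasks));
    }
}
```

With the default HashPartitioner, the two composite keys above would be hashed as whole strings and could end up on different reducers, which is why a secondary sort usually needs its own partitioner.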
