The default partitioner in Hadoop is HashPartitioner, which has a method called getPartition. It computes key.hashCode() & Integer.MAX_VALUE, which clears the sign bit so the value is never negative, and then takes the result modulo the number of reduce tasks.
For example, if there are 10 reduce tasks, getPartition returns a value from 0 to 9 for every key.
Here is the code:
    public class HashPartitioner<K, V> extends Partitioner<K, V> {
        public int getPartition(K key, V value, int numReduceTasks) {
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }
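If you want to try the arithmetic without a cluster, here is a small standalone sketch. The class name HashPartitionDemo and the helper partitionFor are made up for illustration and are not part of Hadoop; the helper just repeats the expression from getPartition. It also shows why the mask is there: a plain % on a negative hash code would produce a negative, invalid partition number.

```java
// Hypothetical demo class, not part of Hadoop.
class HashPartitionDemo {
    // Same expression as HashPartitioner.getPartition, without the unused value argument.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 10;
        Object[] keys = {"apple", "banana", -42, Integer.MIN_VALUE};
        for (Object key : keys) {
            // Every result falls in the range 0..numReduceTasks-1.
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
        // Without the mask, a negative hash code gives a negative remainder.
        System.out.println(Integer.MIN_VALUE % numReduceTasks); // prints -8
    }
}
```

An Integer key uses its own value as its hash code, so -42 and Integer.MIN_VALUE are handy for checking the negative case: they land in partitions 6 and 0 respectively.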
To create a custom partitioner, extend Partitioner, override the getPartition method, and then register your partitioner in the driver code ( job.setPartitionerClass(CustomPartitioner.class); ). This is especially useful if you are doing a secondary sort, for example, where records that share the same natural key must all reach the same reducer.
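As a rough sketch of what such a custom partitioner might look like for a secondary sort: assume the map output key is a composite string of the form "naturalKey#sortField" (that format, and the names NaturalKeyPartitioner and NaturalKeyPartitionerDemo, are my own assumptions). So that the example runs without Hadoop on the classpath, it declares a local stand-in for Partitioner with the same getPartition signature. In a real job you would delete the stand-in, import org.apache.hadoop.mapreduce.Partitioner, and most likely use Text or a custom Writable instead of String.

```java
// Stand-in with the same shape as org.apache.hadoop.mapreduce.Partitioner,
// declared here only so the sketch compiles without Hadoop.
abstract class Partitioner<K, V> {
    public abstract int getPartition(K key, V value, int numReduceTasks);
}

// Hypothetical partitioner for composite keys like "naturalKey#sortField".
class NaturalKeyPartitioner extends Partitioner<String, String> {
    @Override
    public int getPartition(String key, String value, int numReduceTasks) {
        // Partition on the natural key only, ignoring the sort field,
        // so all records for one natural key go to the same reducer.
        int sep = key.indexOf('#');
        String naturalKey = sep >= 0 ? key.substring(0, sep) : key;
        return (naturalKey.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

class NaturalKeyPartitionerDemo {
    public static void main(String[] args) {
        NaturalKeyPartitioner partitioner = new NaturalKeyPartitioner();
        int first = partitioner.getPartition("user42#2021-01-01", "a", 10);
        int second = partitioner.getPartition("user42#2021-06-30", "b", 10);
        // Same natural key, different sort field: same partition.
        System.out.println(first == second); // prints true
    }
}
```

In the driver this would be wired up with job.setPartitionerClass(NaturalKeyPartitioner.class);. A full secondary sort normally also needs a sort comparator and a grouping comparator, which this sketch leaves out.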