When using Hadoop streaming, the partitioner and the sort comparator can be configured as follows:
hadoop jar /opt/hadoop/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar \
  -D mapreduce.map.output.key.field.separator=. \
  -D stream.map.output.field.separator= \
  -D stream.num.map.output.key.fields=2 \
  -D num.key.fields.for.partition=2 \
  -D mapreduce.job.output.key.comparator.class=org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedComparator \
  -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
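With the key field separator set to "." and partitioning on two key fields, only the first two fields of each key decide which reducer receives the record. As a rough illustration, here is a stdlib-only model of that idea. The class and method names are hypothetical, keys are assumed to be "."-separated as in the command above, and this is not Hadoop's own KeyFieldBasedPartitioner, so it should not be used to predict actual reducer numbers.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical, stdlib-only model of key-field partitioning. It is NOT
// Hadoop's KeyFieldBasedPartitioner; it only illustrates that fields
// start..end of the key alone pick the reducer.
final class KeyFieldPartitionSketch {

    // Returns fields start..end (1-based, inclusive) of key, re-joined with sep.
    static String keyFields(String key, String sep, int start, int end) {
        // Pattern.quote makes "." a literal separator rather than a regex wildcard
        String[] fields = key.split(Pattern.quote(sep), -1);
        int last = Math.min(end, fields.length);
        if (start > last) {
            return ""; // key has fewer fields than the spec asks for
        }
        return String.join(sep, Arrays.copyOfRange(fields, start - 1, last));
    }

    // Chooses a reducer in [0, numReducers) from fields start..end only.
    static int partition(String key, String sep, int start, int end, int numReducers) {
        String spec = keyFields(key, sep, start, end);
        // Clearing the sign bit keeps the result non-negative, as Hadoop's
        // default HashPartitioner also does
        return (spec.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // Keys that share fields 1 and 2 land on the same reducer
        String[] keys = {"user1.2015.a", "user1.2015.b", "user2.2016.a"};
        for (String k : keys) {
            System.out.println(k + " -> reducer " + partition(k, ".", 1, 2, 4));
        }
    }
}
```

Running main prints one line per key; the first two keys always share a reducer because they agree on "user1.2015".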
I would like to do the same from the main() method of my Java driver. The sort comparator can be configured like this:
job.setSortComparatorClass(KeyFieldBasedComparator.class);
KeyFieldBasedComparator.setKeyFieldComparatorOptions(job, "-k 1,2");
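KeyFieldBasedComparator supports a subset of the Unix sort key syntax, so "-k 1,2" means "compare from the start of field 1 to the end of field 2". As a rough, stdlib-only illustration (class and method names are hypothetical, and "."-separated keys are assumed, as in the streaming command), this sketch orders keys by their first two fields only. It is not Hadoop's comparator, which compares raw bytes and also accepts options such as -n (numeric) and -r (reverse).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, stdlib-only model of a "-k1,2" sort spec: keys are ordered
// by fields 1..2 only. It is NOT Hadoop's KeyFieldBasedComparator.
final class KeyFieldSortSketch {

    // Returns fields start..end (1-based, inclusive) of key, re-joined with sep.
    static String keyFields(String key, String sep, int start, int end) {
        String[] fields = key.split(Pattern.quote(sep), -1);
        int last = Math.min(end, fields.length);
        if (start > last) {
            return "";
        }
        return String.join(sep, Arrays.copyOfRange(fields, start - 1, last));
    }

    // Comparator that looks only at fields start..end of each key;
    // keys that agree on those fields compare as equal.
    static Comparator<String> byFields(String sep, int start, int end) {
        return Comparator.comparing(k -> keyFields(k, sep, start, end));
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>(Arrays.asList("b.1.x", "a.2.y", "a.1.z"));
        keys.sort(byFields(".", 1, 2));
        System.out.println(keys); // prints [a.1.z, a.2.y, b.1.x]
    }
}
```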
The setKeyFieldPartitionerOptions method of KeyFieldBasedPartitioner, however, is not static, so it has to be called on an instance:
KeyFieldBasedPartitioner partitioner = new KeyFieldBasedPartitioner();
partitioner.setKeyFieldPartitionerOptions(job, "-k 1,2");
On the Job object, however, I can only set the partitioner class, not an instance:
job.setPartitionerClass(KeyFieldBasedPartitioner.class);
How can I pass the partitioner options in this case? I could, of course, write my own partitioner class, but why go to that effort if there is an easier way?