Using KeyFieldBasedPartitioner and secondary sort in Java Hadoop, similar to Hadoop Streaming


When using Hadoop Streaming, the partitioner and the sort comparator can be set and configured as follows:

 hadoop jar /opt/hadoop/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar \
     -D mapreduce.map.output.key.field.separator=. \
     -D stream.map.output.field.separator= \
     -D stream.num.map.output.key.fields=2 \
     -D num.key.fields.for.partition=2 \
     -D mapreduce.job.output.key.comparator.class=org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedComparator \
     -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
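To make the goal concrete: with `.` as the key field separator and two key fields used for partitioning, all records whose keys share the first two fields should end up in the same partition. The following is a minimal plain-Java sketch of that idea only. It is not Hadoop's implementation (KeyFieldBasedPartitioner uses its own hash function), and the class name `KeyFieldPartitionSketch`, the method `partitionFor`, the sample keys, and the use of `String.hashCode` are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class KeyFieldPartitionSketch {

    // Picks a partition from the first `numFields` fields of the key only.
    // Hypothetical helper for illustration; not part of the Hadoop API.
    static int partitionFor(String key, String separator, int numFields, int numPartitions) {
        // Quote the separator so that "." is treated literally, not as a regex wildcard.
        String[] fields = key.split(Pattern.quote(separator), -1);
        int n = Math.min(numFields, fields.length);
        String prefix = String.join(separator, Arrays.copyOfRange(fields, 0, n));
        // Clear the sign bit so the modulo result is never negative.
        return (prefix.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int reducers = 4; // assumed number of reduce tasks
        String[] keys = {"user1.2015.a", "user1.2015.b", "user2.2015.a"};
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, ".", 2, reducers));
        }
    }
}
```

In this sketch the first two sample keys always map to the same partition because they share the prefix `user1.2015`; the third field is left free to be used for secondary sorting within that partition.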

I would like to configure the same partitioning and sorting in my Java main() method. The sort comparator can be set up as follows:

 job.setSortComparatorClass(KeyFieldBasedComparator.class);
 KeyFieldBasedComparator.setKeyFieldComparatorOptions(job, "-k 1,2");

The setKeyFieldPartitionerOptions method of the KeyFieldBasedPartitioner class, however, is not static, so it has to be called on an instance:

 KeyFieldBasedPartitioner partitioner = new KeyFieldBasedPartitioner();
 partitioner.setKeyFieldPartitionerOptions(job, "-k 1,2");

On the job object, however, I can only set the partitioner class, not an instance:

 job.setPartitionerClass(KeyFieldBasedPartitioner.class);

How can I pass the above options to the partitioner in this case? I could, of course, implement my own partitioner class, but why go to that effort if there should be an easy way?


java hadoop partitioner

Irondwarf Oct 24 '15 at 15:49


No one has answered this question yet.
