Using KeyFieldBasedPartitioner and secondary sort in Java Hadoop, as in Hadoop Streaming

When using Hadoop Streaming, the partitioner and the comparator can be set and configured as follows:

 hadoop jar /opt/hadoop/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar \
   -D mapreduce.map.output.key.field.separator=. \
   -D stream.map.output.field.separator= \
   -D stream.num.map.output.key.fields=2 \
   -D num.key.fields.for.partition=2 \
   -D mapreduce.job.output.key.comparator.class=org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedComparator \
   -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
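To make clear what I expect from these options, here is a plain-Java sketch (no Hadoop dependency) of partitioning on the first two "."-separated key fields. The class and method names (KeyFieldPartitionSketch, partitionFields, partitionFor) and the sample keys are hypothetical, and the hash is simply String.hashCode, which is an assumption and not necessarily what Hadoop's KeyFieldBasedPartitioner computes internally.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class KeyFieldPartitionSketch {

    // Returns the first numFields separator-delimited fields of the key,
    // i.e. the part of the key that a "-k 1,2"-style partitioner would consider.
    public static String partitionFields(String key, String separator, int numFields) {
        // Pattern.quote makes "." a literal separator instead of a regex wildcard.
        String[] fields = key.split(Pattern.quote(separator), -1);
        int n = Math.min(numFields, fields.length);
        return String.join(separator, Arrays.copyOf(fields, n));
    }

    // Maps the selected fields to a reducer index in [0, numReducers).
    // Assumption: String.hashCode stands in for Hadoop's internal hash.
    public static int partitionFor(String key, String separator, int numFields, int numReducers) {
        int hash = partitionFields(key, separator, numFields).hashCode();
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // Hypothetical keys: the first two share the prefix "11.12".
        String[] keys = {"11.12.1.2", "11.12.1.1", "11.14.2.3"};
        for (String key : keys) {
            System.out.println(key + " -> fields " + partitionFields(key, ".", 2)
                    + ", partition " + partitionFor(key, ".", 2, 4));
        }
    }
}
```

Keys that agree on the first two fields always land in the same partition, which is what lets the comparator then order them by the remaining fields within a single reducer.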

I would like to do the same in my Java main() method. Sorting can be done as follows:

 job.setSortComparatorClass(KeyFieldBasedComparator.class);
 KeyFieldBasedComparator.setKeyFieldComparatorOptions(job, "-k 1,2");

The setKeyFieldPartitionerOptions method of the KeyFieldBasedPartitioner class, however, is not static:

 KeyFieldBasedPartitioner partitioner = new KeyFieldBasedPartitioner();
 partitioner.setKeyFieldPartitionerOptions(job, "-k 1,2");

On the job object, however, I can only set the partitioner class, not an instance:

 job.setPartitionerClass(KeyFieldBasedPartitioner.class); 

How can I set the above options in this case? I could, of course, implement my own partitioner class, but why go to that effort if there should be a simple way?

