Sort by value in Hadoop from file

I have a file in which each line contains a string, then a space, and then a number.

Example:

 Line1: Word 2
 Line2: Word1 8
 Line3: Word2 1

I need to sort the lines by number in descending order and write the result to a file that also assigns a rank to each number. My output should therefore be a file in the following format:

 Line1: Word1 8 1
 Line2: Word 2 2
 Line3: Word2 1 3

Does anyone have an idea how I can do this in Hadoop? I am using Java with Hadoop.

3 answers

I worked out a solution to this problem. It was really simple.

To sort by value, you must use

 setOutputValueGroupingComparator(Class) 

To sort in descending order, use setSortComparatorClass(LongWritable.DecreasingComparator.class).

For ranking, use the Counter class: obtain a counter with getCounter and call increment on it.


You can structure the MapReduce computation as follows:

Map input: default

Map output: "key: number, value: word"

_ Sort phase, by key _

Here you need to override the default sorter so that it sorts in descending order.

Reduce - 1 reducer

Reduce input: "key: number, value: word"

Reduce output: "key: word, value: (number, rank)"

Keep a global counter in the reducer. Because there is only one reducer, it sees every key-value pair in sorted order, so incrementing the counter for each pair gives that pair's rank.
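The data flow described above can be tried out without a cluster. The following is only a plain-Java simulation of the three steps (map, descending sort by key, single reducer with a counter); it uses no Hadoop classes, and the names RankBySortedValue, Pair, map and rank are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of the map -> sort -> reduce flow; no Hadoop classes are used.
class RankBySortedValue {

    // Stand-in for one map output record: key = number, value = word.
    static final class Pair {
        final int number;
        final String word;

        Pair(int number, String word) {
            this.number = number;
            this.word = word;
        }
    }

    // "Map": split a "word number" line and emit (number, word).
    static Pair map(String line) {
        String[] parts = line.trim().split("\\s+");
        return new Pair(Integer.parseInt(parts[1]), parts[0]);
    }

    // "Sort" and "reduce": order the keys descending, then let one reducer
    // number the records with a running counter.
    static List<String> rank(List<String> lines) {
        List<Pair> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(map(line));
        }
        // Stands in for the custom descending sort comparator.
        pairs.sort((a, b) -> Integer.compare(b.number, a.number));

        List<String> out = new ArrayList<>();
        int counter = 0; // the "global counter" of the single reducer
        for (Pair p : pairs) {
            counter++;
            out.add(p.word + " " + p.number + " " + counter);
        }
        return out;
    }

    public static void main(String[] args) {
        // Input lines taken from the question.
        for (String line : rank(List.of("Word 2", "Word1 8", "Word2 1"))) {
            System.out.println(line);
        }
    }
}
```

Note that the counter only yields a correct global rank because all records pass through a single sorted stream, which is why the answer asks for one reducer.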

Edit: Here is a code snippet of a custom descending sorter:

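 // Imports needed at the top of the enclosing file:
 import java.nio.ByteBuffer;
 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.WritableComparator;

 public static class IntComparator extends WritableComparator {

     public IntComparator() {
         super(IntWritable.class);
     }

     // Compare the serialized keys directly, without deserializing them
     // into IntWritable objects.
     @Override
     public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
         Integer v1 = ByteBuffer.wrap(b1, s1, l1).getInt();
         Integer v2 = ByteBuffer.wrap(b2, s2, l2).getInt();
         return v1.compareTo(v2) * (-1); // negate for descending order
     }
 }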

Remember to set it as the sort comparator for your job:

 job.setSortComparatorClass(IntComparator.class); 
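To see what the byte-level comparison does, here is a small plain-Java check that runs without Hadoop. It assumes the key bytes are a 4-byte big-endian int, which is the layout DataOutput.writeInt produces; the class DescendingIntBytes and its encode helper are hypothetical names used only for this sketch.

```java
import java.nio.ByteBuffer;

// Plain-Java check of the byte-level comparison; no Hadoop classes are used.
class DescendingIntBytes {

    // Same idea as the compare(byte[], int, int, byte[], int, int) override:
    // decode an int from each slice, then invert the natural order.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int v1 = ByteBuffer.wrap(b1, s1, l1).getInt();
        int v2 = ByteBuffer.wrap(b2, s2, l2).getInt();
        return Integer.compare(v2, v1); // swapped arguments give descending order
    }

    // Encode an int as 4 big-endian bytes (assumed serialized form of the key).
    static byte[] encode(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    public static void main(String[] args) {
        byte[] eight = encode(8);
        byte[] two = encode(2);
        System.out.println(compare(eight, 0, 4, two, 0, 4)); // negative: 8 sorts before 2
        System.out.println(compare(two, 0, 4, eight, 0, 4)); // positive: 2 sorts after 8
        System.out.println(compare(two, 0, 4, two, 0, 4));   // zero: equal keys
    }
}
```

Swapping the arguments of Integer.compare is one way to reverse the order; multiplying the result of compareTo by -1, as in the answer, has the same effect here.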

Hadoop Streaming - Hadoop 1.0.x

According to this, after

 bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.*.jar

  • you add a comparator

    -D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator

  • and indicate the type of sorting you want

    -D mapred.text.key.comparator.options=-[options]

where [options] are similar to those of Unix sort. Here are some examples:

Reverse order

 -D mapred.text.key.comparator.options=-r 

Sort by numeric values

 -D mapred.text.key.comparator.options=-n 

Sort by value or in any other field

 -D mapred.text.key.comparator.options=-kx,y 

With the -k flag, you specify the sort key, and the x and y parameters define which fields make up that key. So, if you have a string with more than one token, you can choose which token, or which combination of tokens, is used as the sort key. See the links for more details and examples.
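To make the combined effect of these options concrete, the sketch below imitates in plain Java the ordering that Unix sort would produce with -k2,2 -n -r on the question's input: pick the second whitespace-separated field, compare it numerically, and reverse the order. It does not call Hadoop or KeyFieldBasedComparator, and in a real streaming job the fields are taken from the map output key rather than from the raw line. The names KeyFieldSortSketch and sortByFieldNumericReverse are invented for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java imitation of "sort by field, numerically, reversed"; no Hadoop classes are used.
class KeyFieldSortSketch {

    // Sort whitespace-separated lines by the given 1-based field,
    // comparing that field as a number, largest first.
    static List<String> sortByFieldNumericReverse(List<String> lines, int field) {
        List<String> sorted = new ArrayList<>(lines);
        Comparator<String> byField = Comparator.comparingLong(
                line -> Long.parseLong(line.trim().split("\\s+")[field - 1]));
        sorted.sort(byField.reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Input lines taken from the question; field 2 holds the number.
        List<String> lines = List.of("Word 2", "Word1 8", "Word2 1");
        for (String line : sortByFieldNumericReverse(lines, 2)) {
            System.out.println(line);
        }
    }
}
```

This only covers the ordering; assigning the rank would still need a counting step after the sort.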


Source: https://habr.com/ru/post/1383346/

