How to keep only the row with the maximum column-2 value for each column-1 value

I have a file with 3 columns:

a 03 w
a 10 x
a 01 y
b 20 w
b 01 x
c 02 w
c 10 y
c 12 z

Expected result:

a 10 x
b 20 w
c 12 z

That is, for each value in column 1 I need the row with the largest value in column 2, keeping the column-1 groups in their original order.
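A minimal single-pass sketch of this requirement in awk, assuming whitespace-separated columns and a numeric column 2. The function name max_per_group is hypothetical; it keeps the groups in the order in which each column-1 key first appears.

```shell
#!/bin/sh
# max_per_group (hypothetical name): for each value of column 1, print the line
# with the largest numeric column 2. Reads the files given as arguments, or stdin.
max_per_group() {
  awk '
    !($1 in best) { order[++n] = $1 }          # first time this key appears: remember its position
    !($1 in best) || $2 + 0 > best[$1] {       # new key, or a larger value than the stored one
      best[$1] = $2 + 0                        # numeric maximum so far for this key
      line[$1] = $0                            # whole line that holds that maximum
    }
    END { for (i = 1; i <= n; i++) print line[order[i]] }   # groups in first-appearance order
  ' "$@"
}

printf 'a 03 w\na 10 x\na 01 y\nb 20 w\nb 01 x\nc 02 w\nc 10 y\nc 12 z\n' | max_per_group
# a 10 x
# b 20 w
# c 12 z
```

On ties, the first row with the maximum value is kept. With a file, the call is simply max_per_group infile.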

4 answers

Two approaches (choose one of them):

1) The sort + uniq "trick":

sort -k1,1 -k2,2rn file | uniq -w1
  • -k1,1 - sort rows by the 1st field first

  • -k2,2rn - then sort rows with the same 1st field by the 2nd field, numerically, in reverse (descending) order

  • uniq -w1 - print only the first line of each run of lines whose first character is the same (GNU uniq; compare more characters with -w<number>)

Output:

a 10 x
b 20 w
c 12 z
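Because uniq -w1 compares only the first character, this works for the one-letter keys a, b and c but merges keys that merely share a first letter. A sketch, assuming GNU uniq (for -w) and column-1 keys that all have the same length; top_by_prefix is a hypothetical wrapper name:

```shell
#!/bin/sh
# top_by_prefix WIDTH [FILE...] (hypothetical name): the sort + uniq trick with a
# configurable width. Lines are grouped by their first WIDTH characters, so this is
# reliable only when every column-1 key is exactly WIDTH characters long.
top_by_prefix() {
  width=$1
  shift
  sort -k1,1 -k2,2rn "$@" | uniq -w "$width"
}

# Keys "ab" and "ac" share their first character:
printf 'ab 1 x\nac 5 y\nab 7 z\n' | top_by_prefix 1   # ab 7 z only (ac is merged away)
printf 'ab 1 x\nac 5 y\nab 7 z\n' | top_by_prefix 2   # ab 7 z, then ac 5 y
```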

2) GNU datamash:

datamash -Wsf -g1 max 2 <file | cut -f1-3

Output:

a   10  x
b   20  w
c   12  z

$ cat infile
a 03 w
a 10 x
a 01 y
b 20 w
b 01 x
c 02 w
c 10 y
c 12 z

$ awk -F'[[:blank:]]' '{f=($1 in b)}f && b[$1]<$2 || !f{a[$1]=$0;b[$1]=$2}END{for(i in a)print a[i]}' infile
a 10 x
b 20 w
c 12 z

awk -F'[[:blank:]]' '
                     {
                       f=($1 in b)
                     }
                     f && b[$1]<$2 || !f{
                        a[$1]=$0;
                        b[$1]=$2
                     }
                  END{
                        for(i in a)
                            print a[i]
                     }
                    ' infile

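  • -F'[[:blank:]]' - use a single space or tab as the field separator

  • f=($1 in b) - set f to 1 (true) if the key $1 is already an index of array b, otherwise 0 (false)

  • f && b[$1]<$2 || !f - if the key has been seen (f) and the value stored in b[$1] is less than the current $2, or (||) the key has not been seen yet (!f), then:

  • a[$1]=$0; - store the whole line ($0) in array a under the key $1

  • b[$1]=$2 - store the value ($2) in array b under the key $1

  • END { for(i in a) print a[i] } - in the END block, print every line stored in array a (awk does not specify the order in which for (i in a) visits the keys)

Note: -F'...' splits on every single blank, so if the columns may be separated by several spaces, drop -F'...' and rely on awk's default field splitting.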
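A quick way to see the difference between the two field-splitting modes, using hypothetical helper names that print column 2 in brackets so that an empty field is visible:

```shell
#!/bin/sh
# Hypothetical helpers: print column 2 in brackets to make empty fields visible.
field2_single_blank() { awk -F'[[:blank:]]' '{ print "[" $2 "]" }'; }  # every single space or tab separates
field2_default()      { awk '{ print "[" $2 "]" }'; }                  # runs of blanks separate

printf 'a  10 x\n' | field2_single_blank   # [] : the doubled space yields an empty $2
printf 'a  10 x\n' | field2_default        # [10]
```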


With UNIX sort and awk:

sort -k1,1 -k2,2nr file | awk '!seen[$1]++'

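Or, to filter the whole buffer from inside vim (the ! is escaped because vim would otherwise replace it with the previous command):

:%!sort -k1,1 -k2,2nr | awk '\!seen[$1]++'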

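Explanation:

sort orders the rows by column 1 and then by column 2 (numerically, in descending order), so the largest column-2 value comes first in each group: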

a 10 x
a 03 w
a 01 y
b 20 w
b 01 x
c 12 z
c 10 y
c 02 w

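The awk script keeps a counter per column-1 value in the array seen. seen[$1]++ returns the count before incrementing it, so !seen[$1]++ is true only for the first row of each group, and only that row is printed: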

a 10 x  <-- print
a 03 w
a 01 y
b 20 w  <-- print
b 01 x
c 12 z  <-- print
c 10 y
c 02 w
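To see the counter at work, a sketch that prints the value seen[$1]++ returns for each sorted row (trace_seen is a hypothetical helper name):

```shell
#!/bin/sh
# trace_seen (hypothetical name): shows what seen[$1]++ returns for each row.
# The post-increment yields the count before incrementing, so it is 0 (and
# !seen[$1]++ is therefore true) only for the first row of each key.
trace_seen() {
  awk '{ c = seen[$1]++; print c, (c == 0 ? "print" : "skip"), $0 }'
}

printf 'a 03 w\na 10 x\na 01 y\nb 20 w\nb 01 x\nc 02 w\nc 10 y\nc 12 z\n' |
  sort -k1,1 -k2,2nr | trace_seen
# 0 print a 10 x
# 1 skip a 03 w
# 2 skip a 01 y
# 0 print b 20 w
# 1 skip b 01 x
# 0 print c 12 z
# 1 skip c 10 y
# 2 skip c 02 w
```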

Could you please try the following.

awk '
{
  b[$1]=a[$1]>$2?(b[$1]?b[$1]:$0):$0;
  a[$1]=a[$1]>$2?a[$1]:$2;
}
END{
  for(i in a){
     print b[i]
  }
}
'   Input_file

Explanation:

awk '
{                                    ##Start of the main block.
  b[$1]=a[$1]>$2?(b[$1]?b[$1]:$0):$0;##Array b, indexed by $1, holds the line with the largest $2 seen so far: if a[$1] is greater than $2, keep b[$1] (or take the current line if b[$1] is still empty); otherwise replace it with the current line.
  a[$1]=a[$1]>$2?a[$1]:$2; ##Array a, indexed by $1, holds the largest $2 seen so far: keep a[$1] if it is greater than $2, otherwise set it to the current $2.
}
END{                       ##Start of the END block.
  for(i in a){             ##Loop over all indexes of array a (awk does not guarantee the order).
     print b[i]            ##Arrays a and b share the same indexes, so print the whole line stored in b.
  }
}
'  Input_file              ##The input file name.
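Because for (i in a) visits the indexes in an unspecified order, the lines may come out in any order. A minimal sketch, assuming key-sorted output is acceptable, wraps the same awk program and pipes it through sort -k1,1; max_per_key_sorted is a hypothetical name:

```shell
#!/bin/sh
# max_per_key_sorted (hypothetical name): the ternary-based awk program above,
# followed by sort -k1,1 so the output order is predictable.
max_per_key_sorted() {
  awk '
    {
      b[$1] = a[$1] > $2 ? (b[$1] ? b[$1] : $0) : $0   # line with the largest $2 so far
      a[$1] = a[$1] > $2 ? a[$1] : $2                   # largest $2 so far
    }
    END { for (i in a) print b[i] }                     # order not specified by awk
  ' "$@" | sort -k1,1
}

printf 'c 02 w\nc 12 z\na 03 w\na 10 x\nb 20 w\nb 01 x\n' | max_per_key_sorted
# a 10 x
# b 20 w
# c 12 z
```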

Source: https://habr.com/ru/post/1685277/
