Unix: sort groups by their maximum value?

Question


Let's say I have this input file 49142202.txt:

Can I sort the groups in column 1 by their maximum value in column 2 (largest first), and sort the rows within each group by column 2 in descending order? The desired output is as follows:

B   6 <-- B group at the top, because 6 is larger than 5 and 3
B   2 <-- 2 is less than 6
A   5 <-- A group in the middle, because 5 is smaller than 6 and larger than 3
A   4 <-- 4 is less than 5
C   3 <-- C group at the bottom, because 3 is smaller than 6 and 5
C   1 <-- 1 is less than 3

Here is my solution:

join -t$'\t' -1 2 -2 1 \
 <(cat 49142202.txt | sort -k2nr,2 | sort --stable -k1,1 -u | sort -k2nr,2 \
  | cut -f1 | nl | tr -d " " | sort -k2,2) \
 <(cat 49142202.txt | sort -k1,1 -k2nr,2) \
| sort --stable -k2n,2 | cut -f1,3

The first input to join, sorted by column 2, is this:

2   A
1   B
3   C

The second input to join, sorted by column 1, is this:

A   5
A   4
B   6
B   2
C   3
C   1

The output of join is:

A   2   5
A   2   4
B   1   6
B   1   2
C   3   3
C   3   1

This is then sorted by the nl row number in column 2, and cut keeps columns 1 and 3, which are the original input columns.
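The same pipeline can be split into named steps so that each intermediate result (the two join inputs and the final output) can be inspected on its own. This is a sketch, assuming a tab-separated file with the group in column 1 and a number in column 2, and GNU sort for `--stable`; the function names `rank_groups`, `rows_by_group` and `group_sort` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the join pipeline from the question, split into named steps.
# Assumes a tab-separated file: group in column 1, number in column 2.

# First join input: "rank<TAB>group", where rank 1 is the group with the
# largest value; sorted by group name because join needs its input sorted
# on the join field.
rank_groups() {
  sort -k2nr,2 "$1" | sort --stable -k1,1 -u | sort -k2nr,2 \
    | cut -f1 | nl | tr -d ' ' | sort -k2,2
}

# Second join input: rows sorted by group, then by value (largest first).
rows_by_group() {
  sort -k1,1 -k2nr,2 "$1"
}

# Join on the group name (field 2 of the first input, field 1 of the second),
# order by rank without disturbing the per-group order, keep group and value.
group_sort() {
  join -t $'\t' -1 2 -2 1 <(rank_groups "$1") <(rows_by_group "$1") \
    | sort --stable -k2n,2 | cut -f1,3
}
```

For example, `rank_groups 49142202.txt` prints the first join input and `group_sort 49142202.txt` prints the final result.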

I could do this with groupby in pandas (Python), but is there a simpler way with GNU Coreutils tools such as sort, join, cut, tr and nl, or with awk? Thanks!
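Another way to get the same ordering with awk, sort and cut is decorate-sort-undecorate: prefix every row with its group's maximum, sort on that prefix, then cut it off again. This is a sketch, assuming tab-separated input with a numeric column 2; `sort_groups_by_max` is a hypothetical name, and the file is read twice (once to find each group's maximum, once to decorate the rows).

```shell
#!/usr/bin/env bash
# Decorate-sort-undecorate sketch (hypothetical helper name).
# Assumes tab-separated input: group in column 1, number in column 2.
# awk reads the file twice: the first pass (NR == FNR) records each group's
# maximum, the second prints "max<TAB>original row". sort then orders by that
# maximum (largest first), by group name to keep tied groups apart, and by
# value (largest first); cut finally drops the added max column.
sort_groups_by_max() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { v = $2 + 0; if (!($1 in top) || v > top[$1]) top[$1] = v; next }
    { print top[$1], $0 }
  ' "$1" "$1" | sort -t $'\t' -k1,1nr -k2,2 -k3,3nr | cut -f2-
}
```

Using the group name as the secondary sort key keeps two groups that share the same maximum from being interleaved.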


sorting unix bash grouping gnu-coreutils

tommy.carstensen · Mar 7 '18 at 0:26

3 Answers

Allan · Answer 1 · 2018-03-07T07:01:47+0000

You can do this with fewer pipes, without cat and with fewer sort calls, using a for loop and awk:

Given the input file f_grp_sort, run:

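# For each group, in order of its maximum value, print its rows (already sorted by column 2, descending)
for elem in $(sort -k2nr f_grp_sort | awk '!seen[$1]++{print $1}')
do
   grep "^$elem[[:space:]]" <(sort -k2nr f_grp_sort)   # anchored, so "A" does not also match "AB"
done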

Explanation:

First, sort -k2nr f_grp_sort sorts all rows by column 2 in descending order.

Then sort -k2nr f_grp_sort | awk '!seen[$1]++{print $1}' prints each group name once, in order of its maximum value:

B
A
C

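The awk command prints column 1 only the first time each value appears, which removes duplicates while keeping the order.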
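The awk idiom works because `seen[$1]++` returns the count before incrementing, which is 0 (false) the first time a key appears; negating it makes the pattern true only on that first line. A minimal sketch wrapping it in a hypothetical helper `first_per_group`:

```shell
#!/usr/bin/env bash
# seen[$1]++ yields the count *before* incrementing: 0 (false) the first time a
# column-1 value appears, non-zero afterwards. Negating it selects only the first
# line for each value, so duplicates are dropped while input order is preserved.
first_per_group() {
  awk '!seen[$1]++ { print $1 }'
}
```

Usage: `sort -k2nr f_grp_sort | first_per_group`.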

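The loop for elem in $(...); do grep $elem <(sort -k2nr f_grp_sort); done then greps the sorted file for each group name in turn (B, then A, then C), so the groups come out in that order.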

To avoid running sort -k2nr f_grp_sort on every iteration, you can write its output to a temporary file once:

$ sort -k2nr f_grp_sort > tmp_sorted_file && for elem in $(awk '!seen[$1]++{print $1}' tmp_sorted_file); do grep "^$elem[[:space:]]" tmp_sorted_file; done && rm tmp_sorted_file
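The one-liner writes tmp_sorted_file into the current directory. A sketch of the same idea as a function, using mktemp for a private temporary file and anchoring each grep to the start of the line so that a group such as A does not also match AB; `group_sort_tmp` is a hypothetical name, and group names are assumed to contain no whitespace or regex metacharacters.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: sort once into a private temp file created by mktemp,
# print each group's rows in order of the group's maximum, then remove the file.
group_sort_tmp() {
  local input=$1 tmp elem
  tmp=$(mktemp) || return 1
  sort -k2nr "$input" > "$tmp"
  for elem in $(awk '!seen[$1]++ { print $1 }' "$tmp"); do
    # Match the group name only at the start of the line, followed by a blank
    grep "^$elem[[:space:]]" "$tmp"
  done
  rm -f "$tmp"
}
```

Usage: `group_sort_tmp f_grp_sort`.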

Jeff Breadner · Answer 2 · 2018-03-07T03:24:52+0000

Here is a solution in bash.

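First, the file is read line by line into the variables col1 and col2. For each value in column 1, such as A or B, an array named ARR_A, ARR_B and so on is built (the name includes $col1, which bash only allows through eval), and each column 2 value is appended to the array for its column 1 value.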

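Next, the distinct values of column 1 are listed in order of their maximum value, and for each one the script prints column 1 followed by every column 2 value from its array, sorted in descending order.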

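Because the array names are built dynamically and passed to eval, the script first checks that column 1 contains only characters that are valid in a bash variable name (letters, digits and underscores); otherwise it exits with an error.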

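#!/usr/bin/env bash
file=./49142202.txt

# Pass 1: append each column-2 value to an array named after its group (ARR_A, ARR_B, ...)
while read -r col1 col2 extra
do
  # The group name becomes part of a variable name passed to eval, so restrict it
  if [[ "$col1" =~ ^[a-zA-Z0-9_]+$ ]]
  then
    eval 'ARR_'${col1}'+=("'${col2}'")'
  else
    echo "Bad character detected in Column 1:  '$col1'"
    exit 1
  fi
done < "$file"

# Pass 2: list the groups ordered by their maximum value, then print each group's values
sort -k2nr,2 "$file" | sort --stable -k1,1 -u | sort -k2nr,2 | while read -r col1 extra
do
  # sort -nr compares numerically, so 10 sorts above 9
  for col2 in $(eval 'printf "%s\n" "${ARR_'${col1}'[@]}"' | sort -nr)
  do
    echo "$col1 $col2"
  done
done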

Example run:

$ cat 49142202.txt
A 4
B 6
C 3
A 5
B 2
C 1
C 0

$ ./run
B 6
B 2
A 5
A 4
C 3
C 1
C 0
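bash 4 and later support associative arrays (`declare -A`), which can replace the eval-built ARR_<group> names so that no code is constructed from the input. This is a sketch under stated assumptions: bash 4 or newer, whitespace-separated input, and integer values in column 2 (the maximum is tracked with bash arithmetic, which has no floating point); `group_sort_assoc` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Requires bash >= 4 (associative arrays). Assumes whitespace-separated input
# with integer values in column 2, since the maximum is tracked with bash arithmetic.
group_sort_assoc() {
  local file=$1 col1 col2 extra key group v
  local -A vals=()   # group -> newline-terminated list of its column-2 values
  local -A max=()    # group -> largest column-2 value seen so far
  while read -r col1 col2 extra; do
    vals[$col1]+="$col2"$'\n'
    if [[ -z ${max[$col1]+set} ]] || (( col2 > ${max[$col1]} )); then
      max[$col1]=$col2
    fi
  done < "$file"
  # Emit "max<TAB>group", order by max (descending) then name, keep the name,
  # and print each group's values in descending numeric order.
  for key in "${!max[@]}"; do
    printf '%s\t%s\n' "${max[$key]}" "$key"
  done | sort -k1,1nr -k2,2 | cut -f2 | while read -r group; do
    printf '%s' "${vals[$group]}" | sort -nr | while read -r v; do
      printf '%s %s\n' "$group" "$v"
    done
  done
}
```

Usage: `group_sort_assoc 49142202.txt`.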

tommy.carstensen · Answer 3 · 2018-03-07T12:49:13+0000

Thanks a lot @JeffBreadner and @Allan! I came up with another solution, which is very similar to my first one but gives a little more control, because it makes nesting for loops easier:

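file=49142202.txt
# For each group, in order of its maximum value, print that group's rows sorted by column 2 (descending)
for x in $(sort -k2nr,2 "$file" | sort --stable -k1,1 -u | sort -k2nr,2 | cut -f1); do
  awk -v x="$x" '$1==x' "$file" | sort -k2nr,2
done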

Do you mind if I don't accept any of your answers until I have had time to evaluate the time and memory efficiency of your solutions? Otherwise, I will probably just go for the awk solution by @Allan.

