Let's say I have this input file 49142202.txt:
A 5
B 6
C 3
A 4
B 2
C 1
Can I sort the groups in column 1 by the value in column 2? The desired output is as follows:
B 6 <-- B group at the top, because 6 is larger than 5 and 3
B 2 <-- 2 less than 6
A 5 <-- A group in the middle, because 5 is smaller than 6 and larger than 3
A 4 <-- 4 less than 5
C 3 <-- C group at the bottom, because 3 is smaller than 6 and 5
C 1 <-- 1 less than 3
Here is my solution:
join -t$'\t' -1 2 -2 1 \
<(cat 49142202.txt | sort -k2nr,2 | sort --stable -k1,1 -u | sort -k2nr,2 \
| cut -f1 | nl | tr -d " " | sort -k2,2) \
<(cat 49142202.txt | sort -k1,1 -k2nr,2) \
| sort --stable -k2n,2 | cut -f1,3
The first input to join, sorted on column 2, is:
2 A
1 B
3 C
The second input to join, sorted on column 1, is:
A 5
A 4
B 6
B 2
C 3
C 1
The output of join:
A 2 5
A 2 4
B 1 6
B 1 2
C 3 3
C 3 1
This is then sorted by the nl line number in column 2, and cut keeps columns 1 and 3, which are the original input columns.
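For readability, the same pipeline can be broken into three named steps. This is only a sketch: the function name `sort_groups` and the temporary files are introduced here for illustration, and it assumes the input is tab-separated, as the `join -t$'\t'` above implies.

```shell
# sort_groups FILE: hypothetical helper name, not part of any tool.
# FILE is assumed to be tab-separated: group<TAB>value.
# The body runs in a subshell so its variables stay local.
sort_groups() (
    tab=$(printf '\t')
    ranks=$(mktemp) || exit 1
    rows=$(mktemp) || exit 1

    # Step 1: rank the groups by their largest value -> "rank<TAB>group".
    #   sort -k2,2nr       rows by value, descending
    #   sort -s -u -k1,1   keep the first (largest) row of each group
    #   sort -k2,2nr       order the per-group maxima, descending
    #   cut | nl | tr      number the group names, strip nl's padding
    #   sort -k2,2         order by group name, as join requires
    sort -t "$tab" -k2,2nr "$1" |
        sort -t "$tab" -s -u -k1,1 |
        sort -t "$tab" -k2,2nr |
        cut -f1 | nl | tr -d ' ' |
        sort -t "$tab" -k2,2 > "$ranks"

    # Step 2: the original rows, by group name, then value descending.
    sort -t "$tab" -k1,1 -k2,2nr "$1" > "$rows"

    # Step 3: attach each group's rank, stable-sort by rank so the
    # within-group order survives, then drop the rank column.
    join -t "$tab" -1 2 -2 1 "$ranks" "$rows" |
        sort -t "$tab" -s -k2,2n | cut -f1,3

    rm -f "$ranks" "$rows"
)
```

Calling `sort_groups 49142202.txt` should reproduce the desired ordering shown above. Like the one-liner, it relies on `tr -d " "`, so group names containing spaces would be altered.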
Is there a simpler way to do this, whether with groupby in pandas (Python), with GNU Coreutils (sort, join, cut, tr, nl), or with awk? Thanks!