AWK - How to do selective sorting of multiple columns?

In awk, how can I do this:

Input:

    1 a f 1 12 v
    2 b g 2 10 w
    3 c h 3 19 x
    4 d i 4 15 y
    5 e j 5 11 z

Required output, sorted numerically on $5:

    1 a f 2 10 w
    2 b g 5 11 z
    3 c h 1 12 v
    4 d i 4 15 y
    5 e j 3 19 x

Note that the sort should only affect $4, $5 and $6 (ordered by the value of $5); the preceding columns of the table must remain intact.

3 answers

This can be done in a few steps with paste:

    $ gawk '{print $1, $2, $3}' in.txt > a.txt
    $ gawk '{print $4, $5, $6}' in.txt | sort -k 2 -n > b.txt
    $ paste -d' ' a.txt b.txt
    1 a f 2 10 w
    2 b g 5 11 z
    3 c h 1 12 v
    4 d i 4 15 y
    5 e j 3 19 x

Personally, I find that using awk to safely sort arrays of columns is quite difficult, because you often need to hold and sort duplicate keys. If you need to selectively sort a group of columns, I would call on paste for some help (the <(...) process substitution needs a shell such as bash):

    paste -d ' ' <(awk '{ print $1, $2, $3 }' file.txt) <(awk '{ print $4, $5, $6 | "sort -k 2 -n" }' file.txt)

Results:

    1 a f 2 10 w
    2 b g 5 11 z
    3 c h 1 12 v
    4 d i 4 15 y
    5 e j 3 19 x
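
As a rough sketch of how duplicate keys might be handled with the same sort and paste approach: when two rows share a value in $5, a secondary sort key makes the order predictable. Everything below is an assumption for illustration, not part of the answer above: the function name selective_sort is hypothetical, breaking ties on $4 is just one possible choice, and mktemp is assumed to be available. It avoids process substitution so it also runs in plain sh.

```shell
# Hypothetical helper: keep columns 1-3 in place, reorder columns 4-6 by $5.
# Assumes six whitespace-separated columns and that mktemp exists.
selective_sort() {
    lhs_file=$(mktemp) || return 1

    # Left-hand columns stay in their original order.
    awk '{ print $1, $2, $3 }' "$1" > "$lhs_file"

    # Right-hand columns: numeric sort on their 2nd field (the original $5),
    # with ties broken numerically on their 1st field (the original $4).
    awk '{ print $4, $5, $6 }' "$1" |
        sort -k2,2n -k1,1n |
        paste -d ' ' "$lhs_file" -

    rm -f "$lhs_file"
}
```

With an input whose $5 column contains 12, 10, 12, 10, the two rows with 10 come first, ordered by $4, followed by the two rows with 12.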

This can be done in pure awk, but as @steve said, it is not ideal. gawk has limited sorting functions, and standard awk has no built-in sorting at all. However, here is a (rather hacky) solution using a comparison function in gawk:

    [ghoti@pc ~/tmp3]$ cat text
    1 a f 1 12 v
    2 b g 2 10 w
    3 c h 3 19 x
    4 d i 4 15 y
    5 e j 5 11 z
    [ghoti@pc ~/tmp3]$ cat doit.gawk
    ### Function to be called by asort().
    function cmp(i1,v1,i2,v2) {
      split(v1,a1); split(v2,a2);
      if (a1[2]>a2[2]) { return 1; }
      else if (a1[2]<a2[2]) { return -1; }
      else { return 0; }
    }

    ### Left-hand-side and right-hand-side, are sorted differently.
    {
      lhs[NR]=sprintf("%s %s %s",$1,$2,$3);
      rhs[NR]=sprintf("%s %s %s",$4,$5,$6);
    }

    END {
      asort(rhs,sorted,"cmp");   ### This calls the function we defined, above.
      for (i=1;i<=NR;i++) {      ### Step through the arrays and reassemble.
        printf("%s %s\n",lhs[i],sorted[i]);
      }
    }
    [ghoti@pc ~/tmp3]$ gawk -f doit.gawk text
    1 a f 2 10 w
    2 b g 5 11 z
    3 c h 1 12 v
    4 d i 4 15 y
    5 e j 3 19 x
    [ghoti@pc ~/tmp3]$

This holds your entire input file in arrays so that the lines can be reassembled after sorting. If your input is millions of lines, this can be problematic.
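
For an awk that lacks asort(), the same in-memory idea could be sketched with a hand-written sort. This is only an illustration under stated assumptions, not the method used above: the function name selective_sort_posix is hypothetical, the input is assumed to have six whitespace-separated columns, and a simple insertion sort is swapped in for asort(). It is quadratic in the number of lines, so it has the same (or worse) scaling problem for very large inputs.

```shell
# Hypothetical helper: selective sort using only standard awk features.
# Reads the named files, or standard input when no file is given.
selective_sort_posix() {
    awk '
    {
        lhs[NR] = $1 " " $2 " " $3     # columns that stay in place
        rhs[NR] = $4 " " $5 " " $6     # columns that move together
        key[NR] = $5 + 0               # numeric sort key
    }
    END {
        # Insertion sort of rhs[] by key[]; equal keys keep input order.
        for (i = 2; i <= NR; i++) {
            k = key[i]; r = rhs[i]
            for (j = i - 1; j >= 1 && key[j] > k; j--) {
                key[j + 1] = key[j]
                rhs[j + 1] = rhs[j]
            }
            key[j + 1] = k
            rhs[j + 1] = r
        }
        # Reassemble: untouched left side plus sorted right side.
        for (i = 1; i <= NR; i++)
            print lhs[i], rhs[i]
    }' "$@"
}
```

Calling selective_sort_posix text on the sample input should print the same five lines as the gawk version.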

Note that you can play with printf and sprintf to set the appropriate output field separators.

You can find documentation on using asort() with comparison functions in the gawk man page; search for PROCINFO["sorted_in"].
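
To give an idea of what PROCINFO["sorted_in"] looks like in use, here is a minimal sketch that lets gawk's own array traversal order do the sorting instead of a comparison function. It assumes gawk 4.0 or later, where the predefined "@val_num_asc" order is available; the function name selective_sort_gawk and the separate key[] array are hypothetical choices made for this example.

```shell
# Hypothetical helper: requires gawk 4.0+ for PROCINFO["sorted_in"].
selective_sort_gawk() {
    gawk '
    {
        lhs[NR] = $1 " " $2 " " $3     # columns that stay in place
        rhs[NR] = $4 " " $5 " " $6     # columns that move together
        key[NR] = $5 + 0               # numeric value to order by
    }
    END {
        # Visit key[] in ascending numeric order of its values.
        PROCINFO["sorted_in"] = "@val_num_asc"
        n = 0
        for (idx in key)
            print lhs[++n], rhs[idx]
    }' "$@"
}
```

Here idx walks the original line numbers in the order of their $5 values, while n counts output lines, so each left-hand side is paired with the next right-hand side in sorted order.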


Source: https://habr.com/ru/post/1437258/

