Multi-key sorting using the Linux sort command

Say I have this file.

    $ cat a.txt
    c 1002 4
    f 1001 1
    d 1003 1
    a 1001 3
    e 1004 2
    b 1001 2

I want to sort it by the second column and then by the third column. Column two contains numbers, and column three can be treated as a string. I know the following command works well:

    $ sort -k2,2n -k3,3 a.txt
    f 1001 1
    b 1001 2
    a 1001 3
    c 1002 4
    d 1003 1
    e 1004 2
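A minimal script to reproduce this, assuming a POSIX shell and GNU sort. Setting LC_ALL=C is my own addition so the result does not depend on the locale's collation rules:

```shell
#!/bin/sh
# Recreate a.txt from the question, one record per line.
printf '%s\n' 'c 1002 4' 'f 1001 1' 'd 1003 1' 'a 1001 3' 'e 1004 2' 'b 1001 2' > a.txt

# Primary key:   field 2 only (-k2,2), compared numerically (n).
# Secondary key: field 3 only (-k3,3), compared as text.
LC_ALL=C sort -k2,2n -k3,3 a.txt
```

The ",2" and ",3" end positions are what keep each key confined to a single field.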

However, I thought sort -k2n a.txt should also work, but it doesn't:

    $ sort -k2n a.txt
    a 1001 3
    b 1001 2
    f 1001 1
    c 1002 4
    d 1003 1
    e 1004 2

It seems to be sorted by column two and then by column one instead of column three. Why is this happening? Is this a bug? Oddly, sort -k2 a.txt works fine with the data above, but only because these numbers all have the same width.
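To see why plain -k2 only works because of the fixed width, here is a sketch that adds one hypothetical line, g 999 1, which is not in the question. Without n, the key is compared as text character by character, so " 999 1" sorts after " 1004 2" because '9' comes after '1':

```shell
#!/bin/sh
# Question's data plus a hypothetical shorter number (999) that breaks the fixed width.
printf '%s\n' 'c 1002 4' 'f 1001 1' 'd 1003 1' 'a 1001 3' 'e 1004 2' 'b 1001 2' 'g 999 1' > b.txt

# Text key from field 2 to end of line: "g 999 1" ends up last.
LC_ALL=C sort -k2 b.txt

# Numeric key on field 2, then text key on field 3: "g 999 1" comes first.
LC_ALL=C sort -k2,2n -k3,3 b.txt
```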

My version of sort is sort (GNU coreutils) 8.15, running on Cygwin.

1 answer

I found this warning in the GNU sort documentation:

Sort numerically on the second field and resolve ties by sorting alphabetically on the third and fourth characters of field five. Use ':' as the field separator.

  sort -t : -k 2,2n -k 5.3,5.4 
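A small illustration of that documented command, using three made-up colon-separated records (hypothetical data, not from the docs). Field 2 is compared numerically; the two records with 10 tie, and characters 3 and 4 of field 5 break the tie:

```shell
#!/bin/sh
# Hypothetical records in the form name:number:x:x:code
printf '%s\n' 'r1:10:x:x:abZZ' 'r2:2:x:x:abYY' 'r3:10:x:x:abAA' > c.txt

# -t :        use ':' as the field separator
# -k 2,2n     primary key: field 2 only, numeric
# -k 5.3,5.4  secondary key: characters 3-4 of field 5, as text
LC_ALL=C sort -t : -k 2,2n -k 5.3,5.4 c.txt
```

Without the second key, r1 and r3 would be ordered by comparing the whole lines, which puts r1 first; with it, "AA" beats "ZZ" and r3 comes first.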

Note that if you had written -k 2n instead of -k 2,2n, sort would have used all characters beginning in the second field and extending to the end of the line as the primary numeric key. For the large majority of applications, treating keys that span more than one field as numeric will not do what you expect.

I'm not sure exactly what happens when it evaluates "1001 3" as a numeric key, but "will not do what you expect" is accurate. Clearly the right thing to do is to specify each key separately.
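As far as I can tell, a numeric comparison only reads the leading number of the key: leading blanks are skipped, digits are consumed, and parsing stops at the next non-numeric character. So " 1001 3" compares as 1001, the three 1001 lines tie, and the last-resort whole-line comparison then orders them by column one. A sketch that checks this by comparing -k2n against -k2,2n, which also leaves ties to the whole-line comparison:

```shell
#!/bin/sh
printf '%s\n' 'c 1002 4' 'f 1001 1' 'd 1003 1' 'a 1001 3' 'e 1004 2' 'b 1001 2' > a.txt

# Key runs from field 2 to end of line, but only the leading number counts.
LC_ALL=C sort -k2n a.txt > spanning.out
# Key limited to field 2: same numeric value, same whole-line tiebreak.
LC_ALL=C sort -k2,2n a.txt > field2.out

# cmp -s is silent and exits 0 when the two files are identical.
if cmp -s spanning.out field2.out; then
    echo "same order"
fi
cat spanning.out
```

GNU sort's --debug option should also show which part of each line it treats as the key, which makes this kind of surprise easier to spot.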

The same page also talks about breaking ties:

Finally, as a last resort, when all keys compare equal, sort compares entire lines as if no ordering options other than --reverse (-r) were specified.

I admit, I'm a little puzzled by how to interpret this.
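One way to read that rule: once every -k key has compared equal, sort compares the two whole lines as plain text (reversed if -r was given), and the -s (--stable) option turns that last-resort comparison off so tied lines keep their input order. A sketch contrasting the two on the question's data:

```shell
#!/bin/sh
printf '%s\n' 'c 1002 4' 'f 1001 1' 'd 1003 1' 'a 1001 3' 'e 1004 2' 'b 1001 2' > a.txt

# Ties on field 2 fall through to the whole-line comparison: a, b, f.
LC_ALL=C sort -k2,2n a.txt

# --stable disables the last-resort comparison: ties keep input order f, a, b.
LC_ALL=C sort -s -k2,2n a.txt
```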


Source: https://habr.com/ru/post/946843/

