Unexpected bash sorting behavior

If I create a text file containing the following lines:

    >TESTTEXT_10000000
    >TESTTEXT_1000000
    >TESTTEXT_10000002
    >TESTTEXT_10000001

and run sort myfile, the output is:

    >TESTTEXT_1000000
    >TESTTEXT_10000000
    >TESTTEXT_10000001
    >TESTTEXT_10000002

However, if I append /1 or /2 to each line, the sort order changes considerably, and I don't understand why.

Input:

    >TESTTEXT_10000000/1
    >TESTTEXT_1000000/1
    >TESTTEXT_10000002/1
    >TESTTEXT_10000001/1

Output:

    >TESTTEXT_10000000/1
    >TESTTEXT_1000000/1
    >TESTTEXT_10000001/1
    >TESTTEXT_10000002/1

Input:

    >TESTTEXT_10000000/2
    >TESTTEXT_1000000/2
    >TESTTEXT_10000002/2
    >TESTTEXT_10000001/2

Output:

    >TESTTEXT_10000000/2
    >TESTTEXT_10000001/2
    >TESTTEXT_1000000/2
    >TESTTEXT_10000002/2

Is a forward slash treated as a delimiter? Using --field-separator did not change the behavior. If so, why does 1000000/2 end up between 10000001/2 and 10000002/2? Human-numeric sorting, numeric sorting and other options never gave a consistent result either. Can someone help me here?

Edit: since this seems relevant given the answers, LC_ALL on this computer is en_GB.UTF-8.

1 answer

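In your locale the / is effectively skipped. The en_GB.UTF-8 collation rules in use here ignore punctuation such as >, _ and / at the first comparison level, so the digits on either side of the / run together: >TESTTEXT_1000000/2 is compared as if it were TESTTEXT10000002, and >TESTTEXT_10000001/2 as TESTTEXT100000012. That is why 1000000/2 lands between 10000001/2 and 10000002/2. With LC_ALL=C, sort compares raw bytes instead, and / (0x2F) sorts before 0 (0x30), which gives the order you expected.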
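To see the effect, here is a minimal sketch (POSIX sh, assuming GNU coreutils) that imitates the first comparison level by deleting >, _ and / from a copy of each line and sorting on that key bytewise. The function names punct_ignoring_sort and sample_lines are hypothetical, and this only approximates real locale collation, which has further tie-breaking levels; for these four lines it reproduces the en_GB.UTF-8 output from the question. The second pipeline shows the plain byte order you get with LC_ALL=C.

```shell
# Hypothetical helper: approximate a locale that ignores '>', '_' and '/'
# at the first comparison level. Each line is prefixed with a key in which
# those characters are deleted, the keys are compared bytewise, and only
# the original line is printed back.
punct_ignoring_sort() {
    tab=$(printf '\t')
    while IFS= read -r line; do
        key=$(printf '%s' "$line" | tr -d '>_/')
        printf '%s\t%s\n' "$key" "$line"
    done | LC_ALL=C sort -t "$tab" -k1,1 | cut -f2
}

# Hypothetical helper: the /2 input from the question.
sample_lines() {
    printf '%s\n' '>TESTTEXT_10000000/2' '>TESTTEXT_1000000/2' \
                  '>TESTTEXT_10000002/2' '>TESTTEXT_10000001/2'
}

# Keys after "TESTTEXT": 100000002, 10000002, 100000022, 100000012,
# so 1000000/2 lands between 10000001/2 and 10000002/2.
sample_lines | punct_ignoring_sort
# prints:
# >TESTTEXT_10000000/2
# >TESTTEXT_10000001/2
# >TESTTEXT_1000000/2
# >TESTTEXT_10000002/2

echo ---

# Plain byte order: '/' (0x2F) sorts before '0' (0x30).
sample_lines | LC_ALL=C sort
# prints:
# >TESTTEXT_1000000/2
# >TESTTEXT_10000000/2
# >TESTTEXT_10000001/2
# >TESTTEXT_10000002/2
```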

In your case, version sort (-V) orders these lines correctly:

 sort -V myfile 
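For example, feeding the four /2 lines from the question to sort -V (GNU coreutils). Version sort compares each run of digits as a number, so the 7-digit 1000000 comes before the 8-digit values regardless of the / suffix:

```shell
# Version sort: the digit run after ">TESTTEXT_" is compared numerically,
# so the shorter number 1000000 sorts first.
printf '%s\n' '>TESTTEXT_10000000/2' '>TESTTEXT_1000000/2' \
              '>TESTTEXT_10000002/2' '>TESTTEXT_10000001/2' | sort -V
# prints:
# >TESTTEXT_1000000/2
# >TESTTEXT_10000000/2
# >TESTTEXT_10000001/2
# >TESTTEXT_10000002/2
```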

Alternatively, you can specify / as the field separator and sort on the first field only, so the suffix can no longer merge with the number:

 sort -t/ -k1,1 myfile 
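A quick check on the same input: with -t/ -k1,1 the key is only the text before the first /, and the shortest key (>TESTTEXT_1000000) is a prefix of the others, so it sorts first whether or not punctuation is ignored by the locale:

```shell
# Key = field 1 only (everything before the first '/'),
# so the /2 suffix never takes part in the comparison.
printf '%s\n' '>TESTTEXT_10000000/2' '>TESTTEXT_1000000/2' \
              '>TESTTEXT_10000002/2' '>TESTTEXT_10000001/2' | sort -t/ -k1,1
# prints:
# >TESTTEXT_1000000/2
# >TESTTEXT_10000000/2
# >TESTTEXT_10000001/2
# >TESTTEXT_10000002/2
```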

Source: https://habr.com/ru/post/1259098/
