The sort command uses the current locale, as indicated by the LC_ALL environment variable, to determine the sort order for characters. Usually the easiest way to fix sorting problems is to set LC_ALL to the C locale, which treats each 8-bit byte as a single character and compares bytes by their numeric value. In most shells you can do this for a single command by prefixing it with the assignment, like this:
LC_ALL=C sort < infile > outfile
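To see the effect, here is a minimal sketch with made-up sample data (the words in infile are my own illustration). In the C locale every uppercase ASCII letter sorts before every lowercase one, because comparison is by byte value:

```shell
# Sample input mixing upper- and lowercase words (illustrative data only)
printf '%s\n' banana Apple cherry apple Banana > infile

# Byte-order sort: uppercase ASCII (65-90) comes before lowercase (97-122)
LC_ALL=C sort < infile > outfile

cat outfile
# Apple
# Banana
# apple
# banana
# cherry
```

Without the LC_ALL=C prefix, the order depends on whatever locale your session uses, so it may differ from machine to machine.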
This also solves similar problems with some other text-processing programs. (For example, I recall problems with CSV files on a German user's computer, caused by the German convention of using a comma as the decimal separator. Putting LC_ALL=C in front of the relevant commands fixed that problem too.)
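A rough sketch of the decimal-separator issue, using awk only as a stand-in for whatever program is misbehaving. The locale name de_DE.UTF-8 is an assumption: it may not be installed on your system, and how strictly a given awk honors the locale varies by implementation, so the second command is guarded and shown without an expected result:

```shell
# Force the C locale: the decimal separator is a period
c_value=$(LC_ALL=C awk 'BEGIN { printf "%.2f\n", 3.14159 }')
echo "$c_value"   # prints 3.14

# Assumed locale name; only run the comparison if the system lists it
if locale -a 2>/dev/null | grep -qi '^de_DE\.utf'; then
    LC_ALL=de_DE.UTF-8 awk 'BEGIN { printf "%.2f\n", 3.14159 }'
fi
```

If the second command prints a comma where the first printed a period, any CSV produced under that locale will have an extra field separator in every number, which matches the symptom described above.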
[EDIT] Although Perl can be told to treat some strings as Unicode, it still treats input and output as 8-bit byte streams by default, so the above approach should produce an order similar to that of Perl's sort() function. (Thanks to Ven'Tatsu for this nugget.)