Why does the UNIX sort utility ignore leading spaces without the -b option?

[This is a rewrite of a similar question that I asked back ... Sorry for the confusion!]

I am confused about how the standard sort utility handles leading spaces. Consider the contents of myfile:

 a b a 

Running sort -t : myfile gives an unexpected result, at least for me:

 a a b 

Does this make sense? <space> should sort either before a-z (as it does in ASCII) or after. In the first case, I would expect

  a b a 

and in the second case

 a b a 

Why does sort behave as if the -b option (ignore leading blanks) were on, even though I did not specify it? In fact, to be safe, I added the -t option so that each line consists of exactly one field. (According to the POSIX standard, "a field comprises a maximal sequence of non-separating characters and, in the absence of option -t, any preceding field separator." Running sort myfile gives the same output, which is also unexpected.)

Thanks in advance!

+6
2 answers

It depends on the locale. With

 LC_COLLATE=en_US.utf8 sort myfile 

I get your unexpected result and

 LC_COLLATE=C sort myfile 

I get the expected result. See also the related question "bash sorting an unusual order. Problem with spaces?"

(I don't know why sort treats -b and -t like this.)
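To make the locale effect easy to reproduce, here is a minimal sketch. The sample lines and their leading spaces are an assumption made for illustration, and `demo_file` is a hypothetical temporary file, not the asker's actual myfile. It uses LC_ALL rather than LC_COLLATE because LC_ALL, when set, takes precedence over the individual LC_* variables.

```shell
# Hypothetical sample input; the exact leading spaces are assumed.
demo_file=$(mktemp)
printf 'a\n b\n a\n' > "$demo_file"

# Byte-value collation: space (0x20) sorts before every letter.
c_sorted=$(LC_ALL=C sort "$demo_file")
printf '%s\n' "$c_sorted"
# prints:
#  a
#  b
# a

# Collation taken from the environment; the order here depends on
# which locale is active, so no particular output is assumed.
env_sorted=$(sort "$demo_file")
printf '%s\n' "$env_sorted"

rm -f "$demo_file"
```

If the second listing differs from the first, the active locale's collation rules are what is reordering the lines.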

+9
 $ sort -t : foo
 a a b
 $ env LC_ALL=C sort -t: foo
 a b a

From the man page: *** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.
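As a rough check that -b itself is not being applied implicitly, the following sketch compares sort with and without -b under byte-order collation. The sample lines are assumed for illustration and `sample` is a hypothetical temporary file. The check uses only the last line of each result, since the relative order of lines whose keys compare equal may vary between sort implementations.

```shell
# Hypothetical sample lines; the leading spaces are assumed.
sample=$(mktemp)
printf 'a\n b\n a\n' > "$sample"

# Byte order, leading blanks significant: the unindented 'a' sorts last.
plain=$(LC_ALL=C sort "$sample")

# Byte order, leading blanks skipped when comparing: ' b' sorts last.
blanks_ignored=$(LC_ALL=C sort -b "$sample")

printf 'without -b:\n%s\n\nwith -b:\n%s\n' "$plain" "$blanks_ignored"
rm -f "$sample"
```

With LC_ALL=C the two results differ, which suggests that the behavior in the question comes from the locale's collation rules rather than from an implicit -b.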

+7

Source: https://habr.com/ru/post/895712/

