Cannot explain sort(1) behavior

I was puzzled when I saw ls list the following files in a strange order:

    Star Wars Episode II - Attack of the Clones (2002) BDRip.mkv
    Star Wars Episode III - Revenge of the Sith (2005) BDRip.mkv
    Star Wars Episode I - The Phantom Menace (1999) BDRip.mkv
    Star Wars Episode IV - A New Hope (1977) BDRip.mkv
    Star Wars Episode VI - Return of the Jedi (1983) BDRip.mkv
    Star Wars Episode V - The Empire Strikes Back (1980) BDRip.mkv

From a human point of view, "I" should come first, then "II", and so on.

So I created a file with the following contents:

    $ cat 1
    Star Wars Episode II - Attack
    Star Wars Episode III - Revenge
    Star Wars Episode I - The
    Star Wars Episode IV - A
    Star Wars Episode VI - Return
    Star Wars Episode V - The

If I sort it, I get the following:

    $ sort 1
    Star Wars Episode II - Attack
    Star Wars Episode III - Revenge
    Star Wars Episode I - The
    Star Wars Episode IV - A
    Star Wars Episode VI - Return
    Star Wars Episode V - The

However, if I delete the '-' and everything after it, the file sorts correctly:

    $ cat 1
    Star Wars Episode II
    Star Wars Episode III
    Star Wars Episode I
    Star Wars Episode IV
    Star Wars Episode VI
    Star Wars Episode V
    $ sort 1
    Star Wars Episode I
    Star Wars Episode II
    Star Wars Episode III
    Star Wars Episode IV
    Star Wars Episode V
    Star Wars Episode VI

But as soon as I add a character after the space, the order becomes unpredictable to me:

    $ cat 1
    Star Wars Episode II y
    Star Wars Episode III x
    Star Wars Episode I z
    Star Wars Episode IV w
    Star Wars Episode VI v
    Star Wars Episode V u
    $ sort 1
    Star Wars Episode III x
    Star Wars Episode II y
    Star Wars Episode IV w
    Star Wars Episode I z
    Star Wars Episode VI v
    Star Wars Episode V u

Any hints about what causes this behavior?

Update: sort reports that it is using en_CA.UTF-8 sorting rules.

Update #2: as per the comment below, this is due to the locale.

    $ ls | LANG=C sort
    Star Wars Episode I - The Phantom Menace (1999) BDRip.mkv
    Star Wars Episode II - Attack of the Clones (2002) BDRip.mkv
    Star Wars Episode III - Revenge of the Sith (2005) BDRip.mkv
    Star Wars Episode IV - A New Hope (1977) BDRip.mkv
    Star Wars Episode V - The Empire Strikes Back (1980) BDRip.mkv
    Star Wars Episode VI - Return of the Jedi (1983) BDRip.mkv

Why does a UTF-8 locale make a difference, then? I checked with ru_RU.UTF8 (incorrect sorting) and ru_RU.KOI8-R (correct sorting).

Update #3: it is about the locale, see http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

2 answers

With locale-based sorting, sort ignores all non-letter characters such as spaces and hyphens, so the part after "Episode" is effectively compared as:

    II - Attack   -> "IIA"
    III - Revenge -> "III"
    I - The       -> "ITh"
    IV - A        -> "IVA"
    VI - Return   -> "VIR"
    V - The       -> "VTh"
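
The effect described above can be approximated with a small shell sketch. It does not call the locale's real collation (which is multi-level and depends on the C library and its locale data); instead it builds a hypothetical sort key per line by lowercasing and dropping everything except ASCII letters and digits, then sorts those keys in plain byte order. The variable names `titles` and `simulated` are made up for this illustration.

```shell
# The six shortened titles from the question, in the original file order.
titles='Star Wars Episode II - Attack
Star Wars Episode III - Revenge
Star Wars Episode I - The
Star Wars Episode IV - A
Star Wars Episode VI - Return
Star Wars Episode V - The'

# Decorate each line as "key<TAB>line", where the key keeps only ASCII
# letters and digits (lowercased). This is an assumption that mimics
# "ignore non-letter characters", not the actual locale algorithm.
simulated=$(
  printf '%s\n' "$titles" |
  while IFS= read -r line; do
    key=$(printf '%s' "$line" | LC_ALL=C tr -cd 'A-Za-z0-9' | LC_ALL=C tr 'A-Z' 'a-z')
    printf '%s\t%s\n' "$key" "$line"
  done |
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 |   # sort on the key only
  cut -f2-                                     # drop the key again
)

printf '%s\n' "$simulated"
```

For these six lines the sketch reproduces the order seen in the question (II, III, I, IV, VI, V), because the keys "...iiattack", "...iiirevenge", "...ithe", "...iva", "...vireturn" and "...vthe" compare letter by letter with the separators gone.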

With LC_ALL=C, lines are compared byte by byte, and the space character sorts before any alphanumeric character:

    I - The       -> "I -"
    II - Attack   -> "II "
    III - Revenge -> "III"
    IV - A        -> "IV "
    V - The       -> "V -"
    VI - Return   -> "VI "

So it is a coincidence that this works for these six titles; with more films it would break (for example, "IX" sorts before "V" in byte order).
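
A short sketch of the byte-order case, assuming a POSIX shell and a `sort` that honors `LC_ALL=C`. The first sort uses the six lines from the question; the second adds a made-up seventh line, "Episode IX - Hypothetical", purely to show where Roman numerals stop matching byte order. The variable names are illustrative only.

```shell
# The six shortened titles from the question.
titles='Star Wars Episode II - Attack
Star Wars Episode III - Revenge
Star Wars Episode I - The
Star Wars Episode IV - A
Star Wars Episode VI - Return
Star Wars Episode V - The'

# Byte-order sort: space (0x20) sorts before every letter, so "I " < "II".
c_sorted=$(printf '%s\n' "$titles" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"

# Hypothetical extra title: "IX" starts with "I", so it lands before "V".
with_ix=$(printf '%s\n' "$titles" 'Star Wars Episode IX - Hypothetical' | LC_ALL=C sort)
printf '%s\n' "$with_ix"
```

The first listing comes out as I, II, III, IV, V, VI. The second comes out as I, II, III, IV, IX, V, VI, which shows that byte order only happens to agree with Roman-numeral order for the first eight numerals.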


Source: https://habr.com/ru/post/959402/

