I was puzzled when ls listed the following files in a strange order:
    Star Wars Episode II - Attack of the Clones (2002) BDRip.mkv
    Star Wars Episode III - Revenge of the Sith (2005) BDRip.mkv
    Star Wars Episode I - The Phantom Menace (1999) BDRip.mkv
    Star Wars Episode IV - A New Hope (1977) BDRip.mkv
    Star Wars Episode VI - Return of the Jedi (1983) BDRip.mkv
    Star Wars Episode V - The Empire Strikes Back (1980) BDRip.mkv
From a human point of view, "I" should come first, then "II", and so on.
To investigate, I created a file with the following contents:
    $ cat 1
    Star Wars Episode II - Attack
    Star Wars Episode III - Revenge
    Star Wars Episode I - The
    Star Wars Episode IV - A
    Star Wars Episode VI - Return
    Star Wars Episode V - The
If I sort it, I get the following:
    $ sort 1
    Star Wars Episode II - Attack
    Star Wars Episode III - Revenge
    Star Wars Episode I - The
    Star Wars Episode IV - A
    Star Wars Episode VI - Return
    Star Wars Episode V - The
However, if I delete the '-' and everything after it, the file sorts correctly:
    $ cat 1
    Star Wars Episode II
    Star Wars Episode III
    Star Wars Episode I
    Star Wars Episode IV
    Star Wars Episode VI
    Star Wars Episode V

    $ sort 1
    Star Wars Episode I
    Star Wars Episode II
    Star Wars Episode III
    Star Wars Episode IV
    Star Wars Episode V
    Star Wars Episode VI
So, as soon as I add a character after the space, the sort order becomes unpredictable to me:
    $ cat 1
    Star Wars Episode II y
    Star Wars Episode III x
    Star Wars Episode I z
    Star Wars Episode IV w
    Star Wars Episode VI v
    Star Wars Episode V u

    $ sort 1
    Star Wars Episode III x
    Star Wars Episode II y
    Star Wars Episode IV w
    Star Wars Episode I z
    Star Wars Episode VI v
    Star Wars Episode V u
Any hints as to what causes this behavior?
Update: sort reports "using en_CA.UTF-8 sorting rules".
Update #2: As per the comment below, this is due to the locale. With LANG=C the order is what I expected:
    $ ls | LANG=C sort
    Star Wars Episode I - The Phantom Menace (1999) BDRip.mkv
    Star Wars Episode II - Attack of the Clones (2002) BDRip.mkv
    Star Wars Episode III - Revenge of the Sith (2005) BDRip.mkv
    Star Wars Episode IV - A New Hope (1977) BDRip.mkv
    Star Wars Episode V - The Empire Strikes Back (1980) BDRip.mkv
    Star Wars Episode VI - Return of the Jedi (1983) BDRip.mkv
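A note on the LANG=C trick: LANG is the lowest-priority locale variable, so it only changes the sort order when neither LC_ALL nor LC_COLLATE is set. A minimal sketch that forces byte-wise comparison regardless of the rest of the environment, using the shortened titles from above (the variable names `titles` and `sorted_c` are only illustrative):

```shell
# Shortened titles from the question, in the same unsorted order.
titles='Star Wars Episode II - Attack
Star Wars Episode III - Revenge
Star Wars Episode I - The
Star Wars Episode IV - A
Star Wars Episode VI - Return
Star Wars Episode V - The'

# LC_ALL overrides LC_COLLATE and LANG, so this compares raw bytes
# for this one sort invocation only; the shell's locale is untouched.
sorted_c=$(printf '%s\n' "$titles" | LC_ALL=C sort)

printf '%s\n' "$sorted_c"
```

In the C locale a space (0x20) compares lower than "I" (0x49), which is why "Episode I - The" ends up before "Episode II - Attack" here.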
Why, then, does a UTF-8 locale sort differently? I checked with ru_RU.UTF8 (incorrect sorting) and ru_RU.KOI8-R (correct sorting).
Update #3: It is indeed about the locale: http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
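The FAQ entry explains that such locales sort in a dictionary-like order that ignores punctuation and folds case. The sketch below is only a rough approximation of that idea, not the locale's actual collation algorithm: it assumes that spaces and punctuation carry no weight in the first comparison pass, builds a key from each line by dropping everything except letters and digits and lowercasing the rest, and then sorts byte-wise on that key. The names `titles`, `key` and `approx` are hypothetical.

```shell
# Shortened titles from the question, in the same unsorted order.
titles='Star Wars Episode II - Attack
Star Wars Episode III - Revenge
Star Wars Episode I - The
Star Wars Episode IV - A
Star Wars Episode VI - Return
Star Wars Episode V - The'

# Decorate each line with a key that ignores spaces and punctuation
# and folds case, sort byte-wise on the decorated line, then strip the key.
approx=$(printf '%s\n' "$titles" |
  while IFS= read -r line; do
    key=$(printf '%s' "$line" |
      LC_ALL=C tr -cd '[:alnum:]' |
      LC_ALL=C tr '[:upper:]' '[:lower:]')
    # key and original line are separated by a tab
    printf '%s\t%s\n' "$key" "$line"
  done | LC_ALL=C sort | cut -f2-)

printf '%s\n' "$approx"
```

With this key, "Episode II - Attack" becomes "...episodeiiattack" and "Episode I - The" becomes "...episodeithe", so "II" sorts before "I" because "i" is less than "t". The result is II, III, I, IV, VI, V, the same order that sort produced under en_CA.UTF-8 above.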