Is Python's sort the same as Linux sort with LC_ALL=C?

I am porting a Bash script to Python. The script sets LC_ALL=C and uses the Linux sort command to get native byte order instead of locale-specific sort order ( https://stackoverflow.com/a/312947/ ).

In Python, I want to use list.sort() or sorted() (without the key= option). Will I always get the same results as Linux sort with LC_ALL=C?

+6
4 answers

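Sorting should behave as you expect if you pass locale.strcoll as the cmp argument to list.sort() and sorted(). Note that cmp exists only in Python 2; Python 3 removed it, so there the comparison function has to be wrapped with functools.cmp_to_key and passed as key: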

    import locale

    locale.setlocale(locale.LC_ALL, "C")
    # Python 2 only: the cmp= argument was removed in Python 3.
    yourList.sort(cmp=locale.strcoll)
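For Python 3, a minimal sketch of the same idea could look like the following. The helper name sort_like_c_locale and the sample list are made up for illustration; it assumes plain str input and restores the previous collation setting afterwards, because setlocale changes process-wide state.

```python
import locale
from functools import cmp_to_key


def sort_like_c_locale(items):
    """Return items sorted with locale.strcoll under the "C" locale.

    Hypothetical helper, not part of the original answer.
    """
    # Remember the current collation setting so it can be restored.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, "C")
        # cmp_to_key adapts a cmp-style function to the key= argument.
        return sorted(items, key=cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


words = ["banana", "Apple", "apple", "_x", "Zebra"]
result = sort_like_c_locale(words)
print(result)  # ['Apple', 'Zebra', '_x', 'apple', 'banana']
```

For this ASCII-only sample the result is the same as plain sorted(words), since uppercase letters, then the underscore, then lowercase letters is simply ascending byte order.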
+7

Since you can supply a comparison function, you can make sure that the sort is equivalent to LC_ALL=C. From the docs, though, it looks like the default sort already orders strings this way if all characters are 7-bit; otherwise locale-specific sorting is used.

If you have 8-bit or Unicode characters, locale-specific sorting makes a lot of sense.

+1

Non-Unicode strings in Python versions before 3 are actually bytes. The sorting function and methods do nothing to apply the locale (the functions in the locale module are needed for locale-dependent sorting).

Unicode strings and all Python 3.x strings are no longer bytes. Python 3 has a separate bytes type.
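To make the bytes-versus-text distinction concrete for Python 3, here is a small sketch. It assumes the input is text that the Bash script would have seen as UTF-8, and that sort was called without other ordering options; c_locale_sorted and the sample names are hypothetical. Sorting on the encoded bytes mirrors the byte comparison done under LC_ALL=C, and because UTF-8 preserves code point order, the default str sort gives the same result here.

```python
def c_locale_sorted(lines):
    """Sort text by its UTF-8 bytes (hypothetical helper for illustration)."""
    return sorted(lines, key=lambda s: s.encode("utf-8"))


# "\u00c9mile" is "Emile" with an acute accent on the E (non-ASCII).
names = ["zebra", "\u00c9mile", "apple", "Zoe", "eve"]

by_bytes = c_locale_sorted(names)  # explicit byte-order comparison
by_code_point = sorted(names)      # default str comparison in Python 3

# Uppercase ASCII first, then lowercase ASCII, then non-ASCII text.
print(by_bytes)
print(by_bytes == by_code_point)   # True
```

If the data is already read as bytes (for example from a file opened in binary mode), sorted() on the bytes objects compares them byte by byte directly, with no encoding step.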

+1

I use International Components for Unicode (ICU) through PyICU to sort things with sorted() in my own locale (Catalan in my case). For example, ordering a list of user profiles by the name property:

    import PyICU

    collator = PyICU.Collator.createInstance(PyICU.Locale('ca_ES.UTF-8'))
    # Python 2 only: cmp= compares the values extracted by key=.
    sorted(user_profiles, key=lambda x: x.name, cmp=collator.compare)
+1

Source: https://habr.com/ru/post/905440/

