Python string.letters does not include language diacritics

I am trying to get a locale-specific alphabet (i.e. including diacritics, e.g. for French) from Python's string module, without success. Here is a minimal example:

    import locale, string

    locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
    print string.letters
    # shows ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

    locale.setlocale(locale.LC_ALL, 'fr_FR.UTF-8')
    print string.letters
    # also shows ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

The Python documentation says that string.letters is locale-dependent, but it doesn't seem to work for me.

What am I doing wrong and is this the right way to get the language alphabet?

Edit: I just checked the locale with print locale.getlocale() after setting it, and it was changed correctly.

1 answer

In Python 2.7 (Python 3.x has no string.letters), it works if you set the locale to 'fr_FR' (equivalent to 'fr_FR.ISO8859-1') rather than 'fr_FR.UTF-8'. The same holds for other languages, for example with Spanish:

    >>> import locale, string
    >>> locale.setlocale(locale.LC_ALL, 'es_ES')
    'es_ES'
    >>> string.letters
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\xaa\xb5\xba\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
    >>> locale.setlocale(locale.LC_ALL, 'es_ES.UTF-8')
    'es_ES.UTF-8'
    >>> string.letters
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
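A likely explanation for the difference, assuming Python 2 builds string.letters by testing each of the 256 possible single-byte values against the C library's isalpha(): in ISO-8859-1 every accented letter fits in one byte, while in UTF-8 it takes several bytes, so no single byte above 0x7f can count as a letter. The sketch below only illustrates the byte lengths. It is written for Python 3 and does not change the locale; the variable names are illustrative.

```python
# Compare how one accented letter is stored in each encoding.
letter = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

latin1_bytes = letter.encode("iso-8859-1")  # single-byte encoding
utf8_bytes = letter.encode("utf-8")         # multi-byte encoding

print(len(latin1_bytes), latin1_bytes)
print(len(utf8_bytes), utf8_bytes)
```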

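In the first result, \xaa is the character "ª", \xb5 is "µ", \xd1 is "Ñ", etc. But note that this is a byte string in the locale's single-byte encoding (ISO-8859-1), not Unicode text, so it has to be decoded before it displays correctly anywhere else.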
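To get readable characters instead of escaped bytes, the byte string can be decoded as ISO-8859-1. Since string.letters does not exist in Python 3, the following is only a sketch that rebuilds an equivalent list there: it decodes all 256 byte values as ISO-8859-1 and keeps those that Unicode classifies as letters. The name latin1_letters is made up for this example, and the result is the set of letters in that encoding, not the alphabet of any one language.

```python
# Decode every possible byte as ISO-8859-1, then keep only the letters.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")

latin1_letters = "".join(ch for ch in decoded if ch.isalpha())

print(latin1_letters)
```

In Python 2 itself, string.letters.decode('iso-8859-1') would give the corresponding unicode object once the 'es_ES' or 'fr_FR' locale has been set.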

I highly recommend reading this: https://pythonhosted.org/kitchen/unicode-frustrations.html


Source: https://habr.com/ru/post/1258812/

