How to find out which characters are defined as alphanumeric for a given language

Question

When matching Python regular expressions, \w and some other escapes are affected by the re.LOCALE flag. From the documentation:

\w
When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus whatever characters are defined as alphanumeric for the current locale.

So, how can we find out which characters are defined as alphanumeric for a given locale? Say we ran "locale -a", got the list of locales installed on the system, and want this information for one of them. Any quick way to get it would do: a piece of Python code or a one-liner, a shell command, or possibly reference material.

+6

python regex locale

Basel shishani Mar 11 '12 at 4:00


1 answer

torek · Accepted Answer · 2012-03-11T04:16:15+0000

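Use string.letters. In Python 2 its value is locale-dependent and is updated whenever locale.setlocale() is called. (It was removed in Python 3.)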

Example:

>>> import locale
>>> import string
>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
'en_US.UTF-8'
>>> string.letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
>>> locale.setlocale(locale.LC_ALL, 'de_DE')
'de_DE'
>>> string.letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\xaa\xb5\xba\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
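The session above is Python 2. As a rough Python 3 sketch of the same idea, one could ask the regex engine directly which single bytes \w matches under re.LOCALE. The function name locale_word_bytes is made up for illustration, and the approach assumes the requested locale is installed and uses an 8-bit encoding; in a UTF-8 locale, bytes above 127 will typically not match.

```python
import locale
import re


def locale_word_bytes(locale_name):
    """Return the byte values 0-255 that the word escape matches
    under re.LOCALE while locale_name is the active LC_CTYPE locale.

    Hypothetical helper, not part of the standard library.
    """
    # Remember the current LC_CTYPE setting so it can be restored.
    previous = locale.setlocale(locale.LC_CTYPE)
    try:
        # Raises locale.Error if the locale is not installed.
        locale.setlocale(locale.LC_CTYPE, locale_name)
        # In Python 3, re.LOCALE is only allowed with bytes patterns.
        pattern = re.compile(rb"\w", re.LOCALE)
        return bytes(b for b in range(256) if pattern.match(bytes([b])))
    finally:
        locale.setlocale(locale.LC_CTYPE, previous)


if __name__ == "__main__":
    # "C" is always available; substitute a name taken from "locale -a".
    print(locale_word_bytes("C"))
```

For a single-byte locale from the "locale -a" list, decoding the result with that locale's encoding (for example .decode("latin-1") for an ISO-8859-1 locale) gives readable characters. Note that the result includes the digits and the underscore, so it corresponds to \w rather than to string.letters.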
