Matching single-case string sequences with regular expressions in Python

Suppose I want to match a lowercase letter followed by an uppercase letter. I could do something like

re.compile(r"[a-z][A-Z]")

Now I want to do the same for Unicode strings, i.e. match something like "aÅ" or "yÜ".

I tried

re.compile(r"[a-z][A-Z]", re.UNICODE)

but it does not work.

Any clues?

1 answer

This is difficult to do with Python's re module because the current implementation does not support Unicode property escapes such as \p{Lu} and \p{Ll}.

[A-Za-z], of course, will match only ASCII letters, regardless of whether the re.UNICODE flag is set.
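A quick check illustrates the point. This is a minimal sketch assuming Python 3, where str is already Unicode; the sample strings are only illustrative.

```python
import re

# The ranges a-z and A-Z cover ASCII code points only;
# passing re.UNICODE does not widen them.
pattern = re.compile(r"[a-z][A-Z]", re.UNICODE)

ascii_match = pattern.search("xY")    # ASCII lowercase + ASCII uppercase
unicode_match = pattern.search("yÜ")  # ASCII lowercase + non-ASCII uppercase

print(ascii_match is not None)    # True
print(unicode_match is not None)  # False
```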

So, until the re module is updated (or you install the regex package, currently in development, which supports \p{...} properties), you either need to do it programmatically (iterate over the string and call char.islower() / char.isupper() on the characters) or specify all the Unicode code points by hand, which is probably not worth the effort...
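Both workarounds can be sketched with the standard library alone. The code below assumes Python 3. The helper names find_lower_upper_pairs and category_class are made up for illustration. Rather than typing the code points by hand, the second helper generates the character class from unicodedata, which is one possible way to do it, not necessarily what the answer had in mind.

```python
import re
import sys
import unicodedata


def find_lower_upper_pairs(text):
    """Return (index, pair) for every lowercase letter directly followed by an uppercase one."""
    pairs = []
    for i in range(len(text) - 1):
        first, second = text[i], text[i + 1]
        if first.islower() and second.isupper():
            pairs.append((i, first + second))
    return pairs


def category_class(category):
    """Build a regex character class listing every code point in one Unicode general category."""
    chars = (chr(cp) for cp in range(sys.maxunicode + 1))
    members = [c for c in chars if unicodedata.category(c) == category]
    return "[" + "".join(members) + "]"


# Ll = lowercase letter, Lu = uppercase letter (Unicode general categories).
LOWER_UPPER = re.compile(category_class("Ll") + category_class("Lu"))

sample = "aB yÜ ab AB"
print(find_lower_upper_pairs(sample))  # [(0, 'aB'), (3, 'yÜ')]
print(LOWER_UPPER.findall(sample))     # ['aB', 'yÜ']
```

Building the character class scans the whole Unicode range once, so it is worth compiling the pattern a single time and reusing it.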


Source: https://habr.com/ru/post/897204/

