Using a range in regex for Arabic letters

When using regex in Python, it is easy to use square brackets to represent a range of characters such as a-z, but this does not seem to work for other scripts, such as Arabic:

import re
pattern = '[ي-ا]'
p = re.compile(pattern)

The result is a long error report ending in

raise error("bad character range")
sre_constants.error: bad character range

How can this be fixed?

+4
2 answers

Use Unicode escape sequences instead.

>>> re.compile('[\u0627-\u064a]')
<_sre.SRE_Pattern object at 0x237f460>
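To check that the escaped range really matches Arabic letters, here is a minimal sketch. It assumes Python 3, where '\u....' escapes are interpreted in ordinary string literals (in Python 2 a u'...' literal would be needed). The names arabic_letter and sample are made up for the illustration.

```python
import re

# Range from ALEF (U+0627) to YEH (U+064A), written with escapes so the
# source does not depend on how an editor displays right-to-left text.
arabic_letter = re.compile('[\u0627-\u064a]')

# Hypothetical sample: Latin letters, a four-letter Arabic word, digits.
sample = 'abc \u0633\u0644\u0627\u0645 123'

# findall returns every single character that falls inside the range.
matches = arabic_letter.findall(sample)
print(matches)
```

Only the four Arabic letters should come back; the Latin letters, digits and spaces lie outside the range.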
+6

As far as I can tell, the range should be written "from ا to ي", that is, in ascending order:

'[ا-ي]'

This compiles without error:

>>> re.compile('[ا-ي]')
<_sre.SRE_Pattern object at 0x6001f0a80>

>>> re.compile('[ا-ي]', re.DEBUG)
in
  range (1575, 1610)
<_sre.SRE_Pattern object at 0x6001f0440>

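By contrast, '[ي-ا]' means "from ي to ا", which is not a valid range, because ا (code point 1575) comes before ي (code point 1610).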
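The ordering can be checked directly with ord(). The sketch below assumes Python 3 and uses made-up names (alef, yeh, ascending, reversed_error). It builds both ranges from escapes, so the result does not depend on how the letters are displayed.

```python
import re

alef, yeh = '\u0627', '\u064a'

# Code points of the two endpoints; the range must go from low to high.
print(ord(alef), ord(yeh))

# Ascending range (alef to yeh): compiles.
ascending = re.compile('[' + alef + '-' + yeh + ']')

# Descending range (yeh to alef): rejected by the regex compiler.
try:
    re.compile('[' + yeh + '-' + alef + ']')
    reversed_error = None
except re.error as exc:
    reversed_error = exc

print(reversed_error)
```

The second print shows the "bad character range" error from the question, raised only for the descending range.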

That said, Ignacio Vazquez-Abrams's suggestion to use Unicode escape sequences is probably the clearer option.

+7

Source: https://habr.com/ru/post/1569275/

