Regular expression does not work

So, I'm trying to parse a file, and I have the following code:

import re

def learn_re(s):
    pattern = re.compile("[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3} .")
    if pattern.match(s):
        return True
    return False

This matches "01:01:01.123 —"; however, when I add another character, it stops working. For example, if I edit my code so that it reads:

def learn_re(s):
    pattern = re.compile("[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3} . C")
    if pattern.match(s):
        return True
    return False

This does not match "01:01:01.123 — C". What's going on here?

+4
2 answers

The problem is that your — is a Unicode character. Inside a str, it actually behaves like several characters:

>>> print len('—')
3

But if you use unicode instead of str:

>>> print len(u'—')
1

So the following will print True:

# -*- coding: utf-8 -*-
import re

def learn_re(s):
    pattern = re.compile("[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3} . C")
    if pattern.match(s):
        return True
    return False

print learn_re(u"01:01:01.123 — C")

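Note that this behavior is specific to Python 2. In Python 3, str and unicode are merged into a single str type, so the distinction no longer arises.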
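For readers on Python 3, here is a minimal sketch of the same distinction, assuming the data is UTF-8. `learn_re` is the function from the question; the variable names are illustrative only, and the dash is written as the escape `\u2014` so the example does not depend on the file's encoding.

```python
import re

# Python 3: str holds Unicode text, so the dash is a single character.
dash = "\u2014"  # the dash from the question, written as an escape
line = "01:01:01.123 " + dash + " C"

def learn_re(s):
    # Same pattern as in the question; '.' matches the dash as one character.
    pattern = re.compile("[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3} . C")
    return pattern.match(s) is not None

text_len = len(dash)                   # length in code points
byte_len = len(dash.encode("utf-8"))   # length in UTF-8 bytes
print(text_len, byte_len, learn_re(line))
```

The byte length is what a Python 2 str reports, which is why a single `.` is not enough there.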

+4

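The dash is a multi-byte character (3 bytes). Unless you tell Python the string is unicode, it sees 3 characters, so you would need .{3} to match it, or you can use a unicode string in Python.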
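The byte-level view can be sketched in Python 3 by encoding the line to UTF-8, which roughly mimics a Python 2 str. This is an illustration under that assumption, not the original poster's code; the names `one_dot` and `three_dots` are made up for the example.

```python
import re

# Encode to UTF-8 so the dash becomes three separate bytes.
raw = "01:01:01.123 \u2014 C".encode("utf-8")

# Bytes patterns: '.' matches exactly one byte (anything except a newline).
one_dot = re.compile(rb"[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3} . C")
three_dots = re.compile(rb"[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3} .{3} C")

single_matches = one_dot.match(raw) is not None     # '.' consumes only the first byte
triple_matches = three_dots.match(raw) is not None  # '.{3}' consumes all three bytes
print(single_matches, triple_matches)
```

Matching on decoded text is usually the more robust choice, since `.{3}` only fits characters that happen to encode to three bytes.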

Also, use a raw string, r'...', so that backslashes in the pattern do not need to be doubled.

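An unescaped . matches any character, not only a literal period. To match just the period between the seconds and the milliseconds, escape it as \. in the pattern.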

pattern = re.compile(r'[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3} .')
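To see why the escape matters, here is a small sketch comparing the two patterns. The input `bad_line` is a hypothetical malformed timestamp invented for the example, and a plain hyphen stands in for the separator.

```python
import re

# Unescaped '.' between seconds and milliseconds: accepts any character there.
loose = re.compile(r"[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3} .")
# Escaped '\.': accepts only a literal period.
strict = re.compile(r"[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3} .")

bad_line = "01:01:01x123 -"   # hypothetical malformed input
good_line = "01:01:01.123 -"

loose_accepts_bad = loose.match(bad_line) is not None
strict_accepts_bad = strict.match(bad_line) is not None
strict_accepts_good = strict.match(good_line) is not None
print(loose_accepts_bad, strict_accepts_bad, strict_accepts_good)
```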
+1

Source: https://habr.com/ru/post/1656739/

