Influence of ? in a regex string

Consider the following Python code:

>>> import re
>>> re.search(r'.*(99)', 'aa99bb').groups()
('99',)
>>> re.search(r'.*(99)?', 'aa99bb').groups()
(None,)

I don't understand why the second example doesn't capture 99.

2 answers

This is because .* first matches the entire string. At that point there is nothing left for 99 to match, and since the group is optional, the regex engine stops there: it has already found a successful match.

If, on the other hand, the group is required, the regex engine has to backtrack into .* , giving characters back until 99 can match.
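You can see the same thing without a debugger by looking at what the whole match and the group cover. A minimal sketch using only the standard re module (the variable names are mine, not from the question):

```python
import re

text = 'aa99bb'

# Required group: .* first grabs everything, then backtracks until (99) fits.
required = re.search(r'.*(99)', text)
print(required.group(0), required.span(1))   # aa99 (2, 4)

# Optional group: .* grabs everything and (99)? is satisfied by matching nothing.
optional = re.search(r'.*(99)?', text)
print(optional.group(0), optional.span(1))   # aa99bb (-1, -1)
```

In the second case the overall match is the whole string, and the span (-1, -1) is how re reports a group that did not take part in the match.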

Compare the following debugging sessions from RegexBuddy (the part of the string matched by .* is highlighted in yellow, the part matched by (99) in blue):

.*(99) :

[image: RegexBuddy debugging session for .*(99)]


.*(99)? :

[image: RegexBuddy debugging session for .*(99)?]


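Depending on your needs, a good choice might be [^9]*(99)? . Here [^9]* matches anything except a 9, so it stops in front of the first 9 on its own and no backtracking is needed for the optional 99 that follows. It does not work if a single 9 can appear before the 99 and you want to skip over it.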

>>> re.search(r'[^9]*(99)?', 'aa99bb').groups()
('99',)
>>> re.search(r'[^9]*(99)?', 'aa9x99bb').groups()
(None,)
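
If stray single 9s do need to be skipped, one possible workaround (not part of the answer above, just a sketch) is to replace [^9]* with a prefix that refuses to step onto the start of 99, using a negative lookahead. The helper name find_99 is made up for illustration:

```python
import re

# (?:(?!99).)* consumes one character at a time, but only while the next
# two characters are not "99"; the optional group can then pick up the 99.
PATTERN = re.compile(r'(?:(?!99).)*(99)?')

def find_99(text):
    """Return the groups() tuple for the first match in text (hypothetical helper)."""
    return PATTERN.search(text).groups()

print(find_99('aa99bb'))    # ('99',)
print(find_99('aa9x99bb'))  # ('99',)
print(find_99('aabb'))      # (None,)
```

This trades the simplicity of [^9]* for a lookahead at every position, so whether it is worth it depends on the input.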

Source: https://habr.com/ru/post/1344359/

