Regex result

Question

I have the code below:

    import re

    line = "78349999234"
    searchObj = re.search(r'9*', line)
    if searchObj:
        print("searchObj.group() : ", searchObj.group())
    else:
        print("Nothing found!!")

However, the output is empty. I thought that * causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as possible: ab* will match 'a', 'ab', or 'a' followed by any number of 'b's. Why can't I see the result in this case?

python regex

user3369157 Oct 14 '14 at 23:23

2 answers

Willem van onsem · Answer 1 · 2014-10-14T23:28:02+0000

I think the regex is matched from left to right, so the first thing that matches is the empty string before the 7. Once it finds a 9, it does match greedily and tries to "eat" (which is the correct terminology) as many characters as possible.

If you ask for all matches:

 >>> print(re.findall(r'9*', line))
 ['', '', '', '', '9999', '', '', '', '']

It matches the empty string at every position between the characters, and, as you can see, 9999 matches as well.
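
To see where each of those matches sits, here is a small sketch using re.finditer on the string from the question. The variable name matches is only for illustration:

```python
import re

line = "78349999234"

# Collect (start, end, text) for every match of 9* in the string.
matches = [(m.start(), m.end(), m.group()) for m in re.finditer(r'9*', line)]

for start, end, text in matches:
    # Empty matches have start == end; the run of 9s spans 4..8.
    print(start, end, repr(text))
```

The first entry is the empty match at position 0, which is the one re.search returns.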

The main reason is probably performance: if you are searching for a pattern in a string of 10M+ characters, you are very happy if the pattern already matches within the first ten characters. You don't want to waste effort looking for a "better" match ...

EDIT

With "0 or more", the group (in this case 9) is repeated zero or more times. In the empty string, the character is repeated exactly 0 times. If you want to match patterns where the character is repeated one or more times, you should use

9+

This results in:

 >>> print(re.search(r'9+', line))
 <_sre.SRE_Match object; span=(4, 8), match='9999'>

Using re.search with a pattern that accepts the empty string is probably not that useful, as it will always match at the very start of the string, with an empty match if nothing longer fits there.
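
As a quick check of the difference, the sketch below compares the two patterns on the string from the question. The names star and plus are just illustrative. Note that the 9* match object is truthy even though its text is empty, which would explain why the original code printed an empty result instead of "Nothing found!!":

```python
import re

line = "78349999234"

star = re.search(r'9*', line)   # accepts the empty string
plus = re.search(r'9+', line)   # requires at least one 9

print(star.span(), repr(star.group()))   # empty match at the start
print(plus.span(), repr(plus.group()))   # the run of 9s
print(bool(star))                        # a match object is truthy, even if empty
print(re.search(r'9+', "12345"))         # no 9 at all, so no match
```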

Avinash raj · Answer 2 · 2014-10-15T04:39:02+0000

The main reason is that the re.search function stops searching as soon as it finds a match. 9* means the digit 9 repeated zero or more times. Since there is an empty string before each character, re.search stops after finding the first empty string. That's why you got an empty string as output ...
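
A minimal sketch of that "first match wins" behavior. The sample strings other than the one from the question are made up for illustration:

```python
import re

# Leftmost match wins, even when a longer run of 9s appears later.
first = re.search(r'9+', "19299")
print(first.span(), first.group())

# If the string happens to start with 9s, 9* is greedy at position 0.
leading = re.search(r'9*', "9978349999234")
print(leading.span(), leading.group())

# Otherwise 9* settles for the empty string at position 0.
empty = re.search(r'9*', "78349999234")
print(empty.span(), repr(empty.group()))
```

In each case the search ends at the earliest position where the pattern can match, and greediness only decides how far that one match extends.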
