Looking for the difference between re.match(pattern, ...) and re.search(r'\A' + pattern, ...)

(All of the code below assumes a context in which import re has already been evaluated.)

The documentation on the differences between re.match and re.search specifically compares running re.match(pattern, ...) with running re.search('^' + pattern, ...). This seems like a bit of a straw man to me, because the real test would be to compare re.match(pattern, ...) with re.search(r'\A' + pattern, ...)¹.

To be more specific, I cannot readily come up with a combination of pattern and string for which the result

 m = re.match(pattern, string) 

will be different from the result

 m = re.search(r'\A' + pattern, string) 

(Note that if the original pattern in pattern happens to be of type unicode, then so is the revised pattern in r'\A' + pattern, conveniently enough.)

Let me stress that I am not interested here in possible differences in performance, convenience, etc. At the moment, I'm only interested in differences in the final results (i.e. differences in the final values of m).

To formulate the question a little more broadly, I am looking for a combination of pattern, flags, string and kwargs such that the final value of m in

 r0 = re.compile(pattern, flags=flags)
 m = r0.match(string, **kwargs)

is different from the final value of m in

 r1 = re.compile(r'\A' + pattern, flags=flags)
 m = r1.search(string, **kwargs)
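The comparison can be run mechanically. The following is only a sketch: compare is a hypothetical helper name, not part of re, and it summarizes each match by its span and groups because match objects themselves do not compare equal.

```python
import re

def compare(pattern, string, flags=0, **kwargs):
    """Run both forms from the question and return a summary of each result.

    Hypothetical helper for illustration; kwargs are passed through to
    match() and search() unchanged (e.g. pos, endpos).
    """
    r0 = re.compile(pattern, flags=flags)
    r1 = re.compile(r'\A' + pattern, flags=flags)
    m0 = r0.match(string, **kwargs)
    m1 = r1.search(string, **kwargs)

    def summarize(m):
        # Match objects have no useful equality, so compare span and groups.
        return None if m is None else (m.span(), m.groups())

    return summarize(m0), summarize(m1)

# One input on which the two forms agree.
via_match, via_search = compare('a(b)', 'abc')
```

Any input for which the two returned summaries differ would be an answer of the first type.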

It may be that no such combination of pattern, flags, string and kwargs exists, but asserting this with any confidence requires in-depth knowledge of the internals of Python's regex engine. IOW, in contrast to a "positive answer" (i.e., one consisting of just one combination of inputs as described), a "negative answer" to this question amounts to a rather authoritative statement, so for it to be convincing the case has to be made at a much deeper level (than for a "positive" answer).

To summarize: I am looking for answers of one of two possible types:

  • A combination of pattern, flags, string and kwargs that produces different values of m in the last two cases shown above;
  • An authoritative "negative" answer (i.e., no such combination of inputs exists), based on knowledge of the internals of Python's regular expression engine.

¹ \A anchors the match to the beginning of the string, regardless of whether the match is multi-line or not. BTW, the counterpart of \A for matching the end of the string is \Z. Annoyingly enough, Python's \Z corresponds to Perl's \z, not to Perl's \Z. This tripped me up when I wrote an earlier version of this post. (BTW, in Python regexes \z has no special meaning; it just matches z.) Thanks to John Y for spotting my error.
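The anchoring behaviour described in the footnote can be checked with a few lines. This is a minimal sketch using only the standard re module; the sample text is made up for illustration.

```python
import re

text = 'first\nsecond'

# Under re.MULTILINE, ^ also matches right after each newline ...
caret = re.search('^second', text, re.MULTILINE)
# ... while \A still matches only at the very start of the string.
start = re.search(r'\Asecond', text, re.MULTILINE)

# \Z matches only at the very end of the string, whereas $ (without
# re.MULTILINE) also matches just before a single trailing newline.
end_z = re.search(r'second\Z', text + '\n')
end_dollar = re.search(r'second$', text + '\n')

print(caret is not None, start is None)       # True True
print(end_z is None, end_dollar is not None)  # True True
```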

1 answer

Maybe I'm missing something here, but I think the difference is clear.

  • re.match() returns a successful match only if the pattern you are looking for is at the beginning of the string, and judging by the examples in the documentation, re.match() appears to anchor the match the way \A does: at the beginning of the string, not at the beginning of each line in multi-line mode.

  • re.search() returns a successful match wherever the pattern occurs inside the target string (provided there is a match, of course), unless you anchor the pattern deliberately.
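A small illustration of the two points above, assuming nothing beyond the standard re module (the target string is made up):

```python
import re

target = 'spam and eggs'

at_start = re.match('eggs', target)      # pattern is not at the start: None
anywhere = re.search('eggs', target)     # found in the middle of the string
anchored = re.search(r'\Aeggs', target)  # anchored on purpose: None again

print(at_start, anywhere.span(), anchored)  # None (9, 13) None
```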

Now, to answer your main question: what is the difference between re.match(pattern, …) and re.search(r'\A' + pattern, …)?

Well, for a simple pattern called with default arguments there is no difference in the result; re.match() is just a convenience method, so you don't have to type r'\A' + pattern every time you want to anchor your match, which happens a lot, I suppose.

You can convince yourself that re.match() anchors the way \A does by looking at the last example in the comparison you linked to:

 >>> re.match('X', 'A\nB\nX', re.MULTILINE)  # No match
 >>> re.search('^X', 'A\nB\nX', re.MULTILINE)  # Match
 <_sre.SRE_Match object at ...>
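That said, the equivalence seems to hold only for simple cases. In the sketch below (patterns and strings are made up for illustration; only the standard re module is assumed), the documentation example behaves the same with \A, but two other combinations appear to give different results: a top-level alternation, where the prepended \A binds only to the first alternative, and a nonzero pos argument, where match() anchors at pos while \A still refers to the real start of the string.

```python
import re

# The documentation example with \A in place of ^: same result as re.match.
doc_case = re.search(r'\AX', 'A\nB\nX', re.MULTILINE)  # None

# Case 1: top-level alternation. r'\A' + 'a|b' is r'\Aa|b', so the
# anchor applies to the first alternative only.
pattern, string = 'a|b', 'cb'
m0 = re.match(pattern, string)                 # None
m1 = re.search(r'\A' + pattern, string)        # matches 'b' at index 1
# Wrapping the pattern in a non-capturing group avoids that.
m1_grouped = re.search(r'\A(?:' + pattern + ')', string)  # None

# Case 2: a pos keyword argument on compiled patterns.
r0 = re.compile('o')
r1 = re.compile(r'\Ao')
m2 = r0.match('dog', pos=1)    # match() anchors at pos
m3 = r1.search('dog', pos=1)   # \A only matches at the real start: None

print(m0, m1.span(), m1_grouped)  # None (1, 2) None
print(m2.span(), m3)              # (1, 2) None
```

So a closer stand-in for re.match(pattern, ...) is probably re.search(r'\A(?:' + pattern + ')', ...), and even that differs once pos is used.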

Source: https://habr.com/ru/post/1499613/

