Why re.match()?

Question

I know this topic has already been discussed several times here on Stack Overflow, but I'm looking for a better answer.

While I appreciate the difference, I couldn't find a definitive explanation of why Python's re module provides both match() and search(). Can't I get the same behavior from search() by adding ^ in single-line mode and \A in multi-line mode? Am I missing something?

I tried to understand the implementation by reading the _sre.c code, and as far as I can tell the search (sre_search()) is implemented by advancing a pointer through the string to be searched and applying sre_match() at each position until a match is found.
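To illustrate what I mean, here is a rough Python sketch of that idea. It is not the actual C implementation, and naive_search is a hypothetical name used only for illustration; it just tries an anchored match at each successive start position.

```python
import re

def naive_search(pattern, string):
    """Illustrative only: emulate search by trying match() at every position."""
    compiled = re.compile(pattern)
    # len(string) + 1 so that patterns matching the empty string at the end work too.
    for start in range(len(string) + 1):
        m = compiled.match(string, start)
        if m is not None:
            return m
    return None

text = 'first line\nsecond line'
m = naive_search(r'second', text)
print(m.span())  # (11, 17)
```

If the pattern can never match, this loop visits every position before giving up, which is where I would expect any cost difference to come from.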

So I suspect that re.match() may be slightly faster than re.search() with the equivalent anchored pattern (using ^ or \A). Is that the reason?

I also searched the python-dev mailing list archives, but to no avail.

    >>> string="""first line
    ... second line"""
    >>> print re.match('first', string, re.MULTILINE)
    <_sre.SRE_Match object at 0x1072ae7e8>
    >>> print re.match('second', string, re.MULTILINE)
    None
    >>> print re.search('\Afirst', string, re.MULTILINE)
    <_sre.SRE_Match object at 0x1072ae7e8>
    >>> print re.search('\Asecond', string, re.MULTILINE)
    None
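For completeness, a small Python 3 check of why \A rather than ^ is the anchor to compare against in multi-line mode. This is only a sketch using the same two-line string; the variable names are mine.

```python
import re

text = "first line\nsecond line"

# With re.MULTILINE, ^ also matches right after a newline...
caret = re.search(r'^second', text, re.MULTILINE)
# ...while \A only ever matches at the very start of the string.
start_only = re.search(r'\Asecond', text, re.MULTILINE)
# re.match only tries the start of the string, regardless of re.MULTILINE.
matched = re.match(r'second', text, re.MULTILINE)

print(caret is not None)   # True
print(start_only is None)  # True
print(matched is None)     # True
```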

python regex

spider Mar 12 '15 at 10:25

1 answer

Casimir et Hippolyte · Accepted Answer · 2015-03-12T12:12:21+0000

As you already know, re.match tests the pattern only at the start of the string, while re.search tests each position in the string until it finds a match.

So, is there a difference between re.match('toto', s) and re.search('^toto', s) and what is it?

Let's do a little test:

    #!/usr/bin/python
    import time
    import re

    p1 = re.compile(r'toto')   # used with match()
    p2 = re.compile(r'^toto')  # used with search()

    ssize = 1000

    s1 = 'toto abcdefghijklmnopqrstuvwxyz012356789'*ssize  # pattern succeeds
    s2 = 'titi abcdefghijklmnopqrstuvwxyz012356789'*ssize  # pattern fails

    nb = 1000

    # Case 1: the pattern succeeds
    i = 0
    t0 = time.time()
    while i < nb:
        p1.match(s1)
        i += 1
    t1 = time.time()

    i = 0
    t2 = time.time()
    while i < nb:
        p2.search(s1)
        i += 1
    t3 = time.time()

    print("\nsucceed\nmatch:")
    print(t1-t0)
    print("search:")
    print(t3-t2)

    # Case 2: the pattern fails
    i = 0
    t0 = time.time()
    while i < nb:
        p1.match(s2)
        i += 1
    t1 = time.time()

    i = 0
    t2 = time.time()
    while i < nb:
        p2.search(s2)
        i += 1
    t3 = time.time()

    print("\nfail\nmatch:")
    print(t1-t0)
    print("search:")
    print(t3-t2)

Both methods are tested against a string that matches and a string that does not.

Results:

    succeed
    match:
    0.000469207763672
    search:
    0.000494003295898

    fail
    match:
    0.000430107116699
    search:
    0.46605682373

What we can conclude from these results:

1) The timings are similar when the pattern succeeds.

2) The timings are completely different when the pattern fails. This is the most important point: it means that re.search keeps testing every position in the string even though the pattern is anchored, whereas re.match stops immediately.

If you increase the size of the failing test string, you will see that re.match takes no more time, while the time taken by re.search grows with the length of the string.
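A minimal sketch for checking that scaling yourself, assuming the same failing string as in the test above. The function name time_failure, the chosen sizes, and the repeat count are arbitrary choices for illustration. The absolute numbers depend on the machine and the Python version, and other interpreter versions may not show the same gap.

```python
import re
import timeit

plain = re.compile(r'toto')      # used with match()
anchored = re.compile(r'^toto')  # used with search()

def time_failure(size, repeats=100):
    """Time both calls on a non-matching string of the given size."""
    s = 'titi abcdefghijklmnopqrstuvwxyz012356789' * size
    t_match = timeit.timeit(lambda: plain.match(s), number=repeats)
    t_search = timeit.timeit(lambda: anchored.search(s), number=repeats)
    return t_match, t_search

for size in (100, 1000, 10000):
    t_match, t_search = time_failure(size)
    print(size, t_match, t_search)
```

Comparing the two columns as the size grows shows whether the search time depends on the string length on your setup.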
