Applying a regular expression to a substring without using a string slice

I want to search for a regular expression in a large string, starting from a specific position, without using string slices.

The background is that I want to search a string iteratively for matches of several different regular expressions. A natural solution in Python would be to keep track of the current position within the string and use, for example,

re.match(regex, largeString[pos:]) 

in a loop. But for really large strings (~ 1 MB), string slicing, as in largeString[pos:], becomes expensive. I am looking for a way around this.

Side note: funnily enough, a corner of the Python documentation mentions an optional pos parameter for the match function (which would be exactly what I want), but the module-level functions themselves do not accept it :-).

+5
4 answers

The variants that take the pos and endpos arguments only exist as methods of compiled regular expression objects. Try the following:

    import re

    pattern = re.compile("match here")
    text = "don't match here, but do match here"
    start = text.find(",")  # begin searching at the comma
    # search() on a compiled pattern accepts a start position
    print(pattern.search(text, start).span())

... outputs (25, 35)
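
The loop described in the question could be sketched along the same lines. This is only a minimal illustration, assuming a small set of hypothetical token patterns (the names NUMBER, NAME, SPACE and the helper scan are made up for this example); the point is that pattern.match(text, pos) is tried at the current position and pos is advanced, so no slice of the large string is ever created.

```python
import re

# Hypothetical token patterns, compiled once so that .match() accepts pos.
TOKEN_PATTERNS = [
    ("NUMBER", re.compile(r"\d+")),
    ("NAME", re.compile(r"[A-Za-z_]\w*")),
    ("SPACE", re.compile(r"\s+")),
]


def scan(text):
    """Yield (kind, matched_text) pairs, advancing pos instead of slicing."""
    pos = 0
    while pos < len(text):
        for kind, pattern in TOKEN_PATTERNS:
            m = pattern.match(text, pos)  # anchored at pos, text is not copied
            if m and m.end() > pos:
                yield kind, m.group(0)
                pos = m.end()  # continue right after this match
                break
        else:
            # No pattern matched at the current position.
            raise ValueError("no pattern matches at position %d" % pos)


tokens = list(scan("abc 123 x9"))
print(tokens)
```

Each match reports offsets relative to the full string, so m.end() can be used directly as the next starting position.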

+6

The pos keyword is only available on the methods of compiled pattern objects, not on the module-level functions. For instance,

    re.match("e+", "eee3", pos=1)

is invalid (it raises a TypeError), but

    pattern = re.compile("e+")
    pattern.match("eee3", pos=1)

works.
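
One caveat when replacing slices with pos: according to the Python documentation, the two are not completely equivalent, because '^' still refers to the real beginning of the string rather than to the index where matching starts. The short sketch below (the variable names are made up for illustration) shows the difference, and also that spans from a pos-based match are relative to the full string.

```python
import re

text = "abcdef"
anchored = re.compile(r"^def")

# With pos, '^' still means the real start of the string, so this is None.
with_pos = anchored.match(text, 3)

# Slicing creates a new string that begins at index 3, so '^' matches there.
with_slice = anchored.match(text[3:])

# Without the anchor, matching at pos works and the span refers to `text`.
plain = re.compile(r"def").match(text, 3)

print(with_pos)           # None
print(with_slice.span())  # (0, 3)
print(plain.span())       # (3, 6)
```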

+4
    >>> import re
    >>> m = re.compile("(o+)")
    >>> m.match("oooo").span()
    (0, 4)
    >>> m.match("oooo", 2).span()
    (2, 4)
+2

You can also use positive lookbehinds, for example:

    import re

    test_string = "abcabdabe"
    position = 3
    # the lookbehind requires at least `position` characters before the match
    a = re.search("(?<=.{" + str(position) + "})ab[a-z]", test_string)
    print(a.group(0))

gives:

 abd 
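
For comparison, here is a small sketch that runs the lookbehind version next to a compiled pattern with a start position, using the same test string. The variable names are illustrative only. Note one assumption baked into the lookbehind: '.' does not match a newline by default, so the two approaches can differ on multi-line input unless re.DOTALL is used.

```python
import re

test_string = "abcabdabe"
position = 3

# Lookbehind: require `position` arbitrary characters before the match.
lookbehind = re.search("(?<=.{%d})ab[a-z]" % position, test_string)

# Compiled pattern: start scanning at `position` directly.
with_pos = re.compile("ab[a-z]").search(test_string, position)

print(lookbehind.group(0), lookbehind.span())
print(with_pos.group(0), with_pos.span())
```

On this input both find the same match at the same offsets.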
+1

Source: https://habr.com/ru/post/890083/