Regex replace (in Python) - is there a simpler way?

Any time I want to replace a piece of text that is part of a larger piece of text, I always end up writing something like:

"(?P<start>some_pattern)(?P<replace>foo)(?P<end>end)" 

I then concatenate the start group, the new data for the replace group, and the end group.
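
For concreteness, the approach described above might look like the following sketch. The literal pieces "start-", "foo" and "-end", the sample text, and the helper name swap_middle are placeholders chosen for illustration, not taken from a real use case.

```python
import re

# Hypothetical pattern: keep "start-" and "-end", replace the "foo" between them.
pattern = re.compile(r"(?P<start>start-)(?P<replace>foo)(?P<end>-end)")

def swap_middle(text, new_data):
    # Rebuild each match from the start group, the new data, and the end group.
    return pattern.sub(lambda m: m.group("start") + new_data + m.group("end"), text)

result = swap_middle("x start-foo-end y", "bar")
print(result)
```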

Is there a better way to do this?

+42
python regex
Jan 29 '09 at 5:43
4 answers

Take a look at the Python re documentation for lookaheads (?=...) and lookbehinds (?<=...). I'm pretty sure they are what you want. They match strings, but do not "consume" the parts of the string that they match.
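
As a rough sketch of what that looks like (the literal words here are placeholders), the lookbehind and lookahead assert the surrounding text without including it in the match, so only the middle part is replaced:

```python
import re

text = "start foo end"

# (?<=start ) must precede the match and (?= end) must follow it,
# but neither is part of the matched text, so only "foo" is replaced.
result = re.sub(r"(?<=start )foo(?= end)", "bar", text)
print(result)

# Without the required context, nothing matches and the text is unchanged.
unchanged = re.sub(r"(?<=start )foo(?= end)", "bar", "other foo end")
print(unchanged)
```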

+18
Jan 29 '09 at 5:51
    >>> import re
    >>> s = "start foo end"
    >>> s = re.sub("foo", "replaced", s)
    >>> s
    'start replaced end'
    >>> s = re.sub("(?<= )(.+)(?= )", lambda m: "can use a callable for the %s text too" % m.group(1), s)
    >>> s
    'start can use a callable for the replaced text too end'
    >>> help(re.sub)
    Help on function sub in module re:

    sub(pattern, repl, string, count=0)
        Return the string obtained by replacing the leftmost
        non-overlapping occurrences of the pattern in string by the
        replacement repl.  repl can be either a string or a callable;
        if a callable, it's passed the match object and must return
        a replacement string to be used.

+105
Jan 29 '09 at 5:56

The short version is that you cannot use variable-width patterns in lookbehinds with Python's re module. There is no way to change this:

    >>> import re
    >>> re.sub("(?<=foo)bar(?=baz)", "quux", "foobarbaz")
    'fooquuxbaz'
    >>> re.sub("(?<=fo+)bar(?=baz)", "quux", "foobarbaz")

    Traceback (most recent call last):
      File "<pyshell#2>", line 1, in <module>
        re.sub("(?<=fo+)bar(?=baz)", "quux", string)
      File "C:\Development\Python25\lib\re.py", line 150, in sub
        return _compile(pattern, 0).sub(repl, string, count)
      File "C:\Development\Python25\lib\re.py", line 241, in _compile
        raise error, v # invalid expression
    error: look-behind requires fixed-width pattern

This means you need to work around it. The simplest solution is very similar to what you are doing now: capture the variable-width prefix in a group and put it back in the replacement.

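    >>> re.sub("(fo+)bar(?=baz)", "\\1quux", "foobarbaz")
    'fooquuxbaz'
    >>>
    >>> # If you need to turn this into a callable function:
    >>> def replace(start, replace, end, replacement, search):
    ...     # start, replace and end are literal text, so escape them for the pattern.
    ...     pattern = "(" + re.escape(start) + ")" + re.escape(replace) + "(?=" + re.escape(end) + ")"
    ...     # A callable keeps group 1 and inserts the replacement text literally.
    ...     return re.sub(pattern, lambda m: m.group(1) + replacement, search)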

It does not have the elegance of the lookbehind solution, but it is still a very clear, straightforward one-liner. And if you look at what the expert has to say on this issue (he is talking about JavaScript, which lacks lookbehinds completely, but many of the principles are the same), you will see that his simplest solution looks a lot like this one.
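
To illustrate the capture-and-reinsert workaround with a variable-width prefix, here is a small sketch. The helper name replace_after is hypothetical, and it assumes the target and replacement are literal text while the prefix is a regular expression. Unlike a true lookbehind, the prefix is consumed by the match, so it cannot be reused as the prefix of an overlapping match.

```python
import re

def replace_after(prefix_pattern, target, replacement, text):
    """Replace literal `target` wherever it follows a match of `prefix_pattern`.

    prefix_pattern is a regex and may be variable-width (e.g. "fo+");
    target and replacement are treated as literal text.
    """
    # The outer parentheses open first, so the prefix is always group 1.
    pattern = "(" + prefix_pattern + ")" + re.escape(target)
    # A callable replacement puts the captured prefix back and avoids any
    # backslash processing of the replacement text.
    return re.sub(pattern, lambda m: m.group(1) + replacement, text)

result = replace_after("fo+", "bar", "quux", "foobarbaz fooooBAR foooobar")
print(result)
```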

+11
Jan 29 '09 at 15:11

I believe the best approach is simply to capture whatever you want to replace in a group, and then replace it using the start and end positions of the captured group.

Regards,

Adrian

    # The pattern contains the expression we want to replace as the first group.
    import re

    pat = r"word1\s(.*)\sword2"
    test = "word1 will never be a word2"
    repl = "replace"

    m = re.search(pat, test)
    if m and len(m.groups()) > 0:
        # Splice the replacement in between the text before and after group 1.
        line = test[:m.start(1)] + repl + test[m.end(1):]
        print(line)
    else:
        print("the pattern didn't capture any text")

This will print: 'word1 replace word2'

The group to be replaced can be located at any position in the string.
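
The same idea can be extended to every match rather than only the first one. The following is a minimal sketch, not part of the answer above: replace_group is a hypothetical helper, and it assumes the chosen group lies inside the text consumed by each match (so not inside a lookahead).

```python
import re

def replace_group(pattern, text, repl, group=1):
    """Replace only the span of `group` in every match of `pattern`."""
    pieces = []
    last = 0
    for m in re.finditer(pattern, text):
        start, end = m.span(group)
        if start == -1:
            # The group did not take part in this match; leave it alone.
            continue
        pieces.append(text[last:start])  # text before the group, unchanged
        pieces.append(repl)              # replacement for the group itself
        last = end
    pieces.append(text[last:])           # whatever follows the last group
    return "".join(pieces)

result = replace_group(r"word1\s(.*)\sword2", "word1 will never be a word2", "replace")
print(result)
```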

+4
Jan 12