Why don't backreferences work in Python re.sub when using a replacement function?

Using re.sub in Python 2.7, the following example uses a simple backreference:

 re.sub('-{1,2}', r'\g<0> ', 'pro----gram-files') 

It prints the following line as expected:

 'pro-- -- gram- files' 

I would expect the following example to behave identically, but it does not:

 def dashrepl(matchobj):
     return r'\g<0> '
 re.sub('-{1,2}', dashrepl, 'pro----gram-files')

This gives the following unexpected result:

 'pro\\g<0> \\g<0> gram\\g<0> files' 

Why do the two examples give different results? Am I missing something in the documentation that explains this? Is there any particular reason why this behavior is preferable to what I expected? Is there a way to use backreferences in a replacement function?

2 answers

There are simpler ways to achieve your goal, so you can use those instead.

As you have already noticed, your replacement function receives a match object as its argument.

This object has, among other things, the group() method, which can be used instead:

 def dashrepl(matchobj): return matchobj.group(0) + ' ' 

which will give exactly your result.
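If you would rather keep the template syntax inside the function, a match object also has an expand() method, which performs the same backslash substitution that re.sub applies to a string repl. A minimal sketch (the name dashrepl_expand is only illustrative):

```python
import re

def dashrepl_expand(matchobj):
    # expand() resolves \g<0>, \1, etc. against this particular match
    return matchobj.expand(r'\g<0> ')

result = re.sub('-{1,2}', dashrepl_expand, 'pro----gram-files')
print(result)  # pro-- -- gram- files
```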


But you are absolutely right, the documentation is a little confusing on this point:

it describes the repl argument as follows:

repl can be a string or a function; if it is a string, any backslash escapes in it are processed.

and

If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string.

You could interpret this as if the "replacement string" returned by the function were also subject to backslash escape processing.

But since that processing is described only for the case "if it is a string", the intended meaning is that a function's return value is inserted as-is, although this is not obvious at first glance.
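The difference can be checked directly. This is a small sketch with a made-up input string, passing the same template once as a string and once as the return value of a function:

```python
import re

text = 'a-b'  # sample input, chosen only for illustration

# String repl: re.sub parses the template, so \g<0> becomes the matched text.
as_string = re.sub('-', r'[\g<0>]', text)

# Function repl: the returned string is inserted literally, backslash and all.
as_function = re.sub('-', lambda m: r'[\g<0>]', text)

print(as_string)    # a[-]b
print(as_function)  # a[\g<0>]b
```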


If you pass re.sub a function, each match is replaced with the string that function returns. Basically, re.sub uses different code paths depending on whether you pass a function or a string. And yes, this is really desirable. Consider the case where you want to replace foo matches with bar and baz matches with qux. Then you can write it as:

 repdict = {'foo':'bar','baz':'qux'}
 re.sub('foo|baz', lambda match: repdict[match.group(0)], 'foo')

You could argue that you can do this in two passes, but that fails if repdict looks like {'foo':'baz','baz':'qux'}, because the second pass would also rewrite the baz produced by the first.

And I don’t think you can do this with backreferences (at least not easily).
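To make the two-pass problem concrete, here is a short sketch comparing chained str.replace calls with a single re.sub pass. The input text is an assumed sample:

```python
import re

repdict = {'foo': 'baz', 'baz': 'qux'}
text = 'foo baz'  # sample input, chosen only for illustration

# Two sequential passes: the 'baz' produced by the first pass is replaced again.
two_pass = text.replace('foo', 'baz').replace('baz', 'qux')

# One pass with a function: each original match is looked up exactly once.
one_pass = re.sub('foo|baz', lambda m: repdict[m.group(0)], text)

print(two_pass)  # qux qux
print(one_pass)  # baz qux
```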


Source: https://habr.com/ru/post/1440589/

