Regex: getting a back reference to a number, adding to it

A simple question in regex:

I want to replace page numbers in a string with the page number plus some constant (e.g. 10). I figured I could capture the matched page number with a backreference, perform an operation on it, and use the result as the replacement argument in re.sub.

This works (just passing the value):

    def add_pages(x):
        return x

    re.sub("(?<=Page )(\d{2})", add_pages(r"\1"), 'here is Page 11 and here is Page 78\nthen there is Page 65', re.MULTILINE)

This yields, of course, 'here is Page 11 and here is Page 78\nthen there is Page 65'

Now, if I change the add_pages function to modify the backreference it is given, I get an error.

    def add_pages(x):
        return int(x)+10

    re.sub("(?<=Page )(\d{2})", add_pages(r"\1"), 'here is Page 11 and here is Page 78\nthen there is Page 65', re.MULTILINE)

    ValueError: invalid literal for int() with base 10: '\\1'

because what is passed to the add_pages function seems to be the literal backreference, not the text it refers to.

Short of extracting all the matched numbers into a list, processing them, and putting them back, how would I do this?

+5
2 answers

The actual problem is that you are supposed to pass a function as the second argument of re.sub; instead, you are calling the function and passing its return value.

Why does this work in the first case?

Whenever a match is found, re.sub consults its second argument. If it is a string, it is used as the replacement template; if it is a function, the function is called with the match object. In your case, add_pages(r"\1") is evaluated before re.sub runs and just returns r"\1". So the re.sub call translates to this:

    print(re.sub(r"(?<=Page )(\d{2})", r"\1", ...))

Thus, each match is replaced with the text of its own first group, which is the match itself. That is why it appears to work.
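To make this concrete, here is a minimal sketch (written for Python 3, using the sample text from the question; the variable names replacement and result are only illustrative). It shows that re.sub never sees the function, only the string the function returned:

```python
import re

def add_pages(x):
    return x

text = 'here is Page 11 and here is Page 78\nthen there is Page 65'

# The call is evaluated first and simply hands back the string r"\1".
replacement = add_pages(r"\1")

# re.sub now receives a plain string template, so "\1" expands to
# group 1 of each match, i.e. the same two digits that were matched.
result = re.sub(r"(?<=Page )(\d{2})", replacement, text)

print(replacement == r"\1")  # the function returned its argument unchanged
print(result == text)        # the text comes back unmodified
```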

Why does this not work in the second case?

But in the second case, when you do

 add_pages(r"\1") 

you are trying to convert r"\1" (a two-character string: a backslash followed by 1) to an integer, which is not possible. That is why it fails.
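A small sketch of the failure in isolation (Python 3; error_message is just an illustrative name). Note that re.sub is not involved at all, because the exception is raised while the argument is still being computed:

```python
def add_pages(x):
    return int(x) + 10

try:
    # The argument is the literal text backslash + "1", not a page number.
    add_pages(r"\1")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```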

How to fix it?

The correct way to write this would be:

    def add_pages(matchObject):
        # group() with no arguments returns the whole match as a string
        return str(int(matchObject.group()) + 10)

    print(re.sub(r"(?<=Page )(\d{2})", add_pages, ...))

Read more about the group method of match objects in the Python re module documentation.
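If the amount to add should be configurable, one possible variation is to build the replacement function inside a helper. This is only a sketch under that assumption: shift_pages, add_offset and offset are hypothetical names, not part of the re module or of the question.

```python
import re

def shift_pages(text, offset=10):
    """Add `offset` to every two-digit number that follows 'Page '."""
    def add_offset(match):
        # group(1) is the captured page number, still a string here.
        return str(int(match.group(1)) + offset)

    # Pass the function itself (no parentheses); re.sub calls it per match.
    return re.sub(r"(?<=Page )(\d{2})", add_offset, text)

text = 'here is Page 11 and here is Page 78\nthen there is Page 65'
shifted = shift_pages(text)
print(shifted)
```

The replacement function must return a string, which is why the sum is converted back with str before it is returned.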

+7
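    import re

    def add_pages(matchobj):
        # group(0) is the whole match, here the two-digit page number
        return str(int(matchobj.group(0)) + 10)

    # flags must be passed by keyword; the fourth positional argument is count
    print(re.sub(r"(?<=Page )(\d{2})", add_pages,
                 'here is Page 11 and here is Page 78\nthen there is Page 65',
                 flags=re.MULTILINE))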
+1

Source: https://habr.com/ru/post/1207714/

