How to replace all occurrences of a regular expression, as if reapplying repeatedly

For example, I have text with a large number of product sizes, such as "2x4", which I would like to convert to "2 xby 4".

    import re

    pattern = r"([0-9])\s*[xX\*]\s*([0-9])"
    re.sub(pattern, r"\1 xby \2", "2x4")     # '2 xby 4'    (good)
    re.sub(pattern, r"\1 xby \2", "2x4x12")  # '2 xby 4x12' (not good: need '2 xby 4 xby 12')

One way to describe what I want is to repeat the replacement until no more changes are made. For example, I can simply apply the replacement above twice to get the desired result:

    x = re.sub(pattern, r"\1 xby \2", "2x4x12")
    x = re.sub(pattern, r"\1 xby \2", x)
    # x is now '2 xby 4 xby 12'

But I guess there is a better way.

3 answers

You can use this lookahead regular expression to search:

 r'([0-9]+)\s*[xX*]\s*(?=[0-9]+)' 

(?=[0-9]+) is a positive lookahead: it only asserts that a second number follows, without consuming it, so the regex engine's internal position does not advance past that number.

And use this to replace:

 r'\1 xby ' 

Code:

    >>> import re
    >>> pattern = r'([0-9]+)\s*[xX*]\s*(?=[0-9]+)'
    >>> re.sub(pattern, r'\1 xby ', "2x4")
    '2 xby 4'
    >>> re.sub(pattern, r'\1 xby ', "2x4x12")
    '2 xby 4 xby 12'
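
To see why the lookahead makes the difference, it may help to list what each pattern actually matches. The sketch below is only an illustration: the variable names are made up for this example, and it uses `re.finditer` to show the matched text and its span for the question's pattern and for the lookahead pattern.

```python
import re

text = "2x4x12"

# Pattern from the question: the second digit is part of the match, so the
# scan resumes after it and the middle number cannot start a new match.
consuming = re.compile(r"([0-9])\s*[xX\*]\s*([0-9])")
consuming_matches = [(m.group(0), m.span()) for m in consuming.finditer(text)]

# Lookahead pattern: the second number is only checked, not consumed, so the
# scan resumes right at it and it can begin the next match.
lookahead = re.compile(r"([0-9]+)\s*[xX*]\s*(?=[0-9]+)")
lookahead_matches = [(m.group(0), m.span()) for m in lookahead.finditer(text)]

print(consuming_matches)  # [('2x4', (0, 3))]
print(lookahead_matches)  # [('2x', (0, 2)), ('4x', (2, 4))]
```

The consuming pattern finds a single match, while the lookahead pattern finds one match per separator, which is why a single `re.sub` call is enough.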

I think you can do this in one pass by thinking about it a little differently. What you are really trying to do is replace x with xby, so you can scan the whole string once, provided you do not consume the digits on the right-hand side.

For this, I recommend a lookahead assertion. Essentially, it confirms that the thing you are replacing is followed by digits, but does not consume those digits in the process. The notation is (?=...); see the re documentation page.

Here is what I came up with. Note that compiling the regular expression is optional, and \d is usually preferable to [0-9]:

    import re

    pattern = re.compile(r"(\d+)\s*[xX\*]\s*(?=\d)")
    pattern.sub(r"\1 xby ", "2x4x12")  # '2 xby 4 xby 12'

In one pass, it will process the entire string.
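
As a rough illustration of the single-pass behaviour on longer text, here is a small sketch. The helper name `expand_sizes` and the sample sentence are invented for this example; the pattern is the one from this answer.

```python
import re

# Separator between two numbers; the number on the right is only looked at.
SIZE_SEP = re.compile(r"(\d+)\s*[xX*]\s*(?=\d)")

def expand_sizes(text):
    """Rewrite every size separator in text in a single pass (hypothetical helper)."""
    return SIZE_SEP.sub(r"\1 xby ", text)

sample = "Boards: 2x4, 4 X 4 x 8 and 1*6*10; runs 3x faster."
print(expand_sizes(sample))
# Boards: 2 xby 4, 4 xby 4 xby 8 and 1 xby 6 xby 10; runs 3x faster.
```

Note that "3x faster" is left alone, because the lookahead requires a digit after the separator.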

Since you are trying to re-match against text that has already been rewritten by the regular expression, there is no better way.

It’s a bit like evaluating a math expression: to compute (2 + 3) + 4, you first have to replace “(2 + 3)” with “5” before you can replace “5 + 4”, since the string “5” appears nowhere in your source text.

What you can do is check your string for any match and keep re-applying the replacement to the previous result until no more matches are found.
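
A minimal sketch of that repeat-until-stable loop, using the pattern from the question. The function name `sub_until_stable` and the `max_rounds` safety limit are assumptions added for this example, not part of the `re` module.

```python
import re

def sub_until_stable(pattern, repl, text, max_rounds=100):
    """Apply re.sub repeatedly until the text stops changing.

    max_rounds is an assumed safety limit so that a replacement which
    keeps producing new matches cannot loop forever.
    """
    for _ in range(max_rounds):
        new_text = re.sub(pattern, repl, text)
        if new_text == text:
            return new_text
        text = new_text
    raise RuntimeError("replacement did not converge")

pattern = r"([0-9])\s*[xX\*]\s*([0-9])"
print(sub_until_stable(pattern, r"\1 xby \2", "2x4x12"))  # 2 xby 4 xby 12
```

This costs one extra pass over the string to detect that nothing changed, so it is simple rather than efficient.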

Edit: you can also create one regular expression per number of repetitions and run them in descending order of length. That is, look for 2x3x5x2, then 2x3x5, then 2x3; working from longest to shortest, you never hit anything that has already been replaced.
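
One possible variation on this idea, sketched here as an assumption rather than what the answer literally describes: instead of one pattern per chain length, a single pattern matches a whole chain of any length, and a replacement function rewrites the separators inside it. The names `CHAIN`, `SEP`, `rewrite_chain` and `expand_chains` are made up for this example.

```python
import re

# A whole chain of numbers joined by x, X or *, e.g. "2x3x5x2".
CHAIN = re.compile(r"\d+(?:\s*[xX*]\s*\d+)+")
# A single separator inside such a chain, with optional surrounding spaces.
SEP = re.compile(r"\s*[xX*]\s*")

def rewrite_chain(match):
    """Split the matched chain into its numbers and rejoin them with ' xby '."""
    numbers = SEP.split(match.group(0))
    return " xby ".join(numbers)

def expand_chains(text):
    """Rewrite every chain in text; each chain is matched once, as a unit."""
    return CHAIN.sub(rewrite_chain, text)

print(expand_chains("2x3x5x2 and 2x3"))  # 2 xby 3 xby 5 xby 2 and 2 xby 3
```

Because each chain is matched as a unit, no replaced text is ever scanned again.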

Source: https://habr.com/ru/post/1244912/

