How can I reduce the number of steps in a Python regex?

This is my regex:

([^>]+?[<])/?(?!a)(?:br|p)

Here is an example:

<r> this is text which I do not want <a> This is what I want!<br>

I just want to extract the text between '>' and '<br' or '<p'.

This regular expression works exactly the way I want, but I realize that it takes too much time.

I ran it in a regex debugger, and it took over 800 steps to reject a sentence that does not match.

How can I fix this?

2 answers

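Your pattern ([^>]+?[<])/?(?!a)(?:br|p) means:

  • ([^>]+?[<]) - capture into group 1 one or more (but as few as possible) characters other than >, up to and including the first <
  • /? - an optional /
  • (?!a)(?:br|p) - br or p, provided the next character is not a (the lookahead is redundant, since neither br nor p starts with a).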
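The breakdown above can be checked directly in Python. The following is only a small sketch: the variable names are illustrative, and the sample string is the one from the question. It shows that dropping the lookahead does not change the match, and that group 1 of the original pattern includes the trailing <.

```python
import re

s = "<r> this is text which I do not want <a> This is what I want!<br>"

# The pattern from the question, with and without the (?!a) lookahead.
with_lookahead = re.compile(r'([^>]+?[<])/?(?!a)(?:br|p)')
without_lookahead = re.compile(r'([^>]+?[<])/?(?:br|p)')

m1 = with_lookahead.search(s)
m2 = without_lookahead.search(s)

# Group 1 of the original pattern ends with the "<" it consumed.
print(repr(m1.group(1)))       # ' This is what I want!<'

# Both variants match the same span, so the lookahead adds nothing.
print(m1.span() == m2.span())  # True
```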

You seem to need only the text that precedes </br>, <br or <p, so you can use:

>([^<]+)</?(?:br|p)\b

Tested in the regex debugger (21 steps; 9 steps for <r> this is text which I do not want <a> This is what I want!br>).

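Explanation:

  • > - a literal >
  • ([^<]+) - group 1: one or more characters other than <
  • < - a literal <
  • /? - an optional (one or zero) /
  • (?:br|p)\b - br or p as a whole word (followed by a non-word character or the end of the string).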
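The same explanation can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace and allows # comments inside the pattern. This is just one possible way to write it; the name pattern is illustrative.

```python
import re

# The suggested pattern, annotated piece by piece with re.VERBOSE.
pattern = re.compile(r"""
    >            # a literal >
    ([^<]+)      # group 1: one or more characters other than <
    <            # a literal <
    /?           # an optional /
    (?:br|p)\b   # br or p as a whole word
""", re.VERBOSE)

s = "<r> this is text which I do not want <a> This is what I want!<br>"
print(pattern.findall(s))  # [' This is what I want!']
```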

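Python demo:

import re
# Compile once; group 1 holds the text between ">" and "<br" / "<p".
p = re.compile(r'>([^<]+)</?(?:br|p)\b')
s = "<r> this is text which I do not want <a> This is what I want!<br>"
# findall returns the contents of group 1 for every match.
print(p.findall(s))  # [' This is what I want!']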
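Python's re module does not report step counts the way a regex debugger does, so a rough comparison has to rely on timing. The sketch below uses timeit on the two patterns; the names original and revised are illustrative, the non-matching string is the one quoted above, and the actual timings will vary by machine and Python version.

```python
import re
import timeit

original = re.compile(r'([^>]+?[<])/?(?!a)(?:br|p)')  # pattern from the question
revised = re.compile(r'>([^<]+)</?(?:br|p)\b')        # pattern from this answer

matching = "<r> this is text which I do not want <a> This is what I want!<br>"
non_matching = "<r> this is text which I do not want <a> This is what I want!br>"

for label, text in (("matching", matching), ("non-matching", non_matching)):
    # Time 10,000 searches with each precompiled pattern.
    t_orig = timeit.timeit(lambda: original.search(text), number=10_000)
    t_rev = timeit.timeit(lambda: revised.search(text), number=10_000)
    print(f"{label}: original {t_orig:.4f}s, revised {t_rev:.4f}s")
```

The original pattern can begin a match attempt at almost every character and rescans forward each time, whereas the revised one can only start at a >, which is where most of the saving is expected to come from.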

Source: https://habr.com/ru/post/1651466/