How can I reduce the number of steps in a Python regex?

Question

This is my regex:

(? [^>] + [& ;])? / (?! ) (?: | )

Here is an example input:

<r> this is text which I do not want <a> This is what I want!<br>

I just want to extract the text between '>' and '<br' or '<p'.

This regular expression works exactly the way I want, but I found that it takes too much time.

I ran it in a regex debugger, and it took over 800 steps to reject a string that should not match.

How can I fix this?

+4

심형보 Aug 17 '16 at 1:15

2 answers

Try this pattern:

>([^<]+)<(?:br|p)

0

Jack Aug 17 '16 at 1:20

Wiktor Stribiżew · Accepted Answer · 2016-08-17T07:15:19+0000

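Your pattern ([^>]+?[<])/?(?!a)(?:br|p) means:

([^>]+?[<]) - Capture into group 1 one or more (but as few as possible) characters other than >, up to the first <
/? - an optional / (one or zero occurrences)
(?!a)(?:br|p) - br or p, but not a (the lookahead is redundant, because neither br nor p starts with a).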
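To illustrate why the (?!a) lookahead adds nothing, here is a minimal sketch that runs the pattern with and without it. The sample strings other than the one from the question are made up for illustration.

```python
import re

# The pattern as quoted above, and the same pattern with the lookahead removed
with_lookahead = re.compile(r'([^>]+?[<])/?(?!a)(?:br|p)')
without_lookahead = re.compile(r'([^>]+?[<])/?(?:br|p)')

samples = [
    # Input from the question
    "<r> this is text which I do not want <a> This is what I want!<br>",
    # Hypothetical extra inputs, not from the original post
    "<a> first</p><a> second<br>",
    "no tags here",
]

for s in samples:
    # Both patterns return the same matches: (?:br|p) can never start with 'a',
    # so the negative lookahead never rejects anything on its own.
    print(with_lookahead.findall(s) == without_lookahead.findall(s))
```

Dropping the lookahead does not change what is matched; it only removes one check per attempt.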

So you want to match the text that ends before </br>, <br or <p. I suggest:

>([^<]+)</?(?:br|p)\b

See the regex demo (21 steps, and 9 steps for the non-matching string <r> this is text which I do not want <a> This is what I want!br>).
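Python's re module does not report step counts the way a regex debugger does, so a rough stand-in is to time both patterns on the non-matching string. This is only a sketch: timings vary by machine and do not map one-to-one onto debugger steps.

```python
import re
import timeit

# Pattern as quoted at the top of this answer vs. the suggested replacement
slow = re.compile(r'([^>]+?[<])/?(?!a)(?:br|p)')
fast = re.compile(r'>([^<]+)</?(?:br|p)\b')

# Non-matching input: the final tag is missing its '<'
failing = "<r> this is text which I do not want <a> This is what I want!br>"

# Neither pattern should find anything here
print(slow.search(failing), fast.search(failing))

# Time repeated failed searches; the repetition count is arbitrary
slow_time = timeit.timeit(lambda: slow.search(failing), number=2000)
fast_time = timeit.timeit(lambda: fast.search(failing), number=2000)
print(f"original:  {slow_time:.4f}s")
print(f"suggested: {fast_time:.4f}s")
```

The suggested pattern anchors on a literal > and uses [^<]+, so each attempt stops at the next < instead of lazily expanding one character at a time from every start position.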

Python demo:

import re

# Capture the text after '>' that runs up to <br, </br, <p or </p
p = re.compile(r'>([^<]+)</?(?:br|p)\b')
s = "<r> this is text which I do not want <a> This is what I want!<br>"
print(p.findall(s))  # [' This is what I want!']
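As a possible follow-up, the captured text keeps its surrounding whitespace, and the trailing \b is what stops the pattern from treating tags such as <pre> as <p. The helper name text_before_breaks and the sample inputs below are hypothetical, used only for this sketch.

```python
import re

BREAK_TEXT = re.compile(r'>([^<]+)</?(?:br|p)\b')

def text_before_breaks(html):
    """Return the stripped text that directly precedes a br or p tag."""
    return [m.strip() for m in BREAK_TEXT.findall(html)]

# Leading and trailing spaces are stripped from the captured text
print(text_before_breaks("<r> skip <a> keep me <br>"))  # ['keep me']

# </p> matches, but <pre> does not, because \b requires a word boundary after 'p'
print(text_before_breaks("<a>one</p><b>two<pre>"))  # ['one']
```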