Remove simple HTML tags from String in Oracle through RegExp. Explanation needed

Question


I don’t understand why my reg1 and reg2 columns remove “bbb” from my row and only reg3 works as expected.

 WITH t AS (SELECT 'aaa <b>bbb</b> ccc' AS teststring FROM dual)
 SELECT teststring,
        regexp_replace(teststring, '<.+>') AS reg1,
        regexp_replace(teststring, '<.*>') AS reg2,
        regexp_replace(teststring, '<.*?>') AS reg3
 FROM t

 TESTSTRING          REG1     REG2     REG3
 aaa <b>bbb</b> ccc  aaa ccc  aaa ccc  aaa bbb ccc

Thanks a lot!

+6

oracle plsql regex

Basti Jun 10 '15 at 12:51


2 answers

The first and second patterns both match the whole of <b>bbb</b> in one go: the inner text b>bbb</b is matched by both .* and .+

The third one will not do what you need either. You are looking for something like this: <[^>]*> . But you also need to replace all matches with an empty string.
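A minimal sketch of the <[^>]*> pattern applied to the test string from the question. It assumes Oracle's default behaviour for regexp_replace when the replacement argument is omitted, namely that every match is removed. The column alias stripped is made up for illustration.

```sql
-- [^>]* matches any run of characters that are not '>', so each match
-- ends at the first '>' after its '<', whether the quantifier is greedy or not.
WITH t AS (SELECT 'aaa <b>bbb</b> ccc' AS teststring FROM dual)
SELECT teststring,
       regexp_replace(teststring, '<[^>]*>') AS stripped  -- hypothetical alias
FROM t;
```

Because the character class can never consume a >, this variant does not depend on lazy quantifiers at all.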

0

DevilPinky Jun 10 '15 at 13:00


Olivier Jacot-Descombes · Accepted Answer · 2015-06-10T12:56:52+0000

Because regex quantifiers are greedy by default. That is, the expressions .* and .+ try to take as many characters as possible. Therefore, <.+> will span from the first < to the last > . Make the quantifier lazy by appending the lazy operator ? :

 regexp_replace(teststring, '<.+?>')

or

 regexp_replace(teststring, '<.*?>')

Now the search for > stops at the first > it encounters.

Please note that . also matches > , so the greedy variant (without ? ) swallows every > except the last one.
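To see the difference directly, the following sketch uses regexp_substr to show what each pattern actually matches in the question's test string, instead of what it removes. The column aliases greedy_match and lazy_match are made up for illustration.

```sql
-- regexp_substr returns the first match of the pattern by default.
WITH t AS (SELECT 'aaa <b>bbb</b> ccc' AS teststring FROM dual)
SELECT regexp_substr(teststring, '<.+>')  AS greedy_match,  -- <b>bbb</b>
       regexp_substr(teststring, '<.+?>') AS lazy_match     -- <b>
FROM t;
```

The greedy pattern takes everything from the first < to the last > , which is why reg1 and reg2 lose bbb, while the lazy pattern stops at the end of the opening tag.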
