Regular Expression Negative Lookahead / Lookbehind for excluding HTML from Find-and-Replace

My site has a feature that highlights the search query in the search results. However, some of the fields the site displays contain HTML. For example, say a search result consists of <span>Hello all</span>. If the user searched for the letter a, I want the code to return <span>Hello <mark>a</mark>ll</span> instead of the messy <sp<mark>a</mark>n>Hello <mark>a</mark>ll</sp<mark>a</mark>n>, which is what it returns now.

I know that I can use negative lookbehinds and lookaheads in preg_replace() to exclude any instance where the a sits between < and >. But how do I do this? Regular expressions are one of my weaknesses, and I can't come up with anything that works.

So far I have this:

 $return = preg_replace("/(?<!\<[a-z\s]+?)$match(?!\>[a-z\s]+?)/i", '<mark>'.$match.'</mark>', $result);

But that does not work. Any help?

2 answers

It is considered bad practice to use regular expressions to parse a complex language such as HTML. With enough skill and patience, and an advanced regex engine, it may be possible, but the potential pitfalls are huge and performance is unlikely to be good.

The best solution is to use a DOM parser, such as the DOMDocument class built into PHP.

A good example of this can be found in the answer to this related SO question.
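
As a rough illustration of the DOM approach (not the code from the linked answer), the sketch below walks only the text nodes of the fragment and wraps each match in <mark>, so tag names and attributes are never touched. The function name highlightText and the wrapper id hl-root are made up for this example, and it assumes the input is a UTF-8 HTML fragment.

```php
<?php
// Hypothetical helper: highlight $match inside text nodes only.
function highlightText(string $html, string $match): string
{
    if ($match === '') {
        return $html;
    }

    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc = new DOMDocument();
    // The XML declaration is a common trick to make loadHTML() read UTF-8;
    // the wrapper <div> gives us a known root to serialise afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="hl-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $root  = $xpath->query('//div[@id="hl-root"]')->item(0);

    // Copy the node list first, because the tree is modified while looping.
    $textNodes = iterator_to_array($xpath->query('.//text()', $root));
    $pattern   = '/(' . preg_quote($match, '/') . ')/iu';

    foreach ($textNodes as $node) {
        // With PREG_SPLIT_DELIM_CAPTURE, odd indexes hold the matched text.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if ($parts === false || count($parts) < 2) {
            continue; // no match in this text node
        }

        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $mark = $doc->createElement('mark');
                $mark->appendChild($doc->createTextNode($part));
                $fragment->appendChild($mark);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialise the wrapper's children, dropping the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo highlightText('<span>Hello all</span>', 'a');
```

Note that the parser may normalise the markup it is given (for example, by closing unclosed tags), so the output is not guaranteed to be byte-for-byte identical to the input outside the highlighted parts.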

Hope this helps.


If you want to use regular expressions, all you need is a simple negative lookahead (assuming well-formed markup with no < or > inside or between tags):

 $return = preg_replace("/$match(?![^<>]*>)/i", '<mark>$0</mark>', $result); 

Any special regular expression characters in $match must be properly escaped, for example with preg_quote($match, '/').
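
Putting the lookahead pattern and the escaping together might look like the sketch below. The function name highlightOutsideTags is invented for this example. The lookahead (?![^<>]*>) rejects a match when the next angle bracket after it is >, meaning the match sits inside a tag. No lookbehind is needed, which also sidesteps PCRE's refusal to compile an unbounded lookbehind such as the one attempted in the question.

```php
<?php
// Hypothetical helper wrapping the lookahead-based replacement.
function highlightOutsideTags(string $result, string $match): string
{
    if ($match === '') {
        return $result; // an empty query would match at every position
    }

    // Escape regex metacharacters and the "/" delimiter in the user's query.
    $quoted = preg_quote($match, '/');

    // $0 re-inserts the text exactly as it was matched, preserving its case.
    $highlighted = preg_replace(
        '/' . $quoted . '(?![^<>]*>)/i',
        '<mark>$0</mark>',
        $result
    );

    // preg_replace() returns null on error; fall back to the original text.
    return $highlighted ?? $result;
}

echo highlightOutsideTags('<span>Hello all</span>', 'a');
```

As the answer says, this relies on well-formed markup: a stray < or > in the text or inside an attribute value will confuse the lookahead.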


Source: https://habr.com/ru/post/1397115/