PHP Regular Expression: exclude href anchor tags

I am creating a simple search for my application.

I use PHP's regular expression replacement function, preg_replace(), to find the search query (case-insensitively) and wrap each match in <strong> tags.

preg_replace('/'.$query.'/i', '<strong>$0</strong>', $content);

Now, I'm not the best with regular expressions. What would I add to the regular expression so that it does not replace search terms inside the href attribute of an anchor tag?

So if someone searched for "info", it should not change the link to "http://something.com/this_<strong>info</strong>/index.html".
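A minimal reproduction of the problem, assuming the search term "info" and the link from the question: a plain preg_replace() has no notion of HTML, so it matches inside the href value as well as in the link text. The preg_quote() call is an addition, not part of the snippet above; it makes characters such as . or / in the query match literally.

```php
<?php
// Naive highlighting: the pattern knows nothing about HTML, so it also
// matches the query inside the href attribute value.
$query   = 'info';
$content = '<a href="http://something.com/this_info/index.html">info</a>';

$pattern = '/' . preg_quote($query, '/') . '/i';
$result  = preg_replace($pattern, '<strong>$0</strong>', $content);

echo $result, "\n";
// <a href="http://something.com/this_<strong>info</strong>/index.html"><strong>info</strong></a>
```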

3 answers

I believe that for this you will need conditional subpatterns:

    $query = "link";
    $query = preg_quote($query, '/');

    $p = '/((<)(?(2)[^>]*>)(?:.*?))*?(' . $query . ')/smi';
    $r = "$1<strong>$3</strong>";

    $str = '<a href="/Link/foo/the_link.htm">'."\n".'A Link</a>'; // multi-line text
    $nstr = preg_replace($p, $r, $str);
    var_dump( $nstr );

    $str = 'Its not a Link'; // non-link text
    $nstr = preg_replace($p, $r, $str);
    var_dump( $nstr );

Output (as seen in the page source, since a browser renders the tags):

    string(61) "<a href="/Link/foo/the_link.htm">
    A <strong>Link</strong></a>"
    string(31) "Its not a <strong>Link</strong>"

PS: the regex above also handles multi-line input (the s modifier lets .*? cross newlines) and, more importantly, skips matches not only inside href but inside any HTML tag enclosed in < and >.

EDIT: If you only want to skip tags that contain an href, rather than every HTML tag, use this pattern instead of the one above:

 $p = '/((<)(?(2).*?href=[^>]*>)(?:.*?))*?(' . $query . ')/smi'; 
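The following sketches a different technique, not the one used in this answer: PCRE's backtracking-control verbs (*SKIP)(*FAIL), which PHP's preg_* functions support. The first alternative matches a whole tag and then throws the match away, telling the engine to resume scanning after the tag. This also covers a case the conditional-subpattern regex can miss: if the query occurs only inside a tag and not in the text after it (e.g. <a href="/link">home</a>), that regex fails at the <, the engine starts a new attempt inside the tag, and the href gets highlighted after all. The function name highlight_outside_tags() is hypothetical, and like the answer it assumes no literal > appears inside attribute values.

```php
<?php
/**
 * Hypothetical helper: wrap every case-insensitive occurrence of $query in
 * <strong> tags, except occurrences inside an HTML tag (<...>), such as hrefs.
 */
function highlight_outside_tags(string $html, string $query): string
{
    if ($query === '') {
        // An empty alternative would match at every position.
        return $html;
    }
    // Alternative 1 matches a whole tag; (*SKIP)(*FAIL) discards it and makes
    // the next attempt start after the tag, so alternative 2 never sees it.
    $pattern = '/<[^>]*>(*SKIP)(*FAIL)|' . preg_quote($query, '/') . '/i';
    return preg_replace($pattern, '<strong>$0</strong>', $html);
}

// The query appears only inside the tag here, so nothing is changed.
echo highlight_outside_tags('<a href="/link">home</a>', 'link'), "\n";
// <a href="/link">home</a>

echo highlight_outside_tags('Its not a Link', 'link'), "\n";
// Its not a <strong>Link</strong>
```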

I'm not 100% sure what you're ultimately after, but from what I can tell it's a kind of "search phrase" highlighting, where the keywords in a piece of text are highlighted. If so, I suggest taking a look at the Text helper in CodeIgniter. It provides a small highlight_phrase() function that may do what you are looking for.

The function is as follows.

    function highlight_phrase($str, $phrase, $tag_open = '<strong>', $tag_close = '</strong>')
    {
        if ($str == '')
        {
            return '';
        }

        if ($phrase != '')
        {
            return preg_replace('/('.preg_quote($phrase, '/').')/i', $tag_open."\\1".$tag_close, $str);
        }

        return $str;
    }
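On its own, highlight_phrase() wraps the phrase wherever it occurs, including inside href values, because it is a single preg_replace() over the whole string. One way to combine it with the requirement from the question, sketched under the assumption that tags never contain a literal > inside attribute values: split the HTML into tags and text with preg_split() and PREG_SPLIT_DELIM_CAPTURE, then apply the same replacement highlight_phrase() performs to the text chunks only. The helper name highlight_text_only() is hypothetical, not part of CodeIgniter.

```php
<?php
/**
 * Hypothetical helper: highlight $phrase only in the text between tags.
 * Text chunks get the same replacement highlight_phrase() performs; the tags
 * themselves (and any href inside them) are left untouched.
 */
function highlight_text_only(string $html, string $phrase,
                             string $tag_open = '<strong>',
                             string $tag_close = '</strong>'): string
{
    if ($phrase === '') {
        return $html;
    }
    $pattern = '/(' . preg_quote($phrase, '/') . ')/i';

    // PREG_SPLIT_DELIM_CAPTURE keeps the captured tags in the result, so the
    // array alternates text, tag, text, ...: even indexes hold text chunks.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = preg_replace($pattern, $tag_open . '$1' . $tag_close, $part);
        }
    }
    return implode('', $parts);
}

echo highlight_text_only('<a href="/this_info/index.html">more info</a>', 'info'), "\n";
// <a href="/this_info/index.html">more <strong>info</strong></a>
```

Splitting first keeps the highlighting step itself as simple as the original helper; the trade-off is that the tag pattern is only as robust as <[^>]*>.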

You can use a conditional subpattern (see the explanation at http://cz.php.net/manual/en/regexp.reference.conditional.php):

 preg_replace("/(?(?<=href=\")([^\"]*\")|($query))/i","\\1<strong>\\2</strong>",$x); 
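One caveat with the one-liner above: when the href branch matches, group 2 is empty, but the fixed replacement string still emits <strong></strong>, so an empty tag pair lands right after the closing quote of each href value, inside the <a> tag. A minimal sketch that keeps the same conditional pattern but uses preg_replace_callback() to wrap only real query matches; the function name is hypothetical and, like the original, it only handles double-quoted href values.

```php
<?php
function highlight_outside_href(string $html, string $query): string
{
    if ($query === '') {
        return $html;
    }
    // If the current position is right after href=", consume the attribute
    // value up to the closing quote (group 1); otherwise try the query (group 2).
    $pattern = '/(?(?<=href=")([^"]*")|(' . preg_quote($query, '/') . '))/i';

    return preg_replace_callback($pattern, function (array $m): string {
        // $m[2] is only set and non-empty when the query branch matched.
        if (($m[2] ?? '') !== '') {
            return '<strong>' . $m[2] . '</strong>';
        }
        return $m[0]; // href value: return it unchanged
    }, $html);
}

echo highlight_outside_href('<a href="http://something.com/this_info/index.html">info</a>', 'info'), "\n";
// <a href="http://something.com/this_info/index.html"><strong>info</strong></a>
```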

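If you are working with the whole HTML and not just the href="" value, there is a simpler solution using the e modifier, which evaluates the replacement string as PHP code (note that e was deprecated in PHP 5.5 and removed in PHP 7.0; preg_replace_callback() replaces it):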

    function termReplacer($found) {
        $found = stripslashes($found);
        if (substr($found, 0, 5) == "href=") return $found;
        return "<strong>$found</strong>";
    }

    echo preg_replace("/(?:href=)?\S*$query/e", "termReplacer('\\0')", $x);

See example No. 4 at http://cz.php.net/manual/en/function.preg-replace.php; if your logic is more complex, you can even run further regular expressions inside termReplacer().

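There is a small gotcha in PHP: with the e modifier, quotes and backslashes in the substituted backreferences are escaped with backslashes before the replacement code is evaluated, so the $found parameter in termReplacer() must be passed through stripslashes()!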
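Since the e modifier no longer exists in current PHP, here is a sketch of the same termReplacer() idea ported to preg_replace_callback(), assuming PHP 7 or later. The callback receives the match array directly, with no escaping applied, so the stripslashes() call is dropped. The pattern is otherwise unchanged apart from preg_quote() on $query, so it still wraps any non-space characters directly in front of the term: in plain text, this_info would become <strong>this_info</strong>. The sample string $x is made up for illustration.

```php
<?php
// Callback for preg_replace_callback(): $m[0] is the whole match.
function termReplacer(array $m): string
{
    $found = $m[0];
    // Matches that start with href= are attribute values: return them unchanged.
    if (substr($found, 0, 5) === 'href=') {
        return $found;
    }
    return "<strong>$found</strong>";
}

$query = 'info';
$x = 'see <a href="http://something.com/this_info/index.html">the info page</a>';

// Same pattern as the e-modifier version, without the e modifier.
$result = preg_replace_callback('/(?:href=)?\S*' . preg_quote($query, '/') . '/', 'termReplacer', $x);

echo $result, "\n";
// see <a href="http://something.com/this_info/index.html">the <strong>info</strong> page</a>
```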


Source: https://habr.com/ru/post/886556/

