Regex replace string in HTML but not inside link or header

I am looking for a regular expression to replace a given string in an HTML page, but only if the string is not part of a tag itself and is not displayed as text inside a link or a header.

Examples:

Looking for "replace_me"

<p>You can replace_me just fine</p> OK

<a href='replace_me'>replace_me</a> no match

<h3>replace_me</h3> no match

<a href='/test/'><span>replace_me</span></a> no match

<p style="background:url('replace_me')">replace_me<h1>replace_me</h1></p> first no match, second OK, third no match

Thanks in advance!

UPDATE:

I found a regex that works for these examples:

\b(replace_me)\b(?!(?:(?!<\/?[ha].*?>).)*<\/[ha].*?>)(?![^<>]*>)
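One way to check the pattern is to run it over the examples from the question. The sketch below uses Python's `re` module; the question does not name a language, so that choice is an assumption, and `replace_outside_links_and_headers` and `EXAMPLES` are illustrative names only. The pattern itself is copied unchanged.

```python
import re

# The pattern from the UPDATE, unchanged. Python is an assumption here;
# Python's re accepts the escaped "\/" as a literal slash.
PATTERN = re.compile(
    r"\b(replace_me)\b"
    # Reject the match if a closing </a...> or </h...> tag comes before
    # the next opening <a...> or <h...> tag, i.e. we are inside a link/header.
    r"(?!(?:(?!<\/?[ha].*?>).)*<\/[ha].*?>)"
    # Reject the match if a ">" follows before any "<", i.e. we are inside a tag.
    r"(?![^<>]*>)"
)

# The examples from the question.
EXAMPLES = [
    "<p>You can replace_me just fine</p>",
    "<a href='replace_me'>replace_me</a>",
    "<h3>replace_me</h3>",
    "<a href='/test/'><span>replace_me</span></a>",
    "<p style=\"background:url('replace_me')\">replace_me<h1>replace_me</h1></p>",
]


def replace_outside_links_and_headers(html: str, replacement: str) -> str:
    """Substitute every occurrence that the pattern accepts (hypothetical helper)."""
    return PATTERN.sub(replacement, html)


if __name__ == "__main__":
    for html in EXAMPLES:
        print(replace_outside_links_and_headers(html, "REPLACED"))
```

Two limitations follow from the pattern itself: `.` does not match a newline, so a link or header whose text spans several lines is not protected, and `[ha]` only checks the first letter of the tag name, so closing tags such as `</abbr>`, `</article>` or `</html>` are treated like `</a>` and block the replacement too.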



I had a similar task: in HTML, replace tio2 with TiO<sub>2</sub> and ticl4 with TiCl<sub>4</sub>.

The catch was that the text could also contain links, for example www.ilovetio2.com or www.tastytastyticl4.info, and the href values must not be changed.

So instead of trying to match only the right occurrences in the HTML, I did it in two passes:

  • replace every occurrence with str_ireplace, ignoring context;
  • find the href values and strip the <sub>...</sub> tags from them with preg_replace_callback.

    <?php

    // Hypothetical wrapper class; the answer only shows the static method.
    class ChemFormatter
    {
        public static function subscriptStrings(string $str): string
        {
            // $str is an arbitrary string which may be HTML or plain text

            // Define search => replacement pairs
            $map = [
                'tio2'  => 'TiO<sub>2</sub>',
                'ticl4' => 'TiCl<sub>4</sub>',
            ];

            // First pass: replace ALL instances (case-insensitively), paying no heed to context
            $str = str_ireplace(array_keys($map), array_values($map), $str);

            // Second pass: find href values in single or double quotes and strip the
            // <sub> tags the first pass put there. The letters keep their new case,
            // so "ilovetio2" inside an href becomes "iloveTiO2".
            return preg_replace_callback('/href=(["\'])(.*?)\1/', function (array $m): string {
                return str_replace(['<sub>', '</sub>'], '', $m[0]);
            }, $str);
        }
    }
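Where the HTML can be handed to a real parser, the rule from the question can be applied to text nodes directly instead of being encoded in lookaheads. This is a different technique from the answers above: a minimal sketch using Python's standard-library `html.parser`, where `TextOnlyReplacer`, `replace_in_text` and `SKIP_TAGS` are hypothetical names chosen for illustration. Start tags (and therefore attribute values such as `href` or `style`) are copied through verbatim, and the replacement is applied only to text that is not inside `<a>` or `<h1>` to `<h6>`.

```python
from html.parser import HTMLParser

# Hypothetical set of elements whose text must be left alone (links and headers).
SKIP_TAGS = {"a", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextOnlyReplacer(HTMLParser):
    """Re-emits the HTML, replacing `target` only in text outside SKIP_TAGS."""

    def __init__(self, target: str, replacement: str) -> None:
        # Keep entities such as &amp; as raw text instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.target = target
        self.replacement = replacement
        self.skip_depth = 0  # > 0 while inside <a> or <h1>..<h6>
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        # Raw start-tag text, so attribute values are never modified.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/>: copy verbatim, no depth change.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            data = data.replace(self.target, self.replacement)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def replace_in_text(html: str, target: str, replacement: str) -> str:
    parser = TextOnlyReplacer(target, replacement)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(replace_in_text("<p>You can replace_me just fine</p>", "replace_me", "REPLACED"))
```

Because the parser tracks nesting instead of checking tag names by their first letter, `<article>replace_me</article>` is replaced while `<a><span>replace_me</span></a>` is not. The sketch re-emits end tags in lowercase and drops declarations such as `<!DOCTYPE html>`; a fuller version would also override `handle_decl` and `handle_pi`.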
    



Source: https://habr.com/ru/post/1749366/

