Regex / wildcard replace of a string with PHP

I have a ton of text that is loaded into the page head, and somewhere inside it is this link:

<link rel="canonical" href="could_be_anything_here_at_all" /> 

I want to replace it with a new value, but because the href differs from page to page, a simple str_replace is not possible.

I looked at using preg_replace, but can't figure out what seems like a simple problem.

  $regex = '/(^<link rel="canonical")(\/>$)/';
  $match = preg_match_all($regex, $content, $matches);
  var_dump($matches);
  • // start and end the expression?
  • () indicate the individual "expressions" that should be matched for the returned string?
  • ^ filters for results that start with the following string?
  • $ filters for results that end with the following string?

So I'm looking for a line that starts with <link rel="canonical" and ends with />

I have shown what I need and what I tried. Please help me write this and, ultimately, understand how to do it. I am really at a loss here.

2 answers

The regular expression you wrote is all over the place. Follow the pattern:

Whatever comes in between, the tag starts with <link and ends with either ></link> or /> (the former has to be accounted for because of those annoying non-standard web pages). You are looking for the rel attribute, if present, and its value should be canonical.

We can start writing a regular expression: #<link([^>]+)(/>|></link>)#is . This will match all link tags. You can then inspect the attributes with simple strpos calls.
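As a rough sketch of that first approach (the sample $content, the replacement URL and the variable names are made up for illustration), match every link tag, pick out the canonical one with a plain string check, and then replace it:

```php
<?php
// Hypothetical sample of a page head; in practice this would be your $content.
$content = '<head>
<link rel="stylesheet" href="/style.css" />
<link rel="canonical" href="could_be_anything_here_at_all" />
</head>';

// Match every <link ...> tag, closed either with /> or with ></link>.
preg_match_all('#<link([^>]+)(/>|></link>)#is', $content, $matches);

// $matches[0] holds the full tags, $matches[1] the attribute part of each.
$canonical = null;
foreach ($matches[0] as $i => $tag) {
    // stripos() is the case-insensitive variant of strpos().
    if (stripos($matches[1][$i], 'rel="canonical"') !== false) {
        $canonical = $tag;
        break;
    }
}

// With the exact tag in hand, a plain str_replace() is enough.
$newTag  = '<link rel="canonical" href="https://example.com/new-page" />';
$updated = $canonical === null ? $content : str_replace($canonical, $newTag, $content);

echo $updated, "\n";
```

The string check here assumes the attribute is written exactly as rel="canonical" with double quotes; other quoting styles would need a looser test.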

If you are sure that rel="canonical" will be the first attribute of the link tag, you can extend the regular expression further to #<link rel="canonical" href="?'?([^"']+)"?'?\s*(/>|></link>)#is . This captures the href value directly, which is great as long as you are sure the attributes appear in that order. The \s* allows for whitespace between the closing quote and the end of the tag, as in your example.

In order of appearance:

[^>]+ matches any character other than >, one or more times

The i and s flags mean: case-insensitive, and the dot also matches newlines (so a match does not break on a new line)

"?'? matches zero or one " followed by zero or one '

If something else is unclear, let me know.
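To put the second pattern to work, here is a minimal sketch. The sample $content and $newHref are placeholders, and the pattern includes \s* before the closing part because the sample tag has a space in front of />:

```php
<?php
// Hypothetical inputs for illustration.
$content = '<head><title>Demo</title><link rel="canonical" href="could_be_anything_here_at_all" /></head>';
$newHref = 'https://example.com/new-page';

// The pattern from the answer, with \s* allowing whitespace before the closing "/>".
// The single quotes inside the pattern are escaped for the PHP string literal.
$regex = '#<link rel="canonical" href="?\'?([^"\']+)"?\'?\s*(/>|></link>)#is';

// Read the current href: $m[1] is the first () group.
$oldHref = preg_match($regex, $content, $m) ? $m[1] : null;

// Replace the whole tag with one carrying the new href.
$updated = preg_replace($regex, '<link rel="canonical" href="' . $newHref . '" />', $content);

echo $oldHref, "\n", $updated, "\n";
```

One caveat: preg_replace treats $1 or \1 in the replacement string as backreferences, so if the new URL could contain such sequences, preg_replace_callback is the safer choice.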

Edit: to answer your questions

  • // start and end the expression? They are called delimiters, and they enclose the expression. The Perl-compatible regex engine lets you set flags (modifiers) on the expression (i, s, m, x, etc.), and those have to sit outside the expression itself. They follow the closing delimiter, and that is the point of having delimiters. You can use almost any non-alphanumeric character you like: the first character of the pattern is taken as the delimiter, and the pattern ends at its next unescaped occurrence. People tend to use / because JavaScript uses that single character for the purpose; I prefer # in PHP to avoid the ambiguity with the / in closing HTML tags.

  • () indicate the individual expressions that should be matched for the returned string? () captures a sub-match and returns it in the results if you pass a variable for the matches. Every part of the regular expression can use wildcards and so on, but besides the full match at index 0, only the parts wrapped in () are returned in the matches.

  • ^ filters for results starting with the following string? Nope. A ^ outside a [] character class anchors the match to the very start of the subject (or to the start of each line when the m flag is set). So it means "at the beginning of the text or line", not "words that begin with".
  • $ filters for results that end with the following string? Same as above, only for the "end" rather than the "beginning".
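The three points above can be tried out in a few lines. This is only an illustrative sketch; the $tag sample and the variable names are invented:

```php
<?php
$tag = '<link rel="canonical" href="abc" />';

// Delimiters: / and # enclose the same expression; with / every literal
// slash in the pattern has to be escaped.
$withSlash = preg_match('/href="([^"]+)" \/>/', $tag, $a);
$withHash  = preg_match('#href="([^"]+)" />#', $tag, $b);

// Groups: index 0 is the whole match, index 1 the first () group.
// Here $a[0] is 'href="abc" />' and $a[1] is 'abc'.

// Anchors: ^ and $ pin the match to the start and end of the subject,
// so an anchored pattern fails once the tag is surrounded by other text.
$anchored = '#^<link rel="canonical"[^>]*/>$#';
$alone    = preg_match($anchored, $tag);                       // 1
$embedded = preg_match($anchored, "<head>\n  $tag\n</head>");  // 0

var_dump($a, $alone, $embedded);
```

The last call is why the pattern in the question finds nothing: the tag sits in the middle of a larger $content, so ^ and $ never line up with it.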

A brief note first: it is not recommended to parse HTML with regular expressions; use DOMDocument or some other DOM parser instead. But since this concerns only a single line of markup, here is what I would do with it:

 <?php
 // base string
 $str = '<link rel="canonical" href="could_be_anything_here_at_all" />';

 // preg_replace: swap the whole tag for one carrying the new href
 $preg_replace = preg_replace('/<link rel="canonical" href="(.*?)" \/>/', '<link rel="canonical" href="MY_NEW_LINK" />', $str);
 echo $preg_replace;

 // preg_match_all: capture the current href in $preg_match[1]
 preg_match_all('/<link rel="canonical" href="(.*?)" \/>/', $str, $preg_match);
 echo '<pre>', print_r($preg_match, true), '</pre>';
 // process as you wish
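For comparison, the DOM route mentioned at the top might look roughly like this. It is a sketch only: the $html sample is invented, MY_NEW_LINK is the same placeholder as above, and PHP's DOM extension is assumed to be available:

```php
<?php
// Hypothetical page markup; requires PHP's DOM extension.
$html = '<!DOCTYPE html><html><head><title>Demo</title>'
      . '<link rel="canonical" href="could_be_anything_here_at_all" />'
      . '</head><body></body></html>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Walk all <link> elements and update the canonical one.
foreach ($doc->getElementsByTagName('link') as $link) {
    if (strtolower($link->getAttribute('rel')) === 'canonical') {
        $link->setAttribute('href', 'MY_NEW_LINK');
    }
}

$result = $doc->saveHTML();
echo $result;
```

Keep in mind that saveHTML() re-serializes the whole document, so whitespace and tag style in the output may differ from the input.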

Source: https://habr.com/ru/post/1478895/

