The regular expression you wrote is all over the place. Work from the pattern of the tag instead:
Whatever happens, the tag starts with <link and ends with either ></link> or /> (both have to be handled because of those annoying non-standard web authors). You are looking for the rel attribute, if there is one, and its value should be canonical.
We can start writing a regular expression: #<link([^>]+)(/>|></link>)#is . This will match all link tags. You can then parse the attributes with simple strpos calls.
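A minimal sketch of that two-step approach, assuming the page source is already in a string. The sample markup, the variable names and the small href sub-pattern are my own assumptions for illustration, and stripos is used instead of strpos only to keep the attribute check case-insensitive:

```php
<?php
// Hypothetical sample markup; in practice this would be the fetched page source.
$html = <<<'HTML'
<head>
<link rel="stylesheet" href="/style.css" />
<LINK href="https://example.com/page" rel="canonical"/>
<link rel="icon" href="/favicon.ico"></link>
</head>
HTML;

// Group 1 captures the attribute string, group 2 the closing form (/> or ></link>).
preg_match_all('#<link([^>]+)(/>|></link>)#is', $html, $matches, PREG_SET_ORDER);

$canonical = null;
foreach ($matches as $m) {
    $attributes = $m[1];
    // Cheap check first, then pull the href value out of the attribute string.
    if (stripos($attributes, 'canonical') !== false
        && preg_match('#href=["\']?([^"\'\s>]+)#i', $attributes, $href)) {
        $canonical = $href[1];
        break;
    }
}

echo $canonical, PHP_EOL;
```

Note that a plain substring check for "canonical" would also hit a stylesheet whose URL happens to contain that word, so treat this as a starting point rather than a robust parser.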
If you are sure that rel="canonical" will be the first attribute of the link tag, you can extend the regular expression to #<link rel="canonical" href="?'?([^"']+)"?'?(/>|></link>)#is . This captures the href value directly, which is great as long as you are sure the attributes appear in that order.
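To see how strict that ordered pattern is, here is a small sketch that runs it against a few hypothetical tags (the sample strings and labels are assumptions, not taken from any real page):

```php
<?php
// The ordered pattern from above; \' is just a quote escaped for the PHP string.
$pattern = '#<link rel="canonical" href="?\'?([^"\']+)"?\'?(/>|></link>)#is';

// Hypothetical inputs illustrating what the strict pattern accepts and rejects.
$samples = [
    'expected order'     => '<link rel="canonical" href="https://example.com/page"/>',
    'single quotes'      => "<link rel=\"canonical\" href='https://example.com/page'></link>",
    'attributes swapped' => '<link href="https://example.com/page" rel="canonical"/>',
    'space before slash' => '<link rel="canonical" href="https://example.com/page" />',
];

$results = [];
foreach ($samples as $label => $tag) {
    // Store the captured href on a match, null otherwise.
    $results[$label] = preg_match($pattern, $tag, $m) === 1 ? $m[1] : null;
}

var_export($results);
```

The first two samples yield the URL. The last two yield null, because the pattern has no room for a different attribute order or for whitespace before the closing />. If your input may contain either, the looser pattern plus attribute parsing is the safer route.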
In order of appearance:
[^>]+ matches any character other than >, one or more times
The is flags mean: i makes the match case-insensitive, s lets the dot match newlines too, so the expression does not break on a line break
"?'? matches zero or one " followed by zero or one '
If anything else is unclear, let me know.
Edit: answering your questions
// start and end the expression? They are called delimiters, and they "enclose" the expression. The Perl-compatible regular expression engine lets you set flags on the expression (i, s, m, x, etc.), and they must sit outside the expression. They follow the closing delimiter, and that is the point of having delimiters. You can use almost any character you like (in PHP, anything that is not alphanumeric, a backslash, or whitespace): the engine takes the first character of the pattern as the delimiter and reads the expression up to the closing occurrence of it. People tend to use / because JS uses that one character for this. I prefer # in PHP to avoid the / ambiguities that arise from closing HTML tags.
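A short sketch of the delimiter point, using a hypothetical tag string: the same expression written once with / and once with # as the delimiter. Both behave identically, the second is just easier to read next to HTML.

```php
<?php
// Hypothetical input for illustration.
$tag = '<link rel="canonical" href="https://example.com/page"></link>';

// With / as the delimiter, every literal / inside the pattern must be escaped.
$withSlash = '/<link([^>]+)(\/>|><\/link>)/is';

// With # as the delimiter, the same pattern needs no escaping.
$withHash = '#<link([^>]+)(/>|></link>)#is';

$a = preg_match($withSlash, $tag, $m1);
$b = preg_match($withHash, $tag, $m2);

// Both patterns match, and they capture exactly the same groups.
var_dump($a === 1, $b === 1, $m1 === $m2);
```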
() indicate the individual expressions that should be matched for the returned string? () captures a sub-pattern and returns it in the results if you pass a variable for the matches. Every part of the regular expression can use wildcards and so on, but only the parts wrapped in () are returned as separate entries in the matches.
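- ^ filters for results that start with the following string? Nope. Outside a character class [], the caret ^ anchors the match to the start of the subject (or to the start of each line when the m flag is set). So it is about where the line begins, not about "words" that start with something. Inside [], as in [^>], it negates the class instead.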
- $ filters for results that end with the following string? Same as above, just for the "end" rather than the "beginning".
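The anchor behaviour can be checked with a few preg_match calls. This is only a sketch on a made-up two-line subject; the variable names are assumptions:

```php
<?php
// Hypothetical two-line subject.
$subject = "first line\nlink second";

// No anchor: "link" may appear anywhere in the subject.
$unanchored = preg_match('#link#', $subject);

// ^ without flags: the whole subject must start with "link" (it starts with "first").
$startOnly = preg_match('#^link#', $subject);

// ^ with the m flag: the start of any line counts, and line two starts with "link".
$startMultiline = preg_match('#^link#m', $subject);

// $ without flags: "line" is not at the end of the subject.
$endOnly = preg_match('#line$#', $subject);

// $ with the m flag: the end of any line counts, and line one ends with "line".
$endMultiline = preg_match('#line$#m', $subject);

// Inside [], the caret negates: everything up to (not including) the first >.
preg_match('#[^>]+#', 'ab>cd', $negated);

var_dump($unanchored, $startOnly, $startMultiline, $endOnly, $endMultiline, $negated[0]);
```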