PHP regex for YouTube URLs: exclude links inside href attributes

Say I have HTML in a database that looks like this:

Hello world!

<a href="https://www.youtube.com/watch?v=m7t75u72vd">ABC</a>

Blah blah blah...

https://www.youtube.com/watch?v=df82vnx07s

Blah blah blah...
<p>https://www.youtube.com/watch?v=nvs70fh17f3fg</p>

Now I want to use a PHP regex to capture the 2nd and 3rd URLs, but ignore the first.

The regex I have so far is:

\s*[a-zA-Z\/\/:\.]*youtu(be.com\/watch\?v=|.be\/)([a-zA-Z0-9\-_]+)

This works very well, but I don't know how to make it exclude the first kind of URL, the one that is preceded by href="

Please help, thanks!

3 answers

You can use a "negative lookbehind" for this. For example, ((?<!href=[\'"])http) matches http only when it is not immediately preceded by href=" or href='.

$regex    = '/((?<!href=[\'"])http)[a-zA-Z\/\/:\.]*youtu(be.com\/watch\?v=|.be\/)([a-zA-Z0-9\-_]+)/';
$useCases = [
    1 => '<a href="https://www.youtube.com/watch?v=m7t75u72vd">ABC</a>',
    2 => "<a href='https://www.youtube.com/watch?v=m7t75u72vd'>ABC</a>",
    3 => 'https://www.youtube.com/watch?v=df82vnx07s',
    4 => '<p>https://www.youtube.com/watch?v=nvs70fh17f3fg</p>'
];
foreach ($useCases as $index => $useCase) {
    $matches = [];
    preg_match($regex, $useCase, $matches);
    if ($matches) {
        echo 'The regex was matched in usecase #' . $index . PHP_EOL;
    }
}
// Echoes:
// The regex was matched in usecase #3
// The regex was matched in usecase #4
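
To collect the video IDs from the whole stored HTML in one pass, the same pattern can be handed to preg_match_all. This is only a sketch: the $html string is the sample from the question, and $videoIds is a name chosen here for illustration. Note that the lookbehind checks only the exact characters href=" or href=', so an attribute written with extra spacing such as href = " would not be excluded.

```php
<?php
// Sample HTML from the question, held in a single string (nowdoc, no interpolation).
$html = <<<'HTML'
Hello world!

<a href="https://www.youtube.com/watch?v=m7t75u72vd">ABC</a>

Blah blah blah...

https://www.youtube.com/watch?v=df82vnx07s

Blah blah blah...
<p>https://www.youtube.com/watch?v=nvs70fh17f3fg</p>
HTML;

// Same pattern as above: "http" must not be directly preceded by href=" or href='.
$regex = '/((?<!href=[\'"])http)[a-zA-Z\/\/:\.]*youtu(be.com\/watch\?v=|.be\/)([a-zA-Z0-9\-_]+)/';

preg_match_all($regex, $html, $matches);

// Capture group 3 holds the video IDs of every accepted URL.
$videoIds = $matches[3];
print_r($videoIds);
```

For the sample above, $videoIds should contain df82vnx07s and nvs70fh17f3fg, and the ID from the href link is skipped.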

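Alternatively, you can add a negative lookahead, (?![^<]*>), which fails the match if the URL is followed by 0+ characters other than < and then a >, that is, if the URL sits inside a tag: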

[a-zA-Z\/:.]*youtu(?:be\.com\/watch\?v=|\.be\/)([a-zA-Z0-9\-_]+)(?![^<]*>)
                                                                   ^^^^^^^^^^

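Note that the dots are escaped so that they match only literal dots, as in be\.com and \.be. Also, ([a-zA-Z0-9\-_]+) can be written as [a-zA-Z0-9_-]+, and [a-zA-Z\/\/:\.]* is better replaced with something more precise, such as https?:\/\/[a-zA-Z.]*.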
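
As a rough sketch of how the lookahead pattern might look with escaped dots and a stricter https?:\/\/ prefix, the snippet below wraps it in a small function. The name extractYoutubeIds is hypothetical and not part of the answer. One caveat is worth stating: a bare > that appears in plain text after a URL and before the next < would also cause that URL to be skipped.

```php
<?php
// extractYoutubeIds() is a hypothetical helper name, not part of the original answer.
function extractYoutubeIds(string $html): array
{
    // Scheme and host are matched explicitly, dots are escaped, and the trailing
    // lookahead rejects a URL that is followed by ">" before any "<" (inside a tag).
    $regex = '/https?:\/\/[a-zA-Z.]*youtu(?:be\.com\/watch\?v=|\.be\/)([a-zA-Z0-9_-]+)(?![^<]*>)/';
    preg_match_all($regex, $html, $matches);

    return $matches[1]; // group 1 holds the video IDs
}

// Sample input based on the three URLs from the question.
$html = '<a href="https://www.youtube.com/watch?v=m7t75u72vd">ABC</a>' . "\n"
      . 'https://www.youtube.com/watch?v=df82vnx07s' . "\n"
      . '<p>https://www.youtube.com/watch?v=nvs70fh17f3fg</p>';

print_r(extractYoutubeIds($html));
```

For this input the function should return df82vnx07s and nvs70fh17f3fg. Short youtu.be links go through the second alternative of the pattern.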

Example solution:

(?![^<]*>)[a-zA-Z\/\/:\.]*youtu(be.com\/watch\?v=|.be\/)([a-zA-Z0-9\-_]+)
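
Here the lookahead sits at the start of the pattern, so it is evaluated at the position where a candidate match begins: if a > can be reached from there without crossing a <, the position is treated as being inside a tag and is rejected. A minimal way to try this out, assuming the same kind of test inputs as in the question ($results is a name introduced only for this sketch):

```php
<?php
// Pattern from the example solution, with the lookahead placed at the start.
$regex = '/(?![^<]*>)[a-zA-Z\/\/:\.]*youtu(be.com\/watch\?v=|.be\/)([a-zA-Z0-9\-_]+)/';

$useCases = [
    1 => '<a href="https://www.youtube.com/watch?v=m7t75u72vd">ABC</a>',
    2 => 'https://www.youtube.com/watch?v=df82vnx07s',
    3 => '<p>https://www.youtube.com/watch?v=nvs70fh17f3fg</p>',
];

$results = [];
foreach ($useCases as $index => $useCase) {
    // preg_match returns 1 on a match; capture group 2 is the video ID.
    $results[$index] = preg_match($regex, $useCase, $m) === 1 ? $m[2] : null;
}
var_dump($results);
```

With these inputs, case 1 should come back as null, while cases 2 and 3 should yield their video IDs.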

Source: https://habr.com/ru/post/1668774/
