Regex for HTML tags that are not commented out

I need to find all <link /> tags in HTML that are not commented out.

For example, given this HTML:

<link rel="stylesheet" href="xyz/dzgt/style.css" />
<!--[if IE 7]>
<link rel="stylesheet" type="text/css" href="xyz/dzgt/ie7.css" />
<![endif]-->

I need a regular expression that matches <link rel="stylesheet" href="xyz/dzgt/style.css" /> but not <link rel="stylesheet" type="text/css" href="xyz/dzgt/ie7.css" />, because the latter is inside a <!-- --> comment.

I can find all <link /> tags with the regex /<link.*href="(.*\.css)".*\/>/m, but it also matches the tags inside comments, and I only need those that are not commented out.

Thanks for the help in advance!

1 answer

You should use the DOMDocument class instead of a regular expression to parse HTML. Check this out:

<?php
$html='<link rel="stylesheet" href="xyz/dzgt/style.css" />
<!--[if IE 7]>
<link rel="stylesheet" type="text/css" href="xyz/dzgt/ie7.css" />
<![endif]-->';
$dom = new DOMDocument;
$dom->loadHTML($html);
// Commented-out markup is parsed as a comment node, so only real <link> elements are returned.
foreach ($dom->getElementsByTagName('link') as $tag) {
    echo $tag->getAttribute('href');
}

Output:

xyz/dzgt/style.css
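
If you need the paths as an array, or the input is messy real-world markup, the same idea can be wrapped in a small function. This is only a sketch under a few assumptions: the function name findStylesheetHrefs is hypothetical, the rel attribute is assumed to be exactly "stylesheet", and parser warnings are silenced with libxml_use_internal_errors so that invalid HTML does not flood the output.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the href of
// every <link rel="stylesheet"> element that is not inside an HTML comment.
function findStylesheetHrefs(string $html): array
{
    $dom = new DOMDocument();

    // Real-world markup is often invalid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    $xpath = new DOMXPath($dom);
    // Commented-out markup is a comment node, so the element query never sees it.
    // Assumption: rel is exactly "stylesheet" (no extra tokens).
    foreach ($xpath->query('//link[@rel="stylesheet"][@href]') as $link) {
        $hrefs[] = $link->getAttribute('href');
    }

    return $hrefs;
}

$html = '<link rel="stylesheet" href="xyz/dzgt/style.css" />
<!--[if IE 7]>
<link rel="stylesheet" type="text/css" href="xyz/dzgt/ie7.css" />
<![endif]-->';

print_r(findStylesheetHrefs($html));
```

For the HTML from the question this should return only xyz/dzgt/style.css, because the IE 7 conditional block is a single comment as far as the parser is concerned.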

Source: https://habr.com/ru/post/1528938/

