Here's how to do it with the native DOM extension:
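// Load the page and parse it into a DOM tree
$doc = new DOMDocument();
$doc->loadHTMLFile('http://example.com/');
// Select the href attribute of every <a> element
$xpath = new DOMXPath($doc);
$links = $xpath->query('//a/@href');
$urls = array();
foreach ($links as $link) {
    // Each $link is a DOMAttr; its value property holds the attribute text
    $urls[] = $link->value;
}
print_r($urls);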
Please note that the above will also find relative links. If you do not want them, change the XPath to:
'//a/@href[starts-with(., "http")]'
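To see what the filter does, here is a minimal sketch that runs both queries against a small inline document instead of a live URL. The sample markup and the variable names `$all` and `$absolute` are made up for illustration. Note that `starts-with(., "http")` also keeps `https` links, since they begin with the same four characters.

```php
<?php
// Hypothetical sample markup, used so the sketch runs without network access.
$html = '<html><body>'
      . '<a href="http://example.com/a">absolute</a>'
      . '<a href="/relative/path">relative</a>'
      . '<a href="https://example.org/b">secure</a>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Unfiltered query: every href attribute on every <a> element.
$all = array();
foreach ($xpath->query('//a/@href') as $attr) {
    $all[] = $attr->value;
}

// Filtered query: only href values that begin with "http".
$absolute = array();
foreach ($xpath->query('//a/@href[starts-with(., "http")]') as $attr) {
    $absolute[] = $attr->value;
}

print_r($all);
print_r($absolute);
```

The first array should contain all three values, the second only the two that start with `http`.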
Note that using Regex to match HTML is the way to madness. Regex matches string patterns and knows nothing about HTML elements and attributes. DOM does, so you should prefer it over Regex for every situation that goes beyond matching a super-trivial string pattern from markup.
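As a rough illustration of the difference, the sketch below runs a deliberately naive pattern and the DOM query over the same markup. Both the pattern and the sample markup are hypothetical and chosen to expose the problem: the real link uses single quotes and has another attribute before `href`, while a second link sits inside an HTML comment.

```php
<?php
// Hypothetical markup: one live link and one commented-out link.
$html = '<html><body>'
      . '<a class="nav" href=\'/one\'>one</a>'
      . '<!-- <a href="/commented-out">old</a> -->'
      . '</body></html>';

// Naive pattern (an assumption for this sketch): expects href to come
// directly after "<a " and to be double-quoted.
preg_match_all('/<a href="([^"]*)"/', $html, $matches);
$regexUrls = $matches[1];

// DOM approach: comments are not elements, and quoting style or
// attribute order does not matter.
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$domUrls = array();
foreach ($xpath->query('//a/@href') as $attr) {
    $domUrls[] = $attr->value;
}

print_r($regexUrls);
print_r($domUrls);
```

Here the pattern misses the live link and picks up the commented-out one, while the DOM query returns only `/one`. A more careful pattern could handle these two cases, but each new variation in the markup needs another fix, which is the point of the warning above.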