First of all, regex and HTML don't mix. Use a DOM parser instead:
$dom = new DOMDocument();
@$dom->loadHTML($source); // @ silences warnings on malformed HTML
foreach ($dom->getElementsByTagName('a') as $a) { $href = $a->getAttribute('href'); }
Links that may point outside your site begin with a protocol or with //, e.g.
http://example.com or //example.com/
By contrast, href="google.com" is a relative link: it resolves to a local file on your own site, not to google.com.
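As a rough sketch of how that rule could be applied, something like the following might work. The helper name isExternalLink, the host mysite.test, and the sample markup are all made up for illustration; the check relies on parse_url() returning a host only for hrefs that start with a protocol or with //.

```php
<?php
// Hypothetical helper (not from the original answer): true when $href
// carries a host component that differs from $siteHost.
function isExternalLink(string $href, string $siteHost): bool
{
    // parse_url() yields a host for "http://host/..." and "//host/...",
    // and null for relative hrefs such as "google.com" or "/about".
    $host = parse_url($href, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no host: treat as a local link
    }
    return strcasecmp($host, $siteHost) !== 0;
}

// Sample markup, assumed for the example.
$source = '<p><a href="http://example.com">x</a> <a href="//example.com/">y</a> '
        . '<a href="google.com">z</a> <a href="/about">w</a></p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$dom->loadHTML($source);
libxml_clear_errors();

$external = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    if (isExternalLink($href, 'mysite.test')) {
        $external[] = $href;
    }
}
```

With this sample, $external should end up holding only the first two hrefs. Note that host-less schemes such as mailto: are treated as local here, so a real crawler would probably want to filter those separately.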
But if you want to create a static copy of the site, why not just use wget?