Use cURL to fetch the HTML from the remote URL.
    $url = "http://www.example.com";
    $curl = curl_init($url);
    // return the response as a string instead of printing it
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    $output = curl_exec($curl);
    curl_close($curl);
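Note that curl_exec returns false when the request fails, so it may be worth checking the result before handing it to a parser. A minimal sketch, assuming a hypothetical helper named fetch_html and a 10-second timeout (neither is part of the original answer):

```php
<?php
// Hypothetical helper: returns the response body, or null if the request fails.
function fetch_html(string $url, int $timeoutSeconds = 10): ?string
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);  // follow HTTP redirects
    curl_setopt($curl, CURLOPT_TIMEOUT, $timeoutSeconds);

    $output = curl_exec($curl);
    if ($output === false) {
        // curl_error describes the most recent failure on this handle
        error_log('cURL error: ' . curl_error($curl));
        $output = null;
    }
    curl_close($curl);

    return $output;
}
```

Calling fetch_html("http://www.example.com") would then give either the page source or null, which the caller can test for before parsing.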
Then use PHP's DOM extension (DOMDocument) to parse the HTML.
For example, to get all the <h1> tags from the source,
    $DOM = new DOMDocument;
    $DOM->loadHTML($output);

    // get all H1
    $items = $DOM->getElementsByTagName('h1');

    // display all H1 text
    for ($i = 0; $i < $items->length; $i++)
        echo $items->item($i)->nodeValue . "<br/>";
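Real-world pages are often not well-formed, and loadHTML emits a warning for each problem it finds. The sketch below wraps the parsing step in a hypothetical function, extract_tag_text (a name made up for illustration), and uses libxml_use_internal_errors to collect those warnings quietly. It works on an inline HTML string so it can be run without a network connection.

```php
<?php
// Hypothetical helper: returns the trimmed text of every <$tagName> element in $html.
function extract_tag_text(string $html, string $tagName = 'h1'): array
{
    if (trim($html) === '') {
        return [];  // loadHTML rejects an empty string
    }

    $previous = libxml_use_internal_errors(true);  // collect parse warnings instead of printing them
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);         // restore the earlier setting

    $texts = [];
    foreach ($dom->getElementsByTagName($tagName) as $node) {
        $texts[] = trim($node->nodeValue);
    }
    return $texts;
}

// Usage: print each heading, escaped so it is safe to embed in an HTML page.
foreach (extract_tag_text('<h1>First</h1><p>body</p><h1>Second</h1>') as $text) {
    echo htmlspecialchars($text), "<br/>";
}
```

Returning an array rather than echoing inside the function keeps parsing separate from display, so the same helper can be reused for other tags such as 'h2' or 'a'.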