PHP function to grab all links inside a <DIV> on a remote site by scraping
Does anyone have a PHP function that can grab all links inside a specific DIV on a remote site? Usage would look like this:
$links = grab_links($url, $divname);
It should return an array that I can use. Extracting links I understand, but I'm not sure how to do it only within a specific div.
Thanks! Scott
+3
3 answers
Use XPath in PHP. See the PHP manual: http://php.net/manual/en/simplexmlelement.xpath.php
Load the contents of the URL into a string and query for the links inside a DIV:
$xml = new SimpleXMLElement($docAsString);
$result = $xml->xpath('//div//a');
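Note that SimpleXMLElement only parses well-formed XML, so the HTML must be valid XML for this to work.
XPath reference: http://msdn.microsoft.com/en-us/library/ms256086.aspx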
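To limit the query to one particular DIV rather than every DIV, the XPath expression can filter on the id attribute. This is a minimal sketch, assuming the markup is well-formed XML; the sample document and the id "content" are made up for illustration.

```php
<?php
// Hypothetical, well-formed sample markup standing in for the fetched page.
$docAsString = <<<XML
<html>
  <body>
    <div id="nav"><a href="/home">Home</a></div>
    <div id="content">
      <p><a href="/one">One</a></p>
      <a href="/two">Two</a>
    </div>
  </body>
</html>
XML;

$xml = new SimpleXMLElement($docAsString);

// Select only the anchors nested anywhere inside the div with id="content".
$links = array();
foreach ($xml->xpath("//div[@id='content']//a") as $a) {
    // Attributes are read with array syntax; the cast yields a plain string.
    $links[] = (string) $a['href'];
}

print_r($links);
```

The link in the "nav" div is skipped because the predicate [@id='content'] only matches the one container.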
+2
Try PHP Simple HTML DOM Parser:
http://simplehtmldom.sourceforge.net/
Example:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}
// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}
+2
Here is an XPath example using DOMDocument, from:
http://www.earthinfo.org/xpaths-with-php-by-example/
<?php
// Load the remote page; @ suppresses warnings about malformed HTML.
$html = new DOMDocument();
@$html->loadHTMLFile('http://www.bbc.com');
$xpath = new DOMXPath($html);

// href of every link inside the div with id "news_moreTopStories"
$nodelist = $xpath->query("//div[@id='news_moreTopStories']//a/@href");
foreach ($nodelist as $n) {
    echo $n->nodeValue . "\n";
}

// For images: src of every img inside the div with id "promo_area".
// The document loaded above is reused rather than fetched a second time.
echo "<br><br>";
$nodelist = $xpath->query("//div[@id='promo_area']//img/@src");
foreach ($nodelist as $n) {
    echo $n->nodeValue . "\n";
}
?>
Or use the PHP DOM methods directly, as described here:
http://w-shadow.com/blog/2009/10/20/how-to-extract-html-tags-and-their-attributes-with-php/
$html = file_get_contents('http://www.bbc.com');
//Create a new DOM document
$dom = new DOMDocument;
//Parse the HTML. The @ is used to suppress any parsing errors
//that will be thrown if the $html string isn't well-formed HTML.
@$dom->loadHTML($html);
//Find the container div by its id. getElementById() returns null
//if no such element exists, so check before using it.
$div = $dom->getElementById('news_moreTopStories');
//Get all links inside it. You could also use any other tag name here,
//like 'img' or 'table', to extract other tags.
$links = $div ? $div->getElementsByTagName('a') : array();
//Iterate over the extracted links and display their URLs
foreach ($links as $link) {
    //Extract and show the "href" attribute.
    echo $link->getAttribute('href'), '<br>';
}
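Putting this together, here is one possible sketch of the grab_links($url, $divname) function the question asks for. It assumes $divname is the value of the div's id attribute, which the question does not actually specify. The helper grab_links_from_html() is a hypothetical name, split out so the parsing can be tried on a string without any network access.

```php
<?php
// Sketch only. Assumption: $divname is the id attribute of the target div.

// Hypothetical helper: returns the href of every <a> inside the given div.
function grab_links_from_html($html, $divname)
{
    if ($html === '') {
        return array();
    }
    $dom = new DOMDocument();
    // Real-world HTML is rarely clean; @ silences the parser warnings.
    if (!@$dom->loadHTML($html)) {
        return array();
    }
    $xpath = new DOMXPath($dom);
    // Simplification: assumes $divname contains no apostrophe.
    $nodes = @$xpath->query("//div[@id='" . $divname . "']//a/@href");
    if ($nodes === false) {
        return array();
    }
    $links = array();
    foreach ($nodes as $node) {
        $links[] = $node->nodeValue;
    }
    return $links;
}

// Fetches the page, then delegates to the helper above.
function grab_links($url, $divname)
{
    $html = @file_get_contents($url);
    return $html === false ? array() : grab_links_from_html($html, $divname);
}

// Usage on a made-up inline page, so no remote request is needed.
$sample = '<html><body><div id="menu"><a href="/a">A</a></div>'
        . '<div id="news"><ul><li><a href="/b">B</a></li></ul>'
        . '<a href="/c">C</a></div></body></html>';
print_r(grab_links_from_html($sample, 'news'));
```

Both functions return an empty array when the page cannot be fetched or the div is missing, so the caller can loop over the result without extra checks. To match on a class rather than an id, the XPath predicate would need to change.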
+2