Downloading the HTML of a web page is commonly called screen scraping. This can be useful if you want a program to retrieve data from that page. The easiest way to request HTTP resources is with the cURL tool. cURL ships as a standalone Unix command-line tool, and cURL libraries exist for most programming languages. To capture this page from a Unix command prompt:
curl http://stackoverflow.com/questions/1077970/in-any-languages-can-i-capture-a-webpageno-install-no-activex-if-i-can-plz
In PHP you can do the same:
<?php
$ch = curl_init();
if ($ch === false) {
    die('Failed to initialize cURL');
}
curl_setopt($ch, CURLOPT_URL, "http://stackoverflow.com/questions/1077970/in-any-languages-can-i-capture-a-webpageno-install-no-activex-if-i-can-plz");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response as a string instead of printing it
$data = curl_exec($ch);
if ($data === false) {
    die(curl_error($ch));
}
echo $data;
curl_close($ch);
?>
Now, before scraping an entire website, you should check its robots.txt file to make sure robots are allowed to crawl the site, and you should check whether an API is available that lets you retrieve the data without parsing HTML.
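As a rough illustration of the robots.txt check, here is a minimal shell sketch that tests a path against the Disallow rules for all user agents. The robots.txt content below is a made-up example (not Stack Overflow's actual file), and real robots.txt parsing has more rules (per-agent sections, Allow lines, wildcards) than this handles:

#!/bin/sh
# Example robots.txt content (assumed for illustration; fetch the real one
# with: curl -s http://example.com/robots.txt).
robots='User-agent: *
Disallow: /admin/
Disallow: /search'

path="/questions/1077970"

# A path is treated as disallowed if it starts with any Disallow prefix.
allowed=yes
for rule in $(printf '%s\n' "$robots" | sed -n 's/^Disallow: *//p'); do
  case "$path" in
    "$rule"*) allowed=no ;;
  esac
done
echo "$path allowed: $allowed"

Here the path matches neither /admin/ nor /search, so the script reports it as allowed; for production use, prefer a proper robots.txt parser in your language of choice.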
brianegge Jul 03 '09 at 6:34