How can I speed this up?

I have a script which is, in my opinion, a fairly simple scraper (call it what you want), but on average it takes at least 6 seconds to run ... is it possible to speed it up? The $date variables are there only for timing and add nothing significant to the runtime. Between the timestamps there are two gaps of roughly 3 seconds each. An example URL for testing is below.

    $date = date('m/d/Y h:i:s a', time());
    echo "start of timing $date<br /><br />";

    include('simple_html_dom.php');

    function getUrlAddress() {
        $url = $_SERVER['HTTPS'] == 'on' ? 'https' : 'http';
        return $url . '://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
    }

    // $url assignment implied by the "after geturl" timestamp below;
    // for testing, the example URL further down was used instead
    $url = getUrlAddress();

    $date = date('m/d/Y h:i:s a', time());
    echo "<br /><br />after geturl $date<br /><br />";

    $parts = explode("/", $url);

    // first network fetch: download and parse the page into a DOM
    $html = file_get_html($url);

    $date = date('m/d/Y h:i:s a', time());
    echo "<br /><br />after file_get_url $date<br /><br />";

    // second network fetch of the same page, just to grab the <title>
    $file_string = file_get_contents($url);
    preg_match('/<title>(.*)<\/title>/i', $file_string, $title);
    $title_out = $title[1];

    foreach ($html->find('img') as $e) {
        $image = $e->src;
        if (preg_match("/orangeBlue/", $image)) { $image = ''; }
        if (preg_match("/BeaconSprite/", $image)) { $image = ''; }
        if ($image != '') {
            if (preg_match("/http/", $image)) {
                // already an absolute URL
            } elseif (preg_match("*//*", $image)) {   // '*' is the regex delimiter here
                $image = 'http:' . $image;            // protocol-relative URL
            } else {
                $image = $parts[0] . "//" . $parts[1] . $parts[2] . "/" . $image;
            }
            // getimagesize() downloads each image in order to read its dimensions
            $size = getimagesize($image);
            if (($size[0] > 110) && ($size[1] > 110)) {
                echo '<img src=' . $image . '><br>';
            }
        }
    }

    $date = date('m/d/Y h:i:s a', time());
    echo "<br /><br />end of timing $date<br /><br />";
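(If finer resolution is needed, something like microtime(true) could replace the date() markers to show more precisely where the time goes; a rough sketch of what I mean, the variable names are just illustrative:)

    // Illustrative only: sub-second markers around each phase.
    $t0 = microtime(true);

    $html = file_get_html($url);          // fetch + DOM parse
    $t1 = microtime(true);
    echo 'file_get_html: ' . round($t1 - $t0, 3) . " s<br />";

    // ... image loop here ...
    $t2 = microtime(true);
    echo 'image loop: ' . round($t2 - $t1, 3) . " s<br />";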

URL example

UPDATE

This is what the timestamps actually show:

start of timing 01/24/2012 12:31:50 am

after geturl 01/24/2012 12:31:50 am

after file_get_url 01/24/2012 12:31:53 am

end of timing 01/24/2012 12:31:57 am

http://www.ebay.co.uk/itm/Duke-Nukem-Forever-XBOX-360-Game-BRAND-NEW-SEALED-UK-PAL-UK-Seller-/170739972246?pt=UK_PC_Video_Games_Video_Games_JS&hash=item27c0e53896
2 answers

It's probably the getimagesize function: it goes out and fetches every single image on the page in order to determine its size. You could perhaps write something with curl that requests only the headers and reads Content-Length (although, in fact, that may be what getimagesize already does).

In any case, back when I wrote a few spiders myself, this kind of thing was slow going: even with internet speeds better than ever, it is still a separate fetch for every element, and I wasn't even fetching the images.
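By way of illustration, here is a rough sketch of that header-only idea with curl (the helper name is mine, and the byte threshold is a guess): a HEAD request reads Content-Length so the image body is never downloaded. Note this gives the size in bytes, not the pixel dimensions that getimagesize reports, so it only helps if a byte threshold is acceptable.

    // Rough sketch: HEAD request so only the headers are fetched.
    // Returns the Content-Length in bytes, or null if the server did not report it.
    function headContentLength($imageUrl) {
        $ch = curl_init($imageUrl);
        curl_setopt($ch, CURLOPT_NOBODY, true);           // HEAD: no response body
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_exec($ch);
        $length = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
        curl_close($ch);
        return $length > 0 ? (int) $length : null;
    }

    // Inside the image loop: skip images the server reports as tiny.
    $bytes = headContentLength($image);
    if ($bytes !== null && $bytes < 5000) {
        continue;   // likely an icon or sprite, not a product photo
    }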


I'm not a PHP guy, but it looks to me like you're going out to the network to fetch the same page twice ...

First use this:

 $html = file_get_html($url); 

Then again using this:

 $file_string = file_get_contents($url); 

So if each hit takes a couple of seconds, you can cut the total time by finding a way to reduce this to a single web request.

Either that, or I'm blind. That's a real possibility!
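To sketch what a single fetch could look like under the same assumptions (simple_html_dom is already included, and str_get_html is its string-based counterpart to file_get_html): download the page once, then build both the title and the DOM from that one string.

    // Rough sketch: one network request instead of two.
    $file_string = file_get_contents($url);                       // the only fetch of the page

    preg_match('/<title>(.*)<\/title>/i', $file_string, $title);  // title from the raw HTML
    $title_out = isset($title[1]) ? $title[1] : '';

    $html = str_get_html($file_string);                           // DOM from the same string, no second download

    foreach ($html->find('img') as $e) {
        // ... existing image handling ...
    }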


Source: https://habr.com/ru/post/1398468/

