Running file_put_contents in parallel?

I searched Stack Overflow for a solution, but couldn't find anything even close to what I'm trying to achieve. Perhaps I just blissfully don't know about some magic PHP sauce that solves this problem... ;)

Basically I have an array with, give or take, a few hundred URLs pointing to different XML files on a remote server. I do some checks to see whether the contents of those XML files have changed, and if they have, I download the new XML files to my server.

PHP code:

    $urls = array(
        'http://stackoverflow.com/a-really-nice-file.xml',
        'http://stackoverflow.com/another-cool-file2.xml'
    );

    foreach ($urls as $url) {
        set_time_limit(0);
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FAILONERROR, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_BINARYTRANSFER, false);
        $contents = curl_exec($ch);
        curl_close($ch);
        file_put_contents($filename, $contents);
    }

Now, $filename is set somewhere else and gives each XML file its own identifier based on my logic. So far this script works fine and does what it should, but it is very slow. I know my server can handle a lot more, and I suspect my foreach is slowing down the process.

Is there a way to speed up the foreach? Currently I'm thinking of having each foreach iteration handle 10 or 20 downloads at once, basically cutting the runtime by a factor of 10 or 20, but I can't figure out the best and most efficient way to approach this. Any help or pointers on how to proceed?

+4
3 answers

Your bottleneck is (most likely) your curl requests; you can only write to the file after each request is done, and there is no way (within a single sequential script) to speed that up.

I'm not sure exactly how it works, but you can make parallel requests: http://php.net/manual/en/function.curl-multi-exec.php.

Perhaps you can fire off all the requests at once (if there is enough memory available to store the responses) and then, as they complete, write out the data.
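
For illustration, here is a minimal curl_multi sketch of what that could look like for the loop in the question. The per-URL output names are placeholders, since the poster's $filename logic lives elsewhere:

    $urls = array(
        'http://stackoverflow.com/a-really-nice-file.xml',
        'http://stackoverflow.com/another-cool-file2.xml'
    );

    $mh      = curl_multi_init();
    $handles = array();

    foreach ($urls as $i => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_FAILONERROR, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$i] = $ch;
    }

    // Run all transfers concurrently and wait until they finish.
    $running = null;
    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh); // avoid busy-waiting
    } while ($running > 0);

    // Collect the responses and write them out.
    foreach ($handles as $i => $ch) {
        $contents = curl_multi_getcontent($ch);
        if ($contents !== false) {
            file_put_contents('file-' . $i . '.xml', $contents); // placeholder name
        }
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);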

+6

Just spawn more scripts. Each script will download some of the URLs.

You can get more information about this template here: http://en.wikipedia.org/wiki/Thread_pool_pattern

The more scripts you run, the more parallelism you get.
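
As a sketch of that pattern, a launcher could split the URL list into chunks and start one background worker process per chunk. worker.php and urls.txt are made-up names here; each worker would run the question's original foreach over its own chunk file:

    // Hypothetical launcher: one background worker per chunk of URLs.
    $urls    = file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $workers = 4;
    $chunks  = array_chunk($urls, (int) ceil(count($urls) / $workers));

    foreach ($chunks as $i => $chunk) {
        file_put_contents("chunk-$i.txt", implode("\n", $chunk));
        // The trailing '&' backgrounds the process, so the chunks run in parallel.
        exec(sprintf('php worker.php chunk-%d.txt > /dev/null 2>&1 &', $i));
    }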

+2

I use a Guzzle pool for parallel requests ;) (you can send x parallel requests):

http://docs.guzzlephp.org/en/stable/quickstart.html
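
For example, a minimal Pool sketch along the lines of the linked quickstart might look like this. The concurrency value and output filenames are placeholders:

    require 'vendor/autoload.php';

    use GuzzleHttp\Client;
    use GuzzleHttp\Pool;
    use GuzzleHttp\Psr7\Request;

    $urls = array(
        'http://stackoverflow.com/a-really-nice-file.xml',
        'http://stackoverflow.com/another-cool-file2.xml'
    );

    $client   = new Client();
    $requests = function () use ($urls) {
        foreach ($urls as $url) {
            yield new Request('GET', $url);
        }
    };

    $pool = new Pool($client, $requests(), array(
        'concurrency' => 10, // x parallel requests
        'fulfilled'   => function ($response, $index) {
            // Placeholder name; substitute your own filename logic.
            file_put_contents("file-$index.xml", (string) $response->getBody());
        },
        'rejected'    => function ($reason, $index) {
            error_log("Request $index failed: $reason");
        },
    ));

    // Start the transfers and wait for them all to complete.
    $pool->promise()->wait();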

0
