Can a multi-threaded curl handle a large number of simultaneous URLs?

I need to call a large number of APIs at the same time. I am trying to do this with multi-threaded cURL, but it does not seem to get all the API results back properly (some errors; I suspect timeouts?) when I pass it a lot of URLs. About 50 URLs seems to be the most I can pass safely, and at around 100 at a time I really start to see problems. Because of this, I had to implement logic to chunk the URLs I am trying to curl at any one time.

Questions:

  • What could be causing these problems?
  • Is there anything I can configure in cURL to tell it to wait longer for responses, in case my problems are timeout-related?
  • Is there anything in my server's php.ini that I can tune to improve the performance of my script?

Here's the script:

function multithreaded_curl(array $urls, $concurrent_urls = 50)
{
    // Data to be returned
    $total_results = array();

    // Chunk the URLs
    $chunked_urls = array_chunk($urls, $concurrent_urls);

    foreach ($chunked_urls as $chunked_url) {
        // Results for this chunk
        $results = array();

        // Array of cURL handles
        $curl_handles = array();

        // Multi-handle
        $mh = curl_multi_init();

        // Create a cURL handle for each URL in the chunk and add it to the multi-handle
        foreach ($chunked_url as $k => $v) {
            $curl_handles[$k] = curl_init();
            curl_setopt($curl_handles[$k], CURLOPT_URL, $v);
            curl_setopt($curl_handles[$k], CURLOPT_HEADER, 0);
            curl_setopt($curl_handles[$k], CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($curl_handles[$k], CURLOPT_SSL_VERIFYPEER, 0);
            curl_multi_add_handle($mh, $curl_handles[$k]);
        }

        // Execute the handles; wait on curl_multi_select() instead of busy-looping
        $running = null;
        do {
            curl_multi_exec($mh, $running);
            if ($running > 0) {
                curl_multi_select($mh, 1.0);
            }
        } while ($running > 0);

        // Get content and remove handles
        foreach ($curl_handles as $k => $v) {
            $results[$k] = json_decode(curl_multi_getcontent($v), TRUE);
            curl_multi_remove_handle($mh, $v);
        }

        // All done
        curl_multi_close($mh);

        // Combine results
        $total_results = array_merge($total_results, $results);
    }

    return $total_results;
}
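For reference, a call might look like the following (the URLs and the batch size of 10 are placeholders, not part of the original script). If timeouts really are the suspect, CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT could be set alongside the other curl_setopt() calls inside the function; that is an assumption about a possible fix, not something the script above does.

// Example usage with hypothetical URLs; fetch in smaller batches of 10.
// CURLOPT_CONNECTTIMEOUT / CURLOPT_TIMEOUT would go next to the other
// curl_setopt() calls inside multithreaded_curl() if timeouts are suspected.
$urls = array(
    'https://api.example.com/v1/items/1',
    'https://api.example.com/v1/items/2',
    // ...
);
$all_results = multithreaded_curl($urls, 10);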
1 answer

Regarding Q1: as already noted, there are several problems with this algorithm. First of all, it probably exhausts local resources (handles, sockets, etc.) as well as remote ones (maxConnections, maxThreads, etc.). Don't do it this way.

Regarding Q2: you shouldn't need to (see below), but please capture the actual error responses before guessing at the cause.
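As a minimal sketch of what that could look like (assuming it runs right after the curl_multi_exec() loop in the script above; it is not part of the original code):

// Drain libcurl's completion queue and log any failed transfers
// before reading their content. $mh is the multi-handle from the script above.
while ($info = curl_multi_info_read($mh)) {
    if ($info['result'] !== CURLE_OK) {
        $url = curl_getinfo($info['handle'], CURLINFO_EFFECTIVE_URL);
        // e.g. error 28 is CURLE_OPERATION_TIMEDOUT
        error_log('cURL error ' . $info['result'] . ' ('
            . curl_strerror($info['result']) . ') for ' . $url);
    }
}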

Regarding Q3: yes, there are several options on the REMOTE web server, depending on which server it is (limits on thread counts, maximum number of connections, maximum connections per client, and so on). If that server is also yours, you can tune these to better fit your needs, but you should fix the client-side algorithm first.

In general, it makes little sense to run more than a handful of connections at a time. Reusing connections is much faster, does not burn through your local handles and so on, and does not effectively DoS the remote systems. The only reason to go wider is when the server takes much longer to process a request than the I/O itself takes.

Have you benchmarked how fast it is with, say, 4 connections at a time, reusing them instead of creating new ones? As it stands, you populate curl_handles[] for a single use of each handle, and creating I/O objects takes time. A sketch of the reuse approach follows.
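As an illustration only (this is not the asker's code): a hypothetical curl_rolling() helper that keeps a small pool of easy handles, feeds them URLs from a queue, and reuses each handle for the next URL as soon as its transfer finishes. The function name, the CURLOPT_PRIVATE bookkeeping and the window size of 4 are assumptions made for the sketch; error handling is minimal.

function curl_rolling(array $urls, $window = 4)
{
    $results  = array();
    $queue    = $urls;            // pending URLs, keyed like the input array
    $inflight = 0;
    $mh       = curl_multi_init();

    // Give a handle its next URL from the queue; returns false when the queue is empty
    $assign = function ($ch) use (&$queue, &$inflight, $mh) {
        if (empty($queue)) {
            return false;
        }
        reset($queue);
        $k   = key($queue);
        $url = $queue[$k];
        unset($queue[$k]);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_PRIVATE, $k);   // remember which result slot this handle serves
        curl_multi_add_handle($mh, $ch);
        $inflight++;
        return true;
    };

    // Create a small pool of reusable easy handles and prime the window
    $pool = array();
    for ($i = 0; $i < $window; $i++) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        if ($assign($ch)) {
            $pool[] = $ch;
        } else {
            curl_close($ch);      // fewer URLs than the window size
        }
    }

    $running = 0;
    do {
        curl_multi_exec($mh, $running);
        if ($running > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(100000);       // select reported no fds; back off briefly
        }

        // Harvest finished transfers and immediately reuse their handles for the next URL
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $k  = curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$k] = ($info['result'] === CURLE_OK)
                ? json_decode(curl_multi_getcontent($ch), true)
                : null;
            curl_multi_remove_handle($mh, $ch);
            $inflight--;
            $assign($ch);         // same handle, next URL
        }
    } while ($running > 0 || $inflight > 0);

    foreach ($pool as $ch) {
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $results;
}

Because the same easy handle is reused, libcurl can also keep the underlying connection alive when consecutive URLs point at the same host, which is where much of the speedup comes from.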

