cURL sometimes returns an empty string for a valid URL

I use the rolling-curl library [https://github.com/LionsAd/rolling-curl] to asynchronously retrieve content from a large number of web resources as part of a scheduled task. The library lets you set the maximum number of concurrent cURL connections; I started with 20 and then raised it to 50 to speed things up.

Every time I run it, some arbitrary URLs out of the several thousand processed simply fail and return an empty string. The more parallel connections I use, the more failed requests I get. The same URL that fails on one run may work the next time I run the function. What could be causing this, and how can I avoid it?

+4
2 answers

Everything Luc Franken wrote is accurate, and his answer led me to the solution for my version of the problem, which was:

Remote servers respond on their own, highly variable schedules. To give them enough time to respond, it is important to set two cURL parameters that allow a generous time window:

    CURLOPT_CONNECTTIMEOUT => 30,
    CURLOPT_TIMEOUT        => 30,

You can experiment with longer and shorter timeouts until you find values that minimize errors. But if you are getting intermittent responses with curl / multi-curl / rolling-curl, this will probably resolve most of the problem.
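As a minimal sketch in plain PHP (the helper name and default values here are my own illustration, not part of rolling-curl's API), the two options above are applied like this:

```php
<?php
// Hypothetical helper: fetch one URL with explicit connect/total timeouts.
// Returns [$body, $errno]; $body is false on a transport-level failure.
function fetch_with_timeouts(string $url, int $connect = 30, int $total = 30): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $connect, // seconds allowed to establish the connection
        CURLOPT_TIMEOUT        => $total,   // seconds allowed for the whole transfer
    ]);
    $body  = curl_exec($ch);
    $errno = curl_errno($ch);
    curl_close($ch);
    return [$body, $errno];
}
```

With rolling-curl you would pass the same two options through whatever options array the library forwards to `curl_setopt_array()` for each request.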

+1

In general, you are assuming that this should not happen.

When accessing external servers, that assumption does not hold. Your code must account for servers that may not respond, may not respond in time, or may not respond correctly. The HTTP process allows everything to go wrong. If you reach the server, you should be notified via an HTTP status code (although this does not always happen), but network problems can produce useless responses.
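To make that distinction concrete, here is an illustrative sketch of my own (not from the answer above) that separates transport-level failures, which surface via `curl_errno()`, from server-side problems, which surface via the HTTP status code:

```php
<?php
// Hypothetical classifier for a finished cURL transfer.
// $ch is the curl handle after curl_exec(), $body its return value.
function classify_response($ch, $body): string
{
    if ($body === false || curl_errno($ch) !== 0) {
        return 'transport-error'; // DNS failure, connect timeout, reset, ...
    }
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($code >= 400) {
        return 'http-error';      // the server answered, but with a 4xx/5xx
    }
    if ($body === '') {
        return 'empty-body';      // a "successful" yet useless answer
    }
    return 'ok';
}
```

Logging which of these cases each failed URL falls into is usually the first step toward seeing whether you are hitting timeouts, rate limiting, or genuinely broken servers.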

Do not trust external input. That is the root of the problem.

In your particular case, you are steadily increasing the number of requests. That means more requests, more open sockets, and more resource usage. To diagnose your exact problem you would need extended access to the servers involved, so that you can inspect log files and monitor open connections and other issues. You could also test against a server of your own, with no other software creating connections, so that you can isolate the problem.

However well you have tested this, you are still left with uncertainties. For example, external servers may block you because you are making too many requests, or you may be tripping security filters such as DDoS protection. Monitoring and throttling the number of requests (automated or manual) will give you the most stable solution. Alternatively, you can simply accept the lost requests and process a stable retry queue that ensures you receive the content at some point in time.
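The retry-queue idea can be sketched as follows; this is my own illustration (the function and the backoff intervals are assumptions, not part of rolling-curl), with the fetcher injected as a callable so the retry logic stays independent of cURL:

```php
<?php
// Hypothetical retry helper: re-attempt failed or empty responses a few
// times with exponential backoff instead of treating one bad run as final.
// $fetch is any callable that returns the body string, or false on failure.
function fetch_with_retry(callable $fetch, string $url, int $maxAttempts = 3)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $body = $fetch($url);
        if ($body !== false && $body !== '') {
            return $body;                           // got real content
        }
        if ($attempt < $maxAttempts) {
            usleep(100000 * (2 ** ($attempt - 1))); // 0.1s, 0.2s, ... backoff
        }
    }
    return false;                                   // still failing after all attempts
}
```

In a scheduled task like the asker's, the same effect is often achieved by writing failed URLs back into the work queue and picking them up on the next run.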

0

Source: https://habr.com/ru/post/1345508/
