cURL sometimes returns an empty string for a valid URL

I use the rolling-curl library [https://github.com/LionsAd/rolling-curl] to asynchronously retrieve content from a large number of web resources as part of a scheduled task. The library lets you set the maximum number of concurrent cURL connections; I started with 20 and then raised it to 50 to speed things up.

Every time I run it, some arbitrary URLs out of the several thousand processed simply fail and return an empty string. The more parallel connections I use, the more failed requests I get. The same URL that fails on one run may work the next time I run the function. What could be causing this, and how can I avoid it?

+4
2 answers

Everything Luc Franken wrote is accurate, and his answer led me to the solution for my version of the problem, which was:

Remote servers respond on their own, highly variable schedules. To give them enough time to respond, it is important to set two cURL parameters that allow a generous time window:

    CURLOPT_CONNECTTIMEOUT => 30,
    CURLOPT_TIMEOUT        => 30,

You can experiment with longer and shorter timeouts until you find values that minimize errors. But if you are getting intermittent responses with curl / multi-curl / rolling-curl, this will probably resolve most of the problem.
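As a minimal sketch in plain PHP (the helper name and default values here are my own illustration, not part of rolling-curl's API), the two options above are applied like this:

```php
<?php
// Hypothetical helper: fetch one URL with explicit connect/total timeouts.
// Returns [$body, $errno]; $body is false on a transport-level failure.
function fetch_with_timeouts(string $url, int $connect = 30, int $total = 30): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $connect, // seconds allowed to establish the connection
        CURLOPT_TIMEOUT        => $total,   // seconds allowed for the whole transfer
    ]);
    $body  = curl_exec($ch);
    $errno = curl_errno($ch);
    curl_close($ch);
    return [$body, $errno];
}
```

With rolling-curl you would pass the same two options through whatever options array the library forwards to `curl_setopt_array()` for each request.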

+1

In general, you are assuming that this should not happen.

When accessing external servers, that assumption does not hold. Your code must account for servers that may not respond, may not respond in time, or may not respond correctly. The HTTP process allows everything to go wrong. If you reach the server, you should be notified via an HTTP status code (although this does not always happen), but network problems can produce useless responses.
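To make that distinction concrete, here is an illustrative sketch of my own (not from the answer above) that separates transport-level failures, which surface via `curl_errno()`, from server-side problems, which surface via the HTTP status code:

```php
<?php
// Hypothetical classifier for a finished cURL transfer.
// $ch is the curl handle after curl_exec(), $body its return value.
function classify_response($ch, $body): string
{
    if ($body === false || curl_errno($ch) !== 0) {
        return 'transport-error'; // DNS failure, connect timeout, reset, ...
    }
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($code >= 400) {
        return 'http-error';      // the server answered, but with a 4xx/5xx
    }
    if ($body === '') {
        return 'empty-body';      // a "successful" yet useless answer
    }
    return 'ok';
}
```

Logging which of these cases each failed URL falls into is usually the first step toward seeing whether you are hitting timeouts, rate limiting, or genuinely broken servers.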

Do not trust external input. That is the root of the problem.

In your particular case, you are steadily increasing the number of requests. That means more requests, more open sockets, and more resource usage. To diagnose your exact problem you would need extended access to the servers involved, so that you can inspect log files and monitor open connections and other issues. You could also test against a server of your own, with no other software creating connections, so that you can isolate the problem.

However well you have tested this, you are still left with uncertainties. For example, external servers may block you because you are making too many requests, or you may be tripping security filters such as DDoS protection. Monitoring and throttling the number of requests (automated or manual) will give you the most stable solution. Alternatively, you can simply accept the lost requests and process a stable retry queue that ensures you receive the content at some point in time.
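The retry-queue idea can be sketched as follows; this is my own illustration (the function and the backoff intervals are assumptions, not part of rolling-curl), with the fetcher injected as a callable so the retry logic stays independent of cURL:

```php
<?php
// Hypothetical retry helper: re-attempt failed or empty responses a few
// times with exponential backoff instead of treating one bad run as final.
// $fetch is any callable that returns the body string, or false on failure.
function fetch_with_retry(callable $fetch, string $url, int $maxAttempts = 3)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $body = $fetch($url);
        if ($body !== false && $body !== '') {
            return $body;                           // got real content
        }
        if ($attempt < $maxAttempts) {
            usleep(100000 * (2 ** ($attempt - 1))); // 0.1s, 0.2s, ... backoff
        }
    }
    return false;                                   // still failing after all attempts
}
```

In a scheduled task like the asker's, the same effect is often achieved by writing failed URLs back into the work queue and picking them up on the next run.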

0

Source: https://habr.com/ru/post/1345508/
