Scrapy User timeout caused connection failure

I use Scrapy to download images, but I get a timeout error:

Retrying <GET http://www/***.jpg> (failed 1 times): User timeout caused connection failure 

However, I can download the same image with wget instantly. DOWNLOAD_TIMEOUT (a Scrapy setting) is at its default value of 180 seconds, so that should not be the main cause of the error. I tried Scrapy both with and without a proxy server; both give me the error above.

1 answer

If you are scraping many images (especially from several domains), the downloads run concurrently, so each one can take longer than fetching a single image from the command line. Try lowering the CONCURRENT_REQUESTS setting and raising DOWNLOAD_TIMEOUT.
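As a rough sketch, the two settings could be adjusted in the project's settings.py like this. The setting names are real Scrapy settings; the values are only illustrative assumptions and should be tuned for your site and connection.

```python
# settings.py (fragment) -- example values only, not recommendations.

# Fewer parallel downloads, so each image gets more bandwidth.
# Scrapy's default is 16.
CONCURRENT_REQUESTS = 4

# Optional: also cap parallel requests per domain (default is 8).
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Allow each download more time before it is treated as timed out.
# Scrapy's default is 180 seconds.
DOWNLOAD_TIMEOUT = 300
```

If the timeouts disappear with these values, you can raise the concurrency again step by step until they come back.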

Run scrapy fetch <URL> to confirm that Scrapy can retrieve the image on its own, which helps rule out a problem with Scrapy itself.

Finally, compare the request headers (User-Agent, cookies, Referer, and so on); a difference there may explain why the server responds differently. If you find a header that matters, it is easy to change it in Scrapy.
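One way to do that comparison, as a minimal sketch: the helper diff_headers below is hypothetical (not part of Scrapy), and the header values are made-up examples. You would capture the real ones yourself, for instance from wget's debug output and from request.headers in Scrapy. USER_AGENT and DEFAULT_REQUEST_HEADERS are real Scrapy settings.

```python
# Hypothetical helper: compare two sets of request headers,
# ignoring case in the header names.
def diff_headers(working, failing):
    """Return {name: (working_value, failing_value)} for headers that differ."""
    a = {k.lower(): v for k, v in working.items()}
    b = {k.lower(): v for k, v in failing.items()}
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }

# Example values only; replace them with the headers you actually observed.
wget_headers = {"User-Agent": "Wget/1.21", "Accept": "*/*"}
scrapy_headers = {"user-agent": "Scrapy/2.11 (+https://scrapy.org)", "Accept": "*/*"}

differences = diff_headers(wget_headers, scrapy_headers)

# settings.py (fragment): once a header turns out to matter,
# it can be overridden project-wide. Values here are placeholders.
USER_AGENT = "Mozilla/5.0 (compatible; example-bot)"
DEFAULT_REQUEST_HEADERS = {"Accept": "*/*"}
```

In this made-up example only the User-Agent differs, so that would be the first header to try changing.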


Source: https://habr.com/ru/post/953439/

