Scrapy: downloader/response_count vs response_received_count

I use Scrapy to crawl multiple websites and want to analyze the crawl speed. The stats dumped at the end of a crawl contain a downloader/response_count value and a response_received_count value. The first is consistently higher than the second.

Why is there a difference, and which Scrapy component increments each of the two values in the stats collector?

+6
1 answer
  • CoreStats is an extension responsible for response_received_count.
  • DownloaderStats is a downloader middleware responsible for downloader/response_count.
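
For orientation, here is an abridged sketch of how these two components appear in Scrapy's default settings. The order numbers are the ones found in recent Scrapy releases and are an assumption about your installed version, so check scrapy/settings/default_settings.py before relying on them. response_order is a hypothetical helper, not part of Scrapy; it only lists the middlewares in the order in which their process_response() methods run (highest number first).

```python
# Abridged excerpt; values as found in recent Scrapy releases
# (assumption: verify against your installed version).
EXTENSIONS_BASE = {
    "scrapy.extensions.corestats.CoreStats": 0,
}
DOWNLOADER_MIDDLEWARES_BASE = {
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
    "scrapy.downloadermiddlewares.redirect.RedirectMiddleware": 600,
    "scrapy.downloadermiddlewares.stats.DownloaderStats": 850,
}


def response_order(middlewares):
    """Hypothetical helper: class names in process_response() order (highest number first)."""
    ranked = sorted(middlewares.items(), key=lambda item: item[1], reverse=True)
    return [path.rsplit(".", 1)[-1] for path, _ in ranked]


print(response_order(DOWNLOADER_MIDDLEWARES_BASE))
```

With these values DownloaderStats is listed first, meaning it sees a response before the redirect and retry middlewares do.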

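CoreStats connects the signals.response_received signal to a handler that increments response_received_count, so it counts every response for which that signal is sent (even ones with a bad status). DownloaderStats, on the other hand, is a downloader middleware and processes responses at a fixed position in the middleware chain: its default order is 850. Responses pass through process_response in decreasing order, so DownloaderStats counts a response before the middlewares with a number lower than 850 see it. Those middlewares may replace the response with a new request (a redirect or a retry, for example) or fail while processing it; in that case downloader/response_count has already been incremented, but the response never reaches the engine, response_received is not sent, and response_received_count is not incremented. That is why the first value is higher than the second.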
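
The two counting points can be illustrated with a toy model. This is not Scrapy code: the functions downloader_stats, redirect, engine and deliver are hypothetical stand-ins, responses are plain dicts, and the model assumes that response processing runs from the highest middleware order number to the lowest, with the engine sending response_received only when a response (rather than a new request) comes out of the chain.

```python
from collections import Counter

stats = Counter()


def downloader_stats(response):
    # Stand-in for DownloaderStats (order 850): counts every response it sees.
    stats["downloader/response_count"] += 1
    return response


def redirect(response):
    # Stand-in for a lower-numbered middleware: turns a 3xx response into a new request.
    if 300 <= response["status"] < 400:
        return {"request": response["location"]}
    return response


def engine(result):
    # Stand-in for the point where response_received is sent: responses only.
    if "status" in result:
        stats["response_received_count"] += 1


def deliver(response):
    result = response
    for middleware in (downloader_stats, redirect):  # highest order number first
        result = middleware(result)
        if "request" in result:
            break  # rescheduled as a request; the response never reaches the engine
    engine(result)


for resp in [{"status": 301, "location": "/new"}, {"status": 200}, {"status": 404}]:
    deliver(resp)

print(dict(stats))
```

In this model the 301 response is counted only by the middleware stand-in, while the 200 and 404 responses are counted at both points, so the downloader counter ends up one higher.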

+8

Source: https://habr.com/ru/post/1274446/

