How to Make Scrapy Resume Crawling from the Last Breakpoint

I use Scrapy to crawl a site, but the crawl sometimes gets interrupted (power outage, etc.).

How can I make it resume crawling from the point where it stopped? I don't want to restart from the seed URLs.

1 answer

This can be done by persisting the scheduler's pending requests to disk with the JOBDIR setting:

scrapy crawl somespider -s JOBDIR=crawls/somespider-1 

See http://doc.scrapy.org/en/latest/topics/jobs.html for more details.
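The same behavior can also be configured in code rather than on the command line; a minimal sketch, assuming the `somespider`/`crawls` names from the command above:

```python
# settings.py -- or set per spider via the custom_settings attribute.
# JOBDIR tells Scrapy where to persist the scheduler's pending
# requests and the dupefilter state between runs.
JOBDIR = "crawls/somespider-1"
```

To pause a crawl cleanly, press Ctrl-C once (or send SIGTERM) and wait for Scrapy to flush its state; a second Ctrl-C forces an unclean shutdown that may lose progress. Rerunning the same command with the same JOBDIR resumes from the persisted queue, and each run you want to be resumable needs its own JOBDIR.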


Source: https://habr.com/ru/post/1497651/