How to Make Scrapy Resume Crawling from the Last Breakpoint

I use Scrapy to crawl a site, but the crawl sometimes gets interrupted (power outage, etc.).

How can I make it resume crawling from the point where it stopped? I don't want to restart from the seed URLs.

1 answer

This can be done by persisting the scheduler's pending requests to disk with the JOBDIR setting:

scrapy crawl somespider -s JOBDIR=crawls/somespider-1 

See http://doc.scrapy.org/en/latest/topics/jobs.html for more details.
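The same behavior can also be configured in code rather than on the command line; a minimal sketch, assuming the `somespider`/`crawls` names from the command above:

```python
# settings.py -- or set per spider via the custom_settings attribute.
# JOBDIR tells Scrapy where to persist the scheduler's pending
# requests and the dupefilter state between runs.
JOBDIR = "crawls/somespider-1"
```

To pause a crawl cleanly, press Ctrl-C once (or send SIGTERM) and wait for Scrapy to flush its state; a second Ctrl-C forces an unclean shutdown that may lose progress. Rerunning the same command with the same JOBDIR resumes from the persisted queue, and each run you want to be resumable needs its own JOBDIR.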


Source: https://habr.com/ru/post/1497651/