Running multiple spiders one by one

I use the Scrapy framework so that spiders crawl some web pages. Basically, I want to scrape those pages and save the results in a database. I have one spider per web page. But I am having trouble running these spiders one after another: each spider should start crawling only after the previous one has finished. How can this be achieved? Is scrapyd a solution?

1 answer

scrapyd is really a good way to do this: the max_proc or max_proc_per_cpu options can be used to limit the number of spiders running in parallel, and you schedule spiders through the scrapyd REST API, for example:

$ curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider 
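With max_proc set to 1, scrapyd starts at most one crawl at a time, so every spider you schedule waits in the queue until the previous one finishes. A minimal sketch of the relevant part of scrapyd.conf (assuming an otherwise default scrapyd configuration):

[scrapyd]
# Allow only one Scrapy process at a time, so queued spiders run one by one.
max_proc = 1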
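If you want to queue several spiders from a script instead of calling curl by hand, the same schedule.json endpoint can be called from Python. A small sketch, assuming a project named myproject and hypothetical spider names; with max_proc = 1, scrapyd will run them sequentially in the order they are scheduled:

import requests

SCRAPYD_URL = "http://localhost:6800/schedule.json"

# Hypothetical spider names -- replace with the spiders in your own project.
spiders = ["somespider", "anotherspider"]

for spider in spiders:
    # Each call only enqueues a job; with max_proc = 1, scrapyd
    # starts the next spider after the previous one has finished.
    response = requests.post(SCRAPYD_URL, data={"project": "myproject", "spider": spider})
    response.raise_for_status()
    print(response.json())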
