Running multiple spiders one by one

I use the Scrapy framework so that spiders crawl some web pages. Basically, I want to scrape those pages and save the results in a database. I have one spider per web page. But I am having trouble running these spiders one after another: each spider should start crawling only after the previous one has finished. How can this be achieved? Is scrapyd a solution?

1 answer

scrapyd is really a good way to do this: the max_proc or max_proc_per_cpu options can be used to limit the number of spiders running in parallel, and you schedule spiders through the scrapyd REST API, for example:

$ curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider 
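With max_proc set to 1, scrapyd starts at most one crawl at a time, so every spider you schedule waits in the queue until the previous one finishes. A minimal sketch of the relevant part of scrapyd.conf (assuming an otherwise default scrapyd configuration):

[scrapyd]
# Allow only one Scrapy process at a time, so queued spiders run one by one.
max_proc = 1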
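If you want to queue several spiders from a script instead of calling curl by hand, the same schedule.json endpoint can be called from Python. A small sketch, assuming a project named myproject and hypothetical spider names; with max_proc = 1, scrapyd will run them sequentially in the order they are scheduled:

import requests

SCRAPYD_URL = "http://localhost:6800/schedule.json"

# Hypothetical spider names -- replace with the spiders in your own project.
spiders = ["somespider", "anotherspider"]

for spider in spiders:
    # Each call only enqueues a job; with max_proc = 1, scrapyd
    # starts the next spider after the previous one has finished.
    response = requests.post(SCRAPYD_URL, data={"project": "myproject", "spider": spider})
    response.raise_for_status()
    print(response.json())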
