How to restart a Scrapy spider

What I need:

  • run the crawler
  • let the spider finish its search
  • wait 1 minute
  • start the spider again

I tried this:

    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings
    from time import sleep

    while True:
        process = CrawlerProcess(get_project_settings())
        process.crawl('spider_name')
        process.start()
        sleep(60)

But I get this error:

twisted.internet.error.ReactorNotRestartable

Please help me do it right.

Python 3.6
Scrapy 1.3.2
Linux

2 answers

I think I found a solution:

    from scrapy.utils.project import get_project_settings
    from scrapy.crawler import CrawlerRunner
    from twisted.internet import reactor
    from twisted.internet import task

    timeout = 60

    def run_spider():
        l.stop()
        runner = CrawlerRunner(get_project_settings())
        d = runner.crawl('spider_name')
        d.addBoth(lambda _: l.start(timeout, False))

    l = task.LoopingCall(run_spider)
    l.start(timeout)

    reactor.run()

To avoid the ReactorNotRestartable error, you can create a main.py file that launches the spider from the shell several times using the subprocess module.

This main.py file could be like this:

    from time import sleep
    import subprocess

    timeout = 60

    while True:
        command = 'scrapy crawl yourSpiderName'
        subprocess.run(command, shell=True)
        sleep(timeout)
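A small refinement of the subprocess approach, assuming 'scrapy' is on PATH (e.g. the project's virtualenv is active): passing the command as an argument list avoids shell=True and its quoting pitfalls, and the exit code tells you whether the crawl succeeded. 'yourSpiderName' is the placeholder from the answer above.

```python
import subprocess
import sys
from time import sleep

TIMEOUT = 60  # seconds between runs, as in the question

def run_spider_once(argv):
    """Run one crawl as a child process and return its exit code.

    argv is a list like ['scrapy', 'crawl', 'yourSpiderName'];
    a list avoids shell=True and shell-quoting problems.
    """
    completed = subprocess.run(argv)
    return completed.returncode

# The restart loop from the answer, using the helper:
# while True:
#     run_spider_once(['scrapy', 'crawl', 'yourSpiderName'])
#     sleep(TIMEOUT)
```

A non-zero return code means the scrapy process exited abnormally, so you could log it or back off before the next run.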

Source: https://habr.com/ru/post/1264424/
