How to restart a Scrapy spider

What I need:

  • run the crawler
  • let the spider finish its search
  • wait 1 minute
  • start the spider again

I tried this:

    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings
    from time import sleep

    while True:
        process = CrawlerProcess(get_project_settings())
        process.crawl('spider_name')
        process.start()
        sleep(60)

But I get this error:

twisted.internet.error.ReactorNotRestartable

Please help me do it right.

Python 3.6
Scrapy 1.3.2
Linux

2 answers

I think I found a solution:

    from scrapy.utils.project import get_project_settings
    from scrapy.crawler import CrawlerRunner
    from twisted.internet import reactor
    from twisted.internet import task

    timeout = 60

    def run_spider():
        l.stop()
        runner = CrawlerRunner(get_project_settings())
        d = runner.crawl('spider_name')
        d.addBoth(lambda _: l.start(timeout, False))

    l = task.LoopingCall(run_spider)
    l.start(timeout)

    reactor.run()

To avoid the ReactorNotRestartable error, you can create a main.py file that launches the spider from the shell several times using the subprocess module.

This main.py file could be like this:

    from time import sleep
    import subprocess

    timeout = 60

    while True:
        command = 'scrapy crawl yourSpiderName'
        subprocess.run(command, shell=True)
        sleep(timeout)
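A small refinement of the subprocess approach, assuming 'scrapy' is on PATH (e.g. the project's virtualenv is active): passing the command as an argument list avoids shell=True and its quoting pitfalls, and the exit code tells you whether the crawl succeeded. 'yourSpiderName' is the placeholder from the answer above.

```python
import subprocess
import sys
from time import sleep

TIMEOUT = 60  # seconds between runs, as in the question

def run_spider_once(argv):
    """Run one crawl as a child process and return its exit code.

    argv is a list like ['scrapy', 'crawl', 'yourSpiderName'];
    a list avoids shell=True and shell-quoting problems.
    """
    completed = subprocess.run(argv)
    return completed.returncode

# The restart loop from the answer, using the helper:
# while True:
#     run_spider_once(['scrapy', 'crawl', 'yourSpiderName'])
#     sleep(TIMEOUT)
```

A non-zero return code means the scrapy process exited abnormally, so you could log it or back off before the next run.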

Source: https://habr.com/ru/post/1264424/
