ReactorNotRestartable error when running Scrapy in a loop

I get a twisted.internet.error.ReactorNotRestartable error when executing the following code:

    from time import sleep

    from scrapy import signals
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings
    from scrapy.xlib.pydispatch import dispatcher

    result = None

    def set_result(item):
        result = item

    while True:
        process = CrawlerProcess(get_project_settings())
        dispatcher.connect(set_result, signals.item_scraped)

        process.crawl('my_spider')
        process.start()

        if result:
            break
        sleep(3)

It works the first time, then I get the error. I create a new process variable on every iteration, so what is the problem?

3 answers

By default, CrawlerProcess.start() stops the Twisted reactor it creates once all crawlers have finished.
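The underlying issue is that a Twisted reactor is one-shot: once it has stopped, it cannot be started again. Python's own asyncio event loop behaves the same way after it is closed, which gives a quick self-contained illustration (this is an analogy, not Scrapy code):

```python
import asyncio

# An event loop, like the Twisted reactor, is one-shot:
# once closed, it cannot be run again.
loop = asyncio.new_event_loop()
loop.run_until_complete(asyncio.sleep(0))  # first run: fine
loop.close()

coro = asyncio.sleep(0)
try:
    loop.run_until_complete(coro)  # second run: fails
except RuntimeError as exc:
    coro.close()  # avoid a "coroutine was never awaited" warning
    print(exc)    # Event loop is closed
```

The second process.start() in the question's loop hits the same restriction on the Twisted side.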

You should call process.start(stop_after_crawl=False) if you create a process in each iteration.

Another option is to manage the Twisted reactor yourself and use CrawlerRunner. The docs have an example of this.


I was able to solve this problem as follows: process.start() should be called only once.

    from time import sleep

    from scrapy import signals
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings
    from scrapy.xlib.pydispatch import dispatcher

    result = None

    def set_result(item):
        result = item

    while True:
        process = CrawlerProcess(get_project_settings())
        dispatcher.connect(set_result, signals.item_scraped)
        process.crawl('my_spider')
        process.start()

Reference: http://crawl.blog/scrapy-loop/

    import scrapy
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings
    from twisted.internet import reactor
    from twisted.internet.task import deferLater

    def sleep(_, *args, seconds):
        """Non-blocking sleep callback."""
        return deferLater(reactor, seconds, lambda: None)

    process = CrawlerProcess(get_project_settings())

    def _crawl(result, spider):
        deferred = process.crawl(spider)
        deferred.addCallback(lambda results: print('waiting 100 seconds before restart...'))
        deferred.addCallback(sleep, seconds=100)
        deferred.addCallback(_crawl, spider)
        return deferred

    _crawl(None, MySpider)  # MySpider is your spider class
    process.start()

Source: https://habr.com/ru/post/1264425/
