ReactorNotRestartable error when running Scrapy in a loop

I get a twisted.internet.error.ReactorNotRestartable error when executing the following code:

    from time import sleep

    from scrapy import signals
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings
    from scrapy.xlib.pydispatch import dispatcher

    result = None

    def set_result(item):
        result = item

    while True:
        process = CrawlerProcess(get_project_settings())
        dispatcher.connect(set_result, signals.item_scraped)

        process.crawl('my_spider')
        process.start()

        if result:
            break
        sleep(3)

It works the first time, then I get the error. I create a new process variable on every iteration, so what is the problem?

3 answers

By default, CrawlerProcess.start() stops the Twisted reactor it creates once all crawlers have finished.
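The underlying issue is that a Twisted reactor is one-shot: once it has stopped, it cannot be started again. Python's own asyncio event loop behaves the same way after it is closed, which gives a quick self-contained illustration (this is an analogy, not Scrapy code):

```python
import asyncio

# An event loop, like the Twisted reactor, is one-shot:
# once closed, it cannot be run again.
loop = asyncio.new_event_loop()
loop.run_until_complete(asyncio.sleep(0))  # first run: fine
loop.close()

coro = asyncio.sleep(0)
try:
    loop.run_until_complete(coro)  # second run: fails
except RuntimeError as exc:
    coro.close()  # avoid a "coroutine was never awaited" warning
    print(exc)    # Event loop is closed
```

The second process.start() in the question's loop hits the same restriction on the Twisted side.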

You should call process.start(stop_after_crawl=False) if you create a process in each iteration.

Another option is to manage the Twisted reactor yourself and use CrawlerRunner. The docs have an example of this.


I was able to solve this problem as follows: process.start() should be called only once.

    from time import sleep

    from scrapy import signals
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings
    from scrapy.xlib.pydispatch import dispatcher

    result = None

    def set_result(item):
        result = item

    while True:
        process = CrawlerProcess(get_project_settings())
        dispatcher.connect(set_result, signals.item_scraped)
        process.crawl('my_spider')
        process.start()

Reference: http://crawl.blog/scrapy-loop/

    import scrapy
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings
    from twisted.internet import reactor
    from twisted.internet.task import deferLater

    def sleep(_, *args, seconds):
        """Non-blocking sleep callback."""
        return deferLater(reactor, seconds, lambda: None)

    process = CrawlerProcess(get_project_settings())

    def _crawl(result, spider):
        deferred = process.crawl(spider)
        deferred.addCallback(lambda results: print('waiting 100 seconds before restart...'))
        deferred.addCallback(sleep, seconds=100)
        deferred.addCallback(_crawl, spider)
        return deferred

    _crawl(None, MySpider)  # MySpider is your spider class
    process.start()

Source: https://habr.com/ru/post/1264425/
