Before linking me to the other answers on this topic, please note that I have read them and am still a bit confused. OK, here we go.
So, I am building a web app in Django, and I am using the latest version of Scrapy to crawl a website. I am not using Celery (I know very little about it, but have seen it mentioned in other questions on this topic).
One of the URLs on our site, /crawl/, launches the crawler. It is the only URL on our site that requires Scrapy. Here is the view function called when the URL is visited:
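    from django.shortcuts import render
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging
    from twisted.internet import reactor

    # ReviewSpider is my spider class (its import is omitted here).

    def crawl(request):
        configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
        runner = CrawlerRunner()
        d = runner.crawl(ReviewSpider)
        d.addBoth(lambda _: reactor.stop())
        reactor.run()  # blocks until reactor.stop() is called
        return render(request, 'index.html')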
You will notice that this is adapted from the example in the Scrapy documentation. The first time this URL is visited after the server starts, everything works as intended. From the second visit onward, a ReactorNotRestartable exception is thrown. I understand that this exception occurs when a reactor that has already been stopped is told to start again, which is not possible.
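To make that concrete, here is roughly how I picture the failure. This is only a toy model in plain Python, not Twisted's real code: ToyReactor, crawl_view and visit_twice are names I made up, and the local ReactorNotRestartable class merely stands in for twisted.internet.error.ReactorNotRestartable.

```python
class ReactorNotRestartable(Exception):
    """Stand-in for twisted.internet.error.ReactorNotRestartable."""


class ToyReactor:
    """Hypothetical one-shot event loop; NOT Twisted's real implementation."""

    def __init__(self):
        self._started = False  # set on the first run() and never reset
        self.running = False

    def run(self):
        if self._started:
            # A reactor that has run once refuses to run again.
            raise ReactorNotRestartable()
        self._started = True
        self.running = True
        # A real reactor would block here, processing events until stop().
        self.stop()

    def stop(self):
        self.running = False


def crawl_view(reactor):
    """Mimics my view: run the given reactor, then return a response."""
    reactor.run()
    return "ok"


def visit_twice():
    """Simulate two visits to /crawl/ that end up using the same reactor."""
    shared = ToyReactor()
    first = crawl_view(shared)
    try:
        second = crawl_view(shared)
    except ReactorNotRestartable:
        second = "ReactorNotRestartable"
    return first, second


print(visit_twice())
```

In this model the first visit succeeds and the second one fails, which matches what I see, provided both visits really do end up with the same reactor object.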
Looking at the sample code, I assumed that the line "runner = CrawlerRunner()" would give me a new reactor each time this URL is visited. But perhaps my understanding of Twisted reactors is not entirely correct.
How can I get and run a new reactor every time this URL is visited?
Thank you very much