I have a script called algorithm.py and I want to be able to call Scrapy spiders from within the script. File structure:

algorithm.py
MySpiders/

where MySpiders is a folder containing several Scrapy projects. I would like to create methods perform_spider1(), perform_spider2(), ... which I can call in algorithm.py.
How do I create these methods?
I managed to call one spider using the following code; however, it is not a method and it only works for one spider. I am a newbie in need of help!
    import sys
    import os.path

    # Make the spider1 project importable
    sys.path.append('path to spider1/spider1')

    from twisted.internet import reactor
    from scrapy.crawler import Crawler
    from scrapy.settings import Settings
    from scrapy import log, signals
    from scrapy.xlib.pydispatch import dispatcher
    from spider1.spiders.spider1_spider import Spider1Spider

    def stop_reactor():
        reactor.stop()

    # Stop the reactor once the spider has closed
    dispatcher.connect(stop_reactor, signal=signals.spider_closed)

    spider = Spider1Spider()
    crawler = Crawler(Settings())
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()
    log.start()
    log.msg('Running reactor...')
    reactor.run()
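To make the goal concrete, here is a rough sketch of the kind of wrapper I have in mind. It is not the code above turned into a method: it swaps in a different technique, launching the `scrapy crawl` command in a separate process for each spider, since a Twisted reactor cannot be restarted once it has stopped in the same process. It assumes the `scrapy` command is on the PATH, that each subfolder of MySpiders is a Scrapy project containing a scrapy.cfg, and that the projects and spiders are named `spider1` and `spider2`. The helper names `build_crawl_command` and `run_spider`, and those spider names, are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: MySpiders/ sits next to algorithm.py and each subfolder
# is a Scrapy project (i.e. it contains a scrapy.cfg file).
SPIDERS_ROOT = Path("MySpiders")


def build_crawl_command(spider_name):
    """Return the command line that runs one spider by name."""
    return ["scrapy", "crawl", spider_name]


def run_spider(project_dir, spider_name):
    """Run one spider in its own process and return the exit code.

    A separate process per crawl means every spider gets a fresh
    Twisted reactor, so several crawls can be run one after another.
    """
    result = subprocess.run(build_crawl_command(spider_name), cwd=str(project_dir))
    return result.returncode


def perform_spider1():
    # Hypothetical project folder and spider name.
    return run_spider(SPIDERS_ROOT / "spider1", "spider1")


def perform_spider2():
    # Hypothetical project folder and spider name.
    return run_spider(SPIDERS_ROOT / "spider2", "spider2")
```

In algorithm.py, these could then be called like ordinary functions, for example `perform_spider1()` followed by `perform_spider2()`, each call blocking until its crawl has finished.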