CrawlSpider with Splash enabled after the first URL

I am writing a Scrapy spider based on CrawlSpider, and I need to render some responses with Splash. I want the responses to my start_urls to feed the crawler's link extraction, but unfortunately the spider stops after the first response. Any idea what is going wrong?

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class VideoSpider(CrawlSpider):

    start_urls = ['https://juke.com/de/de/search?q=1+Mord+f%C3%BCr+2']

    rules = (
        Rule(LinkExtractor(allow=()), callback='parse_items',
             process_request='use_splash'),
    )

    def use_splash(self, request):
        request.meta['splash'] = {
            'endpoint': 'render.html',
            'args': {'wait': 0.5},
        }
        return request

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, self.parse, meta={
                'splash': {
                    'endpoint': 'render.html',
                    'args': {'wait': 0.5},
                }
            })

    def parse_items(self, response):
        data = response.body
        print(data)
1 answer

Use SplashRequest instead of scrapy.Request. Check out my answer to CrawlSpider with Splash.


Source: https://habr.com/ru/post/1684487/
