I tried one of the Scrapy examples (the one titled "returning multiple Requests and items from a single callback").
I only changed the domains to point to a real website:
import scrapy

class MySpider(scrapy.Spider):
    name = 'huffingtonpost'
    allowed_domains = ['huffingtonpost.com/']
    start_urls = [
        'http://www.huffingtonpost.com/politics/',
        'http://www.huffingtonpost.com/entertainment/',
        'http://www.huffingtonpost.com/media/',
    ]

    def parse(self, response):
        for h3 in response.xpath('//h3').extract():
            yield {"title": h3}
        for url in response.xpath('//a/@href').extract():
            yield scrapy.Request(url, callback=self.parse)
But when I run it exactly as written above, I get a ValueError. Any ideas?
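My only guess so far is that the ValueError is the "Missing scheme in request url" one, since response.xpath('//a/@href').extract() returns relative hrefs and scrapy.Request expects absolute URLs. If that is the cause (I have not confirmed it), a minimal sketch of the parse method using response.urljoin() would look like this:

    def parse(self, response):
        for h3 in response.xpath('//h3').extract():
            yield {"title": h3}
        for url in response.xpath('//a/@href').extract():
            # resolve relative hrefs against the current page URL
            yield scrapy.Request(response.urljoin(url), callback=self.parse)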