I want to use Scrapy to crawl a website whose pages are spread across many subdomains. I know I need a CrawlSpider with a Rule, but I want the rule to simply be "allow all subdomains and let the parsers handle the rest based on the data" (meaning, in the example below, the item_links point to different subdomains).
code example:
from scrapy.selector import Selector
from scrapy.http import Request

# (these are methods of the CrawlSpider subclass)
def parse_page(self, response):
    sel = Selector(response)
    item_links = sel.xpath("XXXXXXXXX").extract()
    for item_link in item_links:
        item_request = Request(url=item_link,
                               callback=self.parse_item)
        yield item_request

def parse_item(self, response):
    sel = Selector(response)
** EDIT ** Just to make the question clear: I want the ability to crawl everything under *.example.com, i.e. not to get "Filtered offsite request to 'foo.example.com'".
** EDIT 2 ** Following @agstudy's answer, make sure you remember to delete allowed_domains = ["www.example.com"].
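For completeness, here is a minimal sketch of how the spider might end up looking (the spider name, start URL, and XPath are placeholders, and the import paths may differ in older Scrapy versions, e.g. scrapy.contrib.spiders). Listing only the bare registered domain in allowed_domains lets the OffsiteMiddleware accept requests to any of its subdomains, while removing the attribute entirely disables offsite filtering altogether:

from scrapy.selector import Selector
from scrapy.http import Request
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class ExampleSpider(CrawlSpider):
    name = "example"
    # Bare domain only: OffsiteMiddleware then accepts every *.example.com subdomain.
    # Deleting allowed_domains entirely would disable offsite filtering altogether.
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com/"]

    # Follow every link and hand each page to parse_page.
    rules = (
        Rule(LinkExtractor(), callback="parse_page", follow=True),
    )

    def parse_page(self, response):
        sel = Selector(response)
        for item_link in sel.xpath("XXXXXXXXX").extract():
            yield Request(url=item_link, callback=self.parse_item)

    def parse_item(self, response):
        sel = Selector(response)
        # extract the item fields here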