Sample code for Scrapy process_links and process_request

I'm new to Scrapy, and I was hoping someone could give me some good code examples of when process_links and process_request are most useful. I see that process_links is used to filter URLs, but I don't know how to write it.

Thanks.

1 answer

You mean scrapy.spiders.Rule, which is most often used in scrapy.CrawlSpider.

They do pretty much what their names say: they act as a kind of middleware between the moment a link is extracted and the moment it is processed/downloaded.

process_links sits between the moment links are extracted and the moment they are turned into requests. There are some great use cases for it; to name a couple of common ones:

  • Filter out links you don't want.
  • Fix links you do want (for example, broken ones).

An example:

def process_links(self, links):
    for link in links:
        # 1: skip links whose anchor text contains "foo"
        if 'foo' in link.text:
            continue
        # 2: fix the url to avoid an unnecessary redirect
        link.url = link.url + '/'
        yield link
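To see that filtering logic on its own, here is a minimal, self-contained sketch. The Link class below is a stand-in for scrapy.link.Link (the real objects expose url and text attributes in the same way), so the example runs without a Scrapy project:

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Stand-in for scrapy.link.Link: just url and text attributes."""
    url: str
    text: str


def process_links(links):
    for link in links:
        if 'foo' in link.text:
            continue  # skip links whose anchor text contains "foo"
        link.url = link.url + '/'  # fix url to avoid an unnecessary redirect
        yield link


links = [
    Link('http://example.com/a', 'a page'),
    Link('http://example.com/b', 'foo page'),
]
kept = list(process_links(links))
print([l.url for l in kept])  # ['http://example.com/a/']
```

In a real spider the function would be a method and would receive the links extracted by the Rule's LinkExtractor; it can return a list or, as here, yield links one by one.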

process_request sits between the moment a request is created and the moment it is downloaded. It shares some use cases with process_links, but can also do other useful things, such as:

  • Modify headers (e.g. cookies).
  • Change the callback depending on the URL.

An example:

def process_req(self, req):
    # 1: add a header (e.g. a cookie) to every request
    req = req.replace(headers={'Cookie': 'foobar'})
    # 2: pick a callback depending on the url
    if 'foo' in req.url:
        return req.replace(callback=self.parse_foo)
    elif 'bar' in req.url:
        return req.replace(callback=self.parse_bar)
    return req
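The same logic can be exercised without Scrapy installed. The Request class below is a stand-in that mimics only the parts of scrapy.Request used here (url, headers, callback, and a replace() method that returns a modified copy); parse_foo and parse_bar are placeholder callbacks:

```python
class Request:
    """Stand-in for scrapy.Request: url, headers, callback, and replace()."""

    def __init__(self, url, headers=None, callback=None):
        self.url = url
        self.headers = headers or {}
        self.callback = callback

    def replace(self, **kwargs):
        # Like scrapy.Request.replace(): return a copy with fields overridden.
        merged = {'url': self.url, 'headers': self.headers,
                  'callback': self.callback}
        merged.update(kwargs)
        return Request(**merged)


def parse_foo(response): ...
def parse_bar(response): ...


def process_req(req):
    # 1: add a header (e.g. a cookie) to every request
    req = req.replace(headers={'Cookie': 'foobar'})
    # 2: pick a callback depending on the url
    if 'foo' in req.url:
        return req.replace(callback=parse_foo)
    elif 'bar' in req.url:
        return req.replace(callback=parse_bar)
    return req


req = process_req(Request('http://example.com/foo'))
print(req.callback.__name__, req.headers)  # parse_foo {'Cookie': 'foobar'}
```

Note that in recent Scrapy versions the Rule's process_request hook also receives the response that originated the request as a second argument; check the documentation of the version you are running.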

You will probably not need them often, but on occasion these two hooks can be really convenient shortcuts.


Source: https://habr.com/ru/post/1648030/
