Sample code for Scrapy process_links and process_request

I'm new to Scrapy, and I was hoping someone could give me some good code examples of when process_links and process_request are most useful. I see that process_links is used to filter URLs, but I don't know how to write it.

Thanks.

1 answer

You mean scrapy.spiders.Rule, which is most often used in scrapy.CrawlSpider.

They do pretty much what their names say: they act as a kind of middleware between the moment a link is extracted and the moment it is processed/downloaded.

process_links sits between the moment links are extracted and the moment they are turned into requests. There are some great use cases for it; to name a couple of common ones:

  • Filter out links you don't want.
  • Fix links you do want (for example, broken ones).

An example:

def process_links(self, links):
    for link in links:
        # 1: skip links whose anchor text contains "foo"
        if 'foo' in link.text:
            continue
        # 2: fix the url to avoid an unnecessary redirect
        link.url = link.url + '/'
        yield link
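To see that filtering logic on its own, here is a minimal, self-contained sketch. The Link class below is a stand-in for scrapy.link.Link (the real objects expose url and text attributes in the same way), so the example runs without a Scrapy project:

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Stand-in for scrapy.link.Link: just url and text attributes."""
    url: str
    text: str


def process_links(links):
    for link in links:
        if 'foo' in link.text:
            continue  # skip links whose anchor text contains "foo"
        link.url = link.url + '/'  # fix url to avoid an unnecessary redirect
        yield link


links = [
    Link('http://example.com/a', 'a page'),
    Link('http://example.com/b', 'foo page'),
]
kept = list(process_links(links))
print([l.url for l in kept])  # ['http://example.com/a/']
```

In a real spider the function would be a method and would receive the links extracted by the Rule's LinkExtractor; it can return a list or, as here, yield links one by one.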

process_request sits between the moment a request is created and the moment it is downloaded. It shares some use cases with process_links, but can also do other useful things, such as:

  • Modify headers (e.g. cookies).
  • Change the callback depending on the URL.

An example:

def process_req(self, req):
    # 1: add a header (e.g. a cookie) to every request
    req = req.replace(headers={'Cookie': 'foobar'})
    # 2: pick a callback depending on the url
    if 'foo' in req.url:
        return req.replace(callback=self.parse_foo)
    elif 'bar' in req.url:
        return req.replace(callback=self.parse_bar)
    return req
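The same logic can be exercised without Scrapy installed. The Request class below is a stand-in that mimics only the parts of scrapy.Request used here (url, headers, callback, and a replace() method that returns a modified copy); parse_foo and parse_bar are placeholder callbacks:

```python
class Request:
    """Stand-in for scrapy.Request: url, headers, callback, and replace()."""

    def __init__(self, url, headers=None, callback=None):
        self.url = url
        self.headers = headers or {}
        self.callback = callback

    def replace(self, **kwargs):
        # Like scrapy.Request.replace(): return a copy with fields overridden.
        merged = {'url': self.url, 'headers': self.headers,
                  'callback': self.callback}
        merged.update(kwargs)
        return Request(**merged)


def parse_foo(response): ...
def parse_bar(response): ...


def process_req(req):
    # 1: add a header (e.g. a cookie) to every request
    req = req.replace(headers={'Cookie': 'foobar'})
    # 2: pick a callback depending on the url
    if 'foo' in req.url:
        return req.replace(callback=parse_foo)
    elif 'bar' in req.url:
        return req.replace(callback=parse_bar)
    return req


req = process_req(Request('http://example.com/foo'))
print(req.callback.__name__, req.headers)  # parse_foo {'Cookie': 'foobar'}
```

Note that in recent Scrapy versions the Rule's process_request hook also receives the response that originated the request as a second argument; check the documentation of the version you are running.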

You will probably not need them often, but on occasion these two hooks can be really convenient shortcuts.


Source: https://habr.com/ru/post/1648030/
