How does adding the dont_filter = True argument to scrapy.Request make my parsing method work?

Here is a simple spider:

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["www.dmoz.org"]       # domains only, without the scheme
    start_urls = ("https://www.dmoz.org/",)  # trailing comma: a tuple, not a string

    def parse(self, response):
        yield scrapy.Request(self.start_urls[0], callback=self.parse2)

    def parse2(self, response):
        print(response.url)

When I run the spider, the parse2 method is never called, so response.url is never printed. I then found a solution in this thread:

Why is my second request not called in the parse method of my scrapy spider

It says that I just need to add dont_filter=True as an argument to the Request in order to make the parse2 callback run:

yield scrapy.Request(self.start_urls[0], callback=self.parse2, dont_filter=True)

But the examples in the Scrapy documentation and in many YouTube tutorials never pass the dont_filter=True argument to scrapy.Request, and their second parsing callbacks still work.

Take a look at this example from the documentation:

def parse_page1(self, response):
    return scrapy.Request("http://www.example.com/some_page.html",
                          callback=self.parse_page2)

def parse_page2(self, response):
    # this would log http://www.example.com/some_page.html
    self.logger.info("Visited %s", response.url)

Why does it work for them without dont_filter=True, but not for me? What am I doing wrong?

P.S. I'm new to this Q&A site, so I can't leave comments yet (you need 50 reputation for that!).

Answer: in short, it's the duplicate filter. Scrapy keeps track of the requests it has already made and drops repeated ones, which is why parse2 is never called. With dont_filter=True, scrapy skips that check and lets the request through. Details below.

In more detail:

When Scrapy starts, it takes the URLs from start_urls and turns them into requests via the default start_requests() method, with parse as the default callback. Every URL that gets requested is remembered by Scrapy, and when a response arrives, parse is called with it. The default behaviour looks roughly like the sketch below.
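
A simplified sketch of what the default start_requests() does (this is not the exact Scrapy source, just the behaviour that matters here):

import scrapy

# Sketch of scrapy.Spider.start_requests(). Note that the start
# requests are themselves created with dont_filter=True, so the very
# first request is never dropped by the duplicate filter; the
# callback defaults to self.parse.
def start_requests(self):
    for url in self.start_urls:
        yield scrapy.Request(url, dont_filter=True)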

Scrapy has a built-in duplicate filter, and it is enabled by default. Before sending a request, Scrapy checks whether it has already requested that URL; if it has, scrapy silently drops the new request.
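
You can see the filter at work in the crawl log: the dropped request shows up as a "Filtered duplicate request" DEBUG message. A small sketch, assuming a standard project settings.py (DUPEFILTER_DEBUG is a stock Scrapy setting):

# settings.py
# By default the dupe filter logs only the first request it drops.
# With this setting it logs every dropped request, so the filtered
# second request to the start URL is easy to spot in the crawl log.
DUPEFILTER_DEBUG = True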

Now trace what happens in your spider. The first request goes to the URL from start_urls, and Scrapy records that URL as seen. The response is handed to parse. Inside parse you yield a new request for the very same URL (the only one in start_urls), this time with parse2 as the callback. The duplicate filter sees that scrapy has already requested that URL and drops the request; it is never sent. That is why parse2 is never called. The examples from the documentation work without dont_filter because they request a URL that has not been fetched yet: parse_page1 asks for some_page.html, which is not the page that produced the response.
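
Passing dont_filter=True disables exactly this check for that one request, which is why the fix you found works. Here is the full spider with the fix (your code from the question, with the corrected start_urls):

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["www.dmoz.org"]
    start_urls = ("https://www.dmoz.org/",)

    def parse(self, response):
        # dont_filter=True bypasses the duplicate filter for this one
        # request, so the already-seen start URL is fetched again.
        yield scrapy.Request(self.start_urls[0], callback=self.parse2,
                             dont_filter=True)

    def parse2(self, response):
        print(response.url)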

Alternatively, if you don't actually need to fetch the same URL twice, override start_requests() and yield a scrapy.Request with the callback you need instead of relying on start_urls.
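
For example, something like this (a sketch based on the spider from the question):

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["www.dmoz.org"]

    def start_requests(self):
        # Request the URL once and route it straight to the callback
        # you want; there is no repeated request, so the duplicate
        # filter has nothing to drop.
        yield scrapy.Request("https://www.dmoz.org/", callback=self.parse2)

    def parse2(self, response):
        print(response.url)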

Source: https://habr.com/ru/post/1651258/

