Here is a simple spider:
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]  # domains only, not full URLs
    start_urls = ["https://www.dmoz.org/"]

    def parse(self, response):
        # re-request the start URL with a different callback
        yield scrapy.Request(self.start_urls[0], callback=self.parse2)

    def parse2(self, response):
        print(response.url)
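In case the way I launch it matters, this is roughly how I run the spider as a standalone script (a minimal sketch using Scrapy's CrawlerProcess; the settings here are my assumption, not anything special):

    from scrapy.crawler import CrawlerProcess

    if __name__ == "__main__":
        # raise log verbosity so scheduler/dupefilter messages are visible
        process = CrawlerProcess(settings={"LOG_LEVEL": "DEBUG"})
        process.crawl(ExampleSpider)
        process.start()  # blocks until the crawl finishes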
When the spider runs, the parse2 method is never called, so response.url is never printed. I then found a solution in the thread below:
Why is my second request not called in my scrapy spider syntax method
It says that I just need to add dont_filter=True as an argument to the Request in order to make the parse2 callback run:
yield scrapy.Request(self.start_urls[0], callback=self.parse2, dont_filter=True)
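And with dont_filter=True it does work. Without it, if my reading of Scrapy's default RFPDupeFilter is right, a line like the following should appear in the crawl log when the request is dropped (reconstructed from Scrapy's dupefilter message, not copied from my actual log):

    DEBUG: Filtered duplicate request: <GET https://www.dmoz.org/> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)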
But in the examples in the Scrapy documentation and in many YouTube tutorials, dont_filter=True is never passed to scrapy.Request, and yet their second parse callbacks still run. Take a look at this one:
def parse_page1(self, response):
    return scrapy.Request("http://www.example.com/some_page.html",
                          callback=self.parse_page2)

def parse_page2(self, response):
    self.logger.info("Visited %s", response.url)
So why does it work for them without dont_filter=True, but not for me? What am I doing wrong?
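My only guess so far is that the difference lies in the URLs: the documentation example requests a page other than the one already crawled, while my spider re-requests the exact same start URL, which the duplicate filter has already seen. A sketch of that contrast (the first URL is hypothetical, just for illustration):

    def parse(self, response):
        # a URL not crawled yet: the default RFPDupeFilter lets it through
        yield scrapy.Request("https://www.dmoz.org/some_other_page.html",
                             callback=self.parse2)

        # the start URL again: already fingerprinted when the crawl began,
        # so it is silently dropped unless dont_filter=True is passed
        yield scrapy.Request(self.start_urls[0],
                             callback=self.parse2,
                             dont_filter=True)

Is this reasoning correct, or am I missing something else?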
P.S. I'm new to the Q&A site and can't comment on that thread yet, since that requires 50 reputation (!).