How to get the original URL after a redirect in Scrapy?

Scrapy version: 0.20

Problem:

start_urls = [URL1, URL2, URL3]

def parse(self, response):
    # Suppose URL2 is redirected to another URL.
    # I need the original start URL (the one requested before the redirect).
    pass

I tried response.request.url, but it is the same as response.url.

Please help me

1 answer

If RedirectMiddleware is enabled (it should be by default), you can try:

original_url = response.meta.get('redirect_urls', [response.url])[0]

See https://github.com/scrapy/scrapy/blob/master/scrapy/downloadermiddlewares/redirect.py#L35 for implementation details.
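
As a minimal sketch, the one-liner above can be wrapped in a small helper and called from parse. The helper name original_url and the example.com URLs are hypothetical. The stand-in objects below only mimic the url and meta attributes of a Scrapy response, so the snippet runs without Scrapy installed:

```python
from types import SimpleNamespace


def original_url(response):
    """Return the URL that was requested before any redirects were followed."""
    # RedirectMiddleware records each redirected-from URL in meta['redirect_urls'];
    # the first entry is the URL originally requested.
    redirect_urls = response.meta.get('redirect_urls')
    # With no redirect recorded, response.url is already the original URL.
    return redirect_urls[0] if redirect_urls else response.url


# Stand-ins for Scrapy responses (hypothetical URLs), used only to exercise the helper.
redirected = SimpleNamespace(
    url='http://example.com/new',
    meta={'redirect_urls': ['http://example.com/old']},
)
direct = SimpleNamespace(url='http://example.com/plain', meta={})

print(original_url(redirected))  # http://example.com/old
print(original_url(direct))      # http://example.com/plain
```

Inside a spider, the same call would be original_url(response) at the top of parse.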


Source: https://habr.com/ru/post/1532463/

