Removing cookies in scrapy Request

I use Scrapy together with Selenium because the website I am crawling requires JavaScript for authentication. I log in with Selenium and pass its cookies to the next request.

def login(self, response):
    driver = webdriver.Firefox()
    driver.get("http://www.site.com/login")
    driver.find_element_by_xpath("//input[@id='myname']").send_keys(settings['USERNAME'])
    driver.find_element_by_xpath("//input[@id='mypwd']").send_keys(settings['PASSWORD'])
    driver.find_element_by_xpath("//input[@name='Logon']").click()
    self.driver = driver
    return Request(url=driver.current_url, cookies=self.driver.get_cookies(), callback=self.after_login, dont_filter=True)

So far so good: since cookies are sticky, all subsequent requests work fine. My scrape runs for a long time, so at some point the cookies expire and I need to log in again. At that point I issue a new request with login as the callback. This is where it fails, because the new cookies are merged with the old ones. Is there a way to reset the cookies?

ANSWER

@Drewness suggested in his answer using the dont_merge_cookies key in the meta dictionary. This did not help, for the following reason. According to the source code, the following request:

Request(url=driver.current_url, cookies=self.driver.get_cookies(), callback=self.after_login, meta={'dont_merge_cookies' : True}, dont_filter=True)

will not do anything with the cookies you pass to it.
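To make the reasoning concrete, here is a minimal sketch of that behaviour. It is a toy model, not Scrapy's actual code: the class name ToyCookiesMiddleware, its single dict-based jar, and the process_request(meta, cookies) signature are all assumptions made for illustration. It only mirrors the idea that a request carrying dont_merge_cookies is skipped before any cookie handling takes place.

```python
class ToyCookiesMiddleware:
    """Simplified stand-in for Scrapy's cookies middleware (hypothetical, not the real class)."""

    def __init__(self):
        # cookie name -> value, remembered across requests
        self.jar = {}

    def process_request(self, meta, cookies):
        """Return the cookies that would be attached to the outgoing request."""
        if meta.get("dont_merge_cookies", False):
            # Early exit: the jar is not consulted, and the cookies
            # passed along with the request are not processed either.
            return {}
        # Otherwise the new cookies are merged into the jar ...
        self.jar.update(cookies)
        # ... and old and new cookies are sent together.
        return dict(self.jar)


mw = ToyCookiesMiddleware()
first = mw.process_request({}, {"session": "old", "tracker": "abc"})
relogin = mw.process_request({}, {"session": "new"})
skipped = mw.process_request({"dont_merge_cookies": True}, {"session": "new"})

print(relogin)  # the stale "tracker" cookie still travels with the new session
print(skipped)  # nothing is attached when dont_merge_cookies is set
```

In this model the second login keeps the stale cookie from the first session, which is the merging problem from the question, while the dont_merge_cookies request ends up with no cookies at all, including the fresh ones.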

In my solution, I decided to skip the dont_merge_cookies key and simply reset the response headers immediately before creating the request:

response.headers = {}
return Request(url=driver.current_url, cookies=self.driver.get_cookies(), callback=self.after_login, dont_filter=True)

ANSWER

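From the docs:

When some site returns cookies (in a response), those are stored in the cookies for that domain and will be sent again in future requests. That is the typical behaviour of any regular web browser. However, if for some reason you want to avoid merging with existing cookies, you can instruct Scrapy to do so by setting the dont_merge_cookies key to True in Request.meta.

Example: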

request_with_cookies = Request(url="http://www.example.com",
                               cookies={'currency': 'USD', 'country': 'UY'},
                               meta={'dont_merge_cookies': True})

With dont_merge_cookies set to True, the request is not merged with the stored cookies.

Source: https://habr.com/ru/post/1529329/

