I use Scrapy + Selenium, since the website I crawl needs JavaScript for authentication. I log in with Selenium and pass the cookies along on the next request.
from selenium import webdriver
from scrapy.http import Request

def login(self, response):
    # Drive a real browser through the JavaScript login form.
    driver = webdriver.Firefox()
    driver.get("http://www.site.com/login")
    driver.find_element_by_xpath("//input[@id='myname']").send_keys(settings['USERNAME'])
    driver.find_element_by_xpath("//input[@id='mypwd']").send_keys(settings['PASSWORD'])
    driver.find_element_by_xpath("//input[@name='Logon']").click()
    self.driver = driver
    # Hand the authenticated session's cookies back to Scrapy.
    return Request(url=driver.current_url,
                   cookies=driver.get_cookies(),
                   callback=self.after_login,
                   dont_filter=True)
So far so good: since cookies are sticky, all of the following requests work fine. My crawl is quite long, so at some point the cookie expires and I need to log in again. At that point I submit a new request with a callback to login. Here it fails, because the new cookies are combined with the old ones. Is there a way to reset the cookies?
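To make the failure mode concrete, here is a minimal sketch (plain Python, no Scrapy) of why the stale cookie survives a re-login: Scrapy's CookiesMiddleware keeps one persistent jar per spider and merges each request's cookies into it, so old entries are never evicted. The jar class and cookie names below are illustrative, not the real middleware code.

```python
class FakeCookieJar:
    """Toy stand-in for the middleware's persistent per-spider jar."""
    def __init__(self):
        self.cookies = {}

    def merge(self, request_cookies):
        # Merging only adds or overwrites keys; it never clears old ones.
        self.cookies.update(request_cookies)

jar = FakeCookieJar()
jar.merge({"session": "OLD-EXPIRED"})          # first login
jar.merge({"session2": "NEW", "csrf": "abc"})  # re-login sets different cookies
# The expired cookie is still in the jar and is sent with every request:
print(sorted(jar.cookies))
```

If the re-login happens to set a cookie with the same name, merging overwrites it; the problem appears when the server issues differently named cookies, as in this crawl.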
ANSWER
@Drewness suggested in his answer using the dont_merge_cookies key in the meta dictionary. This did not help, for the following reason: according to the source code, the following request

    Request(url=driver.current_url,
            cookies=self.driver.get_cookies(),
            callback=self.after_login,
            meta={'dont_merge_cookies': True},
            dont_filter=True)

does nothing at all with the cookies you pass to it.
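The reason is an early return in the middleware. The sketch below is a simplified paraphrase of Scrapy's CookiesMiddleware.process_request, not the real class: with dont_merge_cookies set, the method returns before request.cookies is ever read, so those cookies are silently dropped.

```python
from collections import defaultdict

class FakeRequest:
    """Minimal stand-in for scrapy.http.Request."""
    def __init__(self, cookies=None, meta=None):
        self.cookies = cookies or {}
        self.meta = meta or {}

class CookiesMiddlewareSketch:
    """Paraphrase of the middleware logic; real Scrapy uses CookieJar objects."""
    def __init__(self):
        self.jars = defaultdict(dict)

    def process_request(self, request):
        # The early return: request.cookies is never touched, so passing
        # cookies= together with dont_merge_cookies has no effect.
        if request.meta.get('dont_merge_cookies', False):
            return
        jar = self.jars[request.meta.get('cookiejar')]
        jar.update(request.cookies)

mw = CookiesMiddlewareSketch()
mw.process_request(FakeRequest(cookies={'sid': 'new'},
                               meta={'dont_merge_cookies': True}))
dropped = dict(mw.jars[None])   # empty: the cookie was ignored
mw.process_request(FakeRequest(cookies={'sid': 'new'}))
merged = dict(mw.jars[None])    # now the cookie is merged into the jar
```

So dont_merge_cookies stops cookie handling entirely for that request; it is not a way to replace old cookies with new ones.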
In my solution, I dropped the dont_merge_cookies key and simply cleared the response headers immediately before creating the request:
    response.headers = {}
    return Request(url=driver.current_url,
                   cookies=self.driver.get_cookies(),
                   callback=self.after_login,
                   dont_filter=True)
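An alternative worth considering (a sketch of a different technique, not the fix used above): Scrapy's CookiesMiddleware keeps one jar per cookiejar meta key, so switching to a fresh key at each re-login gives the new session an empty jar instead of trying to clear the old one. Simulated below with plain dicts instead of real CookieJar objects; the jar keys and cookie names are made up for illustration.

```python
from collections import defaultdict

jars = defaultdict(dict)  # one cookie store per jar key, as the middleware does

def send(cookies, jar_key):
    """Merge a request's cookies into the jar selected by jar_key."""
    jars[jar_key].update(cookies)
    return dict(jars[jar_key])

first = send({'session': 'OLD'}, jar_key=0)    # initial login, jar 0
second = send({'session2': 'NEW'}, jar_key=1)  # re-login into a fresh jar 1
# jar 1 never saw the expired 'session' cookie
```

In a real spider this would look like Request(..., meta={'cookiejar': self.login_count}), incrementing a counter such as self.login_count (a hypothetical attribute, not in the code above) each time login runs.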