Access session cookie in scrapy spiders

I am trying to access a session cookie inside a spider. First I log in to a social network from the spider:

    def parse(self, response):
        return [FormRequest.from_response(response,
                                          formname='login_form',
                                          formdata={'email': '...', 'pass': '...'},
                                          callback=self.after_login)]

In after_login I would like to access the session cookies so that I can pass them to another module (Selenium here) and continue processing the page with the authenticated session.

I would like something like this:

    def after_login(self, response):
        # process response
        .....
        # access the cookies of that session to access another URL in the
        # same domain with the authenticated session.
        # Something like:
        session_cookies = XXX.get_session_cookies()
        data = another_function(url, session_cookies)

Unfortunately, response.cookies does not return session cookies.

How can I get session cookies? I looked at the cookie middleware: scrapy.contrib.downloadermiddleware.cookies and scrapy.http.cookies, but there seems to be no easy way to access session cookies.

More details, following up on my original question:

Unfortunately, I tried your idea, but I don't see the cookies, although I know for sure that they exist because scrapy.contrib.downloadermiddleware.cookies prints them. These are the cookies I want to capture.

So here is what I am doing:

The after_login(self, response) method receives the response after successful authentication, and then I request a URL using the session data:

    def after_login(self, response):
        # testing to see if I can get the session cookies
        cookieJar = response.meta.setdefault('cookie_jar', CookieJar())
        cookieJar.extract_cookies(response, response.request)
        cookies_test = cookieJar._cookies
        print "cookies - test:", cookies_test

        # URL access with authenticated session
        url = "http://site.org/?id=XXXX"
        request = Request(url=url, callback=self.get_pict)
        return [request]

As you can see from the output below, there really are cookies, but I can't grab them using cookieJar:

    cookies - test: {}
    2012-01-02 22:44:39-0800 [myspider] DEBUG: Sending cookies to: <GET http://www.facebook.com/profile.php?id=529907453>
    Cookie: xxx=3..........; yyy=34.............; zzz=.................; uuu=44..........

So, I would like to get a dictionary containing the keys xxx, yyy, etc. with corresponding values.

Thanks :)

+17
cookies session-cookies session scrapy
Jan 03 '12 at 5:35
2 answers

A classic example is the presence of a login server that provides a new session identifier after a successful login. This new session identifier should be used with another request.

Here is the code obtained from the source, which seems to work for me.

 print 'cookie from login', response.headers.getlist('Set-Cookie')[0].split(";")[0].split("=")[1] 

the code:

    def check_logged(self, response):
        tmpCookie = response.headers.getlist('Set-Cookie')[0].split(";")[0].split("=")[1]
        print 'cookie from login', response.headers.getlist('Set-Cookie')[0].split(";")[0].split("=")[1]
        cookieHolder = dict(SESSION_ID=tmpCookie)

        #print response.body
        if "my name" in response.body:
            yield Request(url="<<new url for another server>>",
                          cookies=cookieHolder,
                          callback=self."<<another function here>>")
        else:
            print "login failed"
            return
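Note that the one-liner above reads only the first Set-Cookie header and splits on every "=", so a value that itself contains "=" would be cut short. Below is a minimal sketch of a more tolerant parser, written in plain Python 3 with no Scrapy dependency. The function name set_cookie_headers_to_dict and the sample header values are hypothetical; in a spider, the input would be the list returned by response.headers.getlist('Set-Cookie').

```python
def set_cookie_headers_to_dict(set_cookie_values):
    """Map cookie names to values for a list of raw Set-Cookie header values."""
    cookies = {}
    for raw in set_cookie_values:
        # Header values may arrive as bytes; decode them before parsing.
        if isinstance(raw, bytes):
            raw = raw.decode("latin-1")
        # The name=value pair comes first; attributes (Path, Expires, ...) follow ';'.
        name_value = raw.split(";", 1)[0]
        # Split on the first '=' only, so values containing '=' stay intact.
        name, sep, value = name_value.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


# Hypothetical header values, for illustration only.
sample_headers = [
    b"SESSION_ID=abc=123; Path=/; HttpOnly",
    b"lang=en; Expires=Wed, 21 Oct 2015 07:28:00 GMT",
]
parsed = set_cookie_headers_to_dict(sample_headers)
```

The resulting dictionary has the same shape as cookieHolder, so it could be passed as the cookies argument of the follow-up Request.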
+12
Oct 12 '15 at 1:03

This may be overkill, but I don't know how you are going to use these cookies, so it may be useful (an excerpt from real code; adapt it to your case):

    from scrapy.http import Request
    from scrapy.http.cookies import CookieJar
    from scrapy.spider import BaseSpider

    class MySpider(BaseSpider):

        def parse(self, response):
            cookieJar = response.meta.setdefault('cookie_jar', CookieJar())
            cookieJar.extract_cookies(response, response.request)
            # nextPageLink comes from the rest of the original code (not shown)
            request = Request(nextPageLink, callback=self.parse2,
                              meta={'dont_merge_cookies': True, 'cookie_jar': cookieJar})
            cookieJar.add_cookie_header(request)  # apply Set-Cookie ourselves
            return request

CookieJar has several useful methods.
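To hand the jar's contents to another tool, one option is to iterate over it and copy out the fields you need. The sketch below uses the standard library's http.cookiejar.CookieJar (Python 3) so that it runs on its own; it assumes that Scrapy's CookieJar wrapper can be iterated in the same way, which you should verify for your Scrapy version. The helpers make_cookie and jar_to_cookie_dicts and the example.org domain are hypothetical.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain, path="/"):
    """Build a stdlib Cookie object; used here only to populate a demo jar."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def jar_to_cookie_dicts(jar):
    """Turn every cookie in the jar into a plain dict of its main fields."""
    return [
        {"name": c.name, "value": c.value, "domain": c.domain, "path": c.path}
        for c in jar
    ]


# Demo jar with made-up cookies.
jar = CookieJar()
jar.set_cookie(make_cookie("xxx", "3", ".example.org"))
jar.set_cookie(make_cookie("yyy", "34", ".example.org"))

session_cookies = jar_to_cookie_dicts(jar)
# Collapse to a simple name -> value mapping when that is all you need.
as_dict = {c["name"]: c["value"] for c in session_cookies}
```

Dictionaries with "name" and "value" keys are also the shape that Selenium's driver.add_cookie() accepts, so each entry of session_cookies could probably be passed to it once the browser has loaded a page on the same domain.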

If you still do not see cookies - maybe they are not there?




UPDATE

Looking at the CookiesMiddleware code:

    class CookiesMiddleware(object):

        def _debug_cookie(self, request, spider):
            if self.debug:
                cl = request.headers.getlist('Cookie')
                if cl:
                    msg = "Sending cookies to: %s" % request + os.linesep
                    msg += os.linesep.join("Cookie: %s" % c for c in cl)
                    log.msg(msg, spider=spider, level=log.DEBUG)

So try request.headers.getlist('Cookie'), which is where the middleware reads the cookies it logs.
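Each value returned by getlist('Cookie') is a raw header string such as "xxx=3...; yyy=34...". A minimal sketch of turning those values into the dictionary the question asks for, in plain Python 3 with no Scrapy dependency, might look like this. The function name cookie_headers_to_dict and the sample values are hypothetical.

```python
def cookie_headers_to_dict(header_values):
    """Map cookie names to values for a list of raw Cookie header values."""
    cookies = {}
    for raw in header_values:
        # Header values may arrive as bytes; decode them before parsing.
        if isinstance(raw, bytes):
            raw = raw.decode("latin-1")
        # A Cookie header holds several name=value pairs separated by ';'.
        for pair in raw.split(";"):
            # Split on the first '=' only, so values containing '=' stay intact.
            name, sep, value = pair.strip().partition("=")
            if sep and name:
                cookies[name] = value
    return cookies


# Made-up header value, shaped like the log line in the question.
sample = [b"xxx=3; yyy=34; zzz=a=b"]
cookie_dict = cookie_headers_to_dict(sample)
```

In a callback, the input would be the list obtained from the request's headers, for example request.headers.getlist('Cookie') as suggested above.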

+6
Jan 03 2018-12-12T00: