302s and lose cookies with urllib2

Question

I am using urllib2 with CookieJar / HTTPCookieProcessor to simulate a page login so that I can automate an upload.

I have seen some questions and answers on this, but nothing solves my problem. I lose my cookie when I simulate a login that ends with a 302 redirect. The 302 response is where the server sets the cookie, but urllib2's HTTPCookieProcessor does not seem to save the cookie during the redirect. I tried creating an HTTPRedirectHandler subclass to ignore the redirect, but that did not seem to help. I also tried referencing the CookieJar globally so I could process the cookies from inside the HTTPRedirectHandler, but (1) it did not work, because I was passing in the headers from the redirect and the CookieJar method I used, extract_cookies, needed a full request, and (2) it is an ugly way to deal with it.

I could use some guidance on this, since I'm pretty green with Python. I think I'm barking up the right tree here, but maybe focusing on the wrong branch.

    cj = cookielib.CookieJar()
    cookieprocessor = urllib2.HTTPCookieProcessor(cj)

    class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
        def http_error_302(self, req, fp, code, msg, headers):
            global cj
            cookie = headers.get("set-cookie")
            if cookie:
                # Doesn't work, but you get the idea
                cj.extract_cookies(headers, req)
            return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

        http_error_301 = http_error_303 = http_error_307 = http_error_302

    cookieprocessor = urllib2.HTTPCookieProcessor(cj)
    # Oh yeah. I'm using a proxy too, to follow traffic.
    proxy = urllib2.ProxyHandler({'http': '127.0.0.1:8888'})
    opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor, proxy)
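For reference, extract_cookies takes the response object first and the request second, and inside http_error_302 the response object is `fp` (it has an `.info()` method), not the bare `headers`. Below is a minimal sketch of the handler with that one change. It is written with the Python 3 module names (`urllib.request`, `http.cookiejar`), which correspond to urllib2 and cookielib; the names `CookieSavingRedirectHandler` and `build_cookie_opener` are made up for illustration. This may well be redundant, since the stock HTTPCookieProcessor normally sees the 302 response before the redirect is followed, so treat it as a way to rule this step out rather than as a confirmed fix.

```python
# Python 3 module names; on Python 2 these are urllib2 and cookielib.
import http.cookiejar
import urllib.request


class CookieSavingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: store cookies from a redirect, then follow it."""

    def __init__(self, cookiejar):
        self.cookiejar = cookiejar

    def http_error_302(self, req, fp, code, msg, headers):
        # extract_cookies(response, request) needs the response object.
        # Here that is `fp`, which provides .info(); `headers` alone does not.
        self.cookiejar.extract_cookies(fp, req)
        return urllib.request.HTTPRedirectHandler.http_error_302(
            self, req, fp, code, msg, headers)

    http_error_301 = http_error_303 = http_error_307 = http_error_302


def build_cookie_opener():
    """Return (opener, jar); both handlers share the same cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        CookieSavingRedirectHandler(jar),
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

Passing the jar into the handler's constructor avoids the `global cj` from the original attempt. A proxy handler can be appended to the `build_opener` call in the same way as above.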

Addition: I also tried using mechanize, but to no avail. This is probably a new question, but I will present it here, since the end goal is the same:

This simple code using mechanize fails when used with a URL that emits a 302 (http://fxfeeds.mozilla.com/firefox/headlines.xml). Note that the same behavior occurs when set_handle_robots(False) is not used; I just wanted to make sure that was not the cause:

    import urllib2, mechanize

    browser = mechanize.Browser()
    browser.set_handle_robots(False)
    opener = mechanize.build_opener(*(browser.handlers))
    r = opener.open("http://fxfeeds.mozilla.com/firefox/headlines.xml")

Output:

    Traceback (most recent call last):
      File "redirecttester.py", line 6, in <module>
        r = opener.open("http://fxfeeds.mozilla.com/firefox/headlines.xml")
      File "build/bdist.macosx-10.6-universal/egg/mechanize/_opener.py", line 204, in open
      File "build/bdist.macosx-10.6-universal/egg/mechanize/_urllib2_fork.py", line 457, in http_response
      File "build/bdist.macosx-10.6-universal/egg/mechanize/_opener.py", line 221, in error
      File "build/bdist.macosx-10.6-universal/egg/mechanize/_urllib2_fork.py", line 332, in _call_chain
      File "build/bdist.macosx-10.6-universal/egg/mechanize/_urllib2_fork.py", line 571, in http_error_302
      File "build/bdist.macosx-10.6-universal/egg/mechanize/_opener.py", line 188, in open
      File "build/bdist.macosx-10.6-universal/egg/mechanize/_mechanize.py", line 71, in http_request
    AttributeError: OpenerDirector instance has no attribute '_add_referer_header'

Any ideas?

python urllib2 mechanize cookiejar

umop Apr 4 '11 at 20:28

4 answers

jathanism · Answer 1 · 2011-04-04T21:05:20+0000

I recently had the same problem, but in the interest of time I gave up on it and decided to go with mechanize. It can be used as a drop-in replacement for urllib2, and it behaves the way you would expect a browser to behave with respect to Referer headers, redirects and cookies.

    import mechanize

    cj = mechanize.CookieJar()
    browser = mechanize.Browser()
    browser.set_cookiejar(cj)
    browser.set_proxies({'http': '127.0.0.1:8888'})

    # Use browser handlers to create a new opener
    opener = mechanize.build_opener(*browser.handlers)

The Browser object can be used as the opener itself (using the .open() method). It maintains internal state, but also returns a response object for each call. This way you get more flexibility.

In addition, if you do not need to inspect the cookiejar manually or pass it to something else, you can omit the explicit creation and assignment of that object.

I fully realize that this does not address what is really going on, or why urllib2 cannot provide this out of the box (or at least without a lot of configuration), but if you are short on time and just want it to work, use mechanize.

Valentin · Answer 2 · 2011-06-12T12:31:03+0000

It depends on how the redirection is performed. If it is done using an HTTP Refresh, then mechanize has an HTTPRefreshProcessor that you can use. Try creating an opener like this:

    cj = mechanize.CookieJar()
    opener = mechanize.build_opener(
        mechanize.HTTPCookieProcessor(cj),
        mechanize.HTTPRefererProcessor,
        mechanize.HTTPEquivProcessor,
        mechanize.HTTPRefreshProcessor)

RuiDC · Answer 3 · 2011-08-11T01:56:09+0000

I just got a variation of the following working for me, at least when trying to read the Atom feed from http://www.fudzilla.com/home?format=feed&type=atom

I can’t check if the following snippet will work as is, but it can give you a start:

    import cookielib
    import urllib2

    # This fragment is meant to live inside a function (note the return below).
    cookie_jar = cookielib.LWPCookieJar()
    cookie_handler = urllib2.HTTPCookieProcessor(cookie_jar)
    handlers = [cookie_handler]  # + others, we have proxy + progress handlers
    # see http://code.google.com/p/feedparser/source/browse/trunk/feedparser/feedparser.py#2848 for implementation of _FeedURLHandler
    opener = urllib2.build_opener(*(handlers + [_FeedURLHandler()]))
    opener.addheaders = []  # may not be needed but see the comments around the link referred to below
    try:
        # see http://code.google.com/p/feedparser/source/browse/trunk/feedparser/feedparser.py#2954 for implementation of request
        return opener.open(request)
    finally:
        opener.close()

Steiny · Answer 4 · 2016-02-03T22:27:27+0000

I had the same issue when the server responded to the POST login request with a 302 and the session token in the Set-Cookie header. Using Wireshark, it was clearly visible that urllib follows the redirect but does not include the session token in the Cookie header of the follow-up request.
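If Wireshark is not at hand, one way to look at the same thing from Python is to stop at the redirect instead of following it, then inspect both the raw Set-Cookie headers and what actually ended up in the cookie jar. This is only a sketch, using the Python 3 module names (urllib2 and cookielib on Python 2); `NoRedirectHandler` and `inspect_redirect` are names invented for this example.

```python
import http.cookiejar
import urllib.error
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "do not follow"; urllib then raises
        # HTTPError for the 3xx response instead.
        return None


def inspect_redirect(url, data=None):
    """Fetch url without following redirects and report cookie handling."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        NoRedirectHandler(), urllib.request.HTTPCookieProcessor(jar))
    try:
        response = opener.open(url, data)
    except urllib.error.HTTPError as err:
        response = err  # HTTPError carries the status code and headers
    return {
        "status": response.code,
        "location": response.headers.get("Location"),
        "set_cookie": response.headers.get_all("Set-Cookie") or [],
        "jar": [(c.name, c.value, c.domain, c.path) for c in jar],
    }
```

If `set_cookie` is non-empty but `jar` is empty, the cookie was sent by the server and then rejected by the cookie policy (for example because of a domain or path mismatch), which is a different problem from the redirect itself.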

I literally just ripped out urllib and dropped in requests as a direct replacement, and it worked the first time without changing anything else. Big props to those guys.
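For anyone wanting to try the same swap, a minimal sketch with requests might look like the following. The function name `login` and its parameters `login_url` and `credentials` are placeholders, not anything from the original code; the form field names depend entirely on the site being logged in to.

```python
import requests  # third-party package: pip install requests


def login(login_url, credentials):
    """POST the login form and return (session, final_response).

    `login_url` and `credentials` are placeholders for illustration.
    """
    session = requests.Session()
    # A Session keeps one cookie jar across requests, including the
    # intermediate responses of a redirect chain.
    response = session.post(login_url, data=credentials, timeout=10)
    response.raise_for_status()
    return session, response
```

After the call, `response.history` lists the intermediate redirect responses and `session.cookies` holds whatever cookies were set along the way, so later `session.get(...)` calls send them automatically.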
