How to prevent Python's urllib(2) from following a redirect

I'm currently trying to log in to a site using Python. However, the site appears to send a cookie and a redirect on the same page. Python follows that redirect, which prevents me from reading the cookie set by the login page. How do I stop Python's urllib (or urllib2) from following the redirect?

+42
python urllib2
Feb 16 '09 at 20:29
4 answers

You could do a couple of things:

  • Create your own HTTPRedirectHandler that intercepts every redirect
  • Create an instance of HTTPCookieProcessor and install an opener that uses it, so that you have access to the cookie jar.

Here is a small example that shows both:

    import urllib2

    #redirect_handler = urllib2.HTTPRedirectHandler()

    class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
        def http_error_302(self, req, fp, code, msg, headers):
            print "Cookie Manip Right Here"
            return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

        http_error_301 = http_error_303 = http_error_307 = http_error_302

    cookieprocessor = urllib2.HTTPCookieProcessor()

    opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
    urllib2.install_opener(opener)

    response = urllib2.urlopen("WHEREEVER")
    print response.read()
    print cookieprocessor.cookiejar
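For readers on Python 3, where urllib2 was merged into urllib.request, the same two ideas might look roughly like the sketch below. It is not part of the original answer: the names RecordingRedirectHandler and build_recording_opener are made up for illustration, and it hooks redirect_request() rather than http_error_302() simply because that is the smaller override.

```python
import http.cookiejar
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: notes every redirect hop, then follows it."""

    def __init__(self):
        # One (status code, target URL, Set-Cookie values) tuple per hop.
        self.seen = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        cookies = headers.get_all("Set-Cookie") or []
        self.seen.append((code, newurl, cookies))
        # Delegate to the stock behaviour so the redirect is still followed.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def build_recording_opener():
    """Return (opener, redirect handler, cookie jar) wired together."""
    jar = http.cookiejar.CookieJar()
    handler = RecordingRedirectHandler()
    opener = urllib.request.build_opener(
        handler, urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, handler, jar
```

Calling opener.open(url) on the result should follow redirects as usual, while handler.seen lists each hop and the jar collects the cookies set along the way, including those sent with the redirect response itself.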
+33
Feb 16 '09 at 21:13

If you only need to stop the redirect, there is a simple way to do it. For example, I only want the cookies, and for better performance I don't want to be redirected to any other page. I also want the status code to stay 3xx, for example 302.

    import cookielib
    import urllib2

    class MyHTTPErrorProcessor(urllib2.HTTPErrorProcessor):
        def http_response(self, request, response):
            code, msg, hdrs = response.code, response.msg, response.info()

            # Only add this check to stop 302 redirection.
            if code == 302:
                return response

            if not (200 <= code < 300):
                response = self.parent.error(
                    'http', request, response, code, msg, hdrs)
            return response

        https_response = http_response

    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), MyHTTPErrorProcessor)

That way the response never even reaches urllib2.HTTPRedirectHandler.http_error_302().
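The same idea can be extended to every redirect status rather than 302 alone. The following is a minimal Python 3 sketch under that assumption (urllib.request instead of urllib2); the class name KeepRedirects and its REDIRECT_CODES set are illustrative and do not come from the answer.

```python
import urllib.request


class KeepRedirects(urllib.request.HTTPErrorProcessor):
    """Hypothetical processor: hand 3xx responses back to the caller."""

    # Status codes that the stock redirect handler would otherwise act on.
    REDIRECT_CODES = frozenset({301, 302, 303, 307, 308})

    def http_response(self, request, response):
        if response.code in self.REDIRECT_CODES:
            # Returning here means no redirect handler is ever consulted.
            return response
        # Everything else keeps the default behaviour (errors for 4xx/5xx).
        return super().http_response(request, response)

    https_response = http_response
```

Unlike a processor that returns every response untouched, this one should still raise HTTPError for 4xx and 5xx responses.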

More commonly, we simply want to stop redirection altogether:

    class NoRedirection(urllib2.HTTPErrorProcessor):
        def http_response(self, request, response):
            # Return every response untouched, so no redirect or error handler runs.
            return response

        https_response = http_response

It is typically used like this:

    import cookielib
    import urllib
    import urllib2

    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(NoRedirection, urllib2.HTTPCookieProcessor(cj))
    data = {}
    response = opener.open('http://www.example.com', urllib.urlencode(data))

    if response.code == 302:
        redirection_target = response.headers['Location']
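A rough Python 3 port of this usage, assuming urllib.request, urllib.parse and http.cookiejar in place of urllib2, urllib and cookielib, could look like this. The helper names build_no_redirect_opener and post_form are invented for the sketch.

```python
import http.cookiejar
import urllib.parse
import urllib.request


class NoRedirection(urllib.request.HTTPErrorProcessor):
    """Pass every response through untouched (no redirects, no HTTPError)."""

    def http_response(self, request, response):
        return response

    https_response = http_response


def build_no_redirect_opener():
    """Return an opener that never redirects, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        NoRedirection, urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def post_form(opener, url, fields):
    """POST form fields; return (status, Location header or None)."""
    # Python 3 requires the request body to be bytes.
    body = urllib.parse.urlencode(fields).encode("ascii")
    response = opener.open(url, body)
    return response.status, response.headers.get("Location")
```

After something like post_form(opener, login_url, {...}), where login_url stands in for the real login page, the jar returned by build_no_redirect_opener() should contain whatever cookies that page set, and a 302 status comes back together with its redirect target.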
+27
Jul 31 '12 at 16:33

urllib2.urlopen calls build_opener(), which uses this list of handler classes:

    handlers = [ProxyHandler, UnknownHandler, HTTPHandler,
                HTTPDefaultErrorHandler, HTTPRedirectHandler,
                FTPHandler, FileHandler, HTTPErrorProcessor]

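You could try building an opener yourself from a list of handlers that omits HTTPRedirectHandler, and then call the open() method on the result to open your URL. Note that build_opener() adds its default handlers on top of whatever you pass in, so leaving one out means creating a urllib2.OpenerDirector and calling add_handler() for each handler you do want. If you really dislike redirects, you can even call urllib2.install_opener(opener) to install your non-redirecting opener globally.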
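One way to assemble an opener with no redirect handler at all is to populate an OpenerDirector by hand. The sketch below assumes Python 3 (urllib.request rather than urllib2), leaves out the FTP and file handlers for brevity, and uses the made-up helper names build_opener_without_redirects and open_once.

```python
import urllib.error
import urllib.request


def build_opener_without_redirects():
    """Assemble an opener by hand, leaving out HTTPRedirectHandler."""
    opener = urllib.request.OpenerDirector()
    handler_classes = [
        urllib.request.ProxyHandler,
        urllib.request.UnknownHandler,
        urllib.request.HTTPHandler,
        urllib.request.HTTPSHandler,
        urllib.request.HTTPDefaultErrorHandler,
        urllib.request.HTTPErrorProcessor,
    ]
    for handler_class in handler_classes:
        opener.add_handler(handler_class())
    return opener


def open_once(opener, url):
    """Open url without following redirects; return (status, Location)."""
    try:
        response = opener.open(url)
    except urllib.error.HTTPError as err:
        # With no redirect handler installed, a 3xx ends up here as an error.
        return err.code, err.headers.get("Location")
    return response.status, None
```

With this setup a 3xx response is raised as HTTPError by the default error handler, so the status code and headers (including any Set-Cookie) are read from the exception object.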

It sounds like your real problem is that urllib2 isn't handling cookies the way you'd like. See also "How to use Python to access a web page and receive cookies for later use?"

+11
Feb 16 '09 at 20:38

This question has been asked before here.

EDIT: If you have to deal with quirky web applications, you should probably try mechanize. It is a great library that simulates a web browser. You can control redirects, cookies, page refreshes... If the website doesn't rely heavily on JavaScript, you will get along well with mechanize.

+3
Feb 16 '09 at 20:46


