Python: get only the headers with urllib2

I need a function that fetches only the response headers (without issuing a GET or POST) using urllib2. Here is my function:

    def getheadersonly(url, redirections = True):
        if not redirections:
            class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
                def http_error_302(self, req, fp, code, msg, headers):
                    return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)
                http_error_301 = http_error_303 = http_error_307 = http_error_302
            cookieprocessor = urllib2.HTTPCookieProcessor()
            opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
            urllib2.install_opener(opener)

        class HeadRequest(urllib2.Request):
            def get_method(self):
                return "HEAD"

        info = {}
        info['headers'] = dict(urllib2.urlopen(HeadRequest(url)).info())
        info['finalurl'] = urllib2.urlopen(HeadRequest(url)).geturl()
        return info

It is based on code from other answers. However, it follows redirects even when the flag is False. I tried the code with:

    print getheadersonly("http://ms.com", redirections = False)['finalurl']
    print getheadersonly("http://ms.com")['finalurl']

This gives morganstanley.com in both cases. What is wrong here?

+1
python urllib2
Mar 27 '12
2 answers

Firstly, your code contains several errors:

  • Each call to getheadersonly with redirections=False installs a new global opener, which is then used by every subsequent call to urllib2.urlopen, including calls made with redirections=True.

  • You make two HTTP requests to read two attributes of what should be a single response.

  • The implementation of urllib2.HTTPRedirectHandler.http_error_302 is not trivial, and I do not see how it could prevent redirects in the first place: your override simply delegates to it, so redirects are still followed.

The key point is that each handler is installed in the opener to handle a particular kind of response. urllib2.HTTPRedirectHandler exists to turn certain HTTP status codes into redirects. If you do not want redirects, do not add a redirect handler to the opener. If you do not want to open FTP links, do not add FTPHandler, and so on.
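The difference between the two kinds of opener can be seen without any network traffic by listing the handlers each one holds. The following is only a sketch: handler_names is a hypothetical helper, the handlers attribute it reads is an implementation detail of OpenerDirector rather than a documented API, and the urllib.request fallback is an assumption added so the snippet also runs on Python 3.

```python
try:
    import urllib2                        # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2      # assumption: Python 3 fallback with the same class names


def handler_names(opener):
    """Return the sorted class names of the handlers installed in an opener (hypothetical helper)."""
    # `handlers` is an internal list kept by OpenerDirector, not a documented API.
    return sorted(type(h).__name__ for h in opener.handlers)


# build_opener() installs a default set of handlers, including the redirect handler.
default_opener = urllib2.build_opener()

# A bare OpenerDirector starts empty; only what is added explicitly gets used.
bare_opener = urllib2.OpenerDirector()
bare_opener.add_handler(urllib2.HTTPHandler())

print('HTTPRedirectHandler' in handler_names(default_opener))
print(handler_names(bare_opener))
```

With only HTTPHandler installed, nothing in bare_opener reacts to a 3xx status, so the redirect response itself is returned to the caller.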

All you need to do is create a new opener, add urllib2.HTTPHandler() to it, make the request a "HEAD" request, pass the request instance to the opener, read the attributes, and close the response.

    import urllib2

    class HeadRequest(urllib2.Request):
        # Make urllib2 send HEAD instead of GET.
        def get_method(self):
            return 'HEAD'

    def getheadersonly(url, redirections=True):
        # Build a private opener; the global one is left untouched.
        opener = urllib2.OpenerDirector()
        opener.add_handler(urllib2.HTTPHandler())
        opener.add_handler(urllib2.HTTPDefaultErrorHandler())
        if redirections:
            # HTTPErrorProcessor makes HTTPRedirectHandler work
            opener.add_handler(urllib2.HTTPErrorProcessor())
            opener.add_handler(urllib2.HTTPRedirectHandler())
        try:
            res = opener.open(HeadRequest(url))
        except urllib2.HTTPError as res:
            # An HTTPError is also a response object, so it can be read the same way.
            pass
        res.close()
        return dict(code=res.code, headers=res.info(), finalurl=res.geturl())
+7
Mar 27 '12 at 15:04

You can send a HEAD request using httplib. A HEAD request is the same as a GET request, except that the server does not send a message body in the response.
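A minimal sketch of that approach follows. The names split_target and head and the timeout default are hypothetical, chosen only for illustration, and the http.client fallback is an assumption so the snippet also runs on Python 3. Because httplib works at the connection level, it never follows redirects on its own; a 301 or 302 is simply returned with its Location header.

```python
try:
    import httplib                        # Python 2, the version the answers target
    from urlparse import urlsplit
except ImportError:
    import http.client as httplib         # assumption: Python 3 fallback
    from urllib.parse import urlsplit


def split_target(url):
    """Return (connection class, host[:port], request path) for a URL (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme == 'https':
        conn_class = httplib.HTTPSConnection
    else:
        conn_class = httplib.HTTPConnection
    path = parts.path or '/'              # an empty path must be sent as "/"
    if parts.query:
        path += '?' + parts.query
    return conn_class, parts.netloc, path


def head(url, timeout=10):
    """Send a single HEAD request and return (status, headers); redirects are not followed."""
    conn_class, host, path = split_target(url)
    conn = conn_class(host, timeout=timeout)
    try:
        conn.request('HEAD', path)
        res = conn.getresponse()
        return res.status, dict(res.getheaders())
    finally:
        conn.close()
```

Calling head("http://ms.com") would then return the status and headers of the first response only, so a redirecting site shows up as a 3xx status rather than as the final page.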

+2
Mar 27 '12


