Why can I read a HEAD HTTP request in Python 3 urllib.request?

Question

I want to make a HEAD request without any content data to save bandwidth. I am using urllib.request. However, after testing, it looks like HEAD requests are also receiving data. What's happening?

    Python 3.4.2 (v3.4.2:ab2c023a9432, Oct 6 2014, 22:16:31) [MSC v.1600 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import urllib.request
    >>> req = urllib.request.Request("http://www.google.com", method="HEAD")
    >>> resp = urllib.request.urlopen(req)
    >>> a = resp.read()
    >>> len(a)
    24088

python httprequest python-3.4 urllib

Eric Mar 29 '15 at 9:44

1 answer

Martijn Pieters · Accepted Answer · 2015-03-29T09:51:37+0000

The http://www.google.com URL redirects:

    $ curl -D - -X HEAD http://www.google.com
    HTTP/1.1 302 Found
    Cache-Control: private
    Content-Type: text/html; charset=UTF-8
    Location: http://www.google.co.uk/?gfe_rd=cr&ei=A8sXVZLOGvHH8ge1jYKwDQ
    Content-Length: 261
    Date: Sun, 29 Mar 2015 09:50:59 GMT
    Server: GFE/2.0
    Alternate-Protocol: 80:quic,p=0.5

and urllib.request followed the redirect, issuing a GET request to that new location:

    >>> import urllib.request
    >>> req = urllib.request.Request("http://www.google.com", method="HEAD")
    >>> resp = urllib.request.urlopen(req)
    >>> resp.url
    'http://www.google.co.uk/?gfe_rd=cr&ei=ucoXVdfaJOTH8gf-voKwBw'

You'd have to build your own handler stack to prevent this; HTTPRedirectHandler isn't smart enough to leave the redirect alone when the request uses the HEAD method. Adapting the example from Alan Duan's answer to "How do I prevent Python's urllib(2) from following a redirect" to Python 3 gives you:

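    import urllib.request

    class NoRedirection(urllib.request.HTTPErrorProcessor):
        # Return every response untouched, so 3xx responses never reach
        # HTTPRedirectHandler. Note that 4xx/5xx responses are returned
        # as-is too, rather than raising HTTPError.
        def http_response(self, request, response):
            return response

        https_response = http_response

    opener = urllib.request.build_opener(NoRedirection)

    req = urllib.request.Request("http://www.google.com", method="HEAD")
    resp = opener.open(req)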
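If you still want the redirect to be followed, another option is to keep the default handler stack and only stop the method from changing. The following is a minimal sketch under stated assumptions, not part of the linked example: `HeadPreservingRedirectHandler` is a hypothetical name, and the sketch assumes that the `Request` returned by `redirect_request()` can have its `method` attribute overwritten.

```python
import urllib.request


class HeadPreservingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: follow redirects, but keep HEAD requests as HEAD."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the stock handler build the follow-up request (or raise HTTPError).
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        # Restore HEAD if the original request used it.
        if new_req is not None and req.get_method() == "HEAD":
            new_req.method = "HEAD"
        return new_req


# build_opener() uses this subclass in place of the default HTTPRedirectHandler.
opener = urllib.request.build_opener(HeadPreservingRedirectHandler)

# Usage (needs network access, so it is left commented out):
# req = urllib.request.Request("http://www.google.com", method="HEAD")
# resp = opener.open(req)
# resp.url     # final URL after any redirects
# resp.read()  # expected to be empty for a HEAD response
```

Unlike NoRedirection, this variant leaves HTTPErrorProcessor in place, so error statuses should still raise HTTPError as usual.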

You'd be better off using the requests library; it explicitly sets allow_redirects=False for requests.head() and requests.Session().head() calls, so you can see the original result:

    >>> import requests
    >>> requests.head('http://www.google.com')
    <Response [302]>
    >>> _.headers['Location']
    'http://www.google.co.uk/?gfe_rd=cr&ei=FcwXVbepMvHH8ge1jYKwDQ'

and even if redirection is enabled, the response.history list gives you access to the intermediate responses, and requests also uses the correct method for the redirected call:

    >>> response = requests.head('http://www.google.com', allow_redirects=True)
    >>> response.url
    'http://www.google.co.uk/?gfe_rd=cr&ei=8e0XVYfGMubH8gfJnoKoDQ'
    >>> response.history
    [<Response [302]>]
    >>> response.history[0].url
    'http://www.google.com/'
    >>> response.request.method
    'HEAD'
