Urllib.urlretrieve with custom header

I am trying to download a file with urlretrieve while adding a custom header.

Looking at the source code of urllib.request, I realized that urlopen can take a Request object as its parameter, not just a string, which lets me set whatever header I want. But if I try to do the same with urlretrieve, I get "TypeError: expected string or bytes-like object", as mentioned in this other post.

What I ended up doing was writing my own version of urlretrieve with the line that throws the error removed (that line doesn't matter in my use case).
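A minimal sketch of the kind of replacement described here, assuming you only need the download itself. This is not the asker's actual code, and the name urlretrieve_request is hypothetical. It hands the URL string or Request object straight to urlopen() and copies the body to disk:

```python
import shutil
import urllib.request


def urlretrieve_request(url_or_request, filename):
    """Hypothetical helper: save a URL string or a Request object to *filename*.

    Unlike urllib.request.urlretrieve(), it never parses the argument as a
    string, so a Request carrying custom headers is accepted.
    """
    # urlopen() itself accepts both URL strings and Request objects
    with urllib.request.urlopen(url_or_request) as response:
        with open(filename, "wb") as out:
            # Copy the body to disk in chunks rather than reading it all at once
            shutil.copyfileobj(response, out)
        # Return the same (filename, headers) pair that urlretrieve() returns
        return filename, response.info()
```

For example, urlretrieve_request(urllib.request.Request(url, headers={"User-Agent": "my-agent"}), "out.bin") sends the header with the download. Unlike the real urlretrieve(), this sketch has no reporthook and does not raise ContentTooShortError on a truncated download.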

It works fine, but I wonder whether there is a better or cleaner way that doesn't involve rewriting urlretrieve. If you can pass a custom header to urlopen, shouldn't it be possible to do the same with urlretrieve?
2 answers

I'm not sure what this person means, but that didn't work for me.

I found a way that needs only a few extra lines of code:

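import urllib.request

# Build an opener and set the headers it sends with every request
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0')]
# Install it globally so that urlopen(), and therefore urlretrieve(), use it
urllib.request.install_opener(opener)
urllib.request.urlretrieve(<URL>, <file_name_to_save_as>)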
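To check that a header set this way really reaches the server, here is a self-contained sketch that starts a throwaway local HTTP server (an assumption for testing; the OS picks the port) and records the User-Agent it receives. The ProxyHandler({}) argument is an addition to the answer's code, only there so that a proxy configured in the environment cannot intercept the local request:

```python
import http.server
import os
import tempfile
import threading
import urllib.request

received = {}


class RecordingHandler(http.server.BaseHTTPRequestHandler):
    """Test server handler that remembers the User-Agent of each GET."""

    def do_GET(self):
        received["User-Agent"] = self.headers.get("User-Agent")
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


# Port 0 lets the OS choose a free port; serve exactly one request in the background
server = http.server.HTTPServer(("127.0.0.1", 0), RecordingHandler)
thread = threading.Thread(target=server.handle_request, daemon=True)
thread.start()

# The answer's technique: set opener.addheaders, then install the opener globally
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
opener.addheaders = [("User-agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0")]
urllib.request.install_opener(opener)

target = os.path.join(tempfile.mkdtemp(), "file.txt")
urllib.request.urlretrieve(f"http://127.0.0.1:{server.server_port}/file.txt", target)

thread.join(timeout=5)
server.server_close()
```

Afterwards, received["User-Agent"] holds the Mozilla string and the temporary file contains b"hello". Keep in mind that install_opener() changes the opener for every later urlopen() call in the process.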

For details, see the Python documentation: https://docs.python.org/3/library/urllib.request.html


urllib.request.urlretrieve() uses urllib.request.urlopen() internally (at least in Python 3), so you can change its behavior the same way you change the behavior of urlopen.

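When urlopen(params) is called, it first checks the module-level variable urllib.request._opener. If it is None, urlopen initializes it with a default opener; otherwise it leaves it as it is. Then it calls urllib.request._opener.open(<urlopen_params>) (from here on I refer to urllib.request._opener simply as opener).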
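The lazy initialization described above can be observed directly. This sketch inspects urllib.request._opener, a private CPython detail that may change between versions, and uses data: URLs so it needs no network:

```python
import urllib.request

# Clear the cached opener (urllib.request._opener is a private CPython detail)
urllib.request.install_opener(None)
print(urllib.request._opener)  # None

# urlopen() finds the cache empty and fills it with build_opener()'s default opener
with urllib.request.urlopen("data:,hello") as resp:
    body = resp.read()
default_opener = urllib.request._opener
print(type(default_opener).__name__)  # OpenerDirector

# Later calls reuse the cached opener instead of building a new one
with urllib.request.urlopen("data:,again") as resp:
    resp.read()
print(urllib.request._opener is default_opener)  # True
```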

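The opener holds a list of handlers for different protocols. When opener.open() is called, it does the following:

  • Creates a urllib.request.Request object from the URL (or, if you passed a Request, uses it as is).
  • Extracts the protocol from the Request (derived from the URL scheme).
  • Based on the protocol, looks up and calls these handler methods:
    • protocol_request (for example, http_request) - pre-processes the request before the connection is opened.
    • protocol_open - actually opens the connection to the remote server.
    • protocol_response - post-processes the response from the server.
    • For the other methods, see the Python documentation.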
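Because the opener finds these methods by name (protocol plus _request, _open, or _response), any handler can hook into the steps listed above. The sketch below uses a hypothetical TraceProcessor and the data: scheme in place of http: so that it runs offline; for HTTP the same hooks would be named http_request and http_response:

```python
import urllib.request

calls = []


class TraceProcessor(urllib.request.BaseHandler):
    """Hypothetical processor that records when each hook runs."""

    def data_request(self, request):
        # Pre-processing: runs before DataHandler.data_open() is called
        calls.append("data_request")
        return request  # *_request hooks must return the (possibly modified) request

    def data_response(self, request, response):
        # Post-processing: runs on the response that data_open() produced
        calls.append("data_response")
        return response  # *_response hooks must return the response


opener = urllib.request.build_opener(TraceProcessor())
with opener.open("data:,hello") as response:
    body = response.read()

print(calls)  # ['data_request', 'data_response']
print(body)   # b'hello'
```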

In your case you need two steps:

  • Create your own opener with a custom handler (urllib.request.build_opener)
  • Install it as urllib.request._opener (urllib.request.install_opener)

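urllib.request.build_opener accepts a list of handlers. If a handler you pass is an instance (or subclass) of one of the default handlers, it replaces that default; the other defaults are kept.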
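A quick way to see that replacement rule in action: pass an instance of a hypothetical HTTPHandler subclass to build_opener() and list the HTTP handlers the resulting opener holds. The handlers attribute of OpenerDirector is an implementation detail rather than documented API, so treat this as an inspection aid only:

```python
import urllib.request


class MyHTTP(urllib.request.HTTPHandler):
    """Hypothetical subclass; its body does not matter for this check."""


opener = urllib.request.build_opener(MyHTTP())

# Names of all handlers in the opener that are HTTPHandler instances (or subclasses)
http_handlers = [type(h).__name__ for h in opener.handlers
                 if isinstance(h, urllib.request.HTTPHandler)]
print(http_handlers)  # ['MyHTTP']: the default HTTPHandler was left out

# Defaults of other types, such as the redirect handler, are still present
print(any(isinstance(h, urllib.request.HTTPRedirectHandler) for h in opener.handlers))  # True
```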

So, with a bit more code, you can do something like this:

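import urllib.request as req

# Subclass the default HTTP handler so every HTTP request gets an extra header
class MyHTTP(req.HTTPHandler):
    def http_request(self, request):
        # Pre-processing hook: runs before the connection is opened
        request.add_header("MyHeader", "Content of my header")
        return super().http_request(request)

# build_opener() uses MyHTTP in place of the default HTTPHandler
opener = req.build_opener(MyHTTP())
req.install_opener(opener)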
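To confirm that MyHTTP's header is sent when urlretrieve() runs, the sketch below serves one request from a throwaway local HTTP server and records the MyHeader value it receives. The server class, the temporary file, and the extra ProxyHandler({}) (which keeps an environment proxy from intercepting 127.0.0.1) are additions for testing, not part of the answer:

```python
import http.server
import os
import tempfile
import threading
import urllib.request as req

received = {}


class RecordingHandler(http.server.BaseHTTPRequestHandler):
    """Test server handler that remembers the MyHeader value of each GET."""

    def do_GET(self):
        # Header lookup is case-insensitive, so "Myheader" on the wire still matches
        received["MyHeader"] = self.headers.get("MyHeader")
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


class MyHTTP(req.HTTPHandler):
    def http_request(self, request):
        request.add_header("MyHeader", "Content of my header")
        return super().http_request(request)


# Port 0 lets the OS choose a free port; serve exactly one request in the background
server = http.server.HTTPServer(("127.0.0.1", 0), RecordingHandler)
thread = threading.Thread(target=server.handle_request, daemon=True)
thread.start()

# ProxyHandler({}) keeps an environment proxy from intercepting the local request
req.install_opener(req.build_opener(MyHTTP(), req.ProxyHandler({})))

target = os.path.join(tempfile.mkdtemp(), "file.txt")
req.urlretrieve(f"http://127.0.0.1:{server.server_port}/file.txt", target)

thread.join(timeout=5)
server.server_close()
```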

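After this, urllib.request.urlretrieve() and anything else that uses urlopen() will send this header with every HTTP request (https:// URLs go through HTTPSHandler, which this subclass does not affect). To go back to the default handlers, simply install a fresh default opener: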

import urllib.request as req   

req.install_opener(req.build_opener())

, , / , , urllib.


Source: https://habr.com/ru/post/1682004/

