I need to collect information from web pages using Python from a Linux terminal. It works fine for most pages, but some of them (not all) return the wrong URL when I call requests.get, because they use user-agent detection and don't know how to answer my request (from a Linux terminal I am neither a browser nor a mobile application).
Sending a "User-Agent" header didn't work either. I tried several different ways of sending it to emulate a Mozilla browser:
user_agent = {'User-Agent': 'Mozilla/5.0'}
or
user_agent = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; hu-HU; rv:1.7.8) Gecko/20050511 Firefox/1.0.4'}
or many other combinations.
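To make the attempts above easier to reproduce, here is a minimal sketch of how the headers can be checked before anything is sent, and how the final URL can be inspected afterwards. The names BROWSER_HEADERS, prepared_headers and fetch are hypothetical, and the extra Accept and Accept-Language values are assumptions for illustration only, not a known fix for any particular site.

```python
import requests

# Hypothetical browser-like header set; these values are assumptions for illustration.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def prepared_headers(url, headers=BROWSER_HEADERS):
    """Return the headers requests would send for a GET, without touching the network."""
    session = requests.Session()
    prepared = session.prepare_request(requests.Request("GET", url, headers=headers))
    return dict(prepared.headers)


def fetch(url, headers=BROWSER_HEADERS, timeout=10):
    """Send the GET and report where the request ended up after any redirects."""
    with requests.Session() as session:
        session.headers.update(headers)
        response = session.get(url, timeout=timeout)
    # response.history holds the intermediate redirect responses, if there were any;
    # response.url is the URL that was finally served.
    for hop in response.history:
        print(hop.status_code, hop.url)
    print("final:", response.status_code, response.url)
    return response


if __name__ == "__main__":
    # Inspect the outgoing headers only; no request is sent here.
    for name, value in prepared_headers("https://example.com/").items():
        print(f"{name}: {value}")
```

Printing the prepared headers confirms whether the custom User-Agent actually replaces the default python-requests one, and the redirect history shows whether the "wrong URL" comes from a server-side redirect.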
On some servers, when I try to use this line:
page = requests.get(url, headers=user_agent)
- , User-Agent ? Python Notebook, - , () .