Why does mechanize throw an HTTP 403 error?

For some reason, I get HTTP Error 403: Forbidden when I try to open the page http://questionablecontent.net. I was getting a robots.txt error at first, but I resolved that by disabling robots.txt handling. Oddly, I can't even find a robots.txt file on the site.

I can still browse the page in Chrome, so I wonder: does mechanize look different to the server than Chrome does, even after setting the appropriate headers?

Here is my code (which doesn't work):

    import mechanize
    import cookielib

    br = mechanize.Browser()
    cj = cookielib.LWPCookieJar()              # keep cookies the site sets
    br.set_cookiejar(cj)
    br.set_handle_equiv(True)
    br.set_handle_redirect(True)
    br.set_handle_robots(False)                # ignore robots.txt
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
    br.open('http://questionablecontent.net/') # raises HTTP Error 403: Forbidden

I also tried setting addheaders to the same headers as my browser (which I found here):

    br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36')]

... but that didn't work either.

Finally, I tried using Selenium, and it worked, because it loads the page in a real Chrome instance and then exchanges the data with Python. However, I would still like to get this working with mechanize, and I'm still not sure how Chrome and mechanize look different to the server.
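For reference, this is roughly the Selenium code that worked for me (a minimal sketch; it assumes chromedriver is installed and on the PATH):

    from selenium import webdriver

    driver = webdriver.Chrome()                     # opens a real Chrome window
    driver.get('http://questionablecontent.net/')   # loads fine, no 403
    html = driver.page_source                       # rendered HTML handed back to Python
    driver.quit()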

1 answer

The trick is probably in the request headers Selenium sends. Besides the User-Agent header, some servers also check other headers to verify that a real browser is talking to them. Take a look at one of my older answers:

urllib2.HTTPError: HTTP Error 403: Forbidden

If I were you, I would add all the headers that your real Chrome browser sends, and then eliminate the unnecessary ones one by one.
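As a rough sketch of what I mean (the header values below are just what a typical Chrome install sends, not necessarily the ones this particular server checks):

    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)

    # Start with a full set of Chrome-like headers, then remove them
    # one at a time to find out which ones the server actually requires.
    br.addheaders = [
        ('User-agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                       '(KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36'),
        ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
        ('Accept-Language', 'en-US,en;q=0.8'),
        ('Connection', 'keep-alive'),
    ]

    response = br.open('http://questionablecontent.net/')
    print response.code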


Source: https://habr.com/ru/post/950595/

