For some reason, I get HTTP Error 403: Forbidden when I try to open http://questionablecontent.net with mechanize. I used to get a robots.txt-related error, but that has been resolved. Also, I can't even find their robots.txt file.
I can still browse the page in Chrome, so I wonder: does mechanize look different to the server than Chrome does, even after setting the appropriate headers?
Here is my code (which doesn't work):
import mechanize
import cookielib

br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
I also tried setting addheaders to the same headers my browser sends (which I found here):
br.addheaders = [('User-agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36')]
... but that didn't work either.
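As a sanity check outside mechanize, I also built the same request with the standard library's urllib. The User-Agent is the one from my Chrome; the Accept and Accept-Language values are my guess at what Chrome sends (copied by hand from dev tools, so they may not be exact):

```python
import urllib.request

# Hypothetical fuller header set, mimicking what Chrome sends. The idea
# is that the server may check more than just the User-Agent string.
chrome_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.8",
}

req = urllib.request.Request("http://questionablecontent.net",
                             headers=chrome_headers)
# html = urllib.request.urlopen(req).read()  # actual network call omitted here
```

(With mechanize, the equivalent would be extending br.addheaders with the same extra tuples, not just the User-agent one.)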
Finally, I tried using Selenium, and that worked, since it loads the page in Chrome itself and then exchanges data with Python. However, I would still like to get this working with mechanize. Also, I'm still not sure how Chrome and mechanize look different to their server.
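To convince myself that a server really can tell clients apart from the request headers alone, I spun up a throwaway local server (a contrived sketch, not what questionablecontent.net actually does) that returns 403 unless an Accept header is present:

```python
import threading
import urllib.request
import urllib.error
from http.server import BaseHTTPRequestHandler, HTTPServer

class PickyHandler(BaseHTTPRequestHandler):
    """Contrived server: rejects requests that lack an Accept header,
    the way some sites reject bot-like clients."""
    def do_GET(self):
        if self.headers.get("Accept") is None:
            self.send_response(403)
        else:
            self.send_response(200)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), PickyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

def status_of(req):
    """Return the HTTP status code, whether or not urlopen raises."""
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code

# Only a User-Agent, like my first mechanize attempt: rejected.
bare_status = status_of(urllib.request.Request(
    url, headers={"User-Agent": "Mozilla/5.0"}))
# Same request plus an Accept header: accepted.
full_status = status_of(urllib.request.Request(
    url, headers={"User-Agent": "Mozilla/5.0", "Accept": "text/html"}))
print(bare_status, full_status)  # expect: 403 200
```

So a 403 despite a browser-like User-agent is at least plausible if the site inspects other headers too, but I haven't confirmed that's what is happening here.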