Python Mechanize will not open these sites

Question


I am working with the Python mechanize module. I came across three sites that mechanize cannot open directly:

ru.wikipedia.org/wiki/Dog (new user, cannot post more than two HTTP links)
https://www.google.com/search?num=100&hl=en&site=&q=dog&oq=dog&aq=f&aqi=g10&aql=1&gs_sm=e&gs_upl=618l914l0l1027l3l2l0l0l0l0l173l1l1l1l1l1l1l1l1l1l1l1l0

http://www.cpsc.gov/cpscpub/prerel/prhtml03/03059.html

  import mechanize
  br = mechanize.Browser()
  br.set_handle_robots(False)

Adding the following code lets mechanize open and parse the Wikipedia article and the Google search results:

  br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
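For context, a minimal sketch of why overriding the header can matter, using the standard library's urllib.request rather than mechanize (a swapped-in stand-in so the example runs without extra packages). The opener's `addheaders` attribute plays the same role as `br.addheaders` above. The `make_opener` helper and the `BROWSER_UA` string are hypothetical names chosen for illustration, and no network request is made.

```python
import urllib.request

# Hypothetical browser-like User-Agent string, for illustration only.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def make_opener(user_agent=BROWSER_UA):
    """Build an opener that sends a browser-like User-Agent on every request."""
    opener = urllib.request.build_opener()
    # Replace the default header list; without this the opener announces
    # itself as "Python-urllib/<version>", which some sites reject.
    opener.addheaders = [("User-agent", user_agent)]
    return opener


# Compare the default header with the overridden one (no request is sent).
default_headers = dict(urllib.request.build_opener().addheaders)
custom_headers = dict(make_opener().addheaders)
print(default_headers["User-agent"])
print(custom_headers["User-agent"])
```

Some sites refuse requests whose User-Agent identifies a script, which would be consistent with the Wikipedia and Google pages opening only after the header is changed.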

But this workaround does not help with the cpsc.gov site. When I try to open it with the mechanize Browser, Python freezes, to the point that I cannot even stop it.

What's going on here?


python mechanize

Michael hart Dec 15 '11 at 23:05


1 answer

jcollado · Accepted Answer · 2011-12-15T23:57:07+0000

In the case of the cpsc.gov site, the response appears to include a refresh header that mechanize's HTTPRefreshProcessor does not handle properly. However, you can work around the problem as follows:

  import mechanize
  url = 'http://www.cpsc.gov/cpscpub/prerel/prhtml03/03059.html'
  br = mechanize.Browser()
  br.set_handle_refresh(False)
  br.open(url)
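To illustrate what a refresh header carries, here is a rough sketch that parses a value such as `0; url=http://example.com/` into a delay in seconds and an optional target URL. A client that honors the delay waits that long before following the redirect. `parse_refresh` and `should_follow` are hypothetical helpers written for this example, not mechanize's internals, and whether a long delay is what actually caused the freeze on cpsc.gov is an assumption.

```python
import re


def parse_refresh(value):
    """Split a Refresh header value into (delay_seconds, url_or_None)."""
    delay_part, _, rest = value.partition(";")
    delay = float(delay_part.strip())
    # The optional second part looks like "url=<target>", in any letter case.
    match = re.match(r"\s*url\s*=\s*(.*)", rest, re.IGNORECASE)
    url = match.group(1).strip().strip("'\"") if match else None
    return delay, url or None


def should_follow(value, max_time=5.0):
    """Follow a refresh only if its delay is short enough (an assumed policy)."""
    delay, _ = parse_refresh(value)
    return delay <= max_time


print(parse_refresh("0; url=http://example.com/"))
print(should_follow("300"))
```

Disabling refresh handling, as in the answer's code, sidesteps this logic entirely: the page is returned as-is and the refresh is never followed.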
