I am trying to use Python to enter a site and collect information from several web pages, and I get the following error:
Traceback (most recent call last): File "extract_test.py", line 43, in <module> response=br.open(v) File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open return self._mech_open(url, data, timeout=timeout) File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 255, in _mech_open raise response mechanize._response.httperror_seek_wrapper: HTTP Error 429: Unknown Response Code
I used time.sleep() and it works, but it seems unreasonable and unreliable, is there any other way to avoid this error?
Here is my code:
import mechanize import cookielib import re first=("example.com/page1") second=("example.com/page2") third=("example.com/page3") fourth=("example.com/page4") ## I have seven URL I want to open urls_list=[first,second,third,fourth] br = mechanize.Browser() # Cookie Jar cj = cookielib.LWPCookieJar() br.set_cookiejar(cj) # Browser options br.set_handle_equiv(True) br.set_handle_redirect(True) br.set_handle_referer(True) br.set_handle_robots(False) # Log in credentials br.open("example.com") br.select_form(nr=0) br["username"] = "username" br["password"] = "password" br.submit() for url in urls_list: br.open(url) print re.findall("Some String")
Aous1000 Apr 01 '14 at 12:35 on 2014-04-01 12:35
source share