How to avoid HTTP error 429 (Too Many Requests) in Python

I am trying to use Python to log in to a site and collect information from several web pages, and I get the following error:

    Traceback (most recent call last):
      File "extract_test.py", line 43, in <module>
        response=br.open(v)
      File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 203, in open
        return self._mech_open(url, data, timeout=timeout)
      File "/usr/local/lib/python2.7/dist-packages/mechanize/_mechanize.py", line 255, in _mech_open
        raise response
    mechanize._response.httperror_seek_wrapper: HTTP Error 429: Unknown Response Code

I used time.sleep() and it works, but it seems crude and unreliable. Is there another way to avoid this error?

Here is my code:

    import mechanize
    import cookielib
    import re

    first = "example.com/page1"
    second = "example.com/page2"
    third = "example.com/page3"
    fourth = "example.com/page4"
    ## I have seven URLs I want to open

    urls_list = [first, second, third, fourth]

    br = mechanize.Browser()
    # Cookie Jar
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    # Browser options
    br.set_handle_equiv(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)

    # Log in credentials
    br.open("example.com")
    br.select_form(nr=0)
    br["username"] = "username"
    br["password"] = "password"
    br.submit()

    for url in urls_list:
        response = br.open(url)
        # Search the body of the fetched page for the pattern
        print re.findall("Some String", response.read())
+42
python mechanize
Apr 01 '14 at 12:35
3 answers

Receiving a 429 status is not an error; it is the other server "kindly" asking you to stop spamming requests. Evidently your request rate is too high and the server is not willing to accept it.

You should not try to "dodge" this, or try to circumvent the server's security settings by spoofing your IP address. You should simply respect the server's answer and stop sending so many requests.

If everything is set up correctly, you will also receive a "Retry-After" header along with the 429 response. This header specifies the number of seconds you should wait before making another call. The proper way to handle this "problem" is to read this header and put your process to sleep for that many seconds.

You can find more information on status 429 here: http://tools.ietf.org/html/rfc6585#page-3
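
The Retry-After handling described above can be sketched roughly as follows. This is a minimal illustration using only the Python 3 standard library (urllib) rather than mechanize; the helper names `retry_after_seconds` and `open_with_retry`, the default delay, and the attempt limit are assumptions made for this sketch. Besides a number of seconds, the header may also carry an HTTP date, so both forms are handled.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, default=1.0):
    """Turn a Retry-After header value into a non-negative delay in seconds.

    `default` (an assumption of this sketch) is returned when the header
    is missing or cannot be parsed.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        # Form 1: a plain number of seconds.
        return float(value)
    try:
        # Form 2: an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def open_with_retry(url, opener=urllib.request.urlopen,
                    max_attempts=5, sleep=time.sleep):
    """Open `url`, waiting as the server instructs whenever it answers 429."""
    for attempt in range(1, max_attempts + 1):
        try:
            return opener(url)
        except urllib.error.HTTPError as err:
            # Anything other than 429, or running out of attempts, is re-raised.
            if err.code != 429 or attempt == max_attempts:
                raise
            sleep(retry_after_seconds(err.headers.get("Retry-After")))
```

Calling `open_with_retry("http://example.com/page1")` would then replace a bare open call inside the loop over the URLs. Adapting the same idea to mechanize should be possible by catching its HTTP error in the same way, but that is not verified here.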

+76
Apr 29 '14 at 14:14

Writing this piece of code fixed my problem:

requests.get(link, headers = {'User-agent': 'your bot 0.1'})
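
One possible reason this helps, offered as an assumption rather than something stated in this answer: some servers throttle or reject the default User-Agent that HTTP libraries send. The same idea without the requests library might look like the sketch below, which uses only the Python 3 standard library; the agent string and URL are placeholders.

```python
import urllib.request

# Placeholder values; use a string that identifies your own client.
USER_AGENT = "your bot 0.1"
url = "http://example.com/page1"

# Attach the header to the request object; nothing is sent until urlopen() is called.
request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})

# urllib stores header names in capitalized form, i.e. "User-agent".
print(request.get_header("User-agent"))

# To actually send the request: urllib.request.urlopen(request)
```

With mechanize, as used in the question, the usual equivalent is assigning a list of header tuples to `br.addheaders`.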

+7
Nov 03 '16 at 4:14

Another workaround would be to spoof your IP address using some kind of public VPN or the Tor network. This assumes the server applies its rate limit at the IP level.

There is a short blog post demonstrating how to use Tor with urllib2:

http://blog.flip-edesign.com/?p=119

+5
Apr 01 '14 at 13:08