Connection refused while trying to restart thread for webscrapper

Question

I use dryscrape to scrape a JavaScript page, and sometimes an error kills the process. I tried catching the exceptions as described in the documentation to prevent this, but it did not work:

    try:
        sess.visit('url')
    except webkit_server.EndOfStreamError:
        continue
    except webkit_server.NoResponseError:
        continue
    except webkit_server.InvalidResponseError:
        continue
    except webkit_server.NoX11Error:
        continue

So I have this setup to restart the threads if they fail:

    class Checker():
        def check_if_thread_is_alive(self):
            a = ThreadClass()
            a.start()
            b = ThreadClass()
            b.start()
            c = ThreadClass()
            c.start()
            d = ThreadClass()
            d.start()
            while True:
                if not a.is_alive():
                    print "Restarting A"
                    a = ThreadClass()
                    a.start()
                if not b.is_alive():
                    print "Restarting B"
                    b = ThreadClass()
                    b.start()
                if not c.is_alive():
                    print "Restarting C"
                    c = ThreadClass()
                    c.start()
                if not d.is_alive():
                    print "Restarting D"
                    d = ThreadClass()
                    d.start()

However, I get an error when I try to restart a thread:

    Traceback (most recent call last):
      File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
        self.run()
      File "Scrapper.py", line 30, in run
        sess = dryscrape.Session(base_url = 'url')
      File "/usr/local/lib/python2.7/dist-packages/dryscrape/session.py", line 18, in __init__
        self.driver = driver or DefaultDriver()
      File "/usr/local/lib/python2.7/dist-packages/dryscrape/driver/webkit.py", line 30, in __init__
        super(Driver, self).__init__(**kw)
      File "/usr/local/lib/python2.7/dist-packages/webkit_server.py", line 225, in __init__
        self.conn = connection or ServerConnection()
      File "/usr/local/lib/python2.7/dist-packages/webkit_server.py", line 444, in __init__
        self._sock = (server or get_default_server()).connect()
      File "/usr/local/lib/python2.7/dist-packages/webkit_server.py", line 414, in connect
        sock.connect(("127.0.0.1", self._port))
      File "/usr/lib/python2.7/socket.py", line 224, in meth
        return getattr(self._sock,name)(*args)
    error: [Errno 111] Connection refused

Is there a better way to solve this problem, or am I missing something?

python multithreading python-2.7 webkit

user2540748 Nov 16 '14 at 18:30

2 answers

dsgdfg · Answer 1 · 2015-08-24T09:55:32+0000

Cause: you are trying to connect to your own machine (127.0.0.1).

You need to change the target URL.

If you really want to connect to your own machine, first start a service that listens on that port.

      File "/usr/local/lib/python2.7/dist-packages/webkit_server.py", line 414, in connect
        sock.connect(("127.0.0.1", self._port))
      File "/usr/lib/python2.7/socket.py", line 224, in meth    <<<--- you're trying to connect to yourself.
        return getattr(self._sock,name)(*args)

Neowang · Answer 2 · 2015-08-29T11:51:20+0000

If you just want to skip over an exception, you can always use a catch-all exception handler like this. This is usually considered very bad practice, but it keeps your scraper running if errors occur only occasionally:

    try:
        sess.visit(url)
    except Exception as e:
        # Print the exception for debugging here
        continue
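If the catch-all is used, logging each failure and capping the number of retries keeps real problems visible. A minimal sketch, assuming the page load is passed in as a plain callable (for example `sess.visit`); `visit_with_retry`, `attempts`, and `delay` are illustrative names, not part of dryscrape:

```python
import logging
import time

log = logging.getLogger("scraper")


def visit_with_retry(visit, url, attempts=3, delay=1.0):
    """Call visit(url); on any exception, log it and try again.

    `visit` is any callable that takes a URL (hypothetical stand-in for
    a session's visit method). Returns True on the first success and
    False if all `attempts` calls raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            visit(url)
            return True
        except Exception as exc:  # deliberately broad, like the handler above
            log.warning("visit(%r) failed (attempt %d of %d): %r",
                        url, attempt, attempts, exc)
            if attempt < attempts:
                time.sleep(delay)  # brief pause before the next attempt
    return False
```

A caller can then skip the URL when the function returns False instead of silently continuing after every error.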

Also, are you running a local server for testing? From the trace:

      File "/usr/local/lib/python2.7/dist-packages/webkit_server.py", line 414, in connect
        sock.connect(("127.0.0.1", self._port))

In fact, you are connecting to localhost. If you start your own server, check the server log to see why it stops responding to connection requests.
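One quick way to tell whether anything is listening on that port is to probe it with a plain socket before creating a new session. A sketch using only the standard library; `port_is_open` is a hypothetical helper, and the port you pass in is whatever `self._port` holds in your process:

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be established."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success and an errno value
        # (e.g. 111, ECONNREFUSED, on Linux) instead of raising.
        return sock.connect_ex((host, port)) == 0
    finally:
        sock.close()
```

If this returns False right before a restart, nothing is accepting connections on that port, which matches the `[Errno 111]` in the traceback.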

Just noticed one more error in your script:

    sess.visit('url')
    # it should be something like:
    url = "http://www.google.com/"
    sess.visit(url)
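As for the restart loop in the question, polling without a pause keeps a CPU core busy, and one variable per thread gets repetitive. A sketch of the same idea with a dictionary of workers and a sleep between checks; `supervise`, `make_worker`, and `rounds` are hypothetical names, and `make_worker` stands in for `ThreadClass()`. It does not by itself fix the refused connection.

```python
import threading
import time


def supervise(make_worker, names, poll_interval=1.0, rounds=None):
    """Keep one worker thread per name alive, restarting any that died.

    make_worker(name) must return a new, unstarted threading.Thread.
    rounds caps the number of polling passes (None means loop forever).
    Returns the number of restarts performed.
    """
    workers = {}
    for name in names:
        workers[name] = make_worker(name)
        workers[name].start()

    restarts = 0
    passes = 0
    while rounds is None or passes < rounds:
        time.sleep(poll_interval)  # pause between checks, no busy loop
        for name in names:
            if not workers[name].is_alive():
                # A finished Thread object cannot be started again,
                # so a fresh one is created for each restart.
                workers[name] = make_worker(name)
                workers[name].start()
                restarts += 1
        passes += 1
    return restarts
```

With the question's class this would be called as `supervise(lambda name: ThreadClass(), ["A", "B", "C", "D"])`.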
