urllib2 times out but does not close the socket connection

I am writing a Python URL-fetching program. For my purposes, I want requests to time out very quickly, so I do

urllib2.urlopen("http://.../", timeout=2)

Of course, it times out correctly, as it should. However, it does not bother to close the connection to the server, so the server thinks the client is still connected. How can I ask urllib2 to just close the connection after it times out?

Running gc.collect() does not help, and I would rather not use httplib if I can avoid it.

The closest I can get is this: the first attempt times out. The server reports that connection as closed only when the second attempt times out. The server then reports the second connection as closed only when the third attempt times out. Ad infinitum.

Many thanks.

2 answers

I suspect that the socket is still referenced from the stack frames. When Python raises an exception, it keeps the stack frames so that debuggers and other tools can view the stack and inspect values.

For historical reasons, and now for backward compatibility, the stack information is stored (per thread) in sys (see sys.exc_info(), sys.exc_type and others). This lingering exception state is one of the things that was removed in Python 3.0.

For you, this means that the stack is still alive and referenced. The stack holds the local variables of some function that has the open socket, which is why the socket is not closed yet. Only when the stack trace is released will everything be gc'ed.
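The effect described above can be reproduced without any networking. The following is only a minimal sketch, written for CPython 3 so that it can be run today; FakeSocket, failing_request and observe_lifetime are hypothetical names made up for the illustration, and the explicit `saved` variable stands in for the traceback that Python 2 keeps on its own in sys.exc_info() after the except block.

```python
import gc
import sys
import weakref


class FakeSocket(object):
    """Hypothetical stand-in for a socket; used only to observe object lifetime."""


# Weak references let us check whether an object is still alive
# without keeping it alive ourselves.
probes = []


def failing_request():
    sock = FakeSocket()                  # local variable, like the socket inside urlopen
    probes.append(weakref.ref(sock))
    raise IOError("simulated timeout")   # the frame holding `sock` ends up in the traceback


def observe_lifetime():
    try:
        failing_request()
    except IOError:
        # Holding (type, value, traceback) keeps the failed frame and its locals alive.
        saved = sys.exc_info()
    alive_while_held = probes[-1]() is not None

    # Dropping the last reference to the traceback releases the frame and `sock`.
    del saved
    gc.collect()
    alive_after_release = probes[-1]() is not None
    return alive_while_held, alive_after_release


if __name__ == "__main__":
    print(observe_lifetime())
```

On CPython 3 this should print (True, False): the fake socket survives for as long as the traceback is referenced and goes away once that reference is dropped. A real socket would presumably behave the same way, being closed only when the object is finally collected.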

To check this, insert something like

    try:
        1/0
    except ZeroDivisionError:
        pass

in the except clause. This is a quick way to replace the current exception with something else.


This is quite a hack, but the following code works. If the request is made in a separate function AND that function does not raise an exception, then the socket is always closed.

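    import socket
    import urllib2

    # Both functions are methods of the fetcher class (class body not shown).
    def _fetch(self, url):
        try:
            return urllib2.urlopen(urllib2.Request(url), timeout=5).read()
        except urllib2.URLError as e:
            # Swallow timeouts only, so that this function returns normally.
            if isinstance(e.reason, socket.timeout):
                return None
            raise

    def fetch(self, url):
        # Retry until a request completes without timing out.
        x = None
        while x is None:
            x = self._fetch(url)
            if x is None:
                print "Timeout"
        return x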

Anyone have a better way?


Source: https://habr.com/ru/post/900946/

