Python 2.7.12, Ubuntu 16.04.
A simple server application that checks a mailbox every 10 seconds and writes to the debug log that there are no messages. The mailbox really is empty, so the application effectively does nothing. It is single-threaded and does not use any custom C extensions. It usually uses about 17 MB of memory.
But after a few hours, maybe a day, it suddenly starts growing to 8 GB of memory, consumes 100% CPU, and stops writing to the debug log.
I read http://tech.labs.oliverwyman.com/blog/2008/11/14/tracing-python-memory-leaks/ , launched my application under pdb, waited a night, and looked at what objgraph would show. Nothing. No object type is growing anywhere, and all counts are approximately normal (I first checked the application in its normal state for comparison).
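A per-type object count cannot show a single container that grows in place, because the number of objects of each type stays the same. As a complementary check, here is a minimal sketch (not from the original post) that ranks gc-tracked containers by their shallow size. The function name `largest_containers` and the `leaky` list are hypothetical, used only for illustration. On a process that is already close to its memory limit, `gc.get_objects()` itself may fail with a MemoryError.

```python
import gc
import sys


def largest_containers(limit=5):
    """Return (shallow_size_bytes, type_name, length) tuples for the
    largest gc-tracked lists, dicts, and sets, biggest first."""
    found = []
    for obj in gc.get_objects():
        # Exact type check, so no user-defined __len__ is ever called.
        if type(obj) in (list, dict, set):
            found.append((sys.getsizeof(obj), type(obj).__name__, len(obj)))
    found.sort(reverse=True)
    return found[:limit]


if __name__ == "__main__":
    # Hypothetical stand-in for a leak: one list holding many references
    # to the same empty string, so no new objects are created at all.
    leaky = [''] * 1000000
    for size, type_name, length in largest_containers():
        print("%10d bytes  %-5s len=%d" % (size, type_name, length))
```

`sys.getsizeof` reports only the container's own footprint, not the objects it refers to, which is exactly what matters when the elements are all the same shared object.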
Then I tried Pympler's muppy ( https://pythonhosted.org/Pympler/muppy.html ), like this:
(Pdb) from pympler import muppy
(Pdb) all_objects = muppy.get_objects()
*** MemoryError:
Then, after a while, I saw in the system monitor that the pdb process was now using only 2 GB of memory! But it is single-threaded (I checked with ps) and my application was stopped, so somehow 6 GB just disappeared...
Update: [...] the pdb process [...] 100% [...] 8-12 [...]. gc.collect() [...]. muppy.get_objects() [...].
In pdb, gc.collect() [...] 2 [...].
gc.collect() [...] 0 [...].
[...] Python [...] SSL [...]. IMAP SSL, imaplib + [...] ([...] Python).
What else can I do to find or fix the problem? Is there a good memory-checking tool, or is this a known bug in the Python libraries?
Guys, I've got it! It seems that the GC issues were only symptoms; the real issue is here:
class NonBlockingSSL_IMAP(imaplib.IMAP4):
    def __init__(self, timeout, *args, **kwargs):
        self.timeout = timeout
        imaplib.IMAP4.__init__(self, *args, **kwargs)
    ....
    def read(self, size):
        readed = 0
        buffer = []
        dried = False
        while readed < size:
            if dried:
                self.wait_for(recv=self.sock)
            try:
                data = self.sslobj.recv(size - readed)
                buffer.append(data)
                readed += len(data)
            except ssl.SSLWantReadError:
                dried = True
            except socket.error as se:
                if se.errno != errno.EAGAIN:
                    raise se
                dried = True
        return ''.join(buffer)
    ....
    def wait_for(self, recv=None, send=None):
        rd = [recv] if recv is not None else []
        wr = [send] if send is not None else []
        ready_r, ready_w, _ = select.select(rd, wr, [], self.timeout)
        if not ready_r and not ready_w:
            raise socket.error(errno.EAGAIN, "timeout")
If recv returns an empty string (as it does once the peer has closed the connection), I append it to the buffer, but readed does not increase, because len(data) == 0. So the while loop can run forever. As a result, we get one very large list of empty strings, with no additional objects at all (because every '' is the same object, hello, objgraph). Just one really big list. Python probably cannot handle a list like that gracefully, which I can understand.
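One possible way to close the loophole is to treat an empty result from recv as end-of-stream and leave the loop. The following is a minimal sketch under stated assumptions, not the code that was actually deployed: `read_exact` is a hypothetical standalone function, and its `recv` and `wait_for_readable` parameters are hypothetical callables standing in for `self.sslobj.recv` and `self.wait_for(recv=self.sock)` from the class above.

```python
import errno
import socket
import ssl


def read_exact(recv, size, wait_for_readable):
    """Read up to `size` bytes via `recv`, stopping early on EOF.

    `recv` and `wait_for_readable` are hypothetical stand-ins for
    self.sslobj.recv and self.wait_for(recv=self.sock).
    """
    chunks = []
    received = 0
    dried = False
    while received < size:
        if dried:
            # May raise on timeout, as wait_for does in the original class.
            wait_for_readable()
        try:
            data = recv(size - received)
        except ssl.SSLWantReadError:
            dried = True
            continue
        except socket.error as se:
            if se.errno != errno.EAGAIN:
                raise
            dried = True
            continue
        if not data:
            # An empty result means the peer closed the connection (EOF):
            # stop instead of appending empty strings forever.
            break
        chunks.append(data)
        received += len(data)
    return b''.join(chunks)
```

On EOF this sketch returns whatever was read so far. Raising an exception instead would also be reasonable; which one fits better depends on how the caller handles short reads.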