Python garbage collector is crazy

Python 2.7.12, Ubuntu 16.04.

A simple server application checks a mailbox every 10 seconds and writes to the debug log that there are no messages. The mailbox really is empty, so the application effectively does nothing. It is single-threaded and uses no custom C extensions. It normally uses about 17 MB of memory.

But after a few hours, sometimes a day, it suddenly grows to 8 GB of memory, consumes 100% CPU, and stops writing the debug log.

I read http://tech.labs.oliverwyman.com/blog/2008/11/14/tracing-python-memory-leaks/ , launched my application under pdb, waited a night, and looked at what objgraph displays. Nothing. No object type shows growth anywhere; all counts look roughly normal (I had first profiled the application in its normal state as a baseline).
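For anyone reproducing this kind of check: objgraph is a third-party package, but the core of what it does can be sketched with the stdlib alone. The snippet below is a rough analogue of objgraph's type statistics / `show_growth()`, not its actual implementation, and the `type_counts` helper is my own name:

```python
import gc
from collections import Counter

def type_counts():
    """Count live, GC-tracked objects by type name
    (rough stdlib analogue of objgraph's typestats)."""
    return Counter(type(o).__name__ for o in gc.get_objects())

before = type_counts()
leak = [[] for _ in range(1000)]  # simulate one type suddenly growing
after = type_counts()

# Report only types whose live count increased, like objgraph.show_growth()
growth = {t: after[t] - before[t] for t in after if after[t] > before[t]}
```

Running this in the problem state and comparing against a healthy baseline is exactly the diagnostic described above, minus the dependency.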

Then I tried https://pythonhosted.org/Pympler/muppy.html like this:

(Pdb) from pympler import muppy
(Pdb) all_objects = muppy.get_objects()
*** MemoryError: 

Then after a while I saw in the system monitor that the pdb process now consumed only 2 GB of memory! But it is single-threaded (I checked with ps) and my application was stopped, so somehow 6 GB just disappeared...

Update: after a while the pdb process again consumed 100% CPU and grew to 8-12 GB. I tried gc.collect(), with no effect. muppy.get_objects() still fails the same way.

Under pdb, gc.collect() brought memory back down to about 2 GB, but nothing else changed.

gc.collect() returns 0, so the collector finds nothing unreachable to free.
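For context on what that return value means: `gc.collect()` returns the number of unreachable objects found during that run, so 0 means the cycle collector sees no garbage at all. A minimal illustration (the `Node` class is just for the demo):

```python
import gc

class Node(object):
    def __init__(self):
        self.ref = None

gc.collect()             # flush any pre-existing garbage first
baseline = gc.collect()  # nothing new since the last run -> 0

# Build a reference cycle and drop every external reference to it.
a, b = Node(), Node()
a.ref, b.ref = b, a
del a, b

freed = gc.collect()     # counts the unreachable objects found and freed
```

A return of 0 is consistent with memory being held by *reachable* objects (as turned out to be the case here), not by uncollected cycles.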

I suspect the problem is somewhere below the Python level, perhaps in SSL (I cannot say exactly why). The connection is IMAP over SSL: imaplib plus a custom timeout wrapper, written in pure Python.

What else can I do to find or fix the problem? Maybe there is a good memory-debugging tool, or a known bug in the Python libraries?
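One tool worth mentioning: `tracemalloc` ships with Python 3.4+ (for 2.7 there was, I believe, the pytracemalloc backport), and unlike object-count tools it attributes memory to allocation sites, which catches exactly the "one huge list, no new object types" pattern. A sketch, written for Python 3:

```python
import tracemalloc

tracemalloc.start()

# Simulate the failure mode: a single huge list whose elements are all the
# same interned empty string -- per-type object counts barely change, but
# the list itself costs megabytes of pointer storage.
leak = [''] * (10 ** 6)

snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics('lineno')[0]  # biggest allocation site, by bytes
```

Here `top` points straight at the line building the list, even though objgraph-style type counts would show nothing growing.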


Guys, I've got it! It seems the GC issues were only a side effect; the real problem is here:
# IMAP with timeouts
import errno
import imaplib
import select
import socket
import ssl

class NonBlockingSSL_IMAP(imaplib.IMAP4):
    def __init__(self, timeout, *args, **kwargs):
        self.timeout = timeout
        imaplib.IMAP4.__init__(self, *args, **kwargs)

    ....

    def read(self, size):
        readed = 0
        buffer = []

        # An SSL socket is not an ordinary socket: it may hold buffered
        # data that select() does not know about, so drain it first.
        dried = False

        while readed < size:
            if dried:
                self.wait_for(recv=self.sock)
            try:
                data = self.sslobj.recv(size - readed)
                buffer.append(data)   # <----- here !!!!
                readed += len(data)
            except ssl.SSLWantReadError:
                dried = True
            except socket.error as se:
                if se.errno != errno.EAGAIN:
                    # something bad happened
                    raise se
                dried = True
        return ''.join(buffer)

    ....

    def wait_for(self, recv=None, send=None):
        rd = [recv] if recv is not None else []
        wr = [send] if send is not None else []

        ready_r, ready_w, _ = select.select(rd, wr, [], self.timeout)

        if not ready_r and not ready_w:
            raise socket.error(errno.EAGAIN, "timeout")

If recv returns an empty string (the peer closed the connection), I append it to the buffer, but `readed` does not increase (because len('') == 0), so the loop can spin forever. The result is one enormous list of empty strings, with no growth in object counts at all (every '' is the same single interned object, hello objgraph), just one very big list. And apparently Python cannot handle a list that big gracefully, as far as I can tell.
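The general fix is to treat an empty return from recv as end-of-stream rather than "no data yet". A minimal sketch of the corrected loop, written for Python 3 with an injectable recv callable so it is testable; the function name and the choice of EOFError are mine, not from the original class:

```python
def read_exact(recv, size):
    """Read exactly `size` bytes via recv(n).

    recv(n) must return up to n bytes, or b'' on EOF. An empty result is
    treated as the peer closing the connection -- NOT as "retry", which is
    the bug that produced the endless loop above.
    """
    buf = []
    remaining = size
    while remaining > 0:
        data = recv(remaining)
        if not data:  # b'' means EOF: stop instead of appending forever
            raise EOFError("connection closed with %d bytes left" % remaining)
        buf.append(data)
        remaining -= len(data)
    return b"".join(buf)
```

In the original class the same check would go right after `self.sslobj.recv(...)`, before the `buffer.append(data)` line marked with the arrow.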


Source: https://habr.com/ru/post/1688146/
