One of my Python applications seems to be leaking memory, judging by a steady increase in memory usage. My hypothesis is a cyclic reference somewhere, despite my best efforts to avoid one. To isolate the problem, I am looking for a way to check manually for unreachable objects, as a tool intended purely for debugging.
The gc module appears capable of the necessary tracking, so I tried the following code, which is intended to compile a list of the unreachable objects created since the last call. The first call merely sets a baseline and will not identify any unreachable objects.
    def unreachable():
        # first time setup
        import gc
        gc.set_threshold( 0 ) # only manual sweeps
        gc.set_debug( gc.DEBUG_SAVEALL ) # keep unreachable items as garbage
        gc.enable() # start gc if not yet running (is this necessary?)
        # operation
        if gc.collect() == 0:
            return 'no unreachable items'
        s = 'unreachable items:\n ' \
            + '\n '.join( '[%d] %s' % item for item in enumerate( gc.garbage ) )
        _deep_purge_list( gc.garbage ) # remove unreachable items
        return s # return unreachable items as text
Here, _deep_purge_list is meant to break the cycles and delete the objects manually. The following implementation handles some common cases but is far from watertight. My first question relates to this; see below.
    def _deep_purge_list( garbage ):
        for item in garbage:
            if isinstance( item, dict ):
                item.clear()
            if isinstance( item, list ):
                del item[:]
            try:
                item.__dict__.clear()
            except:
                pass
        del garbage[:]
Based on very limited testing, this setup appears to work correctly. The following cyclic reference is correctly reported once:
    class A( object ):
        def __init__( self ):
            self.ref = self

    print unreachable()
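For readers on Python 3, the same check might be sketched as follows. This is an illustration under assumptions, not the code from this question: the names SelfRef and find_unreachable are hypothetical, and the function returns the objects themselves rather than text. It also suggests that gc.enable() is not required, since gc.collect() runs a full collection even while automatic collection is disabled.

```python
import gc


class SelfRef:
    """Hypothetical class whose instances hold a reference to themselves."""

    def __init__(self):
        self.ref = self


def find_unreachable():
    """Run one manual collection and return the objects it found unreachable."""
    old_flags = gc.get_debug()
    gc.set_debug(gc.DEBUG_SAVEALL)   # save unreachable objects in gc.garbage
    try:
        gc.collect()                 # works even while automatic gc is disabled
        found = list(gc.garbage)
    finally:
        gc.set_debug(old_flags)
    del gc.garbage[:]                # empty the list; `found` keeps the objects
    return found


gc.disable()                         # only manual sweeps from here on
find_unreachable()                   # baseline call, result discarded
SelfRef()                            # creates a cycle and drops it at once
found = find_unreachable()
leaked = [obj for obj in found if isinstance(obj, SelfRef)]
print(len(leaked))
```

Note that the returned list keeps the reported objects alive until it is dropped, after which they are cyclic garbage again and would be reported by a later call.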
However, something odd happens in the following case:
    print unreachable()
    # no unreachable items

    import numpy

    print unreachable()
    # unreachable items:
    #   [0] (<type '_ctypes.Array'>,)
    #   [1] {'__module__': 'numpy.ctypeslib', '__dict__': <attribute '__dict__' of 'c_long_Array_1' objects>, '__weakref__': <attribute '__weakref__' of 'c_long_Array_1' objects>, '_length_': 1, '_type_': <class 'ctypes.c_long'>, '__doc__': None}
    #   [2] <class 'numpy.ctypeslib.c_long_Array_1'>
    #   [3] <attribute '__dict__' of 'c_long_Array_1' objects>
    #   [4] <attribute '__weakref__' of 'c_long_Array_1' objects>
    #   [5] (<class 'numpy.ctypeslib.c_long_Array_1'>, <type '_ctypes.Array'>, <type '_ctypes._CData'>, <type 'object'>)

    print unreachable()
    # unreachable items:
    #   [0] (<type '_ctypes.Array'>,)
    #   [1] {}
    #   [2] <class 'c_long_Array_1'>
    #   [3] (<class 'c_long_Array_1'>, <type '_ctypes.Array'>, <type '_ctypes._CData'>, <type 'object'>)
Repeated calls keep returning that last result. The problem does not occur if unreachable is called for the first time after the import. At this point, however, I have no reason to believe that the issue is specific to numpy; my guess is that it exposes a flaw in my approach.
My questions:
- Is there a better way to remove the items in gc.garbage? Ideally, is there a way to have gc remove them itself, as it would (should?) have done without DEBUG_SAVEALL?
- Can anyone explain the issue with the numpy import and/or suggest ways to fix it?
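One possibility for the first question, offered as an unverified sketch in Python 3 rather than a confirmed answer: report with DEBUG_SAVEALL set, then clear the flag, empty gc.garbage, and collect once more so that the collector frees the saved objects itself. The names SelfRef, live_instances, and report_and_release are hypothetical.

```python
import gc


class SelfRef:
    """Hypothetical class whose instances hold a reference to themselves."""

    def __init__(self):
        self.ref = self


def live_instances(cls):
    """Count the instances of cls that the collector is still tracking."""
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)


def report_and_release():
    """Describe unreachable objects as text, then let gc free them itself."""
    old_flags = gc.get_debug()
    gc.set_debug(old_flags | gc.DEBUG_SAVEALL)
    gc.collect()                     # unreachable objects land in gc.garbage
    lines = ['[%d] %r' % pair for pair in enumerate(gc.garbage)]
    gc.set_debug(old_flags & ~gc.DEBUG_SAVEALL)
    del gc.garbage[:]                # the objects are unreachable cycles again
    gc.collect()                     # collected normally, nothing is saved
    gc.set_debug(old_flags)
    return lines


gc.disable()                         # only manual sweeps from here on
gc.collect()                         # baseline: flush what is already pending
SelfRef()                            # creates a cycle and drops it at once
first = report_and_release()         # should mention the SelfRef instance
second = report_and_release()        # should no longer mention it
remaining = live_instances(SelfRef)
```

Unlike clearing containers by hand, this does not depend on the type of each item. That may matter here: the second numpy listing still shows a class and its tuple of bases, which suggests that type-specific clearing left a cycle intact.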
Afterthought:
It appears that the code below comes close to what I intended:
    def unreachable():
        import gc
        gc.set_threshold( 0 )
        gc.set_debug( gc.DEBUG_LEAK )
        gc.enable()
        print 'collecting {{{'
        gc.collect()
        print '}}} done'
However, for debugging I prefer rich string representations over the type/id output that gc provides. Also, I would like to understand the flaw in my earlier approach and learn something about the gc module.
Appreciating your help,
Gertjan
Update 06/05:
I ran into a situation where the first implementation did not report any unreachable objects unless locals() was called immediately beforehand (discarding the return value). I do not understand how that could affect gc's object tracking, so this leaves me even more confused. I am not sure how easy it will be to construct a small example that demonstrates the problem, but if there is demand for one I can give it a try.