Trouble understanding Python's gc.garbage (for tracking memory leaks)

One of my Python applications seems to be leaking memory, judging by its steadily increasing memory usage. My hypothesis is a cyclic reference somewhere, despite best efforts to avoid this. To isolate the problem, I am looking into ways to manually check for unreachable items, as a tool to be used purely for debugging.

The gc module seems capable of the necessary tracking, and I tried the following code, which aims to compile a list of unreachable items that have formed since the last call. The first call merely sets a base checkpoint and will not identify unreachable items.

    def unreachable():
        # first time setup
        import gc
        gc.set_threshold( 0 ) # only manual sweeps
        gc.set_debug( gc.DEBUG_SAVEALL ) # keep unreachable items as garbage
        gc.enable() # start gc if not yet running (is this necessary?)
        # operation
        if gc.collect() == 0:
            return 'no unreachable items'
        s = 'unreachable items:\n ' \
            + '\n '.join( '[%d] %s' % item for item in enumerate( gc.garbage ) )
        _deep_purge_list( gc.garbage ) # remove unreachable items
        return s # return unreachable items as text

Here _deep_purge_list aims to break cycles and remove objects manually. The following implementation handles some common cases but is far from watertight. My first question relates to this; see below.

    def _deep_purge_list( garbage ):
        for item in garbage:
            if isinstance( item, dict ):
                item.clear()
            if isinstance( item, list ):
                del item[:]
            try:
                item.__dict__.clear()
            except:
                pass
        del garbage[:]

Based on very limited testing, the setup seems to function correctly. The following cyclic reference is correctly reported once:

    class A( object ):
        def __init__( self ):
            self.ref = self

    print unreachable()
    # no unreachable items

    A()

    print unreachable()
    # unreachable items:
    # [0] <__main__.A object at 0xb74579ac>
    # [1] {'ref': <__main__.A object at 0xb74579ac>}

    print unreachable()
    # no unreachable items

However, in the following case something strange happens:

    print unreachable()
    # no unreachable items

    import numpy

    print unreachable()
    # unreachable items:
    # [0] (<type '_ctypes.Array'>,)
    # [1] {'__module__': 'numpy.ctypeslib', '__dict__': <attribute '__dict__' of 'c_long_Array_1' objects>, '__weakref__': <attribute '__weakref__' of 'c_long_Array_1' objects>, '_length_': 1, '_type_': <class 'ctypes.c_long'>, '__doc__': None}
    # [2] <class 'numpy.ctypeslib.c_long_Array_1'>
    # [3] <attribute '__dict__' of 'c_long_Array_1' objects>
    # [4] <attribute '__weakref__' of 'c_long_Array_1' objects>
    # [5] (<class 'numpy.ctypeslib.c_long_Array_1'>, <type '_ctypes.Array'>, <type '_ctypes._CData'>, <type 'object'>)

    print unreachable()
    # unreachable items:
    # [0] (<type '_ctypes.Array'>,)
    # [1] {}
    # [2] <class 'c_long_Array_1'>
    # [3] (<class 'c_long_Array_1'>, <type '_ctypes.Array'>, <type '_ctypes._CData'>, <type 'object'>)

Repeated calls keep returning that last result. The problem does not occur when unreachable() is called for the first time after the import. However, at this point I have no reason to believe that this issue is specific to numpy; my guess is that it exposes a flaw in my approach.
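The repeated report can be reproduced without numpy, which suggests (as a guess, not a confirmed diagnosis of the numpy case) that it comes from how DEBUG_SAVEALL works: saved objects are never freed by the collector, so emptying gc.garbage without breaking every cycle just makes the same objects unreachable again. A class object is cyclic by construction (it refers to its `__mro__` tuple, which refers back to it), and a tuple cannot be cleared by a purge helper. The sketch below is in Python 3 syntax; `Temp`, `make_class` and `saved_class_names` are hypothetical names used only for illustration.

```python
import gc


def make_class():
    # Create a class dynamically and drop the only outside reference to it.
    type('Temp', (object,), {})


def saved_class_names():
    # Names of all class objects currently held in gc.garbage.
    return [obj.__name__ for obj in gc.garbage if isinstance(obj, type)]


gc.collect()                       # start from a clean slate
gc.set_debug(gc.DEBUG_SAVEALL)     # unreachable objects go to gc.garbage

make_class()
gc.collect()
first_names = saved_class_names()  # 'Temp' shows up here

del gc.garbage[:]                  # drop the saved references, cycles intact
gc.collect()
second_names = saved_class_names() # 'Temp' shows up again

# Clean up: stop saving, drop the references, and let gc free the cycle.
gc.set_debug(0)
del gc.garbage[:]
gc.collect()
```

The instance of A in the earlier example does not recur because clearing its `__dict__` breaks the only cycle, after which plain reference counting frees it.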

My questions:

  • Is there a better way to remove items in gc.garbage? Ideally, is there a way to have gc remove them, as it would (should?) have done without DEBUG_SAVEALL?
  • Can anyone explain the numpy import problem and/or suggest ways to fix it?
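On the first question, one approach that may work is to switch DEBUG_SAVEALL off, drop the references held by gc.garbage, and collect once more, so that the collector itself frees the cycles instead of a hand-written purge. The following is a minimal sketch in Python 3 syntax; `Node`, `count_nodes` and `release_saved_garbage` are hypothetical names, and the sketch assumes no other debug flags need to be preserved.

```python
import gc


class Node(object):
    """Hypothetical class whose instances reference themselves."""
    def __init__(self):
        self.ref = self


def count_nodes():
    # Count live Node instances currently tracked by the collector.
    return sum(1 for obj in gc.get_objects() if type(obj) is Node)


def release_saved_garbage():
    """Stop saving garbage, drop the saved references, collect for real."""
    gc.set_debug(0)        # assumption: no other debug flags are in use
    del gc.garbage[:]      # gc.garbage was the only thing keeping them alive
    return gc.collect()    # the cycles are unreachable again and get freed


gc.collect()                       # start from a clean slate
gc.set_debug(gc.DEBUG_SAVEALL)
Node()                             # creates an unreachable cycle
gc.collect()                       # cycle is moved into gc.garbage, not freed
saved = count_nodes()              # the Node is still alive at this point
release_saved_garbage()
remaining = count_nodes()          # the Node has been freed by now
```

A reporting function could build its text from gc.garbage first, call something like release_saved_garbage(), and then switch DEBUG_SAVEALL back on before returning.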

Afterthought:

It appears that the code below does something close to what I intended:

    def unreachable():
        import gc
        gc.set_threshold( 0 )
        gc.set_debug( gc.DEBUG_LEAK )
        gc.enable()
        print 'collecting {{{'
        gc.collect()
        print '}}} done'

However, for debugging I prefer rich string representations over the bare type/id output that gc provides. Moreover, I would like to understand the flaw in my former approach and learn something about the gc module.

Appreciating your help,

Gertjan

Update 06/05:

I ran into a situation where the first implementation did not report any unreachable items unless locals() was called just before it (discarding the return value). I do not understand how this could possibly affect gc's object tracking, so it leaves me even more confused. I am not sure how easy it will be to build a small example that demonstrates this issue, but if there is demand for one I can give it a try.

1 answer

The last time I had a need like this, I ended up using the objgraph module. It provides much more precise information than you can easily get from the gc module directly. Unfortunately, I have no code at hand to illustrate its use.
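To give a rough idea of the kind of information involved, the sketch below counts live objects per type at two checkpoints and reports which types grew. It uses only the standard library and is not objgraph's API; it is written in Python 3 syntax, and `type_counts`, `growth`, `Leaky` and `registry` are hypothetical names for illustration.

```python
import gc
from collections import Counter


def type_counts():
    """Return a Counter of type names for all gc-tracked objects."""
    gc.collect()  # drop collectable garbage so only live objects are counted
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after):
    """Return {type name: increase} for types whose count went up."""
    return {name: after[name] - before[name]
            for name in after if after[name] > before[name]}


class Leaky(object):
    """Hypothetical class standing in for whatever the application leaks."""


registry = []                      # hypothetical long-lived container
before = type_counts()
registry.extend(Leaky() for _ in range(3))
after = type_counts()
leaked = growth(before, after)     # includes an entry for 'Leaky'
```

Taking such snapshots around a suspect operation shows which types accumulate even when no reference cycle is involved, which is a case gc.garbage does not cover.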

The one place where it falls short is memory allocated by any C libraries that are called. For example, if a project uses PIL, it is very easy to leak memory by not properly releasing Python objects that are backed by C data. It is up to the C module to define how such objects must be closed properly.


Source: https://habr.com/ru/post/946472/

