Why does one version leak memory, but not the other? (Python)

Both of these functions compute the same thing (the count of integers whose associated Collatz sequence has length not exceeding n) in essentially the same way. The only difference is that the first uses sets exclusively, while the second uses both sets and lists.

The second one leaks memory (in IDLE with Python 3.2, at least), while the first does not, and I have no idea why. I tried a few "tricks" (for example, adding del statements), but nothing helps (which is not surprising; those tricks should be unnecessary).

I would be grateful to everyone who could help me understand what is happening.

If you want to test the code, you should probably use an n value in the range 55 to 65; anything above 75 will almost certainly result in a (fully expected) MemoryError.

    def disk(n):
        """Uses sets for explored, current and to_explore. Does not leak."""
        explored = set()
        current = {1}
        for i in range(n):
            to_explore = set()
            for x in current:
                if not (x-1) % 3 and ((x-1)//3) % 2 and not ((x-1)//3) in explored:
                    to_explore.add((x-1)//3)
                if not 2*x in explored:
                    to_explore.add(2*x)
            explored.update(current)
            current = to_explore
        return len(explored)

    def disk_2(n):
        """Does exactly the same thing, but uses a set for explored and
        lists for current and to_explore. Leaks (like a sieve :))."""
        explored = set()
        current = [1]
        for i in range(n):
            to_explore = []
            for x in current:
                if not (x-1) % 3 and ((x-1)//3) % 2 and not ((x-1)//3) in explored:
                    to_explore.append((x-1)//3)
                if not 2*x in explored:
                    to_explore.append(2*x)
            explored.update(current)
            current = to_explore
        return len(explored)

EDIT: This also happens in the interpreter's interactive mode (without IDLE), but not when running the script directly from a terminal (in that case memory usage returns to normal some time after the function returns, or immediately after an explicit call to gc.collect()).

+4
3 answers

CPython allocates small objects (obmalloc.c, as of 3.2.3) from 4 KiB pools that it manages in 256 KiB chunks called arenas. Each active pool serves a fixed block size, from 8 to 256 bytes in increments of 8. For example, a 14-byte object is allocated from the first available pool with a block size of 16 bytes.
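The size-class mapping described above can be sketched with a small helper. This is an illustrative function written for this answer (the name `size_class` and the `None` convention for oversized requests are inventions here, not CPython API), not CPython's actual obmalloc code:

```python
def size_class(nbytes, alignment=8, small_max=256):
    """Round a small-object request up to its allocation size class,
    mimicking how obmalloc (3.2 era) buckets requests. Requests larger
    than small_max bypass the pools entirely (modeled here as None)."""
    if nbytes > small_max:
        return None  # handled by the general-purpose allocator, not a pool
    # round up to the next multiple of the 8-byte alignment
    return ((nbytes + alignment - 1) // alignment) * alignment

print(size_class(14))   # the 14-byte example object -> 16-byte pool
print(size_class(300))  # too large for the small-object pools
```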

There is a potential problem if arenas are allocated on the heap instead of via mmap (this is configurable through mallopt's M_MMAP_THRESHOLD): in that case the heap cannot shrink below the highest allocated arena, and an arena will not be released as long as even 1 block in 1 of its pools is still allocated to an object (CPython does not move objects around in memory).

Given the above, the following change to your function should probably solve the problem. Replace the return len(explored) with these 3 lines:

    result = len(explored)
    del i, x, to_explore, current, explored
    return result + 0

Having freed the containers and all referenced objects (releasing the arenas back to the system), this returns a new int via the expression result + 0. The heap cannot shrink as long as there is a reference to the first result object; in this version it is automatically freed when the function returns.
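The pattern can be seen in isolation. Below is a minimal sketch (the function `count_explored` is a hypothetical stand-in using a plain set of integers, not the questioner's Collatz logic):

```python
def count_explored(n):
    """Build a large set, then apply the answer's pattern: drop every
    local referencing pool-allocated objects before returning."""
    explored = set(range(n))
    result = len(explored)
    # free the container (and, transitively, its int elements) so the
    # arenas holding them can be returned to the system
    del explored
    # "result + 0" builds a fresh int for the caller; the original
    # `result` object, which may live in a high arena, dies on return
    return result + 0

print(count_explored(100000))
```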

If you are testing this interactively without the "plus 0" step, remember that the REPL (Read, Eval, Print, Loop) keeps a reference to the last result, accessible via the pseudo-variable _.

In Python 3.3 this should not be a problem, since the object allocator was changed to use anonymous mmap for arenas, where available. (The upper limit of the object allocator was also bumped to 512 bytes to accommodate 64-bit platforms, but that is immaterial here.)

Regarding manual garbage collection, gc.collect() performs a full collection of tracked container objects, but it also clears the freelists of objects maintained by built-in types (for example, frames, methods, floats). Python 3.3 added API functions to clear the freelists of lists (PyList_ClearFreeList), dicts (PyDict_ClearFreeList) and sets (PySet_ClearFreeList). If you would rather leave the freelists intact, use gc.collect(1).
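The difference in the generation argument can be demonstrated. The sketch below creates reference cycles (the one kind of garbage refcounting alone cannot reclaim) and collects only generations 0 and 1 with gc.collect(1); a full, no-argument gc.collect() would additionally clear the built-in type freelists:

```python
import gc

def make_cycles(n):
    """Create n unreachable reference cycles; plain reference counting
    can never reclaim these, only the cyclic garbage collector can."""
    for _ in range(n):
        a, b = [], []
        a.append(b)
        b.append(a)  # a <-> b cycle, unreachable once we return

gc.disable()            # keep automatic collection from running early
make_cycles(1000)
# Collect generations 0 and 1 only: the young cyclic garbage above is
# reclaimed, but the freelists kept by built-in types are left alone.
freed = gc.collect(1)
gc.enable()
print(freed)            # at least 2000 objects (two lists per cycle)
```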

+1

I doubt that it is leaking; I bet garbage collection just has not run yet, so the memory used keeps growing. This is because on each pass of the outer loop the previous current list becomes eligible for garbage collection, but it will not actually be reclaimed until the collector runs.

In addition, even if it is garbage collected, memory is usually not returned to the OS, so you need to use whatever method Python provides to get the currently used heap size, rather than relying on OS-level process statistics.
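One way to see why OS-level numbers mislead here is the process's peak resident set size, which is a high-water mark and never decreases. A sketch using the standard resource module (Unix-only; on Linux ru_maxrss is in KiB, on macOS in bytes):

```python
import resource

def peak_rss():
    """Peak resident set size of this process -- a high-water mark that
    only ever grows, much like the numbers shown by top."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss()
junk = [list(range(100)) for _ in range(10000)]  # allocate a pile of lists
during = peak_rss()
del junk                 # freed back to Python's allocator...
after = peak_rss()
# ...but the high-water mark (and typically the process footprint) stays up
print(before, during, after)
```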

If you add a garbage collection call at the end of each iteration of the outer loop, it may or may not reduce memory usage, depending on exactly how Python handles its heap and garbage collection without it.
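The suggestion above, applied to the questioner's disk_2, would look like the following hypothetical variant (the name disk_2_gc and the placement of the call are this sketch's own; whether it actually reduces memory depends on the allocator, as noted):

```python
import gc

def disk_2_gc(n):
    """disk_2 with an explicit gc.collect() after each outer iteration,
    so each discarded `current` list is reclaimed promptly."""
    explored = set()
    current = [1]
    for i in range(n):
        to_explore = []
        for x in current:
            if not (x - 1) % 3 and ((x - 1) // 3) % 2 \
                    and not ((x - 1) // 3) in explored:
                to_explore.append((x - 1) // 3)
            if not 2 * x in explored:
                to_explore.append(2 * x)
        explored.update(current)
        current = to_explore
        gc.collect()  # collect whatever the old `current` list kept alive

    return len(explored)

print(disk_2_gc(6))
```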

0

You have no memory leak. Linux processes do not release memory back to the OS until they exit. Accordingly, the statistics you see in, for example, top will only ever grow.

You only have a memory leak if, after completing a job of the same or smaller size, Python grabs more memory from the OS when it "should" be reusing the memory it held for objects that "should" have been garbage collected.

0

Source: https://habr.com/ru/post/1446660/
