Python memory leak: loaded attribute in a class is not garbage collected

I am using Python 2.7.14. The problem can be reproduced on OS X and Linux.

I have a Python class (saved as test_py.py):

    import cPickle

    class TestClass:
        def __init__(self, path_to_data=None):
            self.loaded_data = None
            if path_to_data:
                self.load(path_to_data)

        def load(self, path_to_data):
            self.loaded_data = None
            with open(path_to_data, 'r') as f:
                self.loaded_data = cPickle.load(f)

You can create a suitably sized pickled dictionary like this:

    >>> import cPickle
    >>> d = {x:x+1 for x in range(1000000)}
    >>> with open('testdict.pkl', 'w+') as f:
    ...     cPickle.dump(d, f)

And reproduce the problem as follows:

    >>> from test_py import TestClass
    >>> import psutil
    >>> import os
    >>> process = psutil.Process(os.getpid())
    >>> process.memory_info()
    pmem(rss=8085504L, vms=4405288960L, pfaults=2154, pageins=0)
    >>>
    >>> t = TestClass('testdict.pkl')
    >>> process.memory_info()
    pmem(rss=155897856L, vms=4552028160L, pfaults=38241, pageins=0)
    >>>
    >>> t = TestClass('testdict.pkl')
    >>> process.memory_info()
    pmem(rss=255520768L, vms=4651646976L, pfaults=62563, pageins=0)
    >>>
    >>> del t
    >>> process.memory_info()
    pmem(rss=255520768L, vms=4651646976L, pfaults=62563, pageins=0)

Why isn't the memory garbage collected? Something else doesn't quite add up: sys.getsizeof(t.loaded_data) returns only 50331928, but the RSS difference between the two loads is greater than that. Is this a bug or a feature that I don't understand, and how can I avoid it?
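On the size mismatch: sys.getsizeof is shallow, so for a dict it reports only the dict's own hash table and not the int objects stored as keys and values. Below is a minimal sketch that adds those in. rough_deep_sizeof is a hypothetical helper written for illustration, not a standard library function, and it uses a smaller dict than the one above. It says nothing about why RSS stays high after del.

```python
import sys


def rough_deep_sizeof(mapping):
    """Estimate bytes used by a dict plus its distinct keys and values.

    Hypothetical helper for illustration: it only looks one level deep,
    so it is not a general-purpose memory profiler.
    """
    seen = set()                      # ids of objects already counted
    total = sys.getsizeof(mapping)    # the dict's own hash table only
    for key, value in mapping.items():
        for obj in (key, value):
            if id(obj) not in seen:   # count shared objects only once
                seen.add(id(obj))
                total += sys.getsizeof(obj)
    return total


d = {x: x + 1 for x in range(100000)}   # smaller than the dict in the question
shallow = sys.getsizeof(d)
deep = rough_deep_sizeof(d)
print("shallow: %d bytes, with keys and values: %d bytes" % (shallow, deep))
```

The second number should come out noticeably larger than the first, which would account for at least part of the gap between sys.getsizeof and the RSS growth.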

Thanks!

EDIT

For those suggesting that cPickle may have a memory leak, here is a variant that uses marisa_trie instead:

    from marisa_trie import Trie

    class TestClass:
        def __init__(self, path_to_data=None):
            self.loaded_data = None
            if path_to_data:
                self.load(path_to_data)

        def load(self, path_to_data):
            self.loaded_data = None
            self.loaded_data = Trie().load(path_to_data)

Running this script

    from test_py import TestClass
    import psutil
    import os
    import gc

    process = psutil.Process(os.getpid())
    print 'empty process:', process.memory_info()
    t = TestClass('testtrie.trie')
    print 'first load:', process.memory_info()
    t = TestClass('testtrie.trie')
    print 'second load:', process.memory_info()
    gc.collect()
    print 'after gc.collect:', process.memory_info()

prints

    empty process: pmem(rss=8052736L, vms=4405383168L, pfaults=2158, pageins=134)
    first load: pmem(rss=9801728L, vms=4406640640L, pfaults=2585, pageins=158)
    second load: pmem(rss=11382784L, vms=4407898112L, pfaults=2971, pageins=158)
    after gc.collect: pmem(rss=11382784L, vms=4407898112L, pfaults=2971, pageins=158)

Here, testtrie.trie was built as follows:

    from marisa_trie import Trie
    Trie(unicode(x) for x in range(1000000)).save('testtrie.trie')


Source: https://habr.com/ru/post/1275898/

