I am using Python 2.7.14. The problem is reproducible on both OSX and Linux.
I have a Python class:
    import cPickle

    class TestClass:
        def __init__(self, path_to_data=None):
            self.loaded_data = None
            if path_to_data:
                self.load(path_to_data)

        def load(self, path_to_data):
            self.loaded_data = None
            with open(path_to_data, 'r') as f:
                self.loaded_data = cPickle.load(f)
You can create a suitably sized pickled dictionary:
    >>> import cPickle
    >>> d = {x: x+1 for x in range(1000000)}
    >>> with open('testdict.pkl', 'w+') as f:
    ...     cPickle.dump(d, f)
    ...
And reproduce the problem as follows:
    >>> from test_py import TestClass
    >>> import psutil
    >>> import os
    >>> process = psutil.Process(os.getpid())
    >>> process.memory_info()
    pmem(rss=8085504L, vms=4405288960L, pfaults=2154, pageins=0)
    >>>
    >>> t = TestClass('testdict.pkl')
    >>> process.memory_info()
    pmem(rss=155897856L, vms=4552028160L, pfaults=38241, pageins=0)
    >>>
    >>> t = TestClass('testdict.pkl')
    >>> process.memory_info()
    pmem(rss=255520768L, vms=4651646976L, pfaults=62563, pageins=0)
    >>>
    >>> del t
    >>> process.memory_info()
    pmem(rss=255520768L, vms=4651646976L, pfaults=62563, pageins=0)
Why isn't the memory garbage collected? Something else doesn't quite add up: sys.getsizeof(t.loaded_data) returns only 50331928, but the rss difference between the two loads is greater than that. Is this a bug or a feature that I don't understand, and how can I avoid it?
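On the getsizeof figure specifically: sys.getsizeof is shallow, so for a dict it reports only the dict's own hash table and not the key and value objects it references. That accounts for at least part of the mismatch, though not for the memory staying allocated after del. Below is a minimal sketch of a rough deeper estimate. The helper name deep_dict_size is hypothetical, it does not recurse into nested containers, and it uses a smaller dictionary than the one above so it runs quickly.

```python
import sys


def deep_dict_size(d):
    """Rough size in bytes of a dict plus its keys and values.

    Hypothetical helper for illustration only: it does not recurse into
    nested containers and counts an object once per reference to it.
    """
    total = sys.getsizeof(d)  # the dict's own hash table only
    for key, value in d.items():
        total += sys.getsizeof(key) + sys.getsizeof(value)
    return total


# Smaller than the 1000000-entry dict in the question, to keep this quick.
d = {x: x + 1 for x in range(100000)}

shallow = sys.getsizeof(d)   # what sys.getsizeof alone reports
deep = deep_dict_size(d)     # table plus the int objects it references
```

Comparing shallow and deep shows how much of the footprint sits in the referenced objects rather than in the table itself. Even the deeper number is only an estimate and should not be expected to match rss exactly.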
Thanks!
EDIT
For those suggesting that cPickle may have a memory leak, here is a variant that does not use it:
    from marisa_trie import Trie

    class TestClass:
        def __init__(self, path_to_data=None):
            self.loaded_data = None
            if path_to_data:
                self.load(path_to_data)

        def load(self, path_to_data):
            self.loaded_data = None
            self.loaded_data = Trie().load(path_to_data)
Running this script
    from test_py import TestClass
    import psutil
    import os
    import gc

    process = psutil.Process(os.getpid())
    print 'empty process:', process.memory_info()
    t = TestClass('testtrie.trie')
    print 'first load:', process.memory_info()
    t = TestClass('testtrie.trie')
    print 'second load:', process.memory_info()
    gc.collect()
    print 'after gc.collect:', process.memory_info()
prints
    empty process: pmem(rss=8052736L, vms=4405383168L, pfaults=2158, pageins=134)
    first load: pmem(rss=9801728L, vms=4406640640L, pfaults=2585, pageins=158)
    second load: pmem(rss=11382784L, vms=4407898112L, pfaults=2971, pageins=158)
    after gc.collect: pmem(rss=11382784L, vms=4407898112L, pfaults=2971, pageins=158)
(here testtrie.trie was built as follows:
    from marisa_trie import Trie
    Trie(unicode(x) for x in range(1000000)).save('testtrie.trie')
)