Measurement of heap size increase after loading a large object

I want to measure the increase in the total size of the Python heap when loading a large object. heapy seems to be what I need, but I don't understand the results.

I have a 350 MB pickle file containing a pandas DataFrame with about 2.5 million records. When I load the file and then check the heapy heap, it reports that only about 8 MB of objects have been added to the heap.

    import guppy
    import pickle

    h = guppy.hpy()
    h.setrelheap()
    df = pickle.load(open('test-df.pickle'))
    h.heap()

This gives the following result:

    Partition of a set of 95278 objects. Total size = 8694448 bytes.
     Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
         0  44700  47  4445944  51    4445944  51 str
         1  25595  27  1056560  12    5502504  63 tuple
         2   6935   7   499320   6    6001824  69 types.CodeType
    ...

What bothers me is the Total size of 8694448 bytes: only about 8 MB.

Why does the Total size not reflect the size of the whole DataFrame df?

(Using python 2.7.3, heapy 0.1.10, Linux 3.2.0-48-generic-pae (Ubuntu), i686)

2 answers

You can try pympler, which worked for me the last time I checked. If you are only interested in the overall increase in memory, rather than in a particular class, you can use an OS-specific call to report the total memory used by the process. For example, on a Unix-based OS, you can run something like the following before and after loading the object and take the difference.

    import resource
    resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
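A minimal sketch of that approach, assuming a Linux system where ru_maxrss is reported in kilobytes (on macOS it is in bytes), and reusing the test-df.pickle file name from the question:

    import pickle
    import resource

    def rss_kb():
        # Peak resident set size of this process; kilobytes on Linux.
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    before = rss_kb()
    df = pickle.load(open('test-df.pickle'))
    after = rss_kb()

    # Rough total memory growth caused by loading the object,
    # including buffers allocated by C extensions such as NumPy.
    print 'increase: %d KB' % (after - before)

Because ru_maxrss is a peak value, this only gives a meaningful difference when memory grows between the two measurements, which is the case when loading a large object.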

I had a similar problem when I tried to find out why my 500 MB CSV files were occupying up to 5 GB of memory. Pandas is mostly built on top of NumPy, which allocates its buffers with C malloc. That is why the data does not show up in heapy, which only profiles pure Python objects. One solution might be to look at valgrind to track where the memory goes.
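A quicker sanity check is to ask pandas itself how large the underlying NumPy buffers are. This sketch assumes the test-df.pickle file from the question and a pandas version recent enough to provide DataFrame.memory_usage; the .values.nbytes line works on older versions too:

    import pickle

    df = pickle.load(open('test-df.pickle'))

    # Size of the consolidated NumPy blocks, in bytes (does not follow
    # Python objects referenced from object-dtype columns).
    print df.values.nbytes

    # Per-column breakdown; deep=True also measures object-dtype columns.
    # Only available in newer pandas versions.
    print df.memory_usage(deep=True).sum()

Either number should be much closer to the real footprint of the DataFrame than the 8 MB that heapy reports, because it counts the C-allocated buffers that heapy cannot see.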


Source: https://habr.com/ru/post/948512/
