Measurement of heap size increase after loading a large object

I want to measure the increase in the total size of the Python heap when loading a large object. heapy seems to be what I need, but I don't understand the results.

I have a 350 MB pickle file containing a pandas DataFrame with about 2.5 million records. When I load the file and then check the heapy heap, it reports that only about 8 MB of objects have been added to the heap.

    import guppy
    import pickle

    h = guppy.hpy()
    h.setrelheap()
    df = pickle.load(open('test-df.pickle'))
    h.heap()

This gives the following result:

    Partition of a set of 95278 objects. Total size = 8694448 bytes.
     Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
         0  44700  47  4445944  51    4445944  51 str
         1  25595  27  1056560  12    5502504  63 tuple
         2   6935   7   499320   6    6001824  69 types.CodeType
    ...

What bothers me is the Total size of 8694448 bytes: only about 8 MB.

Why does the Total size not reflect the size of the whole DataFrame df?

(Using python 2.7.3, heapy 0.1.10, Linux 3.2.0-48-generic-pae (Ubuntu), i686)

2 answers

You can try pympler, which worked for me the last time I checked. If you are only interested in the overall increase in memory, rather than in a particular class, you can use an OS-specific call to report the total memory used by the process. For example, on a Unix-based OS, you can run something like the following before and after loading the object and take the difference.

    import resource
    resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
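A minimal sketch of that approach, assuming a Linux system where ru_maxrss is reported in kilobytes (on macOS it is in bytes), and reusing the test-df.pickle file name from the question:

    import pickle
    import resource

    def rss_kb():
        # Peak resident set size of this process; kilobytes on Linux.
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    before = rss_kb()
    df = pickle.load(open('test-df.pickle'))
    after = rss_kb()

    # Rough total memory growth caused by loading the object,
    # including buffers allocated by C extensions such as NumPy.
    print 'increase: %d KB' % (after - before)

Because ru_maxrss is a peak value, this only gives a meaningful difference when memory grows between the two measurements, which is the case when loading a large object.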

I had a similar problem when I tried to find out why my 500 MB CSV files were occupying up to 5 GB of memory. Pandas is mostly built on top of NumPy, which allocates its buffers with C malloc. That is why the data does not show up in heapy, which only profiles pure Python objects. One solution might be to look at valgrind to track where the memory goes.
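A quicker sanity check is to ask pandas itself how large the underlying NumPy buffers are. This sketch assumes the test-df.pickle file from the question and a pandas version recent enough to provide DataFrame.memory_usage; the .values.nbytes line works on older versions too:

    import pickle

    df = pickle.load(open('test-df.pickle'))

    # Size of the consolidated NumPy blocks, in bytes (does not follow
    # Python objects referenced from object-dtype columns).
    print df.values.nbytes

    # Per-column breakdown; deep=True also measures object-dtype columns.
    # Only available in newer pandas versions.
    print df.memory_usage(deep=True).sum()

Either number should be much closer to the real footprint of the DataFrame than the 8 MB that heapy reports, because it counts the C-allocated buffers that heapy cannot see.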


Source: https://habr.com/ru/post/948512/
