Python Multiprocessing - How to track memory usage?

I have not found a good way to track the memory usage of a Python script that uses multiprocessing. In particular, say I do this:

    import time

    biglist = range(pow(10, 7))
    time.sleep(5)

The memory usage is 1.3 GB, as measured by both /usr/bin/time -v and top.
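As a cross-check, the number /usr/bin/time -v prints as "Maximum resident set size" can also be read from inside the script. A minimal sketch, assuming Linux, where ru_maxrss is reported in kilobytes (note the snippets in this question are Python 2 style, where range materialises a full list):

    import resource
    import time

    biglist = range(pow(10, 7))  # Python 2: builds the whole list in memory
    time.sleep(5)

    # Peak RSS of this process, the same counter /usr/bin/time -v reports.
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print('peak RSS: %.1f MB' % (peak_kb / 1024.0))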

But now, say I do this:

    import time
    from multiprocessing import Pool

    def worker(x):
        biglist = range(pow(10, 7))
        time.sleep(5)
        return

    Pool(5).map(worker, range(5))

Now top reports 5 x 1.3 GB, which is true. But /usr/bin/time -v still reports 1.3 GB, which makes no sense to me. If it measures only the parent process, it should report close to 0; if it measures the parent and the children together, it should report 5 x 1.3 GB. Why does it say 1.3 GB?
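The same counters that /usr/bin/time reads are available via getrusage(2), which makes the discrepancy visible from inside the script. A sketch, assuming Linux: after the pool is joined, the parent can query its own peak RSS and, separately, that of its reaped children. As far as I can tell the kernel folds descendants into ru_maxrss as a maximum rather than a sum, which would explain the flat 1.3 GB figure; treat that as an assumption to verify on your system.

    import resource
    import time
    from multiprocessing import Pool

    def worker(x):
        biglist = range(pow(10, 7))  # ~1.3 GB per worker, as in the question
        time.sleep(5)

    if __name__ == '__main__':
        pool = Pool(5)
        pool.map(worker, range(5))
        pool.close()
        pool.join()  # children must be reaped before RUSAGE_CHILDREN sees them
        self_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        kids_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
        print('parent peak RSS:   %.1f MB' % (self_kb / 1024.0))
        print('children peak RSS: %.1f MB' % (kids_kb / 1024.0))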

Now let's try copy-on-write:

    import time
    from multiprocessing import Pool

    biglist = range(pow(10, 7))

    def worker(x):
        time.sleep(5)
        return

    Pool(5).map(worker, range(5))

Now /usr/bin/time -v again reports 1.3 GB, which is correct. But top reports 6 x 1.3 GB, which is not true: with copy-on-write, only about 1.3 GB should actually be in use.

How can I reliably track the memory usage of a Python script that uses multiprocessing?

1 answer

It really depends on what you mean by "reliable". You can use the pmap <pid> command to get statistics on a process's memory usage (I think you are interested in the total field). You also need to keep track of all the processes created while your program runs (I think ps --forest can help with that); a rough sketch of both steps follows.
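A stdlib-only sketch of that suggestion, assuming Linux: find a process's children by scanning /proc (the PPid and VmRSS field names come from /proc/[pid]/status), then sum RSS over the whole tree. Bear in mind that plain RSS sums double-count pages shared between processes, which is exactly the copy-on-write distortion from the question.

    import os

    def status_field(pid, field):
        # Pull one "Key: value" line out of /proc/[pid]/status.
        with open('/proc/%d/status' % pid) as f:
            for line in f:
                if line.startswith(field + ':'):
                    return line.split()[1]
        return None

    def child_pids(parent):
        # Direct children: every /proc entry whose PPid matches.
        kids = []
        for entry in os.listdir('/proc'):
            if entry.isdigit():
                try:
                    if status_field(int(entry), 'PPid') == str(parent):
                        kids.append(int(entry))
                except IOError:
                    pass  # process exited while we were scanning
        return kids

    def tree_rss_kb(root):
        # Sum VmRSS (reported in kB) over the whole process tree.
        total, stack = 0, [root]
        while stack:
            pid = stack.pop()
            try:
                rss = status_field(pid, 'VmRSS')
            except IOError:
                continue  # process vanished
            if rss is not None:  # kernel threads have no VmRSS
                total += int(rss)
            stack.extend(child_pids(pid))
        return total

    print('tree RSS: %.1f MB' % (tree_rss_kb(os.getpid()) / 1024.0))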

If you want more detailed information, you can look at /proc/[pid]/{smaps,status,maps} (see the proc(5) man page).
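In particular, the Pss ("proportional set size") lines in /proc/[pid]/smaps address the double counting: each shared page is divided among the processes that map it, so per-process Pss totals can be added up meaningfully. A sketch, assuming Linux (on kernels 4.14+, /proc/[pid]/smaps_rollup gives the same total more cheaply):

    def pss_kb(pid):
        # Sum the Pss lines (values are in kB) across all mappings.
        total = 0
        with open('/proc/%d/smaps' % pid) as f:
            for line in f:
                if line.startswith('Pss:'):
                    total += int(line.split()[1])
        return total

Summing pss_kb over the parent and the five workers in the copy-on-write example should come out near 1.3 GB rather than the 6 x 1.3 GB that top suggests, since the shared biglist pages get split six ways.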

Also keep in mind the difference between RSS and VSZ.
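For completeness, a quick way to print both numbers for the current process, assuming Linux: /proc/[pid]/statm reports the total virtual size (what top calls VIRT/VSZ) and the resident set size (RES/RSS), both in pages.

    import os

    page_kb = os.sysconf('SC_PAGE_SIZE') // 1024
    with open('/proc/self/statm') as f:
        vsz_pages, rss_pages = [int(n) for n in f.read().split()[:2]]
    print('VSZ: %d kB, RSS: %d kB' % (vsz_pages * page_kb, rss_pages * page_kb))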


Source: https://habr.com/ru/post/1246932/

