I have not found a good way to monitor the memory usage of a Python script that uses multiprocessing. To be concrete, say I do this:
    import time

    biglist = range(pow(10, 7))  # Python 2: range() materializes the whole list
    time.sleep(5)
The memory usage is 1.3 GB, as measured by both /usr/bin/time -v and top.
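For cross-checking, the peak RSS can also be read from inside the script itself; a minimal sketch, assuming Linux, where ru_maxrss is reported in kilobytes:

    import resource

    # Peak resident set size of this process so far.
    # Linux reports kilobytes here (macOS reports bytes instead).
    print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)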
But now, let's say I do this:

    import time
    from multiprocessing import Pool

    def worker(x):
        biglist = range(pow(10, 7))  # each worker builds its own list
        time.sleep(5)
        return

    Pool(5).map(worker, range(5))
Now top reports 5 x 1.3 GB, which is true. But /usr/bin/time -v still reports 1.3 GB, which makes no sense. If it measured only the parent process, it should report close to 0. If it measured the parent together with its children, it should report 5 x 1.3 GB. Why does it report 1.3 GB?
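One way to probe this from inside Python is getrusage. The sketch below rests on an assumption (Linux behaviour, as far as I can tell) that ru_maxrss under RUSAGE_CHILDREN is the maximum over the waited-for children rather than their sum, which would explain the 1.3 GB reading:

    import time
    import resource
    from multiprocessing import Pool

    def worker(x):
        biglist = range(pow(10, 7))
        time.sleep(5)
        return

    pool = Pool(5)
    pool.map(worker, range(5))
    pool.close()
    pool.join()  # workers must exit and be reaped before they count as children

    # Assumption: on Linux this is the maximum ru_maxrss over the waited-for
    # children, not the sum, matching what /usr/bin/time -v prints.
    print(resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss)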
Now let's try copy-on-write:

    import time
    from multiprocessing import Pool

    biglist = range(pow(10, 7))  # allocated once in the parent, before fork

    def worker(x):
        time.sleep(5)
        return

    Pool(5).map(worker, range(5))
Now /usr/bin/time -v reports 1.3 GB again, which is correct. But top reports 6 x 1.3 GB, which cannot be right: with copy-on-write, only about 1.3 GB should actually be in use.
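The discrepancy presumably comes from top's RES column counting the shared copy-on-write pages once per process. The proportional set size (PSS) in /proc/&lt;pid&gt;/smaps splits each shared page among the processes mapping it; a minimal sketch (Linux only, pss_kb is my own helper):

    import os

    def pss_kb(pid):
        # PSS divides each shared page among the processes that map it,
        # so copy-on-write pages shared between the parent and the pool
        # workers are not counted six times over, the way RES counts them.
        total = 0
        with open('/proc/%d/smaps' % pid) as f:
            for line in f:
                if line.startswith('Pss:'):
                    total += int(line.split()[1])  # values are in kB
        return total

    print(pss_kb(os.getpid()))

Summing pss_kb over the parent and the five workers should come out near 1.3 GB rather than 6 x 1.3 GB.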
How can I reliably measure the memory usage of a Python script that uses multiprocessing?