I have found many similar questions, but none of them answer mine. For a simple array there is multiprocessing.Array; for a sparse matrix (or any other arbitrary object) I found manager.Namespace. So I tried the code below:
    from scipy import sparse
    from multiprocessing import Pool
    import multiprocessing
    import functools

    def myfunc(x, ns):
        return ns.A[x, :] * ns.A * ns.A[:, x]

    manager = multiprocessing.Manager()
    Global = manager.Namespace()
    pool = Pool()
    Global.A = sparse.rand(10000, 10000, 0.5, 'csr')
    myfunc2 = functools.partial(myfunc, ns=Global)
    r = pool.map(myfunc2, range(100))
The code works, but it is not efficient: only 4 of my 16 workers are actually doing work. The reason, I think, is that the manager lets only one worker access the data at a time. Since the data is read-only, I really don't need a lock. Is there a better way to do this?
PS: I have seen people talking about copy-on-write and fork(). I don't really understand how it works, but it does not help in my case: if I generate A first and then create the Pool(), each process ends up with its own copy of A.
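For reference, this is roughly how I understood the fork suggestion (a minimal sketch of my attempt: A is built at module level before the Pool is created, so the forked workers should inherit it without going through a manager):

    from scipy import sparse
    from multiprocessing import Pool

    # Build A before the Pool exists, so the forked worker processes
    # inherit it directly instead of reaching it through a manager proxy.
    A = sparse.rand(10000, 10000, 0.5, 'csr')

    def myfunc(x):
        # only reads the global A
        return A[x, :] * A * A[:, x]

    if __name__ == '__main__':
        pool = Pool()
        r = pool.map(myfunc, range(100))

Even with this version, it looks to me like every worker still ends up holding its own copy of A, which is the part I don't understand.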
Thanks in advance.