I am trying to speed up my code with the Python multiprocessing.Pool module, but I am not getting the speedup I would logically expect.
The main operation is computing matrix-vector products between a fixed large sparse matrix and a large number of vectors. Below is a toy example that does what I need, but with random data.
    import time
    import numpy as np
    import scipy.sparse as sp

    def calculate(vector, matrix=None):
        # Repeat the product 50 times so each call does a measurable amount of work.
        for i in range(50):
            v = matrix.dot(vector)
        return v

    if __name__ == '__main__':
        N = 1000000
        matrix = sp.rand(N, N, density=1e-5, format='csr')
        t = time.time()
        res = []
        for i in range(10):
            res.append(calculate(np.random.rand(N), matrix=matrix))
        print(time.time() - t)
This script completes in about 30 seconds.
Now, since the calculation of each result does not depend on any other result, it is natural to think that parallelizing the calculation will speed things up. The idea is to create 4 processes, have each of them perform some of the calculations, and the total time should then drop by roughly a factor of 4. To do this, I wrote the following code:
    import time
    import numpy as np
    import scipy.sparse as sp
    from multiprocessing import Pool
    from functools import partial

    def calculate(vector, matrix=None):
        for i in range(50):
            v = matrix.dot(vector)
        return v

    if __name__ == '__main__':
        N = 1000000
        matrix = sp.rand(N, N, density=1e-5, format='csr')
        t = time.time()
        input = []
        for i in range(10):
            input.append(np.random.rand(N))
        # Bind the fixed matrix so the pooled function only takes the vector.
        mp = partial(calculate, matrix=matrix)
        p = Pool(4)
        res = p.map(mp, input)
        print(time.time() - t)
My problem is that this code takes just over 20 seconds to run, so I am not even getting a factor-of-2 improvement! Even worse, performance does not improve at all with 8 processes in the pool. Any idea why the speedup doesn't happen?
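One thing I have not ruled out (this is only a guess on my part) is that Pool has to pickle the matrix, since it is bound into the function that gets shipped to the workers, and that this serialization eats the time the parallelism saves. A quick way to measure that cost:

    import pickle
    import time
    import scipy.sparse as sp

    N = 1000000
    matrix = sp.rand(N, N, density=1e-5, format='csr')

    # Time one pickle round trip of the matrix; Pool pays a similar
    # serialization cost to move it between the parent and the workers.
    t = time.time()
    data = pickle.dumps(matrix, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(data)
    print('pickle round trip: %.2f s, payload %.0f MB'
          % (time.time() - t, len(data) / 1e6))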
Note: my actual task takes much longer, and the input vectors are stored in a file. If I split the file into 4 pieces and then manually run my script in a separate process for each piece, each process finishes four times faster than it would for the whole file (as expected). I am confused why this speedup (which is obviously possible) does not happen with multiprocessing.Pool. A sketch of the manual approach follows below.
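For reference, the manual splitting amounts to something like this (the names my_script.py and chunk_*.npy are hypothetical; each process loads the matrix on its own and works through one chunk of vectors):

    import subprocess

    # Launch four fully independent interpreter processes, one per chunk file.
    procs = [subprocess.Popen(['python', 'my_script.py', 'chunk_%d.npy' % i])
             for i in range(4)]
    for p in procs:
        p.wait()  # wait for all four processes to finish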
Edit: I just found the question Multiprocessing.Pool makes Numpy matrix multiplication slower, which may be related. I have to check, though.