I am trying to speed up my code with the Python multiprocessing.Pool module, but I am not getting the speedup I would logically expect.
The main operation is computing matrix-vector products between a fixed large sparse matrix and a large number of vectors. Below is a toy example that does what I need, but with random data.
    import time
    import numpy as np
    import scipy.sparse as sp

    def calculate(vector, matrix=None):
        # Repeat the product 50 times so each call does a measurable amount of work.
        for i in range(50):
            v = matrix.dot(vector)
        return v

    if __name__ == '__main__':
        N = 1000000
        matrix = sp.rand(N, N, density=1e-5, format='csr')
        t = time.time()
        res = []
        for i in range(10):
            res.append(calculate(np.random.rand(N), matrix=matrix))
        print(time.time() - t)
This script completes in about 30 seconds.
Now, since the calculation of each result does not depend on any other result, it is natural to think that parallelizing the calculation will speed things up. The idea is to create 4 processes, have each of them perform some of the calculations, and the total time should then drop by roughly a factor of 4. To do this, I wrote the following code:
    import time
    import numpy as np
    import scipy.sparse as sp
    from multiprocessing import Pool
    from functools import partial

    def calculate(vector, matrix=None):
        for i in range(50):
            v = matrix.dot(vector)
        return v

    if __name__ == '__main__':
        N = 1000000
        matrix = sp.rand(N, N, density=1e-5, format='csr')
        t = time.time()
        input = []
        for i in range(10):
            input.append(np.random.rand(N))
        # Bind the fixed matrix so the pooled function only takes the vector.
        mp = partial(calculate, matrix=matrix)
        p = Pool(4)
        res = p.map(mp, input)
        print(time.time() - t)
My problem is that this code takes just over 20 seconds to run, so I am not even getting a factor-of-2 improvement! Even worse, performance does not improve at all with 8 processes in the pool. Any idea why the speedup doesn't happen?
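One thing I have not ruled out (this is only a guess on my part) is that Pool has to pickle the matrix, since it is bound into the function that gets shipped to the workers, and that this serialization eats the time the parallelism saves. A quick way to measure that cost:

    import pickle
    import time
    import scipy.sparse as sp

    N = 1000000
    matrix = sp.rand(N, N, density=1e-5, format='csr')

    # Time one pickle round trip of the matrix; Pool pays a similar
    # serialization cost to move it between the parent and the workers.
    t = time.time()
    data = pickle.dumps(matrix, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(data)
    print('pickle round trip: %.2f s, payload %.0f MB'
          % (time.time() - t, len(data) / 1e6))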
Note: my actual task takes much longer, and the input vectors are stored in a file. If I split the file into 4 pieces and then manually run my script in a separate process for each piece, each process finishes four times faster than it would for the whole file (as expected). I am confused why this speedup (which is obviously possible) does not happen with multiprocessing.Pool. A sketch of the manual approach follows below.
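For reference, the manual splitting amounts to something like this (the names my_script.py and chunk_*.npy are hypothetical; each process loads the matrix on its own and works through one chunk of vectors):

    import subprocess

    # Launch four fully independent interpreter processes, one per chunk file.
    procs = [subprocess.Popen(['python', 'my_script.py', 'chunk_%d.npy' % i])
             for i in range(4)]
    for p in procs:
        p.wait()  # wait for all four processes to finish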
Edit: I just found the question Multiprocessing.Pool makes Numpy matrix multiplication slower, which may be related. I have to check, though.