Problem
I need to compute many Fourier transforms, and I would like to do this in parallel across my cores. Note that I do not need a parallel FFT algorithm; I just want to run many embarrassingly parallel FFTs.
I find that while my CPU usage goes up, my time to completion does not decrease.
Example
We create some random data.
In [1]: import numpy as np
In [2]: x = np.random.random(10000000)
And time how long it takes to compute the FFT, both cold and after having computed it once.
In [3]: %time _ = np.fft.rfft(x)
We compute a sequence of these FFTs serially.
In [5]: %time _ = map(np.fft.rfft, [x] * 12)
We do the same, but now use a thread pool of four threads.
In [7]: from multiprocessing.pool import ThreadPool
In [8]: pool = ThreadPool(4)
We find that there is no speedup. However, CPU utilization, as measured by top, is around 400%. This is not a problem with the GIL. There is something about the FFT that does not parallelize well. Maybe we are thrashing the higher-level caches?
Hardware: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz
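For anyone who wants to reproduce this outside IPython, here is a minimal standalone sketch of the same comparison. The function name time_ffts, its parameters, and the smaller default array size are my own choices for illustration and are not part of the session above. Note that in Python 3 the built-in map is lazy, so the serial case uses a list comprehension to force the work to actually run.

```python
import time
from multiprocessing.pool import ThreadPool

import numpy as np


def time_ffts(n_points=1_000_000, n_tasks=12, n_threads=4):
    """Time n_tasks real FFTs run serially, then through a thread pool.

    Hypothetical helper: the name and default sizes are illustrative only.
    Returns (serial_seconds, threaded_seconds).
    """
    x = np.random.random(n_points)
    tasks = [x] * n_tasks

    # Warm-up call so neither timing includes first-call overhead.
    np.fft.rfft(x)

    # Serial baseline: one FFT after another in the main thread.
    start = time.perf_counter()
    serial = [np.fft.rfft(a) for a in tasks]
    serial_time = time.perf_counter() - start

    # Same work, distributed over a pool of worker threads.
    with ThreadPool(n_threads) as pool:
        start = time.perf_counter()
        threaded = pool.map(np.fft.rfft, tasks)
        threaded_time = time.perf_counter() - start

    # Sanity check: both paths must produce the same spectra.
    assert all(np.allclose(s, t) for s, t in zip(serial, threaded))
    return serial_time, threaded_time


if __name__ == "__main__":
    serial_time, threaded_time = time_ffts()
    print(f"serial:   {serial_time:.3f} s")
    print(f"threaded: {threaded_time:.3f} s")
```

Setting n_points=10_000_000 matches the array size used above. The timings will of course depend on the machine and the NumPy build, so I am not quoting any expected numbers here.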
Question
In general, what is going on here, and is there a way to use multiple cores to speed up many independent FFTs run in parallel?