Dear Stack Overflow community!
Today I found that on a high-end cluster architecture, element-wise multiplication of two cubes with dimensions 1921 x 512 x 512 takes ~27 s. This is far too long, since my current implementation has to perform this calculation at least 256 times for the azimuthal averaging of a power spectrum. I found that the slow performance was mainly due to different stride layouts (C order in one array and Fortran order in the other). One of the two arrays was a newly created boolean grid (C order); the other (Fortran order) came from numpy.fft.fftn(), the Fourier transform of the input grid (itself C order). Is there any reason why numpy.fft.fftn() changes the strides, and are there ideas on how to prevent this other than changing the axes (which would be just a workaround)? With matching strides (taking an ndarray.copy() of the FT grid), ~4 s is achievable, a huge improvement.
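To make the mismatch concrete, here is a minimal sketch of the situation. The small shape is a stand-in so it runs quickly, and np.asfortranarray is used only to simulate a Fortran-ordered FFT result; both are assumptions for illustration, not my actual pipeline.

```python
import numpy as np

# Small stand-in shape (assumption); the real cubes are 1921 x 512 x 512.
shape = (48, 32, 32)

# A freshly created boolean grid is C-ordered by default.
mask = np.ones(shape, dtype=bool)

# Simulate a Fortran-ordered complex array, like the FFT output described above.
ft_f = np.asfortranarray(np.random.rand(*shape) + 0j)

# One-time copy into C order so both operands share the same memory layout.
ft_c = np.ascontiguousarray(ft_f)

mixed = mask * ft_f      # operands with mismatched layouts
matched = mask * ft_c    # operands with matching layouts

print(mask.flags['C_CONTIGUOUS'], ft_f.flags['F_CONTIGUOUS'], ft_c.flags['C_CONTIGUOUS'])
print(ft_f.strides, ft_c.strides)
```

Both products hold the same values; only the memory access pattern during the multiplication differs, which is where the timing gap seems to come from.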
The question is this:
Consider an array:
>>> import numpy as np
>>> ran = np.random.rand(1921, 512, 512)
>>> ran.strides
(2097152, 4096, 8)
>>> a = np.fft.fftn(ran)
>>> a.strides
(16, 30736, 15736832)
As we can see, the stride layout is different. How can this be prevented (without using a = np.fft.fftn(ran, axes=(1,0)))? Are there any other NumPy array methods that can affect the stride layout? What can be done in these cases?
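For reference, a small sketch of operations I am aware of that change strides without touching the data, and of forcing a known layout afterwards. The tiny array is an assumption for illustration, and the layout that np.fft.fftn returns may depend on the NumPy version, so the sketch normalizes it rather than assuming it.

```python
import numpy as np

# Tiny C-ordered array (assumption) so the strides are easy to read.
a = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
print(a.strides)   # (96, 32, 8)

# Transposing returns a view with reversed strides; no data is copied.
t = a.T
print(t.strides)   # (8, 32, 96)

# Slicing with a step also yields a non-contiguous view.
s = a[:, :, ::2]
print(s.strides)   # (96, 32, 16)

# Whatever layout fftn returns, ascontiguousarray guarantees C order
# (it copies only if the input is not already C-contiguous).
f = np.fft.fftn(a)
f_c = np.ascontiguousarray(f)
print(f_c.flags['C_CONTIGUOUS'], f_c.strides)
```

Checking arr.flags before a heavy element-wise operation, and copying once into a common order, is the workaround I am using at the moment.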
Useful tips, as usual, are greatly appreciated!