Can anyone help? I am a fairly experienced MATLAB user, but I have been unable to speed up the code below.
The best time I have achieved for one pass through all three loops, using 12 cores, is about 200 s. The real function will be called about 720 times, so the total run would take 40 hours or more (720 × 200 s = 144,000 s). According to the MATLAB profiler, most of the CPU time is spent in the call to exp. I was able to speed that call up significantly by using gpuArray and running exp on a Quadro 4000 graphics card; however, that rules out the parfor loop, because the workstation has only one graphics card, and this wipes out any gain. Can anyone help, or is this code already close to the best that can be achieved in MATLAB? I also wrote a very crude C++ implementation using OpenMP, but it gave only a small gain.
Thank you very much in advance
function SPEEDtest_CPU
% Variable setup:
% - For testing I'll use random variables. These will actually be fed into
%   the function for the real version of this code.
sy = 320;
sx = 100;
sz = 32;
A = complex(rand(sy,sx,sz),rand(sy,sx,sz));
B = complex(rand(sy,sx,sz),rand(sy,sx,sz));
C = rand(sy,sx);
D = rand(sy*sx,1);
F = zeros(sy,sx,sz);
x = rand(sy*sx,1);
y = rand(sy*sx,1);
x_ind = (1:sx) - (sx / 2) - 1;
y_ind = (1:sy) - (sy / 2) - 1;

% MAIN LOOPS
% - In the real code this set of three loops will be called ~720 times!
% - Using 12 cores, the fastest I have managed is ~200 seconds for one
%   call of this function.
tic
for z = 1 : sz
    A_slice = A(:,:,z);
    A_slice = A_slice(:);
    parfor cx = 1 : sx
        for cy = 1 : sy
            E = ( x .* x_ind(cx) ) + ( y .* y_ind(cy) ) + ( C(cy,cx) .* D );
            F(cy,cx,z) = (B(cy,cx,z) .* exp(-1i .* E))' * A_slice;
        end
    end
end
toc
end
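One observation about the loops above: E depends only on cx and cy, not on z, so the same exp(-1i .* E) vector is recomputed for each of the sz slices. Since B(cy,cx,z) is a scalar, (B(cy,cx,z) .* w)' * A_slice equals conj(B(cy,cx,z)) * (w' * A_slice), so the z loop can be folded into a single matrix product. The following is only an untested sketch of that idea, not a measured result; the function name speedtest_hoisted and the helper variables A2, Fcol, xe, w and Bcx are hypothetical names introduced here.

```matlab
function F = speedtest_hoisted(A, B, C, D, x, y)
% Sketch (assumption: same inputs and sizes as in SPEEDtest_CPU above).
% exp() is evaluated once per (cy,cx) pair and reused for every z slice.
[sy, sx, sz] = size(A);
x_ind = (1:sx) - (sx / 2) - 1;
y_ind = (1:sy) - (sy / 2) - 1;

% Column z of A2 is the same vector as A_slice(:) for that z.
A2 = reshape(A, sy*sx, sz);
F  = complex(zeros(sy, sx, sz));

parfor cx = 1 : sx
    Fcol = complex(zeros(sy, sz));       % w' * A2 for every cy in this column
    xe   = x .* x_ind(cx);               % part of E that depends only on cx
    for cy = 1 : sy
        E = xe + ( y .* y_ind(cy) ) + ( C(cy,cx) .* D );
        w = exp(-1i .* E);               % independent of z
        Fcol(cy,:) = w' * A2;            % 1-by-sz row, one entry per z slice
    end
    % Apply the scalar factor conj(B(cy,cx,z)) to each entry.
    Bcx = reshape(B(:,cx,:), sy, sz);
    F(:,cx,:) = reshape(conj(Bcx) .* Fcol, sy, 1, sz);
end
end
```

If this is correct, the number of exp evaluations drops by a factor of sz (32 in the test setup). Before relying on it, compare its output with the original triple loop on a small problem, for example by checking that max(abs(F_new(:) - F_old(:))) is at rounding-error level.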