I need to perform a discretized convolution of (complex) matrices, so I define the following function in Julia:
```julia
function convolve(M::Array{Complex{Float64},2}, K::Array{Float64,2}, p::Int)
    (n, m) = size(M)
    res = zeros(Complex{Float64}, n)
    for k = 1:p
        for l = 1:n
            res[l] += M[l,k] * K[l,end-p+k]
        end
    end
    return res
end
```
I use it like this:
```julia
M = complex(rand(2000, 2000))
K = rand(2000, 2000)
@time convolve(M, K, 2000)
```
This is relatively fast and, surprisingly, about 3 times faster than the vectorized version, in which I replace the inner loop with `res += M[:,k].*K[:,end-p+k]`. (I think this is due to the large memory allocation for temporary arrays; I can live with that.)
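For reference, the vectorized variant I'm comparing against looks roughly like this (the name `convolve_vec` is just mine for this sketch):

```julia
# Vectorized variant: the inner loop over l is replaced by whole-column
# operations, which allocate temporary arrays on every iteration of k.
function convolve_vec(M::Array{Complex{Float64},2}, K::Array{Float64,2}, p::Int)
    n = size(M, 1)
    res = zeros(Complex{Float64}, n)
    for k = 1:p
        res += M[:,k] .* K[:,end-p+k]  # allocates both column slices plus their product
    end
    return res
end
```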
But the vectorized MATLAB code runs about 5 times faster:
```matlab
function res = convolve(M, K, p)
    n = size(M,1);
    res = zeros(n,1);
    for k = 1:p
        res = res + M(:,k).*K(:,end-p+k);
    end
end
```
What am I doing wrong, and how do I get Julia to perform such multiple multiplications as fast as MATLAB? Is this an indexing issue?
Note: I checked with `@code_warntype` that there is no type instability (no `Any`, `Union`, etc.), but the problem could be more subtle. The `@code_llvm` macro gives an astonishingly long output, and since I'm not an expert, it's hard for me to understand what is happening.
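For completeness, this is roughly what I ran to inspect the generated code (with `M` and `K` defined as above):

```julia
# Inspect type inference and the generated LLVM IR for the call used above.
@code_warntype convolve(M, K, 2000)
@code_llvm convolve(M, K, 2000)
```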