I have been writing MATLAB scripts for some time, and yet I still don’t understand how MATLAB works “under the hood”. Consider the following script, which performs the same calculations on (large) vectors in three different ways:
1. MATLAB vector operations;
2. A simple for loop that performs the same calculations component-wise;
3. An optimized loop, which should be faster than 2. because it avoids some allocations and some assignments.
Here is the code:
N = 10000000;
A = linspace(0,100,N);
B = linspace(-100,100,N);
C = linspace(0,200,N);
D = linspace(100,200,N);

% 1. MATLAB operations
tic
C_ = C./A;
D_ = D./B;
G_ = (A+B)/2;
H_ = (C_+D_)/2;
I_ = (C_.^2+D_.^2)/2;
X = G_ .* H_;
Y = G_ .* H_.^2 + I_;
toc

tic
X; Y;
toc

% 2. Simple loop
tic
C_ = zeros(1,N);
D_ = zeros(1,N);
G_ = zeros(1,N);
H_ = zeros(1,N);
I_ = zeros(1,N);
X = zeros(1,N);
Y = zeros(1,N);
for i = 1:N,
    C_(i) = C(i)/A(i);
    D_(i) = D(i)/B(i);
    G_(i) = (A(i)+B(i))/2;
    H_(i) = (C_(i)+D_(i))/2;
    I_(i) = (C_(i)^2+D_(i)^2)/2;
    X(i) = G_(i) * H_(i);
    Y(i) = G_(i) * H_(i)^2 + I_(i);
end
toc

tic
X; Y;
toc

% 3. Optimized loop
tic
X = zeros(1,N);
Y = zeros(1,N);
for i = 1:N,
    X(i) = (A(i)+B(i))/2 * (( C(i)/A(i) + D(i)/B(i) ) /2);
    Y(i) = (A(i)+B(i))/2 * (( C(i)/A(i) + D(i)/B(i) ) /2)^2 + ( (C(i)/A(i))^2 + (D(i)/B(i))^2 ) / 2;
end
toc

tic
X; Y;
toc
I know that one should always try to vectorize calculations, since MATLAB is built around matrices / vectors (even though, nowadays, vectorizing is not always the best choice), so I expect something like:
C = A .* B;
to be faster than:
for i = 1:N, C(i) = A(i) * B(i); end
What I do not expect is that vectorization is actually faster even in the script above, despite the fact that the second and third methods go through only one loop, whereas the first method performs many vector operations (each of which is, theoretically, a "for" loop of its own). This leads me to conclude that MATLAB has some magic that allows (for example):
C = A .* B; D = C .* C;
to be faster than a single “for” loop with some work inside it.
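To make that smaller comparison concrete, here is a minimal sketch that times the two chained vector operations against one fused loop, in the same tic/toc style as the script above. The size N = 1e6 and the names Dvec, Dloop, tVec and tLoop are assumptions chosen for illustration; the timings will depend on the machine and the MATLAB release, so no expected numbers are given.

```matlab
% Sketch: two chained vector operations vs. one fused loop.
% N and all variable names below are illustrative assumptions.
N = 1e6;
A = linspace(0, 100, N);
B = linspace(-100, 100, N);

% Vectorized: two passes over the data, one temporary array C.
tic
C = A .* B;
Dvec = C .* C;
tVec = toc;

% Fused loop: one pass over the data, no temporary array.
tic
Dloop = zeros(1, N);
for i = 1:N
    c = A(i) * B(i);      % scalar temporary instead of a full vector
    Dloop(i) = c * c;
end
tLoop = toc;

fprintf('vectorized: %.6f s, fused loop: %.6f s\n', tVec, tLoop);
```

Both variants compute the same values, so any difference in the printed times comes from how the work is executed and not from the arithmetic itself.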
So:
- What is the magic that allows the 1st part to execute so quickly?
- When you write "D = A .* B", does MATLAB actually do a component-wise computation with a for loop, or does it simply keep track that D contains some multiplication of "bla" and "bla"?
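One way I could think of to probe the second question from the outside is sketched below. It checks how much memory D occupies right after the assignment and whether a later change to A shows up in D. The vector length and the variable name info are assumptions for illustration, and this only shows how the result behaves, not what MATLAB does internally.

```matlab
% Sketch: does D behave like a fully computed array after "D = A .* B"?
% The size 1e6 and the name "info" are illustrative assumptions.
A = rand(1, 1e6);
B = rand(1, 1e6);

D = A .* B;

% whos reports the storage of the variable; a full double vector
% needs 8 bytes per element.
info = whos('D');
fprintf('D occupies %d bytes, 8*numel(D) = %d\n', info.bytes, 8*numel(D));

% Change an input afterwards: if D were only a recorded "A times B"
% expression, this change could leak into D.
A(1) = NaN;
fprintf('D(1) is NaN after modifying A: %d\n', isnan(D(1)));
```

If D reports the full 8 bytes per element and D(1) is unaffected by the later change to A, then D at least behaves like an array whose values were computed at assignment time.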
EDIT
- Suppose I want to implement the same calculations in C++ (maybe using some library). Will the first MATLAB method be faster than the third one implemented in C++? (I will answer this question myself, just give me some time.)
EDIT 2
As requested, here are the measured execution times (in seconds, as reported by toc):
Part 1: 0.237143
Part 2: 4.440132 of which 0.195154 for allocation
Part 3: 2.280640 of which 0.057500 for allocation
and without JIT:
Part 1: 0.337259
Part 2: 149.602017 of which 0.033886 for allocation
Part 3: 82.167713 of which 0.010852 for allocation