For simplicity, my answer assumes square n-by-n matrices, but the same reasoning applies to non-square ones.
Your loop performs one matrix-vector multiplication per iteration. With the naive algorithm, which is also the most common one, each matrix-vector product takes O(n^2) time. Because it is repeated n times, the total runtime is O(n^3).
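To make that count concrete, here is a short worked derivation. It assumes A is n-by-n and writes b_i for the i-th vector you multiply by (column i of S in the timing code). It counts one multiplication and one addition as separate floating-point operations, which is an assumption about how the cost is measured:

```latex
\begin{align*}
(A b_i)_r &= \sum_{c=1}^{n} A_{rc}\,(b_i)_c
  && \text{$n$ multiplications and $n-1$ additions per entry} \\
\operatorname{cost}(A b_i) &= n(2n-1) = O(n^2)
  && \text{$n$ entries in the result} \\
\text{total} &= n \cdot n(2n-1) = 2n^3 - n^2 = O(n^3)
  && \text{one product for each of the $n$ vectors}
\end{align*}
```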
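Matrix-matrix multiplication can be done asymptotically faster. The best-known algorithms need a little less than O(n^2.4) operations, which in theory makes them much faster for large n. In practice, MATLAB's `A*B` relies on a highly optimized BLAS library, so most of its speed advantage over a loop comes from efficient use of memory and the cache.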
You can get a better execution time by multiplying several b_i vectors at once with a single matrix multiplication. This will not match the performance of one full matrix multiplication, but working with large slices of b is probably the fastest memory-efficient solution.
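Batching rests on a simple identity: multiplying A by a block of k columns gives the same result as k separate matrix-vector products. Here is a sketch of the operation count, assuming for simplicity that the batch size k divides n and using the same flop convention as the plain loop:

```latex
\begin{align*}
A\,\bigl[\, b_{(j-1)k+1} \;\cdots\; b_{jk} \,\bigr]
  &= \bigl[\, A b_{(j-1)k+1} \;\cdots\; A b_{jk} \,\bigr],
  \qquad j = 1, \dots, n/k \\
\text{total} &= \frac{n}{k} \cdot k\,n(2n-1) = n^2(2n-1) = O(n^3)
\end{align*}
```

The arithmetic is the same as in the simple loop. Any gain from batching therefore comes from doing fewer, larger multiplications that use memory and the cache more efficiently, not from doing less work.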
Here is some code for the approaches discussed:
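```matlab
n = 5000;
k = 100;                        % batch size: columns multiplied per call
A = rand(n, n);
S = rand(n, n);

pool = gcp();                   % Parallel Computing Toolbox; replaces the removed matlabpool('size')
workers = pool.NumWorkers;
% For a parfor solution, the batch size must be smaller,
% because multiple batches are stored in memory at once.
kparallel = max(1, floor(k / workers));

disp('simple loop:');
tic;
for i = 1:n
    product = A * S(:, i);      % one matrix-vector product per column
end
toc

disp('batched loop:');
tic;
for i = 1:ceil(n / k)
    cols = (i-1)*k + 1 : min(i*k, n);                  % last batch may be shorter
    product = A * S(:, cols);
end
toc

disp('batched parfor loop:');
tic;
parfor i = 1:ceil(n / kparallel)
    cols = (i-1)*kparallel + 1 : min(i*kparallel, n);  % last batch may be shorter
    product = A * S(:, cols);
end
toc

disp('matrix multiplication:');
tic;
product = A * S;
toc
```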