MATLAB repeated matrix multiplication - loop versus built-in

Given a matrix A , I need to multiply it with n other vectors Bi (i.e. i=1...n ). A can be as large as 5000x5000 and thus each Bi is 5000x1 .

If I evaluate the products as follows:

 for i=1:n
     product = A*Bi;
     % do something with product
 end

the result is way (an order of magnitude) slower than computing the products like this:

 % assume that S is a matrix that contains the vectors Bi as columns, i.e. S(:,i)=Bi, then:
 results = A*S;   % stores all the products in matrix form
 % do something with results

The problem is that the number n of Bi vectors can be too large for all of them to be stored in memory at once, for example n=300000 , so I need a loop-based approach where each time I evaluate a product, I use and then discard the corresponding Bi vector.
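For a rough sense of scale (assuming double-precision values at 8 bytes each, which is MATLAB's default):

 rows = 5000;      % length of each Bi
 n    = 300000;    % number of Bi vectors
 bytesNeeded = rows * n * 8;
 fprintf('Storing S = [B1 ... Bn] needs about %.1f GB\n', bytesNeeded / 2^30);
 % roughly 11 GB, which does not fit in memory next to A and the results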

Why is this approach so slow compared to direct multiplication, and are there any ways to overcome this?

4 answers

You can try looping over batches instead, for example:

 for i = 0:(n/k)-1
     product = A*S(:,(i*k+1):(i+1)*k);
 end

Then tune k to find the best speed/memory trade-off for your machine.
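Note that the loop above assumes n is an exact multiple of k. If it isn't, one way to clamp the last batch (a sketch, with illustrative variable names) is:

 k = 100;                          % batch size, tune as discussed above
 for first = 1:k:n                 % start index of each batch
     last = min(first + k - 1, n); % the final batch may be shorter
     product = A*S(:, first:last);
     % do something with product
 end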

MATLAB loops are slow because MATLAB is an interpreted language, so it has to do a lot of work on the fly. Loops have improved a great deal thanks to the JIT compiler, but they are still slow compared to the built-in functions, which are written and compiled in C. On top of that, those built-ins use state-of-the-art, highly optimized matrix multiplication routines, compared to the rather naive algorithm you get from a loop, which also contributes to the speed difference you are seeing.


For simplicity, my answer assumes a square n-by-n matrix, but the same applies to non-square matrices.

Your loop method uses matrix-vector multiplication. The naive approach is also the best known one and has O(n^2) runtime, which is repeated n times, so you end up with a total runtime of O(n^3).

Full matrix-matrix multiplication can do better. The best known algorithm needs only a little less than O(n^2.4), which makes it much faster for large n.

You will get better runtimes by multiplying several Bi vectors at once using matrix-matrix multiplication. This will not match the performance of a single full matrix multiplication, but working on large slices of B is probably the fastest memory-efficient option.

Some code for the various approaches discussed:

 n = 5000; k = 100;
 A = rand(n,n); S = rand(n,n);
 workers = matlabpool('size');
 % for a parfor solution, the batch size must be smaller because
 % multiple batches are stored in memory at once
 kparallel = k/workers;

 disp('simple loop:');
 tic;
 for i = 1:n
     product = A*S(:,i);
 end
 toc

 disp('batched loop:');
 tic;
 for i = 1:(n/k)
     product = A*S(:,(i-1)*k+1:i*k);
 end
 toc

 disp('batched parfor loop:');
 tic;
 parfor i = 1:(n/kparallel)
     product = A*S(:,(i-1)*kparallel+1:i*kparallel);
 end
 toc

 disp('matrix multiplication:');
 tic;
 A*S;
 toc
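Note that matlabpool was replaced by parpool in newer MATLAB releases; a rough equivalent for obtaining the worker count (a sketch, not part of the original answer) would be:

 pool = gcp('nocreate');   % get the current parallel pool without starting one
 if isempty(pool)
     pool = parpool;       % start a pool with the default profile
 end
 workers = pool.NumWorkers;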

In addition to @Dan's answer, you could try running the batches in parallel if you have enough cores and large enough operations to make it worthwhile (see this answer for more details on memory considerations with parfor ):

 parfor ii = 0:(n/k)-1
     product = A*S(:,(ii*k+1):(ii+1)*k);
 end

I can't see from the docs on mtimes (the * operator) whether it is implicitly multithreaded, but I think it's worth a shot.
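One way to probe this yourself is to compare the built-in multiplication with one computational thread and with all of them, using maxNumCompThreads (timings are of course machine-dependent):

 oldThreads = maxNumCompThreads;   % remember the current setting
 maxNumCompThreads(1);             % force single-threaded built-ins
 tic; A*S; tSingle = toc;
 maxNumCompThreads(oldThreads);    % restore all threads
 tic; A*S; tMulti = toc;
 fprintf('single-thread: %.2fs, multi-thread: %.2fs\n', tSingle, tMulti);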


To multiply each vector with the matrix, simply multiply the matrix by a single matrix that has the vectors you need as its columns.

To check this yourself: if

 size(a) == [3 3]

then

 a*b == horzcat(a*b(:,1), a*b(:,2), a*b(:,3))

holds.

This way you save a lot of the time spent in the loop.
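A minimal sketch of that check (the tolerance-based comparison is an addition here, since floating-point round-off can make an exact == comparison fail):

 a = rand(3,3);
 b = rand(3,3);
 full    = a*b;                                   % one matrix-matrix product
 columns = horzcat(a*b(:,1), a*b(:,2), a*b(:,3)); % column-by-column products
 assert(norm(full - columns) < 1e-10);            % equal up to round-off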


Source: https://habr.com/ru/post/1240167/

