Solving multiple linear systems using vectorization

Sorry if this is obvious, but I searched for a while and didn't find anything (or missed it).

I am trying to solve linear systems of the form Ax = B, where A is a 4x4 matrix and B is a 4x1 vector.

I know that for one system I can use mldivide to get x: x=A\B

However, I am trying to solve a large number of systems (possibly > 10000), and I would rather not use a for loop, because I was told that loops are noticeably slower than the matrix formulation in many MATLAB problems.

So my question is: is there a way to solve Ax = B using vectorization, with A a 4x4xN array and B a 4xN matrix?

PS: I do not know if this is important, but the vector B is the same for all systems.

+6
5 answers

You should use a for loop. It may be advantageous to precompute a factorization and reuse it if A stays the same and B changes. But for your problem, where A changes and B stays the same, there is no alternative to solving N linear systems.

You also should not worry too much about the cost of the loop itself: thanks to the MATLAB JIT compiler, loops are often just as fast in recent versions of MATLAB.
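
For the opposite scenario mentioned above (A fixed and B changing), a minimal sketch of precomputing and reusing an LU factorization might look like this; the sizes are assumed purely for illustration and are not the asker's case:

    A = rand(4,4);                 % single coefficient matrix, assumed fixed
    B = rand(4,10000);             % many right-hand sides, one per column
    [L,U,p] = lu(A,'vector');      % factor once
    X = U \ (L \ B(p,:));          % reuse the factors for every column of B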

+5

I do not think you can optimize this much further. As @Tom explained, since it is A that changes, there is no point in factorizing the different A matrices in advance...

In addition, the loop-based solution is pretty fast given the dimensions involved:

    A = rand(4,4,10000);
    B = rand(4,1);                %# same for all linear systems

    tic
    X = zeros(4,size(A,3));
    for i=1:size(A,3)
        X(:,i) = A(:,:,i)\B;
    end
    toc

Elapsed time is 0.168101 seconds.

+5

Here's the problem: you are trying to perform a 2D operation (mldivide) on a 3D array. No matter how you look at it, you need to reference the matrix by index, and that is where the time penalty comes in... It is not the for loop itself that is the problem, but how people use it. If you can structure your problem differently, you may be able to find a better option, but as it stands you have a few options:

1 - mex

2 - parallel processing (write a parfor loop; see the sketch after this list)

3 - CUDA
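
For option 2, a minimal sketch using parfor, assuming the Parallel Computing Toolbox is available (the data setup mirrors @Amro's example above):

    A = rand(4,4,10000);
    B = rand(4,1);
    X = zeros(4,size(A,3));
    parfor i = 1:size(A,3)
        X(:,i) = A(:,:,i)\B;   % each 4x4 slice is solved independently on a worker
    end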

+3

Here is a rather esoteric solution that exploits MATLAB's sparse-matrix machinery. Build a huge 4k x 4k sparse matrix with the 4x4 blocks on its diagonal. Then solve everything at once.

On my machine, this gets the same answer, to within the same accuracy, as the @Amro/Tom for-loop solution, but faster.

    n = size(A,1);
    k = size(A,3);
    AS = reshape(permute(A,[1 3 2]),n*k,n);
    S = sparse( ...
        repmat(1:n*k,n,1)', ...
        bsxfun(@plus,reshape(repmat(1:n:n*k,n,1),[],1),0:n-1), ...
        AS, ...
        n*k,n*k);
    X = reshape(S\repmat(B,k,1),n,k);

For a random example:

    For k = 10000
    For loop:            0.122570 seconds.
    Giant sparse system: 0.032287 seconds.

If you know that your 4x4 matrices are positive definite, you can use chol on S to increase accuracy.
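
A minimal sketch of that chol variant, under the assumption that every 4x4 block (and hence S) is symmetric positive definite; S, B, n, and k are as constructed above:

    R = chol(S);                                  % S == R'*R, with R sparse upper triangular
    X = reshape(R \ (R' \ repmat(B,k,1)), n, k);  % forward/back substitution for all systems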

This is silly, but that is just how slow MATLAB for loops still are in 2015, even with the JIT. This solution seems to hit a sweet spot when k is not too large, so that everything still fits in memory.

+3

I know this post is many years old, but I'll add my two cents anyway. You can put all of your A matrices into one large block-diagonal matrix, with the 4x4 blocks sitting on its diagonal. The right-hand side is all of your b vectors stacked on top of each other. Once you set this up, the system is represented as a sparse matrix and can be solved efficiently by the algorithms that mldivide selects. The blocks are numerically decoupled, so even if some of them are singular, the answers for the nonsingular blocks should still be correct when you use mldivide. There is code that uses this approach on MATLAB Central:

http://www.mathworks.com/matlabcentral/fileexchange/24260-multiple-same-size-linear-solver

I suggest experimenting to see whether this approach is faster than the loop. I suspect it can be more efficient, especially for a large number of small systems. In particular, if there are convenient formulas for the coefficients of your N matrices A, you may be able to construct the full left-hand side using MATLAB's vector operations (without loops), which could give you additional savings. As others have noted, vectorized operations are not always faster, but in my experience they often are.
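
A minimal sketch of the block-diagonal construction described above (this is not the linked File Exchange code; num2cell and blkdiag are just one convenient way to build the system, and the sparse() construction in the previous answer scales better):

    A = rand(4,4,10000);
    B = rand(4,1);
    n = size(A,1);
    N = size(A,3);
    Ac = cellfun(@sparse, squeeze(num2cell(A,[1 2])), 'UniformOutput', false);
    S  = blkdiag(Ac{:});                   % sparse block-diagonal left-hand side
    X  = reshape(S \ repmat(B,N,1), n, N); % stacked right-hand sides, one solve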

+2

Source: https://habr.com/ru/post/890501/

