Mldivide vs (LU & linsolve)

This question may be too broad to post here, but I will try to be as specific as possible. If you still think it is too broad, I will simply delete it.

  • Look at the EDIT below for my latest thoughts on this.
  • Also look at Ander Biguri's answer if you have access to the Parallel Computing Toolbox and an NVIDIA GPU.

My problem:

I solve dynamic equations using the Newmark scheme (implicit, 2nd order), which involves solving many linear systems of the form A*x=b for x.
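To show where these systems come from, assume (the question does not say which equations are solved) a linear second-order model M*u'' + C*u' + K*u = F with Newmark parameters beta and gamma, time step Delta t, and velocity v and acceleration a. Each implicit step then reduces to one linear solve, with A = \hat{K} and b = \hat{F}_{n+1}:

```latex
\begin{aligned}
\hat{K}\,u_{n+1} &= \hat{F}_{n+1}, \qquad
\hat{K} = K + \frac{1}{\beta\,\Delta t^{2}}\,M + \frac{\gamma}{\beta\,\Delta t}\,C,\\
\hat{F}_{n+1} &= F_{n+1}
 + M\left[\frac{u_n}{\beta\,\Delta t^{2}} + \frac{v_n}{\beta\,\Delta t} + \left(\frac{1}{2\beta}-1\right)a_n\right]\\
&\quad + C\left[\frac{\gamma\,u_n}{\beta\,\Delta t} + \left(\frac{\gamma}{\beta}-1\right)v_n + \Delta t\left(\frac{\gamma}{2\beta}-1\right)a_n\right]
\end{aligned}
```

Under these assumptions, if M, C, K and Delta t are constant, \hat{K} is the same matrix at every time step; in a nonlinear analysis it generally changes.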

I have already optimized all the code that does not involve solving linear systems. Currently, solving the linear systems takes up to 70% of the total computation time.

I use MATLAB's linsolve, but my matrix A does not have any of the properties that can be specified through the opts argument of linsolve.

Idea:

As stated in the linsolve documentation:

If A has the properties in opts, linsolve is faster than mldivide, because linsolve does not perform any tests to verify that A has the specified properties.
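A minimal sketch of what opts buys you, using an illustrative lower-triangular matrix T (the size and values are assumptions, not from the question): opts.LT tells linsolve to treat T as lower triangular without checking. The linsolve documentation also warns that if the matrix does not actually have the declared property, linsolve returns incorrect results without an error.

```matlab
% Illustrative, well-conditioned lower-triangular system (assumed data)
n = 500;
T = tril(rand(n)) + n*eye(n);
b = rand(n,1);

opts.LT = true;                 % declare T lower triangular; linsolve will not check
x_lin = linsolve(T, b, opts);   % forward substitution, no structure tests
x_div = T \ b;                  % mldivide detects the triangular structure itself

% Relative difference between the two solutions (should be tiny)
relDiff = norm(x_lin - x_div) / norm(x_div);
disp(relDiff)
```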

As far as I know, when using mldivide, MATLAB will use an LU decomposition, since my matrix A has no specific property other than being square.
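For a general square matrix, the LU route can be sketched as follows (illustrative nonsymmetric matrix, assumed sizes). One detail matters for the idea below: lu uses partial pivoting, P*A = L*U, so the permutation has to be applied to b before the forward substitution.

```matlab
% Illustrative nonsymmetric, nonsingular matrix (assumed data)
n = 500;
A = rand(n) + n*eye(n);
b = rand(n,1);

[L, U, P] = lu(A);          % P*A = L*U, L unit lower triangular, U upper triangular
x_lu  = U \ (L \ (P*b));    % forward substitution, then back substitution
x_div = A \ b;              % mldivide on the same system

% Both routes should agree to rounding error
relErr = norm(x_lu - x_div) / norm(x_div);
disp(relErr)
```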

My question is:

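Would I get better performance by first decomposing A with MATLAB's lu and then passing both factors to linsolve, as in x = U\(L\b), with the appropriate opts? This would essentially bypass the tests MATLAB performs inside mldivide.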
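A minimal sketch of the LU + linsolve idea, under one loudly stated assumption: A does not change between time steps (for example a linear problem with a fixed time step). The O(n^3) factorization is then done once, and each step costs only two O(n^2) triangular solves. The matrix, sizes and right-hand sides are illustrative placeholders.

```matlab
% ASSUMPTION: A is constant over the time loop; if A changes, refactor it.
n = 1000;
nSteps = 50;
A = rand(n) + n*eye(n);            % illustrative constant system matrix
[L, U, p] = lu(A, 'vector');       % A(p,:) = L*U, computed only once

optsL.LT = true;                   % L from the 'vector' form is truly lower triangular
optsU.UT = true;                   % U is upper triangular

x = zeros(n,1);
for k = 1:nSteps
    b = rand(n,1);                 % hypothetical right-hand side for step k
    y = linsolve(L, b(p), optsL);  % forward substitution on the permuted b
    x = linsolve(U, y, optsU);     % back substitution
end

% Residual of the last solve
res = norm(A*x - b) / norm(b);
disp(res)
```

In MATLAB R2017b and later, dA = decomposition(A,'lu') followed by x = dA\b provides the same factor-once, solve-many pattern.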

: . , , 2% .

, , ? , , , , - , - mldivide.

Test code:

A=randn(2500);
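% A.'*A is symmetric positive definite (almost surely), hence non-singular.
% Because it is symmetric, mldivide may try a Cholesky factorization
% instead of LU, so this is not a pure LU-vs-LU comparison.
A=A.'*A;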
x_=randn(2500,1);
b=A*x_;
clear x_

% Case 1 : mldivide
tic
for ii=1:100

    x=A\b;

end
out=toc;
disp(['Case 1 time per iteration :' num2str((out)/100)]);

% Case 2 : LU+linsolve

opts1.LT=true;
opts2.UT=true;

tic;
for ii=1:100

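    % 'vector' form: A(p,:) = L*U with L truly lower triangular.
    % With [L,U]=lu(A), L is a row-permuted lower triangular matrix,
    % so linsolve with opts1.LT=true would return a wrong x.
    [L,U,p]=lu(A,'vector');

    % These could be replaced directly by x = U\(L\b(p)), since mldivide checks for triangularity first
    Tmp=linsolve(L,b(p),opts1);
    x=linsolve(U,Tmp,opts2);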

end
out2=toc;

disp(['Case 2 time per iteration :' num2str((out2)/100)]);

.

With the appropriate opts, lu followed by linsolve is only marginally faster: measured with timeit (as suggested by @rayryeng), it is about 2 ~ 3% faster than mldivide.

Results with timeit for a 1626*1626 matrix:

mldivide:

t1 =

   0.102149773097083

linsolve:

t2 =

   0.099272037768204

Relative gain, (t1 - t2)/t1: 0.028171725121151
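A sketch of how such a timeit comparison could be set up. Assumptions: a general nonsymmetric randn matrix is used (rather than A.'*A) so that mldivide takes its LU path; the LU factorization is inside the timed function, as in Case 2; solveWithLU is a hypothetical helper name; local functions in scripts need R2016b or later.

```matlab
% Timing mldivide vs lu + linsolve with timeit (illustrative sizes)
n = 1626;
A = randn(n);                      % general nonsymmetric matrix -> LU path in mldivide
b = randn(n,1);

optsL.LT = true;
optsU.UT = true;

f1 = @() A \ b;                                % Case 1: mldivide
f2 = @() solveWithLU(A, b, optsL, optsU);      % Case 2: lu + two linsolve calls

t1 = timeit(f1);
t2 = timeit(f2);
relGain = (t1 - t2) / t1;          % positive means lu + linsolve was faster
fprintf('mldivide: %g s, lu+linsolve: %g s, relative gain: %g\n', t1, t2, relGain);

function x = solveWithLU(A, b, optsL, optsU)
    % Hypothetical helper: pivoted LU in 'vector' form, then two triangular solves
    [L, U, p] = lu(A, 'vector');
    y = linsolve(L, b(p), optsL);
    x = linsolve(U, y, optsU);
end
```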


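Answer (Ander Biguri): if you have an NVIDIA GPU and the Parallel Computing Toolbox, you can solve the system on the GPU: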

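tic;

for ii=1:10
    A2=gpuArray(A); % transfer inside the loop so host-to-device copies are timed
    b2=gpuArray(b);
    x=A2\b2;
end
wait(gpuDevice); % GPU calls are asynchronous: let them finish before stopping the timer
out2=toc;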

Results (Case 1: CPU, Case 2: GPU):

Case 1 time per iteration :0.011881
Case 2 time per iteration :0.0052003
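An alternative way to time the GPU solve is gputimeit, which waits for the GPU to finish before reading the clock. This sketch assumes the Parallel Computing Toolbox and a supported NVIDIA GPU; unlike the loop above, it moves the data to the GPU once, so transfer time is excluded.

```matlab
% Illustrative sizes; requires Parallel Computing Toolbox and a supported GPU
n = 2500;
A = randn(n);
b = randn(n,1);

A2 = gpuArray(A);                  % one-time host-to-device transfer
b2 = gpuArray(b);

tGpu = gputimeit(@() A2 \ b2);     % solve only; data already resides on the GPU
tCpu = timeit(@() A \ b);          % CPU mldivide on the same system
fprintf('CPU: %g s, GPU: %g s\n', tCpu, tGpu);
```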

Source: https://habr.com/ru/post/1662613/

