CUDA: batch-solving sparse Ax = b with the same A and many different b

I have a sparse matrix A, and I would like to (directly) solve Ax = b. I have about 500 vectors b, so I would like to solve for the corresponding 500 x. I'm new to CUDA, so I'm a little confused as to which options I have.

cuSOLVER has a cuSolverSP batched direct solver for sparse A_i x_i = b_i based on QR (cusolverSp<t>csrqrsvBatched). (I would be fine with LU, since A is decently conditioned.) However, as far as I can tell, it gives me no way to exploit the fact that all of my A_i are identical.

Is there instead a way to first compute the sparse LU (or QR) factorization on the CPU or GPU, and then run just the triangular solves on the GPU (back-substitution for LU; back-substitution plus a matrix multiplication for QR)? If cusolverSp<t>csrlsvlu() solves for a single b_i, is there a standard way to apply it to many b_i?
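To make the question concrete, here is roughly what I understand the batched-QR path to look like (a sketch based on my reading of the cuSOLVER documentation; error checking omitted, sizes illustrative). Note how the CSR value array has to be replicated once per batch entry, even though all of my A_i are identical:

```cpp
#include <cuda_runtime.h>
#include <cusolverSp.h>
#include <cusparse.h>
#include <algorithm>
#include <cstddef>
#include <vector>

// Solve batchSize systems A x_i = b_i with the batched sparse QR of cuSolverSP.
// All systems share one sparsity pattern; the values still have to be replicated.
void solve_batched_qr(int m, int nnz,
                      const std::vector<int>&    csrRowPtr,   // m + 1 entries
                      const std::vector<int>&    csrColInd,   // nnz entries
                      const std::vector<double>& csrVal,      // nnz entries (one copy of A)
                      const std::vector<double>& b,           // batchSize * m
                      std::vector<double>&       x,           // batchSize * m
                      int batchSize)
{
    cusolverSpHandle_t handle;   cusolverSpCreate(&handle);
    cusparseMatDescr_t descrA;   cusparseCreateMatDescr(&descrA);
    cusparseSetMatType(descrA, CUSPARSE_MATRIX_TYPE_GENERAL);
    cusparseSetMatIndexBase(descrA, CUSPARSE_INDEX_BASE_ZERO);

    // The batched API expects batchSize copies of the value array -- this is the wasteful part.
    std::vector<double> valBatch(static_cast<std::size_t>(batchSize) * nnz);
    for (int i = 0; i < batchSize; ++i)
        std::copy(csrVal.begin(), csrVal.end(),
                  valBatch.begin() + static_cast<std::size_t>(i) * nnz);

    int *d_rowPtr, *d_colInd;  double *d_val, *d_b, *d_x;
    cudaMalloc(&d_rowPtr, sizeof(int) * (m + 1));
    cudaMalloc(&d_colInd, sizeof(int) * nnz);
    cudaMalloc(&d_val,    sizeof(double) * valBatch.size());
    cudaMalloc(&d_b,      sizeof(double) * b.size());
    cudaMalloc(&d_x,      sizeof(double) * x.size());
    cudaMemcpy(d_rowPtr, csrRowPtr.data(), sizeof(int) * (m + 1), cudaMemcpyHostToDevice);
    cudaMemcpy(d_colInd, csrColInd.data(), sizeof(int) * nnz,     cudaMemcpyHostToDevice);
    cudaMemcpy(d_val, valBatch.data(), sizeof(double) * valBatch.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b,   b.data(),        sizeof(double) * b.size(),        cudaMemcpyHostToDevice);

    // Symbolic analysis (shared by all batch entries), workspace query, then the batched solve.
    csrqrInfo_t info;  cusolverSpCreateCsrqrInfo(&info);
    cusolverSpXcsrqrAnalysisBatched(handle, m, m, nnz, descrA, d_rowPtr, d_colInd, info);

    size_t internalBytes = 0, workspaceBytes = 0;
    cusolverSpDcsrqrBufferInfoBatched(handle, m, m, nnz, descrA, d_val, d_rowPtr, d_colInd,
                                      batchSize, info, &internalBytes, &workspaceBytes);
    void* workspace = nullptr;  cudaMalloc(&workspace, workspaceBytes);

    cusolverSpDcsrqrsvBatched(handle, m, m, nnz, descrA, d_val, d_rowPtr, d_colInd,
                              d_b, d_x, batchSize, info, workspace);
    cudaMemcpy(x.data(), d_x, sizeof(double) * x.size(), cudaMemcpyDeviceToHost);

    cudaFree(workspace); cudaFree(d_x); cudaFree(d_b); cudaFree(d_val);
    cudaFree(d_colInd);  cudaFree(d_rowPtr);
    cusolverSpDestroyCsrqrInfo(info); cusparseDestroyMatDescr(descrA); cusolverSpDestroy(handle);
}
```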

Finally, since I have no intuition for this: given the necessary overhead, should I expect any GPU speedup at all for problems of this size? x has a length of roughly 10,000-100,000. Thanks.

+6
3 answers

I am currently working on something similar myself. I decided to basically wrap the conjugate gradient (CG) and incomplete-Cholesky-preconditioned conjugate gradient (ICCG) solvers from the sample programs that ship with the CUDA SDK in a small class.

You can find them in the CUDA samples, e.g. under CUDA_HOME at samples/7_CUDALibraries/conjugateGradient and /Developer/NVIDIA/CUDA-samples/7_CUDALibraries/conjugateGradientPrecond.

Basically, you load the matrix into device memory once (and, for ICCG, compute the corresponding preconditioner / analysis structures once as well), then call the solver kernels with the different b vectors.
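A minimal plain-C++ sketch of that pattern (not the CUDA samples themselves): the matrix is set up once, and an unpreconditioned CG loop is rerun for each right-hand side. In the actual samples the matrix and vectors live in device memory, and the SpMV / dot / axpy steps below become cuSPARSE / cuBLAS calls.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct CsrMatrix {                 // set up (and, on the GPU, uploaded) once
    int n;
    std::vector<int>    rowPtr;    // n + 1 entries
    std::vector<int>    colInd;    // nnz entries
    std::vector<double> val;       // nnz entries
};

// y = A * x for a CSR matrix (the cuSPARSE SpMV in the CUDA samples).
static void spmv(const CsrMatrix& A, const std::vector<double>& x, std::vector<double>& y) {
    for (int i = 0; i < A.n; ++i) {
        double s = 0.0;
        for (int j = A.rowPtr[i]; j < A.rowPtr[i + 1]; ++j)
            s += A.val[j] * x[A.colInd[j]];
        y[i] = s;
    }
}

static double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Unpreconditioned CG for symmetric positive definite A; called once per b.
std::vector<double> cg_solve(const CsrMatrix& A, const std::vector<double>& b,
                             int maxIter = 1000, double tol = 1e-8) {
    std::vector<double> x(A.n, 0.0), r = b, p = b, Ap(A.n);
    double rr = dot(r, r);
    for (int it = 0; it < maxIter && std::sqrt(rr) > tol; ++it) {
        spmv(A, p, Ap);
        double alpha = rr / dot(p, Ap);
        for (int i = 0; i < A.n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rrNew = dot(r, r);
        double beta  = rrNew / rr;
        for (int i = 0; i < A.n; ++i) p[i] = r[i] + beta * p[i];
        rr = rrNew;
    }
    return x;
}

// Usage pattern: one matrix, many right-hand sides (load_matrix is a placeholder).
//   CsrMatrix A = load_matrix(...);                       // once
//   for (auto& b : rhs_list) results.push_back(cg_solve(A, b));
```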

I don't know what band structure you expect your matrix to have, but if it is symmetric and diagonally dominant (the off-diagonal entries in each row and column have the opposite sign of the diagonal entry, and their magnitudes sum to less than the diagonal's) or positive definite (no eigenvectors with eigenvalue 0), then CG and ICCG should serve you well. Alternatively, the various multigrid algorithms are another option, if you are willing to go through the work of coding them up.

If your matrix is only positive semidefinite (i.e. it has at least one eigenvector with an eigenvalue of zero), you can still get away with using CG or ICCG, provided you ensure that: 1) the right-hand sides (the b vectors) are orthogonal to the null space (the span of the eigenvectors with eigenvalue zero), and 2) the solution you accept is orthogonal to the null space as well.

It is interesting to note that if you have a nontrivial null space, different numerical solvers can give you different answers for the exact same system; the solutions differ by a linear combination of null-space vectors. This issue cost me many man-hours of debugging and frustration before I finally caught it, so it's good to be aware of.
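One way to enforce both conditions, and to make different solvers agree, is to project the null-space component out of each b before solving and out of each x afterwards. A small sketch (plain C++, assuming you know an orthonormal basis for the null space):

```cpp
#include <cstddef>
#include <vector>

// v -= (v . n) n  for each orthonormal null-space basis vector n,
// i.e. remove the null-space component of v.
void remove_null_space(std::vector<double>& v,
                       const std::vector<std::vector<double>>& nullBasis) {
    for (const auto& n : nullBasis) {
        double proj = 0.0;
        for (std::size_t i = 0; i < v.size(); ++i) proj += v[i] * n[i];
        for (std::size_t i = 0; i < v.size(); ++i) v[i] -= proj * n[i];
    }
}

// Usage:
//   remove_null_space(b, nullBasis);   // condition 1
//   x = solve(A, b);
//   remove_null_space(x, nullBasis);   // condition 2, makes solvers comparable
```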

Finally, if your matrix has a circulant band structure, you might consider an FFT-based solver. FFT-based numerical solvers often give superior performance where they apply.
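For reference, the idea behind such a solver: a circulant matrix C is diagonalized by the DFT, so C x = b is solved by x = IDFT(DFT(b) / DFT(c)), where c is the first column of C. A small sketch using a naive O(n^2) DFT for clarity (a real implementation would call cuFFT or FFTW instead):

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;
static const double PI = 3.14159265358979323846;

// Naive DFT / inverse DFT, only for illustrating the algorithm.
static std::vector<cd> dft(const std::vector<cd>& v, bool inverse) {
    const std::size_t n = v.size();
    const double sign = inverse ? +1.0 : -1.0;
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cd acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            double ang = sign * 2.0 * PI * double(k) * double(j) / double(n);
            acc += v[j] * cd(std::cos(ang), std::sin(ang));
        }
        out[k] = inverse ? acc / double(n) : acc;
    }
    return out;
}

// Solve C x = b where C is circulant with first column c:
// x = IDFT( DFT(b) ./ DFT(c) ), elementwise division by the eigenvalues of C.
std::vector<cd> solve_circulant(const std::vector<cd>& c, const std::vector<cd>& b) {
    std::vector<cd> fc = dft(c, false), fb = dft(b, false);
    for (std::size_t k = 0; k < fc.size(); ++k) fb[k] /= fc[k];
    return dft(fb, true);
}
```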

+1

If you don't mind using an open-source library, you could also check out CUSP (see the CUSP quick-start page).

It has a pretty decent set of solvers, including several preconditioners (see the CUSP preconditioner examples).

The smoothed-aggregation preconditioner (a variant of algebraic multigrid) seems to work very well, provided your GPU has enough memory for it.
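A rough sketch of how that looks, based on CUSP's own smoothed-aggregation example (header paths and the monitor class name vary a bit between CUSP versions, so treat this as a starting point). The relevant point for this question is that the preconditioner M is built once and reused for every right-hand side:

```cpp
#include <cusp/array1d.h>
#include <cusp/csr_matrix.h>
#include <cusp/gallery/poisson.h>
#include <cusp/krylov/cg.h>
#include <cusp/monitor.h>
#include <cusp/precond/aggregation/smoothed_aggregation.h>

int main() {
    typedef int    IndexType;
    typedef double ValueType;
    typedef cusp::device_memory MemorySpace;

    // Example matrix: 2D Poisson problem (stands in for the real A).
    cusp::csr_matrix<IndexType, ValueType, MemorySpace> A;
    cusp::gallery::poisson5pt(A, 256, 256);

    // Build the smoothed-aggregation (algebraic multigrid) preconditioner once.
    cusp::precond::aggregation::smoothed_aggregation<IndexType, ValueType, MemorySpace> M(A);

    // Solve for several right-hand sides with the same A and M.
    for (int k = 0; k < 3; ++k) {
        cusp::array1d<ValueType, MemorySpace> b(A.num_rows, ValueType(k + 1));
        cusp::array1d<ValueType, MemorySpace> x(A.num_rows, ValueType(0));

        cusp::monitor<ValueType> monitor(b, 1000, 1e-6);
        cusp::krylov::cg(A, x, b, monitor, M);
    }
    return 0;
}
```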

0

Is it available in any store?

-1

Source: https://habr.com/ru/post/986768/
