I am currently working on something similar. I decided to basically wrap the conjugate gradient and incomplete bootstraps, pre-processed with software samples that are associated with the CUDA SDK, in a small class.
You can find them in the CUDA_HOME directory under: samples/7_CUDALibraries/conjugateGradient and /Developer/NVIDIA/CUDA-samples/7_CUDALibraries/conjugateGradientPrecond
Basically, you load the matrix into the device memory once (and for ICCG, calculate the corresponding analysis of the air conditioners / matrices), then call the decisive kernel with different vectors b.
I donβt know what you expect from your structure of a matrix strip, but if it is symmetrical and diagonally dominates (the diagonal stripes along each row and column have the opposite diagonal sign, and their sum is less than the record diagonal) or positive definite (no eigenvectors with eigenvalue 0.), then CG and ICCG should be useful. Alternatively, various multigrid algorithms are another option if you want to go through their coding.
If your matrix is ββonly positive semidefinite (for example, it has at least one eigenvector with an eigenvalue of zero), you can still avoid using CG or ICCG if you make sure that: 1) The right-hand side of (b vectors) is orthogonal to the zero space ( empty space, meaning eigenvectors with an eigenvalue of zero). 2) Your solution is orthogonal to zero space.
It is interesting to note that if you have a nontrivial empty space, then different numerical solvers can give you different answers for the same exact system. The solutions will be distinguished by a linear combination of zero space ... This problem caused me many human hours of debugging and disappointment before I finally caught it, so it's good to know about it.
Finally, if your matrix has a Circulant Band structure, you can consider using a solution based on the fast Fourier transform (FFT). FFT-based numerical solvers can often provide superior performance where applicable.
source share