I have implemented a Matrix data type in C++ using a single 1D array and wrapping it with row/column indexing. Now I want the ability to create square/blocked submatrices of it, and I want to do this in place, without duplicating the data in memory.
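To make this concrete, here is roughly what I mean (a simplified sketch; SubMatrixView and the member names are just for illustration, not my actual code):

    #include <cstddef>
    #include <vector>

    // The data lives in one contiguous row-major 1D array.
    struct Matrix {
        std::size_t rows, cols;
        std::vector<float> data;   // rows * cols elements, row-major

        Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c) {}

        float& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    };

    // A block submatrix is just an offset into the parent's data plus the
    // parent's row pitch, so its rows are NOT contiguous in memory.
    struct SubMatrixView {
        float*      origin;  // &parent(row0, col0)
        std::size_t rows, cols;
        std::size_t pitch;   // parent.cols: distance between consecutive rows
    };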
The problem is that I want to transfer some of these submatrices to GPU memory and process them there in parallel; this is useful, for example, for blocked matrix multiplication. Since these submatrices are not contiguous in main memory, copying one to the device's memory as a single block does not seem possible without creating a separate copy first. I would also like to copy the GPU submatrix directly back into the original CPU matrix, to keep things efficient. I do not know the exact partitioning in advance.
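For illustration, this is the kind of extra staging copy I would like to avoid (a minimal sketch under my assumptions; copyBlockToDeviceNaive and its parameter names are hypothetical):

    #include <cstddef>
    #include <vector>
    #include <cuda_runtime.h>

    // Pack the block into a temporary contiguous host buffer, then copy that
    // buffer to the device. d_block must point to subRows * subCols floats of
    // device memory. The staging vector is exactly the separate copy I want
    // to get rid of.
    void copyBlockToDeviceNaive(const float* parent, std::size_t parentCols,
                                std::size_t row0, std::size_t col0,
                                std::size_t subRows, std::size_t subCols,
                                float* d_block)
    {
        std::vector<float> staging(subRows * subCols);
        for (std::size_t i = 0; i < subRows; ++i)
            for (std::size_t j = 0; j < subCols; ++j)
                staging[i * subCols + j] = parent[(row0 + i) * parentCols + (col0 + j)];

        cudaMemcpy(d_block, staging.data(),
                   staging.size() * sizeof(float), cudaMemcpyHostToDevice);
    }

Copying back from the device to the original matrix currently needs the same staging buffer in reverse, which is what I mean by wanting a direct copy.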
Does anyone have any ideas on how I can achieve this?
Just to be clear: the matrix needs to be partitioned into blocks, not into rows (which would be relatively easy to do in C/C++).
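For example, with a 4x4 row-major matrix the top-left 2x2 block already shows the issue:

    // 4x4 row-major matrix: element (i, j) lives at 1D index i * 4 + j.
    //
    //   index layout:   0  1 |  2  3
    //                   4  5 |  6  7
    //                  ------+------
    //                   8  9 | 10 11
    //                  12 13 | 14 15
    //
    // The top-left 2x2 block covers indices {0, 1, 4, 5}: two runs of length 2
    // separated by the row pitch of 4, not one contiguous range, whereas a row
    // partition would always be a single contiguous range.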