How to work with Eigen in CUDA kernels

Eigen is a C++ linear algebra library: http://eigen.tuxfamily.org .

It is easy to work with basic data types, such as plain floating-point arrays: simply copy them to device memory and pass a pointer to the CUDA kernels. But an Eigen matrix is a complex type, so how do you copy it to device memory and let CUDA kernels read from and write to it?

4 answers

If you want to access the data of an Eigen::Matrix through a raw C pointer, you can use the .data() member function. The coefficients are stored contiguously in memory, in column-major order by default, or in row-major order if you requested it:

 MatrixXd A(10,10);
 double *A_data = A.data();

Since November 2016 (the Eigen 3.3 release) there is a new option: use Eigen directly inside CUDA kernels; see this question .

An example from a related question:

 __global__ void cu_dot(Eigen::Vector3f *v1, Eigen::Vector3f *v2, double *out, size_t N)
 {
     int idx = blockIdx.x * blockDim.x + threadIdx.x;
     if (idx < N)
     {
         out[idx] = v1[idx].dot(v2[idx]);
     }
 }

Copying the array of Eigen::Vector3f to the device:

 Eigen::Vector3f *host_vectors = new Eigen::Vector3f[N];
 Eigen::Vector3f *dev_vectors;
 cudaMalloc((void **)&dev_vectors, sizeof(Eigen::Vector3f) * N);
 cudaMemcpy(dev_vectors, host_vectors, sizeof(Eigen::Vector3f) * N, cudaMemcpyHostToDevice);

In addition to rewriting and correcting the code yourself, there is an Eigen-compatible library, written as a by-product of a research project, that performs matrix calculations on the GPU and supports several backends: https://github.com/rudaoshi/gpumatrix

I cannot vouch for it, but if it works, it is probably exactly what you are looking for.

If you want a more general solution, this thread seems to contain very useful information.


There are two ways.

The first is to make Eigen itself run on the GPU, which is probably difficult and will not perform well, at least if "running on the GPU" means only getting it to compile and produce results. Eigen is heavily optimized for modern CPUs: internally it uses its own allocators and memory layouts, which most likely will not map well onto CUDA.

The second method is easier, should not break existing Eigen code, and is probably the only one suitable in your case. Switch your underlying matrices to plain arrays (i.e. double**) and use Eigen::Map. That way you get an Eigen interface over a simple data type, so existing code should not break, and you can send the matrix to the GPU as a regular C array, as is usually done. The downside is that you probably will not use Eigen to its full potential; however, if you offload most of the work to the GPU, that is fine.

In fact, this inverts the problem slightly: instead of getting Eigen matrices to work on CUDA, you get Eigen to work with regular arrays.


Source: https://habr.com/ru/post/1261186/

