Type Qualifiers for device class in CUDA

Question

Type Qualifiers for device class in CUDA

I'm currently trying to make a piece of CUDA code with a class that will be used exclusively on the device side (i.e. the host should not know about its existence). However, I cannot work out the correct qualifiers for the class ( deviceclass below):

 __device__ float devicefunction (float *x) {return x[0]+x[1];} class deviceclass { private: float _a; public: deviceclass(float *x) {_a = devicefunction(x);} float getvalue () {return _a;} }; // Device code __global__ void VecInit(float* A, int N) { int i = blockDim.x * blockIdx.x + threadIdx.x; if (i < N) { deviceclass *test; test = new deviceclass(1.0, 2.0); A[i] = test->getvalue(); } } // Standard CUDA guff below: Variables float *h_A, *d_A; // Host code int main(int argc, char** argv) { printf("Vector initialization...\n"); int N = 10000; size_t size = N * sizeof(float); // Allocate h_A = (float*)malloc(size); cudaMalloc(&d_A, size); printf("Computing...\n"); // Invoke kernel int threadsPerBlock = 256; int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock; VecInit<<<blocksPerGrid, threadsPerBlock>>>(d_A, N); // Copy result from device memory to host memory cudaMemcpy(h_A, d_A, size, cudaMemcpyDeviceToHost); //...etc }

Configuring deviceclass as soon as a __device__ throws an error __device__ from a global function, however setting it as __device__ __host__ or __global__ seems unnecessary. Can someone point me in the right direction?

+4

c ++ scope device qualifiers cuda

Phil Feb 22 '11 at 13:00

source share

2 answers

I believe Node() is a typo.

In the CUDA C Programming Guide, section 3.1.5:

However, only a subset of C ++ is fully supported for device code

and Appendix D.6:

Code compiled for devices with computing power of 2.x and higher can use C ++ classes ...

I think your code uses incompatible C ++.

0

koan Feb 22 '11 at 17:41

source share

Phil · Accepted Answer · 2011-02-23T16:36:26+0000

It turns out that qualifiers should go for member functions of the class, below is a fully working version:

 #include <iostream> #include <stdio.h> #include <stdlib.h> using namespace std; void Cleanup(void); // Functions to be pointed to __device__ float Plus (float a, float b) {return a+b;} class deviceclass { private: float test; public: __device__ deviceclass(float a, float b) { test = Plus(a,b); } __device__ float getvalue() {return test;} }; // Device code __global__ void VecInit(float* A, int N) { int i = blockDim.x * blockIdx.x + threadIdx.x; if (i < N) { deviceclass test(1.0, 2.0); A[i] = test.getvalue(); } } // Standard CUDA guff below: Variables float *h_A, *d_A; // Host code int main(int argc, char** argv) { printf("Vector initialization...\n"); int N = 10000; size_t size = N * sizeof(float); // Allocate h_A = (float*)malloc(size); cudaMalloc(&d_A, size); printf("Computing...\n"); // Invoke kernel int threadsPerBlock = 256; int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock; VecInit<<<blocksPerGrid, threadsPerBlock>>>(d_A, N); // Copy result from device memory to host memory cudaMemcpy(h_A, d_A, size, cudaMemcpyDeviceToHost); // Verify result int i; for (i = 0; i < N; ++i) { cout << endl << h_A[i]; } cout << endl; Cleanup(); } void Cleanup(void) { // Free device memory if (d_A) cudaFree(d_A); // Free host memory if (h_A) free(h_A); cudaThreadExit(); exit(0); }

Type Qualifiers for device class in CUDA

More articles: