Using global and constant memory in CUDA

Hey, I have the following code:

    #if USE_CONST == 1
    __constant__ double PNT[ SIZE ];
    #else
    __device__ double *PNT;
    #endif

and a little later I have:

    #if USE_CONST == 0
    cudaMalloc((void **)&PNT, sizeof(double)*SIZE);
    cudaMemcpy(PNT, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
    #else
    cudaMemcpyToSymbol(PNT, point, sizeof(double)*SIZE);
    #endif

where point is defined earlier in the code. With USE_CONST=1 everything works as expected, but with USE_CONST=0 it does not. I access the array in my kernel function through

PNT[ index ]

What is the difference between the two options that causes the problem? Thanks!

2 answers

The correct way to call cudaMemcpyToSymbol prior to CUDA 4.0 is:

 cudaMemcpyToSymbol("PNT", point, sizeof(double)*SIZE) 

or alternatively:

    double *cpnt;
    cudaGetSymbolAddress((void **)&cpnt, "PNT");
    cudaMemcpy(cpnt, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);

which can be a little faster if you plan to access the symbol from the host API more than once.
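For readers on newer toolkits, here is a minimal sketch of the same "look the address up once, reuse it" idea. It assumes CUDA 5.0 or later, where (as far as I know) the string form "PNT" is no longer accepted and the symbol is passed directly. SIZE = 8, the CHECK macro, and the update loop are made up for illustration and are not part of the original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define SIZE 8  // hypothetical size, for illustration only

// Hypothetical helper: bail out of main() on the first CUDA error.
#define CHECK(call) do { cudaError_t e = (call); if (e != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); return 1; } } while (0)

__constant__ double PNT[SIZE];

int main()
{
    double point[SIZE], check[SIZE];

    // Resolve the device address of the symbol once...
    double *cpnt = nullptr;
    CHECK(cudaGetSymbolAddress((void **)&cpnt, PNT));

    // ...then reuse it for several uploads with plain cudaMemcpy.
    for (int step = 0; step < 3; ++step) {
        for (int i = 0; i < SIZE; ++i) point[i] = step + 0.5 * i;
        CHECK(cudaMemcpy(cpnt, point, sizeof(double) * SIZE, cudaMemcpyHostToDevice));
    }

    // Read the symbol back and compare against the last upload.
    CHECK(cudaMemcpyFromSymbol(check, PNT, sizeof(double) * SIZE));
    int ok = 1;
    for (int i = 0; i < SIZE; ++i) if (check[i] != point[i]) ok = 0;
    printf("%s\n", ok ? "match" : "mismatch");
    return ok ? 0 : 1;
}
```

Whether the cached address is measurably faster will depend on the toolkit version and on how often the copy runs, so it is worth timing before relying on it.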

EDIT: I misunderstood the question. For the global memory version, do something similar to the second variant shown above for constant memory:

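    // Note: this copies SIZE doubles to the address of the symbol itself, so it
    // only fits if PNT is declared as an array (__device__ double PNT[SIZE]),
    // not as a pointer.
    double *gpnt;
    cudaGetSymbolAddress((void **)&gpnt, "PNT");
    cudaMemcpy(gpnt, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);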

Although this is an old question, I am adding this for future googlers:

The problem is here:

    cudaMalloc((void **)&PNT, sizeof(double)*SIZE);
    cudaMemcpy(PNT, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);

cudaMalloc writes to the host version of PNT. PNT is actually a device variable, and host code cannot read or write it directly. The correct approach is to allocate the memory, copy the resulting address into the device symbol, and then copy the data into the memory that address points to:

    void* memPtr;
    cudaMalloc(&memPtr, sizeof(double)*SIZE);
    cudaMemcpyToSymbol(PNT, &memPtr, sizeof(memPtr));
    // In other places you'll need an additional:
    // cudaMemcpyFromSymbol(&memPtr, PNT, sizeof(memPtr));
    cudaMemcpy(memPtr, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
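For anyone who wants to try this end to end, here is a minimal self-contained sketch of the pointer-in-a-symbol approach. SIZE = 8, the kernel name readPnt, the out buffer, and the CHECK macro are all made up for illustration; only PNT, point, and memPtr come from the code in this thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define SIZE 8  // hypothetical size, for illustration only

// Hypothetical helper: bail out of main() on the first CUDA error.
#define CHECK(call) do { cudaError_t e = (call); if (e != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); return 1; } } while (0)

// The pointer itself lives in device memory.
__device__ double *PNT;

// Hypothetical kernel: each thread copies one element of PNT to out.
__global__ void readPnt(double *out)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < SIZE) out[index] = PNT[index];
}

int main()
{
    double point[SIZE], result[SIZE];
    for (int i = 0; i < SIZE; ++i) point[i] = 1.5 * i;

    double *memPtr = nullptr, *out = nullptr;
    CHECK(cudaMalloc(&memPtr, sizeof(double) * SIZE));
    CHECK(cudaMalloc(&out, sizeof(double) * SIZE));

    // Step 1: store the buffer's address in the device symbol PNT.
    CHECK(cudaMemcpyToSymbol(PNT, &memPtr, sizeof(memPtr)));
    // Step 2: fill the buffer that PNT now points to.
    CHECK(cudaMemcpy(memPtr, point, sizeof(double) * SIZE, cudaMemcpyHostToDevice));

    readPnt<<<1, SIZE>>>(out);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(result, out, sizeof(double) * SIZE, cudaMemcpyDeviceToHost));

    int ok = 1;
    for (int i = 0; i < SIZE; ++i) if (result[i] != point[i]) ok = 0;
    printf("%s\n", ok ? "match" : "mismatch");

    cudaFree(memPtr);
    cudaFree(out);
    return ok ? 0 : 1;
}
```

The host keeps its own copy of the address in memPtr, which is why the later cudaMemcpy and cudaFree calls never need to touch PNT from the host side.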

A simpler alternative is to declare a fixed-size array in both cases:

    #if USE_CONST == 1
    __constant__ double PNT[ SIZE ];
    #else
    __device__ double PNT[ SIZE ];
    #endif

    // No #if required anymore:
    cudaMemcpyToSymbol(PNT, point, sizeof(double)*SIZE);
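A rough sketch of how the array variant could be exercised, compiled once with -DUSE_CONST=1 and once with -DUSE_CONST=0. SIZE = 8, the kernel name sumPnt, and the CHECK macro are made up for illustration and are not from the original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define SIZE 8  // hypothetical size, for illustration only
#ifndef USE_CONST
#define USE_CONST 0
#endif

// Hypothetical helper: bail out of main() on the first CUDA error.
#define CHECK(call) do { cudaError_t e = (call); if (e != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); return 1; } } while (0)

#if USE_CONST == 1
__constant__ double PNT[SIZE];
#else
__device__ double PNT[SIZE];
#endif

// Hypothetical kernel: a single thread sums the array, same code for both variants.
__global__ void sumPnt(double *out)
{
    double s = 0.0;
    for (int i = 0; i < SIZE; ++i) s += PNT[i];
    *out = s;
}

int main()
{
    double point[SIZE], expected = 0.0, result = 0.0;
    for (int i = 0; i < SIZE; ++i) { point[i] = 0.5 * i; expected += point[i]; }

    // The same host call works for both the __constant__ and the __device__ array.
    CHECK(cudaMemcpyToSymbol(PNT, point, sizeof(double) * SIZE));

    double *out = nullptr;
    CHECK(cudaMalloc(&out, sizeof(double)));
    sumPnt<<<1, 1>>>(out);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(&result, out, sizeof(double), cudaMemcpyDeviceToHost));

    printf("%s\n", result == expected ? "match" : "mismatch");
    cudaFree(out);
    return result == expected ? 0 : 1;
}
```

The trade-off is that SIZE has to be known at compile time, whereas the cudaMalloc approach lets the size be chosen at run time.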

Source: https://habr.com/ru/post/902621/

