Using global and constant memory in CUDA

Question

Hey, I have the following code:

    #if USE_CONST == 1
    __constant__ double PNT[ SIZE ];
    #else
    __device__ double *PNT;
    #endif

and a little later I:

    #if USE_CONST == 0
    cudaMalloc((void **)&PNT, sizeof(double)*SIZE);
    cudaMemcpy(PNT, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
    #else
    cudaMemcpyToSymbol(PNT, point, sizeof(double)*SIZE);
    #endif

where point is defined earlier in the code. With USE_CONST=1 everything works as expected, but with USE_CONST=0 it does not. I access the array in my kernel function through

PNT[ index ]

What is the difference between the two options that causes this? Thanks!

+5

memory cuda

tim May 17 '11 at 23:24

2 answers

Although this is an old question, I am adding this for future googlers:

The problem is here:

    cudaMalloc((void **)&PNT, sizeof(double)*SIZE);
    cudaMemcpy(PNT, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);

Here cudaMalloc writes the new address into the host-side copy of PNT, but PNT is a device variable whose value cannot be read or written directly from host code, so the kernel never sees the allocated address. The correct approach is to allocate the memory through an ordinary host pointer, copy that address into the device symbol, and then copy the data into the allocated buffer:

    void* memPtr;
    cudaMalloc(&memPtr, sizeof(double)*SIZE);
    cudaMemcpyToSymbol(PNT, &memPtr, sizeof(memPtr));
    // In other places you'll need an additional:
    // cudaMemcpyFromSymbol(&memPtr, PNT, sizeof(memPtr));
    cudaMemcpy(memPtr, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
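To make the three steps concrete, here is a minimal sketch of a full round trip that ends with a kernel reading through PNT. The kernel scaleKernel, the helper runPointerSymbolDemo, the value of SIZE, and the test data are all assumptions made up for illustration; they are not part of the original code.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define SIZE 8  // illustrative value; the original SIZE is defined elsewhere

__device__ double *PNT;  // device-side pointer, as in the USE_CONST == 0 branch

// Hypothetical kernel: reads each element through the device pointer PNT.
__global__ void scaleKernel(double *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < SIZE) out[i] = 2.0 * PNT[i];
}

// Hypothetical helper: returns true if every CUDA call succeeded and the
// kernel saw the uploaded data.
bool runPointerSymbolDemo()
{
    std::vector<double> point(SIZE), result(SIZE, 0.0);
    for (int i = 0; i < SIZE; ++i) point[i] = i + 1.0;

    double *memPtr = nullptr, *out = nullptr;
    if (cudaMalloc(&memPtr, sizeof(double) * SIZE) != cudaSuccess) return false;
    if (cudaMalloc(&out, sizeof(double) * SIZE) != cudaSuccess) {
        cudaFree(memPtr);
        return false;
    }

    // Step 1: publish the buffer address to the device symbol.
    // Step 2: fill the buffer through the ordinary host-side pointer.
    bool ok = cudaMemcpyToSymbol(PNT, &memPtr, sizeof(memPtr)) == cudaSuccess
           && cudaMemcpy(memPtr, point.data(), sizeof(double) * SIZE,
                         cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        scaleKernel<<<1, SIZE>>>(out);
        // The blocking copy below also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(result.data(), out, sizeof(double) * SIZE,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    for (int i = 0; i < SIZE; ++i) ok = ok && (result[i] == 2.0 * point[i]);

    cudaFree(out);
    cudaFree(memPtr);
    return ok;
}
```

Calling runPointerSymbolDemo() from main in a file compiled with nvcc should return true on a machine with a CUDA-capable GPU. Keeping memPtr on the host is a deliberate choice here, since it avoids the extra cudaMemcpyFromSymbol when the buffer is later refilled or freed.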

A simpler alternative is to use a statically sized array in both cases:

    #if USE_CONST == 1
    __constant__ double PNT[ SIZE ];
    #else
    __device__ double PNT[ SIZE ];
    #endif

    // No #if required anymore:
    cudaMemcpyToSymbol(PNT, point, sizeof(double)*SIZE);
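As a rough illustration of the static-array variant, the sketch below uploads the data with one cudaMemcpyToSymbol call that is identical for both memory spaces. The kernel sumKernel, the helper runStaticArrayDemo, the default for USE_CONST, and the value of SIZE are assumptions added for the example.

```cuda
#include <cuda_runtime.h>

#define SIZE 8  // illustrative value; the original SIZE is defined elsewhere
#ifndef USE_CONST
#define USE_CONST 0  // assumed default; build with -DUSE_CONST=1 for constant memory
#endif

#if USE_CONST == 1
__constant__ double PNT[SIZE];
#else
__device__ double PNT[SIZE];
#endif

// Hypothetical kernel: a single thread sums the array, whichever memory
// space PNT lives in.
__global__ void sumKernel(double *out)
{
    double s = 0.0;
    for (int i = 0; i < SIZE; ++i) s += PNT[i];
    *out = s;
}

// Hypothetical helper: returns true if the kernel computed the expected sum.
bool runStaticArrayDemo()
{
    double point[SIZE];
    double expected = 0.0;
    for (int i = 0; i < SIZE; ++i) {
        point[i] = i + 1.0;
        expected += point[i];
    }

    double *out = nullptr;
    if (cudaMalloc(&out, sizeof(double)) != cudaSuccess) return false;

    // The same host call works for __constant__ and __device__ arrays.
    bool ok = cudaMemcpyToSymbol(PNT, point, sizeof(double) * SIZE) == cudaSuccess;

    double sum = 0.0;
    if (ok) {
        sumKernel<<<1, 1>>>(out);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(&sum, out, sizeof(double),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(out);
    return ok && sum == expected;
}
```

The trade-off is that SIZE must be known at compile time, and the constant-memory variant is further limited by the size of constant memory on the device.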

+1

Flamefire Dec 03 '15 at 10:39

talonmies · Accepted Answer · 2011-05-17T23:32:27+0000

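Proper use of cudaMemcpyToSymbol prior to CUDA 4.0 (the string form of the symbol name shown here is no longer supported as of CUDA 5.0, where the symbol itself is passed instead):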

 cudaMemcpyToSymbol("PNT", point, sizeof(double)*SIZE)

or alternatively:

    double *cpnt;
    cudaGetSymbolAddress((void **)&cpnt, "PNT");
    cudaMemcpy(cpnt, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);

which can be a little faster if you plan to access the symbol from the host API more than once.
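For readers on current toolkits, here is a hedged sketch of the same idea using the non-string form of cudaGetSymbolAddress: the address is looked up once and reused for several uploads. The kernel copyKernel, the helper runSymbolAddressDemo, the value of SIZE, and the test data are assumptions for illustration only.

```cuda
#include <cuda_runtime.h>

#define SIZE 8  // illustrative value; the original SIZE is defined elsewhere

__constant__ double PNT[SIZE];

// Hypothetical kernel: copies the constant array out so the host can check it.
__global__ void copyKernel(double *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < SIZE) out[i] = PNT[i];
}

// Hypothetical helper: uploads two data sets through one cached symbol
// address and checks that the kernel sees the second one.
bool runSymbolAddressDemo()
{
    double first[SIZE], second[SIZE], readback[SIZE];
    for (int i = 0; i < SIZE; ++i) {
        first[i] = i;
        second[i] = 10.0 * i;
        readback[i] = -1.0;
    }

    // Resolve the symbol once; the symbol itself is passed, not the string "PNT".
    double *cpnt = nullptr;
    if (cudaGetSymbolAddress((void **)&cpnt, PNT) != cudaSuccess) return false;

    double *out = nullptr;
    if (cudaMalloc(&out, sizeof(readback)) != cudaSuccess) return false;

    // Reuse the cached address for every later upload.
    bool ok = cudaMemcpy(cpnt, first, sizeof(first),
                         cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(cpnt, second, sizeof(second),
                         cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        copyKernel<<<1, SIZE>>>(out);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(readback, out, sizeof(readback),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    for (int i = 0; i < SIZE; ++i) ok = ok && (readback[i] == second[i]);

    cudaFree(out);
    return ok;
}
```

Calling runSymbolAddressDemo() from main should return true on a machine with a CUDA-capable GPU.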

EDIT: I misunderstood the question. For the global memory version, do something similar to the second variant shown for constant memory:

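    // Note: copying SIZE doubles to the symbol address is only valid if PNT is
    // declared as a statically sized array, i.e. __device__ double PNT[SIZE];
    double *gpnt;
    cudaGetSymbolAddress((void **)&gpnt, "PNT");
    cudaMemcpy(gpnt, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);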
