I want to use __constant__ memory that all threads in all of my kernels can access.
The declaration looks something like this:
extern __constant__ float smooth [8 * 1024];
I copy data to this variable using
cudaMemcpyToSymbol("smooth", smooth_local, smooth_size, 0, cudaMemcpyHostToDevice);
where smooth_size is 7 KB.
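For comparison, here is a minimal sketch of the same copy. It assumes CUDA 5.0 or later, where passing the symbol as a string (as in "smooth" above) is no longer accepted, and whole-program compilation (no -rdc=true). In that mode the __constant__ array has to be defined, not declared extern, in the same .cu file as the kernel that reads it. The kernel read_smooth and the helper upload_and_read are hypothetical names used only for illustration.

```cuda
#include <cuda_runtime.h>

// Defined (not extern) in the same .cu file as the kernel that reads it.
// Assumption: no relocatable device code (-rdc=true), so this array is only
// visible to kernels compiled in this translation unit.
__constant__ float smooth[8 * 1024];

// Hypothetical kernel: copies the first n entries of constant memory to global memory.
__global__ void read_smooth(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = smooth[i];
}

// Hypothetical helper: uploads n floats from smooth_local into constant memory,
// runs read_smooth, and copies the kernel's output into out_host.
cudaError_t upload_and_read(const float *smooth_local, float *out_host, int n)
{
    if (n < 0 || n > 8 * 1024)
        return cudaErrorInvalidValue;
    size_t smooth_size = static_cast<size_t>(n) * sizeof(float);

    // Pass the symbol itself, not the string "smooth".
    cudaError_t err = cudaMemcpyToSymbol(smooth, smooth_local, smooth_size, 0,
                                         cudaMemcpyHostToDevice);
    if (err != cudaSuccess)
        return err;

    float *out_dev = nullptr;
    err = cudaMalloc(&out_dev, smooth_size);
    if (err != cudaSuccess)
        return err;

    read_smooth<<<(n + 255) / 256, 256>>>(out_dev, n);
    err = cudaGetLastError();  // reports launch-configuration errors
    if (err == cudaSuccess)
        err = cudaMemcpy(out_host, out_dev, smooth_size, cudaMemcpyDeviceToHost);

    cudaFree(out_dev);
    return err;
}
```

Checking the returned cudaError_t after each call surfaces a failed copy or launch immediately, rather than only through wrong results later.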
The program produced wrong output.
When I ran it in -deviceemu mode and printed the contents of both variables inside the kernel, smooth was all zeros while smooth_local was correct.
I also tried printing the values right after the cudaMemcpyToSymbol call, but they were still 0.
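To test the copy itself without going through a kernel, the data can be read back on the host with cudaMemcpyFromSymbol immediately after cudaMemcpyToSymbol. This is a minimal sketch, assuming CUDA 5.0 or later and that the code sits in the same .cu file as the __constant__ definition; roundtrip_smooth is a hypothetical helper name.

```cuda
#include <cuda_runtime.h>

__constant__ float smooth[8 * 1024];

// Hypothetical helper: copies n floats into smooth, then reads them straight
// back into readback. Returns the number of mismatching elements, or -1 if
// n is out of range or a CUDA call failed.
int roundtrip_smooth(const float *smooth_local, float *readback, int n)
{
    if (n < 0 || n > 8 * 1024)
        return -1;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    if (cudaMemcpyToSymbol(smooth, smooth_local, bytes) != cudaSuccess)
        return -1;
    if (cudaMemcpyFromSymbol(readback, smooth, bytes) != cudaSuccess)
        return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (readback[i] != smooth_local[i])
            ++mismatches;
    return mismatches;
}
```

If this round trip succeeds but the kernel still sees zeros, the problem is more likely in how the kernel refers to smooth than in the copy.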
Can anyone shed some light on this?
Nishu