CUDA Unable to see shared memory values in Nsight debugging

Question


For some time I have been struggling with a problem for which I cannot find a solution. When I debug my CUDA code using NVIDIA Nsight in Visual Studio 2008, I get strange results when using shared memory.

My code is:

template<typename T>
__device__ T integrate()
{
    extern __shared__ T s_test[];   // Dynamically allocated shared memory
    /**** Breakpoint (1) here ****/
    int index = threadIdx.x + threadIdx.y * blockDim.x;   // Local index in block. Column major ordering
    if (index < 64 && blockIdx.x == 0) {   // Only work on a few values. Just testing
        s_test[index] = (T)index;
        /* Some other irrelevant code here */
    }
    return v;
}

When I get to Breakpoint 1 and inspect the shared memory in the Visual Studio Watch window, only the first 8 values of the array change; the rest remain unchanged. I would expect all of the first 64 to change.

[Image: Watch window from Visual Studio]

I thought this could have something to do with the warps not all executing at the same time, so I tried synchronizing them. I added this code inside integrate():

template<typename T>
__device__ T integrate()
{
    /* Old code is still here */
    __syncthreads();
    /**** Breakpoint (2) here ****/
    if (index < 64 && blockIdx.x == 0) {
        T tmp = s_test[index];   // Write to tmp variable so I can inspect it inside Nsight Watch window
        v = tmp + index;         // Use `tmp` and `index` somehow so that the compiler doesn't optimize it out of existence
    }
    return v;
}

But the problem persists. Moreover, the remaining values read into tmp are not 0, contrary to what the Visual Studio Watch window displays.

I should mention that it takes a lot of single steps to get past __syncthreads(), so once I reach it I just jump to Breakpoint 2. What the hell is going on!?

EDIT: Information on the system / launch configuration

System

Name                        Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz
Architecture                x86
Frequency                   2,666 MHz
Number of Cores             2
Page Size                   4,096
Total Physical Memory       3,582.00 MB
Available Physical Memory   1,983.00 MB
Version Name                Windows 7 Ultimate
Version Number              6.1.7600

GeForce 9500 GT

Driver Version                    301.42
Driver Model                      WDDM
CUDA Device Index                 0
GPU Family                        G96
Compute Capability                1.1
Number of SMs                     4
Frame Buffer Physical Size (MB)   512
Frame Buffer Bandwidth (GB/s)     16
Frame Buffer Bus Width (bits)     128
Frame Buffer Location             Dedicated
Graphics Clock (MHz)              812
Memory Clock (MHz)                500
Processor Clock (MHz)             1625
RAM Type                          DDR2

IDE

Microsoft Visual Studio Team System 2008
NVIDIA Nsight Visual Studio Edition, Version 2.2 Build No. 2.2.0.12255

Compiler

1> "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin\nvcc.exe" -G -gencode=arch=compute_10,code=\"sm_10,compute_10\" --machine 32 -ccbin "C:\Program Files\Microsoft Visual Studio 9.0\VC\bin" -D_NEXUS_DEBUG -g -D_DEBUG -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd" -I"inc" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include" -maxrregcount=0 --compile -o "Debug/process_f2f.cu.obj" process_f2f.cu

Launch configuration. The size of the shared memory does not seem to matter. I tried several versions. The one I worked with the most:

Shared memory: 2048 bytes
Grid / Block Sizes: {101, 101, 1}, {16, 16, 1}
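To put these numbers in context, the sketch below works out how the launch configuration maps onto warps. It is host-only arithmetic, not part of the original post; the constant names are made up for illustration, and the element type float is an assumption, since the post never states which T is used. The warp size of 32 and the linear thread ordering x + y * blockDim.x are standard CUDA behaviour.

```cuda
#include <cstdio>

int main()
{
    // Values taken from the launch configuration in the post.
    const int kBlockX = 16, kBlockY = 16;
    const int kSharedBytes = 2048;
    const int kActiveThreads = 64;   // the `index < 64` test in integrate()

    // Warp size on NVIDIA GPUs; warps are formed from consecutive linear thread indices.
    const int kWarpSize = 32;

    const int threadsPerBlock = kBlockX * kBlockY;                           // 256
    const int warpsPerBlock = (threadsPerBlock + kWarpSize - 1) / kWarpSize; // 8
    const int activeWarps = (kActiveThreads + kWarpSize - 1) / kWarpSize;    // 2

    // Assumption: T is float.
    const int floatSlots = kSharedBytes / (int)sizeof(float);                // 512

    printf("threads per block: %d\n", threadsPerBlock);
    printf("warps per block:   %d\n", warpsPerBlock);
    printf("warps with index < %d: %d\n", kActiveThreads, activeWarps);
    printf("float slots in %d bytes of shared memory: %d\n", kSharedBytes, floatSlots);
    return 0;
}
```

Under these assumptions the 64 writes come from two of the block's eight warps, so a breakpoint hit by one warp can fire before the other has written its values.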


debugging shared-memory cuda nsight

Iam Oct 2 '12 at 17:50


1 answer

Younes nj · Answer 1 · 2015-11-17T17:03:40+0000

Have you tried putting __syncthreads() after assigning the values?

template<typename T>
__device__ T integrate()
{
    extern __shared__ T s_test[];   // Dynamically allocated shared memory
    int index = threadIdx.x + threadIdx.y * blockDim.x;   // Local index in block. Column major ordering
    if (index < 64 && blockIdx.x == 0) {   // Only work on a few values. Just testing
        s_test[index] = (T)index;
        /* Some other irrelevant code here */
    }
    __syncthreads();
    /**** Breakpoint (1) here ****/
    return v;
}

Then try to inspect the values at this breakpoint.
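If the Watch window still looks wrong, one way to cross-check it without the debugger is to copy the shared values to global memory and compare them on the host. The following is only a minimal sketch under assumptions: the kernel name dumpShared, the buffer names, and the element type float are hypothetical and not from the question, and the extra blockIdx.y == 0 test is added here so that exactly one block writes the output.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical test kernel: the first n threads of block (0, 0) write their
// index into dynamic shared memory, the whole block synchronizes, and the
// values are then copied to global memory so the host can check them.
__global__ void dumpShared(float* out, int n)
{
    extern __shared__ float s_test[];   // size comes from the 3rd launch parameter

    int index = threadIdx.x + threadIdx.y * blockDim.x;
    bool active = index < n && blockIdx.x == 0 && blockIdx.y == 0;

    if (active) {
        s_test[index] = (float)index;
    }

    // Outside the branch, so every thread of the block reaches the barrier.
    __syncthreads();

    if (active) {
        out[index] = s_test[index];
    }
}

int main()
{
    const int n = 64;
    const dim3 grid(101, 101, 1), block(16, 16, 1);   // sizes from the question
    const size_t sharedBytes = 2048;

    float* d_out = 0;
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_out, 0, n * sizeof(float));

    dumpShared<<<grid, block, sharedBytes>>>(d_out, n);
    cudaError_t err = cudaGetLastError();               // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();   // execution errors
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    float h_out[n];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != (float)i) ++mismatches;
    }
    printf("mismatches: %d of %d\n", mismatches, n);

    cudaFree(d_out);
    return mismatches ? 1 : 0;
}
```

If this reports zero mismatches while the Watch window still shows only a few values changed, the kernel itself is writing shared memory correctly and the discrepancy lies in what the debugger displays at that breakpoint.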
