I am trying to analyze some code I found online, and I keep thinking myself into a corner. I am looking at a histogram kernel launched with the following parameters:
histogram<<<2500, numBins, numBins * sizeof(unsigned int)>>>(...);
I know that the parameters are the grid size, the block size, and the amount of dynamic shared memory in bytes.
Does this mean there are 2500 blocks of numBins threads each, and that each block also gets its own numBins * sizeof(unsigned int) bytes of shared memory for its threads?
In addition, the kernel itself contains calls to __syncthreads(). Does that mean there are 2500 sets of numBins calls to __syncthreads() over the course of the kernel launch?
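To make the question concrete, here is a minimal sketch of how I understand such a launch to work. It is not the code I found: the kernel body, the helper runHistogram, and the names data, n, bins and localBins are all my own assumptions, and the sketch assumes numBins is at most 1024 so that it is a legal block size.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel (not the original code): one thread per bin,
// so blockDim.x is expected to equal numBins.
__global__ void histogram(const unsigned int *data, int n, unsigned int *bins)
{
    // Dynamic shared memory. Its size in bytes is the third launch parameter,
    // and every block gets its own private copy of this array.
    extern __shared__ unsigned int localBins[];

    // Each thread clears one bin of its block's private histogram.
    localBins[threadIdx.x] = 0;
    __syncthreads();  // barrier for the threads of this block only

    // Grid-stride loop: each thread counts several input elements.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&localBins[data[i] % blockDim.x], 1u);
    __syncthreads();  // wait until this block's histogram is complete

    // Merge this block's histogram into the global result.
    atomicAdd(&bins[threadIdx.x], localBins[threadIdx.x]);
}

// Hypothetical host helper: copies the input to the device, launches the
// kernel with the same configuration as in the question, returns the bins.
// Error checking is omitted to keep the sketch short.
std::vector<unsigned int> runHistogram(const std::vector<unsigned int> &input,
                                       unsigned int numBins)
{
    int n = static_cast<int>(input.size());
    unsigned int *dData = nullptr, *dBins = nullptr;
    cudaMalloc(&dData, n * sizeof(unsigned int));
    cudaMalloc(&dBins, numBins * sizeof(unsigned int));
    cudaMemcpy(dData, input.data(), n * sizeof(unsigned int),
               cudaMemcpyHostToDevice);
    cudaMemset(dBins, 0, numBins * sizeof(unsigned int));

    // 2500 blocks, numBins threads per block,
    // numBins * sizeof(unsigned int) bytes of shared memory per block.
    histogram<<<2500, numBins, numBins * sizeof(unsigned int)>>>(dData, n, dBins);

    std::vector<unsigned int> result(numBins);
    cudaMemcpy(result.data(), dBins, numBins * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dBins);
    return result;
}
```

If I read this correctly, every thread in a block executes each __syncthreads() call, and the barrier only synchronizes the numBins threads of that one block; the 2500 blocks do not wait for each other at these calls.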