CUDA: order of execution in a 3D block

As the title says, I would like to know the correct execution order when we have a 3D block.

I think I read something about this some time ago, but I don't remember where, and it came from someone who did not seem very reliable.

In any case, I would like to receive some confirmation about this.

Is it the following (divided into warps)?

[0, 0, 0] ... [blockDim.x-1, 0, 0] - [0, 1, 0] ... [blockDim.x-1, 1, 0] - (...) - [0, blockDim.y-1, 0] ... [blockDim.x-1, blockDim.y-1, 0] - [0, 0, 1] ... [blockDim.x-1, 0, 1] - (...) - [0, blockDim.y-1, 1] ... [blockDim.x-1, blockDim.y-1, 1] - (...) - [blockDim.x-1, blockDim.y-1, blockDim.z-1]

1 answer

Yes, this is the correct order; within a block, threads are ordered with the x dimension varying fastest, then y, and then z (equivalent to column-major order). The calculation can be expressed as

    int threadID = threadIdx.x + blockDim.x * threadIdx.y + (blockDim.x * blockDim.y) * threadIdx.z;
    int warpID = threadID / warpSize;
    int laneID = threadID % warpSize;

Here threadID is the thread number within the block, warpID is the warp number within the block, and laneID is the thread number within the warp.
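To make the numbering concrete, here is a minimal host-side sketch in plain C++ that applies the same arithmetic to a chosen block shape, so it can be checked without a GPU. The names Dim3, ThreadLocation, locate and printBlockLayout are hypothetical helpers introduced only for this illustration, and the warp size of 32 is an assumption (true of current NVIDIA GPUs; device code should read the built-in warpSize instead).

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical host-side stand-in for CUDA's dim3 / uint3 built-ins.
struct Dim3 { unsigned x, y, z; };

// Assumption: 32 threads per warp. In device code, use the built-in warpSize.
constexpr unsigned kWarpSize = 32;

struct ThreadLocation {
    unsigned threadID;  // linear thread number within the block
    unsigned warpID;    // warp number within the block
    unsigned laneID;    // thread number within its warp
};

// Same formula as in the answer: x varies fastest, then y, then z.
ThreadLocation locate(Dim3 blockDim, Dim3 threadIdx) {
    // Valid thread indices run from 0 to blockDim - 1 in each dimension.
    assert(threadIdx.x < blockDim.x &&
           threadIdx.y < blockDim.y &&
           threadIdx.z < blockDim.z);
    unsigned threadID = threadIdx.x
                      + blockDim.x * threadIdx.y
                      + blockDim.x * blockDim.y * threadIdx.z;
    return {threadID, threadID / kWarpSize, threadID % kWarpSize};
}

// Prints every thread of the block in linear order with its warp and lane.
void printBlockLayout(Dim3 blockDim) {
    for (unsigned z = 0; z < blockDim.z; ++z)
        for (unsigned y = 0; y < blockDim.y; ++y)
            for (unsigned x = 0; x < blockDim.x; ++x) {
                ThreadLocation t = locate(blockDim, {x, y, z});
                std::printf("[%u, %u, %u] -> thread %u, warp %u, lane %u\n",
                            x, y, z, t.threadID, t.warpID, t.laneID);
            }
}
```

For example, in a 16 x 4 x 2 block, thread [3, 2, 1] gets threadID 3 + 16 * 2 + 64 * 1 = 99, which falls in warp 3 at lane 3 under the 32-thread assumption. Thread [15, 1, 0] is the last lane of warp 0, and [0, 2, 0] is the first lane of warp 1, so a warp here spans two consecutive rows in x.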

Note that threads are not necessarily executed in any predictable order related to this ordering within a block. The execution model guarantees that threads of the same warp execute in lockstep, but beyond that you cannot infer anything about execution order from the numbering of threads within the block.


Source: https://habr.com/ru/post/920586/

