Will there be a performance difference for CUDA blocks of size 1024x1 versus 32x32?

Are these two block sizes (1024x1 vs 32x32) expected to execute the same way in terms of thread scheduling and memory bandwidth? Is there any expected performance difference between them? Note that both use 1024 threads per block.

1 answer

Threadblock dimensions, especially when the number of threads per block is the same, do not by themselves affect performance.

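In either case the threadblock is decomposed into warps. However, the way the code uses built-in variables such as threadIdx.x, blockIdx.x, etc., for example to compute memory access indices, may well depend on the threadblock dimensions, so a given code may perform better or worse with one block shape than with the other.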
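To make the warp decomposition concrete, here is a minimal host-side sketch in plain C++ (no GPU required). It models the documented CUDA rule that threads in a 2D block are linearized with x as the fastest-varying dimension and then grouped into warps of 32 consecutive linear IDs. The names `Dim2`, `linear_thread_id`, `warp_id` and `warps_per_block` are illustrative assumptions, not part of the CUDA API, and the warp size of 32 is assumed as on current NVIDIA hardware.

```cpp
#include <cassert>

// Hypothetical host-side model of CUDA thread linearization (not CUDA API).
// Assumption: warp size is 32, as on current NVIDIA GPUs.
constexpr int kWarpSize = 32;

// Stand-in for the x and y components of threadIdx / blockDim.
struct Dim2 {
    int x;
    int y;
};

// Linear thread ID inside a block: x varies fastest, then y.
constexpr int linear_thread_id(Dim2 thread_idx, Dim2 block_dim) {
    return thread_idx.x + thread_idx.y * block_dim.x;
}

// Index of the warp (within the block) that a thread belongs to.
constexpr int warp_id(Dim2 thread_idx, Dim2 block_dim) {
    return linear_thread_id(thread_idx, block_dim) / kWarpSize;
}

// Number of warps a block is split into (rounded up).
constexpr int warps_per_block(Dim2 block_dim) {
    return (block_dim.x * block_dim.y + kWarpSize - 1) / kWarpSize;
}

// Both shapes from the question yield the same number of warps.
static_assert(warps_per_block(Dim2{1024, 1}) == 32, "1024x1 -> 32 warps");
static_assert(warps_per_block(Dim2{32, 32}) == 32, "32x32 -> 32 warps");

// Linear ID 40 is thread (40,0) in a 1024x1 block and (8,1) in a 32x32 block;
// in both cases it lands in warp 1.
static_assert(warp_id(Dim2{40, 0}, Dim2{1024, 1}) == 1, "1024x1 warp");
static_assert(warp_id(Dim2{8, 1}, Dim2{32, 32}) == 1, "32x32 warp");
```

Under this model, every warp in either shape holds 32 threads with consecutive threadIdx.x values, so the scheduler sees the same 32 warps per block. What changes is the set of (x, y) coordinates a block covers: one row of 1024 elements versus a 32x32 tile. That only matters through how the kernel turns those coordinates into memory addresses.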


Source: https://habr.com/ru/post/1540294/

