In CUDA, do non-coalesced memory accesses cause branch divergence?

I always thought that branch divergence was caused only by branching code, for example "if", "else", "for", "switch", etc. However, I recently read an article that says:

“You can clearly see that the number of divergent branches taken by threads in each first intelligence-based algorithm is at least twice as high as for a full intelligence strategy. As a rule, this is the result of additional non-coalesced accesses to global memory. Therefore, this thread divergence leads to many memory accesses that need to be serialized, which increases the total number of instructions executed.

You may notice that the number of warp serializations for the version using non-coalesced accesses is seven to sixteen times higher than for its counterpart. Indeed, thread divergence caused by non-coalesced accesses leads to numerous memory accesses that need to be serialized, increasing the number of executed instructions.”

It seems that, according to the author, non-coalesced accesses can cause divergent branches. Is that true? My question is: how many causes of branch divergence are there? Thanks in advance.

1 answer

I think the author is unclear about the concepts and/or the terminology.

The two concepts of divergence and serialization are closely related. Divergence causes serialization, since divergent groups of threads in a warp must be executed one after another. But serialization does not cause divergence, since divergence refers specifically to threads in the same warp taking different code paths.

Other things that cause serialization (but not divergence) are shared-memory bank conflicts and atomic operations.
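To make the distinction concrete, here is a rough host-side model in plain C++ (not CUDA, and not what a profiler actually measures). It counts two independent things for a single warp: how many distinct code paths the lanes take, and how many memory segments one warp-wide load touches. All names are made up for illustration, and the 128-byte segment size is an assumption about the transaction granularity.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>

// Hypothetical model of one warp; every name here is illustrative only.
constexpr int kWarpSize = 32;               // threads per warp on NVIDIA GPUs
constexpr std::size_t kSegmentBytes = 128;  // assumed global-memory transaction size

// Number of distinct code paths taken by the lanes of one warp.
// 1 means no divergence; N > 1 means N groups run one after another.
int distinct_paths(const std::function<int(int)>& path_of_lane) {
    std::set<int> paths;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        paths.insert(path_of_lane(lane));
    }
    return static_cast<int>(paths.size());
}

// Number of distinct segments touched by one warp-wide load.
// 1 means fully coalesced; more segments mean more memory transactions.
int segments_touched(const std::function<std::size_t(int)>& address_of_lane) {
    std::set<std::size_t> segments;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        segments.insert(address_of_lane(lane) / kSegmentBytes);
    }
    return static_cast<int>(segments.size());
}

// Branch patterns: which path each lane takes.
int same_path(int /*lane*/) { return 0; }   // e.g. no data-dependent "if"
int odd_even(int lane) { return lane % 2; } // e.g. if (threadIdx.x % 2) ... else ...

// Access patterns for 4-byte elements: byte address read by each lane.
std::size_t unit_stride(int lane) { return static_cast<std::size_t>(lane) * 4; }
std::size_t stride_32(int lane) { return static_cast<std::size_t>(lane) * 32 * 4; }
```

In this model, `distinct_paths(same_path)` combined with `segments_touched(stride_32)` gives one code path but 32 segments: the load is split into many transactions, yet no thread has diverged. Conversely, `odd_even` with `unit_stride` gives two code paths and a single segment: divergent, but coalesced. The two counts vary independently, which is why a non-coalesced access by itself is not a divergent branch.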


Source: https://habr.com/ru/post/954927/

