OpenCL: Key Questions About SIMT Execution Model

Some concepts and design aspects of the SIMT architecture are still unclear to me.

From what I have seen and read, divergent code paths and if () are generally a pretty bad idea, because many threads execute in lockstep. What does that mean exactly? Take something like:

kernel void foo(..., int flag)
{
    if (flag)
        DO_STUFF
    else
        DO_SOMETHING_ELSE
}

The flag parameter is the same for all work-items, so every work-item takes the same branch. Will the GPU still execute all the code, serializing everything and stepping through the branch that is not taken? Or is it a little smarter, executing only the taken branch when all the threads agree on it? That will always be the case here.

I.e., does serialization ALWAYS happen, or only when necessary? Sorry for the stupid question. ;)

3 answers

No, not always. Both branches are executed only when the condition is not uniform across the work-items that run together in lockstep (a warp on NVIDIA, a wavefront on AMD, which is a slice of the local work-group). If the condition evaluates to different values for different work-items in that group, current-generation GPUs step through both branches, but only the work-items for which a branch is the correct one write values and produce side effects in it.

Thus, keeping branch conditions uniform within such a group is vital for the performance of branching code on a GPU.
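To make the masking idea concrete, here is a minimal CPU-side sketch in plain C that imitates how a lockstep group might handle one if/else. It is only an illustration under assumptions, not how any particular GPU is implemented: WARP_SIZE and simulate_if_else are hypothetical names, and the group width of 32 is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WARP_SIZE 32 /* assumed lockstep group width; hardware dependent */

/* Imitates one if/else executed by a lockstep group of `lanes` work-items.
   out[i] becomes 1 if lane i ran the "then" side and 2 if it ran the "else"
   side. Returns the number of branch bodies the group had to step through. */
static int simulate_if_else(const bool cond[], int out[], size_t lanes)
{
    assert(lanes <= WARP_SIZE);

    bool any_then = false, any_else = false;
    for (size_t i = 0; i < lanes; ++i) {
        if (cond[i]) any_then = true;
        else         any_else = true;
    }

    int passes = 0;
    if (any_then) {              /* skipped entirely if no lane needs it */
        for (size_t i = 0; i < lanes; ++i)
            if (cond[i]) out[i] = 1;   /* lanes with a false condition are masked off */
        ++passes;
    }
    if (any_else) {
        for (size_t i = 0; i < lanes; ++i)
            if (!cond[i]) out[i] = 2;  /* lanes with a true condition are masked off */
        ++passes;
    }
    return passes;
}
```

With a uniform condition the function returns 1 (only one side is stepped through); with a mixed condition it returns 2, yet each lane still ends up with the result of its own branch only.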


Not sure about ATI, but NVIDIA hardware is smart about it: there is no serialization if every thread in a warp takes the same path.


In your example, flag has the same value for all work-items, so a good compiler will generate code that sends all work-items down the same path.

But consider the following case:

kernel void foo(..., int *buffer)
{
    if (buffer[get_global_id(0)])
        DO_STUFF
    else
        DO_SOMETHING_ELSE
}

Here it is not guaranteed that all work-items take the same path, so serialization or some form of control-flow masking is required.


Source: https://habr.com/ru/post/1760956/

