CUDA: Scatter Communication Pattern

I am learning CUDA from the Udacity course on parallel programming. One of the quizzes poses the problem of sorting a predefined variable (player height). Since there is a one-to-one correspondence between the input and output arrays, shouldn't this be a Map communication pattern rather than a Scatter?

+5
3 answers

CUDA does not give a canonical definition of these terms, as far as I know, so my answer is only a suggestion of how they could be, or were, interpreted.

"Since this is a one-to-one correspondence between the input and output array"

That assertion is not supported by the diagram, which shows gaps in the output array that have no corresponding input point associated with them.

If a smaller set of values is being distributed into a larger array (thus creating gaps in the output array, where no input value corresponds to the gap location(s)), then scatter is often used to describe that operation. Both scatter and map have maps that describe where the input values go, but it may be that the instructor defined scatter and map so as to distinguish between these two cases, with plausible definitions such as the following (see the sketch after the definitions):

Scatter: a one-to-one relationship from input to output (i.e. a unidirectional relationship). Every input location has a corresponding output location, but not every output location has a corresponding input location.

Map: a one-to-one relationship between input and output (i.e. a bidirectional relationship). Every input location has a corresponding output location, and every output location has a corresponding input location.

Gather: a one-to-one relationship from output to input (i.e. a unidirectional relationship). Every output location has a corresponding input location, but not every input location has a corresponding output location.
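
A minimal CUDA sketch of that distinction under these definitions. The kernels, their names, and the out_idx indexing array are my own assumptions for illustration, not something defined by the course:

 // Scatter (per the definition above): n inputs land in an output of
 // size m > n, so some output locations are never written (gaps).
 __global__ void scatter_with_gaps(const float *input, float *output,
                                   const int *out_idx, int n)
 {
     int tid = blockIdx.x * blockDim.x + threadIdx.x;
     if (tid < n)
         output[out_idx[tid]] = input[tid];  // out_idx need not cover every output slot
 }

 // Map (per the definition above): one-to-one in both directions.
 __global__ void map_example(const float *input, float *output, int n)
 {
     int tid = blockIdx.x * blockDim.x + threadIdx.x;
     if (tid < n)
         output[tid] = 2.0f * input[tid];    // every output has exactly one input
 }

With these definitions, the quiz example falls under scatter as soon as the output array may contain locations that receive no input value.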

+6

The definition of each communication pattern (map, scatter, gather, etc.) varies slightly from one language / environment / context to another, but since I took the same Udacity course, I will try to explain these terms as I understand them in the context of the course:

The Map operation computes each output element as a function of its corresponding input element, that is:

 output[tid] = foo(input[tid]); 

The Gather pattern computes each output element as a function of one or more (usually more) input elements, not necessarily the corresponding one (typically they are elements of a neighborhood). For example:

 output[tid] = (input[tid-1] + input[tid+1]) / 2; 

Finally, the Scatter pattern has each input element contribute to one or more (again, usually more) output elements. For example:

 atomicAdd( &(output[tid-1]), input[tid]);
 atomicAdd( &(output[tid]),   input[tid]);
 atomicAdd( &(output[tid+1]), input[tid]);

The example given in the question is clearly not a Map, because each output is computed from an input at a different location.

At first sight it is hard to see how the same example can be a Scatter, since each input element causes only a single write to the output; but it is indeed a Scatter, because each input causes a write to an output location that is determined by the input itself.

In other words, each CUDA thread processes the input element at the location associated with its tid (thread index number), and computes where to write the result. More typically, a scatter writes to several locations rather than just one, so this is a special case that might deserve a name of its own.
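
As a concrete sketch of that special case: assuming each input element's final position is already known (say, via a precomputed rank array, which is my assumption and not part of the quiz), sorting by height becomes a scatter in which every input produces exactly one write:

 // Hypothetical kernel: each thread writes its element to a location
 // computed from the input (its rank), i.e. a one-write-per-input scatter.
 __global__ void scatter_by_rank(const float *height, const int *rank,
                                 float *sorted, int n)
 {
     int tid = blockIdx.x * blockDim.x + threadIdx.x;
     if (tid < n)
         sorted[rank[tid]] = height[tid];  // write location determined by the input
 }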

+4

Each player has three properties (name, height, rank). Therefore, I believe Scatter is correct, because we must take all three into account to reach a conclusion.

If a player had only one property, such as rank, then I think Map would be correct.

Reference: Parallel Communication Patterns (rewatch this lecture)

Reference: map / reduce / gather / scatter diagram

0

Source: https://habr.com/ru/post/1210788/

