OpenCL execution model for for loops

I am currently studying OpenCL and came across this piece of code:

int gti = get_global_id(0);
int ti = get_local_id(0);

int n = get_global_size(0);
int nt = get_local_size(0);
int nb = n/nt;

for(int jb=0; jb < nb; jb++) { /* Foreach block ... */
      pblock[ti] = pos_old[jb*nt+ti]; /* Cache ONE particle position */
      barrier(CLK_LOCAL_MEM_FENCE); /* Wait for others in the work-group */

      for(int j=0; j<nt; j++) { /* For ALL cached particle positions ... */
         float4 p2 = pblock[j]; /* Read a cached particle position */
         float4 d = p2 - p;
         float invr = rsqrt(d.x*d.x + d.y*d.y + d.z*d.z + eps);
         float f = p2.w*invr*invr*invr;
         a += f*d; /* Accumulate acceleration */
      }

      barrier(CLK_LOCAL_MEM_FENCE); /* Wait for others in work-group */
}

Background on the code: this is part of the OpenCL kernel of an N-body simulation program. The full code and tutorial can be found here.

Here are my questions (mainly about for loops):

  • How exactly are for loops executed in OpenCL? I know that all work-items run the same code and that the work-items within a work-group try to execute in parallel. So if I write a for loop in OpenCL, do all work-items run the same loop, or is the loop somehow divided among the work-items, with each work-item executing part of it (e.g. work-item 1 processes indices 0 ~ 9, work-item 2 processes indices 10 ~ 19, and so on)?

  • , ? OpenCL, ?

  • ( , for , , ), ? , a = a + f * d, .

, , .


, , .

1.1) , ( ), ( ) .

for(int jb=0; jb < nb; jb++) { /* Foreach block ... */
      pblock[ti] = pos_old[jb*nt+ti];

//I assume pblock is local memory

1.2) ( )

1.3) , for,

for(int j=0; j<nt; j++) {

. , , .

1) - C OpenCL, , , . OpenCL - (, # 1.1).

2) OpenCL , , .

3) , : 1 , , , .

, , , . , .


, OpenCL?

  • , . SALU , , , SALU , 9-10 (, - ) SALU , VALU, .

  • SIMD, , . -, , , . , ( ). / , . SIMD, MIMD, , .

, ?

  • nb nt - , , .


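1) How exactly are for loops executed in OpenCL? I know that all work-items run the same code and that the work-items within a work-group try to execute in parallel. So if I write a for loop in OpenCL, do all work-items run the same loop, or is the loop somehow divided among the work-items, with each work-item executing part of it (e.g. work-item 1 processes indices 0 ~ 9, work-item 2 processes indices 10 ~ 19, and so on)?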

. , , . . ( AMD) warp ( NV), .

. . , . , , , , . , :

if(condition is true)
   do_a();
else
   do_b();

, , do_a(); do_b(). , , . , do_a(), do_a(); , do_b(), . .
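The if/else above is the usual branch-divergence case for work-items that execute in lockstep (a wavefront on AMD, a warp on NVIDIA). As a rough host-side sketch in plain C, not OpenCL, and a simplified model rather than a description of any particular GPU: when the lanes disagree on the condition, the group steps through both branch bodies with the non-participating lanes masked off; when they all agree, only one body is stepped through. The bodies of `do_a` and `do_b`, and the name `run_lockstep`, are made up for illustration.

```c
#include <assert.h>

/* Hypothetical branch bodies, chosen only so the effect is visible. */
static int do_a(int x) { return x + 1; }
static int do_b(int x) { return x * 2; }

/* Emulates `width` lanes running if (cond) do_a(); else do_b(); in lockstep.
   Returns how many branch bodies the whole group had to step through. */
int run_lockstep(const int *cond, int *data, int width)
{
    assert(width > 0);

    /* Find out whether any lane takes each side of the branch. */
    int any_true = 0, any_false = 0;
    for (int lane = 0; lane < width; lane++) {
        if (cond[lane]) any_true = 1;
        else            any_false = 1;
    }

    int passes = 0;
    if (any_true) {                       /* pass over do_a, other lanes masked */
        for (int lane = 0; lane < width; lane++)
            if (cond[lane]) data[lane] = do_a(data[lane]);
        passes++;
    }
    if (any_false) {                      /* pass over do_b, other lanes masked */
        for (int lane = 0; lane < width; lane++)
            if (!cond[lane]) data[lane] = do_b(data[lane]);
        passes++;
    }
    return passes;
}
```

In this model a uniform condition costs one pass and a mixed condition costs two, which is why loops whose trip count is the same for every work-item (such as the ones in the question) do not diverge.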

, , , , , . :

for(int jb=0; jb < nb; jb++) { /* Foreach block ... */
      pblock[ti] = pos_old[jb*nt+ti]; /* Cache ONE particle position */
      barrier(CLK_LOCAL_MEM_FENCE); /* Wait for others in the work-group */

      for(int j=0; j<nt; j++) { /* For ALL cached particle positions ... */

, , , .
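To make the loop structure above concrete, here is a small serial emulation in plain C (not OpenCL) of how the two loops are shared out. It is only a sketch under stated assumptions: the physics is replaced by a plain sum over scalar values, the barriers become the boundaries between two phases, and `emulate_tiled_loop`, `MAX_LOCAL`, `sum` and `iters` are names invented for the illustration. The point it shows is that every work-item runs every iteration of both loops; the only work that is split is the load into `pblock`, where each work-item copies exactly one element per block.

```c
#include <assert.h>

enum { MAX_LOCAL = 64 };  /* hypothetical cap on the work-group size nt */

/* Emulates n work-items in groups of nt running the tiled loop.
   sum[g]   : what work-item g accumulated (stand-in for the acceleration a)
   iters[g] : how many inner-loop iterations work-item g executed */
void emulate_tiled_loop(const float *pos_old, float *sum, int *iters,
                        int n, int nt)
{
    assert(nt > 0 && nt <= MAX_LOCAL && n % nt == 0);
    int nb = n / nt;                        /* number of blocks, as in the kernel */

    for (int group = 0; group < nb; group++) {
        float pblock[MAX_LOCAL];            /* local memory: one copy per group */

        for (int ti = 0; ti < nt; ti++) {
            sum[group * nt + ti] = 0.0f;
            iters[group * nt + ti] = 0;
        }

        for (int jb = 0; jb < nb; jb++) {   /* every work-item runs all nb steps */
            /* Phase 1: each work-item caches ONE element of the block. */
            for (int ti = 0; ti < nt; ti++)
                pblock[ti] = pos_old[jb * nt + ti];
            /* barrier(CLK_LOCAL_MEM_FENCE) would sit here */

            /* Phase 2: each work-item walks the WHOLE cached block. */
            for (int ti = 0; ti < nt; ti++) {
                int gti = group * nt + ti;
                for (int j = 0; j < nt; j++) {
                    sum[gti] += pblock[j];
                    iters[gti]++;
                }
            }
            /* second barrier: nobody overwrites pblock while it is still read */
        }
    }
}
```

With n = 8 and nt = 4, each of the 8 emulated work-items ends up with 8 inner iterations (nb * nt = n), so the loop is repeated by every work-item rather than partitioned among them.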

2) , ? OpenCL, ?

(1), , .

OpenCL, , , . OpenCL , . , . ,

unsigned int gid = get_global_id(0);
buf[gid] = input1[gid] + input2[gid];

.
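As a hedged illustration of the two-line example above, the following plain C sketch (not OpenCL) separates what one work-item does from what the runtime does. The kernel body contains no loop at all; the "loop" is the index space that the host requests when it enqueues the kernel. The function names `add_work_item` and `launch_add` are invented here, and the sequential loop in `launch_add` merely stands in for work-items that a device would run concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* What ONE work-item does. gid plays the role of get_global_id(0). */
static void add_work_item(size_t gid, const float *input1,
                          const float *input2, float *buf)
{
    buf[gid] = input1[gid] + input2[gid];
}

/* Stand-in for the runtime launching global_size work-items.
   The division of work comes from the launch size, not from the kernel code. */
void launch_add(size_t global_size, const float *input1,
                const float *input2, float *buf)
{
    assert(input1 && input2 && buf);
    for (size_t gid = 0; gid < global_size; gid++)
        add_work_item(gid, input1, input2, buf);
}
```

If the kernel body did contain a for loop, every work-item would run that whole loop on its own, so splitting a loop across work-items is something the programmer expresses through the global index, as here.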

3) ( , for , , ), ? a = a + f * d, , .

     float4 d = p2 - p;
     float invr = rsqrt(d.x*d.x + d.y*d.y + d.z*d.z + eps);
     float f = p2.w*invr*invr*invr;
     a += f*d; /* Accumulate acceleration */

a, f d , , . GPU ; , , , , , , ( -, , , , ).

, , a, f d, .


Source: https://habr.com/ru/post/1542867/

