Strange behavior of the OpenCL atom_add operation

Question


For a project, I had to dive into OpenCL. Everything was going well, but now I need atomic operations. I am running OpenCL code on an Nvidia GPU with the latest drivers. Querying CL_DEVICE_VERSION with clGetDeviceInfo() returns OpenCL 1.0 CUDA, so I assume I should refer to the OpenCL 1.0 specification.

I started using the atom_add operation in my kernel on a __global int* vnumber buffer: atom_add(&vnumber[0], 1); This gave clearly wrong results. As an additional check, I moved the add to the beginning of the kernel so that it runs once for every thread. When the kernel is launched with 512 x 512 threads, vnumber[0] contains 524288, which is exactly 2 x 512 x 512, twice the value I should get. The funny thing is that after changing the operation to atom_add(&vnumber[0], 2); the returned value is 65536, again twice as much as I should get.
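For reference, the expected counter value can be worked out with a small host-side helper. This is only a sketch: expected_total is a hypothetical name, not part of the OpenCL API, and it assumes every work-item performs the atomic add exactly once.

```c
#include <stddef.h>

/* Expected final counter value when every work-item in a
 * width x height launch atomically adds `increment` exactly once.
 * Hypothetical helper for checking results on the host. */
long expected_total(size_t width, size_t height, int increment)
{
    return (long)(width * height) * increment;
}
```

For a 512 x 512 launch this gives 262144 for an increment of 1 and 524288 for an increment of 2, so the 524288 observed with an increment of 1 is exactly double the expected count.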

Has anyone experienced something similar? Did I miss something? I checked the data types, and they seem fine (I use an int* buffer and allocate it using sizeof(cl_int)).


atomic opencl gpgpu

Neenster Nov 02 '11 at 12:07


1 answer

vocaro · Accepted Answer · 2011-11-02T20:05:59+0000

You are using atom_add, which is an OpenCL 1.0 extension for local memory, but you are passing it global memory. Instead, try the OpenCL 1.1 atomic_add, which works with global memory.
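A minimal sketch of that suggestion follows. atomic_add on 32-bit integers is a core built-in from OpenCL 1.1 onward, so it is worth checking the version string the device reports before relying on it. The kernel name count_items and the helper has_core_int32_atomics are hypothetical names chosen for illustration. The kernel is embedded as a C string, as it would be when passed to clCreateProgramWithSource.

```c
#include <stdio.h>

/* Hypothetical kernel source: each work-item adds 1 to vnumber[0]
 * using the OpenCL 1.1 built-in atomic_add on a __global int. */
const char *count_kernel_src =
    "__kernel void count_items(__global int *vnumber)\n"
    "{\n"
    "    atomic_add(&vnumber[0], 1);\n"
    "}\n";

/* Returns 1 if a CL_DEVICE_VERSION string such as "OpenCL 1.1 CUDA"
 * reports version 1.1 or later, 0 otherwise (including parse failure). */
int has_core_int32_atomics(const char *device_version)
{
    int major = 0, minor = 0;
    if (sscanf(device_version, "OpenCL %d.%d", &major, &minor) != 2)
        return 0;
    return major > 1 || (major == 1 && minor >= 1);
}
```

With the version string from the question, has_core_int32_atomics("OpenCL 1.0 CUDA") returns 0, so whether this kernel builds on that device depends on what the driver actually supports.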
