Strange behavior of the OpenCL atom_add operation

For a project, I had to dive into OpenCL. Everything has gone well so far, but now I need atomic operations. I am running my OpenCL code on an Nvidia GPU with the latest drivers. Querying clGetDeviceInfo() for CL_DEVICE_VERSION returns "OpenCL 1.0 CUDA", so I assume I should refer to the OpenCL 1.0 specification.

I started using the atom_add operation in my kernel on a __global int* vnumber buffer: atom_add(&vnumber[0], 1); This gave me clearly wrong results. As an additional check, I moved the add to the very beginning of the kernel so that it is executed by every thread. When the kernel is launched with 512 x 512 threads, vnumber[0] contains 524288, which is exactly 2 x 512 x 512, twice the value I should get. The funny thing is that when I change the add operation to atom_add(&vnumber[0], 2); the resulting value is 65536, again twice as much as I should get.

Has anyone experienced something similar? Am I missing something? I checked the data types, and they seem fine (I use an int* buffer and allocate it using sizeof(cl_int)).

1 answer

You are using atom_add, which is an OpenCL 1.0 extension for local memory, but you are passing it global memory. Try the OpenCL 1.1 atomic_add instead, which works with global memory.
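
The suggestion above can be sketched as a small kernel. This is only an illustration and has not been run on your hardware. The kernel name count_items and the ATOMIC_ADD macro are made up for the example. It assumes that the host zeroes vnumber[0] before the launch and enqueues the kernel exactly once. Since your device reports OpenCL 1.0, where atomic_add is not part of the core language, the sketch falls back to atom_add and enables the cl_khr_global_int32_base_atomics extension, which is assumed to appear in the device's CL_DEVICE_EXTENSIONS string.

```c
// On OpenCL 1.0, 32-bit atomics on __global memory come from an extension
// that must be enabled explicitly; from OpenCL 1.1 on, atomic_add is core.
#if __OPENCL_VERSION__ < 110
#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
#define ATOMIC_ADD(p, v) atom_add((p), (v))    // OpenCL 1.0 extension name
#else
#define ATOMIC_ADD(p, v) atomic_add((p), (v))  // OpenCL 1.1 core name
#endif

// count_items is a hypothetical kernel name; vnumber is the buffer
// from the question.
__kernel void count_items(__global int *vnumber)
{
    // Every work-item adds 1 to the same counter, so after the launch
    // vnumber[0] should equal the global work size (512 x 512 = 262144),
    // provided the host set it to 0 beforehand.
    ATOMIC_ADD(&vnumber[0], 1);
}
```

If the counter still comes out doubled with this version, it may be worth checking on the host side that the buffer is zeroed before the launch and that the kernel is not enqueued twice.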


Source: https://habr.com/ru/post/1379296/

