For a project I had to dive into OpenCL. Everything has gone well so far, but now I need atomic operations. I am running my OpenCL code on an Nvidia GPU with the latest drivers. Querying clGetDeviceInfo() for CL_DEVICE_VERSION returns OpenCL 1.0 CUDA, so I assume I should refer to the OpenCL 1.0 specification.
I started using atom_add in my kernel on a __global int* vnumber buffer: atom_add(&vnumber[0], 1); This gave me clearly wrong results. As an additional check, I moved the atom_add call to the very beginning of the kernel so that every thread executes it. When the kernel is launched with 512 x 512 threads, vnumber[0] ends up containing 524288, which is exactly 2 x 512 x 512, twice the value I should get. The funny thing is that after changing the operation to atom_add(&vnumber[0], 2); the resulting value is 65536, again twice as much as I should get.
Has anyone experienced something similar? Am I missing something? I checked the data types and they seem fine (I use an int* buffer and allocate it using sizeof(cl_int)).
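For reference, here is a minimal sketch of the check described above, written as plain C so it compiles without the OpenCL headers. The kernel name count_items, the variable kernel_src and the helper expected_total are hypothetical names used only for illustration; they are not my actual code. The pragma line is included because the OpenCL 1.0 specification puts atom_add on __global int behind the cl_khr_global_int32_base_atomics extension. The helper assumes the buffer is zeroed before launch and the kernel is enqueued exactly once.

```c
#include <stddef.h>

/* Hypothetical reproduction kernel, kept as an OpenCL C source string.
 * "count_items" is an illustrative name, not the original kernel.
 * In OpenCL 1.0, 32-bit atomics on __global memory belong to the
 * cl_khr_global_int32_base_atomics extension, hence the pragma. */
const char *kernel_src =
    "#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable\n"
    "__kernel void count_items(__global int *vnumber)\n"
    "{\n"
    "    /* every work-item adds to the same counter */\n"
    "    atom_add(&vnumber[0], 1);\n"
    "}\n";

/* Value vnumber[0] should hold after one launch over a gx x gy global
 * range, ASSUMING the buffer was initialised to 0 beforehand and the
 * kernel was enqueued exactly once (both are assumptions of this sketch). */
long expected_total(size_t gx, size_t gy, int increment)
{
    return (long)(gx * gy) * (long)increment;
}
```

With a 512 x 512 global range, expected_total(512, 512, 1) is 262144 and expected_total(512, 512, 2) is 524288, which are the values I compare against the contents read back from vnumber[0].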