When OpenCL returns unexpected results at the boundaries, the cause is usually an out-of-bounds memory access. If most of the code works and only the boundaries are wrong, check the memory accesses in the kernel. The culprit is typically an index built from global offset + global id, local offset + local id, or some combination of these that exceeds the size of the allocation. Make sure the global and local work sizes and the offsets set in the C/C++ host code match the allocated memory, since these values determine the global and local ids that the kernel sees.
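As a rough illustration of that host-side check, here is a minimal sketch in plain C for the 1-D case. The helper names `launch_fits` and `round_up` are hypothetical (they are not part of the OpenCL API), and the sketch assumes the kernel indexes the buffer directly with its global id. It also shows one common way the mismatch arises: rounding the global work size up to a multiple of the local work size.

```c
#include <stddef.h>

/* Hypothetical helper (not an OpenCL call): returns 1 when every global id
 * of a 1-D launch indexes inside a buffer of n_elems elements, else 0.
 * Global ids run from offset to offset + global_size - 1. */
static int launch_fits(size_t offset, size_t global_size, size_t n_elems)
{
    if (global_size == 0)
        return 1;               /* nothing is launched, nothing is accessed */
    if (offset > n_elems)
        return 0;               /* even the first id is past the end */
    /* Written as a subtraction so that offset + global_size cannot wrap. */
    return global_size <= n_elems - offset;
}

/* Hypothetical helper: rounds n up to the next multiple of local_size.
 * Assumes local_size > 0. */
static size_t round_up(size_t n, size_t local_size)
{
    size_t rem = n % local_size;
    return rem == 0 ? n : n + (local_size - rem);
}
```

With a buffer of 1000 elements and a local work size of 64, `round_up(1000, 64)` gives 1024, and `launch_fits(0, 1024, 1000)` returns 0: the last 24 work items would index past the end of the buffer. In that situation the kernel needs its own guard (for example, returning early when the global id is not below the element count), or the buffer has to be padded to the rounded-up size.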
Edit: try running the code on a CPU device. CPU devices are less forgiving of out-of-bounds (OOB) accesses and, as a rule, fail with a stack overflow or a similar crash. GPU devices are very forgiving: an OOB read usually just yields an undefined value, 0 (depending on the compiler options), or some bizarre, very large value.