OpenCL: difference between __constant memory and const __global memory

I would like to understand the difference between creating a buffer with the read-only property and using it in the kernel with the __constant address space, versus using it with the const __global address space.

What I have found so far does not answer my question, although it contains some useful information.

If I understand correctly, the allocation in GPU memory occurs when clCreateBuffer is called. Therefore, I don't understand how the compiler decides whether the buffer lives in constant memory (which has a 64 KB limit) or in global memory. (I know that in most cases constant memory is part of the global memory space.) If it depends on the address space qualifier, that would mean the 64 KB limit can be bypassed by using const __global.

Is there a performance difference between __constant and const __global? __global memory can be cached, so both are read-only and (can be) cached. (Source: Section 3.3, "Memory Model", global memory paragraph and Figure 3.3; http://www.khronos.org/registry/cl/specs/opencl-1.x-latest.pdf#page=24 )

2 answers

In my experience, there is no conceptual difference between them: both imply that the data pointed to is read-only. A difference only becomes apparent depending on the vendor's implementation.

For example, on NVIDIA GPUs, memory marked __constant is cached (the cache size is 8 KB per multiprocessor, which I assume holds for all current devices). Note that accesses to this cache are serialized if different work-items access different addresses, so I have found it most useful for passing parameter structures that are constant across the work-group. The section on constant memory in the CUDA programming guide gives an idea of how this works. Memory marked const __global is not cached. I suppose the qualifier just tells the compiler to raise an error if you try to modify the values pointed to.
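The usage pattern described above (a small parameter structure that every work-item reads at the same address) could be sketched roughly as follows. This is only an illustration: ScaleParams, scale_constant, scale_global and fake_global_id are hypothetical names, and the #ifndef shim exists solely so the file also compiles as plain host C; an OpenCL compiler skips it. Whether the two variants differ in speed depends on the vendor's implementation.

```c
/* Illustrative sketch only. The shim below is an assumption made so the
 * kernels can also be compiled and exercised as ordinary host C. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define __constant
static size_t fake_global_id = 0; /* stand-in for the work-item index */
static size_t get_global_id(unsigned int dim) { (void)dim; return fake_global_id; }
#endif

/* Hypothetical parameter block shared by all work-items. */
typedef struct {
    float scale;
    float offset;
} ScaleParams;

/* params is in the __constant address space; every work-item reads the
 * same address, which avoids the serialized access mentioned above. */
__kernel void scale_constant(__constant ScaleParams *params,
                             __global const float *in,
                             __global float *out)
{
    size_t i = get_global_id(0);
    out[i] = in[i] * params->scale + params->offset;
}

/* Same computation, but params is a read-only pointer into __global
 * memory; const only forbids writes through this pointer. */
__kernel void scale_global(__global const ScaleParams *params,
                           __global const float *in,
                           __global float *out)
{
    size_t i = get_global_id(0);
    out[i] = in[i] * params->scale + params->offset;
}
```

On the host side the same read-only buffer can be bound to either kernel; only the address space qualifier in the kernel signature changes.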

I'm not sure whether AMD does similar caching on its hardware.

Hope that helps


For the AMD OpenCL implementation, see the explanation here: https://github.com/RadeonOpenCompute/ROCm/issues/203

In principle, constant carries an implicit restrict. So constant int *p is basically equivalent to const global int * restrict p.
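The two pointer declarations being compared can be written side by side as kernels. This is a hedged sketch, not taken from the linked issue: add_bias_constant, add_bias_global and fake_global_id are hypothetical names, and the #ifndef shim is only there so the file also compiles as host C99.

```c
/* Illustrative sketch only; the shim is an assumption for host-side use. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define __constant
static size_t fake_global_id = 0; /* stand-in for the work-item index */
static size_t get_global_id(unsigned int dim) { (void)dim; return fake_global_id; }
#endif

/* bias lives in the __constant address space, so writes through data
 * cannot modify what bias points to. */
__kernel void add_bias_constant(__constant int *bias, __global int *data)
{
    size_t i = get_global_id(0);
    data[i] += bias[0];
}

/* bias is a read-only __global pointer; restrict makes the same
 * no-aliasing promise explicit to the compiler. */
__kernel void add_bias_global(const __global int * restrict bias,
                              __global int *data)
{
    size_t i = get_global_id(0);
    data[i] += bias[0];
}
```

In both versions the compiler may assume that writes through data do not change bias[0], which is what lets it keep the loaded value in a register.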


Source: https://habr.com/ru/post/1494703/

