OpenCL: What type of memory to use?

I have a filtering kernel, something like this:

```
__kernel void filterKernel(__global float4 *filter,
                           __global float4 *in_array,
                           __global float4 *out_array)
{
    ...
    out_array[tid] = in_array[tid] * filter[fid];
    ...
}
```

  • The kernel filterKernel is called many times (about 1000).

  • The filter variable is an array of floats whose values never change (they stay the same for all work-groups and for all kernel calls).

  • in_array contains 32768 floats.

What is the best way to declare this filter variable? __constant? __local? Or should I just add const here and there? What helps the compiler the most, and what makes the code the fastest?

3 answers

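You should use the constant address space (__constant), since most GPUs have a dedicated cache for read-only constant memory. The only catch is that constant memory is small (typically about 16-64 KB); the actual limit for a device is reported by clGetDeviceInfo with CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE.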
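A minimal sketch of the asker's kernel with the filter moved to the __constant address space. The question elides how tid and fid are computed, so the mapping below (one work-item per element, fid = tid % filter_len) and the extra filter_len argument are assumptions for illustration only. On the host side the filter can stay an ordinary read-only buffer (for example one created with CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR); only the qualifier on the kernel parameter changes.

```opencl
// Filter lives in constant memory: read-only and served by the
// constant cache on most GPUs. Its size must not exceed
// CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE.
__kernel void filterKernel(__constant float4 *filter,
                           const uint filter_len,          // hypothetical extra argument
                           __global const float4 *in_array,
                           __global float4 *out_array)
{
    const size_t tid = get_global_id(0);   // one work-item per element (assumed)
    const size_t fid = tid % filter_len;   // hypothetical tid -> fid mapping
    out_array[tid] = in_array[tid] * filter[fid];  // component-wise float4 multiply
}
```

Marking in_array as `__global const` does not change where it lives, but it documents that the kernel never writes to it.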


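__local would be wrong, since you cannot initialize it from the host: every work-group would have to copy the filter in from global memory on every launch. You probably want to use __constant, provided that it works on your device (i.e. the filter fits into constant memory).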
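To illustrate the extra work __local implies: since a __local array cannot be filled from the host, the work-items of each group have to copy the filter in from global memory and then synchronize with a barrier before using it. This is only a sketch; FILTER_LEN, the kernel name filterKernelLocal and the tid-to-fid mapping are hypothetical.

```opencl
// Hypothetical compile-time filter length, e.g. set with -DFILTER_LEN=256.
#ifndef FILTER_LEN
#define FILTER_LEN 256
#endif

__kernel void filterKernelLocal(__global const float4 *filter,
                                __global const float4 *in_array,
                                __global float4 *out_array)
{
    // __local arrays must be declared at kernel function scope.
    __local float4 filter_local[FILTER_LEN];

    // Work-items of the group cooperatively copy the filter from global memory.
    for (size_t i = get_local_id(0); i < FILTER_LEN; i += get_local_size(0))
        filter_local[i] = filter[i];

    // Every work-item must reach this point before anyone reads filter_local.
    barrier(CLK_LOCAL_MEM_FENCE);

    const size_t tid = get_global_id(0);
    const size_t fid = tid % FILTER_LEN;   // hypothetical tid -> fid mapping
    out_array[tid] = in_array[tid] * filter_local[fid];
}
```

This copy is repeated by every work-group on each of the ~1000 launches, which is the overhead __constant avoids.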


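If it is not too large, try defining your filter at program scope inside the .cl file.
There you can put it in the __constant address space, or declare a __local array inside the kernel and fill it there, and compare which one is faster. Note that OpenCL C only allows program-scope variables in the __constant address space (OpenCL 2.0 also allows __global), so not all SDKs accept a program-scope variable in the __local address space (I'm looking at you, ATI).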
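A sketch of a filter defined at program scope in the .cl file, in the __constant address space. FILTER_LEN and the coefficient values are placeholders, not taken from the question; the host could, for instance, generate this source text at runtime before passing it to clCreateProgramWithSource.

```opencl
// Placeholder length and values: replace with the real filter.
#define FILTER_LEN 4

// Program-scope variable: allowed in the __constant address space.
__constant float4 filter[FILTER_LEN] = {
    (float4)(1.0f),      // scalar replicated into all four components
    (float4)(0.5f),
    (float4)(0.25f),
    (float4)(0.125f)
};

__kernel void filterKernel(__global const float4 *in_array,
                           __global float4 *out_array)
{
    const size_t tid = get_global_id(0);
    const size_t fid = tid % FILTER_LEN;   // hypothetical tid -> fid mapping
    out_array[tid] = in_array[tid] * filter[fid];
}
```

Because the values are compile-time constants here, the compiler is free to fold or cache them, but changing the filter then means rebuilding the program.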

If you still want to pass the filter as a kernel argument, consider calling clSetKernelArg(kernel, 0, ...) only once. Kernel argument values persist between enqueues, so you do not need to call clSetKernelArg() 1000 times if the value of an argument does not change. This may not have a measurable effect on performance, but it is still cleaner.


Source: https://habr.com/ru/post/1334661/

