I have an OpenCL kernel that should treat an array as multiple sub-arrays, where the sum of each sub-array is accumulated in a local cache array.
For example, imagine the following array:
[[1, 2, 3, 4], [10, 30, 1, 23]]
- Each workgroup receives one sub-array (in our example, two workgroups);
- Each work item processes two indexes of the array (for example, it multiplies each value by its local_id), and the work item's result is stored in the workgroup's shared cache array.
__kernel void test(__global int **values, __global int *result, const int array_size){
    __local int cache[array_size];

    // initialise
    if (get_local_id(0) == 0){
        for (int i = 0; i < array_size; i++)
            cache[i] = 0;
    }
    barrier (CLK_LOCAL_MEM_FENCE);

    if(get_global_id(0) < 4){
        for (int i = 0; i<2; i++)
            cache[get_local_id(0)] += values[get_group_id(0)][i] * get_local_id(0);
    }
    barrier (CLK_LOCAL_MEM_FENCE);

    if(get_local_id(0) == 0){
        for (int i = 0; i<array_size; i++)
            result[get_group_id(0)] += cache[i];
    }
}
The problem is that I cannot set the size of the cache array from a kernel parameter, but I need the kernel to be dynamic.
How can I allocate it dynamically, the way malloc does in C?
Or is the only available solution to pass a temporary array to my kernel function?
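For reference, the "pass a temporary array" option might look roughly like the sketch below. It is not a tested solution, only an illustration under some assumptions: the cache becomes a __local pointer argument, and the host picks its size by calling clSetKernelArg with a byte count and a NULL value. The flattened values layout (rows of array_size elements, array_size >= 2), the argument index 2, the helper names local_cache_bytes and set_cache_arg, and the HAVE_OPENCL macro are all made up for this example. The host call is guarded so the file still compiles where the OpenCL headers are not installed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* OpenCL C source for a variant of the kernel in the question.
 * Assumptions of this sketch: the sub-arrays are flattened into one
 * buffer (rows of array_size ints, array_size >= 2), and `cache` is a
 * __local pointer argument whose size is chosen by the host. */
const char kernel_src[] =
    "__kernel void test(__global const int *values,\n"
    "                   __global int *result,\n"
    "                   __local int *cache,\n"
    "                   const int array_size)\n"
    "{\n"
    "    const int lid = (int)get_local_id(0);\n"
    "    const int grp = (int)get_group_id(0);\n"
    "    const int lsz = (int)get_local_size(0);\n"
    "\n"
    "    // zero the cache cooperatively\n"
    "    for (int i = lid; i < array_size; i += lsz)\n"
    "        cache[i] = 0;\n"
    "    barrier(CLK_LOCAL_MEM_FENCE);\n"
    "\n"
    "    // guard so that cache[lid] stays inside the buffer\n"
    "    if (lid < array_size) {\n"
    "        for (int i = 0; i < 2; i++)\n"
    "            cache[lid] += values[grp * array_size + i] * lid;\n"
    "    }\n"
    "    barrier(CLK_LOCAL_MEM_FENCE);\n"
    "\n"
    "    if (lid == 0) {\n"
    "        int sum = 0;\n"
    "        for (int i = 0; i < array_size; i++)\n"
    "            sum += cache[i];\n"
    "        result[grp] = sum;\n"
    "    }\n"
    "}\n";

/* Bytes of local memory needed for array_size ints. cl_int is a 32-bit
 * integer, so int32_t stands in for it here. */
size_t local_cache_bytes(int array_size)
{
    assert(array_size > 0);
    return (size_t)array_size * sizeof(int32_t);
}

#ifdef HAVE_OPENCL /* hypothetical macro: define it when OpenCL headers are available */
#include <CL/cl.h>

/* Argument index 2 is `cache` in kernel_src. For a __local argument the
 * value pointer must be NULL; the size tells the runtime how many bytes
 * of local memory to reserve for each work-group. */
cl_int set_cache_arg(cl_kernel kernel, int array_size)
{
    return clSetKernelArg(kernel, 2, local_cache_bytes(array_size), NULL);
}
#endif
```

For the example array (sub-arrays of 4 elements) local_cache_bytes(4) is 16 bytes. Local memory is limited per device (see CL_DEVICE_LOCAL_MEM_SIZE in clGetDeviceInfo), so a very large array_size may not fit.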