I am new to OpenCL.
I would like to write a generic kernel that I can later extend to other collapsing memory patterns. For simplicity, I am relating it to a rectangular stencil pattern for now (while also avoiding accesses beyond the borders).
This kernel manages the use of local memory (__local float *lmem).
At the moment, my .cl file has the structure below:
    __kernel void kmain(
        __global float *in,
        __global float *out,
        __global float *in2,
        __local float *lmem)
    {
        // Work-group and work-item coordinates
        int wg_x = get_group_id(0);
        int wg_y = get_group_id(1);
        int wi_x = get_local_id(0);
        int wi_y = get_local_id(1);
        for (int iter_x = 0; iter_x < NUM_WUS_X-1; iter_x++) {
            for (int iter_y = 0; iter_y < NUM_WUS_Y-1; iter_y++) {
                int wu_x, wu_y;
                // Pseudocode: func yields the work-unit coordinates
                (wu_x, wu_y) = func(wg_x, wg_y,
                                    wi_x, wi_y,
                                    iter_x, iter_y);
                for (int i = 0; i < N-1; i++) {
                    for (int j = 0; j < M-1; j++) {
                        int idx_o = fo(wu_x, wu_y, i, j);
                        int idx_i = fi(wu_x, wu_y, i, j);
                        // Pseudocode: 2D notation for the stencil reads
                        ... = in[idx_o + CO_1][idx_i + CI_1];
                        ...
                        ... = in[idx_o + CO_k][idx_i + CI_k];
                        ...
                    }
                }
                ...
                out[y][x] = ...;
            }
        }
    }
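Since in is a flat float pointer, the two-subscript reads in the listing would eventually have to become a single row-major index, and the border condition has to be handled somewhere. Here is a minimal sketch in plain C of how that could look, assuming a row-major layout and a clamp-to-edge border policy. The names clamp_coord and read_clamped, and the width and height parameters, are hypothetical and not part of the kernel above.

```c
#include <stddef.h>

/* Hypothetical helper: clamp a coordinate into the range [0, n-1]. */
static int clamp_coord(int v, int n)
{
    if (v < 0) return 0;
    if (v >= n) return n - 1;
    return v;
}

/* Hypothetical helper: read the element at (x + dx, y + dy) from a flat,
 * row-major buffer of size width * height. Out-of-range coordinates are
 * clamped to the nearest border element (an assumed border policy). */
static float read_clamped(const float *in, int width, int height,
                          int x, int y, int dx, int dy)
{
    int cx = clamp_coord(x + dx, width);
    int cy = clamp_coord(y + dy, height);
    return in[(size_t)cy * (size_t)width + (size_t)cx];
}
```

Inside the kernel the same arithmetic would apply to the __global pointer, with (dx, dy) playing the role of the CI_k and CO_k offsets. Other border policies (zero padding, skipping border work-items) are equally possible.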
Does anyone have any ideas on how to implement this template, together with appropriate host code?
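For the host side, one thing that has to be decided is how much local memory to reserve for lmem. Here is a minimal sketch in plain C, assuming each work-group caches a tile of its own size plus a stencil border ("halo") on every side. The function name local_tile_bytes and the halo parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes of local memory needed for one
 * work-group tile of wg_w x wg_h floats, extended by halo_x columns and
 * halo_y rows on each side (an assumed tiling scheme). */
static size_t local_tile_bytes(size_t wg_w, size_t wg_h,
                               size_t halo_x, size_t halo_y)
{
    size_t tile_w = wg_w + 2 * halo_x;
    size_t tile_h = wg_h + 2 * halo_y;
    return tile_w * tile_h * sizeof(float);
}
```

Because lmem is the kernel argument at index 3, the host would pass this size with a NULL value, for example clSetKernelArg(kernel, 3, bytes, NULL), where kernel and bytes are placeholder names.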