Generic OpenCL stencil kernel and host

I am new to OpenCL.

I would like to write a generic kernel so that I can later extend it to other memory access patterns. For simplicity, I am starting with a rectangular stencil pattern (while also avoiding out-of-bounds accesses).

The kernel uses local memory (__local float *lmem).

At the moment, my .cl file has the following structure:

__kernel void kmain (
    __global float *in,
    __global float *out,
    __global float *in2,
    __local float *lmem)
{
    int wg_x = get_group_id(0);
    int wg_y = get_group_id(1);
    int wi_x = get_local_id(0);
    int wi_y = get_local_id(1);

    // number of work units each work-item processes
    for (int iter_x = 0; iter_x < NUM_WUS_X-1; iter_x++) {
        for (int iter_y = 0; iter_y < NUM_WUS_Y-1; iter_y++) {
            int wu_x, wu_y;

            // The current work unit coordinate (wu_x, wu_y) is computed from
            // the work-group ID (wg_x, wg_y), the work-item ID (wi_x, wi_y)
            // and the work unit ID (iter_x, iter_y):
            (wu_x, wu_y) = func(wg_x, wg_y,
                                wi_x, wi_y,
                                iter_x, iter_y);

            // This is where to cooperatively load
            // a region of <in> to the local memory.
            // barrier (...);

            for (int i = 0; i < N-1; i++) {
                for (int j = 0; j < M-1; j++) {

                    // (fo, fi) determines the home access pattern centered around
                    // (idx_o, idx_i). WI(*,*) defines the memory access pattern,
                    // i.e. (wi_x) = (wi_y):
                    int idx_o = fo(wu_x, wu_y, i, j);
                    int idx_i = fi(wu_x, wu_y, i, j);

                    // offsets CO and CI determine the stencil pattern within each work-item
                    ... = in[idx_o + CO_1][idx_i + CI_1];
                    ... // context (inner loop body)
                    ... = in[idx_o + CO_k][idx_i + CI_k];
                    ... // context (inner loop body)
                }
            }
            // barrier (...);
            ... // context (epilogue)
            out[y][x] = ...;
        }
    }
}
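
One way to pin down the template before committing to device code might be a plain C prototype that emulates a work-group on the CPU. The sketch below is only an illustration under my own assumptions, none of which come from the kernel above: a square tile of TILE x TILE work-items, a box stencil of radius R, one work unit per work-item, borders handled by clamping, and image dimensions that are multiples of TILE. The names stencil_direct, stencil_tiled and clamp_index are hypothetical. The array lmem stands in for the __local buffer, and the comment marks where barrier(CLK_LOCAL_MEM_FENCE) would sit in a real kernel.

```c
#include <assert.h>

/* Hypothetical parameters, chosen only for this illustration. */
enum { TILE = 4, R = 1, LDIM = TILE + 2 * R };

/* Clamp an index to [0, n-1] so halo reads never leave the image. */
static int clamp_index(int v, int n)
{
    return v < 0 ? 0 : (v >= n ? n - 1 : v);
}

/* Direct reference: (2R+1) x (2R+1) box sum with clamped borders. */
void stencil_direct(const float *in, float *out, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            float acc = 0.0f;
            for (int dy = -R; dy <= R; dy++)
                for (int dx = -R; dx <= R; dx++)
                    acc += in[clamp_index(y + dy, h) * w + clamp_index(x + dx, w)];
            out[y * w + x] = acc;
        }
}

/* Tiled version mirroring the template: one emulated work-group per tile. */
void stencil_tiled(const float *in, float *out, int w, int h)
{
    assert(w % TILE == 0 && h % TILE == 0);
    float lmem[LDIM * LDIM];            /* stands in for __local float *lmem */

    for (int wg_y = 0; wg_y < h / TILE; wg_y++)
        for (int wg_x = 0; wg_x < w / TILE; wg_x++) {
            /* Phase 1: cooperative load of the tile plus its halo. */
            for (int ly = 0; ly < LDIM; ly++)
                for (int lx = 0; lx < LDIM; lx++) {
                    int gx = clamp_index(wg_x * TILE + lx - R, w);
                    int gy = clamp_index(wg_y * TILE + ly - R, h);
                    lmem[ly * LDIM + lx] = in[gy * w + gx];
                }
            /* A real kernel needs barrier(CLK_LOCAL_MEM_FENCE) here. */

            /* Phase 2: every work-item reads its neighbourhood from lmem. */
            for (int wi_y = 0; wi_y < TILE; wi_y++)
                for (int wi_x = 0; wi_x < TILE; wi_x++) {
                    float acc = 0.0f;
                    for (int i = 0; i <= 2 * R; i++)
                        for (int j = 0; j <= 2 * R; j++)
                            acc += lmem[(wi_y + i) * LDIM + (wi_x + j)];
                    out[(wg_y * TILE + wi_y) * w + (wg_x * TILE + wi_x)] = acc;
                }
        }
}
```

Comparing stencil_tiled against stencil_direct on a small image is a cheap way to check the index arithmetic (the counterparts of func, fo and fi) before porting the two phases into the kernel.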

Does anyone have ideas on how to implement this template together with a suitable generic host program?
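
For the host side, one part that can be kept generic is the launch geometry. The following is a minimal sketch in plain C, not a full host program. It assumes (my assumptions, not stated above) that each work-unit iteration of a work-group covers a block of wg_x x wg_y output points, that a work-group therefore covers wg_x * wus_x by wg_y * wus_y points, and that the local tile holds one such block plus a halo of `radius` points on each side. The type launch_cfg and the function make_launch_cfg are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one 2D kernel launch. */
typedef struct {
    size_t global[2];   /* global work size, a multiple of local[] */
    size_t local[2];    /* work-group size */
    size_t lmem_bytes;  /* byte size of the __local buffer */
} launch_cfg;

/* Derive NDRange sizes and local-memory size from the template parameters. */
launch_cfg make_launch_cfg(size_t width, size_t height,
                           size_t wg_x, size_t wg_y,
                           size_t wus_x, size_t wus_y,
                           size_t radius)
{
    assert(wg_x && wg_y && wus_x && wus_y);
    launch_cfg c;

    /* Output points covered by one work-group in each dimension. */
    size_t cover_x = wg_x * wus_x;
    size_t cover_y = wg_y * wus_y;

    /* Round the number of work-groups up so the whole image is covered. */
    size_t groups_x = (width + cover_x - 1) / cover_x;
    size_t groups_y = (height + cover_y - 1) / cover_y;

    c.local[0] = wg_x;
    c.local[1] = wg_y;
    c.global[0] = groups_x * wg_x;
    c.global[1] = groups_y * wg_y;

    /* One block of wg_x x wg_y points plus the halo (assumption). */
    c.lmem_bytes = (wg_x + 2 * radius) * (wg_y + 2 * radius) * sizeof(float);
    return c;
}
```

With the kernel signature above, lmem is argument index 3, so the host would pass the size with clSetKernelArg(kernel, 3, cfg.lmem_bytes, NULL) (the value pointer must be NULL for a __local argument) and hand cfg.global and cfg.local to clEnqueueNDRangeKernel with work_dim set to 2. Because the global size is rounded up, the kernel still needs its own bounds checks at the right and bottom edges.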


Source: https://habr.com/ru/post/1674849/

