Is my OpenCL kernel really performing this badly?

My application takes 5200 ms to process a data set using OpenCL on the GPU and 330 ms for the same data using OpenCL on the CPU, whereas the same processing without OpenCL on the CPU, using multiple threads, takes 110 ms. The OpenCL time covers kernel execution only: the timer starts immediately before clEnqueueNDRangeKernel and stops immediately after clFinish. A Windows gadget tells me that I am using only 19% of the GPU's power. Even if I could push that to 100%, it would still take about 1000 ms, which is much slower than my CPU.


The work-group size is a multiple of CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, and I use all the compute units (6 for the GPU and 4 for the CPU). Here is my kernel:

__kernel void reduceURatios(__global myreal *coef, __global myreal *row, myreal ratio)
{
    size_t gid = get_global_id(0);

    myreal pCoef = coef[gid];
    myreal pRow = row[gid];

    pCoef = pCoef - (pRow * ratio);
    coef[gid] = pCoef;
}
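To check the kernel's output independently of OpenCL, a plain C loop with the same arithmetic can serve as a reference. This is only a sketch: the name reduceURatios_ref and the element count n are hypothetical, and myreal is assumed to be a typedef for float, which the question does not show.

```c
#include <stddef.h>

/* Assumption: myreal is float; the question does not show the typedef. */
typedef float myreal;

/* Hypothetical host-side reference for reduceURatios.
   The loop index i plays the role of get_global_id(0). */
void reduceURatios_ref(myreal *coef, const myreal *row, myreal ratio, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* Same update as the kernel: coef = coef - row * ratio */
        coef[i] = coef[i] - (row[i] * ratio);
    }
}
```

Running this over a copy of the input and comparing against the buffer read back from the device (with a small tolerance for floating point) would show whether the kernel is at least computing the right values.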

I get similarly poor performance with another kernel:

__kernel void calcURatios(__global myreal *ratios, __global myreal *rhs, myreal c, myreal r)
{
    size_t gid = get_global_id(0);

    myreal pRatios = ratios[gid];
    myreal pRHS = rhs[gid];

    pRatios = pRatios / c;
    ratios[gid] = pRatios;

    //pRatios = pRatios * r;
    pRHS = pRHS - (pRatios * r);
    rhs[gid] = pRHS;
}
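For a rough sense of how much arithmetic calcURatios does per byte of global memory it touches, the counts can be read straight off the kernel body. The sketch below is an illustration only: the function names are hypothetical, myreal is assumed to be float, and the count ignores caching and any compiler optimizations.

```c
#include <stddef.h>

/* Assumption: myreal is float; the question does not show the typedef. */
typedef float myreal;

/* Global-memory traffic per work-item in calcURatios:
   reads ratios[gid] and rhs[gid], then writes both back. */
size_t calcURatios_bytes_per_item(void)
{
    return 4 * sizeof(myreal);
}

/* Arithmetic per work-item: one divide, one multiply, one subtract. */
unsigned calcURatios_flops_per_item(void)
{
    return 3u;
}

/* Ratio of floating-point operations to bytes moved. */
double flops_per_byte(unsigned flops, size_t bytes)
{
    return (double)flops / (double)bytes;
}
```

With 4-byte elements this comes to 3 operations per 16 bytes. The same counting for reduceURatios gives 2 operations per 12 bytes (two reads, one write).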

Questions:

  • Why does my GPU perform so badly compared to the CPU under OpenCL?
  • Why is the CPU under OpenCL 3x slower than the CPU without OpenCL but with multiple threads?
1 answer

Perhaps you could add some information on how you enqueue this kernel; maybe the local work size is wrong? (If in doubt, just pass NULL as the local work size and OpenCL will choose an appropriate one.)
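If an explicit local work size is kept instead of NULL, OpenCL 1.x requires the global work size to be an exact multiple of it; otherwise clEnqueueNDRangeKernel fails with CL_INVALID_WORK_GROUP_SIZE. A minimal sketch of padding the global size, using a hypothetical helper name:

```c
#include <stddef.h>

/* Hypothetical helper: smallest multiple of `multiple` that is >= n.
   Returns n unchanged when multiple is 0. */
size_t round_up_to_multiple(size_t n, size_t multiple)
{
    if (multiple == 0) {
        return n;
    }
    size_t rem = n % multiple;
    return (rem == 0) ? n : n + (multiple - rem);
}
```

Note that the kernels in the question index their buffers by gid without a bounds check, so a padded launch would also need either padded buffers or an extra element-count argument with an `if (gid < n)` guard in the kernel.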


(See: fooobar.com/questions/72668/...).

Related: Unified Memory, HSA, AMD Kaveri, etc.



Source: https://habr.com/ru/post/1535779/

