My application takes 5200 ms to process a data set using OpenCL on the GPU and 330 ms for the same data using OpenCL on the CPU, whereas processing on the CPU without OpenCL, using multiple threads, takes 110 ms. The OpenCL timing covers only kernel execution: it starts immediately before clEnqueueNDRangeKernel and ends immediately after clFinish. A Windows gadget tells me that I am using only 19% of the GPU's capacity. Even if I could push that to 100%, it would still take about 1000 ms (5200 ms × 0.19), which is much slower than my CPU.

The workgroup size is a multiple of CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, and I use all the compute units (6 for the GPU and 4 for the CPU). Here is my kernel:
__kernel void reduceURatios(__global myreal *coef, __global myreal *row, myreal ratio)
{
    size_t gid = get_global_id(0);
    myreal pCoef = coef[gid];
    myreal pRow = row[gid];
    pCoef = pCoef - (pRow * ratio);
    coef[gid] = pCoef;
}
I get similarly poor performance with another kernel:
__kernel void calcURatios(__global myreal *ratios, __global myreal *rhs, myreal c, myreal r)
{
    size_t gid = get_global_id(0);
    myreal pRatios = ratios[gid];
    myreal pRHS = rhs[gid];
    pRatios = pRatios / c;
    ratios[gid] = pRatios;
    //pRatios = pRatios * r;
    pRHS = pRHS - (pRatios * r);
    rhs[gid] = pRHS;
}
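For comparison, the two kernels above amount to the following element-wise loops in plain C. This is only a minimal single-threaded sketch. It assumes that myreal is a typedef for double (its definition is not shown in the question). The function names, the n parameter, and the guard on c are hypothetical additions for illustration and are not taken from the application's multi-threaded code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the real definition of myreal is not shown; double is a guess. */
typedef double myreal;

/* Same update as reduceURatios: coef[i] -= row[i] * ratio, for n elements. */
void reduce_u_ratios(myreal *coef, const myreal *row, myreal ratio, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        coef[i] = coef[i] - (row[i] * ratio);
    }
}

/* Same update as calcURatios: scale ratios[i] by 1/c, then
   subtract the scaled value times r from rhs[i]. */
void calc_u_ratios(myreal *ratios, myreal *rhs, myreal c, myreal r, size_t n)
{
    assert(c != 0); /* added guard, not present in the kernel */
    for (size_t i = 0; i < n; ++i) {
        myreal p = ratios[i] / c;
        ratios[i] = p;
        rhs[i] = rhs[i] - (p * r);
    }
}
```

Each element needs two or three memory accesses and only one or two arithmetic operations, so both loops do very little computation per byte moved.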
Questions:
- Why does my GPU perform so poorly compared to the CPU under OpenCL?
- Why is the CPU under OpenCL 3x slower than the same CPU without OpenCL but with multiple threads?