I am working on a C++ project using OpenCL. I use the CPU as the OpenCL device, with the Intel OpenCL runtime.
I noticed a strange side effect when calling OpenCL functions. Here is a simple test:
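    #include <iostream>
    #include <cstdio>
    #include <vector>
    #include <CL/cl.hpp>

    int main(int argc, char* argv[])
    {
        /*
        // OpenCL region: create a context and a queue on the CPU device
        cl_int status;
        std::vector<cl::Platform> platforms;
        cl::Platform::get(&platforms);
        std::vector<cl::Device> devices;
        platforms[1].getDevices(CL_DEVICE_TYPE_CPU, &devices);
        cl::Context context(devices);
        cl::CommandQueue queue = cl::CommandQueue(context, devices[0]);
        status = queue.finish();
        printf("Status: %d\n", status);
        */

        // Character-reading region: sum every byte of the input file
        if (argc < 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        int ch;
        long b = 0;
        long long sum = 0;   // long long: the sum of a 100 MB file overflows int
        FILE* f1 = fopen(argv[1], "r");
        if (f1 == NULL) {
            perror("fopen");
            return 1;
        }
        while ((ch = fgetc(f1)) != EOF) {
            sum += ch;
            b++;
            if (b % 1000000 == 0)
                printf("Char %ld read\n", b);
        }
        printf("Sum: %lld\n", sum);
        fclose(f1);
        return 0;
    }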
This is a simple loop that reads the file character by character and adds up their values, so that the compiler cannot optimize it away.
My system is a Core i7-4770K with a 2 TB HDD and 16 GB of DDR3, running Ubuntu 14.10. With a 100 MB input file, the program above takes about 770 ms, which matches the speed of my hard drive. So far, so good.
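To confirm that the extra time is spent inside the read loop itself, rather than in process start-up or OpenCL initialization, the loop can be timed from within the program. The following is a minimal sketch, assuming a C++11 compiler; `ReadStats` and `time_fgetc_loop` are hypothetical names, not part of the original program.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Result of one pass over a file: byte count, byte sum, and loop time.
struct ReadStats {
    long bytes = 0;
    long long sum = 0;
    double millis = 0.0;
};

// Reads `f` one character at a time with fgetc, like the loop in the
// question, and measures only the loop (opening the file is excluded).
ReadStats time_fgetc_loop(std::FILE* f) {
    assert(f != nullptr && "caller must pass an open file");
    ReadStats stats;
    auto start = std::chrono::steady_clock::now();
    int ch;
    while ((ch = std::fgetc(f)) != EOF) {
        stats.sum += ch;
        ++stats.bytes;
    }
    auto stop = std::chrono::steady_clock::now();
    stats.millis = std::chrono::duration<double, std::milli>(stop - start).count();
    return stats;
}
```

Calling it right after `fopen` in both configurations (OpenCL section commented out, then uncommented) separates the loop's own cost from the roughly 200 ms of OpenCL setup.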
If I now swap the comments and run only the OpenCL section, it takes about 200 ms. Again, so far, so good.
But if I uncomment everything, the program takes more than 2000 ms. I would expect about 770 ms + 200 ms = 970 ms, not over 2000 ms. The progress messages printed inside the read loop also visibly arrive more slowly. The two regions (the OpenCL calls and the character-reading loop) should be independent.
I do not understand why using OpenCL slows down a simple C++ read loop. This is not just OpenCL initialization overhead: on its own, the OpenCL section takes only about 200 ms.
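One experiment that could narrow this down: POSIX requires `fgetc` to be thread-safe, so it behaves as if it locked the `FILE` stream on every call, whereas `getc_unlocked` does not lock. If the slowdown were related to that per-character locking (a hypothesis, not something the timings above establish), locking the stream once with `flockfile` and reading with `getc_unlocked` should bring the runs with and without OpenCL closer together. A sketch, with `sum_chars_unlocked` as a hypothetical helper name:

```cpp
#include <cassert>
#include <stdio.h>  // FILE, flockfile, getc_unlocked, funlockfile (POSIX)

// Hypothetical variant of the question's loop: take the stream lock once,
// then read with getc_unlocked, which does no locking of its own.
long long sum_chars_unlocked(FILE* f, long* count) {
    assert(f != nullptr && "caller must pass an open file");
    long long sum = 0;
    long n = 0;
    int ch;
    flockfile(f);
    while ((ch = getc_unlocked(f)) != EOF) {
        sum += ch;
        ++n;
    }
    funlockfile(f);
    if (count != nullptr)
        *count = n;
    return sum;
}
```

Replacing the `while` loop in the test program with a call to this function and timing both configurations again would show whether per-call stream locking plays a part; if the gap remains, it can be ruled out.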
I am compiling this example with
g++ weird.cpp -O2 -lOpenCL -o weird
I also tried clang++, and the same thing happens.