Weird OpenCL calls side effect on C++ for loop performance

I am working on a C++ project that uses OpenCL. I use the CPU as the OpenCL device, with the Intel OpenCL runtime.

I noticed a strange side effect when calling OpenCL functions. Here is a simple test:

    #include <iostream>
    #include <cstdio>
    #include <vector>
    #include <CL/cl.hpp>

    int main(int argc, char* argv[])
    {
        /*
        cl_int status;
        std::vector<cl::Platform> platforms;
        cl::Platform::get(&platforms);
        std::vector<cl::Device> devices;
        platforms[1].getDevices(CL_DEVICE_TYPE_CPU, &devices);
        cl::Context context(devices);
        cl::CommandQueue queue = cl::CommandQueue(context, devices[0]);
        status = queue.finish();
        printf("Status: %d\n", status);
        */

        int ch;
        int b = 0;
        int sum = 0;
        FILE* f1;
        f1 = fopen(argv[1], "r");
        while((ch = fgetc(f1)) != EOF)
        {
            sum += ch;
            b++;
            if(b % 1000000 == 0)
                printf("Char %d read\n", b);
        }
        printf("Sum: %d\n", sum);
    }

This is a simple loop that reads a file character by character and sums the values, so that the compiler does not optimize the loop away.

My system is a Core i7-4770K with a 2 TB HDD and 16 GB of DDR3, running Ubuntu 14.10. With a 100 MB file as input, the program above takes about 770 ms, which matches the speed of my hard drive. So far so good.

If you now invert the comments and run only the OpenCL calls region, it takes about 200 ms. Again, so far so good.

But if you uncomment everything, the program takes more than 2000 ms. I would expect 770 ms + 200 ms, but it is 2000 ms. You can even notice an increased delay between the output messages in the loop. The two regions (the OpenCL calls and the character reading) are supposed to be independent.
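For anyone who wants to reproduce the per-region numbers, here is a minimal sketch of a timing helper. The name `elapsed_ms` is hypothetical (it is not part of the original test), and it assumes a C++11 compiler. It runs a callable once and returns the wall-clock time in milliseconds.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `region` once and returns the elapsed
// wall-clock time in milliseconds, measured with a monotonic clock.
double elapsed_ms(const std::function<void()>& region)
{
    assert(region && "region must be a callable target");
    const auto start = std::chrono::steady_clock::now();
    region();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping the OpenCL calls in one lambda and the read loop in another, then printing both results, would show which of the two regions absorbs the extra time.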

I do not understand why using OpenCL hurts the performance of a simple C++ loop. It is not just an OpenCL initialization delay.

I am compiling this example with

 g++ weird.cpp -O2 -lOpenCL -o weird 

I also tried clang++, but the same thing happens.

+6
2 answers

That was an interesting one. It happens because getc becomes the thread-safe version once the queue is created, so the extra time is spent acquiring and releasing the stream lock on every call. I'm not sure why or how this happens, but it is also the decisive factor with the AMD OpenCL SDK on Intel processors. I was quite surprised to get basically the same timings as the OP.

https://software.intel.com/en-us/forums/topic/337984

You can work around this specific problem by simply changing getc (fgetc in the question's code) to getc_unlocked.

That brought the run time back down to 930 ms for me. The increase over 750 ms is mostly spent in the platform and context creation lines.
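The getc_unlocked change could look roughly like the sketch below. It assumes a POSIX system, where `getc_unlocked`, `flockfile` and `funlockfile` are declared in `<stdio.h>`. The function names `sum_stream_unlocked` and `sum_file_unlocked` are hypothetical and only serve the illustration. The stream lock is taken once around the whole loop, which keeps the unlocked reads safe even if other threads exist.

```cpp
#include <cassert>
#include <stdio.h>   // getc_unlocked, flockfile, funlockfile (POSIX)

// Hypothetical helper: sums every byte of an already opened stream.
// The lock is acquired once for the whole loop instead of once per
// character, so getc_unlocked can skip the per-call locking.
long sum_stream_unlocked(FILE* f)
{
    assert(f != nullptr);
    long sum = 0;
    int ch;
    flockfile(f);
    while ((ch = getc_unlocked(f)) != EOF)
    {
        sum += ch;
    }
    funlockfile(f);
    return sum;
}

// Hypothetical helper: opens `path`, sums its bytes, and returns -1
// if the file cannot be opened.
long sum_file_unlocked(const char* path)
{
    FILE* f = fopen(path, "rb");
    if (f == nullptr)
    {
        return -1;
    }
    const long sum = sum_stream_unlocked(f);
    fclose(f);
    return sum;
}
```

In the question's program this would replace the `fgetc` loop, for example `printf("Sum: %ld\n", sum_file_unlocked(argv[1]));`. If only one thread ever touches the stream, the explicit `flockfile`/`funlockfile` pair is optional.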

+3

I believe the effect occurs because the OpenCL objects are still in scope and therefore are not destroyed before the loop. They may affect other computation because of the housekeeping they require. For example, running the example as you gave it yields the following timings on my system (g++ 4.2.1 with -O2 on Mac OS X):

    CL:   0.012s
    Loop: 14.447s
    Both: 14.874s

But enclosing the OpenCL code in its own anonymous scope, so that the destructors are called automatically before the loop, seems to get rid of the problem. Using the code:

    #include <iostream>
    #include <cstdio>
    #include <vector>
    #include "cl.hpp"

    int main(int argc, char* argv[])
    {
        {
            cl_int status;
            std::vector<cl::Platform> platforms;
            cl::Platform::get(&platforms);
            std::vector<cl::Device> devices;
            platforms[1].getDevices(CL_DEVICE_TYPE_CPU, &devices);
            cl::Context context(devices);
            cl::CommandQueue queue = cl::CommandQueue(context, devices[0]);
            status = queue.finish();
            printf("Status: %d\n", status);
        }

        int ch;
        int b = 0;
        int sum = 0;
        FILE* f1;
        f1 = fopen(argv[1], "r");
        while((ch = fgetc(f1)) != EOF)
        {
            sum += ch;
            b++;
            if(b % 1000000 == 0)
                printf("Char %d read\n", b);
        }
        printf("Sum: %d\n", sum);
    }

I get timings:

    CL:   0.012s
    Loop: 14.635s
    Both: 14.648s

Which seems additive. The effect is quite small compared to other effects on the system, such as CPU load from other processes, but it does seem to disappear when the anonymous scope is added. I will do some profiling and add it as an edit if it turns up anything interesting.
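The scoping idea can be seen in isolation with a small stand-in class. This is only a sketch: `ScopedResource` and `run_scoped` are hypothetical names, and the class merely records when it is constructed and destroyed. It does not model what the OpenCL runtime does internally.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an object such as cl::Context: it only
// records its construction and destruction in a shared log.
struct ScopedResource
{
    std::vector<std::string>& log;

    explicit ScopedResource(std::vector<std::string>& l) : log(l)
    {
        log.push_back("acquire");
    }

    ~ScopedResource()
    {
        log.push_back("release");
    }
};

// Returns the order of events when the resource lives in its own block.
std::vector<std::string> run_scoped()
{
    std::vector<std::string> log;
    {
        ScopedResource r(log);
        log.push_back("setup");
    }   // r goes out of scope here, so its destructor has already run
    assert(log.back() == "release");
    log.push_back("loop");
    return log;
}
```

The returned log reads acquire, setup, release, loop: the destructor runs at the closing brace of the inner block, before the loop starts. Without the inner braces the release would only happen when the enclosing function returns.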

+1

Source: https://habr.com/ru/post/986298/

