Speeding up OpenCV C++ code with multithreading

Here is a little context for the code that follows.

Mat img0; // 1280x960 grayscale 


    timer.start();
    for (int i = 0; i < img0.rows; i++)
    {
        vector<double> v;
        uchar* p = img0.ptr<uchar>(i);
        for (int j = 0; j < img0.cols; ++j)
        {
            v.push_back(p[j]);
        }
    }
    cout << "Single thread " << timer.end() << endl;

and

    timer.start();
    concurrency::parallel_for(0, img0.rows, [&img0](int i)
    {
        vector<double> v;
        uchar* p = img0.ptr<uchar>(i);
        for (int j = 0; j < img0.cols; ++j)
        {
            v.push_back(p[j]);
        }
    });
    cout << "Multi thread " << timer.end() << endl;

Result:

    Single thread 0.0458856
    Multi thread 0.0329856

The speedup is hardly noticeable.

My processor is an Intel i5 at 3.10 GHz, with 8 GB of DDR3 RAM.

EDIT

I tried a slightly different approach.

    // `split` is my custom function that, in this case, splits `img0`
    // into two images: its left and right halves.
    vector<Mat> imgs = split(img0, 2, 1);


    timer.start();
    concurrency::parallel_for(0, (int)imgs.size(), [imgs](int i)
    {
        Mat img = imgs[i];
        vector<double> v;
        for (int row = 0; row < img.rows; row++)
        {
            uchar* p = img.ptr<uchar>(row);
            for (int col = 0; col < img.cols; ++col)
            {
                v.push_back(p[col]);
            }
        }
    });
    cout << " Multi thread Sectored " << timer.end() << endl;

And I get a much better result:

 Multi thread Sectored 0.0232881 

So it looks like I was creating 960 threads or something when I ran

 parallel_for(0, img0.rows, ... 

and that didn't work well.
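One way to make the chunking explicit, instead of relying on how `parallel_for` divides 960 row-sized tasks, is to hand each worker one contiguous band of rows. The sketch below is an illustration under assumptions, not the code from the question: `GrayImage` and `convertInBands` are hypothetical names, a plain byte buffer stands in for `cv::Mat`, and `std::thread` replaces `concurrency::parallel_for` so the example is portable. It also reserves each output vector once, so `v` never has to reallocate while it grows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a grayscale cv::Mat: row-major bytes.
struct GrayImage {
    int rows;
    int cols;
    std::vector<unsigned char> data;

    const unsigned char* ptr(int row) const {
        return data.data() + static_cast<std::size_t>(row) * cols;
    }
};

// Copies the pixels into vectors of double, one contiguous band of rows
// per worker thread. Returns one vector per worker (possibly empty when
// there are more workers than rows).
std::vector<std::vector<double>> convertInBands(const GrayImage& img, unsigned workers) {
    assert(img.data.size() == static_cast<std::size_t>(img.rows) * img.cols);
    workers = std::max(1u, workers);  // hardware_concurrency() may report 0

    const int n = static_cast<int>(workers);
    const int rowsPerBand = (img.rows + n - 1) / n;  // ceiling division

    std::vector<std::vector<double>> bands(workers);
    std::vector<std::thread> pool;
    pool.reserve(workers);

    for (int w = 0; w < n; ++w) {
        const int first = std::min(img.rows, w * rowsPerBand);
        const int last = std::min(img.rows, first + rowsPerBand);
        pool.emplace_back([&img, &bands, w, first, last] {
            std::vector<double>& v = bands[w];
            // One allocation per band instead of repeated growth in push_back.
            v.reserve(static_cast<std::size_t>(last - first) * img.cols);
            for (int row = first; row < last; ++row) {
                const unsigned char* p = img.ptr(row);
                v.insert(v.end(), p, p + img.cols);
            }
        });
    }
    for (std::thread& t : pool) {
        t.join();
    }
    return bands;
}
```

Calling `convertInBands(img, std::thread::hardware_concurrency())` gives one band per hardware thread. Whether this beats the two-half split depends on the machine and should be measured.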

(I must add that Kenny's comment is correct. Do not attach too much importance to the specific numbers given here. When measuring intervals as small as these, there are big variations. But in general, what I wrote in the edit, splitting the image in half, improved performance over the old approach.)
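Since single runs of a few tens of milliseconds vary a lot, one option is to repeat each variant and compare medians instead of single readings. This is a minimal sketch, assuming a hypothetical helper named `medianSeconds`; the `timer` object from the question is not shown, so `std::chrono::steady_clock` is used here instead.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` `repeats` times and returns the median wall-clock time in
// seconds (the upper middle sample when `repeats` is even).
template <typename Work>
double medianSeconds(Work&& work, int repeats) {
    assert(repeats > 0);
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(repeats));

    for (int i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(stop - start).count());
    }

    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

Each loop from the question can be wrapped in a lambda and passed in, for example `medianSeconds([&] { /* single-thread loop */ }, 25)`, so the single-thread and multi-thread variants are compared on the same footing.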

1 answer

I think your problem is that you are limited by memory bandwidth. Your second snippet is basically reading the whole image, and that data has to come out of main memory into cache (or out of L2 cache into L1 cache).

You need to arrange your code so that all four cores are working on the same bit of memory at once. (I presume you are not actually trying to optimize this code, and that it is just a simple example.)

Edit: Inserted the crucial "not" into the last parenthetical remark.


Source: https://habr.com/ru/post/1237982/

