Speeding up OpenCV C++ multithreading

Question

Here is a little context for the code that follows.

Mat img0; // 1280x960 grayscale

    timer.start();
    for (int i = 0; i < img0.rows; i++) {
        vector<double> v;
        uchar* p = img0.ptr<uchar>(i);
        for (int j = 0; j < img0.cols; ++j) {
            v.push_back(p[j]);
        }
    }
    cout << "Single thread " << timer.end() << endl;

and

    timer.start();
    concurrency::parallel_for(0, img0.rows, [&img0](int i) {
        vector<double> v;
        uchar* p = img0.ptr<uchar>(i);
        for (int j = 0; j < img0.cols; ++j) {
            v.push_back(p[j]);
        }
    });
    cout << "Multi thread " << timer.end() << endl;

Result:

    Single thread 0.0458856
    Multi thread 0.0329856

The speedup is hardly noticeable.

My processor is an Intel i5 at 3.10 GHz.

RAM: 8 GB DDR3

EDIT

I tried a slightly different approach.

    // `split` is my custom function that, in this case, splits `img0`
    // into two images: its left and right halves
    vector<Mat> imgs = split(img0, 2, 1);

    timer.start();
    concurrency::parallel_for(0, (int)imgs.size(), [imgs](int i) {
        Mat img = imgs[i];
        vector<double> v;
        for (int row = 0; row < img.rows; row++) {
            uchar* p = img.ptr<uchar>(row);
            for (int col = 0; col < img.cols; ++col) {
                v.push_back(p[col]);
            }
        }
    });
    cout << " Multi thread Sectored " << timer.end() << endl;

And I get a much better result:

    Multi thread Sectored 0.0232881

So it looks like I was creating 960 threads (or something like that) when I ran

    parallel_for(0, img0.rows, ...

and that did not work well.
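One way to check whether per-row scheduling is really the cost is to hand each worker a single contiguous band of rows. The following is only a minimal sketch under assumptions: a plain row-major `std::vector<unsigned char>` stands in for the `Mat`, `std::thread` is swapped in for `concurrency::parallel_for` so that it compiles outside MSVC, and `copyBand` and `copyInBands` are hypothetical names that do not appear in the original code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: copies rows [rowBegin, rowEnd) of a row-major
// 8-bit image into a vector of doubles.
std::vector<double> copyBand(const std::vector<unsigned char>& pixels,
                             std::size_t cols,
                             std::size_t rowBegin,
                             std::size_t rowEnd) {
    std::vector<double> v;
    v.reserve((rowEnd - rowBegin) * cols);  // one allocation per band
    for (std::size_t r = rowBegin; r < rowEnd; ++r) {
        const unsigned char* p = pixels.data() + r * cols;
        for (std::size_t c = 0; c < cols; ++c) {
            v.push_back(p[c]);
        }
    }
    return v;
}

// Hypothetical helper: splits the image into `bands` horizontal bands
// and processes each band on its own thread.
std::vector<std::vector<double>> copyInBands(
        const std::vector<unsigned char>& pixels,
        std::size_t rows, std::size_t cols, unsigned bands) {
    assert(pixels.size() == rows * cols);
    bands = std::max(1u, bands);
    std::vector<std::vector<double>> results(bands);
    std::vector<std::thread> workers;
    workers.reserve(bands);
    for (unsigned b = 0; b < bands; ++b) {
        const std::size_t begin = rows * b / bands;
        const std::size_t end = rows * (b + 1) / bands;
        // Each worker writes only to its own slot in `results`.
        workers.emplace_back([&pixels, &results, cols, begin, end, b] {
            results[b] = copyBand(pixels, cols, begin, end);
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return results;
}
```

Calling `copyInBands(pixels, rows, cols, std::thread::hardware_concurrency())` would give one band per hardware thread; whether that is faster than the per-row version still has to be measured.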

(I must add that Kenny's comment is correct. Do not attach too much importance to the specific numbers shown here. When measuring intervals as small as these, there is a lot of variation. But in general, what I wrote in the edit, splitting the image in half, improved performance compared to the old approach.)
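Because single measurements of such short intervals vary a lot, repeating the run and reporting the median gives a steadier number. A minimal sketch, assuming `std::chrono::steady_clock` in place of the `timer` object used above (whose implementation is not shown); `medianSeconds` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: runs `work` `repeats` times and returns the
// median wall-clock duration of a single run, in seconds.
template <typename F>
double medianSeconds(F&& work, int repeats) {
    assert(repeats > 0);
    std::vector<double> samples;
    samples.reserve(repeats);
    for (int i = 0; i < repeats; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(t1 - t0).count());
    }
    // Partially sort so that the middle element is the median.
    const auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

Each variant (single thread, per-row, split in halves) could be wrapped in a lambda and passed to `medianSeconds` with, say, 20 repeats. The work should have an observable side effect so that the optimizer cannot discard it.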

c++ multithreading opencv

ancajic Dec 12 '15 at 15:31

1 answer

Martin Bonner · Answer 1 · 2015-12-12T15:39:51+0000

I think your problem is that you are limited by memory bandwidth. The second snippet essentially just reads the whole image, and that data has to come out of main memory into cache (or out of L2 cache into L1 cache).

You need to arrange your code so that all four cores work on the same bit of memory at once (I presume you are not actually trying to optimize this code; it is just a simple example).

Edit: Inserted the crucial "not" in the last parenthetical remark.
