I have parallelized existing code for computer vision applications using OpenMP. I believe the parallelization is sound, because:
- Workload is balanced.
- There is no synchronization / lock mechanism.
- I parallelized the outermost loops.
- All cores are used most of the time (no idle cores)
- There is enough work for every thread.
Now the application does not scale when using many cores; for example, it does not scale well beyond 15 cores.
The code uses external libraries (for example, OpenCV and IPP) that are already optimized and vectorized, and I manually vectorized some parts of my own code as best I could. However, according to Intel Advisor, the code is still not well vectorized, and there is nothing more I can do about it: I have already vectorized the code where possible, and I cannot improve the external libraries.
So my question is: is it possible that vectorization is the reason that the code does not scale well at some point? If so, why?