Why is my OpenMP implementation slower than a single-threaded implementation?

I am learning OpenMP and tried applying it to some existing code of mine. In this code, I tried to parallelize every for loop. However, this makes the program MUCH slower: at least 10 times slower than the single-threaded version, or even more.

Here is the code: http://pastebin.com/zyLzuWU2

I also tried pthreads, and that version turned out to be faster than the single-threaded one.

Now the question is: what am I doing wrong in my OpenMP implementation that causes this slowdown?

Thanks!

Edit: the single-threaded version is the same code without any of the #pragma directives.

+4
3 answers

One of the problems I see with your code is that you are applying OpenMP to loops that are very small (8 or 64 iterations). This will not be efficient, because the overhead of distributing the work across threads outweighs the work itself. If you want to use OpenMP for the n-queens problem, look at OpenMP 3.0 tasks and at thread parallelism for branch-and-bound problems.
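To illustrate the task-based approach, here is a minimal sketch that counts n-queens solutions by creating one OpenMP task per first-row placement and searching each subtree serially. It is not the asker's code: the names `is_safe`, `solve_serial`, `count_queens`, and `MAX_N` are all hypothetical, and a real implementation would likely tune how deep in the search tree tasks are created.

```c
#define MAX_N 32  /* assumed upper bound on the board size for this sketch */

/* Returns 1 if a queen at (row, col) is not attacked by the queens
   already placed in rows 0..row-1 (cols[r] is the column used in row r). */
static int is_safe(const int *cols, int row, int col)
{
    for (int r = 0; r < row; r++) {
        int d = cols[r] - col;
        if (d == 0 || d == row - r || d == r - row)
            return 0;
    }
    return 1;
}

/* Plain serial backtracking: counts completions of a partial placement. */
static long solve_serial(int *cols, int row, int n)
{
    if (row == n)
        return 1;
    long total = 0;
    for (int c = 0; c < n; c++) {
        if (is_safe(cols, row, c)) {
            cols[row] = c;
            total += solve_serial(cols, row + 1, n);
        }
    }
    return total;
}

/* One task per first-row column; each task does a large chunk of work,
   so the task-creation overhead is paid only n times. */
long count_queens(int n)
{
    long total = 0;
    if (n < 1 || n > MAX_N)
        return 0;

    #pragma omp parallel
    #pragma omp single
    {
        for (int c = 0; c < n; c++) {
            #pragma omp task firstprivate(c) shared(total)
            {
                int cols[MAX_N];   /* private board for this task */
                cols[0] = c;
                long sub = solve_serial(cols, 1, n);
                #pragma omp atomic
                total += sub;      /* one cheap update per task */
            }
        }
    }   /* all tasks have finished once the region's barrier is passed */
    return total;
}
```

Calling `count_queens(8)` should give 92, the known number of solutions for the 8x8 board. The point of the design is that parallelism is introduced once, at the top of the search tree, instead of inside the tiny inner loops.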

+4

I think your code is too complex to review in full. One thing I noticed immediately is that it is not even correct. Wherever you use omp parallel for to compute a sum, you should add reduction(+: yourcountervariable) so that the partial results of the different threads are combined correctly. Otherwise, one thread may overwrite the result of another.
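As a minimal sketch of the reduction clause (the function `count_even` and its arguments are made up for illustration and are not taken from the pastebin code):

```c
/* Hypothetical helper: counts the even values in data[0..n-1]. */
long count_even(const int *data, long n)
{
    long count = 0;

    /* reduction(+:count) gives every thread its own private copy of
       count, initialised to 0, and adds the copies into the original
       variable when the loop finishes. Without the clause, the
       unsynchronised count++ would be a data race. */
    #pragma omp parallel for reduction(+:count)
    for (long i = 0; i < n; i++) {
        if (data[i] % 2 == 0)
            count++;
    }
    return count;
}
```

The function returns the same result whether or not the compiler has OpenMP enabled, since the pragma is simply ignored in a serial build.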

+3

At least two reasons:

  • You only do 8 iterations of a very simple loop. Your run time will be completely dominated by the overhead of setting up all the threads.

  • In some places, the critical section causes contention; all threads continuously try to enter the critical section and block each other.
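One common way to reduce the contention described in the second point is to let each thread accumulate into a private buffer and enter the critical section only once, at the end. The sketch below assumes a simple histogram workload; `histogram` and `NUM_BINS` are hypothetical names and do not come from the code in the question.

```c
#define NUM_BINS 8  /* assumed number of bins for this sketch */

/* Hypothetical helper: bins[b] receives the number of values in
   data[0..n-1] whose remainder modulo NUM_BINS is b. */
void histogram(const unsigned *data, long n, long bins[NUM_BINS])
{
    for (int b = 0; b < NUM_BINS; b++)
        bins[b] = 0;

    #pragma omp parallel
    {
        long local[NUM_BINS] = {0};   /* private to each thread */

        /* No synchronisation inside the hot loop. */
        #pragma omp for nowait
        for (long i = 0; i < n; i++)
            local[data[i] % NUM_BINS]++;

        /* Each thread enters the critical section exactly once,
           instead of once per element. */
        #pragma omp critical
        {
            for (int b = 0; b < NUM_BINS; b++)
                bins[b] += local[b];
        }
    }
}
```

With this layout the number of critical-section entries equals the number of threads rather than the number of loop iterations, so the threads rarely block each other.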

+2

Source: https://habr.com/ru/post/1340321/

