Parallel inner loop using openmp

Question


I have three nested loops, but only the innermost one is parallelizable. The stopping conditions of the outer and middle loops depend on the calculations performed by the innermost loop, so I cannot change the loop order.

I put an OpenMP pragma directive just before the innermost loop, but performance with two threads is worse than with one. I think this is because threads are created at every iteration of the outer loops.

Is there a way to create threads outside of outer loops, but just use them in the innermost loop?
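One way to get this structure is to open a single parallel region around the outer loops, let every thread execute the sequential outer loop, and work-share only the inner loop with omp for, using omp single for the sequential bookkeeping. The sketch below is hypothetical: shrink_until and its halving "inner loop" stand in for the real computation, and the max reduction assumes an OpenMP 3.1 or newer compiler (without OpenMP the pragmas are ignored and it runs serially). It saves re-entering a parallel region on every outer iteration, but the barriers after each omp for and omp single remain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's loops: halve every element of x
// until the largest element drops below tol. The outer loop's stopping test
// depends on the result of the parallel inner loop.
// Returns the number of outer iterations performed.
int shrink_until(std::vector<double>& x, double tol) {
    assert(tol > 0.0);  // otherwise the loop might never terminate
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    int iterations = 0;   // shared, modified only inside omp single
    double max_val = 0.0; // shared reduction target
    bool done = false;    // shared stopping flag

    // The team is started once for the whole computation.
    #pragma omp parallel
    {
        while (!done) {  // every thread runs the sequential outer loop
            #pragma omp single
            max_val = 0.0;  // implicit barrier at the end of single

            // Only the inner loop is split among the threads.
            #pragma omp for reduction(max : max_val)
            for (std::ptrdiff_t i = 0; i < n; ++i) {
                x[i] *= 0.5;
                if (x[i] > max_val) max_val = x[i];
            }
            // Implicit barrier: max_val is final and visible to all threads.

            #pragma omp single
            {
                ++iterations;
                done = (max_val < tol);
            }  // Implicit barrier: every thread sees the updated flag.
        }
    }
    return iterations;
}
```

For example, with v = {8.0, 4.0, 1.0} and tol = 1.0 the vector is halved four times before its largest element (0.5) falls below the tolerance.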

Thank you in advance


c++ loops parallel-processing openmp

Hernan Feb 05 '11 at 13:04


2 answers

OpenMP should use a thread pool, so threads are not re-created every time your loop executes. Strictly speaking, though, this may depend on the OpenMP implementation you use (I know the GNU compiler uses a pool). I suggest looking for other common problems, such as false sharing.
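False sharing happens when threads write to different variables that happen to sit on the same cache line, so the line bounces between cores. The hypothetical example below counts even numbers with per-thread counters padded to an assumed 64-byte cache line, and contrasts it with a reduction clause, which is usually the simpler fix. It assumes C++17, so that std::vector respects the over-aligned element type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Assumed cache-line size: 64 bytes is common on x86-64 but not guaranteed.
constexpr std::size_t kCacheLine = 64;

// A per-thread counter aligned (and therefore padded) to a full cache line.
struct alignas(kCacheLine) PaddedCount {
    long value = 0;
};

// Count even elements with one counter per thread. With a plain
// std::vector<long>, adjacent counters would share a cache line and every
// increment would invalidate that line on the other cores (false sharing).
long count_even_padded(const std::vector<int>& data) {
    int nthreads = 1;
#ifdef _OPENMP
    nthreads = omp_get_max_threads();
#endif
    std::vector<PaddedCount> counts(static_cast<std::size_t>(nthreads));
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(data.size());

    #pragma omp parallel
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        assert(tid < nthreads);
        #pragma omp for
        for (std::ptrdiff_t i = 0; i < n; ++i)
            if (data[i] % 2 == 0) ++counts[tid].value;
    }

    long total = 0;
    for (const PaddedCount& c : counts) total += c.value;
    return total;
}

// Usually simpler: reduction gives each thread a private copy and
// combines the copies once, at the end of the loop.
long count_even_reduction(const std::vector<int>& data) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(data.size());
    long total = 0;
    #pragma omp parallel for reduction(+ : total)
    for (std::ptrdiff_t i = 0; i < n; ++i)
        if (data[i] % 2 == 0) ++total;
    return total;
}
```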


ltjax Feb 05 '11 at 13:09


minjang · Accepted Answer · 2011-02-05T16:23:29+0000

Unfortunately, current multicore computer systems are not well suited to such fine-grained inner-loop parallelism. The problem is not thread creation or forking. As ltjax noted, almost all OpenMP implementations use thread pools: they create a number of threads up front and park them between parallel regions. So there is practically no thread-creation overhead.

However, such parallel inner loops have the following problems:

Dispatching jobs to threads: even though we don't need to physically create threads, we must at least assign jobs (i.e., create logical tasks) to the threads, and that mostly requires synchronization.
Joining threads: after all threads in a team finish their work, they have to be joined (unless the nowait clause is used). This is typically implemented as a barrier, which is also very synchronization-intensive.
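When two worksharing loops inside one parallel region touch independent data, the nowait clause removes the implicit barrier after the first loop, so a thread that finishes its share early can move straight on. A minimal sketch with a hypothetical scale_and_shift function; dropping the barrier is only safe here because the two loops share no data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scale a and shift b inside one parallel region.
// The loops write different vectors, so nowait lets a thread start on the
// second loop without waiting for the others to finish the first one.
// The end of the parallel region still has an implicit barrier, so both
// vectors are fully updated when the function returns.
void scale_and_shift(std::vector<double>& a, std::vector<double>& b,
                     double factor, double offset) {
    assert(&a != &b);  // nowait relies on the loops being independent
    const std::ptrdiff_t na = static_cast<std::ptrdiff_t>(a.size());
    const std::ptrdiff_t nb = static_cast<std::ptrdiff_t>(b.size());

    #pragma omp parallel
    {
        #pragma omp for nowait
        for (std::ptrdiff_t i = 0; i < na; ++i) a[i] *= factor;

        #pragma omp for
        for (std::ptrdiff_t i = 0; i < nb; ++i) b[i] += offset;
    }
}
```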

Therefore, you should minimize the number of times work is dispatched to and joined from threads. You can reduce this overhead by increasing the amount of work the inner loop does per invocation, which may take some code changes such as loop interchange.
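As an illustration of loop interchange, here is a hypothetical case where it is legal (in the question it is not, since the outer stopping conditions depend on the inner loop). Each element of a row-major matrix is replaced by the running sum of its column: the row loop carries a dependency, but the columns are independent, so swapping the loops makes the parallel loop outermost and the team is dispatched and joined once instead of once per row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix in a flat vector: element (r, c) is at r * cols + c.
// Row r depends on row r - 1; different columns are independent.

// Fine-grained: the parallel loop is the inner one, so a team is
// dispatched and joined rows - 1 times.
void column_prefix_inner(std::vector<double>& m,
                         std::ptrdiff_t rows, std::ptrdiff_t cols) {
    assert(static_cast<std::ptrdiff_t>(m.size()) == rows * cols);
    for (std::ptrdiff_t r = 1; r < rows; ++r) {
        #pragma omp parallel for
        for (std::ptrdiff_t c = 0; c < cols; ++c)
            m[r * cols + c] += m[(r - 1) * cols + c];
    }
}

// Interchanged: the independent column loop is outermost, so the team is
// dispatched and joined once and each thread walks whole columns.
void column_prefix_interchanged(std::vector<double>& m,
                                std::ptrdiff_t rows, std::ptrdiff_t cols) {
    assert(static_cast<std::ptrdiff_t>(m.size()) == rows * cols);
    #pragma omp parallel for
    for (std::ptrdiff_t c = 0; c < cols; ++c)
        for (std::ptrdiff_t r = 1; r < rows; ++r)
            m[r * cols + c] += m[(r - 1) * cols + c];
}
```

The trade-off is that the interchanged version walks memory with a stride of cols, which can hurt cache behaviour; which version wins depends on the matrix shape and is worth measuring.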

