How to nest parallel loops in a sequential loop using OpenMP

I am currently working on matrix computations using OpenMP. I have several loops in my code, and instead of calling `#pragma omp parallel for [...]` for each loop (which creates all the threads and destroys them right afterwards), I would like to create them all at the beginning and release them at the end of the program, to avoid that overhead. I need something like:

```c
#pragma omp parallel
{
    #pragma omp for [...]
    for (...) { ... }

    #pragma omp for [...]
    for (...) { ... }
}
```

The problem is that some parts must be executed by only one thread, while the loops nested inside them should run in parallel. Here is what it looks like:

```c
// must be executed by only one thread
int a, b, c;
for (a = 0; a < 5; a++) {
    // some stuff

    // loops that have to be parallelized
    #pragma omp parallel for private(b, c) schedule(static) collapse(2)
    for (b = 0; b < 8; b++)
        for (c = 0; c < 10; c++) {
            // some other stuff
        }
    // end of the parallel zone

    // stuff to be executed by only one thread
}
```

(The loop bounds in this example are quite small; in my program the number of iterations can go up to 20,000.) One of my first ideas was to do something like this:

```c
// must be executed by only one thread
#pragma omp parallel // creating all the threads at the beginning
{
    #pragma omp master // or single
    {
        int a, b, c;
        for (a = 0; a < 5; a++) {
            // some stuff

            // loops that have to be parallelized
            #pragma omp for private(b, c) schedule(static) collapse(2)
            for (b = 0; b < 8; b++)
                for (c = 0; c < 10; c++) {
                    // some other stuff
                }
            // end of the parallel zone

            // stuff to be executed by only one thread
        }
    }
} // all the threads are released here
```

It does not compile; I get this error from gcc: "work-sharing region may not be closely nested inside of work-sharing, critical, ordered, master or explicit task region."

I know this certainly comes from the "wrong" nesting, but I don't understand why it does not work. Do I need to add a barrier before the parallel zone? I am a bit lost and do not know how to fix it.

Thank you in advance for your help. Greetings.

2 answers

In the last code block, you declare a parallel region, use the master directive inside it to ensure that only the master thread executes a block, and then, inside that master block, attempt to share a loop across all threads. You state that the compiler error comes from incorrect nesting, but you wonder why it does not work.

It does not work because distributing work across multiple threads inside a region of code that only one thread will ever execute makes no sense.

Your first pseudo-code is closer; you probably want to extend it like this:

```c
#pragma omp parallel
{
    #pragma omp for [...]
    for (...) { ... }

    #pragma omp single
    {
        ...
    }

    #pragma omp for [...]
    for (...) { ... }
}
```

The single directive ensures that the block of code it encloses is executed by exactly one thread. Unlike master, single also implies a barrier on exit; you can change this behavior with the nowait clause.


Most OpenMP implementations do not "create all the threads and destroy them right afterwards". Threads are created at the first OpenMP parallel region and destroyed when the program terminates (at least that is how the Intel OpenMP implementation does it), so there is little performance benefit in using one large parallel region instead of several smaller ones.

The Intel OpenMP runtime (which is open source and can be found here ) has options for controlling what happens to threads when they have no work. By default they spin-wait for a while (in case the program immediately starts a new parallel region), then go to sleep. If they have gone to sleep, it takes slightly longer to wake them for the next parallel region, but that depends on the time between regions, not on the syntax.


Source: https://habr.com/ru/post/958527/
