OpenMP flush directive explained: when is it necessary and when is it helpful

One OpenMP directive that I have never used, and do not know when to use, is `flush` (with and without a list).

I have two questions:

 1.) When is an explicit `omp flush` or `omp flush(var1, ...)` necessary?
 2.) Is it sometimes not necessary but helpful (i.e., can it make the code faster)?

The main reason I can't figure out when to use an explicit flush is that flushes are executed implicitly after many directives that synchronize the threads (such as barrier, single, ...). I cannot, for example, see a way to use flush instead of synchronizing (for example, together with nowait).

I understand that different compilers may implement `omp flush` in different ways. Some may treat a flush with a list as one without a list (i.e., flush all shared objects); see "OpenMP flush vs flush(list)". But I only care about what the specification requires. In other words, I want to know where an explicit flush may, in principle, be necessary or helpful.

Edit: I think I need to clarify my second question. Let me give an example. I would like to know whether there are cases where removing an implicit flush (for example, with nowait) and using an explicit flush instead, but only on certain shared variables, would be faster (and still give the correct result). Something like the following:

    float a,b;
    #pragma omp parallel
    {
        #pragma omp for nowait // No barrier. Do not flush on exit.
        // Code which uses only shared variable a.
        #pragma omp flush(a)   // Flush only variable a rather than all shared variables.
        #pragma omp for
        // Code which uses both shared variables a and b.
    }

I think the code still needs a barrier after the first loop, but all barriers have an implicit flush, which defeats the purpose. Is it possible to have a barrier that does not flush?

2 answers

The flush directive tells the OpenMP compiler to generate code that makes a thread's private view of shared memory consistent again. OpenMP usually handles this well and does the right thing for typical programs, so an explicit `flush` is normally not needed.

However, in some cases the OpenMP compiler needs some help. One of these cases is when you try to implement your own spin lock. Here you need a combination of flushes to make things work, because otherwise the variables being spun on will not be updated. Getting the sequence of flushes right is hard and very, very error prone.
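As a minimal sketch of the kind of hand-rolled synchronization meant here (an illustration, not code taken from the OpenMP specification), the following hands one value from a producer to a consumer through a spin-wait on a flag. The function name `handoff` and all variable names are hypothetical. The sketch assumes the runtime really provides two threads; with a single thread it only terminates if the producer section happens to run first.

```c
#include <stdio.h>

/* Hypothetical helper: a producer section publishes `data`,
   a consumer section spins on `flag` and then reads `data`. */
int handoff(void)
{
    int data = 0;    /* payload, shared between the two sections */
    int flag = 0;    /* spin variable, shared */
    int result = -1; /* what the consumer finally observed */

    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {
            /* Producer: write the payload first. */
            data = 42;
            /* Make the write to data visible before the flag is set. */
            #pragma omp flush
            #pragma omp atomic write
            flag = 1;
            /* Push the flag out so the spinning thread can see it. */
            #pragma omp flush
        }

        #pragma omp section
        {
            /* Consumer: spin until the flag has been observed. */
            int seen = 0;
            while (!seen) {
                /* Discard any stale view of shared memory. */
                #pragma omp flush
                #pragma omp atomic read
                seen = flag;
            }
            /* Refresh the view once more before reading the payload. */
            #pragma omp flush
            result = data;
        }
    }
    return result;
}
```

Under these assumptions, `handoff()` should return 42 once the consumer has seen the flag. Note how many flushes even this tiny exchange needs, and that dropping or reordering any of them can silently break it. The code must be compiled with OpenMP enabled (for example `-fopenmp` with GCC).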

The general recommendation is that flush should not be used. If they use it at all, programmers should by all means avoid flush with a list (`flush(var,...)`). Some people are even talking about deprecating it in a future version of OpenMP.

In terms of performance, the effect of a flush should be negative rather than positive. Since it causes the compiler to generate memory fences and additional load/store operations, I would expect it to slow things down.

EDIT: For your second question, the answer is no. OpenMP ensures that each thread has a consistent view of shared memory when it needs one. If threads do not synchronize, they do not need to update their view of shared memory, because they would not see any “interesting” change there. This means that nothing a thread reads may have been changed by another thread. If it had been, you would have a race condition and a potential bug in your program. To avoid the race, you need to synchronize (which then implies a flush to make each participating thread's view consistent again). A similar argument applies to barriers. You use barriers to start a new epoch in the computation of a parallel region. Since you are keeping the threads in lock step, you will most likely also have some shared state between the threads that was computed in the previous epoch.
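A minimal sketch of this epoch pattern, assuming a hypothetical helper `two_phase_sum` that is not part of the original question: the first loop writes the shared array `a`, and the second loop reads elements that were most likely written by a different thread. The implicit barrier (and its implied flush) at the end of the first `omp for` is what makes those reads safe, so no explicit flush appears.

```c
#include <stddef.h>

/* Hypothetical helper: epoch 1 fills a[], epoch 2 reads neighbouring
   elements of a[] that another thread may have written. */
long two_phase_sum(int n)
{
    enum { MAX_N = 1024 };
    int a[MAX_N];   /* declared outside the region, hence shared */
    long sum = 0;

    if (n > MAX_N) n = MAX_N;
    if (n < 1) return 0;

    #pragma omp parallel
    {
        /* Epoch 1: each thread writes its own chunk of a[]. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            a[i] = i;
        /* Implicit barrier with an implied flush: every write to a[]
           is visible to every thread from here on. */

        /* Epoch 2: read an element that belongs to a neighbour's chunk. */
        #pragma omp for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += a[(i + 1) % n];
    }
    return sum;
}
```

For `n = 1000` the function returns 499500, the sum of 0 through 999. Adding `nowait` to the first loop would remove the barrier together with its flush and turn the reads in the second loop into a race.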

By the way, OpenMP may keep private data for a thread, but it does not have to. So it is likely that the OpenMP compiler will keep variables in registers for a while, which causes them to be out of sync with shared memory. However, updates to array elements are typically reflected in shared memory fairly soon, since the amount of private storage for a thread is usually small (register sets, caches, scratch memory, etc.). OpenMP only gives you some weak guarantees about what you can expect. An actual OpenMP implementation (or the hardware) may be as strict as it wishes (for example, write back any change immediately and flush all the time).


Not quite an answer, but Michael Clemm's question is closed for comments. I think a great example of why flushes are so hard to understand and use correctly is the following, copied (and shortened a bit) from the OpenMP Examples:

    // http://www.openmp.org/wp-content/uploads/openmp-examples-4.0.2.pdf
    // Example mem_model.2c, from Chapter 2 (The OpenMP Memory Model)
    #include <stdio.h>
    #include <omp.h>

    int main()
    {
        int data, flag = 0;
        #pragma omp parallel num_threads(2)
        {
            if (omp_get_thread_num()==0)
            {
                /* Write to the data buffer that will be read by thread */
                data = 42;
                /* Flush data to thread 1 and strictly order the write to data
                   relative to the write to the flag */
                #pragma omp flush(flag, data)
                /* Set flag to release thread 1 */
                flag = 1;
                /* Flush flag to ensure that thread 1 sees the change */
                #pragma omp flush(flag)
            }
            else if (omp_get_thread_num()==1)
            {
                /* Loop until we see the update to the flag */
                #pragma omp flush(flag, data)
                while (flag < 1)
                {
                    #pragma omp flush(flag, data)
                }
                /* Values of flag and data are undefined */
                printf("flag=%d data=%d\n", flag, data);
                #pragma omp flush(flag, data)
                /* Values data will be 42, value of flag still undefined */
                printf("flag=%d data=%d\n", flag, data);
            }
        }
        return 0;
    }

Read the comments and try to understand.


Source: https://habr.com/ru/post/957050/

