The flush directive tells the OpenMP compiler to generate code that makes the thread's private (temporary) view of shared memory consistent with memory again. OpenMP usually handles this well and does the right thing for typical programs. Hence, there is no need for flush in such programs.
However, in some cases the OpenMP compiler needs some help. One of these cases is trying to implement your own spin lock. There you need a combination of flushes to make things work, because otherwise the spin variables will not be updated. Getting the sequence of flushes right is difficult and very, very error prone.
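For illustration, here is a minimal sketch of the kind of flag-based handoff this involves, written in C with OpenMP pragmas. The function name handoff and its variables are hypothetical, and it assumes OpenMP 3.1 or later for atomic read/write. Note how many flushes it takes to move a single value from one thread to another; leaving out any of them can let the consumer spin on a stale copy of flag or read a stale copy of data.

```c
/* Hypothetical helper: one section produces a value, the other spins on a
 * flag until the value is ready and then consumes it.
 * Returns the value the consumer read. */
int handoff(void)
{
    int data = 0;    /* payload written by the producer */
    int flag = 0;    /* spin variable: becomes 1 once data is ready */
    int result = -1; /* value seen by the consumer */

    #pragma omp parallel sections num_threads(2) shared(data, flag, result)
    {
        #pragma omp section
        {
            /* Producer */
            data = 42;
            #pragma omp flush           /* make data visible before the flag */
            #pragma omp atomic write
            flag = 1;
            #pragma omp flush(flag)     /* push the new flag value to memory */
        }
        #pragma omp section
        {
            /* Consumer */
            int ready = 0;
            while (!ready) {
                #pragma omp flush(flag) /* refresh this thread's view of flag */
                #pragma omp atomic read
                ready = flag;
            }
            #pragma omp flush           /* make sure data is read after the flag */
            result = data;
        }
    }
    return result;
}
```

The producer is deliberately the first section, so in typical implementations the code still terminates if both sections end up running on a single thread (or if it is compiled without OpenMP, where the pragmas are ignored).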
The general recommendation is that flush should not be used. If you use it at all, avoid flush with a list (flush(var,...)) by all means. Some people are actually talking about deprecating it in future versions of OpenMP.
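If what you actually need is mutual exclusion, the usual way around a hand-written spin lock is a construct that already implies the necessary flushes. A minimal sketch, assuming a simple shared counter (count_with_critical is a hypothetical name): critical implies a flush on entry and on exit, and the parallel region ends with an implicit barrier, which also implies a flush, so no explicit flush directive appears anywhere.

```c
/* Hypothetical helper: n loop iterations each add 1 to a shared counter.
 * The critical construct serializes the updates and implies a flush on
 * entry and exit, replacing a hand-rolled spin lock plus flushes. */
long count_with_critical(int n)
{
    long counter = 0;

    #pragma omp parallel for shared(counter)
    for (int i = 0; i < n; i++) {
        #pragma omp critical
        counter += 1;
    }
    /* The implicit barrier at the end of the parallel region implies a
     * flush, so the final value of counter is visible here. */
    return counter;
}
```

For a plain counter like this, `#pragma omp atomic` or a `reduction(+:counter)` clause would be cheaper; critical is used here only because it is the closest drop-in replacement for a spin lock around an arbitrary block of code.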
Performance-wise, the effect of flush should be negative rather than positive. Since it causes the compiler to generate memory fences and additional load/store operations, I would expect it to slow things down.
EDIT: For your second question, the answer is no. OpenMP ensures that each thread has a consistent view of shared memory when it needs one. If threads do not synchronize, they do not need to update their view of shared memory, because there is no “interesting” change for them to see. In a correct program, a thread never reads data that another thread has modified without synchronizing in between; if it did, you would have a race condition and a potential bug in your program. To avoid the race, you need to synchronize (which implies a flush that makes the view of each participating thread consistent again). A similar argument applies to barriers. You use barriers to begin a new epoch of the computation in a parallel region. Since you keep the threads in lock-step, you will most likely also have some shared state between threads that was computed in the previous epoch, and the flush implied by the barrier is what makes that state visible to every thread.
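A minimal sketch of that pattern, using a hypothetical function two_phase_sum: the first worksharing loop writes a[] (epoch 1), and the second loop reads mirrored elements that were in general written by a different thread (epoch 2). The implicit barrier at the end of the first omp for (there is no nowait clause) implies a flush, which is what makes those reads safe without any explicit flush directive.

```c
#include <stdlib.h>

/* Hypothetical helper: epoch 1 fills a[], epoch 2 sums a[] in reverse
 * order, so most reads touch elements written by another thread.
 * Returns the sum, or -1 if allocation fails. */
long two_phase_sum(int n)
{
    if (n <= 0)
        return 0;

    int *a = malloc((size_t)n * sizeof *a);
    if (a == NULL)
        return -1;

    long sum = 0;

    #pragma omp parallel shared(a, n, sum)
    {
        /* Epoch 1: each thread writes its share of a[]. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            a[i] = i;
        /* Implicit barrier here: it implies a flush, so every thread now
         * sees all elements of a[]. */

        /* Epoch 2: read the mirrored element, usually owned by another thread. */
        #pragma omp for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += a[n - 1 - i];
    }

    free(a);
    return sum;
}
```

Adding nowait to the first loop would remove the barrier, and with it the guarantee that epoch 2 sees the values written in epoch 1.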
By the way, OpenMP may keep private copies of data for a thread, but it does not have to. So it is likely that the OpenMP compiler will keep variables in registers for a while, which puts them out of sync with shared memory. However, updates to array elements are usually reflected in shared memory quite soon, since the amount of private storage per thread is usually small (register sets, caches, scratch memory, etc.). OpenMP gives you only weak guarantees about what you can expect. An actual OpenMP implementation (or the hardware) can be as strict as it likes (for example, writing every change back immediately and flushing all the time).