How to implement a "soft barrier" in multithreaded C++

I have multithreaded C++ code with the following structure:

    do_thread_specific_work();
    update_shared_variables();
    //checkpoint A
    do_thread_specific_work_not_modifying_shared_variables();
    //checkpoint B
    do_thread_specific_work_requiring_all_threads_have_updated_shared_variables();

The work that follows checkpoint B could begin as soon as all threads have reached checkpoint A, hence my notion of a "soft barrier".

Typically, multithreading libraries provide only "hard barriers", in which all threads must reach a certain point before any of them can continue. Obviously, a hard barrier could be used at checkpoint B.

Using a soft barrier can lead to better execution times, especially since the work between checkpoints A and B may not be load-balanced between threads (i.e. one slow thread that has reached checkpoint A, but not B, could cause all the others to wait at a barrier just before checkpoint B).

I have tried using atomics to synchronize things, and I know with 100% certainty that they are NOT guaranteed to work. For example, using OpenMP syntax, before the parallel section starts:

    shared_thread_counter = num_threads; //known at compile time
    #pragma omp flush

Then at checkpoint A:

    #pragma omp atomic
    shared_thread_counter--;

Then at checkpoint B (using polling):

    #pragma omp flush
    while (shared_thread_counter > 0) {
        usleep(1); //can be removed, but better to limit memory bandwidth
        #pragma omp flush
    }

I have designed some experiments in which I use an atomic to indicate that some operation before it has finished. The experiment works with 2 threads most of the time, but fails consistently when I have many threads (e.g. 20 or 30). I suspect this is due to the caching structure of modern processors. Even if one thread updates some other value before performing the atomic decrement, another thread is not guaranteed to read the two in that order. Consider the case where the other value is a cache miss and the atomic decrement is a cache hit.
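The ordering concern described above is what the C++11 memory model addresses with release and acquire orderings. The following is a minimal sketch, assuming `std::thread` and `std::atomic` are an acceptable stand-in for the OpenMP pragmas; the names `run_atomic_soft_barrier`, `shared`, `ok`, and `remaining` are hypothetical and chosen only for illustration. The decrement at checkpoint A uses release ordering and the poll at checkpoint B uses acquire ordering, so a thread that observes the counter at zero also observes every write made before each decrement.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical demo of a counter-based soft barrier.
// Returns true if every thread, after checkpoint B, saw all shared updates.
bool run_atomic_soft_barrier(int num_threads) {
    assert(num_threads > 0);
    std::vector<int> shared(num_threads, 0);  // slot i is written only by thread i
    std::vector<char> ok(num_threads, 0);     // per-thread verdict
    std::atomic<int> remaining(num_threads);  // threads that have not reached A yet

    auto worker = [&](int id) {
        shared[id] = id + 1;                  // update_shared_variables()

        // Checkpoint A: release ordering publishes the write above.
        remaining.fetch_sub(1, std::memory_order_release);

        // ... thread-specific work that does not touch shared[] ...

        // Checkpoint B: acquire ordering pairs with the releases at A.
        while (remaining.load(std::memory_order_acquire) > 0) {
            std::this_thread::yield();        // be polite while polling
        }

        int sum = 0;
        for (int v : shared) sum += v;        // all writes are visible here
        ok[id] = (sum == num_threads * (num_threads + 1) / 2);
    };

    std::vector<std::thread> threads;
    for (int id = 0; id < num_threads; ++id) threads.emplace_back(worker, id);
    for (auto& t : threads) t.join();

    for (char flag : ok) {
        if (!flag) return false;
    }
    return true;
}
```

Calling `run_atomic_soft_barrier(30)` from `main` exercises the many-thread case mentioned above. The sketch polls, like the OpenMP version, so it trades CPU time for low latency at checkpoint B.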

So, back to my question: how do I properly implement this "soft barrier"? Is there a built-in function that guarantees such functionality? I would prefer OpenMP, but I am familiar with most of the other common multithreading libraries.

As a workaround, I am currently using a hard barrier at checkpoint B, and I have restructured my code so that the work between checkpoints A and B is automatically load-balanced between threads (which has been rather difficult at times).

Thanks for any advice / understanding :)

1 answer

How about using a condition variable? I am not sure whether OpenMP provides one, as I am not familiar with it.

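    int counter = 0;
    std::mutex mtx;                 // protects counter
    std::condition_variable cond;

    // checkpoint A
    { std::lock_guard<std::mutex> lock(mtx); ++counter; }
    cond.notify_all();

    // checkpoint B: blocks until the predicate holds
    std::unique_lock<std::mutex> lock(mtx);
    cond.wait(lock, [] { return counter >= NUM_THREADS; });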

Until every thread has reached checkpoint A, no thread will pass checkpoint B.
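The condition-variable idea can be wrapped in a small helper so that checkpoint A and checkpoint B become two method calls. This is only a sketch, assuming C++11 `std::thread` rather than OpenMP; `SoftBarrier`, `arrive`, `wait`, and `run_condvar_soft_barrier` are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: arrive() is checkpoint A, wait() is checkpoint B.
class SoftBarrier {
public:
    explicit SoftBarrier(int num_threads) : remaining_(num_threads) {}

    // Checkpoint A: record arrival without blocking.
    void arrive() {
        bool last;
        {
            std::lock_guard<std::mutex> lock(mtx_);
            last = (--remaining_ == 0);
        }
        if (last) cond_.notify_all();   // wake waiters once everyone has arrived
    }

    // Checkpoint B: block until every thread has called arrive().
    void wait() {
        std::unique_lock<std::mutex> lock(mtx_);
        cond_.wait(lock, [this] { return remaining_ == 0; });
    }

private:
    std::mutex mtx_;                    // protects remaining_
    std::condition_variable cond_;
    int remaining_;
};

// Hypothetical driver: returns true if every thread saw all updates after B.
bool run_condvar_soft_barrier(int num_threads) {
    assert(num_threads > 0);
    std::vector<int> shared(num_threads, 0);  // slot i is written only by thread i
    std::vector<char> ok(num_threads, 0);     // per-thread verdict
    SoftBarrier barrier(num_threads);

    auto worker = [&](int id) {
        shared[id] = id + 1;            // update_shared_variables()
        barrier.arrive();               // checkpoint A
        // ... thread-specific work that does not touch shared[] ...
        barrier.wait();                 // checkpoint B
        int sum = 0;
        for (int v : shared) sum += v;
        ok[id] = (sum == num_threads * (num_threads + 1) / 2);
    };

    std::vector<std::thread> threads;
    for (int id = 0; id < num_threads; ++id) threads.emplace_back(worker, id);
    for (auto& t : threads) t.join();

    for (char flag : ok) {
        if (!flag) return false;
    }
    return true;
}
```

The mutex is needed because `std::condition_variable::wait` takes a `std::unique_lock<std::mutex>`, and locking it also orders the shared writes made before `arrive()` ahead of the reads made after `wait()`. Unlike polling, waiting threads sleep instead of spinning. If C++20 is available, `std::latch` offers the same split directly through `count_down()` and `wait()`.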


Source: https://habr.com/ru/post/1381165/

