How can I handle a data race in OpenMP?

I am trying to use OpenMP to sum the numbers in an array. Below is my code:

    int* input = (int*) malloc (sizeof(int)*snum);
    int sum = 0;
    int i;
    for(i=0;i<snum;i++){
        input[i] = i+1;
    }
    #pragma omp parallel for schedule(static)
    for(i=0;i<snum;i++)
    {
        int* tmpsum = input+i;
        sum += *tmpsum;
    }

This does not give the correct result for sum. What's wrong?

2 answers

Your code currently has a race condition, which is why the result is incorrect. To illustrate why, take a simple example:

Suppose you run on 2 threads with the array int input[4] = {1, 2, 3, 4};. You correctly initialize sum to 0 and start the loop. With schedule(static), thread 0 gets the first half of the iterations and thread 1 the second half, so in their first iterations thread 0 handles input[0] = 1 and thread 1 handles input[2] = 3. Both threads can read sum from memory as 0, add their own element, and write the result back. Thread 0 then tries to write sum = 1 (0 + 1 = 1), while thread 1 tries to write sum = 3 (0 + 3 = 3). The value left in memory depends on which thread writes last, which is exactly what a race condition is. Worse, in this case neither possible outcome is correct: the two additions together should have produced 4.

There are several ways around this. Below I will cover the three main ones.
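The lost update in that example can be replayed step by step without any threads at all. The sketch below is only an illustration: the function name replay_lost_update and its last_writer parameter are made up for this purpose, and it assumes the static split in which thread 0 starts at input[0] and thread 1 starts at input[2].

```c
/*
 * Sequential replay of the lost update. No real threads are involved;
 * each "thread" is just a pair of local variables, so the outcome is
 * deterministic. replay_lost_update and last_writer are hypothetical
 * names used only for this illustration.
 */
int replay_lost_update(int last_writer)
{
    int input[4] = {1, 2, 3, 4};
    int sum = 0;

    /* Both threads read sum before either has written it back. */
    int seen_by_t0 = sum;                     /* thread 0 reads 0 */
    int seen_by_t1 = sum;                     /* thread 1 reads 0 */

    /* Each adds the first element of its own chunk (assumed split). */
    int result_t0 = seen_by_t0 + input[0];    /* 0 + 1 = 1 */
    int result_t1 = seen_by_t1 + input[2];    /* 0 + 3 = 3 */

    /* The write that lands last overwrites the other one. */
    if (last_writer == 0) {
        sum = result_t1;
        sum = result_t0;
    } else {
        sum = result_t0;
        sum = result_t1;
    }
    return sum;    /* 1 or 3, never the intended 4 */
}
```

Calling replay_lost_update(0) leaves 1 in sum and replay_lost_update(1) leaves 3. In both cases one of the two additions has been silently discarded.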

#pragma omp critical

OpenMP has a critical directive. It restricts the statement or block that follows so that only one thread can execute it at a time. For example, your for loop can be written:

    #pragma omp parallel for schedule(static)
    for(i = 0; i < snum; i++) {
        int *tmpsum = input + i;
        #pragma omp critical
        sum += *tmpsum;
    }

This eliminates the race condition, since only one thread at a time reads and writes sum. However, the critical directive is very bad for performance here, because the threads have to take turns on every iteration, and it is likely to kill most (if not all) of the gains you get from using OpenMP in the first place.

#pragma omp atomic

The atomic directive is very similar to the critical directive. The main difference is that critical can protect any block of code that you want executed by one thread at a time, whereas atomic applies only to a single simple memory operation, such as a read, a write, or an update of one variable. Since all this code example does is update sum, this directive works just fine:

    #pragma omp parallel for schedule(static)
    for(i = 0; i < snum; i++) {
        int *tmpsum = input + i;
        #pragma omp atomic
        sum += *tmpsum;
    }

atomic usually performs significantly better than critical. However, it is still not the best option in your particular case.
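To make the difference between the two directives concrete, here is a case where atomic would not be enough: two variables have to change together, so the whole block needs critical. This is a minimal sketch rather than a tuned implementation. The helper argmax is hypothetical, it assumes n is at least 1, and with duplicate maxima the index it returns may vary between runs.

```c
/*
 * Hypothetical helper: index of the largest element in a[0..n-1].
 * Assumes n >= 1.
 */
int argmax(const int *a, int n)
{
    int best = a[0];
    int best_idx = 0;
    int i;

    #pragma omp parallel for schedule(static)
    for (i = 1; i < n; i++) {
        /*
         * The comparison and both assignments must be seen as one unit
         * by the other threads, so a single atomic update cannot
         * express this; critical protects the whole block.
         */
        #pragma omp critical
        {
            if (a[i] > best) {
                best = a[i];
                best_idx = i;
            }
        }
    }
    return best_idx;
}
```

With GCC or Clang, building with -fopenmp enables the pragmas; without that flag they are ignored and the loop simply runs serially with the same result.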

reduction

The method that you should use, and the one that has already been suggested by others, is reduction. You can do this by changing the for loop to:

    #pragma omp parallel for schedule(static) reduction(+:sum)
    for(i = 0; i < snum; i++) {
        int *tmpsum = input + i;
        sum += *tmpsum;
    }

The reduction clause tells OpenMP that, while the loop is running, each thread should keep its own private copy of sum, and that these copies should all be added together at the end of the loop. This is the most efficient method, since the whole loop now runs in parallel; the only overhead comes at the end of the loop, when the per-thread values of sum must be added up.
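To show what "each thread keeps its own sum" means, here is the reduction version next to a hand-written approximation of the same idea. This is a sketch of the concept, not a claim about what any particular compiler generates. The function names are hypothetical, and the long long accumulator is my own assumption, chosen so that the sum of 1..snum does not overflow an int for large snum.

```c
/* Sum using the reduction clause (hypothetical helper name). */
long long sum_reduction(const int *input, int n)
{
    long long sum = 0;
    int i;

    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += input[i];
    }
    return sum;
}

/* Roughly the same idea written out by hand (hypothetical helper name). */
long long sum_manual(const int *input, int n)
{
    long long sum = 0;

    #pragma omp parallel
    {
        long long local = 0;    /* private: declared inside the region */
        int i;

        /* Each thread accumulates its share of the iterations. */
        #pragma omp for schedule(static) nowait
        for (i = 0; i < n; i++) {
            local += input[i];
        }

        /* One synchronized update per thread, not one per element. */
        #pragma omp atomic
        sum += local;
    }
    return sum;
}
```

For input[i] = i + 1, both functions should return n(n+1)/2, which gives a quick way to check the result against the original program.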


Use the reduction clause (description on MSDN).

    int* input = (int*) malloc (sizeof(int)*snum);
    int sum = 0;
    int i;
    for(i=0;i<snum;i++){
        input[i] = i+1;
    }
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for(i=0;i<snum;i++)
    {
        sum += input[i];
    }

Source: https://habr.com/ru/post/1207133/

