This can be improved by fixing some OpenMP errors. First, since every thread accumulates into its own copy of count, you need a reduction clause so the per-thread copies are combined back into a single value at the end of the parallel region. In addition, the variables i, x, y and z must each have a private instance per thread — you do not want the threads sharing a single one! To express all of this, your #pragma at the top of the loop should be:
#pragma omp parallel for private(i, x, y, z) reduction(+:count)
Second, because this is a worksharing for loop, you do not have to do anything extra for synchronization: there is an implicit barrier for all threads at the end of the loop. (And you need that barrier for count to contain the increments from every thread!) In particular, your task and barrier pragmas are pointless: at that point you are back to a single thread anyway, and there is no benefit in wrapping that one computation in a parallel task.
There is also a potential problem with the system's random number generator in this setting: it may be slow under concurrent calls (e.g. because of internal locking) and/or produce poor randomness when shared across threads. You will probably want to investigate how it behaves on your system, and either give each thread its own seed and re-entrant generator state, or switch to a different random number generator, depending on what you find.
Other than that, it looks pretty reasonable. There is not much more you can do with this algorithm, since it is short and trivially parallelizable.