Shared vectors in OpenMP

I am trying to parallelize a program that I am using and have the following question: will I lose performance if multiple threads have to read from or write to the same vector, but each to different elements of it? I have the feeling that this is why my program is struggling. Take the following code:

    #include <cmath>
    #include <vector>

    using std::vector;

    int main() {
        vector<double> numbers;
        vector<double> results(10);
        double x;

        // write 10 values in vector numbers
        for (int i = 0; i < 10; i++) {
            numbers.push_back(std::cos(i));
        }

        #pragma omp parallel for \
            private(x) \
            shared(numbers, results)
        for (int j = 0; j < 10; j++) {
            x = 2 * numbers[j] + 5;

            #pragma omp critical // do I need this?
            {
                results[j] = x;
            }
        }

        return 0;
    }

Obviously, the actual program performs much more expensive operations; this example is only meant to illustrate my question. So, can the for loop be executed fast and fully in parallel, or do the threads have to wait for each other because only one thread at a time can access the vector numbers, for example, even though they all read different elements of it?

The same question applies to the write operation: do I need the critical pragma, or is it not a problem, since each thread writes to a different element of the vector results? I am happy with any help I can get, and it would also be nice to know whether there is a better way to do this (maybe not using vectors at all, but plain arrays and pointers, etc.?). I also read that vectors are not thread safe in some cases, and that using a pointer is recommended: OpenMP and STL vector

Many thanks for your help!

+6
2 answers

I assume that most problems with vectors in multiple threads arise when the vector is resized: it then copies its entire contents to a new location in memory (a larger allocation), and if you are accessing it in parallel at that moment, you end up reading an object that has just been freed.

If you do not resize the vector, then I have never had a problem with concurrent reads and writes to it (as long as two threads never write to the same element, obviously).
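The "no resizing" condition can be sketched as follows. This is only an illustration, not code from the question: the helper name transform_presized is hypothetical, and it assumes the final size is known before the parallel region starts.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: maps every input element to its own output slot.
std::vector<double> transform_presized(const std::vector<double>& numbers) {
    // Allocate all slots before the parallel region. Nothing calls
    // push_back or resize while threads are running, so the storage
    // is never reallocated or moved.
    std::vector<double> results(numbers.size());
    const int n = static_cast<int>(numbers.size());

    #pragma omp parallel for
    for (int j = 0; j < n; ++j) {
        // Iteration j reads numbers[j] and writes only results[j].
        results[j] = 2 * numbers[j] + 5;
    }

    assert(results.size() == numbers.size());
    return results;
}
```

Calling push_back on results from inside the loop instead would be the risky variant described above, because any call may trigger a reallocation while other threads are still using the old storage.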

As for the lack of performance improvement: the OpenMP critical section will slow your program down, possibly to the point where it behaves as if only 1 thread were used (depending on how much work is actually done outside the critical section).

You can remove the critical section directive (subject to the conditions above).

+7

You get no speedup precisely because of the critical section, which is unnecessary, since the same element is never modified by two threads at the same time. Remove the critical section and it will work fine.

You can also play with the scheduling strategy, because if memory access is not linear (it is linear in the example you gave), the threads may fight over the cache (by writing elements that sit in the same cache line). OTOH, if the number of elements is fixed, as in your case, and there is no branching in the loop (so all iterations run at about the same speed), static, which IIRC is the default, should work best.
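A minimal sketch of what choosing the schedule explicitly might look like. The function names are made up for illustration. With schedule(static) and no chunk size, OpenMP splits the iterations into roughly equal contiguous blocks, at most one per thread, so neighbouring elements are mostly written by the same thread. With schedule(static, 1), iterations are dealt out round-robin, so adjacent elements (which are likely to share a cache line) are written by different threads.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Contiguous blocks: each thread gets one run of consecutive indices.
std::vector<double> scale_static_blocks(const std::vector<double>& numbers) {
    std::vector<double> results(numbers.size());
    const int n = static_cast<int>(numbers.size());

    #pragma omp parallel for schedule(static)
    for (int j = 0; j < n; ++j) {
        results[j] = 2 * numbers[j] + 5;
    }
    return results;
}

// Interleaved: chunk size 1 hands out indices round-robin, so threads
// write to neighbouring elements. Same result, but more cache contention
// is plausible on large inputs.
std::vector<double> scale_static_interleaved(const std::vector<double>& numbers) {
    std::vector<double> results(numbers.size());
    const int n = static_cast<int>(numbers.size());

    #pragma omp parallel for schedule(static, 1)
    for (int j = 0; j < n; ++j) {
        results[j] = 2 * numbers[j] + 5;
    }
    assert(results.size() == numbers.size());
    return results;
}
```

Both variants compute the same values; whether the difference is measurable depends on the element count, the work per iteration, and the hardware, so it is worth timing on the real program rather than on a 10-element example.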

(BTW, you can declare x inside the loop to avoid private(x), and the shared clause is implied, IIRC; I have never used it.)
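Put together, the loop from the question might be reduced to something like the sketch below. The wrapper function run_example is hypothetical and exists only so that the result can be inspected. x is declared inside the loop body, which makes it private to each iteration without a private clause, and numbers and results are shared by default because they are declared outside the parallel region.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical wrapper around the question's loop, without the
// critical section and without explicit private/shared clauses.
std::vector<double> run_example() {
    std::vector<double> numbers;
    std::vector<double> results(10);

    // Fill the input serially, as in the question.
    for (int i = 0; i < 10; i++) {
        numbers.push_back(std::cos(i));
    }
    assert(numbers.size() == results.size());

    #pragma omp parallel for
    for (int j = 0; j < 10; j++) {
        // Declared inside the loop, so every iteration has its own x.
        double x = 2 * numbers[j] + 5;
        // No critical section: only this iteration touches results[j].
        results[j] = x;
    }
    return results;
}
```

With GCC or Clang this is typically built with -fopenmp; without that flag the pragma is ignored and the loop simply runs serially with the same result.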

+5

Source: https://habr.com/ru/post/912103/

