OpenMP / C++: number of elements in for-loop

I am doing very simple tests with OpenMP in C++, and I am facing a problem that is probably stupid, but I cannot figure out what is happening. In the following MWE:

    #include <iostream>
    #include <ctime>
    #include <vector>
    #include <omp.h>

    int main() {
        int nthreads = 1, threadid = 0;
        clock_t tstart, tend;
        const int nx = 10, ny = 10, nz = 10;
        int i, j, k;

        std::vector<std::vector<std::vector<long long int> > > arr_par;
        arr_par.resize(nx);
        for (i = 0; i < nx; i++) {
            arr_par[i].resize(ny);
            for (j = 0; j < ny; j++) {
                arr_par[i][j].resize(nz);
            }
        }

        tstart = clock();

        #pragma omp parallel default(shared) private(threadid)
        {
        #ifdef _OPENMP
            nthreads = omp_get_num_threads();
            threadid = omp_get_thread_num();
        #endif

            #pragma omp master
            std::cout << "OpenMP execution with " << nthreads << " threads" << std::endl;
            #pragma omp end master

            #pragma omp barrier

            #pragma omp critical
            {
                std::cout << "Thread id: " << threadid << std::endl;
            }

            #pragma omp for
            for (i = 0; i < nx; i++) {
                for (j = 0; j < ny; j++) {
                    for (k = 0; k < nz; k++) {
                        arr_par[i][j][k] = i*j + k;
                    }
                }
            }
        }

        tend = clock();

        std::cout << "Elapsed time: " << (tend - tstart)/double(CLOCKS_PER_SEC) << " s" << std::endl;

        return 0;
    }

If nx , ny and nz are 10 , the code runs smoothly. If I increase these numbers to 20 , I get a segfault. It works without problems sequentially, or with OMP_NUM_THREADS=1 , regardless of the number of elements.

I compiled this thing with

 g++ -std=c++0x -fopenmp -gstabs+ -O0 test.cpp -o test 

using GCC 4.6.3.

Any thoughts would be appreciated!

1 answer

You have a data race in your loop counters:

    #pragma omp for
    for (i = 0; i < nx; i++) {
        for (j = 0; j < ny; j++) {      // <--- data race
            for (k = 0; k < nz; k++) {  // <--- data race
                arr_par[i][j][k] = i*j + k;
            }
        }
    }

Since neither j nor k is given the private data-sharing attribute, their values can exceed the corresponding loop bounds when several threads increment them at the same time, which leads to out-of-bounds accesses to arr_par . The probability of j or k being incremented by more than one thread at once grows with the number of iterations.

The best way to handle such cases is simply to declare the loop variables inside the loop statements themselves:

    #pragma omp for
    for (int i = 0; i < nx; i++) {
        for (int j = 0; j < ny; j++) {
            for (int k = 0; k < nz; k++) {
                arr_par[i][j][k] = i*j + k;
            }
        }
    }

Another way is to add the private(j,k) clause to the head of the parallel region:

 #pragma omp parallel default(shared) private(threadid) private(j,k) 

It is not necessary to make i private in your case, since the loop variable of a parallel loop is implicitly made private. However, if i is used elsewhere in the code, it might make sense to make it private as well, to prevent other data races.

Also, do not use clock() to measure time in parallel applications, since on most Unix systems it returns the total CPU time summed over all threads. Use omp_get_wtime() instead.

