Disclaimer: The following is a contrived example meant to illustrate the problem quickly. For a real-world counterpart, think of dynamic programming.
Problem: We have an n * m matrix, and we want to copy the elements from the previous row, as in the following code:
    for (i = 1; i < n; i++)
        for (j = 0; j < m; j++)
            x[i][j] = x[i-1][j];
Approach: Iterations of the outer loop must run in order, so they are executed sequentially. The inner loop can be parallelized. To minimize the overhead of creating and destroying threads, I would like to create the thread team only once. However, this seems impossible in OpenMP.
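    /* each thread needs its own copy of both loop counters */
    #pragma omp parallel private(i, j)
    {
        for (i = 1; i < n; i++) {
            /* split the columns of row i among the threads */
            #pragma omp for schedule(dynamic)
            for (j = 0; j < m; j++)
                x[i][j] = x[i-1][j];
        }
    }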
If I use the ordered clause on the outer loop, the code executes sequentially, so there is no performance gain. I am looking for a solution to the scenario above, even if it requires a workaround.
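For concreteness, here is a minimal, self-contained sketch of the pattern described above: one parallel region around a sequential row loop, with only the column loop work-shared. It assumes a flat row-major `int` array instead of `x[i][j]`, and the function name `copy_rows` is hypothetical. It relies on the implicit barrier at the end of `#pragma omp for`, which makes every thread finish row `i` before any thread starts row `i + 1`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: propagate row 0 of an n x m row-major matrix
   down through all later rows, one row at a time. */
void copy_rows(int *x, int n, int m)
{
    assert(x != NULL && n >= 1 && m >= 1);

    /* The thread team is created once, outside the row loop. */
    #pragma omp parallel
    {
        /* i is declared inside the region, so each thread has its own copy
           and every thread walks through the rows in the same order. */
        for (int i = 1; i < n; i++) {
            /* Work-sharing: the columns of row i are divided among threads.
               The implicit barrier at the end of this loop ensures row i is
               complete before any thread begins row i + 1. */
            #pragma omp for schedule(static)
            for (int j = 0; j < m; j++)
                x[(size_t)i * m + j] = x[(size_t)(i - 1) * m + j];
        }
    }
}
```

The sketch uses `schedule(static)` on the assumption that every column costs the same, in which case dynamic scheduling would probably only add overhead. Compiled without OpenMP support, the pragmas are ignored and the function simply runs sequentially.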
Here is my actual code. It is slower than the sequential version:
    for (i = 1; i <= n; i++)
        scanf("%d %d", &in[i][W], &in[i][V]);

    for (i = 0; i <= wc; i++)
        a[0][i] = 0;

    #pragma omp parallel private(i,w)
    {
        for (i = 1; i <= n; ++i)
As for the measurement, I am using something like this:
    double t;
    t = omp_get_wtime();
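A rough sketch of how such a measurement could be wrapped so that only the computation is timed (not the `scanf` input) is shown below. The helpers `now_seconds`, `time_call`, and `busy_work` are hypothetical names introduced for illustration. The fallback to `clock()` when the code is built without OpenMP is also an assumption, and note that `clock()` reports CPU time rather than wall-clock time.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: wall-clock seconds when built with OpenMP,
   CPU seconds from clock() otherwise. */
double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical helper: run work(arg) once and return the elapsed seconds. */
double time_call(void (*work)(void *), void *arg)
{
    double t = now_seconds();
    work(arg);
    return now_seconds() - t;
}

/* Stand-in workload: adds 0 + 1 + ... + 999999 into the long long
   that arg points to. */
void busy_work(void *arg)
{
    long long *sum = (long long *)arg;
    for (long long k = 0; k < 1000000; k++)
        *sum += k;
}
```

A call such as `double dt = time_call(busy_work, &sum);` with `long long sum = 0;` returns the elapsed time for that one call. For comparing the sequential and parallel versions, both should be timed over the same region, and averaging several runs is probably worthwhile.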