O3 optimization flag makes speedup worse in parallel processing

Question

I am measuring speedups for a parallel C program that uses OpenMP. When I compile the code with gcc using the -O3 flag, the execution time is much shorter. However, I consistently get lower speedups across thread counts (2, 4, 8, 16, 24) than with code compiled without optimization flags. How is this possible?

Here is more information about what I have found so far. I am writing code that finds prime numbers using the Sieve of Eratosthenes, and I am trying to speed it up with a parallel version using OpenMP. Here is the code:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h> 
#include <math.h> 

// ind2num: returns the integer (3<=odd<=numMax)
//      represented by index i at prime_numbers (0<=i<=maxInd)
#define ind2num(i)  (2*(i)+3)
// num2ind: returns the index (0<=i<=maxInd) at prime_numbers
//      which represents the number (3<=odd<=numMax)
#define num2ind(i)  (((i)-3)/2)

// Sieve: find all prime numbers until ind2num(maxInd)
void Sieve(int *prime_numbers, long maxInd) {
    long maxSqrt;
    long baseInd;
    long base;
    long i;

    // square root of the largest integer (largest possible prime factor)
    maxSqrt = (long) sqrt((long) ind2num(maxInd));

    // first base
    baseInd=0;
    base=3;

    do {
        // marks as non-prime all multiples of base starting at base^2
        #pragma omp parallel for schedule (static)
        for (i=num2ind(base*base); i<=maxInd; i+=base) {
            prime_numbers[i]=0;
        }

        // updates base to next prime number
        for (baseInd=baseInd+1; baseInd<=maxInd; baseInd++)
            if (prime_numbers[baseInd]) {
                base = ind2num(baseInd);
                break;
            }
    }
    while (baseInd <= maxInd && base <= maxSqrt);

}

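Here are the execution times for an upper limit of 1000000000 (10^9), for different numbers of threads (1, 2, 4, 8, 16, 24):

Threads | 1 | 2 | 4 | 8 | 16 | 24 |
Without -O3 | 56.31s | 28.87s | 21.77s | 11.19s | 6.13s | 4.50s |
With -O3 | 10.10s | 5.23s | 3.74s | 2.81s | 2.62s | 2.52s |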

Here are the corresponding speedups:

Threads | 1 | 2 | 4 | 8 | 16 | 24 |
Without -O3 | 1 | 1.95 | 2.59 | 5.03 | 9.19 | 12.51 |
With -O3 | 1 | 1.93 | 2.70 | 3.59 | 3.85 | 4.01 |

How is it possible that I get lower speedups with the -O3 flag?

c openmp

Gabriel Ilharco 19 . '16 17:35

David Schwartz · Accepted Answer · 2016-08-19T18:51:47+0000

, . , . , .

, , . , . .
