OpenMP vs gcc compiler optimization

I am studying OpenMP using the example of computing the value of pi by quadrature. As a baseline, I run the following serial C code:

double serial() {
    double step;
    double x,pi,sum = 0.0;

    step = 1.0 / (double) num_steps;

    for (int i = 0; i < num_steps; i++) {
        x = (i + 0.5) * step; // midpoint quadrature
        sum += 4.0 / (1.0 + x*x);
    }
    pi = step * sum;

    return pi;
}

I compare this with the OpenMP implementation, which uses a parallel for with a reduction:

double SPMD_for_reduction() {
    double step;
    double pi,sum = 0.0;

    step = 1.0 / (double) num_steps;

    #pragma omp parallel for reduction (+:sum)
    for (int i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x*x);
    }
    pi = step * sum;

    return pi;
}

For num_steps = 1,000,000,000 and 6 threads in the omp case, I compile and time:

    double start_time = omp_get_wtime();
    serial();
    double end_time = omp_get_wtime();

    start_time = omp_get_wtime();
    SPMD_for_reduction();
    end_time = omp_get_wtime();

Compiling with cc and no optimization, the runtimes are roughly 4 s (serial) and 0.66 s (OpenMP). With the -O3 flag, the serial runtime drops to about ".000001 s," while the OpenMP runtime is essentially unchanged. What's going on here? Is this vector instructions being used, or is something wrong with my code or with the synchronization? And if it is vectorization, why isn't the OpenMP function optimized the same way?

In case it's relevant, my machine has a modern 6-core Xeon processor.

Thanks!


Nothing mysterious is happening. The result of the serial computation is simply never used:

double start_time = omp_get_wtime();
serial(); // <-- result is never used, so -O3 eliminates the whole call
double end_time = omp_get_wtime();

Since the return value is discarded and serial() has no other observable effect, the optimizer removes the entire computation as dead code, which is why the "runtime" collapses to nearly zero. The OpenMP version calls into the OpenMP runtime, which the compiler cannot prove free of side effects, so that loop is not eliminated.

To get a real measurement, use the result, e.g. double serial_pi = serial();, and then print serial_pi. Once the result is observable, the compiler can no longer discard the computation, and you will see the true optimized runtime.


Source: https://habr.com/ru/post/1620441/

