OpenMP Parallel Spiking

I am using OpenMP in Visual Studio 2010 to speed up loops.

I wrote a very simple test to verify a performance increase from OpenMP, using omp parallel for on an empty loop:

    int time_before = clock();

    #pragma omp parallel for
    for (int i = 0; i < 4; i++) {
    }

    int time_after = clock();

    std::cout << "time elapsed: " << (time_after - time_before)
              << " milliseconds" << std::endl;

Without the omp pragma, it consistently takes 0 milliseconds to complete (as expected), and with the pragma it usually reports 0 as well. The problem is that with the omp pragma it spikes from time to time, to anywhere from 10 to 32 milliseconds. Every parallel loop I tried with OpenMP showed these random spikes, so I wrote this very basic test. Are the spikes an inherent part of OpenMP, or can they be avoided?

omp parallel for gives me great speedups on some loops, but these random spikes are too large for me to use it.

3 answers

This is pretty normal behavior. Sometimes your operating system is busy and takes longer to create new threads.
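One way to check this is to time several consecutive parallel regions: if OS-side thread creation is the cause, the first region should be the slow one far more often than the later ones. A minimal sketch, using omp_get_wtime for better resolution than clock:

    #include <omp.h>
    #include <cstdio>

    int main() {
        // Time several consecutive parallel regions. The first region
        // typically pays the thread-creation cost; later ones re-use
        // the implementation's thread pool.
        for (int run = 0; run < 5; ++run) {
            double start = omp_get_wtime();
            #pragma omp parallel for
            for (int i = 0; i < 4; ++i) {
                // empty, as in the question
            }
            double elapsed = omp_get_wtime() - start;
            std::printf("run %d: %.3f ms\n", run, elapsed * 1000.0);
        }
        return 0;
    }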


I want to complement cookie's answer: I would add that the spikes are caused by the additional overhead that comes with OpenMP.

Also, since you are doing performance-sensitive measurements, I hope you compiled your code with optimizations enabled. In that case, the loop without OpenMP is simply optimized away by the compiler, so there is no code left between time_before and time_after. With OpenMP, however, at least g++ 4.8.1 (-O3) cannot optimize the code away: the loop is still present in the assembly and contains additional instructions to manage the work sharing. (I cannot try this with VS at the moment.)

Thus, the comparison is not really fair, since the version without OpenMP is optimized away completely.
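One way to make the comparison fairer is to give the loop body a side effect the optimizer cannot remove, for instance through a volatile accumulator. A minimal sketch of this idea (the variable name sink is my own):

    #include <ctime>
    #include <iostream>

    int main() {
        volatile int sink = 0; // volatile: the compiler must keep the stores

        int time_before = clock();
        for (int i = 0; i < 4; i++) {
            sink = sink + i;   // side effect the optimizer cannot remove
        }
        int time_after = clock();

        std::cout << "time elapsed: " << (time_after - time_before)
                  << " milliseconds" << std::endl;
        return 0;
    }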

Edit: You should also keep in mind that OpenMP does not re-create the threads every time. Rather, it uses a thread pool, so if you execute an omp construct before your measured loop, the threads will already exist when the second one is encountered:

    // Dummy loop: spawn the threads.
    #pragma omp parallel for
    for (int i = 0; i < 4; i++) {
    }

    int time_before = clock();

    // Do the actual measurement. OpenMP re-uses the threads.
    #pragma omp parallel for
    for (int i = 0; i < 4; i++) {
    }

    int time_after = clock();

In this case, the spikes should disappear.


If the “OpenMP parallel spiking” (which I would rather call “parallel overhead”) troubles your loop, it probably means you do not have enough workload to parallelize. Parallelization yields a speedup only if the problem size is sufficient. You have already shown an extreme case: there is no work at all in the parallel loop. In that case, you will observe highly variable times due to the parallel overhead.
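For contrast, here is a sketch of a loop whose workload is large enough that a 10-30 ms spike would be negligible (N is an arbitrary illustrative size, not from the question):

    #include <omp.h>
    #include <cstdio>

    int main() {
        const int N = 100000000;   // illustrative size: big enough that the
                                   // work dwarfs the per-region overhead
        double sum = 0.0;

        double start = omp_get_wtime();
        #pragma omp parallel for reduction(+: sum)
        for (int i = 0; i < N; ++i) {
            sum += 1.0 / (i + 1); // non-trivial work per iteration
        }
        double elapsed = omp_get_wtime() - start;

        std::printf("sum = %f, time = %.3f s\n", sum, elapsed);
        return 0;
    }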

The parallel overhead of OpenMP's omp parallel for includes several factors:

  • First, omp parallel for is the sum of omp parallel and omp for (see the sketch after this list).
  • The overhead of spawning or waking up threads (though many OpenMP implementations will not create/destroy them on every omp parallel; they keep a thread pool).
  • For omp for, the overhead of (a) dispatching chunks of work to the worker threads and (b) scheduling, especially if dynamic scheduling is used.
  • The overhead of the implicit barrier at the end of omp for, unless nowait is specified.
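As a concrete illustration of that decomposition, here is a sketch (the function name and arrays are hypothetical) that spells out omp parallel and omp for separately; nowait is safe here because the two loops touch independent data:

    #include <vector>

    void process(std::vector<int>& a, std::vector<int>& c,
                 const std::vector<int>& b) {
        const int n = static_cast<int>(b.size());

        #pragma omp parallel        // fork/wake overhead, paid once per region
        {
            #pragma omp for nowait  // work-sharing loop without its end barrier
            for (int i = 0; i < n; ++i)
                a[i] = 2 * b[i];

            #pragma omp for         // implicit barrier at the end of this loop
            for (int i = 0; i < n; ++i)
                c[i] = b[i] + 1;    // independent of a[], so nowait above is safe
        }                           // implicit barrier at the end of the region
    }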

FYI, to measure the OpenMP parallel overhead, something like the following would be more accurate:

    #include <ctime>

    double measureOverhead(int tripCount) {
        static const size_t TIMES = 10000;
        int sum = 0;

        // Sequential baseline.
        int startTime = clock();
        for (size_t k = 0; k < TIMES; ++k) {
            for (int i = 0; i < tripCount; ++i) {
                sum += i;
            }
        }
        int elapsedTime = clock() - startTime;

        // Parallel version of the same loop.
        int startTime2 = clock();
        for (size_t k = 0; k < TIMES; ++k) {
            // We don't care about the correctness of sum here;
            // otherwise, use "reduction(+: sum)".
            #pragma omp parallel for private(sum)
            for (int i = 0; i < tripCount; ++i) {
                sum += i;
            }
        }
        int elapsedTime2 = clock() - startTime2;

        double parallelOverhead = double(elapsedTime2 - elapsedTime) / double(TIMES);
        return parallelOverhead;
    }

Try running this small piece of code several times and take the average. Also, put at least a minimal workload inside the loops. In the code above, parallelOverhead is an approximation of the overhead of OpenMP's omp parallel for construct.
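For example, a small driver (hypothetical, assuming measureOverhead above is in scope) could report how the sequential/parallel difference changes with the trip count; for large trip counts it can even go negative, which is the point where parallelization starts to pay off:

    #include <cstdio>

    // Hypothetical driver for measureOverhead() above.
    int main() {
        const int tripCounts[] = { 4, 100, 10000, 1000000 };
        for (int t = 0; t < 4; ++t) {
            double overhead = measureOverhead(tripCounts[t]);
            std::printf("tripCount = %7d: difference ~ %.4f clock ticks per region\n",
                        tripCounts[t], overhead);
        }
        return 0;
    }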


Source: https://habr.com/ru/post/981831/

