Controlling FPU behavior in OpenMP?

I have a large C++ program that modifies the FPU control word (using _controlfp()). It unmasks some FPU exceptions and installs an SEH translator to turn them into typed C++ exceptions. I am using VC++ 9.0.
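Roughly, the setup looks like this (simplified; the type and function names are illustrative rather than our real ones, and it assumes compilation with /EHa):

    #include <float.h>    // _controlfp, _MCW_EM, _EM_* masks
    #include <eh.h>       // _set_se_translator
    #include <windows.h>  // EXCEPTION_POINTERS, STATUS_FLOAT_* codes
    #include <stdexcept>

    // Illustrative typed exception; the real program's hierarchy differs.
    struct FpuError : std::runtime_error {
        explicit FpuError(const char* msg) : std::runtime_error(msg) {}
    };

    // Translate SEH floating-point faults into typed C++ exceptions.
    void __cdecl FpuSehTranslator(unsigned int code, EXCEPTION_POINTERS*) {
        switch (code) {
        case STATUS_FLOAT_DIVIDE_BY_ZERO:    throw FpuError("FP divide by zero");
        case STATUS_FLOAT_INVALID_OPERATION: throw FpuError("FP invalid operation");
        case STATUS_FLOAT_OVERFLOW:          throw FpuError("FP overflow");
        default:                             throw FpuError("SEH exception");
        }
    }

    void InitFpuForThisThread() {
        // Clearing an _EM_* bit unmasks (enables) that FPU exception.
        unsigned int cw = _controlfp(0, 0);   // read without modifying
        cw &= ~(_EM_ZERODIVIDE | _EM_INVALID | _EM_OVERFLOW);
        _controlfp(cw, _MCW_EM);              // apply only the exception-mask bits
        _set_se_translator(FpuSehTranslator); // note: per-thread in VC++
    }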

I would like to use OpenMP (v2.0) to parallelize some of our computation loops. I have already applied it to one loop, but the numerical results differ slightly (although I understand this may also be because the calculations are performed in a different order). I assume the FPU state is thread-specific. Is there a way to make OpenMP threads inherit this state from the main thread? Or is there a way to tell OpenMP to run a specific function in each new thread that sets up the correct state? What is the idiomatic way to handle this situation?

+4
3 answers
  • As you have already noted, double/float arithmetic is not associative or distributive the way real-number arithmetic is. In particular, combining a huge number with a very small one can produce noticeably different results when the order of evaluation changes.

  • The FPU state has to be thread-specific, since that state lives in registers, and register state (= context) is per-thread.

  • Saying that spawned threads "inherit" the state of the main thread is ambiguous, because it is unclear which state is meant. If you mean the register state, then no, they do not inherit it.

  • My suggestion: why not simply set the FPU control word in each thread? For example, before the OpenMP threads are spawned, i.e. before the parallel region, save the current FPU control word in a global variable using _controlfp(0, 0) (note that _status87 returns the FPU status word, not the control word). Then have each iteration of the parallel for read that variable and re-apply the saved value, as in the sketch below. Since the global variable is only ever read inside the region, there is no data race to worry about.

    unsigned int saved_cw = _controlfp(0, 0); // reads the control word without changing it

    #pragma omp parallel for
    for (int i = 0; i < N; ++i) {
        // Re-apply the saved exception-mask, precision, and rounding bits
        // in whichever worker thread runs this iteration.
        _controlfp(saved_cw, _MCW_EM | _MCW_PC | _MCW_RC);
        // ... computation ...
    }
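If calling _controlfp on every iteration bothers you, you can split the combined directive and set the control word once per thread instead:

    unsigned int saved_cw = _controlfp(0, 0);

    #pragma omp parallel
    {
        // Each worker thread applies the saved control word exactly once.
        _controlfp(saved_cw, _MCW_EM | _MCW_PC | _MCW_RC);

        #pragma omp for
        for (int i = 0; i < N; ++i) {
            // ... computation ...
        }
    }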
+1

The likelihood is that this is due to reordering of the floating-point operations. We all like to rely on our operations being associative and commutative, but the unfortunate truth is that floating-point arithmetic is not associative, so when the work is parallelized the results can vary as the order of evaluation gets shuffled.
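For example (an illustrative snippet, not from the question's code), merely regrouping a sum of doubles changes the answer:

    #include <cstdio>

    int main() {
        double a = 1e16, b = -1e16, c = 1.0;
        // (a + b) + c yields 1, but a + (b + c) yields 0, because
        // b + c rounds back to -1e16: at that magnitude a double
        // cannot represent a difference as small as 1.0.
        std::printf("%g vs %g\n", (a + b) + c, a + (b + c));
        return 0;
    }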

Try running the loops backwards and see if the result differs.

If you need iterations to land on particular threads, OpenMP does give you guarantees about which iterations fall on which threads: with a static schedule, if you run a loop from 1 to N on a quad core, iterations 1 to N/4 will run on the same thread every time.
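For instance (a sketch; an explicit schedule(static) with a fixed thread count pins the iteration-to-thread mapping):

    #include <omp.h>
    #include <cstdio>

    int main() {
        const int N = 16;
        // With schedule(static) and a fixed number of threads, the
        // mapping of iterations to threads is the same on every run.
        #pragma omp parallel for schedule(static) num_threads(4)
        for (int i = 0; i < N; ++i) {
            std::printf("iteration %2d ran on thread %d\n", i, omp_get_thread_num());
        }
        return 0;
    }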

-Rick

0

I have concluded that I do not actually have a problem. The differences in the results come from the order of the calculations, not from the FPU state differing between threads (we do not change the precision or the rounding mode). As for the FPU exception masks differing in the worker threads, this is not a concern: if a worker thread performs an operation that would raise an exception, the result (a NaN or Inf, etc.) will eventually propagate up to the main thread, and the exception will be thrown there.

Furthermore, an exception must be caught in the same OpenMP thread that threw it, so I want exceptions to be thrown from the main thread anyway.
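For completeness, the pattern I would use (a sketch, not our production code) to surface a worker-thread failure from the main thread, since OpenMP 2.0 does not let an exception escape the parallel region:

    #include <float.h>     // _isnan (VC++ CRT)
    #include <stdexcept>

    void compute_all(double* data, int n) {
        bool failed = false;

        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            try {
                // ... per-element computation that may throw ...
                if (_isnan(data[i]))  // hypothetical failure check
                    throw std::runtime_error("NaN produced");
            } catch (...) {
                // The exception must not cross the parallel-region
                // boundary, so record the failure and keep going.
                #pragma omp critical
                failed = true;
            }
        }

        // Rethrow in the calling (main) thread, where it can be caught.
        if (failed) throw std::runtime_error("a worker iteration failed");
    }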

0

Source: https://habr.com/ru/post/1300666/

