Automatic vectorization with double and -ffast-math

Why is -ffast-math needed with g++ to achieve loop vectorization with doubles? I don't like -ffast-math because I don't want to lose accuracy.

+5
3 answers

You do not lose accuracy with -ffast-math. It only affects the handling of NaN, Inf, etc., and the order of operations.

If you have a specific piece of code in which you do not want GCC to reorder or simplify the computations, you can mark a variable as used by an asm statement.

For example, the following code performs a rounding operation on f. However, the two operations f += g and f -= g are likely to be optimized away by GCC:

    static double moo(double f, double g) {
        g *= 4503599627370496.0; // 2 ** 52
        f += g;
        f -= g;
        return f;
    }

On x86_64, you can use this asm statement to tell GCC not to perform this optimization:

    static double moo(double f, double g) {
        g *= 4503599627370496.0; // 2 ** 52
        f += g;
        __asm__("" : "+x" (f));
        f -= g;
        return f;
    }

Unfortunately, you will need to adapt this for each architecture. On PowerPC, use +f instead of +x.
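One way to keep this readable across targets is to hide the per-architecture constraint behind a macro. This is only a sketch under the assumptions above (x86 with SSE math, PowerPC, and a memory-operand fallback); the name FP_BARRIER is made up for illustration:

    // Sketch: wrap the per-architecture constraint in a macro so the
    // rounding trick stays readable. FP_BARRIER is a hypothetical name.
    #if defined(__x86_64__) || defined(__SSE2_MATH__)
    #  define FP_BARRIER(x) __asm__("" : "+x"(x))   // value lives in an SSE register
    #elif defined(__powerpc__)
    #  define FP_BARRIER(x) __asm__("" : "+f"(x))   // value lives in an FP register
    #else
    #  define FP_BARRIER(x) __asm__("" : "+m"(x))   // fallback: force it through memory
    #endif

    static double moo(double f, double g) {
        g *= 4503599627370496.0; // 2 ** 52
        f += g;
        FP_BARRIER(f);           // GCC may not merge f += g; f -= g; across this
        f -= g;
        return f;
    }

The empty asm body costs nothing at run time; it only tells GCC that the value of f may have changed, so the surrounding additions cannot be folded away.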

+8

Very likely because vectorization means that you may get different results, or may miss floating-point signals/exceptions.

If you compile for 32-bit x86, gcc and g++ use x87 for floating-point math by default; on 64-bit they use SSE by default. However, x87 can and will produce different values for the same computation, so g++ is unlikely to consider vectorizing if it cannot guarantee that you will get the same results, unless you use -ffast-math or some of its constituent flags.

In the end it all boils down to the fact that the floating-point environment for vectorized code can differ from the one for non-vectorized code, sometimes in ways that matter. If the differences do not matter to you, you can use something like

 -fno-math-errno -fno-trapping-math -fno-signaling-nans -fno-rounding-math 

but first read up on these options and make sure they do not affect the correctness of your program. -ffinite-math-only may also help.
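For instance (a minimal sketch; the function name, file name, and compile line are illustrative), a plain sqrt loop is typically blocked from vectorizing only because the sqrt call may set errno, so -fno-math-errno alone is often enough:

    // Sketch: elementwise loop whose vectorization is blocked by errno
    // handling, not by reassociation. Compile with something like
    //   g++ -O3 -fno-math-errno -fopt-info-vec sqrt_loop.cc
    // (-fopt-info-vec reports which loops GCC vectorized.)
    #include <cmath>
    #include <cstddef>

    void root_all(double* __restrict out, const double* __restrict in,
                  std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            out[i] = std::sqrt(in[i]);  // with -fno-math-errno GCC can typically
                                        // emit a packed sqrt instruction here
    }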

+2

Because -ffast-math allows the compiler to reorder floating-point operations, which lets a lot of code be vectorized.

For example, to calculate this

    sum = a[0] + a[1] + a[2] + a[3] + a[4] + a[5] + ... + a[99]

the compiler is required to perform the additions sequentially when -ffast-math is not given, because floating-point addition is not associative.

For the same reason, compilers cannot optimize a*a*a*a*a*a to (a*a*a)*(a*a*a) without -ffast-math.
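For illustration, here is that reassociation written out by hand (a sketch; the function names are made up). Without -ffast-math GCC must keep the first version's left-to-right evaluation order, because the two forms can round differently:

    // Sketch: the reassociation GCC is not allowed to do on its own.
    static double pow6_strict(double a) {
        return a * a * a * a * a * a;  // 5 dependent multiplies without -ffast-math
    }

    static double pow6_reassoc(double a) {
        double a3 = a * a * a;         // what (a*a*a)*(a*a*a) buys you:
        return a3 * a3;                // 3 multiplies, shorter dependency chain
    }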

This means that vectorization is not available unless the target has very efficient horizontal vector adds.

However, with -ffast-math the expression may be evaluated like this (see A7. Auto-Vectorization):

    sum0 = a[0] + a[4] + a[ 8] + ... + a[96]
    sum1 = a[1] + a[5] + a[ 9] + ... + a[97]
    sum2 = a[2] + a[6] + a[10] + ... + a[98]
    sum3 = a[3] + a[7] + a[11] + ... + a[99]
    sum  = sum0 + sum1 + sum2 + sum3

Now the compiler can easily vectorize this by adding each column in parallel and then performing a horizontal add at the end.

sum == sum? Only if (a[0]+a[4]+...) + (a[1]+a[5]+...) + (a[2]+a[6]+...) + (a[3]+a[7]+...) == a[0] + a[1] + a[2] + ... That relies on associativity, which floating-point addition does not always respect. Specifying /fp:fast allows the compiler to transform your code to run faster, up to 4 times faster for this simple calculation.

Do you prefer fast or accurate? - A7. Auto-Vectorization

This can be enabled using -fassociative-math in gcc.
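A minimal sketch of the reduction this answer describes (the function name and compile line are illustrative; per the GCC docs, -fassociative-math only takes effect together with -fno-signed-zeros and -fno-trapping-math):

    // Sketch: a simple FP reduction. Compile with something like
    //   g++ -O3 -fassociative-math -fno-signed-zeros -fno-trapping-math \
    //       -fopt-info-vec sum.cc
    // to let GCC split it into independent partial sums and vectorize it
    // without enabling the full -ffast-math.
    double sum100(const double* a) {
        double sum = 0.0;
        for (int i = 0; i < 100; ++i)
            sum += a[i];   // may be reassociated into vector-wide partial sums
        return sum;
    }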


0

Source: https://habr.com/ru/post/1310043/
