Automatic vectorization with double and -ffast-math

Why is -ffast-math needed with g++ to achieve loop vectorization with doubles? I don't like -ffast-math because I don't want to lose accuracy.

+5
3 answers

You do not lose accuracy with -ffast-math. It only affects the handling of NaN, Inf, etc., and the order of operations.

If you have a specific piece of code in which you do not want GCC to reorder or simplify the computations, you can mark a variable as used by an asm statement.

For example, the following code performs a rounding operation on f. However, the two operations f += g and f -= g are likely to be optimized away by GCC:

    static double moo(double f, double g) {
        g *= 4503599627370496.0; // 2 ** 52
        f += g;
        f -= g;
        return f;
    }

On x86_64, you can use this asm statement to tell GCC not to perform this optimization:

    static double moo(double f, double g) {
        g *= 4503599627370496.0; // 2 ** 52
        f += g;
        __asm__("" : "+x" (f));
        f -= g;
        return f;
    }

Unfortunately, you will need to adapt this for each architecture. On PowerPC, use +f instead of +x.
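One way to keep this readable across targets is to hide the per-architecture constraint behind a macro. This is only a sketch under the assumptions above (x86 with SSE math, PowerPC, and a memory-operand fallback); the name FP_BARRIER is made up for illustration:

    // Sketch: wrap the per-architecture constraint in a macro so the
    // rounding trick stays readable. FP_BARRIER is a hypothetical name.
    #if defined(__x86_64__) || defined(__SSE2_MATH__)
    #  define FP_BARRIER(x) __asm__("" : "+x"(x))   // value lives in an SSE register
    #elif defined(__powerpc__)
    #  define FP_BARRIER(x) __asm__("" : "+f"(x))   // value lives in an FP register
    #else
    #  define FP_BARRIER(x) __asm__("" : "+m"(x))   // fallback: force it through memory
    #endif

    static double moo(double f, double g) {
        g *= 4503599627370496.0; // 2 ** 52
        f += g;
        FP_BARRIER(f);           // GCC may not merge f += g; f -= g; across this
        f -= g;
        return f;
    }

The empty asm body costs nothing at run time; it only tells GCC that the value of f may have changed, so the surrounding additions cannot be folded away.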

+8

Very likely because vectorization means that you may get different results, or may miss floating-point signals/exceptions.

If you compile for 32-bit x86, gcc and g++ use x87 for floating-point math by default; on 64-bit they use SSE by default. However, x87 can and will produce different values for the same computation, so g++ is unlikely to consider vectorizing if it cannot guarantee that you will get the same results, unless you use -ffast-math or some of its constituent flags.

In the end it all boils down to the fact that the floating-point environment for vectorized code can differ from the one for non-vectorized code, sometimes in ways that matter. If the differences do not matter to you, you can use something like

 -fno-math-errno -fno-trapping-math -fno-signaling-nans -fno-rounding-math 

but first read up on these options and make sure they do not affect the correctness of your program. -ffinite-math-only may also help.
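For instance (a minimal sketch; the function name, file name, and compile line are illustrative), a plain sqrt loop is typically blocked from vectorizing only because the sqrt call may set errno, so -fno-math-errno alone is often enough:

    // Sketch: elementwise loop whose vectorization is blocked by errno
    // handling, not by reassociation. Compile with something like
    //   g++ -O3 -fno-math-errno -fopt-info-vec sqrt_loop.cc
    // (-fopt-info-vec reports which loops GCC vectorized.)
    #include <cmath>
    #include <cstddef>

    void root_all(double* __restrict out, const double* __restrict in,
                  std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            out[i] = std::sqrt(in[i]);  // with -fno-math-errno GCC can typically
                                        // emit a packed sqrt instruction here
    }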

+2

Because -ffast-math allows the compiler to reorder floating-point operations, which lets a lot of code be vectorized.

For example, to calculate this

    sum = a[0] + a[1] + a[2] + a[3] + a[4] + a[5] + ... + a[99]

the compiler is required to perform the additions sequentially when -ffast-math is not given, because floating-point addition is not associative.

For the same reason, compilers cannot optimize a*a*a*a*a*a to (a*a*a)*(a*a*a) without -ffast-math.
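For illustration, here is that reassociation written out by hand (a sketch; the function names are made up). Without -ffast-math GCC must keep the first version's left-to-right evaluation order, because the two forms can round differently:

    // Sketch: the reassociation GCC is not allowed to do on its own.
    static double pow6_strict(double a) {
        return a * a * a * a * a * a;  // 5 dependent multiplies without -ffast-math
    }

    static double pow6_reassoc(double a) {
        double a3 = a * a * a;         // what (a*a*a)*(a*a*a) buys you:
        return a3 * a3;                // 3 multiplies, shorter dependency chain
    }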

This means that vectorization is not available unless the target has very efficient horizontal vector adds.

However, with -ffast-math the expression may be evaluated like this (see A7. Auto-Vectorization):

    sum0 = a[0] + a[4] + a[ 8] + ... + a[96]
    sum1 = a[1] + a[5] + a[ 9] + ... + a[97]
    sum2 = a[2] + a[6] + a[10] + ... + a[98]
    sum3 = a[3] + a[7] + a[11] + ... + a[99]
    sum  = sum0 + sum1 + sum2 + sum3

Now the compiler can easily vectorize this by adding each column in parallel and then performing a horizontal add at the end.

sum == sum? Only if (a[0]+a[4]+...) + (a[1]+a[5]+...) + (a[2]+a[6]+...) + (a[3]+a[7]+...) == a[0] + a[1] + a[2] + ... That relies on associativity, which floating-point addition does not always respect. Specifying /fp:fast allows the compiler to transform your code to run faster, up to 4 times faster for this simple calculation.

Do you prefer fast or accurate? - A7. Auto-Vectorization

This can be enabled using -fassociative-math in gcc.
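A minimal sketch of the reduction this answer describes (the function name and compile line are illustrative; per the GCC docs, -fassociative-math only takes effect together with -fno-signed-zeros and -fno-trapping-math):

    // Sketch: a simple FP reduction. Compile with something like
    //   g++ -O3 -fassociative-math -fno-signed-zeros -fno-trapping-math \
    //       -fopt-info-vec sum.cc
    // to let GCC split it into independent partial sums and vectorize it
    // without enabling the full -ffast-math.
    double sum100(const double* a) {
        double sum = 0.0;
        for (int i = 0; i < 100; ++i)
            sum += a[i];   // may be reassociated into vector-wide partial sums
        return sum;
    }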


0

Source: https://habr.com/ru/post/1310043/
