Why doesn't LLVM optimize floating point instructions?

Question

See above. I wrote two example functions:

source.ll:

  define i32 @bleh(i32 %x) {
  entry:
    %addtmp = add i32 %x, %x
    %addtmp1 = add i32 %addtmp, %x
    %addtmp2 = add i32 %addtmp1, %x
    %addtmp3 = add i32 %addtmp2, %x
    %addtmp4 = add i32 %addtmp3, 1
    %addtmp5 = add i32 %addtmp4, 2
    %addtmp6 = add i32 %addtmp5, 3
    %multmp = mul i32 %x, 3
    %addtmp7 = add i32 %addtmp6, %multmp
    ret i32 %addtmp7
  }

source-fp.ll:

  define double @bleh(double %x) {
  entry:
    %addtmp = fadd double %x, %x
    %addtmp1 = fadd double %addtmp, %x
    %addtmp2 = fadd double %addtmp1, %x
    %addtmp3 = fadd double %addtmp2, %x
    %addtmp4 = fadd double %addtmp3, 1.000000e+00
    %addtmp5 = fadd double %addtmp4, 2.000000e+00
    %addtmp6 = fadd double %addtmp5, 3.000000e+00
    %multmp = fmul double %x, 3.000000e+00
    %addtmp7 = fadd double %addtmp6, %multmp
    ret double %addtmp7
  }

Why is it that when I optimize both functions with

  opt -O3 source[-fp].ll -o opt.source[-fp].ll -S

the i32 version is optimized but the double version is not? I expected the fadd instructions to be combined into a single fmul. Instead, the output looks exactly the same as the input.

Is this because the flags are set differently? I know that certain optimizations that are possible for i32 are not valid for double, but the lack of even simple constant folding is beyond my comprehension.

I am using LLVM 3.1.

compiler-optimization floating-point llvm

f00id Aug 13 '12 at 21:23

1 answer

Stephen Canon · Accepted Answer · 2012-08-13T21:56:30+0000

It is not quite true to say that no optimization is possible. I will go through the first few lines to show where transformations are and are not allowed:

  %addtmp = fadd double %x, %x

This first line could safely be transformed to fmul double %x, 2.0e+0, but that is not actually an optimization on most architectures (fadd is generally as fast as or faster than fmul, and does not require materializing the constant 2.0). Note that, barring overflow, this operation is exact (like all scaling by powers of two).

  %addtmp1 = fadd double %addtmp, %x

This line could be transformed to fmul double %x, 3.0e+0. Why is this a legal transformation? Because the computation that produced %addtmp was exact, only a single rounding is incurred whether the value is computed as x * 3 or as x + x + x. Because these are IEEE-754 basic operations, and therefore correctly rounded, the result is the same either way. What about overflow? Neither can overflow unless the other does as well.
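The equivalence described above can be spot-checked empirically. The following is a minimal Python sketch, assuming Python floats are IEEE-754 doubles evaluated in the default round-to-nearest mode (as on mainstream CPython builds). The helper names `random_finite_double` and `count_mismatches` are made up for this illustration and are not part of LLVM or any library.

```python
import math
import random
import struct


def random_finite_double(rng):
    """Return a random finite double built from 64 random bits (hypothetical helper)."""
    while True:
        bits = rng.getrandbits(64)
        (value,) = struct.unpack("<d", struct.pack("<Q", bits))
        if math.isfinite(value):
            return value


def count_mismatches(trials, seed=0):
    """Count samples where x + x != 2 * x or x + x + x != 3 * x."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(trials):
        x = random_finite_double(rng)
        doubled = x + x        # exact unless it overflows to infinity
        tripled = doubled + x  # the only step in x + x + x that can round
        if doubled != 2.0 * x or tripled != 3.0 * x:
            mismatches += 1
    return mismatches


print(count_mismatches(100_000))
```

Under the stated assumptions the count should be zero, including for samples that overflow, since both forms then produce the same infinity. A random test is only evidence, not a proof; the argument in the text is what justifies the transformation.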

  %addtmp2 = fadd double %addtmp1, %x

This is the first line that cannot legally be transformed into constant * x. 4 * x would be computed exactly, without any rounding, whereas x + x + x + x incurs two roundings: x + x + x is rounded once, and then adding x may round a second time.

  %addtmp3 = fadd double %addtmp2, %x

The same applies here: 5 * x would incur one rounding, while x + x + x + x + x incurs three.
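The rounding counts in the two paragraphs above can be reproduced with exact rational arithmetic. This is a small Python sketch, again assuming IEEE-754 doubles with round-to-nearest; `count_inexact_additions` and `product_is_exact` are illustrative names, and `fractions.Fraction` is used only to obtain the mathematically exact value of each step for comparison.

```python
from fractions import Fraction


def count_inexact_additions(x, terms):
    """Count the additions that round when summing `terms` copies of x left to right."""
    total = x
    inexact = 0
    for _ in range(terms - 1):
        exact_sum = Fraction(total) + Fraction(x)  # exact rational result
        total = total + x                          # rounded double addition
        if Fraction(total) != exact_sum:
            inexact += 1
    return inexact


def product_is_exact(x, k):
    """Return True if the double product k * x equals the exact product."""
    return Fraction(k * x) == Fraction(k) * Fraction(x)


x = 0.1
for n in (2, 3, 4, 5):
    print(n, count_inexact_additions(x, n), product_is_exact(x, float(n)))
```

For x = 0.1, the repeated sum rounds twice with four terms and three times with five, while 4 * x is exact and 5 * x rounds once. The sketch counts rounding steps only; it does not by itself show whether the final results of the two forms differ for a given input.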

The only line that might be beneficially transformed is the one where x + x + x could be replaced with 3 * x. However, the subexpression x + x is already present elsewhere, so an optimizer could easily choose not to apply this transformation (since it can take advantage of the existing partial result if it does not).
