Why doesn't LLVM optimize floating point instructions?

I wrote two example functions, one using i32 and one using double:

source.ll:

    define i32 @bleh(i32 %x) {
    entry:
      %addtmp = add i32 %x, %x
      %addtmp1 = add i32 %addtmp, %x
      %addtmp2 = add i32 %addtmp1, %x
      %addtmp3 = add i32 %addtmp2, %x
      %addtmp4 = add i32 %addtmp3, 1
      %addtmp5 = add i32 %addtmp4, 2
      %addtmp6 = add i32 %addtmp5, 3
      %multmp = mul i32 %x, 3
      %addtmp7 = add i32 %addtmp6, %multmp
      ret i32 %addtmp7
    }
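The integer version can be simplified because LLVM's add and mul without the nsw/nuw flags are wrapping (modulo 2^32) operations, which are associative and commutative, so the whole chain is equivalent to 8 * x + 6. The snippet below is a hedged sketch, not the actual opt output: a Python model (bleh_i32 and bleh_i32_folded are hypothetical names) that emulates the wraparound by masking and checks the identity on sample inputs, including ones where intermediate sums overflow.

```python
# Hypothetical model of source.ll: plain i32 add/mul wrap modulo 2**32,
# which we emulate by masking after every operation.
MASK = 0xFFFFFFFF

def bleh_i32(x: int) -> int:
    """Step-by-step transliteration of source.ll, wrapping after every op."""
    addtmp = (x + x) & MASK
    addtmp1 = (addtmp + x) & MASK
    addtmp2 = (addtmp1 + x) & MASK
    addtmp3 = (addtmp2 + x) & MASK
    addtmp4 = (addtmp3 + 1) & MASK
    addtmp5 = (addtmp4 + 2) & MASK
    addtmp6 = (addtmp5 + 3) & MASK
    multmp = (x * 3) & MASK
    return (addtmp6 + multmp) & MASK

def bleh_i32_folded(x: int) -> int:
    """The fully reassociated form 8*x + 6, also computed modulo 2**32."""
    return (8 * x + 6) & MASK

# Inputs are unsigned 32-bit bit patterns; the edge cases force wraparound.
samples = [0, 1, 2, 12345, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
samples += list(range(0, 1 << 32, 999_983))
mismatches = [x for x in samples if bleh_i32(x) != bleh_i32_folded(x)]
print(len(mismatches))  # 0
```

Because modular arithmetic forms a ring, the rewrite is exact even when intermediate values overflow, which is what makes integer reassociation safe.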

source-fp.ll:

    define double @bleh(double %x) {
    entry:
      %addtmp = fadd double %x, %x
      %addtmp1 = fadd double %addtmp, %x
      %addtmp2 = fadd double %addtmp1, %x
      %addtmp3 = fadd double %addtmp2, %x
      %addtmp4 = fadd double %addtmp3, 1.000000e+00
      %addtmp5 = fadd double %addtmp4, 2.000000e+00
      %addtmp6 = fadd double %addtmp5, 3.000000e+00
      %multmp = fmul double %x, 3.000000e+00
      %addtmp7 = fadd double %addtmp6, %multmp
      ret double %addtmp7
    }
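For doubles, folding the whole chain into 8 * x + 6 is not a value-preserving rewrite. A minimal Python sketch, assuming Python's float is an IEEE-754 double with round-to-nearest-even (true on mainstream platforms); bleh_f64 and bleh_f64_folded are hypothetical names, and x = 2^51 is simply one input where the two disagree:

```python
# Hypothetical transliteration of source-fp.ll. Each Python float operation
# rounds once to the nearest double, like each fadd/fmul in the IR.
def bleh_f64(x: float) -> float:
    addtmp = x + x
    addtmp1 = addtmp + x
    addtmp2 = addtmp1 + x
    addtmp3 = addtmp2 + x
    addtmp4 = addtmp3 + 1.0
    addtmp5 = addtmp4 + 2.0
    addtmp6 = addtmp5 + 3.0
    multmp = x * 3.0
    return addtmp6 + multmp

def bleh_f64_folded(x: float) -> float:
    """What an integer-style rewrite would produce: 8*x + 6."""
    return 8.0 * x + 6.0

x = 2.0 ** 51
print(int(bleh_f64(x)))         # 18014398509481988
print(int(bleh_f64_folded(x)))  # 18014398509481992
```

The results are printed as int to show the exact values; at this magnitude consecutive doubles are 2 to 4 apart, so the small constants are partly lost to rounding in the original chain.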

I optimized both functions with

    opt -O3 source[-fp].ll -o opt.source[-fp].ll -S

Why is the i32 version optimized but not the double one? I expected the fadds to be combined into a single fmul. Instead, the output looks exactly the same as the input.

Is this because some flags are set differently? I know that certain optimizations that are possible for i32 are not performed for double. But the lack of even simple constant folding is beyond my comprehension.

I am using LLVM 3.1.

Answer:

It is not true that no optimization is possible here. I'll go through the first few lines to show where transformations are and are not allowed:

  %addtmp = fadd double %x, %x 

This first line could safely be transformed into fmul double %x, 2.0e+0, but that is not actually an optimization on most architectures (fadd is generally as fast as or faster than fmul, and it does not require materializing the constant 2.0). Note that, barring overflow, this operation is exact (like all scaling by powers of two).
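The exactness claim can be spot-checked with exact rational arithmetic. This sketch (random_double and doubling_is_exact are hypothetical helpers; it assumes Python's float is an IEEE-754 double) verifies on random bit patterns that x + x equals 2x exactly, or that it overflows to infinity just as 2.0 * x does:

```python
import math
import random
import struct
from fractions import Fraction

def random_double(rng: random.Random) -> float:
    """Reinterpret 64 random bits as a double (covers subnormals and huge values)."""
    return struct.unpack("<d", struct.pack("<Q", rng.getrandbits(64)))[0]

def doubling_is_exact(x: float) -> bool:
    """True if x + x equals 2*x exactly, or if both overflow to infinity."""
    s = x + x
    if math.isinf(s):
        return math.isinf(2.0 * x)  # overflow case: both sides are inf
    return Fraction(s) == 2 * Fraction(x)

rng = random.Random(0)
samples = [random_double(rng) for _ in range(10_000)]
samples = [x for x in samples if math.isfinite(x)]  # drop inf/NaN bit patterns
print(all(doubling_is_exact(x) for x in samples))  # True
```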

  %addtmp1 = fadd double %addtmp, %x 

This line could be transformed into fmul double %x, 3.0e+0. Why is this a legal transformation? Because the computation that produced %addtmp was exact, only a single rounding occurs whether the value is computed as x * 3 or as x + x + x. Since both are basic IEEE-754 operations and are therefore correctly rounded, the result is the same. What about overflow? Neither can overflow unless the other does as well.
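The argument can also be spot-checked numerically. This sketch (random sampling, not a proof; helper names are hypothetical, and Python's float is assumed to be an IEEE-754 double) compares (x + x) + x with x * 3.0 over random bit patterns, which include NaN, infinities and subnormals, plus a few values near the overflow threshold:

```python
import math
import random
import struct

def random_double(rng: random.Random) -> float:
    """Reinterpret 64 random bits as a double, covering the whole exponent range."""
    return struct.unpack("<d", struct.pack("<Q", rng.getrandbits(64)))[0]

def same(a: float, b: float) -> bool:
    """Equality that also treats two NaNs as matching."""
    return a == b or (math.isnan(a) and math.isnan(b))

rng = random.Random(42)
samples = [random_double(rng) for _ in range(100_000)]
# Values near the overflow threshold and at the bottom of the subnormal range.
samples += [1.7976931348623157e308, 1e308 / 1.5, 6e307, 5e-324, 2.2250738585072014e-308]
samples += [rng.uniform(-1e6, 1e6) for _ in range(100_000)]

# x + x is exact, so both sides round the exact value 3x exactly once.
mismatches = [x for x in samples if not same((x + x) + x, x * 3.0)]
print(len(mismatches))  # 0
```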

  %addtmp2 = fadd double %addtmp1, %x 

This is the first line that cannot legally be transformed into constant * x. 4 * x would be computed exactly, without any rounding, whereas x + x + x + x incurs two roundings: x + x + x is rounded once, and adding x may round a second time.
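The two ingredients of this argument, that x + x + x can already be rounded while 4 * x is exact, can be checked with Python's fractions.Fraction, which compares a computed double against the exact mathematical value. A minimal sketch; x = 0.1 is only an illustrative choice of a value whose exact triple is not representable as a double:

```python
from fractions import Fraction

x = 0.1  # illustrative value: 3 * x (exactly) is not representable as a double

three_x = x + x + x  # x + x is exact, then one rounding when x is added
four_x = 4.0 * x     # scaling by a power of two: exact barring overflow/underflow

print(Fraction(three_x) == 3 * Fraction(x))  # False: x + x + x was rounded
print(Fraction(four_x) == 4 * Fraction(x))   # True: 4 * x was not
```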

  %addtmp3 = fadd double %addtmp2, %x 

The same applies here: 5 * x would incur one rounding, whereas x + x + x + x + x incurs three.

The only line that might profitably be transformed is the one computing x + x + x, which could be replaced by 3 * x. However, the subexpression x + x is already computed (as %addtmp), so an optimizer could easily choose not to apply this transformation: if it leaves the code alone, it can reuse that existing partial result.
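Rounding is also why the constant additions from the question are not folded: floating-point addition is not associative, so ((y + 1) + 2) + 3 need not equal y + 6. A minimal Python sketch, assuming IEEE-754 doubles with round-to-nearest-even; y = 2^53 is an illustrative value, not taken from the question:

```python
# Hypothetical illustration of why fadds of 1.0, 2.0 and 3.0 cannot simply
# be folded into a single fadd of 6.0.
y = 2.0 ** 53  # from 2**53 up to 2**54, consecutive doubles are 2 apart

chained = ((y + 1.0) + 2.0) + 3.0  # three separately rounded additions
folded = y + 6.0                   # one addition of the pre-folded constant

print(int(chained))  # 9007199254740996
print(int(folded))   # 9007199254740998
```

Here y + 1 is a tie and rounds back to y, and the final + 3 is another tie that rounds down, so the chained sum loses 2 compared with the folded one.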


Source: https://habr.com/ru/post/957023/

