Why doesn't the compiler optimize a floating-point multiplication by 2 into an increment of the exponent?

I have often noticed gcc converting multiplications into shifts in the executable. Something similar might happen when multiplying an int and a float. For example, 2 * f might simply increment the exponent of f by 1, saving some cycles. Do compilers, perhaps if one asks them to (e.g. via -ffast-math), generally do this?

Are compilers generally smart enough to do this, or do I need to do it myself using the scalb*()/ldexp()/frexp() family of functions?
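(For illustration only, not part of the original question: a minimal C sketch of the two alternatives, a plain multiplication and an explicit call into the ldexp() family. The function names are made up.)

    #include <math.h>

    /* Plain multiplication: what one hopes the compiler might turn
       into an exponent increment. */
    float doubled_mul(float f) {
        return 2.0f * f;
    }

    /* Explicit exponent manipulation via the standard library:
       ldexpf(f, 1) computes f * 2^1. */
    float doubled_ldexp(float f) {
        return ldexpf(f, 1);
    }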

+46
c++ performance optimization c compiler-optimization
Oct 16 '12 at 16:23
8 answers

For example, 2 * f might simply increment the exponent of f by 1, saving some cycles.

This is simply not true.

First, you have too many corner cases, such as zero, infinity, NaN, and denormals. Then you have the performance issue.

The misunderstanding here is the assumption that incrementing the exponent is faster than performing the multiplication. It is not.

If you look at the hardware instructions, there is no direct way to increment the exponent. So what you would need to do instead is:

  • Bitwise-convert the float to an integer.
  • Increment the exponent field.
  • Bitwise-convert back to floating point.

There is generally a medium-to-large latency for moving data between the integer and floating-point execution units. So in the end, this "optimization" becomes much worse than a simple floating-point multiply.
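For concreteness, here is a minimal C sketch (not from the original answer) of those three steps for a 32-bit float. It silently mishandles zero, subnormals, infinities, and NaNs, and the int/float round trip is exactly where the cost hides:

    #include <stdint.h>
    #include <string.h>

    float times_two_by_exponent(float f) {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);   /* 1. bitwise-convert to integer         */
        bits += UINT32_C(1) << 23;        /* 2. increment the 8-bit exponent field */
        memcpy(&f, &bits, sizeof f);      /* 3. bitwise-convert back to float      */
        return f;
    }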

So the reason the compiler does not do this "optimization" is that it is not faster.

+81
Oct 16 '12 at 16:33

On modern processors, multiplication typically has a throughput of one per cycle and low latency. If the value is already in a floating-point register, there is no way you will beat that by juggling it around to do integer arithmetic on the representation. If it is in memory to begin with, and if you are assuming that neither the current value nor the correct result could be zero, denormal, NaN, or infinity, then it might be faster to do something like

 addl $0x100000, 4(%eax) # x86 asm example 

to multiply by two. The only time I could see this being useful is if you are working with a whole array of floating-point data that is bounded away from zero and infinity, and scaling by a power of two is the only operation you will be performing (so you have no existing reason to load the data into floating-point registers).
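Here is a hedged C sketch of that scenario (function name assumed): scaling a whole array of doubles in memory by adding to the exponent word, the way the addl above does. It assumes a little-endian layout and that every element and every result is a normal, nonzero, finite number; otherwise the results are wrong:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    void scale_array_by_two(double *a, size_t n) {
        for (size_t i = 0; i < n; i++) {
            uint32_t hi;
            memcpy(&hi, (char *)&a[i] + 4, sizeof hi);   /* high word holds the exponent    */
            hi += UINT32_C(0x100000);                    /* same as addl $0x100000, 4(%eax) */
            memcpy((char *)&a[i] + 4, &hi, sizeof hi);
        }
    }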

+20
Oct 16

Common floating-point formats, in particular IEEE 754, do not store the exponent as a simple integer, and treating it as an integer will not produce correct results.

In a 32-bit float or a 64-bit double, the exponent field is 8 or 11 bits, respectively. The exponent codes 1 to 254 (in float) or 1 to 2046 (in double) do act like integers: if you add one to one of these values and the result is one of these values, then the represented value doubles. However, adding one fails in these situations:

  • The initial value is zero or subnormal. In this case, the exponent field starts at zero, and adding one to it adds 2^-126 (in float) or 2^-1022 (in double) to the number; it does not double the number.
  • The initial value is 2^127 or more (in float) or 2^1023 or more (in double). In this case, the exponent field starts at 254 or 2046, and adding one to it turns the number into an infinity or a NaN; it does not double the number.
  • The initial value is an infinity or a NaN. In this case, the exponent field starts at 255 or 2047, and adding one to it wraps it around to zero (and likely carries into the sign bit). The result is a zero or subnormal number, but it should be an infinity or a NaN, respectively.

(The above applies to positive signs. The situation is symmetrical with negative signs.)

As others have noted, some processors do not have facilities for manipulating the bits of floating-point values quickly. Even on those that do, the exponent field is not isolated from the other bits, so you generally cannot add one to it without risking a carry into the sign bit, as in the last case above.

While some applications can tolerate shortcuts such as neglecting subnormals, NaNs, or even infinities, it is rare that an application can neglect zero. Since adding one to the exponent fails to handle zero correctly, it is not usable.
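The failure cases are easy to see with a small test program (added here for illustration, not from the original answer): naively add one to the float exponent field and compare with a real multiplication by 2:

    #include <float.h>
    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Add 1 to the exponent field of a float, ignoring all corner cases. */
    static float bump_exponent(float f) {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        bits += UINT32_C(1) << 23;
        memcpy(&f, &bits, sizeof f);
        return f;
    }

    int main(void) {
        float cases[] = { 0.0f, FLT_MIN / 4.0f, 1.5f, FLT_MAX, INFINITY };
        for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
            float x = cases[i];
            printf("x = %g: bumped = %g, 2*x = %g\n",
                   (double)x, (double)bump_exponent(x), (double)(2.0f * x));
        }
        return 0;   /* zero, the subnormal, FLT_MAX, and INFINITY all come out wrong */
    }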

+17
Oct 16 '12 at 17:00

This is not about compilers being smart or not smart. It is more about conforming to the standard and producing all the necessary "side effects" such as Infs, NaNs, and denormals.

It may also be about not producing other side effects that are not called for, such as reading memory. But I admit it can be faster in some cases.

+9
Oct 16 '12 at 16:26

Actually, this is what happens in the hardware.

The 2 is also passed to the FPU as a floating-point number, with a mantissa of 1.0 and an exponent of 2^1. For the multiplication, the exponents are added and the mantissas are multiplied.

Given that there is dedicated hardware to handle the general case (multiplication by values other than two), and that the special case is handled no worse by that hardware than dedicated circuitry would handle it, there is no point in having additional circuits and instructions.
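The same idea can be mimicked in portable C with frexp()/ldexp() (an illustration added here, not part of the original answer). frexp() normalizes the mantissa into [0.5, 1), so the exponents differ from the "mantissa 1.0, exponent 2^1" wording above, but the add-the-exponents, multiply-the-mantissas principle is the same:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        int ea, eb;
        double ma = frexp(3.0, &ea);               /* 3.0 = 0.75 * 2^2 -> ma = 0.75, ea = 2 */
        double mb = frexp(2.0, &eb);               /* 2.0 = 0.5  * 2^2 -> mb = 0.5,  eb = 2 */
        double product = ldexp(ma * mb, ea + eb);  /* 0.375 * 2^4 = 6.0 */
        printf("%g\n", product);                   /* prints 6 */
        return 0;
    }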

+3
Oct 17 '12 at 6:14

It may be useful for embedded-system compilers to have a special scale-by-power-of-two pseudo-op, which the code generator could translate in whatever way is optimal for the machine in question, since on some embedded processors working on the exponent can be an order of magnitude faster than performing a full multiplication by two. On the embedded micros where multiplication is slowest, however, a compiler could probably achieve a bigger performance boost by having its floating-point routines check their arguments at run time and skip over the right-hand part of the mantissa when it is zero.
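A hypothetical sketch of such a pseudo-op (the name is made up; it is not a real intrinsic): the compiler, or the programmer, could map it to ldexpf(), to a plain multiply, or to a direct exponent adjustment, whichever is cheapest on the target:

    #include <math.h>

    /* Hypothetical scale-by-power-of-two helper; ldexpf() already expresses
       the intent, and a code generator is free to lower it however it likes. */
    static inline float scale_by_pow2(float x, int k) {
        return ldexpf(x, k);        /* x * 2^k */
    }

    /* Example: halve a sample, i.e. multiply by 2^-1. */
    /* float y = scale_by_pow2(sample, -1); */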

0
Oct 17

A previous question asked about a shortcut for multiplication by powers of 2. The consensus, and the actual implementations, showed that, unfortunately, there is no current way to be more efficient than a standard multiplication.

0
Nov 14 '12 at 9:12

If you think that multiplying by two just means incrementing the exponent by 1, think again. Here are the possible cases to consider for IEEE 754 floating-point arithmetic:

Case 1: Infinity and NaN remain unchanged.

Case 2: Floating-point numbers with the largest possible finite exponent are changed to infinity: the exponent is incremented and the mantissa is set to zero (the sign bit is unchanged).

Case 3: Normalized floating-point numbers with an exponent less than the largest possible exponent have their exponent incremented by one. Yippee!!!

Case 4: Denormalized floating-point numbers with the highest mantissa bit set have their exponent incremented by one and their mantissa shifted left by one bit, turning them into normalized numbers.

Case 5: Denormalized floating-point numbers with the highest mantissa bit cleared, including +0 and -0, have their mantissa shifted to the left by one bit position, leaving the exponent unchanged.

I seriously doubt that a compiler producing integer code that handles all of these cases correctly would be anywhere near as fast as the floating-point hardware built into the processor. And all of this is only good for multiplying by 2.0; for multiplying by 4.0 or 0.5, a whole new set of rules applies. For the case of multiplication by 2.0, you can try replacing x * 2.0 with x + x, and many compilers do this. They do it because the processor can, for example, do one addition and one multiplication at the same time, but not two of the same kind. So sometimes you would prefer x * 2.0 and sometimes x + x, depending on what other operations you need to perform at the same time.
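An illustration of the x * 2.0 versus x + x point (added here, not from the original answer): both forms are exact for every input, so an optimizing compiler is free to pick whichever fits the surrounding instruction mix, and will typically emit the same instruction for both of these:

    float twice_mul(float x) { return x * 2.0f; }
    float twice_add(float x) { return x + x; }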

0
Mar 11 '15 at 17:12


