I assume that your interest lies in the implementation of exponential functions, which can be found in standard mathematical libraries for HLL, in particular, C / C ++. These include the functions exp() , exp2() , exp10() and pow() , as well as analogues with the same precision expf() , exp2f() , exp10f() and powf() .
The exponentiation methods that you are talking about (e.g. k-ary, sliding window) are commonly used in cryptographic algorithms such as RSA, which are based on exponentiality. They are usually not used for exponential functions provided through math.h or cmath . The implementation details of standard mathematical functions, such as exp() , are different, but the general scheme follows a three-step process:
- reduction of function argument to primary approximation interval
- approximation of a suitable basic function on the interval of primary approximation
- mapping the result for the primary interval to the entire range of the function
An auxiliary step is often handling special cases. They can refer to special mathematical situations, such as log(0.0) , or to special floating point operands, such as NaN (Not a Number).
The C99 code for expf(float) below shows an example of how these steps look for a specific example. The argument a first partitioned so that exp(a) = e r * 2 i where i is an integer and r is in [log (sqrt (0.5), log (sqrt (2.0)], the interval of the primary approximation. In the second step we approximate e r by a polynomial. Such approximations can be constructed in accordance with various design criteria, such as minimizing the absolute or relative error. The polynomial can be estimated in different ways, including the Horner scheme and the Estrin scheme.
The code below uses a very general approach, using the minimax approximation, which minimizes the maximum error in the entire approximation interval. The standard algorithm for calculating such approximations is the Remez algorithm. The assessment is carried out according to the Horner scheme; The numerical accuracy of this estimate is enhanced by using fmaf() .
This standard mathematical function implements what is known as smooth multiple addition or FMA. This computes a*b+c using the full product a*b while adding and applying one rounding at the end. On most modern hardware, such as GPUs, IBM Power processors, recent x86 processors (e.g. Haswell), recent ARM processors (as an optional extension), this is a direct reference to the hardware instruction. On platforms that lack such an instruction, fmaf() will display a rather slow emulation code, in which case we do not want to use it if we are interested in performance.
The final calculation is 2 i multiplication for which C and C ++ provide the ldexp() function. Industrial Strength library code typically uses machine-specific features that use IEEE-754 binary arithmetic for float . Finally, the code cleans up overflow and overflow cases.
The x90 FPU inside x86 processors has an F2XM1 that computes 2 x -1 at [-1.1]. This can be used for the second step of calculating exp() and exp2() . There is an FSCALE that is used to multiply by 2 i in the third step. A common way to implement F2XM1 is microcode that uses a rational or polynomial approximation. Please note that the x90 FPU is currently supported mainly for supporting older versions. Modern x86 platform platforms typically use pure SSE-based software implementations and algorithms similar to those shown below. Some combine small tables with polynomial approximations.
pow(x,y) can be conceptually implemented as exp(y*log(x)) , but this suffers from a significant loss of accuracy when x is near unity and y is large, as well as incorrect handling of numerous special cases specified in C / C ++ standards. One way around the accuracy problem is to compute log(x) and the product y*log(x)) in some form of extended precision. The details would fill out a complete, long, separate answer, and I don't have a code to demonstrate it. In various mathematical libraries, C / C ++ pow(double,int) and powf(float, int) are calculated using a separate code path that uses the square-and-multiply method with bit scanning of a binary representation of an integer exponent.