Which exponentiation algorithms do CPUs / programming languages use?

I learned about fast exponentiation algorithms (k-ary, sliding window, etc.) and wondered which of them are used by processors / programming languages. (I am unclear on whether this happens in the CPU or in the compiler/library.)

And just for kicks: which one is the fastest?

Edit regarding the breadth of the question: it is intentionally broad, because I know there are many different tricks for this. The accepted answer had what I was looking for.

1 answer

I assume your interest is in the implementation of the exponential functions found in the standard math libraries of high-level languages, in particular C/C++. These include the functions exp() , exp2() , exp10() and pow() , as well as their single-precision counterparts expf() , exp2f() , exp10f() and powf() .

The exponentiation methods you mention (e.g. k-ary, sliding window) are commonly used in cryptographic algorithms such as RSA, which is based on modular exponentiation (a small sketch appears below). They are usually not used for the exponential functions provided via math.h or cmath . The implementation details of standard math functions such as exp() vary, but the general scheme follows a three-step process:

  • reduction of the function argument to a primary approximation interval
  • approximation of a suitable base function on the primary approximation interval
  • mapping the result for the primary interval back to the entire range of the function

An auxiliary step is often the handling of special cases. These can pertain to special mathematical situations, such as log(0.0) , or to special floating-point operands, such as NaN (Not a Number).
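As an aside, since the question asks about these methods: the window-based techniques all build on binary square-and-multiply. Below is a minimal sketch (my own, not from any particular crypto library) of left-to-right modular exponentiation of the kind used in RSA; it assumes the modulus fits in 32 bits so the 64-bit intermediate products cannot overflow. k-ary and sliding-window variants consume several exponent bits per iteration, using a table of precomputed powers.

 #include <stdint.h>

 /* Compute (base**e) mod m by scanning exponent bits from the top:
    square for every bit, multiply in the base when the bit is set. */
 uint64_t modexp (uint64_t base, uint64_t e, uint64_t m)
 {
     uint64_t r = 1 % m; /* handles m == 1 */
     base %= m;
     for (int i = 63; i >= 0; i--) {
         r = (r * r) % m;
         if ((e >> i) & 1) r = (r * base) % m;
     }
     return r;
 }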

The C99 code for expf(float) below shows what these steps look like for a concrete example. The argument a is first split such that exp(a) = e^r * 2^i, where i is an integer and r lies in [log(sqrt(0.5)), log(sqrt(2.0))], the primary approximation interval. In the second step, we approximate e^r with a polynomial. Such approximations can be constructed according to various design criteria, such as minimizing the absolute or the relative error. The polynomial can be evaluated in various ways, including Horner's scheme and Estrin's scheme.

The code below takes a very general approach, using a minimax approximation, which minimizes the maximum error over the entire approximation interval. The standard algorithm for computing such approximations is the Remez algorithm. The evaluation uses Horner's scheme; the numerical accuracy of this evaluation is enhanced by the use of fmaf() .
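To make the difference concrete, here is a minimal sketch (function and coefficient names are mine, not taken from the code below) of a degree-3 polynomial evaluated both ways: Horner's scheme is a serial chain of FMAs, while Estrin's scheme spends one extra multiplication to shorten the dependency chain, which can pay off on superscalar hardware.

 #include <math.h>

 /* Evaluate c3*x**3 + c2*x**2 + c1*x + c0 as a serial FMA chain. */
 float poly_horner (float x, float c3, float c2, float c1, float c0)
 {
     return fmaf (fmaf (fmaf (c3, x, c2), x, c1), x, c0);
 }

 /* Same polynomial: the two halves are independent and can execute in
    parallel, at the cost of computing x*x up front. */
 float poly_estrin (float x, float c3, float c2, float c1, float c0)
 {
     float x2 = x * x;
     return fmaf (fmaf (c3, x, c2), x2, fmaf (c1, x, c0));
 }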

This standard math function implements what is known as fused multiply-add, or FMA. It computes a*b+c using the full, unrounded product a*b in the addition, with a single rounding applied at the end. On most modern hardware, such as GPUs, IBM Power CPUs, recent x86 CPUs (e.g. Haswell), and recent ARM CPUs (as an optional extension), this maps directly to a hardware instruction. On platforms that lack such an instruction, fmaf() maps to fairly slow emulation code, in which case we would not want to use it if we are interested in performance.
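A small demonstration of the single rounding (the specific values are my choice, not part of the answer's code): with a = b = 1 + 2^-13, the exact product is 1 + 2^-12 + 2^-26; the separate multiply rounds the 2^-26 term away, while fmaf() carries it into the addition.

 #include <math.h>
 #include <stdio.h>

 int main (void)
 {
     float a = 0x1.0008p+0f;  /* 1 + 2**(-13) */
     float b = a;
     float c = -1.0f;
     /* separate multiply and add: two roundings, low product bits lost */
     printf ("a*b+c       = %a\n", a * b + c);      /* 0x1p-12 */
     /* fused multiply-add: full product enters the addition, one rounding */
     printf ("fmaf(a,b,c) = %a\n", fmaf (a, b, c)); /* 0x1.0004p-12 */
     return 0;
 }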

The final computation is the multiplication by 2^i, for which C and C++ provide the function ldexp() (here, the float variant ldexpf() ). "Industrial-strength" library code typically uses machine-specific idioms instead, which exploit the IEEE-754 binary representation of float . Finally, the code handles overflow and underflow cases.
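As one example of such a machine-specific idiom (my sketch, not necessarily what any given library does): when the scale factor 2**i is known to stay in the normal range, ldexpf() can be replaced by assembling the bit pattern of 2**i directly in the exponent field of an IEEE-754 float.

 #include <stdint.h>
 #include <string.h>

 /* Hypothetical fast scaling a * 2**i, assuming -126 <= i <= 127 so that
    2**i is a normal float. Skips ldexpf()'s subnormal/overflow handling. */
 static float scale_by_pow2 (float a, int i)
 {
     uint32_t bits = (uint32_t)(i + 127) << 23; /* biased exponent of 2**i */
     float scale;
     memcpy (&scale, &bits, sizeof scale);      /* type-pun without UB */
     return a * scale;
 }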

The x87 FPU inside x86 processors has an F2XM1 instruction that computes 2^x - 1 on [-1, +1]. This can be used for the second step of computing exp() and exp2() . There is an FSCALE instruction for the multiplication by 2^i in the third step. A common way to implement F2XM1 is microcode based on a rational or polynomial approximation. Note that the x87 FPU is nowadays retained mainly for legacy support. Modern x86 platforms typically use pure software implementations based on SSE, with algorithms similar to the one shown below. Some combine small tables with polynomial approximations.
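For the curious, here is a sketch of how F2XM1 might be invoked from C; this assumes GCC or Clang on x86 with GNU extended asm, and is illustrative rather than production code.

 /* 2**x for x in [-1, +1] via the x87 F2XM1 instruction. The "t"
    constraint denotes the top of the x87 register stack. */
 static long double exp2_via_f2xm1 (long double x)
 {
     long double r;
     __asm__ ("f2xm1" : "=t" (r) : "0" (x)); /* r = 2**x - 1 */
     return r + 1.0L;
 }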

pow(x,y) can conceptually be implemented as exp(y*log(x)) , but this suffers from a significant loss of accuracy when x is near unity and y is large in magnitude, as well as from incorrect handling of the numerous special cases specified in the C/C++ standards. One way around the accuracy problem is to compute log(x) and the product y*log(x) in some form of extended precision. The details would fill a complete, lengthy, separate answer, and I do not have code at hand to demonstrate it. In various C/C++ math libraries, pow(double, int) and powf(float, int) are computed via a separate code path that applies the square-and-multiply method while scanning the bits of the binary representation of the integer exponent.
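A minimal sketch of that code path (the function name is mine; real library versions add the special-case handling required by the standards): scan the exponent's bits from least to most significant, squaring at each step and multiplying when a bit is set.

 /* pow(float, int) via binary square-and-multiply. Negative exponents are
    handled by inverting at the end; unsigned negation also covers INT_MIN. */
 float powf_int (float x, int n)
 {
     unsigned int e = (n < 0) ? (0u - (unsigned int)n) : (unsigned int)n;
     float r = 1.0f;
     while (e) {
         if (e & 1u) r *= x; /* multiply when the current bit is set */
         x *= x;             /* square for the next bit position */
         e >>= 1;
     }
     return (n < 0) ? (1.0f / r) : r;
 }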

 #include <math.h> /* import fmaf(), ldexpf() */

 /* Like rintf(), but -0.0f -> +0.0f, and |a| must be < 2**22 */
 float quick_and_dirty_rintf (float a)
 {
     float cvt_magic = 0x1.800000p+23f;
     return (a + cvt_magic) - cvt_magic;
 }

 /* Approximate exp(a) on the interval [log(sqrt(0.5)), log(sqrt(2.0))].
    Maximum ulp error = 0.70951 */
 float expf_poly (float a)
 {
     float r;
     r =             0x1.6ab95ep-10f;
     r = fmaf (r, a, 0x1.126d28p-07f);
     r = fmaf (r, a, 0x1.5558f8p-05f);
     r = fmaf (r, a, 0x1.55543ap-03f);
     r = fmaf (r, a, 0x1.fffffap-02f);
     r = fmaf (r, a, 0x1.000000p+00f);
     r = fmaf (r, a, 0x1.000000p+00f);
     return r;
 }

 /* Approximate exp2() on interval [-0.5,+0.5].
    Maximum ulp error = 0.79961 */
 float exp2f_poly (float a)
 {
     float r;
     r =             0x1.4308f2p-13f;
     r = fmaf (r, a, 0x1.5f0722p-10f);
     r = fmaf (r, a, 0x1.3b2bd4p-07f);
     r = fmaf (r, a, 0x1.c6af80p-05f);
     r = fmaf (r, a, 0x1.ebfbdep-03f);
     r = fmaf (r, a, 0x1.62e430p-01f);
     r = fmaf (r, a, 0x1.000000p+00f);
     return r;
 }

 /* Approximate exp10(a) on [log(sqrt(0.5))/log(10), log(sqrt(2.0))/log(10)].
    Maximum ulp error = 0.77879 */
 float exp10f_poly (float a)
 {
     float r;
     r =             0x1.a65694p-3f;
     r = fmaf (r, a, 0x1.158a00p-1f);
     r = fmaf (r, a, 0x1.2bda9ap+0f);
     r = fmaf (r, a, 0x1.046f72p+1f);
     r = fmaf (r, a, 0x1.535248p+1f);
     r = fmaf (r, a, 0x1.26bb1cp+1f);
     r = fmaf (r, a, 0x1.000000p+0f);
     return r;
 }

 /* Compute exponential base e. Maximum ulp error = 0.88790 */
 float my_expf (float a)
 {
     float t, r;
     int i;
     t = a * 0x1.715476p+0f;           // 1/log(2)
     t = quick_and_dirty_rintf (t);
     i = (int)t;
     r = fmaf (t, -0x1.62e430p-1f, a); // log_2_hi
     r = fmaf (t, 0x1.05c610p-29f, r); // log_2_lo
     t = expf_poly (r);
     r = ldexpf (t, i);
     if (a < -105.0f) r = 0.0f;
     if (a >  105.0f) r = 1.0f/0.0f;   // +INF
     return r;
 }

 /* Compute exponential base 2. Maximum ulp error = 0.87922 */
 float my_exp2f (float a)
 {
     float t, r;
     int i;
     t = quick_and_dirty_rintf (a);
     i = (int)t;
     r = a - t;
     t = exp2f_poly (r);
     r = ldexpf (t, i);
     if (a < -152.0f) r = 0.0f;
     if (a >  152.0f) r = 1.0f/0.0f;   // +INF
     return r;
 }

 /* Compute exponential base 10. Maximum ulp error = 0.95588 */
 float my_exp10f (float a)
 {
     float r, t;
     int i;
     t = a * 0x1.a934f0p+1f;
     t = quick_and_dirty_rintf (t);
     i = (int)t;
     r = fmaf (t, -0x1.344136p-2f, a); // log10(2)_hi
     r = fmaf (t, 0x1.ec10c0p-27f, r); // log10(2)_lo
     t = exp10f_poly (r);
     r = ldexpf (t, i);
     if (a < -46.0f) r = 0.0f;
     if (a >  46.0f) r = 1.0f/0.0f;    // +INF
     return r;
 }
