Optimized low-precision approximation of `rootn (x, n)`

rootn (float_t x, int_t n) is a function that computes the n-th root x^(1/n) and is supported by some programming languages such as OpenCL. When IEEE-754 floating-point numbers are used, efficient low-accuracy starting approximations for any n can be generated based on simple manipulation of the underlying bit pattern, assuming only normalized x operands need to be handled.

The binary exponent of rootn (x, n) will be 1/n of the binary exponent of x. The exponent field of an IEEE-754 floating-point number is biased. Instead of un-biasing the exponent, dividing it, and re-biasing the result, we can simply divide the biased exponent by n, then apply an offset to compensate for the previously neglected bias. Furthermore, instead of extracting and then dividing the exponent field, we can simply divide the entire operand x, reinterpreted as an integer. Finding the required offset is trivial, since we know that an argument of 1 must return a result of 1 for any n.

If we have two helper functions, __int_as_float(), which reinterprets an int32 operand as an IEEE-754 binary32, and __float_as_int(), which reinterprets a binary32 operand as an int32, we arrive at the following low-accuracy approximation to rootn (x, n) in a straightforward way:

 rootn (x, n) ~= __int_as_float((int)(__float_as_int(1.0f)*(1.0-1.0/n)) + __float_as_int(x)/n) 
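For illustration, here is a minimal self-contained C sketch of this approximation (my addition, not part of the original question; the memcpy-based helpers stand in for compiler intrinsics like CUDA's __float_as_int / __int_as_float):

 #include <stdint.h>
 #include <string.h>

 /* portable stand-ins for the bit-reinterpretation helpers */
 static int32_t float_as_int (float f) { int32_t i; memcpy (&i, &f, sizeof i); return i; }
 static float int_as_float (int32_t i) { float f; memcpy (&f, &i, sizeof f); return f; }

 /* low-accuracy rootn approximation for normalized x > 0; works for positive and
    negative n because C integer division truncates toward zero */
 static float rootn_approx (float x, int n)
 {
     int32_t bias = float_as_int (1.0f);                /* 0x3f800000 */
     int32_t off  = (int32_t)(bias * (1.0 - 1.0 / n));  /* compensating offset */
     return int_as_float (off + float_as_int (x) / n);
 }

For example, rootn_approx (2.0f, 2) evaluates to 1.5f, versus sqrt(2) ≈ 1.41421.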

The integer division __float_as_int (x) / n can be reduced to a shift, or to a multiplication plus a shift, using the well-known optimizations for integer division by a constant divisor. Some examples:

 rootn (x, 2)  ~= __int_as_float (0x1fc00000 + __float_as_int (x) / 2)  // sqrt (x)
 rootn (x, 3)  ~= __int_as_float (0x2a555556 + __float_as_int (x) / 3)  // cbrt (x)
 rootn (x, -1) ~= __int_as_float (0x7f000000 - __float_as_int (x) / 1)  // rcp (x)
 rootn (x, -2) ~= __int_as_float (0x5f400000 - __float_as_int (x) / 2)  // rsqrt (x)
 rootn (x, -3) ~= __int_as_float (0x54aaaaaa - __float_as_int (x) / 3)  // rcbrt (x)
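As a cross-check (my addition), these constants follow from the requirement that x = 1 map exactly to 1. For n = 3, the offset must satisfy offset + __float_as_int (1.0f) / 3 = __float_as_int (1.0f); with truncating integer division this gives offset = 0x3f800000 - 0x152aaaaa = 0x2a555556, the constant shown above.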

For all these approximations, the result will be exact only when x = 2^(n*m) for integer m. Otherwise, the approximation will be an overestimation compared to the true mathematical result. We can approximately halve the maximum relative error by slightly reducing the offset, leading to a balanced mix of underestimation and overestimation. This is easily accomplished by a binary search for the optimal offset that uses all floating-point numbers in the interval [1, 2^n) as test cases. Doing so, we find:

 rootn (x, 2) ~= __int_as_float (0x1fbb4f2e + __float_as_int(x)/2)  // max rel err = 3.47474e-2
 rootn (x, 3) ~= __int_as_float (0x2a51067f + __float_as_int(x)/3)  // max rel err = 3.15547e-2
 rootn (x,-1) ~= __int_as_float (0x7ef311c2 - __float_as_int(x)/1)  // max rel err = 5.05103e-2
 rootn (x,-2) ~= __int_as_float (0x5f37642f - __float_as_int(x)/2)  // max rel err = 3.42128e-2
 rootn (x,-3) ~= __int_as_float (0x54a232a3 - __float_as_int(x)/3)  // max rel err = 3.42405e-2
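The binary search itself is not shown in the question, but scoring a candidate offset by brute force is straightforward, since consecutive positive binary32 values have consecutive integer bit patterns. A sketch for positive n (my addition; the helpers are the same memcpy-based stand-ins as in the earlier sketch):

 #include <stdio.h>
 #include <stdint.h>
 #include <string.h>
 #include <math.h>

 static int32_t float_as_int (float f) { int32_t i; memcpy (&i, &f, sizeof i); return i; }
 static float int_as_float (int32_t i) { float f; memcpy (&f, &i, sizeof f); return f; }

 /* maximum relative error of the offset-based approximation over [1, 2^n) */
 static double max_rel_err (int32_t offset, int n)
 {
     double maxerr = 0.0;
     int32_t hi = float_as_int ((float)(1 << n));          /* bit pattern of 2^n */
     for (int32_t i = float_as_int (1.0f); i < hi; i++) {  /* every float in [1, 2^n) */
         float x = int_as_float (i);
         float approx = int_as_float (offset + float_as_int (x) / n);
         double ref = pow ((double)x, 1.0 / n);
         double err = fabs (((double)approx - ref) / ref);
         if (err > maxerr) maxerr = err;
     }
     return maxerr;
 }

 int main (void)
 {
     printf ("%.5e\n", max_rel_err (0x1fbb4f2e, 2));  /* question reports 3.47474e-2 */
     return 0;
 }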

Some may recognize that the computation for rootn (x, -2) is basically the initial portion of Quake's fast inverse square root.

Based on the difference between the original, easily derived offset and the final offset optimized to minimize the maximum relative error, one could formulate a heuristic for the secondary correction and thus for the final, optimized offset value.

However, I am wondering whether the optimal offset can be determined by a closed-form formula, such that the maximum absolute value of the relative error, max(|(approx(x, n) - x^(1/n)) / x^(1/n)|), is minimized for all x in [1, 2^n). For ease of exposition, we can restrict ourselves to binary32 (IEEE-754 single precision) numbers.

I know that in general there are no closed-form solutions for minimax approximations; however, I am under the impression that closed-form solutions do exist for the case of polynomial approximations to algebraic functions, such as the n-th root. In this case we have a (piecewise) linear approximation.

math floating-point algorithm bit-manipulation
Aug 17 '15 at 4:10
1 answer

Here's some Octave (MATLAB) code that computes offsets for positive n, assuming the conjectures below hold. It seems to work for n = 2 and 3, but I suspect that one of the assumptions breaks down when n is too large. No time to investigate right now.

 % finds the best offset for positive n
 % assuming that the conjectures below are true
 function oint = offset(n)
   % find the worst upward error with no offset
   efrac = (1 / log(2) - 1) * n;
   evec = [floor(efrac) ceil(efrac)];
   rootnvec = 2 .^ (evec / n);
   [abserr i] = max((1 + evec / n) ./ rootnvec);
   relerr = abserr - 1;
   rootnx = rootnvec(i);
   % conjecture: z such that approx(z, n) = 1
   % should have the worst downward error
   fun = @(o) relerr - o / rootnx + (1 / (1 + o * n) ^ (1 / n) - 1);
   oreal = fzero(fun, [0 1]);
   oint = round((127 * (1 - 1 / n) - oreal) * 2 ^ 23);
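A hypothetical usage check (my addition; run from the Octave prompt after saving the function as offset.m, and assuming fzero converges as intended):

 printf ("0x%08x\n", offset (2))  % the question's binary search found 0x1fbb4f2e
 printf ("0x%08x\n", offset (3))  % the question's binary search found 0x2a51067f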



This is a partial answer for positive n only; I am just going to chip away at it a bit by characterizing the worst overestimation, assuming that we do not adjust the offset downward.

Let us define an idealized version of the approximation for x in [1, 2^n):

 rootn-A(x, n) = 1 + floor(lg(x))/n + ((x/2^floor(lg(x))) - 1) / n
                     ^^^^^^^^^^^^^^   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                     contribution of  contribution of the
                     the exponent     significand
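For example (my addition), rootn-A(2, 2) = 1 + 1/2 + 0 = 1.5, which overestimates sqrt(2) ≈ 1.41421 by about 6.1%, consistent with the question's observation that reducing the offset roughly halves the maximum error.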

We want to maximize rootn-A(x, n) / x^(1/n).

It appears experimentally that the maximum occurs when x is a power of two. In this case, the significand term is zero and floor(lg(x)) = lg(x), so we are maximizing

 (1 + lg(x)/n) / x^(1/n). 

Substitute y = lg(x)/n, and we are maximizing (1 + y) / 2^y for y in [0, 1) such that n*y is an integer. Dropping the integrality condition, it is a calculus exercise to show that the maximum of this concave function is at y = 1/log(2) - 1, about 0.4426950408889634. It follows that the maximum for x a power of two is at either x = 2^floor((1/log(2) - 1) * n) or x = 2^ceil((1/log(2) - 1) * n). I conjecture that one of these is in fact the global maximum.
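As a quick check of that calculus step (my addition): differentiating f(y) = (1 + y) / 2^y gives

 f'(y) = 2^(-y) * (1 - (1 + y) * log(2))

which vanishes exactly when 1 + y = 1/log(2), that is, y = 1/log(2) - 1, with log denoting the natural logarithm as in the Octave code above.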

On the underestimation end, it appears that we want the x for which the output of rootn(x, n) is 1, at least for small n. More to come later, hopefully.

Aug 19 '15 at 15:52


