Is there an exact approximation of the acos () function?

I need a double precision acos() function in a computational shader. Since there is no double-precision acos() function in GLSL, I tried to implement my own.

First, I implemented the Taylor series as an equation from the Wiki - the Taylor series with pre-calculated faculty values. But this seems inaccurate around 1. The maximum error was approximately 0.08 with 40 iterations.

I also implemented this method , which works very well on the CPU with a maximum error of -2.22045e-16, but I have some problems for implementing this in the shader.

I am currently using the acos() approximation function from here , where someone posted their approximation functions to this site. I use the most accurate function of this site, and now I get the maximum error -7.60454e-08, but also this error is too high.

My code for this function is:

 double myACOS(double x) { double part[4]; part[0] = 32768.0/2835.0*sqrt(2.0-sqrt(2.0+sqrt(2.0+sqrt(2.0+2.0*x)))); part[1] = 256.0/135.0*sqrt(2.0-sqrt(2.0+sqrt(2.0+2.0*x))); part[2] = 8.0/135.0*sqrt(2.0-sqrt(2.0+2.0*x)); part[3] = 1.0/2835.0*sqrt(2.0-2.0*x); return (part[0]-part[1]+part[2]-part[3]); } 

Does anyone know another acos() implementation method that is very accurate and, if possible, easy to implement in a shader?

Some system information:

  • Nvidia GT 555M
  • launch OpenGL 4.3 with optirun
+6
source share
2 answers

My current exact shader implementation of acos () is a mix from the usual Taylor series and a response from Bence . With 40 iterations, I get precision 4.44089e-16 before implementing acos () from math.h. This may not be the best, but it works for me:

And here he is:

 double myASIN2(double x) { double sum, tempExp; tempExp = x; double factor = 1.0; double divisor = 1.0; sum = x; for(int i = 0; i < 40; i++) { tempExp *= x*x; divisor += 2.0; factor *= (2.0*double(i) + 1.0)/((double(i)+1.0)*2.0); sum += factor*tempExp/divisor; } return sum; } double myASIN(double x) { if(abs(x) <= 0.71) return myASIN2(x); else if( x > 0) return (PI/2.0-myASIN2(sqrt(1.0-(x*x)))); else //x < 0 or x is NaN return (myASIN2(sqrt(1.0-(x*x)))-PI/2.0); } double myACOS(double x) { return (PI/2.0 - myASIN(x)); } 

Any comments what can be done better? For example, using LUT for factor values, but in my shader acos () is just called once, so there is no need for it.

+2
source

The NVIDIA GT 555M GPU is a device with 2.1 computing power, so there is basic hardware support for basic dual-precision operations, including fused multipy-add (FMA). As with all NVIDIA GPUs, the square root operation is performed. I am familiar with CUDA, but not with GLSL. According to version 4.3 of the GLSL specification , it provides a double precision FMA function as a fma() function and provides a double precision square root, sqrt() . It is unclear if sqrt() is implemented correctly in accordance with IEEE-754 rules. I guess this is similar to CUDA.

Instead of using the Taylor series, one would have to use a minimal Remez approximation. To optimize speed and accuracy, the use of FMA is essential. Estimation of a polynomial using the Horner scheme is sufficient for high accrual. The code below uses a second-order Orner scheme. As in the DanceIgel file, acos conveniently calculated using the asin approximation as the main building block in combination with standard mathematical identifiers.

With 400 M test vectors, the maximum relative error observed with the code below was 2.67 e-16, and the maximum error ulp 1.442 ulp.

 /* compute arcsin (a) for a in [-9/16, 9/16] */ double asin_core (double a) { double q, r, s, t; s = a * a; q = s * s; r = 5.5579749017470502e-2; t = -6.2027913464120114e-2; r = fma (r, q, 5.4224464349245036e-2); t = fma (t, q, -1.1326992890324464e-2); r = fma (r, q, 1.5268872539397656e-2); t = fma (t, q, 1.0493798473372081e-2); r = fma (r, q, 1.4106045900607047e-2); t = fma (t, q, 1.7339776384962050e-2); r = fma (r, q, 2.2372961589651054e-2); t = fma (t, q, 3.0381912707941005e-2); r = fma (r, q, 4.4642857881094775e-2); t = fma (t, q, 7.4999999991367292e-2); r = fma (r, s, t); r = fma (r, s, 1.6666666666670193e-1); t = a * s; r = fma (r, t, a); return r; } /* Compute arccosine (a), maximum error observed: 1.4316 ulp Double-precision factorization of ฯ€ courtesy of Tor Myklebust */ double my_acos (double a) { double r; r = (a > 0.0) ? -a : a; // avoid modifying the "sign" of NaNs if (r > -0.5625) { /* arccos(x) = pi/2 - arcsin(x) */ r = fma (9.3282184640716537e-1, 1.6839188885261840e+0, asin_core (r)); } else { /* arccos(x) = 2 * arcsin (sqrt ((1-x) / 2)) */ r = 2.0 * asin_core (sqrt (fma (0.5, r, 0.5))); } if (!(a > 0.0) && (a >= -1.0)) { // avoid modifying the "sign" of NaNs /* arccos (-x) = pi - arccos(x) */ r = fma (1.8656436928143307e+0, 1.6839188885261840e+0, -r); } return r; } 
+1
source

Source: https://habr.com/ru/post/983608/


All Articles