EDIT:
Goal:
Find a general method for generating a custom power function that outperforms the built-in pow(double, uint) by reusing pre-computed / cached powers of the same base that were already calculated for other terms.
What has already been done:
I already have such a function, which is about 40% faster than the built-in, but it is a brute-force function written by hand. I need a way to generate such a power function block automatically for an arbitrary uint power.
Knowns
To get the optimal custom pow(double, uint), you need some knowns. For this question the knowns are:
- The power will be an integer.
- The maximum power value may be known (N_MAX).
- There are known pre-calculated powers that can be (re)used at compile time (for example r2, r4 and r6 in my example).
- The square r2 can always be assumed to be calculated, regardless of which other powers are pre-calculated.
SOLUTION REQUIREMENTS
An optimal solution that requires a separate program to write the case lookup table, or preprocessor logic to generate such a table, is acceptable. However, non-optimal solutions that use hand-built lookup tables (i.e., obtained by brute-force enumeration) for the powers at hand will not be accepted (I already have that, as I show in my example; the idea is to get away from it).
POSSIBLE SOLUTION ROUTE
As a suggestion: you know N_MAX and the set B of pre-calculated powers (B={2,4,6} for my example). You can generate, either in a separate program or in the preprocessor, a table of all repeated squares Sq(Bi, x) <= N_MAX. You can use this to form a basis set A, which you then search somehow to determine the least number of terms that can be summed to produce an arbitrary exponent n >> 1, where n <= N_MAX (the shift is there because the odd case is handled separately, by checking the LSB and multiplying by sqrt(r2)).
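The table-building step described above could be sketched roughly as follows. This is only a sketch under my own assumptions: `basis_set` is a hypothetical helper name, exponents are expressed as powers of r (so r2, r4, r6 are 2, 4, 6), and only repeated squaring of each cached power is enumerated. The search over sums of these terms is not shown here.

```cpp
#include <set>
#include <vector>

// Exponents reachable from each cached power by repeated squaring alone,
// i.e. Bi * 2^x for x = 0, 1, 2, ... while the result stays <= n_max.
// Hypothetical helper; exponents are in units of r (r2 -> 2, r4 -> 4, ...).
std::set<unsigned> basis_set(const std::vector<unsigned>& cached, unsigned n_max) {
    std::set<unsigned> basis;
    for (unsigned b : cached) {
        // 64-bit loop variable so that doubling cannot wrap around.
        for (unsigned long long e = b; e != 0 && e <= n_max; e *= 2) {
            basis.insert(static_cast<unsigned>(e));
        }
    }
    return basis;
}
```

For B={2,4,6} and N_MAX=50, the three squaring sequences are 2,4,8,16,32; 4,8,16,32; and 6,12,24,48, so the set holds their union.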
THEORETICAL REFERENCE
I believe that formally the method below is a modified version of exponentiation by squaring:
http://en.wikipedia.org/wiki/Exponentiation_by_squaring
... which exploits the fact that some lower-order powers are already pre-calculated by necessity, so the optimal set of multiplications shifts away from that of vanilla exponentiation by squaring (which, I assume, is what pow(double, int) uses).
There are significant savings from using the stored low-power intermediates instead of plain exp. by squaring on r2.
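For reference, here is a minimal sketch of plain exponentiation by squaring, the baseline that the cached-power variant is compared against. The name `pow_by_squaring` is hypothetical, and I am not claiming this is what the library pow actually does internally.

```cpp
// Plain exponentiation by squaring: no cached powers are reused.
// Hypothetical baseline for comparison only.
double pow_by_squaring(double base, unsigned n) {
    double result = 1.0;
    while (n != 0) {
        if (n & 1u) {
            result *= base;  // this bit of n contributes the current power of base
        }
        base *= base;        // move on to the next power-of-two exponent
        n >>= 1;
    }
    return result;
}
```

Each set bit of n costs one multiplication and each remaining bit costs one squaring, independent of any powers that happen to be available already.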
THEORETICAL PERFORMANCE
For example, for one set of objects n=14. In this scenario plain exp. by squaring gives
double r4 = Sq(r2), r14=Sq(r4)*r4*r2; //4 op.
... which takes 4 FP multiplications. But using r2 and r6, we have
double r14=Sq(r6)*r2; //2 op.
... 2 FP multiplications. In other words, by going from "dumb" exponentiation by squaring to my modified exp. by squaring, which reuses the common pre-computed powers, I cut my computational cost by 50% in terms of multiplications, at least before memory costs are considered.
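The two variants can be written out as small functions to check that they agree. This is a sketch: the function names are hypothetical, and Sq is assumed to be a plain x*x helper, which is how it is used throughout this post.

```cpp
// Assumed definition of the Sq helper used in this post.
static inline double Sq(double x) { return x * x; }

// r^14 starting from r2 only: 4 multiplications.
double r14_from_r2(double r2) {
    double r4 = Sq(r2);       // 1 multiplication
    return Sq(r4) * r4 * r2;  // 3 more: r8, r8*r4, r12*r2
}

// r^14 starting from cached r2 and r6: 2 multiplications.
double r14_from_r2_r6(double r2, double r6) {
    return Sq(r6) * r2;       // r12, then r12*r2
}
```

With r2 = 2.0 and r6 = 8.0, both return 128.0 (that is, 2^7).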
REAL PERFORMANCE
With my current method (compiled with gcc -O3) it takes 35.1 s to run 1 million cycles of my program, versus 56.6 s (with no other modifications) using the built-in pow(double, int), so it is close to the theoretical speedup.
At this point you may be scratching your head over how a 50% reduction in multiplications on a single line of code can deliver a ~40% speedup. But this line of code is called 1000+ times per cycle and is by far the most frequently evaluated / most expensive line of code in the entire program. Hence the program is highly sensitive to even a small optimization of this piece.
ORIGINAL POST AND EXAMPLE CODE
I need to replace the pow(double, int) function, because I have already calculated a 6th-power term and saved the 2nd- and 4th-power intermediates, all of which can be used to reduce the multiplications in the second pow call, which uses the same double base.
More specifically, in my C++ code I have a performance-critical piece of code where I raise the reciprocal of the distance between 3D points to the 6th power and to the n-th power, e.g.:
double distSq = CalcDist(p1,p2), r2 = a/distSq, r6 = r2 * r2 * r2;
results += m*(pow(sqrt(r2), n) - r6);
Where m and a are the constants associated with the established equation, and n is an arbitrary power.
A slightly more efficient form:
double distSq = CalcDist(p1,p2), r2 = a/distSq, r6 = r2 * r2 * r2;
results += m*(pow(r2, n >> 1)*(n&0x1 ? sqrt(r2) : 1.0) - r6);  // r^n = (r2)^(n>>1), times r if n is odd
However, this is also not optimal. What I have found to be much faster is a custom pow function that uses products of r2, r4 and r6, which I have to calculate anyway for the second term.
e.g.:
double distSq = CalcDist(p1,p2), r2 = a/distSq, r4 = r2 * r2, r6 = r4 * r2;
results += m*(POW(r2, r4, r6, n) - r6);
Inside the function:
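double POW(double r2, double r4, double r6, uint n)
{
    double result = (n & 0x1) ? sqrt(r2) : 1.0;  // odd n: one extra factor of r
    n >>= 1;                                     // remaining exponent, in units of r2
    switch (n) {
    case 1: ....
    case 12: result *= Sq(Sq(r6)); break;        // (r6)^4 = (r2)^12
    }
    return result;
}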
The good news is that my function appears to be fast in preliminary testing. The bad news is that it is not very general and is very long, because I need case statements for int powers from 8 to 50 or so (potentially even higher in the future). Further, for each case I had to examine and try different combinations to find, by brute-force deduction, which combination of r2, r4 and r6 yields the fewest multiplications.
Does anyone have a more general replacement for pow(double, int) that uses pre-calculated powers of the base to cut down the number of multiplications needed, and/or a general theory for determining the ideal combination that yields the fewest multiplications for an arbitrary n and some set of pre-calculated powers?
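To make concrete what a generator for these case statements might look like, here is a rough sketch. It treats the problem as a shortest addition chain search that starts from the cached exponents instead of from 1 alone, which is my own framing and not a confirmed answer. `Step`, `search`, `shortest_chain` and `emit_case` are hypothetical names, exponents are in units of r2 (so r2, r4, r6 are the seeds 1, 2, 3, and k corresponds to n >> 1), and the search is an exhaustive iterative-deepening one that is only meant for small exponents such as the 8 to 50 range above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One multiplication: (r2)^sum = (r2)^a * (r2)^b.
struct Step { unsigned sum, a, b; };

// Depth-limited search: try to extend `have` with at most `budget` new sums
// so that it contains `target`. On success, `steps` holds the sums used.
static bool search(std::vector<unsigned>& have, std::vector<Step>& steps,
                   unsigned target, int budget) {
    if (std::find(have.begin(), have.end(), target) != have.end()) return true;
    if (budget == 0) return false;
    unsigned biggest = *std::max_element(have.begin(), have.end());
    // Prune: even doubling the largest exponent at every step falls short.
    if (budget < 32 && (static_cast<unsigned long long>(biggest) << budget) < target)
        return false;
    for (std::size_t i = 0; i < have.size(); ++i) {
        for (std::size_t j = i; j < have.size(); ++j) {
            unsigned sum = have[i] + have[j];
            if (sum > target ||
                std::find(have.begin(), have.end(), sum) != have.end()) continue;
            have.push_back(sum);
            steps.push_back({sum, have[i], have[j]});
            if (search(have, steps, target, budget - 1)) return true;
            steps.pop_back();
            have.pop_back();
        }
    }
    return false;
}

// Fewest multiplications for (r2)^k when (r2)^s is free for every s in seeds.
// Assumes seeds contains 1, so every k >= 1 is reachable and the loop ends.
std::vector<Step> shortest_chain(const std::vector<unsigned>& seeds, unsigned k) {
    assert(k >= 1 && std::find(seeds.begin(), seeds.end(), 1u) != seeds.end());
    for (int budget = 0;; ++budget) {
        std::vector<unsigned> have = seeds;
        std::vector<Step> steps;
        if (search(have, steps, k, budget)) return steps;
    }
}

// Emit one switch case for a POW-style function body, with k = n >> 1.
// Variables are named after the power of r they hold (r12, r14, ...).
std::string emit_case(const std::vector<unsigned>& seeds, unsigned k) {
    auto name = [](unsigned e) { return "r" + std::to_string(2 * e); };
    std::string out = "case " + std::to_string(k) + ": {";
    for (const Step& s : shortest_chain(seeds, k))
        out += " double " + name(s.sum) + " = " + name(s.a) + " * " + name(s.b) + ";";
    out += " result *= " + name(k) + "; break; }";
    return out;
}
```

Calling `emit_case({1, 2, 3}, k)` in a loop over k would print the body of the switch. For k = 12 it squares r6 twice, matching the hand-written case, and for k = 7 it needs 2 multiplications versus 4 when only r2 is available. The search is exponential in the chain length, so this is a build-time tool at best, not something to run in the hot loop.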