Fastest pow() replacement via modified exp. by squaring when lower powers are already calculated

Question

EDIT:

Goal:
Create a general-purpose way to obtain a custom power function that outperforms the built-in pow(double, uint) by reusing precalculated/cached powers from power calculations on common variables.

What has already been done:
I already have such a function, which is about 40% faster than the built-in one, but it is a brute-force, manually created function. I need a way to automatically generate such a power function block for an arbitrary uint power.

Knowns

To get the optimal custom pow(double, uint), you will need some knowns. For this question the knowns are (specifically):

The power will be an integer.
The maximum power value may be known (N_MAX).
The set of precalculated powers that can be (re)used is known at compile time (e.g. r2, r4 and r6 in my example).
The square r2 can always be assumed to have been calculated, regardless of which other powers are precalculated.

SOLUTION REQUIREMENTS

An optimal solution that requires a separate program to write the case lookup table, or preprocessor logic to generate such a table, is acceptable. However, non-optimal solutions that use manually generated lookup tables (i.e. obtained by brute-force discovery) with the powers at hand will not be accepted (I already have that, and show it in my example; the idea is to get away from it).

POSSIBLE SOLUTION ROUTE

As a suggestion, you know N_MAX and the set B of precalculated powers (B={2,4,6} for my example). You can produce, either in a separate program or in the preprocessor, a table of all squares Sq(Bi, x) <= N_MAX. You can use this to form a basis set A, which you then search somehow to determine the least number of terms that can be summed to produce an arbitrary exponent n >> 1, where n <= N_MAX (the odd case is taken care of by checking the LSB and multiplying by sqrt(r2)).
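The table of repeated squares could be generated along the following lines. This is only a sketch: the function name `basis_exponents` is hypothetical, and it assumes that `Sq(Bi, x)` means squaring the cached power `Bi` a total of `x` times, so that the exponent reached is `Bi * 2^x`.

```cpp
#include <set>
#include <vector>

// Hypothetical helper: collect every exponent reachable from the cached
// exponents in B by repeated squaring alone, without exceeding n_max.
// Squaring a power doubles its exponent, so each b yields b, 2b, 4b, ...
std::set<unsigned> basis_exponents(const std::vector<unsigned>& B,
                                   unsigned n_max) {
    std::set<unsigned> A;
    for (unsigned b : B) {
        if (b == 0) continue;  // exponent 0 adds nothing and would never grow
        for (unsigned long long e = b; e <= n_max; e *= 2) {
            A.insert(static_cast<unsigned>(e));
        }
    }
    return A;
}
```

For B = {2, 4, 6} and N_MAX = 50 this produces the exponents 2, 4, 6, 8, 12, 16, 24, 32 and 48.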

THEORETICAL REFERENCE

I believe that formally the method below is a modified version of exponentiation by squaring:

http://en.wikipedia.org/wiki/Exponentiation_by_squaring

... which takes advantage of the fact that some lower-order powers are necessarily precalculated already, and so shifts the optimal set of multiplications away from vanilla exponentiation by squaring (which, I believe, is what pow(double, int) uses).

However, there are significant savings in using stored, low-power intermediates instead of simple exp. by squaring on r2.

THEORETICAL PERFORMANCE

For example, for one set of objects n=14. In this scenario, exp. by squaring yields

 double r4 = Sq(r2), r14=Sq(r4)*r4*r2; //4 op.

... which takes 4 FP multiplications ..... but using r2 and r6 , we have

 double r14=Sq(r6)*r2; //2 op.

... which takes 2 FP multiplications. In other words, by going from "dumb" exponentiation by squaring to my modified exp. by squaring using the commonly precalculated powers, I reduced my computation cost by 50% in terms of multiplications, at least until memory costs are considered.

REAL PERFORMANCE

With my current method (compiled with gcc -O3) I get 35.1 s to run 1 million cycles of my program, versus (with no other modifications) 56.6 s using the built-in pow(double, int), so it's almost the theoretical speedup.

At this point, you may be scratching your head at how a 50% reduction in multiplications on a single line of code can deliver a speedup of about 40%. But this line of code is called 1000+ times per cycle and is by far the most frequently evaluated / most expensive line of code in the entire program. Hence the program appears very sensitive to a small optimization / improvement in this snippet.

ORIGINAL POST AND EXAMPLE CODE

I need to replace the pow(double, int) function, since I have already calculated a 6th power term and saved the 2nd and 4th power intermediates, all of which can be used to reduce multiplications in the second pow call, which uses the same double base.

More specifically, in my C++ code I have a performance-critical snippet where I raise the reciprocal of the distance between 3D points to the 6th power and the n-th power, e.g.:

 double distSq = CalcDist(p1,p2), r2 = a/distSq, r6 = r2 * r2 * r2;
 results += m*(pow(sqrt(r2), n) - r6);

Where m and a are the constants associated with the established equation, and n is an arbitrary power.

A slightly more efficient form:

 double distSq = CalcDist(p1,p2), r2 = a/distSq, r6 = r2 * r2 * r2;
 results += m*(pow(r2, n>>1)*(n&0x1?sqrt(r2):1.0) - r6);

However, this is also not optimal. What I have found to be significantly faster is a custom pow function that uses multiples of r2, r4 and r6, which I have to calculate anyway for the second term.

e.g.:

 double distSq = CalcDist(p1,p2), r2 = a/distSq, r4 = r2 * r2, r6 = r4 * r2;
 results += m*(POW(r2, r4, r6, n) - r6);

Inside the function:

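 double POW(double r2, double r4, double r6, uint n) {
     // An odd n contributes one extra factor of sqrt(r2).
     double result = (n&0x1 ? sqrt(r2) : 1.0);
     n >>= 1; // remaining exponent, counted in powers of r2
     switch (n) {
         case 1: ....
         case 12: result *= Sq(Sq(r6)); break;
     }
     return result;
 }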

The good news is that my function appears fast in preliminary testing. The bad news is that it is not very general and it is very long, as I need case statements for int powers from 8 to 50 or so (potentially even higher in the future). Further, for each case I had to examine and try different combinations to find, by brute force, which combination of r2, r4 and r6 yielded the fewest multiplications.

Does anyone have a more general solution for a pow(double, int) replacement that uses precalculated powers of the base to cut the number of necessary multiplications, and/or a general theory for determining the ideal combination that yields the fewest multiplications for an arbitrary n and some set of precalculated powers?

c++ optimization math pow

Jason R. mick Sep 22 '13 at 15:13

1 answer

us2012 · Answer 1 · 2013-09-22T16:37:17+0000

Here is a DP-like algorithm that gives you the minimum number of multiplications for a given n and the available powers x^i, as well as the optimal strategy via backtracking. To each possible exponent n, associate a pair (minimum number of multiplications to get here, type of multiplication that gets you there), where for the second entry you simply write i, or a special symbol S for squaring.

Obviously, you start with 1 -> (0, /) .

Given n -> (m_n, Action_n), set n+i -> (m_n + 1, i) if m_n + 1 is less than the previously calculated minimum number of moves for n+i. Similarly, set 2n -> (m_n + 1, S) if that is better than any previous solution.
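The update rule can be sketched as a single forward pass over the exponents. The names `Step`, `kSquare` and `build_strategy` are hypothetical, and treating every exponent in `available` as already computed (cost 0) is an assumption layered on top of the description; exponents are counted in powers of the base x, so 1 must be listed in `available` if multiplying by x itself is allowed.

```cpp
#include <limits>
#include <vector>

// One table entry per exponent: the fewest multiplications found so far and
// the last step used to get there.
struct Step {
    int cost = std::numeric_limits<int>::max();  // "not reached" marker
    int action = 0;  // >0: multiplied by x^action, kSquare: squared, 0: given
};

constexpr int kSquare = -1;  // plays the role of the special symbol S

// Hypothetical helper: fill the table for exponents 1..n_max (n_max >= 1).
std::vector<Step> build_strategy(int n_max, const std::vector<int>& available) {
    std::vector<Step> table(n_max + 1);
    table[1].cost = 0;  // x itself is always at hand
    for (int i : available) {
        if (i >= 1 && i <= n_max) table[i].cost = 0;  // assumed precomputed
    }
    // Every update targets an exponent larger than n, so one ascending pass
    // sees each entry in its final state before expanding it.
    for (int n = 1; n <= n_max; ++n) {
        if (table[n].cost == std::numeric_limits<int>::max()) continue;
        const int next = table[n].cost + 1;
        for (int i : available) {  // n -> n + i: multiply by a cached power
            if (i >= 1 && n + i <= n_max && next < table[n + i].cost) {
                table[n + i].cost = next;
                table[n + i].action = i;
            }
        }
        if (2 * n <= n_max && next < table[2 * n].cost) {  // n -> 2n: square
            table[2 * n].cost = next;
            table[2 * n].action = kSquare;
        }
    }
    return table;
}
```

With the question's cached powers expressed in powers of r2, i.e. `available = {1, 2, 3}` for r2, r4 and r6, the table reports 2 multiplications for exponent 7 (r14) and 2 for exponent 12, the same counts as the hand-written cases in the question. With only `{1}` available, exponent 7 costs 4. Note that this search only considers chains in which each step either squares the running product or multiplies it by a cached power.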

This algorithm gives the optimal strategies in roughly O(n_max * #available powers). I don't claim that the algorithm itself is optimally efficient, and it certainly makes no sense to run it on the fly. It is only useful if you have a reasonable n_max (100 is certainly fine in your case) and an efficient way to store the strategies.

Two thoughts:

(1) Until this has been compared against benchmark results, I'm not convinced that it will yield a significant performance gain over standard exp. by squaring (it depends heavily on the available powers, of course).

(2) The numerical error behavior of such strategies (as well as of exp. by squaring) is completely different from that of pow(double, double).
