I am writing a small library for statistical sampling, which should run as quickly as possible. In profiling, I found that about 40% of the time spent on this function is spent on calculating the Stirling approximation for the logarithm of the factorial. I am focusing optimization efforts on this snippet. Here is my code (which uses MPFR ):
const double AL[8] =
{ 0.0, 0.0, 0.6931471806, 1.791759469, 3.178053830, 4.787491743,
6.579251212, 8.525161361 };
void HGD::mpfr_afc(mpfr_t &ret, const mpfr_t &val){
if(mpfr_cmp_ui(val, 7) <= 0){
mpfr_set_d(ret, AL[mpfr_get_ui(val, MPFR_RNDN)], MPFR_RNDN);
}else{
mpfr_set(ret, val, MPFR_RNDN);
mpfr_add_d(ret, ret, 0.5, MPFR_RNDN);
mpfr_log(LV, val, MPFR_RNDN);
mpfr_mul(ret, LV, ret, MPFR_RNDN);
mpfr_sub(ret, ret, val, MPFR_RNDN);
mpfr_add_d(ret, ret, 0.399089934, MPFR_RNDN);
}
}
I have several different ideas:
- Comprehend more than the first 8 inputs to a function.
- Optimize math (use a coarser approximation for less accuracy)
- Using multiple threads for parallel computation on different inputs
- Switch to your own arithmetic when numbers can be inserted into machine data types
?