How to improve the performance of this numerical computation in Haskell?

I am in the middle of porting David Bleigh's original C-implementation of highlighting the hidden Dirichlet distribution in Haskell, and I'm trying to decide whether to leave some of the low-level material in C. The next function, one example, is to approximate the second derivative of lgamma :

 double trigamma(double x) { double p; int i; x=x+6; p=1/(x*x); p=(((((0.075757575757576*p-0.033333333333333)*p+0.0238095238095238) *p-0.033333333333333)*p+0.166666666666667)*p+1)/x+0.5*p; for (i=0; i<6 ;i++) { x=x-1; p=1/(x*x)+p; } return(p); } 

I translated this into a more or less idiomatic Haskell as follows:

 trigamma :: Double -> Double trigamma x = snd $ last $ take 7 $ iterate next (x' - 1, p') where x' = x + 6 p = 1 / x' ^ 2 p' = p / 2 + c / x' c = foldr1 (\ab -> (a + b * p)) [1, 1/6, -1/30, 1/42, -1/30, 5/66] next (x, p) = (x - 1, 1 / x ^ 2 + p) 

The problem is that when I run both through Criterion , my Haskell version is six or seven times slower (I compile with -O2 on GHC 6.12.1). Some of these features are even worse.

I don't know anything about Haskell's performance, and I'm not really interested in digging through Core or something like that, since I can always just call a handful of C-functions with mathematical intensity through FFI.

But I'm curious if there are low-grade fruits that I miss - some kind of extension or library or annotation that I could use to speed up this numerical material without making it too ugly.




UPDATE: Here are two best solutions, thanks to Don Stewart and Yitz . I modified Yitz a bit to use Data.Vector .

 invSq x = 1 / (x * x) computeP x = (((((5/66*p-1/30)*p+1/42)*p-1/30)*p+1/6)*p+1)/x+0.5*p where p = invSq x trigamma_d :: Double -> Double trigamma_d x = go 0 (x + 5) $ computeP $ x + 6 where go :: Int -> Double -> Double -> Double go !i !x !p | i >= 6 = p | otherwise = go (i+1) (x-1) (1 / (x*x) + p) trigamma_y :: Double -> Double trigamma_y x = V.foldl' (+) (computeP $ x + 6) $ V.map invSq $ V.enumFromN x 6 

The performance of the two, apparently, is almost the same: one or the other gain in percentage or two, depending on the flags of the compiler.

As camccann said in Reddit , the moral of the story is β€œFor best results, use Don Stewart as a GHC code generator.” If you prohibit this decision, the safest bet seems to be to simply convert the C management structures directly to Haskell, although merging fusion can produce similar performance in a more idiomatic style.

I will probably end up using the Data.Vector approach in my code.

+46
performance c math haskell
Jun 05 '10 at 3:14
source share
2 answers

Use the same management and data structures that give:

 {-# LANGUAGE BangPatterns #-} {-# OPTIONS_GHC -fvia-C -optc-O3 -fexcess-precision -optc-march=native #-} {-# INLINE trigamma #-} trigamma :: Double -> Double trigamma x = go 0 (x' - 1) p' where x' = x + 6 p = 1 / (x' * x') p' =(((((0.075757575757576*p-0.033333333333333)*p+0.0238095238095238) *p-0.033333333333333)*p+0.166666666666667)*p+1)/x'+0.5*p go :: Int -> Double -> Double -> Double go !i !x !p | i >= 6 = p | otherwise = go (i+1) (x-1) (1 / (x*x) + p) 

I do not have your testuite, but this gives the following asm:

 A_zdwgo_info: cmpq $5, %r14 jg .L3 movsd .LC0(%rip), %xmm7 movapd %xmm5, %xmm8 movapd %xmm7, %xmm9 mulsd %xmm5, %xmm8 leaq 1(%r14), %r14 divsd %xmm8, %xmm9 subsd %xmm7, %xmm5 addsd %xmm9, %xmm6 jmp A_zdwgo_info 

Which looks fine. This is the type of code that the -fllvm backend -fllvm . Good job.

GCC expands the loop, and the only way to do this is through either Template Haskell or manual expansion. You might think that (macro TH) if you do a lot of this.

In fact, the LLVM GHC backend loops around :-)

Finally, if you really like the original version of Haskell, write it using stream combinators and the GHC will convert them back to loops. (Exercise for the reader).

+48
Jun 05 2018-10-10T00:
source share

Before the optimization work, I would not say that your original translation is the most idiomatic way to express in Haskell what the C code does.

How would the optimization process continue if we started with the following:

 trigamma :: Double -> Double trigamma x = foldl' (+) p' . map invSq . take 6 . iterate (+ 1) $ x where invSq y = 1 / (y * y) x' = x + 6 p = invSq x' p' =(((((0.075757575757576*p-0.033333333333333)*p+0.0238095238095238) *p-0.033333333333333)*p+0.166666666666667)*p+1)/x'+0.5*p 
+8
Jun 06 2018-10-06T00:
source share



All Articles