How to improve the performance of this numerical computation in Haskell?

Question

How to improve the performance of this numerical computation in Haskell?

I am in the middle of porting David Bleigh's original C-implementation of highlighting the hidden Dirichlet distribution in Haskell, and I'm trying to decide whether to leave some of the low-level material in C. The next function, one example, is to approximate the second derivative of lgamma :

 double trigamma(double x) { double p; int i; x=x+6; p=1/(x*x); p=(((((0.075757575757576*p-0.033333333333333)*p+0.0238095238095238) *p-0.033333333333333)*p+0.166666666666667)*p+1)/x+0.5*p; for (i=0; i<6 ;i++) { x=x-1; p=1/(x*x)+p; } return(p); }

I translated this into a more or less idiomatic Haskell as follows:

 trigamma :: Double -> Double trigamma x = snd $ last $ take 7 $ iterate next (x' - 1, p') where x' = x + 6 p = 1 / x' ^ 2 p' = p / 2 + c / x' c = foldr1 (\ab -> (a + b * p)) [1, 1/6, -1/30, 1/42, -1/30, 5/66] next (x, p) = (x - 1, 1 / x ^ 2 + p)

The problem is that when I run both through Criterion , my Haskell version is six or seven times slower (I compile with -O2 on GHC 6.12.1). Some of these features are even worse.

I don't know anything about Haskell's performance, and I'm not really interested in digging through Core or something like that, since I can always just call a handful of C-functions with mathematical intensity through FFI.

But I'm curious if there are low-grade fruits that I miss - some kind of extension or library or annotation that I could use to speed up this numerical material without making it too ugly.

UPDATE: Here are two best solutions, thanks to Don Stewart and Yitz . I modified Yitz a bit to use Data.Vector .

 invSq x = 1 / (x * x) computeP x = (((((5/66*p-1/30)*p+1/42)*p-1/30)*p+1/6)*p+1)/x+0.5*p where p = invSq x trigamma_d :: Double -> Double trigamma_d x = go 0 (x + 5) $ computeP $ x + 6 where go :: Int -> Double -> Double -> Double go !i !x !p | i >= 6 = p | otherwise = go (i+1) (x-1) (1 / (x*x) + p) trigamma_y :: Double -> Double trigamma_y x = V.foldl' (+) (computeP $ x + 6) $ V.map invSq $ V.enumFromN x 6

The performance of the two, apparently, is almost the same: one or the other gain in percentage or two, depending on the flags of the compiler.

As camccann said in Reddit , the moral of the story is “For best results, use Don Stewart as a GHC code generator.” If you prohibit this decision, the safest bet seems to be to simply convert the C management structures directly to Haskell, although merging fusion can produce similar performance in a more idiomatic style.

I will probably end up using the Data.Vector approach in my code.

+46

performance c math haskell

Travis Brown Jun 05 '10 at 3:14

source share

2 answers

Before the optimization work, I would not say that your original translation is the most idiomatic way to express in Haskell what the C code does.

How would the optimization process continue if we started with the following:

 trigamma :: Double -> Double trigamma x = foldl' (+) p' . map invSq . take 6 . iterate (+ 1) $ x where invSq y = 1 / (y * y) x' = x + 6 p = invSq x' p' =(((((0.075757575757576*p-0.033333333333333)*p+0.0238095238095238) *p-0.033333333333333)*p+0.166666666666667)*p+1)/x'+0.5*p

+8

Yitz Jun 06 2018-10-06T00:

source share

Don Stewart · Accepted Answer · 2010-06-05 03:38

Use the same management and data structures that give:

 {-# LANGUAGE BangPatterns #-} {-# OPTIONS_GHC -fvia-C -optc-O3 -fexcess-precision -optc-march=native #-} {-# INLINE trigamma #-} trigamma :: Double -> Double trigamma x = go 0 (x' - 1) p' where x' = x + 6 p = 1 / (x' * x') p' =(((((0.075757575757576*p-0.033333333333333)*p+0.0238095238095238) *p-0.033333333333333)*p+0.166666666666667)*p+1)/x'+0.5*p go :: Int -> Double -> Double -> Double go !i !x !p | i >= 6 = p | otherwise = go (i+1) (x-1) (1 / (x*x) + p)

I do not have your testuite, but this gives the following asm:

 A_zdwgo_info: cmpq $5, %r14 jg .L3 movsd .LC0(%rip), %xmm7 movapd %xmm5, %xmm8 movapd %xmm7, %xmm9 mulsd %xmm5, %xmm8 leaq 1(%r14), %r14 divsd %xmm8, %xmm9 subsd %xmm7, %xmm5 addsd %xmm9, %xmm6 jmp A_zdwgo_info

Which looks fine. This is the type of code that the -fllvm backend -fllvm . Good job.

GCC expands the loop, and the only way to do this is through either Template Haskell or manual expansion. You might think that (macro TH) if you do a lot of this.

In fact, the LLVM GHC backend loops around :-)

Finally, if you really like the original version of Haskell, write it using stream combinators and the GHC will convert them back to loops. (Exercise for the reader).

How to improve the performance of this numerical computation in Haskell?

More articles: