Ackermann is very ineffective with Haskell / GHC

I am trying to calculate Ackermann(4,1) , and there is a big performance difference between different languages ​​/ compilers. Below are the results on my Core i7 3820QM, 16G, Ubuntu 12.10 64bit ,

C: 1.6s , gcc -O3 (with gcc 4.7.2)

 int ack(int m, int n) { if (m == 0) return n+1; if (n == 0) return ack(m-1, 1); return ack(m-1, ack(m, n-1)); } int main() { printf("%d\n", ack(4,1)); return 0; } 

OCaml: 3.6s , ocamlopt (with okaml 3.12.1)

 let rec ack = function | 0,n -> n+1 | m,0 -> ack (m-1, 1) | m,n -> ack (m-1, ack (m, n-1)) in print_int (ack (4, 1)) 

ML standard: 5.1s mlton -codegen c -cc-opt -O3 (with mlton 20100608)

 fun ack 0 n = n+1 | ack m 0 = ack (m-1) 1 | ack mn = ack (m-1) (ack m (n-1)); print (Int.toString (ack 4 1)); 

Racket: 11.5 with racket (with racket v5.3.3)

 (require racket/unsafe/ops) (define + unsafe-fx+) (define - unsafe-fx-) (define (ack mn) (cond [(zero? m) (+ n 1)] [(zero? n) (ack (- m 1) 1)] [else (ack (- m 1) (ack m (- n 1)))])) (time (ack 4 1)) 

Haskell: unfinished , killed by the system after 22s ghc -O2 (with ghc 7.4.2)

Haskell: 1.8 with ajhc (with ajhc 0.8.0.4)

 main = print $ ack 4 1 where ack :: Int -> Int -> Int ack 0 n = n+1 ack m 0 = ack (m-1) 1 ack mn = ack (m-1) (ack m (n-1)) 

The Haskell version is the only one that cannot complete properly because it takes up too much memory. He freezes my car and fills the swap space before they kill him. What can I do to improve it without seriously fixing the code?

EDIT . I appreciate some of the asymptotically smart solutions, but they are not quite what I ask. This is more like whether the compiler processes certain patterns in an efficient way (stack, tail calls, unpacking, etc.) than evaluating the ackermann function.

EDIT 2 . As noted by several answers, this looks like a bug in recent versions of GHC . I try the same code with AJHC and get much better performance.

Thank you very much:)

+49
performance optimization benchmarking recursion haskell
Apr 20 '13 at 2:13
source share
7 answers

NB: The high memory issue is a bug in the RHG GHC , where when overflowing and stacking new stacks on the heap, it was not checked whether garbage collection is required. It is already fixed in the GHC HEAD.




I was able to get much better performance with the ack CPS conversion:

 module Main where data P = P !Int !Int main :: IO () main = print $ ack (P 4 1) id where ack :: P -> (Int -> Int) -> Int ack (P 0 n) k = k (n + 1) ack (P m 0) k = ack (P (m-1) 1) k ack (P mn) k = ack (P m (n-1)) (\a -> ack (P (m-1) a) k) 

Your original function consumes all available memory on my machine, while it works in a constant space.

 $ time ./Test 65533 ./Test 52,47s user 0,50s system 96% cpu 54,797 total 

Ocaml is still faster:

 $ time ./test 65533./test 7,97s user 0,05s system 94% cpu 8,475 total 

Edit: When compiling with JHC, your original program is about as fast as the Ocaml version:

 $ time ./hs.out 65533 ./hs.out 5,31s user 0,03s system 96% cpu 5,515 total 

Edit 2: Something else I discovered: running the original program with a large package size ( +RTS -kc1M ) makes it work in constant space. However, the CPS version is still a little faster.

Edit 3: I was able to create a version that runs as fast as Ocaml, by manually deploying the main loop. However, it only works when starting with +RTS -kc1M (Dan Doel sent an error about this behavior):

 {-# LANGUAGE CPP #-} module Main where data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Int ack0 :: Int -> Int ack0 n =(n+1) #define C(a) a #define CONCAT(a,b) C(a)C(b) #define AckType(M) CONCAT(ack,M) :: Int -> Int AckType(1) AckType(2) AckType(3) AckType(4) #define AckDecl(M,M1) \ CONCAT(ack,M) n = case n of { 0 -> CONCAT(ack,M1) 1 \ ; 1 -> CONCAT(ack,M1) (CONCAT(ack,M1) 1) \ ; _ -> CONCAT(ack,M1) (CONCAT(ack,M) (n-1)) } AckDecl(1,0) AckDecl(2,1) AckDecl(3,2) AckDecl(4,3) ack :: P -> (Int -> Int) -> Int ack (P mn) k = case m of 0 -> k (ack0 n) 1 -> k (ack1 n) 2 -> k (ack2 n) 3 -> k (ack3 n) 4 -> k (ack4 n) _ -> case n of 0 -> ack (P (m-1) 1) k 1 -> ack (P (m-1) 1) (\a -> ack (P (m-1) a) k) _ -> ack (P m (n-1)) (\a -> ack (P (m-1) a) k) main :: IO () main = print $ ack (P 4 1) id 

Testing:

 $ time ./Test +RTS -kc1M 65533 ./Test +RTS -kc1M 6,30s user 0,04s system 97% cpu 6,516 total 

Change 4 . Space leak seems to be fixed in GHC HEAD , therefore +RTS -kc1M will not be in the future.

+36
Apr 20 '13 at 3:00
source share

There seems to be some kind of mistake. What version of GHC are you using?

With GHC 7, I get the same behavior as you. The program consumes all available memory without producing any output.

However, if I compile it with GHC 6.12.1 only with ghc --make -O2 Ack.hs , it works fine. It calculates the result in 10.8s on my computer, and the open source version is 7.8s .

I suggest you report this error on the GHC website .

+13
Apr 20 '13 at
source share

This version uses some properties of the ackermann function. This is not equivalent to other versions, but it is quick:

 ackermann :: Int -> Int -> Int ackermann 0 n = n + 1 ackermann m 0 = ackermann (m - 1) 1 ackermann 1 n = n + 2 ackermann 2 n = 2 * n + 3 ackermann 3 n = 2 ^ (n + 3) - 3 ackermann mn = ackermann (m - 1) (ackermann m (n - 1)) 

Edit: And this is the version with memoization, we see that it is easy to memoize functions in haskell, the only change is on the call site:

 import Data.Function.Memoize ackermann :: Integer -> Integer -> Integer ackermann 0 n = n + 1 ackermann m 0 = ackermann (m - 1) 1 ackermann 1 n = n + 2 ackermann 2 n = 2 * n + 3 ackermann 3 n = 2 ^ (n + 3) - 3 ackermann mn = ackermann (m - 1) (ackermann m (n - 1)) main :: IO () main = print $ memoize2 ackermann 4 2 
+7
Apr 20 '13 at 10:01
source share

The following is an idiomatic version that uses Haskell laziness and GHC optimization for top-level constant expressions.

 acks :: [[Int]] acks = [ [ case (m, n) of (0, _) -> n + 1 (_, 0) -> acks !! (m - 1) !! 1 (_, _) -> acks !! (m - 1) !! (acks !! m !! (n - 1)) | n <- [0..] ] | m <- [0..] ] main :: IO () main = print $ acks !! 4 !! 1 

Here we lazily construct a matrix for all values ​​of the Ackerman function. As a result, subsequent calls to acks will not compromise anything (i.e., evaluating acks !! 4 !! 1 will not double the execution time again).

Although this is not the fastest solution, it is very similar to a naive implementation, it is very efficient in terms of memory usage, and it processes one of the powerful Haskell functions (laziness) as a force.

+5
Apr 20 '13 at 3:10
source share

I don’t see that this is a mistake at all, ghc simply does not use the fact that it knows that 4 and 1 are the only arguments from which the function will ever be called - this, roughly speaking, it does not deceive. It also does not do constant math for you, so if you wrote main = print $ ack (2+2) 1 , it would not calculate 2 + 2 = 4 before the execution time. ghc has much more important things to think about. Help for the latter difficulty is available if you take care of it http://hackage.haskell.org/package/const-math-ghc-plugin .

So ghc helps if you do a little math, for example. this is at least crystal time as fast as your C program with 4 and 1 as arguments. But try with 4 and 2:

 main = print $ ack 4 2 where ack :: Int -> Integer -> Integer ack 0 n = n + 1 ack 1 n = n + 2 ack 2 n = 2 * n + 3 ack m 0 = ack (m-1) 1 ack mn = ack (m-1) (ack m (n-1) ) 

This will give the correct answer, all ~ 20,000 digits, for a tenth of a second, while gcc, with your algorithm, will return forever to give the wrong answer.

+4
Apr 20
source share

Writing an algorithm in Haskell in a way that is similar to the way you wrote it in C is not the same algorithm, because the semantics of recursion are completely different.

Here is a version using the same mathematical algorithm, but where we present the calls to the Ackermann function symbolically using a data type. Thus, we can more precisely control the semantics of recursion.

When compiling with optimizations, this version works in read-only memory, but it is slow - about 4.5 minutes in an environment similar to yours. But I’m sure that it could be changed to be much faster. This is just to give an idea.

 data Ack = Ack !Int ack :: Int -> Int -> Int ack mn = length . ackR $ Ack m : replicate n (Ack 0) where ackR n@(Ack 0 : _) = n ackR n = ackR $ ack' n ack' [] = [] ack' (Ack 0 : n) = Ack 0 : ack' n ack' [Ack m] = [Ack (m-1), Ack 0] ack' (Ack m : n) = Ack (m-1) : ack' (Ack m : decr n) decr (Ack 0 : n) = n decr n = decr $ ack' n 
+4
Apr 23 '13 at 18:09
source share

This performance issue (with the exception of the GHC RTS error, obviously) is apparently now fixed in OS X 10.8 after upgrading from Apple XCode to 4.6.2 . I can still play it on Linux (I tested using the GHC LLVM backend, though), but not anymore on OS X. After I upgraded Xcode to 4.6.2, the new version seems to have influenced the generation of GHC code for Ackermann is significant (from what I remember looking at the preliminary update of the object dumps). I was able to reproduce the performance issue on Mac before the Xcode upgrade - I have no numbers, but they were completely bad. Thus, it seems that the Xcode update has improved the generation of GHC code for Ackermann.

Now the C and GHC versions are pretty close. C code:

 int ack(int m,int n){ if(m==0) return n+1; if(n==0) return ack(m-1,1); return ack(m-1, ack(m,n-1)); } 

Ack runtime (4.1):

 GCC 4.8.0: 2.94s Clang 4.1: 4s 

Haskell Code:

 ack :: Int -> Int -> Int ack 0 n = n+1 ack m 0 = ack (m-1) 1 ack mn = ack (m-1) (ack m (n-1)) 

Runtime ack 4 1 (s + RTS -kc1M):

 GHC 7.6.1 Native: 3.191s GHC 7.6.1 LLVM: 3.8s 

All were compiled with the -O2 flag (and -rtsopts for GHC to bypass the RTS error). However, this is quite a head. The Xcode update seems to have had a big impact on Ackermann's optimization in the GHC.

+3
Apr 27 '13 at 3:05
source share



All Articles