An instance of a poorly performing non-tail recursive function

Below are two functions defined to determine the maximum value of a list of numbers.

mx :: (Ord a) => [a] -> a mx [] = error "Empty list" mx [x] = x mx (x:xs) | x > (mx xs) = x | otherwise = (mx xs) mx' (x:xs) = findMax x xs where findMax cmx [] = cmx findMax cmx (x:xs) | x > cmx = findMax x xs | otherwise = findMax cmx xs main = do print $ mx [1..30] 

The timing of the above code, first for mx '(tail-recursive) and next for mx (non-tail-recursive), we have the following timings.

 Lenovo-IdeaPad-Y510P:/tmp$ time ./t 30 real 0m0.002s user 0m0.000s sys 0m0.001s Lenovo-IdeaPad-Y510P:/tmp$ ghc -O2 t.hs [1 of 1] Compiling Main ( t.hs, to ) Linking t ... Lenovo-IdeaPad-Y510P:/tmp$ time ./t 30 real 0m6.272s user 0m6.274s sys 0m0.000s 

Can someone explain why there is such a huge performance difference for a list of 30 items?

+6
source share
4 answers

As others have pointed out, the GHC does not perform Total Sub-Expression Cancellation (CSE), resulting in your first fragment to run exponentially.

To understand why, for example,

 test1 = length [1..1000] + sum [1..1000] test2 = let l = [1..1000] in length l + sum l 

Two examples are semantically equivalent, but test1 works in a constant space while test2 in linear space (all 1000 cells are selected). Basically, in this case, CSE denies the benefits of laziness.

Since CSE can lead to poor performance, GHC is quite conservative in use.

Further clarification on the GHC FAQ:

https://www.haskell.org/haskellwiki/GHC/FAQ#Does_GHC_do_common_subexpression_elimination.3F

+11
source

The problem is not so much in tail recursion, but in that in mx you generally calculate mx xs twice: once to compare it with x , and then a second time to return it, Each of these calls itself calls mx xs twice, which then does the same, etc., which leads to exponential runtime.

You can remove this problem by simply storing the result of the first call to use it a second time:

 mx :: (Ord a) => [a] -> a mx [] = error "Empty list" mx [x] = x mx (x:xs) = let mxxs = mx xs in if x > mxxs then x else mxxs 
+9
source

Your second algorithm is linear, and in the end it will make one pass through your list. In this case, your first algorithm has an exponential runtime (which is the worst). You end up essentially checking everything in the list to determine that the first element, 1, is not the maximum. Then you look at the second element, 2, and look at the rest of the list to find out that it is also not maximal.

If you run your program with mx , and values ​​like 22 , 23 , ..., 30 , you will see a clearly exponential increase in runtime.

In particular, this is not just a matter of tail recursion, and not, it is an inefficient recursive algorithm versus efficient. You can implement them in a language without tail recursion and still see mx' better performance over mx .

+5
source

A call to mx [1..3] leads to the following calls:

 mx [1..3] mx [2..3] -- x > (mx xs) in mx [1..3] mx [3] -- x > (mx xs) in mx [1..2] mx [3] -- otherwise = (mx xs) in mx [1..2] mx [2..3] -- otherwise = (mx xs) in mx [1..3] mx [3] -- x > (mx xs) in mx [1..2] mx [3] -- otherwise = (mx xs) in mx [1..2] 

The number mx gives an exact definition of [1..n] O(2^n) : 2^n - 1 .

mx' makes O(n) calls: n + 1 , exactly.

 mx' [1..3] findMax 1 [2, 3] findMax 2 [3] findMax 3 [] 

For n = 30 , as in your test, mx calls 1073741823 and mx' only 29 .

+3
source

Source: https://habr.com/ru/post/978184/


All Articles