Why does the slower version of splitAt work with a lazy template?

Question

Why does the slower version of splitAt work with a lazy template?

The splitAt function can be implemented as follows ( https://wiki.haskell.org/Lazy_pattern_match ):

 import Prelude hiding (splitAt) splitAt :: Int -> [a] -> ([a], [a]) splitAt n xs = if n<=0 then ([], xs) else case xs of [] -> ([], []) y:ys -> case splitAt (n-1) ys of ~(prefix, suffix) -> (y : prefix, suffix) -- Here using lazy pattern match main :: IO () main = do let r = splitAt 1000000 $ repeat 'a' print $ length $ fst r

And using strict pattern matching can slow down a lot.

 time ./lazy -- 0.04s user 0.00s system 90% cpu 0.047 total time ./strict -- 0.12s user 0.02s system 96% cpu 0.147 total

I can not understand the reason. According to the article, a strict version may require more memory and all recursive calls to check if the template is suitable. But I think that the lazy version also needs all recursive calls and needs memory to contain the result of recursive function calls. What makes these differences?

+6

pattern-matching recursion haskell

hliu Feb 10 '17 at 2:42

source share

1 answer

Carl · Accepted Answer · 2017-02-10T03:36:44+0000

There are many differences.

Look at the difference between options with and without ~ on line 11.

Evaluation at GHC Haskell is done by pattern matching. When a pattern is matched in a case or LHS expression of a function definition, this requires the constructors in the pattern to be evaluated. (The patterns in let and where the bindings are treated as lazy pattern matches.) Thus, this means that the splitAt 1000000 (repeat 'a') score of splitAt 1000000 (repeat 'a') depends on the constructor matching (,) resulting from the splitAt 999999 ... recursive call splitAt 999999 ... and t .d., Until the final call splitAt 0 ... This requires stack space for evaluation. In fact, this is a lot. It is very possible that the stack needs to be grown several times to avoid failures.

In addition, the entire line of the result "aaaaa..." is built on the heap until this happens before length ever starts to process it. (Due to the optimization in repeat second element of the result is actually a circular list that never highlights anything new in the entire recursive evaluation.)

When a pattern match becomes lazy, everything changes. The return value from splitAt 1000000 (repeat 'a') is evaluated as ('a':_thunk1, _thunk2) without making a splitAt recursive call. This is a pattern known as guarded shipbuilding. Further evaluation is hidden behind data constructors such as (,) and (:) , and therefore is only performed if another case expression is required.

Calling fst cancels _thunk2 , so it will never be evaluated. The length call begins by consuming the first constructor (:) , throwing the value of 'a' , and then making a recursive call to _thunk1 . At the moment, nothing in memory still refers to the constructor (:) , so the garbage collector can return it when it works further. (The value of 'a' is common, so there are still pointers for it, so it does not receive or stand out during this process.)

What happens when _thunk1 is evaluated a little subtly. This makes the recursive call splitAt 999999 ... This results in ('a':_thunk3, _thunk4) . Nothing holds _thunk4 , so it can be collected whenever it _thunk4 trash. The length evaluation continues as described above. The constructor (:) no longer stored in memory and can be built for free.

Evaluation continues this way, only ever holding onto one constructor (:) on the heap at a time and without writing any stack space at all. GHC garbage collector runtime depends on the size of the resident set. Since in this case there is no more than one constructor constructor (:) , in this case it goes very quickly.

In this case, I suspect that you see the difference in speed. You should try to run two programs with the argument +RTS -s and look at the statistics regarding the maximum resident size and garbage collector time.

Perhaps the GHC is really smart at optimizing. I have not tested it, but in some cases I know that it can rewrite algorithms in terms of an explicit application (:) in terms of build . If this were done, it would allow the merge fold / build merge to completely remove the constructors (:) . (Yes, length is defined in terms of foldr through some really cool tricks for efficiency, mainly in order to work with foldr / build merging.)

If in this case you find an even smaller distribution in the lazy case, but a strict case will be just as bad. I do not think this can happen here, but I am not sure, as I have not tested.

Why does the slower version of splitAt work with a lazy template?

More articles: