There are many differences.
Look at the difference between options with and without ~
on line 11.
Evaluation at GHC Haskell is done by pattern matching. When a pattern is matched in a case or LHS expression of a function definition, this requires the constructors in the pattern to be evaluated. (The patterns in let and where the bindings are treated as lazy pattern matches.) Thus, this means that the splitAt 1000000 (repeat 'a')
score of splitAt 1000000 (repeat 'a')
depends on the constructor matching (,)
resulting from the splitAt 999999 ...
recursive call splitAt 999999 ...
and t .d., Until the final call splitAt 0 ...
This requires stack space for evaluation. In fact, this is a lot. It is very possible that the stack needs to be grown several times to avoid failures.
In addition, the entire line of the result "aaaaa..."
is built on the heap until this happens before length
ever starts to process it. (Due to the optimization in repeat
second element of the result is actually a circular list that never highlights anything new in the entire recursive evaluation.)
When a pattern match becomes lazy, everything changes. The return value from splitAt 1000000 (repeat 'a')
is evaluated as ('a':_thunk1, _thunk2)
without making a splitAt
recursive call. This is a pattern known as guarded shipbuilding. Further evaluation is hidden behind data constructors such as (,)
and (:)
, and therefore is only performed if another case expression is required.
Calling fst
cancels _thunk2
, so it will never be evaluated. The length
call begins by consuming the first constructor (:)
, throwing the value of 'a'
, and then making a recursive call to _thunk1
. At the moment, nothing in memory still refers to the constructor (:)
, so the garbage collector can return it when it works further. (The value of 'a'
is common, so there are still pointers for it, so it does not receive or stand out during this process.)
What happens when _thunk1
is evaluated a little subtly. This makes the recursive call splitAt 999999 ...
This results in ('a':_thunk3, _thunk4)
. Nothing holds _thunk4
, so it can be collected whenever it _thunk4
trash. The length
evaluation continues as described above. The constructor (:)
no longer stored in memory and can be built for free.
Evaluation continues this way, only ever holding onto one constructor (:)
on the heap at a time and without writing any stack space at all. GHC garbage collector runtime depends on the size of the resident set. Since in this case there is no more than one constructor constructor (:)
, in this case it goes very quickly.
In this case, I suspect that you see the difference in speed. You should try to run two programs with the argument +RTS -s
and look at the statistics regarding the maximum resident size and garbage collector time.
Perhaps the GHC is really smart at optimizing. I have not tested it, but in some cases I know that it can rewrite algorithms in terms of an explicit application (:)
in terms of build
. If this were done, it would allow the merge fold / build merge to completely remove the constructors (:)
. (Yes, length
is defined in terms of foldr
through some really cool tricks for efficiency, mainly in order to work with foldr / build merging.)
If in this case you find an even smaller distribution in the lazy case, but a strict case will be just as bad. I do not think this can happen here, but I am not sure, as I have not tested.