How Haskell Garbage Collector Effectively Collects Trees

Question

How Haskell Garbage Collector Effectively Collects Trees

This code from the answer to this question , copied below, pretty nicely occupies only the O(n) space to make the depth of the first traversal of a tree of depth n that contains O(2^n) nodes. This is very good, the garbage collector seems to be doing a good job of cleaning up an already processed tree.

But my question I have is how? Unlike the list, when we process the first element, we can completely forget it, we cannot refuse the root of the node after processing the first leaf of the node. We need to wait until the left half of the tree is processed (because in the end we will have to cross to the right of the root). In addition, since the root of the node points to the nodes below it, etc., down to the leaves, which apparently means that we will not be able to collect any of the first half of the tree until we start from the second half (since all these nodes will have links to them, starting from the remaining living root of the node). This, fortunately, is not the case, but can someone explain how?

 import Data.List (foldl') data Tree = Tree Int Tree Tree tree n = Tree n (tree (2 * n)) (tree (2 * n + 1)) treeOne = tree 1 depthNTree nt = go nt [] where go 0 (Tree x _ _) = (x:) go n (Tree _ lr) = go (n - 1) l . go (n - 1) r main = do x <- getLine print . foldl' (+) 0 . filter (\x -> x `rem` 5 == 0) $ depthNTree (read x) treeOne

+5

garbage-collection haskell

Clinton Dec 23 '15 at 10:37

source share

3 answers

In fact, you do not hold on to the root when you lower the left subtree.

 go n (Tree _ lr) = go (n - 1) l . go (n - 1) r

Thus, the root turns into two tricks, composed together. One contains a link to the left subtree, the other contains a link to the right subtree. The root node itself is garbage.

The left and right subtrees themselves are just thunks, because the tree is created lazily, so they do not consume much space yet.

We evaluate only go n (Tree _ lr) because we evaluate depthNTree nt , which is go nt [] . Thus, we immediately force the two created go calls to simply turn the root into:

 (go (n - 1) l . go (n - 1) r) [] = (go (n - 1) l) ((go (n - 1) r) [])

And since this is lazily evaluated, we first make an external call, leaving ((go (n - 1) r) []) as thunk (and therefore do not generate more r ).

The recursion in go will force l , so we will create more of this. But then we do the same thing again one level down; again, that the node tree becomes garbage immediately, we generate two tricks, holding the left and right sub-trees, and then we force only the left one.

After n calls, we will evaluate go 0 (Tree x _ _) = (x:) . We created pairs of ten n and forcibly left n left, leaving the necessary in memory; because the right subtrees are unvalued thunks, they are constant space each, and there are only n of them, therefore only O(n) space. And all the nodes of the tree leading to this path are now not found.

In fact, we have an external list constructor (and the first element of the list). Forcing more of this list will examine these regular subtrees expanding along the composition chain, but there will be no more than n .

Technically, you linked the link to tree 1 in the treeOne with global reach, so in fact you could keep the link to every node that you ever produced, so you rely on GHC to treeOne that treeOne only ever used once and should not be saved.

+4

Ben Dec 23 '15 at 12:11

source share

We rewrite the recursive case go as

 go nt = case t of Tree _ lr -> go (n - 1) l . go (n - 1) r

On the right side of the case alternative, the source tree t no longer lives. Only l and r are alive. So, if we first go back to l , say, nothing holds the left side of the tree except l ; r precisely keeps the right side of the tree alive.

At any point in the recursion, live nodes are exactly the roots of subtrees cut off along the path from the original root of the tree to the node being checked, which have not yet been processed. There is no more than the path length of these subtrees, therefore the use of the space is O (n).

The key is that the original tree t becomes dead before we recurs. If you write (denotationally equivalent, but bad style for a number of reasons)

 leftChild (Tree _ lr) = l rightChild (Tree _ lr) = r go nt = go (n - 1) (leftChild t) . go (n - 1) (rightChild t)

now that the recursion is in go (n - 1) (leftChild t) , the reference to t still stored in the unvalued expression rightChild t t . Consequently, the use of space is now exponential.

+3

Reid barton Dec 23 '15 at 12:14

source share

András kovács · Accepted Answer · 2015-12-23T12:22:06+0000

I wrote a small manual tree estimate for depth 2. I hope it can illustrate why the tree nodes can be garbage collected along the way.

Suppose we start with a tree as follows:

 tree = Tree (Tree _ -- l (Tree a _ _) -- ll (Tree b _ _)) -- lr (Tree _ -- r (Tree c _ _) -- rl (Tree d _ _)) -- rr

Now call depthNTree 2 tree :

 go 2 tree [] go 2 (Tree _ lr) [] go 1 l (go 1 r []) go 1 (Tree _ ll lr) (go 1 r []) go 0 ll (go 0 lr (go 1 r [])) go 0 (Tree a _ _) (go 0 lr (go 1 r [])) a : go 0 lr (go 1 r []) -- gc can collect ll a : go 0 (Tree b _ _) (go 1 r []) a : b : go 1 r [] -- gc can collect lr and thus l a : b : go 1 (Tree _ rl rr) [] a : b : go 0 rl (go 0 rr []) a : b : go 0 (Tree c _ _) (go 0 rr []) a : b : c : go 0 rr [] -- gc can collect rl a : b : c : go 0 (Tree d _ _) [] a : b : c : d : [] -- gc can collect rr and thus r and tree

Note that since treeOne is a static value, there must be some additional hardware behind the scenes that allows garbage collection. Fortunately, the GHC supports GC static values.

How Haskell Garbage Collector Effectively Collects Trees

More articles: