GHC RTS garbage collection options

Question


I have a Haskell program that processes a text file and builds a Map (with several million elements). The whole run can take 2-3 minutes. I found that setting the -H and -A options makes a big difference in runtime.

There is documentation about these RTS options, but I find it hard to read because I don't know the algorithms and terminology of GC theory. I am looking for a less technical explanation, preferably one specific to Haskell/GHC. Are there any references on choosing sensible values for these options?

EDIT: Here is the code; it builds a trie for a given list of words.

    buildTrie :: [B.ByteString] -> MyDFA
    buildTrie l = fst3 $ foldl' step (emptyDFA, B.empty, 1) $ sort $ map B.reverse l
      where
        step :: (MyDFA, B.ByteString, Int) -> B.ByteString -> (MyDFA, B.ByteString, Int)
        step (dfa, lastWord, newIndex) newWord =
            (insertNewStates, newWord, newIndex + B.length newSuffix)
          where
            (pref, lastSuffix, newSuffix) = splitPrefix lastWord newWord
            branchPoint = transStar dfa pref
            -- new state labels for the newSuffix path
            newStates = [newIndex .. newIndex + B.length newSuffix - 1]
            -- insert newStates
            insertNewStates =
                foldl' (flip insertTransition) dfa $
                    zip3 (branchPoint : init newStates) (B.unpack newSuffix) newStates

+37

performance garbage-collection haskell ghc

Daniel Velkov Jul 03 '10 at 15:26


2 answers

You can probably reproduce the problem on smaller data sets, where it is easier to see what is going on. In particular, I suggest getting familiar with profiling:

the profiling chapter of the GHC manual documents the available features

Then check whether the memory profiles match your expectations (you don't need to know all of the profiling options to get useful graphs). Combining a strict foldl' with a non-strict tuple as the accumulator is the first thing I would look at: if the components of the tuple are not forced, the recursion builds up unevaluated expressions.

By the way, you can build a good intuition for such things by evaluating your code by hand on really tiny data sets. A few iterations are enough to see whether expressions get evaluated or stay unevaluated in the way your application needs.
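To make the accumulator point concrete, here is a minimal sketch (not the asker's code) that folds a list into a (sum, count) pair twice: once with an ordinary lazy tuple and once with bang patterns on the components. The names sumCountLazy and sumCountStrict are made up for illustration. Note that with optimisation turned on GHC's strictness analysis may rescue the lazy version, so the difference is easiest to see in a heap profile of an unoptimised build.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Lazy accumulator: foldl' forces only the pair constructor (WHNF),
-- so the two components can grow into long chains of pending (+) thunks.
sumCountLazy :: [Int] -> (Int, Int)
sumCountLazy = foldl' step (0, 0)
  where
    step (s, n) x = (s + x, n + 1)

-- Strict accumulator: the bang patterns force the components produced
-- by the previous step, so at most one pending addition exists at a time.
sumCountStrict :: [Int] -> (Int, Int)
sumCountStrict = foldl' step (0, 0)
  where
    step (!s, !n) x = (s + x, n + 1)
```

Both functions return the same result; they differ only in how much unevaluated work sits on the heap during the fold. The same idea would apply to a three-component accumulator like the one in the question, assuming its components are cheap to force.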

+8

claus Jul 03 '10 at 21:47


Simon Marlow · Accepted Answer · 2010-07-03 19:56

Generally speaking, garbage collection is a space/time trade-off. Give the GC more space and it will take less time. There are (many) other factors in play, cache behaviour in particular, but the space/time trade-off is the most important one.

The trade-off works like this: the program allocates memory until it reaches some limit (determined by the GC's automatic tuning or set explicitly through RTS options). When the limit is reached, the GC traces all the data the program is currently using and reclaims all the memory occupied by data that is no longer needed. The longer you can delay this process, the more data will have become unreachable ("dead") in the meantime, so the GC avoids tracing that data. The only way to delay a GC is to make more memory available for allocation; hence more memory means fewer GCs, which means lower GC overhead. Roughly speaking, GHC's -H option lets you set a lower bound on the amount of memory used by the GC, so it can reduce GC overhead.
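One way to see this trade-off in your own program is to ask the runtime how much time it spent collecting. The following is a rough sketch using GHC.Stats from base; the helper names gcFraction and reportGC are invented here, and the statistics are only populated when the program is started with +RTS -T.

```haskell
import Data.Int (Int64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Fraction of elapsed time spent in GC, given GC and total nanoseconds.
-- Returns 0 when no time has been recorded, to avoid dividing by zero.
gcFraction :: Int64 -> Int64 -> Double
gcFraction gcNs totalNs
  | totalNs <= 0 = 0
  | otherwise = fromIntegral gcNs / fromIntegral totalNs

-- Print a short GC summary; intended to be called at the end of main.
reportGC :: IO ()
reportGC = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "RTS stats are disabled; rerun with +RTS -T"
    else do
      st <- getRTSStats
      putStrLn $ "collections:       " ++ show (gcs st)
      putStrLn $ "major collections: " ++ show (major_gcs st)
      putStrLn $ "max live bytes:    " ++ show (max_live_bytes st)
      putStrLn $ "GC share of time:  "
        ++ show (gcFraction (gc_elapsed_ns st) (elapsed_ns st))
```

Running the same program with different -H values and comparing the number of collections and the GC share of time should show the effect described above, provided the workload is otherwise the same.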

GHC uses a generational GC, which is an optimisation of the basic scheme in which the heap is divided into two or more generations. Objects are allocated into the "young" generation, and those that live long enough are promoted into the "old" generation (in a 2-generation setup). The young generation is collected much more often than the old generation, the idea being that "most objects die young", so young-generation collections are cheap (they don't trace much data) but reclaim a lot of memory. Roughly speaking, the -A option sets the size of the young generation, that is, the amount of memory that will be allocated before the young generation is collected.
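If you are unsure which generation count and sizes a given run is actually using, the program can ask the RTS. This sketch reads the GC flags through GHC.RTS.Flags from base. It assumes that the RTS reports these sizes in blocks and that a block is 4096 bytes, which is my understanding for common platforms but is not something the answer above states; blocksToBytes and showGCSettings are illustrative names. Defaults have changed between GHC releases, so treat the printed values as the source of truth for your build.

```haskell
import GHC.RTS.Flags (GCFlags (..), getGCFlags)

-- ASSUMPTION: sizes in GCFlags are counted in RTS blocks of 4096 bytes.
blockBytes :: Integer
blockBytes = 4096

-- Convert a block count to bytes under the assumption above.
blocksToBytes :: Integral a => a -> Integer
blocksToBytes n = toInteger n * blockBytes

-- Print the GC settings in effect for the current process.
showGCSettings :: IO ()
showGCSettings = do
  gc <- getGCFlags
  putStrLn $ "generations (-G):            " ++ show (generations gc)
  putStrLn $ "allocation area (-A), bytes: "
    ++ show (blocksToBytes (minAllocAreaSize gc))
  putStrLn $ "suggested heap (-H), bytes:  "
    ++ show (blocksToBytes (heapSizeSuggestion gc))
```

Calling showGCSettings at program start, with and without +RTS -A... -H... on the command line, is a quick way to confirm that the options were picked up.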

The default value for -A is 512k: it is a good idea to keep the young generation smaller than the L2 cache, and performance generally drops if you exceed the L2 cache size. But pulling in the opposite direction is the GC space/time trade-off: a very large young generation may outweigh the cache benefits by reducing the amount of work the GC has to do. This does not always happen; it depends on the dynamics of the application, which makes it hard for the GC to tune itself automatically. The -H option also increases the size of the young generation, so it can also have an adverse effect on cache usage.

The bottom line is: play around with the options and see what works. If you have plenty of memory to spare, you may well get a performance boost by using either -A or -H, but not necessarily.
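Trying the options by hand gets tedious, so a small driver can run the program under several settings and print the wall-clock time of each. This is only a sketch: it relies on the process package that ships with GHC, the names rtsArgs, timeRun and sweep are invented, and the target executable is assumed to have been linked with -rtsopts so that it accepts -A and -H on the command line.

```haskell
import GHC.Clock (getMonotonicTime)
import System.Process (callProcess)

-- Build the RTS argument list for one configuration, e.g. -A4m -H1g.
rtsArgs :: String -> String -> [String]
rtsArgs a h = ["+RTS", "-A" ++ a, "-H" ++ h, "-RTS"]

-- Run one command to completion and return its wall-clock time in seconds.
timeRun :: FilePath -> [String] -> IO Double
timeRun exe args = do
  t0 <- getMonotonicTime
  callProcess exe args
  t1 <- getMonotonicTime
  pure (t1 - t0)

-- Time the program once for every combination of the given -A and -H sizes.
sweep :: FilePath -> [String] -> [String] -> [String] -> IO ()
sweep exe progArgs aSizes hSizes =
  mapM_ run [(a, h) | a <- aSizes, h <- hSizes]
  where
    run (a, h) = do
      secs <- timeRun exe (progArgs ++ rtsArgs a h)
      putStrLn $ "-A" ++ a ++ " -H" ++ h ++ ": " ++ show secs ++ " s"
```

For example, sweep "./buildtrie" ["words.txt"] ["512k", "4m", "64m"] ["64m", "512m"] would time six configurations; the executable and input file names here are hypothetical. A single run per setting is noisy, so repeat the measurements before drawing conclusions.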
