Generally speaking, garbage collection is a space/time tradeoff. Give the GC more space, and it will take less time. There are (many) other factors in play, cache in particular, but the space/time tradeoff is the most important one.
The tradeoff works like this: the program allocates memory until it reaches some limit (decided by the GC's automatic tuning parameters, or explicitly via RTS options). When the limit is reached, the GC traces all the data that the program is currently using, and reclaims all the memory used by data that is no longer required. The longer you can delay this process, the more data will have become unreachable ("dead") in the meantime, so the GC avoids tracing that data. The only way to delay a GC is to make more memory available to allocate into; hence more memory equals fewer GCs, which equals lower GC overhead. Roughly speaking, GHC's -H option lets you set a lower bound on the amount of memory used by the GC, so it can lower the GC overhead.
GHC uses a generational GC, which is an optimisation of the basic scheme in which the heap is divided into two or more generations. Objects are allocated into the "young" generation, and the ones that live long enough get promoted into the "old" generation (in a 2-generation setup). The young generation is collected much more frequently than the old generation, the idea being that "most objects die young", so young-generation collections are cheap (they don't trace much data) but they reclaim a lot of memory. Roughly speaking, the -A option sets the size of the young generation, that is, the amount of memory that will be allocated before the young generation is collected.
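To make the effect of -A concrete, here is a back-of-the-envelope sketch. It assumes one young-generation collection each time the allocation area fills up, which is a simplification and not how the RTS actually does its accounting; the function name `youngCollections` and the 1 GiB total are made up for illustration.

```haskell
-- Rough model only: assumes one young-generation collection every time
-- the allocation area (the size given to -A) fills up.

-- | Approximate number of young-generation collections for a program that
-- allocates 'totalAllocated' bytes with an allocation area of 'allocArea' bytes.
youngCollections :: Integer -> Integer -> Integer
youngCollections totalAllocated allocArea = totalAllocated `div` allocArea

kib, mib, gib :: Integer
kib = 1024
mib = 1024 * kib
gib = 1024 * mib

main :: IO ()
main = do
  let total = 1 * gib  -- hypothetical total allocation of the program
      report (label, area) =
        putStrLn (label ++ ": ~" ++ show (youngCollections total area)
                        ++ " young collections")
  mapM_ report
    [ ("-A512k", 512 * kib)
    , ("-A4m",   4 * mib)
    , ("-A64m",  64 * mib)
    ]
```

The model says nothing about how expensive each collection is; that depends on how much data is still live when the collection happens.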
The default for -A is 512k (at the time of writing): it's a good idea to keep the young generation smaller than the L2 cache, and performance generally drops if you exceed the L2 cache size. But working in the opposite direction is the GC space/time tradeoff: using a very large young generation might outweigh the cache benefits by reducing the amount of work the GC has to do. This doesn't always happen; it depends on the dynamics of the application, which makes it hard for the GC to tune itself automatically. The -H option also increases the young generation size, so it can also have an adverse effect on cache usage.
The bottom line is: play around with the options and see what works. If you have plenty of memory to spare, you might well be able to get a performance increase by using either -A or -H, but not necessarily.
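One way to "see what works" is to have the program print its own GC statistics and compare runs with different settings. The following is a minimal sketch using `GHC.Stats` from base; `workload` and `gcShare` are hypothetical placeholders (swap in your real code), and the file name `GcStats.hs` and the sizes in the comments are only examples.

```haskell
-- Build and run (example sizes, adjust to taste):
--   ghc -O2 -rtsopts GcStats.hs
--   ./GcStats +RTS -T
--   ./GcStats +RTS -T -A64m
--   ./GcStats +RTS -T -H1g
import Control.Monad (unless)
import Data.List (foldl')
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Exit (die)

-- | Hypothetical workload: materialises a list, then sums it.
workload :: Int -> Int
workload n = foldl' (+) 0 (reverse [1 .. n])

-- | Fraction of CPU time spent in the GC, given GC and mutator time
-- in nanoseconds. Returns 0 when no time was recorded at all.
gcShare :: Integer -> Integer -> Double
gcShare gcNs mutNs
  | total == 0 = 0
  | otherwise  = fromIntegral gcNs / fromIntegral total
  where
    total = gcNs + mutNs

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  unless enabled $ die "Run with +RTS -T so the RTS collects statistics."
  print (workload 5000000)
  s <- getRTSStats
  putStrLn ("collections (all):   " ++ show (gcs s))
  putStrLn ("collections (major): " ++ show (major_gcs s))
  putStrLn ("bytes allocated:     " ++ show (allocated_bytes s))
  putStrLn ("max live bytes:      " ++ show (max_live_bytes s))
  putStrLn ("GC share of CPU:     "
            ++ show (gcShare (fromIntegral (gc_cpu_ns s))
                             (fromIntegral (mutator_cpu_ns s))))
```

Compare the total run time as well as the GC share: a larger -A or -H can cut the number of collections and still make the program slower if it hurts cache behaviour.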
Simon Marlow, Jul 03 '10 at 19:56