What is a simple garbage collection algorithm for experiments with a simple interpreter?

Question

What is a simple garbage collection algorithm for experiments with a simple interpreter?

I experimented with the design of a programming language and came to the conclusion that I need to implement a garbage collection system. Now the first thing that came to mind was link counting, but that would not handle link loops. Most of the pages I come across when searching for algorithms are links to setting up garbage collectors in existing languages such as Java. When I find anything describing specific algorithms, I don't get enough details to implement. For example, most descriptions include "when your program runs low on memory ...", which is unlikely to happen in the near future on a 4 GB system with lots of swaps. So I'm looking for some tutorials with good implementation details, such as how to configure when to start the garbage collector (i.e. Collect after X the number of memory allocations or every Y minutes, etc.).

To give a couple of details about what I'm trying to do, I start writing a stack-based script like Postscript, and the next attempt is likely to be an S-expression language based on one of the Lisp dialects. I am implementing a straightforward C. My goal is self-education and document the various stages in the tutorial "How to Design and Write a Translator".

Regarding what I have done so far, I have written a simple interpreter that implements an imperative C style language that is processed and processed by the stack style virtual machine (see lang2e.sourceforge.net). But this language does not allocate new memory when entering any function and does not have any types of pointer data, so at that time there was no need for any type of advanced memory management. For my next project, I am going to start by counting links for objects of type not a pointer (integers, strings, etc.), and then track any object of type a pointer (which can generate circular references) in a separate memory pool, then when the pool growing more than the distribution units X more than at the end of the previous garbage collection cycle, run the collector again.

My requirements are that it is not too inefficient, but is easily implemented and documented clearly (remember, I want to develop this in paper or book so that others can follow). The algorithm that I just got at the front is a three-color marking, but it seems that the generation collector will be a little better, but harder to document and understand. So I'm looking for some clear reference material (preferably online) that includes enough implementation details to get me started.

+6

garbage-collection algorithm interpreter

Derek pressnall May 18 '12 at 1:29

source share

2 answers

When to start the allocator is probably wide open - you could have a GC when the memory allocation would otherwise have failed, or you could have a GC every time the link was reset, or somewhere in between.

Waiting until you have a choice may mean that you will never be a GC if the code works pretty well. Or it can lead to a giant pause in your environment and completely destroy your response time or animation or sound.

Performing a full GC on each free() can lead to lower costs due to more operations, although as a result the whole system can run slower. You might be more predictable, but slower overall.

If you want to test this by artificially restricting memory, you can simply run with very restrictive resource constraints in place. Run ulimit -v 1024 , and each process created by this shell will have only one megabyte of memory to work with.

+1

sarnold May 18 '12 at 1:39

source share

Antti huima · Accepted Answer · 2012-05-18T06:01:42+0000

There's a great book on garbage collection. This is called garbage collection: automatic dynamic memory management algorithms, and that's great. I read it, so I do not recommend it just because you can find it on Google. Look at it here .

For simple prototyping, use the label and scroll, or any simple, non-obedient, non-incremental compact collector. Incremental collectors are only good if you need to provide a real-time response from your system. As long as your system can lag significantly at any given time, you do not need incremental. Generation collectors reduce average waste collection costs by adopting something about the life cycles of objects.

I implemented everything (generation / non-generation, incremental / non-incremental) and garbage collectors for debugging are quite difficult. Since you want to focus on designing your language, and perhaps not so much on debugging a more complex garbage collector, you can keep it simple. Most likely, I would go for a mark-and-scan.

When you use garbage collection, you don't need reference counting. Throw it.

What is a simple garbage collection algorithm for experiments with a simple interpreter?

More articles: