How to implement a platform-independent garbage collector?

Basically, I'm interested in writing a platformless garbage collector in C , perhaps using the mark-and-sweep algorithm or one of its common variations. Ideally, the interface would work in the following lines:

(1) gc_alloc() allocates memory

(2) gc_realloc() reallocates memory

(3) gc_run() starts the garbage collector.

I have already reviewed the libgc garbage collection libgc developed by Boehm et. al., but it is platform independent; it was just ported to many different systems. I would like to implement a garbage collector that does not contain system code. Speed ​​is not a big issue.

Any suggestions?

+4
source share
2 answers

Unfortunately, it is really impossible to make a truly platform-independent garbage collector in C. A strict reading of the C standard allows any type (except unsigned char ) to have bit-bit traps, which, when they are wrong, cause the system to signal exception (i.e. undefined behavior). When scanning selected blocks for pointers, you cannot determine whether a particular memory block contains the value of a legal pointer, or if it will be a trap as soon as you try to look at the value in it.

Checking pointers as int does not help either - an int type is not required for compatibility of the view with a pointer. intptr_t is only available on the latest compilers, and I don't think its presentation should be compatible. And ints can have traps.

You also do not know the requirements for aligning pointers. On a platform where pointers do not have alignment requirements (i.e., can start with any byte), this means that you need to stop at each byte, memcpy to the appropriate type of pointer and check the result. Oh, and different types of pointers may also have different representations, which is also inevitable.

But the big problem is finding the root set. Bohem GC and others typically scan the stack, as well as static data, for pointers that must go in the root set. This is impossible without knowledge of the OS memory layout . Thus, you will need to explicitly tell the user the root set element that hits the garbage collection point.

So, in short, you cannot make a GC in a truly portable C. In principle, you can make a few assumptions:

  • Suppose the root set is explicitly provided to you by the user.
  • Assume that there are no bit traps in pointers or views.
  • Suppose that intptr_t exists or that all void * are assumed to be strictly ordered (i.e., < and > work reasonably with pointers from different malloc ).
  • Suppose all types of data pointers have a compatible representation with void * .
  • Optional, but gives a great speedup: Hardcode - alignment of pointers (this is far from universal and should be specific to the compiler and platform). This assumption will allow you to skip memcpy ing pointers to a known -limited location, and also reduce the number of potential pointers to check.

If you make these assumptions, you can create a conservative scan marker. Use the binary tree to store information about where distributions are located, and view all possible aligned pointer locations in the highlighted blocks for pointers. However, the need to explicitly provide the root set will make all this pointless - it will be malloc and free again, except that for a specific undefined set of objects, you can skip it. Not exactly what the GC should provide, but I suppose it could be a place, for example, part of a virtual machine (in this case, the root set will be obtained from the information available to the virtual machine).

Please note that all this applies only to conservative GCs, i.e. those that work blindly, scanning pointers in the data, not knowing where it might be. If you work with a virtual machine, it is much simpler - you can create a single data type for all distributions by a virtual machine that explicitly lists where pointers can be found. With this plus an explicit set of roots, you can build a non-conservative GC; this should be enough to create a virtual machine or interpreter.

+10
source

For the mark and sweep algorithm, all you really need to do is figure out which objects are available from the root set, right? (It has been a long time since I dug into it ...)

This can be controlled by a separate object graph for objects managed by the GC, and β€œall” you will need to add functions to properly manage this graph whenever you select or modify managed objects. If you also add reference counting for managed objects, it will be easier for you to calculate which ones are available directly from stack links.

It is probably possible to write quite platform-independent, although it may be debatable whether this will be a real garbage collector.

A simple pseudo code to show what I mean by reference counting and graph control:

 some_object * pObject = gc_alloc(sizeof(some_object)); some_container * pContainer = gc_alloc(sizeof(some_container)); pContainer->pObject = pObject; /* let the GC know that pContainer has a reference to pObject */ gc_object_reference(pContainer, pObject); /* a new block of some kind */ { /* let the GC know we have a new reference for pObject */ some_object * pReference = gc_reference(pObject); /* do stuff */ ... /* let the GC know that this reference is no longer used */ gc_left_scope(pReference); } gc_left_scope(pObject); gc_run(); /* should not be able to recycle anything, since there still is a live * reference to pContainer, and that has a reference to pObject */ 
+1
source

Source: https://habr.com/ru/post/1339358/


All Articles