Unfortunately, it is really impossible to make a truly platform-independent garbage collector in C. A strict reading of the C standard allows any type (except unsigned char ) to have bit-bit traps, which, when they are wrong, cause the system to signal exception (i.e. undefined behavior). When scanning selected blocks for pointers, you cannot determine whether a particular memory block contains the value of a legal pointer, or if it will be a trap as soon as you try to look at the value in it.
Checking pointers as int does not help either - an int type is not required for compatibility of the view with a pointer. intptr_t is only available on the latest compilers, and I don't think its presentation should be compatible. And ints can have traps.
You also do not know the requirements for aligning pointers. On a platform where pointers do not have alignment requirements (i.e., can start with any byte), this means that you need to stop at each byte, memcpy to the appropriate type of pointer and check the result. Oh, and different types of pointers may also have different representations, which is also inevitable.
But the big problem is finding the root set. Bohem GC and others typically scan the stack, as well as static data, for pointers that must go in the root set. This is impossible without knowledge of the OS memory layout . Thus, you will need to explicitly tell the user the root set element that hits the garbage collection point.
So, in short, you cannot make a GC in a truly portable C. In principle, you can make a few assumptions:
- Suppose the root set is explicitly provided to you by the user.
- Assume that there are no bit traps in pointers or views.
- Suppose that
intptr_t exists or that all void * are assumed to be strictly ordered (i.e., < and > work reasonably with pointers from different malloc ). - Suppose all types of data pointers have a compatible representation with
void * . - Optional, but gives a great speedup: Hardcode - alignment of pointers (this is far from universal and should be specific to the compiler and platform). This assumption will allow you to skip
memcpy ing pointers to a known -limited location, and also reduce the number of potential pointers to check.
If you make these assumptions, you can create a conservative scan marker. Use the binary tree to store information about where distributions are located, and view all possible aligned pointer locations in the highlighted blocks for pointers. However, the need to explicitly provide the root set will make all this pointless - it will be malloc and free again, except that for a specific undefined set of objects, you can skip it. Not exactly what the GC should provide, but I suppose it could be a place, for example, part of a virtual machine (in this case, the root set will be obtained from the information available to the virtual machine).
Please note that all this applies only to conservative GCs, i.e. those that work blindly, scanning pointers in the data, not knowing where it might be. If you work with a virtual machine, it is much simpler - you can create a single data type for all distributions by a virtual machine that explicitly lists where pointers can be found. With this plus an explicit set of roots, you can build a non-conservative GC; this should be enough to create a virtual machine or interpreter.