Mixing tags and sweeps with a link

I am developing a very simple compiler (such as academic research), and I am considering introducing a simple GC count, as well as tags and scans. The idea is that reference counting can release dead objects very early when there are no loops, and this leaves the following idea: Mark and sweep are usually associated with some pauses due to the fact that the marking and sweep process must go through many links, therefore : Shouldn't recounting previuos mark and speed up (fewer items)? Is this idea nonsense? I had never realized such a complicated thing before, and I don’t want to work hard to find that it was a very bad idea.

+4
source share
3 answers

If you are developing a new GC language environment, the deterministic semantics of resource collection / deletion should be constructed rather than locked up as an afterthought (as is the case with, for example, IDisposable ). However, reference counting should not be considered part of the GC because maintaining absolutely reliable reference counts in a multi-threaded environment is expensive and it is absolutely necessary to avoid any possibility of reusing memory while the reference exists, even if the reference counter erroneously reports that the object is no longer needed. Note that prematurely freeing a resource is nowhere as bad as prematurely freeing memory, because the code that frees a resource can invalidate an object that encapsulates it. Even if the object is dead, any references to it will be valid links to the identified object. On the contrary, if the memory was recycled when the link still exists, the link itself will become invalid.

What I would suggest you take a look at this, if you want to maximize the performance of GC in a simple system, it will provide good support for safely freezing links stored in objects [for example. that fields marked readonly can only be written to the constructor before the object is frozen, and the constructor must freeze the object before exposing it to external code]. The β€œmark” or β€œcopy” phase of the GC should examine every object that might have new links written for it since the last GC cycle. If the GC is an agnostic for which objects are mutable, it must either examine everything or use the write functions to set a flag the first time the object changes after the GC survives. However, if an object contains only immutable links, then any objects to which it contains links must be older than the object itself [the lifetime of a frozen object begins when it is "frozen" - note that the reference to the object will not exist outside the constructor at this point]. Therefore, objects that do not contain any mutable references can be completely ignored by the garbage collector when executing collections of a higher generation.

+1
source

At first I intuitively agreed, but I came to the conclusion that this intuition is wrong. Although ref-counting will remove some garbage in front of the label and deploy GC (I will call it msgc for short), this does not really help msgc performance. The marking phase never looks at the trash, so early removal of trash using refcounting does not speed up the marking. I'm not too sure about this phase, as it depends on how you implement it, but I can present several strategies that do not depend on the amount of garbage sufficient to make it worthwhile.

Given the added complexity, this is probably not worth it. In any case, you will not get too much performance from plain msgc, and the added cost of refcounts (a larger object title, slower assignment, etc.) will reduce the gain, even if there is one.

+2
source

delnan is correct that using ref counting in addition to marking and sweep will not speed up a significant mark and sweep speed, but it can still serve an important purpose.

If almost all objects are destroyed (using the ref count) as soon as they get out of scope, then your interpreter will not collect almost as much memory, in which case the GC will not need to be called as often.

Link counting takes place at a constant time, and the performance hit occurs in such small pieces that the hit is really not even comparable with the label and sweep algorithm. This is a compromise, but from the point of view of users, microscopic pauses occurring every second may be better than sudden, arbitrary pauses that persist for a considerable period of time. Naturally, it depends on the application.

This is the rationale for combining CPython between the two methods; a circular garbage collector rarely (if ever) should be called because if the program does not have or only a few cycles, most of the memory will be freed up on the fly.

I can tell you from experience, however, implementing and supporting even a simple interpreter that relies on reference counting can be a huge pain (especially if you use simple C).

+1
source

Source: https://habr.com/ru/post/1479578/


All Articles