28 November 2022 9:11 PM (mark functions | bdw | garbage collection | ephemerons | finalizers | finalization | guardians | weak references | js | guile | scheme | igalia)
Good evening :) A quick note, tonight: I've long thought that ephemerons are primitive and can't be implemented with mark functions and/or finalizers, but today I think I have a counterexample.
For context, one of the goals of the GC implementation I have been working on on is to replace Guile's current use of the Boehm-Demers-Weiser (BDW) conservative collector. Of course, changing a garbage collector for a production language runtime is risky, and for Guile one of the mitigation strategies for this work is that the new collector is behind an abstract API whose implementation can be chosen at compile-time, without requiring changes to user code. That way we can first switch to BDW-implementing-the-new-GC-API, then switch the implementation behind that API to something else.
Abstracting GC is a tricky problem to get right, and I thank the MMTk project for showing that this is possible -- you have user-facing APIs that need to be implemented by concrete collectors, but also extension points so that the user can provide some compile-time configuration too, for example to provide field-tracing visitors that take into account how a user wants to lay out objects.
Anyway. As we discussed last time, ephemerons are usually have explicit support from the GC, so we need an ephemeron abstraction as part of the abstract GC API. The question is, can BDW-GC provide an implementation of this API?
I think the answer is "yes, but it's very gnarly and will kill performance so bad that you won't want to do it."
Consider that the primitives that you get with BDW-GC are custom mark functions, run on objects when they are found to be live by the mark workers; disappearing links, a kind of weak reference; and finalizers, which receive the object being finalized, can allocate, and indeed can resurrect the object.
BDW-GC's finalizers are a powerful primitive, but not one that is useful for implementing the "conjunction" aspect of ephemerons, as they cannot constrain the marker's idea of graph connectivity: a finalizer can only prolong the life of an object subgraph, not cut it short. So let's put finalizers aside.
Weak references have a tantalizingly close kind of conjunction property: if the weak reference itself is alive, and the referent is also otherwise reachable, then the weak reference can be dereferenced. However this primitive only involves the two objects E and K; there's no way to then condition traceability of a third object V to E and K.
We are left with mark functions. These are an extraordinarily powerful interface in BDW-GC, but somewhat expensive also: not inlined, and going against the grain of what BDW-GC is really about (heaps in which the majority of all references are conservative). But, OK. They way they work is, your program allocates a number of GC "kinds", and associates mark functions with those kinds. Then when you allocate objects, you use those kinds. BDW-GC will call your mark functions when tracing an object of those kinds.
Let's assume firstly that you have a kind for ephemerons; then when you go to mark an ephemeron E, you mark the value V only if the key K has been marked. Problem solved, right? Only halfway: you also have to handle the case in which E is marked first, then K. So you publish E to a global hash table, and... well. You would mark V when you mark a K for which there is a published E. But, for that you need a hook into marking V, and V can be any object...
So now we assume additionally that all objects are allocated with user-provided custom mark functions, and that all mark functions check if the marked object is in the published table of pending ephemerons, and if so marks values. This is essentially what a proper ephemeron implementation would do, though there are some optimizations one can do to avoid checking the table for each object before the mark stack runs empty for the first time. In this case, yes you can do it! Additionally if you register disappearing links for the K field in each E, you can know if an ephemeron E was marked dead in a previous collection. Add a pre-mark hook (something BDW-GC provides) to clear the pending ephemeron table, and you are in business.
yes, but no
So, it is possible to implement ephemerons with just custom mark functions. I wouldn't want to do it, though: missing the mostly-avoid-pending-ephemeron-check optimization would be devastating, and really what you want is support in the GC implementation. I think that for the BDW-GC implementation in whippet I'll just implement weak-key associations, in which the value is always marked strongly unless the key was dead on a previous collection, using disappearing links on the key field. That way a (possibly indirect) reference from a value V to a key K can indeed keep K alive, but oh well: it's a conservative approximation of what should happen, and not worse than what Guile has currently.
Good night and happy hacking!