whippet lab notebook: on untagged mallocs

Salutations, populations. Today’s note is more of a work-in-progress than usual; I have been finally starting to look at getting Whippet into Guile, and there are some open questions.

inventory

I started by taking a look at how Guile uses the Boehm-Demers-Weiser collector‘s API, to make sure I had all my bases covered for an eventual switch to something that was not BDW. I think I have a good overview now, and have divided the parts of BDW-GC used by Guile into seven categories.

implicit uses

Firstly there are the ways in which Guile’s run-time and compiler depend on BDW-GC’s behavior, without actually using BDW-GC’s API. By this I mean principally that we assume that any reference to a GC-managed object from any thread’s stack will keep that object alive. The same goes for references originating in global variables, or static data segments more generally. Additionally, we rely on GC objects not to move: references to GC-managed objects in registers or stacks are valid across a GC boundary, even if those references are outside the GC-traced graph: all objects are pinned.

Some of these “uses” are internal to Guile’s implementation itself, and thus amenable to being changed, albeit with some effort. However some escape into the wild via Guile’s API, or, as in this case, as implicit behaviors; these are hard to change or evolve, which is why I am putting my hopes on Whippet’s mostly-marking collector, which allows for conservative roots.

defensive uses

Then there are the uses of BDW-GC’s API, not to accomplish a task, but to protect the mutator from the collector: GC_call_with_alloc_lock, explicitly enabling or disabling GC, calls to sigmask that take BDW-GC’s use of POSIX signals into account, and so on. BDW-GC can stop any thread at any time, between any two instructions; for most users is anodyne, but if ever you use weak references, things start to get really gnarly.

Of course a new collector would have its own constraints, but switching to cooperative instead of pre-emptive safepoints would be a welcome relief from this mess. On the other hand, we will require client code to explicitly mark their threads as inactive during calls in more cases, to ensure that all threads can promptly reach safepoints at all times. Swings and roundabouts?

precise tracing

Did you know that the Boehm collector allows for precise tracing? It does! It’s slow and truly gnarly, but when you need precision, precise tracing nice to have. (This is the GC_new_kind interface.) Guile uses it to mark Scheme stacks, allowing it to avoid treating unboxed locals as roots. When it loads compiled files, Guile also adds some sliced of the mapped files to the root set. These interfaces will need to change a bit in a switch to Whippet but are ultimately internal, so that’s fine.

What is not fine is that Guile allows C users to hook into precise tracing, notably via scm_smob_set_mark. This is not only the wrong interface, not allowing for copying collection, but these functions are just truly gnarly. I don’t know know what to do with them yet; are our external users ready to forgo this interface entirely? We have been working on them over time, but I am not sure.

reachability

Weak references, weak maps of various kinds: the implementation of these in terms of BDW’s API is incredibly gnarly and ultimately unsatisfying. We will be able to replace all of these with ephemerons and tables of ephemerons, which are natively supported by Whippet. The same goes with finalizers.

The same goes for constructs built on top of finalizers, such as guardians; we’ll get to reimplement these on top of nice Whippet-supplied primitives. Whippet allows for resuscitation of finalized objects, so all is good here.

misc

There is a long list of miscellanea: the interfaces to explicitly trigger GC, to get statistics, to control the number of marker threads, to initialize the GC; these will change, but all uses are internal, making it not a terribly big deal.

I should mention one API concern, which is that BDW’s state is all implicit. For example, when you go to allocate, you don’t pass the API a handle which you have obtained for your thread, and which might hold some thread-local freelists; BDW will instead load thread-local variables in its API. That’s not as efficient as it could be and Whippet goes the explicit route, so there is some additional plumbing to do.

Finally I should mention the true miscellaneous BDW-GC function: GC_free. Guile exposes it via an API, scm_gc_free. It was already vestigial and we should just remove it, as it has no sensible semantics or implementation.

allocation

That brings me to what I wanted to write about today, but am going to have to finish tomorrow: the actual allocation routines. BDW-GC provides two, essentially: GC_malloc and GC_malloc_atomic. The difference is that “atomic” allocations don’t refer to other GC-managed objects, and as such are well-suited to raw data. Otherwise you can think of atomic allocations as a pure optimization, given that BDW-GC mostly traces conservatively anyway.

From the perspective of a user of BDW-GC looking to switch away, there are two broad categories of allocations, tagged and untagged.

Tagged objects have attached metadata bits allowing their type to be inspected by the user later on. This is the happy path! We’ll be able to write a gc_trace_object function that takes any object, does a switch on, say, some bits in the first word, dispatching to type-specific tracing code. As long as the object is sufficiently initialized by the time the next safepoint comes around, we’re good, and given cooperative safepoints, the compiler should be able to ensure this invariant.

Then there are untagged allocations. Generally speaking, these are of two kinds: temporary and auxiliary. An example of a temporary allocation would be growable storage used by a C run-time routine, perhaps as an unbounded-sized alternative to alloca. Guile uses these a fair amount, as they compose well with non-local control flow as occurring for example in exception handling.

An auxiliary allocation on the other hand might be a data structure only referred to by the internals of a tagged object, but which itself never escapes to Scheme, so you never need to inquire about its type; it’s convenient to have the lifetimes of these values managed by the GC, and when desired to have the GC automatically trace their contents. Some of these should just be folded into the allocations of the tagged objects themselves, to avoid pointer-chasing. Others are harder to change, notably for mutable objects. And the trouble is that for external users of scm_gc_malloc, I fear that we won’t be able to migrate them over, as we don’t know whether they are making tagged mallocs or not.

what is to be done?

One conventional way to handle untagged allocations is to manage to fit your data into other tagged data structures; V8 does this in many places with instances of FixedArray, for example, and Guile should do more of this. Otherwise, you make new tagged data types. In either case, all auxiliary data should be tagged.

I think there may be an alternative, which would be just to support the equivalent of untagged GC_malloc and GC_malloc_atomic; but for that, I am out of time today, so type at y’all tomorrow. Happy hacking!

No responses