wingologA mostly dorky weblog by Andy Wingo2023-01-24T10:48:49Ztekutihttps://wingolog.org/feed/atomAndy Wingohttps://wingolog.org/parallel ephemeron tracinghttps://wingolog.org/2023/01/24/parallel-ephemeron-tracing2023-01-24T10:48:49Z2023-01-24T10:48:49Z

Hello all, and happy new year. Today’s note continues the series on implementing ephemerons in a garbage collector.

In our last dispatch we looked at a serial algorithm to trace ephemerons. However, production garbage collectors are parallel: during collection, they trace the object graph using multiple worker threads. Our problem is to extend the ephemeron-tracing algorithm with support for multiple tracing threads, without introducing stalls or serial bottlenecks.

Recall that we ended up having to define a table of pending ephemerons:

struct gc_pending_ephemeron_table {
  struct gc_ephemeron *resolved;
  size_t nbuckets;
  struct gc_ephemeron *buckets[0];
};

This table holds pending ephemerons that have been visited by the graph tracer but whose keys haven’t been found yet, as well as a singly-linked list of resolved ephemerons that are waiting to have their values traced. As a global data structure, the pending ephemeron table is a point of contention between tracing threads that we need to design around.

a confession

Allow me to confess my sins: things would be a bit simpler if I didn’t allow tracing workers to race.

As background, if your GC supports marking in place instead of always evacuating, then there is a mark bit associated with each object. To reduce the overhead of contention, a common strategy is to actually use a whole byte for the mark bit, and to write to it using relaxed atomics (or even raw stores). This avoids the cost of a compare-and-swap, but at the cost that multiple marking threads might see that an object’s mark was unset, go to mark the object, and think that they were the thread that marked the object. As far as the mark byte goes, that’s OK because everybody is writing the same value. The object gets pushed on the to-be-traced grey object queues multiple times, but that’s OK too because tracing should be idempotent.

This is a common optimization for parallel marking, and it doesn’t have any significant impact on other parts of the GC–except ephemeron marking. For ephemerons, because the state transition isn’t simply from unmarked to marked, we need more coordination.

high level

The parallel ephemeron marking algorithm modifies the serial algorithm in just a few ways:

  1. We have an atomically-updated state field in the ephemeron, used to know if e.g. an ephemeron is pending or resolved;

  2. We use separate fields for the pending and resolved links, to allow for concurrent readers across a state change;

  3. We introduce “traced” and “claimed” states to resolve races between parallel tracers on the same ephemeron, and track the “epoch” at which an ephemeron was last traced;

  4. We remove resolved ephemerons from the pending ephemeron hash table lazily, and use atomic swaps to pop from the resolved ephemerons list;

  5. We have to re-check key liveness after publishing an ephemeron to the pending ephemeron table.

Regarding the first point, there are four possible values for the ephemeron’s state field:

enum {
  TRACED, CLAIMED, PENDING, RESOLVED
};

The state transition diagram looks like this:

  ,----->TRACED<-----.
 ,         | ^        .
,          v |         .
|        CLAIMED        |
|  ,-----/     \---.    |
|  v               v    |
PENDING--------->RESOLVED

With this information, we can start to flesh out the ephemeron object itself:

struct gc_ephemeron {
  uint8_t state;
  uint8_t is_dead;
  unsigned epoch;
  struct gc_ephemeron *pending;
  struct gc_ephemeron *resolved;
  void *key;
  void *value;
};

The state field holds one of the four state values; is_dead indicates if a live ephemeron was ever proven to have a dead key, or if the user explicitly killed the ephemeron; and epoch is the GC count at which the ephemeron was last traced. Ephemerons are born TRACED in the current GC epoch, and the collector is responsible for incrementing the current epoch before each collection.

algorithm: tracing ephemerons

When the collector first finds an ephemeron, it does a compare-and-swap (CAS) on the state from TRACED to CLAIMED. If that succeeds, we check the epoch; if it’s current, we revert to the TRACED state: there’s nothing to do.

(Without marking races, you wouldn’t need either TRACED or CLAIMED states, or the epoch; it would be implicit in the fact that the ephemeron was being traced at all that you had a TRACED ephemeron with an old epoch.)

So now we have a CLAIMED ephemeron with an out-of-date epoch. We update the epoch and clear the pending and resolved fields, setting them to NULL. If, then, the ephemeron is_dead, we are done, and we go back to TRACED.

Otherwise we check if the key has already been traced. If so we forward it (if evacuating) and then trace the value edge as well, and transition to TRACED.

Otherwise we have a live E but we don’t know about K; this ephemeron is pending. We transition E‘s state to PENDING and add it to the front of K’s hash bucket in the pending ephemerons table, using CAS to avoid locks.

We then have to re-check if K is live, after publishing E, to account for other threads racing to mark to K while we mark E; if indeed K is live, then we transition to RESOLVED and push E on the global resolved ephemeron list, using CAS, via the resolved link.

So far, so good: either the ephemeron is fully traced, or it’s pending and published, or (rarely) published-then-resolved and waiting to be traced.

algorithm: tracing objects

The annoying thing about tracing ephemerons is that it potentially impacts tracing of all objects: any object could be the key that resolves a pending ephemeron.

When we trace an object, we look it up in the pending ephemeron hash table. But, as we traverse the chains in a bucket, we also load each node’s state. If we find a node that’s not in the PENDING state, we atomically forward its predecessor to point to its successor. This is correct for concurrent readers because the end of the chain is always reachable: we only skip nodes that are not PENDING, nodes never become PENDING after they transition away from being PENDING, and we only add PENDING nodes to the front of the chain. We even leave the pending field in place, so that any concurrent reader of the chain can still find the tail, even when the ephemeron has gone on to be RESOLVED or even TRACED.

(I had thought I would need Tim Harris’ atomic list implementation, but it turns out that since I only ever insert items at the head, having annotated links is not necessary.)

If we find a PENDING ephemeron that has K as its key, then we CAS its state from PENDING to RESOLVED. If this works, we CAS it onto the front of the resolved list. (Note that we also have to forward the key at this point, for a moving GC; this was a bug in my original implementation.)

algorithm: resolved ephemerons

Periodically a thread tracing the graph will run out of objects to trace (its mark stack is empty). That’s a good time to check if there are resolved ephemerons to trace. We atomically exchange the global resolved list with NULL, and then if there were resolved ephemerons, then we trace their values and transition them to TRACED.

At the very end of the GC cycle, we sweep the pending ephemeron table, marking any ephemeron that’s still there as is_dead, transitioning them back to TRACED, clearing the buckets of the pending ephemeron table as we go.

nits

So that’s it. There are some drawbacks, for example that this solution takes at least three words per ephemeron. Oh well.

There is also an annoying point of serialization, which is related to the lazy ephemeron resolution optimization. Consider that checking the pending ephemeron table on every object visit is overhead; it would be nice to avoid this. So instead, we start in “lazy” mode, in which pending ephemerons are never resolved by marking; and then once the mark stack / grey object worklist fully empties, we sweep through the pending ephemeron table, checking each ephemeron’s key to see if it was visited in the end, and resolving those ephemerons; we then switch to “eager” mode in which each object visit could potentially resolve ephemerons. In this way the cost of ephemeron tracing is avoided for that part of the graph that is strongly reachable. However, with parallel markers, would you switch to eager mode when any thread runs out of objects to mark, or when all threads run out of objects? You would get greatest parallelism with the former, but you run the risk of some workers prematurely running out of data, but when there is still a significant part of the strongly-reachable graph to traverse. If you wait for all threads to be done, you introduce a serialization point. There is a related question of when to pump the resolved ephemerons list. But these are engineering details.

Speaking of details, there are some gnarly pitfalls, particularly that you have to be very careful about pre-visit versus post-visit object addresses; for a semi-space collector, visiting an object will move it, so for example in the pending ephemeron table which by definition is keyed by pre-visit (fromspace) object addresses, you need to be sure to trace the ephemeron key for any transition to RESOLVED, and there are a few places this happens (the re-check after publish, sweeping the table after transitioning from lazy to eager, and when resolving eagerly).

implementation

If you’ve read this far, you may be interested in the implementation; it’s only a few hundred lines long. It took me quite a while to whittle it down!

Ephemerons are challenging from a software engineering perspective, because they are logically a separate module, but they interact both with users of the GC and with the collector implementations. It’s tricky to find the abstractions that work for all GC algorithms, whether they mark in place or move their objects, and whether they mark the heap precisely or if there are some conservative edges. But if this is the sort of thing that interests you, voilà the API for users and the API to and from collector implementations.

And, that’s it! I am looking forward to climbing out of this GC hole, one blog at a time. There are just a few more features before I can seriously attack integrating this into Guile. Until the next time, happy hacking :)

Andy Wingohttps://wingolog.org/ephemeral successhttps://wingolog.org/2022/12/15/ephemeral-success2022-12-15T21:21:05Z2022-12-15T21:21:05Z

Good evening, patient hackers :) Today finishes off my series on implementing ephemerons in a garbage collector.

Last time, we had a working solution for ephemerons, but it involved recursively visiting any pending ephemerons from within the copy routine—the bit of a semi-space collector that is called when traversing the object graph and we see an object that we hadn’t seen yet. This recursive visit could itself recurse, and so we could overflow the control stack.

The solution, of course, is “don’t do that”: instead of visiting recursively, enqueue the ephemeron for visiting later. Iterate, don’t recurse. But here we run into a funny problem: how do we add an ephemeron to a queue or worklist? It’s such a pedestrian question (“just... enqueue it?”) but I think it illustrates some of the particular concerns of garbage collection hacking.

speak, memory

The issue is that we are in the land of “can’t use my tools because I broke my tools with my tools”. You can’t make a standard List<T> because you can’t allocate list nodes inside the tracing routine: if you had memory in which you could allocate, you wouldn’t be calling the garbage collector.

If the collector needs a data structure whose size doesn’t depend on the connectivity of the object graph, you can pre-allocate it in a reserved part of the heap. This adds memory overhead, of course; for a 1000 MB heap, say, you used to be able to make graphs 500 MB in size (for a semi-space collector), but now you can only do 475 MB because you have to reserve 50 MB (say) for your data structures. Another way to look at it is, if you have a 400 MB live set and then you allocate 2GB of garbage, if your heap limit is 500 MB you will collect 20 times, but if it’s 475 MB you’ll collect 26 times, which is more expensive. This is part of why GC algorithms are so primitive; implementors have to be stingy that we don’t get to have nice things / data structures.

However in the case of ephemerons, we will potentially need one worklist entry per ephemeron in the object graph. There is no optimal fixed size for a worklist of ephemerons. Most object graphs will have no or few ephemerons. Some, though, will have practically the whole heap.

For data structure needs like this, the standard solution is to reserve the needed space for a GC-managed data structure in the object itself. For example, for concurrent copying collectors, the GC might reserve a word in the object for a forwarding pointer, instead of just clobbering the first word. If you needed a GC-managed binary tree for a specific kind of object, you’d reserve two words. Again there are strong pressures to minimize this overhead, but in the case of ephemerons it seems sensible to make them pay their way on a per-ephemeron basis.

so let’s retake the thing

So sometimes we might need to put an ephemeron in a worklist. Let’s add a member to the ephemeron structure:

struct gc_ephemeron {
  struct gc_obj header;
  int dead;
  struct gc_obj *key;
  struct gc_obj *value;
  struct gc_ephemeron *gc_link; // *
};

Incidentally this also solves the problem of how to represent the struct gc_pending_ephemeron_table; just reserve 0.5% of the heap or so as a bucket array for a buckets-and-chains hash table, and use the gc_link as the intrachain links.

struct gc_pending_ephemeron_table {
  struct gc_ephemeron *resolved;
  size_t nbuckets;
  struct gc_ephemeron buckets[0];
};

An ephemeron can end up in three states, then:

  1. Outside a collection: gc_link can be whatever.

  2. In a collection, the ephemeron is in the pending ephemeron table: gc_link is part of a hash table.

  3. In a collection, the ephemeron’s key has been visited, and the ephemeron is on the to-visit worklist; gc_link is part of the resolved singly-linked list.

Instead of phrasing the interface to ephemerons in terms of visiting edges in the graph, the verb is to resolve ephemerons. Resolving an ephemeron adds it to a worklist instead of immediately visiting any edge.

struct gc_ephemeron **
pending_ephemeron_bucket(struct gc_pending_ephemeron_table *table,
                         struct gc_obj *key) {
  return &table->buckets[hash_pointer(obj) % table->nbuckets];
}

void add_pending_ephemeron(struct gc_pending_ephemeron_table *table,
                           struct gc_obj *key,
                           struct gc_ephemeron *ephemeron) {
  struct gc_ephemeron **bucket = pending_ephemeron_bucket(table, key);
  ephemeron->gc_link = *bucket;
  *bucket = ephemeron;
}

void resolve_pending_ephemerons(struct gc_pending_ephemeron_table *table,
                                struct gc_obj *obj) {
  struct gc_ephemeron **link = pending_ephemeron_bucket(table, obj);
  struct gc_ephemeron *ephemeron;
  while ((ephemeron = *link)) {
    if (ephemeron->key == obj) {
      *link = ephemeron->gc_link;
      add_resolved_ephemeron(table, ephemeron);
    } else {
      link = &ephemeron->gc_link;
    }
  }
}

Copying an object may add it to the set of pending ephemerons, if it is itself an ephemeron, and also may resolve other pending ephemerons.

void resolve_ephemerons(struct gc_heap *heap, struct gc_obj *obj) {
  resolve_pending_ephemerons(heap->pending_ephemerons, obj);

  struct gc_ephemeron *ephemeron;
  if ((ephemeron = as_ephemeron(forwarded(obj)))
      && !ephemeron->dead) {
    if (is_forwarded(ephemeron->key))
      add_resolved_ephemeron(heap->pending_ephemerons,
                             ephemeron);
    else
      add_pending_ephemeron(heap->pending_ephemerons,
                            ephemeron->key, ephemeron);
  }
}

struct gc_obj* copy(struct gc_heap *heap, struct gc_obj *obj) {
  ...
  resolve_ephemerons(heap, obj); // *
  return new_obj;
}

Finally, we need to add something to the core collector to scan resolved ephemerons:

int trace_some_ephemerons(struct gc_heap *heap) {
  struct gc_ephemeron *resolved = heap->pending_ephemerons->resolved;
  if (!resolved) return 0;
  heap->pending_ephemerons->resolved = NULL;
  while (resolved) {
    resolved->key = forwarded(resolved->key);
    visit_field(&resolved->value, heap);
    resolved = resolved->gc_link;
  }
  return 1;
}

void kill_pending_ephemerons(struct gc_heap *heap) {
  struct gc_ephemeron *ephemeron;
  struct gc_pending_ephemeron_table *table = heap->pending_ephemerons;
  for (size_t i = 0; i < table->nbuckets; i++) {
    for (struct gc_ephemeron *chain = table->buckets[i];
         chain;
         chain = chain->gc_link)
      chain->dead = 1;    
    table->buckets[i] = NULL;
  }
}

void collect(struct gc_heap *heap) {
  flip(heap);
  uintptr_t scan = heap->hp;
  trace_roots(heap, visit_field);
  do { // *
    while(scan < heap->hp) {
      struct gc_obj *obj = scan;
      scan += align_size(trace_heap_object(obj, heap, visit_field));
    }
  } while (trace_ephemerons(heap)); // *
  kill_pending_ephemerons(heap); // *
}

The result is... not so bad? It makes sense to make ephemerons pay their own way in terms of memory, having an internal field managed by the GC. In fact I must confess that in the implementation I have been woodshedding, I actually have three of these damn things; perhaps more on that in some other post. But the perturbation to the core algorithm is perhaps less than the original code. There are still some optimizations to make, notably postponing hash-table lookups until the whole strongly-reachable graph is discovered; but again, another day.

And with that, thanks for coming along with me for my journeys into ephemeron-space.

I would like to specifically thank Erik Corry and Steve Blackburn for their advice over the years, and patience with my ignorance; I can only imagine that it’s quite amusing when you have experience in a domain to see someone new and eager come in and make many of the classic mistakes. They have both had a kind of generous parsimony in the sense of allowing me to make the necessary gaffes but also providing insight where it can be helpful.

I’m thinking of many occasions but I especially appreciate the advice to start with a semi-space collector when trying new things, be it benchmarks or test cases or API design or new functionality, as it’s a simple algorithm, hard to get wrong on the implementation side, and perfect for bringing out any bugs in other parts of the system. In this case the difference between fromspace and tospace pointers has a material difference to how you structure the ephemeron implementation; it’s not something you can do just in a trace_heap_object function, as you don’t have the old pointers there, and the pending ephemeron table is indexed by old object addresses.

Well, until some other time, gentle hackfolk, do accept my sincerest waste disposal greetings. As always, yours in garbage, etc.,

Andy Wingohttps://wingolog.org/i'm throwing ephemeron party & you're invitedhttps://wingolog.org/2022/12/12/im-throwing-ephemeron-party-youre-invited2022-12-12T14:02:41Z2022-12-12T14:02:41Z

Good day, hackfolk. Today’s note tries to extend our semi-space collector with support for ephemerons. Spoiler alert: we fail in a subtle and interesting way. See if you can spot it before the end :)

Recall that, as we concluded in an earlier article, a memory manager needs to incorporate ephemerons as a core part of the tracing algorithm. Ephemerons are not macro-expressible in terms of object trace functions.

Instead, to support ephemerons we need to augment our core trace routine. When we see an ephemeron E, we need to check if the key K is already visited (and therefore live); if so, we trace the value V directly, and we’re done. Otherwise, we add E to a global table of pending ephemerons T, indexed under K. Finally whenever we trace a new object O, ephemerons included, we look up O in T, to trace any pending ephemerons for O.

So, taking our semi-space collector as a workbench, let’s start by defining what an ephemeron is.

struct gc_ephemeron {
  struct gc_obj header;
  int dead;
  struct gc_obj *key;
  struct gc_obj *value;
};

enum gc_obj_kind { ..., EPHEMERON, ... };

static struct gc_ephemeron* as_ephemeron(struct gc_obj *obj) {
  uintptr_t ephemeron_tag = NOT_FORWARDED_BIT | (EPHEMERON << 1);
  if (obj->tag == ephemeron_tag)
    return (struct gc_ephemeron*)obj;
  return NULL;
}

First we need to allow the GC to know when an object is an ephemeron or not. This is somewhat annoying, as you would like to make this concern entirely the responsibility of the user, and let the GC be indifferent to the kinds of objects it’s dealing with, but it seems to be unavoidable.

The heap will need some kind of data structure to track pending ephemerons:

struct gc_pending_ephemeron_table;

struct gc_heap {
  ...
  struct gc_pending_ephemeron_table *pending_ephemerons;
}

struct gc_ephemeron *
pop_pending_ephemeron(struct gc_pending_ephemeron_table*,
                      struct gc_obj*);
void
add_pending_ephemeron(struct gc_pending_ephemeron_table*,
                      struct gc_obj*, struct gc_ephemeron*);
struct gc_ephemeron *
pop_any_pending_ephemeron(struct gc_pending_ephemeron_table*);

Now let’s define a function to handle ephemeron shenanigans:

void visit_ephemerons(struct gc_heap *heap, struct gc_obj *obj) {
  // We are visiting OBJ for the first time.
  // OBJ is the old address, but it is already forwarded.
  ASSERT(is_forwarded(obj));

  // First, visit any pending ephemeron for OBJ.
  struct gc_ephemeron *ephemeron;
  while ((ephemeron =
            pop_pending_ephemeron(heap->pending_ephemerons, obj))) {
    ASSERT(obj == ephemeron->key);
    ephemeron->key = forwarded(obj);
    visit_field(&ephemeron->value, heap);
  }

  // Then if OBJ is itself an ephemeron, trace it.
  if ((ephemeron = as_ephemeron(forwarded(obj)))
      && !ephemeron->dead) {
    if (is_forwarded(ephemeron->key)) {
      ephemeron->key = forwarded(ephemeron->key);
      visit_field(&ephemeron->value, heap);
    } else {
      add_pending_ephemeron(heap->pending_ephemerons,
                            ephemeron->key, ephemeron);
    }
  }
}

struct gc_obj* copy(struct gc_heap *heap, struct gc_obj *obj) {
  ...
  visit_ephemerons(heap, obj); // *
  return new_obj;
}

We wire it into the copy routine, as that’s the bit of the collector that is called only once per object and which has access to the old address. We actually can’t process ephemerons during the Cheney field scan, as there we don’t have old object addresses.

Then at the end of collection, we kill any ephemeron whose key hasn’t been traced:

void kill_pending_ephemerons(struct gc_heap *heap) {
  struct gc_ephemeron *ephemeron;
  while ((ephemeron =
            pop_any_pending_ephemeron(heap->pending_ephemerons)))
    ephemeron->dead = 1;    
}

void collect(struct gc_heap *heap) {
  // ...
  kill_pending_ephemerons(heap);
}

First observation: Gosh, this is quite a mess. It’s more code than the core collector, and it’s gnarly. There’s a hash table, for goodness’ sake. Goodbye, elegant algorithm!

Second observation: Well, at least it works.

Third observation: Oh. It works in the same way as tracing in the copy routine works: well enough for shallow graphs, but catastrophically for arbitrary graphs. Calling visit_field from within copy introduces unbounded recursion, as tracing one value can cause more ephemerons to resolve, ad infinitum.

Well. We seem to have reached a dead-end, for now. Will our hero wrest victory from the jaws of defeat? Tune in next time for find out: same garbage time (unpredictable), same garbage channel (my wordhoard). Happy hacking!

Andy Wingohttps://wingolog.org/are ephemerons primitive?https://wingolog.org/2022/11/28/are-ephemerons-primitive2022-11-28T21:11:30Z2022-11-28T21:11:30Z

Good evening :) A quick note, tonight: I’ve long thought that ephemerons are primitive and can’t be implemented with mark functions and/or finalizers, but today I think I have a counterexample.

For context, one of the goals of the GC implementation I have been working on on is to replace Guile‘s current use of the Boehm-Demers-Weiser (BDW) conservative collector. Of course, changing a garbage collector for a production language runtime is risky, and for Guile one of the mitigation strategies for this work is that the new collector is behind an abstract API whose implementation can be chosen at compile-time, without requiring changes to user code. That way we can first switch to BDW-implementing-the-new-GC-API, then switch the implementation behind that API to something else.

Abstracting GC is a tricky problem to get right, and I thank the MMTk project for showing that this is possible – you have user-facing APIs that need to be implemented by concrete collectors, but also extension points so that the user can provide some compile-time configuration too, for example to provide field-tracing visitors that take into account how a user wants to lay out objects.

Anyway. As we discussed last time, ephemerons are usually have explicit support from the GC, so we need an ephemeron abstraction as part of the abstract GC API. The question is, can BDW-GC provide an implementation of this API?

I think the answer is “yes, but it’s very gnarly and will kill performance so bad that you won’t want to do it.”

the contenders

Consider that the primitives that you get with BDW-GC are custom mark functions, run on objects when they are found to be live by the mark workers; disappearing links, a kind of weak reference; and finalizers, which receive the object being finalized, can allocate, and indeed can resurrect the object.

BDW-GC’s finalizers are a powerful primitive, but not one that is useful for implementing the “conjunction” aspect of ephemerons, as they cannot constrain the marker’s idea of graph connectivity: a finalizer can only prolong the life of an object subgraph, not cut it short. So let’s put finalizers aside.

Weak references have a tantalizingly close kind of conjunction property: if the weak reference itself is alive, and the referent is also otherwise reachable, then the weak reference can be dereferenced. However this primitive only involves the two objects E and K; there’s no way to then condition traceability of a third object V to E and K.

We are left with mark functions. These are an extraordinarily powerful interface in BDW-GC, but somewhat expensive also: not inlined, and going against the grain of what BDW-GC is really about (heaps in which the majority of all references are conservative). But, OK. They way they work is, your program allocates a number of GC “kinds”, and associates mark functions with those kinds. Then when you allocate objects, you use those kinds. BDW-GC will call your mark functions when tracing an object of those kinds.

Let’s assume firstly that you have a kind for ephemerons; then when you go to mark an ephemeron E, you mark the value V only if the key K has been marked. Problem solved, right? Only halfway: you also have to handle the case in which E is marked first, then K. So you publish E to a global hash table, and... well. You would mark V when you mark a K for which there is a published E. But, for that you need a hook into marking V, and V can be any object...

So now we assume additionally that all objects are allocated with user-provided custom mark functions, and that all mark functions check if the marked object is in the published table of pending ephemerons, and if so marks values. This is essentially what a proper ephemeron implementation would do, though there are some optimizations one can do to avoid checking the table for each object before the mark stack runs empty for the first time. In this case, yes you can do it! Additionally if you register disappearing links for the K field in each E, you can know if an ephemeron E was marked dead in a previous collection. Add a pre-mark hook (something BDW-GC provides) to clear the pending ephemeron table, and you are in business.

yes, but no

So, it is possible to implement ephemerons with just custom mark functions. I wouldn’t want to do it, though: missing the mostly-avoid-pending-ephemeron-check optimization would be devastating, and really what you want is support in the GC implementation. I think that for the BDW-GC implementation in whippet I’ll just implement weak-key associations, in which the value is always marked strongly unless the key was dead on a previous collection, using disappearing links on the key field. That way a (possibly indirect) reference from a value V to a key K can indeed keep K alive, but oh well: it’s a conservative approximation of what should happen, and not worse than what Guile has currently.

Good night and happy hacking!

Andy Wingohttps://wingolog.org/ephemerons and finalizershttps://wingolog.org/2022/10/31/ephemerons-and-finalizers2022-10-31T12:21:23Z2022-10-31T12:21:23Z

Good day, hackfolk. Today we continue the series on garbage collection with some notes on ephemerons and finalizers.

conjunctions and disjunctions

First described in a 1997 paper by Barry Hayes, which attributes the invention to George Bosworth, ephemerons are a kind of weak key-value association.

Thinking about the problem abstractly, consider that the garbage collector’s job is to keep live objects and recycle memory for dead objects, making that memory available for future allocations. Formally speaking, we can say:

  • An object is live if it is in the root set

  • An object is live it is referenced by any live object.

This circular definition uses the word any, indicating a disjunction: a single incoming reference from a live object is sufficient to mark a referent object as live.

Ephemerons augment this definition with a conjunction:

  • An object V is live if, for an ephemeron E containing an association betweeen objects K and V, both E and K are live.

This is a more annoying property for a garbage collector to track. If you happen to mark K as live and then you mark E as live, then you can just continue to trace V. But if you see E first and then you mark K, you don’t really have a direct edge to V. (Indeed this is one of the main purposes for ephemerons: associating data with an object, here K, without actually modifying that object.)

During a trace of the object graph, you can know if an object is definitely alive by checking if it was visited already, but if it wasn’t visited yet that doesn’t mean it’s not live: we might just have not gotten to it yet. Therefore one common implementation strategy is to wait until tracing the object graph is done before tracing ephemerons. But then we have another annoying problem, which is that tracing ephemerons can result in finding more live ephemerons, requiring another tracing cycle, and so on. Mozilla’s Steve Fink wrote a nice article on this issue earlier this year, with some mitigations.

finalizers aren’t quite ephemerons

All that is by way of introduction. If you just have an object graph with strong references and ephemerons, our definitions are clear and consistent. However, if we add some more features, we muddy the waters.

Consider finalizers. The basic idea is that you can attach one or a number of finalizers to an object, and that when the object becomes unreachable (not live), the system will invoke a function. One way to imagine this is a global association from finalizable object O to finalizer F.

As it is, this definition is underspecified in a few ways. One, what happens if F references O? It could be a GC-managed closure, after all. Would that prevent O from being collected?

Ephemerons solve this problem, in a way; we could trace the table of finalizers like a table of ephemerons. In that way F would only be traced if O is live already, so that by itself it wouldn’t keep O alive. But then if O becomes dead, you’d want to invoke F, so you’d need it to be live, so reachability of finalizers is not quite the same as ephemeron-reachability: indeed logically all F values in the finalizer table are live, because they all will be invoked at some point.

In the end, if F references O, then F actually keeps O alive. Whether this prevents O from being finalized depends on our definition for finalizability. We could say that an object is finalizable if it is found to be unreachable after a full trace, and the finalizers F are in the root set. Or we could say that an object is finalizable if it is unreachable after a partial trace, in which finalizers are not themselves in the initial root set, and instead we trace them after determining the finalizable set.

Having finalizers in the initial root set is unfortunate: there’s no quick check you can make when adding a finalizer to signal this problem to the user, and it’s very hard to convey to a user exactly how it is that an object is referenced. You’d have to add lots of gnarly documentation on top of the already unavoidable gnarliness that you already had to write. But, perhaps it is a local maximum.

Incidentally, you might think that you can get around these issues by saying “don’t reference objects from their finalizers”, and that’s true in a way. However it’s not uncommon for finalizers to receive the object being finalized as an argument; after all, it’s that object which probably encapsulates the information necessary for its finalization. Of course this can lead to the finalizer prolonging the longevity of an object, perhaps by storing it to a shared data structure. This is a risk for correct program construction (the finalized object might reference live-but-already-finalized objects), but not really a burden for the garbage collector, except in that it’s a serialization point in the collection algorithm: you trace, you compute the finalizable set, then you have to trace the finalizables again.

ephemerons vs finalizers

The gnarliness continues! Imagine that O is associated with a finalizer F, and also, via ephemeron E, some auxiliary data V. Imagine that at the end of the trace, O is unreachable and so will be dead. Imagine that F receives O as an argument, and that F looks up the association for O in E. Is the association to V still there?

Guile’s documentation on guardians, a finalization-like facility, specifies that weak associations (i.e. ephemerons) remain in place when an object becomes collectable, though I think in practice this has been broken since Guile switched to the BDW-GC collector some 20 years ago or so and I would like to fix it.

One nice solution falls out if you prohibit resuscitation by not including finalizer closures in the root set and not passing the finalizable object to the finalizer function. In that way you will never be able to look up E×OV, because you don’t have O. This is the path that JavaScript has taken, for example, with WeakMap and FinalizationRegistry.

However if you allow for resuscitation, for example by passing finalizable objects as an argument to finalizers, I am not sure that there is an optimal answer. Recall that with resuscitation, the trace proceeds in three phases: first trace the graph, then compute and enqueue the finalizables, then trace the finalizables. When do you perform the conjunction for the ephemeron trace? You could do so after the initial trace, which might augment the live set, protecting some objects from finalization, but possibly missing ephemeron associations added in the later trace of finalizable objects. Or you could trace ephemerons at the very end, preserving all associations for finalizable objects (and their referents), which would allow more objects to be finalized at the same time.

Probably if you trace ephemerons early you will also want to trace them later, as you would do so because you think ephemeron associations are important, as you want them to prevent objects from being finalized, and it would be weird if they were not present for finalizable objects. This adds more serialization to the trace algorithm, though:

  1. (Add finalizers to the root set?)

  2. Trace from the roots

  3. Trace ephemerons?

  4. Compute finalizables

  5. Trace finalizables (and finalizer closures if not done in 1)

  6. Trace ephemerons again?

These last few paragraphs are the reason for today’s post. It’s not clear to me that there is an optimal way to compose ephemerons and finalizers in the presence of resuscitation. If you add finalizers to the root set, you might prevent objects from being collected. If you defer them until later, you lose the optimization that you can skip steps 5 and 6 if there are no finalizables. If you trace (not-yet-visited) ephemerons twice, that’s overhead; if you trace them only once, the user could get what they perceive as premature finalization of otherwise reachable objects.

In Guile I think I am going to try to add finalizers to the root set, pass the finalizable to the finalizer as an argument, and trace ephemerons twice if there are finalizable objects. I think this wil minimize incoming bug reports. I am bummed though that I can’t eliminate them by construction.

Until next time, happy hacking!