wingolog: a mostly dorky weblog by Andy Wingo

ephemeral success
15 December 2022 (https://wingolog.org/2022/12/15/ephemeral-success)

Good evening, patient hackers :) Today finishes off my series on implementing ephemerons in a garbage collector.

Last time, we had a working solution for ephemerons, but it involved recursively visiting any pending ephemerons from within the copy routine—the bit of a semi-space collector that is called when, while traversing the object graph, we see an object we haven’t visited yet. This recursive visit could itself recurse, and so we could overflow the control stack.

The solution, of course, is “don’t do that”: instead of visiting recursively, enqueue the ephemeron for visiting later. Iterate, don’t recurse. But here we run into a funny problem: how do we add an ephemeron to a queue or worklist? It’s such a pedestrian question (“just... enqueue it?”) but I think it illustrates some of the particular concerns of garbage collection hacking.

speak, memory

The issue is that we are in the land of “can’t use my tools because I broke my tools with my tools”. You can’t make a standard List<T> because you can’t allocate list nodes inside the tracing routine: if you had memory in which you could allocate, you wouldn’t be calling the garbage collector.

If the collector needs a data structure whose size doesn’t depend on the connectivity of the object graph, you can pre-allocate it in a reserved part of the heap. This adds memory overhead, of course; for a 1000 MB heap, say, you used to be able to make graphs 500 MB in size (for a semi-space collector), but now you can only do 475 MB because you have to reserve 50 MB (say) for your data structures. Another way to look at it is, if you have a 400 MB live set and then you allocate 2 GB of garbage, if your heap limit is 500 MB you will collect 20 times, but if it’s 475 MB you’ll collect 26 times, which is more expensive. This is part of why GC algorithms are so primitive; implementors have to be so stingy that we don’t get to have nice things / data structures.

However in the case of ephemerons, we will potentially need one worklist entry per ephemeron in the object graph. There is no optimal fixed size for a worklist of ephemerons. Most object graphs will have no or few ephemerons. Some, though, will have practically the whole heap.

For data structure needs like this, the standard solution is to reserve the needed space for a GC-managed data structure in the object itself. For example, for concurrent copying collectors, the GC might reserve a word in the object for a forwarding pointer, instead of just clobbering the first word. If you needed a GC-managed binary tree for a specific kind of object, you’d reserve two words. Again there are strong pressures to minimize this overhead, but in the case of ephemerons it seems sensible to make them pay their way on a per-ephemeron basis.

so let’s retake the thing

So sometimes we might need to put an ephemeron in a worklist. Let’s add a member to the ephemeron structure:

struct gc_ephemeron {
  struct gc_obj header;
  int dead;
  struct gc_obj *key;
  struct gc_obj *value;
  struct gc_ephemeron *gc_link; // *
};

Incidentally this also solves the problem of how to represent the struct gc_pending_ephemeron_table; just reserve 0.5% of the heap or so as a bucket array for a buckets-and-chains hash table, and use the gc_link field for the intra-chain links.

struct gc_pending_ephemeron_table {
  struct gc_ephemeron *resolved;
  size_t nbuckets;
  struct gc_ephemeron *buckets[0];
};
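
As a concrete illustration, such a table could be carved out of a chunk of memory reserved outside the semi-spaces. The make_pending_ephemeron_table helper below is purely hypothetical, a sketch of what the setup might look like rather than part of the collector:

// Hypothetical setup: carve a pending-ephemeron table out of SIZE bytes
// of memory reserved outside the semi-spaces.
struct gc_pending_ephemeron_table*
make_pending_ephemeron_table(void *reserved, size_t size) {
  struct gc_pending_ephemeron_table *table = reserved;
  table->resolved = NULL;
  // Whatever fits after the header becomes buckets, one pointer each.
  table->nbuckets = (size - sizeof(*table)) / sizeof(struct gc_ephemeron*);
  for (size_t i = 0; i < table->nbuckets; i++)
    table->buckets[i] = NULL;
  return table;
}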

An ephemeron can end up in three states, then:

  1. Outside a collection: gc_link can be whatever.

  2. In a collection, the ephemeron is in the pending ephemeron table: gc_link is part of a hash table.

  3. In a collection, the ephemeron’s key has been visited, and the ephemeron is on the to-visit worklist; gc_link is part of the resolved singly-linked list.

Instead of phrasing the interface to ephemerons in terms of visiting edges in the graph, the verb is to resolve ephemerons. Resolving an ephemeron adds it to a worklist instead of immediately visiting any edge.

struct gc_ephemeron **
pending_ephemeron_bucket(struct gc_pending_ephemeron_table *table,
                         struct gc_obj *key) {
  return &table->buckets[hash_pointer(key) % table->nbuckets];
}

void add_pending_ephemeron(struct gc_pending_ephemeron_table *table,
                           struct gc_obj *key,
                           struct gc_ephemeron *ephemeron) {
  struct gc_ephemeron **bucket = pending_ephemeron_bucket(table, key);
  ephemeron->gc_link = *bucket;
  *bucket = ephemeron;
}

void resolve_pending_ephemerons(struct gc_pending_ephemeron_table *table,
                                struct gc_obj *obj) {
  struct gc_ephemeron **link = pending_ephemeron_bucket(table, obj);
  struct gc_ephemeron *ephemeron;
  while ((ephemeron = *link)) {
    if (ephemeron->key == obj) {
      *link = ephemeron->gc_link;
      add_resolved_ephemeron(table, ephemeron);
    } else {
      link = &ephemeron->gc_link;
    }
  }
}
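
The add_resolved_ephemeron helper isn’t spelled out here; a minimal sketch consistent with the structures above would just push onto the resolved list, reusing gc_link as the list link:

void add_resolved_ephemeron(struct gc_pending_ephemeron_table *table,
                            struct gc_ephemeron *ephemeron) {
  // Push onto the resolved worklist; gc_link is free for reuse here,
  // whether the ephemeron just came out of a hash bucket or was never
  // pending at all.
  ephemeron->gc_link = table->resolved;
  table->resolved = ephemeron;
}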

Copying an object may add it to the set of pending ephemerons, if it is itself an ephemeron, and also may resolve other pending ephemerons.

void resolve_ephemerons(struct gc_heap *heap, struct gc_obj *obj) {
  resolve_pending_ephemerons(heap->pending_ephemerons, obj);

  struct gc_ephemeron *ephemeron;
  if ((ephemeron = as_ephemeron(forwarded(obj)))
      && !ephemeron->dead) {
    if (is_forwarded(ephemeron->key))
      add_resolved_ephemeron(heap->pending_ephemerons,
                             ephemeron);
    else
      add_pending_ephemeron(heap->pending_ephemerons,
                            ephemeron->key, ephemeron);
  }
}

struct gc_obj* copy(struct gc_heap *heap, struct gc_obj *obj) {
  ...
  resolve_ephemerons(heap, obj); // *
  return new_obj;
}

Finally, we need to add something to the core collector to scan resolved ephemerons:

int trace_some_ephemerons(struct gc_heap *heap) {
  struct gc_ephemeron *resolved = heap->pending_ephemerons->resolved;
  if (!resolved) return 0;
  heap->pending_ephemerons->resolved = NULL;
  while (resolved) {
    resolved->key = forwarded(resolved->key);
    visit_field(&resolved->value, heap);
    resolved = resolved->gc_link;
  }
  return 1;
}

void kill_pending_ephemerons(struct gc_heap *heap) {
  struct gc_pending_ephemeron_table *table = heap->pending_ephemerons;
  for (size_t i = 0; i < table->nbuckets; i++) {
    for (struct gc_ephemeron *chain = table->buckets[i];
         chain;
         chain = chain->gc_link)
      chain->dead = 1;
    table->buckets[i] = NULL;
  }
}

void collect(struct gc_heap *heap) {
  flip(heap);
  uintptr_t scan = heap->hp;
  trace_roots(heap, visit_field);
  do { // *
    while(scan < heap->hp) {
      struct gc_obj *obj = (struct gc_obj*)scan;
      scan += align_size(trace_heap_object(obj, heap, visit_field));
    }
  } while (trace_some_ephemerons(heap)); // *
  kill_pending_ephemerons(heap); // *
}

The result is... not so bad? It makes sense to make ephemerons pay their own way in terms of memory, having an internal field managed by the GC. In fact I must confess that in the implementation I have been woodshedding, I actually have three of these damn things; perhaps more on that in some other post. But the perturbation to the core algorithm is perhaps smaller than with the original recursive code. There are still some optimizations to make, notably postponing hash-table lookups until the whole strongly-reachable graph is discovered; but again, another day.

And with that, thanks for coming along with me for my journeys into ephemeron-space.

I would like to specifically thank Erik Corry and Steve Blackburn for their advice over the years, and patience with my ignorance; I can only imagine that it’s quite amusing when you have experience in a domain to see someone new and eager come in and make many of the classic mistakes. They have both had a kind of generous parsimony in the sense of allowing me to make the necessary gaffes but also providing insight where it can be helpful.

I’m thinking of many occasions, but I especially appreciate the advice to start with a semi-space collector when trying new things, be it benchmarks or test cases or API design or new functionality, as it’s a simple algorithm, hard to get wrong on the implementation side, and perfect for bringing out any bugs in other parts of the system. In this case the difference between fromspace and tospace pointers makes a material difference to how you structure the ephemeron implementation; it’s not something you can do just in a trace_heap_object function, as you don’t have the old pointers there, and the pending ephemeron table is indexed by old object addresses.

Well, until some other time, gentle hackfolk, do accept my sincerest waste disposal greetings. As always, yours in garbage, etc.,

i'm throwing ephemeron party & you're invited
12 December 2022 (https://wingolog.org/2022/12/12/im-throwing-ephemeron-party-youre-invited)

Good day, hackfolk. Today’s note tries to extend our semi-space collector with support for ephemerons. Spoiler alert: we fail in a subtle and interesting way. See if you can spot it before the end :)

Recall that, as we concluded in an earlier article, a memory manager needs to incorporate ephemerons as a core part of the tracing algorithm. Ephemerons are not macro-expressible in terms of object trace functions.

Instead, to support ephemerons we need to augment our core trace routine. When we see an ephemeron E, we need to check if the key K is already visited (and therefore live); if so, we trace the value V directly, and we’re done. Otherwise, we add E to a global table of pending ephemerons T, indexed under K. Finally whenever we trace a new object O, ephemerons included, we look up O in T, to trace any pending ephemerons for O.

So, taking our semi-space collector as a workbench, let’s start by defining what an ephemeron is.

struct gc_ephemeron {
  struct gc_obj header;
  int dead;
  struct gc_obj *key;
  struct gc_obj *value;
};

enum gc_obj_kind { ..., EPHEMERON, ... };

static struct gc_ephemeron* as_ephemeron(struct gc_obj *obj) {
  uintptr_t ephemeron_tag = NOT_FORWARDED_BIT | (EPHEMERON << 1);
  if (obj->tag == ephemeron_tag)
    return (struct gc_ephemeron*)obj;
  return NULL;
}

First we need to allow the GC to know whether an object is an ephemeron or not. This is somewhat annoying, as you would like to make this concern entirely the responsibility of the user, and let the GC be indifferent to the kinds of objects it’s dealing with, but it seems to be unavoidable.

The heap will need some kind of data structure to track pending ephemerons:

struct gc_pending_ephemeron_table;

struct gc_heap {
  ...
  struct gc_pending_ephemeron_table *pending_ephemerons;
};

struct gc_ephemeron *
pop_pending_ephemeron(struct gc_pending_ephemeron_table*,
                      struct gc_obj*);
void
add_pending_ephemeron(struct gc_pending_ephemeron_table*,
                      struct gc_obj*, struct gc_ephemeron*);
struct gc_ephemeron *
pop_any_pending_ephemeron(struct gc_pending_ephemeron_table*);

Now let’s define a function to handle ephemeron shenanigans:

void visit_ephemerons(struct gc_heap *heap, struct gc_obj *obj) {
  // We are visiting OBJ for the first time.
  // OBJ is the old address, but it is already forwarded.
  ASSERT(is_forwarded(obj));

  // First, visit any pending ephemeron for OBJ.
  struct gc_ephemeron *ephemeron;
  while ((ephemeron =
            pop_pending_ephemeron(heap->pending_ephemerons, obj))) {
    ASSERT(obj == ephemeron->key);
    ephemeron->key = forwarded(obj);
    visit_field(&ephemeron->value, heap);
  }

  // Then if OBJ is itself an ephemeron, trace it.
  if ((ephemeron = as_ephemeron(forwarded(obj)))
      && !ephemeron->dead) {
    if (is_forwarded(ephemeron->key)) {
      ephemeron->key = forwarded(ephemeron->key);
      visit_field(&ephemeron->value, heap);
    } else {
      add_pending_ephemeron(heap->pending_ephemerons,
                            ephemeron->key, ephemeron);
    }
  }
}

struct gc_obj* copy(struct gc_heap *heap, struct gc_obj *obj) {
  ...
  visit_ephemerons(heap, obj); // *
  return new_obj;
}

We wire it into the copy routine, as that’s the bit of the collector that is called only once per object and which has access to the old address. We actually can’t process ephemerons during the Cheney field scan, as there we don’t have old object addresses.

Then at the end of collection, we kill any ephemeron whose key hasn’t been traced:

void kill_pending_ephemerons(struct gc_heap *heap) {
  struct gc_ephemeron *ephemeron;
  while ((ephemeron =
            pop_any_pending_ephemeron(heap->pending_ephemerons)))
    ephemeron->dead = 1;    
}

void collect(struct gc_heap *heap) {
  // ...
  kill_pending_ephemerons(heap);
}

First observation: Gosh, this is quite a mess. It’s more code than the core collector, and it’s gnarly. There’s a hash table, for goodness’ sake. Goodbye, elegant algorithm!

Second observation: Well, at least it works.

Third observation: Oh. It works in the same way as tracing in the copy routine works: well enough for shallow graphs, but catastrophically for arbitrary graphs. Calling visit_field from within copy introduces unbounded recursion, as tracing one value can cause more ephemerons to resolve, ad infinitum.

Well. We seem to have reached a dead-end, for now. Will our hero wrest victory from the jaws of defeat? Tune in next time to find out: same garbage time (unpredictable), same garbage channel (my wordhoard). Happy hacking!

we iterate so that you can recurse
11 December 2022 (https://wingolog.org/2022/12/11/we-iterate-so-that-you-can-recurse)

Sometimes when you see an elegant algorithm, you think “looks great, I just need it to also do X”. Perhaps you are able to build X directly out of what the algorithm gives you; fantastic. Or, perhaps you can alter the algorithm a bit, and it works just as well while also doing X. Sometimes, though, you alter the algorithm and things go pear-shaped.

Tonight’s little note builds on yesterday’s semi-space collector article and discusses a worse alternative to the Cheney scanning algorithm.

To recall, we had this visit_field function that takes an edge in the object graph, represented as the address of a field in memory containing a struct gc_obj*. If the edge points to an object that was already copied, visit_field updates it to the forwarded address. Otherwise it copies the object, thus computing the new address, and then updates the field.

struct gc_obj* copy(struct gc_heap *heap, struct gc_obj *obj) {
  size_t size = heap_object_size(obj);
  struct gc_obj *new_obj = (struct gc_obj*)heap->hp;
  memcpy(new_obj, obj, size);
  forward(obj, new_obj);
  heap->hp += align_size(size);
  return new_obj;
}

void visit_field(struct gc_obj **field, struct gc_heap *heap) {
  struct gc_obj *from = *field;
  struct gc_obj *to =
    is_forwarded(from) ? forwarded(from) : copy(heap, from);
  *field = to;
}

Although a newly copied object is in tospace, all of its fields still point to fromspace. The Cheney scan algorithm later visits the fields in the newly copied object with visit_field, which both discovers new objects and updates the fields to point to tospace.

One disadvantage of this approach is that the order in which the objects are copied is a bit random. Given a hierarchical memory system, it’s better if objects that are accessed together in time are close together in space. This is an impossible task without instrumenting the actual data access in a program and then assuming future accesses will be like the past. Instead, the generally-accepted solution is to ensure that objects that are allocated close together in time be adjacent in space. The bump-pointer allocator in a semi-space collector provides this property, but the evacuation algorithm above does not: it would need to preserve allocation order, but instead its order is driven by graph connectivity.

I say that the copying algorithm above is random but really it favors a breadth-first traversal; if you have a binary tree, first you will copy the left and the right children of the root, then the left and right children of the left, then the left and right children of the right, then all the great-grandchildren, and so on, level by level. Maybe it would be better to keep parent and child nodes together? After all they are probably allocated that way.

So, what if we change the algorithm:

struct gc_obj* copy(struct gc_heap *heap, struct gc_obj *obj) {
  size_t size = heap_object_size(obj);
  struct gc_obj *new_obj = (struct gc_obj*)heap->hp;
  memcpy(new_obj, obj, size);
  forward(obj, new_obj);
  heap->hp += align_size(size);
  trace_heap_object(new_obj, heap, visit_field); // *
  return new_obj;
}

void visit_field(struct gc_obj **field, struct gc_heap *heap) {
  struct gc_obj *from = *field;
  struct gc_obj *to =
    is_forwarded(from) ? forwarded(from) : copy(heap, from);
  *field = to;
}

void collect(struct gc_heap *heap) {
  flip(heap);
  trace_roots(heap, visit_field);
}

Here we favor a depth-first traversal: we eagerly call trace_heap_object within copy. No need for the Cheney scan algorithm; tracing does it all.
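
To see the two orders side by side, here is a small standalone sketch, plain pointers and no GC involved, that prints the order each strategy visits a seven-node complete binary tree; the node type and labels are just for illustration:

#include <stdio.h>

struct node { int label; struct node *left, *right; };

// Depth-first, like the eager-copy variant: visit a node, then
// immediately recurse into its children.
static void depth_first(struct node *n) {
  if (!n) return;
  printf("%d ", n->label);
  depth_first(n->left);
  depth_first(n->right);
}

// Breadth-first, like the Cheney scan: a FIFO worklist (in the collector,
// the copied-but-not-yet-scanned region of tospace plays this role).
static void breadth_first(struct node *root) {
  struct node *queue[7];
  int head = 0, tail = 0;
  queue[tail++] = root;
  while (head < tail) {
    struct node *n = queue[head++];
    printf("%d ", n->label);
    if (n->left) queue[tail++] = n->left;
    if (n->right) queue[tail++] = n->right;
  }
}

int main(void) {
  struct node nodes[7];
  for (int i = 0; i < 7; i++) {
    nodes[i].label = i + 1;
    nodes[i].left = 2 * i + 1 < 7 ? &nodes[2 * i + 1] : NULL;
    nodes[i].right = 2 * i + 2 < 7 ? &nodes[2 * i + 2] : NULL;
  }
  printf("breadth-first (Cheney scan): ");
  breadth_first(&nodes[0]);
  printf("\ndepth-first (eager copy):    ");
  depth_first(&nodes[0]);
  printf("\n");
  return 0;
}

The Cheney scan copies that tree in the order 1 2 3 4 5 6 7; the eager variant copies it 1 2 4 5 3 6 7, keeping each parent next to its left subtree.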

The thing is, this works! It might even have better performance for some workloads, depending on access patterns. And yet, nobody does this. Why?

Well, consider a linked list with a million nodes; you’ll end up with a million recursive calls to copy, as visiting each link eagerly traverses the next. While I am all about unbounded recursion, an infinitely extensible stack is something that a language runtime has to provide to a user, and here we’re deep into implementing-the-language-runtime territory. At some point a user’s deep heap graph is going to cause a gnarly system failure via stack overflow.

Ultimately, stack space needed by a GC algorithm counts towards collector memory overhead. In the case of a semi-space collector you already need twice the amount of memory as your live object graph, and if you recursed instead of iterated this might balloon to 3x or more, depending on the heap graph shape.

Hey that’s my note! All this has been context for some future article, so this will be on the final exam. Until then!

a simple semi-space collector
10 December 2022 (https://wingolog.org/2022/12/10/a-simple-semi-space-collector)

Good day, hackfolk. Today’s article is about semi-space collectors. Many of you know what these are, but perhaps not so many have seen an annotated implementation, so let’s do that.

Just to recap, the big picture here is that a semi-space collector divides a chunk of memory into two equal halves or spaces, called the fromspace and the tospace. Allocation proceeds linearly across tospace, from one end to the other. When the tospace is full, we flip the spaces: the tospace becomes the fromspace, and the fromspace becomes the tospace. The collector copies out all live data from the fromspace to the tospace (hence the names), starting from some set of root objects. Once the copy is done, allocation then proceeds in the new tospace.

In practice when you build a GC, it’s parameterized in a few ways, one of them being how the user of the GC will represent objects. Let’s take as an example a simple tag-in-the-first-word scheme:

struct gc_obj {
  union {
    uintptr_t tag;
    struct gc_obj *forwarded; // for GC
  };
  uintptr_t payload[0];
};

We’ll divide all the code in the system into GC code and user code. Users of the GC define how objects are represented. When user code wants to know what the type of an object is, it looks at the first word to check the tag. But, you see that GC has a say in what the representation of user objects needs to be: there’s a forwarded member too.

static const uintptr_t NOT_FORWARDED_BIT = 1;
int is_forwarded(struct gc_obj *obj) {
  return (obj->tag & NOT_FORWARDED_BIT) == 0;
}
void* forwarded(struct gc_obj *obj) { 
  return obj->forwarded;
}
void forward(struct gc_obj *from, struct gc_obj *to) {
  from->forwarded = to;
}

forwarded is a forwarding pointer. When GC copies an object from fromspace to tospace, it clobbers the first word of the old copy in fromspace, writing the new address there. It’s like when you move to a new flat and have your mail forwarded from your old to your new address.

There is a contract between the GC and the user in which the user agrees to always set the NOT_FORWARDED_BIT in the first word of its objects. That bit is a way for the GC to check if an object is forwarded or not: a forwarded pointer will never have its low bit set, because allocations are aligned on some power-of-two boundary, for example 8 bytes.

struct gc_heap;

// To implement by the user:
size_t heap_object_size(struct gc_obj *obj);
size_t trace_heap_object(struct gc_obj *obj, struct gc_heap *heap,
                         void (*visit)(struct gc_obj **field,
                                       struct gc_heap *heap));
size_t trace_roots(struct gc_heap *heap,
                   void (*visit)(struct gc_obj **field,
                                 struct gc_heap *heap));

The contract between GC and user is in practice one of the most important details of a memory management system. As a GC author, you want to expose the absolute minimum interface, to preserve your freedom to change implementations. The GC-user interface does need to have some minimum surface area, though, for example to enable inlining of the hot path for object allocation. Also, as we see here, there are some operations needed by the GC which are usually implemented by the user: computing the size of an object, tracing its references, and tracing the root references. If this aspect of GC design interests you, I would strongly recommend having a look at MMTk, which has been fruitfully exploring this space over the last two decades.
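
To make the user side of the contract concrete, here is a minimal sketch for a hypothetical two-field pair type. The PAIR kind, struct pair, and the kind-in-the-upper-bits tag layout are illustrative assumptions, not anything the GC requires; trace_roots is omitted because root-finding (shadow stacks, stack maps, global tables) is runtime-specific.

enum gc_obj_kind { PAIR /* ..., other user-defined kinds */ };

struct pair {
  struct gc_obj header;   // first word: tag, or forwarding pointer during GC
  struct gc_obj *car;
  struct gc_obj *cdr;
};

static enum gc_obj_kind obj_kind(struct gc_obj *obj) {
  // Assumed tag layout: (kind << 1) | NOT_FORWARDED_BIT.
  return (enum gc_obj_kind)(obj->tag >> 1);
}

size_t heap_object_size(struct gc_obj *obj) {
  switch (obj_kind(obj)) {
  case PAIR: return sizeof(struct pair);
  default: abort();
  }
}

size_t trace_heap_object(struct gc_obj *obj, struct gc_heap *heap,
                         void (*visit)(struct gc_obj **field,
                                       struct gc_heap *heap)) {
  switch (obj_kind(obj)) {
  case PAIR: {
    struct pair *pair = (struct pair*)obj;
    // Assuming both fields always hold heap pointers (no immediates, no NULL).
    visit(&pair->car, heap);
    visit(&pair->cdr, heap);
    return sizeof(struct pair);  // the Cheney scan advances by this amount
  }
  default: abort();
  }
}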

struct gc_heap {
  uintptr_t hp;
  uintptr_t limit;
  uintptr_t from_space;
  uintptr_t to_space;
  size_t size;
};

Now we get to the implementation of the GC. With the exception of how to inline the allocation hot-path, none of this needs to be exposed to the user. We start with a basic definition of what a semi-space heap is, above, and below we will implement collection and allocation.

static uintptr_t align(uintptr_t val, uintptr_t alignment) {
  return (val + alignment - 1) & ~(alignment - 1);
}
static uintptr_t align_size(uintptr_t size) {
  return align(size, sizeof(uintptr_t));
}

All allocators have some minimum alignment, which is usually a power of two at least as large as the target language’s ABI alignment. Usually it’s a word or two; here we just use one word (4 or 8 bytes).

struct gc_heap* make_heap(size_t size) {
  size = align(size, getpagesize());
  struct gc_heap *heap = malloc(sizeof(struct gc_heap));
  void *mem = mmap(NULL, size, PROT_READ|PROT_WRITE,
                   MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
  heap->to_space = heap->hp = (uintptr_t) mem;
  heap->from_space = heap->limit = heap->hp + size / 2;
  heap->size = size;
  return heap;
}

Making a heap is just requesting a bunch of memory and dividing it in two. How you get that space differs depending on your platform; here we use mmap and also the platform malloc for the struct gc_heap metadata. Of course you will want to check that both the mmap and the malloc succeed :)
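
Since those checks are easy to forget, here is one hedged way to spell them out; make_heap_checked is just a variant of make_heap above with the error paths filled in, otherwise the behavior is the same:

struct gc_heap* make_heap_checked(size_t size) {
  size = align(size, getpagesize());
  struct gc_heap *heap = malloc(sizeof(struct gc_heap));
  if (!heap) return NULL;
  void *mem = mmap(NULL, size, PROT_READ|PROT_WRITE,
                   MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) {
    free(heap);
    return NULL;
  }
  heap->to_space = heap->hp = (uintptr_t) mem;
  heap->from_space = heap->limit = heap->hp + size / 2;
  heap->size = size;
  return heap;
}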

struct gc_obj* copy(struct gc_heap *heap, struct gc_obj *obj) {
  size_t size = heap_object_size(obj);
  struct gc_obj *new_obj = (struct gc_obj*)heap->hp;
  memcpy(new_obj, obj, size);
  forward(obj, new_obj);
  heap->hp += align_size(size);
  return new_obj;
}

void flip(struct gc_heap *heap) {
  heap->hp = heap->from_space;
  heap->from_space = heap->to_space;
  heap->to_space = heap->hp;
  heap->limit = heap->hp + heap->size / 2;
}  

void visit_field(struct gc_obj **field, struct gc_heap *heap) {
  struct gc_obj *from = *field;
  struct gc_obj *to =
    is_forwarded(from) ? forwarded(from) : copy(heap, from);
  *field = to;
}

void collect(struct gc_heap *heap) {
  flip(heap);
  uintptr_t scan = heap->hp;
  trace_roots(heap, visit_field);
  while(scan < heap->hp) {
    struct gc_obj *obj = (struct gc_obj*)scan;
    scan += align_size(trace_heap_object(obj, heap, visit_field));
  }
}

Here we have the actual semi-space collection algorithm! It’s a tiny bit of code about which people have written reams of prose, and to be fair there are many things to say—too many for here.

Personally I think the most interesting aspect of a semi-space collector is the so-called “Cheney scanning algorithm”: when we see an object that’s not yet traced, in visit_field, we copy() it to tospace, but don’t actually look at its fields. Instead collect keeps track of the partition of tospace that contains copied objects which have not yet been traced, which are those in [scan, heap->hp). The Cheney scan sweeps through this space, advancing scan, possibly copying more objects and extending heap->hp, until such a time as the needs-tracing partition is empty. It’s quite a neat solution that requires no additional memory.
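
If you like assertions, the invariant the scan maintains can be written down as a little debug check; check_scan_invariant is hypothetical, something you could call from inside the loop in collect:

// Hypothetical debug check for the Cheney invariant. After flip, the
// tospace region splits into:
//   [to_space, scan)  copied objects whose fields have been traced,
//   [scan, hp)        copied objects not yet traced, and
//   [hp, limit)       free space still available for copying.
static void check_scan_invariant(struct gc_heap *heap, uintptr_t scan) {
  assert(heap->to_space <= scan);
  assert(scan <= heap->hp);
  assert(heap->hp <= heap->limit);
}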

inline struct gc_obj* allocate(struct gc_heap *heap, size_t size) {
retry:
  uintptr_t addr = heap->hp;
  uintptr_t new_hp = align_size(addr + size);
  if (heap->limit < new_hp) {
    collect(heap);
    if (heap->limit - heap->hp < size) {
      fprintf(stderr, "out of memory\n");
      abort();
    }
    goto retry;
  }
  heap->hp = new_hp;
  return (struct gc_obj*)addr;
}

Finally, we have the allocator: the reason we have the GC in the first place. The fast path just returns heap->hp, and arranges for the next allocation to return heap->hp + size. The slow path calls collect() and then retries.
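
As a usage sketch, here is how user code might allocate the hypothetical pair type from the contract example earlier; note the caveat about roots in the comment:

struct pair* allocate_pair(struct gc_heap *heap,
                           struct gc_obj *car, struct gc_obj *cdr) {
  // NB: allocate() may trigger a collection, so in a real system CAR and
  // CDR would have to be reachable from the roots (via a shadow stack,
  // say) across this call; this sketch glosses over that.
  struct pair *pair = (struct pair*)allocate(heap, sizeof(struct pair));
  pair->header.tag = ((uintptr_t)PAIR << 1) | NOT_FORWARDED_BIT;
  pair->car = car;
  pair->cdr = cdr;
  return pair;
}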

Welp, that’s a semi-space collector. Next time, some notes on ephemerons again. Until then, have a garbage holiday season!