While poking the other day at making a Guile binding for Harfbuzz, I remembered why I don’t much do this any more: it is impossible to compose GC with explicit ownership.
Allow me to illustrate with an example. Harfbuzz has a concept of blobs, which are refcounted sequences of bytes. It uses these in a number of places, for example when loading OpenType fonts. You can get a peek at the blob’s contents back with hb_blob_get_data, which gives you a pointer and a length.
Say you are in LuaJIT. (To think that for a couple years, I wrote LuaJIT all day long; now I can hardly remember.) You get a blob from somewhere and want to get its data. You define a wrapper for hb_blob_get_data:
local hb = ffi.load("harfbuzz") ffi.cdef [[ typedef struct hb_blob_t hb_blob_t; const char * hb_blob_get_data (hb_blob_t *blob, unsigned int *length); ]]
Presumably you then arrange to release LuaJIT’s reference on the blob when GC collects a Lua wrapper for a blob:
ffi.cdef [[ void hb_blob_destroy (hb_blob_t *blob); ]] function adopt_blob(ptr) return ffi.gc(ptr, hb.hb_blob_destroy) end
OK, so let’s say we get a blob from somewhere, and want to copy out its contents as a byte string.
function blob_contents(blob) local len_out = ffi.new('unsigned int') local contents = hb.hb_blob_get_data(blob, len_out) local len = len_out[0]; return ffi.string(contents, len) end
The thing is, this code is as correct as you can get it, but it’s not correct enough. In between the call to hb_blob_get_data and, well, anything else, GC could run, and if blob is not used in the future of the program execution (the continuation), then it could be collected, causing the hb_blob_destroy finalizer to release the last reference on the blob, freeing contents: we would then be accessing invalid memory.
Among GC implementors, it is a truth universally acknowledged that a program containing finalizers must be in want of a segfault. The semantics of LuaJIT do not prescribe when GC can happen and what values will be live, so the GC and the compiler are not constrained to extend the liveness of blob to, say, the entirety of its lexical scope. It is perfectly valid to collect blob after its last use, and so at some point a GC will evolve to do just that.
I chose LuaJIT not to pick on it, but rather because its FFI is very straightforward. All other languages with GC that I am aware of have this same issue. There are but two work-arounds, and neither are satisfactory: either develop a deep and correct knowledge of what the compiler and run-time will do for a given piece of code, and then pray that knowledge does not go out of date, or attempt to manually extend the lifetime of a finalizable object, and then pray the compiler and GC don’t learn new tricks to invalidate your trick.
This latter strategy takes the form of “remember-this” procedures that are designed to outsmart the compiler. They have mostly worked for the last few decades, but I wouldn’t bet on them in the future.
Another way to look at the problem is that once you have a system working—though, how would you know it’s correct?—then you either never update the compiler and run-time, or you become fast friends with whoever maintains your GC, and probably your compiler too.
For more on this topic, as always Hans Boehm has the first and last word; see for example the 2002 Destructors, finalizers, and synchronization. These considerations don’t really apply to destructors, which are used in languages with ownership and generally run synchronously.
Happy hacking, and be safe out there!
Good evening :) A quick note, tonight: I’ve long thought that ephemerons are primitive and can’t be implemented with mark functions and/or finalizers, but today I think I have a counterexample.
For context, one of the goals of the GC implementation I have been working on on is to replace Guile‘s current use of the Boehm-Demers-Weiser (BDW) conservative collector. Of course, changing a garbage collector for a production language runtime is risky, and for Guile one of the mitigation strategies for this work is that the new collector is behind an abstract API whose implementation can be chosen at compile-time, without requiring changes to user code. That way we can first switch to BDW-implementing-the-new-GC-API, then switch the implementation behind that API to something else.
Abstracting GC is a tricky problem to get right, and I thank the MMTk project for showing that this is possible – you have user-facing APIs that need to be implemented by concrete collectors, but also extension points so that the user can provide some compile-time configuration too, for example to provide field-tracing visitors that take into account how a user wants to lay out objects.
Anyway. As we discussed last time, ephemerons are usually have explicit support from the GC, so we need an ephemeron abstraction as part of the abstract GC API. The question is, can BDW-GC provide an implementation of this API?
I think the answer is “yes, but it’s very gnarly and will kill performance so bad that you won’t want to do it.”
the contenders
Consider that the primitives that you get with BDW-GC are custom mark functions, run on objects when they are found to be live by the mark workers; disappearing links, a kind of weak reference; and finalizers, which receive the object being finalized, can allocate, and indeed can resurrect the object.
BDW-GC’s finalizers are a powerful primitive, but not one that is useful for implementing the “conjunction” aspect of ephemerons, as they cannot constrain the marker’s idea of graph connectivity: a finalizer can only prolong the life of an object subgraph, not cut it short. So let’s put finalizers aside.
Weak references have a tantalizingly close kind of conjunction property: if the weak reference itself is alive, and the referent is also otherwise reachable, then the weak reference can be dereferenced. However this primitive only involves the two objects E and K; there’s no way to then condition traceability of a third object V to E and K.
We are left with mark functions. These are an extraordinarily powerful interface in BDW-GC, but somewhat expensive also: not inlined, and going against the grain of what BDW-GC is really about (heaps in which the majority of all references are conservative). But, OK. They way they work is, your program allocates a number of GC “kinds”, and associates mark functions with those kinds. Then when you allocate objects, you use those kinds. BDW-GC will call your mark functions when tracing an object of those kinds.
Let’s assume firstly that you have a kind for ephemerons; then when you go to mark an ephemeron E, you mark the value V only if the key K has been marked. Problem solved, right? Only halfway: you also have to handle the case in which E is marked first, then K. So you publish E to a global hash table, and... well. You would mark V when you mark a K for which there is a published E. But, for that you need a hook into marking V, and V can be any object...
So now we assume additionally that all objects are allocated with user-provided custom mark functions, and that all mark functions check if the marked object is in the published table of pending ephemerons, and if so marks values. This is essentially what a proper ephemeron implementation would do, though there are some optimizations one can do to avoid checking the table for each object before the mark stack runs empty for the first time. In this case, yes you can do it! Additionally if you register disappearing links for the K field in each E, you can know if an ephemeron E was marked dead in a previous collection. Add a pre-mark hook (something BDW-GC provides) to clear the pending ephemeron table, and you are in business.
yes, but no
So, it is possible to implement ephemerons with just custom mark functions. I wouldn’t want to do it, though: missing the mostly-avoid-pending-ephemeron-check optimization would be devastating, and really what you want is support in the GC implementation. I think that for the BDW-GC implementation in whippet I’ll just implement weak-key associations, in which the value is always marked strongly unless the key was dead on a previous collection, using disappearing links on the key field. That way a (possibly indirect) reference from a value V to a key K can indeed keep K alive, but oh well: it’s a conservative approximation of what should happen, and not worse than what Guile has currently.
Good night and happy hacking!
Good day, hackfolk. Today we continue the series on garbage collection with some notes on ephemerons and finalizers.
conjunctions and disjunctions
First described in a 1997 paper by Barry Hayes, which attributes the invention to George Bosworth, ephemerons are a kind of weak key-value association.
Thinking about the problem abstractly, consider that the garbage collector’s job is to keep live objects and recycle memory for dead objects, making that memory available for future allocations. Formally speaking, we can say:
An object is live if it is in the root set
An object is live it is referenced by any live object.
This circular definition uses the word any, indicating a disjunction: a single incoming reference from a live object is sufficient to mark a referent object as live.
Ephemerons augment this definition with a conjunction:
An object V is live if, for an ephemeron E containing an association betweeen objects K and V, both E and K are live.
This is a more annoying property for a garbage collector to track. If you happen to mark K as live and then you mark E as live, then you can just continue to trace V. But if you see E first and then you mark K, you don’t really have a direct edge to V. (Indeed this is one of the main purposes for ephemerons: associating data with an object, here K, without actually modifying that object.)
During a trace of the object graph, you can know if an object is definitely alive by checking if it was visited already, but if it wasn’t visited yet that doesn’t mean it’s not live: we might just have not gotten to it yet. Therefore one common implementation strategy is to wait until tracing the object graph is done before tracing ephemerons. But then we have another annoying problem, which is that tracing ephemerons can result in finding more live ephemerons, requiring another tracing cycle, and so on. Mozilla’s Steve Fink wrote a nice article on this issue earlier this year, with some mitigations.
finalizers aren’t quite ephemerons
All that is by way of introduction. If you just have an object graph with strong references and ephemerons, our definitions are clear and consistent. However, if we add some more features, we muddy the waters.
Consider finalizers. The basic idea is that you can attach one or a number of finalizers to an object, and that when the object becomes unreachable (not live), the system will invoke a function. One way to imagine this is a global association from finalizable object O to finalizer F.
As it is, this definition is underspecified in a few ways. One, what happens if F references O? It could be a GC-managed closure, after all. Would that prevent O from being collected?
Ephemerons solve this problem, in a way; we could trace the table of finalizers like a table of ephemerons. In that way F would only be traced if O is live already, so that by itself it wouldn’t keep O alive. But then if O becomes dead, you’d want to invoke F, so you’d need it to be live, so reachability of finalizers is not quite the same as ephemeron-reachability: indeed logically all F values in the finalizer table are live, because they all will be invoked at some point.
In the end, if F references O, then F actually keeps O alive. Whether this prevents O from being finalized depends on our definition for finalizability. We could say that an object is finalizable if it is found to be unreachable after a full trace, and the finalizers F are in the root set. Or we could say that an object is finalizable if it is unreachable after a partial trace, in which finalizers are not themselves in the initial root set, and instead we trace them after determining the finalizable set.
Having finalizers in the initial root set is unfortunate: there’s no quick check you can make when adding a finalizer to signal this problem to the user, and it’s very hard to convey to a user exactly how it is that an object is referenced. You’d have to add lots of gnarly documentation on top of the already unavoidable gnarliness that you already had to write. But, perhaps it is a local maximum.
Incidentally, you might think that you can get around these issues by saying “don’t reference objects from their finalizers”, and that’s true in a way. However it’s not uncommon for finalizers to receive the object being finalized as an argument; after all, it’s that object which probably encapsulates the information necessary for its finalization. Of course this can lead to the finalizer prolonging the longevity of an object, perhaps by storing it to a shared data structure. This is a risk for correct program construction (the finalized object might reference live-but-already-finalized objects), but not really a burden for the garbage collector, except in that it’s a serialization point in the collection algorithm: you trace, you compute the finalizable set, then you have to trace the finalizables again.
ephemerons vs finalizers
The gnarliness continues! Imagine that O is associated with a finalizer F, and also, via ephemeron E, some auxiliary data V. Imagine that at the end of the trace, O is unreachable and so will be dead. Imagine that F receives O as an argument, and that F looks up the association for O in E. Is the association to V still there?
Guile’s documentation on guardians, a finalization-like facility, specifies that weak associations (i.e. ephemerons) remain in place when an object becomes collectable, though I think in practice this has been broken since Guile switched to the BDW-GC collector some 20 years ago or so and I would like to fix it.
One nice solution falls out if you prohibit resuscitation by not including finalizer closures in the root set and not passing the finalizable object to the finalizer function. In that way you will never be able to look up E×O⇒V, because you don’t have O. This is the path that JavaScript has taken, for example, with WeakMap and FinalizationRegistry.
However if you allow for resuscitation, for example by passing finalizable objects as an argument to finalizers, I am not sure that there is an optimal answer. Recall that with resuscitation, the trace proceeds in three phases: first trace the graph, then compute and enqueue the finalizables, then trace the finalizables. When do you perform the conjunction for the ephemeron trace? You could do so after the initial trace, which might augment the live set, protecting some objects from finalization, but possibly missing ephemeron associations added in the later trace of finalizable objects. Or you could trace ephemerons at the very end, preserving all associations for finalizable objects (and their referents), which would allow more objects to be finalized at the same time.
Probably if you trace ephemerons early you will also want to trace them later, as you would do so because you think ephemeron associations are important, as you want them to prevent objects from being finalized, and it would be weird if they were not present for finalizable objects. This adds more serialization to the trace algorithm, though:
(Add finalizers to the root set?)
Trace from the roots
Trace ephemerons?
Compute finalizables
Trace finalizables (and finalizer closures if not done in 1)
Trace ephemerons again?
These last few paragraphs are the reason for today’s post. It’s not clear to me that there is an optimal way to compose ephemerons and finalizers in the presence of resuscitation. If you add finalizers to the root set, you might prevent objects from being collected. If you defer them until later, you lose the optimization that you can skip steps 5 and 6 if there are no finalizables. If you trace (not-yet-visited) ephemerons twice, that’s overhead; if you trace them only once, the user could get what they perceive as premature finalization of otherwise reachable objects.
In Guile I think I am going to try to add finalizers to the root set, pass the finalizable to the finalizer as an argument, and trace ephemerons twice if there are finalizable objects. I think this wil minimize incoming bug reports. I am bummed though that I can’t eliminate them by construction.
Until next time, happy hacking!
Hello, interwebs! Today I'd like to share a little skunkworks project with y'all: Pictie, a workbench for WebAssembly C++ integration on the web.
loading pictie...
wtf just happened????!?
So! If everything went well, above you have some colors and a prompt that accepts Javascript expressions to evaluate. If the result of evaluating a JS expression is a painter, we paint it onto a canvas.
But allow me to back up a bit. These days everyone is talking about WebAssembly, and I think with good reason: just as many of the world's programs run on JavaScript today, tomorrow much of it will also be in languages compiled to WebAssembly. JavaScript isn't going anywhere, of course; it's around for the long term. It's the "also" aspect of WebAssembly that's interesting, that it appears to be a computing substrate that is compatible with JS and which can extend the range of the kinds of programs that can be written for the web.
And yet, it's early days. What are programs of the future going to look like? What elements of the web platform will be needed when we have systems composed of WebAssembly components combined with JavaScript components, combined with the browser? Is it all going to work? Are there missing pieces? What's the status of the toolchain? What's the developer experience? What's the user experience?
When you look at the current set of applications targetting WebAssembly in the browser, mostly it's games. While compelling, games don't provide a whole lot of insight into the shape of the future web platform, inasmuch as there doesn't have to be much JavaScript interaction when you have an already-working C++ game compiled to WebAssembly. (Indeed, much of the incidental interactions with JS that are currently necessary -- bouncing through JS in order to call WebGL -- people are actively working on removing all of that overhead, so that WebAssembly can call platform facilities (WebGL, etc) directly. But I digress!)
For WebAssembly to really succeed in the browser, there should also be incremental stories -- what does it look like when you start to add WebAssembly modules to a system that is currently written mostly in JavaScript?
To find out the answers to these questions and to evaluate potential platform modifications, I needed a small, standalone test case. So... I wrote one? It seemed like a good idea at the time.
pictie is a test bed
Pictie is a simple, standalone C++ graphics package implementing an algebra of painters. It was created not to be a great graphics package but rather to be a test-bed for compiling C++ libraries to WebAssembly. You can read more about it on its github page.
Structurally, pictie is a modern C++ library with a functional-style interface, smart pointers, reference types, lambdas, and all the rest. We use emscripten to compile it to WebAssembly; you can see more information on how that's done in the repository, or check the README.
Pictie is inspired by Peter Henderson's "Functional Geometry" (1982, 2002). "Functional Geometry" inspired the Picture language from the well-known Structure and Interpretation of Computer Programs computer science textbook.
prototype in action
So far it's been surprising how much stuff just works. There's still lots to do, but just getting a C++ library on the web is pretty easy! I advise you to take a look to see the details.
If you are thinking of dipping your toe into the WebAssembly water, maybe take a look also at Pictie when you're doing your back-of-the-envelope calculations. You can use it or a prototype like it to determine the effects of different compilation options on compile time, load time, throughput, and network trafic. You can check if the different binding strategies are appropriate for your C++ idioms; Pictie currently uses embind (source), but I would like to compare to WebIDL as well. You might also use it if you're considering what shape your C++ library should have to have a minimal overhead in a WebAssembly context.
I use Pictie as a test-bed when working on the web platform; the weakref proposal which adds finalization, leak detection, and working on the binding layers around Emscripten. Eventually I'll be able to use it in other contexts as well, with the WebIDL bindings proposal, typed objects, and GC.
prototype the web forward
As the browser and adjacent environments have come to dominate programming in practice, we lost a bit of the delightful variety from computing. JS is a great language, but it shouldn't be the only medium for programs. WebAssembly is part of this future world, waiting in potentia, where applications for the web can be written in any of a number of languages. But, this future world will only arrive if it "works" -- if all of the various pieces, from standards to browsers to toolchains to virtual machines, only if all of these pieces fit together in some kind of sensible way. Now is the early phase of annealing, when the platform as a whole is actively searching for its new low-entropy state. We're going to need a lot of prototypes to get from here to there. In that spirit, may your prototypes be numerous and soon replaced. Happy annealing!
OK kids, quiz time. Spot the bugs in this Python class:
import os class FD: _all_fds = set() def __init__(self, fd): self.fd = fd self._all_fds.add(fd) def close(self): if (self.fd): os.close(self.fd) self._all_fds.remove(self.fd) self.fd = None @classmethod def for_each_fd(self, proc): for fd in self._all_fds: proc(fd) def __del__(self): self.close()
The intention is pretty clear: you have a limited resource (file descriptors, in this case). You would like to make sure they get closed, no matter what happens in your program, so you wrap them in objects known to the garbage collector, and attach finalizers that close the descriptors. You have a for_each_fd procedure that should at least allow you to close all file descriptors, for example when your program is about to exec another program.
So, bugs?
Let's start with one: FD._all_fds can't sensibly be accessed from multiple threads at the same time. The file descriptors in the set are logically owned by particular pieces of code, and those pieces of code could be closing them while you're trying to for_each_fd on them.
Well, OK. Let's restrict the problem, then. Let's say there's only one thread. Next bug?
Another bug is that signals cause arbitrary code to run, at arbitrary points in your program. For example, if in the close method, you get a SIGINT after the os.close but before removing the file descriptor from the set, causing an exception to be thrown, you will be left with a closed descriptor in the set. If you swap the operations, you leak an fd. Neither situation is good.
The root cause of the problem here is that asynchronous signals introduce concurrency. Signal handlers are run in another logical thread of execution in your program -- even if they happen to share the same stack (as they do in CPython).
OK, let's mask out signals then. (This is starting to get ugly). What next?
What happens if, during for_each_fd, one of the FD objects becomes unreachable?
The Python language does not guarantee anything about when finalizers (__del__ methods) get called. (Indeed, it doesn't guarantee that they get called at all.) The CPython implementation will immediately finalize objects whose refcount equals zero. Running a finalizer on the method will mutate FD._all_fds, while it is being traversed, in this case.
The implications of this particular bug are either that CPython will throw an exception when it sees that the set was modified while iterating over it, or that the finalizer happens to close the fd being processed. Neither one of these cases are very nice, either.
This is the bug I wanted to get to with this article. Like asynchronous signals, finalizers introduce concurrency: even in languages with primitive threading models like Python.
Incidentally, this behavior of running finalizers from the main thread was an early bug in common Java implementations, 15 years ago. All JVM implementors have since fixed this, in the same way: running finalizers within a dedicated thread. This avoids the opportunity for deadlock, or for seeing inconsistent state. Guile will probably do this in 2.2.
For a more thorough discussion of this problem, Hans Boehm has published a number of papers on this topic. The 2002 Destructors, Finalizers, and Synchronization paper is a good one.