on hoot, on boot

I realized recently that I haven’t been writing much about the Hoot Scheme-to-WebAssembly compiler. Upon reflection, I have been too conscious of its limitations to give it verbal tribute, preferring to spend each marginal hour fixing bugs and filling in features rather than publicising progress.

In the last month or so, though, Hoot has gotten to a point that pleases me. Not to the point where I would say “accept no substitutes” by any means, but good already for some things, and worth writing about.

So let’s start today by talking about bootie. Boot, I mean! The boot, the boot, the boot of Hoot.

hoot boot: temporal tunnel

The first axis of boot is time. In the beginning, there was nary a toot, and now, through boot, there is Hoot.

The first boot of Hoot was on paper. Christine Lemmer-Webber had asked me, ages ago, what I thought Guile should do about the web. After thinking a bit, I concluded that it would be best to avoid compromises when building an in-browser Guile: if you have to pollute Guile to match what JavaScript offers, you might as well program in JavaScript. JS is cute of course, but Guile is a bit different in some interesting ways, the most important of which is control: delimited continuations, multiple values, tail calls, dynamic binding, threads, and all that. If Guile’s web bootie doesn’t pack all the funk in its trunk, probably it’s just junk.

So I wrote up a plan, something to which I attributed the name tailification. In retrospect, this is simply a specific flavor of a continuation-passing-style (CPS) transmutation, late in the compiler pipeline. I’ll elocute more in a future dispatch. I did end up writing the tailification pass back then; I could have continued to target JS, but it was sufficiently annoying and I didn’t prosecute. It sat around unused for a few years, until Christine’s irresistible charisma managed to conjure some resources for Hoot.

In the meantime, the GC extension for WebAssembly shipped (woot woot!), and to boot Hoot, I filled in the missing piece: a backend for Guile’s compiler that tailified and then translated primitive operations to snippets of WebAssembly.

It was, well, hirsute, but cute and it did compute, so we continued to boot. From this root we grew a small run-time library, written in raw WebAssembly, used for slow-paths for the various primitive operations that are part of Guile’s compiler back-end. We filled out Guile primcalls, in minute commits, growing the WebAssembly runtime library and toolchain as we went.

Eventually we started constituting facilities defined in terms of those primitives, via a Scheme prelude that was prepended to all programs, within a nested lexical environment. It was never our intention though to drown the user’s programs in a sea of predefined bindings, as if the ultimate program were but a vestigial inhabitant of the lexical lake—don’t dilute the newt!, we would often say [ed: we did not]— so eventually when the prelude became unmanageable, we finally figured out how to do whole-program compilation of a set of modules.

Then followed a long month in which I would uproot the loot from the boot: take each binding from the prelude and reattribute it into an appropriate module. User code could import all the modules that suit, as long as they were known to Hoot, but no others; it was not until we added the ability for users to programmatically constitute an environment from their modules that Hoot became a language implementation of any repute.

Which brings us to the work of the last month, about which I cannot be mute. When you have existing Guile code that you want to distribute via the web, Hoot required you to transmute its module definitions into the more precise R6RS syntax. Precise, meaning that R6RS modules are static, in a way that Guile modules, at least in absolute terms, are not: Guile programs can use first-class accessors on the module system to pull out bindings. This is yet another example of what I impute as the original sin of 1990s language development, that modules are just mutable hash maps. You see it in Python, for example: because you don’t know for sure to what values global names are bound, it is easy for any discussion of what a particular piece of code means to end in dispute.

The question is, though, are the semantics of name binding in a language fixed and absolute? Once your language is booted, are its aspects definitively attributed? I think some perfection, in the sense of becoming more perfect or more like the thing you should be, is something to salute. Anyway, in Guile it would be coherent with Scheme’s lexical binding heritage to restitute some certainty as to the meanings of names, at least in a default compilation mode. Lexical binding is, after all, the foundation of the Macro Writer’s Statute of Rights. Of course if you are making a build for development purposes, not to distribute, then you might prefer a build that marks all bindings as dynamic. Otherwise I think it’s reasonable to require the user to explicitly indicate which definitions are denotations, and which constitute locations.

Hoot therefore now includes an implementation of the static semantics of Guile’s define-module: it can load Guile modules directly, and as a tribute, it also has an implementation of the ambient (guile) module that constitutes the lexical soup of modules that aren’t #:pure. (I agree, it would be better if all modules were explicit about the language they are written in—their imported bindings and so on—but there is an existing corpus to accommodate; the point is moot.)
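To make the two syntaxes concrete, here is the same toy module written both ways; the module name and its single binding are invented for illustration.

;; R6RS: the static form that Hoot originally required.
(library (example inc)
  (export inc)
  (import (rnrs))
  (define (inc n) (+ n 1)))

;; Guile: the define-module form that Hoot can now load directly.
;; Without #:pure, the ambient (guile) bindings are also in scope.
(define-module (example inc)
  #:export (inc))

(define (inc n)
  (+ n 1))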

The astute reader (whom I salute!) will note that we have a full boot: Hoot is a Guile. Not an implementation to substitute the original, but more of an alternate route to the same destination. So, probably we should scoot the two implementations together, to knock their boots, so to speak, merging the offshoot Hoot into Guile itself.

But do I circumlocute: I can only plead a case of acute Hoot. Tomorrow, we elocute on a second axis of boot. Until then, happy compute!

partitioning pitfalls for generational collectors

You might have heard of long-pole problems: when you are packing a tent, its bag needs to be at least as big as the longest pole. (This is my preferred etymology; there are others.) In garbage collection, the long pole is the longest chain of object references; there is no way you can algorithmically speed up pointer-chasing of (say) a long linked-list.

As a GC author, you have some standard tools that can mitigate the long-pole problem, and some that don’t apply.

Parallelism doesn’t help: a baby takes 9 months, no matter how many people you assign to the problem. You need to visit each node in the chain, one after the other, and having multiple workers available to process those nodes does not get us there any faster. Parallelism does help in general, of course, but doesn’t help for long poles.

You can apply concurrency: instead of stopping the user’s program (the mutator) to enumerate live objects, you trace while the mutator is running. This can happen on mutator threads, interleaved with allocation, via what is known as incremental tracing. Or it can happen in dedicated tracing threads, which is what people usually mean when they refer to concurrent tracing. Though it does impose some coordination overhead on the mutator, the pause-time benefits are such that most production garbage collectors do trace concurrently with the mutator, if they have the development resources to manage the complexity.

Then there is partitioning: instead of tracing the whole graph all the time, try to just trace part of it and just reclaim memory for that part. This bounds the size of a long pole—it can’t be bigger than the partition you trace—and so tracing a graph subset should reduce total pause time.

The usual way to partition is generational, in which the main program allocates into a nursery, and objects that survive a few collections then get promoted into old space. But there may be other partitions, for example to put “large objects” (for example, bigger than a few virtual memory pages) in their own section, to be managed with their own algorithm.

partitions, take one

And that brings me to today’s article: generational partitioning has a failure mode which manifests itself as spurious out-of-memory. For example, in V8, running this small JavaScript program that repeatedly allocates a 1-megabyte buffer grows to consume all memory in the system and eventually panics:

while (1) new Uint8Array(1e6);

This is a funny result, to say the least. Let’s dig in a bit to see why.

First off, note that allocating a 1-megabyte Uint8Array makes a large object, because it is bigger than half a V8 page, which is 256 kB on most systems. There is a backing store for the array, which gets allocated into the large object space, and then there is the JS object wrapper, which gets allocated in the young generation of the regular object space.

Let’s imagine the heap consists of the nursery, the old space, and a large object space (lospace, for short). (See the next section for a refinement of this model.) In the loop, we cause allocation in the nursery and the lospace. When the nursery starts to get full, we run a minor collection, to trace the newly allocated part of the object graph, possibly promoting some objects to the old generation.

Promoting an object adds to the byte count in the old generation. You could say that promoting causes allocation in the old-gen, though it might happen just on an accounting level if an object is promoted in place. In V8, old-gen allocation is subject to a limit, and that limit is set by a gnarly series of heuristics. Exceeding the limit will cause a major GC, in which the whole heap is traced.

So, minor collections cause major collections via promotion. But what if a minor collection never promotes? This can happen in request-oriented workflows, in which a request comes in, you handle it in JS, write out the result, and then handle the next. If the nursery is at least as large as the memory allocated in one request, no object will ever survive more than one collection. But in our new Uint8Array(1e6) example above, we are allocating to newspace and the lospace. If we never cause promotion, we will always collect newspace but never trigger the full GC that would also collect the large object space, and so we arrive at this out-of-memory phenomenon.
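To see the accounting failure in isolation, here is a toy model in Scheme. Everything in it (the limits, the procedure names, the promotion policy) is invented for illustration; it is not how V8 is written, but it reproduces the shape of the bug.

;; Toy model of the accounting failure; all names and numbers are
;; invented for illustration, this is not V8 code.
(define nursery-bytes 0)
(define old-bytes 0)
(define lospace-bytes 0)

(define nursery-limit (* 16 1024 1024))
(define old-limit (* 256 1024 1024))

(define (minor-gc!)
  ;; Scavenge the nursery.  In this workload nothing survives, so
  ;; nothing is promoted and old-bytes never grows.
  (set! nursery-bytes 0))

(define (major-gc!)
  ;; Only a full collection also sweeps the large object space.
  (set! nursery-bytes 0)
  (set! old-bytes 0)
  (set! lospace-bytes 0))

(define (allocate! small-bytes large-bytes)
  (set! nursery-bytes (+ nursery-bytes small-bytes))
  (set! lospace-bytes (+ lospace-bytes large-bytes)) ; counted nowhere!
  (cond
   ((> old-bytes old-limit) (major-gc!))             ; never reached
   ((> nursery-bytes nursery-limit) (minor-gc!))))

;; while (1) new Uint8Array(1e6): a small wrapper in the nursery plus a
;; large backing store.  lospace-bytes grows without bound, and no
;; heuristic ever fires to reclaim it.
(let loop ()
  (allocate! 64 1000000)
  (loop))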

partitions, take two

In general, the phenomenon we are looking for is nursery allocations that cause non-nursery allocations, where those non-nursery allocations will not themselves bring about a major GC.

In our example above, it was a typed array object with an associated lospace backing store, assuming the lospace allocation wouldn’t eventually trigger GC. But V8’s heap is not exactly like our model, for one because it actually has separate nursery and old-generation lospaces, and for two because allocation to the old-generation lospace does count towards the old-gen allocation limit. And yet, that simple loop still blows through all memory. So what is the deal?

The answer is that V8’s heap now has around two dozen spaces, and allocations to some of those spaces escape the limits and allocation counters. In this case, V8’s sandbox makes the link between the JS typed array object and its backing store pass through a table of indirections. At each allocation, we make an object in the nursery, allocate a backing store in the nursery lospace, and then allocate a new entry in the external pointer table, storing the index of that entry in the JS object. When the object needs to get its backing store, it dereferences the corresponding entry in the external pointer table.

Our tight loop above would therefore cause an allocation (in the external pointer table) that itself would not hasten a major collection.

Seen this way, one solution immediately presents itself: maybe we should find a way to make external pointer table entry allocations trigger a GC based on some heuristic, perhaps the old-gen allocated bytes counter. But then you have to actually trigger GC, and this is annoying and not always possible, as EPT entries can be allocated in response to a pointer write. V8 hacker Michael Lippautz summed up the issue in a comment that can be summarized as “it’s gnarly”.

In the end, it would be better if new-space allocations did not cause old-space allocations. We should separate the external pointer table (EPT) into old and new spaces. Because there is at most one “host” object associated with any EPT entry—if there are zero, the entry is dead—the heuristics that apply to host objects will do the right thing with regard to EPT entries.

A couple weeks ago, I landed a patch to do this. It was much easier said than done; the patch went through upwards of 40 revisions, as it took me a while to understand the delicate interactions between the concurrent and parallel parts of the collector, which were themselves evolving as I was working on the patch.

The challenge mostly came from the fact that V8 has two nursery implementations that operate in different ways.

The semi-space nursery (the scavenger) is a parallel stop-the-world collector, which is relatively straightforward... except that there can be a concurrent/incremental major trace in progress while the scavenger runs (a so-called interleaved minor GC). External pointer table entries have mark bits, which the scavenger collector will want to use to determine which entries are in use and which can be reclaimed, but the concurrent major marker will also want to use those mark bits. In the end we decided that if a major mark is in progress, an interleaved collection will neither mark nor sweep the external pointer table nursery; a major collection will finish soon, and reclaiming dead EPT entries will be its responsibility.

The minor mark-sweep nursery does not run concurrently with a major concurrent/incremental trace, which is something of a relief, but it does run concurrently with the mutator. When we promote a page, we also need to promote all EPT entries for objects on that page. To keep track of which objects have external pointers, we had to add a new remembered set, built up during a minor trace, and cleared when the page is swept (also lazily / concurrently). Promoting a page iterates through that set, evacuating EPT entries to the old space. This is additional work during the stop-the-world pause, but hopefully it is not too bad.

To be honest I don’t have the performance numbers for this one. It rides the train this week to Chromium 126 though, and hopefully it fixes this problem in a robust way.

partitions, take three

The V8 sandboxing effort has sprouted a number of indirect tables: external pointers, external buffers, trusted objects, and code objects. Recently, to better integrate with compressed pointers, there is also a table for V8-to-managed-C++ (Oilpan) references. I expect the Oilpan reference table will soon grow a nursery along the same lines as the regular external pointer table.

In general, if you have a generational partition of your main object space, it would seem that you almost always need a generational partition of every other space. Otherwise either you cause new allocations to occur in what is effectively an old space, perhaps needlessly hastening a major GC, or you forget to track allocations in that space, leading to a memory leak. If the generational hypothesis applies for a wrapper object, it probably also applies for any auxiliary allocations.

fin

I have a few cleanups remaining in this area but I am relieved to have landed this patch, and pleased to have spent time under the hood of V8’s collector. Many many thanks to Samuel Groß, Michael Lippautz, and Dominik Inführ for their review and patience. Until the next cycle, happy allocating!

hacking v8 with guix, bis

Good day, hackers. Today, a pragmatic note, on hacking on V8 from a Guix system.

I’m going to skip a lot of the background because, as it turns out, I wrote about this already almost a decade ago. But following that piece, I mostly gave up on doing V8 hacking from a Guix machine—it was more important to just go with the flow of the ever-evolving upstream toolchain. In fact, I ended up installing Ubuntu LTS on my main workstations for precisely this reason, which has worked fine; I still get Guix in user-space, which is better than nothing.

Since then, though, Guix has grown to the point that it’s easier to create an environment that can run a complicated upstream source management project like V8’s. This is mainly guix shell in the --container --emulate-fhs mode. This article is a step-by-step for how to get started with V8 hacking using Guix.

get the code

You would think this would be the easy part: just git clone the V8 source. But no, the build wants a number of other Google-hosted dependencies to be vendored into the source tree. To perform the initial fetch for those dependencies and to keep them up to date, you use helpers from the depot_tools project. You also use depot_tools to submit patches to code review.

When you live in the Guix world, you might be tempted to look into what depot_tools actually does, and to replicate its functionality in a more minimal, Guix-like way. Which, sure, perhaps this is a good approach for packaging V8 or Chromium or something, but when you want to work on V8, you need to learn some humility and just go with the flow. (It’s hard for the kind of person that uses Guix. But it’s what you do.)

You can make some small adaptations, though. depot_tools is mostly written in Python, and it actually bundles its own virtualenv support for using a specific python version. This isn’t strictly needed, so we can set the funny environment variable VPYTHON_BYPASS="manually managed python not supported by chrome operations" to just use python from the environment.

Sometimes depot_tools will want to run some prebuilt binaries. Usually on Guix this is anathema—we always build from source—but there’s only so much time in the day and the build system is not our circus, not our monkeys. So we get Guix to set up the environment using a container in --emulate-fhs mode; this lets us run third-party pre-built binaries. Note, these binaries are indeed free software! We can run them just fine if we trust Google, which you have to when working on V8.

no, really, get the code

Enough with the introduction. The first thing to do is to check out depot_tools.

mkdir src
cd src
git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git

I’m assuming you have git in your Guix environment already.

Then you need to initialize depot_tools. For that you run a python script, which needs to run other binaries – so we need to make a specific environment in which it can run. This starts with a manifest of packages, conventionally placed in a file named manifest.scm in the project’s working directory; you don’t have that directory yet, so you can just write it into v8.scm or wherever:

(use-modules (guix packages)
             (gnu packages gcc))

(concatenate-manifests
 (list
  (specifications->manifest
   '(
     "bash"
     "binutils"
     "clang-toolchain"
     "coreutils"
     "diffutils"
     "findutils"
     "git"
     "glib"
     "glibc"
     "glibc-locales"
     "grep"
     "less"
     "ld-gold-wrapper"
     "make"
     "nss-certs"
     "nss-mdns"
     "openssh"
     "patch"
     "pkg-config"
     "procps"
     "python"
     "python-google-api-client"
     "python-httplib2"
     "python-pyparsing"
     "python-requests"
     "python-tzdata"
     "sed"
     "tar"
     "wget"
     "which"
     "xz"
     ))
  (packages->manifest
   `((,gcc "lib")))))

Then, you guix shell -m v8.scm. But actually you need a bit more than that, because we need to set up a container so that we can expose a standard /lib, /bin, and so on:

guix shell --container --network \
  --share=$XDG_RUNTIME_DIR --share=$HOME \
  --preserve=TERM --preserve=SSH_AUTH_SOCK \
  --emulate-fhs \
  --manifest=v8.scm

Let’s go through these options one by one.

  • --container: This is what lets us run pre-built binaries, because it uses Linux namespaces to remap the composed packages to /bin, /lib, and so on.

  • --network: Depot tools are going to want to download things, so we give them net access.

  • --share: By default, the container shares the current working directory with the “host”. But we need not only the checkout for V8 but also the sibling checkout for depot tools (more on this in a minute); let’s just share the whole home directory. Also, we share the /run/user/1000 directory, which is $XDG_RUNTIME_DIR, which lets us access the SSH agent, so we can check out over SSH.

  • --preserve: By default, the container gets a pruned environment. This lets us pass some environment variables through.

  • --emulate-fhs: The crucial piece that lets us bridge the gap between Guix and the world.

  • --manifest: Here we specify the list of packages to use when composing the environment.

We can use short arguments to make this a bit less verbose:

guix shell -CNF --share=$XDG_RUNTIME_DIR --share=$HOME \
  -ETERM -ESSH_AUTH_SOCK -m manifest.scm

I would like it if all of these arguments could somehow be made optional, so that a bare guix shell invocation would just apply them when run in this directory. Perhaps some day.

Running guix shell like this drops you into a terminal. So let’s initialize depot tools:

cd $HOME/src
export VPYTHON_BYPASS="manually managed python not supported by chrome operations"
export PATH=$HOME/src/depot_tools:$PATH
export SSL_CERT_DIR=/etc/ssl/certs/
export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
gclient

This should download a bunch of things, I don’t know what. But at this point we’re ready to go:

fetch v8

This checks out V8, which is about 1.3 GB, and then probably about as much again in dependencies.

build v8

You can build V8 directly:

# note caveat below!
cd v8
tools/dev/gm.py x64.release

This will build fine... and then fail to link. The precise reason is obscure to me: it would seem that by default, V8 uses a whole Debian sysroot for Some Noble Purpose, and ends up linking against it. But it compiles against system glibc, which seems to have replaced fcntl64 with a versioned symbol, or some such nonsense. It smells like V8 built against a too-new glibc and then failed trying to link to an old glibc.

To fix this, you need to go into the args.gn that was generated in out/x64.release and then add use_sysroot = false, so that it links to system glibc instead of the downloaded one.

echo 'use_sysroot = false' >> out/x64.release/args.gn
tools/dev/gm.py x64.release

You probably want to put the commands needed to set up your environment into some shell scripts. For Guix you could make guix-env:

#!/bin/sh
guix shell -CNF --share=$XDG_RUNTIME_DIR --share=$HOME \
  -ETERM -ESSH_AUTH_SOCK -m manifest.scm -- "$@"

Then inside the container you need to set the PATH and such, so we could put this into the V8 checkout as env:

#!/bin/sh
# Look for depot_tools in sibling directory.
depot_tools=`cd $(dirname $0)/../depot_tools && pwd`
export PATH=$depot_tools:$PATH
export VPYTHON_BYPASS="manually managed python not supported by chrome operations"
export SSL_CERT_DIR=/etc/ssl/certs/
export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
exec "$@"

This way you can run ./guix-env ./env tools/dev/gm.py x64.release and not have to “enter” the container so much.

notes

This all works fine enough, but I do have some meta-reflections.

I would prefer it if I didn’t have to use containers, for two main reasons. One is that the resulting build artifacts have to be run in the container, because they are dynamically linked to e.g. /lib, at least for the ELF loader. It would be better if I could run them on the host (with the host debugger, for example). Using Guix to make the container is better than e.g. docker, though, because I can ensure that the same tools are available in the guest as I use on the host. But also, I don’t like adding “modes” to my terminals: am I in or out of this or that environment? Being in a container is not like being in a vanilla guix shell, and that’s annoying.

The build process uses many downloaded tools and artifacts, including clang itself. This is a feature, in that I am using the same compiler that colleagues at Google use, which is important. But it’s also annoying and it would be nice if I could choose. (Having the same clang-format though is an absolute requirement.)

There are two tests failing in this configuration, both somehow related to time zones. I have no idea why, but I just ignore them.

If the build system were any weirder, I would think harder about maybe using Docker or something like that. Colleagues point to distrobox as being a useful wrapper. It is annoying though, because such a docker image becomes like a little stateful thing to do sysadmin work on, and I would like to avoid that if I can.

Welp, that’s all for today. Hopefully if you are contemplating installing Guix as your operating system (rather than just in user-space), this can give you a bit more information as to what it might mean when working on third-party projects. Happy hacking and until next time!

on the impossibility of composing finalizers and ffi

While poking the other day at making a Guile binding for Harfbuzz, I remembered why I don’t much do this any more: it is impossible to compose GC with explicit ownership.

Allow me to illustrate with an example. Harfbuzz has a concept of blobs, which are refcounted sequences of bytes. It uses these in a number of places, for example when loading OpenType fonts. You can get a peek at the blob’s contents with hb_blob_get_data, which gives you a pointer and a length.

Say you are in LuaJIT. (To think that for a couple years, I wrote LuaJIT all day long; now I can hardly remember.) You get a blob from somewhere and want to get its data. You define a wrapper for hb_blob_get_data:

local hb = ffi.load("harfbuzz")
ffi.cdef [[
typedef struct hb_blob_t hb_blob_t;

const char *
hb_blob_get_data (hb_blob_t *blob, unsigned int *length);
]]

Presumably you then arrange to release LuaJIT’s reference on the blob when GC collects a Lua wrapper for a blob:

ffi.cdef [[
void hb_blob_destroy (hb_blob_t *blob);
]]

function adopt_blob(ptr)
  return ffi.gc(ptr, hb.hb_blob_destroy)
end

OK, so let’s say we get a blob from somewhere, and want to copy out its contents as a byte string.

function blob_contents(blob)
   local len_out = ffi.new('unsigned int[1]')
   local contents = hb.hb_blob_get_data(blob, len_out)
   local len = len_out[0]
   return ffi.string(contents, len)
end

The thing is, this code is as correct as you can get it, but it’s not correct enough. In between the call to hb_blob_get_data and, well, anything else, GC could run, and if blob is not used in the future of the program execution (the continuation), then it could be collected, causing the hb_blob_destroy finalizer to release the last reference on the blob, freeing contents: we would then be accessing invalid memory.

Among GC implementors, it is a truth universally acknowledged that a program containing finalizers must be in want of a segfault. The semantics of LuaJIT do not prescribe when GC can happen and what values will be live, so the GC and the compiler are not constrained to extend the liveness of blob to, say, the entirety of its lexical scope. It is perfectly valid to collect blob after its last use, and so at some point a GC will evolve to do just that.

I chose LuaJIT not to pick on it, but rather because its FFI is very straightforward. All other languages with GC that I am aware of have this same issue. There are but two work-arounds, and neither is satisfactory: either develop a deep and correct knowledge of what the compiler and run-time will do for a given piece of code, and then pray that knowledge does not go out of date, or attempt to manually extend the lifetime of a finalizable object, and then pray the compiler and GC don’t learn new tricks to invalidate your trick.

This latter strategy takes the form of “remember-this” procedures that are designed to outsmart the compiler. They have mostly worked for the last few decades, but I wouldn’t bet on them in the future.
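To make that concrete, here is roughly what such a procedure looks like, sketched in Scheme. All of the names are hypothetical (%sink, keep-alive!, %blob-data, %blob-length); they are not any real Guile or Harfbuzz binding, just the shape of the trick.

;; Sketch of the "remember-this" workaround; %blob-data and %blob-length
;; are hypothetical FFI wrappers over hb_blob_get_data.
(use-modules (system foreign)        ; pointer->bytevector
             (rnrs bytevectors))     ; bytevector-copy

(define %sink #f)

(define (keep-alive! obj)
  ;; A side effect that the compiler and GC cannot (yet) prove useless,
  ;; so that obj stays live at least until this call.
  (set! %sink obj)
  (set! %sink #f))

(define (blob-contents blob)
  (let* ((ptr (%blob-data blob))      ; hypothetical: returns a pointer
         (len (%blob-length blob))    ; hypothetical: returns a length
         (copy (bytevector-copy (pointer->bytevector ptr len))))
    ;; Without this, GC could collect blob (running its finalizer and
    ;; freeing ptr) any time after the %blob-data call above.
    (keep-alive! blob)
    copy))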

Another way to look at the problem is that once you have a system working—though, how would you know it’s correct?—then you either never update the compiler and run-time, or you become fast friends with whoever maintains your GC, and probably your compiler too.

For more on this topic, as always Hans Boehm has the first and last word; see for example the 2002 Destructors, finalizers, and synchronization. These considerations don’t really apply to destructors, which are used in languages with ownership and generally run synchronously.

Happy hacking, and be safe out there!

guix on the framework 13 amd

I got a new laptop! It’s a Framework 13 AMD: 8 cores, 2 threads per core, 64 GB RAM, 3:2 2256×1504 matte screen. It kicks my 5-year-old Dell XPS 13 in the pants, and I am so relieved to be back to a matte screen. I just got it up and running with Guix, which though easier than past installation experiences was not without some wrinkles, so here I wanted to share a recipe for what worked for me.

(I swear this isn’t going to become a product review blog, but when I went to post something like this on the Framework forum I got an error saying that new users could only post 2 links. I understand how we got here but hoo, that is a garbage experience!)

The basic deal

Upstream Guix works on the Framework 13 AMD, but only with software rendering and no wifi, and I wasn’t able to install from upstream media. This is mainly because Guix uses a modified kernel and doesn’t include necessary firmware. There is a third-party nonguix repository that defines packages for the vanilla Linux kernel and the linux-firmware collection; we have to use that repo if we want all functionality.

Of course having the firmware be user-hackable would be better, and it would be better if the framework laptop used parts with free firmware. Something for a next revision, hopefully.

On firmware

As an aside, I think the official Free Software Foundation position on firmware is bad praxis. To recall, the idea is that if a device has embedded software (firmware) that can be updated, but that software is in a form that users can’t modify, then the system as a whole is not free software. This is technically correct but doesn’t logically imply that the right strategy for advancing free software is to forbid firmware blobs; you have a number of potential policy choices and you have to look at their expected results to evaluate which one is most in line with your goals.

Bright lines are useful, of course; I just think that with respect to free software, drawing that line around firmware is not interesting. To illustrate this point, I believe the current FSF position is that if you can run e.g. a USB ethernet adapter without installing non-free firmware, then it is kosher, otherwise it is haram. However, many of these devices have firmware; it’s just that you aren’t updating it. So for example the USB Ethernet adapter I got with my Dell system many years ago has firmware, therefore it has bugs, but I have never updated that firmware because that’s not how we roll. Or, on my old laptop, I never updated the CPU microcode, despite spectre and meltdown and all the rest.

“Firmware, but never updated” reminds me of the wires around some New York neighborhoods that allow orthodox people to leave the house on Sabbath; useful if you are of a given community and enjoy the feeling of belonging, but I think even the faithful would see it as a hack. It is like how Richard Stallman wouldn’t use travel booking web sites because they had non-free JavaScript, but would happily call someone on the telephone to perform the booking for him, using those same sites. In that case, the net effect on the world of this particular bright line is negative: it does not advance free software in the least and only adds overhead. Privileging principle over praxis is generally a losing strategy.

Installation

Firstly I had to turn off secure boot in the bios settings; it’s in “security”.

I wasn’t expecting wifi to work out of the box, but for some reason the upstream Guix install media was not able to configure the network via the Ethernet expansion card nor an external USB-C ethernet adapter that I had; it got stuck at the DHCP phase. So my initial installation attempt failed.

Then I realized that the nonguix repository has installation media, which is the same as upstream but with the vanilla kernel and linux-firmware. So on another machine where I had Guix installed, I added the nonguix channel and built the installation media, via guix system image -t iso9660 nongnu/system/install.scm. That gave me a file that I could write to a USB stick.
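For reference, adding the channel is a matter of a ~/.config/guix/channels.scm along these lines. This is a minimal sketch: the nonguix README also specifies a channel introduction (a signing key) that you should copy from there rather than from here.

;; Minimal sketch of a channels file that adds nonguix; see the nonguix
;; README for the channel introduction (signing key) to add as well.
(cons* (channel
        (name 'nonguix)
        (url "https://gitlab.com/nonguix/nonguix"))
       %default-channels)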

Using that installation media, installing was a breeze.

However upon reboot, I found that I had no wifi and I was using software rendering; clearly, installation produced an OS config with the Guix kernel instead of upstream Linux. Happily, at this point the ethernet expansion card was able to work, so connect to wired ethernet, open /etc/config.scm, add the needed lines as described in the operating-system part of the nonguix README, reconfigure, and reboot. Building Linux takes a little less than an hour on this machine.
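For reference, the additions to config.scm look roughly like the following. This is a sketch from memory; defer to the nonguix README if they disagree.

;; Sketch of the nonguix additions to the operating-system declaration;
;; the nonguix README has the authoritative version.
(use-modules (nongnu packages linux)
             (nongnu system linux-initrd))

(operating-system
  (kernel linux)                      ; vanilla kernel, from nonguix
  (initrd microcode-initrd)
  (firmware (list linux-firmware))
  ;; ... the rest of your existing configuration ...
  )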

Fractional scaling

At that point you have wifi and graphics drivers. I use GNOME, and things seem to work. However the screen defaults to 200% scaling, which makes everything really big. Crisp, pretty, but big. Really you would like something in between, or for Framework to ship a higher-resolution screen so that 200% would be a good scaling factor; that was the case with my old Dell XPS 13, and it worked well. Anyway with the Framework laptop, I wanted 150% scaling, and it seems these days that the way you have to do this is to use Wayland, which Guix does not yet enable by default.

So you go into config.scm again, and change where it says %desktop-services to be:

(modify-services %desktop-services
  (gdm-service-type config =>
    (gdm-configuration (inherit config) (wayland? #t))))

Then when you reboot you are in Wayland. Works fine, it seems. But then you have to go and enable an experimental mutter setting; install dconf-editor, run it, search for keys with “mutter” in the name, find the “experimental settings” key, tell it to not use the default setting, then click the box for “scale-monitor-framebuffer”.

Then! You can go into GNOME settings and get 125%, 150%, and so on. Great.

HOWEVER, and I hope this is a transient situation, there is a problem: in GNOME, applications that aren’t native Wayland apps don’t scale nicely. It’s like the app gets rendered to a texture at the original resolution, which then gets scaled up in a blurry way. There aren’t so many of these apps these days as most things have been ported to be Wayland-capable, Firefox included, but Emacs is one of them :( However however! If you install the emacs-pgtk package instead of emacs, it looks better. Not perfect, but good enough. So that’s where I am.

Bugs

The laptop hangs on reboot due to this bug, but that seems a minor issue at this point. There is an ongoing tracker discussion on the community forum; like other problems in that thread, I hope that this one resolves itself upstream in Linux over time.

Other things?

I didn’t mention the funniest thing about this laptop: it comes in pieces that you have to put together :) I am not so great with hardware, but I had no problem. The build quality seems pretty good; not a MacBook Air, but then it’s also user-repairable, which is a big strong point. It has these funny extension cards that slot into the chassis, which I have found to be quite amusing.

I haven’t had the machine for long enough but it seems to work fine up to now: suspend, good battery use, not noisy (unless it’s compiling on all 16 threads), graphics, wifi, ethernet, good compilation speed. (I should give compiling LLVM a go; that’s a useful workload.) I don’t have bluetooth or the fingerprint reader working yet; I give it 25% odds that I get around to this during the lifetime of this laptop :)

Until next time, happy hacking!

family bike transportation

Good evening! Tonight I have a brief and unusual post, which is a product review of an electric cargo bike and its accessories for transporting kids. Let’s see if I can get this finished while I wait for my new laptop to finish installing.

So, I have three young kids (5yo, 3yo, 1yo), and I need to get them places. Before the 3rd was born I would use a bike trailer (Thule Chariot Lite single, bought when there was just one kid) and a bike seat (Thule RideAlong Lite, attached on seat-post). That was fine, though sometimes the thought of lugging their ever-increasing kilograms somewhere would give me pause. Then when the third kid arrived, hoo boy; I got a front-mount Thule Yepp Nexxt 2 Mini, to see if I could do three kids on one me-powered bike, but that was too tight to manage; not enough space to kick my leg over when getting on.

In the end we finally broke down and looked at electric cargo bikes. Of course I had looked at these over the years and always bounced off the price. Initially I had thought a front box-bike would be the thing, but then as kids grew up I realized they wouldn’t be comfortable there, and that a long-tail was probably better for the long term. But holy Christ, they are expensive. Happily, Decathlon came out with an electric longtail which is quite acceptable, and for about half the price of something equivalent from elsewhere.

Funny story: a friend got her bike stolen in the center of Geneva one day; thieves came and took all the bikes on a rack. I guess it was a battery-operated angle grinder; apparently that is the modus operandi these days. She moped a bit but then decided to buy the same bike again, from Decathlon as it happens. While she was at the store she entered a raffle. Then the cops called to say they found her bike – I know right?! Turns out some other bike that was stolen had an Apple AirTag on it, and its owner called the cops to tell them where the bike was, and all of the bikes were recovered. In the meantime my friend’s insurance had paid out for her stolen bike, so she had an extra bike. Then the local Decathlon called to tell her she won the raffle, for some kind of electric bike. When she went to pick it up, it was the electric longtail, for free. Anyway, we got to try hers before deciding to get one too.

One of my questions was, can you jam 3 kids on this thing? In terms of weight, yes: it will take 80 kilos on the back, and our kids total 45 kilos. In terms of space it’s OK, but really for the 1yo you need a bike seat, and even for the 3yo she should really be in a seat unless she’s very awake. The back rack has a frame around it, which does help keep kids on, but it’s not sufficient for a sleepy 3yo.

I was hoping to find a quite narrow kid bike seat so I could put on two seats for the young ones and then jam the oldest in somehow. I started with the Thule Yepp Nexxt 2 Maxi, but the release clamp kinda wants to be where the frame around the back rack is, so it’s not so efficient, and not easy to remove. Also the plastic guards that keep kids from catching their legs in the back wheel aren’t needed on this particular bike, but they do prevent me from effectively accessing the otherwise well-designed panniers (funnily enough, the French word for these is not panniers but sacoches).

So, with the Thule rear-mount seat I could get one bike seat for the 1yo and then jam in the 3yo and 5yo. It worked fine.

Then, annoyingly, thieves stole our electric longtail. Apparently insurance will pay out for us too—this is a relatively standard feature in France for the kind of insurance you have to have already for your place of residence—but for the last few weeks we have been without our longtail, and it is terrible. In the end we decided just to buy the same bike again: proof that it is good enough.

There are other electric longtails out there. If you can afford it, a pedal motor will be better than the hub motor on the Decathlon model. But if you are willing to accept less than the best, I think the Decathlon bike is quite good for what it is and I am looking forward to picking up the new bike tomorrow. It fits the kids, easily adjusts between different rider heights, and is a real joy to be on as a family. It lets me go places I wouldn’t think of going without the ability to just chuck everybody on the bike and zip away.

As far as bike seats go, I am going to try a new seat, to see if I can avoid the leg covers and to see if it’s more narrow. Ping me on the Mastodon if you want a follow-up. Thoughts welcome below for things that have worked for you. Until next time, happy cycling!

nombrilliant, actually

Today, a middle-aged note: when you are young, unless you’ve been failed by The System, you enjoy a radiant confidence: everything you say burns with rightness and righteousness, that the world Actually Is This Way, You See, and if you think about it, it Actually Should Be This Other Specific Way. This is how you get the fervent young communists and Scala enthusiasts and ecologists and Ayn Randians. The ideas are so right that you become an evangelist, a prophet, a truth-speaker; a youtuber, perhaps.

Then, with luck, you meet the world: you build, you organize, you invest, you double down. And in that doubling, the ideas waver, tremble, resonate, imperceptibly at first, reinforced in some ways, impeded in others. The world works in specific ways, too, and you don’t really know them in the beginning: not in the bones, anyway. The unknowns become known, enumerate themselves, dragons everywhere; and in the end, what can you say about them? Do you stand in a spot that can see anything at all? Report, observe, yes; analyze, maybe, eventually; prophesize, never. Not any more.

And then, years later, you are still here. The things you see, the things you know, other people don’t: they can’t. They weren’t here. They aren’t here. They hear (and retell) stories, back-stories, back-back-stories, a whole cinematic universe of narrative, and you know that it’s powerful and generative and yet unhinged, essentially unmoored and distinct from reality, right and even righteous in some ways, but wrong in others. This happens in all domains: macroeconomics, programming languages, landscape design, whatever. But you see. You see through stories, their construction and relation to the past, on a meta level, in a way that was not apparent when you were young.

I tell this story (everything is story) as an inexorable progression, a Hegelian triptych of thesis-antithesis-synthesis; a conceit. But there are structures that can get you to synthesis more efficiently. PhD programs try: they break you down to allow you to build. They do it too quickly, perhaps; you probably have to do it again in your next phase, academia or industry, though I imagine it’s easier the second time around. Some corporate hierarchies also manage to do this, in which when you become Staff Engineer, you become the prophet.

Of course, synthesis is not inexorable; you can stop turning the crank anywhere. Perhaps you never move from ideal to real. Perhaps, unmoored, you drift, painter rippling the waters. But what do you do when the crank comes around? Where to next?

Anyway, all this is to say that I have lately been backing away from bashfulness in a professional context: there are some perspectives that I see that can’t be seen or expressed by others. It feels very strange to write it, but I am even trying to avoid self-deprecation and hedging; true, I might not possess the authoritative truth on, I don’t know, WebAssembly, or Scheme language development, but nobody else does either, and I might as well just say what I think as if it’s true.

* * *

Getting old is not so bad. You say very cheesy things, you feel cheesy, but it is a kind of new youth too, reclaiming a birthday-right of being earnest. I am doubling down on Dad energy. (Yes, there is a similar kind of known-tenuous confidence necessary to raise kids. I probably would have been forced into this position earlier if I had had kids younger. But, I don’t mean to take the metaphor fa(r)ther; responsible community care for the young is by far not the sole province of the family man.)

So, for the near future, I embrace the cheese. And then, where to? I suspect excessive smarm. But if I manage to succeed in avoiding that, I look forward to writing about ignorance in another 5 years. Until then, happy hacking to all, and thank you for your forbearance!

micro macro story time

Today, a tiny tale: about 15 years ago I was working on Guile’s macro expander. Guile inherited this code from an early version of Kent Dybvig’s portable syntax expander. It was... not easy to work with.

Some difficulties were essential. Scope is tricky, after all.

Some difficulties were incidental, but deep. The expander is ultimately a function that translates Scheme-with-macros to Scheme-without-macros. However, it is itself written in Scheme-with-macros, so to load it on a substrate without macros requires a pre-expanded copy of itself, whose data representations need to be compatible with any incremental change, so that you will be able to use the new expander to produce a fresh pre-expansion. This difficulty could have been avoided by incrementally bootstrapping the library. It works once you are used to it, but it’s gnarly.

But then, some difficulties were just superfluously egregious. Dybvig is a totemic developer and researcher, but a generation or two removed from me, and when I was younger, it never occurred to me to just email him to ask why things were this way. (A tip to the reader: if someone is doing work you are interested in, you can just email them. Probably they’ll write you back! If they don’t respond, it’s not you, they’re probably just busy and their inbox leaks.) Anyway in my totally speculatory reconstruction of events, when Dybvig goes to submit his algorithm for publication, he gets annoyed that “expand” doesn’t sound fancy enough. In a way it’s similar to the original SSA developers thinking that “phony functions” wouldn’t get published.

So Dybvig calls the expansion function “χ”, because the Greek chi looks like the X in “expand”. Fine for the paper, whatever paper that might be, but then in psyntax, there are all these functions named chi and chi-lambda and all sorts of nonsense.

In early years I was often confused by these names; I wasn’t in on the pun, and I didn’t feel like I had enough responsibility for this code to think what the name should be. I finally broke down and changed all instances of “chi” to “expand” back in 2011, and never looked back.

Anyway, this is a story with a very specific moral: don’t name your functions chi.

missing the point of webassembly

I find most descriptions of WebAssembly to be uninspiring: if you start with a phrase like “assembly-like language” or a “virtual machine”, we have already lost the plot. That’s not to say that these descriptions are incorrect, but it’s like explaining what a dog is by starting with its circulatory system. You’re not wrong, but you should probably lead with the bark.

I have a different preferred starting point which is less descriptive but more operational: WebAssembly is a new fundamental abstraction boundary. WebAssembly is a new way of dividing computing systems into pieces and of composing systems from parts.

This all may sound high-falutin’, but it’s for real: this is the actually interesting thing about Wasm.

fundamental & abstract

It’s probably easiest to explain what I mean by example. Consider the Linux ABI: Linux doesn’t care what code it’s running; Linux just handles system calls and schedules process time. Programs that run against the x86-64 Linux ABI don’t care whether they are in a container or a virtual machine or “bare metal” or whether the processor is AMD or Intel or even a Mac M3 with Docker and Rosetta 2. The Linux ABI interface is fundamental in the sense that either side can implement any logic, subject to the restrictions of the interface, and abstract in the sense that the universe of possible behaviors has been simplified to a limited language, in this case that of system calls.

Or take HTTP: when you visit wingolog.org, you don’t have to know (but surely would be delighted to learn) that it’s Scheme code that handles the request. I don’t have to care if the other side of the line is curl or Firefox or Wolvic. HTTP is such a successful fundamental abstraction boundary that at this point it is the default for network endpoints; whether you are a database or a golang microservice, if you don’t know that you need a custom protocol, you use HTTP.

Or, to rotate our metaphorical compound microscope to high-power magnification, consider the SYS-V amd64 C ABI: almost every programming language supports some form of extern C {} to access external libraries, and the best language implementations can produce artifacts that implement the C ABI as well. The standard C ABI splits programs into parts, and allows works from separate teams to be composed into a whole. Indeed, one litmus test of a fundamental abstraction boundary is, could I reasonably define an interface and have an implementation of it be in Scheme or OCaml or what-not: if the answer is yes, we are in business.

It is in this sense that WebAssembly is a new fundamental abstraction boundary.

WebAssembly shares many of the concrete characteristics of other abstractions. Like the Linux syscall interface, WebAssembly defines an interface language in which programs rely on host capabilities to access system features. Like the C ABI, calling into WebAssembly code has a predictable low cost. Like HTTP, you can arrange for WebAssembly code to have no shared state with its host, by construction.

But WebAssembly is a new point in this space. Unlike the Linux ABI, there is no fixed set of syscalls: WebAssembly imports are named, typed, and without pre-defined meaning, more like the C ABI. Unlike the C ABI, WebAssembly modules have only the shared state that they are given; neither side has a license to access all of the memory in the “process”. And unlike HTTP, WebAssembly modules are “in the room” with their hosts: close enough that hosts can allow themselves the luxury of synchronous function calls, and to allow WebAssembly modules to synchronously call back into their hosts.

applied teleology

At this point, you are probably nodding along, but also asking yourself, what is it for? If you arrive at this question from the “WebAssembly is a virtual machine” perspective, I don’t think you’re well-equipped to answer. But starting as we did by the interface, I think we are better positioned to appreciate how WebAssembly fits into the computing landscape: the narrative is generative, in that you can explore potential niches by identifying existing abstraction boundaries.

Again, let’s take a few examples. Say you ship some “smart cities” IoT device, consisting of a microcontroller that runs some non-Linux operating system. The system doesn’t have an MMU, so you don’t have hardware memory protections, but you would like to be able to enforce some invariants on the software that this device runs; and you would also like to be able to update that software over the air. WebAssembly is getting used in these environments; I wish I had a list of deployments at hand, but perhaps we can at least take this article last year from a WebAssembly IoT vendor as proof of commercial interest.

Or, say you run a function-as-a-service cloud, meaning that you run customer code in response to individual API requests. You need to limit the allowable set of behaviors from the guest code, so you choose some abstraction boundary. You could use virtual machines, but that would be quite expensive in terms of memory. You could use containers, but you would like more control over the guest code. You could have these functions written in JavaScript, but that means that your abstraction is no longer fundamental; you limit your applicability. WebAssembly fills an interesting niche here, and there are a number of products in this space, for example Fastly Compute or Fermyon Spin.

Or to go smaller, consider extensible software, like the GIMP image editor or VS Code: in the past you would use loadable plug-in modules via the C ABI, which can be quite gnarly, or you lean into a particular scripting language, which can be slow, inexpressive, and limit the set of developers that can write extensions. It’s not a silver bullet, but WebAssembly can have a role here. For example, the Harfbuzz text shaping library supports fonts with an embedded (em-behdad?) WebAssembly extension to control how strings of characters are mapped to positioned glyphs.

aside: what boundaries do

They say that good fences make good neighbors, and though I am not quite sure it is true—since my neighbor put up a fence a few months ago, our kids don’t play together any more—boundaries certainly facilitate separation of functionality. Conway’s law is sometimes applied as a descriptive observation—ha-ha, isn’t that funny, they just shipped their org chart—but this again misses the point, in that boundaries facilitate separation, but also composition: if I know that I can fearlessly allow a font to run code because I have an appropriate abstraction boundary between host application and extension, I have gained in power. I no longer need to be responsible for every part of the product, and my software can scale up to solve harder problems by composing work from multiple teams.

There is little point in using WebAssembly if you control both sides of a boundary, just as (unless you have chickens) there is little point in putting up a fence that runs through the middle of your garden. But where you want to compose work from separate teams, the boundaries imposed by WebAssembly can be a useful tool.

narrative generation

WebAssembly is enjoying a tail-wind of hype, so I think it’s fair to say that wherever you find a fundamental abstraction boundary, someone is going to try to implement it with WebAssembly.

Again, some examples: back in 2022 I speculated that someone would “compile” Docker containers to WebAssembly modules, and now that is a thing.

I think at some point someone will attempt to replace eBPF with Wasm in the Linux kernel; eBPF is just not as good a language as Wasm, and the toolchains that produce it are worse. eBPF has clunky calling-conventions about what registers are saved and spilled at call sites, a decision that can be made more efficiently for the program and architecture at hand when register-allocating WebAssembly locals. (Sometimes people lean on the provably-terminating aspect of eBPF as its virtue, but that could apply just as well to Wasm if you prohibit the loop opcode (and the tail-call instructions) at verification-time.) And why don’t people write whole device drivers in eBPF? Or rather, targetting eBPF from C or what-have-you. It’s because eBPF is just not good enough. WebAssembly is, though! Anyway I think Linux people are too chauvinistic to pick this idea up but I bet Microsoft could do it.

I was thinking today, you know, it actually makes sense to run a WebAssembly operating system, one which runs WebAssembly binaries. If the operating system includes the Wasm run-time, it can interpose itself at syscall boundaries, sometimes allowing it to avoid context switches. You could start with something like the Linux ABI, perhaps via WALI, but for a subset of guest processes that conform to particular conventions, you could build purpose-built composition that can allocate multiple WebAssembly modules to a single process, eliding inter-process context switches and data copies for streaming computations. Or, focussing on more restricted use-cases, you could make a microkernel; googling around I found this article from a couple days ago where someone is giving this a go.

wwwhat about the wwweb

But let’s go back to the web, where you are reading this. In one sense, WebAssembly is a massive web success, being deployed to literally billions of user agents. In another, it is marginal: people do not write web front-ends in WebAssembly. Partly this is because the kind of abstraction supported by linear-memory WebAssembly 1.0 isn’t a good match for the garbage-collected DOM API exposed by web browsers. As a corollary, languages that are most happy targetting this linear-memory model (C, Rust, and so on) aren’t good for writing DOM applications either. WebAssembly is used in auxiliary modules where you want to run legacy C++ code on user devices, or to speed up a hot leaf function, but isn’t a huge success.

This will change with the recent addition of managed data types to WebAssembly, but not necessarily in the way that you might think. Like, now that it will be cheaper and more natural to pass data back and forth with JavaScript, are we likely to see Wasm/GC progressively occupying more space in web applications? For me, I doubt that progressive is the word. In the same way that you wouldn’t run a fence through the middle of your front lawn, you wouldn’t want to divide your front-end team into JavaScript and WebAssembly sub-teams. Instead I think that we will see more phase transitions, in which whole web applications switch from JavaScript to Wasm/GC, compiled from Dart or Elm or what have you. The natural fundamental abstraction boundary in a web browser is between the user agent and the site’s code, not within the site’s code itself.

conclusion

So, friends, if you are talking to a compiler engineer, by all means: keep describing WebAssembly as a virtual machine. It will keep them interested. But for everyone else, the value of WebAssembly is what it does, which is to be a different way of breaking a system into pieces. Armed with this observation, we can look at current WebAssembly uses to understand the nature of those boundaries, and at new boundaries to see if WebAssembly can have a niche there. Happy hacking, and may your components always compose!

scheme modules vs whole-program compilation: fight

In a recent dispatch, I explained the whole-program compilation strategy used in Whiffle and Hoot. Today’s note explores what a correct solution might look like.

being explicit

Consider a module that exports an increment-this-integer procedure. We’ll use syntax from the R6RS standard:

(library (inc)
  (export inc)
  (import (rnrs))
  (define (inc n) (+ n 1)))

If we then have a program:

(import (rnrs) (inc))
(inc 42)

Then the meaning of this program is clear: it reduces to (+ 42 1), then to 43. Fine enough. But how do we get there? How does the compiler compose the program with the modules that it uses (transitively), to produce a single output?

In Whiffle (and Hoot), the answer is, sloppily. There is a standard prelude that initially has a number of bindings from the host compiler, Guile. One of these is +, exposed under the name %+, where the % in this case is just a warning to the reader that this is a weird primitive binding. Using this primitive, the prelude defines a wrapper:

...
(define (+ x y) (%+ x y))
...

At compilation-time, Guile’s compiler recognizes %+ as special, and therefore compiles the body of + as consisting of a primitive call (primcall), in this case to the addition primitive. The Whiffle (and Hoot, and native Guile) back-ends then avoid referencing an imported binding when compiling %+, and instead produce backend-specific code: %+ disappears. Most uses of the + wrapper get inlined so %+ ends up generating code all over the program.

The prelude is lexically splatted into the compilation unit via a pre-expansion phase, so you end up with something like:

(let () ; establish lexical binding contour
  ...
  (define (+ x y) (%+ x y))
  ...
  (let () ; new nested contour
    (define (inc n) (+ n 1))
    (inc 42)))

This program will probably optimize (via partial evaluation) to just 43. (What about let and define? Well. Perhaps we’ll get to that.)

But, again here I have taken a short-cut, which is about modules. Hoot and Whiffle don’t really do modules, yet anyway. I keep telling Spritely colleagues that it’s complicated, and rightfully they keep asking why, so this article gets into it.

is it really a big letrec?

Firstly you have to ask, what is the compilation unit anyway? I mean, given a set of modules A, B, C and so on, you could choose to compile them separately, relying on the dynamic linker to compose them at run-time, or all together, letting the compiler gnaw on them all at once. Or, just A and B, and so on. One good-enough answer to this problem is library-group form, which explicitly defines a set of topologically-sorted modules that should be compiled together. In our case, to treat the (inc) module together with our example program as one compilation unit, we would have:

(library-group
  ;; start with sequence of libraries
  ;; to include in compilation unit...
  (library (inc) ...)

  ;; then the tail is the program that
  ;; might use the libraries
  (import (rnrs) (inc))
  (inc 42))

In this example, the (rnrs) base library is not part of the compilation unit. Presumably it will be linked in, either as a build step or dynamically at run-time. For Hoot we would want the whole prelude to be included, because we don’t want any run-time dependencies. Anyway hopefully this would expand out to something like the set of nested define forms inside nested let lexical contours.

And that was my instinct: somehow we are going to smash all these modules together into a big nested letrec, and the compiler will go to town. And this would work, for a “normal” programming language.

But with Scheme, there is a problem: macros. Scheme is a “programmable programming language” that allows users to extend its syntax as well as its semantics. R6RS defines a procedural syntax transformer (“macro”) facility, in which the user can define functions that run on code at compile-time (specifically, during syntax expansion). Scheme macros manage to compose lexical scope from the macro definition with the scope at the macro instantiation site, by annotating these expressions with source location and scope information, and making syntax transformers mostly preserve those annotations.

“Macros are great!”, you say: well yes, of course. But they are a problem too. Consider this incomplete library:

(library (ctinc)
  (import (rnrs) (inc))
  (export ctinc)
  (define-syntax ctinc
    (lambda (stx)
      ...))) ;; ***

The idea is to define a version of inc, but at compile-time: a (ctinc 42) form should expand directly to 43, not a call to inc (or even +, or %+). We define syntax transformers with define-syntax instead of define. The right-hand-side of the definition ((lambda (stx) ...)) should be a procedure of one argument, which returns one value: so far so good. Or is it? How do we actually evaluate what (lambda (stx) ...) means? What should we fill in for ...? When evaluating the transformer value, what definitions are in scope? What does lambda even mean in this context?

Well... here we butt up against the phasing wars of the mid-2000s. R6RS defines a whole system to explicitly declare what bindings are available when, then carves out a huge exception to allow for so-called implicit phasing, in which the compiler figures it out on its own. In this example we imported (rnrs) for the default phase, and this is the module that defines lambda (and indeed define and define-syntax). The standard defines that (rnrs) makes its bindings available both at run-time and expansion-time (compilation-time), so lambda means what we expect that it does. Whew! Let’s just assume implicit phasing, going forward.
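
For reference, the explicit flavor spells out the phase of each import; a sketch, using R6RS import levels, that we won’t actually need given implicit phasing:

(import (for (rnrs) run expand) ; needed both at run-time and at expand-time
        (for (inc) expand))     ; needed only while expanding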

The operand to the syntax transformer is a syntax object: an expression annotated with source and scope information. To pick it apart, R6RS defines a pattern-matching helper, syntax-case. In our case ctinc is unary, so we can begin to flesh out the syntax transformer:

(library (ctinc)
  (import (rnrs) (inc))
  (export ctinc)
  (define-syntax ctinc
    (lambda (stx)
      (syntax-case stx ()
        ((ctinc n)
         (inc n)))))) ;; ***

But here there’s a detail, which is that when syntax-case destructures stx to its parts, those parts themselves are syntax objects which carry the scope and source location annotations. To strip those annotations, we call the syntax->datum procedure, exported by (rnrs).

(library (ctinc)
  (import (rnrs) (inc))
  (export ctinc)
  (define-syntax ctinc
    (lambda (stx)
      (syntax-case stx ()
        ((ctinc n)
         (inc (syntax->datum #'n)))))))

And with this, voilà our program:

(library-group
  (library (inc) ...)
  (library (ctinc) ...)
  (import (rnrs) (ctinc))
  (ctinc 42))

This program should pre-expand to something like:

(let ()
  (define (inc n) (+ n 1))
  (let ()
    (define-syntax ctinc
      (lambda (stx)
        (syntax-case stx ()
          ((ctinc n)
           (inc (syntax->datum #'n))))))
    (ctinc 42)))

And then expansion should transform (ctinc 42) to 43. However, our naïve pre-expansion is not good enough for this to be possible. If you ran this in Guile you would get an error:

Syntax error:
unknown file:8:12: reference to identifier outside its scope in form inc

Which is to say, inc is not available as a value within the definition of ctinc. ctinc could residualize an expression that refers to inc, but it can’t use it to produce the output.
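
To see the difference, here is a sketch of a transformer that only residualizes: it expands without error, but it produces a call to inc in the output instead of computing 43 at expand-time, so it isn’t the compile-time inc we were after.

(define-syntax ctinc
  (lambda (stx)
    (syntax-case stx ()
      ((ctinc n)
       ;; #'(inc n) builds a syntax object; the expansion of (ctinc 42)
       ;; is then the expression (inc 42), evaluated at run-time
       #'(inc n)))))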

modules are not expressible with local lexical binding

This brings us to the heart of the issue: with procedural macros, modules impose a phasing discipline on the expansion process. Definitions from any given module must be available both at expand-time and at run-time. In our example, ctinc needs inc at expand-time, an early phase of compilation that has nothing to do with any later partial evaluation by the optimizer. We can’t make inc available at expand-time just using let / letrec bindings.
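
Top-level code can cheat its way around this, at least in Guile: eval-when asks for a form to be evaluated at expansion time as well as compiled, which is roughly the arrangement a module system has to make for every binding a macro might need. A sketch, noting that there is no analogous trick for a variable bound by a local let or letrec:

;; Guile-specific sketch: evaluate this definition at expansion time too,
;; so that syntax transformers later in this compilation unit can call it.
(eval-when (expand load eval)
  (define (inc n) (+ n 1)))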

This is an annoying result! What do other languages do? Well, mostly they aren’t programmable, in the sense that they don’t have macros. There are some ways to get programmability using e.g. eval in JavaScript, but these systems are not very amenable to “offline” analysis of the kind needed by an ahead-of-time compiler.

For those declarative languages with macros, Scheme included, I understand the state of the art is to expand module-by-module and then stitch together the results of expansion later, using a kind of link-time optimization. You visit a module’s definitions twice: once to evaluate them while expanding, resulting in live definitions that can be used by further syntax expanders, and once to residualize an abstract syntax tree, which will eventually be spliced into the compilation unit.
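
In pipeline terms it might look something like the following sketch, where all the helper names (expand-library, evaluate-for-expansion!, residualize!, and so on) are hypothetical, just to show the shape of the two visits:

;; hypothetical sketch of expand-then-link, one library at a time
(for-each
 (lambda (lib)
   (let ((defs (expand-library lib)))
     ;; visit 1: evaluate the expanded definitions now, so that syntax
     ;; transformers in later libraries can call them at expand-time
     (evaluate-for-expansion! defs)
     ;; visit 2: keep the expanded definitions as an AST, to be stitched
     ;; into the whole program later, link-time-optimization style
     (residualize! defs)))
 (topologically-sorted libraries))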

Note that in general the expansion-time and the residual definitions don’t need to be the same, and indeed during cross-compilation they are often different. If you are compiling with Guile as host and Hoot as target, you might implement cons one way in Guile and another way in Hoot, choosing between them with cond-expand.
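
As a sketch of what that could look like (the feature names and the %cons primitive are illustrative, not necessarily the real spellings):

(cond-expand
  (hoot
   ;; target-side definition, in terms of some low-level primitive
   (define (cons x y) (%cons x y)))
  (guile
   ;; host-side definition, just re-using the host Guile's own cons
   (define cons (@ (guile) cons))))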

lexical scope regained?

What is to be done? Glad you asked, Vladimir. But, I don’t really know. The compiler wants a big blob of letrec, but the expander wants a pearl-string of modules. Perhaps we try to satisfy them both? The library-group paper suggests that modules should be expanded one by one, then stitched into a letrec by AST transformations. It’s not that lexical scope is incompatible with modules and whole-program compilation; the problems arise when you add in macros. So by expanding first, in units of modules, we reduce high-level Scheme to a lower-level language without syntax transformers, but still on the level of letrec.
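
For our running example, a sketch of the hoped-for output of that process: the (ctinc) library contributes only expand-time code, so once expansion has run, nothing of it remains, and what gets stitched together is just the run-time residue:

(letrec ((inc (lambda (n) (+ n 1))))
  ;; (ctinc 42) was expanded away at compile-time, leaving the constant
  43)

From there the optimizer can drop the unused inc, and the whole program once again reduces to 43.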

I was unreasonably pleased by the effectiveness of the “just splat in a prelude” approach, and I will miss it. I even pled for a kind of stop-gap fat-fingered solution to sloppily parse module forms and keep on splatting things together, but colleagues helpfully talked me away from the edge. So good-bye, sloppy: I repent my ways and will make amends, with 40 hail-maries and an alpha renaming thrice daily and more often if in moral distress. Further bulletins as events warrant. Until then, happy scheming!