growing a bootie

Following on last week’s egregious discussion of the Hoot Scheme-to-WebAssembly compiler bootie, today I would like to examine another axis of boot, which is a kind of rebased branch of history: not the hack as it happened, but the logic inside the hack, the structure of the built thing, the history as it might have been. Instead of describing the layers of shims and props that we used while discovering what were building, let’s look at how we would build Hoot again, if we had to.

I think many readers of this blog will have seen Growing a Language, a talk / performance art piece in which Guy L. Steele—I once mentioned to him that Guy L. was one of the back-justifications for the name Guile; he did not take it well—in which Steele takes the set of monosyllabic words as primitives and builds up a tower of terms on top, bootstrapping a language as he goes. I just watched it again and I think it holds up, probably well enough to forgive the superfluous presence of the gender binary in the intro; ideas were different in the 1900s.

It is in the sense of that talk that I would like to look at growing a Hoot: how Hoot defines nouns and verbs in terms of smaller, more primitive terms: terms in terms of terms.

If you are reading this on the web, you should see above a graph of dependencies among the 50 or so libraries that are shipped as part of Hoot. (Somehow I doubt that a feed reader will plumb through the inline SVG, but who knows.) It’s a bit of a mess, but still I think it’s a useful illustration of a number of properties of how the Hoot language is grown from small to large. Click on any box to visit the source code for that module.

the root of the boot

Firstly, let us note that the graph is not a forest: it is a single tree. There is no module that does not depend (possibly indirectly) on (hoot primitives). This is because there are no capabilities that Hoot libraries can access without importing them, and the only way into the Hootosphere from outside is via the definitions in the primitives module.

So what are these definitions, you might ask? Well, these are the “well-known” bindings, for example + for which the compiler might have some special understanding, the sort of binding that gets translated to a primitive operation at the compiler IR level. They are used in careful ways by the modules that use (hoot primitives) to ensure that their uses are all open-coded by the compiler. (“Open coding” is inlining. But inlining to me implies that the whole implementation is inlined, with no slow-path callouts, whereas open coding implies to me that it’s the compiler that knows what the op does and may or may not inline the actual asm.)

But, (hoot primitives) also exposes some other definitions, for example define and let and lambda and all that. Scheme doesn’t have keywords in the sense that Python has def and with and such: there is no privileged way to associate a name with its meaning. It is in this sense that it is impossible to avoid (hoot primitives): the most simple (define x 42) depends on the lexical meaning of define, which is provided by the primitives module.

Syntax definitions are an expander construct; they are not present at run-time. Using a syntax definition causes the expander to invoke code, and the expander runs on the host system, which is Guile and not WebAssembly. So, syntax definitions belong to the host. This goes also for some first-order definitions such as syntax->datum and so on, which are only used in syntax expanders; these definitions are plumbed through (hoot primitives), but can only ever be used by macro definitions, which run on the meta-level.

(Is this too heavy? Allow me to lighten the mood: when I was 22 or so and working in Namibia, I somehow got an advance copy of Notes from the Metalevel. I was working on algorithmic music synthesis, and my chief strategy was knocking hubris together with itself, as one does. I sent the author a bunch of uninvited corrections to his book. I think it was completely unwelcome! Anyway, moral of the story, at 22 you get a free pass to do whatever you want, and come to think of it, now that I am 44 I think I should get some kind of hubris loyalty award or something.)

powerful primitives

So, there are expand-time primitives and run-time primitives. The expander knows about expand-time primitives and the compiler knows about run-time primitives. One particularly powerful primitive is %inline-wasm, which takes an inline snippet of WebAssembly as an s-expression and applies it to a number of arguments passed at run-time. Consider make-bytevector:

(define* (make-bytevector len #:optional (init 0))
  (%inline-wasm
   '(func (param $len i32) (param $init i32)
      (result (ref eq))
      (struct.new
       $mutable-bytevector
       (i32.const 0)
       (array.new $raw-bytevector
                  (local.get $init)
                  (local.get $len))))
   len init))

We have an inline snippet of wasm that makes a $mutable-bytevector. It passes 0 as the hash field, meaning that the hashq of this value will be lazily initialized, and the contents are a new array of a given size and initial value. Inputs will be unboxed to the appropriate type (two i32s in this case), and likewise with outputs; here we produce the universal (ref eq) representation.

The nice thing about %inline-wasm is that the compiler didn’t have to be taught about make-bytevector: this definition suffices, because %inline-wasm can access a number of lower-level capabilities.

dual denotations

But as we learned in my notes on whole-program compilation, any run-time definition is available at compile-time, if it is reachable from a syntax transformer. So this definition above isn’t quite sufficient; we can’t call make-bytevector as part of a procedural macro, which we might want to do. What we need instead is to provide one definition when residualizing wasm at run-time, and another when loading a module at expand-time.

In Hoot we do this with cond-expand, where we expand to %inline-wasm when targetting Hoot, and... what, precisely, at expand-time? Really we need to make a Guile bytevector, so in this sort of case, we end up having to include a run-time make-bytevector definition in the (hoot primitives) module. This happens whereever we end up using %inline-wasm.

building to guile

Returning to our graph, we see that there is a red-colored block for Hoot modules, a teal-colored layer on top for those modules that are defined by R7RS, a few oddballs, and then (guile) and Fibers built on top. The (guile) module provides a shim that implements Guile’s own default set of bindings, allowing Guile modules to be loaded on a Hoot system. (guile) is layered on top of the low-level Hoot libraries, and out of convenience, on top of the various R7RS libraries as well, because it was easiest to remember what was where in R7RS than our ad-hoc nest of Hoot internal libraries.

Having (guile) lets Guile hackers build on Hoot. It’s still incomplete but I think eventually it will be capital-G Good. Even for a library that needed more porting like Fibers (Hoot has no threads so much of the parallel concurrent ML implementation can be simplified, and we use an event loop from the Wasm run-time instead of an epoll-based scheduler), it was still pleasant to be able to use define-module and keyword arguments and all of that.

next layers

I mentioned that this tower of terms is incomplete, and so that is one of the next work items for Hoot: complete support for Guile’s run-time library. At that point we’d probably want to merge it into Guile, but that is another topic.

But let’s leave that for another day; until then, happy hacking!

3 responses

Alaric Snell-Pym says:
22 May 2024 9:24 AM
Just FYI, the inline SVG with clickable boxes came out just fine in the Thunderbird feed reader!
Alex says:
22 May 2024 2:08 PM
Once support for the complete guile runtime has been grown, will support for compiling other guile languages to wasm come for free? like lokke and python-on-guile etc.
Amir says:
22 May 2024 2:30 PM
Out of interest - one alternative to implementing all primitives in both scheme and wasm would be to implement them only in wasm, and do the expansion phase in a wasm interpreter.
This requires defining syntax objects and transformations in wasm, was that the reason not to do it? Or performance? Or something else?

Comments are closed.