Following on last week’s
egregious discussion of the Hoot Scheme-to-WebAssembly compiler bootie,
today I would like to examine another axis of boot, which is a kind of
rebased branch of history: not the hack as it happened, but the logic inside the
hack, the structure of the built thing, the history as it might have been. Instead of describing the layers of shims and props that we used while discovering what were building, let’s look at how we would build Hoot again, if we had to.
I think many readers of this blog will have seen Growing a
Language, a talk /
performance art piece in which Guy L. Steele—I once mentioned to him
that Guy L. was one of the back-justifications for the name Guile; he
did not take it well—in which Steele takes the set of monosyllabic words as primitives
and builds up a tower of terms on top, bootstrapping a language as he
goes. I just watched it again and I think it holds up, probably well
enough to forgive the superfluous presence of the gender binary in the
intro; ideas were different in the 1900s.
It is in the sense of that talk that I would like to look at growing a
Hoot: how Hoot defines nouns and verbs in terms of smaller, more
primitive terms: terms in terms of terms.
If you are reading this on the web, you should see above a graph of
dependencies among the 50 or so libraries that are shipped as part of
Hoot. (Somehow I doubt that a feed reader will plumb through the inline
SVG, but who knows.) It’s a bit of a mess, but still I think it’s a
useful illustration of a number of properties of how the Hoot language
is grown from small to large. Click on any box to visit the source code for that module.
the root of the boot
Firstly, let us note that the graph is not a forest: it is a single
tree. There is no module that does not depend (possibly indirectly) on
(hoot primitives). This is because there are no capabilities that
Hoot libraries can access without importing them, and the only way into
the Hootosphere from outside is via the definitions in the primitives
module.
So what are these definitions, you might ask? Well, these are the
“well-known” bindings, for example + for which the compiler might have
some special understanding, the sort of binding that gets translated to
a primitive operation at the compiler IR level. They are used in
careful ways by the modules that use (hoot primitives) to ensure that
their uses are all open-coded by the compiler. (“Open coding” is
inlining. But inlining to me implies that the whole implementation is
inlined, with no slow-path callouts, whereas open coding implies to me
that it’s the compiler that knows what the op does and may or may not
inline the actual asm.)
But, (hoot primitives) also exposes some other definitions, for
example define and let and lambda and all that. Scheme doesn’t
have keywords in the sense that Python has def and with and such:
there is no privileged way to associate a name with its meaning. It is
in this sense that it is impossible to avoid (hoot primitives): the
most simple (define x 42) depends on the lexical meaning of define,
which is provided by the primitives module.
Syntax definitions are an expander construct; they are not present at
run-time. Using a syntax definition causes the expander to invoke code,
and the expander runs on the host system, which is Guile and not
WebAssembly. So, syntax definitions belong to the host. This goes also
for some first-order definitions such as syntax->datum and so on,
which are only used in syntax expanders; these definitions are plumbed
through (hoot primitives), but can only ever be used by macro
definitions, which run on the meta-level.
(Is this too heavy? Allow me to lighten the mood: when I was 22 or so
and working in Namibia, I somehow got an advance copy of Notes from the
Metalevel. I was working on
algorithmic music synthesis, and my chief strategy was knocking hubris
together with itself, as one does. I sent the author a bunch of
uninvited corrections to his book. I think it was completely unwelcome!
Anyway, moral of the story, at 22 you get a free pass to do whatever you
want, and come to think of it, now that I am 44 I think I should get
some kind of hubris loyalty award or something.)
powerful primitives
So, there are expand-time primitives and run-time primitives. The
expander knows about expand-time primitives and the compiler knows about
run-time primitives. One particularly powerful primitive is
%inline-wasm, which takes an inline snippet of WebAssembly as an
s-expression and applies it to a number of arguments passed at run-time.
Consider
make-bytevector:
We have an inline snippet of wasm that makes a $mutable-bytevector.
It passes 0 as the hash field, meaning that the hashq of this value
will be lazily initialized, and the contents are a new array of a given
size and initial value. Inputs will be unboxed to the appropriate type
(two i32s in this case), and likewise with outputs; here we produce
the universal (ref eq) representation.
The nice thing about %inline-wasm is that the compiler didn’t have to
be taught about make-bytevector: this definition suffices, because
%inline-wasm can access a number of lower-level capabilities.
dual denotations
But as we learned in my notes on whole-program
compilation,
any run-time definition is available at compile-time, if it is reachable
from a syntax transformer. So this definition above isn’t quite
sufficient; we can’t call make-bytevector as part of a procedural
macro, which we might want to do. What we need instead is to provide
one definition when residualizing wasm at run-time, and another when
loading a module at expand-time.
In Hoot we do this with cond-expand, where we expand to %inline-wasm
when targetting Hoot, and... what, precisely, at expand-time? Really we
need to make a Guile bytevector, so in this sort of case, we end up
having to include a run-time make-bytevector definition in the (hoot primitives) module. This happens whereever we end up using
%inline-wasm.
building to guile
Returning to our graph, we see that there is a red-colored block for
Hoot modules, a teal-colored layer on top for those modules that are
defined by R7RS, a few oddballs, and then (guile) and Fibers built on
top. The
(guile) module provides a shim that implements Guile’s own default set
of bindings, allowing Guile modules to be loaded on a Hoot system.
(guile) is layered on top of the low-level Hoot libraries, and out of
convenience, on top of the various R7RS libraries as well, because it
was easiest to remember what was where in R7RS than our ad-hoc nest of
Hoot internal libraries.
Having (guile) lets Guile hackers build on Hoot. It’s still
incomplete but I think eventually it will be capital-G Good. Even for a
library that needed more porting like Fibers (Hoot has no threads so
much of the parallel concurrent ML implementation can be simplified, and
we use an event loop from the Wasm run-time instead of an epoll-based
scheduler), it was still pleasant to be able to use define-module and
keyword arguments and all of that.
next layers
I mentioned that this tower of terms is incomplete, and so that is one
of the next work items for Hoot: complete support for Guile’s run-time
library. At that point we’d probably want to merge it into Guile, but
that is another topic.
But let’s leave that for another day; until then, happy hacking!
Once support for the complete guile runtime has been grown, will support for compiling other guile languages to wasm come for free? like lokke and python-on-guile etc.
Amir says:
Out of interest - one alternative to implementing all primitives in both scheme and wasm would be to implement them only in wasm, and do the expansion phase in a wasm interpreter.
This requires defining syntax objects and transformations in wasm, was that the reason not to do it? Or performance? Or something else?
Just FYI, the inline SVG with clickable boxes came out just fine in the Thunderbird feed reader!
Once support for the complete guile runtime has been grown, will support for compiling other guile languages to wasm come for free? like lokke and python-on-guile etc.
Out of interest - one alternative to implementing all primitives in both scheme and wasm would be to implement them only in wasm, and do the expansion phase in a wasm interpreter.
This requires defining syntax objects and transformations in wasm, was that the reason not to do it? Or performance? Or something else?