wastrel milestone: full hoot support, with generational gc as a treat

Hear ye, hear ye: Wastrel and Hoot means REPL!

Which is to say, Wastrel can now make native binaries out of WebAssembly files as produced by the Hoot Scheme toolchain, up to and including a full read-eval-print loop. Like the REPL on the Hoot web page, but instead of requiring a browser, you can just run it on your console. Amazing stuff!

try it at home

First, we need the latest Hoot. Build it from source, then compile a simple REPL:

echo '(import (hoot repl)) (spawn-repl)' > repl.scm
./pre-inst-env hoot compile -fruntime-modules -o repl.wasm repl.scm

This takes about a minute. The resulting wasm file has a pretty full standard library including a full macro expander and evaluator.

Normally Hoot would do some aggressive tree-shaking to discard any definitions not used by the program, but with a REPL we don’t know what we might need. So, we pass -fruntime-modules to instruct Hoot to record all modules and their bindings in a central registry, so they can be looked up at run-time. This results in a 6.6 MB Wasm file; with tree-shaking we would have been at 1.2 MB.

Next, build Wastrel from source, and compile our new repl.wasm:

wastrel compile -o repl repl.wasm

This takes about 5 minutes on my machine: about 3 minutes to generate all the C, about 6.6MLOC all in all, split into a couple hundred files of about 30KLOC each, and then 2 minutes to compile with GCC and link-time optimization (parallelised over 32 cores in my case). I have some ideas to golf the first part down a bit, but the the GCC side will resist improvements.

Finally, the moment of truth:

$ ./repl
Hoot 0.8.0

Enter `,help' for help.
(hoot user)> "hello, world!"
=> "hello, world!"
(hoot user)>

statics

When I first got the REPL working last week, I gasped out loud: it’s alive, it’s alive!!! Now that some days have passed, I am finally able to look a bit more dispassionately at where we’re at.

Firstly, let’s look at the compiled binary itself. By default, Wastrel passes the -g flag to GCC, which results in binaries with embedded debug information. Which is to say, my ./repl is chonky: 180 MB!! Stripped, it’s “just” 33 MB. 92% of that is in the .text (code) section. I would like a smaller binary, but it’s what we got for now: each byte in the Wasm file corresponds to around 5 bytes in the x86-64 instruction stream.

As for dependencies, this is a pretty minimal binary, though dynamically linked to libc:

linux-vdso.so.1 (0x00007f6c19fb0000)
libm.so.6 => /gnu/store/…-glibc-2.41/lib/libm.so.6 (0x00007f6c19eba000)
libgcc_s.so.1 => /gnu/store/…-gcc-15.2.0-lib/lib/libgcc_s.so.1 (0x00007f6c19e8d000)
libc.so.6 => /gnu/store/…-glibc-2.41/lib/libc.so.6 (0x00007f6c19c9f000)
/gnu/store/…-glibc-2.41/lib/ld-linux-x86-64.so.2 (0x00007f6c19fb2000)

Our compiled ./repl includes a garbage collector from Whippet, about which, more in a minute. For now, we just note that our use of Whippet introduces no run-time dependencies.

dynamics

Just running the REPL with WASTREL_PRINT_STATS=1 in the environment, it seems that the REPL has a peak live data size of 4MB or so, but for some reason uses 15 MB total. It takes about 17 ms to start up and then exit.

These numbers I give are consistent over a choice of particular garbage collector implementations: the default --gc=stack-conservative-parallel-generational-mmc, or the non-generational stack-conservative-parallel-mmc, or the Boehm-Demers-Weiser bdw. Benchmarking collectors is a bit gnarly because the dynamic heap growth heuristics aren’t the same between the various collectors; by default, the heap grows to 15 MB or so with all collectors, but whether it chooses to collect or expand the heap in response to allocation affects startup timing. I get the above startup numbers by setting GC_OPTIONS=heap-size=15m,heap-size-policy=fixed in the environment.

Hoot implements Guile Scheme, so we can also benchmark Hoot against Guile. Given the following test program that sums the leaf values for ten thousand quad trees of height 5:

(define (quads depth)
  (if (zero? depth)
      1
      (vector (quads (- depth 1))
              (quads (- depth 1))
              (quads (- depth 1))
              (quads (- depth 1)))))
(define (sum-quad q)
  (if (vector? q)
      (+ (sum-quad (vector-ref q 0))
         (sum-quad (vector-ref q 1))
         (sum-quad (vector-ref q 2))
         (sum-quad (vector-ref q 3)))
      q))

(define (sum-of-sums n depth)
  (let lp ((n n) (sum 0))
    (if (zero? n)
        sum
        (lp (- n 1)
            (+ sum (sum-quad (quads depth)))))))


(sum-of-sums #e1e4 5)

We can cat it to our repl to see how we do:

Hoot 0.8.0

Enter `,help' for help.
(hoot user)> => 10240000
(hoot user)>
Completed 3 major collections (281 minor).
4445.267 ms total time (84.214 stopped); 4556.235 ms CPU time (189.188 stopped).
0.256 ms median pause time, 0.272 p95, 7.168 max.
Heap size is 28.269 MB (max 28.269 MB); peak live data 9.388 MB.

That is to say, 4.44s, of which 0.084s was spent in garbage collection pauses. The default collector configuration is generational, which can result in some odd heap growth patterns; as it happens, this workload runs fine in a 15MB heap. Pause time as a percentage of total run-time is very low, so all the various GCs perform the same, more or less; we seem to be benchmarking eval more than the GC itself.

Is our Wastrel-compiled repl performance good? Well, we can evaluate it in two ways. Firstly, against Chrome or Firefox, which can run the same program; if I paste in the above program in the REPL over at the Hoot web site, it takes about 5 or 6 times as long to complete, respectively. Wastrel wins!

I can also try this program under Guile itself: if I eval it in Guile, it takes about 3.5s. Granted, Guile’s implementation of the same source language is different, and it benefits from a number of representational tricks, for example using just two words for a pair instead of four on Hoot+Wastrel. But these numbers are in the same ballpark, which is heartening. Compiling the test program instead of interpreting is about 10× faster with both Wastrel and Guile, with a similar relative ratio.

Finally, I should note that Hoot’s binaries are pretty well optimized in many ways, but not in all the ways. Notably, they use too many locals, and the post-pass to fix this is unimplemented, and last time I checked (a long time ago!), wasm-opt didn’t work on our binaries. I should take another look some time.

generational?

This week I dotted all the t’s and crossed all the i’s to emit write barriers when we mutate the value of a field to store a new GC-managed data type, allowing me to enable the sticky mark-bit variant of the Immix-inspired mostly-marking collector. It seems to work fine, though this kind of generational collector still baffles me sometimes.

With all of this, Wastrel’s GC-using binaries use a stack-conservative, parallel, generational collector that can compact the heap as needed. This collector supports multiple concurrent mutator threads, though Wastrel doesn’t do threading yet. Other collectors can be chosen at compile-time, though always-moving collectors are off the table due to not emitting stack maps.

The neat thing is that any language that compiles to Wasm can have any of these collectors! And when the Whippet GC library gets another collector or another mode on an existing collector, you can have that too.

missing pieces

The biggest missing piece for Wastrel and Hoot is some kind of asynchrony, similar to JavaScript Promise Integration (JSPI), and somewhat related to stack switching. You want Wasm programs to be able to wait on external events, and Wastrel doesn’t support that yet.

Other than that, it would be lovely to experiment with Wasm shared-everything threads at some point.

what’s next

So I have an ahead-of-time Wasm compiler. It does GC and lots of neat things. Its performance is state-of-the-art. It implements a few standard libraries, including WASI 0.1 and Hoot. It can make a pretty good standalone Guile REPL. But what the hell is it for?

Friends, I... I don’t know! It’s really cool, but I don’t yet know who needs it. I have a few purposes of my own (pushing Wasm standards, performance work on Whippet, etc), but you or someone you know needs a wastrel, do let me know at wingo@igalia.com: I would love to be able to spend more time hacking in this area.

Until next time, happy compiling to all!

No responses