lightening run-time code generation

The upcoming Guile 3 release will have just-in-time native code generation. Finally, amirite? There's lots that I'd like to share about that and I need to start somewhere, so this article is about one piece of it: Lightening, a library to generate machine code.

on lightning

Lightening is a fork of GNU Lightning, adapted to suit the needs of Guile. In fact at first we chose to use GNU Lightning directly, "vendored" into the Guile source respository via the git subtree mechanism. (I see that in the meantime, git gained a kind of a subtree command; one day I will have to figure out what it's for.)

GNU Lightning has lots of things going for it. It has support for many architectures, even things like Itanium that I don't really care about but which a couple Guile users use. It abstracts the differences between e.g. x86 and ARMv7 behind a common API, so that in Guile I don't need to duplicate the JIT for each back-end. Such an abstraction can have a slight performance penalty, because maybe it missed the opportunity to generate optimal code, but this is acceptable to me: I was more concerned about the maintenance burden, and GNU Lightning seemed to solve that nicely.

GNU Lightning also has fantastic documentation. It's written in C and not C++, which is the right thing for Guile at this time, and it's also released under the LGPL, which is Guile's license. As it's a GNU project there's a good chance that GNU Guile's needs might be taken into account if any changes need be made.

I mentally associated Paolo Bonzini with the project, who I knew was a good no-nonsense hacker, as he used Lightning for a smalltalk implementation; and I knew also that Matthew Flatt used Lightning in Racket. Then I looked in the source code to see architecture support and was pleasantly surprised to see MIPS, POWER, and so on, so I went with GNU Lightning for Guile in our 2.9.1 release last October.

on lightening the lightning

When I chose GNU Lightning, I had in mind that it was a very simple library to cheaply write machine code into buffers. (Incidentally, if you have never worked with this stuff, I remember a time when I was pleasantly surprised to realize that an assembler could be a library and not just a program that processes text. A CPU interprets machine code. Machine code is just bytes, and you can just write C (or Scheme, or whatever) functions that write bytes into buffers, and pass those buffers off to the CPU. Now you know!)

Anyway indeed GNU Lightning 1.4 or so was that very simple library that I had in my head. I needed simple because I would need to debug any problems that came up, and I didn't want to add more complexity to the C side of Guile -- eventually I should be migrating this code over to Scheme anyway. And, of course, simple can mean fast, and I needed fast code generation.

However, GNU Lightning has a new release series, the 2.x series. This series is a rewrite in a way of the old version. On the plus side, this new series adds all of the weird architectures that I was pleasantly surprised to see. The old 1.4 didn't even have much x86-64 support, much less AArch64.

This new GNU Lightning 2.x series fundamentally changes the way the library works: instead of having a jit_ldr_f function that directly emits code to load a float from memory into a floating-point register, the jit_ldr_f function now creates a node in a graph. Before code is emitted, that graph is optimized, some register allocation happens around call sites and for temporary values, dead code is elided, and so on, then the graph is traversed and code emitted.

Unfortunately this wasn't really what I was looking for. The optimizations were a bit opaque to me and I just wanted something simple. Building the graph took more time than just emitting bytes into a buffer, and it takes more memory as well. When I found bugs, I couldn't tell whether they were related to my usage or in the library itself.

In the end, the node structure wasn't paying its way for me. But I couldn't just go back to the 1.4 series that I remembered -- it didn't have the architecture support that I needed. Faced with the choice between changing GNU Lightning 2.x in ways that went counter to its upstream direction, switching libraries, or refactoring GNU Lightning to be something that I needed, I chose the latter.

in which our protagonist cannot help himself

Friends, I regret to admit: I named the new thing "Lightening". True, it is a lightened Lightning, yes, but I am aware that it's horribly confusing. Pronounced like almost the same, visually almost identical -- I am a bad person. Oh well!!

I ported some of the existing GNU Lightning backends over to Lightening: ia32, x86-64, ARMv7, and AArch64. I deleted the backends for Itanium, HPPA, Alpha, and SPARC; they have no Debian ports and there is no situation in which I can afford to do QA on them. I would gladly accept contributions for PPC64, MIPS, RISC-V, and maybe S/390. At this point I reckon it takes around 20 hours to port an additional backend from GNU Lightning to Lightening.

Incidentally, if you need a code generation library, consider your choices wisely. It is likely that Lightening is not right for you. If you can afford platform-specific code and you need C, Lua's DynASM is probably the right thing for you. If you are in C++, copy the assemblers from a JavaScript engine -- C++ offers much more type safety, capabilities for optimization, and ergonomics.

But if you can only afford one emitter of JIT code for all architectures, you need simple C, you don't need register allocation, you want a simple library to just include in your source code, and you are good with the LGPL, then Lightening could be a thing for you. Check the gitlab page for info on how to test Lightening and how to include it into your project.

giving it a spin

Yesterday's Guile 2.9.2 release includes Lightening, so you can give it a spin. The switch to Lightening allowed us to lower our JIT optimization threshold by a factor of 50, letting us generate fast code sooner. If you try it out, let #guile on freenode know how it went. In any case, happy hacking!

8 responses

  1. Nathan Myers says:

    If the other responses here are spam, they are very strange spam. The complaints about JS bigint, elsewhere, were even stranger, and the apology even more surprising.

    If there is a hell, you will surely burn in it for the name. But probably there isn't, so OK.

    I have followed your Guile work from afar. Despite my misgivings about obligate-GC languages, it makes me obscurely happy to see Guile evolving and remaining useful.

  2. Vítor De Araújo says:

    On the topic of spam, is the '34 to 42' thing inclusive? I would advise against listing a valid answer literally alongside the field...

    Great work on Guile, btw. I've been using Chez lately, but I find Guile is generally easier to work with, debug code, etc. With Guile performance improving, I might be tempted to start using it more. :)

  3. Alexandr Karbivnichiy says:

    "Fasta" test from Benchmarksgame.

    Python with multiprocessing: 32.7 seconds, ~300 MB
    Guile 2.9.2 single threaded: 39.4 seconds, ~40 MB

    Python 205 LOC. Guile 120 LOC.

    So Guile in performance is between Go and Python. But Go's interoperability with C is quite restricted.

  4. Alexandr Karbivnichiy says:

    ^^ Actually in Guile "fasta" program is even shorter, 105 LOC(not 120) without commented "statprof" lines(in Python 205 LOC).
    Longest line in Guile "fasta" is 72 chars(77 in Python).

    Parallel Forms in Guile's manual is a very interesting section. If it's suited for the task, Guile version will still be shorter than Python's.

  5. Reidar Rymdnalle says:

    Alexandr: could you share the fasta code? I can't find any implementation of it, and it would be nice to play with it...

  6. Alexandr Karbivnichiy says:

    Reidar Rymdnalle:

    karbiv/guile_benchmarksgame on github.

  7. Dale P. Smith says:

    Yey!

    Better and Better Andy!

  8. William ML Leslie says:

    It's great to see you posting again. I hadn't thought to benchmark the latest Guile on my Itanium system; but I totally get that it's a maintenance pain.

Comments are closed.