wingolog

maxwell's equations of software

2009-12-11T14:23:10Z

There was some interesting feedback on my last article, and I ended up wanting to say too much in reply that here I am again, typing at the ether.

Stephen reflects the common confusion that somehow this is just a little file that might or might not work. No man, this is Guile! That's the implementation of Guile's eval. So unit tests, yes, we have more than 12000 of them. Only some of them specifically test the evaluator, but all of them go through the evaluator.

But to be honest, I think unit tests help, but when I'm hacking the compiler the most useful test is to simply rebuild the compiler. If it successfully bootstraps, usually we're doing pretty well.

Zed raises a more direct criticism:

This is why I hate lisp: [link to my article] Long and dense, no comments, not semantic. That's "awesome code"?

I don't think I used the word "awesome", but yes, I believe so, in the sense of "inspiring awe" -- at least for me.

I'm going to call on my homie Alan Kay for some support here. There was an oft-cited interview with Alan Kay a few years back, when he described eval as "Maxwell's equations of software". I quote:

[Interviewer:] If nothing else, Lisp was carefully defined in terms of Lisp.
[Alan Kay:] Yes, that was the big revelation to me when I was in graduate school—when I finally understood that the half page of code on the bottom of page 13 of the Lisp 1.5 manual was Lisp in itself. These were “Maxwell’s Equations of Software!” This is the whole world of programming in a few lines that I can put my hand over.
I realized that anytime I want to know what I’m doing, I can just write down the kernel of this thing in a half page and it’s not going to lose any power. In fact, it’s going to gain power by being able to reenter itself much more readily than most systems done the other way can possibly do.

Just for context, here is that half-a-page -- Maxwell's equations of software:

evalquote is defined by using two main functions, called eval and apply. apply handles a function and its arguments, while eval handles forms. Each of these functions also has another argument that is used as an association list for storing the values of bound variables and function names.

 evalquote[fn;x] = apply[fn;x;NIL]

where

 apply[fn;x;a] =
       [atom[fn] → [eq[fn;CAR] → caar[x];
                    eq[fn;CDR] → cdar[x];
                    eq[fn;CONS] → cons[car[x];cadr[x]];
                    eq[fn;ATOM] → atom[car[x]];
                    eq[fn;EQ] → eq[car[x];cadr[x]];
                    T → apply[eval[fn;a];x;a]];
        eq[car[fn];LAMBDA] → eval[caddr[fn];pairlis[cadr[fn];x;a]];
        eq[car[fn];LABEL] → apply[caddr[fn];x;cons[cons[cadr[fn];
                                                   caddr[fn]];a]]]

 eval[e;a] =
       [atom[e] → cdr[assoc[e;a]];
        atom[car[e]] → [eq[car[e];QUOTE] → cadr[e];
                        eq[car[e];COND] → evcon[cdr[e];a];
                        T → apply[car[e];evlis[cdr[e];a];a]];
        T → apply[car[e];evlis[cdr[e];a];a]]

pairlis and assoc have been previously defined.

 evcon[c;a] = [eval[caar[c];a] → eval[cadar[c];a];
               T → evcon[cdr[c];a]]

and

 evlis[m;a] = [null[m] →  NIL;
               T → cons[eval[car[m];a];evlis[cdr[m];a]]]

From the LISP 1.5 Programmer's Manual

What a mess! Where are the unit tests? Where are the comments? A little bit of whitespace, please! They use "cdar", "caar", and "cadr". They should be using named accessors! What if you eval an atom that's not found in the association list? Did they really name a function "evlis"? Et cetera.

Now, I do think the LISP 1.5 (it wasn't yet spelled "Lisp" then) evaluator is awesome, as it was in McCarthy's first papers on the subject. If you disagree with that, that's cool, I see why one would "hate" my poor imitation.

But given the Lisp history of meta-circular evaluators, one doesn't need to comment every one as if it were the first. In the same way that one recognizes an if statement without the need for a design pattern behind it, I would want anyone who's hacking on Guile's evaluator to have read or perhaps even written several evaluators; and once you've written one, you know the pattern.

More substantively, Charles speculates on source-to-source transformations to ease interpretation. I admit almost total ignorance regarding PreScheme; I've been meaning to learn about it for years now. But the point of this evaluator was to have it use the same representation as VM-compiled procedures; ideally it should run fast, but the first priority is for it to run on the VM itself instead of on a separate stack. There's certainly many more clever things that can be done there, and thankfully since eval is implemented in Scheme and compiled like anything else, it will also reap advantages of an improving compiler.

Regarding Emacs, I hope to say more on that point within a week or so, when the video for my talk at the recent GNU Hackers Meeting gets put up.

Finally, Bubo asks if this work is in a released tarball. Not yet is the answer, but it will be in Tuesday's 1.9.6 release.

Happy hacking!

in which our protagonist forgoes modesty

2009-12-09T20:51:00Z

I wrote the most beautiful code of my life last week.

I would like to explain it to you all, but I have to tell a little bit of backstory first.

the distant future, the year 2000

Until earlier this year, Guile has been an interpreted Scheme. Guile ran your code by parsing it into tree-like data structures, then doing a depth-first evaluation of that tree. The evaluation itself was performed with a C function, ceval, which would recursively call itself when evaluating sub-expressions.

ceval was OK, but not so great. Instead of recursively traversing a tree, it's better to pre-examine the expressions you're going to evaluate, and then emit linear sequences of code to handle those steps. That's to say that it's better to have a compiler than an interpreter. So I dug up Keisuke Nishida's old bytecode compiler that he wrote back in 2001, and eventually hacked it into a shape that we merged it into Guile itself.

That was a pretty sweet hack, to retrofit a compiler into Guile. But it wasn't as beautiful as the code I wrote last week.

the present

See, the problem was that now we had two stacks: the C stack that ceval used, and the virtual machine stack used by byte-compiled code. This was a debugging headache, as to get backtraces you had to ping-pong back and forth between the two stacks, interleaving their frames together in the right order. Also with two stacks it's practically impossible to write a real debugger that does single-stepping, inspection and modification of stack frames, etc.

The two-stack solution (ahem) had another problem: you couldn't tail-call between interpreted and compiled code, because the procedures used different stacks. Normally this wasn't a big deal because all code was compiled, but it would occasionally bite you. (The usual case would be when you had a compiled Guile, but just pulled new code from the git repository, then tried to compile it again -- so some of your compiled code was out-of-date and therefore not loaded, you had a mix of compiled and interpreted code in some important places, and your loop starts consuming stack.)

Finally, interpreted code behaved differently than compiled code in some cases. For example, consider the following code:

;; Returns two values: the value, if found,
;; and a flag indicating success.
(define (table-lookup table key)
  (let ((handle (assq key table)))
    (if handle
        (values (cdr handle) #t)
        (values #f #f))))

(define (trace-call f . args)
  (let ((result (apply f args)))
    (format #t "\nfunction returned ~a\n" result)
    result))

(trace-call table-lookup '((x . y)) 'x)

So if I try this at my Guile 1.8 prompt, I get this:

guile> (trace-call table-lookup '((x . y)) 'x)
function returned #
$1 = y
$2 = #t

We see that trace-call returns two values, and the tracing printout shows a "multiple values object" -- a Scheme object like any other, but that the primitive call-with-values knows how to destructure. The toplevel repl is wrapped in a call-with-values, so t and #t print separately.

Now if I fire up Guile 1.9, let's see what we get:

> (trace-call table-lookup '((x . y)) 'x)
function returned y
$1 = y

Guile 1.9's repl compiles its expressions, by default, and indeed we see different behavior -- the trace printout has a naked value, y, and only one value is returned.

Both of these behaviors are compatible with standard Scheme from R5RS on. The origin of the difference is that the behavior of values within a continuation that was not created with call-with-values is unspecified. Relatedly, it is not specified what will happen when you return N values to a continuation accepting M values, where N != M.

What's happening is that the compiler actually has two return addresses in each stack frame -- one for the normal singly-valued case, and one for multiple values. values will return to the multiple-value return address (MVRA), and anything else will go to the normal return address. So actually, compiled code can choose what to do when it gets multiple values. Instead of raising an error when two values are returned to the (let ((result ...)) ...) continuation, Guile chooses to do what you (probably) expect and just drop the second value.

In contrast, with a C evaluator, even noticing that two values were returned to a singly-valued continuation is a pain -- because you have to check and branch every time you recursively call ceval to see if you're getting a multiple-values object.

But I digress. I promised something nice, and here I am noodling about something else.

exit strategy

The solution to all these problems, of course, is to use just one stack, and have that stack be the same as the one that compiled code uses.

Practically what this means is that eval should not be a C function, because Guile does not compile to C; it should be something that ends up as compiled code.

(For now, compiled code is bytecode, run on the VM. I'm being a little vague here because Guile doesn't do native compilation yet, but it will, within a year or two, and the same considerations apply.)

I actually toyed with the idea of writing a hand-coded eval in VM bytecode, but I came to my senses soon enough, and the answer was delightful.

eval in scheme

Of course! Scheme's eval should be written in Scheme itself. Then we just compile it to bytecode, like any other Scheme procedure.

At this point, anyone who's actually had to do Scheme at university (not me) will recognize this as the meta-circular evaluator pattern. To be honest I had never written one before -- and I think the reason was that they always seemed so peripheral. When you write a meta-circular evaluator, the language you really work in is the one that implements the meta-circular evaluator, not the one implemented by the evaluator -- or at least, that's the case if you're trying to get something done, rather than learn about language.

But this is different. This time the meta-circular evaluator actually sits at the heart of Guile -- in fact, we use eval, as implemented in Scheme, and compiled to bytecode, to compile the compiler -- which itself is written in Scheme of course.

In the end, though, you have to have a Scheme compiler to compile eval.scm itself, so we do end up keeping around an evaluator in C. Its only purpose is to interpret the compiler, so we can compile eval.scm: then the compiled version of eval.scm compiles the rest of Guile, including the compiler.

Another option would have been to require a new-enough version of Guile itself to compile the compiler. But I want to be able to sanely bootstrap Guile's compiler, so that's out of the question. We could implement the compiler in portable Scheme, but that would forbid the compiler from making use of any of Guile's niceties.

the code

So here it is (and below). I don't claim that it is actually the most elegant code I have written, though I can think of none better at the moment; nor is it the fastest code, nor the most concise. But it sits in such a powerful place, and in so few lines, that I cannot help but to be pleased with it.

(define primitive-eval
  (let ()
    ;; The "engine". EXP is a memoized expression.
    (define (eval exp env)
      (memoized-expression-case exp
        (('begin (first . rest))
         (let lp ((first first) (rest rest))
           (if (null? rest)
               (eval first env)
               (begin
                 (eval first env)
                 (lp (car rest) (cdr rest))))))
      
        (('if (test consequent . alternate))
         (if (eval test env)
             (eval consequent env)
             (eval alternate env)))
      
        (('let (inits . body))
         (let lp ((inits inits) (new-env (capture-env env)))
           (if (null? inits)
               (eval body new-env)
               (lp (cdr inits)
                   (cons (eval (car inits) env) new-env)))))
      
        (('lambda (nreq rest? . body))
         (let ((env (capture-env env)))
           (lambda args
             (let lp ((env env) (nreq nreq) (args args))
               (if (zero? nreq)
                   (eval body
                         (if rest?
                             (cons args env)
                             (if (not (null? args))
                                 (scm-error 'wrong-number-of-args
                                            "eval" "Wrong number of arguments"
                                            '() #f)
                                 env)))
                   (if (null? args)
                       (scm-error 'wrong-number-of-args
                                  "eval" "Wrong number of arguments"
                                  '() #f)
                       (lp (cons (car args) env)
                           (1- nreq)
                           (cdr args))))))))

        (('quote x)
         x)

        (('define (name . x))
         (define! name (eval x env)))
      
        (('apply (f args))
         (apply (eval f env) (eval args env)))

        (('call (f . args))
         (let ((proc (eval f env)))
           (let eval-args ((in args) (out '()))
             (if (null? in)
                 (apply proc (reverse out))
                 (eval-args (cdr in)
                            (cons (eval (car in) env) out))))))
      
        (('call/cc proc)
         (call/cc (eval proc env)))

        (('call-with-values (producer . consumer))
         (call-with-values (eval producer env)
           (eval consumer env)))

        (('lexical-ref n)
         (let lp ((n n) (env env))
           (if (zero? n)
               (car env)
               (lp (1- n) (cdr env)))))
      
        (('lexical-set! (n . x))
         (let ((val (eval x env)))
           (let lp ((n n) (env env))
             (if (zero? n)
                 (set-car! env val)
                 (lp (1- n) (cdr env))))))
        
        (('toplevel-ref var-or-sym)
         (variable-ref
          (if (variable? var-or-sym)
              var-or-sym
              (let lp ((env env))
                (if (pair? env)
                    (lp (cdr env))
                    (memoize-variable-access! exp (capture-env env)))))))

        (('toplevel-set! (var-or-sym . x))
         (variable-set!
          (if (variable? var-or-sym)
              var-or-sym
              (let lp ((env env))
                (if (pair? env)
                    (lp (cdr env))
                    (memoize-variable-access! exp (capture-env env)))))
          (eval x env)))
      
        (('module-ref var-or-spec)
         (variable-ref
          (if (variable? var-or-spec)
              var-or-spec
              (memoize-variable-access! exp #f))))

        (('module-set! (x . var-or-spec))
         (variable-set!
          (if (variable? var-or-spec)
              var-or-spec
              (memoize-variable-access! exp #f))
          (eval x env)))))
  
    ;; primitive-eval
    (lambda (exp)
      "Evaluate @var{exp} in the current module."
      (eval 
       (memoize-expression ((or (module-transformer (current-module))
                                (lambda (x) x))
                            exp))
       '()))))