instruction explosion in guile

17 January 2018 10:30 AM (guile | compilers | instruction explosion | loop optimizations | cps)

Greetings, fellow Schemers and compiler nerds: I bring fresh nargery!

instruction explosion

A couple years ago I made a list of compiler tasks for Guile. Most of these are still open, but I've been chipping away at the one labeled "instruction explosion":

Now we get more to the compiler side of things. Currently in Guile's VM there are instructions like vector-ref. This is a little silly: there are also instructions to branch on the type of an object (br-if-tc7 in this case), to get the vector's length, and to do a branching integer comparison. Really we should replace vector-ref with a combination of these test-and-branches, with real control flow in the function, and then the actual ref should use some more primitive unchecked memory reference instruction. Optimization could end up hoisting everything but the primitive unchecked memory reference, while preserving safety, which would be a win. But probably in most cases optimization wouldn't manage to do this, which would be a lose overall because you have more instruction dispatch.

Well, this transformation is something we need for native compilation anyway. I would accept a patch to do this kind of transformation on the master branch, after version 2.2.0 has forked. In theory this would remove most all high level instructions from the VM, making the bytecode closer to a virtual CPU, and likewise making it easier for the compiler to emit native code as it's working at a lower level.

Now that I'm getting close to finished I wanted to share some thoughts. Previous progress reports on the mailing list.

a simple loop

As an example, consider this loop that sums the 32-bit floats in a bytevector. I've annotated the code with lines and columns so that you can correspond different pieces to the assembly.

   0       8   12     19
1| (use-modules (rnrs bytevectors))
2| (define (f32v-sum bv)
3|   (let lp ((n 0) (sum 0.0))
4|     (if (< n (bytevector-length bv))
5|         (lp (+ n 4)
6|             (+ sum (bytevector-ieee-single-native-ref bv n)))
7|          sum)))

The assembly for the loop before instruction explosion went like this:

  17    (handle-interrupts)     at (unknown file):5:12
  18    (uadd/immediate 0 1 4)
  19    (bv-f32-ref 1 3 1)      at (unknown file):6:19
  20    (fadd 2 2 1)            at (unknown file):6:12
  21    (s64<? 0 4)             at (unknown file):4:8
  22    (jnl 8)                ;; -> L4
  23    (mov 1 0)               at (unknown file):5:8
  24    (j -7)                 ;; -> L1

So, already Guile's compiler has hoisted the (bytevector-length bv) and unboxed the loop index n and accumulator sum. This work aims to simplify further by exploding bv-f32-ref.

exploding the loop

In practice, instruction explosion happens in CPS conversion, as we are converting the Scheme-like Tree-IL language down to the CPS soup language. When we see a Tree-Il primcall (a call to a known primitive), instead of lowering it to a corresponding CPS primcall, we inline a whole blob of code.

In the concrete case of bv-f32-ref, we'd inline it with something like the following:

(unless (and (heap-object? bv)
             (eq? (heap-type-tag bv) %bytevector-tag))
  (error "not a bytevector" bv))
(define len (word-ref bv 1))
(define ptr (word-ref bv 2))
(unless (and (<= 4 len)
             (<= idx (- len 4)))
  (error "out of range" idx))
(f32-ref ptr len)

As you can see, there are four branches hidden in the bv-f32-ref: two to check that the object is a bytevector, and two to check that the index is within range. In this explanation we assume that the offset idx is already unboxed, but actually unboxing the index ends up being part of this work as well.

One of the goals of instruction explosion was that by breaking the operation into a number of smaller, more orthogonal parts, native code generation would be easier, because the compiler would only have to know about those small bits. However without an optimizing compiler, it would be better to reify a call out to a specialized bv-f32-ref runtime routine instead of inlining all of this code -- probably whatever language you write your runtime routine in (C, rust, whatever) will do a better job optimizing than your compiler will.

But with an optimizing compiler, there is the possibility of removing possibly everything but the f32-ref. Guile doesn't quite get there, but almost; here's the post-explosion optimized assembly of the inner loop of f32v-sum:

  27    (handle-interrupts)
  28    (tag-fixnum 1 2)
  29    (s64<? 2 4)             at (unknown file):4:8
  30    (jnl 15)               ;; -> L5
  31    (uadd/immediate 0 2 4)  at (unknown file):5:12
  32    (u64<? 2 7)             at (unknown file):6:19
  33    (jnl 5)                ;; -> L2
  34    (f32-ref 2 5 2)
  35    (fadd 3 3 2)            at (unknown file):6:12
  36    (mov 2 0)               at (unknown file):5:8
  37    (j -10)                ;; -> L1

good things

The first thing to note is that unlike the "before" code, there's no instruction in this loop that can throw an exception. Neat.

Next, note that there's no type check on the bytevector; the peeled iteration preceding the loop already proved that the bytevector is a bytevector.

And indeed there's no reference to the bytevector at all in the loop! The value being dereferenced in (f32-ref 2 5 2) is a raw pointer. (Read this instruction as, "sp[2] = *(float*)((byte*)sp[5] + (uptrdiff_t)sp[2])".) The compiler does something interesting; the f32-ref CPS primcall actually takes three arguments: the garbage-collected object protecting the pointer, the pointer itself, and the offset. The object itself doesn't appear in the residual code, but including it in the f32-ref primcall's inputs keeps it alive as long as the f32-ref itself is alive.

bad things

Then there are the limitations. Firstly, instruction 28 tags the u64 loop index as a fixnum, but never uses the result. Why is this here? Sadly it's because the value is used in the bailout at L2. Recall this pseudocode:

(unless (and (<= 4 len)
             (<= idx (- len 4)))
  (error "out of range" idx))

Here the error ends up lowering to a throw CPS term that the compiler recognizes as a bailout and renders out-of-line; cool. But it uses idx as an argument, as a tagged SCM value. The compiler untags the loop index, but has to keep a tagged version around for the error cases.

The right fix is probably some kind of allocation sinking pass that sinks the tag-fixnum to the bailouts. Oh well.

Additionally, there are two tests in the loop. Are both necessary? Turns out, yes :( Imagine you have a bytevector of length 1025. The loop continues until the last ref at offset 1024, which is within bounds of the bytevector but there's one one byte available at that point, so we need to throw an exception at this point. The compiler did as good a job as we could expect it to do.

is is worth it? where to now?

On the one hand, instruction explosion is a step sideways. The code is more optimal, but it's more instructions. Because Guile currently has a bytecode VM, that means more total interpreter overhead. Testing on a 40-megabyte bytevector of 32-bit floats, the exploded f32v-sum completes in 115 milliseconds compared to around 97 for the earlier version.

On the other hand, it is very easy to imagine how to compile these instructions to native code, either ahead-of-time or via a simple template JIT. You practically just have to look up the instructions in the corresponding ISA reference, is all. The result should perform quite well.

I will probably take a whack at a simple template JIT first that does no register allocation, then ahead-of-time compilation with register allocation. Getting the AOT-compiled artifacts to dynamically link with runtime routines is a sufficient pain in my mind that I will put it off a bit until later. I also need to figure out a good strategy for truly polymorphic operations like general integer addition; probably involving inline caches.

So that's where we're at :) Thanks for reading, and happy hacking in Guile in 2018!

36 responses

  1. non-e-moose says:

    Another example of an inept developer talking about "instructions" when their only understanding is that of an IR (Itermediate Representation)

    "I also need to figure out a good strategy for truly polymorphic operations like general integer addition;"

    A great start would be to understand actual instruction sequences.
    If you are (and you seem to be) pick clang as a way to learn about code sequences.

  2. John Cowan says:

    Everybody's understanding is only at the level of an intermediate representation unless they know exactly how the CPU works at the circuit level.

  3. Stephen Eilert says:

    "Another example of an inept developer talking about "instructions" when their only understanding is that of an IR "

    How interesting would a world be where Andy Wingo would be considered an "inept" developer. I wonder which category would everyone else fall into.

  4. rixed says:

    "Another example of an inept developer"

    Could you comment on the ideas rather than the person?
    I guess we'd all like to learn something about what's so special about instruction sequences that made you dismiss the whole post like this. If your comment was anything else than an attempt at AI, that is.

  5. Arne Babenhauserheide says:

    …when they start to bite you unfairly, you know you are getting the attention from those who consider themselves to be above you…

  6. sims freeplay lp cheats says:

    We will visit the best place to use sims free play cheats so that we could save our time as well as money also.

  7. OkDissertations says:

    It's great that you bring all the readers here a round-up of what's happening with "instruction explosion". I'm sure that breaking the operation into a number of smaller, more orthogonal parts, will make native code generation easier than ever before. Thanks for sharing this info.

  8. Lenovo Support says:

    This content contains the instruction about an explosion in guile. The best part of this article is good things and bad things. Its help me to understand its benefits. The explosion loop is so good to understand.

  9. Solusi Alami Untuk Mengobati Kutil kelamin says:

    I would love to know about her childhood, about how she first met on writing. Because I think that this period has the greatest influence on the formation of a man as a creative person.

  10. CPI Mobi says:

    I'm a young programer. Found there a lot of useful and necessary information. Thank you for sharing this. Hope that you will no stop doing this! Thank you again!

  11. lenovo support says:

    it's a great article, You have to define good things as well as a bad thing very well. It makes clear understanding.

  12. novel updates says:

    This is a great little post with some valuable tips. I totally agree.

  13. dvd says:

    In a short time if you wan to access your data in a manner and also in a sequence then come here and get the solution for that things.

  14. shell shockers says:

    btw. just an idea. If GNOME decides to stay in the GNU project (I would appreciate such a decision). What about a joint GUADEC and GHM? The last GUADEC was together with Akademy but I think it would be good and about time that GNOME and GNU come more closely together and get to know each other and to understand each other again.

  15. Mike says:

    This is not that simple as I thought it to be. Thanks anyway.

  16. says:

    The last GUADEC was together with Akademy but I think it would be good and about time that GNOME and GNU come more closely together and get to know each other and to understand each other again.

  17. majece majece says:

    It's clear for me that on you can read more about getting assignment help. I had such experience recently

  18. reza says:

    Thanks For this great post.

  19. NSNHVerification says:

    It was difficult to understand, but it worth the solution that came out at the end. Thanks for the description.

  20. Emily Brown says:

    Thanks a lot for describing here instruction explosion in guile with a simple loop and exploding the loop. Very helpful information here for the beginners IT learners.


  21. ifreegiveaways says:

    Develop the discipline to set aside a certain amount of your earnings to pay income taxes. Even though home business owners get a decent number of tax write-offs, there is a very good chance that you will still need to pay something to the tax man. Make sure to set aside a portion each month to avoid taking a huge cash-flow hit all in one month.

  22. Quickbooks Support says:

    This information is helpful, as you have tested the outcomes.

  23. cdhpl says:

    thanks for this tutorial and share with us

  24. panel installation says:

    It's my pleasure to find this amazing website who provides mostly unseen information about every single topic which is i really like it.

  25. Ducksavy says:

    Nice post, thank you so much!

  26. Macbook Repair Oxford says:

    Repair My Phone Today repair shop located in Oxford. We repair smartphones, tablets and computers near Oxford.

  27. Lucy Orloski says:

    One of the finest Indian restaurants in Oxford is the TreeHotel restaurant, founded in 2004, it offers fine, affordable Indian cuisine and it offers Indian cuisine with finely blended spices and seasoning to tantalize your senses. It offers a huge variety of Hyderabadi and north Indian cuisine at its finest with each dish transporting you to the very shores of India as it tells its own delicious tales through the exotic flavours and colors.

  28. says:

    Here I am sharing a very useful resource for students looking for assignment help. Yes it is Proassignmenthelp providing all sort of assignment writing in various subjects like as : Electrical engineering assignment help , electronics assignment help , MATLAB assignment help , management assignment help , etc.

  29. Click here says:

    A perfect info source. Thanks for taking the time to discuss this, I feel strongly about it and love learning more on this topic.

  30. Smadav 2020 says:

    Smadav 2020 free download is an antivirus software made to be a consisted of protection or a second layer of security for computer systems as well as likewise laptops, appropriate with numerous other Antivirus.

  31. adamschule85 says:

    This actually taught me something... famous board

  32. temple run 2 says:

    So, already Guile's compiler has hoisted the (bytevector-length bv) and unboxed the loop index n and accumulator sum. This work aims to simplify further by exploding bv-f32-ref.

  33. Dissertation writing services UK says:

    This is one of the best post and we can learn a lot from this post. We should share our knowledge to our new generation, so that they can easily get knowledge.

  34. ترجمه تخصصی پزشکی says:

    thank you for amazing website

  35. finance dissertation Help UK says:

    Great, I must say and thanks for sharing this informative post. I am really impressed that there is so much information about this subject that have been uncovered and you’ve done your best, is giving help to students who are stressed with their homework help and submit their assignment on time. we are already trusted by thousands of students who struggle to write their academic papers and also by those students who simply want finance dissertation Help the UK to save their time and make life easy.

  36. atari breakout says:

    The information you have posted is really cool and very helpful, I will often visit your site.

Leave a Reply