design notes on inline caches in guile

7 February 2018 3:14 PM (inline caches | compilers | v8 | igalia | guile | gnu | adaptive optimization | type feedback)

Ahoy, programming-language tinkerfolk! Today's rambling missive chews the gnarly bones of "inline caches", in general but also with particular respect to the Guile implementation of Scheme. First, a little intro.

inline what?

Inline caches are a language implementation technique used to accelerate polymorphic dispatch. Let's dive in to that.

By implementation technique, I mean that the technique applies to the language compiler and runtime, rather than to the semantics of the language itself. The effects on the language do exist though in an indirect way, in the sense that inline caches can make some operations faster and therefore more common. Eventually inline caches can affect what users expect out of a language and what kinds of programs they write.

But I'm getting ahead of myself. Polymorphic dispatch literally means "choosing based on multiple forms". Let's say your language has immutable strings -- like Java, Python, or Javascript. Let's say your language also has operator overloading, and that it uses + to concatenate strings. Well at that point you have a problem -- while you can specify a terse semantics of some core set of operations on strings (win!), you can't choose one representation of strings that will work well for all cases (lose!). If the user has a workload where they regularly build up strings by concatenating them, you will want to store strings as trees of substrings. On the other hand if they want to access characterscodepoints by index, then you want an array. But if the codepoints are all below 256, maybe you should represent them as bytes to save space, whereas maybe instead as 4-byte codepoints otherwise? Or maybe even UTF-8 with a codepoint index side table.

The right representation (form) of a string depends on the myriad ways that the string might be used. The string-append operation is polymorphic, in the sense that the precise code for the operator depends on the representation of the operands -- despite the fact that the meaning of string-append is monomorphic!

Anyway, that's the problem. Before inline caches came along, there were two solutions: callouts and open-coding. Both were bad in similar ways. A callout is where the compiler generates a call to a generic runtime routine. The runtime routine will be able to handle all the myriad forms and combination of forms of the operands. This works fine but can be a bit slow, as all callouts for a given operator (e.g. string-append) dispatch to a single routine for the whole program, so they don't get to optimize for any particular call site.

One tempting thing for compiler writers to do is to effectively inline the string-append operation into each of its call sites. This is "open-coding" (in the terminology of the early Lisp implementations like MACLISP). The advantage here is that maybe the compiler knows something about one or more of the operands, so it can eliminate some cases, effectively performing some compile-time specialization. But this is a limited technique; one could argue that the whole point of polymorphism is to allow for generic operations on generic data, so you rarely have compile-time invariants that can allow you to specialize. Open-coding of polymorphic operations instead leads to code bloat, as the string-append operation is just so many copies of the same thing.

Inline caches emerged to solve this problem. They trace their lineage back to Smalltalk 80, gained in complexity and power with Self and finally reached mass consciousness through Javascript. These languages all share the characteristic of being dynamically typed and object-oriented. When a user evaluates a statement like x = y.z, the language implementation needs to figure out where y.z is actually located. This location depends on the representation of y, which is rarely known at compile-time.

However for any given reference y.z in the source code, there is a finite set of concrete representations of y that will actually flow to that call site at run-time. Inline caches allow the language implementation to specialize the y.z access for its particular call site. For example, at some point in the evaluation of a program, y may be seen to have representation R1 or R2. For R1, the z property may be stored at offset 3 within the object's storage, and for R2 it might be at offset 4. The inline cache is a bit of specialized code that compares the type of the object being accessed against R1 , in that case returning the value at offset 3, otherwise R2 and offset r4, and otherwise falling back to a generic routine. If this isn't clear to you, Vyacheslav Egorov write a fine article describing and implementing the object representation optimizations enabled by inline caches.

Inline caches also serve as input data to later stages of an adaptive compiler, allowing the compiler to selectively inline (open-code) only those cases that are appropriate to values actually seen at any given call site.

but how?

The classic formulation of inline caches from Self and early V8 actually patched the code being executed. An inline cache might be allocated at address 0xcabba9e5 and the code emitted for its call-site would be jmp 0xcabba9e5. If the inline cache ended up bottoming out to the generic routine, a new inline cache would be generated that added an implementation appropriate to the newly seen "form" of the operands and the call-site. Let's say that new IC (inline cache) would have the address 0x900db334. Early versions of V8 would actually patch the machine code at the call-site to be jmp 0x900db334 instead of jmp 0xcabba6e5.

Patching machine code has a number of disadvantages, though. It inherently target-specific: you will need different strategies to patch x86-64 and armv7 machine code. It's also expensive: you have to flush the instruction cache after the patch, which slows you down. That is, of course, if you are allowed to patch executable code; on many systems that's impossible. Writable machine code is a potential vulnerability if the system may be vulnerable to remote code execution.

Perhaps worst of all, though, patching machine code is not thread-safe. In the case of early Javascript, this perhaps wasn't so important; but as JS implementations gained parallel garbage collectors and JS-level parallelism via "service workers", this becomes less acceptable.

For all of these reasons, the modern take on inline caches is to implement them as a memory location that can be atomically modified. The call site is just jmp *loc, as if it were a virtual method call. Modern CPUs have "branch target buffers" that predict the target of these indirect branches with very high accuracy so that the indirect jump does not become a pipeline stall. (What does this mean in the face of the Spectre v2 vulnerabilities? Sadly, God only knows at this point. Saddest panda.)

cry, the beloved country

I am interested in ICs in the context of the Guile implementation of Scheme, but first I will make a digression. Scheme is a very monomorphic language. Yet, this monomorphism is entirely cultural. It is in no way essential. Lack of ICs in implementations has actually fed back and encouraged this monomorphism.

Let us take as an example the case of property access. If you have a pair in Scheme and you want its first field, you do (car x). But if you have a vector, you do (vector-ref x 0).

What's the reason for this nonuniformity? You could have a generic ref procedure, which when invoked as (ref x 0) would return the field in x associated with 0. Or (ref x 'foo) to return the foo property of x. It would be more orthogonal in some ways, and it's completely valid Scheme.

We don't write Scheme programs this way, though. From what I can tell, it's for two reasons: one good, and one bad.

The good reason is that saying vector-ref means more to the reader. You know more about the complexity of the operation and what side effects it might have. When you call ref, who knows? Using concrete primitives allows for better program analysis and understanding.

The bad reason is that Scheme implementations, Guile included, tend to compile (car x) to much better code than (ref x 0). Scheme implementations in practice aren't well-equipped for polymorphic data access. In fact it is standard Scheme practice to abuse the "macro" facility to manually inline code so that that certain performance-sensitive operations get inlined into a closed graph of monomorphic operators with no callouts. To the extent that this is true, Scheme programmers, Scheme programs, and the Scheme language as a whole are all victims of their implementations. JavaScript, for example, does not have this problem -- to a small extent, maybe, yes, performance tweaks and tuning are always a thing but JavaScript implementations' ability to burn away polymorphism and abstraction results in an entirely different character in JS programs versus Scheme programs.

it gets worse

On the most basic level, Scheme is the call-by-value lambda calculus. It's well-studied, well-understood, and eminently flexible. However the way that the syntax maps to the semantics hides a constrictive monomorphism: that the "callee" of a call refer to a lambda expression.

Concretely, in an expression like (a b), in which a is not a macro, a must evaluate to the result of a lambda expression. Perhaps by reference (e.g. (define a (lambda (x) x))), perhaps directly; but a lambda nonetheless. But what if a is actually a vector? At that point the Scheme language standard would declare that to be an error.

The semantics of Clojure, though, would allow for ((vector 'a 'b 'c) 1) to evaluate to b. Why not in Scheme? There are the same good and bad reasons as with ref. Usually, the concerns of the language implementation dominate, regardless of those of the users who generally want to write terse code. Of course in some cases the implementation concerns should dominate, but not always. Here, Scheme could be more flexible if it wanted to.

what have you done for me lately

Although inline caches are not a miracle cure for performance overheads of polymorphic dispatch, they are a tool in the box. But what, precisely, can they do, both in general and for Scheme?

To my mind, they have five uses. If you can think of more, please let me know in the comments.

Firstly, they have the classic named property access optimizations as in JavaScript. These apply less to Scheme, as we don't have generic property access. Perhaps this is a deficiency of Scheme, but it's not exactly low-hanging fruit. Perhaps this would be more interesting if Guile had more generic protocols such as Racket's iteration.

Next, there are the arithmetic operators: addition, multiplication, and so on. Scheme's arithmetic is indeed polymorphic; the addition operator + can add any number of complex numbers, with a distinction between exact and inexact values. On a representation level, Guile has fixnums (small exact integers, no heap allocation), bignums (arbitrary-precision heap-allocated exact integers), fractions (exact ratios between integers), flonums (heap-allocated double-precision floating point numbers), and compnums (inexact complex numbers, internally a pair of doubles). Also in Guile, arithmetic operators are a "primitive generics", meaning that they can be extended to operate on new types at runtime via GOOPS.

The usual situation though is that any particular instance of an addition operator only sees fixnums. In that case, it makes sense to only emit code for fixnums, instead of the product of all possible numeric representations. This is a clear application where inline caches can be interesting to Guile.

Third, there is a very specific case related to dynamic linking. Did you know that most programs compiled for GNU/Linux and related systems have inline caches in them? It's a bit weird but the "Procedure Linkage Table" (PLT) segment in ELF binaries on Linux systems is set up in a way that when e.g. is loaded, the dynamic linker usually doesn't eagerly resolve all of the external routines that uses. The first time that calls frobulate, it ends up calling a procedure that looks up the location of the frobulate procedure, then patches the binary code in the PLT so that the next time frobulate is called, it dispatches directly. To dynamic language people it's the weirdest thing in the world that the C/C++/everything-static universe has at its cold, cold heart a hash table and a dynamic dispatch system that it doesn't expose to any kind of user for instrumenting or introspection -- any user that's not a malware author, of course.

But I digress! Guile can use ICs to lazily resolve runtime routines used by compiled Scheme code. But perhaps this isn't optimal, as the set of primitive runtime calls that Guile will embed in its output is finite, and so resolving these routines eagerly would probably be sufficient. Guile could use ICs for inter-module references as well, and these should indeed be resolved lazily; but I don't know, perhaps the current strategy of using a call-site cache for inter-module references is sufficient.

Fourthly (are you counting?), there is a general case of the former: when you see a call (a b) and you don't know what a is. If you put an inline cache in the call, instead of having to emit checks that a is a heap object and a procedure and then emit an indirect call to the procedure's code, you might be able to emit simply a check that a is the same as x, the only callee you ever saw at that site, and in that case you can emit a direct branch to the function's code instead of an indirect branch.

Here I think the argument is less strong. Modern CPUs are already very good at indirect jumps and well-predicted branches. The value of a devirtualization pass in compilers is that it makes the side effects of a virtual method call concrete, allowing for more optimizations; avoiding indirect branches is good but not necessary. On the other hand, Guile does have polymorphic callees (generic functions), and call ICs could help there. Ideally though we would need to extend the language to allow generic functions to feed back to their inline cache handlers.

Finally, ICs could allow for cheap tracepoints and breakpoints. If at every breakable location you included a jmp *loc, and the initial value of *loc was the next instruction, then you could patch individual locations with code to run there. The patched code would be responsible for saving and restoring machine state around the instrumentation.

Honestly I struggle a lot with the idea of debugging native code. GDB does the least-overhead, most-generic thing, which is patching code directly; but it runs from a separate process, and in Guile we need in-process portable debugging. The debugging use case is a clear area where you want adaptive optimization, so that you can emit debugging ceremony from the hottest code, knowing that you can fall back on some earlier tier. Perhaps Guile should bite the bullet and go this way too.

implementation plan

In Guile, monomorphic as it is in most things, probably only arithmetic is worth the trouble of inline caches, at least in the short term.

Another question is how much to specialize the inline caches to their call site. On the extreme side, each call site could have a custom calling convention: if the first operand is in register A and the second is in register B and they are expected to be fixnums, and the result goes in register C, and the continuation is the code at L, well then you generate an inline cache that specializes to all of that. No need to shuffle operands or results, no need to save the continuation (return location) on the stack.

The opposite would be to call ICs as if their were normal procedures: shuffle arguments into fixed operand registers, push a stack frame, and when the IC returns, shuffle the result into place.

Honestly I am looking mostly to the simple solution. I am concerned about code and heap bloat if I specify to every last detail of a call site. Also maximum speed comes with an adaptive optimizer, and in that case simple lower tiers are best.

sanity check

To compare these impressions, I took a look at V8's current source code to see where they use ICs in practice. When I worked on V8, the compiler was entirely different -- there were two tiers, and both of them generated native code. Inline caches were everywhere, and they were gnarly; every architecture had its own implementation. Now in V8 there are two tiers, not the same as the old ones, and the lowest one is a bytecode interpreter.

As an adaptive optimizer, V8 doesn't need breakpoint ICs. It can always deoptimize back to the interpreter. In actual practice, to debug at a source location, V8 will patch the bytecode to insert a "DebugBreak" instruction, which has its own support in the interpreter. V8 also supports optimized compilation of this operation. So, no ICs needed here.

Likewise for generic type feedback, V8 records types as data rather than in the classic formulation of inline caches as in Self. I think WebKit's JavaScriptCore uses a similar strategy.

V8 does use inline caches for property access (loads and stores). Besides that there is an inline cache used in calls which is just used to record callee counts, and not used for direct call optimization.

Surprisingly, V8 doesn't even seem to use inline caches for arithmetic (any more?). Fair enough, I guess, given that JavaScript's numbers aren't very polymorphic, and even with a system with fixnums and heap floats like V8, floating-point numbers are rare in cold code.

The dynamic linking and relocation points don't apply to V8 either, as it doesn't receive binary code from the internet; it always starts from source.

twilight of the inline cache

There was a time when inline caches were recommended to solve all your VM problems, but it would seem now that their heyday is past.

ICs are still a win if you have named property access on objects whose shape you don't know at compile-time. But improvements in CPU branch target buffers mean that it's no longer imperative to use ICs to avoid indirect branches (modulo Spectre v2), and creating direct branches via code-patching has gotten more expensive and tricky on today's targets with concurrency and deep cache hierarchies.

Besides that, the type feedback component of inline caches seems to be taken over by explicit data-driven call-site caches, rather than executable inline caches, and the highest-throughput tiers of an adaptive optimizer burn away inline caches anyway. The pressure on an inline cache infrastructure now is towards simplicity and ease of type and call-count profiling, leaving the speed component to those higher tiers.

In Guile the bounded polymorphism on arithmetic combined with the need for ahead-of-time compilation means that ICs are probably a code size and execution time win, but it will take some engineering to prevent the calling convention overhead from dominating cost.

Time to experiment, then -- I'll let y'all know how it goes. Thoughts and feedback welcome from the compilerati. Until then, happy hacking :)

73 responses

  1. Iliyan says:

    Hi, in "Concretely, in an expression like (a b), in which a is not a macro, a must evaluate to the result of a lambda expression.", I think you meant to say "...a must evaluate to a lambda expression."

  2. Tubemate Free says:

    Apps Tubemate is an application that you must have, especially for those who like to watch and download the video to be stored on a device a smartphone, there are 3 types of video formats that can later be selected when downloading which WEBM, MP4, or 3GP with file size and video resolution.

  3. do my assignment says:

    When we talk about inline caches it means this is the technique where we implement to accelerate. Through this lots of problems are solved.

  4. Toon Verwaest says:

    In V8 we don't actually implement inline caches as indirect jumps. There's simply a stub that interprets data that encodes how to load properties from receivers it has recorded. So there's no Spectre V2 issue :-)

    A good reason why this works is of course, as you mention, that higher tiers compile away inline caches anyway. Another reason is that JavaScript likely isn't as monomorphic as Self. Flushing the instruction stream is pretty expensive on modern out-of-order CPUs and better be amortized over time. It turns out that it's mainly a wash. We didn't really see JS performance get worse after this change.

    On the other hand, creating custom code objects is much more expensive than updating a data-driven cache. That's where we saw a big win on real-world pages: IC miss time typically halved. And before this change, on real-world page startup we spent roughly the same amount of time missing ICs as we were executing JS...

  5. Carousel Checks promo codes says:

    It is entirely different what we had in our minds, but this seems more reasonable and effective tho. But what about the compiler issues rises when the designing fails? How would you give a light on that?

  6. irmina says:

    Really nteresting

  7. discount codes says:

    Very interesting post.Really enjoyed reading on the topic inline caches.Thank you

  8. medical billing services says:

    Design notes are very appreciable.Hope you share more such informative posts and ideas for us.Thank you

  9. says:

    There are online tutorial about where is all programs in windows 10 without downloading any file or something.

  10. Smadav Antivirus Download says:

    Coming with every one of this more recent innovation are communication upgrades or two called transformations which come to be the structures where mans capability to organize, handle, and manage the newer much more steady set of spatial dynamics. All from progressing power technologies.guarantor financings

  11. vouchercodesdaily says:

    "Find the latest voucher codes daily and active 2018 discount codes for major merchants in the UK london. Save hundreds of pounds with freebies and ready to use voucher codes daily

  12. wireless mouse is problems says:

    Usually in all over the work can be represented by the wireless mouse but it get some problem its because of range that not means that wireless mouse get problem if you want to get full knowledge on them then you can go and search for more informative information.

  13. best hermes replica says:

    Where to buy best hermes replica? is the trusted seller for hermes replica bags and wallets. We have servered more than 10 k customers during the past years, and most of them are happy with the result.

  14. wuxiaworld says:

    Very good, I think I found the knowledge I needed. I will see and refer some information in your post. thank you

  15. hotmail login says:

    Interesting post! This is really helpful for me. I like it!

  16. my website says:

    Nice, posts similar to this are a very cool addition. I question what it would mean to the GOPs effort to weaken taxes.

  17. cool things to buy says:

    Nice ideas. It's very useful. Thanks for your sharing this information.

  18. Rubbish Removal Service says:

    This is a great information thanks for the post !

  19. Rubbish Removal Service says:

    This is a great information thanks for the post !

  20. says:

    Traveling is not my thing because I can't travel so much so far. The health issues always stop me when I am going for any adventure things.

  21. Essays Chief says:

    Do you know about the Design Notes on Inline Caches in Guile? If you don’t know, you should read the article since it discusses the design notes on inline caches in guile. The blog article mentions that Inline caches are a language implementation method used to pick up the pace of polymorphic dispatch. The article is well structured and it is divided into different paragraphs with subtitles

  22. Assignment Help says:

    Looking for someone write my assignment here are Expert Allassignmenthelp well efficient and capable of creating unique assignments for students .Assignment Help is a term which is best for students help, they can easily get help for their assignments online.

  23. Buy dissertation proposal writing says:

    When we talk about inline reserves it implies this is where we actualize to quicken. Through this loads of issues are explained.

  24. Finance Assignment Help says:

    Great article, thanks for sharing this.

  25. lily says:


  26. click says:

    Wow! I just learned something new today! Thanks to you!

  27. iphone 5s screen replacement says:

    For an easier repair, use our fix kit and follow this shorter guide to replace your iPhone’s entire display assembly.

  28. Adam Cole says:

    Nice post

  29. 1998yes says:

    Wow! I just learned something new today!

  30. Horea says:

    This is very interesting. 21st century learning!

  31. Trusty Tree Service says:

    Goodness! I almost fell for it if not for reading this article.

  32. Joan says:

    Thanks for diving deep on this topic! You're amazing.

  33. Web hosting and Software deals at cheap price says:

    I appreciate your post. I have got many ideas and knowledge.I really enjoy your post. If you want to buy a Web hosting and Software deals at cheap price. Feel free to Check this.

  34. Stove Repair says:

    I have been studying your entries all the way through my morning holiday, and I should admit the entire article has been very enlightening and very well written.

  35. Dissertation Help says:

    Nice article, thank you so much for posting this.

  36. nursing assignment help says:

    I just like this article, thank you so much and keep it up.

  37. says:

    Are you in the market for a new laptop and don't know where to start? This article will assist you in identifying the best laptop brands and provide you with information to help you find your next laptop.

  38. fermented cod liver oil says:

    The health benefits of cod liver oil include a lower cholesterol level, type I diabetes control in children,

  39. Webfeed360 says:

    Thanks to share this best information with us.

  40. johnhangkock says:

    Nice Blog! IT Contain very informative information about Inline caches keep sharing.

  41. kajal singh says:

    Download and install Vidmate App which is the best HD video downloader software available for Android. Get free latest HD movies, songs, and your favorite TV shows.

  42. Keira Doltan says:

    The increasing importance of Writing Assignments has prompted students to take a helping hand fromHomework

  43. coupon outlet says:

    Nice, posts similar to this are a very cool addition.

  44. Click here says:

    It's my pleasure to find this amazing website who provides mostly unseen information about every single topic which is i really like it.

  45. capital smart city islamabad says:

    Web site design is a original work that needs some thought and imagination. For a lucrative web site design, each element of design should be in right mixture.

  46. Bilal Alvi says:

    You have a good point here!I totally agree with what you have said!!Thanks for sharing your views…hope more people will read this article!!! Capital Smart City

  47. Website says:

    Titan Gel Special Penis Enlargement Gel For Men at best price

  48. Black Latte Reviews says:

    The iced matte black latte at Round K Cafe contains 98% Cacao powder.

  49. adamschule85 says:

    Someone essentially help to make critically posts I’d state. forbet

  50. park view city Islamabad map says:

    Repossessed Houses For Sale - Best Investment opportunity

  51. ff says:


  52. Jason Boo says:


  53. Boden Voucher Codes says:

    I’ve two boys, they’re 5 and 7, and I think I’ve found the ultimate stopover to shop for them. They’ve just converted me into a fan of their kids’ line of products. The wardrobes here are crammed of all the fabulous goods from Boden, and my kids look gorgeous in them. They just love clothes from them!

  54. My Reviewing says:

    Looking to get/replace a wireless router? Here we present some useful buying guidelines as well as some advice on making the most of your new purchase.

  55. online assignment help says:

    Need Online Assignment Help Australia? Goto Assignment Help offers all these courses:- Assignment Help Australia, Online Assignment Help, Australian Assignment Help, Help with Assignment, Assignment Helper with 25% OFF

  56. programming help says:

    My self David Tylor. From GotoAssignmentHelp. GotoAssignmentHelp is the leading plagiarism free assignment help provider in Australia and across the whole world. we cover 150+ subject and 600+ expert is on board. Service starting At 80$ /- and flat 25% off on your first order.

  57. dissertation help uk says:

    Googling for Affordable dissertation help UK?
    Check Out Our Online dissertation writer
    Are you facing multiple problems regarding to your proofreading services uk ?

  58. essay writing service says:

    Googling for Affordable essay writing service?
    Check Out Our Online best essay cheap writer.
    Are you facing multiple problems regarding to your essay writing service?

  59. writingpapersucks says:

    I ordered coursework after reading boost my grades reviews recently and even have not expected that it would be done so fast! Really! My writer Mina was very polite during communication, she answered all my questions and also clarified with me what I would like to include in my coursework, what materials to accompany and what should be avoided.

  60. Dissertation Help Online UK says:

    The perfect way to follow the instruction and updates about the generated method with the same set of entries to modify source code and built-in types whose instances can be differentiated easily by Optimizing dynamically-typed object-oriented languages.

  61. take my online test says:

    There has probably been a "do my essay for me" situation in every student`s life. Some ignore the inner voice and continue to fight a number of assignments that the instructors keep overloading them with. It`s an open secret that the Internet is full of the providers of the writing services of a different quality like this " take my online test "

  62. hi there says:

    ok, so far so good...

    best wedding photographers

  63. dealmecoupon says:

    Thanks for sharing this amazing article, it is very informative post good work keep it up.

  64. complete course work for me says:

    Nice idea. I deem that it can be challengeable to complete this deal. I hope that as a result, you'll get what you dream about.

  65. Assignment Writing Service says:

    Thank you for sharing this post, It is a good way of explaining and easy to understand.

  66. Fencestore discount code 35% Off Dec 2019 says:

    Thanks for the great post, i really loved it..

  67. Cassandra D. Everhart says:

    I am not positive the place you're getting your info, but good topic. our site

  68. Pennysaviour says:

    It’s quite innovative from the thing that we usually have in our mind, No wonder that it seen justifiable. There might be some sort of solution if the design fails. How are we probably going to light that up?

  69. Web Developer NYC says:


  70. Best Assignment Help Online says:

    Thank you for sharing this information with us. I really appreciate your efforts to write such an amazing piece of content for us.

  71. paper editing websites says:

    I am a professional writer! and I opened a new kind of company! we check the text for grammar! therefore, if you doubt your literacy or simply think that you should always be at your best! contact us! we will check your text! here is our website

  72. multi domain ssl says:

    this is really awesome. a bit overwhelming with so many setups though. which ones do you think are best?

  73. Get New Gmail Account says:

    This is very informative post for me and I will must visit here again.

Leave a Reply