firefox's low-latency webassembly compiler

25 March 2020 4:29 PM (igalia | compilers | firefox | spidermonkey | webassembly | bloomberg)

Good day!

Today I'd like to write a bit about the WebAssembly baseline compiler in Firefox.

background: throughput and latency

WebAssembly, as you know, is a virtual machine that is present in web browsers like Firefox. An important initial goal for WebAssembly was to be a good target for compiling programs written in C or C++. You can visit a web page that includes a program written in C++ and compiled to WebAssembly, and that WebAssembly module will be downloaded onto your computer and run by the web browser.

A good virtual machine for C and C++ has to be fast. The throughput of a program compiled to WebAssembly (the amount of work it can get done per unit time) should be approximately the same as its throughput when compiled to "native" code (x86-64, ARMv7, etc.). WebAssembly meets this goal by defining an instruction set that consists of similar operations to those directly supported by CPUs; WebAssembly implementations use optimizing compilers to translate this portable instruction set into native code.

There is another dimension of fast, though: not just work per unit time, but also time until first work is produced. If you want to go play Doom 3 on the web, you care about frames per second but also time to first frame. Therefore, WebAssembly was designed not just for high throughput but also for low latency. This focus on low-latency compilation expresses itself in two ways: binary size and binary layout.

On the size front, WebAssembly is optimized to encode small files, reducing download time. One way in which this happens is to use a variable-length encoding anywhere an instruction needs to specify an integer. In the usual case where, for example, there are fewer than 128 local variables, this means that a local.get instruction can refer to a local variable using just one byte. Another strategy is that WebAssembly programs target a stack machine, reducing the need for the instruction stream to explicitly load operands or store results. Note that size optimization only goes so far: it's assumed that the bytes of the encoded module will be compressed by gzip or some other algorithm, so sub-byte entropy coding is out of scope.
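To make the variable-length encoding concrete: WebAssembly integers use LEB128, in which each byte carries seven bits of payload and the high bit says whether more bytes follow. The decoder below is a minimal sketch rather than SpiderMonkey's actual code; the name readVarU32 is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode one unsigned LEB128 integer starting at bytes[pos].
// Each byte holds 7 payload bits; a set high bit means "more bytes follow".
// Advances pos past the encoded integer.
// (readVarU32 is a hypothetical name, not a SpiderMonkey function.)
uint32_t readVarU32(const std::vector<uint8_t>& bytes, std::size_t& pos) {
  uint32_t result = 0;
  for (unsigned shift = 0; shift < 35; shift += 7) {
    assert(pos < bytes.size() && "truncated input");
    uint8_t byte = bytes[pos++];
    result |= static_cast<uint32_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) return result;
  }
  assert(false && "a 32-bit LEB128 value takes at most 5 bytes");
  return result;
}
```

Any index below 128 fits in a single byte, which is why a local.get in a function with few locals costs just two bytes: one for the opcode and one for the index.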

On the layout side, the WebAssembly binary encoding is sorted by design: definitions come before uses. For example, there is a section of type definitions that occurs early in a WebAssembly module. Any use of a declared type can only come after the definition. In the case of functions which are of course mutually recursive, function type declarations come before the actual definitions. In theory this allows web browsers to take a one-pass, streaming approach to compilation, starting to compile as functions arrive and before download is complete.

implementation strategies

The goals of high throughput and low latency conflict with each other. To get the best throughput, a compiler needs to spend time on code motion, register allocation, and instruction selection; to get low latency, that's exactly what a compiler should not do. Web browsers therefore take a two-pronged approach: they have a compiler optimized for throughput, and a compiler optimized for latency. As a WebAssembly file is being downloaded, it is first compiled by the quick-and-dirty low-latency compiler, with the goal of producing machine code as soon as possible. After that "baseline" compiler has run, the "optimizing" compiler works in the background to produce high-throughput code. The optimizing compiler can take more time because it runs on a separate thread. When the optimizing compiler is done, it replaces the baseline code. (The actual heuristics about whether to do baseline + optimizing ("tiering") or just to go straight to the optimizing compiler are a bit hairy, but this is a summary.)

This article is about the WebAssembly baseline compiler in Firefox. It's a surprising bit of code and I learned a few things from it.

design questions

Knowing what you know about the goals and design of WebAssembly, how would you implement a low-latency compiler?

It's a question worth thinking about so I will give you a bit of space in which to do so.




After spending a lot of time in Firefox's WebAssembly baseline compiler, I have extracted the following principles:

  1. The function is the unit of compilation

  2. One pass, and one pass only

  3. Lean into the stack machine

  4. No noodling!

In the remainder of this article we'll look into these individual points. Note, although I have done a good bit of hacking on this compiler, its design and original implementation come mainly from Mozilla hacker Lars Hansen, who also currently maintains it. All errors of exegesis are mine, of course!

the function is the unit of compilation

As we mentioned, in the binary encoding of a WebAssembly module, all definitions needed by any function come before all function definitions. This naturally leads to a partition between two phases of bytestream parsing: an initial serial phase that collects the set of global type definitions, annotations as to which functions are imported and exported, and so on, and a subsequent phase that compiles individual functions in an essentially independent manner.

The advantage of this approach is that compiling functions is a natural task unit of parallelism. If the user has a machine with 8 virtual cores, the web browser can keep one or two cores for the browser itself and farm out WebAssembly compilation tasks to the rest. The result is that the compiled code is available sooner.

Taking functions to be the unit of compilation also allows for an easy "tier-up" mechanism: after the baseline compiler is done, the optimizing compiler can take more time to produce better code, and when it is done, it can swap out the results on a per-function level. All function calls from the baseline compiler go through a jump table indirection, to allow for tier-up. In SpiderMonkey there is no mechanism currently to tier down; if you need to debug WebAssembly code, you need to refresh the page, causing the wasm code to be compiled in debugging mode. For the record, SpiderMonkey can only tier up at function calls (it doesn't do on-stack replacement, or OSR, which would swap in new code in the middle of a running function).
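The jump-table idea can be sketched in a few lines. This is an illustration under assumptions, not SpiderMonkey's data structure: JumpTable, install and call are invented names, and plain C++ function pointers stand in for pointers to generated machine code.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a pointer to a compiled function body (hypothetical type).
using Code = int (*)(int);

// One slot per function. Callers always load the slot before calling,
// so a background thread can swap in better code at any time.
struct JumpTable {
  std::vector<std::atomic<Code>> entries;

  explicit JumpTable(std::size_t numFuncs) : entries(numFuncs) {}

  void install(std::size_t funcIndex, Code code) {
    assert(funcIndex < entries.size());
    entries[funcIndex].store(code, std::memory_order_release);
  }

  int call(std::size_t funcIndex, int arg) const {
    assert(funcIndex < entries.size());
    return entries[funcIndex].load(std::memory_order_acquire)(arg);
  }
};

// Two "compilations" of the same function, for demonstration only.
int baselineCalls = 0;
int timesTwoBaseline(int x) { ++baselineCalls; return x + x; }
int timesTwoOptimized(int x) { return x << 1; }
```

Installing timesTwoOptimized into a slot that held timesTwoBaseline changes what the next call runs, but a call already in progress finishes in the old code, which matches the "tier up only at function calls" behaviour described above.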

This simple approach does have some down-sides, in that it leaves interprocedural optimizations on the table (inlining, contification, custom calling conventions, speculative optimizations). This is mitigated in two ways, the most obvious being that LLVM or whatever produced the WebAssembly has ideally already done whatever inlining might be fruitful. The second is that WebAssembly is designed for predictable performance. In JavaScript, an implementation needs to do run-time type feedback and speculative optimizations to get good performance, but the result is that it can be hard to understand why a program is fast or slow. The designers and implementers of WebAssembly in browsers all had first-hand experience with JavaScript virtual machines, and actively wanted to avoid unpredictable performance in WebAssembly. Therefore there is currently a kind of détente among the various browser vendors: everyone has agreed that they won't do speculative inlining -- yet, anyway. Who knows what will happen in the future, though.

Digression aside, the summary here is that the baseline compiler receives an individual function body as input, and generates code just for that function.

one pass, and one pass only

The WebAssembly baseline compiler makes one pass through the bytecode of a function. Nowhere in all of this are we going to build an abstract syntax tree or a graph of basic blocks. Let's follow through how that works.

Firstly, emitFunction simply emits a prologue, then the body, then an epilogue. emitBody is basically a big loop that consumes opcodes from the instruction stream, dispatching to opcode-specific code emitters (e.g. emitAddI32).

The opcode-specific code emitters are also responsible for validating their arguments; for example, emitAddI32 is wrapped in an assertion that there are two i32 values on the stack. This validation logic is shared by a templatized codestream iterator so that it can be re-used by the optimizing compiler, as well as by the publicly-exposed WebAssembly.validate function.

A corollary of this approach is that machine code is emitted in bytestream order; if the WebAssembly instruction stream has an i32.add followed by an i32.sub, then the machine code will have an addl followed by a subl.
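The shape of that loop can be sketched as follows. Only the names emitFunction, emitBody and emitAddI32 come from the text above; ToyCompiler, emitSubI32 and the string "machine code" buffer are stand-ins invented for this example, and validation is reduced to rejecting unknown opcodes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Three real WebAssembly opcode bytes: end, i32.add, i32.sub.
enum Op : uint8_t { End = 0x0b, I32Add = 0x6a, I32Sub = 0x6b };

struct ToyCompiler {
  std::vector<std::string> out;  // stand-in for a machine-code buffer

  void emitAddI32() { out.push_back("addl"); }
  void emitSubI32() { out.push_back("subl"); }

  // One pass over the body: read an opcode, emit its code, move on.
  // No AST and no basic-block graph is ever built.
  bool emitBody(const std::vector<uint8_t>& body) {
    for (uint8_t op : body) {
      switch (op) {
        case I32Add: emitAddI32(); break;
        case I32Sub: emitSubI32(); break;
        case End: return true;
        default: return false;  // unknown opcode: reject the function
      }
    }
    return false;  // ran off the end without seeing End
  }

  bool emitFunction(const std::vector<uint8_t>& body) {
    assert(out.empty() && "one ToyCompiler per function");
    out.push_back("prologue");
    if (!emitBody(body)) return false;
    out.push_back("epilogue");
    return true;
  }
};
```

Feeding it the bytes for i32.add, i32.sub, end yields prologue, addl, subl, epilogue, in exactly that order.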

WebAssembly has a syntactically limited form of non-local control flow; it's not goto. Instead, instructions are contained in a tree of nested control blocks, and control can only exit nonlocally to a containing control block. There are three kinds of control blocks: jumping to a block or an if will continue at the end of the block, whereas jumping to a loop will continue at its beginning. In either case, as the compiler keeps a stack of nested control blocks, it has the set of valid jump targets and can use the usual assembler logic to patch forward jump addresses when the compiler gets to the block exit.
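Here is one way the control stack and the forward-jump patching might look, as a sketch under assumptions: Control, Insn and ToyControlCompiler are hypothetical names, instructions are modelled as array slots rather than bytes, and only unconditional branches are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kUnpatched = static_cast<std::size_t>(-1);

// A toy instruction: either a jump with a target index, or anything else.
struct Insn { bool isJump; std::size_t target; };

// One entry per open block or loop.
struct Control {
  bool isLoop;
  std::size_t loopHead;              // backward target, used when isLoop
  std::vector<std::size_t> pending;  // forward jumps awaiting a target
};

struct ToyControlCompiler {
  std::vector<Insn> code;
  std::vector<Control> controls;

  void emitOther() { code.push_back({false, 0}); }
  void beginBlock() { controls.push_back({false, 0, {}}); }
  void beginLoop() { controls.push_back({true, code.size(), {}}); }

  // depth 0 names the innermost enclosing control block.
  void emitBr(std::size_t depth) {
    assert(depth < controls.size());
    Control& c = controls[controls.size() - 1 - depth];
    if (c.isLoop) {
      code.push_back({true, c.loopHead});  // target already known
    } else {
      c.pending.push_back(code.size());    // remember to patch later
      code.push_back({true, kUnpatched});
    }
  }

  // Block exit: the end address is now known, so patch forward jumps.
  void end() {
    assert(!controls.empty());
    for (std::size_t at : controls.back().pending) code[at].target = code.size();
    controls.pop_back();
  }
};
```

Because a branch can only name an enclosing block, every jump target is either already known (a loop head) or will be known by the time the compiler reaches the matching end, so a single pass suffices.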

lean into the stack machine

This is the interesting bit! So, WebAssembly instructions target a stack machine. That is to say, there's an abstract stack onto which evaluating i32.const 32 pushes a value, and if followed by i32.const 10 there would then be i32(32) | i32(10) on the stack (where new elements are added on the right). A subsequent i32.add would pop the two values off, and push on the result, leaving the stack as i32(42). There is also a fixed set of local variables, declared at the beginning of the function.

The easiest thing that a compiler can do, then, when faced with a stack machine, is to emit code for a stack machine: as values are pushed on the abstract stack, emit code that pushes them on the machine stack.

The downside of this approach is that you emit a fair amount of code to read and write values from the stack. Machine instructions generally take arguments from registers and write results to registers; going to memory is a bit superfluous. We're willing to accept suboptimal code generation for this quick-and-dirty compiler, but isn't there something smarter we can do for ephemeral intermediate values?

Turns out -- yes! The baseline compiler keeps an abstract value stack as it compiles. For example, compiling i32.const 32 pushes nothing on the machine stack: it just adds a ConstI32 node to the value stack. When an instruction needs an operand that turns out to be a ConstI32, it can either encode the operand as an immediate argument or load it into a register.

Say we are evaluating the i32.add discussed above. After the add, where does the result go? For the baseline compiler, the answer is always "in a register" via pushing a new RegisterI32 entry on the value stack. The baseline compiler includes a stupid register allocator that spills the value stack to the machine stack if no register is available, updating value stack entries from e.g. RegisterI32 to MemI32. Note, a ConstI32 never needs to be spilled: its value can always be reloaded as an immediate.

The end result is that the baseline compiler avoids lots of stack store and load code generation, which speeds up the compiler, and happens to make faster code as well.
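A toy model of that value stack, to show the mechanics. The entry kinds ConstI32, RegisterI32 and MemI32 come from the text; everything else (ToyBaseline, sync, allocReg, popToReg, the three-register file, the textual instructions) is an assumption made for illustration and is much cruder than the real compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Stk {
  enum Kind { ConstI32, RegisterI32, MemI32 } kind;
  int32_t value;  // constant value or register number; unused for MemI32
};

struct ToyBaseline {
  std::vector<Stk> stk;                // abstract value stack
  std::vector<std::string> out;        // stand-in for emitted machine code
  std::vector<int> freeRegs{2, 1, 0};  // tiny register file

  void pushConst(int32_t c) { stk.push_back({Stk::ConstI32, c}); }

  // Spill register entries to the machine stack, deepest first.
  // Constants stay put: they can always be rematerialized.
  void sync() {
    for (Stk& s : stk) {
      if (s.kind != Stk::RegisterI32) continue;
      out.push_back("push r" + std::to_string(s.value));
      freeRegs.push_back(s.value);
      s.kind = Stk::MemI32;
    }
  }

  int allocReg() {
    if (freeRegs.empty()) sync();
    assert(!freeRegs.empty() && "toy allocator ran out of registers");
    int r = freeRegs.back();
    freeRegs.pop_back();
    return r;
  }

  // Pop the top entry into a register, emitting code only if needed.
  int popToReg() {
    assert(!stk.empty());
    Stk s = stk.back();
    stk.pop_back();
    if (s.kind == Stk::RegisterI32) return s.value;
    int r = allocReg();
    if (s.kind == Stk::ConstI32)
      out.push_back("mov $" + std::to_string(s.value) + ", r" + std::to_string(r));
    else
      out.push_back("pop r" + std::to_string(r));
    return r;
  }

  void emitAddI32() {
    assert(stk.size() >= 2);
    if (stk.back().kind == Stk::ConstI32) {
      int32_t c = stk.back().value;  // constant on top: use an immediate
      stk.pop_back();
      int r = popToReg();
      out.push_back("add $" + std::to_string(c) + ", r" + std::to_string(r));
      stk.push_back({Stk::RegisterI32, r});
    } else {
      int rs = popToReg();
      int r = popToReg();
      out.push_back("add r" + std::to_string(rs) + ", r" + std::to_string(r));
      freeRegs.push_back(rs);
      stk.push_back({Stk::RegisterI32, r});
    }
  }
};
```

Compiling i32.const 32, i32.const 10, i32.add with this toy emits only a mov and an add and never touches the machine stack; a push appears only once the register file is exhausted.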

Note that there is one limitation, currently: control-flow joins can have multiple predecessors and can pass a value (in the current WebAssembly specification), so the allocation of that value needs to be agreed-upon by all predecessors. As in this code:

(func $f (param $arg i32) (result i32)
  (block $b (result i32)
    (i32.const 0)
    (local.get $arg)
    (br_if $b) ;; return 0 from $b if $arg is zero
    (i32.const 1))) ;; otherwise return 1
;; result of block implicitly returned

When the br_if branches to the block end, where should it put the result value? The baseline compiler effectively punts on this question and just puts it in a well-known register (e.g., $rax on x86-64). Results for block exits are the only place where WebAssembly has "phi" variables, and the baseline compiler allocates all integer phi variables to the same register. A hack, but there we are.

no noodling!

When I started to hack on the baseline compiler, I did a lot of code reading, and eventually came on code like this:

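void BaseCompiler::emitAddI32() {
  int32_t c;
  if (popConstI32(&c)) {
    // Constant on top of the value stack: add it as an immediate.
    RegI32 r = popI32();
    masm.add32(Imm32(c), r);
    pushI32(r);
  } else {
    // General case: pop both operands into registers.
    RegI32 r, rs;
    pop2xI32(&r, &rs);
    masm.add32(rs, r);
    freeI32(rs);
    pushI32(r);
  }
}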

I said to myself, this is silly, why are we only emitting the add-immediate code if the constant is on top of the stack? What if instead the constant was the deeper of the two operands, why do we then load the constant into a register? I asked on the chat channel if it would be OK if I improved codegen here and got a response I was not expecting: no noodling!

The reason is, performance of baseline-compiled code essentially doesn't matter. Obviously let's not pessimize things but the reason there's a baseline compiler is to emit code quickly. If we start to add more code to the baseline compiler, the compiler itself will slow down.

For that reason, changes are only accepted to the baseline compiler if they are necessary for some reason, or if they improve latency as measured using some real-world benchmark (time-to-first-frame on Doom 3, for example).

This to me was a real eye-opener: a compiler optimized not for the quality of the code that it generates, but rather for how fast it can produce the code. I had seen this in action before but this example really brought it home to me.

The focus on compiler throughput rather than compiled-code throughput makes it pretty gnarly to hack on the baseline compiler -- care has to be taken when adding new features not to significantly regress the old. It is much more like hacking on a production JavaScript parser than your traditional SSA-based compiler.

that's a wrap!

So that's the WebAssembly baseline compiler in SpiderMonkey / Firefox. Until the next time, happy hacking!

69 responses

  1. Marius Gedminas says:

    Would you mind defining "OSR"? On-Stack Replacement? Wikipedia lists the term on the disambiguation page but has no article about it; Stack Overflow seems to have some explanation, and it seems to match the context.

  2. justine says:

    I wish this post already compiler to this low-latency.
