ecmascript generators from a performance perspective

11 June 2013 1:48 PM (ecmascript | es6 | igalia | v8 | compilers | partial evaluation | performance)

It's been really gratifying to see the recent interest in ES6 generators in V8. I'm still a Schemer at heart, but one thing all language communities should learn from JS is its enthusiasm and sense of joyful play. One can't spend much time around folks like James Long or Domenic Denicola without feeling the urge to go out and build something fun.

perfie perf perf perf

But this is a blog about nargery! A lot of people have been speculating about the performance impact of using generators, and I've been meaning to write about it, so perhaps this article will tickle your neuron.

A generator is a function that can suspend. So the first perf question to ask yourself is, does the performance of generators matter at all? If your generator spends more time suspended than active, then probably not. Though it's not the only application of generators, if you are using generators for asynchronous programming, then probably generator performance matters not at all.

But that's not very satisfying, right? Even though it probably doesn't matter, maybe it does, and then you need to make sure your mental model of what is fast and what is slow corresponds to reality.


So, the skinny: as implemented in V8, the primary perf impact of generators is that they do not get optimized by Crankshaft.

As you probably know, Crankshaft is V8's optimizing compiler. There are two ways a function can be optimized: on entry, because the function is called many times, or within a loop, because the loop is taking too long. In the first case it's fairly easy: you kick off a parallel task to recompile the function, reading the type feedback collected by the inline caches from the baseline compiler. Once you've made the optimized function, you swap the function's code pointer, and the next time you call the function you get the optimized version instead of the generic version.

The other way is if you are in a function, in a hot loop, and you decide to optimize the function while you are in it. This is called on-stack replacement (OSR), and is important for functions that are only called once but do have hot loops. (Cough KRAKEN cough.) It's trickier to do parallel compilation with OSR, as the function might have left the loop by the time recompilation finishes, but oh well.

So in summary, when V8 goes to optimize a function it will be in one of two states: inactive, because it hasn't started to run yet, or active, because it's stuck in a tight loop.

Generators introduce a new state for functions: suspended. You can know whether a function is active or inactive, but you can't know how many suspended generator activations are out in the JS heap. So you have multiple potential entry points to the function, which complicates the optimizer as it has to work on graphs with irreducible loops. Perhaps it's still possible to optimize, as OSR is a form of irreducible loops; I suspect though that you would end up compiling as many optimized functions as there are yield points, plus one for the function entry (and possibly another for OSR). Anyway it's a bit complicated.

Also, deoptimization (bailout) becomes more difficult if you can have suspended activations, and as a technical detail, it's tricky for V8 to efficiently capture the state of an activation in a way that the garbage collector can understand.

The upshot is that tight loops in generators won't run fast. If you need speed in a loop in a generator, you will need to call out to some other function (which itself would get optimized).


On the other hand, I should note that generators are optimized for quick suspend and resume. In a generator function, V8 stores local variables on the heap instead of on the stack. You would think this would make things slow, but that is not the case. Access to locals has the same speed characteristics: both stack and heap locals are accessed by index from a register (the stack pointer or the context pointer).

There is an allocation overhead when you enter a generator to create the heap storage, but it is almost never the case that you need to copy or allocate when suspending a generator. Since the locals are on the heap already, there is nothing to do! You just write the current instruction pointer (something you know at compile-time) into a slot in the generator object and return the result.

Similarly, when resuming you have to push on a few words to rebuild the stack frame, but after that's done you just jump back to where you were and all is good.

The exception to this case is when you yield and there are temporaries on the stack. Recall in my article on V8's baseline compiler that the full-codegen is a stack machine. It allocates slots to named locals, but temporary values go on the stack at run-time, and unfortunately V8's baseline compiler is too naive even to know what the stack height is at compile-time. So each suspension checks the stack height, and if it is nonzero, it calls out to a runtime helper to store the operands.

This run-time save would be avoidable in register machines like JavaScriptCore's baseline compiler, which avoids pushing and popping on the stack. Perhaps this note might spur the competitive urges of some fine JSC hacker, to show up V8's generators performance. I would accept JSC having faster generators if that's what it took to get generators into WebKit, at least for a while anyway :)

abstract musings

Many of you have read about the Futamura projections, in which one compiles a program via partial application of an interpreter with respect to a given input program. Though in the real world this does not appear to be a good strategy for implementing compilers, as so much of compilation is not simply about program structure but also about representation, there is a kernel of deep truth in this observation: the essence of compilation is partial evaluation of a strategy with respect to a particular program.

This observation most often helps me in the form of its converse. Whenever I find myself implementing some sort of generic evaluation strategy, there's a sign in my head that start blinking the word "interpreter". (Kinda like the "on-air" signs in studios.) That means I'm doing something that's probably not as efficient as it could be. I find myself wondering how I can make the strategy specific to a particular program.

In this case, we can think of suspending and resuming not as copying a stack frame out and in, but instead compiling specific stubs to copy out only the values that are needed, in the representations that we know that they have. Instead of looping from the bottom to the top of a frame, you emit code for each element. In this way you get a specific program, and if you manage to get here, you have a program that can be inlined too: the polymorphic "next" reduces to a monomorphic call of a particular continuation.

Anyway, Amdahl's law limits the juice we can squeeze from this orange. For a tight for-of loop with a couple of optimizations that should land soon in V8, I see generator performance that is only 2 times slower that the corresponding hand-written iterators that use stateful closures or objects: 25 nanoseconds per suspend/resume versus 12 for a hand-written iterator.

25 nanoseconds, people. You're using jQuery and asking for data from a server in the next country. It is highly unlikely that generators will be a bottleneck in your program :)

16 responses

  1. Sanjoy Das says:

    > It's trickier to do parallel compilation with OSR, as the
    > function might have left the loop by the time recompilation
    > finishes, but oh well.

    Another reason why parallel OSR is difficult in v8: the OSR machinery needs to be told about the pc at which the non-optimized code will jump into the optimized code (so OSR is problematic even if the pc is in the loop by the end of compilation).

  2. Ole Laursen says:

    About the nanoseconds - I once wrote a little C++ iterator utility class for accessing pixels in gdk-pixbuf image data. It made it easy to process the pixels much faster than through the official interface. But it was only possible because the abstraction overhead was removed by the compiler.

  3. Anonymous says:

    I wonder how much work it would be to rework a Javascript interpreter/compiler to use a GHC-style thunk-evaluation model for lazy evaluations?

  4. daishi says:

    Hi, I worked on CPS transformation:
    I was expecting generators may speed up this.
    So double overhead is a bad news.
    Is there any possibility, once the optimization is done,
    to reduce the overhead of generators less than that of hand written iterators?

  5. Joint Ventures Case Solution says:

    Your web journal is superb and i genuinely welcome you and seeking after some more accommodating posts.

  6. Manchun says:

    Nice post thank you website Designing Company

  7. Jack Fernando says:

    Excellent post! Thanks

    If you are facing problems with your Email, contact us anytime for getting immediate resolution for their Ymail issues at our Yahoo contact support phone number UK. On getting trapped with any unexpected error that creates issue in smooth login into gmail account, a customer can get help on just a quick call at Gmail customer support contact helpline number UK. All sort of mailing issues are fixed in the minimum possible time.

  8. Mark Steve says:

    Very information about ecmascript generators. Thanks for sharing it.

  9. jackson says:

    Telangana 10th results will be available on All the students of telangana can watch following website for their results.

  10. Australian Open 2017 says:

    An “After 5” Ground Pass ticket will be available for the 2017 Australian Open. The “After 5” Ground Pass will provide access to the Australian Open

  11. flipkart app for pc says:

    Flipkart's app. ... Almost 10 months after launching its in-app peer-to-peer chat service Ping, e-commerce major Flipkart has decided to replace it with ‘user to seller’ chat.

  12. weight watchers says:

    This articles is great to be continued please

  13. YO YO says:
  14. ADIL says:
  15. Ignacio Terada says:

    “Prestasi tersebut dapat diraih karena bimbingan dari para pembina, para guru, dan para kepala sekolah dalam ironsteelcenter.comHarga besi beton Sni Ulir Polos Harga besi beton Sni Ulir PolosHarga besi hollow Harga besi hollowHarga besi cnp Harga besi cnpHarga besi unp Harga besi unpHarga wiremesh Harga wiremeshHarga besi wf Harga besi wfHarga besi h beam Harga besi h beamHarga Plat besi Harga Plat besiHarga pipa besi baja sch 40 sch 80 Harga pipa besi baja sch 40 sch 80Harga besi siku Harga besi sikuHarga Plat kapal besi baja bki krakatau steel Harga Plat kapal besi baja bki krakatau steelHarga bondek Harga bondekHarga baja ringan Harga baja ringanHarga Atap spandek Harga atap spandekHarga stainless steel Harga stainless steeljasa konstruksi jasa konstruksi besi baja jasa konstruksi gudang jasa konstruksi gedung jasa konstruksi undangan pernikahan undangan pernikahan simpleundangan pernikahan online udangan pernikahan pinkundangan pernikahan unik undangan pernikahan onlineundangan pernikahan murah undangan pernikahan islamiundangan pernikahan islami undangan pernikahan murahundangan pernikahan elegan undangan pernikahan artisundangan pernikahan unik dan murah contoh undangan pernikahan www.gudangbesibaja.comHarga besi cnp Harga besi cnpHarga besi h beam baja Harga besi h beam bajaHarga Plat besi plat kapal Harga Plat besi plat kapalHarga besi siku Harga besi sikuHarga besi unp Harga besi unpHarga besi wf baja Harga besi wf bajaHarga besi beton Sni Ulir Polos Harga besi beton Sni Ulir PolosHarga besi hollow Harga besi hollowHarga pipa besi baja sch 40 sch 80 Harga pipa besi baja sch 40 sch 80Harga wiremesh Harga wiremeshHarga bondek Harga bondekHarga besi Wf Baja Harga besi Wf Bajajasa konstruksi baja wf jasa konstruksi jembatan jasa konstruksi bangunan jasa konstruksi undangan pernikahan elegan dan murah undangan pernikahan eleganundangan pernikahan simple undangan pernikahan elegan dan murahundangan pernikahan artis undangan pernikahan putihudangan pernikahan pink undangan pernikahan unik dan murahundangan pernikahan putih undangan pernikahan unikContoh undangan pernikahan undangan pernikahan

    harga besi beton sni toko besi baja harga besi bahan bangunanharga pipa stainless steel pipa galvanis medium a besi bjkujual baja wf tabel baja krakatau steel harga besi ulir 16 mmharga stainless steel harga baja profil per kg harga besi 12 sniharga besi ulir harga besi wire mesh harga besi 8 mmdaftar harga pipa galvanis harga besi hollow stainless harga besi beton 10harga besi wf 200 harga baja hollow harga besi 13 ulirbesi kanal c galvanis steel rangka besi betondaftar harga besi beton harga pipa hollow harga besi kgjual wiremesh besi beam sni besi betonsupplier besi profil baja iwf harga besi behel 8mmbesi baja pipa galvanized harga besi beton 10mm snikonstruksi baja wf jual expanded metal harga besi ulir 10daftar harga besi hollow besi wire mesh harga sikuharga wiremesh profil baja h beam harga besi siku 4x4Supplier besi harga beam 200 besi siku hargaharga besi baja harga besi cnp 100 harga pipa besi hitamharga pipa baja jual besi cnp pipa seamlessbesi beton murah harga besi unp 100 daftar harga pipa stainless steelharga kanal c besi kanal c harga harga pipaharga besi stainless harga besi cnp 125 pipa stainlessharga besi per kg besi u galvanisharga plat stainless steel besi c pipa besi galvanisbesi unp harga besi cnp 150 harga besi hollow untuk pagarjual besi wf kanal c pipa besi hitamharga baja h beam daftar harga besi kanal c harga besi hollow 40x40

  16. Manchun says:

    Nice Article Thank you. JanBask Training as a Trainer, we offering Salesforce, Hadoop, Dot Net, SQL, PMP, Business Analyst and QA Testing.

Leave a Reply