the last 5 years of V8's garbage collector

Captain, status report: I’m down here in a Jeffries tube, poking at V8’s garbage collector. However, despite working on other areas of the project recently, V8 is now so large that it’s necessary to ignore whole subsystems when working on any given task. But now I’m looking at the GC in anger: what is its deal? What does V8’s GC even look like these days?

The last public article on the structure of V8’s garbage collector was in 2019; fine enough, but dated. Now in the evening of 2023 I think it could be useful to revisit it and try to summarize the changes since then. At least, it would have been useful to me had someone else written this article.

To my mind, work on V8’s GC has had three main goals over the last 5 years: improving interactions between the managed heap and C++, improving security, and increasing concurrency. Let’s visit these in turn.

C++ and GC

Building on the 2018 integration of the Oilpan tracing garbage collector into the Blink web engine, there was some refactoring to move the implementation of Oilpan into V8 itself. Oilpan is known internally as cppgc.

I find the cppgc name a bit annoying because I can never remember what it refers to, because of the other thing that has been happpening in C++ integration: a migration away from precise roots and instead towards conservative root-finding.

Some notes here: with conservative stack scanning, we can hope for better mutator throughput and fewer bugs. The throughput comes from not having to put all live pointers in memory; the compiler can keep them in registers, and avoid managing the HandleScope. You may be able to avoid the compile-time and space costs of stack maps (side tables telling the collector where the pointers are). There are also two classes of bug that we can avoid: holding on to a handle past the lifetime of a handlescope, and holding on to a raw pointer (instead of a handle) during a potential GC point.

Somewhat confusingly, it would seem that conservative stack scanning has garnered the acronym “CSS” inside V8. What does CSS have to do with GC?, I ask. I know the answer but my brain keeps asking the question.

In exchange for this goodness, conservative stack scanning means that because you can’t be sure that a word on the stack refers to an object and isn’t just a spicy integer, you can’t move objects that might be the target of a conservative root. And indeed the conservative edge might actually not point to the start of the object; it could be an interior pointer, which places additional constraints on the heap, that it be able to resolve internal pointers.

Security

Which brings us to security and the admirable nihilism of the sandbox effort. The idea is that everything is terrible, so why not just assume that no word is safe and that an attacker can modify any word they can address. The only way to limit the scope of an attacker’s modifications is then to limit the address space. This happens firstly by pointer compression, which happily also has some delightful speed and throughput benefits. Then the pointer cage is placed within a larger cage, and off-heap data such as Wasm memories and array buffers go in that larger cage. Any needed executable code or external object is accessed indirectly, through dedicated tables.

However, this indirection comes with a cost of a proliferation in the number of spaces. In the beginning, there was just an evacuating newspace, a mark-compact oldspace, and a non-moving large object space. Now there are closer to 20 spaces: a separate code space, a space for read-only objects, a space for trusted objects, a space for each kind of indirect descriptor used by the sandbox, in addition to spaces for objects that might be shared between threads, newspaces for many of the preceding kinds, and so on. From what I can see, managing this complexity has taken a significant effort. The result is pretty good to work with, but you pay for what you get. (Do you get security guarantees? I don’t know enough to say. Better pay some more to be sure.)

Finally, the C++ integration has also had an impact on the spaces structure, and with a security implication to boot. The thing is, conservative roots can’t be moved, but the original evacuating newspace required moveability. One can get around this restriction by pretenuring new allocations from C++ into the mark-compact space, but this would be a performance killer. The solution that V8 is going for is to use the block-structured mark-compact space that is already used for the old-space, but for new allocations. If an object is ever traced during a young-generation collection, its page will be promoted to the old generation, without copying. Originally called minor mark-compact or MinorMC in the commit logs, it was renamed to minor mark-sweep or MinorMS to indicate that it doesn’t actually compact. (V8’s mark-compact old-space doesn’t have to compact: V8 usually chooses to just mark in place. But we call it a mark-compact space because it has that capability.)

This last change is a performance hazard: yes, you keep the desirable bump-pointer allocation scheme for new allocations, but you lose on locality in the old generation, and the rate of promoted bytes will be higher than with the semi-space new-space. The only relief is that for a given new-space size, you can allocate twice as many objects, because you don’t need the copy reserve.

Why do I include this discussion in the security section? Well, because most MinorMS commits mention this locked bug. One day we’ll know, but not today. I speculate that evacuating is just too rich a bug farm, especially with concurrency and parallel mutators, and that never-moving collectors will have better security properties. But again, I don’t know for sure, and I prefer to preserve my ability to speculate rather than to ask for too many details.

Concurrency

Speaking of concurrency, ye gods, the last few years have been quite the ride I think. Every phase that can be done in parallel (multiple threads working together to perform GC work) is now fully parallel: semi-space evacuation, mark-space marking and compaction, and sweeping. Every phase that can be done concurrently (where the GC runs threads while the mutator is running) is concurrent: marking and sweeping. A major sweep task can run concurrently with an evacuating minor GC. And, V8 is preparing for multiple mutators running in parallel. It’s all a bit terrifying but again, with engineering investment and a huge farm of fuzzers, it seems to be a doable transition.

Concurrency and threads means that V8 has sprouted new schedulers: should a background task have incremental or concurrent marking? How many sweepers should a given isolate have? How should you pause concurrency when the engine needs to do something gnarly?

The latest in-progress work would appear to be concurrent marking of the new-space. I think we should expect this work to result in a lower overall pause-time, though I am curious also to learn more about the model: how precise is it? Does it allow a lot of slop to get promoted? It seems to have a black allocator, so there will be some slop, but perhaps it can avoid promotion for those pages. I don’t know yet.

Summary

Yeah, GCs, man. I find the move to a non-moving young generation is quite interesting and I wish the team luck as they whittle down the last sharp edges from the conservative-stack-scanning performance profile. The sandbox is pretty fun too. All good stuff and I look forward to spending a bit more time with it; engineering out.

One response

Hubert says:
9 December 2023 3:59 PM
Interesting.
It makes me wonder how well wasm’s stack-based vm (and other vms and bytecode interpreters) takes advantage of multiple CPU cores, if at all(?).

Comments are closed.