value representation in javascript implementations
In my last note I mentioned that I had been doing a lot of reading of JavaScript implementation code. One point that I didn't mention is that the state of the art is completely undocumented. So to work on rectifying that, here's the first in what might be a series of articles about the internals of JavaScript implementations.
tagging
Initially, all JavaScript implementations used tagged pointers to represent JS values. This is a old trick that comes from the observation that allocated memory takes up at least 4 or 8 bytes, and are aligned in such a way that the least significant bit or three will be zero.
Here's an old part of the Guile manual that explains the technique. For example you could take values whose lowest bit is 1 to be a pointer to the heap. Any value that ends in 0 could be an integer, represented directly in the bits of the pointer. (If your pointer tag is not 0, you will have to mask off the low bits before dereferencing a pointer..)
Google's V8 JavaScript engine still does it this way. This yields 31-bit immediate signed integers; you can just do a signed right-shift to get the value. They call them "smi" values, for "small integers"; they are more commonly called "fixnums". There are other fun tricks you can do to avoid shifts for some common operations, like addition. Note that they also take advantage of the four-byte alignment (in their case) to encode another immediate value, "failure objects", using the other low bit.
The size of a tagged pointer is the size of a pointer, so 32 bits on a 32-bit machine, and 64 on a 64-bit machine. V8 doesn't actually use 63-bit smi values on 64-bit machines, AFAIK.
nan-boxing
JavaScriptCore, the implementation used by WebKit, switched from tagged pointers to a "nan-boxed" format. The best explanation of this strategy is given by Rob Sayre of Mozilla. I believe the technique was first made popular by LuaJIT, though I would appreciate additional references.
Sayre does a good job of explaining things, but to recap, there are about 253 bit patterns that represent not-a-number values for IEEE-754 double-precision floats, but that only one of those is actually emitted by current hardware and software. So cheeky implementations are free to use the other patterns for their own purposes.
What I always wondered when I heard about this strategy was how 53 bits were possibly enough, for a 64-bit machine. How are you going to fit a pointer in that space?
Well it turns out that on x64-64 systems, you only have a 48-bit address space, currently anyway. The shape of that space is quite peculiar, too. I don't know whether Linux gives out addresses in the upper half, but Windows does not, though Solaris does.
Anyway, the other way of looking at NaN-boxing is that there are 264 - 248 values in a pointer that aren't being used, and we might as well stuff something in them, and hey, the whole space of valid double precision numbers fits!
It's convenient too that numbers in JS are specified as doubles. (Even still, all implementations define separate "small integer" types, for use in loops and indexing and such; integer operations are faster when you can use them, too.)
JSC's implementation of nan-boxing is described here, and you can see some of the neat tricks they do here. It actually makes me envious of C++, as a C programmer!
nun-boxing
So, when you choose to do nan-boxing, you basically choose to do one of two things: you favor pointers, or you favor doubles. To favor pointers means that you recognize pointers as having initial (most-significant) 0 bits; if the initial bits are not 0, then it's a double, and you have to add or subtract a bit pattern to get to the double value.
Favoring doubles means that pointers are left as NaN values, so their initial bits are all ones (or the sign bit can be unset, it doesn't matter), and to unpack a pointer, you have to rotate the double space around.
Amusingly, there is a third option as well. For 32-bit machines, you can address the second word in the double directly, so there is no need to mask off anything. This is the JSVALUE32_64 case mentioned in the JSValue code I linked to above.
JSC chose to favor pointers, and as the first JS implementation to nan-box, got to call their choice "nan-boxing". Mozilla chose to favor doubles, and so made up the silly name "nun-boxing".
get thee to a nun boxery?
So you're implementing a language. Should you nan-box or not? I can't say in your case but I can give a couple of examples.
NaN-boxing has the obvious advantage of not allocating doubles on the heap. This reduces cache pressure, GC pressure, and such. That's why Moz and JSC chose it.
V8 on the other hand has not chosen it, at least not yet, anyway. I think that one reason is because especially on embedded devices, and to an extent on ia32, passing around 64-bit values is a big lose. It's so bad that I understand that Mozilla actually passes these values by reference instead of by value in some places, on 32-bit systems only.
But the real reason that V8 doesn't nan-box I think is that they have a compiler that is able to unbox values in registers and in temporary stack locations, both as int32 values and as double values. This works for loops and such things in a function, and is obviously quite efficient, as you don't need to box and unbox at all. Passing doubles as arguments or return values however does allocate them on the heap, as far as I can tell anyway.
Also there is the hackery that with NaN-boxing, you assume things about your OS's memory management. A language implementation should be able to get around this with use of mmap at specific addresses, but still, it's tricky.
I looked at doing nan-boxing for Guile, and eventually decided against it. Guile has a few more constraints that V8 does not have. Firstly it is very portable, so it can't rely the address range constraints that x86-64 has, not without some hackery. It's also portable to the low end, like V8, so 64-bit values are a lose there.
But the killer is the conservative GC on 32-bit systems. If you represent integers with the tag in the high bits, as you would with nan-boxing, then the lower 32 bits are not distinguishable from a pointer, so they can cause the GC to retain heap objects for longer than it should. With tagged pointers, the low-bit masking lets the GC know that an integer is not a pointer. If you were to reintroduce low-bit tagging to a nan-boxed system, you've lost much of the advantage of it. Furthermore if you want to support more than 32 bits of precision in a fixnum, then on a 32-bit system you need to play games with sign extension of a 48-bit value to a 64-bit value, and for all word sizes it looks like the overhead will be significantly higher than tagging.
Finally I think that if we're really going to shoot for the moon, we should implement something like V8's Crankshaft compiler that does decent type inference, to allow unboxed values in registers and on the stack. (For readers of source code, "crankshaft" is basically the "hydrogen" flow-graph code plus a "lithium" assembler. Confusing, right?)
Well that's all for now. If you corrections, or an idea for a further topic, let me know in the comments. Thanks!
9 responses
Comments are closed.
I have discussed this topic with someone who is working on a nun-boxing implementation of R5RS Scheme. He is restricting the non-doubles to be *signaling* NaNs, thus avoiding the unsafe assumption that only one particular (quiet) NaN is *the* representation of NaN.
Chibi Scheme, which uses the tagged pointer system, used to have a configuration-time option (for 64-bit systems only) to use 32-bit floats as Scheme inexact values, directly embedded in the pointer. Apparently this has been removed now.
v8 won't have 63-bit small integers because JavaScript doesn't have an integer type at all. Every number in play is just an IEEE-754 double precision number, which of course can't represent all 64-bit integer values (integers of magnitude up to 2**53ish, yes, but past that some integer values just don't exist). So there's no real reason to have larger small integers, and indeed there might be some reason not to, because adding 32-bit numbers may well be slightly faster than adding 64-bit numbers.
The most ideal non-31-bit integer value to have would probably be 32-bit integers -- crypto algorithms in particular could benefit from not having to go to double when the high bit gets set, although runtime techniques can substantially mitigate this cost -- but again I suspect the 32-bit add versus 64-bit add is problematic.
Here's an article pointing to a 1993 paper on value representation techniques (nice, easy read) which also mentions NaN tagging briefly:
http://lambda-the-ultimate.org/node/3912
Good summary of the main design points and tradeoffs. Some comments:
On nax-boxing or not, my guess is that the original reason V8 didn't nan-box is that they have always had a good generational GC, which makes allocating doubles on the stack pretty cheap. In systems that don't have generational GC, putting doubles on the heap was way too expensive. And even with V8's generational GC, before Crankshaft came out, Jaegermonkey and Tracemonkey were generally faster on floating-point code, because of the nan-boxing.
Once you have an optimizing compiler that can operate mostly on unboxed values (Tracemonkey and Crankshaft), then the boxing format doesn't matter much at all, because you are not using boxed values in hot code. But I really doubt Crankshaft influenced the value format of the original V8.
I'm a bit surprised about your point about conservative GC--is an untagged int that likely to point inside the GC heap? We currently use conservative GC but we haven't had any evidence that that problem bites us, and I'm sure there's a lot more random stuff than boxed values on the C stack anyway, at least in a browser.
To add further to what Jeff says, floating point arithmetic seems to be about as fast as integer arithmetic now, so the big cost of using floats is in conversion to ints, which happens for bitops and array indices. But bitop inputs are always logically 32 bits in JS and array indices are going to be 32 bits or less in practice.
"If your pointer tag is not 0, you will have to mask off the low bits before dereferencing a pointer."
Don't most processors these days have a mem[ptr-const] addressing mode?
Nice article! Just a few nits: the name "nunbox" refers to nan-boxing on 32-bit architectures and is roughly equivalent to JSC's JSVALUE32_64 (the 'u' in 'nunbox' is for 'unboxed' since, as you pointed out, the low word is unboxed). The 64-bit nan-boxing encoding used by SpiderMonkey is named "punboxing" (the 'p' is for 'packed', since this scheme is logically like nunboxing, but with all the unused bits sucked out).
The IEEE 754 tagged pointer representation used by LuaJIT was first invented in 1997 for an implementation of parallel Haskell.
Sandro: NaN-boxing is older than the parallel Haskell technique -- it was for example described by Gudeman in section "2.6.1 Using IEEE NaN Codes" of "Representing Type Information in Dynamically Typed Languages" (1993).
> I don't know whether Linux gives out addresses in the upper half, but Windows does not, though Solaris does.
Linux seems to only use the upper half for virtual syscalls on my current kernel build:
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
It's probably a build-time option, though.