on "binary security of webassembly"

Greets!

You may have seen an interesting paper cross your radar a couple months ago: Everything Old is New Again: Binary Security of WebAssembly, by Daniel Lehmann, Johannes Kinder and Michael Pradel. The paper makes some strong claims and I would like to share some thoughts on it.

reader-response theory

For context, I have been working on web browsers for the last 8 years or so, most recently on the JavaScript and WebAssembly engine in Firefox. My work mostly consists of implementing new features, which if you are familiar with software development translates as "writing bugs". Almost all of those bugs are security bugs, potentially causing Firefox to go from being an agent of the user to an agent of the Mossad, or of cryptocurrency thieves, or anything else.

Mitigating browser bug flow takes a siege mentality. Web browsers treat all web pages and their corresponding CSS, media, JavaScript, and WebAssembly as hostile. We try to reason about global security properties, and translate those properties into invariants ensured at compile-time and run-time, for example to ensure that a web page from site A can't access cookies from site B.

In this regard, WebAssembly has some of the strongest isolation invariants in the whole platform. A WebAssembly module has access to nothing, by default: neither functionality nor data. Even a module's memory is isolated from the rest of the browser, both by construction (that's just how WebAssembly is specified) and by run-time measures (given that pointers are 32 bits in today's WebAssembly, we generally reserve a multi-gigabyte region for a module's memory that can contain nothing else).

All of this may seem obvious, but consider that a C++ program compiled to native code on a POSIX platform can use essentially everything that the person running it has access to: your SSH secrets, your email, all of your programs, and so on. That same program compiled to WebAssembly does not -- any capability it has must have been given to it by the person running the program. For POSIX-like programs, the WebAssembly community is working on a POSIX for the web that standardizes a limited-capability access to data and functionality from the world, and in web browsers, well of course the module has access only to the capabilities that the embedding web page gives to it. Mostly, as the JS run-time accompanying the WebAssembly is usually generated by emscripten, this set of capabilties is a function of the program itself.

Of course, complex WebAssembly systems may contain multiple agents, acting on behalf of different parties. For example, a module might, through capabilities provided by the host, be able to ask flickr to delete a photo, but might also be able to crawl a URL for photos. Probably in this system, crawling a web page shouldn't be able to "trick" the WebAssembly module into deleting a photo. The C++ program compiled to WebAssembly could have a bug of course, in which case, you get to keep both pieces.

I mention all of this because we who work on WebAssembly are proud of this work! It is a pleasure to design and build a platform for high-performance code that provides robust capabilities-based security properties.

the new criticism

Therefore it was with skepticism that I started reading the Lehmann et al paper. The paper focusses on WebAssembly itself, not any particular implementation thereof; what could be wrong about WebAssembly?

I found the answer to be quite nuanced. To me, the paper shows three interesting things:

  1. Memory-safety bugs in C/C++ programs when compiled to WebAssembly can cause control-flow edges that were not present in the source program.

  2. Unexpected control-flow in a web browser can sometimes end up in a call to eval with the permissions of the web page, which is not good.

  3. It's easier in some ways to exploit bugs in a C/C++ program when compiled to WebAssembly than when compiled natively, because many common mitigations aren't used by the WebAssembly compiler toolchain.

Firstly, let's dicuss the control-flow point. Let's say that the program has a bug, and you have made an exploit to overwrite some memory location. What can you do with it? Well, consider indirect calls (call_indirect). This is what a compiler will emit for a vtable method call, or for a call to a function pointer. The possible targets for the indirect call are stored in a table, which is a side array of all possible call_indirect targets. The actual target is selected at run-time based on an index; WebAssembly function pointers are just indices into this table.

So if a function loads an index into the indirect call table from memory, and some exploit can change this index, then you can cause a call site to change its callee. Although there is a run-time type check that occurs at the call_indirect site to ensure that the callee is called with the right type, many functions in a module can have compatible types and thus be callable without an error.

OK, so that's not great. But what can you do with it? Well it turns out that emscripten will sometimes provide JavaScript's eval to the WebAssembly module. Usually it will be called only with a static string, but anything can happen. If an attacker can redirect a call site to eval instead of one of the possible targets from the source code, you can (e.g.) send the user's cookies to evil.com.

There's a similar vulnerability regarding changing the operand to eval, instead. Strings are represented in linear memory as well, and there's no write protection on them, even if they are read-only data. If your write primitive can change the string being passed to eval, that's also a win for the attacker. More details in the paper.

This observation brings us to the last point, which is that many basic mitigations in (e.g.) POSIX deployments aren't present in WebAssembly. There are no OS-level read-only protections for static data, and the compiler doesn't enforce this either. Also WebAssembly programs have to bundle their own malloc, but the implementations provided by emscripten don't implement the "hardening" techniques. There is no addres-space layout randomization, so exploits are deterministic. And so on.

on mitigations

It must be said that for most people working on WebAssembly, security "mitigations" are... unsatisfactory. They aren't necessary for memory-safe programs, and they can't prevent memory-unsafe programs from having unexpected behavior. Besides, we who work on WebAssembly are more focussed on the security properties of the WebAssembly program as embedded in its environment, but not on the program itself. Garbage in, garbage out, right?

In that regard, I think that one answer to this paper is just "don't". Don't ship memory-unsafe programs, or if you do, don't give them eval capabilities. No general mitigation will make these programs safe. Writing your program in e.g. safe Rust is a comprehensive fix to this class of bug.

But, we have to admit also that shipping programs written in C and C++ is a primary goal of WebAssembly, and that no matter how hard we try, some buggy programs will get shipped, and therefore that there is marginal value to including mitigations like read-only data or even address space randomization. We definitely need to work on getting control-flow integrity protections working well with the WebAssembly toolchain, probably via multi-table support (part of the reference types extension; my colleague Paulo Matos just landed a patch in this area). And certainly Emscripten should work towards minimizing the capabilities set exposed to WebAssembly by the generated JavaScript, notably by compiling away uses of eval by embind.

Finally, I think that many of the problems identified by this paper will be comprehensively fixed in a different way by more "managed" languages. The problem is that C/C++ pointers are capabilities into all of undifferentiated linear memory. By contrast, handles to GC-managed objects are unforgeable: given object A, you can't get to object B except if object A references B. It would be great if we could bring some of the benefits of this more capability-based approach to in-memory objects to languages like C and C++; more on that in a future note, I think.

chapeau

In the end, despite my initial orneriness, I have to admit that the paper authors point out some interesting areas to work on. It's clear that there's more work to do. I was also relieved to find that my code is not at fault in this particular instance :) Onwards and upwards, and until next time, happy hacking!

Comments are closed.