parallel futures in mobile application development

Good morning, hackers. Today I’d like to pick up my series on mobile application development. To recap, we looked at:

Ionic/Capacitor, which makes mobile app development more like web app development;
React Native, a flavor of React that renders to platform-native UI components rather than the Web, with ahead-of-time compilation of JavaScript;
NativeScript, which exposes all platform capabilities directly to JavaScript and lets users layer their preferred framework on top;
Flutter, which bypasses the platform’s native UI components to render directly using the GPU, and uses Dart instead of JavaScript/TypeScript; and
Ark, which is Flutter-like in its rendering, but programmed via a dialect of TypeScript, with its own multi-tier compilation and distribution pipeline.

Taking a step back, with the exception of Ark which has a special relationship to HarmonyOS and Huawei, these frameworks are all layers on top of what is provided by Android or iOS. Why would you do that? Presumably there are benefits to these interstitial layers; what are they?

Probably the most basic answer is that an app framework layer offers the promise of abstracting over the different platforms. This way you can just have one mobile application development team instead of two or more. In practice you still need to test on iOS and Android at least, but this is cheaper than having fully separate Android and iOS teams.

Given that we are abstracting over platforms, it is natural also to abandon platform-specific languages like Swift or Kotlin. This is the moment in the strategic planning process that unleashes chaos: there is a fundamental element of randomness and risk when choosing a programming language and its community. Languages exist on a hype and adoption cycle; ideally you want to catch one on its way up, and you want it to remain popular over the life of your platform (10 years or so). This is not an easy thing to do and it’s quite possible to bet on the wrong horse. However the communities around popular languages also bring their own risks, in that they have fashions that change over time, and you might have to adapt your platform to the language as fashions come and go, whether or not these fashions actually make better apps.

Choosing JavaScript as your language places more emphasis on the benefits of popularity, and is in turn a promise to adapt to ongoing fads. Choosing a more niche language like Dart places more emphasis on predictability of where the language will go, and ability to shape the language’s future; Flutter is a big fish in a small pond.

There are other language choices, though; if you are building your own thing, you can choose any direction you like. What if you used Rust? What if you doubled down on WebAssembly, somehow? In some ways we’ll never know unless we go down one of these paths; one has to pick a direction and stick to it for long enough to ship something, and endless tergiversations on such basic questions as language are not helpful. But in the early phases of platform design, all is open, and it would be prudent to spend some time thinking about what it might look like in one of these alternate worlds. In that spirit, let us explore these futures to see how they might be.

alternate world: rust

The arc of history bends away from C and C++ and towards Rust. Given that a mobile development platform has to have some low-level code, there are arguments in favor of writing it in Rust already instead of choosing to migrate in the future.

One advantage of Rust is that programs written in it generally have fewer memory-safety bugs than their C and C++ counterparts, which is important in the context of smart phones that handle untrusted third-party data and programs, i.e., web sites.

Also, Rust makes it easy to write parallel programs. For the same implementation effort, we can expect Rust programs to make more efficient use of the hardware than C++ programs.

And relative to JavaScript et al, Rust also has the advantage of predictable performance: it requires quite a good ahead-of-time compiler, but no adaptive optimization at run-time.

These observations are just conversation-starters, though, and when it comes to imagining what a real mobile device would look like with a Rust application development framework, things get more complicated. Firstly, there is the approach to UI: how do you get pixels on the screen and events from the user? The three general solutions are to use a web browser engine, to use platform-native widgets, or to build everything in Rust using low-level graphics primitives.

The first approach is taken by the Tauri framework: an app is broken into two pieces, a Rust server and an HTML/JS/CSS front-end. Running a Tauri app creates a WebView in which to run the front-end, and establishes a bridge between the web client and the Rust server. In many ways the resulting system ends up looking a lot like Ionic/Capacitor, and many of the UI questions are left open to the user: what UI framework to use, all of the JavaScript programming, and so on.
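To make the shape of that split a bit more concrete, here is a minimal sketch in plain Rust of the kind of bridge such a design implies: the front-end sends a named command with string arguments, and the Rust side dispatches it to a handler. To be clear, this is not Tauri's API; `Request`, `Response`, and `dispatch` are hypothetical names of my own, and a real bridge would also have to serialize these messages across the WebView boundary.

```rust
use std::collections::HashMap;

/// A message sent from the web front-end to the Rust side (hypothetical shape).
#[derive(Debug)]
struct Request {
    command: String,
    args: HashMap<String, String>,
}

/// What the Rust side sends back over the bridge.
#[derive(Debug, PartialEq)]
enum Response {
    Ok(String),
    Err(String),
}

/// Route a request to the matching Rust handler.
fn dispatch(req: &Request) -> Response {
    match req.command.as_str() {
        "greet" => match req.args.get("name") {
            Some(name) => Response::Ok(format!("Hello, {name}!")),
            None => Response::Err("greet: missing argument `name`".to_string()),
        },
        other => Response::Err(format!("unknown command `{other}`")),
    }
}

fn main() {
    let mut args = HashMap::new();
    args.insert("name".to_string(), "hackers".to_string());
    let req = Request { command: "greet".to_string(), args };
    println!("{:?}", dispatch(&req));
}
```

Everything on the other side of `dispatch` (which UI framework issues the request, how the result is rendered) stays a front-end decision, which is the sense in which the UI questions are left open.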

Instead of using a platform’s WebView library, a Rust app could instead ship a WebView. This would of course make the application binary size larger, but tighter coupling between the app and the WebView may allow you to run the UI logic from Rust itself instead of having a large JS component. Notably this would be an interesting opportunity to adopt the Servo web engine, which is itself written in Rust. Servo is a project that in many ways exists in potentia; with more investment it could become a viable alternative to Gecko, Blink, or WebKit, and whoever does the investment would then be in a position of influence in the web platform.

If we look towards the platform-native side, though there are quite a number of Rust libraries that provide wrappers to native widgets, practically all of these primarily target the desktop. Only cacao supports iOS widgets, and there is no equivalent binding for Android, so any NativeScript-like solution in Rust would require a significant amount of work.

In contrast, the ecosystem of Rust UI libraries that are implemented on top of OpenGL and other low-level graphics facilities is much more active and interesting. Probably the best recent overview of this landscape is by Raph Levien (see the “quick tour of existing architectures” subsection). In summary, everything is still in motion and there is no established consensus as to how to approach the problem of UI development, but there are many interesting experiments in progress. With my engineer hat on, exploring these directions looks like fun. As Raph notes, some degree of exploration seems necessary as well: we will only know if a given approach is a good idea if we spend some time with it.

However if instead we consider the situation from the perspective of someone building a mobile application development framework, Rust seems more of a mid/long-term strategy than a concrete short-term option. Sure, build low-level libraries in Rust, to the extent possible, but there is no compelling-in-and-of-itself story yet that you can sell to potential UI developers, because everything is still so undecided.

Finally, let us consider the question of scripting: sometimes you need to add logic to a program at run-time. It could be because actually most of your app is dynamic and comes from the network; in that case your app is like a little virtual machine. If your app development framework is written in JavaScript, like Ionic/Capacitor, then you have a natural solution: just serve JavaScript. But if your app is written in Rust, what do you do? Waiting until the app store pushes a new version of the app to the user is not an option.

There would appear to be three common solutions to this problem. One is to use JavaScript – that’s what Servo does, for example. As a web engine, Servo doesn’t have much of a choice, but the point stands. Currently Servo embeds a copy of SpiderMonkey, the JS engine from Firefox, and it does make sense for Servo to take advantage of an industrial, complete JS engine. Of course, SpiderMonkey is written in C++; if there were a JS engine written in Rust, probably Rust programmers would prefer it. Also it would be fun to write, or rather, fun to start writing; reaching the level of ECMA-262 conformance of SpiderMonkey is at least a hundred-million-dollar project. Anyway what I am saying is that I understand why Boa was started, and I wish them the many millions of dollars needed to see it through to completion.

You are not obliged to script your app via JavaScript, of course; there are many languages out there that have “extending a low-level core” as one of their core use cases. I think the mitigated success that this approach has had over the years—who embeds Python into an iPhone app?—should probably rule out this strategy as a core part of an application development framework. Still, I should mention one Rust-specific option, Rhai; the pitch is that by being Rust-specific, you get more expressive interoperation between Rhai and Rust than you would between Rust and any other dynamic language. Still, it is not a solution that I would bet on: Rhai internalizes so many Rust concepts (notably around borrowing and lifetimes) that I think you have to know Rust to write effective Rhai, and knowing both is quite rare. Anyone who writes Rhai would probably rather be writing Rust, and that’s not a good equilibrium.

The third option for scripting Rust is WebAssembly. We’ll get to that in a minute.

alternate world: the web of pixels

Let’s return to Flutter for a moment, if you will. Like the more active Rust GUI development projects, Flutter is an all-in-one rendering framework based on low-level primitives; all it needs is Vulkan or Metal or (soon) WebGPU, and it handles the rest, layering on opinionated patterns for how to build user interfaces. It didn’t arrive at this state in a day, though. To hear Eric Seidel tell the story, Flutter began as a kind of “reset” for the Web, a conscious attempt to determine from the pieces that compose the Web rendering stack, which ones enable smooth user interfaces and which ones get in the way. After taking away all of the parts they didn’t need, Flutter wasn’t left with much: just GPU texture layers, a low-level drawing toolkit, and the necessary bindings to input events. Of course what the application programmer sees is much more high-level, but underneath, these are the platform primitives that Flutter uses.

So, imagine you work at Google. You used to work on the web—maybe on WebKit and then Chrome like Eric, maybe on web standards—but you broke with this past to see what Flutter might become. Flutter works: great job everybody! The set of graphical and input primitives that you use is minimal enough that it is abstract by nature; it doesn’t much matter whether you target iOS or Android, because the primitives will be there. But the web is still the web, and it is annoying, aesthetically speaking. Could we Flutter-ize the web? What would that mean?

That’s exactly what former HTML specification editor and now Flutter team member Ian “Hixie” Hickson proposed this January in a brief manifesto, Towards a modern Web stack. The basic idea is that the web and thus the browser is, well, a bit much. Hixie proposed to start over, rebuilding the web on top of WebAssembly (for code), WebGPU (for graphics), WebHID (for input), and ARIA (for accessibility). Technically it’s a very interesting proposition! After all, people that build complex web apps end up having to fight with the platform to get the results they want; if we can reorient them to focus on these primitives, perhaps web apps can compete better with native apps.

However if you game out what is being proposed, I have doubts. The existing web is largely HTML, with JavaScript and CSS as add-ons: a web of structured text. Hixie’s flutterized web proposal, on the other hand, is a web of pixels. This has a number of implications. One is that each app has to ship its own text renderer and internationalization tables, which is a bit silly to say the least. And whereas we take it for granted that we can mouse over a web page and select its text, with a web of pixels it is much less obvious how that would happen. Hixie’s proposal is that apps expose structure via ARIA, but as far as I understand there is no association between pixels and ARIA properties: the pixels themselves really have no built-in structure to speak of.

And of course unlike in the web of structured text, in a web of pixels it would be up to each app to actually describe its structure via ARIA: it’s not a built-in part of the system. But if you combine this with the rendering story (here’s WebGPU, now draw the rest of the owl), Hixie’s proposal leaves a void for frameworks to fill between what the app developer wants to write (e.g. Flutter/Dart) and the platform (WebGPU/ARIA/etc).
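One way to picture that void: since the pixels carry no structure, a framework would have to keep its own side table associating screen regions with semantic nodes, and consult it whenever the user points at something. What follows is a minimal sketch under that assumption; `Rect`, `SemanticNode`, and `hit_test` are hypothetical names, and nothing here is part of ARIA or WebGPU themselves (only the role strings are borrowed from ARIA).

```rust
/// A rectangle in screen pixels.
#[derive(Debug, Clone, Copy)]
struct Rect {
    x: f32,
    y: f32,
    w: f32,
    h: f32,
}

impl Rect {
    fn contains(&self, px: f32, py: f32) -> bool {
        px >= self.x && px < self.x + self.w && py >= self.y && py < self.y + self.h
    }
}

/// One entry in the app-maintained semantic layer (hypothetical shape).
#[derive(Debug)]
struct SemanticNode {
    role: &'static str, // an ARIA role name such as "button"
    label: String,
    bounds: Rect,
}

/// Find the topmost semantic node under a pointer position.
/// Nodes later in the slice are assumed to be drawn on top.
fn hit_test(nodes: &[SemanticNode], px: f32, py: f32) -> Option<&SemanticNode> {
    nodes.iter().rev().find(|n| n.bounds.contains(px, py))
}

fn main() {
    let nodes = vec![
        SemanticNode {
            role: "heading",
            label: "Inbox".to_string(),
            bounds: Rect { x: 0.0, y: 0.0, w: 320.0, h: 48.0 },
        },
        SemanticNode {
            role: "button",
            label: "Compose".to_string(),
            bounds: Rect { x: 240.0, y: 8.0, w: 72.0, h: 32.0 },
        },
    ];
    if let Some(node) = hit_test(&nodes, 250.0, 20.0) {
        println!("{}: {}", node.role, node.label);
    }
}
```

Note that the table is only as good as the app's discipline in keeping it in sync with what it draws, which is the worry in a nutshell.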

I said before that I had doubts and indeed I have doubts about my doubts. I am old enough to remember when X11 apps on Unix desktops changed from having fonts rendered on the server (i.e. by the operating system) to having them rendered on the client (i.e. the app), which was associated with a similar kind of anxiety. There were similar factors at play: slow-moving standards (X11) and not knowing at build-time what the platform would actually provide (which X server would be in use, etc). But instead of using the server, you could just ship pixels, and that’s how GNOME got good text rendering, with Pango and FreeType and fontconfig, and eventually HarfBuzz, the text shaper used in Chromium and Flutter and many other places. Client-side fonts not only enabled more complex text shaping but also eliminated some round-trips for text measurement during UI layout, which is a bit of a theme in this article series. So could it be that pixels instead of text does not represent an apocalypse for the web? I don’t know.

Incidentally I cannot move on from this point without pointing out another narrative thread, which is that of continued human effort over time. Raph Levien, who I mentioned above as a Rust UI toolkit developer, actually spent quite some time doing graphics for GNOME in the early 2000s; I remember working with his libart_lgpl. Behdad Esfahbod, author of HarfBuzz, built many parts of the free software text rendering stack before moving on to Chrome and many other things. I think that if you work on this low level where you are constantly translating text to textures, the accessibility and interaction benefits of using a platform-provided text library start to fade: you are the boss of text around here and you can implement the needed functionality yourself. From this perspective, pixels don’t represent risk at all. In the old days of GNOME 2, client-side font rendering didn’t lead to bad UI or poor accessibility. To be fair, there were other factors pushing to keep work in a commons, as the actual text rendering libraries still tended to be shipped with the operating system as shared libraries. Would similar factors prevail in a statically-linked web of pixels?

In a way it’s a moot question for us, because in this series we are focussing on native app development. So, if you ship a platform, should your app development framework look like the web-of-pixels proposal, or something else? To me it is clear that as a platform, you need more. You need a common development story for how to build user-facing apps: something that looks more like Flutter and less like the primitives that Flutter uses. Though you surely will include a web-of-pixels-like low-level layer, because you need it yourself, probably you should also ship shared text rendering libraries, to reduce the install size for each individual app.

And of course, having text as part of the system has the side benefit of making it easier to get users to install OS-level security patches: it is well-known in the industry that users will make time for the update if they get a new goose emoji in exchange.

alternate world: webassembly

Hark! Have you heard the good word? Have you accepted your Lord and savior, WebAssembly, into your heart? I jest; it does sometimes feel like messianic narratives surrounding WebAssembly prevent us from considering its concrete aspects. But despite the hype, WebAssembly is clearly a technology that will be a part of the future of computing. So let’s dive in: what would it mean for a mobile app development platform to embrace WebAssembly?

Before answering that question, a brief summary of what WebAssembly is. WebAssembly 1.0 is a portable bytecode format that is a good compilation target for C, C++, and Rust. These languages have good compiler toolchains that can produce WebAssembly. The nice thing is that when you instantiate a WebAssembly module, it is completely isolated from its host: it can’t harm the host (approximately speaking). All points of interoperation with the host are via copying data into memory owned by the WebAssembly guest; the compiler toolchains abstract over these copies, allowing a Rust-compiled-to-native host to call into a Rust-compiled-to-WebAssembly module using idiomatic Rust code.
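As a toy model of that copying discipline, and emphatically not a real WebAssembly runtime, consider a guest that owns a byte array standing in for its linear memory. The host copies a string into it and then calls a guest function that takes only scalars, an offset and a length. All of the names here (`Guest`, `alloc`, `count_spaces`, `host_count_spaces`) are hypothetical, and the bump allocator is the simplest thing that could work.

```rust
/// A toy stand-in for an instantiated module: it owns its linear memory.
struct Guest {
    memory: Vec<u8>,
    next_free: usize,
}

impl Guest {
    fn new(size: usize) -> Self {
        Guest { memory: vec![0; size], next_free: 0 }
    }

    /// Bump-allocate `len` bytes inside guest memory; returns the offset,
    /// or None if the guest's memory is exhausted.
    fn alloc(&mut self, len: usize) -> Option<u32> {
        let start = self.next_free;
        let end = start.checked_add(len)?;
        if end > self.memory.len() {
            return None;
        }
        self.next_free = end;
        Some(start as u32)
    }

    /// A "guest export": it takes only scalars and reads its own memory.
    fn count_spaces(&self, ptr: u32, len: u32) -> u32 {
        let start = ptr as usize;
        let end = start + len as usize;
        self.memory[start..end].iter().filter(|&&b| b == b' ').count() as u32
    }
}

/// Host-side wrapper of the kind a toolchain would generate:
/// copy the string into guest memory, then call with (ptr, len).
fn host_count_spaces(guest: &mut Guest, text: &str) -> Option<u32> {
    let ptr = guest.alloc(text.len())?;
    let start = ptr as usize;
    guest.memory[start..start + text.len()].copy_from_slice(text.as_bytes());
    Some(guest.count_spaces(ptr, text.len() as u32))
}

fn main() {
    let mut guest = Guest::new(64);
    println!("{:?}", host_count_spaces(&mut guest, "good morning, hackers"));
}
```

The caller of `host_count_spaces` just passes a `&str`; the copy and the scalar-only calling convention are hidden in the wrapper, which is the job the toolchains take on.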

So, WebAssembly 1.0 can be used as a way to script a Rust application. The guest script can be interpreted, compiled just in time, or compiled ahead of time for peak throughput.

Of course, people that would want to script an application probably want a higher-level language than Rust. In a way, WebAssembly is in a similar situation as WebGPU in the web-of-pixels proposal: it is a low-level tool that needs higher-level toolchains and patterns to bridge the gap between developers and primitives.

Indeed, the web-of-pixels proposal specifies WebAssembly as the compute primitive. The idea is that you ship your application as a WebAssembly module, and give that module WebGPU, WebHID, and ARIA capabilities via imports. Such a WebAssembly module doesn’t script an existing application: it is the app. So another way for an app development platform to use WebAssembly would be like how the web-of-pixels proposes to do it: as an interchange format and as a low-level abstraction. As in the scripting case, you can interpret or compile the module. Perhaps an infrequently-run app would just be interpreted, to save on disk space, whereas a more heavily-used app would be optimized ahead of time, or something.
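The "capabilities via imports" idea can be sketched in plain Rust, with traits standing in for the imported interfaces: the app's entry point can only affect the world through whatever the host hands it. This is an analogy rather than WebAssembly itself, and the trait and type names (`Gpu`, `Accessibility`, `Imports`, `app_main`, and the recording hosts) are all hypothetical, as is the drastically reduced surface of each capability.

```rust
/// Capabilities the host may hand to an app (hypothetical, heavily simplified).
trait Gpu {
    fn fill_rect(&mut self, x: u32, y: u32, w: u32, h: u32);
}

trait Accessibility {
    fn describe(&mut self, role: &str, label: &str);
}

/// The full set of imports for one app instance. Anything not listed
/// here is simply unreachable from the app's code.
struct Imports<'a> {
    gpu: &'a mut dyn Gpu,
    a11y: &'a mut dyn Accessibility,
}

/// The app's entry point: all of its effects go through its imports.
fn app_main(imports: &mut Imports<'_>) {
    imports.gpu.fill_rect(10, 10, 120, 40);
    imports.a11y.describe("button", "Send");
}

/// Host implementations that just record what the app asked for.
#[derive(Default)]
struct RecordingGpu {
    rects: Vec<(u32, u32, u32, u32)>,
}

impl Gpu for RecordingGpu {
    fn fill_rect(&mut self, x: u32, y: u32, w: u32, h: u32) {
        self.rects.push((x, y, w, h));
    }
}

#[derive(Default)]
struct RecordingA11y {
    nodes: Vec<(String, String)>,
}

impl Accessibility for RecordingA11y {
    fn describe(&mut self, role: &str, label: &str) {
        self.nodes.push((role.to_string(), label.to_string()));
    }
}

fn main() {
    let mut gpu = RecordingGpu::default();
    let mut a11y = RecordingA11y::default();
    app_main(&mut Imports { gpu: &mut gpu, a11y: &mut a11y });
    println!("{} rect(s), {} accessibility node(s)", gpu.rects.len(), a11y.nodes.len());
}
```

Because the host chooses the implementations, the same app could be handed a real GPU on one device and a recording or sandboxed one elsewhere, without the app being able to tell the difference.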

We should mention another interesting benefit of WebAssembly as a distribution format, which is that it abstracts over the specific chipset on the user’s device; it’s the device itself that is responsible for efficiently executing the program, possibly via compilation to specialized machine code. I understand for example that RISC-V people are quite happy about this property because it lowers the barrier to entry for them relative to an ARM monoculture.

WebAssembly does have some limitations, though. One is that if the throughput of data transfer between guest and host is high, performance can be bad due to copying overhead. The nascent memory-control proposal aims to provide an mmap capability, but it is still early days. The need to copy would be a limitation for using WebGPU primitives.

More generally, as an abstraction, WebAssembly may not be able to express programs in the most efficient way for a given host platform. For example, its SIMD operations work on 128-bit vectors, whereas host platforms may have much wider vectors. Any current limitation will recede with time, as WebAssembly gains new features, but every year brings new hardware capabilities (tensor operation accelerator, anyone?), so there will be some impedance-matching to do for the foreseeable future.
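For a rough sense of scale on the vector-width point, here is a back-of-the-envelope count of how many vector operations it takes to touch each element of an array once, for 128-bit vectors versus a hypothetical host with 512-bit vectors. It is only arithmetic, not a benchmark: the helper `vector_ops` is my own, and real performance depends on much more than the operation count.

```rust
/// Number of vector operations needed to touch `elements` values once,
/// given the element width and the vector register width in bits.
fn vector_ops(elements: usize, element_bits: usize, vector_bits: usize) -> usize {
    let lanes = vector_bits / element_bits;
    // Round up: a partial final vector still costs one operation.
    (elements + lanes - 1) / lanes
}

fn main() {
    let n = 1000; // number of 32-bit floats to process
    println!("128-bit vectors: {} ops", vector_ops(n, 32, 128));
    println!("512-bit vectors: {} ops", vector_ops(n, 32, 512));
}
```

With 32-bit floats that is 4 lanes versus 16, so the 128-bit encoding needs about four times as many operations unless the engine manages to fuse them for the wider hardware.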

The more fundamental limitation of the 1.0 version of WebAssembly is that it’s only a good compilation target for some languages. This is because some of the fundamental parts of WebAssembly that enable isolation between host and guest (structured control flow, opaque stack, no instruction pointer) make it difficult to efficiently implement languages that need garbage collection, such as Java or Go. The coming WebAssembly 2.0 starts to address this need by including low-level managed arrays and records, allowing for reasonable ahead-of-time compilation of languages like Java. Getting a dynamic language like JavaScript to compile to efficient WebAssembly can still be a challenge, though, because many of the just-in-time techniques needed to efficiently implement these languages will still be missing in WebAssembly 2.0.

Before moving on to WebAssembly as part of an app development framework, one other note: currently WebAssembly modules do not compose very well with each other and with the host, requiring extensive toolchain support to enable e.g. the use of any data type that’s not a scalar integer or floating-point value. The component model working group is trying to establish some abstractions and associated tooling, but (again!) it is still early days. Anyone wading into this space needs to be prepared to get their hands dirty.

To return to the question at hand, an app development framework can use WebAssembly for scripting, though the problem of how to compose a host application with a guest script requires good tooling. Or, an app development framework that exposes a web-of-pixels primitive layer can support running WebAssembly apps directly, though again, the set of imports remains to be defined. Either of these two patterns can stick with WebAssembly 1.0 or also allow for garbage collection in WebAssembly 2.0, aiming to capture mindshare among a broader community of potential developers, potentially in a wide range of languages.

As a final observation: WebAssembly is ecumenical, in the sense that it favors no specific church of how to write programs. As a platform, though, you might prefer a state religion, to avoid wasting internal and external efforts on redundant or ill-advised development. After all, if it’s your platform, presumably you know best.

summary

What is to be done?

Probably there are as many answers as people, but since this is my blog, here are mine:

On the shortest time-scale I think that it is entirely reasonable to base a mobile application development framework on JavaScript. I would particularly focus on TypeScript, as late error detection is more annoying in native applications.
I would build something that looks like Flutter underneath: reactive, based on low-level primitives, with a multithreaded rendering pipeline. Perhaps it makes sense to take some inspiration from WebF.
In the medium-term I am sympathetic to Ark’s desire to extend the language in a more ResultBuilder-like direction, though this is not without risk.
Also in the medium-term I think that modifications to TypeScript to allow for sound typing could provide some of the advantages of Dart’s ahead-of-time compiler to JavaScript developers.
In the long term... well we can do all things with unlimited resources, right? So after solving climate change and homelessness, it makes sense to invest in frameworks that might be usable 3 or 5 years from now. WebAssembly in particular has a chance of sweeping across all platforms, and the primitives for the web-of-pixels will be present everywhere, so if you manage to produce a compelling application development story targeting those primitives, you could eat your competitors’ lunch.

Well, friends, that brings this article series to an end; it has been interesting for me to dive into this space, and if you have read down to here, I can only think that you are a masochist or that you have also found it interesting. In either case, you are very welcome. Until next time, happy hacking.

One response

Richard says:
15 June 2023 4:39 PM
Thanks for a great blog post, and it’s excellent that you’re taking an interest in Rust UI. There are 2 parts of the Rust UI story that aren’t talked about above, and I think deserve an honorable mention: vello and AccessKit.
AccessKit is a library that abstracts over accessibility frameworks and OSs, and provides a single library for manipulating an accessibility tree.
vello is a project that looks to use compute shaders to deliver levels of performance in 2D rendering that we haven’t yet seen. It’s still in an experimental state, but I know there is significant interest in it. After using its own GPU abstraction layer, it switched to WebGPU, and I think that WebGPU generally is a really welcome development, and provides a strong foundation for more work to be shifted to the GPU in a cross-platform way.
