Peoples of the blogosphere, welcome back to the solipsism! Happy 2017 and all that. Today's missive is about Snabb (formerly Snabb Switch), a high-speed networking project we've been working on at work for some years now.
What's Snabb all about, you say? Good question, and I have a nice answer for you in video and third-party textual form! This year I managed to make it to linux.conf.au in lovely Tasmania. Tasmania is amazing, with wild wombats and pademelons and devils and wallabies and all kinds of things, and they let me talk about Snabb.
In that talk I mentioned that Snabb uses its own drivers. We were recently approached by a customer with a simple and honest question: does this really make sense? Is it really a win? Why wouldn't we just use the work that the NIC vendors have already put into their drivers for the Data Plane Development Kit (DPDK)? After all, part of the attraction of a switch to open source is that you will be able to take advantage of the work that others have produced.
Our answer is that while it is indeed possible to use drivers from DPDK, there are costs and benefits on both sides and we think that when we weigh it all up, it makes both technical and economic sense for Snabb to have its own driver implementations. It might sound counterintuitive on the face of things, so I wrote this long article to discuss some perhaps under-appreciated points about the tradeoff.
Technically speaking there are generally two ways you can imagine incorporating DPDK drivers into Snabb:
1. Bundle a snapshot of the DPDK into Snabb itself.
2. Somehow make it so that Snabb could (perhaps optionally) compile against a built DPDK SDK.
As part of a software-producing organization that ships solutions based on Snabb, I need to be able to ship a "known thing" to customers. When we ship the lwAFTR, we ship it in source and in binary form. For both of those deliverables, we need to know exactly what code we are shipping. We achieve that by having a minimal set of dependencies in Snabb -- only LuaJIT and three Lua libraries (DynASM, ljsyscall, and pflua) -- and we include those dependencies directly in the source tree. This requirement of ours rules out (2), so the option under consideration is only (1): importing the DPDK (or some part of it) directly into Snabb.
So let's start by looking at Snabb and the DPDK from the top down, comparing some metrics and seeing how we might make this combination work.
|  | Snabb | DPDK |
|---|---|---|
| Contributors (since Jan 2016) | 32 | 240 |
| Non-merge commits (since Jan 2016) | 1.4K | 3.2K |
These numbers aren't directly comparable, of course; in Snabb our unit of code change is the merge rather than the commit, and in Snabb we include a number of production-ready applications like the lwAFTR and the NFV, but they are fine enough numbers to start with. What seems clear is that the DPDK project is significantly larger than Snabb, so adding it to Snabb would fundamentally change the nature of the Snabb project.
So depending on the DPDK would mean that Snabb suddenly jumps from being a project that compiles in a minute to being a much more heavyweight thing. That could be OK if the benefits were high enough and if there weren't other costs, but there are indeed other costs to including the DPDK:
Data-plane control. Right now when I ship a product, I can be responsible for the whole data plane: everything that happens on the CPU when packets are being processed. This includes the driver, naturally; it's part of Snabb, and if I need to change it or understand it in some deep way, I can do that. But if I switch to third-party drivers, this is now out of my domain; there's a wall between me and something that's running on my CPU. And if there is a performance problem, I now have someone to blame that's not myself! From the customer's perspective this is terrible: you want the responsibility for your software to rest in one entity.
Impedance-matching development costs. Snabb is written in Lua; the DPDK is written in C. I will have to build a bridge, and keep it up to date as both Snabb and the DPDK evolve. This impedance-matching layer is also another source of bugs; either we make a local impedance matcher in C or we bind everything using LuaJIT's FFI. In the former case, it's a lot of duplicate code, and in the latter we lose compile-time type checking, which is a no-go given that the DPDK can and does change API and ABI.
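To make that fragility concrete, here is a minimal sketch of what the FFI side of such a bridge might look like, assuming a DPDK built as shared libraries. The point is that the `cdef` is transcribed by hand from the DPDK's headers, and nothing ever checks it against the library we actually load:

```lua
-- Sketch: binding a DPDK entry point from LuaJIT.  The declaration is
-- transcribed by hand; LuaJIT trusts it blindly, so an ABI change in
-- the DPDK shows up as run-time corruption, not a compile-time error.
local ffi = require("ffi")

ffi.cdef[[
int rte_eal_init(int argc, char **argv);
]]

local rte = ffi.load("rte_eal")   -- assumes librte_eal.so is installed

-- Build an argv[].  Keep the Lua strings referenced in a table so the
-- garbage collector doesn't reclaim them out from under the pointers.
local args = { "snabb", "--no-huge" }
local argv = ffi.new("char *[?]", #args)
for i, s in ipairs(args) do argv[i-1] = ffi.cast("char *", s) end

assert(rte.rte_eal_init(#args, argv) >= 0, "rte_eal_init failed")
```

Multiply that by every structure and accessor the data plane touches, for every DPDK release, and you have a feel for the maintenance burden.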
Communication costs. The DPDK development list had 3K messages in January. Keeping up with DPDK development would become necessary, as the DPDK would now be in your data plane, and that costs a significant amount of time.
Costs relating to mismatched goals. Snabb tries to win development and run-time speed by searching for simple solutions. The DPDK tries to be a showcase for NIC features from vendors, placing less of a priority on simplicity. One very real instance of this cost is the way network packets are represented in the DPDK, with support for features like scatter/gather and indirect buffers. In Snabb we were able to do away with this complexity by having simple linear buffers, and our speed did not suffer; adding the DPDK would either force us to marshal these buffers into and out of the DPDK's format, or to reintroduce this particular complexity into Snabb.
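For a feel of the difference, here is a hedged sketch of the two representations. The linear side is close in spirit to the struct in Snabb's core/packet.lua; the segmented side is a simplification of an mbuf-style chain. Field names and sizes here are illustrative, not copied from either project:

```lua
local ffi = require("ffi")

ffi.cdef[[
/* Snabb-style: one flat buffer, one length.  Walking the payload is
   pointer arithmetic; there is nothing else to know. */
struct linear_packet {
  uint16_t length;
  uint8_t  data[10240];
};

/* DPDK-style (simplified): a chain of segments, each with its own
   offset and length, possibly pointing into shared "indirect" data.
   Every consumer must be prepared to walk the chain. */
struct segmented_packet {
  struct segmented_packet *next;   /* scatter/gather chain */
  uint8_t *buf;                    /* possibly shared, refcounted */
  uint16_t offset, length;
};
]]

-- Total length is a field access in one case and a loop in the other.
local function total_length (p) return p.length end
local function total_length_sg (p)
   local len = 0
   while p ~= nil do len = len + p.length; p = p.next end
   return len
end
```

In the linear representation, every app in the graph can assume `p.data[0]` through `p.data[p.length-1]` is the whole packet; no code anywhere has to walk a chain.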
Abstraction costs. A network function written against the DPDK typically uses at least three abstraction layers: the "EAL" environment abstraction layer, the "PMD" poll-mode driver layer, and often an internal hardware abstraction layer from the network card vendor. (And some of those abstraction layers are actually external dependencies of the DPDK, as with Mellanox's ConnectX-4 drivers!) Any discrepancy between the goals and/or implementation of these layers and the goals of a Snabb network function is a cost in developer time and in run-time. Note that those low-level HAL facilities aren't considered acceptable in the upstream Linux kernel, for all of these reasons!
Stay-on-the-train costs. The DPDK is big and sometimes its abstractions change. As a minor player just riding the DPDK train, we would have to invest a continuous amount of effort into just staying aboard.
Fork costs. The Snabb project has a number of contributors but is really run by Luke Gorrie. Because Snabb is so small and understandable, if Luke decided to stop working on Snabb or take it in a radically different direction, I would feel comfortable continuing to maintain (a fork of) Snabb for as long as is necessary. If the DPDK changed goals for whatever reason, I don't think I would want to continue to maintain a stale fork.
Overkill costs. Drivers written against the DPDK have many considerations that simply aren't relevant in a Snabb world: kernel drivers (KNI), special NIC features that we don't use in Snabb (RDMA, offload), non-x86 architectures with different barrier semantics, threads, complicated buffer layouts (chained and indirect), interaction with specific kernel modules (uio-pci-generic / igb-uio / ...), and so on. We don't need all of that, but we would have to bring it along for the ride, and any changes we might want to make would have to take these use cases into account so that other users won't get mad.
So there are lots of costs if we were to try to hop on the DPDK train. But what about the benefits? The goal of relying on the DPDK would be that we "automatically" get drivers, and ultimately that a network function would be driver-agnostic. But this is not necessarily the case. Each driver has its own set of quirks and tuning parameters; in order for a software development team to be able to support a new platform, the team would need to validate the platform, discover the right tuning parameters, and modify the software to configure the platform for good performance. Sadly this is not a trivial amount of work.
Furthermore, using a different vendor's driver isn't always easy. Consider Mellanox's DPDK ConnectX-4 / ConnectX-5 support: the "Quick Start" guide has you first install MLNX_OFED in order to build the DPDK drivers. What is this thing exactly? You go to download the tarball and it's 55 megabytes. What's in it? 30 other tarballs! If you build it somehow from source instead of using the vendor binaries, then what do you get? All that code, running as root, with kernel modules, and implementing systemd/sysvinit services!!! And this is just step one!!!! Worse yet, this enormous amount of code powering a DPDK driver is mostly driver-specific; what we hear from colleagues whose organizations decided to bet on the DPDK is that you don't get to amortize much knowledge or validation when you switch between an Intel and a Mellanox card.
In the end, when we ship a solution, it's going to be tested against a specific NIC or set of NICs. Each NIC adds to the validation effort. So if we were to rely on the DPDK's drivers, we would have paid all the costs but wouldn't have saved very much in the end.
There is another way. Instead of relying on so much third-party code that it is impossible for any one person to grasp the entirety of a network function, much less be responsible for it, we can build systems small enough to understand. In Snabb we just read the data sheet and write a driver. (Of course we also benefit by looking at DPDK and other open source drivers as well to see how they structure things.) By only including what is needed, Snabb drivers are typically only a thousand or two thousand lines of Lua. With a driver of that size, it's possible for even a small ISV or in-house developer to "own" the entire data plane of whatever network function you need.
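To give a sense of scale, here is a skeleton of what such a driver looks like from the engine's point of view, following the general shape of apps in the Snabb tree. The `receive_from_ring` and `transmit_to_ring` methods are hypothetical stand-ins for the data-sheet-specific register and descriptor-ring code that makes up the bulk of a real driver:

```lua
-- Skeleton of a Snabb driver app.  The engine calls pull() to take
-- packets from the hardware and push() to hand packets to it; all the
-- device-specific work hides behind the two hypothetical helpers.
local link = require("core.link")

local Driver = {}
Driver.__index = Driver

function Driver.new (conf)
   -- A real driver would map the NIC's registers and set up its
   -- RX/TX descriptor rings here, per the data sheet.
   return setmetatable({ pciaddr = conf.pciaddr }, Driver)
end

-- Move packets received by the NIC onto our output link.
function Driver:pull ()
   local o = assert(self.output.tx, "not connected")
   for _ = 1, 128 do                       -- per-breath budget
      local p = self:receive_from_ring()   -- data-sheet specific
      if p == nil then break end
      link.transmit(o, p)
   end
end

-- Drain our input link into the NIC's transmit ring.
function Driver:push ()
   local i = assert(self.input.rx, "not connected")
   while not link.empty(i) do
      self:transmit_to_ring(link.receive(i))  -- data-sheet specific
   end
end

return { Driver = Driver }
```

The link names (`rx`, `tx`) are whatever the program's app network configuration chooses; the driver itself stays small enough for one developer to own.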
Of course Snabb drivers have costs too. What are they? Are customers going to be stuck forever paying for drivers for every new card that comes out? It's a very good question and one that I know is in the minds of many.
Obviously I don't have the whole answer, as my role in this market is that of a software developer, not an end user. But having talked with other people in the Snabb community, I see it like this: Snabb is still in relatively early days. What we need are about three good drivers. One of them should be for a standard workhorse commodity 10Gbps NIC, which we have in the Intel 82599 driver. That chipset has been out for a while, so we will probably need to update the driver for the commodity chipsets currently being sold. Additionally we need a couple of cards that are going to compete in the 100Gbps space. We have the Mellanox ConnectX-4 and presumably ConnectX-5 drivers on the way, but there's room for another one. We've found that it's hard to actually get good performance out of 100Gbps cards, so this is a space in which NIC vendors can differentiate their offerings.
We budget somewhere between 3 and 9 months of developer time to create a completely new Snabb driver. Of course it usually takes less time to develop Snabb support for a NIC that is only incrementally different from others in the same family that already have drivers.
We see this driver development work as similar to the work needed to validate a new NIC for a network function, with the additional advantage that it gives us up-front knowledge instead of the best-effort testing later in the game that we would get with the DPDK. When you add in all the costs of riding the DPDK train, we expect that the cost of Snabb-native drivers competes favorably against the cost of relying on third-party DPDK drivers.
In the beginning it's natural that early adopters of Snabb make investments in this base set of Snabb network drivers, as they would to validate a network function on a new platform. Over time, though, as Snabb applications are deployed over more ports in the field, NIC vendors will see that it's in their interest to have solid Snabb drivers, just as they now do for the Linux kernel and the DPDK. Given that the investment is relatively low compared to their existing efforts on Linux and the DPDK, it is quite feasible that the NIC vendors of the world will start to value Snabb for the performance that it can squeeze out of their cards.
So in summary, in Snabb we are convinced that writing minimal drivers that are adapted to our needs is an overall win compared to relying on third-party code. It lets us ship solutions that we can feel responsible for: both for their operational characteristics as well as their maintainability over time. Still, we are happy to learn and share with our colleagues all across the open source high-performance networking space, from the DPDK to VPP and beyond.
Most of you have heard of "Conway's Law", the pithy observation that the structure of things that people build reflects the social structure of the people that build them. The extent to which there is coordination or cohesion in a system as a whole reflects the extent to which there is coordination or cohesion among the people that make the system. Interfaces between components made by different groups of people are the most fragile pieces. This division goes down to the inner life of programs, too; inside it's all just code, but when a program starts to interface with the outside world we start to see contracts, guarantees, types, documentation, fixed programming or binary interfaces, and indeed faults as well: how many bug reports end up in an accusation that team A was not using team B's API properly?
If you haven't heard of Conway's law before, well, welcome to the club. Inneresting, innit? And so I thought until now: a neat observation with explanatory power. But as aspiring engineers we should look for ways to use these laws to build systems that take advantage of their properties.
in praise of bundling
Most software projects depend on other projects. Using Conway's law, we can restate this to say that most people depend on things built by other people. The Chromium project, for example, depends on many different libraries produced by many different groups of people. But instead of requiring the user to install each of these dependencies, or even requiring the developer that works on Chrome to have them available when building Chrome, Chromium goes a step further and just includes its dependencies in its source repository. (The mechanism by which it does this isn't a direct inclusion, but since it specifies the version of all dependencies and hosts all code on Google-controlled servers, it might as well be.)
Downstream packagers like Fedora bemoan bundling, but they ignore the ways in which it can produce better software at lower cost.
One way bundling can improve software quality is by reducing the combinatorial complexity of product configurations, considered as a function of the product's code and of its dependencies. In Chromium, a project that bundles dependencies, the end product is guaranteed to work at all points in the development cycle because its dependency set is developed as a whole and thus uniquely specified. Any change to a dependency can be directly tested against the end product, and reverted if it causes regressions. This is only possible because dependencies have been pulled into the umbrella of "things the Chromium group is responsible for".
Some dependencies are automatically pulled into Chrome from their upstreams, like V8, and some aren't, like zlib. The difference is essentially social, not technical: the same organization controls V8 and Chrome and so can set the appropriate social expectations and even revert changes to upstream V8 as needed. Of course the goal of the project as a whole has technical components and technical considerations, but they can only be acted on to the extent they are socially reified: without a social organization of the zlib developers into the Chromium development team, Chromium has no business automatically importing zlib code, because the zlib developers aren't testing against Chromium when they make a release. Bundling zlib into Chromium lets the Chromium project buffer the technical artifacts of the zlib developers through the Chromium developers, thus transferring responsibility to Chromium developers as well.
Conway's law predicts that the interfaces between projects made by different groups of people are the gnarliest bits, and anyone that has ever had to maintain compatibility with a wide range of versions of upstream software has the scar tissue to prove it. The extent to which this pain is still present in Chromium is the extent to which Chromium, its dependencies, and the people that make them are not bound tightly enough. For example, making a change to V8 which results in a change to Blink unit tests is a three-step dance: first you commit a change to Blink giving Chromium a heads-up about new results being expected for the particular unit tests, then you commit your V8 change, then you commit a change to Blink marking the new test result as being the expected one. This process takes at least an hour of human interaction time, and about 4 hours of wall-clock time. This pain would go away if V8 were bundled directly into Chromium, as you could make the whole change at once.
forking considered fantastic
"Forking" sometimes gets a bad rap. Let's take the Chromium example again. Blink forked from WebKit a couple years ago, and things have been great in both projects since then. Before the split, the worst parts in WebKit were the abstraction layers that allowed Google and Apple to use the dependencies they wanted (V8 vs JSC, different process models models, some other things). These abstraction layers were the reified software artifacts of the social boundaries between Google and Apple engineers. Now that the social division is gone, the gnarly abstractions are gone too. Neither group of people has to consider whether the other will be OK with any particular change. This eliminates a heavy cognitive burden and allows both projects to move faster.
As a pedestrian counter-example, Guile uses the libltdl library to abstract over the dynamic loaders of different operating systems. (Already you are probably detecting the Conway's law keywords: uses, library, abstract, different.) For years this library has done the wrong thing while trying to do the right thing, ignoring .dylib's but loading .so's on Mac (or vice versa, I can't remember), not being able to specify soversions for dependencies, throwing a stat party every time you load a library because it grovels around for completely vestigial .la files, et cetera. We sent some patches some time ago but the upstream project is completely unmaintained; the patches haven't been accepted, users build with whatever they have on their systems, and though we could try to take over upstream it's a huge asynchronous burden for something that should be simple. There is a whole zoo of concepts we don't need here and Guile would have done better to include libltdl into its source tree, or even to have forgone libltdl and just written our own thing.
Though there are costs to maintaining your own copy of what started as someone else's work, people who yammer on against forks usually fail to recognize their benefits. I think they don't realize that for a project to be technically cohesive, it needs to be socially cohesive as well; anything else is magical thinking.
not-invented-here-syndrome considered swell
Likewise there is an undercurrent of smarmy holier-than-thou moralism in some parts of the programming world. These armchair hackers want you to believe that you are a bad person if you write something new instead of building on what has already been written by someone else. This too is magical thinking that comes from believing in the fictional existence of a first-person plural, that there is one "we" of "humanity" that is making linear progress towards the singularity. Garbage. Conway's law tells you that things made by different people will have different paces, goals, constraints, and idiosyncrasies, and the impedance mismatch between you and them can be a real cost.
Sometimes these same armchair hackers will shake their heads and say "yeah, project Y had so much hubris and ignorance that they didn't want to bother understanding what X project does, and they went and implemented their own thing and made all their own mistakes." To which I say, so what? First of all, who are you to judge how other people spend their time? You're not in their shoes and it doesn't affect you, at least not in the way it affects them. An armchair hacker rarely understands the nature of value in an organization (commercial or no). People learn more when they write code than when they use it or even when they read it. When your product has a problem, where will you find the ability to fix it? Will you file a helpless bug report or will you be able to fix it directly? Assuming your software dependencies model some part of your domain, are you sure that their models are adequate for your purpose, with the minimum of useless abstraction? If the answer is "well, I'm sure they know what they're doing" then if your organization survives a few years you are certain to run into difficulties here.
As a much more minor, insignificant, first-person example, I am an OK compiler hacker now. I don't consider myself an expert but I do all right. I got here by making a bunch of mistakes in Guile's compiler. Of course it helps if you get up to speed using other projects like V8 or what-not, but building an organization's value via implementation shouldn't be discounted out-of-hand.
Another point is that when you build on someone else's work, especially if you plan on continuing to have a relationship with them, you are agreeing up-front to a communications tax. For programmers this cost is magnified by the degree to which asynchronous communication disrupts flow. This isn't to say that programmers can't or shouldn't communicate, of course, but it's a cost even in the best case, and a cost that can be avoided by building your own.
When you depend on a project made by a distinct group of people, you will also experience churn or lag drag, depending on whether the dependency changes faster or slower than your project. Depending on LLVM, for example, means devoting part of your team's resources to keeping up with the pace of LLVM development. On the other hand, depending on something more slow-moving can make it more difficult to work with upstream to ensure that the dependency actually suits your use case. Again, both of these drag costs are magnified by the asynchrony of communicating with people that probably don't share your goals.
Finally, for projects that aim to ship to end users, depending on people outside your organization exposes you to risk. When a security-sensitive bug is reported on some library that you use deep in your web stack, who is responsible for fixing it? If you are responsible for the security of a user-facing project, there are definite advantages for knowing who is on the hook for fixing your bug, and knowing that their priorities are your priorities. Though many free software people consider security to be an argument against bundling, I think the track record of consumer browsers like Chrome and Firefox is an argument in favor of giving power to the team that ships the product. (Of course browsers are terrifying security-sensitive piles of steaming C++! But that choice was made already. What I assert here is that they do well at getting security fixes out to users in a timely fashion.)
to use a thing, join its people
I'm not arguing that you as a software developer should never use code written by other people. That would be silly, and I would appreciate it if commenters would refrain from making this argument :)
Let's say you have looked at the costs and the benefits and you have decided to, say, build a browser on Chromium. Or re-use pieces of Chromium for your own ends. There are real costs to doing this, but those costs depend on your relationship with the people involved. To minimize your costs, you must somehow join the community of people that make your dependency. By joining yourself to the people that make your dependency, Conway's law predicts that the quality of your product as a whole will improve: there will be fewer abstraction layers as your needs are taken into account to a greater degree, your pace will align with the dependency's pace, and colleagues at Google will review for you because you are reviewing for them. In the case of Opera, for example, I know that they are deeply involved in Blink development, contributing significantly to important areas of the browser that are also used by Chromium. We at Igalia do this too; our most successful customers are those who are able to work the most closely with upstream.
On the other hand, if you don't become part of the community of people that makes something you depend on, don't be surprised when things break and you are left holding both pieces. How many times have you heard someone complain that "project A removed an API I was using"? Maybe upstream didn't know you were using it. Maybe they knew about it, but you were not a user group they cared about; to them, you had no skin in the game.
Foundations that govern software projects are an anti-pattern in many ways, but they are sometimes necessary, born from the need for mutually competing organizations to collaborate on a single project. Sometimes the answer for how to be able to depend on technical work from others is to codify your social relationship.
One note before opening the comment flood: I know. You can't control everything. You can't be responsible for everything. One way out of the mess is just to give up, cross your fingers, and hope for the best. Sure. Fine. But know that there is no magical first-person-plural; Conway's law will apply to you and the things you build. Know what you're actually getting when you depend on other peoples' work, and know what you are paying for it. One way or another, pay for it you must.