Verified software can (and will) be cheaper than buggy stuff
Software is expensive. Verified software is more expensive (although not as much as people think). But we’re working on closing that gap, with the aim of making verified (highest-assurance) software no more expensive than traditionally engineered (no-assurance) code. Our Cogent approach is a major step in that direction.
The cost of verified software
Our paper describing the complete seL4 verification story analysed the cost of designing, implementing and proving the implementation correctness of seL4. We found that the total cost (not counting one-off investments into tools and proof libraries) came to less than $400 per line of code (LOC).
This may look expensive at first, but is less so if taken in context. The Pistachio microkernel, developed a few years earlier in very similar circumstances (university environment with people who knew what they were doing) was only a factor of 2–3 less expensive. Pistachio makes no assurance claims whatsoever (but is generally well-engineered). Take into account that Pistachio experimented far less with kernel design, and its cost does not include the full life-cycle cost (especially later bug-hunting, which doesn’t happen for seL4), you’ll see that we aren’t all that far away.
Another data point is a number quoted by Green Hills some ten years ago: They estimated the full (design, implementation, validation, evaluation and certification) cost of their high-assurance microkernel to be in the order of $1k/LOC.
In other words, we are already much cheaper than traditional high-assurance software, and a factor of 2-3 away from low-assurance software. If we could close the latter gap, then there would be no more excuse for not verifying software.
How to reduce verification cost
The main reason for the high cost of verification is that it’s very labour-intensive. Almost all of the (by now) quarter million lines of seL4 proof code are hand-written, by real humans. And as those numbers show, there is a lot of proof code to write. Just for the seL4 correctness proof (we now have many additional proofs of high-level properties), there were over 20 lines of proof per line of C.
This means there are two obvious avenues to reducing cost:
- reducing the ratio of proof lines to code lines, and
- generating proofs automatically.
Reducing the amount of proof is possible when the code is written in a more verification-friendly way. Not much can be done there when writing in C, the implementation language of seL4 (for good reason). However if we use a high-level language that is strongly typed and memory safe, many properties we had to prove for seL4 would be automatically enforced by the language. Furthermore, functional languages have many advantages, such as freedom from side effects, and verifying code written in such a language is easier (requires fewer proofs) than imperative-language code.
A glimpse of this we got in the original seL4 verification. We did this in two steps, first refining from the formal kernel spec to an “executable spec” which was essentially Haskell code, and then a second refinement from there to C code. Had we stopped at Haskell, we would have saved about 1/3 of the total proof effort. But the Haskell code was actually intentionally written in an imperative style, and more low-level than you would normally do. This is because verifiers preferred to work on the first refinement, and tried to get as much as possible out of it by pushing the Haskell level as low as possible. Refining to more “normal” Haskell code would have made the first refinement step much simpler/cheaper (at the cost of disproportionately more effort going into the second refinement).
Of course, had we stopped at Haskell we would not have a complete verification story. We would have to trust the Haskell compiler, which is an order of magnitude bigger than seL4, and a big Haskell run time. We therefore did the real implementation in C, and proved its correctness (and later also that it was correctly translated into machine code).
However, the translation from Haskell to C, while it was done by a human, was for the most part fairly mechanical (the main exception being the performance-critical fast paths). In fact, it should be possible to automate this translation. But if you automate the translation (i.e. build a compiler) then it should be possible to prove the correctness of that translation, as it had been done by the CompCert certifying C compiler.
Cogent: Co-generation of code and proof
Our Cogent framework combines the above two observations in order to reduce verification cost:
- we use a high-level, strongly typed, memory-safe functional
implementation language — Cogent;
- we automatically translate Cogent into C, and with the C also
generate a formal spec of the code, plus a proof that the generated
C is correct against this spec.
The generated spec is in the logic of the theorem prover, Isabelle, and is at the same abstraction level as the Cogent code (visually equivalent).
Importantly, Cogent compiles into straight C, without the need for a run-time system, so verifying that code is all that’s needed. Cogent code does require some abstract data types (ADTs) implemented in C and to be verified manually, but these are explicitly called by Cogent code, so there’s no library code that gets pulled in under the hood. And there’s certainly no such ugly stuff as garbage collectors.
The combination of a nice, clean functional spec and ability to compile into straight and efficient C without garbage collection is possible through the use of linear types, which restrict the functional representation enough to enable the right translation. The need for a separate ADT library is the cost we pay for this, as the resulting language is not Turing complete.
With this, all we have to do is implement our system in Cogent and then verify that the Cogent code (as represented by the Isabelle spec) is correct against the top-level formal spec. In other words, had we written seL4 in Cogent, we would be done after the first refinement (plus verifying the ADTs). Furthermore, Cogent code is more high-level than our Haskell implementation. We would have easily saved between a third and half of the overall verification effort.
We didn’t try to use Cogent to (re)implement seL4. After all, seL4 is done, and who wants to redo something that’s already been done? (Note that the linear type system would have forced much of the kernel into external libraries, thus defeating the point of using Cogent.) Instead we picked other important systems code as a case study: file systems.
We implemented two file systems in Cogent. One is a re-implementation of the Linux ext2fs. It’s a complete implementation of the original ext2fs and passes the test suite, but lacks some more recent features (symlinks and ACLs).
The second file system is BilbyFS, a custom flash file system, which in complexity and efficiency sits somewhere in between the Linux standard flash file systems JFFS2 and UBIFS.
Our paper recently published at ASPLOS describes the experience. We found that we’re not yet quite there in terms of performance: while our file systems achieved essentially the same throughput as the hand-written C implementations, CPU load indicates that there’s presently about a factor-two overhead. According to our motto that security is no excuse for bad performance, we’ll have more work to do. One issue was that we overestimated the ability of the C compiler to optimise struct arguments, we get a lot of redundant data copies. But this is fixable by putting more smarts into the Cogent compiler.
But our experience so far shows that we’re on the right track: We can implement real-world systems code in Cogent, and integrate with real systems. For evaluation we loaded our file systems into Linux, but we will use them natively in seL4-based systems as well.
What’s the cost of verification now?
An evaluation of the verification cost is extremely encouraging. We did not completely verify our file systems, but we completely verified the implementation of two external APIs for BilbyFS: sync() and iget(). And the result is stunning: The implementation of all functionality required for sync() is 300 lines of Cogent, iget() is 200 lines. The verification of that functionality required 5,700 lines of proof for sync() and 1,800 lines for iget(). So, the first take-away is that the ratio of proof to code lines seems to be reduced, which is what we hope to see and guarantees some cost reduction.
We see a similar picture when looking at proof effort. In seL4, the cost was 1.4 person years (py) per kLOC, in BiblyFS this is down to 0.6py/kLOC, about a 60% reduction. This is due to a combination of two factors: the functional abstraction provided by Cogent, but also the fact that BilbyFS is much more modular than possible for a high-performance microkernel.
We cannot yet read too much into those numbers, as they are for small pieces of code, and experience shows that effort grows roughly with the square of the size. But the trend is good. And it is roughly in line with what we expect.
But the important part is that we totally eliminate the manual effort in verifying down to C. That’s roughly the equivalent of seL4’s second refinement step, except that we can make this automated step bigger with Cogent (and correspondingly reduce the manual step). As argued above, this alone should result in a 1/3 to 1/2 reduction of effort. And this is what we have already.
More work remains to be done, especially with respect to eliminating the performance gap. But we have taken a big step with respect to reducing the factor 2–3 cost gap of formal verification. We estimated that we’ve at least halved it.
The grand goal remains the elimination of this gap. I’m confident we’ll get there in a few years. And imagine the implications: verified, highest-assurance software at the cost of traditional, no-assurance software. We really may be able to make software bugs a thing of the past!
Thanks to Toby Murray for feedback on this blog.