February 15, 2012

Where Andy is wrong about L4

Andy Tanenbaum, in a recent interview with LinuxFr.org, made some comments about L4. Unfortunately, they were mostly wrong.

Make no mistake, Andy is a guy I hold in high regard; he’s done great things over the years. But that doesn’t mean he should get away with making incorrect statements about the main competitor to his Minix system! And I’m not fussed about the outdated number of 300 million L4 deployments (the official figure is 1.5 billion). There are far more substantial issues. So, let’s have a look at what he said.

Andy Quote 1: “The biggest difference is that we [Minix] run the entire operating system as a collection of servers in userland. They [L4] mostly run Linux as a whole unit in userland. This means that a single bug in Linux can crash Linux. While L4 will survive this, all that does is make the reboot faster. It doesn’t solve the reliability problem.”

While it is true that L4 is being used as a virtualization platform to host Linux (something my colleague Hermann Härtig in Dresden has been doing since 1997), and there are phones out there that run Linux on top of L4, that isn’t the whole story. In fact, if all we did was put L4 underneath Linux, we’d do nothing but add overhead. Clearly, that would be a pointless exercise.

There are several reasons why you’d want to run Linux on top of L4. One is the classical virtualization use case of processor consolidation: running multiple systems side-by-side on the same processor, using virtual-machine encapsulation to isolate one from the other. This is the configuration that is indeed used on some of the phones that ship with L4.

The second reason is legacy support: If you think you can just introduce a new OS API (even if it’s POSIX-compliant) and the world will adapt all its software, you’re dreaming. People want to not only keep re-using their old software, they even want to build new software for the old environments. But that shouldn’t stop you from providing a better environment for safety- and security-critical components.

This trend is indeed very strong in the embedded world, even (or particularly) in the domain of safety- or security-critical devices. I rarely get to talk to someone considering deploying seL4 who doesn’t have a need for a Linux or Windows environment. The important point is that a highly trustworthy microkernel underneath allows this to co-exist safely with the critical functionality.

Andy Quote 2: “There are lots of technical differences between MINIX and L4, but the focus of the projects is very different. We [Andy] have a microkernel so we can run a multiserver system. The entire focus is on multiserver and the ability to restart OS components independently. L4 is being used to run Linux and on phones. This has nothing to do with technical differences. We could have used L4 and they could have used MINIX but it is easier to use your own since you can change it as needs arise.”

Notwithstanding my comments above, the reality is that many of the 1.5 billion L4 deployments have nothing whatsoever to do with Linux! They run L4 on the baseband processor of a smartphone, with Linux (or Windows or iOS or BBOS or xyz) running on a separate applications processor. And the baseband stack is in fact a multi-server design! It consists of multiple (each multithreaded) tasks that communicate with each other. Exactly what Andy claims he does and we don’t…

The statement that “they” (i.e. us) could have used Minix is actually quite wrong. Minix, owing to its design and strong emphasis on simplicity, is much slower than L4. (Their papers typically tell you that throughput is only degraded by a few %. However, that’s typically compared to Minix-2, which isn’t the world’s fastest system either, while a reasonable baseline would be Linux. Furthermore, they don’t tell you how much CPU usage increased. I’ve discussed this with Andy, and unless things have changed a lot recently, the CPU overhead is pretty significant. Pretending that throughput degradation measures overhead while ignoring CPU load is one of my favourite benchmarking crimes!)
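To see why throughput alone hides overhead, here is a minimal sketch with made-up numbers. The figures are illustrative assumptions, not measurements of Minix, Linux or any other system. The quantity that matters is the CPU time burned per unit of work done:

```python
def cpu_cost_per_unit(throughput, cpu_utilisation):
    """CPU-seconds consumed per unit of work (here: per MB transferred)."""
    return cpu_utilisation / throughput

# Hypothetical figures for an I/O-bound benchmark.
baseline = {"throughput": 100.0, "cpu": 0.20}   # 100 MB/s at 20% CPU load
candidate = {"throughput": 97.0, "cpu": 0.40}   # 97 MB/s at 40% CPU load

# What the "few %" claim reports: relative loss of throughput.
throughput_loss = 1 - candidate["throughput"] / baseline["throughput"]

# What the battery cares about: CPU cost per MB, relative to the baseline.
cost_ratio = (cpu_cost_per_unit(candidate["throughput"], candidate["cpu"])
              / cpu_cost_per_unit(baseline["throughput"], baseline["cpu"]))

print(f"throughput degradation: {throughput_loss:.0%}")
print(f"CPU cost per MB: {cost_ratio:.2f}x the baseline")
```

With these invented numbers the throughput drops by only about 3%, yet each megabyte costs roughly twice the CPU time. On a battery-powered device it is the second figure that counts.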

Performance happens to be pretty important on a mobile device (whether it’s a phone or an implanted medical device). Throwing away cycles means throwing away battery life, and manufacturers tend to care about this. In my experience, overheads of more than 5% (measured in terms of CPU cycles) are considered unacceptable in this space. Minix certainly wouldn’t be useful in a mobile environment at all!

Andy Quote 3: “They certainly did a nice job [on formal verification of seL4] but it is really hard to verify if it works since the hardware itself is poorly defined and the OS has to deal with the hardware a lot.”

C’mon, Andy, that one’s really cheap! seL4 is the first (and so far only) OS kernel which has a formal (mathematical and machine-checked) proof that its implementation follows the specification. That means nothing less than that the implementation is bug-free with respect to the spec. Something that just cannot be achieved by simplicity and any amount of code inspection or testing!

Andy seems to imply that because the hardware isn’t exactly defined, the verification is somewhat worthless. Of course, if the hardware’s behaviour isn’t defined well enough, then that hurts Minix just as well. But the argument is actually bogus. The verification of seL4 was done for ARM processors, and (unlike x86) they have a very precisely-defined ISA. It’s so well defined that Anthony Fox and Magnus Myreen at Cambridge could turn it into a formal (mathematical) model. And we’re in the process of linking this model to our seL4 code proof.

In reality, the functional correctness proof of seL4 is very powerful. It means that seL4’s behaviour is very precisely defined (there is a formal spec of the kernel). And the implementation adheres to that spec. Which means that in order to prove safety or security properties of a system built on top, it is generally sufficient to prove that these properties hold with respect to the seL4 spec, and by implication they hold for the system. (I’m glossing over a few details here, this argument requires more than functional correctness. However, all the other bits required are either already in place for seL4, or we’re working on them and they should be in place this year!)
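The shape of that argument can be shown with a toy model. This is only an illustration of why refinement carries properties from spec to implementation; it is not how the actual seL4 proof is structured, and all the names and traces are invented. A "behaviour" here is simply a tuple of observable events:

```python
# Hypothetical behaviours allowed by a spec, and exhibited by an implementation.
spec_behaviours = {
    ("request", "grant"),
    ("request", "deny"),
}
impl_behaviours = {
    ("request", "grant"),
}

def refines(impl, spec):
    """The implementation shows only behaviours that the spec allows."""
    return impl <= spec

def holds_for_all(behaviours, prop):
    """The property is true of every behaviour in the set."""
    return all(prop(b) for b in behaviours)

def never_leaks(behaviour):
    """Example safety property: no 'leak' event ever occurs."""
    return "leak" not in behaviour

# Prove the property once, against the spec...
assert holds_for_all(spec_behaviours, never_leaks)
# ...and given refinement, it follows for the implementation.
assert refines(impl_behaviours, spec_behaviours)
assert holds_for_all(impl_behaviours, never_leaks)
```

Since the implementation's behaviours are a subset of the spec's, anything true of every spec behaviour is automatically true of every implementation behaviour. That is why reasoning against the spec is enough.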

So, how about the multi-server issue? This is the classical microkernel approach: you structure the microkernel-based system into multiple components (or servers) and the microkernel enforces isolation between them. This can lead to a more fault-tolerant system, as Andy keeps pointing out (correctly). However, if he seems to imply that this somehow is different from what we do, then that’s simply wrong. In fact, our Trustworthy Systems research agenda uses exactly this component approach. The difference is that we can prove (security or safety or other) properties of our system. No-one else can do this!

On top of that we have a few other cool bits. Such as device drivers which are synthesised from formal specs, and are as such correct by construction! Perfect complement to a formally-verified kernel.

Andy Quote 4 (not actually about L4 but about capability-based OSes like Coyotos): “Capability systems have been around for 40 years. At the recent SOSP conference, M.I.T. professor Jack Dennis got an award for inventing them in 1967. They definitely have potential. My first distributed system, Amoeba, used them. Coyotos, however, is going nowhere.”

Am I to read this as an argument that because CoyotOS is going nowhere, capability systems aren’t?

seL4 is a capability-based system too. And, unlike CoyotOS, it is going places! In fact, capabilities are a core part of the safety/security story in seL4: they allow us to reason precisely about access rights (and information flow) in the system, in a way that is impossible to do with an access-control-list based approach as Minix uses it.
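A toy model shows why capabilities make this kind of reasoning tractable. It is a sketch under simplifying assumptions: the component and object names are hypothetical, and it ignores capability transfer, which a real analysis has to account for. The point is that a component's authority can be read off from the capabilities it holds:

```python
from collections import namedtuple

# A capability names one object and carries a set of rights to it.
Cap = namedtuple("Cap", ["obj", "rights"])

# Each component's capability space: the only objects it can name at all.
cspaces = {
    "driver":  [Cap("nic_registers", frozenset({"read", "write"}))],
    "browser": [Cap("net_endpoint", frozenset({"send"}))],
}

def may_access(component, obj, right):
    """Access is possible only through a held capability with that right."""
    return any(c.obj == obj and right in c.rights for c in cspaces[component])

def authority(component):
    """Everything a component can do, derived from its capabilities alone."""
    return {(c.obj, r) for c in cspaces[component] for r in c.rights}

print(may_access("browser", "nic_registers", "read"))
print(sorted(authority("driver")))
```

In this model the browser cannot even name the NIC registers, so no check elsewhere in the system needs to be consulted to rule that access out.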

Summary

The two approaches are similar as far as OS structure is concerned. The core difference is that with Minix, Andy is trying to get things right by keeping them really simple. He cops a significant performance overhead, yet in the end can only use traditional quality-assurance methods (testing and code inspection) to make a case about the behaviour of his system. Nor can he formally reason about the security/safety properties of the complete system.

In contrast, with seL4 we have a microkernel that is blindingly fast (as fast as any L4 kernel, and no other microkernel has ever been able to beat L4 in performance), yet we can make actual guarantees about its behaviour and that of systems built on top, with the strength of mathematical proof!

Everything else is simply not playing in the same league.

February 4, 2012

Automotive: the next frontier for virtualization

I have been predicting for years that there is a need for virtualization in cars. Guess what, it’s about to happen!

There are two main drivers (excuse the pun): On the one hand, features are mushrooming in cars just as in consumer electronics. And in cars, traditionally each new feature has resulted in yet another microcontroller (or four). The barrier of 100 ECUs (as micros are called in cars) was broken years ago. Besides the problems of increasing complexity of this distributed computing system, there are very solid reasons why this approach doesn’t scale any more: cars are a very hostile environment for electronics. It must be packaged up to be heat resistant, dust resistant, vibration resistant, water proof, grease proof, acid proof, the lot. This leads to expensive and bulky packaging, which is creating space and weight problems.

On the other hand, there is an increasing need for integration of very dissimilar systems. Classical automotive (control and convenience) functionality is subject to standards set by the automotive industry. But much of the infotainment functionality uses parts from the consumer-electronics space (eg wifi, Bluetooth, Linux, Windows). These are made to standards (or not) that are out of control of the auto industry. And as such it can be hard to predict how such parts interact with automotive subsystems.

An obvious way of getting around the issue of mushrooming functionality is to consolidate on fewer processors. This requires a protected multitasking operating system to isolate functions. However, while classical automotive functionality runs on automotive OSes, such as AUTOSAR, entertainment functionality expects more powerful OS platforms. The automotive industry has recognised this, as is obvious in GENIVI, which is a Linux-based OS. But two different OSes on the same processors only works if they run in separate virtual machines (VMs). Even if they run on different cores of a multicore processor, multiple cores aren’t isolated from each other unless there’s a hypervisor underneath.

Note that this consolidation is not only needed to save parts/space/weight. Increasingly tight integration of functionalities is inevitable. For example, infotainment is becoming tightly integrated with more classical automotive functions, such as navigation and driver assistance. (Example: information on road conditions, obtained from ad-hoc networks formed with other cars, may be used to change driving behaviour, say to stabilise the car when there is a chance of roads being slippery).

However, such integration cannot lead to misbehaviour of consumer-electronics parts spreading to other functionality. If some infotainment component crashes (or worse, catches a virus) you want to be damn sure this doesn’t spread to other functions.

There are other interesting aspects. Automotive components are expected to be ready and operational very quickly when the ignition key is activated. For example, nodes in the automotive CAN and MOST networks have a requirement to start up within 50 ms. GENIVI, being a Linux OS, cannot boot up anywhere near that quickly, yet it must be able to talk to the CAN/MOST bus and is subject to its requirements. The way out is either have the CAN/MOST controller driver on a separate processor (and chip), or, much cheaper, have it in a different virtual machine on the same chip.

All this is nothing compared to having to run GENIVI and AUTOSAR on the same processor, because of integration of functionalities (e.g. the instrument cluster, driven by AUTOSAR, is to display data that comes from the infotainment world). You’re really stuck there without virtualization.

Don’t believe me? Just wait, you’ll see it happening quite soon.

January 22, 2012

RedBend: There’s nothing “optional” about virtual machine isolation

A week or two ago “mobile virtualization” provider RedBend created a bit of press announcing their vLogix Mobile 5.0, which they claim is much faster to integrate than other solutions. (If you look at what architectures are supported, you know why: they target only ARM Cortex-A15 and Cortex-A7 cores, which are the ones with hardware virtualization support. Sure, if you don’t para-virtualize, things go faster. Have a look at our evaluation of ARM’s hardware support for virtualization with the first hypervisor that supports it).

So far, so boring. They also make claims that their mobile virtualization solution is deployed more widely than all the others concerned, a somewhat amusing statement, given that to date not a single product has been publicly identified that uses RedBend virtualization! If so many use it, why don’t they own up?

What most surprised me when looking at their web site is that they still have an optional isolator module. Does this sound familiar? Well, that’s exactly what VirtualLogix (which RedBend bought in Sep’10) had! I had examined this in detail 4 years ago and pointed out that isolation of virtual machines is an inherent consequence of virtualization, not an optional add-on. What was behind this is that VirtualLogix used a pseudo-virtualization approach which runs guest OSes in privileged mode, at the same privilege level as the hypervisor. Their optional “isolation mode” meant de-privileging the guest, exactly what the rest of the world calls “virtualization”.

I find this all a bit dishonest. In fact, if we were talking about a consumer product sold in Australia, I would ask the Dept of Fair Trading whether this might constitute misleading advertising…

Also, I would have thought that they would have learned how to do it properly in the meantime. Reading the description of the “optional isolator” on RedBend’s web site, it seems not.

I can only repeat my old recommendation: Take a good OS course, guys! Such as the Advanced Operating Systems course I teach at UNSW. There you’ll not only learn the concepts, you’ll also learn how to design and implement kernels so they perform without shortcuts.

September 9, 2011

Hey VMware: Secure It Ain’t!

At last week’s VMworld, VMware presented, once more, their Mobile Virtualization Platform (MVP), now called Horizon Mobile. Besides the usual hype, there were a few things that I found somewhat annoying.

Specifically, VMware’s Raj Mallempati is quoted as saying: “What VMware is going to do is provide me a corporate phone, which is a virtual machine that is completely encrypted, completely managed and secure, and they are going to deliver that onto my device.”

Even considering that it is coming from a marketing guy, I find this statement rather dishonest. Because secure it ain’t. Not for the business. Not for the owner of the phone.

Let me explain.

Insecure for the business

As I explained in a blog last year, VMware’s hypervisor is hosted inside the phone’s native Android OS kernel (which is why they call it, incorrectly, a Type-2 hypervisor). What this means is that whoever owns that OS kernel owns the VMware hypervisor, and thus the virtual machine which contains the business phone. They encrypt the business phone’s data on flash, but that doesn’t provide any protection if the native Android kernel is compromised: a compromised kernel can simply read the keys out of memory.

Hence, if an app compromises the Android kernel, it controls the business phone, including all its data, network connections, the lot. And notice that the private phone keeps functioning as normal, meaning the owner is free to install and run any arbitrary Android app. With the Android kernel comprising about a million lines of code, it can be expected to contain about 10,000 bugs. How many of the 100,000+ Android apps trigger an exploit? Probably plenty. In fact, this is the primary reason businesses don’t like company-provided handsets to be open: they fear security will be compromised.

Figure: Hybrid hypervisor. It adds a kernel module to the host and runs the guest VM like a normal app.

Insecure for the owner

But the setup isn’t secure for the phone’s owner either. It would be if VMware used a proper Type-2 hypervisor, as that would be completely untrusted from the native Android kernel’s point of view. However, as I explained in another blog last year, the MVP setup is actually neither a Type-1 nor a Type-2, but a hybrid hypervisor. It is hosted inside the host OS, not on top of it. (They wouldn’t be able to achieve acceptable performance with a Type-2.)

What this means is that VMware essentially installs a rootkit into your Android kernel, which re-directs the exception vectors to their hypervisor module. Meaning they take over your phone. Effectively, your phone is now “owned” by whoever controls the hypervisor. Which isn’t you, the owner, it’s VMware or the OEM or the network provider or your employer (or maybe all of them). All your private data is at their mercy.

And VMware go on to say that they combine this with device management software, so they can remotely wipe the phone without touching it. Only the business phone, of course. Really? Are they going to cleanly un-install the rootkit? If you just got fired, would you trust your former company with all your private data? In fact, would you trust your company with all your private data on the phone even while you’re still working for them?

Summary: It Ain’t Secure!

Not secure for the company, not secure for the phone owner. Take my Advanced OS class, guys!

November 11, 2010

VMware MVP: What it really is

Recently I had a look at what has become of VMware’s MVP and explained the security shortcomings of the Type-2 hypervisor design. Today I’m looking at VMware’s approach in more detail, and explaining why it is in fact not a real Type-2 hypervisor, and what this implies.

Type-2 hypervisors are known for poor performance. I explained the reasons in detail a while back; I’ll summarise them here (refer to the earlier blog for more details).

Figure: Virtualization mechanics in Type-1 and Type-2 hypervisors. Type-1 (left): 4 mode switches, 2 context switches. Type-2 (right): 8 mode switches, 4 context switches.

A system call performed by an application is a privileged operation which is intercepted by the hypervisor, which (after deciding that this is an operation which should be handled by the guest) forwards it to the guest OS. The return to user mode from the guest takes a similar detour through the hypervisor, as indicated in the left part of the diagram.

In the case of a Type-1 hypervisor, this results in a total of four mode switches and two context switches. However, in the case of a Type-2 hypervisor, the system call is trapped by the host OS, which delivers it to the hypervisor, and a return from the hypervisor to either the guest or the app similarly takes a detour via the host. All up, the number of mode switches and context switches is doubled, as indicated in the right part of the diagram. Further cost arises from the fact that while a Type-1 hypervisor (such as OKL4) is highly optimised for this trampolining, the host OS generally isn’t. In reality, the overhead of doing a simple system call is in the Type-2 case not just double that of the Type-1, but closer to an order of magnitude higher. This is why virtualization with a Type-2 hypervisor is generally slow. Note that ARM’s forthcoming architecture extensions to support virtualization (I’ll discuss them in a future blog) help to reduce the overheads of a Type-1 hypervisor, but do little to help a Type-2.
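The counts can be reproduced with a small model. This is a sketch based on my reading of the description above, not a measurement: each path lists the layers a system call and its return pass through, a mode switch is any crossing between privileged and unprivileged code, and a context switch is any change between successive user-level contexts.

```python
# Layers visited by one system call and its return, per the description above.
TYPE1_PATH = ["app", "hypervisor", "guest", "hypervisor", "app"]
TYPE2_PATH = ["app", "host", "hypervisor", "host", "guest",
              "host", "hypervisor", "host", "app"]

def count_switches(path, privileged):
    """Return (mode_switches, context_switches) for a path through the layers.

    `privileged` is the set of layers that run in privileged mode.
    """
    # Mode switch: privilege level differs between consecutive layers.
    mode_switches = sum(
        (a in privileged) != (b in privileged) for a, b in zip(path, path[1:])
    )
    # Context switch: consecutive user-level contexts differ.
    user_contexts = [layer for layer in path if layer not in privileged]
    context_switches = sum(
        a != b for a, b in zip(user_contexts, user_contexts[1:])
    )
    return mode_switches, context_switches

# Type-1: only the hypervisor is privileged; the guest kernel is de-privileged.
print(count_switches(TYPE1_PATH, privileged={"hypervisor"}))  # (4, 2)
# Type-2: only the host kernel is privileged; the hypervisor is a host process.
print(count_switches(TYPE2_PATH, privileged={"host"}))        # (8, 4)
```

The model only counts transitions. It says nothing about the cost of each one, which is where the unoptimised host mechanisms push the real overhead well beyond a factor of two.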

VMware understands this, and has taken a different approach in MVP, which I’ll explain now.

Fundamentally, the high cost of Type-2 virtualization stems from the fact that the hypervisor effectively consists of two parts, the host OS and the hypervisor proper, that each (logical) hypervisor invocation bounces twice between those layers, and that the host mechanisms used for this bouncing are inefficient. So, what VMware does in MVP is to merge the hypervisor back in with the host.

Figure: Hybrid hypervisor. It adds a kernel module to the host and runs the guest VM like a normal app.

This is done by loading an MVP module (called “MVPkm”) into the host OS kernel, as shown in the diagram to the right. (They discuss this for Android; it is not clear whether they plan to support other hosts, such as Windows or Symbian. If they do, they’ll have to redo the kernel module for each host.) The MVP module effectively hijacks the host, by re-writing the exception vectors, so it obtains control whenever the guest kernel is entered. (Note: this is exactly what a piece of malware would do.) The process turns the host kernel into a hypervisor.
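The idea of re-writing exception vectors can be sketched in a few lines. This is a toy model, not VMware's code: the vector table is a dictionary, and every name in it (the handlers, `load_module`, `guest_vm`, `android_app`) is hypothetical.

```python
def make_host_handler(event):
    """A stand-in for the host kernel's own handler for one exception type."""
    def handler(origin):
        return f"host kernel handles {event} from {origin}"
    return handler

# The host kernel's vector table: one handler per exception type.
vectors = {
    "syscall": make_host_handler("syscall"),
    "interrupt": make_host_handler("interrupt"),
}

def load_module(vectors, guest_contexts):
    """Re-write every vector so the module sees each exception first."""
    original = dict(vectors)  # keep the host's handlers for pass-through

    def make_hook(event):
        def hook(origin):
            if origin in guest_contexts:
                return f"hypervisor module handles {event} from {origin}"
            return original[event](origin)  # everything else goes to the host
        return hook

    for event in original:
        vectors[event] = make_hook(event)

before = vectors["syscall"]("guest_vm")
load_module(vectors, guest_contexts={"guest_vm"})
after = vectors["syscall"]("guest_vm")
host_app = vectors["syscall"]("android_app")

print(before)    # host kernel handles syscall from guest_vm
print(after)     # hypervisor module handles syscall from guest_vm
print(host_app)  # host kernel handles syscall from android_app
```

Nothing in the table distinguishes a legitimate hook from a malicious one: whoever can write to the vectors decides who handles each exception.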

The result is not really a Type-2 hypervisor any more, as it actually runs native, not on top of a host OS (but inside) and has direct control over physical resources (rather than the virtualized resources provided to it by the host). However, it isn’t a Type-1 hypervisor either, as it does not have exclusive control over the hardware: this is shared with the rest of the host, and any code inside the host kernel can interfere with the operation of the hypervisor module.

So, if this hypervisor is neither a Type-2 nor a Type-1, what is it? I call it a hybrid hypervisor, as it is somewhat of a blend of the two basic types. A better-known representative of the hybrid hypervisor type is the widely-used KVM (often falsely referred to as a Type-2 hypervisor). It operates very similarly, although KVM is dependent on virtualization extensions to the architecture (MVP is not, but can make use of them).

The hybrid hypervisor can achieve similar performance as a Type-1 hypervisor, so this scheme seems pretty neat at first glance. The problem is that this performance is bought at a heavy price.

The one advantage a Type-2 hypervisor has over a Type-1 is that it can be easily installed: for the host OS it’s just another app, and it is installed just like an app, without requiring any special privileges.

This advantage is lost with the hybrid approach. It requires inserting a kernel module into the host OS, which is a highly security-critical operation (after all, it is the same as installing a root kit into the kernel!) As such it requires special privileges. On a mobile phone it requires cooperation with the device vendor or network operator, as they try very hard to prevent the unauthorised insertion of malware-like code into the OS!

While losing the ease-of-install advantage of the Type-2 to buy Type-1-like performance, the hybrid hypervisor inherits all the other drawbacks of the Type-2 hypervisor, especially the huge size of the trusted computing base. Everything in the host OS (all of a million or so lines of code!) needs to be trusted, a huge attack surface. So, while MVP is a hybrid hypervisor rather than a real Type-2, everything about the drawbacks of VMware’s approach I discussed in the earlier blog and its successor remains valid!

In summary, the hybrid approach taken with MVP has no discernible advantage over a lightweight, high-performance Type-1 hypervisor such as OKL4. MVP still requires manufacturer/MNO cooperation to install (unlike a real Type-2). It can, in theory, reach the performance of OKL4, although I’ll believe that when I see it, given that OKL4’s performance is so much better than anything else I’ve seen. But the fundamental weakness of the hybrid approach, which it shares with proper Type-2 hypervisors, is that it adds nothing to the security of the guest apps: they are every bit as exposed as if they were running directly on the host. Which begs the question: Why bother?

Speaking of attacks, if you think carefully about it, you realise that MVP might very well increase the exposure of handsets to malware. Put yourselves in the shoes of a blackhat and think about how to get a rootkit onto a handset. If you know that a handset is provisioned to have MVP loaded on it, you know that it has provision for loading the MVP kernel module. It might well be that the easiest way to crack the system is to write a rootkit module which masquerades as MVPkm. I’ll sure stay away from such phones!

In a future blog I will investigate how each type of hypervisor does (or doesn’t) support the various use cases for mobile virtualization. Stay tuned, and drop me a line if you have questions.

October 22, 2010

VMware’s MVP—Encryption Doesn’t Make It Secure!

Last week I talked about the backwards step VMware is taking by implementing their long-overdue mobile virtualization platform (MVP) as a Type-2 hypervisor. In the meantime, an insightful blog (which liberally quotes from my blog, although without attribution) talks about their use of encryption to try to protect user (actually, enterprise) data. I’ll explain here why this is just window-dressing, providing an appearance of security rather than the real thing.

VMware say they encrypt the guest’s data on flash and also use an encrypted VPN tunnel to connect to the enterprise network. Surely, this will protect the data from attacks?

Figure: Type-1 (left) and Type-2 (right) hypervisors and system structure

Surely not. This is akin to thinking that the data on your Windows laptop is safe from rootkits because the disk is encrypted. It ain’t. Where encrypting the disk helps is if you lose your laptop and someone finds/steals it and breaks into it. If your OS gets infected by malware, it helps zilch. ‘Cause in order to be processed, the data is loaded into memory and decrypted. And there it is fully accessible by the OS, and if that OS is infected, there’s no way to stop the malware from seeing (and leaking) your data.
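A toy model makes the distinction concrete. It is a sketch only: XOR stands in for a real cipher (it is not real encryption), and the `memory` dictionary and `compromised_os_dump` function are hypothetical stand-ins for RAM and for malware running with OS privilege.

```python
def xor_bytes(data, key):
    """Placeholder cipher: XOR with a repeating key. NOT real encryption."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"k3y"
plaintext = b"sales forecast"

# Data at rest: what a thief finds on the disk or flash.
storage = xor_bytes(plaintext, key)
assert storage != plaintext          # encryption helps against theft

# To be processed, the data must be decrypted into memory,
# and the key has to be in memory too.
memory = {"key": key, "buffer": xor_bytes(storage, key)}

def compromised_os_dump(mem):
    """Malware with OS privilege can read any memory it likes."""
    return mem["buffer"]

assert compromised_os_dump(memory) == plaintext   # encryption helps zilch
```

The first assertion is the lost-laptop case, the second is the infected-OS case. The same encryption is in place both times.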

Same story on the phone with the Type-2 hypervisor. The hypervisor can encrypt the guest’s data until the cows come home; that doesn’t protect it from malware infecting the hypervisor or the host OS underneath. If the host gets cracked, the hypervisor gets cracked. If the hypervisor gets cracked, you lose. No way around this fundamental truth. And the inconvenient bit of the truth is that the host+Type-2 presents a huge attack surface. While for a well-designed Type-1 hypervisor, such as the OKL4 Microvisor, that attack surface is tiny, about two orders of magnitude smaller. Take your pick!

So, what is an MVP-style solution good for? I’ll look at this later, but first need to take a more in-depth (and rather technical) look at VMware’s approach. Stay tuned!

October 14, 2010

Much Ado About Type-2

VMware has finally lifted the lid on their long-promised mobile virtualization platform (MVP). And, surprise, it’s a Type-2 hypervisor! This is a bit of a let-down, and has some interesting implications on what MVP can (or rather cannot) do, which I’m going to explore in a few blogs.

First a bit of background. Observers of the mobile virtualization space will remember that about two years ago, VMware, better known for server and desktop virtualization products, bought our then competitor Trango. At the time they promised MVP-based products “should arrive in around 12 to 18 months”. That’s phones with MVP on them. Almost 24 months later, there isn’t even a product announcement for MVP. It’s been a bit like waiting for Godot…

In the meantime, the OKL4 Microvisor has been around for yonks. It’s available, it’s benchmarkable, it’s being deployed—it’s real. And, as befits something with “L4” in the name, it defines the state of the art of hypervisors for embedded systems.

Well, at last (least?) VMware presented their vision, accompanied by a demo, at a BOF at last week’s OSDI conference in Vancouver. Not exactly a high-profile announcement. And it’s a Type-2 hypervisor!

I’ve discussed Type-1 vs Type-2 in a blog a year ago, and another one a few months earlier, and will probably explore this topic a bit more in a future blog. For now I’ll focus on what VMware is trying to sell, and why it actually doesn’t solve the problem they claim they are addressing. Further technical discussion will look at why they are taking this particular stance. (Hint: If all you’ve got is a hammer, everything looks like a nail. Even an egg…)

Hypervisors (also called virtual machine monitors) are designed to provide multiple virtual machines which can each run an OS with all of its apps. The fundamental difference between a Type-1 hypervisor (such as OKL4) and a Type-2 is that the former runs on bare metal, between the hardware and the operating system(s). In contrast, a Type-2 hypervisor runs on top of an OS (which is why it’s also called a “hosted” hypervisor).

That difference is much more significant than it may seem. It implies a completely different relationship between the hypervisor and the various operating systems. With Type-1, the hypervisor is master, it controls the OSes (called “guests”). With Type-2, the master is an OS (the one which hosts the hypervisor), it controls the hypervisor, which can only control the other OSes. Keep this in mind.

So, what problems is VMware (pretending to be) solving with their Type-2 hypervisor? The main use case they are highlighting is BYOD, “bring your own device”. (Yes, they adopted the terminology we introduced 18 months ago. Good on them!)

The motivation for BYOD is that smartphones have business as well as private use. People like to control their private phones: They want to decide on the type and model, and they want to install their choice of apps. In contrast, companies like control over the phones used for business: They want to decide the model (ideally a single one for everybody) and what software runs on them. This forces an increasing number of people to carry two phones, business and private.

The idea of BYOD is that a single phone can serve both purposes: a person buys a phone of their choice, takes it to their company’s IT dudes, and they install a virtual business phone on the BYOD handset. Sounds great, doesn’t it?

The devil is in the detail, and it’s those details which make MVP a non-solution.

Why do companies want control over the phone? There’s only one reason: security. The whole point of issuing smartphones to employees is to keep them linked into the enterprise IT infrastructure while they are on the move. Traditionally this is all about email, address books and calendars, but increasingly it is a much deeper integration, enabling the phone to access employee records, sales databases, engineering designs—anything you’d access from your computer screen in the office.

So, the bottom line is that companies are worried about the security and integrity of their data when accessed via the mobile device (phone, tablet or whatever it might be). They are worried that accessing this critical data from an uncontrolled phone puts the critical enterprise information at risk. And they are right: phones do get infected by malware, and with each application installed, the risk of infection increases. This is the core challenge BYOD must address.

Surely, VMware understands this? Maybe they do, but if so, why do they offer a solution which doesn’t cut the mustard?

The reason I say this is that the BYOD model VMware is propagating does nothing to solve this fundamental security issue, while OKL4 does.

Figure: Type-1 (left) and Type-2 (right) hypervisors and system structure

This is illustrated in the figures at the left. With OKL4, the (Type-1) hypervisor is in control of all hardware. It isolates the VMs and their OSes from each other. If the user gets their private OS infected, that’s tough for them, but the infection cannot spread across VMs to the business environment. In order to subvert this, the attacker either has to have already subverted some of the enterprise IT infrastructure (thus coming in from the business side into the business OS) or has to attack the hypervisor from the private VM. But the hypervisor has an extremely small attack surface! The hypervisor is very small (about 10,000 lines of code). Technically speaking, the business VM has a small trusted computing base (TCB).

In VMware’s Type-2 model, it’s quite different. The business environment is controlled by the hypervisor, which is controlled by the host OS (the one that comes with the BYOD phone). If this gets cracked, as it inevitably will, then it’s trivial to crack the hypervisor, and then you control the business OS! The reason this is easy to crack is that in this setup, the business OS has a huge TCB. It includes the complete private OS, which likely comprises upwards of 1,000,000 lines of code, two orders of magnitude more than OKL4!
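To put numbers on this, here is a minimal Python sketch that treats the TCB of the business VM as the sum of everything it has to trust. The line counts are the rough figures quoted above; the component names and the `tcb_loc` helper are made up for illustration.

```python
# Rough code sizes quoted in the text (lines of code); order of magnitude only.
OKL4_HYPERVISOR_LOC = 10_000
PRIVATE_OS_LOC = 1_000_000  # "upwards of", so treat this as a lower bound


def tcb_loc(trusted_components):
    """Total size of everything the business VM must trust."""
    return sum(trusted_components.values())


# Type-1: the business VM only has to trust the hypervisor underneath it.
type1_tcb = tcb_loc({"hypervisor": OKL4_HYPERVISOR_LOC})

# Type-2: the business VM has to trust the complete host (private) OS.
# The hosted hypervisor itself would come on top; it is left out here
# because the text gives no size for it.
type2_tcb = tcb_loc({"private_os": PRIVATE_OS_LOC})

print(type2_tcb // type1_tcb)  # prints 100
```

Since the private OS figure is a lower bound and the hosted hypervisor is not counted at all, the real ratio can only be worse for the Type-2 setup.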

Now remember where we’re coming from. The original motivation for BYOD was that companies don’t trust people’s private phones with critical business data, because these phones get cracked, which would compromise the business data. The idea of BYOD, as promoted by OK Labs, is to provide a virtual business phone on the private handset which is just as secure as if it was a physically separate handset.

If you followed my argumentation above, you’ll see that VMware’s solution is not a bit more secure than allowing people to access the business data through their normal private phones, without the detour via a hypervisor. In other words, MVP adds nothing to security. So why would you pay for it then? You might as well cut out the middle man and allow people to access the enterprise IT system from their unmodified private phones. Security-wise, there is no difference whatsoever.

At OK Labs, we believe that security isn’t something that’s solved with PR. It requires a technically-sound approach. It requires a minimal TCB. It requires OKL4.

Stay tuned for a more in-depth look at these issues.

August 31, 2009

Why multicore needs virtualization

System virtualization is increasingly being used in embedded systems for a variety of reasons, mostly anticipated in a paper I wrote last year. However, the most visible use case is probably still processor consolidation, as exemplified by our Motorola Evoke deployment. Given that the incremental cost of a processor core is shrinking, and likely to go to zero, this makes some people think that the use of hypervisors in embedded systems is a temporary phenomenon, which will become obsolete as multicore technology becomes the standard. These people are quite wrong: in embedded systems, multicore chips will depend on efficient hypervisors for effective resource management.

In order to explain this prediction, let’s look at a few trends:

  1. Embedded systems, particularly but not only in the mobile wireless space, tend to run multiple operating systems to support the requirements of different subsystems. Typically this is a low-level real-time environment supported by an RTOS, and a high-level application environment supported by a “rich OS”, such as Linux, Symbian, or Windows. This OS diversity will not go away, it will become universal.
  2. Energy is a valuable resource on mobile devices and must be managed effectively. Key to energy management is to provide the right amount of hardware resources, not more, not less. The most effective way of reducing energy consumption on a multicore is to shut down idle cores: the gain far exceeds that possible by other means such as dynamic voltage and frequency scaling (DVFS). This gap will become more pronounced in the future: on the one hand, shrinking core voltage squeezes the energy-savings potential of DVFS, while on the other hand, increasing numbers of cores mean that the energy-saving potential of shutting down cores increases, while at the same time it becomes a more fine-grained mechanism.
  3. Increasing numbers of cores on the SoC will encourage designs where particular subsystems or functionalities are given their own core (or cores). Some of these functions (e.g. media processors) will use a core in an essentially binary fashion: full throttle or not at all. These are easy to manage. However, other functions impose a varying load, ranging from a share of a single core to saturating multiple cores. Managing energy for such functions is much harder.

Because of point (2), (3) is best addressed by allocating shares of cores to functions (where a share can be anything from a small fraction of one to a small integer). Sounds like a simple time-sharing issue: you have a bunch of cores and you share them on demand between apps, turning off the ones you don’t need. Classical OS job, right?

Yes, but there’s a catch. Multiple, in fact.

For one, existing OSes aren’t very good at resource management. In fact, they are quite hopeless in many respects. If OSes did a decent job at resource management, virtualization in the server space would be mostly a non-event (in the server space, virtualization is mostly used for resource management). Embedded OSes aren’t better at this than server OSes (if anything they are probably worse).

Now combine this with point (1) above, and you’ll see that the problem goes beyond what the individual OS can do (even if the vendors actually fixed them, which isn’t going to happen in a hurry). In order to manage energy effectively, it must be possible to allocate shares of the same core to functionality supported by different OSes.

Say you have a real-time subsystem (your 3G modem stack) that requires two cores when load is high, but never more than 0.2 cores during periods of low load. And say you have your multimedia stack which requires up to four cores at full load, and zero if no media is displayed. And you have a GUI stack that uses between half and two cores while user interaction takes place (zero when there’s none). Clearly, while the user is just wading through menus, only about 2/3 of the power of one core is required (0.2 for the modem plus 0.5 for the GUI), but there are still two OSes involved. Without virtualization, you’ll need to run two cores, each at half power or less. With virtualization, you can do everything on a single core, and the overall energy use of the single core running at 2/3 of its power will be less than the combined energy gobbled up by two cores running on low throttle. (And on top you have the usual isolation requirements that make virtualization attractive on a single core.)
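The core counting in this example can be sketched in a few lines of Python. The demand figures are the ones from the paragraph above; the assignment of stacks to OSes (modem on the RTOS, GUI and media on the rich OS) and all names are assumptions made for illustration, and the sketch only counts cores, it does not model energy.

```python
# Low-load demand from the example, in tenths of a core
# (integers avoid floating-point rounding surprises).
# Which stack runs on which OS is an assumption for illustration.
TENTHS_PER_CORE = 10
demand_by_os = {
    "rtos": {"modem": 2},               # 0.2 cores
    "rich_os": {"gui": 5, "media": 0},  # 0.5 cores, no media playing
}


def ceil_cores(tenths):
    """Round a demand in tenths of a core up to whole cores."""
    return (tenths + TENTHS_PER_CORE - 1) // TENTHS_PER_CORE


def cores_without_virtualization(demand):
    """Each OS owns whole cores, so every busy OS rounds up separately."""
    return sum(ceil_cores(sum(stacks.values())) for stacks in demand.values())


def cores_with_virtualization(demand):
    """A hypervisor can time-share cores between OSes: round up only once."""
    return ceil_cores(sum(sum(stacks.values()) for stacks in demand.values()))


print(cores_without_virtualization(demand_by_os))  # prints 2
print(cores_with_virtualization(demand_by_os))     # prints 1
```

At full load (two cores for the modem, four for media, two for the GUI) both functions give eight cores, so the saving shows up exactly where it matters for battery life: when demand is low.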

In a nutshell, the growing hardware and software complexity, combined with the need to minimise energy consumption, creates a challenge which isn’t going to be resolved inside the OS. It requires an indirection layer, which is provided by a hypervisor. The hypervisor maps physical resources (physical cores) to virtual resources (logical processors seen by the guest OSes). This not only makes it easy to add or remove physical resources to particular subsystems (something OSes are notoriously bad at dealing with), but can further consolidate the complete system onto a single core, shared by multiple OSes, when demand is low.
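As a rough illustration of that indirection layer, the following Python sketch packs virtual CPUs onto as few physical cores as it can, so that the remaining cores can be shut down. The first-fit-decreasing heuristic, the vCPU names and the load figures are all assumptions for illustration; this is not a description of how OKL4 schedules.

```python
TENTHS_PER_CORE = 10


def place_vcpus(vcpu_load):
    """Pack virtual CPUs onto physical cores, first-fit-decreasing.

    vcpu_load maps a vCPU name to its current load in tenths of a core.
    Returns a list of cores; each core is the list of vCPUs placed on it.
    Physical cores that do not appear in the result can be powered down.
    """
    cores = []  # vCPU names per physical core
    free = []   # remaining capacity per physical core, in tenths
    for name, load in sorted(vcpu_load.items(), key=lambda kv: -kv[1]):
        if load > TENTHS_PER_CORE:
            raise ValueError(f"{name} needs more than one core")
        if load == 0:
            continue  # an idle vCPU needs no physical core
        for i, capacity in enumerate(free):
            if load <= capacity:
                cores[i].append(name)
                free[i] -= load
                break
        else:
            # No existing core has room: bring another one online.
            cores.append([name])
            free.append(TENTHS_PER_CORE - load)
    return cores


# Low load: modem ticking over, user wading through menus.
low = {"rtos.cpu0": 2, "rtos.cpu1": 0, "linux.cpu0": 5, "linux.cpu1": 0}
print(place_vcpus(low))  # [['linux.cpu0', 'rtos.cpu0']]
```

The guests keep seeing the same logical processors throughout; only the mapping underneath changes as demand goes up and down.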

How about heterogeneous multicores? I’ll leave this as an exercise for the reader ;-)

August 23, 2009

Type-1 vs Type-2 — Does it matter?

A while back I discussed how one of our competitors (in an article which is great entertainment material for all the stuff it gets wrong) had falsely claimed that OKL4 was a Type-2 hypervisor. The Type-1 vs 2 issue has since come up a few times in different contexts, and there seems to be a bit of confusion out there. So let me explain why no-one in their right mind would consider using a Type-2 hypervisor in a mobile phone.

A Type-2 hypervisor runs as a normal application on top of a normal OS, which is why it’s also called a hosted hypervisor. This is great for PCs, as it allows you, for example, to run Linux or Windows inside an application on a Mac, getting a bit of the best of two worlds. To a degree, at least. Anyone who does this (for example I run Linux on VMware Fusion on a Mac) will not fail to notice that some things are clearly much slower than they are on native Linux, or even on Linux running inside a virtual machine on a Type-1 (or bare metal) hypervisor. In fact, while the performance difference between native Linux and Linux running in a Type-1 virtual machine is barely noticeable, the performance degradation on a hosted hypervisor is definitely significant.

The reason is simple: with a hosted hypervisor, you need to go through many more layers of software. For one, the inherent virtualization cost is at least doubled. A syscall on a natively running OS inherently costs two mode switches. Virtualized on a bare-metal hypervisor this becomes four mode switches and two context switches. On a hosted hypervisor this blows out to eight mode switches and four context switches. All that just for getting in and out of the guest kernel. My earlier blog has a more detailed explanation.
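The "at least doubled" follows directly from those counts, whatever the individual switches cost on a given processor. Here is a small Python sketch, with the switch counts taken from the paragraph above and the per-switch costs left as parameters, since no actual cycle counts are given here. The class and constant names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyscallPath:
    mode_switches: int
    context_switches: int

    def cost(self, mode_switch_cost, context_switch_cost):
        """Switching cost of getting in and out of the (guest) kernel."""
        return (self.mode_switches * mode_switch_cost
                + self.context_switches * context_switch_cost)


# Switch counts as stated in the text. The native case mentions no
# context switches, so zero is assumed here.
NATIVE = SyscallPath(mode_switches=2, context_switches=0)
BARE_METAL = SyscallPath(mode_switches=4, context_switches=2)
HOSTED = SyscallPath(mode_switches=8, context_switches=4)

# Placeholder costs in arbitrary units, not measurements.
m, c = 3, 7
print(HOSTED.cost(m, c) == 2 * BARE_METAL.cost(m, c))  # prints True
```

Because both counts double, the hosted path costs exactly twice the bare-metal path for any choice of `m` and `c`; this covers only the switches themselves, not the extra work the host OS does along the way.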

But what has a much bigger impact on performance is the fact that in a Type-2 scenario, the underlying OS effectively becomes part of the hypervisor, and it isn’t designed for that. Anyone who ever played with user-mode Linux (UML), which is a Type-1 scenario but using the general-purpose Linux kernel as the hypervisor, will confirm this. The performance just isn’t competitive, despite special hacks having been made to Linux to make UML more efficient. So, the bottom line is that Type-2 hypervisors simply can’t compete with Type-1 hypervisors in performance.

So, why would anyone in their right mind use one for mobile phones? Beats me. If you look at the typical use cases for virtualization in mobile wireless devices, you’ll see that in many of them a hosted hypervisor is simply not suitable at all. In the cases where a hosted hypervisor could be used, it has no compelling advantage over a native hypervisor, but a compelling performance disadvantage. Let’s look at the mobile virtualisation use cases:

  • Processor consolidation: no way, José. In the typical rich-OS + RTOS scenario, are you going to host the hypervisor on the RTOS? Most of them don’t even support memory protection, let alone virtualization! Or host the hypervisor on the rich OS, running the RTOS on top? Clearly you’d lose the real-time properties for which you have the RTOS in the first place.
  • License separation, especially of GPL code: won’t help you with re-using Linux drivers, and will defeat most of the purpose
  • Security: yes, a hosted hypervisor will provide encapsulation, although at a much higher cost than with a native hypervisor, so why bother?
  • Architectural abstraction: yes, but only if the underlying host OS plays ball. Again, cut out the middleman and you’ve got a winner.
  • Resource-management for upcoming manycores: you lose with a hosted hypervisor, it buys you nothing there.
  • Multiple user-environments (private and enterprise) and BYOD? Trying to do this with a hosted hypervisor would degrade at least one of the environments to second-class citizen status. Not only performance-wise, but also security-wise: the primary environment (which is hosting the hypervisor which supports the secondary environment) is in control of resources. This means that it would be the one the enterprise IT folks would trust and need to control. And the complete BYOD idea goes right out of the window. Clearly a non-starter.
  • The same can be said about other approaches to using the phone as a terminal to access the enterprise IT infrastructure: Trying to do this in a hosted VM means you need to trust the host OS. The whole point is you don’t want to do this.

See what I mean? For all the use cases people talk about, a Type-2 hypervisor is either totally unsuited, or is a clearly second-rate solution compared to a Type-1 hypervisor. No-one with half a clue would want to do this. If you can think of reasonable use cases for hosted VMs, you’ll find that they are adequately supported by Java. Except that using a JVM allows a much leaner solution than a Type-2 hypervisor running on a rich OS.

You’ll likely get better mileage by using Java than a Type-2 hypervisor. But the Type-1 hypervisor is clearly the way to go. This is what OK does, competitor FUD notwithstanding.

July 27, 2009

Context switching in context

My Evoke white paper has triggered an unusual number of comments, all to do with the brief discussion of fast context-switching and the graph reproduced here.

Context-switching latency as a function of the number of processes

Native Linux vs OK:Linux context-switching cost on ARM9 (smaller is better)

However, the unusually large number of comments and queries tells me that I should provide a bit more explanation. Here we go (although it can really all be found in the referenced publications too…)

The graph shows that context-switching costs in OK:Linux are significantly lower than in native Linux on an ARM9 platform, but the gap narrows as the number of processes exceeds 8, and the graph ends at 16. And I said that “the improvement is particularly large if the number of presently active processes is small—a typical situation for mobile phones.”

First, I have to admit I was a bit sloppy when talking about “active processes”, which many people (not unreasonably) interpreted as referring to the total number of non-zombie processes in the system. I should have used the technically accurate formulation “process working set”, but didn’t want to get too technical. Clearly, you should never be sloppy…

What does the “process working set” mean? The working set is a technical term in operating systems referring to the subset of a particular resource that is in active use over a certain (small) window of time. It can be applied to main memory, cache memory, processes or others. In this case the “process working set” is the set of processes which execute on the CPU in a particular, small, time window.

How small is “small”? That depends (and I remember that as a student I was frustrated that the prof didn’t provide a good answer to that question). Given the numbers we are talking about here (order of 10 processes, and order of 0.25ms overhead in native Linux) it makes sense to look at a few milliseconds. Given that the default time slice of Linux is 10ms, let’s say we’re looking at a window of a single time slice.
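If you want to see what that definition means operationally, here is a minimal Python sketch that computes the process working set per window from a scheduling trace. The trace format (pairs of timestamp and pid) and the sample data are made up for illustration; the 10ms window is the time slice mentioned above.

```python
from collections import defaultdict


def process_working_sets(trace, window_ms=10):
    """Group a scheduling trace into fixed-size time windows.

    trace is an iterable of (timestamp_ms, pid) pairs, one for each moment
    a process was observed running on the CPU.
    Returns {window_index: set_of_pids}.
    """
    windows = defaultdict(set)
    for timestamp_ms, pid in trace:
        windows[int(timestamp_ms // window_ms)].add(pid)
    return dict(windows)


# Made-up trace: pid 7 runs a whole time slice undisturbed, then an
# interrupt wakes pid 3, which briefly preempts pid 7.
trace = [(0, 7), (4, 7), (9, 7), (10, 7), (12, 3), (13, 7), (19, 7)]
sizes = {w: len(pids) for w, pids in process_working_sets(trace).items()}
print(sizes)  # prints {0: 1, 1: 2}
```

Note that the total number of processes in the system never enters the calculation: a process that stays blocked for the whole window simply does not show up in the trace.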

So, how big is the process working set in a typical (embedded) Linux system? Small. Remember, as long as a running process doesn’t block (waiting on a resource, such as I/O or a lock) or get preempted (as a result of an interrupt) it will execute for its full time slice. The working set size in this case is one. On my Linux laptop there are at any time at least 200 processes, but almost all of those are blocked waiting for some event (such as standard input or a mouse click). The number of processes running during a time slice will be small, I’d say it’s less than ten almost all the time. On a phone it’s likely to be less than on my laptop. Phones are becoming more and more like laptops, but they really aren’t doing as much as a typical laptop with dozens of windows and all those background processes our highly-bloated desktop environments are running.

So, I clearly stand by my claim, disputed by some, that the process working set on a phone is typically small, and normally much less than ten. Which implies that the context-switching overhead of OK:Linux is about an order of magnitude less than that of native Linux.

What if it does get bigger occasionally? The graph ended at 16, does that mean OK:Linux cannot have more than 16 processes in the working set?

Nope, there is no such limitation. OK:Linux supports as many processes as native Linux does, no matter how many of them form the working set.

Those of you who have read the FASS papers (referenced by the white paper) will know that our fast context switching is based on the use of a feature of the ARM9 MMU called “domains”, and there are only 16 of them (and one is reserved for kernel use, so there are 15 available). So, what if we have more than 15 processes? Well, we do what any decent OS does if it runs out of a resource: we recycle. So, we use “domain preemption” to share a limited number of domains among a greater number of processes. That has a cost, but it’s still better than not using domains at all, as the graph also shows: With 16 processes the latency is still only about 2/3 of that of native Linux. Once the process working set size gets really large, OK:Linux overheads end up a bit higher than those of native Linux. But I’ve never seen a mobile system with such a large process working set (remember, my busy laptop doesn’t even get there, how would a phone?)
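The recycling idea can be sketched in Python as follows. This is only an illustration of sharing 15 domains among more processes: the least-recently-used victim choice, the class and method names, and the choice of domain 0 for the kernel are all assumptions, and the actual policy is the one described in the FASS papers.

```python
from collections import OrderedDict

NUM_DOMAINS = 16
KERNEL_DOMAIN = 0  # which domain ID the kernel reserves is an assumption


class DomainAllocator:
    """Share the 15 user domains among any number of processes (LRU sketch)."""

    def __init__(self):
        self.free = [d for d in range(NUM_DOMAINS) if d != KERNEL_DOMAIN]
        self.owner = OrderedDict()  # pid -> domain, least recently used first
        self.preemptions = 0

    def domain_for(self, pid):
        """Return the domain for pid, recycling one if none is free."""
        if pid in self.owner:
            self.owner.move_to_end(pid)  # mark as most recently used
            return self.owner[pid]
        if self.free:
            domain = self.free.pop()
        else:
            # Domain preemption: take the domain away from the process that
            # has not run for the longest time. The real system has extra
            # work to do at this point (the cost mentioned in the text);
            # this sketch only counts the event.
            _victim, domain = self.owner.popitem(last=False)
            self.preemptions += 1
        self.owner[pid] = domain
        return domain


alloc = DomainAllocator()
for pid in range(15):
    alloc.domain_for(pid)
print(alloc.preemptions)  # prints 0
alloc.domain_for(15)      # a 16th process forces one domain to be recycled
print(alloc.preemptions)  # prints 1
```

As long as the process working set fits into the 15 user domains, switching between its members never triggers a preemption, no matter how many other processes exist in the system.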

But, of course, you don’t have to believe me, you can see for yourself. Check out the Evoke and its snappy UI, and tell me whether you’ve seen a phone with similar functionality, running on an ARM9 processor (even a dedicated one) that does better!
