Virtualization: Some get it, some don’t
This is the fourth (and hopefully last) installment of my dissection of a paper our competitors at VirtualLogix had written and presented at IEEE CCNC last January. The paper compares what the authors call the “hypervisor” and “micro-kernel” approaches to virtualization.
In the previous blog posts I explained how their flawed benchmarking approach makes their results worthless, and that their idea of a microkernel represents a 1980s view that was superseded more than 15 years ago, giving specific examples. Now I’ll look at what they say about virtualization and hypervisors, which, as it turns out, is similarly nonsensical.
Their explanation of virtualization approaches starts off with a somewhat amusing statement: they claim that for pure virtualization (i.e. without modifying the guest OS) you either need hardware support that’s only available on the latest processors, or you need to use binary translation. That will come as a surprise to the folks at IBM, who’ve done pure virtualization without binary translation for 40+ years…
What they apparently mean (but fail to say) is that many recent processors are not trap-and-emulate virtualizable, and therefore require one of the above.
Much more concerning is that they present OKL4 as a Type-II (aka “hosted”) hypervisor — these are generally reputed to perform poorly compared to Type-I (aka bare-metal) hypervisors, so the reader is at this stage primed to expect poor performance from OKL4.
This is quite misleading. A Type-II hypervisor runs as an application program on top of a full-blown OS. The virtual machine runs on top of that, and the Type-II hypervisor intercepts the VM’s attempts to perform privileged operations — typically by using the host OS’s debugging API — an expensive process. A Type-I hypervisor, in contrast, runs on the bare hardware.
A critical difference is in the number of mode switches and context switches required to virtualise an operation performed by the virtual machine. Say the app running in the virtual machine (let’s call it the guest app) is executing a system call, which causes a trap into privileged mode. This is shown in the figure below.
In the Type-I case, the syscall invokes the hypervisor. The hypervisor examines the trap cause, finds that it was a syscall meant for the guest, and invokes the guest’s trap handler. The guest delivers the service and executes a return-from-exception instruction (in the case of pure virtualization) or an explicit hypercall (in the case of para-virtualization). In either case, the hypervisor is entered again to restore the guest app’s state. In total this requires four mode switches (app-hypervisor-guest-hypervisor-app) and two switches of addressing context (app-guest-app).
In the Type-II case, the trap invokes the host OS. It notices that a “debugger” (the hypervisor) has registered a callback for syscalls, so it invokes the hypervisor. The hypervisor recognises that this is an event for the guest OS, so it asks the host OS (via a syscall) to transfer control to the guest. The guest delivers the service and attempts to return to its app. This again traps into the host, which invokes the hypervisor, which invokes the host to return to the app. In total, 8 mode switches (app-host-hypervisor-host-guest-host-hypervisor-host-app) and 4 context switches (app-hypervisor-guest-hypervisor-app). Obviously much more expensive.
So, how does that compare to virtualization using OKL4? This is what happens if an OK Linux app performs a system call: The app causes a trap, which invokes the OKL4 kernel. The kernel notices that this is not a valid OKL4 system call, so it treats it as an exception, which is handled by sending an exception IPC to the app’s exception-handler thread. This happens to be the “syscall thread” in the OK Linux server. OK Linux delivers the service, and invokes the OKL4 kernel to return to the app. In total, 4 mode switches (app-OKL4-Linux-OKL4-app) and 2 context switches (app-Linux-app).
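The arithmetic in the three scenarios above is easy to check mechanically. Here is a small Python sketch — purely illustrative, with the event sequences transcribed from the scenarios just described, not from any real hypervisor code — that tallies mode switches and addressing-context switches along each syscall path. The key modelling assumption (which matches the accounting above) is that a Type-I hypervisor or the OKL4 kernel is mapped into every address space, so entering it changes mode but not addressing context, whereas a Type-II hypervisor is an ordinary user process with its own address space:

```python
def count_switches(path):
    """path: list of (component, addr_space) hops taken while handling
    one guest-app syscall. addr_space None means the component runs in
    the current address space (a kernel mapped into every task, like a
    Type-I hypervisor or OKL4)."""
    mode_switches = len(path) - 1          # every hop crosses a mode boundary
    ctx_switches = 0
    current = path[0][1]
    for _, space in path[1:]:
        if space is not None and space != current:
            ctx_switches += 1
            current = space
    return mode_switches, ctx_switches

# Type-I: hypervisor entered on every trap, but no address-space change.
type1 = [("app", "app"), ("hypervisor", None), ("guest", "guest"),
         ("hypervisor", None), ("app", "app")]

# Type-II: the hypervisor lives in its own address space; the host
# kernel mediates every transition.
type2 = [("app", "app"), ("host", None), ("hypervisor", "hyp"),
         ("host", None), ("guest", "guest"), ("host", None),
         ("hypervisor", "hyp"), ("host", None), ("app", "app")]

# OKL4: the microkernel behaves like a Type-I hypervisor; the Linux
# server is a separate address space.
okl4 = [("app", "app"), ("OKL4", None), ("linux-server", "linux"),
        ("OKL4", None), ("app", "app")]

for name, path in [("Type-I", type1), ("Type-II", type2), ("OKL4", okl4)]:
    modes, ctxs = count_switches(path)
    print("%-8s %d mode switches, %d context switches" % (name, modes, ctxs))
```

Running this reproduces the counts above: 4/2 for Type-I, 8/4 for Type-II, and 4/2 for OKL4 — identical to Type-I.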
Does this sound like a Type-II hypervisor? You’ll note that the expensive operations which determine the virtualization overheads (mode changes and context switches) are exactly those of a Type-I hypervisor. And the statement made in the paper that “the result of implementing virtualization on a micro-kernel host is a level of complexity similar to fully hosted VMs with high performance impact”? I think you’ll agree with me that this is utter nonsense.
Then there’s a truly stunning statement: “The micro-kernel is involved in every memory-context switch and the guest OS must be heavily modified to allow this.” The non-expert reader can be forgiven for interpreting this statement as implying that a “hypervisor” doesn’t have to be involved in every context switch. How is that possible? A context switch is clearly a privileged operation, as it changes access rights to physical memory. The hypervisor (and this is consistent with the definition given in the VirtualLogix paper) must have ultimate control over physical resources, and thus must be involved in each context switch.
Anyone who doubts this clearly doesn’t understand virtualization. So, this is an interesting statement from a virtualization provider. Folks, you really should take UNSW’s Advanced OS course! (Note that it would be possible to build hardware that knows about multiple user contexts, and can switch between them in certain situations without software interaction. But no ARM processor has such hardware.)
Finally, there’s the claim that para-virtualizing a guest with the “microkernel approach” (read “OKL4”) requires more modifications than with the “hypervisor approach” (read “VLX”). And this is supported by another apples-vs-oranges comparison: it is stated that para-virtualizing Linux 2.6.13 on a particular ARM platform for VLX required changing 51 files and adding 40, while OK Linux “required” a new architecture tree comprising 202 new files, 108 of them ARM-specific.
What they fail to mention is that this is a (well-documented) design choice of OK Linux. When para-virtualizing Linux, we chose to make the para-virtualization as independent of the processor architecture as possible. Hence we decided to introduce a new L4 architecture, and port to that one, with most processor-specifics hidden by the microkernel. Therefore the comparison made in the paper cannot be considered fair. Fairer would be to compare the sum of changes required to maintain para-virtualized Linux on several architectures.
But this isn’t the full story either. Just counting “added files” is a poor (and unscientific) metric. How big are those files? Once the first port is done, how many of them will ever need changing as Linux evolves? Without addressing those (and similar) questions, this comparison is completely meaningless.
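To see how misleading a raw file count can be, consider a toy Python sketch. All file names and numbers below are made up for illustration (they are not taken from either code base): two hypothetical porting styles with almost identical total churn — one scattering edits through existing files, one concentrating the port in new, self-contained files — produce wildly different “new file” counts:

```python
# A patch set modelled as a list of (filename, lines_added, lines_removed).
# Everything below is hypothetical, for illustration only.

def summarise(patches):
    files_touched = len(patches)
    new_files = sum(1 for _, added, removed in patches if removed == 0)
    churn = sum(added + removed for _, added, removed in patches)
    return files_touched, new_files, churn

# Style A: scatter moderate changes through existing architecture files.
scatter = [("arch/arm/existing_%d.c" % i, 40, 25) for i in range(50)]

# Style B: concentrate the port in brand-new files (an L4-style arch tree).
concentrate = [("arch/l4/new_%d.c" % i, 16, 0) for i in range(200)]

print(summarise(scatter))      # 50 files touched, 0 "new" files
print(summarise(concentrate))  # 200 files touched, all "new"
```

Both styles land within a couple of percent of each other in total lines changed (3250 vs 3200 here), yet by a “files added” metric the second looks four times worse — which is exactly the kind of distortion the comparison in the paper invites.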
Finally (and worst of all), the authors fail to mention another rather relevant fact. OK Linux on ARM9 processors is somewhat special: it contains support for fast context switching, the trick we use to make para-virtualized Linux do context switches 1–2 orders of magnitude faster than native Linux, without sacrificing security. Obviously, this requires supporting code (a fair bit of it, in fact). I’m not aware of VLX supporting this feature. So, this is apples-vs-oranges again.
Now, remember my first blog in this series. There I noted that they used an ARM11 platform for the performance comparison, and I stated that OKL4 wasn’t at the time particularly optimised on that platform, but highly optimised on ARM9. So, isn’t it interesting that for comparing code changes they actually use the ARM9 version (which requires far more extensive support code for fast context switching)? Can this really be an accident?
It’s amazing, isn’t it? Discussing what’s wrong with that paper produced almost as much text as the paper itself (and I’ve ignored plenty of minor things). And I would expect any of my students to be able to do this sort of dissection. You’ll probably understand that I think the reviewers of that paper were totally out of their depth.
In sum, that paper is utterly worthless, and almost all conclusions drawn in it are totally wrong.