Skip to content

How to (and how not to) use seL4 IPC

2019/03/07

IPC is a message passing mechanism, right? After all, it’s an acronym for inter-process communication?

That’s historically true, but seL4 IPC has come a long way, and its IPC primitive is quite a different beast from message-passing primitives offered by other OSes, even quite different from the original L4 IPC.

A little bit of history

Jochen Liedtke’s L4 microkernel, developed in the mid-90s, was the first that showed that IPC could be fast, an incredible factor of 10–20 over all contemporaries. And its IPC was a highly overloaded mechanism, providing:

  • by-value data transfer, including large and unaligned buffers
  • page mappings in power-of-two multiples of the base page size
  • synchronisation.

The last was a by-product of the performance-driven rendezvous design that implicitly synchronised sender and receiver, but was the only way to synchronise threads that did not share memory. L4 IPC also had timeouts to prevent denial of service.

Over the years we removed much of the L4 IPC functionality, mostly to simplify the kernel, but also the programming model. For example, L4 forced a multi-threaded design on many applications even on a single core. This is bad policy that forces concurrency control onto code that is not meant to be concurrent. In seL4 we simplified things to the bare minimum. As a consequence, the nature of IPC changed dramatically from the original L4.

What seL4 IPC is not

When dealing with seL4, you probably need to forget just about everything you may have learned about IPC from other systems. In particular, you should forget that it is an acronym. Think of it as a term like IBM, ARM or ACM, these names have far outlived the original meanings. The same is true for IPC in seL4. In particular:

  • seL4 IPC is not a mechanism for shipping data
  • seL4 IPC is not a mechanism for synchronising activities.

So, what is it then?

A great, time-honoured definition (that actually predates seL4) was given by my then student Chuck Gray:

IPC is a user-controlled context switch with benefits.

You switch to a different thread (usually in a different address space), without involving the scheduler. The benefit is that you get to carry along a bit of data. That definition captures the bare-bonedness of seL4 IPC, which is in line with the general seL4 philosophy that the microkernel is a minimal software veneer for securely multiplexing hardware.

A more utilitarian definition is this:

IPC is the seL4 mechanism for implementing cross-domain function calls.

You should really think of IPC in those terms, and only those. Something like an RPC mechanism, except you don’t go across networks, only protection-domain boundaries.

One of the implications of this definition is that there are really two main APIs for IPC:

  1. result = call (function, arguments)
  2. client = reply&wait (client, result, &arguments)

where function is the capability of the (server) function you invoke. Present mainline has the clients represented implicitly, but this being fixed with the introduction of explicit reply capabilities in the new mixed-criticality (MCS) kernel that is making its way through verification, and will eventually replace mainline. The MCS kernel makes invocation more like a function call, by providing passive servers, effectively resulting in a migrating-threads model. More on that below.

Any other IPC APIs are only for initiating communication protocols or exception handling.

IPC no-no’s

With the above definition in mind, it should be clear that you should never use it for a number of things, especially:

  • sending bulk data
    You don’t pass by-value arrays to a function. Don’t do it with IPC either. The message arguments are function arguments, i.e. small data items that form the function inputs
  • using send-only or receive-only IPC
    … except for protocol initialisation or exception handling
  • using IPC for synchronisation
    that’s what Notification objects are for!
  • using cross-core IPC
    that’s forcing synchronisation of activities that should be separate, and forces extra scheduler invocations. Don’t do it!

So, why are those things supported at all?

Good question, and the answer is historical in some cases. And I certainly think we should reduce the size of the IPC buffer to no more than 64 bytes. You should never need more, and reducing this size will reduce the kernel’s worst-case execution time (WCET), which is important for real-time systems.

Obviously, send-only and receive-only IPC is still needed for the listed exceptions.

Cross-core IPC might actually still have valid use cases in the mainline kernel (that kernel is legacy for me, so I won’t bother thinking this through). The new MCS kernel introduces scheduling contexts that are passed along with IPC, and passive servers, which can only run at the expense of their client. This kernel definitely does away with the need for cross-core IPC, and we should seriously consider removing it.

Some interesting consequences

When IPC is used correctly, endpoint queues are shallow.

In particular, a shared server (i.e. a process providing functionality to multiple clients via an RPC interface) should always run at a priority at least as high as that of any of its clients. This prevents the server from being preempted by its own clients. (Not following this rule would result in high-priority clients impeding their own progress, including possible deadlocks.) If this rule is followed, then intra-core IPC can never lead to more than one thread blocked on a particular endpoint.

In other words, the maximum queue depth for an endpoint in a properly-designed single-core system should be one. This means the simplest implementation of the queue, i.e. a linked list, should be just fine.

On multicore, following the no-cross-core-IPC rule, it should be the same, as inter-core servers should be invoked by a protocol using shared buffers synchronised with semaphores (Notification objects).

However, with the MCS kernel and passive servers, the story is a bit different. A passive server always executes on its client’s core (so the local-only rule still applies). However, it can be invoked from any core – in fact, that’s the elegant way to implement mutual exclusion with priority ceiling, a fast and deadlock-free way to protect critical sections. This means that a passive server’s request endpoint can queue up to n-1 clients on an n-core system.

seL4 is not meant to run as a single kernel image on large number of cores, it is meant to scale as far as cache-line migration costs are small (say 10–20 cycles). For larger machines, clustering should be used. This means the linked-list implementation of endpoint queues should serve the multicore case as well.

Summary

Use IPC correctly: for RPC-like invocation of functions in a different address space. Period.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: