1. Introduction
Hubris is a small operating system intended for deeply-embedded computer systems — the kind that usually don’t have any user interface, or way for an operator to intervene. Because it’s important for systems like that to take care of themselves, Hubris is designed around the goal of robustness.
1.1. Obligatory bulleted list of features
-
Designed for 32-bit microcontrollers with region-based memory protection and kiB to MiB of RAM and Flash.
-
Tasks are separately-compiled programs, isolated from one another in memory, and running entirely in the processor’s unprivileged mode(s).
-
A flexible IPC model allows tasks to talk to each other.
-
A small kernel running in privileged mode provides these abstractions and very little else.
-
Drivers live in tasks, not the kernel.
-
All task mechanisms are designed to allow component-level reboots without taking out the whole system, under application control.
-
One “special task,” the supervisor, implements all task management and crash recovery policies, outside the kernel. (Applications are free to provide their own supervisor.)
-
Applications ship as integral firmware images, containing a set of tasks and the kernel built at the same time. Piecewise update of tasks, or creation of new tasks at runtime, is deliberately not supported.
1.2. Architecture
An application using Hubris consists of a collection of tasks and the Hubris kernel.
+---------+ +---------+ +---------+ +---------+  \
|  task   | |  task   | |  task   | |  task   |   |
+---------+ +---------+ +---------+ +---------+   |
+---------+ +---------+ +---------+ +---------+   |
|  task   | |  task   | |  task   | |  task   |   | application
+---------+ +---------+ +---------+ +---------+   |
                                                  |
+---------------------------------------------+   |
|                   kernel                    |   |
+---------------------------------------------+  /
The Hubris build system compiles the tasks and kernel with features chosen by
a configuration file called app.toml, which defines the structure of the
particular application. The scheme is designed so that tasks can be written to
be somewhat generic, and then customized for the application.
An application is the unit of firmware that is shipped and flashed. We do not support updating parts of an application in the field. This is to ensure that we’ve tested the particular combination of parts that we ship. This decision has a lot of implications on the design of the rest of the system — for instance, there is no particular requirement for inter-task ABIs to be stable if all tasks will be rebuilt together.
Finally, an important thing to know about Hubris’s architecture is that it is a physically addressed system. Each task’s memory occupies separate, non-overlapping sections of address space, and the kernel has its own section. This is different from most memory-protected operating systems, like Unix or Windows, where each program is allowed to believe it occupies the entire address space, through hardware trickery. We initially chose to make Hubris physically mapped out of necessity: the low-complexity low-cost microcontrollers we target simply do not have virtual memory mapping hardware. However, it turns out that having all application components visible in a single address space makes debugging dramatically simpler. As a result, we currently intend to keep the system physically mapped, even if we port to a processor with virtual addressing support.
1.3. Philosophy
1.3.1. Toward robustness
We’re trying to inch closer to robustness than our previous systems could, through a combination of decisions.
More memory safety. The bulk of both the Hubris kernel and our applications are written in safe Rust, with careful sprinklings of unsafe Rust where required. “Unsafe” Rust is still a much safer language than C or assembler and helps us avoid thinking about a bunch of potential bugs.
Fault isolation. Tasks, including drivers, can crash independently. An application might choose to have a driver crash ripple out into clients, but could also choose to notify clients and have them retry requests — whichever is appropriate. Memory protection is vital for ensuring this; without it, once some invariant in the system is observed to be broken, you have to assume they’re all in jeopardy.
Holistic deployment. It’s critical to ship the code you test, but once a program has been factored into a bunch of separately compiled pieces, there’s a temptation to update each of these pieces independently. This leads to a combinatorial explosion in the configurations you’d have to test to be thorough. To avoid that, engineering processes pick up a lot of overhead about conscious forward- and backward-compatible API design, etc. We’ve chosen to bypass this and assume that all the software that runs on a given processor was built — and tested! — together.
1.3.2. Pragmatism
There is a class of “ideal attractors” in engineering, concepts like “everything is an object,” “homoiconicity,” “purely functional,” “pure capability system,” etc. Engineers fall into orbit around these ideas quite easily. Systems that follow these principles often get useful properties out of the deal.
However, going too far in any of these directions is also a great way to find a deep reservoir of unsolved problems, which is part of why these are popular directions in academia.
In the interest of shipping, we are consciously steering around unsolved problems, even when it means we lose some attractive features. For instance:
-
While we expect interrupts to be handled in unprivileged tasks in general, we have left allowances for applications to handle interrupts in lower-latency but more-dangerous privileged code if required.
-
While we’re bullish on Hubris’s ability to enforce system-wide W^X (that is, having no executable sections of the address space writable or vice versa), this is not mandatory, in case you need to do something we didn’t foresee.
-
We have chosen fixed system-level resource allocation rather than dynamic, because doing dynamic properly in a real-time system is hard. Yes, we are aware of work done in capability-based memory accounting, space banks, and the like.
-
Speaking of capabilities, in the security sense, Hubris doesn’t use any. The only object of note in the system is the task, and any task can look up and talk to any other task; we plan to address the most obvious issues with that statement using mandatory access control. Capabilities raise issues around revocation, proxying, branding, etc. that can yield useful results but don’t seem necessary for our application.
-
We have (so far) not done any kind of inter-task type system relying on session types and the like.
1.3.3. Implementation
We are doing our best to avoid writing code.
That might seem like an odd statement coming from a group that has written an operating system from whole-cloth, and it is.
Adding code to a system like this adds attack surface, new corner cases that must be tested, things the programmer has to reason about, and — most mundanely — more code we have to understand and maintain.
We’re working hard to avoid adding features to the lower layers of the system, even when it takes a little more work at higher levels to compensate. For instance, the original Hubris proposal included a MINIX-3-inspired asynchronous messaging primitive for use by the supervisor task; we’re conscious of the massive impact this would have on the system, and have been going out of our way to avoid implementing it.
Now, that being said: we are doing our best to ensure that the code we do write is correct.
In many ways, Rust makes this part of the job easy, but “cowboy coding” is as feasible in Rust as in other languages, given a sufficiently motivated cowboy. Culturally, we try to avoid being “clever” in pursuit of a few saved bytes or cycles, and instead solve problems in ways that are more likely to be correct. We also prize correctness by construction where possible, meaning, designing pieces of the system such that illegal or undesirable states simply can’t be represented, and defining operations that compose in predictable and useful ways that can be discovered by applying local reasoning.
2. Tasks
Other than the kernel, all the code in a Hubris application lives in one or more tasks. Tasks are programs that run in the processor’s unprivileged mode, subject to memory isolation. This means that tasks cannot directly stomp on memory owned by the kernel, or each other.
2.1. A brief note on priorities
Tasks have priorities, which are currently fixed at build time. These are small integers.
Priorities are numbered in a way that may feel backwards: 0 is the highest priority, 1 is the next highest, and so forth. In general, when we talk about something being “higher priority,” we mean its priority number is numerically smaller.
2.2. Scheduling
The kernel is responsible for scheduling tasks by swapping between them as needed. At any given time, one task has control of the CPU, and the others are blocked.
Hubris uses a strict priority scheduling method: the kernel ensures that, at any given time, the highest-priority task that is ready to run has control of the CPU. If that task becomes ready while another task is running — say, due to an interrupt — the kernel will preempt the lower priority task and switch to the higher priority task.
Within a single priority level, multitasking is effectively cooperative: the kernel will never interrupt a task to switch to another task of equal or lower priority, until that task performs an operation that yields the CPU, such as sending a message or blocking to receive messages that haven’t arrived yet. The alternative to this is to implement time-slicing, where a task gets a fixed amount of time before another task at the same priority will have the opportunity to run; we chose not to implement this.
Priority levels in Hubris are effectively unlimited (currently, there are up to 256 of them), and using more levels has no runtime cost — so, if the absence of time-slicing is a problem for your application, you can use a single task per priority level and get full preemption.
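To make the scheduling rule concrete, here is a minimal sketch of strict priority selection with cooperative behavior inside a priority level. The `Task` struct and the `pick_next` function are hypothetical names invented for this illustration; they are not the kernel's actual types, and the tie-break by index is an assumption.

```rust
/// Hypothetical, simplified view of a task for scheduling purposes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Task {
    /// Numerically smaller means more important (0 is the highest priority).
    pub priority: u8,
    /// Whether the task is ready to run, i.e. not blocked.
    pub runnable: bool,
}

/// Picks the index of the task that should have the CPU.
///
/// `current` is the index of the task running now, if any. It is displaced
/// only by a runnable task with a strictly smaller priority number, which
/// models "no preemption by tasks of equal or lower priority".
pub fn pick_next(tasks: &[Task], current: Option<usize>) -> Option<usize> {
    // Keep the current task as the starting candidate if it can still run.
    let mut best = current.filter(|&i| tasks[i].runnable);
    for (i, t) in tasks.iter().enumerate() {
        if !t.runnable {
            continue;
        }
        match best {
            // An equal or larger priority number never displaces the candidate.
            Some(b) if t.priority >= tasks[b].priority => {}
            _ => best = Some(i),
        }
    }
    best
}
```

With one task per priority level, the "equal priority" case never arises, which is the full-preemption arrangement described above.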
2.3. Separate compilation
Tasks are separately compiled and do not share code. This is both good and bad. Some advantages:
-
Libraries that, for whatever reason, use static global data can’t accidentally propagate failures from one task to another.
-
Each task can customize the libraries it uses with different feature flags, without unexpected effects on other tasks.
-
Tasks can use totally different compiler optimization levels.
-
Tasks can, in theory, be written in entirely different programming languages, from assembler to C to Rust.
The main disadvantage is that it makes the application bigger. If three tasks
all use the useful_code library, the application will contain three copies of
that library’s code in Flash.
We could improve this situation by introducing an equivalent to shared libraries, something similar to what other physically addressed systems, such as early versions of Microware OS-9, offered. This would be a significant amount of work, because all the off-the-shelf tooling we have access to assumes that shared libraries go hand in hand with virtual addressing. So, we have punted for now.
2.4. Tasks can’t be created or destroyed
An application declares all of its tasks in its app.toml config file, which
the build system processes to make an application image. The set of tasks
declared in that file is the set of tasks the application has, period. Unlike
most operating systems, Hubris does not support creating new tasks at runtime or
destroying existing tasks. This is on purpose. Task creation is a common source
of hard-to-account-for dynamic resource usage, which can enable denial of
service attacks.
Put another way, you can’t write a fork-bomb without fork.
This has a bunch of advantages, including:
-
It simplifies very important kernel code and data structures used to keep track of tasks.
-
It means we can check the peak memory consumption of the system under load at compile time, since there’s no way to create tasks in response to load.
-
It makes the addresses of tasks in physical memory predictable, simplifying the debugger.
-
Tasks can size data structures based on the fixed number of tasks in the system, if required. (This is used in supervisor tasks and in our prototype network stack.)
It does put some requirements on your design of applications, however, if you’re accustomed to being able to clone tasks in response to load. Hubris applications tend to be somewhat economical with tasks, such as by having a single task process multiple streams of data, so that the task’s resources can be bounded no matter how many streams arrive.
While tasks can’t be destroyed, they can be halted due to faults or other events. More on that below.
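As an illustration of the point about sizing data structures by the fixed task count, here is a small sketch of a per-task counter table that needs no heap. `NUM_TASKS` and `RestartCounts` are hypothetical names; in a real application the count would presumably come from the build system rather than being written by hand.

```rust
/// Number of tasks in the application. The value here is made up; the real
/// number would be derived from app.toml at build time (an assumption).
pub const NUM_TASKS: usize = 4;

/// Per-task restart counters, sized at compile time.
pub struct RestartCounts {
    counts: [u32; NUM_TASKS],
}

impl RestartCounts {
    /// Creates a table with every counter at zero.
    pub const fn new() -> Self {
        RestartCounts { counts: [0; NUM_TASKS] }
    }

    /// Records one restart of the task at `index` and returns its new count,
    /// or `None` if `index` does not name a task.
    pub fn record(&mut self, index: usize) -> Option<u32> {
        let slot = self.counts.get_mut(index)?;
        *slot = slot.saturating_add(1);
        Some(*slot)
    }
}
```

Because the array length is a constant, the memory this structure needs is known before the firmware ever runs.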
2.5. Failure and supervision
Hubris is built on the assumption that individual tasks may fail. A task fails by causing a fault — or, more precisely, by taking an action that causes the kernel to assign a fault to it. There are a wide variety of faults in Hubris, and they fall into three main categories:
-
Hardware faults. Hardware faults are delivered by the CPU in response to program actions. Examples include dereferencing a null pointer, trying to access another task’s memory, jumping into non-executable RAM, and executing instructions that are illegal in unprivileged mode.
-
Syscall faults. Syscall faults occur when the task makes a syscall into the kernel, and does it wrong. Programmer errors that are indicated with error return codes in other operating systems are syscall faults in Hubris — under the theory that misuse of a syscall indicates a failed, malfunctioning program.
-
Explicit panics. Tasks may volunteer that they have failed by using a syscall to panic. In Rust, this maps to the panic! macro.
Regardless of the source, when a task faults, it
-
Immediately loses the CPU,
-
Has its state preserved to the extent possible, and
-
Has its fault recorded by the kernel.
Hubris itself does not restart faulted tasks. Instead, a designated task called the supervisor can be notified when a task faults and take action. Normally, the supervisor’s response will be to read information about the fault from the kernel, log that somewhere, and ask the kernel to reinitialize the failed task, as described in the next section.
(For a more detailed look at supervisors, see Supervision.)
2.6. Initialization and re-initialization
At boot, the Hubris kernel sets some number of tasks to run. (The application
can designate which tasks to start at boot in its app.toml.) Later, the kernel
may need to restart failed tasks, on request from the supervisor. Hubris
deliberately uses the same code path for both of these operations.
The steps taken when (re)initializing a task are:
-
Increment the task’s generation number. This is a small counter kept on each task that tracks restarts.
-
Identify any other tasks that were blocked interacting with this task’s previous generation. Unblock them, delivering a recognizable error code to tell them what happened. (More on this in Death and IPC.)
-
Reset the task’s registers to their initial values, which were chosen at compile time based on information in the app.toml.
-
Reset the task’s timer. (Timers will be discussed in the section Timers.)
-
“Scribble” the task’s stack memory with a recognizable pattern. This helps catch accesses to uninitialized stack memory (in languages other than Rust, or excessively clever Rust) and can be used by the debugger to determine each task’s actual peak stack usage, by looking for how much of it has been overwritten.
-
Mark the task as runnable.
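The steps above can be sketched as a toy model. Everything here is hypothetical: the `Task` fields, the `PEER_RESTARTED` code, the scribble value, and the assumption that the stack grows from the end of the slice toward index 0 are all invented for illustration and do not reflect the kernel's real data structures.

```rust
/// Fill pattern for unused stack (an arbitrary value chosen for this sketch).
pub const STACK_SCRIBBLE: u32 = 0xBADD_CAFE;
/// Hypothetical error code delivered to peers of a restarted task.
pub const PEER_RESTARTED: u32 = 0xFFFF_FF00;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Faulted,
    /// Blocked interacting with the task at this index.
    BlockedOn(usize),
    Runnable,
}

pub struct Task {
    pub generation: u8,
    pub state: State,
    pub registers: [u32; 4],
    pub initial_registers: [u32; 4],
    pub timer_deadline: Option<u64>,
    pub stack: Vec<u32>,
    pub delivered_error: Option<u32>,
}

/// Applies the (re)initialization steps to `tasks[index]`.
pub fn reinitialize(tasks: &mut [Task], index: usize) {
    // Step 1: bump the generation; it is a small counter, so let it wrap.
    tasks[index].generation = tasks[index].generation.wrapping_add(1);
    // Step 2: unblock peers that were talking to the previous generation.
    for (i, t) in tasks.iter_mut().enumerate() {
        if i != index && t.state == State::BlockedOn(index) {
            t.delivered_error = Some(PEER_RESTARTED);
            t.state = State::Runnable;
        }
    }
    let t = &mut tasks[index];
    t.registers = t.initial_registers; // Step 3: registers back to initial values.
    t.timer_deadline = None; // Step 4: reset the timer.
    t.stack.fill(STACK_SCRIBBLE); // Step 5: scribble the stack.
    t.state = State::Runnable; // Step 6: mark runnable.
}

/// Estimates peak stack usage by counting words that no longer hold the
/// scribble pattern, assuming usage starts at the end of the slice.
pub fn stack_words_used(stack: &[u32]) -> usize {
    let untouched = stack.iter().take_while(|&&w| w == STACK_SCRIBBLE).count();
    stack.len() - untouched
}
```

The same `reinitialize` function serves both first boot and restart in this model, mirroring the single code path described above.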
It’s worth noting a few things that the kernel does not do during reinit:
-
The task’s memory protection configuration in the kernel is left unchanged, since there are no APIs changing a task’s memory protections.
-
It doesn’t initialize the task’s data or zero its bss. We leave this to the task itself, so that we don’t make too many assumptions about task internal memory layout. (For programs written in Rust, this is handled by the _start routine in userlib before execution reaches main.)
-
It doesn’t do anything to the task’s executable code, which is assumed to be in execute-in-place Flash and immutable. (Hubris has no equivalent to a “loader.”)
3. IPC
Hubris IPC (inter-process communication) provides a mechanism for communicating between tasks. It’s designed to be easy to reason about, and to work well with Rust’s ownership model.
This chapter takes a high-level look at IPC in Hubris, which can be broken apart into four pieces:
-
How to send messages,
-
How to receive and handle messages,
-
How those two bits interact with task restarts, and
-
Notifications, an alternative lighter-weight IPC mechanism.
In practice, most code you write on Hubris will be using various abstractions or wrapper libraries that obscure the specifics of IPC. However, we think it’s important to understand what’s really going on, even if you choose to mostly use it through a library — and besides, somebody has to write those libraries, and that could easily be you. So in this chapter, we’ll peel back the abstractions and deal in the raw IPC operations provided by the kernel.
IPC is technically a misnomer, since we don’t have what most folks think of as “processes” (the P in IPC). But hey. Inter-Task Communication just isn’t as catchy.
3.1. Synchronous IPC basics
Hubris IPC is based around synchronous messaging. Communication between tasks consists of sending and receiving messages. Sending a message always blocks the sender, waiting for a reply. A task that asks to receive a message will block if no messages are available. In the common case, the sequence is
-
The recipient task, B, finishes doing something else and asks to receive.
-
It is blocked by the kernel, allowing other tasks to run.
-
The sending task, A, prepares a message to B and asks to send.
-
Task A is marked as blocked and task B is resumed to process the message.
When two tasks align in this way, a message transfer occurs. This is an action performed by the kernel on behalf of the two tasks. During a message transfer, the kernel copies a small data payload (the message) from the sender to the recipient, and provides the recipient with some metadata it can use to process the message.
This style of messaging is also called a rendezvous or handoff, because both tasks need to arrange to “meet” in the right state for the message transfer to happen.
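A toy model may help picture the rendezvous. The `State` and `Task` types and the `send`/`recv` functions below are hypothetical stand-ins, not the kernel's syscalls; in particular, this sketch picks the first waiting sender by index, which is an assumption made for brevity.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum State {
    Running,
    /// Blocked in receive, waiting for any message.
    InRecv,
    /// Blocked in send, waiting for task `to` to receive.
    InSend { to: usize },
    /// Message delivered; blocked until task `from` replies.
    InReply { from: usize },
}

pub struct Task {
    pub state: State,
    pub outgoing: Vec<u8>,
    /// Last delivered message, with the index of its sender.
    pub inbox: Option<(usize, Vec<u8>)>,
}

/// Copies the payload once, resumes the recipient, and parks the sender.
fn transfer(tasks: &mut [Task], sender: usize, recipient: usize) {
    let payload = tasks[sender].outgoing.clone();
    tasks[recipient].inbox = Some((sender, payload));
    tasks[recipient].state = State::Running;
    tasks[sender].state = State::InReply { from: recipient };
}

/// Models a send: transfers at once if the recipient is waiting, else blocks.
pub fn send(tasks: &mut [Task], sender: usize, recipient: usize) {
    if tasks[recipient].state == State::InRecv {
        transfer(tasks, sender, recipient)
    } else {
        tasks[sender].state = State::InSend { to: recipient };
    }
}

/// Models a receive: takes a waiting sender if there is one, else blocks.
pub fn recv(tasks: &mut [Task], recipient: usize) {
    let waiting = (0..tasks.len())
        .find(|&i| tasks[i].state == State::InSend { to: recipient });
    match waiting {
        Some(sender) => transfer(tasks, sender, recipient),
        None => tasks[recipient].state = State::InRecv,
    }
}
```

Whichever side arrives first simply waits; the transfer happens only once both are in the matching state, and no message is ever queued in between.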
In case you’re concerned that all messages are small, you will be relieved to know that we have a mechanism for dealing with that; if you want to skip ahead, see Lending out memory.
3.1.1. Why synchronous?
This is not the only way to design an IPC system. The main alternatives involve some sort of asynchrony or queueing, where a task can issue several messages and then do other work while waiting for responses. This can be useful, so it’s worth asking why Hubris is fully synchronous.
It turns out that synchronous IPC has significant advantages for systems like Hubris.
It’s fast. Synchronous IPC can be implemented using a single message copy from sender to recipient. The message need only be read once. This reduces the number of clock cycles required for a message transfer. (The operating system that originally made this point was L4, and Hubris’s IPC mechanism is directly inspired by L4’s.)
There are no queues to size. Asynchronous systems usually queue messages in kernel memory, which means there are now kernel-managed queues that must have their resources accounted for. In extreme cases, this can allow a single task to exhaust the kernel’s memory with impunity by expanding queues to the breaking point — which isn’t great for reliability. Many systems that try to avoid that problem do so by allowing the user to impose size limits on queues, which then become another thing users need to tune carefully for best performance. With synchronous rendezvous messaging, we avoid having queues in the first place.
It limits the power of any single task. A Hubris task can either be doing local work, or sending one message. It can’t, for instance, spam every other task in the system with messages simultaneously, or set up hundreds of messages to a single task. This avoids “fault amplification” scenarios where a bug in one task cascades into a much larger problem, as in a denial-of-service attack.
It lets message recipients reason about what their callers are doing. When your task is processing a message, it knows that the sending task is parked waiting for a reply.[1] It also knows that, if it goes to receive a second message before replying, that message won’t be from the same task. This allows message recipients to coordinate when senders run and don’t run, which is useful for implementing mutual exclusion, as we’ll discuss later.
Finally, synchronous IPC makes the system much easier to think about. Each task operates as a synchronous state machine, with IPC operations appearing as atomic and predictably ordered. Larger systems composed out of tasks can be easily halted and inspected to see who’s waiting on who. Lots of potential race conditions are eliminated. We’ve found this point to be hugely important during our early experience with the system.
You may be wondering about deadlocks in synchronous IPC. We avoid IPC-level deadlocks and priority inversion by imposing rules around messaging and task priority. This makes IPC-level deadlock impossible, but of course you can still write software that deadlocks if you try. More on this in the section Servers are clients too.
3.2. Sending messages
To simply consume IPC services implemented by others, there’s only one operation
you need to consider: send. send operates a lot like a function call:
-
It takes some arguments,
-
It returns some results,
-
It pauses your code (the caller) until it returns.
To use send, you specify the task you’re sending to, and an operation code
that tells that task what operation you’re trying to invoke. (A given API will
document which operation codes are used.)
In Rust, send has the following signature:
fn sys_send(
    target: TaskId,          // (1)
    operation: u16,          // (2)
    outgoing: &[u8],         // (3)
    incoming: &mut [u8],     // (4)
    leases: &[Lease<'_>],    // (5)
) -> (u32, usize);           // (6)
(1) The TaskId of the task you’re trying to contact.
(2) The operation code giving the operation you’re requesting.
(3) The message you wish to send.
(4) A buffer where any message sent in response will be deposited.
(5) Zero or more leases, which we’ll discuss below in the section Lending out memory.
(6) Returns a u32 giving the response code and a usize saying how many bytes were written into incoming. These will both be described below in the section Response codes and Result.
The simplest case for send is when you’re sending a small payload (say, a
struct) and receiving a small response (say, another struct), and the
recipient/callee is already blocked waiting for messages:
-
Your task invokes send, providing the target (recipient task) and operation, as well as the message to send (outgoing) as a &[u8]. It also provides a buffer where the response should be deposited (incoming), as a &mut [u8].
-
The kernel notices that the recipient is waiting for messages, and directly copies your message (operation code and payload data) from your task’s memory into the callee’s.
-
The kernel then marks your task as blocked waiting for reply, and unblocks the recipient, informing it that a message has arrived.
-
The recipient does some work on your behalf.
-
It then uses the reply operation to send a response message back. (reply will be covered soon in the section Receiving and handling messages.)
-
The kernel copies the response message into the incoming buffer your task provided, makes send's two result values available to your task, and marks your task as runnable.
If the recipient is not already waiting for messages when your task tries to send, the process is similar, except that your task may be blocked waiting to send for an arbitrary period between steps 1 and 2.
Note that the kernel doesn’t interpret either the operation code or the message data flowing between tasks. Any meaning there is defined by the application. The kernel looks only at the task IDs, and copies the rest blindly from place to place.
The Hubris source code sometimes refers to operation codes as “selectors” or “discriminators,” because Cliff can’t make up his mind on which metaphor to use.
3.2.1. Response codes and Result
From the caller’s perspective, send deposits any returned data into the
incoming buffer given to send. But it also returns two integers:
-
A response code.
-
The length of data deposited in the return buffer.
The response code, like the operation code, is largely application-defined, but
it’s intended to be used to distinguish success from error: 0 means success,
and any non-zero value means an error.
The length gives the number of bytes in the incoming buffer that have been
written by the kernel and are now valid. This happens no matter what the
response code is, so an operation can return detailed data even in the case of
an error.
This scheme is specifically designed to allow IPC wrapper functions to translate
the results of send into a Rust Result<T, E> type, where T is built from
the data returned on success, and E is built from the combination of the
non-zero response code and any additional data returned. Most wrappers do this.
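As a rough sketch of that translation, the helper below turns the two integers returned by send, plus the incoming buffer, into a Result. The name `decode_response` is hypothetical, and a real wrapper would go one step further and parse the byte slices into typed values.

```rust
/// Converts a raw `(response_code, length)` pair and the `incoming` buffer
/// into a `Result`. On success the valid bytes are returned; on error the
/// non-zero code is returned together with any data that came back.
pub fn decode_response(
    rc: u32,
    len: usize,
    incoming: &[u8],
) -> Result<&[u8], (u32, &[u8])> {
    // Only the first `len` bytes are valid. Clamp defensively so a bogus
    // length cannot cause an out-of-bounds slice.
    let valid = &incoming[..len.min(incoming.len())];
    if rc == 0 {
        Ok(valid)
    } else {
        Err((rc, valid))
    }
}
```

Note that the error arm keeps the returned bytes, since data is valid regardless of the response code.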
While response codes are mostly up to the application, there is a class of non-zero response codes used by the kernel to indicate certain failure cases related to crashing tasks. These are called “dead codes,” and will be covered later, in the section Death and IPC. Applications should choose their response codes to avoid colliding with them. Fortunately, they’re very large numbers, so if applications start their errors from 1 they should be safe.
3.2.2. Message size limits
When a message transfer happens, the kernel diligently copies the message data from one place to another. This operation is uninterruptible, so it may delay processing of interrupts or timers. To limit this, we impose a maximum length on messages, currently 256 bytes.
If you need to move more data than this, you can use the “lease” mechanism, described in the next section.
3.2.3. Lending out memory
Sending and receiving small structs by copy is a good start, but what if you want something more complex? For example, how do you send 1kiB of data to a serial port, or read it back, when messages are limited to 256 bytes?
The answer is the same as in Rust: when you call the operation, you loan it some of your memory.
Any send operation can include leases, which are small descriptors that tell
the kernel to allow the recipient of the message to access parts of the
sender’s memory space. Each lease can be read-only, write-only, or read-write.
While the sender is waiting for a reply, the recipient has exclusive control of
the leased memory it is borrowing. If the caller resumes (generally after
reply, but also possible in some corner cases involving task supervision) the
leases are reliably and atomically revoked.
This means it’s safe to lend out any memory that the caller can safely access, including memory from the caller’s stack.
This property also means that lending data can be expressed, in Rust, as simple
& or &mut borrows, and checked for correctness at compile time.
Each send can include up to 255 leases (currently; this number may be
reduced, because what on earth do you need that many leases for).
On the recipient side, leases are referred to by zero-based index (0 through 254 under the current limit of 255 leases), and an IPC operation would typically declare that it needs certain arguments passed as leases in a certain order. For instance, a simple serial write operation might expect a single readable lease giving the data to send, while a more nuanced I2C operation might take a sequence of readable and writable leases.
An operation can also take a variable number of leases and use this to implement scatter-gather. It’s up to the designer of the API.
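To show how leases can line up with Rust borrows, here is a self-contained sketch of a lease descriptor. This `Lease` type, its fields, and the `Access` enum are hypothetical and written only for illustration; the real descriptor layout is not described in this text.

```rust
use core::marker::PhantomData;

/// What the recipient may do with the leased memory.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    Read,
    Write,
    ReadWrite,
}

/// Hypothetical lease descriptor: where the memory is, how long it is, and
/// what access is granted. The lifetime ties the lease to the borrow it was
/// made from, so the compiler keeps the memory alive and unaliased.
pub struct Lease<'a> {
    pub base: *const u8,
    pub len: usize,
    pub access: Access,
    _borrow: PhantomData<&'a mut [u8]>,
}

/// A shared borrow becomes a read-only lease.
impl<'a> From<&'a [u8]> for Lease<'a> {
    fn from(data: &'a [u8]) -> Self {
        Lease {
            base: data.as_ptr(),
            len: data.len(),
            access: Access::Read,
            _borrow: PhantomData,
        }
    }
}

/// An exclusive borrow becomes a read-write lease.
impl<'a> From<&'a mut [u8]> for Lease<'a> {
    fn from(data: &'a mut [u8]) -> Self {
        Lease {
            base: data.as_ptr(),
            len: data.len(),
            access: Access::ReadWrite,
            _borrow: PhantomData,
        }
    }
}
```

Because each lease carries the lifetime of its borrow, the caller cannot touch a mutably leased buffer until the lease array goes away, which lines up with the exclusive-control guarantee described above.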
3.2.4. Making this concrete
Let’s sketch a concrete IPC interface, to get a feeling for how the various
options on send fit together. Imagine a task that implements a very simple
streaming data access protocol consisting of two functions (written as in Rust):
fn read(fd: u32, buffer: &mut [u8]) -> Result<usize, IoError>;
fn write(fd: u32, buffer: &[u8]) -> Result<usize, IoError>;

enum IoError {
    Eof = 1,
    ContainedBobcat = 2,
}
These are basically POSIX read and write, only expressed in Rust style.
A concrete mapping of these operations to IPCs might go as follows.
Read. Operation code 0.
-
Message is a four-byte struct containing fd as a little-endian u32. Lease 0 is buffer and must be writable.
-
Data will be written to a prefix of lease 0, starting at offset 0.
-
On success, returns response code 0 and a four-byte response, containing the bytes-read count as a little-endian u32.
-
On failure, returns a non-zero response code that maps to an IoError, and a zero-length response message.
Write. Operation code 1.
-
Message is a four-byte struct containing fd as a little-endian u32. Lease 0 is buffer and must be readable.
-
Data will be taken from a prefix of lease 0, starting at offset 0.
-
On success, returns response code 0 and a four-byte response, containing the bytes-written count as a little-endian u32.
-
On failure, returns a non-zero response code that maps to an IoError, and a zero-length response message.
Either of these operations could be altered to also return the number of
bytes read or written in an error case, by making the response non-empty and
changing the IoError type in Rust to have data fields.
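On the server side, the mapping above might be decoded along these lines. The `Request` enum and `parse_request` function are hypothetical names for this sketch, and returning `None` for malformed input is just one possible policy.

```rust
/// The two operations of the example protocol, after decoding.
#[derive(Debug, PartialEq, Eq)]
pub enum Request {
    Read { fd: u32 },
    Write { fd: u32 },
}

/// Decodes an operation code and message body into a `Request`.
///
/// Returns `None` if the message is not exactly four bytes or the
/// operation code is not one the protocol defines.
pub fn parse_request(operation: u16, message: &[u8]) -> Option<Request> {
    if message.len() != 4 {
        return None;
    }
    // The message is `fd` as a little-endian u32.
    let fd = u32::from_le_bytes([message[0], message[1], message[2], message[3]]);
    match operation {
        0 => Some(Request::Read { fd }),
        1 => Some(Request::Write { fd }),
        _ => None,
    }
}
```

A server would typically turn the `None` case into a non-zero response code of its own choosing.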
A very simple IPC stub for the read operation might be written as follows.
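use userlib::{sys_send, FromPrimitive, Lease, TaskId};

#[derive(Copy, Clone, Debug, FromPrimitive)]
enum IoError {
    Eof = 1,
    ContainedBobcat = 2,
}

fn read(task: TaskId, fd: u32, buffer: &mut [u8]) -> Result<usize, IoError> {
    // The reply carries the bytes-read count as a little-endian u32.
    let mut response = [0; 4];
    let (rc, len) = sys_send(
        task,
        0, // operation code for Read
        &fd.to_le_bytes(),
        &mut response,
        &[Lease::from(buffer)], // lease 0: the writable destination buffer
    );
    // A response code that maps to an IoError indicates failure.
    if let Some(err) = IoError::from_u32(rc) {
        Err(err)
    } else {
        assert_eq!(len, 4);
        // from_le_bytes takes the array by value and yields a u32.
        Ok(u32::from_le_bytes(response) as usize)
    }
}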
(write would be nearly identical, but with the operation code changed.)
3.3. Receiving and handling messages
To write a task that implements some IPC protocol, we need to be able to receive and handle messages. There are two operations involved on this side:
fn sys_recv_open(
    buffer: &mut [u8],
    notification_mask: u32,          // (1)
) -> RecvMessage;

struct RecvMessage {
    pub sender: TaskId,              // (2)
    pub operation: u32,              // (3)
    pub message_len: usize,          // (4)
    pub response_capacity: usize,    // (5)
    pub lease_count: usize,          // (6)
}
(1) The notification_mask is used for a facility we haven’t described yet, which will be covered below in Notifications: the other IPC mechanism.
(2) sender is the TaskId of the task that sent the message. This is provided by the kernel and is reliable, i.e. there is no way for a task to lie here.
(3) The operation code sent by the sender. (You might notice that this is 32 bits while the equivalent argument to send is only 16. This will also be explained in the section Notifications: the other IPC mechanism.)
(4) Length of the sent message. If this is larger than buffer.len(), the caller sent an over-long message that has been truncated, and you likely want to return an error.
(5) Number of bytes the caller reserved for receiving your reply.
(6) Number of leases the caller sent you. You can get additional information about these borrows using the BORROW_INFO (6) syscall, and there are additional syscalls for reading and writing.
fn sys_reply(peer: TaskId, code: u32, message: &[u8]);
Note that sys_reply cannot fail. This will be unpacked in the next section.
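As a small illustration of how a server might use the fields of RecvMessage before doing any work, here is a self-contained sketch. The struct below is a local mirror of the fields shown above (with a plain integer standing in for TaskId), and the `Reject` enum and `validate` function are hypothetical; which conditions a server treats as errors is up to the application.

```rust
/// Local mirror of the `RecvMessage` fields, so the sketch stands alone.
pub struct RecvMessage {
    pub sender: u16, // stand-in for TaskId
    pub operation: u32,
    pub message_len: usize,
    pub response_capacity: usize,
    pub lease_count: usize,
}

/// Reasons this sketch refuses a request.
#[derive(Debug, PartialEq, Eq)]
pub enum Reject {
    /// The sender's message was longer than our buffer and got truncated.
    Truncated,
    /// The sender did not leave room for the reply we need to send.
    ReplyTooSmall,
    /// The sender supplied fewer leases than the operation needs.
    MissingLeases,
}

/// Checks a received message against what the operation requires.
pub fn validate(
    msg: &RecvMessage,
    buffer_len: usize,
    reply_len: usize,
    leases_needed: usize,
) -> Result<(), Reject> {
    if msg.message_len > buffer_len {
        return Err(Reject::Truncated);
    }
    if msg.response_capacity < reply_len {
        return Err(Reject::ReplyTooSmall);
    }
    if msg.lease_count < leases_needed {
        return Err(Reject::MissingLeases);
    }
    Ok(())
}
```

A server could map each `Reject` variant to a non-zero response code and pass it to sys_reply.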
3.3.1. Pipelining, out-of-order replies, and reply failure
Hubris does not require that you reply before calling recv again. You
could instead start an operation, do some bookkeeping to keep track of that
sender, and then recv the next, with the intent of replying later. This
allows you to implement a pipelined server that overlaps requests.
Hubris also doesn’t require that you reply
in the same order as recv
. For
example, in a pipelined server, you might want to promptly reply
with an error
to a bogus request while still processing others. Or, in a fully asynchronous
server (such as a network stack for something like UDP), you might reply
whenever operations finish, regardless of their order.
Hubris doesn’t actually require that you reply
, ever. The caller will wait
patiently. This means if you want to halt a task, sending a message to someone
who will never reply is a reasonable technique. Or, a server could halt
malfunctioning callers by never replying (see next section).
What is required for reply
to succeed is that the sender must actually be
blocked in a send to your task. If you reply
to a random task ID that has
never messaged you, the reply will not go through. If the sending task has been
forceably restarted by some supervising entity, the reply will not go through.
Similarly, if an application implements IPC timeouts by forceably unblocking
senders that have waited too long (something you can choose to do), the reply to
the timed-out sender won’t go through.
Because the latter two cases (sender timed out, sender rebooted) are expected to
be possible in an otherwise functioning application, and because it isn’t clear
in general how a server should handle a behavior error in one of its clients,
the reply
operation does not return an error to the server, even if it
doesn’t go through. The server moves on.
This design decision copies MINIX 3, and those folks explained the decision in much greater detail. See [herder08ipc] for details, and [shap03vuln] for motivating history.
3.3.2. Handling error cases on receive
Hubris assumes that you mistrust tasks sending you messages, and provides enough information to detect the following error cases:
-
Unknown operation code.
-
Incoming message shorter or longer than what you expected, given the operation code.
-
Wrong number of leases attached for the operation.
-
Sender’s response buffer too small to accommodate your reply.
Any of these suggest that the sender is confused or malfunctioning. You have a few options for dealing with these cases:
-
Immediately reply to the sender with a non-zero response code and zero-length message. Even if the sender is sending to the wrong task, the convention around non-zero response codes means this is likely to be interpreted as an error by the sender.
-
Don’t reply. Leave the sender blocked, and instead notify some sort of supervising entity of a potential malfunction. Or, depending on your application architecture, just leave them blocked and expect a watchdog timer to handle the problem if it matters.
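The four error cases listed above can be checked mechanically before a server does any real work. The following is a minimal host-side sketch, not Hubris library code: Incoming mirrors the fields of RecvMessage described earlier, while OpSpec, Reject, and validate are hypothetical names introduced only for illustration.

```rust
/// The subset of `RecvMessage` fields needed for validation
/// (the sender field is omitted for brevity).
#[derive(Debug, Clone, Copy)]
pub struct Incoming {
    pub operation: u32,
    pub message_len: usize,
    pub response_capacity: usize,
    pub lease_count: usize,
}

/// Hypothetical description of one operation in a server's protocol.
pub struct OpSpec {
    pub code: u32,
    pub message_len: usize,
    pub lease_count: usize,
    pub reply_len: usize,
}

/// Hypothetical rejection reasons, one per error case in the text.
#[derive(Debug, PartialEq, Eq)]
pub enum Reject {
    UnknownOperation,
    BadMessageLength,
    BadLeaseCount,
    ReplyBufferTooSmall,
}

/// Checks an incoming message against the protocol table.
pub fn validate(msg: &Incoming, protocol: &[OpSpec]) -> Result<(), Reject> {
    let spec = protocol
        .iter()
        .find(|s| s.code == msg.operation)
        .ok_or(Reject::UnknownOperation)?;
    if msg.message_len != spec.message_len {
        return Err(Reject::BadMessageLength);
    }
    if msg.lease_count != spec.lease_count {
        return Err(Reject::BadLeaseCount);
    }
    if msg.response_capacity < spec.reply_len {
        return Err(Reject::ReplyBufferTooSmall);
    }
    Ok(())
}
```

A server would then pick one of the two responses above for any Err result: reply with a non-zero code, or leave the sender blocked.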
3.3.3. Open and closed receive
recv can operate in two modes, called open receive and closed receive.
The general case that we’ve been discussing so far is the open receive case.
In a closed receive, the task selects a single other task to receive messages from. Any other sender will be blocked.
This can be used to implement mutual exclusion. If a client sends a lock request to a server, for instance, the server could then perform a closed receive and only accept messages from that client. Other clients could send lock requests, but they’d queue up, until the first client either sends a “release” message, or dies (see below).
3.4. Death and IPC
Tasks sometimes restart. For instance, the program running in a task may panic! or dereference an invalid pointer, both of which produce a fault against the task within the kernel. Normally, the supervisor task is expected to notice this and reinitialize the failed task. When the task is restarted, a number associated with the task, its generation, is incremented in the kernel.
The TaskId type used to designate tasks for IPC includes both a fixed identifier for the task (its index) and this generation. The generation part of the TaskId is checked on any IPC, and if it doesn’t match, the operation will fail.
This is intended to detect cases where, during an exchange of messages between two tasks, one restarts and the other doesn’t. Thanks to the generation mechanism, the task that didn’t restart will get notified that the other task did. It can then decide how to proceed — maybe the protocol between them is stateless, and no action is needed, but often some kind of an init sequence may be in order.
When an operation fails because of a generation mismatch, it returns a predictable response code called a “dead code.” A dead code has its 24 top bits set to 1, with the peer’s new generation number in the low 8. You can use this to update your TaskId and retry your request, for instance.
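As a small illustration of the bit layout just described, a caller could recognize a dead code and recover the new generation like this. This is a sketch only; the constant and function names are made up for the example and are not taken from the Hubris crates.

```rust
/// Top 24 bits set: the prefix shared by every dead code.
pub const DEAD_CODE_PREFIX: u32 = 0xFFFF_FF00;

/// If `rc` is a dead code, returns the peer's new generation
/// (the low 8 bits); otherwise returns `None`.
pub fn dead_code_generation(rc: u32) -> Option<u8> {
    if (rc & DEAD_CODE_PREFIX) == DEAD_CODE_PREFIX {
        Some((rc & 0xFF) as u8)
    } else {
        None
    }
}
```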
The only currently defined IPC operations that can fail in this way are send and the closed version of receive. reply does not check generations in keeping with its fire-and-forget philosophy, and the open version of receive doesn’t take a TaskId at all so there’s nothing to check.
It’s important to note that a generation mismatch may be detected at several different points in time:
-
When a message is initially sent.
-
After the sending task has blocked, but before the receiving task has noticed the message.
-
After the message has been received, but before it’s been replied to.
There’s currently no way for the sender to distinguish these cases, so, be prepared for any of them.
3.5. Notifications: the other IPC mechanism
In addition to synchronous messaging, Hubris also provides a very limited asynchronous communication mechanism called notifications. Notifications are designed to complement send-receive style IPC, and are intended for different purposes. Generally, notifications are useful for situations where one might use interrupts or signals in other systems.
Each task has 32 notification bits, which together form a notification set. These bits can be posted, which means they are written to true; the number of posts is not tracked. Each posting operation can touch any subset of the notification bits, which means the post operation is effectively bitwise-OR-ing a 32-bit mask into the task’s notification set (which is exactly how it’s implemented).
Importantly, posting a notification does not interrupt the receiving task’s code — it is not like a signal handler or asynchronous exception. Instead, the receiving task finds out about the notifications only when it checks.
Tasks check for notifications by calling recv. The recv operation takes an additional parameter called the notification mask, which is a 32-bit word. Any 1-bits in the notification mask express to the kernel that the task would like to find out if the corresponding bit in its notification set has been posted since it last checked.
If any of the requested bits have been posted:
-
The kernel atomically clears the bits that have been noticed (but leaves others intact),
-
The task immediately returns from recv without blocking, and
-
The result of recv is a notification message instead of an IPC.
The task can distinguish a notification message from its contents:
-
The sender’s TaskId will be TaskId::KERNEL, indicating that the message comes from the kernel. Since the kernel never sends messages in any other context, any message “from” the kernel is a notification.
-
The operation field will contain the bits that were posted and matched the provided mask. (These are also the bits that the kernel atomically cleared.)
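The post and check semantics described in this section fit in a few lines. The sketch below is a host-side model rather than kernel code; NotificationSet and its methods are names invented for the example.

```rust
/// Model of one task's 32 notification bits.
#[derive(Debug, Default)]
pub struct NotificationSet {
    bits: u32,
}

impl NotificationSet {
    /// Posting ORs the mask in; repeated posts of a bit are not counted.
    pub fn post(&mut self, mask: u32) {
        self.bits |= mask;
    }

    /// Models the notification side of `recv`: returns the posted bits
    /// selected by `mask` and clears exactly those bits, leaving the
    /// others intact. Returns `None` if no requested bit is posted.
    pub fn take(&mut self, mask: u32) -> Option<u32> {
        let hit = self.bits & mask;
        if hit == 0 {
            None
        } else {
            self.bits &= !hit;
            Some(hit)
        }
    }

    /// Bits that are still posted.
    pub fn pending(&self) -> u32 {
        self.bits
    }
}
```

The value returned by take corresponds to what the task would see in the operation field of the notification message.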
3.5.1. What are they good for?
Notifications are used by the kernel to route hardware interrupts to tasks: a task can request that an interrupt appear as one of its notification bits. (More on that in the chapter on Interrupts, below.)
Notifications are also used to signal tasks that the deadline they loaded into their timer has elapsed. (More on that in the chapter on Timers.)
Finally, notifications can be valuable between tasks in an application, as a way for one task to notify another of an event without blocking. In particular, notifications are the only safe way for a high-priority server shared by many clients to signal a single client: if it used send instead, that client could decide not to reply, starving all the other clients.
In developing firmware on Hubris we’ve found a particular pattern to be useful, called “pingback.” In this pattern, a high-priority shared server (such as a network stack) has obtained data that it needs to give to one of its clients, but it can’t just send the data for the reason described above. One option is to have all clients forever blocked in send to the server until data arrives, but that keeps the clients from ever doing anything else! Instead, the server and clients can agree on a protocol where
-
The client sends to the server, to do whatever setup is required (e.g. to express interest in data from a particular port). The client provides the server with a notification set that it wants to receive when the event occurs.
-
The server notes this and replies immediately.
-
The client goes on about its business, periodically checking for the notifications it requested.
-
When the server receives data, it posts a notification to the client.
-
When the client notices this, it calls back to the server seeking more information, and providing a writable lease to some memory where the server can deposit the result.
-
The server receives this message and copies the data over.
This pattern is useful, in part, because it’s very tolerant of a defective client task. If the server posts the notification and the client never responds, it’s no skin off the server’s back — it’s still free to continue serving other clients.
4. Interrupts
This chapter discusses the role of CPU interrupts in Hubris and applications. We’ll be using some ARM-inspired terminology, because that’s the most mature Hubris port, but these ideas are intended to translate to RISC-V systems using controllers like the PLIC.
4.1. Interrupts vs Exceptions
Hubris distinguishes between interrupts and exceptions. The terminology is from ARM, but it’s a general idea that’s also valid on RISC-V.
An interrupt is generated by a peripheral to signal an asynchronous event. Drivers usually want to find out about some set of interrupts and react to them.
An exception is a broader idea that includes signals generated by the processor in response to the program’s behavior. This would include memory access violations, illegal instructions, privilege violations, bus faults, system calls, etc. Exceptions may be asynchronous but are often synchronous.
The kernel provides first-line handlers (interrupt service routines) for all exceptions, interrupts included, but will also route interrupts to applications.
4.2. The Hubris interrupt model
We assume an interrupt system with the following characteristics. This is based on the design of the ARMv6/7/8-M interrupt system, but is intended to be general.
You can likely skip this section if you’re not porting the kernel.
-
Interrupts can be numbered using small integers starting at zero. They may be a subset of exception numbers at a fixed offset — 16, on ARMvX-M — and we’ll subtract that offset when numbering interrupts.
-
Interrupts have a “pending” flag and an “enable” flag. The pending flag can be set by hardware at any time, but the ISR will only be invoked if the enable and pending flags are set at the same time. (This ensures that momentary events are latched and can be handled with a delay.) Starting the interrupt service routine clears the “pending” flag and may or may not affect the “enable” flag.
-
Interrupt handlers (or interrupt service routines) can be configured not to preempt one another, either by configuring priorities in the interrupt controller, or by using explicit critical sections in the implementation. Hubris runs a quick generic ISR that records an interrupt and returns, so nested interrupts are less important. If an interrupt arrives while an ISR is running, it should remain pending until the current ISR returns. (This also implies that interrupts cannot preempt the kernel, since the kernel always runs in interrupt context.)
For brevity in the discussion below, we’ll refer to an interrupt “happening” as when the CPU decides to execute the associated in-kernel interrupt service routine. That can occur when…
-
The pending bit is set while enable was already set.
-
The enable bit is set when pending was previously set.
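The two conditions above can be modeled as a tiny state machine. This is an illustrative sketch of the assumed interrupt model, not kernel code; IrqLine is a hypothetical name, and the model assumes that starting the ISR leaves the enable flag unchanged (the text allows either behavior).

```rust
/// Model of one interrupt's "pending" and "enable" flags.
#[derive(Debug, Default)]
pub struct IrqLine {
    pending: bool,
    enable: bool,
}

impl IrqLine {
    /// Hardware signals the event. Returns true if the interrupt
    /// "happens", i.e. the ISR would be invoked now.
    pub fn set_pending(&mut self) -> bool {
        self.pending = true;
        self.try_fire()
    }

    /// Software changes the enable flag. Returns true if a previously
    /// latched event causes the interrupt to happen now.
    pub fn set_enable(&mut self, enable: bool) -> bool {
        self.enable = enable;
        self.try_fire()
    }

    /// The ISR starts only when both flags are set; starting it
    /// clears the pending flag.
    fn try_fire(&mut self) -> bool {
        if self.pending && self.enable {
            self.pending = false;
            true
        } else {
            false
        }
    }
}
```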
4.3. Interrupts from a task’s perspective
The app.toml configuration file can route interrupts to tasks, by binding platform interrupt numbers to notification sets. From the task’s perspective, interrupts are delivered as notifications (see the IPC chapter).
Hubris does not allow a single interrupt to be routed to multiple tasks, which means that a task has exclusive control over any interrupts it handles. Tasks have access to a syscall, irq_control, that they can use to mask and unmask their interrupts.
When a task starts (or restarts) its interrupts are initially masked. This means that the hardware enable bit is clear, and interrupts can accumulate in the pending bit but will not translate to notifications.
The task can use irq_control to unmask (or mask) a subset of its interrupts.
When calling irq_control, the task names its interrupts by the notification bits they will set when they fire, rather than their hardware interrupt numbers. This means the task can be written to service multiple sources of interrupts without code changes, only configuration changes. It also removes the need to validate ownership of the interrupt in the kernel: since tasks can only specify interrupts using their own notification masks, they can’t pick arbitrary IRQ numbers that might be invalid or owned by someone else.
Once the interrupt is unmasked, pending will cause the kernel ISR to fire, preempting some task code. The only things the ISR does are:
-
Posts the requested notification to the handling task.
-
Masks the interrupt (clearing enable).
Once the handling task is waiting in receive with the interrupt’s notification bit included in its notification mask, and is the highest priority task that’s ready to go, it will be woken with a notification about the interrupt. It would normally respond to this by inspecting the hardware to figure out the details of the event and the next steps required, and then using irq_control to unmask the interrupt again.
In simplified pseudocode, an interrupt handling task’s main loop looks something like this:
// This task only uses notification bit 0 for interrupts.
const MY_INTERRUPT: u32 = 1;
// We're willing to accept interrupts.
sys_irq_control(MY_INTERRUPT, true);
loop {
    // Receive a message -- in this case, we're only
    // expecting notifications.
    let result = sys_recv_closed(
        &mut [],
        MY_INTERRUPT,
        TaskId::KERNEL,
    ).unwrap();
    if result.operation & MY_INTERRUPT != 0 {
        do_interrupt_stuff();
        // Unmask that interrupt so it can fire again.
        sys_irq_control(MY_INTERRUPT, true);
    }
}
4.4. Routing interrupts to tasks in the kernel
The kernel has a table of interrupt routing information, filled out at compile time from the app.toml. For each implemented interrupt, it stores two pieces of information:
-
The index of the task that will handle the interrupt.
-
The notification set that should be posted to that task when the interrupt occurs.
Typically an SoC will have many interrupts that are not used by a given application. We currently store interrupt response information only for the interrupts that are being used, which makes looking up table entries slightly slower in exchange for saving a bunch of Flash.
When an interrupt happens, it gets routed to a generic kernel ISR. The kernel ISR will find the task named in the response record, and post the notification set. The kernel then clears the interrupt’s enable bit to prevent reoccurrence until the task has a chance to respond.
As with any situation where the kernel posts notifications, the kernel exit path then checks to see if the notification has caused the scheduling situation to change — in this case, if the task handling the interrupt is higher priority than whatever task was running before, and is ready to receive it. If so, the kernel saves context for the interrupted task and switches to the handler task.
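To make the space-for-time trade-off in the routing table concrete, here is one way a sparse table holding only the used interrupts could be searched. The text does not say which lookup structure the kernel uses, so the sorted-slice binary search below is purely an assumption for illustration, as are the names IrqRoute and lookup.

```rust
/// One routing record: which task handles an interrupt, and which
/// notification set to post to it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IrqRoute {
    pub irq: u32,
    pub task_index: usize,
    pub notification: u32,
}

/// Finds the record for `irq` in a table sorted by interrupt number.
/// Only interrupts that are actually used appear in the table, so the
/// lookup is a search rather than a direct array index.
pub fn lookup(table: &[IrqRoute], irq: u32) -> Option<&IrqRoute> {
    table
        .binary_search_by_key(&irq, |route| route.irq)
        .ok()
        .map(|index| &table[index])
}
```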
4.5. Kernel reserved interrupts
Some interrupts on some systems cannot be reasonably handled outside the kernel. In these cases, the kernel will handle them directly, and provide abstractions if necessary.
The main example here is the system tick timer that is used to maintain the kernel’s internal sense of time, but DMA controllers might also fall into this category.
5. Timers
In addition to fancier hardware timers, the microcontrollers we target tend to have a single general-use timer that is portable across implementations and silicon vendors: the SysTick on ARM, the mtimer on RISC-V. Hubris provides a multiplexer for this timer, so that each task appears to have its own.
The time unit of the clock is selectable by the application, but in practice, we always select milliseconds. This chapter will refer to the clock unit as milliseconds.
5.1. Timestamp format
Hubris timestamps are from a monotonic realtime clock. Time is kept since kernel startup, which presumably corresponds to the most recent CPU reset or bootloader exit. Without outside information (such as data from an external RTC) there’s no way to map a Hubris timestamp to human wall time — they’re mostly used for relative delays.
When interacting with the kernel, timestamps are expressed as milliseconds in u64 format. This means the timer will roll over after a bit more than 584 million years of continuous operation. The intent is that applications need not concern themselves with timer wraparound, because reasoning about timer wraparound is hard.
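The rollover figure can be checked with a little integer arithmetic. The sketch below assumes a year of 365.25 days; the names are chosen for the example.

```rust
/// Milliseconds in a 365.25-day year.
pub const MS_PER_YEAR: u64 = 31_557_600_000;

/// Whole years a millisecond counter in a u64 can run before wrapping.
pub fn rollover_years() -> u64 {
    u64::MAX / MS_PER_YEAR
}
```

With these assumptions the result lands between 584 and 585 million years, in line with the figure quoted above.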
The kernel reserves the right to keep time in a different format internally.
5.2. Programmer’s model
Each task gets a timer. The timer has three properties:
-
An enable bit.
-
A deadline.
-
A notification set.
At periodic intervals (ticks), if the enable bit is set, the kernel checks the deadline to see if it is in the future. If not (it is <= the current time), the kernel will
-
Clear the enable bit.
-
Post the notification set to the owning task. (Notifications are discussed in more detail in Notifications: the other IPC mechanism.)
The task will find out about this next time it enters recv with the notification bits unmasked, or immediately if the task is already blocked in recv at the time the timer fires.
Because the enable bit is cleared when the timer fires, tasks can assume that setting their timer will result in exactly zero or one notification events.
If a task sets the timer notification set to 0, it will not receive a notification when the timer fires, but it could still poll the enable bit. We haven’t had a use for this so far, but, now you know.
By default, when a task is initialized, its timer is set up as:
-
Enable bit clear.
-
Deadline !0 (i.e. the distant future)
-
Notification set 0 (i.e. no bits)
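The programmer’s model above is compact enough to express as a small host-side model. This is a sketch for illustration, not the kernel’s implementation; TaskTimer, arm, and tick are hypothetical names.

```rust
/// Model of one task's timer: enable bit, deadline, notification set.
#[derive(Debug)]
pub struct TaskTimer {
    pub enabled: bool,
    pub deadline: u64,
    pub notifications: u32,
}

impl TaskTimer {
    /// Initial state as described in the text: disabled, deadline in
    /// the distant future (!0), empty notification set.
    pub fn new() -> Self {
        TaskTimer { enabled: false, deadline: !0, notifications: 0 }
    }

    /// Enables the timer with a deadline and a notification set.
    pub fn arm(&mut self, deadline: u64, notifications: u32) {
        self.enabled = true;
        self.deadline = deadline;
        self.notifications = notifications;
    }

    /// Called on each tick with the current time. If the timer is
    /// enabled and the deadline is not in the future, clears the enable
    /// bit and returns the notification set to post.
    pub fn tick(&mut self, now: u64) -> Option<u32> {
        if self.enabled && self.deadline <= now {
            self.enabled = false;
            Some(self.notifications)
        } else {
            None
        }
    }
}
```

Because tick clears the enable bit when it fires, a single call to arm produces at most one notification, matching the guarantee stated above.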
5.3. Timer control operations
5.4. Using the timer to implement sleep
The most common use of the task timer is to implement a delay. If this is all you’re doing — i.e. you don’t have several deadlines to juggle — then it’s fairly easy:
-
Choose a notification bit that isn’t used for other purposes.
-
Set the timer for the desired wake time.
-
Enter a closed receive from only the kernel’s TaskId, giving a notification mask with only the chosen bit set.
During this sleep, incoming messages will queue, and other notification bits will accumulate, but the task will only wake when the deadline is reached.
The userlib crate provides an implementation of this using notification bit 31 in the userlib::hl module.
5.5. Multiplexing your multiplexed timer
If a task needs to track multiple delays, it will need to maintain some in-memory data structure (such as a table or heap) tracking their deadlines. At any given time, the kernel-provided timer should be set to the lowest deadline. When it fires, take action and then load the next lowest. And so forth.
The multitimer crate implements such a multiplexed timer.
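To illustrate the idea (this is not the multitimer crate’s API), here is a host-side sketch that tracks several deadlines in a min-heap. It uses std for brevity, which a real Hubris task would likely replace with a fixed-size table; Deadlines and its methods are names invented for the example.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Tracks (deadline, id) pairs; the smallest deadline is always on top.
pub struct Deadlines {
    heap: BinaryHeap<Reverse<(u64, u32)>>,
}

impl Deadlines {
    pub fn new() -> Self {
        Deadlines { heap: BinaryHeap::new() }
    }

    /// Registers a delay identified by `id`.
    pub fn add(&mut self, deadline: u64, id: u32) {
        self.heap.push(Reverse((deadline, id)));
    }

    /// The value to load into the kernel-provided timer: the lowest
    /// outstanding deadline, if any.
    pub fn next_deadline(&self) -> Option<u64> {
        self.heap.peek().map(|Reverse((deadline, _))| *deadline)
    }

    /// Removes and returns the ids of all delays due at time `now`,
    /// earliest first.
    pub fn expire(&mut self, now: u64) -> Vec<u32> {
        let mut fired = Vec::new();
        while let Some(&Reverse((deadline, id))) = self.heap.peek() {
            if deadline > now {
                break;
            }
            self.heap.pop();
            fired.push(id);
        }
        fired
    }
}
```

After each call to expire, the task would reload the kernel timer with next_deadline, which is the "load the next lowest" step described above.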
6. Startup
This document describes how Hubris takes a CPU from reset to running your application. It is mostly architecture-neutral; any architecture-specific bits will be called out.
6.1. From reset to Rust
At reset, the processor runs a designated chunk of code, the reset handler. Hubris’s reset handler is responsible for making the world safe for Rust code. This means:
-
Setting up a stack pointer, if the hardware doesn’t do that for us;
-
Enabling any processor features or memory devices required to run the kernel — for example, if the device has an FPU, we need it turned on so we can configure it even though the kernel itself may not use floating point;
-
Ensuring that all initialized variables get initialized;
-
Jumping to main.
On ARMv6-M, ARMv7-M, and ARMv8-M this sequence is handled by the cortex_m_rt crate, which in turn uses the r0 crate to set up Rust variables.
6.2. main: bring your own
Currently, Hubris expects the application packager (i.e. you) to provide a main routine in the crate that builds the kernel. This gives you an opportunity to do any setup that isn’t handled by the runtime startup, but needs to happen before the kernel boots.
main should do those things, and then call start_kernel.
This same file is where you might declare interrupt service routines for any SoC-specific interrupts that you wish to handle outside Hubris’s standard mechanism — for cases where an interrupt is so latency-sensitive that you need to handle it with a privileged ISR, despite the safety implications. Hubris’s internal ISR symbols are weak, so, declaring an ISR in this file overrides them.
6.3. Starting the kernel
The kernel expects to be packaged in Flash with a table describing the tasks. This table is defined in kconfig.rs, which is generated at compile time by the kernel’s build.rs and included at the bottom of sys/kern/src/startup.rs.
The table consists of the following (all the types named here are defined in the abi crate):
-
An App header record describing the overall shape of things to come.
-
One or more region descriptor (RegionDesc) records, carving up address space into regions with attributes.
-
One or more task descriptor (TaskDesc) records describing tasks.
Region descriptors can technically be shared among tasks: task descriptors specify the regions they can access by index. Task descriptors contain the initial program counter and stack pointer values for the task, which will be loaded into those registers when the task first runs (or is restarted). Hubris will check that those pointers fall within some memory region, but other than that, the layout of your task memory regions and their roles is totally up to you. This means you could create an application with tasks that share a Flash code region, for instance, though, think carefully before doing so.
start_kernel reads the App header and the task and region descriptors, validates their integrity, and initializes bookkeeping information for each task described in the task table. This data is allocated in static mut arrays HUBRIS_TASK_TABLE_SPACE and HUBRIS_REGION_TABLE_SPACE, which are declared in the autogenerated kconfig.rs. The few kernel global variables are not placed here; it’s only used for stuff that is sized based on information found in the app.toml configuration.
Any extra RAM allocated to the kernel, but not used, is lost.
6.4. Starting the first task(s)
One of the fields in the task descriptor contains a START_AT_BOOT flag. Any task with this flag set will be initialized in Runnable state; all others are initialized in Stopped state. (The START_AT_BOOT flag in the descriptor corresponds to the start = true field in the app.toml.)
As its last act during startup, the kernel scans the tasks looking for the highest priority task marked START_AT_BOOT. It then switches into that task, and your application is running.
If the task descriptor table contains zero tasks marked START_AT_BOOT, this represents an application configuration error, and the kernel will panic.
7. Hubris Syscalls
Syscalls provide ways for tasks to invoke kernel code. Hubris has a very small set of syscalls, because syscalls — unlike IPC — have to be implemented in the kernel and are hard to proxy, so they form an ongoing ABI commitment.
In general, the following sorts of things are syscalls (exhaustive):
-
IPC primitives
  -
  Send, Receive, Reply
  -
  Access to memory borrowed from senders
  -
  Looking up the correct generation number for a task
-
Access to the multiplexed per-task timer
-
Control of the current task’s interrupt mask
-
Crashing the current task
And the following sorts of things are not (not exhaustive):
-
Checking the state or fault information of a task
-
Starting/restarting a task
-
Forcing faults on other tasks
This is because the things in the first category are universal, or nearly so, while the things in the second category are normally used only in special cases, usually by the supervisor task.
In general, syscalls should represent operations that every task should have access to, while anything privileged or sensitive should be done through IPC.
This doesn’t mean all these things need to be done outside the kernel, though. We have an escape hatch, in the form of messages to the virtual “kernel task.” The kernel IPC interface is the topic of the next chapter.
7.1. Syscall ABI
Syscalls are invoked using the architecture’s supervisor-call instruction or equivalent.
We’ll describe syscalls in an architecture-independent manner below by referring to abstract argument and return slots instead of register names. Syscalls have seven argument slots and eight return slots.
We assume that all registers are 32 bits wide.
7.1.1. ARMv6-M / ARMv7-M / ARMv8-M
Syscalls are invoked using the SVC instruction. The 8-bit immediate in the instruction is ignored, because reading it from user text is potentially sketchy.
Syscalls provide for up to 7 arguments and 8 return values in registers. Syscalls never use arguments from the stack, to make it easier to reason about possible memory management faults during syscall entry (i.e. now there aren’t any).
Arguments to syscalls are passed in r4 through r10, with the syscall index in r11.
Return values from syscalls are returned in r4 through r11.
You’re probably wondering why we’re using weird registers instead of the standard calling convention, which would pass things in r0 through r3. It comes back to the point above about stack accesses. The ARMvX-M hardware stores r0 through r3 on the user stack on entry to a syscall, and we don’t want to have to read it back from there. r4 through r11, on the other hand, are treated as callee-save, and our syscall entry sequence saves them into the TCB, where we can refer to them as needed.
This calling convention is somewhat awkward on ARMv6-M, where the registers above r7 are second-class. So it goes.
7.1.2. RISC-V
Syscalls are invoked using the ECALL instruction. The rest is TBD.
7.2. Syscalls
7.2.1. SEND (0)
Sends a message.
The error-free path:
-
Identifies the desired recipient.
-
Transfers a message (0+ bytes) from an outgoing slice in your task’s memory, into an incoming slice in the recipient’s memory.
-
Waits until the recipient calls REPLY.
-
During this time, allows the recipient to access your task’s memory, subject to the rules laid out in the lease table.
-
Once REPLY happens, transfers the reply from the recipient’s memory into the reply buffer slice in your task’s memory.
-
Resumes your task.
Arguments
-
0: packed target and operation.
  -
  Bits 31:16: target task ID (split into index and generation per the constants in the abi crate).
  -
  Bits 15:0: operation code (application defined).
-
1: Base address of outgoing message.
-
2: Length of outgoing message, in bytes.
-
3: Base address of buffer where a reply should be deposited.
-
4: Size of reply buffer, in bytes.
-
5: Base address of lease table.
-
6: Number of leases in lease table.
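As a concrete reading of argument 0, the packing could be written as follows. This is an illustrative sketch; the function names are invented here, and the task ID is treated as an opaque 16-bit value rather than being split into index and generation.

```rust
/// Packs a 16-bit task ID and a 16-bit operation code into SEND's
/// argument 0: task ID in bits 31:16, operation in bits 15:0.
pub fn pack_send_arg0(task_id: u16, operation: u16) -> u32 {
    (u32::from(task_id) << 16) | u32::from(operation)
}

/// Splits argument 0 back into (task ID, operation code).
pub fn unpack_send_arg0(word: u32) -> (u16, u16) {
    ((word >> 16) as u16, word as u16)
}
```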
Lease table layout
Each lease is 12 bytes in size and must be 4-byte aligned. A lease is equivalent to the following Rust struct:
#[repr(C)]
struct Lease {
    attributes: u32,
    base_address: usize,
    length: usize,
}
const ATT_READ: u32 = 1 << 0;
const ATT_WRITE: u32 = 1 << 1;
-
attributes can specify that a lease can be read from, written to, or both. Any use of undefined attribute bits will cause a fault.
-
base_address is a byte-aligned address. If this points to memory your task can’t access, it will cause a fault.
-
length is the length of the leased memory region in bytes.
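The 12-byte size and 4-byte alignment follow from the struct layout on a 32-bit target. The sketch below checks this on any host by substituting u32 for usize; Lease32 and attributes_valid are hypothetical names used only for this illustration.

```rust
use std::mem::{align_of, size_of};

/// Same field order as the lease struct above, with `u32` standing in
/// for `usize` so the layout matches a 32-bit target on any host.
#[repr(C)]
pub struct Lease32 {
    pub attributes: u32,
    pub base_address: u32,
    pub length: u32,
}

pub const ATT_READ: u32 = 1 << 0;
pub const ATT_WRITE: u32 = 1 << 1;

pub const LEASE_SIZE: usize = size_of::<Lease32>();
pub const LEASE_ALIGN: usize = align_of::<Lease32>();

/// True if `attributes` uses only the defined bits. The text says that
/// any other bit causes a fault.
pub fn attributes_valid(attributes: u32) -> bool {
    (attributes & !(ATT_READ | ATT_WRITE)) == 0
}
```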
Return values
-
0: response code (application defined with caveat below).
-
1: length of reply deposited into reply buffer.
Faults
Most things that can go wrong with SEND are programming errors, and will cause your task to be immediately faulted instead of returning a code.
Condition | Fault taken |
---|---|
Recipient forbidden by your task’s (static) IPC mask. |
|
Recipient task index greater than the (static) number of tasks in the entire system. |
|
Any slice invalid (e.g. it would wrap the end of the address space). |
|
Lease table slice misaligned. |
|
Outgoing slice or lease table are memory you can’t actually read. |
|
Reply buffer slice is memory you can’t actually write. |
|
Notes
Target and operation are packed into a single word because we’re out of useful registers on ARM. This currently limits operation codes to 16 bits. We might revisit this later.
For all slices (outgoing message, reply buffer, lease table), if the count is zero, the base address won’t be dereferenced and can be illegal. In particular, it’s okay to pass address 0 for empty slices.
If the slices are not zero length, however, the kernel will check them against your task’s memory map, and your task will be faulted if anything is amiss.
Slices are accessed by the kernel only while your task is blocked in SEND, so passing a slice to the kernel here can be done safely (in the Rust sense). The reply buffer slice must be an &mut, but the others can be &.
The lease table slice must be 4-byte aligned. The others can be arbitrarily aligned.
Response codes are application defined except for one subtlety: dead codes. The kernel will deliver a dead code in two situations:
-
SEND to a task with the wrong generation, suggesting that the recipient has restarted without the sender noticing.
-
If the recipient crashes while the sender is waiting — either waiting to transfer the initial message, or waiting for the reply.
Dead codes have their top 24 bits set (that is, 0xFFFF_FF00). In the bottom 8 bits, the kernel returns the current generation number of the peer, so that the caller can correct their records.
It is possible to fake a dead task by deliberately sending a response code in the dead code range — because it didn’t seem useful to spend cycles filtering this out.
7.2.2. RECV (1)
Receives a pending message or notification.
The error-free path:
-
Blocks until some number of tasks are ready to send to your task.
-
Picks the highest priority one.
-
Transfers its message into memory you’ve designated.
-
Keeps the sending task blocked.
-
Returns information describing the message to your task.
If the provided notification mask is not zero, the receive operation may be interrupted by a notification message from the kernel instead. This happens if any of the notification bits specified in the mask (by 1 bits) have been set on the calling task. When RECV returns, you can distinguish these notification messages because they have the kernel’s virtual task ID 0xFFFF as the message sender.
Closed vs Open RECV
One argument to RECV determines whether to accept messages from any sender, or to only accept messages from one. Accepting messages from any sender is called an “open” receive, while only listening for one sender is “closed.”
During an open receive, a task may receive messages sent by any other task, plus any notifications enabled by the notification mask.
During a closed receive, a task will receive messages only from the chosen task. The task will still receive any notifications set in its notification mask.
To listen only for notifications, a task can perform a closed receive against the kernel’s task ID, 0xFFFF.
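The open/closed choice is passed to the kernel as a single word (argument 3 of RECV: bit 31 selects closed, bits 15:0 hold the task ID). As a rough sketch of that encoding, with `sender_filter` and `KERNEL_TASK_ID` being illustrative names rather than `userlib` API:

```rust
/// Task ID the kernel answers to, as stated in this document.
pub const KERNEL_TASK_ID: u16 = 0xFFFF;

/// Builds the sender filter word for RECV.
/// `None` requests an open receive; `Some(id)` a closed receive from `id`.
pub fn sender_filter(closed_from: Option<u16>) -> u32 {
    match closed_from {
        // Bit 31 clear: open receive, task ID bits are ignored.
        None => 0,
        // Bit 31 set: closed receive, task ID in bits 15:0.
        Some(id) => (1u32 << 31) | u32::from(id),
    }
}
```

Listening only for notifications then corresponds to `sender_filter(Some(KERNEL_TASK_ID))` combined with a non-zero notification mask.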
Arguments
-
0: Address of a buffer where received messages should be written.
-
1: Number of bytes in that buffer.
-
2: Notification mask to apply during this receive.
-
3: Sender filter for open vs closed receive.
-
Bit 31: 0=open, 1=closed
-
Bits 30:16: reserved
-
Bits 15:0: TaskId if closed, ignored if open.
-
Return values
-
0: always 0 for open receive; closed receive may also return a dead code (see SEND) to indicate that the chosen peer has died.
-
1: Task ID of the sender (generation in 15:12, ID in 11:0).
-
2: Operation code used by sender. (Or notification bits, if the sender is the kernel.)
-
3: Length of message sent, in bytes. This may be longer than the buffer provided by the caller, which indicates that the message was truncated.
-
4: Number of bytes of room the caller has provided for the reply message.
-
5: Number of leases provided with message.
Faults
Most things that can go wrong with RECV
are programming errors, and will cause
your task to be immediately faulted instead of returning a code.
Condition | Fault taken |
---|---|
Receive buffer slice invalid (i.e. would wrap the end of the address space). |
|
Receive buffer slice is memory you can’t actually write. |
|
Notes
It’s legal to specify a zero-length receive buffer, if the messages you’re expecting consist only of the operation code or notification bits. In this case, the base address is ignored and may be invalid or null.
If the sender sent a message longer than your receive buffer, you will get the prefix of the message, and the returned message length (return value 3) will give the actual length. This means you should check the returned length against your buffer length to detect truncation.
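To make the truncation rule concrete, here is a small host-side model of it. This is a sketch only: `deliver` stands in for the kernel's copy step and `was_truncated` for the check your server should make; neither is a real Hubris function.

```rust
/// Models how a message lands in a receive buffer: only the prefix that
/// fits is copied, but the returned length is the sender's full length.
pub fn deliver(message: &[u8], buffer: &mut [u8]) -> usize {
    let n = message.len().min(buffer.len());
    buffer[..n].copy_from_slice(&message[..n]);
    message.len()
}

/// The check a receiver should perform after RECV returns.
pub fn was_truncated(message_len: usize, buffer_len: usize) -> bool {
    message_len > buffer_len
}
```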
Leases received with the message are referenced with the combination (TaskID,
lease number). Lease numbers range between 0 and one less than the received
lease count, as you’d expect. Leases are only valid until the sending task
unblocks, which normally happens only when you REPLY
, but could also occur as
a result of an asynchronous restart from the supervisor.
The notification mask is provided anew with each receive because the RECV
callsite has a clear idea of which notifications it can handle. Plus, it saves a
syscall during the common pattern of updating the mask and then receiving.
RECV
is called RECV
because Cliff can’t spell “recieve” reliably.
7.2.3. REPLY
(2)
Replies to a received message.
If all goes well, this copies a slice of data from your task’s memory into the caller’s memory and resumes the caller.
Arguments
-
0: Task ID of sender we’re replying to.
-
1: Response code to deliver.
-
2: Base address of reply message.
-
3: Number of bytes in reply message.
Return values
REPLY
doesn’t return anything, but should be treated as clobbering return
registers 0 and 1 for future compatibility.
Faults
There is only one way to break REPLY
, and that’s with a bogus slice.
Condition | Fault taken |
---|---|
Outgoing buffer slice invalid (i.e. would wrap the end of the address space). |
|
Outgoing buffer slice is memory you can’t actually read. |
|
Reply message is longer than recipient requested. |
|
Notes
It might strike you as odd that REPLY
doesn’t return any status. This is a
subtle decision, and has to do with what servers will do if their clients
“defect” or crash before reply (generally: nothing).
Reply messages can be zero-length, in which case the base address of the slice is ignored. Often, the response code is enough.
RECV
delivers the size of the caller’s response buffer, so your task has
sufficient information to not overflow it. This is why doing so is a fault: it’s
a programming error.
7.2.4. SET_TIMER
(3)
Configures your task’s timer.
Arguments
-
0: Enable (1) or disable (0) flag.
-
1: Low 32 bits of deadline.
-
2: High 32 bits of deadline.
-
3: Notification bitmask to post when timer expires.
Return values
None. All registers preserved.
Faults
None.
Notes
The notification bitmask will be delivered into your task’s notification set
when the kernel time becomes equal to or greater than the given deadline, if the
timer is enabled. Configuring the timer with an enabled deadline that is already
in the past delivers the notification immediately (though you won’t notice until
you RECV
).
The time unit for deadlines is not yet specified; for now it is an abstract “kernel ticks” unit. This will be fixed.
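Because the deadline is a 64-bit quantity passed in two 32-bit arguments, callers need to split it (and join the halves that GET_TIMER returns). A minimal sketch of that arithmetic and of the "equal to or greater than" expiry rule follows; all three function names are invented for this example.

```rust
/// Splits a 64-bit deadline into the (low, high) halves taken by SET_TIMER.
pub fn split_deadline(deadline: u64) -> (u32, u32) {
    (deadline as u32, (deadline >> 32) as u32)
}

/// Joins two 32-bit halves back into a 64-bit tick count.
pub fn join_ticks(low: u32, high: u32) -> u64 {
    (u64::from(high) << 32) | u64::from(low)
}

/// Models the expiry rule: an enabled timer fires once the kernel time is
/// equal to or greater than the deadline, including deadlines in the past.
pub fn timer_fires(now: u64, deadline: u64, enabled: bool) -> bool {
    enabled && now >= deadline
}
```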
7.2.5. BORROW_READ
(4)
Copies data from memory borrowed from a caller (a “borrow”).
Arguments
-
0: TaskId of lender.
-
1: Lease index for that lender.
-
2: Offset within the borrowed memory to start reading.
-
3: Base address of slice in your memory space to deposit data.
-
4: Length of slice in bytes.
Return values
-
0: response code: zero on success, non-zero if something went wrong on the sender side.
-
1: on success, number of bytes copied.
Faults
TBD
Notes
This provides “file-like” access to memory borrowed from other tasks, rather than direct memory-mapped access, and that’s for a good reason: the other task may potentially be restarted at any time. In the event that the peer restarts while you’re working with one of its borrows, you’ll get an error return code and can clean up — whereas if you were directly accessing its memory, we’d have no choice but to deliver a fault to stop you. That would give clients the opportunity to induce faults in shared servers, which would be bad.
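The "file-like" behavior can be pictured with a toy model that runs on a host. This is an illustration under loud assumptions, not the kernel's implementation: `Lease`, `PEER_GONE`, and `borrow_read` are invented here, the real kernel reports peer death with a dead code rather than `1`, and clamping an out-of-range offset to zero bytes is a choice made only for this sketch.

```rust
/// Toy stand-in for a lease offered by a lending task.
pub struct Lease<'a> {
    pub data: &'a [u8],
    /// False once the lender has restarted and the lease is gone.
    pub lender_alive: bool,
}

/// Hypothetical non-zero response code used by this model only.
pub const PEER_GONE: u32 = 1;

/// Copies from the lease into `dest`, returning (response code, bytes copied).
/// A vanished lender yields an error code instead of a fault in the reader.
pub fn borrow_read(lease: &Lease, offset: usize, dest: &mut [u8]) -> (u32, usize) {
    if !lease.lender_alive {
        return (PEER_GONE, 0);
    }
    // Assumption of this sketch: reads past the end copy zero bytes.
    let available = lease.data.get(offset..).unwrap_or(&[]);
    let n = available.len().min(dest.len());
    dest[..n].copy_from_slice(&available[..n]);
    (0, n)
}
```

The point of the shape is that the reader always gets a return value it can act on, even when the lender disappears mid-operation.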
7.2.6. BORROW_WRITE
(5)
Copies data into memory borrowed from a caller (a “borrow”).
Arguments
-
0: TaskId of lender.
-
1: Lease index for that lender.
-
2: Offset within the borrowed memory to start writing.
-
3: Base address of data (in your memory space) to transfer.
-
4: Length of data in bytes.
Return values
-
0: response code: zero on success, non-zero if something went wrong on the sender side.
-
1: on success, number of bytes copied.
Faults
TBD
Notes
This provides “file-like” access to memory borrowed from other tasks, rather than direct memory-mapped access, and that’s for a good reason: the other task may potentially be restarted at any time. In the event that the peer restarts while you’re working with one of its borrows, you’ll get an error return code and can clean up — whereas if you were directly accessing its memory, we’d have no choice but to deliver a fault to stop you. That would give clients the opportunity to induce faults in shared servers, which would be bad.
7.2.7. BORROW_INFO
(6)
Collects information about one entry in a sender’s lease table.
Arguments
-
0: TaskId of lender.
-
1: Lease index for that lender.
Return values
-
0: response code: zero on success, non-zero if something went wrong on the sender side.
-
1: attributes field (see SEND for definition of lease table attributes).
-
2: length in bytes
7.2.8. IRQ_CONTROL
(7)
Arguments
-
0: notification bitmask corresponding to the interrupt
-
1: desired state (0 = disabled, 1 = enabled)
Return values
None.
Faults
Condition | Fault taken |
---|---|
The given notification bitmask is not mapped to an interrupt in this task. |
|
Notes
It might seem strange that this syscall has tasks refer to interrupts using their notification bits. However, this is quite deliberate, for two reasons:
-
It gives tasks a consistent semantic model. When an interrupt goes off, they see a notification in bit X; when they want to re-enable that interrupt, they request enabling on bit X. There is no separate “IRQ number” to configure; that’s left to the application-level config file.
-
It makes it impossible for a task to mess with other tasks' interrupts, since it can only refer to its own mapped interrupts, by construction.
7.2.9. PANIC
(8)
Delivers a Panic
fault to the calling task, recording an optional message.
This is roughly equivalent to the Rust panic!
operation and is used in its
implementation.
Arguments
-
0: base address of 7-bit ASCII panic message
-
1: length of panic message in bytes
Return values
Does not return.
Faults
This produces a Panic
fault every time — that’s its purpose.
Notes
The kernel does not interpret the panic message in any way, but the message may be made available to the supervisor if it asks.
7.2.10. GET_TIMER
(9)
Reads the contents of the task’s timer: both the current time, and any configured deadline.
Arguments
None.
Return values
-
0: low 32 bits of kernel timestamp.
-
1: high 32 bits of kernel timestamp.
-
2: 0=no deadline set, 1=deadline set.
-
3: low 32 bits of deadline, if set.
-
4: high 32 bits of deadline, if set.
-
5: notifications to post when deadline reached.
Faults
None.
Notes
The timestamp is defined as being CPU-wide, consistent for all tasks, so the result of this syscall can be meaningfully sent to other tasks on the same CPU. (Behavior in multicore situations is not yet defined.)
The time unit is not yet specified; for now it is an abstract “kernel ticks” unit. This will be fixed.
7.2.11. REFRESH_TASK_ID
(10)
Given a task ID that may have the wrong generation, produces a corrected task ID with the target task’s current generation.
This is intended for two use cases:
-
Initially contacting a task. In this case, the generation can be arbitrary and is usually given as zero.
-
Recovering from a peer task crashing. In this case, hand in your previously valid TaskId to redeem it for a new one.
Arguments
-
0: task ID (in low 16 bits)
Return values
-
0: task ID (in low 16 bits), top 16 bits zeroed
Faults
Condition | Fault taken |
---|---|
Recipient task index greater than the (static) number of tasks in the entire system. |
|
7.2.12. POST
(11)
Accumulates a set of notification bits into another task’s notification word using bitwise OR. This enables a simple inter-task asynchronous communication mechanism. See Notifications: the other IPC mechanism for more information on the mechanism.
Arguments
-
0: task ID (in low 16 bits)
-
1: bits to OR in
Return values
-
0: zero on success, dead code on generation mismatch.
Faults
Condition | Fault taken |
---|---|
Recipient task index greater than the (static) number of tasks in the entire system. |
|
Notes
If the task generation is wrong, the caller will receive a dead code (see Death and IPC) and no notification will be posted.
If the task being notified is higher priority, and the notification causes it to wake, control will immediately transfer to the higher priority task. This will be returned as “success” to the caller, because the notification was successfully delivered, even if the higher priority task subsequently crashes before the caller gets another chance to run.
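The accumulate-with-OR behavior is easy to model on a host. In this sketch `post` mimics what POST does to the recipient's notification word, and `interrupts_recv` mimics the RECV rule that a receive is interrupted when any bit selected by the mask is set. Both names are illustrative, and the model deliberately says nothing about when bits are cleared.

```rust
/// ORs `bits` into a task's notification word, as POST does.
pub fn post(notification_word: &mut u32, bits: u32) {
    *notification_word |= bits;
}

/// True if a RECV with this notification mask would be interrupted
/// by a notification instead of delivering a message.
pub fn interrupts_recv(notification_word: u32, mask: u32) -> bool {
    notification_word & mask != 0
}
```

Posting the same bit twice before the recipient looks is indistinguishable from posting it once, which is why notifications suit "something happened" signals rather than counts.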
7.2.13. REPLY_FAULT
(12)
Like REPLY, this acts on a task that is blocked waiting for a reply from the invoking task. Unlike REPLY, this does not set the task runnable, and instead marks it as faulted with a recognizable code.
Arguments
-
0: task ID (in low 16 bits)
-
1: ReplyFaultReason value (see abi crate)
Return values
REPLY_FAULT
doesn’t return anything, but should be treated as clobbering
return registers 0 and 1 for future compatibility.
Faults
Condition | Fault taken |
---|---|
Designated task index greater than the (static) number of tasks in the entire system. |
|
|
|
Notes
Like REPLY
, this syscall just silently ignores replies to the wrong
generation, under the assumption that the task got restarted for some reason
while we were processing its request. (It can happen.)
7.2.14. IRQ_STATUS
(13)
Returns the current status of interrupts mapped to the calling task.
Arguments
-
0: notification bitmask corresponding to the interrupt(s) to query
Return values
-
0: an IrqStatus (see the abi crate) describing the status of the interrupts in the notification mask. Currently, the following bits in IrqStatus are significant:
-
0b0001: set if any interrupt in the mask is enabled
-
0b0010: set if an IRQ is pending for any interrupt in the mask
-
0b0100: set if a notification has been posted to the caller but not yet consumed
-
Faults
Condition | Fault taken |
---|---|
The given notification bitmask is not mapped to an interrupt in this task. |
|
Notes
As discussed in the notes for the IRQ_CONTROL syscall, tasks refer to interrupts using their notification bits.
If the provided notification mask is zero, the syscall will deliver a NoIrq fault. If the provided notification mask has multiple bits set, the returned IrqStatus value will be the boolean OR of the status of all interrupts in the mask (e.g. if any interrupt in the mask is pending, the PENDING bit will be set, and so on).
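The OR-combining rule can be sketched as follows. This is a host-side model, not the `abi` crate's `IrqStatus` type: the constants mirror the bit values listed above, while `combined_status` and its `mapped` table of (notification bit, status) pairs are invented for illustration. `None` stands in for the fault taken on a zero mask.

```rust
pub const ENABLED: u32 = 0b0001;
pub const PENDING: u32 = 0b0010;
pub const POSTED: u32 = 0b0100;

/// ORs together the status of every mapped interrupt selected by `mask`.
/// `mapped` holds (notification bit, status bits) for this task's interrupts.
pub fn combined_status(mask: u32, mapped: &[(u32, u32)]) -> Option<u32> {
    if mask == 0 {
        // Models the fault described for an empty mask.
        return None;
    }
    let mut status = 0;
    for &(bit, s) in mapped {
        if mask & bit != 0 {
            status |= s;
        }
    }
    Some(status)
}
```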
8. The Kernel IPC Interface
Hubris provides syscalls for operations that are used by every (or nearly every) task, and uses IPC for everything else. But IPC is used to talk between tasks — what about operations that require the kernel to be involved?
We still use IPC, we just do it with the kernel.
Hubris has a concept called the virtual kernel task. The virtual kernel task is not a real task — it’s an illusion created by the kernel. Any messages sent to the virtual kernel task will be processed in the kernel itself and replied to directly.
The kernel answers to the task ID TaskId::KERNEL
in Rust, which has the
numeric value 0xFFFF
. We chose this number so that the task index portion of
it (in the 10-12 LSBs) would be larger than any valid task index, so there’s no
risk of it colliding with an actual task.
8.1. Sending messages to the kernel
The kernel accepts normal IPC messages sent to TaskId::KERNEL
. Currently, none
of the messages accepted by the kernel make use of leases/borrows, but that’s
just because they haven’t been useful yet.
The kernel makes a special guarantee: it will respond to IPCs synchronously, without blocking the sender. This means it is safe to send IPCs to the kernel from even the highest-priority tasks — in fact, for the highest priority task (normally the supervisor), the kernel is the only place it can send messages safely.
The kernel does not reply to all messages, however. Like making a bogus syscall, if you send a message to the kernel that uses an out of range selector or has a badly malformed body, the kernel will deliver a fault to your task and move on to the next highest priority runnable task.
Messages sent to the kernel, and responses sent back, are formatted using the
ssmarshal
crate via serde
. Messages and responses are currently defined as
Rust structs, simply because we haven’t had a need to write a supervisor task in
C so far. We may wish to formalize this later, but, this is expedient for now.
Remember when reading the request/response types that it is the ssmarshal
serialized form, and not the in-memory layout of the struct, that is exchanged
with the kernel.
The userlib::kipc
module provides wrapper functions for these IPCs for
programs written in Rust.
It is our intent to restrict kernel IPC sends to “privileged” tasks, likely just the supervisor task. We haven’t implemented this yet for every entry point, though some IPC entry points explicitly limit themselves to the supervisor.
8.1.1. read_task_status
(1)
Reads out status information about a task, by index. This is intended to be used for task management purposes by the supervisor — because tasks can restart whenever, the supervisor generally doesn’t want to concern itself with getting generation numbers right.
Request
struct TaskStatusRequest {
task_index: u32,
}
Preconditions
The task_index
must be a valid index for this system.
Response
type TaskStatusResponse = abi::TaskState;
Notes
See the abi
crate for the definition of TaskState
that matches your kernel.
Here is a representative example at the time of this writing:
pub enum TaskState {
/// Task is healthy and can be scheduled subject to the `SchedState`
/// requirements.
Healthy(SchedState),
/// Task has been stopped by a fault and must not be scheduled without
/// intervention.
Faulted {
/// Information about the fault.
fault: FaultInfo,
/// Record of the previous healthy state at the time the fault was
/// taken.
original_state: SchedState,
},
}
pub enum FaultInfo {
StackOverflow { address: u32 },
// other fault cases go here
}
pub enum SchedState {
/// This task is ignored for scheduling purposes.
Stopped,
/// This task could be scheduled on the CPU.
Runnable,
/// This task is blocked waiting to deliver a message to the given task.
InSend(TaskId),
/// This task is blocked waiting for a reply from the given task.
InReply(TaskId),
/// This task is blocked waiting for messages, either from any source
/// (`None`) or from a particular sender only.
InRecv(Option<TaskId>),
}
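As a rough idea of how a supervisor might consume this type, here is a host-side sketch of a restart policy. The enums are simplified local copies of the representative definitions above (the real ones live in the `abi` crate and may differ), and `Action` and `decide` are hypothetical names; a real supervisor's policy is application-defined.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct TaskId(pub u16);

#[derive(Debug, PartialEq)]
pub enum SchedState {
    Stopped,
    Runnable,
    InSend(TaskId),
    InReply(TaskId),
    InRecv(Option<TaskId>),
}

#[derive(Debug, PartialEq)]
pub enum FaultInfo {
    StackOverflow { address: u32 },
}

#[derive(Debug, PartialEq)]
pub enum TaskState {
    Healthy(SchedState),
    Faulted { fault: FaultInfo, original_state: SchedState },
}

/// What the (hypothetical) supervisor does with a task it has inspected.
#[derive(Debug, PartialEq)]
pub enum Action {
    LeaveAlone,
    Restart,
}

/// Simplest possible policy: restart anything that has faulted.
pub fn decide(state: &TaskState) -> Action {
    match state {
        TaskState::Faulted { .. } => Action::Restart,
        TaskState::Healthy(_) => Action::LeaveAlone,
    }
}
```

In a real supervisor, `Action::Restart` would translate into a `reinit_task` request for the task's index.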
8.1.2. reinit_task
(2)
Reinitializes a task, chosen by index, and optionally starts it running.
This is valid in any task state, and can be used to interrupt otherwise uninterruptible operations like the SEND syscall.
A successful call to reinit_task
has the following effects:
-
The targeted task is forced out of whatever state it was in, and left in either the Stopped (if start is false) or Runnable (if start is true) state.
-
The task’s generation number is incremented.
-
The task’s registers are reset (to particular values where necessary, and to zero otherwise) and the stack erased.
-
The task’s interrupts are disabled and its timer is stopped.
-
Any other tasks that were blocked in IPC with the targeted task (either waiting to deliver a message, waiting for a reply to a delivered message, or waiting to receive) are interrupted and given a dead code to indicate that the IPC will never complete.
Request
struct ReinitRequest {
task_index: u32,
start: bool,
}
Preconditions
The task_index
must be a valid index for this system.
Response
type ReinitResponse = ();
Notes
If a task asks to reinit itself, the kernel mumbles “alright, your funeral”
and reinits the caller. Given that reinit_task
is intended to be restricted to
the supervisor, and the supervisor can’t panic!
to restart without taking out
the system, this seemingly weird move may actually prove useful.
Reinitialization does not write over the task’s memory except for the stack. Tasks are responsible for (say) setting up their data/BSS areas on start. This is explicitly intended to allow tasks to keep some information from “past lives” if required.
8.1.3. fault_task
(3)
Forces a task into a Faulted
state. Specifically, this will set the task’s
fault to FaultInfo::Injected(caller)
, where caller
is the TaskId of the task
that called fault_task
(i.e. you). This means that a fault caused by
fault_task
is both easily distinguished from any other fault, and traceable.
fault_task
immediately prevents the targeted task from running, and any
other tasks that were blocked in IPC with the targeted task are interrupted
and given a dead code to indicate that the IPC will never complete.
Request
struct FaultRequest {
task_index: u32,
}
Preconditions
The task_index
must be a valid index for this system.
Response
type FaultResponse = ();
Notes
As with reinit_task
, it is possible for a task to use fault_task
to fault
itself. This is an odd thing to do.
On faults, the kernel tries to save the pre-fault state of the task. However, if
you apply fault_task
to an already-faulted task, the task will be marked as
double-faulted and the previous fault will be replaced with the new injected
fault.
8.1.4. read_image_id
(4)
8.1.5. reset
(5)
8.1.6. get_task_dump_region
(6)
A dump region is an area of memory for a specified task that can be
read by the supervisor for purpose of creating a memory dump for debugging.
For a specified task and region index, get_task_dump_region
will return the details of the dump region, if any. This entry point
is only present if the kernel’s dump
feature is enabled.
Request
type GetTaskDumpRegionRequest = (u32, u32);
Preconditions
The task index (GetTaskDumpRegionRequest.0
) must be a valid task index. The
dump region index (GetTaskDumpRegionRequest.1
) should denote the region of
interest.
Response
struct TaskDumpRegion {
base: u32,
size: u32,
}
type GetTaskDumpRegionResponse = Option<TaskDumpRegion>;
Notes
For the specified task index, this will return the dump region specified by
the dump region index. If the dump region index is equal to or greater
than the number of dump regions for the specified task, None
will
be returned.
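Since an out-of-range region index yields `None`, a caller can enumerate a task's dump regions by counting up from zero until that happens. The sketch below shows the loop on a host; `collect_dump_regions` is an invented helper, the closure stands in for the actual IPC, and `Vec` is used only for brevity (a `no_std` supervisor would more likely use a fixed-size buffer).

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct TaskDumpRegion {
    pub base: u32,
    pub size: u32,
}

/// Calls `get_region` with indices 0, 1, 2, ... until it returns `None`,
/// collecting every region reported along the way.
pub fn collect_dump_regions(
    get_region: impl Fn(u32) -> Option<TaskDumpRegion>,
) -> Vec<TaskDumpRegion> {
    let mut regions = Vec::new();
    let mut index = 0;
    while let Some(region) = get_region(index) {
        regions.push(region);
        index += 1;
    }
    regions
}
```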
8.1.7. read_task_dump_region
(7)
For a given task and task dump region, this will read the specified region and
return its contents. The region should be entirely contained by a region that
has been returned by a call to get_task_dump_region
but is otherwise
unconstrained. This entry point is only present if the kernel’s dump
feature is enabled.
Request
struct TaskDumpRegion {
base: u32,
size: u32,
}
type ReadTaskDumpRegionRequest = (u32, TaskDumpRegion);
Preconditions
The task index (ReadTaskDumpRegionRequest.0
) must be a valid task index.
The dump region should be entirely contained by a region that has been
returned by a call to get_task_dump_region
for the specified task.
Response
A copy of the memory referred to by the specified region, starting
at base
and running for size
bytes.
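The containment precondition can be checked on the caller's side before issuing the request. A minimal sketch, assuming nothing beyond the `base`/`size` fields shown above (`contains` is an illustrative helper, not part of `userlib::kipc`):

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct TaskDumpRegion {
    pub base: u32,
    pub size: u32,
}

/// True if `inner` lies entirely within `outer`.
/// End addresses are computed in 64 bits so `base + size` cannot overflow.
pub fn contains(outer: TaskDumpRegion, inner: TaskDumpRegion) -> bool {
    let outer_end = u64::from(outer.base) + u64::from(outer.size);
    let inner_end = u64::from(inner.base) + u64::from(inner.size);
    inner.base >= outer.base && inner_end <= outer_end
}
```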
8.2. Receiving from the kernel
The kernel never sends messages to tasks. It’s simply not equipped to do so. However, it is legal to enter a closed receive from the kernel. This might be counter-intuitive — since the kernel will never send a message for you to receive, it sure sounds like a programming error, and Hubris as a rule tries to turn obvious programming errors into crashes.
Receiving from the kernel is deliberately allowed to enable two use cases:
-
Blocking the current task until a notification arrives while ignoring all incoming messages. By receiving from the kernel’s task ID with a non-zero notification mask, the current task will wait until any matching notification arrives.
-
Halting the current task. If you really want to stop the current task forever (or at least, until the supervisor reinits it), you can receive from the kernel with no notification mask bits set.
We haven’t needed that second one in practice, so we might make it an error someday. The first one, on the other hand, is useful.
9. Application Notes
9.1. Servers
A server in Hubris is any task that receives messages to implement some API. This section looks at how servers work, how to implement one using low and high level APIs, and provides some tips.
9.1.1. The role of a server
Servers normally spend most of their time hanging out in RECV. This ensures that they’re ready to handle incoming messages.
In the simplest case, after doing any initialization required on startup, a server will:
-
RECV to collect a request.
-
Inspect the request and figure out what needs to be done.
-
Do it.
-
Reply.
-
Repeat.
This simple version covers most servers on Hubris, believe it or not. All the complexity, and application-specific logic, is hidden in the "do it" step.
9.1.2. Servers are clients too
The vast majority of servers need to send messages to other servers to do their jobs. Most servers will turn a single incoming client message into a sequence of messages to other servers to perform useful work.
When designing a collection of servers in an application, remember that it’s only safe to send messages to higher priority servers (called the "uphill send rule"). Sending messages to lower priority servers can cause starvation and deadlock.
The kernel will enforce this, eventually.
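Until then, the uphill send rule is something you can check yourself when laying out an application. The sketch below does so over a list of (sender, receiver) pairs. It assumes that a numerically lower priority value means a more important task; that convention, and the names `is_uphill` and `violations`, are assumptions of this example rather than anything stated in this section.

```rust
/// True if a send from `sender_priority` to `receiver_priority` goes uphill.
/// Assumption: lower number = higher priority. Equal priorities do not count.
pub fn is_uphill(sender_priority: u8, receiver_priority: u8) -> bool {
    receiver_priority < sender_priority
}

/// Returns every (sender index, receiver index) pair that breaks the rule.
/// `priorities[i]` is the priority of task `i`.
pub fn violations(priorities: &[u8], sends: &[(usize, usize)]) -> Vec<(usize, usize)> {
    sends
        .iter()
        .copied()
        .filter(|&(from, to)| !is_uphill(priorities[from], priorities[to]))
        .collect()
}
```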
9.1.3. When not to use a server
Servers are tasks. Tasks are relatively expensive — they require separate code and data storage and stack space. When designing an API consider whether it should be a server task — or just a crate.
You may want a server if any of these are true:
-
There will be several client tasks, particularly if it’s important for only one of them to be performing an operation at a time (mutual exclusion).
-
The implementation needs to do something clever or unsafe, such that you want it isolated in memory away from other tasks.
-
You need the code to be able to crash and restart separately from other code.
-
You need to have multiple concurrent state machines responding to messages and notifications. (This is hard to do inside another task.)
Signs that you may just want a crate:
-
This task and another task (or a whole group of tasks!) will never be runnable at the same time. For instance, only one device driver on an I2C bus can be using the bus at any given time. (See the section on drivers, below.)
-
There will be a single client, or there will be multiple clients but the code is fairly small and no mutual-exclusion is required.
-
You don’t expect crashes and can return Err for failures.
-
You’re not being weird with unsafe.
9.1.4. Low-level (syscall) implementation
Here is a full implementation of a server for a very simple IPC protocol: it maintains a 32-bit integer, and can add or subtract values and return the result.
This implementation uses syscalls directly and no abstractions, to show you exactly what’s happening under the hood. In practice, we rarely write servers this way — the next section shows a higher-level equivalent.
This server supports two messages, add
(0) and sub
(1). Both messages expect
a four-byte payload, which is a u32
in little-endian byte order. On success,
the messages update the server state the_integer
and return the new value, as
another four-byte little-endian integer.
#![no_std]
#![no_main]

use userlib::{sys_recv_open, sys_reply};

enum Errs {
    BadMsg = 1,
}

#[export_name = "main"]
pub fn main() -> ! {
    let mut the_integer: u32 = 0; (1)
    let mut msg = [0; 4]; (2)
    loop {
        let msginfo = sys_recv_open(&mut msg, 0); (3)
        match msginfo.operation { (4)
            0 => {
                // Add
                if msginfo.message_len == 4 { (5)
                    // yay!
                    the_integer = the_integer.wrapping_add(
                        u32::from_le_bytes(msg)
                    );
                    sys_reply(msginfo.sender, 0, &the_integer.to_le_bytes());
                } else {
                    sys_reply(msginfo.sender, Errs::BadMsg as u32, &[]);
                }
            }
            1 => {
                // Subtract
                if msginfo.message_len == 4 {
                    // yay!
                    the_integer = the_integer.wrapping_sub(
                        u32::from_le_bytes(msg)
                    );
                    sys_reply(msginfo.sender, 0, &the_integer.to_le_bytes());
                } else {
                    sys_reply(msginfo.sender, Errs::BadMsg as u32, &[]);
                }
            }
            _ => { (6)
                // Unknown operation
                sys_reply(msginfo.sender, Errs::BadMsg as u32, &[]);
            }
        }
    }
}
1 | This is the server’s local state. It’s common for servers to keep their state on the stack, but larger state might be better placed in a static. |
2 | The server maintains a 4-byte buffer for incoming messages. This means that any longer message will be truncated. |
3 | The server uses sys_recv_open to accept messages from any caller. The notification mask is 0, ensuring that we won’t get any notifications instead of messages. |
4 | The operation code distinguishes the operations we implement, so we match on it. |
5 | It’s important to check message_len, since clients can send a message that is too short or too long. Too-long messages get truncated, but message_len will be honest, so if the message_len here were 6, we’d know the client sent a truncated message. |
6 | Clients can choose any operation code they want, so we need to make sure to have a default case to signal errors. |
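One way to keep a server like this testable is to pull the "do it" step out into a pure function that takes the operation code and payload and produces the response code and reply bytes. The following is a sketch of that idea for the integer protocol, runnable on a host; `handle` and `BAD_MSG` are illustrative names, and the server above does not actually factor its logic this way.

```rust
use std::convert::TryFrom;

/// Response code for malformed or unknown requests (mirrors Errs::BadMsg).
pub const BAD_MSG: u32 = 1;

/// Applies one request to the server state.
/// Returns (response code, optional 4-byte little-endian reply).
pub fn handle(the_integer: &mut u32, operation: u32, payload: &[u8]) -> (u32, Option<[u8; 4]>) {
    // Reject payloads that are not exactly four bytes long.
    let arg = match <[u8; 4]>::try_from(payload) {
        Ok(bytes) => u32::from_le_bytes(bytes),
        Err(_) => return (BAD_MSG, None),
    };
    *the_integer = match operation {
        0 => the_integer.wrapping_add(arg),
        1 => the_integer.wrapping_sub(arg),
        // Unknown operation: leave the state untouched.
        _ => return (BAD_MSG, None),
    };
    (0, Some(the_integer.to_le_bytes()))
}
```

The receive loop then shrinks to RECV, one call to a function like this, and REPLY, and the protocol logic can be exercised with ordinary unit tests.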
9.1.5. High-level (wrapper) implementation
The userlib::hl
module provides wrappers for common patterns in server
implementation. Here’s the same server from the last section, rewritten using
the hl
conveniences.
#![no_std]
#![no_main]

use userlib::{hl, FromPrimitive}; (1)
use zerocopy::AsBytes;

#[derive(FromPrimitive)]
enum Op { (2)
    Add = 0,
    Sub = 1,
}

enum ResponseCode { (3)
    // Note: code 1 is produced by hl
    BadArg = 2,
}

impl From<ResponseCode> for u32 { (4)
    fn from(rc: ResponseCode) -> Self {
        rc as u32
    }
}

#[export_name = "main"]
pub fn main() -> ! {
    let mut the_integer: u32 = 0; (5)
    let mut argument = 0u32; (6)
    loop {
        hl::recv_without_notification( (7)
            argument.as_bytes_mut(), (8)
            |op, msg| -> Result<(), ResponseCode> { (9)
                let (msg, caller) = msg.fixed::<u32, u32>() (10)
                    .ok_or(ResponseCode::BadArg)?; (11)
                // Store the result back into the server state. The parsed
                // argument is read through `msg`, since `argument` is still
                // mutably borrowed by the receive call.
                the_integer = match op { (12)
                    Op::Add => the_integer.wrapping_add(*msg),
                    Op::Sub => the_integer.wrapping_sub(*msg),
                };
                caller.reply(the_integer); (13)
                Ok(()) (14)
            },
        );
    }
}
1 | The userlib::hl module provides these utilities for implementing clients and servers, and is intended to be imported as hl like this, so references to it in the file are prefixed with hl::. We also import the FromPrimitive derive macro for our Op enum below. |
2 | We now describe the possible operation codes using an enum. Any operation outside this set will automatically generate an error reply to the client. |
3 | Errors are still described in an enum, but hl directly supports this as long as we provide a From impl for u32. We skip code 1 as it’s used by hl to indicate an illegal operation code. |
4 | Here’s our impl. It’s unfortunate that Rust can’t derive this, but, it can’t. |
5 | Server state is still kept on the stack as a u32. |
6 | This is our incoming argument buffer. Since all incoming messages use the same argument type, u32, hl lets us use it directly instead of dealing in byte arrays. |
7 | recv_without_notification wraps up the open receive pattern used by most servers. |
8 | We pass the argument buffer in using zerocopy::AsBytes. |
9 | This closure handles messages. The op parameter is automatically converted to the Op enum by hl. |
10 | The fixed operation requires that the argument exactly match the size of its first type (here, u32), wrapping up the common case where arguments are fixed-size. |
11 | If we can’t parse the message as a u32 we bail with BadArg. hl is designed so we can use ? to signal errors here. |
12 | And now, we match on the operation code. We no longer need a default case, as hl has already filtered out unknown codes. |
13 | The caller type returned from fixed has a convenient reply operation that also checks that the types match. |
14 | And, we’re done. |
9.1.6. API wrapper crates
It’s polite to provide a wrapper crate that turns your server’s IPC API into a Rust API. We write these by hand at the moment, since we don’t have any sort of IDL. The general pattern is:
-
Create a crate ending in -api, e.g. for the fnord service it would be fnord-api by convention.
-
Implement a "server handle" type that wraps your server’s TaskId and represents the server.
-
Provide operations on that type that correspond to IPCs, or combinations of IPCs.
The wrapper crate should not depend on the server implementation crate. This may require moving types around.
One of the decisions wrapper crates must make is how to handle server death — that is, what if the server crashes while the client is talking to it, or between messages? There are three common ways to respond.
-
Crash. If the client and server are engaged in some sort of stateful protocol, the client may not be able to recover from a server restart, and want to restart itself in response. This effectively propagates the crash out through a tree of dependent tasks, putting them all back in a known-good state.
-
Retry. If the request to the server is idempotent, the client may just want to update their TaskId to the server’s new generation and re-send. (That’s what the demo below does.)
-
Return an error. This lets the caller decide whether to retry. In practice, a lot of callers will unwrap this error, which is a sign that the wrapper crate should have chosen approach #1.
Here is a wrapper crate for the server presented earlier in this chapter, expressed entirely using low-level Hubris API, under the assumption that we just want to retry on server restart:
#![no_std]

use abi::TaskId;
use core::cell::Cell;
use userlib::sys_send;
use zerocopy::AsBytes;

enum Op { (1)
    Add = 0,
    Sub = 1,
}

pub struct IntServer(Cell<TaskId>); (2)

impl IntServer {
    pub fn new(tid: TaskId) -> Self {
        Self(Cell::new(tid))
    }

    /// Adds `value` to the server's integer, returning the new
    /// integer.
    pub fn add(&self, value: u32) -> u32 {
        self.send(Op::Add, value)
    }

    /// Subtracts `value` from the server's integer, returning the
    /// new integer.
    pub fn sub(&self, value: u32) -> u32 {
        self.send(Op::Sub, value)
    }

    // Common implementation bit of add and sub, which
    // differ only in Op
    fn send(&self, op: Op, value: u32) -> u32 {
        // Convert the operation to its numeric code once, up front,
        // so it can be reused on every retry.
        let op = op as u16;
        let mut response = 0u32;
        loop { (3)
            let target = self.0.get();
            let (code, response_len) = (4)
                sys_send(target, op, value.as_bytes(), response.as_bytes_mut());
            if code == 0 && response_len == 4 {
                return response; (5)
            } else if let Some(g) = abi::extract_new_generation(code) {
                // The int server has crashed, let's just retry. (6)
                self.0.set( (7)
                    TaskId::for_index_and_gen(target.index(), g)
                );
            } else {
                panic!(); (8)
            }
        }
    }
}
(1) This duplicates the Op enum from the server, and could be shared with some rearranging.
(2) Clients will manipulate an IntServer as a sort of "handle" to the server, hiding a TaskId that they need not concern themselves with.
(3) The send implementation is in a loop so that it can retry until it succeeds.
(4) Here we send a message to what we believe is the right TaskId, though we may find out otherwise shortly…
(5) A 0 return code means success: the easy path.
(6) abi::extract_new_generation is a function for analyzing "dead codes" received over IPC. If a result value indicates peer death, it will return Some(gen) where gen is the peer’s new generation number after restart.
(7) Here, we update our internal state to keep track of the correct server generation.
(8) It may surprise you to see panic! here. More on this below.
Now, notice that the server can generate error codes, such as BadArg if the buffers are the wrong size, but the client doesn’t have any representation for them. This is deliberate. In the case of the integer server protocol, all potential errors returned from IPCs represent programming errors in the client:
-
Use of an undefined operation code like 3 or 119,
-
Sending a too-small or too-big message, or
-
Providing the wrong size of response buffer.
In the first two cases the server will return a non-zero response code; in the last case, it will succeed, but the response_len will show that our response was truncated. Either case represents a mismatch between the wrapper crate and the server, and the normal thing to do in such situations on Hubris is to panic!.
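The retry-on-restart decision in the wrapper above can be exercised off-target with a small model. The sketch below is not the Hubris API: MockServer, add_with_retry and the local extract_new_generation are hypothetical stand-ins, and the dead-code encoding (top 24 bits set, new generation in the low 8 bits) is an assumption that should be checked against the abi crate.

```rust
/// Assumed dead-code encoding: all dead codes sit at the top of the u32
/// range and carry the peer's new generation in their low 8 bits.
pub const FIRST_DEAD_CODE: u32 = 0xFFFF_FF00;

/// Hypothetical stand-in for `abi::extract_new_generation`.
pub fn extract_new_generation(code: u32) -> Option<u8> {
    if code & FIRST_DEAD_CODE == FIRST_DEAD_CODE {
        Some(code as u8)
    } else {
        None
    }
}

/// Hypothetical model of the integer server: a generation and a value.
pub struct MockServer {
    pub generation: u8,
    pub value: u32,
}

impl MockServer {
    /// Models a SEND: returns (response code, response value).
    /// A caller using a stale generation gets a dead code instead.
    pub fn handle(&mut self, caller_view: u8, delta: u32) -> (u32, u32) {
        if caller_view != self.generation {
            return (FIRST_DEAD_CODE | u32::from(self.generation), 0);
        }
        self.value = self.value.wrapping_add(delta);
        (0, self.value)
    }
}

/// Client-side loop with the same three outcomes as the wrapper crate:
/// success, peer restart (update and retry), or protocol mismatch (panic).
pub fn add_with_retry(
    server: &mut MockServer,
    known_generation: &mut u8,
    delta: u32,
) -> u32 {
    loop {
        let (code, response) = server.handle(*known_generation, delta);
        if code == 0 {
            return response;
        } else if let Some(g) = extract_new_generation(code) {
            // The server restarted; remember its new generation and retry.
            *known_generation = g;
        } else {
            // Any other code means client and server disagree on the protocol.
            panic!("protocol mismatch: code {}", code);
        }
    }
}
```

In this model a restart is simulated by bumping MockServer::generation and resetting its value; the next call observes one dead code, updates the stored generation, and succeeds on the second attempt.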
9.1.7. Pipelining
The server loop described above handles a single request at a time. Things become more complex if the server wants to be able to handle multiple requests concurrently. In that case, the reply step is delayed until the work actually completes, so the server may RECV another message before replying to the first.
For each incoming request, the server needs to record at least the caller’s Task ID, so that it can respond. In practice, the server will also need to record some details about each request, and some information about the state of processing. While it’s nice to pretend that we can resize buffers forever, that’s simply not the environment we work in. Eventually, the server’s internal storage for this sort of thing will fill up. At this point, the server should finish at least one outstanding request before doing another RECV.
Typically, a pipelined server will keep information about outstanding requests in a table. The maximum size of that table is dictated by the number of potential clients. If the server has specific knowledge of this number in the application, it can use that to size the table, or it can be conservative and set the size of the table to hubris_num_tasks::NUM_TASKS, the number of tasks in the system. Such a table should never overflow.
Remember that tasks can restart, so any table tracking per-task state should be indexed by task index and record the generation. If a new request arrives from the same task index but a different generation, the old request should be halted and replaced.
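A minimal sketch of such a table follows, assuming one slot per task index. The names PendingTable, Pending and Admit are hypothetical, and NUM_TASKS is a local constant standing in for hubris_num_tasks::NUM_TASKS.

```rust
/// Stand-in for `hubris_num_tasks::NUM_TASKS` (hypothetical value).
pub const NUM_TASKS: usize = 4;

/// What the server remembers about one in-flight request.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Pending {
    /// Generation of the caller at the time it sent the request.
    pub generation: u8,
    /// Illustrative progress marker; a real server stores what it needs.
    pub bytes_done: usize,
}

/// Result of trying to record a newly received request.
#[derive(Debug, PartialEq, Eq)]
pub enum Admit {
    /// The slot was free.
    New,
    /// The slot held a request from an older generation of the same
    /// task index; that request was dropped and replaced.
    ReplacedStale,
    /// Same index and generation already pending. A blocked caller
    /// cannot send again, so this indicates a bug.
    AlreadyPending,
}

/// One slot per task index, so the table can never overflow.
pub struct PendingTable {
    slots: [Option<Pending>; NUM_TASKS],
}

impl PendingTable {
    pub const fn new() -> Self {
        Self { slots: [None; NUM_TASKS] }
    }

    /// Records a request from the task with this index and generation.
    pub fn admit(&mut self, index: usize, generation: u8) -> Admit {
        let outcome = match self.slots[index] {
            None => Admit::New,
            Some(p) if p.generation != generation => Admit::ReplacedStale,
            Some(_) => return Admit::AlreadyPending,
        };
        self.slots[index] = Some(Pending { generation, bytes_done: 0 });
        outcome
    }

    /// Removes and returns the request for `index`, e.g. just before replying.
    pub fn complete(&mut self, index: usize) -> Option<Pending> {
        self.slots[index].take()
    }

    /// Number of requests currently in flight.
    pub fn outstanding(&self) -> usize {
        self.slots.iter().filter(|s| s.is_some()).count()
    }
}
```

A server sized below NUM_TASKS would compare outstanding() against its capacity and finish a request before issuing another RECV.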
9.2. Supervision
Rather than doing things like crash recovery in the kernel, Hubris assigns the responsibility to a designated task, called the supervisor. This section discusses the role of the supervisor and provides suggestions for writing your own.
The Hubris repo contains our reference supervisor implementation, El Jefe, in the task-jefe directory.
9.2.1. What is the supervisor?
The supervisor is a task like any other. It is compiled with the application, runs in the processor’s unprivileged mode, and is subject to memory protection.
Two things make the supervisor different from other tasks:
-
It runs at the highest task priority, 0, and is the only task at this priority.
-
The kernel recognizes it and interacts with it in unique ways.
The kernel can spot the supervisor because the supervisor always has task index 0, and is listed first in the app.toml. The kernel treats task index 0 differently:
-
When any other task crashes, the kernel posts a notification to the supervisor task. This notification is always sent to bit 0 (i.e. the value 1).
-
The supervisor task is allowed to send any kernel IPC message.
-
If the supervisor task crashes, the system reboots.
9.2.2. What does the supervisor do?
The design of Hubris assumes that the supervisor is responsible for taking action on task crashes. It may also do other things, but that is its basic job.
When any task crashes, the kernel will post a notification to the supervisor task (as chosen by the supervisor.notification key in the app.toml). Since notifications don’t carry data payloads, this tells the supervisor that something has crashed, but not what or why. The supervisor can use kernel IPC messages to figure out the rest.
Currently, the supervisor needs to scan the set of tasks using the read_task_status kernel IPC until it finds faults. (If the supervisor sometimes lets tasks stay in faulted states, then it will need to keep track of that and look for new faults here.) It can then record that fault information somewhere (maybe a log) and use the restart_task call to fix the problem.
Having to scan across the set of tasks is a little lame; if it proves to be an issue in practice we’ll introduce a more efficient way of pulling the last crash(es) from the kernel via IPC.
The basic supervisor main loop then reads as follows:
// Value chosen in app.toml.
const CRASH_NOTIFICATION: u32 = 1;

loop {
    // Closed receive will only accept notifications.
    let _msg = sys_recv_closed(
        &mut [],
        CRASH_NOTIFICATION,
        TaskId::KERNEL,
    );
    // This case is so simple that we don't need to inspect
    // the message to distinguish different sources. See
    // below for a more complex example.

    // Scan tasks. Skip ourselves at index 0.
    for i in 1..hubris_num_tasks::NUM_TASKS {
        match userlib::kipc::read_task_status(i) {
            abi::TaskState::Faulted { fault, .. } => {
                // Record any observed faults and restart.
                log(fault);
                userlib::kipc::restart_task(i, true);
            }
            // Tasks that have not faulted need no action.
            _ => {}
        }
    }
}
(This is almost verbatim from the reference implementation.)
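If the supervisor lets some tasks stay faulted, the scan has to tell new faults from ones it has already recorded. The sketch below models that bookkeeping on the host. It is an illustration under assumptions, not the reference implementation: Scanner, Disposition and the simplified TaskState are hypothetical, and the read and restart closures stand in for the read_task_status and restart_task kernel IPCs.

```rust
/// Stand-in for `hubris_num_tasks::NUM_TASKS` (hypothetical value).
pub const NUM_TASKS: usize = 4;

/// Simplified task state; the real type carries fault details.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum TaskState {
    Healthy,
    Faulted,
}

/// Per-task policy chosen by the application (hypothetical).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum Disposition {
    /// Restart the task as soon as a fault is seen.
    Restart,
    /// Leave the task faulted, e.g. for later inspection.
    Hold,
}

pub struct Scanner {
    pub disposition: [Disposition; NUM_TASKS],
    /// Faults that were already recorded and deliberately left in place.
    held: [bool; NUM_TASKS],
}

impl Scanner {
    pub fn new(disposition: [Disposition; NUM_TASKS]) -> Self {
        Self { disposition, held: [false; NUM_TASKS] }
    }

    /// Runs one scan over every task except the supervisor (index 0) and
    /// returns the number of newly observed faults.
    pub fn scan(
        &mut self,
        read: impl Fn(usize) -> TaskState,
        mut restart: impl FnMut(usize),
    ) -> usize {
        let mut new_faults = 0;
        for i in 1..NUM_TASKS {
            match read(i) {
                // A healthy task clears any stale "held" marker.
                TaskState::Healthy => self.held[i] = false,
                // Already recorded on an earlier scan; not a new fault.
                TaskState::Faulted if self.held[i] => {}
                TaskState::Faulted => {
                    new_faults += 1;
                    match self.disposition[i] {
                        Disposition::Restart => restart(i),
                        Disposition::Hold => self.held[i] = true,
                    }
                }
            }
        }
        new_faults
    }
}
```

With this structure, a held task is counted once when it first faults and ignored on later scans until it is seen healthy again.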
9.2.3. Talking to the supervisor
A supervisor may expose an IPC interface that can be used by other tasks to report information. (Because the supervisor is the highest priority task, any task can SEND to it, but it is not allowed to SEND anywhere but the kernel.)
Why would you want to do this? Some examples might include:
-
In a simple system, the supervisor might maintain the system’s event log in a circular RAM buffer, and provide an IPC for other tasks to append information to it.
-
You could implement interactive health monitoring (see next section).
-
You could proxy kernel IPCs that are normally only available to the supervisor, optionally implementing restrictions or filters.
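The first example in the list above, an event log kept in a circular RAM buffer, can be sketched with a fixed-size array and a write cursor. The types and the capacity here are hypothetical; a real supervisor would choose its own event format and decode incoming IPC messages into it.

```rust
/// Hypothetical capacity of the log.
pub const LOG_ENTRIES: usize = 4;

/// Hypothetical event record reported by a task.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Event {
    pub task_index: u16,
    pub code: u32,
}

/// Fixed-size circular log: once full, each append overwrites the
/// oldest entry, so memory use never grows.
pub struct EventLog {
    entries: [Option<Event>; LOG_ENTRIES],
    /// Slot that the next append will write.
    next: usize,
}

impl EventLog {
    pub const fn new() -> Self {
        Self { entries: [None; LOG_ENTRIES], next: 0 }
    }

    /// Appends an event, overwriting the oldest one when the log is full.
    pub fn append(&mut self, event: Event) {
        self.entries[self.next] = Some(event);
        self.next = (self.next + 1) % LOG_ENTRIES;
    }

    /// Iterates over stored events from oldest to newest.
    pub fn iter(&self) -> impl Iterator<Item = Event> + '_ {
        (0..LOG_ENTRIES)
            .filter_map(move |i| self.entries[(self.next + i) % LOG_ENTRIES])
    }
}
```

Because the buffer has a fixed size, the append operation always completes in constant time, which suits a task that must never block for long.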
If the supervisor wishes to expose an IPC interface, its main loop changes as follows:
// Value chosen in app.toml.
const CRASH_NOTIFICATION: u32 = 1;
// However large our biggest incoming message will be.
const MAX_MSG: usize = 16;

loop {
    let mut msgbuf = [0u8; MAX_MSG]; (1)
    let msg = sys_recv_open( (2)
        &mut msgbuf,
        CRASH_NOTIFICATION,
    );
    if msg.sender == TaskId::KERNEL { (3)
        // Scan tasks. Skip ourselves at index 0.
        for i in 1..hubris_num_tasks::NUM_TASKS {
            match userlib::kipc::read_task_status(i) {
                abi::TaskState::Faulted { fault, .. } => {
                    // Record any observed faults and restart.
                    log(fault);
                    userlib::kipc::restart_task(i, true);
                }
                // Tasks that have not faulted need no action.
                _ => {}
            }
        }
    } else {
        // This is a message from a task
        match msg.operation { (4)
            ...
        }
    }
}
(1) The loop now needs a buffer for depositing incoming messages.
(2) Instead of a closed receive, we use an open receive to accept both notifications and messages from any source.
(3) We need to distinguish notifications from messages by checking the origin.
(4) In the case of a message, we choose different actions based on the operation code.
9.3. Drivers
One of the purposes of an operating system is to provide a driver abstraction for talking to hardware. Most traditional monolithic kernels (e.g. Linux) have applications making system calls (read, write, ioctl) directly into the kernel where drivers live:
+-----------+ +-----------+ +-----------+
|application| |application| |application|
+-----------+ +-----------+ +-----------+
| | | |
| | | |
+-------------------------------------------------+
| | | |
+--v---+ +----v-+ +--v---+ +-v----+
|driver| |driver| |driver| |driver|
+------+ +------+ +------+ +------+
In Hubris, drivers are unprivileged and don’t live in the kernel. The primary communication method is send and recv between tasks. Hardware drivers usually exist as a 'server' which listens for messages and changes the hardware block accordingly. Multiple application tasks may call into a single server. (This is discussed in more detail in the chapter on servers, above.)
+-----------+ +-----------+ +-----------+
+------------+ app task | | app task | | app task +----------+
| +--+----+---+ +--+-+---+--+ +-+---------+ |
| | | | | | | |
| | | +------+ | +-+ +--+ |
| | | | | | | |
| v v v v v v |
| +------+ +------+ +------+ +------+ +----------+ |
| |server| |server| |server| |server| |supervisor| |
| +---+--+ +--+---+ +---+--+ +--+---+ +----+-----+ |
| .......|.......|.........|.......|..........|........ |
| +---v-------v---------v-------v----------v-----+ |
| | | |
+-------->+ kernel +<-------+
| |
+----------------------------------------------+
However, there’s some nuance to designing a good driver. This chapter aims to provide advice on this.
9.3.1. Driver crate vs server
Since tasks are relatively expensive in terms of resources (primarily RAM and Flash), it’s important to have the right number of tasks, rather than a separate task for everything (or just one task).
Drivers should not always be servers. Hubris is not religious about this, and it’s useful to have some flexibility here.
We’ve found the following distinction to be useful:
-
A driver crate provides a Rust interface for dealing with some device. It may directly access the hardware, or it may make IPCs to other required servers, or some combination.
-
A driver server wraps a driver crate and provides an IPC interface.
By convention, a driver crate for the encoder peripheral on the xyz SoC is called drv-xyz-encoder, while the crate wrapping it in a server is called drv-xyz-encoder-server.
If, in a given application, there’s only one user for a given driver — say, the board has a SPI controller with only one device wired to it — then it doesn’t necessarily make sense to have a task for the SPI controller. Instead, the task responsible for managing the device could link the SPI driver crate in directly.
There’s also the question of mutual exclusion. On an I2C bus, for instance, we can only talk to one device at any given time — and we may need to issue several transactions to a single device without risk of interruption. This means that a single device driver needs exclusive access to the I2C bus, for a combination of inherent hardware reasons (I2C is not pipelined) and software requirements.
If we allocated a separate server per I2C device, only one of those servers would be doing useful work at any given time — the rest would be waiting their turn.
In this case it might make more sense to assign the task to the bus and have it call into driver crates for each device as needed. This ensures that we’re only spending enough stack space for one device at a time, and helps the device drivers share common code. It also puts the drivers for the devices and the bus controller in the same fault domain, so that a crash in one affects the other — in I2C, a heavily stateful protocol with poor error recovery, this is almost certainly what you want, since a crash in a device during a transaction will likely require global recovery actions on the bus controller.
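The shape of that arrangement, one task owning the bus and calling into per-device driver crates, can be sketched as follows. Everything here is hypothetical: the I2cBus trait, the two device drivers, their addresses and the Request enum are invented for illustration, and FakeBus exists only so the sketch can run off-target.

```rust
/// Error type shared by the bus and the device drivers (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub struct BusError;

/// Hypothetical bus abstraction that the driver crates are written against.
pub trait I2cBus {
    /// Writes `out` to the device at `addr`, then reads into `input`.
    fn write_read(&mut self, addr: u8, out: &[u8], input: &mut [u8])
        -> Result<(), BusError>;
}

/// "Driver crate" for a made-up temperature sensor at address 0x48.
pub fn temp_sensor_read_raw(bus: &mut impl I2cBus) -> Result<u16, BusError> {
    let mut buf = [0u8; 2];
    bus.write_read(0x48, &[0x00], &mut buf)?;
    Ok(u16::from_be_bytes(buf))
}

/// "Driver crate" for a made-up EEPROM at address 0x50.
pub fn eeprom_read_byte(bus: &mut impl I2cBus, offset: u8) -> Result<u8, BusError> {
    let mut buf = [0u8; 1];
    bus.write_read(0x50, &[offset], &mut buf)?;
    Ok(buf[0])
}

/// Operations the bus task would accept over IPC.
pub enum Request {
    ReadTemp,
    ReadEeprom(u8),
}

/// Body of the bus task's message handler. Because one task owns the bus,
/// each device transaction runs to completion before the next begins.
pub fn handle(bus: &mut impl I2cBus, req: Request) -> Result<u16, BusError> {
    match req {
        Request::ReadTemp => temp_sensor_read_raw(bus),
        Request::ReadEeprom(offset) => eeprom_read_byte(bus, offset).map(u16::from),
    }
}

/// Fake bus for host-side experiments; counts transactions.
pub struct FakeBus {
    pub transactions: usize,
}

impl I2cBus for FakeBus {
    fn write_read(&mut self, addr: u8, out: &[u8], input: &mut [u8])
        -> Result<(), BusError> {
        self.transactions += 1;
        match addr {
            // The fake sensor always reports the bytes 0x12 0x34.
            0x48 => { input.copy_from_slice(&[0x12, 0x34]); Ok(()) }
            // The fake EEPROM returns offset + 1 for every byte.
            0x50 => { input[0] = out[0].wrapping_add(1); Ok(()) }
            _ => Err(BusError),
        }
    }
}
```

The driver functions take the bus by mutable reference, so mutual exclusion falls out of ordinary borrowing inside the single task rather than from any cross-task protocol.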
9.3.2. High Level Server
A typical driver server has to multiplex hardware events and client requests, which requires both configuration in the app.toml and code in the server itself. Here is an example server written against the userlib::hl library. (For more details on userlib::hl and server implementations in general, see the chapter on servers.)
Some details are omitted — this is pseudocode.
// Notification mask for interrupts from app.toml
const INTERRUPT: u32 = 1;

fn main() {
    turn_on_hardware_clocks();
    let B = get_hardware_block();
    B.clk.write(|w| w.clock.very_fast());
    B.cfg.modify(|_, w| w.foo.disable().enabled.set());

    // Type used to record the state of an ongoing operation.
    // This is handwavey but is similar to most block transfer
    // implementations, which track a position and length.
    struct MyData {
        caller: hl::Caller<()>,
        pos: usize,
        len: usize,
    }

    // State of an ongoing operation; None indicates no
    // operation
    let mut data: Option<MyData> = None;

    loop {
        // This receives with notification, the alternate
        // version is hl::recv_without_notification
        hl::recv(
            // all our messages are zero length.
            &mut [],
            // notification mask
            INTERRUPT,
            // state shared by notification and message handlers
            &mut data,
            // Notification handler
            |dataref, bits| {
                if bits & INTERRUPT != 0 {
                    // Matches our notification for an
                    // interrupt, do something.
                    // (`buffer` stands for the next chunk of
                    // data to send; fetching it is omitted.)
                    B.fifowr.write(|w| w.out.bits(buffer));
                    if let Some(state) = dataref {
                        if B.sr.read().is_done() {
                            // Resume the caller we were servicing.
                            state.caller.reply(());
                            // Clear the state machine to accept
                            // more messages.
                            *dataref = None;
                        }
                    }
                }
            },
            // Message handler
            |dataref, op, msg| match op {
                Op::Write => {
                    // We expect a caller with one lease
                    let ((), caller) = msg
                        .fixed_with_leases(1)
                        .ok_or(ResponseCode::BadArg)?;
                    // Deny incoming writes if we're
                    // already running one.
                    if dataref.is_some() {
                        return Err(ResponseCode::Busy);
                    }
                    // Our lease #0 is what is being sent to
                    // the hardware
                    let borrow = caller.borrow(0);
                    let info = borrow.info()
                        .ok_or(ResponseCode::BadArg)?;
                    // Provide feedback to callers if they
                    // fail to provide a readable lease
                    // (otherwise we'd fail accessing the
                    // borrow later, which is a defection
                    // case and we won't reply at all).
                    if !info.attributes.contains(LeaseAttributes::READ) {
                        return Err(ResponseCode::BadArg);
                    }
                    // Set our state machine, including saving the
                    // caller.
                    *dataref = Some(MyData {
                        caller,
                        pos: 0,
                        len: info.len,
                    });
                    B.intstat.write(|w| w.interrupt_en.set());
                    Ok(())
                }
                Op::Read => {
                    // Looks almost identical to Write except
                    // We check the borrow against
                    // LeaseAttributes::WRITE
                }
            },
        );
    }
}
9.3.3. Driver API crates
A server called drv-xyz-encoder-server should, by convention, provide clients with a corresponding API wrapper crate called drv-xyz-encoder-api. This will normally use the userlib::hl module under the hood to generate IPC.
An example API might look like:
enum Op {
    Write,
    Read,
    Reset,
}

enum Peripheral {
    Alpha,
    Bravo,
    Charlie,
    Delta,
    Echo,
    Foxtrot
}

// This serves as a handle for the server.
pub struct Data(TaskId);

impl Data {
    pub fn write(&self, peripheral: Peripheral, entry: u32) {
        struct WriteData(Peripheral, u32);

        impl hl::Call for WriteData {
            const OP: u16 = Op::Write as u16;
            // We don't expect a meaningful response.
            type Response = ();
            // Error is just an int
            type Err = u32;
        }

        hl::send(self.0, &WriteData(peripheral, entry));
    }
}
9.4. The caboose
At times, users may wish for a Hubris archive to contain information with the following properties:
-
Decided after the image is built
-
Readable in a wide variety of situations:
  -
  A live system with well-known APIs to request it (e.g. over the network)
  -
  A Hubris build archive
  -
  A binary file, given the original build archive
  -
  A binary file, without the original build archive
Note that a live system with a debugger attached is morally equivalent to "a binary file", because we can read arbitrary memory.
The motivating example of this data is a component-level version: after building an image, a separate release engineering process wants to assign it a user-facing version (e.g. 1.2.3) and store that information somewhere in the image.
The "caboose" is a region of flash allocated for this purpose. It is declared with a [caboose] section in an app.toml, e.g.
[caboose]
region = "flash"
size = 128
tasks = ["caboose_reader"]
If this section is present in an app.toml, the build system reserves an appropriately-aligned section of memory for the caboose. The caboose is located at the end of flash (after the final task), and is aligned so that it can be mapped as an MPU region. Only tasks declared in caboose.tasks are allowed to read data from the caboose region of flash. If other tasks attempt to read from this memory region, they will experience the typical memory fault.
Hubris does not have strong opinions about the contents of the caboose, other than reserving two words. At build time, the caboose is loaded with the following data (shown here as u32 words):
Start | abi::CABOOSE_MAGIC
… | (filled with 0xFFFFFFFF)
End | Caboose size (little-endian u32)
The caboose’s length is included in the total_image_len field of abi::ImageHeader. Because the caboose is located at the end of flash, its presence and size can be determined as follows:
-
Read total image length from the ImageHeader
  -
  At runtime, this is a variable that can be read by the kernel
  -
  At rest, the image header is at a known location (depending on microcontroller) and includes a distinctive magic number (abi::HEADER_MAGIC)
-
Read the final word of the image, which may be the caboose length
-
Subtract this value from total image length to get the (presumptive) caboose start
  -
  If this subtraction underflows or exceeds the bounds of flash, the caboose is not present.
-
Read the first word of the (presumptive) caboose
  -
  If this word is not abi::CABOOSE_MAGIC, then the caboose is not present
  -
  Otherwise, the caboose position and length is now known
Note that this procedure works both at runtime and from a binary file, with or without an associated Hubris archive.
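The procedure above can be sketched against a flat byte slice holding the image. This is a minimal illustration, not the hubtools implementation: find_caboose and read_word are hypothetical helpers, the image is assumed to start at offset 0 of the slice, and the magic value is an assumption that should be checked against abi::CABOOSE_MAGIC.

```rust
use core::ops::Range;

/// Assumed value of `abi::CABOOSE_MAGIC`; verify against the `abi` crate.
pub const CABOOSE_MAGIC: u32 = 0xcab0_005e;

/// Reads a little-endian u32 at `offset`, or None if it is out of bounds.
fn read_word(image: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let b = image.get(offset..end)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Returns the byte range of the caboose within `image`, if one is present.
/// `total_image_len` is the value read from the image header.
pub fn find_caboose(image: &[u8], total_image_len: usize) -> Option<Range<usize>> {
    let end = total_image_len;
    // The final word of the image may be the caboose length.
    let size = read_word(image, end.checked_sub(4)?)? as usize;
    // A caboose must at least hold its two reserved words.
    if size < 8 {
        return None;
    }
    // Subtract to get the presumptive start; underflow means no caboose.
    let start = end.checked_sub(size)?;
    // The first word must be the magic number.
    if read_word(image, start)? != CABOOSE_MAGIC {
        return None;
    }
    Some(start..end)
}
```

Erased flash reads back as a very large length, so the subtraction underflows and the function reports that no caboose is present.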
To reduce runtime overhead, the caboose position may also be baked into an individual task image at build time. This is implemented in the drv-caboose-pos crate:
let caboose: Option<&'static [u8]> = drv_caboose_pos::CABOOSE_POS.as_slice();
(This functionality requires cooperation with the xtask build system, as we can’t know the caboose position until all tasks have been built.)
Other than reserving the two words above, the Hubris build system is deliberately agnostic to the contents of the caboose; it is left to a separate release engineering process to decide what to store there. The hubtools repository includes a library and CLI for modifying the caboose of a Hubris archive.
However, for convenience, it’s possible to enable a default caboose:
[caboose]
region = "flash"
size = 128
tasks = ["caboose_reader"]
default = true
If the default parameter is true, then Hubris will itself use hubtools to populate the caboose with default values.
References
-
[shap03vuln] Jonathan Shapiro. Vulnerabilities in Synchronous IPC Designs. 2003. Short-ish and straightforward, Shap pokes a bunch of holes in conventional IPC designs.
-
[herder08ipc] Jorrit N. Herder et al. Countering IPC Threats In Multiserver Operating Systems: A Fundamental Requirement for Dependability. 2008. This paper marked MINIX 3’s transition from a teaching tool to a high-reliability research platform.