Writing a small async runtime for Cortex-M micro-controllers with Rust

How to build a lightweight async runtime for embedded systems.

Last updated on Oct 9, 2024 11 min read

Introduction

I’ve been working with Rust on microcontrollers for a while now, and I’ve been using the excellent embassy crate for async programming. However, I recently started using a processor for which embassy does not have Hardware Abstraction Layer (HAL) support, and I wanted to experiment with building my own async runtime. This was also a great opportunity to learn more about the processor and understand the implementation details of async/await in Rust.

In this article, we’ll create an async runtime for ARM Cortex-M microcontrollers. We’ll build on Philipp Oppermann’s work for x86 bare-metal systems here and adapt the approach for Cortex-M, which presents unique challenges like limited memory and often lacking a Memory Management Unit (MMU). These constraints require different design choices for async primitives, executors, and interrupt handling.

Note that this is an experimental project focused on learning and exploration. It is not production-ready, but it serves as a good starting point for building your own async runtime for embedded systems. If you’re looking for mature libraries for running real-time code on embedded systems, consider using embassy or rtic.

We will cover:

Building a lightweight executor for embedded systems
Implementing tasks in an async context
Utilizing power-saving features to sleep the microcontroller when no tasks are active

By the end of this article, you’ll understand the basics of creating a custom async runtime and how it can be adapted to a resource-constrained environment like Cortex-M microcontrollers.

Why Build Your Own Async Runtime?

Building an async runtime from scratch is not just a fun exercise but a useful way to understand how embedded systems handle concurrent tasks. This is particularly important for real-time systems that need to manage limited resources, such as processing power and memory. You’ll also be forced to become familiar with the inner workings the underlying hardware peripherals on the microcontroller and how they interact with the runtime. In a world where abstractions are stacked liberally and the typical developer is separated from the hardware by multiple layers of abstraction, this exercise can be a refreshing change of pace.

With existing libraries like Embassy, you can quickly get started with async programming on supported hardware. However, there might be scenarios where support for specific hardware is not available, or you need more control over how tasks are managed. This article aims to bridge that gap by helping you understand what it takes to write a custom runtime for such cases.

What is Async Programming?

In Rust, async/await allows you to write asynchronous code that looks very similar to synchronous code. It allows you to write code that can pause and resume execution without blocking the entire system, which is particularly useful for handling tasks like I/O operations, waiting for timers, or running multiple concurrent tasks.

The async model in Rust uses cooperative multitasking, where tasks yield control back to the executor when they are waiting for something to happen. This is different from preemptive multitasking, where an operating system controls task switching. Cooperative multitasking has the advantage of being predictable and having lower overhead, but it requires each task to explicitly yield control and be well-behaved.

For embedded systems, cooperative multitasking often works well because it keeps the system simple and avoids the overhead of complex task switching, which is critical when working with constrained environments like microcontrollers.

How Async Works in Rust

In Rust, async functions implement the Future trait. Futures can be polled using the poll() method. If a future is ready to produce a value, it returns Poll::Ready(T); otherwise, it returns Poll::Pending. The executor is responsible for polling these futures and managing the lifecycle of tasks.

The executor works by polling each task until it completes. It acts as the “runtime” that runs the tasks concurrently and keeps them progressing. For more details on how this works, I recommend checking out Phil Oppermann’s blog.

Async Execution Workflow

Below is a visual representation of how an async executor manages tasks and keeps polling them until completion.

graph TD
  subgraph Executor
    A[Task Queue] --> B[Task Polling]
  end

  subgraph Task
    D[Future] --> E[Poll Function]
  end

  A --> F[Task Scheduler]
  F --> E
  E --> G{Poll Result}
  G -->|Pending| F
  G -->|Ready| H[Complete Task]

  M[Sleep if Idle] --> F

This diagram shows how the executor maintains a queue of tasks, polls them to determine their state, and schedules them accordingly. If there are no tasks to execute, the executor can put the processor to sleep to save power.

Defining a Task

Each task in our runtime is represented as a Future pinned in memory, ensuring it can be polled without being moved. This is crucial because futures created by async/await may be self-referential, and moving them in memory would invalidate references to their internal fields.

Embedded environments typically don’t have access to standard memory allocators, so we need to initialize a custom heap allocator. We’ll use the embedded_alloc crate, which is designed for embedded systems. Below is the implementation of our task and how it uses atomic counters for task IDs.

// Import the 'alloc' crate, which provides memory allocation utilities. 
// Necessary for dynamic memory allocation in a no_std environment.
extern crate alloc;

// Importing 'Box' from the 'alloc' crate. Box is a smart pointer for heap-allocated memory.
use alloc::boxed::Box;
// Importing 'addr_of_mut' from core::ptr to get the mutable pointer to the heap memory.
use core::ptr::addr_of_mut;
// Importing 'AtomicU32' and 'Ordering' to manage atomic operations in a multi-threaded or interrupt context.
use core::sync::atomic::{AtomicU32, Ordering};
// Importing essential types for async and future handling from core: Future, Pin, Context, and Poll.
use core::{
    future::Future,
    pin::Pin,
    task::{Context, Poll},
};
// Importing LlffHeap as Heap from 'embedded_alloc' crate, which provides a memory allocator suitable for embedded systems.
use embedded_alloc::LlffHeap as Heap;

// Define a global allocator for heap memory. We use a static instance of Heap here.
#[global_allocator]
static HEAP: Heap = Heap::empty();  // The heap is initially empty and needs to be initialized.

/// Initialize the heap.
pub fn init_heap() {
    use core::mem::MaybeUninit;  // Use 'MaybeUninit' to represent uninitialized memory safely.
    const HEAP_SIZE: usize = 1024;  // Define the size of the heap in bytes.
    
    // Allocate an uninitialized array of bytes (of type MaybeUninit<u8>), which will serve as the heap.
    static mut HEAP_MEM: [MaybeUninit<u8>; HEAP_SIZE] = [MaybeUninit::uninit(); HEAP_SIZE];
    
    // Initialize the heap using the starting address and size of the memory array. 'unsafe' is required 
    // because we are using raw pointers and modifying static mutable data.
    unsafe { HEAP.init(addr_of_mut!(HEAP_MEM) as usize, HEAP_SIZE) }
}

/// Task ID type. We use a 32-bit unsigned integer to represent a task ID.
/// Cortex-M architecture is 32-bit, and it doesn't support atomic 64-bit integers.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct TaskId(u32);  // TaskId is a wrapper around a 32-bit unsigned integer.

impl TaskId {
    /// Generate a new unique task ID. 
    fn new() -> Self {
        static NEXT_ID: AtomicU32 = AtomicU32::new(0);  // Static AtomicU32 that stores the next available task ID.
        TaskId(NEXT_ID.fetch_add(1, Ordering::Relaxed))  // Increment and return the current ID.
    }
}

/// Base struct to represent a task.
pub struct Task {
    /// A unique identifier for the task.
    id: TaskId,  // Each task has a unique TaskId.
    /// The future representing the task. The Pin<Box<>> wrapper ensures that the
    /// future is not moved in memory. The Output type of the future is (), indicating
    /// that the future does not return a value - it just runs to completion.
    future: Pin<Box<dyn Future<Output = ()>>>,  // The 'Pin' ensures that the future is pinned in memory and cannot be moved.
}

impl Task {
    /// Create a new task from a future.
    pub fn new(future: impl Future<Output = ()> + 'static) -> Task {
        Task {
            id: TaskId::new(),  // Assign a new unique TaskId to the task.
            future: Box::pin(future),  // Pin the future to ensure it is not moved in memory and store it in a Box.
        }
    }

    /// Poll the task. This checks whether the future is ready to make progress or has completed.
    /// 'poll' uses a mutable reference to the task's future and a context to manage async execution.
    fn poll(&mut self, context: &mut Context) -> Poll<()> {
        self.future.as_mut().poll(context)  // Poll the pinned future and pass the provided Context.
    }
}

// A module declaration for 'executor', the runtime that manages task execution.
pub mod executor;

In this implementation, TaskId is used to uniquely identify each task, and we use a 32-bit atomic counter to generate these IDs. This approach avoids the overhead of locking and works well in a 32-bit environment like Cortex-M.

Implementing the Executor

The executor is responsible for managing tasks and running them to completion. It uses a task queue to determine which tasks are ready to run, and a custom TaskWaker to wake up tasks when they are ready to make progress.

Below is an implementation of the executor that runs tasks in a loop, polling each until they are complete.

extern crate alloc;

use super::*;
use alloc::collections::{BTreeMap, VecDeque};
use alloc::sync::Arc;
use alloc::task::Wake;
use core::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};
use cortex_m::asm;
use crossbeam_queue::ArrayQueue;

// The custom waker struct for waking tasks.
struct TaskWaker {
    task_id: TaskId, // Unique identifier for the task.
    task_queue: Arc<ArrayQueue<TaskId>>, // Shared queue used to store task IDs ready to be executed.
}

impl TaskWaker {
    // Create a new waker instance for a given task.
    fn new(task_id: TaskId, task_queue: Arc<ArrayQueue<TaskId>>) -> Waker {
        Waker::from(Arc::new(TaskWaker {
            task_id,
            task_queue,
        }))
    }
    
    // Add the task to the task queue, making it ready to be polled again.
    fn wake_task(&self) {
        self.task_queue
            .push(self.task_id)
            .expect("Task queue is full!");
    }
}

impl Wake for TaskWaker {
    // Wakes the task by adding it back to the task queue.
    fn wake(self: Arc<Self>) {
        self.wake_task();
    }

    // Wake the task by reference, allowing it to be added to the task queue without consuming the waker.
    fn wake_by_ref(self: &Arc<Self>) {
        self.wake_task();
    }
}

/// Task executor that runs tasks to completion.
pub struct Executor {
    /// Map of tasks to be run, indexed by task ID.
    tasks: BTreeMap<TaskId, Task>,
    /// Queue holding task IDs that are ready to be executed.
    task_queue: Arc<ArrayQueue<TaskId>>,
    /// Cache for storing wakers for tasks that are currently running.
    waker_cache: BTreeMap<TaskId, Waker>,
}

impl Executor {
    /// Create a new executor with an empty task queue.
    pub fn new<const N: usize>() -> Executor {
        Executor {
            tasks: BTreeMap::new(),
            task_queue: Arc::new(ArrayQueue::new(N)), // Initialize task queue with a capacity of N.
            waker_cache: BTreeMap::new(),
        }
    }

    /// Add a new task to the executor.
    pub fn spawn(&mut self, task: Task) {
        let task_id = task.id;
        if self.tasks.insert(task_id, task).is_some() {
            panic!("task with same ID already in tasks");
        }
        self.task_queue.push(task_id).expect("Task queue is full."); // Add the task ID to the task queue.
    }

    /// Run all tasks that are ready to execute.
    fn run_ready_tasks(&mut self) {
        // Continuously pop task IDs from the task queue and run them.
        while let Some(task_id) = self.task_queue.pop() {
            // Retrieve the task using its task ID.
            let task = match self.tasks.get_mut(&task_id) {
                Some(task) => task,
                None => continue, // If the task doesn't exist, skip it.
            };

            // Retrieve or create a waker for the task.
            let waker = self
                .waker_cache
                .entry(task_id)
                .or_insert_with(|| TaskWaker::new(task_id, self.task_queue.clone()));

            // Create a new context from the waker.
            let mut context = Context::from_waker(waker);

            // Poll the task to determine if it is ready to make progress.
            match task.poll(&mut context) {
                Poll::Ready(()) => {
                    // If the task is complete, remove it and its waker from the cache.
                    self.tasks.remove(&task_id);
                    self.waker_cache.remove(&task_id);
                }
                Poll::Pending => {} // If the task is not complete, leave it in the task list.
            }
        }
    }

    /// Put the processor to sleep if there are no tasks to run.
    fn sleep_if_idle(&self) {
        cortex_m::interrupt::free(|_| {
            // If the task queue is empty, put the processor to sleep.
            if self.task_queue.is_empty() {
                asm::wfi(); // Wait for interrupt to wake up the processor.
            }
        });
    }

    /// Run the executor to completion, continuously executing tasks until none are left.
    pub fn run(&mut self) {
        loop {
            self.run_ready_tasks();
            self.sleep_if_idle();
        }
    }
}

The Executor structure maintains a map of tasks, a task queue, and cached wakers for tasks that are currently running. The run method continuously runs tasks until none are left, and the sleep_if_idle method puts the processor to sleep when there are no tasks to execute. As mentioned here the wfi instruction can be used to wait for an event or interrupt.

Putting it all together

Below is an example of how to put everything together to run a simple async task on a Cortex-M microcontroller.

#![no_std]
#![no_main]

use cortex_m_asyncrt::os::{self, executor, init_heap, Task};
use cortex_m_rt::entry;
use cortex_m_semihosting::{dbg, hprintln};
// use panic_probe as _;
use panic_semihosting as _;

#[entry]
fn main() -> ! {
    init_heap();
    hprintln!("Hello, worlds!");
    // New executor that can run up to 32 tasks
    let mut executor = executor::Executor:: new::<64>();

    // Spawn a task
    executor.spawn(Task::new(example_task()));

    // Run the executor
    executor.run();

    // This code is unreachable because the executor.run() function runs tasks to completion.
    loop {}
}

async fn example_task() {
    // your code goes here
    let r = example_fn().await;

    hprintln!("r = {}", r);
}

async fn example_fn() -> u32 {
    42
}

If you run this code on QEMU you should see the following output:

Timer with period zero, disabling
Hello, worlds!
r = 42

If you want to try this out yourself, you can use the cortex-m-asyncrt crate on crates.io. Version 0.1.0 of the crate is exactly as described in this article.

Real-World Use Cases and Future Work

This async runtime could be applied to scenarios where you need a lightweight scheduler for real-time tasks, such as sensor data collection, motor control, or communication handling in resource-constrained embedded systems.

Future improvements could include adding task priority, enhancing the scheduler with time measurements for scheduling tasks at regular intervals and for asynchronously awaiting delays.

References

programming rust