Make the compiler a better friend (aka parse, don't validate)

TL;DR: We engineers have a very limited context window, and robots come with a lot of state to keep track of, both in software and in our heads. Let's make our lives easier by making invalid state unrepresentable! The compiler or type checker then becomes your best friend: it catches these errors at compile time and frees up essential space in your head.

Parse, don't validate

I first encountered the concept of "parse, don't validate" in Alexis King's article. Even though its examples are in Haskell, a language I'm not familiar with, they immediately resonated with me. Gio's recent article on the same pattern in Rust reminded me once again why it is so powerful. The idea is simple: incorporate as much information as possible into the types, so that the compiler or type checker can verify it at compile time or type check time, respectively.

Let's dive into it with real-world robotics examples in Python and C++ from the Gravis Robotics codebase!

Python Example: Leveraging wrapper types to avoid wrong robot identifier

Note: Python supports type annotations out of the box, but it needs third-party tooling such as pyrefly, basedpyright or ty to verify them. We recommend using these tools the way you would use a compiler, verifying types at type check time (e.g. in the editor, in CI, or both).

At Gravis, each robot has a name, e.g. bob-the-builder. In addition, each machine has a globally unique identifier: robot/bob-the-builder (we use the Google API guideline as a reference). For historical reasons, we often use the robot name without the robot/ prefix in on-robot software. As you can imagine, we eventually had bugs in our software where the robot name was used instead of the identifier and vice versa. Such bugs are not hard to track down, but they are annoying. The newtype pattern in combination with a type checker eliminates this class of problems.

Error-prone code would look like this:

class TaskDefinition:
    def __init__(self, robot: str, ...):
        # The globally unique identifier of the robot
        self._robot: str = robot
        ...

class TaskExecutor:
    def execute(self, task_definition: TaskDefinition):
        ...

def start_task(task_executor: TaskExecutor):
    robot_name: str = "robo42"  # Note: no "robot/" prefix
    task_definition = TaskDefinition(robot=robot_name, ...)
    task_executor.execute(task_definition)

Note how the semantic information lives only in the comment in TaskDefinition. We hope that fellow engineers read the comments and adhere to them. In the best case, the task_executor validates the task_definition and fails quickly. In the worst case, it fails only sometime later, when a piece of code assumes that task_definition._robot is prefixed with robot/. This increases our cognitive load and takes energy away from the actual task we are working on.

Let's fix this by incorporating that information into the type system such that the type checker can help us.

By introducing a RobotIdentifier type we can distinguish the robot name from the identifier and design an API that makes it impossible to mix up the two things:

from typing import NewType

RobotName = NewType('RobotName', str)
RobotIdentifier = NewType('RobotIdentifier', str)

def robot_identifier_from_name(name: RobotName) -> RobotIdentifier:
    return RobotIdentifier(f"robot/{name}")
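A short sketch of how these helpers behave in practice. The name bob-the-builder is taken from the example above. The key point, a documented property of typing.NewType, is that the wrappers exist only for the type checker: at runtime both values are plain str objects.

```python
from typing import NewType

RobotName = NewType("RobotName", str)
RobotIdentifier = NewType("RobotIdentifier", str)


def robot_identifier_from_name(name: RobotName) -> RobotIdentifier:
    return RobotIdentifier(f"robot/{name}")


name = RobotName("bob-the-builder")
identifier = robot_identifier_from_name(name)

# At runtime NewType adds no wrapper object: both values are ordinary strings.
assert isinstance(identifier, str)
assert identifier == "robot/bob-the-builder"

# A type checker rejects the call below, because a plain str is not a
# RobotName. At runtime it would still execute, so the check only helps
# if the type checker actually runs (editor, CI, or both).
# robot_identifier_from_name("bob-the-builder")
```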

Now we change the TaskDefinition to accept a RobotIdentifier instead of a str:

class TaskDefinition:
    def __init__(self, robot: RobotIdentifier, ...):
        # The globally unique identifier of the robot
        self._robot: RobotIdentifier = robot
        ...

With this small change, we have made the code safe against accidental mix-ups of robot names and global robot identifiers. If a plain str is still passed, as in start_task above, the type checker will report:

Argument `str` is not assignable to parameter  `robot` with type `RobotIdentifier` in function `TaskDefinition.__init__`

Not only is the API now explicit about expecting a global robot identifier, we also provide that semantic information to the type checker, which can warn us if we use the wrong type when creating the TaskDefinition. Just by introducing these new types, we eliminated a whole category of errors!
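The types only help if values enter the system through a single checked door. Below is a minimal sketch of such a parsing function, in the spirit of "parse, don't validate". The name parse_robot_identifier and the exact rules (a robot/ prefix followed by a non-empty name without further slashes) are assumptions for illustration, not the actual Gravis implementation.

```python
from typing import NewType

RobotIdentifier = NewType("RobotIdentifier", str)

_PREFIX = "robot/"


def parse_robot_identifier(raw: str) -> RobotIdentifier:
    """Turn an untrusted string into a RobotIdentifier or raise ValueError.

    Hypothetical helper: the accepted format is an assumption.
    """
    if not raw.startswith(_PREFIX):
        raise ValueError(f"expected prefix {_PREFIX!r}, got {raw!r}")
    name = raw[len(_PREFIX):]
    if not name or "/" in name:
        raise ValueError(f"invalid robot name in {raw!r}")
    # The check happens once, here; downstream code relies on the type.
    return RobotIdentifier(raw)
```

Code that receives a RobotIdentifier then no longer needs to re-check the prefix. This holds only as long as nobody calls RobotIdentifier(...) directly on unchecked input, which NewType itself cannot prevent.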

C++ Example: Migrating hardware version numbers

We recently changed our hardware versioning schema and needed to propagate this change to our code. The schema changed from 123 (a plain number, but often used with a prefix, e.g. hw-123) to hw-9876-123456. The old serial number was accessed via a free-standing function:

#include <cstdint>

/**
 * Returns the serial number of the hardware.
 */
[[nodiscard]] uint32_t getSerialNumber();

As you can see, the old format was represented as a plain uint32_t. Remember that we have hardware of different versions in the field at the same time, so we expect hardware with the old and the new format to coexist for quite a while. The quick change would have been to switch the return type of getSerialNumber() to std::string and use the function as is, i.e. getSerialNumber() would return a string that looks like either 123 or hw-9876-123456. The type system would already help us find all usages, since existing callers expect a uint32_t rather than a string. But we can do better by telling the compiler about both kinds of serial numbers, old and new:

#include <cstdint>
#include <string>
#include <variant>

using SerialNumber = std::variant<uint32_t, std::string>;

[[nodiscard]] SerialNumber getSerialNumber();

With this, we force our future selves to think about both cases whenever we use getSerialNumber(). This prevents a software engineer who is not familiar with the new serial numbers from simply prepending hw- to whatever getSerialNumber() returns, which is exactly the kind of bug we would otherwise only detect at runtime. By forcing the caller to handle both alternatives (for example with std::visit, which does not compile unless the visitor accepts every alternative), we move these mistakes to compile time!
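The same idea can be sketched on the Python side, where a type checker plays the role of the compiler. This is an illustration only: LegacySerialNumber, NewSerialNumber and format_serial_number are hypothetical names, and rendering the old format as hw-123 follows the prefix convention mentioned above. The assert_never helper makes the type checker complain if a third variant is added to the union without being handled.

```python
from dataclasses import dataclass
from typing import NoReturn, Union


@dataclass(frozen=True)
class LegacySerialNumber:
    value: int  # old schema, e.g. 123


@dataclass(frozen=True)
class NewSerialNumber:
    value: str  # new schema, e.g. "hw-9876-123456"


SerialNumber = Union[LegacySerialNumber, NewSerialNumber]


def assert_never(value: NoReturn) -> NoReturn:
    # Only reachable if a variant was not handled by the caller.
    raise AssertionError(f"unhandled serial number variant: {value!r}")


def format_serial_number(serial: SerialNumber) -> str:
    if isinstance(serial, LegacySerialNumber):
        # Only the legacy format gets the prefix prepended.
        return f"hw-{serial.value}"
    if isinstance(serial, NewSerialNumber):
        # The new format already carries its prefix.
        return serial.value
    assert_never(serial)
```

Wrapping each format in its own class, rather than using a bare int and str, keeps the two cases distinct even if both formats were later stored as strings.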

Note: It might be tempting to write using LegacySerialNumber = uint32_t to create something similar to Python's NewType. This does not work, because using only creates a type alias: to the compiler, LegacySerialNumber and uint32_t remain the same type. There are other options, however, for example Boost's BOOST_STRONG_TYPEDEF macro, or defining custom structs or classes as described in this fluentcpp article. There are also first proofs of concept for doing this with reflection in C++26.

Summary

If illegal state is unrepresentable, the compiler will help you avoid it. In compiled languages this shows up as compile-time errors; in dynamic languages a type checker can do the same job. To get there, you need to give the compiler or type checker the information it needs to help you. So don't be mean and withhold information from the compiler.

Make the compiler a better friend by giving it what it needs to help you!