A quick tour of Rust’s Type System Part 1: Sum Types (a.k.a. Tagged Unions)

Rust is one of those hip new programming languages you might be tired of hearing about on Hacker News. But after talking with people who have heard of Rust without looking into it in depth, I feel that what Rust actually offers you as a programmer is often poorly communicated. The more I use Rust, the more I’ve come to believe that the single most important thing it offers, the part of the language that almost all of its other benefits fall out of, is its type system.

I’m embarking on a short series of blog posts which will explore Rust’s type system, and hopefully describe things in a beginner-friendly way such that you don’t need to know a whole lot about advanced type systems to understand what I’m talking about.

Before I go any further talking about type systems, let me make one thing clear: this is not a post for type system experts. If you’re a functional programmer who has already accepted types into your heart and loves working in languages like Haskell or Scala, this is not the blog post you’re looking for. I might recommend “Rust for functional programmers” instead. While I would class Rust as an imperative language, I think its type system gives it many of the same benefits as functional languages, or as the aphorism goes, Rust is “C++ in ML clothing”.

This series of posts is written from the point of view of my real-world experience of using Rust as someone who hasn’t previously worked extensively in a language with a “fancy” type system of the sort you typically find in those aforementioned functional languages. I’ll be talking about features of the type system I used organically while developing my first non-trivial program in Rust.

What’s a sum type?

Before I start busting out any fancy functional language terminology, I’d like to give some examples of sum types in Rust and the features they offer.

I’d like to start with this example of Rust enums:

enum Animal {
    Cat,
    Dog,
    Monkey,
    Fish,
    Dolphin,
    Snake
}

In Rust, a value of an enum type is exactly one of that enum’s variants. Many languages support enums that work like this. Rust lets us go a bit further and attach data to each of the variants. Here’s a contrived example which adds some associated data to each variant:

enum Animal {
    Cat { weight: f32, legs: usize },
    Dog { weight: f32, legs: usize },
    Monkey { weight: f32, arms: usize, legs: usize },
    Fish { weight: f32, fins: usize },
    Dolphin { weight: f32, fins: usize },
    Snake { weight: f32, fangs: usize }
}

With data added to variants, this style of enum takes on many of the properties of union types in other languages. The combination of enum and union-like behavior is why this sort of type is often referred to as a “tagged union”. In type theory, it’s referred to as a sum type, and is one of many algebraic types including product types and quotient types. Though the variants are disjoint, they collectively form a single type.

Rust’s pattern matching lets us write match expressions that work across all of the variants:

match animal {
    Animal::Cat { weight, .. } |
    Animal::Dog { weight, .. } |
    Animal::Monkey { weight, .. } |
    Animal::Fish { weight, .. } |
    Animal::Dolphin { weight, .. } |
    Animal::Snake { weight, .. } => weight
}
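To see that match in context, here is a minimal, self-contained sketch. It assumes a trimmed-down Animal with only three variants to keep things short, and the function name weight_of is made up for illustration.

```rust
// A trimmed-down copy of the Animal enum (three variants only).
#[allow(dead_code)] // some fields are only ever skipped with `..` in this sketch
enum Animal {
    Cat { weight: f32, legs: usize },
    Fish { weight: f32, fins: usize },
    Snake { weight: f32, fangs: usize },
}

// `weight_of` is a hypothetical helper name used only for this example.
fn weight_of(animal: &Animal) -> f32 {
    // Every variant has to be covered: deleting one of these patterns
    // makes the match non-exhaustive, and the compiler rejects it.
    match *animal {
        Animal::Cat { weight, .. }
        | Animal::Fish { weight, .. }
        | Animal::Snake { weight, .. } => weight,
    }
}
```

The exhaustiveness check is the part worth noticing: if a new variant is added to the enum later, every match that forgot about it stops compiling until it is handled.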

This provides a way to query an attribute across all of the variants of this enum, even though their data is disjoint. We can implement this as reusable functionality by defining a method on the Animal enum:

impl Animal {
    pub fn weight(&self) -> f32 {
        match *self {
            Animal::Cat { weight, .. } |
            Animal::Dog { weight, .. } |
            Animal::Monkey { weight, .. } |
            Animal::Fish { weight, .. } |
            Animal::Dolphin { weight, .. } |
            Animal::Snake { weight, .. } => weight
        }
    }

    pub fn legs(&self) -> Option<usize> {
        match *self {
            Animal::Cat { legs, .. } => Some(legs),
            Animal::Dog { legs, .. } => Some(legs),
            Animal::Monkey { legs, .. } => Some(legs),
            Animal::Fish { .. } => None,
            Animal::Dolphin { .. } => None,
            Animal::Snake { .. } => None
        }
    }

    pub fn fins(&self) -> Option<usize> {
        match *self {
            ...
        }
    }
}
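As a rough sketch of how a caller might use a method like legs, the following example cuts Animal down to two variants and adds a hypothetical helper, total_legs, which is not part of the code above. It shows that the Option returned for each animal has to be dealt with before the numbers can be added up.

```rust
// Two-variant version of Animal, just enough for the example.
#[allow(dead_code)] // `weight` and `fins` are not read in this sketch
enum Animal {
    Cat { weight: f32, legs: usize },
    Fish { weight: f32, fins: usize },
}

impl Animal {
    fn legs(&self) -> Option<usize> {
        match *self {
            Animal::Cat { legs, .. } => Some(legs),
            Animal::Fish { .. } => None,
        }
    }
}

// Hypothetical helper: adds up the legs of every animal that has any.
fn total_legs(animals: &[Animal]) -> usize {
    // `filter_map` keeps the values inside `Some` and skips each `None`.
    animals.iter().filter_map(|a| a.legs()).sum()
}
```

Because legs returns Option<usize> rather than a bare number, there is no way to accidentally count a fish as having zero or garbage legs; the caller chooses explicitly what a missing value means.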

The above contrived example may not seem particularly compelling, but let’s take a look at how Rust implements Option, an option type similar to a “maybe monad”, which fills the role that the null keyword plays in other languages (safe Rust has no null references):

enum Option<T> {
    Some(T),
    None,
}

The Some(T) syntax is another way of associating data with an enum variant, like the weight, legs, etc. we associated with Animals above, except that the data is an unnamed field rather than a named one. In this case it’s associating a value of a generic type T with the Some variant, while the None variant has no data associated with it.

The interesting thing about this to me is Option is very much a core feature of Rust, and one of the first features you will probably learn about, and yet for it to even exist it requires both generics and sum types. It’s used ubiquitously throughout the Rust standard library wherever you might otherwise use null, a language feature criticized by its own creator Tony Hoare as a “billion dollar mistake”.

I think this is a great example of how a powerful type system like Rust’s, when leveraged ubiquitously by its own standard library, can greatly improve the quality of code and overall experience of programming. In Rust, there are no null pointer exceptions.
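Here is a small sketch of what using Option in place of null looks like from the caller’s side. The functions first_long_word and describe are hypothetical names invented for this example; only the split_whitespace and find calls come from the standard library.

```rust
// Hypothetical lookup: the first word longer than `min_len` bytes, if any.
fn first_long_word(text: &str, min_len: usize) -> Option<&str> {
    text.split_whitespace().find(|w| w.len() > min_len)
}

// The caller cannot treat the result as a &str without handling `None`,
// so the "nothing found" case can't be forgotten the way a null check can.
fn describe(text: &str) -> String {
    match first_long_word(text, 5) {
        Some(word) => format!("found {}", word),
        None => String::from("nothing found"),
    }
}
```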

But it goes beyond that: Rust has no exceptions at all. Instead it has Result, its version of what other languages call a result type:

enum Result<T, E> {
    Ok(T),
    Err(E),
}

Result is generic over two types: an arbitrary type T intended to be wrapped in the Result::Ok variant upon success, and an error type E to be wrapped in Result::Err on error (the prelude re-exports these as plain Ok and Err for brevity).
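A nice consequence is that the error type E can itself be a sum type, with one variant per failure mode. The sketch below assumes a made-up ParseError enum and a made-up parse_legs function; neither comes from the standard library.

```rust
// Hypothetical error type: itself a sum type, one variant per failure mode.
#[derive(Debug, PartialEq)]
enum ParseError {
    Empty,
    NotANumber(String),
}

// Hypothetical parser: turns user input into a leg count.
fn parse_legs(input: &str) -> Result<usize, ParseError> {
    let trimmed = input.trim();
    if trimmed.is_empty() {
        return Err(ParseError::Empty);
    }
    // `parse` returns its own Result; `map_err` converts the standard
    // library's error into our ParseError and leaves an Ok value untouched.
    trimmed
        .parse::<usize>()
        .map_err(|_| ParseError::NotANumber(trimmed.to_string()))
}
```

A caller can then match on Ok, Err(ParseError::Empty), and Err(ParseError::NotANumber(..)) separately, and the compiler checks that none of those cases is left out.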

Why is this interesting? Because if we have a function that looks like this:

fn get_something(someparam: &MyParam) -> Result<Something, Error> {
    ...
}

Result allows us to use the following code in lieu of exceptions for handling errors:

let something = get_something(someparam)?;

The ? operator, which replaced the older try! macro, “unwraps” the result for us on success (i.e. Ok), but short-circuits the rest of the calling function and returns the error value if we encounter an error. It makes explicit which of the functions we call can return errors, and avoids the need for non-local jumps to propagate errors up the stack to the point where we actually want to handle them.
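To make the short-circuiting concrete, here is a minimal sketch with two made-up functions, parse_weight_grams and total_weight. It uses the ? operator, which is how current Rust spells this kind of error propagation; ParseIntError is the standard library’s error type for failed integer parsing.

```rust
use std::num::ParseIntError;

// Hypothetical helper: parses a weight given as a whole number of grams.
fn parse_weight_grams(input: &str) -> Result<u32, ParseIntError> {
    input.trim().parse::<u32>()
}

// Hypothetical caller: each `?` either yields the Ok value or returns the
// Err from `total_weight` immediately, skipping the rest of the body.
fn total_weight(a: &str, b: &str) -> Result<u32, ParseIntError> {
    let first = parse_weight_grams(a)?;
    let second = parse_weight_grams(b)?;
    Ok(first + second)
}
```

Note that the return type of total_weight has to be a Result as well: the possibility of failure is visible in the signature of every function the error passes through.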

This concludes the quick tour of Rust’s sum types. Hopefully you learned something!

In my next post (if I ever get around to writing it) I’ll be covering associated types, a feature which allows you to make “type families”, packaging a bunch of types together under a single name. See you then!
