April 29, 2015

An open letter to Matz on Ruby type systems

Hi Matz,

I really enjoyed your keynote at RubyConf 2014. The most interesting part of it to me was where you talked about how Ruby 3.0 might include some sort of type system. There are lots of directions a type system for Ruby could take, and your presentation talked about how you like to explore the different directions a given feature could go.

I think you made two important points about a type system in Ruby:

DRY: adding types should not make the Rubyist add type declarations to their program over and over again
Duck Typing: this is important to the way Rubyists program and any type system added to Ruby should still enable duck typing

After many years of writing Ruby and debugging too many type-related issues, I have adopted a rather defensive programming style for a lot of the code I write. I often find myself writing code like this:

def some_method(param)
  fail TypeError, "not ExpectedType" unless param.is_a?(ExpectedType)
  # ...
end

Many Rubyists would probably say you should not write code like this, and should instead use respond_to? to check how objects “quack” rather than checking what they are. But I find this approach error-prone: a lot of different classes have methods that just happen to share a name, but have completely different and unexpected behaviors. Much of the code I write deals with things like concurrency and cryptography, where I feel correctness is more important than the flexibility that might be offered by duck typing.
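
As a minimal sketch of this kind of name collision, consider the example below. remove_entry is a hypothetical helper invented for illustration; Array#delete and String#delete are the real core methods, and they behave very differently even though both pass the respond_to? check:

```ruby
# Hypothetical helper: removes an entry from a collection, guarded only
# by a respond_to? check rather than a check on the class.
def remove_entry(collection, entry)
  fail TypeError, "does not respond to delete" unless collection.respond_to?(:delete)
  collection.delete(entry)
  collection
end

# Array#delete removes matching elements from the receiver in place.
list = remove_entry(["ab", "cd"], "ab")

# String#delete returns a new string and leaves the receiver untouched,
# so the "removal" is silently lost and no error is raised.
text = remove_entry("abcd", "ab")
```

Both calls get past the guard, but only the Array call does what the helper's name promises.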

I don’t write all of my code with these sorts of fail TypeError checks. Sometimes I do want duck typing. But for important parts of the code where errors happen frequently, I find them quite helpful.

So I have already written types into the method once with these parameter checks. But then when I add documentation using YARD, I add the same type information in again:

# This is some method that does something
#
# @param [ExpectedType] param to the method
# @return [AnotherType] the result of the method
def some_method(param)
  fail TypeError, "not ExpectedType" unless param.is_a?(ExpectedType)
  # ...
end

Now I have written ExpectedType twice (actually three times since the fail TypeError line has it twice)! That’s not DRY. But the big problem here is that while I’m putting a lot of type information into my program, there is no type system to give this information to. So I get all of the burden of a statically typed language without any of the benefits.

I would like to have a single way to tell Ruby the types of my programs so they can be used both to replace those fail TypeError checks with something automatic and also be useful for documentation systems.

I have seen some projects that attempt to do this:

rtc: Ruby Type Checker
Rubype

These are “contracts”-style systems that let you specify the types you expect to be passed to a method or returned from it. They work completely at runtime, though, so in effect they do the fail TypeError checks I was talking about earlier for me.
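
For illustration, a runtime contract of this kind could be sketched roughly as follows. TypeSig, typesig, signatures, and Greeter are hypothetical names made up for this example; this is not the actual API of rtc or Rubype, just one way such a check might be wired up with Module#prepend:

```ruby
# Hypothetical mixin; NOT the API of rtc, Rubype, or any other library.
module TypeSig
  # Records parameter and return types for an existing instance method,
  # then wraps the method so the checks run on every call.
  def typesig(name, params, returns)
    signatures[name] = { params: params, returns: returns }

    checker = Module.new do
      define_method(name) do |*args, &block|
        # Check each positional argument against its declared type.
        params.zip(args).each do |type, arg|
          fail TypeError, "expected #{type}, got #{arg.class}" unless arg.is_a?(type)
        end

        # Call the original method, then check its result.
        result = super(*args, &block)
        fail TypeError, "expected #{returns} result, got #{result.class}" unless result.is_a?(returns)
        result
      end
    end

    # The prepended module sits in front of the class, so its method runs first.
    prepend checker
  end

  # The recorded signatures, which a documentation tool could also read.
  def signatures
    @signatures ||= {}
  end
end

class Greeter
  extend TypeSig

  def greet(name)
    "Hello, #{name}!"
  end
  typesig :greet, [String], String
end
```

With this sketch, Greeter.new.greet("Matz") behaves as usual, while Greeter.new.greet(42) raises a TypeError when the call happens. The type is written once, and Greeter.signatures keeps it available for other tools, but nothing is checked before the program actually runs.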

However, this is not what I’d actually like out of a type system. I would like to have a type system that’s fully backwards compatible with Ruby today, but lets the compiler do static analysis and catch type errors in advance, when the program loads, rather than while it is running.

But Ruby is so flexible and lets you change things at runtime, so how is this possible?

There’s an approach called gradual typing (also referred to as “optional typing” although I understand there are some distinctions) which is like a hybrid of static typing and dynamic typing. It works by using static analysis when possible, and when static analysis isn’t possible, doing a runtime check instead. This lets the compiler catch certain types of errors in advance when the program loads, and for everything else, it works like rtc/Rubype and uses runtime checks (if you have specified type information).

Gradual typing lets you start with a program that has no types and slowly add types where you think it makes sense. For example, let’s say we have a really big Rails app which has core models that are thousands and thousands of lines long. There are probably methods inside these models where a lot of type errors happen, because these are the core models and lots of code interacts with them. This is where we might look and go “ok, types make sense here”, because there is a lot of code churn around type-related errors. I am guessing anyone who has worked on a large Rails app has probably experienced this at some point.

The rest of the program is just plain old “untyped” Ruby, but we put the types on the parts we care about the most. As we “gradually” add types, the type system is able to figure out more and more things about the program in advance, so we can catch more errors when the program starts up instead of at runtime.

The best part is that this approach can actually work with dynamic programs that people are constantly changing. For example, in Rails we might have a browser and editor open, change some code in the editor, and refresh the page in the browser to see how our changes affected the rendered page. People do this over and over again, making a code change and refreshing the browser, while Rails reloads the code dynamically in an already running program. This sort of workflow is amazing, and it is why I think Rails programmers can be so productive. People came to Rails from languages like Java, where each code change meant a compile step and probably a restart of the program, which is very tedious. Ruby’s dynamic nature is what enables productivity for Rails programmers.

How can this work with a gradual typing system? Will we break this beautiful workflow that makes programmers so productive?

Facebook developed a language called Hack, which is a gradually typed version of PHP. In a Strange Loop 2013 talk about PHP and Hack (slides here), a Facebook employee talked about how developers are able to develop with the Rails-style workflow without worrying about types.

When Facebook engineers work on the Facebook PHP code, they run a type checker side-by-side with their browser and editor. Programs are not required to have valid types when the engineer saves a change and refreshes the program in their browser. The types are just erased and the program is treated like normal PHP code. However, the type checker watches for their changes, and type checks the program in parallel with the engineer’s workflow. The engineer can make a change to their program and refresh their browser, and the type checker works in the background, allowing the programmer to only worry about the types when they’re done hacking on the code.

In this regard, the type checker acts more like a linter than an enforcer. So maybe a type error will pop up, but it does not get in the way of development. It’s just something the programmer should fix up after they’re done hacking and before they commit.

There are many other languages that implement gradual typing:

TypeScript: a gradually typed superset of JavaScript developed by Microsoft
Flow: a gradually typed superset of JavaScript developed by Facebook, which is somewhat syntax-compatible with TypeScript
Dart: an optionally typed language for web development created by Google
Clojure core.typed: an optional type system for Clojure
Typed Racket: an optionally typed version of the Racket Scheme language where many of the ideas for this sort of type system were developed
Strongtalk: an optionally typed version of the Smalltalk language

There was also a project that attempted to add this sort of type system to Ruby called Diamondback Ruby.

(Edit: I should also mention that Brian Shirai is working on this sort of thing in Rubinius 3.0 too)

In conclusion, I think the ideas from these languages could greatly benefit Ruby and I hope you will consider them!
