December 18, 2012

2012: The Year Rubyists Learned to Stop Worrying and Love Threads (or: What Multithreaded Ruby Needs to Be Successful)

Let me paint a very different picture of how Rubyists used to view threads, compared with what the title of this post implies about today. I’m not talking about 2005, in the early days of Rails. I’m talking about Dr. Nic’s talk at RubyConf 2011, a little more than a year ago. Dr. Nic had a fairly simple message: when performance matters, build multithreaded programs on JRuby (also: stop using EventMachine). Now granted, he was working for the company that was subsidizing JRuby development at the time, but I wasn’t, and I for one strongly agreed with him. Not many other people in the room did. The talk seemed to be met with a lot of incredulity.

“I thought this was going to be a talk on EventMachine!” said That Guy. Perhaps what That Guy missed was that Dr. Nic had hosted EventMachineConf as a subconference of RailsConf a few months before. And now Dr. Nic was saying don’t use EventMachine, use threads. And Dr. Nic is certainly not the only one who has come to this conclusion.

Flash forward to 2012 and I think the Ruby community has completely changed its tune. I may be a bit biased, but if I were to pick an overall “theme” (or perhaps “tone”) of RubyConf 2012, it’s that the single-core nature of the canonical Ruby interpreter, MRI (“Matz’s Ruby Interpreter”), is limiting Ruby’s potential applications.

There was a lot of buzz about JRuby this year, and Brian Ford, one of the primary developers of Rubinius, announced the first 2.0 prerelease. Both of these Ruby implementations support parallel execution of multithreaded Ruby programs on multicore CPUs. I talked to a lot of people who are interested in my Celluloid concurrent object library as well.

At the same time RubyConf 2012 marked the first “Ruby 2.0” prerelease. Many of the talks covered upcoming Ruby 2.0 features, most notably refinements. Much like the release of Rails 2.0, this felt a bit underwhelming. What the crowd was clamoring for was what would be done with the Global Interpreter Lock (or GIL, or perhaps more appropriately the Global VM Lock or GVL in ruby-core parlance).

At the end of the conference, Evan Phoenix sat down with Matz and asked him various questions posed by the conference attendees. One of these questions was about the GIL and why such a substantial “two dot oh” style release didn’t try to do something more ambitious like removing the GIL and enabling multicore execution. Matz looked a bit flustered by it, and said “I’m not the threading guy”.

Well Matz, I’m a “threading guy” and I have some ideas ;)

Personally I’m a bit dubious about whether or not removing the GIL from MRI is a good idea. The main problems would be the initial performance overhead of moving to a fine-grained locking scheme, and also the bugs that would crop up as a large codebase originally intended for single-threaded execution is retrofitted into a multithreaded program. I think the large bug trail this would create would hamper future Ruby development, because instead of spending their time improving the language itself, ruby-core would spend its time hunting thread bugs.

All that said, there are some features I would personally like to see in Ruby which would substantially benefit multithreaded Ruby programs, GIL or no GIL. I would personally prioritize all of these features ahead of removing the GIL, as they would provide cleaner semantics we need to write correct multithreaded Ruby programs:

Recommendation #1: Deep freeze

Immutable state has a number of benefits for concurrent programs. What better way to prevent concurrent state mutation than to prevent any state mutation? (Actually, there are ways I think are better, but I’ll get to that later.) Ruby provides #freeze to prevent modifications to an object, but I don’t think #freeze is enough: it freezes only the receiver, not the objects the receiver refers to.
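To make the limitation concrete, here is a small sketch of what #freeze does and doesn't protect. The hash and its contents are made up for illustration.

```ruby
# #freeze protects only the object it is called on.
config = { hosts: ["a.example", "b.example"] }.freeze

# Mutating the frozen hash itself fails...
begin
  config[:port] = 80
rescue RuntimeError => e # FrozenError (a RuntimeError subclass) on newer Rubies
  frozen_error = e
end

# ...but the array stored inside the hash is still mutable,
# so another thread could change it underneath us.
config[:hosts] << "c.example"
puts config[:hosts].length # => 3
```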

Purely immutable languages allow the creation of immutable persistent data structures. The word “persistent” in this case doesn’t mean written to disk, but that you can create multiple versions of the same data structure with “persistent” copies of parts of the old one that get shared between versions. This approach only works if it’s immutable state all the way down. Ruby supports immutable persistent data structures via the Hamster gem, but it would be much easier to work with immutable data if this feature were in core Ruby.
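As a rough illustration of structural sharing, here is a minimal immutable linked list. PList is a hypothetical class I'm making up for this sketch; it is not Hamster's API.

```ruby
# A minimal immutable singly linked list (illustrative only).
class PList
  attr_reader :head, :tail

  def initialize(head, tail)
    @head = head
    @tail = tail
    freeze # each node is frozen as soon as it is built
  end

  # Returns a new list; the existing list becomes the new list's tail.
  def cons(value)
    PList.new(value, self)
  end

  def to_a
    out = []
    node = self
    while node
      out << node.head
      node = node.tail
    end
    out
  end
end

base = PList.new(2, PList.new(3, nil))
v1 = base.cons(1) # 1 -> 2 -> 3
v2 = base.cons(0) # 0 -> 2 -> 3

# Both versions share the very same tail nodes instead of copying them.
puts v1.tail.equal?(v2.tail) # => true
```

Sharing the tail is only safe because nothing can change it after construction. Note that the sketch freezes the nodes, not the values stored in them, which is exactly the "all the way down" problem.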

What we need is more than just freezing of individual objects that #freeze provides. As some recent compiler research into immutability by Microsoft demonstrates, what really matters isn’t the mutability of individual objects but rather aggregates (i.e. object graphs). We need a guaranteed way to freeze aggregates as a whole, rather than freezing just a single object at a time. This means we’d walk all references from a parent object and freeze every single object we find. This would allow for the creation of efficient immutable persistent data structures in Ruby.

What would this look like? Something like Object#deep_freeze. While it’s possible to use Ruby introspection to attempt to traverse all the references a given object is holding recursively (see the ice_nine gem), this is something I really feel should be part of the language proper. I also get the idea that VM implementers know exactly where these references are in their implementations and could implement a much faster version of #deep_freeze than one that uses Ruby reflection to spelunk objects and find their references.
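For the sake of argument, here is roughly what the reflection-based approach looks like. This deep_freeze is a hypothetical helper of my own, not ice_nine's implementation and not a core method. It only follows instance variables and the elements of Array, Hash and Struct, so references hidden elsewhere (inside a Proc's closure, for instance) are missed.

```ruby
# Hypothetical sketch: freeze every object reachable from obj.
def deep_freeze(obj, seen = {}.compare_by_identity)
  return obj if seen[obj] # already visited (handles cycles)
  seen[obj] = true

  case obj
  when Module
    return obj # leave classes and modules alone
  when Hash
    obj.each do |key, value|
      deep_freeze(key, seen)
      deep_freeze(value, seen)
    end
  when Array, Struct
    obj.each { |value| deep_freeze(value, seen) }
  end

  obj.instance_variables.each do |ivar|
    deep_freeze(obj.instance_variable_get(ivar), seen)
  end
  obj.freeze
end

point_class = Struct.new(:x, :y)
graph = { "points" => [point_class.new(1, 2)], "tags" => ["a", "b"] }
graph["self"] = graph # a cycle, to show the traversal terminates
deep_freeze(graph)

puts graph["tags"].first.frozen? # => true
```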

Recommendation #2: Deep dup

There’s another approach that works equally well when we have objects we’d like to mutate but that we’d also like to share across threads. Before we send objects across threads, we could make a copy, and give the copy to another thread.

Making copies every time we want to pass an object to another thread might sound expensive and wasteful, but there’s a language that’s been very successful at multicore performance which does just that: Erlang. In Erlang, every process has its own heap, so every time a message is passed from one process to another the Erlang VM makes a copy of the data being sent in the message and places the new copy in the receiving process’s heap space. (The exception is large binary data, which Erlang keeps in a shared, reference-counted heap.)

Ruby has two methods for making shallow copies of objects, Object#dup and #clone (which are more or less synonymous except #clone copies the frozen state of an object). However, the only built-in way to make a deep copy of entire object graphs is to use Marshal. This is nowhere near ideal for making in-VM copies, because for starters it produces an intermediate string that needs to be garbage collected, not to mention that Marshal uses a complex protocol which precludes the sorts of optimizations that could be done on a simple deep copy operation.
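A quick sketch of the difference between these options, using a made-up hash:

```ruby
original = { list: [1, 2, 3] }.freeze

shallow_dup   = original.dup   # copy is not frozen
shallow_clone = original.clone # copy keeps the frozen state

# Both shallow copies still point at the same inner array as the original.
puts shallow_dup[:list].equal?(original[:list]) # => true

# Deep copy via Marshal: serialize to an intermediate String, then rebuild.
bytes = Marshal.dump(original)
deep  = Marshal.load(bytes)

puts deep == original                    # => true
puts deep[:list].equal?(original[:list]) # => false
```

The intermediate `bytes` string is the garbage I'm complaining about: it exists only to be parsed and thrown away.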

Instead of Marshaling, Ruby could support Object#deep_dup to make deep copies of entire object graphs. This would work much like #deep_freeze, traversing all references in an object graph but constructing an equivalent copy instead of freezing every object. Once a copy has been created, it can be safely sent to another thread. This could be leveraged by systems like Celluloid which control what happens at the boundary between threads. If Celluloid even provided an optional mode for always copying objects sent in messages, then using it would ensure your program was safe from concurrent mutation bugs.
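Here is a hedged sketch of what a userland version might look like. This deep_dup is a hypothetical helper written for this post, not a core method, and it only understands Array, Hash, Struct and instance variables. A VM-level version would not need any of this reflection.

```ruby
# Hypothetical sketch: copy every object reachable from obj.
# `copies` maps originals to their copies so shared references and
# cycles in the source graph are preserved in the copy.
def deep_dup(obj, copies = {}.compare_by_identity)
  return copies[obj] if copies.key?(obj)

  case obj
  when nil, true, false, Numeric, Symbol, Module
    return obj # immutable values, or objects shared by design
  end

  copy = obj.dup
  copies[obj] = copy

  case copy
  when Hash
    copy.clear
    obj.each { |k, v| copy[deep_dup(k, copies)] = deep_dup(v, copies) }
  when Array
    copy.map! { |v| deep_dup(v, copies) }
  when Struct
    copy.members.each { |m| copy[m] = deep_dup(copy[m], copies) }
  end

  copy.instance_variables.each do |ivar|
    value = copy.instance_variable_get(ivar)
    copy.instance_variable_set(ivar, deep_dup(value, copies))
  end
  copy
end

inner = [1, 2]
src = { "items" => inner, "again" => inner }
dst = deep_dup(src)

# The copy can now be handed to another thread and mutated freely.
Thread.new(dst) { |mine| mine["items"] << 3 }.join

puts src["items"].inspect # => [1, 2]
puts dst["items"].inspect # => [1, 2, 3]
```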

Bonus points #1: Ownership Transfer

Copying object graphs every time we pass a reference to another thread is one solution to providing both mutability and thread safety, however making copies of object graphs is a lot slower than a zero copy system. Can we have our cake and eat it too: zero-copy mutable state objects that are free of any potential concurrent mutation bugs?

There’s a great solution to this: we can pin whole object graphs to a single thread at a time, raising exceptions in other threads that may hold a reference to any object in the graph but do not own it and attempt to perform any type of access. This idea is called ownership transfer.

The Kilim Isolation-Typed Actor system for Java is one implementation of this idea. Kilim supports the idea of “linear ownership transfer”: only one actor can ever see any particular object graph in the system, and object graphs can be transferred wholesale to other actors, but cannot be shared. For more information on the messaging model in Kilim, I definitely suggest you check out the portion of Kilim-creator Sriram Srinivasan’s talk on the isolation system Kilim uses for its messages.

Another language that supports this approach to ownership transfer is Go. References passed across channels between goroutines change ownership. For more information on how this works in Go, I recommend checking out Share Memory By Communicating from the Go documentation. (Edit: I have been informed that Go doesn’t have a real ownership transfer system and that the idea of ownership is more of a metaphor, which means the safety guarantees around concurrent mutation are as nonexistent as they are in Ruby/Celluloid)

Ruby could support a similar system with only a handful of methods. We could imagine Object#isolate. Like the other methods I’ve described in this post, this method would need to do a deep traversal of all references, isolating them as well so as to isolate the entire object graph.

Moreover, to be truly effective, isolation would have to apply to any object that an isolated object came in contact with. If we add an object to an isolated array, the object we added would also need to be isolated to be safe. This would also have to apply to any objects referenced from the object we’re adding to the isolated aggregate. Isolation would have to spread like a virus from object-to-object, or otherwise we’d have leaky bits of our isolated aggregate which could be concurrently accessed or mutated without errors.

If a reference to an isolated object were to ever leak out to another thread, and that thread tried to reference it in any way, the system would raise an OwnershipError informing you that an unpermitted cross-thread object access was performed. This would prevent any concurrent access or mutation errors by simply making any cross-thread access to objects without an explicit transfer of ownership an error.

To pass ownership to another thread, we could use a method like Thread#transfer_ownership(obj) which would raise OwnershipError unless we owned the object graph rooted in obj. Otherwise, we’ve just given control of the object graph to another thread, and any subsequent accesses by ourselves will result in OwnershipError. If we ever want to get it back again, we will have to hand the reference off to that other thread, and the other thread must explicitly transfer control of the object graph back to us.
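None of this exists in Ruby today, but the semantics can be roughly approximated in userland with a proxy. In the sketch below, Owned is a hypothetical wrapper class of my own invention, and transfer_ownership lives on the proxy rather than on Thread. It also only guards the one wrapped object: it does not isolate the rest of the graph, and anyone holding the raw object can bypass it entirely, which is precisely why I think this needs VM support.

```ruby
require "thread"

class OwnershipError < StandardError; end

# Userland approximation: forwards calls only from the owning thread.
class Owned < BasicObject
  def initialize(target)
    @target = target
    @owner  = ::Thread.current
  end

  # Hand the wrapped object to another thread; only the owner may do so.
  def transfer_ownership(thread)
    __check_owner__
    @owner = thread
    self
  end

  def method_missing(name, *args, &block)
    __check_owner__
    @target.__send__(name, *args, &block)
  end

  private

  def __check_owner__
    return if @owner.equal?(::Thread.current)
    ::Kernel.raise ::OwnershipError, "object is owned by another thread"
  end
end

buffer = Owned.new([1, 2, 3])
buffer << 4 # fine: the current thread owns it

inbox  = Queue.new
worker = Thread.new do
  mine = inbox.pop # blocks until the reference arrives
  mine << 5        # fine: ownership was transferred before the hand-off
  mine.length
end

buffer.transfer_ownership(worker)
inbox << buffer
worker_result = worker.value

begin
  buffer.length # we gave the object away, so this is now an error
rescue OwnershipError => e
  denied = e
end

puts worker_result # => 5
puts denied.class  # => OwnershipError
```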

A system like this would be a dream come true for Celluloid. One of the biggest drawbacks of Celluloid is its inability to isolate the objects being sent in messages, and while either #deep_freeze or #deep_dup would provide solutions to the isolation problem (with various and somewhat onerous tradeoffs), an ownership transfer system could provide effective isolation, zero copy messaging, and preserve mutability of data (which Ruby users will generally expect).

Bonus points #2: Unified Memory Model and Concurrent Data Structures

Java in particular is well known for its java.util.concurrent library of thread-safe data structures. Many of these are concurrent equivalents of data structures we’re already familiar with, built on lock-free or fine-grained locking techniques (e.g. ConcurrentHashMap). Others provide things like fast queues between threads (e.g. ArrayBlockingQueue).

It would be great if Ruby had a similar library of such data structures, but right now Ruby does not have a defined memory model (yet another thing Brian Ford called for), and without a memory model shared by all Ruby implementations it seems difficult to define how things like concurrent data structures will behave.
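For comparison, the closest thing Ruby's standard library offers today is Queue (and its bounded sibling SizedQueue), a blocking thread-safe queue. Here is a minimal sketch of passing work between threads with it. The :done sentinel is just a convention I'm assuming for this example.

```ruby
require "thread"

jobs    = Queue.new
results = Queue.new

workers = 2.times.map do
  Thread.new do
    # Queue#pop blocks until an item is available.
    while (n = jobs.pop) != :done
      results << n * n
    end
  end
end

(1..5).each { |n| jobs << n }
workers.size.times { jobs << :done } # one sentinel per worker
workers.each(&:join)

squares = []
squares << results.pop until results.empty?
puts squares.sort.inspect # => [1, 4, 9, 16, 25]
```

Queue makes the hand-off itself safe, but it says nothing about the objects passed through it, and that is the part a shared memory model would have to pin down.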

Conclusion

Stefan Marr recently gave an awesome talk laying out the challenges for building multicore programs in object-oriented languages which contains a number of points relevant to systems like Celluloid.

Celluloid solves some of the synchronization problems of multithreaded programs, but not all of them. It’s still possible to share objects sent in messages between Celluloid actors, and it’s possible for concurrent mutations in these objects to go unnoticed.

I don’t think I can solve these problems effectively without VM-level support in the form of the aforementioned proposed features to core Ruby. You can imagine being able to do include Celluloid::Freeze or include Celluloid::Dup to control the behavior of how individual actors pass messages between each other. Or, even better, if Ruby had an ownership transfer system Celluloid could automatically transfer ownership of objects passed as operands to another actor or returned from a synchronous call. If that were the case, accesses to objects which have been sent to other threads would result in an exception instead of a (typically) silent concurrent mutation.

Celluloid is your best bet for building complex multithreaded Ruby programs, but it could be better… and we need Ruby’s help.
