July 12, 2016

A gentle introduction to nio4r: low-level portable asynchronous I/O for Ruby

Rails 5.0 was recently released, and with it came ActionCable, a new part of the framework to put WebSockets “on Rails”. ActionCable has had something of a sordid history, from taking Rails Core developer Aaron Patterson by surprise when he first heard of it at a RailsConf keynote to at one point using both EventMachine and Celluloid, each of which independently is an onerous dependency (I say this as the author of Celluloid).

That said, the dust has settled and both EventMachine and Celluloid have been removed. Instead, ActionCable is based on two libraries. The first is concurrent-ruby, a Ruby library inspired by Java’s java.util.concurrent which was already a Rails dependency. The second is nio4r, i.e. New I/O (or Non-blocking I/O) for Ruby, a library inspired by java.nio which you may not have heard of before. While EventMachine and Celluloid are both grand inventions, I think there’s something to be said for copying our homework from Java.

As the author of nio4r, I thought I could shed some light on how it works, especially now that it’s shipping as a default dependency of Rails. It’s not a new library: I started writing it around the beginning of 2012, which makes it over four and a half years old. It hit 1.0 in the beginning of 2014. Before ActionCable, it primarily served to provide the core async I/O functionality for Celluloid::IO, a set of asynchronous I/O extensions for Celluloid. But as ActionCable shows, nio4r has uses outside of the Celluloid ecosystem.

A low-level asynchronous I/O library

Unlike large, complicated frameworks such as EventMachine, Celluloid::IO, and Cool.io (an earlier async I/O framework I wrote which is surprisingly still maintained by others), nio4r provides relatively few features and a low-level API. In that regard, nio4r is best suited as the foundation of higher-level async libraries.

Rather than competing with those frameworks, nio4r aims to provide portable implementations of just the I/O primitives that need support from native extensions. It also aims to provide first-class support for JRuby alongside CRuby (and other VMs that support MRI C extensions, such as Rubinius). To that end, nio4r borrows heavily from the design of Java NIO, allowing the JRuby backend to be a thin Java shim which exposes a Ruby API.

On CRuby, nio4r wraps libev, itself a small, portable C wrapper for various kernel APIs including epoll and kqueue. While other event libraries like libuv were available at the time of nio4r’s authoring, libev was specifically chosen because it provides similar semantics to Java NIO.

When designing any asynchronous I/O library, there are two strategies you can choose:

Selector: register I/O objects of interest and poll for readiness. This allows you to monitor changes in what I/O operations can be immediately performed on objects without actually performing them. This approach is also known as a “reactor”, and is the approach favored by most *IX operating systems.
Event Completion: request I/O operations be performed, then receive asynchronous notifications when they have completed. This approach is also known as a “proactor”, and is primarily used by Windows (although Solaris and other proprietary *IX operating systems implemented completion APIs).

nio4r uses the selector approach, as opposed to libuv-style event completions. This means it’s better optimized for *IX and can remain more compatible with the Java NIO API, but performs worse on Windows. It is unfortunate Windows never implemented a high performance selector API, but it seems Windows has a fundamental architectural limit of only being able to monitor 64 object handles from a single thread which is baked very deep into its core. I/O completions were likely introduced into Windows as a workaround for this fundamental limit rather than trying to change it, although some suspect Microsoft may have done it to purposefully make it difficult to build high performance asynchronous servers which work portably across Windows and *IX.

nio4r provides portable, natively optimized implementations of the following features:

Selectors: monitor multiple I/O objects for readiness using Monitors
Monitors: track registered I/O interests for a particular object, and which ones it was selected for
ByteBuffers (WIP): natively-backed off-heap buffers that support zero copy I/O operations

In the rest of the post, I’ll go into detail about these features and how they fit into the overall design of nio4r.

Selectors

Selectors solve the fundamental problem of waiting on more than one I/O object at once. There’s already a facility built into the Ruby standard library to do this called IO.select:

require "socket"

server = TCPServer.new("127.0.0.1", 12345)

clients = []

# Two's a crowd, three's a party!
3.times do
  clients << server.accept
end

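# IO.select returns three arrays: readable, writable, and errored objects
ready, = IO.select(clients)

The ready array (the first of the three arrays IO.select returns) contains the connections that are immediately ready for reading. We can also monitor which sockets are writable: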

readers = [...]
writers = [...]

ready_readers, ready_writers = IO.select(readers, writers)

That’s all well and good, but there’s a problem: each time we do an IO.select operation, we’re having to pass it the entire state of every single I/O object we want to monitor, even if it’s the same (or pretty close to the same) every time. This needless repetition adds a lot of CPU overhead and object allocations.
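
To make that repetition concrete, here is a minimal sketch using only the standard library. The poll_once helper is a hypothetical name introduced for illustration, and a UNIXSocket pair stands in for a real client connection so the snippet runs without a network peer:

```ruby
require "socket"

# Hypothetical helper: one polling pass over the given sockets.
# Returns [readable, writable]; both are empty if the timeout expires.
def poll_once(readers, writers, timeout)
  # The full lists are handed to IO.select on every single call
  ready = IO.select(readers, writers, nil, timeout)
  return [[], []] if ready.nil? # nil means the timeout expired

  readable, writable, _errored = ready
  [readable, writable]
end

# A connected socket pair stands in for a real client connection
local, remote = UNIXSocket.pair

# Nothing has been sent yet: `local` can be written to but not read from
readable, writable = poll_once([local], [local], 0)
puts "readable=#{readable.size} writable=#{writable.size}"

remote.write("ping")

# Same lists again, even though nothing about our interests changed
readable, writable = poll_once([local], [local], 1)
puts "readable=#{readable.size} writable=#{writable.size}"
```

With a handful of sockets this cost is negligible, but the arrays are rebuilt and rescanned on every pass, so the work grows with the number of connections rather than with the number of ready ones.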

What would be nice is if we had a stateful API which could keep track of the objects we’re interested in so we don’t have to reconstruct that state every time we want to do a select operation. It’d be great if we could even have the kernel track that state for us! This is exactly what the epoll (on Linux) and kqueue (on BSD/OS X) APIs provide, along with Java NIO.

nio4r provides a portable version of such a stateful API with NIO::Selector:

require "nio4r"

server = TCPServer.new("127.0.0.1", 12345)

selector = NIO::Selector.new

# Two's a crowd, three's a party!
3.times do
  client = server.accept
  _monitor = selector.register(client, :r)
end

ready = selector.select

Now we don’t have to pass in an array of all of the I/O objects we want to monitor each time. In fact, we don’t even have to keep track of them at all: the selector will track them for us, so we can get rid of the clients variable.

This stateful approach lets us scale to much larger numbers of connections because the kernel is now tracking the state for us.

One last thing before I get into monitors: the NIO::Selector#select operation supports a timeout parameter:

ready = selector.select(1.0) # wait one second

This lets us wait for any I/O objects to become available for a predetermined period of time, and if not, the operation times out. This can be useful for things like scheduling timers that run as part of the event loop (by the way, if you’re interested in timers designed to run within an asynchronous event loop, there’s a gem for that called “timers”).

Monitors

Monitors are objects that are created when you register an I/O object with NIO::Selector#register which store the current “interests” associated with an I/O object. They’re also the objects returned from an NIO::Selector#select call.

The following methods are available on monitors:

#interests: what I/O operations the monitor is selecting for (:r, :w, or :rw for read, write, and read/write respectively)
#interests=: changes the current interests a monitor is selecting for (to :r, :w, or :rw)
#readiness: what I/O operations the monitored object is ready for (:r, :w, :rw, or nil if there are no operations that are ready)
#readable?: is the I/O object ready to be read?
#writable?: is the I/O object ready to be written to?

Monitors also support #value and #value= methods for storing a handle to an arbitrary object of your choice (e.g. a Proc that is called as a callback when the object becomes ready for I/O). This lets you encapsulate other connection-specific state needed to dispatch the event.

When you’re done monitoring an object (either because it’s been closed or you have lost interest in its I/O operations for other reasons) you can call NIO::Monitor#close to deregister it from the NIO::Selector.

Putting it all together

Readiness alone doesn’t help: we actually want to perform I/O operations! The API provided by nio4r is intended to support non-blocking I/O, and should be used in conjunction with Ruby’s native non-blocking I/O methods:

IO#read_nonblock: read from a socket if it’s ready for reading, otherwise raise IO::WaitReadable immediately instead of blocking.
IO#write_nonblock: write to a socket if its buffer isn’t full, otherwise raise IO::WaitWritable immediately instead of blocking.
Socket#connect_nonblock: begin opening a connection asynchronously, which can be polled for completion.
Socket#accept_nonblock: accept a pending connection if one is waiting, otherwise raise IO::WaitReadable immediately so the listening socket can be polled for readiness.
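
To show what these calls return when an operation would block, here is a small standard-library sketch using the "exception: false" option available on newer Rubies. The try_read helper is a hypothetical name, and binding to port 0 (so the OS picks a free port) is a choice made for this example:

```ruby
require "socket"

# Hypothetical helper: a single non-blocking read attempt.
# Returns the data read, :wait_readable if nothing is available yet,
# or nil if the peer has closed the connection.
def try_read(io, max_bytes = 16_384)
  io.read_nonblock(max_bytes, exception: false)
end

server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
port = server.addr[1]

# No client has connected yet, so accepting would block
p server.accept_nonblock(exception: false)

client = TCPSocket.new("127.0.0.1", port)
IO.select([server], nil, nil, 1) # wait for the pending connection
peer = server.accept_nonblock(exception: false)

p try_read(peer) # nothing has been sent yet

client.write_nonblock("hello")
IO.select([peer], nil, nil, 1) # wait for the data to arrive
p try_read(peer)

client.close
IO.select([peer], nil, nil, 1) # wait for the close to be observed
p try_read(peer)

peer.close
server.close
```

Without "exception: false" the same situations raise IO::WaitReadable or IO::WaitWritable instead of returning a symbol.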

These methods form the core of Ruby’s asynchronous I/O support. They can be used with IO.select to perform readiness monitoring, but NIO::Selector is more scalable. But how should they be combined?

For a complete guide to this, please see nio4r’s wiki page on flow control. For this post I will specifically cover how to combine NIO::Selector with the IO::WaitReadable and IO::WaitWritable exceptions that the non-blocking methods raise when an operation would block.

The naive strategy is to use the selector to wait for the operation you’re interested in, then perform it. However, I would recommend a slightly different approach, which is to attempt to perform an I/O operation, then use the selector if it fails:

read_complete = proc { |data| puts "Got data! #{data}" }

begin
  # On newer Rubies check out the "exception: false" option
  # It avoids raising an exception if the operation fails
  data = socket.read_nonblock(16384)
  read_complete.call(data)
rescue IO::WaitReadable
  monitor = selector.register(socket, :r)
  monitor.value = proc do
    data = socket.read_nonblock(16384)
    read_complete.call(data)
  end
end

This approach uses the selector as a sort of “error handling” mechanism for when I/O operations aren’t ready to complete. It avoids having to round-trip around the event loop for I/O operations that are already ready, which helps minimize the total number of I/O objects being monitored, and will also help to reduce latency.

As a quick recap:

Try to perform the intended I/O operation
If it succeeds, you’re done!
If it fails, register the I/O object with the selector
Wait until it’s selected, then retry the I/O operation

ByteBuffers: Coming Soon!

There’s one last piece of the NIO API I haven’t covered yet, and that’s because it’s a work-in-progress. This piece is called ByteBuffers, and they represent fixed-sized off-heap native buffers which can be used for zero-copy I/O.

ByteBuffers, backed by java.nio.ByteBuffer on JRuby and a C implementation on CRubies, are being added to nio4r as part of a Google Summer of Code project. They will appear in the next release of nio4r.

I hope this writeup has been insightful for you, and that you find nio4r (or projects based upon it) useful! If you liked it, I’d appreciate you heading over to GitHub and starring the repo (or just hitting the kudos button at the end of this post).
