Go in 42 lines of Ruby
Go’s goroutines are like sugar: they give me a rush, and they give my IO-bound work one as well!
But for a one-off script, my Go’s not fluent enough, so why not bring some of it to Ruby?
I’m writing a script that scrapes a slow API pretty intensely. The naive approach would have me do this:
things_ids.each do |thing_id|
  data = HttpClient.get("http://example.com/things/#{thing_id}")
  save_to_file(data)
end
… but I have 3,000 things to get, the API takes 1 second per call (so about 50 minutes end to end), and I’m not that patient.
(For the record, the API in question is Pivotal Tracker’s)
So what I’d really like to do is something like:
things_ids.each do |thing_id|
  go do
    data = HttpClient.get("http://example.com/things/#{thing_id}")
    save_to_file(data)
  end
end
and have it magically parallelize my go block.
Sure, I could use Celluloid’s beautiful futures, or Faraday’s parallelism support, but today I have a moment to tinker under the hood.
All I really need is a pool of threads, futures, and some syntax sugar:
require 'monitor'

class Pool
  def initialize(max_size = nil)
    @threads  = []
    @lock     = Monitor.new
    @signal   = @lock.new_cond # condition variable tied to the monitor
    @max_size = max_size       # nil means "no limit"
  end

  # Runs the block in a new thread, waiting for a free slot if the pool is full.
  def go(&block)
    @lock.synchronize do
      @signal.wait_until { available? }
      thread = Thread.new do
        Thread.current.abort_on_exception = true
        begin
          block.call
        ensure
          release Thread.current
        end
      end
      @threads << thread
      thread
    end
  end

  private

  # Frees the thread's slot and wakes up anyone waiting for one.
  def release(thread)
    @lock.synchronize do
      @threads.delete thread
      @signal.broadcast
    end
  end

  def available?
    @max_size.nil? || @threads.length < @max_size
  end
end
Our pool just has one go method. Pass it a block, and it will spawn a thread to do your work (possibly waiting on the pool to have a free slot).
abort_on_exception makes sure that if one of your “goroutines” bursts into flames, you’re told right away (as opposed to the default, where the exception only resurfaces once you join the thread or ask for its value).
The @lock / @signal dance keeps your pool size under control; specifically, the @signal will wake up the threads waiting for a slot in the pool whenever one becomes available.
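To see the slot-limiting in action, here is a minimal sketch. TinyPool is a hypothetical, stripped-down cousin of the pool above (it counts running tasks instead of tracking thread objects), and the stats bookkeeping and the sleep standing in for an API call are assumptions made for the demo, not part of my script.

```ruby
require 'monitor'

# TinyPool is a hypothetical, simplified variant of the pool: it only
# counts running tasks rather than keeping the thread objects around.
class TinyPool
  def initialize(max_size)
    @lock     = Monitor.new
    @signal   = @lock.new_cond
    @max_size = max_size
    @running  = 0
  end

  def go(&block)
    @lock.synchronize do
      # Sleep (releasing the lock) until a slot is free.
      @signal.wait_until { @running < @max_size }
      @running += 1
      Thread.new do
        begin
          block.call
        ensure
          @lock.synchronize do
            @running -= 1
            @signal.broadcast # wake up anyone waiting for a slot
          end
        end
      end
    end
  end
end

pool       = TinyPool.new(3)
stats      = { current: 0, peak: 0 }
stats_lock = Mutex.new

threads = 10.times.map do
  pool.go do
    stats_lock.synchronize do
      stats[:current] += 1
      stats[:peak] = [stats[:peak], stats[:current]].max
    end
    sleep 0.05 # stand-in for a slow API call
    stats_lock.synchronize { stats[:current] -= 1 }
  end
end

threads.each(&:join)
puts "peak concurrency: #{stats[:peak]} (limit was 3)"
```

Ten tasks are submitted, but the recorded peak should never exceed the limit of three, because the fourth call to go sleeps on the condition variable until a running task broadcasts on its way out.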
Threads in Ruby can be used as promises/futures, and the go method returns a thread object (leaky API, but bear with me):
go { :hello }.value
#=> :hello
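The future-like behaviour comes from plain Thread, not from the pool, so it can be sketched with the standard library alone. The squaring work and the sleep are placeholders for real IO.

```ruby
# Each Thread.new starts working immediately; Thread#value blocks until the
# thread finishes and returns the block's result.
futures = [1, 2, 3].map do |n|
  Thread.new do
    sleep 0.1 # stand-in for slow IO
    n * n
  end
end

results = futures.map(&:value)
p results # => [1, 4, 9]

# An exception raised inside the thread is re-raised by #value.
failing = Thread.new do
  Thread.current.report_on_exception = false # keep stderr quiet for the demo
  raise ArgumentError, "boom"
end

error = begin
  failing.value
rescue ArgumentError => e
  e.message
end
p error # => "boom"
```

All three threads sleep at the same time, so collecting the values takes roughly one sleep rather than three.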
The shiny
I almost forgot the syntax sugar that lets you say go in your favourite Ruby REPL:
module Kernel
  # An instance method on Kernel can be called as a bare `go` from anywhere.
  def go(&block)
    ($pool ||= Pool.new).go(&block)
  end
end
Adding the go method directly on Kernel means it’ll be globally available absolutely everywhere in your VM.
Probably not the best idea, but hey, Ruby lets you do it!
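As a rough illustration of what “globally available” means, here is a sketch that defines go as an instance method of the Kernel module. To keep it self-contained it spawns a bare thread instead of going through the pool, and the Report class is a made-up example.

```ruby
module Kernel
  # Simplified for illustration: spawns an unbounded thread rather than
  # going through a pool.
  def go(&block)
    Thread.new(&block)
  end
end

# Hypothetical class, only here to show that `go` works inside any object.
class Report
  def build
    parts = [go { "header" }, go { "body" }]
    parts.map(&:value).join(" + ")
  end
end

top_level  = go { 21 * 2 }.value
from_class = Report.new.build

p top_level  # => 42
p from_class # => "header + body"
```

Because Object includes Kernel, every object picks the method up, which is exactly why this kind of monkey-patching deserves the raised eyebrow.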
Caveat emptor
There are a few major caveats here, so please don’t do the above in production code. Experts have built better libraries for this; what you see here is for educational purposes only. Don’t say you weren’t warned:
- Threaded code is bloody hard to integration test, as is everything with threads and parallelism.
- If you limit the pool size, you’ll get deadlocks rapidly unless you very carefully think about what you’re spawning.
- If you don’t, your machine may or may not catch fire (or in my case, your API token may well get banned!)
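The deadlock caveat is easy to reproduce. The sketch below does not use the pool itself: a one-slot SizedQueue stands in for a pool limited to a single thread, and start_task is a hypothetical helper playing the role of go.

```ruby
require 'thread' # Queue and SizedQueue; already built in on modern Rubies

# One-slot stand-in for a size-limited pool: SizedQueue#push blocks while the
# queue is full, so each token in it represents an occupied slot.
slots = SizedQueue.new(1)

# Hypothetical helper playing the role of `go`.
start_task = lambda do |&work|
  slots.push(:token) # wait for a free slot
  Thread.new do
    begin
      work.call
    ensure
      slots.pop # give the slot back
    end
  end
end

outer = start_task.call do
  # The outer task still holds the only slot, so this nested task can never
  # acquire one: the outer task ends up waiting on itself.
  inner = start_task.call { :inner_done }
  inner.value
end

# Thread#join(timeout) returns nil if the thread has not finished in time.
finished = outer.join(0.5)
puts finished ? "completed" : "stuck: the nested task never got a slot"

outer.kill # clean up the stuck thread
```

Any task that spawns another task and waits for it can hit this as soon as the pool is full, which is why the pool size and the shape of your work need to be thought about together.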
Threads v processes
At HouseTrip, we generally avoid the use of threading for scaling purposes, and follow the Unix Way (and, incidentally, the 12 factor way) of scaling out through the process model.
But cases exist where concurrency (not scaling) is useful. The example I’ve used (calling on a bunch of external resources and processing the results) is a typical one. Another classic, which will sound more familiar to web engineers, is firing database queries “in parallel” before rendering your views.
To elaborate: in this concurrency example you want the “threads” to interact, so they can for instance share/reuse HTTP connections, and then provide an aggregate result.
In a “scaling out with threads” example, the threads also interact (you don’t have a choice: memory and object space are shared), but there it’s actually a problem, as one thread has the ability to corrupt or disrupt all the others.
Tools of the thread
Here are some pointers to get you started, should you want to leverage concurrency in your Ruby VM:
- Ilya Grigorik has built agent, a much more elaborate (and tested) version of the above.
- Another classic library, offering the Actor model in a more traditional OO fashion, is Celluloid.
- For a different approach to the same problem, check out async/await in .NET 4.5. Always good to see how the other side does it.
That’s all, go have fun with threads now!