Solving a Hard Problem for Ruby with Go
About a year ago, I was tasked with solving a hard problem (a tricky resource reservation problem with arbitrary quantities and spans). As with all problems, there were constraints: in this case, use Ruby, use MySQL, respond in under 100ms, and handle spiky traffic. This was a computationally intensive service for which no regular caching was possible, so the only plausible solution (after eliminating many options) was adding a partially precomputed table to the database that could shortcut some of the calculation (ok, so it’s like a cache, but an incomplete one).
In this way, Ruby could amortize the calculation (a little bit on update, a little bit on read) and also employ some pretty crazy SQL to help speed it all along.
Special Note: It may sound like an easy problem, but once you work through all the cases you have to deal with, you’ll realize it is in fact a hard problem for Ruby to solve under the given constraints.
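To give a feel for why the raw check is so expensive, here is a minimal sketch of one common way to frame this kind of availability question: a peak-usage (max-overlap) query over reservations, answered with a sweep line. The types, field names, and the sweep-line framing are my own assumptions for illustration; the post doesn’t show the real data model.

```go
package main

import (
	"fmt"
	"sort"
)

// Reservation holds a quantity reserved over a half-open span [Start, End).
// Hypothetical model: the actual service's schema is not shown in the post.
type Reservation struct {
	Start, End int64 // e.g. Unix timestamps
	Qty        int
}

// maxUsage returns the peak quantity in use within [start, end),
// by sweeping over reservation boundaries in time order.
func maxUsage(rs []Reservation, start, end int64) int {
	type event struct {
		at    int64
		delta int
	}
	var events []event
	for _, r := range rs {
		if r.End <= start || r.Start >= end {
			continue // no overlap with the queried span
		}
		events = append(events, event{r.Start, r.Qty}, event{r.End, -r.Qty})
	}
	sort.Slice(events, func(i, j int) bool {
		if events[i].at == events[j].at {
			// With half-open spans, a release at time t frees capacity
			// before an acquisition at the same t claims it.
			return events[i].delta < events[j].delta
		}
		return events[i].at < events[j].at
	})
	cur, peak := 0, 0
	for _, e := range events {
		cur += e.delta
		if cur > peak {
			peak = cur
		}
	}
	return peak
}

func main() {
	rs := []Reservation{{0, 10, 2}, {5, 15, 3}, {12, 20, 1}}
	fmt.Println(maxUsage(rs, 0, 20)) // prints 5 (spans [5,10) carry 2+3)
}
```

Doing this naively means touching every overlapping row per request, which is where the precomputed table came in: it shortcut most of that scan, leaving Ruby only the collation.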
At first the solution was a slam dunk…response times were around 40ms (about 100ms at the glass), it handled the load, and it was merrily working away. Without the precomputed table, the computation was coming in at 10 seconds (not milli or nano…seconds!), so this was a big win.
But I didn’t like it.
For one, it added a huge amount of complexity to the system. Although the complexity was hidden by service objects, I still knew it was there and didn’t like it. It wasn’t a real part of the system; it was a derivative of a real object, and that bothered me.
But it was working and there were other pressing issues, so we moved on. Here’s what an Apache Benchmark of that service looked like:
Concurrency Level:      1
Time taken for tests:   46.114 seconds
Complete requests:      1000
Requests per second:    21.69 [#/sec] (mean)
Time per request:       46.114 [ms] (mean)

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    33   46  19.9     38     117
Waiting:       32   46  19.9     38     117
Total:         33   46  19.9     38     117

Percentage of the requests served within a certain time (ms)
  50%     38
  66%     39
  75%     41
  80%     42
  90%     92
  95%     95
  98%     98
  99%    100
 100%    117 (longest request)
Many months later (and millions of rows later), the service started to have some issues. What was once a speedy service was now slowing down a bit, and that was not acceptable. Even with all the precomputation, leaving Ruby only the minor job of collating results (the part which was not cacheable), it was just too slow.
Also, we began to see very rare issues with the updating of the precomputed table (SQL upserts via ON DUPLICATE KEY UPDATE). We could always recalculate the table, but it was obnoxious to have to do so.
It just so happened that we were getting close to a “Free Week”, a chance to work on any crazy idea that might make our apps better for the people that use them. Some really cool stuff has come out of these times…it’s long enough that you can really dig into a problem…I love it. I work for a great company, with great people, making really cool stuff.
So I decided to go outside the box and take a look at Go; after all, I had read a good bit about it and knew it was a good candidate for a service.
Day 1 of Go
What I didn’t realize was how easy it was going to be to get productive. On the first day, we had a working service that could replace the Ruby one. On the second day, we had ironed out the bugs. On the third day, we did some performance optimization.
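Part of why day one went so smoothly is how little ceremony a Go HTTP service needs. Here is a minimal sketch of the shape of such a service; the /availability endpoint, the response fields, and the self-request in main are all hypothetical, since the post doesn’t describe the real API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"net/http"
)

// availabilityBody builds the JSON answer. The response shape is
// hypothetical; the real service's API isn't described in the post.
func availabilityBody() ([]byte, error) {
	return json.Marshal(map[string]bool{"available": true})
}

func availabilityHandler(w http.ResponseWriter, r *http.Request) {
	// In the real service, the reservation calculation would run here,
	// straight from the database rows, with no precomputed table.
	body, err := availabilityBody()
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(body)
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/availability", availabilityHandler)

	// Bind an ephemeral port and make one request against ourselves so
	// the sketch is self-contained; a real deployment would just Serve.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	go http.Serve(ln, mux)

	resp, err := http.Get("http://" + ln.Addr().String() + "/availability")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	var out map[string]bool
	json.NewDecoder(resp.Body).Decode(&out)
	fmt.Println(resp.StatusCode, out["available"]) // 200 true
}
```

The standard library alone covers routing, JSON, and serving, which is a big part of why a working replacement took a day rather than a week.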
Honestly, it was REALLY fun. The compiler makes a great coach, the type system wasn’t annoying, and everything was just so FAST. There are minor quibbles regarding some design decisions, but the simplicity of the whole language is a huge win for getting up to speed fast.
Go is Really Fast
Having primarily used interpreted languages for the last 5 years, I think I had forgotten what “fast” actually meant. This “grading on a curve” had shifted my baseline to 150ms as “good”…so when the Go service clocked in at 4ms…I was euphoric, quite frankly. Even under the totally unrealistic conditions generated by “ab”, the service was rock solid and fast…no huge GC cliffs or other service-time variations. Here you can see it under the same Apache Benchmark conditions as those above:
Concurrency Level:      1
Time taken for tests:   3.884 seconds
Complete requests:      1000
Requests per second:    257.49 [#/sec] (mean)
Time per request:       3.884 [ms] (mean)

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       1
Processing:     3    4   2.7      4      87
Waiting:        3    4   2.7      3      87
Total:          3    4   2.7      4      87

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      4
  75%      4
  80%      4
  90%      4
  95%      5
  98%      5
  99%      6
 100%     87 (longest request)
In addition, I found running it on our Ubuntu box to be another 20% faster or so. In other words…FAST!
In Ruby, it was 150ms with millions of precomputed database rows to help. Those rows were (rarely) error prone and a cognitive burden on the codebase.
In Go, it is 4ms with no caching whatsoever. The Ruby code is so much cleaner not having to worry about derivative objects built purely to work around performance limitations.
Oh, and we compared hundreds of thousands of combinations against both services and found it was giving identical (correct) answers.
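A harness for that kind of cross-check mostly needs a way to decide that two JSON responses are the same answer, regardless of key order or whitespace. Here is a small sketch of that piece; the response shapes in the example are made up, and the post doesn’t describe the actual comparison tool.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// sameAnswer reports whether two service responses are semantically
// equal JSON, ignoring key order and formatting differences.
func sameAnswer(a, b []byte) (bool, error) {
	var va, vb any
	if err := json.Unmarshal(a, &va); err != nil {
		return false, err
	}
	if err := json.Unmarshal(b, &vb); err != nil {
		return false, err
	}
	// Both sides decode to the same generic types (maps, slices,
	// float64, string, bool), so a deep comparison does the rest.
	return reflect.DeepEqual(va, vb), nil
}

func main() {
	ruby := []byte(`{"available": true, "qty": 3}`)
	golang := []byte(`{"qty":3,"available":true}`)
	ok, _ := sameAnswer(ruby, golang)
	fmt.Println(ok) // prints true: same fields, different key order
}
```

Feed both services the same inputs, run their responses through a check like this, and a disagreement on any of the hundreds of thousands of combinations would surface immediately.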
Hard is Now Easy
So what’s the point? The thing that was “hard” was only hard because of Ruby and its very real performance limitations. By using Go, those limitations vanished and the problem is back to being easy again.
Now, before conclusions are drawn, I have to say that Ruby makes so many problems easy that it’s not leaving my toolbox any time soon. I love Ruby and all the abstractions that Rails provides. But it doesn’t work for computationally intensive services, and I think that is where Go really shines.
This is just the first post on what I hope will be many.