Monday, July 17, 2006

The Adventures (Horrors?) Of Scaling Rails

In my latest meanderings across the Net, I came across this blog by Patrick Lenz, a Ruby on Rails application scaling guru. The blog has a four-article series describing the mind-boggling effort it has taken Patrick and his team to scale the backend of a popular German social networking site.

This site originally had 50,000 lines of PHP code, and it was transformed into 5,000 lines of Rails code (some features were left out). This speaks very well for Rails's boost to developer productivity, but that's not so shocking anymore.

It sounds like the coding was the easy part. It took Patrick and his team months of hard work to find the bottlenecks in, and optimize, the numerous components that make up the application's backend: Lighttpd, proxy server, Rails, Linux, MySQL, memcached. Many of the hidden bottlenecks came from impenetrable issues with Rails dispatchers in Lighttpd, difficulties in estimating the right sizes for thread pools, and poor multi-master replication performance in MySQL.

During this whole time, the site suffered from poor performance.

It doesn't seem like the problems came from Rails per se, but from the unimaginably complex set of auxiliary tools needed to support such a high-volume Rails application.

It must have been a very expensive project, and that's without counting the hidden cost of user dissatisfaction.

I'd bet it would have been much easier to scale this site if it were written in Erlang. The concurrency bottlenecks would have gone away, native compilation with HiPE would have outperformed Ruby, and the number of moving parts would have been much smaller: it would require Yaws for the web server, Mnesia for replicated live session data, and MySQL/Postgres for large-volume data.

If (when?) Mnesia is made to handle very large data volumes better, the need for an external DBMS will vanish, and even a high-volume website could be powered by not much more than Yaws + Mnesia. Even putting aside Erlang's proven scalability and fault tolerance in commercial phone switches, when your application has few moving parts, bottlenecks are easier to identify and fix.

Another nice feature an Erlang backend has is that experimenting with different configurations would require no downtime: Erlang has hot code swapping (you can even hack Yaws while it's running -- try that with Lighttpd), and Mnesia can be reconfigured without taking it offline. This is not surprising considering that Erlang was designed from the ground up for applications that target 99.9999999% (yes, that's nine nines!) availability.
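
To put nine nines in perspective, here is a rough back-of-the-envelope sketch in Python. It is not taken from Patrick's articles; the function name and the 365-day year are assumptions made purely for illustration. It converts a number of nines into the downtime that availability target allows per year.

```python
# Assumption: a 365-day year. "nines" is the count of 9s in the
# availability figure, e.g. 3 nines = 99.9%, 9 nines = 99.9999999%.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def downtime_seconds_per_year(nines: int) -> float:
    """Return the downtime per year (in seconds) allowed by `nines` nines."""
    unavailability = 10.0 ** -nines  # fraction of time the system may be down
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 5, 9):
        print(f"{n} nines: {downtime_seconds_per_year(n):.6f} seconds/year")
```

Under these assumptions, nine nines leaves room for only about 0.03 seconds of downtime per year, which is why restarting a server to try out a new configuration is not an option at that level.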

The main downside with Erlang web development is that Erlang doesn't have as many libraries as Rails. However, when you consider the tremendous effort saved due to much better scaling, the productivity equation starts looking different.

It's strange that you don't hear of people using Erlang as their backend language for real-world websites. Is it because of a language barrier due to Erlang's functional nature? Poor PR? I can attest that Erlang web development is quite fun and productive, and I think many people would agree with me that saving hundreds of thousands of dollars on optimization efforts doesn't suck at all.

If more people used Erlang, web-centric libraries would be more abundant as well. This will happen. It's only a matter of time. Erlang is too good to remain unnoticed by the web developer community.


Yong Bakos said...

Yarvis, you are one bright dude. This reminds me of back in the day when Greenspun was touting AOLServer/TCL. The development community at large didn't take it "seriously," and TCL, though beautiful, was just too foreign for all the web-script-kiddie-turned-programmer crowd.

One barrier to adoption is always: does my cheap host provider support it?

For most common platforms (.NET, Java, PHP, RoR) this is the case.

Yaws hosting? None that I could find in 10s.

Looking forward to trying out some of your ideas.

Yariv said...

Thanks, Yong. I've been running a prototype Erlang/Yaws app, as well as this blog, on a virtual server (VPS) that costs $20 a month. You might even be able to find a cheaper deal. Yes, it's not as easy to set up as a shared hosting plan, but if you are serious about Erlang then setting up your server is a relatively minor obstacle IMO.

ryan rawson said...

Mnesia is not a bulletproof answer - the Erlang list recently talked about the problems with Mnesia, notably recovery in the face of network disconnections and partitions. This makes recovery difficult and not necessarily straightforward.

In any case, it isn't technically Rails that failed to scale, it is the overall backend database scheme. Of course you can argue that Rails is highly dependent on an RDBMS, but the scaling problem doesn't belong with Rails.

Yariv said...

Hi Ryan. Thanks for the feedback. I followed the Mnesia problems thread, and I am aware that Mnesia has some limitations, about some of which I have personally pestered the list a few times. In the articles to which I linked, it did sound like many of the problems with Rails came from the middle layer. I can't say, however, that such issues would plague every, or even most, Rails applications.

Another Ryan said...

Okay, I am interested. You wouldn't happen to have some skeleton code for a web app lying around? I am very much at the monkey see, monkey do stage of Erlang, Yaws, Mnesia exploration ;-)

Yariv said...

Ryan -- you've picked the absolute best strategy for exploring Erlang: trying it out! Best of luck.

Toby said...

Ryan: I have just started exploring Erlang myself, and recently completed my first "non-trivial" (well, still trivial:) application, a web-based Tic-Tac-Toe game. It's not properly packaged yet, but works with the standard inets web server. (I may test it with Yaws later.)

The source code may interest you as a simple skeleton, although as my first program it undoubtedly could be improved (and it needs to be properly packaged as an application). Source is in Subversion @ and it should be fairly simple to get running (instructions included in README).