Yariv's Blog: ErlyWeb vs. Ruby on Rails EC2 Benchmarking Strangeness

Tuesday, December 11, 2007

I've been running some more benchmarks for the ErlyWeb vs. Ruby on Rails EC2 Performance Showdown. The results I've gotten are very strange -- so strange, in fact, that I'm not going to "officially" publish anything before I run all the tests one more time.

What's strange about the results? I'll give you a quick glimpse, but please keep in mind that none of these observations are the official benchmark results, and they may all be invalidated in the next run.

- Recompiling the Erlang files *without* HiPE seemed to *increase* max performance (in requests/sec) by ~10%.
- Running Yaws with kernel poll enabled ("+K true" passed to erl) *decreased* the max ErlyWeb performance to 410 requests/sec (~41% decrease).
- No matter how many Mongrels I started (I tried 1, 3, 5, and 10) behind Pen (running on the same server), the max performance of Rails was ~18% lower than when it was running on a single Mongrel without Pen (111 requests/sec).

This across-the-board strangeness calls for another run. I'll try to have the results sometime in the next few days. In the meantime, if anybody else wants to take a stab at these benchmarks, it would be interesting to see how our results compare.

By the way, although the effects of the changes I made differed from my expectations, when I reran the original tests the results were similar to the ones I got last time.

13 comments:

Andy said...: - You might want to check with the Yaws and Erlang mailing lists, but I seem to recall (vaguely) reading somewhere that HiPE and epoll actually reduce Yaws's performance.

- The Mongrel cluster results may not be that surprising. The way your test is set up, there's no querying a DB or serving content from disk. So basically there's no blocking of the Ruby runtime and hence no opportunity to improve performance through concurrent serving. As a result, using a single Mongrel provides the highest performance. By introducing Pen and multiple Mongrels, you're introducing extra IPC costs (between Pen and Mongrel) and context-switching costs (between different Mongrels), not to mention extra memory usage, which leads to lower performance.

What'd be interesting is to add benchmarks for:

- Plain Yaws dynamic page. This'd show the overhead of ErlyWeb

- PHP and/or some PHP framework. Very popular tech, has a reputation for speed. How'd ErlyWeb stack up?

- JSP and/or Spring. Same reason as PHP; December 12, 2007 12:11 AM
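Andy's IPC explanation can be put into rough numbers. The sketch below assumes a closed-loop test with a single request in flight at a time (an assumption; the post doesn't say how many concurrent requests the load generator used), so throughput is simply the inverse of per-request latency. Under that assumption, the post's single-Mongrel figure (111 requests/sec) and the ~18% drop behind Pen imply roughly 2 ms of extra cost per request for the proxy hop:

```python
# Back-of-the-envelope model: with exactly one request in flight,
# requests/sec = 1000 / (milliseconds per request).


def per_request_ms(requests_per_sec: float) -> float:
    """Milliseconds spent on each request when requests are handled one at a time."""
    return 1000.0 / requests_per_sec


def implied_overhead_ms(direct_rps: float, proxied_rps: float) -> float:
    """Extra milliseconds per request attributable to the proxy hop (IPC, context switches)."""
    return per_request_ms(proxied_rps) - per_request_ms(direct_rps)


if __name__ == "__main__":
    direct = 111.0                 # single Mongrel without Pen (figure from the post)
    proxied = direct * (1 - 0.18)  # ~18% lower behind Pen (figure from the post)
    print(f"behind Pen: {proxied:.0f} req/s, "
          f"implied overhead: {implied_overhead_ms(direct, proxied):.1f} ms/request")
    # prints "behind Pen: 91 req/s, implied overhead: 2.0 ms/request"
```

If the load generator actually kept several requests in flight, this model no longer applies, since extra Mongrels could then overlap their work.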
Jason Watkins said...: Sounds like pen is treating them as hot spares, not an active pool. I recall there's something tricky with pen config files that can lead to this unexpectedly. In any case, I don't know anyone using pen in production... typically it's nginx.

For testing, I'd suggest not bothering to set up a load balancer: just configure your load generation to hit a pool of mongrels evenly at each individual port.; December 12, 2007 10:55 AM
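Jason's suggestion of skipping the load balancer amounts to having the load generator spread requests across the Mongrel ports itself. A minimal sketch of that target list follows; the ports mirror the ones in Yariv's Pen command, while the host and path are placeholders, and the resulting URLs would still have to be fed to whatever load tool is in use:

```python
from itertools import cycle, islice


def round_robin_urls(host, ports, path, n):
    """Return n request URLs that visit each backend port in turn.

    Hitting the Mongrels directly this way removes the proxy hop entirely,
    and each backend receives an equal share of the requests.
    """
    targets = cycle([f"http://{host}:{port}{path}" for port in ports])
    return list(islice(targets, n))


if __name__ == "__main__":
    # Ports as in the Pen command; "/" is a placeholder path.
    for url in round_robin_urls("localhost", [8000, 8001, 8002], "/", 6):
        print(url)
```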
Yariv said...: My Pen setup was very simple. All I did was start it using

'pen 3000 localhost:8000 localhost:8001 localhost:8002...'

Maybe Pen was misbehaving somehow. I can try nginx. I think that not using a load balancer would make it an unrealistic scenario because production Rails apps do use them.

I may try benchmarking other languages but first I want to test using a database (where the songs will be stored).; December 12, 2007 11:25 AM
jherber said...: you may want to try the event driven mongrel patch from the swiftiply guys:

http://brainspl.at/articles/2007/05/12/event-driven-mongrel-and-swiftiply-proxy; December 12, 2007 1:05 PM
Yariv said...: No luck getting Mongrel cluster to perform much better switching Pen for Pound. Both 5 and 10 instance clusters peak at 103 requests/sec. Pound is only marginally faster than Pen. I think this provides a strong indication that the IPC cost between the load balancer and the Mongrels outweighs the extra concurrency advantage for this test.; December 12, 2007 4:08 PM
Pichi said...: I have seen some performance degradation with HiPE too. HiPE increases performance mainly when one has a tight loop over one function doing algebraic computation. HiPE reduces the gain, or decreases performance generally, if one has many different function calls. I suspect that module-to-module calls have a bigger effect, especially calls from HiPE to non-HiPE modules and vice versa. I suspect heavy concurrent load too. It may be time to open a thread on erlang-questions with feedback for the HiPE team.; December 13, 2007 12:48 AM
Alex Payne said...: I'd second the suggestion of switching to Nginx for the Rails benchmarks. Particularly, check out the configuration at http://brainspl.at/articles/2007/11/09/a-fair-proxy-balancer-for-nginx-and-mongrel.

Most load balancers (even expensive commercial ones) manage to send more than one request at a time to Mongrel, which rapidly degrades its performance. Rails, of course, locks around each request, forcing Mongrel to queue pending requests.

I wouldn't expect to see a dramatic performance improvement with "fair" load balancing, but your results should be more consistent between runs of your benchmark.; December 13, 2007 4:53 AM
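The "fair" policy Alex describes can be sketched as a balancer that never hands a Mongrel a second request while it is still busy, and instead holds the request at the balancer until some Mongrel frees up. This is a hedged illustration of the idea, not the code of the nginx module in the linked article; backend names are placeholders:

```python
from collections import deque


class OneAtATimeBalancer:
    """Give each backend at most one request; queue the rest at the balancer.

    A sketch of the "fair" idea, not the nginx module's actual algorithm.
    """

    def __init__(self, backends):
        self.busy = {backend: False for backend in backends}
        self.waiting = deque()  # requests held until some backend is idle

    def submit(self, request):
        """Return the idle backend chosen for request, or None if it was queued."""
        for backend, is_busy in self.busy.items():
            if not is_busy:
                self.busy[backend] = True
                return backend
        self.waiting.append(request)
        return None

    def finish(self, backend):
        """Called when backend completes a request.

        Returns the next queued request, now dispatched to that backend,
        or None if the queue was empty and the backend went idle.
        """
        if self.waiting:
            return self.waiting.popleft()  # backend stays busy with the next request
        self.busy[backend] = False
        return None
```

Because Rails serializes requests inside each Mongrel anyway, keeping the queue at the balancer stops one slow request from stalling others that an idle Mongrel could have served.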
Tom Bagby said...: Are you testing with the -c option to ab? Having multiple processes doesn't do any good if there are no concurrent requests. Your numbers for rails requests per second are so much lower than normal benchmarks that it's really odd.

I did a quick set up with nginx and 4 mongrels on my macbook.

With a single Mongrel, ab -c 1: ~100 r/sec
With 4 Mongrels, ab -c 1: ~85 r/sec
With 4 Mongrels, ab -c 10: ~200 r/sec

I don't have time to muck with this (I need to go to bed), but it should perform quite a bit better with more Mongrels too. The process model of concurrency is all-important for Rails applications.

I love Erlang, and prefer to use it these days, but Rails gets treated unfairly a lot. It is a memory pig, but it is in some ways similar to the Erlang model of concurrency: run lots of independent processes that share nothing to service requests. For basic CRUD stuff, you really can scale indefinitely by throwing enough hardware at the problem.; December 13, 2007 6:26 PM
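Tom's question about -c comes down to a simple upper bound. The sketch below assumes single-threaded workers (one request at a time per Mongrel, as Alex notes above) and CPU-bound requests; the core count and service time are made-up parameters, not measurements from either machine:

```python
def max_throughput(concurrency: int, workers: int, cores: int, service_ms: float) -> float:
    """Rough ceiling on requests/sec for a pool of single-threaded workers.

    concurrency: requests the load generator keeps in flight (ab -c)
    workers:     number of Mongrel processes
    cores:       CPUs available for CPU-bound request handling
    service_ms:  time one request takes on one worker
    """
    # Only this many requests can actually make progress at the same time.
    parallel = min(concurrency, workers, cores)
    return parallel * 1000.0 / service_ms


if __name__ == "__main__":
    # Hypothetical 2-core machine, 10 ms per request.
    print(max_throughput(1, 1, 2, 10.0))   # 100.0: one Mongrel
    print(max_throughput(1, 4, 2, 10.0))   # 100.0: extra Mongrels sit idle at -c 1
    print(max_throughput(10, 4, 2, 10.0))  # 200.0: concurrency lets both cores work
```

The model leaves out the proxy hop; adding a per-request overhead term would push the -c 1 cluster figure below the single-Mongrel one, in line with Tom's 85 vs. 100 r/sec.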
Yariv said...: I'll try reconfiguring my Rails setup with Nginx. We'll see how it goes.

By the way, the Erlang concurrency model is quite different from Rails. Erlang concurrency is not "shared nothing." Erlang lets you share memory between processes (ets) and also do it in a transactionally safe manner (Mnesia). Plus, Erlang has support for DB connection pooling, whereas Rails dedicates a DB connection for every process, which can lead to bottlenecks when you have too many processes.; December 13, 2007 8:03 PM
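The pooling distinction Yariv draws can be illustrated with a fixed set of connections shared by many concurrent workers, rather than one dedicated connection per worker. The sketch uses Python threads as stand-ins for Erlang processes and a placeholder `make_conn` factory instead of a real database driver; it is not how the ErlyWeb drivers are implemented:

```python
import queue
import threading
from contextlib import contextmanager


class ConnectionPool:
    """A fixed number of connections shared by any number of workers.

    make_conn is a placeholder factory; a real pool would open DB connections.
    """

    def __init__(self, make_conn, size):
        self._idle = queue.Queue()  # thread-safe FIFO of idle connections
        for _ in range(size):
            self._idle.put(make_conn())

    def acquire(self, timeout=None):
        """Check out a connection, blocking while all of them are in use."""
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        """Return a connection so another worker can use it."""
        self._idle.put(conn)

    def available(self):
        """Approximate number of idle connections."""
        return self._idle.qsize()

    @contextmanager
    def connection(self, timeout=None):
        """Borrow a connection for the duration of a with-block."""
        conn = self.acquire(timeout)
        try:
            yield conn
        finally:
            self.release(conn)


if __name__ == "__main__":
    pool = ConnectionPool(make_conn=object, size=2)  # object() stands in for a connection

    def handle_request(i):
        with pool.connection() as conn:
            pass  # a real handler would run its query on conn here

    workers = [threading.Thread(target=handle_request, args=(i,)) for i in range(8)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print("idle connections:", pool.available())  # 2 once every handler is done
```

Here eight handlers share two connections and block only while both are checked out, so the pool size can be tuned independently of the number of workers.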
Tom Bagby said...: Yeah, I understand that it is a crude comparison. The basic point is that Rails handles connections per process, not per thread. Erlang processes in general are more comparable to system processes than to threads. Ruby itself does have a very different threading model, which I'm leaving out of the discussion because Rails doesn't use it/isn't threadsafe.

Mnesia/ets does have an analog in the Rails world, which is memcached. It's not written in Ruby itself, but it's part of the ecosystem and a standard thing to use. Again, it runs in its own process, but that only supports my claim that the process-based world of Rails apps is more like Erlang setups than, say, a Java-based system using threading.

memcached itself does not do transactions, but Ruby is a nice flexible language. Twitter created a library called starling that does transactional messaging through memcached, the initial version of which was ~ 400 lines. Twitter is a good example of scaling to do things that are "impossible" in Rails/Ruby. They actually experimented with erlang, rejected it, and managed to solve all their problems in Ruby.

I do wish that there was a standard way of doing DB connection pooling in Rails out of the box. However, modifying ActiveRecord to support pooling or multiple db schemes is not difficult and there are many plugins that implement different forms of it.

Anyway, point being, I think Erlang can definitely be characterized as a "shared nothing" system. Transactional memory like Mnesia is just an example of how you efficiently communicate data between processes in a messaging/copying system.

I normally spend my time evangelizing Erlang, so it's funny to find myself defending Rails for a change. I do have a lot of experience using Rails in a production environment, and if half of what was said about Rails and its ability to scale were true, our servers would have collapsed a year ago. I also get to see how much we spend on our production slices, which is why my personal projects are written in Erlang, heh.; December 14, 2007 4:52 AM
Tom Bagby said...: Ah, one last point about the Erlang vs. Rails process comparison. Rails sites do use a very similar supervisor/worker process pattern.

For instance, one advantage of having many worker processes is that you never have to hunt down slow memory leaks: set a maximum process size and reboot worker processes when they hit it. Or just kill one every couple of hours in round-robin fashion.

One of your mongrel instances was using a native library that crashed the whole damn thing? Log it and restart the mongrel.

Again, it is a crude comparison, but the approach does have obvious similarities to the Erlang fault tolerance approach using processes.; December 14, 2007 4:59 AM
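The recycling policy Tom describes (restart any worker that grows past a memory ceiling or crashes, and otherwise rotate restarts round-robin) could look something like this. The worker ids, memory figures, and limit are hypothetical, and actually measuring process size or killing processes is left to whatever monitoring tool is in use:

```python
from itertools import cycle, islice


def plan_restarts(rss_mb, max_rss_mb, crashed=()):
    """Return the ids of workers a supervisor should restart now.

    rss_mb maps worker id -> resident memory in MB; any worker above
    max_rss_mb, or listed in crashed, is scheduled for a restart.
    """
    over_limit = {worker for worker, rss in rss_mb.items() if rss > max_rss_mb}
    return sorted(over_limit | set(crashed))


def rolling_restart_order(workers):
    """Endless round-robin order for periodic restarts (e.g. one every couple of hours)."""
    return cycle(sorted(workers))


if __name__ == "__main__":
    usage = {"mongrel-8000": 120, "mongrel-8001": 310, "mongrel-8002": 95}
    print(plan_restarts(usage, max_rss_mb=250))  # ['mongrel-8001']
    print(plan_restarts(usage, max_rss_mb=250, crashed=["mongrel-8002"]))
    order = rolling_restart_order(usage)
    print(list(islice(order, 4)))  # 8000, 8001, 8002, then back to 8000
```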
Yariv said...: As far as I know, ActiveRecord's connection pooling lets a single Ruby process connect to multiple DBs, but it doesn't let multiple Ruby processes share a pool of DB connections. The ErlyWeb DB drivers, OTOH, implement real connection pooling. The distinction is actually pretty important. If you use caching, most requests don't require querying the DB; therefore, by dedicating a DB connection for each process you may be creating an artificial bottleneck. (In fact, NYTimes created DBSlayer precisely because they needed to scale their DB connection pools independently of web server processes.)

The fact that ets copies data on inserts and lookups is an implementation detail. As far as the programmer is concerned, ets is shared memory (albeit with more consistency guarantees than traditional shared memory).

I haven't heard about the transactional library on top of Memcached. It sounds cool. However, I have an easier time trusting something that has been used in production telecom systems for many years.

The Twitter folks probably picked Ruby over Erlang because they knew Ruby better and because Ruby is farther along than Erlang is in the realm of web development (although the gap is closing rapidly :) ). That's a perfectly valid decision.

Ultimately, what Ruby (and all other imperative languages, actually) doesn't give you is the sense of freedom to spawn processes whenever you want and use concurrent programming without worries. This opens up the door to tinkering and experimentation, which always leads to interesting applications. That's the main reason I like Erlang -- it frees my imagination in ways that other languages can't.; December 14, 2007 1:13 PM
José Manuel Peña said...: Why don't you just use lighttpd instead of Mongrel?; January 14, 2008 7:57 AM