Sunday, August 27, 2006

To Web Developers: Think Outside The Thread

A few people have asked me this very good question (I'm paraphrasing): "If I'm just trying to build a webapp, and I don't need telecom-grade scalability, why should I use Erlang? Yes, Erlang may be the best language for concurrent programming, but I will never have to use Erlang's concurrency features directly, just as I don't use any threading libraries when I'm building a webapp with other languages."



Here's my answer: you can certainly build a webapp without using concurrency features directly. Most web developers don't. Just keep in mind that most framework designers use languages that support concurrency so poorly that they have resorted to fooling you into thinking that the only webapps you can realistically build must follow the good ol' request-response, single-threaded, CRUD-based paradigm.



This is clearly a fallacy.



Many high-quality, innovative webapps don't follow this paradigm.



The Google search engine doesn't follow this paradigm. Neither does Meebo.



Many of the great webapps of tomorrow won't follow this paradigm.



Why should your webapp follow this paradigm?



The paradigm is shifting to a large degree because of one main innovation, which somebody has named "Comet" (it's a silly name if you ask me, but it has stuck). Comet boils down to having the browser keep an open XMLHttpRequest connection to the server, waiting for the server to send it messages. This opens the door to full bidirectional communication between the browser and the server (this is essentially what Meebo does).
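To make the idea concrete, here is a minimal sketch of the server side of one such open connection, assuming one Erlang process per waiting browser. The module name, function names, and the `{push, Msg}` message shape are all hypothetical, and the HTTP layer is left out entirely; a real app would sit behind a web server.

```erlang
-module(comet_sketch).
-export([wait_for_message/1, demo/0]).

%% One process per open browser connection (an assumption of this sketch).
%% It blocks until some other process pushes a message to it, or until the
%% poll times out and the browser would have to reconnect.
wait_for_message(TimeoutMs) ->
    receive
        {push, Msg} -> {ok, Msg}
    after TimeoutMs ->
        timeout
    end.

%% Simulates a single connection: spawn the waiting process, push one
%% message to it, and collect what it would send back to the browser.
demo() ->
    Parent = self(),
    Conn = spawn(fun() ->
                     Parent ! {response, self(), wait_for_message(5000)}
                 end),
    Conn ! {push, <<"hello">>},
    receive
        {response, Conn, Reply} -> Reply
    after 6000 ->
        no_response
    end.
```

The waiting process costs nothing while it is blocked in `receive`, which is why holding thousands of these open at once is plausible.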



If you want to build a Comet app with today's Java, Ruby or Python frameworks -- good luck. Not only will the message-passing -- not to mention clustering -- logic be painful to code (especially if you've tasted the power of Erlang :) ), your servers will also croak when your app reaches a few thousand (or fewer) simultaneous users.



Partly for these reasons, nobody uses Ruby or Python to build commercial phone switches (not that I know of, anyway :) ). However, Ericsson builds them quite successfully with Erlang. Erlang powers highly scalable, fault-tolerant phone switches with nine nines availability. Building scalable clusters is what Erlang is made for, and I dare say Erlang makes such programming tasks (relatively) easy.



If you want to build an app that uses Comet, you will need concurrency and scalability, so use Erlang.



However, if your app doesn't use Comet, should you still care about Erlang's unmatched capabilities for concurrency, clustering and message passing?



Yes.



Another good reason to use Erlang is that your web application may benefit from -- or even require, as in Google's case -- distributing a calculation over multiple processors or even multiple nodes in a cluster prior to sending a response back to the browser.



Erlang is designed for distributed programming. In Erlang, sending a message to a remote node is as easy as sending it to another process in the same box. As Joe Armstrong has shown, writing a parallel version of lists:map is almost laughably easy in Erlang. This is all it takes:




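%% Apply F to every element of L, one process per element.
pmap(F, L) ->
    S = self(),
    Pids = lists:map(fun(I) ->
                         spawn(fun() -> do_f(S, F, I) end)
                     end, L),
    gather(Pids).

%% Collect the results in the same order the workers were spawned.
gather([H|T]) ->
    receive
        {H, Ret} -> [Ret|gather(T)]
    end;
gather([]) ->
    [].

%% Run F(I) in the worker and send the result (or the caught error) back.
do_f(Parent, F, I) ->
    Parent ! {self(), (catch F(I))}.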
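
As a sketch of how the same idea might stretch across a cluster, the variant below hands each element to a node picked round-robin from a list. This is not Joe Armstrong's code: the module name and the `Nodes` argument are assumptions for illustration, and it only works if every node in the list is connected and has the calling module loaded, since the fun is shipped to the remote node.

```erlang
-module(pmap_nodes).
-export([pmap/3, demo/0]).

%% Like pmap/2, but each worker is spawned on a node chosen round-robin
%% from Nodes. erlang:spawn/2 takes the target node as its first argument.
pmap(F, L, Nodes) when Nodes =/= [] ->
    Parent = self(),
    NumNodes = length(Nodes),
    Indexed = lists:zip(lists:seq(0, length(L) - 1), L),
    Pids = [spawn(lists:nth((Idx rem NumNodes) + 1, Nodes),
                  fun() -> Parent ! {self(), (catch F(Item))} end)
            || {Idx, Item} <- Indexed],
    %% Receive in spawn order so results line up with the input list.
    [receive {Pid, Ret} -> Ret end || Pid <- Pids].

%% Uses only the local node, so it runs without any cluster set up.
demo() ->
    pmap(fun(X) -> X * X end, [1, 2, 3, 4], [node()]).
```

The send and receive code is identical whether the worker lives on this box or another one; only the `spawn` call names a node.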


Distributed computation will grow in importance as server manufacturers race to add more cores to their boxes. (The next release of Sun Niagara is expected to run 64 hardware threads on a single chip, for instance.) Even if you don't need distributed computation right now, you'll benefit from adding Erlang to your toolbox. Multi-core computation is where things are going, and knowing Erlang will help you stay on top of the game.



By the way, I'm not suggesting that Erlang has a monopoly on distributed programming -- Erlang is just years ahead of any other language in this area. Erlang has 20 years of development behind it for this specific purpose, with a long track record of success in large-scale production systems. Sure, you always have the option of using, say, Starfish for Ruby. You'd just have to cross your fingers and hope your competition doesn't use Erlang :)



To summarize, if you are 100% confident that you will never attempt to push the envelope of what your webapp's backend can do, you probably won't need Erlang's concurrency features. IMO, you'll have fun by using Erlang, but you certainly don't need it. However, if you start venturing into areas that are beyond the single-threaded CRUD view of the world, Erlang's capabilities will help you greatly.



So why use Erlang?



Think outside the thread.

18 comments:

thomas lackner said...

I want to love Erlang, but it seems to me that most Erlang advocates keep talking about phone switches and concurrency, and not the things that are important in 99.99% of web apps: easy-to-use MVC toolkits (like Rails), and great database connectivity. Mnesia is difficult for someone from an SQL background to understand, and the fact that it's only questionably able to handle large databases gives many people pause. The awkward record syntax doesn't help either - most web apps are just manipulating database records after all.

I'm not trying to be negative or a "hater," but I am trying to figure out how I can get my team of jr. programmers to build web apps better and faster - help me understand how Erlang fills that role.

Yariv said...

Thomas -- this is a great point, and it's just the problem I'm trying to tackle now. Keep reading this blog and you'll hear good news -- you have my word :)

Reinier Zwitserloot said...

All good points. I'm not sold on the idea that you have to use a programming language with this kind of purpose in its genes though. I'm currently hacking together a java webserver that does away with the entire stack of traditional tools, including servlets, hibernate, and many template engines, and instead works off of a simple construct:

All actual web app code runs in one thread - for ALL requests. Reading in template files and making db queries get offloaded to separate threads (there's no practical alternative to this), but all that code is provided for you. Any kind of synchronizing or use of ThreadLocals is pointless and not needed.

If your CPU has enough cores to idle along (as a rule 2 cores have enough jobs between db queries and the web app itself not to need this), run multiple instances, and use a load balancer, on a single computer if you have to.

Makes for easier programming. More importantly, makes for a different look on web development.

COMET is considered somewhat difficult and lots of folks are making 'toolkits' and the like, but if your webserver doesn't thread anyway, when it's all a big pile of reactions to events, COMET is the most natural thing in the world.

I'm coding the db stuff in but I've already written a bunch of apps that don't use a dbase, including a calculator with a tickertape that is 'shared' amongst all users of the calculator webapp, and the tape itself updates instantly for all users. Very simple to do. Would have been annoying and a resource hog on traditional threaded systems.

Glad to hear you're addressing Tom's comments because I'm already feeling very constrained by the lack of db support. Unfortunately hibernate and many other object mapping tools are spectacularly unsuited to this kind of thing.

I wonder, what would you consider a 'reasonable' set of primitives for a webapp to cover most bases?

I got templating and dbase access. Beyond that? I got nothing. Apache can take care of access control and logging where needed, and the rest doesn't come up often enough.

Dmitrii Dimandt said...

I think that besides "erlang-telecom-erlang-telecom" web developers should also be told this:

- Mnesia is a nice database for small amounts of data - 4-8 GB, not more :))) (am I correct on this estimate?)

- Quite a lot of things can be parallelized in web development. Most notably, logging and post-process actions:

-- Once you do an action you can spawn a separate logging process and return to the user immediately
-- Once you accept all data (most notably, image files), you can spawn separate processes to do whatever with the data at hand (scale, crop, rotate, move to a different folder etc. etc.) and return to the user immediately

Many web developers don't even understand that this, indeed, is possible.

I think that in order to sell Erlang to web developers, we need to focus exactly on the things that can be "branched off". Because telecom stuff is interesting, but it also is scary. What else can we lure web developers with?
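
A minimal sketch of the "branch off and return immediately" pattern described in this comment, under stated assumptions: `handle_upload/1` and `slow_transform/1` are hypothetical names, and the sleep stands in for real work such as scaling an image or writing a log entry.

```erlang
-module(post_process).
-export([handle_upload/1, demo/0]).

%% Hypothetical request handler: acknowledge the upload right away and
%% leave the slow work to a separate process.
handle_upload(Data) ->
    Caller = self(),
    spawn(fun() ->
              Result = slow_transform(Data),
              %% Reporting back is only here so demo/0 can observe the
              %% result; a real handler might write to disk or a log.
              Caller ! {processed, Result}
          end),
    {ok, accepted}.

%% Stand-in for scaling, cropping, rotating, or logging.
slow_transform(Data) ->
    timer:sleep(100),
    byte_size(Data).

%% The handler returns before the background work has finished.
demo() ->
    {ok, accepted} = handle_upload(<<"fake image bytes">>),
    receive
        {processed, Size} -> Size
    after 1000 ->
        timeout
    end.
```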

Dmitrii Dimandt said...

Off-topic

Typo produces an error when I try to enter full URL (with http:// and trailing slash included)

And it doesn't respect carriage returns in comments :)

Prashant Rane said...

What about Java NIO? Sun's open-source application server has an HTTP connector, Grizzly, which uses NIO.
http://blogs.pathf.com/agileajax/2006/06/infrastructure_.html

Java with all the libraries and tools still looks like a good proposition.

Jeethu Rao said...

Just started learning erlang 2 days ago. Honestly, I'm impressed by the sheer elegance of the process model and the ease of IPC. Erlang seems to be extremely well polished for distributed computing.


Prashant: It's more than just non-blocking IO with NIO. I've been using Twisted for a few years now, which btw is a pretty cool framework. But with non-blocking IO (all the way from select loops to the reactor pattern), you're forced to write code in a non-linear fashion, whereas with lightweight processes you get all the goodness of NIO with simple, readable code.

Yariv said...

The problem with NIO is that it basically makes your application single-threaded. It's not friendly to long-running operations, because they prevent other things from happening. You have the option of spawning off long-running operations to other threads, but then you run into the same old scalability problems. NIO also locks you into an ugly callback-based coding style. With Erlang, you get the best of both worlds: NIO-like scalability combined with a preemptive scheduler and a multi-threaded coding style, with excellent concurrency semantics thrown into the mix.

Jason said...

I agree with an earlier post about a 'renaissance' or upsurge of interest in erlang and I enjoy reading the full-on advocacy ;-) What I haven't seen addressed here (I've not read all comments) is string handling. What makes a runtime that represents strings as lists using 1 byte per char especially suitable for my next web forum?

Yariv said...

Jason, I appreciate the feedback :) RE strings, there was a long discussion about them recently on the mailing list, which I recommend reading. By the way, Erlang strings aren't 1 byte per char but 2 words per char. This means that on a 64-bit machine, each character takes 128 bits!!! However, we're ignoring an important point here: Erlang has binary objects. You should use them as much as possible to reduce memory consumption.

Jason said...

Hi Yariv, I will search for the mailing list thread. Apologies for my thinko, I meant 8 bytes per char (on a 32-bit system). I am also aware that we can use binary representations but again, what compelling reason is there to do so for a web app? These are questions to which I feel an advocate needs to provide solid answers when trying to sell web developers on Erlang ;-)

Yariv said...

Oops, Typo mangled my symbols :(

Yariv said...

Jason, unless you need to perform list-style manipulations on a string, you're better off using Erlang binary objects due to their smaller memory consumption. You just need to remember to add those '<<>>' symbols around your strings, e.g. <<"like this">>. For instance, when you're reading from a database and sending the result verbatim to the client, use binaries. I hope this calms your fears with regards to string memory consumption :) At the very worst, though, Erlang would consume more memory than other languages when performing string manipulations, but this isn't as big a drawback as it may sound IMO because memory is pretty cheap these days.
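
A rough sketch of the size difference discussed in this thread, comparing a list string with the equivalent binary. The module name is hypothetical, and `erts_debug:flat_size/1` is an undocumented debugging helper that reports a term's heap size in words, so treat the numbers as indicative only. The binary figure counts payload bytes and ignores the small fixed header every binary carries.

```erlang
-module(string_size).
-export([demo/0]).

%% Returns {ListBytes, BinaryPayloadBytes, SameContent} for "hello".
demo() ->
    Str = "hello",
    Bin = <<"hello">>,
    %% Each character in a list string is one cons cell of two words.
    ListWords = erts_debug:flat_size(Str),
    ListBytes = ListWords * erlang:system_info(wordsize),
    {ListBytes, byte_size(Bin), list_to_binary(Str) =:= Bin}.
```

Converting between the two forms is cheap to write (`list_to_binary/1`, `binary_to_list/1`), so data can stay binary until list-style manipulation is actually needed.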

Zachary Pinter said...

Thanks for your recent posts about Erlang Yariv. I was thinking about Yaws and Rails, and was wondering how difficult it would be to make Yaws serve Rails (almost in a FCGI sense). Yaws/Erlang could then monitor Rails processes, see if they go over a certain memory threshold, restart them if they fail, and so on. Additionally, it would then seem possible to have Yaws handle certain parts of the site, while forwarding other parts to Rails making for an easier integration between the two.

Does such an idea sound feasible/practical/useful?

Jason said...

Hi Yariv, I read the mailing list thread. UTF-8 encoded binary 'strings' appear to be the way to go; I'll try this at some point and see if I consider it workable in practice. A tutorial/example showing HTML form validation and sanitization would be interesting indeed. Whichever way it's sliced, I still believe Erlang's string handling would present an obstacle to widespread adoption for web development.

Yariv said...

Hi, Jason. I appreciate your research into the matter. I haven't run into the strings "issue" yet myself, so I would be grateful if you could share your experiences. There will be tutorials shortly. Erlang is still catching up to other languages in terms of web development libraries and developer community size, but I will be doing my best to make sure this "catch up" phase is as short as possible :) Read my latest posting about ErlyDB. I think you'll like it.

Anonymous said...

Nine nines reliability: unclear how this figure was obtained?

Sorry if this has been brought up before, but I've just finished reading a large portion of Joe Armstrong's PhD dissertation (20 November 2003 version), and he states that

"[f]or the Ericsson AXD301 the only information on the long-term stability of the system came from a power-point presentation showing some figures claiming that a major customer had run an 11 node system with a 99.9999999% reliability, though how these figures had been obtained was not documented" (191).

If the 99.9999999% figure was taken from Joe's dissertation, then it seems prudent to put it in context, à la the quotation above. That aside, Erlang's kick-ass! :)

Yariv said...

In this mailing list posting, http://www.erlang.org/ml-archive/erlang-questions/200606/msg00162.html, it's mentioned that Erlang ran the telephone system in the UK with nine nines availability during its first year of operation. However, it was later clarified to me that Erlang doesn't target nine nines in all cases -- "just" better than five nines :)