Yariv's Blog: erlang

Showing posts with label erlang. Show all posts

Monday, March 09, 2009

Parallel merge sort in Erlang

I've been thinking lately about the problem of scaling a service like Twitter or the Facebook news feed. When a user visits the site, you want to show her a list of all the recent updates from her friends, sorted by date. It's easy when the user doesn't have too many friends and all the updates are on a single database (as in Twoorl's case :P). You use this query:


"select * from update where uid in ([fid1], [fid2], ...) order by creation_date desc limit 20"

(After making sure you created an index on uid and creation_date, of course :) )

However, what do you when the user has many thousands of friends, and each friend's updates are stored on a different database? Clearly, you should fetch those updates in parallel. In Erlang, it's easy. You use pmap():


fetch_updates(Uids) ->
  pmap(
    fun(Uid) -> 
      Db = get_db_for_user(Uid),
      query(Db, [<<"select * from update where uid =">>,
         Uid, <<" order by creation_date desc limit 20">>])
    end, Uids).


%% Applies the function Fun to each element of the list in parallel
pmap(Fun, List) ->
  Parent = self(),
  %% spawn the processes
  Refs =
      lists:map(
        fun(Elem) ->
                Ref = make_ref(),
                spawn(
                  fun() ->
                          Parent ! {Ref, Fun(Elem)}
                  end),
                Ref
        end, List),

  %% collect the results
  lists:map(
    fun(Ref) ->
            receive
                {Ref, Elem} ->
                    Elem
            end
    end, Refs).

Getting the updates is straightforward. However, what do you do once you've got them? Merging thousands of lists can take a long time, especially if you do it in a single process. The last thing you want is that your site's performance would grind to a halt when users add lots of friends.

Fortunately, merging a list of lists isn't too hard to do in parallel. Once you've implemented your nifty parallel merge algorithm, you can theoretically speed up response time by adding more cores to your web servers. This should help you maintain low latency even for very dense social graphs.

So, how do you merge a list of sorted lists in parallel in Erlang? There is probably more than one way of doing it, but this is what I came up with: you create a list of single element lists. You scan through the main list, and for each pair of lists you spawn a process that merges the two lists and sends the result to the parent process. The parent process collects all the results, and repeats as longs as there is more than one result. When only one result is left, the parent returns it.

Let's start with the base case of how to merge two lists:


%% Merges two sorted lists
merge(L1, L2) -> merge(L1, L2, []).

merge(L1, [], Acc) ->  lists:reverse(Acc) ++ L1;
merge([], L2, Acc) -> lists:reverse(Acc) ++ L2;
merge(L1 = [Hd1 | Tl1], L2 = [Hd2 | Tl2], Acc) ->
  {Hd, L11, L21} =
      if Hd1 < Hd2 ->
              {Hd1, Tl1, L2};
         true ->
              {Hd2, L1, Tl2}
      end,
  merge(L11, L21, [Hd | Acc]).

Now, to the more interesting part: how to merge a list of sorted lists in parallel.


%% Merges all the lists in parallel
merge_all(Lists) ->
    merge_all(Lists, 0).

%% When there are no lists to collect or to merge, return an
%% empty list.
merge_all([], 0) ->
     [];

%% When no lists are left to merge, we collect the results of
%% all the merges that were done in spawned processes
%% and recursively merge them.
merge_all([], N) ->
  Lists = collect(N, []),
  merge_all(Lists, 0);

%% If only one list remains, merge it with the result
%% of all the pair-wise merges
merge_all([L], N) -> 
  merge(L, merge_all([], N));

%% If two or more lists remains, spawn a process to merge
%% the first two lists and move on to the remaining lists
%% without blocking. Also, increment the number
%% of spawned processes so we know how many results
%% to collect later.
merge_all([L1, L2 | Tl], N) ->
    Parent = self(),
    spawn(
      fun() ->
              Res = merge(L1, L2),
              Parent ! Res
      end),
    merge_all(Tl, N + 1).

%% Collects the results of N merges (the order
%% doesn't matter).
collect(0, Acc) -> Acc;
collect(N, Acc) ->
  L = receive
            Res -> Res
        end,
  collect(N - 1, [L | Acc]).

So, how well does this perform? I ran a benchmark on my 2.5 GHz Core 2 Duo Macbook Pro. First, I created a list of a million random numbers, each between 1 and a million:


> L = [random:uniform(1000000) || N <- lists:seq(1, 1000000)].

Then, I timed how long it takes to sort the list, first with lists:sort() and then with my shiny new parallel merge function.


> timer:tc(lists, sort, [L]).                                 
{688149,
 [1,1,1,1,2,6,7,8,10,11,11,11,13,13,14,15,15,16,17,17,20,21,
  21,22,23,24,29|...]}

Less than a second. lists:sort() is pretty fast!

Before we can pass the list of numbers into merge_all(), we have to break it up into multiple lists with a single element in each list:


> Lists = [[E] || E <- L].

Now for the moment of truth:


> timer:tc(psort, merge_all, [Lists]).
{8187563,
 [1,1,1,1,2,6,7,8,10,11,11,11,13,13,14,15,15,16,17,17,20,21,
  21,22,23,24,29|...]}

About 8.2 seconds :(

It's not exactly an improvement, but at least we learned something. In this test case, the overhead of process spawning and inter-process communications outweighed the benefits of parallelism. It would be interesting to run the same test it on machines that have more than two cores but I don't have any at my disposal right now.

Another factor to consider is that lists:sort() is AFAIK implemented in C and therefore it has an unfair advantage over a function implemented in pure Erlang. Indeed, I tried sorting the list with the following pure Erlang quicksort function:


qsort([]) -> [];
qsort([H]) -> [H];
qsort([H | T]) ->
    qsort([E || E <- T, E =< H]) ++
        [H] ++
        qsort([E || E <- T, E > H]).


> timer:tc(psort, qsort, [L]).     
{2066387,
 [1,2,3,3,4,5,6,6,7,7,8,8,10,10,10,12,13,14,14,15,17,18,19,
  22,24,26,26|...]}

It took about ~2 seconds to sort the million numbers.

The performance of merge_all() doesn't seem great, but consider that we spawned ~1,000,000 processes during this test. It had ~19 levels of recursion (log2 500,000). At each level, we spawned half the number of processes as the previous level. The sum of all levels is 500,000*(1 + 1/2 + 1/4 + 1/8 ... + 1/19) ~= 1,000,000 (http://en.wikipedia.org/wiki/Geometric_series). 8 seconds / 500,000 processes = 0.000016 seconds / process. It's actually quite impressive!

Let's go back to the original problem. It wasn't to sort one big list, but to merge a list of sorted lists with 20 items in each list. In this scenario, we still benefit from parallelism but we don't pay for the overhead of spawning hundreds of thousands of processes to merge tiny lists in the first few levels of recursion. Let's see how long it takes merge_all() to merge a million random numbers split between 50,000 sorted lists.


> Lists = [lists:sort([random:uniform(1000000) || N <- lists:seq(1, 20)])
       || N1 <-  lists:seq(1, 50000)].
> timer:tc(psort, merge_all, [lists]).
{2259870,
 [1,1,4,5,8,10,10,12,12,13,14,14,14,16,16,17,18,18,18,18,18,
  19,19,19,21,22,25|...]}

This function call took just over 2 seconds to run, roughly the same time as qsort(), yet it involved spawning 25,000*(1 - 0.5^15)/(1 - 0.5) ~= 50,000 processes! Now the benefits of concurrency start being more obvious.

Can you think of ways to improve performance further? Let me know!

Wednesday, May 28, 2008

Announcing Twoorl: an open source ErlyWeb-based Twitter clone

With the recent brouhaha over Twitter's scalability problems, I thought, wouldn't it be fun to write a Twitter clone in Erlang?

Last weekend was cold and rainy here in Palo Alto, so I sat down and hacked one, and thus Twoorl was born. It took me one full day plus a couple of evenings. The codebase is about 1700 lines (including comments). You can get it at http://code.google.com/p/twoorl

Note: you need the trunk version of ErlyWeb to make it work (when released, it will be the 0.7.1 version).

Many people written about Twitter's scalability problems and how to solve them. Some have blamed Rails (TechCrunch is among them), whereas others, including Blaine Cook, Twitter's Architect, have convincingly argued that you can scale a webapp written in any language/framework if you've figured out how to Just Add More Servers to handle the growing traffic. Eran Hammer-Lahav wrote some of the most insightful articles on the subject, On Scaling a Microblogging Service.

I have no idea why Twitter is having a hard time scaling. Well, I have some suspicions, but since I haven't been in the Twitter trenches, such speculation isn't worth wasting many pixels on.

I didn't write a Twitter clone in Erlang because I thought my implementation would be inherently more scalable than a Rails one (although it may be cheaper to scale because Erlang has very good performance) . In fact, Twoorl right now wouldn't scale well at all since I prioritized simplicity above all else.

The reasons I wrote Twoorl are:

- ErlyWeb needs more open source apps showing how to use the framework. It's hard to pick how to use the framework just from the API docs.
- Twitter is awesome. Once you start using it, it becomes addictive. I thought it would be fun to write my own.
- Twitter is very popular, but I don't know of any open source clones. I figured somebody may actually want one!
- Some people think Erlang isn't a good language for building webapps. I like to prove them wrong :)
- Although you can scale pretty much anything, your choice of language can make a difference in of performance and stability, both of which lead to happy users.
- I think Erlang is a great language for writing a Twitter clone because Twitter's functionality offers interesting opportunities benefit from concurrency. Here are a couple of ideas I thought of:

1) If you use sharding, the Tweets for different users would be stored in separated databases. When you render the page for someone's timeline, wouldn't it be advantageous to fetch the tweets for all the users she follows in parallel? In Ruby, you would probably do something like this:


def get_tweets(users)
  var alltweets = Array.new()
  users.each { | user |
    alltweets.add(user.fetch_tweets())
  }
  alltweets.sort()
  return alltweets
end

(Please forgive any language errors -- my Ruby is very rusty. Treat the above as Pseudo code.).

This code would work well enough for a small number of tweet streams, but as the number gets large, it would take a very long time to execute.

In ErlyWeb, you could instead do the following:


get_tweets(Users) ->
  sort(flatten(pmap(fun(Usr) -> Usr:tweets() end, Users)))

This would spawn a process for each user the user follows, fetch the tweets for that user, then reassemble them in sorted order in the original process before rendering the page. (Think of it as map/reduce implemented directly in the application controller.) If a user follows hundreds of other users, querying their tweets in parallel can significantly reduce page rendering time.

2) Background tasks. When a user sends a tweet, the first thing you want to do is store it in the database. Then, depending on the features, you have to do a bunch of other stuff: send IM/SMS notifications, update RSS feeds, expire caches, etc. Why not do those tasks in different background processes? After to write to the DB, you can return an immediate reply to the user, giving him or her the perception of speed, and then let the background processes do all the extra work for processing the tweet.

(Such technique works very well for Facebook apps, by the way. In Vimagi, when the user submits a painting, the app first saves the painting data, and then it spawns a new process to update the news feed and profile box, send notifications, etc.)

Anyway, I hope you enjoy Twoorl. It's still in very early alpha. It doesn't have many features and it probably has bugs. Please take Twoorl for a spin and give me your feedback! I'll also appreciate useful contributions :)

Sunday, May 18, 2008

Erlang vs. Scala

In my time wasting activities on geeky social news sites, I've been seeing more and more articles about Scala. The main reasons I became interested in Scala are 1) Scala is an OO/FP hybrid, and I think that any attempt to introduce more FP concepts into the OO world is a good thing and 2) Scala's Actors library is heavily influenced by Erlang, and Scala is sometimes mentioned in the same context as Erlang as a great language for building scalable concurrent applications.

A few times, I've seen the following take on the relative mertis of Scala and Erlang: Erlang is great for concurrent programming and it has a great track record in its niche, but it's unlikely to become mainstream because it's foreign and it doesn't have as many libraries as Java. Scala, on the hand, has the best of both worlds. Its has functional semantics, its Actors library provides Erlang style concurrency, and it runs on the JVM and it has access to all the Java libraries. This combination makes Scala it a better choice for building concurrent applications, especially for companies that are invested in Java.

I haven't coded in Scala, but I did a good amount of research on it and it looks like a great language. Some of the best programmers I know rave about it. I think that Scala can be a great replacement for Java. Function objects, type inference, mixins and pattern matching are all great language features that Scala has and that are sorely missing from Java.

Although I believe Scala is a great language that is clearly superior to Java, Scala doesn't supersede Erlang as my language of choice for building high-availability, low latency, massively concurrent applications. Scala's Actors library is a big improvement over what Java has to offer in terms of concurrency, but it doesn't provide all the benefits of Erlang-style concurrency that make Erlang such a great tool for the job. I did a good amount of research into the matter and these are the important differences I think one should consider when choosing between Scala and Erlang. (If I missed something or got something wrong, please let me know. I don't profess to be a Scala expert by any means.)

Concurrent programming

Scala's Actor library does a good job at emulating Erlang style message passing. Similar to Erlang processes, Scala actors send and receive messages through mailboxes. Like Erlang, Scala has pattern matching sematics for receiving messages, which results in elegant, concise code (although I think Erlang's simpler type system makes pattern matching easier in Erlang).

Scala's Actors library goes pretty far, but it doesn't (well, it can't) provide an important feature that makes concurrent programming so easy in Erlang: immutability. In Erlang, multiple processes can share the same data within the same VM, and the language guarantees that race conditions won't happen because this data is immutable. In Scala, though, you can send between actors pointers to mutable objects. This is the classic recipe for race conditions, and it leaves you just where you started: having to ensure synchronized access to shared memory.

If you're careful, you may be able to avoid this problem by copying all messages or by treating all sent objects as immutable, but the Scala language doesn't guarantee safe access to shared objects. Erlang does.

Hot code swapping

Hot code swapping it a killer feature. Not only does it (mostly) eliminates the downtime required to do code upgrades, it also makes a language much more productive because it allows for true interactive programming. With hot code swapping, you can immediately test the effects of code changes without stopping your server, recompiling your code, restarting your server (and losing the application's state), and going back to where you had been before the code change. Hot code swapping is one of the main reasons I like coding in Erlang.

The JVM has limited support for hot code swapping during development -- I believe it only lets you change a method's body at runtime (an improvement for this feature is in Sun's top 25 RFE's for Java). This capability is not as robust as Erlang's hot code swapping, which works for any code modification at any time.

A great aspect of Erlang's hot code swapping is that when you load new code, the VM keeps around the previous version of the code. This gives running processes an opportunity to receive a message to perform a code swap before the old version of the code is finally removed (which kills processes that didn't perform a code upgrade). This feature is unique to Erlang as far as I know.

Hot code swapping is even more important for real-time applications that enable synchronous communications between users. Restarting such servers would cause user sessions to disconnect, which would lead to poor user experience. Imagine playing World of Warcraft and, in the middle of a major battle, losing your connection because the developers wanted to add a log line somewhere in the code. It would be pretty upsetting.

Garbage collection

A common argument against GC'd languages is that they are unsuitable for low latency applications due to potential long GC sweeps that freeze the VM. Modern GC optimizations such as generational collection alleviate the problem somewhat, but not entirely. Occasionally, the old generation needs to be collected, which can trigger long sweeps.

Erlang was designed for building applications that have (soft) real-time performance, and Erlang's garbage collection is optimized for this end. In Erlang, processes have separate heaps that are GC'd separately, which minimizes the time a process could freeze for garbage collection. Erlang also has ets, an in-memory storage facility for storing large amounts of data without any garbage collection (you can find more information on Erlang GC at http://prog21.dadgum.com/16.html).

Erlang might not have a decisive advantage here. The JVM has a new concurrent garbage collector designed to minimize freeze times. This article and this whitepaper (PDF warning) have some information about how it works. This collector trades performance and memory overhead for shorter freezes. I haven't found any benchmarks that show how well it works in production apps, though, and if it is as effective as Erlang's garbage collector for low-latency apps.

Scheduling

The Erlang VM schedules processes preemptively. Each process gets a certain number of reductions (roughly equivalent to function calls) before it's swapped out for another process. Erlang processes can't call blocking operations that freeze the scheduler for long periods. All file IO and communications with native libraries are done in separate OS threads (communications are done using ports). Similar to Erlang's per-process heaps, this design ensures that Erlang's lightweight processes can't block each other. The downside is some communications overhead due to data copying, but it's a worthwhile tradeoff.

Scala has two types of Actors: thread-based and event based. Thread based actors execute in heavyweight OS threads. They never block each other, but they don't scale to more than a few thousand actors per VM. Event-based actors are simple objects. They are very lightweight, and, like Erlang processes, you can spawn millions of them on a modern machine. The difference with Erlang processes is that within each OS thread, event based actors execute sequentially without preemptive scheduling. This makes it possible for an event-based actor to block its OS thread for a long period of time (perhaps indefinitely).

According to the Scala actors paper, the actors library also implements a unified model, by which event-based actors are executed in a thread pool, which the library automatically resizes if all threads are blocked due to long-running operations. This is pretty much the best you can do without runtime support, but it's not as robust as the Erlang implementation, which guarantees low latency and fair use of resources. In a degenerate case, all actors would call blocking operations, which would increase the native thread pool size to the point where it can't grow anymore beyond a few thousand threads.

This can't happen in Erlang. Erlang only allocates a fixed number of OS threads (typically, one per processor core). Idle processes don't impose any overhead on the scheduler. In addition, spawning Erlang processes is always a very cheap operation that happens very fast. I don't think the same applies to Scala when all existing threads are blocked, because this condition first needs to be detected, and then new OS threads need to be spawned to execute pending Actors. This can add significant latency (this is admittedly theoretical: only benchmarks can show the real impact).

Depends on what you're doing, the difference between process scheduling in Erlang and Scala may not impact performance much. However, I personally like knowing with certainty that the Erlang scheduler can gracefully handle pretty much anything I throw at it.

Distributed programming

One of Erlang's greatest strengths is that it unifies concurrent and distributed programming. Erlang lets you send a message to a process in the local or on a remote VM using exactly the same semantics (this is sometimes referred to as "location transparency"). Furthermore, Erlang's process spawning and linking/monitoring works seamlessly across nodes. This takes much of the pain out of building distributed, fault-tolerant applications.

The Scala Actors library has a RemoteActor type that apparently provides the similar location-transparency, but I haven't been able to find much information about it. According to this article, it's also possible to distribute Scala actors using Terracotta, which does distributed memory voodoo between nodes in a JVM cluster, but I'm not sure how well it works or how simple it is to set up. In Erlang, everything works out of the box, and it's so simple to get it working it's in the language's Getting Started manual.

Mnesia

Lightweight concurrency with no shared memory and pure message passing semantics is a fantastic toolset for building concurrent applications... until you realize you need shared (transactional) memory. Imagine building a WoW server, where characters can buy and sell items between each other. This would be very hard to build without a transactional DBMS of sorts. This is exactly what Mnesia provides -- with the a number of extra benefits such as distributed storage, table fragmentation, no impedance mismatch, no GC overhead (due to ets), hot updates, live backups, and multiple disc/memory storage options (you can read the Mnesia docs for more info). I don't think Scala/Java has anything quite like Mnesia, so if you use Scala you have to find some alternative. You would probably have to use an external DBMS such as MySQL cluster, which may incur a higher overhead than a native solution that runs in the same VM.

Tail recursion

Functional programming and recursion go hand-in-hand. In fact, you could hardly write working Erlang programs without tail recursion because Erlang doesn't have loops -- it uses recursion for *everything* (which I believe is a good thing :) ). Tail recursion serves for more than just style -- it's also facilitates hot code swapping. Erlang gen_servers call their loop() function recursively between calls to 'receive'. When a gen_server receive a code_change message, they can make it a remote call (e.g. Module:loop()) to re-enter its main loop with the new code. Without tail recursion, this style of programming would quickly result in stack overflows.

From my research, I learned that Scala has limited support for tail recursion due to bytecode restrictions in most JVMs. From http://www.scala-lang.org/docu/files/ScalaByExample.pdf:

In principle, tail calls can always re-use the stack frame of the calling function. However, some run-time environments (such as the Java VM) lack the primitives to make stack frame re-use for tail calls efﬁcient. A production quality Scala implementation is therefore only required to re-use the stack frame of a directly tail-recursive function whose last action is a call to itself. Other tail calls might be optimized also, but one should not rely on this across implementations.

(If I understand the limitation correctly, tail call optimization in Scala only works within the same function (i.e. x() can make a tail recursive call to x(), but if x() calls y(), y() couldn't make a tail recursive call back to x().)

In Erlang, tail recursion Just Works.

Network IO

Erlang processes are tightly integrated with the Erlang VM's event-driven network IO core. Processes can "own" sockets and send and receive messages to/from sockets. This provides the elegance of concurrency-oriented programming plus the scalability of event-driven IO (the Erlang VM uses epoll/kqueue under the covers). From Googling around, I haven't found similar capabilities in Scala actors, although they may exist.

Remote shell

In Erlang, you can get a remote shell into any running VM. This allows you to analyzing the state of the VM at runtime. For example, you can check how many processes are running, how much memory they consume, what data is stored Mnesia, etc.

The remote shell is also a powerful tool for discovering bugs in your code. When the server is in a bad state, you don't always have to try to reproduce the bug offline somehow to devise a fix. You can log right into it and see what's wrong. If it's not obvious, you can make quick code changes to add more logging and then revert them when you've discovered the problem. I haven't found a similar feature in Scala/Java from some Googling. It probably wouldn't be too hard to implement a remote shell for Scala, but without hot code swapping it would be much less useful.

Simplicity

Scala runs on the JVM, it can easily call any Java library, and it is therefore closer than Erlang to many programmers' comfort zones. However, I think that Erlang is very easy to learn -- definitely easier than Scala, which contains a greater total number of concepts you need to know in order to use the language effectively (especially if you consider the Java foundations on which Scala is built). This is to a large degree due to Erlang's dynamic typing and lack of object orientation. I personally prefer Erlang's more minimalist style, but this is a subjective matter and I don't want to get into religious debates here :)

Libraries

Java indeed has a lot of libraries -- many more than Erlang. However, this doesn't mean that Erlang has no batteries included. In fact, Erlang's libraries are quite sufficient for many applications (you'll have to decide for yourself if they are sufficient for you). If you really need to use a Java library that doesn't have an Erlang equivalent, you could call it using Jinterface. It may or may not be a suitable option for your application. This can indeed be a deal breaker for some people who are deciding between the two languages.

There's an important difference between Java/Scala and Erlang libraries besides their relative abundance: virtually all "big" Erlang libraries use Erlang's features concurrency and fault tolerance. In the Erlang ecosystem, you can get web servers, database connection pools, XMPP servers, database servers, all of which use Erlang's lightweight concurrency, fault tolerance, etc. Most of Scala's libraries, on the other hand, are written in Java and they don't use Scala actors. It will take Scala some time to catch up to Erlang in the availability of libraries based on Actors.

Reliability and scalability

Erlang has been running massive systems for 20 years. Erlang-powered phone switches have been running with nine nines availability -- only 31ms downtime per year. Erlang also scales. From telcom apps to Facebook Chat we have enough evidence that Erlang works as advertised. Scala on the other hand is a relatively new language and as far as I know its actors implementation hasn't been tested in large-scale real-time systems.

Conclusion

I hope I did justice to Scala and Erlang in this comparison (which, by the way, took me way too much to write!). Regardless of these differences, though, I think that Scala has a good chance of being the more popular language of the two. Steve Yegge explains it better than I can:

Scala might have a chance. There's a guy giving a talk right down the hall about it, the inventor of – one of the inventors of Scala. And I think it's a great language and I wish him all the success in the world. Because it would be nice to have, you know, it would be nice to have that as an alternative to Java.

But when you're out in the industry, you can't. You get lynched for trying to use a language that the other engineers don't know. Trust me. I've tried it. I don't know how many of you guys here have actually been out in the industry, but I was talking about this with my intern. I was, and I think you [(point to audience member)] said this in the beginning: this is 80% politics and 20% technology, right? You know.

And [my intern] is, like, "well I understand the argument" and I'm like "No, no, no! You've never been in a company where there's an engineer with a Computer Science degree and ten years of experience, an architect, who's in your face screaming at you, with spittle flying on you, because you suggested using, you know... D. Or Haskell. Or Lisp, or Erlang, or take your pick."

Well, at least I'm not trying too hard to promote LFE... :)

Wednesday, May 14, 2008

Is Facebook running one of the world's biggest Erlang clusters?

I just read on the Facebook engineering blog this post describing how Facebook used Erlang to scale Facebook Chat to 70 million active users overnight. WOW.

This announcement should remove any doubts that Erlang is *the* platform for building scalable realtime (aka Comet) applications.

Monday, May 12, 2008

Erlang does have shared memory

I occasionally hear people say things such as "Erlang makes concurrency easy because it doesn't have shared memory" or "processes in Erlang communicate just by message passing." Just earlier today I came across one such comment on Steve Yegge's post Dynamic Languages Strike Back (you'll have some scrolling down to do -- it's a long article :) ).

For many purposes, it's "good enough" to think of Erlang as lacking shared memory. Erlang's built-in message passing semantics makes inter-process communications trivial to implement, and for many concurrent applications, it's all you need. Also, when you first learn Erlang, it's easy to get excited by the promise of not having to worry about the difficult shared memory problems you encounter in "traditional" concurrent programming. And that's for a good reason: Erlang indeed makes concurrent programming much easier (and more scalable, reliable, etc) than other languages. However, it's not true that Erlang processes can't share memory.

Erlang has (a kind of) shared memory: it's called ets.

From the ets documentation:

This module provides very limited support for concurrent updates. No locking is available, but the safe_fixtable/2 function can be used to guarantee that a sequence of first/1 and next/2 calls will traverse the table without errors and that each object in the table is visited exactly once, even if another process (or the same process) simultaneously deletes or inserts objects into the table. Nothing more is guaranteed; in particular any object inserted during a traversal may be visited in the traversal.

Multiple Erlang processes can simultaneously access and manipulate an ets table, which makes ets act very much like shared memory. ets isn't identical to shared memory in other languages, however. These are the main differences:

- Objects are copied when inserted into and looked-up from ets tables.
- Basic consistency is guaranteed. Individual ets records never get garbled.
- ets tables are not garbage collected (this lets you store massive amounts of data in RAM without incurring garbage collection penalties -- an important trait for soft real time performance.)

Despite these differences, as far as the programmer is concerned, ets is effectively shared memory. If you want to guarantee that multiple processes get a consistent snapshot of a set of objects in a plain ets table, you're out of luck.

The good news is that Erlang doesn't leave you to your own devices to figure out some subtle solution involving locking around critical regions to access ets tables safely from multiple processes. Instead, it provides a very nice tool for working with ets: Mnesia. Mnesia is is a kind of STM for Erlang, with some extra properties such as support for distributed and persistent storage. Mnesia has a simple transaction API you can use to ensure atomicity, isolation and consistency when accessing objects in an ets table.

Here's an example, taken for the Mnesia documentation, of how to raise an employee's salary in a transaction:


raise(Eno, Raise) ->
    F = fun() ->
                [E] = mnesia:read(employee, Eno, write),
                Salary = E#employee.salary + Raise,
                New = E#employee{salary = Salary},
                mnesia:write(New)
        end,
    mnesia:transaction(F).

So, Erlang has shared memory, and it also has Mnesia, which provides easy transactional access to ets. Does that mean that concurrent programming in Erlang is just like other languages, but with using Mnesia for storing shared data? Not exactly. When you program in Erlang, you still use message passing in many situations where in other languages you would rely on semaphors/locks/monitors/signals/etc to enable inter-thread communications. In the simplest possible example, a producer/consumer application, you would use messaging in Erlang, not ets. In most other languages, you would probably use monitors to protect against concurrent access to a shared buffer (see the wikipedia examples).

Erlang isn't the only language that has message queues. In fact, you could use the basic concurrency facilities in most languages to implement message queues as higher-level abstractions for inter-thread communications (although you probably wouldn't be able to replicate Erlang's selective receive using pattern matching, and it probably wouldn't scale or perform as well as Erlang). This is exactly what some of the actor libraries out there do. However, you would have to be very careful to not send pointers/references to objects that would end up being shared between threads, or you would potentially run into nasty bugs when multiple threads modify the same object. In Erlang, all data is immutable, which makes such bugs impossible. And if even you figure out how to ensure copy semantics for message passing, you would still have your work cut out for you to allow processes to communicate between VMs...

Implementing full Erlang style concurrency isn't trivial. I don't think it can be added as a library to a language that doesn't have it by design with support from the runtime.

Erlang takes you as close as possible to concurrent (and distributed!) programming bliss -- but it does have (a kind of) shared memory: ets.

Monday, April 14, 2008

Concurrency and expressiveness

Damien Katz's article Lisp as Blub has sparked a lively debate on Hacker News on the relative merits of Erlang and other languages for building robust applications. Good points were made on all sides (except for the tired complaints about Erlang syntax -- I have a suspicion that most people who complain about it haven't done much coding in Erlang). Unfortunately, I think that a key point was lost in all the noise: all else being equal, a language with great support for concurrency and fault tolerance has a higher expressive power than a language that doesn't. It just lets you build concurrent applications much more easily (less code, fewer bugs, better scalability, yadda yadda).

Monday, March 31, 2008

Announcing the Erlang FriendFeed API

I just hacked together an (alpha) implementation of the FriendFeed API for Erlang. You can get the code at http://code.google.com/p/erlang-friendfeed/. Enjoy!

By the way, my shiny new FriendFeed is at http://friendfeed.com/yariv.

Sunday, March 09, 2008

In Response to "What Sucks About Erlang"

Damien Katz's latest blog post lists some ways in which Damien Katz thinks Erlang sucks. I agree with some of these points but not with all of them. Below are my responses to some of his complaints:

1. Basic Syntax

I've heard many people express their dislike for the Erlang syntax. I found the syntax a bit weird when I started using it, but once I got used to it it hasn't bothered me much. Sometimes I mess up and use the wrong expression terminator, and sometimes things break when I cut and paste code between function clauses, but it hasn't been real pain point for me. I understand where the complaints are coming from, but IMHO it's a minor issue.

Since the release of LFE last week, if you don't like the Erlang syntax, you can write Erlang code using Lisp Syntax, with full support for Lisp macros. If you prefer Lisp syntax to Erlang syntax, you have a choice.

2. 'if' expressions

The first issue is that in an 'if' expression, the condition has to match one of the clauses, or an exception is thrown. This means you can't write simple code like


if Logging -> log("something") end

and instead you have to write


if Logging -> log("something"); true -> ok end

This requirement may seem annoying, but it is there for a good reason. In Erlang, all expressions (except for 'exit()') must return a value. You should always be able to write

A = foo();

and expect A to be bound to a value. There is no "null" value in Erlang (the 'undefined' atom usually takes its place).

Fortunately, Erlang lets you get around this issue with a one-line macro:


-define(my_if(Predicate, Expression), if Predicate -> Expression; true -> undefined end).

Then you can use it as follows:


?my_if(Logging, log("something"))

It's not that bad, is it?

This solution does have a shortcoming, though, which is that it only works for a single-clause 'if' expression. If it has multiple clauses, you're back where you started. That's where you should take a second look at LFE :)

The second issue about 'if' expressions is that you can't call any user- defined function in the conditional, just a subset of the Erlang BIFs. You can get around this limitation by using 'case', but again you have to provide a 'catch all' clause. For a single clause, you can simply change the macro I created to use a case statement.


-define(case(Predicate, Expression), case Predicate -> Expression; _ -> undefined end).

For an arbitrary number of clauses, a macro won't help, and this is something you'll just have to live with. If it really bothers you, use LFE.

3. Strings

The perennial complaint against Erlang is that it "sucks" for strings. Erlang represents strings as lists of integers, and for some reason many people are convinced that this is tantamount to suckage.

...you can't distinguish easily at runtime between a string and a list, and especially between a string and a list of integers.

A string *is* a list of integers -- why should we not represent it as such? If you care about the type of list you're dealing with, you should embed it in a tuple with a type description, e.g.


{string, "dog"},
{instruments, [guitar, bass, drums]}

But if you don't care what the type is, representing a string as a list makes a lot of sense because it lets you leverage all the tools Erlang has for working with lists.

A real issue with using lists is that it's a memory-hungry data structure, especially on 64 bit machines, where you need 128 bits = 16 bytes to store each character. If your application processes such massive amounts of string data that this becomes a bottleneck, you can always use binaries. In fact, you should always use binaries for "static" strings on which you don't need to do character-level manipulation in code. ErlTL, for example, compiles all static template data as binaries to save memory.

Erlang string operations are just not as simple or easy as most languages with integrated string types. I personally wouldn't pick Erlang for most front-end web application work. I'd probably choose PHP or Python, or some other scripting language with integrated string handling.

I disagree with this statement, but instead of rebutting it directly, I'll suggest a new kind of Erlang challenge: build a webapp in Erlang and show me how string handling was a problem for you. I've heard a number of people say that Erlang's string handling is a hinderance in building webapps, but by my experience this simply isn't true. If you ran into real problems with strings when building a webapp, I would be very interested in hearing about them, but otherwise it's a waste of time hypothesizing about problems that don't exist.

4. Functional Programming Mismatch

The issue here is that Erlang's variable immutability supposedly makes writing test code difficult.

Immutable variables in Erlang are hard to deal with when you have code that tends to change a lot, like user application code, where you are often performing a bunch of arbitrary steps that need to be changed as needs evolve.

In C, lets say you have some code:

int f(int x) {
x = foo(x);
x = bar(x);
return baz(x);
}

And you want to add a new step in the function:

int f(int x) {
x = foo(x);
x = fab(x);
x = bar(x);
return baz(x);
}

Only one line needs editing,

Consider the Erlang equivalent:

f(X) ->
X1 = foo(X),
X2 = bar(X1),
baz(X2).

Now you want to add a new step, which requires editing every variable thereafter:

f(X) ->
X1 = foo(X),
X2 = fab(X1),
X3 = bar(X2),
baz(X3).

This is an issue that I ran into in a couple of places, and I agree that it can be annoying. However, discussing this consequence of immutability without mentioning its benefits is missing a big part of the picture. I really think that immutability is one of Erlang's best traits. Immutability makes code much more readable and easy to debug. For a trivial example, consider this Javascript code:


function test() {
	var a = {foo: 1; bar: 2};
	baz(a);
	return a.foo;
}

What does the function return? We have no idea. To answer this question, we have to read the code for baz() and recursively descend into all the functions that baz() calls with 'a' as a parameter. Even running the code doesn't help because it's possible that baz() only modifies 'a' based on some unpredictable event such as some user input.

Consider the Erlang version:


test() ->
  A = [{foo, 1}, {bar, 2}],
  baz(A),
  proplists:get_value(A, foo).

Because of variable immutability, we know that this function returns '1'.

I think that the guarantee that a variable's value will never change after it's bound is a great language feature and it far outweighs the disadvantage of having to use with unique variable names in functions that do a series of modifications to some data.

If you're writing code like in Damien's example and you want to be able to insert lines without changing a bunch of variable names, I have a tip: increment by 10. This will prevent the big cascading variable renamings in most situations. Instead of the original code, write


f(X) ->
  X10 = foo(X),
  X20 = bar(X10),
  baz(X20).

then change it as follows when inserting a new line in the middle:


f(X) ->
  X10 = foo(X),
  X15 = fab(X10),
  X20 = bar(X15),
  baz(X20).

Yes, I know, it's not exactly beautiful, but in the rare cases where you need it, it's a useful trick.

This issue could be rephrased as a complaint against imperative languages: "I don't know if the function to which I pass my variable will change it! It's too hard to track down all the places in the code where my data could get mangled!" This may sound outlandish especially if you haven't coded in Erlang or Haskell, but that's how I really feel sometimes when I go back from Erlang to an imperative language.

Erlang wasn't a good match for tests and for the same reasons I don't think it's a good match for front-end web applications.

I don't understand this argument. Webapps need to be tested just like any other application. I don't see where the distinction lies.

5. Records

Many people hate records and on this topic I fully agree with Damien. I think the OTP team should just integrate Recless into the language and thereby solve most of the issues people have with records.

If you really hate records, another option is to use LFE, which automatically generates getters and setters for record properties.

Incidentally, if you use ErlyWeb with ErlyDB, you probably won't use records at all and you won't run into these annoyances. ErlyDB generates functions for accessing object properties which is much nicer than using the record syntax. ErlyDB also lets you access properties dynamically, which records don't allow, e.g.


P = person:new_with([{name, "Paul"}]),
Fields = person:db_field_names(),
[io:format("~p: ~p~n", [Field, person:Field(P)]) || Field <- Fields]

Records are ugly, but if you're creating an ErlyWeb app, you probably won't need them. If they do cause you a great deal of pain, you can go ahead and help me finish Recless and then bug the OTP team to integrate it into the language :)

6. Code oragnization

Every time time you need to create something resembling a class (like an OTP generic process), you have to create whole Erlang file module, which means a whole new source file with a copyright banner plus the Erlang cruft at the top of each source file, and then it must be added to build system and source control. The extra file creation artificially spreads out the code over the file system, making things harder to follow.

I think this issue occurs in many programming languages, and I don't think Erlang is the biggest offender here. Unlike Java, for instance, Erlang doesn't restrict you to defining a single data type per module. And Ruby (especially Rails) applications are also known for having multitudes of small files. In Erlang, you indeed have to create a module per gen-server and the other 'behaviors' but depending on the application this may not be an issue. However, I don't think there's anything wrong with keeping different gen-servers in different modules. It should make the code more organized, not less.

7. Uneven Libraries and Documentation

I haven't had a problem with most libraries, and in cases where they do have big shortcomings you can often find a good 3rd party tool. The documentation is a pain to browse and search, but gotapi.com makes some of this pain go away.

Summary

Is Erlang perfect? Certainly not. But sometimes people exaggerate Erlang's problems or they don't address the full picture.

Here are some suggestions I have for improving Erlang:

- Add a Recless-like functionality to make working with records less painful.
- Improve the online documentation by making it easier to browse and search.
- Make some of the string APIs (especially the regexp library) also work with binaries and iolists.
- Add support for overloading macros, just like functions.
- Add support for Smerl-style function inheritance between modules.

Like any language, Erlang has some warts. But if it were perfect, it would be boring, wouldn't it? :)

Tuesday, March 04, 2008

Favorite Lisps Poll Result

With 78 responses so far, here are the breakdowns for the responders' favorite Lisps:

Scheme: 41%
Common Lisp: 27%
LFE: 12%
Arc: 10%
Emacs Lisp: 6%
Clojure: 4%

You can see the full report here.

I thought that Common Lisp would get more votes Scheme (isn't CL more widely used?), but apparently I was wrong. Arc and LFE have made an impressive showing for their young age (but this is probably due to the bias of my blog readers).

Saturday, March 01, 2008

Lisp Flavored Erlang!

Erlang has a Lisp syntax! Robert Virding just released LFE, his Lisp syntax compiler for Erlang. Finally, Erlang hackers enjoy the full power Lisp-style macros. I suspect it won't be long before you'll be able to hack ErlyWeb apps in LFE :)

While you're here, please answer this Wufoo poll:

Powered by Wufoo

Wednesday, February 27, 2008

Your Daily Dose of Erlang Evangelism

This is a good one I fished from the mailing list:

Using Erlang continually makes me both smile and cry at the same time. I smile because of the overall simplicity it brings to solving all those hard issues I mentioned above, but I also cry knowing how many hours, days, weeks, and months my former colleagues and I spent trying
to solve all those really hard issues.

http://www.nabble.com/Steve-Vinosky-interview-to15709698.html#a15720750

Sunday, February 17, 2008

Seaside-Style Programming in ErlyWeb

The Arc Challenge started an interesting thread in the ErlyWeb mailing list about continuations-driven web frameworks. ErlyWeb doesn't have built-in support for continuations, but Arc does and so does Seaside. I haven't paid much attention to the use of continuations in web frameworks before the Arc challenge, but I became especially interested in experimenting with them after seeing Seaside solution.

In case you haven't read it, this is the requirement of the Arc challenge:

Write a program that causes the url said (e.g. http://localhost:port/said) to produce a page with an input field and a submit button. When the submit button is pressed, that should produce a second page with a single link saying "click here." When that is clicked it should lead to a third page that says "you said: ..." where ... is whatever the user typed in the original input field. The third page must only show what the user actually typed. I.e. the value entered in the input field must not be passed in the url, or it would be possible to change the behavior of the final page by editing the url.

This is the original Arc solution:


(defop said req
  (aform [w/link (pr "you said: " (arg _ "foo"))
              (pr "click here")]
     (input "foo")
     (submit)))

This is how you would write the same logic in Erlang, if it had an Arc-like web framework:


said(A) ->
  form(
     fun(A1) ->
       link(fun(A2) -> ["you said ", get_var(A1, "foo")] end,
         "click here")
     end,
     [input("foo"),
      submit()]).

The Erlang code is a bit more verbose, mostly because Erlang macros don't allow you to hide the "fun() -> ... end" syntax the way Lisp macros let you hide the (lambda ..) keyword.

This is the Seaside solution:


| something |
  something := self request: 'Say something'.
  self inform: 'Click here'.
  self inform: something

IMO, this solution cheats a bit by using high-level functions for generating the HTML tags whereas the Arc version generates them explicitly. However, putting minor complaints aside, I think the Seaside version is the winner in readability. As a reddit comment said, it reads like prose. It doesn't even declare any closures explicitly -- it says exactly what it does and nothing more.

Wouldn't it be cool if we could use this style of programming in ErlyWeb applications?

Fortunately, we can! I hacked a continuations plugin for ErlyWeb that lets you write Seaside-style code so fans of this programming style would feel at home with ErlyWeb. (This is all done in 105 lines of code :) ) Before I explain how the plugin works, I'll show you how to create an ErlyWeb controller that implements the the Arc challenge using this plugin:


-module(said_controller).
-compile(export_all).
-import(continuations, [ask/2, confirm/2, pr/1]).

index(A) ->
    continuations:do(
      A, fun(K) ->      
                 Name = ask(K, "name"),
                 confirm(K, "click here"),
                 pr(["your name: ", Name])
         end).

This may seem more verbose than the Seaside code because of the module declarations at the top, but the "meat" is about the same. (I could make this code even smaller by integrating continations.erl deep into ErlyWeb, which would remove the explicit call to continuations:do(). However, I didn't want to go too far with this proof of concept.)

How does this work? Using concurrency, of course! For each "continuation", the plugin spawns a process and registers it in a Mnesia table according to a randomly generated key. The key (K) is encoded in the URLs to which the <form> and <a> tags point. When a request arrives, continuations:do() looks up the process in Mnesia and sends it a message of the type {A, self()}. The process does some work and sends back in reply the HTML to be rendered, and waits for the next message. The web server process receives the rendered HTML and sends it to the client using the normal ErlyWeb mechanisms.

If a process doesn't receive a message in 10 seconds, it dies and removes itself from the Mnesia table, which provides automatic garbage collection to stale sessions.

You can get the code for continuations.erl here. Just remember it's a proof of concept and I don't recommend using it in a production environment.

(Before you use it in your application, make sure you call continuations:start().)

Final word: IMO, although continuations help write more natural code in certain multi-page interactions, most of the logic in web applications involves rendering dynamic pages for RESTful URLs. So, if your web framework doesn't support continuations, don't worry about it too much. It's likely the code for your application wouldn't be dramatically smaller if you could write it with the use of continuations. (That said, take my advice with a grain of salt. I haven't used a continuations based framework to build a real application, so I may be missing something.)

Sunday, February 10, 2008

More Erlang Fun: Distributed, Fault Tolerant MapReduce

I the Erlang Challenge posting, I showed how to implement a simple distributed parallel map function. You can pass it a list of nodes, and it performs each application of F1 on a random node ("node" means a connected Erlang VM that may exist on a different machine) from the list. Finally, it collects the results in the order of the arguments.

What Erlang nailed beside concurrent and distributed programming is fault tolerance. When processes die or nodes go down, the VM notifies their supervisors so they can perform error recovery. My previous example didn't demonstrate this capability, so in this article I'll show how to implement a similar solution but also taking into account node failures. In addition, instead of doing just a simple parallel map, this example will also fold the results, making it a basic MapReduce implementation.

(The reason I'm adding the 'reduce' phase is because it allows me to not worry about the order at which the results are accumulated. This implies that the reduce operation must be commutative.)

The function mapreduce() takes a function F1 and applies to elements of list L. Each application is performed on a remote node from Nodes. As opposed to the pmap() example, this implementation chooses the next node from the list in a rotating fashion instead of randomly (i.e., if Nodes is [A, B, C], then F1 is applied on remote nodes with the following order: A, B, C, A, B, C, A ...). The results are collected, and if any nodes went down, their computations are retried on the remaining nodes until all computations succeed or no nodes are left. If the computations all succeed, we call lists:foldl(F2, Acc, Vals), where Vals contains the results of the computations.

Implementation details:

- do_map() returns a list of {Pid, X} tuples. Pid is the process on the remote node in which F1 was applied to X. The reason we want to remember X is that if Pid's node goes down, we can later retry to evaluate F1(X) on a different node.
- For each {Pid, X} tuple, collect() receives either an {ok, Pid, Val} message, indicating the computation was performed successfully, or {nodedown, Node}, indicating the node on which Pid lived went down. collect() folds the result into a tuple of type {Successes, Failures, RemainingNodes}. If Failures is not empty, collect() calls do_map() on the list of Failures and the remaining nodes, and then recursively calls itself on the result and appends the outcome to Successes.

The code is below. (Note that I didn't test it, so it may have bugs :) )


-module(mapreduce).
-export([mapreduce/5]).

mapreduce(F1, F2, Acc, L, Nodes) ->
  %% map F1 to the elements of L on nodes from Nodes
  Results = do_map(F1, L, Nodes),

  %% collect the results, retrying in the event of failures
  Vals = collect(F1, Results, Nodes),

  %% perform the reduce operation
  lists:foldl(F2, Acc, Vals).
 

%% exit if we ran out of good nodes
do_map(_F1, _L, []) -> exit(no_nodes);
do_map(F1, L, Nodes) ->
  Parent = self(),

  %% apply F1 to values of L on remote nodes in a rotating fashion
  {Results, _Nodes1} =
    lists:foldl(
      fun(X, {Acc, [Node | Rest]}) ->
        erlang:monitor_node(Node),
        Fun = fun() -> Parent ! {ok, self(), (catch F1(X))} end,
        Pid = spawn(Node, Fun),
        {[{Pid, X} | Acc], Rest ++ [Node]}
      end, {[], Nodes}, L),
  Results.

collect(F1, Results, Nodes) ->
  collect(F1, Results, Nodes, []).
collect(F1, Results, Nodes, Acc) ->
  {Successes, Failures, RemainingNodes}=
    lists:foldl(
      fun({Pid, X}, {Successes1, Failures1, Nodes1}) ->
        Node = node(Pid),
        receive
          {ok, Pid, Val} ->
            {[Val | Successes1], Failures1, Nodes1};

          %% we may receive this message because of call to
          %% monitor_node()
          {nodedown, Node} ->
            {Successes1, [X | Failures1], Nodes1 -- Node}
        end
      end, {[], [], Nodes}, Results),
  
  if Failures =/= [] ->
      %% retry the failed computations on the remaining nodes
      %% and add the results to the current list of successes
      Results2 = do_map(F1, Failures, RemainingNodes),
      collect(F1, Results2, RemainingNodes, Successes ++ Acc);
    true ->
      Successes ++ Acc
  end.

This is more complex than the pmap() example, but I think it packs a lot of functionality into a relatively small amount of code

Friday, February 08, 2008

Erlang Challenge Followup

After I published the Erlang Challenge, I saw a few comments on Hacker News and Reddit repudiating it with the following arguments, to which I'd like to respond:

- It was a snide response to the Arc challenge.

Maybe I should have read the article a couple of times over before I published it late last night, because if it came off as snide, I didn't communicate it as I had intended. I've found the Arc challenge fun and interesting (I even submitted my own Erlang solution), and I wouldn't intentionally respond to it in a snide way. The reason I said the Erlang challenge is in the spirit of the Arc challenge is that the Arc challenge highlights Arc's strengths, whereas the Erlang challenge highlights Erlang's strengths.

- It was rigged in favor of Erlang.

That was basically the point. I wanted the Erlang challenge to demonstrate the kinds of problems that Erlang excels at solving. Maybe I should have made this clearer when I wrote the article -- it was more of an illustration of Erlang's strengths than a "real" challenge. I wasn't really expecting other languages to compete with Erlang on its home turf, the same way I wouldn't expect Erlang to compete with, say, C or C++ at low-level systems programming or with ActionScript at Flash programming.

- It showed that Erlang is good at solving a narrow scope of problems, but I didn't show that Erlang is a good general purpose tool.

I don't think it's possible to show in 10 lines of code how good (or bad) a language is at solving anything but problems that are very similar to those in the example. The only true way to judge whether a language is good at building certain types of systems is to build them using the language and evaluate the results.

I know that people have used Erlang successfully to build phone switches, a MMORPG, a massively scalable DBMS, web applications, a collaboration application , high performance messaging servers, ad servers, web servers, a scalable poker server, and a 3D modeling tool. When I say that Erlang is a good language for building certain applications, that's what I base it on.

Edit: I forgot to list CouchDB, an open source document storage system, and Triggit, a widget server unlike any other.

Thursday, February 07, 2008

The Erlang Challenge

In the spirit of Paul Graham's Arc challenge, and in spite of the "controversy" around it, I devised a programming challenge of my own. It's called the Erlang Challenge.

The goal is to implement an advanced "parallel map": a function (F1) that applies another function (F2) to every element (E) of a list (L1) concurrently and then gathers the results in a new list (L2). The order of results in L2 corresponds to the order of elements in L1, and the elements of L1 are guaranteed to remain unchanged no matter what function F2 does. To make things more interesting, the F1 should also accept a list of cluster nodes (NL) onto which the computations can be distributed. Each application of F2 to an element of L1 should be done on a random node from NL, and the result should be send back to the node on which F1 was called (N). If NL is empty, all computations should occur locally on N.

This challenge isn't just about succinctness, but, like most real-world systems, also about scalability and performance. Therefore, it has the following requirements:

- The implementation must be able to spawn at least 1 million concurrent processes on a modern server machine.
- On multi-core machines, the application's performance must increase (almost) linearly with the number of available CPUs.
- The application as a whole must exhibit soft real-time performance. Specifically, this means that processes won't freeze for a long time during global garbage collection sweeps, and that no process will be able to block the entire VM to do IO or call native code, regardless of what F2 does. In other words, it is impossible to create a function F2 that could block the scheduler while F2 executes.

Here's how I would implement it in Erlang (this is loosely based on Joe Armstrong's pmap implementation):


pmap(Fun, List, Nodes) -> 
  SpawnFun =
    case length(Nodes) of
       0 -> fun spawn/1;
       Length ->
         NextNode = fun() -> nth(random:uniform(Length), Nodes) end,
         fun(F) -> spawn(NextNode(), F) end
    end,
  Parent = self(),
  Pids = [SpawnFun(fun() -> Parent ! {self(), (catch Fun(Elem))} end)
    || Elem <- List],
  [receive {Pid, Val} -> Val end || Pid <- Pids].

How would you implement it in your favorite language?

Update: please read the followup.

Monday, January 28, 2008

Debugging with Distel

Until recently, I've done my Erlang debugging either using the built-in debugger or not using a debugger at all and instead relying on the good old io:format() function. The reason I haven't been keen on using the Erlang debugger is that it's not integrated into my editor (I use Emacs). To debug a file, I'd have to open it separately in the debugger UI, where I could set breakpoints and step through the code. For simple bugs, this is more effort than adding a few io:format() statements, reloading the page, and figuring out the bug from the output.

I'm not an IDE fanatic (well, duh, I use Emacs) -- I can do fine without most of the features in advanced IDEs -- but I do expect any decent editor to have at least 3 features: syntax highlighting, auto-indentation, and integrated debugging.

(I'm ambivalent about IDEs with code completion. It adds convenience, but it also it lets you get by without really remembering your APIs. Without code completion, I remember most of the my application's function names and their parameters simply because not remembering them slows me down too much. This is akin to the cellphone effect on remembering numbers: before I had a cellphone, I remembered all my close friends and family members' numbers. Looking them up in an address book was too much work. Now, I remember just a fraction of them. My cellphone made me lazy.)

The Emacs mode for Erlang provides syntax highlighting and auto-indentation, but no integrated debugging. AFAIK, neither Erlide nor ErlyBird have integrated debuggers. I knew Distel was supposed to add debugging support, and I even tried setting it up a while ago but gave up after running into problems I couldn't figure out. Last weekend, though, I finally got it working after following Bill Clemenson's Distel tutorial. I must say that having a debugger integrated into Emacs is a huge improvement. I recommend it to every Erlang hacker.

My only peeve with this debugger is that you must mark each file you want to debug as interpreted (the Emacs shortcut is "C-c C-d i") before you can set breakpoints in it and step into its functions. This seems unnecessary -- Erlang .beam files usually contain metadata including the location of their source files. I wish the debugger used this information to auto-interpret all the files you're trying to debug.

This is a relatively small peeve, though. The Distel debugger is definitely a step up from io:format().

I'm still hoping someone will create a more user-friendly Erlang IDE with a slick UI and an integrated debugger. I like the TextMate interface, but TextMate doesn't do auto-indentation for Erlang or debugging. I'll stick to Emacs+Distel for now.

Wednesday, January 23, 2008

Triggit: Why Didn't I think of That?

Triggit received some great publicity last week. Triggit, a small startup in San Francisco, recently released a product makes it easy to add widgets and ads to any site by dragging and dropping -- no HTML editing required. I think it can really catch on among bloggers who are disinclined to muck around in mounds of PHP code in order to add some widgets to their blogs. I can also see this kind of technology fitting very well in sites like MySpace, where non tech-savvy users could use Triggit to effortlessly soup up their profiles.

Ryan Tecco, Triggit's CTO, told me Triggit uses Erlang for all the heavy lifting in the backend. You probably won't find this shocking, but I think using Erlang is a smart choice. Low latency, scalability and high availability are all essential for this kind of product: it's one thing to have performance/availability issues on *your* site, but it's much worse to let those issues affect other people's sites as well.

Finally, Triggit is an interesting data point on how very small teams can successfully use Erlang to achieve impressive results in a short time frame.

Update: I added this video with Triggit :)

Monday, January 21, 2008

How to Use Concurrency to Improve Response Time in ErlyWeb Facebook Apps

When I was building the Vimagi Facebook app, I came across a common scenario where using concurrency can make your application more responsive for its users.

The typical flow of responding to requests coming from Facebook looks like this:

1) request arrives
2) do some stuff (mostly DB CRUD operations)
3) call Facebook API to send notifications / update newsfeeds and profile FBML
4) send response

When you're building a facebook app with ErlyWeb, you can instead do the following:

1) request arrives
2) do some stuff (mostly DB CRUD operations)

spawn(fun() ->
3) call Facebook API to send notifications / update newsfeeds and profile FBML
end)

4) send response

The Facebook API calls in step 3 are much more expensive than the typical ErlyWeb controller operations because these calls involve synchronous round trips to the Facebook servers plus XML processing for the responses. By performing the Facebook API calls in a new process we can return the rendered response to the browser immediately and let the Erlang VM schedule the Facebook API calls to happen leisurely in the background.

The only gotcha is that if an error occurs in the spawned process we can't notify the user right away -- but this isn't really problem because we can log the errors and retry later, which is arguably a better approach anyway from a usability standpoint.

It's really that easy! A simple call to spawn() makes our app *feel* much faster. This puts the debate around language performance comparisons in a new light: how do you take into account the observation that some languages make "cheating" so much easier? :)

Thursday, December 13, 2007

Amazon SimpleDB Runs on Erlang

I just came across this blog post: http://www.satine.org/archives/2007/12/13/amazon-simpledb/. The author got the scoop that Amazon's just-released SimpleDB runs on Erlang. This is actually the second Amazon Web Service that runs on Erlang -- at least, if the rumors I've heard that Amazon SQS was built with Erlang as well are true (update: I found the reference in this article).

It's pretty cool knowing that you can use the same language that powers some of the massive systems that Amazon and Ericsson have built to whip up a blog app just as easily as in Ruby or Python. (But if you built it in Erlang/ErlyWeb, you could also use Comet to make the blog update itself in real time, resting assured that horizontal scalability is just a matter of reconfiguring your Mnesia schema :) ).

Amazon SimpleDB is a game changer. It fills the last hole in making Amazon Web Services a complete environment for elastic, scalable web application hosting: the need for a fast, reliable data store. Together with S3 and EC2, SimpleDB makes it feasible for small startups to scale like the big companies but without the operational overhead. What a great set of products from Amazon.

Saturday, November 17, 2007

Erlang Quote of the Day

Who cares if Erlang starts slowly - it was designed to start once and never stop - we have systems that have run for 5 years - a two seconds start-up time amortized over 5 years is not *too* bad.

- Joe Armstrong

(From http://www.nabble.com/idea%3A-service-pack-one-tf4800752.html#a13735173)