Sunday, May 18, 2008

Erlang vs. Scala

In my time wasting activities on geeky social news sites, I've been seeing more and more articles about Scala. The main reasons I became interested in Scala are 1) Scala is an OO/FP hybrid, and I think that any attempt to introduce more FP concepts into the OO world is a good thing and 2) Scala's Actors library is heavily influenced by Erlang, and Scala is sometimes mentioned in the same context as Erlang as a great language for building scalable concurrent applications.

A few times, I've seen the following take on the relative merits of Scala and Erlang: Erlang is great for concurrent programming and it has a great track record in its niche, but it's unlikely to become mainstream because it's foreign and it doesn't have as many libraries as Java. Scala, on the other hand, has the best of both worlds. It has functional semantics, its Actors library provides Erlang-style concurrency, and it runs on the JVM and has access to all the Java libraries. This combination makes Scala a better choice for building concurrent applications, especially for companies that are invested in Java.

I haven't coded in Scala, but I did a good amount of research on it and it looks like a great language. Some of the best programmers I know rave about it. I think that Scala can be a great replacement for Java. Function objects, type inference, mixins and pattern matching are all great language features that Scala has and that are sorely missing from Java.

Although I believe Scala is a great language that is clearly superior to Java, Scala doesn't supersede Erlang as my language of choice for building high-availability, low latency, massively concurrent applications. Scala's Actors library is a big improvement over what Java has to offer in terms of concurrency, but it doesn't provide all the benefits of Erlang-style concurrency that make Erlang such a great tool for the job. I did a good amount of research into the matter and these are the important differences I think one should consider when choosing between Scala and Erlang. (If I missed something or got something wrong, please let me know. I don't profess to be a Scala expert by any means.)

Concurrent programming


Scala's Actors library does a good job of emulating Erlang-style message passing. Similar to Erlang processes, Scala actors send and receive messages through mailboxes. Like Erlang, Scala has pattern matching semantics for receiving messages, which results in elegant, concise code (although I think Erlang's simpler type system makes pattern matching easier in Erlang).

Scala's Actors library goes pretty far, but it doesn't (well, it can't) provide an important feature that makes concurrent programming so easy in Erlang: immutability. In Erlang, multiple processes can share the same data within the same VM, and the language guarantees that race conditions won't happen because this data is immutable. In Scala, though, you can send references to mutable objects between actors. This is the classic recipe for race conditions, and it leaves you just where you started: having to ensure synchronized access to shared memory.

If you're careful, you may be able to avoid this problem by copying all messages or by treating all sent objects as immutable, but the Scala language doesn't guarantee safe access to shared objects. Erlang does.
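To make the "copy or treat as immutable" discipline concrete, here is a minimal sketch in plain Java (not Scala, since the hazard comes from the JVM's shared heap rather than from the actors library itself). The names Snapshot and MailboxDemo are made up for this illustration, and the BlockingQueue only stands in for an actor mailbox; none of this is the Scala Actors API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical message type: the field is final and the list is defensively copied.
final class Snapshot {
    private final List<String> items;

    Snapshot(List<String> items) {
        // Copy first, then wrap: later changes to the caller's list cannot leak in.
        this.items = Collections.unmodifiableList(new ArrayList<>(items));
    }

    List<String> items() {
        return items;
    }
}

final class MailboxDemo {
    public static void main(String[] args) throws InterruptedException {
        // Stands in for an actor mailbox (an assumption of this sketch).
        BlockingQueue<Object> mailbox = new LinkedBlockingQueue<>();

        List<String> inventory = new ArrayList<>();
        inventory.add("sword");

        mailbox.put(inventory);               // unsafe: sender and receiver share one mutable list
        mailbox.put(new Snapshot(inventory)); // safer: receiver gets a frozen copy

        inventory.add("shield");              // sender keeps mutating after the send

        @SuppressWarnings("unchecked")
        List<String> shared = (List<String>) mailbox.take();
        Snapshot frozen = (Snapshot) mailbox.take();

        System.out.println("shared list sees " + shared.size() + " items");       // 2
        System.out.println("snapshot sees " + frozen.items().size() + " items");  // 1
    }
}
```

The point of the sketch is that nothing in the language stops the first put: the compiler accepts both sends, and only the second is safe. In Erlang the first kind of send can't be written at all.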

Hot code swapping


Hot code swapping is a killer feature. Not only does it (mostly) eliminate the downtime required to do code upgrades, it also makes a language much more productive because it allows for true interactive programming. With hot code swapping, you can immediately test the effects of code changes without stopping your server, recompiling your code, restarting your server (and losing the application's state), and going back to where you had been before the code change. Hot code swapping is one of the main reasons I like coding in Erlang.

The JVM has limited support for hot code swapping during development -- I believe it only lets you change a method's body at runtime (an improvement for this feature is in Sun's top 25 RFE's for Java). This capability is not as robust as Erlang's hot code swapping, which works for any code modification at any time.

A great aspect of Erlang's hot code swapping is that when you load new code, the VM keeps around the previous version of the code. This gives running processes an opportunity to receive a message to perform a code swap before the old version of the code is finally removed (which kills processes that didn't perform a code upgrade). This feature is unique to Erlang as far as I know.

Hot code swapping is even more important for real-time applications that enable synchronous communications between users. Restarting such servers would cause user sessions to disconnect, which would lead to poor user experience. Imagine playing World of Warcraft and, in the middle of a major battle, losing your connection because the developers wanted to add a log line somewhere in the code. It would be pretty upsetting.

Garbage collection


A common argument against GC'd languages is that they are unsuitable for low latency applications due to potential long GC sweeps that freeze the VM. Modern GC optimizations such as generational collection alleviate the problem somewhat, but not entirely. Occasionally, the old generation needs to be collected, which can trigger long sweeps.

Erlang was designed for building applications that have (soft) real-time performance, and Erlang's garbage collection is optimized for this end. In Erlang, processes have separate heaps that are GC'd separately, which minimizes the time a process could freeze for garbage collection. Erlang also has ets, an in-memory storage facility for storing large amounts of data without any garbage collection (you can find more information on Erlang GC at http://prog21.dadgum.com/16.html).

Erlang might not have a decisive advantage here. The JVM has a new concurrent garbage collector designed to minimize freeze times. This article and this whitepaper (PDF warning) have some information about how it works. This collector trades performance and memory overhead for shorter freezes. I haven't found any benchmarks that show how well it works in production apps, though, or whether it is as effective as Erlang's garbage collector for low-latency apps.

Scheduling


The Erlang VM schedules processes preemptively. Each process gets a certain number of reductions (roughly equivalent to function calls) before it's swapped out for another process. Erlang processes can't call blocking operations that freeze the scheduler for long periods. All file IO and communications with native libraries are done in separate OS threads (communications are done using ports). Similar to Erlang's per-process heaps, this design ensures that Erlang's lightweight processes can't block each other. The downside is some communications overhead due to data copying, but it's a worthwhile tradeoff.

Scala has two types of Actors: thread-based and event-based. Thread-based actors execute in heavyweight OS threads. They never block each other, but they don't scale to more than a few thousand actors per VM. Event-based actors are simple objects. They are very lightweight, and, like Erlang processes, you can spawn millions of them on a modern machine. The difference from Erlang processes is that within each OS thread, event-based actors execute sequentially without preemptive scheduling. This makes it possible for an event-based actor to block its OS thread for a long period of time (perhaps indefinitely).

According to the Scala actors paper, the actors library also implements a unified model, by which event-based actors are executed in a thread pool, which the library automatically resizes if all threads are blocked due to long-running operations. This is pretty much the best you can do without runtime support, but it's not as robust as the Erlang implementation, which guarantees low latency and fair use of resources. In a degenerate case, all actors would call blocking operations, which would increase the native thread pool size to the point where it can't grow anymore beyond a few thousand threads.

This can't happen in Erlang. Erlang only allocates a fixed number of OS threads (typically, one per processor core). Idle processes don't impose any overhead on the scheduler. In addition, spawning Erlang processes is always a very cheap operation that happens very fast. I don't think the same applies to Scala when all existing threads are blocked, because this condition first needs to be detected, and then new OS threads need to be spawned to execute pending Actors. This can add significant latency (this is admittedly theoretical: only benchmarks can show the real impact).

Depending on what you're doing, the difference between process scheduling in Erlang and Scala may not impact performance much. However, I personally like knowing with certainty that the Erlang scheduler can gracefully handle pretty much anything I throw at it.
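To make the blocked-thread scenario more tangible, here is a small Java sketch of what happens when every thread in a pool is stuck in a blocking call. It uses a plain fixed-size ExecutorService from java.util.concurrent, which is only a stand-in: the Scala actors scheduler resizes its pool, as described above, so this shows the situation it has to detect and react to rather than how it behaves. The class and method names are made up for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class StarvationDemo {
    // Returns true if a short task managed to run while every pool thread was blocked.
    static boolean shortTaskRanWhileBlocked(int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch allBlocked = new CountDownLatch(poolSize);
        try {
            // Occupy every worker thread with a simulated blocking call.
            for (int i = 0; i < poolSize; i++) {
                pool.submit(() -> {
                    allBlocked.countDown();
                    release.await();
                    return null;
                });
            }
            allBlocked.await(); // wait until all workers are really stuck

            // This task is trivial, but there is no free thread to run it on.
            Future<String> quick = pool.submit(() -> "done");
            quick.get(200, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException starved) {
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // unblock the workers so the pool can shut down
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("short task ran: " + shortTaskRanWhileBlocked(2)); // false
    }
}
```

A pool that grows on demand avoids the stall, but only by paying for a new OS thread per blocked task, which is the overhead discussed above.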

Distributed programming


One of Erlang's greatest strengths is that it unifies concurrent and distributed programming. Erlang lets you send a message to a process on the local VM or on a remote VM using exactly the same semantics (this is sometimes referred to as "location transparency"). Furthermore, Erlang's process spawning and linking/monitoring work seamlessly across nodes. This takes much of the pain out of building distributed, fault-tolerant applications.

The Scala Actors library has a RemoteActor type that apparently provides similar location transparency, but I haven't been able to find much information about it. According to this article, it's also possible to distribute Scala actors using Terracotta, which does distributed memory voodoo between nodes in a JVM cluster, but I'm not sure how well it works or how simple it is to set up. In Erlang, everything works out of the box, and it's so simple to get working that it's in the language's Getting Started manual.

Mnesia


Lightweight concurrency with no shared memory and pure message passing semantics is a fantastic toolset for building concurrent applications... until you realize you need shared (transactional) memory. Imagine building a WoW server, where characters can buy and sell items between each other. This would be very hard to build without a transactional DBMS of sorts. This is exactly what Mnesia provides -- with a number of extra benefits such as distributed storage, table fragmentation, no impedance mismatch, no GC overhead (due to ets), hot updates, live backups, and multiple disc/memory storage options (you can read the Mnesia docs for more info). I don't think Scala/Java has anything quite like Mnesia, so if you use Scala you have to find some alternative. You would probably have to use an external DBMS such as MySQL Cluster, which may incur a higher overhead than a native solution that runs in the same VM.

Tail recursion


Functional programming and recursion go hand-in-hand. In fact, you could hardly write working Erlang programs without tail recursion because Erlang doesn't have loops -- it uses recursion for *everything* (which I believe is a good thing :) ). Tail recursion serves for more than just style -- it also facilitates hot code swapping. Erlang gen_servers call their loop() function recursively between calls to 'receive'. When a gen_server receives a code_change message, it can make a remote call (e.g. Module:loop()) to re-enter its main loop with the new code. Without tail recursion, this style of programming would quickly result in stack overflows.

From my research, I learned that Scala has limited support for tail recursion due to bytecode restrictions in most JVMs. From http://www.scala-lang.org/docu/files/ScalaByExample.pdf:


In principle, tail calls can always re-use the stack frame of the calling function. However, some run-time environments (such as the Java VM) lack the primitives to make stack frame re-use for tail calls efficient. A production quality Scala implementation is therefore only required to re-use the stack frame of a directly tail-recursive function whose last action is a call to itself. Other tail calls might be optimized also, but one should not rely on this across implementations.


(If I understand the limitation correctly, tail call optimization in Scala only works within the same function, i.e. x() can make a tail recursive call to x(), but if x() calls y(), y() couldn't make a tail recursive call back to x().)
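For readers who haven't hit this on the JVM, here is a small Java sketch of the mutual-recursion case and of one common workaround, a trampoline, in which each tail call is returned as data and run from a loop. This is my own illustration, not something the Scala documentation quoted above prescribes; Bounce and Parity are made-up names.

```java
import java.util.function.Supplier;

// A step is either a finished value or a thunk that produces the next step.
final class Bounce<T> {
    private final T value;
    private final Supplier<Bounce<T>> next;

    private Bounce(T value, Supplier<Bounce<T>> next) {
        this.value = value;
        this.next = next;
    }

    static <T> Bounce<T> done(T value) {
        return new Bounce<>(value, null);
    }

    static <T> Bounce<T> more(Supplier<Bounce<T>> next) {
        return new Bounce<>(null, next);
    }

    // Runs the chain of steps in a loop, so the stack depth stays constant.
    T run() {
        Bounce<T> step = this;
        while (step.next != null) {
            step = step.next.get();
        }
        return step.value;
    }
}

final class Parity {
    // Naive mutual recursion: every call adds a JVM stack frame, so a large n
    // is likely to end in a StackOverflowError with default stack sizes.
    static boolean isEvenNaive(int n) { return n == 0 || isOddNaive(n - 1); }
    static boolean isOddNaive(int n)  { return n != 0 && isEvenNaive(n - 1); }

    // Trampolined version: the "tail call" is handed back instead of invoked.
    static Bounce<Boolean> isEven(int n) {
        if (n == 0) {
            return Bounce.done(true);
        }
        return Bounce.more(() -> isOdd(n - 1));
    }

    static Bounce<Boolean> isOdd(int n) {
        if (n == 0) {
            return Bounce.done(false);
        }
        return Bounce.more(() -> isEven(n - 1));
    }

    public static void main(String[] args) {
        System.out.println(isEven(1_000_000).run()); // true
    }
}
```

The trampoline works, but it costs an allocation per step and changes the shape of the code, which is the kind of ceremony a runtime with real tail calls lets you skip.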

In Erlang, tail recursion Just Works.

Network IO


Erlang processes are tightly integrated with the Erlang VM's event-driven network IO core. Processes can "own" sockets and send and receive messages to/from sockets. This provides the elegance of concurrency-oriented programming plus the scalability of event-driven IO (the Erlang VM uses epoll/kqueue under the covers). From Googling around, I haven't found similar capabilities in Scala actors, although they may exist.

Remote shell


In Erlang, you can get a remote shell into any running VM. This allows you to analyze the state of the VM at runtime. For example, you can check how many processes are running, how much memory they consume, what data is stored in Mnesia, etc.

The remote shell is also a powerful tool for discovering bugs in your code. When the server is in a bad state, you don't always have to try to reproduce the bug offline somehow to devise a fix. You can log right into it and see what's wrong. If it's not obvious, you can make quick code changes to add more logging and then revert them when you've discovered the problem. I haven't found a similar feature in Scala/Java from some Googling. It probably wouldn't be too hard to implement a remote shell for Scala, but without hot code swapping it would be much less useful.

Simplicity


Scala runs on the JVM, it can easily call any Java library, and it is therefore closer than Erlang to many programmers' comfort zones. However, I think that Erlang is very easy to learn -- definitely easier than Scala, which contains a greater total number of concepts you need to know in order to use the language effectively (especially if you consider the Java foundations on which Scala is built). This is to a large degree due to Erlang's dynamic typing and lack of object orientation. I personally prefer Erlang's more minimalist style, but this is a subjective matter and I don't want to get into religious debates here :)

Libraries


Java indeed has a lot of libraries -- many more than Erlang. However, this doesn't mean that Erlang has no batteries included. In fact, Erlang's libraries are quite sufficient for many applications (you'll have to decide for yourself if they are sufficient for you). If you really need to use a Java library that doesn't have an Erlang equivalent, you could call it using Jinterface. It may or may not be a suitable option for your application. This can indeed be a deal breaker for some people who are deciding between the two languages.

There's an important difference between Java/Scala and Erlang libraries besides their relative abundance: virtually all "big" Erlang libraries use Erlang's concurrency and fault-tolerance features. In the Erlang ecosystem, you can get web servers, database connection pools, XMPP servers, and database servers, all of which use Erlang's lightweight concurrency, fault tolerance, etc. Most of Scala's libraries, on the other hand, are written in Java and they don't use Scala actors. It will take Scala some time to catch up to Erlang in the availability of libraries based on Actors.

Reliability and scalability


Erlang has been running massive systems for 20 years. Erlang-powered phone switches have been running with nine nines availability -- only 31ms downtime per year. Erlang also scales. From telecom apps to Facebook Chat, we have enough evidence that Erlang works as advertised. Scala, on the other hand, is a relatively new language and as far as I know its actors implementation hasn't been tested in large-scale real-time systems.

Conclusion


I hope I did justice to Scala and Erlang in this comparison (which, by the way, took me way too long to write!). Regardless of these differences, though, I think that Scala has a good chance of being the more popular language of the two. Steve Yegge explains it better than I can:


Scala might have a chance. There's a guy giving a talk right down the hall about it, the inventor of – one of the inventors of Scala. And I think it's a great language and I wish him all the success in the world. Because it would be nice to have, you know, it would be nice to have that as an alternative to Java.

But when you're out in the industry, you can't. You get lynched for trying to use a language that the other engineers don't know. Trust me. I've tried it. I don't know how many of you guys here have actually been out in the industry, but I was talking about this with my intern. I was, and I think you [(point to audience member)] said this in the beginning: this is 80% politics and 20% technology, right? You know.

And [my intern] is, like, "well I understand the argument" and I'm like "No, no, no! You've never been in a company where there's an engineer with a Computer Science degree and ten years of experience, an architect, who's in your face screaming at you, with spittle flying on you, because you suggested using, you know... D. Or Haskell. Or Lisp, or Erlang, or take your pick."


Well, at least I'm not trying too hard to promote LFE... :)

52 comments:

daaku said...

I've been reading up on Scala after being introduced to Erlang a few weeks ago. This is a great summary of the differences. It's clear that the advantage of a bigger existing code base for the JVM is not a good enough reason to choose Scala. Mnesia and distributed programming make Erlang the better choice for me.

Vikas said...

I have been programming in scala for the last few months and I really love it! Yes, its not erlang, but Martin Odersky during the JavaOne talk about Scala mentioned that the actors model was based on Erlang's concurrency implementation. yes, Erlang might be a much better option when it comes to building large scale concurrent applications, but scala is really the best replacement to Java I have seen and its got all the coveted features which people are trying to push into Java 7 and beyond.
The only regret I have about scala is the lack of IDE support, the Eclipse plug-in still needs a lot of work and I think when the IDEs support Scala better, it will be adopted a lot better by the industry.
Adam Bein(http://www.adam-bien.com/roller/abien/entry/java_net_javaone_which_programming) had this on his blog:
During a meeting in the Community Corner (java.net booth) with James Gosling, a participant asked an interesting question: "Which Programming Language would you use *now* on top of JVM, except Java?". The answer was surprisingly fast and very clear: - Scala.

shadowcoder said...

By first-order functions do you mean higher-order functions? Or what is a first-order function?

Adrian said...

Wow! Thanks for this amazing comparison!

Yariv said...

@shadowcoder I meant "function objects" :) Thanks for pointing out the typo.

Kerris said...

Vikas: If you don't mind NetBeans, have a look at Caoyuan's [1] Scala for NetBeans [2].

[1] http://blogtrader.net/page/dcaoyuan/
[2] http://wiki.netbeans.org/Scala

Oliver said...

When I got fed up with the Java bloat I looked at Scala, and it seemed to be interesting. But in the end it's too big, by including all the Java features and more, and I abandoned it--in favour of Erlang. One good thing that came out of looking at Scala was that it provided me with a pointer to Erlang.

The main reason why I prefer Erlang is that the language is a lot smaller, so that there is less to learn in terms of syntax. And the easy concurrency provides a good reason for using it. My only problem with Erlang is the difficulty of dealing with a cross-platform GUI; that's where Scala might have been better.

Holger Hoffstätte said...

Yariv, your argument about the need for something like Mnesia is good (just like anything you write ;) but stating that Java has no equivalent is not quite correct. JavaSpaces as "shared brain" between decoupled services and the underlying Jini model in fact provide _many_ features that both Erlang and therefore Mnesia lack - code bootstrapping/migration, transparent fault resilience (arguable), distributed events and most importantly a much stronger security model for authentication, authorization and insulation against malicious code. Erlang is just plain weak in this regard - understandably so since it was made for trusted environments, but still. I think I read that Joe himself regards the security model as one of Erlang's biggest shortcomings.
Also (I swear I don't work for them ;) Gigaspaces as the most prominent commercial implementation of the JavaSpaces spec is also _a lot_ more scalable than Mnesia, both horizontally and vertically.
When coupled with grid infrastructure like Rio (also a Jini service) or an OSGi-based adaptive provisioning mechanism like Newton/Infiniflow, Java is nowhere near the end of the road when it comes to distributed systems and even has a few things that Erlang might want to look at.

Dennis said...

Pretty convincing...I recently saw a post claiming that mnesia's limitation on large databases has been fixed, is that true?

Holger Hoffstätte said...

Dennis: it is probably OK when running on 64bit and good storage, but even then the requirements and constraints are just different - when people say "database" they typically mean an RDBMS that acts as store/archive, however nobody would use Mnesia (or a Tuplespace) for data warehousing, OLAP or the like..just like Oracle (or any other RDBMS) is in many ways quite useless as a low-latency coordination or distribution mechanism for a 24/7/always-evolving system.
But this has nothing to do with Scala anymore ;)

Andy said...

Dennis:

Where did you read that mnesia’s limitation on large databases has been fixed?

Can you post the link? I'd like to learn more about that. Thanks.

Peter Eddy said...

Wow, all the questions I've been wondering about myself but have been too busy to investigate. Thanks!

Dennis said...

Don't have a link, it was just a comment somewhere that said no more than "mnesia's storage limitation has been fixed."

It'd be really great to use mnesia for something like slashdot or reddit...no need to deal with mysql, replication, sharding, just drop everything in mnesia and let it take care of the rest. Sorta like Amazon's SimpleDB or Google App Engine's Datastore. CouchDB is heading that direction but it's kinda young, plus last I checked I couldn't find any documentation on working with it directly from Erlang.

An Erlang interface to SimpleDB/S3 might be the next best thing, but given SimpleDB's limitations, you'd have a multistep process to retrieve a discussion.

(Since Yariv mentioned mnesia I figured this wasn't TOO off topic :)

Daws said...

Excellent article.

I think the key difference between Erlang and just about any other language out there is its problem domain. Erlang is made for making highly efficient, highly reliable services. Java and Scala, as far as I know, are general purpose languages.

You can strap on additional features to an existing language, like message passing and pattern matching, but you are not going to match up to a language that is built from the ground up for this particular purpose, such as Erlang for servers. Doing so is like putting extra gunpowder in a gun to get more power; you are only going to get so much of an increase but eventually you will need a bigger barrel to fit a bigger bullet that has more mass.

Daniel said...

Terracotta is also a functional equivalent to Mnesia. You can't compare the two feature-wise, because they both attempt to solve the same problem, but they go about it in different ways.

Probably, YMMV depending on the specific problem you have.

Daniel Spiewak said...

Very nice comparison! It's rare that you see a language comparison like this which doesn't degrade into bashing of one or the other. To answer a few of your points...

Scala does not guarantee immutability, because to do so would break compatibility with most Java libraries, but it does strongly encourage the use of immutable data and such. As a developer, you have to be aware that if you are sharing mutable state, then bad things can happen.

Hot code swapping is implemented by several Java solutions, most prominently JavaRebel. I use hot swapping all the time when I run applications in debug mode in Eclipse, so it's hardly something that the JVM doesn't support.

WRT Mnesia and embedded RDBMS, Scala has access to numerous Java libraries which can provide similar functionality. In a distributed sense, you can consider Terracotta to be something of a solution, but for proper database options you might want something more like HSQLDB or Derby. HSQLDB is *extremely* lightweight and fast, so much so that it's almost considered to be something other than an RDBMS by ORM designers. I don't know how well it distributes, but I'm assuming that with Terracotta governing the backend, you can probably do something fancy.

The JVM's lack of tail recursion optimization is annoying, but they are working to fix it in MLVM (code named "the Da Vinci Machine").

Sockets are preemptively locked in the Java standard library, which means that once you open the socket, you own that port. Scala Actors would have no difficulty with this function. Even using NIO (which is asynchronous, similar to Erlang's IO backend), the socket can still be held by a single Actor.

JMX allows highly extensible "remote shell" like stuff. You can't exactly shell into the process itself and run arbitrary Scala expressions, but you can find out just about anything you want to know, as well as even kicking off some tasks as necessary.

WRT reliability, Scala is literally just Java with a really fancy frontend. You can compile Scala sources, then decompile the result and get readable Java in return. Java has been used on large scale enterprise systems for well over a decade, and since Scala inherits that track record, I think it's safe to say that it's pretty reliable.

With that said, I wouldn't try to write a massively distributed, high availability system with it; that would be a task better suited to Erlang. :-) But I think Scala is capable of a bit more than you give it credit for.

Nick Kallen said...

Very informative article, thanks.

Hermann said...

Very good comparison analysis. Really entertaining. Congratulations.

But it is really funny to see all the fuss about functional programming in the last months. I began working with functional programming a few years ago at college, mostly related to Erlang, Haskell and SML. At that time, all my mates spent a good amount of time saying how horrible and complicated those kinds of languages were.

Now, functional programming is becoming mainstream and in the future I believe more and more companies will be hiring developers to work with some sort of functional design... I can even see a lot of geeks, who used to hate FP, starting to advocate how powerful and elegant functional programming can be.

The world is really funny...

David Pollak said...

Nice comparison... but I think you missed a few things:

Immutability -- Erlang enforces this and there's almost no way around it. But you trade the rest of Scala's amazingly powerful type system for enforcement of this single type. I do my Scala Actor coding with immutable data and I have Scala's type system to enforce the rest of my types.

Hot code swapping -- You can code your Scala Actors to support hot code swapping the same way you code your Erlang Actors. The pattern that's matched in a Scala Actor is a class (an instance of PartialFunction) that can be swapped at runtime.

GC -- The JVM's GC is a well researched issue. What you trade for having global, but slightly higher pause GC is not having to copy data when messaging between processes in the same address space. I'd bet that the cost of copying is still much higher than the cost of low-pause GC in the JVM.

MNesia -- It's cool, unless you have to cold-start it and then you've got redo-log hell. On the other hand, Steve Yen's ActorD/Treap stuff is interesting.

Tail recursion -- It's a non-issue for event-based actors.

Remote Shell -- you can attach a debugger to a JVM at any point. How is this different than a Remote Shell?

Simplicity -- This is the emacs vs. vi thing. I find it simpler to write code when I have the option to mutate my data. I find it simpler to write code when the compiler checks my types. I find the Scala syntax vastly superior to that of Erlang.

Reliability and Scalability -- for every switch that's running Erlang, there's a JVM app server running in a financial services institution.

I agree that Erlang has some nice advantages, but many of those advantages are present in Scala. More importantly, Scala is fast and runs on top of a much more manageable platform.

Yariv said...

@David

Isn't sending only immutable types a big limitation? It means you can't, for example, load a simple bean from Hibernate and send it to another actor.

You can code your actors to behave like Erlang processes when they do code swapping, but the JVM nonetheless limits the kind of code swapping you can do (and it certainly doesn't support keeping the older generation of code until a code swap takes place). This is a runtime issue more than a language issue. However, I also think hot code swapping is inherently more complicated in OO languages because of their tight coupling between data and code. If it were easy to do, Java would have had it by now.

Mnesia is indeed optimized for pure in-memory storage because of its original low latency requirements, which covers a wide range of applications. It's been used in production quite successfully.

Erlang's architects decided that per-process GC is worth the message copying overhead (except for large binaries), and I'm sure they had good reasons.

Remote shell is different from a debugger. It isn't intended for setting breakpoints and stepping through code. It's a command line interface to your running application. Also, you can remote debug Java programs, but they need to be run in debug mode, which is slow and I doubt many people use it in production.

I think immutability adds simplicity when writing and especially when reading code. Erlang has tools for (some) static type checking but I never use them. Interacting with my code in the shell and writing tests (which you have to do anyway to make sure your code works) takes care of almost all type errors. The syntax issue is subjective. I've seen people complain about Scala's syntax too. I think once you get used to a syntax it "disappears."

Reliability and scalability -- I'm sure that Java is used extensively in the financial world, but I haven't seen much data about how it's used (web applications? trading systems?) and what kind of performance and uptime it gets for comparing it to Erlang.

Speed depends very much on architecture and problem domain, not just on raw CPU performance for pure number crunching. According to Yegge, Rails has outperformed Struts (http://steve-yegge.blogspot.com/2008/05/dynamic-languages-strike-back.html) despite Ruby being slower. Erlang is certainly fast enough for many applications.

I'm not sure what makes the Scala platform more manageable (maybe unless a company has already invested in Java infrastructure).

Yariv said...

Btw, people used to say that Java is slow... :) (I'm sure some still do)

Yariv said...

A great reddit comment from the discussion: http://reddit.com/r/programming/info/6jwa8/comments/c0427s2

Steve Nicolai said...

"but the JVM nonetheless limits the kind of code swapping you can do (and it certainly doesn’t support keeping the older generation of code until a code swap takes place)."

With the JVM, it is possible to keep several versions of code around. Web containers do this for application redeployment, keeping the old version of the webapp going while the requests drain out of it, but sending new requests to the new version. BEA calls it Production Redeployment, see

http://edocs.bea.com/wls/docs92/deployment/redeploy.html

This is done through a JVM feature called classloaders. Each version of a webapp is loaded in its own classloader, so the webapps are isolated from each other. BEA's limitation in their implementation is that no data is transferred from the old version to the new version. All data that should be shared between the two needs to be persisted outside the webapp, e.g. the database, JMS, etc.

Scala could easily take advantage of classloaders in the JVM as well to provide hot code loading.

David Pollak said...

Yariv,

The JVM allows loading of new classes at runtime. This is how JSPs work. This is how servlet containers load new contexts, etc. You can load a new class representing a partial function (the pattern match thing in an Actor) and poke your Actor to use the new partial function rather than the old one. This is not JVM level hot code replacement (the thing you are complaining of), but in fact exactly the same mechanism that Erlang uses to swap in new Actor code. So, practically and functionally, Scala Actors and Erlang Actors have the same hot code swapping capabilities.
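A minimal sketch of the handler swap described above, written in plain Java rather than with the Scala Actors library. All names here (`SwappableActor`, `Swap`, `send`, `replies`) are hypothetical, and the replacement handler is an ordinary lambda standing in for a class that would be loaded at runtime:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical names throughout; this is not the Scala Actors API.
final class SwappableActor {
    // Special message that asks the actor to replace its own handler.
    static final class Swap {
        final Function<Object, String> next;
        Swap(Function<Object, String> next) { this.next = next; }
    }

    private Function<Object, String> behavior;
    private final List<String> replies = new ArrayList<>();

    SwappableActor(Function<Object, String> initial) { this.behavior = initial; }

    // Process one message; a Swap message changes how later messages are handled.
    synchronized void send(Object message) {
        if (message instanceof Swap) {
            behavior = ((Swap) message).next;
        } else {
            replies.add(behavior.apply(message));
        }
    }

    // Return a copy so callers cannot modify the actor's internal state.
    synchronized List<String> replies() { return new ArrayList<>(replies); }

    public static void main(String[] args) {
        SwappableActor actor = new SwappableActor(m -> "v1:" + m);
        actor.send("ping");
        actor.send(new Swap(m -> "v2:" + m)); // "poke" the actor with new code
        actor.send("ping");
        System.out.println(actor.replies()); // [v1:ping, v2:ping]
    }
}
```

The actor object and its accumulated state survive the swap; only the handler reference changes, which is the point of the comparison with Erlang.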

In terms of "Rails is plenty fast enough", that's clearly wrong. Once you have a site that has high enough volume to care, you're dead in the water with Rails (see Twitter.) The JVM has radically faster String handling, XML handling and just about everything else handling than does Erlang. Once you get into serving pages and doing the things that sites want to do, you want a runtime that's fast, stable, and manageable. Ruby has none of these. Erlang has two of these. JVM has three of these.

Finally, in terms of your immutability argument, if you don't want to pass mutable objects, then don't. I can do 50 things in Erlang that are frowned upon. The argument that you can do things in the JVM that make it difficult to use a particular technology doesn't make any sense at all. I've built a number of production systems based on Scala Actors. There's very little work actually required to deal with the immutability issue. You just define your case classes (the messages) to be immutable and away you go.
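For illustration, here is roughly what an immutable message looks like when written out by hand in Java; a Scala case class with immutable fields would be far shorter. `ChatMessage` and its fields are made-up names, and the defensive copy is one common way (an assumption here, not something the comment prescribes) to keep a mutable list from leaking into the message:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical message type, analogous in spirit to an immutable Scala case class.
final class ChatMessage {
    private final String sender;
    private final List<String> recipients;

    ChatMessage(String sender, List<String> recipients) {
        this.sender = sender;
        // Defensive copy: later changes to the caller's list cannot leak into the message.
        this.recipients = Collections.unmodifiableList(new ArrayList<>(recipients));
    }

    String sender() { return sender; }
    List<String> recipients() { return recipients; }

    public static void main(String[] args) {
        List<String> to = new ArrayList<>();
        to.add("alice");
        ChatMessage msg = new ChatMessage("bob", to);
        to.add("mallory");                     // mutating the original list...
        System.out.println(msg.recipients()); // ...does not affect the message: [alice]
    }
}
```

Note that this only works because every field is itself immutable or copied; a message holding a reference to a mutable object would still be shared mutable state.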

Coding in Erlang is a matter of taste. If it works for you, great. But a lot of the advantages that you claim are not real or substantiated by the facts (e.g., loading new [not replacing existing] patterns into Actors or Garbage Collection). You've done really great things with Erlang and ErlyWeb, but I don't see Erlang as the slam dunk over Scala that your article suggests.

Thanks,

David

Yariv said...

In my experience working with Java, hot code replacement was definitely lacking. Often I would try to make a change as I'm stepping through code in the debugger, and Eclipse would complain that the change couldn't be loaded without restarting the process. This was a significant speed bump during development. This RFE http://bugs.sun.com/bugdatabase/view_bug.do;jsessionid=75918701bbfba9ffffffff98ef0a2fcb0863c?bug_id=4910812 confirms it. I know that Java can load new classes at runtime, but it can't change (the type signatures of) existing ones.

Rails may have its issues. I'm not a Rails expert, but I've heard many voices saying that you can scale Rails (well, any language) if you architect your system properly. Twitter has had issues, but the developers of other big sites such as Scribd and Friends for Sale have said that Rails scales fine if you know what you're doing. I've also seen systems written in Java that don't scale well, btw. It usually comes down to DB bottlenecks. Raw speed isn't an important factor in scalability.

I don't think Erlang is a slam dunk over Scala, and I didn't mean the article to make it sound like I do. I think Scala has many advantages over Erlang and for many applications and teams Scala may very well be the better choice. However, I think Erlang has enough advantages over Scala for me to stick with Erlang as my preferred tool, especially for building scalable real-time/comet applications such as Meebo/FB chat, etc, where Erlang's concurrent and distributed programming truly shines.

Yariv said...

Btw, about loading new code for actors, I didn't realize this was possible. Thanks for explaining it.

Chris Hansen said...

First of all, great post. It has a slight bias, but that is to be expected. I prefer Scala myself, since it's a general purpose language, but Erlang does what it does better than anything else I've heard about.

The back-and-forth of the comments keeps coming back to the supposed lack of hot code swapping on the JVM. By default, HotSpot only supports changes inside of method bodies - this is not acceptable. It turns out that fuller code swapping can be done in a library - without starting the JVM in debug mode. One such library has been mentioned already (JavaRebel), but it seems you have missed it. http://www.zeroturnaround.com/javarebel/features/

So far, it can't do everything that is possible out-of-the-box in Erlang, but it gets most of the way there. I wouldn't run it in a production environment (yet), but it really makes a difference to me during development.

Yariv said...

Thanks for the tip about JavaRebel.

How is Scala a general purpose language and Erlang not?

If you compare their origins, Erlang was designed for building scalable fault tolerant distributed systems whereas Java/JVM was designed for building applets that run on set top boxes.

Zubin Wadia said...

Good post, Yariv, and also good responses from David.

It has been clear to me what the benefits of each language are.

It's nice to see two experienced campaigners reinforce that perception.

What did you think about FB's use of C++ to handle chat logging for optimal file I/O?

@Yariv - Would this be a pattern you would advocate for applications of this nature?

jau said...

>> How is Scala a general purpose language and Erlang not?

>> If you compare their origins, Erlang was designed for building
>> scalable fault tolerant distributed systems whereas Java/JVM
>> was designed for building applets that run on set top boxes.


Scala has been designed to be a general purpose language from the beginning. It blends both OO and functional approaches. It gives you choice. The JVM is only a platform, which also happens to be mature and rich in libraries. Scala can also target the CLR. Of course it may be the case that Erlang and its VMs will be as popular as the JVM someday, but currently Scala seems the best language for me.

I'm not a "distributed systems" guy so maybe I'm wrong, but I think that the JVM and Scala can be used to implement some Erlang concepts like remote error handling, message passing and other concepts familiar to you. However, AFAIK the JVM's processes and threads are too heavyweight compared to Erlang's.

Does anyone know both Erlang and Scala / JVM well?

Jevgeni Kabanov said...

Since there was some talk here about JavaRebel I think it is also relevant that for Scala JavaRebel is now free: http://www.zeroturnaround.com/news/scala-goes-dynamic-with-javarebel/

Chris Hansen said...

> How is Scala a general purpose language and Erlang not?
>
> If you compare their origins, Erlang was designed for
> building scalable fault tolerant distributed systems...

That's just it. Erlang was built for a specific purpose, and it fulfills that purpose well.

Scala was designed to give you the tools to write everything from scripts of a few lines to codebases with millions of lines of code - no matter what the application. Only time will tell if it lives up to that lofty goal. Pushing a hybrid OO and FP approach is a good start, though.

Yariv said...

I disagree. The reason I brought up that comparison is that saying that Erlang is only good for building distributed systems is like saying that Java is only good for building applets. Check out wings3d and vimagi for counterexamples. Erlang is a very simple language and is quite suited for writing anything from short scripts to huge systems with millions of lines of code. Scala is a lot more complex than Erlang and to use Scala effectively you need to be familiar not with one, but with two languages, Scala and Java, because Scala relies on so many Java APIs. Finally, a pure FP language is arguably superior to an OO/FP hybrid, not just because of simplicity and expressiveness but also because once you go OO you sacrifice immutability, you lose readability, and you open a can of worms as far as concurrency goes. Maybe Scala gurus know to be careful enough to sidestep the concurrency pitfalls involved with mutable data but try scaling this expertise to bigger teams and you'll face a very serious challenge. In the scalability department, I think Erlang wins.

Yariv said...

@Zubin I don't know much about why Facebook decided to go with C++. I think it's because they have a bunch of backend services written in C++ and exposed using thrift, and it was easier to set up an Erlang/C++ bridge than rewrite those things in Erlang. It's just speculation on my part, though.

The Scala vs Erlang whirlwind at Ted Leung on the Air said...

[...] Sadan has done a pile of stuff in Erlang, and supplied his own summary of the differences between Scala and Erlang. There is a very informative exchange between Yariv and [...]

pk11 said...

"In terms of “Rails is plenty fast enough”, that’s clearly wrong. Once you have a site that has high enough volume to care, you’re dead in the water with Rails (see Twitter.) The JVM has radically faster String handling, XML handling and just about everything else handling than does Erlang. Once you get into serving pages and doing the things that sites want to do, you want a runtime that’s fast, stable, and manageable. Ruby has none of these. Erlang has two of these. JVM has three of these."

David,
With all respect, it's just not true. Even if the JVM made a web solution scalable (I doubt it), Rails has a JVM-targeted deployment option too (via JRuby).

As lots of people have mentioned, Twitter's issues at this point are probably more architecture related than Rails or Ruby related. And it's clear that there are lots of high traffic Rails apps out there. So architecture is what matters, not the lang/framework, when it comes to scalability, even if Scala is scalable by definition :D

Tim said...

As of right now, Erlang has miserable performance at file I/O, string processing, and regular expressions, which makes it a non-starter for some apps. That aside, as a light user of Erlang and a mere student of Scala, Erlang feels way more elegant, if only because it's smaller. But that's a pretty nice platform Scala runs on.

Yariv said...

@Tim Didn’t Erlang perform quite well in the wide finder project, beating most languages? I actually think that whatever weaknesses Erlang may or may not have in string processing and file I/O (of which I have seen scant evidence btw) are insignificant for the majority of applications for which people consider Erlang, webapps included. These complaints are reminiscent of the “Ruby/Java/PHP is slow” arguments that many practitioners of those languages have happily ignored while going on to build their successful apps.

Tim said...

For Wide Finder, it turned out (my interpretation) that coarse-grained parallelism was the winning formula. Thus, the winners were Perl and Python. Erlang placed very well, though, with all sorts of horrid workarounds for the file/string-processing weakness.

Tim said...

D'oh, and I should point to the Wide Finder summary, http://www.tbray.org/ongoing/When/200x/2007/10/30/WF-Results and to the currently-in-progress round 2: http://www.tbray.org/ongoing/What/Technology/Concurrency/ - come and help explore!

Your Bear said...

I am very surprised that the OTP framework of Erlang/OTP was not mentioned. (Or did I miss it in the text?)

Eric said...

This is a great post. I love your blog, keep it up.

Shawn said...

If you want to waste some more time it would be interesting to see a comparison with Clojure in these categories :)

As a Lisp on the JVM, there will be similarities with Scala's answers on features like hot code swapping, garbage collection, recursion, network I/O, libraries, and reliability. But I have to say David makes a good case on all of those.

Clojure's author has issues with "location transparency", so don't hold your breath on that type of distribution.

Since it's Lisp, you get a lot of the dynamism and FP concepts that you know and love (a REPL, of course; I'm not sure how easy it is to get a remote one).

A big feature of Clojure is its immutable data structures, which all work with the built-in software transactional memory and asynchronous agent/action systems for concurrency. In terms of scheduling, I believe those systems are built on threads.

No Mnesia, but I wonder if one could make a really nice one using the pure Java version of Berkeley DB.

I'm not trying to change your mind. You like Erlang and it has what you need today. But as a language for Java environments, it looks promising.

Humanist → Why Make Erlang a Functional Language? said...

[...] — Yariv Sadan [...]

World of Warcraft, keeping track of TODO items, and more. said...

[...] Erlang vs Scala : I like Erlang (related post) for various reasons, none of them related to it being partially a functional language. Apparently, twitter switched to Scala from Ruby (the kind of mistakes people make) and it has worked pretty well, so far, for them. [...]

Keith Kim’s Blog » Blog Archive » Erlang and Java (and Scala too) said...

[...] is a posting about Erlang and Scala: http://yarivsblog.com/articles/2008/05/18/erlang-vs-scala/ I only read and played around basics with Scala and thought it’s just a way to add FP to [...]

nickelcode » Erlang and Cloud Computing: A Fine Pair Indeed said...

[...] to applications, and it only takes one look at the comments in one of Yariv’s posts ( Erlang vs Scala ) to see there are a lot of [...]

下一波流行的开发语言是? « 广告时间 said...

[...] Look here to learn about the past and present of Erlang and of Scala, along with a comparison of the two. [...]

Szymon Jeż said...

Very good comparison. My impression is that Erlang is the better choice for building concurrent systems, at least for now.

Scala: The Successor to the Throne « Metaphysical Developer said...

[...] on a class is not really immutability, since referring objects may not be immutable themselves. And there is no way at the moment to [...]

Trabajo said...

Java indeed has a lot of libraries.