Monday, May 12, 2008

Erlang does have shared memory

I occasionally hear people say things such as "Erlang makes concurrency easy because it doesn't have shared memory" or "processes in Erlang communicate just by message passing." Earlier today I came across one such comment on Steve Yegge's post Dynamic Languages Strike Back (you'll have to scroll down a fair bit -- it's a long article :) ).

For many purposes, it's "good enough" to think of Erlang as lacking shared memory. Erlang's built-in message passing semantics make inter-process communication trivial to implement, and for many concurrent applications, it's all you need. Also, when you first learn Erlang, it's easy to get excited by the promise of not having to worry about the difficult shared memory problems you encounter in "traditional" concurrent programming. And that's for a good reason: Erlang indeed makes concurrent programming much easier (and more scalable, reliable, etc.) than other languages. However, it's not true that Erlang processes can't share memory.

Erlang has (a kind of) shared memory: it's called ets.

From the ets documentation:


This module provides very limited support for concurrent updates. No locking is available, but the safe_fixtable/2 function can be used to guarantee that a sequence of first/1 and next/2 calls will traverse the table without errors and that each object in the table is visited exactly once, even if another process (or the same process) simultaneously deletes or inserts objects into the table. Nothing more is guaranteed; in particular any object inserted during a traversal may be visited in the traversal.
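To make the quoted guarantee concrete, here is a minimal sketch of a traversal that uses safe_fixtable/2 around first/1 and next/2. The module name, function names and table contents are made up for illustration; only the ets calls come from the standard library.

```erlang
-module(ets_traverse_sketch).
-export([keys/1, demo/0]).

%% Collect every key in Tab with first/1 and next/2 while the table is fixed.
keys(Tab) ->
    ets:safe_fixtable(Tab, true),
    try
        collect(Tab, ets:first(Tab), [])
    after
        %% Always release the fix, even if the traversal crashes.
        ets:safe_fixtable(Tab, false)
    end.

collect(_Tab, '$end_of_table', Acc) ->
    lists:reverse(Acc);
collect(Tab, Key, Acc) ->
    collect(Tab, ets:next(Tab, Key), [Key | Acc]).

%% Hypothetical demo table; a 'set' table has no defined key order,
%% so the result is sorted before it is returned.
demo() ->
    Tab = ets:new(demo_table, [set, public]),
    true = ets:insert(Tab, [{a, 1}, {b, 2}, {c, 3}]),
    Keys = lists:sort(keys(Tab)),
    ets:delete(Tab),
    Keys.
```

Fixing the table only protects the traversal itself. As the documentation says, it gives no guarantee about whether objects inserted during the traversal are seen.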


Multiple Erlang processes can simultaneously access and manipulate an ets table, which makes ets act very much like shared memory. ets isn't identical to shared memory in other languages, however. These are the main differences:

- Objects are copied when they are inserted into and looked up in ets tables.
- Basic consistency is guaranteed. Individual ets records never get garbled.
- ets tables are not garbage collected (this lets you store massive amounts of data in RAM without incurring garbage collection penalties -- an important trait for soft real time performance.)

Despite these differences, as far as the programmer is concerned, ets is effectively shared memory. If you want to guarantee that multiple processes get a consistent snapshot of a set of objects in a plain ets table, you're out of luck.
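As a rough illustration of what "effectively shared memory" means here, the sketch below has ten processes update the same object in one public ets table. The module and table names are invented for the example, and messages are used only to signal completion, not to carry the data.

```erlang
-module(ets_shared_sketch).
-export([demo/0]).

demo() ->
    %% 'public' lets any process read and write the table.
    Tab = ets:new(shared_counter, [set, public]),
    true = ets:insert(Tab, {hits, 0}),
    Parent = self(),
    %% Every worker bumps the same counter in place.
    Workers = [spawn(fun() ->
                         ets:update_counter(Tab, hits, 1),
                         Parent ! {done, self()}
                     end) || _ <- lists:seq(1, 10)],
    %% Wait for every worker to report back before reading the result.
    [receive {done, Pid} -> ok end || Pid <- Workers],
    [{hits, Count}] = ets:lookup(Tab, hits),
    ets:delete(Tab),
    Count.
```

The call to ets:update_counter/3 is atomic for a single object, which is why the count comes out right. Nothing comparable exists in plain ets for a consistent view across several objects, which is the limitation described above.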

The good news is that Erlang doesn't leave you to your own devices to figure out some subtle solution involving locking around critical regions to access ets tables safely from multiple processes. Instead, it provides a very nice tool for working with ets: Mnesia. Mnesia is a kind of STM for Erlang, with some extra properties such as support for distributed and persistent storage. Mnesia has a simple transaction API you can use to ensure atomicity, isolation and consistency when accessing objects in an ets table.

Here's an example, taken from the Mnesia documentation, of how to raise an employee's salary in a transaction:


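%% Raise the salary of employee number Eno by Raise inside a transaction.
raise(Eno, Raise) ->
    F = fun() ->
            %% Read with a write lock, because the record is updated below.
            [E] = mnesia:read(employee, Eno, write),
            Salary = E#employee.salary + Raise,
            New = E#employee{salary = Salary},
            mnesia:write(New)
        end,
    mnesia:transaction(F).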
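
For readers who want to try the transaction, the following is one possible way to wrap it in a runnable module. It is a sketch under stated assumptions: the employee record is a trimmed-down, hypothetical version (the record in the Mnesia documentation has more fields), the table lives in RAM only, and the sample employee is invented.

```erlang
-module(mnesia_raise_sketch).
-export([demo/0, raise/2]).

%% Hypothetical, trimmed-down record used only for this sketch.
-record(employee, {emp_no, name, salary}).

raise(Eno, Raise) ->
    F = fun() ->
            [E] = mnesia:read(employee, Eno, write),
            Salary = E#employee.salary + Raise,
            New = E#employee{salary = Salary},
            mnesia:write(New)
        end,
    mnesia:transaction(F).

demo() ->
    %% Assumes no schema exists on disc, so Mnesia starts RAM-only.
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(employee,
        [{attributes, record_info(fields, employee)}]),
    %% Insert one made-up employee to operate on.
    {atomic, ok} = mnesia:transaction(fun() ->
        mnesia:write(#employee{emp_no = 1, name = "Ada", salary = 1000})
    end),
    {atomic, ok} = raise(1, 250),
    {atomic, [#employee{salary = Salary}]} =
        mnesia:transaction(fun() -> mnesia:read(employee, 1) end),
    stopped = mnesia:stop(),
    Salary.
```

If the fun crashes, for example because the employee does not exist and the [E] match fails, mnesia:transaction/1 returns {aborted, Reason} and none of the writes take effect.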


So, Erlang has shared memory, and it also has Mnesia, which provides easy transactional access to ets. Does that mean that concurrent programming in Erlang is just like in other languages, except that you use Mnesia to store shared data? Not exactly. When you program in Erlang, you still use message passing in many situations where in other languages you would rely on semaphores/locks/monitors/signals/etc. to enable inter-thread communication. In the simplest possible example, a producer/consumer application, you would use messaging in Erlang, not ets. In most other languages, you would probably use monitors to protect against concurrent access to a shared buffer (see the Wikipedia examples).
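
A producer/consumer pair built purely on messages might look something like the sketch below. The module name, the message shapes ({item, X}, stop, {consumed, Items}) and the one-second timeout are all choices made for this example. The consumer's mailbox plays the role of the shared buffer.

```erlang
-module(prodcons_sketch).
-export([demo/0]).

%% Consumer: accumulates items from its mailbox until told to stop,
%% then reports everything it received to Parent.
consumer(Parent, Acc) ->
    receive
        {item, X} -> consumer(Parent, [X | Acc]);
        stop      -> Parent ! {consumed, lists:reverse(Acc)}
    end.

%% Producer: sends each item as a message, then a stop marker.
producer(Consumer, Items) ->
    [Consumer ! {item, I} || I <- Items],
    Consumer ! stop.

demo() ->
    Parent = self(),
    Consumer = spawn(fun() -> consumer(Parent, []) end),
    spawn(fun() -> producer(Consumer, [1, 2, 3]) end),
    receive
        {consumed, Items} -> Items
    after 1000 ->
        timeout
    end.
```

No lock or monitor appears anywhere. Messages from one process to another arrive in the order they were sent, so the consumer sees the items in order and the stop marker last.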

Erlang isn't the only language that has message queues. In fact, you could use the basic concurrency facilities in most languages to implement message queues as higher-level abstractions for inter-thread communication (although you probably wouldn't be able to replicate Erlang's selective receive using pattern matching, and it probably wouldn't scale or perform as well as Erlang). This is exactly what some of the actor libraries out there do. However, you would have to be very careful not to send pointers/references to objects that would end up being shared between threads, or you would potentially run into nasty bugs when multiple threads modify the same object. In Erlang, all data is immutable, which makes such bugs impossible. And even if you figure out how to ensure copy semantics for message passing, you would still have your work cut out for you to allow processes to communicate between VMs...
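
For readers unfamiliar with selective receive, here is a small sketch of what it means in practice. The module name and the {low, _} and {high, _} message tags are invented for the example.

```erlang
-module(selective_receive_sketch).
-export([demo/0]).

demo() ->
    Self = self(),
    %% Queue two messages in our own mailbox; the 'low' one arrives first.
    Self ! {low, first},
    Self ! {high, second},
    %% Selective receive: this pattern skips {low, first} and takes the
    %% first message that matches {high, _}, leaving the rest queued.
    High = receive {high, H} -> H end,
    %% The skipped message is still in the mailbox and can be taken now.
    Low = receive {low, L} -> L end,
    {High, Low}.
```

A plain FIFO queue in another language would hand back the low message first. Picking messages out of the mailbox by pattern is the part that is hard to reproduce with a library-level queue.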

Implementing full Erlang style concurrency isn't trivial. I don't think it can be added as a library to a language that doesn't have it by design with support from the runtime.

Erlang takes you as close as possible to concurrent (and distributed!) programming bliss -- but it does have (a kind of) shared memory: ets.

16 comments:

miguel rodriguez said...

Is it possible to get rid of ets from the language? This is not a technical issue, it is a political one. If the functionality could be replaced by some other method (normal Erlang, Mnesia), although perhaps not as fast, I think the language would gain more by keeping tight to its fundamentals than by obtaining some specific functionality. How can you claim that shared memory is dangerous and at the same time have it?

masklinn said...

In other words, Erlang doesn't have programmer-exposed shared memory (I guess you were talking about my comment, by the way) (oh and you can get the "complete" URL of a blogger comment by clicking on the date at the end of the comments) but it has shared data stores that don't have most of shared memory's problems...

I must say, I don't get why you're trying to say that erlang does have shared-memory concurrency (available), it's not like it would be a good thing would it?

> Implementing full Erlang style concurrency isn’t trivial. I don’t think it can be added as a library to a language that doesn’t have it by design with support from the runtime.

I don't either, even though Scala's Actors library is an interesting experiment.

masklinn said...

> is it possible to get rid of ets from the language?

There is no reason to do that, ets has its use (plus I'm pretty sure mnesia is implemented on top of ets...)

miguel rodriguez said...

Well, of course ets has its uses, and I'm pretty sure too that Mnesia is implemented on top of ets. But I believe that one of the beauties of Erlang is the concept of individual processes acting like small cells spread across different address spaces, interacting only through messages, and ets does not seem to follow this approach. One could argue that this kind of approach is sometimes necessary (maybe for efficiency), but in my experience (not only with Erlang) we sometimes choose the wrong solution just because it seems too easy, and shared memory is always an attractive solution for its easiness (not to say laziness).

alisdair sullivan said...

ETS entries aren't copied when they're inserted as binaries, as binaries are also shared memory allocated outside of process memory space.

Toby DiPasquale said...

What about the process dictionary? That's the *real* shared memory in Erlang vis a vis what imperative programmers expect going in, IMHO.

Tim Bates said...

No, the process dictionary represents mutable state (to a newcomer to Erlang), not shared memory - since it is constrained to a single process (the opposite of "shared"). And even the process dictionary could be implemented using Erlang's process model. ETS and the process dictionary are just optimizations, with their own performance implications and caveats, but they don't break Erlang's no-shared-memory model.

orbitz said...

Hrm, I think this post is kind of cheap. Erlang does not have shared mutable memory, it does however have the ability to write modules with the same semantics as shared mutable memory. Heck, it's a programming language, no surprise there. This is a property of a module though, not the language. Perhaps a more accurate title: Erlang does not have shared mutable memory, but you can simulate it if you want.

Hypothetical Labs said...

[...] Yariv makes a good point about ets being Erlang’s sort of equivalent to shared memory. I agree totally. [...]

orbitz said...

To continue my previous thought, this sounds somewhat analogous to saying Haskell has side effects because you can create monads.

Matt W said...

Hey Yariv,

I love Erlyweb and I keep up with your blog (I even scraped some HTML so I could figure out how to associate an atom feed with a website ;) ). Thanks for both!

I just wanted to let you know I made a Google custom search for Erlang, and yarivsblog.com is one of the few sites that I'm limiting the search to.

You can find it at http://search.dawsdesign.com

I hope you find it useful!

Yariv said...

Whether you define ets as shared memory or not, you can't deny that as far as the programmer is concerned ets shares some semantics with traditional shared memory implementations. That's not a bad thing by any means -- it's just interesting to observe the duality between message passing and shared memory + locks. Given one you can implement the other.

orbitz said...

Sure, you can write one with the other. But doesn't mean Erlang has shared memory, just means you have the ability to express the semantics of shared memory in Erlang. I'd be surprised if you couldn't.

Your Bear said...

Hi Guys!

@Yariv: I agree with you, ets is a shared area of mutable storage.

@Orbitz: We have I/O and if you don't want to carry around the updated world in your call stacks, you better have some side effects. Yes, Haskell's answer to this are monads. By the way, bang operation in Erlang is a side effect as well.

@Hypothetical Guy: The message passing etc. that we all like Erlang for is otherwise known as the actor model. You can add that to an OOP language as well, with Scala being an example. See OOP and COP. Because destructive updates are what most of us were raised with, and many algorithms and data structures exist that use this feature, it is easier, and I guess Scala will overtake Erlang as an actor language at some point in time, although I hope that having that will make it too easy to make a mess of already complex concurrent or even distributed problem solutions.

Ulf Wiger said...

I have also reacted to the claims that Erlang lacks shared memory. My main objection is that it's too broad a statement, along the lines of "erlang doesn't have memory management" - it does, it's just that it's automatic.

Semantically, Erlang copies messages and data, but it doesn't always do so in practice. Since the language doesn't allow you to address memory directly, the underlying implementation is free to put all erlang processes in the same memory space without sacrificing robustness*. Ets tables are also kept in the same memory space. Furthermore, objects in Ets tables are not garbage collected, which makes a difference for very large data sets.

* Experimental versions of the VM have even used one global heap for all processes, and passed all messages by reference. Unfortunately, a new garbage collector would have been needed to make the implementation really useful in practice.

Balu said...

Since ETS is "shared memory", will using it invalidate Erlang's multi-core scalability advantage?