Tuesday, August 29, 2006

Introducing ErlyDB: The Erlang Twist on Database Abstraction

Every web developer craves a database abstraction layer. Ruby, PHP, Java and Python all have them. Many developers have adopted web frameworks just for their database abstraction capabilities. The lack of such a framework has even caused some developers to hesitate before using the best language of them all for building world-class backends: Erlang.



That's why I created ErlyDB: The Erlang Twist on Database Abstraction :)



ErlyDB isn't a database abstraction layer, but a database abstraction layer generator. ErlyDB taps into Erlang's runtime metaprogramming powers to generate an abstraction layer for your database on the fly. This layer is as flexible as it is paper-thin.



Unlike ORM frameworks, ErlyDB works directly with Erlang tuples, which are simple immutable arrays that map very easily to database records. This incurs less overhead than object-relational mapping solutions, which rely on more heavyweight objects to represent database rows.
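
To make the tuple idea concrete, here is a minimal hand-written sketch of the kind of accessor functions a generator could produce. The module name language_sketch and the simplified tuple layout {language, Id, Name, Paradigm, CreationYear} are assumptions made for illustration; the layout ErlyDB actually uses internally may differ.

```erlang
-module(language_sketch).
-export([new/3, name/1, creation_year/1, creation_year/2]).

%% Hypothetical layout: {language, Id, Name, Paradigm, CreationYear}.
%% The id stays 'undefined' until the row has been inserted.
new(Name, Paradigm, CreationYear) ->
    {language, undefined, Name, Paradigm, CreationYear}.

%% Getters are plain positional lookups into the tuple.
name(Language) -> element(3, Language).

creation_year(Language) -> element(5, Language).

%% "Setters" return a new tuple; the original is left untouched.
creation_year(Language, Year) -> setelement(5, Language, Year).
```

Because tuples are immutable, the setter hands back a fresh tuple, which is why the tutorial code below always rebinds the result to a new variable.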



With ErlyDB, productivity doesn't come at the expense of performance :)



ErlyDB is in an alpha stage right now. It only supports MySQL; it hasn't been thoroughly tested and optimized; and it's not feature-complete by any means. However, I'll be adding many more features and improvements over time. If you strongly need a feature before I implement it, I encourage you to implement it yourself and share it with the rest of us. I will be accepting any useful, high quality contributions from other developers.



Let's do a short tutorial.




  • Get ErlyDB from the Subversion repository and install it in your code path (on my Mac, it's /usr/local/lib/erlang/lib).
  • Get the MySQL driver from process-one.net and install it in your code path.
  • Create a MySQL database called 'test'.

  • Run the following SQL script:



create table language (
id integer auto_increment primary key,
name varchar(50),
description text,
paradigm varchar(30),
creation_year integer);

create table project (
id integer auto_increment primary key,
name varchar(50),
description varchar(255),
language_id integer,
index(language_id));


  • Create the file "language.erl" with the following code and compile it in your code path:




-module(language).
-export([relations/0]).

relations() ->
    [{one_to_many, [project]}].


  • Create the file "project.erl" with the following code and compile it in your code path:





-module(project).
-export([relations/0]).

relations() ->
    [{many_to_one, [language]}].


  • Implement the following module, which illustrates some of the common idioms of ErlyDB.






-module(erlydb_test).
-export(
   [init/0,
    test/0]).

init() ->
    %% connect to the database
    erlydb:connect(mysql, "localhost", "username", "password",
                   "test"),

    %% generate the abstraction layer modules
    erlydb:code_gen([language, project]).

test() ->
    init(),

    %% clean up old records
    erlydb:q("delete from language;"),
    erlydb:q("delete from project;"),

    %% Create some new records
    Languages =
        [language:new("Erlang",
                      "A fun and productive functional language with " ++
                      "top-notch concurrency features designed for " ++
                      "building scalable, fault-tolerant systems. " ++
                      "Erlang is an excellent choice for building " ++
                      "anything from websites like hamsterster.com " ++
                      "to commercial phone switches.",
                      "functional/dynamic/concurrent", 1981),
         language:new("Java",
                      "An OO language designed to make the lives of " ++
                      "C++ programmers less painful.",
                      "OO/static", 1992),
         language:new("Ruby",
                      "An OO language designed to make the lives of " ++
                      "Java and Perl programmers less painful.",
                      "OO/script/dynamic", 1995)],

    %% Save the records in the database and collect the updated
    %% tuples.
    [Erlang, Java, Ruby] =
        lists:map(
          fun(Language) ->
                  %% executes an INSERT statement
                  {ok, Lang} = language:save(Language),
                  Lang
          end, Languages),

    %% demonstrate getters
    "Erlang" = language:name(Erlang),
    "functional/dynamic/concurrent" =
        language:paradigm(Erlang),
    1981 = language:creation_year(Erlang),

    %% demonstrate setter
    J1 = language:creation_year(Java, 1993),

    %% executes an UPDATE statement
    {ok, J2} = language:save(J1),

    1993 = language:creation_year(J2),

    %% Let's run some queries
    {ok, E1} = language:find_id(language:id(Erlang)),
    true = E1 == Erlang,

    {ok, [E2]} = language:find_where("name = 'Erlang'"),
    true = E2 == Erlang,

    {ok, [E3]} = language:find(
                   "WHERE paradigm = 'functional/dynamic/concurrent' " ++
                   "LIMIT 1"),
    true = E3 == Erlang,
    {ok, [E4, J4, R4]} = language:find_all(),
    true =
        E4 == Erlang andalso
        J4 == J1 andalso
        R4 == Ruby,

    %% Let's make some projects

    Yaws = project:new(
             "Yaws", "A beautiful web server written in Erlang",
             Erlang),
    Ejabberd = project:new("ejabberd",
                           "The best Jabber server, hands down", Ruby),
    OpenPoker =
        project:new("OpenPoker",
                    "A scalable, fault-tolerant poker server", Erlang),

    %% We call language:id just to demonstrate that constructors
    %% accept either related tuples or their ids.
    %% This example would behave identically if we used the
    %% Java variable directly.
    Tomact =
        project:new("Tomcat",
                    "A Java Server with XML config files",
                    language:id(Java)),

    JBoss =
        project:new("JBoss",
                    "A Java Application Server with more XML files",
                    Java),
    Spring =
        project:new("Spring Framework",
                    "A Java IoC framework oozing XML love", Java),

    Mongrel =
        project:new("Mongrel",
                    "A web server with a funny name", Ruby),
    Rails =
        project:new("Ruby on Rails",
                    "A nice integrated web framework for building " ++
                    "CRUD websites.", Ruby),
    Ferret = project:new("Ferret",
                         "A Ruby port of Apache Lucene. It would be nice " ++
                         "if someone ported it to Erlang", Ruby),
    Gruff = project:new("Gruff",
                        "A Ruby library for easy graph generation. " ++
                        "An Erlang implementation would be nice as well",
                        Ruby),

    Projects = [Yaws, Ejabberd, OpenPoker, Tomact, JBoss,
                Spring, Mongrel, Rails, Ferret, Gruff],

    %% Insert our projects into the database
    [Yaws1, Ejabberd1, OpenPoker1 | _Rest] =
        lists:map(
          fun(Project) ->
                  {ok, P1} = project:save(Project),
                  P1
          end, Projects),

    %% let's get the language associated with Yaws
    {ok, Erlang2} = project:language(Yaws1),
    true = (Erlang2 == Erlang),

    %% now let's correct a grave error
    {ok, Ejabberd2} = project:save(
                        project:language(Ejabberd1, Erlang)
                       ),
    true = language:id(Erlang) ==
        project:language_id(Ejabberd2),

    %% let's get all the projects for a language
    {ok, [Yaws3, Ejabberd3, OpenPoker3]} =
        language:projects(Erlang),
    true =
        Yaws3 == Yaws1
        andalso Ejabberd3 == Ejabberd2
        andalso OpenPoker3 == OpenPoker1,

    ok.

(This code is included in the 'test' directory of the ErlyDB distribution.)




  • Read the code and experiment.
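
If the "true = A == B" lines in the test module look odd, here is a small standalone sketch of the idiom. In Erlang, '=' is a pattern match rather than an assignment, so matching the atom 'true' against a comparison works as an assertion. The module name match_assert and the function check/2 are made up for this illustration and are not part of ErlyDB.

```erlang
-module(match_assert).
-export([check/2]).

%% Matching 'true' against the result of a comparison succeeds only
%% when the comparison holds. Otherwise the match fails and the
%% process raises a {badmatch, false} error, which we catch here
%% only to make the outcome visible.
check(A, B) ->
    try
        true = (A == B),
        ok
    catch
        error:{badmatch, false} -> mismatch
    end.
```

In the test module the error is deliberately not caught: a failed match crashes the test at the exact line where the expectation was violated.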




As you've probably guessed, ErlyDB uses Smerl to do its magic. (Confession: when I created Smerl, ErlyDB was just what I had in mind :) ) ErlyDB takes advantage of all of Smerl's advanced features, such as metacurrying, parameter embedding and module extension. You may find it interesting to look at the code and see how ErlyDB works under the hood.
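
For readers who haven't seen runtime code generation in Erlang before, here is a rough sketch of the general technique using only standard library calls (erl_scan, erl_parse, compile and code). It is not Smerl's API and not how ErlyDB is actually implemented; the module name codegen_sketch and the function make_getter_module/3 are hypothetical.

```erlang
-module(codegen_sketch).
-export([make_getter_module/3]).

%% Builds, compiles and loads a module named Mod that exports a single
%% getter Fun/1 returning the element at position Pos of a tuple.
%% Mod and Fun are atoms; Pos is a positive integer.
make_getter_module(Mod, Fun, Pos) ->
    Source = [
        io_lib:format("-module(~w).", [Mod]),
        io_lib:format("-export([~w/1]).", [Fun]),
        io_lib:format("~w(Tuple) -> element(~w, Tuple).", [Fun, Pos])
    ],
    Forms = [parse_form(lists:flatten(Line)) || Line <- Source],
    %% Compile the abstract forms in memory; no file is written.
    {ok, Mod, Binary} = compile:forms(Forms),
    %% Make the freshly compiled module callable in the running VM.
    {module, Mod} = code:load_binary(Mod, atom_to_list(Mod) ++ ".erl", Binary),
    ok.

%% Turns one line of source text into an abstract form.
parse_form(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.
```

After calling codegen_sketch:make_getter_module(gen_demo, second, 2), the call gen_demo:second({a, b, c}) is available even though no gen_demo.erl file exists on disk.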



Adhering to the Erlang philosophy of zero downtime, the ErlyDB abstraction layer can change at runtime without taking the system offline. Simply call erlydb:code_gen with the names of the modules to regenerate and everything will work as expected.



Please get your hands on ErlyDB and give it a test drive. Let me know if you find any bugs or if you have any suggestions.



Coming very soon:



  • Many-to-many relations

  • Transactions

  • Event handlers (before_save, after_save, etc.)



Coming soon:



  • Prepared statements

  • Support for additional database engines (probably when edbc comes out, unless other people make temporary drivers beforehand.)

  • Better connection pooling (also probably tied to edbc)

  • Real documentation :)

  • Performance optimizations

  • More customizations

  • Fine-grained control on field visibility

  • Multiple models per table

  • More versatile SQL query generation

  • Support for complex queries

  • and more



Enjoy!





Appendix



Erlang users have actually had a nice alternative to database abstraction: Mnesia. Mnesia is an industrial-strength, distributed database engine written in pure Erlang. Although Mnesia is quite beautiful, it isn't the best fit for all applications. Mnesia's biggest drawback (at least as I see it) is that Mnesia isn't designed to scale to very large (many gigs) data volumes. I'm hoping the OTP team will improve this aspect of Mnesia, but until it does, we have to live with this limitation.



Even if Mnesia didn't have this shortcoming, many developers would still prefer to use other database engines for their various capabilities. MySQL, Postgres and Oracle aren't going away any time soon.



Having said that, Mnesia can be very useful even if you're using another database engine. For instance, a killer application for Mnesia is a distributed session store shared between instances of a Yaws cluster. Mnesia could also be used as a distributed cache for recently-accessed database records.
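
As a rough illustration of the session store idea, here is a minimal single-node sketch using the standard Mnesia API. The module name session_store, the session record and the functions store/2 and fetch/1 are assumptions made for this example; a real cluster would list every node under ram_copies and deal with expiry.

```erlang
-module(session_store).
-export([start/0, store/2, fetch/1]).

%% Hypothetical record for one session; 'id' is the table key.
-record(session, {id, data}).

%% Starts Mnesia and creates an in-memory table on the local node.
start() ->
    ok = mnesia:start(),
    case mnesia:create_table(session,
                             [{attributes, record_info(fields, session)},
                              {ram_copies, [node()]}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, session}} -> ok
    end.

%% Writes (or overwrites) the session data inside a transaction.
store(Id, Data) ->
    {atomic, ok} = mnesia:transaction(
        fun() -> mnesia:write(#session{id = Id, data = Data}) end),
    ok.

%% Looks a session up by id.
fetch(Id) ->
    {atomic, Rows} = mnesia:transaction(
        fun() -> mnesia:read({session, Id}) end),
    case Rows of
        [#session{data = Data}] -> {ok, Data};
        [] -> not_found
    end.
```

The same pattern would work for a cache of recently-accessed rows: key the table on the row id and store the ErlyDB tuple as the data.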



Update: I just realized that there's no reason ErlyDB shouldn't have a Mnesia driver. Any takers? :)

22 comments:

Charles said...

You're doing an excellent job in spreading the word about Erlang's strength. Exactly what David Heinemeier Hansson did for Ruby with Rails. I re-discovered Erlang thanks to your blog!

Anders said...

This looks really promising. I'll definitely be playing with it. Being able to work easily with legacy databases is an important step in getting Erlang out to a wider audience.

Is it going to stay introspection based (ie, it figures out the fields by examining the tables in the database) or will there be support for defining the fields in Erlang code and having it be able to create them in the database?

al3x said...

ErlyDB looks awesome, but it's even cooler that you spent time developing it and were still willing to suggest another solution (mnesia). Not many developers can be that objective about what they write.

I'm with dennis in wondering if mnesia or the like can be made to be not just a scalable solution but a *better* solution for the problems of the web domain than existing databases. Simplifying deployment and distribution while improving performance, all that jazz.

I think Erlang's success in the web application arena will come from proving that there are better ways of doing things, not just the same old approach in another language. Rails has been an incremental step forward, time-tested ideas wrapped up in a pretty package. Proving that Erlang is compatible with "legacy solutions" is great, but it's not going to be yet another ORM or templating system that gets people moving to the Erlang camp in droves.

Chris Hartjes said...

Hey Yavis, you can count me as one of those lurkers who kept coming back to your blog and wondering "why should an experienced web developer like me take a look at the erlang + yaws combo." I'm a long time PHP guy who has also worked with Ruby on Rails. The first thing that came to mind was "I don't see any support for databases other than the mnesia thing he keeps mentioning"

Now that it exists, well, it's time to fool around with it on my iBook and see if a little app I've been thinking about building could actually work as an erlang-powered website. It could also use some work to make doing HTML easier, but from the examples I've seen it looks like I should be able to brute-force things.

Keep up the great work and start thinking of ways to get other web developers interested in building Erlang apps for the web.

Tobbe said...

Very nice indeed !

Yariv said...

Thanks for all the great feedback, guys. Just remember this one thing: ErlyDB is just the beginning. I made it in a few days in the weekends and after-work hours. In the next few months, I and other people will be going full steam. We will keep releasing high-quality tools that will make Erlang the ULTIMATE secret weapon for web startups :) I wasn't joking when I said it earlier. Erlang is an extremely powerful language and the stuff that we can do with Erlang will make users of other languages wonder "what the hell just happened"? :)


Regarding Schema generation -- it's pretty straightforward to do but ErlyDB has more pressing needs. Keep reading this blog and you'll hear about ErlyDB's progress.

Frank said...

You're going about this the wrong way. Don't write a Mnesia interface for this: Write an SQL backend for Mnesia. That way, all existing code can be moved to, say, PostgreSQL.

Aslak said...

You're going about this the right way. Forget Mnesia, let someone else focus on that. What you're doing here is not just exactly what a lot of people desire for Erlang, but a great eye opener for people coming from the orm world.

Dmitrii Dimandt said...

Wow, wow, WOW

:)

ke han said...

Do you have any tips on introspecting the generated code? I would like to easily query from an erlang shell the functions generated and also access the generated source or forms. I suppose these are mostly smerl questions, but would like to see examples relative to your erlydb test code.
Would be nice to have the ability to generate an "erl doc" from any loaded module. I have commented in other forums that one of the drawbacks to rails is that you cannot easily see all the magic code that is generated/inherited. It would be nice to see erlydb/smerl have some answers to this while it's still in its infancy.
thanks, ke han

Damir said...

If you guys keep coming up with these goodies at such a rate, soon there will be no secret weapon for web startups ;-). So please, keep your voice down ;-)).

I've heard before about Mnesia not being well suited for large data storage. How "big" (in Mbytes) is the upper limit in your opinion? Is this also a reason for ErlyDB? Looks awesome!

Joel Reymont said...

Yariv,

What about performance?

Are you compiling all that dynamic code once and then using it or is it generated for every invocation?

Bill Mill said...

So, I had an erlang phase a while ago, but it seems I've forgotten a bit of it.

What do the "true = a == b" statements mean? Do I read it like I would in python - "true_ = (a == b)" where the value of the comparison between a and b is assigned to the variable true?

Sorry, those bits just confused me. Also, you have spelled Tomcat as Tomact in your code.

Yariv said...

I'll answer all the questions in a backward order:

Bill -- the true = X statement would cause the runtime to throw an exception if the match fails. It tells Erlang to match X against 'true', which only succeeds if X is already true.

Joel -- the queries are generated at runtime but the code that generates the queries is compiled once. When I add support for prepared statements, most queries will be generated once per connection.

Ke Han -- use smerl:for_module followed by smerl:get_func or smerl:forms. Also keep in mind smerl:get_exports and smerl:get_module.

Damir -- ErlyDB came out of a selfish, pragmatic need: Mnesia doesn't scale to large datasets and I needed to work with databases that do. I don't know how well Mnesia scales but I'm not gonna use my own app as a Mnesia testbed :)

RE Mnesia -- if the OTP team decides to improve it, it's all the better for us. It gives us more tools to do our jobs. However, I've looked at the Mnesia and dets source codes and it's too much for me to tackle by myself. I decided ErlyDB would give me more bang for the buck :)


Cheers - Yariv

ke han said...

re: What do the "true = a == b" statements mean?

erlang will compare the values a and b (a == b). This evaluates to true or false, as that's what the '==' does. The '=' in erlang is _not_ an assignment, it is a match. So, what's on the left gets matched to what's on the right of the '='. In this case, if a == b evaluates to anything other than true, then the match operation will fail with a "bad match" error. This is the goal of writing this style of code. You write what you expect to match; anything else produces an error. This gets into a coding style which erlang'ers call "non-defensive programming". There exist good books and tutorials on this subject.

thomas lackner said...

This looks great and is an important part of popular web stacks (though I prefer to sling SQL by hand). Good job!

I was thinking, perhaps, that some of the functions like new and save could benefit from keyword arguments so you can omit column values. Does Erlang do that? How does the _ stuff used in records work in practice? Could Smerl enable this behavior?

Jonathan Allen said...

Drop the anti-OO rhetoric, it just makes you look stupid.

For your information, objects are cheap to create in most languages. For example, .Net only needs 12 bytes of overhead.

Had you not started attacking OO, I would have been more interested in what you have to say about Erlang and databases.

ak47 said...

2 Jonathan Allen>
erlydb: pregenerated & precompiled code, data delivered in plain tuples, accessing = basically just offsetting.

others : queries generated during runtime, quite expensive. accessing through dynamic accessors, quite expensive.

If you had actually thought for a while instead of rushing to defend your beloved OO, you could have figured it out yourself.

Todd said...

Being very new to Erlang I'm just curious if using MySQL (I know it's popular) instead of Mnesia reduces the concurrency and fault-tolerance advantage that's native to Erlang?

Serge said...

Yariv, on the YaWS page http://yaws.hyber.org/contribs.yaws they mention the Kreditor application using Mnesia. Does it assume that their needs in DB capacity are smaller so Mnesia is sufficient? I tend to think otherwise, but your take on it is interesting.