Saturday, September 30, 2006

ErlyDB 0.7

In the past couple of weeks, I've been hacking away at ErlyDB and the libraries it uses (the Revamped MySQL driver, ErlSQL and Smerl). I added a bunch of features and optimizations and also refactored and/or rewrote most of the original ErlyDB code until I felt the picture looked good.



The last release was v0.1, but I decided to name this release v0.7 because ErlyDB getting pretty close to 1.0 quality. Plus, I just like the number 0.7 :)



Here are the main new features:



Transactions



Each generated module has the function transaction/1, which lets you execute a group of statements in a transaction against the module's driver. Here's an example:




person:transaction(
fun() ->
P = person:new("Menahem"),
P1 = person:save(P),
city:add_person(TelAviv, P1)
end).


Transactions are automatically rolled back if the function throws an error or crashes. For example, this function




person:transaction(
fun() ->
person:delete(P1),
exit(just_kidding)
end).


would cause the delete statement to be rolled back.



Note: all auto-generated CRUD operations are executed in transactions, so you can set your database's auto-commit option to 'false' and nothing will break.



Protection against SQL injection attacks



ErlyDB uses ErlSQL under the hood to automatically quote all string and binary values in SQL expressions. All functions that let the user define a Where condition expect an ErlSQL expression by default. Here are some examples:




person:find({name, '=', "Joe"}).



person:find(
{{name,'=',"Joe"}, 'or', {'not', {age, '<', 26}}}).


You can also use ErlSQL expressions for LIMIT and ORDER BY clauses, such as




person:find({name, like, "Joe%"},
[{order_by, [{age, desc}, country]},
{limit, 3, 4}]).


If you want to use string and/or binary expressions, you can define the {allow_unsafe_statements, true} option when calling erlydb:code_gen/3. This will tell ErlyDB to accept statements such as




person:find("name = 'Joe'").


and


person:find("name LIKE 'Joe%'", "LIMIT 17").


However, this usage is discouraged because it makes you more vulnerable to SQL injection attacks. If you turn this feature on, make sure to escape all your strings using erlsql:encode/1.



Hooks



You can implement in your modules the following functions, which ErlyDB will use as hooks for different operations:




after_fetch/1
before_save/1
after_save/1
before_delete/1
after_delete/1



Fine-grained control on field visibility



By implementing the fields/0 function in your modules, you can specify exactly which database fields ErlyDB will use for the module. For example, if you have the function




fields() -> [name, age, country].


in the module 'person', ErlyDB will only use those fields even if the database table has additional fields (the 'id' field is always assumed to be the primary key).



User-defined table assignment



You can implement the table/0 function to indicate which table ErlyDB should use as the database table for your module. For example, the function




table() -> person.


in the module 'artist' would tell ErlyDB to use the 'person' table in all SQL statements for the module 'artist'.



Multiple modules per table



You can implement the type_field/0 function to indicate which column in the database table holds the module type for each record. In combination with the table/0 function, this lets you store records for multiple modules in a single table. By implementing the fields/0 function, you can also control which fields are exposed for each module.



For example, suppose you have the table 'person' defined as




CREATE TABLE person (
id integer auto_increment primary key,
type char(10),
name varchar(30),
age integer,
country varchar(20),
office integer,
department varchar(30),
genre varchar(30),
instrument varchar(30),
created_on timestamp,
index(type)
)


You can create 3 modules for accessing this table as follows:



person.erl




-module(person).
-export([fields/0, type_field/0]).

fields() -> [name, age, country].

type_field() -> type.



employee.erl




-module(employee).
-export([table/0, fields/0, type_field/0]).

table() -> person.
fields() -> person:fields() ++ [office, department].
type_field() -> type.


musician.erl




-module(musician).
-export([table/0, fields/0, type_field/0]).

table() -> person.
fields() -> person:fields() ++ [genre, instrument].
type_field() -> type.


The created_on field will not be exposed to any of the modules.



Note: The fields/0 function isn't limited to defining supersets. You can define any relation you want, as long is it's expressible in Erlang :)



More auto-generated functions



In addition to the find(Where, Extras) and find_id(Id) functions, ErlyDB now generates find_max(Max, Where, Extras) for getting at most Max records, and find_range(First, Max, Where, Extras) for getting at most Max records starting from offset First. Each function has 4 variants, e.g.




find()
find(Where)
find_with(Extras)
find(Where, Extras)


Functions for finding related records also have such variants. Examples:




developer:projects(D)
developer:projects(D, Where)
developer:projects_with(D, Extras)
developer:projects(D, Where, Extras)

developer:projects_first(D)
developer:projects_first(D, Where)
developer:projects_first_with(D, Extras)
developer:projects_first(D, Where, Extras)

developer:projects_max(D, Max),
developer:projects_max(D, Max, Where),
developer:projects_max_with(D, Max, Extras)
developer:projects_max(D, Max, Where, Extras)

developer:projects_range(D, First, Max),
developer:projects_range(D, First, Max, Where),
developer:projects_range_with(D, First, Max, Extras)
developer:projects_range(D, First, Max, Where, Extras)


Aggregate functions



ErlyDB generates functions for getting aggregate data about records from a module. The functions currently supported are 'count', 'max', 'min', 'avg', 'sum' and 'stddev' (it's easy to add more functions to this list by changing aggregate_functions/0 in erlydb.erl). For example, you can make calls such as




person:max(age).
employee:count('distinct name').
season:avg(temperature).
city:sum(population, {country,'=',"USA"}).


to get aggregate values from the database. The 'count' function has a special version that takes no arguments.




person:count().


is equivalent to




person:count('*').


In addition, ErlyDB lets you query aggregate data about related records (both in one_to_many and many_to_many relations). Examples:




developer:sum_of_projects(Joe, language).
apple:avg_of_oranges(MyApple, color,
{{state,'=',"ripe"},'and',{size, '>', 3}}).


The special form of 'count' also exists for related records:




language:count_of_developers(Erlang).


Better error handling



In the last version of ErlyDB, functions returned {ok, Result} or {error, Err}. In version 0.7, I changed it so functions return Result or they crash by calling exit(Err). This makes it easier to execute a group of statements and let the enclosing function trap all the errors (it also follows the Mnesia API more closely).



Multiple drivers, multiple databases in one session



The last version of ErlyDB supported only one driver per session. In version 0.7, each module can have a different driver. In addition, each module can have a list of driver options defined when calling erlydb:code_gen/3. All generated functions pass these options to the driver. For example, the MySQL driver accepts the {pool_id, Val} option, which defines against which connection pool ErlyDB should execute SQL statements. This allows you to work with multiple MySQL databases in one ErlyDB session.



Efficient SQL generation



In the last version, ErlyDB created SQL statements by string concatenation. In version 0.7, ErlyDB uses ErlSQL to generate statements as iolists of binaries, which are more efficient than strings because they consume less memory.



Many internal improvements


I rewrote or refactored most of the code from the alpha version. I also optimized the code generation process so it's much faster now.



That's it :)



You can get the distribution from branches/v0.7 in the subversion repository.



Note: ErlyDB requires the Revamped MySQL Driver v0.9.5 or above, which you can get from the Subversion repository under branches/v0.9.5. This driver is based on the original MySQL driver from the YXA project with a improvements I made such as support of transactions, prepared statements, binary queries and more efficient connection pooling. Once the changes I made to this driver are integrated back into the YXA source repository, the Revamped driver will no longer be supported. This is to ensure that the Erlang community doesn't have to deal with multiple forks of the same driver.



Please let me know if you find any bugs or if you have any suggestions.

14 comments:

ajrw said...

Looks good! Do you have any plans to allow automatic creation of tables from a set of field specifications, or will ErlyDB alwas require an existing database schema?

Nick Thomas said...

Wow! This stuff is looking better and better! A question, though: are there any plans to support database engines other than MySQL?

Yariv said...

arjw -- I don't really see a compelling value in schema creation from Erlang. It can't be that much easier than writing CREATE TABLE statements. Am I missing something?


Nick -- there are such plans, esp with Mnesia, Postgres and ODBC. However, these efforts will probably rely mostly on contributions from other developers as I don't have a strong need for them at the moment.

ke han said...

Yariv,
I like where erlyDB is going. My biggest concern is namespace pollution. In anything but simple apps where your model objects are only used as rdb record wrappers, models end up with lots of functions for specialized domain behavior.
You are taking up lots of function names without having an approach for namespace collision. I assume you've thought of this and decided things looked more beautiful without prefixing functions with something like '_db_', so you would have '_db_table'/1 or db_table/1 instead of table/1.

ke han said...

your before_save and after_save functions do not provide enough resolution. In many cases, you need to know if the save was an update or a create. I recommend either offering additional before_update, before_create, etc... or have an optional parameter to the save hooks to pass the context of the operation. You could have a before_save/2 which can be used like:
before_save(Model, Context) where Context == update.

ke han said...

in addition to the resolution on save hooks, having access to information in the after hooks for fetch, save and delete would be useful. This additional info would inlcude the number of rows effected by the operation.

ajrw said...

Yariv: The nice thing about programmatic table creation is that you can write to one (common subset) schema and maintain portability across different database servers. There's also the possibility for automatic database upgrades as new fields are added or existing ones are modified.

In particular I'm thinking of SugarCRM's "vardefs.php" files, which define various metadata for each of the database (and virtual) fields, such as the column type and size. The field descriptor also covers things like formatting conventions, for automatic conversion from the user's display/edit format to the database storage format - useful when handling lots of dates and currency amounts.

Perhaps this sort of thing would be better implemented as a layer over ErlyDB.

ajrw said...

Hm, clicking on the submit button seems to add a comment without giving any feedback (such as moving to a new page). Inappropriate use of Ajax? :)

Dmitrii Dimandt aka Mamut said...

One word: Whoah! Automatic table generation or table migrations (a la Rails?) could be a plus, though

Yariv said...

Thanks for everybody's input!

Yariv said...

Regarding schema generation -- it's simple enough to do. Maybe I'll try to hack something for the next ErlyDB release. About namespace collisions -- I think the likelyhood of a collision is low enough for this to not be such a big concern. However, if it does happen, it can be annoying. I don't see it as an urgent issue, but maybe I'll find an easy way to add an ErlyDB prefix to generated functions for a future release. For hooks -- you can call the is_new/1 function of the domain modules to check for INSERT or UPDATE statements in the before_save hook. In the after_X hooks, they always apply to a single record. If the DELETE or UPDATE statements don't affect a single record, an error is thrown telling you how many records were affected in the database. Isn't that sufficient?

Norbert Klamann said...

@arjw wrt. database independence.

IMHO not a great idea at all. Use the database to its fullest extent ! You never get as much security and performance for the buck as in the database itself.

[Disclaimer] I do Oracle DB-Development for a living.

jwewymolusa said...

But then stopped, malin akerman nude that much longer, be doing your facts.

Bestiality porn zoo sex private photo and movies. said...

Dog porn movies zoo porn movies animal porn movies....

Beast sex zoo sex animal porn. Zoo sex beastiality horse porn dog fucking....