Saturday, September 16, 2006

Introducing ErlSQL: Easy Expression and Generation of SQL Statements in Erlang

A couple of days ago, I started refactoring some of the code generating SQL statements in ErlyDB. My intention was to modularlize the SQL generation in ErlyDB to make it more flexible and reusable. As I added more and more functions, I suddently found myself with a new domain specific embedded language for expressing SQL statements in Erlang on my hands :) I named this language ErlSQL (ESQL in short).



I created small library for generating literal SQL statements from ESQL expressions. Given that this capability can be useful outside of ErlyDB (for instance, if you are using the Revamped MySQL driver or the Postgres driver from Jungerl directly), I decided to create a new project for this library. ErlSQL now lives at http://code.google.com/p/erlsql.



ErlSQL's main benefits are:



  • Easy dynamic generation of SQL queries from Erlang for application developers.

  • Prevention of most, if not all, SQL injection attacks by assuring that all string values are properly escaped.

  • Integration with higher level libraries such as ErlyDB



ErlSQL covers a large subset of the SQL language and its extensions, including most CREATE, UPDATE, DELETE and SELECT queries. ESQL supports field and table aliases ("foo AS bar"), complex WHERE expressions, ORDER BY and LIMIT clauses, nested queries, function calls, unions, aggregate expressions (using GROUP BY and HAVING clauses), and more.



ErlSQL will gain more capabilities over time. If it's missing a feature you need urgently, it should be fairly straightforward for you to add it youself.



erlsql.erl only has 2 exported functions: sql/1 and sql/2. sql/1 takes an ESQL expression and returns an iolist (a tree of strings and binaries that you can send directly to a socket). sql/2 takes an additional boolean paramemter indicating if the result should be converted to a single binary (via iolist_to_binary/1) or not.



Here are some examples of ESQL expressions and their corresponding SQL queries:




{insert,project,[{foo,5},{baz,"bob"}]} ->
"INSERT INTO project(foo,baz) VALUES (5,'bob')"

{insert,project,[foo,bar,baz],[[a,b,c],[d,e,f]]} ->
"INSERT INTO project(foo,bar,baz) VALUES
('a','b','c'),('d','e','f')"

{insert,project,[foo,bar,baz],[{a,b,c},{d,e,f}]} ->
"INSERT INTO project(foo,bar,baz) VALUES
('a','b','c'),('d','e','f')"

{update,project,[{foo,5},{bar,6},{baz,"hello"}]} ->
"UPDATE project SET foo=5,bar=6,baz='hello'"

{update,project,[{started_on,{2000,21,3}}],{name,like,"blob"}} ->
"UPDATE project SET started_on='2000213'
WHERE (name LIKE 'blob')"

{delete,project} ->
"DELETE FROM project"

{delete,project,{a,'=',5}} ->
"DELETE FROM project WHERE (a = 5)"

{delete,developer,{'not',{{name,like,"%Paul%"},
'or',{name,like,"%Gerber%"}}}} ->
"DELETE FROM developer
WHERE NOT ((name LIKE '%Paul%') OR (name LIKE '%Gerber%'))"

{select,["foo"]} ->
"SELECT 'foo'"

{select,["foo","bar"]} ->
"SELECT 'foo','bar'"

{select,{1,'+',1}} ->
"SELECT (1 + 1)"

{select,{foo,as,bar},{from,{baz,as,blub}}} ->
"SELECT foo AS bar FROM baz AS blub"

{select,name,{from,developer},
{where,{country,'=',"quoted ' \" string"}}} ->
"SELECT name FROM developer
WHERE (country = 'quoted \\' \\\" string')"

{select,[{{p,name},as,name},{{p,age},as,age},{project,'*'}],
{from,[{person,as,p},project]}} ->
"SELECT p.name AS name,p.age AS age,project.*
FROM person AS p,project"

{select,{call,count,name},{from,developer}} ->
"SELECT count(name) FROM developer"

{{select,name,{from,person}},
union,
{select,name,{from,project}}} ->
"(SELECT name FROM person) UNION "
"(SELECT name FROM project)"

{select,distinct,name,{from,person},{limit,5}} ->
"SELECT DISTINCT name FROM person LIMIT 5"

{select,[name,age],{from,person},{order_by,[{name,desc},age]}} ->
"SELECT name,age FROM person ORDER BY name DESC,age"

{select,[{call,count,name},age],{from,developer},{group_by,age}} ->
"SELECT count(name),age FROM developer GROUP BY age"

{select,[{call,count,name},age,country],
{from,developer},
{group_by,[age,country],having,{age,'>',20}}} ->
"SELECT count(name),age,country
FROM developer GROUP BY age,country HAVING (age > 20)"

{select,'*',{from,developer},{where,{name,in,["Paul","Frank"]}}} ->
"SELECT * FROM developer WHERE name IN ('Paul','Frank')"

{select,name,
{from,developer},
{where,{name,in,
{select,distinct,name,{from,gymnist}}}}} ->
"SELECT name FROM developer WHERE name IN
(SELECT DISTINCT name FROM gymnist)"

{select,name,
{from,developer},
{where,{name,in,
{{select,distinct,name,{from,gymnist}},
union,
{select,name,
{from,dancer},
{where,{{name,like,"Mikhail%"},
'or',
{country,'=',"Russia"}}}},
{where,{name,like,"M%"}},
[{order_by,{name,desc}},{limit,5,10}]}}}} ->
"SELECT name FROM developer
WHERE name IN (
(SELECT DISTINCT name FROM gymnist)
UNION
(SELECT name FROM dancer
WHERE ((name LIKE 'Mikhail%') OR (country = 'Russia')))
WHERE (name LIKE 'M%') ORDER BY name DESC LIMIT 5,10)"


Making ErlSQL was another lesson to me of the great expressiveness gained by the combination of of tuples, lists and pattern matching. It would be quite frustrating for me to use a language that doesn't have such semantics if I were ever forced to do so :)



In conclusion: if it looks like SQL and it acts like SQL, it must be... Erlang! :)



Update: There seems to be some confusion regarding the utility of ErlSQL, so I'll take another shot at explaining it. Many applications and libraries generate SQL queries programatically based on certain rules. ErlyDB is such library. Given a domain model and a set of relations, ErlyDB generates SQL queries for interacting with the data for this domain. The current implementation of ErlyDB (v0.1) does this by string concatenation, which is rather inefficient and error prone. ErlSQL facilitates writing SQL generation code in ErlyDB in a safe and efficient manner and by interacting directly with the semantics of the SQL language in Erlang. This is primarily why I made ErlSQL: I wanted to make SQL generation in ErlyDB more robust and elegant than string concatenation.



The second motivation is that for certain applications, the statements that ErlyDB generates automatically are not sufficient. Application developers often need to add their own WHERE framents and sometimes even write full queries to support the application requirements. I wanted to give developers the tools to write dynamic queries in a manner that's resilient to SQL injection attacks but without losing the full flexibility and familiarity of the SQL language and without stepping too far outside of Erlang.



Plus, it's neat to have the IDE indent your clauses and balance all your paretheses :)



Another note: Ulf Wiger pointed out to me on the mailing list that using Dialyzer, it may be possible to check the validity of the ErlSQL expressions in compile time. I haven't tried it, though.

12 comments:

Randall Randall said...

So... you've figured out how to add extra typing for SQL queries? Perhaps it's only that your examples were poorly chosen. :)

Yariv said...

Randall -- I think you misunderstood what ErlSQL is about. It's not about "adding typing to SQL" -- it's about making it easy to construct dynamic SQL statements free of SQL injection attacks in Erlang . Or maybe I misunderstood the meaning of your comment...

Dennis said...

Yariv, you rock.

Randall Randall said...

I'm sorry; I was being kind of snarky. SQL is really regular in the first place, and while I often try to hide as much boilerplate as possible, the SQL in my webapps is often where much of the application logic resides, and I haven't so far found a solution to writing SQL that writes the same SQL I'd write, while hiding boilerplate. In particular, your Erlang solution here actually seems to make things harder to write *and* read, which prompted my snarky comment, above.

SQL injection attacks are solved by automatic quoting, which is such a basic feature of a database interface that it's hard to imagine that any don't have it. Once in a while, you may have to actually build a SQL query, but that's never seemed to call for this level of difference to me.

Randall Randall said...

Wow, that looks dense. Perhaps I should have figured out how to put line breaks in, since double returns don't do it. :)

Yariv said...

Randall, ErlSQL is a tool that's very useful for certain purposes and may be less useful for others. If sounds like you have found a good solution for escaping your strings, but for other applications and/or libraries, your solution may be less effective than ErlSQL. For ErlyDB, ErlSQL is very useful, for instance. In addition, ErlSQL provides a very strong assurance that SQL injection attacks wont creep into your dynamically constructed queries -- EVER. It's similar to the way intepreted languages assure that your code will never have buffer overflows. Is it possible to write code in C++ that doesn't have buffer overflows? Yes. But the assurance that it simply cannot happen is a great boon for many developers -- especially when you're working in a team with some developers who aren't as skilled at avoiding such pitfalls.

Dmitrii Dimandt aka Mamut said...

Yariv, I've been drooling over ErlyDB and now ErlySQL so much, I'm ready to drown in my own saliva (excuse me for the graphic image :)) )


I wonder... How easy would it be to hook mnesia with all this?

Yariv said...

Dimitrii -- I'm happy to see such a level of excitement :) The next version (or maybe the one right after that) of ErlyDB will have some basic Mnesia integration built into it. It's actually not too hard to plug into Mnesia for simple operations :)

esteban said...

Hola no entiendo mucho por el ingles pero solo queria preguntarte si esta implementación era para usar la base de datos MNESIA de Erlang.

Gracias

esteban said...

Te cuento q usamos erlang + mnesia para gestion de mensajeria instantane y esto lo hacemos desde el webserver yaws, pero tengo el siguiente problema me dicen q no se puede hacer una consulta contra mnesia q traiga x registro por ejemplo los q comienzan con A esto es para hacer un paginado alfavetico, ya q la forma q implemente es traer todos los registros y procesarlos con elrang, el tema es q me demora 20 segundos cargar 25 registros en una tabla en la pagina, pues todo este procesa me consume tiemp, mi pregunta es hay alguna manera q se pueda hacer eso siendo explicito en la consulta, tambien quiero integrar ajax, la libreria prototype.js pero me anda en Mozilla y no en I.E si me puedes ayudar te agradesco.

Esteban

SQL Tutorials said...

You know, the thing about SQL is, that there is virtually nothing that can replace it.

Does anyone know if a substitute exists for sql? I mean besides MS SQL and Oracle and all that jazz. Thanks.

SQL Tutorials said...

Does anyone know if there is another language or set of commands beside SQL for talking with databases?

I'm working on a project and am doing some research thanks