Wednesday, October 11, 2006

Recless: A Type Inferring Parse Transform for Erlang (Experimental)

Warning: Recless is highly experimental, and it's not yet finished. Use it at your own risk!!!

My cell phone's favorite feature is the auto-complete mode. With auto-complete, my cell phone guesses what words I'm trying to type based on which words in its database are most likely to match the combination of digits I have entered. For instance, if I wanted to type the word 'hello', I could type '43556' instead of '4433555555666'.

The main drawback is that my cell phone has a limited words database, so when I want to type a word that's not in the database, I have to press some extra keys to switch from auto-complete to manual mode and back again, which is somewhat annoying.

(This happens more often than you would think: the database doesn't hold a single English curse word -- not even 'damn'! WTF :) )

Last week, I had the idea to implement a similar feature for Erlang. In case you haven't used Erlang and you're horrified about the possible implications of the previous statement, don't worry -- Erlang doesn't limit you to coding using only your keyboard's number pad :) However, Erlang does have some idioms that could benefit from some trimming IMO: record getters and setters.

Let's take a detour into a discussion of Erlang's records. Erlang is a dynamically typed functional language. It has no notion of classes and objects. If you want to group a few data elements together, you can put them in a list or in a tuple. To store a fixed number of elements, you use a tuple. For a variable number of elements, you use a (linked) list.

When you use tuples as containers, you have to know the position of each field in the tuple in order to access or change the value of the field. This works well in many cases, and it makes pattern-matching very natural, but it doesn't work well in all cases. Sometimes you want to give a name to a data element and access it using its name, not its position. That's where records come in.

Records basically provide syntactic sugar for tuple creation as well as element access and manipulation. They let you give names to fields, and they resolve the positions for you at compile time. For example, let's say we have defined the following records:


-record(address, {street, city, country = "USA"}).
-record(person, {name, address = #address{}}).
-record(project, {name, owner = #person{}}).


To create a tuple of "type" address and bind it to the variable named Address, you could write


Address = #address{street = "700 Boylston", city = "Boston"},


This would bind Address to the value {address, "700 Boylston", "Boston", "USA"}. (The country field demonstrates how to set default values for record fields.)
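One way to convince yourself that a record is just a tagged tuple is to compare the two directly. Here's a minimal sketch; the module name record_demo and its function names are made up for illustration, while the record definition is the one from above.

```erlang
-module(record_demo).
-export([boston/0, is_plain_tuple/0, field_names/0]).

-record(address, {street, city, country = "USA"}).

%% Builds the address from the example above; country falls back to its default.
boston() ->
    #address{street = "700 Boylston", city = "Boston"}.

%% The record value is exactly equal to the hand-written tagged tuple.
is_plain_tuple() ->
    boston() =:= {address, "700 Boylston", "Boston", "USA"}.

%% record_info/2 is resolved by the compiler, just like the field positions.
field_names() ->
    record_info(fields, address).
```

The record name and field names exist only in the source code. At runtime all that remains is the tuple with the record name as its first element.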

To get the street from an address, you have 3 options:


Street = Address#address.street,
{_,Street,_,_} = Address,
Street = element(2, Address)


Update (10/12/2006): It has been brought to my attention that there's another, preferred way of writing the above logic (thanks, Bengt):


#address{street=Street} = Address,


Changing a record's fields requires additional syntax. Here's an example:


NewAddress = Address#address{street = "77 Massachusetts Avenue",
                             city = "Cambridge"}


Record values can be nested in other records. Example:


Project =
    #project{name = "MyProject",
             owner = #person{name = "Bob",
                             address = #address{city = "Miami"}}}


Some people find the standard field access and manipulation syntax somewhat less than beautiful, and I sympathize with them. In almost all OO languages, using properties of an object is as simple as writing 'person.city'. This syntax is obviously more lightweight, but it also comes at the cost of maintaining type data at runtime and/or embracing full static typing. (This is another reminder that no language is perfect -- not even Erlang! :) Actually, many people would say Lisp is perfect... but I digress).

The Erlang record syntax gets even more cumbersome when working with nested records. Let's say we want to get the city of the owner of Project. This is how we go about it using the record syntax:


City = ((Project#project.owner)#person.address)#address.city


If you think that's an eyesore, consider *changing* the value of the city:


NewProject =
    Project#project{owner =
        (Project#project.owner)#person{address =
            ((Project#project.owner)#person.address)#address{city = "Boston"}}}.
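In standard Erlang, one way to contain this is to hide the chain behind small helper functions. The following is only a sketch: the module name nested_demo and the function names are made up, and the records are the ones defined earlier.

```erlang
-module(nested_demo).
-export([project/0, owner_city/1, set_owner_city/2]).

-record(address, {street, city, country = "USA"}).
-record(person, {name, address = #address{}}).
-record(project, {name, owner = #person{}}).

%% The nested project from the example above.
project() ->
    #project{name = "MyProject",
             owner = #person{name = "Bob",
                             address = #address{city = "Miami"}}}.

%% Read the city by pattern matching through all three record layers.
owner_city(#project{owner = #person{address = #address{city = City}}}) ->
    City.

%% Rebuild each layer from the inside out; every step returns a new record.
set_owner_city(Project = #project{owner = Owner}, City) ->
    Address = Owner#person.address,
    NewOwner = Owner#person{address = Address#address{city = City}},
    Project#project{owner = NewOwner}.
```

This keeps the call sites short, but you still have to write a pair of functions like these for every nested field you care about.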


Yes, I know. Ugly. That's why I created Recless.

Recless is a parse transform that uses a type inference algorithm to figure out what kinds of records your variables are holding, and then lets you write much less code to work with their elements. For example, with Recless, the above two examples could be written as


City = Project.owner.address.city.


and


NewProject = Project.owner.address.city = "Boston".


All you have to do to enable this syntax is to put recless.erl in your source directory and add the following declaration at the top of your source file:


-compile({parse_transform, recless}).
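If you haven't seen parse transforms before: the declaration tells the compiler to hand the module's parsed code to another module before compiling it. The sketch below shows only the shape of that hook, not how Recless is implemented. The module name identity_transform is hypothetical, and this transform deliberately does nothing.

```erlang
-module(identity_transform).
-export([parse_transform/2]).

%% The compiler calls parse_transform/2 with the module's abstract forms
%% (its parsed syntax tree) and the compile options, then compiles
%% whatever list of forms comes back.
%% This hypothetical transform returns the forms unchanged; a transform
%% such as Recless would return rewritten forms instead.
parse_transform(Forms, _Options) ->
    Forms.
```

A module would opt in with -compile({parse_transform, identity_transform}). and the transform module has to be compiled and on the code path first.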



The holy grail for Recless is to Just Work. There is, however, one main restriction in Recless's type inference algorithm: function parameters must indicate their record types for type inference to work on them. For instance, this won't work:


get_name(Person) -> Person.name.


Instead, you must write this:


get_name(Person = #person{}) -> Person.name.
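Note that the Person = #person{} head is ordinary Erlang, not Recless syntax: it matches any person record, whatever its field values, and binds the whole record to Person. Here's a sketch of the same function in standard record syntax. The module name head_demo is made up, and the person record is simplified (no default for address) so that the example stands on its own.

```erlang
-module(head_demo).
-export([get_name/1]).

%% Simplified version of the person record, for illustration only.
-record(person, {name, address}).

%% #person{} in the head matches any person record and rejects everything else.
get_name(Person = #person{}) ->
    Person#person.name.
```

Calling get_name/1 with anything that isn't a person record fails with a function_clause error, so inside the function body the record type of Person is known for certain.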



Recless is already pretty good at figuring out the types of your variables, but there are many cases that it doesn't handle yet. It also probably has some bugs. The reason I'm releasing it now is that when I got the idea to make Recless, I seriously underestimated how hard it would be to make it. I thought it would take 2, at most 3 days, but I've already spent more time than that, and I'm only 75% or so done. Before I dig myself any deeper, I wanted to get some feedback from the Erlang community. If many people want Recless, and everybody agrees that Recless can't mess up any code that's written in standard Erlang, I'll devote the extra time to finishing it. Otherwise, I'll go back to other projects and maybe finish it later when I have some more time.

You can get the source for Recless here: http://code.google.com/p/recless. It also includes a test file that demonstrates what Recless can do at the moment.

Note: Although Recless does a good amount of type inference, it does not attempt to catch type errors. Dialyzer already does a fantastic job at that. All Recless tries to do at the moment is simplify record access and manipulation syntax. If Recless fails to infer the type of an expression such as "Person.name", it crashes with a type_inference error (this will be improved if/when Recless grows up).

I'll appreciate any feedback!

4 comments:

Tobbe said...

Looks promising, I think. However, I tried the following, but got an error when compiling:


-module(test).
-compile({parse_transform, recless}).

-export([new/0,upd_street/2]).

-record(adr, {
          name = "Bill Smith",
          street = "Main street 23",
          phone
         }).

new() -> #adr{}.

upd_street(A = #adr{}, Street) ->
    A.street = Street.


Also, it would be nice to be able to write:

upd_street(A, Street) when record(A, adr) -> ....

Mike said...

Hi Yariv,

What do you think about support for

Project.owner.city

as shorthand for Project.owner.address.city

when the city is unique (not a part of the owner in this case)?

I believe this became part of some modern languages.

Thanks,
Mike.

Yariv said...

Mike, I like this idea. Unfortunately, I don't have any time to spend on Recless right now... :(

Yariv’s Blog » Blog Archive » Geeking out with Lisp Flavoured Erlang said...

[...] concise code above by inferring the types of variables in your program when possible. (I wrote a rant about this a long time ago, with a proposed solution to it in the form of a yet unfinished parse [...]