Wednesday, May 28, 2008

Announcing Twoorl: an open source ErlyWeb-based Twitter clone

With the recent brouhaha over Twitter's scalability problems, I thought, wouldn't it be fun to write a Twitter clone in Erlang?

Last weekend was cold and rainy here in Palo Alto, so I sat down and hacked one, and thus Twoorl was born. It took me one full day plus a couple of evenings. The codebase is about 1700 lines (including comments). You can get it at http://code.google.com/p/twoorl

[Image: Twoorl screenshot]

Note: you need the trunk version of ErlyWeb to make it work (when released, it will be the 0.7.1 version).

Many people have written about Twitter's scalability problems and how to solve them. Some have blamed Rails (TechCrunch is among them), whereas others, including Blaine Cook, Twitter's Architect, have convincingly argued that you can scale a webapp written in any language/framework if you've figured out how to Just Add More Servers to handle the growing traffic. Eran Hammer-Lahav wrote some of the most insightful articles on the subject: On Scaling a Microblogging Service.

I have no idea why Twitter is having a hard time scaling. Well, I have some suspicions, but since I haven't been in the Twitter trenches, such speculation isn't worth wasting many pixels on.

I didn't write a Twitter clone in Erlang because I thought my implementation would be inherently more scalable than a Rails one (although it may be cheaper to scale because Erlang has very good performance). In fact, right now Twoorl wouldn't scale well at all, since I prioritized simplicity above all else.

The reasons I wrote Twoorl are:

- ErlyWeb needs more open source apps showing how to use the framework. It's hard to pick up how to use the framework just from the API docs.
- Twitter is awesome. Once you start using it, it becomes addictive. I thought it would be fun to write my own.
- Twitter is very popular, but I don't know of any open source clones. I figured somebody may actually want one!
- Some people think Erlang isn't a good language for building webapps. I like to prove them wrong :)
- Although you can scale pretty much anything, your choice of language can make a difference in terms of performance and stability, both of which lead to happy users.
- I think Erlang is a great language for writing a Twitter clone because Twitter's functionality offers interesting opportunities to benefit from concurrency. Here are a couple of ideas I thought of:

1) If you use sharding, the tweets for different users would be stored in separate databases. When you render the page for someone's timeline, wouldn't it be advantageous to fetch the tweets for all the users she follows in parallel? In Ruby, you would probably do something like this:


def get_tweets(users)
  all_tweets = []
  users.each { |user|
    # fetch each user's tweets one after another
    all_tweets.concat(user.fetch_tweets)
  }
  all_tweets.sort
end


(Please forgive any language errors -- my Ruby is very rusty. Treat the above as pseudocode.)

This code would work well enough for a small number of tweet streams, but as the number gets large, it would take a very long time to execute.

In ErlyWeb, you could instead do the following:


get_tweets(Users) ->
    %% pmap/2 stands for a parallel map helper; it is not a built-in function
    lists:sort(lists:flatten(pmap(fun(Usr) -> Usr:tweets() end, Users))).


This would spawn a process for each user the user follows, fetch the tweets for that user, then reassemble them in sorted order in the original process before rendering the page. (Think of it as map/reduce implemented directly in the application controller.) If a user follows hundreds of other users, querying their tweets in parallel can significantly reduce page rendering time.
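To make the parallel fetch a bit more concrete, here is a minimal sketch of what such a pmap helper could look like. It is not Twoorl's actual code: the module name pmap_sketch, the FetchFun parameter, and the {Timestamp, Text} tweet shape are all assumptions made for the example. It also has no timeout, so a worker that never replies would block the caller.

```erlang
-module(pmap_sketch).
-export([pmap/2, get_tweets/2]).

%% Parallel map: run Fun on each element of List in its own process
%% and collect the results in the original order.
pmap(Fun, List) ->
    Parent = self(),
    Refs = [spawn_worker(Parent, Fun, X) || X <- List],
    %% Selective receive on each reference keeps the results in order.
    [receive {Ref, Result} -> Result end || Ref <- Refs].

%% Spawn one linked worker and return the unique reference that tags
%% its reply. Because of the link, a crashing worker also takes down
%% the caller (unless the caller traps exits).
spawn_worker(Parent, Fun, X) ->
    Ref = make_ref(),
    spawn_link(fun() -> Parent ! {Ref, Fun(X)} end),
    Ref.

%% FetchFun (hypothetical) returns one user's tweets as a list of
%% {Timestamp, Text} tuples; the merged list is sorted by timestamp.
get_tweets(FetchFun, Users) ->
    lists:sort(lists:append(pmap(FetchFun, Users))).
```

For example, pmap_sketch:pmap(fun(X) -> X * X end, [1, 2, 3]) returns [1,4,9]. Tagging each reply with a reference from make_ref() is what lets the caller put the results back in order no matter which worker finishes first.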

2) Background tasks. When a user sends a tweet, the first thing you want to do is store it in the database. Then, depending on the features, you have to do a bunch of other stuff: send IM/SMS notifications, update RSS feeds, expire caches, etc. Why not do those tasks in different background processes? After the write to the DB, you can return an immediate reply to the user, giving him or her the perception of speed, and then let the background processes do all the extra work for processing the tweet.

(This technique works very well for Facebook apps, by the way. In Vimagi, when the user submits a painting, the app first saves the painting data, and then it spawns a new process to update the news feed and profile box, send notifications, etc.)
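The background-task idea can be sketched in a few lines. Again, this is only an illustration under assumptions, not code from Twoorl or Vimagi: the module name tweet_sketch, the SaveFun callback, and the Tasks list are hypothetical names introduced for the example.

```erlang
-module(tweet_sketch).
-export([post_tweet/3]).

%% SaveFun (hypothetical) writes the tweet to the database and returns
%% the stored record. Tasks is a list of one-argument funs (send SMS,
%% update feeds, expire caches, ...), each run in its own process.
post_tweet(SaveFun, Tasks, Tweet) ->
    %% Synchronous part: the tweet must be stored before we reply.
    Saved = SaveFun(Tweet),
    %% Asynchronous part: spawn (not spawn_link), so a failing task
    %% cannot crash the process that is serving the request.
    lists:foreach(
      fun(Task) -> spawn(fun() -> Task(Saved) end) end,
      Tasks),
    %% Reply right away without waiting for the tasks to finish.
    {ok, Saved}.
```

The trade-off is that the caller never learns whether a background task failed. A real app would probably want some logging or supervision around those processes.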

Anyway, I hope you enjoy Twoorl. It's still in very early alpha. It doesn't have many features and it probably has bugs. Please take Twoorl for a spin and give me your feedback! I'll also appreciate useful contributions :)

33 comments:

Nick Gerakines said...

Oh Yariv you beat me to it. I was going to go with 'Twittrl', although Twoorl sounds really slick. Now you just need an Erlang port to connect to a phone to send/receive text messages and you are all set.

David Roe said...

Nice set of objectives, it looks impressive and the short time spent putting it all together is intense. Nice one.

Plus, now I know that http://twerly.com/ is already taken.

Yariv said...

http://twerly.com was an amusing discovery :)

I wanted Twerl or Twerly but they were taken so I went with Twoorl. I'm hoping those two O's will sprinkle Twoorl with some of that Google magic dust :)

mpi said...

There was a Twitter clone in Scala called Skittr.

http://blog.lostlake.org/index.php?/archives/55-Prance-with-the-Horses,-Skittr-with-the-Mice.html

I wonder how Twoorl and Skittr stack up against each other?

mikong said...

Nice! I'm new to Erlang and would like to see more open source Erlang projects.

As for your ruby method, here's the fix to the syntax:

def get_tweets(users)
all_tweets = []
users.each { |user|
all_tweets << user.fetch_tweets
}
all_tweets.sort
end

One can also use map/reduce in Ruby (there's Skynet and Starfish), but I haven't tried it. I'm not saying I prefer Ruby for this particular problem, since concurrency is at the heart of Erlang.

I read that parts of Twitter use Erlang. Reading your posts, it just seemed like you didn't know about this.

Joel said...

Depending on what you want to do you can write the ruby much simpler of course :) The last statement is what the function returns so there's no need to do that explicitly.

def tweets users
(users.inject([]) { |tweets, user| tweets.concat user.fetch_tweets }).sort
end

or if you don't want a flat array:

def tweets users
(users.map { |user| user.fetch_tweets }).sort
end

Jesse Andrews said...

remember that every user sees a different view of another person's follow list.

http://twitter.com/jack is dynamic for each view since "private" users require that they follow you before you are allowed to see their tweets. So if Jill is private and following me, I see her tweets on /jack, but since she isn't following you, /jack will show 20 recent tweets not including hers.

Yariv said...

@Jesse I know about the private/public feature in Twitter, but I didn't see much value in implementing it. I doubt a high percentage of Twitter users actually use this feature -- my feeling is that it adds more complexity than it's worth.

Guido Kollerie said...

Twoorl - A Twitter clone...

There’s a lot to do about Twitter these days. Most notably for the scalability issues they are experiencing. The topic of scalability happens to be one of my areas of interest. To that end I have tracked the programming language Erlang for a numb...

Bryce Kerley said...

> (Such technique works very well for Facebook apps, by the way. In Vimagi, when the user submits a painting, the app first saves the painting data, and then it spawns a new process to update the news feed and profile box, send notifications, etc.)

That's the same thing that http://poopstat.us/ does in its facebook app. Took my latency down to 130ms, which is pretty okay.

Matthew said...

I liked it right off the bat. Unfortunately, Follow/send/character counting-- those all seem to be broken in FF3 with Javascript errors like: "follow is not a function".

Nice experiment.

Jon Tirsen said...

I'm not sure you'll get the right scalability if you fetch the messages directly from the shards of the other users. That way a very popular user's shard would get more load than other users' shards, potentially making that shard impossible to scale.

Instead I would suggest that one particular shard already contains all the data needed to render a response. When a tweet gets published the tweet would get asynchronously "pushed" to the shards containing users following that user. This could make for a more scalable and "faster" architecture but it's also more difficult to handle the failure scenarios.

Tim said...

Yariv, very cool!

Question: Why are you using MySQL for storage instead of mnesia? Which do you think is better for scalability?

Pichi said...

Hello Yariv. It's a nice example of how easy ErlyWeb is. But something is buggy. I use:

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.14) Gecko/20080404 Iceweasel/2.0.0.14 (Debian-2.0.0.14-2)

I registered, but I can't send any messages, can't follow anybody, and so on. It looks like no POST works. When I click the Submit or Follow button, nothing happens: my following list stays empty and I'm not among the followers of the person I want to follow.

Twoorl - Twitter in Erlang said...

[...] one of the biggest evangelists and contributors to the erlang community, hacked out a twitter clone in erlang over the course of 2 or 3 days.  Although he tries to back out of an ‘Erlang will [...]

dkaz said...

Got this after 2 mins of clicking around...

ERROR erlang code crashed:
File: appmod:0
Reason: {no_such_user,"yariv:"}
Req: {http_request,'GET',{abs_path,"/users/yariv:"},{1,1}}

kellan said...

Making reads in parallel rather misses the point of sharding the data, and assuming you're using anything resembling a conventional RDBMS as your backend, it is going to hit the max connection ceiling under moderate load (the first time 10 people who each follow 100 people are on the site simultaneously).

Jason Watkins said...

Gathering from shards on a get doesn't seem tenable unless the number of shards is much larger than the average number of followings. You could probably get away with it below that threshold for a while, but your scalability will be limited to the throughput of a single shard (ie your memcache gets/sec).

@Jon Turning twitter into a push system makes it trivial to scale the web piece. It also misses what I suspect are the limiting requirements within twitter: they want to have a system that allows a lot of freedom of looking at the data in different ways.

Personally I think the greatest advantage Twitter could leverage (and I believe this is true for most web applications) is that the queries aren't truly ad hoc. While they want a flexible system, they'll know the structure of queries before they deploy a given feature. So then one could use some sort of continuous map reduce to materialize various sharded views of the data.

Also, your ruby could be as simple as:

def tweets users; users.map(&:fetch_tweets).flatten.sort; end

You could write a pmap implementation fairly easily in ruby, but ruby's threading model is strictly co-operative.

BasementCoder [dot] Com » Twitter, Scaling and Solving the wrong problem said...

[...] have even been some attempts to supposedly solve the problem that people think Twitter has. Namely, that it doesn’t scale very well and that [...]

Phil Rand said...

Hi Yariv,

Nice work. I haven't followed Erlyweb much since a few weeks after you first announced it, and I was wondering why you use both mysql and mnesia?

Btw, I'm also unable to follow, in Firefox 2.0.0.14 on Mac OSX 10.5.3.

--Phil

Twoorl « Jason’s Weblog said...

[...] http://yarivsblog.com/articles/2008/05/28/announcing-twoorl-an-open-source-erlyweb-based-twitter-clo... [...]

Jay Phillips said...

More Ruby cleanup.

def tweets_for_users(users)
users.map(&:tweets).flatten.sort_by(&:created_at).reverse
end

Seth Ladd said...

You should definitely duplicate the post to every follower. As mentioned above, spawning processes to fetch posts across shards will eventually fail. Instead, continue to shard, but save a copy of the message in every follower's feed. Need to delete a post? No problem, simply spawn processes to go out to every shard to clean it up. Posts can't be edited, just added or deleted.

Remember, normalization is for operational systems, not large web systems. Copy that data!

ps You have a winner here if you can market it.

links for 2008-05-30 « Brent Sordyl’s Blog said...

[...] Yariv’s Blog » Blog Archive » Announcing Twoorl: an open source ErlyWeb-based Twitter clone a Twitter clone in Erlang? I sat down and hacked one, and thus Twoorl was born. It took me one full day plus a couple of evenings. The codebase is about 1700 lines (including comments). You can get it at http://code.google.com/p/twoorl (tags: twitter erlang opensource) [...]

john conroy said...

gr8 post. gotta check this thing out

Confluence: Greenhouse said...

Enterprise Twitter...

Can we make a simple, enterprise Twitter using this?...

loretoparisi said...

Great work Yariv! Erlang rocks!

Scot said...

How would you realistically go about testing this as the base for a twitter-like service? Who hosts Erlang? How would you get started? I think it has great potential for a white label solution if you can make it easy for people to get started and seek out support.

Cheers

Chris Laux said...

Well, I just spent the last few hours analysing the possibility of just such a project in Erlang. Like Nick above, you beat me to it, well done! Have you actually considered setting it up? My idea was to recreate the Twitter API 1:1 and import their stuff from the public timeline. Maybe the client developers would be willing to switch considering Twitter's problems and the fact that the replacement is open source...

Cedric said...

Does Twoorl support friend updates and IM updates? These are the main scaling bottlenecks in the Twitter architecture, based on the developer blog.

Open Parenthesis » Open Source Microblogging said...

[...] Twoorl - a GPL (3) implementation of a microblogging service in Erlang using ErlyWeb. Started (and entirely written?) by Yariv Sadan [...]

Martin Owen Has A Blog : Erlang Talk - Why Functional Programming? said...

[...] then that is unlikely to be the case. Yariv Sadan is working hard to persuade people that Erlang is a great platform for writing web apps, but I’m not convinced, at least not for the [...]

Witold Baryluk said...

Hi, Yariv, is twoorl.com dead?

How about using XMPP and using XEP-0107: User Mood for twoorls? It will make it automatically distributed :)