Yariv's Blog: June 2006

Wednesday, June 28, 2006

The Pirate Bay: Sponsored by... Citibank???

The good ol' Bay has been through a lot lately, but despite all the news and controversy, I haven't visited The Pirate Bay in a while. Today, I finally decided to drop by for a visit to see how the site is doing. Everything looked quite familiar, with one notable change: a bright orange logo resembling a pirate ship in the backdrop of a reborn phoenix as it rises from the flames into a new life -- a fitting metaphor, if I may say. I browsed around just to get a feel for the offerings, and after I drilled down into some categories, to my great astonishment, I encountered a big animated Flash banner at the top of the page, right above the usual array of links to torrents for stuff such as the latest build of Windows Vista and different releases of Pirates of the Caribbean (oh, the irony).

If you're familiar with the world of BitTorrent sites, nothing I've said so far should surprise you. What did strike me as quite... strange, however, was that this ad tried to peddle me nothing less than a business credit card from a global banking powerhouse: Citibank.

In close proximity to the Citibank banner were ads for services I prefer not to describe verbatim due to certain aesthetic standards I have established for this blog, so I'll let your imagination fill in the details after informing you that the relief these ads were promoting had nothing to do with helping victims of devastating hurricanes.

Conjecturing that Citibank is probably not the only big institution that takes advantage of The Pirate Bay's popularity to market its goods, I refreshed the page a few times until another big fish landed in my net: Verizon DSL.

Clearly, most Pirate Bay visitors aren't to be satisfied with internet connections that are suitable only for browsing a webpage here and there or sending an occasional text message, so I must congratulate the Verizon marketing team on its bull's-eye aim in targeting the right demographic.

Well, I'm glad to see the Bay is doing well for itself -- better than I expected, in a sense. I'm sure that the support it's been getting from such notable companies will help the Bay sail through any hostile winds that may be lying in its path and into comfortably torrential weather.

Monday, June 26, 2006

Erlang + Yaws + haXe = Perfect Comet Recipe

Comet, the trick of pushing data to a web client using eager client requests combined with prolonged server responses, has been getting a lot of buzz lately. Comet lets you create truly interactive web applications with full bidirectional communications and without the high latency imposed by the more primitive technique of periodic polling. Meebo uses Comet to do its magic, and so does the GTalk integration in Gmail. Most web applications probably don't need Comet, but if you want to be on the cutting edge then Comet opens up many possibilities.
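To make the "eager request, prolonged response" idea concrete, here is a minimal sketch in plain Erlang, with no web server involved. The module name comet_sketch and the functions start/0, push/2 and poll/2 are my own illustrative names, not part of Yaws or any Comet library. A session process stands in for the server's view of one browser, and poll/2 stands in for one long-lived HTTP request: it parks until an event shows up or a timeout passes, and the client is expected to call it again right away.

```erlang
-module(comet_sketch).
-export([start/0, push/2, poll/2]).

%% Start a session process that stands in for the server's view of one
%% browser. (Hypothetical helper, not a Yaws API.)
start() ->
    spawn(fun() -> loop([], none) end).

%% Server-side code calls this to push an event to the session.
push(Session, Event) ->
    Session ! {push, Event},
    ok.

%% Stands in for one prolonged HTTP request: block until events are
%% available or TimeoutMs elapses, then return so the client can
%% immediately issue the next request.
poll(Session, TimeoutMs) ->
    Ref = make_ref(),
    Session ! {poll, self(), Ref},
    receive
        {Ref, Events} -> {ok, Events}
    after TimeoutMs ->
        Session ! {cancel, Ref},
        timeout
    end.

%% Queue holds undelivered events, newest first. Waiter is 'none' or
%% the {Pid, Ref} of a parked poll request.
loop(Queue, Waiter) ->
    receive
        {push, Event} when Waiter =:= none ->
            loop([Event | Queue], none);
        {push, Event} ->
            %% A request is parked: answer it right away.
            {Pid, Ref} = Waiter,
            Pid ! {Ref, [Event]},
            loop(Queue, none);
        {poll, Pid, Ref} when Queue =:= [] ->
            %% Nothing to deliver yet: park the request.
            loop([], {Pid, Ref});
        {poll, Pid, Ref} ->
            Pid ! {Ref, lists:reverse(Queue)},
            loop([], none);
        {cancel, Ref} ->
            case Waiter of
                {_, Ref} -> loop(Queue, none);
                _ -> loop(Queue, Waiter)
            end
    end.
```

This is only a sketch: it ignores the small race in which an event is sent just as the poll times out, and a real deployment would sit behind actual HTTP requests. The point is that each parked request costs one waiting process and nothing more.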

Comet is used for some cool apps, but there's a good reason you shouldn't use it. Unless you're a hacker of a certain functional language of Swedish origin, chances are your web server croaks when it reaches a few thousand simultaneous users. It just happens that the vast majority of web servers are written in languages that use operating system threads for concurrency, an approach that doesn't scale above a few thousand threads. Other web servers, like Lighttpd, don't use threads at all because they're entirely non-blocking and single-threaded, but their interfaces to dynamic languages, e.g. FastCGI, rely on system threads if not on even more heavyweight processes.

You're probably wondering at this point how Meebo does it. Well, I haven't seen Meebo's source code, so I don't know for certain, but my guess is the Meebo hackers wrote an entirely non-blocking, single-threaded web server that maintains connections to all users. This web server acts as a simple message router -- it doesn't do database transactions or anything. It just sends messages to other servers that do the heavy lifting -- and that includes the AIM, ICQ, Yahoo Messenger servers, etc. A second possibility is that Meebo's software isn't very scalable and that Meebo just buys a lot of cheap front-end boxes. These are my two theories, at least, because I know that Meebo's servers are written in C++ (again).

Whatever C++ hoops Meebo's code jumps through, one thing is pretty clear: Meebo has at least 37 frontend servers. To arrive at this observation, I harnessed the best of my investigative journalism skills, which prompted me to access wwwXX.meebo.com servers in increasing order until www38.meebo.com redirected me to a lower server.

This kind of architecture sounds pretty difficult to develop and maintain. Either it has many moving parts and requires a lot of custom code because it doesn't fit well with the standard array of web development tools, or it's just plain expensive. In addition, writing server-quality code in C++ is not easy, and if you want to upgrade your servers while they're running, well, forget about it. If I were to build a webapp like Meebo, I would do it quite differently: I would ditch C++ and go with Erlang.

Erlang was created with scalability and concurrency in mind, so Erlang has taken a much more effective approach to concurrency than other programming languages, using lightweight threads that are managed by an event-driven VM (you can read my previous posts on the topic for more details). Now, Erlang is not just a language: it has very useful applications, especially Yaws, a powerful web server for dynamic applications. Yaws is written in Erlang, so it has scalability and concurrency built in. Yaws allows you to write server-side code using the good old multi-threaded paradigm, without worrying about maintaining a number of connections that would make other web servers gasp their last dying breath if your site ever gets popular.

If I were to build a web application that uses Comet, I would most certainly use Erlang + Yaws. In fact, I would use a Yaws backend with a haXe client, now that my haXe remoting adapter for Yaws is ready to ship.

Well, that's my take on the matter. As Dennis Miller would say, of course, that's just my opinion -- I could be wrong :) I do urge you, however, to scroll down to my posting with the graph of the experiment comparing the performance of Apache and Yaws in the face of a high number of simultaneous requests or to visit Joe Armstrong's webpage, as it should give you some extra persuasion.

Friday, June 23, 2006

How to Get Net Neutrality When Lawmakers Won't Give It to You

ISPs are bent on forcing internet companies to pay extortion fees in order to get preferential treatment over their competitors and our lawmakers aren't going to do anything about it. This is threatening to make the Internet a hostile place to innovation, where companies both big and small have to pay a toll to the ISP gatekeepers in order to stay afloat. Can anything be done to stop them?

Yes.

Google, eBay, Amazon, Yahoo, Microsoft, and Craigslist should form an alliance wherein all members vow to completely cut off any ISP that attempts to force extortion fees upon any of the alliance's members.

This alliance should be called the Net Neutrality Alliance. It should be open to any Internet company, but it's critical that all the major players participate. Otherwise, the ones that do participate would fear losing market share -- due to lower quality of service imposed on them by the ISP -- to the ones that don't, and the alliance would collapse.

Discrimination can go both ways, and ISPs can lose in this game just as badly as the Internet companies. How many Verizon customers would keep their accounts if they couldn't search on Google, shop on Amazon, or search for apartments or jobs on Craigslist? Not many. Even in a largely monopolistic market such as cable, users would only put up with so much before they switch.

The main reason people pay for an Internet connection is the content. If a vast pool of useful content disappeared from an ISP's network, its customers would go somewhere else. No ISP would want that to happen, even if it means it has to generate revenue the old-fashioned way: by selling Internet access to its customers.

The Internet is an ecosystem in which the fates of the ISPs and the content providers are more tightly linked than the ISPs realize. Nothing can make this clearer than a strong unified front.

Wednesday, June 21, 2006

Erlang Hot Code Swapping -> Hacking Nirvana

When I first heard of Erlang hot code swapping, I thought, "What a fantastic -- no, wait -- essential feature for systems that have five nines availability requirements. No wonder Erlang probably powers my phone company's 911 switch. Too bad I won't get to enjoy this powerful feature in my after-work Erlang hacking."

I'm happy to say I was wrong.

In my free time, I've been hacking a haXe remoting adapter into Yaws, a very powerful and scalable Erlang web server. I picked this project because I think haXe is a great web client language and Erlang is unbeatable on the server side for certain purposes. I mentioned some of the reasons in previous posts and will probably discuss this more in the future (haXe is also a very good server language, by the way, and is arguably better than Erlang for many applications). What could be better than integrating the two so I can use them both in future projects?

I'm still fairly new to Erlang, and since I only work on this project in my free time, it's not going as fast as I would have wanted. Oh well.

I got the Yaws source code, and frankly I was a little lost at first. Where do I start? I decided that my first victim would be the Yaws JSON serializer/deserializer because it's an independent module. I copied json.erl to haxe.erl, opened Emacs (which I haven't used for programming since college) and a separate Erlang shell window, and modified the module's functions while testing them in the shell. That was relatively straightforward. The most challenging parts were wrapping my head around Continuation Passing Style, which the JSON parser uses, figuring out the haXe binary format, which isn't well documented, and mapping haXe types to Erlang types. haXe has class objects and Enums, which Erlang doesn't have. At first, I tried to simulate classes and Enums in Erlang, but later I realized that using such structures in Erlang code would be too laborious. I decided to remove support for such types, also because I believe that arrays and anonymous objects should suffice for most RPC needs.

Now that my serializer/deserializer was finished, I got back to hacking Yaws's internals, and to my original state of confusion. I didn't know exactly how all the Yaws modules interact with each other. All I knew was that yaws_jsonrpc.erl contained the JSON RPC handling logic into which I wanted to hook. I wasn't sure how I would isolate this module from the rest of the system in order to test my implementation, which, at least initially, depended on a haXe client sending requests to the server.

My first approach was to stop Yaws, hack yaws_jsonrpc.erl (generally by adding logging statements in a few places to figure out the code flow), then run the Yaws build and install script, and restart Yaws. Needless to say, this was a very slow development effort, reminiscent of Java servlet hacking in the pre-Eclipse server integration days (a torture so horrible I wouldn't wish it upon even the new landlord who won't renew my lease :) ).

Then I had one of those earth-shattering, life-changing realizations that shook my foundations and elevated me to a higher plane of existence: This isn't Java -- it's Erlang. I can hack the code while Yaws is running and hot-deploy my changes!

Yes, it works, and it's wonderful. I run Yaws in interactive mode, where Yaws exposes an Erlang shell. Every time I make a change to a file, I simply recompile it by calling "c(FileName)." and the changes are deployed into Yaws while Yaws is running. This brings about such a speed-up in prototyping and development that any nostalgia I had left for some IDE-supported keyboard shortcut for a maddeningly slow server restart has gone in a puff of smoke.
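For anyone wondering why a running process picks up freshly compiled code, here is a minimal sketch of the usual pattern. The module name hot_sketch and its functions are made up for illustration and are not taken from Yaws. The key detail is that a fully qualified call (Module:Function) always goes to the newest loaded version of the module, while a plain local call stays in whatever version the process is already running.

```erlang
-module(hot_sketch).
-export([start/0, loop/1, version/0]).

%% Edit this return value, then run c(hot_sketch). in the shell while
%% the process started below is still alive.
version() -> 1.

%% Spawn a server process that counts the pings it has answered.
start() ->
    spawn(?MODULE, loop, [0]).

loop(Count) ->
    receive
        {ping, From} ->
            %% Fully qualified call: uses the newest loaded version/0.
            From ! {pong, ?MODULE:version(), Count},
            %% Fully qualified call: the process continues in the
            %% newest loaded loop/1 and keeps its state (Count).
            ?MODULE:loop(Count + 1);
        stop ->
            ok
    end.
```

To try it, start a process with hot_sketch:start(), send it {ping, self()} from the shell, change version/0 to return 2, run c(hot_sketch)., and ping again: the same process, with its counter intact, should now report the new version. One caveat I'm aware of: Erlang keeps only two versions of a module around, so a process that never makes a fully qualified call and lingers in old code gets killed when the module is reloaded a second time.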

This hot code-deploy trick is probably old news to experienced Erlang hackers, but for me it was exciting. Now that I'm armed with new knowledge, my challenge is to stop blogging about coding and to actually write code so I can get this haXe remoting adapter finished.

Why I Moved from Blogger to Wordpress

I used to use Blogger, but I recently decided to move my blog to Wordpress. The primary reason I decided to leave Blogger is Blogger's pathetic security, mostly due to the lack of SSL access. I picked Wordpress for my blog's new home because Wordpress has some of the best features and the most positive overall experience of all the blogging services I know. In fact, Wordpress's only minor drawback in my mind is the lack of manual control over the templates, but I'm not a customization freak, so this isn't a big concern for me.

Blogger doesn't even let you log in over SSL, not to mention keeping your session over SSL while you're editing your blog. When you change your password, Blogger doesn't even send you a validation email. What does that mean? Every 12-year-old hacker armed with Ethereal or tcpdump can steal your password by eavesdropping on your connection, and can then go ahead and change your password and thereby hijack your blog.

Your blog is a large part of your online identity. It's often the first thing that shows on search engines when people search for your name. It's valuable. I'm not comfortable with the thought that my blog could be hijacked so easily and there's nothing I can do to prevent it. (I did read that certain blogging applications let you use Blogger over SSL, but that's one more hoop than I'm willing to jump through.)

I dread the day when somebody stages a large scale attack on Blogger and hijacks thousands if not millions of blogs. Maybe such an event would kick Google's butt into action, getting it to turn on the SSL switch on the Blogger servers. I suppose that if this happened, Blogger could mitigate the disaster by rolling back all changes that happened during the attack, and then resetting all passwords. The damage would be significant, but not irreversible. I'm actually more concerned about individual blogs getting hijacked without Blogger's knowing or caring.

Wordpress has SSL access, so this problem largely doesn't affect Wordpress users (I say "largely" because the Wordpress servers could always be cracked and the user data could be stolen, but the risk is very small). That's a huge advantage for Wordpress, and is the primary reason I moved here. I must say I'm happy here so far. I may decide to host my blog on my own server eventually, which would have its downsides, but it's likely that Wordpress will remain my blog's permanent home.

Wednesday, June 14, 2006

blog migration

Hello, dear visitor. This blog has moved. Please visit its new home: http://yarivsblog.com.

More Erlang

Strange trends are taking place in the web programming world. As new languages come and go, developers are overlooking a mighty beast whose unparalleled power is $0 plus a mental barrier away: Erlang.

I mentioned Erlang in previous posts. Here's a quick recap of Erlang's history: in the early 1980s, Ericsson assembled a team of computer scientists who were to devise the best methods for developing scalable, fault-tolerant systems with soft real-time performance requirements. After much experimentation and development, Erlang, a functional language with built-in notions of concurrency, was born. The need for a new language was real: no existing language was suitable for solving Ericsson's problems, and when you're in the business of selling telephone switches to the world's largest telcos, you can't let a language with inadequate notions of concurrency and fault tolerance get in your way. The design decisions behind Erlang turned out to be very powerful, and this eventually gave Ericsson a solid market lead over the competition and positioned Ericsson as a dominant force in the telecom switch market.

Fortunately, the power of Erlang isn't stashed away in some grey corporate computer lab. In the 1990s, Ericsson released Erlang to the open source community, thereby giving every developer the power to build scalable distributed backends with (relative) ease.

Since its release, Erlang has been making headway in the open source world. An example of a recent convert is jabber.org, home of the Jabber Software Foundation (Jabber is the leading open IM standard, used by numerous organizations and IM providers, including Google Talk and Gizmo Project). jabber.org has recently switched its Jabber server from jabberd, which is written in C, to ejabberd, written in Erlang. This press release discusses jabber.org's move. jabber.org operates an instant messaging service with very high requirements for reliability and for handling large numbers of simultaneous connections (just like a telephone exchange), so it's no surprise that a server written in Erlang was jabber.org's server of choice.

I think that Erlang's strengths in the areas of concurrency, scalability and fault tolerance make it a good contender for being a more widely used web development language. The main reasons web developers haven't adopted Erlang in large numbers yet are, in my opinion, 1) Erlang has different semantics, which will always discourage some developers, 2) Erlang needs better PR, and 3) Erlang doesn't have an integrated web development framework like Ruby on Rails (I'm a huge Ruby on Rails fan, by the way). Efforts to build such a framework are apparently under way. Once they are mature, web developers will be able to tap into Erlang's strengths more easily, and Erlang will in turn enjoy the best kind of marketing: word-of-mouth.

How does Erlang achieve much greater scalability with large numbers of concurrent processes than other programming languages? Erlang processes are very lightweight -- much more so than OS processes and threads -- and the Erlang VM, BEAM, does the scheduling. BEAM is mostly event driven, and no lightweight process blocks the whole VM for very long. On multi-processor machines, BEAM launches (by default) one scheduler per processor. Erlang applications are normally designed from the ground up with concurrency in mind, so it's easy for Erlang code to take advantage of most, if not all, available processors. In a recent posting on the Erlang mailing list, Joe Armstrong described an experiment he conducted on a Sun Niagara box with 32 CPUs, in which changing a single function call from map() to pmap() made his application's performance scale with up to 16 CPUs. With upcoming BEAM improvements, additional scalability is expected. Joe gives background to the experiment here. Quote:

Erlang also maps nicely onto multi-core CPUs - why is this? - precisely because we use a non-shared lots of parallel processes model of computation. No shared memory, no threads, no locks = ease of running on a parallel CPU.

Believe me, making your favourite C++ application run really fast on a multi-core CPU is no easy job. By the time the Java/C++ gang have figured out how to throw away threads and use processes and how to structure their application into small lightweight processes they will be where we were 20 years ago.

Does this work? - yes - we are experimenting with Erlang programs on the Sun Niagara - the results are disappointing: our message passing benchmark only goes 18 times faster on 32 CPUs - but 18 is not too bad - if any C++ fans want to try the Niagara all they have to do is make sure they have a multi-threaded version of their application, debug it -'cos it probably won't work and they can compare their results with us (and I'm not holding my breath).

Turning a sequential program in a parallel program for the Niagara is really easy. Just change map/2 to pmap/2 in a few well chosen places in your program and sit back and enjoy.

Efficiency comes from a correct underlying architecture, in this case being able to actually use all the CPUs on a multi-core CPU. The ability to scale an application, to make it very efficient, to distribute it depends upon how well we can split the application up into chunks that can be evaluated in parallel. Erlang programmers have a head start here.
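Joe's quote mentions pmap/2 without showing it, so here is one common minimal formulation of a parallel map. It's my own sketch (the module name pmap_sketch and the helpers spawn_worker/3 and collect/1 are illustrative), and not necessarily the version he used in the experiment. It spawns one process per list element and then collects the results in the original order.

```erlang
-module(pmap_sketch).
-export([pmap/2]).

%% Parallel map: apply F to every element of List, each in its own
%% process, and return the results in the same order as lists:map/2.
pmap(F, List) ->
    Parent = self(),
    Refs = [spawn_worker(Parent, F, X) || X <- List],
    [collect(Ref) || Ref <- Refs].

%% Spawn a process that computes F(X) and sends the result back,
%% tagged with a unique reference so results can't be mixed up.
spawn_worker(Parent, F, X) ->
    Ref = make_ref(),
    spawn(fun() -> Parent ! {Ref, F(X)} end),
    Ref.

%% Wait for the result carrying this particular reference.
%% Simplification: if F crashes, this waits forever.
collect(Ref) ->
    receive
        {Ref, Result} -> Result
    end.
```

Because nothing is shared between the workers, swapping lists:map(F, List) for pmap_sketch:pmap(F, List) should give the same result whenever F has no side effects. A production version would also want to link to or monitor the workers so a crash in F doesn't leave the caller hanging.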

The following graph shows the result of an experiment Joe and colleagues conducted to compare the performance of Yaws, an Erlang web server, and Apache, under very high load -- in effect, a simulated DDOS attack:

[Graph: Apache vs. Yaws]

Here's Joe's explanation:

Apache (blue and green) dies when subject to a load of c. 4000 parallel sessions. Yaws (red) works well even when subject to high load.

The red curve is yaws (running on an NFS file system). The blue curve is apache (running on an NFS file system). The green curve is apache (running on a local file system).

...

Our figure shows the performance of a server when subject to parallel load. This kind of load is often generated in a so-called "Distributed denial of service attack".

Apache dies at about 4,000 parallel sessions. Yaws is still functioning at over 80,000 parallel connections.

You can read the full description of the experiment on Joe's website.

Erlang is powerful, and once it has a good web development framework, I think it will become many more developers' web language of choice. Interesting times are ahead for Erlang.

Sunday, June 11, 2006

Helen OS

I just saw on Digg a link to an interesting new open source operating system, HelenOS. Among other things, HelenOS has support for SMP, kernel threads, userspace threads, userspace pseudo-threads ("Userspace pseudo threads are very lightweight threads running in the context of one userspace thread") and IPC ("the ability of userspace threads to communicate with other threads (possibly from different tasks) via sending and receiving, synchronously or asynchronously, short messages"). The full list is here.

Some of these features strike me as very similar to those provided by Erlang and its virtual machine. I wonder if HelenOS developers took some cues from Erlang's success at scaling to large numbers of concurrent processes by keeping them very lightweight. This raises the interesting question of whether such features, when provided by the OS, make it possible to write C/C++ programs with the same scalability characteristics as Erlang programs at large numbers of concurrent processes. I should stress that the operative word here is "possible" -- not "easy"!

Let's sit back and wait for the benchmarks.

Wednesday, June 07, 2006

Erlang

A few months ago, I discovered Erlang and quickly became fascinated by it. Erlang is a dynamically typed functional programming language that runs on a special virtual machine (called BEAM) and comes with a set of libraries developed by Ericsson for the purpose of building large-scale distributed, fault-tolerant applications with soft real-time performance requirements.

Erlang used to be a proprietary technology developed and owned by Ericsson for building large telephone switches. In 1998, Ericsson released Erlang to the community under an open source license. I first heard about Erlang in the context of ejabberd, when I was looking at different Jabber servers for possible deployment at my company. I initially rejected the idea of deploying a server written in a weird, obscure language called Erlang, but as I dug deeper I discovered Erlang's beauty and the power it gives developers for building scalable, robust distributed applications. (Although we didn't end up deploying ejabberd, a good testament to its quality is that Jabber.org has recently made the switch from jabberd, written in C/C++, to ejabberd.)

Erlang's support for distributed programming is unmatched by any other language I know. A core capability of Erlang is spawning lightweight processes that can send messages to and receive messages from each other. A message can be any Erlang term (e.g. {foo, bar, 34, [4,5,6]}), and Erlang's message sending and pattern matching syntax makes message processing a breeze. I know you may think you can imitate the same concurrency facilities in [your favorite language here] using its threading API, but you're probably mistaken. Erlang processes are much more lightweight than OS threads, and hence Erlang scales much better with large numbers of concurrent processes. In addition, Erlang has capabilities such as hot code swapping and remote code deployment, which, in addition to lightweight processes, most languages are probably many years away from having.

Consider the following example. It shows how to spawn a process, send it a message to print to the console, and then send it a message to terminate.


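-module(example).
-export([start/0, listen/0]).

%% Wait for messages: print each {msg, Text} and keep listening
%% until 'stop' arrives.
listen() ->
  receive
    {msg, Text} ->
      io:format("got message: ~s~n", [Text]),
      listen();
    stop ->
      io:format("goodbye~n", [])
  end.

%% Spawn the listener, send it one message to print, then tell it
%% to stop.
start() ->
  Pid = spawn(example, listen, []),
  Pid ! {msg, "hello world"},
  Pid ! stop.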

I hope this gives you a sense of how easy Erlang makes concurrent programming. Of course, this only scratches the surface. There's much more, including a super high performance web server called Yaws and a distributed transactional database called Mnesia, both written in Erlang.

I'll write more about Erlang in the future. For now, I hope I've been able to pique your curiosity. In case you're not fully convinced that Erlang is very powerful, consider the fact that Erlang powers the telephone system in the UK with 31ms downtime per year -- that's 99.9999999% availability. That's very impressive.
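If you want to check the arithmetic behind that availability figure, here's a small sketch. The module name nines_sketch and its functions are mine, and I'm assuming a 365-day year. It turns a yearly downtime in milliseconds into an availability percentage and a count of nines.

```erlang
-module(nines_sketch).
-export([availability/1, nines/1]).

%% Milliseconds in a 365-day year (an assumption; leap years ignored).
year_ms() ->
    365 * 24 * 60 * 60 * 1000.

%% Availability as a percentage, given downtime in ms per year.
availability(DowntimeMs) ->
    100 * (1 - DowntimeMs / year_ms()).

%% Number of nines: the whole part of -log10(fraction of time down).
nines(DowntimeMs) ->
    trunc(-math:log10(DowntimeMs / year_ms())).
```

With 31 ms of downtime, the fraction of the year spent down is 31 / 31,536,000,000, or a little under one part in a billion, so nines_sketch:nines(31) comes out to nine nines, matching the 99.9999999% quoted above.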

Tuesday, June 06, 2006

MacBook: 1, Dual G5: 0

I knew the Intel Macs were fast, but I didn't expect my (low end) MacBook to put my Dual G5 PowerMac to shame in a task that's both CPU and IO intensive.

I timed the compilation of a mid-size C/C++ project using xcodebuild, and here are my results:

So, the MacBook compiles about 40% faster than the Dual G5.

That's pretty awesome.

Friday, June 02, 2006

Secure Portable Storage with OS X

If you're an OS X user and you store sensitive files on your iPod or flash drive, you're probably looking for ways to secure your data in case your portable storage device falls into the wrong hands. Some flash drives have proprietary data protection mechanisms, but they often don't work with OS X. More importantly, the iPod doesn't have such a capability built in. The best mechanism I found is to create an encrypted disk image and use it as a virtual drive for your sensitive files. This disk image is safe to carry around because it protects your data with 128-bit AES encryption, which is uncrackable by all practical means.

Here's how you do it:

Open the terminal and type

cd /Volumes/[name of portable storage device]
hdiutil create -fs HFS+ -encryption -type SPARSE -volname "My Drive" securedrive

This creates a new disk image on your portable storage device called securedrive.sparseimage. You can mount the disk image by executing "hdiutil mount securedrive.sparseimage" or by double clicking on the disk image in Finder. This will show the virtual drive in Finder as volume "My Drive" as well as in the /Volumes directory.

You can copy or drag and drop your files into the newly mounted virtual drive and your data will be safe. Just don't forget to cleanly eject (unmount) the virtual drive (using the Finder eject button or by executing 'hdiutil unmount "My Drive"'), as well as your portable storage device, before you physically disconnect the portable storage device from your computer.

Keep in mind that when you delete files from the virtual drive, the disk image doesn't shrink automatically and the physical space taken by the files remains unavailable. To reclaim this space, unmount the virtual drive and type

cd /Volumes/[name of portable storage device]
hdiutil compact securedrive.sparseimage

That'll give you those precious bytes back.

Thursday, June 01, 2006

At 5th Ave Apple Store

What's the first thing you do when you're at the Apple store on 5th Ave?

Collect evidence :)