Saturday, August 12, 2006

Does Google Have No Interest in Functional Programmers?

Note: before I try to answer this question, I would like to clarify that by "functional programmers" I don't mean programmers who are functional human beings, but programmers who know functional programming. Whether Google is interested in hiring emotionally functional people is beyond the scope of this inquiry, but I would like to venture a guess: of course not! Functional people are boring! Just kidding :)

To research this fascinating question, I went to the Google jobs website and ran the following queries:

Just to make sure I'm not completely delusional in assuming that Google actually creates software and therefore wants to hire *any* programmers, I ran the next few queries:

This is strange. I know that Google makes heavy use of Java, Python and C++, but I'm sure Google has many great hackers who love functional languages. Peter Norvig, the director of Research at Google, is a prominent figure in the functional programming world. In addition, Google's distributed computing infrastructure is largely built on MapReduce, a concept that originated in functional programming.

Speaking of MapReduce, let's take a short detour and think about how MapReduce maps onto Erlang -- no pun intended :)

In Erlang, the sequential counterpart of MapReduce is embodied in lists:mapfoldl/3, which maps over a list and folds an accumulator through it in a single pass. As Joe Armstrong has shown with his pmap implementation, it would be easy in Erlang to write a parallelized version of that function, which the Erlang runtime could execute on multiple CPUs and which could be distributed across a cluster of Erlang nodes.
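To make the sequential case concrete, here is a minimal sketch of lists:mapfoldl/3 from the standard library. The module name mapfold_demo and the function square_and_sum/1 are made up for illustration: it squares each element (the "map" part) and sums the squares (the "fold" part) in one left-to-right pass.

```erlang
-module(mapfold_demo).
-export([square_and_sum/1]).

%% Hypothetical example: returns {Squares, SumOfSquares}.
%% The fun receives an element and the running accumulator and
%% returns {MappedElement, NewAccumulator}.
square_and_sum(L) ->
    lists:mapfoldl(fun(X, Sum) ->
                       Sq = X * X,
                       {Sq, Sum + Sq}
                   end, 0, L).
```

For example, mapfold_demo:square_and_sum([1, 2, 3]) returns {[1,4,9],14}.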

Following is the code, taken from his mailing list posting, for Joe's pmap implementation in Erlang. I doubt it would be possible to write an implementation that's nearly as concise in C++. This speaks well for Erlang, because as Paul Graham said, succinctness is power.

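%% Apply F to every element of L, each in its own process, and
%% return the results in the original order.
pmap(F, L) ->
    S = self(),
    Pids = lists:map(fun(I) ->
                         spawn(fun() -> do_f(S, F, I) end)
                     end, L),
    gather(Pids).

%% Collect one reply per worker, in the order the workers were spawned.
gather([H|T]) ->
    receive
        {H, Ret} -> [Ret|gather(T)]
    end;
gather([]) ->
    [].

%% Worker body: send the result (or the caught error) back to the parent.
do_f(Parent, F, I) ->
    Parent ! {self(), (catch F(I))}.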

I don't know of any language that would make MapReduce as straightforward to implement as Erlang does. In addition to its message passing primitives, Erlang has hot code swapping, supervision trees, lightweight processes, a distributed database engine called Mnesia and many other goodies that would facilitate a highly robust MapReduce implementation.
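As a rough illustration of that claim, here is a toy sketch in the spirit of Joe's pmap. It is not Google's MapReduce: there is no grouping by key, no fault tolerance and no distribution across nodes. The module name mapreduce_sketch and the functions mapreduce/4 and total_words/1 are hypothetical. Each input is mapped in its own process, and the results are then folded sequentially in the parent.

```erlang
-module(mapreduce_sketch).
-export([mapreduce/4, total_words/1]).

%% Hypothetical helper: run Map on each input in a separate process,
%% collect the replies in spawn order, then fold them with Reduce.
%% A crashing Map yields {'EXIT', Reason}, which Reduce must tolerate.
mapreduce(Map, Reduce, Acc0, Inputs) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), (catch Map(I))} end)
            || I <- Inputs],
    Results = [receive {Pid, R} -> R end || Pid <- Pids],
    lists:foldl(Reduce, Acc0, Results).

%% Example job: count the words in each line in parallel and sum the counts.
total_words(Lines) ->
    Map = fun(Line) -> length(string:tokens(Line, " ")) end,
    Reduce = fun(Count, Total) -> Count + Total end,
    mapreduce(Map, Reduce, 0, Lines).
```

For example, mapreduce_sketch:total_words(["a b c", "d e"]) returns 5. A sturdier version would presumably put the workers under a supervisor and spawn them on remote nodes, which is where the goodies listed above would come in.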

None of this is surprising, given that Erlang was designed to power massively scalable phone switches targeting 99.9999999% (nine nines!) availability, so concurrent, distributed programming is Erlang's bread and butter.

I may be crazy, but I just can't help but wonder whether Google would have been better off writing its MapReduce infrastructure in Erlang rather than C++. After all, Erlang has worked wonders for Ericsson in building its high-end distributed telephony products, as Ulf Wiger's paper, Four-fold Increase in Productivity and Quality (pdf), points out. (Yes, I know that Ericsson isn't a search engine company, but let's not worry too much over such minutiae :) )

Ok, I admit I'm not seriously proposing that Google should write its backend in Erlang. Doing so would totally defeat the purpose. If Google dropped C++ for Erlang, I would lose the competitive advantage I need for building my Erlang-Powered Google Killer! MOAHAHAHAHA! :)

Back to the jobs question: the only way I can explain my findings, short of asking somebody at Google (and I don't know anybody who works there), is that Google simply doesn't build any applications in functional languages -- at least not the ones in my searches. So what's the conclusion? If you want to get a programming job at Google (actually, probably at 99.9% of software companies, especially in the US), you should forget those functional programming adventures you've been dreaming about and start cracking open those Java/Python/C#/C++ books.

Well... there is one more option. You can follow in Paul Graham's footsteps and use a functional language as the secret weapon of your very own startup. You'll have fun, and maybe -- just maybe -- you'll strike it rich. That doesn't sound too bad, does it? Just keep in mind that I take no responsibility for any eventuality of financial ruin or loss of spouse or girlfriend in case the ideas I put in your head don't work out the way you hoped :)


anonymous said...

google uses tons of functional programming techniques, they just don't use the languages you mention. they don't need erlang because they've already solved the scalability problem. :)

Frank said...

Google's reason for not writing MapReduce in Erlang is simple: C++ can be hooked into other languages, like Python and Java. You'd have a better argument if you went after Sawzall instead of MapReduce, but I digress.

Google's reason for not using functional programming languages, like Erlang, is also simple: It's easier to get programmers for Java and Python than for Erlang and Lisp.

Yariv said...

In a distributed system, interoperability becomes pretty easy because all languages can speak the same network protocols. A good reason to implement MapReduce in C++ is actually to squeeze as much performance as possible from each CPU, which C++ does better than Erlang for raw computation. About the availability of programmers -- I would expect Google to write most of their applications in Java/Python/C++, but I was truly surprised to see that Google has 0 openings for anybody with functional programming experience. I expected Google to have at least one or two systems written in Lisp but I guess I was wrong.

Frank said...

So I can either write a network protocol in each language I want to use MapReduce in or I can hook in a C++ library? Yeah, I'll go with the latter.

As for CPU performance, it's a moot point. Disk and/or network I/O is going to be the bottleneck.

Yariv said...

I think you're right about C++. It would actually be a pretty good strategy to write a MapReduce client in C++ that communicates with a backend that's written in Erlang :) This way you wouldn't need to write a protocol library in each client language as you could just embed the C++ API with a language-specific wrapper.

Yariv said...

Anonymous -- thanks for the link. It's very interesting! Btw, I don't think Google will ever "solve" the scalability problem. If it did, it wouldn't keep building those ridiculous data centers :)