Scalable Web Apps: Erlang + Python

Motivation

This post describes how to program scalable web applications with Erlang and Python using computational parallelism. Caching and load balancing are well documented elsewhere and beyond the scope of this post.

Web applications, by nature, span two drastically different programming domains: the high-level web design and development domain and the low-level, high-performance, distributed domain. Since the internet bridges these two domains, it’s now realistic to provide high-level user interfaces to high-performance back-end applications.

I prefer to do as much programming in Python and the Django web framework as possible because it’s just so easy. ErlyWeb is an Erlang web framework which should provide similar features to Django, though I’ve never used it. In a web application setting, it’s easy to have too many Apache/Python processes which max out server memory or the CPU. For those tasks, I like Erlang. Erlang has parallelism and distribution primitives, which are arguably more elegant than Python’s concurrency primitives, and has SMP support.

Erlang + Python Communication

It seems like a no-brainer to use domain-specific languages to ease development efforts in the corresponding domain, but the tricky part is making multiple languages communicate when multiple domains are spanned. MochiMedia is a small company that builds many of their products around Erlang + Python and, after shopping around, I chose to use their method of interfacing between the two languages: HTTP + JSON. The data being sent between the two languages is serialized, sometimes with JSON, and then passed along via HTTP. This has a few benefits. First, since HTTP is being used, my Erlang cluster can be across the internet from my Python front-end server. Second, since I’m using an independent intermediate representation to serialize the data (JSON), any component of the application stack may be swapped out for something completely new.

Here’s how it’s done:

  1. Use MochiWeb to enable the Erlang nodes to communicate over HTTP. The latest version of Erlang and OpenSSL headers are needed to compile MochiWeb.
  2. Create a MochiWeb project skeleton. Since MochiWeb is a framework, a script is provided to help create a web server which uses MochiWeb.
  3. Modify the request handler to understand JSON. The only Erlang that really needs to be modified is in [project name]/src/[project name]_web.erl. This is where the processing code goes (ex: map/reduce).

On the Python side, urllib2’s Request objects carry the data and a simple urllib2.urlopen call sends them to Erlang. Django comes pre-packaged with SimpleJSON, which serializes the body of the HTTP request:

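from urllib2 import Request, urlopen         # Python 2 standard library
from django.utils import simplejson as json  # SimpleJSON as bundled with Django

def send_to_erlang(data):
    url = "http://erlang.nodes.tld:8000/"
    body = json.dumps(data)  # serialize the payload to JSON
    headers = {'Content-Type': 'application/jsonrequest',
               'User-Agent'  : 'Python/Project/0.1'}
    # Passing a body makes urllib2 issue a POST; hand the response back to the caller
    return urlopen(Request(url, body, headers))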
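For readers on Python 3, where urllib2 has become urllib.request, the same round trip might look roughly like the sketch below. The helper names build_request and call_erlang are made up for illustration, and the sketch assumes the Erlang node replies with a JSON body, which the post does not specify.

```python
import json
from urllib.request import Request, urlopen

# Placeholder endpoint, copied from the example above.
ERLANG_URL = "http://erlang.nodes.tld:8000/"

def build_request(data, url=ERLANG_URL):
    """Serialize `data` to JSON and wrap it in a POST request (hypothetical helper)."""
    body = json.dumps(data).encode("utf-8")  # urllib.request needs bytes
    headers = {"Content-Type": "application/jsonrequest",
               "User-Agent": "Python/Project/0.1"}
    return Request(url, data=body, headers=headers)

def call_erlang(data, timeout=10):
    """Send `data` and decode the reply, assuming the node answers with JSON."""
    with urlopen(build_request(data), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting the request construction from the network call keeps the serialization testable without a running Erlang node.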

Parallelism

Kevin Smith, of Hypothetical Labs, interviewed Bob Ippolito, CTO at MochiMedia, and the interview is a great case study for Erlang + Python. Bob talks in depth about the engineering tasks the model helps overcome.

Computation tasks which can be executed in parallel are key to utilizing Erlang’s distributed parallelism. A relatively small message with big computations is the desired abstraction. For example, the Django web interface could wrap an Erlang distributed map/reduce implementation. The Erlang book enumerates many different paradigms for Erlang distributed parallelism and for programmers who already have an idea, the plists library takes care of all the distribution automatically. A programmer with at least a little experience in both Erlang and Python should be able to hack their way through to a fully functional and scalable web application from here.
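To make the "small message, big computation" shape concrete, here is a minimal map/reduce sketch in Python. It is only a local stand-in: a thread pool plays the role of the Erlang nodes, and the names chunk, map_reduce and sum_of_squares are invented for the example. In the setup described in this post, the map step would instead be an HTTP call to the Erlang cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add

def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def map_reduce(map_fn, reduce_fn, items, initial, chunk_size=1000, workers=4):
    """Apply map_fn to every chunk in parallel, then fold the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is one small message; map_fn is the big computation.
        partials = list(pool.map(map_fn, chunk(items, chunk_size)))
    return reduce(reduce_fn, partials, initial)

def sum_of_squares(numbers):
    """Example map step: the work done on one chunk."""
    return sum(n * n for n in numbers)

# Usage: four chunks are mapped in parallel and the partial sums are added up.
total = map_reduce(sum_of_squares, add, list(range(10)), 0, chunk_size=3)
```

The reduce step only ever sees one partial result per chunk, which is what keeps the messages small relative to the work.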

Erlang Circular Process Communication

The 1984 Chandy / Misra solution to the dining philosophers problem[1] requires philosophers to communicate with each other. In Erlang, one way to do this is to have the philosophers tell the philosopher who sat before them that they are neighbors.

Below is my first Erlang program which sets up the circular communication between processes:

%% Dining Philosophers setup.
%% By Luke Hoersten
%% Public Domain (PD) No Rights Reserved.

%% This is the Chandy / Misra Dining Philosophers Solution and it
%% assumes philosophers can talk to each other.

-module(dine_phil).
-export([dine/1,sit/1]).

%%%% Setup
dine(Places) -> sit(Places).

%%%% Fork
pickup_fork(Clean) ->
    spawn(fun() -> fork(Clean) end).

fork(IsClean) ->
    receive
        {Phil, is_clean} ->
            Phil ! IsClean,
            fork(IsClean);
        set_dirty -> fork(false);
        set_clean -> fork(true)
    end.

%%%% Sit
sit(Num) -> %% first expects last as left
    First = spawn(fun() -> greet(last, pickup_fork(true)) end),
    sit(Num-1, First, First).

sit(1, Left, First) -> %% last expects first as right
    spawn(fun() -> greet(Left, First, pickup_fork(false)) end);
sit(Num, Left, First) ->
    Current = spawn(fun() -> greet(Left, pickup_fork(false)) end),
    sit(Num-1, Current, First).

%%%% Greet
greet(Left, First, Fork) ->
    First ! {last, self()}, %% last tells first when he's seated
    greet(Left, Fork).

greet(last, Fork) -> %% first waits for his right and last to sit
    receive
        {last, Last} -> greet(Last, Fork)
    end;
greet(Left, Fork) -> %% everyone tells left when he's seated
    Left ! {right, self()},
    receive
        {right, Right} -> eat(Left, Right, Fork) %% eat/3 is not part of this setup listing
    end.
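The seating handshake in the listing above can also be simulated in Python, which may help when tracing who tells whom. This is a sketch, not a port: threads and queues stand in for Erlang processes and mailboxes, the Philosopher class and its stash list are invented to imitate selective receive, forks are left out, and all philosophers are created up front rather than spawned one after another.

```python
import queue
import threading

class Philosopher:
    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()
        self.stash = []      # messages read but not yet matched
        self.left = None
        self.right = None

    def send(self, tag, sender):
        self.inbox.put((tag, sender))

    def receive(self, tag):
        """Block until a message tagged `tag` arrives, keeping others for later."""
        for i, (t, sender) in enumerate(self.stash):
            if t == tag:
                del self.stash[i]
                return sender
        while True:
            t, sender = self.inbox.get()
            if t == tag:
                return sender
            self.stash.append((t, sender))

    def greet(self, left, first=None):
        if first is not None:            # last tells first when he's seated
            first.send("last", self)
        if left is None:                 # first waits for last to sit
            left = self.receive("last")
        self.left = left
        left.send("right", self)         # everyone tells left when he's seated
        self.right = self.receive("right")

def sit(num):
    """Seat `num` philosophers and return (left, self, right) names per seat."""
    if num < 2:
        raise ValueError("a ring needs at least two philosophers")
    phils = [Philosopher(i) for i in range(num)]
    threads = []
    for i, phil in enumerate(phils):
        left = phils[i - 1] if i > 0 else None       # first has no left yet
        first = phils[0] if i == num - 1 else None   # only last is told about first
        thread = threading.Thread(target=phil.greet, args=(left, first))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return [(p.left.name, p.name, p.right.name) for p in phils]
```

Calling sit(4) returns one tuple per seat, so it is easy to check that each philosopher's left and right neighbors close the ring.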

  1. “In computer science, the dining philosophers problem is an illustrative example of a common computing problem in concurrency. It is a classic multi-process synchronization problem.” (Wikipedia)

Fix Disqus Validation Errors on WordPress

Disqus is a great comment system but unfortunately the WordPress plugin generates invalid XHTML. Luckily it’s pretty easy to fix. I’ve made the necessary changes and packaged them up. A simple diff will show the minimal amount of changes needed.

Disqus WordPress Plugin — Fixed validation

SocialThing vs. FriendFeed

With blogs, Tumblr, Twitter, Flickr, Facebook, Pownce, Vimeo etc, it’s hard staying on top of the social web. It’s the ever-cliché problem of “more information than we know what to do with.” The social web, to me, is analogous to CML’s message passing model. All the modular social apps listed above are communication channels and humans are the actors (the actor model comes full circle ;-D ). Life feeds like SocialThing and FriendFeed are the anonymous send and receive primitives and without them, the social web is nothing more than Java RMI. SocialThing and FriendFeed essentially do the same thing but solve the problem from completely different angles of attack.

FriendFeed

FriendFeed requires all the user’s friends to also be users of FriendFeed and to have added all their feeds to FriendFeed as well. Then, the user must subscribe to all their friends. Since the user has already subscribed to all their friends on all the applications being aggregated anyway, why subscribe again? It’s FriendFeed’s method of filtering. To not hear from certain friends, just don’t add them.

SocialThing

SocialThing[1], in my opinion, has a much better method of aggregation. The user tells SocialThing about all their web app accounts and SocialThing basically just reads them. So since I already have all my friends added to Twitter, SocialThing will show whatever Twitter would show. FriendFeed would only show the FriendFeed friends I’ve subscribed to who also told FriendFeed about their Twitter. To borrow a bit from set theory, SocialThing starts with the universe and subtracts things out of the friend set (it actually doesn’t have that functionality yet). FriendFeed starts with nothing and explicitly adds friends to the set.
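The set-theory comparison can be written out with Python sets. Everything here is illustrative: the friend names, the muted set and both function names are made up, and neither service exposes an API like this.

```python
# Hypothetical data: who I already follow on each service.
twitter_friends = {"ann", "bob", "cy"}
flickr_friends = {"bob", "dee"}

def socialthing_feed(services, muted=frozenset()):
    """Start from everyone on every linked service, then subtract."""
    universe = set().union(*services)
    return universe - set(muted)

def friendfeed_feed(subscriptions, registered):
    """Start from nothing; keep only explicit subscriptions who have registered."""
    return set(subscriptions) & set(registered)

# SocialThing style: everyone shows up unless explicitly removed.
everyone = socialthing_feed([twitter_friends, flickr_friends])
without_cy = socialthing_feed([twitter_friends, flickr_friends], muted={"cy"})

# FriendFeed style: only friends I re-subscribed to who also joined FriendFeed.
explicit = friendfeed_feed(subscriptions={"ann", "bob"}, registered={"bob", "dee"})
```

The difference is the default: subtraction starts full and needs no setup, while addition starts empty and needs every friend to opt in.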

In my mind, a life feed’s job is to basically act as a join point for sends and receives on my social channels. FriendFeed went and added a whole other layer of abstract channels. SocialThing, on the other hand, is much younger than FriendFeed and only has the ability to aggregate a handful of apps. I really need Disqus integration before I can fully use it.

Thanks to Sahil for motivating this post!

  1. I have one SocialThing beta invite left. Comment if you want it. First come, first served.

How Google App Engine Affects Startups

Google has just announced a new service called Google App Engine. “Google App Engine enables [developers] to build web applications on the same scalable systems that power Google applications.” Basically GAE is a scalable web application back-end with a nice API. So what does this mean for web startups?[1] Because web startups commonly have an exit strategy based on being acquired by Google or a similar company, and because web startups are bounded financially in growth by constraints such as scalability, GAE means a lot.

Google App Engine Effects on Google

Before GAE, Google only directly benefited from startups by acquiring them. After an acquisition, Google had to spend time scaling the acquired assets to Google proportions.

Now with GAE, Google will profit more directly from startups who pay for GAE services, making money from sheer numbers. Also, if Google chooses to acquire a certain startup, the startup’s assets are already generally fitted to work with Google’s internal scaled systems.

Google App Engine Effects on Startups

Before GAE, startups only really made big money if they were acquired by Google or if they braved the storm of scalability investment and tried to break through the scalability investment wall, as Facebook and Digg did. Breaking through is obviously extremely rare and few startups even consider this a valid “exit” strategy.

Now with GAE, startups no longer have to avoid designing their systems to scale up right from the beginning (previously an extremely expensive endeavor of questionable worth). GAE’s inherent scalability dramatically reduces startups’ dependence on being acquired as an exit strategy. Growing big is much more practical than in the past.

How’s the Model Changed?

So, is the startup-Google bond being loosened or tightened? Breaking the situation down, it seems that Google is spreading its endeavors more thinly. Without GAE, the startup-Google bond is an “all eggs in one basket” situation. With GAE, startups will be less likely to sell out to Google, but Google makes more money from the sheer number of startups using GAE.

  1. Based on my observations after visiting Disqus.