PillowChat: How Not to Build a Chat Room with jQuery, PHPillow, and CouchDB

After watching J. Chris Anderson show off a CouchDB chat app at an Austin Javascript meeting, I figured Couch might be a good fit for my next large project. Building a clone of my own would be a good way to get familiar with the tech. Since I knew my back end would be PHP, I opted for Kore Nordmann’s PHPillow wrapper after reading some good things on StackOverflow.

Here’s the result:

Structural Overview

Client: jQuery runs in the browser, sending messages and polling the server for new ones. Application settings and state are maintained in the global pillowchat object. Functions beginning with “render” read state information from the global and push it into the DOM.

Server: PHP receives POST requests from the client and handles them in chat.php, sending back JSON messages to the client. There are three CouchDB views defined in views.php for performing the following actions:

Getting a list of users
Getting recent messages
Getting the timestamp on a user’s empty message

Implementation Specifics

CouchDB only stores “message” documents. They contain three properties: Username, tripcode, message, and timestamp. Each message includes a password. The server hashes this into a tripcode that allows users to identify each other without registration. Only the hashes are saved. Mouse over usernames in the chat room to see the tripcodes.

Rather than create a separate document to keep track of users’ most recent activity, I just record timestamps in an empty message document. Every time the client polls for new messages, the empty message document for that client is updated with a timestamp. The getUsers function grabs a list of active users by selecting all the empty message documents with timestamps in the last 5 seconds. Since the views are predefined, you can’t change the timestamp in the request sent to CouchDB. Instead, I return all empty messages and use PHP to loop through the response, looking for the most recent. It seems like there should be a better way to do this.

The Stress Test from Hell

The app looked nice and appeared to work pretty well when I roped in some late night facebook lurkers to test it. Feeling confident in my creation, I showed it off to my neighbor @Kmobs while picking up some cookies from his apartment. The app was in no way prepared for his hordes of CyanogenMod followers. The chat appeared to work okay with about 25 people in the room, but became inaccessible almost immediately after.

All the polling clients hammered the server, maxing out the Linode instance’s CPU and memory.

It’s obvious that something with an event loop like Node.js would have been far more appropriate here. Apache’s processes quickly wiped out the RAM.

Linode provides some nice graphs of CPU, bandwidth, and disk IO. They all confirm that the server was swamped. I’m curious to see how a static web page or WordPress install would stand up to this kind of traffic. The trouble didn’t stop there. During the post mortem I saw that the database size had ballooned to a whopping 5.8 GB.

What had I done wrong? Had I stored the message documents in an inefficient way? Was there some kind of bug causing duplicate documents? Probably not. Here’s what I saw when I wrote a view to dump out all the messages:

Some crafty hackers correctly assumed there was no rate limiting and flooded the chat with long spammy messages. The easiest way to do this is probably Chrome’s javascript console. Some sort of shell script would have also worked.

5,939 MB / 35,800 Documents = 169.9 KB per document

The Takeaway

You already know that you shouldn’t trust clients. This truism is an understatement. You should write your server side code with the assumption that your client code is being manipulated by devious bastards. In this case, the server failed to verify message length and message frequency.

Several improvements could make this app scale better. Websockets are probably better than having clients poll the server every second. When enough people join the chat, the server basically experiences a DDOS attack. Since Websockets don’t really enjoy wide support, the polling method could still be employed in a more conservative manner. Clients could reduce their poll frequency as an inverse function of active users or perform an exponential backoff when their requests time out.

The PHPillow literature base is pretty small. There is a short tutorial on the official site, but it doesn’t cover very many use cases. While the API itself is decently documented, but some more examples would go a long way.

When you create a view in PHPillow, it is stored to CouchDB the first time you execute the code. If you want to modify the view, you must delete the original view in Futon before the changes are stored. This is not a big deal, but it’s frustrating if you don’t know about it. Additionally, I’m not thrilled by the prospect of writing a view every time I want to construct a new type of query. CouchDB is good at selecting ranges, but it’s not immediately apparent how I should locate a document based on 2 string properties, e.g. “firstname=bob&lastname=loblaw”.

The PillowChat source is available on GitHub. It would be fun to see what kind of volume is possible if the aforementioned improvements are implemented. Big thanks to Keyan and the CyanogenMod crowd for the testing.

Edit: You can build a more efficient chat app with far less code using socket.io and node.js. See SocksChat for a simple example.

Resources

http://wiki.apache.org/couchdb/HTTP_view_API
http://wiki.apache.org/couchdb/View_collation#View_Collation
http://arbitracker.org/phpillow/tutorial.html
http://tracker.arbitracker.org/phpillow/api/view/Core/phpillowDateValidator.html
http://tracker.arbitracker.org/phpillow/issue_tracker/issue/11

Miscellany

Styling is borrowed from Jérôme Gravel-Niquet‘s backbone.js demo.
I have no idea what will happen if you open the demo in Internet Explorer.
I changed the name from PillowTalk to PillowChat after I realized someone else on github beat me to the pun.