The Why: Preamble

Because I work in information technology, one of the most common questions parents ask me is how to monitor their children’s internet access.[1]

Most parents want to know what their children are doing online, but they also recognize that most off-the-shelf products are as easy to disable or circumvent as they are to install (or are far more restrictive and bloated than they want). Sadly, enterprise solutions that capture and control network traffic at the most basic level, making circumvention next to impossible, are still very expensive and therefore out of reach for the average family.

What I needed was a cheap and hackable router that I could modify to send captured URLs to a central source for storage and processing.

The What: WRT54G

After studying my options, I remembered reading a lot about the Linksys WRT54G-series routers: their firmware was originally based on a heavily modified version of Linux, and Linksys made headlines over its obligation to release the GPL-licensed code used in that firmware.

So I did a little digging.

What I found was a whole router-hacking subculture built around the WRT54G. Much of the initial fervor seems to have subsided (many of the packages show a last update time of 2007 or so), but the documentation is still valid for the most part. The most popular projects providing custom firmware are OpenWrt and DD-WRT. While OpenWrt is the original, I found DD-WRT to be a lot more polished and (as we’ll see later) configurable without much headache.

It’s important to note here that the WRT54G has many variants, and it’s easy to fall into the trap of thinking that any old WRT54G will do. A little diligence in studying the differences between the hardware revisions will certainly save you time and money.

After buying a few different routers, bricking one (a Buffalo AirStation WHR-HP-54G[2]), and making a false start with a newer WRT54G v7 (anyone need a highly configurable, albeit not-very-hackable router?), I discovered that the best router for hacking is the WRT54GL (which was designed by Linksys to allow for user modifications).

The How: URLSnarf and custom shell scripts

Space on a router is very limited. On the WRT54GL I eventually ended up using, I had 4 MB of space to work with.

The first order of business was to find a package that could monitor all of the network connections (wired and wireless) on the router and capture requested URLs. For this task I discovered that URLSnarf, part of the dsniff OpenWrt package, worked quite well.
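
To get a feel for what the capture stage emits, here is a small sketch that pulls the requested URL out of a single log line. The sample line is made up and only modeled on the Common Log Format that URLSnarf writes, and extract_url is a hypothetical helper name, not part of dsniff.

```shell
#!/bin/sh
# Hypothetical helper: print the requested URL from a Common Log Format line.
# Assumes the request is the first double-quoted field: "METHOD URL ...".
extract_url() {
    awk -F'"' '{ split($2, req, " "); print req[2] }'
}

# Made-up sample line, for illustration only.
SAMPLE='192.168.1.10 - - [12/Mar/2010:10:15:32 -0500] "GET http://example.com/index.html" - - "-" "Mozilla/5.0"'
```

Running `printf '%s\n' "$SAMPLE" | extract_url` prints http://example.com/index.html.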

To install packages I used DD-WRT’s firmware modification kit which allowed me to simply add the scripts and packages I wanted without having to recompile everything.

Next I needed to transform the captured URL into a URL-encoded string so I could send it to my monitoring service via a simple wget request. Initially I tried several user-contributed Python and PHP packages, but both took up far more space than I could afford, so I searched instead for a pure command-line solution.

After some more digging I found a handy sed substitution script that worked like a charm. It comes in two parts. The first is the substitution script (/usr/bin/urlencode.sed):

# Percent-encode spaces and tabs (the second pattern contains a literal tab).
s/ /%20/g
s/	/%09/g

and the command line to use it:

sed -f urlencode.sed

To tie it all together, we can pass captured URLs to it via pipes from the command-line with:

urlsnarf | sed -f urlencode.sed
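
Encoding only whitespace leaves characters such as & and % free to be misread as query-string syntax once the line is appended to a request URL. A slightly fuller sketch, written inline with -e expressions instead of a script file, might look like the following. The character list is an assumption rather than the contents of the original urlencode.sed, and urlencode_line is a hypothetical name. The % rule has to run first so that the percent signs added by later rules are not themselves re-encoded.

```shell
#!/bin/sh
# Hypothetical helper: percent-encode the characters most likely to break a
# query string. The character list is an illustrative assumption.
TAB=$(printf '\t')

urlencode_line() {
    # '%' must be handled first; every later rule inserts new '%' characters.
    sed -e 's/%/%25/g' \
        -e 's/ /%20/g' \
        -e "s/${TAB}/%09/g" \
        -e 's/&/%26/g' \
        -e 's/=/%3D/g' \
        -e 's/?/%3F/g' \
        -e 's/#/%23/g' \
        -e 's/+/%2B/g' \
        -e 's/"/%22/g'
}
```

For example, `printf '%s\n' 'a b&c=100%' | urlencode_line` prints a%20b%26c%3D100%25.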

At this point, the only missing link in the capture chain is a script that continually reads from standard input and sends the URL-encoded capture data to our storage application (described in the next part). For this task I used the following script (/usr/bin/

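# Read one URL-encoded log line at a time from standard input.
while read -r url; do
    # Timestamp the capture in seconds since the epoch.
    DATE=$(date +%s)
    wget -q -O- "$url&h=$HOSTNAME&t=$DATE"
done

exit 0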
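
To make the shape of each request explicit, here is a minimal sketch of the same loop with the URL assembly pulled into its own function. LOG_URL and its /log path are placeholders I am assuming for illustration, and build_request and send_lines are hypothetical names. The l parameter matches what the servlet reads, while h and t carry the router hostname and timestamp.

```shell
#!/bin/sh
# LOG_URL is a placeholder endpoint (assumption); point it at your own servlet.
LOG_URL="${LOG_URL:-https://myapp.appspot.com/log}"

# Build the GET request for one URL-encoded log line.
# $1 = encoded line, $2 = router hostname, $3 = Unix timestamp
build_request() {
    printf '%s?l=%s&h=%s&t=%s\n' "$LOG_URL" "$1" "$2" "$3"
}

# Read encoded lines from standard input and report each one.
# The response body is discarded; only the request matters here.
send_lines() {
    while read -r line; do
        wget -q -O /dev/null "$(build_request "$line" "$(hostname)" "$(date +%s)")"
    done
}
```

Keeping the URL assembly separate makes it easy to check the request format on a desktop machine before flashing anything to the router.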

Finally, we need to have the router start listening for URLs as soon as it is booted. In a Linux environment this is generally done by init scripts. Since our router has limited capabilities, we don’t need to write a full init script. Here is the slimmed down init script I used (/etc/init.d/S50urlmon):


/usr/sbin/urlsnarf -v "/(|https\://myapp\.appspot\.com)/" | sed -f /usr/bin/urlencode.sed | /usr/bin/
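
The -v expression is there so the router does not log its own reporting requests, which would otherwise feed straight back into the pipeline. The same idea can be sketched with grep as a stand-in. This is an assumption for illustration only, since URLSnarf applies its pattern internally, and drop_self_requests is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical stand-in for urlsnarf's -v filter: drop any log line that
# mentions the monitoring app itself, so reports are not re-reported.
drop_self_requests() {
    grep -v -E 'https?://myapp\.appspot\.com'
}
```

Fed one line for http://example.com/ and one for http://myapp.appspot.com/, it passes through only the first.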

The Where: Google App Engine

I’ve been itching to try out Google’s App Engine for a while now, and this project seemed to be a great fit: I didn’t know how much data to expect, and I needed my receiving/processing/display application to be highly available and scalable, especially if this works well enough that others might want to use it.

My initial phase is merely to capture the URLs requested from devices behind the router, and the capture process should be as efficient and lean as possible (I don’t want the router to spend long logging a URL when its primary job is to retrieve that URL for the original requester). I therefore decided to write a simple Java servlet that takes the URL-encoded log line generated by URLSnarf and stores it.

Google App Engine uses Java Data Objects enhanced by DataNucleus to store data in Google’s massive cluster. Here is the annotated JDO ( I used to store the captured URL:

import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.IdentityType;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class LogLine {
	// Datastore-generated primary key.
	@PrimaryKey
	@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
	private Long id;

	// Hostname of the reporting router.
	@Persistent
	private String host;

	// Capture time as a Unix timestamp (seconds).
	@Persistent
	private Long time;

	// Raw log line captured by URLSnarf.
	@Persistent
	private String line;

	public void setId(Long id) {
		this.id = id;
	}

	public Long getId() {
		return id;
	}

	public void setLine(String line) {
		this.line = line;
	}

	public String getLine() {
		return line;
	}

	public String getHost() {
		return host;
	}

	public void setHost(String host) {
		this.host = host;
	}

	public Long getTime() {
		return time;
	}

	public void setTime(Long time) {
		this.time = time;
	}
}

And here is the servlet that processes the GET request (containing the captured URL in Apache Common Log Format):


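import java.io.IOException;

import javax.jdo.PersistenceManager;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class Log extends HttpServlet {
	private static final long serialVersionUID = 3;

	@Override
	public void doGet(HttpServletRequest req, HttpServletResponse resp)
			throws IOException {
		PersistenceManager pm = PMF.get().getPersistenceManager();
		try {
			// getParameter() already URL-decodes the query-string values,
			// so no second decoding pass is needed.
			LogLine logline = new LogLine();
			logline.setLine(req.getParameter("l"));
			logline.setHost(req.getParameter("h"));
			logline.setTime(Long.valueOf(req.getParameter("t")));
			pm.makePersistent(logline);
		} catch (Exception e) {
			// Record the failure instead of swallowing it silently.
			log("Could not store log line", e);
		} finally {
			// Always release the persistence manager.
			pm.close();
		}
	}
}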

The future

This project is still in its early stages. There is no real way to view the captured data just yet, though I plan on incorporating Polliwog, and the router software hasn’t been tested as much as I would like. I’m also leery of any security holes I may have introduced.

So if you have any suggestions or would like to know more, feel free to leave a comment below!

  1. Most actually ask about “controlling what their kids see online,” but I generally argue for an observe-only approach, as it helps open lines of communication with your child. Silently blocking “bad” sites only starts a silent war, one that will frustrate you once they find a suitable workaround, such as a proxy.
  2. I might have had better luck had I seen this helpful guide. Oh well, this gives me a future project in figuring out how to de-brick my WHR-HP-54G.
