
Stages of creative activity

Dorothy Sayers, in her excellent book, The Mind of the Maker, divides creative activity into three stages: the idea, the implementation, and the interaction. A book, then, or a computer, or a program comes into existence first as an ideal construct, built outside time and space, but complete in the mind of the author. It is realized in time and space, by pen, ink, and paper, or by wire, silicon, and ferrite. The creation is complete when someone reads the book, uses the computer, or runs the program, thereby interacting with the mind of the maker.

This description, which Miss Sayers uses to illuminate not only human creative activity but also the Christian doctrine of the Trinity, will help us in our present task. For the human makers of things, the incompletenesses and inconsistencies of our ideas become clear only during implementation. Thus it is that writing, experimentation, “working out” are essential disciplines for the theoretician.

In many creative activities the medium of execution is intractable. Lumber splits; paints smear; electrical circuits ring. These physical limitations of the medium constrain the ideas that may be expressed, and they also create unexpected difficulties in the implementation.

Implementation, then, takes time and sweat both because of the physical media and because of the inadequacies of the underlying ideas. We tend to blame the physical media for most of our implementation difficulties; for the media are not “ours” in the way the ideas are, and our pride colors our judgment.

Computer programming, however, creates with an exceedingly tractable medium. The programmer builds from pure thought-stuff: concepts and very flexible representations thereof. Because the medium is tractable, we expect few difficulties in implementation; hence our pervasive optimism. Because our ideas are faulty, we have bugs; hence our optimism is unjustified.

Fred Brooks, The Mythical Man-Month, pg. 15


Why programming is fun

Fred Brooks, in his excellent work, The Mythical Man-Month, has this to say about why we enjoy programming.

The Joys of the Craft
Why is programming fun? What delights may its practitioner expect as his reward?

First is the sheer joy of making things. As the child delights in his mud pie, so the adult enjoys building things, especially things of his own design. I think this delight must be an image of God’s delight in making things, a delight shown in the distinctness and newness of each leaf and each snowflake.

Second is the pleasure of making things that are useful to other people. Deep within, we want others to use our work and to find it helpful. In this respect the programming system is not essentially different from the child’s first clay pencil holder “for Daddy’s office.”

Third is the fascination of fashioning complex puzzle-like objects of interlocking moving parts and watching them work in subtle cycles, playing out the consequences of principles built in from the beginning. The programmed computer has all the fascination of the pinball machine or the jukebox mechanism, carried to the ultimate.

Fourth is the joy of always learning, which springs from the nonrepeating nature of the task. In one way or another the problem is ever new, and its solver learns something: sometimes practical, sometimes theoretical, and sometimes both.

Finally, there is the delight of working in such a tractable medium. The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures. (As we shall see later, this very tractability has its own problems.)

Yet the program construct, unlike the poet’s words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself. It prints results, draws pictures, produces sounds, moves arms. The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.

Programming then is fun because it gratifies creative longings built deep within us and delights sensibilities we have in common with all men.

-Fred Brooks, The Mythical Man-Month, pg. 8-9


Doug McIlroy on the design of HTML

The original HTML documents recommended “be generous in what you accept”, and it has bedeviled us ever since because each browser accepts a different superset of the specifications. It is the specifications that should be generous, not their interpretation. -Doug McIlroy, quoted from The Art of UNIX Programming, pg. 21

This is the reason web development is so difficult. Every now and then I run across someone who considers web development to not be “real programming”. I suppose that is because web developers must not only be generous in what they accept by way of input but also account for a wide variety of environments (browsers), each of which requires slightly different output in order to achieve the same effect.


HipHop for PHP

Earlier this year Facebook developers caused quite a stir in the PHP community by releasing HipHop, the central piece in their high-performance arsenal.

Here is the video of the initial announcement along with some juicy technical details:

If you are interested in seeing how your PHP project fares when run through HipHop, check out the official HipHop project page.

With tools like HipHop, scripting languages like PHP are no longer subject to the charges of gross computational inefficiency. In short, it’s a great day to be a web developer.


A few helpful bash command-line one-liners

[HT Peter, commandlinefu]

Query SVN log history and filter by username

svn log | sed -n '/username/,/-----$/ p'
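The one-liner relies on sed's range addressing: with -n nothing is printed by default, and the p command prints every line from one matching the first pattern through the next line matching the second. Here is a minimal sketch of that behavior; the sample log and the names sample_log and filter_by_user are made up for illustration and only mimic the shape of svn log output.

```shell
# Hypothetical sample in the shape of `svn log` output (two commits).
sample_log() {
cat <<'EOF'
------------------------------------------------------------------------
r2 | alice | 2010-01-02 10:00:00 +0000 (Sat, 02 Jan 2010) | 1 line

Fix typo
------------------------------------------------------------------------
r1 | bob | 2010-01-01 09:00:00 +0000 (Fri, 01 Jan 2010) | 1 line

Initial import
------------------------------------------------------------------------
EOF
}

# Print each block that starts at a line mentioning the user and ends at
# the next dashed separator line.
filter_by_user() {
    sed -n "/$1/,/-----\$/ p"
}

# Prints alice's commit header, the blank line, her message, and the
# closing separator; bob's commit is skipped.
sample_log | filter_by_user alice
```

Note that the start pattern is matched against whole lines, so a commit message that happens to mention the username would also start a range.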

Run the last command as root

sudo !!

Save a file you edited in vim without the needed permissions

:w !sudo tee %

Why is this command so awesome? Peter described it quite well:

This happens to me way too often. I open a system config file in vim and edit it just to find out that I don’t have permissions to save it. This one-liner saves the day. Instead of writing the file to a temporary file :w /tmp/foobar and then moving the temporary file to the right destination mv /tmp/foobar /etc/service.conf, you now just type the one-liner above in vim and it will save the file.

Change to the previous working directory

cd -

Run the previous shell command but replace string “foo” with “bar”

^foo^bar^

Find the last command that begins with “whatever,” but avoid running it

!whatever:p

Copy your public-key to remote-machine for public-key authentication

ssh-copy-id remote-machine

Capture video of a linux desktop

ffmpeg -f x11grab -s wxga -r 25 -i :0.0 -sameq /tmp/out.mpg


What’s the best way to make sure my data is safe?

Friends and family often ask me what the best storage solution is for ensuring that their critical data is not lost or corrupted.

Whatever storage solution you decide to use, it needs to be unobtrusive and largely automated. If it isn’t, you’ll find out at the worst possible time (usually in a crisis) that recovering your data is nearly impossible and often incomplete.

The most unobtrusive solution I’ve found so far is Network Attached Storage (NAS). The one I use and highly recommend is the D-Link DNS-321, which accepts standard SATA drives (which means they are fast and reliable) in a RAID-1 configuration. RAID-1 means the drives are mirrored: the data is automatically duplicated to two internal drives. Just about any NAS system will work, but make sure it includes RAID (most don’t) and isn’t simply a fancier external hard drive.

Using a network-attached device also gives you the benefit of not having to rely on too many additional moving parts. For a long time I used spare computer systems as storage units, but I quickly found out that their individual parts were multiple unnecessary points of failure. Motherboards, RAM, even graphics cards can cause significant headaches when all you care about is the hard drives and the data they contain.

In fact, since Google’s high-powered cloud computing infrastructure runs on common hardware like the kind you and I use, it is significant to note the hardware failure rate they discovered from constantly pushing common hardware to its limits over long periods of time. This simply means that when you are planning a computational strategy (in this case, storage of sensitive data) you need to plan for failure instead of hoping for the best.

In contrast, a system that consists only of a minimal operating system and two drives should give you enough time to replace a failed drive while the other one keeps your data available, and the NAS unit itself is cheap enough that you could easily keep a spare mothballed for the rainy day when you’ll need it.

It’s also a good idea to keep a copy of your data in an offsite location. The principle is that if one place storing your data were flattened, you should be able to recover from the offsite location. The best way to achieve this is through a continuous online storage solution. I personally don’t use an online storage solution, but some things to look for in one would be the backing company’s reliability, whether they back your data up to a cloud or a single server, and how well put together their interface software is. Try the free services first; chances are that if they are really as good as they claim to be (and they all claim to be good) you’ll quickly find out during the trial period (which often is a certain amount of allowed data storage). Here are a few free ones; I have used box.net before (for random file storage, not for regular automated backups) and can say it is pretty good.

I’ve also adopted the strategy of using as many online solutions as possible (such as Gmail for email). These let me leverage reliable 3rd-party clouds, which provide inherent protection from data loss and the added benefit of letting me access my data from a wide variety of computers without having to sync data between every system I want to use.

Finally, focus on backing up only the files you know you will need. There is no reason to back up the entire computer in terms of applications, operating system, etc. Backing up unnecessary data will only serve to max out your storage capacity and quickly overtax your backup solution. Instead, plan on replacing your whole PC (and the operating system it uses, but keep a copy of the applications you use) in the event of catastrophic data loss. If you stick with reasonably reliable hardware, it should last much longer than Google’s (3-4 years). Average costs of new and decent systems are low enough now that treating a computer as a disposable device (like a cell phone) isn’t all that uncommon or that bad of an idea.
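To make the "only the files you need, and automated" advice concrete, here is a minimal sketch of a backup function that archives a chosen list of directories into a dated tarball. The name backup_dirs and every path used with it are hypothetical; this is one possible approach under those assumptions, not a description of how any particular NAS software works.

```shell
# backup_dirs DEST DIR... : archive only the named directories into DEST.
# DEST would typically be a mounted NAS share (an assumption, not a given).
backup_dirs() {
    dest=$1
    shift
    # One archive per day; re-running on the same day overwrites it.
    stamp=$(date +%Y-%m-%d)
    mkdir -p "$dest" &&
    tar -czf "$dest/backup-$stamp.tar.gz" "$@"
}
```

Called from a nightly cron job with something like backup_dirs /mnt/nas/backups "$HOME/Documents" "$HOME/Pictures" (both paths are placeholders), it keeps the backup limited to the data you actually care about.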


Governments calling citizens to ditch Internet Explorer

Google was recently hit by an exploit McAfee has named “Aurora”. This exploit affects all versions of Internet Explorer (though version 6 is getting most of the attention), which has prompted the governments of France and Germany to warn their citizens not to use Internet Explorer at all.

Microsoft initially tried to claim that this exploit was trivial but has since issued an out-of-cycle emergency patch for all versions of Internet Explorer.

Looks like now is the perfect time to switch to a superior browser like Chrome or Firefox.

Here’s a video detailing how this hack works in action in case you are like me and interested in the juicy technical details:


Passwords revisited

An analysis of 32 million leaked passwords provided some interesting insights into the password selection practices of users. Among the key findings are:

  • The shortness and simplicity of passwords means many users select credentials that will make them susceptible to basic forms of cyber attacks known as “brute force attacks.”
  • Nearly 50% of users used names, slang words, dictionary words or trivial passwords (consecutive digits, adjacent keyboard keys, and so on). The most common password is “123456”.
  • Recommendations for users and administrators for choosing strong passwords.

Also, here are the top 10 most commonly used passwords they found:

1. 123456
2. 12345
3. 123456789
4. Password
5. iloveyou
6. princess
7. rockyou
8. 1234567
9. 12345678
10. abc123

I’ve said it before: the first step in computer security is having a strong password policy.
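As a rough illustration of what the first rule of such a policy might look like, here is a sketch that rejects anything too short or anything on the top-10 list above. The function name is_weak_password and the 8-character minimum are assumptions chosen for the example, not recommendations from the analysis.

```shell
# The ten passwords from the list above, lower-cased.
COMMON_PASSWORDS="123456 12345 123456789 password iloveyou princess rockyou 1234567 12345678 abc123"

# is_weak_password PW : succeeds (exit 0) if PW is too short or on the list.
is_weak_password() {
    pw=$1
    # Hypothetical minimum length; use whatever your policy requires.
    [ "${#pw}" -lt 8 ] && return 0
    # Compare case-insensitively so "Password" is caught too.
    lower=$(printf '%s' "$pw" | tr 'A-Z' 'a-z')
    for common in $COMMON_PASSWORDS; do
        [ "$lower" = "$common" ] && return 0
    done
    return 1
}
```

A real policy would check against a far larger dictionary, but even this much would have turned away the most common choices in the leaked data.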


Taming the blogosphere with Google Reader

What are blogs?

Many of you are wondering what the big deal is with blogs. Well here is a short video on blogs and why they are important/useful:

What’s so great about blogs?

Aside from being able to access specialized information put out on a regular basis, there is one other reason I enjoy reading blogs and consider them to be an essential element in our modern forms of communication.

Blogs help you connect with people.

You learn a lot about someone’s character, thoughts, and passions if you follow what they say on their blog. The trouble is that since blogs are generally authored by one person on individual websites, it can become time consuming and cumbersome to visit each blog you’re interested in to check for and read any new posts.

How can I keep up with blogs?

The easiest way I’ve found to bring a variety of different blogs together into one place is to use the RSS feed offered by most blogs.

Google Reader is a web-based RSS reader which requires a Google account and a little bit of setup, but once you get it going it’s pretty much automated and will allow you to check a number of blogs without having to spend time visiting each and every website to get updates.

Here is a short video to help you get started with Google Reader:


Hacking your router for effective internet monitoring

The Why: Preamble

Working in the information technology sector, one of the most common questions I get asked by parents is about monitoring internet access of their children.[1]

Most parents want to know what their children are doing online but also recognize that most off-the-shelf products are just as easy to disable or circumvent (or are far more restrictive/bloated than they want) as they are to install or operate. And sadly, enterprise solutions that capture and control network traffic at the most basic level (making circumvention next to impossible) are still very expensive and therefore out of reach for the average family.

What I needed was a cheap and hackable router that I could modify to send captured URLs to a central source for storage and processing.

The What: WRT54G

After studying my options I remembered reading a lot about the Linksys WRT54G-series routers: how their firmware was originally based on a heavily modified version of Linux, and how Linksys made headlines in a legal dispute over the GPLed code used in that firmware.

So I did a little digging.

What I found was a whole router-hacking subculture built around the WRT54G. While it seems that much of the initial fervor has subsided (many of the packages show a last update time of 2007 or so), the documentation is still valid for the most part. The most popular projects providing custom firmware are OpenWrt and DD-WRT. While OpenWrt is the original, I found DD-WRT to be a lot more polished and (as we’ll see later) configurable without much headache.

It’s important to note here that the WRT54G has many variants, and it’s easy to fall into the trap of thinking that any old WRT54G will do, but a little diligence and study of the differences between the hardware revisions will certainly save you time and money.

After buying a few different routers, bricking one (a Buffalo AirStation WHR-HP-54G)[2], and making a false start with a newer WRT54G v7 (anyone need a highly configurable, albeit not-very-hackable router?), I discovered that the best router for hacking is the WRT54GL (which was designed by Linksys to allow for user modifications).

The How: URLSnarf and custom shell scripts

Space on a router is very limited. On the WRT54GL model I eventually ended up using, I had 4 MB of space to work with.

The first order of business was to find a package that could monitor all of the network connections (wired and wireless) on the router and capture requested URLs. For this task I discovered that URLSnarf, part of the dsniff OpenWrt package, worked quite well.

To install packages I used DD-WRT’s firmware modification kit which allowed me to simply add the scripts and packages I wanted without having to recompile everything.

Next I needed to transform the captured URL into a URLencoded string in order to send it to my monitoring service via a simple wget request. Initially I tried using several variations of user-generated Python and PHP packages but they both took up far more space than I could afford so, instead, I searched for a pure command-line based solution.

After some more digging I found a handy sed substitution script that worked like a charm. The script worked in two parts, the first one being the substitution script (/usr/bin/urlencode.sed):

s/%/%25/g
s/ /%20/g
s/!/%21/g
s/"/%22/g
s/#/%23/g
s/\$/%24/g
s/\&/%26/g
s/'/%27/g
s/(/%28/g
s/)/%29/g
s/\*/%2a/g
s/+/%2b/g
s/,/%2c/g
s/-/%2d/g
s/\./%2e/g
s/\//%2f/g
s/:/%3a/g
s/;/%3b/g
s/</%3c/g
s/=/%3d/g
s/>/%3e/g
s/?/%3f/g
s/@/%40/g
s/\[/%5b/g
s/\\/%5c/g
s/\]/%5d/g
s/\^/%5e/g
s/_/%5f/g
s/`/%60/g
s/{/%7b/g
s/|/%7c/g
s/}/%7d/g
s/~/%7e/g
s/	/%09/g

and the command line to use it:

sed -f urlencode.sed

To tie it all together, we can pass captured URLs to it via pipes from the command-line with:

urlsnarf | sed -f urlencode.sed
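To see what the substitution script does to a single line, here is a cut-down sketch with only five of the rules, applied to a made-up input line. The function name urlencode_min and the sample input are illustrative assumptions; the point it demonstrates is that the % rule has to come first, otherwise the percent signs produced by the later rules would themselves be rewritten to %25.

```shell
# urlencode_min: a reduced stand-in for urlencode.sed with five rules.
# Order matters: encode % before any rule that introduces new % signs.
urlencode_min() {
    sed -e 's/%/%25/g' \
        -e 's/ /%20/g' \
        -e 's/&/%26/g' \
        -e 's/\//%2f/g' \
        -e 's/:/%3a/g'
}

# Prints: GET%20http%3a%2f%2fexample.com%2fa%20b?x=1%26y=50%25
echo 'GET http://example.com/a b?x=1&y=50%' | urlencode_min
```

With the full script, characters such as ?, = and . are encoded as well; they survive here only because the reduced rule set leaves them out.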

At this point, the only missing link of the capture chain is a script to continually read from the command line and send the urlencoded capture data to our storage application (described in the next part). For this task I used the following script (/usr/bin/urlmon.sh):

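#!/bin/sh
# Read URL-encoded log lines from stdin and send each one to the logging app.
HOSTNAME=`hostname`
# -r keeps backslashes in the captured line intact
while read -r url; do
    # Seconds since the epoch, used as the log timestamp
    DATE=`date +%s`
    # Echo the server's reply (OK or FAIL) for each line sent
    echo $(wget -q -O- "http://myapp.appspot.com/log?l=$url&h=$HOSTNAME&t=$DATE")
done

exit 0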
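Before pointing the loop at a live service, it can help to check the requests it would generate. The following dry-run sketch mirrors the shape of urlmon.sh but prints each request URL instead of fetching it; the function names build_log_url and dry_run are hypothetical, and the host and query parameters are simply the ones used in the script above.

```shell
# build_log_url LINE HOST TIME : print the request that would be sent.
build_log_url() {
    printf 'http://myapp.appspot.com/log?l=%s&h=%s&t=%s\n' "$1" "$2" "$3"
}

# dry_run HOST : same loop shape as urlmon.sh, but nothing leaves the box.
dry_run() {
    host=$1
    while read -r url; do
        build_log_url "$url" "$host" "$(date +%s)"
    done
}
```

For example, feeding it one already-encoded line with printf 'a%%20b\n' | dry_run router prints a single request URL carrying l=a%20b and h=router.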

Finally, we need to have the router start listening for URLs as soon as it is booted. In a Linux environment this is generally done by init scripts. Since our router has limited capabilities, we don’t need to write a full init script. Here is the slimmed down init script I used (/etc/init.d/S50urlmon):

#!/bin/sh

/usr/sbin/urlsnarf -v "/(192.168.1.1|https\://myapp\.appspot\.com)/" | sed -f /usr/bin/urlencode.sed | /usr/bin/urlmon.sh
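The -v option inverts urlsnarf's pattern match, so the pattern lists traffic to leave out: requests to the router itself and to the logging service. Without that exclusion, every wget call made by the logger would be captured and logged in turn. The sketch below illustrates the idea with grep on a plain list of URLs; it is a stand-in under that assumption, not a reproduction of urlsnarf's own matching, and the function name exclude_own_traffic is made up.

```shell
# exclude_own_traffic: drop URLs aimed at the router's admin address or at
# the logging service, so the logger never ends up logging itself.
# Reads one URL per line on stdin.
exclude_own_traffic() {
    # Anchor on the host part so 192.168.1.10 is not mistaken for 192.168.1.1
    grep -v -E '^https?://(192\.168\.1\.1|myapp\.appspot\.com)(/|:|$)'
}
```

Given the four URLs http://example.com/, http://192.168.1.1/admin, http://myapp.appspot.com/log?l=x and http://192.168.1.10/, only the first and the last pass through.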

The Where: Google App Engine

I’ve been itching to try out Google’s App Engine for a while now, and this project seemed to be a great fit since I didn’t know how much data to expect and I needed my receiving/processing/display application to be highly available and scalable, especially if this works well enough that others might want to use it.

Since my initial phase is to merely capture the URLs requested from devices behind the router, and since the capture process should be as efficient and lean as possible (I don’t want the router to take very long logging a URL when its primary job is to retrieve that URL for the initial requester), I decided to make a simple Java servlet which takes the URL-encoded log line generated by URLSnarf and stores it.

Google App Engine uses Java Data Objects enhanced by DataNucleus to store data in Google’s massive cluster. Here is the annotated JDO (LogLine.java) I used to store the captured URL:

package com.werxltd.webmon.data;

import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.IdentityType;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class LogLine {
	@PrimaryKey
	@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
	private Long id;

	@Persistent
	private String host;

	@Persistent
	private Long time;

	@Persistent
	private String line;

	public void setId(Long id) {
		this.id = id;
	}

	public Long getId() {
		return id;
	}

	public void setLine(String line) {
		this.line = line;
	}

	public String getLine() {
		return line;
	}

	public String getHost() {
		return host;
	}

	public void setHost(String host) {
		this.host = host;
	}

	public Long getTime() {
		return time;
	}

	public void setTime(Long time) {
		this.time = time;
	}
}

And here is the servlet that processes the GET request (containing the captured URL in Apache Common Log format):

import java.io.IOException;
import java.net.URLDecoder;

import javax.jdo.PersistenceManager;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.werxltd.webmon.data.LogLine;

public class Log extends HttpServlet {
    private static final long serialVersionUID = 3L;

    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("text/plain");

        PersistenceManager pm = null;
        try {
            // l = captured log line, h = router hostname, t = Unix timestamp
            LogLine logline = new LogLine();
            // The one-argument decode() is deprecated; name the charset explicitly
            String logStr = URLDecoder.decode(req.getParameter("l"), "UTF-8");
            logline.setLine(logStr);
            logline.setHost(req.getParameter("h"));
            logline.setTime(Long.parseLong(req.getParameter("t")));

            // PMF is the application's PersistenceManagerFactory helper (not shown here)
            pm = PMF.get().getPersistenceManager();
            pm.makePersistent(logline);

            resp.getWriter().println("OK");
        } catch (Exception e) {
            e.printStackTrace();
            resp.getWriter().println("FAIL");
        } finally {
            // Always release the persistence manager, even after a failure
            if (pm != null) {
                pm.close();
            }
        }
    }
}
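Because each stored line is in Common Log Format, pulling out individual fields later is mostly a matter of splitting on whitespace. Here is a small sketch that extracts the client address and the requested URL; the function name parse_clf and the sample line used with it are made up for illustration, and the field positions assume the usual layout in which the request appears as "METHOD URL PROTOCOL".

```shell
# parse_clf: print "client<TAB>url" for each Common Log Format line on stdin.
# Splitting on whitespace, field 1 is the client address and field 7 is the
# URL (fields 4-5 hold the bracketed timestamp, field 6 the quoted method).
parse_clf() {
    awk '{ print $1 "\t" $7 }'
}
```

Fed a line such as 192.168.1.100 - - [27/Jan/2010:10:15:32 -0500] "GET http://example.com/index.html HTTP/1.1" - - "-" "Mozilla/5.0", it prints the client address and the URL separated by a tab.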

The future

This project is still in its early stages. There is no real way to view the captured data just yet, though I plan on incorporating Polliwog, and the router software hasn’t been tested as much as I would like. I’m also leery of any security holes I may have introduced.

So if you have any suggestions or would like to know more, feel free to leave a comment below!

  1. Most actually ask about “controlling what their kids see online” but I generally argue for an observe-only approach, as it helps open lines of communication with your child, whereas silently blocking “bad” sites will only start a silent war which will only frustrate you once they do find a suitable workaround, such as a proxy.
  2. I might have had better luck had I seen this helpful guide. Oh well, this gives me a future project in figuring out how to de-brick my WHR-HP-54G.
