Archive for category python

Discover your IP from the command line

Many times I find myself needing to keep track of a host on a DHCP’d network where its IP address is subject to change. Here is a collection of command-line methods for discovering your IP using both curl/HTTP and DNS lookups.

HTTP based lookups

curl -s '' | sed 's/.*Current IP Address: \([0-9\.]*\).*/\1/g'

This one is pretty slow, but it sometimes works
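
The service URLs were lost in this archive, so here is a hedged sketch. The sed pipeline above expects a checkip-style body of the form “Current IP Address: x.x.x.x”, while plain-text services return just the address. The service names below are common examples, not necessarily the ones the original post used:

```shell
# Plain-text responders need no parsing (example services, substitute your own):
#   curl -s icanhazip.com
#   curl -s ifconfig.me
# Services that wrap the address in prose can be parsed like the example above:
echo "Current IP Address: 203.0.113.7" \
    | sed 's/.*Current IP Address: \([0-9\.]*\).*/\1/g'
```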


DNS based lookups

These are the best options since they’re not likely to be blocked by firewalls and, being UDP, have a low overhead.

dig +short
dig TXT +short
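
The arguments to the dig commands above were also lost. The commonly cited variants are shown below as comments (the resolver endpoints are assumptions — verify them before relying on them), along with an offline sanity check for the dotted-quad output you should get back:

```shell
# A-record lookup via OpenDNS (assumed endpoint):
#   dig +short myip.opendns.com @resolver1.opendns.com
# TXT-record lookup via Google (assumed endpoint):
#   dig TXT +short o-o.myaddr.l.google.com @ns1.google.com
# Offline check that a response looks like an IPv4 address:
answer="203.0.113.7"
echo "$answer" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' && echo "looks like an IPv4 address"
```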

As a bonus, here are two services for decorating an IP address or domain with additional information such as geolocation:

curl -s
curl -s

There are severe limitations to these services so take that into account when deciding what to include in your app.
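
The decorator service URLs above were stripped from this archive as well. One widely used service of this kind is ipinfo.io (an assumption, not necessarily one the post referred to); its responses are JSON, and field extraction can be sketched offline:

```shell
# A widely used geolocation decorator (assumed, not necessarily the original):
#   curl -s ipinfo.io/8.8.8.8
# Responses are JSON; individual fields can be pulled out with standard tools:
response='{"ip": "8.8.8.8", "city": "Mountain View", "country": "US"}'
echo "$response" | sed 's/.*"city": "\([^"]*\)".*/\1/'
```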

The commands above were discovered here and here.


Streaming filenames from an overpopulated directory

I have a project which requires that I process files from a directory containing hundreds of thousands, even millions, of files. Enough that performing an ls in that directory is painfully slow, so I’ve learned to only perform specific file lookups. Until now I had just put up with the slow file listing, but the other day I finally had enough and decided to look into why ls and other file-listing utilities like Python’s os.listdir are equally slow. Shouldn’t it be possible to stream filenames out of a directory as you read the filesystem’s index of files in that directory, rather than waiting until all of the filenames have been scanned and put into an array first?

It turns out that you can list a million files in a directory, but not with ls. The key is the getdents system call, which exists on both Linux and FreeBSD. While a separate command-line utility based on the C code in the first or second links will work, what I really wanted to do is stream the files in Python. Python has the ability, thanks to ctypes, to interact with system libraries directly. So with a little more digging I was able to find a simple Python module that uses ctypes to wrap the C library’s readdir (which sits on top of getdents) and stream out the files from my directory o’ many files. Since my final module isn’t exactly the same as the one I found, I’ll post it below:

from ctypes import CDLL, c_char_p, c_int, c_long, c_ushort, c_char, Structure, POINTER
from ctypes.util import find_library

class c_dir(Structure):
    """Opaque type for directory entries, corresponds to struct DIR"""
c_dir_p = POINTER(c_dir)

class c_dirent(Structure):
    """Directory entry, corresponds to struct dirent"""
    _fields_ = [
        ('d_ino', c_long),          # inode number
        ('d_off', c_long),          # offset to the next dirent
        ('d_reclen', c_ushort),     # length of this record
        ('d_type', c_char),         # type of file
        ('d_name', c_char * 4096),  # filename
    ]
c_dirent_p = POINTER(c_dirent)

c_lib = CDLL(find_library("c"))
opendir = c_lib.opendir
opendir.argtypes = [c_char_p]
opendir.restype = c_dir_p

readdir = c_lib.readdir
readdir.argtypes = [c_dir_p]
readdir.restype = c_dirent_p

closedir = c_lib.closedir
closedir.argtypes = [c_dir_p]
closedir.restype = c_int

def listdir(path):
    """A generator to return the names of files in the directory passed in"""
    dir_p = opendir(path)
    try:
        while True:
            p = readdir(dir_p)
            if not p:
                break
            name = p.contents.d_name
            if name not in (".", ".."):
                yield name
    finally:
        closedir(dir_p)
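
For comparison, modern Python can stream directory entries from the standard library without ctypes. A minimal sketch of the same generator pattern using os.scandir (Python 3.6+):

```python
import os

def stream_names(path="."):
    """Lazily yield entry names from a directory.

    os.scandir wraps the same underlying readdir machinery, so names
    stream out without building the whole list first (and it already
    skips "." and "..").
    """
    with os.scandir(path) as it:
        for entry in it:
            yield entry.name
```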

Finding yesterday’s beginning and ending unix timestamp

When writing reports I’ve often come across the need to find the unix timestamps for the beginning and end of a day. Here is a Python snippet that does just that.

import datetime, time

yesterday = datetime.datetime.now() - datetime.timedelta(days=1)
yesterday_beginning = datetime.datetime(yesterday.year, yesterday.month, yesterday.day, 0, 0, 0, 0)
yesterday_beginning_time = int(time.mktime(yesterday_beginning.timetuple()))
yesterday_end = datetime.datetime(yesterday.year, yesterday.month, yesterday.day, 23, 59, 59, 999)
yesterday_end_time = int(time.mktime(yesterday_end.timetuple()))

print yesterday_beginning_time
print yesterday_end_time
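
On Python 3 the same computation can be packaged as a small function (a sketch, keeping the snippet’s local-time semantics via time.mktime):

```python
import datetime
import time

def day_bounds(dt):
    """Return (start, end) unix timestamps for the local day containing dt."""
    start = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    end = dt.replace(hour=23, minute=59, second=59, microsecond=999999)
    return (int(time.mktime(start.timetuple())),
            int(time.mktime(end.timetuple())))

yesterday = datetime.datetime.now() - datetime.timedelta(days=1)
begin_ts, end_ts = day_bounds(yesterday)
```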


Fun with heatmaps

Recently I’ve been playing with a few new technologies. Some are new to me while most are simply new. My base project to use these technologies is a heatmap visualization of churches in Georgia. While heatmaps in themselves aren’t exactly exciting, having the ability to map more than 10,000 data points at a time in real-time is.

So here are the various technologies I used in the creation of my app.

YQL


To get the data for this project I used a simple Python script and YQL to get a list of locations in each Zip code in Georgia with the term “Church” in them. This yielded approximately 10,000 results, including a few from South Carolina.

MongoDB


I stored the data I got from YQL in MongoDB. Specifically, I used MongoLab’s hosted service because they have a generous free storage limit (256 MB) and can be accessed from a Rackspace server without incurring an additional bandwidth charge.

MongoDB is a NoSQL database with a JavaScript-based query shell, used by sites like Foursquare. It has built-in support for geo queries and is built to scale.

Node.js


For the application layer I decided to try out node.js. Node is also JavaScript-based, built on V8, the JavaScript engine behind Google’s Chrome. Node is event-based, which means it has the potential to be lightning fast.

Canvas


The biggest factor in how well a heatmap solution performs is the graphics package that’s used. After searching around I found a pretty decent PHP heatmap solution, but it used GD and was very slow. I also found a JavaScript solution that used the new HTML5 Canvas element, but it choked when given a significant number of data points to render all at once. So I decided to refactor some of the utility functions from the PHP solution and combine them with the Canvas renderer.

The great thing about the resulting solution is that it has the ability to run on either the client-side or the server-side. And in the end the heatmap application I built uses both. If the number of data points in a tile is less than a preset threshold defined per browser[1], the raw data is sent back to the browser, which renders the tiles client-side rather than consuming precious server resources.
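
The split described above, rendering server-side or shipping raw points to the client, works because the accumulation step itself is simple. A toy Python sketch of per-cell intensity accumulation with a linear falloff kernel (the real app uses PHP/GD-derived utilities and HTML5 Canvas, so this is illustrative only):

```python
import math

def heat_grid(points, width, height, radius=3):
    """Toy heatmap accumulation: each point adds intensity to nearby
    cells, falling off linearly with distance out to `radius`."""
    grid = [[0.0] * width for _ in range(height)]
    for (px, py) in points:
        for y in range(max(0, py - radius), min(height, py + radius + 1)):
            for x in range(max(0, px - radius), min(width, px + radius + 1)):
                d = math.hypot(x - px, y - py)
                if d <= radius:
                    grid[y][x] += 1.0 - d / radius
    return grid
```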

Web Sockets

The usual way to serve tiles is by serving images to overlay the map. And while the heatmap solution I developed does serve static PNG files from cache, I decided to use the new HTML5 Web Sockets to make things a bit more interesting. What is great about web sockets is that they let me pass events between the server and client very easily; it made it easy to forget where the server ended and where the client began.

ZeroMQ


As applications scale to multiple threads, processes, and eventually across servers and networks, they need a way for each component to communicate with the others efficiently. So to add a bit of future-proofing to my solution, I decided to use ZeroMQ to pass messages between the front-end web server component and the back-end tile generator component. This allows me to tune the performance of the application in both directions, up or down[2].

Raphaël


To add some extra pizzazz to the app I decided to add in the ability to display each individual data point along with some additional detailed information. I found that Google’s native Marker system was a bit slow when it came to displaying over 2,000 markers at a time so I decided to give the Raphaël graphics library a try. The results were impressive. Raphael was not only able to draw thousands of data points on the map seamlessly, but was able to do it with smooth animations. Look for gRaphaël to be employed in future renditions of this heatmap solution.


Every now and then I run across a programming challenge that reminds me why I love doing what I do. These technologies and this project have done that for me. Being able to throw together a large, complex project like this in a relatively quick manner reminded me of Fred Brooks’s comment on why programming is fun.

  1. 1000 for WebKit-based browsers, 250 for Mozilla-based browsers, and 0 for IE because IE still sucks.
  2. Tuning performance down comes in handy when you are on a shared server with limited resources.


Fred Brooks on the promise of object oriented programming

One view of object-oriented programming is that it is a discipline that enforces modularity and clean interfaces. A second view emphasizes encapsulation, the fact that one cannot see, much less design, the inner structure of the pieces. Another view emphasizes inheritance, with its concomitant hierarchical structure of classes, with virtual functions. Yet another view emphasizes strong abstract data-typing, with its assurance that a particular data-type will be manipulated only by operations proper to it.

Now any of these disciplines can be had without taking the whole Smalltalk or C++ package—many of them predated object-oriented technology. The attractiveness of object-oriented approach is that of a multivitamin pill: in one fell swoop (that is, programmer retraining), one gets them all. It is a very promising concept.

Why has object-oriented technique grown slowly? In the nine years since “NSB,” the expectancy has steadily grown. Why has growth been slow? Theories abound. James Coggins, author for four years of the column, “The Best of comp.lang.c++ ” in The C++ Report, offers this explanation:

The problem is that programmers in O-O have been experimenting in incestuous applications and aiming low in abstraction, instead of high. For example, they have been building classes such as linked-list or set instead of classes such as user-interface or radiation beam or finite-element model. Unfortunately the self-same strong type checking in C++ that helps programmers to avoid errors also makes it hard to build big things out of little ones.

He goes back to the basic software problem, and argues that one way to address unmet software needs is to increase the size of the intelligent workforce by enabling and coopting our clients. This argues for top-down design:

we design large-grained classes that address concepts our clients are already working with, they can understand and question the design as it grows, and they can cooperate in the design of test cases. My ophthalmology collaborators don’t care about stacks; they do care about Legendre polynomial shape descriptions of corneas. Small encapsulations yield small benefits.

David Parnas, whose paper was one of the origins of object-oriented concepts, sees the matter differently. He writes me:

The answer is simple. It is because [O-O] has been tied to a variety of complex languages. Instead of teaching people that O-O is a type of design, and giving them design principles, people have taught that O-O is the use of a particular tool. We can write good or bad programs with any tool. Unless we teach people how to design, the languages matter very little. The result is that people do bad designs with these languages and get very little value from them. If the value is small, it won’t catch on.

-Fred Brooks, The Mythical Man-Month, pg. 220


McAfee Secure URL Shortener Firefox Add-on

Following the release of our extension allowing Chrome users to quickly shorten URLs using the cloud-backed security of the service, we are pleased to announce the release of an add-on which allows Firefox users to quickly and securely shorten URLs to share with others.

To download and install this extension, head on over to Firefox’s add-on site.

I also want to give a special thank you to the Mozilla JetPack project for making the development of this extension not only less painful than it otherwise would have been, but actually fun. Thanks guys!


Learning Languages: Python

Here are some helpful resources if you are looking to learn Python.

Google I/O 2008 – Painless Python Part 1 of 2

Google I/O 2008 – Painless Python Part 2 of 2

Dive Into Python (excellent reference)


Quick and dirty image sorting script

Recently my wife and I decided to try to wrangle our images into some sort of logical order for easy accessibility. After some thought we decided on a simple system of image-directory/year/month for our images, and since our old images were spread out across several folders in no particular order, I decided to write a script to copy everything into the right folders.

Here is the Python script I wrote to sort all of our images by modification date into properly ordered folders[1].


./ /images/source /images/dest

import sys, shutil, os, time, tempfile
from os.path import join, getsize
from stat import *

if len(sys.argv) != 3:
	print "Usage: "+sys.argv[0]+" [source] [target]"
	sys.exit(1)

if not os.path.isdir(sys.argv[1]):
	print "'"+sys.argv[1]+"' is not a valid source directory"
	sys.exit(1)

if not os.path.isdir(sys.argv[2]):
	print "'"+sys.argv[2]+"' is not a valid destination directory"
	sys.exit(1)

#Define your system's copy command here
copyCmd = "cp -f"

def walk( root, recurse=0, pattern='*', return_folders=0 ):
	import fnmatch, os, string
	result = []

	try:
		names = os.listdir(root)
	except os.error:
		return result

	pattern = pattern or '*'
	pat_list = string.splitfields( pattern , ';' )
	for name in names:
		fullname = os.path.normpath(os.path.join(root, name))

		for pat in pat_list:
			if fnmatch.fnmatch(name, pat.upper()) or fnmatch.fnmatch(name, pat.lower()):
				if os.path.isfile(fullname) or (return_folders and os.path.isdir(fullname)):
					result.append(fullname)
		if recurse:
			if os.path.isdir(fullname) and not os.path.islink(fullname):
				result = result + walk( fullname, recurse, pattern, return_folders )
	return result

def getTime(file):
	result = []
	try:
		st = os.stat(file)
	except OSError:
		print "failed to get information about", file
	else:
		result = time.localtime(st[ST_MTIME])
	return result

if __name__ == '__main__':
	log_fd, logfilename = tempfile.mkstemp(".log", "psort_")
	logfile = os.fdopen(log_fd, 'w+')
	print "Scanning '%s' for images..." % sys.argv[1]
	files = walk(sys.argv[1], 1, '*.jpg;*.gif;*.png;*.psd;*.tif', 0)
	logfile.write("Found %d images in '%s'...\n" % (len(files), sys.argv[1]))
	print "Copying %d images to '%s'" % (len(files), sys.argv[2])
	for file in files:
		fileTime = getTime(file)
		if not fileTime:
			continue
		destination = os.path.join(sys.argv[2], "%s" % fileTime.tm_year)
		if not os.path.isdir(destination):
			os.mkdir(destination)
			logfile.write("Created directory '%s'\n" % destination)
		destination = os.path.join(destination, time.strftime("%m", fileTime))
		if not os.path.isdir(destination):
			os.mkdir(destination)
			logfile.write("Created directory '%s'\n" % destination)
		os.system("%s \"%s\" \"%s\"" % (copyCmd, file, destination))
		if os.path.isfile(os.path.join(destination, os.path.basename(file))):
			print ".",
			logfile.write("'%s' => '%s'\n" % (file, destination))
		else:
			logfile.write("[FAIL] '%s' => '%s'\n" % (file, destination))
	logfile.close()
	print "Finished copying files, log file available at %s" % logfilename
  1. Directory walking code taken from here, with slight modification to make patterns truly case insensitive.
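
The core of the sorting logic, turning a file’s mtime into a year/month destination, can be isolated into a small helper (a sketch; the function name is mine, not from the original script):

```python
import os
import time

def dest_for(path, root):
    """Build '<root>/<YYYY>/<MM>' from a file's modification time."""
    t = time.localtime(os.stat(path).st_mtime)
    return os.path.join(root, "%04d" % t.tm_year, time.strftime("%m", t))
```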
