Archive for category software development

Discover your IP from the command line

Many times I find myself needing to keep track of a host on a DHCP’d network where its IP address is subject to change. Here is a collection of command line methods for discovering your IP using both curl/HTTP and DNS lookups.

HTTP based lookups

curl -s '' | sed 's/.*Current IP Address: \([0-9\.]*\).*/\1/g'
curl -s
curl -s
curl -s curl
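When the lookup is part of a larger script, the sed expression in the first command can be written in Python instead. This is a minimal sketch, assuming the service answers with a body containing the text "Current IP Address: " followed by a dotted IPv4 address, which is the same assumption the sed pattern makes. The function name extract_ip and the sample body are made up for illustration.

```python
import re

# Mirrors the sed pattern: capture the run of digits and dots that
# follows the literal text "Current IP Address: ".
_IP_PATTERN = re.compile(r"Current IP Address: ([0-9.]+)")


def extract_ip(body):
    """Return the address found in an HTTP response body, or None."""
    match = _IP_PATTERN.search(body)
    return match.group(1) if match else None


# Hypothetical response body, using a documentation-only address.
sample = "<html><body>Current IP Address: 203.0.113.7</body></html>"
print(extract_ip(sample))
```

Unlike the sed version, which prints the whole input unchanged when nothing matches, this returns None so the caller can tell a failed lookup apart from a real address.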

This one is pretty slow, but it sometimes works


DNS based lookups

These are the best options since they’re not likely to be blocked by firewalls and, being UDP, have a low overhead.

dig +short
dig TXT +short

As a bonus, here are two services for decorating an IP address or domain with additional information such as geolocation:

curl -s
curl -s

There are severe limitations to these services so take that into account when deciding what to include in your app.

The commands above were discovered here and here.


Streaming filenames from an overpopulated directory

I have a project that requires me to process files from a directory containing hundreds of thousands, even millions, of files. That is enough that running ls in that directory is painfully slow, so I’ve learned to perform only specific file lookups. Until now I had just put up with the slow file listing, but the other day I finally had enough and decided to look into why ls and other file listing utilities like Python’s os.listdir are equally slow. Shouldn’t it be possible to just stream filenames out of a directory as you read the filesystem’s index of files in that directory, rather than waiting until all of the filenames have been scanned and put into an array first?

It turns out that you can list a million files in a directory, but not with ls. The key is to use the getdents system call, which exists on both Linux and FreeBSD. While a separate command line utility based on the C code in the first or second links will work, what I really wanted to do was stream the files in Python. Python has the ability, thanks to ctypes, to interact with system libraries directly. So with a little more digging I was able to find a simple Python module that uses ctypes to wrap libc’s readdir, which on Linux reads directory entries through getdents, and stream out the files from my directory ‘o many files. Since my final module isn’t exactly the same as the one I found I’ll post it below:

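import os
from ctypes import (CDLL, POINTER, Structure, c_char, c_char_p, c_int,
                    c_long, c_ubyte, c_ushort, get_errno)
from ctypes.util import find_library

class c_dir(Structure):
    """Opaque type for directory entries, corresponds to struct DIR"""
c_dir_p = POINTER(c_dir)

class c_dirent(Structure):
    """Directory entry"""
    # Assumption: this matches struct dirent on 64-bit Linux with glibc.
    # d_ino, d_reclen and d_type are filled in from that layout; other
    # platforms lay the struct out differently.
    _fields_ = [
        ('d_ino', c_long),          # inode number
        ('d_off', c_long),          # offset to the next dirent
        ('d_reclen', c_ushort),     # length of this record
        ('d_type', c_ubyte),        # type of file
        ('d_name', c_char * 4096),  # filename
    ]
c_dirent_p = POINTER(c_dirent)

c_lib = CDLL(find_library("c"), use_errno=True)
opendir = c_lib.opendir
opendir.argtypes = [c_char_p]
opendir.restype = c_dir_p

readdir = c_lib.readdir
readdir.argtypes = [c_dir_p]
readdir.restype = c_dirent_p

closedir = c_lib.closedir
closedir.argtypes = [c_dir_p]
closedir.restype = c_int

def listdir(path):
    """A generator to return the names of files in the directory passed in"""
    dir_p = opendir(os.fsencode(path))
    if not dir_p:
        raise OSError(get_errno(), os.strerror(get_errno()), path)
    try:
        while True:
            p = readdir(dir_p)
            if not p:
                break  # NULL pointer: no more entries
            name = p.contents.d_name
            if name not in (b".", b".."):
                yield os.fsdecode(name)
    finally:
        closedir(dir_p)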
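On newer Python versions the standard library offers a similar streaming interface: os.scandir returns an iterator rather than a list, so names can be consumed as they are read. The following is a minimal sketch of that alternative, not the module above. It assumes Python 3.6 or later (for the context-manager form), the helper name stream_names and the throwaway directory are made up for illustration, and no performance comparison with the ctypes version is implied.

```python
import os
import tempfile
from itertools import islice


def stream_names(path):
    """Yield entry names one at a time instead of building a full list."""
    # os.scandir returns a lazy iterator; "." and ".." are never included.
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry.name


# Demo on a throwaway directory holding a handful of empty files.
with tempfile.TemporaryDirectory() as demo_dir:
    for i in range(5):
        open(os.path.join(demo_dir, "file%d.txt" % i), "w").close()
    # islice stops after three names without reading the rest.
    first_three = list(islice(stream_names(demo_dir), 3))
    print(len(first_three))
```

Because the function is a generator, a caller that only needs the first few names never pays for the rest of the directory.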

Of Mikes and Davids

Mike is a professor at a reputable university. He teaches advanced machine learning and robotics, he’s finishing up his PhD in computer science, and he always has a new gadget he’s playing with.

David is a software entrepreneur. He has sold a software company or two for modest profit and he always has a business angle he’s working.

I had talked shop with Mike and David separately over the span of several months but one day the three of us managed to get together. Since Mike and David are both in the general information technology field I naturally assumed that we could find a common ground on that topic. I couldn’t have been more wrong. Mike and David clashed on almost every level and in the aftermath I learned a very important lesson about two distinctly different groups of technologists that I now classify as Mikes and Davids.

Mike is meticulous in his work. He needs to understand everything about the problem he is attempting to solve and all of the intricate mathematics behind any possible solution. As a result, it takes Mike a long time to ship a high quality product.

David, on the other hand, is focused on delivering something of value as quickly as possible. As a result, David can quickly churn out a working product which will likely need several iterations in order to work out all of the bugs.

These are fundamentally different approaches, which I believe serve as a Rosetta Stone of sorts to help us decode the motivations and likely future actions of these two different schools of thought.

Let’s say, for example, you need X. David will bang out a version of X for you after pulling a week of all-nighters wherein his kids briefly forget they had a father. What you get will work according to your specifications. But don’t expect it to be pretty. But you’ll put it into production anyway. Because why not? Several months down the road you’ll wonder why your app is so sloooow and you’ll have to go back to David to have him fix a growing list of bugs. That’s not a knock on David. That’s just the nature of his work. It’s fast and it’s to specifications.

By contrast, Mike will take a very long time to complete a task. But when he does, you will have a rock-solid solution that has been thoroughly vetted. You will also have pages of proofs and data to go along with that solution.

In short, call David when you have an idea you want to have built quickly. And after David builds it, call Mike to build out the next version that will scale.

Both approaches have their place.

Having worked with startups for a long time, I can appreciate each approach in its own way. I am a Mike or a David depending on who I’m working with. I take great pains to build up my Mike and David skillsets equally. I can use Yeoman to quickly generate a skeleton of an application. And I can use scikit-learn to discover patterns in data to make my processes more efficient.

I’ve decided the best engineers are honest with themselves on whether they are naturally more of a Mike or a David and are actively working to move towards the other end of the spectrum.

A script to turn a YouTube playlist into mp3s

There are a lot of interesting lectures on YouTube. Recently I’ve taken to adding these lectures to a playlist and processing them on my Mac using the following script:

youtube-dl -o '%(stitle)s.%(ext)s' "$YOUR_YOUTUBE_PLAYLIST"

for file in ./*.mp4; do
	echo "processing $file"
	# "$file" is quoted below, so no extra escaping step is needed
	ffmpeg -i "$file" -filter:a "atempo=2.0, pan=stereo|c1<c0+c1" -c:a libmp3lame -q:a 4 "$file.mp3"
done

rm -f ./*.mp4

You’ll need youtube-dl and ffmpeg installed for this script to work. Both of these are available from brew.

This script also encodes the audio at 2x speed and mixes the left and right channels into the right channel. Remove the -filter:a option and its argument if you don’t want your files processed this way.
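If the conversion step ever moves out of the shell, the same ffmpeg invocation can be built as an argument list, which sidesteps quoting problems with spaces in file names. This is a minimal sketch that only constructs the arguments and does not run ffmpeg. The function name ffmpeg_args and the speed_up switch are made up for illustration; the options themselves are copied from the script above.

```python
from pathlib import Path


def ffmpeg_args(source, speed_up=True):
    """Build the ffmpeg argument list for one downloaded video."""
    args = ["ffmpeg", "-i", str(source)]
    if speed_up:
        # Same filter chain as the shell script: double the tempo and
        # mix both input channels into the right output channel.
        args += ["-filter:a", "atempo=2.0, pan=stereo|c1<c0+c1"]
    # Same encoder settings and output naming ("name.mp4.mp3").
    args += ["-c:a", "libmp3lame", "-q:a", "4", str(source) + ".mp3"]
    return args


print(ffmpeg_args(Path("my lecture.mp4")))
print(ffmpeg_args(Path("my lecture.mp4"), speed_up=False))
```

A list like this can be handed to subprocess.run as is, so each file name reaches ffmpeg as a single argument regardless of the characters it contains.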

LiveStream chat-only mode

Recently I’ve taken to participating in a LiveStream of a Sunday School I physically attend. Because I’m there physically I don’t need the video, so here’s what I use to remove the video and expand the chat pane to fill up the resulting free space.


An idiomatic way of converting an Option[String] into an Option[Int] in Scala

This always returns an Option[Int]

import scala.util.control.Exception.catching

Option("bad") filter { _ != None } map { catching(classOf[NumberFormatException]) opt _.toInt } getOrElse None

Javascript replace on a capture group

I ran into a problem recently where I needed to perform a regex replace on a string and also manipulate the string captured in a capture group at the same time. What I discovered is that it’s valid to pass a function as the second argument to the replace function. That function receives the full match as its first argument (index 0), followed by the capture groups as arguments 1 and up.

So here’s my code to capture a person’s name and escape it.

story = story.replace(/person\.go\?ID=\d+"\s*[^>]*>([^(<)]+)<\/a>/g,function() {
     // arguments[0] is the whole match; arguments[1] is the captured name
     return 'person.go?ID='+escape(arguments[1].replace(/\./g,''))+'">'+arguments[1]+'</a>';
});
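The same callback style exists outside JavaScript. As a point of comparison, here is a minimal sketch in Python, where re.sub accepts a function that receives the match object. The link shape and the sample string are assumptions modelled on the regex above, and urllib.parse.quote stands in for JavaScript's escape.

```python
import re
from urllib.parse import quote

# Assumed link shape, modelled on the regex above: an href ending in
# person.go?ID=<digits>" followed by the person's name as the link text.
LINK = re.compile(r'person\.go\?ID=\d+"\s*[^>]*>([^(<)]+)</a>')


def relink(match):
    """Rebuild the link using the escaped name instead of the numeric ID."""
    name = match.group(1)
    return 'person.go?ID=' + quote(name.replace(".", "")) + '">' + name + "</a>"


# Hypothetical input string.
story = '<a href="person.go?ID=42">John Q. Public</a>'
result = LINK.sub(relink, story)
print(result)
```

Here match.group(1) plays the role of arguments[1]: the callback sees the captured name and returns the full replacement text for each match.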

iOS 7 form input patch

iOS 7 appears to have broken input fields for a number of web applications. Input fields now take two taps to allow the user to input data even though the keyboard is brought up after only one tap. Here’s a hack to fix the input fields for any of your webapps iOS 7 broke.

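if(window.navigator.standalone) {
    var arr = document.getElementsByTagName("input");
    var len = arr.length;
    for(;len--;) {
        // Assumption: the click wiring and focus() call are reconstructed.
        arr[len].addEventListener("click", function() {
            var tel = this;
            setTimeout(function() {
                tel.focus();
            }, 150);
        });
    }
}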

Getting useful index information from MongoDB

Here is a MongoDB script for presenting index information in a more concise way than getIndexes() provides. This script also presents an index’s total size along with a breakdown of its size on all of the shards.

//mongo --eval="var collection='file';"

var ret = db[collection].getIndexes().map(function(i){
    return {"key":i.key, "name":i.name};
});

// index name -> key document
var o = {};
for(var r in ret) {
    o[ret[r].name] = ret[r].key;
}

// add each index's total size
var cstats = db[collection].stats();
for(var k in cstats.indexSizes) {
    o[k].totalsize = cstats.indexSizes[k];
}

// add the per-shard breakdown
var shardinfo = cstats.shards;
for(var s in shardinfo) {
    for(var k in shardinfo[s].indexSizes) {
        if(!o[k].shards) o[k].shards = {};
        o[k].shards[s] = shardinfo[s].indexSizes[k];
    }
}

printjson(o);

Produces the following output:

{
    "_id_" : {
        "_id" : 1,
        "totalsize" : 50501459568,
        "shards" : {
            "shard0000" : 18620766416,
            "shard0001" : 18117909712,
            "shard0002" : 13762783440
        }
    }
}

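The merge the script performs can be tried offline with plain dictionaries. This is a minimal sketch under stated assumptions: the inputs are hand-written stand-ins shaped like the getIndexes() and stats() results used above, the sizes are made up, and summarize_indexes is an illustrative name rather than part of any MongoDB driver.

```python
def summarize_indexes(indexes, stats):
    """Combine index keys, total sizes, and per-shard sizes into one dict."""
    # Start with index name -> copy of its key document.
    summary = {ix["name"]: dict(ix["key"]) for ix in indexes}
    # Attach the collection-wide size of each index.
    for name, size in stats.get("indexSizes", {}).items():
        summary.setdefault(name, {})["totalsize"] = size
    # Attach the per-shard breakdown.
    for shard, shard_stats in stats.get("shards", {}).items():
        for name, size in shard_stats.get("indexSizes", {}).items():
            summary.setdefault(name, {}).setdefault("shards", {})[shard] = size
    return summary


# Hypothetical inputs shaped like getIndexes() and stats() results.
indexes = [{"name": "_id_", "key": {"_id": 1}}]
stats = {
    "indexSizes": {"_id_": 300},
    "shards": {
        "shard0000": {"indexSizes": {"_id_": 100}},
        "shard0001": {"indexSizes": {"_id_": 200}},
    },
}
print(summarize_indexes(indexes, stats))
```

Using setdefault means an index that shows up only in the stats output still gets an entry instead of raising an error, which is one place this sketch differs from the shell script.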

Simple Scala Map/Reduce Job

I was recently tasked with writing a Hadoop map/reduce job. This job had the requirement of taking a list of regular expressions and scouring hundreds of gigabytes of log files for matches. Since I’ve been leaning more and more towards Scala I wanted to use it for my job, but I also wanted to use Maven for my job’s package management to make the job easy to set up and extend. And finally, I wanted to have unit tests for my mapper and reducer and an overall job unit test. The result is this project I posted to GitHub as a template for future projects. I hope it proves as helpful for others as I’m sure it’ll be for me.
