Archive for category software development

Of Mikes and Davids

Mike is a professor at a reputable university. He teaches advanced machine learning and robotics, he’s finishing up his PhD in computer science, and he always has a new gadget he’s playing with.

David is a software entrepreneur. He has sold a software company or two for a modest profit and he always has a business angle he’s working.

I had talked shop with Mike and David separately over the span of several months, but one day the three of us managed to get together. Since Mike and David both work in information technology, I naturally assumed we could find common ground on that topic. I couldn’t have been more wrong. Mike and David clashed on almost every level, and in the aftermath I learned an important lesson about two distinctly different groups of technologists that I now classify as Mikes and Davids.

Mike is meticulous in his work. He needs to understand everything about the problem he is attempting to solve and all of the intricate mathematics behind any possible solution. As a result, it takes Mike a long time to ship a high-quality product.

David, on the other hand, is focused on delivering something of value as quickly as possible. As a result, David can quickly churn out a working product which will likely need several iterations in order to work out all of the bugs.

These are fundamentally different approaches, and I believe they serve as a Rosetta Stone of sorts for decoding the motivations and likely future actions of these two schools of thought.

Let’s say, for example, you need X. David will bang out a version of X for you after pulling a week of all-nighters wherein his kids briefly forget they had a father. What you get will work according to your specifications, but don’t expect it to be pretty. You’ll put it into production anyway. Because why not? Several months down the road you’ll wonder why your app is so sloooow and you’ll have to go back to David to have him fix a growing list of bugs. That’s not a knock on David. That’s just the nature of his work. It’s fast and it’s to specifications.

By contrast, Mike will take a very long time to complete a task. But when he does, you will have a rock-solid solution that has been thoroughly vetted. You will also have pages of proofs and data to go along with that solution.

In short, call David when you have an idea you want to have built quickly. And after David builds it, call Mike to build out the next version that will scale.

Both approaches have their place.

Having worked with startups for a long time, I can appreciate each approach in its own way. I am a Mike or a David depending on who I’m working with. I take great pains to build up my Mike and David skillsets equally. I can use Yeoman to quickly generate a skeleton of an application. And I can use scikit-learn to discover patterns in data to make my processes more efficient.

I’ve decided the best engineers are honest with themselves about whether they are naturally more of a Mike or a David and are actively working to move towards the other end of the spectrum.


A script to turn a YouTube playlist into mp3s

There are a lot of interesting lectures on YouTube. Recently I’ve taken to adding these lectures to a playlist and processing them on my Mac using the following script:

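youtube-dl -o '%(stitle)s.%(ext)s' "$YOUR_YOUTUBE_PLAYLIST"

for file in ./*.mp4; do
	echo "processing $file"
	# atempo=2.0 doubles the playback speed; pan mixes both channels into the right one.
	# Quoting "$file" is enough to handle spaces, so no extra escaping is needed.
	ffmpeg -i "$file" -filter:a "atempo=2.0, pan=stereo|c1<c0+c1" -c:a libmp3lame -q:a 4 "${file%.mp4}.mp3"
done

# Remove the downloaded videos once the mp3s have been written.
rm -f ./*.mp4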

You’ll need youtube-dl and ffmpeg installed for this script to work. Both of these are available through Homebrew (brew install youtube-dl ffmpeg).

This script also encodes the audio at 2x speed and combines the left and right channels into the right channel. Remove the -filter:a option and its argument if you don’t want your files processed this way.

LiveStream chat-only mode

Recently I’ve taken to participating in a LiveStream of a Sunday School I physically attend. Because I’m there in person I don’t need the video, so here’s what I run in the browser console to remove the video and expand the chat pane to fill the resulting free space.

$('.player-wrapper').remove();
$('.chat_wrapper').css('width','100%');

An idiomatic way of converting an Option[String] into an Option[Int] in Scala

This always returns an Option[Int]: Some(n) when the string parses as an integer and None otherwise.

import scala.util.control.Exception.catching

Option("bad") flatMap { s => catching(classOf[NumberFormatException]) opt s.toInt }

Javascript replace on a capture group

I ran into a problem recently where I needed to perform a regex replace on a string and also manipulate the string captured in a capture group at the same time. What I discovered is that it’s valid to pass a function as the second argument to the replace function. That function receives the full match as its first argument (arguments[0]) and the capture groups as the arguments that follow (arguments[1] onward).

So here’s my code to capture a person’s name and escape it.

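story = story.replace(/person\.go\?ID=\d+"\s*[^>]*>([^(<)]+)<\/a>/g,function() {
     // arguments[1] is the captured name: strip the dots, escape it for the URL, keep it as the link text
     return 'person.go?ID='+escape(arguments[1].replace(/\./g,''))+'">'+arguments[1]+'</a>';
});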
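
As a minimal, self-contained sketch of the same technique: the [[Name]] marker syntax, the linkNames helper, and the person.go?name= URL are all made up for illustration, and encodeURIComponent stands in for escape. It is not the code from my project, just the callback pattern in isolation.

```javascript
// Hypothetical helper: turns [[Name]] markers into links, escaping the name
// for the URL while leaving the visible text untouched.
function linkNames(text) {
  // replace() calls the function once per match; the first argument is the
  // whole match and each capture group follows in order.
  return text.replace(/\[\[([^\]]+)\]\]/g, function (match, name) {
    var slug = encodeURIComponent(name.replace(/\./g, ''));
    return '<a href="person.go?name=' + slug + '">' + name + '</a>';
  });
}

console.log(linkNames('Written by [[J. R. Smith]]'));
// prints: Written by <a href="person.go?name=J%20R%20Smith">J. R. Smith</a>
```

Naming the callback parameters (match, name) instead of reading from arguments tends to make the intent easier to follow.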

iOS 7 form input patch

iOS 7 appears to have broken input fields for a number of web applications. Input fields now take two taps before the user can enter data, even though the keyboard is brought up after only one tap. Here’s a hack to fix the input fields for any of your web apps that iOS 7 broke. It only applies when the app is running in standalone (home screen) mode.

if(window.navigator.standalone) {
    var arr = document.getElementsByTagName('input');
    var len = arr.length;
    for(;len--;) {
        arr[len].addEventListener('touchstart',function(ev){
            var tel = ev.target;
            setTimeout(function() {
                tel.focus();
            }, 150);
        });
    }
}

Getting useful index information from MongoDB

Here is a MongoDB script for presenting index information in a more concise way than getIndexes() provides. This script also presents an index’s total size along with a breakdown of its size on all of the shards.

//mongo --eval="var collection='file';"

var ret = db[collection].getIndexes().map(function(i){
    return {"key":i.key, "name":i.name};
});

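// Map each index name to its key pattern.
var o = {};
ret.forEach(function(idx) {
    o[idx.name] = idx.key;
});

// Attach the total size of each index.
var cstats = db[collection].stats();
for(var k in cstats.indexSizes) {
    o[k].totalsize = cstats.indexSizes[k];
}

// Attach the per-shard size of each index (only present on sharded collections).
var shardinfo = cstats.shards;
for(var s in shardinfo) {
    for(var sk in shardinfo[s].indexSizes) {
        if(!o[sk].shards) o[sk].shards = {};
        o[sk].shards[s] = shardinfo[s].indexSizes[sk];
    }
}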

printjson(o);

Produces the following output:

{
    "_id_" : {
        "_id" : 1,
        "totalsize" : 50501459568,
        "shards" : {
            "shard0000" : 18620766416,
            "shard0001" : 18117909712,
            "shard0002" : 13762783440
        }
    }
}
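
One quick sanity check on output like this is that the per-shard sizes add up to the reported total. Here is a small sketch in plain JavaScript, using the numbers above; the sumShardSizes helper is a name I made up for illustration and is not part of the script.

```javascript
// Sample entry copied from the output above.
var entry = {
  totalsize: 50501459568,
  shards: {
    shard0000: 18620766416,
    shard0001: 18117909712,
    shard0002: 13762783440
  }
};

// Hypothetical helper: adds up the per-shard sizes of one index entry.
function sumShardSizes(indexEntry) {
  return Object.keys(indexEntry.shards).reduce(function (sum, shard) {
    return sum + indexEntry.shards[shard];
  }, 0);
}

console.log(sumShardSizes(entry) === entry.totalsize); // prints: true
```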


Simple Scala Map/Reduce Job

I was recently tasked with writing a Hadoop map/reduce job. The job had to take a list of regular expressions and scour hundreds of gigabytes of log files for matches. Since I’ve been leaning more and more towards Scala, I wanted to use it for the job, but I also wanted to use Maven for package management so the job would be easy to set up and extend. Finally, I wanted unit tests for my mapper and reducer as well as an overall job test. The result is this project I posted to GitHub as a template for future projects. I hope it proves as helpful for others as I’m sure it’ll be for me.


Select distinct for MongoDB

Here is a handy script I’ve been using for MongoDB to retrieve a list of all the top-level fields used in a collection. It runs a map/reduce routine that has to comb over every document in the collection, so exercise caution when running it against a large or busy collection.

// usage:
// mongo localhost/foo --quiet --eval="var collection='bar';" getcollectionkeys.js
var mr = db.runCommand({
  "mapreduce":collection,
  "map":function() {
    for (var key in this) { emit(key, null); }
  },
  "reduce":function(key, stuff) { return null; }, 
  "out":collection + "_keys"
})

print(db[mr.result].distinct("_id"))

db[collection+"_keys"].drop()
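
To make the logic concrete, here is a rough plain-JavaScript sketch of what the map and reduce steps compute, run against an in-memory array instead of a collection. The distinctKeys helper and the sample documents are made up for illustration.

```javascript
// Hypothetical stand-in for the map/reduce job: docs is a plain array of
// documents rather than a MongoDB collection.
function distinctKeys(docs) {
  var seen = {};
  docs.forEach(function (doc) {
    // "map" step: visit every top-level key of the document.
    Object.keys(doc).forEach(function (key) {
      // "reduce" step: duplicates collapse because each key is stored once.
      seen[key] = true;
    });
  });
  return Object.keys(seen).sort();
}

console.log(distinctKeys([
  { _id: 1, name: 'a' },
  { _id: 2, name: 'b', tags: ['x'] }
]));
// prints: [ '_id', 'name', 'tags' ]
```

Like the script, this only looks at top-level keys; fields nested inside subdocuments are not listed.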


Simple PhantomJS web scraping script

Here is a simple web scraping script I wrote for PhantomJS, the immensely useful headless browser, to load a page, inject jQuery into it, and then scrape the page using a user-supplied jQuery selector.

page = require('webpage').create()
system = require 'system'

phantom.injectJs "static/js/underscore-min.js"

page.onConsoleMessage = (msg) ->
    if not msg.match /^Unsafe/
        console.log msg

scrapeEl = (elselector) ->
    rows = $ elselector
    for el in rows
        if el.innerHTML
            str = el.innerHTML.trim()
            if str.length > 0
                console.log str

page.open system.args[1], (status) ->
    if status isnt 'success'
        phantom.exit 1
    else
        page.injectJs "static/js/underscore-min.js"
        page.injectJs "static/js/utils.js"
        page.injectJs "static/js/jquery-1.8.2.min.js"
        page.evaluate scrapeEl, system.args[2]
        phantom.exit()

Run it with:

phantomjs scrape_element.coffee "http://www.moviefone.com/coming-soon" ".movieTitle span"
