Archive for category software development

An idiomatic way of converting an Option[String] into an Option[Int] in Scala

This always returns an Option[Int]

Option("bad") filter { _ != None } map { catching(classOf[NumberFormatException]) opt _.toInt } getOrElse None

Javascript replace on a capture group

I ran into a problem recently where I needed to perform a regex replace on a string and also manipulate the string captured in a capture group at the same time. What I discovered is that its valid to pass a function as the second argument to the replace function which gets passed the capture groups as arguments 1+

So here’s my code to capture a person’s name and escape it.

story = story.replace(person\.go\?ID=\d+"\s*[^>]*>([^(<)]+)<\/a>/g,function() {
     return 'person.go?ID='+escape(arguments[1].replace(/\./g,''))+'">'+arguments[1]+'';

ios7 form input patch

ios7 appears to have broken input fields for a number of web applications. Input fields now take two taps to allow the user to input data even though the keyboard is brought up after only one click. Here’s a hack to fix the input fields for any of your webapps ios7 broke.

if(window.navigator.standalone) {
    var arr = document.all.tags("input");
    var len = arr.length;
    for(;len--;) {
            var tel =;
            setTimeout(function() {
            }, 150);

Getting useful index information from MongoDB

Here is a MongoDB script for presenting index information in a more concise way than getIndexes() provides. This script also presents an index’s total size along with a breakdown of its size on all of the shards.

//mongo --eval="var collection='file';"

var ret = db[collection].getIndexes().map(function(i){
    return {"key":i.key, "name"};

var o = {};
for(r in ret) {
    o[ret[r].name] = ret[r].key;

var cstats = db[collection].stats();
for(k in cstats.indexSizes) {
    o[k].totalsize = cstats.indexSizes[k];

var shardinfo = cstats.shards;
for(s in shardinfo) {
    for(k in shardinfo[s].indexSizes) {
        if(!o[k].shards) o[k].shards = {};
        o[k].shards[s] = shardinfo[s].indexSizes[k];


Produces the following output:

    "_id_" : {
        "_id" : 1,
        "totalsize" : 50501459568,
        "shards" : {
            "shard0000" : 18620766416,
            "shard0001" : 18117909712,
            "shard0002" : 13762783440


Simple Scala Map/Reduce Job

I was recently tasked with writing a Hadoop map/reduce job. This job had the requirement of taking a list of regular expressions and scouring hundreds of gigs worth of log files for matches. Since I’ve been leaning more and more towards Scala I wanted to use it for my job but I also wanted to use Maven for my job’s package management to make the job easy to setup and extend. And finally, I wanted to have unit tests for my mapper and reducer and an overall job unit test. The result is this project I posted to GitHub as a template for future projects. I hope it proves as helpful for others as I’m sure it’ll be for me.

Tags: , , , ,

Select distinct for MongoDB

Here is a handy script I’ve been using for MongoDB to retrieve a list of all the fields used in a collection. This uses a map/reduce routine and has to comb over all the documents in a collection so you may want to exercise caution when using this script.

// usage:
// mongo localhost/foo --quiet --eval="var collection='bar';" getcollectionkeys.js
var mr = db.runCommand({
  "map":function() {
    for (var key in this) { emit(key, null); }
  "reduce":function(key, stuff) { return null; }, 
  "out":collection + "_keys"



Tags: , , ,

Simple PhantomJS web scraping script

Here is a simple web scraping script I wrote for PhantomJS, the immensely useful headless browser, to load a page, inject jQuery into it, and then scrape the page using a user-supplied jQuery selector.

page = require('webpage').create()
system = require 'system'

phantom.injectJs "static/js/underscore-min.js"

page.onConsoleMessage = (msg) ->
    if not msg.match /^Unsafe/
        console.log msg

scrapeEl = (elselector) ->
    rows = $ elselector
    for el in rows
        if el.innerHTML
            str = el.innerHTML.trim()
            if str.length > 0
                console.log str system.args[1], (status) ->
    if status isnt 'success'
        phantom.exit 1
        page.injectJs "static/js/underscore-min.js"
        page.injectJs "static/js/utils.js"
        page.injectJs "static/js/jquery-1.8.2.min.js"
        page.evaluate scrapeEl, system.args[2]

Run it with:

phantomjs "" ".movieTitle span"


node.js at Facebook


Tags: , , ,

Simple init.d script template

Recently I found the need to create an init.d script and since I had a hard time finding an example elsewhere1, here’s the overly simple script I came up with to get the job done:

# myapp daemon
# chkconfig: 345 20 80
# description: myapp daemon
# processname: myapp


DAEMONOPTS="-my opts"

DESC="My daemon description"

case "$1" in
	printf "%-50s" "Starting $NAME..."
	PID=`$DAEMON $DAEMONOPTS > /dev/null 2>&1 & echo $!`
	#echo "Saving PID" $PID " to " $PIDFILE
        if [ -z $PID ]; then
            printf "%s\n" "Fail"
            echo $PID > $PIDFILE
            printf "%s\n" "Ok"
        printf "%-50s" "Checking $NAME..."
        if [ -f $PIDFILE ]; then
            PID=`cat $PIDFILE`
            if [ -z "`ps axf | grep ${PID} | grep -v grep`" ]; then
                printf "%s\n" "Process dead but pidfile exists"
                echo "Running"
            printf "%s\n" "Service not running"
        printf "%-50s" "Stopping $NAME"
            PID=`cat $PIDFILE`
            cd $DAEMON_PATH
        if [ -f $PIDFILE ]; then
            kill -HUP $PID
            printf "%s\n" "Ok"
            rm -f $PIDFILE
            printf "%s\n" "pidfile not found"

  	$0 stop
  	$0 start

        echo "Usage: $0 {status|start|stop|restart}"
        exit 1

This script will work in /etc/init.d on Xubuntu 11.10 (so most Debian-based systems) and CentOS 5.5 and you can control it via chkconfig.

  1. That said, if you know of such an example I’d love to hear from you. []

Tags: , , , , ,

Fun with jsonselect

One of the strengths of CSS and jQuery is that it provides a common and powerful mechanism known as a selector language for referencing bits of data, especially data whose structure is not exactly known at runtime which makes such an addressing scheme a perfect fit for the often lumpy world of HTML.

Increasingly JSON is being used as a transport medium for data and with the rise of NoSQL solutions, having a selector language for JSON makes a lot of sense when dealing with JSON documents whose structure isn’t deterministic.

JSONSelect provides a good implementation of just such a JSON selector language but after working with it on a project I found myself needing to do more than it allowed me to do. Namely, I wanted 1. to be able to perform a selection and receive matching paths instead of the data contained in those paths and I wanted 2. to be able to modify data specified at a path location in-place.

jsonselect.match(sel, obj, asPath); // Added the asPath flag to return a path instead of the values
jsonselect.forEach(sel, obj, fun, asPath); // Added the same flag to forEach, I use this to 
jsonselect.get(path,obj); // For getting the value using a path
jsonselect.set(path, value, obj); // For setting the value of a path
jsonselect.del(path,root); // For deleting a path

Here is my modified version of jsonselect in case anyone needs help solving the same problems I mentioned above.

Tags: , , , ,