Archive for category bash

A script to turn a Youtube playlist into mp3s

There are a lot of interesting lectures on Youtube. Recently I’ve taken to adding these lectures to a playlist and processing them on my mac using the following script:

youtube-dl -o '%(stitle)s.%(ext)s' $YOUR_YOUTUBE_PLAYLIST

for file in ./*.mp4; do
	echo "processing $file"
	file = $(print '%q' "$file")
	#file_norm = $(echo "$file" | sed 's/[^a-zA-Z0-9]//g')
	ffmpeg -i "$file" "$file.mp3"
done

rm -rf *.mp4

You’ll need youtube-dl and ffmpeg installed for this script to work. Both of these are available from brew.

Share/Save

Simple PhantomJS web scraping script

Here is a simple web scraping script I wrote for PhantomJS, the immensely useful headless browser, to load a page, inject jQuery into it, and then scrape the page using a user-supplied jQuery selector.

page = require('webpage').create()
system = require 'system'

phantom.injectJs "static/js/underscore-min.js"

page.onConsoleMessage = (msg) ->
    if not msg.match /^Unsafe/
        console.log msg

scrapeEl = (elselector) ->
    rows = $ elselector
    for el in rows
        if el.innerHTML
            str = el.innerHTML.trim()
            if str.length > 0
                console.log str

page.open system.args[1], (status) ->
    if status isnt 'success'
        phantom.exit 1
    else
        page.injectJs "static/js/underscore-min.js"
        page.injectJs "static/js/utils.js"
        page.injectJs "static/js/jquery-1.8.2.min.js"
        page.evaluate scrapeEl, system.args[2]
        phantom.exit()

Run it with:

phantomjs scrape_element.coffee "http://www.moviefone.com/coming-soon" ".movieTitle span"

Tags:

Simple init.d script template

Recently I found the need to create an init.d script and since I had a hard time finding an example elsewhere1, here’s the overly simple script I came up with to get the job done:

#!/bin/bash
# myapp daemon
# chkconfig: 345 20 80
# description: myapp daemon
# processname: myapp

DAEMON_PATH="/home/wes/Development/projects/myapp"

DAEMON=myapp
DAEMONOPTS="-my opts"

NAME=myapp
DESC="My daemon description"
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME

case "$1" in
start)
	printf "%-50s" "Starting $NAME..."
	cd $DAEMON_PATH
	PID=`$DAEMON $DAEMONOPTS > /dev/null 2>&1 & echo $!`
	#echo "Saving PID" $PID " to " $PIDFILE
        if [ -z $PID ]; then
            printf "%s\n" "Fail"
        else
            echo $PID > $PIDFILE
            printf "%s\n" "Ok"
        fi
;;
status)
        printf "%-50s" "Checking $NAME..."
        if [ -f $PIDFILE ]; then
            PID=`cat $PIDFILE`
            if [ -z "`ps axf | grep ${PID} | grep -v grep`" ]; then
                printf "%s\n" "Process dead but pidfile exists"
            else
                echo "Running"
            fi
        else
            printf "%s\n" "Service not running"
        fi
;;
stop)
        printf "%-50s" "Stopping $NAME"
            PID=`cat $PIDFILE`
            cd $DAEMON_PATH
        if [ -f $PIDFILE ]; then
            kill -HUP $PID
            printf "%s\n" "Ok"
            rm -f $PIDFILE
        else
            printf "%s\n" "pidfile not found"
        fi
;;

restart)
  	$0 stop
  	$0 start
;;

*)
        echo "Usage: $0 {status|start|stop|restart}"
        exit 1
esac

This script will work in /etc/init.d on Xubuntu 11.10 (so most Debian-based systems) and CentOS 5.5 and you can control it via chkconfig.

  1. That said, if you know of such an example I’d love to hear from you. []

Tags: , , , , ,

A few helpful bash command-line one-liners

[HT Peter, commandlinefu]

Query SVN log history and filter by username

svn log | sed -n '/username/,/-----$/ p'

Run the last command as root

sudo !!

Save a file you edited in vim without the needed permissions

:w !sudo tee %

Why is this command so awesome? Peter described it quite well:

This happens to me way too often. I open a system config file in vim and edit it just to find out that I don’t have permissions to save it. This one-liner saves the day. Instead of writing the while to a temporary file :w /tmp/foobar and then moving the temporary file to the right destination mv /tmp/foobar /etc/service.conf, you now just type the one-liner above in vim and it will save the file.

Change to the previous working directory

cd -

Run the previous shell command but replace string “foo” with “bar”

^foo^bar^

Find the last command that begins with “whatever,” but avoid running it

!whatever:p

Copy your public-key to remote-machine for public-key authentication

ssh-copy-id remote-machine

Capture video of a linux desktop

ffmpeg -f x11grab -s wxga -r 25 -i :0.0 -sameq /tmp/out.mpg

Tags: , , , ,

Hacking your router for effective internet monitoring

The Why: Preamble

Working in the information technology sector, one of the most common questions I get asked by parents is about monitoring internet access of their children.1

Most parents want to know what their children are doing online but also recognize that most off-the-shelf products are just as easy to disable or circumvent (or are far more restrictive/bloated than they want) as they are to install or operate. And sadly, enterprise solutions that capture and control network traffic at the most basic level (making circumvention next to impossible) is still very expensive and therefore out of reach for the average family.

What I needed was a cheap and hackable router that I could modify to send captured URLs to a central source for storage and processing.

The What: WRT54G

Linksys-WRT54G-Ultimate-HackingAfter studying my options I remembered reading a lot about the Linksys WRT54G-series routers and how they were originally based on a heavily modified version of Linux and how Linksys made headlines when it lost a court case regarding the GPLed code it used in their router’s firmware.

So I did a little digging.

What I found was a whole router-hacking subculture built around the WRT54G. While it seems that much of the initial fervor has subsided, many of the packages show a last update time of 2007 or so, the documentation is still valid for the most part. The most popular projects which provide custom firmware are the OpenWRT and DD-WRT. While OpenWRT is the original, I found DD-WRT to be a lot more polished and (as we’ll see later) configurable without much headache.

It’s important to note here that the WRT54G has many variants and its easy to fall into the trap of thinking that any old WRT54G will do but a little diligence and study of the differences between the hardware revisions will certainly save you time and money.

After buying a few different routers and bricking one (a Buffalo AirStation WHR-HP-54G2 ) and a false start with a newer WRT54G v7 (anyone need a highly configurable, albeit not-very-hackable router?) I discovered that the best router for hacking is the WRT54GL (which was designed by Linksys to allow for user modifications).

The How: URLSnarf and custom shell scripts

Space on a router is very limited. On the WRT54GL model I eventually ended up using I had 4Megs of space to work with.

The first order of business was to find a package that could monitor all of the network connections (wired and wireless) on the router and capture requested URLs. For this task I discovered  that URLSnarf, part of the dsniff OpenWrt package, worked quite well.

To install packages I used DD-WRT’s firmware modification kit which allowed me to simply add the scripts and packages I wanted without having to recompile everything.

Next I needed to transform the captured URL into a URLencoded string in order to send it to my monitoring service via a simple wget request. Initially I tried using several variations of user-generated Python and PHP packages but they both took up far more space than I could afford so, instead, I searched for a pure command-line based solution.

After some more digging I found a handy sed substitution script that worked like a charm. The script worked in two parts, the first one being the substitution script (/usr/bin/urlencode.sed):

s/%/%25/g
s/ /%20/g
s/ /%09/g
s/!/%21/g
s/"/%22/g
s/#/%23/g
s/\$/%24/g
s/\&/%26/g
s/'\''/%27/g
s/(/%28/g
s/)/%29/g
s/\*/%2a/g
s/+/%2b/g
s/,/%2c/g
s/-/%2d/g
s/\./%2e/g
s/\//%2f/g
s/:/%3a/g
s/;/%3b/g
s//%3e/g
s/?/%3f/g
s/@/%40/g
s/\[/%5b/g
s/\\/%5c/g
s/\]/%5d/g
s/\^/%5e/g
s/_/%5f/g
s/`/%60/g
s/{/%7b/g
s/|/%7c/g
s/}/%7d/g
s/~/%7e/g
s/	/%09/g

and the command line to use it:

sed -f urlencode.sed

To tie it all together, we can pass captured URLs to it via pipes from the command-line with:

urlsnarf | sed -f urlencode.sed

At this point, the only missing link of the capture chain is a script to continually read from the command line and send the urlencoded capture data to our storage application (described in the next part). For this task I used the following script (/usr/bin/urlmon.sh):

HOSTNAME=`hostname`
while read url; do
    DATE=`date +%s`
    echo $(wget -q -O- "http://myapp.appspot.com/log?l=$url&h=$HOSTNAME&t=$DATE")
done

exit 0

Finally, we need to have the router start listening for URLs as soon as it is booted. In a Linux environment this is generally done by init scripts. Since our router has limited capabilities, we don’t need to write a full init script. Here is the slimmed down init script I used (/etc/init.d/S50urlmon):

#!/bin/sh

/usr/sbin/urlsnarf -v "/(192.168.1.1|https\://myapp\.appspot\.com)/" | sed -f /usr/bin/urlencode.sed | /usr/bin/urlmon.sh

The Where: Google App Engine

I’ve been itching to try out Google’s App Engine for a while now and this project seemed to be a great fit since I didn’t know how much data to expect and I needed my receiving/processing/display application to be highly available and scalable. Especially if this works well enough that others might want to use it.

Since my initial phase is to merely capture the URLs requested from devices behind the router, and since the capture process should be as efficient and lean as possible (I don’t want the router to take very long logging a URL when it’s primary job is to retrieve that URL for the initial requester) I decided to make a simple Java servlet which simply takes the URLencoded log line generated by URLSnarf.

Google App Engine uses Java Data Objects enhanced by DataNucleus to store data in Google’s massive cluster. Here is the annotated JDO (LogLine.java) I used to store the captured URL:

import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.IdentityType;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class LogLine {
	@PrimaryKey
	@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
	private Long id;

	@Persistent
	private String host;

	@Persistent
	private Long time;

	@Persistent
	private String line;

	public void setId(Long id) {
		this.id = id;
	}

	public Long getId() {
		return id;
	}

	public void setLine(String line) {
		this.line = line;
	}

	public String getLine() {
		return line;
	}

	public String getHost() {
		return host;
	}

	public void setHost(String host) {
		this.host = host;
	}

	public Long getTime() {
		return time;
	}

	public void setTime(Long time) {
		this.time = time;
	}
}

And here is the servlet that processes the GET request (containing the captured URL in Apache Common Log format)

import java.io.IOException;
import java.net.URLDecoder;

import javax.jdo.PersistenceManager;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.werxltd.webmon.data.LogLine;

public class Log extends HttpServlet {
	private final static long serialVersionUID = 3;

	public void doGet(HttpServletRequest req, HttpServletResponse resp)
    	throws IOException {
			try {
				resp.setContentType("text/plain");

				LogLine logline = new LogLine();
				String logStr = URLDecoder.decode(req.getParameter("l"));
				logline.setLine(logStr);
				logline.setHost(req.getParameter("h"));
				logline.setTime(Long.parseLong(req.getParameter("t")));

				PersistenceManager pm = PMF.get().getPersistenceManager();
				pm.makePersistent(logline);
				pm.close();	

				resp.getWriter().println("OK");
			} catch (Exception e) {
				e.printStackTrace();
				resp.getWriter().println("FAIL");
			} finally {

			}

	}
}

The future

This project is still in it’s early stages. There is no real way to view the captured data just yet, though I plan on incorporating Polliwog, and the router software hasn’t been tested as much as I would like. I’m also leery of any security holes I may have introduced.

So if you have any suggestions or would like to know more, feel free to leave a comment below!

  1. Most actually ask about “controlling what their kids see online” but I generally argue for a observe-only approach as it helps open lines of communication with your child whereas silently blocking “bad” sites will only start a silent war which will only frustrate you once they do find a suitable workaround, such as a proxy. []
  2. I might have had better luck had I seen this helpful guide. Oh well, this gives me a future project in figuring out how to de-brick my WHR-HP-54G []

Tags: , , , , , , ,