Archive for category IT industry

Simple PhantomJS web scraping script

Here is a simple web scraping script I wrote for PhantomJS, the immensely useful headless browser. It loads a page, injects jQuery into it, and prints the contents of every element matching a user-supplied jQuery selector.

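page = require('webpage').create()
system = require 'system'

# Load Underscore into PhantomJS's own context (separate from the page's context)
phantom.injectJs "static/js/underscore-min.js"

# Forward the page's console output to stdout, skipping PhantomJS
# warnings that begin with "Unsafe"
page.onConsoleMessage = (msg) ->
    if not msg.match /^Unsafe/
        console.log msg

# Runs inside the page via page.evaluate, so it can use the injected jQuery ($).
# Logs the trimmed inner HTML of every non-empty element matching the selector.
scrapeEl = (elselector) ->
    rows = $ elselector
    for el in rows
        if el.innerHTML
            str = el.innerHTML.trim()
            if str.length > 0
                console.log str
    return  # nothing to hand back; the results go out through console.log

# system.args[1] is the URL to load, system.args[2] the jQuery selector
page.open system.args[1], (status) ->
    if status isnt 'success'
        console.log "Failed to load #{system.args[1]}"
        phantom.exit 1
    else
        # Libraries the page-side code relies on
        page.injectJs "static/js/underscore-min.js"
        page.injectJs "static/js/utils.js"
        page.injectJs "static/js/jquery-1.8.2.min.js"
        page.evaluate scrapeEl, system.args[2]
        phantom.exit()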

Run it with (PhantomJS 1.x runs CoffeeScript files directly; PhantomJS 2.0 removed that support):

phantomjs scrape_element.coffee "http://www.moviefone.com/coming-soon" ".movieTitle span"

Tracking the trackers

MongoDB Security Considerations presentation at MongoSF 2012

Here is a presentation I gave at MongoSF 2012 on unique security considerations for MongoDB.

And here are my slides.

node.js at Facebook

Slides

Hollywood vs the internet

[HT Forbes]

PROTECT IP Act Breaks The Internet from Fight for the Future on Vimeo.

How Intellectual Property Hampers the Free Market

[HT Mises Blog]

Open-source blueprint for civilization

[HT Mises Blog]

TED Talk: Visualizing Humanity with Aaron Koblin

[HT Infosthetics]

Lessig: Copyright isn’t just hurting creativity: it’s killing science

[HT Motherboard]

The Architecture of Access to Scientific Knowledge from lessig on Vimeo.

Dan Cathy: High Tech for High Customer Touch