I was recently tasked with writing a Hadoop map/reduce job. The job had to take a list of regular expressions and scour hundreds of gigabytes of log files for matches. Since I’ve been leaning more and more towards Scala, I wanted to use it for my job […]
hadoop
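The post builds this job in Scala. As a hedged illustration only, here is the core map-side logic sketched in Python instead (a different language from the author’s), shaped like a Hadoop Streaming mapper: it reads log lines and writes tab-separated `pattern<TAB>line` records. The function names and the choice of the raw pattern string as the output key are assumptions, not the author’s design.

```python
from __future__ import annotations

import re
from typing import Iterable, Iterator, TextIO


def compile_patterns(patterns: Iterable[str]) -> list[tuple[str, re.Pattern[str]]]:
    # Compile every regular expression once, up front, instead of once per log line.
    return [(p, re.compile(p)) for p in patterns]


def map_matches(
    compiled: list[tuple[str, re.Pattern[str]]], lines: Iterable[str]
) -> Iterator[tuple[str, str]]:
    # Map step: for each line, emit (pattern, line) for every pattern found anywhere in it.
    for line in lines:
        line = line.rstrip("\n")
        for source, regex in compiled:
            if regex.search(line):
                yield source, line


def run_streaming_mapper(patterns: Iterable[str], infile: TextIO, outfile: TextIO) -> None:
    # Hypothetical entry point: Hadoop Streaming treats text up to the first tab as the key,
    # so patterns containing a tab character would need escaping in a real job.
    compiled = compile_patterns(patterns)
    for key, value in map_matches(compiled, infile):
        outfile.write(f"{key}\t{value}\n")
```

Run as a streaming mapper, it would be called as `run_streaming_mapper(sys.argv[1:], sys.stdin, sys.stdout)`. Keying output by pattern means a reducer would see all matches for one expression grouped together.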
[HT Alex Popescu]
Jeopardy and Hadoop
I’ve recently released a simple JSON-RPC query bridge for HBase (built on our own simple JSON-RPC framework) at http://code.google.com/p/hbasebridge/. You can use this bridge to query HBase for either the current version of a record or its last few versions. To see the available methods, open http://localhost:8080/hbasebridge/rpc?debug=true, which returns a list of usable RPC […]
Simple HBase query bridge
First, some review: Hadoop is a powerful MapReduce framework based on a white paper released by Google documenting how it successfully tackled processing large amounts of data (on the scale of petabytes in many cases) using its proprietary distributed filesystem, GFS. Hadoop is the open […]
Using Python with Hadoop
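The excerpt above only reviews the MapReduce model. As a rough illustration (not taken from the post, and not Hadoop’s API), here is a local, single-process Python simulation of the classic word-count job: a map step, the sort-by-key shuffle that Hadoop performs between the phases, and a reduce step. The names `mapper`, `reducer`, and `run_local` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every whitespace-separated word, lowercased.
    for word in line.split():
        yield word.lower(), 1


def reducer(word: str, counts: Iterable[int]) -> tuple[str, int]:
    # Reduce: all values for one key arrive together; sum them.
    return word, sum(counts)


def run_local(lines: Iterable[str]) -> list[tuple[str, int]]:
    # 1. Map every input line.
    pairs = [kv for line in lines for kv in mapper(line)]
    # 2. Shuffle/sort: order map output by key so identical keys are adjacent.
    pairs.sort(key=itemgetter(0))
    # 3. Reduce each run of identical keys.
    return [
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    ]
```

Under Hadoop Streaming, the mapper and reducer would instead run as separate scripts that read stdin and write tab-separated lines, with Hadoop doing the sort in between; this sketch only makes that data flow visible in one process.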
Recently I’ve been studying several technologies that appear to form the core of cloud computing. In short, these are the technologies behind such marvels as Amazon, Google, Facebook, Yahoo, Netflix, Pixar, etc.[1] Since each of these technologies is, by itself, worthy of a new book, and since even those […]