Posts Tagged java

Simple Scala Map/Reduce Job

I was recently tasked with writing a Hadoop map/reduce job. This job had the requirement of taking a list of regular expressions and scouring hundreds of gigs worth of log files for matches. Since I’ve been leaning more and more towards Scala I wanted to use it for my job but I also wanted to use Maven for my job’s package management to make the job easy to setup and extend. And finally, I wanted to have unit tests for my mapper and reducer and an overall job unit test. The result is this project I posted to GitHub as a template for future projects. I hope it proves as helpful for others as I’m sure it’ll be for me.

Share/Save

Tags: , , , ,

A Java String permutations utility

On a recent project I need to find all the possible permutations of a given URL. Stripping off subdomains, paths, and query parameters. Here is the first part of the solution. A method which takes a string and strips it down based on a given divider in a given direction at a given interval.

Here’s the code:

private static String[] getPermutations(String whole, String divider, int lim, int dir) {
	   String[] chunks = whole.split((divider.matches("\\.") ? "\\" : "")+divider);
	   System.out.println("chunks.length: "+chunks.length);
	   	   
	   if(chunks.length <= lim) {
		   System.out.println("return whole: "+whole);
		   return new String[]{whole};
	   }
	   
	   String[] permutations = new String[chunks.length-lim];
	   
	   if(dir == 1) {
		   	permutations[0] = whole;
   		    System.out.println("permutations[0]: "+permutations[0]);
			for(int i = 1; i < chunks.length-lim; i++) {
			   String permutation = "";
			   for(int o = i; o < chunks.length; o++) {
				   permutation += (o == i ? "" : divider) + chunks[o];
			   }
			   permutations[i] = permutation;
			   System.out.println("permutations["+i+"]: "+permutations[i]);
			}
	   } else if(dir == -1) {
		   for(int i = 0; i < chunks.length-lim; i++) {
			   String permutation = "";
	 		   for(int o = 0; o < chunks.length-i; o++) {
				   permutation += (o == 0 ? "" : divider) + chunks[o];
			   }
			   permutations[i] = permutation;
			   System.out.println("permutations["+i+"]: "+permutations[i]);
			}
	   }
	   
	   return permutations;
   }

Here is an example of it being used.

Input:

      getPermutations("com",".",1, 1);
      getPermutations("google.com",".",1, 1);
      getPermutations("a.b.c.d.e.cool.google.com",".",1, 1);
      getPermutations("/path/asdf/a/b/c/d/e/f","/",0, -1);
      getPermutations("a=b&c=d&e=f","&",0, -1);

Output:

$ javac Runme.java && java Runme
chunks.length: 1
return whole: com
chunks.length: 2
permutations[0]: google.com
chunks.length: 8
permutations[0]: a.b.c.d.e.cool.google.com
permutations[1]: b.c.d.e.cool.google.com
permutations[2]: c.d.e.cool.google.com
permutations[3]: d.e.cool.google.com
permutations[4]: e.cool.google.com
permutations[5]: cool.google.com
permutations[6]: google.com
chunks.length: 9
permutations[0]: /path/asdf/a/b/c/d/e/f
permutations[1]: /path/asdf/a/b/c/d/e
permutations[2]: /path/asdf/a/b/c/d
permutations[3]: /path/asdf/a/b/c
permutations[4]: /path/asdf/a/b
permutations[5]: /path/asdf/a
permutations[6]: /path/asdf
permutations[7]: /path
permutations[8]: 
chunks.length: 3
permutations[0]: a=b&c=d&e=f
permutations[1]: a=b&c=d
permutations[2]: a=b

Next I'll need to combine the URL components to get a list of valid URLs..

Tags: ,

Introducing jpoxy, the simple json-rpc implementation for Java

werx-jsonrpc has been renamed to jpoxy (hosted on Google Code). But the project name isn’t the only thing that’s changed. Based on some excellent feedback I’ve made the following changes:

  • The project is now thread-safe. I’ve removed request and response as global objects and, instead, made sure they get passed along the request chain.
  • This means the events for request and response have been taken out (which, I’ve been informed, I didn’t make a big enough deal about last time). The only event that you will get now is the init complete event which will provide your application a copy of the servlet configuration.
  • You can still access the request object (and, subsequently, the session object) by accepting a single HashMap parameter in your method.
  • A new annotation has been added, @Jpoxy(enabled = false), which allows you to tell the framework not to expose a method if it is public. This means you can mark all your internal event listener code so they aren’t exposed to the public.
  • You can now expose methods which accept a complex Java object. This means you can expose a method which accepts, say, a User object and the framework will use Jackson to marshal the parameters you provide into that object.

We also now have a Google group! So if you have any questions or comments we would love to hear from you.

Tags: , , , ,

Simple JSON-RPC updated to 0.9.7

Simple JSON-RPC has been updated to 0.9.7. This new version includes the ability to enable using full class names rather than simple method names. This makes using multiple classes with the same public method names possible.

To enable this functionality simply add an init-param to your web.xml file like:

<init-param>
	<param-name>use_full_classname</param-name>
	<param-value>true</param-value>
</init-param>

With this feature enabled, you then call methods via their full classname + method name separated by a dot (this nomenclature is for both static as well as non-static methods, the framework handles the particulars in the back-end).

Special thanks to Stephan of Cross Pollinate for suggesting this improvement!

Tags: , , ,

Bible Flashcards for Android 1.0

Bible Flashcards is an Android application based on the data files provided by the Crosswire Bible Society‘s Flashcard application. It contains flashcards for both Greek and Hebrew along with appropriate fonts for proper display.

I’ve also included an additional lesson set titled “greekBasics” which includes flashcards for the Greek alphabet.

Here are some screenshots:

You can find Bible Flashcards in the Android app Market by entering “Bible Flashcards” or by scanning the barcode below:

Additionally, if you are interested in helping out with new lessons or if you would like to provide feedback, please feel free to email me.

Tags: , , , , , , ,

Learning Languages: Java

The first thing you’ll need to consider is the development environment you want to use primarily. This is important as it will have an impact on how you run through tutorials and examples later on.

Environments can be broken down into two broad categories; command-line or a visual IDE. Both have their merits and you’ll eventually need to be familiar with both (especially if you expect to be releasing production code or participating in any well-maintained development environment).

The most common command-line environments are Maven and Ivy. Both come with a somewhat steep learning curve (which, unfortunately is unavoidable) but both are well worth investigating as they are both very common in production environments.

Starting out, however, you’ll most likely find that using a visual IDE will help you get right down to learning and compiling example code fairly quickly.

There are several common IDEs; IntelliJ, NetBeans, and Eclipse are all great ones that I’ve seen used in production. My favorite hands-down is Eclipse, especially since it also has configurations to help you develop in other languages such as PHP (Aptana) and C/C++.

For general basics and a broad overview of Java; I would recommend you take a look at the Java Beginner site.

There’s also several handy video tutorials (mostly using Eclipse) on YouTube such as this one:

Once you get the basics down, I’ve found that working on a full project helps. A great place to start would be to help our with an existing open-source project like JSword.

Tags: , , , ,

Simple HBase query bridge

I’ve recently released a simple json-rpc query bridge (using our own simple json-rpc framework) for HBase at http://code.google.com/p/hbasebridge/

You can use this bridge to query HBase for either the current record or the last few versions of a record.

To see the methods

http://localhost:8080/hbasebridge/rpc?debug=true

Which returns a list of usable RPC methods:

{
  "jsonrpc": "2.0",
  "result": {"method": [
    {
      "class": "com.werxltd.hbasebridge.HBaseInfo",
      "name": "listtables",
      "params": [],
      "returns": "org.json.JSONObject",
      "static": false
    },
    {
      "class": "com.werxltd.hbasebridge.HadoopInfo",
      "name": "clusterstatus",
      "params": [],
      "returns": "org.json.JSONObject",
      "static": false
    },
    {
      "class": "com.werxltd.hbasebridge.HadoopInfo",
      "name": "jobstatus",
      "params": [],
      "returns": "org.json.JSONObject",
      "static": false
    },
    {
      "class": "com.werxltd.jsonrpc.RPC",
      "name": "listrpcmethods",
      "params": [],
      "returns": "org.json.JSONObject",
      "static": false
    },
    {
      "class": "com.werxltd.hbasebridge.TableLookup",
      "name": "lookup",
      "params": ["org.json.JSONObject"],
      "returns": "org.json.JSONObject",
      "static": false
    }
  ]}
}

To list tables:

http://localhost:8080/hbasebridge/rpc?debug=true&method=listtables

Which returns:

{
  "jsonrpc": "2.0",
  "result": {"tables": [
    "mytable"
  ]}
}

To get the current status of the cluster:

http://localhost:8080/hbasebridge/rpc?debug=true&method=clusterstatus

Which returns:

{
  "jsonrpc": "2.0",
  "result": {
    "activetrackernames": [
      "trackernode1:localhost/127.0.0.1:33455",
      "trackernode2:localhost/127.0.0.1:54616",
    ],
    "blacklistedtrackernames": [],
    "blacklistedtrackers": 0,
    "jobqueues": {"queues": [{
      "jobs": [
        {
          "cleanuptasks": [{"state": ""}],
          "complete": false,
          "filename": "hdfs://hadoophdfsnode:9000/data/hadoop/mapred/system/job_201003191557_0442/job.xml",
          "jobpriority": "normal",
          "mapprogress": 1,
          "name": "My mapreduce job",
          "reduceprogress": 0.9819000363349915,
          "runstate": "running",
          "schedulinginfo": "NA",
          "setupprogress": 1,
          "starttime": 1269024863960,
          "username": "hadoop-admin"
        }
      ],
      "name": "default"
    }]},
    "jobtrackerstate": "running",
    "maptasks": 1,
    "maxmaptasks": 116,
    "maxmemory": 2079719424,
    "maxreducetasks": 58,
    "reducetasks": 16,
    "tasktrackers": 34,
    "ttyexpiryinterval": 600000,
    "usedmemory": 969170944
  }
}

Key/Value Query:

http://localhost:8080/hbase_tape/rpc?debug=true&data={"method":"lookup","params":{"table":"tablename","keys":["mykey"]}}

Results:

{
  "jsonrpc": "2.0",
  "result": {"rows": [{"mykey": {
    "family:col": "myvalue"
  }}]}
}

Key/Value query with versions:

http://localhost:8080/hbase_tape/rpc?debug=true&data={"method":"lookup","params":{"table":"tablename","keys":["mykey"],versions:2}}

Results:

{
  "jsonrpc": "2.0",
  "result": {"rows": [{"mykey": {
    "family:col": [{
      "value": "myval",
      "version": 123456789
    }],
    "family:col": [{
      "value": "myoldval",
      "version": 123456788
    }]
  }}]}
}

The code should also provide a handy reference for anyone who wants to learn how to query HBase and scrape Result objects for values without knowing family or column names in advance.

Tags: , , , , , , , ,

Simple Java implementation of JSON-RPC

Preamble

Explanation of standard formats and protocols:

  • JSON (JavaScript Object Notation) is a lightweight1 data-interchange format with language bindings for C, C++, C#, Java, JavaScript, Perl, TCL and many others.
  • JSON-RPC is a simple remote procedure call protocol similar to XML-RPC although it uses the lightweight JSON format instead of XML.

What is it?

I love the simplicity of JSON-RPC when it comes to rapidly devloping cutting-edge web applications.

I recently decided to start writing more server-side code in Java servlets in order to take advantage of cloud-based infrastructures such as the Google App Engine. Until now I have written my web applications using PHP as the server-side language. Specifically using frameworks such as Symfony or Kohana, which makes writing simple JSON-RPC services relatively trivial.

So I started looking for a simple JSON-RPC system I could use in Java to abstract my business logic classes from the mundane tasks associated with handling web requests. I found several packages which all claimed to implement the JSON-RPC protocol via Java servlets, however they seemed to require far more by way of setup than I wanted and they all seemed to require the developer to write applications to their specific implementation’s standards.2

So I decided to write yet another Java implementation of the JSON-RPC specification myself with the following goals in mind:

  • Easy to implement. Setup for this package should be kept at a minimum. This includes both development as well as production setup.
  • Easy to code in. Application developers using this class should not need to know much about JSON-RPC beyond exposing methods in their code that can be called remotely from other applications via the web.
  • Non-invasive. Developers using this implementation should be able to reuse plain old Java object (POJO) classes as much as possible, making the transport layer of JSON-RPC as transparent as possible.

With these goals in mind I set off to develop the package com.werxltd.jsonrpc to be a simple wrapper designed to be used inside of a standard .war project.

Setting it up

You can either download the .jar file here to include in your project manually or (and this is my preferred method) you can import the com.werxltd.jsonrpc package as a dependency in your Maven-managed project by specifying the following in your project’s pom.xml configuration file:

<repositories>
    <repository>
        <id>werxltd</id>
        <url>http://maven.werxltd.com </url>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
        <releases>
           <enabled>true</enabled>
       </releases>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>com.werxltd</groupId>
        <artifactId>jsonrpc</artifactId>
        <version>0.9</version>
    </dependency>
</dependencies>

Next, you’ll need to specify endpoints in your .war file’s web.xml configuration file. Here’s an example:

<web-app>
    <servlet>
        <servlet-name>example</servlet-name>
        <servlet-class>com.werxltd.jsonrpc.RPC</servlet-class>
        <init-param>
            <param-name>rpcclasses</param-name>
            <param-value>YourClass</param-value>
        </init-param>
    </servlet>

    <servlet-mapping>
        <servlet-name>example</servlet-name>
        <url-pattern>/example</url-pattern>
    </servlet-mapping>
</web-app>

That’s it! Now your project is configured to filter all requests sent to /example through the JSON-RPC class which examines the class name you passed in (in this case, YourClass) for public methods it can expose. An instance of your class is created internally if your class is not static and it will remain in memory throughout the life of the servlet. Any exceptions your class generates are gracefully wrapped inside a JSON-RPC error message for proper handling upstream.

Using it

While the setup is pretty much straightforward, due to the loose typing found in JavaScript (and, as a result, JSON) there are some caveats in how methods are called. Specifically in how arguments are passed to those methods.

Scanned methods are stored internally with a signature consisting of the method name and how many arguements that method accepts. When a JSON-RPC request is made, the servlet determines how many parameters were included and attempts to match the method requested with a corresponding internal method which has the same number of parameters/arguements.

Because parameters are passed in via the web, only Java primitive data types along with three others, JSONObject, JSONArray and java.lang.String are accepted as valid parameter data types.

If a match of method name and number of parameters is found, an attempt is made to parse the passed-in data into the method’s required type. Any methods which accept as their first parameter a parameter of the JSONObject type, this method automatically takes prescience over all other parameters.

To use the JSON-RPC interface from another application, you must pass a valid JSON-RPC object to your final servlet as a parameter named “json” via either GET or POST. Here is an example of the valid JSON-RPC object you need to pass:

{
    "method":"add",
    "params":[1 2 3]
}

This JSON-RPC implementation accepts either named parameters or positional parameters like the ones shown above. Here is a named parameters example:

{
    "method":"echo",
    "params":{
        "text":"testing"
    }
}

The road ahead

Future development of this class will include more formalized access between the JSON-RPC layer and the underlying classes.

I’m also planning to post the Javadocs and the source to an example project that utilizes the JSON-RPC transport package.

Hope this helps someone else. I’m looking forward to using this class as a central component in many rich web projects I have planned.

  1. Lightweight in both size and resources required to process data encoded in JSON vs. XML. []
  2. I did find this project after finishing the first revision of my class. It looks great and like it would do much of what I wanted, however the code is proprietary. After I finish documenting my classes I plan on releasing them under an OSI-approved licence. If you are interested in helping me with this project, feel free to let me know! []

Tags: , , , , , , , , , ,

Getting Maven and Eclipse to play together

I love using Maven for dependency management and code portability, and I love Eclipse as an enviroment to develop in. However, for the longest time I had trouble getting the two to play well together until I discovered the following commands that made combining the two much easier.

To add Maven repositories to your Eclipse workspace (for code completion and syntax verification) run the following command:

mvn -Declipse.workspace=/path/to/workspace eclipse:add-maven-repo

To add an Eclipse .project file to your project run the following command:

mvn eclipse:eclipse

That’s it! You should now be able to import the project into Eclispe. I haven’t figured out how to build Maven projects in Eclipse yet1 so building and testing your code still requires you to use Maven via the command line.

You’ll also need to re-create the Eclipse project file if you add any dependencies in order for them to be picked up properly in Eclipse.

Need more? Check out this site for more on Maven integration in Eclipse

  1. I’ve seen the plugins but haven’t gotten any to work well enough to rely on. []

Tags: , , , ,

Running PHP in Java

Many might consider even the thought of running PHP inside of a Java Virtual Machine to be anathema. Others will wonder why bother (apart from the novelty). However running PHP in Java has one crucal benefit: it future-proofs your code.

Quercus is a nifty utility that will allow you to run PHP code in clouds such as Google App Engine1. This means your Drupal and WordPress sites can now be distributed across a highly avaliable and scalable cloud infrustructure.

Now if we can only get an MVC framework like Kohana or Symfony to work on top of this system..

  1. Other great articles on running PHP in Google’s App Engine can be found here and here. IBM has also highlighted this utility. []

Tags: , , , , , ,