Posts Tagged cloud computing

How to ROC recovery oriented computing

For the past few years I’ve adopted an attitude, and a corresponding strategy, when it comes to information systems, which is basically this.

Stuff happens.

Computers break, drives crash, kids try to see if CD drives will cure play-dough, “somebody” causes virus infestations to occur.

In short. Stuff happens.

So I’ve decided that rather than try to prevent any of these things from happening (a truly impossible feat), it is better to plan in advance for these events. In other words, view information systems as if their malfunctioning is a foregone conclusion and plan accordingly.

Recovery oriented computing1 is not a new concept. Already it has provided rich rewards to businesses, particularly internet-based businesses such as Google, Amazon, eBay, E*Trade, etc. Unfortunately, however, this same computing practice has not filtered down to the average consumer.

The basic idea behind recovery oriented computing is quite simple. From the abstract on the initial paper on ROC:

Our approach, denoted recovery-oriented computing (ROC), recognizes the inevitability of unanticipated failure and thus emphasizes recovery and repair rather than simple fault-tolerance. We define the properties that a ROC system must provide, and briefly consider how they might be achieved.

While this paper and approach are largely geared towards large internet services with many critical systems in the back-end, I believe the same approach can and should be adapted to the average household, which tends to contain multiple computers, making many recovery oriented techniques possible.

One such principle is to keep critical data replicated automatically in multiple places. I’ve written about how network attached storage systems can aid in this endeavor. However I have been increasingly impressed by the ability of cloud-based solutions such as Dropbox to achieve an even greater degree of data replication and availability.

Along these lines it is also helpful to plan for your systems to die. So I would argue that it is worthwhile to keep in mind the cost of repair and replacement when purchasing new systems. For me, this means that a slightly less powerful system at half the price of a top-of-the-line model is far more desirable: for the same budget it can theoretically serve twice as long, since a second unit (possibly upgraded to newer technology by the time it is needed) costs no more than the single expensive model. Cheaper systems also make it feasible to have a spare or two (or three) lying around as “hot standbys”.

While recovery oriented computing is geared mostly towards large businesses, getting in the mindset recovery oriented computing promotes can yield rich rewards.

  1. Official Berkeley/Stanford Recovery-Oriented Computing Site


Running PHP in Java

Many might consider even the thought of running PHP inside of a Java Virtual Machine to be anathema. Others will wonder why bother (apart from the novelty). However running PHP in Java has one crucial benefit: it future-proofs your code.

Quercus is a nifty utility that will allow you to run PHP code in clouds such as Google App Engine1. This means your Drupal and WordPress sites can now be distributed across a highly available and scalable cloud infrastructure.

Now if we can only get an MVC framework like Kohana or Symfony to work on top of this system…

  1. Other great articles on running PHP in Google’s App Engine can be found here and here. IBM has also highlighted this utility.


Diskless computing vs distributed computing

A friend of mine recently asked me about cloud computing, what it was, and its ramifications for where we will see technology in the coming years. His question demonstrated a confusion common among most people between cloud computing and diskless computing.

Both of these are interesting areas of computer science; they do sometimes overlap, and both are going to change computing in significant ways as time rolls on, but they are not the same.

Here are the differences to help you tell them apart.

Diskless computing

Diskless computing is best demonstrated by the Linux Terminal Server Project (an excellent project; I’ve used it to deploy over 150 diskless workstations in a company) and Microsoft’s pathetic rival, Windows Terminal Services. Sun has their own solution as well, and there are countless 3rd party utilities, but the basic idea behind them all is that you have one big computer (or series of computers) that all these “headless” computers connect to in order to retrieve an operating system, store files, etc. For large networks this network model is absolutely amazing.

Cloud Computing

Cloud computing, however, starts from the concept that you have a large problem that requires a lot of computing power to solve. Rather than buy bigger and bigger hardware, what we’ve found out (going back to Cray supercomputers) is that it is far better to break the problem down into smaller chunks and push those through multiple processors all at once rather than try to get a single processor to process everything. This is called distributed computing.
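The chunk-and-combine idea can be sketched in a few lines of Python using the standard multiprocessing module. This toy example (summing squares over a range; the problem and numbers are arbitrary) breaks the work into one chunk per worker process and then merges the partial results:

```python
from multiprocessing import Pool

def sum_of_squares(chunk):
    """Process one chunk of the problem independently of the others."""
    lo, hi = chunk
    return sum(n * n for n in range(lo, hi))

def parallel_sum_of_squares(limit, workers=4):
    # Break the problem into one chunk per worker...
    step = limit // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    chunks[-1] = (chunks[-1][0], limit)  # last chunk absorbs any remainder
    # ...push the chunks through multiple processes at once...
    with Pool(workers) as pool:
        partials = pool.map(sum_of_squares, chunks)
    # ...and combine the partial results into the final answer.
    return sum(partials)
```

The same shape — partition, compute in parallel, merge — is what frameworks like Hadoop apply across whole clusters of machines instead of the cores of one box.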

You might have heard of one of the major platforms for this type of computing, Beowulf, from the popular internet meme “imagine a beowulf cluster of…” Another very popular distributed computing platform (popular because it is far easier to install, operate, and write code for than the Beowulf project) is Hadoop. Hadoop, a project inspired by Google’s implementation of the MapReduce design paradigm, is written in Java, which makes it a lot more portable.

Projects using Cloud Computing

Parallel processing is done today in a wide variety of settings including:

  • 3D rendering farms for companies such as Disney’s Pixar
  • indexing the web with Google, Yahoo, Microsoft, etc.
  • data mining of all sorts with companies like Wal-Mart, etc.

Join in!

There are some very popular projects using distributed computing technologies that regular people with CPU cycles to spare are encouraged to join in on like:

  • SETI@home where you can help process data that might help us identify extraterrestrial signals
  • Folding@home where you can help search for cures to various diseases
  • Genome@home where you can help map the human genome (again), this is tied closely to the folding@home project above
  • Shrek@home which was a pioneer project that a few of us got to participate in
  • others, including fightaids@home to help fight AIDS and lhc@home to process the massive amounts of data coming from the CERN’s Large Hadron Collider

So while diskless computing and cloud computing can have some areas of overlap (I configured the LTSP network I mentioned earlier to assist with the genome@home project when the systems were idle) they aren’t necessarily tied together.


Getting started with Hadoop and MapReduce

Recently I’ve been studying several technologies that appear to form the core of cloud computing. In short, these are the technologies behind such technological marvels as Amazon, Google, Facebook, Yahoo, NetFlix, Pixar, etc.1

Since each of these technologies is by itself worthy of a new book, and since even those familiar with the common implementation languages of these technologies (like Java and Python) can find it hard to know where to begin, I decided to put together all the resources I’ve found on these technologies in hopes that they will help someone else get started in this fascinating world of distributed or “cloud computing”.

Introduction to cloud computing

One might wonder why they should take the time to learn these technologies and concepts. That is a fair question, considering the amount of time and energy that will potentially be required in order to put any of this knowledge to any functional use. With that in mind, I found the following videos particularly helpful in answering the question “why should I care?”:


Hadoop2 is essentially a collection of a number of different projects that make distributed computing a lot less painful. The best source of beginner’s information on Hadoop I’ve found has come from these Google lectures as well as from Cloudera’s training pages:


MapReduce is more of a paradigm than a language. It is a way to write algorithms that can be run in parallel in order to utilize the computing power of a number of computers across a large data set. There are a number of software frameworks that make writing MapReduce jobs a lot easier and in the following videos you will learn how to use some of the most common.
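As a toy illustration of the paradigm (plain Python, not tied to any particular framework), here is the canonical word-count example: a map phase that emits (word, 1) pairs, a shuffle that groups values by key as a framework would, and a reduce phase that combines each group:

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(word, counts):
    """Reduce: combine the values for one key into a final result."""
    return word, sum(counts)

def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
```

In a real Hadoop job you write only the map and reduce functions; the framework handles the shuffle, the distribution across machines, and recovery from failed workers.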

Quickstart packages

As with many complex technologies, just setting up a working environment can be a challenge in itself, one that is enough to discourage the casual learner. To help alleviate the stress of setting up a general Hadoop environment, and to help you gain some useful hands-on experience with Hadoop and the related cloud technologies, here are a few resources to help you get a working Hadoop environment going fairly quickly.

Helpful hint regarding videos: If you are like me and prefer to watch/listen to long lectures in your car or otherwise on the go on your netbook, iPod, or other mobile device, try looking for the above-mentioned videos on Google Video instead of YouTube. Google Video includes a helpful download link that allows you to take a copy of the movie with you.

  1. This article is a continuation of a recent article I wrote on the different approaches to cloud computing taken by Google and Microsoft
  2. Hadoop was actually inspired by Google, more history and background here.


Cloud computing 101

I just finished reading a great article1 outlining the difference between Google and Microsoft’s approaches to cloud computing and how, as one company that switched from Virtual Earth to Google Maps put it, it all comes down to speed, speed, speed.

I’m not very big into virtualization/cloud computing just yet, but with companies like Red Hat posting profits in an otherwise bear market, and with the advent of (mostly) free cloud computing platforms like Google’s App Engine, I’m definitely going to find an excuse to develop at least one test project in the cloud.

  1. This was from the recent Structure 09 conference event.
