Archive for category general

What’s the best way to make sure my data is safe?

Friends and family often ask me what the best storage solution is for making sure their critical data is not lost or corrupted.

Whatever storage solution you decide to use, it needs to be unobtrusive and largely automated. If it isn’t, you’ll find out at the worst possible time (usually in a crisis) that recovering your data is nearly impossible and, oftentimes, incomplete.

The most unobtrusive solution I’ve found so far is a Network Attached Storage (NAS) device. The one I use and highly recommend is the D-Link DNS-321, which accepts standard SATA drives (which means they are fast and reliable) in a RAID-1 configuration. RAID-1 means the drives are mirrored: the data is automatically duplicated to two internal drives. Just about any NAS system will work, but make sure it includes RAID (most don’t) and isn’t simply a fancier external hard drive.

Being a network attached device also gives you the benefit of not having to rely on too many additional moving parts. For a long time I used spare computer systems as storage units, but what I quickly found out is that the individual parts in them were multiple unnecessary points of failure. Motherboards, RAM, even graphics cards can cause significant headaches when all you care about is the hard drives and the data they contain.

In fact, since Google’s high-powered cloud computing infrastructure runs on common hardware like the kind you and I use, it is significant to note the hardware failure rate they discovered from constantly pushing common hardware to its limits over long periods of time. This simply means that when you are planning a computational strategy (in this case, storage of sensitive data) you need to plan for failure instead of hoping for the best.

In contrast, a system that consists of only a minimal operating system and two drives should give you enough time to replace a drive when it fails, while the other drive keeps your data available. The NAS unit itself is also cheap enough that you could easily keep a spare mothballed for the rainy day when you’ll need it.
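To put a rough number on "enough time to replace a drive", here is a back-of-the-envelope sketch. It assumes drive failures are independent and that each drive has a constant annual failure rate. The 5% rate and the 7-day replacement window are made-up illustrative values, not measurements.

```python
def chance_of_second_failure(annual_failure_rate, window_days):
    """Probability the surviving drive also fails during the replacement window.

    Assumes independent failures and a constant failure rate over the year.
    """
    survive_one_day = (1.0 - annual_failure_rate) ** (1.0 / 365.0)
    return 1.0 - survive_one_day ** window_days


if __name__ == "__main__":
    # Hypothetical numbers: 5% annual failure rate, one week to swap the drive.
    risk = chance_of_second_failure(0.05, 7)
    print("Chance of losing the mirror during the swap: %.4f%%" % (risk * 100))
```

Drives bought together may not fail independently, so treat the result as an optimistic estimate rather than a guarantee.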

It’s also a good idea to keep a copy of your data in an offsite location. The principle is that if one place storing your data were flattened, you should be able to recover from the offsite location. The best way to achieve this is through a continuous online storage solution. I personally don’t use an online storage solution, but some things to look for in one would be the backing company’s reliability, whether they back your data up to a cloud or a single server, and how well put together their interface software is. Try the free services first. Chances are that if they are really as good as they claim to be (and they all claim to be good) you’ll quickly find out during the trial period (which is often a certain amount of allowed data storage). Of the free ones, I have used box.net before (for random file storage, not for regular automated backups) and can say it is pretty good.

I’ve also adopted the strategy of using as many online solutions as I can (such as Gmail for email). This lets me leverage reliable 3rd party clouds, which provide inherent protection from data loss, with the added benefit that I can access my data from a wide variety of computers without having to sync it between every system I want to use.

Finally, focus on backing up only the files you know you will need. There is no reason to back up the entire computer in terms of applications, operating system, etc. Backing up unnecessary data will only serve to max out your storage capacity and quickly overtax your backup solution. Instead, plan on replacing your whole PC (and the operating system it uses, but keep a copy of the applications you use) in the event of catastrophic data loss. If you stick with reasonably reliable hardware, your systems should last much longer than Google’s (3-4 years). Average costs of new and decent systems are low enough now that treating a computer as a disposable device (like a cell phone) isn’t all that uncommon, or that bad of an idea.
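The "only back up what you need" idea can be sketched in a few lines of Python. This is a minimal illustration, not a finished backup tool: the extension list is a hypothetical example, and the target is assumed to be a folder on the NAS (or any other mounted drive) that lives outside the source folder. Each copy is verified with a checksum so that a bad copy is noticed right away.

```python
import hashlib
import os
import shutil

# Hypothetical list of file types worth keeping; adjust to your own data.
KEEP_EXTENSIONS = {".doc", ".xls", ".jpg", ".txt"}


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def backup(source, target, extensions=KEEP_EXTENSIONS):
    """Copy matching files from source to target, returning the paths copied."""
    copied = []
    for folder, _subdirs, names in os.walk(source):
        for name in names:
            if os.path.splitext(name)[1].lower() not in extensions:
                continue  # skip applications, system files, and so on
            src = os.path.join(folder, name)
            dst = os.path.join(target, os.path.relpath(src, source))
            # Skip files whose backup copy already has identical contents.
            if os.path.isfile(dst) and sha256_of(src) == sha256_of(dst):
                continue
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)
            # Verify the copy so corruption is caught now, not during a crisis.
            if sha256_of(src) != sha256_of(dst):
                raise IOError("verification failed for %s" % dst)
            copied.append(dst)
    return copied
```

Calling backup("/home/me/documents", "/mnt/nas/backup") (both paths are placeholders) from a scheduled job would keep the process automated, and running it twice in a row copies nothing the second time.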

Topic survey

What topics are you most interested in knowing more about? Take a minute to fill out the following short survey and let us know!

Taming the blogosphere with Google Reader

What are blogs?

Many of you are wondering what the big deal is with blogs. Well here is a short video on blogs and why they are important/useful:

What’s so great about blogs?

Aside from being able to access specialized information put out on a regular basis, there is one other reason I enjoy reading blogs and consider them to be an essential element in our modern forms of communication.

Blogs help you connect with people.

You learn a lot about someone’s character, thoughts, and passions if you follow what they say on their blog. The trouble is that since blogs are generally authored by one person on individual websites, it can become time consuming and cumbersome to visit each blog you’re interested in to check for and read any new posts.

How can I keep up with blogs?

The easiest way I’ve found to bring a variety of different blogs together into one place is to use the RSS feed offered by most blogs.

Google Reader is a web-based RSS reader which requires a Google account and a little bit of setup, but once you get it going it’s pretty much automated and will allow you to check a number of blogs without having to spend time visiting each and every website to get updates.
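For the curious, the core of what a feed reader does can be sketched with Python's standard library. This is only an illustration of the idea, not how Google Reader is implemented. It assumes each feed is an RSS 2.0 document that has already been downloaded as text, and that every item carries a pubDate; the function names are made up for this example.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_feed(xml_text):
    """Return (published, blog, title, link) tuples from one RSS 2.0 document."""
    channel = ET.fromstring(xml_text).find("channel")
    blog = channel.findtext("title", default="")
    entries = []
    for item in channel.findall("item"):
        # RSS dates use the same format as email headers.
        published = parsedate_to_datetime(item.findtext("pubDate"))
        entries.append((published, blog, item.findtext("title"), item.findtext("link")))
    return entries


def merge_feeds(feeds):
    """Combine several feeds into one list, newest post first."""
    combined = []
    for xml_text in feeds:
        combined.extend(parse_feed(xml_text))
    return sorted(combined, key=lambda entry: entry[0], reverse=True)
```

The merged list is the "one place" a reader gives you: every post from every blog you follow, in date order.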

Here is a short video to help you get started with Google Reader:

New Year’s Resolutions

According to surveys, only 12% of New Year’s resolutions are actually kept. So I’m not going to try and beat the odds by offering another list of items here.

However, as someone who loves getting things done, I figured I would switch gears a bit and offer some productivity tools/methods I’ve found to be particularly helpful.

Inbox Zero

I went to lunch once with a well-known speaker, Mike Licona, who lamented that he had almost 2,000 unread and unprocessed emails in his inbox.

While I don’t get nearly that many emails, I have been using a simple email management system known as Inbox Zero that helps me quickly process, sort, and manage my digital communications. Since telling Mike about Inbox Zero, he has managed (after some initial effort) to keep the number of unread messages in his inbox close to zero (hey, it improves your chances of getting a response from him).

Here is a video of Inbox Zero’s creator, Merlin Mann, giving a Google Tech Talks presentation about it:

Getting things done

Getting Things Done is a pretty simple system aimed at helping you optimize your workflow so you get more things done.

I like this system because it works with any personality type and accounts for both short-term and long-range planning. It also has a gentle learning curve and very little overhead, and since it does not depend on any single set of utilities or tools it is very adaptable.

Here is an excellent presentation of Getting Things Done by its creator, David Allen:

Six sigma

A very popular system among large businesses is Six Sigma. Originally developed as a manufacturing process designed to eliminate manufacturing defects, it has since been adapted to a more general set of principles which can help you have a lot more consistency when it comes to the work you produce.

Six Sigma can get pretty complicated; job boards are filled with management positions requiring the various “levels” of Six Sigma experience. However, here is a simple introductory video by Kaj Ahlmann of Six Sigma Ranch and Winery. In this video Kaj uses his hobby of wine making as an example of Six Sigma principles:

Hope these methods help you become more productive in the new year!

Global warming, greatest myth of this generation

At the outset I must admit that I’ve long been skeptical about the relatively recent claims of global warming. Mostly because I’m old enough to still remember the chicken-little stories about global cooling and how we were all heading for the next ice age as depicted in this TV show circa 1978:

My skepticism regarding global warming, however, was rather mild until I came across Michael Crichton’s excellent (albeit rather preachy) work, “State of Fear”. Until reading Crichton’s career-crippling (if not ending) work, I had assumed (or had been led to assume) that the only people actively fighting global warming were religious zealots and conspiracy theorists. I had bought into the “inconvenient truth” that, as Al Gore (the leading proponent of global warming) puts it, “it’s a settled science”.

Regardless of other observable facts we’re constantly told that we are in imminent danger of catastrophic climate shift due to the “accepted fact” of the earth’s temperature rising. Facts such as: there are 5 times as many polar bears today as there were 50 years ago (despite what global warming advocates try to say to the contrary in order to continue using images of them to promote global warming) or that the oceans are actually cooling, not warming, or that glaciers such as those on Mt Kilimanjaro are not rapidly receding, or

No, all we’ve been given are temperature data collected around the world, and we’re told these numbers paint a grim picture for the future of the earth’s climate, and that we (humans) are to blame!

Unfortunately (for global warming advocates at least) this single point of failure has recently come under direct fire following the release (either via hack or leak) of a large number of emails [1] from the Climatic Research Unit (CRU). The CRU is largely responsible for fueling the global warming hysteria through data and charts, including “the hockey stick” chart which seemed to indicate a sharp rise in temperature from 1980 to 2000.

The emails contained regrets over the lack of warming data, mentions of cooking the data to show warming trends, and mentions of suppressing any and all opposition. These emails are quite damaging to the cause of global warming, forcing the head of the CRU, Phil Jones, to step down pending an investigation.

As damaging as the emails are, the source code leaked along with them looks to be a lot more damning, because it shows artificial (VERY ARTIFICIAL in the words of the programmer via comments) limits placed on the data used to generate graphs, along with blatant data cooking. Programmer and open source advocate Eric S. Raymond writes:

This, people, is blatant data-cooking, with no pretense otherwise. It flattens a period of warm temperatures in the 1930s - see those negative coefficients? Then, later on, it applies a positive multiplier so you get a nice dramatic hockey stick at the end of the century.

All you apologists weakly protesting that this is research business as usual and there are plausible explanations for everything in the emails? Sackcloth and ashes time for you. This isn’t just a smoking gun, it’s a siege cannon with the barrel still hot.

Incidentally, following the backlash generated by the leaked emails, we’ve learned that the original data used to generate these graphs has been erased. Not that we should be overly surprised: it seems that modifying and massaging global warming data has been going on for quite some time and is not limited to the CRU. It has also happened at NASA and New Zealand’s National Institute of Water and Atmospheric Research (NIWA).

No wonder John Coleman, the founder of The Weather Channel, calls global warming “the greatest scam in human history”.

At a minimum, the unraveling of the myth of global warming reveals a gross violation of the trust placed in the “unbiased” nature of the scientific community. It calls into question the value of the peer-review process when scientists at the top get to determine what gets peer-reviewed and accepted (which, in turn, allows them to suppress anything they don’t like). At worst, the leaked CRU data and subsequent unraveling of man-made global warming [2] are evidence that scientists are humans who have agendas just like everyone else. This incident tends to highlight the notion that “just the facts ma’am” is a bit spurious, as facts don’t interpret themselves.

With the explosion caused by climategate it seems inescapable to conclude, along with columnist Christopher Booker, that this is the worst scientific scandal of our generation.

  1. You can search through and view the emails here.
  2. Known formally as anthropogenic global warming.
Beginner’s guide to load testing

Recently I got tasked with load testing an internal system and producing statistics for the team to show how well it will scale once it is put into production.

After some intense research I decided to go with “The Grinder” which allows multiple tests to be run by multiple machines which can all funnel their collected statistics back up to a central “console”. Tests are written in Python which, in turn, gets fed through Jython and converted into native Java bytecode to be run by participating Grinder agent instances. Grinder works on a single, user-definable port, for both pushing scripts to listening agents as well as gathering statistics from tests.

Initially I decided to try and capture the results of each individual test in a MySQL database, but abandoned that idea when the tests ended up overloading the MySQL database server before the web app we were primarily testing. Logging results also proved to be an interesting feat, since it swamped the agents’ filesystems after less than an hour (we were running multiple processes and threads).

We eventually settled on simply capturing the combined statistical data at the root console level (the way Grinder is designed) and displaying it via a plugin in Hudson.
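The difference between recording every result and keeping only combined statistics can be shown with a small stand-alone sketch. This is plain Python using threads, not The Grinder's API; the Stats class and run_load function are hypothetical names invented for this example. Each worker updates a handful of running totals in memory, so nothing grows with the number of requests.

```python
import threading
import time


class Stats:
    """Thread-safe running totals, kept in memory instead of one log line per request."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0
        self.errors = 0
        self.total_seconds = 0.0
        self.slowest = 0.0

    def record(self, seconds, ok):
        with self._lock:
            self.count += 1
            self.errors += 0 if ok else 1
            self.total_seconds += seconds
            self.slowest = max(self.slowest, seconds)

    def mean(self):
        return self.total_seconds / self.count if self.count else 0.0


def run_load(action, threads=4, calls_per_thread=25):
    """Call action() from several threads and return the combined Stats."""
    stats = Stats()

    def worker():
        for _ in range(calls_per_thread):
            started = time.perf_counter()
            try:
                action()
                ok = True
            except Exception:
                ok = False  # count the failure, keep the test running
            stats.record(time.perf_counter() - started, ok)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return stats
```

Here action would be whatever performs one request against the system under test; the summary (count, errors, mean, slowest) is the kind of data a dashboard plugin can chart.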

Overall Grinder worked great for testing the load of our web app (which passed with flying colors). And since Grinder works natively in Java we are also planning on testing specific Java classes directly in the future as well as their overall performance through a web based front-end such as a servlet.

Getting internet access on the road

This weekend we took another trip back to Augusta, GA to visit family. On the way I decided to continue wrestling with my phone (Palm Treo 755p) and netbook (Acer Aspire One with the Ubuntu Netbook Remix distribution) in order to connect to the internet.

This isn’t something new or unique. I’ve managed to do it before with Windows quite easily but this was my first time trying this using Linux.

While it wasn’t as simple to set up and connect as it was under Windows, using this Bluetooth dial-up guide for Ubuntu and these scripts for Verizon I managed to connect and surf the web all the way from Atlanta to Augusta and back again.

One caveat I found, however, is with the rfcomm0 device. The tutorial walks you through creating the rfcomm0 device using configuration files. What I found is that the built-in Bluetooth support in Gnome (blueman) worked just fine to create this device.

How to grow your website or blog

I’ve recently embarked on a quest to figure out how to improve my website and blogs, and how to help our hosted customers improve theirs too. Along the way I’ve run across several very helpful resources, which I have posted below.

Keep in mind that these resources are not merely about improving your site’s search engine ranking or placement, but also about transforming your site’s content for its intended audience. Some of this material also covers emerging trends with social media being thrown into the mix (you do have a Twitter and Facebook account, right?).

Like all marketing, or anything involving people for that matter, this isn’t an exact science. There are certainly many areas you’ll need to figure out for yourself, as your audience and site purposes are probably not going to fit neatly into a cookie cutter mold (which is good since such sites are easily forgotten). However, I think you’ll find many of the principles and strategies outlined here to be helpful as you build and market your blog/website.

Know of any excellent resources I’ve missed? Let me know via the comments!

Quick and dirty image sorting script

Recently my wife and I decided to try and wrangle our images into some sort of logical order for easy accessibility. After some thought we decided on a simple system of image-directory/year/month for our images and since our old images were spread out across several folders in no particular order I decided to write a script to copy everything into the right folders.

Here is the Python script I wrote to sort all of our images by date (using each file’s modification time) into properly ordered folders. [1]

Usage:

./picturesorter.py /images/source /images/dest

#!/usr/bin/env python3
"""Copy images into destination/year/month folders based on modification time."""

import fnmatch
import os
import shutil
import sys
import tempfile
import time


def walk(root, recurse=False, pattern='*', return_folders=False):
    """Return paths under root whose names match a ';'-separated pattern list."""
    result = []

    try:
        names = os.listdir(root)
    except OSError:
        return result

    pat_list = (pattern or '*').split(';')

    for name in names:
        fullname = os.path.normpath(os.path.join(root, name))

        # Compare in lower case so '*.jpg' also matches 'IMG.JPG' or 'img.Jpg'.
        # any() stops at the first hit, so a file is never added twice.
        if any(fnmatch.fnmatchcase(name.lower(), pat.lower()) for pat in pat_list):
            if os.path.isfile(fullname) or (return_folders and os.path.isdir(fullname)):
                result.append(fullname)

        if recurse and os.path.isdir(fullname) and not os.path.islink(fullname):
            result.extend(walk(fullname, recurse, pattern, return_folders))

    return result


def get_time(path):
    """Return the file's modification time as a struct_time, or None on failure."""
    try:
        st = os.stat(path)
    except OSError:
        print("failed to get information about", path)
        return None
    return time.localtime(st.st_mtime)


def main(argv):
    if len(argv) != 3:
        print("Usage: " + argv[0] + " [source] [target]")
        return 1

    source, target = argv[1], argv[2]

    if not os.path.isdir(source):
        print("'%s' is not a valid source directory" % source)
        return 1

    if not os.path.isdir(target):
        print("'%s' is not a valid destination directory" % target)
        return 1

    log_fd, logfilename = tempfile.mkstemp(".log", "psort_")
    with os.fdopen(log_fd, 'w') as logfile:
        print("Scanning '%s' for images..." % source)
        files = walk(source, True, '*.jpg;*.gif;*.png;*.psd;*.tif')

        logfile.write("Found %d images in '%s'...\n" % (len(files), source))
        print("Copying %d images to '%s'" % (len(files), target))
        for path in files:
            file_time = get_time(path)
            if file_time is None:
                logfile.write("[FAIL] could not stat '%s'\n" % path)
                continue

            # Build destination/YYYY/MM, for example /images/dest/2009/07
            destination = os.path.join(target, "%d" % file_time.tm_year,
                                       time.strftime("%m", file_time))
            if not os.path.isdir(destination):
                os.makedirs(destination)
                logfile.write("Created directory '%s'\n" % destination)

            # shutil.copy2 replaces the original shell "cp -f" call: it is
            # portable, copes with odd filenames, and preserves timestamps.
            try:
                shutil.copy2(path, destination)
            except (OSError, shutil.Error) as err:
                logfile.write("[FAIL] '%s' => '%s' (%s)\n" % (path, destination, err))
            else:
                print(".", end="", flush=True)
                logfile.write("'%s' => '%s'\n" % (path, destination))

    print("\nFinished copying files, log file available at %s" % logfilename)
    return 0


if __name__ == '__main__':
    sys.exit(main(sys.argv))
  1. Directory walking code taken from here with slight modification to make patterns truly case insensitive.
Diskless computing vs distributed computing

A friend of mine recently asked me about cloud computing: what it is, and what it means for where we will see technology going in the coming years. His question demonstrated a common confusion between cloud computing and diskless computing.

Both of these are interesting areas of computer science, they do sometimes overlap, and they are both going to change computing in general in significant ways as time rolls on, but they are not the same.

Here are the differences, to help you tell them apart.

Diskless computing

Diskless computing is best demonstrated in the Linux Terminal Server Project (excellent project, I’ve used it to deploy over 150 diskless workstations in a company) and Microsoft’s pathetic rival, Windows Terminal Services. Sun has their own solution as well and there are countless 3rd party utilities, but the basic idea behind them all is that you have one big computer (or series of computers) that all these “diskless” computers connect to in order to retrieve an operating system, store files, etc. For large networks this network model is absolutely amazing.

Cloud Computing

Cloud computing, however, starts from the idea that you have a large problem that requires a lot of computing power to solve. Rather than buy bigger and bigger hardware, what we’ve found out (going back to Cray supercomputers) is that it is far better to split the problem into smaller chunks and push those through multiple processors all at once than to try to get a single processor to process everything. This is called distributed computing.
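As a toy illustration of that split-and-combine idea, here is a minimal Python sketch. The problem (a sum of squares) and all of the function names are made up for the example. It uses a thread pool on one machine to keep the code short; a real distributed system would hand the chunks to separate processes or separate computers, but the shape is the same.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_size):
    """Yield successive slices of items, each at most chunk_size long."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def sum_of_squares(chunk):
    """The per-chunk work: small enough for one worker to handle alone."""
    return sum(n * n for n in chunk)


def distributed_sum_of_squares(numbers, chunk_size=1000, workers=4):
    """Farm chunks out to a pool of workers, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum_of_squares, split_into_chunks(numbers, chunk_size))
        return sum(partials)
```

The important property is that no chunk depends on any other, so the workers never have to wait on each other until the final sum.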

You might have heard of one of the major platforms for this type of computing, Beowulf, from the popular internet meme “imagine a beowulf cluster of…” Another very popular distributed computing platform (popular because it is far easier to install, operate, and write code for than the Beowulf project) is Hadoop. Hadoop is a project, written in Java (which makes it a lot more portable), inspired by Google’s implementation of the MapReduce design paradigm.
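The MapReduce paradigm itself is easy to sketch on a single machine. The following is a plain Python word count, the customary first example; it does not use Hadoop's API, and the function names are chosen just for this illustration. In a real cluster the map and reduce calls would run on many machines, with the framework doing the grouping step in between.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each map call only looks at one document and each reduce call only looks at one key, both phases can be spread across as many machines as you have.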

Projects using Cloud Computing

Parallel processing is done today in a wide variety of settings including:

  • 3D rendering farms for companies such as Disney’s Pixar
  • indexing the web with Google, Yahoo, Microsoft, etc.
  • data mining of all sorts with companies like Wal-Mart, etc.

Join in!

There are some very popular projects using distributed computing technologies that regular people with CPU cycles to spare are encouraged to join in on like:

  • SETI@home where you can help process data that might help us identify extraterrestrial signals
  • Folding@home where you can help search for cures to various diseases
  • Genome@home where you can help map the human genome (again), this is tied closely to the folding@home project above
  • Shrek@home which was a pioneer project that a few of us got to participate in
  • others, including fightaids@home to help fight AIDS and lhc@home to process the massive amounts of data coming from CERN’s Large Hadron Collider

So while diskless computing and cloud computing can have some areas of overlap (I configured the LTSP network I mentioned earlier to assist with the genome@home project when the systems were idle) they aren’t necessarily tied together.
