Archive for category hosting

Getting useful index information from MongoDB

Here is a MongoDB shell script for presenting index information in a more concise way than getIndexes() provides. It also shows each index’s total size along with a breakdown of that size on each shard.

// Usage: pass the collection name in via --eval, e.g.
//   mongo <db> --eval="var collection='file';" <this script's filename>

// Collect each index's key pattern, keyed by index name
var ret = db[collection].getIndexes().map(function(i){
    return {"key":i.key, "name":i.name};
});

var o = {};
for(var r in ret) {
    o[ret[r].name] = ret[r].key;
}

// Attach each index's total size from the collection stats
var cstats = db[collection].stats();
for(var k in cstats.indexSizes) {
    o[k].totalsize = cstats.indexSizes[k];
}

// Break each index's size down per shard
var shardinfo = cstats.shards;
for(var s in shardinfo) {
    for(var k in shardinfo[s].indexSizes) {
        if(!o[k].shards) o[k].shards = {};
        o[k].shards[s] = shardinfo[s].indexSizes[k];
    }
}

printjson(o);

Produces the following output:

{
    "_id_" : {
        "_id" : 1,
        "totalsize" : 50501459568,
        "shards" : {
            "shard0000" : 18620766416,
            "shard0001" : 18117909712,
            "shard0002" : 13762783440
        }
    }
}

Simple init.d script template

Recently I found the need to create an init.d script, and since I had a hard time finding an example elsewhere[1], here’s the overly simple script I came up with to get the job done:

#!/bin/bash
# myapp daemon
# chkconfig: 345 20 80
# description: myapp daemon
# processname: myapp

DAEMON_PATH="/home/wes/Development/projects/myapp"

DAEMON=myapp
DAEMONOPTS="-my opts"

NAME=myapp
DESC="My daemon description"
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME

case "$1" in
start)
	printf "%-50s" "Starting $NAME..."
	cd $DAEMON_PATH
	PID=`$DAEMON $DAEMONOPTS > /dev/null 2>&1 & echo $!`
	#echo "Saving PID" $PID " to " $PIDFILE
        if [ -z "$PID" ]; then
            printf "%s\n" "Fail"
        else
            echo $PID > $PIDFILE
            printf "%s\n" "Ok"
        fi
;;
status)
        printf "%-50s" "Checking $NAME..."
        if [ -f $PIDFILE ]; then
            PID=`cat $PIDFILE`
            if [ -z "`ps axf | grep ${PID} | grep -v grep`" ]; then
                printf "%s\n" "Process dead but pidfile exists"
            else
                echo "Running"
            fi
        else
            printf "%s\n" "Service not running"
        fi
;;
stop)
        printf "%-50s" "Stopping $NAME"
        cd $DAEMON_PATH
        if [ -f $PIDFILE ]; then
            # Only read the PID once we know the pidfile exists
            PID=`cat $PIDFILE`
            kill -HUP $PID
            printf "%s\n" "Ok"
            rm -f $PIDFILE
        else
            printf "%s\n" "pidfile not found"
        fi
;;

restart)
  	$0 stop
  	$0 start
;;

*)
        echo "Usage: $0 {status|start|stop|restart}"
        exit 1
esac

This script will work in /etc/init.d on Xubuntu 11.10 (and most Debian-based systems) and on CentOS 5.5, where you can also register and control it via chkconfig (e.g. chkconfig --add myapp, then chkconfig myapp on).

  1. That said, if you know of such an example I’d love to hear from you.

MongoDB script to check the status of background index builds

Here is a simple script I’ve found to be quite helpful for monitoring the status of background index builds across shards on a system:

var currentOps = db.currentOp();

if(!currentOps.inprog || currentOps.inprog.length < 1) {
    print("No operations in progress");
} else {
    // Print only the operations that are background index builds
    for(var o in currentOps.inprog) {
        var op = currentOps.inprog[o];
        if(op.msg && op.msg.match(/bg index build/)) {
            print(op.opid+' - '+op.msg);
        }
    }
}

Here's the output:

$ mongo mycluster:30000/mydb bgIndexBuildStatus.js 
MongoDB shell version: 1.8.1
connecting to: mycluster:30000/mydb
shard0000:343812263 - bg index build 122042652/165365928 73%
shard0001:355224633 - bg index build 111732254/165568168 67%

Simulating Markers with Tile Layers

Here is a great video I found on YouTube about displaying lots of markers (more than a couple hundred points) through tile layers. He makes some great points, but I still prefer the Raphaël overlay system I used in the church heatmap project.

Fun with heatmaps

Recently I’ve been playing with a few new technologies. Some are new to me while most are simply new. My base project to use these technologies is a heatmap visualization of churches in Georgia. While heatmaps in themselves aren’t exactly exciting, having the ability to map more than 10,000 data points at a time in real-time is.

So here are the various technologies I used in the creation of my app.

YQL

To get the data for this project I used a simple Python script and YQL to get a list of locations in each Zip code in Georgia with the term “Church” in them. This yielded approximately 10,000 results, including a few from South Carolina.
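
Roughly, each ZIP-code lookup boils down to a YQL query like the one below. This is only a sketch of the idea: the actual script was written in Python, and the local.search table and zip field shown here are illustrative assumptions rather than the original code.

// Illustrative only: build a YQL local.search query for one ZIP code.
// The real data-gathering script was Python; the table and field names here are assumptions.
var zip = '30303';
var yql = 'select * from local.search where query="church" and zip="' + zip + '"';
var url = 'http://query.yahooapis.com/v1/public/yql?format=json&q=' + encodeURIComponent(yql);
// Fetching this URL returns JSON results that can be inserted straight into MongoDB.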

MongoDB

I stored the data I got from YQL in MongoDB. Specifically, I used MongoLab’s hosted service because they have a generous free storage limit (256 MB) and it can be accessed from a Rackspace server without incurring an additional bandwidth charge.

MongoDB is a NoSQL database with a JavaScript shell and query syntax, used by sites like Foursquare. It has built-in support for geo queries and is built to scale.
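
For a rough idea of what those geo queries look like, a 2d index plus a bounding-box query is enough to pull every point that falls inside a single map tile. The collection and field names below are placeholders for illustration, not the app’s actual schema.

// Hypothetical collection and field names; a "2d" index enables geospatial queries.
db.churches.ensureIndex({ loc: "2d" });

// All points inside one tile's bounding box (south-west and north-east corners as [lon, lat]).
db.churches.find({ loc: { $within: { $box: [ [-85.0, 33.0], [-84.0, 34.0] ] } } });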

node.js

For the application layer I decided to try out node.js. Node is also JavaScript-based, built on V8, the JavaScript engine behind Google’s Chrome browser. Node is event-based, which means it has the potential to be lightning fast.
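
A minimal sketch of that event-driven style is below. It is not the app’s actual code; the port and URL format are made up for illustration.

// Minimal node.js sketch of an event-driven tile endpoint (names and port are illustrative).
var http = require('http');

http.createServer(function (req, res) {
    // e.g. GET /tile/8/67/102.png -> zoom 8, tile x 67, y 102
    var parts = req.url.replace('.png', '').split('/');
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('would render tile ' + parts.slice(2).join('/'));
}).listen(8124);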

Canvas

The biggest factor in how well a heatmap solution performs is the graphics package that’s used. After searching around I found a pretty decent PHP heatmap solution, but it used GD and was very slow. I also found a JavaScript solution that used the new HTML5 Canvas element, but it choked when given a significant number of data points to render all at once. So I decided to refactor some of the utility functions from the PHP solution and combine them with the Canvas-based approach.

The great thing about the resulting solution is that it can run on either the client side or the server side, and in the end the heatmap application I built uses both. If the number of data points in a tile is less than a preset threshold defined per browser[1], the raw data is sent back to the browser, which renders the tile client-side rather than consuming precious server resources.
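
Whether it runs in the browser or on the server, the core of the Canvas rendering is just stamping a radial-gradient “blob” for every point so that overlapping points accumulate into hot spots. Here is a stripped-down sketch of that idea, not the project’s actual renderer:

// Draw one heat "blob"; overlapping blobs add up into hotter areas.
// Simplified illustration only, not the app's actual rendering code.
function drawHeatPoint(ctx, x, y, radius) {
    var g = ctx.createRadialGradient(x, y, 0, x, y, radius);
    g.addColorStop(0, 'rgba(0, 0, 0, 0.35)');
    g.addColorStop(1, 'rgba(0, 0, 0, 0)');
    ctx.fillStyle = g;
    ctx.fillRect(x - radius, y - radius, radius * 2, radius * 2);
}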

Web Sockets

The usual way to serve tiles is to serve images that overlay the map. And while the heatmap solution I developed does serve static PNG files from cache, I decided to use the new HTML5 Web Sockets to make things a bit more interesting. What is great about web sockets is that they let me pass events between the server and client very easily. Socket.IO made it easy to forget where the server ended and the client began.
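
A sketch of the kind of event passing involved is below. The event names and setup are illustrative and version-dependent; they are not the app’s actual wiring.

// Server side (node.js): answer tile requests over a Socket.IO connection.
// Event names and setup are illustrative; the Socket.IO API varies between versions.
var io = require('socket.io').listen(8125);

io.sockets.on('connection', function (socket) {
    socket.on('requestTile', function (tile) {
        // Hand back either a cached PNG URL or the raw points for client-side rendering.
        socket.emit('tileData', { tile: tile, points: [] });
    });
});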

ZeroMQ

As applications scale to multiple threads, processes, and eventually across servers and networks, they need a way for each component to communicate with the others in an efficient manner. So to add a bit of future-proofing to my solution, I decided to use ZeroMQ to pass messages between the front-end web server component and the back-end tile generator component. This lets me tune the performance of the application in both directions, up or down[2].
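
Concretely, a PUSH/PULL socket pair is one simple way to wire the web front end to the tile generator. The sketch below uses the node zmq bindings; the module name, socket pattern, and addresses are assumptions for illustration, not the project’s actual configuration.

// Illustrative PUSH/PULL wiring with the node "zmq" bindings (assumed, not the app's real setup).
var zmq = require('zmq');

// Front-end side: queue up tile-rendering jobs.
var sender = zmq.socket('push');
sender.bindSync('tcp://127.0.0.1:5555');
sender.send(JSON.stringify({ z: 8, x: 67, y: 102 }));

// Tile-generator side: pull jobs off the queue, possibly from another process or machine.
var worker = zmq.socket('pull');
worker.connect('tcp://127.0.0.1:5555');
worker.on('message', function (msg) {
    var tile = JSON.parse(msg.toString());
    // ...render the tile and cache the resulting PNG...
});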

Raphaël

To add some extra pizzazz to the app I decided to add the ability to display each individual data point along with some additional detailed information. I found that Google’s native Marker system was a bit slow when it came to displaying over 2,000 markers at a time, so I decided to give the Raphaël graphics library a try. The results were impressive. Raphaël was not only able to draw thousands of data points on the map seamlessly, it did so with smooth animations. Look for gRaphaël to be employed in future renditions of this heatmap solution.
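
For a taste of how little code that takes, here is an illustrative snippet; the element id and sample points are placeholders, not the project’s real data.

// Illustrative Raphaël usage: each data point becomes a small circle that animates in.
// "overlay" and the sample points are placeholders.
var paper = Raphael(document.getElementById('overlay'), 256, 256);
var points = [ { x: 40, y: 60 }, { x: 128, y: 90 }, { x: 200, y: 180 } ];

points.forEach(function (p) {
    paper.circle(p.x, p.y, 0)
         .attr({ fill: '#c00', stroke: 'none', opacity: 0.7 })
         .animate({ r: 4 }, 300);   // grow from nothing for a smooth appearance
});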

Conclusion

Every now and then I run across a programming challenge that reminds me why I love doing what I do. These technologies and this project have done that for me. Being able to throw together a large, complex project like this in a relatively quick manner reminded me of Fred Brooks’s comment on why programming is fun.

  1. 1000 for WebKit-based browsers, 250 for Mozilla-based browsers, and 0 for IE because IE still sucks.
  2. Tuning performance down comes in handy when you are on a shared server with limited resources.

New year, new developments at Werx Limited

We had an excellent 2010 and I want to share with you some of the highlights:

  • We only had about 10 hours of downtime. This means your website was up roughly 99.9% of the time in 2010.
  • The utilization of our server has remained steady at under 10%. This means we’ve been able to speedily service each and every request to your website.
  • We have had no security breaches in 2010. There have been many attempts, as there are on any public server, but none were successful. In fact, we attained a security certification for one of our ecommerce partners early in 2010. Your website was safe and sound throughout 2010.

As great as 2010 was, we are looking to make 2011 even better. So we are working on two big changes in January which should set the stage for the rest of the year.

The first is with hosting

We’ve been on the same server since 2006, and while the server itself is sound, newer technologies, like the rising tide of cloud computing, have made upgrading a priority for meeting future strategic goals. What this means for you is that we will be moving to a cloud-based architecture. Soon, you’ll be able to enjoy all of the benefits of having a website “in the cloud” including

  • lower cost (we’ll get to that in a minute)
  • better uptime
  • scalability

For more information about the benefits of cloud computing, please consult this white paper by IBM.

We’ve already begun moving sites and I’ll send out another notice when we’ve completed the move.

The second is with billing

I must admit that our billing has been rather chaotic and frazzled. And getting a good plan in place is one of our biggest goals in 2011. Our plan is to bill once a year for all hosting and DNS management services (if we are providing that service).

Since these billing terms are new, we will extend the payment window to 3 months this year to give you plenty of time to accommodate the change. And since our move will be saving us money, we want to pass some of those savings on to you. From now on, rather than charging $15/mo for website hosting, or $180/yr, we will charge $90/yr*. That’s right, we’re now offering our rock-solid hosting for only half the price!

Thanks again for choosing us for your hosting needs. If you have any questions or comments please feel free to send us an email at [email protected].

*Includes 1 domain renewal. Additional domains will be billed at $10 per year per domain.

A Cascading Style Sheets (CSS) Beginner’s Tutorial

Learning CSS can be a bit daunting if you’ve never encountered it before. Likewise, if you’ve only had limited exposure to CSS, the various ways browsers implement various aspects of the CSS standard (or make up their own) can leave you with the impression that it is all a giant hairy mess. So to help out, I’ve compiled a list of resources to make the learning curve not quite as steep for beginners and to hopefully help tame the CSS wilderness for novices.

First, here is a pretty good and in-depth video on HTML and CSS basics:

Next we have several handy beginner’s tutorial sites:

Finally, here are a few CSS frameworks designed to help make CSS a lot easier by providing a standard system that takes care of much of the common ugly quirks found in CSS:

As a bonus, here are a few inspirational sites to help give you an idea of what CSS can do if applied properly:

What’s the best way to make sure my data is safe?

Friends and family often ask me what the best storage solution is for ensuring that the data they consider critical is not lost or corrupted.

Whatever storage solution you decide to use, it needs to be unobtrusive and largely automated; if it isn’t, you’ll find out at the worst possible time (usually in a crisis) that actually recovering your data is nearly impossible, and often incomplete.

The most unobtrusive solution I’ve found so far is a Network Attached Storage (NAS) device. The one I use and highly recommend is the D-Link DNS-321, which accepts standard SATA drives (which means they are fast and reliable) in a RAID-1 configuration. RAID-1 means the drives are mirrored: the data is automatically duplicated across two internal drives. Just about any NAS system will work, but make sure it includes RAID (most don’t) and isn’t simply a fancier external hard drive.

A network-attached device also gives you the benefit of not having to rely on too many additional moving parts. For a long time I used spare computer systems as storage units, but I quickly found out that their individual parts posed multiple unnecessary points of failure. Motherboards, RAM, even graphics cards can cause significant headaches when all you care about is the hard drives and the data they contain.

In fact, since Google’s high-powered cloud computing infrastructure runs on common hardware like the kind you and I use, it is significant to note the hardware failure rates they discovered by constantly pushing that hardware to its limits over long periods of time. This simply means that when you are planning a computational strategy (in this case, storage of sensitive data) you need to plan for failure instead of hoping for the best.

In contrast, having a system that only consists of a minimal operating system and two drives should be able to give you enough time to replace one drive if/when the other one fails and the NAS unit itself is cheap enough that you could easily have a spare mothballed for the rainy day when you’ll need it.

It’s also a good idea to keep a copy of your data in an offsite location. The principle is that if the one place storing your data were flattened, you should still be able to recover from the offsite copy. The best way to achieve this is through a continuous online storage solution. I personally don’t use an online storage solution, but some things to look for are the backing company’s reliability, whether they back your data up to a cloud or a single server, and how well put together their interface software is. Try the free services first; chances are that if they are really as good as they claim to be (and they all claim to be good), you’ll quickly find out during the trial period (which is often a certain amount of allowed data storage). I have used box.net before (for random file storage, not for regular automated backups) and can say it is pretty good.

I’ve also adopted the strategy of using as many online solutions (such as Gmail for email) which allow me to leverage reliable 3rd party clouds which provide inherent protection from data loss and provide the added benefit of allowing me to access my data from a wide variety of computers without having to sync data between every system I want to use.

Finally, focus on backing up only the files you know you will need. There is no reason to back up the entire computer, applications, operating system, and all. Backing up unnecessary data will only max out your storage capacity and quickly overtax your backup solution. Instead, plan on replacing your whole PC (and the operating system it uses, but keep a copy of the applications you use) in the event of catastrophic data loss. If you stick with reasonably reliable hardware, its lifespan should be at least in line with the 3-4 years Google observed. Average costs of new and decent systems are low enough now that treating a computer as a disposable device (like a cell phone) isn’t all too uncommon or that bad of an idea.

What do I do if my account’s been hacked?

A friend of mine recently asked me via Facebook what he should do if someone he didn’t know and wasn’t friends with on Facebook was able to access private information in his and his wife’s Facebook and email (and presumably other) accounts. Since this is a fairly common concern and question, I figured I’d post my response below. Enjoy!

Most likely they have your password (which they might have gotten from a virus, trojan, back-door worm, or something else).

While anti-virus is great (at this point I feel obliged to mention my employer, McAfee), one area that is constantly overlooked is malicious Facebook apps. I just got through telling my wife not to install apps on Facebook unless she absolutely had to (meaning something she will use and use constantly). I used to be bad about installing every poll and quiz application on Facebook I came across, until I went back through my installed apps one day and noticed that many of them weren’t even named the same thing they were named when I installed them.

So if you are worried that someone has hacked your online accounts the best thing to do is to immediately change all of your passwords. Make sure you use a strong password too (that goes for your wife as well as you).

Also, I highly recommend going through your Facebook applications and uninstalling anything you don’t use as they could be used to harvest your information. Not that you should remove them all (I love Mafia Wars) but if you were to read what a developer has access to you’d certainly think long and hard about each application you install ;-)

Finally (for the super-paranoid), if you are using a wireless router you should certainly be using some form of wireless encryption (hopefully not WEP, because it is vulnerable to attacks). Otherwise all of your information is being transmitted in the clear and can be easily captured with minimal effort.

It’s possible that this person might be getting your personal information another way (via ESP perhaps? :-P) but I think the most likely culprit is your computer/network security.

There’s more that you can do to harden your systems against attack, but these are the most often used avenues of attack. If your adversary is a hacker let me know and I’ll continue listing things you can do to make your systems secure.

Good luck!

Next, we’ll look into some security products and practices that can help you secure your systems.

Search engine optimization, what really matters

If you want to market your site, you need to know what search engines look for. Google is still the reigning champion of search engines, and its PageRank algorithm is what drives the search results that get displayed. Here is a great visualization from SEOmoz that will help you understand how to better market your site to get more traffic:

As you can see, it’s not just the content on your pages that is important; the authority and link popularity (incoming links from other sites) make up almost 65% of the overall PageRank score of your site. In fact, the content of your site only accounts for around 15% of your site’s overall score.

This is where social media sites such as Facebook, Twitter, LinkedIn, Digg etc. come in handy. The more people that retweet and generally reshare your site, the more popular it gets.

For more information on what matters in search engine optimization, take a look at this post on Copyblogger.
