Thursday, January 2, 2014

Big data in 2013

In the fall of 2013 I implemented a big data pilot at Owens Corning to demonstrate the value of big data analytics and to better understand how to operate and manage the necessary infrastructure. I can't talk a lot about this work because of intellectual property issues.

What I can talk about is a mini big data platform that I put together at home to learn the technology before implementing it at work. I've got lots of tech sitting around in my junk box, so I thought, why not build a little Hadoop platform?

I used the guides that Rashesh Mori and ITToby published, along with the Hortonworks tutorials and the O'Reilly book Hadoop: The Definitive Guide by Tom White. (There is plenty of good, googleable info on Hadoop, so it's really not that hard...)

I had three Raspberry Pis running a version of Debian Linux and a couple of mini-ITX Atom motherboard PCs that I had upgraded to CentOS 6.5. After applying the patches, updates and upgrades, configuring static IPs and SSH, and setting up the appropriate security, I loaded Hadoop on each of the devices in "single node" mode. Since this "cluster" is all 32-bit, I couldn't use Ambari to load and configure Hadoop, so I needed to use the "manual" method. Ambari is much easier to use, so if you're implementing on 64-bit computers, I'd highly recommend it.

After downloading and untarring the code, I installed it and made sure that Hadoop was running correctly on each device before making four of them into data nodes and starting up Hadoop on the master node of the cluster.
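Hadoop's start scripts reach the worker nodes over SSH, so before starting the cluster it's worth confirming that every box answers on its SSH port. Here is a minimal Python sketch of that kind of pre-flight check. The node names and IP addresses are made up for illustration (they are not my actual network), and this only proves the port is open, not that passwordless login is set up.

```python
import socket

# Hypothetical node names and static IPs; substitute your own.
NODES = {
    "master": "192.168.1.10",
    "pi1": "192.168.1.11",
    "pi2": "192.168.1.12",
    "pi3": "192.168.1.13",
    "atom2": "192.168.1.14",
}


def port_open(host, port=22, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def check_nodes(nodes, port=22, timeout=2.0):
    """Map each node name to True/False depending on whether its port answers."""
    return {name: port_open(addr, port, timeout) for name, addr in nodes.items()}
```

Calling check_nodes(NODES) returns a dictionary of node name to True or False, so any box that shows False needs a look at its network or sshd setup before you go any further.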

There are lots of tutorials and source code available that I could use to test a cluster. I compiled the WordCount Java app and loaded the Hadoop cluster with a pile of books from Project Gutenberg to test it. About half an hour later, it spit out the word counts.
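If you haven't seen WordCount before, the idea is simple: the map step emits a (word, 1) pair for every word, the framework groups the pairs by word, and the reduce step adds up the ones. The sketch below is not the Hadoop Java app I ran. It's a plain Python illustration of the same three phases running on one machine, and the function names and the regular expression used to split words are my own assumptions.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of lowercase letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one line of text."""
    return [(word, 1) for word in WORD_RE.findall(line.lower())]


def shuffle_phase(pairs):
    """Shuffle: group the emitted counts by word, between map and reduce."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Reducer: sum all the counts collected for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())
```

On a real cluster the map and reduce steps run in parallel on the data nodes and the shuffle moves data across the network, which is where a lot of that half hour goes on hardware like mine.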

Pretty cool.

So, if you've got some old PCs collecting dust, load 'em up with Linux and Hadoop. This is a great way to learn how to set up a Hadoop cluster at a very low cost.

A great way to experiment on the cheap! If you can think it...