Recent Posts

Hadoop from a Python Perspective

I’m just coming off a project where we decided to use Hadoop for the first time. We’re a Python shop developing an analytics feature. We have about 150m reco...

Sqoop/HBase Quickstart on Linux

Sqoop is a tool for bulk copying data between a relational database like MySQL and HDFS or another Hadoop-based data store like Hive or HBase. It can either ...

Hive with HBase Quickstart

Though there is some decent documentation, I found setting up Hive with an HBase back-end to be somewhat fiddly. Hopefully this guide will help you get s...

Dynamic Attributes in Python

One of the strengths of a dynamic language is that it allows you to more easily work introspection and lightweight meta-programming into your everyday code...