Hadoop from a Python Perspective
I’m just coming off a project where we decided to use Hadoop for the first time. We’re a Python shop developing an analytics feature. We have about 150m reco...
Sqoop is a tool for bulk copying data between a relational database like MySQL and HDFS or another Hadoop-based data store like Hive or HBase. It can either ...
Some newer storage technologies allow you to connect to one of a set of servers right from their client library. For example, MongoDB lets you specify one ho...
Apache Hive is a high-level SQL-like interface to Hadoop. It lets you execute mostly unadulterated SQL, like this:
Though there is some decent documentation, I found setting up Hive with an HBase back-end to be somewhat fiddly. Hopefully this guide will help you get s...
Wrote a simple connection pool for HappyBase using socketpool.
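For a rough idea of what a connection pool does, here is a minimal sketch using only the standard library. It is not the socketpool-based code from the post, and the `ConnectionPool` class, its `factory` argument, and the `connection()` method are hypothetical names made up for illustration. The assumption is that `factory` is any zero-argument callable that opens a connection, for example a small function wrapping `happybase.Connection(host)`.

```python
import contextlib
import queue


class ConnectionPool:
    """Tiny fixed-size pool (illustrative only, not the socketpool API)."""

    def __init__(self, factory, size=4):
        # `factory` is a hypothetical zero-argument callable returning a connection.
        self._factory = factory
        self._idle = queue.LifoQueue(maxsize=size)
        # Fill the pool with placeholders so connections are opened lazily.
        for _ in range(size):
            self._idle.put(None)

    @contextlib.contextmanager
    def connection(self, timeout=None):
        # Blocks until a slot is free; raises queue.Empty if the timeout expires.
        conn = self._idle.get(timeout=timeout)
        try:
            if conn is None:
                conn = self._factory()
            yield conn
        except Exception:
            # Assume the connection may be broken: drop it so the slot
            # is reopened lazily on the next checkout.
            conn = None
            raise
        finally:
            # Always give the slot back, whether it holds a connection or not.
            self._idle.put(conn)
```

Callers borrow a connection with `with pool.connection() as conn:` and it goes back to the pool when the block exits. Dropping the connection on any exception is a deliberately blunt choice; a real pool would probably only discard it on transport errors.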
Schema design in NoSQL is very different from schema design in an RDBMS. Once you get something like HBase up and running, you may find yourself staring blank...
Ran into an interesting edge case with pickle this week. I had a producer task that was querying objects from a database, and pickling them plus a reference ...
One of the strengths of a dynamic language is that it allows you to more easily work introspection and lightweight metaprogramming into your everyday code...
Update: this app was decommissioned in 2022