Though there is some decent documentation, I found setting up Hive with an HBase back-end to be somewhat fiddly. Hopefully this guide will help you get started more quickly. This article presumes that you already have HBase set up. If not, see my HBase quickstart.
Note: these directions are for development. They don’t use HDFS, for example. For a full guide on production deployment, see the excellent CDH4 directions.
Connect to HBase
Now you can fire up Hive with the hive command and create a table that’s backed by HBase. For this example, my HBase table is called test, and has a column family of integer values called values. Note that dropping or creating tables only affects Hive metadata; no actual changes are made in HBase.
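As a rough sketch of what such a table definition can look like, the statement below maps the whole values column family onto a Hive MAP column using Hive's HBase storage handler. The Hive table name hbase_test and the column name vals are my own placeholders, not names from the original setup, and the mapping assumes the cell values are stored as strings that parse as integers.

```sql
-- Hypothetical Hive-side names: hbase_test (table) and vals (column).
-- Dropping and re-creating only touches Hive metadata, because the
-- table is EXTERNAL; the HBase table "test" itself is left alone.
DROP TABLE IF EXISTS hbase_test;

CREATE EXTERNAL TABLE hbase_test (
  key  STRING,              -- the HBase rowkey
  vals MAP<STRING, INT>     -- every column in the "values" family
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  -- ":key" is the rowkey; "values:" (no qualifier) maps the whole family
  "hbase.columns.mapping" = ":key,values:"
)
TBLPROPERTIES (
  "hbase.table.name" = "test"
);

-- Quick sanity check that Hive can read rows from HBase
SELECT * FROM hbase_test LIMIT 5;
```

If the cells hold binary-encoded integers rather than strings, the mapping would likely need a storage-type suffix, so treat the column types here as an assumption to check against your own data.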
Simple Map Reduce Example
Given the above raw data in the table, here is an example GROUP/SUM map reduce where you sum up the various HBase columns in the values column family. This example creates a view to handle the blowing apart of the HBase rowkey. You can use an INSERT OVERWRITE statement at the end to write the results back into HBase.
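A minimal sketch of that pattern follows. It assumes a Hive table named hbase_test with columns key STRING and vals MAP<STRING, INT> mapped onto the HBase table test, and it assumes a rowkey of the form name_bucket. The table name, the rowkey layout, the view name, and the target table hbase_test_totals are all hypothetical, so adjust them to your own schema.

```sql
-- View that splits the (assumed) "name_bucket" rowkey into its parts
CREATE VIEW hbase_test_view AS
SELECT
  split(key, '_')[0] AS name,
  split(key, '_')[1] AS bucket,
  vals
FROM hbase_test;

-- Explode the column-family map into one row per HBase column,
-- then sum every column value per name
SELECT v.name, SUM(t.val) AS total
FROM hbase_test_view v
LATERAL VIEW explode(v.vals) t AS col, val
GROUP BY v.name;

-- Optionally write the totals back into HBase. hbase_test_totals is a
-- hypothetical HBase-backed Hive table (key STRING, total BIGINT) that
-- must already exist.
INSERT OVERWRITE TABLE hbase_test_totals
SELECT v.name, SUM(t.val)
FROM hbase_test_view v
LATERAL VIEW explode(v.vals) t AS col, val
GROUP BY v.name;
```

Keeping the rowkey parsing in a view means the queries never have to repeat the split logic, which helps if the rowkey format changes later.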
Thrift API
If you want to connect to Hive via Thrift, you can start the Thrift service with hive --service hiveserver. Hiver is a nice little Python API wrapper.