Posted by abloz on June 1, 2012



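Java is the most convenient and most efficient way to work with HBase. But Java is not lightweight, and it is awkward to debug in arbitrary environments. Developers also differ in which languages they know well, and so in how productive they are. Through Thrift, HBase can also be accessed from Python, Ruby, C++, Perl, and other languages.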




Start the Thrift server. It listens on port 9090 by default; if that port conflicts with another service, use the -p option to change it.
 hbase thrift -p 19090 start
 [zhouhh@Hadoop48 hbase-0.94.0]$ hbase thrift -p 19090 start
 12/06/01 17:54:27 INFO util.VersionInfo: HBase 0.94.0
 12/06/01 17:54:27 INFO util.VersionInfo: Subversion -r 1332822
 12/06/01 17:54:27 INFO util.VersionInfo: Compiled by jenkins on Tue May 1 21:43:54 UTC 2012
 Exception in thread "main" java.lang.AssertionError: Exactly one option out of [-hsha, -nonblocking, -threadpool, -threadedselector] has to be specified
 at org.apache.hadoop.hbase.thrift.ThriftServerRunner$ImplType.setServerImpl(
 at org.apache.hadoop.hbase.thrift.ThriftServer.processOptions(
 at org.apache.hadoop.hbase.thrift.ThriftServer.doMain(
 at org.apache.hadoop.hbase.thrift.ThriftServer.main(

[zhouhh@Hadoop48 hbase-0.94.0]$ hbase thrift -p 19090 -nonblocking start
 12/06/01 17:25:38 INFO thrift.ThriftServerRunner: starting HBase TNonblockingServer server on 19090
 [zhouhh@Hadoop48 hbase-0.94.0]$ hbase thrift -p 19090 -hsha start
 12/06/01 17:55:27 DEBUG thrift.ThriftServerRunner: Using binary protocol
 12/06/01 17:55:27 DEBUG thrift.ThriftServerRunner: Using framed transport
 12/06/01 17:55:27 INFO thrift.ThriftServerRunner: starting HBase THsHaServer server on 19090
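
The AssertionError above means the launcher rejects a command line that names no server implementation, or more than one. As a minimal sketch of that rule, the helper below builds the argument list for the commands shown above and refuses anything other than exactly one of the four flags listed in the error message. `thrift_start_command` is a hypothetical name of mine, not part of HBase, and it only builds the command; it does not launch anything.

```python
# Server-implementation flags, copied from the AssertionError message.
IMPL_FLAGS = ("-hsha", "-nonblocking", "-threadpool", "-threadedselector")


def thrift_start_command(port, *impl_flags):
    """Build the argv for `hbase thrift -p PORT <impl> start`.

    Hypothetical helper: it mirrors the rule stated in the error
    message and nothing more.
    """
    unknown = [f for f in impl_flags if f not in IMPL_FLAGS]
    if unknown:
        raise ValueError("unknown server flag(s): %s" % ", ".join(unknown))
    if len(impl_flags) != 1:
        raise ValueError(
            "exactly one of %s has to be specified" % ", ".join(IMPL_FLAGS)
        )
    return ["hbase", "thrift", "-p", str(port), impl_flags[0], "start"]


print(" ".join(thrift_start_command(19090, "-threadpool")))
```

The returned list could be handed to `subprocess.call` on a machine where the `hbase` script is on the PATH.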



[zhouhh@Hadoop48 hbase-0.94.0]$ cd src/main/resources/org/apache/hadoop/hbase/thrift/
 [zhouhh@Hadoop48 thrift]$ ls
 [zhouhh@Hadoop48 thrift]$ thrift --gen py Hbase.thrift
 [zhouhh@Hadoop48 thrift]$ ls
 gen-py Hbase.thrift

[zhouhh@Hadoop48 thrift]$ sudo mv gen-py /usr/local/lib/python2.7/site-packages/
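
The generated `hbase` package now sits inside a directory named gen-py, and a subdirectory of site-packages is not searched for modules unless it is itself on the import path. One way to make `from hbase import Hbase` resolve is to add that directory at the top of the client script. This sketch assumes nothing else (a .pth file, PYTHONPATH) already does so. The path is the one from the mv command above.

```python
import sys

# Directory produced by `thrift --gen py`, after the mv above.
GEN_PY = "/usr/local/lib/python2.7/site-packages/gen-py"

# Put it on the module search path so the generated `hbase` package
# (with its Hbase and ttypes modules) can be imported.
if GEN_PY not in sys.path:
    sys.path.append(GEN_PY)

# From here on, `from hbase import Hbase` can find the generated code,
# provided the directory exists on this machine.
```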


[zhouhh@Hadoop48 test]$ hbase shell
 hbase(main):003:0> list
 1 row(s) in 0.5320 seconds


[zhouhh@Hadoop48 test]$ vi
#!/usr/bin/env python
import sys
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

from hbase import Hbase
# Types such as ColumnDescriptor are defined in hbase.ttypes
from hbase.ttypes import *

# Make socket
transport = TSocket.TSocket('localhost', 19090)
# Buffering is critical. Raw sockets are very slow
# TFramedTransport can be used instead; it is also an efficient transport
transport = TTransport.TBufferedTransport(transport)
# Wrap in a protocol
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)

# Open the connection before making any calls
transport.open()

print client.getTableNames()


[zhouhh@Hadoop48 test]$ python
 Traceback (most recent call last):
 File "", line 27, in <module>
 print client.getTableNames()
 File "/usr/local/lib/python2.7/site-packages/gen-py/hbase/", line 769, in getTableNames
 return self.recv_getTableNames()
 File "/usr/local/lib/python2.7/site-packages/gen-py/hbase/", line 779, in recv_getTableNames
 (fname, mtype, rseqid) = self._iprot.readMessageBegin()
 File "build/bdist.linux-x86_64/egg/thrift/protocol/", line 126, in readMessageBegin
 File "build/bdist.linux-x86_64/egg/thrift/protocol/", line 203, in readI32
 File "build/bdist.linux-x86_64/egg/thrift/transport/", line 58, in readAll
 File "build/bdist.linux-x86_64/egg/thrift/transport/", line 160, in read
 File "build/bdist.linux-x86_64/egg/thrift/transport/", line 94, in read
 socket.error: [Errno 104] Connection reset by peer

The server prints:
 12/06/01 17:55:40 ERROR server.THsHaServer: Read an invalid frame size of -2147418111. Are you using TFramedTransport on the client side?
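
The number in that message is not random. A framed transport expects every message to start with a 4-byte big-endian length, while an unframed TBinaryProtocol call starts with a 4-byte header that combines the protocol version (0x80010000) with the message type (1 for a call). The sketch below uses only the standard library and assumes the client wrote that strict-mode header. It shows that reading the header as a signed length gives exactly the value the server logged.

```python
import struct

# Constants as defined by Thrift's binary protocol (strict mode).
VERSION_1 = 0x80010000  # protocol version marker in the high 16 bits
CALL = 1                # message type for a request

# First four bytes an unframed TBinaryProtocol client sends for a call.
header = struct.pack("!I", VERSION_1 | CALL)

# A framed server reads the same four bytes as a signed 32-bit frame length.
(frame_size,) = struct.unpack("!i", header)

print(frame_size)  # -2147418111, the value in the server log
```

So the client and the server disagree only about the transport: either wrap the client socket in TFramedTransport, or start a server that accepts unframed connections.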
 [zhouhh@Hadoop48 hbase-0.94.0]$ hbase thrift -p 19090 -threadpool start
 12/06/01 18:02:17 DEBUG thrift.ThriftServerRunner: Using binary protocol
 12/06/01 18:02:17 INFO thrift.ThriftServerRunner: starting TBoundedThreadPoolServer on /; min worker threads=16, max worker threads=1000, max queued requests=1000

[zhouhh@Hadoop48 test]$ python
 ['t1']

The table list now prints correctly.
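
The fix above was to change the server. The other direction also works: keep a framed server and change the client. The sketch below is one way to pick the client transport from the flag the server was started with. `needs_framed_transport` and `open_client` are hypothetical names of mine. Only -hsha is shown using framed transport in the logs above; treating -nonblocking and -threadedselector the same way is an assumption based on how those Thrift server types work, so verify it against your HBase version.

```python
# Server flags assumed to accept only framed connections.
FRAMED_ONLY = frozenset(["-hsha", "-nonblocking", "-threadedselector"])


def needs_framed_transport(server_flag):
    """Return True if the client should use TFramedTransport."""
    return server_flag in FRAMED_ONLY


def open_client(host, port, server_flag):
    """Open a connection and return (client, transport).

    Imports are local so the flag mapping above can be used even
    where the thrift package is not installed.
    """
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from hbase import Hbase  # generated by `thrift --gen py Hbase.thrift`

    sock = TSocket.TSocket(host, port)
    if needs_framed_transport(server_flag):
        transport = TTransport.TFramedTransport(sock)
    else:
        transport = TTransport.TBufferedTransport(sock)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = Hbase.Client(protocol)
    transport.open()
    return client, transport
```

Against the -hsha server from the failed attempt, `client, transport = open_client('localhost', 19090, '-hsha')` followed by `client.getTableNames()` would be the framed counterpart of the script above. Call `transport.close()` when done.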


/**
 * Create a table with the specified column families. The name
 * field for each ColumnDescriptor must be set and must end in a
 * colon (:). All other fields are optional and will get default
 * values if not explicitly specified.
 *
 * @throws IllegalArgument if an input parameter is invalid
 * @throws AlreadyExists if the table name already exists
 */
void createTable(
  /** name of table to create */
  1:Text tableName,

  /** list of column family descriptors */
  2:list<ColumnDescriptor> columnFamilies
) throws (1:IOError io, 2:IllegalArgument ia, 3:AlreadyExists exist)

/**
 * An HColumnDescriptor contains information about a column family
 * such as the number of versions, compression settings, etc. It is
 * used as input when creating a table or adding a column.
 */
struct ColumnDescriptor {
  1:Text name,
  2:i32 maxVersions = 3,
  3:string compression = "NONE",
  4:bool inMemory = 0,
  5:string bloomFilterType = "NONE",
  6:i32 bloomFilterVectorSize = 0,
  7:i32 bloomFilterNbHashes = 0,
  8:bool blockCacheEnabled = 0,
  9:i32 timeToLive = -1
}
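
The createTable comment requires every column family name to end in a colon. A small guard like the following can normalise names before they are passed to ColumnDescriptor, so a forgotten colon is fixed on the client side. `family_name` is a hypothetical helper of mine, not part of the generated API.

```python
def family_name(name):
    """Return `name` with a trailing colon added if it is missing."""
    if not name or name == ":":
        raise ValueError("column family name must not be empty")
    return name if name.endswith(":") else name + ":"


# The four families this post uses for its tusers table.
families = [family_name(n) for n in ("username", "pass:", "age", "info")]
print(families)  # ['username:', 'pass:', 'age:', 'info:']
```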


try:
    colusername = ColumnDescriptor(name='username:', maxVersions=1)
    colpass = ColumnDescriptor(name='pass:', maxVersions=1)
    colage = ColumnDescriptor(name='age:', maxVersions=1)
    colinfo = ColumnDescriptor(name='info:', maxVersions=1)

    client.createTable('tusers', [colusername, colpass, colage, colinfo])

    print client.getTableNames()

except AlreadyExists as tx:
    print "Thrift exception"
    print '%s' % (tx.message)

 [zhouhh@Hadoop48 test]$ python
 ['t1', 'tusers']
 [zhouhh@Hadoop48 test]$ python
 ['t1', 'tusers']
 Thrift exception
 table name already in use
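
Catching AlreadyExists works. If you would rather not rely on the exception for the ordinary rerun case, an alternative is to check getTableNames() first and call createTable only when the table is missing. The sketch below does that with a hypothetical `ensure_table` helper. `FakeClient` is a stand-in of mine so the sketch runs without a server; with a real connection you would pass the Hbase.Client object and a list of ColumnDescriptor objects instead.

```python
def ensure_table(client, table_name, column_families):
    """Create the table only if it is absent; return True if created."""
    if table_name in client.getTableNames():
        return False
    client.createTable(table_name, column_families)
    return True


class FakeClient(object):
    """In-memory stand-in exposing the two calls used above."""

    def __init__(self, tables):
        self.tables = list(tables)

    def getTableNames(self):
        return list(self.tables)

    def createTable(self, table_name, column_families):
        self.tables.append(table_name)


client = FakeClient(["t1"])
families = ["username:", "pass:", "age:", "info:"]
first = ensure_table(client, "tusers", families)   # table missing: created
second = ensure_table(client, "tusers", families)  # table present: skipped
print(client.getTableNames())
```

The check and the create are two separate calls, so another client could still create the table in between. Keeping the AlreadyExists handler around covers that case.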