php通过thrift访问HBase一

abloz 2012-10-18
abloz 2012-10-18

原文地址:http://abloz.com/2012/10/18/php-thrift-access-hbase-a.html 作者:周海汉 日期:2012.10.18

HBase原生支持Java,因此可以通过用Java完成Jetty Servlet,通过HBase stargate提供的REST API提供数据访问。PHP可以用CURL来访问。如 http:/localhost:8080/tablename/rowkey获取数据。 Thrift虽然绕了一下,不过支持多种语言。此前我写列一遍文章《python通过thrift来操作hbase》,本篇主要讲php 通过thrift来操作HBase。 thrift安装可以参考我此前写的《thrift 安装测试示例

1.thrift所需基本环境 最新版thrift 0.8,所需环境如下:

POSIX-兼容 *NIX 系统 Cygwin 或 MinGW 用于 Windows g++ 4.2 boost 1.40.0 Runtime libraries for lex and yacc might be needed for the compiler.

从 SVN 编译所需 GNU build tools: autoconf 2.65 automake 1.9 libtool 1.5.24 pkg-config autoconf macros (pkg.m4) lex and yacc (developed primarily with flex and bison) libssl-dev

各语言编译所需组件 C++ Boost 1.40.0 libevent (optional, to build the nonblocking server) zlib (optional)

Java Java 1.5 Apache Ant

C#: Mono 1.2.4 (and pkg-config to detect it) or Visual Studio 2005+

Python 2.6 (including header files for extension modules)

PHP 5.0 (optionally including header files for extension modules)

Ruby 1.8 bundler gem

Erlang R12 (R11 works but not recommended)

Perl 5 Bit::Vector Class::Accessor

centos安装所需组件: sudo yum install automake libtool flex bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel

2.编译安装thrift [zhouhh@Hadoop48 thrift-0.8.0]$ ./configure –without-erlang –without-ruby

… thrift 0.8.0

Building code generators ….. :

Building C++ Library ……… : yes Building C (GLib) Library …. : no Building Java Library …….. : yes Building C# Library ………. : no Building Python Library …… : yes Building Ruby Library …….. : no Building Haskell Library ….. : no Building Perl Library …….. : no Building PHP Library ……… : yes Building Erlang Library …… : no Building Go Library ………. : no

Building TZlibTransport …… : yes Building TNonblockingServer .. : no

Using javac ……………… : javac Using java ………………. : java Using ant ……………….. : /etc/ant/bin/ant

Using Python …………….. : /usr/local/bin/python

Using php-config …………. : /usr/bin/php-config

[zhouhh@Hadoop48 thrift-0.8.0]$ make [zhouhh@Hadoop48 thrift-0.8.0]$ sudo make install install: [copy] Copying 12 files to /usr/local/lib

3.php网站如何支持php Thrift 需要 PHP 5 我apache网站放在/var/www/html下。 #1) 复制 thrift/lib/php/src 到 /var/www/html #2) 设置 $GLOBALS[‘THRIFT_ROOT’] Thrift 安装目录 #3) include_once $GLOBALS[‘THRIFT_ROOT’].’/Thrift.php’; #3 必须在其他 Thrift 包含文件之前. 对MyPackage.thrift,生成的文件要安装到下面目录: $GLOBALS[‘THRIFT_ROOT’].’/packages/MyPackage/’

php环境: PHP_INT_SIZE 用于决定是32位还是64位系统,用于TBinaryProtocol 正确 pack() 和 unpack()

apc_fetch(), apc_store() TSocketPool用了apc cache。

[root@Hadoop48 html]# pwd /var/www/html [root@Hadoop48 html]# ls favicon.ico get.php hbase_code.php hbase.php hive.php index.html index.php insert.php KLogger.php list.php log mylog scan.php src test.sh

4.配置HBase,支持Thrift

将Hbase.thrift编译成php文件 thrift –gen php [hbase-root]/src/java/org/apache/hadoop/hbase/thrift/Hbase.thrift [zhouhh@Hadoop48 thrift]$ pwd /home/zhouhh/hbase-0.94.0/src/main/resources/org/apache/hadoop/hbase/thrift [zhouhh@Hadoop48 thrift]$ ls Hbase.thrift [zhouhh@Hadoop48 thrift]$ thrift –gen php Hbase.thrift [zhouhh@Hadoop48 thrift]$ ls gen-php Hbase.thrift [zhouhh@Hadoop48 thrift]$ find . . ./Hbase.thrift ./gen-php ./gen-php/Hbase ./gen-php/Hbase/Hbase.php ./gen-php/Hbase/Hbase_types.php

启动hbase thrift server [hbase-root]/bin/hbase thrift start … Read an invalid frame size of -2147418111. Are you using TFramedTransport on the client side?

要想提高传输效率,必须使用TFramedTransport或TBufferedTransport.但对-hsha,-nonblocking两种服务器模式,必须使用TFramedTransport。将其改为线程方式试试。 [zhouhh@Hadoop48 ~]$ hbase thrift start -threadpool -p 9090 -p 参数可以修改thrift server的监听端口,防止端口冲突。

5.准备HBase数据 hbase shell hbase(main):001:0> list TABLE 0 row(s) in 0.5140 seconds

hbase(main):002:0> create ‘t1’,’info’ 0 row(s) in 2.1100 seconds hbase(main):005:0> put ‘t1’,’no1’,’info:name’,’zhouhh’ 0 row(s) in 0.0720 seconds

hbase(main):007:0> put ‘t1’,’no2’,’info:name’,’zhouhh2’ 0 row(s) in 0.0050 seconds

hbase(main):008:0> scan ‘t1’ ROW COLUMN+CELL no1 column=info:name, timestamp=1350370179806, value=zhouhh no2 column=info:name, timestamp=1350370213338, value=zhouhh2 2 row(s) in 0.0370 seconds

6.配置网站支持hbase thrift 将hbase thrift生成的文件复制到网站下。 [root@Hadoop48 packages]# pwd /var/www/html/src/packages [root@Hadoop48 packages]# cp -r $HBASE_HOME/src/main/resources/org/apache/hadoop/hbase/thrift/gen-php/* . [root@Hadoop48 packages]# find . . ./Hbase ./Hbase/Hbase.php ./Hbase/Hbase_types.php

复制hbase的thrift demo到网站 [zhouhh@Hadoop48 hbase-0.94.0]$ sudo cp ./src/examples/thrift/DemoClient.php /var/www/html/hbase.php

修改hbase.php的下列两行 $GLOBALS[‘THRIFT_ROOT’] = ‘/var/www/html/src’; require_once( $GLOBALS[‘THRIFT_ROOT’].’/packages/Hbase/Hbase.php’ );

[root@Hadoop48 html]# vi hbase.php

row}, cols: n” ); $values = $rowresult->columns; asort( $values ); foreach ( $values as $k=>$v ) { echo( “ {$k} => {$v->value}n” ); } }

$socket = new TSocket( ‘localhost’, 9090 ); $socket->setSendTimeout( 10000 ); // Ten seconds (too long for production, but this is just a demo ;) $socket->setRecvTimeout( 20000 ); // Twenty seconds $transport = new TBufferedTransport( $socket ); $protocol = new TBinaryProtocol( $transport ); $client = new HbaseClient( $protocol );

$transport->open();

$t = ‘t1’;

?>

getTableNames();
sort( $tables );
foreach ( $tables as $name ) {
  echo( "  found: {$name}n" );
  if ( $name == $t ) {
    if ($client->isTableEnabled( $name )) {
      echo( "    disabling table: {$name}n");
      $client->disableTable( $name );
    }
    echo( "    deleting table: {$name}n" );
    $client->deleteTable( $name );
  }
}

#
# Create the demo table with two column families, entry: and unused:
#
$columns = array(
  new ColumnDescriptor( array(
    'name' => 'entry:',
    'maxVersions' => 10
  ) ),
  new ColumnDescriptor( array(
    'name' => 'unused:'
  ) )
);

echo( "creating table: {$t}n" );
try {
  $client->createTable( $t, $columns );
} catch ( AlreadyExists $ae ) {
  echo( "WARN: {$ae->message}n" );
}

echo( "column families in {$t}:n" );
$descriptors = $client->getColumnDescriptors( $t );
asort( $descriptors );
foreach ( $descriptors as $col ) {
  echo( "  column: {$col->name}, maxVer: {$col->maxVersions}n" );
}

#
# Test UTF-8 handling
#
$invalid = "foo-xfcxa1xa1xa1xa1xa1";
$valid = "foo-xE7x94x9FxE3x83x93xE3x83xBCxE3x83xAB";

# non-utf8 is fine for data
$mutations = array(
  new Mutation( array(
    'column' => 'entry:foo',
    'value' => $invalid
  ) ),
);
$client->mutateRow( $t, "foo", $mutations );

# try empty strings
$mutations = array(
  new Mutation( array(
    'column' => 'entry:',
    'value' => ""
  ) ),
);
$client->mutateRow( $t, "", $mutations );

# this row name is valid utf8
$mutations = array(
  new Mutation( array(
    'column' => 'entry:foo',
    'value' => $valid
  ) ),
);
$client->mutateRow( $t, $valid, $mutations );

# non-utf8 is not allowed in row names
try {
  $mutations = array(
    new Mutation( array(
      'column' => 'entry:foo',
      'value' => $invalid
    ) ),
  );
  $client->mutateRow( $t, $invalid, $mutations );
  throw new Exception( "shouldn't get here!" );
} catch ( IOError $e ) {
  echo( "expected error: {$e->message}n" );
}

# Run a scanner on the rows we just created
echo( "Starting scanner...n" );
$scanner = $client->scannerOpen( $t, "", array( "entry:" ) );
try {
  while (true) printRow( $client->scannerGet( $scanner ) );
} catch ( NotFound $nf ) {
  $client->scannerClose( $scanner );
  echo( "Scanner finishedn" );
}

#
# Run some operations on a bunch of rows.
#
for ($e=100; $e>=0; $e--) {
  # format row keys as "00000" to "00100"
  $row = str_pad( $e, 5, '0', STR_PAD_LEFT );

  $mutations = array(
    new Mutation( array(
      'column' => 'unused:',
      'value' => "DELETE_ME"
    ) ),
  );
  $client->mutateRow( $t, $row, $mutations);
  printRow( $client->getRow( $t, $row ));
  $client->deleteAllRow( $t, $row );

  $mutations = array(
    new Mutation( array(
      'column' => 'entry:num',
      'value' => "0"
    ) ),
    new Mutation( array(
      'column' => 'entry:foo',
      'value' => "FOO"
    ) ),
  );
  $client->mutateRow( $t, $row, $mutations );
  printRow( $client->getRow( $t, $row ));

  $mutations = array(
    new Mutation( array(
      'column' => 'entry:foo',
      'isDelete' => 1
    ) ),
    new Mutation( array(
      'column' => 'entry:num',
      'value' => '-1'
    ) ),
  );
  $client->mutateRow( $t, $row, $mutations );
  printRow( $client->getRow( $t, $row ) );

  $mutations = array(
    new Mutation( array(
      'column' => "entry:num",
      'value' => $e
    ) ),
    new Mutation( array(
      'column' => "entry:sqr",
      'value' => $e * $e
    ) ),
  );
  $client->mutateRow( $t, $row, $mutations );
  printRow( $client->getRow( $t, $row ));

  $mutations = array(
    new Mutation( array(
      'column' => 'entry:num',
      'value' => '-999'
    ) ),
    new Mutation( array(
      'column' => 'entry:sqr',
      'isDelete' => 1
    ) ),
  );
  $client->mutateRowTs( $t, $row, $mutations, 1 ); # shouldn't override latest
  printRow( $client->getRow( $t, $row ) );

  $versions = $client->getVer( $t, $row, "entry:num", 10 );
  echo( "row: {$row}, values: n" );
  foreach ( $versions as $v ) echo( "  {$v->value};n" );

  try {
    $client->get( $t, $row, "entry:foo");
    throw new Exception ( "shouldn't get here! " );
  } catch ( NotFound $nf ) {
    # blank
  }

}

$columns = array();
foreach ( $client->getColumnDescriptors($t) as $col=>$desc ) {
  echo("column with name: {$desc->name}n");
  $columns[] = $desc->name.":";
}

echo( "Starting scanner...n" );
$scanner = $client->scannerOpenWithStop( $t, "00020", "00040", $columns );
try {
  while (true) printRow( $client->scannerGet( $scanner ) );
} catch ( NotFound $nf ) {
  $client->scannerClose( $scanner );
  echo( "Scanner finishedn" );
}

$transport->close();

?>

7.检验 通过浏览器访问 http://hadoop48/hbase.php 排除一些错误后,检查是否有数据插入: [zhouhh@Hadoop48 ~]$ hbase shell HBase Shell; enter ‘help’ for list of supported commands. Type “exit” to leave the HBase Shell Version 0.94.0, r1332822, Tue May 1 21:43:54 UTC 2012

hbase(main):001:0> scan ‘t1’ ROW COLUMN+CELL column=info:, timestamp=1350387336446, value= foo column=info:foo, timestamp=1350387336443, value=foo-xFCxA1xA1xA1xA1xA1 foo-xE7x94x9FxE3x83x93xE3x83xBCx column=info:foo, timestamp=1350387336448, value=foo-xE7x94x9FxE3x83x93xE3x83xBCxE3x83xAB E3x83xAB foo-xFCxA1xA1xA1xA1xA1 column=info:foo, timestamp=1350387336476, value=foo-xFCxA1xA1xA1xA1xA1 no1 column=info:name, timestamp=1350370179806, value=zhouhh no2 column=info:name, timestamp=1350370213338, value=zhouhh2 6 row(s) in 3.4320 seconds

8.参考 thrift官方网站:http://thrift.apache.org/ 安装文档:http://thrift.apache.org/docs/install/ 下载地址:http://labs.renren.com/apache-mirror/thrift/0.8.0/thrift-0.8.0.tar.gz 使用C++(通过Thrift)访问/操作/读写Hbase http://www.codelast.com/?p=3303 http://thrift.apache.org/docs/BuildingFromSource/ http://yannramin.com/2008/07/19/using-facebook-thrift-with-python-and-hbase/


如非注明转载, 均为原创. 本站遵循知识共享CC协议,转载请注明来源