HBase Official Documentation (Chinese Edition)

Revision History
Revision 0.97.0-SNAPSHOT	2013-04-07T14:59
Chinese translation and editing: 周海汉

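Legend: S = supported and tested, X = not supported, NT = should run, but not tested enough.
Hadoop version	HBase-0.92.x	HBase-0.94.x	HBase-0.96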
Hadoop-0.20.205	S	X	X
Hadoop-0.22.x	S	X	X
Hadoop-1.0.x	S	S	S
Hadoop-1.1.x	NT	S	S
Hadoop-0.23.x	X	S	NT
Hadoop-2.x	X	S	S
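
The matrix above can also be read as a lookup table. The following is a minimal sketch, not part of HBase itself: the names `MATRIX`, `support_status`, and `supported_hadoop` are hypothetical, and it assumes the cell codes mean S = supported, NT = not tested, X = not supported.

```python
# Support matrix copied from the table above.
# Assumed legend: "S" = supported, "NT" = not tested, "X" = not supported.
HBASE_VERSIONS = ("HBase-0.92.x", "HBase-0.94.x", "HBase-0.96")

MATRIX = {
    "Hadoop-0.20.205": ("S", "X", "X"),
    "Hadoop-0.22.x": ("S", "X", "X"),
    "Hadoop-1.0.x": ("S", "S", "S"),
    "Hadoop-1.1.x": ("NT", "S", "S"),
    "Hadoop-0.23.x": ("X", "S", "NT"),
    "Hadoop-2.x": ("X", "S", "S"),
}


def support_status(hadoop, hbase):
    """Return the table cell for a Hadoop/HBase pair.

    Raises KeyError for an unlisted Hadoop version and ValueError for an
    unlisted HBase version.
    """
    row = MATRIX[hadoop]
    return row[HBASE_VERSIONS.index(hbase)]


def supported_hadoop(hbase):
    """List the Hadoop versions marked "S" for the given HBase version."""
    return [hadoop for hadoop in MATRIX if support_status(hadoop, hbase) == "S"]
```

For example, `support_status("Hadoop-1.1.x", "HBase-0.92.x")` looks up the first cell of the Hadoop-1.1.x row.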

Row Key	Time Stamp	ColumnFamily `contents`	ColumnFamily `anchor`
"com.cnn.www"	t9		`anchor:cnnsi.com` = "CNN"
"com.cnn.www"	t8		`anchor:my.look.ca` = "CNN.com"
"com.cnn.www"	t6	`contents:html` = "<html>..."
"com.cnn.www"	t5	`contents:html` = "<html>..."
"com.cnn.www"	t3	`contents:html` = "<html>..."

Row Key	Time Stamp	ColumnFamily `anchor`
"com.cnn.www"	t9	`anchor:cnnsi.com` = "CNN"
"com.cnn.www"	t8	`anchor:my.look.ca` = "CNN.com"

Row Key	Time Stamp	ColumnFamily `contents`
"com.cnn.www"	t6	`contents:html` = "<html>..."
"com.cnn.www"	t5	`contents:html` = "<html>..."
"com.cnn.www"	t3	`contents:html` = "<html>..."
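
The three tables above show one conceptual table and the same cells grouped per column family, with empty cells simply not stored. The following is a minimal sketch of that idea using a plain in-memory dictionary; it does not use the HBase client API. The names `CELLS`, `physical_view`, and `get_latest` are hypothetical, and the timestamps t3 to t9 are modelled as the integers 3 to 9.

```python
from collections import defaultdict

# Cells from the conceptual view above, keyed by (row key, column, timestamp).
# Timestamps t3..t9 are modelled as the integers 3..9 (an assumption of this sketch).
CELLS = {
    ("com.cnn.www", "anchor:cnnsi.com", 9): "CNN",
    ("com.cnn.www", "anchor:my.look.ca", 8): "CNN.com",
    ("com.cnn.www", "contents:html", 6): "<html>...",
    ("com.cnn.www", "contents:html", 5): "<html>...",
    ("com.cnn.www", "contents:html", 3): "<html>...",
}


def physical_view(cells):
    """Group cells by column family, newest timestamp first within each row."""
    families = defaultdict(list)
    for (row, column, ts), value in cells.items():
        family = column.split(":", 1)[0]
        families[family].append((row, ts, column, value))
    for rows in families.values():
        rows.sort(key=lambda r: (r[0], -r[1]))
    return dict(families)


def get_latest(cells, row, column):
    """Return the value with the highest timestamp for (row, column), or None."""
    versions = [(ts, v) for (r, c, ts), v in cells.items() if r == row and c == column]
    return max(versions)[1] if versions else None
```

Only cells that exist appear in `CELLS`, so a request for a column with no stored value returns `None` rather than an empty placeholder.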

A.1. General

	When should I use HBase?
	See Section 9.1, “Overview” in the Architecture chapter.
	Are there other HBase FAQs?
	See the FAQ on the wiki, HBase Wiki FAQ.
	Does HBase support SQL?
	Not really. SQL-ish support for HBase via Hive is in development; however, Hive is based on MapReduce, which is not generally suitable for low-latency requests. See Chapter 5, Data Model for examples of the HBase client.
	Where can I find examples of NoSQL/HBase?
	See the link to the BigTable paper in Appendix F, Other Information About HBase, as well as the other papers.
	What is the history of HBase?
	See Appendix G, HBase History.
A.2. Architecture

	How does HBase handle Region-RegionServer assignment and locality?
	See Section 9.7, “Regions”.
A.3. Configuration

	How can I get started with my first cluster?
	See Section 1.2, “Quick Start”.
	Where can I learn about the rest of the configuration options?
	See Chapter 2, Configuration.
A.4. Schema Design / Data Access

	How should I design my schema in HBase?
	See Chapter 5, Data Model and Chapter 6, HBase and Schema Design.
	How can I store (fill in the blank) in HBase?
	See Section 6.5, “Supported Datatypes”.
	How can I handle secondary indexes in HBase?
	See Section 6.9, “Secondary Indexes and Alternate Query Paths”.
	Can I change a table's rowkeys?
	This is a very common question. You can't. See Section 6.3.5, “Immutability of Rowkeys”.
	What APIs does HBase support?
	See Chapter 5, Data Model, Section 9.3, “Client” and Section 10.1, “Non-Java Languages Talking to the JVM”.
A.5. MapReduce

	How can I use MapReduce with HBase?
	See Chapter 7, HBase and MapReduce.
A.6. Performance and Troubleshooting

	How can I improve HBase cluster performance?
	See Chapter 11, Performance Tuning.
	How can I troubleshoot my HBase cluster?
	See Chapter 12, Troubleshooting and Debugging HBase.
A.7. Amazon EC2

	I am running HBase on Amazon EC2 and...
	EC2 issues are a special case. See Section 12.12, “Amazon EC2” in Troubleshooting and Section 11.11, “Amazon EC2” in Performance.
A.8. Operations

	How do I manage my HBase cluster?
	See Chapter 14, HBase Operational Management.
	How do I back up my HBase cluster?
	See Section 14.7, “HBase Backup”.
A.9. HBase in Action

	Where can I find interesting videos and presentations on HBase?
	See Appendix F, Other Information About HBase.

hfile.LASTKEY	The last key of the file (byte array)
hfile.AVG_KEY_LEN	The average key length in the file (int)
hfile.AVG_VALUE_LEN	The average value length in the file (int)

Version 1	Version 2
File info offset (long)
Data index offset (long)	loadOnOpenOffset (long) The offset of the section that we need to load when opening the file.
Number of data index entries (int)
metaIndexOffset (long) This field is not being used by the version 1 reader, so we removed it from version 2.	uncompressedDataIndexSize (long) The total uncompressed size of the whole data block index, including root-level, intermediate-level, and leaf-level blocks.
Number of meta index entries (int)
Total uncompressed bytes (long)
numEntries (int)	numEntries (long)
Compression codec: 0 = LZO, 1 = GZ, 2 = NONE (int)
	The number of levels in the data block index (int)
	firstDataBlockOffset (long) The offset of the first data block. Used when scanning.
	lastDataBlockEnd (long) The offset of the first byte after the last key/value data block. We don't need to go beyond this offset when scanning.
Version: 1 (int)	Version: 2 (int)

Apache HBase™ Reference Guide

Preface

Heads-up

Chapter 1. Getting Started

1.1. Introduction

1.2. Quick Start

1.2.1. Download and Unpack the Latest Release

1.2.2. Start HBase

Is Java installed?

1.2.3. Shell Exercises

1.2.4. Stopping HBase

1.2.5. Where to Go Next

Chapter 2. Configuration

2.1. Basic Prerequisites

2.1.1. Java

2.1.2. Operating System

2.1.2.1. ssh

2.1.2.2. DNS

2.1.2.3. Loopback IP

2.1.2.4. NTP

2.1.2.5. `ulimit` and `nproc`

2.1.2.5.1. Setting `ulimit` on Ubuntu

2.1.2.6. Windows

2.1.3. Hadoop

2.1.3.1. Apache HBase 0.92 and 0.94

2.1.3.2. Apache HBase 0.96

2.1.3.3. Hadoop versions 0.20.x - 1.x

2.1.3.4. Hadoop Security

2.1.3.5. `dfs.datanode.max.xcievers`

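2.2. HBase Run Modes: Standalone and Distributed

2.2.1. Standalone Mode

2.2.2. Distributed Mode

2.2.2.1. Pseudo-distributed Mode

2.2.2.1.1. Pseudo-distributed Configuration File

2.2.2.1.2. Pseudo-distributed Extras

2.2.2.1.2.1. Startup

2.2.2.1.2.2. Stop

2.2.2.2. Fully-distributed Mode

2.2.2.2.1. `regionservers`

2.2.2.2.2. ZooKeeper and HBase

2.2.2.2.3. HDFS Client Configuration

2.2.3. Running and Confirming Your Installation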

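2.3. Configuration Files

2.3.1. `hbase-site.xml` and `hbase-default.xml`

2.3.1.1. HBase Default Configuration

2.3.2. `hbase-env.sh`

2.3.3. `log4j.properties`

2.3.4. Client Configuration and Dependencies for Connecting to an HBase Cluster

2.3.4.1. Java Client Configuration

How Java reads the contents of `hbase-site.xml`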

2.4. Example Configurations

2.4.1. Basic Distributed HBase Install

2.4.1.1. `hbase-site.xml`

2.4.1.2. `regionservers`

2.4.1.3. `hbase-env.sh`

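2.5. The Important Configurations

2.5.1. Required Configurations

2.5.2. Recommended Configurations

2.5.2.1. zookeeper.session.timeout

2.5.2.2. Number of ZooKeeper Instances

2.5.2.3. hbase.regionserver.handler.count

2.5.2.4. Configuration for Large-Memory Machines

2.5.2.5. Compression

2.5.2.6. Bigger Regions

2.5.2.7. Managed Splitting

2.5.2.8. Managed Compactions

2.5.2.9. Speculative Execution

2.5.3. Other Configurations

2.5.3.1. Balancer

2.5.3.2. Disabling Blockcache

2.5.3.3. Nagle's Algorithm or the Small Packet Problem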

Chapter 3. Upgrading

3.1. Upgrading from 0.94.x to 0.96.x

The Singularity

3.2. Upgrading from 0.92.x to 0.94.x

3.3. Upgrading from 0.90.x to 0.92.x

4.2.1. `irbrc`