<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
<body bgcolor="white">
<a href="http://hbase.org">HBase</a> is a scalable, distributed database built on <a href="http://hadoop.apache.org/core">Hadoop Core</a>.
<h2>Table of Contents</h2>
<a href="#requirements">Requirements</a>
<li><a href="#windows">Windows</a></li>
<a href="#getting_started" >Getting Started</a>
<li><a href="#standalone">Standalone</a></li>
<a href="#distributed">Distributed Operation: Pseudo- and Fully-distributed modes</a>
<li><a href="#pseudo-distrib">Pseudo-distributed</a></li>
<li><a href="#fully-distrib">Fully-distributed</a></li>
<li><a href="#runandconfirm">Running and Confirming Your Installation</a></li>
<li><a href="#upgrading" >Upgrading</a></li>
<li><a href="#client_example">Example API Usage</a></li>
<li><a href="#related" >Related Documentation</a></li>
<h2><a name="requirements">Requirements</a></h2>
<li>Java 1.6.x, preferably from <a href="http://www.java.com/download/">Sun</a>. Use the latest version available except u18 (u19 is fine).</li>
<li>This version of HBase will only run on <a href="http://hadoop.apache.org/common/releases.html">Hadoop 0.20.x</a>.</li>
<em>ssh</em> must be installed and <em>sshd</em> must be running to use Hadoop's scripts to manage remote Hadoop daemons.
You must be able to ssh to all nodes, including your local node, using passwordless login
(Google "ssh passwordless login").
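The passwordless-login setup referred to above can be sketched as follows (an illustration, not part of HBase itself; run it as the user that will start the daemons, and repeat the authorization step on every node in the cluster):

```shell
# Make sure ~/.ssh exists with safe permissions.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
# Generate a key pair with an empty passphrase, if one does not already exist.
[ -f ~/.ssh/id_rsa ] || ssh-keygen -q -t rsa -N "" -f ~/.ssh/id_rsa
# Authorize the key on this node; copy id_rsa.pub into authorized_keys
# on every other node as well, including your local node.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Verify with: ssh localhost true   (it should log in without a password prompt)
```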
HBase depends on <a href="http://hadoop.apache.org/zookeeper/">ZooKeeper</a> as of release 0.20.0.
HBase stores in ZooKeeper the location of its root table, the identity of the current master, and the set of regions
currently participating in the cluster.
Clients and servers must know their <em>ZooKeeper quorum locations</em> before
they can do anything else (usually they pick up this information from configuration
supplied on their CLASSPATH). By default, HBase will manage a single ZooKeeper instance for you.
In <em>standalone</em> and <em>pseudo-distributed</em> modes this is usually enough, but for
<em>fully-distributed</em> mode you should configure a ZooKeeper quorum (more info below).
<li>Hosts must be able to resolve the fully-qualified domain name of the master.</li>
The clocks on cluster members should be in basic alignment. Some skew is tolerable, but
wild skew can generate odd behaviors. Run <a href="http://en.wikipedia.org/wiki/Network_Time_Protocol">NTP</a>
on your cluster, or an equivalent.
This is the current list of patches we recommend you apply to your running Hadoop cluster:
<a href="https://issues.apache.org/jira/browse/HDFS-630">HDFS-630: <em>"In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block"</em></a>.
Dead DataNodes take ten minutes to time out at the NameNode.
In the meantime the NameNode can still send DFSClients to a dead DataNode as host for
a replicated block, and the DFSClient can get stuck trying to fetch a block from the
dead node. This patch allows DFSClients to pass the NameNode a list of known dead DataNodes.
HBase is a database; it uses many files at the same time. The default <b>ulimit -n</b> of 1024 on *nix systems is insufficient.
Any significant amount of loading will lead you to
<a href="http://wiki.apache.org/hadoop/Hbase/FAQ#A6">FAQ: Why do I see "java.io.IOException...(Too many open files)" in my logs?</a>.
You will also notice errors like:
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
Do yourself a favor and raise this limit to more than 10k, as described in the FAQ.
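Inspecting and raising the limit for the current shell can look like this on most *nix systems (raising the hard limit itself requires root; the "hadoop" user name below is only an example):

```shell
# Show the current soft and hard limits on the number of open files.
ulimit -Sn
ulimit -Hn
# Raise the soft limit to the hard limit for this shell; daemons started
# from this shell inherit the new value.
ulimit -Sn "$(ulimit -Hn)"
# For a permanent change on Linux, add lines like these to
# /etc/security/limits.conf ("hadoop" is a hypothetical daemon user):
#   hadoop  soft  nofile  32768
#   hadoop  hard  nofile  32768
```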
Also, HDFS has an upper bound on the number of files that it can serve at the same time, called xcievers (yes, this is <em>misspelled</em>). Again, before doing any loading,
make sure you have configured Hadoop's conf/hdfs-site.xml with this:
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2047</value>
</property>
See the background of this issue here: <a href="http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5">Problem: "xceiverCount 258 exceeds the limit of concurrent xcievers 256"</a>.
Failure to follow these instructions will result in <b>data loss</b>.
<h3><a name="windows">Windows</a></h3>
If you are running HBase on Windows, you must install
<a href="http://cygwin.com/">Cygwin</a>
to have a *nix-like environment for the shell scripts. The full details are explained in
the <a href="../cygwin.html">Windows Installation</a> guide.
<h2><a name="getting_started" >Getting Started</a></h2>
<p>What follows presumes you have obtained a copy of HBase,
see <a href="http://hadoop.apache.org/hbase/releases.html">Releases</a>, and are installing
for the first time. If upgrading your HBase instance, see <a href="#upgrading">Upgrading</a>.</p>
<p>Three modes are described: <em>standalone</em>, <em>pseudo-distributed</em> (where all servers are run on
a single host), and <em>fully-distributed</em>. If new to HBase, start by following the standalone instructions.</p>
<p>Begin by reading <a href="#requirements">Requirements</a>.</p>
<p>Whatever your mode, define <code>${HBASE_HOME}</code> to be the location of the root of your HBase installation, e.g.
<code>/usr/local/hbase</code>. Edit <code>${HBASE_HOME}/conf/hbase-env.sh</code>. In this file you can
set the heap size for HBase, etc. At a minimum, set <code>JAVA_HOME</code> to point at the root of
your Java installation.</p>
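A minimal <code>hbase-env.sh</code> edit might look like this (the JDK path and heap size are illustrative assumptions, not defaults):

```shell
# Fragment of ${HBASE_HOME}/conf/hbase-env.sh.
# Point JAVA_HOME at the root of your Java installation
# (/usr/lib/jvm/java-6-sun is only an example path).
export JAVA_HOME=/usr/lib/jvm/java-6-sun
# Optionally set the heap size, in megabytes, given to each HBase daemon.
export HBASE_HEAPSIZE=1000
```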
<h3><a name="standalone">Standalone mode</a></h3>
<p>If you are running a standalone operation, there should be nothing further to configure; proceed to
<a href="#runandconfirm">Running and Confirming Your Installation</a>. If you are running a distributed
operation, continue reading.</p>
<h3><a name="distributed">Distributed Operation: Pseudo- and Fully-distributed modes</a></h3>
<p>Distributed modes require an instance of the <em>Hadoop Distributed File System</em> (DFS).
See the Hadoop <a href="http://hadoop.apache.org/common/docs/r0.20.1/api/overview-summary.html#overview_description">
requirements and instructions</a> for how to set up a DFS.</p>
<h4><a name="pseudo-distrib">Pseudo-distributed mode</a></h4>
<p>A pseudo-distributed mode is simply a distributed mode run on a single host.
Once you have confirmed your DFS setup, configuring HBase for use on one host requires modification of
<code>${HBASE_HOME}/conf/hbase-site.xml</code>, which needs to be pointed at the running Hadoop DFS instance.
Use <code>hbase-site.xml</code> to override the properties defined in
<code>${HBASE_HOME}/conf/hbase-default.xml</code> (<code>hbase-default.xml</code> itself
should never be modified). At a minimum the <code>hbase.rootdir</code> property should be redefined
in <code>hbase-site.xml</code> to point HBase at the Hadoop filesystem to use. For example, adding the property
below to your <code>hbase-site.xml</code> says that HBase should use the <code>/hbase</code> directory in the
HDFS whose namenode is at port 9000 on your local machine:</p>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
    <description>The directory shared by region servers.
    </description>
  </property>
</configuration>
<p>Note: Let HBase create the directory. If you don't, you'll get a warning saying HBase
needs a migration run because the directory is missing files expected by HBase (it'll
create them if you let it).</p>
<p>Also note: above we bind to localhost. This means that a remote client cannot
connect. Amend accordingly if you want to connect from a remote location.</p>
<h4><a name="fully-distrib">Fully-Distributed Operation</a></h4>
<p>For running a fully-distributed operation on more than one host, the following
configurations must be made <em>in addition</em> to those described in the
<a href="#pseudo-distrib">pseudo-distributed operation</a> section above.</p>
<p>In <code>hbase-site.xml</code>, set <code>hbase.cluster.distributed</code> to <code>true</code>.</p>
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed ZooKeeper
      true: fully-distributed with unmanaged ZooKeeper quorum (see hbase-env.sh)
    </description>
  </property>
</configuration>
<p>In fully-distributed mode, you probably want to change your <code>hbase.rootdir</code>
from localhost to the name of the node running the HDFS NameNode. In addition
to <code>hbase-site.xml</code> changes, a fully-distributed mode requires that you
modify <code>${HBASE_HOME}/conf/regionservers</code>.
The <code>regionservers</code> file lists all hosts running <code>HRegionServer</code>s, one host per line
(this file in HBase is like the Hadoop slaves file at <code>${HADOOP_HOME}/conf/slaves</code>).</p>
<p>A distributed HBase depends on a running ZooKeeper cluster. All participating nodes and clients
need to be able to get to the running ZooKeeper cluster.
HBase by default manages a ZooKeeper cluster for you, or you can manage it on your own and point HBase to it.
To toggle HBase management of ZooKeeper, use the <code>HBASE_MANAGES_ZK</code> variable in <code>${HBASE_HOME}/conf/hbase-env.sh</code>.
This variable, which defaults to <code>true</code>, tells HBase whether to
start/stop the ZooKeeper quorum servers alongside the rest of the servers.</p>
<p>When HBase manages the ZooKeeper cluster, you can specify ZooKeeper configuration
using its canonical <code>zoo.cfg</code> file (see below), or
just specify ZooKeeper options directly in <code>${HBASE_HOME}/conf/hbase-site.xml</code>
(if new to ZooKeeper, take the path of specifying your configuration in HBase's hbase-site.xml).
Every ZooKeeper configuration option has a corresponding property in the HBase hbase-site.xml
XML configuration file named <code>hbase.zookeeper.property.OPTION</code>.
For example, the <code>clientPort</code> setting in ZooKeeper can be changed by
setting the <code>hbase.zookeeper.property.clientPort</code> property.
For the full list of available properties, see ZooKeeper's <code>zoo.cfg</code>.
For the default values used by HBase, see <code>${HBASE_HOME}/conf/hbase-default.xml</code>.</p>
<p>At a minimum, you should set the list of servers that you want ZooKeeper to run
on using the <code>hbase.zookeeper.quorum</code> property.
This property defaults to <code>localhost</code>, which is not suitable for a
fully-distributed HBase (it binds to the local machine only and remote clients
will not be able to connect).
It is recommended to run a ZooKeeper quorum of 3, 5, or 7 machines, and give each
ZooKeeper server around 1GB of RAM and, if possible, its own dedicated disk.
For very heavily loaded clusters, run ZooKeeper servers on separate machines from the
RegionServers (DataNodes and TaskTrackers).</p>
<p>To point HBase at an existing ZooKeeper cluster, add
a suitably configured <code>zoo.cfg</code> to the <code>CLASSPATH</code>.
HBase will see this file and use it to figure out where ZooKeeper is.
Additionally, set <code>HBASE_MANAGES_ZK</code> in <code>${HBASE_HOME}/conf/hbase-env.sh</code>
to <code>false</code> so that HBase doesn't mess with your ZooKeeper setup:</p>
# Tell HBase whether it should manage its own instance of ZooKeeper or not.
export HBASE_MANAGES_ZK=false
<p>As an example, to have HBase manage a ZooKeeper quorum on nodes
<em>rs{1,2,3,4,5}.example.com</em>, bound to port 2222 (the default is 2181), use:</p>
${HBASE_HOME}/conf/hbase-env.sh:
# Tell HBase whether it should manage its own instance of ZooKeeper or not.
export HBASE_MANAGES_ZK=true
${HBASE_HOME}/conf/hbase-site.xml:
<configuration>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2222</value>
    <description>Property from ZooKeeper's config zoo.cfg.
      The port at which the clients will connect.
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com</value>
    <description>Comma separated list of servers in the ZooKeeper Quorum.
      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
      By default this is set to localhost for local and pseudo-distributed modes
      of operation. For a fully-distributed setup, this should be set to a full
      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
      this is the list of servers which we will start/stop ZooKeeper on.
    </description>
  </property>
</configuration>
<p>When HBase manages ZooKeeper, it will start/stop the ZooKeeper servers as a part
of the regular start/stop scripts. If you would like to run ZooKeeper yourself, you can:</p>
<pre>${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper</pre>
<p>Note that you can use HBase in this manner to spin up a ZooKeeper cluster,
unrelated to HBase. Just make sure to set <code>HBASE_MANAGES_ZK</code> to
<code>false</code> if you want it to stay up, so that when HBase shuts down it
doesn't take ZooKeeper with it.</p>
<p>For more information about setting up a ZooKeeper cluster on your own, see
the ZooKeeper <a href="http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html">Getting Started Guide</a>.
HBase currently uses ZooKeeper version 3.2.0, so any cluster setup with a
3.x.x version of ZooKeeper should work.</p>
<p>Of note, if you have made <em>HDFS client configuration</em> on your Hadoop cluster, HBase will not
see this configuration unless you do one of the following:</p>
<li>Add a pointer to your <code>HADOOP_CONF_DIR</code> to <code>CLASSPATH</code> in <code>hbase-env.sh</code>;</li>
<li>Add a copy of <code>hdfs-site.xml</code> (or <code>hadoop-site.xml</code>) to <code>${HBASE_HOME}/conf</code>; or</li>
<li>If only a small set of HDFS client configurations is needed, add them to <code>hbase-site.xml</code>.</li>
<p>An example of such an HDFS client configuration is <code>dfs.replication</code>. If, for example,
you want to run with a replication factor of 5, HBase will create files with the default of 3 unless
you do the above to make the configuration available to HBase.</p>
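Following the third option above, a <code>dfs.replication</code> override could be added to <code>hbase-site.xml</code> like so (a sketch; the value 5 matches the example in the paragraph, and the description text is our own):

```xml
<property>
  <name>dfs.replication</name>
  <value>5</value>
  <description>Client-side replication factor HBase should use when
  creating its files in HDFS.</description>
</property>
```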
<h2><a name="runandconfirm">Running and Confirming Your Installation</a></h2>
<p>If you are running in standalone, non-distributed mode, HBase by default uses the local filesystem.</p>
<p>If you are running a distributed cluster, you will need to start the Hadoop DFS daemons and
ZooKeeper quorum before starting HBase, and stop the daemons after HBase has shut down.</p>
<p>Start the Hadoop DFS daemons by running <code>${HADOOP_HOME}/bin/start-dfs.sh</code> (and stop
them later with <code>${HADOOP_HOME}/bin/stop-dfs.sh</code>).
You can ensure it started properly by testing the put and get of files into the Hadoop filesystem.
HBase does not normally use the mapreduce daemons. These do not need to be started.</p>
<p>Start up your ZooKeeper cluster.</p>
<p>Start HBase with the following command:</p>
<pre>${HBASE_HOME}/bin/start-hbase.sh</pre>
<p>Once HBase has started, enter <code>${HBASE_HOME}/bin/hbase shell</code> to obtain a
shell against HBase from which you can execute commands.
Type 'help' at the shell's prompt to get a list of commands.
Test your running install by creating tables, inserting content, viewing content, and then dropping your tables.
hbase> # Type "help" to see the shell help screen
hbase> # To create a table named "mylittletable" with a column family of "mylittlecolumnfamily", type
hbase> create "mylittletable", "mylittlecolumnfamily"
hbase> # To see the schema of the "mylittletable" table you just created and its single "mylittlecolumnfamily", type
hbase> describe "mylittletable"
hbase> # To add a row whose id is "myrow", to the column "mylittlecolumnfamily:x" with a value of "v", do
hbase> put "mylittletable", "myrow", "mylittlecolumnfamily:x", "v"
hbase> # To get the cell just added, do
hbase> get "mylittletable", "myrow"
hbase> # To scan your new table, do
hbase> scan "mylittletable"
<p>To stop HBase, exit the HBase shell and enter:</p>
<pre>${HBASE_HOME}/bin/stop-hbase.sh</pre>
<p>If you are running a distributed operation, be sure to wait until HBase has shut down completely
before stopping the Hadoop daemons.</p>
<p>The default location for logs is <code>${HBASE_HOME}/logs</code>.</p>
<p>HBase also puts up a UI listing vital attributes. By default it is deployed on the master host
at port 60010 (HBase RegionServers listen on port 60020 by default and put up an informational
HTTP server at 60030).</p>
<h2><a name="upgrading" >Upgrading</a></h2>
<p>After installing a new HBase on top of data written by a previous HBase version, and before
starting your cluster, run the <code>${HBASE_HOME}/bin/hbase migrate</code> migration script.
It will make any adjustments to the filesystem data under <code>hbase.rootdir</code> necessary to run
the new HBase version. It does not change your install unless you explicitly ask it to.</p>
<h2><a name="client_example">Example API Usage</a></h2>
<p>For sample Java code, see <a href="org/apache/hadoop/hbase/client/package-summary.html#package_description">org.apache.hadoop.hbase.client</a> documentation.</p>
<p>If your client is NOT Java, consider the Thrift or REST libraries.</p>
<h2><a name="related" >Related Documentation</a></h2>
<li><a href="http://hbase.org">HBase Home Page</a>
<li><a href="http://wiki.apache.org/hadoop/Hbase">HBase Wiki</a>
<li><a href="http://hadoop.apache.org/">Hadoop Home Page</a>
<li><a href="http://wiki.apache.org/hadoop/Hbase/MultipleMasters">Setting up Multiple HBase Masters</a>
<li><a href="http://wiki.apache.org/hadoop/Hbase/RollingRestart">Rolling Upgrades</a>
<li><a href="org/apache/hadoop/hbase/client/transactional/package-summary.html#package_description">Transactional HBase</a>
<li><a href="org/apache/hadoop/hbase/client/tableindexed/package-summary.html">Table Indexed HBase</a>
<li><a href="org/apache/hadoop/hbase/stargate/package-summary.html#package_description">Stargate</a> -- a RESTful Web service front end for HBase.