<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
<body bgcolor="white">
<a href="http://hbase.org">HBase</a> is a scalable, distributed database built on <a href="http://hadoop.apache.org/core">Hadoop Core</a>.
<h2>Table of Contents</h2>
<a href="#requirements">Requirements</a>
<li><a href="#windows">Windows</a></li>
<a href="#getting_started" >Getting Started</a>
<li><a href="#standalone">Standalone</a></li>
<a href="#distributed">Distributed Operation: Pseudo- and Fully-distributed modes</a>
<li><a href="#pseudo-distrib">Pseudo-distributed</a></li>
<li><a href="#fully-distrib">Fully-distributed</a></li>
<li><a href="#runandconfirm">Running and Confirming Your Installation</a></li>
<li><a href="#upgrading" >Upgrading</a></li>
<li><a href="#client_example">Example API Usage</a></li>
<li><a href="#related" >Related Documentation</a></li>
<h2><a name="requirements">Requirements</a></h2>
<li>Java 1.6.x, preferably from <a href="http://www.java.com/download/">Sun</a>. Use the latest version available except u18 (u19 is fine).</li>
<li>This version of HBase will only run on <a href="http://hadoop.apache.org/common/releases.html">Hadoop 0.20.x</a>.</li>
<em>ssh</em> must be installed and <em>sshd</em> must be running to use Hadoop's scripts to manage remote Hadoop daemons.
You must be able to ssh to all nodes, including your local node, using passwordless login
(Google "ssh passwordless login").
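The passwordless-login setup referred to above can be sketched as follows (an illustration, not part of HBase itself; run it as the user that will start the daemons, and repeat the authorization step on every node in the cluster):

```shell
# Make sure ~/.ssh exists with safe permissions.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
# Generate a key pair with an empty passphrase, if one does not already exist.
[ -f ~/.ssh/id_rsa ] || ssh-keygen -q -t rsa -N "" -f ~/.ssh/id_rsa
# Authorize the key on this node; copy id_rsa.pub into authorized_keys
# on every other node as well, including your local node.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Verify with: ssh localhost true   (it should log in without a password prompt)
```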
HBase depends on <a href="http://hadoop.apache.org/zookeeper/">ZooKeeper</a> as of release 0.20.0.
HBase stores in ZooKeeper the location of its root table, the identity of the current master, and the set of regions
currently participating in the cluster.
Clients and servers must know their <em>ZooKeeper quorum locations</em> before
they can do anything else (usually they pick up this information from configuration
supplied on their CLASSPATH). By default, HBase will manage a single ZooKeeper instance for you.
In <em>standalone</em> and <em>pseudo-distributed</em> modes this is usually enough, but for
<em>fully-distributed</em> mode you should configure a ZooKeeper quorum (more info below).
<li>Hosts must be able to resolve the fully-qualified domain name of the master.</li>
The clocks on cluster members should be in basic alignment. Some skew is tolerable, but
wild skew can generate odd behaviors. Run <a href="http://en.wikipedia.org/wiki/Network_Time_Protocol">NTP</a>
on your cluster, or an equivalent.
This is the current list of patches we recommend you apply to your running Hadoop cluster:
<a href="https://issues.apache.org/jira/browse/HDFS-630">HDFS-630: <em>"In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block"</em></a>.
Dead DataNodes take ten minutes to time out at the NameNode.
In the meantime the NameNode can still send DFSClients to a dead DataNode as host for
a replicated block, and the DFSClient can get stuck trying to fetch a block from the
dead node. This patch allows DFSClients to pass the NameNode a list of known dead DataNodes.
HBase is a database; it uses many files at the same time. The default <b>ulimit -n</b> of 1024 on *nix systems is insufficient.
Any significant amount of loading will lead you to
<a href="http://wiki.apache.org/hadoop/Hbase/FAQ#A6">FAQ: Why do I see "java.io.IOException...(Too many open files)" in my logs?</a>.
You will also notice errors like:
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
Do yourself a favor and raise this limit to more than 10k, as described in the FAQ.
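Inspecting and raising the limit for the current shell can look like this on most *nix systems (raising the hard limit itself requires root; the "hadoop" user name below is only an example):

```shell
# Show the current soft and hard limits on the number of open files.
ulimit -Sn
ulimit -Hn
# Raise the soft limit to the hard limit for this shell; daemons started
# from this shell inherit the new value.
ulimit -Sn "$(ulimit -Hn)"
# For a permanent change on Linux, add lines like these to
# /etc/security/limits.conf ("hadoop" is a hypothetical daemon user):
#   hadoop  soft  nofile  32768
#   hadoop  hard  nofile  32768
```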
Also, HDFS has an upper bound on the number of files that it can serve at the same time, called xcievers (yes, this is <em>misspelled</em>). Again, before doing any loading,
make sure you have configured Hadoop's conf/hdfs-site.xml with this:
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2047</value>
</property>
See the background of this issue here: <a href="http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5">Problem: "xceiverCount 258 exceeds the limit of concurrent xcievers 256"</a>.
Failure to follow these instructions will result in <b>data loss</b>.
<h3><a name="windows">Windows</a></h3>
If you are running HBase on Windows, you must install
<a href="http://cygwin.com/">Cygwin</a>
to have a *nix-like environment for the shell scripts. The full details are explained in
the <a href="../cygwin.html">Windows Installation</a> guide.
<h2><a name="getting_started" >Getting Started</a></h2>
<p>What follows presumes you have obtained a copy of HBase,
see <a href="http://hadoop.apache.org/hbase/releases.html">Releases</a>, and are installing
for the first time. If upgrading your HBase instance, see <a href="#upgrading">Upgrading</a>.</p>
<p>Three modes are described: <em>standalone</em>, <em>pseudo-distributed</em> (where all servers are run on
a single host), and <em>fully-distributed</em>. If new to HBase, start by following the standalone instructions.</p>
<p>Begin by reading <a href="#requirements">Requirements</a>.</p>
<p>Whatever your mode, define <code>${HBASE_HOME}</code> to be the location of the root of your HBase installation, e.g.
<code>/usr/local/hbase</code>. Edit <code>${HBASE_HOME}/conf/hbase-env.sh</code>. In this file you can
set the heap size for HBase, etc. At a minimum, set <code>JAVA_HOME</code> to point at the root of
your Java installation.</p>
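A minimal <code>hbase-env.sh</code> edit might look like this (the JDK path and heap size are illustrative assumptions, not defaults):

```shell
# Fragment of ${HBASE_HOME}/conf/hbase-env.sh.
# Point JAVA_HOME at the root of your Java installation
# (/usr/lib/jvm/java-6-sun is only an example path).
export JAVA_HOME=/usr/lib/jvm/java-6-sun
# Optionally set the heap size, in megabytes, given to each HBase daemon.
export HBASE_HEAPSIZE=1000
```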
<h3><a name="standalone">Standalone mode</a></h3>
<p>If you are running a standalone operation, there should be nothing further to configure; proceed to
<a href="#runandconfirm">Running and Confirming Your Installation</a>. If you are running a distributed
operation, continue reading.</p>
<h3><a name="distributed">Distributed Operation: Pseudo- and Fully-distributed modes</a></h3>
<p>Distributed modes require an instance of the <em>Hadoop Distributed File System</em> (DFS).
See the Hadoop <a href="http://hadoop.apache.org/common/docs/r0.20.1/api/overview-summary.html#overview_description">
requirements and instructions</a> for how to set up a DFS.</p>
<h4><a name="pseudo-distrib">Pseudo-distributed mode</a></h4>
<p>A pseudo-distributed mode is simply a distributed mode run on a single host.
Once you have confirmed your DFS setup, configuring HBase for use on one host requires modification of
<code>${HBASE_HOME}/conf/hbase-site.xml</code>, which needs to be pointed at the running Hadoop DFS instance.
Use <code>hbase-site.xml</code> to override the properties defined in
<code>${HBASE_HOME}/conf/hbase-default.xml</code> (<code>hbase-default.xml</code> itself
should never be modified). At a minimum the <code>hbase.rootdir</code> property should be redefined
in <code>hbase-site.xml</code> to point HBase at the Hadoop filesystem to use. For example, adding the property
below to your <code>hbase-site.xml</code> says that HBase should use the <code>/hbase</code> directory in the
HDFS whose namenode is at port 9000 on your local machine:</p>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
    <description>The directory shared by region servers.
    </description>
  </property>
</configuration>
<p>Note: Let HBase create the directory. If you don't, you'll get a warning saying HBase
needs a migration run because the directory is missing files expected by HBase (it'll
create them if you let it).</p>
<p>Also note: above we bind to localhost. This means that a remote client cannot
connect. Amend accordingly if you want to connect from a remote location.</p>
<h4><a name="fully-distrib">Fully-Distributed Operation</a></h4>
<p>For running a fully-distributed operation on more than one host, the following
configurations must be made <em>in addition</em> to those described in the
<a href="#pseudo-distrib">pseudo-distributed operation</a> section above.</p>
<p>In <code>hbase-site.xml</code>, set <code>hbase.cluster.distributed</code> to <code>true</code>.</p>
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed ZooKeeper
      true: fully-distributed with unmanaged ZooKeeper quorum (see hbase-env.sh)
    </description>
  </property>
</configuration>
<p>In fully-distributed mode, you probably want to change your <code>hbase.rootdir</code>
from localhost to the name of the node running the HDFS NameNode. In addition
to <code>hbase-site.xml</code> changes, a fully-distributed mode requires that you
modify <code>${HBASE_HOME}/conf/regionservers</code>.
The <code>regionservers</code> file lists all hosts running <code>HRegionServer</code>s, one host per line
(this file in HBase is like the Hadoop slaves file at <code>${HADOOP_HOME}/conf/slaves</code>).</p>
<p>A distributed HBase depends on a running ZooKeeper cluster. All participating nodes and clients
need to be able to get to the running ZooKeeper cluster.
HBase by default manages a ZooKeeper cluster for you, or you can manage it on your own and point HBase to it.
To toggle HBase management of ZooKeeper, use the <code>HBASE_MANAGES_ZK</code> variable in <code>${HBASE_HOME}/conf/hbase-env.sh</code>.
This variable, which defaults to <code>true</code>, tells HBase whether to
start/stop the ZooKeeper quorum servers alongside the rest of the servers.</p>
<p>When HBase manages the ZooKeeper cluster, you can specify ZooKeeper configuration
using its canonical <code>zoo.cfg</code> file (see below), or
just specify ZooKeeper options directly in <code>${HBASE_HOME}/conf/hbase-site.xml</code>
(if new to ZooKeeper, take the path of specifying your configuration in HBase's hbase-site.xml).
Every ZooKeeper configuration option has a corresponding property in the HBase hbase-site.xml
XML configuration file named <code>hbase.zookeeper.property.OPTION</code>.
For example, the <code>clientPort</code> setting in ZooKeeper can be changed by
setting the <code>hbase.zookeeper.property.clientPort</code> property.
For the full list of available properties, see ZooKeeper's <code>zoo.cfg</code>.
For the default values used by HBase, see <code>${HBASE_HOME}/conf/hbase-default.xml</code>.</p>
<p>At a minimum, you should set the list of servers that you want ZooKeeper to run
on using the <code>hbase.zookeeper.quorum</code> property.
This property defaults to <code>localhost</code>, which is not suitable for a
fully-distributed HBase (it binds to the local machine only and remote clients
will not be able to connect).
It is recommended to run a ZooKeeper quorum of 3, 5, or 7 machines, and give each
ZooKeeper server around 1GB of RAM and, if possible, its own dedicated disk.
For very heavily loaded clusters, run ZooKeeper servers on separate machines from the
RegionServers (DataNodes and TaskTrackers).</p>
<p>To point HBase at an existing ZooKeeper cluster, add
a suitably configured <code>zoo.cfg</code> to the <code>CLASSPATH</code>.
HBase will see this file and use it to figure out where ZooKeeper is.
Additionally, set <code>HBASE_MANAGES_ZK</code> in <code>${HBASE_HOME}/conf/hbase-env.sh</code>
to <code>false</code> so that HBase doesn't mess with your ZooKeeper setup:</p>
# Tell HBase whether it should manage its own instance of ZooKeeper or not.
export HBASE_MANAGES_ZK=false
<p>As an example, to have HBase manage a ZooKeeper quorum on nodes
<em>rs{1,2,3,4,5}.example.com</em>, bound to port 2222 (the default is 2181), use:</p>
${HBASE_HOME}/conf/hbase-env.sh:
# Tell HBase whether it should manage its own instance of ZooKeeper or not.
export HBASE_MANAGES_ZK=true
${HBASE_HOME}/conf/hbase-site.xml:
<configuration>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2222</value>
    <description>Property from ZooKeeper's config zoo.cfg.
      The port at which the clients will connect.
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com</value>
    <description>Comma separated list of servers in the ZooKeeper Quorum.
      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
      By default this is set to localhost for local and pseudo-distributed modes
      of operation. For a fully-distributed setup, this should be set to a full
      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
      this is the list of servers which we will start/stop ZooKeeper on.
    </description>
  </property>
</configuration>
<p>When HBase manages ZooKeeper, it will start/stop the ZooKeeper servers as a part
of the regular start/stop scripts. If you would like to run ZooKeeper yourself, you can:</p>
<pre>${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper</pre>
<p>Note that you can use HBase in this manner to spin up a ZooKeeper cluster,
unrelated to HBase. Just make sure to set <code>HBASE_MANAGES_ZK</code> to
<code>false</code> if you want it to stay up, so that when HBase shuts down it
doesn't take ZooKeeper with it.</p>
<p>For more information about setting up a ZooKeeper cluster on your own, see
the ZooKeeper <a href="http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html">Getting Started Guide</a>.
HBase currently uses ZooKeeper version 3.2.0, so any cluster setup with a
3.x.x version of ZooKeeper should work.</p>
<p>Of note, if you have made <em>HDFS client configuration</em> on your Hadoop cluster, HBase will not
see this configuration unless you do one of the following:</p>
<li>Add a pointer to your <code>HADOOP_CONF_DIR</code> to <code>CLASSPATH</code> in <code>hbase-env.sh</code>;</li>
<li>Add a copy of <code>hdfs-site.xml</code> (or <code>hadoop-site.xml</code>) to <code>${HBASE_HOME}/conf</code>; or</li>
<li>If only a small set of HDFS client configurations is needed, add them to <code>hbase-site.xml</code>.</li>
<p>An example of such an HDFS client configuration is <code>dfs.replication</code>. If, for example,
you want to run with a replication factor of 5, HBase will create files with the default of 3 unless
you do the above to make the configuration available to HBase.</p>
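Following the third option above, a <code>dfs.replication</code> override could be added to <code>hbase-site.xml</code> like so (a sketch; the value 5 matches the example in the paragraph, and the description text is our own):

```xml
<property>
  <name>dfs.replication</name>
  <value>5</value>
  <description>Client-side replication factor HBase should use when
  creating its files in HDFS.</description>
</property>
```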
<h2><a name="runandconfirm">Running and Confirming Your Installation</a></h2>
<p>If you are running in standalone, non-distributed mode, HBase by default uses the local filesystem.</p>
<p>If you are running a distributed cluster, you will need to start the Hadoop DFS daemons and
ZooKeeper quorum before starting HBase, and stop the daemons after HBase has shut down.</p>
<p>Start the Hadoop DFS daemons by running <code>${HADOOP_HOME}/bin/start-dfs.sh</code> (and stop
them later with <code>${HADOOP_HOME}/bin/stop-dfs.sh</code>).
You can ensure it started properly by testing the put and get of files into the Hadoop filesystem.
HBase does not normally use the mapreduce daemons. These do not need to be started.</p>
<p>Start up your ZooKeeper cluster.</p>
<p>Start HBase with the following command:</p>
<pre>${HBASE_HOME}/bin/start-hbase.sh</pre>
<p>Once HBase has started, enter <code>${HBASE_HOME}/bin/hbase shell</code> to obtain a
shell against HBase from which you can execute commands.
Type 'help' at the shell's prompt to get a list of commands.
Test your running install by creating tables, inserting content, viewing content, and then dropping your tables.
hbase> # Type "help" to see the shell help screen
hbase> # To create a table named "mylittletable" with a column family of "mylittlecolumnfamily", type
hbase> create "mylittletable", "mylittlecolumnfamily"
hbase> # To see the schema of the "mylittletable" table you just created and its single "mylittlecolumnfamily", type
hbase> describe "mylittletable"
hbase> # To add a row whose id is "myrow", to the column "mylittlecolumnfamily:x" with a value of "v", do
hbase> put "mylittletable", "myrow", "mylittlecolumnfamily:x", "v"
hbase> # To get the cell just added, do
hbase> get "mylittletable", "myrow"
hbase> # To scan your new table, do
hbase> scan "mylittletable"
<p>To stop HBase, exit the HBase shell and enter:</p>
<pre>${HBASE_HOME}/bin/stop-hbase.sh</pre>
<p>If you are running a distributed operation, be sure to wait until HBase has shut down completely
before stopping the Hadoop daemons.</p>
<p>The default location for logs is <code>${HBASE_HOME}/logs</code>.</p>
<p>HBase also puts up a UI listing vital attributes. By default it is deployed on the master host
at port 60010 (HBase RegionServers listen on port 60020 by default and put up an informational
HTTP server at 60030).</p>
<h2><a name="upgrading" >Upgrading</a></h2>
<p>After installing a new HBase on top of data written by a previous HBase version, and before
starting your cluster, run the <code>${HBASE_HOME}/bin/hbase migrate</code> migration script.
It will make any adjustments to the filesystem data under <code>hbase.rootdir</code> necessary to run
the new HBase version. It does not change your install unless you explicitly ask it to.</p>
<h2><a name="client_example">Example API Usage</a></h2>
<p>For sample Java code, see <a href="org/apache/hadoop/hbase/client/package-summary.html#package_description">org.apache.hadoop.hbase.client</a> documentation.</p>
<p>If your client is NOT Java, consider the Thrift or REST libraries.</p>
<h2><a name="related" >Related Documentation</a></h2>
<li><a href="http://hbase.org">HBase Home Page</a>
<li><a href="http://wiki.apache.org/hadoop/Hbase">HBase Wiki</a>
<li><a href="http://hadoop.apache.org/">Hadoop Home Page</a>
<li><a href="http://wiki.apache.org/hadoop/Hbase/MultipleMasters">Setting up Multiple HBase Masters</a>
<li><a href="http://wiki.apache.org/hadoop/Hbase/RollingRestart">Rolling Upgrades</a>
<li><a href="org/apache/hadoop/hbase/client/transactional/package-summary.html#package_description">Transactional HBase</a>
<li><a href="org/apache/hadoop/hbase/client/tableindexed/package-summary.html">Table Indexed HBase</a>
<li><a href="org/apache/hadoop/hbase/stargate/package-summary.html#package_description">Stargate</a> -- a RESTful Web service front end for HBase.