~asanjar/charms/precise/hive2/trunk

Viewing changes to README.md

Committer: amir sanjar
Date: 2014-06-18 15:47:14 UTC
Revision ID: amir.sanjar@canonical.com-20140618154714-bl4b7p81h5oiifqr

initial precise hive2 commit

files added:
README.md
config.yaml
files
files/archives
files/archives/hadoop-2.2.0-client.tar.gz
files/upstart
files/upstart/defaults
hooks
hooks/config-changed
hooks/db-relation-changed
hooks/elk-relation-joined
hooks/hive-common
hooks/install
hooks/metastore-relation-joined
hooks/namenode-relation-broken
hooks/namenode-relation-changed
hooks/resourcemanager-relation-broken
hooks/resourcemanager-relation-changed
hooks/server-relation-changed
hooks/start
hooks/stop
hooks/upgrade-charm
icon.svg
metadata.yaml
revision

README.md

# Overview

Data warehouse infrastructure built on top of Hadoop.

Hive 0.11.3 is a data warehouse infrastructure built on top of Hadoop that
provides tools for easy data summarization, ad hoc querying and analysis of
large datasets stored in Hadoop files. It provides a mechanism to put
structure on this data, along with a simple query language called Hive QL,
which is based on SQL and lets users familiar with SQL query this data. At
the same time, the language allows traditional map/reduce programmers to
plug in their custom mappers and reducers for more sophisticated analysis
that the built-in capabilities of the language may not support.

Hive provides:

- HiveQL - A SQL dialect for querying data in an RDBMS fashion
- UDF/UDAF/UDTF (User Defined [Aggregate/Table] Functions) - Allows users to
  create custom Map/Reduce based functions for regular use
- Ability to do joins (inner/outer/semi) between tables
- Support (limited) for sub-queries
- Support for table 'Views'
- Ability to partition data into Hive partitions or buckets to enable faster
  querying
- Hive Web Interface - A web interface to Hive
- Hive Server2 - Supports multi-user querying using Thrift, JDBC and ODBC clients
- Hive Metastore - Ability to run a separate Metadata storage process
- Hive CLI - A Hive command line that supports HiveQL

See [http://hive.apache.org](http://hive.apache.org) for more information.

This charm provides the Hive Server and Metastore roles which form part of an
overall Hive deployment.
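As a rough illustration of the partitioning and bucketing features listed above, the following sketch writes a small HiveQL script to disk. The table and column names (`page_views`, `user_id`, `url`, `dt`), the bucket count, and the `HQL_FILE` variable are hypothetical and chosen only for this example; nothing here is specific to this charm.

```shell
#!/bin/sh
# Sketch only: table, columns, and file name are made-up examples.
HQL_FILE="${HQL_FILE:-page_views.hql}"

cat > "$HQL_FILE" <<'EOF'
-- One partition per day; rows are hashed into 8 buckets by user_id.
CREATE TABLE page_views (user_id BIGINT, url STRING)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 8 BUCKETS;

-- Filtering on the partition column lets Hive read only that partition.
SELECT url, COUNT(*) AS hits
FROM page_views
WHERE dt = '2014-06-18'
GROUP BY url;
EOF

echo "wrote $HQL_FILE"
```

On a machine where the Hive CLI is installed, a script like this could be run with `hive -f page_views.hql`.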

## Usage

A Hive deployment consists of a Hive service, an RDBMS (only MySQL is currently
supported), an optional Metastore service and a Hadoop cluster.

To deploy a simple four-node Hadoop cluster (see the Hadoop charm README for
further information)::

    juju deploy hadoop hadoop-master
    juju deploy hadoop hadoop-slavecluster
    juju add-unit -n 2 hadoop-slavecluster
    juju add-relation hadoop-master:namenode hadoop-slavecluster:datanode
    juju add-relation hadoop-master:resourcemanager hadoop-slavecluster:nodemanager

A Hive server stores metadata in MySQL::

    juju deploy mysql
    # hive requires ROW binlog
    juju set mysql binlog-format=ROW

To deploy a Hive service without a Metastore service::

    # deploy Hive instance (hive-server2)
    juju deploy hive2 hive-server
    # associate Hive with MySQL
    juju add-relation hive-server:db mysql:db
    # associate Hive with HDFS Namenode
    juju add-relation hive-server:namenode hadoop-master:namenode
    # associate Hive with resourcemanager
    juju add-relation hive-server:resourcemanager hadoop-master:resourcemanager

To deploy a Hive service with a Metastore service::

    # deploy Metastore instance
    juju deploy hive2 hive-metastore
    # associate Metastore with MySQL
    juju add-relation hive-metastore:db mysql:db
    # associate Metastore with Namenode
    juju add-relation hive-metastore:namenode hadoop-master:namenode
    # deploy Hive instance
    juju deploy hive2 hive-server
    # associate Hive with Metastore
    juju add-relation hive-server:server hive-metastore:metastore
    # associate Hive with Namenode
    juju add-relation hive-server:namenode hadoop-master:namenode
    # associate Hive with resourcemanager
    juju add-relation hive-server:resourcemanager hadoop-master:resourcemanager
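The Metastore-backed deployment steps above could be collected into a small script so they can be reviewed before anything is run. This is only a sketch: the `run` and `deploy_hive_with_metastore` functions and the `DRY_RUN` variable are hypothetical helpers, not part of the charm or of Juju. With `DRY_RUN` left at its default of `1` the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch only: DRY_RUN, run and deploy_hive_with_metastore are made-up helpers.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same commands, in the same order, as the README steps above.
deploy_hive_with_metastore() {
    run juju deploy hive2 hive-metastore
    run juju add-relation hive-metastore:db mysql:db
    run juju add-relation hive-metastore:namenode hadoop-master:namenode
    run juju deploy hive2 hive-server
    run juju add-relation hive-server:server hive-metastore:metastore
    run juju add-relation hive-server:namenode hadoop-master:namenode
    run juju add-relation hive-server:resourcemanager hadoop-master:resourcemanager
}

deploy_hive_with_metastore
```

Setting `DRY_RUN=0` in the environment would execute the commands for real, which assumes the MySQL and Hadoop services from the earlier steps are already deployed.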

Further Hive service units may be deployed::

    juju add-unit hive-server

This currently only works when using a Metastore service.
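Once a hive-server unit is running, JDBC clients connect to Hive Server2 using a `jdbc:hive2://` URL. The helper below is a minimal sketch for building such a URL. `hive_jdbc_url` is a hypothetical function, the address `10.0.3.15` is a placeholder, and port 10000 is the upstream Hive Server2 default, which this charm is assumed (not confirmed) to keep.

```shell
#!/bin/sh
# Sketch only: hive_jdbc_url is a made-up helper, not part of the charm.
# Arguments: host, optional port (default 10000), optional database (default "default").
hive_jdbc_url() {
    host="$1"
    port="${2:-10000}"
    db="${3:-default}"
    echo "jdbc:hive2://${host}:${port}/${db}"
}

# Placeholder address; the real one can be read from `juju status hive-server`.
hive_jdbc_url 10.0.3.15

# To connect, assuming the beeline client is installed and the port is reachable:
#   beeline -u "$(hive_jdbc_url 10.0.3.15)"
```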
