~bigdata-dev/charms/trusty/hdp-tez/trunk


Committer: amir sanjar
Date: 2014-09-17 03:03:54 UTC
Revision ID: amir.sanjar@canonical.com-20140917030354-ap11m9367vbfdr5t

adding amulet test

files added:
hadoop-tez-cluster.yaml

files removed:
config.yaml

hooks/config-changed

hooks/start

hooks/stop

hooks/upgrade-charm

files modified:
README.md

hooks/namenode-relation-changed

metadata.yaml

tests/10-deploy-tez


README.md


# **What is Tez**

Apache Tez: a framework for YARN-based data processing applications in Hadoop.

Apache™ Tez is an extensible framework for building high-performance, YARN-based batch and interactive data processing applications in Hadoop that need to handle terabyte- to petabyte-scale datasets. It allows projects in the Hadoop ecosystem, such as Apache Hive and Apache Pig, as well as third-party software vendors, to express fit-to-purpose data processing applications that meet their demands for fast response times and extreme throughput at petabyte scale.


# **Why Apache Tez**

Apache Tez provides a developer API and framework for writing native YARN applications that bridge the spectrum of interactive and batch workloads. It allows applications to scale seamlessly from gigabytes to petabytes of data and from tens to thousands of nodes. The Apache Tez component library lets developers create Hadoop applications that integrate with YARN and perform well within mixed-workload Hadoop clusters.

Because Tez is extensible and embeddable, it gives developers the freedom to express highly optimized, fit-to-purpose data processing applications, which gives them an advantage over general-purpose, end-user-facing engines such as MapReduce and Spark. Finally, it offers a customizable execution architecture that lets you express complex computations as dataflow graphs (DAGs) and allows dynamic performance optimizations based on real information about the data and the resources required to process it.


## **Tez use case**

Verify that your cluster meets the following prerequisites before installing Tez:

- Apache Hadoop 2.4.x and YARN
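A minimal sketch for checking the Hadoop prerequisite. It assumes that `hadoop version` is on the PATH of the cluster units and prints `Hadoop <version>` on its first line. The function name `check_hadoop_24` is hypothetical, and the function deliberately accepts only the 2.4.x line stated above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the `hadoop version` output read
# from stdin reports a 2.4.x release (the prerequisite stated above).
check_hadoop_24() {
    # The first line is expected to look like "Hadoop 2.4.0...".
    version=$(head -n 1 | awk '{print $2}')
    case "$version" in
        2.4.*)
            return 0
            ;;
        *)
            echo "expected Hadoop 2.4.x, found: ${version:-unknown}" >&2
            return 1
            ;;
    esac
}
```

Once the cluster is up, you could pipe the remote output into it, for example `juju run "hadoop version" --unit yarn-hdfs-master/0 | check_hadoop_24`. The unit name assumes the master is deployed as `yarn-hdfs-master`, as in the commands that follow.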

**To deploy a four-node Hadoop cluster:**

juju deploy hdp-hadoop yarn-hdfs-master

juju deploy hdp-hadoop compute-node

juju add-unit -n 2 compute-node

juju add-relation yarn-hdfs-master:namenode compute-node:datanode

juju add-relation yarn-hdfs-master:resourcemanager compute-node:nodemanager
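The five commands above can be collected into one shell function so that the cluster can be rebuilt in a single step. This is a sketch. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences and are not part of Juju or the charm. When `DRY_RUN` is set, the helper prints each command instead of executing it, which lets you review the sequence before touching a real environment.

```shell
#!/bin/sh
# Hypothetical helper: print the command instead of running it when
# DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Deploy one YARN/HDFS master plus three compute nodes (four machines).
deploy_hadoop_cluster() {
    run juju deploy hdp-hadoop yarn-hdfs-master || return 1
    run juju deploy hdp-hadoop compute-node || return 1
    run juju add-unit -n 2 compute-node || return 1
    # HDFS: NameNode on the master, DataNodes on the compute nodes.
    run juju add-relation yarn-hdfs-master:namenode compute-node:datanode || return 1
    # YARN: ResourceManager on the master, NodeManagers on the compute nodes.
    run juju add-relation yarn-hdfs-master:resourcemanager compute-node:nodemanager
}
```

Running `DRY_RUN=1 deploy_hadoop_cluster` only prints the five commands. Running `deploy_hadoop_cluster` executes them and stops at the first one that fails.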

**To deploy a Tez client:**

juju deploy hdp-tez

juju add-relation hdp-tez:namenode yarn-hdfs-master:namenode
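A sketch of the Tez client step as a function. It makes two assumptions:

- The charm can be deployed by name as `hdp-tez`. This is the service name implied by the `hdp-tez/0` unit in the verification commands. A local checkout of the charm would need Juju's local-repository syntax instead.
- The Hadoop cluster above already exists.

The `run` helper and `DRY_RUN` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the command instead of running it when
# DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Deploy the Tez client charm and relate it to the HDFS master.
deploy_tez_client() {
    run juju deploy hdp-tez || return 1
    # Connect the client's namenode relation to the master's namenode relation.
    run juju add-relation hdp-tez:namenode yarn-hdfs-master:namenode
}
```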


## **Scaling the cluster**

To add two more compute nodes to the Hadoop cluster:

juju add-unit -n 2 compute-node
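To add a different number of nodes, the same command can be wrapped in a small function that takes the count as an argument. This is a sketch: `scale_compute_nodes`, `run`, and `DRY_RUN` are hypothetical names, and the default of 2 simply mirrors the command above.

```shell
#!/bin/sh
# Hypothetical helper: print the command instead of running it when
# DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Add COUNT more compute-node units (default 2).
scale_compute_nodes() {
    count="${1:-2}"
    # Reject anything that is not a plain non-negative integer.
    case "$count" in
        ''|*[!0-9]*)
            echo "scale_compute_nodes: count must be a positive integer" >&2
            return 1
            ;;
    esac
    # Reject zero.
    if [ "$count" -lt 1 ]; then
        echo "scale_compute_nodes: count must be a positive integer" >&2
        return 1
    fi
    run juju add-unit -n "$count" compute-node
}
```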

## **Verify deployment**

Run:

$ juju run "sudo su hdfs -c 'hdfs dfs -ls /apps/tez'" --unit hdp-tez/0

A successful result looks like this:

hdfs users ... /apps/tez/conf

hdfs users ... /apps/tez/lib

hdfs users ... /apps/tez/tez-api-0.4.0.2.1.3.0-563.jar

hdfs users ... /apps/tez/tez-common-0.4.0.2.1.3.0-563.jar

hdfs users ... /apps/tez/tez-dag-0.4.0.2.1.3.0-563.jar

hdfs users ... /apps/tez/tez-mapreduce-0.4.0.2.1.3.0-563.jar

hdfs users ... /apps/tez/tez-mapreduce-examples-0.4.0.2.1.3.0-563.jar

hdfs users ... /apps/tez/tez-runtime-internals-0.4.0.2.1.3.0-563.jar

hdfs users ... /apps/tez/tez-runtime-library-0.4.0.2.1.3.0-563.jar

hdfs users ... /apps/tez/tez-tests-0.4.0.2.1.3.0-563.jar
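You can automate the check instead of comparing the listing by eye. The sketch below reads the `hdfs dfs -ls /apps/tez` output on stdin and reports any expected entry that is missing. `check_tez_listing` is a hypothetical name. The set of required entries is taken from the sample output above, and a jar counts as present with any numeric version suffix.

```shell
#!/bin/sh
# Hypothetical helper: read `hdfs dfs -ls /apps/tez` output on stdin and
# report any expected entry that is missing. Jar names are matched with
# any numeric version suffix, e.g. tez-api-0.4.0.2.1.3.0-563.jar.
check_tez_listing() {
    listing=$(cat)
    missing=0
    # Directories: the path must end the line exactly.
    for dir in conf lib; do
        if ! printf '%s\n' "$listing" | grep -qE "/apps/tez/$dir\$"; then
            echo "missing: /apps/tez/$dir" >&2
            missing=1
        fi
    done
    # Jars: the name, then a version made of digits, dots and hyphens.
    for jar in tez-api tez-common tez-dag tez-mapreduce \
               tez-mapreduce-examples tez-runtime-internals \
               tez-runtime-library tez-tests; do
        if ! printf '%s\n' "$listing" |
                grep -qE "/apps/tez/$jar-[0-9][0-9.-]*\.jar\$"; then
            echo "missing: /apps/tez/$jar-<version>.jar" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `juju run "sudo su hdfs -c 'hdfs dfs -ls /apps/tez'" --unit hdp-tez/0 | check_tez_listing && echo "Tez libraries are in place"`.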

**HDFS validation from the Tez client**

1) Check the health of the remote HDFS cluster:


$ juju run "su hdfs -c 'hdfs dfsadmin -report'" --unit hdp-tez/0

Review the returned report and confirm that the compute nodes are listed as live DataNodes.

2) Verify that you can create a directory on the HDFS cluster:


$ juju run "su hdfs -c 'hdfs dfs -mkdir /tmp'" --unit hdp-tez/0

3) Copy a test data file to the HDFS cluster:

$ juju run "su hdfs -c 'hdfs dfs -put /home/ubuntu/pg4300.txt /tmp'" --unit hdp-tez/0

4) Run the Tez word-count example:

$ juju run "/home/ubuntu/runtez_wc.sh" --unit hdp-tez/0

5) View the results saved on the HDFS cluster:

$ juju run "su hdfs -c 'hdfs dfs -cat /tmp/pg4300.out/*'" --unit hdp-tez/0

## **Tez Contact Information**

Amir Sanjar <amir.sanjar@canonical.com>

## **Hortonworks Tez Upstream Project**

- [Upstream website](http://hortonworks.com/)
