Apache Tez, a Framework for YARN-Based Data Processing Applications in Hadoop

Apache™ Tez is an extensible framework for building YARN-based, high-performance batch and interactive data processing applications in Hadoop that need to handle terabyte- to petabyte-scale datasets. It allows projects in the Hadoop ecosystem, such as Apache Hive and Apache Pig, as well as third-party software vendors, to express fit-to-purpose data processing applications in a way that meets their unique demands for fast response times and extreme throughput at petabyte scale.

Apache Tez provides a developer API and framework for writing native YARN applications that bridge the spectrum of interactive and batch workloads. It allows applications to scale from gigabytes to petabytes of data and from tens to thousands of nodes. The Apache Tez component library allows developers to create Hadoop applications that integrate with YARN and perform well within mixed-workload Hadoop clusters.

Because Tez is extensible and embeddable, it provides the fit-to-purpose freedom to express highly optimized data processing applications, giving them an advantage over general-purpose, end-user-facing engines such as MapReduce and Spark. It also offers a customizable execution architecture that lets you express complex computations as dataflow graphs and supports dynamic performance optimizations based on real information about the data and the resources required to process it.

Verify that your cluster meets the following prerequisites before installing Tez:

- Apache Hadoop 2.4.x and YARN

**To deploy a four-node Hadoop cluster**

    juju deploy hdp-hadoop yarn-hdfs-master
    juju deploy hdp-hadoop compute-node
    juju add-unit -n 2 compute-node
    juju add-relation yarn-hdfs-master:namenode compute-node:datanode
    juju add-relation yarn-hdfs-master:resourcemanager compute-node:nodemanager
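
The deployment commands above can be collected into one small script so that a failed step stops the rest. The following is a minimal sketch, not part of the charm: the `run` helper, the `deploy_cluster` function, and the `DRY_RUN` variable are names invented for this example, and only the `juju` commands themselves come from this README. One `yarn-hdfs-master` unit plus three `compute-node` units make up the four nodes.

```shell
# run: execute a command, or only print it when DRY_RUN=1.
# DRY_RUN is a hypothetical switch introduced for this sketch.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# deploy_cluster: issue the five deployment commands in order and
# stop at the first one that fails.
deploy_cluster() {
    run juju deploy hdp-hadoop yarn-hdfs-master &&
    run juju deploy hdp-hadoop compute-node &&
    run juju add-unit -n 2 compute-node &&
    run juju add-relation yarn-hdfs-master:namenode compute-node:datanode &&
    run juju add-relation yarn-hdfs-master:resourcemanager compute-node:nodemanager
}
```

Calling `DRY_RUN=1 deploy_cluster` prints the commands without touching the environment, which is a cheap way to review them before a real run.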

**To deploy a Tez client**

    juju add-relation hdp-tez:namenode yarn-hdfs-master:namenode

To add two more `compute-node` units:

    juju add-unit -n 2 compute-node

## **Verify deployment**

    $ juju run "sudo su hdfs -c 'hdfs dfs -ls /apps/tez'" --unit hdp-tez/0

    hdfs users ... /apps/tez/conf
    hdfs users ... /apps/tez/lib
    hdfs users ... /apps/tez/tez-api-0.4.0.2.1.3.0-563.jar
    hdfs users ... /apps/tez/tez-common-0.4.0.2.1.3.0-563.jar
    hdfs users ... /apps/tez/tez-dag-0.4.0.2.1.3.0-563.jar
    hdfs users ... /apps/tez/tez-mapreduce-0.4.0.2.1.3.0-563.jar
    hdfs users ... /apps/tez/tez-mapreduce-examples-0.4.0.2.1.3.0-563.jar
    hdfs users ... /apps/tez/tez-runtime-internals-0.4.0.2.1.3.0-563.jar
    hdfs users ... /apps/tez/tez-runtime-library-0.4.0.2.1.3.0-563.jar
    hdfs users ... /apps/tez/tez-tests-0.4.0.2.1.3.0-563.jar
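
Comparing the listing against the expected entries by eye is error-prone, so the check can be scripted. The sketch below assumes the listing has the shape shown above, with the path as the last field on each line. `check_tez_listing` is a hypothetical helper name, and the artifact names are taken from the sample output. Versions are matched loosely, because the exact version string may differ between releases.

```shell
# check_tez_listing: read an `hdfs dfs -ls /apps/tez` listing on stdin and
# print one "missing: <name>" line for every expected entry that is absent.
# Returns 0 when everything is present, 1 otherwise.
check_tez_listing() {
    listing=$(cat)
    missing=0
    for name in tez-api tez-common tez-dag tez-mapreduce \
                tez-mapreduce-examples tez-runtime-internals \
                tez-runtime-library tez-tests; do
        # The version must start with a digit, so tez-mapreduce does not
        # accidentally match tez-mapreduce-examples.
        if ! printf '%s\n' "$listing" | grep -Eq "/apps/tez/${name}-[0-9][^/]*\.jar\$"; then
            echo "missing: ${name}"
            missing=1
        fi
    done
    for dir in conf lib; do
        if ! printf '%s\n' "$listing" | grep -Eq "/apps/tez/${dir}\$"; then
            echo "missing: ${dir}"
            missing=1
        fi
    done
    return "$missing"
}
```

A possible use is to pipe the verification command into it: `juju run "sudo su hdfs -c 'hdfs dfs -ls /apps/tez'" --unit hdp-tez/0 | check_tez_listing`. If the real output carries trailing whitespace or carriage returns, the end-of-line anchors in the patterns would need relaxing.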

**HDFS validation from the Tez client**

1) Check the health of the remote HDFS cluster

    $ juju run "su hdfs -c 'hdfs dfsadmin -report'" --unit hdp-tez/0

**Validate the returned information**

2) Validate that a directory can be created on the HDFS cluster

    $ juju run "su hdfs -c 'hdfs dfs -mkdir /tmp'" --unit hdp-tez/0

3) Copy a test data file to the HDFS cluster

    $ juju run "su hdfs -c 'hdfs dfs -put /home/ubuntu/pg4300.txt /tmp'" --unit hdp-tez/0

4) Run the Tez word-count example

    $ juju run "/home/ubuntu/runtez_wc.sh" --unit hdp-tez/0

5) View the result saved on the HDFS cluster

    $ juju run "su hdfs -c 'hdfs dfs -cat /tmp/pg4300.out/*'" --unit hdp-tez/0
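
The five validation steps can also be chained so that the sequence stops at the first failing command. This is only a sketch under stated assumptions: `tez_smoke_test` and the `JUJU` override variable are hypothetical names introduced here, while the commands passed to `juju run` are the ones listed in the steps above.

```shell
# tez_smoke_test [UNIT]: run the HDFS validation steps in order on UNIT
# (default hdp-tez/0) and stop at the first command that fails.
# JUJU is a hypothetical override for this sketch; it defaults to the real
# `juju` binary and can be set to `echo` to print the commands instead.
tez_smoke_test() {
    unit=${1:-hdp-tez/0}
    juju_cmd=${JUJU:-juju}
    "$juju_cmd" run "su hdfs -c 'hdfs dfsadmin -report'" --unit "$unit" &&
    "$juju_cmd" run "su hdfs -c 'hdfs dfs -mkdir /tmp'" --unit "$unit" &&
    "$juju_cmd" run "su hdfs -c 'hdfs dfs -put /home/ubuntu/pg4300.txt /tmp'" --unit "$unit" &&
    "$juju_cmd" run "/home/ubuntu/runtez_wc.sh" --unit "$unit" &&
    "$juju_cmd" run "su hdfs -c 'hdfs dfs -cat /tmp/pg4300.out/*'" --unit "$unit"
}
```

Running `JUJU=echo tez_smoke_test` previews the commands. Note that `hdfs dfs -mkdir /tmp` reports an error when `/tmp` already exists, which would stop the chain at step 2. On a cluster where the directory is already present, `hdfs dfs -mkdir -p /tmp` is one way around that.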

## **Tez Contact Information**

Amir Sanjar <amir.sanjar@canonical.com>

## **Hortonworks Tez Upstream Project Name**

- [Upstream website](http://hortonworks.com/)