## Overview
The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers
using a simple programming model.
This charm deploys a client node running
[Apache Hadoop 2.4.1](http://hadoop.apache.org/docs/r2.4.1/)
from which workloads can be manually run.
## Usage
This charm is intended to be deployed as part of the
[core bundle](https://jujucharms.com/u/bigdata-dev/apache-core-batch-processing/):

    juju quickstart u/bigdata-dev/apache-core-batch-processing

This will deploy the Apache Hadoop platform with a single client unit.
From there, you can manually load and run map-reduce jobs:

    juju scp my-job.jar client/0:
    juju ssh client/0
    hadoop jar my-job.jar
## Benchmarking
You can run a terasort benchmark to gauge the performance of your environment:
    $ juju action do apache-hadoop-client/0 terasort
    Action queued with id: cbd981e8-3400-4c8f-8df1-c39c55a7eae6

    $ juju action fetch --wait 0 cbd981e8-3400-4c8f-8df1-c39c55a7eae6
    results:
      meta:
        composite:
          direction: asc
          units: ms
          value: "206676"
      results:
        raw: '{"Total vcore-seconds taken by all map tasks": "439783", "Spilled Records":
          "30000000", "WRONG_LENGTH": "0", "Reduce output records": "10000000", "HDFS:
          Number of bytes read": "1000001024", "Total vcore-seconds taken by all reduce
          tasks": "50275", "Reduce input groups": "10000000", "Shuffled Maps ": "8", "FILE:
          Number of bytes written": "3128977482", "Input split bytes": "1024", "Total
          time spent by all reduce tasks (ms)": "50275", "FILE: Number of large read operations":
          "0", "Bytes Read": "1000000000", "Virtual memory (bytes) snapshot": "7688794112",
          "Launched map tasks": "8", "GC time elapsed (ms)": "11656", "Bytes Written":
          "1000000000", "FILE: Number of read operations": "0", "HDFS: Number of write
          operations": "2", "Total megabyte-seconds taken by all reduce tasks": "51481600",
          "Combine output records": "0", "HDFS: Number of bytes written": "1000000000",
          "Total time spent by all map tasks (ms)": "439783", "Map output records": "10000000",
          "Physical memory (bytes) snapshot": "2329722880", "FILE: Number of write operations":
          "0", "Launched reduce tasks": "1", "Reduce input records": "10000000", "Total
          megabyte-seconds taken by all map tasks": "450337792", "WRONG_REDUCE": "0",
          "HDFS: Number of read operations": "27", "Reduce shuffle bytes": "1040000048",
          "Map input records": "10000000", "Map output materialized bytes": "1040000048",
          "CPU time spent (ms)": "195020", "Merged Map outputs": "8", "FILE: Number of
          bytes read": "2080000144", "Failed Shuffles": "0", "Total time spent by all
          maps in occupied slots (ms)": "439783", "WRONG_MAP": "0", "BAD_ID": "0", "Rack-local
          map tasks": "2", "IO_ERROR": "0", "Combine input records": "0", "Map output
          bytes": "1020000000", "CONNECTION": "0", "HDFS: Number of large read operations":
          "0", "Total committed heap usage (bytes)": "1755840512", "Data-local map tasks":
          "6", "Total time spent by all reduces in occupied slots (ms)": "50275"}'
    status: completed
    timing:
      completed: 2015-05-28 20:55:50 +0000 UTC
      enqueued: 2015-05-28 20:53:41 +0000 UTC
      started: 2015-05-28 20:53:44 +0000 UTC
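The composite `value` is the total job time in milliseconds, and the `raw` field is a JSON string of Hadoop's job counters. As a sketch of post-processing the results (the counter names and the `206676` ms value are taken from the sample run above), you could parse the counters to derive a throughput figure:

```python
import json

# `raw` is the JSON string of Hadoop job counters from the action results;
# only two counters from the sample output are reproduced here for brevity.
raw = '{"HDFS: Number of bytes written": "1000000000", "CPU time spent (ms)": "195020"}'
counters = {k: int(v) for k, v in json.loads(raw).items()}

composite_ms = 206676  # results.meta.composite.value from the sample run

# Throughput: bytes sorted per second of wall-clock time.
throughput_mb_s = counters["HDFS: Number of bytes written"] / 1e6 / (composite_ms / 1000)
print(f"terasort throughput: {throughput_mb_s:.1f} MB/s")
# → terasort throughput: 4.8 MB/s
```

In a real deployment you would fetch the full results with `juju action fetch --format=json` and read the `raw` string from there rather than pasting it inline.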
## Contact Information
- [bigdata-dev@lists.launchpad.net](mailto:bigdata-dev@lists.launchpad.net)
## Hadoop
- [Apache Hadoop](http://hadoop.apache.org/) home page
- [Apache Hadoop bug trackers](http://hadoop.apache.org/issue_tracking.html)
- [Apache Hadoop mailing lists](http://hadoop.apache.org/mailing_lists.html)
- [Apache Hadoop Juju Charm](http://jujucharms.com/?text=hadoop)