## Overview

The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers
using a simple programming model.

This charm plugs in to a workload charm to provide the
[Apache Hadoop 2.4.1](http://hadoop.apache.org/docs/r2.4.1/)
libraries and configuration for the workload to use.

## Usage

This charm is intended to be deployed via one of the
[apache bundles](https://jujucharms.com/u/bigdata-charmers/#bundles).
For example:

    juju quickstart apache-analytics-sql

This will deploy the Apache Hadoop platform with a workload node
which is running Apache Hive to perform SQL-like queries against your data.

If you also wanted to be able to analyze your data using Apache Pig,
you could deploy it and attach it to the same plugin:

    juju deploy apache-pig pig
    juju add-relation plugin pig
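Once the relation is added, you can watch the deployment settle with `juju status`, filtered here to the `pig` service name used above:

```shell
# Watch until the pig unit reports "started"
juju status pig
```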

## Benchmarking

You can perform a terasort benchmark to gauge the performance of your environment:

    $ juju action do plugin/0 terasort
    Action queued with id: cbd981e8-3400-4c8f-8df1-c39c55a7eae6
    $ juju action fetch --wait 0 cbd981e8-3400-4c8f-8df1-c39c55a7eae6
    results:
      meta:
        composite:
          direction: asc
          units: ms
          value: "206676"
      results:
        raw: '{"Total vcore-seconds taken by all map tasks": "439783", "Spilled Records":
          "30000000", "WRONG_LENGTH": "0", "Reduce output records": "10000000", "HDFS:
          Number of bytes read": "1000001024", "Total vcore-seconds taken by all reduce
          tasks": "50275", "Reduce input groups": "10000000", "Shuffled Maps ": "8", "FILE:
          Number of bytes written": "3128977482", "Input split bytes": "1024", "Total
          time spent by all reduce tasks (ms)": "50275", "FILE: Number of large read operations":
          "0", "Bytes Read": "1000000000", "Virtual memory (bytes) snapshot": "7688794112",
          "Launched map tasks": "8", "GC time elapsed (ms)": "11656", "Bytes Written":
          "1000000000", "FILE: Number of read operations": "0", "HDFS: Number of write
          operations": "2", "Total megabyte-seconds taken by all reduce tasks": "51481600",
          "Combine output records": "0", "HDFS: Number of bytes written": "1000000000",
          "Total time spent by all map tasks (ms)": "439783", "Map output records": "10000000",
          "Physical memory (bytes) snapshot": "2329722880", "FILE: Number of write operations":
          "0", "Launched reduce tasks": "1", "Reduce input records": "10000000", "Total
          megabyte-seconds taken by all map tasks": "450337792", "WRONG_REDUCE": "0",
          "HDFS: Number of read operations": "27", "Reduce shuffle bytes": "1040000048",
          "Map input records": "10000000", "Map output materialized bytes": "1040000048",
          "CPU time spent (ms)": "195020", "Merged Map outputs": "8", "FILE: Number of
          bytes read": "2080000144", "Failed Shuffles": "0", "Total time spent by all
          maps in occupied slots (ms)": "439783", "WRONG_MAP": "0", "BAD_ID": "0", "Rack-local
          map tasks": "2", "IO_ERROR": "0", "Combine input records": "0", "Map output
          bytes": "1020000000", "CONNECTION": "0", "HDFS: Number of large read operations":
          "0", "Total committed heap usage (bytes)": "1755840512", "Data-local map tasks":
          "6", "Total time spent by all reduces in occupied slots (ms)": "50275"}'
    status: completed
    timing:
      completed: 2015-05-28 20:55:50 +0000 UTC
      enqueued: 2015-05-28 20:53:41 +0000 UTC
      started: 2015-05-28 20:53:44 +0000 UTC

## Deploying in Network-Restricted Environments

The Apache Hadoop charms can be deployed in environments with limited network
access. To deploy in this environment, you will need a local mirror to serve
the packages and resources required by these charms.

### Mirroring Packages

You can set up a local mirror for apt packages using squid-deb-proxy.
For instructions on configuring Juju to use this, see the
[Juju Proxy Documentation](https://juju.ubuntu.com/docs/howto-proxies.html).
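As a rough sketch (this assumes Juju 1.x command syntax and squid-deb-proxy's default port of 8000; `<mirror-host>` is a placeholder for your own machine):

```shell
# On a host reachable by your units: install the apt proxy/cache
sudo apt-get install squid-deb-proxy

# Point the Juju environment at it (Juju 1.x syntax)
juju set-env apt-http-proxy=http://<mirror-host>:8000
```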

### Mirroring Resources

In addition to apt packages, the Apache Hadoop charms require a few binary
resources, which are normally hosted on Launchpad. If access to Launchpad
is not available, the `jujuresources` library makes it easy to create a mirror
of these resources:

    sudo pip install jujuresources
    juju-resources fetch --all /path/to/resources.yaml -d /tmp/resources
    juju-resources serve -d /tmp/resources

This will fetch all of the resources needed by this charm and serve them via a
simple HTTP server. The output from `juju-resources serve` will give you a
URL that you can set as the `resources_mirror` config option for this charm.
Setting this option will cause all resources required by this charm to be
downloaded from the configured URL.

You can fetch the resources for all of the Apache Hadoop charms
(`apache-hadoop-hdfs-master`, `apache-hadoop-yarn-master`,
`apache-hadoop-compute-slave`, `apache-hadoop-plugin`, etc.) into a single
directory and serve them all with a single `juju-resources serve` instance.

## Contact Information

- <bigdata@lists.ubuntu.com>

## Hadoop

- [Apache Hadoop](http://hadoop.apache.org/) home page
- [Apache Hadoop bug trackers](http://hadoop.apache.org/issue_tracking.html)
- [Apache Hadoop mailing lists](http://hadoop.apache.org/mailing_lists.html)
- [Apache Hadoop Juju Charm](http://jujucharms.com/?text=hadoop)