## Overview

The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers
using a simple programming model.

This charm plugs in to a workload charm to provide the
[Apache Hadoop 2.4.1](http://hadoop.apache.org/docs/r2.4.1/)
libraries and configuration for the workload to use.
## Usage

This charm is intended to be deployed via one of the
[apache bundles](https://jujucharms.com/u/bigdata-charmers/#bundles).
For example:

    juju quickstart apache-analytics-sql

This will deploy the Apache Hadoop platform with a workload node
which is running Apache Hive to perform SQL-like queries against your data.

If you also wanted to be able to analyze your data using Apache Pig,
you could deploy it and attach it to the same plugin:

    juju deploy apache-pig pig
    juju add-relation plugin pig
## Benchmarking

You can perform a terasort benchmark to gauge the performance of your environment:

    $ juju action do plugin/0 terasort
    Action queued with id: cbd981e8-3400-4c8f-8df1-c39c55a7eae6
    $ juju action fetch --wait 0 cbd981e8-3400-4c8f-8df1-c39c55a7eae6
    results:
      meta:
        composite:
          direction: asc
          units: ms
          value: "206676"
      results:
        raw: '{"Total vcore-seconds taken by all map tasks": "439783", "Spilled Records":
          "30000000", "WRONG_LENGTH": "0", "Reduce output records": "10000000", "HDFS:
          Number of bytes read": "1000001024", "Total vcore-seconds taken by all reduce
          tasks": "50275", "Reduce input groups": "10000000", "Shuffled Maps ": "8", "FILE:
          Number of bytes written": "3128977482", "Input split bytes": "1024", "Total
          time spent by all reduce tasks (ms)": "50275", "FILE: Number of large read operations":
          "0", "Bytes Read": "1000000000", "Virtual memory (bytes) snapshot": "7688794112",
          "Launched map tasks": "8", "GC time elapsed (ms)": "11656", "Bytes Written":
          "1000000000", "FILE: Number of read operations": "0", "HDFS: Number of write
          operations": "2", "Total megabyte-seconds taken by all reduce tasks": "51481600",
          "Combine output records": "0", "HDFS: Number of bytes written": "1000000000",
          "Total time spent by all map tasks (ms)": "439783", "Map output records": "10000000",
          "Physical memory (bytes) snapshot": "2329722880", "FILE: Number of write operations":
          "0", "Launched reduce tasks": "1", "Reduce input records": "10000000", "Total
          megabyte-seconds taken by all map tasks": "450337792", "WRONG_REDUCE": "0",
          "HDFS: Number of read operations": "27", "Reduce shuffle bytes": "1040000048",
          "Map input records": "10000000", "Map output materialized bytes": "1040000048",
          "CPU time spent (ms)": "195020", "Merged Map outputs": "8", "FILE: Number of
          bytes read": "2080000144", "Failed Shuffles": "0", "Total time spent by all
          maps in occupied slots (ms)": "439783", "WRONG_MAP": "0", "BAD_ID": "0", "Rack-local
          map tasks": "2", "IO_ERROR": "0", "Combine input records": "0", "Map output
          bytes": "1020000000", "CONNECTION": "0", "HDFS: Number of large read operations":
          "0", "Total committed heap usage (bytes)": "1755840512", "Data-local map tasks":
          "6", "Total time spent by all reduces in occupied slots (ms)": "50275"}'
    status: completed
    timing:
      completed: 2015-05-28 20:55:50 +0000 UTC
      enqueued: 2015-05-28 20:53:41 +0000 UTC
      started: 2015-05-28 20:53:44 +0000 UTC
## Deploying in Network-Restricted Environments

The Apache Hadoop charms can be deployed in environments with limited network
access. To deploy in such an environment, you will need a local mirror to serve
the packages and resources required by these charms.
### Mirroring Packages

You can set up a local mirror for apt packages using squid-deb-proxy.
For instructions on configuring Juju to use this, see the
[Juju Proxy Documentation](https://juju.ubuntu.com/docs/howto-proxies.html).
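As a rough sketch (the proxy host address below is a placeholder, and the commands assume the Juju 1.x CLI this README was written against):

```shell
# Install squid-deb-proxy on a machine your units can reach;
# it listens on port 8000 by default.
sudo apt-get install squid-deb-proxy

# Tell Juju to route apt traffic through the proxy. Replace
# 10.0.0.1 with the actual address of your proxy host.
juju set-env apt-http-proxy=http://10.0.0.1:8000
```

See the linked documentation for the full set of proxy settings available in your Juju version.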
### Mirroring Resources

In addition to apt packages, the Apache Hadoop charms require a few binary
resources, which are normally hosted on Launchpad. If access to Launchpad
is not available, the `jujuresources` library makes it easy to create a mirror
of these resources:

    sudo pip install jujuresources
    juju-resources fetch --all /path/to/resources.yaml -d /tmp/resources
    juju-resources serve -d /tmp/resources

This will fetch all of the resources needed by this charm and serve them via a
simple HTTP server. The output from `juju-resources serve` will give you a
URL that you can set as the `resources_mirror` config option for this charm.
Setting this option will cause all resources required by this charm to be
downloaded from the configured URL.
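For example, assuming `juju-resources serve` reported that it is listening on `10.0.0.1:8080` (your address and port will differ), you could set the option like this with the Juju 1.x CLI:

```shell
# Point the charm at the local resource mirror; the URL is a
# placeholder for whatever `juju-resources serve` printed.
juju set apache-hadoop-plugin resources_mirror=http://10.0.0.1:8080/
```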
You can fetch the resources for all of the Apache Hadoop charms
(`apache-hadoop-hdfs-master`, `apache-hadoop-yarn-master`,
`apache-hadoop-compute-slave`, `apache-hadoop-plugin`, etc.) into a single
directory and serve them all with a single `juju-resources serve` instance.
## Contact Information
- <bigdata@lists.ubuntu.com>
## Hadoop

- [Apache Hadoop](http://hadoop.apache.org/) home page
- [Apache Hadoop bug trackers](http://hadoop.apache.org/issue_tracking.html)
- [Apache Hadoop mailing lists](http://hadoop.apache.org/mailing_lists.html)
- [Apache Hadoop Juju Charm](http://jujucharms.com/?text=hadoop)