~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/trunk : contents of README.md at revision 89

~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/trunk : (revision 89)
## Overview

The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers
using a simple programming model.

This charm deploys a compute / slave node running the NodeManager
and DataNode components of
[Apache Hadoop 2.4.1](http://hadoop.apache.org/docs/r2.4.1/),
which provides computation and storage resources to the platform.

## Usage

This charm is intended to be deployed via one of the
[bundles](https://jujucharms.com/q/bigdata-dev/apache?type=bundle).
For example:

    juju quickstart u/bigdata-dev/apache-analytics-sql

This will deploy the Apache Hadoop platform with Apache Hive available to
perform SQL-like queries against your data.

You can also manually load and run map-reduce jobs via the plugin charm
included in the bigdata bundles linked above:

    juju scp my-job.jar plugin/0:
    juju ssh plugin/0
    hadoop jar my-job.jar


### Scaling

The compute-slave node is the "workhorse" of the Apache Hadoop platform.
To scale your deployment's performance, you can simply add more compute-slave
units.  For example, to add three mode units:

    juju add-unit compute-slave -n 3


## Deploying in Network-Restricted Environments

The Apache Hadoop charms can be deployed in environments with limited network
access. To deploy in this environment, you will need a local mirror to serve
the packages and resources required by these charms.


### Mirroring Packages

You can setup a local mirror for apt packages using squid-deb-proxy.
For instructions on configuring juju to use this, see the
[Juju Proxy Documentation](https://juju.ubuntu.com/docs/howto-proxies.html).


### Mirroring Resources

In addition to apt packages, the Apache Hadoop charms require a few binary
resources, which are normally hosted on Launchpad. If access to Launchpad
is not available, the `jujuresources` library makes it easy to create a mirror
of these resources:

    sudo pip install jujuresources
    juju resources fetch --all apache-hadoop-compute-slave/resources.yaml -d /tmp/resources
    juju resources serve -d /tmp/resources

This will fetch all of the resources needed by this charm and serve them via a
simple HTTP server. You can then set the `resources_mirror` config option to
have the charm use this server for retrieving resources.

You can fetch the resources for all of the Apache Hadoop charms
(`apache-hadoop-hdfs-master`, `apache-hadoop-yarn-master`,
`apache-hadoop-hdfs-secondary`, `apache-hadoop-plugin`, etc) into a single
directory and serve them all with a single `juju resources serve` instance.


## Contact Information

- <bigdata-dev@lists.launchpad.net>


## Hadoop

- [Apache Hadoop](http://hadoop.apache.org/) home page
- [Apache Hadoop bug trackers](http://hadoop.apache.org/issue_tracking.html)
- [Apache Hadoop mailing lists](http://hadoop.apache.org/mailing_lists.html)
- [Apache Hadoop Juju Charm](http://jujucharms.com/?text=hadoop)