Platform architecture: currently supported platforms are x86 and POWER.
Note: for POWER you **MUST** select IBM as the Java vendor.
Valid selections: "IBM" for POWER and "OPENJDK" for x86.
dfs_namenode_handler_count:
The number of server threads for the namenode. Increase this in larger
deployments to ensure the namenode can cope with the number of datanodes
that it has to deal with.
Default block replication. The actual number of replications can be specified when
the file is created. The default is used if replication is not specified at create time.
The default block size for new files (defaults to 64MB). Increase this in
larger deployments for better large data set performance.
The size of buffer for use in sequence files. The size of this buffer should
probably be a multiple of hardware page size (4096 on Intel x86), and it
determines how much data is buffered during read and write operations.
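A candidate buffer size can be sanity-checked for page-size alignment with a short sketch (the 65536-byte value below is only an illustrative choice, not a default taken from this charm):

```python
PAGE_SIZE = 4096  # hardware page size on Intel x86

def is_page_aligned(buffer_size: int, page_size: int = PAGE_SIZE) -> bool:
    """Return True when the buffer size is a positive whole multiple of the page size."""
    return buffer_size > 0 and buffer_size % page_size == 0

# 65536 bytes (64 KiB) is exactly 16 pages.
print(is_page_aligned(65536))  # True
print(is_page_aligned(50000))  # False: 50000 is not a multiple of 4096
```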
dfs_datanode_max_xcievers:
The number of files that a datanode will serve at any one time.
A Hadoop HDFS datanode has an upper bound on the number of files that it
will serve at any one time. This defaults to 256 (which is low) in Hadoop
1.x; this charm increases it to 4096.
mapreduce_framework_name:
Execution framework set to Hadoop YARN. **DO NOT CHANGE**
mapreduce_reduce_shuffle_parallelcopies:
The default number of parallel transfers run by reduce during the copy (shuffle) phase.
mapred_child_java_opts:
Java opts for the task tracker child processes. The following symbol,
if present, will be interpolated: @taskid@ is replaced by current TaskID.
Any other occurrences of '@' will go unchanged. For example, to enable
verbose gc logging to a file named for the taskid in /tmp and to set
the heap maximum to be a gigabyte, pass a 'value' of:
-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc
The configuration variable mapred.child.ulimit can be used to control
the maximum virtual memory of the child processes.
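The @taskid@ interpolation described above can be mimicked with a short sketch (Hadoop performs this substitution itself; the function here only illustrates the documented behaviour, and the task ID is a hypothetical example):

```python
def interpolate_child_opts(opts: str, task_id: str) -> str:
    """Replace every @taskid@ with the current task ID; other '@' characters are untouched."""
    return opts.replace("@taskid@", task_id)

opts = "-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc"
print(interpolate_child_opts(opts, "attempt_200707121733_0003_m_000005_0"))
# -Xmx1024m -verbose:gc -Xloggc:/tmp/attempt_200707121733_0003_m_000005_0.gc
```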
mapreduce_task_io_sort_factor:
The number of streams to merge at once while sorting files. This
determines the number of open file handles.
mapreduce_task_io_sort_mb:
The memory limit, in megabytes, used while sorting data; higher values improve efficiency.
mapred_job_tracker_handler_count:
The number of server threads for the JobTracker. This should be roughly
4% of the number of tasktracker nodes.
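The 4% rule of thumb works out as in this sketch (the floor of 10 is an assumption mirroring Hadoop's default handler count, not something this charm specifies):

```python
def jobtracker_handler_count(num_tasktrackers: int, minimum: int = 10) -> int:
    """Roughly 4% of the tasktracker count, with an assumed floor (Hadoop's default is 10)."""
    return max(minimum, round(0.04 * num_tasktrackers))

for nodes in (50, 250, 1000):
    print(nodes, "->", jobtracker_handler_count(nodes))
# 50 -> 10, 250 -> 10, 1000 -> 40
```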
tasktracker_http_threads:
The number of worker threads for the HTTP server. This is used for map output fetching.
default: /usr/local/hadoop/data
The directory under which all other Hadoop data is stored. Use this
to take advantage of extra storage that might be available.
You can change this in a running deployment but all existing data in
HDFS will be inaccessible; you can of course switch it back if you
need to access that data again.
yarn_nodemanager_aux-services:
default: mapreduce_shuffle
Shuffle service that needs to be set for Map Reduce applications.
yarn_nodemanager_aux-services_mapreduce_shuffle_class:
default: org.apache.hadoop.mapred.ShuffleHandler
The class implementing the shuffle service configured above.
dfs_heartbeat_interval:
Determines the datanode heartbeat interval in seconds.
dfs_namenode_heartbeat_recheck_interval:
Determines the datanode recheck heartbeat interval in milliseconds.
It is used to calculate the final timeout value for the namenode, as follows:
timeout = 2 * dfs.namenode.heartbeat.recheck-interval + 10 * 1000 * dfs.heartbeat.interval.
With the defaults (dfs.namenode.heartbeat.recheck-interval = 5*60*1000 ms,
dfs.heartbeat.interval = 3 s) this gives 630000 ms, i.e. 10 minutes 30 seconds.
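The timeout calculation can be checked with a short sketch (a plain restatement of the arithmetic above, not code taken from Hadoop):

```python
def namenode_timeout_ms(recheck_interval_ms: int = 5 * 60 * 1000,
                        heartbeat_interval_s: int = 3) -> int:
    """Final datanode-liveness timeout used by the namenode, in milliseconds."""
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s

timeout = namenode_timeout_ms()
print(timeout, "ms =", timeout / 60000, "minutes")  # 630000 ms = 10.5 minutes
```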