~smoser/cloud-init/lp-1077700

Viewing changes to cloudinit/user_data.py

Committer: Scott Moser
Date: 2012-07-06 21:19:37 UTC
mfrom: (559.2.436 cloud-init)
Revision ID: smoser@ubuntu.com-20120706211937-i4bhe6ncje8vg0m7

Merge rework branch in [Joshua Harlow]

- unified binary that activates the various stages
   - Now using argparse + subcommands to specify the various CLI options
- a stage module that clearly separates the stages of the different
   components (also described how they are used and in what order in the
   new unified binary)
- user_data is now a module that just does user data processing while the
   actual activation and 'handling' of the processed user data is done via
   a separate set of files (and modules) with the main 'init' stage being the
   controller of this
   - creation of boot_hook, cloud_config, shell_script, upstart_job version 2
     modules (with classes that perform their functionality) instead of those
     having functionality that is attached to the cloudinit object (which
     reduces reuse and limits future functionality, and makes testing harder)
- removal of global config that defined paths, shared config, now this is
   via objects, which simplifies unit testing and makes global side-effects a non-issue
- creation of a 'helpers.py'
   - this contains an abstraction for the 'lock' like objects that the various
     module/handler running stages use to avoid re-running a given
     module/handler for a given frequency. this makes it separated from
     the actual usage of that object (thus helpful for testing and for drawing
     a clear line between usage and how the actual job is accomplished)
     - a common 'runner' class is the main entrypoint using these locks to
       run function objects passed in (along with their arguments) and their
       frequency
   - add in a 'paths' object that provides access to the previously global
     and/or config based paths (thus providing a single entrypoint object/type
     that provides path information)
     - this also adds in the ability to change the path when constructing
       that path 'object' and adding in additional config that can be used to
       alter the root paths of 'joins' (useful for testing or possibly useful
       in chroots?)
       - config options now available that can alter the 'write_root' and the
         'read_root' when backing code uses the paths join() function
   - add a config parser subclass that will automatically add unknown sections
     and return default values (instead of throwing exceptions for these cases)
   - a new config merging class that will be the central object that knows
     how to do the common configuration merging from the various configuration
     sources. The order is the following:
     - cli config files override environment config files
       which override instance configs which override datasource
       configs which override base configuration which overrides
       default configuration.
- remove the passing around of the 'cloudinit' object as a 'cloud' variable
   and instead pass around an 'interface' object that can be given to modules
   and handlers as their cloud access layer while the backing of that
   object can be varied (good for abstraction and testing)
- use a single set of functions to do importing of modules
- add a function which will search for a given set of module names with
   a given set of attributes and return those which are found
- refactor logging so that instead of using a single top level 'log' that
   instead each component/module can use its own logger (if desired), this
   should be backwards compatible with handlers and config modules that used
   the passed in logger (it's still passed in)
   - ensure that in all places where exceptions are caught, the util
     logexc() is called where applicable, so that no exceptions that may
     occur are dropped without first being logged (where it makes sense for
     this to happen)
- add a 'requires' file that lists cloud-init dependencies
   - applying it in package creation (bdeb and brpm) as well as using it
     in the modified setup.py to ensure dependencies are installed when
     using that method of packaging
- add a 'version.py' that lists the active version (in code) so that code
   inside cloud-init can report the version in messaging and other config files
- cleanup of subprocess usage so that all subprocess calls go through the
   subp() utility method, which now has an exception type that will provide
   detailed information on python 2.6 and 2.7
- forced all code loading, moving, chmod, writing files and other system
   level actions to go through standard set of util functions, this greatly
   helps in debugging and determining exactly which system actions cloud-init is
   performing
- switching out the templating engine cheetah for tempita since tempita has
   no external dependencies (minus python) while cheetah has many dependencies
   which makes it more difficult to adopt cloud-init in distros that may not
   have those dependencies
- adjust url fetching and url trying to go through a single function that
   reads urls in the new 'url helper' file, this helps in tracing, debugging
   and knowing which urls are being called and/or posted to from within
   cloud-init code
   - add in the sending of a 'User-Agent' header for all urls fetched that
     do not provide their own header mapping, derive this user-agent from
     the following template, 'Cloud-Init/{version}' where the version is the
     cloud-init version number
- using prettytable for netinfo 'debug' printing since it provides a standard
   and defined output that should be easier to parse than a custom format
- add a set of distro specific classes, that handle distro specific actions
   that modules and/or handler code can use as needed, this is organized into
   a base abstract class with child classes that implement the shared
   functionality. config determines exactly which subclass to load, so it can
   be easily extended as needed.
   - current functionality
      - network interface config file writing
      - hostname setting/updating
      - locale/timezone setting
      - updating of /etc/hosts (with templates or generically)
      - package commands (ie installing, removing)/mirror finding
      - interface up/down activating
   - implemented a debian + ubuntu subclass
   - implemented a redhat + fedora subclass
- adjust the root 'cloud.cfg' file to now have distribution/path specific
   configuration values in it. these special configs are merged as the normal
   config is, but the system level config is not passed into modules/handlers
   - modules/handlers must go through the path and distro object instead
- have the cloudstack datasource test the url before calling into boto to
   avoid the long wait for boto to finish retrying and finally fail when
   the gateway meta-data address is unavailable
- add a simple mock ec2 meta-data python based http server that can serve a
   very simple set of ec2 meta-data back to callers
      - useful for testing or for understanding what the ec2 meta-data
        service can provide in terms of data or functionality
- for ssh key and authorized key file parsing add in classes and util functions
   that maintain the state of individual lines, allowing for a clearer
   separation of parsing and modification (useful for testing and tracing)
- add a set of 'base' init.d scripts that can be used on systems that do
   not have full upstart or systemd support (or support that does not match
   the standard fedora/ubuntu implementation)
   - currently these are being tested on RHEL 6.2
- separate the datasources into their own subdirectory (instead of being
   a top-level item), this matches how config 'modules' and user-data 'handlers'
   are also in their own subdirectory (thus helping new developers and others
   understand the code layout in a quicker manner)
- add the building of rpms based off a new cli tool and template 'spec' file
   that will templatize and perform the necessary commands to create a source
   and binary package to be used with a cloud-init install on a 'rpm' supporting
   system
   - uses the new standard set of requires and converts those pypi requirements
     into a local set of package requirements (that are known to exist on RHEL
     systems but should also exist on fedora systems)
- adjust the bdeb builder to be a python script (instead of a shell script) and
   make its 'control' file a template that takes in the standard set of pypi
   dependencies and uses a local mapping (known to work on ubuntu) to create the
   package's set of dependencies (that should also work on ubuntu-like systems)
- pythonify a large set of various pieces of code
   - remove wrapping return statements with () when it has no effect
   - upper case all constants used
   - correctly 'case' class and method names (where applicable)
   - use os.path.join (and similar commands) instead of custom path creation
   - use 'is None' instead of the frowned upon '== None' which picks up a larger
     set of 'true' cases than is typically desired (ie for objects that have
     their own equality)
   - use context managers on locks, tempdir, chdir, file, selinux, umask,
     unmounting commands so that these actions do not have to be closed and/or
     cleaned up manually in finally blocks, which is typically not done and will
     eventually be a bug in the future
   - use the 'abc' module for abstract base classes where possible
      - applied in the datasource root class, the distro root class, and the
        user-data v2 root class
- when loading yaml, check that the 'root' type matches a predefined set of
   valid types (typically just 'dict') and throw a type error if a mismatch
   occurs, this seems to be a good idea to do when loading user config files
- when forking a long running task (ie resizing a filesystem) use a new util
   function that will fork and then call a callback, instead of having to
   implement all that code in a non-shared location (thus allowing it to be
   used by others in the future)
- when writing out filenames, go through a util function that will attempt to
   ensure that the given filename is 'filesystem' safe by replacing '/' with
   '_' and removing characters which do not match a given whitelist of allowed
   filename characters
- for the varying usages of the 'blkid' command make a function in the util
   module that can be used as the single point of entry for interaction with
   that command (and its results) instead of having X separate implementations
- place the rfc 2822 time formatting and uptime repeated pieces of code in the
   util module as a set of functions named 'time_rfc2822'/'uptime'
- separate the pylint+pep8 calling from one tool into two individual tools so
   that they can be called independently, add make file sections that can be
   used to call these independently
- remove the support for the old style config that was previously located in
   '/etc/ec2-init/ec2-config.cfg', no longer supported!
- instead of using an altered config parser that added its own 'dummy' section
   on in the 'mcollective' module, use configobj which handles the parsing of
   config without sections better (and it also maintains comments instead of
   removing them)
- use the new defaulting config parser (that will not raise errors on sections
   that do not exist or return errors when values are fetched that do not exist)
   in the 'puppet' module
- for config 'modules' add in the ability for the module to provide a list of
   distro names which it is known to work with, if when run the name of the
   distro being used does not match one of those in this list, a warning will be
   written out saying that this module may not work correctly on this distribution
- for all dynamically imported modules ensure that they are fixed up before
   they are used by ensuring that they have certain attributes, if they do not
   have those attributes they will be set to a sensible set of defaults instead
- adjust all 'config' modules and handlers to use the adjusted util functions
   and the new distro objects where applicable so that those pieces of code can
   benefit from the unified and enhanced functionality being provided in that
   util module
- fix a potential bug whereby when a #includeonce was encountered it would
   enable checking of urls against a cache, if later a #include was encountered
   it would continue checking against that cache, instead of refetching (which
   would likely be the expected case)
- add an openstack/nova based pep8 extension utility ('hacking.py') that allows
   for custom checks (along with the standard pep8 checks) to occur when running
   'make pep8' and its derivatives
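
The configuration precedence listed in the merge message (cli over environment over instance over datasource over base over default) can be illustrated with a minimal sketch. This is not the merging class from the branch: the names `merge_configs` and `_merge`, and the recursive handling of nested dicts, are assumptions made for illustration, written for Python 3.

```python
def _merge(base, override):
    """Return a new dict where values from override win over base."""
    result = dict(base)
    for key, val in override.items():
        if isinstance(val, dict) and isinstance(result.get(key), dict):
            # Assumption: nested dicts are merged key by key.
            result[key] = _merge(result[key], val)
        else:
            result[key] = val
    return result


def merge_configs(sources):
    """Merge config dicts given in priority order, highest priority first."""
    merged = {}
    # Apply the lowest priority source first so later sources override it.
    for cfg in reversed(sources):
        merged = _merge(merged, cfg)
    return merged


# Hypothetical config values, ordered from highest to lowest priority.
cli_cfg = {"paths": {"cloud_dir": "/tmp/cloud"}}
default_cfg = {"paths": {"cloud_dir": "/var/lib/cloud", "templates_dir": "/etc/cloud/templates"}}
final_cfg = merge_configs([cli_cfg, default_cfg])
```

With these inputs the cli value replaces `cloud_dir` while the default `templates_dir` is kept.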
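
The filename cleaning step described in the merge message (replace '/' with '_', then drop characters outside a whitelist) could look roughly like the sketch below. The function name `clean_filename` and the exact whitelist are hypothetical; the message does not state which characters the real util function allows.

```python
import string

# Hypothetical whitelist; the actual set of allowed characters is not
# given in the merge message.
ALLOWED_CHARS = set(string.ascii_letters + string.digits + "_-.")


def clean_filename(name):
    """Make a name safer to use as a single filesystem path component."""
    # Path separators become underscores so the result stays in one directory.
    name = name.replace("/", "_")
    # Anything not explicitly whitelisted is removed.
    return "".join(ch for ch in name if ch in ALLOWED_CHARS)
```

A whitelist is used here rather than a blacklist so that unexpected characters are removed by default.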

files added:
Requires

bin/cloud-init

cloudinit/cloud.py

cloudinit/distros

cloudinit/distros/__init__.py

cloudinit/distros/debian.py

cloudinit/distros/fedora.py

cloudinit/distros/rhel.py

cloudinit/distros/ubuntu.py

cloudinit/handlers

cloudinit/handlers/__init__.py

cloudinit/handlers/boot_hook.py

cloudinit/handlers/cloud_config.py

cloudinit/handlers/shell_script.py

cloudinit/handlers/upstart_job.py

cloudinit/helpers.py

cloudinit/importer.py

cloudinit/log.py

cloudinit/settings.py

cloudinit/sources

cloudinit/sources/__init__.py

cloudinit/stages.py

cloudinit/templater.py

cloudinit/url_helper.py

cloudinit/user_data.py

cloudinit/version.py

packages

packages/brpm

packages/make-tarball

packages/redhat

packages/redhat/cloud-init.spec

sysvinit

sysvinit/cloud-config

sysvinit/cloud-final

sysvinit/cloud-init

sysvinit/cloud-init-local

templates/hosts.redhat.tmpl

tests/configs

tests/configs/sample1.yaml

tests/unittests/test_builtin_handlers.py

tools/hacking.py

tools/mock-meta.py

tools/read-dependencies

tools/read-version

tools/run-pep8

files removed:
cloud-init-cfg.py

cloud-init-query.py

cloud-init.py

cloudinit/DataSource.py

cloudinit/UserDataHandler.py

install.sh

templates/default-locale.tmpl

files renamed:
cloudinit/CloudConfig/ => cloudinit/config/

cloudinit/DataSourceCloudStack.py => cloudinit/sources/DataSourceCloudStack.py

cloudinit/DataSourceConfigDrive.py => cloudinit/sources/DataSourceConfigDrive.py

cloudinit/DataSourceEc2.py => cloudinit/sources/DataSourceEc2.py

cloudinit/DataSourceMAAS.py => cloudinit/sources/DataSourceMAAS.py

cloudinit/DataSourceNoCloud.py => cloudinit/sources/DataSourceNoCloud.py

cloudinit/DataSourceOVF.py => cloudinit/sources/DataSourceOVF.py

cloudinit/SshUtil.py => cloudinit/ssh_util.py

tools/bddeb => packages/bddeb

debian.trunk/ => packages/debian/

tools/make-dist-tarball => packages/make-dist-tarball

templates/hosts.tmpl => templates/hosts.ubuntu.tmpl

files modified:
ChangeLog

Makefile

TODO

cloudinit/__init__.py

cloudinit/config/__init__.py

cloudinit/config/cc_apt_pipelining.py

cloudinit/config/cc_apt_update_upgrade.py

cloudinit/config/cc_bootcmd.py

cloudinit/config/cc_byobu.py

cloudinit/config/cc_ca_certs.py

cloudinit/config/cc_chef.py

cloudinit/config/cc_disable_ec2_metadata.py

cloudinit/config/cc_final_message.py

cloudinit/config/cc_foo.py

cloudinit/config/cc_grub_dpkg.py

cloudinit/config/cc_keys_to_console.py

cloudinit/config/cc_landscape.py

cloudinit/config/cc_locale.py

cloudinit/config/cc_mcollective.py

cloudinit/config/cc_mounts.py

cloudinit/config/cc_phone_home.py

cloudinit/config/cc_puppet.py

cloudinit/config/cc_resizefs.py

cloudinit/config/cc_rightscale_userdata.py

cloudinit/config/cc_rsyslog.py

cloudinit/config/cc_runcmd.py

cloudinit/config/cc_salt_minion.py

cloudinit/config/cc_scripts_per_boot.py

cloudinit/config/cc_scripts_per_instance.py

cloudinit/config/cc_scripts_per_once.py

cloudinit/config/cc_scripts_user.py

cloudinit/config/cc_set_hostname.py

cloudinit/config/cc_set_passwords.py

cloudinit/config/cc_ssh.py

cloudinit/config/cc_ssh_import_id.py

cloudinit/config/cc_timezone.py

cloudinit/config/cc_update_etc_hosts.py

cloudinit/config/cc_update_hostname.py

cloudinit/netinfo.py

cloudinit/util.py

config/cloud.cfg

config/cloud.cfg.d/05_logging.cfg

packages/debian/changelog

packages/debian/control

packages/debian/rules

setup.py

templates/chef_client.rb.tmpl

templates/sources.list.tmpl

tests/unittests/test__init__.py

tests/unittests/test_datasource/test_maas.py

tests/unittests/test_handler/test_handler_ca_certs.py

tests/unittests/test_userdata.py

tests/unittests/test_util.py

tools/run-pylint

upstart/cloud-config.conf

upstart/cloud-final.conf

upstart/cloud-init-local.conf

upstart/cloud-init.conf


cloudinit/user_data.py

# vi: ts=4 expandtab

# Author: Scott Moser <scott.moser@canonical.com>
# Author: Juerg Haefliger <juerg.haefliger@hp.com>
# Author: Joshua Harlow <harlowja@yahoo-inc.com>

# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 3, as
# published by the Free Software Foundation.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

import os

import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.base import MIMEBase

from cloudinit import handlers
from cloudinit import log as logging
from cloudinit import url_helper
from cloudinit import util

LOG = logging.getLogger(__name__)

# Constants copied in from the handler module
NOT_MULTIPART_TYPE = handlers.NOT_MULTIPART_TYPE
PART_FN_TPL = handlers.PART_FN_TPL
OCTET_TYPE = handlers.OCTET_TYPE

# Saves typing errors
CONTENT_TYPE = 'Content-Type'

# Various special content types that cause special actions
TYPE_NEEDED = ["text/plain", "text/x-not-multipart"]
INCLUDE_TYPES = ['text/x-include-url', 'text/x-include-once-url']
ARCHIVE_TYPES = ["text/cloud-config-archive"]
UNDEF_TYPE = "text/plain"
ARCHIVE_UNDEF_TYPE = "text/cloud-config"

# Msg header used to track attachments
ATTACHMENT_FIELD = 'Number-Attachments'

class UserDataProcessor(object):
    def __init__(self, paths):
        self.paths = paths

    def process(self, blob):
        base_msg = convert_string(blob)
        process_msg = MIMEMultipart()
        self._process_msg(base_msg, process_msg)
        return process_msg

    def _process_msg(self, base_msg, append_msg):
        for part in base_msg.walk():
            # multipart/* are just containers
            if part.get_content_maintype() == 'multipart':
                continue

            ctype = None
            ctype_orig = part.get_content_type()
            payload = part.get_payload(decode=True)

            if not ctype_orig:
                ctype_orig = UNDEF_TYPE

            if ctype_orig in TYPE_NEEDED:
                ctype = handlers.type_from_starts_with(payload)

            if ctype is None:
                ctype = ctype_orig

            if ctype in INCLUDE_TYPES:
                self._do_include(payload, append_msg)
                continue

            if ctype in ARCHIVE_TYPES:
                self._explode_archive(payload, append_msg)
                continue

            # NOTE: the header is updated on base_msg, not on part
            if CONTENT_TYPE in base_msg:
                base_msg.replace_header(CONTENT_TYPE, ctype)
            else:
                base_msg[CONTENT_TYPE] = ctype

            self._attach_part(append_msg, part)

    def _get_include_once_filename(self, entry):
        entry_fn = util.hash_blob(entry, 'md5', 64)
        return os.path.join(self.paths.get_ipath_cur('data'),
                            'urlcache', entry_fn)

    def _do_include(self, content, append_msg):
        # Include a list of urls, one per line
        # also support '#include <url here>'
        # or #include-once '<url here>'
        include_once_on = False
        for line in content.splitlines():
            lc_line = line.lower()
            if lc_line.startswith("#include-once"):
                line = line[len("#include-once"):].lstrip()
                # Every following include will now
                # not be refetched.... but will be
                # re-read from a local urlcache (if it worked)
                include_once_on = True
            elif lc_line.startswith("#include"):
                line = line[len("#include"):].lstrip()
                # Disable the include once if it was on
                # if it wasn't, then this has no effect.
                include_once_on = False
            if line.startswith("#"):
                continue
            include_url = line.strip()
            if not include_url:
                continue

            include_once_fn = None
            content = None
            if include_once_on:
                include_once_fn = self._get_include_once_filename(include_url)
            if include_once_on and os.path.isfile(include_once_fn):
                content = util.load_file(include_once_fn)
            else:
                resp = url_helper.readurl(include_url)
                if include_once_on and resp.ok():
                    util.write_file(include_once_fn, str(resp), mode=0600)
                if resp.ok():
                    content = str(resp)
                else:
                    LOG.warn(("Fetching from %s resulted in"
                              " an invalid http code of %s"),
                             include_url, resp.code)

            if content is not None:
                new_msg = convert_string(content)
                self._process_msg(new_msg, append_msg)

    def _explode_archive(self, archive, append_msg):
        entries = util.load_yaml(archive, default=[], allowed=[list, set])
        for ent in entries:
            # ent can be one of:
            #  dict { 'filename' : 'value', 'content' :
            #       'value', 'type' : 'value' }
            #    filename and type may not be present
            # or
            #  scalar(payload)
            if isinstance(ent, (str, basestring)):
                ent = {'content': ent}
            if not isinstance(ent, (dict)):
                # TODO raise?
                continue

            content = ent.get('content', '')
            mtype = ent.get('type')
            if not mtype:
                mtype = handlers.type_from_starts_with(content,
                                                       ARCHIVE_UNDEF_TYPE)

            maintype, subtype = mtype.split('/', 1)
            if maintype == "text":
                msg = MIMEText(content, _subtype=subtype)
            else:
                msg = MIMEBase(maintype, subtype)
                msg.set_payload(content)

            if 'filename' in ent:
                msg.add_header('Content-Disposition',
                               'attachment', filename=ent['filename'])

            # Any remaining keys are copied over as extra headers
            for header in list(ent.keys()):
                if header in ('content', 'filename', 'type'):
                    continue
                msg.add_header(header, ent[header])

            self._attach_part(append_msg, msg)

    def _multi_part_count(self, outer_msg, new_count=None):
        """
        Return the number of attachments to this MIMEMultipart by looking
        at its 'Number-Attachments' header.
        """
        if ATTACHMENT_FIELD not in outer_msg:
            outer_msg[ATTACHMENT_FIELD] = '0'

        if new_count is not None:
            outer_msg.replace_header(ATTACHMENT_FIELD, str(new_count))

        fetched_count = 0
        try:
            fetched_count = int(outer_msg.get(ATTACHMENT_FIELD))
        except (ValueError, TypeError):
            outer_msg.replace_header(ATTACHMENT_FIELD, str(fetched_count))
        return fetched_count

    def _part_filename(self, _unnamed_part, count):
        return PART_FN_TPL % (count + 1)

    def _attach_part(self, outer_msg, part):
        """
        Attach a part to an outer message. outer_msg must be a MIMEMultipart.
        Modifies a header in the message to keep track of number of attachments.
        """
        cur_c = self._multi_part_count(outer_msg)
        if not part.get_filename():
            fn = self._part_filename(part, cur_c)
            part.add_header('Content-Disposition',
                            'attachment', filename=fn)
        outer_msg.attach(part)
        self._multi_part_count(outer_msg, cur_c + 1)


# Converts a raw string into a mime message
def convert_string(raw_data, headers=None):
    if not raw_data:
        raw_data = ''
    if not headers:
        headers = {}
    data = util.decomp_str(raw_data)
    if "mime-version:" in data[0:4096].lower():
        msg = email.message_from_string(data)
        for (key, val) in headers.iteritems():
            if key in msg:
                msg.replace_header(key, val)
            else:
                msg[key] = val
    else:
        mtype = headers.get(CONTENT_TYPE, NOT_MULTIPART_TYPE)
        maintype, subtype = mtype.split("/", 1)
        msg = MIMEBase(maintype, subtype, *headers)
        msg.set_payload(data)
    return msg
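
The `#include` / `#include-once` switching in `_do_include` (the fix mentioned at the end of the merge message) can be studied in isolation with a small sketch. It mirrors only the line parsing, not the fetching or caching. The generator name `parse_includes` is hypothetical and the code targets Python 3, unlike the Python 2 listing above.

```python
def parse_includes(content):
    """Yield (url, use_cache) pairs for each url line of an include part."""
    include_once_on = False
    for line in content.splitlines():
        lc_line = line.lower()
        if lc_line.startswith("#include-once"):
            line = line[len("#include-once"):].lstrip()
            # Following urls are read from the cache when possible.
            include_once_on = True
        elif lc_line.startswith("#include"):
            line = line[len("#include"):].lstrip()
            # A plain include switches caching back off.
            include_once_on = False
        if line.startswith("#"):
            # Any other comment line is skipped.
            continue
        url = line.strip()
        if not url:
            continue
        yield (url, include_once_on)


# Hypothetical user-data payload mixing both directives.
sample = "#include-once\nhttp://a\n#include\nhttp://b\n# comment\n\n#include-once http://c"
pairs = list(parse_includes(sample))
```

Here `http://a` and `http://c` are flagged for caching, while `http://b`, which follows a plain `#include`, is not.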
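
How `_explode_archive` turns archive entries into MIME parts can be sketched with only the standard library. This is a simplified illustration for Python 3, not the module's code: the function name `archive_to_multipart` is hypothetical, entries arrive as an already parsed list, and type sniffing via `handlers.type_from_starts_with`, extra headers, and attachment counting are left out.

```python
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def archive_to_multipart(entries, default_type="text/cloud-config"):
    """Build a multipart message from a list of archive entries."""
    outer = MIMEMultipart()
    for ent in entries:
        if isinstance(ent, str):
            # A bare scalar is treated as the content of an entry.
            ent = {"content": ent}
        if not isinstance(ent, dict):
            continue
        content = ent.get("content", "")
        # Simplification: fall back to a fixed default type.
        mtype = ent.get("type") or default_type
        maintype, subtype = mtype.split("/", 1)
        if maintype == "text":
            msg = MIMEText(content, _subtype=subtype)
        else:
            msg = MIMEBase(maintype, subtype)
            msg.set_payload(content)
        if "filename" in ent:
            msg.add_header("Content-Disposition", "attachment",
                           filename=ent["filename"])
        outer.attach(msg)
    return outer


# Hypothetical archive with one scalar entry and one dict entry.
parts = archive_to_multipart([
    "packages: [htop]",
    {"type": "text/x-shellscript", "content": "echo hi", "filename": "run.sh"},
]).get_payload()
```

Each entry becomes one attached part, so `parts` holds two messages whose content types come from the entry or from the default.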
