# Juju Architectural Overview

## Audience

This document is targeted at new developers of Juju, and may be useful to experienced
developers who need a refresher on some aspect of juju's operation. It is deliberately
light on detail, because the precise mechanisms of various components' operation are
expected to change much faster than the general interactions between components.


## The View From Space

A Juju environment is a distributed system comprising:

  * A data store (mongodb) which describes the desired state of the world, in terms
    of running workloads or *services*, and the *relations* between them; and of the
    *units* that comprise those services, and the *machines* on which those units run.

  * A bunch of *agents*, each of which runs the same `jujud` binary, and which are
    variously responsible for causing reality to converge towards the idealised world-
    state encoded in the data store.

  * Some number of *clients* which talk over an API, implemented by the agents, to
    update the desired world-state (and thereby cause the agents to update the world
    to match). The `juju` binary is one of many possible clients; the `juju-gui` web
    application, and the `juju-deployer` python tool, are other examples.

The whole system depends upon a substrate, or *provider*, which supplies the compute,
storage, and network resources used by the workloads (and by juju itself; but never
forget that *everything* described in this document is merely supporting infrastructure
geared towards the successful deployment and configuration of the workloads that solve
actual problems for actual users).


## The Data Store

There's a lot of *detail* to cover, but there's not much to say from an architectural
standpoint. We use a mongodb replicaset to support HA; we use the `mgo` package from
`labix.org` to implement multi-document transactions; we make use of the transaction
log to detect changes to particular documents, and convert them into business-object-
level events that get sent over the API to interested parties.
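
By way of illustration, here is a minimal sketch of that pattern, with hypothetical names (the real interfaces live in `state` and `state/watcher`): a watcher collapses low-level transaction-log changes into "something you care about changed" events on a channel, and the consumer re-reads the affected entity when an event arrives.

```go
package sketch

// NotifyWatcher is the shape of the pattern described above: it reports
// that something relevant changed, without saying how; consumers re-read
// the entity in question when an event arrives. Names are illustrative;
// the real interfaces live in the state and state/watcher packages.
type NotifyWatcher interface {
	Changes() <-chan struct{}
	Stop() error
	Err() error
}

// watchLoop shows how such a watcher is typically consumed: each event
// triggers a reload of the business object the watcher covers.
func watchLoop(w NotifyWatcher, reload func() error) error {
	defer w.Stop()
	for {
		if _, ok := <-w.Changes(); !ok {
			// The watcher was stopped or broke; Err says which.
			return w.Err()
		}
		if err := reload(); err != nil {
			return err
		}
	}
}
```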

The mongodb databases run on machines we refer to as *state servers*, and are
only accessed by agents running on those machines; it's important to keep that
access locked down (and, honestly, to lock it down further and better than we
currently have).

There's some documentation on how to work with [the state package](hacking-state);
and plenty more on the [state entities](lifecycles) and the details of their
[creation](entity-creation) and [destruction](death-and-destruction) from various
perspectives; but there's not a lot more to say in this context.

It *is* important to understand that the transaction-log watching is not an ideal
solution, and we'll be retiring it at some point, in favour of an in-memory model
of state and a pub-sub system for watchers; we *know* it's a scalability problem,
but we're not devoting resources to it until it becomes more pressing.

Code for dealing with mongodb is found primarily in the `state`, `state/watcher`,
`replicaset`, and `worker/peergrouper` packages.


## The Agents

Agents all use the same `jujud` binary, and all follow roughly the same model.
When starting up, they authenticate with an API server; possibly reset their
password, if the one they used has been stored persistently somewhere and is
thus vulnerable; determine their responsibilities; and run a set of tasks in
parallel until one of those tasks returns an error indicating that the agent
should either restart or terminate completely. Tasks that return any other error
will be automatically restarted after a short delay; tasks that return nil are
considered to be complete, and will not be restarted until the whole process is.
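
A minimal sketch of that restart policy, assuming a couple of hypothetical helper names (the real logic lives in the `worker` package's Runner):

```go
package sketch

import "time"

// runTask is a simplified sketch of the restart policy described above;
// names are illustrative, not the real worker.Runner implementation.
// isFatal reports whether an error means "restart or terminate the whole
// agent" rather than "restart just this task".
func runTask(task func() error, isFatal func(error) bool, retryDelay time.Duration) error {
	for {
		err := task()
		if err == nil {
			// Finished cleanly: don't restart until the whole
			// agent process does.
			return nil
		}
		if isFatal(err) {
			// Propagate upward: the agent should restart or stop.
			return err
		}
		// Any other error: restart the task after a short delay.
		time.Sleep(retryDelay)
	}
}
```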

When comparing the unit agent with the machine agent, the truth of the above
may not be immediately apparent, because the responsibilities of the unit
agent are so much less varied than those of the machine agent; but we have
scheduled work to integrate the unit agent into the machine agent, rendering
each unit agent a single worker task within its responsible machine agent. It's
still better to consider a unit agent to be a simplistic and/or degenerate
implementation of a machine agent than to attach too much importance to the
differences.


### Jobs, Runners, and Workers

Machine agents all have at least one of two jobs: JobHostUnits and JobManageEnviron.
Each of these jobs represents a number of tasks the agent needs to execute to
fulfil its responsibilities; in addition, there are a number of tasks that are
executed by every machine agent. The terms *task* and *worker* are generally used
interchangeably in this document and in the source code; it's possible but not
generally helpful to draw the distinction that a worker executes a task. All
tasks are implemented by code in some subpackage of the `worker` package, and the
`worker.Runner` type implements the retry behaviour described above.
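
For orientation, here is a rough sketch of the shapes involved; the real `worker` package differs in detail, so treat the names and signatures as illustrative.

```go
package sketch

// Worker is, roughly, the shape every task boils down to: something that
// can be asked to stop and waited on. The real definitions live in the
// worker package and differ in detail.
type Worker interface {
	Kill()       // ask the worker to stop
	Wait() error // block until it has stopped, and report its error
}

// Runner starts named workers and applies the retry policy described
// earlier; crucially, a Runner is itself a Worker, which is what allows
// runners to be nested inside one another.
type Runner interface {
	Worker
	StartWorker(id string, start func() (Worker, error)) error
	StopWorker(id string) error
}
```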

It's useful to note that the Runner type is itself a worker, so we can and do
nest Runners inside one another; the details of *exactly* how and where a given
worker comes to be executed are generally elided in this document; but it's worth
being aware of the fact that all the workers that use an API connection share a
single one, mediated by a single Runner, such that when the API connection fails
that single Runner can stop all its workers; shut itself down; be restarted by
its parent worker; and set up a new API connection, which it then uses to start
all its child workers.

Please note that the lists of workers below should *not* be assumed to be
exhaustive. Juju evolves, and the workers evolve with it.


### Common workers

All agents run workers with the following responsibilities:

  * Check for scheduled upgrades for their binaries, and replace themselves
    (implemented in `worker/upgrader`)
  * Watch logging config, and reconfigure the local logger (`worker/logger`; yes,
    we know; it is not the stupidest name in the codebase)
  * Watch and store the latest known addresses for the state servers
    (`worker/apiaddressupdater`)
  * Watch and store rsyslog targets (`worker/rsyslog`) -- this *could* probably be
    reasonably combined with the apiaddressupdater, but there's no guarantee that
    the two sets of machines will be the same for ever.

### Machine Agent Workers

Machine agents additionally do the following:

  * Run upgrade code in the new binaries once they've replaced themselves
    (implemented directly in the machine agent's `upgradeWorker` method)
  * Handle SIGABRT and permanently stop the agent (`worker/terminationworker`)
  * Handle the machine entity's death and permanently stop the agent (`worker/machiner`)
  * Watch proxy config, and reconfigure the local machine (`worker/machineenvironmentworker`)
  * Watch for contained LXC or KVM machines and provision/decommission them
    (`worker/provisioner`)

*Almost* all machine agents have JobHostUnits -- the sole exception is the state
server in a local environment, which runs directly on the user's machine (not
in a container); we don't want to pollute their day-to-day working environment
by deploying charms there (especially because the local provider is explicitly
a development tool, and charms deployed there are disproportionately likely to
be flawed or incomplete). Those that do have the job run the `worker/deployer`
code, which watches for units assigned to the machine, and deploys/recalls
upstart configs for their respective unit agents as the units are assigned and
removed. We expect the deployer implementation to change to just run the unit
agents' workers directly in its own Runner.
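
For a sense of the abstraction involved, here is a sketch of the deployer's job, modelled on (but not guaranteed to match) the `worker/deployer` interfaces:

```go
package sketch

// Context captures the deployer's job as described above, with
// illustrative names: install or remove an init configuration (upstart,
// at the time of writing) for each unit agent assigned to this machine.
type Context interface {
	DeployUnit(unitName, initialPassword string) error
	RecallUnit(unitName string) error
	DeployedUnits() ([]string, error)
}
```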

There remain a couple of abominations in which the machine agent looks up
information it really shouldn't have access to -- i.e. the running provider type
-- and uses that information to decide whether to start other workers. These
instances should be killed with fire if you get any opportunity to do so;
basically all the information in agent config which *isn't* about contacting an
API server represents a layering violation that (1) confuses us and slows us
down and (2) just encourages worse layering violations as we progress.


### State Server Workers

Machines with JobManageEnviron also run a number of other workers, which do
the following:

  * Run the API server used by all other workers (in this, and other, agents:
    `state/apiserver`)
  * Provision/decommission provider instances in response to the creation/
    destruction of machine entities (`worker/provisioner`, just like the
    container provisioners run in all machine agents anyway)
  * Manipulate provider networks in response to units opening/closing ports,
    and users exposing/unexposing services (`worker/firewaller`)
  * Update network addresses and associated information for provider instances
    (`worker/instancepoller`)
  * Respond to queued DB cleanup events (`worker/cleaner`)
  * Maintain the MongoDB replica set (`worker/peergrouper`)
  * Resume incomplete MongoDB transactions (`worker/resumer`)

Many of these workers (more than strictly need to be) are wrapped as "singular"
workers, which only run on the same machine as the current MongoDB replicaset
master. When the master changes, the state connection is dropped, causing all
those workers to be stopped; and when they're restarted, they won't actually do
anything, because they're no longer on the master.
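
The "singular" idea can be sketched as follows, with illustrative names (the real wrapper lives in the `worker` tree and differs in detail):

```go
package sketch

// masterChecker reports whether this machine currently hosts the
// MongoDB replicaset master; the name is illustrative.
type masterChecker interface {
	IsMaster() (bool, error)
}

// runSingular captures the "singular" idea: do real work only on the
// master, and do nothing at all elsewhere, leaving the decision to be
// revisited the next time the surrounding runner restarts the worker.
func runSingular(m masterChecker, task func() error) error {
	isMaster, err := m.IsMaster()
	if err != nil {
		return err
	}
	if !isMaster {
		return nil
	}
	return task()
}
```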


### Unit Agents

Unit agents run all the common workers, and the `worker/uniter` task as well;
this task is probably the single most forbiddingly complex part of Juju. (Side
note: It's a unit-er because it deals with units, and we're bad at names; but
it's also a unite-r because it's where all the various components of juju come
together to run actual workloads.) It's sufficiently large that it deserves its
own top-level heading, below.


## The Uniter

At the highest level, the Uniter is a state machine. After a "little bit" of setup,
it runs a tight loop in which it calls `Mode` functions one after another, with the
next mode run determined by the result of its predecessor. All mode functions are
implemented in `worker/uniter/modes.go`, which is actually pretty small: just a hair
over 400 lines.
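
Stripped of all its state, the control flow amounts to something like this sketch (the real types live in `worker/uniter`):

```go
package sketch

// Uniter's real state (filter, charm deployer, hook state, ...) is
// elided here; this is only a sketch of the control flow.
type Uniter struct{}

// Mode is one state of the uniter's state machine: it does some work and
// returns the next mode to run, or an error that stops the uniter.
type Mode func(u *Uniter) (Mode, error)

// loop drives the machine until a mode fails (or, in this sketch,
// returns no successor).
func (u *Uniter) loop(start Mode) error {
	mode := start
	var err error
	for err == nil && mode != nil {
		mode, err = mode(u)
	}
	return err
}
```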

It's deliberately implemented as conceptually single-threaded (just like almost
everything else in juju -- rampaging concurrency is the root of much evil, and so
we save ourselves a huge number of headaches by hiding concurrency behind event
channels and handling a single event at a time), but this property has degraded
over time; in particular, the `RunListener` code can inject events at unhelpful
times, and while the `hookLock` *probably* renders this safe it's still deeply
suboptimal, because the fact of the concurrency requires that we be *extremely*
careful with further modifications, lest they subtly break assumptions. We hope
to address this by retiring the current implementation of `juju run`, but it's
not entirely clear how to do this; in the meantime, Here Be Dragons.

Leaving these woes aside, the mode functions make use of two fundamental components,
which are glommed together until someone refactors them to make more sense. There's
the `Filter`, which is responsible for communicating with the API server (and the
rest of the outside world) such that relevant events can be delivered to the mode
func via channels exposed on the filter; and then there's the `Uniter` itself, which
exposes a number of methods that are expected to be called by the mode funcs.
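
Roughly, and with illustrative names only, the Filter's side of that bargain looks like the sketch below; the real interface is larger and its event payloads are richer. A mode function typically sits in a `select` over a subset of these channels, calling back into the Uniter (to run hooks, deploy charms, and so on) as events arrive.

```go
package sketch

// Filter is a caricature of the event source described above: it watches
// the API server on the uniter's behalf and republishes what it learns
// as events on channels, which the mode functions select on.
type Filter interface {
	UnitDying() <-chan struct{}    // the unit has been asked to shut down
	ConfigEvents() <-chan struct{} // service config has changed
	RelationsEvents() <-chan []int // relations of interest have changed
	UpgradeEvents() <-chan string  // a charm upgrade is available
}
```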


### Uniter Modes

XXXX


### Hook contexts

XXXX


### The Relation Model

XXXX


## The APIs

State servers expose an API endpoint over a websocket connection. The methods
available over the API are grouped by caller: there's a `Client` facade that
exposes the methods used by clients, an `Agent` facade that exposes the methods
common to all agents, and a wide range of worker-specific *facades* that
individually deal with particular chunks of functionality implemented by one
agent or another (for example, `Provisioner`, `Upgrader`, and `Uniter`, each
used by the eponymous worker types).

Various facades share functionality; for example, the Life method is used by many
worker facades. In these cases, the method is implemented on a separate type, which
is embedded in the facade implementation.

All APIs *should* be implemented such that they can be called in bulk, but not
all of them are. The agent facades are (almost?) all implemented correctly, but
the Client facade is almost exclusively not. As functionality evolves, and new
versions of the client APIs are implemented, we must take care to implement them
consistently -- this means both implementing bulk calls *and* splitting the
monolithic Client facade into smaller service-specific facades, such that we
can evolve interaction with (say) users without bumping global API versions
across the board.
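
What "implemented in bulk" means in practice is sketched below, with types modelled loosely on the real `params` package (names and fields are illustrative): every argument and result is a slice, so a single round trip can cover many entities.

```go
package sketch

// Illustrative types: a bulk argument is a batch of entity tags, and the
// result carries one entry per tag, in order.
type Entity struct{ Tag string }

type Entities struct{ Entities []Entity }

type LifeResult struct {
	Life  string // "alive", "dying", or "dead"
	Error error
}

type LifeResults struct{ Results []LifeResult }

// SomeFacade stands in for any worker facade; lookupLife is elided.
type SomeFacade struct{}

func (f *SomeFacade) lookupLife(tag string) (string, error) { return "alive", nil }

// Life is the kind of shared, bulk-friendly method described above:
// given a batch of entity tags, it returns one result per tag, in order.
func (f *SomeFacade) Life(args Entities) (LifeResults, error) {
	results := LifeResults{Results: make([]LifeResult, len(args.Entities))}
	for i, e := range args.Entities {
		life, err := f.lookupLife(e.Tag)
		results.Results[i] = LifeResult{Life: life, Error: err}
	}
	return results, nil
}
```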


## The Providers

Each provider represents a different possible kind of substrate on which a juju
environment can run, and (as far as possible) abstracts away the differences
between them, by making them all conform to the Environ interface. The most
important thing to understand about the various providers is that they're all
implemented without reference to broader juju concepts; they are squeezed into
a shape that's convenient WRT allowing juju to make use of them, but if we allow
juju-level concepts to infect the providers we will suffer greatly, because we
will open a path by which changes to *juju* end up causing changes to *all the
providers at once*.
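
The following is an illustrative subset only, not the real interface (which lives in the `environs` package and has richer signatures); the point is the shape: generic requests for compute, networking, and lifecycle operations that each provider maps onto its own substrate, with no juju-level concepts leaking in.

```go
package sketch

// Instance is a running machine on the substrate, as the provider sees it.
type Instance interface {
	Id() string
	Addresses() ([]string, error)
}

// Environ is the provider-facing abstraction, heavily simplified.
type Environ interface {
	// Provision and decommission compute resources.
	StartInstance(series string) (Instance, error)
	StopInstances(ids []string) error
	AllInstances() ([]Instance, error)

	// Manipulate substrate-level firewalling.
	OpenPorts(ports []string) error
	ClosePorts(ports []string) error

	// Tear the whole environment down.
	Destroy() error
}
```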

However, we lack the ability to enforce this at present, because the package
dependency flows in the wrong direction, thanks primarily (purely?) to the
StateInfo method on Environ; and we jam all sorts of gymnastics into the state
package to allow us to use Environs without doing so explicitly (see the
state.Policy interface, and its many somewhat-inelegant uses). In other places,
we have (quite reasonably) moved code out of the environs package (see both
environs/config.Config, and instance.Instance).

Environ implementations are expected to be goroutine-safe; we don't make much
use of that property at the moment, but we will be coming to depend
upon it as we move to eliminate the wasteful proliferation of Environ instances
in the API server.

It's important to note that an environ Config will generally contain sensitive
information -- a user's authentication keys for a cloud provider -- and so we
must always be careful to avoid spreading those around further than we need to.
Basically, if an environ config gets off a state server, we've screwed up.


## Bootstrapping

XXXX