Hacking the juju-core/state package
===================================

This document remains a work in progress; it's an attempt to capture
the various conventions and things to bear in mind that aren't
necessarily written down anywhere else.

return values: ok vs err
------------------------

By convention, anything that could reasonably fail must use a separate
channel to communicate the failure. Broadly speaking, methods that can
fail in more than one way must return an error; those that can only
fail in one way (eg Machine.InstanceId: the value is either valid or
missing) are expected to return a bool for consistency's sake, even if
the type of the return value is such that failure can be signalled in-
band.

changes to entities
-------------------

Entity objects reflect remote state that may change at any time, but we
don't want to make that too obvious. By convention, the only methods
that should change an entity's in-memory document are as follows:

  * Refresh(), which should update every field.
  * Methods that set fields remotely should update only those fields
    locally.

The upshot of this is that it's not appropriate to call Refresh on any
argument to a state method, including receivers; if you're in a context
in which that would be helpful, you should clone the entity first. This
is simple enough that there's no Clone() method, but it would be kinda
nice to implement them in our Copious Free Time; I think there are
places outside state that would also find it useful.

care and feeding of mgo/txn
---------------------------

Just about all our writes to mongodb are mediated by the mgo/txn
package, and using this correctly demands some care. Not all the code
has been written in a fully aware state, and cases in which existing
practice is divergent from the advice given below should be regarded
with some suspicion.

The txn package lets you make watchable changes to mongodb via lists of
operations with the following critical properties:

  * transactions can apply to more than one document (this is rather
    the point of having them).
  * transactions will complete if every assert in the transaction
    passes; they will not run at all if any assert fails.
  * multi-document transactions are *not* atomic; the operations are
    applied in the order specified by the list.
  * operations, and hence assertions, can only be applied to documents
    with ids that are known at the time the operation list is built;
    this means that it takes extra work to specify a condition like
    "no unit document newer than X exists".

The second point above deserves further discussion. Whenever you are
implementing a state change, you should consider the impact of the
following possible mongodb states on your code:

  * if mongodb is already in a state consistent with the transaction
    having already been run -- which is *always* possible -- you
    should return immediately without error.
  * if mongodb is in a state that indicates the transaction is not
    valid -- eg trying to add a unit to a dying service -- you should
    return immediately with a descriptive error.

Each of the above situations should generally be checked as part of
preparing a new []txn.Op, but in some cases it's convenient to trust
(to begin with) that the entity's in-memory state reflects reality.
Regardless, your job is to build a list of operations which assert
that:

  * the transaction still needs to be applied.
  * the transaction is actively valid.
  * facts on which the transaction list's form depends remain true.

If you're really lucky you'll get to write a transaction in which
the third requirement collapses to nothing, but that's not really
the norm. In practice, you need to be prepared for the run to return
txn.ErrAborted; if this happens, you need to check for previous success
(and return nil) or for known cause of invalidity (and return an error
describing it).

If neither of these cases apply, you should assume it's an assertion
failure of the third kind; in that case, you should build a new
transaction based on more recent state and try again. If ErrAborteds
just keep coming, give up; there's an ErrExcessiveContention that
helps to describe the situation.

watching entities, and select groups thereof
--------------------------------------------

The mgo/txn log enables very convenient notifications of changes to
particular documents and groups thereof. The state/watcher package
converts the txn event log into events and send them down client-
supplied channels for further processing; the state code itself
implements watchers in terms of these events.

All the internally-relevant watching code is implemented in the file
state/watcher.go. These constructs can be broadly divided into groups
as follows:

  * single-document watchers: dead simple, they notify of every
    change to a given doc. SettingsWatcher bucks the convention of
    NotifyWatcher in that it reads whole *Settings~s to send down the
    channel rather than just sending notifications (we probably
    shouldn't do that, but I don't think there's time to fix it right
    now).
  * entity group watchers: pretty simple, very common; they notify of
    every change to the Life field among a group of entities in the
    same collection, with group membership determined in a wide variety
    of ways.
  * relation watchers: of a similar nature to entity group watchers,
    but generating carefully-ordered events from observed changes to
    several different collections, none of which have Life fields.

Implementation of new watchers is not necessarily a simple task, unless
it's a genuinely trivial refactoring (or, given lack of generics, copy-
and-paste job). Unless you already know exactly what you're doing,
please start a discussion about what you're trying to do before you
embark on further work in this area.

transactions and reference counts
---------------------------------

As described above, it's difficult to assert things like "no units of
service X exist". It can be done, but not without additional work, and
we're generally using reference counts to enable this sort of thing.

In general, reference counts should not be stored within entities: such
changes are part of the operation of juju and are not important enough
to be reflected as a constant stream of events. We didn't figure this
out until recently, though, so relationDoc has a UnitCount, and
serviceDoc has both a UnitCount and a RelationCount. This doesn't
matter for the relation -- nothing watches relation docs directly --
but I fear it matters quite hard for service, because every added or
removed unit or relation will be an unnecessary event sent to every
unit agent.

We'll deal with that when it's a problem rather than just a concern;
but new reference counts should not be thoughtlessly added to entity
documents, and opportunities to separate them from existing docs should
be considered seriously.


[TODO: write about globalKey and what it's good for... constraints, settings,
statuses, etc; occasionaly profitable use of fast prefix lookup on indexed
fields.]