~mwhudson/pypy/imported-pypy-rdict-refactoring : contents of doc/theory.txt at revision 8977

~mwhudson/pypy/imported-pypy-rdict-refactoring : (revision 8977)
===================================
Techniques used in PyPy
===================================

.. contents::
.. sectnum::

.. _`abstract interpretation`: 
    
Abstract Interpretation
=======================

Abstract Interpretation is a general technique which consists of an
interpreter that follows the bytecode instructions of a user program, just
like a normal interpreter does, but with abstract objects instead of concrete
ones. Remember that in PyPy this is done by using alternate object spaces with
the same bytecode interpreter main loop.

As a theoretical example, the most abstract object space would be the one manipulating the most abstract objects that you could imagine: they are all equivalent, because we have abstracted away any information about the object. There is actually only one of them left, and we could call it "the object". In Python terms, an AbstractObjectSpace could use None for all its wrapped objects. Any operation between wrapped objects gives None again as the wrapped result -- there is nothing else it could give anyway. So when you have said that the add method of AbstractObjectSpace takes None and None and returns None you have said everything.

The point of such an object space is for example to check the bytecode. The
bytecode interpreter will really run your bytecode, just with completely
abstract arguments. If there is no problem then you are sure that the bytecode
is valid. You could also record, during this abstract interpretation, how much
the stack ever grows; that would give you a fool-proof method of computing or
checking the co_stacksize argument of a code object. (There are subtleties
which I won't describe here, but that's the basic idea.) 

Typically, however, abstract object spaces are a (little) bit less abstract, still maintaining a minimal amount of information about the objects. For example, a wrapped object could be represented by its type. You then define the object space's add to return int when the two arguments are int and int. That way, you abstractedly call a function with the input argument's types and what the interpreter will do is a type inference. (Here also there are subtle problems, even besides the remark that integer operations can overflow and actually return longs in a real Python implementation.)

As an example of more abstract object spaces you have the ones with finite domain, i.e. with a finite number of different possible wrapped objects. For example, you can use True and False as wrapped values to denote the fact that the object is, respectively, a non-negative integer or anything else. In this way you are doing another kind of type inference that just tells you which variables will only ever contain non-negative integers.

In PyPy, the FlowObjSpace_ uses the abstract interpretation technique to generate a control flow graph of the functions of RPython_ programs.

In its `more formal definition`_, Abstract Interpretation typically
considers abstract objects that are organized in a lattice_: some of
these objects are more (or less) abstract than others, in the sense that
they represent less (or more) known information; to say that this forms
a lattice essentially means that any two abstract objects have
well-defined unions and intersections (which are again abstract
objects).

.. _FlowObjSpace: objspace.html#the-flow-object-space
.. _RPython:      coding-guide.html#restricted-python
.. _`more formal definition`: http://en.wikipedia.org/wiki/Abstract_interpretation
.. _lattice:      http://en.wikipedia.org/wiki/Lattice_%28order%29


Multimethods
============

A "multimethod" is the generalization of the OOP notion of "method".
Theoretically, a method is a "message name" and signature attached to a
particular base class, which is implementated in the class or its subclasses.
To do a "method call" means to send a message to an object, using a message
name and actual arguments.  We call "message dispatch" the operation of
finding which actual implementation is suitable for a particular call.  For
methods, a message is dispatched by looking up the class of the "self" object,
and finding an implementation in that class, or in its base classes, in a
certain order.

A multimethod is a message name and signature that can have implementations
that depend not only on the class of the first "self" argument, but on the
class of several arguments.  Because of this we cannot use Python's nice model
of storing method implementations as functions, in the attributes of the
class.

Here is a common implementation of multimethods: they are instances of a
specific MultiMethod class, and the instances are callable (there is a
__call__ operator on MultiMethod).  When a MultiMethod is called, a dispatch
algorithm is used to find which, among the registered implementations, is the
one that should be called; this implementation is then immediately called. The
most important difference with normal methods is that the MultiMethod object
to call is no longer syntactically attached to classes.  In other words,
whereas a method is called with ``obj.somemethod(args)``, a multimethod is
called much like a function, e.g. ``dosomething(obj1, obj2, obj3...)``.  You
have to find the MultiMethod object ``dosomething`` in some namespace; it is
no longer implicitely looked up in the namespace of the "self" object.

PyPy contains two different implementations of multimethods: a `quite general
one`_ written in RPython_ for the purposes of the StdObjSpace_, and a `short
two-arguments-dispatching one`_ used internally by the annotator_.

.. _`quite general one`: http://codespeak.net/svn/pypy/dist/pypy/objspace/std/multimethod.py
.. _StdObjSpace: objspace.html#the-standard-object-space
.. _`short two-arguments-dispatching one`: http://codespeak.net/svn/pypy/dist/pypy/annotation/pairtype.py
.. _annotator: translation.html#annotator