<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<title>Perspective Broker: <q>Translucent</q> Remote Method calls in Twisted</title>
<h1>Perspective Broker: <q>Translucent</q> Remote Method calls in Twisted</h1>
<li><a href="http://www.lothar.com">Brian Warner</a>:
<code>&lt;warner@lothar.com&gt;</code>
<p>One of the core services provided by the Twisted networking framework is
<q>Perspective Broker</q>, which provides a clean, secure, easy-to-use
Remote Procedure Call (RPC) mechanism. This paper explains the novel
features of PB, describes the security model and its implementation, and
provides brief examples of usage.</p>

<p>PB is used as a foundation for many other services in Twisted, as well as
projects built upon the Twisted framework. twisted.web servers can delegate
responsibility for different portions of URL-space by distributing PB
messages to the object that owns that subspace. twisted.im is an
instant-messaging protocol that runs over PB. Applications like CVSToys and
the BuildBot use PB to distribute notices every time a CVS commit has
occurred. Using Perspective Broker as the RPC layer allows these projects to
stay focused on the interesting parts.</p>
<p>The PB protocol is not limited to Python. There is a working Java
implementation available from the Twisted web site, as is an Emacs-Lisp
version (which can be used to control a PB-enabled application from within
your editing session, or effectively embed a Python interpreter in Emacs).
Python's dynamic and introspective nature makes Perspective Broker easier to
implement (and very convenient to use), but neither is strictly necessary.
With a set of callback tables and a good dictionary implementation, it would
be possible to implement the same protocol in C, C++, Perl, or other
<p>Perspective Broker provides the following basic RPC features.</p>

<ul>
<li><strong>remotely-invokable methods</strong>: certain methods (those
with names that start with <q>remote_</q>) of
<code>pb.Referenceable</code> objects can be invoked by remote clients who
hold matching <code>pb.RemoteReference</code> objects.</li>

<li><strong>transparent, controllable object serialization</strong>: other
objects sent through those remote method invocations (either as arguments
or in the return value) will be automatically serialized. The data that is
serialized, and the way it is represented on the remote side, depend
upon which <code>twisted.spread.flavors</code> class the object inherits
from, and upon overridable methods to get and set state.</li>

<li><strong>per-connection object ids</strong>: certain objects that are
passed by reference are tracked when they are sent over a wire. If the
receiver sends back the reference it received, the sender will see their
original object come back to them.</li>

<li><strong>twisted.cred authentication layer</strong>: provides common
username/password verification functions. <code>pb.Viewable</code> objects
keep a user reference with them, so remotely-invokable methods can find
out who invoked them.</li>

<li><strong>remote exception reporting</strong>: exceptions that occur in
remote methods are wrapped in <code>Failure</code> objects and serialized
so they can be provided to the caller. All the usual traceback information
is available on the invoking side.</li>

<li><strong>runs over arbitrary byte-pipe transports</strong>: including
TCP, UNIX-domain sockets, and SSL connections. UDP support (in the form of
Airhook) is being developed.</li>

<li><strong>numerous sandwich-related puns</strong>: PB, Jelly, Banana,
<code>twisted.spread</code>, Marmalade, Tasters, and Flavors. By contrast,
CORBA and XML-RPC have few, if any, puns in their naming conventions.</li>
</ul>
<p>Here is a simple example of PB in action. The server code creates an
object that can respond to a few remote method calls, and makes it available
on a TCP port. The client code connects and runs two methods.</p>

<a href="pb-server1.py" class="py-listing" skipLines="2">pb-server1.py</a>
<a href="pb-client1.py" class="py-listing" skipLines="2">pb-client1.py</a>

<p>When this is run, the client emits the following progress messages:</p>

<pre class="shell">
% <em>./pb-client1.py</em>
got object: &lt;twisted.spread.pb.RemoteReference instance at 0x817cab4&gt;
addition complete, result is 3
subtraction result is -7
</pre>

<p>This example doesn't demonstrate instance serialization, exception
reporting, authentication, or other features of PB. For more details and
examples, look at the PB <q>howto</q> docs at <a
href="http://twistedmatrix.com/documents/howto/">twistedmatrix.com</a>.</p>
<h2>Why <q>Translucent</q> References?</h2>

<p>Remote function calls are not the same as local function calls. Remote
calls are asynchronous. Data exchanged with a remote system may be
interpreted differently depending upon version skew between the two systems.
Method signatures (number and types of parameters) may differ. More failure
modes are possible with remote calls than with local ones.</p>

<p><q>Transparent</q> RPC systems attempt to hide these differences, to make
remote calls look the same as local ones (with the noble intention of making
life easier for programmers), but the differences are real, and hiding them
simply makes them more difficult to deal with. PB therefore provides
<q>translucent</q> method calls: it exposes these differences, but offers
convenient mechanisms to handle them. Python's flexible object model and
exception handling take care of part of the problem, while Twisted's
Deferred class provides a clean way to deal with the asynchronous nature of
<h3>Asynchronous Invocation</h3>

<p>A fundamental difference between local function calls and remote ones is
that remote ones are always performed asynchronously. Local function calls
are generally synchronous (at least in most programming languages): the
caller is blocked until the callee finishes running and possibly returns a
value. Local functions which might block (loosely defined as those which
would take non-zero or indefinite time to run on infinitely fast hardware)
are usually marked as such, and frequently provide alternative APIs to run
in an asynchronous manner. Examples of blocking functions are
<code>select()</code> and its less-generalized cousins:
<code>sleep()</code>, <code>read()</code> (when buffers are empty), and
<code>write()</code> (when buffers are full).</p>

<p>Remote function calls are generally assumed to take a long time. In
addition to the network delays involved in sending arguments and receiving
return values, the remote function might itself be blocking.</p>

<p><q>Transparent</q> RPC systems, which pretend that the remote system is
really local, usually offer only synchronous calls. This prevents the
program from getting other work done while the call is running, and causes
integration problems with GUI toolkits and other event-driven
<h3>Failure Modes</h3>

<p>In addition to the usual exceptions that might be raised in the course of
running a function, remotely invoked code can cause other errors. The
network might be down, the remote host might refuse the connection (due to
authorization failures or resource-exhaustion issues), or the remote end
might have a different version of the code and thus misinterpret serialized
arguments or return a corrupt response. Python's flexible exception
mechanism makes these errors easy to report: they are just more exceptions
that could be raised by the remote call. In other languages, this requires a
special API to report failures via a different path than the normal
<h3>Deferreds to the rescue</h3>

<p>In PB, Deferreds are used to handle both the asynchronous nature of the
method calls and the various kinds of remote failures that might occur. When
the method is invoked, PB returns a Deferred object that will be fired
later, when the response (success or failure) is received from the remote
end. The caller (the one who invoked <code>callRemote</code>) is free to
attach callback and errback handlers to the Deferred. If an exception is
raised (either by the remote code or a network failure during processing),
the errback will be run with the wrapped exception. If the function
completes normally, the callback is run.</p>

<p>By using Deferreds, the invoking program can get other work done while it
is waiting for the results. Failure is handled just as cleanly as

<p>In addition, the remote method can itself return a <code>Deferred</code>
instead of an actual return value. When that <code>Deferred</code> fires,
the data given to the callback will be serialized and returned to the
original caller. This allows the remote server to perform other work as
well, putting off the answer until one is available.</p>
<h2>Calling Remote Methods</h2>

<p>Perspective Broker is first and foremost a mechanism for remote method
calls: doing something to a local object which causes a method to get run on
a distant one. The process making the request is usually called the
<q>client</q>, and the process which hosts the object that actually runs the
method is called the <q>server</q>. Note, however, that method requests can
go in either direction: instead of distinguishing <q>client</q> and
<q>server</q>, it makes more sense to talk about the <q>sender</q> and
<q>receiver</q> for any individual method call. PB is symmetric, and the
only real difference between the two ends is that one initiated the original
TCP connection and the other accepted it.</p>

<p>With PB, the local object is an instance of
<code>twisted.spread.pb.RemoteReference</code>, and you <q>do something</q>
to it by calling its <code>.callRemote</code> method. This call accepts a
method name and an argument list (including keyword arguments). Both are
serialized and sent to the receiving process, and the call returns a
<code>Deferred</code>, to which you can add callbacks. Those callbacks will
be fired later, when the response returns from the remote end.</p>

<p>That local RemoteReference points at a
<code>twisted.spread.pb.Referenceable</code> object living in the other
program (or one of the related callable flavors). When the request comes
over the wire, PB constructs a method name by prepending
<code>remote_</code> to the name requested by the remote caller. This method
is looked up in the <code>pb.Referenceable</code> and invoked. If an
exception is raised (including the <code>AttributeError</code> that results
from a bad method name), the error is wrapped in a <code>Failure</code>
object and sent back to the caller. If it succeeds, the result is serialized

<p>The caller's Deferred will either have the callback run (if the method
completed normally) or the errback run (if an exception was raised). The
Failure object given to the errback handler allows a full stack trace to be
displayed on the calling end.</p>

<p>For example, if the holder of the <code>RemoteReference</code> does <code
class="python">rr.callRemote("foo", 1, 3)</code>, the corresponding
<code>Referenceable</code> will be invoked with <code
class="python">r.remote_foo(1, 3)</code>. A <code>callRemote</code> of
<q><code>bar</code></q> would invoke <code>remote_bar</code>, etc.</p>
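<p>The prefix-based lookup described above can be sketched in plain Python.
This is only an illustration of the naming rule, not PB's code:
<code>ToyReferenceable</code> and <code>dispatch</code> are hypothetical
names, and nothing here touches the network or the real
<code>pb.Referenceable</code> class.</p>

```python
class ToyReferenceable:
    """Stand-in for a pb.Referenceable; not the Twisted class."""

    def remote_foo(self, a, b):
        return a + b

    def secret(self):
        # No remote_ prefix, so dispatch() below can never reach it.
        return "private"


def dispatch(target, name, *args, **kwargs):
    """Look up 'remote_' + name on target and call it.

    Only methods that carry the prefix are reachable, so a caller
    cannot invoke arbitrary attributes of the object.
    """
    method = getattr(target, "remote_" + name)  # AttributeError if missing
    return method(*args, **kwargs)


r = ToyReferenceable()
result = dispatch(r, "foo", 1, 3)  # same effect as r.remote_foo(1, 3)

try:
    dispatch(r, "secret")  # looks for remote_secret, which does not exist
    bad_name_error = None
except AttributeError as exc:
    bad_name_error = exc
```

<p>In PB itself the <code>AttributeError</code> would be wrapped in a
<code>Failure</code> and delivered to the caller's errback rather than
raised locally.</p>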
<h3>Obtaining other references</h3>

<p>Each <code>pb.RemoteReference</code> object points to a
<code>pb.Referenceable</code> instance in some other program. The first such
reference must be acquired with a bootstrapping function like
<code>pb.getObjectAt</code>, but all subsequent ones are created when a
<code>pb.Referenceable</code> is sent as an argument to (or a return value
from) a remote method call.</p>

<p>When the arguments or return values contain references to other objects,
the object that appears on the other side of the wire depends upon the type
of the referred object. Basic types are simply copied: a dictionary of lists
will appear as a dictionary of lists, with internal references preserved on
a per-method-call basis (just as Pickle will preserve internal references
for everything pickled at the same time). Class instances are restricted,
both to avoid confusion and for security reasons.</p>
<h3>Transferring Instances</h3>

<p>PB only allows certain kinds of objects to be transferred to and from
remote processes. Most of these restrictions are implemented in the <a
href="#jelly">Jelly</a> serialization layer, described below. In general, to
send an object over the wire, it must either be a basic Python type (list,
dictionary, etc.), or an instance of a class which is derived from one of the
four basic <em>PB Flavors</em>: <code>Referenceable</code>,
<code>Viewable</code>, <code>Copyable</code>, and <code>Cacheable</code>.
Each flavor has methods which define how the object should be treated when
it needs to be serialized to go over the wire, and all have related classes
that are created on the remote end to represent them.</p>

<p>There are a few kinds of callable classes. All are represented on the
remote system with <code>RemoteReference</code> instances.
<code>callRemote</code> can be used on these RemoteReferences, causing
methods with various prefixes to be invoked.</p>
<table>
<tr>
<th>Local Class</th>
<th>Remote Representation</th>
<th>method prefix</th>
</tr>
<tr>
<td><code>Referenceable</code></td>
<td><code>RemoteReference</code></td>
<td><code>remote_</code></td>
</tr>
<tr>
<td><code>Viewable</code></td>
<td><code>RemoteReference</code></td>
<td><code>view_</code></td>
</tr>
</table>
<p><code>Viewable</code> (and the related <code>Perspective</code> class)
are described later (in <a href="#authorization">Authorization</a>). They
provide a secure way to let methods know <em>who</em> is calling them. Any
time a <code>Referenceable</code> (or <code>Viewable</code>) is sent over
the wire, it will appear on the other end as a <code>RemoteReference</code>.
If any of these references are sent back to the system they came from, they
emerge from the round trip in their original form.</p>

<p>Note that RemoteReferences cannot be sent to anyone else (there are no
<q>third-party references</q>): they are scoped to the connection between
the holder of the <code>Referenceable</code> and the holder of the
<code>RemoteReference</code>. (In fact, the <code>RemoteReference</code> is
really just an index into a table maintained by the owner of the original
<code>Referenceable</code>).</p>

<p>There are also two data classes. To send an instance over the wire, it
must belong to a class which inherits from one of these.</p>
<table>
<tr>
<th>Local Class</th>
<th>Remote Representation</th>
</tr>
<tr>
<td><code>Copyable</code></td>
<td><code>RemoteCopy</code></td>
</tr>
<tr>
<td><code>Cacheable</code></td>
<td><code>RemoteCache</code></td>
</tr>
</table>
<h3>pb.Copyable</h3>
<a name="pb.Copyable"></a>
<p><code>Copyable</code> is used to allow class instances to be sent over
the wire. <code>Copyable</code>s are copy-by-value, unlike
<code>Referenceable</code>s which are copy-by-reference.
<code>Copyable</code> objects have a method called
<code>getStateToCopy</code> which gets to decide how much of the object
should be sent to the remote system: the default simply copies the whole
<code>__dict__</code>. The receiver must register a <code>RemoteCopy</code>
class for each kind of <code>Copyable</code> that will be sent to it: this
registration (described later in <a href="#unjellyableRegistry">Representing
Instances</a>) maps class names to actual classes. Apart from being a
security measure (it emphasizes the fact that the process is receiving data
from an untrusted remote entity and must decide how to interpret it safely),
it is also frequently useful to distinguish a copy of an object from the
original by holding them in different classes.</p>

<p><code>getStateToCopy</code> is frequently used to remove attributes that
would not be meaningful outside the process that hosts the object, like file
descriptors. It also allows shared objects to hold state that is only
available to the local process, including passwords or other private
information. Because the default serialization process recursively follows
all references to other objects, it is easy to accidentally send your entire
program to the remote side. Explicitly creating the state object (creating
an empty dictionary, then populating it with only the desired instance
attributes) is a good way to avoid this.</p>
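<p>A minimal sketch of that <q>empty dictionary</q> approach follows. The
<code>Account</code> class and its attributes are hypothetical, and to keep
the example self-contained it does not import Twisted; in a real program the
class would inherit from <code>pb.Copyable</code>.</p>

```python
class Account:
    """Hypothetical class; a real one would subclass pb.Copyable."""

    def __init__(self, name, balance, password, logfile):
        self.name = name
        self.balance = balance
        self.password = password   # private: must stay in this process
        self.logfile = logfile     # meaningless in another process

    def getStateToCopy(self):
        # Start from an empty dictionary and add only what the remote
        # side should see, instead of handing over all of __dict__.
        state = {}
        state["name"] = self.name
        state["balance"] = self.balance
        return state


acct = Account("alice", 100, "s3cret", logfile=None)
state = acct.getStateToCopy()
```

<p>Because the state is built by hand, adding a new private attribute to
the class later does not silently expose it to remote peers.</p>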
<p>The fact that PB will refuse to serialize objects that are neither basic
types nor explicitly marked as being transferable (by subclassing one of the
pb.flavors) is another way to avoid the <q>don't tug on that, you never know
what it might be attached to</q> problem. If the object you are sending
includes a reference to something that isn't marked as transferable, PB will
raise an InsecureJelly exception rather than blindly sending it anyway (and
everything else it references).</p>

<p>Finally, note that <code>getStateToCopy</code> is distinct from the
<code>__getstate__</code> method used by Pickle, and they can return
different values. This allows objects to be persisted (across time)
differently than they are transmitted (across [memory]space).</p>
<h3>pb.Cacheable</h3>
<a name="pb.Cacheable"></a>

<p><code>Cacheable</code> is a variant of <code>Copyable</code> which is
used to implement remote caches. When a <code>Cacheable</code> is sent
across a wire, a method named <code>getStateToCacheAndObserveFor</code> is
used to simultaneously get the object's current state and to register an
<q>Observer</q> which lives next to the <code>Cacheable</code>. The Observer
is effectively a <code>RemoteReference</code> that points at the remote
cache. Each time the cached object changes, it uses its Observers to tell
all the remote caches about the change. The <q>setter</q> methods can just
call <code class="python">observer.callRemote("setFoo", newvalue)</code> for
all their observers.</p>
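<p>The sending side of this arrangement might look roughly like the sketch
below. <code>FakeObserver</code> and <code>CachedValue</code> are
hypothetical stand-ins written without Twisted, and the
<code>perspective</code> argument on the two hook methods is an assumption
made for illustration; only the method names
<code>getStateToCacheAndObserveFor</code>, <code>stoppedObserving</code>,
and <code>callRemote</code> come from the text.</p>

```python
class FakeObserver:
    """Stand-in for the RemoteReference-like Observer; records calls."""

    def __init__(self):
        self.calls = []

    def callRemote(self, name, *args):
        self.calls.append((name, args))


class CachedValue:
    """Hypothetical source object; a real one would subclass pb.Cacheable."""

    def __init__(self, foo):
        self.foo = foo
        self.observers = []

    def getStateToCacheAndObserveFor(self, perspective, observer):
        # Register the observer and hand back the current state together.
        self.observers.append(observer)
        return {"foo": self.foo}

    def stoppedObserving(self, perspective, observer):
        self.observers.remove(observer)

    def setFoo(self, newvalue):
        # The setter updates local state, then tells every remote cache.
        self.foo = newvalue
        for observer in self.observers:
            observer.callRemote("setFoo", newvalue)


obs = FakeObserver()
source = CachedValue(1)
initial = source.getStateToCacheAndObserveFor(None, obs)
source.setFoo(2)                    # reported to obs
source.stoppedObserving(None, obs)
source.setFoo(3)                    # obs is gone, so nothing is reported
```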
<p>On the remote end, a <code>RemoteCache</code> object is created and
populated with the original object's state, just as a
<code>RemoteCopy</code> is.
When changes are made, the Observers remotely invoke methods like
<code>observe_setFoo</code> in the <code>RemoteCache</code> to perform the

<p>As <code>RemoteCache</code> objects go away, their Observers go away too,
and call <code>stoppedObserving</code> so they can be removed from the

<p>The PB <a href="http://twistedmatrix.com/documents/howto/"
><q>howto</q> docs</a> have more information and complete examples of both
<code>pb.Copyable</code> and <code>pb.Cacheable</code>.</p>
<h2>Authorization</h2>
<a name="authorization"></a>

<p>As a framework, Perspective Broker (indeed, all of Twisted) was built
from the ground up. As multiple use cases became apparent, common
requirements were identified, code was refactored, and layers were developed
to cleanly serve the needs of all <q>customers</q>. The twisted.cred layer
was created to provide authorization services for PB as well as other
Twisted services, like the HTTP server and the various instant messaging
protocols. The abstract notions of identity and authority it uses are
intended to match the common needs of these various protocols: specific
applications can always use subclasses that are more appropriate for their

<h3>Identity and Perspectives</h3>

<p>In twisted.cred, <q>Identities</q> are usernames (with passwords),
represented by <code>Identity</code> objects. Each identity has a
<q>keyring</q> which authorizes it to access a set of objects called
<q>Perspectives</q>. These perspectives represent accounts or other
capabilities; each belongs to a single <q>Service</q>. There may be multiple
Services in a single application; in fact the flexible nature of Twisted
makes this easy. An HTTP server would be a Service, and an IRC server would

<p>As an example, a login service might have perspectives for Alice, Bob,
and Charlie, and there might also be an Admin perspective. Alice has admin
capabilities. In addition, let us say the same application has a chat
service with accounts for each person (but no special administrator

<p>So, in this example, Alice's keyring gives her access to three
perspectives: login/Alice, login/Admin, and chat/Alice. Bob only gets two:
login/Bob and chat/Bob. <code>Perspective</code> objects have names and
belong to <code>Service</code> objects, but the
<code>Identity.keyring</code> is a dictionary indexed by (serviceName,
perspectiveName) pairs. It uses names instead of object references because
the <code>Perspective</code> object might be created on demand. The keys
include the service name because Perspective names are scoped to a single
<h3>pb.Perspective</h3>

<p>The PB-specific subclass of the generic <code>Perspective</code> class is
also capable of remote execution. The login process results in the
authorized client holding a special kind of <code>RemoteReference</code>
that will allow it to invoke <code>perspective_</code> methods on the
matching <code>pb.Perspective</code> object. In PB applications that use the
<code>twisted.cred</code> authorization layer, clients get this reference
first. The client is then dependent upon the Perspective to provide
everything else, so the Perspective can enforce whatever security policy it

<p>(Note that the <code>pb.Perspective</code> class is not actually one of
the serializable PB flavors, and that instances of it cannot be sent
directly over the wire. This is a security feature intended to prevent users
from getting access to somebody else's <code>Perspective</code> by mistake,
perhaps when a <q>list all users</q> command sends back an object which
includes references to other Perspectives.)</p>

<p>PB provides functions to perform a challenge-response exchange in which
the remote client proves their identity to get that <code>Perspective</code>
reference. The <code>Identity</code> object holds a password and uses an MD5
hash to verify that the remote user knows the password without sending it in
cleartext over the wire. Once the remote user has proved their identity,
they can request a reference to any <code>Perspective</code> permitted by
their <code>Identity</code>'s keyring.</p>
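<p>The general shape of an MD5 challenge-response exchange can be sketched
as follows. This is an illustration only: the function names are
hypothetical, and the exact way the challenge and password are combined is
an assumption, not a description of PB's wire protocol.</p>

```python
import hashlib
import hmac
import os


def make_challenge():
    """Server side: produce a fresh random challenge for each login."""
    return os.urandom(16)


def respond(challenge, password):
    """Client side: prove knowledge of the password without sending it.

    Assumed scheme: hash the password, then hash that digest together
    with the challenge.
    """
    hashed_password = hashlib.md5(password).digest()
    return hashlib.md5(hashed_password + challenge).digest()


def verify(challenge, response, stored_password):
    """Server side: recompute the expected response and compare."""
    expected = respond(challenge, stored_password)
    return hmac.compare_digest(expected, response)


challenge = make_challenge()
good = verify(challenge, respond(challenge, b"sekrit"), b"sekrit")
bad = verify(challenge, respond(challenge, b"guess"), b"sekrit")
```

<p>Because the challenge is different for every connection, a response
captured from one login cannot simply be replayed on another.</p>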
<p>There are twisted.cred functions (twisted.enterprise.dbcred) which can
pull user information out of a database, and it is easy to create modules
that could check /etc/passwd or LDAP instead. Authorization can then be
centralized through the Perspective object: each object that is accessible
remotely can be created with a pointer to the local Perspective, and objects
can ask that Perspective whether the operation is allowed before performing

<p>Most clients use a helper function called <code>pb.connect()</code> to
get the first Perspective reference: it takes all the necessary identifying
information (host, port, username, password, service name, and perspective
name) and returns a <code>Deferred</code> that will be fired when the
<code>RemoteReference</code> is available. (This may change in the future:
there are plans afoot to use a URL-like scheme to identify the Perspective,
which will probably mean a new helper function).</p>
<p>There is a special kind of <code>Referenceable</code> called
<code>pb.Viewable</code>. Its remote methods (all named with a
<code>view_</code> prefix) are called with an extra argument that points at
the <code>Perspective</code> the client is using. This allows the same
<code>Referenceable</code> to be shared among multiple clients while
retaining the ability to treat those clients differently. The methods can
check with the Perspective to see if the request should be allowed, and can
use per-client information in processing the request.</p>
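<p>The extra-argument convention can be sketched without Twisted as
follows. <code>ToyPerspective</code>, <code>SharedBoard</code>, and
<code>dispatch_view</code> are hypothetical names; the point is only that
the receiving side, not the caller, supplies the perspective, so a client
cannot claim to be someone else.</p>

```python
class ToyPerspective:
    """Stand-in for a Perspective: who the client is and what it may do."""

    def __init__(self, name, is_admin=False):
        self.name = name
        self.is_admin = is_admin


class SharedBoard:
    """Hypothetical Viewable-style object shared by every client."""

    def __init__(self):
        self.posts = []

    def view_post(self, perspective, text):
        # Per-client information is used while processing the request.
        self.posts.append((perspective.name, text))
        return len(self.posts)

    def view_clear(self, perspective):
        # The method asks the perspective whether this is allowed.
        if not perspective.is_admin:
            raise PermissionError("%s may not clear the board" % perspective.name)
        self.posts = []
        return True


def dispatch_view(target, perspective, name, *args):
    """Receiver-side dispatch: prepend the caller's perspective."""
    method = getattr(target, "view_" + name)
    return method(perspective, *args)


board = SharedBoard()
alice = ToyPerspective("alice", is_admin=True)
bob = ToyPerspective("bob")

count = dispatch_view(board, bob, "post", "hello")
try:
    dispatch_view(board, bob, "clear")
    bob_cleared = True
except PermissionError:
    bob_cleared = False
alice_cleared = dispatch_view(board, alice, "clear")
```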
<!-- XXX: it would be nice to provide some examples of typical Perspective
use cases: static pre-defined Perspectives, DB lookup, anonymous access. But
they would be pretty big, and are probably more appropriate for the
pb-cred.html HOWTO doc -->

<h2>PB Design: Object Serialization</h2>

<p>Fundamental to any calling convention, whether ABI or RPC, is how
arguments and return values are passed from caller to callee and back. RPC
systems require data to be turned into a form which can be delivered through
a network, a process usually known as serialization. Sharing complex types
(references and class instances) with a remote system requires more care:
references should all point to the same thing (even though the object being
referenced might live on either end of the connection), and allowing a
remote user to create arbitrary class instances in your memory space is a
security risk that must be controlled.</p>

<p>PB uses its own serialization scheme called <q>Jelly</q>. At the bottom
end, it uses s-expressions (lists of numbers and strings) to represent the
state of basic types (lists, dictionaries, etc). These s-expressions are
turned into a bytestream by the <q>Banana</q> layer, which has an optional C
implementation for speed. Unserialization for higher-level objects is driven
by per-class <q>jellyier</q> objects: this flexibility allows PB to offer
inheritable classes for common operations. <code>pb.Referenceable</code> is
a class which is serialized by sending a reference to the remote end that
can be used to invoke remote methods. <code>pb.Copyable</code> is a class
which creates a new object on the remote end, with methods that the
developer can override to control how much state is sent or accepted.
<code>pb.Cacheable</code> sends a full copy the first time it is exchanged,
but then sends deltas as the object is modified later.</p>

<p>Objects passed over the wire get to decide for themselves how much
information is actually passed to the remote system. Copy-by-reference
objects are given a per-connection ID number and stashed in a local
dictionary. Copy-by-value objects may send their entire
<code>__dict__</code>, or some subset thereof. If the remote method returns
a referenceable object that was given to it earlier (either in the same RPC
call or an earlier one), PB sends the ID number over the wire, which is
looked up and turned into a proper object reference upon receipt. This
provides one-sided reference transparency: one end sees objects coming and
going through remote method calls in exactly the same fashion as through
local calls. Those references are only capable of very specific operations;
PB does not attempt to provide full object transparency. As discussed later,
this is instrumental to security.</p>
<h3>Banana and s-expressions</h3>

<p>The <q>Banana</q> low-level serialization layer converts s-expressions
which represent basic types (numbers, strings, and lists of numbers,
strings, or other lists) to and from a bytestream. S-expressions are easy to
encode and decode, and are flexible enough (when used with a set of tokens)
to represent arbitrary objects. <q>cBanana</q> is a C extension module which
performs the encode/decode step faster than the native Python

<p>Each s-expression element is converted into a message with two or three
components: a header, a type marker, and an optional body (used only for
strings). The header is a number expressed in base 128. The type marker is a
single byte with the high bit set, which both terminates the header and
indicates the type of element this message describes (number, list-start,
string, or tokenized string).</p>
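<p>The framing just described can be illustrated with a small encoder. This
is a sketch under stated assumptions rather than the Banana implementation:
the specific marker values (<code>LIST</code>, <code>INT</code>,
<code>STRING</code>) and the choice to emit the least significant base-128
digit first are assumptions made for the example, and tokenized strings and
negative numbers are left out.</p>

```python
# Marker bytes: each has the high bit set. The exact values are an
# assumption for this sketch.
LIST = 0x80
INT = 0x81
STRING = 0x82


def b128(n):
    """Encode a non-negative integer as base-128 digits.

    Every digit is below 0x80, so the first byte with the high bit set
    can only be the type marker. Digit order (low digit first) is an
    assumption of this sketch.
    """
    if n == 0:
        return b"\x00"
    digits = bytearray()
    while n:
        digits.append(n % 128)
        n //= 128
    return bytes(digits)


def encode(obj):
    """Turn a nested list of ints and byte strings into one bytestream."""
    if isinstance(obj, list):
        # Header is the element count; the elements follow as messages.
        out = b128(len(obj)) + bytes([LIST])
        for item in obj:
            out += encode(item)
        return out
    if isinstance(obj, bytes):
        # Header is the length; the body is the raw string.
        return b128(len(obj)) + bytes([STRING]) + obj
    if isinstance(obj, int) and obj >= 0:
        # Header is the value itself; there is no body.
        return b128(obj) + bytes([INT])
    raise TypeError("unsupported in this sketch: %r" % (obj,))


encoded_int = encode(300)
encoded_str = encode(b"hi")
encoded_list = encode([1, b"a"])
```

<p>Since no header byte can have its high bit set, a decoder can read
digits until it meets a marker byte, which is what lets the marker double
as the header terminator.</p>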
<p>When a connection is first established, a list of strings is sent to
negotiate the <q>dialect</q> of Banana being spoken. The first dialect known
to both sides is selected. Currently, the dialect is only used to select a
list of string tokens that should be specially encoded (for performance),
but subclasses of Banana could use self.currentDialect to influence the
encoding process in other ways.</p>
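<p>The selection rule amounts to a first-match search. A minimal sketch,
with a hypothetical function name and made-up dialect lists:</p>

```python
def select_dialect(offered, known):
    """Return the first offered dialect that the receiver also knows.

    'offered' is the ordered list sent by the peer; 'known' is the set
    of dialects understood locally. Returns None when nothing matches.
    """
    for dialect in offered:
        if dialect in known:
            return dialect
    return None


chosen = select_dialect(["pb", "none"], {"none", "pb"})
no_match = select_dialect(["experimental"], {"pb"})
```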
<p>When Banana is used for PB (by negotiating the <q>pb</q> dialect), it has
a list of 30ish strings that are encoded into two-byte sequences instead of
being sent as generalized string messages. These string tokens are used to
mark complex types (beyond the simple lists, strings, and numbers provided
natively by Banana) and other objects Jelly needs to do its job.</p>
<p><code>Jelly</code> handles object serialization. It fills a similar role
to the standard Pickle module, but has design goals of security and
portability (especially to other languages) where Pickle favors efficiency
of representation. In addition, Jelly serializes objects into s-expressions
(lists of tokens, strings, numbers, and other lists), and lets Banana do the
rest, whereas Pickle goes all the way down to a bytestream by itself.</p>

<p>Basic Python types (apart from strings and numbers, which Banana can
handle directly) are generally turned into lists with a type token as the
first element. For example, a Python dictionary is turned into a list that
starts with the string token <q>dictionary</q> and continues with elements
that are lists of [key, value] pairs. Modules, classes, and methods are all
transformed into s-expressions that refer to the relevant names. Instances
are represented by combining the class name (a string) with an arbitrary
state object (which is usually a dictionary).</p>
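<p>The dictionary example can be made concrete with a toy converter. This
is not the Jelly module: <code>toy_jelly</code> is a hypothetical function,
and the <q>list</q> token it uses for Python lists is an assumption; only
the <q>dictionary</q> token and the [key, value] layout come from the
text.</p>

```python
def toy_jelly(obj):
    """Convert basic types into an s-expression of lists and atoms."""
    if isinstance(obj, (str, int, float)):
        # Strings and numbers are left alone for the lower layer.
        return obj
    if isinstance(obj, list):
        return ["list"] + [toy_jelly(item) for item in obj]
    if isinstance(obj, dict):
        # Type token first, then one [key, value] list per entry.
        pairs = [[toy_jelly(k), toy_jelly(v)] for k, v in obj.items()]
        return ["dictionary"] + pairs
    # Anything else is refused rather than guessed at.
    raise TypeError("not a basic type: %r" % (obj,))


sexp = toy_jelly({"a": [1, 2]})
```

<p>Refusing unknown types, rather than falling back to some generic
encoding, mirrors the security stance described elsewhere in this
paper.</p>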
<p>Much of the rest of Jelly has to do with safely handling class instances
(as opposed to basic Python types) and dealing with references to shared

<h4>Tracking shared references</h4>

<p>Mutable types are serialized in a way that preserves the identity of an
object that is referenced multiple times. As an example, a list with four
elements that all point to the same object must look the same on the remote
end: if it showed up as a list pointing to four independent objects (even if
all the objects had identical states), the resulting list would not behave
in the same way as the original. Changing <code>newlist[0]</code> would not
modify <code>newlist[1]</code> as it ought to.</p>
597
<p>Consequently, when objects which reference mutable types are serialized,
598
those references must be examined to see if they point to objects which have
599
already been serialized in the same session. If so, an object id tag of some
600
sort is put into the bytestream instead of the complete object, indicating
601
that the deserializer should use a reference to a previously-created object.
602
This also solves the issue of recursive or circular references: the first
603
appearance of an object gets the full state, and all subsequent ones get a
<p>Jelly manages this reference tracking through an internal
<code>_Jellier</code> object (in particular through the <code>.cooked</code>
dictionary). As objects are serialized, their <code>id</code> values are
stashed. References to those objects that occur after jellying has started
can be replaced with a <q>dereference</q> marker and the object id.</p>
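<p>The idea can be sketched in a few lines of plain Python. This is a toy
model under stated assumptions, not Jelly's actual code: the functions
<code>toy_jelly</code> and <code>toy_unjelly</code>, the <q>reference</q> and
<q>list</q> tokens, and the numbering scheme are all invented here for
illustration. Only the <q>dictionary</q> token, the <q>dereference</q> marker,
and the use of <code>id</code> values come from the description above.</p>

```python
def toy_jelly(obj, cooked=None):
    """Serialize nested lists/dicts, emitting a marker for repeated objects."""
    if cooked is None:
        cooked = {}  # maps id(obj) -> reference number, scoped to one call
    if isinstance(obj, (int, float, str, bool, type(None))):
        return obj  # immutable basic values are emitted as-is
    if id(obj) in cooked:
        return ["dereference", cooked[id(obj)]]
    ref = cooked[id(obj)] = len(cooked) + 1
    if isinstance(obj, list):
        body = ["list"] + [toy_jelly(item, cooked) for item in obj]
    elif isinstance(obj, dict):
        body = ["dictionary"] + [
            [toy_jelly(key, cooked), toy_jelly(value, cooked)]
            for key, value in obj.items()
        ]
    else:
        raise TypeError("toy_jelly only handles basic types")
    return ["reference", ref, body]


def toy_unjelly(sexp, seen=None):
    """Rebuild the object graph, reusing objects named by earlier markers."""
    if seen is None:
        seen = {}  # maps reference number -> rebuilt object
    if not isinstance(sexp, list):
        return sexp
    tag = sexp[0]
    if tag == "dereference":
        return seen[sexp[1]]
    if tag == "reference":
        _, ref, (kind, *items) = sexp
        if kind == "list":
            result = seen[ref] = []  # register before recursing, for cycles
            result.extend(toy_unjelly(item, seen) for item in items)
        else:
            result = seen[ref] = {}
            for key, value in items:
                result[toy_unjelly(key, seen)] = toy_unjelly(value, seen)
        return result
    raise ValueError("unknown tag: %r" % (tag,))


# A list that holds the same dictionary twice and also contains itself.
shared = {"n": 1}
data = [shared, shared]
data.append(data)
rebuilt = toy_unjelly(toy_jelly(data))
```

<p>After the round trip, <code>rebuilt[0]</code> and <code>rebuilt[1]</code>
are the same object, and <code>rebuilt[2]</code> is <code>rebuilt</code>
itself, so both the sharing and the cycle survive.</p>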
<p>The scope of this <code>_Jellier</code> object is limited to a single
call of the <code>jelly</code> function, which in general corresponds to a
single remote method call. The argument tuple is jellied as a single object
(a tuple), so different arguments to the same method will share referenced
objects<span class="footnote">Actually, PB currently jellies the list
arguments in a separate tuple from the keyword arguments. This issue is
currently being examined and may be changed in the future</span>, but
arguments of separate methods will not share them. To do more complex
caching and reference tracking, certain PB <q>flavors</q> (see below)
override their <code>jellyFor</code> method to do more interesting things.
In particular, <code>pb.Referenceable</code> objects have code to ensure
that an object which makes a round trip will come back as a reference to the
same object that was originally sent.</p>
<p>An exception to this <q>one-call scope</q> is provided: if the
<code>_Jellier</code> is created with a <code>persistentStore</code> object,
all class instances will be passed through it first, and it has the
opportunity to return a <q>persistent id</q>. If available, this id is
serialized instead of the object's state. This would allow object references
to be shared between different invocations of <code>jelly</code>. However,
PB itself does not use this technique: it uses overridden
<code>jellyFor</code> methods to provide per-connection shared
references.</p>
<h4>Representing Instances</h4>
<a name="unjellyableRegistry"></a>

<p>Each class gets to decide how it should be represented on a remote
system. Sending and receiving are separate actions, performed in separate
programs on different machines. So, to be precise, two things get decided.
First, the class gets to specify how its instances should be sent to a remote
client: what should happen when an instance is serialized (or <q>jellied</q>
in PB lingo), what state should be recorded, what class name should be sent,
etc. Second, the receiving program gets to specify how an incoming object
that claims to be an instance of some class should be treated: whether it
should be accepted at all, if so what class should be used to create the new
object, and how the received state should be used to populate that
object.</p>
<p>A word about notation: in Perspective Broker parlance, <q>to jelly</q> is
used to describe the act of turning an object into an s-expression
representation (serialization, or at least most of it). Therefore the
reverse process, which takes an s-expression and turns it into a real Python
object, is described with the verb <q>to unjelly</q>.</p>
<h4>Jellying Instances</h4>

<p>Serializing instances is fairly straightforward. Classes which inherit
from <code>Jellyable</code> provide a <code>jellyFor</code> method, which
acts like <code>__getstate__</code> in that it should return a serializable
representation of the object (usually a dictionary). Other classes are
checked with a <code>SecurityOptions</code> instance, to verify that they
are safe to be sent over the wire, then serialized by using their
<code>__getstate__</code> method (or their <code>__dict__</code> if no such
method exists). User-level classes always inherit from one of the PB
<q>flavors</q> like <code>pb.Copyable</code> (all of which inherit from
<code>Jellyable</code>) and use <code>jellyFor</code>; the
<code>__getstate__</code> option is only for internal use.</p>
<!-- should we mention persistentStore here? Nothing uses it, so no. Besides
it was already hinted at in 'tracking shared references' above. -->
<h4>Secure Unjellying</h4>

<p>Unjellying (for instances) is triggered by the receipt of an s-expression
with the <q>instance</q> tag. The s-expression has two elements: the name of
the class, and an object (probably a dictionary) which holds the instance's
state. At that point, the receiving program does not know what class
should be used: it is certainly <em>not</em> safe to simply do an
<code>import</code> of the classname requested by the sender. Doing so would
effectively allow a remote entity to run arbitrary code on your system.</p>
<p>There are two techniques used to control how instances are unjellied. The
first is a <code>SecurityOptions</code> instance which gets to decide
whether the incoming object should be accepted or not. It is said to
<q>taste</q> the incoming type before really trying to unserialize it. The
default taster accepts all basic types but no classes or instances.</p>
<p>If the taster decides that the type is acceptable, Jelly then turns to
the <code>unjellyableRegistry</code> to determine exactly <em>how</em> to
deserialize the state. This is a table that maps received class names
to unserialization routines or classes.</p>
<p>The receiving program must register the classes it is willing to accept.
Any attempts to send instances of unregistered classes to the program will
be rejected, and an <code>InsecureJelly</code> exception will be sent back to
the sender. If objects should be represented by the same class in both the
sender and receiver, and if the class is defined by code which is imported
into both programs (an assumption that results in many security problems when
it is violated), then the shared module can simply claim responsibility as
the classes are defined:</p>
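<pre class="python">
from twisted.spread import pb
from twisted.spread.jelly import setUnjellyableForClass

class Foo(pb.RemoteCopy):
    # note: __init__ will *not* be called when creating RemoteCopy objects

    def __getstate__(self):
        # body missing from the source; assumed here to return the instance dict
        return self.__dict__

    def __setstate__(self, state):
        # state is assumed to be the dictionary produced by __getstate__
        self.stuff = state["stuff"]

setUnjellyableForClass(Foo, Foo)
</pre>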
<p>In this example, the first argument to
<code>setUnjellyableForClass</code> is used to get the fully-qualified class
name, while the second defines which class will be used for unjellying.
<code>setUnjellyableForClass</code> has two functions: it informs the
<q>taster</q> that instances of the given class are safe to receive, and it
registers the local class that should be used for unjellying.</p>
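<p>The two checks can be modelled with a set and a dictionary. The sketch
below is a simplified stand-in, not the real Jelly code:
<code>allowed_names</code>, <code>registry</code>, <code>register</code>,
<code>toy_unjelly_instance</code>, <code>InsecureToyJelly</code>, and the
<code>Point</code> class are hypothetical names chosen for this
illustration.</p>

```python
class InsecureToyJelly(Exception):
    """Raised when an incoming instance is not acceptable (hypothetical name)."""


class Point:
    """A class the receiver is willing to rebuild from remote state."""

    def set_state(self, state):
        # Copy only the expected fields, coercing them to the expected type.
        self.x = int(state["x"])
        self.y = int(state["y"])


allowed_names = set()  # stands in for the taster's list of acceptable classes
registry = {}          # stands in for the unjellyableRegistry


def register(name, cls):
    """Do both jobs at once: allow the name and record the local class."""
    allowed_names.add(name)
    registry[name] = cls


def toy_unjelly_instance(sexp):
    tag, class_name, state = sexp
    if tag != "instance" or class_name not in allowed_names:
        raise InsecureToyJelly("refusing %r" % (class_name,))
    cls = registry[class_name]
    obj = cls.__new__(cls)  # create the object without running __init__
    obj.set_state(state)
    return obj


register("geometry.Point", Point)
point = toy_unjelly_instance(["instance", "geometry.Point", {"x": 1, "y": 2}])
```

<p>The sender's class name is only ever used as a lookup key. It is never
imported or evaluated, so a name that was not registered in advance simply
causes a rejection.</p>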
<p>The <code>Broker</code> class manages the actual connection to a remote
system. <code>Broker</code> is a <q>Protocol</q> (in Twisted terminology),
and there is an instance for each socket over which PB is being spoken.
Proxy objects like <code>pb.RemoteReference</code>, which are associated
with another object on the other end of the wire, all know which Broker they
must use to get to their remote counterpart. <code>pb.Broker</code> objects
implement distributed reference counts, manage per-connection object IDs,
and provide notification when references are lost (due to lost connections,
either from network problems or program termination).</p>
<h4>PB over Jelly</h4>

<p>Perspective Broker is implemented by sending Jellied commands over the
connection. These commands are always lists, and the first element of the
list is always a command name. The commands are turned into
<code>proto_</code>-prefixed method names and executed in the Broker object.
There are currently nine such commands. Two (<code>proto_version</code> and
<code>proto_didNotUnderstand</code>) are used for connection negotiation.
<code>proto_message</code> is used to implement remote method calls, and is
answered by either <code>proto_answer</code> or
<code>proto_error</code>.</p>
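<p>The name-to-method dispatch can be sketched as follows. This is a toy
model, not the real <code>Broker</code>: the <code>ToyBroker</code> class,
its <code>dispatch</code> method, the <code>outgoing</code> list, and the
handling of unknown commands are assumptions made for illustration, and the
version number used is arbitrary.</p>

```python
class ToyBroker:
    """Routes list-shaped commands to proto_-prefixed methods (toy version)."""

    def __init__(self):
        self.version_seen = None
        self.outgoing = []  # commands this side would send back to the peer

    def dispatch(self, command):
        # The first element names the command; the rest are its arguments.
        name, args = command[0], command[1:]
        method = getattr(self, "proto_%s" % (name,), None)
        if method is None:
            # Unknown command: report it to the peer instead of guessing.
            self.outgoing.append(["didNotUnderstand", name])
            return
        method(*args)

    def proto_version(self, number):
        self.version_seen = number


broker = ToyBroker()
broker.dispatch(["version", 6])
broker.dispatch(["bogus", 1, 2])
```

<p>Because only methods carrying the prefix are reachable, the set of
commands a peer can trigger is exactly the set of <code>proto_</code>
methods defined on the class.</p>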
<p><code>proto_cachemessage</code> is used by Observers (see <a
href="#pb.Copyable">pb.Copyable</a>) to notify their
<code>RemoteCache</code> about state updates, and behaves like
<code>proto_message</code>. <a href="#pb.Cacheable">pb.Cacheable</a> also
uses <code>proto_decache</code> and <code>proto_uncache</code> to manage
reference counts of cached objects.</p>
<p>Finally, <code>proto_decref</code> is used to manage reference counts on
<code>RemoteReference</code> objects. It is sent when the
<code>RemoteReference</code> goes away, so that the holder of the original
<code>Referenceable</code> can free that object.</p>
<h4>Per-Connection ID Numbers</h4>

<p>Each time a <code>Referenceable</code> is sent across the wire, its
<code>jellyFor</code> method obtains a new unique <q>local ID</q> (luid) for
it, which is a simple integer that refers to the original object. The
Broker's <code>.localObjects{}</code> and <code>.luids{}</code> tables
maintain the <q>luid</q>-to-object mapping. Only this ID number is sent to
the remote system. On the other end, the object is unjellied into a
<code>RemoteReference</code> object which remembers its Broker and the luid
it refers to on the other end of the wire. Whenever
<code>callRemote()</code> is used, it tells the Broker to send a message to
the other end, including the luid value. Back in the original process, the
luid is looked up in the table, turned into an object, and the named method
is invoked.</p>
<p>A similar system is used with Cacheables: the first time one is sent, an
ID number is allocated and recorded in the
<code>.remotelyCachedObjects{}</code> table. The object's state (as returned
by <code>getStateToCacheAndObserveFor()</code>) and this ID number are sent
to the far end. That side uses <code>.cachedLocallyAs()</code> to find the
local <code>CachedCopy</code> object, and tracks it in the Broker's
<code>.locallyCachedObjects{}</code> table. (Note that to route state
updates to the right place, the Broker on the <code>CachedCopy</code> side
needs to know where it is. The same is not true of
<code>RemoteReference</code>s: nothing is ever sent <em>to</em> a
<code>RemoteReference</code>, so its Broker doesn't need to keep track of
it.)</p>
<p>Each remote method call gets a new <code>requestID</code> number. This
number is used to link the request with the response. All pending requests
are stored in the Broker's <code>.waitingForAnswers{}</code> table until
they are completed by the receipt of a <code>proto_answer</code> or
<code>proto_error</code> message.</p>
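<p>A minimal sketch of such a pending-request table is shown below. It uses
plain callbacks to stay self-contained; the <code>PendingCalls</code> class
and its method names are hypothetical and are not part of PB.</p>

```python
import itertools


class PendingCalls:
    """Links each outgoing request to the callback awaiting its response."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.waiting = {}  # requestID -> callback, like .waitingForAnswers{}

    def send(self, callback):
        """Allocate a fresh requestID and remember who is waiting on it."""
        request_id = next(self._next_id)
        self.waiting[request_id] = callback
        return request_id

    def answer(self, request_id, result):
        """Complete a request; the entry is removed so it fires only once."""
        callback = self.waiting.pop(request_id)
        callback(result)


results = []
calls = PendingCalls()
first = calls.send(results.append)
second = calls.send(results.append)
calls.answer(second, "second done")  # responses may arrive out of order
calls.answer(first, "first done")
```

<p>Removing the entry when the response arrives means a duplicate or forged
response for the same ID finds nothing to complete.</p>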
<p>The Broker also provides hooks to be run when the connection is lost.
Holders of a <code>RemoteReference</code> can register a callback with
<code>.notifyOnDisconnect()</code> to be run when the process which holds
the original object goes away. Trying to invoke a remote method on a
disconnected broker results in an immediate
<code>DeadReferenceError</code>.</p>
<h4>Reference Counting</h4>

<p>The Broker on the <code>Referenceable</code> end of the connection needs
to implement distributed reference counting. The fact that a remote end
holds a <code>RemoteReference</code> should prevent the
<code>Referenceable</code> from being freed. To accomplish this, the
<code>.localObjects{}</code> table actually points at a wrapper object
called <code>pb.Local</code>. This object holds a reference count that
is incremented by one for each <code>RemoteReference</code> that points to
the wrapped object. Each time a Broker serializes a
<code>Referenceable</code>, that count goes up. Each time the distant
<code>RemoteReference</code> goes away, the remote Broker sends a
<code>proto_decref</code> message to the local Broker, and the count goes
down. When the count hits zero, the <code>Local</code> is deleted, allowing
the original <code>Referenceable</code> object to be released.</p>
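<p>The bookkeeping can be sketched with two dictionaries and a small wrapper.
This is a simplified model written for this explanation: the
<code>ToyTable</code> class and its <code>register</code>,
<code>lookup</code>, and <code>decref</code> methods are hypothetical, and
the real Broker does considerably more.</p>

```python
class Local:
    """Wrapper holding an object and the number of remote references to it."""

    def __init__(self, obj):
        self.obj = obj
        self.refcount = 1


class ToyTable:
    """Per-connection table mapping integer IDs to wrapped local objects."""

    def __init__(self):
        self.local_objects = {}  # luid -> Local
        self.luids = {}          # id(obj) -> luid
        self._next_luid = 0

    def register(self, obj):
        """Called each time obj is serialized; returns the ID to send."""
        luid = self.luids.get(id(obj))
        if luid is None:
            self._next_luid += 1
            luid = self.luids[id(obj)] = self._next_luid
            self.local_objects[luid] = Local(obj)
        else:
            self.local_objects[luid].refcount += 1
        return luid

    def lookup(self, luid):
        """Turn an ID received from the peer back into the object."""
        return self.local_objects[luid].obj  # KeyError for unknown IDs

    def decref(self, luid):
        """Handle a decref from the peer; drop the entry at zero."""
        local = self.local_objects[luid]
        local.refcount -= 1
        if local.refcount == 0:
            del self.luids[id(local.obj)]
            del self.local_objects[luid]


table = ToyTable()
target = object()
luid = table.register(target)   # sent once: count is 1
table.register(target)          # sent again: count is 2
table.decref(luid)              # one remote reference went away
still_held = luid in table.local_objects
table.decref(luid)              # the last one went away
released = luid not in table.local_objects
```

<p>While the table holds the wrapper, the wrapped object cannot be garbage
collected; once the last remote reference is dropped, the table lets go of
it.</p>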
<p>Insecurity in network applications comes from many places. Most can be
summarized as trusting the remote end to behave in a certain way.
Applications or protocols that do not have a way to verify their assumptions
may act unpredictably when the other end misbehaves; this may result in a
crash or a remote compromise. One fundamental assumption that most RPC
libraries make when unserializing data is that the same library is being
used at the other end of the wire to generate that data. Developers put so
much time into making their RPC libraries work <strong>at all</strong> that
they usually assume their own code is the only thing that could possibly
provide the input. A safer design is to assume that the input will almost
always be corrupt, and to make sure that the program survives anyway.</p>
<h3>Controlled Object serialization</h3>

<p>Security is a primary design goal of PB. The receiver gets final say as
to what they will and will not accept. The lowest-level serialization
protocol (<q>Banana</q>) is simple enough to validate by inspection, and
there are size limits imposed on the actual data received to prevent
excessive memory consumption. Jelly is willing to accept basic data types
(numbers, strings, lists and dictionaries of basic types) without question,
as there is no dangerous code triggered by their creation, but class
instances are rigidly controlled. Only subclasses of the basic PB flavors
(<code>pb.Copyable</code>, etc.) can be passed over the wire, and these all
provide the developer with ways to control what state is sent and accepted.
Objects can keep private data on one end of the connection by simply not
including it in the copied state.</p>
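<p>As a plain-Python illustration of keeping private data out of the copied
state, consider the sketch below. The <code>Account</code> class, its
<code>private_fields</code> attribute, and the <code>state_to_copy</code>
method are hypothetical names invented here; they only mimic the kind of
state-selection hook that a PB flavor provides.</p>

```python
class Account:
    """Toy object whose copied state leaves out a private field."""

    private_fields = ("pin",)

    def __init__(self, owner, balance, pin):
        self.owner = owner
        self.balance = balance
        self.pin = pin

    def state_to_copy(self):
        # Start from the instance dictionary, then drop anything private.
        return {
            name: value
            for name, value in self.__dict__.items()
            if name not in self.private_fields
        }


account = Account("alice", 100, "1234")
copied_state = account.state_to_copy()
print(copied_state)  # {'owner': 'alice', 'balance': 100}
```

<p>The private field never reaches the serializer, so there is nothing for
the remote end to see.</p>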
<p>Jelly's refusal to serialize objects that haven't been explicitly marked
as copyable helps stop accidental security leaks. Seeing the
<code>pb.Copyable</code> tag in the class definition is a flag to the
developer that they need to be aware of what parts of the class will be
available to a remote system and which parts are private. Classes without
those tags are not an issue: the mere act of <em>trying</em> to export them
will cause an exception. If Jelly tried to copy arbitrary classes, the
security audit would have to look into <em>every</em> class in the
system.</p>
<h3>Controlled Object Unserialization</h3>

<p>On the receiving side, the fact that Unjellying insists upon a
user-registered class for each potential incoming instance reduces the risk
that arbitrary code will be executed on behalf of remote clients. Only the
classes that are added to the <code>unjellyableRegistry</code> need to be
examined. Half of the security issues in RPC systems boil down to the
fact that these potential unserializing classes will have their
<code>setCopyableState</code> methods called with a potentially hostile
<code>state</code> argument. (The other half are that <code>remote_</code>
methods can be called with arbitrary arguments, including instances that
have been sent to that client at some point since the current connection was
established.) If the system is prepared to handle that, it should be in good
shape security-wise.</p>
<p>RPC systems which allow remote clients to create arbitrary objects in the
local namespace are liable to be abused. Code gets run when objects are
created, and generally the more interesting and useful the object, the more
powerful the code that gets run during its creation. Such systems also have
more assumptions that must be validated: code that expects to be given an
object of class <code>A</code> so it can call <code>A.foo</code> could be
given an object of class <code>B</code> instead, for which the
<code>foo</code> method might do something drastically different. Validating
the object is of the required type is much easier when the number of
potential types is smaller.</p>
<h3>Controlled Method Invocation</h3>

<p>Objects which allow remote method invocation do not provide remote access
to their attributes (<code>pb.Referenceable</code> and
<code>pb.Copyable</code> are mutually exclusive). Remote users can only
invoke a well-defined and clearly-marked subset of their methods: those with
names that start with <code>remote_</code> (or other specific prefixes
depending upon the variant of <code>Referenceable</code> in use). This
ensures that they can have local methods which cannot be invoked remotely.
Complete object transparency would make this very difficult: the
<q>translucent</q> reference scheme allows objects some measure of privacy
which can be used to implement a security model. The
<q><code>remote_</code></q> prefix makes all remotely-invokable methods easy
to locate, improving the focus of a security audit.</p>
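<p>The prefix rule can be illustrated with a few lines of ordinary Python.
The <code>Counter</code> class and the <code>invoke_remotely</code> helper
are hypothetical; the sketch only shows how prepending the prefix keeps
unprefixed methods out of reach.</p>

```python
class Counter:
    def __init__(self):
        self.value = 0

    def remote_increment(self, amount):
        """Reachable by a peer, because of the remote_ prefix."""
        self.value += amount
        return self.value

    def reset(self):
        """Local only: without the prefix, a peer cannot name this method."""
        self.value = 0


def invoke_remotely(target, method_name, *args):
    """Invoke target.remote_<method_name>; nothing else is reachable."""
    method = getattr(target, "remote_" + method_name, None)
    if not callable(method):
        raise AttributeError("no remote method %r" % (method_name,))
    return method(*args)


counter = Counter()
total = invoke_remotely(counter, "increment", 2)
try:
    invoke_remotely(counter, "reset")
    reset_allowed = True
except AttributeError:
    reset_allowed = False
```

<p>A request for <code>reset</code> is looked up as <code>remote_reset</code>,
which does not exist, so the call fails without touching the object.</p>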
<h3>Restricted Object Access</h3>

<p>Objects sent by reference are indexed by a per-connection ID number,
which is the only way for the remote end to refer back to that same object.
This table of IDs means that the remote end cannot touch objects that were
not explicitly given to them, nor can they send back references to objects
outside that table. This protects the program's memory space against the
remote end: they cannot find other local objects to play with.</p>
<p>This philosophy of using simple, easy-to-validate identifiers (integers
in the case of PB) that are scoped to a well-defined trust boundary (in this
case the Broker and the one remote system it is connected to) leads to
better security. Imagine a C system which sent pointers to the remote end
and hoped it would receive back valid ones, and the kind of damage a
malicious client could do. PB's <code>.localObjects{}</code> table ensures
that any given client can only refer to things that were given to them. It
isn't even a question of validating the identifier they send: if an object
isn't a value in the <code>.localObjects{}</code> dictionary, they have no
physical way to get at it. The worst they can do with a corrupt ObjectID is
to cause a <code>KeyError</code> when it is not found, which will be
trapped.</p>
<p>Banana limits string objects to 640k (because, as the source says, 640k
is all you'll ever need). There is a helper class called
<code>pb.util.StringPager</code> that uses a producer/consumer interface to
break up the string into separate pages and send them one piece at a time.
This also serves to reduce memory consumption: rather than serializing the
entire string and holding it in RAM while waiting for the transmit buffers
to drain, the pages are only serialized as there is space for them.</p>
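<p>The paging idea itself is simple to sketch. The generator below is not
the <code>StringPager</code> implementation; the <code>pages</code> function
and the <code>PAGE_SIZE</code> constant are hypothetical, and reading
<q>640k</q> as 640 * 1024 bytes is an assumption.</p>

```python
PAGE_SIZE = 640 * 1024  # assumed reading of the 640k limit, in bytes


def pages(data, page_size=PAGE_SIZE):
    """Yield successive slices of data, none longer than page_size."""
    for start in range(0, len(data), page_size):
        yield data[start:start + page_size]


payload = b"x" * (2 * PAGE_SIZE + 10)
chunks = list(pages(payload))
```

<p>Because <code>pages</code> is a generator, each slice is produced only
when the consumer asks for the next one, which mirrors the way pages are
serialized only as buffer space becomes available.</p>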
<h2>Future Directions</h2>

<p>PB can currently be carried over TCP and SSL connections, and through
UNIX-domain sockets. It is being extended to run over UDP datagrams and a
work-in-progress reliable datagram protocol called <q>airhook</q>. (Clearly
this requires changes to the authorization sequence, as it must all be done
in a single packet; it might require some kind of public-key signature.)</p>
<p>At present, two functions are used to obtain the initial reference to a
remote object: <code>pb.getObjectAt</code> and <code>pb.connect</code>. They
take a variety of parameters to indicate where the remote process is
listening, what kind of username/password should be used, and which exact
object should be retrieved. This will be simplified into a <q>PB URL</q>
syntax, making it possible to identify a remote object with a descriptive
URL instead of a list of parameters.</p>
<p>Another research direction is to implement <q>typed arguments</q>: a way
to annotate the method signature to indicate that certain arguments may only
be instances of a certain class. Reminiscent of the E language, this would
help remote methods improve their security, as the common code could take
care of class verification.</p>
<p>Twisted provides a <q>componentization</q> mechanism to allow
functionality to be split among multiple classes. A class can declare that
all methods in a given list (the <q>interface</q>) are actually implemented
by a companion class. Perspective Broker will be cleaned up to use this
mechanism, making it easier to swap out parts of the protocol with different
implementations.</p>
<p>Finally, a comprehensive security audit and some performance improvements
to the Jelly design are also in the works.</p>
<!-- $Id: pb.html,v 1.1 2003/03/31 05:21:40 glyph Exp $ -->