4
Subversion uses three kinds of deltas:
8
A @b{@dfn{tree delta}} describes the difference between two arbitrary
9
directory trees, the way a traditional patch describes the difference
10
between two files. For example, the delta between directories A and B
11
could be applied to A, to produce B.
13
Tree deltas can also carry ancestry information, indicating how the
14
files in one tree are related to files in the other tree. And deltas
15
can describe changes to file meta-information, like permission bits,
16
creation dates, and so on. The repository and working copy use deltas
17
to communicate changes.
20
A @b{@dfn{text delta}} describes changes to a string of bytes, such as the
21
contents of a file. It is analogous to traditional patch format, except
22
that it works equally well on binary and text files, and is not
23
invertible (because context and deleted data are not recorded).
26
A @b{@dfn{property delta}} describes changes to a list of named
27
properties (@pxref{Properties}).
30
The term @dfn{delta} without qualification generally means a tree delta,
31
unless some other meaning is clear from context.
33
In the examples below, deltas will be described in XML, which happens
34
to be Subversion's (now mostly defunct) import/export patch format.
35
However, note that deltas are an abstract data structure, of which the
36
XML format is merely one representation. Later, we will describe
37
other representations: for example, there is a serialized
38
representation (useful for streaming protocols, among other things),
39
and a db-style representation, used for repository storage. The
40
various representations of a given delta are (in theory, anyway)
41
perfectly isomorphic to one another, since they describe the same
48
* Postfix Text Deltas::
49
* Serializing Deltas via the "Editor" Interface::
53
@c -----------------------------------------------------------------------
57
A text delta describes the difference between two strings of bytes, the
58
@dfn{source} string and the @dfn{target} string. Given a source string
59
and a target string, we can compute a text delta; given a source string
60
and a delta, we can reconstruct the target string. However, note that
61
deltas are not invertible: you cannot always reconstruct the source
62
string given the target string and delta.
64
The standard Unix ``diff'' format is one possible representation for
65
text deltas; however, diffs are not ideal for internal use by a revision
66
control system, for several reasons:
69
Diffs are line-oriented, which makes them human-readable, but sometimes
70
makes them perform poorly on binary files.
72
Diffs represent a series of replacements, exchanging selected ranges of
73
the old text with new text; again, this is easy for humans to read, but
74
it is more expensive to compute and less compact than some alternatives.
77
Instead, Subversion uses the VDelta binary-diffing algorithm, as
78
described in @cite{Hunt, J. J., Vo, K.-P., and Tichy, W. F. An
79
empirical study of delta algorithms. Lecture Notes in Computer Science
80
1167 (July 1996), 49-66.} Currently, the output of this algorithm is
81
stored in a custom data format called @dfn{svndiff}, invented by Greg
82
Hudson <@email{ghudson@@mit.edu}>, a Subversion developer.
84
The concrete form of a text delta is a well-formed XML element, having
87
<text-delta>@var{data}</text-delta>
89
Here, @var{data} is the raw svndiff data, encoded in the MIME Base64
92
@c -----------------------------------------------------------------------
94
@section Property Deltas
96
A property delta describes changes to a property list, of the sort
97
associated with files, directories, and directory entries, and revision
98
numbers (@pxref{Properties}). A property delta can record creating,
99
deleting, and changing the text of any number of properties.
101
A property delta is an unordered set of name/change pairs. No two
102
pairs within a given property delta have the same name. A pair's name
103
indicates the property affected, and the change indicates what happens
104
to its value. There are two kinds of changes:
106
@item set @var{value}
107
Change the value of the named property to the byte string @var{value}.
108
If there is no property with the given name, one is added to the
111
Remove the named property from the property list.
114
At the moment, the @code{set} command can either create or change a
115
property value. However, this simplification means that the server
116
cannot distinguish between a client which believes it is creating a
117
value afresh, and a client which believes it is changing the value of an
118
existing property. It may simplify conflict detection to divide
119
@code{set} into two separate @code{add} and @code{change} operations.
121
In the future, we may add a @code{text-delta} change, which specifies a
122
change to an existing property's value as a text delta. This would give
123
us a compact way to describe small changes to large property values.
125
The concrete form of a property delta is a well-formed XML element,
126
having the following form:
128
<property-delta>@var{change}@dots{}</property-delta>
130
Each @var{change} in a property delta has one of the following forms:
132
<set name='@var{name}'>@var{value}</set>
133
<delete name='@var{name}'/>
135
The @var{name} attribute of a @code{set} or @code{delete} element gives
136
the name of the property to change. The @var{value} of a @code{set}
137
element gives the new value of the property.
139
If either the property name or the property value contains the
140
characters @samp{&}, @samp{<}, or @samp{'}, they should be replaced with
141
the sequences @samp{&}, @samp{<}, or @samp{'}, respectively.
144
@c -----------------------------------------------------------------------
148
A tree delta describes changes between two directory trees, the
149
@dfn{source tree} and the @dfn{target tree}. Tree deltas can describe
150
copies, renames, and deletions of files and directories, changes to file
151
contents, and changes to property lists. A tree delta can also carry
152
information about how the files in the target tree are derived from the
153
files in the source tree, if this information is available.
155
The format for tree deltas described here is easy to compute from a
156
Subversion working directory, and easy to apply to a Subversion
157
repository. Furthermore, the size of a tree delta in this format is
158
independent of the commands used to produce the target tree --- it
159
depends only on the degree of difference between the source and target
162
A tree delta is interpreted in the context of three parameters:
165
@var{source-root}, the name of the directory to which this complete
168
@var{revision}, indicating a particular revision of @dots{}
170
@var{source-dir}, which is a directory in the source tree that we are
171
currently modifying to yield @dots{}
173
@dots{} @dfn{target-dir} --- the directory we're constructing.
175
When we start interpreting a tree delta, @var{source-root},
176
@var{source-dir}, and @var{target-dir} are all equal. As we walk the
177
tree delta, @var{target-dir} walks the tree we are constructing, and
178
@var{source-dir} walks the corresponding portion of the source tree,
179
which we use as the original. @var{Source-root} remains constant as we
180
walk the delta; we may use it to choose new source trees.
182
A tree delta is a list of changes of the form
184
<tree-delta>@var{change}@dots{}</tree-delta>
186
which describe how to edit the contents of @var{source-dir} to yield
187
@var{target-dir}. There are three kinds of changes:
190
@item <delete name='@var{name}'/>
191
@var{Source-dir} has an entry named @var{name}, which is not present in
194
@item <add name='@var{name}'>@var{content}</add>
195
@var{target-dir} has an entry named @var{name}, which is not present in
196
@var{source-dir}; @var{content} describes the file or directory to which
197
the new directory entry refers.
199
@item <open name='@var{name}'>@var{content}</open>
200
Both @var{source-dir} and @var{target-dir} have an entry named
201
@var{name}, which has changed; @var{content} describes the new file or
205
Any entries in @var{source-dir} whose names aren't mentioned are assumed
206
to appear unchanged in @var{target-dir}. Thus, an empty
207
@code{tree-delta} element indicates that @var{target-dir} is identical
210
In the change descriptions above, each @var{content} takes one of the
214
@item <file @var{ancestor}>@var{prop-delta} @var{text-delta}</file>
215
The given @var{target-dir} entry refers to a file, @var{f}.
216
@var{Ancestor} indicates which file in the source tree @var{f} is
217
derived from, if any.
219
@var{Prop-delta} is a property delta describing how @var{f}'s properties
220
differ from that ancestor; it may be omitted, indicating that the
221
properties are unchanged.
223
@var{Text-delta} is a text delta describing how to construct @var{f}
224
from that ancestor; it may also be omitted, indicating that @var{f}'s
225
text is identical to its ancestor's.
228
@item <file @var{ancestor}/>
229
An abbreviation for @code{<file @var{ancestor}></file>} --- a file
230
element with no property or text delta, thus describing a file identical
234
@item <directory @var{ancestor}>@var{prop-delta} @var{tree-delta}</directory>
235
The given @var{target-dir} entry refers to a subdirectory, @var{sub}.
236
@var{Ancestor} indicates which directory in the source tree @var{sub} is
237
derived from, if any.
239
@var{Prop-delta} is a property delta describing how @var{sub}'s
240
properties differ from that ancestor; it may be omitted, indicating that
241
the properties are unchanged.
243
@var{Tree-delta} describes how to construct @var{sub} from that
244
ancestor; it may be omitted, indicating that the directory is identical
245
to its ancestor. @var{Tree-delta} should be interpreted with a new
246
@var{target-dir} of @file{@var{target-dir}/@var{name}}.
248
Since @var{tree-delta} is itself a complete tree delta structure, tree
249
deltas are themselves trees, whose structure is a subgraph of the target
253
@item <directory @var{ancestor}/>
254
An abbreviation for @code{<directory @var{ancestor}></directory>} --- a
255
directory element with no property or tree delta, thus describing a
256
directory identical to its ancestor.
260
The @var{content} of a @code{add} or @code{open} tag may also contain
261
a property delta, describing changes to the properties of that
262
@emph{directory entry}.
264
In the @code{file} and @code{directory} elements described above, each
265
@var{ancestor} has one of the following forms:
268
@item ancestor='@var{path}'
269
The ancestor of the new or changed file or directory is
270
@file{@var{source-root}/@var{path}}, in @var{revision}. When this
271
appears as an attribute of a @code{file} element, the element's text
272
delta should be applied to @file{@var{source-root}/@var{path}}. When
273
this appears as an attribute of a @code{directory} element,
274
@file{@var{source-root}/@var{path}} should be the new @var{source-dir}
275
for interpreting that element's tree delta.
278
This indicates that the file or directory has no ancestor in the source
279
tree. When followed by a @var{text-delta}, that delta should be applied
280
to the empty file to yield the new text; when followed by a
281
@var{tree-delta}, that delta should be evaluated as if @var{source-dir}
282
were an imaginary empty directory.
285
If neither an @code{ancestor} nor a @code{new} attribute is given, this
286
is an abbreviation for @code{ancestor='@var{source-dir}/@var{name}'},
287
with the same revision number. This makes the common case --- files or
288
directories modified in place --- more compact.
292
If the @var{ancestor} spec is not @code{new='true'}, it may also contain
293
the text @code{revision='@var{rev}'}, indicating a new value for
294
@var{revision}, in which we should find the ancestor.
296
If a filename or path appearing as a @var{name} or @var{path} in the
297
description above contains the characters @samp{&}, @samp{<}, or
298
@samp{'}, they should be replaced with the sequences @samp{&},
299
@samp{<}, or @samp{'}, respectively.
301
Suppose we have the following source tree:
311
If we edit the contents of @file{/dir1/file1}, we can describe the
312
effect on the tree with the following tree delta, to be applied to the
320
<file>@var{text-delta}</file>
327
The outer @code{tree-delta} element describes the changes made to the root
328
directory. Within the root directory, there are changes in @file{dir1},
329
described by the nested @code{tree-delta}. Within @file{/dir1}, there are
330
changes in @file{file1}, described by the @var{text-delta}.
332
If we had edited both @file{/dir1/file1} and @file{/dir1/file2}, then
333
there would simply be two @code{open} elements in the inner
336
As another example, starting from the same source tree, suppose we
337
rename @file{/dir1/file1} to @file{/dir1/file8}:
343
<delete name='file1'/>
345
<file ancestor='/dir1/file1'/>
352
As above, the inner @code{tdelta} describes how @file{/dir1} has
353
changed: the entry for @file{/dir1/file1} has disappeared, but there is
354
a new entry, @file{/dir1/file8}, which is derived from and textually
355
identical to @file{/dir1/file1} in the source directory. This is just
356
an indirect way of describing the rename.
358
Why is it necessary to be so indirect? Consider the delta representing
362
renaming @file{/dir1/file1} to @file{/dir1/tmp},
364
renaming @file{/dir1/file2} to @file{/dir1/file1}, and
366
renaming @file{/dir1/tmp} to @file{/dir1/file2}
368
(in other words, exchanging @file{file1} and @file{file2}):
375
<file ancestor='/dir1/file2'/>
378
<file ancestor='/dir1/file1'/>
385
The indirectness allows the tree delta to capture an arbitrary
386
rearrangement without resorting to temporary filenames.
388
Another example, starting from the same source tree:
391
rename @file{/dir1/dir2} to @file{/dir1/dir4},
393
rename @file{/dir1/dir3} to @file{/dir1/dir2}, and
395
move @file{file3} from @var{/dir1/dir4} to @var{/dir1/dir2}.
397
Note that @file{file3}'s path has remained the same, even though the
398
directories around it have changed. Here is the tree delta:
405
<directory ancestor='/dir1/dir3'>
408
<file ancestor='/dir1/dir2/file3'/>
413
<delete name='dir3'/>
415
<directory ancestor='/dir1/dir2'>
417
<delete name='file3'/>
429
@file{/dir1} has changed;
431
the new directory @file{/dir1/dir2} is derived from the old
432
@file{/dir1/dir3}, and contains a new entry @file{file3}, derived from
433
the old @file{/dir1/dir2/file3};
435
there is no longer any @file{/dir1/dir3}; and
437
the new directory @file{/dir1/dir4} is derived from the old
438
@file{/dir1/dir2}, except that its entry for @file{file3} is now gone.
441
Some more possible maneuvers, left as exercises for the reader:
444
Delete @file{dir2}, and then create a file named @file{dir2}.
446
Rename @file{/dir1/dir2} to @file{/dir1/dir4}; move @file{file2} into
447
@file{/dir1/dir4}; and move @file{file3} into @var{/dir1/dir3}.
449
Move @file{dir2} into @file{dir3}, and move @file{dir3} into @file{/}.
452
@c ----------------------------------------------------------------------
453
@node Postfix Text Deltas
454
@section Postfix Text Deltas
456
It is sometimes useful to represent a set of changes to a tree without
457
providing text deltas in the middle of the stream. Text deltas are
458
often large and expensive to compute, and tree deltas can be useful
459
without them. For example, one can detect whether two changes might
460
conflict --- whether they change the same file, for example --- without
461
knowing exactly how the conflicting files changed.
463
For this reason, our XML representation of a tree delta allows the text
464
deltas to come @emph{after} the </tree-delta> closure. This allows the
465
client to receive early notice of conflicts: during a @code{svn commit}
466
command, the client sends a tree-delta to the server, which can check
467
for skeletal conflicts and reject the commit, before the client takes the
468
time to transmit the (possibly large) textual changes. This potentially
469
saves quite a bit of network traffic.
471
In terms of XML, postfix text deltas are split into two parts. The
472
first part appears "in-line" and contains a reference ID. The second
473
part appears after the tree delta is complete. Here's an example:
479
<text-delta-ref id="123">
484
<text-delta-ref id="456">
488
<text-delta id="123">@emph{data}</text-delta>
489
<text-delta id="456">@emph{data}</text-delta>
493
@c ----------------------------------------------------------------------
494
@node Serializing Deltas via the "Editor" Interface
495
@section Serializing Deltas via the "Editor" Interface
497
The static XML forms above are useful as an import/export format, and as
498
a visualization aid, but we also need a way to express a delta as a
499
@emph{series of operations}, to implement directory tree diffing and
500
patching. Subversion defines a standard set of such operations in the
501
vtable @code{svn_delta_edit_fns_t}, a set of function prototypes which
502
anyone may implement (see @file{svn_delta.h}).
504
Each function in an instance of @code{svn_delta_editor_t}
505
(colloquially known as an @dfn{editor}) implements some distinct subtask
506
of editing a directory tree. In fact, if you compare the editor
507
function prototypes to the XML elements described previously, you'll
508
notice a fairly strict correspondence: there's one function for
509
replacing a directory, another function for replacing a file, one for
510
adding a directory, another for adding a file, a function for deleting,
513
Although the editor interface was designed around the general idea of
514
making changes to a directory tree, a specific implementation's behavior
515
depends on its role. For example, the versioning filesystem library
516
offers an editor that creates new revisions, while the working copy
517
library offers an editor that updates working copies. And the network
518
layer offers an editor that turns editing calls into wire protocol,
519
which is then converted back into editing calls on the other side! All
520
of these different tasks can share a single interface, because they are
521
all fundamentally about the same thing: expressing and applying
522
differences between directory trees.
524
Like the XML forms, a series of editor calls must follow certain nesting
525
conventions; these conventions are implicit in the interface, in that
526
some of the functions take arguments that can only be obtained from
527
previous calls to other editor functions.
529
Editors can best be understood by watching one work on a real directory
532
@c kff todo: fooo working here.
534
Suppose that the user has made a number of local changes to her working
535
copy and wants to commit them to the repository. Let's represent her
536
changes with the same tree-delta from a previous example. Notice that
537
she has also made textual modifications to @file{file3}; hence the
538
in-line @code{<text-delta>}:
546
<directory ancestor='/dir1/dir3'>
549
<file ancestor='/dir1/dir2/file3'>
550
<text-delta>@emph{data}</text-delta>
556
<delete name='dir3'/>
558
<directory ancestor='/dir1/dir2'>
560
<delete name='file3'/>
570
So how does the client send this information to the server?
572
In a nutshell: the tree-delta is @emph{streamed} over the network, as a
573
series of individual commands given in depth-first order.
575
Let's be more specific. The server presents the client with an object
576
of type @code{struct svn_delta_edit_fns_t}, colloquially known as an
577
@dfn{editor}. An editor is really just table of functions; each
578
function makes a change to a filesystem. Agent A (who has a private
579
filesystem) presents an editor to agent B. Agent B then calls the
580
editor's functions to change A's filesystem. B is said to be
581
@dfn{driving} the editor.
583
As Karl Fogel likes to describe the process, if one thinks of the
584
tree-delta as a lion, the editor is a "hoop" that the lion jumps through
585
-- each portion of the lion being decomposed through time.
587
B cannot call the functions in any willy-nilly order; there are some
588
logical restrictions. In particular, as B drives the editor, it
589
receives opaque data structures which represent directories and files.
590
It must use and pass these structures, known as @dfn{batons}, to make
591
further function calls.
593
As an example, let's watch how the client would transmit the above
594
tree-delta to the repository. (The description below is slightly
595
simplified. For exact interface details, see
596
@file{subversion/include/svn_delta.h}.)
598
[Note: in the examples below, and throughout Subversion's code base,
599
you'll see references to 'baton' objects. This is simply a project
600
convention, a name given to structures that define contexts for
601
functions. Many APIs call these structures 'userdata'. In
602
Subversion, we like the term 'baton', because it reminds us of one
603
function ``handing off'' context to another function.]
608
The repository hands an "editor" to the client.
611
The client begins by calling
613
@code{root_baton = editor->open_root();}
615
The client now has an opaque object, @dfn{root_baton}, which represents
616
the root of the repository's filesystem.
619
@code{dir1_baton = editor->open_dir("dir1", root_baton);}
621
Notice that @emph{root_baton} gives the client free license to make any
622
changes it wants in the repository's root directory -- until, of course,
623
it calls @code{editor->close_dir(root_baton)}.
625
The first change made was a replacement of @file{dir1}. In return, the
626
client now has a new opaque data structure that can be used to change
630
@code{dir2_baton = editor->open_dir("dir2", "/dir1/dir3", dir1_baton);}
632
The @emph{dir1_baton} is now used to open @file{dir2} with a
633
directory whose ancestor is @file{/dir1/dir3}.
636
@code{file_baton = editor->add_file("file3", "/dir1/dir2/file3", dir2_baton);}
638
Edits are now made to @file{dir2} (using @emph{dir2_baton}). In
639
particular, a new file is added to this directory whose ancestor is
640
@file{/dir1/dir2/file3}.
643
Now the text-delta associated with @emph{file_baton} needs to be
646
@code{window_handler = editor->apply_textdelta(file_baton);}
648
Text-deltas themselves, for network efficiency, are streamed in
649
"chunks". So instead of receiving a baton object, we now have a routine
650
that is able to receive any number of small "windows" of text-delta
653
We won't go into the details of the @code{svn_txdelta_*} functions right
654
here; but suffice it to say that these routines are used for sending
655
svndiff data to the @emph{window_handler} routine.
658
@code{editor->close_file(file_baton);}
660
The client is done sending the file's text-delta, so it releases the
664
@code{editor->close_dir(dir2_baton));}
666
The client is done making changes to @file{dir2}, so it releases its
670
The client isn't yet finished with @file{dir1}, however; it makes two
673
@code{editor->delete_item("dir3", dir1_baton);} @*
674
@code{dir4_baton = editor->add_dir("dir4", "/dir1/dir2", dir1_baton);}
676
@emph{(The function's name is @code{delete_item} rather than
677
@code{delete} to avoid gratuitous incompatibility with C++, where
678
@code{delete} is a reserved keyword.)}
681
Within the directory @file{dir4} (whose ancestry is @file{/dir1/dir2}),
682
the client removes a file:
684
@code{editor->delete_item("file3", dir4_baton);}
687
The client is now finished with both @file{dir4}, as well as its parent
690
@code{editor->close_dir(dir4_baton);} @*
691
@code{editor->close_dir(dir1_baton);}
694
The entire tree-delta is complete. The repository knows this when the
695
root directory is closed:
697
@code{editor->close_dir(root_baton);}
702
Of course, at any point above, the repository may reject an edit. If
703
this is the case, the client aborts the transmission and the repository
704
hasn't changed a bit. (Thank goodness for transactions!)
706
Note, however, that this "editor interface" works in the other direction
707
as well. When the repository wishes to update a client's working copy,
708
it is the @emph{client's} reponsibility to give a custom editor-object
709
to the server, and the @emph{server} is the editor-driver.
711
Here are the main advantages of this interface:
716
@emph{Consistency}. Tree-deltas move across the network, in both
717
directions, using the same interface.
720
@emph{Flexibility}. Custom editor-implementations can be written to do
721
anything one might want; the editor-driver has no idea what is
722
happening on the other side of the interface. For example, an editor
727
Output XML that matches the tree-delta DTD above;
729
Output human-readable descriptions of the edits taking place;
736
Whatever the case, it's easy to "swap" editors around, and make client
737
and server do new and interesting things.