A proposal for an svn filesystem dump/restore format.

Two problems we want to solve
=============================

1. When we change our node-id schema, we need to migrate all of our
   data (by dumping and restoring).

2. Serves as a backup format. Could be read by other software tools

A. Written as two new public functions in svn_fs.h. To be invoked
   by new 'svnadmin' subcommands.

B. Format uses only timeless fs concepts.

   The dump format needs to reference concepts that we *know* are
   general enough to never change. These concepts must exist
   independently of any internal node-id schema, or any DB storage
   backend. In other words, we're talking about the basic ideas in
   our original "design spec" from May 2000.

Here are the timeless semantics of our fs design -- the things that
would be stored in our dump format.

  - A filesystem is an array of trees.
    Each tree is called a "revision" and has unversioned properties attached.

  - A revision has a tree of "nodes" hanging off of it.
    Actually, the nodes in the filesystem form a DAG. A revision
    always points to an initial node that represents the 'root' of some tree.

  - The majority of a tree's nodes are hard-links (references) to
    nodes that were created in earlier trees.

     - versioned properties
     - predecessor history: "which node am I a variant of?"
     - copy history: "which node am I a copy of?"

    The history values can be non-existent (meaning the node is
    completely new), or can have a value of {revision, path}.
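
As a rough illustration of the semantics listed above, here is a minimal
sketch of the data model in Python. The type and field names (Node,
Revision, HistoryRef, revprops, copied_from) are hypothetical and exist
only for this example. The flat path-to-node mapping is a simplification
of the real DAG, which shares nodes between trees.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A history value is either non-existent (None) or a {revision, path} pair.
HistoryRef = Optional[Tuple[int, str]]


@dataclass
class Node:
    """One node as the dump format would see it (hypothetical type)."""
    props: Dict[str, bytes] = field(default_factory=dict)  # versioned properties
    predecessor: HistoryRef = None  # "which node am I a variant of?"
    copied_from: HistoryRef = None  # "which node am I a copy of?"

    def is_completely_new(self) -> bool:
        # Both history values non-existent means the node is completely new.
        return self.predecessor is None and self.copied_from is None


@dataclass
class Revision:
    """One tree in the filesystem's array of trees (hypothetical type)."""
    revprops: Dict[str, bytes] = field(default_factory=dict)  # unversioned
    nodes: Dict[str, Node] = field(default_factory=dict)      # path -> node


# A filesystem is an array of trees, indexed by revision number.
Filesystem = List[Revision]
```

Nothing here depends on node-ids or on a storage backend, which is the
point of restricting the format to these concepts.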

------------------------------------------------------------------------
Refinement of proposal #2: (after discussion with gstein)
=========================

Each node starts with RFC822-style headers at the top. The final
header is a 'Content-length:', followed by the content, so record
boundaries can be inferred.

The content section has two implicit parts: a property hash, and the
fulltext. The division between these two sections is implied by the
"PROPS-END\n" tag at the end of the prophash. In the case of a
directory node or a revision, only the prophash is present.
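
The record layout just described can be sketched as a small reader. This
is an illustration, not Subversion's loader: the function name
read_record is made up, a blank line is assumed to separate the headers
from the content, and the split on the first "PROPS-END\n" is naive (it
would mis-split if that byte sequence appeared inside a property value).

```python
import io
from typing import BinaryIO, Dict, Optional, Tuple

PROPS_END = b"PROPS-END\n"


def read_record(stream: BinaryIO) -> Optional[Tuple[Dict[str, str], bytes, bytes]]:
    """Read one record: headers, then Content-length bytes of content.

    Returns (headers, prophash, fulltext), or None at end of input.
    """
    headers: Dict[str, str] = {}
    line = stream.readline()
    while line == b"\n":              # skip blank lines between records
        line = stream.readline()
    if not line:
        return None
    while line not in (b"\n", b""):   # headers run up to the first blank line
        name, _, value = line.decode("utf-8").partition(":")
        headers[name.strip()] = value.strip()
        line = stream.readline()
    length = int(headers.get("Content-length", "0"))
    content = stream.read(length)
    # Naive split: the prophash ends at the first "PROPS-END\n" tag.
    end = content.find(PROPS_END)
    if end < 0:
        return headers, b"", content
    cut = end + len(PROPS_END)
    return headers, content[:cut], content[cut:]
```

For a directory node or a revision the third element simply comes back
empty, since only the prophash is present.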

-----------------------------------------------------------------

SVN DUMPFILE VERSION 1 FORMAT

The format starts with the version number of the dump format
("SVN-fs-dump-format-version: 1\n"), followed by a series of revision
records. Each revision record starts with information about the
revision, followed by a variable number of node changes for that
revision. Fields in [square brackets] are optional, and unknown headers
are always ignored, for backwards compatibility.

Prop-content-length: P

...P bytes of property data. Properties are stored in the same
human-readable hashdump format used by working copy property files,
except that they end with "PROPS-END\n" for better readability.
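
For readers unfamiliar with the hashdump format, here is a minimal parser
sketch. It relies on the usual layout of that format, which is not
spelled out in this document: each property is a "K <length>" line, the
key, a "V <length>" line, and the value, each followed by a newline. The
function name parse_props is hypothetical.

```python
from typing import Dict


def parse_props(data: bytes) -> Dict[str, bytes]:
    """Parse 'K len / key / V len / value' entries up to PROPS-END."""
    props: Dict[str, bytes] = {}
    pos = 0

    def read_line() -> bytes:
        nonlocal pos
        end = data.index(b"\n", pos)      # ValueError if the data is truncated
        line = data[pos:end]
        pos = end + 1
        return line

    def read_counted(tag: bytes) -> bytes:
        nonlocal pos
        header = read_line()              # e.g. b"K 7"
        if not header.startswith(tag + b" "):
            raise ValueError("expected %r entry, got %r" % (tag, header))
        n = int(header[2:])
        chunk = data[pos:pos + n]         # counted read, so values may hold newlines
        pos += n + 1                      # skip the newline after the data
        return chunk

    while True:
        if data.startswith(b"PROPS-END\n", pos):
            return props
        key = read_counted(b"K")
        props[key.decode("utf-8")] = read_counted(b"V")
```

Because every key and value is length-prefixed, the terminating
"PROPS-END\n" line is not needed to find the end of the data; as the text
says, it is there for readability.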

Node-path: /absolute/path/to/node/in/filesystem
Node-kind: file | dir (1)
Node-action: change | add | delete | replace
[Node-copyfrom-rev: X]
[Node-copyfrom-path: /path ]
[Text-copy-source-md5: blob] (2)
[Text-content-md5: blob]
[Text-content-length: T]
[Prop-content-length: P]

... Y bytes of content data, divided into P bytes of "property"
data and T bytes of "text" data. The properties come first; their
total length (including formatting) is Prop-content-length, and is
included in Node-content-length. The "PROPS-END\n" line always
terminates the property section if there are props. The remainder
of the Y bytes (expected to be equivalent to Text-content-length)
represents the contents of the node.
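
The arithmetic in the paragraph above (Y = P + T, properties first) can
be sketched as follows. The function name split_content is hypothetical,
and treating an absent length header as zero is an assumption made for
this example.

```python
from typing import Dict, Tuple


def split_content(headers: Dict[str, str], content: bytes) -> Tuple[bytes, bytes]:
    """Split content into (property bytes, text bytes) using the header lengths."""
    # Assumption: a missing optional length header means "zero bytes".
    prop_len = int(headers.get("Prop-content-length", "0"))
    text_len = int(headers.get("Text-content-length", "0"))
    if prop_len + text_len != len(content):
        raise ValueError("header lengths do not add up to the content length")
    props, text = content[:prop_len], content[prop_len:]
    # The "PROPS-END\n" line always terminates the property section.
    if prop_len and not props.endswith(b"PROPS-END\n"):
        raise ValueError("property section is not terminated by PROPS-END")
    return props, text
```

Unlike a search for the "PROPS-END\n" tag, slicing by length cannot be
fooled by property values that happen to contain that text.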

(1) If the node represents a deletion, this field is optional.

(2) This is a checksum of the source of the copy. A loader process
    can use this checksum to determine that the copyfrom path/rev
    already present in a filesystem is really the *correct* one to use.
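
The check described in note (2) might look like the sketch below. It
assumes the "blob" in the header is the lowercase hexadecimal MD5 digest
of the source file's text; the function name copy_source_matches is
hypothetical.

```python
import hashlib
from typing import Dict


def copy_source_matches(headers: Dict[str, str], source_text: bytes) -> bool:
    """Return True if the existing copy source may be used for this node."""
    expected = headers.get("Text-copy-source-md5")
    if expected is None:
        return True  # optional header absent: nothing to verify
    # Assumption: the header carries a hex digest of the source text.
    actual = hashlib.md5(source_text).hexdigest()
    return actual == expected.strip().lower()
```

A loader would call this with the text found at the copyfrom path/rev in
the target filesystem before reusing it.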

(3) The Content-length header is technically unnecessary, since the
    information it holds (and more) can be found in the
    Prop-content-length and Text-content-length fields. Though
    Subversion itself does not make use of the header when reading a
    dumpfile, we include it for compatibility with generic RFC822

(4) There are actually 2 types of version 1 dump streams. The regular ones
    have been generated since r2634 (svn 0.14.0). Older ones also claim to be
    version 1, but lack the Prop-content-length and Text-content-length
    fields in the block header. In those days there *always* was a

-----------------------------------------------------------------

Here's an example of revision 1422, whereby I added a new directory
"baz", added a new file "bop" inside it, and modified the file "foo.c":

Revision-number: 1422
Prop-content-length: 80

Added two files, changed a third.

Prop-content-length: 35

Node-path: bar/baz/bop

Prop-content-length: 76
Text-content-length: 54

Here is the text of the newly added 'bop' file.

Text-content-length: 102

Here is the fulltext of my change to an existing /bar/foo.c.
Notice that this file has no properties.

-------------------------------------

SVN DUMPFILE VERSION 2 FORMAT

This format is equivalent to the VERSION 1 format in every respect,
except for the following:

1.) The format starts with the new version number of the dump format
    ("SVN-fs-dump-format-version: 2\n").

2.) In addition to "Revision Records", another sort of record is supported:
    the "UUID" record, which should be of the form:

      UUID: 7bf7a5ef-cabf-0310-b7d4-93df341afa7e

    This should be used to indicate the UUID of the originating repository.
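
To show how a reader might cope with both version 1 and version 2
streams, here is a sketch that reads the version line and, when present,
the UUID record. The function name read_preamble is hypothetical. The
sketch assumes a seekable stream and assumes that blank lines separate
the version line, the UUID record, and the first revision record.

```python
import io
import uuid
from typing import BinaryIO, Optional, Tuple

VERSION_HEADER = "SVN-fs-dump-format-version"


def read_preamble(stream: BinaryIO) -> Tuple[int, Optional[uuid.UUID]]:
    """Read the format version line and, if present, the UUID record."""
    name, _, value = stream.readline().decode("utf-8").partition(":")
    if name != VERSION_HEADER:
        raise ValueError("not a dumpfile: missing version header")
    version = int(value)
    repo_uuid = None
    while True:
        mark = stream.tell()
        line = stream.readline()
        if line == b"\n":
            continue                      # blank separator line
        if version >= 2 and line.startswith(b"UUID:"):
            repo_uuid = uuid.UUID(line[5:].decode("ascii").strip())
        else:
            stream.seek(mark)             # not ours: leave it for the caller
        return version, repo_uuid
```

After the call, the stream is positioned so that the caller can go on to
read revision records.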

-------------------------------------

SVN DUMPFILE VERSION 3 FORMAT

This format is equivalent to the VERSION 2 format except for the
following:

1.) The format starts with the new version number of the dump format
    ("SVN-fs-dump-format-version: 3\n").

2.) There are two new optional headers for node changes:

    [Text-delta: true|false]
    [Prop-delta: true|false]

    The default value for these headers is "false". If the value is
    set to "true", then the text and property contents will be treated
    as deltas against the previous contents of the node (as determined
    by copy history for adds with history, or by the value in the
    previous revision for changes--just as with commits).

    Property deltas have the same format as regular property lists except
    that (1) properties with the same value as in the previous contents of
    the node are not printed, and (2) deleted properties will be written
    just as a regular property is printed, but with the "K " changed to a
    "D " and with no value part.

    Text deltas are written out as a series of svndiff windows.
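
As an illustration of the property delta rules above, here is a sketch
that applies a delta to the previous property list. The function name
apply_prop_delta is hypothetical, and the entry layout ("K <length>" or
"D <length>" line, name, then "V <length>" line and value for K entries
only) is the usual hashdump layout, assumed here rather than defined in
this document. Text deltas are not covered; decoding svndiff windows is
out of scope for this sketch.

```python
from typing import Dict


def apply_prop_delta(base: Dict[str, bytes], delta: bytes) -> Dict[str, bytes]:
    """Return a copy of base with the K/V and D entries of delta applied."""
    result = dict(base)                     # unlisted properties are unchanged
    pos = 0
    while not delta.startswith(b"PROPS-END\n", pos):
        eol = delta.index(b"\n", pos)       # ValueError if PROPS-END is missing
        tag, length = delta[pos:eol].split(b" ")
        start = eol + 1
        name = delta[start:start + int(length)].decode("utf-8")
        pos = start + int(length) + 1       # skip the name and its newline
        if tag == b"D":
            result.pop(name, None)          # deleted property: no value part
        elif tag == b"K":
            eol = delta.index(b"\n", pos)   # the "V <length>" line
            vlen = int(delta[pos + 2:eol])
            result[name] = delta[eol + 1:eol + 1 + vlen]
            pos = eol + 1 + vlen + 1        # skip the value and its newline
        else:
            raise ValueError("unexpected entry tag %r" % tag)
    return result
```

The base dictionary is left untouched, so a loader can keep the previous
revision's properties around while it builds the new ones.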