current               File specifying current revision and next node/copy id
fs-type               File identifying this filesystem as an FSFS filesystem
write-lock            Empty file, locked to serialise writers
pack-lock             Empty file, locked to serialise 'svnadmin pack' (f. 7+)
txn-current-lock      Empty file, locked to serialise 'txn-current'
uuid                  File containing the repository IDs
format                File containing the format number of this filesystem
fsfs.conf             Configuration file
min-unpacked-rev      File containing the oldest revision not in a pack file
min-unpacked-revprop  Same for revision properties (format 5 only)
rep-cache.db          SQLite database mapping rep checksums to locations

Files in the revprops directory are in the hash dump format used by

final stage of a commit and unlocked after the new "current" file has
been moved into place to indicate that a new revision is present. It
is also locked during a revprop propchange while the revprop file is
read in, mutated, and written out again. Furthermore, it is used to
serialize repository structure changes during 'svnadmin pack' (see
also the next section). Note that readers are never blocked by any
operation; writers must ensure that the filesystem is always in a
consistent state.

The "pack-lock" file is an empty file which is locked before an 'svnadmin
pack' operation commences. Thus, only one process may attempt to modify
the repository structure at a time, while other processes may still read
and write (commit) to the repository during most of the pack procedure.
It is only available with format 7 and newer repositories. Older formats
use the global write-lock instead, which disables commits completely for
the duration of the pack process.
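The write-lock, pack-lock and txn-current-lock files all follow one pattern: an empty file on which a process holds an exclusive lock for the duration of a critical section. Below is a minimal sketch of that pattern. It assumes a POSIX system and uses Python's fcntl.flock as a stand-in for the APR file-locking calls that Subversion itself uses. exclusive_lock is a hypothetical helper, not a Subversion API:

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path):
    """Hold an exclusive lock on an empty lock file (hypothetical helper).

    POSIX-only stand-in for the write-lock / pack-lock / txn-current-lock
    files; Subversion itself goes through APR's file-locking calls.
    """
    # Create the file if needed, but never truncate or write to it.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks while another holder exists
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as db:
        lock_path = os.path.join(db, "write-lock")
        with exclusive_lock(lock_path):
            # Critical section, e.g. the final stage of a commit.
            print("holding", os.path.basename(lock_path))
```

Such a lock is advisory, so it only serialises processes that take it. This is consistent with readers never being blocked.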

The "txn-current" file is a file with a single line of text that
contains only a base-36 number. The current value will be used in the

Format 6+: Applied equally to revision data and revprop data
  (i.e. same min packed revision)

Format 1-6: Physical addressing; uses fixed positions within a rev file
Format 7+: Logical addressing; uses item index that will be translated
  on-the-fly to the actual rev / pack file location

Format 1+: The first line of db/uuid contains the repository UUID
Format 7+: The second line contains the instance ID (in UUID formatting)

# Incomplete list. See SVN_FS_FS__MIN_*_FORMAT

Filesystem format options
-------------------------

Currently, the only recognised format options are "layout" and "addressing".
The first specifies the paths that will be used to store the revision
files and revision property files. The second specifies whether logical to
physical address translation is required.

The "layout" option is followed by the name of the filesystem layout
and any required parameters. The default layout, if no "layout"

revs/0/ directory will contain revisions 0-999, revs/1/ will contain
1000-1999, and so on.
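Locating a non-packed revision file in the sharded layout therefore takes one integer division. The sketch below assumes the 1000-files-per-directory example above and that each revision file is named after its revision number. rev_path is a hypothetical helper:

```python
import os


def rev_path(db_dir, rev, max_files_per_dir=1000):
    """Path of a non-packed revision file in the sharded layout.

    Hypothetical helper; assumes each file is named after its revision.
    Shard N holds revisions N*max_files_per_dir .. (N+1)*max_files_per_dir - 1.
    """
    if rev < 0 or max_files_per_dir <= 0:
        raise ValueError("revision must be >= 0 and shard size > 0")
    shard = rev // max_files_per_dir
    return os.path.join(db_dir, "revs", str(shard), str(rev))


for rev in (0, 999, 1000, 1999, 2000):
    print(rev, "->", rev_path("db", rev))
```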

The "addressing" option is followed by the name of the addressing mode
and any required parameters. The default addressing, if no "addressing"
keyword is specified, is 'physical' addressing.
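The sketch below reads those option lines. It assumes the format file holds the format number on its first line, followed by one '<keyword> <name> [<parameters>...]' option per line. parse_format_file is a hypothetical helper, and the sample contents are illustrative only:

```python
KNOWN_OPTIONS = {"layout", "addressing"}


def parse_format_file(text):
    """Split format-file text into the format number and its options.

    Hypothetical helper: the first line holds the format number, every
    further non-empty line is '<keyword> <name> [<parameters>...]'.
    """
    lines = text.splitlines()
    fmt = int(lines[0])
    options = {}
    for line in lines[1:]:
        if not line.strip():
            continue
        keyword, name, *params = line.split()
        if keyword not in KNOWN_OPTIONS:
            raise ValueError(f"unrecognised format option: {keyword!r}")
        options[keyword] = (name, params)
    # No "addressing" keyword means physical addressing.
    options.setdefault("addressing", ("physical", []))
    return fmt, options


fmt, opts = parse_format_file("7\nlayout sharded 1000\naddressing logical\n")
print(fmt, opts)
```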

The supported modes, and the parameters they require, are as follows:

All existing and future revision files will use the traditional
physical addressing scheme. All references are given as rev/offset
pairs with "offset" being the byte offset relative to the beginning of
the revision in the respective rev or pack file.

All existing and future revision files will use logical
addressing. It is illegal to use logical addressing on non-sharded

Two addressing modes are supported in format 7: physical and logical
addressing. Both use the same address format but apply a different
interpretation to it. Older formats only support physical addressing.

All items are addressed using <rev> <item_index> pairs. In physical
addressing mode, item_index is the (ASCII decimal) number of bytes from
the start of the revision file to the start of the respective item. For
non-packed files that is also the absolute file offset. Revision pack
files simply concatenate multiple rev files, i.e. the absolute file offset
is computed as

  absolute offset = rev offset taken from manifest + item_index

This simple addressing scheme makes it hard to change the location of
any item since that may break references from later revisions.
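The sketch below shows physical address resolution, including the manifest lookup for pack files. parse_manifest and absolute_offset are hypothetical helpers, and the three-line manifest is a toy. The sketch assumes the manifest's first entry belongs to the first revision of the shard:

```python
def parse_manifest(text):
    """Manifest of a physically addressed pack file: one ASCII decimal
    offset per line, one line per revision, in revision order."""
    return [int(line) for line in text.splitlines() if line]


def absolute_offset(rev, item_index, manifest=None, shard_size=1000):
    """Resolve a physical <rev> <item_index> address to a file offset.

    Without a manifest the revision is not packed, and item_index already
    is the absolute offset. Inside a pack file, the revision's start
    (taken from the manifest) is added; this assumes manifest[0] belongs
    to the first revision of the shard.
    """
    if manifest is None:
        return item_index
    rev_offset = manifest[rev % shard_size]
    return rev_offset + item_index


manifest = parse_manifest("0\n1200\n5000\n")
print(absolute_offset(1001, 34, manifest))  # -> 1234 (1200 + 34)
```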

Logical addressing uses an index file to translate the rev / item_index
pairs into absolute file offsets. There is one such index for every rev /
pack file using logical addressing, and both are created in sync. That
makes it possible to reorder items during pack file creation, particularly
to mix items from different revisions.

Some item_index values are pre-defined and apply to every revision:

0 ... not used / invalid
1 ... changed path list
2 ... root node revision

A reverse index (phys-to-log) is created as well. It allows translating
arbitrary file locations into item descriptions (type, rev, item_index,
on-disk length). Known item types are:

0 ... unused / empty section
1 ... file representation
2 ... directory representation
3 ... file property representation
4 ... directory property representation
5 ... node revision
6 ... changed paths list

The various representation types all share the same morphology. The
distinction is only made to allow for more effective reordering heuristics.
Zero-length items are allowed.
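The following toy in-memory model of the two index directions shows why logical addressing permits reordering. References only ever go through the log-to-phys mapping, so a packer may interleave items from several revisions as long as it rebuilds both indexes. All names are hypothetical, and the actual on-disk index encoding is not modelled. The value 5 for node revisions is taken from Subversion's SVN_FS_FS__ITEM_TYPE_NODEREV constant:

```python
import bisect
from dataclasses import dataclass
from enum import IntEnum


class ItemType(IntEnum):
    UNUSED = 0
    FILE_REP = 1
    DIR_REP = 2
    FILE_PROPS = 3
    DIR_PROPS = 4
    NODEREV = 5  # node revision (SVN_FS_FS__ITEM_TYPE_NODEREV in Subversion)
    CHANGES = 6


# Pre-defined item_index values (hypothetical constant names).
ITEM_INDEX_CHANGES = 1
ITEM_INDEX_ROOT_NODE = 2


@dataclass(frozen=True)
class Item:
    type: ItemType
    rev: int
    item_index: int
    length: int  # on-disk length in bytes; zero-length items are allowed


def build_indexes(items_in_file_order):
    """Lay items out back to back and build both lookup directions.

    l2p maps (rev, item_index) -> absolute offset; p2l is a sorted list of
    item start offsets plus the items themselves, for reverse lookups.
    """
    l2p, starts, entries = {}, [], []
    offset = 0
    for item in items_in_file_order:
        l2p[(item.rev, item.item_index)] = offset
        if item.length:  # zero-length items cover no file position
            starts.append(offset)
            entries.append(item)
        offset += item.length
    return l2p, (starts, entries)


def p2l_lookup(p2l, position):
    """Return the item covering an arbitrary file position, or None."""
    starts, entries = p2l
    i = bisect.bisect_right(starts, position) - 1
    if i >= 0 and position < starts[i] + entries[i].length:
        return entries[i]
    return None


# Two revisions interleaved in one (toy) pack file.
items = [
    Item(ItemType.CHANGES, 10, ITEM_INDEX_CHANGES, 40),
    Item(ItemType.CHANGES, 11, ITEM_INDEX_CHANGES, 25),
    Item(ItemType.NODEREV, 10, ITEM_INDEX_ROOT_NODE, 120),
    Item(ItemType.NODEREV, 11, ITEM_INDEX_ROOT_NODE, 130),
    Item(ItemType.FILE_REP, 11, 3, 0),
]
l2p, p2l = build_indexes(items)
print(l2p[(11, ITEM_INDEX_ROOT_NODE)])  # -> 185 (40 + 25 + 120)
print(p2l_lookup(p2l, 200))
```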

Packing revisions
-----------------

A filesystem can optionally be "packed" to conserve space on disk. The
packing process concatenates all the revision files in each full shard to
create a pack file. The original shard is removed, and reads are
redirected to the pack file.

With physical addressing, a manifest file is created for each shard which
records the indexes of the corresponding revision files in the pack file.
The manifest file consists of a list of offsets, one for each revision in
the pack file. The offsets are stored as ASCII decimal, and separated by a
newline character.

Revision pack files using logical addressing don't use manifest files but
append index data to the revision contents. The revisions inside a pack
file will also get interleaved to reduce I/O for typical access patterns.
There is no structural difference between packed and non-packed revision

Packing revision properties (format 5: SQLite)
----------------------------------------------

Within a revision:

Within a revision file, node-revs have a txn-id field of the form
"r<rev>/<item_index>", to support easy lookup. See addressing modes
for details.

During the final phase of a commit, node-revision IDs are rewritten
to have repository-wide unique node-ID and copy-ID fields, and to have
"r<rev>/<item_index>" txn-id fields.

In Format 3 and above, this uniqueness is achieved by changing a temporary
id of "_<base36>" to "<base36>-<rev>". Note that this means that the

* Text and property representations
* The changed-path data
* Two offsets at the very end (physical addressing only)
* Index data (logical addressing only)
* Revision / pack file footer (logical addressing only)

A representation begins with a line containing either "PLAIN\n" or
"DELTA\n" or "DELTA <rev> <item_index> <length>\n", where <rev>,
<item_index>, and <length> give the location of the delta base of the
representation and the amount of data it contains (not counting the header
or trailer). If no base location is given for a delta, the base is the
empty stream. After the initial line comes raw svndiff data, followed
by a cosmetic trailer "ENDREP\n".
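The sketch below parses a representation's header line and separates its payload from the trailer. It assumes the rep's bytes have already been read from the rev file and that <length> comes from the referencing node-rev. parse_rep_header and split_rep are hypothetical helpers, and the svndiff payload itself is not decoded:

```python
ENDREP = b"ENDREP\n"


def parse_rep_header(line):
    """Parse a representation's first line (hypothetical helper).

    Returns ("PLAIN", None), ("DELTA", None) for a delta against the empty
    stream, or ("DELTA", (rev, item_index, length)) naming the delta base.
    """
    if not line.endswith("\n"):
        raise ValueError("header line must end with a newline")
    words = line[:-1].split(" ")
    if words == ["PLAIN"] or words == ["DELTA"]:
        return words[0], None
    if words[0] == "DELTA" and len(words) == 4:
        rev, item_index, length = (int(w) for w in words[1:])
        return "DELTA", (rev, item_index, length)
    raise ValueError(f"malformed representation header: {line!r}")


def split_rep(data, length):
    """Split raw representation bytes into (parsed header, payload).

    `length` is the <length> recorded for this rep, which excludes both
    the header line and the cosmetic ENDREP trailer.
    """
    header_end = data.index(b"\n") + 1
    payload = data[header_end:header_end + length]
    trailer = data[header_end + length:header_end + length + len(ENDREP)]
    if trailer != ENDREP:
        raise ValueError("missing ENDREP trailer")
    return parse_rep_header(data[:header_end].decode("ascii")), payload


print(split_rep(b"PLAIN\nhello\nENDREP\n", 6))
```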

type        "file" or "dir"
pred        The ID of the predecessor node-rev
count       Count of node-revs since the base of the node
text        "<rev> <item_index> <length> <size> <digest>" for text rep
props       "<rev> <item_index> <length> <size> <digest>" for props rep
            <rev> and <item_index> give location of rep
            <length> gives length of rep, sans header and trailer
            <size> gives size of expanded rep (*)
            <digest> gives hex MD5 digest of expanded rep
            ### in formats >=4, also present:
            <sha1-digest> gives hex SHA1 digest of expanded rep

            which have svn:mergeinfo.
minfo-here  Exists if this node itself has svn:mergeinfo.
575
(*) Earlier versions of this document would state that <size> may be 0
576
if the actual value matches <length>. This is only true for property
577
and directory representations and should be avoided in general. File
578
representations may not be handled correctly by SVN before 1.7.20,
579
1.8.12 and 1.9.0, if they have 0 <size> fields for non-empty contents.
580
Releases 1.8.0 through 1.8.11 may have falsely created instances of
581
that (see issue #4554). Finally, 0 <size> fields are only ever legal
582
for DELTA representations if the reconstructed full-text is actually

The predecessor of a node-rev crosses both soft and true copies;
together with the count field, it allows efficient determination of
the base for skip-deltas. The first node-rev of a node contains no

of revision 0). Copy roots are identified by revision and
created-path, not by node-rev ID, because a copy root may be a
node-rev which exists later on within the same revision file, meaning
its location is not yet known.

The changed-path data is represented as a series of changed-path
items, each consisting of two lines. The first line has the format
"<id> <action> <text-mod> <prop-mod> <mergeinfo-mod> <path>\n",
where <id> is the node-rev ID of the new node-rev, <action> is "add",
"delete", "replace", or "modify", <text-mod>, <prop-mod>, and
<mergeinfo-mod> are "true" or "false" indicating whether the text,
properties and/or mergeinfo changed, and <path> is the changed pathname.
For deletes, <id> is the node-rev ID of the deleted node-rev, and
<text-mod> and <prop-mod> are always "false". The second line has the
format "<rev> <path>\n" containing the node-rev's copyfrom information
if it has any; if it does not, the second line is blank.

Starting with FS format 4, <action> may contain the kind ("file" or
"dir") of the node, after a hyphen; for example, an added directory
may be represented as "add-dir".

Prior to FS format 7, the <mergeinfo-mod> flag is not available. It may
also be missing in revisions upgraded from pre-format-7 repositories.
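The sketch below parses one changed-path item and accepts both the format 7 layout and the older layout without <mergeinfo-mod>. Change and parse_changed_path are hypothetical, and the node-rev IDs in the example are made up. The sketch assumes changed paths are absolute (they start with "/"), which is what lets it tell a trailing flag apart from the path:

```python
from typing import NamedTuple, Optional, Tuple

ACTIONS = {"add", "delete", "replace", "modify"}
FLAGS = {"true": True, "false": False}


class Change(NamedTuple):
    node_rev_id: str
    action: str
    kind: Optional[str]            # "file"/"dir" (format 4+), else None
    text_mod: bool
    prop_mod: bool
    mergeinfo_mod: Optional[bool]  # None when the flag is absent
    path: str
    copyfrom: Optional[Tuple[int, str]]


def parse_changed_path(first, second):
    """Parse one two-line changed-path item (hypothetical helper).

    Assumes changed paths are absolute (start with "/"), so a path is
    never mistaken for a "true"/"false" flag; paths may contain spaces.
    """
    node_rev_id, action, text_mod, prop_mod, rest = (
        first.rstrip("\n").split(" ", 4))
    head, sep, tail = rest.partition(" ")
    if head in FLAGS and sep:
        mergeinfo_mod, path = FLAGS[head], tail
    else:
        mergeinfo_mod, path = None, rest  # pre-format-7 / upgraded data
    action, _, kind = action.partition("-")  # e.g. "add-dir"
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    copyfrom = None
    second = second.rstrip("\n")
    if second:  # "<rev> <path>" when the node-rev has copyfrom info
        cf_rev, cf_path = second.split(" ", 1)
        copyfrom = (int(cf_rev), cf_path)
    return Change(node_rev_id, action, kind or None, FLAGS[text_mod],
                  FLAGS[prop_mod], mergeinfo_mod, path, copyfrom)


print(parse_changed_path(
    "2-3.0.r3/4 add-dir false false false /branches/b 1\n", "2 /trunk\n"))
```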

In physical addressing mode, at the very end of a rev file is a pair of
lines containing "\n<root-offset> <cp-offset>\n", where <root-offset> is
the offset of the root directory node revision and <cp-offset> is the
offset of the changed-path data.

In logical addressing mode, the revision footer has the form

  <l2p offset> <l2p checksum> <p2l offset> <p2l checksum><terminal byte>

The terminal byte contains the length (as plain 8 bit value) of the footer
excluding that length byte. The first offset is the start of the log-to-phys
index, followed by the MD5 checksum of its content. The other pair gives
the same for the phys-to-log index.
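Both trailer forms can be read from the tail of a rev file, as sketched below. The helper names are hypothetical. The sketch assumes the two checksums are stored as hex strings, and it returns them unverified:

```python
def parse_physical_trailer(data):
    """Physical addressing: the last line is "<root-offset> <cp-offset>"."""
    if not data.endswith(b"\n"):
        raise ValueError("revision file must end with a newline")
    last_line = data[:-1].rsplit(b"\n", 1)[-1]
    root_offset, changes_offset = (int(n) for n in last_line.split(b" "))
    return root_offset, changes_offset


def parse_logical_footer(data):
    """Logical addressing: footer text followed by a 1-byte length."""
    footer_len = data[-1]  # plain 8-bit value, excludes this byte itself
    footer = data[-1 - footer_len:-1].decode("ascii")
    l2p_offset, l2p_md5, p2l_offset, p2l_md5 = footer.split(" ")
    return int(l2p_offset), l2p_md5, int(p2l_offset), p2l_md5


print(parse_physical_trailer(b"<rev data>\n\n1234 5678\n"))  # -> (1234, 5678)

md5_hex = "d41d8cd98f00b204e9800998ecf8427e"  # illustrative checksum text
footer = f"100 {md5_hex} 150 {md5_hex}".encode("ascii")
print(parse_logical_footer(b"x" * 150 + footer + bytes([len(footer)])))
```

A single length byte limits the footer text to 255 bytes, which is ample for two decimal offsets and two digests.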

All numbers in the rev file format are unsigned and are represented as

rev             Prototype rev file with new text reps
rev-lock        Lockfile for writing to the above

(In newer formats, these files are in the txn-protorevs/ directory.)

In format 7+ logical addressing mode, it contains two additional index
files (see structure-indexes for a detailed description) and one more

itemidx         Next item_index value as decimal integer
index.l2p       Log-to-phys proto-index
index.p2l       Phys-to-log proto-index

The prototype rev file is used to store the text representations as
they are received from the client. To ensure that only one client is
writing to the file at a given time, the "rev-lock" file is locked for
the duration of each write.

The three kinds of props files are all in hash dump format. The "props"
file will always be present. The "node.<nid>.<cid>.props" file will
only be present if the node-rev properties have been changed. The
"props-final" file only exists while converting the transaction into a
revision.

The <sha1> files were introduced in FS format 6. Their content is that
of text rep references: "<rev> <item_index> <length> <size> <digest>".
They are written for text reps in the current transaction and are used
to eliminate duplicate reps within that transaction.
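The sketch below parses such a rep reference, as found in node-rev "text" / "props" fields and in the <sha1> files. RepRef and parse_rep_ref are hypothetical. The optional SHA1 field reflects the "formats >=4" note in the node-rev description, and any further fields are ignored:

```python
from typing import NamedTuple, Optional


class RepRef(NamedTuple):
    rev: int
    item_index: int
    length: int           # on-disk length, sans header and trailer
    size: int             # expanded size (see the (*) note on 0 values)
    md5: str              # hex MD5 digest of the expanded rep
    sha1: Optional[str]   # hex SHA1 digest, formats >= 4 only


def parse_rep_ref(value):
    """Parse "<rev> <item_index> <length> <size> <digest> [<sha1> ...]".

    Hypothetical helper: fields after the SHA1 digest are ignored here.
    """
    words = value.split()
    if len(words) < 5:
        raise ValueError(f"truncated rep reference: {value!r}")
    rev, item_index, length, size = (int(w) for w in words[:4])
    sha1 = words[5] if len(words) > 5 else None
    return RepRef(rev, item_index, length, size, words[4], sha1)


print(parse_rep_ref("5 3 120 400 " + "a" * 32 + " " + "b" * 40))
```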