.\" Copyright (c) 1982, 1993
.\" The Regents of the University of California. All rights reserved.
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" $FreeBSD: src/sbin/fsck_ffs/SMM.doc/2.t,v 1.5.32.1 2010/02/10 00:26:20 kensmith Exp $
.\" @(#)2.t 8.1 (Berkeley) 6/5/93
.ds RH Overview of the file system
.NH
Overview of the file system
.PP
The file system is discussed in detail in [Mckusick84];
this section gives a brief overview.
.NH 2
Superblock
.PP
A file system is described by its
.I "super-block" .
The super-block is built when the file system is created (\c
.I newfs (8))
and never changes.
The super-block
contains the basic parameters of the file system,
such as the number of data blocks it contains
and a count of the maximum number of files.
Because the super-block contains critical data,
.I newfs
replicates it to protect against catastrophic loss.
The
.I "default super-block"
always resides at a fixed offset from the beginning
of the file system's disk partition.
The
.I "redundant super-blocks"
are not referenced unless a head crash
or other hard disk error causes the default super-block
to be unusable.
The redundant blocks are sprinkled throughout the disk partition.
.PP
Within the file system are files.
Certain files are distinguished as directories and contain collections
of pointers to files that may themselves be directories.
Every file has a descriptor associated with it called an
.I "inode" .
The inode contains information describing ownership of the file,
time stamps indicating modification and access times for the file,
and an array of indices pointing to the data blocks for the file.
In this section,
we assume that the first 12 blocks
of the file are directly referenced by values stored
in the inode structure itself\(dg.
.FS
\(dgThe actual number may vary from system to system.
.FE
The inode structure may also contain references to indirect blocks
containing further data block indices.
In a file system with a 4096 byte block size, a singly indirect
block contains 1024 further block addresses,
a doubly indirect block contains 1024 addresses of further singly indirect
blocks,
and a triply indirect block contains 1024 addresses of further doubly indirect
blocks (the triply indirect block is never needed in practice).
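.PP
The addressing scheme above can be illustrated with a minimal sketch.
It assumes 12 direct pointers and 4-byte block addresses
(which is what 1024 addresses in a 4096 byte block implies);
the names
.I NDIRECT ,
.I NINDIR ,
and
.I indirection_level
are hypothetical and chosen only for this illustration.
.DS
```python
NDIRECT = 12        # direct block pointers in the inode, as assumed in the text
BLOCK_SIZE = 4096   # bytes per file system block
ADDR_SIZE = 4       # bytes per block address (assumption of this sketch)
NINDIR = BLOCK_SIZE // ADDR_SIZE  # addresses held by one indirect block


def indirection_level(lbn):
    """Return the number of indirect blocks traversed to reach logical block lbn."""
    if lbn < 0:
        raise ValueError("logical block number must be non-negative")
    if lbn < NDIRECT:
        return 0  # referenced directly from the inode
    lbn -= NDIRECT
    for level in (1, 2, 3):
        span = NINDIR ** level  # data blocks reachable at this level
        if lbn < span:
            return level
        lbn -= span
    raise ValueError("block number beyond triple indirection")
```
.DE
Under these assumptions, logical block 11 is direct,
block 12 is the first block reached through the singly indirect block,
and block 12 + 1024 is the first one that needs the doubly indirect block.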
.PP
In order to create files with up to
2\u\s-232\s0\d bytes,
using only two levels of indirection,
the minimum size of a file system block is 4096 bytes.
The size of file system blocks can be any power of two
greater than or equal to 4096.
The block size of the file system is maintained in the super-block,
so it is possible for file systems of different block sizes
to be accessible simultaneously on the same system.
The block size must be decided when
.I newfs
creates the file system;
the block size cannot be subsequently
changed without rebuilding the file system.
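.PP
The 4096 byte minimum can be checked with a little arithmetic.
The sketch below is illustrative only:
it assumes 12 direct pointers, 4-byte block addresses,
and a target file size of 2^32 bytes;
.I max_file_bytes
is a hypothetical helper, not part of any system interface.
.DS
```python
NDIRECT = 12   # direct pointers per inode (assumed, as elsewhere in this section)
ADDR_SIZE = 4  # bytes per block address (assumption of this sketch)


def max_file_bytes(block_size, levels=2):
    """Largest file addressable with the given number of indirection levels."""
    nindir = block_size // ADDR_SIZE
    blocks = NDIRECT + sum(nindir ** k for k in range(1, levels + 1))
    return blocks * block_size


def reaches_target(block_size, target=2 ** 32):
    """True if two levels of indirection cover a file of target bytes."""
    return max_file_bytes(block_size) >= target
```
.DE
With these assumptions a 2048 byte block falls well short of the target,
while a 4096 byte block just exceeds it.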
.NH 2
Summary information
.PP
Associated with the super-block is non-replicated
.I "summary information" .
The summary information changes
as the file system is modified.
The summary information contains
the number of blocks, fragments, inodes, and directories in the file system.
.NH 2
Cylinder groups
.PP
The file system partitions the disk into one or more areas called
.I "cylinder groups" .
A cylinder group consists of one or more consecutive
cylinders on a disk.
Each cylinder group includes inode slots for files, a
.I "block map"
describing available blocks in the cylinder group,
and summary information describing the usage of data blocks
within the cylinder group.
A fixed number of inodes is allocated for each cylinder group
when the file system is created.
The current policy is to allocate one inode for each 2048
bytes of disk space;
this is expected to be far more inodes than will ever be needed.
.PP
All the cylinder group bookkeeping information could be
placed at the beginning of each cylinder group.
However, if this approach were used,
all the redundant information would be on the top platter.
A single hardware failure that destroyed the top platter
could cause the loss of all copies of the redundant super-blocks.
Thus the cylinder group bookkeeping information
begins at a floating offset from the beginning of the cylinder group.
The offset for the
.I "i+1" st
cylinder group is about one track further
from the beginning of the cylinder group
than it was for the
.I "i" th
cylinder group.
In this way,
the redundant
information spirals down into the pack;
any single track, cylinder,
or platter can be lost without losing all copies of the super-blocks.
Except for the first cylinder group,
the space between the beginning of the cylinder group
and the beginning of the cylinder group information stores data.
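.PP
The floating offset can be pictured with a rough sketch.
It is not the allocator's actual formula:
the function name, its parameters, and the wrap-around
once every track of a cylinder has been used
are all assumptions made for illustration.
.DS
```python
def cg_info_offset(cg, tracks_per_cylinder, track_bytes):
    """Approximate byte offset of the bookkeeping area inside cylinder group cg.

    Each successive group starts its bookkeeping one track later than the
    previous group; wrapping after tracks_per_cylinder groups is an
    assumption of this sketch.
    """
    if cg < 0:
        raise ValueError("cylinder group number must be non-negative")
    return (cg % tracks_per_cylinder) * track_bytes
```
.DE
With four tracks per cylinder, groups 0 through 3 place their bookkeeping
on four different tracks, so no single platter surface holds every copy.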
.NH 2
Fragments
.PP
To avoid waste in storing small files,
the file system space allocator divides a single
file system block into one or more
.I "fragments" .
The fragmentation of the file system is specified
when the file system is created;
each file system block can be optionally broken into
2, 4, or 8 addressable fragments.
The lower bound on the size of these fragments is constrained
by the disk sector size;
typically 512 bytes is the lower bound on fragment size.
The block map associated with each cylinder group
records the space availability at the fragment level.
Aligned fragments are examined
to determine block availability.
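.PP
As a sketch of what examining aligned fragments means,
the following assumes four fragments per block and models the block map
as a simple list of booleans, one per fragment.
The representation and the name
.I free_blocks
are hypothetical; the on-disk map is a bitmap with its own layout.
.DS
```python
FRAGS_PER_BLOCK = 4  # assumption: each block is divided into 4 fragments


def free_blocks(frag_free):
    """Return indices of blocks whose aligned fragments are all free.

    frag_free holds one boolean per fragment, True meaning free.
    Only runs that start on a block boundary count as a free block.
    """
    result = []
    last_start = len(frag_free) - FRAGS_PER_BLOCK
    for start in range(0, last_start + 1, FRAGS_PER_BLOCK):
        if all(frag_free[start:start + FRAGS_PER_BLOCK]):
            result.append(start // FRAGS_PER_BLOCK)
    return result
```
.DE
A run of four free fragments that straddles a block boundary
does not count as a free block in this model,
because its fragments are not aligned to a single block.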
.PP
On a file system with a block size of 4096 bytes
and a fragment size of 1024 bytes,
a file is represented by zero or more 4096 byte blocks of data,
and possibly a single fragmented block.
If a file system block must be fragmented to obtain
space for a small amount of data,
the remainder of the block is made available for allocation
to other files.
For example,
consider an 11000 byte file stored on
a 4096/1024 byte file system.
This file uses two full size blocks and a 3072 byte fragment.
If no fragments with at least 3072 bytes
are available when the file is created,
a full size block is split, yielding the necessary 3072 byte
fragment and an unused 1024 byte fragment.
This remaining fragment can be allocated to another file, as needed.
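.PP
The arithmetic of this example can be written out as a short sketch.
The function
.I file_layout
is hypothetical and considers only sizes;
it ignores any other rules the allocator may apply.
.DS
```python
def file_layout(file_bytes, block_size=4096, frag_size=1024):
    """Return (full_blocks, tail_fragments) needed to hold file_bytes."""
    if file_bytes < 0:
        raise ValueError("file size must be non-negative")
    frags_per_block = block_size // frag_size
    full_blocks, tail = divmod(file_bytes, block_size)
    tail_frags = -(-tail // frag_size)  # ceiling division
    if tail_frags == frags_per_block:
        # A tail that needs every fragment is simply another full block.
        full_blocks += 1
        tail_frags = 0
    return full_blocks, tail_frags


blocks, frags = file_layout(11000)
leftover_bytes = (4 - frags) * 1024 if frags else 0  # unused part of a split block
```
.DE
For the 11000 byte file this gives two full blocks and three fragments
(3072 bytes), leaving one 1024 byte fragment when a whole block is split.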
.NH 2
Updates to the file system
.PP
Every working day hundreds of files
are created, modified, and removed.
Every time a file is modified,
the operating system performs a
series of file system updates.
These updates, when written on disk, yield a consistent file system.
The file system stages
all modifications of critical information;
each modification can
either be completed or cleanly backed out after a crash.
Knowing which information is written to the file system first,
one can develop deterministic procedures to
repair a corrupted file system.
To understand this process,
the order in which update
requests are honored must first be understood.
.PP
When a user program does an operation to change the file system,
such as a
.I write ,
the data to be written is copied into an internal
.I "in-core"
buffer in the kernel.
Normally, the disk update is handled asynchronously;
the user process is allowed to proceed even though
the data has not yet been written to the disk.
The data,
along with the inode information reflecting the change,
is eventually written out to disk.
The real disk write may not happen until long after the
.I write
system call has returned.
Thus at any given time, the file system,
as it resides on the disk,
lags the state of the file system represented by the in-core information.
.PP
The disk information is updated to reflect the in-core information
when the buffer is required for another use,
when a
.I sync (2)
is done (at 30 second intervals) by
.I "/etc/update" "(8),"
or by manual operator intervention with the
.I sync (8)
command.
If the system is halted without writing out the in-core information,
the file system on the disk will be in an inconsistent state.
.PP
If all updates are done asynchronously, several serious
inconsistencies can arise.
One inconsistency is that a block may be claimed by two inodes.
Such an inconsistency can occur when the system is halted before
the pointer to the block in the old inode has been cleared
in the copy of the old inode on the disk,
and after the pointer to the block in the new inode has been written out
to the copy of the new inode on the disk.
Here,
there is no deterministic method for deciding
which inode should really claim the block.
A similar problem can arise with a multiply claimed inode.
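.PP
To make the doubly claimed block concrete,
the sketch below models the on-disk inodes as a mapping
from inode number to the block numbers each one references
and reports every block with more than one claimant.
The data model and the name
.I duplicate_claims
are hypothetical; this is one possible way to detect the condition,
not a description of how any particular checking program works.
.DS
```python
from collections import defaultdict


def duplicate_claims(inodes):
    """Return {block: [inode numbers]} for blocks claimed by several inodes.

    inodes maps an inode number to the list of block numbers it references.
    """
    owners = defaultdict(list)
    for ino, blocks in inodes.items():
        for blk in blocks:
            owners[blk].append(ino)
    return {blk: inos for blk, inos in owners.items() if len(inos) > 1}


# Hypothetical on-disk state after a crash: the new inode (9) was written
# with block 100, but the old inode (7) was never rewritten without it.
on_disk = {7: [100, 101], 9: [100]}
dups = duplicate_claims(on_disk)
```
.DE
The result identifies block 100 as claimed by both inodes,
but, as noted above, nothing in the on-disk state says
which of the two is the rightful owner.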
.PP
The problem with asynchronous inode updates
can be avoided by doing all inode deallocations synchronously.
Consequently,
inodes and indirect blocks are written to the disk synchronously
(\fIi.e.\fP the process blocks until the information is
actually written to disk)
when they are being deallocated.
Similarly, inodes are kept consistent by synchronously
deleting, adding, or changing directory entries.
.ds RH Fixing corrupted file systems