a linear, stripe, or snapshot device is located on the failed device
the command will not proceed without a '--force' option. The result
of using the '--force' option is the entire removal and complete
loss of the non-redundant logical volume. If an image or metadata area
of a RAID logical volume is on the failed device, the affected sub-LV is
replaced with an error target device - appearing as <unknown> in 'lvs'
output. RAID logical volumes cannot be completely repaired by vgreduce -
'lvconvert --repair' (listed below) must be used. Once this operation is
complete on volume groups not containing RAID logical volumes, the volume
group will again have a complete and consistent view of the devices it
contains. Thus, all operations will be permitted - including creation,
conversion, and resizing operations. It is currently the preferred method
to call 'lvconvert --repair' on the individual logical volumes to repair
them, followed by 'vgreduce --removemissing' to remove the failed physical
volume's representation from the volume group.
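
As a purely illustrative example - the volume group name 'vg00' here is
hypothetical - the sequence for a failed device holding only non-redundant
LVs might look like:

  # vgreduce --removemissing vg00          (refuses if LV data would be lost)
  # vgreduce --removemissing --force vg00  (also removes the affected LVs)
  # lvs -a -o +devices vg00                (verify nothing still maps to <unknown>)
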
- 'lvconvert --repair <VG/LV>': This action is designed specifically
to operate on individual logical volumes. If, for example, a failed
device happened to contain the images of four distinct mirrors, it would
be necessary to run 'lvconvert --repair' on each of them. The ultimate
result is to leave the faulty device in the volume group, but have no logical
volumes referencing it. (This allows 'vgreduce --removemissing' to
remove the physical volumes cleanly.) In addition to removing mirror or
RAID images that reside on failed devices, 'lvconvert --repair' can also
replace the failed device if there are spare devices available in the
volume group. If run from the command-line, the user is prompted whether
to simply remove the failed portions of the mirror or to also allocate a
replacement. Optionally, the '--use-policies' flag can be specified, which
will cause the operation not to prompt the user, but instead respect
the policies outlined in the LVM configuration file - usually,
/etc/lvm/lvm.conf. Once this operation is complete, the logical volumes
will be consistent. However, the volume group will still be inconsistent -
due to the referenced-but-missing device/PV - and operations will still be
restricted to the aforementioned actions until either the device is
restored or 'vgreduce --removemissing' is run.
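
For example (again with hypothetical names), if the failed device carried
images of two mirrored LVs in 'vg00', the repair sequence might be:

  # lvconvert --repair vg00/lv_mirror1
  # lvconvert --repair vg00/lv_mirror2
  # vgreduce --removemissing vg00

When run with '--use-policies', the answers are taken from lvm.conf instead
of a prompt; in recent LVM2 releases the relevant activation-section settings
look roughly like the following (check your installed lvm.conf for the exact
names and defaults):

  activation {
      mirror_image_fault_policy = "allocate"   # or "remove"
      mirror_log_fault_policy = "allocate"
  }
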
Automated Target Response to Failures:
--------------------------------------
The only LVM target types (i.e. "personalities") that have an automated
response to failures are the mirror and RAID logical volumes. The other target
types (linear, stripe, snapshot, etc) will simply propagate the failure.
[A snapshot becomes invalid if its underlying device fails, but the
origin will remain valid - presuming the origin device has not failed.]

Starting with the "mirror" segment type, there are three types of errors that
a mirror can suffer - read, write, and resynchronization errors. Each is
described in depth below.

Mirror read failures:
If a mirror is 'in-sync' (i.e. all images have been initialized and
choice of when to incur the extra performance costs of replacing
the failed image.

The appropriate time to take permanent corrective action on a mirror
should be driven by policy. There should be a directive that takes
a time or percentage argument. Something like the following:
- mirror_fault_policy_WHEN = "10sec"/"10%"
A time value would signal the amount of time to wait for transient
failures to resolve themselves. The percentage value would signal the
amount a mirror could become out-of-sync before the faulty device is
replaced.
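
A sketch of how such a directive might look if it were added to the
activation section of lvm.conf - 'mirror_fault_policy_WHEN' is a proposal
made by this document, not an existing configuration option:

  activation {
      # Hypothetical: wait 10 seconds for a transient failure to clear
      # before taking permanent corrective action ...
      mirror_fault_policy_WHEN = "10sec"
      # ... or, alternatively, act once the mirror is more than 10% out-of-sync:
      # mirror_fault_policy_WHEN = "10%"
  }
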
A mirror cannot be used unless /some/ corrective action is taken,
however. One option is to replace the failed mirror image with an
error target, forgo the use of 'handle_errors', and simply let the
out-of-sync regions accumulate and be tracked by the log. Mirrors
that have more than 2 images would have to "stack" to perform the
tracking, as each failed image would have to be associated with a
log. If the failure is transient, the device would replace the
error target that was holding its spot and the log that was tracking
the deltas would be used to quickly restore the portions that changed.

One unresolved issue with the above scheme is how to know which
regions of the mirror are out-of-sync when a problem occurs. When
a write failure occurs in the kernel, the log will contain those
regions that are not in-sync. If the log is a disk log, that log
could continue to be used to track differences. However, if the
log was a core log - or if the log device failed at the same time
as an image device - there would be no way to determine which
regions are out-of-sync to begin with as we start to track the
deltas for the failed image. I don't have a solution for this
problem other than to only be able to handle errors in this way
if conditions are right. These issues will have to be ironed out
before proceeding. This could be another case where it is better
to handle failures in the kernel by allowing the kernel to store
updates in various metadata areas.

RAID logical volume device failures are handled differently from the "mirror"
segment type. Discussion of this can be found in lvm2-raid.txt.
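
As with mirrors, policy-driven handling of RAID LV failures is configured in
lvm.conf; in recent releases the relevant activation-section setting looks
roughly like the following (see lvm2-raid.txt and your installed lvm.conf for
details):

  activation {
      raid_fault_policy = "allocate"   # or "warn"
  }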