 * PROGRAM: JRD Access Method
 * MODULE: validation.cpp
 * DESCRIPTION: Validation and garbage collection
 *
 * The contents of this file are subject to the Interbase Public
 * License Version 1.0 (the "License"); you may not use this file
 * except in compliance with the License. You may obtain a copy
 * of the License at http://www.Inprise.com/IPL.html
 *
 * Software distributed under the License is distributed on an
 * "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express
 * or implied. See the License for the specific language governing
 * rights and limitations under the License.
 *
 * The Original Code was created by Inprise Corporation
 * and its predecessors. Portions created by Inprise Corporation are
 * Copyright (C) Inprise Corporation.
 *
 * All Rights Reserved.
 * Contributor(s): ______________________________________.
Database Validation and Repair
==============================

Updated: 1996-Dec-11 David Schnepper

The following terminology will be helpful to understand in this discussion:

record fragment: The smallest recognizable piece of a record; multiple
                 fragments can be linked together to form a single version.
record version:  A single version of a record representing an INSERT, UPDATE
                 or DELETE by a particular transaction (note that deletion
                 of a record also causes a new version to be stored).
record chain:    A linked list of record versions chained together to
                 represent a single logical "record".
slot:            The line number of the record on page. A variable-length
                 array on each data page stores the offsets to the stored
                 records on that page, and the slot is an index into that
                 array.

For more information on data page format, see my paper on the internals
of the InterBase Engine.

Here are all the options for gfix which have to do with validation:

gfix switch   dpb parameter
-----------   -------------

-validate     isc_dpb_verify (gds__dpb_verify prior to 4.0)

    Invoke validation and repair. All other switches modify this switch.

-full         isc_dpb_records

    Visit all records. Without this switch, only page structures will be
    validated, which does involve some limited checking of records.

-mend         isc_dpb_repair

    Attempts to mend the database where it can to make it viable for reading;
    does not guarantee to retain data.

-no_update    isc_dpb_no_update

    Specifies that orphan pages not be released, and allocated pages not
    be marked in use when found to be free. Actually a misleading switch
    name since -mend will update the database, but if -mend is not specified
    and -no_update is specified, then no updates will occur to the database.

-ignore       isc_dpb_ignore

    Tells the engine to ignore checksums in fetching pages. Validate will
    report on the checksums, however. Should probably not even be a switch;
    it should just always be in effect. Otherwise checksums will disrupt
    the validation. Customers should be advised to always use it.
    NOTE: Unix 4.0 (ODS 8.0) does not have on-page checksums, and no
          platform under ODS 9.0 (NevaStone & above) has checksums.
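
The way these switches combine can be sketched in a few lines. This is a
minimal illustration, not the engine's code: the ValidateSwitches struct and
control_flags() are hypothetical names, and only the three flag values are
taken from the vdr_update / vdr_repair / vdr_records constants in this module.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the gfix switches; the engine itself receives
// them as dpb parameters rather than as a struct like this.
struct ValidateSwitches {
    bool full = false;       // -full
    bool mend = false;       // -mend
    bool no_update = false;  // -no_update
};

// Values mirror vdr_update, vdr_repair and vdr_records.
constexpr std::uint16_t kUpdate = 2;
constexpr std::uint16_t kRepair = 4;
constexpr std::uint16_t kRecords = 8;

inline std::uint16_t control_flags(const ValidateSwitches& sw)
{
    std::uint16_t flags = 0;
    if (sw.full)
        flags |= kRecords;      // walk all records
    if (sw.mend)
        flags |= kRepair;       // fix non-simple things
    if (!sw.no_update)
        flags |= kUpdate;       // simple fixes are on unless -no_update
    return flags;
}
```

Note that updating is the default: plain -validate already carries the
"update" flag, and -no_update is what removes it.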

Validation runs only with exclusive access to the database, to ensure
that database structures are not modified during validation. On attach,
validate attempts to obtain an exclusive lock on the database.

If other attachments are already made locally or through the same
multi-client server, validate gives up with the message:

    "Lock timeout during wait transaction
    -- Object "database_filename.fdb" is in use"

If other processes or servers are attached to the database, validate
waits for the exclusive lock on the database (i.e. waits for every
other server to get out of the database).

NOTE: Ordinarily when processes gain exclusive access to the database,
all active transactions are marked as dead on the Transaction Inventory
Pages. This feature is turned off for validation.

IV. PHASES OF VALIDATION

There are two phases to the validation, the first of which is a walk through
the entire database (described below). During this phase, all pages visited
are stored in a bitmap for later use during the garbage collection phase.

During the walk-through phase, any page that is fetched goes through a
series of basic checks.

Each page is checked against its expected type. If the wrong type
page is found in the page header, the message:

    "Page xxx wrong type (expected xxx encountered xxx)"

is returned. This could represent a) a problem with the database
being overwritten, b) a bug with InterBase page allocation mechanisms
in which one page was written over another, or c) a page which was
allocated but never written to disk (most likely if the encountered
page type is 0).

The error does not tell you what page types are what, so here
they are for reference:

    define pag_undefined     0    // purposely undefined
    define pag_header        1    // Database header page
    define pag_pages         2    // Page inventory page
    define pag_transactions  3    // Transaction inventory page
    define pag_pointer       4    // Pointer page
    define pag_data          5    // Data page
    define pag_root          6    // Index root page
    define pag_index         7    // Index (B-tree) page
    define pag_blob          8    // Blob data page
    define pag_ids           9    // Gen-ids
    define pag_log           10   // Write ahead log page: 4.0 only
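
Since the message only prints numbers, a small lookup helps when reading a
log. The following is a sketch for decoding by hand; page_type_name() is a
hypothetical helper, not part of the engine, and the names are simply the
comments from the list above.

```cpp
#include <cassert>
#include <string>

// Maps the numeric page type printed in the "wrong type" message to a
// readable name. Anything outside 0..10 is reported as "unknown".
inline std::string page_type_name(int type)
{
    static const char* const names[] = {
        "undefined", "header", "page inventory", "transaction inventory",
        "pointer", "data", "index root", "index (B-tree)", "blob",
        "generator ids", "write-ahead log"
    };
    const int count = static_cast<int>(sizeof(names) / sizeof(names[0]));
    return (type >= 0 && type < count) ? names[type] : "unknown";
}
```

For example, "expected 5 encountered 7" decodes to a data page that was
expected where an index (B-tree) page was found.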

If -ignore is specified, the checksum is specifically checked in
validate instead of in the engine. If the checksum is found to
be wrong, the error:

    "Checksum error on page xxx"

is returned. This is harmless when found by validate, and the page
will still continue to be validated--if data structures can be
validated on page, they will be. If -mend is specified, the page
will be marked for write, so that when the page is written to disk
at the end of validation the checksum will automatically be
recalculated.

Note: For 4.0 only Windows & NLM platforms keep page checksums.

We check each page fetched against the page bitmap to make sure we
have not visited it already. If we have, the error:

    "Page xxx doubly allocated"

is returned. This should catch the case when a page of the same type
is allocated for two different purposes.

Data pages are not checked with the Revisit mechanism - when walking
record chains and fragments they are frequently revisited.
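
The revisit rule can be modelled with an ordinary bit vector. This is a
sketch under assumptions: VisitedPages is a hypothetical class standing in
for the engine's page bitmap, and the value 5 for data pages is taken from
the page type list earlier in this note.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kPagData = 5;   // pag_data

class VisitedPages {
public:
    // Records the visit and returns true when the page should be reported
    // as doubly allocated: it was seen before and is not a data page.
    bool visit(std::size_t page_number, int page_type)
    {
        if (page_number >= seen_.size())
            seen_.resize(page_number + 1, false);
        const bool revisit = seen_[page_number];
        seen_[page_number] = true;
        return revisit && page_type != kPagData;
    }

    bool was_visited(std::size_t page_number) const
    {
        return page_number < seen_.size() && seen_[page_number];
    }

private:
    std::vector<bool> seen_;
};
```

The same bitmap is what the garbage collection phase later compares
against the page inventory.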

B. Garbage Collection

During this phase, the Page Inventory (PIP) pages are checked against the
bitmap of pages visited. Two types of errors can be detected during
this phase.

1. Orphan Pages

   If any pages in the page inventory were not visited
   during validation, the following error will be returned:

       "Page xxx is an orphan"

   If -no_update was not specified, the page will be marked as free.

2. Improperly Freed Pages

   If any pages marked free in the page inventory were in fact
   found to be in use during validation, the following error
   will be returned:

       "Page xxx is use but marked free" (sic)

   If -no_update was not specified, the page will be marked in use.

NOTE: If errors were found during the validation phase, no changes will
be made to the PIP pages. This assumes that we did not have a chance to
visit all the pages because invalid structures were detected.
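
The comparison made in this phase can be sketched as follows. The two
vectors are hypothetical stand-ins for the PIP bits and the visited-page
bitmap, and the function only classifies pages; it does not repair anything.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class PipFinding { ok, orphan, in_use_but_marked_free };

// marked_free[i] models the page inventory entry for page i (true = free);
// visited[i] models the bitmap built during the walk-through phase.
inline std::vector<PipFinding> check_inventory(const std::vector<bool>& marked_free,
                                               const std::vector<bool>& visited)
{
    assert(marked_free.size() == visited.size());
    std::vector<PipFinding> findings(visited.size(), PipFinding::ok);
    for (std::size_t i = 0; i < visited.size(); ++i) {
        if (visited[i] && marked_free[i])
            findings[i] = PipFinding::in_use_but_marked_free;
        else if (!visited[i] && !marked_free[i])
            findings[i] = PipFinding::orphan;   // allocated, never reached
    }
    return findings;
}
```

A page that is both unvisited and marked free is the normal case for
unused space and is not reported.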

V. WALK-THROUGH PHASE

In order to ensure that all pages are fetched during validation, the
following pages are fetched just for the most basic validation:

1. The header page (and for 4.0 any overflow header pages).
2. Log pages for after-image journalling (4.0 only).
3. Page Inventory pages.
4. Transaction Inventory pages.

   If the system relation RDB$PAGES could not be read or did not
   contain any TIP pages, the message:

       "Transaction inventory pages lost"

   will be returned. If a particular page is missing from the
   sequence as established by RDB$PAGE_SEQUENCE, then the following
   message will be returned:

       "Transaction inventory page lost, sequence xxx"

   If -mend is specified, then a new TIP will be allocated on disk and
   stored in RDB$PAGES in the proper sequence. All transactions which
   would have been on that page are assumed committed.

   If a TIP page does not point to the next one in sequence, the
   following message will be returned:

       "Transaction inventory pages confused, sequence xxx"

5. Generator pages as identified in RDB$PAGES.

All the relations in the database are walked. For each relation, all
indices defined on the relation are fetched, and all pointer and
data pages associated with the relation are fetched (see below).

But first, the metadata is scanned from RDB$RELATIONS to fetch the
format of the relation. If this information is missing or
corrupted the relation cannot be walked.
If any bugchecks are encountered from the scan, the following
message is returned:

    "bugcheck during scan of table xxx (<table_name>)"

This will prevent any further validation of the relation.

NOTE: For views, the metadata is scanned but nothing further is done.

Prior to 4.5 (NevaStone) indices were walked before data pages.
In NevaStone index walking was moved to after data page walking.
Please refer to the later section entitled "Index Walking".

All the pointer pages for the relation are walked. As they are walked
all child data pages are walked (see below). If a pointer page cannot
be found, the following message is returned:

    "Pointer page (sequence xxx) lost"

If the pointer page is not part of the relation we expected or
if it is not marked as being in the proper sequence, the following
message is returned:

    "Pointer page xxx is inconsistent"

If each pointer page does not point to the next pointer page as
stored in the RDB$PAGE_SEQUENCE field in RDB$PAGES, the following
message is returned:

    "Pointer page (sequence xxx) inconsistent"

Each of the data pages referenced by the pointer page is fetched.
If any are found to be corrupt at the page level, and -mend is
specified, the page is deleted from its pointer page. This will
cause a whole page of data to be lost.

The data page is corrupt at the page level if it is not marked as
part of the current relation, or if it is not marked as being in
the proper sequence. If either of these conditions occurs, the
following error is returned:

    "Data page xxx (sequence xxx) is confused"

Each of the slots on the data page is looked at, up to the count
of records stored on page. If the slot is non-zero, the record
fragment at the specified offset is retrieved. If the record
begins before the end of the slots array, or continues off the
end of the page, the following error is returned:

    "Data page xxx (sequence xxx), line xxx is bad"

where "line" means the slot number.

NOTE: If this condition is encountered, the data page is considered
      corrupt at the page level (and thus will be removed from its
      pointer page if -mend is specified).

The record at each slot is looked at for basic validation, regardless
of whether -full is specified or not. The fragment could be any of the
following:

If the fragment is marked as a back version, then it is skipped.
It will be fetched as part of its record.

If the fragment is determined to be corrupt for any reason, and -mend
is specified, then the record header is marked as damaged.

If the fragment is marked damaged already from a previous visit or
a previous validation, the following error is returned:

    "Record xxx is marked as damaged"

where xxx is the record number.

If the record is marked with a transaction id greater than the last
transaction started in the database, the following error is returned:

    "Record xxx has bad transaction xxx"

If -full is specified, and the fragment is the first fragment in a logical
record, then the record at this slot number is fully retrieved. This
involves retrieving all versions, and all fragments of each particular
version. In other words, the entire logical record will be retrieved.

If there are any back versions, they are visited at this point.
If the back version is on another page, the page is fetched but
not validated since it will be walked separately.

If the slot number of the back version is greater than the max
records on page, or there is no record stored at that slot number,
or it is a blob record, or it is a record fragment, or the
fragment itself is invalid, the following error is returned:

    "Chain for record xxx is broken"

If the record header is marked as incomplete, it means that there
are additional fragments to be fetched--the record was too large
to be stored in one slot.
A pointer is stored in the record to the next fragment in the list.

For fragmented records, all fragments are fetched to form a full
record version. If any of the fragments is not in a valid position,
or is not the correct length, the following error is returned:

    "Fragmented record xxx is corrupt"

Once the full record has been retrieved, the length of the format is
checked against the expected format stored in RDB$FORMATS (the
format number is stored with the record, representing the exact
format of the relation at the time the record was stored).
If the length of the reconstructed record does not match
the expected format length, the following error is returned:

    "Record xxx is wrong length"

For delta records (record versions which represent updates to the record)
this check is not made.

If the slot on the data page points to a blob record, then the blob
is fetched (even without -full). This has several cases, corresponding
to the various blob levels. (See the "Engine Internals" document for a
discussion of blob levels.)

Level  Action
-----  -----------------------------------------------------------------
0      These are just records on page, and no further validation is done.
1      All the pages pointed to by the blob record are fetched and
       validated in sequence.
2      All pages pointed to by the blob pointer pages are fetched and
       validated.
3      The blob page is itself a blob pointer page; all its children
       are fetched and validated.

For each blob page found, some further validation is done. If the
page does not point back to the lead page, the following error
is returned:

    "Warning: blob xxx appears inconsistent"

where xxx corresponds to the blob record number. If any of the blob pages
are not marked in the sequence we expect them to be in, the following
error is returned:

    "Blob xxx is corrupt"

Tip: the message for the same error in level 2 or 3 blobs is slightly
different.

If we have lost any of the blob pages in the sequence, the following error
is returned:

    "Blob xxx is truncated"

If the fetched blob is determined to be corrupt for any of the above
reasons, and -mend is specified, then the blob record is marked as
damaged.

In 4.5 (NevaStone) index walking was moved to after the completion
of data page walking.

The indices for the relation are walked. If the index root page
is missing, the following message is returned:

    "Missing index root page"

and the indices are not walked. Otherwise the index root page
is fetched and all indices on the page are fetched.

For each index, the btree pages are fetched from top-down, left to
right.
Basic validation is made on non-leaf pages to ensure that each node
on page points to another index page. If -full validation is specified
then the lower level page is fetched to ensure its starting index
entry is consistent with the parent entry.
On leaf pages, the records pointed to by the index pages are not
fetched; the keys are looked at to ensure they are in correct
order.

If a visited page is not part of the specified relation and index,
the following error is returned:

    "Index xxx is corrupt at page xxx"

If there are orphan child pages, i.e. a child page does not have its entry
as yet in the parent page, however the child's left sibling page has its
btr_sibling updated, the following error is returned:

    "Index xxx has orphan child page at page xxx"

If the page does not contain the number of nodes we would have
expected from its marked length, the following error is returned:

    "Index xxx is corrupt on page xxx"

While we are walking leaf pages, we keep a bitmap of all record
numbers seen in the index. At the conclusion of the index walk
we compare this bitmap to the bitmap of all records in the
relation (calculated during data page/Record Validation phase).
If the bitmaps are not equal then we have a corrupt index
and the following error is reported:

    "Index %d is corrupt (missing entries)"

We do NOT check that each version of each record has a valid
index entry - nor do we check that the stored key for each item
in the index corresponds to a version of the specified record.
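
The bitmap comparison can be illustrated with ordinary sets. This sketch
goes one step further than the check described above, which only tests the
two bitmaps for equality: it also lists which record numbers are absent
from the index. The function names and the use of std::set are assumptions
made for the example, not the engine's data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

// Record numbers seen on data pages but not in the index leaf pages.
inline std::vector<long> records_missing_from_index(const std::set<long>& relation_records,
                                                    const std::set<long>& index_records)
{
    std::vector<long> missing;
    std::set_difference(relation_records.begin(), relation_records.end(),
                        index_records.begin(), index_records.end(),
                        std::back_inserter(missing));
    return missing;
}

// The equality test itself: any difference at all marks the index corrupt.
inline bool index_matches_relation(const std::set<long>& relation_records,
                                   const std::set<long>& index_records)
{
    return relation_records == index_records;
}
```

Because only record numbers are compared, an index whose keys are wrong
but whose record numbers are all present would still pass, which is the
limitation noted in the last paragraph above.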

We count the number of backversions seen while walking pointer pages,
and separately count the number of backversions seen while walking
record chains. If these numbers do not match it indicates either
"orphan" backversion chains or double-linked chains. If this is
seen, the following error is returned:

    "Relation has %ld orphan backversions (%ld in use)"

Currently we do not try to correct this condition, merely report
it. For "orphan" backversions the space can be reclaimed by
a backup/restore. For double-linked chains a SWEEP should
remove all the backversions.
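
The two counters can be compared as in the sketch below. BackversionCounts
and its helpers are hypothetical names for illustration. Which sign of the
difference corresponds to orphan chains and which to double-linked chains
is not stated in this note, so the sketch only reports the signed value.

```cpp
#include <cassert>

struct BackversionCounts {
    unsigned long seen_on_pages;    // counted while walking pointer pages
    unsigned long seen_in_chains;   // counted while walking record chains
};

inline bool counts_match(const BackversionCounts& c)
{
    return c.seen_on_pages == c.seen_in_chains;
}

// Signed difference, so a mismatch in either direction stays visible
// instead of wrapping around as an unsigned subtraction would.
inline long long count_difference(const BackversionCounts& c)
{
    return static_cast<long long>(c.seen_on_pages) -
           static_cast<long long>(c.seen_in_chains);
}
```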

If any corruption of a record fragment is seen during validation, the
record header is marked as "damaged". As far as I can see, this has no
effect on the engine per se. Records marked as damaged will still be
retrieved by the engine itself. There is some question in my mind as
to whether this record should be retrieved at all during a gbak.

If a damaged record is visited, the following error message will appear:

    "Record xxx is marked as damaged"

Note that when a damaged record is first detected, this message is not
actually printed. The record is simply marked as damaged. It is only
thereafter when the record is visited that this message will appear.
So I would postulate that unless a full validation is done at some point,
you would not see this error message; once the full validation is done,
the message will be returned even if you do not specify -full.

Blob records marked as damaged cannot be opened and will not be deleted
from disk. This means that even during backup the blob structures marked
as damaged will not be fetched and backed up. (Why this is done
differently for blobs than for records I cannot say.
Perhaps it was viewed as too difficult to try to retrieve a damaged blob.)
#include "firebird.h"
#include "memory_routines.h"
#include "../jrd/common.h"
#include "../jrd/jrd.h"
#include "../jrd/ods.h"
#include "../jrd/pag.h"
#include "../jrd/ibase.h"
#include "../jrd/val.h"
#include "../jrd/btr.h"
#include "../jrd/btn.h"
#include "../jrd/all.h"
#include "../jrd/lck.h"
#include "../jrd/cch.h"
#include "../jrd/rse.h"
#include "../jrd/sbm.h"
#include "../jrd/tra.h"
#include "../jrd/btr_proto.h"
#include "../jrd/cch_proto.h"
#include "../jrd/dbg_proto.h"
#include "../jrd/dpm_proto.h"
#include "../jrd/err_proto.h"
#include "../jrd/jrd_proto.h"
#include "../jrd/gds_proto.h"
#include "../jrd/met_proto.h"
#include "../jrd/sch_proto.h"
#include "../jrd/thd.h"
#include "../jrd/tra_proto.h"
#include "../jrd/val_proto.h"
#include "../jrd/thread_proto.h"

#ifdef DEBUG_VAL_VERBOSE
#include "../jrd/dmp_proto.h"
/* Control variable for verbose output during debug of
	0 == logged errors only
	1 == logical output also
	2 == physical page output also */
static USHORT VAL_debug_level = 0;

/* Validation/garbage collection/repair control block */
PageBitmap* vdr_page_bitmap;
SLONG vdr_max_transaction;
ULONG vdr_rel_backversion_counter;	/* Counts slots w/rhd_chain */
ULONG vdr_rel_chain_counter;		/* Counts chains w/rdr_chain */
RecordBitmap* vdr_rel_records;		/* 1 bit per valid record */
RecordBitmap* vdr_idx_records;		/* 1 bit per index item */
const USHORT vdr_update = 2;		/* fix simple things */
const USHORT vdr_repair = 4;		/* fix non-simple things (-mend) */
const USHORT vdr_records = 8;		/* Walk all records */

#pragma FB_COMPILER_MESSAGE("This table goes to gds__log and it's not localized")
static const TEXT msg_table[VAL_MAX_ERROR][66] =
	"Page %ld wrong type (expected %d encountered %d)",	// 0
	"Checksum error on page %ld",
	"Page %ld doubly allocated",
	"Page %ld is used but marked free",
	"Page %ld is an orphan",
	"Warning: blob %ld appears inconsistent",	// 5
	"Blob %ld is corrupt",
	"Blob %ld is truncated",
	"Chain for record %ld is broken",
	"Data page %ld (sequence %ld) is confused",
	"Data page %ld (sequence %ld), line %ld is bad",	// 10
	"Index %d is corrupt on page %ld level %ld. File: %s, line: %ld\n\t",
	"Pointer page (sequence %ld) lost",
	"Pointer page (sequence %ld) inconsistent",
	"Record %ld is marked as damaged",
	"Record %ld has bad transaction %ld",	// 15
	"Fragmented record %ld is corrupt",
	"Record %ld is wrong length",
	"Missing index root page",
	"Transaction inventory pages lost",
	"Transaction inventory page lost, sequence %ld",	// 20
	"Transaction inventory pages confused, sequence %ld",
	"Relation has %ld orphan backversions (%ld in use)",
	"Index %d is corrupt (missing entries)",
	"Index %d has orphan child page at page %ld",
	"Index %d has a circular reference at page %ld"

static RTN corrupt(thread_db*, vdr*, USHORT, const jrd_rel*, ...);
static FETCH_CODE fetch_page(thread_db*, vdr*, SLONG, USHORT, WIN *, void *);
static void garbage_collect(thread_db*, vdr*);
#ifdef DEBUG_VAL_VERBOSE
static void print_rhd(USHORT, const rhd*);
static RTN walk_blob(thread_db*, vdr*, jrd_rel*, blh*, USHORT, SLONG);
static RTN walk_chain(thread_db*, vdr*, jrd_rel*, rhd*, SLONG);
static void walk_database(thread_db*, vdr*);
static RTN walk_data_page(thread_db*, vdr*, jrd_rel*, SLONG, SLONG);
static void walk_generators(thread_db*, vdr*);
static void walk_header(thread_db*, vdr*, SLONG);
static RTN walk_index(thread_db*, vdr*, jrd_rel*, index_root_page&, USHORT);
static void walk_log(thread_db*, vdr*);
static void walk_pip(thread_db*, vdr*);
static RTN walk_pointer_page(thread_db*, vdr*, jrd_rel*, int);
static RTN walk_record(thread_db*, vdr*, jrd_rel*, rhd*, USHORT, SLONG, bool);
static RTN walk_relation(thread_db*, vdr*, jrd_rel*);
static RTN walk_root(thread_db*, vdr*, jrd_rel*);
static RTN walk_tip(thread_db*, vdr*, SLONG);

bool VAL_validate(thread_db* tdbb, USHORT switches)
/**************************************
 *	V A L _ v a l i d a t e
 **************************************
 * Functional description
 *	Validate a database.
 **************************************/
JrdMemoryPool* val_pool = 0;
Database* dbb = tdbb->getDatabase();
Attachment* att = tdbb->getAttachment();
val_pool = JrdMemoryPool::createPool();
Jrd::ContextPoolHolder context(tdbb, val_pool);
control.vdr_page_bitmap = NULL;
control.vdr_flags = 0;
control.vdr_errors = 0;
if (switches & isc_dpb_records)
	control.vdr_flags |= vdr_records;
if (switches & isc_dpb_repair)
	control.vdr_flags |= vdr_repair;
if (!(switches & isc_dpb_no_update))
	control.vdr_flags |= vdr_update;
control.vdr_max_page = 0;
control.vdr_rel_records = NULL;
control.vdr_idx_records = NULL;
/* initialize validate errors */
if (!att->att_val_errors) {
att->att_val_errors = vcl::newVector(*dbb->dbb_permanent, VAL_MAX_ERROR);
for (USHORT i = 0; i < VAL_MAX_ERROR; i++)
	(*att->att_val_errors)[i] = 0;
tdbb->tdbb_flags |= TDBB_sweeper;
walk_database(tdbb, &control);
if (control.vdr_errors)
	control.vdr_flags &= ~vdr_update;
garbage_collect(tdbb, &control);
CCH_flush(tdbb, FLUSH_FINI, 0);
tdbb->tdbb_flags &= ~TDBB_sweeper;
catch (const Firebird::Exception& ex) {
Firebird::stuff_exception(tdbb->tdbb_status_vector, ex);
JrdMemoryPool::deletePool(val_pool);
tdbb->tdbb_flags &= ~TDBB_sweeper;
JrdMemoryPool::deletePool(val_pool);

static RTN corrupt(thread_db* tdbb, vdr* control, USHORT err_code, const jrd_rel* relation, ...)
/**************************************
 **************************************
 * Functional description
 *	Corruption has been detected.
 **************************************/
Attachment* att = tdbb->getAttachment();
if (err_code < att->att_val_errors->count())
	(*att->att_val_errors)[err_code]++;
const TEXT* err_string = err_code < VAL_MAX_ERROR ? msg_table[err_code]: "Unknown error code";
const char* fn = tdbb->getAttachment()->att_filename.c_str();
va_start(ptr, relation);
VSNPRINTF(s, sizeof(s), err_string, ptr);
#ifdef DEBUG_VAL_VERBOSE
if (VAL_debug_level >= 0)
fprintf(stdout, "LOG:\tDatabase: %s\n\t%s in table %s (%d)\n",
	fn, s, relation->rel_name.c_str(), relation->rel_id);
fprintf(stdout, "LOG:\tDatabase: %s\n\t%s\n", fn, s);
gds__log("Database: %s\n\t%s in table %s (%d)",
	fn, s, relation->rel_name.c_str(), relation->rel_id);
gds__log("Database: %s\n\t%s", fn, s);
++control->vdr_errors;

static FETCH_CODE fetch_page(thread_db* tdbb,
	USHORT type, WIN * window, void *page_pointer)
/**************************************
 *	f e t c h _ p a g e
 **************************************
 * Functional description
 *	Fetch page and return type of illness, if any. If a control block
 *	is present, check for doubly allocated pages and account for page
 **************************************/
Database* dbb = tdbb->getDatabase();
if (--tdbb->tdbb_quantum < 0)
	JRD_reschedule(tdbb, 0, true);
window->win_page = page_number;
window->win_flags = 0;
*(PAG *) page_pointer = CCH_FETCH_NO_SHADOW(tdbb, window, LCK_write, 0);
if ((*(PAG *) page_pointer)->pag_type != type) {
corrupt(tdbb, control, VAL_PAG_WRONG_TYPE,
	0, page_number, type, (*(PAG *) page_pointer)->pag_type);
/* If "damaged" flag was set, checksum may be incorrect. Check. */
if ((dbb->dbb_flags & DBB_damaged) && !CCH_validate(window)) {
corrupt(tdbb, control, VAL_PAG_CHECKSUM_ERR, 0, page_number);
if (control->vdr_flags & vdr_repair)
	CCH_MARK(tdbb, window);
control->vdr_max_page = MAX(control->vdr_max_page, page_number);
/* For walking back versions & record fragments on data pages we
   sometimes will fetch the same page more than once. In that
   event we don't report double allocation. If the page is truly
   double allocated (to more than one relation) we'll find it
   when the on-page relation id doesn't match */
if ((type != pag_data) && PageBitmap::test(control->vdr_page_bitmap, page_number)) {
corrupt(tdbb, control, VAL_PAG_DOUBLE_ALLOC, 0, page_number);
return fetch_duplicate;
PBM_SET(tdbb->getDefaultPool(), &control->vdr_page_bitmap, page_number);

static void garbage_collect(thread_db* tdbb, vdr* control)
/**************************************
 *	g a r b a g e _ c o l l e c t
 **************************************
 * Functional description
 *	The database has been walked; compare the page inventory against
 *	the bitmap of pages visited.
 **************************************/
Database* dbb = tdbb->getDatabase();
PageManager& pageSpaceMgr = dbb->dbb_page_manager;
PageSpace* pageSpace = pageSpaceMgr.findPageSpace(DB_PAGE_SPACE);
fb_assert(pageSpace);
WIN window(DB_PAGE_SPACE, -1);
for (SLONG sequence = 0, number = 0; number < control->vdr_max_page; sequence++)
const SLONG page_number = (sequence) ? sequence * pageSpaceMgr.pagesPerPIP - 1 : pageSpace->ppFirst;
page_inv_page* page = 0;
fetch_page(tdbb, 0, page_number, pag_pages, &window, &page);
UCHAR* p = page->pip_bits;
const UCHAR* const end = p + pageSpaceMgr.bytesBitPIP;
while (p < end && number < control->vdr_max_page) {
for (int i = 8; i; --i, byte >>= 1, number++) {
if (PageBitmap::test(control->vdr_page_bitmap, number)) {
corrupt(tdbb, control, VAL_PAG_IN_USE, 0, number);
if (control->vdr_flags & vdr_update) {
CCH_MARK(tdbb, &window);
p[-1] &= ~(1 << (number & 7));
/* Page is potentially an orphan - but don't declare it as such
   unless we think we walked all pages */
else if (!(byte & 1) && (control->vdr_flags & vdr_records)) {
corrupt(tdbb, control, VAL_PAG_ORPHAN, 0, number);
if (control->vdr_flags & vdr_update) {
CCH_MARK(tdbb, &window);
p[-1] |= 1 << (number & 7);
const UCHAR test_byte = p[-1];
CCH_RELEASE(tdbb, &window);
if (test_byte & 0x80)
#ifdef DEBUG_VAL_VERBOSE
/* Dump verbose output of all the pages fetched */
if (VAL_debug_level >= 2)
//We are assuming RSE_get_forward
if (control->vdr_page_bitmap->getFirst())
SLONG dmp_page_number = control->vdr_page_bitmap->current();
DMP_page(dmp_page_number, dbb->dbb_page_size);
} while (control->vdr_page_bitmap->getNext());

#ifdef DEBUG_VAL_VERBOSE
static void print_rhd(USHORT length, const rhd* header)
/**************************************
 **************************************
 * Functional description
 *	Debugging routine to print a
 *	Record Header Data.
 **************************************/
if (VAL_debug_level) {
fprintf(stdout, "rhd: len %d TX %d format %d ",
	length, header->rhd_transaction, (int) header->rhd_format);
fprintf(stdout, "BP %d/%d flags 0x%x ",
	header->rhd_b_page, header->rhd_b_line, header->rhd_flags);
if (header->rhd_flags & rhd_incomplete) {
rhdf* fragment = (rhdf*) header;
fprintf(stdout, "FP %d/%d ",
	fragment->rhdf_f_page, fragment->rhdf_f_line);
fprintf(stdout, "%s ",
	(header->rhd_flags & rhd_deleted) ? "DEL" : " ");
fprintf(stdout, "%s ",
	(header->rhd_flags & rhd_chain) ? "CHN" : " ");
fprintf(stdout, "%s ",
	(header->rhd_flags & rhd_fragment) ? "FRG" : " ");
fprintf(stdout, "%s ",
	(header->rhd_flags & rhd_incomplete) ? "INC" : " ");
fprintf(stdout, "%s ",
	(header->rhd_flags & rhd_blob) ? "BLB" : " ");
fprintf(stdout, "%s ",
	(header->rhd_flags & rhd_delta) ? "DLT" : " ");
fprintf(stdout, "%s ",
	(header->rhd_flags & rhd_large) ? "LRG" : " ");
fprintf(stdout, "%s ",
	(header->rhd_flags & rhd_damaged) ? "DAM" : " ");
fprintf(stdout, "\n");
static RTN walk_blob(thread_db* tdbb,
jrd_rel* relation, blh* header, USHORT length, SLONG number)
/**************************************
**************************************
* Functional description
**************************************/
#ifdef DEBUG_VAL_VERBOSE
if (VAL_debug_level) {
"walk_blob: level %d lead page %d max pages %d max segment %d\n",
header->blh_level, header->blh_lead_page,
header->blh_max_sequence, header->blh_max_segment);
fprintf(stdout, " count %d, length %d sub_type %d\n",
header->blh_count, header->blh_length,
header->blh_sub_type);
/* Level 0 blobs have no work to do. */
if (header->blh_level == 0)
/* Level 1 blobs are a little more complicated */
WIN window1(DB_PAGE_SPACE, -1), window2(DB_PAGE_SPACE, -1);
const SLONG* pages1 = header->blh_page;
const SLONG* const end1 = pages1 + ((USHORT) (length - BLH_SIZE) >> SHIFTLONG);
for (sequence = 0; pages1 < end1; pages1++) {
blob_page* page1 = 0;
fetch_page(tdbb, control, *pages1, pag_blob, &window1, &page1);
if (page1->blp_lead_page != header->blh_lead_page)
corrupt(tdbb, control, VAL_BLOB_INCONSISTENT, relation, number);
if ((header->blh_level == 1 && page1->blp_sequence != sequence)) {
corrupt(tdbb, control, VAL_BLOB_CORRUPT, relation, number);
CCH_RELEASE(tdbb, &window1);
if (header->blh_level == 1)
const SLONG* pages2 = page1->blp_page;
const SLONG* const end2 = pages2 + (page1->blp_length >> SHIFTLONG);
for (; pages2 < end2; pages2++, sequence++) {
blob_page* page2 = 0;
fetch_page(tdbb, control, *pages2, pag_blob, &window2,
if (page2->blp_lead_page != header->blh_lead_page
|| page2->blp_sequence != sequence)
corrupt(tdbb, control, VAL_BLOB_CORRUPT, relation,
CCH_RELEASE(tdbb, &window1);
CCH_RELEASE(tdbb, &window2);
CCH_RELEASE(tdbb, &window2);
CCH_RELEASE(tdbb, &window1);
if (sequence - 1 != header->blh_max_sequence)
return corrupt(tdbb, control, VAL_BLOB_TRUNCATED, relation, number);
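/*
 * Illustrative sketch only, not part of the validation code. The loops
 * above expect the blob pages to carry consecutive sequence numbers that
 * start at 0 and end at blh_max_sequence. The standalone function below
 * restates that rule on plain integers. BlobCheck, checkBlobSequence and
 * the use of std::vector are hypothetical names chosen for the example;
 * they do not exist in the engine.
 *
```cpp
#include <cassert>
#include <vector>

enum class BlobCheck { Ok, Corrupt, Truncated };

// pageSequences holds the sequence number read from each blob page, in the
// order the pages are listed; maxSequence plays the role of blh_max_sequence.
BlobCheck checkBlobSequence(const std::vector<long>& pageSequences, long maxSequence)
{
    long expected = 0;
    for (long found : pageSequences)
    {
        if (found != expected)
            return BlobCheck::Corrupt;      // page out of order or repeated
        ++expected;
    }
    // After the loop, expected - 1 is the last sequence number actually seen.
    return (expected - 1 == maxSequence) ? BlobCheck::Ok : BlobCheck::Truncated;
}
```
 *
 * As in the code above, any mismatch between the last sequence seen and
 * the recorded maximum is reported as a truncated blob.
 */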
static RTN walk_chain(thread_db* tdbb,
jrd_rel* relation, rhd* header, SLONG head_number)
/**************************************
* w a l k _ c h a i n
**************************************
* Functional description
* Make sure chain of record versions is completely intact.
**************************************/
#ifdef DEBUG_VAL_VERBOSE
SLONG page_number = header->rhd_b_page;
USHORT line_number = header->rhd_b_line;
WIN window(DB_PAGE_SPACE, -1);
while (page_number) {
const bool delta_flag = (header->rhd_flags & rhd_delta) ? true : false;
#ifdef DEBUG_VAL_VERBOSE
if (VAL_debug_level)
fprintf(stdout, " BV %02d: ", ++counter);
control->vdr_rel_chain_counter++;
data_page* page = 0;
fetch_page(tdbb, control, page_number, pag_data, &window, &page);
const data_page::dpg_repeat* line = &page->dpg_rpt[line_number];
header = (rhd*) ((UCHAR *) page + line->dpg_offset);
if (page->dpg_count <= line_number ||
!line->dpg_length ||
(header->rhd_flags & (rhd_blob | rhd_fragment)) ||
walk_record(tdbb, control, relation, header, line->dpg_length,
head_number, delta_flag) != rtn_ok)
CCH_RELEASE(tdbb, &window);
return corrupt(tdbb, control, VAL_REC_CHAIN_BROKEN,
relation, head_number);
page_number = header->rhd_b_page;
line_number = header->rhd_b_line;
CCH_RELEASE(tdbb, &window);
static void walk_database(thread_db* tdbb, vdr* control)
/**************************************
* w a l k _ d a t a b a s e
**************************************
* Functional description
**************************************/
Database* dbb = tdbb->getDatabase();
#ifdef DEBUG_VAL_VERBOSE
if (VAL_debug_level) {
"walk_database: %s\nODS: %d.%d (creation ods %d)\nPage size %d\n",
dbb->dbb_filename.c_str(), dbb->dbb_ods_version,
dbb->dbb_minor_version, dbb->dbb_minor_original,
dbb->dbb_page_size);
DPM_scan_pages(tdbb);
WIN window(DB_PAGE_SPACE, -1);
header_page* page = 0;
fetch_page(tdbb, control, (SLONG) HEADER_PAGE, pag_header, &window,
control->vdr_max_transaction = page->hdr_next_transaction;
walk_header(tdbb, control, page->hdr_next_page);
walk_log(tdbb, control);
walk_pip(tdbb, control);
walk_tip(tdbb, control, page->hdr_next_transaction);
walk_generators(tdbb, control);
vec<jrd_rel*>* vector;
for (USHORT i = 0; (vector = dbb->dbb_relations) && i < vector->count(); i++)
#ifdef DEBUG_VAL_VERBOSE
if (i >= 32 /* rel_MAX */ ) // Why not system flag instead?
VAL_debug_level = 2;
jrd_rel* relation = (*vector)[i];
walk_relation(tdbb, control, relation);
CCH_RELEASE(tdbb, &window);
static RTN walk_data_page(thread_db* tdbb,
jrd_rel* relation, SLONG page_number, SLONG sequence)
/**************************************
* w a l k _ d a t a _ p a g e
**************************************
* Functional description
* Walk a single data page.
**************************************/
Database* dbb = tdbb->getDatabase();
WIN window(DB_PAGE_SPACE, -1);
data_page* page = 0;
fetch_page(tdbb, control, page_number, pag_data, &window, &page);
#ifdef DEBUG_VAL_VERBOSE
if (VAL_debug_level) {
"walk_data_page: page %d rel %d seq %d count %d\n",
page_number, page->dpg_relation, page->dpg_sequence,
if (page->dpg_relation != relation->rel_id
|| page->dpg_sequence != sequence)
++control->vdr_errors;
CCH_RELEASE(tdbb, &window);
return corrupt(tdbb, control, VAL_DATA_PAGE_CONFUSED,
relation, page_number, sequence);
const UCHAR* const end_page = (UCHAR *) page + dbb->dbb_page_size;
const data_page::dpg_repeat* const end = page->dpg_rpt + page->dpg_count;
SLONG number = sequence * dbb->dbb_max_records;
for (const data_page::dpg_repeat* line = page->dpg_rpt; line < end;
#ifdef DEBUG_VAL_VERBOSE
if (VAL_debug_level) {
fprintf(stdout, "Slot %02d (%d,%d): ",
line - page->dpg_rpt,
line->dpg_offset, line->dpg_length);
if (line->dpg_length) {
rhd* header = (rhd*) ((UCHAR *) page + line->dpg_offset);
if ((UCHAR *) header < (UCHAR *) end ||
(UCHAR *) header + line->dpg_length > end_page)
return corrupt(tdbb, control, VAL_DATA_PAGE_LINE_ERR,
relation, page_number, sequence,
(SLONG) (line - page->dpg_rpt));
if (header->rhd_flags & rhd_chain)
control->vdr_rel_backversion_counter++;
/* Record the existence of a primary version of a record */
if ((control->vdr_flags & vdr_records) &&
!(header->rhd_flags & (rhd_chain | rhd_fragment | rhd_blob)))
/* Only set committed (or limbo) records in the bitmap. If there
is a backversion then at least one of the record versions is
committed. If there's no backversion then check transaction
state of the lone primary record version. */
if (header->rhd_b_page)
RBM_SET(tdbb->getDefaultPool(), &control->vdr_rel_records, number);
if (header->rhd_transaction < dbb->dbb_oldest_transaction)
state = tra_committed;
TRA_fetch_state(tdbb, header->rhd_transaction);
if (state == tra_committed || state == tra_limbo)
RBM_SET(tdbb->getDefaultPool(), &control->vdr_rel_records, number);
#ifdef DEBUG_VAL_VERBOSE
if (VAL_debug_level) {
if (header->rhd_flags & rhd_chain)
fprintf(stdout, "(backvers)");
if (header->rhd_flags & rhd_fragment)
fprintf(stdout, "(fragment)");
if (header->rhd_flags & (rhd_fragment | rhd_chain))
print_rhd(line->dpg_length, header);
if (!(header->rhd_flags & rhd_chain) &&
((header->rhd_flags & rhd_large) ||
(control->vdr_flags & vdr_records)))
const RTN result = (header->rhd_flags & rhd_blob) ?
walk_blob(tdbb, control, relation, (blh*) header,
line->dpg_length, number) :
walk_record(tdbb, control, relation, header,
line->dpg_length, number, false);
if ((result == rtn_corrupt)
&& (control->vdr_flags & vdr_repair))
CCH_MARK(tdbb, &window);
header->rhd_flags |= rhd_damaged;
#ifdef DEBUG_VAL_VERBOSE
else if (VAL_debug_level)
fprintf(stdout, "(empty)\n");
CCH_RELEASE(tdbb, &window);
#ifdef DEBUG_VAL_VERBOSE
if (VAL_debug_level)
fprintf(stdout, "------------------------------------\n");
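/*
 * Illustrative sketch only, not part of the validation code. The slot
 * check in walk_data_page rejects a record whose storage would start
 * inside the slot array or run past the end of the page. The standalone
 * function below restates that test with byte offsets instead of
 * pointers. SlotEntry and slotInBounds are hypothetical names, and the
 * page layout is simplified for the example.
 *
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified slot entry: offset and length of one stored record, both
// measured in bytes from the start of the page.
struct SlotEntry
{
    std::uint16_t offset;
    std::uint16_t length;
};

// slotArrayEnd is the byte offset just past the last slot entry.
// Returns true when the record lies entirely between the end of the slot
// array and the end of the page.
bool slotInBounds(const SlotEntry& slot, std::size_t slotArrayEnd, std::size_t pageSize)
{
    if (slot.length == 0)
        return true;    // empty slot: nothing stored, nothing to check
    if (slot.offset < slotArrayEnd)
        return false;   // record would overlap the slot array or page header
    return static_cast<std::size_t>(slot.offset) + slot.length <= pageSize;
}
```
 *
 * Working with offsets keeps the comparison free of pointer casts, at the
 * cost of one widening conversion before the addition.
 */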
static void walk_generators(thread_db* tdbb, vdr* control)
/**************************************
* w a l k _ g e n e r a t o r s
**************************************
* Functional description
* Walk the generator pages.
**************************************/
Database* dbb = tdbb->getDatabase();
WIN window(DB_PAGE_SPACE, -1);
vcl* vector = dbb->dbb_gen_id_pages;
vcl::iterator ptr, end;
for (ptr = vector->begin(), end = vector->end(); ptr < end; ++ptr) {
#ifdef DEBUG_VAL_VERBOSE
if (VAL_debug_level)
fprintf(stdout, "walk_generator: page %d\n", *ptr);
pointer_page* page = 0;
fetch_page(tdbb, control, *ptr, pag_ids, &window, &page);
CCH_RELEASE(tdbb, &window);
static void walk_header(thread_db* tdbb, vdr* control, SLONG page_num)
/**************************************
* w a l k _ h e a d e r
**************************************
* Functional description
* Walk the overflow header pages
**************************************/
#ifdef DEBUG_VAL_VERBOSE
if (VAL_debug_level)
fprintf(stdout, "walk_header: page %d\n", page_num);
WIN window(DB_PAGE_SPACE, -1);
header_page* page = 0;
fetch_page(tdbb, control, page_num, pag_header, &window, &page);
page_num = page->hdr_next_page;
CCH_RELEASE(tdbb, &window);
static RTN walk_index(thread_db* tdbb, vdr* control, jrd_rel* relation,
index_root_page& root_page, USHORT id)
/**************************************
* w a l k _ i n d e x
**************************************
* Functional description
* Walk all btree pages left-to-right and top-down.
* Check all the pointers and keys for consistency
* relative to each other, and check sibling pointers.
* NOTE: id is the internal index id, relative for each
* relation. It is 1 less than the user level index id.
* So errors are reported against index id+1
**************************************/
USHORT l; // temporary variable for length
Database* dbb = tdbb->getDatabase();
const SLONG page_number = root_page.irt_rpt[id].irt_root;
const bool unique = (root_page.irt_rpt[id].irt_flags & (irt_unique | idx_primary));
temporary_key nullKey, *null_key = 0;
if (unique && tdbb->getDatabase()->dbb_ods_version >= ODS_VERSION11)
const bool isExpression = root_page.irt_rpt[id].irt_flags & irt_expression;
root_page.irt_rpt[id].irt_flags &= ~irt_expression;
BTR_description(tdbb, relation, &root_page, &idx, id);
root_page.irt_rpt[id].irt_flags |= irt_expression;
null_key = &nullKey;
BTR_make_null_key(tdbb, &idx, null_key);
SLONG next = page_number;
SLONG down = page_number;
SLONG previous_number = 0;
RecordBitmap::reset(control->vdr_idx_records);
bool firstNode = true;
bool nullKeyNode = false; // current node is a null key of unique index
bool nullKeyHandled = !(unique && null_key); // null key of unique index was handled
IndexNode node, lastNode;
PageBitmap visited_pages; // used to check circular page references, Diane Downie 2007-02-09
WIN window(DB_PAGE_SPACE, -1);
btree_page* page = 0;
fetch_page(tdbb, control, next, pag_index, &window, &page);
// remember each page for circular reference detection
visited_pages.set(next);
if ((next != page_number) &&
(page->btr_header.pag_flags & BTR_FLAG_COPY_MASK) !=
(flags & BTR_FLAG_COPY_MASK))
corrupt(tdbb, control, VAL_INDEX_PAGE_CORRUPT, relation,
id + 1, next, page->btr_level, __FILE__, __LINE__);
flags = page->btr_header.pag_flags;
const bool leafPage = (page->btr_level == 0);
const bool useJumpInfo = (flags & btr_jump_info);
const bool useAllRecordNumbers = (flags & btr_all_record_number);
if (!useAllRecordNumbers)
nullKeyHandled = true;
if (page->btr_relation != relation->rel_id ||
page->btr_id != (UCHAR) (id % 256))
corrupt(tdbb, control, VAL_INDEX_PAGE_CORRUPT, relation, id + 1,
next, page->btr_level, __FILE__, __LINE__);
CCH_RELEASE(tdbb, &window);
IndexJumpInfo jumpInfo;
pointer = BTreeNode::getPointerFirstNode(page, &jumpInfo);
const USHORT headerSize = (pointer - (UCHAR*)page);
// Check if firstNodeOffset is not out of page area.
if ((jumpInfo.firstNodeOffset < headerSize) ||
(jumpInfo.firstNodeOffset > page->btr_length))
corrupt(tdbb, control, VAL_INDEX_PAGE_CORRUPT, relation,
id + 1, next, page->btr_level, __FILE__, __LINE__);
USHORT n = jumpInfo.jumpers;
USHORT jumpersSize = 0;
IndexNode checknode;
IndexJumpNode jumpNode;
pointer = BTreeNode::readJumpNode(&jumpNode, pointer, flags);
jumpersSize += BTreeNode::getJumpNodeSize(&jumpNode, flags);
// Check if jump node offset is inside page.
if ((jumpNode.offset < jumpInfo.firstNodeOffset) ||
(jumpNode.offset > page->btr_length))
corrupt(tdbb, control, VAL_INDEX_PAGE_CORRUPT, relation,
id + 1, next, page->btr_level, __FILE__, __LINE__);
// Check if jump node has same length as data node prefix.
BTreeNode::readNode(&checknode,
(UCHAR*)page + jumpNode.offset, flags, leafPage);
if ((jumpNode.prefix + jumpNode.length) != checknode.prefix) {
corrupt(tdbb, control, VAL_INDEX_PAGE_CORRUPT, relation,
id + 1, next, page->btr_level, __FILE__, __LINE__);
// go through all the nodes on the page and check for validity
pointer = BTreeNode::getPointerFirstNode(page);
if (useAllRecordNumbers && firstNode) {
BTreeNode::readNode(&lastNode, pointer, flags, leafPage);
const UCHAR* const endPointer = ((UCHAR *) page + page->btr_length);
while (pointer < endPointer) {
pointer = BTreeNode::readNode(&node, pointer, flags, leafPage);
if (pointer > endPointer) {
// make sure the current key is not less than the previous key
bool duplicateNode = !firstNode && !node.isEndLevel &&
(key.key_length == (node.length + node.prefix));
p = key.key_data + node.prefix;
l = MIN(node.length, (USHORT) (key.key_length - node.prefix));
for (; l; l--, p++, q++) {
duplicateNode = false;
corrupt(tdbb, control, VAL_INDEX_PAGE_CORRUPT, relation,
id + 1, next, page->btr_level, __FILE__, __LINE__);
duplicateNode = false;
if (!duplicateNode && nullKeyNode) {
nullKeyHandled = true;
nullKeyNode = false;
if (useAllRecordNumbers && (node.recordNumber.getValue() >= 0) &&
!firstNode && !node.isEndLevel)
// If this node is equal to the previous one and it's
// not a MARKER, record number should be same or higher.
if (duplicateNode) {
if ((!unique || (unique && nullKeyNode)) &&
(node.recordNumber < lastNode.recordNumber))
corrupt(tdbb, control, VAL_INDEX_PAGE_CORRUPT, relation,
id + 1, next, page->btr_level, __FILE__, __LINE__);
// save the current key
memcpy(key.key_data + node.prefix, node.data, node.length);
//key.key_length = key.key_data + node.prefix + node.length - key.key_data;
key.key_length = node.prefix + node.length;
if (!nullKeyHandled && !nullKeyNode && !duplicateNode)
nullKeyNode = (leafPage || (!leafPage && !firstNode) ) &&
!node.isEndLevel && (null_key->key_length == key.key_length) &&
(memcmp(null_key->key_data, key.key_data, null_key->key_length) == 0);
if (node.isEndBucket || node.isEndLevel) {
// Record the existence of a primary version of a record
if (leafPage && control && (control->vdr_flags & vdr_records)) {
RBM_SET(tdbb->getDefaultPool(), &control->vdr_idx_records, node.recordNumber.getValue());
// fetch the next page down (if full validation was specified)
if (!leafPage && control && (control->vdr_flags & vdr_records))
const SLONG down_number = node.pageNumber;
const RecordNumber down_record_number = node.recordNumber;
// Note: control == 0 for the fetch_page() call here
// as we don't want to mark the page as visited yet - we'll
// mark it when we visit it for real later on
WIN down_window(DB_PAGE_SPACE, -1);
btree_page* down_page = 0;
fetch_page(tdbb, 0, down_number, pag_index, &down_window,
const bool downLeafPage = (down_page->btr_level == 0);
// make sure the initial key is greater than the pointer key
UCHAR* downPointer = BTreeNode::getPointerFirstNode(down_page);
downPointer = BTreeNode::readNode(&downNode, downPointer, flags, downLeafPage);
l = MIN(key.key_length, downNode.length);
for (; l; l--, p++, q++) {
corrupt(tdbb, control, VAL_INDEX_PAGE_CORRUPT, relation,
id + 1, next, page->btr_level, __FILE__, __LINE__);
// Only check record-number if this isn't the first page in
// the level and it isn't a MARKER.
// Also don't check on primary/unique keys, because duplicates aren't
// sorted on recordnumber, except for NULL keys.
if (useAllRecordNumbers && down_page->btr_left_sibling &&
!(downNode.isEndBucket || downNode.isEndLevel) &&
(!unique || (unique && nullKeyNode)) )
// Check record number if key is equal with node on
// pointer page. In that case record number on page
// down should be same or larger.
if ((l == 0) && (key.key_length == downNode.length) &&
(downNode.recordNumber < down_record_number))
corrupt(tdbb, control, VAL_INDEX_PAGE_CORRUPT, relation,
id + 1, next, page->btr_level, __FILE__, __LINE__);
// check the left and right sibling pointers against the parent pointers
if (previous_number != down_page->btr_left_sibling) {
corrupt(tdbb, control, VAL_INDEX_PAGE_CORRUPT, relation,
id + 1, next, page->btr_level, __FILE__, __LINE__);
BTreeNode::readNode(&downNode, pointer, flags, leafPage);
const SLONG next_number = downNode.pageNumber;
if (!(downNode.isEndBucket || downNode.isEndLevel) &&
(next_number != down_page->btr_sibling))
corrupt(tdbb, control, VAL_INDEX_PAGE_CORRUPT, relation,
id + 1, next, page->btr_level, __FILE__, __LINE__);
if (downNode.isEndLevel && down_page->btr_sibling) {
corrupt(tdbb, control, VAL_INDEX_ORPHAN_CHILD, relation,
previous_number = down_number;
CCH_RELEASE(tdbb, &down_window);
if (pointer != endPointer || page->btr_length > dbb->dbb_page_size) {
corrupt(tdbb, control, VAL_INDEX_PAGE_CORRUPT, relation, id + 1,
next, page->btr_level, __FILE__, __LINE__);
if (page->btr_level) {
IndexNode newPageNode;
BTreeNode::readNode(&newPageNode,
BTreeNode::getPointerFirstNode(page), flags, false);
down = newPageNode.pageNumber;
if (!(next = page->btr_sibling)) {
previous_number = 0;
nullKeyNode = false;
nullKeyHandled = !(unique && null_key);
// check for circular references
if (next && visited_pages.test(next))
corrupt(tdbb, control, VAL_INDEX_CYCLE, relation,
CCH_RELEASE(tdbb, &window);
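/*
 * Illustrative sketch only, not part of the validation code. The index
 * walk above remembers every page it has fetched and reports a cycle when
 * the next sibling pointer leads to a page already seen. The standalone
 * function below shows the same idea on a plain map of sibling links.
 * std::set stands in for the engine's PageBitmap, and hasSiblingCycle and
 * the map layout are hypothetical names chosen for the example.
 *
```cpp
#include <cassert>
#include <map>
#include <set>

// siblingOf maps a page number to its right sibling; 0 means "no sibling".
// Returns true if following the links from `first` reaches a page twice.
bool hasSiblingCycle(const std::map<long, long>& siblingOf, long first)
{
    std::set<long> visited;
    for (long page = first; page != 0; )
    {
        // insert() reports whether the page was new to the set
        if (!visited.insert(page).second)
            return true;
        const auto it = siblingOf.find(page);
        page = (it == siblingOf.end()) ? 0 : it->second;
    }
    return false;
}
```
 *
 * Without the visited set, a corrupt sibling pointer that loops back
 * would keep the walk running indefinitely.
 */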
// If the index & relation contain different sets of records we
// have a corrupt index
if (control && (control->vdr_flags & vdr_records)) {
RecordBitmap::Accessor accessor(control->vdr_rel_records);
if (accessor.getFirst())
SINT64 next_number = accessor.current();
if (!RecordBitmap::test(control->vdr_idx_records, next_number)) {
return corrupt(tdbb, control, VAL_INDEX_MISSING_ROWS,
} while (accessor.getNext());
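/*
 * Illustrative sketch only, not part of the validation code. The final
 * step of walk_index compares the record numbers collected from the data
 * pages with those collected from the index leaves, and stops at the
 * first record the index does not contain. The standalone function below
 * mirrors that comparison. std::set stands in for RecordBitmap, and
 * firstMissingRow is a hypothetical name chosen for the example.
 *
```cpp
#include <cassert>
#include <set>

// Returns the first record number present in the relation set but absent
// from the index set, or -1 when the index covers every relation record.
long long firstMissingRow(const std::set<long long>& relationRecords,
                          const std::set<long long>& indexRecords)
{
    for (long long number : relationRecords)
    {
        if (indexRecords.count(number) == 0)
            return number;
    }
    return -1;
}
```
 *
 * The check is one-directional, as in the code above: extra entries in
 * the index set are not reported by this comparison.
 */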
static void walk_log(thread_db* tdbb, vdr* control)
/**************************************
**************************************
* Functional description
* Walk the log and overflow pages
**************************************/
log_info_page* page = 0;
SLONG page_num = LOG_PAGE;
WIN window(DB_PAGE_SPACE, -1);
fetch_page(tdbb, control, page_num, pag_log, &window, &page);
page_num = page->log_next_page;
CCH_RELEASE(tdbb, &window);
static void walk_pip(thread_db* tdbb, vdr* control)
/**************************************
**************************************
* Functional description
* Walk the page inventory pages.
**************************************/
Database* dbb = tdbb->getDatabase();
PageManager& pageSpaceMgr = dbb->dbb_page_manager;
const PageSpace* pageSpace = pageSpaceMgr.findPageSpace(DB_PAGE_SPACE);
fb_assert(pageSpace);
page_inv_page* page = 0;
for (USHORT sequence = 0;; sequence++) {
const SLONG page_number =
(sequence) ? sequence * pageSpaceMgr.pagesPerPIP - 1 : pageSpace->ppFirst;
#ifdef DEBUG_VAL_VERBOSE
if (VAL_debug_level)
fprintf(stdout, "walk_pip: page %d\n", page_number);
WIN window(DB_PAGE_SPACE, -1);
fetch_page(tdbb, control, page_number, pag_pages, &window, &page);
const UCHAR byte = page->pip_bits[pageSpaceMgr.bytesBitPIP - 1];
CCH_RELEASE(tdbb, &window);
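/*
 * Illustrative sketch only, not part of the validation code. The loop in
 * walk_pip derives the page number of each page inventory page from its
 * sequence number. The standalone function below isolates that
 * expression. pipPageNumber is a hypothetical name, and the figures used
 * with it in any test are made-up values, not real engine constants.
 *
```cpp
#include <cassert>

// Sequence 0 is the first PIP, found at firstPip. Later PIPs are expected
// at sequence * pagesPerPip - 1, as in the expression used by walk_pip.
long pipPageNumber(unsigned sequence, long pagesPerPip, long firstPip)
{
    return sequence ? static_cast<long>(sequence) * pagesPerPip - 1 : firstPip;
}
```
 *
 * With 100 pages per PIP and the first PIP on page 1, sequences 0, 1 and
 * 2 map to pages 1, 99 and 199.
 */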
static RTN walk_pointer_page(thread_db* tdbb,
/**************************************
* w a l k _ p o i n t e r _ p a g e
**************************************
* Functional description
* Walk a pointer page for a relation. Return TRUE if there are more
**************************************/
Database* dbb = tdbb->getDatabase();
const vcl* vector = relation->getBasePages()->rel_pages;
if (!vector || sequence >= static_cast<int>(vector->count())) {
return corrupt(tdbb, control, VAL_P_PAGE_LOST, relation, sequence);
pointer_page* page = 0;
WIN window(DB_PAGE_SPACE, -1);
(*vector)[sequence],
#ifdef DEBUG_VAL_VERBOSE
if (VAL_debug_level)
"walk_pointer_page: page %d relation %d sequence %d\n",
(*vector)[sequence], relation->rel_id, sequence);
/* Give the page a quick once over */
if (page->ppg_relation != relation->rel_id ||
page->ppg_sequence != sequence)
return corrupt(tdbb, control, VAL_P_PAGE_INCONSISTENT, relation,
/* Walk the data pages (someday we may optionally walk pages with "large objects") */
SLONG seq = (SLONG) sequence * dbb->dbb_dp_per_pp;
for (SLONG* pages = page->ppg_page; slot < page->ppg_count;
slot++, pages++, seq++)
const RTN result = walk_data_page(tdbb, control, relation, *pages, seq);
if (result != rtn_ok && (control->vdr_flags & vdr_repair)) {
CCH_MARK(tdbb, &window);
/* If this is the last pointer page in the relation, we're done */
if (page->ppg_header.pag_flags & ppg_eof) {
CCH_RELEASE(tdbb, &window);
/* Make sure the "next" pointer agrees with the pages relation */
if (++sequence >= static_cast<int>(vector->count()) ||
(page->ppg_next && page->ppg_next != (*vector)[sequence]))
CCH_RELEASE(tdbb, &window);
return corrupt(tdbb,
VAL_P_PAGE_INCONSISTENT,
CCH_RELEASE(tdbb, &window);
static RTN walk_record(thread_db* tdbb,
USHORT length, SLONG number, bool delta_flag)
/**************************************
* w a l k _ r e c o r d
**************************************
* Functional description
**************************************/
#ifdef DEBUG_VAL_VERBOSE
if (VAL_debug_level) {
fprintf(stdout, "record: number %ld (%d/%d) ",
(USHORT) number / tdbb->getDatabase()->dbb_max_records,
(USHORT) number % tdbb->getDatabase()->dbb_max_records);
print_rhd(length, header);
if (header->rhd_flags & rhd_damaged) {
corrupt(tdbb, control, VAL_REC_DAMAGED, relation, number);
if (control && header->rhd_transaction > control->vdr_max_transaction)
corrupt(tdbb, control, VAL_REC_BAD_TID, relation, number,
header->rhd_transaction);
/* If there's a back pointer, verify that it's good */
if (header->rhd_b_page && !(header->rhd_flags & rhd_chain)) {
const RTN result = walk_chain(tdbb, control, relation, header, number);
if (result != rtn_ok)
/* If the record is a fragment, not large, or we're not interested in
chasing records, skip the record */
if (header->rhd_flags & (rhd_fragment | rhd_deleted) ||
!((header->rhd_flags & rhd_large) ||
(control && (control->vdr_flags & vdr_records))))
/* Pick up what length there is on the fragment */
const rhdf* fragment = (rhdf*) header;
if (header->rhd_flags & rhd_incomplete) {
p = (SCHAR *) fragment->rhdf_data;
end = p + length - OFFSETA(rhdf*, rhdf_data);
p = (SCHAR *) header->rhd_data;
end = p + length - OFFSETA(rhd*, rhd_data);
USHORT record_length = 0;
const char c = *p++;
/* Next, chase down fragments, if any */
SLONG page_number = fragment->rhdf_f_page;
USHORT line_number = fragment->rhdf_f_line;
USHORT flags = fragment->rhdf_flags;
data_page* page = 0;
while (flags & rhd_incomplete) {
WIN window(DB_PAGE_SPACE, -1);
fetch_page(tdbb, control, page_number, pag_data, &window, &page);
const data_page::dpg_repeat* line = &page->dpg_rpt[line_number];
if (page->dpg_relation != relation->rel_id ||
line_number >= page->dpg_count || !(length = line->dpg_length))
corrupt(tdbb, control, VAL_REC_FRAGMENT_CORRUPT, relation,
CCH_RELEASE(tdbb, &window);
fragment = (rhdf*) ((UCHAR *) page + line->dpg_offset);
#ifdef DEBUG_VAL_VERBOSE
if (VAL_debug_level) {
fprintf(stdout, "fragment: pg %d/%d ",
page_number, line_number);
print_rhd(line->dpg_length, (rhd*) fragment);
if (fragment->rhdf_flags & rhd_incomplete) {
p = (SCHAR *) fragment->rhdf_data;
end = p + line->dpg_length - OFFSETA(rhdf*, rhdf_data);
p = (SCHAR *) ((rhd*) fragment)->rhd_data;
end = p + line->dpg_length - OFFSETA(rhd*, rhd_data);
const char c = *p++;
page_number = fragment->rhdf_f_page;
line_number = fragment->rhdf_f_line;
flags = fragment->rhdf_flags;
CCH_RELEASE(tdbb, &window);
/* Check out record length and format */
const Format* format = MET_format(tdbb, relation, header->rhd_format);
if (!delta_flag && record_length != format->fmt_length)
return corrupt(tdbb, control, VAL_REC_WRONG_LENGTH, relation, number);
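/*
 * Illustrative sketch only, not part of the validation code. The debug
 * output in walk_record splits a record number into a data page sequence
 * and a slot by dividing by dbb_max_records, and walk_data_page starts
 * each page at sequence * dbb_max_records. The standalone functions below
 * show the two directions of that mapping. It is an assumption here that
 * the number advances by one per slot, since the increment is not visible
 * in the code above. RecordAddress, toRecordNumber and fromRecordNumber
 * are hypothetical names chosen for the example.
 *
```cpp
#include <cassert>

struct RecordAddress
{
    long pageSequence;  // data page sequence within the relation
    long slot;          // line (slot) index on that data page
};

// Combine a page sequence and a slot into a single record number.
long toRecordNumber(const RecordAddress& addr, long maxRecords)
{
    return addr.pageSequence * maxRecords + addr.slot;
}

// Split a record number back into page sequence and slot.
RecordAddress fromRecordNumber(long number, long maxRecords)
{
    return RecordAddress{ number / maxRecords, number % maxRecords };
}
```
 *
 * With 50 records per page, page sequence 3 and slot 7 give record
 * number 157, and splitting 157 returns the same pair.
 */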
static RTN walk_relation(thread_db* tdbb, vdr* control, jrd_rel* relation)
/**************************************
* w a l k _ r e l a t i o n
**************************************
* Functional description
* Walk all pages associated with a given relation.
**************************************/
// If relation hasn't been scanned, do so now
if (!(relation->rel_flags & REL_scanned) ||
(relation->rel_flags & REL_being_scanned))
MET_scan_relation(tdbb, relation);
// skip deleted relations
if (relation->rel_flags & (REL_deleted | REL_deleting)) {
#ifdef DEBUG_VAL_VERBOSE
if (VAL_debug_level)
fprintf(stdout, "walk_relation: id %d Format %d %s %s\n",
relation->rel_id, relation->rel_current_fmt,
relation->rel_name.c_str(), relation->rel_owner_name.c_str());
/* If it's a view, external file or virtual table, skip this */
if (relation->rel_view_rse || relation->rel_file || relation->isVirtual()) {
/* Walk pointer and selected data pages associated with relation */
control->vdr_rel_backversion_counter = 0;
control->vdr_rel_chain_counter = 0;
RecordBitmap::reset(control->vdr_rel_records);
for (SLONG sequence = 0; true; sequence++) {
const RTN result = walk_pointer_page(tdbb, control, relation, sequence);
if (result == rtn_eof) {
if (result != rtn_ok) {
// Walk indices for the relation
walk_root(tdbb, control, relation);
// See if the counts of backversions match
if (control && (control->vdr_flags & vdr_records) &&
(control->vdr_rel_backversion_counter !=
control->vdr_rel_chain_counter))
return corrupt(tdbb,
VAL_REL_CHAIN_ORPHANS,
control->vdr_rel_backversion_counter - control->vdr_rel_chain_counter,
control->vdr_rel_chain_counter);
catch (const Firebird::Exception&) {
const char* msg = relation->rel_name.length() > 0 ?
"bugcheck during scan of table %d (%s)" :
"bugcheck during scan of table %d";
gds__log(msg, relation->rel_id, relation->rel_name.c_str());
#ifdef DEBUG_VAL_VERBOSE
if (VAL_debug_level)
SNPRINTF(s, sizeof(s), msg, relation->rel_id, relation->rel_name.c_str());
fprintf(stdout, "LOG:\t%s\n", s);
static RTN walk_root(thread_db* tdbb, vdr* control, jrd_rel* relation)
/**************************************
**************************************
* Functional description
* Walk index root page for a relation as well as any indices.
**************************************/
/* If the relation has an index root, walk it */
RelationPages* relPages = relation->getBasePages();
if (!relPages->rel_index_root) {
return corrupt(tdbb, control, VAL_INDEX_ROOT_MISSING, relation);
index_root_page* page = 0;
WIN window(DB_PAGE_SPACE, -1);
fetch_page(tdbb, control, relPages->rel_index_root, pag_root, &window,
for (USHORT i = 0; i < page->irt_count; i++) {
walk_index(tdbb, control, relation, *page, i);
CCH_RELEASE(tdbb, &window);
static RTN walk_tip(thread_db* tdbb, vdr* control, SLONG transaction)
/**************************************
**************************************
* Functional description
* Walk transaction inventory pages.
**************************************/
Database* dbb = tdbb->getDatabase();
const vcl* vector = dbb->dbb_t_pages;
return corrupt(tdbb, control, VAL_TIP_LOST, 0);
tx_inv_page* page = 0;
const ULONG pages = transaction / dbb->dbb_page_manager.transPerTIP;
for (ULONG sequence = 0; sequence <= pages; sequence++) {
if (sequence >= vector->count() || !(*vector)[sequence]) {
corrupt(tdbb, control, VAL_TIP_LOST_SEQUENCE, 0, sequence);
if (!(control->vdr_flags & vdr_repair))
TRA_extend_tip(tdbb, sequence, 0);
vector = dbb->dbb_t_pages;
WIN window(DB_PAGE_SPACE, -1);
(*vector)[sequence],
#ifdef DEBUG_VAL_VERBOSE
if (VAL_debug_level)
fprintf(stdout, "walk_tip: page %d next %d\n",
(*vector)[sequence], page->tip_next);
if (page->tip_next && page->tip_next != (*vector)[sequence + 1])
corrupt(tdbb, control, VAL_TIP_CONFUSED, 0, sequence);
CCH_RELEASE(tdbb, &window);
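/*
 * Illustrative sketch only, not part of the validation code. walk_tip
 * visits transaction inventory pages 0 through
 * transaction / transPerTIP, inclusive. The standalone function below
 * counts how many pages that is. tipPagesToVisit is a hypothetical name,
 * and the per-page figure used with it in any test is a made-up value,
 * not a real engine constant.
 *
```cpp
#include <cassert>

// Number of TIP pages the loop visits for a given next-transaction number:
// sequences 0 through transaction / transPerTip, inclusive.
unsigned long tipPagesToVisit(unsigned long transaction, unsigned long transPerTip)
{
    return transaction / transPerTip + 1;
}
```
 *
 * With 1000 transactions per page, transaction 999 still fits on the
 * first page, while transaction 1000 requires a second one.
 */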