TESTING NODE FAILURE, ARBITRATION
---------------------------------

Crash president when he starts to run in ArbitState 1-9.

910: Crash new president after node crash

934 : Crash president in ALLOC_NODE_ID_REQ

935 : Crash master on node failure (delayed)
and skip sending GSN_COMMIT_FAILREQ to specified node

ERROR CODES FOR TESTING NODE FAILURE, GLOBAL CHECKPOINT HANDLING:
-----------------------------------------------------------------

Insert system error in master when global checkpoint is idle.

Insert system error in master after receiving GCP_PREPARE from
all nodes in the cluster.

Insert system error in master after receiving GCP_NODEFINISH from
all nodes in the cluster.

Insert system error in master after receiving GCP_SAVECONF from
all nodes in the cluster.

Insert system error in master after completing global checkpoint with
all nodes in the cluster.

Insert system error in GCP participant when receiving GCP_PREPARE.

Insert system error in GCP participant when receiving GCP_COMMIT.

Insert system error in GCP participant when receiving GCP_TCFINISHED.

Insert system error in GCP participant when receiving COPY_GCICONF.

Insert system error in GCP participant when receiving GCP_SAVEREQ.

Delay GCP_SAVEREQ by 10 secs

7165: Delay INCL_NODE_REQ in starting node yielding error in GCP_PREPARE

7030: Delay in GCP_PREPARE until node has completed a node failure
7031: Delay in GCP_PREPARE and die 3s later

7177: Delay copying of sysfileData in execCOPY_GCIREQ

7180: Crash master during master-take-over in execMASTER_LCPCONF

7183: Crash when receiving COPY_GCIREQ

7184: Crash before starting next GCP after a node failure

7185: Don't reply to COPY_GCI_REQ where reason == GCP

7193: Don't send LCP_FRAG_ORD to self, and crash when sending first

7194: Force removeNodeFromStored to complete in the middle of MASTER_LCPCONF

ERROR CODES FOR TESTING NODE FAILURE, LOCAL CHECKPOINT HANDLING:
-----------------------------------------------------------------

Insert system error in master when local checkpoint is idle.

Insert system error in master when local checkpoint is in the
state clcpStatus = CALCULATE_KEEP_GCI.

Stop local checkpoint in the state CALCULATE_KEEP_GCI.

Restart local checkpoint after stopping in CALCULATE_KEEP_GCI.

1) Error 7011 in master, wait until report of stopped.
2) Error xxxx in participant to crash it.
3) Error 7012 in master to start again.
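
The three-step sequence above could be scripted roughly as follows. This is
a minimal sketch and not part of any NDB test suite. It assumes error inserts
are injected through the management client's "<node id> ERROR <code>" command
run via "ndb_mgm -e". The helper names, the node ids, and the participant
crash code (left as xxxx in the list above) are placeholders the caller must
supply. The sketch only builds the command lines; waiting for the "stopped"
report between steps 1 and 2 is left to the operator.

```python
# Sketch only: builds ndb_mgm command lines for the stop / crash / restart
# sequence. Assumes the management client accepts "<node id> ERROR <code>".
# Function names are hypothetical helpers, not an NDB API.

def error_insert_cmd(node_id, code):
    """Return an argv list that would inject `code` into data node `node_id`."""
    return ["ndb_mgm", "-e", f"{node_id} ERROR {code}"]


def lcp_stop_crash_restart(master_id, participant_id, participant_crash_code):
    """Return the three commands in order; the caller runs them one by one."""
    return [
        # 1) stop the LCP in the master, then wait for the "stopped" report
        error_insert_cmd(master_id, 7011),
        # 2) crash the participant with a caller-chosen error code
        error_insert_cmd(participant_id, participant_crash_code),
        # 3) let the master start the LCP again
        error_insert_cmd(master_id, 7012),
    ]


if __name__ == "__main__":
    # Node ids 1 and 2 and crash code 9999 are illustrative placeholders.
    for argv in lcp_stop_crash_restart(1, 2, 9999):
        print(" ".join(argv))
```

Returning argv lists instead of executing them keeps the sketch free of side
effects; a caller could pass each list to subprocess.run once the previous
step has been confirmed.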

Insert system error in master when local checkpoint is in the
state clcpStatus = COPY_GCI before sending COPY_GCIREQ.

Insert system error in master when local checkpoint is in the
state clcpStatus = TC_CLOPSIZE before sending TC_CLOPSIZEREQ.

Insert system error in master when local checkpoint is in the
state clcpStatus = START_LCP_ROUND before sending START_LCP_ROUND.

Insert system error in master when local checkpoint is in the
state clcpStatus = START_LCP_ROUND after receiving LCP_REPORT.

Insert system error in master when local checkpoint is in the
state clcpStatus = TAB_COMPLETED.

Insert system error in master when local checkpoint is in the
state clcpStatus = TAB_SAVED before sending DIH_LCPCOMPLETE.

Insert system error in master when local checkpoint is in the
state clcpStatus = IDLE before sending CONTINUEB(ZCHECK_TC_COUNTER).

Insert system error in local checkpoint participant at reception of

Don't send any LCP_FRAG_ORD(last=true)
And crash when all have "not" been sent

8000: Crash participant when receiving TCGETOPSIZEREQ
8001: Crash participant when receiving TC_CLOPSIZEREQ
5010: Crash any when receiving LCP_FRAGORD

7021: Crash in master when receiving START_LCP_REQ
7022: Crash in !master when receiving START_LCP_REQ

7023: Crash in master when sending START_LCP_CONF
7024: Crash in !master when sending START_LCP_CONF

7025: Crash in master when receiving LCP_FRAG_REP
7016: Crash in !master when receiving LCP_FRAG_REP

7026: Crash in master when changing state to LCP_TAB_COMPLETED
7017: Crash in !master when changing state to LCP_TAB_COMPLETED

7027: Crash in master when changing state to LCP_TAB_SAVED
7018: Crash in !master when changing state to LCP_TAB_SAVED

7191: Crash when receiving LCP_COMPLETE_REP
7192: Crash in setLcpActiveStatusStart - when dead node missed two LCPs

ERROR CODES FOR TESTING NODE FAILURE, FAILURE IN COPY FRAGMENT PROCESS:
-----------------------------------------------------------------------

Insert node failure in starting node when receiving a tuple copied from the copy node
as part of copy fragment process.

Insert node failure when receiving ABORT signal.

Insert node failure handling when receiving COMMITREQ.

Insert node failure handling when receiving COMPLETEREQ.

Insert node failure handling when receiving ABORTREQ.

As 5002, but with specified table (see DumpStateOrd)

These error codes can be combined with error codes for testing time-out
handling in DBTC to ensure that node failures are also well handled in
time-out handling. They can also be used to test multiple node failure

5045: Crash in PREPARE_COPY_FRAG_REQ
5046: Crash if LQHKEYREQ (NrCopy) comes when frag-state is incorrect

ERROR CODES FOR TESTING TIME-OUT HANDLING IN DBLQH
--------------------------------------------------

Delay execution of COMMIT signal 2 seconds to generate time-out.

First delay execution of COMMIT signal 2 seconds to generate COMMITREQ.
Delay execution of COMMITREQ signal 2 seconds to generate time-out.

Delay execution of COMPLETE signal 2 seconds to generate time-out.

First delay execution of COMPLETE signal 2 seconds to generate COMPLETEREQ.
Delay execution of COMPLETEREQ signal 2 seconds to generate time-out.

Delay execution of ABORT signal 2 seconds to generate time-out.

5016: (ABORTREQ only as part of take-over)
Delay execution of ABORTREQ signal 2 seconds to generate time-out.

5031: lqhKeyRef, ZNO_TC_CONNECT_ERROR
5032: lqhKeyRef, ZTEMPORARY_REDO_LOG_FAILURE
5033: lqhKeyRef, ZTAIL_PROBLEM_IN_LOG_ERROR

5034: Don't pop scan queue

5035: Delay ACC_CONTOPCONT

5038: Drop LQHKEYREQ + set 5039
5039: Drop ABORT + set 5003
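
Codes 5038 and 5039 form a chain: each one drops a signal and then arms the
next code. The toy model below illustrates only that chaining pattern. The
class and method names are hypothetical, it is not DBLQH code, and whatever
5003 does once armed is deliberately not modelled.

```python
# Toy model of chained error inserts (hypothetical names, not DBLQH code).

class ToyBlock:
    def __init__(self):
        self.error_insert = 0   # currently armed error-insert code, 0 = none
        self.handled = []       # signals that were processed normally

    def receive(self, signal):
        """Process one signal; return False if an error insert dropped it."""
        if signal == "LQHKEYREQ" and self.error_insert == 5038:
            self.error_insert = 5039    # drop the request, arm the next step
            return False
        if signal == "ABORT" and self.error_insert == 5039:
            self.error_insert = 5003    # drop the abort, arm 5003
            return False
        # No matching error insert: handle the signal normally.
        self.handled.append(signal)
        return True


if __name__ == "__main__":
    block = ToyBlock()
    block.error_insert = 5038
    for sig in ["LQHKEYREQ", "ABORT", "ABORT"]:
        outcome = "handled" if block.receive(sig) else "dropped"
        print(sig, outcome, "-> armed code", block.error_insert)
```

Arming a single code is therefore enough to walk a transaction through a
dropped request followed by a dropped abort, without the tester having to
time a second injection.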

8048: Make TC not choose own node for simple/dirty read
5041: Crash if receiving simple read from other TC on different node

8050: Send TCKEYREF if operation is non-local

5100,5101: Drop ABORT req in primary replica
Crash on "next" ABORT

ERROR CODES FOR TESTING TIME-OUT HANDLING IN DBTC
-------------------------------------------------

Delay execution of ABORTED signal 2 seconds to generate time-out.

Delay execution of COMMITTED signal 2 seconds to generate time-out.

Delay execution of COMMITTED signal 2 seconds to generate COMMITCONF.
Delay execution of COMMITCONF signal 2 seconds to generate time-out.

Delay execution of COMPLETED signal 2 seconds to generate time-out.

Delay execution of COMPLETED signal 2 seconds to generate COMPLETECONF.
Delay execution of COMPLETECONF signal 2 seconds to generate time-out.

8045: (ABORTCONF only as part of take-over)
Delay execution of ABORTCONF signal 2 seconds to generate time-out.

8050: Send ZABORT_TIMEOUT_BREAK delayed

8053: Crash in timeOutFoundLab, state CS_WAIT_COMMIT_CONF

5048: Crash in execCOMMIT
5049: SET_ERROR_INSERT_VALUE(5048)

ERROR CODES FOR TESTING TIME-OUT HANDLING IN DBTC
-------------------------------------------------

8003: Throw away a LQHKEYCONF in state STARTED
8004: Throw away a LQHKEYCONF in state RECEIVING
8005: Throw away a LQHKEYCONF in state REC_COMMITTING
8006: Throw away a LQHKEYCONF in state START_COMMITTING

8007: Ignore send of LQHKEYREQ in state STARTED
8008: Ignore send of LQHKEYREQ in state START_COMMITTING

8009: Ignore send of LQHKEYREQ+ATTRINFO in state STARTED
8010: Ignore send of LQHKEYREQ+ATTRINFO in state START_COMMITTING

8011: Abort at send of CONTINUEB(ZSEND_ATTRINFO) in state STARTED
8012: Abort at send of CONTINUEB(ZSEND_ATTRINFO) in state START_COMMITTING

8013: Ignore send of CONTINUEB(ZSEND_COMPLETE_LOOP) (should crash eventually)
8014: Ignore send of CONTINUEB(ZSEND_COMMIT_LOOP) (should crash eventually)

8015: Ignore ATTRINFO signal in DBTC in state REC_COMMITTING
8016: Ignore ATTRINFO signal in DBTC in state RECEIVING

8017: Return immediately from DIVERIFYCONF (should crash eventually)
8018: Throw away a COMMITTED signal
8019: Throw away a COMPLETED signal

TESTING TAKE-OVER FUNCTIONALITY IN DBTC
---------------------------------------

8002: Crash when sending LQHKEYREQ
8029: Crash when receiving LQHKEYCONF
8030: Crash when receiving COMMITTED
8031: Crash when receiving COMPLETED
8020: Crash when all COMMITTED has arrived
8021: Crash when all COMPLETED has arrived
8022: Crash when all LQHKEYCONF has arrived

COMBINATION OF TIME-OUT + CRASH
-------------------------------

8023 (use 8024): Ignore LQHKEYCONF and crash when ABORTED signal arrives by setting 8024
8025 (use 8026): Ignore COMMITTED and crash when COMMITCONF signal arrives by setting 8026
8027 (use 8028): Ignore COMPLETED and crash when COMPLETECONF signal arrives by setting 8028

8032: No free TC records any more

8037 : Invalid schema version in TCINDXREQ

8038 : Simulate API disconnect just after SCAN_TAB_REQ

8057 : Send only 1 COMMIT per timeslice

8052 : Simulate failure of TransactionBufferMemory allocation for OI lookup

8051 : Simulate failure of allocation for saveINDXKEYINFO

9000 Set RestartOnErrorInsert to restart -n
9998 Enter endless loop (trigger watchdog)
9999 Crash system immediately

Test Crashes in handling node restarts
--------------------------------------

7121: Crash after receiving permission to start (START_PERMCONF) in starting node.

7122: Crash master when receiving request for permission to start (START_PERMREQ).
7123: Crash any non-starting node when receiving information about a starting node

7124: Respond negatively on an info request (START_INFOREQ)
7125: Stop an invalidate Node LCP process in the middle to test if START_INFOREQ
stopped by long-running processes are handled in a correct manner.
7126: Allow node restarts for all nodes (used in conjunction with 7025)
7127: Crash when receiving an INCL_NODEREQ message.
7128: Crash master after receiving all INCL_NODECONF from all nodes
7129: Crash master after receiving all INCL_NODECONF from all nodes and releasing
the lock on the dictionary
7130: Crash starting node after receiving START_MECONF
7131: Crash when receiving START_COPYREQ in master node
7132: Crash when receiving START_COPYCONF in starting node

7170: Crash when receiving START_PERMREF (InitialStartRequired)

8039: DBTC delay INCL_NODECONF and kill starting node

7174: Crash starting node before sending DICT_LOCK_REQ
7175: Master sends one fake START_PERMREF (ZNODE_ALREADY_STARTING_ERROR)
7176: Slave NR pretends master does not support DICT lock (rolling upgrade)

6000 Crash during NR when receiving DICTSTARTREQ
6001 Crash during NR when receiving SCHEMA_INFO
6002 Crash during NR soon after sending GET_TABINFO_REQ

5026 Crash when receiving COPY_ACTIVEREQ
5027 Crash when receiving STAT_RECREQ

5043 Crash starting node, when scan is finished on primary replica

Test Crashes in handling take over
----------------------------------

7133: Crash when receiving START_TOREQ
7134: Crash master after receiving all START_TOCONF
7135: Crash master after copying table 0 to starting node
7136: Crash master after completing copy of tables
7137: Crash master after adding a fragment before copying it
7138: Crash when receiving CREATE_FRAGREQ in prepare phase
7139: Crash when receiving CREATE_FRAGREQ in commit phase
7140: Crash master when receiving all CREATE_FRAGCONF in prepare phase
7141: Crash master when receiving all CREATE_FRAGCONF in commit phase
7142: Crash master when receiving COPY_FRAGCONF
7143: Crash master when receiving COPY_ACTIVECONF
7144: Crash when receiving END_TOREQ
7145: Crash master after receiving first END_TOCONF
7146: Crash master after receiving all END_TOCONF
7147: Crash master after receiving first START_TOCONF
7148: Crash master after receiving first CREATE_FRAGCONF
7152: Crash master after receiving first UPDATE_TOCONF
7153: Crash master after receiving all UPDATE_TOCONF
7154: Crash when receiving UPDATE_TOREQ
7155: Crash master when completing writing start take over info
7156: Crash master when completing writing end take over info

Test failures in various states in take over functionality
----------------------------------------------------------
7157: Block take over at start take over
7158: Block take over at sending of START_TOREQ
7159: Block take over at selecting next fragment
7160: Block take over at creating new fragment
7161: Block take over at sending of CREATE_FRAGREQ in prepare phase
7162: Block take over at sending of CREATE_FRAGREQ in commit phase
7163: Block take over at sending of UPDATE_TOREQ at end of copy frag
7164: Block take over at sending of END_TOREQ
7169: Block take over at sending of UPDATE_TOREQ at end of copy

5008: Crash at reception of EMPTY_LCPREQ (at master take over after NF)
5009: Crash at sending of EMPTY_LCPCONF (at master take over after NF)

Test Crashes in Handling Graceful Shutdown
------------------------------------------
7065: Crash when receiving STOP_PERMREQ in master
7066: Crash when receiving STOP_PERMREQ in slave
7067: Crash when receiving DIH_SWITCH_REPLICA_REQ
7068: Crash when receiving DIH_SWITCH_REPLICA_CONF

------------------------------------------
10001: Crash on NODE_FAILREP in Backup coordinator
10002: Crash on NODE_FAILREP when coordinatorTakeOver
10003: Crash on PREP_CREATE_TRIG_{CONF/REF} (only coordinator)
10004: Crash on START_BACKUP_{CONF/REF} (only coordinator)
10005: Crash on CREATE_TRIG_{CONF/REF} (only coordinator)
10006: Crash on WAIT_GCP_REF (only coordinator)
10007: Crash on WAIT_GCP_CONF (only coordinator)
10008: Crash on WAIT_GCP_CONF during start of backup (only coordinator)
10009: Crash on WAIT_GCP_CONF during stop of backup (only coordinator)
10010: Crash on BACKUP_FRAGMENT_CONF (only coordinator)
10011: Crash on BACKUP_FRAGMENT_REF (only coordinator)
10012: Crash on DROP_TRIG_{CONF/REF} (only coordinator)
10013: Crash on STOP_BACKUP_{CONF/REF} (only coordinator)
10014: Crash on DEFINE_BACKUP_REQ (participant)
10015: Crash on START_BACKUP_REQ (participant)
10016: Crash on BACKUP_FRAGMENT_REQ (participant)
10017: Crash on SCAN_FRAGCONF (participant)
10018: Crash on FSAPPENDCONF (participant)
10019: Crash on TRIG_ATTRINFO (participant)
10020: Crash on STOP_BACKUP_REQ (participant)
10021: Crash on NODE_FAILREP in participant not becoming coordinator

10022: Fake no backup records at DEFINE_BACKUP_REQ (participant)
10023: Abort backup by error at reception of UTIL_SEQUENCE_CONF (code 300)
10024: Abort backup by error at reception of DEFINE_BACKUP_CONF (code 301)
10025: Abort backup by error at reception of CREATE_TRIG_CONF last (code 302)
10026: Abort backup by error at reception of START_BACKUP_CONF (code 303)
10027: Abort backup by error at reception of DEFINE_BACKUP_REQ at master (code 304)
10028: Abort backup by error at reception of BACKUP_FRAGMENT_CONF at master (code 305)
10029: Abort backup by error at reception of FSAPPENDCONF in slave (FileOrScanError = 5)
10030: Simulate buffer full from trigger execution => abort backup
10031: Error 331 for dictCommitTableMutex_locked
10032: backup checkscan
10033: backup checkscan
10034: define backup reply error
10035: Fail to allocate buffers

10036: Halt backup for table >= 2
10037: Resume backup (from 10036)

11001: Send UTIL_SEQUENCE_REF (in master)

5028: Crash when receiving LQHKEYREQ (in non-master)

7173: Create table failed due to not sufficient number of fragment or

3001: Fail create 1st fragment
4007 12001: Fail create 1st fragment
4008 12002: Fail create 2nd fragment
4009 12003: Fail create 1st attribute in 1st fragment
4010 12004: Fail create last attribute in 1st fragment
4011 12005: Fail create 1st attribute in 2nd fragment
4012 12006: Fail create last attribute in 2nd fragment

4001: Crash on REL_TABMEMREQ in TUP
4002: Crash on DROP_TABFILEREQ in TUP
4003: Fail next trigger create in TUP
4004: Fail next trigger drop in TUP
8033: Fail next trigger create in TC
8034: Fail next index create in TC
8035: Fail next trigger drop in TC
8036: Fail next index drop in TC
6006: Crash participant in create index

4013: verify TUP tab descr before and after next DROP TABLE

5020: Force system to read pages from file when executing prepare operation record
3000: Delay writing of datapages in ACC when LCP is started
4000: Delay writing of datapages in TUP when LCP is started
7070: Set TimeBetweenLcp to min value
7071: Set TimeBetweenLcp to max value
7072: Split START_FRAGREQ into several log nodes
7073: Don't include own node in START_FRAGREQ

5021: Crash when receiving SCAN_NEXTREQ if sender is own node
5022: Crash when receiving SCAN_NEXTREQ if sender is NOT own node
5023: Drop SCAN_NEXTREQ if sender is own node
5024: Drop SCAN_NEXTREQ if sender is NOT own node
5025: Delay SCAN_NEXTREQ 1 second if sender is NOT own node
5030: Drop all SCAN_NEXTREQ until node is shut down with SYSTEM_ERROR
because of scan fragment timeout

Test routing of signals:
-----------------------
4006: Turn on routing of TRANSID_AI signals from TUP
5029: Turn on routing of KEYINFO20 signals from LQH

12007: Make next alloc node fail with no memory error

6003 Crash in participant @ CreateTabReq::Prepare
6004 Crash in participant @ CreateTabReq::Commit
6005 Crash in participant @ CreateTabReq::CreateDrop
6007 Fail on readTableFile for READ_TAB_FILE1 (28770)

4014 - handleInsert - Out of undo buffer
4015 - handleInsert - Out of log space
4016 - handleInsert - AI Inconsistency
4017 - handleInsert - Out of memory
4018 - handleInsert - Null check error
4019 - handleInsert - Alloc rowid error
4020 - handleInsert - Size change error
4021 - handleInsert - Out of disk space

4022 - addTuxEntries - fail before add of first entry
4023 - addTuxEntries - fail add of last entry (the entry for last index)

4025: Fail all inserts with out of memory
4026: Fail one insert with oom
4027: Fail inserts randomly with oom
4028: Fail one random insert with oom
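
One way to read codes 4025 to 4028 is as four failure policies: always fail,
fail once, fail at random, and fail once at a random point. The sketch below
models that reading only. It is an interpretation, not TUP code: the class
name, the fail_prob parameter, and the idea that the one-shot codes disarm
themselves after firing are all assumptions made for illustration.

```python
import random


class OomInjector:
    """Toy failure policy for codes 4025-4028 (an interpretation, not TUP code)."""

    def __init__(self, code, fail_prob=0.5, seed=None):
        self.code = code
        self.fail_prob = fail_prob        # assumed probability for the random modes
        self.rng = random.Random(seed)    # private RNG so runs can be reproduced

    def should_fail(self):
        """Return True if the next insert should fail with out of memory."""
        if self.code == 4025:             # fail all inserts
            return True
        if self.code == 4026:             # fail one insert, then disarm
            self.code = 0
            return True
        if self.code == 4027:             # fail inserts at random
            return self.rng.random() < self.fail_prob
        if self.code == 4028:             # fail one insert at a random point
            if self.rng.random() < self.fail_prob:
                self.code = 0             # disarm after the single failure
                return True
            return False
        return False                      # no error insert armed


if __name__ == "__main__":
    injector = OomInjector(4026)
    print([injector.should_fail() for _ in range(3)])
```

With code 4026 armed, the first call reports a failure and later calls do not,
which is the behaviour a test would rely on to check that a single failed
insert is rolled back cleanly while subsequent inserts succeed.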

1000: Crash insertion on SystemError::CopyFragRef
1001: Delay sending NODE_FAILREP (to own node), until error is cleared

15000: Fail to create log file

16000: Fail to create data file