556
|
|
Bug #1079700: Issues with renaming/rotating tables during the backup stage
The problem was in the way XtraBackup handled DDL on individual tablespaces during the backup stage. It first created a list of tablespaces to copy, represented by fil_system, but when starting the actual tablespace copy operation later, it opened tablespace files by name. If a tablespace file could not be opened, XtraBackup assumed the tablespace had been removed after the fil_system list was created and ignored the tablespace. This naturally only worked for cases when the tablespace got dropped. A renamed tablespace would be missing from the resulting backup, and tablespace rotations resulted in a missing tablespace and a duplicate copy of another tablespace participating in the rotation.
The idea of the fix is to make sure that once a tablespace is added to fil_system, the underlying file is copied to the backup with the same name and space ID, regardless of what operations have been performed on the tablespace/file during the backup procedure.
The only way to achieve that is to reuse the file handles created when opening tablespaces and adding them to fil_system, i.e. never attempt to access tablespaces by name, and rely on the fact that an open file handle will always point to the same inode, even if the file is unlink()ed or rename()d along the way. In other words, exclude the case when we are going to copy a tablespace, but the file no longer exists.
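The inode guarantee mentioned above can be checked in isolation with a few POSIX calls. The following is a minimal sketch, not XtraBackup code: the function name, the temporary path and the payload are all hypothetical, and it assumes a POSIX file system where rename() and unlink() only touch directory entries.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>   /* only needed for the check in the usage note */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical demo: returns 0 if data written before rename()/unlink()
   is still readable through the original descriptor, -1 otherwise. */
int handle_survives_rename_and_unlink(void)
{
    char old_name[] = "/tmp/xb_demo_XXXXXX";
    char new_name[sizeof(old_name) + 16];
    const char payload[] = "tablespace page";
    char buf[sizeof(payload)];
    int ok = 0;

    /* Open the file once; this descriptor plays the role of node->handle. */
    int fd = mkstemp(old_name);
    if (fd < 0)
        return -1;
    snprintf(new_name, sizeof(new_name), "%s.renamed", old_name);

    if (write(fd, payload, sizeof(payload)) == (ssize_t) sizeof(payload)
        && rename(old_name, new_name) == 0   /* simulates RENAME TABLE */
        && unlink(new_name) == 0) {          /* simulates DROP TABLE */
        /* Access by the original name is expected to fail now... */
        int by_name = open(old_name, O_RDONLY);
        if (by_name >= 0) {
            close(by_name);
        } else if (pread(fd, buf, sizeof(buf), 0) == (ssize_t) sizeof(buf)) {
            /* ...while the open descriptor still reaches the same inode. */
            ok = memcmp(buf, payload, sizeof(payload)) == 0;
        }
    }

    /* Best-effort cleanup for the failure paths; harmless if already gone. */
    unlink(old_name);
    unlink(new_name);
    close(fd);
    return ok ? 0 : -1;
}
```

A caller could verify the behaviour with assert(handle_survives_rename_and_unlink() == 0); on a local POSIX file system.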
This requires changes to fil_load_single_table_tablespace() to not close the file handle after adding a space/node to fil_system, but keep the node open and assign the handle to node->handle. We could use fil_node_open_file() for that, but that function creates another handle and opens the tablespace file by name, which would still leave some room for a race condition during the time when fil_system is populated. This also requires XtraBackup to close the node correctly, basically to comply with various invariants enforced by fil_validate() in debug builds. This part is implemented in XtraBackup in xb_fil_node_close_file().
We also want to reuse file handles in fil_load_single_table_tablespace() only at the backup stage. Historically, XtraBackup patches used recv_recovery_on to detect whether we are currently in the 'backup' or 'recovery' mode. That doesn't always work reliably. For example, we may initialize fil_system before we start recovery to apply an incremental backup. In that case recovery has not started yet, but we are not in the backup mode either. To circumvent that, the patch introduces another global variable, srv_backup_mode, which is set by XtraBackup only in xtrabackup_backup_func(). This patch also changes remote tablespaces support in innodb56.patch to use srv_backup_mode instead of recv_recovery_on.
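The handle-reuse rule from the two paragraphs above can be illustrated roughly as follows. This is a simplified sketch under stated assumptions, not the actual patch: struct node stands in for fil_node_t, the backup_mode variable stands in for srv_backup_mode, and load_single_table(), close_node() and demo_node_open_after_load() are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>   /* only needed for the checks in the usage note */
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for the srv_backup_mode global. */
static bool backup_mode = false;

/* Hypothetical, heavily simplified stand-in for fil_node_t. */
struct node {
    int  handle;   /* valid only while is_open is true */
    bool is_open;
};

/* Opens the file once to register it. In backup mode the descriptor is
   kept in the node so the later copy never reopens the file by name;
   otherwise it is closed right away, as the unpatched loader would do. */
int load_single_table(const char *path, struct node *n)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* A real loader would read the space ID from the first page here. */
    if (backup_mode) {
        n->handle = fd;
        n->is_open = true;
    } else {
        close(fd);
        n->handle = -1;
        n->is_open = false;
    }
    return 0;
}

/* Counterpart of the explicit close step: release a kept descriptor. */
void close_node(struct node *n)
{
    if (n->is_open) {
        close(n->handle);
        n->handle = -1;
        n->is_open = false;
    }
}

/* Demo: returns 1 if the node stays open after loading in the given
   mode, 0 if it does not, -1 on error. */
int demo_node_open_after_load(bool in_backup_mode)
{
    char path[] = "/tmp/xb_node_XXXXXX";
    int tmp = mkstemp(path);
    if (tmp < 0)
        return -1;
    close(tmp);

    backup_mode = in_backup_mode;
    struct node n = { -1, false };
    int rc = load_single_table(path, &n);
    unlink(path);
    if (rc != 0)
        return -1;

    int result = n.is_open ? 1 : 0;
    close_node(&n);
    return result;
}
```

With this sketch, assert(demo_node_open_after_load(true) == 1); and assert(demo_node_open_after_load(false) == 0); both hold: the descriptor is retained only when the backup flag is set.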
Another change necessitated by the patch is ignoring deleted tablespaces on recovery, which has been implemented in XB patches to recv_apply_hashed_log_recs(). After MLOG_FILE_DELETE is replayed on recovery and the underlying file is deleted, possible updates to the same tablespace done before deleting the tablespace are left in the redo log. They will be ignored by the recovery code. This can never cause any problems for the server, because file creation/removal is done immediately, so on recovery InnoDB detects missing tablespace files and does not add the corresponding log records to the hash table (see the first lines in recv_add_to_hash_table). The situation is different with XtraBackup: the underlying file will be present when recovery starts, but then gets deleted when MLOG_FILE_DELETE is replayed, i.e. after all log records from the current batch have been added to the hash table. As a result, log records corresponding to the deleted tablespace are ignored, but are still left in the hash table, so recv_apply_hashed_log_recs() hangs forever waiting for recv_sys->n_addrs to become zero. This was also possible with XtraBackup before this fix, i.e. when a tablespace was removed after it had been copied by XtraBackup. This fix just makes this condition more likely to occur, as we always copy all tablespaces rather than ignore those removed before the copy operation is started.
To fix the above case, we also check for deleted tablespaces in recv_apply_hashed_log_recs(). Log records corresponding to previously deleted tablespaces are marked as processed and recv_sys->n_addrs is decremented accordingly, so that no spurious unprocessed log records are left in the hash table.
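The bookkeeping described above can be sketched as follows. This is an illustration only: the structures are simplified, hypothetical stand-ins for InnoDB's hashed log record addresses and recv_sys, a flat array replaces the hash table, and the list of deleted space IDs is passed in explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified stand-ins for the recovery structures. */
enum addr_state { ADDR_NOT_PROCESSED, ADDR_PROCESSED };

struct log_addr {
    unsigned        space_id;
    enum addr_state state;
};

struct recv_state {
    struct log_addr *addrs;
    size_t           n_total;
    size_t           n_addrs;   /* records still waiting to be applied */
};

static bool space_is_deleted(const unsigned *deleted, size_t n_deleted,
                             unsigned space_id)
{
    for (size_t i = 0; i < n_deleted; i++)
        if (deleted[i] == space_id)
            return true;
    return false;
}

/* Marks records of deleted tablespaces as processed and decrements the
   pending counter, so a loop waiting for n_addrs == 0 can terminate.
   Returns the number of records skipped. */
size_t skip_deleted_spaces(struct recv_state *rs,
                           const unsigned *deleted, size_t n_deleted)
{
    size_t skipped = 0;
    for (size_t i = 0; i < rs->n_total; i++) {
        struct log_addr *a = &rs->addrs[i];
        if (a->state == ADDR_NOT_PROCESSED
            && space_is_deleted(deleted, n_deleted, a->space_id)) {
            a->state = ADDR_PROCESSED;
            assert(rs->n_addrs > 0);   /* counter must never underflow */
            rs->n_addrs--;
            skipped++;
        }
    }
    return skipped;
}

/* Demo: three pending records, two of them for deleted space 5.
   Returns the number of records still pending afterwards. */
size_t demo_remaining_after_skip(void)
{
    struct log_addr addrs[] = {
        { 5, ADDR_NOT_PROCESSED },
        { 7, ADDR_NOT_PROCESSED },
        { 5, ADDR_NOT_PROCESSED },
    };
    struct recv_state rs = { addrs, 3, 3 };
    const unsigned deleted[] = { 5 };

    skip_deleted_spaces(&rs, deleted, 1);
    return rs.n_addrs;
}
```

In the demo only the record for space 7 remains pending, so applying it would bring the counter to zero instead of leaving two records that nothing will ever process.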
Finally, now that we reuse fil_system file handles again, bug #870119 needs another fix. I.e. the number of open file handles grows to the number of tablespaces we want to copy, so we want to prevent InnoDB LRU policies from kicking in and closing/reusing file handles. There are multiple ways to achieve that. Patching InnoDB code to disable fil_system LRU and file closing policies appeared to be too risky. The easiest one is to set the allowed number of open InnoDB files to some very large value unconditionally (i.e. override innodb_open_files with the maximum possible value, LONG_MAX). InnoDB does not allocate any resources for each srv_max_n_open_files increment. It's just the maximum possible LRU list length, so this change does not incur any additional resource consumption.
This revision also extends bug722638.sh (since there's 95% overlap between that one and the test case for this bug), and renames bug722638.sh to ddl.sh to better reflect the contents. It also backports record_db_state() / verify_db_state() from the 2.1 test suite, because messing with individual table checksums and checksum_table looks rather cumbersome with a higher number of tables in the test.
|
Alexey Kopytov |
10 years ago
|
|
|
555
|
|
|
jenkins at percona |
10 years ago
|
|
|
554
|
|
|
Sergei Glushchenko |
10 years ago
|
|
|
553
|
|
|
Alexey Kopytov |
10 years ago
|
|
|
552
|
|
|
Alexey Kopytov |
10 years ago
|
|
|
551
|
|
|
jenkins at percona |
10 years ago
|
|
|
550
|
|
|
jenkins at percona |
10 years ago
|
|
|
549
|
|
|
Alexey Kopytov |
10 years ago
|
|
|
548
|
|
|
Alexey Kopytov |
10 years ago
|
|
|
547
|
|
|
Alexey Kopytov |
10 years ago
|
|
|
546
|
|
|
Alexey Kopytov |
10 years ago
|
|
|
545
|
|
|
Alexey Kopytov |
10 years ago
|
|
|
544
|
|
|
Alexey Kopytov |
10 years ago
|
|
|
543
|
|
|
Alexey Kopytov |
10 years ago
|
|
|
542
|
|
|
Alexey Kopytov |
10 years ago
|
|
|
541
|
|
|
Alexey Kopytov |
10 years ago
|
|
|
540
|
|
|
Alexey Kopytov |
10 years ago
|
|
|
539
|
|
|
Alexey Kopytov |
10 years ago
|
|
|
538
|
|
|
Alexey Kopytov |
10 years ago
|
|
|
537
|
|
|
Alexey Kopytov |
10 years ago
|
|
|