14
pt-table-sync - Synchronize MySQL table data efficiently.
22
Usage: pt-table-sync [OPTION...] DSN [DSN...]
24
pt-table-sync synchronizes data efficiently between MySQL tables.
26
This tool changes data, so for maximum safety, you should back up your data
27
before you use it. When synchronizing a server that is a replication slave with
28
the --replicate or --sync-to-master methods, it \ **always**\ makes the changes on
29
the replication master, \ **never**\ the replication slave directly. This is in
30
general the only safe way to bring a replica back in sync with its master;
31
changes to the replica are usually the source of the problems in the first
32
place. However, the changes it makes on the master should be no-op changes that
33
set the data to their current values, and actually affect only the replica.
34
Please read the detailed documentation that follows to learn more about this.
36
Sync db.tbl on host1 to host2:
41
pt-table-sync --execute h=host1,D=db,t=tbl h=host2
44
Sync all tables on host1 to host2 and host3:
49
pt-table-sync --execute host1 host2 host3
52
Make slave1 have the same data as its replication master:
57
pt-table-sync --execute --sync-to-master slave1
60
Resolve differences that pt-table-checksum found on all slaves of master1:
65
pt-table-sync --execute --replicate test.checksum master1
68
Same as above but only resolve differences on slave1:
73
pt-table-sync --execute --replicate test.checksum \
74
--sync-to-master slave1
77
Sync master2 in a master-master replication configuration, where master2's copy
78
of db.tbl is known or suspected to be incorrect:
83
pt-table-sync --execute --sync-to-master h=master2,D=db,t=tbl
86
Note that in the master-master configuration, the following will NOT do what you
87
want, because it will make changes directly on master2, which will then flow
88
through replication and change master1's data:
93
# Don't do this in a master-master setup!
94
pt-table-sync --execute h=master1,D=db,t=tbl master2
103
The following section is included to inform users about the potential risks,
104
whether known or unknown, of using this tool. The two main categories of risks
105
are those created by the nature of the tool (e.g. read-only tools vs. read-write
106
tools) and those created by bugs.
108
With great power comes great responsibility! This tool changes data, so it is a
109
good idea to back up your data. It is also very powerful, which means it is
110
very complex, so you should run it with the "--dry-run" option to see what it
111
will do, until you're familiar with its operation. If you want to see which
112
rows are different, without changing any data, use "--print" instead of
115
Be careful when using pt-table-sync in any master-master setup. Master-master
116
replication is inherently tricky, and it's easy to make mistakes. You need to
117
be sure you're using the tool correctly for master-master replication. See the
118
"SYNOPSIS" for the overview of the correct usage.
120
Also be careful with tables that have foreign key constraints with \ ``ON DELETE``\
121
or \ ``ON UPDATE``\ definitions because these might cause unintended changes on the
124
In general, this tool is best suited when your tables have a primary key or
125
unique index. Although it can synchronize data in tables lacking a primary key
126
or unique index, it might be best to synchronize that data by another means.
128
At the time of this release, there is a potential bug using
129
"--lock-and-rename" with MySQL 5.1, a bug detecting certain differences,
130
a bug using ROUND() across different platforms, and a bug mixing collations.
132
The authoritative source for updated information is always the online issue
133
tracking system. Issues that affect this tool will be marked as such. You can
134
see a list of such issues at the following URL:
135
`http://www.percona.com/bugs/pt-table-sync <http://www.percona.com/bugs/pt-table-sync>`_.
137
See also "BUGS" for more information on filing bugs and getting help.
145
pt-table-sync does one-way and bidirectional synchronization of table data.
146
It does \ **not**\ synchronize table structures, indexes, or any other schema
147
objects. The following describes one-way synchronization.
148
"BIDIRECTIONAL SYNCING" is described later.
150
This tool is complex and functions in several different ways. To use it
151
safely and effectively, you should understand three things: the purpose
152
of "--replicate", finding differences, and specifying hosts. These
153
three concepts are closely related and determine how the tool will run.
154
The following is the abbreviated logic:
159
if DSN has a t part, sync only that table:
162
The DSN is a slave. Connect to its master and sync.
164
The first DSN is the source. Sync each DSN in turn.
167
The DSN is a slave. Connect to its master, find records
168
of differences, and fix.
170
The DSN is the master. Find slaves and connect to each,
171
find records of differences, and fix.
173
if only 1 DSN and --sync-to-master:
174
The DSN is a slave. Connect to its master, find tables and
175
filter with --databases etc, and sync each table to the master.
177
find tables, filtering with --databases etc, and sync each
181
pt-table-sync can run in one of two ways: with "--replicate" or without.
182
The default is to run without "--replicate" which causes pt-table-sync
183
to automatically find differences efficiently with one of several
184
algorithms (see "ALGORITHMS"). Alternatively, the value of
185
"--replicate", if specified, causes pt-table-sync to use the differences
186
already found by having previously ran pt-table-checksum with its own
187
\ ``--replicate``\ option. Strictly speaking, you don't need to use
188
"--replicate" because pt-table-sync can find differences, but many
189
people use "--replicate" if, for example, they checksum regularly
190
using pt-table-checksum then fix differences as needed with pt-table-sync.
191
If you're unsure, read each tool's documentation carefully and decide for
192
yourself, or consult with an expert.
194
Regardless of whether "--replicate" is used or not, you need to specify
195
which hosts to sync. There are two ways: with "--sync-to-master" or
196
without. Specifying "--sync-to-master" makes pt-table-sync expect
197
one and only slave DSN on the command line. The tool will automatically
198
discover the slave's master and sync it so that its data is the same as
199
its master. This is accomplished by making changes on the master which
200
then flow through replication and update the slave to resolve its differences.
201
\ **Be careful though**\ : although this option specifies and syncs a single
202
slave, if there are other slaves on the same master, they will receive
203
via replication the changes intended for the slave that you're trying to
206
Alternatively, if you do not specify "--sync-to-master", the first
207
DSN given on the command line is the source host. There is only ever
208
one source host. If you do not also specify "--replicate", then you
209
must specify at least one other DSN as the destination host. There
210
can be one or more destination hosts. Source and destination hosts
211
must be independent; they cannot be in the same replication topology.
212
pt-table-sync will die with an error if it detects that a destination
213
host is a slave because changes are written directly to destination hosts
214
(and it's not safe to write directly to slaves). Or, if you specify
215
"--replicate" (but not "--sync-to-master") then pt-table-sync expects
216
one and only one master DSN on the command line. The tool will automatically
217
discover all the master's slaves and sync them to the master. This is
218
the only way to sync several (all) slaves at once (because
219
"--sync-to-master" only specifies one slave).
221
Each host on the command line is specified as a DSN. The first DSN
222
(or only DSN for cases like "--sync-to-master") provides default values
223
for other DSNs, whether those other DSNs are specified on the command line
224
or auto-discovered by the tool. So in this example,
229
pt-table-sync --execute h=host1,u=msandbox,p=msandbox h=host2
232
the host2 DSN inherits the \ ``u``\ and \ ``p``\ DSN parts from the host1 DSN.
233
Use the "--explain-hosts" option to see how pt-table-sync will interpret
234
the DSNs given on the command line.
242
If you specify the "--verbose" option, you'll see information about the
243
differences between the tables. There is one row per table. Each server is
244
printed separately. For example,
249
# Syncing h=host1,D=test,t=test1
250
# DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE
251
# 0 0 3 0 Chunk 13:00:00 13:00:17 2 test.test1
254
Table test.test1 on host1 required 3 \ ``INSERT``\ statements to synchronize
255
and it used the Chunk algorithm (see "ALGORITHMS"). The sync operation
256
for this table started at 13:00:00 and ended 17 seconds later (times taken
257
from \ ``NOW()``\ on the source host). Because differences were found, its
260
If you specify the "--print" option, you'll see the actual SQL statements
261
that the script uses to synchronize the table if "--execute" is also
264
If you want to see the SQL statements that pt-table-sync is using to select
265
chunks, nibbles, rows, etc., then specify "--print" once and "--verbose"
266
twice. Be careful though: this can print a lot of SQL statements.
268
There are cases where no combination of \ ``INSERT``\ , \ ``UPDATE``\ or \ ``DELETE``\
269
statements can resolve differences without violating some unique key. For
270
example, suppose there's a primary key on column a and a unique key on column b.
271
Then there is no way to sync these two tables with straightforward UPDATE
285
The tool rewrites queries to \ ``DELETE``\ and \ ``REPLACE``\ in this case. This is
286
automatically handled after the first index violation, so you don't have to
295
Synchronizing a replication master and slave safely is a non-trivial problem, in
296
general. There are all sorts of issues to think about, such as other processes
297
changing data, trying to change data on the slave, whether the destination and
298
source are a master-master pair, and much more.
300
In general, the safe way to do it is to change the data on the master, and let
301
the changes flow through replication to the slave like any other changes.
302
However, this works only if it's possible to REPLACE into the table on the
303
master. REPLACE works only if there's a unique index on the table (otherwise it
304
just acts like an ordinary INSERT).
306
If your table has unique keys, you should use the "--sync-to-master" and/or
307
"--replicate" options to sync a slave to its master. This will generally do
308
the right thing. When there is no unique key on the table, there is no choice
309
but to change the data on the slave, and pt-table-sync will detect that you're
310
trying to do so. It will complain and die unless you specify
311
\ ``--no-check-slave``\ (see "--[no]check-slave").
313
If you're syncing a table without a primary or unique key on a master-master
314
pair, you must change the data on the destination server. Therefore, you need
315
to specify \ ``--no-bin-log``\ for safety (see "--[no]bin-log"). If you don't,
316
the changes you make on the destination server will replicate back to the
317
source server and change the data there!
319
The generally safe thing to do on a master-master pair is to use the
320
"--sync-to-master" option so you don't change the data on the destination
321
server. You will also need to specify \ ``--no-check-slave``\ to keep
322
pt-table-sync from complaining that it is changing data on a slave.
330
pt-table-sync has a generic data-syncing framework which uses different
331
algorithms to find differences. The tool automatically chooses the best
332
algorithm for each table based on indexes, column types, and the algorithm
333
preferences specified by "--algorithms". The following algorithms are
334
available, listed in their default order of preference:
339
Finds an index whose first column is numeric (including date and time types),
340
and divides the column's range of values into chunks of approximately
341
"--chunk-size" rows. Syncs a chunk at a time by checksumming the entire
342
chunk. If the chunk differs on the source and destination, checksums each
343
chunk's rows individually to find the rows that differ.
345
It is efficient when the column has sufficient cardinality to make the chunks
346
end up about the right size.
348
The initial per-chunk checksum is quite small and results in minimal network
349
traffic and memory consumption. If a chunk's rows must be examined, only the
350
primary key columns and a checksum are sent over the network, not the entire
351
row. If a row is found to be different, the entire row will be fetched, but not
358
Finds an index and ascends the index in fixed-size nibbles of "--chunk-size"
359
rows, using a non-backtracking algorithm (see pt-archiver for more on this
360
algorithm). It is very similar to "Chunk", but instead of pre-calculating
361
the boundaries of each piece of the table based on index cardinality, it uses
362
\ ``LIMIT``\ to define each nibble's upper limit, and the previous nibble's upper
363
limit to define the lower limit.
365
It works in steps: one query finds the row that will define the next nibble's
366
upper boundary, and the next query checksums the entire nibble. If the nibble
367
differs between the source and destination, it examines the nibble row-by-row,
368
just as "Chunk" does.
374
Selects the entire table grouped by all columns, with a COUNT(\*) column added.
375
Compares all columns, and if they're the same, compares the COUNT(\*) column's
376
value to determine how many rows to insert or delete into the destination.
377
Works on tables with no primary key or unique index.
383
Selects the entire table in one big stream and compares all columns. Selects
384
all columns. Much less efficient than the other algorithms, but works when
385
there is no suitable index for them to use.
391
Possibilities for future algorithms are TempTable (what I originally called
392
bottom-up in earlier versions of this tool), DrillDown (what I originally
393
called top-down), and GroupByPrefix (similar to how SqlYOG Job Agent works).
394
Each algorithm has strengths and weaknesses. If you'd like to implement your
395
favorite technique for finding differences between two sources of data on
396
possibly different servers, I'm willing to help. The algorithms adhere to a
397
simple interface that makes it pretty easy to write your own.
402
*********************
403
BIDIRECTIONAL SYNCING
404
*********************
407
Bidirectional syncing is a new, experimental feature. To make it work
408
reliably there are a number of strict limitations:
413
* only works when syncing one server to other independent servers
414
* does not work in any way with replication
415
* requires that the table(s) are chunkable with the Chunk algorithm
416
* is not N-way, only bidirectional between two servers at a time
417
* does not handle DELETE changes
420
For example, suppose we have three servers: c1, r1, r2. c1 is the central
421
server, a pseudo-master to the other servers (viz. r1 and r2 are not slaves
422
to c1). r1 and r2 are remote servers. Rows in table foo are updated and
423
inserted on all three servers and we want to synchronize all the changes
424
between all the servers. Table foo has columns:
430
ts timestamp auto updated
434
Auto-increment offsets are used so that new rows from any server do not
435
create conflicting primary key (id) values. In general, newer rows, as
436
determined by the ts column, take precedence when a same but differing row
437
is found during the bidirectional sync. "Same but differing" means that
438
two rows have the same primary key (id) value but different values for some
439
other column, like the name column in this example. Same but differing
440
conflicts are resolved by a "conflict". A conflict compares some column of
441
the competing rows to determine a "winner". The winning row becomes the
442
source and its values are used to update the other row.
444
There are subtle differences between three columns used to achieve
445
bidirectional syncing that you should be familiar with: chunk column
446
("--chunk-column"), comparison column(s) ("--columns"), and conflict
447
column ("--conflict-column"). The chunk column is only used to chunk the
448
table; e.g. "WHERE id >= 5 AND id < 10". Chunks are checksummed and when
449
chunk checksums reveal a difference, the tool selects the rows in that
450
chunk and checksums the "--columns" for each row. If a column checksum
451
differs, the rows have one or more conflicting column values. In a
452
traditional unidirectional sync, the conflict is a moot point because it can
453
be resolved simply by updating the entire destination row with the source
454
row's values. In a bidirectional sync, however, the "--conflict-column"
455
(in accordance with other \ ``--conflict-\*``\ options list below) is compared
456
to determine which row is "correct" or "authoritative"; this row becomes
459
To sync all three servers completely, two runs of pt-table-sync are required.
460
The first run syncs c1 and r1, then syncs c1 and r2 including any changes
461
from r1. At this point c1 and r2 are completely in sync, but r1 is missing
462
any changes from r2 because c1 didn't have these changes when it and r1
463
were synced. So a second run is needed which syncs the servers in the same
464
order, but this time when c1 and r1 are synced r1 gets r2's changes.
466
The tool does not sync N-ways, only bidirectionally between the first DSN
467
given on the command line and each subsequent DSN in turn. So the tool in
468
this example would be ran twice like:
473
pt-table-sync --bidirectional h=c1 h=r1 h=r2
476
The "--bidirectional" option enables this feature and causes various
477
sanity checks to be performed. You must specify other options that tell
478
pt-table-sync how to resolve conflicts for same but differing rows.
484
* L<"--conflict-column">
485
* L<"--conflict-comparison">
486
* L<"--conflict-value">
487
* L<"--conflict-threshold">
488
* L<"--conflict-error"> (optional)
491
Use "--print" to test this option before "--execute". The printed
492
SQL statements will have comments saying on which host the statement
493
would be executed if you used "--execute".
495
Technical side note: the first DSN is always the "left" server and the other
496
DSNs are always the "right" server. Since either server can become the source
497
or destination it's confusing to think of them as "src" and "dst". Therefore,
498
they're generically referred to as left and right. It's easy to remember
499
this because the first DSN is always to the left of the other server DSNs on
508
The following are the exit statuses (also called return values, or return codes)
509
when pt-table-sync finishes and exits.
515
====== =======================================================
518
2 At least one table differed on the destination.
519
3 Combination of 1 and 2.
528
Specify at least one of "--print", "--execute", or "--dry-run".
530
"--where" and "--replicate" are mutually exclusive.
532
This tool accepts additional command-line arguments. Refer to the
533
"SYNOPSIS" and usage information for details.
538
type: string; default: Chunk,Nibble,GroupBy,Stream
540
Algorithm to use when comparing the tables, in order of preference.
542
For each table, pt-table-sync will check if the table can be synced with
543
the given algorithms in the order that they're given. The first algorithm
544
that can sync the table is used. See "ALGORITHMS".
550
Prompt for a password when connecting to MySQL.
556
Enable bidirectional sync between first and subsequent hosts.
558
See "BIDIRECTIONAL SYNCING" for more information.
566
Log to the binary log (\ ``SET SQL_LOG_BIN=1``\ ).
568
Specifying \ ``--no-bin-log``\ will \ ``SET SQL_LOG_BIN=0``\ .
574
Instruct MySQL to buffer queries in its memory.
576
This option adds the \ ``SQL_BUFFER_RESULT``\ option to the comparison queries.
577
This causes MySQL to execute the queries and place them in a temporary table
578
internally before sending the results back to pt-table-sync. The advantage of
579
this strategy is that pt-table-sync can fetch rows as desired without using a
580
lot of memory inside the Perl process, while releasing locks on the MySQL table
581
(to reduce contention with other queries). The disadvantage is that it uses
582
more memory on the MySQL server instead.
584
You probably want to leave "--[no]buffer-to-client" enabled too, because
585
buffering into a temp table and then fetching it all into Perl's memory is
586
probably a silly thing to do. This option is most useful for the GroupBy and
587
Stream algorithms, which may fetch a lot of data from the server.
591
--[no]buffer-to-client
595
Fetch rows one-by-one from MySQL while comparing.
597
This option enables \ ``mysql_use_result``\ which causes MySQL to hold the selected
598
rows on the server until the tool fetches them. This allows the tool to use
599
less memory but may keep the rows locked on the server longer.
601
If this option is disabled by specifying \ ``--no-buffer-to-client``\ then
602
\ ``mysql_store_result``\ is used which causes MySQL to send all selected rows to
603
the tool at once. This may result in the results "cursor" being held open for
604
a shorter time on the server, but if the tables are large, it could take a long
605
time anyway, and use all your memory.
607
For most non-trivial data sizes, you want to leave this option enabled.
609
This option is disabled when "--bidirectional" is used.
615
short form: -A; type: string
617
Default character set. If the value is utf8, sets Perl's binmode on
618
STDOUT to utf8, passes the mysql_enable_utf8 option to DBD::mysql, and
619
runs SET NAMES UTF8 after connecting to MySQL. Any other value sets
620
binmode on STDOUT without the utf8 layer, and runs SET NAMES after
629
With "--sync-to-master", try to verify that the detected
630
master is the real master.
634
--[no]check-privileges
638
Check that user has all necessary privileges on source and destination table.
646
Check whether the destination server is a slave.
648
If the destination server is a slave, it's generally unsafe to make changes on
649
it. However, sometimes you have to; "--replace" won't work unless there's a
650
unique index, for example, so you can't make changes on the master in that
651
scenario. By default pt-table-sync will complain if you try to change data on
652
a slave. Specify \ ``--no-check-slave``\ to disable this check. Use it at your own
661
Check that no triggers are defined on the destination table.
663
Triggers were introduced in MySQL v5.0.2, so for older versions this option
664
has no effect because triggers will not be checked.
672
Chunk the table on this column.
680
Chunk the table using this index.
686
type: string; default: 1000
688
Number of rows or data size per chunk.
690
The size of each chunk of rows for the "Chunk" and "Nibble" algorithms.
691
The size can be either a number of rows, or a data size. Data sizes are
692
specified with a suffix of k=kibibytes, M=mebibytes, G=gibibytes. Data sizes
693
are converted to a number of rows by dividing by the average row length.
699
short form: -c; type: array
701
Compare this comma-separated list of columns.
709
Read this comma-separated list of config files; if specified, this must be the
710
first option on the command line.
718
Compare this column when rows conflict during a "--bidirectional" sync.
720
When a same but differing row is found the value of this column from each
721
row is compared according to "--conflict-comparison", "--conflict-value"
722
and "--conflict-threshold" to determine which row has the correct data and
723
becomes the source. The column can be any type for which there is an
724
appropriate "--conflict-comparison" (this is almost all types except, for
727
This option only works with "--bidirectional".
728
See "BIDIRECTIONAL SYNCING" for more information.
732
--conflict-comparison
736
Choose the "--conflict-column" with this property as the source.
738
The option affects how the "--conflict-column" values from the conflicting
739
rows are compared. Possible comparisons are one of these MAGIC_comparisons:
744
newest|oldest|greatest|least|equals|matches
746
COMPARISON CHOOSES ROW WITH
747
========== =========================================================
748
newest Newest temporal L<"--conflict-column"> value
749
oldest Oldest temporal L<"--conflict-column"> value
750
greatest Greatest numerical L<"--conflict-column"> value
751
least Least numerical L<"--conflict-column"> value
752
equals L<"--conflict-column"> value equal to L<"--conflict-value">
753
matches L<"--conflict-column"> value matching Perl regex pattern
754
L<"--conflict-value">
757
This option only works with "--bidirectional".
758
See "BIDIRECTIONAL SYNCING" for more information.
764
type: string; default: warn
766
How to report unresolvable conflicts and conflict errors
768
This option changes how the user is notified when a conflict cannot be
769
resolved or causes some kind of error. Possible values are:
774
* warn: Print a warning to STDERR about the unresolvable conflict
775
* die: Die, stop syncing, and print a warning to STDERR
778
This option only works with "--bidirectional".
779
See "BIDIRECTIONAL SYNCING" for more information.
787
Amount by which one "--conflict-column" must exceed the other.
789
The "--conflict-threshold" prevents a conflict from being resolved if
790
the absolute difference between the two "--conflict-column" values is
791
less than this amount. For example, if two "--conflict-column" have
792
timestamp values "2009-12-01 12:00:00" and "2009-12-01 12:05:00" the difference
793
is 5 minutes. If "--conflict-threshold" is set to "5m" the conflict will
794
be resolved, but if "--conflict-threshold" is set to "6m" the conflict
795
will fail to resolve because the difference is not greater than or equal
796
to 6 minutes. In this latter case, "--conflict-error" will report
799
This option only works with "--bidirectional".
800
See "BIDIRECTIONAL SYNCING" for more information.
808
Use this value for certain "--conflict-comparison".
810
This option gives the value for \ ``equals``\ and \ ``matches``\
811
"--conflict-comparison".
813
This option only works with "--bidirectional".
814
See "BIDIRECTIONAL SYNCING" for more information.
820
short form: -d; type: hash
822
Sync only this comma-separated list of databases.
824
A common request is to sync tables from one database with tables from another
825
database on the same or different server. This is not yet possible.
826
"--databases" will not do it, and you can't do it with the D part of the DSN
827
either because in the absence of a table name it assumes the whole server
828
should be synced and the D part controls only the connection's default database.
834
short form: -F; type: string
836
Only read mysql options from the given file. You must give an absolute pathname.
842
Analyze, decide the sync algorithm to use, print and exit.
844
Implies "--verbose" so you can see the results. The results are in the same
845
output format that you'll see from actually running the tool, but there will be
846
zeros for rows affected. This is because the tool actually executes, but stops
847
before it compares any data and just returns zeros. The zeros do not mean there
848
are no changes to be made.
854
short form: -e; type: hash
856
Sync only this comma-separated list of storage engines.
862
Execute queries to make the tables have identical data.
864
This option makes pt-table-sync actually sync table data by executing all
865
the queries that it created to resolve table differences. Therefore, \ **the
866
tables will be changed!**\ And unless you also specify "--verbose", the
867
changes will be made silently. If this is not what you want, see
868
"--print" or "--dry-run".
874
Print connection information and exit.
876
Print out a list of hosts to which pt-table-sync will connect, with all
877
the various connection options, and exit.
885
Precision for \ ``FLOAT``\ and \ ``DOUBLE``\ number-to-string conversion. Causes FLOAT
886
and DOUBLE values to be rounded to the specified number of digits after the
887
decimal point, with the ROUND() function in MySQL. This can help avoid
888
checksum mismatches due to different floating-point representations of the same
889
values on different MySQL versions and hardware. The default is no rounding;
890
the values are converted to strings by the CONCAT() function, and MySQL chooses
891
the string representation. If you specify a value of 2, for example, then the
892
values 1.008 and 1.009 will be rounded to 1.01, and will checksum as equal.
896
--[no]foreign-key-checks
900
Enable foreign key checks (\ ``SET FOREIGN_KEY_CHECKS=1``\ ).
902
Specifying \ ``--no-foreign-key-checks``\ will \ ``SET FOREIGN_KEY_CHECKS=0``\ .
910
Which hash function you'd like to use for checksums.
912
The default is \ ``CRC32``\ . Other good choices include \ ``MD5``\ and \ ``SHA1``\ . If you
913
have installed the \ ``FNV_64``\ user-defined function, \ ``pt-table-sync``\ will detect
914
it and prefer to use it, because it is much faster than the built-ins. You can
915
also use MURMUR_HASH if you've installed that user-defined function. Both of
916
these are distributed with Maatkit. See pt-table-checksum for more
917
information and benchmarks.
931
\ ``HEX()``\ \ ``BLOB``\ , \ ``TEXT``\ and \ ``BINARY``\ columns.
933
When row data from the source is fetched to create queries to sync the
934
data (i.e. the queries seen with "--print" and executed by "--execute"),
935
binary columns are wrapped in HEX() so the binary data does not produce
936
an invalid SQL statement. You can disable this option but you probably
943
short form: -h; type: string
953
Ignore this comma-separated list of column names in comparisons.
955
This option causes columns not to be compared. However, if a row is determined
956
to differ between tables, all columns in that row will be synced, regardless.
957
(It is not currently possible to exclude columns from the sync process itself,
958
only from the comparison.)
966
Ignore this comma-separated list of databases.
972
type: Hash; default: FEDERATED,MRG_MyISAM
974
Ignore this comma-separated list of storage engines.
982
Ignore this comma-separated list of tables.
984
Table names may be qualified with the database name.
992
Add FORCE/USE INDEX hints to the chunk and row queries.
994
By default \ ``pt-table-sync``\ adds a FORCE/USE INDEX hint to each SQL statement
995
to coerce MySQL into using the index chosen by the sync algorithm or specified
996
by "--chunk-index". This is usually a good thing, but in rare cases the
997
index may not be the best for the query so you can suppress the index hint
998
by specifying \ ``--no-index-hint``\ and let MySQL choose the index.
1000
This does not affect the queries printed by "--print"; it only affects the
1001
chunk and row queries that \ ``pt-table-sync``\ uses to select and compare rows.
1009
Lock tables: 0=none, 1=per sync cycle, 2=per table, or 3=globally.
1011
This uses \ ``LOCK TABLES``\ . This can help prevent tables being changed while
1012
you're examining them. The possible values are as follows:
1015
.. code-block:: perl
1018
===== =======================================================
1019
0 Never lock tables.
1020
1 Lock and unlock one time per sync cycle (as implemented
1021
by the syncing algorithm). This is the most granular
1022
level of locking available. For example, the Chunk
1023
algorithm will lock each chunk of C<N> rows, and then
1024
unlock them if they are the same on the source and the
1025
destination, before moving on to the next chunk.
1026
2 Lock and unlock before and after each table.
1027
3 Lock and unlock once for every server (DSN) synced, with
1028
C<FLUSH TABLES WITH READ LOCK>.
1031
A replication slave is never locked if "--replicate" or "--sync-to-master"
1032
is specified, since in theory locking the table on the master should prevent any
1033
changes from taking place. (You are not changing data on your slave, right?)
1034
If "--wait" is given, the master (source) is locked and then the tool waits
1035
for the slave to catch up to the master before continuing.
1037
If \ ``--transaction``\ is specified, \ ``LOCK TABLES``\ is not used. Instead, lock
1038
and unlock are implemented by beginning and committing transactions.
1039
The exception is if "--lock" is 3.
1041
If \ ``--no-transaction``\ is specified, then \ ``LOCK TABLES``\ is used for any
1042
value of "--lock". See "--[no]transaction".
1048
Lock the source and destination table, sync, then swap names. This is useful as
1049
a less-blocking ALTER TABLE, once the tables are reasonably in sync with each
1050
other (which you may choose to accomplish via any number of means, including
1051
dump and reload or even something like pt-archiver). It requires exactly two
1052
DSNs and assumes they are on the same server, so it does no waiting for
1053
replication or the like. Tables are locked with LOCK TABLES.
1059
short form: -p; type: string
1061
Password to use when connecting.
1069
Create the given PID file. The file contains the process ID of the script.
1070
The PID file is removed when the script exits. Before starting, the script
1071
checks if the PID file already exists. If it does not, then the script creates
1072
and writes its own PID to it. If it does, then the script checks the following:
1073
if the file contains a PID and a process is running with that PID, then
1074
the script dies; or, if there is no process running with that PID, then the
1075
script overwrites the file with its own PID and starts; else, if the file
1076
contains no PID, then the script dies.
1082
short form: -P; type: int
1084
Port number to use for connection.
1090
Print queries that will resolve differences.
1092
If you don't trust \ ``pt-table-sync``\ , or just want to see what it will do, this
1093
is a good way to be safe. These queries are valid SQL and you can run them
1094
yourself if you want to sync the tables manually.
1102
Preferred recursion method used to find slaves.
1104
Possible methods are:
1107
.. code-block:: perl
1110
=========== ================
1111
processlist SHOW PROCESSLIST
1112
hosts SHOW SLAVE HOSTS
1115
The processlist method is preferred because SHOW SLAVE HOSTS is not reliable.
1116
However, the hosts method is required if the server uses a non-standard
1117
port (not 3306). Usually pt-table-sync does the right thing and finds
1118
the slaves, but you may give a preferred method and it will be used first.
1119
If it doesn't find any slaves, the other methods will be tried.
1125
Write all \ ``INSERT``\ and \ ``UPDATE``\ statements as \ ``REPLACE``\ .
1127
This is automatically switched on as needed when there are unique index
1136
Sync tables listed as different in this table.
1138
Specifies that \ ``pt-table-sync``\ should examine the specified table to find data
1139
that differs. The table is exactly the same as the argument of the same name to
1140
pt-table-checksum. That is, it contains records of which tables (and ranges
1141
of values) differ between the master and slave.
1143
For each table and range of values that shows differences between the master and
1144
slave, \ ``pt-table-checksum``\ will sync that table, with the appropriate \ ``WHERE``\
1145
clause, to its master.
1147
This automatically sets "--wait" to 60 and causes changes to be made on the
1148
master instead of the slave.
1150
If "--sync-to-master" is specified, the tool will assume the server you
1151
specified is the slave, and connect to the master as usual to sync.
1153
Otherwise, it will try to use \ ``SHOW PROCESSLIST``\ to find slaves of the server
1154
you specified. If it is unable to find any slaves via \ ``SHOW PROCESSLIST``\ , it
1155
will inspect \ ``SHOW SLAVE HOSTS``\ instead. You must configure each slave's
1156
\ ``report-host``\ , \ ``report-port``\ and other options for this to work right. After
1157
finding slaves, it will inspect the specified table on each slave to find data
1158
that needs to be synced, and sync it.
1160
The tool examines the master's copy of the table first, assuming that the master
1161
is potentially a slave as well. Any table that shows differences there will
1162
\ **NOT**\ be synced on the slave(s). For example, suppose your replication is set
1163
up as A->B, B->C, B->D. Suppose you use this argument and specify server B.
1164
The tool will examine server B's copy of the table. If it looks like server B's
1165
data in table \ ``test.tbl1``\ is different from server A's copy, the tool will not
1166
sync that table on servers C and D.
1172
type: string; default: wait_timeout=10000
1174
Set these MySQL variables. Immediately after connecting to MySQL, this
1175
string will be appended to SET and executed.
1181
short form: -S; type: string
1183
Socket file to use for connection.
1189
Treat the DSN as a slave and sync it to its master.
1191
Treat the server you specified as a slave. Inspect \ ``SHOW SLAVE STATUS``\ ,
1192
connect to the server's master, and treat the master as the source and the slave
1193
as the destination. Causes changes to be made on the master. Sets "--wait"
1194
to 60 by default, sets "--lock" to 1 by default, and disables
1195
"--[no]transaction" by default. See also "--replicate", which changes
1196
this option's behavior.
1202
short form: -t; type: hash
1204
Sync only this comma-separated list of tables.
1206
Table names may be qualified with the database name.
1212
Keep going if "--wait" fails.
1214
If you specify "--wait" and the slave doesn't catch up to the master's
1215
position before the wait times out, the default behavior is to abort. This
1216
option makes the tool keep going anyway. \ **Warning**\ : if you are trying to get a
1217
consistent comparison between the two servers, you probably don't want to keep
1218
going after a timeout.
1224
Use transactions instead of \ ``LOCK TABLES``\ .
1226
The granularity of beginning and committing transactions is controlled by
1227
"--lock". This is enabled by default, but since "--lock" is disabled by
1228
default, it has no effect.
1230
Most options that enable locking also disable transactions by default, so if
1231
you want to use transactional locking (via \ ``LOCK IN SHARE MODE``\ and \ ``FOR
1232
UPDATE``\ , you must specify \ ``--transaction``\ explicitly.
1234
If you don't specify \ ``--transaction``\ explicitly \ ``pt-table-sync``\ will decide on
1235
a per-table basis whether to use transactions or table locks. It currently
1236
uses transactions on InnoDB tables, and table locks on all others.
1238
If \ ``--no-transaction``\ is specified, then \ ``pt-table-sync``\ will not use
1239
transactions at all (not even for InnoDB tables) and locking is controlled
1242
When enabled, either explicitly or implicitly, the transaction isolation level
1243
is set \ ``REPEATABLE READ``\ and transactions are started \ ``WITH CONSISTENT
1250
\ ``TRIM()``\ \ ``VARCHAR``\ columns in \ ``BIT_XOR``\ and \ ``ACCUM``\ modes. Helps when
1251
comparing MySQL 4.1 to >= 5.0.
1253
This is useful when you don't care about the trailing space differences between
1254
MySQL versions which vary in their handling of trailing spaces. MySQL 5.0 and
1255
later all retain trailing spaces in \ ``VARCHAR``\ , while previous versions would
1264
Enable unique key checks (\ ``SET UNIQUE_CHECKS=1``\ ).
1266
Specifying \ ``--no-unique-checks``\ will \ ``SET UNIQUE_CHECKS=0``\ .
1272
short form: -u; type: string
1274
User for login if not current user.
1280
short form: -v; cumulative: yes
1282
Print results of sync operations.
1284
See "OUTPUT" for more details about the output.
1290
Show version and exit.
1296
short form: -w; type: time
1298
How long to wait for slaves to catch up to their master.
1300
Make the master wait for the slave to catch up in replication before comparing
1301
the tables. The value is the number of seconds to wait before timing out (see
1302
also "--timeout-ok"). Sets "--lock" to 1 and "--[no]transaction" to 0
1303
by default. If you see an error such as the following,
1306
.. code-block:: perl
1308
MASTER_POS_WAIT returned -1
1311
It means the timeout was exceeded and you need to increase it.
1313
The default value of this option is influenced by other options. To see what
1314
value is in effect, run with "--help".
1316
To disable waiting entirely (except for locks), specify "--wait" 0. This
1317
helps when the slave is lagging on tables that are not being synced.
1325
\ ``WHERE``\ clause to restrict syncing to part of the table.
1333
Add a chunk for rows with zero or zero-equivalent values. The only has an
1334
effect when "--chunk-size" is specified. The purpose of the zero chunk
1335
is to capture a potentially large number of zero values that would imbalance
1336
the size of the first chunk. For example, if a lot of negative numbers were
1337
inserted into an unsigned integer column causing them to be stored as zeros,
1338
then these zero values are captured by the zero chunk instead of the first
1339
chunk and all its non-zero values.
1349
These DSN options are used to create a DSN. Each option is given like
1350
\ ``option=value``\ . The options are case-sensitive, so P and p are not the
1351
same option. There cannot be whitespace before or after the \ ``=``\ and
1352
if the value contains whitespace it must be quoted. DSN options are
1353
comma-separated. See the percona-toolkit manpage for full details.
1358
dsn: charset; copy: yes
1360
Default character set.
1366
dsn: database; copy: yes
1368
Database containing the table to be synced.
1374
dsn: mysql_read_default_file; copy: yes
1376
Only read default options from the given file
1382
dsn: host; copy: yes
1390
dsn: password; copy: yes
1392
Password to use when connecting.
1398
dsn: port; copy: yes
1400
Port number to use for connection.
1406
dsn: mysql_socket; copy: yes
1408
Socket file to use for connection.
1422
dsn: user; copy: yes
1424
User for login if not current user.
1434
Visit `http://www.percona.com/software/ <http://www.percona.com/software/>`_ to download the latest release of
1435
Percona Toolkit. Or, to get the latest release from the command line:
1438
.. code-block:: perl
1440
wget percona.com/latest/percona-toolkit/PKG
1443
Replace \ ``PKG``\ with \ ``tar``\ , \ ``rpm``\ , or \ ``deb``\ to download the package in that
1444
format. You can also get individual tools from the latest release:
1447
.. code-block:: perl
1449
wget percona.com/latest/percona-toolkit/TOOL
1452
Replace \ ``TOOL``\ with the name of any tool.
1460
The environment variable \ ``PTDEBUG``\ enables verbose debugging output to STDERR.
1461
To enable debugging and capture all output to a file, run the tool like:
1464
.. code-block:: perl
1466
PTDEBUG=1 pt-table-sync ... > FILE 2>&1
1469
Be careful: debugging output is voluminous and can generate several megabytes
1478
You need Perl, DBI, DBD::mysql, and some core packages that ought to be
1479
installed in any reasonably new version of Perl.
1487
For a list of known bugs, see `http://www.percona.com/bugs/pt-table-sync <http://www.percona.com/bugs/pt-table-sync>`_.
1489
Please report bugs at `https://bugs.launchpad.net/percona-toolkit <https://bugs.launchpad.net/percona-toolkit>`_.
1490
Include the following information in your bug report:
1493
\* Complete command-line used to run the tool
1501
\* MySQL version of all servers involved
1505
\* Output from the tool including STDERR
1509
\* Input files (log/dump/config files, etc.)
1513
If possible, include debugging output by running the tool with \ ``PTDEBUG``\ ;
1530
My work is based in part on Giuseppe Maxia's work on distributed databases,
1531
`http://www.sysadminmag.com/articles/2004/0408/ <http://www.sysadminmag.com/articles/2004/0408/>`_ and code derived from that
1532
article. There is more explanation, and a link to the code, at
1533
`http://www.perlmonks.org/?node_id=381053 <http://www.perlmonks.org/?node_id=381053>`_.
1535
Another programmer extended Maxia's work even further. Fabien Coelho changed
1536
and generalized Maxia's technique, introducing symmetry and avoiding some
1537
problems that might have caused too-frequent checksum collisions. This work
1538
grew into pg_comparator, `http://www.coelho.net/pg_comparator/ <http://www.coelho.net/pg_comparator/>`_. Coelho also
1539
explained the technique further in a paper titled "Remote Comparison of Database
1540
Tables" (`http://cri.ensmp.fr/classement/doc/A-375.pdf <http://cri.ensmp.fr/classement/doc/A-375.pdf>`_).
1542
This existing literature mostly addressed how to find the differences between
1543
the tables, not how to resolve them once found. I needed a tool that would not
1544
only find them efficiently, but would then resolve them. I first began thinking
1545
about how to improve the technique further with my article
1546
`http://tinyurl.com/mysql-data-diff-algorithm <http://tinyurl.com/mysql-data-diff-algorithm>`_,
1547
where I discussed a number of problems with the Maxia/Coelho "bottom-up"
1548
algorithm. After writing that article, I began to write this tool. I wanted to
1549
actually implement their algorithm with some improvements so I was sure I
1550
understood it completely. I discovered it is not what I thought it was, and is
1551
considerably more complex than it appeared to me at first. Fabien Coelho was
1552
kind enough to address some questions over email.
1554
The first versions of this tool implemented a version of the Coelho/Maxia
1555
algorithm, which I called "bottom-up", and my own, which I called "top-down."
1556
Those algorithms are considerably more complex than the current algorithms and
1557
I have removed them from this tool, and may add them back later. The
1558
improvements to the bottom-up algorithm are my original work, as is the
1559
top-down algorithm. The techniques to actually resolve the differences are
1562
Another tool that can synchronize tables is the SQLyog Job Agent from webyog.
1563
Thanks to Rohit Nadhani, SJA's author, for the conversations about the general
1564
techniques. There is a comparison of pt-table-sync and SJA at
1565
`http://tinyurl.com/maatkit-vs-sqlyog <http://tinyurl.com/maatkit-vs-sqlyog>`_
1567
Thanks to the following people and organizations for helping in many ways:
1569
The Rimm-Kaufman Group `http://www.rimmkaufman.com/ <http://www.rimmkaufman.com/>`_,
1570
MySQL AB `http://www.mysql.com/ <http://www.mysql.com/>`_,
1571
Blue Ridge InternetWorks `http://www.briworks.com/ <http://www.briworks.com/>`_,
1572
Percona `http://www.percona.com/ <http://www.percona.com/>`_,
1574
Giuseppe Maxia and others at MySQL AB,
1575
Kristian Koehntopp (MySQL AB),
1576
Rohit Nadhani (WebYog),
1577
The helpful monks at Perlmonks,
1578
And others too numerous to mention.
1581
*********************
1582
ABOUT PERCONA TOOLKIT
1583
*********************
1586
This tool is part of Percona Toolkit, a collection of advanced command-line
1587
tools developed by Percona for MySQL support and consulting. Percona Toolkit
1588
was forked from two projects in June, 2011: Maatkit and Aspersa. Those
1589
projects were created by Baron Schwartz and developed primarily by him and
1590
Daniel Nichter, both of whom are employed by Percona. Visit
1591
`http://www.percona.com/software/ <http://www.percona.com/software/>`_ for more software developed by Percona.
1594
********************************
1595
COPYRIGHT, LICENSE, AND WARRANTY
1596
********************************
1599
This program is copyright 2007-2011 Baron Schwartz, 2011 Percona Inc.
1600
Feedback and improvements are welcome.
1602
THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
1603
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
1604
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
1606
This program is free software; you can redistribute it and/or modify it under
1607
the terms of the GNU General Public License as published by the Free Software
1608
Foundation, version 2; OR the Perl Artistic License. On UNIX and similar
1609
systems, you can issue \`man perlgpl' or \`man perlartistic' to read these
1612
You should have received a copy of the GNU General Public License along with
1613
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
1614
Place, Suite 330, Boston, MA 02111-1307 USA.
1622
Percona Toolkit v1.0.0 released 2011-08-01