14
pt-archiver - Archive rows from a MySQL table into another table or a file.
22
Usage: pt-archiver [OPTION...] --source DSN --where WHERE
24
pt-archiver nibbles records from a MySQL table. The --source and --dest
25
arguments use DSN syntax; if COPY is yes, --dest defaults to the key's value
30
Archive all rows from oltp_server to olap_server and to a file:
35
pt-archiver --source h=oltp_server,D=test,t=tbl --dest h=olap_server \
36
--file '/var/log/archive/%Y-%m-%d-%D.%t' \
37
--where "1=1" --limit 1000 --commit-each
40
Purge (delete) orphan rows from child table:
45
pt-archiver --source h=host,D=db,t=child --purge \
46
--where 'NOT EXISTS(SELECT * FROM parent WHERE col=child.col)'
55
The following section is included to inform users about the potential risks,
56
whether known or unknown, of using this tool. The two main categories of risks
57
are those created by the nature of the tool (e.g. read-only tools vs. read-write
58
tools) and those created by bugs.
60
pt-achiver is a read-write tool. It deletes data from the source by default, so
61
you should test your archiving jobs with the "--dry-run" option if you're not
62
sure about them. It is designed to have as little impact on production systems
63
as possible, but tuning with "--limit", "--txn-size" and similar options
64
might be a good idea too.
66
If you write or use "--plugin" modules, you should ensure they are good
67
quality and well-tested.
69
At the time of this release there is an unverified bug with
70
"--bulk-insert" that may cause data loss.
72
The authoritative source for updated information is always the online issue
73
tracking system. Issues that affect this tool will be marked as such. You can
74
see a list of such issues at the following URL:
75
`http://www.percona.com/bugs/pt-archiver <http://www.percona.com/bugs/pt-archiver>`_.
77
See also "BUGS" for more information on filing bugs and getting help.
85
pt-archiver is the tool I use to archive tables as described in
86
`http://tinyurl.com/mysql-archiving <http://tinyurl.com/mysql-archiving>`_. The goal is a low-impact, forward-only
87
job to nibble old data out of the table without impacting OLTP queries much.
88
You can insert the data into another table, which need not be on the same
89
server. You can also write it to a file in a format suitable for LOAD DATA
90
INFILE. Or you can do neither, in which case it's just an incremental DELETE.
92
pt-archiver is extensible via a plugin mechanism. You can inject your own
93
code to add advanced archiving logic that could be useful for archiving
94
dependent data, applying complex business rules, or building a data warehouse
95
during the archiving process.
97
You need to choose values carefully for some options. The most important are
98
"--limit", "--retries", and "--txn-size".
100
The strategy is to find the first row(s), then scan some index forward-only to
101
find more rows efficiently. Each subsequent query should not scan the entire
102
table; it should seek into the index, then scan until it finds more archivable
103
rows. Specifying the index with the 'i' part of the "--source" argument can
104
be crucial for this; use "--dry-run" to examine the generated queries and be
105
sure to EXPLAIN them to see if they are efficient (most of the time you probably
106
want to scan the PRIMARY key, which is the default). Even better, profile
107
pt-archiver with pt-query-profiler and make sure it is not scanning the whole
110
You can disable the seek-then-scan optimizations partially or wholly with
111
"--no-ascend" and "--ascend-first". Sometimes this may be more efficient
112
for multi-column keys. Be aware that pt-archiver is built to start at the
113
beginning of the index it chooses and scan it forward-only. This might result
114
in long table scans if you're trying to nibble from the end of the table by an
115
index other than the one it prefers. See "--source" and read the
116
documentation on the \ ``i``\ part if this applies to you.
124
If you specify "--progress", the output is a header row, plus status output
125
at intervals. Each row in the status output lists the current date and time,
126
how many seconds pt-archiver has been running, and how many rows it has
129
If you specify "--statistics", \ ``pt-archiver``\ outputs timing and other
130
information to help you identify which part of your archiving process takes the
139
pt-archiver tries to catch signals and exit gracefully; for example, if you
140
send it SIGTERM (Ctrl-C on UNIX-ish systems), it will catch the signal, print a
141
message about the signal, and exit fairly normally. It will not execute
142
"--analyze" or "--optimize", because these may take a long time to finish.
143
It will run all other code normally, including calling after_finish() on any
144
plugins (see "EXTENDING").
146
In other words, a signal, if caught, will break out of the main archiving
147
loop and skip optimize/analyze.
155
Specify at least one of "--dest", "--file", or "--purge".
157
"--ignore" and "--replace" are mutually exclusive.
159
"--txn-size" and "--commit-each" are mutually exclusive.
161
"--low-priority-insert" and "--delayed-insert" are mutually exclusive.
163
"--share-lock" and "--for-update" are mutually exclusive.
165
"--analyze" and "--optimize" are mutually exclusive.
167
"--no-ascend" and "--no-delete" are mutually exclusive.
169
DSN values in "--dest" default to values from "--source" if COPY is yes.
176
Run ANALYZE TABLE afterwards on "--source" and/or "--dest".
178
Runs ANALYZE TABLE after finishing. The argument is an arbitrary string. If it
179
contains the letter 's', the source will be analyzed. If it contains 'd', the
180
destination will be analyzed. You can specify either or both. For example, the
181
following will analyze both:
189
See `http://dev.mysql.com/doc/en/analyze-table.html <http://dev.mysql.com/doc/en/analyze-table.html>`_ for details on ANALYZE
196
Ascend only first column of index.
198
If you do want to use the ascending index optimization (see "--no-ascend"),
199
but do not want to incur the overhead of ascending a large multi-column index,
200
you can use this option to tell pt-archiver to ascend only the leftmost column
201
of the index. This can provide a significant performance boost over not
202
ascending the index at all, while avoiding the cost of ascending the whole
205
See "EXTENDING" for a discussion of how this interacts with plugins.
211
Prompt for a password when connecting to MySQL.
217
Buffer output to "--file" and flush at commit.
219
Disables autoflushing to "--file" and flushes "--file" to disk only when a
220
transaction commits. This typically means the file is block-flushed by the
221
operating system, so there may be some implicit flushes to disk between
222
commits as well. The default is to flush "--file" to disk after every row.
224
The danger is that a crash might cause lost data.
226
The performance increase I have seen from using "--buffer" is around 5 to 15
227
percent. Your mileage may vary.
233
Delete each chunk with a single statement (implies "--commit-each").
235
Delete each chunk of rows in bulk with a single \ ``DELETE``\ statement. The
236
statement deletes every row between the first and last row of the chunk,
237
inclusive. It implies "--commit-each", since it would be a bad idea to
238
\ ``INSERT``\ rows one at a time and commit them before the bulk \ ``DELETE``\ .
240
The normal method is to delete every row by its primary key. Bulk deletes might
241
be a lot faster. \ **They also might not be faster**\ if you have a complex
244
This option completely defers all \ ``DELETE``\ processing until the chunk of rows
245
is finished. If you have a plugin on the source, its \ ``before_delete``\ method
246
will not be called. Instead, its \ ``before_bulk_delete``\ method is called later.
248
\ **WARNING**\ : if you have a plugin on the source that sometimes doesn't return
249
true from \ ``is_archivable()``\ , you should use this option only if you understand
250
what it does. If the plugin instructs \ ``pt-archiver``\ not to archive a row,
251
it will still be deleted by the bulk delete!
255
--[no]bulk-delete-limit
259
Add "--limit" to "--bulk-delete" statement.
261
This is an advanced option and you should not disable it unless you know what
262
you are doing and why! By default, "--bulk-delete" appends a "--limit"
263
clause to the bulk delete SQL statement. In certain cases, this clause can be
264
omitted by specifying \ ``--no-bulk-delete-limit``\ . "--limit" must still be
271
Insert each chunk with LOAD DATA INFILE (implies "--bulk-delete" "--commit-each").
273
Insert each chunk of rows with \ ``LOAD DATA LOCAL INFILE``\ . This may be much
274
faster than inserting a row at a time with \ ``INSERT``\ statements. It is
275
implemented by creating a temporary file for each chunk of rows, and writing the
276
rows to this file instead of inserting them. When the chunk is finished, it
279
To protect the safety of your data, this option forces bulk deletes to be used.
280
It would be unsafe to delete each row as it is found, before inserting the rows
281
into the destination first. Forcing bulk deletes guarantees that the deletion
282
waits until the insertion is successful.
284
The "--low-priority-insert", "--replace", and "--ignore" options work
285
with this option, but "--delayed-insert" does not.
291
short form: -A; type: string
293
Default character set. If the value is utf8, sets Perl's binmode on
294
STDOUT to utf8, passes the mysql_enable_utf8 option to DBD::mysql, and runs SET
295
NAMES UTF8 after connecting to MySQL. Any other value sets binmode on STDOUT
296
without the utf8 layer, and runs SET NAMES after connecting to MySQL.
298
See also "--[no]check-charset".
306
Ensure connection and table character sets are the same. Disabling this check
307
may cause text to be erroneously converted from one character set to another
308
(usually from utf8 to latin1) which may cause data loss or mojibake. Disabling
309
this check may be useful or necessary when character set conversions are
318
Ensure "--source" and "--dest" have same columns.
320
Enabled by default; causes pt-archiver to check that the source and destination
321
tables have the same columns. It does not check column order, data type, etc.
322
It just checks that all columns in the source exist in the destination and
323
vice versa. If there are any differences, pt-archiver will exit with an
326
To disable this check, specify --no-check-columns.
332
type: time; default: 1s
334
How often to check for slave lag if "--check-slave-lag" is given.
342
Pause archiving until the specified DSN's slave lag is less than "--max-lag".
348
short form: -c; type: array
350
Comma-separated list of columns to archive.
352
Specify a comma-separated list of columns to fetch, write to the file, and
353
insert into the destination table. If specified, pt-archiver ignores other
354
columns unless it needs to add them to the \ ``SELECT``\ statement for ascending an
355
index or deleting rows. It fetches and uses these extra columns internally, but
356
does not write them to the file or to the destination table. It \ *does*\ pass
359
See also "--primary-key-only".
365
Commit each set of fetched and archived rows (disables "--txn-size").
367
Commits transactions and flushes "--file" after each set of rows has been
368
archived, before fetching the next set of rows, and before sleeping if
369
"--sleep" is specified. Disables "--txn-size"; use "--limit" to
370
control the transaction size with "--commit-each".
372
This option is useful as a shortcut to make "--limit" and "--txn-size" the
373
same value, but more importantly it avoids transactions being held open while
374
searching for more rows. For example, imagine you are archiving old rows from
375
the beginning of a very large table, with "--limit" 1000 and "--txn-size"
376
1000. After some period of finding and archiving 1000 rows at a time,
377
pt-archiver finds the last 999 rows and archives them, then executes the next
378
SELECT to find more rows. This scans the rest of the table, but never finds any
379
more rows. It has held open a transaction for a very long time, only to
380
determine it is finished anyway. You can use "--commit-each" to avoid this.
388
Read this comma-separated list of config files; if specified, this must be the
389
first option on the command line.
395
Add the DELAYED modifier to INSERT statements.
397
Adds the DELAYED modifier to INSERT or REPLACE statements. See
398
`http://dev.mysql.com/doc/en/insert.html <http://dev.mysql.com/doc/en/insert.html>`_ for details.
406
DSN specifying the table to archive to.
408
This item specifies a table into which pt-archiver will insert rows
409
archived from "--source". It uses the same key=val argument format as
410
"--source". Most missing values default to the same values as
411
"--source", so you don't have to repeat options that are the same in
412
"--source" and "--dest". Use the "--help" option to see which values
413
are copied from "--source".
415
\ **WARNING**\ : Using a default options file (F) DSN option that defines a
416
socket for "--source" causes pt-archiver to connect to "--dest" using
417
that socket unless another socket for "--dest" is specified. This
418
means that pt-archiver may incorrectly connect to "--source" when it
419
connects to "--dest". For example:
424
--source F=host1.cnf,D=db,t=tbl --dest h=host2
427
When pt-archiver connects to "--dest", host2, it will connect via the
428
"--source", host1, socket defined in host1.cnf.
434
Print queries and exit without doing anything.
436
Causes pt-archiver to exit after printing the filename and SQL statements
445
File to archive to, with DATE_FORMAT()-like formatting.
447
Filename to write archived rows to. A subset of MySQL's DATE_FORMAT()
448
formatting codes are allowed in the filename, as follows:
453
%d Day of the month, numeric (01..31)
455
%i Minutes, numeric (00..59)
456
%m Month, numeric (01..12)
458
%Y Year, numeric, four digits
461
You can use the following extra format codes too:
475
--file '/var/log/archive/%Y-%m-%d-%D.%t'
478
The file's contents are in the same format used by SELECT INTO OUTFILE, as
479
documented in the MySQL manual: rows terminated by newlines, columns
480
terminated by tabs, NULL characters are represented by \N, and special
481
characters are escaped by \. This lets you reload a file with LOAD DATA
482
INFILE's default settings.
484
If you want a column header at the top of the file, see "--header". The file
485
is auto-flushed by default; see "--buffer".
491
Adds the FOR UPDATE modifier to SELECT statements.
493
For details, see `http://dev.mysql.com/doc/en/innodb-locking-reads.html <http://dev.mysql.com/doc/en/innodb-locking-reads.html>`_.
499
Print column header at top of "--file".
501
Writes column names as the first line in the file given by "--file". If the
502
file exists, does not write headers; this keeps the file loadable with LOAD
503
DATA INFILE in case you append more output to it.
513
--high-priority-select
515
Adds the HIGH_PRIORITY modifier to SELECT statements.
517
See `http://dev.mysql.com/doc/en/select.html <http://dev.mysql.com/doc/en/select.html>`_ for details.
523
short form: -h; type: string
531
Use IGNORE for INSERT statements.
533
Causes INSERTs into "--dest" to be INSERT IGNORE.
539
type: int; default: 1
541
Number of rows to fetch and archive per statement.
543
Limits the number of rows returned by the SELECT statements that retrieve rows
544
to archive. Default is one row. It may be more efficient to increase the
545
limit, but be careful if you are archiving sparsely, skipping over many rows;
546
this can potentially cause more contention with other queries, depending on the
547
storage engine, transaction isolation level, and options such as
554
Do not write OPTIMIZE or ANALYZE queries to binlog.
556
Adds the NO_WRITE_TO_BINLOG modifier to ANALYZE and OPTIMIZE queries. See
557
"--analyze" for details.
561
--low-priority-delete
563
Adds the LOW_PRIORITY modifier to DELETE statements.
565
See `http://dev.mysql.com/doc/en/delete.html <http://dev.mysql.com/doc/en/delete.html>`_ for details.
569
--low-priority-insert
571
Adds the LOW_PRIORITY modifier to INSERT or REPLACE statements.
573
See `http://dev.mysql.com/doc/en/insert.html <http://dev.mysql.com/doc/en/insert.html>`_ for details.
579
type: time; default: 1s
581
Pause archiving if the slave given by "--check-slave-lag" lags.
583
This option causes pt-archiver to look at the slave every time it's about
584
to fetch another row. If the slave's lag is greater than the option's value,
585
or if the slave isn't running (so its lag is NULL), pt-table-checksum sleeps
586
for "--check-interval" seconds and then looks at the lag again. It repeats
587
until the slave is caught up, then proceeds to fetch and archive the row.
589
This option may eliminate the need for "--sleep" or "--sleep-coef".
595
Do not use ascending index optimization.
597
The default ascending-index optimization causes \ ``pt-archiver``\ to optimize
598
repeated \ ``SELECT``\ queries so they seek into the index where the previous query
599
ended, then scan along it, rather than scanning from the beginning of the table
600
every time. This is enabled by default because it is generally a good strategy
601
for repeated accesses.
603
Large, multiple-column indexes may cause the WHERE clause to be complex enough
604
that this could actually be less efficient. Consider for example a four-column
605
PRIMARY KEY on (a, b, c, d). The WHERE clause to start where the last query
613
OR (a = ? AND b = ? AND c > ?)
614
OR (a = ? AND b = ? AND c = ? AND d >= ?)
617
Populating the placeholders with values uses memory and CPU, adds network
618
traffic and parsing overhead, and may make the query harder for MySQL to
619
optimize. A four-column key isn't a big deal, but a ten-column key in which
620
every column allows \ ``NULL``\ might be.
622
Ascending the index might not be necessary if you know you are simply removing
623
rows from the beginning of the table in chunks, but not leaving any holes, so
624
starting at the beginning of the table is actually the most efficient thing to
627
See also "--ascend-first". See "EXTENDING" for a discussion of how this
628
interacts with plugins.
634
Do not delete archived rows.
636
Causes \ ``pt-archiver``\ not to delete rows after processing them. This disallows
637
"--no-ascend", because enabling them both would cause an infinite loop.
639
If there is a plugin on the source DSN, its \ ``before_delete``\ method is called
640
anyway, even though \ ``pt-archiver``\ will not execute the delete. See
641
"EXTENDING" for more on plugins.
649
Run OPTIMIZE TABLE afterwards on "--source" and/or "--dest".
651
Runs OPTIMIZE TABLE after finishing. See "--analyze" for the option syntax
652
and `http://dev.mysql.com/doc/en/optimize-table.html <http://dev.mysql.com/doc/en/optimize-table.html>`_ for details on OPTIMIZE
659
short form: -p; type: string
661
Password to use when connecting.
669
Create the given PID file when daemonized. The file contains the process ID of
670
the daemonized instance. The PID file is removed when the daemonized instance
671
exits. The program checks for the existence of the PID file when starting; if
672
it exists and the process with the matching PID exists, the program exits.
680
Perl module name to use as a generic plugin.
682
Specify the Perl module name of a general-purpose plugin. It is currently used
683
only for statistics (see "--statistics") and must have \ ``new()``\ and a
684
\ ``statistics()``\ method.
686
The \ ``new( src =``\ $src, dst => $dst, opts => $o )> method gets the source
687
and destination DSNs, and their database connections, just like the
688
connection-specific plugins do. It also gets an OptionParser object (\ ``$o``\ ) for
689
accessing command-line options (example: \ ``$o-``\ get('purge');>).
691
The \ ``statistics(\%stats, $time)``\ method gets a hashref of the statistics
692
collected by the archiving job, and the time the whole job started.
698
short form: -P; type: int
700
Port number to use for connection.
706
Primary key columns only.
708
A shortcut for specifying "--columns" with the primary key columns. This is
709
an efficiency if you just want to purge rows; it avoids fetching the entire row,
710
when only the primary key columns are needed for \ ``DELETE``\ statements. See also
719
Print progress information every X rows.
721
Prints current time, elapsed time, and rows archived every X rows.
727
Purge instead of archiving; allows omitting "--file" and "--dest".
729
Allows archiving without a "--file" or "--dest" argument, which is
730
effectively a purge since the rows are just deleted.
732
If you just want to purge rows, consider specifying the table's primary key
733
columns with "--primary-key-only". This will prevent fetching all columns
734
from the server for no reason.
740
Adds the QUICK modifier to DELETE statements.
742
See `http://dev.mysql.com/doc/en/delete.html <http://dev.mysql.com/doc/en/delete.html>`_ for details. As stated in the
743
documentation, in some cases it may be faster to use DELETE QUICK followed by
744
OPTIMIZE TABLE. You can use "--optimize" for this.
752
Do not print any output, such as for "--statistics".
754
Suppresses normal output, including the output of "--statistics", but doesn't
755
suppress the output from "--why-quit".
761
Causes INSERTs into "--dest" to be written as REPLACE.
767
type: int; default: 1
769
Number of retries per timeout or deadlock.
771
Specifies the number of times pt-archiver should retry when there is an
772
InnoDB lock wait timeout or deadlock. When retries are exhausted,
773
pt-archiver will exit with an error.
775
Consider carefully what you want to happen when you are archiving between a
776
mixture of transactional and non-transactional storage engines. The INSERT to
777
"--dest" and DELETE from "--source" are on separate connections, so they
778
do not actually participate in the same transaction even if they're on the same
779
server. However, pt-archiver implements simple distributed transactions in
780
code, so commits and rollbacks should happen as desired across the two
783
At this time I have not written any code to handle errors with transactional
784
storage engines other than InnoDB. Request that feature if you need it.
792
Time to run before exiting.
794
Optional suffix s=seconds, m=minutes, h=hours, d=days; if no suffix, s is used.
798
--[no]safe-auto-increment
802
Do not archive row with max AUTO_INCREMENT.
804
Adds an extra WHERE clause to prevent pt-archiver from removing the newest
805
row when ascending a single-column AUTO_INCREMENT key. This guards against
806
re-using AUTO_INCREMENT values if the server restarts, and is enabled by
809
The extra WHERE clause contains the maximum value of the auto-increment column
810
as of the beginning of the archive or purge job. If new rows are inserted while
811
pt-archiver is running, it will not see them.
817
type: string; default: /tmp/pt-archiver-sentinel
819
Exit if this file exists.
821
The presence of the file specified by "--sentinel" will cause pt-archiver to
822
stop archiving and exit. The default is /tmp/pt-archiver-sentinel. You
823
might find this handy to stop cron jobs gracefully if necessary. See also
830
type: string; default: wait_timeout=10000
832
Set these MySQL variables.
834
Specify any variables you want to be set immediately after connecting to MySQL.
835
These will be included in a \ ``SET``\ command.
841
Adds the LOCK IN SHARE MODE modifier to SELECT statements.
843
See `http://dev.mysql.com/doc/en/innodb-locking-reads.html <http://dev.mysql.com/doc/en/innodb-locking-reads.html>`_.
847
--skip-foreign-key-checks
849
Disables foreign key checks with SET FOREIGN_KEY_CHECKS=0.
857
Sleep time between fetches.
859
Specifies how long to sleep between SELECT statements. Default is not to
860
sleep at all. Transactions are NOT committed, and the "--file" file is NOT
861
flushed, before sleeping. See "--txn-size" to control that.
863
If "--commit-each" is specified, committing and flushing happens before
872
Calculate "--sleep" as a multiple of the last SELECT time.
874
If this option is specified, pt-archiver will sleep for the query time of the
875
last SELECT multiplied by the specified coefficient.
877
This is a slightly more sophisticated way to throttle the SELECTs: sleep a
878
varying amount of time between each SELECT, depending on how long the SELECTs
885
short form: -S; type: string
887
Socket file to use for connection.
895
DSN specifying the table to archive from (required). This argument is a DSN.
896
See DSN OPTIONS for the syntax. Most options control how pt-archiver
897
connects to MySQL, but there are some extended DSN options in this tool's
898
syntax. The D, t, and i options select a table to archive:
903
--source h=my_server,D=my_database,t=my_tbl
906
The a option specifies the database to set as the connection's default with USE.
907
If the b option is true, it disables binary logging with SQL_LOG_BIN. The m
908
option specifies pluggable actions, which an external Perl module can provide.
909
The only required part is the table; other parts may be read from various
910
places in the environment (such as options files).
912
The 'i' part deserves special mention. This tells pt-archiver which index
913
it should scan to archive. This appears in a FORCE INDEX or USE INDEX hint in
914
the SELECT statements used to fetch archivable rows. If you don't specify
915
anything, pt-archiver will auto-discover a good index, preferring a \ ``PRIMARY
916
KEY``\ if one exists. In my experience this usually works well, so most of the
917
time you can probably just omit the 'i' part.
919
The index is used to optimize repeated accesses to the table; pt-archiver
920
remembers the last row it retrieves from each SELECT statement, and uses it to
921
construct a WHERE clause, using the columns in the specified index, that should
922
allow MySQL to start the next SELECT where the last one ended, rather than
923
potentially scanning from the beginning of the table with each successive
924
SELECT. If you are using external plugins, please see "EXTENDING" for a
925
discussion of how they interact with ascending indexes.
927
The 'a' and 'b' options allow you to control how statements flow through the
928
binary log. If you specify the 'b' option, binary logging will be disabled on
929
the specified connection. If you specify the 'a' option, the connection will
930
\ ``USE``\ the specified database, which you can use to prevent slaves from
931
executing the binary log events with \ ``--replicate-ignore-db``\ options. These
932
two options can be used as different methods to achieve the same goal: archive
933
data off the master, but leave it on the slave. For example, you can run a
934
purge job on the master and prevent it from happening on the slave using your
937
\ **WARNING**\ : Using a default options file (F) DSN option that defines a
938
socket for "--source" causes pt-archiver to connect to "--dest" using
939
that socket unless another socket for "--dest" is specified. This
940
means that pt-archiver may incorrectly connect to "--source" when it
941
is meant to connect to "--dest". For example:
946
--source F=host1.cnf,D=db,t=tbl --dest h=host2
949
When pt-archiver connects to "--dest", host2, it will connect via the
950
"--source", host1, socket defined in host1.cnf.
956
Collect and print timing statistics.
958
Causes pt-archiver to collect timing statistics about what it does. These
959
statistics are available to the plugin specified by "--plugin"
961
Unless you specify "--quiet", \ ``pt-archiver``\ prints the statistics when it
962
exits. The statistics look like this:
967
Started at 2008-07-18T07:18:53, ended at 2008-07-18T07:18:53
972
Action Count Time Pct
973
commit 10 0.1079 88.27
975
deleting 4 0.0028 2.29
976
inserting 4 0.0028 2.28
980
The first two (or three) lines show times and the source and destination tables.
981
The next three lines show how many rows were fetched, inserted, and deleted.
983
The remaining lines show counts and timing. The columns are the action, the
984
total number of times that action was timed, the total time it took, and the
985
percent of the program's total runtime. The rows are sorted in order of
986
descending total time. The last row is the rest of the time not explicitly
987
attributed to anything. Actions will vary depending on command-line options.
989
If "--why-quit" is given, its behavior is changed slightly. This option
990
causes it to print the reason for exiting even when it's just because there are
993
This option requires the standard Time::HiRes module, which is part of core Perl
994
on reasonably new Perl releases.
1000
Stop running instances by creating the sentinel file.
1002
Causes pt-archiver to create the sentinel file specified by "--sentinel" and
1003
exit. This should have the effect of stopping all running instances which are
1004
watching the same sentinel file.
1010
type: int; default: 1
1012
Number of rows per transaction.
1014
Specifies the size, in number of rows, of each transaction. Zero disables
1015
transactions altogether. After pt-archiver processes this many rows, it
1016
commits both the "--source" and the "--dest" if given, and flushes the
1017
file given by "--file".
1019
This parameter is critical to performance. If you are archiving from a live
1020
server, which for example is doing heavy OLTP work, you need to choose a good
1021
balance between transaction size and commit overhead. Larger transactions
1022
create the possibility of more lock contention and deadlocks, but smaller
1023
transactions cause more frequent commit overhead, which can be significant. To
1024
give an idea, on a small test set I worked with while writing pt-archiver, a
1025
value of 500 caused archiving to take about 2 seconds per 1000 rows on an
1026
otherwise quiet MySQL instance on my desktop machine, archiving to disk and to
1027
another table. Disabling transactions with a value of zero, which turns on
1028
autocommit, dropped performance to 38 seconds per thousand rows.
1030
If you are not archiving from or to a transactional storage engine, you may
1031
want to disable transactions so pt-archiver doesn't try to commit.
1037
short form: -u; type: string
1039
User for login if not current user.
1045
Show version and exit.
1053
WHERE clause to limit which rows to archive (required).
1055
Specifies a WHERE clause to limit which rows are archived. Do not include the
1056
word WHERE. You may need to quote the argument to prevent your shell from
1057
interpreting it. For example:
1060
.. code-block:: perl
1062
--where 'ts < current_date - interval 90 day'
1065
For safety, "--where" is required. If you do not require a WHERE clause, use
1072
Print reason for exiting unless rows exhausted.
1074
Causes pt-archiver to print a message if it exits for any reason other than
1075
running out of rows to archive. This can be useful if you have a cron job with
1076
"--run-time" specified, for example, and you want to be sure pt-archiver is
1077
finishing before running out of time.
1079
If "--statistics" is given, the behavior is changed slightly. It will print
1080
the reason for exiting even when it's just because there are no more rows.
1082
This output prints even if "--quiet" is given. That's so you can put
1083
\ ``pt-archiver``\ in a \ ``cron``\ job and get an email if there's an abnormal exit.
1093
These DSN options are used to create a DSN. Each option is given like
1094
\ ``option=value``\ . The options are case-sensitive, so P and p are not the
1095
same option. There cannot be whitespace before or after the \ ``=``\ and
1096
if the value contains whitespace it must be quoted. DSN options are
1097
comma-separated. See the percona-toolkit manpage for full details.
1104
Database to USE when executing queries.
1110
dsn: charset; copy: yes
1112
Default character set.
1120
If true, disable binlog with SQL_LOG_BIN.
1126
dsn: database; copy: yes
1128
Database that contains the table.
1134
dsn: mysql_read_default_file; copy: yes
1136
Only read default options from the given file
1142
dsn: host; copy: yes
1166
dsn: password; copy: yes
1168
Password to use when connecting.
1174
dsn: port; copy: yes
1176
Port number to use for connection.
1182
dsn: mysql_socket; copy: yes
1184
Socket file to use for connection.
1192
Table to archive from/to.
1198
dsn: user; copy: yes
1200
User for login if not current user.
1210
pt-archiver is extensible by plugging in external Perl modules to handle some
1211
logic and/or actions. You can specify a module for both the "--source" and
1212
the "--dest", with the 'm' part of the specification. For example:
1215
.. code-block:: perl
1217
--source D=test,t=test1,m=My::Module1 --dest m=My::Module2,t=test2
1220
This will cause pt-archiver to load the My::Module1 and My::Module2 packages,
1221
create instances of them, and then make calls to them during the archiving
1224
You can also specify a plugin with "--plugin".
1226
The module must provide this interface:
1229
new(dbh => $dbh, db => $db_name, tbl => $tbl_name)
1231
The plugin's constructor is passed a reference to the database handle, the
1232
database name, and table name. The plugin is created just after pt-archiver
1233
opens the connection, and before it examines the table given in the arguments.
1234
This gives the plugin a chance to create and populate temporary tables, or do
1239
before_begin(cols => \@cols, allcols => \@allcols)
1241
This method is called just before pt-archiver begins iterating through rows
1242
and archiving them, but after it does all other setup work (examining table
1243
structures, designing SQL queries, and so on). This is the only time
1244
pt-archiver tells the plugin column names for the rows it will pass the
1245
plugin while archiving.
1247
The \ ``cols``\ argument is the column names the user requested to be archived,
1248
either by default or by the "--columns" option. The \ ``allcols``\ argument is
1249
the list of column names for every row pt-archiver will fetch from the source
1250
table. It may fetch more columns than the user requested, because it needs some
1251
columns for its own use. When subsequent plugin functions receive a row, it is
1252
the full row containing all the extra columns, if any, added to the end.
1256
is_archivable(row => \@row)
1258
This method is called for each row to determine whether it is archivable. This
1259
applies only to "--source". The argument is the row itself, as an arrayref.
1260
If the method returns true, the row will be archived; otherwise it will be
1263
Skipping a row adds complications for non-unique indexes. Normally
1264
pt-archiver uses a WHERE clause designed to target the last processed row as
1265
the place to start the scan for the next SELECT statement. If you have skipped
1266
the row by returning false from is_archivable(), pt-archiver could get into
1267
an infinite loop because the row still exists. Therefore, when you specify a
1268
plugin for the "--source" argument, pt-archiver will change its WHERE clause
1269
slightly. Instead of starting at "greater than or equal to" the last processed
1270
row, it will start "strictly greater than." This will work fine on unique
1271
indexes such as primary keys, but it may skip rows (leave holes) on non-unique
1272
indexes or when ascending only the first column of an index.
1274
\ ``pt-archiver``\ will change the clause in the same way if you specify
1275
"--no-delete", because again an infinite loop is possible.
1277
If you specify the "--bulk-delete" option and return false from this method,
1278
\ ``pt-archiver``\ may not do what you want. The row won't be archived, but it will
1279
be deleted, since bulk deletes operate on ranges of rows and don't know which
1280
rows the plugin selected to keep.
1282
If you specify the "--bulk-insert" option, this method's return value will
1283
influence whether the row is written to the temporary file for the bulk insert,
1284
so bulk inserts will work as expected. However, bulk inserts require bulk
1289
before_delete(row => \@row)
1291
This method is called for each row just before it is deleted. This applies only
1292
to "--source". This is a good place for you to handle dependencies, such as
1293
deleting things that are foreign-keyed to the row you are about to delete. You
1294
could also use this to recursively archive all dependent tables.
1296
This plugin method is called even if "--no-delete" is given, but not if
1297
"--bulk-delete" is given.
1301
before_bulk_delete(first_row => \@row, last_row => \@row)
1303
This method is called just before a bulk delete is executed. It is similar to
1304
the \ ``before_delete``\ method, except its arguments are the first and last row of
1305
the range to be deleted. It is called even if "--no-delete" is given.
1309
before_insert(row => \@row)
1311
This method is called for each row just before it is inserted. This applies
1312
only to "--dest". You could use this to insert the row into multiple tables,
1313
perhaps with an ON DUPLICATE KEY UPDATE clause to build summary tables in a data
1316
This method is not called if "--bulk-insert" is given.
1320
before_bulk_insert(first_row => \@row, last_row => \@row, filename => bulk_insert_filename)
1322
This method is called just before a bulk insert is executed. It is similar to
1323
the \ ``before_insert``\ method, except its arguments are the first and last row of
1324
the range to be deleted.
1328
custom_sth(row => \@row, sql => $sql)
1330
This method is called just before inserting the row, but after
1331
"before_insert()". It allows the plugin to specify different \ ``INSERT``\
1332
statement if desired. The return value (if any) should be a DBI statement
1333
handle. The \ ``sql``\ parameter is the SQL text used to prepare the default
1334
\ ``INSERT``\ statement. This method is not called if you specify
1337
If no value is returned, the default \ ``INSERT``\ statement handle is used.
1339
This method applies only to the plugin specified for "--dest", so if your
1340
plugin isn't doing what you expect, check that you've specified it for the
1341
destination and not the source.
1345
custom_sth_bulk(first_row => \@row, last_row => \@row, sql => $sql, filename => $bulk_insert_filename)
1347
If you've specified "--bulk-insert", this method is called just before the
1348
bulk insert, but after "before_bulk_insert()", and the arguments are
1351
This method's return value etc is similar to the "custom_sth()" method.
1357
This method is called after pt-archiver exits the archiving loop, commits all
1358
database handles, closes "--file", and prints the final statistics, but
1359
before pt-archiver runs ANALYZE or OPTIMIZE (see "--analyze" and
1364
If you specify a plugin for both "--source" and "--dest", pt-archiver
1365
constructs, calls before_begin(), and calls after_finish() on the two plugins in
1366
the order "--source", "--dest".
1368
pt-archiver assumes it controls transactions, and that the plugin will NOT
1369
commit or roll back the database handle. The database handle passed to the
1370
plugin's constructor is the same handle pt-archiver uses itself. Remember
1371
that "--source" and "--dest" are separate handles.
1373
A sample module might look like this:
1376
.. code-block:: perl
1381
my ( $class, %args ) = @_;
1382
return bless(\%args, $class);
1386
my ( $self, %args ) = @_;
1387
# Save column names for later
1388
$self->{cols} = $args{cols};
1392
my ( $self, %args ) = @_;
1393
# Do some advanced logic with $args{row}
1397
sub before_delete {} # Take no action
1398
sub before_insert {} # Take no action
1399
sub custom_sth {} # Take no action
1400
sub after_finish {} # Take no action
1411
Visit `http://www.percona.com/software/ <http://www.percona.com/software/>`_ to download the latest release of
1412
Percona Toolkit. Or, to get the latest release from the command line:
1415
.. code-block:: perl
1417
wget percona.com/latest/percona-toolkit/PKG
1420
Replace \ ``PKG``\ with \ ``tar``\ , \ ``rpm``\ , or \ ``deb``\ to download the package in that
1421
format. You can also get individual tools from the latest release:
1424
.. code-block:: perl
1426
wget percona.com/latest/percona-toolkit/TOOL
1429
Replace \ ``TOOL``\ with the name of any tool.
1437
The environment variable \ ``PTDEBUG``\ enables verbose debugging output to STDERR.
1438
To enable debugging and capture all output to a file, run the tool like:
1441
.. code-block:: perl
1443
PTDEBUG=1 pt-archiver ... > FILE 2>&1
1446
Be careful: debugging output is voluminous and can generate several megabytes
1455
You need Perl, DBI, DBD::mysql, and some core packages that ought to be
1456
installed in any reasonably new version of Perl.
1464
For a list of known bugs, see `http://www.percona.com/bugs/pt-archiver <http://www.percona.com/bugs/pt-archiver>`_.
1466
Please report bugs at `https://bugs.launchpad.net/percona-toolkit <https://bugs.launchpad.net/percona-toolkit>`_.
1467
Include the following information in your bug report:
1470
\* Complete command-line used to run the tool
1478
\* MySQL version of all servers involved
1482
\* Output from the tool including STDERR
1486
\* Input files (log/dump/config files, etc.)
1490
If possible, include debugging output by running the tool with \ ``PTDEBUG``\ ;
1510
*********************
1511
ABOUT PERCONA TOOLKIT
1512
*********************
1515
This tool is part of Percona Toolkit, a collection of advanced command-line
1516
tools developed by Percona for MySQL support and consulting. Percona Toolkit
1517
was forked from two projects in June, 2011: Maatkit and Aspersa. Those
1518
projects were created by Baron Schwartz and developed primarily by him and
1519
Daniel Nichter, both of whom are employed by Percona. Visit
1520
`http://www.percona.com/software/ <http://www.percona.com/software/>`_ for more software developed by Percona.
1523
********************************
1524
COPYRIGHT, LICENSE, AND WARRANTY
1525
********************************
1528
This program is copyright 2007-2011 Baron Schwartz, 2011 Percona Inc.
1529
Feedback and improvements are welcome.
1531
THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
1532
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
1533
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
1535
This program is free software; you can redistribute it and/or modify it under
1536
the terms of the GNU General Public License as published by the Free Software
1537
Foundation, version 2; OR the Perl Artistic License. On UNIX and similar
1538
systems, you can issue \`man perlgpl' or \`man perlartistic' to read these
1541
You should have received a copy of the GNU General Public License along with
1542
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
1543
Place, Suite 330, Boston, MA 02111-1307 USA.
1551
Percona Toolkit v1.0.0 released 2011-08-01