~percona-toolkit-dev/percona-toolkit/docu-ptc-rbr-limitation


Viewing changes to docs/user/pt-table-sync.rst

Committer: Daniel Nichter
Date: 2011-07-14 19:08:47 UTC
Revision ID: daniel@percona.com-20110714190847-lggalkuvdrh7c4jp

Add standard pkg files (COPYING, README, etc.), percona-toolkit.pod, and user docs. Remove dev/docs/html.

files added:
COPYING

Changelog

INSTALL

Makefile.PL

README

config/sphinx-build

config/sphinx-build/_static

config/sphinx-build/_templates

config/sphinx-build/conf.py

docs/percona-toolkit.pod

docs/user/index.rst

docs/user/pt-align.rst

docs/user/pt-archiver.rst

docs/user/pt-checksum-filter.rst

docs/user/pt-collect.rst

docs/user/pt-config-diff.rst

docs/user/pt-deadlock-logger.rst

docs/user/pt-diskstats.rst

docs/user/pt-duplicate-key-checker.rst

docs/user/pt-fifo-split.rst

docs/user/pt-find.rst

docs/user/pt-fk-error-logger.rst

docs/user/pt-heartbeat.rst

docs/user/pt-index-usage.rst

docs/user/pt-kill.rst

docs/user/pt-log-player.rst

docs/user/pt-mext.rst

docs/user/pt-mysql-summary.rst

docs/user/pt-online-schema-change.rst

docs/user/pt-pmp.rst

docs/user/pt-profile-compact.rst

docs/user/pt-query-advisor.rst

docs/user/pt-query-digest.rst

docs/user/pt-query-profiler.rst

docs/user/pt-rel.rst

docs/user/pt-show-grants.rst

docs/user/pt-sift.rst

docs/user/pt-slave-delay.rst

docs/user/pt-slave-find.rst

docs/user/pt-slave-restart.rst

docs/user/pt-stalk.rst

docs/user/pt-summary.rst

docs/user/pt-table-checksum.rst

docs/user/pt-table-sync.rst

docs/user/pt-tcp-model.rst

docs/user/pt-trend.rst

docs/user/pt-upgrade.rst

docs/user/pt-usl.rst

docs/user/pt-variable-advisor.rst

docs/user/pt-visual-explain.rst

docs/user/tools.rst

files removed:
docs/dev/html

docs/dev/html/files

docs/dev/html/files/modules

docs/dev/html/files/modules/Advisor-pm.html

docs/dev/html/files/modules/AdvisorRules-pm.html

docs/dev/html/files/modules/BinaryLogParser-pm.html

docs/dev/html/files/modules/ChangeHandler-pm.html

docs/dev/html/files/modules/CompareQueryTimes-pm.html

docs/dev/html/files/modules/CompareResults-pm.html

docs/dev/html/files/modules/CompareTableStructs-pm.html

docs/dev/html/files/modules/CompareWarnings-pm.html

docs/dev/html/files/modules/CopyRowsInsertSelect-pm.html

docs/dev/html/files/modules/DSNParser-pm.html

docs/dev/html/files/modules/Daemon-pm.html

docs/dev/html/files/modules/DuplicateKeyFinder-pm.html

docs/dev/html/files/modules/EventAggregator-pm.html

docs/dev/html/files/modules/EventTimeline-pm.html

docs/dev/html/files/modules/ExecutionThrottler-pm.html

docs/dev/html/files/modules/ExplainAnalyzer-pm.html

docs/dev/html/files/modules/FileIterator-pm.html

docs/dev/html/files/modules/ForeignKeyIterator-pm.html

docs/dev/html/files/modules/GeneralLogParser-pm.html

docs/dev/html/files/modules/HTTPProtocolParser-pm.html

docs/dev/html/files/modules/IndexUsage-pm.html

docs/dev/html/files/modules/InnoDBStatusParser-pm.html

docs/dev/html/files/modules/KeySize-pm.html

docs/dev/html/files/modules/LogSplitter-pm.html

docs/dev/html/files/modules/MaatkitTest-pm.html

docs/dev/html/files/modules/MasterSlave-pm.html

docs/dev/html/files/modules/MemcachedEvent-pm.html

docs/dev/html/files/modules/MemcachedProtocolParser-pm.html

docs/dev/html/files/modules/MockSth-pm.html

docs/dev/html/files/modules/MockSync-pm.html

docs/dev/html/files/modules/MockSyncStream-pm.html

docs/dev/html/files/modules/MySQLConfig-pm.html

docs/dev/html/files/modules/MySQLConfigComparer-pm.html

docs/dev/html/files/modules/MySQLDump-pm.html

docs/dev/html/files/modules/MySQLProtocolParser-pm.html

docs/dev/html/files/modules/OSCCaptureSync-pm.html

docs/dev/html/files/modules/OptionParser-pm.html

docs/dev/html/files/modules/Outfile-pm.html

docs/dev/html/files/modules/PgLogParser-pm.html

docs/dev/html/files/modules/Pipeline-pm.html

docs/dev/html/files/modules/PodParser-pm.html

docs/dev/html/files/modules/Processlist-pm.html

docs/dev/html/files/modules/ProcesslistAggregator-pm.html

docs/dev/html/files/modules/Progress-pm.html

docs/dev/html/files/modules/ProtocolParser-pm.html

docs/dev/html/files/modules/QueryAdvisorRules-pm.html

docs/dev/html/files/modules/QueryParser-pm.html

docs/dev/html/files/modules/QueryReportFormatter-pm.html

docs/dev/html/files/modules/QueryReview-pm.html

docs/dev/html/files/modules/QueryRewriter-pm.html

docs/dev/html/files/modules/Quoter-pm.html

docs/dev/html/files/modules/ReportFormatter-pm.html

docs/dev/html/files/modules/Retry-pm.html

docs/dev/html/files/modules/RowDiff-pm.html

docs/dev/html/files/modules/Runtime-pm.html

docs/dev/html/files/modules/SQLParser-pm.html

docs/dev/html/files/modules/Sandbox-pm.html

docs/dev/html/files/modules/Schema-pm.html

docs/dev/html/files/modules/SchemaIterator-pm.html

docs/dev/html/files/modules/SimpleTCPDumpParser-pm.html

docs/dev/html/files/modules/SlowLogParser-pm.html

docs/dev/html/files/modules/SlowLogWriter-pm.html

docs/dev/html/files/modules/SysLogParser-pm.html

docs/dev/html/files/modules/TCPRequestAggregator-pm.html

docs/dev/html/files/modules/TableChecksum-pm.html

docs/dev/html/files/modules/TableChunker-pm.html

docs/dev/html/files/modules/TableNibbler-pm.html

docs/dev/html/files/modules/TableParser-pm.html

docs/dev/html/files/modules/TableSyncChunk-pm.html

docs/dev/html/files/modules/TableSyncGroupBy-pm.html

docs/dev/html/files/modules/TableSyncNibble-pm.html

docs/dev/html/files/modules/TableSyncStream-pm.html

docs/dev/html/files/modules/TableSyncer-pm.html

docs/dev/html/files/modules/TableUsage-pm.html

docs/dev/html/files/modules/TcpdumpParser-pm.html

docs/dev/html/files/modules/TextResultSetParser-pm.html

docs/dev/html/files/modules/TimeSeriesTrender-pm.html

docs/dev/html/files/modules/Transformers-pm.html

docs/dev/html/files/modules/UpgradeReportFormatter-pm.html

docs/dev/html/files/modules/VariableAdvisorRules-pm.html

docs/dev/html/files/modules/VersionParser-pm.html

docs/dev/html/files/tools

docs/dev/html/files/tools/pt-archiver-pm.html

docs/dev/html/files/tools/pt-config-diff-pm.html

docs/dev/html/files/tools/pt-deadlock-logger-pm.html

docs/dev/html/files/tools/pt-duplicate-key-checker-pm.html

docs/dev/html/files/tools/pt-fifo-split-pm.html

docs/dev/html/files/tools/pt-find-pm.html

docs/dev/html/files/tools/pt-fk-error-logger-pm.html

docs/dev/html/files/tools/pt-heartbeat-pm.html

docs/dev/html/files/tools/pt-index-usage-pm.html

docs/dev/html/files/tools/pt-kill-pm.html

docs/dev/html/files/tools/pt-log-player-pm.html

docs/dev/html/files/tools/pt-online-schema-change-pm.html

docs/dev/html/files/tools/pt-profile-compact-pm.html

docs/dev/html/files/tools/pt-query-advisor-pm.html

docs/dev/html/files/tools/pt-query-digest-pm.html

docs/dev/html/files/tools/pt-query-profiler-pm.html

docs/dev/html/files/tools/pt-schema-advisor-pm.html

docs/dev/html/files/tools/pt-show-grants-pm.html

docs/dev/html/files/tools/pt-slave-delay-pm.html

docs/dev/html/files/tools/pt-slave-find-pm.html

docs/dev/html/files/tools/pt-slave-restart-pm.html

docs/dev/html/files/tools/pt-table-checksum-pm.html

docs/dev/html/files/tools/pt-table-sync-pm.html

docs/dev/html/files/tools/pt-table-usage-pm.html

docs/dev/html/files/tools/pt-tcp-model-pm.html

docs/dev/html/files/tools/pt-trend-pm.html

docs/dev/html/files/tools/pt-upgrade-pm.html

docs/dev/html/files/tools/pt-variable-advisor-pm.html

docs/dev/html/files/tools/pt-visual-explain-pm.html

docs/dev/html/index

docs/dev/html/index.html

docs/dev/html/index/Classes.html

docs/dev/html/index/Functions.html

docs/dev/html/index/Functions10.html

docs/dev/html/index/Functions11.html

docs/dev/html/index/Functions2.html

docs/dev/html/index/Functions3.html

docs/dev/html/index/Functions4.html

docs/dev/html/index/Functions5.html

docs/dev/html/index/Functions6.html

docs/dev/html/index/Functions7.html

docs/dev/html/index/Functions8.html

docs/dev/html/index/Functions9.html

docs/dev/html/index/General.html

docs/dev/html/index/General10.html

docs/dev/html/index/General11.html

docs/dev/html/index/General12.html

docs/dev/html/index/General13.html

docs/dev/html/index/General14.html

docs/dev/html/index/General15.html

docs/dev/html/index/General2.html

docs/dev/html/index/General3.html

docs/dev/html/index/General4.html

docs/dev/html/index/General5.html

docs/dev/html/index/General6.html

docs/dev/html/index/General7.html

docs/dev/html/index/General8.html

docs/dev/html/index/General9.html

docs/dev/html/index/Variables.html

docs/dev/html/index/Variables2.html

docs/dev/html/javascript

docs/dev/html/javascript/main.js

docs/dev/html/javascript/prettify.js

docs/dev/html/javascript/searchdata.js

docs/dev/html/search

docs/dev/html/search/ClassesA.html

docs/dev/html/search/ClassesB.html

docs/dev/html/search/ClassesC.html

docs/dev/html/search/ClassesD.html

docs/dev/html/search/ClassesE.html

docs/dev/html/search/ClassesF.html

docs/dev/html/search/ClassesG.html

docs/dev/html/search/ClassesH.html

docs/dev/html/search/ClassesI.html

docs/dev/html/search/ClassesK.html

docs/dev/html/search/ClassesL.html

docs/dev/html/search/ClassesM.html

docs/dev/html/search/ClassesO.html

docs/dev/html/search/ClassesP.html

docs/dev/html/search/ClassesQ.html

docs/dev/html/search/ClassesR.html

docs/dev/html/search/ClassesS.html

docs/dev/html/search/ClassesT.html

docs/dev/html/search/ClassesU.html

docs/dev/html/search/ClassesV.html

docs/dev/html/search/FunctionsA.html

docs/dev/html/search/FunctionsB.html

docs/dev/html/search/FunctionsC.html

docs/dev/html/search/FunctionsD.html

docs/dev/html/search/FunctionsE.html

docs/dev/html/search/FunctionsF.html

docs/dev/html/search/FunctionsG.html

docs/dev/html/search/FunctionsH.html

docs/dev/html/search/FunctionsI.html

docs/dev/html/search/FunctionsJ.html

docs/dev/html/search/FunctionsK.html

docs/dev/html/search/FunctionsL.html

docs/dev/html/search/FunctionsM.html

docs/dev/html/search/FunctionsN.html

docs/dev/html/search/FunctionsO.html

docs/dev/html/search/FunctionsP.html

docs/dev/html/search/FunctionsQ.html

docs/dev/html/search/FunctionsR.html

docs/dev/html/search/FunctionsS.html

docs/dev/html/search/FunctionsSymbols.html

docs/dev/html/search/FunctionsT.html

docs/dev/html/search/FunctionsU.html

docs/dev/html/search/FunctionsV.html

docs/dev/html/search/FunctionsW.html

docs/dev/html/search/GeneralA.html

docs/dev/html/search/GeneralB.html

docs/dev/html/search/GeneralC.html

docs/dev/html/search/GeneralD.html

docs/dev/html/search/GeneralE.html

docs/dev/html/search/GeneralF.html

docs/dev/html/search/GeneralG.html

docs/dev/html/search/GeneralH.html

docs/dev/html/search/GeneralI.html

docs/dev/html/search/GeneralJ.html

docs/dev/html/search/GeneralK.html

docs/dev/html/search/GeneralL.html

docs/dev/html/search/GeneralM.html

docs/dev/html/search/GeneralN.html

docs/dev/html/search/GeneralO.html

docs/dev/html/search/GeneralP.html

docs/dev/html/search/GeneralQ.html

docs/dev/html/search/GeneralR.html

docs/dev/html/search/GeneralS.html

docs/dev/html/search/GeneralSymbols.html

docs/dev/html/search/GeneralT.html

docs/dev/html/search/GeneralU.html

docs/dev/html/search/GeneralV.html

docs/dev/html/search/GeneralW.html

docs/dev/html/search/NoResults.html

docs/dev/html/search/VariablesA.html

docs/dev/html/search/VariablesB.html

docs/dev/html/search/VariablesC.html

docs/dev/html/search/VariablesD.html

docs/dev/html/search/VariablesE.html

docs/dev/html/search/VariablesF.html

docs/dev/html/search/VariablesG.html

docs/dev/html/search/VariablesH.html

docs/dev/html/search/VariablesI.html

docs/dev/html/search/VariablesL.html

docs/dev/html/search/VariablesM.html

docs/dev/html/search/VariablesN.html

docs/dev/html/search/VariablesO.html

docs/dev/html/search/VariablesP.html

docs/dev/html/search/VariablesQ.html

docs/dev/html/search/VariablesR.html

docs/dev/html/search/VariablesS.html

docs/dev/html/search/VariablesT.html

docs/dev/html/search/VariablesU.html

docs/dev/html/search/VariablesV.html

docs/dev/html/search/VariablesW.html

docs/dev/html/styles

docs/dev/html/styles/main.css

files modified:
.bzrignore

bin/pt-log-player

util/write-user-docs


docs/user/pt-table-sync.rst

#############
pt-table-sync
#############

.. highlight:: perl


****
NAME
****


pt-table-sync - Synchronize MySQL table data efficiently.


********
SYNOPSIS
********


Usage: pt-table-sync [OPTION...] DSN [DSN...]

pt-table-sync synchronizes data efficiently between MySQL tables.

This tool changes data, so for maximum safety, you should back up your data
before you use it. When synchronizing a server that is a replication slave with
the --replicate or --sync-to-master methods, it **always** makes the changes on
the replication master, **never** the replication slave directly. This is in
general the only safe way to bring a replica back in sync with its master;
changes to the replica are usually the source of the problems in the first
place. However, the changes it makes on the master should be no-op changes that
set the data to their current values, and actually affect only the replica.
Please read the detailed documentation that follows to learn more about this.

Sync db.tbl on host1 to host2:


.. code-block:: perl

   pt-table-sync --execute h=host1,D=db,t=tbl h=host2


Sync all tables on host1 to host2 and host3:


.. code-block:: perl

   pt-table-sync --execute host1 host2 host3


Make slave1 have the same data as its replication master:


.. code-block:: perl

   pt-table-sync --execute --sync-to-master slave1


Resolve differences that pt-table-checksum found on all slaves of master1:


.. code-block:: perl

   pt-table-sync --execute --replicate test.checksum master1


Same as above but only resolve differences on slave1:


.. code-block:: perl

   pt-table-sync --execute --replicate test.checksum \
      --sync-to-master slave1


Sync master2 in a master-master replication configuration, where master2's copy
of db.tbl is known or suspected to be incorrect:


.. code-block:: perl

   pt-table-sync --execute --sync-to-master h=master2,D=db,t=tbl


Note that in the master-master configuration, the following will NOT do what you
want, because it will make changes directly on master2, which will then flow
through replication and change master1's data:


.. code-block:: perl

   # Don't do this in a master-master setup!
   pt-table-sync --execute h=master1,D=db,t=tbl master2

*****
RISKS
*****


The following section is included to inform users about the potential risks,
whether known or unknown, of using this tool. The two main categories of risks
are those created by the nature of the tool (e.g. read-only tools vs. read-write
tools) and those created by bugs.

With great power comes great responsibility! This tool changes data, so it is a
good idea to back up your data. It is also very powerful, which means it is
very complex, so you should run it with the "--dry-run" option to see what it
will do, until you're familiar with its operation. If you want to see which
rows are different, without changing any data, use "--print" instead of
"--execute".

Be careful when using pt-table-sync in any master-master setup. Master-master
replication is inherently tricky, and it's easy to make mistakes. You need to
be sure you're using the tool correctly for master-master replication. See the
"SYNOPSIS" for the overview of the correct usage.

Also be careful with tables that have foreign key constraints with ``ON DELETE``
or ``ON UPDATE`` definitions because these might cause unintended changes on the
child tables.

In general, this tool is best suited when your tables have a primary key or
unique index. Although it can synchronize data in tables lacking a primary key
or unique index, it might be best to synchronize that data by another means.

At the time of this release, there is a potential bug using
"--lock-and-rename" with MySQL 5.1, a bug detecting certain differences,
a bug using ROUND() across different platforms, and a bug mixing collations.

The authoritative source for updated information is always the online issue
tracking system. Issues that affect this tool will be marked as such. You can
see a list of such issues at the following URL:
`http://www.percona.com/bugs/pt-table-sync <http://www.percona.com/bugs/pt-table-sync>`_.

See also "BUGS" for more information on filing bugs and getting help.


***********
DESCRIPTION
***********


pt-table-sync does one-way and bidirectional synchronization of table data.
It does **not** synchronize table structures, indexes, or any other schema
objects. The following describes one-way synchronization.
"BIDIRECTIONAL SYNCING" is described later.

This tool is complex and functions in several different ways. To use it
safely and effectively, you should understand three things: the purpose
of "--replicate", finding differences, and specifying hosts. These
three concepts are closely related and determine how the tool will run.
The following is the abbreviated logic:


.. code-block:: perl

   if DSN has a t part, sync only that table:
      if 1 DSN:
         if --sync-to-master:
            The DSN is a slave. Connect to its master and sync.
      if more than 1 DSN:
         The first DSN is the source. Sync each DSN in turn.
   else if --replicate:
      if --sync-to-master:
         The DSN is a slave. Connect to its master, find records
         of differences, and fix.
      else:
         The DSN is the master. Find slaves and connect to each,
         find records of differences, and fix.
   else:
      if only 1 DSN and --sync-to-master:
         The DSN is a slave. Connect to its master, find tables and
         filter with --databases etc, and sync each table to the master.
      else:
         find tables, filtering with --databases etc, and sync each
         DSN to the first.
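
The abbreviated logic above can be read as a small decision function. The following is only a sketch of that reading, not the tool's actual code: ``plan_sync`` and its arguments are hypothetical names, it assumes the "t part" test applies to the first DSN, and it returns a description instead of syncing anything.

```python
def plan_sync(dsns, sync_to_master=False, replicate=False):
    """Describe the branch of the abbreviated logic that applies.

    dsns is a list of dicts such as {"h": "host1", "D": "db", "t": "tbl"}.
    This is an illustration only; it never connects to a server.
    """
    if not dsns:
        raise ValueError("at least one DSN is required")
    if "t" in dsns[0]:
        # A t part restricts the sync to that one table.
        if len(dsns) > 1:
            return "one table: first DSN is the source, sync each other DSN in turn"
        if sync_to_master:
            return "one table: DSN is a slave, connect to its master and sync"
        # The abbreviated logic does not spell out this combination.
        return "not covered by the abbreviated logic"
    if replicate:
        if sync_to_master:
            return "DSN is a slave: fix recorded differences through its master"
        return "DSN is the master: fix recorded differences on each slave"
    if len(dsns) == 1 and sync_to_master:
        return "DSN is a slave: sync each filtered table to its master"
    return "all filtered tables: sync each DSN to the first"
```

For instance, ``plan_sync([{"h": "master1"}], replicate=True)`` lands in the branch where the single DSN is treated as the master and its slaves are discovered.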


pt-table-sync can run in one of two ways: with "--replicate" or without.
The default is to run without "--replicate" which causes pt-table-sync
to automatically find differences efficiently with one of several
algorithms (see "ALGORITHMS"). Alternatively, the value of
"--replicate", if specified, causes pt-table-sync to use the differences
already found by having previously run pt-table-checksum with its own
``--replicate`` option. Strictly speaking, you don't need to use
"--replicate" because pt-table-sync can find differences, but many
people use "--replicate" if, for example, they checksum regularly
using pt-table-checksum then fix differences as needed with pt-table-sync.
If you're unsure, read each tool's documentation carefully and decide for
yourself, or consult with an expert.

Regardless of whether "--replicate" is used or not, you need to specify
which hosts to sync. There are two ways: with "--sync-to-master" or
without. Specifying "--sync-to-master" makes pt-table-sync expect
one and only one slave DSN on the command line. The tool will automatically
discover the slave's master and sync the slave so that its data is the same as
its master's. This is accomplished by making changes on the master which
then flow through replication and update the slave to resolve its differences.
**Be careful though**: although this option specifies and syncs a single
slave, if there are other slaves on the same master, they will receive
via replication the changes intended for the slave that you're trying to
sync.

Alternatively, if you do not specify "--sync-to-master", the first
DSN given on the command line is the source host. There is only ever
one source host. If you do not also specify "--replicate", then you
must specify at least one other DSN as the destination host. There
can be one or more destination hosts. Source and destination hosts
must be independent; they cannot be in the same replication topology.
pt-table-sync will die with an error if it detects that a destination
host is a slave because changes are written directly to destination hosts
(and it's not safe to write directly to slaves). Or, if you specify
"--replicate" (but not "--sync-to-master") then pt-table-sync expects
one and only one master DSN on the command line. The tool will automatically
discover all the master's slaves and sync them to the master. This is
the only way to sync several (all) slaves at once (because
"--sync-to-master" only specifies one slave).

Each host on the command line is specified as a DSN. The first DSN
(or only DSN for cases like "--sync-to-master") provides default values
for other DSNs, whether those other DSNs are specified on the command line
or auto-discovered by the tool. So in this example,


.. code-block:: perl

   pt-table-sync --execute h=host1,u=msandbox,p=msandbox h=host2


the host2 DSN inherits the ``u`` and ``p`` DSN parts from the host1 DSN.
Use the "--explain-hosts" option to see how pt-table-sync will interpret
the DSNs given on the command line.
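
The inheritance rule can be sketched in a few lines. This is an illustration under simplifying assumptions, not the tool's DSN parser: ``parse_dsn`` and ``with_defaults`` are hypothetical helpers, a bare word is treated as a host name (as in the "SYNOPSIS" examples), and values that themselves contain commas are not handled.

```python
def parse_dsn(text):
    """Parse 'h=host1,u=user' into a dict; a bare word is taken as the host."""
    parts = {}
    for item in text.split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            parts[key] = value
        else:
            parts["h"] = item
    return parts


def with_defaults(dsn_texts):
    """Fill in each later DSN with the parts it omits, taken from the first."""
    first = parse_dsn(dsn_texts[0])
    resolved = [first]
    for text in dsn_texts[1:]:
        merged = dict(first)            # start from the first DSN's values
        merged.update(parse_dsn(text))  # explicitly given parts win
        resolved.append(merged)
    return resolved
```

With the command line above, ``with_defaults(["h=host1,u=msandbox,p=msandbox", "h=host2"])`` gives the second DSN the host ``host2`` and the inherited ``u`` and ``p`` values.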


******
OUTPUT
******


If you specify the "--verbose" option, you'll see information about the
differences between the tables. There is one row per table. Each server is
printed separately. For example,


.. code-block:: perl

   # Syncing h=host1,D=test,t=test1
   # DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
   #      0       0      3      0 Chunk     13:00:00 13:00:17 2    test.test1


Table test.test1 on host1 required 3 ``INSERT`` statements to synchronize
and it used the Chunk algorithm (see "ALGORITHMS"). The sync operation
for this table started at 13:00:00 and ended 17 seconds later (times taken
from ``NOW()`` on the source host). Because differences were found, its
"EXIT STATUS" was 2.

If you specify the "--print" option, you'll see the actual SQL statements
that the script uses to synchronize the table if "--execute" is also
specified.

If you want to see the SQL statements that pt-table-sync is using to select
chunks, nibbles, rows, etc., then specify "--print" once and "--verbose"
twice. Be careful though: this can print a lot of SQL statements.

There are cases where no combination of ``INSERT``, ``UPDATE`` or ``DELETE``
statements can resolve differences without violating some unique key. For
example, suppose there's a primary key on column a and a unique key on column b.
Then there is no way to sync these two tables with straightforward UPDATE
statements:


.. code-block:: perl

   +---+---+  +---+---+
   | a | b |  | a | b |
   +---+---+  +---+---+
   | 1 | 2 |  | 1 | 1 |
   | 2 | 1 |  | 2 | 2 |
   +---+---+  +---+---+


The tool rewrites queries to ``DELETE`` and ``REPLACE`` in this case. This is
automatically handled after the first index violation, so you don't have to
worry about it.
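
The swap problem is easy to reproduce. The sketch below uses Python's built-in ``sqlite3`` module as a stand-in for MySQL (SQLite also supports ``REPLACE INTO``), so it illustrates the idea rather than the tool's implementation; ``sync_with_replace`` is a hypothetical name, and treating the left-hand table as the source is an assumption.

```python
import sqlite3


def sync_with_replace(source_rows, dest_rows):
    """Make the destination match the source; return (final_rows, fell_back).

    UPDATE is tried first. After the first unique-key violation, every
    remaining row is written with REPLACE instead.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER PRIMARY KEY, b INTEGER UNIQUE)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", dest_rows)
    fell_back = False
    for a, b in source_rows:
        if not fell_back:
            try:
                conn.execute("UPDATE t SET b = ? WHERE a = ?", (b, a))
                continue
            except sqlite3.IntegrityError:
                fell_back = True  # first index violation: change strategy
        # REPLACE removes any row that collides on a or b, then inserts.
        conn.execute("REPLACE INTO t VALUES (?, ?)", (a, b))
    rows = conn.execute("SELECT a, b FROM t ORDER BY a").fetchall()
    conn.close()
    return rows, fell_back
```

Calling ``sync_with_replace([(1, 2), (2, 1)], [(1, 1), (2, 2)])`` hits the violation on the very first UPDATE, because ``b = 2`` is still held by the row with ``a = 2``, and then finishes the sync with REPLACE.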


******************
REPLICATION SAFETY
******************


Synchronizing a replication master and slave safely is a non-trivial problem, in
general. There are all sorts of issues to think about, such as other processes
changing data, trying to change data on the slave, whether the destination and
source are a master-master pair, and much more.

In general, the safe way to do it is to change the data on the master, and let
the changes flow through replication to the slave like any other changes.
However, this works only if it's possible to REPLACE into the table on the
master. REPLACE works only if there's a unique index on the table (otherwise it
just acts like an ordinary INSERT).

If your table has unique keys, you should use the "--sync-to-master" and/or
"--replicate" options to sync a slave to its master. This will generally do
the right thing. When there is no unique key on the table, there is no choice
but to change the data on the slave, and pt-table-sync will detect that you're
trying to do so. It will complain and die unless you specify
``--no-check-slave`` (see "--[no]check-slave").

If you're syncing a table without a primary or unique key on a master-master
pair, you must change the data on the destination server. Therefore, you need
to specify ``--no-bin-log`` for safety (see "--[no]bin-log"). If you don't,
the changes you make on the destination server will replicate back to the
source server and change the data there!

The generally safe thing to do on a master-master pair is to use the
"--sync-to-master" option so you don't change the data on the destination
server. You will also need to specify ``--no-check-slave`` to keep
pt-table-sync from complaining that it is changing data on a slave.


**********
ALGORITHMS
**********


pt-table-sync has a generic data-syncing framework which uses different
algorithms to find differences. The tool automatically chooses the best
algorithm for each table based on indexes, column types, and the algorithm
preferences specified by "--algorithms". The following algorithms are
available, listed in their default order of preference:


Chunk

Finds an index whose first column is numeric (including date and time types),
and divides the column's range of values into chunks of approximately
"--chunk-size" rows. Syncs a chunk at a time by checksumming the entire
chunk. If the chunk differs on the source and destination, checksums each
chunk's rows individually to find the rows that differ.

It is efficient when the column has sufficient cardinality to make the chunks
end up about the right size.

The initial per-chunk checksum is quite small and results in minimal network
traffic and memory consumption. If a chunk's rows must be examined, only the
primary key columns and a checksum are sent over the network, not the entire
row. If a row is found to be different, the entire row will be fetched, but not
before.
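
The two-level comparison can be sketched on in-memory data. This is a simplified illustration, not the tool's code: tables are modeled as dicts keyed by an integer column, SHA-256 of ``repr()`` stands in for the real checksum, and ``digest`` and ``chunk_diff`` are hypothetical names. Splitting by value range yields chunks of about ``chunk_size`` rows only when the keys are dense.

```python
import hashlib


def digest(value):
    """SHA-256 of a value's repr(); a stand-in for the tool's checksum."""
    return hashlib.sha256(repr(value).encode("utf-8")).hexdigest()


def chunk_diff(source, dest, chunk_size):
    """Return the keys whose rows differ between two {int_key: row} tables."""
    keys = list(source) + list(dest)
    if not keys:
        return []
    differing = []
    for start in range(min(keys), max(keys) + 1, chunk_size):
        end = start + chunk_size  # this chunk covers start <= key < end
        src = {k: v for k, v in source.items() if start <= k < end}
        dst = {k: v for k, v in dest.items() if start <= k < end}
        # Step 1: a single checksum per chunk on each side.
        if digest(sorted(src.items())) == digest(sorted(dst.items())):
            continue
        # Step 2: only for a differing chunk, compare key plus row checksum.
        for key in sorted(set(src) | set(dst)):
            if digest(src.get(key)) != digest(dst.get(key)):
                differing.append(key)
    return differing
```

Chunks whose checksums match are skipped after one comparison, which is where the savings in network traffic described above come from.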


Nibble

Finds an index and ascends the index in fixed-size nibbles of "--chunk-size"
rows, using a non-backtracking algorithm (see pt-archiver for more on this
algorithm). It is very similar to "Chunk", but instead of pre-calculating
the boundaries of each piece of the table based on index cardinality, it uses
``LIMIT`` to define each nibble's upper limit, and the previous nibble's upper
limit to define the lower limit.

It works in steps: one query finds the row that will define the next nibble's
upper boundary, and the next query checksums the entire nibble. If the nibble
differs between the source and destination, it examines the nibble row-by-row,
just as "Chunk" does.
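
A rough sketch of the nibbling steps on in-memory data follows. It makes several assumptions that are not from the tool itself: tables are dicts keyed by a single sortable index column, boundaries are taken from the source's keys, the final nibble is left open-ended so trailing destination rows are still compared, and ``digest`` and ``nibble_diff`` are hypothetical names.

```python
import hashlib


def digest(value):
    """SHA-256 of a value's repr(); a stand-in for the tool's checksum."""
    return hashlib.sha256(repr(value).encode("utf-8")).hexdigest()


def nibble_diff(source, dest, nibble_size):
    """Return the keys whose rows differ, walking the key order in nibbles."""
    keys = sorted(source)
    differing = []
    lower = None   # the previous nibble's upper boundary (exclusive)
    position = 0
    while True:
        # Step 1: find the key that closes this nibble (the LIMIT step).
        upper_index = position + nibble_size - 1
        upper = keys[upper_index] if upper_index < len(keys) - 1 else None

        def in_nibble(key):
            above = lower is None or key > lower
            below = upper is None or key <= upper
            return above and below

        # Step 2: checksum the nibble on both sides.
        src = {k: v for k, v in source.items() if in_nibble(k)}
        dst = {k: v for k, v in dest.items() if in_nibble(k)}
        if digest(sorted(src.items())) != digest(sorted(dst.items())):
            # Step 3: the nibble differs, so examine it row by row.
            for key in sorted(set(src) | set(dst)):
                if src.get(key) != dst.get(key):
                    differing.append(key)
        if upper is None:
            return differing
        lower = upper
        position = upper_index + 1
```

Because each nibble starts just above the previous upper boundary, the walk never revisits rows, and the keys do not need to be numeric.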


GroupBy

Selects the entire table grouped by all columns, with a COUNT(\*) column added.
Compares all columns, and if they're the same, compares the COUNT(\*) column's
value to determine how many rows to insert into or delete from the destination.
Works on tables with no primary key or unique index.
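
The counting idea can be sketched with ``collections.Counter``, which plays the role of ``GROUP BY`` over all columns with a ``COUNT(*)``. This is an illustration on in-memory tuples, not the tool's code; ``groupby_diff`` is a hypothetical name.

```python
from collections import Counter


def groupby_diff(source_rows, dest_rows):
    """Return (action, row, count) changes that make dest match source.

    Rows are tuples of all column values; duplicates are allowed because
    no primary key or unique index is assumed.
    """
    src = Counter(source_rows)  # row -> how many times it appears
    dst = Counter(dest_rows)
    changes = []
    for row in sorted(set(src) | set(dst)):
        delta = src[row] - dst[row]
        if delta > 0:
            changes.append(("INSERT", row, delta))
        elif delta < 0:
            changes.append(("DELETE", row, -delta))
    return changes
```

A row that appears twice on the source and once on the destination therefore yields one INSERT, without any key being needed to tell the duplicates apart.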


Stream

Selects all columns of the entire table in one big stream and compares them.
Much less efficient than the other algorithms, but works when
there is no suitable index for them to use.


Future Plans

Possibilities for future algorithms are TempTable (what I originally called
bottom-up in earlier versions of this tool), DrillDown (what I originally
called top-down), and GroupByPrefix (similar to how SqlYOG Job Agent works).
Each algorithm has strengths and weaknesses. If you'd like to implement your
favorite technique for finding differences between two sources of data on
possibly different servers, I'm willing to help. The algorithms adhere to a
simple interface that makes it pretty easy to write your own.


*********************
BIDIRECTIONAL SYNCING
*********************


Bidirectional syncing is a new, experimental feature. To make it work
reliably there are a number of strict limitations:


.. code-block:: perl

   * only works when syncing one server to other independent servers
   * does not work in any way with replication
   * requires that the table(s) are chunkable with the Chunk algorithm
   * is not N-way, only bidirectional between two servers at a time
   * does not handle DELETE changes


For example, suppose we have three servers: c1, r1, r2. c1 is the central
server, a pseudo-master to the other servers (viz. r1 and r2 are not slaves
to c1). r1 and r2 are remote servers. Rows in table foo are updated and
inserted on all three servers and we want to synchronize all the changes
between all the servers. Table foo has columns:


.. code-block:: perl

   id    int PRIMARY KEY
   ts    timestamp auto updated
   name  varchar


Auto-increment offsets are used so that new rows from any server do not
create conflicting primary key (id) values. In general, newer rows, as
determined by the ts column, take precedence when a same but differing row
is found during the bidirectional sync. "Same but differing" means that
two rows have the same primary key (id) value but different values for some
other column, like the name column in this example. Same but differing
conflicts are resolved by a "conflict". A conflict compares some column of
the competing rows to determine a "winner". The winning row becomes the
source and its values are used to update the other row.

There are subtle differences between three columns used to achieve
bidirectional syncing that you should be familiar with: chunk column
("--chunk-column"), comparison column(s) ("--columns"), and conflict
column ("--conflict-column"). The chunk column is only used to chunk the
table; e.g. "WHERE id >= 5 AND id < 10". Chunks are checksummed and when
chunk checksums reveal a difference, the tool selects the rows in that
chunk and checksums the "--columns" for each row. If a column checksum
differs, the rows have one or more conflicting column values. In a
traditional unidirectional sync, the conflict is a moot point because it can
be resolved simply by updating the entire destination row with the source
row's values. In a bidirectional sync, however, the "--conflict-column"
(in accordance with other ``--conflict-*`` options listed below) is compared
to determine which row is "correct" or "authoritative"; this row becomes
the "source".
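
As a minimal sketch of how a conflict picks a winner, assume the conflict column is ``ts`` and that the larger (newer) value wins. The function name ``resolve_conflict`` and the ``pick`` parameter are hypothetical; they stand in for whatever the ``--conflict-*`` options configure and do not mirror the tool's internals.

```python
def resolve_conflict(left_row, right_row, conflict_column, pick=max):
    """Return (winning_row, side) for two rows that share a primary key.

    pick chooses the winning value of the conflict column; max means
    "the newest timestamp wins". Equal values cannot be resolved.
    """
    left_value = left_row[conflict_column]
    right_value = right_row[conflict_column]
    if left_value == right_value:
        return None, "unresolved"
    if pick(left_value, right_value) == left_value:
        return left_row, "left"
    return right_row, "right"
```

The winning row is then treated as the source for that one row, so within a single run some rows may flow left to right and others right to left.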

458

459

To sync all three servers completely, two runs of pt-table-sync are required.
The first run syncs c1 and r1, then syncs c1 and r2 including any changes
from r1. At this point c1 and r2 are completely in sync, but r1 is missing
any changes from r2 because c1 didn't have these changes when it and r1
were synced. So a second run is needed which syncs the servers in the same
order, but this time when c1 and r1 are synced, r1 gets r2's changes.
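Why two runs are needed can be seen in a small simulation. This is a sketch under simplifying assumptions, not the tool's implementation: each server is a hypothetical dict of ``id -> row``, the row with the larger ``ts`` wins, missing rows are copied across, and deletes are not modeled.

```python
def bidirectional_sync(a, b):
    """Make two servers identical; for each id the row with the newer ts wins."""
    for key in set(a) | set(b):
        rows = [r for r in (a.get(key), b.get(key)) if r is not None]
        winner = max(rows, key=lambda r: r["ts"])
        a[key] = dict(winner)
        b[key] = dict(winner)

def run_once(c1, r1, r2):
    """One run: sync the first server with each subsequent server in turn."""
    bidirectional_sync(c1, r1)
    bidirectional_sync(c1, r2)

c1 = {1: {"name": "c1-row", "ts": 1}}
r1 = {2: {"name": "r1-row", "ts": 2}}
r2 = {3: {"name": "r2-row", "ts": 3}}

run_once(c1, r1, r2)
after_first = (c1 == r2, r1 == c1)   # c1 and r2 agree, r1 still lacks r2's row
run_once(c1, r1, r2)
after_second = (c1 == r2, r1 == c1)  # all three servers agree
print(after_first, after_second)
```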

465

466

The tool does not sync N-ways, only bidirectionally between the first DSN
given on the command line and each subsequent DSN in turn. So the tool in
this example would be run twice like:

469

470

471

.. code-block:: bash

 pt-table-sync --bidirectional h=c1 h=r1 h=r2

474

475

476

The "--bidirectional" option enables this feature and causes various
sanity checks to be performed. You must specify other options that tell
pt-table-sync how to resolve conflicts for same but differing rows.
These options are:

* "--conflict-column"
* "--conflict-comparison"
* "--conflict-value"
* "--conflict-threshold"
* "--conflict-error" (optional)

489

490

491

Use "--print" to test this option before "--execute". The printed

492

SQL statements will have comments saying on which host the statement

493

would be executed if you used "--execute".

494

495

Technical side note: the first DSN is always the "left" server and the other

496

DSNs are always the "right" server. Since either server can become the source

497

or destination it's confusing to think of them as "src" and "dst". Therefore,

498

they're generically referred to as left and right. It's easy to remember

499

this because the first DSN is always to the left of the other server DSNs on

500

the command line.

501

502

503

***********

504

EXIT STATUS

505

***********

506

507

508

The following are the exit statuses (also called return values, or return codes)

509

when pt-table-sync finishes and exits.

510

511

512

======  =======================================================
STATUS  MEANING
======  =======================================================
0       Success.
1       Internal error.
2       At least one table differed on the destination.
3       Combination of 1 and 2.
======  =======================================================
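A wrapper script might decode these statuses as follows. The sketch assumes the values behave like bit flags, which is an interpretation of "3 is a combination of 1 and 2" rather than something the table states outright; ``describe_exit_status`` is a hypothetical helper.

```python
def describe_exit_status(status):
    """Translate a pt-table-sync exit status into a list of meanings."""
    meanings = []
    if status & 1:
        meanings.append("internal error")
    if status & 2:
        meanings.append("at least one table differed on the destination")
    return meanings or ["success"]

for code in (0, 1, 2, 3):
    print(code, describe_exit_status(code))
```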

520

521

522

523

*******

524

OPTIONS

525

*******

526

527

528

Specify at least one of "--print", "--execute", or "--dry-run".

529

530

"--where" and "--replicate" are mutually exclusive.

531

532

This tool accepts additional command-line arguments. Refer to the

533

"SYNOPSIS" and usage information for details.

534

535

536

--algorithms

537

538

type: string; default: Chunk,Nibble,GroupBy,Stream

539

540

Algorithm to use when comparing the tables, in order of preference.

541

542

For each table, pt-table-sync will check if the table can be synced with

543

the given algorithms in the order that they're given. The first algorithm

544

that can sync the table is used. See "ALGORITHMS".

545

546

547

548

--ask-pass

549

550

Prompt for a password when connecting to MySQL.

551

552

553

554

--bidirectional

555

556

Enable bidirectional sync between first and subsequent hosts.

557

558

See "BIDIRECTIONAL SYNCING" for more information.

559

560

561

562

--[no]bin-log

563

564

default: yes

565

566

Log to the binary log (\ ``SET SQL_LOG_BIN=1``\ ).

567

568

Specifying \ ``--no-bin-log``\ will \ ``SET SQL_LOG_BIN=0``\ .

569

570

571

572

--buffer-in-mysql

573

574

Instruct MySQL to buffer queries in its memory.

575

576

This option adds the \ ``SQL_BUFFER_RESULT``\ option to the comparison queries.

577

This causes MySQL to execute the queries and place them in a temporary table

578

internally before sending the results back to pt-table-sync. The advantage of

579

this strategy is that pt-table-sync can fetch rows as desired without using a

580

lot of memory inside the Perl process, while releasing locks on the MySQL table

581

(to reduce contention with other queries). The disadvantage is that it uses

582

more memory on the MySQL server instead.

583

584

You probably want to leave "--[no]buffer-to-client" enabled too, because

585

buffering into a temp table and then fetching it all into Perl's memory is

586

probably a silly thing to do. This option is most useful for the GroupBy and

587

Stream algorithms, which may fetch a lot of data from the server.

588

589

590

591

--[no]buffer-to-client

592

593

default: yes

594

595

Fetch rows one-by-one from MySQL while comparing.

596

597

This option enables \ ``mysql_use_result``\ which causes MySQL to hold the selected

598

rows on the server until the tool fetches them. This allows the tool to use

599

less memory but may keep the rows locked on the server longer.

600

601

If this option is disabled by specifying \ ``--no-buffer-to-client``\ then

602

\ ``mysql_store_result``\ is used which causes MySQL to send all selected rows to

603

the tool at once. This may result in the results "cursor" being held open for

604

a shorter time on the server, but if the tables are large, it could take a long

605

time anyway, and use all your memory.

606

607

For most non-trivial data sizes, you want to leave this option enabled.

608

609

This option is disabled when "--bidirectional" is used.

610

611

612

613

--charset

614

615

short form: -A; type: string

616

617

Default character set. If the value is utf8, sets Perl's binmode on

618

STDOUT to utf8, passes the mysql_enable_utf8 option to DBD::mysql, and

619

runs SET NAMES UTF8 after connecting to MySQL. Any other value sets

620

binmode on STDOUT without the utf8 layer, and runs SET NAMES after

621

connecting to MySQL.

622

623

624

625

--[no]check-master

626

627

default: yes

628

629

With "--sync-to-master", try to verify that the detected

630

master is the real master.

631

632

633

634

--[no]check-privileges

635

636

default: yes

637

638

Check that user has all necessary privileges on source and destination table.

639

640

641

642

--[no]check-slave

643

644

default: yes

645

646

Check whether the destination server is a slave.

647

648

If the destination server is a slave, it's generally unsafe to make changes on

649

it. However, sometimes you have to; "--replace" won't work unless there's a

650

unique index, for example, so you can't make changes on the master in that

651

scenario. By default pt-table-sync will complain if you try to change data on

652

a slave. Specify \ ``--no-check-slave``\ to disable this check. Use it at your own

653

risk.

654

655

656

657

--[no]check-triggers

658

659

default: yes

660

661

Check that no triggers are defined on the destination table.

662

663

Triggers were introduced in MySQL v5.0.2, so for older versions this option

664

has no effect because triggers will not be checked.

665

666

667

668

--chunk-column

669

670

type: string

671

672

Chunk the table on this column.

673

674

675

676

--chunk-index

677

678

type: string

679

680

Chunk the table using this index.

681

682

683

684

--chunk-size

685

686

type: string; default: 1000

687

688

Number of rows or data size per chunk.

689

690

The size of each chunk of rows for the "Chunk" and "Nibble" algorithms.

691

The size can be either a number of rows, or a data size. Data sizes are

692

specified with a suffix of k=kibibytes, M=mebibytes, G=gibibytes. Data sizes

693

are converted to a number of rows by dividing by the average row length.
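The conversion from a data size to a row count can be sketched like this. The suffix multipliers follow the text above; the rounding (integer division with a minimum of one row) and the helper name ``chunk_size_to_rows`` are assumptions made for illustration.

```python
import re

# Binary multiples, as described above: k=kibibytes, M=mebibytes, G=gibibytes.
SUFFIX_BYTES = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def chunk_size_to_rows(chunk_size, avg_row_length):
    """Return rows per chunk for a --chunk-size value such as "1000" or "1M"."""
    match = re.fullmatch(r"(\d+)([kMG]?)", chunk_size)
    if not match:
        raise ValueError(f"invalid chunk size: {chunk_size!r}")
    number, suffix = int(match.group(1)), match.group(2)
    if not suffix:
        return number  # no suffix: the value is already a number of rows
    if avg_row_length <= 0:
        raise ValueError("average row length must be positive")
    # Assumed rounding: truncate, but never return fewer than one row.
    return max(1, (number * SUFFIX_BYTES[suffix]) // avg_row_length)

print(chunk_size_to_rows("1000", 128))
print(chunk_size_to_rows("1M", 128))
```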

694

695

696

697

--columns

698

699

short form: -c; type: array

700

701

Compare this comma-separated list of columns.

702

703

704

705

--config

706

707

type: Array

708

709

Read this comma-separated list of config files; if specified, this must be the

710

first option on the command line.

711

712

713

714

--conflict-column

715

716

type: string

717

718

Compare this column when rows conflict during a "--bidirectional" sync.

719

720

When a same but differing row is found the value of this column from each

721

row is compared according to "--conflict-comparison", "--conflict-value"

722

and "--conflict-threshold" to determine which row has the correct data and

723

becomes the source. The column can be any type for which there is an

724

appropriate "--conflict-comparison" (this is almost all types except, for

725

example, blobs).

726

727

This option only works with "--bidirectional".

728

See "BIDIRECTIONAL SYNCING" for more information.

729

730

731

732

--conflict-comparison

733

734

type: string

735

736

Choose the "--conflict-column" with this property as the source.

737

738

The option affects how the "--conflict-column" values from the conflicting

739

rows are compared. Possible comparisons are:

740

741

742

==========  =========================================================
COMPARISON  CHOOSES ROW WITH
==========  =========================================================
newest      Newest temporal "--conflict-column" value
oldest      Oldest temporal "--conflict-column" value
greatest    Greatest numerical "--conflict-column" value
least       Least numerical "--conflict-column" value
equals      "--conflict-column" value equal to "--conflict-value"
matches     "--conflict-column" value matching Perl regex pattern
            "--conflict-value"
==========  =========================================================
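The six comparisons can be sketched as a single function. This is a hedged illustration, not the tool's code: ``choose_source`` is a hypothetical name, Python's ``re`` module stands in for Perl regular expressions (the syntaxes differ in places), and treating a tie as unresolvable is an assumption.

```python
import re
from datetime import datetime

def choose_source(comparison, left, right, conflict_value=None):
    """Return "left" or "right", or None if the comparison cannot pick a winner."""
    if comparison in ("newest", "oldest"):
        fmt = "%Y-%m-%d %H:%M:%S"
        lval, rval = datetime.strptime(left, fmt), datetime.strptime(right, fmt)
        if lval == rval:
            return None
        left_wins = lval > rval if comparison == "newest" else lval < rval
    elif comparison in ("greatest", "least"):
        lval, rval = float(left), float(right)
        if lval == rval:
            return None
        left_wins = lval > rval if comparison == "greatest" else lval < rval
    elif comparison in ("equals", "matches"):
        if comparison == "equals":
            hit = lambda v: v == conflict_value
        else:
            hit = lambda v: re.search(conflict_value, v) is not None
        l_hit, r_hit = hit(left), hit(right)
        if l_hit == r_hit:
            return None  # both or neither satisfy the condition
        left_wins = l_hit
    else:
        raise ValueError(f"unknown comparison: {comparison!r}")
    return "left" if left_wins else "right"

print(choose_source("newest", "2009-12-01 12:00:00", "2009-12-01 12:05:00"))
```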

755

756

757

This option only works with "--bidirectional".

758

See "BIDIRECTIONAL SYNCING" for more information.

759

760

761

762

--conflict-error

763

764

type: string; default: warn

765

766

How to report unresolvable conflicts and conflict errors.

767

768

This option changes how the user is notified when a conflict cannot be

769

resolved or causes some kind of error. Possible values are:

770

771

772

* warn: Print a warning to STDERR about the unresolvable conflict
* die: Die, stop syncing, and print a warning to STDERR

776

777

778

This option only works with "--bidirectional".

779

See "BIDIRECTIONAL SYNCING" for more information.

780

781

782

783

--conflict-threshold

784

785

type: string

786

787

Amount by which one "--conflict-column" must exceed the other.

788

789

The "--conflict-threshold" prevents a conflict from being resolved if
the absolute difference between the two "--conflict-column" values is
less than this amount. For example, if the two "--conflict-column" values
are the timestamps "2009-12-01 12:00:00" and "2009-12-01 12:05:00", the
difference is 5 minutes. If "--conflict-threshold" is set to "5m" the conflict
will be resolved, but if "--conflict-threshold" is set to "6m" the conflict
will fail to resolve because the difference is not greater than or equal
to 6 minutes. In this latter case, "--conflict-error" will report
the failure.
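The threshold rule from the example above, written out as a sketch. The helper name ``exceeds_threshold`` is hypothetical, and the threshold is passed as a ``timedelta`` rather than parsed from a string such as "5m".

```python
from datetime import datetime, timedelta

def exceeds_threshold(left_ts, right_ts, threshold):
    """True if the absolute difference is greater than or equal to the threshold."""
    fmt = "%Y-%m-%d %H:%M:%S"
    diff = abs(datetime.strptime(left_ts, fmt) - datetime.strptime(right_ts, fmt))
    return diff >= threshold

a, b = "2009-12-01 12:00:00", "2009-12-01 12:05:00"
print(exceeds_threshold(a, b, timedelta(minutes=5)))  # the conflict can be resolved
print(exceeds_threshold(a, b, timedelta(minutes=6)))  # the conflict fails to resolve
```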

798

799

This option only works with "--bidirectional".

800

See "BIDIRECTIONAL SYNCING" for more information.

801

802

803

804

--conflict-value

805

806

type: string

807

808

Use this value for certain "--conflict-comparison".

809

810

This option gives the value for \ ``equals``\ and \ ``matches``\

811

"--conflict-comparison".

812

813

This option only works with "--bidirectional".

814

See "BIDIRECTIONAL SYNCING" for more information.

815

816

817

818

--databases

819

820

short form: -d; type: hash

821

822

Sync only this comma-separated list of databases.

823

824

A common request is to sync tables from one database with tables from another

825

database on the same or different server. This is not yet possible.

826

"--databases" will not do it, and you can't do it with the D part of the DSN

827

either because in the absence of a table name it assumes the whole server

828

should be synced and the D part controls only the connection's default database.

829

830

831

832

--defaults-file

833

834

short form: -F; type: string

835

836

Only read mysql options from the given file. You must give an absolute pathname.

837

838

839

840

--dry-run

841

842

Analyze, decide the sync algorithm to use, print and exit.

843

844

Implies "--verbose" so you can see the results. The results are in the same

845

output format that you'll see from actually running the tool, but there will be

846

zeros for rows affected. This is because the tool actually executes, but stops

847

before it compares any data and just returns zeros. The zeros do not mean there

848

are no changes to be made.

849

850

851

852

--engines

853

854

short form: -e; type: hash

855

856

Sync only this comma-separated list of storage engines.

857

858

859

860

--execute

861

862

Execute queries to make the tables have identical data.

863

864

This option makes pt-table-sync actually sync table data by executing all

865

the queries that it created to resolve table differences. Therefore, \ **the

866

tables will be changed!**\ And unless you also specify "--verbose", the

867

changes will be made silently. If this is not what you want, see

868

"--print" or "--dry-run".

869

870

871

872

--explain-hosts

873

874

Print connection information and exit.

875

876

Print out a list of hosts to which pt-table-sync will connect, with all

877

the various connection options, and exit.

878

879

880

881

--float-precision

882

883

type: int

884

885

Precision for \ ``FLOAT``\ and \ ``DOUBLE``\ number-to-string conversion. Causes FLOAT

886

and DOUBLE values to be rounded to the specified number of digits after the

887

decimal point, with the ROUND() function in MySQL. This can help avoid

888

checksum mismatches due to different floating-point representations of the same

889

values on different MySQL versions and hardware. The default is no rounding;

890

the values are converted to strings by the CONCAT() function, and MySQL chooses

891

the string representation. If you specify a value of 2, for example, then the

892

values 1.008 and 1.009 will be rounded to 1.01, and will checksum as equal.
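The effect of rounding before checksumming can be imitated in Python. This is only an approximation of the idea: MySQL's ROUND() on FLOAT and DOUBLE operates on approximate values, whereas the hypothetical ``rounded_string`` helper below uses exact decimal arithmetic on string inputs.

```python
from decimal import Decimal, ROUND_HALF_UP

def rounded_string(value, precision):
    """Round a decimal string to `precision` digits after the decimal point."""
    quantum = Decimal(1).scaleb(-precision)  # e.g. 0.01 for precision 2
    return str(Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP))

# With precision 2, both values become the same string, so they would
# contribute identically to a checksum computed over that string.
print(rounded_string("1.008", 2), rounded_string("1.009", 2))
```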

893

894

895

896

--[no]foreign-key-checks

897

898

default: yes

899

900

Enable foreign key checks (\ ``SET FOREIGN_KEY_CHECKS=1``\ ).

901

902

Specifying \ ``--no-foreign-key-checks``\ will \ ``SET FOREIGN_KEY_CHECKS=0``\ .

903

904

905

906

--function

907

908

type: string

909

910

Which hash function you'd like to use for checksums.

911

912

The default is \ ``CRC32``\ . Other good choices include \ ``MD5``\ and \ ``SHA1``\ . If you

913

have installed the \ ``FNV_64``\ user-defined function, \ ``pt-table-sync``\ will detect

914

it and prefer to use it, because it is much faster than the built-ins. You can

915

also use MURMUR_HASH if you've installed that user-defined function. Both of

916

these are distributed with Maatkit. See pt-table-checksum for more

917

information and benchmarks.

918

919

920

921

--help

922

923

Show help and exit.

924

925

926

927

--[no]hex-blob

928

929

default: yes

930

931

\ ``HEX()``\ \ ``BLOB``\ , \ ``TEXT``\ and \ ``BINARY``\ columns.

932

933

When row data from the source is fetched to create queries to sync the

934

data (i.e. the queries seen with "--print" and executed by "--execute"),

935

binary columns are wrapped in HEX() so the binary data does not produce

936

an invalid SQL statement. You can disable this option but you probably

937

shouldn't.

938

939

940

941

--host

942

943

short form: -h; type: string

944

945

Connect to host.

946

947

948

949

--ignore-columns

950

951

type: Hash

952

953

Ignore this comma-separated list of column names in comparisons.

954

955

This option causes columns not to be compared. However, if a row is determined

956

to differ between tables, all columns in that row will be synced, regardless.

957

(It is not currently possible to exclude columns from the sync process itself,

958

only from the comparison.)

959

960

961

962

--ignore-databases

963

964

type: Hash

965

966

Ignore this comma-separated list of databases.

967

968

969

970

--ignore-engines

971

972

type: Hash; default: FEDERATED,MRG_MyISAM

973

974

Ignore this comma-separated list of storage engines.

975

976

977

978

--ignore-tables

979

980

type: Hash

981

982

Ignore this comma-separated list of tables.

983

984

Table names may be qualified with the database name.

985

986

987

988

--[no]index-hint

989

990

default: yes

991

992

Add FORCE/USE INDEX hints to the chunk and row queries.

993

994

By default \ ``pt-table-sync``\ adds a FORCE/USE INDEX hint to each SQL statement

995

to coerce MySQL into using the index chosen by the sync algorithm or specified

996

by "--chunk-index". This is usually a good thing, but in rare cases the

997

index may not be the best for the query so you can suppress the index hint

998

by specifying \ ``--no-index-hint``\ and let MySQL choose the index.

999

1000

This does not affect the queries printed by "--print"; it only affects the

1001

chunk and row queries that \ ``pt-table-sync``\ uses to select and compare rows.

1002

1003

1004

1005

--lock

1006

1007

type: int

1008

1009

Lock tables: 0=none, 1=per sync cycle, 2=per table, or 3=globally.

1010

1011

This uses \ ``LOCK TABLES``\ . This can help prevent tables being changed while

1012

you're examining them. The possible values are as follows:

1013

1014

1015

=====  =========================================================
VALUE  MEANING
=====  =========================================================
0      Never lock tables.
1      Lock and unlock one time per sync cycle (as implemented
       by the syncing algorithm). This is the most granular
       level of locking available. For example, the Chunk
       algorithm will lock each chunk of N rows, and then
       unlock them if they are the same on the source and the
       destination, before moving on to the next chunk.
2      Lock and unlock before and after each table.
3      Lock and unlock once for every server (DSN) synced, with
       ``FLUSH TABLES WITH READ LOCK``.
=====  =========================================================

1029

1030

1031

A replication slave is never locked if "--replicate" or "--sync-to-master"

1032

is specified, since in theory locking the table on the master should prevent any

1033

changes from taking place. (You are not changing data on your slave, right?)

1034

If "--wait" is given, the master (source) is locked and then the tool waits

1035

for the slave to catch up to the master before continuing.

1036

1037

If \ ``--transaction``\ is specified, \ ``LOCK TABLES``\ is not used. Instead, lock

1038

and unlock are implemented by beginning and committing transactions.

1039

The exception is if "--lock" is 3.

1040

1041

If \ ``--no-transaction``\ is specified, then \ ``LOCK TABLES``\ is used for any

1042

value of "--lock". See "--[no]transaction".

1043

1044

1045

1046

--lock-and-rename

1047

1048

Lock the source and destination table, sync, then swap names. This is useful as

1049

a less-blocking ALTER TABLE, once the tables are reasonably in sync with each

1050

other (which you may choose to accomplish via any number of means, including

1051

dump and reload or even something like pt-archiver). It requires exactly two

1052

DSNs and assumes they are on the same server, so it does no waiting for

1053

replication or the like. Tables are locked with LOCK TABLES.

1054

1055

1056

1057

--password

1058

1059

short form: -p; type: string

1060

1061

Password to use when connecting.

1062

1063

1064

1065

--pid

1066

1067

type: string

1068

1069

Create the given PID file. The file contains the process ID of the script.

1070

The PID file is removed when the script exits. Before starting, the script

1071

checks if the PID file already exists. If it does not, then the script creates

1072

and writes its own PID to it. If it does, then the script checks the following:

1073

if the file contains a PID and a process is running with that PID, then

1074

the script dies; or, if there is no process running with that PID, then the

1075

script overwrites the file with its own PID and starts; else, if the file

1076

contains no PID, then the script dies.
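The PID file rules above amount to a small decision procedure, sketched here in Python. The function names are hypothetical, the liveness check is a best-effort POSIX idiom, and race conditions between checking and writing the file are ignored.

```python
import os

def process_is_running(pid):
    """Best-effort POSIX check: signal 0 tests for existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

def claim_pid_file(path, is_running=process_is_running):
    """Create or take over the PID file; raise RuntimeError where the script dies."""
    if os.path.exists(path):
        with open(path) as fh:
            content = fh.read().strip()
        if not content:
            raise RuntimeError(f"PID file {path} exists but contains no PID")
        if is_running(int(content)):
            raise RuntimeError(f"PID file {path} exists and PID {content} is running")
    # Either the file did not exist or it held a stale PID: write our own.
    with open(path, "w") as fh:
        fh.write(f"{os.getpid()}\n")
```

A caller would invoke ``claim_pid_file("/tmp/example.pid")`` at startup (the path is only an example) and remove the file on exit.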

1077

1078

1079

1080

--port

1081

1082

short form: -P; type: int

1083

1084

Port number to use for connection.

1085

1086

1087

1088

--print

1089

1090

Print queries that will resolve differences.

1091

1092

If you don't trust \ ``pt-table-sync``\ , or just want to see what it will do, this

1093

is a good way to be safe. These queries are valid SQL and you can run them

1094

yourself if you want to sync the tables manually.

1095

1096

1097

1098

--recursion-method

1099

1100

type: string

1101

1102

Preferred recursion method used to find slaves.

1103

1104

Possible methods are:

1105

1106

1107

===========  ================
METHOD       USES
===========  ================
processlist  SHOW PROCESSLIST
hosts        SHOW SLAVE HOSTS
===========  ================

1113

1114

1115

The processlist method is preferred because SHOW SLAVE HOSTS is not reliable.

1116

However, the hosts method is required if the server uses a non-standard

1117

port (not 3306). Usually pt-table-sync does the right thing and finds

1118

the slaves, but you may give a preferred method and it will be used first.

1119

If it doesn't find any slaves, the other methods will be tried.

1120

1121

1122

1123

--replace

1124

1125

Write all \ ``INSERT``\ and \ ``UPDATE``\ statements as \ ``REPLACE``\ .

1126

1127

This is automatically switched on as needed when there are unique index

1128

violations.

1129

1130

1131

1132

--replicate

1133

1134

type: string

1135

1136

Sync tables listed as different in this table.

1137

1138

Specifies that \ ``pt-table-sync``\ should examine the specified table to find data

1139

that differs. The table is exactly the same as the argument of the same name to

1140

pt-table-checksum. That is, it contains records of which tables (and ranges

1141

of values) differ between the master and slave.

1142

1143

For each table and range of values that shows differences between the master and
slave, ``pt-table-sync`` will sync that table, with the appropriate ``WHERE``
clause, to its master.

1146

1147

This automatically sets "--wait" to 60 and causes changes to be made on the

1148

master instead of the slave.

1149

1150

If "--sync-to-master" is specified, the tool will assume the server you

1151

specified is the slave, and connect to the master as usual to sync.

1152

1153

Otherwise, it will try to use \ ``SHOW PROCESSLIST``\ to find slaves of the server

1154

you specified. If it is unable to find any slaves via \ ``SHOW PROCESSLIST``\ , it

1155

will inspect \ ``SHOW SLAVE HOSTS``\ instead. You must configure each slave's

1156

\ ``report-host``\ , \ ``report-port``\ and other options for this to work right. After

1157

finding slaves, it will inspect the specified table on each slave to find data

1158

that needs to be synced, and sync it.

1159

1160

The tool examines the master's copy of the table first, assuming that the master

1161

is potentially a slave as well. Any table that shows differences there will

1162

\ **NOT**\ be synced on the slave(s). For example, suppose your replication is set

1163

up as A->B, B->C, B->D. Suppose you use this argument and specify server B.

1164

The tool will examine server B's copy of the table. If it looks like server B's

1165

data in table \ ``test.tbl1``\ is different from server A's copy, the tool will not

1166

sync that table on servers C and D.

1167

1168

1169

1170

--set-vars

1171

1172

type: string; default: wait_timeout=10000

1173

1174

Set these MySQL variables. Immediately after connecting to MySQL, this

1175

string will be appended to SET and executed.

1176

1177

1178

1179

--socket

1180

1181

short form: -S; type: string

1182

1183

Socket file to use for connection.

1184

1185

1186

1187

--sync-to-master

1188

1189

Treat the DSN as a slave and sync it to its master.

1190

1191

Treat the server you specified as a slave. Inspect \ ``SHOW SLAVE STATUS``\ ,

1192

connect to the server's master, and treat the master as the source and the slave

1193

as the destination. Causes changes to be made on the master. Sets "--wait"

1194

to 60 by default, sets "--lock" to 1 by default, and disables

1195

"--[no]transaction" by default. See also "--replicate", which changes

1196

this option's behavior.

1197

1198

1199

1200

--tables

1201

1202

short form: -t; type: hash

1203

1204

Sync only this comma-separated list of tables.

1205

1206

Table names may be qualified with the database name.

1207

1208

1209

1210

--timeout-ok

1211

1212

Keep going if "--wait" fails.

1213

1214

If you specify "--wait" and the slave doesn't catch up to the master's

1215

position before the wait times out, the default behavior is to abort. This

1216

option makes the tool keep going anyway. \ **Warning**\ : if you are trying to get a

1217

consistent comparison between the two servers, you probably don't want to keep

1218

going after a timeout.

1219

1220

1221

1222

--[no]transaction

1223

1224

Use transactions instead of \ ``LOCK TABLES``\ .

1225

1226

The granularity of beginning and committing transactions is controlled by

1227

"--lock". This is enabled by default, but since "--lock" is disabled by

1228

default, it has no effect.

1229

1230

Most options that enable locking also disable transactions by default, so if
you want to use transactional locking (via ``LOCK IN SHARE MODE`` and
``FOR UPDATE``), you must specify ``--transaction`` explicitly.

1233

1234

If you don't specify \ ``--transaction``\ explicitly \ ``pt-table-sync``\ will decide on

1235

a per-table basis whether to use transactions or table locks. It currently

1236

uses transactions on InnoDB tables, and table locks on all others.

1237

1238

If \ ``--no-transaction``\ is specified, then \ ``pt-table-sync``\ will not use

1239

transactions at all (not even for InnoDB tables) and locking is controlled

1240

by "--lock".

1241

1242

When enabled, either explicitly or implicitly, the transaction isolation level
is set to ``REPEATABLE READ`` and transactions are started
``WITH CONSISTENT SNAPSHOT``.

1245

1246

1247

1248

--trim

1249

1250

\ ``TRIM()``\ \ ``VARCHAR``\ columns in \ ``BIT_XOR``\ and \ ``ACCUM``\ modes. Helps when

1251

comparing MySQL 4.1 to >= 5.0.

1252

1253

This is useful when you don't care about the trailing space differences between

1254

MySQL versions which vary in their handling of trailing spaces. MySQL 5.0 and

1255

later all retain trailing spaces in \ ``VARCHAR``\ , while previous versions would

1256

remove them.

1257

1258

1259

1260

--[no]unique-checks

1261

1262

default: yes

1263

1264

Enable unique key checks (\ ``SET UNIQUE_CHECKS=1``\ ).

1265

1266

Specifying \ ``--no-unique-checks``\ will \ ``SET UNIQUE_CHECKS=0``\ .

1267

1268

1269

1270

--user

1271

1272

short form: -u; type: string

1273

1274

User for login if not current user.

1275

1276

1277

1278

--verbose

1279

1280

short form: -v; cumulative: yes

1281

1282

Print results of sync operations.

1283

1284

See "OUTPUT" for more details about the output.

1285

1286

1287

1288

--version

1289

1290

Show version and exit.

1291

1292

1293

1294

--wait

1295

1296

short form: -w; type: time

1297

1298

How long to wait for slaves to catch up to their master.

1299

1300

Make the master wait for the slave to catch up in replication before comparing

1301

the tables. The value is the number of seconds to wait before timing out (see

1302

also "--timeout-ok"). Sets "--lock" to 1 and "--[no]transaction" to 0

1303

by default. If you see an error such as the following,

1304

1305

1306

.. code-block:: perl

1307

1308

MASTER_POS_WAIT returned -1

1309

1310

1311

It means the timeout was exceeded and you need to increase it.

1312

1313

The default value of this option is influenced by other options. To see what

1314

value is in effect, run with "--help".

1315

1316

To disable waiting entirely (except for locks), specify "--wait" 0. This

1317

helps when the slave is lagging on tables that are not being synced.

1318

1319

1320

1321

--where

1322

1323

type: string

1324

1325

\ ``WHERE``\ clause to restrict syncing to part of the table.

1326

1327

1328

1329

--[no]zero-chunk

1330

1331

default: yes

1332

1333

Add a chunk for rows with zero or zero-equivalent values. This only has an
effect when "--chunk-size" is specified. The purpose of the zero chunk
is to capture a potentially large number of zero values that would imbalance
the size of the first chunk. For example, if a lot of negative numbers were
inserted into an unsigned integer column causing them to be stored as zeros,
then these zero values are captured by the zero chunk instead of the first
chunk and all its non-zero values.

1340

1341

1342

1343

1344

***********

1345

DSN OPTIONS

1346

***********

1347

1348

1349

These DSN options are used to create a DSN. Each option is given like

1350

\ ``option=value``\ . The options are case-sensitive, so P and p are not the

1351

same option. There cannot be whitespace before or after the \ ``=``\ and

1352

if the value contains whitespace it must be quoted. DSN options are

1353

comma-separated. See the percona-toolkit manpage for full details.
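The DSN syntax rules above can be sketched as a tiny parser. This is not the toolkit's parser: ``parse_dsn`` is a hypothetical helper that only handles the simple ``option=value,option=value`` form and does not support values containing commas.

```python
def parse_dsn(dsn):
    """Split "h=c1,P=3306" into {"h": "c1", "P": "3306"}; keys stay case-sensitive."""
    options = {}
    for part in dsn.split(","):
        key, sep, value = part.partition("=")
        # Reject a missing "=", an empty key, or whitespace around the "=".
        if not sep or not key or key != key.strip() or value != value.strip():
            raise ValueError(f"malformed DSN part: {part!r}")
        options[key] = value
    return options

print(parse_dsn("h=c1,P=3306,p=secret"))
```

Because keys are kept as written, ``P`` (port) and ``p`` (password) end up as separate entries.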

1354

1355

1356

\* A

1357

1358

dsn: charset; copy: yes

1359

1360

Default character set.

1361

1362

1363

1364

\* D

1365

1366

dsn: database; copy: yes

1367

1368

Database containing the table to be synced.

1369

1370

1371

1372

\* F

1373

1374

dsn: mysql_read_default_file; copy: yes

1375

1376

Only read default options from the given file.

1377

1378

1379

1380

\* h

1381

1382

dsn: host; copy: yes

1383

1384

Connect to host.

1385

1386

1387

1388

\* p

1389

1390

dsn: password; copy: yes

1391

1392

Password to use when connecting.

1393

1394

1395

1396

\* P

1397

1398

dsn: port; copy: yes

1399

1400

Port number to use for connection.

1401

1402

1403

1404

\* S

1405

1406

dsn: mysql_socket; copy: yes

1407

1408

Socket file to use for connection.

1409

1410

1411

1412

\* t

1413

1414

copy: yes

1415

1416

Table to be synced.

1417

1418

1419

1420

\* u

1421

1422

dsn: user; copy: yes

1423

1424

User for login if not current user.

1425

1426

1427

1428

1429

***********

1430

DOWNLOADING

1431

***********

1432

1433

1434

Visit `http://www.percona.com/software/ <http://www.percona.com/software/>`_ to download the latest release of

1435

Percona Toolkit. Or, to get the latest release from the command line:

1436

1437

1438

.. code-block:: bash

 wget percona.com/latest/percona-toolkit/PKG

1441

1442

1443

Replace \ ``PKG``\ with \ ``tar``\ , \ ``rpm``\ , or \ ``deb``\ to download the package in that

1444

format. You can also get individual tools from the latest release:

1445

1446

1447

.. code-block:: bash

 wget percona.com/latest/percona-toolkit/TOOL

1450

1451

1452

Replace \ ``TOOL``\ with the name of any tool.

1453

1454

1455

***********

1456

ENVIRONMENT

1457

***********

1458

1459

1460

The environment variable \ ``PTDEBUG``\ enables verbose debugging output to STDERR.

1461

To enable debugging and capture all output to a file, run the tool like:

1462

1463

1464

.. code-block:: bash

 PTDEBUG=1 pt-table-sync ... > FILE 2>&1

1467

1468

1469

Be careful: debugging output is voluminous and can generate several megabytes

1470

of output.

1471

1472

1473

*******************

1474

SYSTEM REQUIREMENTS

1475

*******************

1476

1477

1478

You need Perl, DBI, DBD::mysql, and some core packages that ought to be

1479

installed in any reasonably new version of Perl.

****
BUGS
****


For a list of known bugs, see `http://www.percona.com/bugs/pt-table-sync <http://www.percona.com/bugs/pt-table-sync>`_.

Please report bugs at `https://bugs.launchpad.net/percona-toolkit <https://bugs.launchpad.net/percona-toolkit>`_.
Include the following information in your bug report:


\* The complete command line used to run the tool

\* The tool's ``--version`` output

\* The MySQL version of all servers involved

\* Output from the tool, including STDERR

\* Input files (log/dump/config files, etc.)


If possible, include debugging output by running the tool with ``PTDEBUG``;
see "ENVIRONMENT".
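
The checklist above could be turned into a small report skeleton along these lines. This is a sketch under assumptions: ``bug_report_skeleton`` is a made-up helper, the only tool option it uses is ``--version`` (named in the list above), and the MySQL versions and attachments are left as placeholders to fill in by hand.

```shell
# Hypothetical helper: prints a bug-report skeleton for a command line.
# The first argument is the tool name; all arguments together are
# recorded as the command line that was run.
bug_report_skeleton() {
    tool=$1
    echo "Command line: $*"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "Tool version: $("$tool" --version 2>&1)"
    else
        echo "Tool version: ($tool not found on PATH)"
    fi
    echo "MySQL versions: (fill in for each server involved)"
    echo "Tool output, including STDERR: (attach)"
    echo "Input files: (attach)"
}

# Replace "..." with the options you actually used.
bug_report_skeleton pt-table-sync ...
```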

*******
AUTHORS
*******


Baron Schwartz

***************
ACKNOWLEDGMENTS
***************


My work is based in part on Giuseppe Maxia's work on distributed databases,
`http://www.sysadminmag.com/articles/2004/0408/ <http://www.sysadminmag.com/articles/2004/0408/>`_ and code derived from that
article. There is more explanation, and a link to the code, at
`http://www.perlmonks.org/?node_id=381053 <http://www.perlmonks.org/?node_id=381053>`_.

Another programmer extended Maxia's work even further. Fabien Coelho changed
and generalized Maxia's technique, introducing symmetry and avoiding some
problems that might have caused too-frequent checksum collisions. This work
grew into pg_comparator, `http://www.coelho.net/pg_comparator/ <http://www.coelho.net/pg_comparator/>`_. Coelho also
explained the technique further in a paper titled "Remote Comparison of Database
Tables" (`http://cri.ensmp.fr/classement/doc/A-375.pdf <http://cri.ensmp.fr/classement/doc/A-375.pdf>`_).

This existing literature mostly addressed how to find the differences between
the tables, not how to resolve them once found. I needed a tool that would not
only find them efficiently, but would then resolve them. I first began thinking
about how to improve the technique further with my article
`http://tinyurl.com/mysql-data-diff-algorithm <http://tinyurl.com/mysql-data-diff-algorithm>`_,
where I discussed a number of problems with the Maxia/Coelho "bottom-up"
algorithm. After writing that article, I began to write this tool. I wanted to
actually implement their algorithm with some improvements so I was sure I
understood it completely. I discovered it is not what I thought it was, and is
considerably more complex than it appeared to me at first. Fabien Coelho was
kind enough to address some questions over email.

The first versions of this tool implemented a version of the Coelho/Maxia
algorithm, which I called "bottom-up", and my own, which I called "top-down."
Those algorithms are considerably more complex than the current algorithms and
I have removed them from this tool, and may add them back later. The
improvements to the bottom-up algorithm are my original work, as is the
top-down algorithm. The techniques to actually resolve the differences are
also my own work.

Another tool that can synchronize tables is the SQLyog Job Agent from webyog.
Thanks to Rohit Nadhani, SJA's author, for the conversations about the general
techniques. There is a comparison of pt-table-sync and SJA at
`http://tinyurl.com/maatkit-vs-sqlyog <http://tinyurl.com/maatkit-vs-sqlyog>`_.

Thanks to the following people and organizations for helping in many ways:

The Rimm-Kaufman Group `http://www.rimmkaufman.com/ <http://www.rimmkaufman.com/>`_,
MySQL AB `http://www.mysql.com/ <http://www.mysql.com/>`_,
Blue Ridge InternetWorks `http://www.briworks.com/ <http://www.briworks.com/>`_,
Percona `http://www.percona.com/ <http://www.percona.com/>`_,
Fabien Coelho,
Giuseppe Maxia and others at MySQL AB,
Kristian Koehntopp (MySQL AB),
Rohit Nadhani (WebYog),
The helpful monks at Perlmonks,
And others too numerous to mention.

*********************
ABOUT PERCONA TOOLKIT
*********************


This tool is part of Percona Toolkit, a collection of advanced command-line
tools developed by Percona for MySQL support and consulting. Percona Toolkit
was forked from two projects in June, 2011: Maatkit and Aspersa. Those
projects were created by Baron Schwartz and developed primarily by him and
Daniel Nichter, both of whom are employed by Percona. Visit
`http://www.percona.com/software/ <http://www.percona.com/software/>`_ for more software developed by Percona.

********************************
COPYRIGHT, LICENSE, AND WARRANTY
********************************


Feedback and improvements are welcome.

THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, version 2; OR the Perl Artistic License. On UNIX and similar
systems, you can issue ``man perlgpl`` or ``man perlartistic`` to read these
licenses.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA.

*******
VERSION
*******


Percona Toolkit v1.0.0 released 2011-08-01