<!-- doc/src/sgml/maintenance.sgml -->

<chapter id="maintenance">

<title>Routine Database Maintenance Tasks</title>

<indexterm zone="maintenance">
<primary>maintenance</primary>

<indexterm zone="maintenance">
<primary>routine maintenance</primary>

<productname>PostgreSQL</>, like any database software, requires that certain tasks
be performed regularly to achieve optimum performance. The tasks
discussed here are <emphasis>required</emphasis>, but they
are repetitive in nature and can easily be automated using standard
tools such as <application>cron</application> scripts or
Windows' <application>Task Scheduler</>. It is the database
administrator's responsibility to set up appropriate scripts, and to
check that they execute successfully.

One obvious maintenance task is the creation of backup copies of the data on a
regular schedule. Without a recent backup, you have no chance of recovery
after a catastrophe (disk failure, fire, mistakenly dropping a critical
table, etc.). The backup and recovery mechanisms available in
<productname>PostgreSQL</productname> are discussed at length in
<xref linkend="backup">.

The other main category of maintenance task is periodic <quote>vacuuming</>
of the database. This activity is discussed in
<xref linkend="routine-vacuuming">. Closely related to this is updating
the statistics that will be used by the query planner, as discussed in
<xref linkend="vacuum-for-statistics">.

Another task that might need periodic attention is log file management.
This is discussed in <xref linkend="logfile-maintenance">.

The external utility <ulink
url="http://bucardo.org/wiki/Check_postgres"><application>check_postgres</></ulink>
is available for monitoring database health and reporting unusual
conditions. <application>check_postgres</> integrates with
Nagios and MRTG, but can be run standalone too.

<productname>PostgreSQL</productname> is low-maintenance compared
to some other database management systems. Nonetheless,
appropriate attention to these tasks will go far towards ensuring a
pleasant and productive experience with the system.
<sect1 id="routine-vacuuming">

<title>Routine Vacuuming</title>

<indexterm zone="routine-vacuuming">
<primary>vacuum</primary>

<productname>PostgreSQL</productname> databases require periodic
maintenance known as <firstterm>vacuuming</>. For many installations, it
is sufficient to let vacuuming be performed by the <firstterm>autovacuum
daemon</>, which is described in <xref linkend="autovacuum">. You might
need to adjust the autovacuuming parameters described there to obtain best
results for your situation. Some database administrators will want to
supplement or replace the daemon's activities with manually-managed
<command>VACUUM</> commands, which typically are executed according to a
schedule by <application>cron</application> or <application>Task
Scheduler</> scripts. To set up manually-managed vacuuming properly,
it is essential to understand the issues discussed in the next few
subsections. Administrators who rely on autovacuuming may still wish
to skim this material to help them understand and adjust autovacuuming.

<sect2 id="vacuum-basics">
<title>Vacuuming Basics</title>

<productname>PostgreSQL</productname>'s
<xref linkend="sql-vacuum"> command has to
process each table on a regular basis for several reasons:
<simpara>To recover or reuse disk space occupied by updated or deleted
rows.</simpara>

<simpara>To update data statistics used by the
<productname>PostgreSQL</productname> query planner.</simpara>

<simpara>To protect against loss of very old data due to
<firstterm>transaction ID wraparound</>.</simpara>

Each of these reasons dictates performing <command>VACUUM</> operations
of varying frequency and scope, as explained in the following subsections.

There are two variants of <command>VACUUM</>: standard <command>VACUUM</>
and <command>VACUUM FULL</>. <command>VACUUM FULL</> can reclaim more
disk space but runs much more slowly. Also,
the standard form of <command>VACUUM</> can run in parallel with production
database operations. (Commands such as <command>SELECT</command>,
<command>INSERT</command>, <command>UPDATE</command>, and
<command>DELETE</command> will continue to function normally, though you
will not be able to modify the definition of a table with commands such as
<command>ALTER TABLE</command> while it is being vacuumed.)
<command>VACUUM FULL</> requires an exclusive lock on the table it is
working on, and therefore cannot be done in parallel with other use
of the table. Generally, therefore,
administrators should strive to use standard <command>VACUUM</> and
avoid <command>VACUUM FULL</>.

<command>VACUUM</command> creates a substantial amount of I/O
traffic, which can cause poor performance for other active sessions.
There are configuration parameters that can be adjusted to reduce the
performance impact of background vacuuming — see
<xref linkend="runtime-config-resource-vacuum-cost">.
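<para>
As an illustration, cost-based vacuum delay can be enabled for manual
<command>VACUUM</> runs with settings like the following in
<filename>postgresql.conf</>; the values shown are merely illustrative
starting points, not recommendations:
<programlisting>
vacuum_cost_delay = 20ms     # sleep after each batch of page operations
vacuum_cost_limit = 200      # accumulated cost that triggers a sleep
</programlisting>
</para>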
<sect2 id="vacuum-for-space-recovery">
<title>Recovering Disk Space</title>

<indexterm zone="vacuum-for-space-recovery">
<primary>disk space</primary>

In <productname>PostgreSQL</productname>, an
<command>UPDATE</> or <command>DELETE</> of a row does not
immediately remove the old version of the row.
This approach is necessary to gain the benefits of multiversion
concurrency control (<acronym>MVCC</>, see <xref linkend="mvcc">): the row version
must not be deleted while it is still potentially visible to other
transactions. But eventually, an outdated or deleted row version is no
longer of interest to any transaction. The space it occupies must then be
reclaimed for reuse by new rows, to avoid unbounded growth of disk
space requirements. This is done by running <command>VACUUM</>.

The standard form of <command>VACUUM</command> removes dead row
versions in tables and indexes and marks the space available for
future reuse. However, it will not return the space to the operating
system, except in the special case where one or more pages at the
end of a table become entirely free and an exclusive table lock can be
easily obtained. In contrast, <command>VACUUM FULL</> actively compacts
tables by writing a complete new version of the table file with no dead
space. This minimizes the size of the table, but can take a long time.
It also requires extra disk space for the new copy of the table, until
the operation completes.

The usual goal of routine vacuuming is to do standard <command>VACUUM</>s
often enough to avoid needing <command>VACUUM FULL</>. The
autovacuum daemon attempts to work this way, and in fact will
never issue <command>VACUUM FULL</>. In this approach, the idea
is not to keep tables at their minimum size, but to maintain steady-state
usage of disk space: each table occupies space equivalent to its
minimum size plus however much space gets used up between vacuumings.
Although <command>VACUUM FULL</> can be used to shrink a table back
to its minimum size and return the disk space to the operating system,
there is not much point in this if the table will just grow again in the
future. Thus, moderately-frequent standard <command>VACUUM</> runs are a
better approach than infrequent <command>VACUUM FULL</> runs for
maintaining heavily-updated tables.

Some administrators prefer to schedule vacuuming themselves, for example
doing all the work at night when load is low.
The difficulty with doing vacuuming according to a fixed schedule
is that if a table has an unexpected spike in update activity, it may
get bloated to the point that <command>VACUUM FULL</> is really necessary
to reclaim space. Using the autovacuum daemon alleviates this problem,
since the daemon schedules vacuuming dynamically in response to update
activity. It is unwise to disable the daemon completely unless you
have an extremely predictable workload. One possible compromise is
to set the daemon's parameters so that it will only react to unusually
heavy update activity, thus keeping things from getting out of hand,
while scheduled <command>VACUUM</>s are expected to do the bulk of the
work when the load is typical.

For those not using autovacuum, a typical approach is to schedule a
database-wide <command>VACUUM</> once a day during a low-usage period,
supplemented by more frequent vacuuming of heavily-updated tables as
necessary. (Some installations with extremely high update rates vacuum
their busiest tables as often as once every few minutes.) If you have
multiple databases in a cluster, don't forget to
<command>VACUUM</command> each one; the program <xref
linkend="app-vacuumdb"> might be helpful.
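<para>
As an illustration only, a nightly database-wide run could be driven by a
<application>cron</> entry such as the following, executed as the database
superuser; the time of day is hypothetical and should be adapted to the
local low-usage period:
<programlisting>
# vacuum and analyze all databases in the cluster at 3am
0 3 * * *   vacuumdb --all --analyze --quiet
</programlisting>
</para>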
Plain <command>VACUUM</> may not be satisfactory when
a table contains large numbers of dead row versions as a result of
massive update or delete activity. If you have such a table and
you need to reclaim the excess disk space it occupies, you will need
to use <command>VACUUM FULL</>, or alternatively
<xref linkend="sql-cluster">
or one of the table-rewriting variants of
<xref linkend="sql-altertable">.
These commands write an entire new copy of the table and build
new indexes for it. All these options require an exclusive lock. Note that
they also temporarily use extra disk space approximately equal to the size
of the table, since the old copies of the table and indexes can't be
released until the new ones are complete.

If you have a table whose entire contents are deleted on a periodic
basis, consider doing it with
<xref linkend="sql-truncate"> rather
than using <command>DELETE</command> followed by
<command>VACUUM</command>. <command>TRUNCATE</command> removes the
entire content of the table immediately, without requiring a
subsequent <command>VACUUM</command> or <command>VACUUM
FULL</command> to reclaim the now-unused disk space.
The disadvantage is that strict MVCC semantics are violated.
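<para>
For instance, a hypothetical staging table could be emptied in one step:
<programlisting>
TRUNCATE daily_staging;
</programlisting>
This reclaims the table's disk space immediately, whereas
<command>DELETE FROM daily_staging;</command> would leave dead row
versions behind for a later <command>VACUUM</> to clean up. The table
name here is illustrative only.
</para>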
<sect2 id="vacuum-for-statistics">
<title>Updating Planner Statistics</title>

<indexterm zone="vacuum-for-statistics">
<primary>statistics</primary>
<secondary>of the planner</secondary>

<indexterm zone="vacuum-for-statistics">
<primary>ANALYZE</primary>

The <productname>PostgreSQL</productname> query planner relies on
statistical information about the contents of tables in order to
generate good plans for queries. These statistics are gathered by
the <xref linkend="sql-analyze"> command,
which can be invoked by itself or
as an optional step in <command>VACUUM</>. It is important to have
reasonably accurate statistics, otherwise poor choices of plans might
degrade database performance.

The autovacuum daemon, if enabled, will automatically issue
<command>ANALYZE</> commands whenever the content of a table has
changed sufficiently. However, administrators might prefer to rely
on manually-scheduled <command>ANALYZE</> operations, particularly
if it is known that update activity on a table will not affect the
statistics of <quote>interesting</> columns. The daemon schedules
<command>ANALYZE</> strictly as a function of the number of rows
inserted or updated; it has no knowledge of whether that will lead
to meaningful statistical changes.

As with vacuuming for space recovery, frequent updates of statistics
are more useful for heavily-updated tables than for seldom-updated
ones. But even for a heavily-updated table, there might be no need for
statistics updates if the statistical distribution of the data is
not changing much. A simple rule of thumb is to think about how much
the minimum and maximum values of the columns in the table change.
For example, a <type>timestamp</type> column that contains the time
of row update will have a constantly-increasing maximum value as
rows are added and updated; such a column will probably need more
frequent statistics updates than, say, a column containing URLs for
pages accessed on a website. The URL column might receive changes just
as often, but the statistical distribution of its values probably
changes relatively slowly.

It is possible to run <command>ANALYZE</> on specific tables and even
just specific columns of a table, so the flexibility exists to update some
statistics more frequently than others if your application requires it.
In practice, however, it is usually best to just analyze the entire
database, because it is a fast operation. <command>ANALYZE</> uses a
statistically random sampling of the rows of a table rather than reading
every single row.

Although per-column tweaking of <command>ANALYZE</> frequency might not be
very productive, you might find it worthwhile to do per-column
adjustment of the level of detail of the statistics collected by
<command>ANALYZE</>. Columns that are heavily used in <literal>WHERE</>
clauses and have highly irregular data distributions might require a
finer-grain data histogram than other columns. See <command>ALTER TABLE
SET STATISTICS</>, or change the database-wide default using the <xref
linkend="guc-default-statistics-target"> configuration parameter.
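<para>
For example, to collect more detailed statistics for one heavily-filtered
column (the table and column names below are illustrative only):
<programlisting>
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;
</programlisting>
Here <literal>500</> replaces the default statistics target for that
column alone; the next <command>ANALYZE</> on the table will then build a
finer-grain histogram for it.
</para>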
Also, by default there is limited information available about
the selectivity of functions. However, if you create an expression
index that uses a function call, useful statistics will be
gathered about the function, which can greatly improve query
plans that use the expression index.

<sect2 id="vacuum-for-wraparound">
<title>Preventing Transaction ID Wraparound Failures</title>

<indexterm zone="vacuum-for-wraparound">
<primary>transaction ID</primary>
<secondary>wraparound</secondary>

<productname>PostgreSQL</productname>'s MVCC transaction semantics
depend on being able to compare transaction ID (<acronym>XID</>)
numbers: a row version with an insertion XID greater than the current
transaction's XID is <quote>in the future</> and should not be visible
to the current transaction. But since transaction IDs have limited size
(32 bits) a cluster that runs for a long time (more
than 4 billion transactions) would suffer <firstterm>transaction ID
wraparound</>: the XID counter wraps around to zero, and all of a sudden
transactions that were in the past appear to be in the future — which
means their output becomes invisible. In short, catastrophic data loss.
(Actually the data is still there, but that's cold comfort if you cannot
get at it.) To avoid this, it is necessary to vacuum every table
in every database at least once every two billion transactions.
The reason that periodic vacuuming solves the problem is that
<productname>PostgreSQL</productname> reserves a special XID
as <literal>FrozenXID</>. This XID does not follow the normal XID
comparison rules and is always considered older
than every normal XID. Normal XIDs are
compared using modulo-2<superscript>31</> arithmetic. This means
that for every normal XID, there are two billion XIDs that are
<quote>older</> and two billion that are <quote>newer</>; another
way to say it is that the normal XID space is circular with no
endpoint. Therefore, once a row version has been created with a particular
normal XID, the row version will appear to be <quote>in the past</> for
the next two billion transactions, no matter which normal XID we are
talking about. If the row version still exists after more than two billion
transactions, it will suddenly appear to be in the future. To
prevent this, old row versions must be reassigned the XID
<literal>FrozenXID</> sometime before they reach the
two-billion-transactions-old mark. Once they are assigned this
special XID, they will appear to be <quote>in the past</> to all
normal transactions regardless of wraparound issues, and so such
row versions will be valid until deleted, no matter how long that is.
This reassignment of old XIDs is handled by <command>VACUUM</>.

<xref linkend="guc-vacuum-freeze-min-age">
controls how old an XID value has to be before it's replaced with
<literal>FrozenXID</>. Larger values of this setting
preserve transactional information longer, while smaller values increase
the number of transactions that can elapse before the table must be
vacuumed again.

<command>VACUUM</> normally skips pages that don't have any dead row
versions, but those pages might still have row versions with old XID
values. To ensure all old XIDs have been replaced by
<literal>FrozenXID</>, a scan of the whole table is needed.
<xref linkend="guc-vacuum-freeze-table-age"> controls when
<command>VACUUM</> does that: a whole table sweep is forced if
the table hasn't been fully scanned for <varname>vacuum_freeze_table_age</>
minus <varname>vacuum_freeze_min_age</> transactions. Setting it to 0
forces <command>VACUUM</> to always scan all pages, effectively ignoring
the page-skipping optimization just described.
The maximum time that a table can go unvacuumed is two billion
transactions minus the <varname>vacuum_freeze_min_age</> value at
the time <command>VACUUM</> last scanned the whole table. If it were to go
unvacuumed for longer than
that, data loss could result. To ensure that this does not happen,
autovacuum is invoked on any table that might contain XIDs older than the
age specified by the configuration parameter <xref
linkend="guc-autovacuum-freeze-max-age">. (This will happen even if
autovacuum is disabled.)

This implies that if a table is not otherwise vacuumed,
autovacuum will be invoked on it approximately once every
<varname>autovacuum_freeze_max_age</> minus
<varname>vacuum_freeze_min_age</> transactions.
For tables that are regularly vacuumed for space reclamation purposes,
this is of little importance. However, for static tables
(including tables that receive inserts, but no updates or deletes),
there is no need to vacuum for space reclamation, so it can
be useful to try to maximize the interval between forced autovacuums
on very large static tables. Obviously one can do this either by
increasing <varname>autovacuum_freeze_max_age</> or decreasing
<varname>vacuum_freeze_min_age</>.

The effective maximum for <varname>vacuum_freeze_table_age</> is 0.95 *
<varname>autovacuum_freeze_max_age</>; a setting higher than that will be
capped to the maximum. A value higher than
<varname>autovacuum_freeze_max_age</> wouldn't make sense because an
anti-wraparound autovacuum would be triggered at that point anyway, and
the 0.95 multiplier leaves some breathing room to run a manual
<command>VACUUM</> before that happens. As a rule of thumb,
<varname>vacuum_freeze_table_age</> should be set to a value somewhat
below <varname>autovacuum_freeze_max_age</>, leaving enough gap so that
a regularly scheduled <command>VACUUM</> or an autovacuum triggered by
normal delete and update activity is run in that window. Setting it too
close could lead to anti-wraparound autovacuums, even though the table
was recently vacuumed to reclaim space, whereas lower values lead to more
frequent whole-table scans.
The sole disadvantage of increasing <varname>autovacuum_freeze_max_age</>
(and <varname>vacuum_freeze_table_age</> along with it)
is that the <filename>pg_clog</> subdirectory of the database cluster
will take more space, because it must store the commit status of all
transactions back to the <varname>autovacuum_freeze_max_age</> horizon.
The commit status uses two bits per transaction, so if
<varname>autovacuum_freeze_max_age</> is set to its maximum allowed value of
a little less than two billion, <filename>pg_clog</> can be expected to
grow to about half a gigabyte. If this is trivial compared to your
total database size, setting <varname>autovacuum_freeze_max_age</> to
its maximum allowed value is recommended. Otherwise, set it depending
on what you are willing to allow for <filename>pg_clog</> storage.
(The default, 200 million transactions, translates to about 50MB of
<filename>pg_clog</> storage.)

One disadvantage of decreasing <varname>vacuum_freeze_min_age</> is that
it might cause <command>VACUUM</> to do useless work: changing a table row's
XID to <literal>FrozenXID</> is a waste of time if the row is modified
soon thereafter (causing it to acquire a new XID). So the setting should
be large enough that rows are not frozen until they are unlikely to change
any more. Another disadvantage of decreasing this setting is
that details about exactly which transaction inserted or modified a
row will be lost sooner. This information sometimes comes in handy,
particularly when trying to analyze what went wrong after a database
failure. For these two reasons, decreasing this setting is not
recommended except for completely static tables.
To track the age of the oldest XIDs in a database,
<command>VACUUM</> stores XID
statistics in the system tables <structname>pg_class</> and
<structname>pg_database</>. In particular,
the <structfield>relfrozenxid</> column of a table's
<structname>pg_class</> row contains the freeze cutoff XID that was used
by the last whole-table <command>VACUUM</> for that table. All normal
XIDs older than this cutoff XID are guaranteed to have been replaced by
<literal>FrozenXID</> within the table. Similarly,
the <structfield>datfrozenxid</> column of a database's
<structname>pg_database</> row is a lower bound on the normal XIDs
appearing in that database — it is just the minimum of the
per-table <structfield>relfrozenxid</> values within the database.
A convenient way to
examine this information is to execute queries such as:

<programlisting>
SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r';
SELECT datname, age(datfrozenxid) FROM pg_database;
</programlisting>

The <literal>age</> column measures the number of transactions from the
cutoff XID to the current transaction's XID.
<command>VACUUM</> normally
only scans pages that have been modified since the last vacuum, but
<structfield>relfrozenxid</> can only be advanced when the whole table is
scanned. The whole table is scanned when <structfield>relfrozenxid</> is
more than <varname>vacuum_freeze_table_age</> transactions old, when
<command>VACUUM</>'s <literal>FREEZE</> option is used, or when all pages
require vacuuming to remove dead row versions. When <command>VACUUM</>
scans the whole table, after it's finished <literal>age(relfrozenxid)</>
should be a little more than the <varname>vacuum_freeze_min_age</> setting
that was used (more by the number of transactions started since the
<command>VACUUM</> started). If no whole-table-scanning <command>VACUUM</>
is issued on the table until <varname>autovacuum_freeze_max_age</> is
reached, an autovacuum will soon be forced for the table.

If for some reason autovacuum fails to clear old XIDs from a table,
the system will begin to emit warning messages like this when the
database's oldest XIDs reach ten million transactions from the wraparound
point:

<programlisting>
WARNING: database "mydb" must be vacuumed within 177009986 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in "mydb".
</programlisting>

(A manual <command>VACUUM</> should fix the problem, as suggested by the
hint; but note that the <command>VACUUM</> must be performed by a
superuser, else it will fail to process system catalogs and thus not
be able to advance the database's <structfield>datfrozenxid</>.)
If these warnings are
ignored, the system will shut down and refuse to start any new
transactions once there are fewer than 1 million transactions left
until wraparound:

<programlisting>
ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
HINT: Stop the postmaster and use a standalone backend to VACUUM in "mydb".
</programlisting>

The 1-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
required <command>VACUUM</> commands. However, since the system will not
execute commands once it has gone into the safety shutdown mode,
the only way to do this is to stop the server and use a single-user
backend to execute <command>VACUUM</>. The shutdown mode is not enforced
by a single-user backend. See the <xref linkend="app-postgres"> reference
page for details about using a single-user backend.
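<para>
For illustration, such a recovery session could be started with a command
like the following, where the data directory path is only an example:
<programlisting>
postgres --single -D /usr/local/pgsql/data mydb
</programlisting>
and then <command>VACUUM;</command> can be issued at the single-user
backend's prompt.
</para>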
<sect2 id="autovacuum">
<title>The Autovacuum Daemon</title>

<primary>autovacuum</primary>
<secondary>general information</secondary>

<productname>PostgreSQL</productname> has an optional but highly
recommended feature called <firstterm>autovacuum</firstterm>,
whose purpose is to automate the execution of
<command>VACUUM</command> and <command>ANALYZE</command> commands.
When enabled, autovacuum checks for
tables that have had a large number of inserted, updated or deleted
tuples. These checks use the statistics collection facility;
therefore, autovacuum cannot be used unless <xref
linkend="guc-track-counts"> is set to <literal>true</literal>.
In the default configuration, autovacuuming is enabled and the related
configuration parameters are appropriately set.
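<para>
The default state can be sketched as the following
<filename>postgresql.conf</> fragment; both settings are on by default,
so this is shown only to make the prerequisites explicit:
<programlisting>
autovacuum = on        # enable the autovacuum launcher
track_counts = on      # statistics collection, required by autovacuum
</programlisting>
</para>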
The <quote>autovacuum daemon</> actually consists of multiple processes.
There is a persistent daemon process, called the
<firstterm>autovacuum launcher</firstterm>, which is in charge of starting
<firstterm>autovacuum worker</firstterm> processes for all databases. The
launcher will distribute the work across time, attempting to start one
worker within each database every <xref linkend="guc-autovacuum-naptime">
seconds. (Therefore, if the installation has <replaceable>N</> databases,
a new worker will be launched every
<varname>autovacuum_naptime</>/<replaceable>N</> seconds.)
A maximum of <xref linkend="guc-autovacuum-max-workers"> worker processes
are allowed to run at the same time. If there are more than
<varname>autovacuum_max_workers</> databases to be processed,
the next database will be processed as soon as the first worker finishes.
Each worker process will check each table within its database and
execute <command>VACUUM</> and/or <command>ANALYZE</> as needed.

If several large tables all become eligible for vacuuming in a short
amount of time, all autovacuum workers might become occupied with
vacuuming those tables for a long period. This would result
in other tables and databases not being vacuumed until a worker became
available. There is no limit on how many workers might be in a
single database, but workers do try to avoid repeating work that has
already been done by other workers. Note that the number of running
workers does not count towards <xref linkend="guc-max-connections"> or
<xref linkend="guc-superuser-reserved-connections"> limits.
Tables whose <structfield>relfrozenxid</> value is more than
<xref linkend="guc-autovacuum-freeze-max-age"> transactions old are always
vacuumed (this also applies to those tables whose freeze max age has
been modified via storage parameters; see below). Otherwise, if the
number of tuples obsoleted since the last
<command>VACUUM</command> exceeds the <quote>vacuum threshold</quote>, the
table is vacuumed. The vacuum threshold is defined as:

<programlisting>
vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
</programlisting>

where the vacuum base threshold is
<xref linkend="guc-autovacuum-vacuum-threshold">,
the vacuum scale factor is
<xref linkend="guc-autovacuum-vacuum-scale-factor">,
and the number of tuples is
<structname>pg_class</structname>.<structfield>reltuples</structfield>.
The number of obsolete tuples is obtained from the statistics
collector; it is a semi-accurate count updated by each
<command>UPDATE</command> and <command>DELETE</command> operation. (It
is only semi-accurate because some information might be lost under heavy
load.) If the <structfield>relfrozenxid</> value of the table is more
than <varname>vacuum_freeze_table_age</> transactions old, the whole
table is scanned to freeze old tuples and advance
<structfield>relfrozenxid</>, otherwise only pages that have been modified
since the last vacuum are scanned.
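<para>
As a worked example with the default settings (a base threshold of 50 and
a scale factor of 0.2), a table whose
<structfield>reltuples</> is 1,000,000 becomes eligible for vacuuming
once 50 + 0.2 * 1,000,000 = 200,050 tuples have been obsoleted since the
last <command>VACUUM</>.
</para>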
For analyze, a similar condition is used: the threshold, defined as:

<programlisting>
analyze threshold = analyze base threshold + analyze scale factor * number of tuples
</programlisting>

is compared to the total number of tuples inserted, updated, or deleted
since the last <command>ANALYZE</command>.

Temporary tables cannot be accessed by autovacuum. Therefore,
appropriate vacuum and analyze operations should be performed via
session SQL commands.

The default thresholds and scale factors are taken from
<filename>postgresql.conf</filename>, but it is possible to override them
on a table-by-table basis; see
<xref linkend="sql-createtable-storage-parameters"
endterm="sql-createtable-storage-parameters-title"> for more information.
If a setting
has been changed via storage parameters, that value is used; otherwise the
global settings are used. See <xref linkend="runtime-config-autovacuum"> for
more details on the global settings.
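<para>
For instance, a very large, heavily-updated table could be given a smaller
scale factor than the global default, so that vacuuming is triggered before
too much bloat accumulates; the table name and values here are
illustrative only:
<programlisting>
ALTER TABLE big_history SET (autovacuum_vacuum_scale_factor = 0.05,
                             autovacuum_vacuum_threshold = 1000);
</programlisting>
</para>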
Besides the base threshold values and scale factors, there are six
more autovacuum parameters that can be set for each table via
storage parameters.
The first parameter, <literal>autovacuum_enabled</>,
can be set to <literal>false</literal> to instruct the autovacuum daemon
to skip that particular table entirely. In this case
autovacuum will only touch the table if it must do so
to prevent transaction ID wraparound.
Another two parameters,
<varname>autovacuum_vacuum_cost_delay</> and
<varname>autovacuum_vacuum_cost_limit</>, are used to set
table-specific values for the cost-based vacuum delay feature
(see <xref linkend="runtime-config-resource-vacuum-cost">).
<varname>autovacuum_freeze_min_age</>,
<varname>autovacuum_freeze_max_age</> and
<varname>autovacuum_freeze_table_age</> are used to set
values for <xref linkend="guc-vacuum-freeze-min-age">,
<xref linkend="guc-autovacuum-freeze-max-age"> and
<xref linkend="guc-vacuum-freeze-table-age"> respectively.

When multiple workers are running, the cost limit is
<quote>balanced</quote> among all the running workers, so that the
total impact on the system is the same, regardless of the number
of workers actually running.
695
<sect1 id="routine-reindex">
696
<title>Routine Reindexing</title>
698
<indexterm zone="routine-reindex">
699
<primary>reindex</primary>
703
In some situations it is worthwhile to rebuild indexes periodically
704
with the <xref linkend="sql-reindex">
709
B-tree index pages that have become completely empty are reclaimed for
710
re-use. However, there is still a possibility
711
of inefficient use of space: if all but a few index keys on a page have
712
been deleted, the page remains allocated. Therefore, a usage
713
pattern in which most, but not all, keys in each range are eventually
714
deleted will see poor use of space. For such usage patterns,
715
periodic reindexing is recommended.
The potential for bloat in non-B-tree indexes has not been well
researched.  It is a good idea to periodically monitor the index's physical
size when using any non-B-tree index type.
Also, for B-tree indexes, a freshly-constructed index is slightly faster to
access than one that has been updated many times, because logically
adjacent pages are usually also physically adjacent in a newly built index.
(This consideration does not apply to non-B-tree indexes.)  It
might be worthwhile to reindex periodically just to improve access speed.
<sect1 id="logfile-maintenance">
 <title>Log File Maintenance</title>

 <indexterm zone="logfile-maintenance">
  <primary>server log</primary>
  <secondary>log file maintenance</secondary>
It is a good idea to save the database server's log output
somewhere, rather than just discarding it via <filename>/dev/null</>.
The log output is invaluable when diagnosing
problems.  However, the log output tends to be voluminous
(especially at higher debug levels) so you won't want to save it
indefinitely.  You need to <emphasis>rotate</> the log files so that
new log files are started and old ones removed after a reasonable
period of time.
If you simply direct the <systemitem>stderr</> of
<command>postgres</command> into a
file, you will have log output, but
the only way to truncate the log file is to stop and restart
the server.  This might be acceptable if you are using
<productname>PostgreSQL</productname> in a development environment,
but it is rarely tolerable on a production server.
A better approach is to send the server's
<systemitem>stderr</> output to some type of log rotation program.
There is a built-in log rotation facility, which you can use by
setting the configuration parameter <varname>logging_collector</> to
<literal>true</> in <filename>postgresql.conf</>.  The control
parameters for this program are described in <xref
linkend="runtime-config-logging-where">.  You can also use this approach
to capture the log data in machine readable <acronym>CSV</>
(comma-separated values) format.
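For instance, a minimal <filename>postgresql.conf</> setup using the
built-in collector might look like this (the values shown are
illustrative, not recommendations):
<programlisting>
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d
log_destination = 'stderr, csvlog'
</programlisting>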
Alternatively, you might prefer to use an external log rotation
program if you have one that you are already using with other
server software.  For example, the <application>rotatelogs</application>
tool included in the <productname>Apache</productname> distribution
can be used with <productname>PostgreSQL</productname>.  To do this,
just pipe the server's
<systemitem>stderr</> output to the desired program.
If you start the server with
<command>pg_ctl</>, then <systemitem>stderr</>
is already redirected to <systemitem>stdout</>, so you just need a
pipe command, for example:
<programlisting>
pg_ctl start | rotatelogs /var/log/pgsql_log 86400
</programlisting>
Another production-grade approach to managing log output is to
send it to <application>syslog</> and let
<application>syslog</> deal with file rotation.  To do this, set the
configuration parameter <varname>log_destination</> to <literal>syslog</>
(to log to <application>syslog</> only) in
<filename>postgresql.conf</>.  Then you can send a <literal>SIGHUP</literal>
signal to the <application>syslog</> daemon whenever you want to force it
to start writing a new log file.  If you want to automate log
rotation, the <application>logrotate</application> program can be
configured to work with log files from
<application>syslog</application>.
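A <application>logrotate</application> configuration for
syslog-collected server logs might look like this sketch (the log file
path is an assumption that depends on how your
<application>syslog</application> daemon is configured):
<programlisting>
/var/log/postgresql.log {
    weekly
    rotate 8
    compress
    postrotate
        /usr/bin/killall -HUP syslogd
    endscript
}
</programlisting>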
On many systems, however, <application>syslog</> is not very reliable,
particularly with large log messages; it might truncate or drop messages
just when you need them the most.  Also, on <productname>Linux</>,
<application>syslog</> will flush each message to disk, yielding poor
performance.  (You can use a <quote><literal>-</></> at the start of the file name
in the <application>syslog</> configuration file to disable syncing.)
Note that all the solutions described above take care of starting new
log files at configurable intervals, but they do not handle deletion
of old, no-longer-useful log files.  You will probably want to set
up a batch job to periodically delete old log files.  Another possibility
is to configure the rotation program so that old log files are overwritten
cyclically.
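For example, a nightly <application>cron</> job along these lines
would remove rotated log files more than 30 days old (the directory
and file name pattern are assumptions to adapt to your setup):
<programlisting>
find /var/log/pgsql -name 'postgresql-*.log' -mtime +30 -delete
</programlisting>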
<ulink url="http://pgfouine.projects.postgresql.org/">pgFouine</ulink>
is an external project that does sophisticated log file analysis.
<ulink
url="http://bucardo.org/wiki/Check_postgres">check_postgres</ulink>
provides Nagios alerts when important messages appear in the log
files, as well as detection of many other extraordinary conditions.