1
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
8
CONTENT="Modular DocBook HTML Stylesheet Version 1.79"><LINK
10
HREF="mailto:pgsql-docs@postgresql.org"><LINK
12
TITLE="PostgreSQL 9.1beta1 Documentation"
13
HREF="index.html"><LINK
15
TITLE="Database Physical Storage"
16
HREF="storage.html"><LINK
18
TITLE="Database File Layout"
19
HREF="storage-file-layout.html"><LINK
21
TITLE="Free Space Map"
22
HREF="storage-fsm.html"><LINK
25
HREF="stylesheet.css"><META
26
HTTP-EQUIV="Content-Type"
27
CONTENT="text/html; charset=ISO-8859-1"><META
29
CONTENT="2011-04-27T21:20:33"></HEAD
35
SUMMARY="Header navigation table"
47
>PostgreSQL 9.1beta1 Documentation</A
56
TITLE="Database File Layout"
57
HREF="storage-file-layout.html"
66
TITLE="Database Physical Storage"
74
>Chapter 55. Database Physical Storage</TD
80
TITLE="Database Physical Storage"
89
TITLE="Free Space Map"
90
HREF="storage-fsm.html"
108
>This section provides an overview of <ACRONYM
112
Oversized-Attribute Storage Technique).</P
117
> uses a fixed page size (commonly
118
8 kB), and does not allow tuples to span multiple pages. Therefore, it is
119
not possible to store very large field values directly. To overcome
120
this limitation, large field values are compressed and/or broken up into
121
multiple physical rows. This happens transparently to the user, with only
122
small impact on most of the backend code. The technique is affectionately
128
>"the best thing since sliced bread"</SPAN
131
>Only certain data types support <ACRONYM
134
> — there is no need to
135
impose the overhead on data types that cannot produce large field values.
139
>, a data type must have a variable-length
143
>) representation, in which the first 32-bit word of any
144
stored value contains the total length of the value in bytes (including
148
> does not constrain the rest of the representation.
149
All the C-level functions supporting a <ACRONYM
152
>-able data type must
153
be careful to handle <ACRONYM
156
>ed input values. (This is normally done
159
>PG_DETOAST_DATUM</CODE
160
> before doing anything with an input
161
value, but in some cases more efficient approaches are possible.)</P
166
> usurps two bits of the varlena length word (the high-order
167
bits on big-endian machines, the low-order bits on little-endian machines),
168
thereby limiting the logical size of any value of a <ACRONYM
172
data type to 1 GB (2<SUP
174
> - 1 bytes). When both bits are zero,
175
the value is an ordinary un-<ACRONYM
178
>ed value of the data type, and
179
the remaining bits of the length word give the total datum size (including
180
length word) in bytes. When the highest-order or lowest-order bit is set,
181
the value has only a single-byte header instead of the normal four-byte
182
header, and the remaining bits give the total datum size (including length
183
byte) in bytes. As a special case, if the remaining bits are all zero
184
(which would be impossible for a self-inclusive length), the value is a
185
pointer to out-of-line data stored in a separate TOAST table. (The size of
186
a TOAST pointer is given in the second byte of the datum.)
187
Values with single-byte headers aren't aligned on any particular
188
boundary, either. Lastly, when the highest-order or lowest-order bit is
189
clear but the adjacent bit is set, the content of the datum has been
190
compressed and must be decompressed before use. In this case the remaining
191
bits of the length word give the total size of the compressed datum, not the
192
original data. Note that compression is also possible for out-of-line data
193
but the varlena header does not tell whether it has occurred —
194
the content of the TOAST pointer tells that, instead.</P
196
>If any of the columns of a table are <ACRONYM
199
>-able, the table will
200
have an associated <ACRONYM
203
> table, whose OID is stored in the table's
214
>ed values are kept in the <ACRONYM
218
described in more detail below.</P
220
>The compression technique used is a fairly simple and very fast member
221
of the LZ family of compression techniques. See
224
>src/backend/utils/adt/pg_lzcompress.c</TT
225
> for the details.</P
227
>Out-of-line values are divided (after compression if used) into chunks of at
230
>TOAST_MAX_CHUNK_SIZE</TT
231
> bytes (by default this value is chosen
232
so that four chunk rows will fit on a page, making it about 2000 bytes).
234
as a separate row in the <ACRONYM
237
> table for the owning table. Every
241
> table has the columns <TT
245
identifying the particular <ACRONYM
252
> (a sequence number for the chunk within its value),
256
> (the actual data of the chunk). A unique index
264
retrieval of the values. A pointer datum representing an out-of-line
268
>ed value therefore needs to store the OID of the
272
> table in which to look and the OID of the specific value
276
>). For convenience, pointer datums also store the
277
logical datum size (original uncompressed data length) and actual stored size
278
(different if compression was applied). Allowing for the varlena header bytes,
279
the total size of a <ACRONYM
282
> pointer datum is therefore 18 bytes
283
regardless of the actual size of the represented value.</P
288
> code is triggered only
289
when a row value to be stored in a table is wider than
292
>TOAST_TUPLE_THRESHOLD</TT
293
> bytes (normally 2 kB).
297
> code will compress and/or move
298
field values out-of-line until the row value is shorter than
301
>TOAST_TUPLE_TARGET</TT
302
> bytes (also normally 2 kB)
303
or no more gains can be had. During an UPDATE
304
operation, values of unchanged fields are normally preserved as-is; so an
305
UPDATE of a row with out-of-line values incurs no <ACRONYM
309
none of the out-of-line values change.</P
314
> code recognizes four different strategies for storing
328
> prevents either compression or
329
out-of-line storage; furthermore it disables use of single-byte headers
331
This is the only possible strategy for
332
columns of non-<ACRONYM
343
> allows both compression and out-of-line
344
storage. This is the default for most <ACRONYM
348
Compression will be attempted first, then out-of-line storage if
349
the row is still too big.
357
> allows out-of-line storage but not
358
compression. Use of <TT
362
make substring operations on wide <TT
369
> columns faster (at the penalty of increased storage
370
space) because these operations are optimized to fetch only the
371
required parts of the out-of-line value when it is not compressed.
379
> allows compression but not out-of-line
380
storage. (Actually, out-of-line storage will still be performed
381
for such columns, but only as a last resort when there is no other
382
way to make the row small enough to fit on a page.)
391
>-able data type specifies a default strategy for columns
392
of that data type, but the strategy for a given table column can be altered
395
>ALTER TABLE SET STORAGE</TT
398
>This scheme has a number of advantages compared to a more straightforward
399
approach such as allowing row values to span pages. Assuming that queries are
400
usually qualified by comparisons against relatively small key values, most of
401
the work of the executor will be done using the main row entry. The big values
405
>ed attributes will only be pulled out (if selected at all)
406
at the time the result set is sent to the client. Thus, the main table is much
407
smaller and more of its rows fit in the shared buffer cache than would be the
408
case without any out-of-line storage. Sort sets shrink also, and sorts will
409
more often be done entirely in memory. A little test showed that a table
410
containing typical HTML pages and their URLs was stored in about half of the
411
raw data size including the <ACRONYM
414
> table, and that the main table
415
contained only about 10% of the entire data (the URLs and some small HTML
416
pages). There was no run time difference compared to an un-<ACRONYM
420
comparison table, in which all the HTML pages were cut down to 7 kB to fit.</P
427
SUMMARY="Footer navigation table"
438
HREF="storage-file-layout.html"
456
HREF="storage-fsm.html"
466
>Database File Layout</TD
b'\\ No newline at end of file'