<p>The size of the pages used in the underlying database can be specified by
calling the <a href="../api_reference/C/dbset_pagesize.html" class="olink">DB->set_pagesize()</a> method. The page size must be a power of
two; the minimum page size is 512 bytes and the maximum page size is 64K
bytes. If no page size is specified by the application, a page size is
selected based on the underlying filesystem I/O block size. (A page size
selected in this way has a lower limit of 512 bytes and an upper limit of
16K bytes.)</p>
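<p>The constraints above can be checked before handing a value to <a href="../api_reference/C/dbset_pagesize.html" class="olink">DB->set_pagesize()</a>. A minimal sketch in plain C follows; the helper name is our own for illustration, not part of the Berkeley DB API.</p>
<pre class="programlisting">
#include &lt;stdio.h&gt;

/* Check the documented DB-&gt;set_pagesize() constraints: a power of two
 * between 512 bytes and 64KB inclusive. (Illustrative helper, not a
 * Berkeley DB function.) */
static int is_valid_pagesize(unsigned long n)
{
    return n &gt;= 512 &amp;&amp; n &lt;= 65536 &amp;&amp; (n &amp; (n - 1)) == 0;
}

int main(void)
{
    unsigned long candidates[] = { 512, 4096, 65536, 300, 5000, 131072 };
    size_t i;

    for (i = 0; i &lt; sizeof(candidates) / sizeof(candidates[0]); i++)
        printf("%lu: %s\n", candidates[i],
            is_valid_pagesize(candidates[i]) ? "valid" : "invalid");
    return 0;
}
</pre>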
<p>There are several issues to consider when selecting a page size: overflow
record sizes, locking, I/O efficiency, and recoverability.</p>
<p>First, the page size implicitly sets the size of an overflow record.
Overflow records are key or data items that are too large to fit on a
normal database page, and are therefore stored in overflow pages.
Overflow pages are pages that exist outside of the normal database
structure. For this reason, there is often a significant performance
penalty associated with retrieving or modifying overflow records.
Selecting a page size that is too small, and which forces the creation
of large numbers of overflow pages, can seriously impact the performance
of an application.</p>
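<p>One rough way to reason about this tradeoff: assume, as a rule of thumb rather than the engine's exact accounting, that an item larger than about a quarter of the page size ends up on overflow pages, and find the smallest legal page size that keeps a given item inline.</p>
<pre class="programlisting">
#include &lt;stdio.h&gt;

/* Rule-of-thumb sketch (an assumption, not Berkeley DB's exact logic):
 * an item larger than about pagesize/4 is stored on overflow pages.
 * Return the smallest legal page size that keeps the item inline. */
static unsigned long min_inline_pagesize(unsigned long item_bytes)
{
    unsigned long ps;

    for (ps = 512; ps &lt;= 65536; ps *= 2)
        if (item_bytes &lt;= ps / 4)
            return ps;
    return 0; /* overflows even at the 64KB maximum page size */
}

int main(void)
{
    printf("100-byte items fit inline at %lu-byte pages\n",
        min_inline_pagesize(100));
    printf("2000-byte items fit inline at %lu-byte pages\n",
        min_inline_pagesize(2000));
    return 0;
}
</pre>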
<p>Second, in the Btree, Hash and Recno access methods, the finest-grained
lock that Berkeley DB acquires is for a page. (The Queue access method
generally acquires record-level locks rather than page-level locks.)
Selecting a page size that is too large, and which causes threads or
processes to wait because other threads of control are accessing or
modifying records on the same page, can impact the performance of your
application.</p>
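<p>A back-of-envelope model (our own illustration, not a Berkeley DB formula) makes the effect concrete: with fixed-size records, a larger page holds more records, so two concurrent accesses to random records are more likely to contend for a lock on the same page.</p>
<pre class="programlisting">
#include &lt;stdio.h&gt;

/* Illustrative contention model: probability that a second random
 * record access lands on the same page as a first one. */
static double same_page_probability(unsigned long nrecords,
    unsigned long pagesize, unsigned long recsize)
{
    unsigned long per_page = pagesize / recsize;
    unsigned long npages = (nrecords + per_page - 1) / per_page;

    return 1.0 / (double)npages;
}

int main(void)
{
    /* 1,000,000 100-byte records */
    printf("4KB pages:  %.6f\n", same_page_probability(1000000, 4096, 100));
    printf("64KB pages: %.6f\n", same_page_probability(1000000, 65536, 100));
    return 0;
}
</pre>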
<p>Third, the page size specifies the granularity of I/O from the database
to the operating system. Berkeley DB will give a page-sized unit of bytes to
the operating system to be scheduled for reading/writing from/to the
disk. For many operating systems, there is an internal <span class="bold"><strong>block
size</strong></span> which is used as the granularity of I/O from the operating system
to the disk. Generally, it will be more efficient for Berkeley DB to write
filesystem-sized blocks to the operating system and for the operating
system to write those same blocks to the disk.</p>
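<p>The cost of a mismatch can be sketched with simple arithmetic (our own illustration, assuming each page write is rounded up to whole filesystem blocks): a page smaller than the block still costs a full block of transfer, and a page that is not a block multiple costs an extra partial block.</p>
<pre class="programlisting">
#include &lt;stdio.h&gt;

/* Illustration (not a Berkeley DB routine): bytes the filesystem
 * actually transfers for one page write, rounded up to whole blocks. */
static unsigned long bytes_transferred(unsigned long pagesize,
    unsigned long blocksize)
{
    return ((pagesize + blocksize - 1) / blocksize) * blocksize;
}

int main(void)
{
    /* With 4KB filesystem blocks: */
    printf("512-byte page: %lu bytes on disk\n", bytes_transferred(512, 4096));
    printf("4KB page:      %lu bytes on disk\n", bytes_transferred(4096, 4096));
    printf("8KB page:      %lu bytes on disk\n", bytes_transferred(8192, 4096));
    return 0;
}
</pre>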
<p>Selecting a database page size smaller than the filesystem block size
may cause the operating system to coalesce or otherwise manipulate Berkeley DB
pages and can impact the performance of your application. When the page
size is smaller than the filesystem block size and a page written by
Berkeley DB is not found in the operating system's cache, the operating system
may be forced to read a block from the disk, copy the page into the
block it read, and then write out the block to disk, rather than simply
writing the page to disk. Additionally, as the operating system is
reading more data into its buffer cache than is strictly necessary to
satisfy each Berkeley DB request for a page, the operating system buffer cache
may be wasting memory.</p>
<p>Alternatively, selecting a page size larger than the filesystem block
size may cause the operating system to read more data than necessary.
On some systems, reading filesystem blocks sequentially may cause the
operating system to begin performing read-ahead. If requesting a single
database page implies reading enough filesystem blocks to satisfy the
operating system's criteria for read-ahead, the operating system may do
more I/O than is required.</p>
<p>Fourth, when using the Berkeley DB Transactional Data Store product, the
page size may affect the errors from which your database can recover. See
<a class="xref" href="transapp_reclimit.html" title="Berkeley DB recoverability">Berkeley DB recoverability</a> for more
information.</p>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<p>The Btree and Hash access methods support the creation of multiple data
items for a single key item. By default, multiple data items are not
permitted, and each database store operation will overwrite any previous
data item for that key. To configure Berkeley DB for duplicate data items,
call the <a href="../api_reference/C/dbset_flags.html" class="olink">DB->set_flags()</a> method with the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUP" class="olink">DB_DUP</a> flag. Only one
copy of the key will be stored for each set of duplicate data items.
If the Btree access method comparison routine returns that two keys
compare equally, it is undefined which of the two keys will be stored
and returned from future database operations.</p>
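<p>The two store behaviors described above can be modeled with a tiny in-memory sketch (a teaching illustration in plain C, not Berkeley DB code): without the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUP" class="olink">DB_DUP</a> flag a put for an existing key overwrites the prior data item; with it, another data item is added under the same key.</p>
<pre class="programlisting">
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

#define MAXITEMS 16

/* Toy key/data store modeling the DB_DUP flag's effect. */
struct store {
    int allow_dups;              /* models the DB_DUP flag */
    char keys[MAXITEMS][16];
    char data[MAXITEMS][16];
    int n;
};

static void store_put(struct store *s, const char *key, const char *val)
{
    int i;

    if (!s-&gt;allow_dups)
        for (i = 0; i &lt; s-&gt;n; i++)
            if (strcmp(s-&gt;keys[i], key) == 0) {
                strcpy(s-&gt;data[i], val);   /* overwrite */
                return;
            }
    strcpy(s-&gt;keys[s-&gt;n], key);            /* append */
    strcpy(s-&gt;data[s-&gt;n], val);
    s-&gt;n++;
}

int main(void)
{
    struct store no_dup = {0}, dup = {1};

    store_put(&amp;no_dup, "fruit", "apple");
    store_put(&amp;no_dup, "fruit", "pear");
    store_put(&amp;dup, "fruit", "apple");
    store_put(&amp;dup, "fruit", "pear");
    printf("without DB_DUP: %d item(s), last=%s\n", no_dup.n, no_dup.data[0]);
    printf("with DB_DUP:    %d item(s)\n", dup.n);
    return 0;
}
</pre>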
<p>By default, Berkeley DB stores duplicates in the order in which they were
added; that is, each new duplicate data item will be stored after any
already existing data items. This default behavior can be overridden by using
the <a href="../api_reference/C/dbcput.html" class="olink">DBC->put()</a> method and one of the <a href="../api_reference/C/dbcput.html#put_DB_AFTER" class="olink">DB_AFTER</a>, <a href="../api_reference/C/dbcput.html#put_DB_BEFORE" class="olink">DB_BEFORE</a>, <a href="../api_reference/C/dbcput.html#put_DB_KEYFIRST" class="olink">DB_KEYFIRST</a> or <a href="../api_reference/C/dbcput.html#put_DB_KEYLAST" class="olink">DB_KEYLAST</a> flags.
Alternatively, Berkeley DB may be configured to sort duplicate data items.</p>
<p>When stepping through the database sequentially, duplicate data items will
be returned individually, as a key/data pair, where the key item only
changes after the last duplicate data item has been returned. For this
reason, duplicate data items cannot be accessed using the
<a href="../api_reference/C/dbget.html" class="olink">DB->get()</a> method, as it always returns the first of the duplicate data
items. Duplicate data items should be retrieved using a Berkeley DB cursor
interface such as the <a href="../api_reference/C/dbcget.html" class="olink">DBC->get()</a> method.</p>
<p>There is a flag that permits applications to request the following data
item only if it <span class="bold"><strong>is</strong></span> a duplicate data item of the current entry;
see <a href="../api_reference/C/dbcget.html#dbcget_DB_NEXT_DUP" class="olink">DB_NEXT_DUP</a> for more information. There is a flag that
permits applications to request the following data item only if it
<span class="bold"><strong>is not</strong></span> a duplicate data item of the current entry; see
<a href="../api_reference/C/dbcget.html#dbcget_DB_NEXT_NODUP" class="olink">DB_NEXT_NODUP</a> and <a href="../api_reference/C/dbcget.html#dbcget_DB_PREV_NODUP" class="olink">DB_PREV_NODUP</a> for more information.</p>
<p>It is also possible to maintain duplicate records in sorted order. Sorting
duplicates will significantly increase performance when searching them
and performing equality joins, both of which are common operations when
using secondary indices. To configure Berkeley DB to sort duplicate data items,
the application must call the <a href="../api_reference/C/dbset_flags.html" class="olink">DB->set_flags()</a> method with the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUPSORT" class="olink">DB_DUPSORT</a> flag.
Note that <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUPSORT" class="olink">DB_DUPSORT</a> automatically turns on the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUP" class="olink">DB_DUP</a> flag for you, so you
do not have to set that flag as well; however, it is not an error to also set
<a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUP" class="olink">DB_DUP</a> when configuring for sorted duplicate records.</p>
<p>When configuring sorted duplicate records, you can also specify a custom
comparison function using the <a href="../api_reference/C/dbset_dup_compare.html" class="olink">DB->set_dup_compare()</a> method. If the
<a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUPSORT" class="olink">DB_DUPSORT</a> flag is given, but no comparison routine is specified,
then Berkeley DB defaults to the same lexicographical sorting used for Btree
keys, with shorter items collating before longer items.</p>
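<p>The default ordering just described can be sketched as follows. This is an illustrative comparison in plain C, not the engine's code, and note that a real <a href="../api_reference/C/dbset_dup_compare.html" class="olink">DB->set_dup_compare()</a> callback takes DB and DBT arguments rather than raw buffers.</p>
<pre class="programlisting">
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

/* Byte-by-byte lexicographical comparison; when one item is a prefix
 * of the other, the shorter item collates first. */
static int lexical_cmp(const void *a, size_t alen,
    const void *b, size_t blen)
{
    size_t n = alen &lt; blen ? alen : blen;
    int r = memcmp(a, b, n);

    if (r != 0)
        return r;
    return (alen &gt; blen) - (alen &lt; blen);
}

int main(void)
{
    printf("%d\n", lexical_cmp("app", 3, "apple", 5) &lt; 0);  /* shorter first */
    printf("%d\n", lexical_cmp("apple", 5, "apply", 5) &lt; 0);
    return 0;
}
</pre>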
<p>If the duplicate data items are unsorted, applications may store identical
duplicate data items, or, for those that just like the way it sounds,
<span class="emphasis"><em>duplicate duplicates</em></span>.</p>
<p><span class="bold"><strong>In this release it is an error to attempt to store identical
284
duplicate data items when duplicates are being stored in a sorted order.</strong></span>
285
This restriction is expected to be lifted in a future release. There
286
is a flag that permits applications to disallow storing duplicate data
287
items when the database has been configured for sorted duplicates, see
288
<a href="../api_reference/C/dbput.html#put_DB_NODUPDATA" class="olink">DB_NODUPDATA</a> for more information. Applications not wanting to
289
permit duplicate duplicates in databases configured for sorted
290
duplicates should begin using the <a href="../api_reference/C/dbput.html#put_DB_NODUPDATA" class="olink">DB_NODUPDATA</a> flag immediately.</p>
291
<p>For further information on how searching and insertion behave in the
presence of duplicates (sorted or not), see the <a href="../api_reference/C/dbget.html" class="olink">DB->get()</a>, <a href="../api_reference/C/dbput.html" class="olink">DB->put()</a>, <a href="../api_reference/C/dbcget.html" class="olink">DBC->get()</a> and
<a href="../api_reference/C/dbcput.html" class="olink">DBC->put()</a> documentation.</p>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">