2
<!--Copyright 1997-2002 by Sleepycat Software, Inc.-->
3
<!--All rights reserved.-->
4
<!--See the file LICENSE for redistribution information.-->
7
<title>Berkeley DB Reference Guide: Equality Join</title>
8
<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
9
<meta name="keywords" content="embedded,database,programmatic,toolkit,b+tree,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,java,C,C++">
12
<a name="2"><!--meow--></a><a name="3"><!--meow--></a><a name="4"><!--meow--></a>
13
<table width="100%"><tr valign=top>
14
<td><h3><dl><dt>Berkeley DB Reference Guide:<dd>Access Methods</dl></h3></td>
15
<td align=right><a href="../../ref/am/curdup.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../reftoc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/am/count.html"><img src="../../images/next.gif" alt="Next"></a>
18
<h1 align=center>Equality Join</h1>
19
<p>Berkeley DB supports "equality" (also known as "natural"), joins on secondary
20
indices. An equality join is a method of retrieving data from a primary
21
database using criteria stored in a set of secondary indices. It
22
requires the data be organized as a primary database which contains the
23
primary key and primary data field, and a set of secondary indices.
24
Each of the secondary indices is indexed by a different secondary key,
25
and, for each key in a secondary index, there is a set of duplicate data
26
items that match the primary keys in the primary database.
27
<p>For example, let's assume the need for an application that will return
28
the names of stores in which one can buy fruit of a given color. We
29
would first construct a primary database that lists types of fruit as
30
the key item, and the store where you can buy them as the data item:
31
<p><blockquote><p><table border=1>
32
<tr><th>Primary key:</th><th>Primary data:</th></tr>
33
<tr> <td align=left>apple</td> <td align=left>Convenience Store</td> </tr>
34
<tr> <td align=left>blueberry</td> <td align=left>Farmer's Market</td> </tr>
35
<tr> <td align=left>peach</td> <td align=left>Shopway</td> </tr>
36
<tr> <td align=left>pear</td> <td align=left>Farmer's Market</td> </tr>
37
<tr> <td align=left>raspberry</td> <td align=left>Shopway</td> </tr>
38
<tr> <td align=left>strawberry</td> <td align=left>Farmer's Market</td> </tr>
40
<p>We would then create a secondary index with the key <b>color</b>, and,
41
as the data items, the names of fruits of different colors.
42
<p><blockquote><p><table border=1>
43
<tr><th>Secondary key:</th><th>Secondary data:</th></tr>
44
<tr> <td align=left>blue</td> <td align=left>blueberry</td> </tr>
45
<tr> <td align=left>red</td> <td align=left>apple</td> </tr>
46
<tr> <td align=left>red</td> <td align=left>raspberry</td> </tr>
47
<tr> <td align=left>red</td> <td align=left>strawberry</td> </tr>
48
<tr> <td align=left>yellow</td> <td align=left>peach</td> </tr>
49
<tr> <td align=left>yellow</td> <td align=left>pear</td> </tr>
51
<p>This secondary index would allow an application to look up a color, and
52
then use the data items to look up the stores where the colored fruit
53
could be purchased. For example, by first looking up <b>blue</b>,
54
the data item <b>blueberry</b> could be used as the lookup key in the
55
primary database, returning <b>Farmer's Market</b>.
56
<p>Your data must be organized in the following manner in order to use the
57
<a href="../../api_c/db_join.html">DB->join</a> method:
59
<p><li>The actual data should be stored in the database represented by the
60
<a href="../../api_c/db_class.html">DB</a> object used to invoke this function. Generally, this
61
<a href="../../api_c/db_class.html">DB</a> object is called the <i>primary</i>.
62
<p><li>Secondary indices should be stored in separate databases, whose keys
63
are the values of the secondary indices and whose data items are the
64
primary keys corresponding to the records having the designated
65
secondary key value. It is acceptable (and expected) that there may be
66
duplicate entries in the secondary indices.
67
<p>These duplicate entries should be sorted for performance reasons, although
68
it is not required. For more information see the <a href="../../api_c/db_set_flags.html#DB_DUPSORT">DB_DUPSORT</a> flag
69
to the <a href="../../api_c/db_set_flags.html">DB->set_flags</a> method.
71
<p>What the <a href="../../api_c/db_join.html">DB->join</a> method does is review a list of secondary keys, and,
72
when it finds a data item that appears as a data item for all of the
73
secondary keys, it uses that data items as a lookup into the primary
74
database, and returns the associated data item.
75
<p>If there were a another secondary index that had as its key the
76
<b>cost</b> of the fruit, a similar lookup could be done on stores
77
where inexpensive fruit could be purchased:
78
<p><blockquote><p><table border=1>
79
<tr><th>Secondary key:</th><th>Secondary data:</th></tr>
80
<tr> <td align=left>expensive</td> <td align=left>blueberry</td> </tr>
81
<tr> <td align=left>expensive</td> <td align=left>peach</td> </tr>
82
<tr> <td align=left>expensive</td> <td align=left>pear</td> </tr>
83
<tr> <td align=left>expensive</td> <td align=left>strawberry</td> </tr>
84
<tr> <td align=left>inexpensive</td> <td align=left>apple</td> </tr>
85
<tr> <td align=left>inexpensive</td> <td align=left>pear</td> </tr>
86
<tr> <td align=left>inexpensive</td> <td align=left>raspberry</td> </tr>
88
<p>The <a href="../../api_c/db_join.html">DB->join</a> method provides equality join functionality. While not
89
strictly cursor functionality, in that it is not a method off a cursor
90
handle, it is more closely related to the cursor operations than to the
91
standard <a href="../../api_c/db_class.html">DB</a> operations.
92
<p>It is also possible to do lookups based on multiple criteria in a single
93
operation. For example, it is possible to look up fruits that are both
94
red and expensive in a single operation. If the same fruit appeared as
95
a data item in both the color and expense indices, then that fruit name
96
would be used as the key for retrieval from the primary index, and would
97
then return the store where expensive, red fruit could be purchased.
99
<p>Consider the following three databases:
101
<p><dt>personnel<dd><p><ul type=disc>
103
<li>data = record containing name, address, phone number, job title
105
<p><dt>lastname<dd><p><ul type=disc>
109
<p><dt>jobs<dd><p><ul type=disc>
114
<p>Consider the following query:
115
<p><blockquote><pre>Return the personnel records of all people named smith with the job
116
title manager.</pre></blockquote>
117
<p>This query finds are all the records in the primary database (personnel)
118
for whom the criteria <b>lastname=smith and job title=manager</b> is
120
<p>Assume that all databases have been properly opened and have the
121
handles: pers_db, name_db, job_db. We also assume that we have an
122
active transaction to which the handle txn refers.
123
<p><blockquote><pre>DBC *name_curs, *job_curs, *join_curs;
130
memset(&key, 0, sizeof(key));
131
memset(&data, 0, sizeof(data));
134
name_db->cursor(name_db, txn, &name_curs, 0)) != 0)
137
key.size = sizeof("smith");
139
name_curs->c_get(name_curs, &key, &data, DB_SET)) != 0)
142
if ((ret = job_db->cursor(job_db, txn, &job_curs, 0)) != 0)
144
key.data = "manager";
145
key.size = sizeof("manager");
147
job_curs->c_get(job_curs, &key, &data, DB_SET)) != 0)
150
carray[0] = name_curs;
151
carray[1] = job_curs;
155
pers_db->join(pers_db, carray, &join_curs, 0)) != 0)
158
join_curs->c_get(join_curs, &key, &data, 0)) == 0) {
159
/* Process record returned in key/data. */
163
* If we exited the loop because we ran out of records,
164
* then it has completed successfully.
166
if (ret == DB_NOTFOUND)
170
if (join_curs != NULL &&
171
(tret = join_curs->c_close(join_curs)) != 0 && ret == 0)
173
if (name_curs != NULL &&
174
(tret = name_curs->c_close(name_curs)) != 0 && ret == 0)
176
if (job_curs != NULL &&
177
(tret = job_curs->c_close(job_curs)) != 0 && ret == 0)
182
<p>The name cursor is positioned at the beginning of the duplicate list
183
for <b>smith</b> and the job cursor is placed at the beginning of
184
the duplicate list for <b>manager</b>. The join cursor is returned
185
from the join method. This code then loops over the join cursor getting
186
the personnel records of each one until there are no more.
187
<table width="100%"><tr><td><br></td><td align=right><a href="../../ref/am/curdup.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../reftoc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/am/count.html"><img src="../../images/next.gif" alt="Next"></a>
189
<p><font size=1><a href="http://www.sleepycat.com">Copyright Sleepycat Software</a></font>