1
src/backend/access/gin/README
6
Gin was sponsored by jfg://networks (http://www.jfg-networks.com/)
8
Gin stands for Generalized Inverted Index and should be considered as a genie,
11
Generalized means that the index does not know which operation it accelerates.
12
It instead works with custom strategies, defined for specific data types (read
13
"Index Method Strategies" in the PostgreSQL documentation). In that sense, Gin
14
is similar to GiST and differs from btree indices, which have predefined,
15
comparison-based operations.
17
An inverted index is an index structure storing a set of (key, posting list)
18
pairs, where 'posting list' is a set of heap rows in which the key occurs.
19
(A text document would usually contain many keys.) The primary goal of
20
Gin indices is support for highly scalable, full-text search in PostgreSQL.
22
A Gin index consists of a B-tree index constructed over key values,
23
where each key is an element of some indexed items (element of array, lexeme
24
for tsvector) and where each tuple in a leaf page contains either a pointer to
25
a B-tree over item pointers (posting tree), or a simple list of item pointers
26
(posting list) if the list is small enough.
28
Note: There is no delete operation in the key (entry) tree. The reason for
29
this is that in our experience, the set of distinct words in a large corpus
30
changes very slowly. This greatly simplifies the code and concurrency
33
Core PostgreSQL includes built-in Gin support for one-dimensional arrays
34
(eg. integer[], text[]). The following operations are available:
36
* contains: value_array @> query_array
37
* overlaps: value_array && query_array
38
* is contained by: value_array <@ query_array
43
=# create index txt_idx on aa using gin(a);
49
* Write-Ahead Logging (WAL). (Recoverability from crashes.)
50
* User-defined opclasses. (The scheme is similar to GiST.)
51
* Optimized index creation (Makes use of maintenance_work_mem to accumulate
53
* Text search support via an opclass
54
* Soft upper limit on the returned results set using a GUC variable:
55
gin_fuzzy_search_limit
60
There are often situations when a full-text search returns a very large set of
61
results. Since reading tuples from the disk and sorting them could take a
62
lot of time, this is unacceptable for production. (Note that the search
65
Such queries usually contain very frequent lexemes, so the results are not
66
very helpful. To facilitate execution of such queries Gin has a configurable
67
soft upper limit on the size of the returned set, determined by the
68
'gin_fuzzy_search_limit' GUC variable. This is set to 0 by default (no
71
If a non-zero search limit is set, then the returned set is a subset of the
72
whole result set, chosen at random.
74
"Soft" means that the actual number of returned results could differ
75
from the specified limit, depending on the query and the quality of the
76
system's random number generator.
78
From experience, a value of 'gin_fuzzy_search_limit' in the thousands
79
(eg. 5000-20000) works well. This means that 'gin_fuzzy_search_limit' will
80
have no effect for queries returning a result set with less tuples than this
86
The "items" that a GIN index indexes are composite values that contain
87
zero or more "keys". For example, an item might be an integer array, and
88
then the keys would be the individual integer values. The index actually
89
stores and searches for the key values, not the items per se. In the
90
pg_opclass entry for a GIN opclass, the opcintype is the data type of the
91
items, and the opckeytype is the data type of the keys. GIN is optimized
92
for cases where items contain many keys and the same key values appear
93
in many different items.
95
A GIN index contains a metapage, a btree of key entries, and possibly
96
"posting tree" pages, which hold the overflow when a key entry acquires
97
too many heap tuple pointers to fit in a btree page. Additionally, if the
98
fast-update feature is enabled, there can be "list pages" holding "pending"
99
key entries that haven't yet been merged into the main btree. The list
100
pages have to be scanned linearly when doing a search, so the pending
101
entries should be merged into the main btree before there get to be too
102
many of them. The advantage of the pending list is that bulk insertion of
103
a few thousand entries can be much faster than retail insertion. (The win
104
comes mainly from not having to do multiple searches/insertions when the
105
same key appears in multiple new heap tuples.)
107
Key entries are nominally of the same IndexEntry format as used in other
108
index types, but since a leaf key entry typically refers to multiple heap
109
tuples, there are significant differences. (See GinFormTuple, which works
110
by building a "normal" index tuple and then modifying it.) The points to
113
* In a single-column index, a key tuple just contains the key datum, but
114
in a multi-column index, a key tuple contains the pair (column number,
115
key datum) where the column number is stored as an int2. This is needed
116
to support different key data types in different columns. This much of
117
the tuple is built by index_form_tuple according to the usual rules.
118
The column number (if present) can never be null, but the key datum can
119
be, in which case a null bitmap is present as usual. (As usual for index
120
tuples, the size of the null bitmap is fixed at INDEX_MAX_KEYS.)
122
* If the key datum is null (ie, IndexTupleHasNulls() is true), then
123
just after the nominal index data (ie, at offset IndexInfoFindDataOffset
124
or IndexInfoFindDataOffset + sizeof(int2)) there is a byte indicating
125
the "category" of the null entry. These are the possible categories:
126
1 = ordinary null key value extracted from an indexable item
127
2 = placeholder for zero-key indexable item
128
3 = placeholder for null indexable item
129
Placeholder null entries are inserted into the index because otherwise
130
there would be no index entry at all for an empty or null indexable item,
131
which would mean that full index scans couldn't be done and various corner
132
cases would give wrong answers. The different categories of null entries
133
are treated as distinct keys by the btree, but heap itempointers for the
134
same category of null entry are merged into one index entry just as happens
135
with ordinary key entries.
137
* In a key entry at the btree leaf level, at the next SHORTALIGN boundary,
138
there is an array of zero or more ItemPointers, which store the heap tuple
139
TIDs for which the indexable items contain this key. This is called the
140
"posting list". The TIDs in a posting list must appear in sorted order.
141
If the list would be too big for the index tuple to fit on an index page,
142
the ItemPointers are pushed out to a separate posting page or pages, and
143
none appear in the key entry itself. The separate pages are called a
144
"posting tree"; they are organized as a btree of ItemPointer values.
145
Note that in either case, the ItemPointers associated with a key can
146
easily be read out in sorted order; this is relied on by the scan
149
* The index tuple header fields of a leaf key entry are abused as follows:
151
1) Posting list case:
153
* ItemPointerGetBlockNumber(&itup->t_tid) contains the offset from index
154
tuple start to the posting list.
155
Access macros: GinGetPostingOffset(itup) / GinSetPostingOffset(itup,n)
157
* ItemPointerGetOffsetNumber(&itup->t_tid) contains the number of elements
158
in the posting list (number of heap itempointers).
159
Access macros: GinGetNPosting(itup) / GinSetNPosting(itup,n)
161
* If IndexTupleHasNulls(itup) is true, the null category byte can be
162
accessed/set with GinGetNullCategory(itup,gs) / GinSetNullCategory(itup,gs,c)
164
* The posting list can be accessed with GinGetPosting(itup)
166
2) Posting tree case:
168
* ItemPointerGetBlockNumber(&itup->t_tid) contains the index block number
169
of the root of the posting tree.
170
Access macros: GinGetPostingTree(itup) / GinSetPostingTree(itup, blkno)
172
* ItemPointerGetOffsetNumber(&itup->t_tid) contains the magic number
173
GIN_TREE_POSTING, which distinguishes this from the posting-list case
174
(it's large enough that that many heap itempointers couldn't possibly
175
fit on an index page). This value is inserted automatically by the
176
GinSetPostingTree macro.
178
* If IndexTupleHasNulls(itup) is true, the null category byte can be
179
accessed/set with GinGetNullCategory(itup) / GinSetNullCategory(itup,c)
181
* The posting list is not present and must not be accessed.
183
Use the macro GinIsPostingTree(itup) to determine which case applies.
185
In both cases, itup->t_info & INDEX_SIZE_MASK contains actual total size of
186
tuple, and the INDEX_VAR_MASK and INDEX_NULL_MASK bits have their normal
187
meanings as set by index_form_tuple.
189
Index tuples in non-leaf levels of the btree contain the optional column
190
number, key datum, and null category byte as above. They do not contain
191
a posting list. ItemPointerGetBlockNumber(&itup->t_tid) is the downlink
192
to the next lower btree level, and ItemPointerGetOffsetNumber(&itup->t_tid)
193
is InvalidOffsetNumber. Use the access macros GinGetDownlink/GinSetDownlink
194
to get/set the downlink.
196
Index entries that appear in "pending list" pages work a tad differently as
197
well. The optional column number, key datum, and null category byte are as
198
for other GIN index entries. However, there is always exactly one heap
199
itempointer associated with a pending entry, and it is stored in the t_tid
200
header field just as in non-GIN indexes. There is no posting list.
201
Furthermore, the code that searches the pending list assumes that all
202
entries for a given heap tuple appear consecutively in the pending list and
203
are sorted by the column-number-plus-key-datum. The GIN_LIST_FULLROW page
204
flag bit tells whether entries for a given heap tuple are spread across
205
multiple pending-list pages. If GIN_LIST_FULLROW is set, the page contains
206
all the entries for one or more heap tuples. If GIN_LIST_FULLROW is clear,
207
the page contains entries for only one heap tuple, *and* they are not all
208
the entries for that tuple. (Thus, a heap tuple whose entries do not all
209
fit on one pending-list page must have those pages to itself, even if this
210
results in wasting much of the space on the preceding page and the last
216
* Gin doesn't use scan->kill_prior_tuple & scan->ignore_killed_tuples
217
* Gin searches entries only by equality matching, or simple range
218
matching using the "partial match" feature.
225
* Opclasses for more types (no programming, just many catalog changes)
229
* Replace B-tree of entries to something like GiST
234
Original work was done by Teodor Sigaev (teodor@sigaev.ru) and Oleg Bartunov