1
package Bio::DB::GFF::Adaptor::dbi::pg_fts;
3
# $Id: pg_fts.pm,v 1.2.4.1 2006/10/02 23:10:16 sendu Exp $
7
Bio::DB::GFF::Adaptor::dbi::pg_fts -- Database adaptor for a specific postgres schema with a TSearch2 implementation
11
#create new GFF database connection
12
my $db = Bio::DB::GFF->new( -adaptor => 'dbi::pg_fts',
13
-dsn => 'dbi:Pg:dbname=worm');
15
#add full text indexing 'stuff'
16
#assumes that TSearch2 is available to PostgreSQL
17
#this will take a VERY long time for a reasonably large database
18
$db->install_TSearch2();
21
#we don't like full text searching...
22
$db->remove_TSearch2();
26
This adaptor is based on Bio::DB::GFF::Adaptor::dbi::pg but it implements
27
the TSearch2 PostgreSQL contrib module for fast full text searching. To
28
use this module with your PostgreSQL GFF database, you need to make
29
TSearch2 available in the database.
31
To use this adaptor, follow these steps:
35
=item Install TSearch2 contrib module for Pg
37
Can be as easy as `sudo yum install postgresql-contrib`, or you may
38
need to recompile PostgreSQL to include it. See
39
L<http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html>
42
=item Load the TSearch2 functions to you database
44
% cat tsearch2.sql | psql <your database>
46
=item Load your data using the pg adaptor:
48
% bp_pg_bulk_load_gff.pl -c -d yeast saccharomyces_cerevisiae.gff
52
% bp_load_gff.pl -c -d yeast -a dbi::pg saccharomyces_cerevisiae.gff
54
=item Add GFF/TSearch2 specific modifications
56
Execute a perl script like this one:
63
my $db = Bio::DB::GFF->new(
64
-adaptor => 'dbi::pg_fts',
65
-dsn => 'dbi:Pg:dbname=yeast',
69
print "Installing TSearch2 columns...\n";
71
$db->install_TSearch2();
77
Note that this last step will take a long time. For a S. cerevisiae
78
database with 15K rows, it took over an hour on my laptop, and
79
with a C. elegans database (~10 million rows) it took well over a day.
81
If at some point you add more data you your database, you need to run
82
a similar script to the one above, only executing the update_TSearch2()
83
method. Finally, if you want to remove the TSearch2 columns from your
84
database and go back to using the pg adaptor, you can execute a script
85
like the one above, only executing the remove_TSearch2() method.
87
=head1 NOTES ABOUT TSearch2 SEARCHING
89
You should know a few things about how searching with TSearch2 works in
90
the GBrowse enviroment:
96
TSearch2 does not do wild cards, so you should encourage your users not
97
to use them. If wild cards are used, the adaptor will fall back on
98
an ILIKE search, which will be much slower.
102
However, TSearch2 does do 'word stemming'. That is, if you search
103
for 'copy', it will find 'copy', 'copies', and 'copied'.
107
TSearch2 does not do phrase searching; all of the terms in the
108
search string are ANDed together.
112
=head1 ACKNOWLEDGEMENTS
114
Special thanks to Russell Smithies and Paul Smale at AgResearch in
115
New Zealand for giving me their recipe for doing full text indexing
120
Please report bugs to the BioPerl and/or GBrowse mailing lists
121
(L<mailto:bioperl-l@lists.open-bio.org> and L<mailto:gmod-gbrowse@lists.sourceforge.net>
126
Please see L<Bio::DB::GFF::Adaptor::dbi::pg> for more information
127
about tuning your PostgreSQL server for GFF data, and for general
128
information about GFF database access, see L<Bio::DB::GFF>.
132
Scott Cain, cain@cshl.edu
138
# a simple postgres adaptor
140
use Bio::DB::GFF::Adaptor::dbi;
141
use base qw(Bio::DB::GFF::Adaptor::dbi::pg);
143
use constant FULLTEXTSEARCH => <<END;
144
SELECT distinct gclass,gname,fattribute_value
145
FROM fgroup,fattribute_to_feature,fdata
146
WHERE fgroup.gid=fdata.gid
147
AND fdata.fid=fattribute_to_feature.fid
148
AND (fattribute_to_feature.idxfti @@ to_tsquery('default', ?))
152
use constant FULLTEXTWILDCARD => <<END;
153
SELECT distinct gclass,gname,fattribute_value
154
FROM fgroup,fattribute_to_feature,fdata
155
WHERE fgroup.gid=fdata.gid
156
AND fdata.fid=fattribute_to_feature.fid
157
AND lower(fattribute_to_feature.fattribute_value) LIKE lower(?)
163
my $self = $class->SUPER::new(@_);
170
Usage : @search_results = $db->search_notes("full text string",$limit)
171
Function: Search the notes for a text string, using PostgreSQL TSearch2
172
Returns : array of results
173
Args : full text search string, and an optional row limit
176
This is based on the mysql-specific method that makes use of the TSearch2
177
functionality in PosgreSQL's contrib directory. Given a search string,
178
it performs a full-text search of the notes table and returns an array
179
of results. Each row of the returned array is a arrayref containing
180
the following fields:
182
column 1 A Bio::DB::GFF::Featname object, for passing to segment()
183
column 2 The text of the note
184
column 3 A relevance score.
190
my ($search_string,$limit) = @_;
192
my @terms = split /\s+/, $search_string;
195
if ($search_string =~ /\*/) {
196
$search_string =~ tr/*/%/s;
197
my $query = FULLTEXTWILDCARD;
198
$query .= " limit $limit" if defined $limit;
199
$sth = $self->dbh->do_query($query,$search_string);
201
elsif (@terms == 1) {
202
my $query = FULLTEXTSEARCH;
203
$query .= " limit $limit" if defined $limit;
204
$sth = $self->dbh->do_query($query,$search_string);
207
my $query = FULLTEXTSEARCH;
208
my $andstring = join (' & ', @terms);
209
# $query .= qq{ AND (fattribute_to_feature.fattribute_value ILIKE '\%$search_string%')};
210
$query .= " LIMIT $limit" if defined $limit;
211
$sth = $self->dbh->do_query($query,$andstring);
215
while (my ($class,$name,$note) = $sth->fetchrow_array) {
217
next unless $class && $name; # sorry, ignore NULL objects
218
my $featname = Bio::DB::GFF::Featname->new($class=>$name);
220
push @results,[$featname,$note,0]; #gbrowse expects a score, but
221
#pg doesn't give one, thus the 0
227
=head2 make_features_by_name_where_part
229
Title : make_features_by_name_where_part
230
Function: constructs a TSearch2-compliant WHERE clause for a name search
235
#need a make_features_by_name_where_part method to override pg
236
sub make_features_by_name_where_part {
238
my ($class,$name) = @_;
240
my @terms = split /\s+/, $name;
244
return ("fgroup.gclass=? AND lower(fgroup.gname) LIKE lower(?)",$class,$name);
247
my $where_str = "fgroup.gclass=? AND (fgroup.idxfti @@ to_tsquery('default', ?)) ";
249
return ($where_str,$class,$name);
252
my $andstring = join (' & ', @terms);
253
# $where_str .= qq{ AND (fgroup.gname ILIKE '\%$name%')};
254
return ($where_str,$class,$andstring);
259
=head2 install_TSearch2
261
Title : install_TSearch2
262
Function: installs schema modifications for use with TSearch2
263
Usage : $db->install_TSearch2
269
#needs method for installing TSearch2 (does that mean that the SQL for
270
#creating the tables and functions should go in here? That would be
271
#the safest and easiest thing to do
272
sub install_TSearch2 {
275
my $dbh = $self->features_db;
277
$dbh->do('ALTER TABLE fattribute_to_feature ADD COLUMN idxFTI tsvector')
278
or $self->throw('adding FTI column to f_to_f failed');
280
$dbh->do('ALTER TABLE fgroup ADD COLUMN idxFTI tsvector')
281
or $self->throw('adding FTI column to fgroup failed');
283
$self->update_TSearch2();
288
=head2 update_TSearch2
290
Title : update_TSearch2
291
Function: Updates TSearch2 columns
292
Usage : $db->update_TSearch2
298
sub update_TSearch2 {
301
my $dbh = $self->features_db;
303
$self->warn('updating full text column; this may take a very long time...');
304
$dbh->do("UPDATE fattribute_to_feature "
305
."SET idxFTI= to_tsvector('default', fattribute_value) "
306
."WHERE idxFTI IS NULL")
307
or $self->throw('updating fti column failed');
308
$dbh->do("UPDATE fgroup "
309
."SET idxFTI= to_tsvector('default', gname) "
310
."WHERE idxFTI IS NULL")
311
or $self->throw('updating fgroup fti column failed');
313
$self->warn('Preliminary optimization of database; this may also take a long time...');
314
$dbh->do('VACUUM FULL ANALYZE')
315
or $self->throw('vacuum failed');
317
$self->warn('Updating full text index; again, this may take a long time');
318
$dbh->do('CREATE INDEX idxFTI_idx ON fattribute_to_feature '
319
.'USING gist(idxFTI)')
320
or $self->warn('creating full text index failed');
321
$dbh->do('CREATE INDEX fgroup_idxFTI_idx ON fgroup '
322
.'USING gist(idxFTI)')
323
or $self->warn('creating fgroup full text index failed');
325
$self->warn('Optimizing database; hopefully, this will not take as long as other steps');
326
$dbh->do('VACUUM FULL ANALYZE');
327
$dbh->do("SELECT set_curcfg('default')");
332
=head2 remove_TSearch2
334
Title : remove_TSearch2
335
Function: Removes TSearch2 columns
336
Usage : $db->remove_TSearch2
341
sub remove_TSearch2 {
344
my $dbh = $self->features_db;
346
$self->warn('Removing full text search capabilities');
347
$dbh->do('DROP INDEX idxFTI_idx')
348
or $self->throw('dropping full text index failed');
349
$dbh->do('DROP INDEX fgroup_idxFTI_idx')
350
or $self->throw('dropping full text index failed');
352
$dbh->do('ALTER TABLE fattribute_to_feature DROP COLUMN idxFTI')
353
or $self->throw('dropping full text column failed');
354
$dbh->do('ALTER TABLE fgroup DROP COLUMN idxFTI')
355
or $self->throw('dropping full text column failed');