1
package Bio::DB::SeqFeature::Store::berkeleydb;
3
# $Id: berkeleydb.pm,v 1.5.4.5 2006/11/22 20:27:47 lstein Exp $
7
use base 'Bio::DB::SeqFeature::Store';
8
use Bio::DB::GFF::Util::Rearrange 'rearrange';
11
use Fcntl qw(O_RDWR O_CREAT);
12
use File::Temp 'tempdir';
13
use File::Path 'rmtree','mkpath';
14
use constant BINSIZE => 10_000;
15
use constant MININT => -999_999_999_999;
16
use constant MAXINT => 999_999_999_999;
20
Bio::DB::SeqFeature::Store::berkeleydb -- Storage and retrieval of sequence annotation data in Berkeleydb files
24
use Bio::DB::SeqFeature::Store;
26
# Create a database from the feature files located in /home/fly4.3 and store
27
# the database index in the same directory:
28
$db = Bio::DB::SeqFeature::Store->new( -adaptor => 'berkeleydb',
29
-dir => '/home/fly4.3');
31
# Create a database that will monitor the files in /home/fly4.3, but store
32
# the indexes in /var/databases/fly4.3
33
$db = Bio::DB::SeqFeature::Store->new( -adaptor => 'berkeleydb',
34
-dsn => '/var/databases/fly4.3',
35
-dir => '/home/fly4.3');
37
# Create a feature database from scratch
38
$db = Bio::DB::SeqFeature::Store->new( -adaptor => 'berkeleydb',
39
-dsn => '/var/databases/fly4.3',
42
# get a feature from somewhere
43
my $feature = Bio::SeqFeature::Generic->new(...);
46
$db->store($feature) or die "Couldn't store!";
48
# primary ID of the feature is changed to indicate its primary ID
50
my $id = $feature->primary_id;
52
# get the feature back out
53
my $f = $db->fetch($id);
55
# change the feature and update it
57
$db->update($f) or $self->throw("Couldn't update!");
59
# use the GFF3 loader to do a bulk write:
60
my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store => $db,
63
$loader->load('/home/fly4.3/dmel-all.gff');
68
my @features = $db->fetch_many(@list_of_ids);
71
@features = $db->get_features_by_name('ZK909');
74
@features = $db->get_features_by_alias('sma-3');
77
@features = $db->get_features_by_type('gene');
80
@features = $db->get_features_by_location(-seq_id=>'Chr1',-start=>4000,-end=>600000);
83
@features = $db->get_features_by_attribute({description => 'protein kinase'})
85
# ...by the GFF "Note" field
86
@result_list = $db->search_notes('kinase');
88
# ...by arbitrary combinations of selectors
89
@features = $db->features(-name => $name,
94
-attributes => $attributes);
96
# ...using an iterator
97
my $iterator = $db->get_seq_stream(-name => $name,
102
-attributes => $attributes);
104
while (my $feature = $iterator->next_seq) {
105
# do something with the feature
108
# ...limiting the search to a particular region
109
my $segment = $db->segment('Chr1',5000=>6000);
110
my @features = $segment->features(-type=>['mRNA','match']);
112
# getting & storing sequence information
113
# Warning: this returns a string, and not a PrimarySeq object
114
$db->insert_sequence('Chr1','GATCCCCCGGGATTCCAAAA...');
115
my $sequence = $db->fetch_sequence('Chr1',5000=>6000);
117
# create a new feature in the database
118
my $feature = $db->new_feature(-primary_tag => 'mRNA',
125
Bio::DB::SeqFeature::Store::berkeleydb is the Berkeleydb adaptor for
126
Bio::DB::SeqFeature::Store. You will not create it directly, but
127
instead use Bio::DB::SeqFeature::Store-E<gt>new() to do so.
129
See L<Bio::DB::SeqFeature::Store> for complete usage instructions.
131
=head2 Using the berkeleydb adaptor
133
The Berkeley database consists of a series of Berkeleydb index files,
134
and a couple of special purpose indexes. You can create the index
135
files from scratch by creating a new database and calling
136
new_feature() repeatedly, you can create the database and then bulk
137
populate it using the GFF3 loader, or you can monitor a directory of
138
preexisting GFF3 and FASTA files and rebuild the indexes whenever one
139
or more of the fiels changes. The last mode is probably the most
144
=item The new() constructor
146
The new() constructor method all the arguments recognized by
147
Bio::DB::SeqFeature::Store, and a few additional ones.
154
-adaptor The name of the Adaptor class (default DBI::mysql)
156
-serializer The name of the serializer class (default Storable)
158
-index_subfeatures Whether or not to make subfeatures searchable
161
-cache Activate LRU caching feature -- size of cache
163
-compress Compresses features before storing them in database
166
Adaptor-specific arguments
171
-dsn Where the index files are stored
173
-dir Where the source (GFF3, FASTA) files are stored
175
-autoindex An alias for -dir.
177
-write Pass true to open the index files for writing.
179
-create Pass true to create the index files if they don't exist
182
-temp Pass true to create temporary index files that will
183
be deleted once the script exits.
187
To create an empty database which will be populated using calls to
188
store() or new_feature(), or which will be bulk-loaded using the GFF3
191
$db = Bio::DB::SeqFeature::Store->new( -adaptor => 'berkeleydb',
192
-dsn => '/var/databases/fly4.3',
195
To open a preexisting database in read-only mode:
197
$db = Bio::DB::SeqFeature::Store->new( -adaptor => 'berkeleydb',
198
-dsn => '/var/databases/fly4.3');
200
To open a preexisting database in read/write (update) mode:
202
$db = Bio::DB::SeqFeature::Store->new( -adaptor => 'berkeleydb',
203
-dsn => '/var/databases/fly4.3',
206
To monitor a set of GFF3 and FASTA files located in a directory and
207
create/update the database indexes as needed. The indexes will be
208
stored in a new subdirectory named "indexes":
210
$db = Bio::DB::SeqFeature::Store->new( -adaptor => 'berkeleydb',
211
-dir => '/var/databases/fly4.3');
213
As above, but store the source files and index files in separate directories:
215
$db = Bio::DB::SeqFeature::Store->new( -adaptor => 'berkeleydb',
216
-dsn => '/var/databases/fly4.3',
217
-dir => '/home/gff3_files/fly4.3');
219
B<-autoindex> is an alias for B<-dir>.
223
See L<Bio::DB::SeqFeature::Store> for all the access methods supported
224
by this adaptor. The various methods for storing and updating features
225
and sequences into the database are supported, but there is no
226
locking. If two processes try to update the same database
227
simultaneously, the database will likely become corrupted.
232
# object initialization
241
) = rearrange([['DSN','DB'],
243
['TMP','TEMP','TEMPORARY'],
244
[qw(WRITE WRITABLE)],
248
-d $autoindex or $self->throw("Invalid directory $autoindex");
249
$directory ||= "$autoindex/indexes";
251
$directory ||= $is_temporary ? File::Spec->tmpdir : '.';
253
my $pacname = __PACKAGE__;
254
if ($^O =~ /mswin/i) {
255
$pacname =~ s/:+/_/g;
257
$directory = tempdir($pacname.'_XXXXXX',
260
DIR=>$directory) if $is_temporary;
262
-d $directory or $self->throw("Invalid directory $directory");
264
$create++ if $is_temporary;
266
$self->throw("Can't write into the directory $directory")
267
if $write && !-w $directory;
270
$self->default_settings;
271
$self->directory($directory);
272
$self->temporary($is_temporary);
273
$self->_delete_databases() if $create;
274
$self->_open_databases($write,$create,$autoindex);
275
$self->_permissions($write,$create);
279
sub can_store_parentage { 1 }
283
my ($autodir) = rearrange([['DIR','AUTOINDEX']],@_);
284
return unless $autodir && -d $autodir;
288
opendir (my $D,$autodir) or $self->throw("Couldn't open directory $autodir for reading: $!");
290
my $fasta_files_present;
292
while (defined (my $node = readdir($D))) {
293
next if $node =~ /^\./;
294
my $path = "$autodir/$node";
295
next unless -f $path;
297
# skip fasta files - the Bio::DB::Fasta module indexes them on its own
298
if ($node =~ /\.(?:fa|fasta|dna)(?:\.gz)?$/) {
299
$fasta_files_present++;
304
next if $node =~ /\.(?:bdb|idx|index|stamp)/;
306
# skip autosave files, etc
307
next if $node =~ /^\#/;
308
next if $node =~ /~$/;
310
my $mtime = _mtime(\*_); # not a typo
311
$maxtime = $mtime if $mtime > $maxtime;
317
my $timestamp_time = _mtime($self->_mtime_path) || 0;
319
if ($maxtime > $timestamp_time) {
320
warn "Reindexing... this may take a while.";
321
$self->_permissions(1,1);
322
$self->_close_databases();
323
$self->_open_databases(1,1);
324
require Bio::DB::SeqFeature::Store::GFF3Loader
325
unless Bio::DB::SeqFeature::Store::GFF3Loader->can('new');
326
my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store => $self,
327
-sf_class => $self->seqfeature_class)
328
or $self->throw("Couldn't create GFF3Loader");
329
$loader->load(@reindex);
330
$self->_touch_timestamp;
333
if ($fasta_files_present) {
334
my $dna_db = Bio::DB::Fasta->new($autodir);
335
$self->dna_db($dna_db);
339
sub _open_databases {
341
my ($write,$create,$ignore_errors) = @_;
343
my $directory = $self->directory;
344
unless (-d $directory) { # directory does not exist
345
$create or $self->throw("Directory $directory does not exist and you did not specify the -create flag");
346
mkpath($directory) or $self->throw("Couldn't create database directory $directory: $!");
349
my $flags = O_RDONLY;
350
$flags |= O_RDWR if $write;
351
$flags |= O_CREAT if $create;
353
# Create the main database; this is a DB_HASH implementation
355
my $result = tie (%h,'DB_File',$self->_features_path,$flags,0666,$DB_HASH);
357
return if $ignore_errors; # autoindex set, so defer this
358
$self->throw("Couldn't tie: ".$self->_features_path . " $!");
366
# Create the index databases; these are DB_BTREE implementations with duplicates allowed.
367
local $DB_BTREE->{flags} = R_DUP;
368
$DB_BTREE->{compare} = sub { lc($_[0]) cmp lc($_[1]) };
369
for my $idx ($self->_index_files) {
370
my $path = $self->_qualify("$idx.idx");
372
tie(%db,'DB_File',$path,$flags,0666,$DB_BTREE)
373
or $self->throw("Couldn't tie $path: $!");
375
$self->index_db($idx=>\%db);
378
# Create the parentage database
380
tie (%p,'DB_File',$self->_parentage_path,$flags,0666,$DB_BTREE)
381
or $self->throw("Couldn't tie: ".$self->_parentage_path . $!);
383
$self->parentage_db(\%p);
385
if (-e $self->_fasta_path) {
386
my $dna_db = Bio::DB::Fasta->new($self->_fasta_path) or $self->throw("Can't reindex sequence file: $@");
387
$self->dna_db($dna_db);
390
my $mode = $write ? "+>>"
394
open (my $F,$mode,$self->_notes_path) or $self->throw($self->_notes_path.": $!");
398
sub commit { # reindex fasta files
400
if (my $fh = $self->{fasta_fh}) {
402
$self->dna_db(Bio::DB::Fasta->new($self->{fasta_file}));
403
} elsif (-d $self->directory) {
404
$self->dna_db(Bio::DB::Fasta->new($self->directory));
408
sub _close_databases {
411
$self->dna_db(undef);
412
$self->notes_db(undef);
413
$self->index_db($_=>undef) foreach $self->_index_files;
416
# do nothing -- new() with -create=>1 will do the trick
417
sub _init_database { }
419
sub _delete_databases {
421
for my $idx ($self->_index_files) {
422
my $path = $self->_qualify("$idx.idx");
425
unlink $self->_parentage_path;
426
unlink $self->_fasta_path;
427
unlink $self->_features_path;
428
unlink $self->_mtime_path;
431
sub _touch_timestamp {
433
my $tsf = $self->_mtime_path;
434
open (F,">$tsf") or $self->throw("Couldn't open $tsf: $!");
435
print F scalar(localtime);
445
my $primary_id = $obj->primary_id;
446
$self->_delete_indexes($obj,$primary_id) if $indexed && $primary_id;
447
$primary_id = $db->{'.next_id'}++ unless defined $primary_id;
448
$db->{$primary_id} = $self->freeze($obj);
449
$obj->primary_id($primary_id);
450
$self->_update_indexes($obj) if $indexed;
456
sub _delete_indexes {
459
# the additional "1" causes the index to be deleted
460
$self->_update_name_index($obj,$id,1);
461
$self->_update_type_index($obj,$id,1);
462
$self->_update_location_index($obj,$id,1);
463
$self->_update_attribute_index($obj,$id,1);
464
$self->_update_note_index($obj,$id,1);
471
my $obj = $self->thaw($db->{$id},$id);
475
sub _add_SeqFeature {
479
my $parent_id = (ref $parent ? $parent->primary_id : $parent)
480
or $self->throw("$parent should have a primary_id");
481
my $p = $self->parentage_db;
482
for my $child (@children) {
483
my $child_id = ref $child ? $child->primary_id : $child;
484
defined $child_id or $self->throw("no primary ID known for $child");
485
$p->{$parent_id} = $child_id;
489
sub _fetch_SeqFeatures {
493
my $parent_id = $parent->primary_id or $self->throw("$parent should have a primary_id");
494
my $index = $self->parentage_db;
495
my $db = tied %$index;
497
my @children_ids = $db->get_dup($parent_id);
498
my @children = map {$self->fetch($_)} @children_ids;
501
my $regexp = join '|',map {quotemeta($_)} $self->find_types(@types);
502
return grep {($_->primary_tag.':'.$_->source_tag) =~ /^$regexp$/i} @children;
508
sub _update_indexes {
511
defined (my $id = $obj->primary_id) or return;
512
$self->_update_name_index($obj,$id);
513
$self->_update_type_index($obj,$id);
514
$self->_update_location_index($obj,$id);
515
$self->_update_attribute_index($obj,$id);
516
$self->_update_note_index($obj,$id);
519
sub _update_name_index {
521
my ($obj,$id,$delete) = @_;
522
my $db = $self->index_db('names') or $self->throw("Couldn't find 'names' index file");
523
my ($names,$aliases) = $self->feature_names($obj);
525
# little stinky - needs minor refactoring
528
$self->update_or_delete($delete,$db,$key,$id);
531
foreach (@$aliases) {
532
my $key = lc($_)."_2"; # the _2 indicates a secondary (alias) ID
533
$self->update_or_delete($delete,$db,$key,$id);
538
sub _update_type_index {
540
my ($obj,$id,$delete) = @_;
541
my $db = $self->index_db('types')
542
or $self->throw("Couldn't find 'types' index file");
543
my $primary_tag = $obj->primary_tag;
544
my $source_tag = $obj->source_tag || '';
545
return unless defined $primary_tag;
547
$primary_tag .= ":$source_tag";
548
my $key = lc $primary_tag;
549
$self->update_or_delete($delete,$db,$key,$id);
552
# Note: this indexing scheme is space-inefficient because it stores the
553
# denormalized sequence ID followed by the bin in XXXXXX zero-leading
554
# format. It should be replaced with a binary numeric encoding and the
555
# BTREE {compare} attribute changed accordingly.
556
sub _update_location_index {
558
my ($obj,$id,$delete) = @_;
559
my $db = $self->index_db('locations')
560
or $self->throw("Couldn't find 'locations' index file");
562
my $seq_id = $obj->seq_id || '';
563
my $start = $obj->start || '';
564
my $end = $obj->end || '';
565
my $strand = $obj->strand;
566
my $bin_min = int $start/BINSIZE;
567
my $bin_max = int $end/BINSIZE;
569
for (my $bin = $bin_min; $bin <= $bin_max; $bin++ ) {
570
my $key = sprintf("%s%06d",lc($seq_id),$bin);
571
$self->update_or_delete($delete,$db,$key,pack("i4",$id,$start,$end,$strand));
575
sub _update_attribute_index {
577
my ($obj,$id,$delete) = @_;
578
my $db = $self->index_db('attributes')
579
or $self->throw("Couldn't find 'attributes' index file");
581
for my $tag ($obj->all_tags) {
582
for my $value ($obj->each_tag_value($tag)) {
583
my $key = "\L${tag}:${value}\E";
584
$self->update_or_delete($delete,$db,$key,$id);
589
sub _update_note_index {
591
my ($obj,$id,$delete) = @_;
592
return if $delete; # we don't know how to do this
594
my $fh = $self->notes_db;
595
my @notes = $obj->get_tag_values('Note') if $obj->has_tag('Note');
598
print $fh $_,"\t",pack("u*",$id) or $self->throw("An error occurred while updating note index: $!")
602
sub update_or_delete {
604
my ($delete,$db,$key,$id) = @_;
606
tied(%$db)->del_dup($key,$id);
612
# these methods return pointers to....
613
# the database that stores serialized objects
616
my $d = $self->setting('db');
617
$self->setting(db=>shift) if @_;
623
my $d = $self->setting('parentage_db');
624
$self->setting(parentage_db=>shift) if @_;
628
# the Bio::DB::Fasta object
631
my $d = $self->setting('dna_db');
632
$self->setting(dna_db=>shift) if @_;
636
# the specialized notes database
639
my $d = $self->setting('notes_db');
640
$self->setting(notes_db=>shift) if @_;
644
# The indicated index berkeley db
647
my $index_name = shift;
648
my $d = $self->setting($index_name);
649
$self->setting($index_name=>shift) if @_;
656
my @stat = stat($file);
660
# return names of all the indexes
662
return qw(names types locations attributes);
665
# the directory in which we store our indexes
668
my $d = $self->setting('directory');
669
$self->setting(directory=>shift) if @_;
673
# flag indicating that we are a temporary database
676
my $d = $self->setting('temporary');
677
$self->setting(temporary=>shift) if @_;
683
my $d = $self->setting('permissions') or return;
685
my ($write,$create) = @_;
686
$self->setting(permissions=>[$write,$create]);
691
# file name utilities...
695
return $self->directory .'/' . $file;
699
shift->_qualify('features.bdb');
702
sub _parentage_path {
703
shift->_qualify('parentage.bdb');
707
shift->_qualify('types.idx');
711
shift->_qualify('locations.idx');
714
sub _attribute_path {
715
shift->_qualify('attributes.idx');
719
shift->_qualify('notes.idx');
723
shift->_qualify('sequence.fa');
727
shift->_qualify('mtime.stamp');
730
###########################################
732
###########################################
736
my ($seq_id,$start,$end,$strand,
737
$name,$class,$allow_aliases,
742
) = rearrange([['SEQID','SEQ_ID','REF'],'START',['STOP','END'],'STRAND',
743
'NAME','CLASS','ALIASES',
744
['TYPES','TYPE','PRIMARY_TAG'],
745
['ATTRIBUTES','ATTRIBUTE'],
750
my (@from,@where,@args,@group);
751
$range_type ||= 'overlaps';
754
unless (defined $name or defined $seq_id or defined $types or defined $attributes) {
755
@result = grep {$_ ne '.next_id' } keys %{$self->db};
761
if (defined($name)) {
762
# hacky backward compatibility workaround
763
undef $class if $class && $class eq 'Sequence';
764
$name = "$class:$name" if defined $class && length $class > 0;
765
$result &&= $self->filter_by_name($name,$allow_aliases,\%found);
768
if (defined $seq_id) {
769
$result &&= $self->filter_by_location($seq_id,$start,$end,$strand,$range_type,\%found);
772
if (defined $types) {
773
$result &&= $self->filter_by_type($types,\%found);
776
if (defined $attributes) {
777
$result &&= $self->filter_by_attribute($attributes,\%found);
780
push @result,keys %found if $result;
781
return $iterator ? Bio::DB::SeqFeature::Store::berkeleydb::Iterator->new($self,\@result)
782
: map {$self->fetch($_)} @result;
787
my ($name,$allow_aliases,$filter) = @_;
789
my $index = $self->index_db('names');
790
my $db = tied(%$index);
792
my ($stem,$regexp) = $self->glob_match($name);
795
$regexp .= "(?:_2)?" if $allow_aliases;
800
for (my $status = $db->seq($key,$value,R_CURSOR);
801
$status == 0 and $key =~ /^$regexp$/i;
802
$status = $db->seq($key,$value,R_NEXT)) {
803
push @results,$value;
805
$self->update_filter($filter,\@results);
810
my ($types,$filter) = @_;
811
my @types = ref $types eq 'ARRAY' ? @$types : $types;
813
my $index = $self->index_db('types');
814
my $db = tied(%$index);
818
for my $type (@types) {
819
my ($primary_tag,$source_tag);
820
if (ref $type && $type->isa('Bio::DB::GFF::Typename')) {
821
$primary_tag = $type->method;
822
$source_tag = $type->source;
824
($primary_tag,$source_tag) = split ':',$type,2;
826
my $match = defined $source_tag ? "^$primary_tag:$source_tag\$" : "^$primary_tag:";
828
my $key = lc "$primary_tag:$source_tag";
831
for (my $status = $db->seq($key,$value,R_CURSOR);
832
$status == 0 && $key =~ /$match/i;
833
$status = $db->seq($key,$value,R_NEXT)) {
834
push @results,$value;
838
$self->update_filter($filter,\@results);
841
sub filter_by_location {
843
my ($seq_id,$start,$end,$strand,$range_type,$filter) = @_;
846
my $index = $self->index_db('locations');
847
my $db = tied(%$index);
849
my $binstart = defined $start ? sprintf("%06d",int $start/BINSIZE) : '';
850
my $binend = defined $end ? sprintf("%06d",int $end/BINSIZE) : 'z'; # beyond a number
855
$start = MININT if !defined $start;
856
$end = MAXINT if !defined $end;
858
if ($range_type eq 'overlaps' or $range_type eq 'contains') {
859
my $key = "\L$seq_id\E$binstart";
860
my $keystop = "\L$seq_id\E$binend";
862
for (my $status = $db->seq($key,$value,R_CURSOR);
863
$status == 0 && $key le $keystop;
864
$status = $db->seq($key,$value,R_NEXT)) {
865
my ($id,$fstart,$fend,$fstrand) = unpack("i4",$value);
866
next if $seenit{$id}++;
867
next if $strand && $fstrand != $strand;
868
if ($range_type eq 'overlaps') {
869
next unless $fend >= $start && $fstart <= $end;
871
elsif ($range_type eq 'contains') {
872
next unless $fstart >= $start && $fend <= $end;
878
# for contained in, we look for features originating and terminating outside the specified range
879
# this is incredibly inefficient, but fortunately the query is rare (?)
880
elsif ($range_type eq 'contained_in') {
881
my $key = "\L$seq_id";
882
my $keystop = "\L$seq_id\E$binstart";
885
# do the left part of the range
886
for (my $status = $db->seq($key,$value,R_CURSOR);
887
$status == 0 && $key le $keystop;
888
$status = $db->seq($key,$value,R_NEXT)) {
889
my ($id,$fstart,$fend,$fstrand) = unpack("i4",$value);
890
next if $seenit{$id}++;
891
next if $strand && $fstrand != $strand;
892
next unless $fstart <= $start && $fend >= $end;
896
# do the right part of the range
897
$key = "\L$seq_id\E$binend";
898
for (my $status = $db->seq($key,$value,R_CURSOR);
900
$status = $db->seq($key,$value,R_NEXT)) {
901
my ($id,$fstart,$fend,$fstrand) = unpack("i4",$value);
902
next if $seenit{$id}++;
903
next if $strand && $fstrand != $strand;
904
next unless $fstart <= $start && $fend >= $end;
910
$self->update_filter($filter,\@results);
913
sub filter_by_attribute {
915
my ($attributes,$filter) = @_;
917
my $index = $self->index_db('attributes');
918
my $db = tied(%$index);
921
for my $att_name (keys %$attributes) {
923
my @search_terms = ref($attributes->{$att_name}) && ref($attributes->{$att_name}) eq 'ARRAY'
924
? @{$attributes->{$att_name}} : $attributes->{$att_name};
926
for my $v (@search_terms) {
927
my ($stem,$regexp) = $self->glob_match($v);
930
my $key = "\L${att_name}:${stem}\E";
932
for (my $status = $db->seq($key,$value,R_CURSOR);
933
$status == 0 && $key =~ /^$att_name:$regexp$/i;
934
$status = $db->seq($key,$value,R_NEXT)) {
938
$result ||= $self->update_filter($filter,\@result);
943
sub _search_attributes {
945
my ($search_string,$attribute_array,$limit) = @_;
946
$search_string =~ tr/*?//d;
947
my @words = map {quotemeta($_)} $search_string =~ /(\w+)/g;
948
my $search = join '|',@words;
950
my $index = $self->index_db('attributes');
951
my $db = tied(%$index);
953
my (%results,%notes);
955
for my $tag (@$attribute_array) {
957
my $key = "\L$tag:\E";
958
for (my $status = $db->seq($key,$id,R_CURSOR);
959
$status == 0 and $key =~ /^$tag:(.*)/i;
960
$status = $db->seq($key,$id,R_NEXT)) {
962
next unless $text =~ /$search/;
964
my @hits = $text =~ /($w)/ig or next;
965
$results{$id} += @hits;
967
$notes{$id} .= "$text ";
972
for my $id (keys %results) {
973
my $hits = $results{$id};
974
my $note = $notes{$id};
976
my $relevance = 10 * $hits;
977
my $feature = $self->fetch($id) or next;
978
my $name = $feature->display_name or next;
979
push @results,[$name,$note,$relevance];
987
my ($search_string,$limit) = @_;
989
$search_string =~ tr/*?//d;
993
my @words = map {quotemeta($_)} $search_string =~ /(\w+)/g;
994
my $search = join '|',@words;
997
my $note_index = $self->notes_db;
998
seek($note_index,0,0); # back to start
999
while (<$note_index>) {
1000
next unless /$search/;
1002
my ($note,$uu) = split "\t";
1003
$found{unpack("u*",$uu)}++;
1004
last if $limit && ++$found >= $limit;
1007
my (@features, @matches);
1008
for my $idx (keys %found) {
1009
my $feature = $self->fetch($idx) or next;
1010
my @values = $feature->get_tag_values('Note') if $feature->has_tag('Note');
1011
my $value = "@values";
1014
$hits++ while $value =~ /($search)/ig; # count the number of times we were hit
1015
push @matches,$hits;
1016
push @features,$feature;
1019
for (my $i=0; $i<@matches; $i++) {
1020
my $feature = $features[$i];
1021
my $matches = $matches[$i];
1023
my $relevance = 10 * $matches;
1025
$note = join ' ',$feature->get_tag_values('Note') if $feature->has_tag('Note');
1026
push @results,[$feature->display_name,$note,$relevance];
1035
return unless $term =~ /([^*?]*)(?:^|[^\\])?[*?]/;
1037
$term =~ s/(^|[^\\])([+\[\]^{}\$|\(\).])/$1\\$2/g;
1038
$term =~ s/(^|[^\\])\*/$1.*/g;
1039
$term =~ s/(^|[^\\])\?/$1./g;
1040
return ($stem,$term);
1046
my ($filter,$results) = @_;
1047
return unless @$results;
1050
my @filtered = grep {$filter->{$_}} @$results;
1051
%$filter = map {$_=>1} @filtered;
1053
%$filter = map {$_=>1} @$results;
1059
sub _insert_sequence {
1061
my ($seqid,$seq,$offset) = @_;
1062
my $dna_fh = $self->private_fasta_file or return;
1063
if ($offset == 0) { # start of the sequence
1064
print $dna_fh ">$seqid\n";
1066
print $dna_fh $seq,"\n";
1069
sub _fetch_sequence {
1071
my ($seqid,$start,$end) = @_;
1072
my $db = $self->dna_db or return;
1073
return $db->seq($seqid,$start,$end);
1076
sub private_fasta_file {
1078
return $self->{fasta_fh} if exists $self->{fasta_fh};
1079
$self->{fasta_file} = $self->_qualify("sequence.fa");
1080
return $self->{fasta_fh} = IO::File->new($self->{fasta_file},">");
1083
sub finish_bulk_update {
1085
if (my $fh = $self->{fasta_fh}) {
1087
$self->{fasta_db} = Bio::DB::Fasta->new($self->{fasta_file});
1094
$self->_close_databases();
1095
rmtree($self->directory,0,1) if $self->temporary;
1098
# TIE interface -- a little annoying because we are storing magic ".variable"
1099
# meta-variables in the same data structure as the IDs, so these variables
1105
while ( ($key,$value) = each %{$db}) {
1106
last unless $key =~ /^\./;
1116
while ( ($key,$value) = each %$db) {
1117
last unless $key =~ /^\./;
1125
return exists $self->db->{$id};
1131
my $obj = $self->fetch($id) or return;
1132
$self->_delete_indexes($obj,$id);
1133
delete $self->db->{$id};
1138
$self->_close_databases();
1139
$self->_delete_databases();
1140
my ($write,$create) = $self->_permissions;
1141
$self->_open_databases($write,$create);
1146
return scalar %{$self->db};
1150
package Bio::DB::SeqFeature::Store::berkeleydb::Iterator;
1156
return bless {store => $store,
1157
ids => $ids},ref($class) || $class;
1162
my $store = $self->{store} or return;
1163
my $id = shift @{$self->{ids}};
1164
defined $id or return;
1165
return $store->fetch($id);
1174
This is an early version, so there are certainly some bugs. Please
1175
use the BioPerl bug tracking system to report bugs.
1180
L<Bio::DB::SeqFeature>,
1181
L<Bio::DB::SeqFeature::Store>,
1182
L<Bio::DB::SeqFeature::GFF3Loader>,
1183
L<Bio::DB::SeqFeature::Segment>,
1184
L<Bio::DB::SeqFeature::Store::memory>,
1185
L<Bio::DB::SeqFeature::Store::DBI::mysql>,
1189
Lincoln Stein E<lt>lstein@cshl.orgE<gt>.
1191
Copyright (c) 2006 Cold Spring Harbor Laboratory.
1193
This library is free software; you can redistribute it and/or modify
1194
it under the same terms as Perl itself.