# $Id: Stockholm.pm 14708 2008-06-10 00:08:17Z heikki $
#
# BioPerl module for Bio::Index::Stockholm
#
# Cared for by Chris Fields <cjfields@uiuc.edu>
#
# Copyright Chris Fields
#
# You may distribute this module under the same terms as perl itself
#
# POD documentation - main docs before the code

=head1 NAME

Bio::Index::Stockholm - Indexes Stockholm format alignments (such as those from
Pfam and Rfam). Retrieves raw stream data using the ID or a Bio::SimpleAlign
object.

=head1 SYNOPSIS

    use Bio::Index::Stockholm;

    my ($indexfile,$file1,$file2,$query);
    my $index = Bio::Index::Stockholm->new(-filename   => $indexfile,
                                           -write_flag => 1);
    $index->make_index($file1,$file2);

    # get raw data stream starting at alignment position
    my $fh = $index->get_stream($query);

    # fetch individual alignment
    my $align = $index->fetch_aln($query);   # alias for fetch_report
    $align = $index->fetch_report($query);   # same as above
    print "query is ", $align->id, "\n";

=head1 DESCRIPTION

This object allows one to build an index for any file (or files)
containing Stockholm-format alignments (such as those from Rfam and Pfam)
and provides quick access to an alignment based on its alignment ID.

ID parsing can also be customized using a callback:

   $inx->id_parser(\&get_id);
   # make the index
   $inx->make_index($file_name);

   # here is where the retrieval key is specified
   sub get_id {
      # $line is the text captured after '#=GF AC', e.g. 'PF00001.12'
      my $line = shift;
      $line =~ /^(\w+)/;
      return $1;
   }

The indexer can index a record under multiple IDs passed back from the
callback, provided that all IDs are unique. The default is to use
the alignment ID provided for Rfam/Pfam output.

Note: for best results 'use strict'.
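
As a hedged illustration of a callback that returns several keys, the sketch below assumes the parser receives the text captured after '#=GF AC' (for example 'PF00001.12') and files each record under both the versioned and the unversioned accession. The sub name multi_id_parser and the sample accessions are hypothetical and not part of this module.

```perl
use strict;
use warnings;

# Hypothetical callback: receives the accession text captured from a
# '#=GF AC' line and returns every key the record should be indexed under.
sub multi_id_parser {
    my ($accession) = @_;
    return unless defined $accession && $accession =~ /^\s*(\S+)/;
    my $full = $1;                       # e.g. 'PF00001.12'
    (my $bare = $full) =~ s/\.\d+$//;    # e.g. 'PF00001'
    # Return one key when there is no version suffix, otherwise both.
    return $bare eq $full ? ($full) : ($full, $bare);
}

my @keys = multi_id_parser('PF00001.12');
print join(', ', @keys), "\n";
```

Such a callback would be registered with $index->id_parser(\&multi_id_parser) before calling make_index.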

=head1 TODO

- allow using an alternative regex for indexing (for instance, the ID instead of AC)

=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to
the Bioperl mailing list. Your participation is much appreciated.

  bioperl-l@bioperl.org                  - General discussion
  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the
web:

  http://bugzilla.open-bio.org/

=head1 AUTHOR - Chris Fields

Email cjfields-at-bioperl-dot-org

=head1 APPENDIX

The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

=cut

# Let the code begin...

package Bio::Index::Stockholm;
use strict;

use Bio::AlignIO;    # needed by fetch_report
use base qw(Bio::Index::Abstract Bio::Root::Root);

sub _version {
    return ${Bio::Root::Version::VERSION};
}

=head2 new

 Title   : new
 Usage   : $index = Bio::Index::Abstract->new(
               -filename    => $dbm_file,
               -write_flag  => 0,
               -dbm_package => 'DB_File',
               -verbose     => 0);
 Function: Returns a new index object.  If filename is
           specified, then open_dbm() is immediately called.
           Bio::Index::Abstract->new() will usually be called
           directly only when opening an existing index.
 Returns : A new index object
 Args    : -filename    The name of the dbm index file.
           -write_flag  TRUE if write access to the dbm file is
                        needed.
           -dbm_package The Perl dbm module to use for the
                        index.
           -verbose     Print debugging output to STDERR if
                        TRUE.

=cut

sub new {
    my($class,@args) = @_;
    my $self = $class->SUPER::new(@args);
    return $self;
}

=head2 Bio::Index::Stockholm implemented methods

=head2 fetch_report

 Title   : fetch_report
 Usage   : my $align = $idx->fetch_report($id);
 Function: Returns a Bio::SimpleAlign object
           for a specific alignment
 Returns : Bio::SimpleAlign
 Args    : valid id

=cut

sub fetch_report {
    my ($self, $id) = @_;
    my $fh = $self->get_stream($id);
    my $report = Bio::AlignIO->new(-noclose => 1,
                                   -format  => 'stockholm',
                                   -fh      => $fh);
    return $report->next_aln;
}

=head2 fetch_aln

 Title   : fetch_aln
 Usage   : my $align = $idx->fetch_aln($id);
 Function: Returns a Bio::SimpleAlign object
           for a specific alignment
 Returns : Bio::SimpleAlign
 Args    : valid id
 Note    : alias for fetch_report

=cut

*fetch_aln = \&fetch_report;

=head2 Required methods from Bio::Index::Abstract

=head2 _index_file

 Title   : _index_file
 Usage   : $index->_index_file( $file_name, $i )
 Function: Specialist function to index report file(s).
           Is provided with a filename and an integer
           by make_index in its SUPER class.
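
The offset bookkeeping this indexer relies on can be sketched without BioPerl. The example below is a minimal illustration, not this module's implementation: it scans an in-memory sample, records the byte offset of each '# STOCKHOLM' header, and files it under the accession from the '#=GF AC' line. The sub name scan_offsets and the sample records are hypothetical.

```perl
use strict;
use warnings;

# Sample data held in memory; a real index would read Stockholm files on disk.
my $data = <<'END';
# STOCKHOLM 1.0
#=GF AC   RF00001
seq1 ACGU
//
# STOCKHOLM 1.0
#=GF AC   RF00002
seq2 GGCC
//
END

# Scan once, remembering the byte offset of each '# STOCKHOLM' header and
# filing it under the accession found on the following '#=GF AC' line.
sub scan_offsets {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!";
    my (%offset_for, $indexpoint);
    while (my $line = <$fh>) {
        if ($line =~ /^#\sSTOCKHOLM/) {
            # tell() is already past the line just read, so step back.
            $indexpoint = tell($fh) - length $line;
        }
        if ($line =~ /^#=GF\s+AC\s+(\S+)/) {
            $offset_for{$1} = $indexpoint;
        }
    }
    close $fh;
    return %offset_for;
}

my %offset_for = scan_offsets($data);
printf "%s starts at byte %d\n", $_, $offset_for{$_} for sort keys %offset_for;
```

Storing the offset of the header line, rather than the accession line, means a later seek lands at the start of a complete record.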

=cut

sub _index_file {
    my( $self,
        $file,  # File name
        $i,     # Index-number of file being indexed
        ) = @_;

    my( $begin,     # Offset from start of file of the start
                    # of the last found record.
        );

    open(my $BLAST, '<', $file) or $self->throw("cannot open file $file\n");
    my $indexpoint = 0;
    while( <$BLAST> ) {
        if(m{^#\sSTOCKHOLM} ) {
            # tell() is past the header line, so subtract its length
            $indexpoint = tell($BLAST)-length $_;
            $self->debug("Index:$indexpoint\n");
        }
        if(m{^#=GF\s+AC\s+(\S[^\n]+)}) {
            foreach my $id ($self->id_parser()->($1)) {
                $self->debug("id is $id, begin is $indexpoint\n");
                $self->add_record($id, $i, $indexpoint);
            }
        }
    }
    close $BLAST;
    return;
}

# shamelessly stolen from Bio::Index::Fasta

=head2 id_parser

 Title   : id_parser
 Usage   : $index->id_parser( CODE )
 Function: Stores or returns the code used by record_id to
           parse the ID for a record from a string.  Useful
           for specifying a different parser for different
           flavours of IDs (for instance, custom
           Stockholm-formatted files).
           Returns \&default_id_parser (see below) if not
           set.  If you supply your own id_parser
           subroutine, then it should expect the accession
           string captured from a '#=GF AC' line.  An entry
           will be added to the index for each string in
           the list returned.
 Example : $index->id_parser( \&my_id_parser )
 Returns : ref to CODE if called without arguments
 Args    : CODE

=cut

sub id_parser {
    my( $self, $code ) =@_;

    if ($code) {
        $self->{'_id_parser'} = $code;
    }
    return $self->{'_id_parser'} || \&default_id_parser;
}

=head2 default_id_parser

 Title   : default_id_parser
 Usage   : $id = default_id_parser( $header )
 Function: The default ID parser for Bio::Index::Stockholm.
           Returns $1 from applying the regexp /^\s*(\S+)/
           to the string passed in.
 Returns : ID string
 Args    : a header line string

=cut

sub default_id_parser {
    if ($_[0] =~ /^\s*(\S+)/) {
        return $1;
    }
    return;
}

=head2 Bio::Index::Abstract methods

=head2 filename

 Title   : filename
 Usage   : $value = $self->filename();
           $self->filename($value);
 Function: Gets or sets the name of the dbm index file.
 Returns : The current value of filename
 Args    : Value of filename if setting, or none if
           getting the value.

=head2 write_flag

 Title   : write_flag
 Usage   : $value = $self->write_flag();
           $self->write_flag($value);
 Function: Gets or sets the value of write_flag, which
           is whether the dbm file should be opened with
           write access.
 Returns : The current value of write_flag (default 0)
 Args    : Value of write_flag if setting, or none if
           getting the value.

=head2 dbm_package

 Title   : dbm_package
 Usage   : $value = $self->dbm_package();
           $self->dbm_package($value);
 Function: Gets or sets the name of the Perl dbm module used.
           If the value is unset, then it returns the value of
           the package variable $USE_DBM_TYPE or if that is
           unset, then it chooses the best available dbm type,
           choosing 'DB_File' in preference to 'SDBM_File'.
           Bio::Index::Abstract may work with other dbm file
           types.
 Returns : The current value of dbm_package
 Args    : Value of dbm_package if setting, or none if
           getting the value.

=head2 get_stream

 Title   : get_stream
 Usage   : $stream = $index->get_stream( $id );
 Function: Returns a file handle with the file pointer
           at the appropriate place.

           This provides a way to get the actual
           file contents and not an object.

           WARNING: you must parse the record delimiter
           *yourself*. Abstract won't do this for you.
           So this code

           $fh = $index->get_stream($myid);
           while( <$fh> ) {
              # do something
           }

           will parse the entire file if you do not put
           a last statement in, like

           while( <$fh> ) {
              /^\/\// && last; # end of record
              # do something
           }

 Returns : A filehandle object
 Args    : string represents the accession number
 Notes   : This method should not be used without forethought
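
The warning above can be illustrated with a small, self-contained sketch. It assumes the caller wants the raw text of a single record and uses an in-memory filehandle as a stand-in for the handle returned by get_stream; the helper name read_one_record is hypothetical.

```perl
use strict;
use warnings;

# Collect lines from $fh up to and including the '//' record terminator.
sub read_one_record {
    my ($fh) = @_;
    my @lines;
    while (my $line = <$fh>) {
        push @lines, $line;
        last if $line =~ m{^//};    # end of record
    }
    return join '', @lines;
}

# Stand-in for $index->get_stream($id): an in-memory handle positioned
# at the start of the first of two records.
my $text = "# STOCKHOLM 1.0\nseq1 ACGU\n//\n# STOCKHOLM 1.0\nseq2 GGCC\n//\n";
open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!";
my $record = read_one_record($fh);
print $record;
```

Without the last statement the loop would run on into the second record and every record after it.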

=head2 open_dbm

 Title   : open_dbm
 Usage   : $index->open_dbm()
 Function: Opens the dbm file associated with the index
           object.  Write access is only given if explicitly
           asked for by calling new(-write_flag => 1) or having
           set the write_flag(1) on the index object.  The type of
           dbm file opened is that returned by dbm_package().
           The name of the file to be opened is obtained by
           calling the filename() method.
 Example : $index->open_dbm()
 Returns : 1 on success

=head2 _version

 Title   : _version
 Usage   : $type = $index->_version()
 Function: Returns a string which identifies the version of an
           index module.  Used to permanently identify an index
           file as having been created by a particular version
           of the index module.  Must be provided by the subclass.

=head2 _filename

 Title   : _filename
 Usage   : $index->_filename( FILE INT )
 Function: Indexes the file

=head2 _file_handle

 Title   : _file_handle
 Usage   : $fh = $index->_file_handle( INT )
 Function: Returns an open filehandle for the file
           index INT.  On opening a new filehandle it
           caches it in the @{$index->_filehandle} array.
           If the requested filehandle is already open,
           it simply returns it from the array.
 Example : $first_file_indexed = $index->_file_handle( 0 );
 Returns : ref to a filehandle

=head2 _file_count

 Title   : _file_count
 Usage   : $index->_file_count( INT )
 Function: Used by the index building sub in a subclass to
           track the number of files indexed.  Sets or gets
           the number of files indexed when called with or
           without an argument.

=head2 add_record

 Title   : add_record
 Usage   : $index->add_record( $id, @stuff );
 Function: Calls pack_record on @stuff, and adds the result
           of pack_record to the index database under key $id.
           If $id is a reference to an array, then a new entry
           is added under a key corresponding to each element
           of the array.
 Example : $index->add_record( $id, $fileNumber, $begin, $end )
 Returns : TRUE on success or FALSE on failure

=head2 pack_record

 Title   : pack_record
 Usage   : $packed_string = $index->pack_record( LIST )
 Function: Packs an array of scalars into a single string
           joined by ASCII 034 (which is unlikely to be used
           in any of the strings), and returns it.
 Example : $packed_string = $index->pack_record( $fileNumber, $begin, $end )
 Returns : STRING or undef

=head2 unpack_record

 Title   : unpack_record
 Usage   : $index->unpack_record( STRING )
 Function: Splits the string provided into an array,
           splitting on ASCII 034.
 Example : ( $fileNumber, $begin, $end ) = $index->unpack_record( $self->db->{$id} )
 Returns : A 3 element ARRAY
 Args    : STRING containing ASCII 034
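
The packing scheme described for pack_record and unpack_record can be sketched in plain Perl. This is an illustration only: it assumes "ASCII 034" means octal 034 (the file separator character, written "\034" in Perl), and the helper names pack_fields and unpack_fields are hypothetical stand-ins for the inherited methods.

```perl
use strict;
use warnings;

# Assumed separator: octal 034 (the ASCII file separator character).
my $SEP = "\034";

# Join the fields into one string, and split such a string back apart.
sub pack_fields   { return join $SEP, @_ }
sub unpack_fields { return split /$SEP/, $_[0] }

my $packed = pack_fields(0, 1024);          # file number, byte offset
my ($file_number, $begin) = unpack_fields($packed);
print "file $file_number, offset $begin\n";
```

A separator that cannot occur in file numbers or byte offsets lets the record round-trip without any escaping.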

=head2 DESTROY

 Title   : DESTROY
 Usage   : Called automatically when index goes out of scope
 Function: Closes connection to database and handles to
           files