#-------------------------------------------------------------------------------
# PACKAGE : Bio::Tools::SeqAnal
# PURPOSE : To provide a base class for different sequence analysis tools.
# AUTHOR : Steve Chervitz (sac@bioperl.org)
# CREATED : 27 Mar 1998
# REVISION: $Id: SeqAnal.pm,v 1.13 2003/06/04 08:36:43 heikki Exp $
# For documentation, run this module through pod2html
# (preferably from Perl v5.004 or better).
#-------------------------------------------------------------------------------
package Bio::Tools::SeqAnal;
use Bio::Root::Object ();
use Bio::Root::Global qw(:std);
use vars qw($ID @ISA);
@ISA = qw( Bio::Root::Object );
$ID = 'Bio::Tools::SeqAnal';
Bio::Tools::SeqAnal - Bioperl sequence analysis base class.
=head2 Object Creation
This module is an abstract base class. Perl will let you instantiate it,
but it provides little functionality on its own. This module
should be used via a specialized subclass. See L<_initialize()|_initialize>
for a description of constructor parameters.
require Bio::Tools::SeqAnal;
To run and parse a new report:
$hit = new Bio::Tools::SeqAnal ( -run   => \%runParams,
                                 -parse => 1 );
To parse an existing report:
$hit = new Bio::Tools::SeqAnal ( -file  => 'filename.data',
                                 -parse => 1 );
To run a report without parsing:
$hit = new Bio::Tools::SeqAnal ( -run => \%runParams );
To read an existing report without parsing:
$hit = new Bio::Tools::SeqAnal ( -file => 'filename.data',
                                 -read => 1 );
This module is included with the central Bioperl distribution:
http://bio.perl.org/Core/Latest
ftp://bio.perl.org/pub/DIST
Follow the installation instructions included in the README file.
Bio::Tools::SeqAnal.pm is a base class for specialized
sequence analysis modules such as B<Bio::Tools::Blast> and B<Bio::Tools::Fasta>.
It provides basic data and functionality that are not unique to
any one specialized module, such as:
=item * reading raw data into memory.
=item * storing name and version of the program.
=item * storing name of the query sequence.
=item * storing name and version of the database.
=item * storing & determining the date on which the analysis was performed.
=item * basic file manipulations (compress, uncompress, delete).
Some of these functionalities (reading, file manipulation) are inherited from
B<Bio::Root::Object>, from which Bio::Tools::SeqAnal.pm derives.
=head1 RUN, PARSE, and READ
A SeqAnal.pm object can be created using one of three modes: run, parse, or read.
run Run a new sequence analysis report. New results can then
be parsed or saved for analysis later.
parse Parse the data from a sequence analysis report, loading it
into the SeqAnal.pm object.
read Read in data from an existing raw analysis report without
parsing it. In the future, this may also permit persistent
SeqAnal.pm objects. This mode is considered experimental.
The mode is set by supplying switches to the constructor; see L<_initialize()|_initialize>.
A key feature of SeqAnal.pm is the ability to access raw data in a
generic fashion. Regardless of what sequence analysis method is used,
the raw data always need to be read into memory. The SeqAnal.pm class
utilizes the L<Bio::Root::Object::read()|Bio::Root::Object> method inherited from
B<Bio::Root::Object> to permit the following:
=item * read from a file or STDIN.
=item * read a single record or a stream containing multiple records.
=item * specify a record separator.
=item * store all input data in memory or process the data stream as it is being read.
Because data can be parsed as it is being read, each record can be
analyzed as it arrives and then saved or discarded as necessary.
This can be useful when crunching through thousands of reports.
For examples of this, see the L<parse()|parse> methods defined in B<Bio::Tools::Blast> and
B<Bio::Tools::Fasta>.
=head2 Parsing & Running
Parsing and running of sequence analysis reports must be implemented by each
specific subclass of SeqAnal.pm. Stubs ("virtual methods") that throw an exception are provided here for
the L<parse()|parse> and L<run()|run> methods. See B<Bio::Tools::Blast> and B<Bio::Tools::Fasta>
for examples.
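As a rough illustration only, a subclass might override both stubs along
the following lines. The package name My::Tools::Demo, the -PROG parameter,
and the trivial bodies are hypothetical and are not part of Bioperl. The sketch
assumes that read() returns the raw report text in scalar context, which is
how _display_file() in this module uses it.

    package My::Tools::Demo;                  # hypothetical subclass name
    use strict;
    use vars qw(@ISA);
    use Bio::Tools::SeqAnal ();
    @ISA = qw( Bio::Tools::SeqAnal );

    # Replaces the run() stub. A real subclass would launch the
    # analysis program here; this sketch only records its name.
    sub run {
        my ($self, %param) = @_;
        $self->program( $param{-PROG} || 'UNKNOWN' );   # -PROG is hypothetical
        return 1;
    }

    # Replaces the parse() stub. Returns the number of reports parsed.
    sub parse {
        my ($self, %param) = @_;
        my $raw = scalar $self->read(%param);  # inherited from Bio::Root::Object
        return 0 unless defined $raw and $raw ne '';
        # Report-specific parsing of $raw would go here.
        return 1;
    }

    1;

Because the constructor calls run() before parse(), a subclass written this
way supports the combined run-and-parse mode without further changes.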
Bio::Tools::SeqAnal.pm is a concrete class that inherits from B<Bio::Root::Object>.
This module also makes use of a number of functionalities inherited from
B<Bio::Root::Object> (file manipulations such as reading, compressing, decompressing,
deleting, and obtaining the date).
User feedback is an integral part of the evolution of this and other Bioperl modules.
Send your comments and suggestions preferably to one of the Bioperl mailing lists.
Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
http://bio.perl.org/MailList.html - About the mailing lists
=head2 Reporting Bugs
Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and
their resolution. Bug reports can be submitted via email or the web:
bioperl-bugs@bio.perl.org
http://bugzilla.bioperl.org/
Steve Chervitz, sac@bioperl.org
See the L<FEEDBACK | FEEDBACK> section for where to send bug reports and comments.
Bio::Tools::SeqAnal.pm, 0.011
Copyright (c) 1998 Steve Chervitz. All Rights Reserved.
This module is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
http://bio.perl.org/Projects/modules.html - Online module documentation
http://bio.perl.org/Projects/Blast/ - Bioperl Blast Project
http://bio.perl.org/ - Bioperl Project Homepage
#### END of main POD documentation.
Methods beginning with a leading underscore are considered private
and are intended for internal use by this module. They are
B<not> considered part of the public interface and are described here
for documentation purposes only.
##############################################################################
##############################################################################
Usage : n/a; automatically called by Bio::Root::Object::new()
Purpose : Calls private methods to extract the raw report data,
: Calls superclass constructor first (Bio::Root::Object.pm).
Returns : string containing the make parameter value.
Argument : Named parameters (TAGS CAN BE ALL UPPER OR ALL LOWER CASE).
: The SeqAnal.pm constructor only processes the following
: parameters passed from new()
: -RUN => hash reference for named parameters to be used
: for running a sequence analysis program.
: These are dereferenced and passed to the run() method.
: If -RUN is HASH ref, the run() method will be called with the
: If -PARSE is true, all parameters passed from new() are passed
: to the parse() method. This occurs after the run method call
: to enable combined running + parsing.
: If -READ is true, all parameters passed from new() are passed
: to the read() method.
: Either -PARSE or -READ should be true, not both.
Comments : Does not call _rearrange() to handle parameters since only
: a few are required and there may be potentially many.
See Also : B<Bio::Root::Object::new()>, B<Bio::Root::Object::_rearrange()>
my( $self, %param ) = @_;
my $make = $self->SUPER::_initialize(%param);
my($read, $parse, $runparam) = (
($param{-READ}||$param{'-read'}), ($param{-PARSE}||$param{'-parse'}),
($param{-RUN}||$param{'-run'})
# $self->_rearrange([qw(READ PARSE RUN)], @param);
# Issue: How to keep all the arguments for running the analysis
# separate from other arguments needed for parsing the results, etc?
# Solution: place all the run arguments in a separate hash.
$self->run(%$runparam) if ref $runparam eq 'HASH';
if($parse) { $self->parse(%param); }
elsif($read) { $self->read(%param) }
$DEBUG==2 && print STDERR "DESTROYING $self ${\$self->name}";
undef $self->{'_rawData'};
$self->SUPER::destroy;
###############################################################################
###############################################################################
# The mode of the SeqAnal object is no longer explicitly set.
# This simplifies the interface somewhat.
##----------------------------------------------------------------------
# Usage : $object->mode();
# Purpose : Set/Get the mode for the sequence analysis object.
# Comments : The mode specifies how much detail to extract from the
# : sequence analysis report. There are three modes:
# : 'parse' -- Parse the sequence analysis output data.
# : 'read' -- Reads in the raw report but does not
# : attempt to parse it. Useful when you just
# : want to work with the output as-is
# : (e.g., create HTML-formatted output).
# : 'run' -- Generates a new report.
# : Allowable modes are defined by the exported package global array
#See Also : _set_mode()
##----------------------------------------------------------------------
# if(@_) { $self->{'_mode'} = lc(shift); }
Usage : $object->best();
Purpose : Set/Get the indicator for processing only the best match.
Returns : Boolean (1 | 0)
if(@_) { $self->{'_best'} = shift; }
Usage : $object->_set_db_stats(<named parameters>);
Purpose : Set stats about the database searched.
Argument : named parameters:
: -LETTERS => <int> (number of letters in db)
: -SEQS => <int> (number of sequences in db)
my ($self, %param) = @_;
$self->{'_db'} ||= $param{-NAME} || '';
$self->{'_dbRelease'} = $param{-RELEASE} || '';
($self->{'_dbLetters'} = $param{-LETTERS} || 0) =~ s/,//g;
($self->{'_dbSeqs'} = $param{-SEQS} || 0) =~ s/,//g;
Usage : $object->database();
Purpose : Set/Get the name of the database searched.
if(@_) { $self->{'_db'} = shift; }
=head2 database_release
Usage : $object->database_release();
Purpose : Set/Get the release date of the queried database.
#-----------------------
sub database_release {
#-----------------------
if(@_) { $self->{'_dbRelease'} = shift; }
$self->{'_dbRelease'};
=head2 database_letters
Usage : $object->database_letters();
Purpose : Set/Get the number of letters in the queried database.
#----------------------
sub database_letters {
#----------------------
if(@_) { $self->{'_dbLetters'} = shift; }
$self->{'_dbLetters'};
Usage : $object->database_seqs();
Purpose : Set/Get the number of sequences in the queried database.
if(@_) { $self->{'_dbSeqs'} = shift; }
Usage : $object->set_date([<string>]);
Purpose : Set the date on which the analysis was performed.
Argument : The optional string argument can be the date or the
: string 'file' in which case the date will be obtained from
Throws : Exception if no date is supplied and no file exists.
Comments : This method attempts to set the date in either of two ways:
: 1) using data passed in as an argument,
: 2) using the Bio::Root::Utilities.pm file_date() method
: on the output file.
: Another way is to extract the date from the contents of the
: raw output data. Such parsing will have to be specialized
: for different seq analysis reports. Override this method
: to create such custom parsing code if desired.
See Also : L<date()|date>, B<Bio::Root::Object::file_date()>
if( !$date and ($file = $self->file)) {
# If no date is passed and a file exists, determine date from the file.
# (provided by superclass Bio::Root::Object.pm)
$date = $self->SUPER::file_date(-FMT => 'd m y');
$self->warn("Can't set date of report.");
$self->{'_date'} = $date;
Usage : $object->date();
Purpose : Get the date on which the analysis was performed.
Comments : This method is not a combination set/get, it only gets.
See Also : L<set_date()|set_date>
sub date { my $self = shift; $self->{'_date'}; }
Usage : $object->length();
Purpose : Set/Get the length of the query sequence (number of monomers).
Comments : Developer note: when using the built-in length function within
: this module, call it as CORE::length().
if(@_) { $self->{'_length'} = shift; }
Usage : $object->program();
Purpose : Set/Get the name of the sequence analysis program (BLASTP, FASTA, etc.)
if(@_) { $self->{'_prog'} = shift; }
=head2 program_version
Usage : $object->program_version();
Purpose : Set/Get the version number of the sequence analysis program.
: (e.g., 1.4.9MP, 2.0a19MP-WashU).
#---------------------
sub program_version {
#---------------------
if(@_) { $self->{'_progVersion'} = shift; }
$self->{'_progVersion'};
Usage : $name = $object->query();
Purpose : Get the name of the query sequence used to generate the report.
Comments : Equivalent to $object->name().
sub query { my $self = shift; $self->name; }
Usage : $object->query_desc();
Purpose : Set/Get the description of the query sequence for the analysis.
if(@_) { $self->{'_qDesc'} = shift; }
Usage : $object->display(<named parameters>);
Purpose : Display information about Bio::Tools::SeqAnal.pm data members.
: Overrides Bio::Root::Object::display().
Example : $object->display(-SHOW=>'stats');
Argument : Named parameters: -SHOW => 'file' | 'stats'
: -WHERE => filehandle (default = STDOUT)
Status : Experimental
See Also : L<_display_stats()|_display_stats>, L<_display_file()|_display_file>, B<Bio::Root::Object::display()>
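As a hedged illustration of the two named parameters, a call that sends the
summary to STDERR instead of STDOUT might look like the following. It assumes
that -WHERE accepts a glob reference such as \*STDERR, which is not confirmed
in this module.

    # Assumption: -WHERE accepts a glob reference as the filehandle.
    $object->display( -SHOW  => 'stats',
                      -WHERE => \*STDERR );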
my( $self, %param ) = @_;
$self->SUPER::display(%param);
my $OUT = $self->fh();
$self->show =~ /file/i and $self->_display_file($OUT);
Usage : n/a; called automatically by display()
Purpose : Print the contents of the raw report file.
Argument : one argument = filehandle object.
Status : Experimental
See Also : L<display()|display>
my( $self, $OUT) = @_;
print $OUT scalar($self->read);
=head2 _display_stats
Usage : n/a; called automatically by display()
Purpose : Display information about Bio::Tools::SeqAnal.pm data members.
: Prints the file name, program, program version, database name,
: database version, query name, query length,
Argument : one argument = filehandle object.
Returns : printf call.
Status : Experimental
See Also : B<Bio::Root::Object::display()>
#--------------------
#--------------------
my( $self, $OUT ) = @_;
printf( $OUT "\n%-15s: %s\n", "QUERY NAME", $self->query ||'UNKNOWN' );
printf( $OUT "%-15s: %s\n", "QUERY DESC", $self->query_desc || 'UNKNOWN');
printf( $OUT "%-15s: %s\n", "LENGTH", $self->length || 'UNKNOWN');
printf( $OUT "%-15s: %s\n", "FILE", $self->file || 'STDIN');
printf( $OUT "%-15s: %s\n", "DATE", $self->date || 'UNKNOWN');
printf( $OUT "%-15s: %s\n", "PROGRAM", $self->program || 'UNKNOWN');
printf( $OUT "%-15s: %s\n", "VERSION", $self->program_version || 'UNKNOWN');
printf( $OUT "%-15s: %s\n", "DB-NAME", $self->database || 'UNKNOWN');
printf( $OUT "%-15s: %s\n", "DB-RELEASE", ($self->database_release || 'UNKNOWN'));
printf( $OUT "%-15s: %s\n", "DB-LETTERS", ($self->database_letters) ? $self->database_letters : 'UNKNOWN');
printf( $OUT "%-15s: %s\n", "DB-SEQUENCES", ($self->database_seqs) ? $self->database_seqs : 'UNKNOWN');
#####################################################################################
## VIRTUAL METHODS ##
#####################################################################################
=head1 VIRTUAL METHODS
Usage : $object->parse( %named_parameters )
Purpose : Parse a raw sequence analysis report.
Returns : Integer (number of sequence analysis reports parsed).
Argument : Named parameters.
Throws : Exception: virtual method not defined.
: Propagates any exception thrown by read()
Comments : This is a virtual method that should be overridden to
: parse a specific type of data.
See Also : B<Bio::Root::Object::read()>
my ($self, @param) = @_;
$self->throw("Virtual method parse() not defined for ".ref($self)." objects.");
# The first step in parsing is reading in the data:
Usage : $object->run( %named_parameters )
Purpose : Run a sequence analysis program on one or more sequences.
: Run mode should be configurable to return a parsed object or
: the raw results data.
Argument : Named parameters:
Throws : Exception: virtual method not defined.
my ($self, %param) = @_;
$self->throw("Virtual method run() not defined for ".ref($self)." objects.");
#####################################################################################
#####################################################################################
=head1 FOR DEVELOPERS ONLY
Information about the various data members of this module is provided for those
wishing to modify or understand the code. Two things to bear in mind:
=item 1 Do NOT rely on these in any code outside of this module.
All data members are prefixed with an underscore to signify that they are private.
Always use accessor methods. If the accessor doesn't exist or is inadequate,
create or modify an accessor (and let me know, too!).
=item 2 This documentation may be incomplete and out of date.
It is easy for these data member descriptions to become obsolete as
this module is still evolving. Always double check this info and search
for members not described here.
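As a small illustration of the first point, compare the two ways of reading
the sequence count shown here. The variable $report is hypothetical and stands
for any object of a SeqAnal.pm subclass.

    # Preferred: the public accessor keeps working even if the
    # internal hash key is renamed.
    my $n_seqs = $report->database_seqs;

    # Discouraged outside this module: depends on the private
    # '_dbSeqs' hash key.
    my $n_seqs_raw = $report->{'_dbSeqs'};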
An instance of Bio::Tools::SeqAnal.pm is a blessed reference to a hash containing
all or some of the following fields:
--------------------------------------------------------------
_file Full path to file containing raw sequence analysis report.
_mode Affects how much detail to extract from the raw report.
Future mode will also distinguish 'running' from 'parsing'
THE FOLLOWING MAY BE EXTRACTABLE FROM THE RAW REPORT FILE:
_prog Name of the sequence analysis program.
_progVersion Version number of the program.
_db Database searched.
_dbRelease Version or date of the database searched.
_dbLetters Total number of letters in the database.
_dbSeqs Total number of sequences in the database.
_query Name of query sequence.
_length Length of the query sequence.
_date Date on which the analysis was performed.
INHERITED DATA MEMBERS
_name From Bio::Root::Object.pm. String representing the name of the query sequence.
Typically obtained from the report file.
_parent From Bio::Root::Object.pm. This member contains a reference to the
object to which this seq anal report belongs. Optional & experimental.
(E.g., a protein object could create and own a Blast object.)