1
# $Id: PrimarySeqI.pm,v 1.1 2003/03/07 19:30:25 sac Exp $
3
# BioPerl module for Bio::PrimarySeqI
5
# Cared for by Ewan Birney <birney@sanger.ac.uk>
7
# Copyright Ewan Birney
9
# You may distribute this module under the same terms as perl itself
11
# POD documentation - main docs before the code
15
Bio::PrimarySeqI [Developers] - Interface definition for a Bio::PrimarySeq
20
# Bio::PrimarySeqI is the interface class for sequences.
22
# If you are a newcomer to bioperl, you should
23
# start with Bio::Seq documentation. This
24
# documentation is mainly for developers using
27
# to test this is a seq object
29
$obj->isa("Bio::PrimarySeqI") ||
30
$obj->throw("$obj does not implement the Bio::PrimarySeqI interface");
34
$string = $obj->seq();
35
$substring = $obj->subseq(12,50);
36
$display = $obj->display_id(); # for human display
37
$id = $obj->primary_id(); # unique id for this object,
38
# implementation defined
39
$unique_key= $obj->accession_number();
40
# unique biological id
45
$rev = $obj->revcom();
48
$obj->throw(-class => 'Bio::Root::Exception',
49
-text => "Could not reverse complement. ".
50
"Probably not DNA. Actual exception\n$@\n",
54
$trunc = $obj->trunc(12,50);
56
# $rev and $trunc are Bio::PrimarySeqI compliant objects
61
This object defines an abstract interface to basic sequence
62
information - for most users of the package the documentation (and
63
methods) in this class are not useful - this is a developers only
64
class which defines what methods have to be implmented by other Perl
65
objects to comply to the Bio::PrimarySeqI interface. Go "perldoc
66
Bio::Seq" or "man Bio::Seq" for more information on the main class for
70
PrimarySeq is an object just for the sequence and its name(s), nothing
71
more. Seq is the larger object complete with features. There is a pure
72
perl implementation of this in Bio::PrimarySeq. If you just want to
73
use Bio::PrimarySeq objects, then please read that module first. This
74
module defines the interface, and is of more interest to people who
75
want to wrap their own Perl Objects/RDBs/FileSystems etc in way that
76
they "are" bioperl sequence objects, even though it is not using Perl
77
to store the sequence etc.
80
This interface defines what bioperl consideres necessary to "be" a
81
sequence, without providing an implementation of this. (An
82
implementation is provided in Bio::PrimarySeq). If you want to provide
83
a Bio::PrimarySeq 'compliant' object which in fact wraps another
84
object/database/out-of-perl experience, then this is the correct thing
85
to wrap, generally by providing a wrapper class which would inheriet
86
from your object and this Bio::PrimarySeqI interface. The wrapper class
87
then would have methods lists in the "Implementation Specific
88
Functions" which would provide these methods for your object.
95
User feedback is an integral part of the evolution of this and other
96
Bioperl modules. Send your comments and suggestions preferably to one
97
of the Bioperl mailing lists. Your participation is much appreciated.
99
bioperl-l@bioperl.org - General discussion
100
http://bio.perl.org/MailList.html - About the mailing lists
102
=head2 Reporting Bugs
104
Report bugs to the Bioperl bug tracking system to help us keep track
105
the bugs and their resolution. Bug reports can be submitted via email
108
bioperl-bugs@bio.perl.org
109
http://bugzilla.bioperl.org/
111
=head1 AUTHOR - Ewan Birney
113
Email birney@sanger.ac.uk
117
The rest of the documentation details each of the object
118
methods. Internal methods are usually preceded with a _
123
# Let the code begin...
126
package Bio::PrimarySeqI;
129
use Bio::Root::RootI;
130
use Bio::Tools::CodonTable;
132
@ISA = qw(Bio::Root::RootI);
134
=head1 Implementation Specific Functions
136
These functions are the ones that a specific implementation must
142
Usage : $string = $obj->seq()
143
Function: Returns the sequence as a string of letters. The
144
case of the letters is left up to the implementer.
145
Suggested cases are upper case for proteins and lower case for
146
DNA sequence (IUPAC standard),
147
but implementations are suggested to keep an open mind about
148
case (some users... want mixed case!)
156
$self->throw_not_implemented();
162
Usage : $substring = $obj->subseq(10,40);
163
Function: returns the subseq from start to end, where the first base
164
is 1 and the number is inclusive, ie 1-2 are the first two
165
bases of the sequence
167
Start cannot be larger than end but can be equal
177
$self->throw_not_implemented();
183
Usage : $id_string = $obj->display_id();
184
Function: returns the display id, aka the common name of the Sequence object.
186
The semantics of this is that it is the most likely string
187
to be used as an identifier of the sequence, and likely to
188
have "human" readability. The id is equivalent to the ID
189
field of the GenBank/EMBL databanks and the id field of the
190
Swissprot/sptrembl database. In fasta format, the >(\S+) is
191
presumed to be the id, though some people overload the id
192
to embed other information. Bioperl does not use any
193
embedded information in the ID field, and people are
194
encouraged to use other mechanisms (accession field for
195
example, or extending the sequence object) to solve this.
197
Notice that $seq->id() maps to this function, mainly for
198
legacy/convience issues
208
$self->throw_not_implemented();
212
=head2 accession_number
214
Title : accession_number
215
Usage : $unique_biological_key = $obj->accession_number;
216
Function: Returns the unique biological id for a sequence, commonly
217
called the accession_number. For sequences from established
218
databases, the implementors should try to use the correct
219
accession number. Notice that primary_id() provides the
220
unique id for the implemetation, allowing multiple objects
221
to have the same accession number in a particular implementation.
223
For sequences with no accession number, this method should return
232
sub accession_number {
233
my ($self,@args) = @_;
234
$self->throw_not_implemented();
242
Usage : $unique_implementation_key = $obj->primary_id;
243
Function: Returns the unique id for this object in this
244
implementation. This allows implementations to manage their
245
own object ids in a way the implementaiton can control
246
clients can expect one id to map to one object.
248
For sequences with no accession number, this method should
249
return a stringified memory location.
251
[Note this method name is likely to change in 1.3]
261
my ($self,@args) = @_;
262
$self->throw_not_implemented();
269
Usage : if( $obj->can_call_new ) {
270
$newobj = $obj->new( %param );
272
Function: can_call_new returns 1 or 0 depending
273
on whether an implementation allows new
274
constructor to be called. If a new constructor
275
is allowed, then it should take the followed hashed
278
$myobject->new( -seq => $sequence_as_string,
280
-accession_number => $accession
291
my ($self,@args) = @_;
293
# we default to 0 here
301
Usage : if( $obj->alphabet eq 'dna' ) { /Do Something/ }
302
Function: Returns the type of sequence being one of
303
'dna', 'rna' or 'protein'. This is case sensitive.
305
This is not called <type> because this would cause
306
upgrade problems from the 0.5 and earlier Seq objects.
308
Returns : a string either 'dna','rna','protein'. NB - the object must
309
make a call of the type - if there is no type specified it
319
$self->throw_not_implemented();
323
my ($self,@args) = @_;
325
$self->warn("moltype: pre v1.0 method. Calling alphabet() instead...");
326
$self->alphabet(@args);
330
=head1 Optional Implementation Functions
332
The following functions rely on the above functions. An
333
implementing class does not need to provide these functions, as they
334
will be provided by this class, but is free to override these
337
All of revcom(), trunc(), and translate() create new sequence
338
objects. They will call new() on the class of the sequence object
339
instance passed as argument, unless can_call_new() returns FALSE. In
340
the latter case a Bio::PrimarySeq object will be created. Implementors
341
which really want to control how objects are created (eg, for object
342
persistence over a database, or objects in a CORBA framework), they
343
are encouraged to override these methods
348
Usage : $rev = $seq->revcom()
349
Function: Produces a new Bio::PrimarySeqI implementing object which
350
is the reversed complement of the sequence. For protein
351
sequences this throws an exception of "Sequence is a
352
protein. Cannot revcom"
354
The id is the same id as the original sequence, and the
355
accession number is also indentical. If someone wants to
356
track that this sequence has be reversed, it needs to
357
define its own extensions
359
To do an inplace edit of an object you can go:
361
$seq = $seq->revcom();
363
This of course, causes Perl to handle the garbage
364
collection of the old object, but it is roughly speaking as
365
efficient as an inplace edit.
367
Returns : A new (fresh) Bio::PrimarySeqI object
377
# check the type is good first.
378
my $t = $self->alphabet;
380
if( $t eq 'protein' ) {
381
$self->throw(-class => 'Bio::Root::Exception',
382
-text => "Sequence is a protein. Cannot revcom");
385
if( $t ne 'dna' && $t ne 'rna' ) {
386
if( $self->can('warn') ) {
387
$self->warn("Sequence is not dna or rna, but [$t]. ".
388
"Attempting to revcom, but unsure if this is right");
390
warn("[$self] Sequence is not dna or rna, but [$t]. ".
391
"Attempting to revcom, but unsure if this is right");
395
# yank out the sequence string
397
my $str = $self->seq();
399
# if is RNA - map to DNA then map back
407
$str =~ tr/acgtrymkswhbvdnxACGTRYMKSWHBVDNX/tgcayrkmswdvbhnxTGCAYRKMSWDVBHNX/;
408
my $revseq = CORE::reverse $str;
411
$revseq =~ tr/tT/uU/;
415
if($self->can_call_new()) {
416
$seqclass = ref($self);
418
$seqclass = 'Bio::PrimarySeq';
419
$self->_attempt_to_load_Seq();
421
my $out = $seqclass->new( '-seq' => $revseq,
422
'-display_id' => $self->display_id,
423
'-accession_number' => $self->accession_number,
424
'-alphabet' => $self->alphabet,
425
'-desc' => $self->desc()
434
Usage : $subseq = $myseq->trunc(10,100);
435
Function: Provides a truncation of a sequence,
438
Returns : a fresh Bio::PrimarySeqI implementing object
439
Args : Two integers denoting first and last base of the sub-sequence.
445
my ($self,$start,$end) = @_;
448
if( defined $start && ref($start) &&
449
$start->isa('Bio::LocationI') ) {
450
$str = $self->subseq($start); # start is a location actually
452
$self->throw("trunc start,end -- there was no end for $start");
453
} elsif( $end < $start ) {
454
my $msg = "start [$start] is greater than end [$end]. \n".
455
"If you want to truncated and reverse complement, \n".
456
"you must call trunc followed by revcom. Sorry.";
459
$str = $self->subseq($start,$end);
463
if($self->can_call_new()) {
464
$seqclass = ref($self);
466
$seqclass = 'Bio::PrimarySeq';
467
$self->_attempt_to_load_Seq();
470
my $out = $seqclass->new( '-seq' => $str,
471
'-display_id' => $self->display_id,
472
'-accession_number' => $self->accession_number,
473
'-alphabet' => $self->alphabet,
474
'-desc' => $self->desc()
483
Usage : $protein_seq_obj = $dna_seq_obj->translate
484
#if full CDS expected:
485
$protein_seq_obj = $cds_seq_obj->translate(undef,undef,undef,undef,1);
488
Provides the translation of the DNA sequence using full
489
IUPAC ambiguities in DNA/RNA and amino acid codes.
491
The full CDS translation is identical to EMBL/TREMBL
492
database translation. Note that the trailing terminator
493
character is removed before returning the translation
496
Note: if you set $dna_seq_obj->verbose(1) you will get a
497
warning if the first codon is not a valid initiator.
500
Returns : A Bio::PrimarySeqI implementing object
501
Args : character for terminator (optional) defaults to '*'
502
character for unknown amino acid (optional) defaults to 'X'
503
frame (optional) valid values 0, 1, 2, defaults to 0
504
codon table id (optional) defaults to 1
505
complete coding sequence expected, defaults to 0 (false)
506
boolean, throw exception if not complete CDS (true) or defaults to warning (false)
512
my($stop, $unknown, $frame, $tableid, $fullCDS, $throw) = @_;
513
my($i, $len, $output) = (0,0,'');
517
## User can pass in symbol for stop and unknown codons
518
unless(defined($stop) and $stop ne '') { $stop = "*"; }
519
unless(defined($unknown) and $unknown ne '') { $unknown = "X"; }
520
unless(defined($frame) and $frame ne '') { $frame = 0; }
522
## the codon table ID
523
unless(defined($tableid) and $tableid ne '') { $tableid = 1; }
525
##Error if monomer is "Amino"
526
if ($self->alphabet eq 'protein') {
527
$self->throw(-class => 'Bio::Root::Exception',
528
-text => "Can't translate an amino acid sequence.")
531
##Error if frame is not 0, 1 or 2
532
unless ($frame == 0 or $frame == 1 or $frame == 2) {
533
$self->throw(-class => 'Bio::Root::BadParameter',
534
-text => "Valid values for frame are 0, 1, 2, not [$frame].",
538
#warns if ID is invalid
539
my $codonTable = Bio::Tools::CodonTable->new( -id => $tableid);
541
my ($seq) = $self->seq();
543
# deal with frame offset.
545
$seq = substr ($seq,$frame);
549
$output = $codonTable->translate($seq);
550
# Use user-input stop/unknown
551
$output =~ s/\*/$stop/g;
552
$output =~ s/X/$unknown/g;
554
# only if we are expecting to translate a complete coding region
556
my $id = $self->display_id;
557
#remove the stop character
558
if( substr($output,-1,1) eq $stop ) {
561
$throw && $self->throw(-class => 'Bio::Root::Exception',
562
-text => "Seq [$id]: Not using a valid terminator codon!: ". substr($output,-1,1),
563
-value => substr($output,-1,1));
564
$self->warn("Seq [$id]: Not using a valid terminator codon!: ". substr($output,-1,1));
566
# test if there are terminator characters inside the protein sequence!
567
if ($output =~ /\*/) {
568
$throw && $self->throw(-class => 'Bio::Root::Exception',
569
-text => "Seq [$id]: Terminator codon inside CDS!");
570
$self->warn("Seq [$id]: Terminator codon inside CDS!");
572
# if the initiator codon is not ATG, the amino acid needs to changed into M
573
if ( substr($output,0,1) ne 'M' ) {
574
if ($codonTable->is_start_codon(substr($seq, 0, 3)) ) {
575
$output = 'M'. substr($output,1);
578
$self->throw(-class => 'Bio::Root::Exception',
579
-text => "Seq [$id]: Not using a valid initiator codon!: ". substr($seq, 0, 3),
580
-value => substr($seq, 0, 3));
582
$self->warn("Seq [$id]: Not using a valid initiator codon!");
588
if($self->can_call_new()) {
589
$seqclass = ref($self);
591
$seqclass = 'Bio::PrimarySeq';
592
$self->_attempt_to_load_Seq();
594
my $out = $seqclass->new( '-seq' => $output,
595
'-display_id' => $self->display_id,
596
'-accession_number' => $self->accession_number,
597
# is there anything wrong with retaining the
599
'-desc' => $self->desc(),
600
'-alphabet' => 'protein'
609
Usage : $id = $seq->id()
610
Function: ID of the sequence. This should normally be (and actually is in
611
the implementation provided here) just a synonym for display_id().
622
return $self->display_id();
629
Usage : $len = $seq->length()
632
Returns : integer representing the length of the sequence.
640
$self->throw_not_implemented();
646
Usage : $seq->desc($newval);
647
$description = $seq->desc();
648
Function: Get/set description text for a seq object
650
Returns : value of desc
651
Args : newvalue (optional)
657
my ($self,$value) = @_;
658
$self->throw_not_implemented();
665
Usage : if( $obj->is_circular) { /Do Something/ }
666
Function: Returns true if the molecule is circular
667
Returns : Boolean value
673
my ($self,$value) = @_;
674
if (defined $value) {
675
$self->{'_is_circular'} = $value;
677
return $self->{'_is_circular'};
680
=head1 Private functions
682
These are some private functions for the PrimarySeqI interface. You do not
683
need to implement these functions
685
=head2 _attempt_to_load_Seq
687
Title : _attempt_to_load_Seq
697
sub _attempt_to_load_Seq{
700
if( $main::{'Bio::PrimarySeq'} ) {
704
require Bio::PrimarySeq;
707
my $text = "Bio::PrimarySeq could not be loaded for [$self]\n".
708
"This indicates that you are using Bio::PrimarySeqI ".
709
"without Bio::PrimarySeq loaded or without providing a ".
710
"complete implementation.\nThe most likely problem is that there ".
711
"has been a misconfiguration of the bioperl environment\n".
712
"Actual exception:\n\n";
713
$self->throw(-class => 'Bio::Root::Exception',
714
-text => "$text$@\n",