1
# $Id: Seq.pm,v 1.1.1.1 2001/12/27 16:32:42 sac Exp $
3
# BioPerl module for Bio::Seq
5
# Cared for by Ewan Birney <birney@sanger.ac.uk>
7
# Copyright Ewan Birney
9
# You may distribute this module under the same terms as perl itself
11
# POD documentation - main docs before the code
15
Bio::Seq - Sequence object, with features
19
$seqio = Bio::SeqIO->new ( '-format' => 'embl' , -file => 'myfile.dat');
20
$seqobj = $seqio->next_seq();
22
# features must implement Bio::SeqFeatureI
24
@features = $seqobj->top_SeqFeatures(); # just top level
25
@features = $seqobj->all_SeqFeatures(); # descend into sub features
27
$seq = $seqobj->seq(); # actual sequence as a string
28
$seqstr = $seqobj->subseq(10,50);
29
$ann = $seqobj->annotation(); # annotation object
33
A Seq object is a sequence with sequence features placed on them. The
34
Seq object contains a PrimarySeq object for the actual sequence and
35
also implements its interface.
37
In bioperl we have 3 main players that people are going to use
39
Bio::PrimarySeq - just the sequence and its names, nothing else.
40
Bio::SeqFeatureI - a location on a sequence, potentially with a sequence.
42
Bio::Seq - A sequence and a collection of seqfeatures (an aggregate) with
45
Although bioperl is not tied to file formats heavily, these distinctions do
46
map to file formats sensibly and for some bioinformaticians this might help
49
Bio::PrimarySeq - Fasta file of a sequence
50
Bio::SeqFeatureI - A single entry in an EMBL/GenBank/DDBJ feature table
51
Bio::Seq - A single EMBL/GenBank/DDBJ entry
53
By having this split we avoid alot of nasty ciricular references
54
(seqfeatures can hold a reference to a sequence without the sequence
55
holding a reference to the seqfeature).
57
Ian Korf really helped in the design of the Seq and SeqFeature system.
63
User feedback is an integral part of the evolution of this
64
and other Bioperl modules. Send your comments and suggestions preferably
65
to one of the Bioperl mailing lists.
66
Your participation is much appreciated.
68
bioperl-l@bioperl.org - General discussion
69
http://bio.perl.org/MailList.html - About the mailing lists
73
Report bugs to the Bioperl bug tracking system to help us keep track
74
the bugs and their resolution.
75
Bug reports can be submitted via email or the web:
77
bioperl-bugs@bioperl.org
78
http://bio.perl.org/bioperl-bugs/
80
=head1 AUTHOR - Ewan Birney, inspired by Ian Korf objects
82
Email birney@sanger.ac.uk
84
Describe contact details here
88
The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _
93
# Let the code begin...
101
# Object preamble - inheriets from Bio::Root::Object
103
use Bio::Root::RootI;
107
@ISA = qw(Bio::Root::RootI Bio::SeqI);
113
Usage : $seq = Bio::Seq->new( -seq => 'ATGGGGGTGGTGGTACCCT',
115
-accession_number => 'AL000012',
118
Function: Returns a new seq object from
119
basic constructors, being a string for the sequence
120
and strings for id and accession_number
121
Returns : a new Bio::Seq object
126
# standard new call..
127
my($caller,@args) = @_;
128
my $self = $caller->SUPER::new(@args);
129
# this is way too sneaky probably. We delegate the construction of
130
# the Seq object onto PrimarySeq and then pop primary_seq into
131
# our primary_seq slot
133
my $pseq = Bio::PrimarySeq->new(@args);
134
$self->{'_as_feat'} = [];
136
my $ann = new Bio::Annotation;
137
$self->annotation($ann);
138
$self->primary_seq($pseq);
143
=head1 PrimarySeq interface
145
The primaryseq interface is the basic sequence getting
146
and setting methods found on all sequences.
148
These methods implement the PrimarySeq interface by delegating
149
to the primary_seq inside the object. This means that you
150
can use a Seq object wherever there is a PrimarySeq, and
151
of course, you are free to use these functions anyway.
158
Usage : $string = $obj->seq()
159
Function: Returns the sequence as a string of letters. The
160
case of the letters is left up to the implementer.
161
Suggested cases are upper case for proteins and lower case for
162
DNA sequence (IUPAC standard),
163
but implementations are suggested to keep an open mind about
164
case (some users... want mixed case!)
171
my ($self,$value) = @_;
172
if( defined $value ) {
173
return $self->primary_seq()->seq($value);
175
return $self->primary_seq()->seq();
181
Usage : $len = $seq->length()
184
Returns : integer representing the length of the sequence.
191
return $self->primary_seq()->length();
197
Usage : $substring = $obj->subseq(10,40);
198
Function: returns the subseq from start to end, where the first base
199
is 1 and the number is inclusive, ie 1-2 are the first two
200
bases of the sequence
202
Start cannot be larger than end but can be equal
211
my ($self,$s,$e) = @_;
212
return $self->primary_seq()->subseq($s,$e);
218
Usage : $id_string = $obj->display_id($newid);
219
Function: returns or sets the display id, aka the common name of the
222
The semantics of this is that it is the most likely string
223
to be used as an identifier of the sequence, and likely to
224
have "human" readability. The id is equivalent to the ID
225
field of the GenBank/EMBL databanks and the id field of the
226
Swissprot/sptrembl database. In fasta format, the >(\S+) is
227
presumed to be the id, though some people overload the id
228
to embed other information. Bioperl does not use any
229
embedded information in the ID field, and people are
230
encouraged to use other mechanisms (accession field for
231
example, or extending the sequence object) to solve this.
233
Notice that $seq->id() maps to this function, mainly for
234
legacy/convience issues
236
Args : newid (optional)
242
my ($self,$value) = @_;
243
if( defined $value ) {
244
return $self->primary_seq->display_id($value);
246
return $self->primary_seq->display_id();
251
=head2 accession_number
253
Title : accession_number
254
Usage : $unique_biological_key = $obj->accession_number;
255
Function: Returns the unique biological id for a sequence, commonly
256
called the accession_number. For sequences from established
257
databases, the implementors should try to use the correct
258
accession number. Notice that primary_id() provides the
259
unique id for the implemetation, allowing multiple objects
260
to have the same accession number in a particular implementation.
262
For sequences with no accession number, this method should return
270
sub accession_number {
271
my ($self,$value) = @_;
272
if( defined $value ) {
273
return $self->primary_seq->accession_number($value);
275
return $self->primary_seq->accession_number();
282
Usage : $seqobj->desc()
283
Function: Sets/Gets the description of the sequnce
292
my ($self,$value) = @_;
294
if( defined $value ) {
295
return $self->primary_seq->desc($value);
297
return $self->primary_seq->desc();
306
Usage : $unique_implementation_key = $obj->primary_id;
307
Function: Returns the unique id for this object in this
308
implementation. This allows implementations to manage
309
their own object ids in a way the implementaiton can control
310
clients can expect one id to map to one object.
312
For sequences with no natural id, this method should return
313
a stringified memory location.
315
Also notice that this method is B<not> delegated to the
316
internal PrimarySeq object
324
my ($obj,$value) = @_;
326
if( defined $value) {
327
$obj->{'primary_id'} = $value;
329
if( ! exists $obj->{'primary_id'} ) {
332
return $obj->{'primary_id'};
338
Usage : if( $obj->can_call_new ) {
339
$newobj = $obj->new( %param );
341
Function: can_call_new returns 1 or 0 depending
342
on whether an implementation allows new
343
constructor to be called. If a new constructor
344
is allowed, then it should take the followed hashed
347
$myobject->new( -seq => $sequence_as_string,
349
-accession_number => $accession
368
Usage : if( $obj->moltype eq 'dna' ) { /Do Something/ }
369
Function: Returns the type of sequence being one of
370
'dna', 'rna' or 'protein'. This is case sensitive.
372
This is not called <type> because this would cause
373
upgrade problems from the 0.5 and earlier Seq objects.
375
Returns : a string either 'dna','rna','protein'. NB - the object must
376
make a call of the type - if there is no type specified it
384
my ($self,$value) = @_;
385
if( defined $value ) {
386
return $self->primary_seq->moltype($value);
388
return $self->primary_seq->moltype();
391
=head1 Methods provided in the Bio::PrimarySeqI interface
393
These methods are inherited from the PrimarySeq interface
394
and work as one expects, building new Bio::Seq objects
395
or other information as expected.
397
Sequence Features are B<not> transfered to the new objects.
398
This is possibly a mistake. Anyone who feels the urge in
399
dealing with this is welcome to give it a go.
404
Usage : $rev = $seq->revcom()
405
Function: Produces a new Bio::Seq object which
406
is the reversed complement of the sequence. For protein
407
sequences this throws an exception of "Sequence is a protein. Cannot revcom"
409
The id is the same id as the orginal sequence, and the accession number
410
is also indentical. If someone wants to track that this sequence has be
411
reversed, it needs to define its own extensions
413
To do an inplace edit of an object you can go:
415
$seq = $seq->revcom();
417
This of course, causes Perl to handle the garbage collection of the old
418
object, but it is roughly speaking as efficient as an inplace edit.
420
Returns : A new (fresh) Bio::Seq object
429
Usage : $subseq = $myseq->trunc(10,100);
430
Function: Provides a truncation of a sequence,
433
Returns : a fresh Bio::Seq object
442
Usage : $id = $seq->id()
443
Function: This is mapped on display_id
452
my ($self,$value) = @_;
454
if( defined $value ) {
455
return $self->display_id($value);
457
return $self->display_id();
461
=head1 Seq only methods
463
These methods are specific to the Bio::Seq object, and not
464
found on the Bio::PrimarySeq object
469
Usage : $obj->primary_seq($newval)
472
Returns : value of primary_seq
473
Args : newvalue (optional)
474
Throws : Bio::Root::BadParameter if the supplied argument does
475
not derive from Bio::PrimarySeqI.
480
my ($obj,$value) = @_;
482
if( defined $value) {
483
if( ! ref $value || ! $value->isa('Bio::PrimarySeqI') ) {
484
$obj->throw(-class => 'Bio::Root::BadParameter',
485
-text => "$value is not a Bio::PrimarySeq compliant object",
489
$obj->{'primary_seq'} = $value;
490
# descend down over all seqfeature objects, seeing whether they
491
# want an attached seq.
493
foreach my $sf ( $obj->top_SeqFeatures() ) {
494
if( $sf->can("attach_seq") ) {
495
$sf->attach_seq($value);
497
$obj->warn("In Seq primary_seq, a sequence feature cannot attach seq. Bugger");
502
return $obj->{'primary_seq'};
509
Usage : $obj->annotation($seq_obj)
512
Returns : value of annotation
513
Args : newvalue (optional)
519
my ($obj,$value) = @_;
520
if( defined $value) {
521
$obj->{'annotation'} = $value;
523
return $obj->{'annotation'};
527
=head2 add_SeqFeature
529
Title : add_SeqFeature
530
Usage : $seq->add_SeqFeature($feat);
531
$seq->add_SeqFeature(@feat);
532
Function: Adds the given feature object (or each of an array of feature
533
objects to the feature array of this
534
sequence. The object passed is required to implement the
535
Bio::SeqFeatureI interface.
537
Returns : TRUE on success
538
Args : A Bio::SeqFeatureI implementing object, or an array of such objects.
539
Throws : Bio::Root::BadParameter if any of the supplied arguments do
540
not derive from Bio::SeqFeatureI.
546
my ($self,@feat) = @_;
550
foreach my $feat ( @feat ) {
551
if( !$feat->isa("Bio::SeqFeatureI") ) {
552
$self->throw(-class => 'Bio::Root::BadParameter',
553
-text =>"$feat is not a SeqFeatureI and that's what we expect...",
557
if( $feat->can("entire_seq") ) {
558
$fseq = $feat->entire_seq;
559
$aseq = $self->primary_seq;
561
if( defined $aseq ) {
562
if( defined $fseq ) {
563
unless ($aseq == $fseq) {
564
$self->warn("$feat has an attached sequence which is not in this annseq. I worry about this");
567
if( $feat->can("attach_seq") ) {
569
$feat->attach_seq($aseq);
573
} # end of if the feat can entire_seq
575
push(@{$self->{'_as_feat'}},$feat);
580
=head2 top_SeqFeatures
582
Title : top_SeqFeatures
583
Usage : @feat_ary = $seq->top_SeqFeatures();
584
Function: Returns the array of top-level features for this sequence object.
585
Features which are not top-level are subfeatures of one or more
586
of the returned feature objects, which means that you must
587
traverse the subfeature arrays of each top-level feature object
588
in order to traverse all features associated with this sequence.
590
Use all_SeqFeatures() if you want the feature tree flattened into
593
Returns : An array of Bio::SeqFeatureI implementing objects.
599
sub top_SeqFeatures {
602
return @{$self->{'_as_feat'}};
605
=head2 all_SeqFeatures
607
Title : all_SeqFeatures
608
Usage : @feat_ary = $seq->all_SeqFeatures();
609
Function: Returns the tree of feature objects attached to this sequence
610
object flattened into one single array. Top-level features will
611
still contain their subfeature-arrays, which means that you
612
will encounter subfeatures twice if you traverse the subfeature
613
tree of the returned objects.
615
Use top_SeqFeatures() if you want the array to contain only the
618
Returns : An array of Bio::SeqFeatureI implementing objects.
624
sub all_SeqFeatures {
627
foreach my $feat ( $self->top_SeqFeatures() ){
629
&_retrieve_subSeqFeature(\@array,$feat);
637
Title : feature_count
638
Usage : $seq->feature_count()
639
Function: Return the number of SeqFeatures attached to a sequence
641
Returns : number of SeqFeatures
650
if (defined($self->{'_as_feat'})) {
651
return ($#{$self->{'_as_feat'}} + 1);
657
sub _retrieve_subSeqFeature {
658
my ($arrayref,$feat) = @_;
660
foreach my $sub ( $feat->sub_SeqFeature() ) {
661
push(@$arrayref,$sub);
662
&_retrieve_subSeqFeature($arrayref,$sub);
672
Function: Gets or sets the species
673
Example : $species = $self->species();
674
Returns : Bio::Species object
675
Args : Bio::Species object or none;
681
my ($self, $species) = @_;
683
$self->{'species'} = $species;
685
return $self->{'species'};
689
# keep AUTOLOAD happy