=head1 NAME

Bio::DB::GFF::Feature -- A relative segment identified by a feature type

=head1 DESCRIPTION

Bio::DB::GFF::Feature is a stretch of sequence that corresponds to a
single annotation in a GFF database. It inherits from
Bio::DB::GFF::RelSegment, and so has all the support for relative
addressing of this class and its ancestors. It also inherits from
Bio::SeqFeatureI, and so has the familiar start(), stop(),
and primary_tag() methods.

Bio::DB::GFF::Feature adds new methods to retrieve the annotation's
type, group, and other GFF attributes. Annotation types are
represented by Bio::DB::GFF::Typename objects, a simple class that has
two methods called method() and source(). These correspond to the
method and source fields of a GFF file.

Annotation groups serve the dual purpose of giving the annotation a
human-readable name, and providing higher-order groupings of
subfeatures into features. The groups returned by this module are
objects of the Bio::DB::GFF::Featname class.

Bio::DB::GFF::Feature inherits from and implements the abstract
methods of Bio::SeqFeatureI, allowing it to interoperate with other
Bioperl modules.

Generally, you will not create or manipulate Bio::DB::GFF::Feature
objects directly, but use those that are returned by the
Bio::DB::GFF::RelSegment-E<gt>features() method.

=head2 Important note about start() vs end()

If features are derived from segments that use relative addressing
(which is the default), then start() will be greater than end() if the
feature is on the opposite strand from the reference sequence. This
breaks Bio::SeqI compliance, but is necessary to avoid having the real
genomic locations designated by start() and end() swap places when
changing reference points.

To avoid this behavior, call $segment-E<gt>absolute(1) before fetching
features from it. This will force everything into absolute
coordinates. For example:

  my $segment = $db->segment('CHROMOSOME_I');
  $segment->absolute(1);
  my @features = $segment->features('transcript');

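The coordinate flip under relative addressing can be illustrated with a small, self-contained sketch. The helper name abs2rel_sketch and its arithmetic are assumptions made for this example (1-based coordinates, with a reference point that can sit on either strand); it is not the code used by Bio::DB::GFF::RelSegment.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DB::GFF): convert absolute
# positions to positions relative to a reference point.
sub abs2rel_sketch {
    my ($refstart, $refstrand, @positions) = @_;
    if ($refstrand eq '-') {
        # minus-strand reference: relative coordinates count backwards
        return map { $refstart - $_ + 1 } @positions;
    }
    return map { $_ - $refstart + 1 } @positions;
}

# A feature covering absolute bases 100..200.
my ($plus_start,  $plus_end)  = abs2rel_sketch(50,   '+', 100, 200);
my ($minus_start, $minus_end) = abs2rel_sketch(1000, '-', 100, 200);

print "plus-strand reference:  start=$plus_start end=$plus_end\n";
print "minus-strand reference: start=$minus_start end=$minus_end\n";
```

Against the minus-strand reference the same two genomic positions come back with start greater than end, which is the situation the note above warns about.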
The remainder of this document describes the public and private
methods implemented by this module.

=cut

package Bio::DB::GFF::Feature;

use Bio::DB::GFF::Util::Rearrange;
use Bio::DB::GFF::RelSegment;
use Bio::DB::GFF::Featname;
use Bio::DB::GFF::Typename;
use Bio::DB::GFF::Homol;
use Bio::SeqFeatureI;   # named in @ISA below
use Bio::Root::Root;    # named in @ISA below

use vars qw($VERSION @ISA $AUTOLOAD);
@ISA = qw(Bio::DB::GFF::RelSegment Bio::SeqFeatureI Bio::Root::Root);

*segments = \&sub_SeqFeature;
my %CONSTANT_TAGS = (method=>1, source=>1, score=>1, phase=>1, notes=>1, id=>1, group=>1);

=head2 new_from_parent

 Title   : new_from_parent
 Usage   : $f = Bio::DB::GFF::Feature->new_from_parent(@args);
 Function: create a new feature object
 Returns : new Bio::DB::GFF::Feature object

This method is called by Bio::DB::GFF to create a new feature using
information obtained from the GFF database. It is one of two similar
constructors. This one is called when the feature is generated from a
RelSegment object, and should inherit that object's coordinate system.

The 13 arguments are positional (sorry):

  $parent       a Bio::DB::GFF::RelSegment object (or descendant)
  $start        start of this feature
  $stop         stop of this feature
  $method       this feature's GFF method
  $source       this feature's GFF source
  $score        this feature's score
  $fstrand      this feature's strand (relative to the source
                sequence, which has its own strandedness!)
  $phase        this feature's phase
  $group        this feature's group (a Bio::DB::GFF::Featname object)
  $db_id        this feature's internal database ID
  $group_id     this feature's internal group database ID
  $tstart       this feature's target start
  $tstop        this feature's target stop

tstart and tstop aren't used for anything at the moment, since the
information is embedded in the group object.

=cut

# this is called for a feature that is attached to a parent sequence,
# in which case it inherits its coordinate reference system and strandedness
sub new_from_parent {

$method,$source,$score,

$group,$db_id,$group_id,
$tstart,$tstop) = @_;

($start,$stop) = ($stop,$start) if defined($fstrand) and $fstrand eq '-';
my $class = $group ? $group->class : $parent->class;

factory => $parent->{factory},
sourceseq => $parent->{sourceseq},
strand => $parent->{strand},
ref => $parent->{ref},
refstart => $parent->{refstart},
refstrand => $parent->{refstrand},
absolute => $parent->{absolute},

type => Bio::DB::GFF::Typename->new($method,$source),

group_id => $group_id,

=head2 new

 Usage   : $f = Bio::DB::GFF::Feature->new(@args);
 Function: create a new feature object
 Returns : new Bio::DB::GFF::Feature object

This method is called by Bio::DB::GFF to create a new feature using
information obtained from the GFF database. It is one of two similar
constructors. This one is called when the feature is generated
without reference to a RelSegment object, and should therefore use its
default coordinate system (relative to itself).

The 11 arguments are positional:

  $factory      a Bio::DB::GFF adaptor object (or descendant)
  $srcseq       the source sequence
  $start        start of this feature
  $stop         stop of this feature
  $method       this feature's GFF method
  $source       this feature's GFF source
  $score        this feature's score
  $fstrand      this feature's strand (relative to the source
                sequence, which has its own strandedness!)
  $phase        this feature's phase
  $group        this feature's group
  $db_id        this feature's internal database ID

The constructor's argument list also accepts $group_id, $tstart and
$tstop after $db_id, with the same meaning as in new_from_parent().

=cut

# This is called when creating a feature from scratch. It does not have
# an inherited coordinate system.

$score,$fstrand,$phase,
$group,$db_id,$group_id,
$tstart,$tstop) = @_;

my $self = bless { },$package;
($start,$stop) = ($stop,$start) if defined($fstrand) and $fstrand eq '-';

my $class = $group ? $group->class : 'Sequence';

@{$self}{qw(factory sourceseq start stop strand class)} =
($factory,$srcseq,$start,$stop,$fstrand,$class);

# if the target start and stop are defined, then we use this information to create
# the reference sequence
# THIS SHOULD BE BUILT INTO RELSEGMENT
if (0 && $tstart ne '' && $tstop ne '') {
if ($tstart < $tstop) {
@{$self}{qw(ref refstart refstrand)} = ($group,$start - $tstart + 1,'+');

@{$self}{'start','stop'} = @{$self}{'stop','start'};
@{$self}{qw(ref refstart refstrand)} = ($group,$tstop + $stop - 1,'-');

@{$self}{qw(ref refstart refstrand)} = ($srcseq,1,'+');

@{$self}{qw(type fstrand score phase group db_id group_id)} =
(Bio::DB::GFF::Typename->new($method,$source),$fstrand,$score,$phase,$group,$db_id,$group_id);

=head2 type

 Usage   : $type = $f->type([$newtype])
 Function: get or set the feature type
 Returns : a Bio::DB::GFF::Typename object
 Args    : a new Typename object (optional)

This method gets or sets the type of the feature. The type is a
Bio::DB::GFF::Typename object, which encapsulates the feature method
and source.

The method() and source() methods described next provide shortcuts to
the individual fields of the type.

=cut

my $d = $self->{type};
$self->{type} = shift if @_;

=head2 method

 Usage   : $method = $f->method([$newmethod])
 Function: get or set the feature method
 Args    : a new method (optional)

This method gets or sets the feature method. It is a convenience
feature that delegates the task to the feature's type object.

=cut

my $d = $self->{type}->method;
$self->{type}->method(shift) if @_;

=head2 source

 Usage   : $source = $f->source([$newsource])
 Function: get or set the feature source
 Args    : a new source (optional)

This method gets or sets the feature source. It is a convenience
feature that delegates the task to the feature's type object.

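The delegation pattern used by method() and source() can be sketched with two toy classes. Sketch::Typename and Sketch::Feature are hypothetical stand-ins written for this example, not the real Bio::DB::GFF classes. The sketch also assumes that a combined get/set call hands back the value held before the call, which is what the fragments in this file suggest but do not show in full.

```perl
use strict;
use warnings;

package Sketch::Typename;    # stand-in, not Bio::DB::GFF::Typename

sub new {
    my ($class, $method, $source) = @_;
    return bless { method => $method, source => $source }, $class;
}

sub method {
    my $self = shift;
    my $old  = $self->{method};
    $self->{method} = shift if @_;
    return $old;
}

package Sketch::Feature;     # stand-in, not Bio::DB::GFF::Feature

sub new {
    my ($class, $type) = @_;
    return bless { type => $type }, $class;
}

# Delegate to the type object; hand back the value held before the call.
sub method {
    my $self = shift;
    my $old  = $self->{type}->method;
    $self->{type}->method(shift) if @_;
    return $old;
}

package main;

my $f        = Sketch::Feature->new(Sketch::Typename->new('exon', 'curated'));
my $previous = $f->method('CDS');   # sets the new method, returns the old one
my $current  = $f->method;          # plain getter
print "was $previous, now $current\n";
```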
=cut

my $d = $self->{type}->source;
$self->{type}->source(shift) if @_;

=head2 score

 Usage   : $score = $f->score([$newscore])
 Function: get or set the feature score
 Args    : a new score (optional)

This method gets or sets the feature score.

=cut

my $d = $self->{score};
$self->{score} = shift if @_;

=head2 phase

 Usage   : $phase = $f->phase([$newphase])
 Function: get or set the feature phase
 Args    : a new phase (optional)

This method gets or sets the feature phase.

=cut

my $d = $self->{phase};
$self->{phase} = shift if @_;

=head2 strand

 Usage   : $strand = $f->strand
 Function: get the feature strand

Returns the strand of the feature. Unlike the other methods, the
strand cannot be changed once the object is created (due to coordinate

=cut

return 0 unless $self->{fstrand};
if ($self->absolute) {
return Bio::DB::GFF::RelSegment::_to_strand($self->{fstrand});

return $self->SUPER::strand;
# return 0 unless defined $self->{start};
# return $self->{start} < $self->{stop} ? '+1' : '-1';

=head2 group

 Usage   : $group = $f->group([$new_group])
 Function: get or set the feature group
 Returns : a Bio::DB::GFF::Featname object
 Args    : a new group (optional)

This method gets or sets the feature group. The group is a
Bio::DB::GFF::Featname object, which has an ID and a class.

=cut

my $d = $self->{group};
$self->{group} = shift if @_;

=head2 info

 Usage   : $info = $f->info([$new_info])
 Function: get or set the feature group
 Returns : a Bio::DB::GFF::Featname object
 Args    : a new group (optional)

This method is an alias for group(). It is provided for AcePerl
compatibility.

=head2 target

 Usage   : $target = $f->target([$new_target])
 Function: get or set the feature target
 Returns : a Bio::DB::GFF::Featname object
 Args    : a new group (optional)

This method works like group(), but only returns the group if it
implements the start() method. This is typical for
similarity/assembly features, where the target encodes the start and stop
location of the alignment.

=cut

my $group = $self->group or return;
return unless $group->can('start');

=head2 id

 Function: get the feature ID
 Returns : a database identifier

This method retrieves the database identifier for the feature. It
cannot be changed.

=cut

sub id { shift->{db_id} }

=head2 group_id

 Usage   : $id = $f->group_id
 Function: get the feature's group ID
 Returns : a database identifier

This method retrieves the database group identifier for the feature.
It cannot be changed. Often the group identifier is more useful than
the feature identifier, since it is used to refer to a complex object

=cut

sub group_id { shift->{group_id} }

=head2 clone

 Usage   : $feature = $f->clone
 Function: make a copy of the feature
 Returns : a new Bio::DB::GFF::Feature object

This method returns a copy of the feature.

=cut

my $clone = $self->SUPER::clone;

if (ref(my $t = $clone->type)) {
my $type = $t->can('clone') ? $t->clone : bless {%$t},ref $t;

if (ref(my $g = $clone->group)) {
my $group = $g->can('clone') ? $g->clone : bless {%$g},ref $g;
$clone->group($group);

if (my $merged = $self->{merged_segs}) {
$clone->{merged_segs} = { %$merged };

=head2 sub_SeqFeature

 Title   : sub_SeqFeature
 Usage   : @feat = $feature->sub_SeqFeature([$method])
 Function: get subfeatures
 Returns : a list of Bio::DB::GFF::Feature objects
 Args    : a feature method (optional)

This method returns a list of any subfeatures that belong to the main
feature. For those features that contain heterogeneous subfeatures,
you can retrieve a subset of the subfeatures by providing a method
name to filter on.

For AcePerl compatibility, this method may also be called as
segments().

=cut

my $subfeat = $self->{subfeatures} or return;
$self->sort_features;

my $features = $subfeat->{lc $type} or return;

return map {@{$_}} values %{$subfeat};

=head2 add_subfeature

 Title   : add_subfeature
 Usage   : $feature->add_subfeature($feature)
 Function: add a subfeature to the feature
 Args    : a Bio::DB::GFF::Feature object

This method adds a new subfeature to the object. It is used
internally by aggregators, but is available for public use as well.

=cut

my $type = $feature->method;
my $subfeat = $self->{subfeatures}{lc $type} ||= [];
push @{$subfeat},$feature;

=head2 merged_segments

 Title   : merged_segments
 Usage   : @segs = $feature->merged_segments([$method])
 Function: get merged subfeatures
 Returns : a list of Bio::DB::GFF::Feature objects
 Args    : a feature method (optional)

This method acts like sub_SeqFeature, except that it merges
overlapping segments of the same type into contiguous features. For
those features that contain heterogeneous subfeatures, you can
retrieve a subset of the subfeatures by providing a method name to
filter on.

A side-effect of this method is that the features are returned in
sorted order by their start position.

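The merging idea can be sketched on plain [start, stop] pairs. The function merge_intervals is a hypothetical name used only for this example. It follows the same "previous stop + 1 >= next start" test that appears in the code below, but keeps the larger of the two stop values when extending, which is a deliberate difference from the module code.

```perl
use strict;
use warnings;

# Merge [start, stop] pairs that overlap or abut; input need not be sorted.
sub merge_intervals {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $seg (@sorted) {
        my $previous = @merged ? $merged[-1] : undef;
        if ($previous && $previous->[1] + 1 >= $seg->[0]) {
            # extend the previous interval, never shrinking it
            $previous->[1] = $seg->[1] if $seg->[1] > $previous->[1];
        }
        else {
            push @merged, [@$seg];    # copy so the caller's data is untouched
        }
    }
    return @merged;
}

my @merged = merge_intervals([1, 10], [30, 40], [8, 20], [21, 25]);
print join(' ', map { "$_->[0]..$_->[1]" } @merged), "\n";
```

As with merged_segments(), the result comes back sorted by start position as a side effect of the merge.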
=cut

sub merged_segments {

$type ||= ''; # prevent uninitialized variable warnings

my $truename = overload::StrVal($self);

return @{$self->{merged_segs}{$type}} if exists $self->{merged_segs}{$type};

$a->start <=> $b->start

$a->type cmp $b->type
} $self->sub_SeqFeature($type);

# attempt to merge overlapping segments

# avoid "my ... if", whose behavior is undefined when the condition is false
my $previous = @merged ? $merged[-1] : undef;
if (defined($previous) && $previous->stop+1 >= $s->start){
$previous->{stop} = $s->{stop};
# fix up the target too
my $g = $previous->{group};
if ( ref($g) && $g->isa('Bio::DB::GFF::Homol')) {
my $cg = $s->{group};
$g->{stop} = $cg->{stop};

} elsif (defined($previous) && $previous->start == $s->start && $previous->stop == $s->stop) {

my $copy = $s->clone;

$self->{merged_segs}{$type} = \@merged;

=head2 sub_types

 Usage   : @methods = $feature->sub_types
 Function: get methods of all sub-seqfeatures
 Returns : a list of method names

For those features that contain subfeatures, this method will return a
unique list of method names of those subfeatures, suitable for use
with sub_SeqFeature().

=cut

my $subfeat = $self->{subfeatures} or return;
return keys %$subfeat;

=head2 attributes

 Usage   : @attributes = $feature->attributes($name)
 Function: get the "attributes" on a particular feature
 Returns : an array of strings

Some GFF version 2 files use the groups column to store a series of
attribute/value pairs. In this interpretation of GFF, the first such
pair is treated as the primary group for the feature; subsequent pairs
are treated as attributes. Two attributes have special meaning:
"Note" is for backward compatibility and is used for unstructured text
remarks. "Alias" is considered as a synonym for the feature name.

  @gene_names = $feature->attributes('Gene');
  @aliases    = $feature->attributes('Alias');

If no name is provided, then attributes() returns a flattened hash, of
attribute=E<gt>value pairs. This lets you do:

  %attributes = $feature->attributes;

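One caveat when assigning a flattened pair list to a hash: a Perl hash keeps only one value per key, so a feature with several Alias entries would keep just the last of them. A minimal sketch of collecting the pairs into lists instead is shown here. The @flattened data is made up for the example and merely stands in for whatever a no-argument attributes() call returns.

```perl
use strict;
use warnings;

# Made-up stand-in for a flattened attribute => value list.
my @flattened = (
    Gene  => 'abc-1',
    Alias => 'ABC1',
    Alias => 'YFG1',
    Note  => 'hypothetical protein',
);

# Collect every value for each attribute name into an array reference.
my %by_name;
while (my ($name, $value) = splice(@flattened, 0, 2)) {
    push @{ $by_name{$name} }, $value;
}

for my $name (sort keys %by_name) {
    print "$name: ", join(', ', @{ $by_name{$name} }), "\n";
}
```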
=cut

my $factory = $self->factory;
defined(my $id = $self->id) or return;
$factory->attributes($id,@_)

=head2 notes

 Usage   : @notes = $feature->notes
 Function: get the "notes" on a particular feature
 Returns : an array of strings

Some GFF version 2 files use the groups column to store various notes
and remarks. Adaptors can elect to store the notes in the database,
or just ignore them. For those adaptors that store the notes, the
notes() method will return them as a list.

=cut

$self->attributes('Note');

=head2 aliases

 Usage   : @aliases = $feature->aliases
 Function: get the "aliases" on a particular feature
 Returns : an array of strings

This method will return a list of attributes of type 'Alias'.

=cut

$self->attributes('Alias');

=head2 Autogenerated Methods

 Usage   : @subfeat = $feature->Method
 Function: Return subfeatures using autogenerated methods
 Returns : a list of Bio::DB::GFF::Feature objects

Any method that begins with an initial capital letter will be passed
to AUTOLOAD and treated as a call to sub_SeqFeature with the method
name used as the method argument. For instance, this call:

  @exons = $feature->Exon;

is equivalent to this call:

  @exons = $feature->sub_SeqFeature('exon');

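The dispatch can be sketched with a toy class. Sketch::Feature is a hypothetical stand-in written for this example: it stores subfeatures as plain strings keyed by lower-cased method name, and it dies where the real module would call throw().

```perl
use strict;
use warnings;

package Sketch::Feature;    # toy class, not Bio::DB::GFF::Feature
our $AUTOLOAD;

sub new {
    my ($class, %subfeatures) = @_;
    return bless { subfeatures => {%subfeatures} }, $class;
}

sub sub_SeqFeature {
    my ($self, $type) = @_;
    my $list = $self->{subfeatures}{lc $type} or return;
    return @$list;
}

sub AUTOLOAD {
    my $self = shift;
    my ($pack, $func_name) = $AUTOLOAD =~ /(.+)::([^:]+)$/;
    return if $func_name eq 'DESTROY';    # ignore DESTROY calls
    # an initial capital means "fetch subfeatures of this type"
    return $self->sub_SeqFeature($func_name) if $func_name =~ /^[A-Z]/;
    die qq(Can't locate object method "$func_name" via package "$pack"\n);
}

package main;

my $feature = Sketch::Feature->new(
    exon => ['exon1', 'exon2'],
    cds  => ['cds1'],
);
my @exons = $feature->Exon;    # same as $feature->sub_SeqFeature('exon')
print "@exons\n";
```

Lower-case method names that are not defined still fail with the usual "Can't locate object method" message, so ordinary typos are not silently swallowed.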
=head2 SeqFeatureI methods

The following Bio::SeqFeatureI methods are implemented:

primary_tag(), source_tag(), all_tags(), has_tag(), each_tag_value().

=cut

*primary_tag = \&method;
*source_tag = \&source;

my @tags = keys %CONSTANT_TAGS;
# autogenerated methods
if (my $subfeat = $self->{subfeatures}) {
push @tags,keys %$subfeat;

my %tags = map {$_=>1} $self->all_tags;

return $self->$tag() if $CONSTANT_TAGS{$tag};

return $self->$tag(); # try autogenerated tag

my($pack,$func_name) = $AUTOLOAD=~/(.+)::([^:]+)$/;

# ignore DESTROY calls
return if $func_name eq 'DESTROY';

# fetch subfeatures if func_name has an initial cap
# return sort {$a->start <=> $b->start} $self->sub_SeqFeature($func_name) if $func_name =~ /^[A-Z]/;
return $self->sub_SeqFeature($func_name) if $func_name =~ /^[A-Z]/;

# error message of last resort
$self->throw(qq(Can't locate object method "$func_name" via package "$pack"));

=head2 adjust_bounds

 Title   : adjust_bounds
 Usage   : $feature->adjust_bounds
 Function: adjust the bounds of a feature
 Returns : ($start,$stop,$strand)

This method adjusts the boundaries of the feature to enclose all its
subfeatures. It returns the new start, stop and strand of the
feature.

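The recursive widening of a parent's bounds can be sketched on nested hashes. The function enclose and the start/stop/children layout are assumptions made for this example. The sketch covers only the forward-strand case (start <= stop) and ignores strand and homology targets, both of which the module code also handles.

```perl
use strict;
use warnings;

# Each node is a hash with optional start/stop and a list of children.
# Widens the node to enclose all descendants and returns (start, stop).
sub enclose {
    my $node = shift;
    for my $child (@{ $node->{children} || [] }) {
        my ($start, $stop) = enclose($child);
        next unless defined($start) && defined($stop);
        $node->{start} = $start
            if !defined($node->{start}) || $start < $node->{start};
        $node->{stop} = $stop
            if !defined($node->{stop}) || $stop > $node->{stop};
    }
    return ($node->{start}, $node->{stop});
}

# A parent with no coordinates of its own and two levels of subfeatures.
my $transcript = {
    children => [
        { start => 500, stop => 600 },
        { start => 100, stop => 250, children => [ { start => 90, stop => 120 } ] },
    ],
};

my ($start, $stop) = enclose($transcript);
print "transcript spans $start..$stop\n";
```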
=cut

# adjust a feature so that its boundaries are synched with its subparts' boundaries.
# this works recursively, so subfeatures can contain other features

my $g = $self->{group};

if (my $subfeat = $self->{subfeatures}) {
for my $list (values %$subfeat) {
for my $feat (@$list) {

# fix up our bounds to hold largest subfeature
my($start,$stop,$strand) = $feat->adjust_bounds;
$self->{fstrand} = $strand unless defined $self->{fstrand};
if ($start <= $stop) {
$self->{start} = $start if !defined($self->{start}) || $start < $self->{start};
$self->{stop} = $stop if !defined($self->{stop}) || $stop > $self->{stop};

$self->{start} = $start if !defined($self->{start}) || $start > $self->{start};
$self->{stop} = $stop if !defined($self->{stop}) || $stop < $self->{stop};

# fix up endpoints of targets too (for homologies only)
my $h = $feat->group;
next unless $h && $h->isa('Bio::DB::GFF::Homol');
next unless $g && $g->isa('Bio::DB::GFF::Homol');
($start,$stop) = ($h->{start},$h->{stop});
if ($h->strand >= 0) {
$g->{start} = $start if !defined($g->{start}) || $start < $g->{start};
$g->{stop} = $stop if !defined($g->{stop}) || $stop > $g->{stop};

$g->{start} = $start if !defined($g->{start}) || $start > $g->{start};
$g->{stop} = $stop if !defined($g->{stop}) || $stop < $g->{stop};

($self->{start},$self->{stop},$self->strand);

my $d = $self->{aggregated};
$self->{aggregated} = shift if @_;

=head2 sort_features

 Title   : sort_features
 Usage   : $feature->sort_features
 Function: sort features

This method sorts subfeatures in ascending order by their start
position. For reverse strand features, it sorts subfeatures in
descending order. After this is called sub_SeqFeature will return the
features in order.

This method is called internally by merged_segments().

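The strand-dependent ordering can be sketched on plain [start, stop] pairs. The function sort_by_strand is a hypothetical name for this example. Like the code below, it leaves the order alone when the strand is zero or unknown.

```perl
use strict;
use warnings;

# Order [start, stop] pairs by start: ascending for strand > 0,
# descending for strand < 0, untouched when the strand is 0 or undefined.
sub sort_by_strand {
    my ($strand, @segments) = @_;
    return @segments unless $strand;
    if ($strand > 0) {
        return sort { $a->[0] <=> $b->[0] } @segments;
    }
    return sort { $b->[0] <=> $a->[0] } @segments;
}

my @exons   = ([300, 400], [100, 200], [500, 600]);
my @forward = sort_by_strand(+1, @exons);
my @reverse = sort_by_strand(-1, @exons);

print 'forward: ', join(' ', map { $_->[0] } @forward), "\n";
print 'reverse: ', join(' ', map { $_->[0] } @reverse), "\n";
```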
=cut

return if $self->{sorted}++;
my $strand = $self->strand or return;
my $subfeat = $self->{subfeatures} or return;
for my $type (keys %$subfeat) {
$subfeat->{$type} = [sort {$a->start<=>$b->start} @{$subfeat->{$type}}] if $strand > 0;
$subfeat->{$type} = [sort {$b->start<=>$a->start} @{$subfeat->{$type}}] if $strand < 0;

=head2 asString

 Usage   : $string = $feature->asString
 Function: return human-readable representation of feature

This method returns a human-readable representation of the feature and
is called by the overloaded "" operator.

=cut

my $type = $self->type;
my $name = $self->group;
return "$type($name)" if $name;

# my $type = $self->method;
# my $id = $self->group || 'unidentified';
# return join '/',$id,$type,$self->SUPER::asString;

return $self->group || $self->SUPER::name;

my ($start,$stop) = ($self->start,$self->stop);

# the defined() tests prevent uninitialized variable warnings, when dealing with clone objects
# whose endpoints may be undefined
($start,$stop) = ($stop,$start) if defined($start) && defined($stop) && $start > $stop;

my ($class,$name) = ('','');

if (my $t = $self->target) {
my $class = $t->class;

my $start = $t->start;

push @group,qq(Target "$class:$name" $start $stop);

elsif (my $g = $self->group) {
$class = $g->class || '';
$name = $g->name || '';
push @group,"$class $name";

push @group,map {qq(Note "$_")} $self->notes;

my $group_field = join ' ; ',@group;
my $strand = ('-','.','+')[$self->strand+1];
my $ref = $self->ref;
my $n = ref($ref) ? $ref->name : $ref;
my $phase = $self->phase;
$phase = '.' unless defined $phase;
return join("\t",$n,$self->source,$self->method,$start||'.',$stop||'.',$self->score||'.',$strand||'.',$phase,$group_field);

=head1 A Note About Similarities

The current default aggregator for GFF "similarity" features creates a
composite Bio::DB::GFF::Feature object of type "gapped_alignment".
The target() method for the feature as a whole will return a
RelSegment object that is as long as the extremes of the similarity
hit target, but will not necessarily be the same length as the query
sequence. The length of each "similarity" subfeature will be exactly
the same length as its target(). These subfeatures are essentially
the HSPs of the match.

The following illustrates this:

  @similarities = $segment->features('similarity:BLASTN');
  $sim          = $similarities[0];

  print $sim->type;   # yields "gapped_similarity:BLASTN"

  $query_length  = $sim->length;
  $target_length = $sim->target->length;  # $query_length != $target_length

  @matches        = $sim->Similarity;   # use autogenerated method
  $query1_length  = $matches[0]->length;
  $target1_length = $matches[0]->target->length; # $query1_length == $target1_length

If you merge segments by calling merged_segments(), then the length of
the query sequence segments will no longer necessarily equal the
length of the targets, because the alignment information will have
been lost. Nevertheless, the targets are adjusted so that the first
and last base pairs of the query match the first and last base pairs
of the target.

=head1 BUGS

This module is still under development.

=head1 SEE ALSO

L<bioperl>, L<Bio::DB::GFF>, L<Bio::DB::GFF::RelSegment>

=head1 AUTHOR

Lincoln Stein E<lt>lstein@cshl.orgE<gt>.

Copyright (c) 2001 Cold Spring Harbor Laboratory.

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.