1
# $Id: MapTiling.pm 16123 2009-09-17 12:57:27Z cjfields $
3
# BioPerl module for Bio::Search::Tiling::MapTiling
5
# Please direct questions and support issues to <bioperl-l@bioperl.org>
7
# Cared for by Mark A. Jensen <maj@fortinbras.us>
9
# Copyright Mark A. Jensen
11
# You may distribute this module under the same terms as perl itself
13
# POD documentation - main docs before the code
17
Bio::Search::Tiling::MapTiling - An implementation of an HSP tiling
18
algorithm, with methods to obtain frequently-requested statistics
22
# get a BLAST $hit from somewhere, then
23
$tiling = Bio::Search::Tiling::MapTiling->new($hit);
26
$numID = $tiling->identities();
27
$numCons = $tiling->conserved();
28
$query_length = $tiling->length('query');
29
$subject_length = $tiling->length('subject'); # or...
30
$subject_length = $tiling->length('hit');
32
# get a visual on the coverage map
33
print $tiling->coverage_map_as_text('query',$context,'LEGEND');
36
$context = $tiling->_context( -type => 'subject', -strand=> 1, -frame=>1);
37
@covering_hsps_for_subject = $tiling->next_tiling('subject',$context);
38
$context = $tiling->_context( -type => 'query', -strand=> -1, -frame=>0);
39
@covering_hsps_for_query = $tiling->next_tiling('query', $context);
43
Frequently, users want to use a set of high-scoring pairs (HSPs)
44
obtained from a BLAST or other search to assess the overall level of
45
identity, conservation, or coverage represented by matches between a
46
subject and a query sequence. Because a set of HSPs frequently
47
describes multiple overlapping sequence fragments, a simple summation of
48
statistics over the HSPs will generally overestimate those
49
statistics. To obtain an accurate estimate of global hit statistics, a
50
'tiling' of HSPs onto either the subject or the query sequence must be
51
performed, in order to properly correct for this.
53
This module will execute a tiling algorithm on a given hit based on an
54
interval decomposition I'm calling the "coverage map". Internal object
55
methods compute the various statistics, which are then stored in
56
appropriately-named public object attributes. See
57
L<Bio::Search::Tiling::MapTileUtils> for more info on the algorithm.
59
=head2 STRAND/FRAME CONTEXTS
61
In BLASTX, TBLASTN, and TBLASTX reports, strand and frame information
62
are reported for the query, subject, or query and subject,
63
respectively, for each HSP. Tilings for these sequence types are only
64
meaningful when they include HSPs in the same strand and frame, or
65
"context". So, in these situations, the context must be specified
66
in the method calls or the methods will throw.
68
Contexts are specified as strings: C<[ 'all' | [m|p][_|0|1|2] ]>, where
69
C<all> = all HSPs (will throw if context must be specified), C<m> = minus
70
strand, C<p> = plus strand, and C<_> = no frame info, C<0,1,2> = respective
71
(absolute) frame. The L</_make_context_key> method will convert a (strand,
72
frame) specification to a context string, e.g.:
74
$context = $self->_context(-type=>'query', -strand=>-1, -frame=>-2);
78
The contexts present among the HSPs in a hit are identified and stored
79
for convenience upon object construction. These are accessed off the
80
object with the L</contexts> method. If contexts don't apply for the
81
given report, this returns C<('all')>.
85
The major calculations are made just-in-time, and then memoized. So,
86
for example, for a given MapTiling object, a coverage map would
87
usually be calculated only once (for the query), and at most twice (if
88
the subject perspective is also desired), and then only when a
89
statistic is first accessed. Afterward, the map and/or any statistic
90
is read from storage. So feel free to call the statistic methods
91
frequently if it suits you.
97
User feedback is an integral part of the evolution of this and other
98
Bioperl modules. Send your comments and suggestions preferably to
99
the Bioperl mailing list. Your participation is much appreciated.
101
bioperl-l@bioperl.org - General discussion
102
http://bioperl.org/wiki/Mailing_lists - About the mailing lists
106
Please direct usage questions or support issues to the mailing list:
108
I<bioperl-l@bioperl.org>
110
rather than to the module maintainer directly. Many experienced and
111
reponsive experts will be able look at the problem and quickly
112
address it. Please include a thorough description of the problem
113
with code and data examples if at all possible.
115
=head2 Reporting Bugs
117
Report bugs to the Bioperl bug tracking system to help us keep track
118
of the bugs and their resolution. Bug reports can be submitted via
121
http://bugzilla.open-bio.org/
123
=head1 AUTHOR - Mark A. Jensen
125
Email maj -at- fortinbras -dot- us
129
The rest of the documentation details each of the object methods.
130
Internal methods are usually preceded with a _
134
# Let the code begin...
136
package Bio::Search::Tiling::MapTiling;
140
# Object preamble - inherits from Bio::Root::Root
144
use Bio::Search::Tiling::TilingI;
145
use Bio::Search::Tiling::MapTileUtils;
147
# use base qw(Bio::Root::Root Bio::Search::Tiling::TilingI);
148
use base qw(Bio::Root::Root Bio::Search::Tiling::TilingI);
155
Usage : my $obj = new Bio::Search::Tiling::GenericTiling();
156
Function: Builds a new Bio::Search::Tiling::GenericTiling object
157
Returns : an instance of Bio::Search::Tiling::GenericTiling
158
Args : -hit => $a_Bio_Search_Hit_HitI_object
159
general filter function:
160
-hsp_filter => sub { my $this_hsp = shift;
170
my $self = $class->SUPER::new(@args);
171
my($hit, $filter) = $self->_rearrange( [qw( HIT HSP_FILTER)],@args );
173
$self->throw("HitI object required") unless $hit;
174
$self->throw("Argument must be HitI object") unless ( ref $hit && $hit->isa('Bio::Search::Hit::HitI') );
176
$self->_set_attributes();
177
$self->{"_algorithm"} = $hit->algorithm;
179
my @hsps = $hit->hsps;
180
# apply filter function if requested
181
if ( defined $filter ) {
182
if ( ref($filter) eq 'CODE' ) {
183
@hsps = map { $filter->($_) ? $_ : () } @hsps;
186
$self->warn("-filter is not a coderef; ignoring");
190
# identify available contexts
191
for my $t qw( query hit ) {
193
for my $i (0..$#hsps) {
194
my $ctxt = $self->_context(
196
-strand => $hsps[$i]->strand($t),
197
-frame => $hsps[$i]->frame($t));
198
$contexts{$ctxt} ||= [];
199
push @{$contexts{$ctxt}}, $i;
201
$self->{"_contexts_${t}"} = \%contexts;
204
$self->warn("No HSPs present in hit after filtering") unless (@hsps);
209
# a tiling is based on the set of hsps contained in a single hit.
210
# check all the boundaries - zero hsps, one hsp, all disjoint hsps
212
=head1 TILING ITERATORS
217
Usage : @hsps = $self->next_tiling($type);
218
Function: Obtain a tiling: a minimal set of HSPs covering the $type
219
('hit', 'subject', 'query') sequence
221
Returns : an array of HSPI objects
222
Args : scalar $type: one of 'hit', 'subject', 'query', with
223
'subject' an alias for 'hit'
229
my ($type, $context) = @_;
230
$self->_check_type_arg(\$type);
231
$self->_check_context_arg($type, \$context);
232
return $self->_tiling_iterator($type, $context)->();
235
=head2 rewind_tilings
237
Title : rewind_tilings
238
Usage : $self->rewind_tilings($type)
239
Function: Reset the next_tilings($type) iterator
241
Returns : True on success
242
Args : scalar $type: one of 'hit', 'subject', 'query';
249
my ($type,$context) = @_;
250
$self->_check_type_arg(\$type);
251
$self->_check_context_arg($type, \$context);
252
return $self->_tiling_iterator($type, $context)->('REWIND');
260
Usage : $tiling->identities($type, $action, $context)
261
Function: Retrieve the calculated number of identities for the invocant
263
Returns : value of identities (a scalar)
264
Args : scalar $type: one of 'hit', 'subject', 'query'
266
option scalar $action: one of 'exact', 'est', 'fast', 'max'
268
option scalar $context: strand/frame context string
275
my ($type, $action, $context) = @_;
276
$self->_check_type_arg(\$type);
277
$self->_check_action_arg(\$action);
278
$self->_check_context_arg($type, \$context);
279
if (!defined $self->{"identities_${type}_${action}_${context}"}) {
280
$self->_calc_stats($type, $action, $context);
282
return $self->{"identities_${type}_${action}_${context}"};
288
Usage : $tiling->conserved($type, $action)
289
Function: Retrieve the calculated number of conserved sites for the invocant
291
Returns : value of conserved (a scalar)
292
Args : scalar $type: one of 'hit', 'subject', 'query'
294
option scalar $action: one of 'exact', 'est', 'fast', 'max'
296
option scalar $context: strand/frame context string
303
my ($type, $action, $context) = @_;
304
$self->_check_type_arg(\$type);
305
$self->_check_action_arg(\$action);
306
$self->_check_context_arg($type, \$context);
307
if (!defined $self->{"conserved_${type}_${action}_${context}"}) {
308
$self->_calc_stats($type, $action, $context);
310
return $self->{"conserved_${type}_${action}_${context}"};
316
Usage : $tiling->length($type, $action, $context)
317
Function: Retrieve the total length of aligned residues for
320
Returns : value of length (a scalar)
321
Args : scalar $type: one of 'hit', 'subject', 'query'
323
option scalar $action: one of 'exact', 'est', 'fast', 'max'
325
option scalar $context: strand/frame context string
332
my ($type,$action,$context) = @_;
333
$self->_check_type_arg(\$type);
334
$self->_check_action_arg(\$action);
335
$self->_check_context_arg($type, \$context);
336
if (!defined $self->{"length_${type}_${action}_${context}"}) {
337
$self->_calc_stats($type, $action, $context);
339
return $self->{"length_${type}_${action}_${context}"};
345
Usage : $tiling->frac($type, $denom, $action, $context, $method)
346
Function: Return the fraction of sequence length consisting
347
of desired kinds of pairs (given by $method),
348
with respect to $denom
349
Returns : scalar float
350
Args : -type => one of 'hit', 'subject', 'query'
351
-denom => one of 'total', 'aligned'
352
-action => one of 'exact', 'est', 'fast', 'max'
353
-context => strand/frame context string
354
-method => one of 'identical', 'conserved'
355
Note : $denom == 'aligned', return desired_stat/num_aligned
356
$denom == 'total', return desired_stat/_reported_length
357
(i.e., length of the original input sequences)
358
Note : In keeping with the spirit of Bio::Search::HSP::HSPI,
359
reported lengths of translated dna are reduced by
360
a factor of 3, to provide fractions relative to
361
amino acid coordinates.
368
my ($type, $denom, $action, $context, $method) = $self->_rearrange([qw(TYPE DENOM ACTION CONTEXT METHOD)],@args);
369
$self->_check_type_arg(\$type);
370
$self->_check_action_arg(\$action);
371
$self->_check_context_arg($type, \$context);
372
unless ($method and grep(/^$method$/, qw( identical conserved ))) {
373
$self->throw("-method must specified; one of ('identical', 'conserved')");
376
unless (grep /^$denom/, qw( total aligned )) {
377
$self->throw("Denominator selection must be one of ('total', 'aligned'), not '$denom'");
379
my $key = "frac_${method}_${type}_${denom}_${action}_${context}";
382
$_ eq 'identical' && do {
383
$stat = $self->identities($type, $action, $context);
386
$_ eq 'conserved' && do {
387
$stat = $self->conserved($type, $action, $context);
391
$self->throw("What are YOU doing here?");
394
if (!defined $self->{$key}) {
398
$stat/$self->_reported_length($type); # need fudge fac??
403
$stat/$self->length($type,$action,$context);
407
$self->throw("What are YOU doing here?");
411
return $self->{$key};
414
=head2 frac_identical
416
Title : frac_identical
417
Usage : $tiling->frac_identical($type, $denom, $action, $context)
418
Function: Return the fraction of sequence length consisting
419
of identical pairs, with respect to $denom
420
Returns : scalar float
421
Args : -type => one of 'hit', 'subject', 'query'
422
-denom => one of 'total', 'aligned'
423
-action => one of 'exact', 'est', 'fast', 'max'
424
-context => strand/frame context string
425
Note : $denom == 'aligned', return conserved/num_aligned
426
$denom == 'total', return conserved/_reported_length
427
(i.e., length of the original input sequences)
428
Note : In keeping with the spirit of Bio::Search::HSP::HSPI,
429
reported lengths of translated dna are reduced by
430
a factor of 3, to provide fractions relative to
431
amino acid coordinates.
432
Note : This an alias that calls frac()
439
my ($type, $denom, $action,$context) = $self->_rearrange( [qw[ TYPE DENOM ACTION CONTEXT]],@args );
440
$self->frac( -type=>$type, -denom=>$denom, -action=>$action, -method=>'identical', -context=>$context);
443
=head2 frac_conserved
445
Title : frac_conserved
446
Usage : $tiling->frac_conserved($type, $denom, $action, $context)
447
Function: Return the fraction of sequence length consisting
448
of conserved pairs, with respect to $denom
449
Returns : scalar float
450
Args : -type => one of 'hit', 'subject', 'query'
451
-denom => one of 'total', 'aligned'
452
-action => one of 'exact', 'est', 'fast', 'max'
453
-context => strand/frame context string
454
Note : $denom == 'aligned', return conserved/num_aligned
455
$denom == 'total', return conserved/_reported_length
456
(i.e., length of the original input sequences)
457
Note : In keeping with the spirit of Bio::Search::HSP::HSPI,
458
reported lengths of translated dna are reduced by
459
a factor of 3, to provide fractions relative to
460
amino acid coordinates.
461
Note : This an alias that calls frac()
468
my ($type, $denom, $action, $context) = $self->_rearrange( [qw[ TYPE DENOM ACTION CONTEXT]],@args );
469
$self->frac( -type=>$type, -denom=>$denom, -action=>$action, -context=>$context, -method=>'conserved');
475
Aliases : frac_aligned_query - frac_aligned(-type=>'query',...)
476
frac_aligned_hit - frac_aligned(-type=>'hit',...)
477
Usage : $tiling->frac_aligned(-type=>$type,
480
Function: Return the fraction of input sequence length
481
that was aligned by the algorithm
482
Returns : scalar float
483
Args : -type => one of 'hit', 'subject', 'query'
484
-action => one of 'exact', 'est', 'fast', 'max'
485
-context => strand/frame context string
490
my ($self, @args) = @_;
491
my ($type, $action, $context) = $self->_rearrange([qw(TYPE ACTION CONTEXT)],@args);
492
$self->_check_type_arg(\$type);
493
$self->_check_action_arg(\$action);
494
$self->_check_context_arg($type, \$context);
495
if (!$self->{"frac_aligned_${type}_${action}_${context}"}) {
496
$self->{"frac_aligned_${type}_${action}_${context}"} = $self->num_aligned($type,$action,$context)/$self->_reported_length($type);
498
return $self->{"frac_aligned_${type}_${action}_${context}"};
501
sub frac_aligned_query { shift->frac_aligned(-type=>'query', @_) }
502
sub frac_aligned_hit { shift->frac_aligned(-type=>'hit', @_) }
508
Usage : $tiling->num_aligned(-type=>$type)
509
Function: Return the number of residues of sequence $type
510
that were aligned by the algorithm
512
Args : -type => one of 'hit', 'subject', 'query'
513
-action => one of 'exact', 'est', 'fast', 'max'
514
-context => strand/frame context string
515
Note : Since this is calculated from reported coordinates,
516
not symbol string counts, it is already in terms of
518
Note : Aliases length()
522
sub num_aligned { shift->length( @_ ) };
526
Title : num_unaligned
527
Usage : $tiling->num_unaligned(-type=>$type)
528
Function: Return the number of residues of sequence $type
529
that were left unaligned by the algorithm
531
Args : -type => one of 'hit', 'subject', 'query'
532
-action => one of 'exact', 'est', 'fast', 'max'
533
-context => strand/frame context string
534
Note : Since this is calculated from reported coordinates,
535
not symbol string counts, it is already in terms of
542
my ($type,$action,$context) = @_;
544
$self->_check_type_arg(\$type);
545
$self->_check_action_arg(\$action);
546
$self->_check_context_arg($type, \$context);
547
if (!defined $self->{"num_unaligned_${type}_${action}_${context}"}) {
548
$self->{"num_unaligned_${type}_${action}_${context}"} = $self->_reported_length($type)-$self->num_aligned($type,$action,$context);
550
return $self->{"num_unaligned_${type}_${action}_${context}"};
557
Usage : $tiling->range(-type=>$type)
558
Function: Returns the extent of the longest tiling
559
as ($min_coord, $max_coord)
560
Returns : array of two scalar integers
561
Args : -type => one of 'hit', 'subject', 'query'
562
-context => strand/frame context string
567
my ($self, $type, $context) = @_;
568
$self->_check_type_arg(\$type);
569
$self->_check_context_arg($type, \$context);
570
my @a = $self->_contig_intersection($type,$context);
571
return ($a[0][0], $a[-1][1]);
581
Usage : $map = $tiling->coverage_map($type)
582
Function: Property to contain the coverage map calculated
583
by _calc_coverage_map() - see that for
586
Returns : value of coverage_map_$type as an array
587
Args : scalar $type: one of 'hit', 'subject', 'query'
595
my ($type, $context) = @_;
596
$self->_check_type_arg(\$type);
597
$self->_check_context_arg($type, \$context);
599
if (!defined $self->{"coverage_map_${type}_${context}"}) {
600
# following calculates coverage maps in all strands/frames
602
$self->_calc_coverage_map($type, $context);
604
# if undef is returned, then there were no hsps for given strand/frame
605
if (!defined $self->{"coverage_map_${type}_${context}"}) {
606
$self->warn("No HSPS present for type '$type' in context '$context' for this hit");
609
return @{$self->{"coverage_map_${type}_${context}"}};
612
=head2 coverage_map_as_text
614
Title : coverage_map_as_text
615
Usage : $tiling->coverage_map_as_text($type, $legend_flag)
616
Function: Format a text-graphic representation of the
618
Returns : an array of scalar strings, suitable for printing
619
Args : $type: one of 'query', 'hit', 'subject'
620
$context: strand/frame context string
621
$legend_flag: boolean; add a legend indicating
622
the actual interval coordinates for each component
623
interval and hsp (in the $type sequence context)
624
Example : print $tiling->coverage_map_as_text('query',1);
628
sub coverage_map_as_text{
630
my ($type, $context, $legend_q) = @_;
631
$self->_check_type_arg(\$type);
632
$self->_check_context_arg($type, \$context);
634
my @map = $self->coverage_map($type, $context);
636
my @hsps = $self->hit->hsps;
638
require Tie::RefHash;
639
tie %hsps_i, 'Tie::RefHash';
640
@hsps_i{@hsps} = (0..$#hsps);
643
my @hspx = ('') x @hsps;
644
my @these_hsps = @{$map[$_]->[1]};
645
@hspx[@hsps_i{@these_hsps}] = ('*') x @these_hsps;
650
push @ret, "\tIntvl\n";
651
push @ret, "HSPS\t", join ("\t", (0..$#map)), "\n";
652
foreach my $h (0..$#hsps) {
653
push @ret, join("\t", $h, map { $mx[$_][$h] } (0..$#map) ),"\n";
656
push @ret, "Interval legend\n";
658
push @ret, sprintf("%d\t[%d, %d]\n", $_, @{$map[$_][0]});
660
push @ret, "HSP legend\n";
661
my @ints = get_intervals_from_hsps($type,@hsps);
662
foreach (0..$#hsps) {
663
push @ret, sprintf("%d\t[%d, %d]\n", $_, @{$ints[$_]});
675
Returns : The HitI object associated with the invocant
683
$self->warn("Getter only") if @_;
684
return $self->{'hit'};
690
Usage : $tiling->hsps()
691
Function: Container for the HSP objects associated with invocant
693
Returns : an array of hsps associated with the hit
694
Args : on set, new value (an arrayref or undef, optional)
700
return $self->{'hsps'} = shift if @_;
701
return @{$self->{'hsps'}};
707
Usage : @contexts = $tiling->context($type) or
708
@indices = $tiling->context($type, $context)
709
Function: Retrieve the set of available contexts in the hit,
710
or the indices of hsps having the given context
711
(integer indices for the array returned by $self->hsps)
712
Returns : array of scalar context strings or
713
array of scalar positive integers
714
undef if no hsps in given context
715
Args : $type: one of 'query', 'hit', 'subject'
716
optional $context: context string
722
my ($type, $context) = @_;
723
$self->_check_type_arg(\$type);
724
return keys %{$self->{"_contexts_$type"}} unless defined $context;
725
return undef unless $self->{"_contexts_$type"}{$context};
726
return @{$self->{"_contexts_$type"}{$context}};
732
Usage : $tiling->mapping($type)
733
Function: Retrieve the mapping coefficient for the sequence type
734
based on the underlying algorithm
735
Returns : scalar integer (mapping coefficient)
736
Args : $type: one of 'query', 'hit', 'subject'
737
Note : getter only (set in constructor)
744
$self->_check_type_arg(\$type);
745
return $self->{"_mapping_${type}"};
748
=head2 default_context
750
Title : default_context
751
Usage : $tiling->default_context($type)
752
Function: Retrieve the default strand/frame context string
753
for the sequence type based on the underlying algorithm
754
Returns : scalar string (context string)
755
Args : $type: one of 'query', 'hit', 'subject'
756
Note : getter only (set in constructor)
763
$self->_check_type_arg(\$type);
764
return $self->{"_def_context_${type}"};
770
Usage : $tiling->algorithm
771
Function: Retrieve the algorithm name associated with the
772
invocant's hit object
773
Returns : scalar string
775
Note : getter only (set in constructor)
781
$self->warn("Getter only") if @_;
782
return $self->{"_algorithm"};
785
=head1 "PRIVATE" METHODS
789
See L<Bio::Search::Tiling::MapTileUtils> for lower level
792
=head2 _calc_coverage_map
794
Title : _calc_coverage_map
795
Usage : $tiling->_calc_coverage_map($type)
796
Function: Calculates the coverage map for the object's associated
797
hit from the perspective of the desired $type (see Args:)
798
and sets the coverage_map() property
799
Returns : True on success
800
Args : optional scalar $type: one of 'hit'|'subject'|'query'
802
Note : The "coverage map" is an array with the following format:
803
( [ $component_interval => [ @containing_hsps ] ], ... ),
804
where $component_interval is a closed interval (see
805
DESCRIPTION) of the form [$a0, $a1] with $a0 <= $a1, and
806
@containing_hsps is an array of all HspI objects in the hit
807
which completely contain the $component_interval.
808
The set of $component_interval's is a disjoint decomposition
809
of the minimum set of minimal intervals that completely
810
cover the hit's HSPs (from the perspective of the $type)
811
Note : This calculates the map for all strand/frame contexts available
816
sub _calc_coverage_map {
819
$self->_check_type_arg(\$type);
821
# obtain the [start, end] intervals for all hsps in the hit (relative
823
unless ($self->{'hsps'}) {
824
$self->warn("No HSPs for this hit");
828
my (@map, @hsps, %filters, @intervals);
832
my $c = $self->mapping($type);
834
# create the possible maps
835
for my $context ($self->contexts($type)) {
837
@hsps = ($self->hsps)[$self->contexts($type, $context)];
838
@intervals = get_intervals_from_hsps( $type, @hsps );
840
my $f = ($intervals[0]->[0] - 1) % $c;
842
# convert interval endpoints...
844
$$_[0] = ($$_[0] - $f + $c - 1)/$c;
845
$$_[1] = ($$_[1] - $f)/$c;
848
# determine the minimal set of disjoint intervals that cover the
849
# set of hsp intervals
850
my @dj_set = interval_tiling(\@intervals);
852
# decompose each disjoint interval into another set of disjoint
853
# intervals, each of which is completely contained within the
854
# original hsp intervals with which it overlaps
857
for my $dj_elt (@dj_set) {
858
my ($covering, $indices) = @$dj_elt;
859
my @covering_hsps = @hsps[@$indices];
860
my @coverers = @intervals[@$indices];
861
@decomp = decompose_interval( \@coverers );
863
my ($component, $container_indices) = @{$_};
864
push @map, [ $component,
865
[@covering_hsps[@$container_indices]] ];
870
# unconvert the components:
873
$$_[0][0] = $c*$$_[0][0] - $c + 1 + $f;
874
$$_[0][1] = $c*$$_[0][1] + $f;
877
$$_[0][0] = $c*$$_[0][0] - $c + 1 + $f;
878
$$_[0][1] = $c*$$_[0][1] + $f;
881
# sort the map on the interval left-ends
882
@map = sort { $a->[0][0]<=>$b->[0][0] } @map;
883
$self->{"coverage_map_${type}_${context}"} = [@map];
884
# set the _contig_intersection attribute here (side effect)
885
$self->{"_contig_intersection_${type}_${context}"} = [map { $$_[0] } @dj_set];
894
Usage : $tiling->_calc_stats($type, $action, $context)
895
Function: Calculates [estimated] tiling statistics (identities, conserved sites
896
length) and sets the public accessors
897
Returns : True on success
898
Args : scalar $type: one of 'hit', 'subject', 'query'
900
optional scalar $action: requests calculation method
901
currently one of 'exact', 'est', 'fast', 'max'
902
option scalar $context: strand/frame context string
903
Note : Action: The statistics are calculated by summing quantities
904
over the disjoint component intervals, taking into account
905
coverage of those intervals by multiple HSPs. The action
906
tells the algorithm how to obtain those quantities--
907
'exact' will use Bio::Search::HSP::HSPI::matches
908
to count the appropriate segment of the homology string;
909
'est' will estimate the statistics by multiplying the
910
fraction of the HSP overlapped by the component interval
911
(see MapTileUtils) by the BLAST-reported identities/postives
912
(this may be convenient for BLAST summary report formats)
913
* Both exact and est take the average over the number of HSPs
914
that overlap the component interval.
915
'max' uses the exact method to calculate the statistics,
916
and returns only the maximum identites/positives over
917
overlapping HSP for the component interval. No averaging
919
'fast' doesn't involve tiling at all (hence the name),
920
but it seems like a very good estimate, and uses only
921
reported values, and so does not require sequence data. It
922
calculates an average of reported identities, conserved
923
sites, and lengths, over unmodified hsps in the hit,
924
weighted by the length of the hsps.
930
my ($type, $action, $context) = @_;
931
# need to check args here, in case method is called internally.
932
$self->_check_type_arg(\$type);
933
$self->_check_action_arg(\$action);
934
$self->_check_context_arg($type, \$context);
935
my ($ident, $cons, $length) = (0,0,0);
937
# fast : avoid coverage map altogether, get a pretty damn
938
# good estimate with a weighted average of reported hsp
941
($action eq 'fast') && do {
942
my @hsps = $self->hit->hsps;
943
@hsps = @hsps[$self->contexts($type, $context)];
944
# weights for averages
945
my @wt = map {$_->length($type)} @hsps;
946
my $sum = eval( join('+',@wt) );
947
$_ /= $sum for (@wt);
950
$ident += $wt*$_->matches_MT($type,'identities');
951
$cons += $wt*$_->matches_MT($type,'conserved');
952
$length += $wt*$_->length($type);
958
# calculate identities/conserved sites in tiling
959
# estimate based on the fraction of the component interval covered
960
# and ident/cons reported by the HSPs
961
($action ne 'fast') && do {
962
foreach ($self->coverage_map($type, $context)) {
963
my ($intvl, $hsps) = @{$_};
964
my $len = ($$intvl[1]-$$intvl[0]+1);
965
my $ncover = ($action eq 'max') ? 1 : scalar @$hsps;
966
my ($acc_i, $acc_c) = (0,0);
967
foreach my $hsp (@$hsps) {
969
($_ eq 'est') && do {
970
my ($inc_i, $inc_c) = $hsp->matches_MT(
972
-action => 'searchutils',
974
my $frac = $len/$hsp->length($type);
975
$acc_i += $inc_i * $frac;
976
$acc_c += $inc_c * $frac;
979
($_ eq 'max') && do {
980
my ($inc_i, $inc_c) = $hsp->matches_MT(
982
-action => 'searchutils',
983
-start => $$intvl[0],
986
$acc_i = ($acc_i > $inc_i) ? $acc_i : $inc_i;
987
$acc_c = ($acc_c > $inc_c) ? $acc_c : $inc_c;
990
(!$_ || ($_ eq 'exact')) && do {
991
my ($inc_i, $inc_c) = $hsp->matches_MT(
993
-action => 'searchutils',
994
-start => $$intvl[0],
1003
$ident += ($acc_i/$ncover);
1004
$cons += ($acc_c/$ncover);
1009
$self->{"identities_${type}_${action}_${context}"} = $ident;
1010
$self->{"conserved_${type}_${action}_${context}"} = $cons;
1011
$self->{"length_${type}_${action}_${context}"} = $length;
1016
=head2 Tiling Helper Methods
1020
# coverage_map is of the form
1021
# ( [ $interval, \@containing_hsps ], ... )
1023
# so, for each interval, pick one of the containing hsps,
1024
# and return the union of all the picks.
1026
# use the combinatorial generating iterator, with
1027
# the urns containing the @containing_hsps for each
1030
=head2 _make_tiling_iterator
1032
Title : _make_tiling_iterator
1033
Usage : $self->_make_tiling_iterator($type)
1034
Function: Create an iterator code ref that will step through all
1035
minimal combinations of HSPs that produce complete coverage
1036
of the $type ('hit', 'subject', 'query') sequence,
1037
and set the correct iterator property of the invocant
1039
Returns : True on success
1040
Args : scalar $type, one of 'hit', 'subject', 'query';
1045
sub _make_tiling_iterator {
1048
my ($type, $context) = @_;
1049
$self->_check_type_arg(\$type);
1050
$self->_check_context_arg($type, \$context);
1052
# initialize the urns
1053
my @urns = map { [0, $$_[1]] } $self->coverage_map($type, $context);
1058
if (my $rewind = shift) {
1059
# reinitialize urn indices
1060
$$_[0] = 0 for (@urns);
1065
return if $FINISHED;
1067
my $finished_incrementing = 0;
1068
# @ret is the collector of urn choices
1071
for my $urn (@urns) {
1072
my ($n, $hsps) = @$urn;
1073
push @ret, $$hsps[$n];
1074
unless ($finished_incrementing) {
1075
if ($n == $#$hsps) { $$urn[0] = 0; }
1076
else { ($$urn[0])++; $finished_incrementing = 1 }
1081
$FINISHED = 1 unless $finished_incrementing;
1083
# $hsp->rank is a unique identifier for an hsp in a hit.
1084
# preserve order in @ret
1087
@order{(0..$#ret)} = @ret;
1088
$uniq{$order{$_}->rank} = $_ for (0..$#ret);
1089
@ret = @order{ sort {$a<=>$b} values %uniq };
1094
$self->{"_tiling_iterator_${type}_${context}"} = $iter;
1098
=head2 _tiling_iterator
1100
Title : _tiling_iterator
1101
Usage : $tiling->_tiling_iterator($type,$context)
1102
Function: Retrieve the tiling iterator coderef for the requested
1103
$type ('hit', 'subject', 'query')
1105
Returns : coderef to the desired iterator
1106
Args : scalar $type, one of 'hit', 'subject', 'query'
1108
option scalar $context: strand/frame context string
1113
sub _tiling_iterator {
1115
my ($type, $context) = @_;
1116
$self->_check_type_arg(\$type);
1117
$self->_check_context_arg($type, \$context);
1119
if (!defined $self->{"_tiling_iterator_${type}_${context}"}) {
1120
$self->_make_tiling_iterator($type,$context);
1122
return $self->{"_tiling_iterator_${type}_${context}"};
1125
=head2 Construction Helper Methods
1127
See also L<Bio::Search::Tiling::MapTileUtils>.
1131
sub _check_type_arg {
1133
my $typeref = shift;
1134
$$typeref ||= 'query';
1135
$self->throw("Unknown type '$$typeref'") unless grep(/^$$typeref$/, qw( hit query subject ));
1136
$$typeref = 'hit' if $$typeref eq 'subject';
1140
sub _check_action_arg {
1142
my $actionref = shift;
1144
$$actionref = ($self->_has_sequence_data ? 'exact' : 'est');
1147
$self->throw("Calc action '$$actionref' unrecognized") unless grep /^$$actionref$/, qw( est exact fast max );
1148
if ($$actionref ne 'est' and !$self->_has_sequence_data) {
1149
$self->warn("Blast file did not possess sequence data; defaulting to 'est' action");
1150
$$actionref = 'est';
1156
sub _check_context_arg {
1158
my ($type, $contextref) = @_;
1159
if (!$$contextref) {
1160
$self->throw("Type '$type' requires strand/frame context for algorithm ".$self->algorithm) unless ($self->mapping($type) == 1);
1161
# set default according to default_context attrib
1162
$$contextref = $self->default_context($type);
1165
($$contextref =~ /^[mp]$/) && do { $$contextref .= '_' };
1166
$self->throw("Context '$$contextref' unrecognized") unless
1167
$$contextref =~ /all|[mp][0-2_]/;
1172
=head2 _make_context_key
1174
Title : _make_context_key
1176
Usage : $tiling->_make_context_key(-strand => $strand, -frame => $frame)
1177
Function: create a string indicating strand/frame context; serves as
1178
component of memoizing hash keys
1179
Returns : scalar string
1180
Args : -type => one of ('query', 'hit', 'subject')
1181
-strand => one of (1,0,-1)
1182
-frame => one of (-2, 1, 0, 1, -2)
1183
called w/o args: returns 'all'
1187
sub _make_context_key {
1190
my ($type, $strand, $frame) = $self->_rearrange([qw(TYPE STRAND FRAME)], @args);
1191
_check_type_arg(\$type);
1192
return 'all' unless (defined $strand or defined $frame);
1193
if ( defined $strand && $self->_has_strand($type) ) {
1194
if (defined $frame && $self->_has_frame($type)) {
1195
return ($strand >= 0 ? 'p' : 'm').abs($frame);
1198
return ($strand >= 0 ? 'p_' : 'm_');
1202
if (defined $frame && $self->_has_frame($type)) {
1203
$self->warn("Frame defined without strand; punting with plus strand");
1204
return 'p'.abs($frame);
1215
Alias : _make_context_key
1216
Usage : $tiling->_make_context_key(-strand => $strand, -frame => $frame)
1217
Function: create a string indicating strand/frame context; serves as
1218
component of memoizing hash keys
1219
Returns : scalar string
1220
Args : -type => one of ('query', 'hit', 'subject')
1221
-strand => one of (1,0,-1)
1222
-frame => one of (-2, 1, 0, 1, -2)
1223
called w/o args: returns 'all'
1227
sub _context { shift->_make_context_key(@_) }
1231
Most based on a reading of the algorithm name with a configuration lookup.
1235
=item _has_sequence_data()
1239
sub _has_sequence_data {
1241
$self->throw("Hit attribute not yet set") unless defined $self->hit;
1242
return (($self->hit->hsps)[0]->seq_str('match') ? 1 : 0);
1245
=item _has_logical_length()
1249
sub _has_logical_length {
1252
$self->_check_type_arg(\$type);
1253
# based on mapping coeff
1254
$self->throw("Mapping coefficients not yet set") unless defined $self->mapping($type);
1255
return ($self->mapping($type) > 1);
1265
$self->_check_type_arg(\$type);
1266
return $self->{"_has_strand_${type}"};
1276
$self->_check_type_arg(\$type);
1277
return $self->{"_has_frame_${type}"};
1282
=head1 Private Accessors
1284
=head2 _contig_intersection
1286
Title : _contig_intersection
1287
Usage : $tiling->_contig_intersection($type)
1288
Function: Return the minimal set of $type coordinate intervals
1289
covered by the invocant's HSPs
1290
Returns : array of intervals (2-member arrayrefs; see MapTileUtils)
1291
Args : scalar $type: one of 'query', 'hit', 'subject'
1295
sub _contig_intersection {
1297
my ($type, $context) = @_;
1298
$self->_check_type_arg(\$type);
1299
$self->_check_context_arg($type, \$context);
1300
if (!defined $self->{"_contig_intersection_${type}_${context}"}) {
1301
$self->_calc_coverage_map($type);
1303
return @{$self->{"_contig_intersection_${type}_${context}"}};
1306
=head2 _reported_length
1308
Title : _reported_length
1309
Usage : $tiling->_reported_length($type)
1310
Function: Get the total length of the seq $type
1311
for the invocant's hit object, as reported
1312
by (not calculated from) the input data file
1313
Returns : scalar int
1314
Args : scalar $type: one of 'query', 'hit', 'subject'
1315
Note : This is kludgy; the hit object does not currently
1316
maintain accessors for these values, but the
1317
hsps possess these attributes. This is a wrapper
1318
that allows a consistent access method in the
1320
Note : Since this number is based on a reported length,
1321
it is already a "logical length".
1325
sub _reported_length {
1328
$self->_check_type_arg(\$type);
1329
my $key = uc( $type."_LENGTH" );
1330
return ($self->hsps)[0]->{$key};