15
Bio::Tools::Run::StandAloneBlast - Object for the local execution of the
16
NCBI Blast program suite (blastall, blastpgp, bl2seq). There is experimental
13
Bio::Tools::Run::StandAloneBlast - Object for the local execution
14
of the NCBI BLAST program suite (blastall, blastpgp, bl2seq).
15
There is experimental support for WU-Blast and NCBI rpsblast.
21
19
# Local-blast "factory object" creation and blast-parameter
24
@params = ('database' => 'swissprot','outfile' => 'blast1.out',
25
'_READMETHOD' => 'Blast');
22
@params = (-database => 'swissprot',-outfile => 'blast1.out');
27
23
$factory = Bio::Tools::Run::StandAloneBlast->new(@params);
29
25
# Blast a sequence against a database:
31
$str = Bio::SeqIO->new(-file=>'t/amino.fa' , '-format' => 'Fasta' );
27
$str = Bio::SeqIO->new(-file=>'t/amino.fa', -format => 'Fasta');
32
28
$input = $str->next_seq();
33
29
$input2 = $str->next_seq();
34
30
$blast_report = $factory->blastall($input);
43
39
# Use blast to align 2 sequences against each other:
45
$factory = Bio::Tools::Run::StandAloneBlast->new('outfile' => 'bl2seq.out');
41
$factory = Bio::Tools::Run::StandAloneBlast->new(-outfile => 'bl2seq.out');
46
42
$factory->bl2seq($input, $input2);
48
#experimental support for WU-Blast 2.0
49
my $factory= Bio::Tools::Run::StandAloneBlast->new('program'=>"wublastp",
50
'database'=>"swissprot",
52
my $blast_report = $factory->wublast($seq);
54
# Various additional options and input formats are available. See
55
# the DESCRIPTION section for details.
44
# Experimental support for WU-Blast 2.0
46
my $factory = Bio::Tools::Run::StandAloneBlast->new(-program =>"wublastp",
47
-database =>"swissprot",
49
my $blast_report = $factory->wublast($seq);
51
# Experimental support for NCBI rpsblast
53
my $factory = Bio::Tools::Run::StandAloneBlast->new(-db => 'CDD/Cog',
55
$factory->F('T'); # turn on SEG filtering of query sequence
56
my $blast_report = $factory->rpsblast($seq);
58
# Various additional options and input formats are available,
59
# see the DESCRIPTION section for details.
62
66
BLAST, please see the BLAST documentation which accompanies the BLAST
63
67
distribution. BLAST is available from ftp://ncbi.nlm.nih.gov/blast/.
65
(A source of confusion in documenting a BLAST interface is that the
69
A source of confusion in documenting a BLAST interface is that the
66
70
term "program" is used in - at least - three different ways in the
67
71
BLAST documentation. In this DESCRIPTION, "program" will refer to the
68
BLAST routine set by BLAST's C<-p> parameter that can be set to blastn,
72
BLAST routine set by the BLAST C<-p> parameter that can be set to blastn,
69
73
blastp, tblastx etc. We will use the term Blast "executable" to refer
70
74
to the various different executable files that may be called - ie
71
75
blastall, blastpgp or bl2seq. In addition, there are several BLAST
72
capabilities (which are also referred to as "programs") and are
76
capabilities, which are also referred to as "programs", and are
73
77
implemented by using specific combinations of BLAST executables,
74
78
programs and parameters. They will be referred by their specific
75
names - eg PSIBLAST and PHIBLAST. )
77
StandAloneBlast has been tested so far only under Linux. I expect
78
that it should also work under other Unix systems. However, since the
79
module is implemented using (unix) system calls, modification may be
80
necessary before StandAloneBlast would work under non-Unix
81
operating systems (eg Windows, MacOS). Before running
82
StandAloneBlast it is necessary: to install BLAST on your system,
83
to edit set the environmental variable $BLASTDIR or your $PATH
84
variable to point to the BLAST directory, and to ensure that users
85
have execute privileges for the BLAST program. If the databases
86
which will be searched by BLAST are located in the data subdirectory
87
of the blast program directory (the default installation location),
88
StandAloneBlast will find them; however, if the database files are
89
located in any other location, environmental variable $BLASTDATADIR
90
will need to be set to point to that directory.
79
names - eg PSIBLAST and PHIBLAST.
81
Before running StandAloneBlast it is necessary: to install BLAST
82
on your system, to edit set the environmental variable $BLASTDIR
83
or your $PATH variable to point to the BLAST directory, and to
84
ensure that users have execute privileges for the BLAST program.
86
If the databases which will be searched by BLAST are located in the
87
data subdirectory of the blast program directory (the default
88
installation location), StandAloneBlast will find them; however,
89
if the database files are located in any other location, environmental
90
variable $BLASTDATADIR will need to be set to point to that directory.
92
92
The use of the StandAloneBlast module is as follows: Initially, a
93
93
local blast "factory object" is created. The constructor may be passed
94
94
an optional array of (non-default) parameters to be used by the
97
@params = ('program' => 'blastn', 'database' => 'ecoli.nt');
97
@params = (-program => 'blastn', -database => 'ecoli.nt');
98
98
$factory = Bio::Tools::Run::StandAloneBlast->new(@params);
100
100
Any parameters not explicitly set will remain as the defaults of the
109
109
parameter/switch being set/read is valid. Except where specifically
110
110
noted, StandAloneBlast uses the same single-letter, case-sensitive
111
111
parameter names as the actual blast program. Currently no checks are
112
included to verify that parameters are of the proper type (eg string
112
included to verify that parameters are of the proper type (e.g. string
113
113
or numeric) or that their values are within the proper range.
115
115
As an example, to change the value of the Blast parameter 'e' ('e' is
116
116
the parameter for expectation-value cutoff)
119
$factory->e($expectvalue);
119
$factory->e($expectvalue);
121
121
Note that for improved script readibility one can modify the name of
122
122
the BLAST parameters as desired as long as the initial letter (and
123
case) of the parameter are preserved, eg
124
$factory-E<gt>expectvalue($expectvalue); Unfortunately, some of the BLAST
125
parameters are not the single letter one might expect (eg "iteration
126
round" in blastpgp is 'j'). Again one can check by using (eg)
123
case) of the parameter are preserved, e.g.:
125
$factory->expectvalue($expectvalue);
127
Unfortunately, some of the BLAST parameters are not the single
128
letter one might expect (eg "iteration round" in blastpgp is 'j').
129
Again one can check by using, for example:
130
133
Once the factory has been created and the appropriate parameters set,
131
one can call one of the supported blast executables. The input
132
sequence(s) to these executables may be fasta file(s) as described in
133
the BLAST documentation.
134
one can call one of the supported blast executables. The input
135
sequence(s) to these executables may be fasta file(s) as described in
136
the BLAST documentation.
135
$inputfilename = 't/testquery.fa';
136
$blast_report = $factory->blastall($inputfilename);
138
$inputfilename = 't/testquery.fa';
139
$blast_report = $factory->blastall($inputfilename);
138
141
In addition, sequence input may be in the form of either a Bio::Seq
139
object or or an array of Bio::Seq objects, eg
142
object or or an array of Bio::Seq objects, e.g.:
141
$input = Bio::Seq->new(-id=>"test query",-seq=>"ACTACCCTTTAAATCAGTGGGGG");
142
$blast_report = $factory->blastall($input);
144
$input = Bio::Seq->new(-id => "test query",
145
-seq => "ACTACCCTTTAAATCAGTGGGGG");
146
$blast_report = $factory->blastall($input);
144
148
For blastall and non-psiblast blastpgp runs, report object is either a
145
BPlite.pm or Bio::SearchIO object, selected by the user with the
146
parameter _READMETHOD. (The leading underscore is needed to
149
L<Bio::Tools::BPlite> or L<Bio::SearchIO> object, selected by the user
150
with the parameter _READMETHOD. The leading underscore is needed to
147
151
distinguish this option from options which are passed to the BLAST
148
executable.) The default parser is Bio::SearchIO::blast. For
149
(multiple iteration) psiblast and bl2seq runs the report is
150
automatically parsed by the BPpsilite.pm and BPbl2seq.pm parsers
151
respectively, since neither Blast.pm nor BPlite can parse these
152
reports. In any case, the "raw" blast report is also available. The
153
filename is set by the in the 'outfile' parameter and has the default
154
value of "blastreport.out".
152
executable. The default parser is Bio::SearchIO::blast. If BPlite
153
method is selected, L<Bio::Tools::BPlite> objects will be returned for
154
standard blast and L<Bio::Tools::BPpsilite> for a multiple-iteration
155
blasts, and a L<Bio::Tools::BPbl2seq> for bl2seq. In any case, the "raw"
156
blast report is also available. The filename is set by the in the
157
'outfile' parameter and has the default value of "blastreport.out".
158
The BPlite method is only provided to support legacy code since
159
the BPlite modules are no longer maintained - do not use BPlite
160
since these modules will be removed eventually.
156
For psiblast execution in BLAST's "jumpstart" mode, the program must
162
For psiblast execution in the BLAST "jumpstart" mode, the program must
157
163
be passed (in addition to the query sequence itself) an alignment
158
164
containing the query sequence (in the form of a SimpleAlign object) as
159
165
well as a "mask" specifying at what residues position-specific scoring
164
170
locations where (PSSMs) are to be used and a "0" at all other
165
171
locations. So for example:
167
$str = Bio::AlignIO->new(-file=> "cysprot.msf", '-format' => 'msf' );
168
$aln = $str->next_aln();
169
$len = $aln->length_aln();
170
$mask = '1' x $len; # simple case where PSSM's to be used at all residues
171
$report = $factory->blastpgp("cysprot1.fa", $aln, $mask);
173
$str = Bio::AlignIO->new(-file => "cysprot.msf",
175
$aln = $str->next_aln();
176
$len = $aln->length_aln();
178
# simple case where PSSM's to be used at all residues
179
$report = $factory->blastpgp("cysprot1.fa", $aln, $mask);
173
181
For bl2seq execution, StandAloneBlast.pm can be combined with
174
182
AlignIO.pm to directly produce a SimpleAlign object from the alignment
175
183
of the two sequences produced by bl2seq as in:
178
$str = Bio::SeqIO->new(-file=>'t/amino.fa' , '-format' => 'Fasta', );
179
my $seq3 = $str->next_seq();
180
my $seq4 = $str->next_seq();
183
$factory = Bio::Tools::Run::StandAloneBlast->new('outfile' => 'bl2seq.out');
184
my $bl2seq_report = $factory->bl2seq($seq3, $seq4);
186
# Use AlignIO.pm to create a SimpleAlign object from the bl2seq report
187
$str = Bio::AlignIO->new(-file=> 'bl2seq.out','-format' => 'bl2seq');
188
$aln = $str->next_aln();
186
$str = Bio::SeqIO->new(-file=>'t/amino.fa' , -format => 'Fasta');
187
my $seq3 = $str->next_seq();
188
my $seq4 = $str->next_seq();
191
$factory = Bio::Tools::Run::StandAloneBlast->new(-program => 'blastp',
192
-outfile => 'bl2seq.out');
193
my $bl2seq_report = $factory->bl2seq($seq3, $seq4);
195
# Use AlignIO.pm to create a SimpleAlign object from the bl2seq report
196
$str = Bio::AlignIO->new(-file=> 'bl2seq.out',-format => 'bl2seq');
197
$aln = $str->next_aln();
190
199
For more examples of syntax and use of Blast.pm, the user is
191
200
encouraged to run the scripts standaloneblast.pl in the bioperl
192
/examples directory and StandAloneBlast.t in the bioperl /t directory.
194
Note: There is a similar (but older) perl object interface offered by
195
nhgri. The nhgri module only supports blastall and does not support
196
blastpgp, psiblast, phiblast, bl2seq etc. This module can be found at
197
http://genome.nhgri.nih.gov/blastall/.
199
=head1 DEVELOPERS NOTES
201
B<STILL TO BE WRITTEN>
203
Note: This module is still under development. If you would like that a
204
specific BLAST feature be added to this perl interface, let me know.
201
examples/tools directory and StandAloneBlast.t in the bioperl t/
237
238
package Bio::Tools::Run::StandAloneBlast;
239
use vars qw($AUTOLOAD @ISA $PROGRAMDIR $DATADIR $BLASTTYPE
240
@BLASTALL_PARAMS @BLASTPGP_PARAMS @WUBLAST_PARAMS @WUBLAST_SWITCH
241
@BL2SEQ_PARAMS @OTHER_PARAMS %OK_FIELD
240
use vars qw($AUTOLOAD $PROGRAMDIR $DATADIR $BLASTTYPE
241
@BLASTALL_PARAMS @BLASTPGP_PARAMS @WUBLAST_PARAMS
242
@WUBLAST_SWITCH @RPSBLAST_PARAMS @BL2SEQ_PARAMS
243
@OTHER_PARAMS %OK_FIELD $DEFAULTREADMETHOD
246
248
use Bio::Root::IO;
249
251
use Bio::Tools::BPbl2seq;
250
252
use Bio::Tools::BPpsilite;
251
253
use Bio::SearchIO;
252
use Bio::Tools::Run::WrapperBase;
253
use Bio::Factory::ApplicationFactoryI;
257
@BLASTALL_PARAMS = qw( p d i e m o F G E X I q r v b f g Q
258
D a O J M W z K L Y S T l U y Z);
259
@BLASTPGP_PARAMS = qw(d i A f e m o y P F G E X N g S H a I h c
260
j J Z O M v b C R W z K L Y p k T Q B l U);
261
@BL2SEQ_PARAMS = qw(i j p g o d a G E X W M q r F e S T m);
262
$DEFAULTREADMETHOD = 'BLAST';
264
@WUBLAST_PARAMS = qw( E S E2 S2 W T X M Y Z L K H V B
265
matrix Q R filter wordmask filter maskextra
266
hitdist wink ctxfactor gapE gapS gapE2 gapS2 gapW gapX olf golf
267
olmax golmax gapdecayrate topcomboN topcomboE sumstatsmethod
268
hspsepqmax hspsepsmax gapsepqmax gapsepsmax altscore hspmax gspmax
269
qoffset nwstart nwlen qrecmin qrecmax
270
dbrecmin dbrecmax vdbdescmax dbchunks sort_by_pvalue
271
cpus putenv getenv progress o database input);
272
@WUBLAST_SWITCH = qw(kap sump poissonp lcfilter lcmask echofilter stats nogap gapall pingpong
273
nosegs postsw span2 span1 span prune consistency
274
links ucdb gi noseqs qtype qres sort_by_pvalue sort_by_count
275
sort_by_highscore sort_by_totalscore sort_by_subjectlength
276
mmio nonnegok novalidctxok shortqueryok notes warnings errors endputenv
277
getenv endgetenv abortonerror abortonfatal);
279
# Non BLAST parameters start with underscore to differentiate them
280
# from BLAST parameters
281
@OTHER_PARAMS = qw(_READMETHOD);
283
# _READMETHOD = 'BPlite' (default) or 'Blast'
284
# my @other_switches = qw(QUIET);
287
# Authorize attribute fields
288
foreach my $attr (@BLASTALL_PARAMS, @BLASTPGP_PARAMS,
289
@BL2SEQ_PARAMS, @OTHER_PARAMS ,@WUBLAST_PARAMS, @WUBLAST_SWITCH )
256
use base qw(Bio::Root::Root Bio::Tools::Run::WrapperBase Bio::Factory::ApplicationFactoryI);
259
@BLASTALL_PARAMS = qw(A B C D E F G I J K L M O P Q R S T U V W X Y Z a b d e f g i l m n o p q r s t v w y z);
260
@BLASTPGP_PARAMS = qw(A B C E F G H I J K L M N O P Q R S T U W X Y Z a b c d e f h i j k l m o p q s t u v y z);
261
@RPSBLAST_PARAMS = qw(F I J L N O P T U V X Y Z a b d e i l m o p v y z);
262
@BL2SEQ_PARAMS = qw(A D E F G I J M S T U V W X Y a d e g i j m o p q r t);
263
$DEFAULTREADMETHOD = 'BLAST';
266
qw( E S E2 S2 W T X M Y Z L K H V B
267
matrix Q R filter wordmask filter maskextra
268
hitdist wink ctxfactor gapE gapS gapE2 gapS2 gapW gapX olf golf
269
olmax golmax gapdecayrate topcomboN topcomboE sumstatsmethod
270
hspsepqmax hspsepsmax gapsepqmax gapsepsmax altscore hspmax gspmax
271
qoffset nwstart nwlen qrecmin qrecmax
272
dbrecmin dbrecmax vdbdescmax dbchunks sort_by_pvalue
273
cpus putenv getenv progress o database input);
275
qw(kap sump poissonp lcfilter lcmask echofilter stats nogap
277
nosegs postsw span2 span1 span prune consistency
278
links ucdb gi noseqs qtype qres sort_by_pvalue sort_by_count
279
sort_by_highscore sort_by_totalscore sort_by_subjectlength
280
mmio nonnegok novalidctxok shortqueryok notes warnings
282
getenv endgetenv abortonerror abortonfatal);
284
# Non BLAST parameters start with underscore to differentiate them
285
# from BLAST parameters
286
@OTHER_PARAMS = qw(_READMETHOD);
288
# my @other_switches = qw(QUIET);
290
# Authorize attribute fields
291
foreach my $attr (@BLASTALL_PARAMS, @BLASTPGP_PARAMS, @RPSBLAST_PARAMS,
292
@BL2SEQ_PARAMS, @OTHER_PARAMS ,@WUBLAST_PARAMS,
290
294
{ $OK_FIELD{$attr}++; }
292
# You will need to enable Blast to find the Blast program. This can be done
293
# in (at least) two different ways:
294
# 1. define an environmental variable blastDIR:
295
# export BLASTDIR=/home/peter/blast or
296
# 2. include a definition of an environmental variable BLASTDIR in every script that will
297
# use StandAloneBlast.pm.
298
# BEGIN {$ENV{BLASTDIR} = '/home/peter/blast/'; }
299
$PROGRAMDIR = $BLASTTYPE eq 'ncbi' ? $ENV{'BLASTDIR'}: $ENV{'WUBLASTIDR'};
301
# If local BLAST databases are not stored in the standard
302
# /data directory, the variable BLASTDATADIR will need to be set explicitly
303
$DATADIR = $ENV{'BLASTDATADIR'} || $ENV{'BLASTDB'} || '';
296
# You will need to enable Blast to find the Blast program.
297
# This can be done in at least two different ways:
298
# 1. define an environmental variable blastDIR:
299
# export BLASTDIR=/home/peter/blast or
300
# 2. include a definition of an environmental variable
301
# BLASTDIR in every script that will use StandAloneBlast.pm.
303
$PROGRAMDIR = $BLASTTYPE eq 'ncbi' ? $ENV{'BLASTDIR'}: $ENV{'WUBLASTDIR'};
305
# If local BLAST databases are not stored in the standard
306
# /data directory, the variable BLASTDATADIR will need to be
308
$DATADIR = $ENV{'BLASTDATADIR'} || $ENV{'BLASTDB'} || '';
306
@ISA = qw(Bio::Root::Root
307
Bio::Tools::Run::WrapperBase
308
Bio::Factory::ApplicationFactoryI);
310
312
=head1 BLAST parameters
617
656
my $blast_report = &_generic_local_blast($self, $executable, $input1, $input2);
662
Usage : $blast_report = $factory->rpsblast('t/testquery.fa');
664
$input = Bio::Seq->new(-id=>"test query",
665
-seq=>"MVVLCRADDEEQQPPTCADEEQQQVVGG");
666
$blast_report = $factory->rpsblast($input);
668
$seq_array_ref = \@seq_array;
669
# where @seq_array is an array of Bio::Seq objects
670
$blast_report = $factory->rpsblast(\@seq_array);
671
Args : Name of a file or Bio::Seq object or an array of
672
Bio::Seq object containing the query sequence(s).
673
Throws an exception if argument is not either a string
674
(eg a filename) or a reference to a Bio::Seq object
675
(or to an array of Seq objects). If argument is string,
676
throws exception if file corresponding to string name can
678
Returns : Reference to a Bio::SearchIO object or BPlite object
679
containing the blast report (BPlite only if you specify
680
_READMETHOD=> 'BPlite')
685
my ($self, $input1) = @_;
687
my $executable = 'rpsblast';
689
# Create input file pointer
690
my $infilename1 = $self->_setinput($executable, $input1);
691
if (! $infilename1) {
692
$self->throw(" $input1 ($infilename1) not Bio::Seq object or array of Bio::Seq objects or file name!");
694
$self->i($infilename1); # set file name of sequence to be blasted to inputfilename1 (-i param of blastall)
696
# Run like a standard NCBI blast from this point
697
my $blast_report = _generic_local_blast($self, $executable);
623
Usage : $factory-> blastpgp('t/seq1.fa', 't/seq2.fa');
703
Usage : $factory-> bl2seq('t/seq1.fa', 't/seq2.fa');
625
705
$input1 = Bio::Seq->new(-id=>"test query1",
626
706
-seq=>"ACTADDEEQQPPTCADEEQQQVVGG");
708
788
Usage : Internal function, not to be called directly
709
789
Function: makes actual system call to Blast program
711
Returns : Report object in the appropriate format (BPlite,
712
BPpsilite, Blast, or BPbl2seq)
791
Returns : Report object in the appropriate format (Bio::SearchIO)
792
or if BPlite is requested: Bio::Tools::BPlite,
793
Bio::Tools::BPpsilite,or Bio::Tools::BPbl2seq)
713
794
Args : Reference to calling object, name of BLAST executable,
714
795
and parameter string for executable
719
my ($self,$executable,$param_string) = @_;
720
my ($blast_obj,$exe);
721
if( ! ($exe = $self->executable($executable)) ) {
722
$self->warn("cannot find path to $executable");
725
my $commandstring = $exe. $param_string;
727
# next line for debugging
728
$self->debug( "$commandstring \n");
730
my $status = system($commandstring);
732
$self->throw("$executable call crashed: $? $commandstring\n") unless ($status==0) ;
733
my $outfile = $self->o() ; # get outputfilename
734
my $signif = $self->e() || 1e-5 ;
736
# set significance cutoff to set expectation value or default value
737
# (may want to make this value vary for different executables)
739
# If running bl2seq or psiblast (blastpgp with multiple iterations),
740
# the specific parsers for these programs must be used (ie BPbl2seq or
741
# BPpsilite). Otherwise either the Blast parser or the BPlite
742
# parsers can be selected.
744
if ($self->_READMETHOD =~ /^(Blast|SearchIO)/i ) {
745
$blast_obj = Bio::SearchIO->new(-file=>$outfile,
746
-format => 'blast' ) ;
747
} elsif( $self->_READMETHOD =~ /BPlite/i ) {
748
if ($executable =~ /bl2seq/i) {
749
# Added program info so BPbl2seq can compute strand info
750
$blast_obj = Bio::Tools::BPbl2seq->new(-file => $outfile,
751
-REPORT_TYPE => $self->p );
752
} elsif ($executable =~ /blastpgp/i && defined $self->j() &&
754
$self->debug( "using psilite parser\n");
755
$blast_obj = Bio::Tools::BPpsilite->new(-file => $outfile);
756
} elsif( $executable =~ /blastall/i ) {
757
$blast_obj = Bio::Tools::BPlite->new(-file=>$outfile);
759
$self->warn("Unrecognized executable $executable");
762
$self->warn("Unrecognized readmethod ".$self->_READMETHOD);
800
my ($self,$executable,$param_string) = @_;
801
my ($blast_obj,$exe);
802
if( ! ($exe = $self->executable($executable)) ) {
803
$self->warn("cannot find path to $executable");
806
my $commandstring = $exe. $param_string;
808
# next line for debugging
809
$self->debug( "$commandstring \n");
811
my $status = system($commandstring);
813
$self->throw("$executable call crashed: $? $commandstring\n")
814
unless ($status==0) ;
815
my $outfile = $self->o() ; # get outputfilename
816
my $signif = $self->e() || 1e-5 ;
818
# set significance cutoff to set expectation value or default value
819
# (may want to make this value vary for different executables)
821
# If running bl2seq or psiblast (blastpgp with multiple iterations),
822
# the specific parsers for these programs must be used (ie BPbl2seq or
823
# BPpsilite). Otherwise either the Blast parser or the BPlite
824
# parsers can be selected.
826
if ($self->_READMETHOD =~ /^(Blast|SearchIO)/i ) {
827
$blast_obj = Bio::SearchIO->new(-file=>$outfile,
828
-format => 'blast' ) ;
829
} elsif( $self->_READMETHOD =~ /BPlite/i ) {
830
if ($executable =~ /bl2seq/i) {
831
# Added program info so BPbl2seq can compute strand info
832
$blast_obj = Bio::Tools::BPbl2seq->new(-file => $outfile,
833
-REPORT_TYPE => $self->p );
834
} elsif ($executable =~ /blastpgp/i && defined $self->j() &&
836
$self->debug( "using psilite parser\n");
837
$blast_obj = Bio::Tools::BPpsilite->new(-file => $outfile);
838
} elsif( $executable =~ /blastall|rpsblast/i) {
839
$blast_obj = Bio::Tools::BPlite->new(-file=>$outfile);
841
$self->warn("Unrecognized executable $executable");
844
$self->warn("Unrecognized readmethod ".$self->_READMETHOD);
768
850
=head2 _runwublast
780
862
sub _runwublast {
781
my ($self,$executable,$param_string) = @_;
782
my ($blast_obj,$exe);
783
if( ! ($exe = $self->executable($self->p))){
784
$self->warn("cannot find path to $executable");
787
my $commandstring = $exe. " ".$param_string;
789
# next line for debugging
790
$self->debug( "$commandstring \n");
792
my $status = system($commandstring);
794
$self->throw("$executable call crashed: $? $commandstring\n") unless ($status==0) ;
795
my $outfile = $self->o() ; # get outputfilename
796
$blast_obj = Bio::SearchIO->new(-file=>$outfile,
863
my ($self,$executable,$param_string) = @_;
864
my ($blast_obj,$exe);
865
if( ! ($exe = $self->executable($self->p))){
866
$self->warn("cannot find path to $executable");
869
my $commandstring = $exe. " ".$param_string;
871
# next line for debugging
872
$self->debug( "$commandstring \n");
874
my $status = system($commandstring);
876
$self->throw("$executable call crashed: $? $commandstring\n")
877
unless ($status==0) ;
878
my $outfile = $self->o() ; # get outputfilename
879
$blast_obj = Bio::SearchIO->new(-file=>$outfile,
797
880
-format => 'blast') ;
813
my ($self, $executable, $input1, $input2) = @_;
814
my ($seq, $temp, $infilename1, $infilename2,$fh ) ;
815
# If $input1 is not a reference it better be the name of a file with
816
# the sequence/ alignment data...
817
$self->io->_io_cleanup();
896
my ($self, $executable, $input1, $input2) = @_;
897
my ($seq, $temp, $infilename1, $infilename2,$fh ) ;
898
# If $input1 is not a reference it better be the name of a file with
899
# the sequence/ alignment data...
900
$self->io->_io_cleanup();
820
903
unless (ref $input1) {
821
$infilename1 = (-e $input1) ? $input1 : 0 ;
904
$infilename1 = (-e $input1) ? $input1 : 0 ;
824
# $input may be an array of BioSeq objects...
907
# $input may be an array of BioSeq objects...
825
908
if (ref($input1) =~ /ARRAY/i ) {
826
($fh,$infilename1) = $self->io->tempfile();
827
$temp = Bio::SeqIO->new(-fh=> $fh,
828
'-format' => 'fasta');
829
foreach $seq (@$input1) {
830
unless ($seq->isa("Bio::PrimarySeqI")) {return 0;}
831
$seq->display_id($seq->display_id);
832
$temp->write_seq($seq);
909
($fh,$infilename1) = $self->io->tempfile();
910
$temp = Bio::SeqIO->new(-fh=> $fh,
912
foreach $seq (@$input1) {
913
unless ($seq->isa("Bio::PrimarySeqI")) {return 0;}
914
$seq->display_id($seq->display_id);
915
$temp->write_seq($seq);
838
# $input may be a single BioSeq object...
921
# $input may be a single BioSeq object...
839
922
elsif ($input1->isa("Bio::PrimarySeqI")) {
840
($fh,$infilename1) = $self->io->tempfile();
842
# just in case $input1 is taken from an alignment and has spaces (ie
843
# deletions) indicated within it, we have to remove them - otherwise
844
# the BLAST programs will be unhappy
846
my $seq_string = $input1->seq();
847
$seq_string =~ s/\W+//g; # get rid of spaces in sequence
848
$input1->seq($seq_string);
849
$temp = Bio::SeqIO->new(-fh=> $fh, '-format' => 'fasta');
850
$temp->write_seq($input1);
923
($fh,$infilename1) = $self->io->tempfile();
925
# just in case $input1 is taken from an alignment and has spaces (ie
926
# deletions) indicated within it, we have to remove them - otherwise
927
# the BLAST programs will be unhappy
929
my $seq_string = $input1->seq();
930
$seq_string =~ s/\W+//g; # get rid of spaces in sequence
931
$input1->seq($seq_string);
932
$temp = Bio::SeqIO->new(-fh=> $fh, '-format' => 'fasta');
933
$temp->write_seq($input1);
855
938
$infilename1 = 0; # Set error flag if you get here
857
unless ($input2) { return $infilename1; }
940
unless ($input2) { return $infilename1; }
859
942
unless (ref $input2) {
860
$infilename2 = (-e $input2) ? $input2 : 0 ;
943
$infilename2 = (-e $input2) ? $input2 : 0 ;
863
946
if ($input2->isa("Bio::PrimarySeqI") && $executable eq 'bl2seq' ) {
864
($fh,$infilename2) = $self->io->tempfile();
947
($fh,$infilename2) = $self->io->tempfile();
866
$temp = Bio::SeqIO->new(-fh=> $fh, '-format' => 'Fasta');
867
$temp->write_seq($input2);
949
$temp = Bio::SeqIO->new(-fh=> $fh, '-format' => 'Fasta');
950
$temp->write_seq($input2);
872
# Option for using psiblast's pre-alignment "jumpstart" feature
955
# Option for using psiblast's pre-alignment "jumpstart" feature
873
956
elsif ($input2->isa("Bio::SimpleAlign") &&
874
$executable eq 'blastpgp' ) {
875
# a bit of a lie since it won't be a fasta file
957
$executable eq 'blastpgp' ) {
958
# a bit of a lie since it won't be a fasta file
876
959
($fh,$infilename2) = $self->io->tempfile();
878
# first we retrieve the "mask" that determines which residues should
879
# by scored according to their position and which should be scored
880
# using the non-position-specific matrices
961
# first we retrieve the "mask" that determines which residues should
962
# by scored according to their position and which should be scored
963
# using the non-position-specific matrices
882
965
my @mask = split("", shift ); # get mask
884
# then we have to convert all the residues in every sequence to upper
885
# case at the positions that we want psiblast to use position specific
967
# then we have to convert all the residues in every sequence to upper
968
# case at the positions that we want psiblast to use position specific
888
971
foreach $seq ( $input2->each_seq() ) {
889
my @seqstringlist = split("",$seq->seq());
890
for (my $i = 0; $i < scalar(@mask); $i++) {
891
unless ( $seqstringlist[$i] =~ /[a-zA-Z]/ ) {next}
892
$seqstringlist[$i] = $mask[$i] ? uc $seqstringlist[$i]: lc $seqstringlist[$i] ;
894
my $newseqstring = join("", @seqstringlist);
895
$seq->seq($newseqstring);
972
my @seqstringlist = split("",$seq->seq());
973
for (my $i = 0; $i < scalar(@mask); $i++) {
974
unless ( $seqstringlist[$i] =~ /[a-zA-Z]/ ) {next}
975
$seqstringlist[$i] = $mask[$i] ? uc $seqstringlist[$i]: lc $seqstringlist[$i] ;
977
my $newseqstring = join("", @seqstringlist);
978
$seq->seq($newseqstring);
897
# Now we need to write out the alignment to a file
898
# in the "psi format" which psiblast is expecting
980
# Now we need to write out the alignment to a file
981
# in the "psi format" which psiblast is expecting
899
982
$input2->map_chars('\.','-');
900
983
$temp = Bio::AlignIO->new(-fh=> $fh, '-format' => 'psi');
901
984
$temp->write_aln($input2);
906
989
$infilename2 = 0; # Set error flag if you get here
908
return ($infilename1, $infilename2);
991
return ($infilename1, $infilename2);
911
994
=head2 _setparams
923
1006
my ($self,$executable) = @_;
924
1007
my ($attr, $value, @execparams);
926
if ($executable eq 'blastall') {@execparams = @BLASTALL_PARAMS; }
927
if ($executable eq 'blastpgp') {@execparams = @BLASTPGP_PARAMS; }
928
if ($executable eq 'bl2seq') {@execparams = @BL2SEQ_PARAMS; }
929
if($executable eq 'wublast') {@execparams = @WUBLAST_PARAMS; }
1009
if ($executable eq 'blastall') { @execparams = @BLASTALL_PARAMS; }
1010
elsif ($executable eq 'blastpgp') { @execparams = @BLASTPGP_PARAMS; }
1011
elsif ($executable eq 'rpsblast') { @execparams = @RPSBLAST_PARAMS; }
1012
elsif ($executable eq 'bl2seq' ) { @execparams = @BL2SEQ_PARAMS; }
1013
elsif ($executable eq 'wublast' ) { @execparams = @WUBLAST_PARAMS; }
931
1015
my $param_string = "";
932
1016
for $attr ( @execparams ) {
933
$value = $self->$attr();
934
next unless (defined $value);
935
# Need to prepend datadirectory to database name
936
if($executable eq 'wublast'){
937
next if $attr =~ /database|^d$/;
938
next if $attr =~ /input|^i$/;
939
$attr = 'o' if ($attr =~/outfile/);
1017
$value = $self->$attr();
1018
next unless (defined $value);
1019
# Need to prepend datadirectory to database name
1020
if ($executable eq 'wublast') {
1021
next if $attr =~ /database|^d$/;
1022
next if $attr =~ /input|^i$/;
1023
$attr = 'o' if ($attr =~/outfile/);
942
1026
if ($attr eq 'd' && ($executable ne 'bl2seq')) {
943
# This is added so that you can specify a DB with a full path
944
if (! (-e $value.".nin" || -e $value.".pin")){
945
my @dbs = split(/ /, $value);
946
for (my $i = 0; $i < scalar(@dbs); $i++) {
947
$dbs[$i] = File::Spec->catdir($DATADIR, $dbs[$i]);
949
$value = '"'.join(" ", @dbs).'"';
1027
my @dbs = split(/ /, $value);
1028
for (my $i = 0; $i < scalar(@dbs); $i++) {
1029
# moved the test for full path db to work with multiple databases
1030
if (! (-e $dbs[$i].".nin" || -e $dbs[$i].".pin") &&
1031
! (-e $dbs[$i].".nal" || -e $dbs[$i].".pal") ) {
1032
$dbs[$i] = File::Spec->catdir($DATADIR, $dbs[$i]);
1034
$value = '"'.join(" ", @dbs).'"';
952
1037
# put params in format expected by Blast
953
1038
$attr = '-'. $attr ;
954
1039
$param_string .= " $attr $value ";
957
if ($self->quiet()) { $param_string .= ' 2>/dev/null';}
958
if($executable eq 'wublast'){
959
foreach my $attr(@WUBLAST_SWITCH){
1042
if ($self->quiet()) {
1043
$param_string .= ' 2> '.File::Spec->devnull;
1045
if ($executable eq 'wublast') {
1046
foreach my $attr (@WUBLAST_SWITCH) {
960
1047
my $value = $self->$attr();
961
1048
next unless (defined $value);
962
1049
my $attr_key = ' -'.(lc $attr);