# $Id: protdist.pm,v 1.6 2006/07/04 22:23:35 mauricio Exp $
# BioPerl module for Bio::Tools::Run::PiseApplication::protdist
# Cared for by Catherine Letondal <letondal@pasteur.fr>
# For copyright and disclaimer see below.
# POD documentation - main docs before the code
Bio::Tools::Run::PiseApplication::protdist
Phylip protdist - Program to compute distance matrix from protein sequences (Felsenstein)
Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle.
Felsenstein, J. 1989. PHYLIP -- Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166.
http://bioweb.pasteur.fr/seqanal/interfaces/protdist.html
for available values):
Gamma distribution of rates among positions (G)
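The gamma option is not offered for every distance model: the module's precondition for gamma_dist is `$method eq "J" or $method eq "D" or $method eq "C"`, and the categories-model options are only active when `$method eq "C"`. A minimal sketch restating those two conditions as ordinary Perl; the sub names are hypothetical and not part of the Pise or BioPerl API.

```perl
use strict;
use warnings;

# Hypothetical helpers restating two precondition expressions from this module:
#   gamma_dist:         '$method eq "J" or $method eq "D" or $method eq "C"'
#   categories options: '$method eq "C"'
sub gamma_dist_allowed {
    my ($method) = @_;
    return ($method eq 'J' || $method eq 'D' || $method eq 'C') ? 1 : 0;
}

sub categ_options_allowed {
    my ($method) = @_;
    return $method eq 'C' ? 1 : 0;
}

# Method codes as listed for the "method" parameter:
# J = Jones-Taylor-Thornton, D = Dayhoff PAM, K = Kimura,
# S = Similarity table, C = Categories model.
for my $method (qw(J D K S C)) {
    printf "%s: gamma=%d categories=%d\n",
        $method, gamma_dist_allowed($method), categ_options_allowed($method);
}
```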
Perform a bootstrap before analysis
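When this switch is on, seqboot resamples the alignment before protdist runs, using one of three methods: the bootstrap, the delete-half jackknife, or permuting species within characters. The following is an illustrative sketch of those three resampling schemes on an in-memory alignment, assuming an alignment is a list of equal-length strings (one per species, one character per column). It is not seqboot's implementation, and the helper names are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Fixed seed so runs are repeatable (odd, as seqboot requires;
# Perl's own srand has no such rule).
srand(4333);

# Bootstrap: draw N column indices with replacement, same indices for every row.
sub bootstrap_columns {
    my ($aln) = @_;
    my $n   = length $aln->[0];
    my @idx = map { int rand $n } 1 .. $n;
    return [ map { my $row = $_; join '', map { substr $row, $_, 1 } @idx } @$aln ];
}

# Delete-half jackknife: keep a random half of the columns, none duplicated,
# in their original order.
sub jackknife_half {
    my ($aln)  = @_;
    my $n      = length $aln->[0];
    my @picked = (shuffle(0 .. $n - 1))[0 .. int($n / 2) - 1];
    my @keep   = sort { $a <=> $b } @picked;
    return [ map { my $row = $_; join '', map { substr $row, $_, 1 } @keep } @$aln ];
}

# Permute species within characters: shuffle each column independently,
# which keeps each column's states but destroys taxonomic structure.
sub permute_species {
    my ($aln) = @_;
    my @rows = map { [ split //, $_ ] } @$aln;
    for my $c (0 .. $#{ $rows[0] }) {
        my @column = shuffle(map { $_->[$c] } @rows);
        $rows[$_][$c] = $column[$_] for 0 .. $#rows;
    }
    return [ map { join '', @$_ } @rows ];
}

my $aln = [ 'ACDEFG', 'ACDEFH', 'MCDEFG' ];
print join("\n", @{ bootstrap_columns($aln) }), "\n\n";
print join("\n", @{ jackknife_half($aln) }), "\n\n";
print join("\n", @{ permute_species($aln) }), "\n";
```

Sampling whole column indices once and applying them to every row is what keeps each resampled data set a valid alignment.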
seqboot_seed (Integer)
Random number seed (must be odd)
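The seed and replicate limits are enforced by control expressions in the module; judging from their messages, an expression that evaluates true rejects the value. A sketch restating them as hypothetical Perl helpers. Note that the seed check also rejects zero and negative numbers, even odd ones.

```perl
use strict;
use warnings;

# Hypothetical helpers restating two control checks from this module:
#   seqboot_seed: '$value <= 0 || (($value % 2) == 0)'
#   replicates:   '$replicates > 1000'
# Each returns the module's error message, or undef when the value is accepted.
sub seqboot_seed_error {
    my ($value) = @_;
    return ($value <= 0 || ($value % 2) == 0)
        ? 'Random number seed must be odd'
        : undef;
}

sub replicates_error {
    my ($replicates) = @_;
    return $replicates > 1000
        ? 'this server allows no more than 1000 replicates'
        : undef;
}

for my $seed (4333, 4334, 0) {
    my $err = seqboot_seed_error($seed);
    print "seed $seed: ", (defined $err ? $err : 'ok'), "\n";
}
```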
Use weights for sites (W)
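Site weights and the optional bootstrap both work by staging files before protdist starts: the module's format strings symlink the alignment to `infile` and the weights file to `weights`, run `seqboot < seqboot.params`, rename seqboot's `outfile` to `infile`, and finally run `protdist < params`. When seqboot is used, the module also answers protdist's multiple data sets option with `M`, `D` and the replicate count. A minimal sketch of the resulting command string; `build_command` is hypothetical, and the order of the pieces is an assumption because the module's group numbers are only partly visible here.

```perl
use strict;
use warnings;

# Hypothetical sketch of the shell command implied by this module's format
# strings for infile, weights_file, seqboot and protdist.
# The relative order of the pieces is an assumption.
sub build_command {
    my (%p) = @_;
    my $cmd = "ln -sf $p{infile} infile; ";
    # weights_file only applies when the weights switch is on (precondition $weights).
    $cmd .= "ln -s $p{weights_file} weights; "
        if $p{weights} && $p{weights_file};
    # seqboot writes its replicates to outfile, which becomes protdist's infile.
    $cmd .= "seqboot < seqboot.params && mv outfile infile && "
        if $p{seqboot};
    $cmd .= "protdist < params";
    return $cmd;
}

print build_command(infile => 'aln.phy'), "\n";
print build_command(infile => 'aln.phy', seqboot => 1,
                    weights => 1, weights_file => 'w.txt'), "\n";
```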
multiple_dataset (String)
bootterminal_type (String)
Print out the data at start of run (1)
Categorization of amino acids (A)
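The categories model groups amino acids differently depending on the chosen scheme. The module's note on this option lists the groups (writing isoleucine as `Ileu`); below they are arranged as a lookup table, with a hypothetical helper that reports whether two residues fall in the same group. This only illustrates the grouping, not how protdist uses it in its distance computation.

```perl
use strict;
use warnings;

# Groups copied from this module's note on the "categorization" option.
# All three schemes share the first three groups.
my @shared = ([qw(Glu Gln Asp Asn)], [qw(Lys Arg His)], [qw(Phe Tyr Trp)]);
my %scheme_groups = (
    G => [[qw(Cys)], [qw(Met Val Leu Ileu)], [qw(Gly Ala Ser Thr Pro)]],          # George/Hunt/Barker
    C => [[qw(Cys Met)], [qw(Val Leu Ileu Gly Ala Ser Thr)], [qw(Pro)]],          # Chemical
    H => [[qw(Cys)], [qw(Met Val Leu Ileu)], [qw(Gly Ala Ser Thr)], [qw(Pro)]],   # Hall
);

# Hypothetical helper: residue name -> group number under one scheme.
sub category_map {
    my ($scheme) = @_;
    my $groups = $scheme_groups{$scheme} or die "unknown scheme '$scheme'\n";
    my %category;
    my $n = 0;
    for my $group (@shared, @$groups) {
        $category{$_} = $n for @$group;
        $n++;
    }
    return \%category;
}

sub same_category {
    my ($scheme, $x, $y) = @_;
    my $category = category_map($scheme);
    return $category->{$x} == $category->{$y} ? 1 : 0;
}

print same_category('G', 'Gly', 'Pro'), "\n";   # 1
print same_category('H', 'Gly', 'Pro'), "\n";   # 0
```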
Prob change category (1.0=easy) (E)
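The change probability is validated to lie between 0.0 and 1.0 and is only written to the params file, as `E` followed by the value, when it differs from the default listed for it in the module (0.4570). A sketch combining the control and format expressions in one hypothetical helper; because the format tests `$value` for truth, a probability of exactly 0 also produces no answer, as in the original expression.

```perl
use strict;
use warnings;

# Hypothetical helper combining two definitions for change_prob:
#   control: '$change_prob < 0.0 || $change_prob > 1.0' -> "Enter a value between 0.0 and 1.0"
#   format:  '($value and $value != $vdef) ? "E\n$value\n" : ""'
sub change_prob_answer {
    my ($value, $vdef) = @_;
    $vdef = 0.4570 unless defined $vdef;   # default value listed in this module
    die "Enter a value between 0.0 and 1.0\n" if $value < 0.0 || $value > 1.0;
    return ($value && $value != $vdef) ? "E\n$value\n" : '';
}

print change_prob_answer(0.25);   # prints "E", then "0.25"
```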
Transition/transversion ratio (T)
base_frequencies (Integer)
Base frequencies for A, C, G, T/U (separated by commas)
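Users type the frequencies separated by commas, but the module's control expression `($base_frequencies =~ s/,/ /g) && 0` rewrites the commas to spaces as a side effect and, because of `&& 0`, never reports an error; the format then emits `F` followed by the space-separated list. A sketch of that trick in isolation, with a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the base_frequencies control and format:
#   control: '($base_frequencies =~ s/,/ /g) && 0'   (always false)
#   format:  '($value) ? "F\n$base_frequencies\n" : ""'
sub base_frequencies_answer {
    my ($base_frequencies) = @_;
    # s///g returns the number of substitutions (or an empty string when
    # none were made); "&& 0" then forces the result to false.
    my $error = ($base_frequencies =~ s/,/ /g) && 0;
    die "unexpected control failure\n" if $error;
    return $base_frequencies ? "F\n$base_frequencies\n" : '';
}

print base_frequencies_answer('0.30,0.20,0.20,0.30');   # prints "F", then "0.30 0.20 0.20 0.30"
```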
terminal_type (String)
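Most options above reach protdist as menu keystrokes written to a params file. The module stores each one as a Perl expression inside a string, using `$value` for the submitted value and `$vdef` for its default. A minimal sketch of how such expressions could be evaluated, assuming the framework binds `$value` and `$vdef` before a string `eval`; the `render` helper is hypothetical, the expressions are copied from the module, and the ratio default of 2.0 is an assumption for the example only.

```perl
use strict;
use warnings;

# A few format expressions copied verbatim from this module. Inside the
# single quotes, '\\n' is stored as '\n', so eval yields a real newline.
my %format = (
    weights     => '($value)? "W\\n" : ""',
    printdata   => '($value) ? "1\\n" : ""',
    ratio       => '($value && $value != $vdef)? "T\\n$value\\n" : ""',
    change_prob => '($value and $value != $vdef) ? "E\\n$value\\n" : ""',
);

# Hypothetical evaluator: $value and $vdef are lexicals here, and a string
# eval compiled in this scope can see them.
sub render {
    my ($expr, $value, $vdef) = @_;
    my $out = eval $expr;
    die "cannot evaluate '$expr': $@" if $@;
    return $out;
}

# 2.0 as the ratio default is an assumption for this example only.
my $answers = join '',
    render($format{weights},     1),
    render($format{printdata},   0),
    render($format{ratio},       3.5, 2.0),
    render($format{change_prob}, 0.4570, 0.4570);
print $answers;    # prints "W", then "T", then "3.5"
```

Options left at their defaults contribute an empty string, so only changed settings produce menu keystrokes.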
User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to
the Bioperl mailing list. Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
http://bioperl.org/wiki/Mailing_lists - About the mailing lists
=head2 Reporting Bugs
Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the web:
http://bugzilla.open-bio.org/
Catherine Letondal (letondal@pasteur.fr)
Copyright (C) 2003 Institut Pasteur & Catherine Letondal.
This module is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.
This software is provided "as is" without warranty of any kind.
http://bioweb.pasteur.fr/seqanal/interfaces/protdist.html
Bio::Tools::Run::PiseApplication
Bio::Tools::Run::AnalysisFactory::Pise
Bio::Tools::Run::PiseJob
package Bio::Tools::Run::PiseApplication::protdist;
use Bio::Tools::Run::PiseApplication;
@ISA = qw(Bio::Tools::Run::PiseApplication);
Usage : my $protdist = Bio::Tools::Run::PiseApplication::protdist->new($location, $email, @params);
Function: Creates a Bio::Tools::Run::PiseApplication::protdist object.
This method should not be used directly, but rather by
a Bio::Tools::Run::AnalysisFactory::Pise instance.
my $factory = Bio::Tools::Run::AnalysisFactory::Pise->new();
my $protdist = $factory->program('protdist');
Returns : An instance of Bio::Tools::Run::PiseApplication::protdist.
my ($class, $location, $email, @params) = @_;
my $self = $class->SUPER::new($location, $email);
# -- begin of definitions extracted from /local/gensoft/lib/Pise/5.a/PerlDef/protdist.pm
$self->{COMMAND} = "protdist";
$self->{VERSION} = "5.a";
$self->{TITLE} = "Phylip";
$self->{DESCRIPTION} = "protdist - Program to compute distance matrix from protein sequences";
$self->{OPT_EMAIL} = 0;
$self->{AUTHORS} = "Felsenstein";
$self->{DOCLINK} = "http://bioweb.pasteur.fr/docs/gensoft-evol.html#PHYLIP";
$self->{REFERENCE} = [
"Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle.",
"Felsenstein, J. 1989. PHYLIP -- Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166.",
$self->{_INTERFACE_STANDOUT} = undef;
$self->{_STANDOUT_FILE} = undef;
$self->{TOP_PARAMETERS} = [
$self->{PARAMETERS_ORDER} = [
"infile", # Alignment File
"method", # Distance model (P)
"gamma_dist", # Gamma distribution of rates among positions (G)
"bootstrap", # Bootstrap options
"seqboot", # Perform a bootstrap before analysis
"resamp_method", # Resampling methods
"seqboot_seed", # Random number seed (must be odd)
"replicates", # How many replicates
"weight_opt", # Weight options
"weights", # Use weights for sites (W)
"weights_file", # Weights file
"output", # Output options
"printdata", # Print out the data at start of run (1)
"categ_opt", # Categories model options
"code", # Genetic code (U)
"categorization", # Categorization of amino acids (A)
"change_prob", # Prob change category (1.0=easy) (E)
"ratio", # Transition/transversion ratio (T)
"base_frequencies", # Base frequencies for A, C, G, T/U (separated by commas)
"protdist" => 'String',
"infile" => 'Sequence',
"gamma_dist" => 'Excl',
"bootstrap" => 'Paragraph',
"seqboot" => 'Switch',
"resamp_method" => 'Excl',
"seqboot_seed" => 'Integer',
"replicates" => 'Integer',
"weight_opt" => 'Paragraph',
"weights" => 'Switch',
"weights_file" => 'InFile',
"multiple_dataset" => 'String',
"bootconfirm" => 'String',
"bootterminal_type" => 'String',
"output" => 'Paragraph',
"printdata" => 'Switch',
"categ_opt" => 'Paragraph',
"categorization" => 'Excl',
"change_prob" => 'Integer',
"ratio" => 'Integer',
"base_frequencies" => 'Integer',
"outfile" => 'Results',
"params" => 'Results',
"confirm" => 'String',
"terminal_type" => 'String',
"tmp_params" => 'Results',
"perl" => '"protdist < params"',
"perl" => '"ln -sf $infile infile; "',
"perl" => '($value) ? "seqboot < seqboot.params && mv outfile infile && " : ""',
"perl" => '"$value\\n"',
"perl" => '($value && $value != $vdef)? "R\\n$value\\n" : ""',
"perl" => '($value)? "W\\n" : ""',
"perl" => '($value)? "ln -s $weights_file weights; " : ""',
"multiple_dataset" => {
"perl" => '"M\\nD\\n$replicates\\n"',
"bootterminal_type" => {
"perl" => '($value) ? "1\\n" : ""',
"perl" => '($value and $value ne $vdef)? "U\\n$code\\n" : "" ',
"categorization" => {
"perl" => '($value and $value ne $vdef) ? "A\\n$categorization\\n" : ""',
"perl" => '($value and $value != $vdef) ? "E\\n$value\\n" : ""',
"perl" => '($value && $value != $vdef)? "T\\n$value\\n" : ""',
"base_frequencies" => {
"perl" => '($value) ? "F\\n$base_frequencies\\n" : "" ',
$self->{FILENAMES} = {
"outfile" => 'outfile',
"params" => 'params',
"tmp_params" => '*.params',
"resamp_method" => 1,
"seqboot_seed" => 10000,
"weights_file" => -1,
"multiple_dataset" => 1,
"bootconfirm" => 1000,
"bootterminal_type" => -1,
"categorization" => 10,
"base_frequencies" => 3,
"terminal_type" => -1,
$self->{BY_GROUP_PARAMETERS} = [
$self->{ISHIDDEN} = {
"resamp_method" => 0,
"multiple_dataset" => 1,
"bootterminal_type" => 1,
"categorization" => 0,
"base_frequencies" => 0,
"terminal_type" => 1,
$self->{ISCOMMAND} = {
"resamp_method" => 0,
"multiple_dataset" => 0,
"bootterminal_type" => 0,
"categorization" => 0,
"base_frequencies" => 0,
"terminal_type" => 0,
$self->{ISMANDATORY} = {
"resamp_method" => 1,
"multiple_dataset" => 0,
"bootterminal_type" => 0,
"categorization" => 0,
"base_frequencies" => 0,
"terminal_type" => 0,
"infile" => "Alignment File",
"method" => "Distance model (P)",
"gamma_dist" => "Gamma distribution of rates among positions (G)",
"bootstrap" => "Bootstrap options",
"seqboot" => "Perform a bootstrap before analysis",
"resamp_method" => "Resampling methods",
"seqboot_seed" => "Random number seed (must be odd)",
"replicates" => "How many replicates",
"weight_opt" => "Weight options",
"weights" => "Use weights for sites (W)",
"weights_file" => "Weights file",
"multiple_dataset" => "",
"bootterminal_type" => "",
"output" => "Output options",
"printdata" => "Print out the data at start of run (1)",
"categ_opt" => "Categories model options",
"code" => "Genetic code (U)",
"categorization" => "Categorization of amino acids (A)",
"change_prob" => "Prob change category (1.0=easy) (E)",
"ratio" => "Transition/transversion ratio (T)",
"base_frequencies" => "Base frequencies for A, C, G, T/U (separated by commas)",
"terminal_type" => "",
$self->{ISSTANDOUT} = {
"resamp_method" => 0,
"multiple_dataset" => 0,
"bootterminal_type" => 0,
"categorization" => 0,
"base_frequencies" => 0,
"terminal_type" => 0,
"method" => ['J','Jones-Taylor-Thornton matrix','D','Dayhoff PAM matrix','K','Kimura formula','S','Similarity table','C','Categories model',],
"gamma_dist" => ['N','No','Y','Yes','G','Gamma+Invariant',],
"bootstrap" => ['seqboot','resamp_method','seqboot_seed','replicates',],
"resamp_method" => ['bootstrap','Bootstrap','jackknife','Delete-half jackknife','permute','Permute species for each character',],
"weight_opt" => ['weights','weights_file',],
"output" => ['printdata',],
"categ_opt" => ['code','categorization','change_prob','ratio','base_frequencies',],
"code" => ['U','U: Universal','M','M: Mitochondrial','V','V: Vertebrate mitochondrial','F','F: Fly mitochondrial','Y','Y: Yeast mitochondrial',],
"categorization" => ['G','G: George/Hunt/Barker','C','C: Chemical','H','H: Hall',],
'C' => '"P\\nP\\nP\\nP\\n"',
'S' => '"P\\nP\\nP\\n"',
'permute' => '"J\\nJ\\n"',
'jackknife' => '"J\\n"',
$self->{SEPARATOR} = {
"resamp_method" => 'bootstrap',
"replicates" => '100',
"categorization" => 'G',
"change_prob" => '0.4570',
"protdist" => { "perl" => '1' },
"infile" => { "perl" => '1' },
"method" => { "perl" => '1' },
"perl" => '$method eq "J" or $method eq "D" or $method eq "C"',
"bootstrap" => { "perl" => '1' },
"seqboot" => { "perl" => '1' },
"perl" => '$seqboot',
"perl" => '$seqboot',
"perl" => '$seqboot',
"weight_opt" => { "perl" => '1' },
"weights" => { "perl" => '1' },
"perl" => '$weights',
"multiple_dataset" => {
"perl" => '$seqboot',
"perl" => '$seqboot',
"bootterminal_type" => {
"perl" => '$seqboot',
"output" => { "perl" => '1' },
"printdata" => { "perl" => '1' },
"perl" => '$method eq "C"',
"perl" => '$method eq "C"',
"categorization" => {
"perl" => '$method eq "C"',
"perl" => '$method eq "C"',
"perl" => '$method eq "C"',
"base_frequencies" => {
"perl" => '$method eq "C"',
"outfile" => { "perl" => '1' },
"params" => { "perl" => '1' },
"confirm" => { "perl" => '1' },
"terminal_type" => { "perl" => '1' },
"tmp_params" => { "perl" => '1' },
'$value <= 0 || (($value % 2) == 0)' => "Random number seed must be odd",
'$replicates > 1000' => "this server allows no more than 1000 replicates",
'$change_prob < 0.0 || $change_prob > 1.0' => "Enter a value between 0.0 and 1.0",
"base_frequencies" => {
'($base_frequencies =~ s/,/ /g) && 0' => "",
'$method ne "S"' => "phylip_dist",
$self->{WITHPIPEOUT} = {
"readseq_ok_alig" => '1',
$self->{WITHPIPEIN} = {
"resamp_method" => 0,
"multiple_dataset" => 0,
"bootterminal_type" => 0,
"categorization" => 0,
"base_frequencies" => 0,
"terminal_type" => 0,
$self->{ISSIMPLE} = {
"resamp_method" => 0,
"multiple_dataset" => 0,
"bootterminal_type" => 0,
"categorization" => 0,
"base_frequencies" => 0,
"terminal_type" => 0,
$self->{PARAMFILE} = {
"method" => "params",
"gamma_dist" => "params",
"resamp_method" => "seqboot.params",
"seqboot_seed" => "seqboot.params",
"replicates" => "seqboot.params",
"weights" => "params",
"multiple_dataset" => "params",
"bootconfirm" => "seqboot.params",
"bootterminal_type" => "seqboot.params",
"printdata" => "params",
"categorization" => "params",
"change_prob" => "params",
"base_frequencies" => "params",
"confirm" => "params",
"terminal_type" => "params",
"By selecting this option, the bootstrap will be performed on your sequence file, so you don\'t need to run seqboot separately beforehand.",
"Don\'t give an already bootstrapped file to the program; this won\'t work!",
"1. The bootstrap. Bootstrapping was invented by Bradley Efron in 1979, and its use in phylogeny estimation was introduced by me (Felsenstein, 1985b). It involves creating a new data set by sampling N characters randomly with replacement, so that the resulting data set has the same size as the original, but some characters have been left out and others are duplicated. The random variation of the results from analyzing these bootstrapped data sets can be shown statistically to be typical of the variation that you would get from collecting new data sets. The method assumes that the characters evolve independently, an assumption that may not be realistic for many kinds of data.",
"2. Delete-half-jackknifing. This alternative to the bootstrap involves sampling a random half of the characters, and including them in the data but dropping the others. The resulting data sets are half the size of the original, and no characters are duplicated. The random variation from doing this should be very similar to that obtained from the bootstrap. The method is advocated by Wu (1986).",
"3. Permuting species within characters. This method of resampling (well, OK, it may not be best to call it resampling) was introduced by Archie (1989) and Faith (1990; see also Faith and Cranston, 1991). It involves permuting the columns of the data matrix separately. This produces data matrices that have the same number and kinds of characters but no taxonomic structure. It is used for different purposes than the bootstrap, as it tests not the variation around an estimated tree but the hypothesis that there is no taxonomic structure in the data: if a statistic such as number of steps is significantly smaller in the actual data than it is in replicates that are permuted, then we can argue that there is some taxonomic structure in the data (though perhaps it might be just a pair of sibling species).",
"categorization" => [
"All have groups: (Glu Gln Asp Asn), (Lys Arg His), (Phe Tyr Trp) plus:",
"George/Hunt/Barker: (Cys), (Met Val Leu Ileu), (Gly Ala Ser Thr Pro)",
"Chemical: (Cys Met), (Val Leu Ileu Gly Ala Ser Thr), (Pro)",
"Hall: (Cys), (Met Val Leu Ileu), (Gly Ala Ser Thr), (Pro)",
$self->{SCALEMIN} = {
$self->{SCALEMAX} = {
$self->{SCALEINC} = {
# -- end of definitions extracted from /local/gensoft/lib/Pise/5.a/PerlDef/protdist.pm
$self->_init_params(@params);
1; # Needed to keep compiler happy