~ubuntu-branches/ubuntu/natty/python-cogent/natty

Viewing changes to doc/cookbook/using_likelihood_to_perform_evolutionary_analyses.rst

Committer: Bazaar Package Importer
Author(s): Steffen Moeller
Date: 2010-12-04 22:30:35 UTC
mfrom: (1.1.1 upstream)
Revision ID: james.westby@ubuntu.com-20101204223035-j11kinhcrrdgg2p2

Tags: 1.5-1

* Bumped standard to 3.9.1, no changes required.
* New upstream version.
  - major additions to Cookbook
  - added AlleleFreqs attribute to ensembl Variation objects.
  - added getGeneByStableId method to genome objects.
  - added Introns attribute to Transcript objects and an Intron class.
  - added Mann-Whitney test and a Monte-Carlo version
  - exploratory and confirmatory period estimation techniques (suitable for
    symbolic and continuous data)
  - Information theoretic measures (AIC and BIC) added
  - drawing of trees with collapsed nodes
  - progress display indicator support for terminal and GUI apps
  - added parser for illumina HiSeq2000 and GAiix sequence files as
    cogent.parse.illumina_sequence.MinimalIlluminaSequenceParser.
  - added parser to FASTQ files, one of the output options for illumina's
    workflow, also added cookbook demo.
  - added functionality for parsing of SFF files without the Roche tools in
    cogent.parse.binary_sff
  - thousand fold performance improvement to nmds
  - >10-fold performance improvements to some Table operations

files added:
cogent/cluster/approximate_mds.py

cogent/maths/_period.c

cogent/maths/_period.pyx

cogent/maths/period.py

cogent/maths/stats/information_criteria.py

cogent/maths/stats/period.py

cogent/parse/binary_sff.py

cogent/parse/fastq.py

cogent/parse/illumina_sequence.py

cogent/parse/kegg_ko.py

cogent/parse/kegg_pos.py

cogent/parse/kegg_taxonomy.py

cogent/util/progress_display.py

cogent/util/terminal.py

doc/_static

doc/_static/google_feed.js

doc/cookbook/alphabet.rst

doc/cookbook/checkpointing_long_running.rst

doc/cookbook/ensembl.rst

doc/cookbook/loading_sequences.rst

doc/cookbook/managing_trees.rst

doc/cookbook/moltypesequence.rst

doc/cookbook/parallel_tasks.rst

doc/cookbook/phylonodes.rst

doc/cookbook/structural_contacts.rst

doc/cookbook/structural_data_2.rst

doc/data/1HQF.pdb

doc/data/Crump_et_al_example_env_file.txt

doc/data/Crump_example_tree_newick.txt

doc/data/inseqs_protein.fasta

doc/data/refseqs_protein.fasta

doc/examples/building_and_using_an_application_controller.rst

doc/examples/period_estimation.rst

doc/examples/seqsim_alignment_simulation.rst

doc/examples/seqsim_aln_sim_user_alphabet.rst

doc/examples/seqsim_tree_sim.rst

tests/data/F6AVWTA01.sff

tests/data/fastq.txt

tests/test_cluster/test_approximate_mds.py

tests/test_maths/test_period.py

tests/test_maths/test_stats/test_information_criteria.py

tests/test_maths/test_stats/test_period.py

tests/test_parse/test_binary_sff.py

tests/test_parse/test_fastq.py

tests/test_parse/test_illumina_sequence.py

tests/test_parse/test_kegg_ko.py

tests/test_parse/test_kegg_pos.py

tests/test_parse/test_kegg_taxonomy.py

tests/test_parse/test_mothur.py

tests/test_parse/test_pdb.py

tests/test_parse/test_rna_plot.py

tests/test_parse/test_structure.py

files removed:
tests/test_core/test_tree2.py

files modified:
.pc/fix_python_shebang_line.patch/cogent/align/dp_calculation.py

.pc/fix_python_shebang_line.patch/cogent/data/molecular_weight.py

.pc/fix_python_shebang_line.patch/cogent/format/text_tree.py

.pc/fix_python_shebang_line.patch/cogent/phylo/maximum_likelihood.py

.pc/fix_python_shebang_line.patch/cogent/recalculation/__init__.py

.pc/fix_python_shebang_line.patch/cogent/recalculation/setting.py

ChangeLog

cogent/__init__.py

cogent/align/__init__.py

cogent/align/_compare.c

cogent/align/_compare.pyx

cogent/align/_pairwise_pogs.c

cogent/align/_pairwise_pogs.pyx

cogent/align/_pairwise_seqs.c

cogent/align/_pairwise_seqs.pyx

cogent/align/algorithm.py

cogent/align/align.py

cogent/align/dp_calculation.py

cogent/align/indel_model.py

cogent/align/indel_positions.py

cogent/align/pairwise.py

cogent/align/partial_order_graph.py

cogent/align/progressive.py

cogent/align/pycompare.py

cogent/align/traceback.py

cogent/align/weights/__init__.py

cogent/align/weights/methods.py

cogent/align/weights/util.py

cogent/app/__init__.py

cogent/app/blast.py

cogent/app/carnac.py

cogent/app/cd_hit.py

cogent/app/clearcut.py

cogent/app/clustalw.py

cogent/app/cmfinder.py

cogent/app/comrna.py

cogent/app/consan.py

cogent/app/contrafold.py

cogent/app/cove.py

cogent/app/dialign.py

cogent/app/dotur.py

cogent/app/dynalign.py

cogent/app/fasttree.py

cogent/app/fasttree_v1.py

cogent/app/foldalign.py

cogent/app/formatdb.py

cogent/app/gctmpca.py

cogent/app/ilm.py

cogent/app/infernal.py

cogent/app/knetfold.py

cogent/app/mafft.py

cogent/app/mfold.py

cogent/app/mothur.py

cogent/app/msms.py

cogent/app/muscle.py

cogent/app/nupack.py

cogent/app/parameters.py

cogent/app/pfold.py

cogent/app/pknotsrg.py

cogent/app/raxml.py

cogent/app/rdp_classifier.py

cogent/app/rnaalifold.py

cogent/app/rnaforester.py

cogent/app/rnashapes.py

cogent/app/rnaview.py

cogent/app/sfffile.py

cogent/app/sffinfo.py

cogent/app/sfold.py

cogent/app/stride.py

cogent/app/uclust.py

cogent/app/unafold.py

cogent/app/util.py

cogent/app/vienna_package.py

cogent/cluster/UPGMA.py

cogent/cluster/__init__.py

cogent/cluster/goodness_of_fit.py

cogent/cluster/metric_scaling.py

cogent/cluster/nmds.py

cogent/cluster/procrustes.py

cogent/core/__init__.py

cogent/core/alignment.py

cogent/core/alphabet.py

cogent/core/annotation.py

cogent/core/bitvector.py

cogent/core/entity.py

cogent/core/genetic_code.py

cogent/core/info.py

cogent/core/location.py

cogent/core/moltype.py

cogent/core/profile.py

cogent/core/sequence.py

cogent/core/tree.py

cogent/core/usage.py

cogent/data/__init__.py

cogent/data/energy_params.py

cogent/data/ligand_properties.py

cogent/data/molecular_weight.py

cogent/data/nucleic_properties.py

cogent/data/protein_properties.py

cogent/db/__init__.py

cogent/db/ensembl/__init__.py

cogent/db/ensembl/assembly.py

cogent/db/ensembl/compara.py

cogent/db/ensembl/database.py

cogent/db/ensembl/feature_level.py

cogent/db/ensembl/genome.py

cogent/db/ensembl/host.py

cogent/db/ensembl/name.py

cogent/db/ensembl/region.py

cogent/db/ensembl/related_region.py

cogent/db/ensembl/sequence.py

cogent/db/ensembl/species.py

cogent/db/ensembl/util.py

cogent/db/ncbi.py

cogent/db/pdb.py

cogent/db/rfam.py

cogent/db/util.py

cogent/draw/__init__.py

cogent/draw/arrow_rates.py

cogent/draw/codon_usage.py

cogent/draw/dendrogram.py

cogent/draw/dinuc.py

cogent/draw/dotplot.py

cogent/draw/fancy_arrow.py

cogent/draw/legend.py

cogent/draw/linear.py

cogent/draw/multivariate_plot.py

cogent/draw/rlg2mpl.py

cogent/draw/util.py

cogent/evolve/__init__.py

cogent/evolve/_likelihood_tree.c

cogent/evolve/_likelihood_tree.pyx

cogent/evolve/best_likelihood.py

cogent/evolve/bootstrap.py

cogent/evolve/coevolution.py

cogent/evolve/discrete_markov.py

cogent/evolve/likelihood_calculation.py

cogent/evolve/likelihood_function.py

cogent/evolve/likelihood_tree.py

cogent/evolve/models.py

cogent/evolve/motif_prob_model.py

cogent/evolve/parameter_controller.py

cogent/evolve/predicate.py

cogent/evolve/simulate.py

cogent/evolve/substitution_calculation.py

cogent/evolve/substitution_model.py

cogent/format/__init__.py

cogent/format/alignment.py

cogent/format/clustal.py

cogent/format/fasta.py

cogent/format/mage.py

cogent/format/motif.py

cogent/format/nexus.py

cogent/format/pdb.py

cogent/format/pdb_color.py

cogent/format/phylip.py

cogent/format/rna_struct.py

cogent/format/stockholm.py

cogent/format/structure.py

cogent/format/table.py

cogent/format/text_tree.py

cogent/format/xyzrn.py

cogent/maths/__init__.py

cogent/maths/_matrix_exponentiation.c

cogent/maths/_matrix_exponentiation.pyx

cogent/maths/distance_transform.py

cogent/maths/eigen.c

cogent/maths/function_optimisation.py

cogent/maths/geometry.py

cogent/maths/markov.py

cogent/maths/matrix/__init__.py

cogent/maths/matrix/distance.py

cogent/maths/matrix_exponentiation.py

cogent/maths/matrix_invert.c

cogent/maths/matrix_logarithm.py

cogent/maths/optimiser.py

cogent/maths/optimisers.py

cogent/maths/scipy_optimisers.py

cogent/maths/scipy_optimize.py

cogent/maths/simannealingoptimiser.py

cogent/maths/solve.py

cogent/maths/spatial/__init__.py

cogent/maths/spatial/ckd3.c

cogent/maths/spatial/ckd3.pyx

cogent/maths/stats/__init__.py

cogent/maths/stats/alpha_diversity.py

cogent/maths/stats/cai/__init__.py

cogent/maths/stats/cai/adaptor.py

cogent/maths/stats/cai/get_by_cai.py

cogent/maths/stats/cai/util.py

cogent/maths/stats/distribution.py

cogent/maths/stats/histogram.py

cogent/maths/stats/kendall.py

cogent/maths/stats/ks.py

cogent/maths/stats/rarefaction.py

cogent/maths/stats/special.py

cogent/maths/stats/test.py

cogent/maths/stats/util.py

cogent/maths/svd.py

cogent/maths/unifrac/__init__.py

cogent/maths/unifrac/fast_tree.py

cogent/maths/unifrac/fast_unifrac.py

cogent/motif/__init__.py

cogent/motif/k_word.py

cogent/motif/util.py

cogent/parse/__init__.py

cogent/parse/aaindex.py

cogent/parse/agilent_microarray.py

cogent/parse/blast.py

cogent/parse/blast_xml.py

cogent/parse/bpseq.py

cogent/parse/carnac.py

cogent/parse/cigar.py

cogent/parse/clustal.py

cogent/parse/cmfinder.py

cogent/parse/column.py

cogent/parse/comrna.py

cogent/parse/consan.py

cogent/parse/contrafold.py

cogent/parse/cove.py

cogent/parse/ct.py

cogent/parse/cut.py

cogent/parse/cutg.py

cogent/parse/dialign.py

cogent/parse/dotur.py

cogent/parse/dynalign.py

cogent/parse/ebi.py

cogent/parse/fasta.py

cogent/parse/flowgram.py

cogent/parse/flowgram_collection.py

cogent/parse/flowgram_parser.py

cogent/parse/foldalign.py

cogent/parse/gbseq.py

cogent/parse/gcg.py

cogent/parse/genbank.py

cogent/parse/gff.py

cogent/parse/gibbs.py

cogent/parse/ilm.py

cogent/parse/infernal.py

cogent/parse/knetfold.py

cogent/parse/locuslink.py

cogent/parse/macsim.py

cogent/parse/mage.py

cogent/parse/meme.py

cogent/parse/mfold.py

cogent/parse/mothur.py

cogent/parse/msms.py

cogent/parse/ncbi_taxonomy.py

cogent/parse/newick.py

cogent/parse/nexus.py

cogent/parse/nupack.py

cogent/parse/paml.py

cogent/parse/paml_matrix.py

cogent/parse/pdb.py

cogent/parse/pfold.py

cogent/parse/phylip.py

cogent/parse/pknotsrg.py

cogent/parse/rdb.py

cogent/parse/record.py

cogent/parse/record_finder.py

cogent/parse/rfam.py

cogent/parse/rna_fold.py

cogent/parse/rna_plot.py

cogent/parse/rnaalifold.py

cogent/parse/rnaforester.py

cogent/parse/rnashapes.py

cogent/parse/rnaview.py

cogent/parse/sequence.py

cogent/parse/sfold.py

cogent/parse/sprinzl.py

cogent/parse/stride.py

cogent/parse/structure.py

cogent/parse/table.py

cogent/parse/tinyseq.py

cogent/parse/tree.py

cogent/parse/tree_xml.py

cogent/parse/unafold.py

cogent/parse/unigene.py

cogent/phylo/__init__.py

cogent/phylo/compatibility.py

cogent/phylo/consensus.py

cogent/phylo/distance.py

cogent/phylo/least_squares.py

cogent/phylo/maximum_likelihood.py

cogent/phylo/nj.py

cogent/phylo/tree_collection.py

cogent/phylo/tree_space.py

cogent/phylo/util.py

cogent/recalculation/__init__.py

cogent/recalculation/calculation.py

cogent/recalculation/definition.py

cogent/recalculation/scope.py

cogent/recalculation/setting.py

cogent/seqsim/__init__.py

cogent/seqsim/analysis.py

cogent/seqsim/birth_death.py

cogent/seqsim/markov.py

cogent/seqsim/microarray.py

cogent/seqsim/microarray_normalize.py

cogent/seqsim/randomization.py

cogent/seqsim/searchpath.py

cogent/seqsim/sequence_generators.py

cogent/seqsim/tree.py

cogent/seqsim/usage.py

cogent/struct/__init__.py

cogent/struct/_asa.c

cogent/struct/_asa.pyx

cogent/struct/_contact.c

cogent/struct/_contact.pyx

cogent/struct/annotation.py

cogent/struct/asa.py

cogent/struct/contact.py

cogent/struct/dihedral.py

cogent/struct/knots.py

cogent/struct/manipulation.py

cogent/struct/pairs_util.py

cogent/struct/rna2d.py

cogent/struct/selection.py

cogent/util/__init__.py

cogent/util/array.py

cogent/util/checkpointing.py

cogent/util/datatypes.py

cogent/util/dict2d.py

cogent/util/dict_array.py

cogent/util/misc.py

cogent/util/modules.py

cogent/util/organizer.py

cogent/util/parallel.py

cogent/util/recode_alignment.py

cogent/util/table.py

cogent/util/transform.py

cogent/util/trie.py

cogent/util/unit_test.py

cogent/util/update_version.py

cogent/util/warning.py

debian/changelog

debian/control

doc/conf.py

doc/cookbook/DNA_and_RNA_sequences.rst

doc/cookbook/accessing_databases.rst

doc/cookbook/alignments.rst

doc/cookbook/analysis_of_sequence_composition.rst

doc/cookbook/annotations.rst

doc/cookbook/blast.rst

doc/cookbook/building_alignments.rst

doc/cookbook/building_phylogenies.rst

doc/cookbook/community_analysis.rst

doc/cookbook/dealing_with_hts_data.rst

doc/cookbook/genetic_code.rst

doc/cookbook/hpc_environments.rst

doc/cookbook/index.rst

doc/cookbook/introduction.rst

doc/cookbook/manipulating_biological_data.rst

doc/cookbook/multivariate_data_analysis.rst

doc/cookbook/simple_trees.rst

doc/cookbook/standard_statistical_analyses.rst

doc/cookbook/structural_data.rst

doc/cookbook/tips_for_using_python.rst

doc/cookbook/useful_utilities.rst

doc/cookbook/using_likelihood_to_perform_evolutionary_analyses.rst

doc/data_file_links.rst

doc/examples/alignment_app_controllers.rst

doc/examples/application_controller_framework.rst

doc/examples/calculate_UPGMA_cluster.rst

doc/examples/calculate_neigbourjoining_tree.rst

doc/examples/calculate_pairwise_distances.rst

doc/examples/codon_models.rst

doc/examples/draw_dendrogram.rst

doc/examples/draw_dotplot.rst

doc/examples/empirical_protein_models.rst

doc/examples/estimate_startingpoint.rst

doc/examples/genetic_code_aa_index.rst

doc/examples/handling_3dstructures.rst

doc/examples/hmm_par_heterogeneity.rst

doc/examples/index.rst

doc/examples/maketree_from_proteinseqs.rst

doc/examples/neutral_test.rst

doc/examples/parametric_bootstrap.rst

doc/examples/perform_PCoA_analysis.rst

doc/examples/phylo_by_ls.rst

doc/examples/phylogeny_app_controllers.rst

doc/examples/query_ensembl.rst

doc/examples/query_ncbi.rst

doc/examples/rate_heterogeneity.rst

doc/examples/relative_rate.rst

doc/examples/reuse_results.rst

doc/examples/scope_model_params_on_trees.rst

doc/examples/simple.rst

doc/examples/testing_multi_loci.rst

doc/examples/unrestricted_nucleotide.rst

doc/index.rst

doc/install.rst

doc/templates/layout.html

include/array_interface.h

include/numerical_pyrex.pyx

setup.py

tests/__init__.py

tests/alltests.py

tests/benchmark.py

tests/benchmark_aligning.py

tests/test_align/__init__.py

tests/test_align/test_algorithm.py

tests/test_align/test_align.py

tests/test_align/test_weights/__init__.py

tests/test_align/test_weights/test_methods.py

tests/test_align/test_weights/test_util.py

tests/test_app/__init__.py

tests/test_app/test_blast.py

tests/test_app/test_carnac.py

tests/test_app/test_cd_hit.py

tests/test_app/test_clearcut.py

tests/test_app/test_clustalw.py

tests/test_app/test_cmfinder.py

tests/test_app/test_comrna.py

tests/test_app/test_consan.py

tests/test_app/test_contrafold.py

tests/test_app/test_cove.py

tests/test_app/test_dialign.py

tests/test_app/test_dotur.py

tests/test_app/test_dynalign.py

tests/test_app/test_fasttree.py

tests/test_app/test_fasttree_v1.py

tests/test_app/test_foldalign.py

tests/test_app/test_formatdb.py

tests/test_app/test_gctmpca.py

tests/test_app/test_ilm.py

tests/test_app/test_infernal.py

tests/test_app/test_knetfold.py

tests/test_app/test_mafft.py

tests/test_app/test_mfold.py

tests/test_app/test_mothur.py

tests/test_app/test_msms.py

tests/test_app/test_muscle.py

tests/test_app/test_nupack.py

tests/test_app/test_parameters.py

tests/test_app/test_pfold.py

tests/test_app/test_pknotsrg.py

tests/test_app/test_raxml.py

tests/test_app/test_rdp_classifier.py

tests/test_app/test_rnaalifold.py

tests/test_app/test_rnaforester.py

tests/test_app/test_rnaview.py

tests/test_app/test_sfffile.py

tests/test_app/test_sffinfo.py

tests/test_app/test_sfold.py

tests/test_app/test_stride.py

tests/test_app/test_uclust.py

tests/test_app/test_unafold.py

tests/test_app/test_util.py

tests/test_app/test_vienna_package.py

tests/test_cluster/__init__.py

tests/test_cluster/test_UPGMA.py

tests/test_cluster/test_goodness_of_fit.py

tests/test_cluster/test_metric_scaling.py

tests/test_cluster/test_nmds.py

tests/test_cluster/test_procrustes.py

tests/test_core/__init__.py

tests/test_core/test_alignment.py

tests/test_core/test_alphabet.py

tests/test_core/test_annotation.py

tests/test_core/test_bitvector.py

tests/test_core/test_core_standalone.py

tests/test_core/test_entity.py

tests/test_core/test_genetic_code.py

tests/test_core/test_info.py

tests/test_core/test_location.py

tests/test_core/test_maps.py

tests/test_core/test_moltype.py

tests/test_core/test_profile.py

tests/test_core/test_seq_aln_integration.py

tests/test_core/test_sequence.py

tests/test_core/test_tree.py

tests/test_core/test_usage.py

tests/test_data/__init__.py

tests/test_data/test_molecular_weight.py

tests/test_db/__init__.py

tests/test_db/test_ensembl/__init__.py

tests/test_db/test_ensembl/test_assembly.py

tests/test_db/test_ensembl/test_compara.py

tests/test_db/test_ensembl/test_database.py

tests/test_db/test_ensembl/test_feature_level.py

tests/test_db/test_ensembl/test_genome.py

tests/test_db/test_ensembl/test_host.py

tests/test_db/test_ensembl/test_species.py

tests/test_db/test_ncbi.py

tests/test_db/test_pdb.py

tests/test_db/test_rfam.py

tests/test_db/test_util.py

tests/test_draw.py

tests/test_draw/test_matplotlib/test_arrow_rates.py

tests/test_draw/test_matplotlib/test_codon_usage.py

tests/test_draw/test_matplotlib/test_dinuc.py

tests/test_draw/test_matplotlib/test_multivariate_plot.py

tests/test_evolve/__init__.py

tests/test_evolve/test_best_likelihood.py

tests/test_evolve/test_bootstrap.py

tests/test_evolve/test_coevolution.py

tests/test_evolve/test_likelihood_function.py

tests/test_evolve/test_models.py

tests/test_evolve/test_motifchange.py

tests/test_evolve/test_newq.py

tests/test_evolve/test_parameter_controller.py

tests/test_evolve/test_scale_rules.py

tests/test_evolve/test_simulation.py

tests/test_evolve/test_substitution_model.py

tests/test_format/__init__.py

tests/test_format/test_clustal.py

tests/test_format/test_fasta.py

tests/test_format/test_mage.py

tests/test_format/test_pdb_color.py

tests/test_format/test_stockholm.py

tests/test_format/test_xyzrn.py

tests/test_maths/__init__.py

tests/test_maths/test_distance_transform.py

tests/test_maths/test_function_optimisation.py

tests/test_maths/test_geometry.py

tests/test_maths/test_matrix/__init__.py

tests/test_maths/test_matrix/test_distance.py

tests/test_maths/test_matrix_logarithm.py

tests/test_maths/test_optimisers.py

tests/test_maths/test_spatial/__init__.py

tests/test_maths/test_spatial/test_ckd3.py

tests/test_maths/test_stats/__init__.py

tests/test_maths/test_stats/test_alpha_diversity.py

tests/test_maths/test_stats/test_cai/__init__.py

tests/test_maths/test_stats/test_cai/test_adaptor.py

tests/test_maths/test_stats/test_cai/test_get_by_cai.py

tests/test_maths/test_stats/test_cai/test_util.py

tests/test_maths/test_stats/test_distribution.py

tests/test_maths/test_stats/test_histogram.py

tests/test_maths/test_stats/test_ks.py

tests/test_maths/test_stats/test_rarefaction.py

tests/test_maths/test_stats/test_special.py

tests/test_maths/test_stats/test_test.py

tests/test_maths/test_stats/test_util.py

tests/test_maths/test_svd.py

tests/test_maths/test_unifrac/__init__.py

tests/test_maths/test_unifrac/test_fast_tree.py

tests/test_maths/test_unifrac/test_fast_unifrac.py

tests/test_motif/__init__.py

tests/test_motif/test_util.py

tests/test_parse/__init__.py

tests/test_parse/test_aaindex.py

tests/test_parse/test_agilent_microarray.py

tests/test_parse/test_blast.py

tests/test_parse/test_blast_xml.py

tests/test_parse/test_bpseq.py

tests/test_parse/test_cigar.py

tests/test_parse/test_clustal.py

tests/test_parse/test_column.py

tests/test_parse/test_comrna.py

tests/test_parse/test_consan.py

tests/test_parse/test_cove.py

tests/test_parse/test_ct.py

tests/test_parse/test_cut.py

tests/test_parse/test_cutg.py

tests/test_parse/test_dialign.py

tests/test_parse/test_dotur.py

tests/test_parse/test_ebi.py

tests/test_parse/test_fasta.py

tests/test_parse/test_flowgram.py

tests/test_parse/test_flowgram_collection.py

tests/test_parse/test_flowgram_parser.py

tests/test_parse/test_genbank.py

tests/test_parse/test_gff.py

tests/test_parse/test_gibbs.py

tests/test_parse/test_ilm.py

tests/test_parse/test_infernal.py

tests/test_parse/test_locuslink.py

tests/test_parse/test_mage.py

tests/test_parse/test_meme.py

tests/test_parse/test_msms.py

tests/test_parse/test_ncbi_taxonomy.py

tests/test_parse/test_nexus.py

tests/test_parse/test_nupack.py

tests/test_parse/test_phylip.py

tests/test_parse/test_pknotsrg.py

tests/test_parse/test_rdb.py

tests/test_parse/test_record.py

tests/test_parse/test_record_finder.py

tests/test_parse/test_rfam.py

tests/test_parse/test_rna_fold.py

tests/test_parse/test_rnaalifold.py

tests/test_parse/test_rnaforester.py

tests/test_parse/test_rnaview.py

tests/test_parse/test_sprinzl.py

tests/test_parse/test_stride.py

tests/test_parse/test_tree.py

tests/test_parse/test_unigene.py

tests/test_phylo.py

tests/test_recalculation.rst

tests/test_seqsim/__init__.py

tests/test_seqsim/test_analysis.py

tests/test_seqsim/test_birth_death.py

tests/test_seqsim/test_markov.py

tests/test_seqsim/test_microarray.py

tests/test_seqsim/test_microarray_normalize.py

tests/test_seqsim/test_randomization.py

tests/test_seqsim/test_searchpath.py

tests/test_seqsim/test_sequence_generators.py

tests/test_seqsim/test_tree.py

tests/test_seqsim/test_usage.py

tests/test_struct/__init__.py

tests/test_struct/test_annotation.py

tests/test_struct/test_asa.py

tests/test_struct/test_contact.py

tests/test_struct/test_dihedral.py

tests/test_struct/test_knots.py

tests/test_struct/test_manipulation.py

tests/test_struct/test_pairs_util.py

tests/test_struct/test_rna2d.py

tests/test_struct/test_selection.py

tests/test_util/__init__.py

tests/test_util/test_array.py

tests/test_util/test_dict2d.py

tests/test_util/test_misc.py

tests/test_util/test_organizer.py

tests/test_util/test_recode_alignment.py

tests/test_util/test_table.rst

tests/test_util/test_transform.py

tests/test_util/test_trie.py

tests/test_util/test_unit_test.py

tests/timetrial.py


doc/cookbook/using_likelihood_to_perform_evolutionary_analyses.rst

Canned models
-------------

*To be written.*

MotifChange and predicates
--------------------------

*To be written.*

Many standard evolutionary models come pre-defined in the ``cogent.evolve.models`` module.

The available nucleotide, codon and protein models are:

.. doctest::

    >>> from cogent.evolve import models
    >>> print models.nucleotide_models
    ['JC69', 'F81', 'HKY85', 'GTR']
    >>> print models.codon_models
    ['CNFGTR', 'CNFHKY', 'MG94HKY', 'MG94GTR', 'GY94', 'H04G', 'H04GK', 'H04GGK']
    >>> print models.protein_models
    ['DSO78', 'AH96', 'AH96_mtmammals', 'JTT92', 'WG01']

While those values are strings, a function of the same name exists within the module, so creating a substitution model only requires calling that function. I demonstrate this for a nucleotide model here.

.. doctest::

    >>> from cogent.evolve.models import F81
    >>> sub_mod = F81()

We'll be using these for the examples below.
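
For orientation, here is a standard-library sketch of the textbook F81 transition probabilities (Felsenstein 1981). The ``F81`` model is named after this model. The sketch is written independently of cogent and rests on three assumptions:

- ``f81_matrix`` is a hypothetical helper.
- Branch length ``t`` is measured in expected substitutions per site.
- The motif order copies the ``['T', 'C', 'A', 'G']`` order that cogent prints.

With equal base frequencies, the sketch reduces to JC69.

```python
import math

BASES = "TCAG"  # motif order as printed by cogent for nucleotide models


def f81_matrix(t, freqs):
    """Return the F81 transition matrix P(t) as a dict of dicts.

    P_ij(t) = exp(-beta*t) * [i == j] + (1 - exp(-beta*t)) * pi_j, where beta
    rescales the rate so that t is the expected number of substitutions per site.
    """
    beta = 1.0 / (1.0 - sum(p * p for p in freqs.values()))
    decay = math.exp(-beta * t)
    return {
        i: {j: (decay if i == j else 0.0) + (1.0 - decay) * freqs[j] for j in BASES}
        for i in BASES
    }


equal = {b: 0.25 for b in BASES}
p_jc = f81_matrix(0.1, equal)
# With equal frequencies this should match the JC69 form 1/4 + 3/4 * exp(-4t/3).
print(round(p_jc["A"]["A"], 6), round(0.25 + 0.75 * math.exp(-4 * 0.1 / 3), 6))
```

Because cogent's own rate scaling is not shown in this section, treat these numbers as illustrative rather than as what ``F81()`` computes internally.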

Rate heterogeneity models
-------------------------

*To be written.*

We illustrate this for the gamma-distributed case, using the canned models listed above. Rate heterogeneity variants of the canned models are created with optional arguments that are passed through to the substitution model class.

For nucleotide
^^^^^^^^^^^^^^

We specify a general time reversible nucleotide model with gamma distributed rate heterogeneity.

.. doctest::

    >>> from cogent.evolve.models import GTR
    >>> sub_mod = GTR(with_rate=True, distribution='gamma')
    >>> print sub_mod
    Nucleotide ( name = 'GTR'; type = 'None'; params = ['A/G', 'A/T', 'A/C', 'C/T', 'C/G']; number of motifs = 4; motifs = ['T', 'C', 'A', 'G'])

For codon
^^^^^^^^^

We specify a conditional nucleotide frequency codon model with nucleotide general time reversible parameters and a parameter for the ratio of nonsynonymous to synonymous substitutions (omega) with gamma distributed rate heterogeneity.

.. doctest::

    >>> from cogent.evolve.models import CNFGTR
    >>> sub_mod = CNFGTR(with_rate=True, distribution='gamma')
    >>> print sub_mod
    Codon ( name = 'CNFGTR'; type = 'None'; params = ['A/G', 'A/C', 'C/T', 'A/T', 'C/G', 'omega']; ...

For protein
^^^^^^^^^^^

We specify a Jones, Taylor and Thornton 1992 empirical protein substitution model with gamma distributed rate heterogeneity.

.. doctest::

    >>> from cogent.evolve.models import JTT92
    >>> sub_mod = JTT92(with_rate=True, distribution='gamma')
    >>> print sub_mod
    Empirical ( name = 'JTT92'; type = 'None'; number of motifs = 20; motifs = ['A', 'C'...

Specifying likelihood functions
===============================

Making a likelihood function
----------------------------

You start by specifying a substitution model and use that to construct a likelihood function for a specific tree.

.. doctest::

    >>> from cogent import LoadTree
    >>> from cogent.evolve.models import F81
    >>> sub_mod = F81()
    >>> tree = LoadTree(treestring='(a,b,(c,d))')
    >>> lf = sub_mod.makeLikelihoodFunction(tree)

Providing an alignment to a likelihood function
-----------------------------------------------

You need to load an alignment and then provide it to a likelihood function. I construct a very simple tree and alignment for this example.

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import F81
    >>> sub_mod = F81()
    >>> tree = LoadTree(treestring='(a,b,(c,d))')
    >>> lf = sub_mod.makeLikelihoodFunction(tree)
    >>> aln = LoadSeqs(data=[('a', 'ACGT'), ('b', 'AC-T'), ('c', 'ACGT'),
    ...                      ('d', 'AC-T')])
    >>> lf.setAlignment(aln)
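
To make concrete what a likelihood function computes once an alignment is attached, here is a minimal sketch of Felsenstein's pruning algorithm for the same tree and alignment. It makes three simplifying assumptions:

- The model is JC69, that is F81 with equal base frequencies.
- Every branch length is fixed at 1.0.
- Gaps are treated as missing data.

The helper names are hypothetical. cogent's own calculation is more general, and its handling of gaps and motif probabilities may differ.

```python
import math

BASES = "TCAG"


def jc69_prob(same, t):
    """JC69 probability of an unchanged (same=True) or a specific changed base."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def partial_likelihoods(node, column, t):
    """Conditional likelihood of each base at `node` given the tips below it.

    A node is either a tip name (str) or a tuple of child nodes.
    """
    if isinstance(node, str):
        char = column[node]
        if char in BASES:
            return {b: 1.0 if b == char else 0.0 for b in BASES}
        return {b: 1.0 for b in BASES}  # gap or ambiguity: treated as missing
    result = {b: 1.0 for b in BASES}
    for child in node:
        child_partials = partial_likelihoods(child, column, t)
        for parent_base in BASES:
            result[parent_base] *= sum(
                jc69_prob(parent_base == child_base, t) * child_partials[child_base]
                for child_base in BASES
            )
    return result


def log_likelihood(tree, alignment, t=1.0):
    """Sum of per-site log-likelihoods, assuming equal base frequencies."""
    names = list(alignment)
    total = 0.0
    for pos in range(len(alignment[names[0]])):
        column = {name: alignment[name][pos] for name in names}
        root = partial_likelihoods(tree, column, t)
        total += math.log(sum(0.25 * root[b] for b in BASES))
    return total


tree = ("a", "b", ("c", "d"))  # the tree '(a,b,(c,d))' as nested tuples
aln = {"a": "ACGT", "b": "AC-T", "c": "ACGT", "d": "AC-T"}
print(log_likelihood(tree, aln))
```

An all-gap column contributes a site likelihood of exactly 1. On very long branches, each site tends to the product of the base frequencies of its observed tips.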

Scoping parameters on trees
---------------------------

*To be written.*


For many evolutionary analyses, it's desirable to allow different branches on a tree to have different values of a parameter. We show this for a simple codon model case here where we want the great apes (the clade that includes human and orangutan) to have a different value of the ratio of nonsynonymous to synonymous substitutions. This parameter is identified in the precanned ``CNFGTR`` model as ``omega``.

.. doctest::

    >>> from cogent import LoadTree
    >>> from cogent.evolve.models import CNFGTR
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> print tree.asciiArt()
              /-Galago
             |
    -root----|--HowlerMon
             |
             |          /-Rhesus
              \edge.3--|
                       |          /-Orangutan
                        \edge.2--|
                                 |          /-Gorilla
                                  \edge.1--|
                                           |          /-Human
                                            \edge.0--|
                                                      \-Chimpanzee
    >>> sm = CNFGTR()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=2)
    >>> lf.setParamRule('omega', tip_names=['Human', 'Orangutan'],
    ...                 outgroup_name='Galago', is_clade=True, init=0.5)

We've set an *initial* value for this clade so that the edges affected by this rule are evident below.

.. doctest::

    >>> print lf
    Likelihood Function Table
    ====================================
     A/C     A/G     A/T     C/G     C/T
    ------------------------------------
    1.00    1.00    1.00    1.00    1.00
    ------------------------------------
    =======================================
          edge    parent    length    omega
    ---------------------------------------
        Galago      root      1.00     1.00
     HowlerMon      root      1.00     1.00
        Rhesus    edge.3      1.00     1.00
     Orangutan    edge.2      1.00     0.50
       Gorilla    edge.1      1.00     0.50
         Human    edge.0      1.00     0.50
    Chimpanzee    edge.0      1.00     0.50
        edge.0    edge.1      1.00     0.50
        edge.1    edge.2      1.00     0.50
        edge.2    edge.3      1.00     1.00
        edge.3      root      1.00     1.00
    ---------------------------------------...

A more extensive description of capabilities is in :ref:`scope-params-on-trees`.
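
The set of edges picked out by the rule above can be reproduced with a small sketch. It finds the most recent common ancestor (MRCA) of the named tips and collects every edge below it, excluding the stem edge that leads to the MRCA. The parent map is copied from the ``edge``/``parent`` columns of the table above.

The helpers are hypothetical and this is not cogent's implementation. In particular, it ignores ``outgroup_name``, which, as we understand it, cogent uses to decide which side of the tree counts as the clade.

```python
# Parent of each edge, copied from the edge/parent columns of the table above.
PARENT = {
    "Galago": "root", "HowlerMon": "root", "Rhesus": "edge.3",
    "Orangutan": "edge.2", "Gorilla": "edge.1", "Human": "edge.0",
    "Chimpanzee": "edge.0", "edge.0": "edge.1", "edge.1": "edge.2",
    "edge.2": "edge.3", "edge.3": "root",
}


def ancestors(name):
    """Nodes on the path from `name` (exclusive) up to the root (inclusive)."""
    path = []
    while name in PARENT:
        name = PARENT[name]
        path.append(name)
    return path


def clade_edges(tip_names):
    """Edges descending from the MRCA of `tip_names`, stem edge excluded."""
    shared = set(ancestors(tip_names[0]))
    for tip in tip_names[1:]:
        shared &= set(ancestors(tip))
    # The first shared node on the way up from any tip is the MRCA.
    mrca = next(node for node in ancestors(tip_names[0]) if node in shared)
    return sorted(edge for edge in PARENT if mrca in ancestors(edge))


print(clade_edges(["Human", "Orangutan"]))
```

The result matches the rows with ``omega`` set to 0.50 in the table above. ``edge.2`` itself, the stem of the clade, keeps the default value.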

Specifying parameter values
---------------------------

*To be written.*

.. constant, bounds, initial

Specifying a parameter as constant
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This means the parameter will not be modified during likelihood maximisation. We show this here by making the ``omega`` parameter constant at the value 1 -- essentially the condition of selective neutrality.

.. doctest::

    >>> from cogent import LoadTree
    >>> from cogent.evolve.models import CNFGTR
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> sm = CNFGTR()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=2)
    >>> lf.setParamRule('omega', is_const=True)
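
Conceptually, making a parameter constant removes it from the set of values the optimiser is allowed to vary. The sketch below shows that idea with a made-up two-parameter objective and a coarse grid search standing in for the optimiser. Both are hypothetical, chosen only to keep the example short; cogent's optimisers work differently.

```python
import itertools


def toy_neg_log_lik(params):
    """Hypothetical objective whose unconstrained minimum is kappa=4, omega=0.2."""
    return (params["kappa"] - 4.0) ** 2 + 10.0 * (params["omega"] - 0.2) ** 2


def grid_optimise(objective, params, constant, grid):
    """Minimise `objective` over a grid, varying only non-constant parameters."""
    free = [name for name in params if name not in constant]
    best = dict(params)
    best_value = objective(best)
    for values in itertools.product(grid, repeat=len(free)):
        trial = dict(params)  # constant parameters keep their starting values
        trial.update(zip(free, values))
        value = objective(trial)
        if value < best_value:
            best, best_value = trial, value
    return best


grid = [i / 10.0 for i in range(1, 101)]  # 0.1, 0.2, ..., 10.0
start = {"kappa": 1.0, "omega": 1.0}
unconstrained = grid_optimise(toy_neg_log_lik, start, constant=set(), grid=grid)
neutral = grid_optimise(toy_neg_log_lik, start, constant={"omega"}, grid=grid)
print(unconstrained, neutral)
```

With ``omega`` held at 1, only ``kappa`` moves, which mirrors the neutral-evolution condition described above.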

Providing a starting value for a parameter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This can improve performance: the closer the starting value is to the maximum likelihood estimate, the quicker optimisation will be.

.. doctest::

    >>> from cogent import LoadTree
    >>> from cogent.evolve.models import CNFGTR
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> sm = CNFGTR()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=2)
    >>> lf.setParamRule('omega', init=0.1)
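
The effect of the starting value can be seen with any iterative optimiser. The sketch below uses a simple compass (pattern) search as a stand-in; it is not Powell's method and not cogent's code. It runs on a hypothetical one-parameter objective with its minimum at 0.3 and counts function evaluations from a nearby start and from a distant one.

```python
def pattern_search(f, x, step=1.0, tol=1e-6):
    """Minimise f from x by compass search; return (x, number of evaluations)."""
    fx = f(x)
    evaluations = 1
    while step > tol:
        for candidate in (x + step, x - step):
            fc = f(candidate)
            evaluations += 1
            if fc < fx:
                x, fx = candidate, fc
                break
        else:
            step /= 2.0  # no improvement in either direction: refine the step
    return x, evaluations


def objective(omega):
    """Hypothetical objective with its minimum at omega = 0.3."""
    return (omega - 0.3) ** 2


near_x, near_evals = pattern_search(objective, 0.29)
far_x, far_evals = pattern_search(objective, 100.0)
print(near_evals, far_evals)
```

Both runs reach the same answer, but the distant start spends many extra evaluations just walking towards the optimum.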

Setting bounds for optimising a function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This can be useful for stopping optimisers from getting stuck in a bad part of parameter space.

.. doctest::

    >>> from cogent import LoadTree
    >>> from cogent.evolve.models import CNFGTR
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> sm = CNFGTR()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=2)
    >>> lf.setParamRule('omega', init=0.1, lower=1e-9, upper=20.0)

If you set bounds, it's a very good idea to set the starting value too, so that you can be sure it lies within those bounds. The default value for substitution model parameters such as the exchangeability terms is 1.0. So if you set an upper bound of 0.5 while the starting value is 1.0, you'll get an error (shown below) when you try to optimise the likelihood.

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import CNFGTR
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> sm = CNFGTR()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=2)
    >>> lf.setParamRule('omega', upper=0.5, init=1.0)
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> lf.setAlignment(aln)
    >>> lf.optimise()
    Traceback (most recent call last):
    ValueError: Initial parameter values must be valid ...
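
The failure above comes from a consistency check between the starting value and the bounds. Here is a minimal sketch of such a check, written as a hypothetical helper. ``check_bounds`` is not a cogent function, and only the opening words of its error message are taken from the traceback above.

```python
def check_bounds(name, init, lower=None, upper=None):
    """Raise ValueError unless lower <= init <= upper (None means unbounded)."""
    if lower is not None and upper is not None and lower > upper:
        raise ValueError("%s: lower bound %g exceeds upper bound %g" % (name, lower, upper))
    if (lower is not None and init < lower) or (upper is not None and init > upper):
        raise ValueError(
            "Initial parameter values must be valid: %s=%g not in [%s, %s]"
            % (name, init, lower, upper)
        )
    return init


accepted = check_bounds("omega", init=0.1, lower=1e-9, upper=20.0)

message = None
try:
    check_bounds("omega", init=1.0, upper=0.5)  # default-like start above the bound
except ValueError as err:
    message = str(err)
print(accepted, message)
```

Setting ``init`` explicitly alongside ``lower`` and ``upper`` is the simplest way to keep a check like this from firing.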

Specifying rate heterogeneity functions
---------------------------------------

*To be written.*

We extend the simple gamma-distributed rate heterogeneity case for nucleotides from above to constructing the actual likelihood function. We do this for 4 bins and constrain the bin probabilities to be equal.

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import GTR
    >>> sm = GTR(with_rate=True, distribution='gamma')
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> lf = sm.makeLikelihoodFunction(tree, bins=4, digits=2)
    >>> lf.setParamRule('bprobs', is_const=True)

For more detailed discussion of defining and using these models see :ref:`rate-heterogeneity`.
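
For readers curious about what the ``bins`` argument discretises, here is a standard-library sketch of the equal-probability discrete gamma approximation of Yang (1994). The gamma distribution with shape ``alpha`` and mean 1 is cut into ``bins`` categories, each with probability 1/bins, and each category is represented by its mean rate. That cogent uses exactly this per-category-mean scheme is an assumption here, and the function names are hypothetical.

```python
import math


def reg_lower_gamma(a, x):
    """Regularised lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def gamma_quantile(p, alpha):
    """Quantile of Gamma(shape=alpha, rate=alpha), i.e. mean 1, by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(alpha, alpha * hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(alpha, alpha * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, bins):
    """Mean rate within each of `bins` equal-probability gamma categories."""
    cuts = [gamma_quantile(i / bins, alpha) for i in range(1, bins)]
    # Mean of a category is bins * [P(alpha+1, alpha*upper) - P(alpha+1, alpha*lower)].
    cdf = [0.0] + [reg_lower_gamma(alpha + 1, alpha * c) for c in cuts] + [1.0]
    return [bins * (cdf[i + 1] - cdf[i]) for i in range(bins)]


rates = discrete_gamma_rates(alpha=0.5, bins=4)
print([round(r, 4) for r in rates])
```

Because every bin has probability 1/4, the category rates average to exactly 1, so branch lengths keep their meaning as expected substitutions per site.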

Specifying Phylo-HMMs
---------------------


.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import GTR
    >>> sm = GTR(with_rate=True, distribution='gamma')
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> lf = sm.makeLikelihoodFunction(tree, bins=4, sites_independent=False,
    ...                                digits=2)
    >>> lf.setParamRule('bprobs', is_const=True)


For more detailed discussion of defining and using these models see :ref:`rate-heterogeneity-hmm`.
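
With ``sites_independent=False`` the rate categories become hidden states of a hidden Markov model, so the category at one site can depend on the category at the previous site. The sketch below shows the scaled forward algorithm that combines per-site, per-category likelihoods under such a model. It is a generic illustration in plain Python, not PyCogent's implementation. The inputs ``site_likelihoods``, ``initial`` and ``transition`` are hypothetical stand-ins for values a phylogenetic likelihood calculation would supply.

```python
import math


def hmm_log_likelihood(site_likelihoods, initial, transition):
    """Scaled forward algorithm over hidden rate categories.

    site_likelihoods[s][k]: likelihood of alignment column s under category k
    initial[k]: probability that the first column is in category k
    transition[j][k]: probability of moving from category j to category k
    """
    n_cat = len(initial)
    forward = [initial[k] * site_likelihoods[0][k] for k in range(n_cat)]
    scale = sum(forward)
    log_lik = math.log(scale)
    forward = [f / scale for f in forward]
    for column in site_likelihoods[1:]:
        forward = [sum(forward[j] * transition[j][k] for j in range(n_cat)) * column[k]
                   for k in range(n_cat)]
        scale = sum(forward)  # rescale at every step to avoid underflow
        log_lik += math.log(scale)
        forward = [f / scale for f in forward]
    return log_lik


# Made-up per-category likelihoods for three columns and two rate categories
liks = [[0.02, 0.001], [0.015, 0.002], [0.0005, 0.01]]
sticky = [[0.9, 0.1], [0.1, 0.9]]  # neighbouring sites tend to share a category
print(hmm_log_likelihood(liks, [0.5, 0.5], sticky))
```

When every row of ``transition`` equals ``initial``, the result reduces to the ordinary independent-sites mixture likelihood, which makes a handy sanity check.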

Fitting likelihood functions
============================

Choice of optimisers
--------------------


There are two types of optimiser: simulated annealing, a *global* optimiser, and Powell, a *local* optimiser. Simulated annealing is slow compared to Powell, and in general Powell is an adequate choice. We set up a simple nucleotide model to illustrate these.

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import F81
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> sm = F81()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=3, space=2)
    >>> lf.setAlignment(aln)

The default is to use the simulated annealing optimiser followed by Powell.

.. doctest::

    >>> lf.optimise(show_progress=False)

We can specify using just the local optimiser. In that case it's recommended to set the ``max_restarts`` argument, since this lets Powell restart the optimisation from slightly different starting points, which can help in escaping local maxima.

.. doctest::

    >>> lf.optimise(local=True, max_restarts=5, show_progress=False)

We might instead want to do crude simulated annealing followed by a more rigorous Powell optimisation.

.. doctest::

    >>> lf.optimise(show_progress=False, global_tolerance=1.0, tolerance=1e-8,
    ...             max_restarts=5)
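
The trade-off between the two kinds of optimiser can be seen on a toy problem. The plain-Python sketch below is not PyCogent's optimiser code. It runs a basic simulated annealing search and a simple local (pattern) search on a one-dimensional function with many local minima. It minimises rather than maximises, which corresponds to maximising a likelihood. The function and all tuning constants are assumptions chosen for illustration. Refining the annealing result with the local search mirrors the "global, then local" default described above.

```python
import math
import random


def objective(x):
    """1-D Rastrigin function: global minimum 0 at x = 0, local minima near integers."""
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))


def local_search(f, x0, step=0.1, tol=1e-8):
    """Greedy pattern search: accept an improving neighbour, otherwise halve the step."""
    x, fx = x0, f(x0)
    while step > tol:
        for cand in (x - step, x + step):
            fc = f(cand)
            if fc < fx:
                x, fx = cand, fc
                break
        else:
            step /= 2.0
    return x, fx


def simulated_annealing(f, x0, temp=10.0, cooling=0.95, moves_per_temp=50,
                        min_temp=1e-3, step=1.0, seed=1):
    """Accept worse moves with probability exp(-delta / temp) while slowly cooling."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    while temp > min_temp:
        for _ in range(moves_per_temp):
            cand = x + rng.uniform(-step, step)
            fc = f(cand)
            if fc <= fx or rng.random() < math.exp((fx - fc) / temp):
                x, fx = cand, fc
                if fx < best_f:
                    best_x, best_f = x, fx
        temp *= cooling
    return best_x, best_f


start = 3.3
local_x, local_f = local_search(objective, start)      # stops in the nearest local minimum
sa_x, sa_f = simulated_annealing(objective, start)      # explores more widely
refined_x, refined_f = local_search(objective, sa_x)    # global, then local
print('local only: x=%.3f f=%.3f' % (local_x, local_f))
print('annealing + local: x=%.3f f=%.3f' % (refined_x, refined_f))
```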

Checkpointing runs
------------------


See :ref:`checkpointing-optimisation`.

How to check your optimisation was successful
----------------------------------------------


.. Try again, use global optimisation, check maximum numbers of calculations not exceeded.

There is no guarantee that an optimised function has achieved a global maximum. We can, however, check that a maximum was reached by confirming that the optimiser stopped because the specified tolerance condition was met, rather than because it exceeded the maximum number of evaluations. That limit exists to ensure optimisation doesn't proceed endlessly. If the optimiser exited because the limit was exceeded, you can be sure that the function **has not** been successfully optimised.

To take this approach we first need to specify a maximum, and second we need to get back the actual calculator object, as this records how many evaluations it has done. We set a very small maximum so that the optimiser exits too early.

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import F81
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> sm = F81()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=3, space=2)
    >>> lf.setAlignment(aln)
    >>> max_evals = 10
    >>> calculator = lf.optimise(show_progress=False,
    ...                          max_evaluations=max_evals, return_calculator=True)
    ...
    FORCED EXIT from SimulatedAnnealing:
    Too many function evaluations, results are likely to be poor.
    You can increase max_evaluations or decrease tolerance...
    >>> if calculator.evaluations > max_evals:
    ...     print 'Failed to optimise'
    Failed to optimise
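
The same idea can be shown without PyCogent. The approach is to count objective evaluations and record whether the optimiser stopped because its tolerance was met or because its evaluation budget ran out. ``CountingFunction`` and ``minimise`` below are hypothetical names written for this sketch, not PyCogent APIs.

```python
class CountingFunction(object):
    """Wrap an objective function and count how often it is evaluated."""

    def __init__(self, func):
        self.func = func
        self.evaluations = 0

    def __call__(self, x):
        self.evaluations += 1
        return self.func(x)


def minimise(func, x0, max_evaluations, step=0.5, tolerance=1e-6):
    """Simple 1-D pattern search; returns (x, converged, evaluations)."""
    f = CountingFunction(func)
    x, fx = x0, f(x0)
    while step > tolerance:
        if f.evaluations >= max_evaluations:
            break  # forced exit: budget used up before the tolerance was met
        improved = False
        for candidate in (x - step, x + step):
            fc = f(candidate)
            if fc < fx:
                x, fx, improved = candidate, fc, True
                break
        if not improved:
            step /= 2.0  # no better neighbour, so shrink the step
    converged = step <= tolerance
    return x, converged, f.evaluations


def objective(x):
    return (x - 2.0) ** 2


x, converged, n = minimise(objective, 0.0, max_evaluations=10)
if not converged:
    print('Failed to optimise after %d evaluations' % n)
```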

Getting statistics out of likelihood functions
==============================================


.. the annotated tree, the tables, getParamValue

Model fit statistics
--------------------

Log likelihood and number of free parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import GTR
    >>> sm = GTR()
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> lf = sm.makeLikelihoodFunction(tree)
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> lf.setAlignment(aln)

We get the log-likelihood and the number of free parameters.

.. doctest::

    >>> lnL = lf.getLogLikelihood()
    >>> print lnL
    -24601.9...
    >>> nfp = lf.getNumFreeParams()
    >>> print nfp
    16

.. warning:: The number of free parameters (nfp) refers only to the number of parameters that were modifiable by the optimiser. Typically, the degrees of freedom of a likelihood ratio test statistic are computed as the difference in nfp between models. This will not be correct for models in which boundary conditions exist (for example, rate heterogeneity models where a parameter value boundary is set between bins).

Information theoretic measures
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Akaike Information Criterion
""""""""""""""""""""""""""""

.. note:: This measure only makes sense when the model has been optimised, a step we skip here in the interests of speed.

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import GTR
    >>> sm = GTR()
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> lf = sm.makeLikelihoodFunction(tree)
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> lf.setAlignment(aln)
    >>> AIC = lf.getAic()
    >>> AIC
    49235.869...

We can also get the second-order AIC.

.. doctest::

    >>> AICc = lf.getAic(second_order=True)
    >>> AICc
    49236.064...

Bayesian Information Criterion
""""""""""""""""""""""""""""""

.. note:: This measure only makes sense when the model has been optimised, a step we skip here in the interests of speed.

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import GTR
    >>> sm = GTR()
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> lf = sm.makeLikelihoodFunction(tree)
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> lf.setAlignment(aln)
    >>> BIC = lf.getBic()
    >>> BIC
    49330.9475...
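
For reference, the standard definitions are AIC = -2 lnL + 2k, AICc = AIC + 2k(k + 1)/(n - k - 1) and BIC = -2 lnL + k ln(n). Here k is the number of free parameters and n is the sample size. The values printed above appear consistent with these formulas for k = 16 and n = 2814 (the alignment length). Treat that as an inference, not as documentation of how ``getAic`` and ``getBic`` are implemented. The functions below are a plain-Python sketch, not PyCogent code.

```python
import math


def aic(lnL, nfp):
    """Akaike Information Criterion."""
    return -2.0 * lnL + 2.0 * nfp


def aicc(lnL, nfp, n):
    """Second-order (small-sample corrected) AIC; n is the sample size."""
    return aic(lnL, nfp) + 2.0 * nfp * (nfp + 1) / (n - nfp - 1)


def bic(lnL, nfp, n):
    """Bayesian Information Criterion; n is the sample size."""
    return -2.0 * lnL + nfp * math.log(n)


# The differences below do not depend on lnL, so any value works here.
lnL = -1000.0
print(aicc(lnL, 16, 2814) - aic(lnL, 16))  # about 0.1945
print(bic(lnL, 16, 2814) - aic(lnL, 16))   # about 95.08
```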

Getting maximum likelihood estimates
------------------------------------

We fit the model defined in the previous section and use it in the following examples.

One at a time
^^^^^^^^^^^^^

We get the statistics out individually: the ``length`` of the Human edge and the exchangeability parameter ``A/G``.

.. doctest::

    >>> lf.optimise(local=True, show_progress=False)
    >>> a_g = lf.getParamValue('A/G')
    >>> print a_g
    5.25...
    >>> human = lf.getParamValue('length', 'Human')
    >>> print human
    0.006...

Just the motif probabilities
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. doctest::

    >>> mprobs = lf.getMotifProbs()
    >>> print mprobs
    ====================================
            T        C        A        G
    ------------------------------------
       0.2406   0.1742   0.3757   0.2095
    ------------------------------------

On the tree object
^^^^^^^^^^^^^^^^^^

The annotated tree carries the model parameter estimates on its nodes, so if it is written to file in XML format, those parameters are saved too. This can be useful for later plotting or for recreating likelihood functions.

.. doctest::

    >>> annot_tree = lf.getAnnotatedTree()
    >>> print annot_tree.getXML()
    <?xml version="1.0"?>
    <clade>
    <clade>
    <name>Galago</name>
    <param><name>length</name><value>0.173113656134</value></param>...

.. warning:: This method fails for some rate-heterogeneity models.

As tables
^^^^^^^^^

.. doctest::

    >>> tables = lf.getStatistics(with_motif_probs=True, with_titles=True)
    >>> for table in tables:
    ...     if 'global' in table.Title:
    ...         print table
    global params
    ==============================================
           A/C      A/G      A/T      C/G      C/T
    ----------------------------------------------
        1.2316   5.2534   0.9585   2.3158   5.9700
    ----------------------------------------------

Testing hypotheses
==================


.. LRTs, assuming chisq, bootstrapping, randomisation

Using likelihood ratio tests
----------------------------

We test the molecular clock hypothesis for the human and chimpanzee lineages. Under the null hypothesis these two branches are constrained to have equal length.

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import F81
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> sm = F81()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=3, space=2)
    >>> lf.setAlignment(aln)
    >>> lf.setParamRule('length', tip_names=['Human', 'Chimpanzee'],
    ...     outgroup_name='Galago', is_clade=True, is_independent=False)
    ...
    >>> lf.setName('Null Hypothesis')
    >>> lf.optimise(local=True, show_progress=False)
    >>> null_lnL = lf.getLogLikelihood()
    >>> null_nfp = lf.getNumFreeParams()
    >>> print lf
    Null Hypothesis
    ==========================
          edge  parent  length
    --------------------------
        Galago    root   0.167
     HowlerMon    root   0.044
        Rhesus  edge.3   0.021
     Orangutan  edge.2   0.008
       Gorilla  edge.1   0.002
         Human  edge.0   0.004
    Chimpanzee  edge.0   0.004
        edge.0  edge.1   0.000...

The alternative hypothesis allows the human and chimpanzee branches to differ. We obtain it simply by making all branch lengths independent.

.. doctest::

    >>> lf.setParamRule('length', is_independent=True)
    >>> lf.setName('Alt Hypothesis')
    >>> lf.optimise(local=True, show_progress=False)
    >>> alt_lnL = lf.getLogLikelihood()
    >>> alt_nfp = lf.getNumFreeParams()
    >>> print lf
    Alt Hypothesis
    ==========================
          edge  parent  length
    --------------------------
        Galago    root   0.167
     HowlerMon    root   0.044
        Rhesus  edge.3   0.021
     Orangutan  edge.2   0.008
       Gorilla  edge.1   0.002
         Human  edge.0   0.006
    Chimpanzee  edge.0   0.003
        edge.0  edge.1   0.000...

We import the function for computing the probability of a chi-square test statistic. We then compute the likelihood ratio test statistic, the degrees of freedom and the corresponding probability.

.. doctest::

    >>> from cogent.maths.stats import chisqprob
    >>> LR = 2 * (alt_lnL - null_lnL) # the likelihood ratio statistic
    >>> df = (alt_nfp - null_nfp) # the test degrees of freedom
    >>> p = chisqprob(LR, df)
    >>> print 'LR=%.4f ; df = %d ; p=%.4f' % (LR, df, p)
    LR=3.3294 ; df = 1 ; p=0.0681
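
``chisqprob`` returns the upper-tail probability of the chi-square distribution. As a cross-check, the sketch below computes the same quantity from scratch in plain Python using the regularised incomplete gamma function. The function names are hypothetical, and this is not necessarily how ``cogent.maths.stats`` implements ``chisqprob``. For LR = 3.3294 with one degree of freedom the result is close to the p = 0.0681 reported above.

```python
import math


def lower_regularised_gamma(a, x):
    """P(a, x) from its power series (adequate for moderate x)."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chisq_upper_tail(statistic, df):
    """P(X >= statistic) where X follows a chi-square distribution with df degrees of freedom."""
    return 1.0 - lower_regularised_gamma(df / 2.0, statistic / 2.0)


LR = 3.3294  # likelihood ratio statistic from the example above
print('p=%.4f' % chisq_upper_tail(LR, 1))
```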

By parametric bootstrapping
---------------------------

If we can't rely on the asymptotic behaviour of the LRT, e.g. because the alignment is short, we can use a parametric bootstrap. Convenience functions for this are described in more detail in :ref:`parametric-bootstrap`.

In general, however, this capability derives from the ability of any defined ``evolve`` likelihood function to simulate an alignment. This is provided by the ``simulateAlignment`` method of likelihood function objects.

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import F81
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> sm = F81()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=3, space=2)
    >>> lf.setAlignment(aln)
    >>> lf.setParamRule('length', tip_names=['Human', 'Chimpanzee'],
    ...     outgroup_name='Galago', is_clade=True, is_independent=False)
    ...
    >>> lf.setName('Null Hypothesis')
    >>> lf.optimise(local=True, show_progress=False)
    >>> sim_aln = lf.simulateAlignment()
    >>> print repr(sim_aln)
    7 x 2814 dna alignment: Gorilla...
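
To turn simulated alignments into a test, one would typically do the following. Fit both hypotheses to each alignment simulated under the null, collect the resulting likelihood ratio statistics, and count how often they are at least as large as the observed statistic. The sketch below shows only that final counting step in plain Python. ``bootstrap_p_value`` is a hypothetical helper, and the simulated statistics in the example are made-up numbers, not results.

```python
def bootstrap_p_value(observed, simulated):
    """Proportion of null-simulated statistics that are >= the observed statistic."""
    if not simulated:
        raise ValueError('need at least one simulated statistic')
    n_extreme = sum(1 for stat in simulated if stat >= observed)
    return float(n_extreme) / len(simulated)


# Made-up statistics standing in for LRs from alignments simulated under the null
simulated_lrs = [0.2, 1.1, 3.9, 0.05, 2.7, 4.4, 0.8, 0.0, 1.6, 3.5]
print(bootstrap_p_value(3.3294, simulated_lrs))
```

Some analyses use (count + 1) / (N + 1) instead. This avoids reporting p = 0 from a finite number of replicates.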

Determining confidence intervals on MLEs
========================================


The profile method is used to calculate a confidence interval for a named parameter. We show it here for a global substitution model exchangeability parameter (*kappa*, the ratio of transition to transversion rates) and for an edge specific parameter (just the human branch length).

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import HKY85
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> sm = HKY85()
    >>> lf = sm.makeLikelihoodFunction(tree)
    >>> lf.setAlignment(aln)
    >>> lf.optimise(local=True, show_progress=False)
    >>> kappa_lo, kappa_mle, kappa_hi = lf.getParamInterval('kappa')
    >>> print "lo=%.2f ; mle=%.2f ; hi = %.2f" % (kappa_lo, kappa_mle, kappa_hi)
    lo=3.78 ; mle=4.44 ; hi = 5.22
    >>> human_lo, human_mle, human_hi = lf.getParamInterval('length', 'Human')
    >>> print "lo=%.2f ; mle=%.2f ; hi = %.2f" % (human_lo, human_mle, human_hi)
    lo=0.00 ; mle=0.01 ; hi = 0.01
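
A profile likelihood interval is found by moving a parameter away from its MLE until the log-likelihood has dropped by a chosen amount. At each trial value, the other parameters are re-optimised. A drop of about 1.92 log-likelihood units (half the 95% point of a chi-square with one degree of freedom) gives an approximate 95% interval. The sketch below finds the two crossing points by bisection for a one-parameter log-likelihood, so no re-optimisation is needed. ``profile_interval`` and its ``dropoff`` default are assumptions for illustration. They are not the signature or defaults of ``getParamInterval``.

```python
import math


def profile_interval(loglik, mle, lower, upper, dropoff=1.92, tol=1e-10):
    """Return (lo, mle, hi) where loglik has fallen by dropoff on each side of mle.

    loglik(lower) and loglik(upper) must both lie below loglik(mle) - dropoff.
    """
    target = loglik(mle) - dropoff

    def crossing(a, b):
        # bisection on g(x) = loglik(x) - target, which changes sign in [a, b]
        ga = loglik(a) - target
        while b - a > tol:
            mid = 0.5 * (a + b)
            gm = loglik(mid) - target
            if (gm > 0) == (ga > 0):
                a, ga = mid, gm
            else:
                b = mid
        return 0.5 * (a + b)

    return crossing(lower, mle), mle, crossing(mle, upper)


def quadratic_loglik(x):
    """Log-likelihood that is quadratic around an MLE of 3.0 with standard error 0.5."""
    return -(x - 3.0) ** 2 / (2 * 0.5 ** 2)


lo, mle, hi = profile_interval(quadratic_loglik, 3.0, 0.0, 6.0)
print("lo=%.2f ; mle=%.2f ; hi = %.2f" % (lo, mle, hi))
```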

Saving results
==============


Use either the annotated tree or statistics tables to obtain objects that can easily be written to file.

Visualising statistics on trees
===============================


We look at the distribution of ``omega`` from the CNF codon model family across different primate lineages. We allow each edge to have an independent value for ``omega``.

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import CNFGTR
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> sm = CNFGTR()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=2, space=2)
    >>> lf.setParamRule('omega', is_independent=True, upper=10.0)
    >>> lf.setAlignment(aln)
    >>> lf.optimise(show_progress=False, local=True)
    >>> print lf
    Likelihood Function Table
    ============================
     A/C   A/G   A/T   C/G   C/T
    ----------------------------
    1.07  3.88  0.79  1.96  4.09
    ----------------------------
    =================================
          edge  parent  length  omega
    ---------------------------------
        Galago    root    0.53   0.85
     HowlerMon    root    0.14   0.71
        Rhesus  edge.3    0.07   0.58
     Orangutan  edge.2    0.02   0.49
       Gorilla  edge.1    0.01   0.43
         Human  edge.0    0.02   2.44
    Chimpanzee  edge.0    0.01   2.28
        edge.0  edge.1    0.00   1.04
        edge.1  edge.2    0.01   0.55
        edge.2  edge.3    0.04   0.33
        edge.3    root    0.02   1.10...

We need an annotated tree object to do the drawing. We also write it out to an XML formatted file so it can be reloaded for later reuse.

.. doctest::

    >>> annot_tree = lf.getAnnotatedTree()
    >>> annot_tree.writeToFile('result_tree.xml')

We first create a dendrogram from the annotated tree and then save a heat-mapped image to file. In the image, edges are coloured red according to the magnitude of ``omega``, with maximal saturation when ``omega=1``.

.. doctest::

    >>> from cogent.draw.dendrogram import ContemporaneousDendrogram
    >>> dend = ContemporaneousDendrogram(annot_tree)
    >>> fig = dend.makeFigure(height=6, width=6, shade_param='omega',
    ...                       max_value=1.0, stroke_width=2)
    >>> fig.savefig('omega_heat_map.png')
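
The colour scale can be thought of as a linear mapping, capped at ``max_value``, from the parameter value to colour saturation. The sketch below is a hypothetical illustration of such a mapping, running from white at 0 to pure red at or above ``max_value``. It is an assumption for explanation only, not the code ``ContemporaneousDendrogram`` uses.

```python
def red_shade(value, max_value=1.0):
    """Return an (r, g, b) tuple in [0, 1]: white at 0, pure red at max_value or above."""
    fraction = min(max(float(value) / max_value, 0.0), 1.0)
    return (1.0, 1.0 - fraction, 1.0 - fraction)


# omega estimates for a few edges, taken from the table above
for edge, omega in [('Galago', 0.85), ('Gorilla', 0.43), ('Human', 2.44)]:
    print('%s %s' % (edge, red_shade(omega)))
```

Because values above ``max_value`` are clamped in this sketch, edges with ``omega`` greater than 1 (Human and Chimpanzee in the table above) would all appear equally saturated.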

Reconstructing ancestral sequences
==================================


.. most likely ancestors, the complete posterior probabilities

We first fit a likelihood function.

.. doctest::

    >>> from cogent import LoadTree, LoadSeqs
    >>> from cogent.evolve.models import F81
    >>> tree = LoadTree('data/primate_brca1.tree')
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> sm = F81()
    >>> lf = sm.makeLikelihoodFunction(tree, digits=3, space=2)
    >>> lf.setAlignment(aln)
    >>> lf.optimise(show_progress=False, local=True)

We then get the most likely ancestral sequences.

.. doctest::

    >>> ancestors = lf.likelyAncestralSeqs()
    >>> print ancestors
    >root
    TGTGGCACAAATACTCATGCCAGCTCATTACAGCA...

Or we can get the posterior probabilities (returned as a ``DictArray``) of sequence states at each node.

.. doctest::

    >>> ancestral_probs = lf.reconstructAncestralSeqs()
    >>> print ancestral_probs['root']
    ============================================
                 T         C         A         G
    --------------------------------------------
    0       0.1816    0.0000    0.0000    0.0000
    1       0.0000    0.0000    0.0000    0.1561
    2       0.1816    0.0000    0.0000    0.0000
    3       0.0000    0.0000    0.0000    0.1561...
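
The most likely state at each position is the state with the largest posterior probability in that row. The sketch below applies that rule in plain Python to the first four rows printed above. ``most_likely_states`` is a hypothetical helper. Representing the rows as plain lists is also an assumption, since ``DictArray`` objects have their own interface. The result, ``TGTG``, matches the start of the root sequence returned by ``likelyAncestralSeqs``.

```python
def most_likely_states(rows, alphabet='TCAG'):
    """Pick the highest-probability state in each row of posterior probabilities."""
    seq = []
    for row in rows:
        best = max(range(len(alphabet)), key=lambda i: row[i])
        seq.append(alphabet[best])
    return ''.join(seq)


# First four rows of ancestral_probs['root'] shown above (columns T, C, A, G)
root_rows = [
    [0.1816, 0.0000, 0.0000, 0.0000],
    [0.0000, 0.0000, 0.0000, 0.1561],
    [0.1816, 0.0000, 0.0000, 0.0000],
    [0.0000, 0.0000, 0.0000, 0.1561],
]
print(most_likely_states(root_rows))  # prints TGTG
```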

Tips for improved performance
=============================


Sequentially build the fitting
------------------------------


.. start with null, then modify lf to alternate. Don't forget to record the values you need.


.. how to specify the alt so it is the null for rate heterogeneity models

There's nothing that improves performance quite like being close to the maximum likelihood values, so using the ``setParamRule`` method to provide good starting values can be very useful. Because good starting values can be difficult to find, one easy approach is to build simpler models that are nested within the one you're interested in. Fitting those models and then relaxing constraints until you reach the parameterisation of interest can markedly improve optimisation speed.

Being able to save results to file allows you to do this between sessions.

Sampling
--------


.. using a subset of data

If you're dealing with a very large alignment, another approach is to fit the model to a subset of the alignment first, then fit the entire alignment. Alignment objects have a ``sample`` method to facilitate this approach. The following samples 99 codons without replacement.

.. doctest::

    >>> from cogent import LoadSeqs
    >>> aln = LoadSeqs('data/primate_brca1.fasta')
    >>> smpl = aln.sample(n=99, with_replacement=False, motif_length=3)
    >>> len(smpl)
    297
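
Sampling whole codons means choosing codon positions and keeping all three columns of each, which is why 99 codons give an alignment of length 297. The sketch below illustrates that bookkeeping in plain Python. ``sample_codon_columns`` is hypothetical and is not how ``Alignment.sample`` is implemented. In particular, it makes no claim about the order in which sampled columns are returned.

```python
import random


def sample_codon_columns(length, n, seed=None):
    """Choose n distinct codons from an alignment of the given length and
    return the indices of all their columns (three per codon)."""
    rng = random.Random(seed)
    n_codons = length // 3
    chosen = rng.sample(range(n_codons), n)  # without replacement
    columns = []
    for codon in chosen:
        columns.extend([3 * codon, 3 * codon + 1, 3 * codon + 2])
    return columns


cols = sample_codon_columns(2814, 99, seed=1)
print(len(cols))  # prints 297
```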

The following instead samples 99 nucleotides without replacement.

.. doctest::

    >>> smpl = aln.sample(n=99, with_replacement=False)
    >>> len(smpl)
    99

.. following cleans up files

.. doctest::
    :hide:

    >>> from cogent.util.misc import remove_files
    >>> remove_files(['result_tree.xml', 'omega_heat_map.png'],
    ...              error_on_missing=False)
