~ubuntu-branches/ubuntu/natty/python-cogent/natty

Viewing changes to doc/cookbook/multivariate_data_analysis.rst

Committer: Bazaar Package Importer
Author(s): Steffen Moeller
Date: 2010-12-04 22:30:35 UTC
mfrom: (1.1.1 upstream)
Revision ID: james.westby@ubuntu.com-20101204223035-j11kinhcrrdgg2p2

Tags: 1.5-1

* Bumped standard to 3.9.1, no changes required.
* New upstream version.
  - major additions to Cookbook
  - added AlleleFreqs attribute to ensembl Variation objects.
  - added getGeneByStableId method to genome objects.
  - added Introns attribute to Transcript objects and an Intron class.
  - added Mann-Whitney test and a Monte-Carlo version
  - exploratory and confirmatory period estimation techniques (suitable for
    symbolic and continuous data)
  - Information theoretic measures (AIC and BIC) added
  - drawing of trees with collapsed nodes
  - progress display indicator support for terminal and GUI apps
  - added parser for illumina HiSeq2000 and GAiix sequence files as
    cogent.parse.illumina_sequence.MinimalIlluminaSequenceParser.
  - added parser to FASTQ files, one of the output options for illumina's
    workflow, also added cookbook demo.
  - added functionality for parsing of SFF files without the Roche tools in
    cogent.parse.binary_sff
  - thousand fold performance improvement to nmds
  - >10-fold performance improvements to some Table operations

files added:
cogent/cluster/approximate_mds.py

cogent/maths/_period.c

cogent/maths/_period.pyx

cogent/maths/period.py

cogent/maths/stats/information_criteria.py

cogent/maths/stats/period.py

cogent/parse/binary_sff.py

cogent/parse/fastq.py

cogent/parse/illumina_sequence.py

cogent/parse/kegg_ko.py

cogent/parse/kegg_pos.py

cogent/parse/kegg_taxonomy.py

cogent/util/progress_display.py

cogent/util/terminal.py

doc/_static

doc/_static/google_feed.js

doc/cookbook/alphabet.rst

doc/cookbook/checkpointing_long_running.rst

doc/cookbook/ensembl.rst

doc/cookbook/loading_sequences.rst

doc/cookbook/managing_trees.rst

doc/cookbook/moltypesequence.rst

doc/cookbook/parallel_tasks.rst

doc/cookbook/phylonodes.rst

doc/cookbook/structural_contacts.rst

doc/cookbook/structural_data_2.rst

doc/data/1HQF.pdb

doc/data/Crump_et_al_example_env_file.txt

doc/data/Crump_example_tree_newick.txt

doc/data/inseqs_protein.fasta

doc/data/refseqs_protein.fasta

doc/examples/building_and_using_an_application_controller.rst

doc/examples/period_estimation.rst

doc/examples/seqsim_alignment_simulation.rst

doc/examples/seqsim_aln_sim_user_alphabet.rst

doc/examples/seqsim_tree_sim.rst

tests/data/F6AVWTA01.sff

tests/data/fastq.txt

tests/test_cluster/test_approximate_mds.py

tests/test_maths/test_period.py

tests/test_maths/test_stats/test_information_criteria.py

tests/test_maths/test_stats/test_period.py

tests/test_parse/test_binary_sff.py

tests/test_parse/test_fastq.py

tests/test_parse/test_illumina_sequence.py

tests/test_parse/test_kegg_ko.py

tests/test_parse/test_kegg_pos.py

tests/test_parse/test_kegg_taxonomy.py

tests/test_parse/test_mothur.py

tests/test_parse/test_pdb.py

tests/test_parse/test_rna_plot.py

tests/test_parse/test_structure.py

files removed:
tests/test_core/test_tree2.py

files modified:
.pc/fix_python_shebang_line.patch/cogent/align/dp_calculation.py

.pc/fix_python_shebang_line.patch/cogent/data/molecular_weight.py

.pc/fix_python_shebang_line.patch/cogent/format/text_tree.py

.pc/fix_python_shebang_line.patch/cogent/phylo/maximum_likelihood.py

.pc/fix_python_shebang_line.patch/cogent/recalculation/__init__.py

.pc/fix_python_shebang_line.patch/cogent/recalculation/setting.py

ChangeLog

cogent/__init__.py

cogent/align/__init__.py

cogent/align/_compare.c

cogent/align/_compare.pyx

cogent/align/_pairwise_pogs.c

cogent/align/_pairwise_pogs.pyx

cogent/align/_pairwise_seqs.c

cogent/align/_pairwise_seqs.pyx

cogent/align/algorithm.py

cogent/align/align.py

cogent/align/dp_calculation.py

cogent/align/indel_model.py

cogent/align/indel_positions.py

cogent/align/pairwise.py

cogent/align/partial_order_graph.py

cogent/align/progressive.py

cogent/align/pycompare.py

cogent/align/traceback.py

cogent/align/weights/__init__.py

cogent/align/weights/methods.py

cogent/align/weights/util.py

cogent/app/__init__.py

cogent/app/blast.py

cogent/app/carnac.py

cogent/app/cd_hit.py

cogent/app/clearcut.py

cogent/app/clustalw.py

cogent/app/cmfinder.py

cogent/app/comrna.py

cogent/app/consan.py

cogent/app/contrafold.py

cogent/app/cove.py

cogent/app/dialign.py

cogent/app/dotur.py

cogent/app/dynalign.py

cogent/app/fasttree.py

cogent/app/fasttree_v1.py

cogent/app/foldalign.py

cogent/app/formatdb.py

cogent/app/gctmpca.py

cogent/app/ilm.py

cogent/app/infernal.py

cogent/app/knetfold.py

cogent/app/mafft.py

cogent/app/mfold.py

cogent/app/mothur.py

cogent/app/msms.py

cogent/app/muscle.py

cogent/app/nupack.py

cogent/app/parameters.py

cogent/app/pfold.py

cogent/app/pknotsrg.py

cogent/app/raxml.py

cogent/app/rdp_classifier.py

cogent/app/rnaalifold.py

cogent/app/rnaforester.py

cogent/app/rnashapes.py

cogent/app/rnaview.py

cogent/app/sfffile.py

cogent/app/sffinfo.py

cogent/app/sfold.py

cogent/app/stride.py

cogent/app/uclust.py

cogent/app/unafold.py

cogent/app/util.py

cogent/app/vienna_package.py

cogent/cluster/UPGMA.py

cogent/cluster/__init__.py

cogent/cluster/goodness_of_fit.py

cogent/cluster/metric_scaling.py

cogent/cluster/nmds.py

cogent/cluster/procrustes.py

cogent/core/__init__.py

cogent/core/alignment.py

cogent/core/alphabet.py

cogent/core/annotation.py

cogent/core/bitvector.py

cogent/core/entity.py

cogent/core/genetic_code.py

cogent/core/info.py

cogent/core/location.py

cogent/core/moltype.py

cogent/core/profile.py

cogent/core/sequence.py

cogent/core/tree.py

cogent/core/usage.py

cogent/data/__init__.py

cogent/data/energy_params.py

cogent/data/ligand_properties.py

cogent/data/molecular_weight.py

cogent/data/nucleic_properties.py

cogent/data/protein_properties.py

cogent/db/__init__.py

cogent/db/ensembl/__init__.py

cogent/db/ensembl/assembly.py

cogent/db/ensembl/compara.py

cogent/db/ensembl/database.py

cogent/db/ensembl/feature_level.py

cogent/db/ensembl/genome.py

cogent/db/ensembl/host.py

cogent/db/ensembl/name.py

cogent/db/ensembl/region.py

cogent/db/ensembl/related_region.py

cogent/db/ensembl/sequence.py

cogent/db/ensembl/species.py

cogent/db/ensembl/util.py

cogent/db/ncbi.py

cogent/db/pdb.py

cogent/db/rfam.py

cogent/db/util.py

cogent/draw/__init__.py

cogent/draw/arrow_rates.py

cogent/draw/codon_usage.py

cogent/draw/dendrogram.py

cogent/draw/dinuc.py

cogent/draw/dotplot.py

cogent/draw/fancy_arrow.py

cogent/draw/legend.py

cogent/draw/linear.py

cogent/draw/multivariate_plot.py

cogent/draw/rlg2mpl.py

cogent/draw/util.py

cogent/evolve/__init__.py

cogent/evolve/_likelihood_tree.c

cogent/evolve/_likelihood_tree.pyx

cogent/evolve/best_likelihood.py

cogent/evolve/bootstrap.py

cogent/evolve/coevolution.py

cogent/evolve/discrete_markov.py

cogent/evolve/likelihood_calculation.py

cogent/evolve/likelihood_function.py

cogent/evolve/likelihood_tree.py

cogent/evolve/models.py

cogent/evolve/motif_prob_model.py

cogent/evolve/parameter_controller.py

cogent/evolve/predicate.py

cogent/evolve/simulate.py

cogent/evolve/substitution_calculation.py

cogent/evolve/substitution_model.py

cogent/format/__init__.py

cogent/format/alignment.py

cogent/format/clustal.py

cogent/format/fasta.py

cogent/format/mage.py

cogent/format/motif.py

cogent/format/nexus.py

cogent/format/pdb.py

cogent/format/pdb_color.py

cogent/format/phylip.py

cogent/format/rna_struct.py

cogent/format/stockholm.py

cogent/format/structure.py

cogent/format/table.py

cogent/format/text_tree.py

cogent/format/xyzrn.py

cogent/maths/__init__.py

cogent/maths/_matrix_exponentiation.c

cogent/maths/_matrix_exponentiation.pyx

cogent/maths/distance_transform.py

cogent/maths/eigen.c

cogent/maths/function_optimisation.py

cogent/maths/geometry.py

cogent/maths/markov.py

cogent/maths/matrix/__init__.py

cogent/maths/matrix/distance.py

cogent/maths/matrix_exponentiation.py

cogent/maths/matrix_invert.c

cogent/maths/matrix_logarithm.py

cogent/maths/optimiser.py

cogent/maths/optimisers.py

cogent/maths/scipy_optimisers.py

cogent/maths/scipy_optimize.py

cogent/maths/simannealingoptimiser.py

cogent/maths/solve.py

cogent/maths/spatial/__init__.py

cogent/maths/spatial/ckd3.c

cogent/maths/spatial/ckd3.pyx

cogent/maths/stats/__init__.py

cogent/maths/stats/alpha_diversity.py

cogent/maths/stats/cai/__init__.py

cogent/maths/stats/cai/adaptor.py

cogent/maths/stats/cai/get_by_cai.py

cogent/maths/stats/cai/util.py

cogent/maths/stats/distribution.py

cogent/maths/stats/histogram.py

cogent/maths/stats/kendall.py

cogent/maths/stats/ks.py

cogent/maths/stats/rarefaction.py

cogent/maths/stats/special.py

cogent/maths/stats/test.py

cogent/maths/stats/util.py

cogent/maths/svd.py

cogent/maths/unifrac/__init__.py

cogent/maths/unifrac/fast_tree.py

cogent/maths/unifrac/fast_unifrac.py

cogent/motif/__init__.py

cogent/motif/k_word.py

cogent/motif/util.py

cogent/parse/__init__.py

cogent/parse/aaindex.py

cogent/parse/agilent_microarray.py

cogent/parse/blast.py

cogent/parse/blast_xml.py

cogent/parse/bpseq.py

cogent/parse/carnac.py

cogent/parse/cigar.py

cogent/parse/clustal.py

cogent/parse/cmfinder.py

cogent/parse/column.py

cogent/parse/comrna.py

cogent/parse/consan.py

cogent/parse/contrafold.py

cogent/parse/cove.py

cogent/parse/ct.py

cogent/parse/cut.py

cogent/parse/cutg.py

cogent/parse/dialign.py

cogent/parse/dotur.py

cogent/parse/dynalign.py

cogent/parse/ebi.py

cogent/parse/fasta.py

cogent/parse/flowgram.py

cogent/parse/flowgram_collection.py

cogent/parse/flowgram_parser.py

cogent/parse/foldalign.py

cogent/parse/gbseq.py

cogent/parse/gcg.py

cogent/parse/genbank.py

cogent/parse/gff.py

cogent/parse/gibbs.py

cogent/parse/ilm.py

cogent/parse/infernal.py

cogent/parse/knetfold.py

cogent/parse/locuslink.py

cogent/parse/macsim.py

cogent/parse/mage.py

cogent/parse/meme.py

cogent/parse/mfold.py

cogent/parse/mothur.py

cogent/parse/msms.py

cogent/parse/ncbi_taxonomy.py

cogent/parse/newick.py

cogent/parse/nexus.py

cogent/parse/nupack.py

cogent/parse/paml.py

cogent/parse/paml_matrix.py

cogent/parse/pdb.py

cogent/parse/pfold.py

cogent/parse/phylip.py

cogent/parse/pknotsrg.py

cogent/parse/rdb.py

cogent/parse/record.py

cogent/parse/record_finder.py

cogent/parse/rfam.py

cogent/parse/rna_fold.py

cogent/parse/rna_plot.py

cogent/parse/rnaalifold.py

cogent/parse/rnaforester.py

cogent/parse/rnashapes.py

cogent/parse/rnaview.py

cogent/parse/sequence.py

cogent/parse/sfold.py

cogent/parse/sprinzl.py

cogent/parse/stride.py

cogent/parse/structure.py

cogent/parse/table.py

cogent/parse/tinyseq.py

cogent/parse/tree.py

cogent/parse/tree_xml.py

cogent/parse/unafold.py

cogent/parse/unigene.py

cogent/phylo/__init__.py

cogent/phylo/compatibility.py

cogent/phylo/consensus.py

cogent/phylo/distance.py

cogent/phylo/least_squares.py

cogent/phylo/maximum_likelihood.py

cogent/phylo/nj.py

cogent/phylo/tree_collection.py

cogent/phylo/tree_space.py

cogent/phylo/util.py

cogent/recalculation/__init__.py

cogent/recalculation/calculation.py

cogent/recalculation/definition.py

cogent/recalculation/scope.py

cogent/recalculation/setting.py

cogent/seqsim/__init__.py

cogent/seqsim/analysis.py

cogent/seqsim/birth_death.py

cogent/seqsim/markov.py

cogent/seqsim/microarray.py

cogent/seqsim/microarray_normalize.py

cogent/seqsim/randomization.py

cogent/seqsim/searchpath.py

cogent/seqsim/sequence_generators.py

cogent/seqsim/tree.py

cogent/seqsim/usage.py

cogent/struct/__init__.py

cogent/struct/_asa.c

cogent/struct/_asa.pyx

cogent/struct/_contact.c

cogent/struct/_contact.pyx

cogent/struct/annotation.py

cogent/struct/asa.py

cogent/struct/contact.py

cogent/struct/dihedral.py

cogent/struct/knots.py

cogent/struct/manipulation.py

cogent/struct/pairs_util.py

cogent/struct/rna2d.py

cogent/struct/selection.py

cogent/util/__init__.py

cogent/util/array.py

cogent/util/checkpointing.py

cogent/util/datatypes.py

cogent/util/dict2d.py

cogent/util/dict_array.py

cogent/util/misc.py

cogent/util/modules.py

cogent/util/organizer.py

cogent/util/parallel.py

cogent/util/recode_alignment.py

cogent/util/table.py

cogent/util/transform.py

cogent/util/trie.py

cogent/util/unit_test.py

cogent/util/update_version.py

cogent/util/warning.py

debian/changelog

debian/control

doc/conf.py

doc/cookbook/DNA_and_RNA_sequences.rst

doc/cookbook/accessing_databases.rst

doc/cookbook/alignments.rst

doc/cookbook/analysis_of_sequence_composition.rst

doc/cookbook/annotations.rst

doc/cookbook/blast.rst

doc/cookbook/building_alignments.rst

doc/cookbook/building_phylogenies.rst

doc/cookbook/community_analysis.rst

doc/cookbook/dealing_with_hts_data.rst

doc/cookbook/genetic_code.rst

doc/cookbook/hpc_environments.rst

doc/cookbook/index.rst

doc/cookbook/introduction.rst

doc/cookbook/manipulating_biological_data.rst

doc/cookbook/multivariate_data_analysis.rst

doc/cookbook/simple_trees.rst

doc/cookbook/standard_statistical_analyses.rst

doc/cookbook/structural_data.rst

doc/cookbook/tips_for_using_python.rst

doc/cookbook/useful_utilities.rst

doc/cookbook/using_likelihood_to_perform_evolutionary_analyses.rst

doc/data_file_links.rst

doc/examples/alignment_app_controllers.rst

doc/examples/application_controller_framework.rst

doc/examples/calculate_UPGMA_cluster.rst

doc/examples/calculate_neigbourjoining_tree.rst

doc/examples/calculate_pairwise_distances.rst

doc/examples/codon_models.rst

doc/examples/draw_dendrogram.rst

doc/examples/draw_dotplot.rst

doc/examples/empirical_protein_models.rst

doc/examples/estimate_startingpoint.rst

doc/examples/genetic_code_aa_index.rst

doc/examples/handling_3dstructures.rst

doc/examples/hmm_par_heterogeneity.rst

doc/examples/index.rst

doc/examples/maketree_from_proteinseqs.rst

doc/examples/neutral_test.rst

doc/examples/parametric_bootstrap.rst

doc/examples/perform_PCoA_analysis.rst

doc/examples/phylo_by_ls.rst

doc/examples/phylogeny_app_controllers.rst

doc/examples/query_ensembl.rst

doc/examples/query_ncbi.rst

doc/examples/rate_heterogeneity.rst

doc/examples/relative_rate.rst

doc/examples/reuse_results.rst

doc/examples/scope_model_params_on_trees.rst

doc/examples/simple.rst

doc/examples/testing_multi_loci.rst

doc/examples/unrestricted_nucleotide.rst

doc/index.rst

doc/install.rst

doc/templates/layout.html

include/array_interface.h

include/numerical_pyrex.pyx

setup.py

tests/__init__.py

tests/alltests.py

tests/benchmark.py

tests/benchmark_aligning.py

tests/test_align/__init__.py

tests/test_align/test_algorithm.py

tests/test_align/test_align.py

tests/test_align/test_weights/__init__.py

tests/test_align/test_weights/test_methods.py

tests/test_align/test_weights/test_util.py

tests/test_app/__init__.py

tests/test_app/test_blast.py

tests/test_app/test_carnac.py

tests/test_app/test_cd_hit.py

tests/test_app/test_clearcut.py

tests/test_app/test_clustalw.py

tests/test_app/test_cmfinder.py

tests/test_app/test_comrna.py

tests/test_app/test_consan.py

tests/test_app/test_contrafold.py

tests/test_app/test_cove.py

tests/test_app/test_dialign.py

tests/test_app/test_dotur.py

tests/test_app/test_dynalign.py

tests/test_app/test_fasttree.py

tests/test_app/test_fasttree_v1.py

tests/test_app/test_foldalign.py

tests/test_app/test_formatdb.py

tests/test_app/test_gctmpca.py

tests/test_app/test_ilm.py

tests/test_app/test_infernal.py

tests/test_app/test_knetfold.py

tests/test_app/test_mafft.py

tests/test_app/test_mfold.py

tests/test_app/test_mothur.py

tests/test_app/test_msms.py

tests/test_app/test_muscle.py

tests/test_app/test_nupack.py

tests/test_app/test_parameters.py

tests/test_app/test_pfold.py

tests/test_app/test_pknotsrg.py

tests/test_app/test_raxml.py

tests/test_app/test_rdp_classifier.py

tests/test_app/test_rnaalifold.py

tests/test_app/test_rnaforester.py

tests/test_app/test_rnaview.py

tests/test_app/test_sfffile.py

tests/test_app/test_sffinfo.py

tests/test_app/test_sfold.py

tests/test_app/test_stride.py

tests/test_app/test_uclust.py

tests/test_app/test_unafold.py

tests/test_app/test_util.py

tests/test_app/test_vienna_package.py

tests/test_cluster/__init__.py

tests/test_cluster/test_UPGMA.py

tests/test_cluster/test_goodness_of_fit.py

tests/test_cluster/test_metric_scaling.py

tests/test_cluster/test_nmds.py

tests/test_cluster/test_procrustes.py

tests/test_core/__init__.py

tests/test_core/test_alignment.py

tests/test_core/test_alphabet.py

tests/test_core/test_annotation.py

tests/test_core/test_bitvector.py

tests/test_core/test_core_standalone.py

tests/test_core/test_entity.py

tests/test_core/test_genetic_code.py

tests/test_core/test_info.py

tests/test_core/test_location.py

tests/test_core/test_maps.py

tests/test_core/test_moltype.py

tests/test_core/test_profile.py

tests/test_core/test_seq_aln_integration.py

tests/test_core/test_sequence.py

tests/test_core/test_tree.py

tests/test_core/test_usage.py

tests/test_data/__init__.py

tests/test_data/test_molecular_weight.py

tests/test_db/__init__.py

tests/test_db/test_ensembl/__init__.py

tests/test_db/test_ensembl/test_assembly.py

tests/test_db/test_ensembl/test_compara.py

tests/test_db/test_ensembl/test_database.py

tests/test_db/test_ensembl/test_feature_level.py

tests/test_db/test_ensembl/test_genome.py

tests/test_db/test_ensembl/test_host.py

tests/test_db/test_ensembl/test_species.py

tests/test_db/test_ncbi.py

tests/test_db/test_pdb.py

tests/test_db/test_rfam.py

tests/test_db/test_util.py

tests/test_draw.py

tests/test_draw/test_matplotlib/test_arrow_rates.py

tests/test_draw/test_matplotlib/test_codon_usage.py

tests/test_draw/test_matplotlib/test_dinuc.py

tests/test_draw/test_matplotlib/test_multivariate_plot.py

tests/test_evolve/__init__.py

tests/test_evolve/test_best_likelihood.py

tests/test_evolve/test_bootstrap.py

tests/test_evolve/test_coevolution.py

tests/test_evolve/test_likelihood_function.py

tests/test_evolve/test_models.py

tests/test_evolve/test_motifchange.py

tests/test_evolve/test_newq.py

tests/test_evolve/test_parameter_controller.py

tests/test_evolve/test_scale_rules.py

tests/test_evolve/test_simulation.py

tests/test_evolve/test_substitution_model.py

tests/test_format/__init__.py

tests/test_format/test_clustal.py

tests/test_format/test_fasta.py

tests/test_format/test_mage.py

tests/test_format/test_pdb_color.py

tests/test_format/test_stockholm.py

tests/test_format/test_xyzrn.py

tests/test_maths/__init__.py

tests/test_maths/test_distance_transform.py

tests/test_maths/test_function_optimisation.py

tests/test_maths/test_geometry.py

tests/test_maths/test_matrix/__init__.py

tests/test_maths/test_matrix/test_distance.py

tests/test_maths/test_matrix_logarithm.py

tests/test_maths/test_optimisers.py

tests/test_maths/test_spatial/__init__.py

tests/test_maths/test_spatial/test_ckd3.py

tests/test_maths/test_stats/__init__.py

tests/test_maths/test_stats/test_alpha_diversity.py

tests/test_maths/test_stats/test_cai/__init__.py

tests/test_maths/test_stats/test_cai/test_adaptor.py

tests/test_maths/test_stats/test_cai/test_get_by_cai.py

tests/test_maths/test_stats/test_cai/test_util.py

tests/test_maths/test_stats/test_distribution.py

tests/test_maths/test_stats/test_histogram.py

tests/test_maths/test_stats/test_ks.py

tests/test_maths/test_stats/test_rarefaction.py

tests/test_maths/test_stats/test_special.py

tests/test_maths/test_stats/test_test.py

tests/test_maths/test_stats/test_util.py

tests/test_maths/test_svd.py

tests/test_maths/test_unifrac/__init__.py

tests/test_maths/test_unifrac/test_fast_tree.py

tests/test_maths/test_unifrac/test_fast_unifrac.py

tests/test_motif/__init__.py

tests/test_motif/test_util.py

tests/test_parse/__init__.py

tests/test_parse/test_aaindex.py

tests/test_parse/test_agilent_microarray.py

tests/test_parse/test_blast.py

tests/test_parse/test_blast_xml.py

tests/test_parse/test_bpseq.py

tests/test_parse/test_cigar.py

tests/test_parse/test_clustal.py

tests/test_parse/test_column.py

tests/test_parse/test_comrna.py

tests/test_parse/test_consan.py

tests/test_parse/test_cove.py

tests/test_parse/test_ct.py

tests/test_parse/test_cut.py

tests/test_parse/test_cutg.py

tests/test_parse/test_dialign.py

tests/test_parse/test_dotur.py

tests/test_parse/test_ebi.py

tests/test_parse/test_fasta.py

tests/test_parse/test_flowgram.py

tests/test_parse/test_flowgram_collection.py

tests/test_parse/test_flowgram_parser.py

tests/test_parse/test_genbank.py

tests/test_parse/test_gff.py

tests/test_parse/test_gibbs.py

tests/test_parse/test_ilm.py

tests/test_parse/test_infernal.py

tests/test_parse/test_locuslink.py

tests/test_parse/test_mage.py

tests/test_parse/test_meme.py

tests/test_parse/test_msms.py

tests/test_parse/test_ncbi_taxonomy.py

tests/test_parse/test_nexus.py

tests/test_parse/test_nupack.py

tests/test_parse/test_phylip.py

tests/test_parse/test_pknotsrg.py

tests/test_parse/test_rdb.py

tests/test_parse/test_record.py

tests/test_parse/test_record_finder.py

tests/test_parse/test_rfam.py

tests/test_parse/test_rna_fold.py

tests/test_parse/test_rnaalifold.py

tests/test_parse/test_rnaforester.py

tests/test_parse/test_rnaview.py

tests/test_parse/test_sprinzl.py

tests/test_parse/test_stride.py

tests/test_parse/test_tree.py

tests/test_parse/test_unigene.py

tests/test_phylo.py

tests/test_recalculation.rst

tests/test_seqsim/__init__.py

tests/test_seqsim/test_analysis.py

tests/test_seqsim/test_birth_death.py

tests/test_seqsim/test_markov.py

tests/test_seqsim/test_microarray.py

tests/test_seqsim/test_microarray_normalize.py

tests/test_seqsim/test_randomization.py

tests/test_seqsim/test_searchpath.py

tests/test_seqsim/test_sequence_generators.py

tests/test_seqsim/test_tree.py

tests/test_seqsim/test_usage.py

tests/test_struct/__init__.py

tests/test_struct/test_annotation.py

tests/test_struct/test_asa.py

tests/test_struct/test_contact.py

tests/test_struct/test_dihedral.py

tests/test_struct/test_knots.py

tests/test_struct/test_manipulation.py

tests/test_struct/test_pairs_util.py

tests/test_struct/test_rna2d.py

tests/test_struct/test_selection.py

tests/test_util/__init__.py

tests/test_util/test_array.py

tests/test_util/test_dict2d.py

tests/test_util/test_misc.py

tests/test_util/test_organizer.py

tests/test_util/test_recode_alignment.py

tests/test_util/test_table.rst

tests/test_util/test_transform.py

tests/test_util/test_trie.py

tests/test_util/test_unit_test.py

tests/timetrial.py


doc/cookbook/multivariate_data_analysis.rst

.. _multivariate-analysis:

**************************
Multivariate data analysis
**************************


.. sectionauthor Justin Kuczynski, Catherine Lozupone, Andreas Wilm

Principal Coordinates Analysis
==============================

Principal Coordinates Analysis (PCoA) works on a matrix of pairwise distances. In this example we start by calculating the pairwise distances for a set of aligned sequences, but PCoA accepts any distance matrix, relating any kind of object, not only sequences.

.. doctest::

    >>> from cogent import LoadSeqs
    >>> from cogent.phylo import distance
    >>> from cogent.cluster.metric_scaling import PCoA

Import a substitution model (or create your own).

.. doctest::

    >>> from cogent.evolve.models import HKY85

Load the alignment.

.. doctest::

    >>> al = LoadSeqs("data/test.paml")

Create a pairwise distance calculator for the alignment, providing a substitution model instance.

.. doctest::

    >>> d = distance.EstimateDistances(al, submodel=HKY85())
    >>> d.run(show_progress=False)

Now use the pairwise distances to perform principal coordinates analysis.

.. doctest::

    >>> PCoA_result = PCoA(d.getPairwiseDistances())
    >>> print PCoA_result # doctest: +SKIP
    ======================================================================================
            Type              Label  vec_num-0  vec_num-1  vec_num-2  vec_num-3  vec_num-4
    --------------------------------------------------------------------------------------
    Eigenvectors          NineBande      -0.02       0.01       0.04       0.01       0.00
    Eigenvectors           DogFaced      -0.04      -0.06      -0.01       0.00       0.00
    Eigenvectors          HowlerMon      -0.07       0.01       0.01      -0.02       0.00
    Eigenvectors              Mouse       0.20       0.01      -0.01      -0.00       0.00
    Eigenvectors              Human      -0.07       0.04      -0.03       0.01       0.00
     Eigenvalues        eigenvalues       0.05       0.01       0.00       0.00      -0.00
     Eigenvalues  var explained (%)      85.71       9.60       3.73       0.95      -0.00
    --------------------------------------------------------------------------------------
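
For readers curious what happens to the distance matrix before the eigendecomposition, the following is a minimal sketch of the double-centering step used in classical PCoA. It is plain Python, not PyCogent's implementation; the function name ``double_center`` and the toy data are assumptions made for illustration.

```python
def double_center(dist):
    """Return B = -1/2 * J * D^2 * J for a square, symmetric distance matrix.

    `dist` is a list of lists. J is the centering matrix, so the result has
    row and column sums of zero.
    """
    n = len(dist)
    squared = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in squared]
    total_mean = sum(row_mean) / n
    # The matrix is symmetric, so column means equal row means.
    return [[-0.5 * (squared[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)]
            for i in range(n)]


# Hypothetical toy data: three points on a line at positions 0, 3 and 4.
positions = [0.0, 3.0, 4.0]
dist = [[abs(a - b) for b in positions] for a in positions]
B = double_center(dist)
```

For Euclidean distances, ``B`` is the matrix of inner products of the mean-centered coordinates. In classical PCoA, its eigenvectors (scaled by the square roots of the eigenvalues) give the principal coordinates, and each eigenvalue divided by the sum of the eigenvalues gives the fraction of variance explained.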

We can save these results to a file in a delimited format (we'll use tab here) that can be opened in any data analysis program, such as R or Excel. There the principal coordinates can be plotted against each other for visualization.

.. doctest::

    >>> PCoA_result.writeToFile('PCoA_results.txt', sep='\t')

Fast-MDS
========

The eigendecomposition step in Principal Coordinates Analysis (PCoA)
does not scale well. For thousands of objects, computing all pairwise
distances alone can become very slow, because the number of distances
grows quadratically with the number of objects. For a very large number
of objects, holding the full distance matrix in memory can also become
a problem. Fast-MDS methods approximate an MDS/PCoA solution and
do not suffer from these problems.
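
To make the quadratic growth concrete, here is a small sketch in plain Python. The helper names and the 8 bytes per value (a double-precision float) are assumptions for illustration, not part of PyCogent.

```python
def n_pairwise_distances(n):
    """Number of distinct pairs among n objects: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def full_matrix_bytes(n, bytes_per_value=8):
    """Memory for a dense n x n matrix, assuming 8-byte floats."""
    return n * n * bytes_per_value


for n in (1500, 15000, 150000):
    print("%6d objects: %11d distances, %8.1f MB as a full matrix"
          % (n, n_pairwise_distances(n), full_matrix_bytes(n) / 1e6))
```

Multiplying the number of objects by ten multiplies both the number of distances and the memory footprint by roughly one hundred.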

First, let's simulate a large data set by creating 1500 objects living
in 10 dimensions. We then compute their pairwise distances and perform a
principal coordinates analysis on the resulting matrix. Note that the
last two steps may already take a couple of minutes.

.. doctest::

    >>> from cogent.maths.distance_transform import dist_euclidean
    >>> from cogent.cluster.metric_scaling import principal_coordinates_analysis
    >>> from numpy import random
    >>> objs = random.random((1500, 10))
    >>> distmtx = dist_euclidean(objs)
    >>> full_pcoa = principal_coordinates_analysis(distmtx)

PyCogent implements two fast MDS approximations: Split-and-Combine MDS
(SCMDS, still in development) and Nystrom (also known as Landmark-MDS).
Both can easily handle many thousands of objects. One reason is that
they do not require all distances to be computed. Instead, you pass in
the distance function and only the required distances are calculated.

Nystrom works with a so-called seed matrix, which contains only k by
n distances, where n is the total number of objects and k << n. The
larger k is, the more exact the approximation will be and the longer
the computation will take. One further difference from normal Principal
Coordinates Analysis is that no eigenvalues are returned, only
approximate eigenvectors of length dim.
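
The idea of a k by n seed matrix can be sketched in plain Python as follows. This is only an illustration of which distances get computed: ``seed_matrix`` and ``euclidean`` are hypothetical helpers, and the row ordering that PyCogent's ``nystrom`` expects is not addressed here.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def seed_matrix(objects, k, dist_func, rng):
    """Distances from k randomly chosen seed objects to all n objects.

    Returns the chosen seed indices and a k x n list of lists. Only k * n
    distance evaluations are needed, instead of n * (n - 1) / 2.
    """
    seed_idx = rng.sample(range(len(objects)), k)
    rows = [[dist_func(objects[i], obj) for obj in objects] for i in seed_idx]
    return seed_idx, rows


rng = random.Random(0)  # fixed seed so the sketch is repeatable
objects = [[rng.random() for _ in range(10)] for _ in range(200)]
seed_idx, seeds = seed_matrix(objects, 15, euclidean, rng)
```

With 200 objects and 15 seeds this evaluates 3000 distances, compared with 19900 for all pairs.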

.. doctest::

    >>> from cogent.cluster.approximate_mds import nystrom
    >>> from random import sample
    >>> from numpy import array
    >>> n_seeds = 100
    >>> seeds = array(sample(distmtx, n_seeds))
    >>> dims = 3
    >>> nystrom_3d = nystrom(seeds, dims)

A good rule of thumb for picking ``n_seeds`` is ``log(n)``, ``log(n)**2`` or
``sqrt(n)``.
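
As a quick worked example of this rule of thumb, the sketch below evaluates the three candidates for the 1500 simulated objects. It assumes the natural logarithm (the text does not state the base) and rounds up to whole seeds; the function name is made up for illustration.

```python
import math


def seed_count_candidates(n):
    """Candidate values for n_seeds, rounded up to whole numbers.

    Assumes log means the natural logarithm.
    """
    return {
        "log(n)": int(math.ceil(math.log(n))),
        "log(n)**2": int(math.ceil(math.log(n) ** 2)),
        "sqrt(n)": int(math.ceil(math.sqrt(n))),
    }


candidates = seed_count_candidates(1500)
```

For n = 1500 this gives 8, 54 and 39 seeds respectively, so the ``n_seeds = 100`` used in the Nystrom doctest is more generous than any of the three.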

SCMDS works by dividing the pairwise distance matrix into chunks (tiles)
of a certain size and overlap. MDS is performed on each tile individually
and the resulting solutions are progressively joined. As with Nystrom,
not all distances are computed, only those within the overlapping tiles.
The size and overlap of the tiles determine the quality of the
approximation as well as the run time.
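
The tile boundaries follow directly from the tile size and overlap. The sketch below, in plain Python with a hypothetical helper ``tile_ranges``, generates the slice ranges for 1500 objects with a tile size of 500 and an overlap of 100, which are the values used in the SCMDS doctest of this section.

```python
def tile_ranges(n, tile_size, overlap):
    """Return (start, stop) index pairs of overlapping tiles covering 0..n."""
    step = tile_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than tile_size")
    ranges = []
    start = 0
    while True:
        stop = min(start + tile_size, n)  # the last tile may be shorter
        ranges.append((start, stop))
        if stop == n:
            break
        start += step
    return ranges


tiles = tile_ranges(1500, 500, 100)
```

Each consecutive pair of tiles shares exactly ``overlap`` objects, which is what allows the per-tile solutions to be joined.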

.. doctest::

    >>> from cogent.cluster.approximate_mds import CombineMds, cmds_tzeng
    >>> combine_mds = CombineMds()
    >>> tile_overlap = 100
    >>> dims = 3
    >>> tile_eigvecs, tile_eigvals = cmds_tzeng(distmtx[0:500,0:500], dims)
    >>> combine_mds.add(tile_eigvecs, tile_overlap)
    >>> tile_eigvecs, tile_eigvals = cmds_tzeng(distmtx[400:900,400:900], dims)
    >>> combine_mds.add(tile_eigvecs, tile_overlap)
    >>> tile_eigvecs, tile_eigvals = cmds_tzeng(distmtx[800:1300,800:1300], dims)
    >>> combine_mds.add(tile_eigvecs, tile_overlap)
    >>> tile_eigvecs, tile_eigvals = cmds_tzeng(distmtx[1200:1500,1200:1500], dims)
    >>> combine_mds.add(tile_eigvecs, tile_overlap)
    >>> combine_mds_3d = combine_mds.getFinalMDS()

If you want to know how good the returned approximations are, you will
have to run ``principal_coordinates_analysis()`` on a small submatrix
and perform a ``goodness_of_fit`` analysis.
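
One common way to quantify such a fit is a stress value that compares the original distances with the distances between the embedded points. The sketch below is a generic, plain-Python version of Kruskal's stress-1 and does not use or reproduce the API of ``cogent.cluster.goodness_of_fit``; all names and the toy data are assumptions.

```python
import math


def pairwise(points):
    """Full Euclidean distance matrix (list of lists) for a list of points."""
    return [[math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))
             for q in points] for p in points]


def stress(orig, approx):
    """Kruskal-style stress-1 between two distance matrices of equal shape."""
    num = 0.0
    den = 0.0
    n = len(orig)
    for i in range(n):
        for j in range(i + 1, n):
            num += (orig[i][j] - approx[i][j]) ** 2
            den += orig[i][j] ** 2
    return math.sqrt(num / den)


# Hypothetical toy data: four points in 3D, "embedded" in 2D by dropping z.
points_3d = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
points_2d = [p[:2] for p in points_3d]
original = pairwise(points_3d)
fit = stress(original, pairwise(points_2d))
```

A value of 0 means the embedding reproduces all distances exactly; larger values indicate a poorer fit.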

NMDS
====


NMDS (Non-metric MultiDimensional Scaling) works on a matrix of pairwise distances. In this example, we generate a distance matrix from the Euclidean distances between the rows of an abundance matrix.

.. doctest::

    >>> from cogent.cluster.nmds import NMDS
    >>> from cogent.maths.distance_transform import dist_euclidean
    >>> from numpy import array

We start with an abundance matrix, samples (rows) by sequences/species (cols):

.. doctest::

    >>> abundance = array(
    ...        [[7,1,0,0,0,0,0,0,0],
    ...         [4,2,0,0,0,1,0,0,0],
    ...         [2,4,0,0,0,1,0,0,0],
    ...         [1,7,0,0,0,0,0,0,0],
    ...         [0,8,0,0,0,0,0,0,0],
    ...         [0,7,1,0,0,0,0,0,0],  # idx 5
    ...         [0,4,2,0,0,0,2,0,0],
    ...         [0,2,4,0,0,0,1,0,0],
    ...         [0,1,7,0,0,0,0,0,0],
    ...         [0,0,8,0,0,0,0,0,0],
    ...         [0,0,7,1,0,0,0,0,0],  # idx 10
    ...         [0,0,4,2,0,0,0,3,0],
    ...         [0,0,2,4,0,0,0,1,0],
    ...         [0,0,1,7,0,0,0,0,0],
    ...         [0,0,0,8,0,0,0,0,0],
    ...         [0,0,0,7,1,0,0,0,0],  # idx 15
    ...         [0,0,0,4,2,0,0,0,4],
    ...         [0,0,0,2,4,0,0,0,1],
    ...         [0,0,0,1,7,0,0,0,0]], 'float')

Then compute a distance matrix using Euclidean distance, and perform NMDS on that matrix:

.. doctest::

    >>> euc_distmtx = dist_euclidean(abundance)
    >>> nm = NMDS(euc_distmtx, verbosity=0)

The NMDS object provides a list of points, which can be plotted if desired:

.. doctest::

    >>> pts = nm.getPoints()
    >>> stress = nm.getStress()

With matplotlib installed, we could then do ``plt.plot(pts[:,0], pts[:,1])``.

Hierarchical clustering (UPGMA, NJ)
===================================


Hierarchical clustering techniques work on a matrix of pairwise distances. In this case, we use the distance matrix from the NMDS example, relating samples of species to one another using UPGMA (NJ is shown further below).

.. note:: UPGMA should not be used for phylogenetic reconstruction.

.. doctest::

    >>> from cogent.cluster.UPGMA import upgma

We start with the distance matrix and a list of sample names:

.. doctest::

    >>> sample_names = ['sample'+str(i) for i in range(len(euc_distmtx))]

Make a 2D dict keyed by pairs of sample names:

.. doctest::

    >>> euc_distdict = {}
    >>> for i in range(len(sample_names)):
    ...     for j in range(len(sample_names)):
    ...         euc_distdict[(sample_names[i],sample_names[j])]=euc_distmtx[i,j]

For example, ``euc_distdict[('sample6', 'sample5')] == 3.7416573867739413``.
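
That value can be checked by hand from rows 5 and 6 of the abundance matrix. The following plain-Python sketch (no PyCogent or numpy; the helper ``euclidean`` and the three-sample dict are illustrative assumptions) recomputes it and builds a small dict with the same key layout.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length abundance rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Rows 4, 5 and 6 of the abundance matrix from the NMDS example.
rows = {
    "sample4": [0, 8, 0, 0, 0, 0, 0, 0, 0],
    "sample5": [0, 7, 1, 0, 0, 0, 0, 0, 0],
    "sample6": [0, 4, 2, 0, 0, 0, 2, 0, 0],
}

# 2D dict keyed by (name, name) tuples, holding both orders of every pair.
small_distdict = dict(((a, b), euclidean(rows[a], rows[b]))
                      for a in rows for b in rows)

d_6_5 = small_distdict[("sample6", "sample5")]
```

The squared differences between rows 6 and 5 are 9, 1 and 4, so the distance is the square root of 14.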

Now use this distance dict to build a UPGMA cluster.

.. doctest::

    >>> mycluster = upgma(euc_distdict)
    >>> print mycluster.asciiArt()
                                                      /-sample10
                                            /edge.3--|
                                  /edge.2--|          \-sample8
                                 |         |
                                 |          \-sample9
                        /edge.1--|
                       |         |                    /-sample12
                       |         |          /edge.5--|
                       |         |         |          \-sample11
                       |          \edge.4--|
                       |                   |          /-sample6
                       |                    \edge.6--|
              /edge.0--|                              \-sample7
             |         |
             |         |                                        /-sample15
             |         |                              /edge.10-|
             |         |                    /edge.9--|          \-sample14
             |         |                   |         |
             |         |          /edge.8--|          \-sample13
             |         |         |         |
             |          \edge.7--|          \-sample16
    -root----|                   |
             |                   |          /-sample17
             |                    \edge.11-|
             |                              \-sample18
             |
             |                              /-sample5
             |                    /edge.14-|
             |          /edge.13-|          \-sample4
             |         |         |
             |         |          \-sample3
              \edge.12-|
                       |                    /-sample2
                       |          /edge.16-|
                        \edge.15-|          \-sample1
                                 |
                                  \-sample0
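
To make clear what UPGMA does with such a dict, here is a minimal plain-Python sketch of the algorithm: repeatedly merge the two closest clusters and replace their distances by a size-weighted average. It is an illustration only, not the code behind ``cogent.cluster.UPGMA.upgma``; it returns nested tuples rather than a tree object and ignores branch lengths. The function name and toy data are assumptions.

```python
def upgma_sketch(dist):
    """Cluster names by UPGMA; `dist` maps (a, b) and (b, a) to a distance."""
    names = sorted(set(a for a, _ in dist))
    trees = dict((n, n) for n in names)   # cluster id -> nested tuple
    sizes = dict((n, 1) for n in names)   # cluster id -> number of leaves
    d = dict(((a, b), float(v)) for (a, b), v in dist.items() if a != b)
    merged = 0
    while len(trees) > 1:
        # Closest pair; ties are broken by the pair's names for repeatability.
        a, b = min((p for p in d if p[0] < p[1]), key=lambda p: (d[p], p))
        new = "cluster%d" % merged
        merged += 1
        for k in trees:
            if k in (a, b):
                continue
            # Average distance to the merged cluster, weighted by cluster size.
            avg = (sizes[a] * d[(a, k)] + sizes[b] * d[(b, k)]) / (sizes[a] + sizes[b])
            d[(new, k)] = d[(k, new)] = avg
        trees[new] = (trees.pop(a), trees.pop(b))
        sizes[new] = sizes.pop(a) + sizes.pop(b)
        d = dict((p, v) for p, v in d.items() if a not in p and b not in p)
    return list(trees.values())[0]


# Hypothetical toy data: four points on a line.
positions = {"A": 0.0, "B": 1.0, "C": 4.0, "D": 6.0}
toy_dist = dict(((a, b), abs(positions[a] - positions[b]))
                for a in positions for b in positions)
toy_tree = upgma_sketch(toy_dist)
```

Here ``A`` and ``B`` are merged first (distance 1), then ``C`` and ``D`` (distance 2), and finally the two pairs are joined.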

We demonstrate saving this UPGMA cluster to a file.

.. doctest::

    >>> mycluster.writeToFile('test_upgma.tree')

We don't actually want to keep that file, so we import the ``os`` module and delete it.

.. doctest::
    :hide:

    >>> import os
    >>> os.remove('test_upgma.tree')

We can use neighbor joining (NJ) instead of UPGMA:

.. doctest::

    >>> from cogent.phylo.nj import nj
    >>> njtree = nj(euc_distdict)
    >>> print njtree.asciiArt()
              /-sample16
             |
             |                    /-sample12
             |          /edge.2--|
             |         |         |          /-sample13
             |         |          \edge.1--|
             |         |                   |          /-sample14
             |         |                    \edge.0--|
             |         |                              \-sample15
             |         |
             |         |                              /-sample7
             |-edge.14-|                    /edge.5--|
             |         |                   |         |          /-sample8
             |         |                   |          \edge.4--|
             |         |          /edge.6--|                   |          /-sample10
             |         |         |         |                    \edge.3--|
             |         |         |         |                              \-sample9
    -root----|         |         |         |
             |         |         |          \-sample11
             |         |         |
             |          \edge.13-|                    /-sample6
             |                   |                   |
             |                   |                   |                              /-sample4
             |                   |          /edge.10-|                    /edge.7--|
             |                   |         |         |          /edge.8--|          \-sample3
             |                   |         |         |         |         |
             |                   |         |          \edge.9--|          \-sample5
             |                    \edge.12-|                   |
             |                             |                    \-sample2
             |                             |
             |                             |          /-sample0
             |                              \edge.11-|
             |                                        \-sample1
             |
             |          /-sample18
              \edge.15-|
                        \-sample17
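
Unlike UPGMA, neighbor joining does not simply merge the closest pair. At each step it picks the pair that minimizes a corrected criterion, usually written Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). The sketch below computes Q for one step in plain Python. It is not the implementation in ``cogent.phylo.nj``, and the five-taxon distances are a hypothetical toy example.

```python
def nj_q_matrix(names, dist):
    """Q criterion for every unordered pair; `dist` is keyed by (a, b) tuples."""
    n = len(names)
    # Sum of distances from each taxon to all others.
    totals = dict((a, sum(dist[(a, b)] for b in names if b != a))
                  for a in names)
    q = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            q[(a, b)] = (n - 2) * dist[(a, b)] - totals[a] - totals[b]
    return q


names = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8],
          [5, 0, 10, 10, 9],
          [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3],
          [8, 9, 7, 3, 0]]
toy_dist = dict(((names[i], names[j]), matrix[i][j])
                for i in range(5) for j in range(5))

q = nj_q_matrix(names, toy_dist)
first_join = min(q, key=q.get)
```

In this toy matrix the smallest raw distance is between ``d`` and ``e`` (3), yet the pair with the lowest Q is ``a`` and ``b``, which is why NJ and UPGMA can produce different trees from the same distances.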
