~ubuntu-branches/ubuntu/natty/python-cogent/natty

Viewing changes to doc/cookbook/analysis_of_sequence_composition.rst

Committer: Bazaar Package Importer
Author(s): Steffen Moeller
Date: 2010-12-04 22:30:35 UTC
mfrom: (1.1.1 upstream)
Revision ID: james.westby@ubuntu.com-20101204223035-j11kinhcrrdgg2p2

Tags: 1.5-1

* Bumped standard to 3.9.1, no changes required.
* New upstream version.
  - major additions to Cookbook
  - added AlleleFreqs attribute to ensembl Variation objects.
  - added getGeneByStableId method to genome objects.
  - added Introns attribute to Transcript objects and an Intron class.
  - added Mann-Whitney test and a Monte-Carlo version
  - exploratory and confirmatory period estimation techniques (suitable for
    symbolic and continuous data)
  - Information theoretic measures (AIC and BIC) added
  - drawing of trees with collapsed nodes
  - progress display indicator support for terminal and GUI apps
  - added parser for illumina HiSeq2000 and GAiix sequence files as
    cogent.parse.illumina_sequence.MinimalIlluminaSequenceParser.
  - added parser to FASTQ files, one of the output options for illumina's
    workflow, also added cookbook demo.
  - added functionality for parsing of SFF files without the Roche tools in
    cogent.parse.binary_sff
  - thousand fold performance improvement to nmds
  - >10-fold performance improvements to some Table operations

files added:
cogent/cluster/approximate_mds.py

cogent/maths/_period.c

cogent/maths/_period.pyx

cogent/maths/period.py

cogent/maths/stats/information_criteria.py

cogent/maths/stats/period.py

cogent/parse/binary_sff.py

cogent/parse/fastq.py

cogent/parse/illumina_sequence.py

cogent/parse/kegg_ko.py

cogent/parse/kegg_pos.py

cogent/parse/kegg_taxonomy.py

cogent/util/progress_display.py

cogent/util/terminal.py

doc/_static

doc/_static/google_feed.js

doc/cookbook/alphabet.rst

doc/cookbook/checkpointing_long_running.rst

doc/cookbook/ensembl.rst

doc/cookbook/loading_sequences.rst

doc/cookbook/managing_trees.rst

doc/cookbook/moltypesequence.rst

doc/cookbook/parallel_tasks.rst

doc/cookbook/phylonodes.rst

doc/cookbook/structural_contacts.rst

doc/cookbook/structural_data_2.rst

doc/data/1HQF.pdb

doc/data/Crump_et_al_example_env_file.txt

doc/data/Crump_example_tree_newick.txt

doc/data/inseqs_protein.fasta

doc/data/refseqs_protein.fasta

doc/examples/building_and_using_an_application_controller.rst

doc/examples/period_estimation.rst

doc/examples/seqsim_alignment_simulation.rst

doc/examples/seqsim_aln_sim_user_alphabet.rst

doc/examples/seqsim_tree_sim.rst

tests/data/F6AVWTA01.sff

tests/data/fastq.txt

tests/test_cluster/test_approximate_mds.py

tests/test_maths/test_period.py

tests/test_maths/test_stats/test_information_criteria.py

tests/test_maths/test_stats/test_period.py

tests/test_parse/test_binary_sff.py

tests/test_parse/test_fastq.py

tests/test_parse/test_illumina_sequence.py

tests/test_parse/test_kegg_ko.py

tests/test_parse/test_kegg_pos.py

tests/test_parse/test_kegg_taxonomy.py

tests/test_parse/test_mothur.py

tests/test_parse/test_pdb.py

tests/test_parse/test_rna_plot.py

tests/test_parse/test_structure.py

files removed:
tests/test_core/test_tree2.py

files modified:
.pc/fix_python_shebang_line.patch/cogent/align/dp_calculation.py

.pc/fix_python_shebang_line.patch/cogent/data/molecular_weight.py

.pc/fix_python_shebang_line.patch/cogent/format/text_tree.py

.pc/fix_python_shebang_line.patch/cogent/phylo/maximum_likelihood.py

.pc/fix_python_shebang_line.patch/cogent/recalculation/__init__.py

.pc/fix_python_shebang_line.patch/cogent/recalculation/setting.py

ChangeLog

cogent/__init__.py

cogent/align/__init__.py

cogent/align/_compare.c

cogent/align/_compare.pyx

cogent/align/_pairwise_pogs.c

cogent/align/_pairwise_pogs.pyx

cogent/align/_pairwise_seqs.c

cogent/align/_pairwise_seqs.pyx

cogent/align/algorithm.py

cogent/align/align.py

cogent/align/dp_calculation.py

cogent/align/indel_model.py

cogent/align/indel_positions.py

cogent/align/pairwise.py

cogent/align/partial_order_graph.py

cogent/align/progressive.py

cogent/align/pycompare.py

cogent/align/traceback.py

cogent/align/weights/__init__.py

cogent/align/weights/methods.py

cogent/align/weights/util.py

cogent/app/__init__.py

cogent/app/blast.py

cogent/app/carnac.py

cogent/app/cd_hit.py

cogent/app/clearcut.py

cogent/app/clustalw.py

cogent/app/cmfinder.py

cogent/app/comrna.py

cogent/app/consan.py

cogent/app/contrafold.py

cogent/app/cove.py

cogent/app/dialign.py

cogent/app/dotur.py

cogent/app/dynalign.py

cogent/app/fasttree.py

cogent/app/fasttree_v1.py

cogent/app/foldalign.py

cogent/app/formatdb.py

cogent/app/gctmpca.py

cogent/app/ilm.py

cogent/app/infernal.py

cogent/app/knetfold.py

cogent/app/mafft.py

cogent/app/mfold.py

cogent/app/mothur.py

cogent/app/msms.py

cogent/app/muscle.py

cogent/app/nupack.py

cogent/app/parameters.py

cogent/app/pfold.py

cogent/app/pknotsrg.py

cogent/app/raxml.py

cogent/app/rdp_classifier.py

cogent/app/rnaalifold.py

cogent/app/rnaforester.py

cogent/app/rnashapes.py

cogent/app/rnaview.py

cogent/app/sfffile.py

cogent/app/sffinfo.py

cogent/app/sfold.py

cogent/app/stride.py

cogent/app/uclust.py

cogent/app/unafold.py

cogent/app/util.py

cogent/app/vienna_package.py

cogent/cluster/UPGMA.py

cogent/cluster/__init__.py

cogent/cluster/goodness_of_fit.py

cogent/cluster/metric_scaling.py

cogent/cluster/nmds.py

cogent/cluster/procrustes.py

cogent/core/__init__.py

cogent/core/alignment.py

cogent/core/alphabet.py

cogent/core/annotation.py

cogent/core/bitvector.py

cogent/core/entity.py

cogent/core/genetic_code.py

cogent/core/info.py

cogent/core/location.py

cogent/core/moltype.py

cogent/core/profile.py

cogent/core/sequence.py

cogent/core/tree.py

cogent/core/usage.py

cogent/data/__init__.py

cogent/data/energy_params.py

cogent/data/ligand_properties.py

cogent/data/molecular_weight.py

cogent/data/nucleic_properties.py

cogent/data/protein_properties.py

cogent/db/__init__.py

cogent/db/ensembl/__init__.py

cogent/db/ensembl/assembly.py

cogent/db/ensembl/compara.py

cogent/db/ensembl/database.py

cogent/db/ensembl/feature_level.py

cogent/db/ensembl/genome.py

cogent/db/ensembl/host.py

cogent/db/ensembl/name.py

cogent/db/ensembl/region.py

cogent/db/ensembl/related_region.py

cogent/db/ensembl/sequence.py

cogent/db/ensembl/species.py

cogent/db/ensembl/util.py

cogent/db/ncbi.py

cogent/db/pdb.py

cogent/db/rfam.py

cogent/db/util.py

cogent/draw/__init__.py

cogent/draw/arrow_rates.py

cogent/draw/codon_usage.py

cogent/draw/dendrogram.py

cogent/draw/dinuc.py

cogent/draw/dotplot.py

cogent/draw/fancy_arrow.py

cogent/draw/legend.py

cogent/draw/linear.py

cogent/draw/multivariate_plot.py

cogent/draw/rlg2mpl.py

cogent/draw/util.py

cogent/evolve/__init__.py

cogent/evolve/_likelihood_tree.c

cogent/evolve/_likelihood_tree.pyx

cogent/evolve/best_likelihood.py

cogent/evolve/bootstrap.py

cogent/evolve/coevolution.py

cogent/evolve/discrete_markov.py

cogent/evolve/likelihood_calculation.py

cogent/evolve/likelihood_function.py

cogent/evolve/likelihood_tree.py

cogent/evolve/models.py

cogent/evolve/motif_prob_model.py

cogent/evolve/parameter_controller.py

cogent/evolve/predicate.py

cogent/evolve/simulate.py

cogent/evolve/substitution_calculation.py

cogent/evolve/substitution_model.py

cogent/format/__init__.py

cogent/format/alignment.py

cogent/format/clustal.py

cogent/format/fasta.py

cogent/format/mage.py

cogent/format/motif.py

cogent/format/nexus.py

cogent/format/pdb.py

cogent/format/pdb_color.py

cogent/format/phylip.py

cogent/format/rna_struct.py

cogent/format/stockholm.py

cogent/format/structure.py

cogent/format/table.py

cogent/format/text_tree.py

cogent/format/xyzrn.py

cogent/maths/__init__.py

cogent/maths/_matrix_exponentiation.c

cogent/maths/_matrix_exponentiation.pyx

cogent/maths/distance_transform.py

cogent/maths/eigen.c

cogent/maths/function_optimisation.py

cogent/maths/geometry.py

cogent/maths/markov.py

cogent/maths/matrix/__init__.py

cogent/maths/matrix/distance.py

cogent/maths/matrix_exponentiation.py

cogent/maths/matrix_invert.c

cogent/maths/matrix_logarithm.py

cogent/maths/optimiser.py

cogent/maths/optimisers.py

cogent/maths/scipy_optimisers.py

cogent/maths/scipy_optimize.py

cogent/maths/simannealingoptimiser.py

cogent/maths/solve.py

cogent/maths/spatial/__init__.py

cogent/maths/spatial/ckd3.c

cogent/maths/spatial/ckd3.pyx

cogent/maths/stats/__init__.py

cogent/maths/stats/alpha_diversity.py

cogent/maths/stats/cai/__init__.py

cogent/maths/stats/cai/adaptor.py

cogent/maths/stats/cai/get_by_cai.py

cogent/maths/stats/cai/util.py

cogent/maths/stats/distribution.py

cogent/maths/stats/histogram.py

cogent/maths/stats/kendall.py

cogent/maths/stats/ks.py

cogent/maths/stats/rarefaction.py

cogent/maths/stats/special.py

cogent/maths/stats/test.py

cogent/maths/stats/util.py

cogent/maths/svd.py

cogent/maths/unifrac/__init__.py

cogent/maths/unifrac/fast_tree.py

cogent/maths/unifrac/fast_unifrac.py

cogent/motif/__init__.py

cogent/motif/k_word.py

cogent/motif/util.py

cogent/parse/__init__.py

cogent/parse/aaindex.py

cogent/parse/agilent_microarray.py

cogent/parse/blast.py

cogent/parse/blast_xml.py

cogent/parse/bpseq.py

cogent/parse/carnac.py

cogent/parse/cigar.py

cogent/parse/clustal.py

cogent/parse/cmfinder.py

cogent/parse/column.py

cogent/parse/comrna.py

cogent/parse/consan.py

cogent/parse/contrafold.py

cogent/parse/cove.py

cogent/parse/ct.py

cogent/parse/cut.py

cogent/parse/cutg.py

cogent/parse/dialign.py

cogent/parse/dotur.py

cogent/parse/dynalign.py

cogent/parse/ebi.py

cogent/parse/fasta.py

cogent/parse/flowgram.py

cogent/parse/flowgram_collection.py

cogent/parse/flowgram_parser.py

cogent/parse/foldalign.py

cogent/parse/gbseq.py

cogent/parse/gcg.py

cogent/parse/genbank.py

cogent/parse/gff.py

cogent/parse/gibbs.py

cogent/parse/ilm.py

cogent/parse/infernal.py

cogent/parse/knetfold.py

cogent/parse/locuslink.py

cogent/parse/macsim.py

cogent/parse/mage.py

cogent/parse/meme.py

cogent/parse/mfold.py

cogent/parse/mothur.py

cogent/parse/msms.py

cogent/parse/ncbi_taxonomy.py

cogent/parse/newick.py

cogent/parse/nexus.py

cogent/parse/nupack.py

cogent/parse/paml.py

cogent/parse/paml_matrix.py

cogent/parse/pdb.py

cogent/parse/pfold.py

cogent/parse/phylip.py

cogent/parse/pknotsrg.py

cogent/parse/rdb.py

cogent/parse/record.py

cogent/parse/record_finder.py

cogent/parse/rfam.py

cogent/parse/rna_fold.py

cogent/parse/rna_plot.py

cogent/parse/rnaalifold.py

cogent/parse/rnaforester.py

cogent/parse/rnashapes.py

cogent/parse/rnaview.py

cogent/parse/sequence.py

cogent/parse/sfold.py

cogent/parse/sprinzl.py

cogent/parse/stride.py

cogent/parse/structure.py

cogent/parse/table.py

cogent/parse/tinyseq.py

cogent/parse/tree.py

cogent/parse/tree_xml.py

cogent/parse/unafold.py

cogent/parse/unigene.py

cogent/phylo/__init__.py

cogent/phylo/compatibility.py

cogent/phylo/consensus.py

cogent/phylo/distance.py

cogent/phylo/least_squares.py

cogent/phylo/maximum_likelihood.py

cogent/phylo/nj.py

cogent/phylo/tree_collection.py

cogent/phylo/tree_space.py

cogent/phylo/util.py

cogent/recalculation/__init__.py

cogent/recalculation/calculation.py

cogent/recalculation/definition.py

cogent/recalculation/scope.py

cogent/recalculation/setting.py

cogent/seqsim/__init__.py

cogent/seqsim/analysis.py

cogent/seqsim/birth_death.py

cogent/seqsim/markov.py

cogent/seqsim/microarray.py

cogent/seqsim/microarray_normalize.py

cogent/seqsim/randomization.py

cogent/seqsim/searchpath.py

cogent/seqsim/sequence_generators.py

cogent/seqsim/tree.py

cogent/seqsim/usage.py

cogent/struct/__init__.py

cogent/struct/_asa.c

cogent/struct/_asa.pyx

cogent/struct/_contact.c

cogent/struct/_contact.pyx

cogent/struct/annotation.py

cogent/struct/asa.py

cogent/struct/contact.py

cogent/struct/dihedral.py

cogent/struct/knots.py

cogent/struct/manipulation.py

cogent/struct/pairs_util.py

cogent/struct/rna2d.py

cogent/struct/selection.py

cogent/util/__init__.py

cogent/util/array.py

cogent/util/checkpointing.py

cogent/util/datatypes.py

cogent/util/dict2d.py

cogent/util/dict_array.py

cogent/util/misc.py

cogent/util/modules.py

cogent/util/organizer.py

cogent/util/parallel.py

cogent/util/recode_alignment.py

cogent/util/table.py

cogent/util/transform.py

cogent/util/trie.py

cogent/util/unit_test.py

cogent/util/update_version.py

cogent/util/warning.py

debian/changelog

debian/control

doc/conf.py

doc/cookbook/DNA_and_RNA_sequences.rst

doc/cookbook/accessing_databases.rst

doc/cookbook/alignments.rst

doc/cookbook/analysis_of_sequence_composition.rst

doc/cookbook/annotations.rst

doc/cookbook/blast.rst

doc/cookbook/building_alignments.rst

doc/cookbook/building_phylogenies.rst

doc/cookbook/community_analysis.rst

doc/cookbook/dealing_with_hts_data.rst

doc/cookbook/genetic_code.rst

doc/cookbook/hpc_environments.rst

doc/cookbook/index.rst

doc/cookbook/introduction.rst

doc/cookbook/manipulating_biological_data.rst

doc/cookbook/multivariate_data_analysis.rst

doc/cookbook/simple_trees.rst

doc/cookbook/standard_statistical_analyses.rst

doc/cookbook/structural_data.rst

doc/cookbook/tips_for_using_python.rst

doc/cookbook/useful_utilities.rst

doc/cookbook/using_likelihood_to_perform_evolutionary_analyses.rst

doc/data_file_links.rst

doc/examples/alignment_app_controllers.rst

doc/examples/application_controller_framework.rst

doc/examples/calculate_UPGMA_cluster.rst

doc/examples/calculate_neigbourjoining_tree.rst

doc/examples/calculate_pairwise_distances.rst

doc/examples/codon_models.rst

doc/examples/draw_dendrogram.rst

doc/examples/draw_dotplot.rst

doc/examples/empirical_protein_models.rst

doc/examples/estimate_startingpoint.rst

doc/examples/genetic_code_aa_index.rst

doc/examples/handling_3dstructures.rst

doc/examples/hmm_par_heterogeneity.rst

doc/examples/index.rst

doc/examples/maketree_from_proteinseqs.rst

doc/examples/neutral_test.rst

doc/examples/parametric_bootstrap.rst

doc/examples/perform_PCoA_analysis.rst

doc/examples/phylo_by_ls.rst

doc/examples/phylogeny_app_controllers.rst

doc/examples/query_ensembl.rst

doc/examples/query_ncbi.rst

doc/examples/rate_heterogeneity.rst

doc/examples/relative_rate.rst

doc/examples/reuse_results.rst

doc/examples/scope_model_params_on_trees.rst

doc/examples/simple.rst

doc/examples/testing_multi_loci.rst

doc/examples/unrestricted_nucleotide.rst

doc/index.rst

doc/install.rst

doc/templates/layout.html

include/array_interface.h

include/numerical_pyrex.pyx

setup.py

tests/__init__.py

tests/alltests.py

tests/benchmark.py

tests/benchmark_aligning.py

tests/test_align/__init__.py

tests/test_align/test_algorithm.py

tests/test_align/test_align.py

tests/test_align/test_weights/__init__.py

tests/test_align/test_weights/test_methods.py

tests/test_align/test_weights/test_util.py

tests/test_app/__init__.py

tests/test_app/test_blast.py

tests/test_app/test_carnac.py

tests/test_app/test_cd_hit.py

tests/test_app/test_clearcut.py

tests/test_app/test_clustalw.py

tests/test_app/test_cmfinder.py

tests/test_app/test_comrna.py

tests/test_app/test_consan.py

tests/test_app/test_contrafold.py

tests/test_app/test_cove.py

tests/test_app/test_dialign.py

tests/test_app/test_dotur.py

tests/test_app/test_dynalign.py

tests/test_app/test_fasttree.py

tests/test_app/test_fasttree_v1.py

tests/test_app/test_foldalign.py

tests/test_app/test_formatdb.py

tests/test_app/test_gctmpca.py

tests/test_app/test_ilm.py

tests/test_app/test_infernal.py

tests/test_app/test_knetfold.py

tests/test_app/test_mafft.py

tests/test_app/test_mfold.py

tests/test_app/test_mothur.py

tests/test_app/test_msms.py

tests/test_app/test_muscle.py

tests/test_app/test_nupack.py

tests/test_app/test_parameters.py

tests/test_app/test_pfold.py

tests/test_app/test_pknotsrg.py

tests/test_app/test_raxml.py

tests/test_app/test_rdp_classifier.py

tests/test_app/test_rnaalifold.py

tests/test_app/test_rnaforester.py

tests/test_app/test_rnaview.py

tests/test_app/test_sfffile.py

tests/test_app/test_sffinfo.py

tests/test_app/test_sfold.py

tests/test_app/test_stride.py

tests/test_app/test_uclust.py

tests/test_app/test_unafold.py

tests/test_app/test_util.py

tests/test_app/test_vienna_package.py

tests/test_cluster/__init__.py

tests/test_cluster/test_UPGMA.py

tests/test_cluster/test_goodness_of_fit.py

tests/test_cluster/test_metric_scaling.py

tests/test_cluster/test_nmds.py

tests/test_cluster/test_procrustes.py

tests/test_core/__init__.py

tests/test_core/test_alignment.py

tests/test_core/test_alphabet.py

tests/test_core/test_annotation.py

tests/test_core/test_bitvector.py

tests/test_core/test_core_standalone.py

tests/test_core/test_entity.py

tests/test_core/test_genetic_code.py

tests/test_core/test_info.py

tests/test_core/test_location.py

tests/test_core/test_maps.py

tests/test_core/test_moltype.py

tests/test_core/test_profile.py

tests/test_core/test_seq_aln_integration.py

tests/test_core/test_sequence.py

tests/test_core/test_tree.py

tests/test_core/test_usage.py

tests/test_data/__init__.py

tests/test_data/test_molecular_weight.py

tests/test_db/__init__.py

tests/test_db/test_ensembl/__init__.py

tests/test_db/test_ensembl/test_assembly.py

tests/test_db/test_ensembl/test_compara.py

tests/test_db/test_ensembl/test_database.py

tests/test_db/test_ensembl/test_feature_level.py

tests/test_db/test_ensembl/test_genome.py

tests/test_db/test_ensembl/test_host.py

tests/test_db/test_ensembl/test_species.py

tests/test_db/test_ncbi.py

tests/test_db/test_pdb.py

tests/test_db/test_rfam.py

tests/test_db/test_util.py

tests/test_draw.py

tests/test_draw/test_matplotlib/test_arrow_rates.py

tests/test_draw/test_matplotlib/test_codon_usage.py

tests/test_draw/test_matplotlib/test_dinuc.py

tests/test_draw/test_matplotlib/test_multivariate_plot.py

tests/test_evolve/__init__.py

tests/test_evolve/test_best_likelihood.py

tests/test_evolve/test_bootstrap.py

tests/test_evolve/test_coevolution.py

tests/test_evolve/test_likelihood_function.py

tests/test_evolve/test_models.py

tests/test_evolve/test_motifchange.py

tests/test_evolve/test_newq.py

tests/test_evolve/test_parameter_controller.py

tests/test_evolve/test_scale_rules.py

tests/test_evolve/test_simulation.py

tests/test_evolve/test_substitution_model.py

tests/test_format/__init__.py

tests/test_format/test_clustal.py

tests/test_format/test_fasta.py

tests/test_format/test_mage.py

tests/test_format/test_pdb_color.py

tests/test_format/test_stockholm.py

tests/test_format/test_xyzrn.py

tests/test_maths/__init__.py

tests/test_maths/test_distance_transform.py

tests/test_maths/test_function_optimisation.py

tests/test_maths/test_geometry.py

tests/test_maths/test_matrix/__init__.py

tests/test_maths/test_matrix/test_distance.py

tests/test_maths/test_matrix_logarithm.py

tests/test_maths/test_optimisers.py

tests/test_maths/test_spatial/__init__.py

tests/test_maths/test_spatial/test_ckd3.py

tests/test_maths/test_stats/__init__.py

tests/test_maths/test_stats/test_alpha_diversity.py

tests/test_maths/test_stats/test_cai/__init__.py

tests/test_maths/test_stats/test_cai/test_adaptor.py

tests/test_maths/test_stats/test_cai/test_get_by_cai.py

tests/test_maths/test_stats/test_cai/test_util.py

tests/test_maths/test_stats/test_distribution.py

tests/test_maths/test_stats/test_histogram.py

tests/test_maths/test_stats/test_ks.py

tests/test_maths/test_stats/test_rarefaction.py

tests/test_maths/test_stats/test_special.py

tests/test_maths/test_stats/test_test.py

tests/test_maths/test_stats/test_util.py

tests/test_maths/test_svd.py

tests/test_maths/test_unifrac/__init__.py

tests/test_maths/test_unifrac/test_fast_tree.py

tests/test_maths/test_unifrac/test_fast_unifrac.py

tests/test_motif/__init__.py

tests/test_motif/test_util.py

tests/test_parse/__init__.py

tests/test_parse/test_aaindex.py

tests/test_parse/test_agilent_microarray.py

tests/test_parse/test_blast.py

tests/test_parse/test_blast_xml.py

tests/test_parse/test_bpseq.py

tests/test_parse/test_cigar.py

tests/test_parse/test_clustal.py

tests/test_parse/test_column.py

tests/test_parse/test_comrna.py

tests/test_parse/test_consan.py

tests/test_parse/test_cove.py

tests/test_parse/test_ct.py

tests/test_parse/test_cut.py

tests/test_parse/test_cutg.py

tests/test_parse/test_dialign.py

tests/test_parse/test_dotur.py

tests/test_parse/test_ebi.py

tests/test_parse/test_fasta.py

tests/test_parse/test_flowgram.py

tests/test_parse/test_flowgram_collection.py

tests/test_parse/test_flowgram_parser.py

tests/test_parse/test_genbank.py

tests/test_parse/test_gff.py

tests/test_parse/test_gibbs.py

tests/test_parse/test_ilm.py

tests/test_parse/test_infernal.py

tests/test_parse/test_locuslink.py

tests/test_parse/test_mage.py

tests/test_parse/test_meme.py

tests/test_parse/test_msms.py

tests/test_parse/test_ncbi_taxonomy.py

tests/test_parse/test_nexus.py

tests/test_parse/test_nupack.py

tests/test_parse/test_phylip.py

tests/test_parse/test_pknotsrg.py

tests/test_parse/test_rdb.py

tests/test_parse/test_record.py

tests/test_parse/test_record_finder.py

tests/test_parse/test_rfam.py

tests/test_parse/test_rna_fold.py

tests/test_parse/test_rnaalifold.py

tests/test_parse/test_rnaforester.py

tests/test_parse/test_rnaview.py

tests/test_parse/test_sprinzl.py

tests/test_parse/test_stride.py

tests/test_parse/test_tree.py

tests/test_parse/test_unigene.py

tests/test_phylo.py

tests/test_recalculation.rst

tests/test_seqsim/__init__.py

tests/test_seqsim/test_analysis.py

tests/test_seqsim/test_birth_death.py

tests/test_seqsim/test_markov.py

tests/test_seqsim/test_microarray.py

tests/test_seqsim/test_microarray_normalize.py

tests/test_seqsim/test_randomization.py

tests/test_seqsim/test_searchpath.py

tests/test_seqsim/test_sequence_generators.py

tests/test_seqsim/test_tree.py

tests/test_seqsim/test_usage.py

tests/test_struct/__init__.py

tests/test_struct/test_annotation.py

tests/test_struct/test_asa.py

tests/test_struct/test_contact.py

tests/test_struct/test_dihedral.py

tests/test_struct/test_knots.py

tests/test_struct/test_manipulation.py

tests/test_struct/test_pairs_util.py

tests/test_struct/test_rna2d.py

tests/test_struct/test_selection.py

tests/test_util/__init__.py

tests/test_util/test_array.py

tests/test_util/test_dict2d.py

tests/test_util/test_misc.py

tests/test_util/test_organizer.py

tests/test_util/test_recode_alignment.py

tests/test_util/test_table.rst

tests/test_util/test_transform.py

tests/test_util/test_trie.py

tests/test_util/test_unit_test.py

tests/timetrial.py


doc/cookbook/analysis_of_sequence_composition.rst

Analysis of sequence composition
********************************

.. sectionauthor:: Jesse Zaneveld

PyCogent provides several tools for analyzing the composition of DNA, RNA, or protein sequences.

Loading your sequence
=====================

Let us say that we wish to study the sequence composition of the *Y. pseudotuberculosis* PB1 DNA Polymerase III beta subunit.

First we input the sequence as a string.

.. doctest::

    >>> y_pseudo_seq = \
    ... """ atgaaatttatcattgaacgtgagcatctgctaaaaccactgcaacaggtcagtagcccg
    ... ctgggtggacgccctacgttgcctattttgggtaacttgttgctgcaagtcacggaaggc
    ... tctttgcggctgaccggtaccgacttggagatggagatggtggcttgtgttgccttgtct
    ... cagtcccatgagccgggtgctaccacagtacccgcacggaagttttttgatatctggcgt
    ... ggtttacccgaaggggcggaaattacggtagcgttggatggtgatcgcctgctagtgcgc
    ... tctggtcgcagccgtttctcgctgtctaccttgcctgcgattgacttccctaatctggat
    ... gactggcagagtgaggttgaattcactttaccgcaggctacgttaaagcgtctgattgag
    ... tccactcagttttcgatggcccatcaggatgtccgttattatttgaacggcatgctgttt
    ... gagaccgaaggcgaagagttacgtactgtggcgaccgatgggcatcgcttggctgtatgc
    ... tcaatgcctattggccagacgttaccctcacattcggtgatcgtgccgcgtaaaggtgtg
    ... atggagctggttcggttgctggatggtggtgatacccccttgcggctgcaaattggcagt
    ... aataatattcgtgctcatgtgggcgattttattttcacatctaagctggttgatggccgt
    ... ttcccggattatcgccgcgtattgccgaagaatcctgataaaatgctggaagccggttgc
    ... gatttactgaaacaggcattttcgcgtgcggcaattctgtcaaatgagaagttccgtggt
    ... gttcggctctatgtcagccacaatcaactcaaaatcactgctaataatcctgaacaggaa
    ... gaagcagaagagatcctcgatgttagctacgaggggacagaaatggagatcggtttcaac
    ... gtcagctatgtgcttgatgtgctaaatgcactgaagtgcgaagatgtgcgcctgttattg
    ... actgactctgtatccagtgtgcagattgaagacagcgccagccaagctgcagcctatgtc
    ... gtcatgccaatgcgtttgtag"""

To check that our results are reasonable, we can also load a small example string.

.. doctest::

    >>> example_seq = "GCGTTT"

In order to calculate compositional statistics, we need to import one of the ``Usage`` objects from ``cogent.core.usage``, create an object from our string, and normalize the counts it holds into frequencies. ``Usage`` objects include ``BaseUsage``, ``PositionalBaseUsage``, ``CodonUsage``, and ``AminoAcidUsage``.

Let us start with the ``BaseUsage`` object. The first few steps are the same for the other ``Usage`` objects, as we will see below.

GC content
==========

Total GC content
----------------

GC content is one commonly used compositional statistic. To calculate the total GC content of our gene, we need to create and normalize a ``BaseUsage`` object.

.. doctest::

    >>> from cogent.core.usage import BaseUsage
    >>> example_bu = BaseUsage(example_seq)
    >>> # Print raw counts
    >>> print example_bu.content("GC")
    3.0
    >>> example_bu.normalize()
    >>> print example_bu.content("GC")
    0.5

We can now visually verify that the reported GC contents are correct, and use the same technique on our full sequence.

.. doctest::

    >>> y_pseudo_bu = BaseUsage(y_pseudo_seq)
    >>> # Print raw counts
    >>> y_pseudo_bu.content("GC")
    555.0
    >>> y_pseudo_bu.normalize()
    >>> print y_pseudo_bu.content("GC")
    0.50408719346
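
To make the arithmetic behind these numbers explicit, the following is a minimal sketch in plain Python 3 that does not use PyCogent. The function name ``gc_content`` is hypothetical, and ignoring whitespace and other non-base characters is an assumption of the sketch rather than a statement about how ``BaseUsage`` works internally.

```python
def gc_content(seq):
    """Return the fraction of G and C among the base letters of seq."""
    # Assumption of this sketch: keep only A/C/G/T/U so that spaces and
    # line breaks in a multi-line string do not affect the result.
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        return 0.0
    gc_count = sum(1 for b in bases if b in "GC")
    return gc_count / len(bases)


example_seq = "GCGTTT"
print(gc_content(example_seq))  # 0.5 (3 of the 6 bases are G or C)
```

The raw count reported before normalization corresponds to ``gc_count``, and the normalized value corresponds to the returned fraction.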

Positional GC content of Codons
-------------------------------

When analyzing protein coding genes, it is often useful to subdivide the GC content by codon position, in particular the third codon position. ``CodonUsage`` objects allow us to calculate the GC content at each codon position.

First, let us calculate the GC content for the codons in the example sequence as follows.

.. doctest::

    >>> # Import CodonUsage object
    >>> from cogent.core.usage import CodonUsage
    >>> # Create & normalize CodonUsage object
    >>> example_seq_cu = CodonUsage(example_seq)
    >>> example_seq_cu.normalize()
    >>> GC,P1,P2,P3 = example_seq_cu.positionalGC()

Here, GC is the overall GC content for the sequence, while P1, P2, and P3 are the GC content at the first, second, and third codon positions, respectively.

Printing the results for the example gives the following.

.. doctest::

    >>> print "GC:", GC
    GC: 0.5
    >>> print "P1:", P1
    P1: 0.5
    >>> print "P2:", P2
    P2: 0.5
    >>> print "P3:", P3
    P3: 0.5

We can then do the same for our biological sequence.

.. doctest::

    >>> y_pseudo_cu = CodonUsage(y_pseudo_seq)
    >>> y_pseudo_cu.normalize()
    >>> y_pseudo_GC = y_pseudo_cu.positionalGC()
    >>> print y_pseudo_GC
    [0.51874999999999993, 0.58437499999999987, 0.47500000000000009, 0.49687499999999996]

These results could then be fed into downstream analyses.

One important note is that ``CodonUsage`` objects calculate the GC content of codons within nucleotide sequences, rather than the full GC content. Therefore, ``BaseUsage`` rather than ``CodonUsage`` objects should be used for calculating the GC content of non-coding sequences.
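
As an illustration of what a positional GC calculation involves, here is a minimal sketch in plain Python 3. It is not the PyCogent implementation: the function name ``positional_gc`` is hypothetical, and two choices are assumptions of the sketch, namely dropping any trailing partial codon and taking the overall value as the mean of the three positional values.

```python
def positional_gc(seq):
    """Return [overall, pos1, pos2, pos3] GC fractions for a coding sequence."""
    # Assumption: keep only base letters and drop any trailing partial codon.
    bases = [b for b in seq.upper() if b in "ACGTU"]
    usable = len(bases) - len(bases) % 3
    codons = [bases[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return [0.0, 0.0, 0.0, 0.0]
    per_position = []
    for pos in range(3):
        gc = sum(1 for codon in codons if codon[pos] in "GC")
        per_position.append(gc / len(codons))
    # Assumption: overall value is the mean of the three positions.
    overall = sum(per_position) / 3
    return [overall] + per_position


print(positional_gc("GCGTTT"))  # [0.5, 0.5, 0.5, 0.5]
```

For the example string the two codons are ``GCG`` and ``TTT``, so exactly one of the two bases at each codon position is a G or C.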

Total Base Usage
================

A more detailed view of composition incorporates the relative counts or frequencies of all bases. We can calculate total base usage as follows.

.. doctest::

    >>> from cogent.core.usage import BaseUsage
    >>> example_bu = BaseUsage(example_seq)
    >>> # Print raw counts
    >>> for k in example_bu.RequiredKeys:
    ...     print k, example_bu[k]
    A 0.0
    C 1.0
    U 3.0
    G 2.0
    >>> example_bu.normalize()
    >>> for k in example_bu.RequiredKeys:
    ...     print k, example_bu[k]
    A 0.0
    C 0.166666666667
    U 0.5
    G 0.333333333333
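
The same counts can be reproduced by hand, which is a useful sanity check. The following is a minimal plain Python 3 sketch, not PyCogent code; ``base_usage`` is a hypothetical helper, and reporting T as U simply mirrors the keys shown in the output above.

```python
from collections import Counter


def base_usage(seq, normalize=False):
    """Count A, C, U and G in seq (T is reported as U); optionally normalize."""
    rna = seq.upper().replace("T", "U")
    counts = Counter(b for b in rna if b in "ACUG")
    usage = {base: float(counts[base]) for base in "ACUG"}
    if normalize:
        total = sum(usage.values())
        if total:
            # Divide each count by the total to obtain frequencies.
            usage = {base: n / total for base, n in usage.items()}
    return usage


print(base_usage("GCGTTT"))
# {'A': 0.0, 'C': 1.0, 'U': 3.0, 'G': 2.0}
```

Calling ``base_usage("GCGTTT", normalize=True)`` divides each count by 6, giving frequencies that sum to 1.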

Dinucleotide Content
====================

The ``DinucUsage`` object allows us to calculate dinucleotide usage for our sequence.

Dinucleotide usage can be calculated using overlapping, non-overlapping, or '3-1' dinucleotides.

Given the sequence "AATTAAGCC", each method will count dinucleotide usage differently. Overlapping dinucleotide usage will count "AA", "AT", "TT", "TA", "AA", "AG", "GC", "CC". Non-overlapping dinucleotide usage will count "AA", "TT", "AA", "GC". '3-1' dinucleotide usage will count "TT", "AG", that is, the third base of each codon together with the first base of the next codon.

Calculating dinucleotide usage at adjacent third and first codon positions ("3-1" usage) is useful for some applications, such as gene transfer detection, because changes at these positions tend to produce the most conservative amino acid substitutions, and thus are thought to better reflect mutational (rather than selective) pressure.
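
The three counting schemes can be sketched in a few lines of plain Python 3. This is an illustration under stated assumptions, not PyCogent's code: the helper ``dinucleotides`` is hypothetical, its ``overlapping`` argument only imitates the ``Overlapping`` parameter, and the sequence is assumed to start in frame and to contain no whitespace.

```python
def dinucleotides(seq, overlapping=True):
    """List the dinucleotides of seq for overlapping=True, False, or '3-1'."""
    if overlapping == "3-1":
        # Third base of each codon plus the first base of the next codon.
        starts = range(2, len(seq) - 1, 3)
    elif overlapping:
        # Every adjacent pair of bases.
        starts = range(0, len(seq) - 1)
    else:
        # Consecutive pairs that share no base.
        starts = range(0, len(seq) - 1, 2)
    return [seq[i:i + 2] for i in starts]


seq = "AATTAAGCC"
print(dinucleotides(seq, True))   # ['AA', 'AT', 'TT', 'TA', 'AA', 'AG', 'GC', 'CC']
print(dinucleotides(seq, False))  # ['AA', 'TT', 'AA', 'GC']
print(dinucleotides(seq, "3-1"))  # ['TT', 'AG']
```

Note that the string ``"3-1"`` is tested first, because any non-empty string would otherwise be treated as true and fall into the overlapping branch.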

Overlapping dinucleotide content
--------------------------------

To calculate overlapping dinucleotide usage for our *Y. pseudotuberculosis* PB1 sequence, we pass ``Overlapping=True`` when creating the ``DinucUsage`` object.

.. doctest::

    >>> from cogent.core.usage import DinucUsage
    >>> du = DinucUsage(y_pseudo_seq, Overlapping=True)
    >>> du.normalize()

We can inspect individual dinucleotide usages and confirm that the results add to 100% as follows.

.. doctest::

    >>> total = 0.0
    >>> for k in du.RequiredKeys:
    ...     print k, du[k]
    ...     total += du[k]
    UU 0.0757855822551
    UC 0.0517560073937
    UA 0.043438077634
    UG 0.103512014787
    CU 0.0619223659889
    CC 0.0517560073937
    CA 0.0517560073937
    CG 0.0573012939002
    AU 0.0674676524954
    AC 0.043438077634
    AA 0.0573012939002
    AG 0.054528650647
    GU 0.0711645101664
    GC 0.0794824399261
    GA 0.0674676524954
    GG 0.0619223659889
    >>> print "Total:",total
    Total: 1.0

Non-overlapping Dinucleotide Content
------------------------------------

To calculate non-overlapping dinucleotide usage we simply change the ``Overlapping`` parameter to ``False`` when creating the ``DinucUsage`` object.

.. doctest::

    >>> from cogent.core.usage import DinucUsage
    >>> du_no = DinucUsage(y_pseudo_seq, Overlapping=False)
    >>> du_no.normalize()
    >>> total = 0
    >>> for k in du_no.RequiredKeys:
    ...     print k, du_no[k]
    ...     total += du_no[k]
    UU 0.0733082706767
    UC 0.0507518796992
    UA 0.0375939849624
    UG 0.105263157895
    CU 0.0733082706767
    CC 0.046992481203
    CA 0.0394736842105
    CG 0.0601503759398
    AU 0.0751879699248
    AC 0.046992481203
    AA 0.062030075188
    AG 0.0545112781955
    GU 0.0601503759398
    GC 0.0845864661654
    GA 0.0676691729323
    GG 0.062030075188
    >>> print "Total:",total
    Total: 1.0

'3-1' Dinucleotide Content
--------------------------

To calculate dinucleotide usage considering only adjacent third and first codon positions, we set the ``Overlapping`` parameter to ``'3-1'`` when constructing our ``DinucUsage`` object.

.. doctest::

    >>> from cogent.core.usage import DinucUsage
    >>> du_3_1 = DinucUsage(y_pseudo_seq, Overlapping='3-1')
    >>> du_3_1.normalize()
    >>> total = 0
    >>> for k in du_3_1.RequiredKeys:
    ...     print k, du_3_1[k]
    ...     total += du_3_1[k]
    UU 0.0720221606648
    UC 0.0664819944598
    UA 0.0360110803324
    UG 0.0914127423823
    CU 0.0387811634349
    CC 0.0415512465374
    CA 0.0554016620499
    CG 0.0554016620499
    AU 0.0498614958449
    AC 0.0470914127424
    AA 0.0664819944598
    AG 0.0747922437673
    GU 0.0886426592798
    GC 0.0886426592798
    GA 0.0609418282548
    GG 0.0664819944598
    >>> print "Total:",total
    Total: 1.0

Comparing dinucleotide usages
-----------------------------

Above, we noted that there are several ways to calculate dinucleotide usages on a single sequence, and that the choice of method changes the reported frequencies somewhat. How could we quantify the effect this choice makes on the result?

One way to test this is to calculate the Euclidean distance between the resulting frequencies. We can do this using the ``distance`` method of the ``DinucUsage`` object.

.. doctest::

    >>> du_vs_du_3_1_dist = du.distance(du_3_1)

As required of a true distance, the results are independent of the direction of the calculation.

.. doctest::

    >>> du_3_1_vs_du_dist = du_3_1.distance(du)
    >>> print du_3_1_vs_du_dist == du_vs_du_3_1_dist
    True
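
For readers who want to see the calculation itself, the following plain Python 3 sketch computes a Euclidean distance between two frequency dictionaries. It does not call PyCogent; ``euclidean_distance`` is a hypothetical helper and the two small usage dictionaries are invented purely for illustration.

```python
import math


def euclidean_distance(freqs_a, freqs_b):
    """Euclidean distance between two {key: frequency} mappings."""
    # Keys missing from one mapping are treated as frequency 0.0.
    keys = sorted(set(freqs_a) | set(freqs_b))
    squared = sum((freqs_a.get(k, 0.0) - freqs_b.get(k, 0.0)) ** 2 for k in keys)
    return math.sqrt(squared)


# Invented example frequencies, not taken from any real sequence.
usage_a = {"AA": 0.5, "AU": 0.25, "GC": 0.25}
usage_b = {"AA": 0.25, "AU": 0.25, "GC": 0.5}

d_ab = euclidean_distance(usage_a, usage_b)
d_ba = euclidean_distance(usage_b, usage_a)
print(d_ab == d_ba)  # True
```

The symmetry follows from squaring each difference, so swapping the two arguments changes only the sign of terms that are then squared.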

Caution regarding unnormalized distances
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Note that in this case we have already called ``du.normalize()`` on each ``DinucUsage`` object. You MUST call ``du.normalize()`` before calculating distances. Otherwise the distance calculated will be for the dinucleotide counts, rather than frequencies. Distances of counts can be non-zero even for sequences with identical dinucleotide usage, if those sequences are of different lengths.
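
A small worked example shows why this matters. The sketch below is plain Python 3 with hypothetical helper names and two invented toy sequences; it is independent of PyCogent and only illustrates the difference between comparing counts and comparing frequencies.

```python
import math
from collections import Counter


def overlapping_dinuc_counts(seq):
    """Count every adjacent pair of characters in seq."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def normalize(counts):
    """Convert counts to frequencies that sum to 1."""
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()}


def euclidean_distance(a, b):
    keys = sorted(set(a) | set(b))
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


# Two toy sequences with the same dinucleotide proportions but different lengths.
short_counts = overlapping_dinuc_counts("AUAUA")      # AU: 2, UA: 2
long_counts = overlapping_dinuc_counts("AUAUAUAUA")   # AU: 4, UA: 4

print(euclidean_distance(short_counts, long_counts) > 0)                   # True
print(euclidean_distance(normalize(short_counts), normalize(long_counts)))  # 0.0
```

Both sequences use AU and UA equally often, so the distance between frequencies is zero, while the distance between raw counts only reflects the difference in length.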

k-words
-------

*To be written.*

Codon usage analyses
====================

In addition to allowing a more detailed examination of GC content in coding sequences, ``CodonUsage`` objects (as the name implies) let us examine the codon usage of our sequence.

.. doctest::

    >>> from cogent.core.usage import CodonUsage
    >>> y_pseudo_cu = CodonUsage(y_pseudo_seq)
    >>> # Print raw counts
    >>> for k in y_pseudo_cu.RequiredKeys:
    ...     print k, y_pseudo_cu[k]
    UUU 8.0
    UUC 4.0
    UUA 5.0
    UUG 14.0
    UCU 4.0
    UCC 3.0
    UCA 5.0
    UCG 3.0
    UAU 8.0...

Note that before normalization the ``CodonUsage`` object holds raw codon counts. However, for most purposes, we will want frequencies, so we normalize the counts.

.. doctest::

    >>> y_pseudo_cu.normalize()
    >>> # Print normalized frequencies
    >>> for k in y_pseudo_cu.RequiredKeys:
    ...     print k, y_pseudo_cu[k]
    UUU 0.0225988700565
    UUC 0.0112994350282
    UUA 0.0141242937853
    UUG 0.0395480225989
    UCU 0.0112994350282
    UCC 0.00847457627119
    UCA 0.0141242937853
    UCG 0.00847457627119
    UAU 0.0225988700565...
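
The relationship between raw counts and frequencies can be illustrated with a short plain Python 3 sketch. The helpers ``codon_counts`` and ``codon_frequencies`` are hypothetical and are not part of PyCogent. The sketch assumes the sequence starts in frame, strips anything that is not a base letter, and drops a trailing partial codon; PyCogent may treat such input differently, so it is not claimed to reproduce the numbers above.

```python
from collections import Counter


def codon_counts(seq):
    """Count in-frame codons of seq, reporting T as U."""
    # Assumption: remove whitespace and other non-base characters first.
    rna = "".join(b for b in seq.upper().replace("T", "U") if b in "ACGU")
    usable = len(rna) - len(rna) % 3
    return Counter(rna[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(seq):
    """Normalize codon counts so that they sum to 1."""
    counts = codon_counts(seq)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


toy_seq = "ATGAAATTTAAATAG"  # invented example: AUG AAA UUU AAA UAG
print(codon_counts(toy_seq)["AAA"])       # 2
print(codon_frequencies(toy_seq)["AAA"])  # 0.4
```

Normalizing simply divides each codon count by the total number of codons, here 2 out of 5 for ``AAA``.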

Relative Synonymous Codon Usage
-------------------------------

The RSCU or relative synonymous codon usage metric divides the frequency of each codon by the total frequency of all codons encoding the same amino acid.

.. doctest::

    >>> y_pseudo_cu.normalize()
    >>> y_pseudo_rscu = y_pseudo_cu.rscu()
    >>> # Print rscu frequencies
    >>> for k in y_pseudo_rscu.keys():
    ...     print k, y_pseudo_rscu[k]
    ACC 0.263157894737
    GUC 0.238095238095
    ACA 0.210526315789
    ACG 0.263157894737
    AAC 0.4
    CCU 0.315789473684
    UGG 1.0
    AUC 0.266666666667
    GUA 0.190476190476...
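
The definition given above (each codon's frequency divided by the summed frequency of its synonymous codons) can be sketched as follows in plain Python 3. This is an illustration of that definition only, not the ``rscu()`` implementation: the function ``relative_usage`` is hypothetical, the frequencies are invented, and the two-family table is a deliberately tiny subset of the standard genetic code.

```python
def relative_usage(codon_freqs, synonymous_families):
    """Divide each codon frequency by the total for its amino acid."""
    result = {}
    for codons in synonymous_families.values():
        family_total = sum(codon_freqs.get(c, 0.0) for c in codons)
        if family_total == 0:
            continue  # amino acid not observed; nothing to report
        for c in codons:
            result[c] = codon_freqs.get(c, 0.0) / family_total
    return result


# Tiny subset of the standard genetic code, for illustration only.
families = {"Phe": ["UUU", "UUC"], "Trp": ["UGG"]}
# Invented frequencies, not taken from the sequence above.
freqs = {"UUU": 0.375, "UUC": 0.125, "UGG": 0.5}

print(relative_usage(freqs, families))
# {'UUU': 0.75, 'UUC': 0.25, 'UGG': 1.0}
```

Under this definition the values within each synonymous family sum to 1, and an amino acid encoded by a single codon, such as tryptophan (``UGG``), always has the value 1.0.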

PR2 bias
--------

*To be written*

Fingerprint analysis
--------------------

*To be written*

Amino Acid Usage
================

*To be written.*
