~ubuntu-branches/ubuntu/natty/python-cogent/natty

Viewing changes to doc/cookbook/ensembl.rst

Committer: Bazaar Package Importer
Author(s): Steffen Moeller
Date: 2010-12-04 22:30:35 UTC
mfrom: (1.1.1 upstream)
Revision ID: james.westby@ubuntu.com-20101204223035-j11kinhcrrdgg2p2

Tags: 1.5-1

* Bumped standard to 3.9.1, no changes required.
* New upstream version.
  - major additions to Cookbook
  - added AlleleFreqs attribute to ensembl Variation objects.
  - added getGeneByStableId method to genome objects.
  - added Introns attribute to Transcript objects and an Intron class.
  - added Mann-Whitney test and a Monte-Carlo version
  - exploratory and confirmatory period estimation techniques (suitable for
    symbolic and continuous data)
  - Information theoretic measures (AIC and BIC) added
  - drawing of trees with collapsed nodes
  - progress display indicator support for terminal and GUI apps
  - added parser for Illumina HiSeq2000 and GAIIx sequence files as
    cogent.parse.illumina_sequence.MinimalIlluminaSequenceParser.
  - added parser for FASTQ files, one of the output options of Illumina's
    workflow; also added a cookbook demo.
  - added functionality for parsing of SFF files without the Roche tools in
    cogent.parse.binary_sff
  - thousand fold performance improvement to nmds
  - >10-fold performance improvements to some Table operations

files added:
cogent/cluster/approximate_mds.py

cogent/maths/_period.c

cogent/maths/_period.pyx

cogent/maths/period.py

cogent/maths/stats/information_criteria.py

cogent/maths/stats/period.py

cogent/parse/binary_sff.py

cogent/parse/fastq.py

cogent/parse/illumina_sequence.py

cogent/parse/kegg_ko.py

cogent/parse/kegg_pos.py

cogent/parse/kegg_taxonomy.py

cogent/util/progress_display.py

cogent/util/terminal.py

doc/_static

doc/_static/google_feed.js

doc/cookbook/alphabet.rst

doc/cookbook/checkpointing_long_running.rst

doc/cookbook/ensembl.rst

doc/cookbook/loading_sequences.rst

doc/cookbook/managing_trees.rst

doc/cookbook/moltypesequence.rst

doc/cookbook/parallel_tasks.rst

doc/cookbook/phylonodes.rst

doc/cookbook/structural_contacts.rst

doc/cookbook/structural_data_2.rst

doc/data/1HQF.pdb

doc/data/Crump_et_al_example_env_file.txt

doc/data/Crump_example_tree_newick.txt

doc/data/inseqs_protein.fasta

doc/data/refseqs_protein.fasta

doc/examples/building_and_using_an_application_controller.rst

doc/examples/period_estimation.rst

doc/examples/seqsim_alignment_simulation.rst

doc/examples/seqsim_aln_sim_user_alphabet.rst

doc/examples/seqsim_tree_sim.rst

tests/data/F6AVWTA01.sff

tests/data/fastq.txt

tests/test_cluster/test_approximate_mds.py

tests/test_maths/test_period.py

tests/test_maths/test_stats/test_information_criteria.py

tests/test_maths/test_stats/test_period.py

tests/test_parse/test_binary_sff.py

tests/test_parse/test_fastq.py

tests/test_parse/test_illumina_sequence.py

tests/test_parse/test_kegg_ko.py

tests/test_parse/test_kegg_pos.py

tests/test_parse/test_kegg_taxonomy.py

tests/test_parse/test_mothur.py

tests/test_parse/test_pdb.py

tests/test_parse/test_rna_plot.py

tests/test_parse/test_structure.py

files removed:
tests/test_core/test_tree2.py

files modified:
.pc/fix_python_shebang_line.patch/cogent/align/dp_calculation.py

.pc/fix_python_shebang_line.patch/cogent/data/molecular_weight.py

.pc/fix_python_shebang_line.patch/cogent/format/text_tree.py

.pc/fix_python_shebang_line.patch/cogent/phylo/maximum_likelihood.py

.pc/fix_python_shebang_line.patch/cogent/recalculation/__init__.py

.pc/fix_python_shebang_line.patch/cogent/recalculation/setting.py

ChangeLog

cogent/__init__.py

cogent/align/__init__.py

cogent/align/_compare.c

cogent/align/_compare.pyx

cogent/align/_pairwise_pogs.c

cogent/align/_pairwise_pogs.pyx

cogent/align/_pairwise_seqs.c

cogent/align/_pairwise_seqs.pyx

cogent/align/algorithm.py

cogent/align/align.py

cogent/align/dp_calculation.py

cogent/align/indel_model.py

cogent/align/indel_positions.py

cogent/align/pairwise.py

cogent/align/partial_order_graph.py

cogent/align/progressive.py

cogent/align/pycompare.py

cogent/align/traceback.py

cogent/align/weights/__init__.py

cogent/align/weights/methods.py

cogent/align/weights/util.py

cogent/app/__init__.py

cogent/app/blast.py

cogent/app/carnac.py

cogent/app/cd_hit.py

cogent/app/clearcut.py

cogent/app/clustalw.py

cogent/app/cmfinder.py

cogent/app/comrna.py

cogent/app/consan.py

cogent/app/contrafold.py

cogent/app/cove.py

cogent/app/dialign.py

cogent/app/dotur.py

cogent/app/dynalign.py

cogent/app/fasttree.py

cogent/app/fasttree_v1.py

cogent/app/foldalign.py

cogent/app/formatdb.py

cogent/app/gctmpca.py

cogent/app/ilm.py

cogent/app/infernal.py

cogent/app/knetfold.py

cogent/app/mafft.py

cogent/app/mfold.py

cogent/app/mothur.py

cogent/app/msms.py

cogent/app/muscle.py

cogent/app/nupack.py

cogent/app/parameters.py

cogent/app/pfold.py

cogent/app/pknotsrg.py

cogent/app/raxml.py

cogent/app/rdp_classifier.py

cogent/app/rnaalifold.py

cogent/app/rnaforester.py

cogent/app/rnashapes.py

cogent/app/rnaview.py

cogent/app/sfffile.py

cogent/app/sffinfo.py

cogent/app/sfold.py

cogent/app/stride.py

cogent/app/uclust.py

cogent/app/unafold.py

cogent/app/util.py

cogent/app/vienna_package.py

cogent/cluster/UPGMA.py

cogent/cluster/__init__.py

cogent/cluster/goodness_of_fit.py

cogent/cluster/metric_scaling.py

cogent/cluster/nmds.py

cogent/cluster/procrustes.py

cogent/core/__init__.py

cogent/core/alignment.py

cogent/core/alphabet.py

cogent/core/annotation.py

cogent/core/bitvector.py

cogent/core/entity.py

cogent/core/genetic_code.py

cogent/core/info.py

cogent/core/location.py

cogent/core/moltype.py

cogent/core/profile.py

cogent/core/sequence.py

cogent/core/tree.py

cogent/core/usage.py

cogent/data/__init__.py

cogent/data/energy_params.py

cogent/data/ligand_properties.py

cogent/data/molecular_weight.py

cogent/data/nucleic_properties.py

cogent/data/protein_properties.py

cogent/db/__init__.py

cogent/db/ensembl/__init__.py

cogent/db/ensembl/assembly.py

cogent/db/ensembl/compara.py

cogent/db/ensembl/database.py

cogent/db/ensembl/feature_level.py

cogent/db/ensembl/genome.py

cogent/db/ensembl/host.py

cogent/db/ensembl/name.py

cogent/db/ensembl/region.py

cogent/db/ensembl/related_region.py

cogent/db/ensembl/sequence.py

cogent/db/ensembl/species.py

cogent/db/ensembl/util.py

cogent/db/ncbi.py

cogent/db/pdb.py

cogent/db/rfam.py

cogent/db/util.py

cogent/draw/__init__.py

cogent/draw/arrow_rates.py

cogent/draw/codon_usage.py

cogent/draw/dendrogram.py

cogent/draw/dinuc.py

cogent/draw/dotplot.py

cogent/draw/fancy_arrow.py

cogent/draw/legend.py

cogent/draw/linear.py

cogent/draw/multivariate_plot.py

cogent/draw/rlg2mpl.py

cogent/draw/util.py

cogent/evolve/__init__.py

cogent/evolve/_likelihood_tree.c

cogent/evolve/_likelihood_tree.pyx

cogent/evolve/best_likelihood.py

cogent/evolve/bootstrap.py

cogent/evolve/coevolution.py

cogent/evolve/discrete_markov.py

cogent/evolve/likelihood_calculation.py

cogent/evolve/likelihood_function.py

cogent/evolve/likelihood_tree.py

cogent/evolve/models.py

cogent/evolve/motif_prob_model.py

cogent/evolve/parameter_controller.py

cogent/evolve/predicate.py

cogent/evolve/simulate.py

cogent/evolve/substitution_calculation.py

cogent/evolve/substitution_model.py

cogent/format/__init__.py

cogent/format/alignment.py

cogent/format/clustal.py

cogent/format/fasta.py

cogent/format/mage.py

cogent/format/motif.py

cogent/format/nexus.py

cogent/format/pdb.py

cogent/format/pdb_color.py

cogent/format/phylip.py

cogent/format/rna_struct.py

cogent/format/stockholm.py

cogent/format/structure.py

cogent/format/table.py

cogent/format/text_tree.py

cogent/format/xyzrn.py

cogent/maths/__init__.py

cogent/maths/_matrix_exponentiation.c

cogent/maths/_matrix_exponentiation.pyx

cogent/maths/distance_transform.py

cogent/maths/eigen.c

cogent/maths/function_optimisation.py

cogent/maths/geometry.py

cogent/maths/markov.py

cogent/maths/matrix/__init__.py

cogent/maths/matrix/distance.py

cogent/maths/matrix_exponentiation.py

cogent/maths/matrix_invert.c

cogent/maths/matrix_logarithm.py

cogent/maths/optimiser.py

cogent/maths/optimisers.py

cogent/maths/scipy_optimisers.py

cogent/maths/scipy_optimize.py

cogent/maths/simannealingoptimiser.py

cogent/maths/solve.py

cogent/maths/spatial/__init__.py

cogent/maths/spatial/ckd3.c

cogent/maths/spatial/ckd3.pyx

cogent/maths/stats/__init__.py

cogent/maths/stats/alpha_diversity.py

cogent/maths/stats/cai/__init__.py

cogent/maths/stats/cai/adaptor.py

cogent/maths/stats/cai/get_by_cai.py

cogent/maths/stats/cai/util.py

cogent/maths/stats/distribution.py

cogent/maths/stats/histogram.py

cogent/maths/stats/kendall.py

cogent/maths/stats/ks.py

cogent/maths/stats/rarefaction.py

cogent/maths/stats/special.py

cogent/maths/stats/test.py

cogent/maths/stats/util.py

cogent/maths/svd.py

cogent/maths/unifrac/__init__.py

cogent/maths/unifrac/fast_tree.py

cogent/maths/unifrac/fast_unifrac.py

cogent/motif/__init__.py

cogent/motif/k_word.py

cogent/motif/util.py

cogent/parse/__init__.py

cogent/parse/aaindex.py

cogent/parse/agilent_microarray.py

cogent/parse/blast.py

cogent/parse/blast_xml.py

cogent/parse/bpseq.py

cogent/parse/carnac.py

cogent/parse/cigar.py

cogent/parse/clustal.py

cogent/parse/cmfinder.py

cogent/parse/column.py

cogent/parse/comrna.py

cogent/parse/consan.py

cogent/parse/contrafold.py

cogent/parse/cove.py

cogent/parse/ct.py

cogent/parse/cut.py

cogent/parse/cutg.py

cogent/parse/dialign.py

cogent/parse/dotur.py

cogent/parse/dynalign.py

cogent/parse/ebi.py

cogent/parse/fasta.py

cogent/parse/flowgram.py

cogent/parse/flowgram_collection.py

cogent/parse/flowgram_parser.py

cogent/parse/foldalign.py

cogent/parse/gbseq.py

cogent/parse/gcg.py

cogent/parse/genbank.py

cogent/parse/gff.py

cogent/parse/gibbs.py

cogent/parse/ilm.py

cogent/parse/infernal.py

cogent/parse/knetfold.py

cogent/parse/locuslink.py

cogent/parse/macsim.py

cogent/parse/mage.py

cogent/parse/meme.py

cogent/parse/mfold.py

cogent/parse/mothur.py

cogent/parse/msms.py

cogent/parse/ncbi_taxonomy.py

cogent/parse/newick.py

cogent/parse/nexus.py

cogent/parse/nupack.py

cogent/parse/paml.py

cogent/parse/paml_matrix.py

cogent/parse/pdb.py

cogent/parse/pfold.py

cogent/parse/phylip.py

cogent/parse/pknotsrg.py

cogent/parse/rdb.py

cogent/parse/record.py

cogent/parse/record_finder.py

cogent/parse/rfam.py

cogent/parse/rna_fold.py

cogent/parse/rna_plot.py

cogent/parse/rnaalifold.py

cogent/parse/rnaforester.py

cogent/parse/rnashapes.py

cogent/parse/rnaview.py

cogent/parse/sequence.py

cogent/parse/sfold.py

cogent/parse/sprinzl.py

cogent/parse/stride.py

cogent/parse/structure.py

cogent/parse/table.py

cogent/parse/tinyseq.py

cogent/parse/tree.py

cogent/parse/tree_xml.py

cogent/parse/unafold.py

cogent/parse/unigene.py

cogent/phylo/__init__.py

cogent/phylo/compatibility.py

cogent/phylo/consensus.py

cogent/phylo/distance.py

cogent/phylo/least_squares.py

cogent/phylo/maximum_likelihood.py

cogent/phylo/nj.py

cogent/phylo/tree_collection.py

cogent/phylo/tree_space.py

cogent/phylo/util.py

cogent/recalculation/__init__.py

cogent/recalculation/calculation.py

cogent/recalculation/definition.py

cogent/recalculation/scope.py

cogent/recalculation/setting.py

cogent/seqsim/__init__.py

cogent/seqsim/analysis.py

cogent/seqsim/birth_death.py

cogent/seqsim/markov.py

cogent/seqsim/microarray.py

cogent/seqsim/microarray_normalize.py

cogent/seqsim/randomization.py

cogent/seqsim/searchpath.py

cogent/seqsim/sequence_generators.py

cogent/seqsim/tree.py

cogent/seqsim/usage.py

cogent/struct/__init__.py

cogent/struct/_asa.c

cogent/struct/_asa.pyx

cogent/struct/_contact.c

cogent/struct/_contact.pyx

cogent/struct/annotation.py

cogent/struct/asa.py

cogent/struct/contact.py

cogent/struct/dihedral.py

cogent/struct/knots.py

cogent/struct/manipulation.py

cogent/struct/pairs_util.py

cogent/struct/rna2d.py

cogent/struct/selection.py

cogent/util/__init__.py

cogent/util/array.py

cogent/util/checkpointing.py

cogent/util/datatypes.py

cogent/util/dict2d.py

cogent/util/dict_array.py

cogent/util/misc.py

cogent/util/modules.py

cogent/util/organizer.py

cogent/util/parallel.py

cogent/util/recode_alignment.py

cogent/util/table.py

cogent/util/transform.py

cogent/util/trie.py

cogent/util/unit_test.py

cogent/util/update_version.py

cogent/util/warning.py

debian/changelog

debian/control

doc/conf.py

doc/cookbook/DNA_and_RNA_sequences.rst

doc/cookbook/accessing_databases.rst

doc/cookbook/alignments.rst

doc/cookbook/analysis_of_sequence_composition.rst

doc/cookbook/annotations.rst

doc/cookbook/blast.rst

doc/cookbook/building_alignments.rst

doc/cookbook/building_phylogenies.rst

doc/cookbook/community_analysis.rst

doc/cookbook/dealing_with_hts_data.rst

doc/cookbook/genetic_code.rst

doc/cookbook/hpc_environments.rst

doc/cookbook/index.rst

doc/cookbook/introduction.rst

doc/cookbook/manipulating_biological_data.rst

doc/cookbook/multivariate_data_analysis.rst

doc/cookbook/simple_trees.rst

doc/cookbook/standard_statistical_analyses.rst

doc/cookbook/structural_data.rst

doc/cookbook/tips_for_using_python.rst

doc/cookbook/useful_utilities.rst

doc/cookbook/using_likelihood_to_perform_evolutionary_analyses.rst

doc/data_file_links.rst

doc/examples/alignment_app_controllers.rst

doc/examples/application_controller_framework.rst

doc/examples/calculate_UPGMA_cluster.rst

doc/examples/calculate_neigbourjoining_tree.rst

doc/examples/calculate_pairwise_distances.rst

doc/examples/codon_models.rst

doc/examples/draw_dendrogram.rst

doc/examples/draw_dotplot.rst

doc/examples/empirical_protein_models.rst

doc/examples/estimate_startingpoint.rst

doc/examples/genetic_code_aa_index.rst

doc/examples/handling_3dstructures.rst

doc/examples/hmm_par_heterogeneity.rst

doc/examples/index.rst

doc/examples/maketree_from_proteinseqs.rst

doc/examples/neutral_test.rst

doc/examples/parametric_bootstrap.rst

doc/examples/perform_PCoA_analysis.rst

doc/examples/phylo_by_ls.rst

doc/examples/phylogeny_app_controllers.rst

doc/examples/query_ensembl.rst

doc/examples/query_ncbi.rst

doc/examples/rate_heterogeneity.rst

doc/examples/relative_rate.rst

doc/examples/reuse_results.rst

doc/examples/scope_model_params_on_trees.rst

doc/examples/simple.rst

doc/examples/testing_multi_loci.rst

doc/examples/unrestricted_nucleotide.rst

doc/index.rst

doc/install.rst

doc/templates/layout.html

include/array_interface.h

include/numerical_pyrex.pyx

setup.py

tests/__init__.py

tests/alltests.py

tests/benchmark.py

tests/benchmark_aligning.py

tests/test_align/__init__.py

tests/test_align/test_algorithm.py

tests/test_align/test_align.py

tests/test_align/test_weights/__init__.py

tests/test_align/test_weights/test_methods.py

tests/test_align/test_weights/test_util.py

tests/test_app/__init__.py

tests/test_app/test_blast.py

tests/test_app/test_carnac.py

tests/test_app/test_cd_hit.py

tests/test_app/test_clearcut.py

tests/test_app/test_clustalw.py

tests/test_app/test_cmfinder.py

tests/test_app/test_comrna.py

tests/test_app/test_consan.py

tests/test_app/test_contrafold.py

tests/test_app/test_cove.py

tests/test_app/test_dialign.py

tests/test_app/test_dotur.py

tests/test_app/test_dynalign.py

tests/test_app/test_fasttree.py

tests/test_app/test_fasttree_v1.py

tests/test_app/test_foldalign.py

tests/test_app/test_formatdb.py

tests/test_app/test_gctmpca.py

tests/test_app/test_ilm.py

tests/test_app/test_infernal.py

tests/test_app/test_knetfold.py

tests/test_app/test_mafft.py

tests/test_app/test_mfold.py

tests/test_app/test_mothur.py

tests/test_app/test_msms.py

tests/test_app/test_muscle.py

tests/test_app/test_nupack.py

tests/test_app/test_parameters.py

tests/test_app/test_pfold.py

tests/test_app/test_pknotsrg.py

tests/test_app/test_raxml.py

tests/test_app/test_rdp_classifier.py

tests/test_app/test_rnaalifold.py

tests/test_app/test_rnaforester.py

tests/test_app/test_rnaview.py

tests/test_app/test_sfffile.py

tests/test_app/test_sffinfo.py

tests/test_app/test_sfold.py

tests/test_app/test_stride.py

tests/test_app/test_uclust.py

tests/test_app/test_unafold.py

tests/test_app/test_util.py

tests/test_app/test_vienna_package.py

tests/test_cluster/__init__.py

tests/test_cluster/test_UPGMA.py

tests/test_cluster/test_goodness_of_fit.py

tests/test_cluster/test_metric_scaling.py

tests/test_cluster/test_nmds.py

tests/test_cluster/test_procrustes.py

tests/test_core/__init__.py

tests/test_core/test_alignment.py

tests/test_core/test_alphabet.py

tests/test_core/test_annotation.py

tests/test_core/test_bitvector.py

tests/test_core/test_core_standalone.py

tests/test_core/test_entity.py

tests/test_core/test_genetic_code.py

tests/test_core/test_info.py

tests/test_core/test_location.py

tests/test_core/test_maps.py

tests/test_core/test_moltype.py

tests/test_core/test_profile.py

tests/test_core/test_seq_aln_integration.py

tests/test_core/test_sequence.py

tests/test_core/test_tree.py

tests/test_core/test_usage.py

tests/test_data/__init__.py

tests/test_data/test_molecular_weight.py

tests/test_db/__init__.py

tests/test_db/test_ensembl/__init__.py

tests/test_db/test_ensembl/test_assembly.py

tests/test_db/test_ensembl/test_compara.py

tests/test_db/test_ensembl/test_database.py

tests/test_db/test_ensembl/test_feature_level.py

tests/test_db/test_ensembl/test_genome.py

tests/test_db/test_ensembl/test_host.py

tests/test_db/test_ensembl/test_species.py

tests/test_db/test_ncbi.py

tests/test_db/test_pdb.py

tests/test_db/test_rfam.py

tests/test_db/test_util.py

tests/test_draw.py

tests/test_draw/test_matplotlib/test_arrow_rates.py

tests/test_draw/test_matplotlib/test_codon_usage.py

tests/test_draw/test_matplotlib/test_dinuc.py

tests/test_draw/test_matplotlib/test_multivariate_plot.py

tests/test_evolve/__init__.py

tests/test_evolve/test_best_likelihood.py

tests/test_evolve/test_bootstrap.py

tests/test_evolve/test_coevolution.py

tests/test_evolve/test_likelihood_function.py

tests/test_evolve/test_models.py

tests/test_evolve/test_motifchange.py

tests/test_evolve/test_newq.py

tests/test_evolve/test_parameter_controller.py

tests/test_evolve/test_scale_rules.py

tests/test_evolve/test_simulation.py

tests/test_evolve/test_substitution_model.py

tests/test_format/__init__.py

tests/test_format/test_clustal.py

tests/test_format/test_fasta.py

tests/test_format/test_mage.py

tests/test_format/test_pdb_color.py

tests/test_format/test_stockholm.py

tests/test_format/test_xyzrn.py

tests/test_maths/__init__.py

tests/test_maths/test_distance_transform.py

tests/test_maths/test_function_optimisation.py

tests/test_maths/test_geometry.py

tests/test_maths/test_matrix/__init__.py

tests/test_maths/test_matrix/test_distance.py

tests/test_maths/test_matrix_logarithm.py

tests/test_maths/test_optimisers.py

tests/test_maths/test_spatial/__init__.py

tests/test_maths/test_spatial/test_ckd3.py

tests/test_maths/test_stats/__init__.py

tests/test_maths/test_stats/test_alpha_diversity.py

tests/test_maths/test_stats/test_cai/__init__.py

tests/test_maths/test_stats/test_cai/test_adaptor.py

tests/test_maths/test_stats/test_cai/test_get_by_cai.py

tests/test_maths/test_stats/test_cai/test_util.py

tests/test_maths/test_stats/test_distribution.py

tests/test_maths/test_stats/test_histogram.py

tests/test_maths/test_stats/test_ks.py

tests/test_maths/test_stats/test_rarefaction.py

tests/test_maths/test_stats/test_special.py

tests/test_maths/test_stats/test_test.py

tests/test_maths/test_stats/test_util.py

tests/test_maths/test_svd.py

tests/test_maths/test_unifrac/__init__.py

tests/test_maths/test_unifrac/test_fast_tree.py

tests/test_maths/test_unifrac/test_fast_unifrac.py

tests/test_motif/__init__.py

tests/test_motif/test_util.py

tests/test_parse/__init__.py

tests/test_parse/test_aaindex.py

tests/test_parse/test_agilent_microarray.py

tests/test_parse/test_blast.py

tests/test_parse/test_blast_xml.py

tests/test_parse/test_bpseq.py

tests/test_parse/test_cigar.py

tests/test_parse/test_clustal.py

tests/test_parse/test_column.py

tests/test_parse/test_comrna.py

tests/test_parse/test_consan.py

tests/test_parse/test_cove.py

tests/test_parse/test_ct.py

tests/test_parse/test_cut.py

tests/test_parse/test_cutg.py

tests/test_parse/test_dialign.py

tests/test_parse/test_dotur.py

tests/test_parse/test_ebi.py

tests/test_parse/test_fasta.py

tests/test_parse/test_flowgram.py

tests/test_parse/test_flowgram_collection.py

tests/test_parse/test_flowgram_parser.py

tests/test_parse/test_genbank.py

tests/test_parse/test_gff.py

tests/test_parse/test_gibbs.py

tests/test_parse/test_ilm.py

tests/test_parse/test_infernal.py

tests/test_parse/test_locuslink.py

tests/test_parse/test_mage.py

tests/test_parse/test_meme.py

tests/test_parse/test_msms.py

tests/test_parse/test_ncbi_taxonomy.py

tests/test_parse/test_nexus.py

tests/test_parse/test_nupack.py

tests/test_parse/test_phylip.py

tests/test_parse/test_pknotsrg.py

tests/test_parse/test_rdb.py

tests/test_parse/test_record.py

tests/test_parse/test_record_finder.py

tests/test_parse/test_rfam.py

tests/test_parse/test_rna_fold.py

tests/test_parse/test_rnaalifold.py

tests/test_parse/test_rnaforester.py

tests/test_parse/test_rnaview.py

tests/test_parse/test_sprinzl.py

tests/test_parse/test_stride.py

tests/test_parse/test_tree.py

tests/test_parse/test_unigene.py

tests/test_phylo.py

tests/test_recalculation.rst

tests/test_seqsim/__init__.py

tests/test_seqsim/test_analysis.py

tests/test_seqsim/test_birth_death.py

tests/test_seqsim/test_markov.py

tests/test_seqsim/test_microarray.py

tests/test_seqsim/test_microarray_normalize.py

tests/test_seqsim/test_randomization.py

tests/test_seqsim/test_searchpath.py

tests/test_seqsim/test_sequence_generators.py

tests/test_seqsim/test_tree.py

tests/test_seqsim/test_usage.py

tests/test_struct/__init__.py

tests/test_struct/test_annotation.py

tests/test_struct/test_asa.py

tests/test_struct/test_contact.py

tests/test_struct/test_dihedral.py

tests/test_struct/test_knots.py

tests/test_struct/test_manipulation.py

tests/test_struct/test_pairs_util.py

tests/test_struct/test_rna2d.py

tests/test_struct/test_selection.py

tests/test_util/__init__.py

tests/test_util/test_array.py

tests/test_util/test_dict2d.py

tests/test_util/test_misc.py

tests/test_util/test_organizer.py

tests/test_util/test_recode_alignment.py

tests/test_util/test_table.rst

tests/test_util/test_transform.py

tests/test_util/test_trie.py

tests/test_util/test_unit_test.py

tests/timetrial.py


doc/cookbook/ensembl.rst

Note that much more extensive documentation is available in :ref:`query-ensembl`.

Connecting
----------

.. Gavin Huttley

`Ensembl <http://www.ensembl.org>`_ provides direct access to its MySQL databases; alternatively, users can download those databases and run them on a local machine. Nothing special is needed to query Ensembl's UK servers, as this is the default setting for PyCogent's ``ensembl`` module. To use a different Ensembl installation, create an account instance:

.. doctest::

    >>> from cogent.db.ensembl import HostAccount
    >>> account = HostAccount('fastcomputer.topuni.edu', 'username',
    ...                       'canthackthis')

To connect to MySQL on a specific port:

.. doctest::

    >>> from cogent.db.ensembl import HostAccount
    >>> account = HostAccount('fastcomputer.topuni.edu', 'dude',
    ...                       'ucanthackthis', port=3306)

.. we create valid account now to work on my local machines here at ANU

.. doctest::
    :hide:

    >>> import os
    >>> uname, passwd = os.environ['ENSEMBL_ACCOUNT'].split()
    >>> account = HostAccount('cg.anu.edu.au', uname, passwd)
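
The hidden setup block reads an ``ENSEMBL_ACCOUNT`` environment variable that holds a username and a password separated by whitespace, and it fails with an unhelpful error if the variable is missing or malformed. A minimal sketch of a more defensive reader follows. The helper name ``read_account_from_env`` and its ``environ`` parameter are hypothetical and not part of PyCogent, and the snippet is plain Python that needs no database.

```python
import os


def read_account_from_env(var="ENSEMBL_ACCOUNT", environ=None):
    """Return (username, password) parsed from a 'username password' string."""
    environ = os.environ if environ is None else environ
    raw = environ.get(var)
    if raw is None:
        raise RuntimeError("%s is not set" % var)
    parts = raw.split()
    if len(parts) != 2:
        raise RuntimeError("%s must hold exactly two fields" % var)
    return parts[0], parts[1]


# Demonstration with a stand-in mapping, so no real credentials are needed.
uname, passwd = read_account_from_env(
    environ={"ENSEMBL_ACCOUNT": "dude ucanthackthis"})
print(uname)
```

The two returned values could then be passed to ``HostAccount`` exactly as in the doctest.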

Species to be queried
---------------------

To see which species are currently available:

.. doctest::

    >>> from cogent.db.ensembl import Species
    >>> print Species
    ================================================================================
               Common Name                 Species Name            Ensembl Db Prefix
    --------------------------------------------------------------------------------
                 A.aegypti                Aedes aegypti                aedes_aegypti
                    Alpaca                Vicugna pacos             vicugna_pacos...

If Ensembl has added a new species that is not yet included in ``Species``, you can add it yourself.

.. doctest::

    >>> Species.amendSpecies('A latinname', 'a common name')

You can get the common name for a species

.. doctest::

    >>> Species.getCommonName('Procavia capensis')
    'Rock hyrax'

and the Ensembl database name prefix, which is used for all databases for this species.

.. doctest::

    >>> Species.getEnsemblDbPrefix('Procavia capensis')
    'procavia_capensis'
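
Judging only from the outputs shown above, the database prefix looks like the Latin name in lower case with spaces replaced by underscores. The sketch below reproduces that pattern for the three species that appear in this section. ``guess_db_prefix`` is a hypothetical helper, not PyCogent's implementation, and ``Species.getEnsemblDbPrefix`` remains the authoritative lookup.

```python
def guess_db_prefix(latin_name):
    """Lower-case a Latin binomial and join its words with underscores."""
    return "_".join(latin_name.lower().split())


for name in ("Procavia capensis", "Aedes aegypti", "Vicugna pacos"):
    print(name, "->", guess_db_prefix(name))
```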

Get genomic features
--------------------

Find a gene by gene symbol
^^^^^^^^^^^^^^^^^^^^^^^^^^

We query for the *BRCA2* gene for humans.

.. doctest::

    >>> from cogent.db.ensembl import Genome
    >>> human = Genome('human', Release=58, account=account)
    >>> print human
    Genome(Species='Homo sapiens'; Release='58')
    >>> genes = human.getGenesMatching(Symbol='BRCA2')
    >>> for gene in genes:
    ...     if gene.Symbol == 'BRCA2':
    ...         print gene
    ...         break
    Gene(Species='Homo sapiens'; BioType='protein_coding'; Description='breast cancer 2,...'; StableId='ENSG00000139618'; Status='KNOWN'; Symbol='BRCA2')
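
The loop above keeps only the record whose ``Symbol`` is exactly ``'BRCA2'``, which suggests that a symbol query can yield more than one candidate. The same selection can be written as a small function. This is a sketch under stated assumptions: ``FakeGene`` is a stand-in for the real gene objects, ``first_exact_symbol`` is a hypothetical helper, and the first candidate's symbol and identifier are made up for illustration.

```python
from collections import namedtuple

# Stand-in for gene records; real code would iterate over the result of
# human.getGenesMatching(Symbol='BRCA2') instead.
FakeGene = namedtuple("FakeGene", ["Symbol", "StableId"])


def first_exact_symbol(genes, symbol):
    """Return the first gene whose Symbol equals `symbol`, or None."""
    return next((g for g in genes if g.Symbol == symbol), None)


candidates = [
    FakeGene("BRCA2-like", "made-up-id"),   # invented near match
    FakeGene("BRCA2", "ENSG00000139618"),   # identifier from the doctest
]
match = first_exact_symbol(candidates, "BRCA2")
print(match.StableId)
```

Returning ``None`` when nothing matches lets the caller distinguish "no such gene" from an error.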

Find a gene by Ensembl Stable ID
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We use the stable ID for *BRCA2*.

.. doctest::

    >>> from cogent.db.ensembl import Genome
    >>> human = Genome('human', Release=58, account=account)
    >>> gene = human.getGeneByStableId(StableId='ENSG00000139618')
    >>> print gene
    Gene(Species='Homo sapiens'; BioType='protein_coding'; Description='breast cancer 2,...'; StableId='ENSG00000139618'; Status='KNOWN'; Symbol='BRCA2')
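
The human stable IDs that appear in this document share one shape: the prefix ``ENS``, a letter for the feature type, and eleven digits. A quick check of that shape before querying can catch a symbol passed by mistake. The sketch below covers only the three human-style prefixes seen here (``ENSG``, ``ENST``, ``ENSE``). ``classify_stable_id`` is a hypothetical helper, and identifiers for other species or feature types are deliberately not handled.

```python
import re

# Prefixes seen in this document: ENSG (gene), ENST (transcript), ENSE (exon).
_STABLE_ID = re.compile(r"^ENS([GTE])(\d{11})$")
_KINDS = {"G": "gene", "T": "transcript", "E": "exon"}


def classify_stable_id(stable_id):
    """Return 'gene', 'transcript' or 'exon', or None if the shape differs."""
    m = _STABLE_ID.match(stable_id)
    return _KINDS[m.group(1)] if m else None


for sid in ("ENSG00000139618", "ENST00000380152", "ENSE00001184784", "BRCA2"):
    print(sid, classify_stable_id(sid))
```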

Find genes matching a description
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We look for breast cancer related genes that are estrogen induced.

.. doctest::

    >>> from cogent.db.ensembl import Genome
    >>> human = Genome('human', Release=58, account=account)
    >>> genes = human.getGenesMatching(Description='breast cancer estrogen')
    >>> for gene in genes:
    ...     print gene
    Gene(Species='Homo sapiens'; BioType='protein_coding'; Description='breast cancer estrogen-induced...'; StableId='ENSG00000181097'; Status='KNOWN'; Symbol='AC105219.1')

Get canonical transcript for a gene
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We get the canonical transcript for *BRCA2*.

.. doctest::

    >>> from cogent.db.ensembl import Genome
    >>> human = Genome('human', Release=58, account=account)
    >>> brca2 = human.getGeneByStableId(StableId='ENSG00000139618')
    >>> transcript = brca2.CanonicalTranscript
    >>> print transcript
    Transcript(Species='Homo sapiens'; CoordName='13'; Start=32889610; End=32973347; length=83737; Strand='+')

Get the CDS for a transcript
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. doctest::

    >>> from cogent.db.ensembl import Genome
    >>> human = Genome('human', Release=58, account=account)
    >>> brca2 = human.getGeneByStableId(StableId='ENSG00000139618')
    >>> transcript = brca2.CanonicalTranscript
    >>> cds = transcript.Cds
    >>> print type(cds)
    >>> print cds
    ATGCCTATTGGATCCAAAGAGAGGCCA...
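
A coding sequence is read in consecutive three-base codons and conventionally begins with the start codon ``ATG``, which the printed prefix does. The sketch below splits that prefix into codons using plain Python strings. The ``codons`` helper is hypothetical, and only the 27 bases shown in the output are used because the rest of the sequence is elided.

```python
# Prefix of the CDS as printed in the doctest (the real sequence is longer).
cds_prefix = "ATGCCTATTGGATCCAAAGAGAGGCCA"


def codons(seq):
    """Split a sequence into consecutive, complete three-base codons."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


triplets = codons(cds_prefix)
print(triplets)
print(triplets[0] == "ATG")
```

Trailing bases that do not fill a complete codon are dropped rather than padded.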

Look at all transcripts for a gene
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. doctest::

    >>> from cogent.db.ensembl import Genome
    >>> human = Genome('human', Release=58, account=account)
    >>> brca2 = human.getGeneByStableId(StableId='ENSG00000139618')
    >>> for transcript in brca2.Transcripts:
    ...     print transcript
    Transcript(Species='Homo sapiens'; CoordName='13'; Start=32889610; End=32973347; length=83737; Strand='+')
    Transcript(Species='Homo sapiens'; CoordName='13'; Start=32953976; End=32972409; length=18433; Strand='+')
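
In the output above, ``length`` equals ``End - Start`` for both transcripts, which is consistent with half-open coordinates of the kind used by Python slices. The sketch below checks that arithmetic on the two printed lines. It parses the text of the output rather than live objects, and ``parse_span`` is a hypothetical helper.

```python
import re

# Transcript lines copied from the doctest output.
lines = [
    "Transcript(Species='Homo sapiens'; CoordName='13'; Start=32889610; "
    "End=32973347; length=83737; Strand='+')",
    "Transcript(Species='Homo sapiens'; CoordName='13'; Start=32953976; "
    "End=32972409; length=18433; Strand='+')",
]

FIELD = re.compile(r"(Start|End|length)=(\d+)")


def parse_span(text):
    """Extract Start, End and length from one printed Transcript line."""
    return {key: int(value) for key, value in FIELD.findall(text)}


spans = [parse_span(line) for line in lines]
for span in spans:
    print(span["End"] - span["Start"], span["length"])
```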

Get the first exon for a transcript
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We show this just for the canonical transcript.

.. doctest::

    >>> from cogent.db.ensembl import Genome
    >>> human = Genome('human', Release=58, account=account)
    >>> brca2 = human.getGeneByStableId(StableId='ENSG00000139618')
    >>> print brca2.CanonicalTranscript.Exons[0]
    Exon(StableId=ENSE00001184784, Rank=1)

Get the introns for a transcript
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We show this just for the canonical transcript.

.. doctest::

    >>> from cogent.db.ensembl import Genome
    >>> human = Genome('human', Release=58, account=account)
    >>> brca2 = human.getGeneByStableId(StableId='ENSG00000139618')
    >>> for intron in brca2.CanonicalTranscript.Introns:
    ...     print intron
    Intron(TranscriptId=ENST00000380152, Rank=1)
    Intron(TranscriptId=ENST00000380152, Rank=2)
    Intron(TranscriptId=ENST00000380152, Rank=3)...
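
Introns are the stretches between consecutive exons of a transcript, so a transcript with *n* exons has at most *n* - 1 introns. The sketch below derives intron spans from exon spans to make that relationship concrete. It is not how PyCogent's ``Intron`` class is implemented; ``introns_from_exons`` is a hypothetical helper, the exon coordinates are made up, and half-open ``(start, end)`` spans are assumed.

```python
def introns_from_exons(exons):
    """Given (start, end) exon spans, return the gaps between neighbours."""
    ordered = sorted(exons)
    return [(left_end, right_start)
            for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
            if right_start > left_end]


# Made-up exon coordinates, purely for illustration.
exons = [(100, 200), (350, 420), (600, 750)]
print(introns_from_exons(exons))
```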

Inspect the genomic coordinate for a feature
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. doctest::

    >>> from cogent.db.ensembl import Genome
    >>> human = Genome('human', Release=58, account=account)
    >>> brca2 = human.getGeneByStableId(StableId='ENSG00000139618')
    >>> print brca2.Location.CoordName
    >>> print brca2.Location.Start
    32889610
    >>> print brca2.Location.Strand

Get repeat elements in a genomic interval
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We query the genome for repeats within a specific coordinate range on chromosome 13.

.. doctest::

    >>> from cogent.db.ensembl import Genome
    >>> human = Genome('human', Release=58, account=account)
    >>> repeats = human.getFeatures(CoordName='13', Start=32879610, End=32889610, feature_types='repeat')
    >>> for repeat in repeats:
    ...     print repeat.RepeatClass
    ...     print repeat
    ...     break
    SINE/Alu
    Repeat(CoordName='13'; Start=32879362; End=32879662; length=300; Strand='-', Score=2479.0)
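
The first repeat printed above starts at 32879362, which is before the query start of 32879610, so the query evidently returns features that overlap the interval and not only those contained in it. The sketch below spells out the two tests using the numbers from the doctest. Both function names are hypothetical, and half-open intervals are assumed.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def contained(inner_start, inner_end, outer_start, outer_end):
    """True if the inner interval lies entirely within the outer one."""
    return outer_start <= inner_start and inner_end <= outer_end


query = (32879610, 32889610)   # Start/End passed to getFeatures
repeat = (32879362, 32879662)  # Start/End of the first Repeat printed

print(overlaps(repeat[0], repeat[1], query[0], query[1]))
print(contained(repeat[0], repeat[1], query[0], query[1]))
```

Code that needs only fully contained features would have to filter the results with a test like ``contained``.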

Get CpG island elements in a genomic interval
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We query the genome for CpG islands within a specific coordinate range on chromosome 11.

.. doctest::

    >>> from cogent.db.ensembl import Genome
    >>> human = Genome('human', Release=58, account=account)
    >>> islands = human.getFeatures(CoordName='11', Start=2150341, End=2170833, feature_types='cpg')
    >>> for island in islands:
    ...     print island
    ...     break
    CpGisland(CoordName='11'; Start=2158951; End=2162484; length=3533; Strand='-', Score=3254.0)

Get SNPs
--------

For a gene
^^^^^^^^^^

We find the genetic variants for the canonical transcript of *BRCA2*.

.. note:: The output is significantly truncated!

.. doctest::

    >>> from cogent.db.ensembl import Genome
    >>> human = Genome('human', Release=58, account=account)
    >>> brca2 = human.getGeneByStableId(StableId='ENSG00000139618')
    >>> transcript = brca2.CanonicalTranscript
    >>> print transcript.Variants
    (<cogent.db.ensembl.region.Variation object at ...
    >>> for variant in transcript.Variants:
    ...     print variant
    ...     break
    Variation(Symbol='rs55880202'; Effect='5PRIME_UTR'; Alleles='C/T')...

Get a single SNP

259

^^^^^^^^^^^^^^^^

260

261

We get a single SNP and print it's allele frequencies.

262

263

.. doctest::

264

265

>>> snp = list(human.getVariation(Symbol='rs34213141'))[0]

266

>>> print snp.AlleleFreqs

267

=============================

268

allele freq sample_id

269

-----------------------------

270

A 0.0303 913

271

G 0.9697 913

272

-----------------------------

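A common follow-up is to identify the minor allele. The sketch below assumes the two frequencies printed for rs34213141 have been copied by hand into a plain dict; the name ``freqs`` is hypothetical, and the code does not use the cogent ``Table`` API.

```python
# allele frequencies copied by hand from the printed table (an assumption,
# not read from the Table object)
freqs = {'A': 0.0303, 'G': 0.9697}

# the minor allele is the one with the lowest frequency
minor_allele = min(freqs, key=freqs.get)
minor_freq = freqs[minor_allele]

# frequencies from a single sample should sum to 1 within rounding error
total = sum(freqs.values())
```

Checking that ``total`` is close to 1 is a cheap way to catch rows that mix more than one ``sample_id``.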

What alignment types are available
----------------------------------

We create a ``Compara`` instance for human, chimpanzee and macaque, then print the alignment methods and clades available for those species.

.. doctest::

    >>> from cogent.db.ensembl import Compara
    >>> compara = Compara(['human', 'chimp', 'macaque'], Release=58,
    ...                   account=account)
    >>> print compara.method_species_links
    Align Methods/Clades
    ===================================================================================================================
    method_link_species_set_id method_link_id species_set_id align_method align_clade
    -------------------------------------------------------------------------------------------------------------------
    469 10 33006 PECAN 16 amniota vertebrates Pecan
    467 13 32905 EPO 12 eutherian mammals EPO...

Get genomic alignment for a gene region
---------------------------------------

We first get the syntenic region corresponding to the human gene *BRCA2*.

.. doctest::

    >>> from cogent.db.ensembl import Compara
    >>> compara = Compara(['human', 'chimp', 'macaque'], Release=58,
    ...                   account=account)
    >>> human_brca2 = compara.Human.getGeneByStableId(StableId='ENSG00000139618')
    >>> regions = compara.getSyntenicRegions(region=human_brca2, align_method='EPO', align_clade='primates')
    >>> for region in regions:
    ...     print region
    SyntenicRegions:
    Coordinate(Human,chro...,13,32889610-32962969,1)
    Coordinate(Chimp,chro...,13,32082473-32155304,1)
    Coordinate(Macaque,chro...,17,11686607-11760932,1)...

We then get a cogent ``Alignment`` object from the last region yielded by the loop, requesting that sequences be annotated for gene spans.

.. doctest::

    >>> aln = region.getAlignment(feature_types='gene')
    >>> print repr(aln)
    3 x 11471 dna alignment: Homo sapiens:chromosome:13:3296...

Getting related genes
---------------------

What gene relationships are available
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. doctest::

    >>> from cogent.db.ensembl import Compara
    >>> compara = Compara(['human', 'chimp', 'macaque'], Release=58,
    ...                   account=account)
    >>> print compara.getDistinct('relationship')
    ['ortholog_one2one', 'within_species_paralog', 'ortholog_one2many', ...

Get one-to-one orthologs
^^^^^^^^^^^^^^^^^^^^^^^^

We get the one-to-one orthologs for *BRCA2*.

.. doctest::

    >>> from cogent.db.ensembl import Compara
    >>> compara = Compara(['human', 'chimp', 'macaque'], Release=58,
    ...                   account=account)
    >>> orthologs = compara.getRelatedGenes(StableId='ENSG00000139618',
    ...                   Relationship='ortholog_one2one')
    >>> print orthologs
    RelatedGenes:
    Relationships=ortholog_one2one
    Gene(Species='Pan troglodytes'; BioType='protein_coding'; Description='Breast cancer 2...'; Location=Coordinate(Chimp,chro...,13,32082479-32166147,1); StableId='ENSPTRG00000005766'; Status='KNOWN'; Symbol='Q8HZQ1_PANTR')...

We iterate over the related members.

.. doctest::

    >>> for ortholog in orthologs.Members:
    ...     print ortholog
    Gene(Species='Pan troglodytes'; BioType='protein_coding'; Description='Breast...

We get the maximum CDS length for each ortholog.

.. doctest::

    >>> print orthologs.getMaxCdsLengths()
    [10242, 10008, 10257]

We get the sequences as a sequence collection, with each sequence annotated for gene features.

.. doctest::

    >>> seqs = orthologs.getSeqCollection(feature_types='gene')

Get CDS for all one-to-one orthologs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We sample one-to-one orthologs for a group of species, generating a FASTA formatted string that can be written to file. We keep a gene only if every species has an ortholog and every CDS is translatable. To keep the example short, the loop stops at the first gene that passes both checks.

.. doctest::

    >>> from cogent.core.alphabet import AlphabetError
    >>> common_names = ["mouse", "rat", "human", "opossum"]
    >>> # pair each latin name with its own common name; a set has no
    >>> # guaranteed order, so it cannot be zipped against the list
    >>> latin_to_common = dict((Species.getSpeciesName(n), n) for n in common_names)
    >>> latin_names = set(latin_to_common)
    >>> compara = Compara(common_names, Release=58, account=account)
    >>> for gene in compara.Human.getGenesMatching(BioType='protein_coding'):
    ...     orthologs = compara.getRelatedGenes(gene,
    ...                                   Relationship='ortholog_one2one')
    ...     # make sure all species represented
    ...     if orthologs is None or orthologs.getSpeciesSet() != latin_names:
    ...         continue
    ...     seqs = []
    ...     for m in orthologs.Members:
    ...         try: # if sequence can't be translated, we ignore it
    ...             # get the CDS without the ending stop
    ...             seq = m.CanonicalTranscript.Cds.withoutTerminalStopCodon()
    ...             # make the sequence name
    ...             seq.Name = '%s:%s:%s' % \
    ...                 (latin_to_common[m.genome.Species], m.StableId, m.Location)
    ...             aa = seq.getTranslation()
    ...             seqs += [seq]
    ...         except (AlphabetError, AssertionError):
    ...             seqs = [] # exclude this gene
    ...             break
    ...     if len(seqs) == len(common_names):
    ...         fasta = '\n'.join(s.toFasta() for s in seqs)
    ...         break

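The ``fasta`` string produced by the loop can be saved with ordinary file handling. The sketch below is plain Python and uses a made-up two-record string in place of real output; the sequence names, the sequences and the file name ``orthologs.fasta`` are placeholders, not Ensembl data.

```python
import os
import tempfile

# placeholder standing in for the ``fasta`` string built by the loop
fasta = '>human:GENE1:loc1\nATGGCC\n>mouse:GENE2:loc2\nATGGCT'

# write into a fresh temporary directory (hypothetical output location)
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'orthologs.fasta')
with open(out_path, 'w') as outfile:
    outfile.write(fasta + '\n')

# read the file back and count records by their '>' header lines
with open(out_path) as infile:
    num_records = sum(1 for line in infile if line.startswith('>'))
```

Counting header lines after writing is a quick check that one record was saved per species.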

Get within species paralogs
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. doctest::

    >>> paralogs = compara.getRelatedGenes(StableId='ENSG00000164032',
    ...             Relationship='within_species_paralog')
    >>> print paralogs
    RelatedGenes:
    Relationships=within_species_paralog
    Gene(Species='Homo sapiens'; BioType='protein_coding'; Description='H2A histone...
