Canned models
-------------

Many standard evolutionary models come pre-defined in the ``cogent.evolve.models`` module. The available nucleotide, codon and protein models are

>>> from cogent.evolve import models
>>> print models.nucleotide_models
['JC69', 'F81', 'HKY85', 'GTR']
>>> print models.codon_models
['CNFGTR', 'CNFHKY', 'MG94HKY', 'MG94GTR', 'GY94', 'H04G', 'H04GK', 'H04GGK']
>>> print models.protein_models
['DSO78', 'AH96', 'AH96_mtmammals', 'JTT92', 'WG01']

While those values are strings, a function of the same name exists within the module, so creating a substitution model requires only calling that function. We demonstrate that for a nucleotide model here.

>>> from cogent.evolve.models import F81
>>> sub_mod = F81()

We'll be using these for the examples below.
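
The lists above hold plain strings; because a factory function of the same name exists in the module, a name can be turned into a model instance generically with ``getattr``. A minimal sketch of the idiom, using a stand-in registry rather than ``cogent`` itself (``FakeModels`` and ``make_model`` are ours, for illustration only):

```python
# The model name lists hold strings; a factory function of the same name
# lives in the module, so getattr() converts a name into a model instance.
class FakeModels(object):
    """Stand-in for the cogent.evolve.models module (illustration only)."""
    nucleotide_models = ['JC69', 'F81', 'HKY85', 'GTR']

    @staticmethod
    def F81():
        return 'F81 substitution model'
    # JC69, HKY85, GTR would be defined the same way

def make_model(module, name):
    # the same idiom works on the real module: getattr(models, 'F81')()
    return getattr(module, name)()

print(make_model(FakeModels, 'F81'))
```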

Rate heterogeneity models
-------------------------

Creating rate heterogeneity variants of the canned models can be done using optional arguments that get passed to the substitution model class. We illustrate this for the gamma distributed case, using the canned models displayed above.

We specify a general time reversible nucleotide model with gamma distributed rate heterogeneity.

>>> from cogent.evolve.models import GTR
>>> sub_mod = GTR(with_rate=True, distribution='gamma')
>>> print sub_mod
Nucleotide ( name = 'GTR'; type = 'None'; params = ['A/G', 'A/T', 'A/C', 'C/T', 'C/G']; number of motifs = 4; motifs = ['T', 'C', 'A', 'G'])

We specify a conditional nucleotide frequency codon model with nucleotide general time reversible parameters and a parameter for the ratio of nonsynonymous to synonymous substitutions (omega) with gamma distributed rate heterogeneity.

>>> from cogent.evolve.models import CNFGTR
>>> sub_mod = CNFGTR(with_rate=True, distribution='gamma')
>>> print sub_mod
Codon ( name = 'CNFGTR'; type = 'None'; params = ['A/G', 'A/C', 'C/T', 'A/T', 'C/G', 'omega']; ...

We specify a Jones, Taylor and Thornton 1992 empirical protein substitution model with gamma distributed rate heterogeneity.

>>> from cogent.evolve.models import JTT92
>>> sub_mod = JTT92(with_rate=True, distribution='gamma')
>>> print sub_mod
Empirical ( name = 'JTT92'; type = 'None'; number of motifs = 20; motifs = ['A', 'C'...

Specifying likelihood functions
===============================

Making a likelihood function
----------------------------

You start by specifying a substitution model and use that to construct a likelihood function for a specific tree.

>>> from cogent import LoadTree
>>> from cogent.evolve.models import F81
>>> sub_mod = F81()
>>> tree = LoadTree(treestring='(a,b,(c,d))')
>>> lf = sub_mod.makeLikelihoodFunction(tree)

Providing an alignment to a likelihood function
-----------------------------------------------

You need to load an alignment and then provide it to a likelihood function. We construct very simple trees and alignments for this example.

>>> from cogent import LoadTree, LoadSeqs
>>> from cogent.evolve.models import F81
>>> sub_mod = F81()
>>> tree = LoadTree(treestring='(a,b,(c,d))')
>>> lf = sub_mod.makeLikelihoodFunction(tree)
>>> aln = LoadSeqs(data=[('a', 'ACGT'), ('b', 'AC-T'), ('c', 'ACGT'),
...                      ('d', 'AC-T')])
>>> lf.setAlignment(aln)

Scoping parameters on trees
---------------------------

For many evolutionary analyses, it's desirable to allow different branches on a tree to have different values of a parameter. We show this for a simple codon model case here, where we want the great apes (the clade that includes human and orangutan) to have a different value of the ratio of nonsynonymous to synonymous substitutions. This parameter is identified in the precanned ``CNFGTR`` model as ``omega``.

>>> from cogent import LoadTree
>>> from cogent.evolve.models import CNFGTR
>>> tree = LoadTree('data/primate_brca1.tree')
>>> print tree.asciiArt()
-root----|--HowlerMon...
>>> sm = CNFGTR()
>>> lf = sm.makeLikelihoodFunction(tree, digits=2)
>>> lf.setParamRule('omega', tip_names=['Human', 'Orangutan'], outgroup_name='Galago', is_clade=True, init=0.5)

We've set an *initial* value for this clade so that the edges affected by this rule are evident below.

>>> print lf
Likelihood Function Table
====================================
 A/C     A/G     A/T     C/G     C/T
------------------------------------
1.00    1.00    1.00    1.00    1.00
------------------------------------
=======================================
      edge    parent    length    omega
---------------------------------------
    Galago      root      1.00     1.00
 HowlerMon      root      1.00     1.00
    Rhesus    edge.3      1.00     1.00
 Orangutan    edge.2      1.00     0.50
   Gorilla    edge.1      1.00     0.50
     Human    edge.0      1.00     0.50
Chimpanzee    edge.0      1.00     0.50
    edge.0    edge.1      1.00     0.50
    edge.1    edge.2      1.00     0.50
    edge.2    edge.3      1.00     1.00
    edge.3      root      1.00     1.00
---------------------------------------...

A more extensive description of capabilities is in :ref:`scope-params-on-trees`.
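
To see what a ``tip_names``/``is_clade`` rule resolves to, here is a small sketch that recovers the scoped edges from the parent relationships in the table above (the ``outgroup_name`` argument only serves to orient an unrooted tree, so it is ignored here; ``path_to_root`` and ``clade_edges`` are our own illustrative helpers, not cogent API):

```python
# Edge -> parent relationships, mirroring the primate_brca1 edge table.
parents = {
    'Galago': 'root', 'HowlerMon': 'root', 'Rhesus': 'edge.3',
    'Orangutan': 'edge.2', 'Gorilla': 'edge.1', 'Human': 'edge.0',
    'Chimpanzee': 'edge.0', 'edge.0': 'edge.1', 'edge.1': 'edge.2',
    'edge.2': 'edge.3', 'edge.3': 'root',
}

def path_to_root(edge):
    # edges crossed walking from 'edge' up to the root (inclusive of edge)
    path = []
    while edge != 'root':
        path.append(edge)
        edge = parents[edge]
    return path

def clade_edges(tip_names):
    # the clade's stem is the deepest edge shared by every tip's root path
    paths = [path_to_root(tip) for tip in tip_names]
    shared = set(paths[0]).intersection(*map(set, paths[1:]))
    stem = max(shared, key=lambda e: len(path_to_root(e)))
    # the rule applies to every edge strictly below the stem
    return sorted(e for e in parents if e != stem and stem in path_to_root(e))

print(clade_edges(['Human', 'Orangutan']))
```

The result is exactly the set of edges that received the initial ``omega`` of 0.50 in the table above.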

Specifying parameter values
---------------------------

.. constant, bounds, initial

Specifying a parameter as constant
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This means the parameter will not be modified during likelihood maximisation. We show this here by making the ``omega`` parameter constant at the value 1 -- essentially the condition of selective neutrality.

>>> from cogent import LoadTree
>>> from cogent.evolve.models import CNFGTR
>>> tree = LoadTree('data/primate_brca1.tree')
>>> sm = CNFGTR()
>>> lf = sm.makeLikelihoodFunction(tree, digits=2)
>>> lf.setParamRule('omega', is_const=True)

Providing a starting value for a parameter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This can be useful to improve performance; the closer you are to the maximum likelihood estimate, the quicker the optimisation will be.

>>> from cogent import LoadTree
>>> from cogent.evolve.models import CNFGTR
>>> tree = LoadTree('data/primate_brca1.tree')
>>> sm = CNFGTR()
>>> lf = sm.makeLikelihoodFunction(tree, digits=2)
>>> lf.setParamRule('omega', init=0.1)

Setting bounds for optimising a function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This can be useful for stopping optimisers from getting stuck in a bad part of parameter space.

>>> from cogent import LoadTree
>>> from cogent.evolve.models import CNFGTR
>>> tree = LoadTree('data/primate_brca1.tree')
>>> sm = CNFGTR()
>>> lf = sm.makeLikelihoodFunction(tree, digits=2)
>>> lf.setParamRule('omega', init=0.1, lower=1e-9, upper=20.0)

If you set bounds, it's a very good idea to set the starting value too; that way you can be sure the starting value lies within the bounds you set. The default value for substitution model exchangeability terms is 1.0, so if you set an upper bound of 0.5 you'll get an error (shown below).

>>> from cogent import LoadTree, LoadSeqs
>>> from cogent.evolve.models import CNFGTR
>>> tree = LoadTree('data/primate_brca1.tree')
>>> sm = CNFGTR()
>>> lf = sm.makeLikelihoodFunction(tree, digits=2)
>>> lf.setParamRule('omega', upper=0.5, init=1.0)
>>> aln = LoadSeqs('data/primate_brca1.fasta')
>>> lf.setAlignment(aln)
Traceback (most recent call last):
ValueError: Initial parameter values must be valid ...
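
A sketch of the validation that produces that error: an initial value must lie within whatever bounds are set. ``check_param`` and its defaults are ours, chosen to mirror the bounds used above, not cogent internals:

```python
# An init value outside [lower, upper] cannot be a valid starting point
# for the optimiser, so it is rejected up front.
def check_param(init=1.0, lower=1e-9, upper=20.0):
    if not (lower <= init <= upper):
        raise ValueError('Initial parameter values must be valid, '
                         '%s is outside [%s, %s]' % (init, lower, upper))
    return init

print(check_param(init=0.25, upper=0.5))  # inside the bounds: accepted
try:
    check_param(upper=0.5)                # default init of 1.0 breaches it
except ValueError as err:
    print(err)
```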

Specifying rate heterogeneity functions
---------------------------------------

We extend the simple gamma distributed rate heterogeneity case for nucleotides from above to construction of the actual likelihood function. We do this for 4 bins and constrain the bin probabilities to be equal.

>>> from cogent import LoadTree, LoadSeqs
>>> from cogent.evolve.models import GTR
>>> sm = GTR(with_rate=True, distribution='gamma')
>>> tree = LoadTree('data/primate_brca1.tree')
>>> lf = sm.makeLikelihoodFunction(tree, bins=4, digits=2)
>>> lf.setParamRule('bprobs', is_const=True)

For more detailed discussion of defining and using these models see :ref:`rate-heterogeneity`.
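
Under the hood, a binned rate-heterogeneity likelihood is a mixture: each alignment column's likelihood is the bin-probability-weighted average of its likelihood under each rate bin. With ``bprobs`` constrained equal, that is a plain mean. A sketch with made-up per-bin column likelihoods:

```python
# P(column) = sum over bins b of P(bin b) * P(column | rate of bin b)
def column_likelihood(per_bin_likelihoods, bin_probs):
    return sum(p * lk for p, lk in zip(bin_probs, per_bin_likelihoods))

per_bin = [0.08, 0.05, 0.02, 0.01]  # toy P(column | rate bin), 4 bins
equal = [0.25] * 4                  # bprobs constrained to be equal
print(column_likelihood(per_bin, equal))  # the mean of the four values
```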

Specifying Phylo-HMMs
---------------------

A phylo-HMM is specified like the binned rate heterogeneity case above, but with the ``sites_independent=False`` argument, which makes the bin assignments of neighbouring alignment columns auto-correlated.

>>> from cogent import LoadTree, LoadSeqs
>>> from cogent.evolve.models import GTR
>>> sm = GTR(with_rate=True, distribution='gamma')
>>> tree = LoadTree('data/primate_brca1.tree')
>>> lf = sm.makeLikelihoodFunction(tree, bins=4, sites_independent=False,
...     digits=2)
>>> lf.setParamRule('bprobs', is_const=True)

For more detailed discussion of defining and using these models see :ref:`rate-heterogeneity-hmm`.
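
What ``sites_independent=False`` changes: neighbouring columns' bin assignments become Markov-dependent, so the likelihood is computed with the forward algorithm over bins rather than as an independent mixture at each column. A minimal two-bin sketch with toy numbers (this is the standard forward recursion, not cogent's implementation):

```python
# forward algorithm: state[b] accumulates P(columns so far, current bin = b)
def forward(col_likelihoods, init_probs, transition):
    # col_likelihoods[i][b] = P(column i | bin b)
    state = [p * lk for p, lk in zip(init_probs, col_likelihoods[0])]
    for col in col_likelihoods[1:]:
        state = [
            sum(state[a] * transition[a][b] for a in range(len(state))) * col[b]
            for b in range(len(state))
        ]
    return sum(state)  # total probability of all the columns

cols = [[0.08, 0.01], [0.07, 0.02], [0.01, 0.09]]  # toy column likelihoods
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]  # sticky bins: neighbours share a rate class
print(forward(cols, init, trans))
```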

Fitting likelihood functions
============================

Choice of optimisers
--------------------

There are two types of optimiser: simulated annealing, a *global* optimiser; and Powell, a *local* optimiser. The simulated annealing method is slow compared to Powell and, in general, Powell is an adequate choice. We set up a simple nucleotide model to illustrate these.

>>> from cogent import LoadTree, LoadSeqs
>>> from cogent.evolve.models import F81
>>> tree = LoadTree('data/primate_brca1.tree')
>>> aln = LoadSeqs('data/primate_brca1.fasta')
>>> sm = F81()
>>> lf = sm.makeLikelihoodFunction(tree, digits=3, space=2)
>>> lf.setAlignment(aln)

The default is to use the simulated annealing optimiser followed by Powell.

>>> lf.optimise(show_progress=False)

We can specify just the local optimiser. To do so, it's recommended to set the ``max_restarts`` argument, since this provides a mechanism for Powell to attempt restarting the optimisation from a slightly different spot, which can help in overcoming local maxima.

>>> lf.optimise(local=True, max_restarts=5, show_progress=False)

We might want to do crude simulated annealing followed by more rigorous Powell.

>>> lf.optimise(show_progress=False, global_tolerance=1.0, tolerance=1e-8,
...     max_restarts=5)
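
Why ``max_restarts`` matters: a local optimiser restarted from perturbed points can escape a local maximum. A crude hill-climbing sketch on a function with a local optimum near x = -2 and the global optimum near x = +2 (nothing here is the cogent Powell implementation; the deterministic offsets stand in for random restart perturbations):

```python
import math

def f(x):
    # two peaks: a small one near x = -2, the global one near x = +2
    return math.exp(-(x - 2) ** 2) + 0.5 * math.exp(-(x + 2) ** 2)

def hill_climb(x, step=0.1, iters=200):
    # greedy local search: accept any neighbouring point that improves f
    for _ in range(iters):
        for cand in (x - step, x + step):
            if f(cand) > f(x):
                x = cand
    return x

def optimise(x0, max_restarts=5):
    best = hill_climb(x0)
    for i in range(max_restarts):
        for offset in (-(i + 1.0), i + 1.0):
            # restart from a perturbed copy of the current best point
            trial = hill_climb(best + offset)
            if f(trial) > f(best):
                best = trial
    return best

print(round(hill_climb(-3.0), 1))  # -2.0: stuck at the local optimum
print(round(optimise(-3.0), 1))    # 2.0: a restart finds the global optimum
```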

Checkpointing runs
------------------

See :ref:`checkpointing-optimisation`.

How to check your optimisation was successful
---------------------------------------------

.. Try again, use global optimisation, check maximum numbers of calculations not exceeded.

There is no guarantee that an optimised function has achieved a global maximum. We can, however, be sure that a maximum was achieved by validating that the optimiser stopped because the specified tolerance condition was met, rather than because it exceeded the maximum number of evaluations. The latter limit is set to ensure optimisation doesn't proceed endlessly. If the optimiser exited because this limit was exceeded, you can be sure that the function **has not** been successfully optimised.

To take this approach we first need to specify a maximum, and second we need to get back the actual calculator object, as this records how many evaluations it has done. We set a very small maximum so the optimiser exits too early.

>>> from cogent import LoadTree, LoadSeqs
>>> from cogent.evolve.models import F81
>>> tree = LoadTree('data/primate_brca1.tree')
>>> aln = LoadSeqs('data/primate_brca1.fasta')
>>> sm = F81()
>>> lf = sm.makeLikelihoodFunction(tree, digits=3, space=2)
>>> lf.setAlignment(aln)
>>> max_evals = 10
>>> calculator = lf.optimise(show_progress=False,
...     max_evaluations=max_evals, return_calculator=True)
FORCED EXIT from SimulatedAnnealing:
Too many function evaluations, results are likely to be poor.
You can increase max_evaluations or decrease tolerance...
>>> if calculator.evaluations > max_evals:
...     print 'Failed to optimise'
Failed to optimise

Getting statistics out of likelihood functions
==============================================

.. the annotated tree, the tables, getParamValue

Log likelihood and number of free parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>>> from cogent import LoadTree, LoadSeqs
>>> from cogent.evolve.models import GTR
>>> sm = GTR()
>>> tree = LoadTree('data/primate_brca1.tree')
>>> lf = sm.makeLikelihoodFunction(tree)
>>> aln = LoadSeqs('data/primate_brca1.fasta')
>>> lf.setAlignment(aln)

We get the log-likelihood and the number of free parameters.

>>> lnL = lf.getLogLikelihood()
>>> nfp = lf.getNumFreeParams()

.. warning:: The number of free parameters (nfp) refers only to the number of parameters that were modifiable by the optimiser. Typically, the degrees-of-freedom of a likelihood ratio test statistic is computed as the difference in nfp between models. This will not be correct for models in which boundary conditions exist (rate heterogeneity models where a parameter value boundary is set between bins).

Information theoretic measures
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Akaike Information Criterion
""""""""""""""""""""""""""""

.. note:: This measure only makes sense when the model has been optimised, a step we skip here in the interests of speed.

>>> from cogent import LoadTree, LoadSeqs
>>> from cogent.evolve.models import GTR
>>> sm = GTR()
>>> tree = LoadTree('data/primate_brca1.tree')
>>> lf = sm.makeLikelihoodFunction(tree)
>>> aln = LoadSeqs('data/primate_brca1.fasta')
>>> lf.setAlignment(aln)
>>> AIC = lf.getAic()

We can also get the second-order AIC.

>>> AICc = lf.getAic(second_order=True)

Bayesian Information Criterion
""""""""""""""""""""""""""""""

.. note:: This measure only makes sense when the model has been optimised, a step we skip here in the interests of speed.

>>> from cogent import LoadTree, LoadSeqs
>>> from cogent.evolve.models import GTR
>>> sm = GTR()
>>> tree = LoadTree('data/primate_brca1.tree')
>>> lf = sm.makeLikelihoodFunction(tree)
>>> aln = LoadSeqs('data/primate_brca1.fasta')
>>> lf.setAlignment(aln)
>>> BIC = lf.getBic()
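
The standard definitions behind ``getAic`` and ``getBic``, with lnL the maximum log-likelihood, k the number of free parameters and n the number of alignment columns (the values below are made up, of a plausible magnitude for this alignment):

```python
#   AIC  = 2k - 2 lnL
#   AICc = AIC + 2k(k + 1)/(n - k - 1)   (second-order, small-sample form)
#   BIC  = k ln(n) - 2 lnL
import math

def aic(lnL, k, second_order=False, n=None):
    result = 2 * k - 2 * lnL
    if second_order:  # the second-order correction needs the sample size n
        result += 2.0 * k * (k + 1) / (n - k - 1)
    return result

def bic(lnL, k, n):
    return k * math.log(n) - 2 * lnL

lnL, k, n = -6993.0, 16, 2814  # illustrative numbers only
print(aic(lnL, k))             # 14018.0
print(aic(lnL, k, second_order=True, n=n) > aic(lnL, k))  # True
```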

Getting maximum likelihood estimates
------------------------------------

We fit the model defined in the previous section and use that in the following.

>>> lf.optimise(local=True, show_progress=False)

We get the statistics out individually: the ``length`` for the Human edge and the exchangeability parameter ``A/G``.

>>> a_g = lf.getParamValue('A/G')
>>> human = lf.getParamValue('length', 'Human')

Just the motif probabilities
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>>> mprobs = lf.getMotifProbs()
>>> print mprobs
====================================
     T         C         A         G
------------------------------------
0.2406    0.1742    0.3757    0.2095
------------------------------------

If written to file in XML format, the model parameters will be saved. This can be useful for later plotting or recreating likelihood functions.

>>> annot_tree = lf.getAnnotatedTree()
>>> print annot_tree.getXML()
<?xml version="1.0"?>
...
<param><name>A/G</name><value>5.25337479406</value></param>
<param><name>A/C</name><value>1.23158577782</value></param>
<param><name>C/T</name><value>5.96998512452</value></param>
<param><name>length</name><value>0.173113656134</value></param>...

.. warning:: This method fails for some rate-heterogeneity models.

We can also get the full set of statistics as tables.

>>> tables = lf.getStatistics(with_motif_probs=True, with_titles=True)
>>> for table in tables:
...     if 'global' in table.Title:
...         print table
global params
==============================================
   A/C       A/G       A/T       C/G       C/T
----------------------------------------------
1.2316    5.2534    0.9585    2.3158    5.9700
----------------------------------------------

Testing hypotheses
==================

.. LRTs, assuming chisq, bootstrapping, randomisation

Using likelihood ratio tests
----------------------------

We test the molecular clock hypothesis for the human and chimpanzee lineages. The null has these two branches constrained to be equal.

>>> from cogent import LoadTree, LoadSeqs
>>> from cogent.evolve.models import F81
>>> tree = LoadTree('data/primate_brca1.tree')
>>> aln = LoadSeqs('data/primate_brca1.fasta')
>>> sm = F81()
>>> lf = sm.makeLikelihoodFunction(tree, digits=3, space=2)
>>> lf.setAlignment(aln)
>>> lf.setParamRule('length', tip_names=['Human', 'Chimpanzee'],
... outgroup_name='Galago', is_clade=True, is_independent=False)
>>> lf.setName('Null Hypothesis')
>>> lf.optimise(local=True, show_progress=False)
>>> null_lnL = lf.getLogLikelihood()
>>> null_nfp = lf.getNumFreeParams()
>>> print lf
Null Hypothesis
==========================
      edge  parent  length
--------------------------
...
 Orangutan  edge.2   0.008
...
Chimpanzee  edge.0   0.004
    edge.0  edge.1   0.000...

The alternate allows the human and chimpanzee branches to differ, by just setting all lengths to be independent.

>>> lf.setParamRule('length', is_independent=True)
>>> lf.setName('Alt Hypothesis')
>>> lf.optimise(local=True, show_progress=False)
>>> alt_lnL = lf.getLogLikelihood()
>>> alt_nfp = lf.getNumFreeParams()
>>> print lf
Alt Hypothesis
==========================
      edge  parent  length
--------------------------
...
 Orangutan  edge.2   0.008
...
Chimpanzee  edge.0   0.003
    edge.0  edge.1   0.000...

We import the function for computing the probability of a chi-square test statistic, then compute the likelihood ratio test statistic, the degrees of freedom and the corresponding probability.

>>> from cogent.maths.stats import chisqprob
>>> LR = 2 * (alt_lnL - null_lnL) # the likelihood ratio statistic
>>> df = (alt_nfp - null_nfp) # the test degrees of freedom
>>> p = chisqprob(LR, df)
>>> print 'LR=%.4f ; df = %d ; p=%.4f' % (LR, df, p)
LR=3.3294 ; df = 1 ; p=0.0681
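
A cross-check on that p-value without cogent: for df=1, the chi-square survival function has the closed form erfc(sqrt(LR/2)).

```python
import math

def chisqprob_df1(x):
    # P(chi-square with 1 df >= x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(x / 2.0))

print(round(chisqprob_df1(3.3294), 3))  # 0.068, agreeing with the output above
```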

By parametric bootstrapping
---------------------------

If we can't rely on the asymptotic behaviour of the LRT, e.g. due to small alignment length, we can use a parametric bootstrap. Convenience functions for that are described in more detail in :ref:`parametric-bootstrap`.

In general, however, this capability derives from the ability of any defined ``evolve`` likelihood function to simulate an alignment. This property is provided as the ``simulateAlignment`` method on likelihood function objects.

>>> from cogent import LoadTree, LoadSeqs
>>> from cogent.evolve.models import F81
>>> tree = LoadTree('data/primate_brca1.tree')
>>> aln = LoadSeqs('data/primate_brca1.fasta')
>>> sm = F81()
>>> lf = sm.makeLikelihoodFunction(tree, digits=3, space=2)
>>> lf.setAlignment(aln)
>>> lf.setParamRule('length', tip_names=['Human', 'Chimpanzee'],
... outgroup_name='Galago', is_clade=True, is_independent=False)
>>> lf.setName('Null Hypothesis')
>>> lf.optimise(local=True, show_progress=False)
>>> sim_aln = lf.simulateAlignment()
>>> print repr(sim_aln)
7 x 2814 dna alignment: Gorilla...
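
The essential mechanic of simulating sequence along a single branch of a Jukes-Cantor-style model: a site ends up different from its ancestor with probability (3/4)(1 - exp(-4d/3)) for branch length d. A toy single-branch sketch (ours, not the cogent machinery):

```python
import math
import random

def evolve_seq(seq, d, rng):
    bases = 'TCAG'
    # probability a site differs from its ancestor after branch length d
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_diff:
            # substitute with one of the three other bases, uniformly
            base = rng.choice([b for b in bases if b != base])
        out.append(base)
    return ''.join(out)

rng = random.Random(42)
ancestor = ''.join(rng.choice('TCAG') for _ in range(2814))
descendant = evolve_seq(ancestor, 0.02, rng)
diffs = sum(a != b for a, b in zip(ancestor, descendant))
print(len(descendant), diffs)  # 2814 sites, roughly 2% of them changed
```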

Determining confidence intervals on MLEs
========================================

The profile method is used to calculate a confidence interval for a named parameter. We show it here for a global substitution model exchangeability parameter (*kappa*, the ratio of transition to transversion rates) and for an edge-specific parameter (just the human branch length).

>>> from cogent import LoadTree, LoadSeqs
>>> from cogent.evolve.models import HKY85
>>> tree = LoadTree('data/primate_brca1.tree')
>>> aln = LoadSeqs('data/primate_brca1.fasta')
>>> sm = HKY85()
>>> lf = sm.makeLikelihoodFunction(tree)
>>> lf.setAlignment(aln)
>>> lf.optimise(local=True, show_progress=False)
>>> kappa_lo, kappa_mle, kappa_hi = lf.getParamInterval('kappa')
>>> print "lo=%.2f ; mle=%.2f ; hi = %.2f" % (kappa_lo, kappa_mle, kappa_hi)
lo=3.78 ; mle=4.44 ; hi = 5.22
>>> human_lo, human_mle, human_hi = lf.getParamInterval('length', 'Human')
>>> print "lo=%.2f ; mle=%.2f ; hi = %.2f" % (human_lo, human_mle, human_hi)
lo=0.00 ; mle=0.01 ; hi = 0.01
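
Conceptually, profiling fixes the parameter at trial values and finds where the log-likelihood falls about 1.92 units (half the 95% chi-square critical value of 3.84) below its maximum. A bisection sketch on a quadratic log-likelihood, where the bounds must come out as mle +/- 1.96*se (``profile_interval`` is our illustrative helper, not cogent's implementation):

```python
import math

def profile_interval(lnL, mle, lower, upper, drop=1.92):
    target = lnL(mle) - drop

    def solve(inside, outside):
        # bisect between a point inside the interval and one outside it
        for _ in range(100):
            mid = 0.5 * (inside + outside)
            if lnL(mid) > target:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    return solve(mle, lower), mle, solve(mle, upper)

se = 0.25
lnL = lambda x: -0.5 * ((x - 4.44) / se) ** 2  # quadratic, peaked at 4.44
lo, mle, hi = profile_interval(lnL, 4.44, 0.0, 10.0)
print(round(lo, 2), round(hi, 2))  # 3.95 4.93, i.e. 4.44 +/- 1.96 * 0.25
```

For a genuinely quadratic log-likelihood this reproduces the familiar Wald interval; for real likelihood surfaces the profile interval can be asymmetric about the MLE, as the kappa interval above is.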

Use either the annotated tree or statistics tables to obtain objects that can easily be written to file.

Visualising statistics on trees
===============================

We look at the distribution of ``omega`` from the CNF codon model family across different primate lineages. We allow each edge to have an independent value for ``omega``.

>>> from cogent import LoadTree, LoadSeqs
>>> from cogent.evolve.models import CNFGTR
>>> tree = LoadTree('data/primate_brca1.tree')
>>> aln = LoadSeqs('data/primate_brca1.fasta')
>>> sm = CNFGTR()
>>> lf = sm.makeLikelihoodFunction(tree, digits=2, space=2)
>>> lf.setParamRule('omega', is_independent=True, upper=10.0)
>>> lf.setAlignment(aln)
>>> lf.optimise(show_progress=False, local=True)
>>> print lf
Likelihood Function Table
============================
 A/C   A/G   A/T   C/G   C/T
----------------------------
1.07  3.88  0.79  1.96  4.09
----------------------------
=================================
      edge  parent  length  omega
---------------------------------
    Galago    root    0.53   0.85
 HowlerMon    root    0.14   0.71
    Rhesus  edge.3    0.07   0.58
 Orangutan  edge.2    0.02   0.49
   Gorilla  edge.1    0.01   0.43
     Human  edge.0    0.02   2.44
Chimpanzee  edge.0    0.01   2.28
    edge.0  edge.1    0.00   1.04
    edge.1  edge.2    0.01   0.55
    edge.2  edge.3    0.04   0.33
    edge.3    root    0.02   1.10...

We need an annotated tree object to do the drawing; we write this out to an XML formatted file so it can be reloaded for later reuse.

>>> annot_tree = lf.getAnnotatedTree()
>>> annot_tree.writeToFile('result_tree.xml')

We then import a dendrogram class and generate a heat-mapped image file in which edges are coloured red by the magnitude of ``omega``, with maximal saturation when ``omega=1``.

>>> from cogent.draw.dendrogram import ContemporaneousDendrogram
>>> dend = ContemporaneousDendrogram(annot_tree)
>>> fig = dend.makeFigure(height=6, width=6, shade_param='omega',
... max_value=1.0, stroke_width=2)
>>> fig.savefig('omega_heat_map.png')

Reconstructing ancestral sequences
==================================

.. most likely ancestors, the complete posterior probabilities

We first fit a likelihood function.

>>> from cogent import LoadTree, LoadSeqs
>>> from cogent.evolve.models import F81
>>> tree = LoadTree('data/primate_brca1.tree')
>>> aln = LoadSeqs('data/primate_brca1.fasta')
>>> sm = F81()
>>> lf = sm.makeLikelihoodFunction(tree, digits=3, space=2)
>>> lf.setAlignment(aln)
>>> lf.optimise(show_progress=False, local=True)

We then get the most likely ancestral sequences.

>>> ancestors = lf.likelyAncestralSeqs()
>>> print ancestors.getSeq('root')
TGTGGCACAAATACTCATGCCAGCTCATTACAGCA...

Or we can get the posterior probabilities (returned as a ``DictArray``) of sequence states at each node.

>>> ancestral_probs = lf.reconstructAncestralSeqs()
>>> print ancestral_probs['root']
============================================
             T         C         A         G
--------------------------------------------
   0    0.1816    0.0000    0.0000    0.0000
   1    0.0000    0.0000    0.0000    0.1561
   2    0.1816    0.0000    0.0000    0.0000
   3    0.0000    0.0000    0.0000    0.1561...
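
Posterior reconstruction follows from Bayes' rule: the prior (motif probability) for each candidate root state is multiplied by the likelihood of the tip data given that state, then renormalised. A one-column, two-tip, star-tree sketch under Jukes-Cantor (our helpers, not the cogent API):

```python
import math

def jc_prob(same, d):
    # JC69 transition probability for a branch of length d
    p_diff = 0.25 * (1.0 - math.exp(-4.0 * d / 3.0))
    return 1.0 - 3.0 * p_diff if same else p_diff

def root_posterior(tip_states, branch_lengths, bases='TCAG'):
    priors = dict((b, 0.25) for b in bases)  # equal motif probabilities
    weights = {}
    for b in bases:
        like = 1.0
        for tip, d in zip(tip_states, branch_lengths):
            like *= jc_prob(tip == b, d)
        weights[b] = priors[b] * like
    total = sum(weights.values())
    return dict((b, w / total) for b, w in weights.items())

post = root_posterior(['T', 'T'], [0.1, 0.2])
print(max(post, key=post.get))  # 'T': matching tips make T dominate
```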

Tips for improved performance
=============================

Sequentially build the fitting
------------------------------

.. start with null, then modify lf to alternate. Don't forget to record the values you need.

.. how to specify the alt so it is the null for rate heterogeneity models

There's nothing that improves performance quite like being close to the maximum likelihood values, so using the ``setParamRule`` method to provide good starting values can be very useful. As this can be difficult to do, one easy way is to build simpler models that are nested within the one you're interested in. Fitting those models and then relaxing constraints until you're at the parameterisation of interest can markedly improve optimisation speed.

Being able to save results to file allows you to do this between sessions.

.. using a subset of data

If you're dealing with a very large alignment, another approach is to use a subset of the alignment to fit the model, then try fitting the entire alignment. Alignment objects have a method to facilitate this approach. The following samples 99 codons without replacement.

>>> from cogent import LoadSeqs
>>> aln = LoadSeqs('data/primate_brca1.fasta')
>>> smpl = aln.sample(n=99, with_replacement=False, motif_length=3)

While this samples 99 nucleotides without replacement.

>>> smpl = aln.sample(n=99, with_replacement=False)
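
What ``motif_length=3`` sampling amounts to: codon columns (blocks of three alignment positions) are drawn without replacement, and those columns are kept. A positions-only sketch (``sample_codon_positions`` is our illustrative helper, not a cogent method):

```python
import random

def sample_codon_positions(aln_length, n_codons, seed=0):
    rng = random.Random(seed)
    # candidate codon start positions: 0, 3, 6, ... (trailing partial
    # codons, if any, are excluded)
    starts = list(range(0, aln_length - aln_length % 3, 3))
    chosen = rng.sample(starts, n_codons)
    # expand each sampled codon start into its three column indices
    return [start + offset for start in chosen for offset in range(3)]

positions = sample_codon_positions(2814, 99)
print(len(positions))  # 297 alignment columns = 99 codons
```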

.. following cleans up files

>>> from cogent.util.misc import remove_files
>>> remove_files(['result_tree.xml', 'omega_heat_map.png'],
... error_on_missing=False)