.. _denoising_and_chimera_detection_usage_comparison.rst:

Denoising and chimera detection usage differences in QIIME
----------------------------------------------------------
This tutorial covers the main differences in how the denoising and chimera detection tools implemented in QIIME are used.

The `overview tutorial <tutorial.html>`_ describes the steps used to process 454 data without denoising or chimera detection. That processing can be summarized roughly as follows:

1. SFF (raw 454 data) -> 2. fasta/qual files -> 3. demultiplexing/quality filtering -> 4. OTU picking -> 5. representative sequences -> 6. taxonomic assignments/tree building -> 7. OTU table and downstream processing

The descriptions below show how each denoising or chimera detection tool integrates into QIIME by noting where its workflow differs from the default pipeline listed above.
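As a rough illustration of this kind of comparison, the sketch below models the default pipeline as an ordered list and reports which steps a variant adds or drops. The names ``DEFAULT_PIPELINE`` and ``diff_pipeline`` are hypothetical helpers written for this tutorial and are not part of QIIME; the example variant only inserts a placeholder step.

```python
# Default 454 workflow, copied from the step list above.
DEFAULT_PIPELINE = [
    "SFF (raw 454 data)",
    "fasta/qual files",
    "demultiplexing/quality filtering",
    "OTU picking",
    "representative sequences",
    "taxonomic assignments/tree building",
    "OTU table and downstream processing",
]


def diff_pipeline(variant, default=DEFAULT_PIPELINE):
    """Return (added, dropped): steps only in `variant`, steps only in `default`."""
    added = [step for step in variant if step not in default]
    dropped = [step for step in default if step not in variant]
    return added, dropped


# Hypothetical variant: one extra step inserted after "representative sequences".
variant = DEFAULT_PIPELINE[:5] + ["extra filtering step"] + DEFAULT_PIPELINE[5:]
added, dropped = diff_pipeline(variant)
print("added:", added)
print("dropped:", dropped)
```

Comparing step lists this way makes it easy to see whether a tool replaces an existing step or adds a new one.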

Ampliconnoise uses flowgram files generated from SFF files to denoise 454 data and, optionally, to detect chimeras. See the script details here: `ampliconnoise.py <../scripts/ampliconnoise.html>`_

Ampliconnoise effectively replaces the demultiplexing/quality filtering step above, so the pipeline becomes:

1. SFF (raw 454 data) -> 2. flowgram (sff.txt) -> 3. ampliconnoise.py (plus suggested step of reverse primer removal) -> 4. OTU picking -> 5. representative sequences -> 6. taxonomic assignments/tree building -> 7. OTU table and downstream processing

Barcodes and forward primers are removed by ampliconnoise.py. However, reverse primers at the end of a sequence may be retained, so it is strongly recommended that `truncate_reverse_primer.py <../scripts/truncate_reverse_primer.html>`_ be run immediately after ampliconnoise.py so that the reverse primer and any subsequent sequence do not interfere with downstream steps.
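The two-step recommendation above can be sketched as a pair of command lines. This is a minimal sketch, assuming the ``-i``/``-m``/``-o`` options of ampliconnoise.py and the ``-f``/``-m``/``-o`` options of truncate_reverse_primer.py as given in the QIIME script documentation. All file and directory names are hypothetical, and the helper ``ampliconnoise_commands`` only builds the commands without running them.

```python
import shlex


def ampliconnoise_commands(sff_txt, mapping_fp, out_dir):
    """Build (but do not run) the denoising and reverse primer removal commands."""
    denoised_fp = f"{out_dir}/ampliconnoise_seqs.fna"  # hypothetical output name
    return [
        # Denoise the flowgrams; barcodes and forward primers are removed here.
        ["ampliconnoise.py", "-i", sff_txt, "-m", mapping_fp, "-o", denoised_fp],
        # Remove the reverse primer and anything after it from the denoised reads.
        ["truncate_reverse_primer.py", "-f", denoised_fp, "-m", mapping_fp,
         "-o", f"{out_dir}/reverse_primer_removed"],
    ]


commands = ampliconnoise_commands("run1.sff.txt", "map.txt", "anoise_out")
for cmd in commands:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

On a system where QIIME is installed, each list could be passed to ``subprocess.check_call`` to run the steps in order.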

`Denoiser <denoising_454_data.html>`_ also uses flowgram files to detect and correct sequencing errors (but not chimeras). Unlike Ampliconnoise, it takes the output of split_libraries.py as input and denoises only the sequences present in the fasta file that split_libraries.py generates. Reverse primer removal with `truncate_reverse_primer.py <../scripts/truncate_reverse_primer.html>`_ is also strongly encouraged. The steps involved are:

1. SFF (raw 454 data) -> 2. fasta/qual/flowgram (sff.txt) files -> 3. split_libraries.py -> 4. denoise_wrapper.py (plus suggested step of reverse primer removal) -> 5. OTU picking -> 6. representative sequences -> 7. taxonomic assignments/tree building -> 8. OTU table and downstream processing
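Steps 3 and 4 of this workflow can be sketched as follows. This is a minimal sketch, assuming the ``-m``/``-f``/``-q``/``-o`` options of split_libraries.py and the ``-i``/``-f``/``-m``/``-o`` options of denoise_wrapper.py as given in the QIIME script documentation. The helper ``denoiser_commands`` and all directory names are hypothetical, and the commands are only built, not run. Reverse primer removal is left out of the sketch.

```python
import shlex


def denoiser_commands(fasta_fp, qual_fp, sff_txt, mapping_fp):
    """Build (but do not run) the demultiplexing and denoising commands."""
    split_dir = "split_library_output"  # hypothetical directory name
    return [
        # Demultiplex and quality filter; split_libraries.py writes seqs.fna.
        ["split_libraries.py", "-m", mapping_fp, "-f", fasta_fp, "-q", qual_fp,
         "-o", split_dir],
        # Denoise only the reads present in the split_libraries.py output.
        ["denoise_wrapper.py", "-i", sff_txt, "-f", f"{split_dir}/seqs.fna",
         "-m", mapping_fp, "-o", "denoiser_out"],
    ]


commands = denoiser_commands("run1.fna", "run1.qual", "run1.sff.txt", "map.txt")
for cmd in commands:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The second command receives both the flowgram file and the demultiplexed fasta file, which is how the denoiser is limited to reads that passed quality filtering.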

`Usearch <usearch_quality_filter.html>`_ clusters sequences into OTUs and, as part of that process, performs three kinds of filtering: `de novo` chimera detection based on cluster abundance, reference-based chimera detection against a reference sequence set, and cluster size filtering (similar to filtering singletons, a rough but fast way to remove noise from the data). Usearch is run after demultiplexing, so the steps for processing data are:

1. SFF (raw 454 data) -> 2. fasta/qual files -> 3. demultiplexing/quality filtering -> 4. OTU picking/chimera detection/low abundance cluster filtering with the usearch implementation in pick_otus.py -> 5. representative sequences -> 6. taxonomic assignments/tree building -> 7. OTU table and downstream processing
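Step 4 of this workflow might be invoked roughly as shown below. This is a minimal sketch, assuming the ``-m usearch``, ``-i``, ``-o`` and ``--db_filepath`` options of pick_otus.py as given in the QIIME script documentation. The helper ``usearch_otu_command`` and the file names are hypothetical, and the command is only built, not run.

```python
import shlex


def usearch_otu_command(seqs_fp, ref_fp, out_dir):
    """Build (but do not run) a pick_otus.py call that uses the usearch method."""
    return [
        "pick_otus.py",
        "-m", "usearch",          # usearch-based clustering and filtering
        "-i", seqs_fp,            # demultiplexed sequences
        "--db_filepath", ref_fp,  # reference set for reference-based chimera detection
        "-o", out_dir,
    ]


cmd = usearch_otu_command("seqs.fna", "reference_seqs.fasta", "usearch_qf_results")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

A single command covers clustering, chimera detection and cluster size filtering, which is why no separate chimera step appears in the pipeline above.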

`ChimeraSlayer <chimera_checking.html>`_ uses a reference dataset to detect potential chimeras in a representative sequence set. The processing pipeline is:

1. SFF (raw 454 data) -> 2. fasta/qual files -> 3. demultiplexing/quality filtering -> 4. OTU picking -> 5. representative sequences -> 6. chimera detection with identify_chimeric_seqs.py -> 7. filtering of chimeras as described `here <chimera_checking.html>`_ -> 8. taxonomic assignments/tree building -> 9. OTU table and downstream processing
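Steps 6 and 7 can be sketched as two commands. This is a minimal sketch, assuming the ``-m ChimeraSlayer``, ``-i``, ``-a`` and ``-o`` options of identify_chimeric_seqs.py and the ``-f``, ``-o``, ``-s`` and ``-n`` options of filter_fasta.py as given in the QIIME script documentation. The helper ``chimera_slayer_commands`` and all file names are hypothetical; the names assume aligned representative and reference sequences as input. The commands are only built, not run.

```python
import shlex


def chimera_slayer_commands(rep_set_fp, reference_fp):
    """Build (but do not run) the chimera detection and filtering commands."""
    chimeras_fp = "chimeric_seqs.txt"  # hypothetical name for the list of chimera IDs
    return [
        # Identify putative chimeras by comparison against the reference set.
        ["identify_chimeric_seqs.py", "-m", "ChimeraSlayer", "-i", rep_set_fp,
         "-a", reference_fp, "-o", chimeras_fp],
        # Keep only sequences NOT listed in chimeras_fp (-n negates the ID list).
        ["filter_fasta.py", "-f", rep_set_fp, "-o", "non_chimeric_rep_set.fasta",
         "-s", chimeras_fp, "-n"],
    ]


commands = chimera_slayer_commands("rep_set_aligned.fasta", "reference_aligned.fasta")
for cmd in commands:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The filtered representative set would then feed the taxonomic assignment and tree building steps.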