3
3
.. index:: add_qiime_labels.py
5
*add_qiime_labels.py* -- Takes a directory and a mapping file of SampleIDs to fasta file names, combines all files that have valid fasta extensions into a single fasta file, with valid QIIME fasta labels.
5
*add_qiime_labels.py* -- Takes a directory, a metadata mapping file, and the name of the column that lists the fasta file name associated with each SampleID, and combines all files with valid fasta extensions into a single fasta file with valid QIIME fasta labels.
6
6
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10
A tab separated text file with SampleIDs
10
A metadata mapping file with SampleIDs
11
11
and fasta file names (just the file name itself, not the full or relative
12
12
filepath) is used to generate a combined fasta file with valid
13
13
QIIME labels based upon the SampleIDs specified in the mapping file.
15
See: http://qiime.org/documentation/file_formats.html#metadata-mapping-files
16
for details about the metadata file format.
19
#SampleID BarcodeSequence LinkerPrimerSequence InputFileName Description
20
Sample.1 AAAACCCCGGGG CTACATAATCGGRATT seqs1.fna sample.1
21
Sample.2 TTTTGGGGAAAA CTACATAATCGGRATT seqs2.fna sample.2
15
23
This script is to handle situations where fasta data comes already
16
demultiplexed into a one fasta file per sample basis. Apart from altering
17
the fasta label to add a QIIME compatible label at the beginning (example:
24
demultiplexed, with one fasta file per sample. The script only alters
25
the fasta label to add a QIIME compatible label at the beginning.
28
With the metadata mapping file above, and a specified directory containing the
29
files seqs1.fna and seqs2.fna, the first line from the seqs1.fna file might
18
31
>FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_
20
>control.sample_1 FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_
32
AACAGATTAGACCAGATTAAGCCGAGATTTACCCGA
22
Note that limited checking is done on the mapping file. The only tests
23
are that every fasta file name is unique, and that SampleIDs are
24
MIMARKS compliant (alphanumeric and period characters only). Duplicate
25
SampleIDs are allowed, so care should be taken that there are no typos.
34
and in the output combined fasta file would be written like this
35
>Sample.1_0 FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_
36
AACAGATTAGACCAGATTAAGCCGAGATTTACCCGA
27
38
No changes are made to the sequences.
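As a rough illustration of the relabeling described above, the following Python sketch reads the file name column from a mapping file and prefixes each fasta label with ``SampleID_n``. It is not the script's actual code: the function names ``read_filename_map`` and ``add_labels`` are hypothetical, the mapping file is assumed to be tab-separated, and the counter is assumed to keep running from one fasta file to the next.

```python
import csv


def read_filename_map(mapping_lines, filename_column):
    """Return {fasta file name: SampleID} from tab-separated mapping lines.

    Assumes the first line is the header and the first column is #SampleID.
    """
    rows = csv.reader(mapping_lines, delimiter="\t")
    header = next(rows)
    header[0] = header[0].lstrip("#")
    name_index = header.index(filename_column)
    return {row[name_index]: row[0] for row in rows if row}


def add_labels(fasta_by_name, filename_to_sample, count_start=0):
    """Prefix every fasta label with 'SampleID_n '; sequences are untouched.

    fasta_by_name maps a file name to its list of lines (hypothetical input
    shape). The counter is assumed to continue across files.
    """
    count = count_start
    combined = []
    for name in sorted(fasta_by_name):
        sample_id = filename_to_sample[name]
        for line in fasta_by_name[name]:
            if line.startswith(">"):
                combined.append(">%s_%d %s" % (sample_id, count, line[1:]))
                count += 1
            else:
                combined.append(line)
    return combined
```

Only label lines (those starting with ``>``) are rewritten; every other line is passed through unchanged, which mirrors the statement that the sequences themselves are not modified.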
41
52
SampleID to fasta file name mapping file filepath
43
54
Directory of fasta files to combine and label.
55
-c, `-`-filename_column
56
Specify column used in metadata mapping file for fasta file names.
48
Required output directory for log file and corrected mapping file, log file, and html file. [default: ./]
61
Output directory for the combined fasta file. [default: .]
49
62
-n, `-`-count_start
50
63
Specify the number to start enumerating sequence labels with. [default: 0]
55
A combined_seqs.fasta file will be created in the output directory
68
A combined_seqs.fasta file will be created in the output directory, with the sequences assigned to the SampleID given in the metadata mapping file.
60
Specify fasta_dir as the input directory of fasta files, use the SampleID to fasta file mapping file example_mapping.txt, start enumerating with 1000000 following SampleIDs, and output the data to the directory combined_fasta
73
Specify fasta_dir as the input directory of fasta files, use the metadata mapping file example_mapping.txt with InputFileName as the fasta file name column, start enumerating at 1000000, and write the output to the directory combined_fasta.
64
add_qiime_labels.py -i fasta_dir -m example_mapping.txt -n 1000000 -o combined_fasta
77
add_qiime_labels.py -i fasta_dir -m example_mapping.txt -c InputFileName -n 1000000 -o combined_fasta
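One way to sanity-check the combined output of a run like the one above is to tally sequences per SampleID from the new labels. The Python sketch below assumes labels of the form ``>SampleID_n original_label`` as shown earlier; the helper name ``count_sequences_per_sample`` is hypothetical and is not part of QIIME.

```python
from collections import Counter


def count_sequences_per_sample(fasta_lines):
    """Count label lines per SampleID, assuming '>SampleID_n ...' labels."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue
        fields = line[1:].split(None, 1)
        if not fields:
            continue  # bare '>' with no label text
        # Drop the trailing '_n' counter to recover the SampleID.
        sample_id = fields[0].rsplit("_", 1)[0]
        counts[sample_id] += 1
    return counts
```

Splitting on the last underscore keeps any underscores or periods earlier in the label intact, so a SampleID such as ``Sample.1`` is recovered as written in the mapping file.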