options_lookup = get_options_lookup()

script_info['brief_description']="""Takes a directory, a metadata mapping file, and the name of the mapping file column that lists the fasta file name for each SampleID, and combines all files in the directory that have valid fasta extensions into a single fasta file with valid QIIME fasta labels."""
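
# Illustrative sketch only, not QIIME's implementation: one way to select
# "all files that have valid fasta extensions" in the -i/--fasta_dir directory.
# The name sketch_list_fasta_files and the extension tuple below are
# hypothetical assumptions. The set of extensions this script actually
# accepts is not shown in this file.
import os

SKETCH_FASTA_EXTENSIONS = ('.fasta', '.fna', '.fa')  # assumed, not QIIME's list


def sketch_list_fasta_files(fasta_dir):
    """Return the sorted names of regular files in fasta_dir whose extension
    is in SKETCH_FASTA_EXTENSIONS (compared case-insensitively)."""
    return sorted(
        name for name in os.listdir(fasta_dir)
        if os.path.isfile(os.path.join(fasta_dir, name))
        and os.path.splitext(name)[1].lower() in SKETCH_FASTA_EXTENSIONS)

# Usage (hypothetical): for the example directory described in
# script_description, sketch_list_fasta_files('fasta_dir') would return
# ['seqs1.fna', 'seqs2.fna']. Subdirectories are skipped even if their
# names end in a fasta extension.
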
script_info['script_description']="""A metadata mapping file with SampleIDs
and fasta file names (just the file name itself, not the full or relative
filepath) is used to generate a combined fasta file with valid
QIIME labels based upon the SampleIDs specified in the mapping file.

See: http://qiime.org/documentation/file_formats.html#metadata-mapping-files
for details about the metadata file format.

Example metadata mapping file:

#SampleID BarcodeSequence LinkerPrimerSequence InputFileName Description
Sample.1 AAAACCCCGGGG CTACATAATCGGRATT seqs1.fna sample.1
Sample.2 TTTTGGGGAAAA CTACATAATCGGRATT seqs2.fna sample.2

This script handles situations where fasta data arrive already
demultiplexed on a one-fasta-file-per-sample basis. It only alters
the fasta labels, adding a QIIME-compatible label at the beginning of each.

With the metadata mapping file above and a specified directory containing the
files seqs1.fna and seqs2.fna, the first record in seqs1.fna might
look like this:

>FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_
AACAGATTAGACCAGATTAAGCCGAGATTTACCCGA

In the output combined fasta file, this record would be written as follows,
with the SampleID and an enumeration counter (starting at the
-n/--count_start value, 0 by default) prepended to the original label:

>Sample.1_0 FLP3FBN01ELBSX length=250 xy=1766_0111 region=1 run=R_2008_12_09_13_51_01_
AACAGATTAGACCAGATTAAGCCGAGATTTACCCGA

No changes are made to the sequences.
"""

script_info['script_usage']=[]
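
# Illustrative sketch only, not QIIME's implementation: how the metadata
# mapping file described above could be read to map each fasta file name in
# the -c/--filename_column column to its SampleID. The function name
# sketch_filename_to_sample_id is hypothetical. It assumes a header line
# starting with "#SampleID", tab-separated fields, and that other lines
# starting with "#" are comments. Rejecting duplicate fasta file names is
# also an assumption of this sketch.
def sketch_filename_to_sample_id(mapping_lines, filename_column):
    """Return {fasta_file_name: SampleID} built from mapping file lines."""
    col = None
    filename_to_sample = {}
    for line in mapping_lines:
        line = line.rstrip('\n')
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith('#SampleID'):
            header = line.lstrip('#').split('\t')
            # list.index raises ValueError if the requested column is absent
            col = header.index(filename_column)
            continue
        if line.startswith('#'):
            continue  # skip other comment lines
        if col is None:
            raise ValueError('mapping file has no #SampleID header line')
        fields = line.split('\t')
        fasta_name = fields[col]
        if fasta_name in filename_to_sample:
            raise ValueError('duplicate fasta file name: %s' % fasta_name)
        filename_to_sample[fasta_name] = fields[0]  # SampleID is column 0
    return filename_to_sample

# Usage (hypothetical): with the example mapping file and -c InputFileName,
# this returns {'seqs1.fna': 'Sample.1', 'seqs2.fna': 'Sample.2'}.
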
script_info['script_usage'].append(("""Example:""","""Specify fasta_dir as the input directory of fasta files, use the metadata mapping file example_mapping.txt with InputFileName as the column containing the fasta file names, start enumerating sequence labels at 1000000, and write the output to the directory combined_fasta""","""%prog -i fasta_dir -m example_mapping.txt -c InputFileName -n 1000000 -o combined_fasta"""))
script_info['output_description']="""A combined_seqs.fasta file will be created in the output directory, with each sequence labeled with the SampleID that the metadata mapping file assigns to its source fasta file."""
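
# Illustrative sketch only, not QIIME's implementation: the relabeling shown
# in script_description, where each fasta header gets "<SampleID>_<n> "
# prepended and n counts up from -n/--count_start. The name
# sketch_relabel_fasta is hypothetical. The sketch assumes the counter is
# shared across all input files (pass the returned next count to the next
# file) rather than restarting per file. Sequence lines pass through unchanged.
def sketch_relabel_fasta(fasta_lines, sample_id, count_start=0):
    """Return (relabeled_lines, next_count) for one sample's fasta lines."""
    relabeled = []
    count = count_start
    for line in fasta_lines:
        line = line.rstrip('\n')
        if line.startswith('>'):
            # Keep the original header text after the new QIIME label.
            relabeled.append('>%s_%d %s' % (sample_id, count, line[1:]))
            count += 1
        else:
            relabeled.append(line)  # sequence lines are not modified
    return relabeled, count

# Usage (hypothetical): relabeling the seqs1.fna record from the example with
# sample_id 'Sample.1' and count_start 0 yields the header
# '>Sample.1_0 FLP3FBN01ELBSX length=250 ...' and a next count of 1.
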

script_info['required_options']= [\
    make_option('-m', '--mapping_fp', type='existing_filepath',
        help='Metadata mapping file filepath, containing SampleIDs and the '+\
        'fasta file name column given by -c/--filename_column.'),
    make_option('-i', '--fasta_dir', type='existing_dirpath',
        help='Directory of fasta files to combine and label.'),
    make_option('-c', '--filename_column', type=str,
        help='Specify column used in metadata mapping file for '+\

script_info['optional_options']= [\
    make_option('-o', '--output_dir', type='new_dirpath',
        help='Output directory for the combined_seqs.fasta file. '+\
        '[default: %default]', default="."),
    make_option('-n', '--count_start',
        help='Specify the number to start enumerating sequence labels with. '+\
        '[default: %default]', default=0, type="int")