otherwise only alignments overlapping the specified regions
will be output. An alignment may be output multiple times if
it overlaps several regions. A region can be specified, for
example, in one of the following formats: `chr2' (the whole
chr2), `chr2:1000000' (region starting from 1,000,000bp) or
`chr2:1,000,000-2,000,000' (region between 1,000,000 and
2,000,000bp including the end points). The coordinate is
1-based.
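
The three region forms above can be illustrated with a small parser. This is only a sketch under stated assumptions: `parse_region` and `overlaps` are hypothetical helper names, not samtools functions, and an open-ended region is represented here by `None` for the end coordinate.

```python
import re

# Hypothetical helper (not part of samtools): split a region string into
# (name, start, end) using 1-based, inclusive coordinates.
_REGION = re.compile(
    r"^(?P<name>[^:]+)(?::(?P<start>[\d,]+)(?:-(?P<end>[\d,]+))?)?$"
)


def parse_region(text):
    m = _REGION.match(text)
    if m is None:
        raise ValueError(f"unrecognised region: {text!r}")

    def to_int(s):
        # Thousands separators such as 1,000,000 are allowed.
        return int(s.replace(",", ""))

    start = to_int(m.group("start")) if m.group("start") else 1
    # None means "up to the end of the reference" (an assumption of this sketch).
    end = to_int(m.group("end")) if m.group("end") else None
    if start < 1 or (end is not None and end < start):
        raise ValueError(f"invalid coordinates in region: {text!r}")
    return m.group("name"), start, end


def overlaps(aln_start, aln_end, start, end):
    # Both intervals are 1-based and include their end points.
    return aln_end >= start and (end is None or aln_start <= end)
```

For instance, `parse_region("chr2:1,000,000-2,000,000")` gives `("chr2", 1000000, 2000000)`, and an alignment ending exactly at 2,000,000 still overlaps that region because the end points are included.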

-b Output in the BAM format.

-u Output uncompressed BAM. This option saves time spent
on compression/decompression and is thus preferred
when the output is piped to another samtools command.

-h Include the header in the output.

-H Output the header only.

-S Input is in SAM. If @SQ header lines are absent, the
`-t' option is required.

-t FILE This file is TAB-delimited. Each line must contain
the reference name and the length of the reference,
one line for each distinct reference; additional
fields are ignored. This file also defines the order
of the reference sequences in sorting. If you run
`samtools faidx <ref.fa>', the resultant index file
<ref.fa>.fai can be used as this <in.ref_list> file.
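
As an illustration of the file layout described for `-t', the following sketch reads such a list into ordered (name, length) pairs. The function name `read_ref_list` is hypothetical, and rejecting a repeated name is an assumption drawn from the "one line for each distinct reference" wording.

```python
def read_ref_list(lines):
    """Parse TAB-delimited reference list lines into [(name, length), ...].

    Hypothetical helper: only the first two fields are used, extra fields
    are ignored, and the input order is preserved because it defines the
    sort order of the references.
    """
    refs = []
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected name and length: {line!r}")
        name, length = fields[0], int(fields[1])
        if name in seen:
            # Assumption: a repeated reference name is treated as an error.
            raise ValueError(f"duplicate reference name: {name!r}")
        seen.add(name)
        refs.append((name, length))
    return refs
```

Passing the lines of a `.fai' index works unchanged, since the additional index columns are simply skipped.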

-o FILE Output file [stdout]

-f INT Only output alignments with all bits in INT present
in the FLAG field. INT can be in hex in the format of
/^0x[0-9A-F]+/ [0]
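
The "all bits present" test for `-f' amounts to a bitwise AND. The sketch below shows it together with the hex form quoted above; `parse_flag_arg` and `has_all_bits` are hypothetical names used for illustration only.

```python
import re

# Hex form as given in the option description: 0x followed by 0-9, A-F.
_HEX = re.compile(r"^0x[0-9A-F]+$")


def parse_flag_arg(text):
    """Convert a -f style argument (decimal or 0x-prefixed hex) to an int."""
    if _HEX.match(text):
        return int(text, 16)
    return int(text)


def has_all_bits(flag, required):
    """True if every bit set in `required` is also set in `flag`."""
    return (flag & required) == required
```

With the default of 0 every alignment passes, because `flag & 0 == 0` holds for any FLAG value.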

faidx samtools faidx <ref.fasta> [region1 [...]]

Index reference sequence in the FASTA format or extract
subsequence from indexed reference sequence. If no region is
specified, faidx will index the file and create
<ref.fasta>.fai on the disk. If regions are specified, the
subsequences will be retrieved and printed to stdout in the
FASTA format. The input file can be compressed in the RAZF
format.

pileup samtools pileup [-f in.ref.fasta] [-t in.ref_list] [-l
in.site_list] [-iscgS2] [-T theta] [-N nHap] [-r
pairDiffRate] <in.bam>|<in.sam>

Print the alignment in the pileup format. In the pileup
format, each line represents a genomic position, consisting of
chromosome name, coordinate, reference base, read bases, read
qualities and alignment mapping qualities. Information on
match, mismatch, indel, strand, mapping quality and start and
end of a read is all encoded in the read base column. In
this column, a dot stands for a match to the reference base
on the forward strand, a comma for a match on the reverse
strand, `ACGTN' for a mismatch on the forward strand and
`acgtn' for a mismatch on the reverse strand. A pattern
`\+[0-9]+[ACGTNacgtn]+' indicates there is an insertion
between this reference position and the next reference
position. The length of the insertion is given by the integer
in the pattern, followed by the inserted sequence. Similarly,
a pattern `-[0-9]+[ACGTNacgtn]+' represents a deletion from
the reference. The deleted bases will be presented as `*' in
the following lines. Also in the read base column, a symbol
`^' marks the start of a read segment, which is a contiguous
subsequence on the read separated by `N/S/H' CIGAR operations.
The ASCII of the character following `^' minus 33 gives the
mapping quality. A symbol `$' marks the end of a read segment.
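
The read base column described above can be decoded with a single left-to-right scan. The following is a minimal sketch, not samtools code: `decode_read_bases` and the event names are hypothetical, and the indel length is used to slice the sequence so that a base following an indel is not swallowed by it.

```python
import re


def decode_read_bases(column):
    """Turn a pileup read base column into a list of (event, value) tuples."""
    events = []
    i = 0
    while i < len(column):
        ch = column[i]
        if ch == "^":
            # The character after '^' encodes mapping quality as ASCII - 33.
            events.append(("read_start", ord(column[i + 1]) - 33))
            i += 2
        elif ch == "$":
            events.append(("read_end", None))
            i += 1
        elif ch in "+-":
            m = re.match(r"[0-9]+", column[i + 1:])
            if m is None:
                raise ValueError(f"indel without a length at offset {i}")
            n = int(m.group())
            seq_start = i + 1 + len(m.group())
            kind = "insertion" if ch == "+" else "deletion"
            events.append((kind, column[seq_start:seq_start + n]))
            i = seq_start + n
        elif ch == ".":
            events.append(("match", "forward"))
            i += 1
        elif ch == ",":
            events.append(("match", "reverse"))
            i += 1
        elif ch == "*":
            events.append(("deleted_base", "*"))
            i += 1
        elif ch in "ACGTN":
            events.append(("mismatch_forward", ch))
            i += 1
        elif ch in "acgtn":
            events.append(("mismatch_reverse", ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return events
```

Consuming the character after `^' unconditionally matters, because a mapping quality character may itself be `+', `$' or a letter.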

If option -c is applied, the consensus base, Phred-scaled
consensus quality, SNP quality (i.e. the Phred-scaled
probability of the consensus being identical to the reference)
and root mean square (RMS) mapping quality of the reads
covering the site will be inserted between the `reference
base' and the `read bases' columns. An indel occupies an
additional line. Each indel line consists of chromosome name,
coordinate, a star, the genotype, consensus quality, SNP
quality, RMS mapping quality, # covering reads, the first
allele, the second allele, # reads supporting the first
allele, # reads supporting the second allele and # reads
containing indels different from the top two alleles.
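
The indel line layout listed above maps naturally onto a named record. The sketch below assumes TAB-delimited columns and integer qualities and counts, neither of which is stated in this section; `IndelLine`, `parse_indel_line` and the sample values in the usage note are hypothetical.

```python
from collections import namedtuple

# Field names are illustrative; the order follows the description above,
# with the literal star column checked and then dropped.
IndelLine = namedtuple(
    "IndelLine",
    "chrom pos genotype consensus_quality snp_quality rms_mapq depth "
    "allele1 allele2 n_allele1 n_allele2 n_other",
)


def parse_indel_line(line):
    f = line.rstrip("\n").split("\t")  # assumption: columns are TAB-delimited
    if len(f) < 13 or f[2] != "*":
        raise ValueError("not an indel line")
    return IndelLine(
        chrom=f[0],
        pos=int(f[1]),
        genotype=f[3],  # kept as an opaque string
        consensus_quality=int(f[4]),
        snp_quality=int(f[5]),
        rms_mapq=int(f[6]),
        depth=int(f[7]),
        allele1=f[8],
        allele2=f[9],
        n_allele1=int(f[10]),
        n_allele2=int(f[11]),
        n_other=int(f[12]),
    )
```

A made-up line such as `"chr1\t100\t*\t*/+A\t30\t30\t60\t10\t*\t+A\t6\t4\t0"` would then yield a record whose `depth` is 10 and whose `allele2` is `+A`.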

-s Print the mapping quality as the last column. This
option makes the output easier to parse, although
this format is not space efficient.

-i Only output pileup lines containing indels.

-f FILE The reference sequence in the FASTA format. Index
file FILE.fai will be created if absent.

-M INT Cap mapping quality at INT [60]

-t FILE List of reference names and sequence lengths, in
the format described for the import command. If
this option is present, samtools assumes the input
<in.alignment> is in SAM format; otherwise it
assumes BAM format.

-l FILE List of sites at which pileup is output. This file
is space delimited. The first two columns are
required to be chromosome and 1-based coordinate.
Additional columns are ignored. It is recommended
to use option -s together with -l, as in the default
format the mapping quality may not be known.

-c Call the consensus sequence using the MAQ consensus
model. Options -T, -N, -I and -r are only effective
when -c or -g is in use.

-g Generate genotype likelihood in the binary GLFv3
format. This option suppresses -c, -i and -s.

-T FLOAT The theta parameter (error dependency coefficient)
in the MAQ consensus calling model [0.85]

-N INT Number of haplotypes in the sample (>=2) [2]

-r FLOAT Expected fraction of differences between a pair of
haplotypes [0.001]

-I INT Phred probability of an indel in sequencing/prep.
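
A Phred-scaled value Q corresponds to a probability of 10^(-Q/10). The sketch below shows the conversion in both directions, assuming `-I' follows this standard Phred definition; the function names are hypothetical.

```python
import math


def phred_to_prob(q):
    """Probability corresponding to a Phred-scaled value q."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Phred-scaled value corresponding to a probability p (0 < p <= 1)."""
    return -10 * math.log10(p)
```

Under this definition a value of 30 corresponds to a probability of about one in a thousand.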

tview samtools tview <in.sorted.bam> [ref.fasta]

Text alignment viewer (based on the ncurses library). In the
viewer, press `?' for help and press `g' to view the
alignment starting from a region in a format like
`chr10:10,000,000'.

fixmate samtools fixmate <in.nameSrt.bam> <out.bam>

Fill in mate coordinates, ISIZE and mate related flags from a
name-sorted alignment.

rmdup samtools rmdup <input.srt.bam> <out.bam>

Remove potential PCR duplicates: if multiple read pairs have
identical external coordinates, only retain the pair with
highest mapping quality. This command ONLY works with FR
orientation and requires that ISIZE is correctly set.
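
The retention rule described for rmdup can be sketched as a grouping step. This is an illustration of the stated rule only, not the samtools implementation: the dictionary keys `chrom`, `start`, `end` and `mapq` are hypothetical, and each pair is assumed to carry a single mapping quality.

```python
def remove_duplicates(pairs):
    """Keep one read pair per set of identical external coordinates.

    `pairs` is an iterable of dicts with the hypothetical keys
    'chrom', 'start', 'end' (outermost coordinates) and 'mapq'.
    """
    best = {}
    for pair in pairs:
        key = (pair["chrom"], pair["start"], pair["end"])
        kept = best.get(key)
        # Retain the pair with the highest mapping quality; on a tie the
        # first pair seen is kept (an arbitrary choice in this sketch).
        if kept is None or pair["mapq"] > kept["mapq"]:
            best[key] = pair
    return list(best.values())
```

Pairs with distinct external coordinates are never compared with each other, so they all survive.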

rmdupse samtools rmdupse <input.srt.bam> <out.bam>

Remove potential duplicates for single-ended reads. This
command will treat all reads as single-ended even if they are
paired.

fillmd samtools fillmd [-e] <aln.bam> <ref.fasta>

Generate the MD tag. If the MD tag is already present, this
command will give a warning if the MD tag generated is
different from the existing tag.

-e Convert a read base to `=' if it is identical to
the aligned reference base. The indel caller does not
support `=' bases at the moment.
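
The effect of `-e' on a single read can be pictured as follows. This sketch assumes the read and reference segments are already aligned base for base with no gaps, and it compares bases case-insensitively, which is an assumption; `mask_matches` is a hypothetical name.

```python
def mask_matches(read, ref):
    """Replace each read base that equals the aligned reference base by '='."""
    if len(read) != len(ref):
        raise ValueError("read and reference segments must have equal length")
    return "".join(
        "=" if r.upper() == f.upper() else r
        for r, f in zip(read, ref)
    )
```

For example, `mask_matches("ACGTA", "ACCTA")` returns `"==G=="`, leaving only the mismatching base visible.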

SAM is TAB-delimited. Apart from the header lines, which start
with the `@' symbol, each alignment line consists of: