1
#########################################
3
#########################################
5
By far, the most common question asked of two sets of genomic features is whether or not any of the
6
features in the two sets "overlap" with one another. This is known as feature intersection. **bedtools intersect**
7
allows one to screen for overlaps between two sets of genomic features. Moreover, it allows one to have
8
fine control as to how the intersections are reported. **bedtools intersect** works with both BED/GFF/VCF
9
and BAM files as input.
11
===============================
12
5.1.1 Usage and option summary
13
===============================
17
bedtools intersect [OPTIONS] [-a <BED/GFF/VCF> || -abam <BAM>] -b <BED/GFF/VCF>
19
intersectBed [OPTIONS] [-a <BED/GFF/VCF> || -abam <BAM>] -b <BED/GFF/VCF>
23
=========================== =========================================================================================================================================================
25
=========================== =========================================================================================================================================================
26
**-a** BED/GFF/VCF file A. Each feature in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe.
27
**-b** BED/GFF/VCF file B. Use "stdin" if passing B with a UNIX pipe.
28
**-abam** BAM file A. Each BAM alignment in A is compared to B in search of overlaps. Use "stdin" if passing A with a UNIX pipe: For example: samtools view -b <BAM> | bedtools intersect -abam stdin -b genes.bed
29
**-ubam** Write uncompressed BAM output. The default is write compressed BAM output.
30
**-bed** When using BAM input (-abam), write output as BED. The default is to write output in BAM when using -abam. For example: bedtools intersect -abam reads.bam -b genes.bed -bed
31
**-wa** Write the original entry in A for each overlap.
32
**-wb** Write the original entry in B for each overlap. Useful for knowing what A overlaps. Restricted by -f and -r.
33
**-wo** Write the original A and B entries plus the number of base pairs of overlap between the two features. Only A features with overlap are reported. Restricted by -f and -r.
34
**-wao** Write the original A and B entries plus the number of base pairs of overlap between the two features. However, A features w/o overlap are also reported with a NULL B feature and overlap = 0. Restricted by -f and -r.
35
**-u** Write original A entry once if any overlaps found in B. In other words, just report the fact at least one overlap was found in B. Restricted by -f and -r.
36
**-c** For each entry in A, report the number of hits in B while restricting to -f. Reports 0 for A entries that have no overlap with B. Restricted by -f and -r.
37
**-v** Only report those entries in A that have no overlap in B. Restricted by -f and -r.
38
**-f** Minimum overlap required as a fraction of A. Default is 1E-9 (i.e. 1bp).
39
**-r** Require that the fraction of overlap be reciprocal for A and B. In other words, if -f is 0.90 and -r is used, this requires that B overlap at least 90% of A and that A also overlaps at least 90% of B.
40
**-s** Force "strandedness". That is, only report hits in B that overlap A on the same strand. By default, overlaps are reported without respect to strand.
41
**-split** Treat "split" BAM (i.e., having an "N" CIGAR operation) or BED12 entries as distinct BED intervals.
42
=========================== =========================================================================================================================================================
45
===============================
46
5.1.2 Default behavior
47
===============================
48
By default, if an overlap is found, **bedtools intersect** reports the shared interval between the two
60
bedtools intersect -a A.bed -b B.bed
65
a = """chr1 10 20\nchr1 30 40"""
68
title = "bedtools intersect -a A.bed -b B.bed"
69
from matplotlib.pyplot import show
70
from pyplots.plotter import plot_a_b_tool
71
plot_a_b_tool(a, b, 'intersect', title, 'A.bed', 'B.bed')
74
=============================================
75
5.1.3 (-wa) Reporting the original A feature
76
=============================================
77
Instead, one can force **bedtools intersect** to report the *original* **"A"** feature when an overlap is found. As
78
shown below, the entire "A" feature is reported, not just the portion that overlaps with the "B" feature.
89
bedtools intersect -a A.bed -b B.bed -wa
94
a = """chr1 10 20\nchr1 30 40"""
97
title = "bedtools intersect -a A.bed -b B.bed -wa"
98
from matplotlib.pyplot import show
99
from pyplots.plotter import plot_a_b_tool
100
plot_a_b_tool(a, b, 'intersect', title, 'A.bed', 'B.bed', wa=True)
104
=============================================
105
5.1.4 (-wb) Reporting the original B feature
106
=============================================
107
Similarly, one can force **bedtools intersect** to report the *original* **"B"** feature when an overlap is found. If
108
just -wb is used, the overlapping portion of A will be reported followed by the *original* **"B"**. If both -wa
109
and -wb are used, the *originals* of both **"A"** and **"B"** will be reported.
111
For example (-wb alone):
122
bedtools intersect -a A.bed -b B.bed -wb
135
bedtools intersect -a A.bed -b B.bed -wa -wb
138
=======================================================================
139
5.1.5 (-u) Reporting the presence of *at least one* overlapping feature
140
=======================================================================
141
Frequently a feature in "A" will overlap with multiple features in "B". By default, **bedtools intersect** will
142
report each overlap as a separate output line. However, one may want to simply know that there is at
143
least one overlap (or none). When one uses the -u option, "A" features that overlap with one or more
144
"B" features are reported once. Those that overlap with no "B" features are not reported at all.
147
For example (*without* -u):
157
bedtools intersect -a A.bed -b B.bed -wb
161
For example (*with* -u):
171
bedtools intersect -a A.bed -b B.bed -u
174
=======================================================================
175
5.1.6 (-c) Reporting the number of overlapping features
176
=======================================================================
177
The -c option reports a column after each "A" feature indicating the *number* (0 or more) of overlapping
178
features found in "B". Therefore, *each feature in A is reported once*.
190
bedtools intersect -a A.bed -b B.bed -u
195
=======================================================================
196
5.1.7 (-v) Reporting the absence of any overlapping features
197
=======================================================================
198
There will likely be cases where you'd like to know which "A" features do not overlap with any of the
199
"B" features. Perhaps you'd like to know which SNPs don't overlap with any gene annotations. The -v
200
(an homage to "grep -v") option will only report those "A" features that have no overlaps in "B".
211
bedtools intersect -a A.bed -b B.bed -v
216
a = """chr1 10 20\nchr1 30 40"""
219
title = "bedtools intersect -a A -b B -v"
220
from matplotlib.pyplot import show
221
from pyplots.plotter import plot_a_b_tool
222
plot_a_b_tool(a, b, 'intersect', title, 'A.bed', 'B.bed', v=True)
226
=======================================================================
227
5.1.8 (-f) Requiring a minimal overlap fraction
228
=======================================================================
229
By default, **bedtools intersect** will report an overlap between A and B so long as there is at least one base
230
pair is overlapping. Yet sometimes you may want to restrict reported overlaps between A and B to cases
231
where the feature in B overlaps at least X% (e.g. 50%) of the A feature. The -f option does exactly
234
For example (note that the second B entry is not reported):
243
bedtools intersect -a A.bed -b B.bed -f 0.50 -wa -wb
244
chr1 100 200 chr1 130 201
246
==========================================================================
247
5.1.9 (-r, combined with -f)Requiring reciprocal minimal overlap fraction
248
==========================================================================
249
Similarly, you may want to require that a minimal fraction of both the A and the B features is
250
overlapped. For example, if feature A is 1kb and feature B is 1Mb, you might not want to report the
251
overlap as feature A can overlap at most 1% of feature B. If one set -f to say, 0.02, and one also
252
enable the -r (reciprocal overlap fraction required), this overlap would not be reported.
254
For example (note that the second B entry is not reported):
263
bedtools intersect -a A.bed -b B.bed -f 0.50 -r -wa -wb
264
chr1 100 200 chr1 130 201
266
==========================================================================
267
5.1.10 (-s)Enforcing "strandedness"
268
==========================================================================
269
By default, **bedtools intersect** will report overlaps between features even if the features are on opposite
270
strands. However, if strand information is present in both BED files and the "-s" option is used, overlaps
271
will only be reported when features are on the same strand.
273
For example (note that the second B entry is not reported):
276
chr1 100 200 a1 100 +
279
chr1 130 201 b1 100 -
280
chr1 130 201 b2 100 +
282
bedtools intersect -a A.bed -b B.bed -wa -wb -s
283
chr1 100 200 a1 100 + chr1 130 201 b2 100 +
286
==========================================================================
287
5.1.11 (-abam)Default behavior when using BAM input
288
==========================================================================
289
When comparing alignments in BAM format (**-abam**) to features in BED format (**-b**), **bedtools intersect**
290
will, **by default**, write the output in BAM format. That is, each alignment in the BAM file that meets
291
the user's criteria will be written (to standard output) in BAM format. This serves as a mechanism to
292
create subsets of BAM alignments are of biological interest, etc. Note that only the mate in the BAM
293
alignment is compared to the BED file. Thus, if only one end of a paired-end sequence overlaps with a
294
feature in B, then that end will be written to the BAM output. By contrast, the other mate for the
295
pair will not be written. One should use **pairToBed(Section 5.2)** if one wants each BAM alignment
296
for a pair to be written to BAM output.
300
bedtools intersect -abam reads.unsorted.bam -b simreps.bed | samtools view - | head -3
302
BERTHA_0001:3:1:15:1362#0 99 chr4 9236904 0 50M = 9242033 5 1 7 9
303
AGACGTTAACTTTACACACCTCTGCCAAGGTCCTCATCCTTGTATTGAAG W c T U ] b \ g c e g X g f c b f c c b d d g g V Y P W W _
304
\c`dcdabdfW^a^gggfgd XT:A:R NM:i:0 SM:i:0 AM:i:0 X0:i:19 X1:i:2 XM:i:0 XO:i:0 XG:i:0 MD:Z:50
305
BERTHA _0001:3:1:16:994#0 83 chr6 114221672 37 25S6M1I11M7S =
306
114216196 -5493 G A A A G G C C A G A G T A T A G A A T A A A C A C A A C A A T G T C C A A G G T A C A C T G T T A
307
gffeaaddddggggggedgcgeggdegggggffcgggggggegdfggfgf XT:A:M NM:i:3 SM:i:37 AM:i:37 XM:i:2 X O : i :
309
BERTHA _0001:3:1:16:594#0 147 chr8 43835330 0 50M =
310
43830893 -4487 CTTTGGGAGGGCTTTGTAGCCTATCTGGAAAAAGGAAATATCTTCCCATG U
311
\e^bgeTdg_Kgcg`ggeggg_gggggggggddgdggVg\gWdfgfgff XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:10 X1:i:7 X M : i :
312
2 XO:i:0 XG:i:0 MD:Z:1A2T45
315
==========================================================================
316
5.1.12 (-bed)Output BED format when using BAM input
317
==========================================================================
318
When comparing alignments in BAM format (**-abam**) to features in BED format (**-b**), **bedtools intersect**
319
will **optionally** write the output in BED format. That is, each alignment in the BAM file is converted
320
to a 6 column BED feature and if overlaps are found (or not) based on the user's criteria, the BAM
321
alignment will be reported in BED format. The BED "name" field is comprised of the RNAME field in
322
the BAM alignment. If mate information is available, the mate (e.g., "/1" or "/2") field will be
323
appended to the name. The "score" field is the mapping quality score from the BAM alignment.
327
bedtools intersect -abam reads.unsorted.bam -b simreps.bed -bed | head -20
329
chr4 9236903 9236953 BERTHA_0001:3:1:15:1362#0/1 0 +
330
chr6 114221671 114221721 BERTHA_0001:3:1:16:994#0/1 37 -
331
chr8 43835329 43835379 BERTHA_0001:3:1:16:594#0/2 0 -
332
chr4 49110668 49110718 BERTHA_0001:3:1:31:487#0/1 23 +
333
chr19 27732052 27732102 BERTHA_0001:3:1:32:890#0/2 46 +
334
chr19 27732012 27732062 BERTHA_0001:3:1:45:1135#0/1 37 +
335
chr10 117494252 117494302 BERTHA_0001:3:1:68:627#0/1 37 -
336
chr19 27731966 27732016 BERTHA_0001:3:1:83:931#0/2 9 +
337
chr8 48660075 48660125 BERTHA_0001:3:1:86:608#0/2 37 -
338
chr9 34986400 34986450 BERTHA_0001:3:1:113:183#0/2 37 -
339
chr10 42372771 42372821 BERTHA_0001:3:1:128:1932#0/1 3 -
340
chr19 27731954 27732004 BERTHA_0001:3:1:130:1402#0/2 0 +
341
chr10 42357337 42357387 BERTHA_0001:3:1:137:868#0/2 9 +
342
chr1 159720631 159720681 BERTHA_0001:3:1:147:380#0/2 37 -
343
chrX 58230155 58230205 BERTHA_0001:3:1:151:656#0/2 37 -
344
chr5 142612746 142612796 BERTHA_0001:3:1:152:1893#0/1 37 -
345
chr9 71795659 71795709 BERTHA_0001:3:1:177:387#0/1 37 +
346
chr1 106240854 106240904 BERTHA_0001:3:1:194:928#0/1 37 -
347
chr4 74128456 74128506 BERTHA_0001:3:1:221:724#0/1 37 -
348
chr8 42606164 42606214 BERTHA_0001:3:1:244:962#0/1 37 +
350
==================================================================================
351
5.1.13 (-split)Reporting overlaps with spliced alignments or blocked BED features
352
==================================================================================
353
As described in section 1.3.19, bedtools intersect will, by default, screen for overlaps against the entire span
354
of a spliced/split BAM alignment or blocked BED12 feature. When dealing with RNA-seq reads, for
355
example, one typically wants to only screen for overlaps for the portions of the reads that come from
356
exons (and ignore the interstitial intron sequence). The **-split** command allows for such overlaps to be
359
For example, the diagram below illustrates the *default* behavior. The blue dots represent the "split/
360
spliced" portion of the alignment (i.e., CIGAR "N" operation). In this case, the two exon annotations
361
are reported as overlapping with the "split" BAM alignment, but in addition, a third feature that
362
overlaps the "split" portion of the alignment is also reported.
364
Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
366
Exons --------------- ----------
368
BED/BAM A ************.......................................****
370
BED File B ^^^^^^^^^^^^^^^ ^^^^^^^^ ^^^^^^^^^^
372
Result =============== ======== ==========
375
In contrast, when using the **-split** option, only the exon overlaps are reported.
377
Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
379
Exons --------------- ----------
381
BED/BAM A ************.......................................****
383
BED File B ^^^^^^^^^^^^^^^ ^^^^^^^^ ^^^^^^^^^^
385
Result =============== ==========
b'\\ No newline at end of file'