\name{GAlignmentPairs-class}
\alias{class:GAlignmentPairs}
\alias{GAlignmentPairs-class}
\alias{GAlignmentPairs}
\alias{readGAlignmentPairs}
\alias{length,GAlignmentPairs-method}
\alias{names,GAlignmentPairs-method}
\alias{names<-,GAlignmentPairs-method}
\alias{first,GAlignmentPairs-method}
\alias{last,GAlignmentPairs-method}
\alias{left,GAlignmentPairs-method}
\alias{right,GAlignmentPairs-method}
\alias{seqnames,GAlignmentPairs-method}
\alias{strand,GAlignmentPairs-method}
\alias{strand<-,GAlignmentPairs-method}
\alias{ngap,GAlignmentPairs-method}
\alias{isProperPair,GAlignmentPairs-method}
\alias{elementMetadata<-,GAlignmentPairs-method}
\alias{seqinfo,GAlignmentPairs-method}
\alias{seqlevelsInUse,GAlignmentPairs-method}
\alias{seqinfo<-,GAlignmentPairs-method}
\alias{[[,GAlignmentPairs,ANY,ANY-method}
\alias{unlist,GAlignmentPairs-method}
\alias{grglist,GAlignmentPairs-method}
\alias{granges,GAlignmentPairs-method}
\alias{introns,GAlignmentPairs-method}
\alias{coerce,GAlignmentPairs,GRangesList-method}
\alias{coerce,GAlignmentPairs,GRanges-method}
\alias{coerce,GAlignmentPairs,GAlignments-method}
\alias{show,GAlignmentPairs-method}
\alias{c,GAlignmentPairs-method}
\alias{GappedAlignmentPairs}
\alias{class:GappedAlignmentPairs}
\alias{GappedAlignmentPairs-class}
\alias{show,GappedAlignmentPairs-method}
\alias{readGappedAlignmentPairs}
\title{GAlignmentPairs objects}
The GAlignmentPairs class is a container for "genomic alignment pairs".
A GAlignmentPairs object is a list-like object where each element
describes a pair of genomic alignments.
An "alignment pair" is made of a "first" and a "last" alignment,
and is formally represented by a \link{GAlignments} object of
length 2. It typically represents a hit of a paired-end read to
the reference genome that was used by the aligner. More precisely,
in a given pair, the "first" alignment represents the hit of the first
end of the read (aka "first segment in the template", using SAM Spec
terminology), and the "last" alignment represents the hit of the second
end of the read (aka "last segment in the template", using SAM Spec
terminology).
In general, a GAlignmentPairs object will be created by loading
records from a BAM (or SAM) file containing aligned paired-end reads,
using the \code{readGAlignmentPairs} function (see below).
Each element in the returned object will be obtained by pairing 2
\section{Constructors}{
\code{readGAlignmentPairs(file, format="BAM", use.names=FALSE, ...)}:
Read a file containing paired-end reads as a GAlignmentPairs
object.
By default (i.e. \code{use.names=FALSE}), the resulting object has no
names. If \code{use.names} is \code{TRUE}, then the names are
constructed from the query template names (QNAME field in a SAM/BAM
file). Note that the 2 records in a pair of records have the same QNAME.
Note that this function is just a front-end that delegates to the
format-specific back-end function specified via the \code{format}
argument. The \code{use.names} argument and any extra arguments are
passed to the back-end function.
Only the BAM format is supported for now. Its back-end is the
\code{\link[Rsamtools]{readGAlignmentPairsFromBam}} function
defined in the Rsamtools package.
See \code{?\link[Rsamtools]{readGAlignmentPairsFromBam}} for
more information (you might need to install and load the Rsamtools
\code{GAlignmentPairs(first, last, isProperPair, names=NULL)}:
Low-level GAlignmentPairs constructor. Generally not used directly.
In the code snippets below, \code{x} is a GAlignmentPairs object.
Return the number of alignment pairs in \code{x}.
\code{names(x)}, \code{names(x) <- value}:
Get or set the names of \code{x}.
See \code{readGAlignmentPairs} above for how to automatically
extract and set the names from the file to read.
\code{first(x, invert.strand=FALSE)},
\code{last(x, invert.strand=FALSE)}:
Get the "first" or "last" alignment for each alignment pair in
The result is a \link{GAlignments} object of the same length
If \code{invert.strand=TRUE}, then the strand is inverted on-the-fly,
i.e. "+" becomes "-", "-" becomes "+", and "*" remains unchanged.
Get the "left" alignment for each alignment pair in \code{x}.
By definition, the "left" alignment in a pair is the alignment that
is on the + strand. If this is the "first" alignment, then it's returned
as-is by \code{left(x)}, but if this is the "last" alignment, then it's
returned by \code{left(x)} with the strand inverted.
Get the "right" alignment for each alignment pair in \code{x}.
By definition, the "right" alignment in a pair is the alignment that
is on the - strand. If this is the "first" alignment, then it's returned
as-is by \code{right(x)}, but if this is the "last" alignment, then it's
returned by \code{right(x)} with the strand inverted.
Get the name of the reference sequence for each alignment pair
in \code{x}. This comes from the RNAME field of the BAM file and
has the same value for the 2 records in a pair
(\code{\link[Rsamtools]{makeGAlignmentPairs}}, the function
used by \code{\link[Rsamtools]{readGAlignmentPairsFromBam}} for
doing the pairing, rejects pairs with incompatible RNAME values).
\code{strand(x)}, \code{strand(x) <- value}:
Get or set the strand for each alignment pair in \code{x}.
By definition (and in a somewhat arbitrary way) the strand of an
alignment pair is the strand of the \emph{"first"} alignment in the pair.
In a GAlignmentPairs object, the strand of the "last" alignment
in a pair is typically (but not always) the opposite of the strand
of the "first" alignment. Note that, currently,
\code{\link[Rsamtools]{makeGAlignmentPairs}}, the function
used internally by \code{\link[Rsamtools]{readGAlignmentPairsFromBam}}
for doing the pairing, rejects pairs where the "first" and "last"
alignments are on the same strand, but those pairs might be supported
Equivalent to \code{ngap(first(x)) + ngap(last(x))}.
\code{isProperPair(x)}:
Get the "isProperPair" flag bit (bit 0x2 in SAM Spec) set by
the aligner for each alignment pair in \code{x}.
\code{seqinfo(x)}, \code{seqinfo(x) <- value}:
Get or set the information about the underlying sequences.
\code{value} must be a \link{Seqinfo} object.
\code{seqlevels(x)}, \code{seqlevels(x) <- value}:
Get or set the sequence levels.
\code{seqlevels(x)} is equivalent to \code{seqlevels(seqinfo(x))}
or to \code{levels(seqnames(x))}, those 2 expressions being
guaranteed to return identical character vectors on a
GAlignmentPairs object. \code{value} must be a character vector
See \code{?\link{seqlevels}} for more information.
\code{seqlengths(x)}, \code{seqlengths(x) <- value}:
Get or set the sequence lengths.
\code{seqlengths(x)} is equivalent to \code{seqlengths(seqinfo(x))}.
\code{value} can be a named non-negative integer or numeric vector
\code{isCircular(x)}, \code{isCircular(x) <- value}:
Get or set the circularity flags.
\code{isCircular(x)} is equivalent to \code{isCircular(seqinfo(x))}.
\code{value} must be a named logical vector, possibly with NAs.
\code{genome(x)}, \code{genome(x) <- value}:
Get or set the genome identifier or assembly name for each sequence.
\code{genome(x)} is equivalent to \code{genome(seqinfo(x))}.
\code{value} must be a named character vector, possibly with NAs.
\code{seqnameStyle(x)}:
Get or set the seqname style for \code{x}.
Note that this information is not stored in \code{x} but inferred
by looking up \code{seqnames(x)} against a seqname style database
stored in the seqnames.db metadata package (required).
\code{seqnameStyle(x)} is equivalent to \code{seqnameStyle(seqinfo(x))}
and can return more than 1 seqname style (with a warning)
in case the style cannot be determined unambiguously.
\section{Vector methods}{
In the code snippets below, \code{x} is a GAlignmentPairs object.
Return a new GAlignmentPairs object made of the selected
\section{List methods}{
In the code snippets below, \code{x} is a GAlignmentPairs object.
Extract the i-th alignment pair as a \link{GAlignments} object
of length 2. As expected, \code{x[[i]][1]} and \code{x[[i]][2]} are
respectively the "first" and "last" alignments in the pair.
\code{unlist(x, use.names=TRUE)}:
Return the \link{GAlignments} object conceptually defined
by \code{c(x[[1]], x[[2]], ..., x[[length(x)]])}.
\code{use.names} determines whether the names of \code{x} should be
propagated to the result or not.
In the code snippets below, \code{x} is a GAlignmentPairs object.
\code{grglist(x, order.as.in.query=FALSE, drop.D.ranges=FALSE)}:
Return a \link{GRangesList} object of length \code{length(x)}
where the i-th element represents the ranges (with respect to the
reference) of the i-th alignment pair in \code{x}.
IMPORTANT: The strand of the ranges coming from the "last" alignment
in the pair is \emph{always} inverted.
The \code{order.as.in.query} toggle affects the order of the ranges
\emph{within} each top-level element of the returned object.
If \code{FALSE} (the default), then the "left" ranges are placed before
the "right" ranges, and, within each left or right group, are ordered
from 5' to 3' in elements associated with the plus strand and from 3'
to 5' in elements associated with the minus strand.
More formally, the i-th element in the returned \link{GRangesList}
object can be defined as \code{c(grl1[[i]], grl2[[i]])}, where
\code{grl1} is \code{grglist(left(x))} and \code{grl2} is
\code{grglist(right(x))}.
If \code{TRUE}, then the "first" ranges are placed before the "last"
ranges, and, within each first or last group, are \emph{always}
ordered from 5' to 3', whatever the strand is.
More formally, the i-th element in the returned \link{GRangesList}
object can be defined as \code{c(grl1[[i]], grl2[[i]])}, where
\code{grl1} is \code{grglist(first(x),
order.as.in.query=TRUE)}
\code{grl2} is \code{grglist(last(x, invert.strand=TRUE),
order.as.in.query=TRUE)}.
Note that the relationship between the 2 \link{GRangesList} objects
obtained with \code{order.as.in.query} being respectively
\code{FALSE} or \code{TRUE} is simpler than it sounds: the only
difference is that the order of the ranges in elements associated
with the \emph{minus} strand is reversed.
Finally note that, in the latter, the ranges are \emph{always} ordered
consistently with the original "query template", that is, in the order
defined by walking the "query template" from the beginning to the end.
If \code{drop.D.ranges} is \code{TRUE}, then deletions (Ds in the
CIGAR) are treated like gaps (Ns in the CIGAR), that is, the ranges
corresponding to deletions are dropped.
\code{granges(x)}: Return a \link{GRanges} object of length
\code{length(x)} where each range is obtained by merging all the
ranges within the corresponding top-level element in \code{grglist(x)}.
\code{introns(x)}: Extract the gaps (i.e. N operations in the CIGAR)
of the "first" and "last" alignments of each pair as a
\link{GRangesList} object of the same length as \code{x}.
Equivalent to (but faster than):
introns1 <- introns(first(x))
introns2 <- introns(last(x, invert.strand=TRUE))
mendoapply(c, introns1, introns2)
\code{as(x, "GRangesList")}, \code{as(x, "GRanges")}:
Alternate ways of doing \code{grglist(x)} and \code{granges(x)},
\code{as(x, "GAlignments")}:
Equivalent to \code{unlist(x, use.names=TRUE)}.
\section{Other methods}{
In the code snippets below, \code{x} is a GAlignmentPairs object.
By default the \code{show} method displays 5 head and 5 tail
elements. This can be changed by setting the global options
\code{showHeadLines} and \code{showTailLines}. If the object
length is less than (or equal to) the sum of these 2 options
plus 1, then the full object is displayed.
Note that these options also affect the display of \link{GRanges}
and \link{GAlignments} objects, as well as other objects defined
in the IRanges and Biostrings packages (e.g. \link[IRanges]{Ranges}
and \link[Biostrings]{XStringSet} objects).
\item \link{GAlignments-class}.
\item \code{\link[Rsamtools]{readGAlignmentPairsFromBam}}.
\item \code{\link[Rsamtools]{makeGAlignmentPairs}}.
\item \link{GRangesList-class}.
\item \link{GRanges-class}.
\item \link{findOverlaps-methods}.
\item \link{coverage-methods}.
\item \code{\link{seqinfo}}.
ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools")
galp <- readGAlignmentPairs(ex1_file, use.names=TRUE)
last(galp, invert.strand=TRUE)
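## A small sanity check of 'invert.strand' (a sketch that relies only on the
## behavior described in the Accessors section): inverting the strand of the
## "last" alignments turns "+" into "-" and "-" into "+", and leaves "*" as-is.
## 'last_strand', 'inverted' and 'flip' are illustrative names.
last_strand <- as.character(strand(last(galp)))
inverted <- as.character(strand(last(galp, invert.strand=TRUE)))
flip <- c("+"="-", "-"="+", "*"="*")
stopifnot(identical(inverted, unname(flip[last_strand])))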
table(isProperPair(galp))
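## Keep only the pairs flagged as proper by the aligner. Subsetting with a
## logical vector goes through the '[' method listed under "Vector methods";
## 'proper_galp' is an illustrative name.
proper_galp <- galp[isProperPair(galp)]
stopifnot(all(isProperPair(proper_galp)))
stopifnot(length(proper_galp) == sum(isProperPair(galp)))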
## Rename the reference sequences:
seqlevels(galp) <- sub("seq", "chr", seqlevels(galp))
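## As stated in the Accessors section, the sequence levels can be obtained in
## 3 equivalent ways. A quick check after the renaming above:
stopifnot(identical(seqlevels(galp), seqlevels(seqinfo(galp))))
stopifnot(identical(seqlevels(galp), levels(seqnames(galp))))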
grglist(galp)  # a GRangesList object
grglist(galp, order.as.in.query=TRUE)
stopifnot(identical(unname(elementLengths(grglist(galp))), ngap(galp) + 2L))
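## The number of gaps in a pair is the sum of the number of gaps in its
## "first" and "last" alignments (see 'ngap' in the Accessors section):
stopifnot(all(ngap(galp) == ngap(first(galp)) + ngap(last(galp))))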
granges(galp)  # a GRanges object
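## Each range in granges(galp) should span all the ranges of the
## corresponding element in grglist(galp). A sketch of that relationship,
## assuming that min() and max() operate element-wise on the list-like
## objects returned by start() and end() on a GRangesList; 'grl' and 'gr'
## are illustrative names:
grl <- grglist(galp)
gr <- granges(galp)
stopifnot(all(start(gr) == min(start(grl))))
stopifnot(all(end(gr) == max(end(grl))))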
introns(galp)  # a GRangesList object
stopifnot(identical(unname(elementLengths(introns(galp))), ngap(galp)))
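## The 'show' method honors the global options 'showHeadLines' and
## 'showTailLines' (see "Other methods"). Display only 3 head and 3 tail
## elements, then restore the previous settings; 'old_opts' is an
## illustrative name:
old_opts <- options(showHeadLines=3, showTailLines=3)
galp
options(old_opts)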