~ubuntu-branches/ubuntu/trusty/r-cran-genabel/trusty

Viewing changes to man/hom.Rd

Committer: Package Import Robot
Author(s): Andreas Tille
Date: 2013-07-22 09:22:48 UTC
mfrom: (1.1.5)
Revision ID: package-import@ubuntu.com-20130722092248-xds9dpinjhbx3kho

Tags: 1.7-6-1

* New upstream version
* debian/control: Drop citation from long description because this
information is provided in debian/upstream.

files modified:
CHANGES.LOG

DESCRIPTION

R/ccfast.R

R/ccfast.new.R

R/cocohet.R

R/egscore.R

R/export.plink.R

R/hom.R

R/ibs.R

R/load.gwaa.data.R

R/polygenic.R

R/qtscore.R

R/snp.names.R

R/zzz.R

debian/changelog

debian/control

inst/doc/GenABEL-tutorial.pdf

inst/unitTests/runit.exports.R

man/GenABEL.Rd

man/catable.Rd

man/ccfast.Rd

man/check.marker-class.Rd

man/check.marker.Rd

man/cocohet.Rd

man/convert.snp.affymetrix.Rd

man/convert.snp.mach.Rd

man/egscore.Rd

man/egscore.old.Rd

man/emp.qtscore.Rd

man/export.merlin.Rd

man/export.plink.Rd

man/hom.Rd

man/hom.old.Rd

man/ibs.Rd

man/ibs.old.Rd

man/merge.snp.data.Rd

man/mmscore.Rd

man/npsubtreated.Rd

man/plot.check.marker.Rd

man/qtscore.Rd

man/scan.haplo.2D.Rd

man/scan.haplo.Rd

man/snp.subset.Rd

man/srdta.Rd

src/export_plink.cpp

src/frutil.cpp

man/hom.Rd

\name{hom}

\alias{hom}

\title{function to compute average homozygosity within a person}

\description{

This function computes average homozygosity (inbreeding) for a set of

people, across multiple markers. Can be used for Quality Control

(e.g. contamination checks)

}

\usage{

hom(data, snpsubset, idsubset, snpfreq, n.snpfreq = 1000)

}

\arguments{

\item{data}{Object of \link{gwaa.data-class} or \link{snp.data-class}}

\item{data}{Object of \link{gwaa.data-class} or

\link{snp.data-class}}

\item{snpsubset}{Subset of SNPs to be used}

\item{idsubset}{People for whom average homozygosity is to be computed}

\item{snpfreq}{when option weight="freq" used, you can provide

fixed allele frequencies}

\item{n.snpfreq}{when option weight="freq" used, you can provide

a vector supplying the number of people used to estimate allele

frequencies at the particular marker, or a fixed number}

\item{idsubset}{People for whom average homozygosity is

to be computed}

\item{snpfreq}{when option weight="freq" used, you can

provide fixed allele frequencies}

\item{n.snpfreq}{when option weight="freq" used, you can

provide a vector supplying the number of people used to

estimate allele frequencies at the particular marker, or

a fixed number}

}

\value{

A matrix with rows corresponding to the ID names and

columns showing the number of SNPs measured in this

person (NoMeasured), the number of measured polymorphic

SNPs (NoPoly), homozygosity (Hom), expected homozygosity

(E(Hom)), variance, and the estimate of inbreeding, F.

}

\description{

This function computes average homozygosity (inbreeding)

for a set of people, across multiple markers. Can be used

for Quality Control (e.g. contamination checks)

}

\details{

Homozygosity is measured as proportion of

homozygous genotypes observed in a person.

Inbreeding for person \eqn{i} is estimated with

\deqn{

f_i = \frac{(O_i - E_i)}{(L_i - E_i)}

}{

f_i = ((O_i - E_i))/((L_i - E_i))

}

where \eqn{O_i} is observed homozygosity, \eqn{L_i} is the number of SNPs

measured in individual \eqn{i} and

\deqn{

E_i = \Sigma_{j=1}^{L_i} (1 - 2 p_j (1 - p_j) \frac{T_{Aj}}{T_{Aj}-1})

}{

E_i = Sigma_(j=1)^(L_i) (1 - 2 p_j (1 - p_j) (T_(Aj))/(T_(Aj)-1))

}

where \eqn{T_{Aj}} is the number of measured genotypes at locus \eqn{j};

\eqn{T_{Aj}} is either estimated from data or provided by "n.snpfreq"

parameter (vector). Allelic frequencies are either estimated from

data or provided by the "snpfreq" vector.

This measure is the same as used by PLINK (see reference).

The variance (Var) is estimated as

\deqn{

V_{i} = \frac{1}{N} \Sigma_k \frac{(x_{i,k} - p_k)^2}{(p_k * (1 - p_k))}

}

where k changes from 1 to N = number of SNPs, \eqn{x_{i,k}} is

a genotype of ith person at the kth SNP, coded as 0, 1/2, 1 and

\eqn{p_k} is the frequency

of the "+" allele.

Only polymorphic loci with number of measured genotypes >1 are used

with this option.

This variance is used as diagonal of the genomic

kinship matrix when using EIGENSTRAT method.

You should use as many people and markers as possible when estimating

inbreeding/variance from marker data.

}

\value{

A matrix with rows corresponding to the ID names and columns

showing the number of SNPs measured in this person (NoMeasured),

the number of measured polymorphic SNPs (NoPoly),

homozygosity (Hom),

expected homozygosity (E(Hom)), variance, and

the estimate of inbreeding, F.

}

\references{

Purcell S. et al, (2007) PLINK: a toolset for whole genome association and population-based

linkage analyses. Am. J. Hum. Genet.

}

\author{Yurii Aulchenko, partly based on code by John Barnard}

%\note{

\seealso{

\code{\link{ibs}},

\code{\link{gwaa.data-class}},

\code{\link{snp.data-class}}

Homozygosity is measured as proportion of homozygous

genotypes observed in a person.

Inbreeding for person \eqn{i} is estimated with

\deqn{ f_i = \frac{(O_i - E_i)}{(L_i - E_i)} }{
f_i = ((O_i - E_i))/((L_i - E_i)) }

where \eqn{O_i} is observed homozygosity, \eqn{L_i} is

the number of SNPs measured in individual \eqn{i} and

\deqn{ E_i = \Sigma_{j=1}^{L_i} (1 - 2 p_j (1 - p_j) \frac{T_{Aj}}{T_{Aj}-1}) }{
E_i = Sigma_(j=1)^(L_i) (1 - 2 p_j (1 - p_j) (T_(Aj))/(T_(Aj)-1)) }

where \eqn{T_{Aj}} is the number of measured genotypes at

locus \eqn{j}; \eqn{T_{Aj}} is either estimated from data

or provided by "n.snpfreq" parameter (vector). Allelic

frequencies are either estimated from data or provided by

the "snpfreq" vector.

This measure is the same as used by PLINK (see

reference).
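
As a purely illustrative calculation (the numbers are invented, and
\eqn{O_i} is assumed here to be the count of homozygous genotypes): a
locus with \eqn{p_j = 0.5} and \eqn{T_{Aj} = 100} contributes

\deqn{ 1 - 2 \cdot 0.5 \cdot 0.5 \cdot \frac{100}{99} \approx 0.495 }{ 1 - 2 * 0.5 * 0.5 * 100/99 ~ 0.495 }

to \eqn{E_i}. If such terms summed to \eqn{E_i = 60} over
\eqn{L_i = 100} measured SNPs, and \eqn{O_i = 70} of those genotypes
were homozygous, the estimate would be

\deqn{ f_i = \frac{70 - 60}{100 - 60} = 0.25 }{ f_i = (70 - 60)/(100 - 60) = 0.25 }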

The variance (Var) is estimated as

\deqn{ V_{i} = \frac{1}{N} \Sigma_k \frac{(x_{i,k} -

p_k)^2}{(p_k * (1 - p_k))} }

where k changes from 1 to N = number of SNPs,

\eqn{x_{i,k}} is a genotype of ith person at the kth SNP,

coded as 0, 1/2, 1 and \eqn{p_k} is the frequency of the

"+" allele.
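
As a purely illustrative calculation with invented values: take
\eqn{N = 2} SNPs, with \eqn{p_1 = 0.5}, \eqn{x_{i,1} = 1} and
\eqn{p_2 = 0.2}, \eqn{x_{i,2} = 0}. Then

\deqn{ V_{i} = \frac{1}{2} \left( \frac{(1 - 0.5)^2}{0.5 \cdot 0.5} + \frac{(0 - 0.2)^2}{0.2 \cdot 0.8} \right) = \frac{1}{2} (1 + 0.25) = 0.625 }{ V_i = (1/2) * ((1 - 0.5)^2/(0.5 * 0.5) + (0 - 0.2)^2/(0.2 * 0.8)) = (1/2) * (1 + 0.25) = 0.625 }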

Only polymorphic loci with number of measured genotypes

>1 are used with this option.

This variance is used as diagonal of the genomic kinship

matrix when using EIGENSTRAT method.

You should use as many people and markers as possible

when estimating inbreeding/variance from marker data.

}

\examples{

data(ge03d2)

h[1:5,]

homsem <- h[,"Hom"]*(1-h[,"Hom"])/h[,"NoMeasured"]

plot(h[,"Hom"],homsem)

# wrong analysis: one should use all people (for right frequency) and markers (for right F) available!

# wrong analysis: one should use all people (for right frequency)

# and markers (for right F) available!

h <- hom(ge03d2[,c(1:10)])

}

\keyword{htest}% at least one, from doc/KEYWORDS

\author{

Yurii Aulchenko, partly based on code by John Barnard

}

\references{

Purcell S. et al, (2007) PLINK: a toolset for whole

genome association and population-based linkage analyses.

Am. J. Hum. Genet.

}

\seealso{

\code{\link{ibs}}, \code{\link{gwaa.data-class}},

\code{\link{snp.data-class}}

}

\keyword{htest}
