\title{function to compute average homozygosity within a person}
This function computes average homozygosity (inbreeding) for a set of
people, across multiple markers. It can be used for quality control
(e.g. contamination checks).
hom(data, snpsubset, idsubset, snpfreq, n.snpfreq = 1000)
\item{data}{Object of \link{gwaa.data-class} or \link{snp.data-class}}
\item{snpsubset}{Subset of SNPs to be used}
\item{idsubset}{People for whom average homozygosity is to be computed}
\item{snpfreq}{When the option weight="freq" is used, you can provide
fixed allele frequencies}
\item{n.snpfreq}{When the option weight="freq" is used, you can provide
a vector supplying the number of people used to estimate allele
frequencies at each marker, or a single fixed number}
A matrix with rows corresponding to the ID names and
columns showing the number of SNPs measured in this
person (NoMeasured), the number of measured polymorphic
SNPs (NoPoly), homozygosity (Hom), expected homozygosity
(E(Hom)), variance (Var), and the estimate of inbreeding, F.
Homozygosity is measured as the proportion of
homozygous genotypes observed in a person.

Inbreeding for person \eqn{i} is estimated with

\deqn{ f_i = \frac{O_i - E_i}{L_i - E_i} }{ f_i = (O_i - E_i)/(L_i - E_i) }
where \eqn{O_i} is the observed number of homozygous genotypes, \eqn{L_i} is the number of SNPs
measured in individual \eqn{i}, and

\deqn{ E_i = \sum_{j=1}^{L_i} \left(1 - 2 p_j (1 - p_j) \frac{T_{Aj}}{T_{Aj}-1}\right) }{ E_i = Sigma_(j=1)^(L_i) (1 - 2 p_j (1 - p_j) (T_(Aj))/(T_(Aj)-1)) }
where \eqn{T_{Aj}} is the number of measured genotypes at locus \eqn{j};
\eqn{T_{Aj}} is either estimated from the data or provided by the "n.snpfreq"
parameter (vector). Allelic frequencies are either estimated from
the data or provided by the "snpfreq" vector.

This measure is the same as that used by PLINK (see reference).
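
As a rough illustration of the estimator, using made-up numbers that do not
come from any data set: a locus with \eqn{p_j = 0.3} and \eqn{T_{Aj} = 100}
contributes

\deqn{ 1 - 2 \cdot 0.3 \cdot 0.7 \cdot \frac{100}{99} \approx 0.576 }{ 1 - 2 * 0.3 * 0.7 * 100/99 ~ 0.576 }

to \eqn{E_i}. If summing such terms over \eqn{L_i = 1000} measured SNPs gave
\eqn{E_i = 650}, and the person had \eqn{O_i = 700} homozygous genotypes, then

\deqn{ f_i = \frac{700 - 650}{1000 - 650} = \frac{50}{350} \approx 0.143 }{ f_i = (700 - 650)/(1000 - 650) = 50/350 ~ 0.143 }

Under the same formula, \eqn{f_i} is negative whenever fewer homozygous
genotypes are observed than expected (\eqn{O_i < E_i}).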

The variance (Var) is estimated as

\deqn{ V_{i} = \frac{1}{N} \sum_k \frac{(x_{i,k} - p_k)^2}{p_k (1 - p_k)} }{ V_i = (1/N) Sigma_k (x_(i,k) - p_k)^2 / (p_k (1 - p_k)) }

where \eqn{k} runs from 1 to \eqn{N}, the number of SNPs, \eqn{x_{i,k}} is
the genotype of the \eqn{i}th person at the \eqn{k}th SNP, coded as 0, 1/2, 1, and
\eqn{p_k} is the frequency of the coded allele at that SNP.

Only polymorphic loci with more than one measured genotype are used with this option.
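
A small worked case may help when reading the variance formula. The numbers
are purely illustrative, and \eqn{N = 2} is chosen only to keep the arithmetic
short. Suppose person \eqn{i} has genotypes \eqn{x_{i,1} = 1} and
\eqn{x_{i,2} = 1/2}, and the allele frequencies are \eqn{p_1 = 0.5} and
\eqn{p_2 = 0.2}. Then

\deqn{ V_i = \frac{1}{2} \left( \frac{(1 - 0.5)^2}{0.5 \cdot 0.5} + \frac{(0.5 - 0.2)^2}{0.2 \cdot 0.8} \right) = \frac{1}{2} (1 + 0.5625) = 0.78125 }{ V_i = (1/2) ((1 - 0.5)^2/(0.5 * 0.5) + (0.5 - 0.2)^2/(0.2 * 0.8)) = (1/2) (1 + 0.5625) = 0.78125 }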

This variance is used as the diagonal of the genomic
kinship matrix when using the EIGENSTRAT method.

You should use as many people and markers as possible when estimating
inbreeding/variance from marker data.

Purcell S. et al. (2007) PLINK: a tool set for whole-genome association and population-based
linkage analyses. Am. J. Hum. Genet.
\author{Yurii Aulchenko, partly based on code by John Barnard}
\code{\link{gwaa.data-class}},
\code{\link{snp.data-class}}