~ubuntu-branches/ubuntu/oneiric/lme4/oneiric : revision 30

14

\author{Douglas Bates\\Department of Statistics\\%

15

University of Wisconsin -- Madison}

16

\begin{document}

17

\SweaveOpts{engine=R,eps=FALSE,pdf=TRUE,strip.white=TRUE}

17

\SweaveOpts{engine=R,eps=FALSE,pdf=TRUE,strip.white=true,keep.source=TRUE}

18

\SweaveOpts{include=FALSE}

19

\setkeys{Gin}{width=\textwidth}

20

\newcommand{\code}[1]{\texttt{\small{#1}}}

94

% models, can be evaluated exactly.

95

96

In the next section we describe the general form of the mixed models

97

that can be represented in the \package{lme4} package and the

97

that can be represented in the \package{lme4} package and the

98

computational approach embodied in the package. In the following

99

section we describe a particular form of mixed model,

100

called a linear mixed model, and the computational details for those

339

\begin{aligned}

340

-2\log(f_{\bc U}(\bm u))&=q\log(2\pi\sigma^2)+\frac{\|\bm u\|^2}{\sigma^2}\\

341

-2\log(f_{\bc Y|\bc U}(\bm y|\bm u))&=n\log(2\pi\sigma^2)+

342

\frac{\|\bm y-\bm Z\bm\Lambda(\bm\theta)\bm u-\bm X\bm\beta\|^2}{\sigma^2}\\

342

\frac{\|\bm y-\bm Z\bm\Lambda(\bm\theta)\bm u-\bm X\bm\beta\|^2}{\sigma^2}\\

343

-2\log(h(\bm u|\bm y,\bm\theta,\bm\beta,\sigma))

344

&= (n+q)\log(2\pi\sigma^2)+

345

\frac{\|\bm y-\bm\gamma(\bm u,\bm\theta,\bm\beta)\|^2+\|\bm

349

\end{aligned}

350

\end{equation}

351

352

In (\ref{eq:7}) the \emph{discrepancy} function,

352

In (\ref{eq:7}) the \emph{discrepancy} function,

353

\begin{equation}

354

\label{eq:9}

355

d(\bm u|\bm y,\bm\theta,\bm\beta) =

355

d(\bm u|\bm y,\bm\theta,\bm\beta) =

356

\|\bm y-\bm\gamma(\bm u,\bm\theta,\bm\beta)\|^2+\|\bm u\|^2

357

\end{equation}

358

has the form of a penalized residual sum of squares in that the first

360

the residual sum of squares for $\bm y$, $\bm u$, $\bm\theta$ and

361

$\bm\beta$ and the second term, $\|\bm u\|^2$, is a penalty on the

362

size of $\bm u$. Notice that the discrepancy does not depend on the

363

common scale parameter, $\sigma$.

363

common scale parameter, $\sigma$.

364

365

\subsection{The canonical form of the discrepancy}

366

\label{sec:conditional-mode-bm}

426

$\bm P$, which is determined from the pattern of nonzeros in $\bm A$

427

but does not depend on particular values of those nonzeros, can have a

428

profound impact on the number of nonzeros in $\bm L$ and hence on the

429

speed with which $\bm L$ can be calculated from $\bm A$.

429

speed with which $\bm L$ can be calculated from $\bm A$.

430

431

In most applications of linear mixed models the matrix $\bm

432

Z\bm\Lambda(\bm\theta)$ is sparse while $\bm X$ is dense or close to

512

\begin{multline}

513

\label{eq:lmmdev}

514

-2\ell(\bm\theta,\bm\beta,\sigma|\bm y)\\

515

=-2\log\left(L(\bm\theta,\bm\beta,\sigma|\bm y)\right)\\

515

=-2\log\left(L(\bm\theta,\bm\beta,\sigma|\bm y)\right)\\

516

=n\log(2\pi\sigma^2)+\log(|\bm L_{\bm Z}|^2)+\frac{\tilde{d}(\bm y,\bm\theta) +

517

\left\|\bm L_{\bm X}\trans\bm P_{\bm X}(\bm\beta-\tilde{\bm\beta})\right\|^2}{\sigma^2},

518

\end{multline}

667

\end{bmatrix} =

668

\begin{bmatrix}

669

\bm\Lambda\trans(\bm\theta)\bm Z\trans\bm y\\

670

\bm X\trans\bm y

670

\bm X\trans\bm y

671

\end{bmatrix} .

672

\end{displaymath}

673

In the process of solving this system we create the sparse left

802

\end{equation}

803

becomes a weighted, nonlinear least squares problem except that the

804

weights, $\bm W$, can depend on $\bm\mu$ and, hence, on $\bm u$ and

805

$\bm\beta$.

805

$\bm\beta$.

806

807

In describing an algorithm for linear mixed models we called

808

$\tilde{\bm\beta}(\bm\theta)$ the \emph{conditional estimate}. That

875

{\bm U^{(i)}}\trans\bm W^{1/2}(\bm y-\bm\mu^{(i)})

876

\end{bmatrix}

877

\end{equation}

878

providing the updated parameter values

878

providing the updated parameter values

879

\begin{equation}

880

\label{eq:33}

881

\begin{bmatrix}\bm u^{(i+1)}\\\bm\beta^{(i+1)}\end{bmatrix}=

941

\end{bmatrix} =

942

\begin{bmatrix}

943

\bm\Lambda\trans(\bm\theta)\bm Z\trans\bm W\bm y\\

944

\bm X\trans\bm W\bm y

944

\bm X\trans\bm W\bm y

945

\end{bmatrix} ,

946

\end{equation}

947

which can be solved directly, and the Cholesky factor, $\bm L_{\bm Z}(\bm\theta)$, satisfies

968

simultaneously. In these cases we must include the term involving

969

$|\bm W(\bm\phi)|$ when evaluating the profiled log-likelihood.

970

971

Note that we must define the parameterization of $\bm W(\bm\phi)$

971

Note that we must define the parameterization of $\bm W(\bm\phi)$

972

such that $\sigma^2$ and $\bm\phi$ are not a redundant

973

parameterization of $\sigma^2\bm W(\bm\phi)$. For example, we could

974

require that the first diagonal element of $\bm W$ be unity.

981

W=\bm I_n$. That is,

982

\begin{equation}

983

\label{eq:conddistNLMM}

984

(\bc Y|\bc U=\bm u)\sim\mathcal{N}(\bm m(\bm\gamma),\sigma^2\bm I_n)

984

(\bc Y|\bc U=\bm u)\sim\mathcal{N}(\bm m(\bm\gamma),\sigma^2\bm I_n)

985

\end{equation}

986

with discrepancy function

987

\begin{equation}

1000

\end{bmatrix} = \arg\min_{\bm u,\bm\theta}d(\bm u|\bm y,\bm\theta,\bm\beta)

1001

\end{equation}

1002

and we write the minimum discrepancy, given $\bm y$ and $\bm\theta$,

1003

as

1003

as

1004

\begin{equation}

1005

\label{eq:25}

1006

\tilde{d}(\bm y,\bm\theta)=d(\tilde{\bm u}(\bm\theta)|\bm

1128

the Bernoulli and Poisson families, which are in the exponential class

1129

of probability distributions, there is one particular scalar link

1130

function, called the \emph{canonical link function}, that arisesfrom the

1131

definition of the distribution itself.

1131

definition of the distribution itself.

1132

1133

To determine the canonical link function we consider the logarithm of

1134

the density function or the probability mass function. In the case of

1168

estimates of the parameters in a linear mixed model is evaluating the

1169

factorization (\ref{eq:16}) for any $\bm\theta$ satisfying

1170

$\bm\theta_L\le\bm\theta\le\bm\theta_U$. We will assume that $\bm Z$

1171

is sparse as is $\bm Z\bm\Lambda(\bm\theta)$.

1171

is sparse as is $\bm Z\bm\Lambda(\bm\theta)$.

1172

1173

When $\bm X$ is not sparse we will use the factorization (\ref{eq:16})

1174

setting $\bm P_{\bm X}=\bm I_p$ and storing $\bm L_{\bm X\bm Z}$ and

1195

choice in implementation is whether to store $\bm Z\trans$ and update

1196

it to $\bm\Lambda\trans(\bm\theta)\bm Z$ or to store $\bm Z\trans\bm

1197

Z$ and use it to form $\bm\Lambda\trans(\bm\theta)\bm Z\trans\bm

1198

Z\bm\Lambda(\theta)$ at each evaluation.

1198

Z\bm\Lambda(\theta)$ at each evaluation.

1199

1200

In the \package{lme4} package we store $\bm Z\trans$ and use it to

1201

form $\bm\Lambda\trans(\bm\theta)\bm Z\trans$ from which $\bm L_{\bm

1319

(dimension $m$).

1320

\item[$\sigma$] the common scale parameter - not used in some

1321

generalized linear mixed models and generalized nonlinear mixed

1322

models.

1322

models.

1323

\end{description}

1324

1325

\subsection{Dimensions}

1360

which is often alarmingly described as ``an intractable integral''.

1361

In point of fact, this integral can be evaluated exactly in the case

1362

of a linear mixed model and can be approximated quite accurately for

1363

other forms of mixed models.

1363

other forms of mixed models.

1364

1365

\end{document}