\documentclass{article}

\title{eCryptfs Design Document}

\author{Michael A. Halcrow}

\section*{Introduction}

This document details the design for eCryptfs. We are implementing
eCryptfs features on a staged basis. The first stage (version 0.1)
includes mount-wide passphrase support and data confidentiality
enforcement. The second stage (version 0.2) includes mount-wide public
key support and data integrity enforcement. The third stage (version
0.3) includes per-file policy support.

We have published two papers covering eCryptfs at the Ottawa Linux
Symposium (2004 and 2005)~\cite{ols}. These papers provide a high-level
overview of eCryptfs, along with extensive discussion of various
topics relating to filesystem security in Linux.

As of January 2006, we have completed development for the features
planned for eCryptfs version 0.1 and are recommending that eCryptfs be
merged into the Linux kernel. This document provides a technical
description of the eCryptfs filesystem.

\section*{Threat Model}

We intend for eCryptfs to protect data confidentiality and data
integrity in the event that an unauthorized agent gains access to the
data in a context that is outside the control of the host operating
environment. Authorization is predicated on the possession of one or
more secrets that are correlated on an individual basis with each file
object. An agent without at least one of the secrets\footnote{When a
user is protecting the files with a passphrase secret, then we
anticipate attempts to perform dictionary attacks on that
passphrase. Other attacks, such as differential cryptanalysis, are
also anticipated.} associated with any given file should not be able
to discern any strategic information about the contents of any given
encrypted file, aside from what can be deduced from the file name, the
file size, or other metadata associated with the file. It should be about
as difficult to attack an encrypted eCryptfs file as it is to attack a
file encrypted by GnuPG (using the same cipher, key, etc.). At no time
should a system error result in a confidentiality breach\footnote{For
instance, no intermediate state of the file on disk should be more
easily attacked than the final state of the file on disk.}.

\section*{Key Management}

RFC2440 (OpenPGP) heavily influences the design of eCryptfs, although
deviations from the RFC are necessary to support random access in a
filesystem. Each file has a unique \emph{session key} associated with
it. eCryptfs generates that session key via the Linux kernel
\emph{get\_random\_bytes()} function call at the time that a file is
created. The length of the session key is dependent upon the cipher
selected by the user. By default, eCryptfs selects Blowfish with
a 128-bit key. In release 0.1, the user specifies the cipher as a
mount option. Later releases will allow for a more fine-grained cipher
selection via a policy that is dynamically applied at the time that
a file is created.

Active eCryptfs inodes are associated with cryptographic
contexts. Each context is held in a data structure that contains such
things as the session key, the cipher name, the root initialization
vector, signatures of authentication tokens associated with the inode,
various flags indicating inode cryptographic properties, pointers to
crypto API structs, and so forth\footnote{The
\emph{ecryptfs\_crypt\_stat} struct definition is in the
\emph{ecryptfs\_kernel.h} header file.}.

The session key is encrypted and stored in the first extent of the
\emph{underlying} (encrypted) file. The session key is encrypted once
for each \emph{authentication token} associated with the
file. Authentication token types reflect the encryption mechanism. In
release 0.1, there is one ``global'' \emph{passphrase} authentication
token that eCryptfs generates at mount time from the user's specified
passphrase\footnote{Conversion of a passphrase into a key follows the
S2K process as described in RFC2440, in that the passphrase is
concatenated with a salt; that data block is then iteratively
MD5-hashed 65,536 times to generate the session key encryption
key.}. In later releases, the user will have the option of associating
multiple authentication tokens with each file via a dynamically
applied policy.
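
Read literally, the passphrase-to-key conversion described in the
footnote above amounts to repeated hashing. As a sketch only, let $D_0$
be the passphrase concatenated with the salt (the order of concatenation
is not stated here) and define
\[
  D_{k} = \mathrm{MD5}(D_{k-1}), \qquad k = 1, \ldots, 65536 .
\]
Under this reading, the session key encryption key is $D_{65536}$, a
128-bit value. The symbols $D_k$ are illustrative notation and do not
appear in the eCryptfs sources; the implementation and RFC2440 remain the
authority on the exact byte-level procedure.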

eCryptfs stores authentication tokens in the user's session
keyring. Authentication tokens may either be \emph{instantiated} or
\emph{uninstantiated}. Instantiated authentication tokens contain the
secret value necessary to encrypt and decrypt the session
key. Uninstantiated authentication tokens contain only enough
information to generate the packets that are written to the header of
the underlying file.

When eCryptfs opens an encrypted file, it attempts to match the set of
authentication tokens contained in the header of the file against the
set of instantiated authentication tokens in the user's session
keyring\footnote{Note that release 0.1 only supports one mount-wide
authentication token.}. If eCryptfs can make at least one match, then
it uses that instantiated authentication token to decrypt the session
key that is used to encrypt and decrypt the file contents on page
write and read operations. If no instantiated authentication tokens
are found, then eCryptfs attempts to instantiate one of the existing
uninstantiated authentication tokens in the session keyring. This
process may include, for instance, prompting the user for a passphrase
or retrieving a private key from the user's GnuPG keyring\footnote{In
general, in order to preserve transparency, instantiation attempts
will begin with options that do not require user interaction.
Ultimately, instantiation behavior will be policy-directed.}.

Userspace applications are largely responsible for authentication
token management. Authentication tokens may be created and manipulated
by such things as PAM modules or daemon processes. For instance, when
a user logs in, a PAM module may capture the login passphrase,
generate an instantiated authentication token from that passphrase,
and insert that authentication token into the user's session keyring. When
the user later tries to open a file that has been associated with that
particular authentication token, eCryptfs will find the authentication
token and use it to decrypt the session key to access the file
contents. This is all done in a manner that is transparent to the
application making the file operation request.

\section*{Cryptographic Operations}

\subsection*{Confidentiality Enforcement}

eCryptfs enforces the confidentiality of the data that is outside the
control of the host operating environment by encrypting the contents
of the file objects containing the data. eCryptfs utilizes the Linux
kernel cryptographic API to perform the encryption and decryption of
the contents of its files over subregions known as \emph{extents}.

The length of each extent is fixed to the page size (typically $4096$
bytes). When a \emph{readpage()} request comes through as the result
of a VFS syscall, eCryptfs will translate the page index to find the
corresponding extent in the lower (encrypted) file. eCryptfs reads
this extent in and then decrypts the page, which is encrypted in CBC
mode with whatever cipher the user selected for the file at the time
the file was created.
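
As a hedged illustration of the mapping just described, assume the
layout used by release 0.1 (encrypted, no HMAC), in which a single header
page precedes the data extents and no continuation pages intervene.
Writing $P$ for the page size and $i$ for the page index passed to
\emph{readpage()}, the byte offset of the corresponding extent in the
lower file would then be
\[
  \mathit{offset}(i) = (i + 1) \cdot P .
\]
For example, with $P = 4096$, page index $i = 3$ would map to lower-file
offset $4 \cdot 4096 = 16384$. The symbols $P$, $i$, and
$\mathit{offset}$ are introduced here for illustration only; formats that
interleave HMAC pages shift these offsets further.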

eCryptfs derives the initialization vector (IV) for each extent from
the \emph{root initialization vector} stored in the header extent of
the file. The root IV is generated for each newly created file in the
same manner that the session key is generated (with a call to the
Linux kernel function \emph{get\_random\_bytes()}). The extent IV
derivation process entails taking the MD5 sum of the root IV
concatenated with the ASCII decimal characters reflecting the extent
index\footnote{For efficiency, future versions of eCryptfs are likely
to simply cast the least significant bits of the root IV to an
unsigned long data type and then add the extent index to that value.}.
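
The derivation above can be written compactly. As a sketch, let
$\mathit{IV}_{\mathrm{root}}$ be the root IV, let $\mathrm{dec}(i)$ denote
the ASCII decimal string for extent index $i$, and let $\|$ denote
concatenation (this notation is introduced here for illustration and is
not taken from the eCryptfs sources):
\[
  \mathit{IV}_i = \mathrm{MD5}\bigl(\mathit{IV}_{\mathrm{root}} \,\|\, \mathrm{dec}(i)\bigr).
\]
For $i = 12$, for instance, the hash input is the root IV followed by the
two bytes \texttt{0x31 0x32} (the characters ``1'' and ``2''). MD5 yields a
128-bit digest; how that digest is fitted to the block size of the
selected cipher is not specified in this document.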

When a \emph{writepage()} request comes through as a result of a VFS
syscall, eCryptfs will read the target extent from the underlying file
using the \emph{readpage()} process described above. The data on that
page is modified according to the write request. The entire (modified)
page is re-encrypted (again, in CBC mode) with the same IV and key that
were used to originally encrypt the page and is then written out to the
underlying file.

\subsection*{Integrity Verification}

eCryptfs (releases 0.2 and higher) verifies the integrity of the data
that has been outside the control of the host operating environment by
generating and storing HMAC values. eCryptfs utilizes the Linux kernel
cryptographic API to generate and verify HMAC values over extents.

\section*{File Format}

File formats vary depending on whether or not a file is encrypted and
whether or not a file is HMAC verified. Release 0.1 only supports
files that are encrypted and have no HMAC. This release also only
supports a mount-wide passphrase, and so the packet set consists only
of a Tag 3 followed by a Tag 11 packet.

The first 20 bytes consist of the file size, the eCryptfs marker/version
value, and a set of status flags. From byte 20 on, only
RFC2440-compliant packets are valid.

\begin{verbatim}
ENCRYPTED, NO HMAC:
  Octets 0-7:       Unencrypted file size
  Octets 8-15:      eCryptfs special marker/version
  Octets 16-19:     Status flags
  Octet 20:         Begin RFC2440 authentication token packet set
  Octets 4088-4095: (optional) page index for remainder of the packet set
  Extent 0 (CBC encrypted)
  Extent 1 (CBC encrypted)
  ...
  (optional) continuation of RFC2440 authentication token packet set

ENCRYPTED or UNENCRYPTED, HMAC (16-byte):
  Octets 0-7:       Unencrypted file size
  Octets 8-15:      eCryptfs special marker/version
  Octets 16-19:     Status flags
  Octet 20:         Begin RFC2440 authentication token packet set
  Octets 4088-4095: (optional) page index for remainder of the packet set
  HMAC records for extents 0-X (X=(PAGE_SIZE/16)-1)
  Extent 0 (CBC encrypted)
  Extent 1 (CBC encrypted)
  ...
  Extent X (CBC encrypted)
  HMAC records for extents (X+1)-(2*X)
  ...
  (optional) continuation of RFC2440 authentication token packet set

UNENCRYPTED, NO HMAC:
  Identical w/ original file
\end{verbatim}
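
As a worked instance of the HMAC layout above, let $P$ be the page size
and assume, as the layout heading states, 16-byte HMAC records. One page
of records then covers
\[
  \frac{P}{16} = \frac{4096}{16} = 256
\]
extents, so the last extent index covered by the first HMAC page is
$X = P/16 - 1 = 255$ when $P = 4096$. The symbol $P$ is introduced
here for illustration.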

The RFC2440 packet set includes Tag 3, Tag 11, and Tag 1 packets. A
Tag 3 (passphrase) packet is immediately followed by a Tag 11
(literal) packet containing the identifier for the passphrase in the
Tag 3 packet. This identifier is formed by hashing the key that is
generated from the passphrase in the String-to-Key (S2K) operation.

The total size of the packet set cannot exceed the page size (4096 bytes on
most architectures) for eCryptfs release 0.1. Later releases will
allow for additional packets to be written at the end of the file.
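
To make the limit concrete under the layout given in this section: the
first 20 octets of the header page are reserved, so with a page size of
$4096$ bytes the packet set has at most
\[
  4096 - 20 = 4076
\]
octets available. If the optional page index in octets 4088--4095 is
present, the packet set is confined to octets 20 through 4087, that is,
$4088 - 20 = 4068$ octets. This arithmetic is a sketch derived from the
layout in this document rather than from the eCryptfs sources.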

\section*{Deployment Considerations}

Use dm-crypt to encrypt your swap space (random key generated on each
boot).

Use a Trusted Platform Module (TPM) to protect the encryption key. One way
to accomplish this is to generate a random key, ask the TPM to use its
unexportable private key to encrypt the random key, and store the
encrypted random key. The TPM can refuse to decrypt the random key
unless the machine is booted into a secure configuration. The
hexadecimal representation of the random key can be used as the
mount-wide passphrase.

\section*{Functional Overview}

eCryptfs is a stacked filesystem that is implemented natively in the
Linux kernel VFS. Since eCryptfs is stacked, it does not write
directly into a block device. Instead, it mounts on top of a directory
in a \emph{lower} filesystem. Almost any filesystem may act as a lower
filesystem, including EXT3 or JFS. Objects in the eCryptfs filesystem,
including \emph{inode}, \emph{dentry}, and \emph{file} objects,
correlate on a one-to-one basis with the objects in the lower
filesystem.

\subsection*{VFS Objects}

eCryptfs maintains the references between the objects in the eCryptfs
filesystem and the objects in the lower filesystem. The references are
maintained from eCryptfs via (1) the file object's \emph{private\_data}
pointer, (2) the inode object's \emph{u.generic\_ip} pointer, (3) the
dentry object's \emph{d\_fsdata} pointer, and (4) the superblock
object's \emph{s\_fs\_info} pointer. These pointers point to special
data structures that maintain information necessary to perform the
cryptographic operations.

\begin{thebibliography}{1}

\bibitem{ols} Proceedings of the Ottawa Linux Symposium.
\url{http://www.linuxsymposium.org/2006/proceedings.php}.

\end{thebibliography}