By: David Howells <dhowells@redhat.com>

 (*) Types of credentials.

 (*) Task credentials.

     - Immutable credentials.
     - Accessing task credentials.
     - Accessing another task's credentials.
     - Altering credentials.
     - Managing credentials.

 (*) Open file credentials.

 (*) Overriding the VFS's use of credentials.

There are several parts to the security check performed by Linux when one
object acts upon another:

Objects are things in the system that may be acted upon directly by
userspace programs. Linux has a variety of actionable objects, including:

	- Shared memory segments

As a part of the description of all these objects there is a set of
credentials. What's in the set depends on the type of object.

Amongst the credentials of most objects, there will be a subset that
indicates the ownership of that object. This is used for resource
accounting and limitation (disk quotas and task rlimits for example).

In a standard UNIX filesystem, for instance, this will be defined by the
UID marked on the inode.

(3) The objective context.

Also amongst the credentials of those objects, there will be a subset that
indicates the 'objective context' of that object. This may or may not be
the same set as in (2) - in standard UNIX files, for instance, this is
defined by the UID and the GID marked on the inode.

The objective context is used as part of the security calculation that is
carried out when an object is acted upon.

A subject is an object that is acting upon another object.

Most of the objects in the system are inactive: they don't act on other
objects within the system. Processes/tasks are the obvious exception:
they do stuff; they access and manipulate things.

Objects other than tasks may under some circumstances also be subjects.
For instance an open file may send SIGIO to a task using the UID and EUID
given to it by a task that called fcntl(F_SETOWN) upon it. In this case,
the file struct will have a subjective context too.

(5) The subjective context.

A subject has an additional interpretation of its credentials. A subset
of its credentials forms the 'subjective context'. The subjective context
is used as part of the security calculation that is carried out when a
subject acts.

A Linux task, for example, has the FSUID, FSGID and the supplementary
group list for when it is acting upon a file - which are quite separate
from the real UID and GID that normally form the objective context of the
task.

Linux has a number of actions available that a subject may perform upon an
object. The set of actions available depends on the nature of the subject
and the object.

Actions include reading, writing, creating and deleting files, and forking,
signalling and tracing tasks.

(7) Rules, access control lists and security calculations.

When a subject acts upon an object, a security calculation is made. This
involves taking the subjective context, the objective context and the
action, and searching one or more sets of rules to see whether the subject
is granted or denied permission to act in the desired manner on the
object, given those contexts.

There are two main sources of rules:

 (a) Discretionary access control (DAC):

     Sometimes the object will include sets of rules as part of its
     description. This is an 'Access Control List' or 'ACL'. A Linux
     file may supply more than one ACL.

     A traditional UNIX file, for example, includes a permissions mask that
     is an abbreviated ACL with three fixed classes of subject ('user',
     'group' and 'other'), each of which may be granted certain privileges
     ('read', 'write' and 'execute' - whatever those map to for the object
     in question). UNIX file permissions do not allow the arbitrary
     specification of subjects, however, and so are of limited use.

     A Linux file might also sport a POSIX ACL. This is a list of rules
     that grants various permissions to arbitrary subjects.

 (b) Mandatory access control (MAC):

     The system as a whole may have one or more sets of rules that get
     applied to all subjects and objects, regardless of their source.
     SELinux and Smack are examples of this.

     In the case of SELinux and Smack, each object is given a label as part
     of its credentials. When an action is requested, they take the
     subject label, the object label and the action and look for a rule
     that says that this action is either granted or denied.
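
As a rough illustration of the DAC case, the following userspace sketch models
how a traditional UNIX permissions mask might be evaluated against a
subjective context. It is not kernel code: struct subj_ctx, struct obj_ctx,
the MAY_* values and dac_permits() are hypothetical names made up for this
example, and the sketch ignores capabilities, POSIX ACLs and LSM hooks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical request bits, laid out like one rwx triplet. */
#define MAY_READ	4
#define MAY_WRITE	2
#define MAY_EXEC	1

/* Hypothetical subjective context: the IDs a task uses when acting on a file. */
struct subj_ctx {
	uid_t fsuid;
	gid_t fsgid;
	const gid_t *groups;	/* supplementary group list */
	size_t ngroups;
};

/* Hypothetical objective context: ownership and mode taken from the inode. */
struct obj_ctx {
	uid_t uid;
	gid_t gid;
	mode_t mode;		/* only the low nine permission bits are used */
};

/* True if the subject's FSGID or any supplementary group matches 'gid'. */
static bool in_group(const struct subj_ctx *s, gid_t gid)
{
	if (s->fsgid == gid)
		return true;
	for (size_t i = 0; i < s->ngroups; i++)
		if (s->groups[i] == gid)
			return true;
	return false;
}

/* Select exactly one class (user, group or other) and test its bits. */
static bool dac_permits(const struct subj_ctx *s, const struct obj_ctx *o,
			int want)
{
	unsigned int bits = o->mode;

	if (s->fsuid == o->uid)
		bits >>= 6;		/* 'user' triplet */
	else if (in_group(s, o->gid))
		bits >>= 3;		/* 'group' triplet */

	return (bits & want & 7) == (unsigned int)want;
}
```

Note that only one class is consulted: an owner who is denied by the 'user'
bits does not fall through to the 'group' or 'other' bits.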

====================
TYPES OF CREDENTIALS
====================

The Linux kernel supports the following types of credentials:

(1) Traditional UNIX credentials.

The UID and GID are carried by most, if not all, Linux objects, even if in
some cases they have to be invented (FAT or CIFS files for example, which are
derived from Windows). These (mostly) define the objective context of
that object, with tasks being slightly different in some cases.

	Effective, Saved and FS User ID
	Effective, Saved and FS Group ID

These are additional credentials used by tasks only. Usually, an
EUID/EGID/GROUPS will be used as the subjective context, and real UID/GID
will be used as the objective. For tasks, it should be noted that this is
not always true.

	Set of permitted capabilities
	Set of inheritable capabilities
	Set of effective capabilities
	Capability bounding set

These are only carried by tasks. They indicate superior capabilities
granted piecemeal to a task that an ordinary task wouldn't otherwise have.
These are manipulated implicitly by changes to the traditional UNIX
credentials, but can also be manipulated directly by the capset() system
call.

The permitted capabilities are those caps that the process might grant
itself to its effective or permitted sets through capset(). The
inheritable set might also be so constrained.

The effective capabilities are the ones that a task is actually allowed to
make use of itself.

The inheritable capabilities are the ones that may get passed across
execve().

The bounding set limits the capabilities that may be inherited across
execve(), especially when a binary is executed that will execute as UID 0.
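
The relationship between the permitted and effective sets can be pictured with
a small bitmask model. This is only a sketch under simplifying assumptions:
struct cap_sets, cap_subset() and capset_allowed() are hypothetical names, the
sets are reduced to plain 64-bit masks, and the inheritable and bounding sets
are left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for two of the capability sets carried by a task. */
struct cap_sets {
	uint64_t permitted;
	uint64_t effective;
};

/* True if every bit set in 'a' is also set in 'b'. */
static bool cap_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/*
 * Simplified capset()-style check: a task may drop permitted capabilities
 * but not gain new ones, and may only make effective what is permitted.
 */
static bool capset_allowed(const struct cap_sets *old,
			   const struct cap_sets *proposed)
{
	return cap_subset(proposed->permitted, old->permitted) &&
	       cap_subset(proposed->effective, proposed->permitted);
}
```

In this model, raising an effective bit that is already permitted succeeds,
whereas adding a bit to the permitted set is refused.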

(3) Secure management flags (securebits).

These are only carried by tasks. These govern the way the above
credentials are manipulated and inherited over certain operations such as
execve(). They aren't used directly as objective or subjective
credentials.

(4) Keys and keyrings.

These are only carried by tasks. They carry and cache security tokens
that don't fit into the other standard UNIX credentials. They are for
making such things as network filesystem keys available to the file
accesses performed by processes, without the necessity of ordinary
programs having to know about the security details involved.

Keyrings are a special type of key. They carry sets of other keys and can
be searched for the desired key. Each process may subscribe to a number
of keyrings.

When a process accesses a key, if not already present, it will normally be
cached on one of these keyrings for future accesses to find.

For more information on using keys, see Documentation/security/keys.txt.

The Linux Security Module allows extra controls to be placed over the
operations that a task may do. Currently Linux supports two main
alternate LSM options: SELinux and Smack.

Both work by labelling the objects in a system and then applying sets of
rules (policies) that say what operations a task with one label may do to
an object with another label.

This is a socket-based approach to credential management for networking
stacks [RFC 2367]. It isn't discussed by this document as it doesn't
interact directly with task and file credentials; rather it keeps system
level credentials.

When a file is opened, part of the opening task's subjective context is
recorded in the file struct created. This allows operations using that file
struct to use those credentials instead of the subjective context of the task
that issued the operation. An example of this would be a file opened on a
network filesystem where the credentials of the opened file should be presented
to the server, regardless of who is actually doing a read or a write upon it.

Files on disk or obtained over the network may have annotations that form the
objective security context of that file. Depending on the type of filesystem,
this may include one or more of the following:

 (*) UNIX UID, GID, mode;

 (*) Access control list;

 (*) LSM security label;

 (*) UNIX exec privilege escalation bits (SUID/SGID);

 (*) File capabilities exec privilege escalation bits.

These are compared to the task's subjective security context, and certain
operations allowed or disallowed as a result. In the case of execve(), the
privilege escalation bits come into play, and may allow the resulting process
extra privileges, based on the annotations on the executable file.
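
To make the execve() case concrete, here is a minimal userspace sketch of how
the SUID/SGID bits might feed into the effective IDs of the resulting process.
The names struct exec_ids and apply_exec_bits() are hypothetical, and the
model deliberately ignores everything else the kernel takes into account
(mount options, file capabilities, securebits, LSM policy and so on).

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical subset of a task's IDs that execve() may change. */
struct exec_ids {
	uid_t euid;
	gid_t egid;
};

/*
 * Apply the SUID/SGID bits of an executable to the IDs of the task running
 * it.  'mode', 'file_uid' and 'file_gid' stand for the annotations found on
 * the executable file; the updated IDs are returned by value.
 */
static struct exec_ids apply_exec_bits(struct exec_ids ids, mode_t mode,
				       uid_t file_uid, gid_t file_gid)
{
	if (mode & S_ISUID)
		ids.euid = file_uid;
	if (mode & S_ISGID)
		ids.egid = file_gid;
	return ids;
}
```

With neither bit set the IDs pass through unchanged; with S_ISUID set on a
file owned by UID 0, the effective UID becomes 0 whilst the effective GID is
left alone.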

================
TASK CREDENTIALS
================

In Linux, all of a task's credentials are held in a refcounted structure of
type 'struct cred', either directly (uid, gid) or through pointers (groups,
keys, LSM security). Each task points to its credentials by a pointer called
'cred' in its task_struct.

Once a set of credentials has been prepared and committed, it may not be
changed, barring the following exceptions:

 (1) its reference count may be changed;

 (2) the reference count on the group_info struct it points to may be changed;

 (3) the reference count on the security data it points to may be changed;

 (4) the reference count on any keyrings it points to may be changed;

 (5) any keyrings it points to may be revoked, expired or have their security
     attributes changed; and

 (6) the contents of any keyrings to which it points may be changed (the whole
     point of keyrings being a shared set of credentials, modifiable by anyone
     with appropriate access).

To alter anything in the cred struct, the copy-and-replace principle must be
adhered to. First take a copy, then alter the copy and then use RCU to change
the task pointer to make it point to the new copy. There are wrappers to aid
with this (see below).
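
The copy-and-replace principle can be sketched in plain userspace C as
follows. This is an illustration only, not the kernel's implementation:
struct mini_cred, struct mini_task, mini_copy_creds() and
mini_replace_creds() are hypothetical names, and an ordinary pointer store
stands in for the RCU publication step.

```c
#include <stdlib.h>

/* Hypothetical miniature credential set; not the kernel's struct cred. */
struct mini_cred {
	unsigned int uid;
	unsigned int euid;
};

/* Hypothetical task: readers only ever see a const, fully built set. */
struct mini_task {
	const struct mini_cred *cred;
};

/* Copy the committed set so that the copy can be altered in private. */
static struct mini_cred *mini_copy_creds(const struct mini_task *t)
{
	struct mini_cred *copy = malloc(sizeof(*copy));

	if (copy)
		*copy = *t->cred;
	return copy;
}

/*
 * Publish the altered copy and hand back the set it replaced.  The kernel
 * would use rcu_assign_pointer() here and free the old set only after an
 * RCU grace period; a plain store stands in for that in this sketch.
 */
static const struct mini_cred *mini_replace_creds(struct mini_task *t,
						  struct mini_cred *copy)
{
	const struct mini_cred *old = t->cred;

	t->cred = copy;
	return old;
}
```

The point of the pattern is that the committed set is never written to: until
mini_replace_creds() runs, anyone reading through the task's pointer still
sees the old, unmodified values.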

A task may only alter its _own_ credentials; it is no longer permitted for a
task to alter another's credentials. This means the capset() system call is no
longer permitted to take any PID other than the one of the current process.
Also keyctl_instantiate() and keyctl_negate() functions no longer permit
attachment to process-specific keyrings in the requesting process as the
instantiating process may need to create them.


IMMUTABLE CREDENTIALS
---------------------

Once a set of credentials has been made public (by calling commit_creds() for
example), it must be considered immutable, barring two exceptions:

 (1) The reference count may be altered.

 (2) Whilst the keyring subscriptions of a set of credentials may not be
     changed, the keyrings subscribed to may have their contents altered.

To catch accidental credential alteration at compile time, struct task_struct
has _const_ pointers to its credential sets, as does struct file. Furthermore,
certain functions such as get_cred() and put_cred() operate on const pointers,
thus rendering casts unnecessary for their callers, although internally they
must temporarily discard the const qualification to be able to alter the
reference count.
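
The const-pointer arrangement might look roughly like this in a userspace
model. The names struct mini_cred, mini_get_cred() and mini_put_cred() are
hypothetical, and a plain int is used for the count where the kernel would
use an atomic type; the sketch only shows where the const qualification is
dropped.

```c
#include <stdlib.h>

/* Hypothetical miniature credential set; not the kernel's struct cred. */
struct mini_cred {
	int usage;		/* reference count (plain int in this sketch) */
	unsigned int uid;
};

/* Take a reference through a const pointer; only 'usage' is ever written. */
static const struct mini_cred *mini_get_cred(const struct mini_cred *cred)
{
	struct mini_cred *nonconst = (struct mini_cred *)cred;

	nonconst->usage++;
	return cred;
}

/*
 * Drop a reference, freeing the set when the last one goes away.
 * Returns 1 if the set was freed and 0 otherwise.
 */
static int mini_put_cred(const struct mini_cred *cred)
{
	struct mini_cred *nonconst = (struct mini_cred *)cred;

	if (--nonconst->usage > 0)
		return 0;
	free(nonconst);
	return 1;
}
```

Callers hold only 'const struct mini_cred *' and so cannot assign to uid
without an explicit cast of their own, which is the kind of mistake the
compiler is being asked to catch.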


ACCESSING TASK CREDENTIALS
--------------------------

A task being able to alter only its own credentials permits the current process
to read or replace its own credentials without the need for any form of locking
- which simplifies things greatly. It can just call:

	const struct cred *current_cred()

to get a pointer to its credentials structure, and it doesn't have to release
it afterwards.

There are convenience wrappers for retrieving specific aspects of a task's
credentials (the value is simply returned in each case):

	uid_t current_uid(void)		Current's real UID
	gid_t current_gid(void)		Current's real GID
	uid_t current_euid(void)	Current's effective UID
	gid_t current_egid(void)	Current's effective GID
	uid_t current_fsuid(void)	Current's file access UID
	gid_t current_fsgid(void)	Current's file access GID
	kernel_cap_t current_cap(void)	Current's effective capabilities
	void *current_security(void)	Current's LSM security pointer
	struct user_struct *current_user(void)	Current's user account

There are also convenience wrappers for retrieving specific associated pairs of
a task's credentials:

	void current_uid_gid(uid_t *, gid_t *);
	void current_euid_egid(uid_t *, gid_t *);
	void current_fsuid_fsgid(uid_t *, gid_t *);

which return these pairs of values through their arguments after retrieving
them from the current task's credentials.

In addition, there is a function for obtaining a reference on the current
process's current set of credentials:

	const struct cred *get_current_cred(void);

and functions for getting references to one of the credentials that don't
actually live in struct cred:

	struct user_struct *get_current_user(void);
	struct group_info *get_current_groups(void);

which get references to the current process's user accounting structure and
supplementary groups list respectively.

Once a reference has been obtained, it must be released with put_cred(),
free_uid() or put_group_info() as appropriate.


ACCESSING ANOTHER TASK'S CREDENTIALS
------------------------------------

Whilst a task may access its own credentials without the need for locking, the
same is not true of a task wanting to access another task's credentials. It
must use the RCU read lock and rcu_dereference().

The rcu_dereference() is wrapped by:

	const struct cred *__task_cred(struct task_struct *task);

This should be used inside the RCU read lock, as in the following example:

	void foo(struct task_struct *t, struct foo_data *f)
	{
		const struct cred *tcred;

		rcu_read_lock();
		tcred = __task_cred(t);
		f->groups = get_group_info(tcred->groups);
		rcu_read_unlock();
	}

Should it be necessary to hold another task's credentials for a long period of
time, and possibly to sleep whilst doing so, then the caller should get a
reference on them using:

	const struct cred *get_task_cred(struct task_struct *task);

This does all the RCU magic inside of it. The caller must call put_cred() on
the credentials so obtained when they're finished with.

 [*] Note: The result of __task_cred() should not be passed directly to
     get_cred() as this may race with commit_creds().

There are a couple of convenience functions to access bits of another task's
credentials, hiding the RCU magic from the caller:

	uid_t task_uid(task)		Task's real UID
	uid_t task_euid(task)		Task's effective UID

If the caller is holding the RCU read lock at the time anyway, then:

	__task_cred(task)->uid
	__task_cred(task)->euid

should be used instead. Similarly, if multiple aspects of a task's credentials
need to be accessed, the RCU read lock should be taken, __task_cred() called,
the result stored in a temporary pointer and the credential aspects then read
through that pointer before dropping the lock. This prevents the potentially
expensive RCU magic from being invoked multiple times.

Should some other single aspect of another task's credentials need to be
accessed, then this can be used:

	task_cred_xxx(task, member)

where 'member' is a non-pointer member of the cred struct. For instance:

	uid_t task_cred_xxx(task, suid);

will retrieve 'struct cred::suid' from the task, doing the appropriate RCU
magic. This may not be used for pointer members as what they point to may
disappear the moment the RCU read lock is dropped.


ALTERING CREDENTIALS
--------------------

As previously mentioned, a task may only alter its own credentials, and may not
alter those of another task. This means that it doesn't need to use any
locking to alter its own credentials.

To alter the current process's credentials, a function should first prepare a
new set of credentials by calling:

	struct cred *prepare_creds(void);

this locks current->cred_replace_mutex and then allocates and constructs a
duplicate of the current process's credentials, returning with the mutex still
held if successful. It returns NULL if not successful (out of memory).

The mutex prevents ptrace() from altering the ptrace state of a process whilst
security checks on credentials construction and changing are taking place, as
the ptrace state may alter the outcome, particularly in the case of execve().

The new credentials set should be altered appropriately, and any security
checks and hooks done. Both the current and the proposed sets of credentials
are available for this purpose as current_cred() will return the current set.

When the credential set is ready, it should be committed to the current process
by calling:

	int commit_creds(struct cred *new);

This will alter various aspects of the credentials and the process, giving the
LSM a chance to do likewise. It will then use rcu_assign_pointer() to actually
commit the new credentials to current->cred, release
current->cred_replace_mutex to allow ptrace() to take place, and notify the
scheduler and others of the changes.

This function is guaranteed to return 0, so that it can be tail-called at the
end of such functions as sys_setresuid().

Note that this function consumes the caller's reference to the new credentials.
The caller should _not_ call put_cred() on the new credentials afterwards.

Furthermore, once this function has been called on a new set of credentials,
those credentials may _not_ be changed further.

Should the security checks fail or some other error occur after prepare_creds()
has been called, then the following function should be invoked:

	void abort_creds(struct cred *new);

This releases the lock on current->cred_replace_mutex that prepare_creds() got
and then releases the new credentials.

A typical credentials alteration function would look something like this:

	int alter_suid(uid_t suid)
	{
		...
		new = prepare_creds();
		...
		ret = security_alter_suid(new);
		...
		return commit_creds(new);
	}


MANAGING CREDENTIALS
--------------------

There are some functions to help manage credentials:

 (*) void put_cred(const struct cred *cred);

     This releases a reference to the given set of credentials. If the
     reference count reaches zero, the credentials will be scheduled for
     destruction by the RCU system.

 (*) const struct cred *get_cred(const struct cred *cred);

     This gets a reference on a live set of credentials, returning a pointer to
     that set of credentials.

 (*) struct cred *get_new_cred(struct cred *cred);

     This gets a reference on a set of credentials that is under construction
     and is thus still mutable, returning a pointer to that set of credentials.


=====================
OPEN FILE CREDENTIALS
=====================

When a new file is opened, a reference is obtained on the opening task's
credentials and this is attached to the file struct as 'f_cred' in place of
'f_uid' and 'f_gid'. Code that used to access file->f_uid and file->f_gid
should now access file->f_cred->fsuid and file->f_cred->fsgid.

It is safe to access f_cred without the use of RCU or locking because the
pointer will not change over the lifetime of the file struct, and nor will the
contents of the cred struct pointed to, barring the exceptions listed above
(see the Task Credentials section).
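
The effect of pinning credentials at open time can be modelled in userspace
along the following lines. Everything here is a hypothetical stand-in
(struct mini_cred, struct mini_task, struct mini_file, mini_open() and
mini_file_fsuid() are not kernel interfaces), and the reference count is a
plain int rather than an atomic type.

```c
/* Hypothetical miniature credential set with a plain reference count. */
struct mini_cred {
	int usage;
	unsigned int fsuid;
	unsigned int fsgid;
};

/* Hypothetical task and open-file structures. */
struct mini_task {
	const struct mini_cred *cred;
};

struct mini_file {
	const struct mini_cred *f_cred;	/* fixed for the life of the file */
};

/* At open time, take a reference and pin the opener's credentials. */
static void mini_open(struct mini_file *file, const struct mini_task *opener)
{
	((struct mini_cred *)opener->cred)->usage++;
	file->f_cred = opener->cred;
}

/* Later operations consult the file's credentials, not the caller's. */
static unsigned int mini_file_fsuid(const struct mini_file *file)
{
	return file->f_cred->fsuid;
}
```

If the opening task later switches to a different credential set, the file
still reports the fsuid that was in force when it was opened, because it
holds its own reference to the original set.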


=======================================
OVERRIDING THE VFS'S USE OF CREDENTIALS
=======================================

Under some circumstances it is desirable to override the credentials used by
the VFS, and that can be done by calling into functions such as vfs_mkdir()
with a different set of credentials. This is done in the following places: