<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Mozilla/4.76C-CCK-MCD Netscape [en] (X11; U; SunOS 5.8 sun4u) [Netscape]">
<meta name="CREATED" content="20010611;10370600">
<meta name="CHANGEDBY" content="Andre Alefeld">
<meta name="CHANGED" content="20010611;11590200">
<h2>
<font color="#990000">Grid Engine Kerberization Implementation</font></h2>
Kerberos support is implemented in Grid Engine primarily through a set
of library routines. The source code for the library is in krb/krb_lib.c.
<br>The header file is krb/krb_lib.h. Two routines, krb_send_message()
and krb_receive_message(), replace the normal send_message() and receive_message()
routines used by all Grid Engine daemons and client processes.
<br>The krb_send_message() and krb_receive_message() routines take care
of authenticating and encrypting the messages passed between processes.
<br>The <a href="#Original Design Notes">original design</a> is outlined
below.
<br>To build a Kerberized version of Grid Engine, proceed as follows:
<br>Check aimk.site and adapt the corresponding Kerberos V related paths
(Kerberos can be obtained from <a href="http://www.crypto-publish.org">http://www.crypto-publish.org</a>).
<p>Install the system as usual; the setup of Kerberos itself is beyond
the scope of this document. Additional hints can be found in the <a href="ReleaseNotes.html">ReleaseNotes</a>.
<h2>
<font color="#990000">Authentication</font></h2>
Authentication of Grid Engine clients and daemons to the qmaster is accomplished
within the krb_send_message() and krb_receive_message() routines.
<br>Each message sent from a Grid Engine client or daemon to the qmaster
contains the information necessary for the qmaster to authenticate the
sender of the message (i.e. a Kerberos AP_REQ packet).
<br>The actual data in the message is also encrypted in the session key,
a key obtained from the KDC which is private between the Grid
Engine client/daemon and the qmaster.
<p>The message may also contain a forwarded TGT, which is likewise encrypted
in the session key.
<br>A connection list is maintained by the library routines in the qmaster
to track the state of each qmaster client (including the Grid Engine daemons).
<br>The connection list is keyed on the client name, host name, and
connection ID. Client connections are removed from the connection list
after a period of inactivity by the krb_check_for_idle_clients() routine,
which is called regularly in the qmaster.
<br>The connection list is primarily used as a holding place for client-specific
information passed between the Kerberos library and higher-level routines.
For example, upon receiving a GDI request, the qmaster verifies that the
user name passed in the GDI request matches the user name used during Kerberos
authentication by calling krb_verify_user(), passing the client information
and the user name as parameters. The user name associated with the connection
entry is looked up based on the client information, and if the user names
do not match, the GDI request is denied.
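<p>The verification step above amounts to a lookup keyed on the client triple followed by a string compare. The sketch below illustrates the idea in plain C; the structure layout and function name are simplified stand-ins, not the actual krb_lib.c definitions.

```c
#include <string.h>

/* Simplified stand-in for the connection entry kept by the qmaster. */
typedef struct {
    char client[64];   /* client (commproc) name */
    char host[64];     /* host name */
    int  connid;       /* connection ID */
    char user[32];     /* user name established during Kerberos auth */
} conn_entry;

/* Sketch of the krb_verify_user() idea: find the connection entry for
   the (client, host, connid) triple and check that the authenticated
   user matches the user name claimed in the GDI request. */
int verify_user(const conn_entry *list, int n,
                const char *client, const char *host, int connid,
                const char *claimed_user)
{
    for (int i = 0; i < n; i++) {
        if (strcmp(list[i].client, client) == 0 &&
            strcmp(list[i].host, host) == 0 &&
            list[i].connid == connid)
            return strcmp(list[i].user, claimed_user) == 0;
    }
    return 0; /* unknown connection: deny the request */
}
```

<p>A request claiming a user name other than the one used during authentication falls through the final compare and is denied.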
<h2>
<font color="#990000">Message Encryption</font></h2>
All message data passed between a Grid Engine client or daemon and the
qmaster is encrypted in a session key which is private between the Grid
Engine client or daemon and the qmaster.
<h2>
<font color="#990000">Forwarding TGTs</font></h2>
Kerberized Grid Engine also automatically forwards ticket-granting tickets
(TGTs) to the execution host. For TGT forwarding to work, the
client must have requested "forwardable" tickets in the initial kinit(1)
request. This allows jobs submitted through Grid Engine to automatically
have Kerberos tickets and to run kerberized applications without the user
having to log in to each execution host and run kinit(1).
<p>When a job is submitted from qmon, qsub, or qsh, the sge_gdi_multi()
routine (sge_gdi_request.c) calls krb_set_client_flags() (krb_lib.c) to
set the KRB_FORWARD_TGT flag, which tells the Kerberos library to forward
the TGT with any subsequent krb_send_message() messages.
<br>When the krb_send_message() routine is called to send the job to the
qmaster, the TGT is acquired, encoded, and sent as part of the encoded
message to the qmaster. The TGT is encrypted in the private session key
of the client process and the qmaster.
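<p>The flag handling described here is a simple bit-mask protocol: set the bit before sending the job, let the send path test it, then clear it afterwards. A minimal sketch, assuming a single flag word; the flag's value and the helper names are illustrative, not the actual krb_lib.c interface.

```c
/* KRB_FORWARD_TGT is the flag named in the text; its value is assumed. */
enum { KRB_FORWARD_TGT = 0x1 };

static int client_flags; /* per-process flag word, hypothetical */

void set_client_flags(int f)   { client_flags |= f; }
void clear_client_flags(int f) { client_flags &= ~f; }

/* The send path would consult this before attaching the TGT. */
int tgt_forwarding_requested(void)
{
    return (client_flags & KRB_FORWARD_TGT) != 0;
}
```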
<p>After sending the message, the sge_gdi_multi() routine calls krb_set_client_flags()
again to clear the KRB_FORWARD_TGT flag. When the message is received by the
qmaster, the TGT is stored in the client connection entry maintained by
the Grid Engine Kerberos library routines.
<br>When the job is added to the job list in the sge_gdi_add_job() routine
(sge_job.c), the TGT is retrieved from the connection entry using krb_get_tgt()
(krb_lib.c), encrypted in the qmaster's private key using krb_encrypt_tgt_creds()
(krb_lib.c), converted to a string using krb_bin2str() (krb_lib.c),
and stored in the JB_tgt field of the job entry.
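<p>The bin2str/str2bin pair only needs to turn arbitrary credential bytes into a spoolable printable string and back. The actual encoding used by krb_bin2str() is not specified here; the sketch below assumes a plain hex encoding for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Encode a binary buffer as a hex string (caller frees the result). */
char *bin2str(const unsigned char *buf, size_t len)
{
    char *s = malloc(2 * len + 1);
    for (size_t i = 0; i < len; i++)
        sprintf(s + 2 * i, "%02x", buf[i]);
    return s;
}

/* Decode a hex string back into binary (caller frees the result). */
unsigned char *str2bin(const char *s, size_t *lenp)
{
    size_t len = strlen(s) / 2;
    unsigned char *buf = malloc(len);
    for (size_t i = 0; i < len; i++)
        sscanf(s + 2 * i, "%2hhx", &buf[i]);
    *lenp = len;
    return buf;
}
```

<p>Because the TGT is encrypted before encoding, the spooled string is both printable and protected.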
<p>This allows the TGT to be spooled as part of the job entry while remaining
protected, since it is encrypted. When a job is to be executed on
an execution host, the send_job() routine (sge_give_jobs.c), just before
sending the message, calls krb_str2bin() (krb_lib.c) to convert the TGT
stored in the JB_tgt field of the job entry from string to binary. The
TGT is then decrypted using krb_decrypt_tgt_creds() (krb_lib.c) and
stored, using krb_put_tgt() (krb_lib.c), in the connection entry associated
with the execution daemon that the message is being sent to. When the
krb_send_message() routine (krb_lib.c) is executed, the TGT is forwarded
to the execution daemon. Upon receiving the message in the execution daemon,
the krb_receive_message() routine decrypts and saves the TGT locally. The
execd_job_exec_() routine (execd_job_exec.c) then calls krb_get_tgt()
(krb_lib.c) to retrieve the saved TGT and calls krb_store_forwarded_tgt()
(krb_lib.c), which stores the forwarded TGT in a credentials
cache created specifically for the user of this job. The credentials cache is stored
in /tmp/krb5cc_sge_%d, where %d is the job ID. The KRB5CCNAME environment
variable for the job is updated to point to this credentials cache.
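<p>Putting the cache naming and environment handling together, a minimal sketch; the path pattern comes from the text above, while the helper name is hypothetical and no "FILE:" prefix is assumed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Build the per-job credentials cache path and export it so the job
   (and any kerberized application it runs) picks it up. */
void export_job_ccache(unsigned job_id)
{
    char path[64];
    snprintf(path, sizeof(path), "/tmp/krb5cc_sge_%u", job_id);
    setenv("KRB5CCNAME", path, 1); /* overwrite any inherited value */
}
```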
<p>Any kerberized applications executed by the job will automatically use
the TGT located in the credentials cache. When the job completes and is
cleaned up, the credentials cache is destroyed by calling krb_destroy_forwarded_tgt()
from clean_up_job() (reaper_execd.c).
<h2>
<font color="#990000">Renewing TGTs</font></h2>
Kerberized Grid Engine automatically renews the Kerberos ticket for the
"sge" service, which the Grid Engine sge_schedd and sge_execd
daemons require to authenticate themselves to the Grid Engine qmaster. Before
sending a message to the qmaster, a daemon checks its credentials to see
whether the ticket has expired or is about to expire, and renews it if necessary.
<p>Kerberized Grid Engine also handles renewing TGTs on behalf of the client.
For TGTs to be renewed, the client must have requested both <i>forwardable</i>
and <i>renewable</i> TGTs in the initial kinit(1) request.
<br>This allows long-running jobs, or jobs which are queued for a long period
of time, to still have a valid TGT when they are executed on the execution
hosts. TGTs are renewed on both the execution host and the qmaster host
until the job is complete, to ensure that if the job is restarted on a different
host, it will still have a valid TGT.
<p>Tickets are renewed in the krb_renew_tgts() routine, which is regularly
called by both the qmaster and the execution daemon. The krb_renew_tgts()
routine, which is executed once per TGT renewal interval, goes through
the list of jobs and checks whether each TGT will expire within the TGT
renewal interval.
<br>If so, a new TGT is acquired from the KDC and stored back into the
job entry. If executing in the execution daemon, the new TGT is also written
to the user's credentials cache, where it will be used by any kerberized
applications running in the job.
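<p>The per-job check inside such a sweep reduces to comparing the ticket's end time against the current time plus the renewal interval. A sketch, with an assumed interval value; the actual krb_renew_tgts() interval is not specified here.

```c
#include <time.h>

#define TGT_RENEW_INTERVAL (15 * 60) /* seconds; illustrative value */

/* A TGT should be renewed if it has expired or will expire
   before the next renewal sweep runs. */
int tgt_needs_renewal(time_t tgt_endtime, time_t now)
{
    return tgt_endtime - now < TGT_RENEW_INTERVAL;
}
```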
<h2>
<a NAME="Original Design Notes"></a><font color="#990000">Original Design
Notes</font></h2>
In February 1997 an attempt was made to integrate Grid Engine into a Kerberos
environment.
<h2>
<font color="#990000">Problems with Kerberizing Grid Engine</font></h2>
Once you have a good understanding of the basics of Kerberos and of how the
API library works, it appears from the sample client/server code that it
would be very easy to kerberize an application. This is not really the
case. The sample code deals with an extremely simplistic client/server
application. It is very easy to kerberize a simple client/server application;
unfortunately, most existing client/server applications aren't simple.
Many of the problems with kerberizing an application are not addressed
by this simple example code. For instance, a real server will likely have
multiple clients which must be tracked individually. This involves keeping
some internal data structures on a per-client basis which may or may not
already exist in the application.
<p>Another difficult problem is that in an existing "real-life" server
application, the communications protocol is likely implemented in a layer
separate from the application code. This insulates the application code from
having to deal with the specifics of the communication layer, but it presents
some real difficulties in implementing security, because the security protocol
needs information from both levels.
<p>Another significant problem with kerberizing Grid Engine concerns
tracking clients. In a typical client/server application using stream-based
sockets to pass messages between the client and the server, the server
would initially authenticate the client either before or as part of the
first message sent from the client to the server. All communications (i.e.
messages) between the client and the server would then be encrypted using
the secret session key known only to the client and the server. The server
associates messages with clients based on the socket the message is
read from. If a message comes across the socket, the server uses the secret
key associated with the client to decrypt the message. If the socket goes
down, this indicates that communications between the client and the server
have been disrupted, and the server can clean up any data structures that
the client had.
<p>The design of Grid Engine introduces a different communication
model. Instead of maintaining a socket per client, Grid Engine servers
and clients communicate through a set of services that are part of a communication
library. The client and server actually communicate through one or more
communication daemons which dynamically manage socket connections. This
insulates the application from the details of socket management and handles
various problems associated with server communications, such as buffering
data and running out of file descriptors. It also makes it difficult to
know whether a client is up, down, or reachable at a given time, since there
is no socket directly associated with the client.
<p>One possible solution for handling the connectionless nature of Grid
Engine communications was to authenticate each individual message passed
between the client and the server. However, upon further investigation,
this proved to be an inadequate solution. One reason is that a
simple transaction, consisting of a client sending a request to the server
and getting back the response, generates two messages: one from the client
to the server and one from the server back to the client. It makes sense
for the server to authenticate the client, but the client should not have
to explicitly authenticate the server.
<p>Even a simple transaction has an implied state. A request is sent from
the client, the server performs some service on behalf of the client, and
a response is sent back to the client. Because the response is tied to
the request, there exists an implied state. The server must, at minimum,
authenticate the request, decrypt it, and encrypt the response back to
the client such that only the client can read the response. This
means that the server must at least maintain the state of the client internally
for the length of the transaction. An outgrowth of the design for handling
this case is that the server can actually handle additional messages from
the client without having to reauthenticate each message. Instead, each
message after the first is decrypted using the secret client/server session
key.
<h2>
<font color="#990000">Grid Engine Kerberization Level of Effort Scope</font></h2>
There are two levels of effort in kerberizing Grid Engine. The first
level of effort is the authentication of Grid Engine clients and servers.
It turns out that the authentication of Grid Engine clients and the authentication
between the Grid Engine servers are both accomplished with the
same design and code. The basic design for this first level of effort was
completed by Shannon Davidson and Andre Alefeld as part of the February
8-13, 1997 trip to GENIAS. The second level of effort in kerberizing Grid
Engine is the acquisition of Kerberos tickets on behalf of the client's
job on the execution host. This involves issues such as acquiring ticket-granting
tickets on behalf of a client, handling/preventing ticket expirations, storing
tickets for future use, and forwarding tickets to the execution host. There
are also a number of design and performance related issues associated with
the second level of kerberizing Grid Engine. Some of these issues were
identified during the February 8-13 trip, but the actual design work has
not yet been completed.
<h2>
<font color="#990000">Grid Engine Kerberization - First Level of Effort</font></h2>
First-level kerberization is handled by replacing the generic send_message
and receive_message routines in the commd library with kerberized versions.
The kerberized versions of the routines will maintain the state of the
connection and take care of the authentication of clients, as well as encrypting
and decrypting messages passed between clients and servers. Handling authentication
at this level means that any code using the standard commd library will
automatically have authentication. These routines will ensure that a user
is who he says he is. Higher-level routines are responsible for determining
whether that user is actually an authorized user of Grid Engine. These routines
will behave differently when acting on behalf of a client or a server. In
this model, the qmaster acts as the server, and all other Grid Engine daemons
and client programs act as clients. When acting as a server, these routines
will maintain a connection list which tracks the clients. When a client
first connects to a server, the client will be authenticated. If authentication
fails, a failure message will be sent back to the client indicating the
failure. If the client is not a Grid Engine daemon, the failure message
will be displayed on the screen. If the client is successfully authenticated,
a connection entry will be created and added to the connection list. The
connection entry contains enough information to uniquely identify the client
based on information received in the receive_message routine. All later
messages received from or sent to this client will be encrypted. If the
connection is idle for a period of time (i.e. no messages sent or received
on the connection), the connection entry will be removed from the connection
list. A specific routine will be written which checks for connections
which have "timed out" and cleans them up. This routine will be called
from the send_message and/or receive_message routines and may also be called
separately from within a chk_to_do routine in a Grid Engine daemon.
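<p>The timed-out-connection cleanup described above can be sketched as a sweep that drops entries idle longer than some threshold. The entry layout and the timeout value below are hypothetical, not the actual planned interface.

```c
#include <stddef.h>
#include <time.h>

#define IDLE_TIMEOUT 600 /* seconds of inactivity; assumed value */

typedef struct {
    int    connid;   /* connection ID */
    time_t last_msg; /* time of last message sent or received */
} conn;

/* Remove timed-out entries by compacting the list in place;
   returns the number of surviving connections. */
size_t sweep_idle_connections(conn *list, size_t n, time_t now)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (now - list[i].last_msg <= IDLE_TIMEOUT)
            list[kept++] = list[i];
    return kept;
}
```

<p>Called from the send/receive path or a chk_to_do routine, such a sweep keeps the connection list bounded even though there is no per-client socket to signal disconnects.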
<h4>
Grid Engine Kerberization Level One Design</h4>
<pre>
call krb5_init
if we are a grid engine daemon
    setup connection lists
    set internal is_sge_daemon
endif
set connection ID to 0
if (is_sge_daemon)
    get TGT from keytab file
endif
if (!qmaster)
    go get ticket for the qmaster/sge ...
endif

sec_krb_send_message
    if (!qmaster && !connected)
        build an AP_REQ authentication ...
        (krb5_mk_req)
    endif
    if (qmaster)
        lookup auth_context using ...
    else
        use the local auth_context
    endif
    if (!qmaster)
        put connection ID in the ...
    endif
    call krb5_mk_priv to encrypt message
    call send_message to send [ AP_REQ + ] message

sec_krb_receive_message
    call recv_message to receive message
    if (*tag == TAG_AUTHENTICATE)
        if (qmaster) {
            call krb5_rd_req to authenticate client
            if authentication fails
                send TAG_AUTHENTICATE message back to the client
            ...
        } else (is_a_daemon) {
            ...
        } else /* it's a normal client */
            print TAG_AUTHENTICATE message to stderr
            ...
        } endif
    if (qmaster)
        ...
        look up connection ID in connection list
        get auth_context from connection list
    else
        get connection ID from msg
        compare connection ID to connection ID in msg
        set auth_context to local auth_context
    endif
    decrypt message
    if decryption fails
        send TAG_AUTHENTICATE message back to client
        ...
    endif
    return message
</pre>
<br>list of connection entries, one for each client
<br>commd host triple &lt;host, commproc, ID&gt;
<p>global client/server data
<br>is_sge_daemon flag
<br>auth_context (client)
Q. Is there a unique ID in the auth_context, or available from the KDC, which
could be used as the connection ID? It would need to be unique for every
client and also unique for each instantiation of a client. If there is
no unique ID that we can get from the KDC, then we may need to use a transaction
when authenticating the client in order to get back a connection ID from
the server.
<p>Q. Do we need to maintain an auth_context for each client?
<p>A. It appears that we need to maintain an auth_context for each client
for the life of the connection, in the connection entry. The Kerberos libraries
maintain certain information, such as the sequence number, in this
structure.
<p>Q. Why is there a connection ID?
<p>A. The connection ID is used to uniquely identify a client connection
to the server. Since the Grid Engine communication mechanism is basically
connectionless, there is nothing like a socket to uniquely identify a client.
A client can be identified using the commd triple, which consists of the
host, process name, and ID, but this would not be unique for cycled processes.
The connection ID can either be assigned by the KDC (if possible; see
the first question) or assigned by the server itself during the initial
authentication process. Assignment by the server guarantees uniqueness
across all clients.
<p>Q. Is there any way to avoid maintaining a connection list in the qmaster?
<p>A. It doesn't appear so. The connection list is needed so that the server
will know how to encrypt any messages going back from the server to the
client. The only way to make this association from within the sec_krb_send_message
routine is to map the parameters of the send_msg routine to a client in
the connection list and then use the auth_context from the connection entry
to encrypt the message.
<p>Q. What happens when the qmaster is cycled?
<p>A. The connection list is not spooled to disk, so when the qmaster is
cycled, all clients must reauthenticate. If the qmaster receives a message
from an unauthenticated client, the message will be thrown away and an
error message will be sent back to the client. (Would it make sense to
return the original offending message to the client where it could
be retried?) This reauthentication could be handled in one of two ways.
The first method is that if a message ever fails, the client must reauthenticate
using an authentication transaction. The response to the authentication
request would contain the connection ID assigned by the server to the client.
Another possibility would be to include the reauthentication information
(AP_REQ) in every message sent from the client to the server.
If the client was already authenticated in the server, this portion of the
message would be ignored by the server. If the client was not already authenticated
by the server, he would be reauthenticated by the server. This would also
handle cases where messages destined for the qmaster are spooled in the
commd while the qmaster is down. Since each message would include authentication
info, the messages would not have to be thrown away when the qmaster comes
back up.
<p>Q. What happens when other Grid Engine daemons are cycled?
<p>A. When a Grid Engine daemon other than the qmaster is cycled and comes
back up, it will initially attempt to contact the qmaster. Any initial
message sent to the qmaster will include the authentication info (AP_REQ).
Any messages queued in the commd will be lost, because they will have been
encrypted with an old session key. (Is this a problem?)
<p>Q. What happens when a client is cycled?
<p>A. The same as for a Grid Engine daemon.
<p>Q. Can the other Grid Engine daemons act as servers? This may be needed
for the plans that GENIAS has for enhanced parallel job support.
<p>A. We will look into this further during implementation. If possible,
we will construct the code such that any Grid Engine daemon can act as
a server if a message is received with the authentication information (AP_REQ).
<p>Q. If a client receives a message that it cannot decrypt, what should
it do?
<p>A. If the client is a Grid Engine daemon, the message should be thrown away
and an error message logged. If the client is a general-purpose client
acting on behalf of a user, an error message should be displayed to the
user.
<p>Q. How do we make sure that the user doesn't authenticate himself to
the security routines and then simply indicate that he is someone else
to the higher-level routines?
<p>A. The higher-level routines (i.e. those which call receive_message)
need to call a security routine to verify that the user is who he says
he is. This is necessary because the higher-level routines currently just
check the user ID which is passed in the message. This is not adequate,
since a knowledgeable user could pass through the lower-level security
routines as himself while putting a zero user ID into the message, passing
himself off as root to the higher-level routines. There needs to be a test
in the higher-level routines to verify that the user authenticated himself
as the same user that he claimed to be in the higher-level message.
This would need to be done in the code which calls the receive_message
routine. A routine needs to be written which verifies that the client is
a particular user. The routine could be called sec_krb_verify_client_user
and would be passed the user ID (or user name) and the commd triple &lt;host,
commproc, ID&gt;. The routine would look up the user in the connection list,
compare to the passed user ID, and return true or false.
<h2>
<font color="#990000">Grid Engine Kerberization - Second Level of Effort</font></h2>
Second-level Grid Engine kerberization will affect a number of different
pieces of code. It will probably involve changes in the job submittal client
modules (qsub, qmon), the queue master (sge_qmaster), the execution daemon
(sge_execd), and possibly the job shepherd process (sge_shepherd). It will
also mean changes to, at a minimum, the job structure and the message data
structures used to pass job information between the various modules.
<p>A job submitted to Grid Engine needs to have a TGT on the execution
host. This is necessary because certain applications that the job may need
to access (such as PVM) will need valid Kerberos tickets in order to work.
A valid TGT should be provided, since we don't know which services will
be needed by the job. A valid TGT on the execution host for the user of
the job will allow processes running under the job to get tickets for whatever
services the job requires.
<p>A number of steps are involved in getting a valid TGT onto the execution
host of the job. In general, a TGT is valid on a particular host or set
of hosts. (It is possible to get a TGT that is valid on any host in the
Kerberos domain, but this is not recommended.) First, the qsub or qmon
client program gets a forwardable TGT for the host where the qmaster is
executing. This TGT is passed along with the job request to the qmaster.
The qmaster stores this TGT in the job entry. After the decision to run
the job on a particular host is made, the qmaster uses the TGT in the job
entry to acquire a TGT which is valid on the execution host and then passes
that TGT to the execution daemon along with the job request. The sge_execd
or the sge_shepherd then stores that TGT in the user's credentials cache
and starts the job. The job then has access to a valid TGT on the execution
host.
<p>A potential performance problem with getting a TGT on the execution host
of the job occurs when the sge_qmaster has to get a TGT which is valid
on the execution host. This is a problem because the sge_qmaster must contact
the KDC and wait for a response before it can send the job to the execution
host. If this transaction is executed synchronously, the qmaster will be
blocked for a period of time (maybe 100 ms on average) for each job. This
should certainly be avoided if possible. One option would be to use an
asynchronous request, but this has the disadvantage of significantly complicating
the code, due to handling the asynchronous request and having an additional
job state (i.e. WAITING_FOR_TGT) for each outstanding job. We would also
have to investigate whether this is even possible using the Kerberos API C library.
A more attractive solution to this performance problem is to get a TGT
in the client (qsub or qmon) which is valid for any host, but to prevent
this TGT from being written to the user's credentials cache. (Hopefully,
not making this TGT available in the user's credentials cache would address
any security concerns about a wildcard TGT.) The wildcard TGT could then
be passed to the qmaster, which would store it in the job entry and pass it
to the execution host when the job executes. The sge_execd or sge_shepherd
could use this wildcard TGT to get a specific TGT valid on the execution
host and then destroy the original wildcard TGT.
<h4>
Grid Engine Kerberization Level Two Design</h4>
Q. How do we prevent a job's tickets from expiring while the job is queued?
<p>A. If this is a requirement, then the qmaster (or some other external
process, such as the scheduler) will need to periodically go through all
of the job entries, look for TGTs which are about to expire, and contact
the KDC to request a new TGT for each such job. Of course, contacting the KDC
will impose some overhead, and care should be taken that this does not cause
performance problems. Grid Engine should also continue to keep the TGT
in the qmaster valid for as long as the job is executing, just in case
we have to restart the job at a later time. It might also make sense that
when the shepherd gets a TGT for the job, it gets a TGT which is valid
for a reasonably long period of time.
<p>Q. How can we prevent a job's tickets from expiring while the job is
executing?
<p>A. The sge_shepherd could continually renew the TGT of the client on
a regular basis. The sge_shepherd could do this directly or spawn a separate
process to do it. When the TGT is renewed, the new TGT would be placed in
the user's credentials cache, making it available to processes executing
as part of the user's job. (Thanks to Fritz for this suggestion.)
<p>Copyright 2001 Sun Microsystems, Inc. All rights reserved.</center>