<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Mozilla/4.76C-CCK-MCD Netscape [en] (X11; U; SunOS 5.8 sun4u) [Netscape]">
<meta name="AUTHOR" content="Joachim Gabler">
<meta name="CREATED" content="20010613;11080800">
<meta name="CHANGEDBY" content="Joachim Gabler">
<meta name="CHANGED" content="20010625;16081100">
H2 { margin-top: 0.33in; margin-bottom: 0.2in }
Execd - the execution daemon</h1>
The Execution Daemon (execd) is the instance in Grid Engine that
controls jobs, e.g. can suspend / unsuspend a job, reprioritize the processes
associated with a job, etc.</li>
gathers information about jobs, e.g. resource usage, exit code etc.</li>
gathers information about the execution host it controls, e.g. load, free
There is one execd on each host of a cluster.
When execd starts up, the following actions are taken:
General initializations</li>
<br><tt>(source/daemons/execd/execd.c: main)</tt>
Connect to commd.</li>
<br><tt>(source/libs/comm/commlib.c: enroll)</tt>
Try to contact qmaster and register with it.</li>
<br><tt>(source/daemons/execd/setup_execd.c: sge_setup_sge_execd)</tt>
<br>If the execd can't contact qmaster, it will continue and retry at
regular intervals.
Look for old jobs (jobs that were started before execd was shut down).</li>
<br><tt>(source/daemons/execd/setup_execd.c: sge_setup_old_jobs)</tt>
Establish process control for jobs that are still running.</li>
<br><tt>(source/daemons/execd/execd_ck_to_do.c: register_at_ptf)</tt>
Clean up finished jobs and report them to qmaster.</li>
<br><tt>(source/daemons/execd/reaper_execd.c: clean_up_old_jobs)</tt></ul>
After startup, execd enters its main loop where it
receives requests,</li>
processes requests, and</li>
sends reports at regular intervals.</li>
PDC - the Portable Data Collector</h2>
The PDC is a module inside the execd that collects information about running
jobs, such as CPU usage, memory consumption, etc.
<p>Data is collected for all processes of a job on the basis of a criterion
unique to a job. On systems that support some sort of jobid for a hierarchy
of processes, this jobid is used. On all other systems, an additional user
group id (gid) is created on behalf of each job and then used.
<p>The jobid / additional group id is attached to the root process of a
job and is inherited by its child processes.
<p>The PDC is implemented in <tt>source/daemons/pdc.c</tt>.
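<p>The selection by additional group id can be illustrated with a small Python sketch. It is not taken from the PDC sources: the <tt>Proc</tt> record, the function names, the gid values and the process snapshot are all hypothetical, and a real collector would read this data from the operating system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    """One entry of a (synthetic) process table snapshot."""
    pid: int
    ppid: int
    gids: frozenset      # supplementary group ids of the process
    cpu_seconds: float
    rss_kb: int


def job_processes(procs, job_gid):
    """Return all processes that carry the job's additional group id."""
    return [p for p in procs if job_gid in p.gids]


def job_usage(procs, job_gid):
    """Sum up the usage of every process that belongs to the job."""
    members = job_processes(procs, job_gid)
    return {
        "cpu": sum(p.cpu_seconds for p in members),
        "mem_kb": sum(p.rss_kb for p in members),
        "pids": sorted(p.pid for p in members),
    }


# Hypothetical snapshot: gids 20001 and 20002 stand for two different jobs,
# gid 10 is an ordinary user group shared by all processes.
SNAPSHOT = [
    Proc(100, 1, frozenset({10, 20001}), 1.5, 2048),
    Proc(101, 100, frozenset({10, 20001}), 0.5, 1024),  # child inherits the gid
    Proc(200, 1, frozenset({10, 20002}), 9.0, 4096),
    Proc(300, 1, frozenset({10}), 3.0, 512),            # not part of any job
]

print(job_usage(SNAPSHOT, 20001))
```

<p>Because child processes inherit the group id, the child with pid 101 is counted for the job without any extra bookkeeping.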
PTF - the Priority Translation Facility</h2>
In SG3E mode, Grid Engine provides a share-based scheduler (product
mode sgeee). Each job gets a certain share of the system resources.
<p>Different mechanisms (policies) exist to assign shares to a job.
The sum of all shares for a job is expressed in so-called tickets - a job
has a certain number of tickets enabling it to run with certain process
<p>If multiple jobs are running concurrently on a host, their different
shares of system resources - their different numbers of tickets - can be
mapped to priorities in the operating system.
<p>Setting priorities in the operating system is done by either setting
the nice value for all processes of a job or by using special priority
mapping facilities provided by the underlying operating system.
<p>Grid Engine reassigns the number of tickets per job at regular intervals.
The PTF then maps the number of tickets of a job to nice values (or another
operating system priority representation) and renices all processes of
<p>Like the PDC, the PTF uses the jobid / additional group id to capture
all processes of a job.
<p>The PTF is implemented in <tt>source/daemons/execd/ptf.c</tt>.
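<p>As a rough illustration of mapping tickets to nice values, the following Python sketch uses a simple linear mapping. This mapping is an assumption made for the example only; the formula actually used in <tt>ptf.c</tt> is not described in this document. The function names, the nice range 0 to 19 and the job names are likewise hypothetical.

```python
import os


def tickets_to_nice(tickets, nice_min=0, nice_max=19):
    """Map ticket counts to nice values (assumed linear mapping).

    The job with the most tickets gets nice_min (highest priority),
    a job without tickets gets nice_max (lowest priority).
    """
    if not tickets:
        return {}
    top = max(tickets.values())
    span = nice_max - nice_min
    if top <= 0:
        # Nobody holds tickets: give every job the lowest priority.
        return {job: nice_max for job in tickets}
    return {job: nice_max - (count * span) // top
            for job, count in tickets.items()}


def renice(pids, nice):
    """Apply one nice value to all processes of a job (Unix only).

    Returns the pids that could not be changed, e.g. because the
    process already exited or the caller lacks permission.
    """
    failed = []
    for pid in pids:
        try:
            os.setpriority(os.PRIO_PROCESS, pid, nice)
        except (ProcessLookupError, PermissionError):
            failed.append(pid)
    return failed


print(tickets_to_nice({"job1": 1000, "job2": 500, "job3": 0}))
```

<p>The sketch only computes the mapping; <tt>renice</tt> would then be called with the process ids found via the jobid / additional group id.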
Requests to execd</h2>
Execd requests are specified by a request tag (e.g. <tt>TAG_JOB_EXECUTION</tt>).
<br>For incoming requests a mapping is done from a request tag to a callback
function that processes the request.
<br>Execd accepts and processes the following requests:
<b><i>Execute a job <tt>(TAG_JOB_EXECUTION)</tt></i></b></li>
<br>If a request to execute a job is received from the qmaster, the job
is spooled to disk and started via a shepherd process - see the <a href="../shepherd/shepherd.html">shepherd
<br>During the job's runtime, the processes of the job can be monitored
and controlled by the PDC and PTF modules of the execd.
<br>After the job has finished, all relevant information about the job is gathered
and the job end is reported to the qmaster.
<br>The function <tt>execd_job_exec</tt> in <tt>source/daemons/execd/execd_job_exec.c</tt>
processes this type of request.
<b><i>Execute a task inside a parallel job <tt>(TAG_SLAVE_ALLOW)</tt></i></b></li>
<br>The model of parallel jobs in Grid Engine provides the concept of a
tight integration of a parallel job's tasks in Grid Engine. In this tight
integration, the tasks of a parallel job are under full control of Grid
<br>Tasks can be started with the qrsh binary (<tt>qrsh -inherit</tt>).
<tt>qrsh -inherit</tt> itself contacts execd using the GDI function
<tt>sge_qrexec()</tt>.
<br>Like a single job, a task is started via a shepherd process and can
be monitored and controlled by PDC and PTF.
<br>After a task finishes, all relevant information about the task is gathered
and the task end is reported to the qmaster.
<br>The function <tt>execd_job_slave</tt> in <tt>source/daemons/execd/execd_job_exec.c</tt>
processes this type of request.</ul>
<b><i>Assign Tickets to a running job, reprioritize job <tt>(TAG_CHANGE_TICKET)</tt></i></b></li>
<br>At regular intervals, the number of tickets is reassigned to each running
job. The number of tickets is reported from the qmaster to the execds.
<br>The number of tickets is mapped to an operating system nice value or
another priority representation provided by the operating system, and thus all
processes of a job are reprioritized.
<br>The function <tt>execd_ticket</tt> in <tt>source/daemons/execd/execd_ticket.c</tt>
processes this type of request.
<b><i>Acknowledgement from qmaster for a previously sent job report <tt>(TAG_ACK_REQUEST)</tt></i></b></li>
<br>After a job or a task finishes, the execd reports this as a job report
to the qmaster. The qmaster must acknowledge a job report; if no acknowledgement
arrives at the execd within a certain interval, the job report is resent.
<br>The function <tt>execd_c_ack</tt> in <tt>source/daemons/execd/job_report_execd.c</tt>
processes this type of request.
<b><i>Signal all jobs in a queue <tt>(TAG_SIGQUEUE)</tt></i></b></li>
<br>The qmaster asks the execd to send a certain signal to all jobs in
a certain queue. This, for example, can be triggered by suspending the
<br>The execd signals the process group of each job in the queue.
<br>The function <tt>execd_signal_queue</tt> in <tt>source/daemons/execd/execd_signal_queue.c</tt>
processes this type of request.
<b><i>Signal a job <tt>(TAG_SIGJOB)</tt></i></b></li>
<br>The qmaster can ask the execd to send a certain signal to a single
job, for example, if the job is suspended.
<br>The execd will signal the process group of this job.
<br>This request is also processed by the function <tt>execd_signal_queue</tt>
in <tt>source/daemons/execd/execd_signal_queue.c</tt>.
<b><i>Shutdown <tt>(TAG_KILL_EXECD)</tt></i></b></li>
<br>Tells the execd to do a clean shutdown.
<br>The function <tt>execd_kill_execd</tt> in <tt>source/daemons/execd/execd_kill_execd.c</tt>
processes this type of request.
<b><i>Activate/deactivate certain features, e.g. job reprioritization - PTF
<tt>(TAG_NEW_FEATURES)</tt></i></b></li>
<br>The function <tt>execd_new_features</tt> in <tt>source/daemons/execd/execd_kill_execd.c</tt>
processes this type of request.</ul>
<b><i>Configuration changed <tt>(TAG_GET_NEW_CONF)</tt></i></b></li>
<br>If the cluster configuration (either the global one or that of a specific host)
is changed, all affected hosts will be notified by the qmaster about the
configuration change.
<br>The function <tt>execd_get_new_conf</tt> in <tt>source/daemons/execd/execd_get_new_conf.c</tt>
processes this type of request.</ul>
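<p>The mapping from request tags to callback functions described at the start of this section can be sketched as a dispatch table. The Python code below is only an illustration: the tag and function names are the ones listed above, but the tags are represented as plain strings, and the handler bodies, <tt>make_handler</tt> and <tt>dispatch</tt> are placeholders invented for this example, not the actual C implementation.

```python
def make_handler(name):
    """Create a placeholder callback that only reports what it was given."""
    def handler(payload):
        return f"{name} handled {payload!r}"
    handler.__name__ = name
    return handler


# Request tag -> callback, following the list of requests in this section.
# Note that both signal requests are served by execd_signal_queue.
DISPATCH = {
    "TAG_JOB_EXECUTION": make_handler("execd_job_exec"),
    "TAG_SLAVE_ALLOW": make_handler("execd_job_slave"),
    "TAG_CHANGE_TICKET": make_handler("execd_ticket"),
    "TAG_ACK_REQUEST": make_handler("execd_c_ack"),
    "TAG_SIGQUEUE": make_handler("execd_signal_queue"),
    "TAG_SIGJOB": make_handler("execd_signal_queue"),
    "TAG_KILL_EXECD": make_handler("execd_kill_execd"),
    "TAG_NEW_FEATURES": make_handler("execd_new_features"),
    "TAG_GET_NEW_CONF": make_handler("execd_get_new_conf"),
}


def dispatch(tag, payload):
    """Look up the callback for a request tag and run it."""
    handler = DISPATCH.get(tag)
    if handler is None:
        raise KeyError(f"unknown request tag: {tag}")
    return handler(payload)


print(dispatch("TAG_JOB_EXECUTION", {"job_id": 4711}))
```

<p>A table like this keeps the main loop independent of the individual request types: supporting a new request means adding one entry.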
Reports from execd to qmaster</h2>
The execd sends reports to the qmaster at regular intervals. These reports
<b><i>Load values</i></b></li>
<br>All load values collected by the load sensor(s) of an execd in a load
report interval are sent to the qmaster in one report message - see also
man page <font color="#000000"><a href="../../../doc/htmlman/htmlman5/sge_conf.html">sge_conf(5)</a></font>.
<b><i>Job reports</i></b></li>
<br>Job reports are created during a job's runtime by the PDC, reporting the
job's resource consumption accumulated so far. A job report is also created
when a job finishes to report the final resource consumption. Multiple
job reports are collected and sent in one report message to the qmaster;
job reports for tasks of parallel jobs are sent to qmaster immediately
(see <tt>source/daemons/execd/reaper_execd.c</tt> - the variable <tt>flush_jr</tt>
defines if a job report is sent immediately or with the report interval).</ul>
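<p>The interplay of batching, immediate flushing and resending unacknowledged job reports can be sketched as follows. This Python class is a simplified model, not the execd code: the class name, its methods, the interval values and the report contents are hypothetical, and for brevity every report is treated as requiring an acknowledgement.

```python
class JobReportQueue:
    """Collects job reports and decides which ones to send at a given time."""

    def __init__(self, report_interval, resend_interval):
        self.report_interval = report_interval
        self.resend_interval = resend_interval
        self.pending = {}            # job id -> report not yet sent
        self.unacked = {}            # job id -> (report, time it was last sent)
        self.flush_requested = False
        self.last_send = 0.0

    def add(self, job_id, report, flush=False):
        """Queue a report; flush=True asks for sending without delay."""
        self.pending[job_id] = report
        self.flush_requested = self.flush_requested or flush

    def collect(self, now):
        """Return the reports to put into one report message at time now."""
        # Reports sent earlier but never acknowledged are sent again.
        batch = {
            job_id: report
            for job_id, (report, sent) in self.unacked.items()
            if now - sent >= self.resend_interval
        }
        interval_over = now - self.last_send >= self.report_interval
        if self.pending and (self.flush_requested or interval_over):
            batch.update(self.pending)
            self.pending.clear()
            self.flush_requested = False
            self.last_send = now
        for job_id, report in batch.items():
            self.unacked[job_id] = (report, now)
        return batch

    def ack(self, job_id):
        """Called when the qmaster acknowledged the report for job_id."""
        self.unacked.pop(job_id, None)


queue = JobReportQueue(report_interval=10, resend_interval=30)
queue.add("job1", {"cpu": 1.0})
print(queue.collect(now=5))    # nothing is due yet
print(queue.collect(now=10))   # the report interval is over
```

<p>In this model a report for a task of a parallel job would be added with <tt>flush=True</tt>, which plays the role the text above assigns to <tt>flush_jr</tt>.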
The load sensor interface</h2>
A load sensor is a module that retrieves host-specific values and
passes them to the execd.
<p>The execd will report these host-specific values, called load values
in the following text, to the qmaster.
<p>The execd contains a load sensor for the common host characteristics
like load, total memory, free memory, total swap, free swap etc.
<p><font color="#000000">The file doc/load_parameters.asc contains a detailed
description of all load values including platform dependencies.</font>
<p><font color="#000000">Load values are retrieved by the (platform dependent)
functions <tt>get_load_avg</tt> and <tt>get_cpu_load</tt> in the file <tt>source/libs/uti/sge_getloadavg.c</tt>.</font>
<p><font color="#000000">Memory load values are retrieved by the (platform
dependent) function <tt>load_mem</tt> in file <tt>source/libs/uti/sge_loadmem.c</tt>.</font>
<p>In addition, there exists an interface to integrate one or more
external load sensors into the execd - see man page <a href="../../../doc/htmlman/htmlman5/sge_conf.html">sge_conf(5)</a>.
<p>This is done, for example, to integrate license counters from licensing
systems into Grid Engine or to provide additional host characteristics
to Grid Engine that are not handled by the built-in load sensor.
<p>An external load sensor can be any executable like a binary, a shell
script, a Perl script ...
<p>It can be configured in the (host specific) cluster configuration by
setting the parameter <tt>load_sensor</tt> - see man page <a href="../../../doc/htmlman/htmlman5/sge_conf.html">sge_conf(5)</a>
- and is started by the execd as a child process (see function <tt>sge_ls_start_ls</tt>
in file <tt>source/daemons/execd/sge_load_sensor.c</tt>).
<br>Multiple load sensors can be started by one execd.
<p>A load sensor receives commands from execd on stdin and has to report the
load values on stdout. It has to implement the following protocol:
Commands from execd</h3>
<b><i>Retrieve and send load values</i></b></li>
<br>At a regular interval defined as <tt>load_report_interval</tt> in the
cluster configuration, the execd will ask the load sensor to retrieve current
load values and send them back to the execd.
<br>Execd will send a single linefeed (<tt>\n</tt>) to trigger this action.
<b><i>Shutdown</i></b></li>
<br>The execd can tell the load sensor to shut down. To do so, it sends the
<tt>quit</tt> followed by a linefeed.</ul>
Format of load values</h3>
A record containing all load values provided by a load sensor may only
be sent after a request from execd.
<p>The record is formed by
the keyword <tt>begin</tt> followed by a linefeed</li>
any number of load values, each on a single line</li>
the keyword <tt>end</tt> followed by a linefeed</li>
The format for a load value is
<br><tt>hostname:name:value</tt>
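<p>Putting the commands and the record format together, a minimal external load sensor might look like the Python sketch below. It follows the protocol as described here (an empty line triggers a report, <tt>quit</tt> ends the sensor); the function names, the host name <tt>node1</tt> and the load value <tt>scratch_free_mb</tt> are made up for the example, and ignoring unknown input is an assumption of this sketch.

```python
import io
import sys


def format_record(hostname, values):
    """Build one record: begin, one hostname:name:value line each, end."""
    lines = ["begin"]
    lines += [f"{hostname}:{name}:{value}" for name, value in values.items()]
    lines.append("end")
    return "\n".join(lines) + "\n"


def run_sensor(collect, hostname, stdin=None, stdout=None):
    """Serve execd commands until 'quit' arrives or stdin is closed.

    collect is a callable returning a dict of load value name -> value.
    """
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    for line in stdin:
        command = line.rstrip("\n")
        if command == "quit":
            break
        if command == "":
            # A single linefeed asks for the current load values.
            stdout.write(format_record(hostname, collect()))
            stdout.flush()  # execd must see the record without delay
        # Anything else is ignored in this sketch.


# Demo with in-memory streams instead of a real execd.
demo_out = io.StringIO()
run_sensor(lambda: {"scratch_free_mb": 1024}, "node1",
           stdin=io.StringIO("\nquit\n"), stdout=demo_out)
print(demo_out.getvalue(), end="")
```

<p>When started by execd, such a script would call <tt>run_sensor</tt> with its real host name and the default streams. Flushing stdout after every record matters, because output to a pipe is buffered by default.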
<p>Examples of load sensors are installed in the directory
<tt>$SGE_ROOT/util/resources/loadsensors</tt>
<br>Further information on setting up load sensors can be found at
<tt><a href="http://gridengine.sunsource.net/project/gridengine/howto/loadsensor.html">http://gridengine.sunsource.net/project/gridengine/howto/loadsensor.html</a>
<p>Copyright 2001 Sun Microsystems, Inc. All rights reserved.</center>