Complex Attribute Specification

The purpose of this document is to describe how attributes work in the new
SGE V6.0 system. Their behavior has changed in some respects compared to the
previous version. The document also describes how the attributes can be used
and which work packages are still pending. Therefore this document is itself
under construction and will change as the work packages are implemented.

I gratefully acknowledge useful conversations with, and other forms of input
from, Andre Alefeld, Ernst Bablick, Andreas Dorr, Fritz Ferstl, Andreas Haas,
Christian Reissmann, and Andy Schwierskott.

- Attributes can be specified on global level, host level, and queue level.
- The values for an attribute can be fixed or changeable.
- The values can be a load value, custom defined, or a resource limit.
- Resource limits exist only on queue level.
- Load values exist only on global or host level.
- The type of an attribute can be: string, host, cstring, int, double,
  boolean, memory, and time.
- A consumable can have a default value, thus the user can, but does not
  have to, specify the attribute.
- An attribute can be requestable, has to be requested, or cannot be
  requested.
- An attribute can be built-in or user defined.
- Most load values and all of the resource limits are built-in.
- A user can define consumables, load values, or fixed values.
- An attribute has one of several relational operators: ==, !=, >=, <=, >, <
- An attribute can be a per job attribute or a per slot attribute:
    - all load values and consumables are per job attributes, except string,
      cstring, and host values, which are per slot attributes.
    - all resource limits and user defined fixed values are per slot
      attributes.
- An attribute can only be a per job attribute or a per slot attribute, but
  not both.

The user can change every aspect of an attribute at any time. The current
definition of an attribute looks like:

Using per slot attributes:
--------------------------

All fixed values in the system are per slot attributes. This means that
a parallel job consisting of 4 parallel tasks (needing 4 slots to run)
requires the attribute for each task.

E1:
  One job j1 requests t01 = 10 with 4 slots.
  Queue q1 has 5 slots and t01 is set to <= 20.
  j1 fits on q1, since q1 has enough slots and t01 allows jobs with
  t01 requests between 0 and 20 to run.
  The configuration of q1 after j1 has started is slots = 1 and
  t01 <= 20.

Using per job attributes:
-------------------------

All changeable values in the system are per job attributes, like the
"slots" attribute in E1. This means that a parallel job requests a
resource n times (n is the number of slots it will use).

  One job j2 requests t02 = 20 with 4 slots.
  Queue q2 offers 5 slots with t02 <= 40.
  Queue q3 offers 1 slot with t02 <= 40.
  Queue q4 offers 3 slots with t02 <= 40.
  The system will start two instances of the job on q2, one on q3, and
  one on q4.
  The configuration of the queues afterwards looks like:
    q2 offers 3 slots with t02 <= 0  (q2.t02 - 2*j2.t02)
    q3 offers 0 slots with t02 <= 20 (q3.t02 - 1*j2.t02)
    q4 offers 2 slots with t02 <= 20 (q4.t02 - 1*j2.t02)

This example shows how consumables are handled. With load values it
will take some time before they are updated, depending on the
load_value_report interval.
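
The bookkeeping behind the per job example can be written down in a few
lines. The following Python sketch is an illustration only, not SGE code:
the Queue class and the dispatch function are hypothetical names, and the
placement (two tasks on q2, one each on q3 and q4) is passed in by hand
instead of being computed by a scheduler.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    slots: int  # free slots
    t02: int    # remaining capacity of the consumable t02


def dispatch(queues, placement, request):
    """Debit slots and the per job consumable for every placed task.

    placement maps a queue name to the number of tasks started there;
    request is the amount of t02 each task asks for.
    """
    # Check everything first so that a failure leaves the queues untouched.
    for q in queues:
        tasks = placement.get(q.name, 0)
        if tasks > q.slots or tasks * request > q.t02:
            raise ValueError(f"{q.name} cannot hold {tasks} task(s)")
    # Per job attribute: the request is debited once per used slot.
    for q in queues:
        tasks = placement.get(q.name, 0)
        q.slots -= tasks
        q.t02 -= tasks * request
    return queues


queues = [Queue("q2", 5, 40), Queue("q3", 1, 40), Queue("q4", 3, 40)]
dispatch(queues, {"q2": 2, "q3": 1, "q4": 1}, request=20)
for q in queues:
    print(f"{q.name} offers {q.slots} slots with t02 <= {q.t02}")
```

A per slot (fixed) attribute would skip the second debit: only the slot
count changes, while the attribute value itself stays as configured.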

It is possible to override attributes on a lower level or on the same
level. More about this later.

The fact that all consumables are per job attributes poses a problem
when one wants to do licence management with parallel jobs. Therefore it
would be nice to make it configurable whether a consumable is a per job
attribute or a per slot attribute.

=> Decide what the right way of doing this is.
=> Add a flag, which can be set by a user, that determines whether an
   attribute is a per slot or a per job attribute.

Not all of the combinations described above make sense. Until now, there
are no restrictions on how an attribute is defined, but the code working on
attributes already has the restrictions built in. It will therefore aid the
user in configuring SGE when the system allows only valid specifications:

name          : has to be unique
Shortcut      : has to be unique
Type          : every type from the list (string, host, cstring, int, double,
                boolean, memory, time, restring)
Consumable    : can only be defined for: int, double, memory, time
                If a consumable is not requestable, it has to have a default
                value.
                If a consumable is forced, it must not have a default value.
Relop         :
                - for consumables: only <=
                - for non consumables:
                    - string, host, cstring: only ==, !=
                    - int, double, memory, time: ==, !=, <=, <, >=, >
Requestable   : for all attributes
default value : only for consumables

The qmon interface should only provide valid options. The choice of the
type limits the choice of operators and whether the attribute can be a
consumable or not. Having a consumable also limits the relational operators
to one. This makes it easier and more convenient for the user to add new
attributes. Default values can only be added when it makes sense (for
consumables).
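
The restrictions above lend themselves to a single validation routine. The
Python sketch below is one possible way to express them and is not taken
from the SGE sources. The dictionary keys and the requestable values "yes",
"no", and "forced" are assumptions made for this illustration. Rules that
the list above does not state (for example the operators allowed for
boolean or restring) are deliberately left unchecked.

```python
NUMERIC = {"int", "double", "memory", "time"}
STRINGLIKE = {"string", "host", "cstring"}
TYPES = NUMERIC | STRINGLIKE | {"boolean", "restring"}


def validate(attr, existing=()):
    """Return a list of rule violations for one attribute definition."""
    errors = []
    # name and shortcut have to be unique among the known attributes
    for key in ("name", "shortcut"):
        if any(attr[key] == other[key] for other in existing):
            errors.append(f"{key} is not unique")
    if attr["type"] not in TYPES:
        errors.append("unknown type")
    if attr["consumable"]:
        if attr["type"] not in NUMERIC:
            errors.append("consumable needs int, double, memory or time")
        if attr["relop"] != "<=":
            errors.append("consumable allows only <=")
        if attr["requestable"] == "no" and attr["default"] is None:
            errors.append("non-requestable consumable needs a default")
        if attr["requestable"] == "forced" and attr["default"] is not None:
            errors.append("forced consumable must not have a default")
    else:
        if attr["default"] is not None:
            errors.append("default value only for consumables")
        if attr["type"] in STRINGLIKE and attr["relop"] not in ("==", "!="):
            errors.append("string types allow only == and !=")
    return errors


licence = dict(name="licence", shortcut="lic", type="int", relop="<=",
               consumable=True, requestable="yes", default=1)
print(validate(licence))                       # no violations
print(validate(dict(licence, type="string")))  # wrong type for a consumable
```

A front end such as qmon could call a routine like this before accepting a
definition, or use the same rule tables to disable invalid choices.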

Besides the overall attribute restrictions, there is one additional
restriction for the built-in attributes of the system. One cannot change the
type of a built-in attribute. The system relies on the type and will no
longer function if it is changed. A built-in attribute also cannot be
deleted.

The only exception are strings. A string can be changed into a cstring
or restring and back.

3. Overriding attributes:
-------------------------

In general an attribute can be overridden on a lower level
 - global by hosts and queues
 - hosts by queues
and load values or resource limits on the same level. Overriding a per slot
attribute with a per slot attribute, or a per job attribute with a per job
attribute, is no problem. Based on the specification, a per slot attribute
is never overridden with a per job attribute. A per job attribute, however,
can be overridden with a per slot attribute. In this case the per slot
attribute changes into a per job attribute. This happens when a load value
is overridden with a fixed value.

There is one limitation on overriding attributes, based on the relational
operator:
 - Attributes with the operators != and == can only be overridden on the
   same level, but not on a lower level.
 - Attributes with the operators >=, >, <=, < can only be overridden when
   the new value is more restrictive than the old one.

 1. We have a load value arch on host level. One can override it in the host
    definition with another value, but not in a queue.
 2. We have a load value mem_free with a relop <= on host level. One can
    override it on host or queue level with a value which is smaller than
    the reported load value.

    mem_free:  custom  /  load report  /  result
               1 GB    /  0.9 GB       /  0.9 GB
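
The override limitation can be summarized as a small predicate. The Python
sketch below is an illustration under stated assumptions and not the SGE
implementation: the function name and the level table are hypothetical, and
"more restrictive" is read as "smaller" for <= and < and as "larger" for >=
and >.

```python
# Position in the hierarchy; a larger number means a lower (more specific) level.
LEVELS = {"global": 0, "host": 1, "queue": 2}


def may_override(relop, old_level, new_level, old_value, new_value):
    """Tell whether new_value on new_level may override old_value."""
    if LEVELS[new_level] < LEVELS[old_level]:
        return False  # overriding only works on the same or a lower level
    if relop in ("==", "!="):
        return new_level == old_level
    if relop in ("<=", "<"):
        return new_value < old_value  # assumed meaning of "more restrictive"
    if relop in (">=", ">"):
        return new_value > old_value  # assumed meaning of "more restrictive"
    raise ValueError(f"unknown relational operator {relop!r}")


# arch is a load value on host level
print(may_override("==", "host", "host", "solaris", "linux"))   # True
print(may_override("==", "host", "queue", "solaris", "linux"))  # False

# mem_free: load report 0.9 GB, custom value 1 GB on queue level
print(may_override("<=", "host", "queue", 0.9, 1.0))            # False
```

In the mem_free case the custom value of 1 GB is not more restrictive than
the reported 0.9 GB, so the result stays at 0.9 GB.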

The reason for the override limitation lies in the algorithm that matches
job requests against available resources. The algorithm is strictly
hierarchical: if it finds an attribute on one level which does not match,
the other levels are not evaluated any further. It starts with the global
host and ends with a queue. When an attribute is missing on one level, the
algorithm goes on with the next level. But an existing attribute which
does not match results in an abort.
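
The hierarchical evaluation can be sketched as follows. This is a minimal
Python illustration, not the scheduler code: the function name, the data
layout (one dictionary per level, ordered global, host, queue), and the
decision to reject a request whose attribute is defined on no level at all
are assumptions made for this example.

```python
import operator

RELOPS = {"==": operator.eq, "!=": operator.ne, "<=": operator.le,
          "<": operator.lt, ">=": operator.ge, ">": operator.gt}


def matches(name, requested, levels):
    """Match one request against the levels global, host, queue (in order).

    Each level maps an attribute name to a (relop, value) pair.
    """
    found = False
    for level in levels:
        if name not in level:
            continue  # attribute missing here: go on with the next level
        relop, value = level[name]
        if not RELOPS[relop](requested, value):
            return False  # mismatch: abort, lower levels are not evaluated
        found = True
    return found  # assumption: an attribute defined nowhere does not match


host = {"mem_free": ("<=", 0.9)}
queue = {"mem_free": ("<=", 2.0)}
print(matches("mem_free", 1.0, [{}, host, queue]))  # False
print(matches("mem_free", 0.5, [{}, host, queue]))  # True
```

The first call shows why a less restrictive override is pointless: the
queue would accept a request of 1.0, but the evaluation already stops at
the host level.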

4. Internal representation:
---------------------------

One could say that we have three different lists which are used to match
the requests with the existing resources. These are:
 - the job request list (hard and soft)
 - the attribute configuration list
 - the elements (of a list) which are generated for matching and output
   purposes. This list is generated from the first two lists.
All list entries share the same CULL list structure, which is never fully
used.

                     request list / configuration list / match list

Shortcut                 ---      /         X          /    ---
val (as double)          ---      /        ---         /     X
val (as string)           X       /        ---         /     X
consumable               ---      /         X          /     X
default                  ---      /         X          /    ---
dominant                 ---      /        ---         /     X
pj val (as double)       ---      /        ---         /     X
pj val (as string)       ---      /        ---         /     X
pj dominant              ---      /        ---         /     X
requestable              ---      /        ---         /     X

Splitting this one CULL definition into multiple ones will reduce the amount
of memory used and the time for copying the lists, and it will enhance
readability within the source code. The new structures will look like:

 - RL_name          - name or shortcut
 - RL_stringval     - requested value
 - RL_tagged        - matched an existing resource

 - CE_shortcut      - short name
 - CE_type          - type (int, string, ...)
 - CE_relop         - relational operator (==, !=, ...)
 - CE_consumable    - boolean flag
 - CE_default       - default value, only for consumables

 - CE_name          - requested name (name or shortcut)
 - CE_doubleval     - fixed value
 - CE_stringval     - fixed value as string
 - CE_relop         - relational operator
 - CE_consumable    - is it consumable?
 - CE_dominant      - from which level, of which type (fixed, ...)
 - CE_pj_doubleval  - changeable value
 - CE_pj_stringval  - changeable value as string
 - CE_pj_dominant   - from which level, of which type (load, ...)
 - CE_requestable   - is it requestable?
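
To make the split easier to picture, the three field groups can be modelled
as three small record types. The Python sketch below is only a model of the
field lists above and has nothing to do with the actual CULL definitions:
the class names, the Python types, and the default values are assumptions,
and only the fields that appear in the lists are included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRequest:              # the RL_* fields
    name: str                  # name or shortcut
    stringval: str             # requested value
    tagged: bool = False       # set once an existing resource matched


@dataclass
class AttributeConfig:         # the first group of CE_* fields
    shortcut: str
    type: str                  # "int", "string", ...
    relop: str                 # "==", "!=", ...
    consumable: bool = False
    default: Optional[str] = None  # only meaningful for consumables


@dataclass
class MatchElement:            # the second group of CE_* fields
    name: str
    relop: str
    consumable: bool
    requestable: bool
    doubleval: Optional[float] = None     # fixed value
    stringval: Optional[str] = None       # fixed value as string
    dominant: Optional[str] = None        # level and kind of the fixed value
    pj_doubleval: Optional[float] = None  # changeable value
    pj_stringval: Optional[str] = None    # changeable value as string
    pj_dominant: Optional[str] = None     # level and kind of the changeable value


request = JobRequest(name="mem_free", stringval="1G")
print(request)
```

Each record carries only the fields its list actually uses, which is the
memory and copying saving the split is meant to achieve.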

=> Phase 1: changing the request matching structure (CM_type)
=> Phase 2: changing the job request structure (CL_type)
=> Phase 3: changing the attribute configuration structure (RL_type)

5. Scheduler attribute matching:
--------------------------------

As written before, the attribute requests of a job are matched in a strict
hierarchy. When a match fails, the underlying levels are not evaluated any
further. Right now, this is done for every job, even though the jobs might
be in the same job category, which means that they have the same requests.

To speed up this process, one can store the information about which queue
cannot run which job category. When this is known, the jobs are only tested
against the queues which were capable of running the jobs from the previous
dispatch cycle. The list of queues to test will get shorter and shorter
as dispatching proceeds.

The same is true for soft requests. Once all queues are validated and the
number of mismatches is computed, the results are the same for all other
jobs in the same job category. This saves a lot of matching time with soft
requests.

The string matching has some specialties. A string can have one of three
types:
 - plain string
 - caseless string
 - regular expression string

1. Plain strings (STRING):
   Matches only when the requested and the provided string are exactly
   the same.

2. Caseless strings (CSTRING):
   Upper and lower case of the characters in a string is ignored.

3. Regular expression string (RESTRING):
   The user can use a regular expression to ask for a resource. The
   syntax follows these rules:
   - "*"   : matches any character and any number of characters
             (including none)
   - "?"   : matches exactly one arbitrary character. It cannot match no
             character
   - "."   : is the character ".". It has no other meaning
   - "\"   : escape character. "\\" = "\", "\*" = "*", "\?" = "?"
   - "[xx]": specifies an array or a range of allowed characters
   - "|"   : logical "or". Can only be used on the highest level

   - "x+"      : to specify that the character "x" has to appear at least
                 once
   - "[xx|yy]" : to specify xx or yy

   -l arch="linux|solaris"   : results in "arch=linux" OR "arch=solaris"
   -l arch="[linux|solaris]" : results in "arch=[linux" OR "arch=solaris]"
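
The RESTRING rules for "*", "?", ".", "\", "[xx]", and "|" can be tried out
with a short translation to Python regular expressions. This is a sketch for
experimentation and not the matcher used by SGE. The function names are
hypothetical, "|" is assumed to split the pattern wherever it occurs (as in
the second example above), an unbalanced or empty bracket is assumed to be a
literal character, and the "x+" and "[xx|yy]" forms are not implemented.

```python
import re


def restring_to_regex(pattern):
    """Translate one alternative (without "|") into a Python regex."""
    out, i = [], 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped literal
            i += 2
            continue
        if c == "*":
            out.append(".*")  # any number of characters, including none
        elif c == "?":
            out.append(".")   # exactly one character
        elif c == "[":
            end = pattern.find("]", i + 1)
            if end in (-1, i + 1):
                out.append(re.escape(c))  # assumption: treat as literal "["
            else:
                # keep "-" unescaped so that ranges like [a-c] still work
                body = re.escape(pattern[i + 1:end]).replace("\\-", "-")
                out.append("[" + body + "]")
                i = end
        else:
            out.append(re.escape(c))  # "." and all others are literal
        i += 1
    return "".join(out)


def restring_match(pattern, value):
    """True if value matches at least one "|" separated alternative."""
    return any(re.fullmatch(restring_to_regex(alt), value) is not None
               for alt in pattern.split("|"))


print(restring_match("linux|solaris", "solaris"))     # True
print(restring_match("[linux|solaris]", "solaris]"))  # True
print(restring_match("[linux|solaris]", "linux"))     # False
```

The last two calls reproduce the second example: because "|" is split on
the highest level first, the brackets end up in two separate alternatives.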

Whenever resource matching is done with jobs which have pre-calculated
job categories, the matching results will be stored in the job categories.
This can be done because all jobs in the same category have the same
requests, the same user, the same department, ...

What is cached depends on the kind of job (whether it is a job with only
hard requests, or one with hard, soft, pe, and other requests).

Jobs with only hard requests:
  All queues and hosts on which the job cannot run are stored in the job
  category. This information is used to limit the possible target queues.

For jobs which also have soft requests, all unfitting queues and the soft
violation results are stored in the job category. This means that the soft
violations are only computed once and reused for all other jobs in the same
category. The queue information limits the possible target queues.
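
The caching idea can be illustrated with a small Python sketch. It is a
simplified model and not the scheduler implementation: the JobCategory
class and the candidate_queues function are hypothetical names, queues are
plain dictionaries, and matching is reduced to an equality test instead of
the full relational operator handling.

```python
from dataclasses import dataclass, field


@dataclass
class JobCategory:
    hard: dict                                           # hard requests
    soft: dict = field(default_factory=dict)             # soft requests
    unfit_queues: set = field(default_factory=set)       # cached rejections
    soft_violations: dict = field(default_factory=dict)  # cached counts


def candidate_queues(category, queues):
    """Return {queue name: number of soft violations} for usable queues.

    queues maps a queue name to the attribute values that queue offers.
    """
    result = {}
    for name, offered in queues.items():
        if name in category.unfit_queues:
            continue  # known from an earlier job of this category
        if any(offered.get(k) != v for k, v in category.hard.items()):
            category.unfit_queues.add(name)  # remember the rejection
            continue
        if name not in category.soft_violations:
            # computed once, then reused by all jobs of the category
            category.soft_violations[name] = sum(
                offered.get(k) != v for k, v in category.soft.items())
        result[name] = category.soft_violations[name]
    return result


queues = {
    "q1": {"arch": "linux", "fast": True},
    "q2": {"arch": "solaris"},
    "q3": {"arch": "linux"},
}
category = JobCategory(hard={"arch": "linux"}, soft={"fast": True})
print(candidate_queues(category, queues))  # first job: fills the cache
print(candidate_queues(category, queues))  # later jobs: reuse the cache
```

Both calls return the same result, but the second one no longer evaluates
the hard requests for q2 or the soft requests for q1 and q3.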