.\"___INFO__MARK_BEGIN__
.\" Copyright: 2004-2007 by Sun Microsystems, Inc.
.\" $RCSfile$ Last Update: $Date$ Revision: $Revision$
.\" Some handy macro definitions [from Tom Christensen's man(1) manual page].
.de SB \" small and bold
.if !"\\$1"" \\s-2\\fB\&\\$1\\s0\\fR\\$2 \\$3 \\$4 \\$5
.de T \" switch to typewriter font
.ft CW \" probably want CW if you don't have TA font
.de TY \" put $1 in typewriter font
.de M \" man page reference
\\fI\\$1\\fR\\|(\\$2)\\$3
.TH QSTAT 1 "$Date$" "xxRELxx" "xxQS_NAMExx User Commands"
qstat \- show the status of xxQS_NAMExx jobs and queues
] [ \fB\-F\fP [\fBresource_name,...\fP]
.B -l resource=val,...
.B -qs {a|c|d|o|s|u|A|C|D|E|S}
.B -s {r|p|s|z|hu|ho|hs|hd|hj|ha|h|a}[+]
shows the current status of the available xxQS_NAMExx queues and the
jobs associated with the queues. Selection options allow you
to get information about specific jobs, queues or users.
will display only a list of jobs with no queue status
The administrator and the user may define files (see
which can contain any of the options described below. A cluster-wide sge_qstat
file may be placed under
$xxQS_NAME_Sxx_ROOT/$xxQS_NAME_Sxx_CELL/common/sge_qstat
The user private file is searched at the location
The file in the user's home directory takes precedence over
the cluster-wide file.
Options given on the command line override the flags contained in these files.
.IP "\fB\-explain a|A|c|E\fP"
\'c' displays the reason for the c(onfiguration ambiguous) state of a queue
instance. 'a' shows the reason for the alarm state. Suspend alarm state
reasons will be displayed by 'A'. 'E' displays the reason for a queue
instance error state.
The output format for the alarm reasons is one line per reason containing
the resource value and threshold. For details about the resource value please
refer to the description of the \fBFull Format\fP in section \fBOUTPUT FORMATS\fP below.
Displays additional information for each job related to the job ticket policy scheme
(see OUTPUT FORMATS below).
Specifies a "full" format display of information.
The \fB\-f\fP option causes summary
information on all queues to be displayed along with the
.IP "\fB\-F\fP [ \fBresource_name,...\fP ]"
As with \fB\-f\fP, information is displayed on all jobs as well as
will present a detailed listing of the current
resource availability per queue with respect to all resources (if the option
argument is omitted) or with respect to those resources contained in the
resource_name list. Please refer to the description of the
section \fBOUTPUT FORMATS\fP below for further detail.
.IP "\fB\-g {c|d|t}[+]\fP"
The \fB\-g\fP option allows for controlling grouping of displayed
With \fB\-g c\fP a cluster queue summary is displayed. Find more information
in the section \fBOUTPUT FORMATS\fP.
With \fB\-g d\fP array jobs are displayed verbosely in a one
line per job task fashion. By default, array jobs are grouped and all
tasks with the same status (for pending tasks only) are displayed in a
single line. The array job task id range field in the output (see section
\fBOUTPUT FORMATS\fP) specifies the corresponding set of tasks.
With \fB\-g t\fP parallel jobs are displayed verbosely in a one line
per parallel job task fashion. By default, parallel job tasks are
displayed in a single line. With the \fB\-g t\fP option, the function of each
parallel task is also displayed instead of the job's slot amount (see section
\fBOUTPUT FORMATS\fP).
Prints a listing of all options.
.IP "\fB\-j [job_list]\fP"
Prints various information either for all pending jobs or for the jobs
contained in job_list. The job_list can contain job_ids, job_names, or
For jobs in E(rror) state the error reason is displayed. For jobs that
could not be dispatched during the last scheduling interval the
obstacles are shown, if 'schedd_job_info' in
is configured accordingly.
For running jobs, the available resource utilization information is shown:
consumed CPU time in seconds, integral memory usage in Gbytes
seconds, amount of data transferred in I/O operations, current virtual
memory utilization in Mbytes, and maximum virtual memory utilization in
Mbytes. This information is not available if resource utilization
retrieval is not supported for the OS platform where the job is hosted.
.IP "\fB\-l resource\fP[\fB=value\fP],..."
Defines the resources required by the jobs or granted by the
queues on which information is requested. Matching is performed
on queues based on non-mutable resource availability information
only. That means load values are always ignored, except for the
so-called static load values (i.e. "arch", "num_proc", "mem_total", "swap_total" and
"virtual_total"). Consumable utilization is also ignored.
The pending jobs are restricted to jobs that might run in
one of the above queues. Similarly, the queue-job
matching is based only on non-mutable resource availability
In combination with \fB\-f\fP the option suppresses the display of empty
queues. This means that queues in which no jobs are running are not
.IP "\fB\-pe pe_name,...\fP"
Displays status information with respect to queues which are attached to
at least one of the parallel environments listed in the comma-separated
option argument. Status information for jobs is displayed either for those
which execute in one of the selected queues or which are pending and
might get scheduled to those queues in principle.
Displays additional information for each job related to the job priorities in
(see OUTPUT FORMATS below).
.IP "\fB\-q wc_queue_list\fP"
Specifies a wildcard expression queue list for which job
information is to be displayed. Find the definition of \fBwc_queue_list\fP
.IP "\fB\-qs {a|c|d|o|s|u|A|C|D|E|S}\fP"
Allows for the filtering of queue instances according to state.
Prints extended information about the resource requirements
of the displayed jobs. Please refer to the \fBOUTPUT FORMATS\fP
sub-section \fBExpanded Format\fP below for detailed information.
.IP "\fB\-s {p|r|s|z|hu|ho|hs|hd|hj|ha|h|a}[+]\fP"
Prints only jobs in the specified state; any combination of states is
possible. \fB\-s prs\fP corresponds to the regular
output without \fB\-s\fP
at all. To show recently finished jobs, use \fB\-s z\fP.
To display jobs in user/operator/system/array-dependency hold,
use the \fB\-s hu/ho/hs/hd\fP
\fB\-s ha\fP option shows jobs which were
displays all jobs which are not eligible for execution unless the job
has entries in the job dependency list.
is an abbreviation for
\fB\-s huhohshdhjha\fP
\fB\-s a\fP is an abbreviation for
(see \fB\-a\fP, \fB\-hold_jid\fP
and \fB\-hold_jid_ad\fP options to
Prints extended information about the controlled sub-tasks
of the displayed parallel jobs. Please refer to the \fBOUTPUT FORMATS\fP
sub-section \fBReduced Format\fP below for detailed information. Sub-tasks
of parallel jobs should not be confused with array job tasks (see \fB\-g\fP
option above and \fB\-t\fP option to
.IP "\fB\-U user,...\fP"
Displays status information with respect to queues to which the specified
users have access. Status information for jobs is displayed either for those
which execute in one of the selected queues or which are pending and
might get scheduled to those queues in principle.
.IP "\fB\-u user,...\fP"
Display information only on those jobs and queues
being associated with the users from the given user list.
Queue status information is displayed if the \fB\-f\fP or \fB\-F\fP
options are specified additionally and if the user runs
jobs in those queues.
is a placeholder for the current username. An asterisk "*" can be used
as username wildcard to request that the jobs of all users be displayed. The default
value for this switch is \fB\-u $user\fP.
Displays additional information for each job related to the job urgency policy scheme
(see OUTPUT FORMATS below).
This option can be used with all other options and changes the output to XML. The schemas
used are referenced in the XML output. The output is printed to stdout.
Depending on the presence or absence of the \fB\-explain\fP, \fB\-f\fP, \fB\-F\fP, or \fB\-qs\fP and
\fB\-r\fP and \fB\-t\fP options, three output formats must be distinguished.
The \fB\-ext\fP and \fB\-urg\fP options may be used
to display additional information for each job.
.SS "\fBCluster Queue Format (with \-g c)\fP"
Following the header line a section for each cluster queue
is provided. When queue instance selection is applied (\-l, \-pe, \-q, \-U),
the cluster format contains only cluster queues of the corresponding queue
the cluster queue name.
an average of the normalized load average of all queue hosts. In order
to reflect each host's different significance the number of configured
slots is used as a weighting factor when determining cluster queue load.
Please note that only hosts with a np_load_value are considered for this
value. When queue selection is applied only data about selected queues
is considered in this formula. If the load value is not available at
any of the hosts '-NA-' is printed instead of the value from the complex
attribute definition.
the number of currently used slots.
the number of slots reserved in advance.
the number of currently available slots.
the total number of slots.
the number of slots which are in at least one of the states 'aoACDS' and in
none of the states 'cdsuE'
the number of slots which are in one of these states or in any combination
The \fB\-g c\fP option can be used in combination with \fB\-ext\fP. In this
case, additional columns are added to the output. Each column contains
the slot count for one of the available queue states.
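.PP
The slot-weighted load figure described for the cluster queue format can be sketched in Python as follows. This is only an illustration under stated assumptions, not the qstat implementation: the function name, the input layout, and the rule that '-NA-' is returned when no host reports a load value are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def cluster_queue_load(hosts: Sequence[Tuple[int, Optional[float]]]) -> str:
    """Return the slot-weighted mean of the normalized host loads.

    hosts holds one (configured_slots, normalized_load) pair per queue host;
    a load of None marks a host that reports no load value (assumed layout).
    """
    weighted_sum = 0.0
    total_slots = 0
    for slots, load in hosts:
        if load is None:
            # Hosts without a load value are left out of the average.
            continue
        weighted_sum += slots * load
        total_slots += slots
    if total_slots == 0:
        # Assumption: no usable load value at all yields the '-NA-' marker.
        return "-NA-"
    return f"{weighted_sum / total_slots:.2f}"
```

Weighting by configured slots lets a host with many slots influence the figure more than a host with few, which is the intent stated above.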
.SS "\fBReduced Format (without \-f, \-F, and \-qs)\fP"
Following the header line a line is printed for each job
the priority of the job determining its position in the pending jobs list.
The priority value is determined
dynamically based on ticket and urgency policy set-up (see also
the user name of the job owner.
the status of the job \- one of d(eletion), E(rror), h(old), r(unning),
R(estarted), s(uspended), S(uspended), t(ransferring), T(hreshold) or w(aiting).
The state d(eletion) indicates that a
has been used to initiate job deletion.
The states t(ransferring) and r(unning) indicate that a job is about to
be executed or is already executing, whereas the states s(uspended),
S(uspended) and T(hreshold) show that an already running job has been
suspended. The s(uspended) state is caused by suspending the job via the
command, the S(uspended) state indicates that the queue containing the job
is suspended and therefore the job is also suspended and the T(hreshold)
state shows that at least one suspend threshold of the corresponding queue
and that the job has been suspended as a consequence. The state R(estarted)
indicates that the job was restarted. This can be caused by a job migration or
because of one of the reasons described in the -r section of the
The states w(aiting) and h(old) only appear for pending jobs. The h(old)
state indicates that a job currently is not eligible for execution due to
a hold state assigned to it via
\fB\-h\fP option or that the job is waiting for completion of the jobs
on which it depends, as assigned via the
\fB\-hold_jid\fP or \fB\-hold_jid_ad\fP options of
The state E(rror) appears for pending jobs that couldn't be started due to
job properties. The reason for the job error is shown by the
the submission or start time and date of the job.
the queue the job is assigned to (for running or suspended
the number of job slots or the function of parallel job tasks
if \fB\-g t\fP is specified.
Without the \fB\-g t\fP option, the total number of slots occupied or requested by the job
is displayed. For pending parallel jobs with a PE slot range request,
the assumed future slot allocation is displayed.
With the \fB\-g t\fP option, the function of the running jobs (MASTER or SLAVE \- the
latter for parallel jobs only) is displayed.
the array job task id. Will be empty for non-array jobs. See the
and the \fB\-g\fP option above for additional information.
If the \fB\-t\fP option is supplied, each status line always contains
parallel job task information as if \fB\-g t\fP were specified and
each line contains the following parallel job subtask information:
the parallel task ID (do not confuse parallel tasks with array job tasks),
the status of the parallel task \- one of
r(unning), R(estarted), s(uspended), S(uspended), T(hreshold), w(aiting),
h(old), or x(exited).
the cpu, memory, and I/O usage,
the exit status of the parallel task,
and the failure code and message for the parallel task.
.SS "\fBFull Format (with \-f and \-F)\fP"
Following the header line a section for each queue separated
by a horizontal line is provided. For each queue the information
the queue type \- one of B(atch), I(nteractive), C(heckpointing),
P(arallel), T(ransfer) or combinations thereof or N(one),
the number of used and available job slots,
the load average of the queue host,
the architecture of the queue host and
the state of the queue \- one of
u(nknown) if the corresponding
.M xxqs_name_sxx_execd 8
cannot be contacted, a(larm), A(larm), C(alendar suspended), s(uspended),
S(ubordinate), d(isabled), D(isabled), E(rror) or
combinations thereof.
If the state is a(larm) at least one of the load thresholds defined in the
\fIload_thresholds\fP list of the queue configuration (see
currently exceeded, which prevents further jobs from being scheduled to that
As opposed to this, the state A(larm) indicates that at least one of the
suspend thresholds of the queue (see
is currently exceeded. This will result in jobs running in that queue being
successively suspended until no threshold is violated.
The states s(uspended) and d(isabled) can be assigned to queues and
command. Suspending a queue will cause all jobs executing in that queue to
The states D(isabled) and C(alendar suspended) indicate that the queue
has been disabled or suspended automatically via the calendar facility of
.M calendar_conf 5 ),
while the S(ubordinate) state
indicates that the queue has been suspended via subordination to another
for details). When suspending a queue
(regardless of the cause) all jobs executing in that queue are suspended
If an E(rror) state is displayed for a queue,
.M xxqs_name_sxx_execd 8
on that host was unable to locate the
.M xxqs_name_sxx_shepherd 8
on that host in order to start a job. Please check the
error logfile of that
.M xxqs_name_sxx_execd 8
for leads on how to resolve the problem. Please enable the
queue afterwards via the \fB\-c\fP option of the
If the c(onfiguration ambiguous) state is displayed for a queue
instance this indicates that the configuration specified for this
is ambiguous. This state is cleared when
the configuration becomes unambiguous again. This state prevents further jobs
from being scheduled to that queue instance. Detailed reasons why
a queue instance entered the c(onfiguration ambiguous) state can
messages file and are shown by the
qstat -explain switch. For queue instances in this state the cluster
queue's default settings are used for the ambiguous attribute.
If an o(rphaned) state is displayed for a queue instance, it
indicates that the queue instance is no longer demanded by the current
cluster queue's configuration or the host group configuration.
The queue instance is kept because jobs which have not yet finished
are still associated with it, and it will vanish from qstat output
when these jobs have finished. To speed up the removal of an orphaned
queue instance, the associated job(s) can be deleted using
A queue instance in o(rphaned) state can be revived by changing
the cluster queue configuration accordingly to cover that queue
instance. This state prevents further jobs from being scheduled to that
If the \fB\-F\fP option was used, resource availability information is printed
following the queue status line. For each resource (as selected in an option
argument to \fB\-F\fP or for all resources if the option argument was
omitted) a single line is displayed with the following format:
a one letter specifier indicating whether the current resource availability
value was dominated by either
`\fBg\fP' - a cluster global,
`\fBh\fP' - a host total or
`\fBq\fP' - a queue related resource consumption.
a second one letter specifier indicating the source for the current resource
availability value, being one of
`\fBl\fP' - a load value reported for the
`\fBL\fP' - a load value for the resource after administrator
defined load scaling has been applied,
`\fBc\fP' - availability derived from
the consumable resources facility (see
availability definition derived from a non-consumable complex attribute or
a fixed resource limit.
after a colon the name of the resource on which information is displayed.
after an equal sign the current resource availability value.
The displayed availability values and the sources from which they derive are
always the minimum values of all possible combinations. Hence, for example,
a line of the form "qf:h_vmem=4G" indicates that a queue currently has a
maximum availability in virtual memory of 4 Gigabytes, where this value is a
fixed value (e.g. a resource limit in the queue configuration) and it is queue
dominated, i.e. the host in total may have more virtual memory available than
this, but the queue doesn't allow for more. In contrast, a line "hl:h_vmem=4G"
would also indicate an upper bound of 4 Gigabytes of virtual memory
availability, but the limit would be derived from a load value currently
reported for the host. So while the queue might allow for jobs with higher
virtual memory requirements, the host on which this particular queue resides
currently only has 4 Gigabytes available.
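.PP
The resource availability line format described above (dominance letter, source letter, colon, resource name, equal sign, value) can be taken apart with a few lines of Python. This is a minimal sketch for scripts that post-process \fB\-F\fP output, not part of qstat itself. The function and type names are hypothetical, and the source letter 'f' for a fixed value is inferred from the "qf:h_vmem=4G" example in the text.

```python
import re
from typing import NamedTuple

# Letter meanings as described in the text; 'f' is inferred from the example.
DOMINANCE = {"g": "cluster global", "h": "host total", "q": "queue"}
SOURCE = {
    "l": "load value",
    "L": "scaled load value",
    "c": "consumable resource",
    "f": "fixed value",
}


class ResourceLine(NamedTuple):
    dominance: str
    source: str
    name: str
    value: str


# Two specifier letters, a colon, the resource name, '=', then the value.
_LINE = re.compile(r"^\s*([ghq])([lLcf]):([^=\s]+)=(.*)$")


def parse_resource_line(line: str) -> ResourceLine:
    """Split one resource availability line into its four parts."""
    match = _LINE.match(line)
    if match is None:
        raise ValueError(f"not a resource availability line: {line!r}")
    dom, src, name, value = match.groups()
    return ResourceLine(DOMINANCE[dom], SOURCE[src], name, value.strip())
```

For the two examples from the text, the sketch reports "queue" and "fixed value" for "qf:h_vmem=4G", and "host total" and "load value" for "hl:h_vmem=4G".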
If the \fB\-explain\fP option was used with the character 'a' or 'A',
information about resources that
violate load or suspend thresholds is displayed.
The same format as with the \fB\-F\fP option is used with the following extensions:
the line starts with the keyword `alarm'
appended to the resource value is the type and value of the appropriate threshold
After the queue status line (in case of \fB\-f\fP) or the resource
availability information (in case of \fB\-F\fP) a single line is printed
for each job running currently in this queue. Each job status
the priority of the job determining its position in the pending jobs list.
The priority value is determined
dynamically based on ticket and urgency policy set-up (see also
the status of the job \- one of t(ransferring),
r(unning), R(estarted), s(uspended), S(uspended) or T(hreshold) (see the
\fBReduced Format\fP section for detailed information),
the submission or start time and date of the job.
the number of job slots or the function of parallel job tasks
if \fB\-g t\fP is specified.
Without the \fB\-g t\fP option, the number of slots occupied per queue or requested by the job
is displayed. For pending parallel jobs with a PE slot range request,
the assumed future slot allocation is displayed.
With the \fB\-g t\fP option, the function of the running jobs (MASTER or SLAVE \- the
latter for parallel jobs only) is displayed.
If the \fB\-t\fP option is supplied, each job status line also contains
the status of the task \- one of
r(unning), R(estarted), s(uspended), S(uspended), T(hreshold), w(aiting),
h(old), or x(exited) (see the
\fBReduced Format\fP section for detailed information),
the cpu, memory, and I/O usage,
the exit status of the task,
and the failure code and message for the task.
Following the list of queue sections a \fIPENDING JOBS\fP list may
be printed if jobs are waiting to be assigned to a queue.
A status line similar to the one for running jobs is displayed
for each waiting job. The differences are that the status
for the jobs is w(aiting) or h(old), that the submit time and date
is shown instead of the start time and that no function
is displayed for the jobs.
In very rare cases, e.g. if
.M xxqs_name_sxx_qmaster 8
starts up from an inconsistent state in the job or queue spool
files or if the \fBclean queue\fP (\fB\-cq\fP) option of
cannot assign jobs to either the running or pending jobs section
of the output. In this case a job status inconsistency (e.g. a
job has a running status but is not assigned to a queue) has been
detected. Such jobs are printed in an \fIERROR JOBS\fP section at the
very end of the output. The ERROR JOBS section should disappear
.M xxqs_name_sxx_qmaster 8 .
Please contact your xxQS_NAMExx support representative if you feel
uncertain about the cause or effects of such jobs.
.SS "\fBExpanded Format (with \-r)\fP"
If the \fB\-r\fP option was specified together with \fIqstat\fP,
the following information for each displayed job is printed (a single line
for each of the following job characteristics):
The job and master queue name.
The hard and soft resource requirements of the job as specified
\fB\-l\fP option. The per resource
addend when determining the job's urgency contribution value is
The requested parallel environment including the
desired queue slot range (see \fB\-pe\fP option of
The requested checkpointing environment of the job (see the
\fB\-ckpt\fP option).
In case of running jobs, the granted
parallel environment with the granted number of queue slots.
.SS "\fBEnhanced Output (with \-ext)\fP"
For each job the following additional items are displayed:
The total number of tickets in normalized fashion.
The project to which the job is assigned as specified in the
.IP "\fBdepartment\fP"
The department to which the user belongs (use the \fB\-sul\fP and
\fB\-su\fP options of
to display the current department definitions).
The current accumulated CPU usage of the job in seconds.
The current accumulated memory usage of the job in Gbytes seconds.
The current accumulated IO usage of the job.
The total number of tickets assigned to the job currently
The override tickets as assigned by the \fB\-ot\fP option of
The override portion of the total number of tickets assigned to the
The functional portion of the total number of tickets assigned to the
The share portion of the total number of tickets assigned to the
The share of the total system to which the job is entitled currently.
.SS "\fBEnhanced Output (with \-urg)\fP"
For each job the following additional urgency policy related items are
The job's total urgency value in normalized fashion.
The job's total urgency value.
The urgency value contribution that reflects the urgency
that is related to the job's overall resource requirement.
The urgency value contribution that reflects the urgency related to
the job's waiting time.
The urgency value contribution that reflects the urgency related to
the job's deadline initiation time.
The deadline initiation time of the job as specified with the
.SS "\fBEnhanced Output (with \-pri)\fP"
For each job, the following additional job priority related items are
The job's total urgency value in normalized fashion.
The job's \fB\-p\fP priority in normalized fashion.
The job's ticket amount in normalized fashion.
The job's \fB\-p\fP priority as specified by the user.
.SH "ENVIRONMENTAL VARIABLES"
.IP "\fBxxQS_NAME_Sxx_ROOT\fP" 1.5i
Specifies the location of the xxQS_NAMExx standard configuration
.IP "\fBxxQS_NAME_Sxx_CELL\fP" 1.5i
If set, specifies the default xxQS_NAMExx cell. To address a xxQS_NAMExx
uses (in the order of precedence):
The name of the cell specified in the environment
variable xxQS_NAME_Sxx_CELL, if it is set.
The name of the default cell, i.e. \fBdefault\fP.
.IP "\fBxxQS_NAME_Sxx_DEBUG_LEVEL\fP" 1.5i
If set, specifies that debug information
should be written to stderr. In addition the level of
detail in which debug information is generated is defined.
.IP "\fBxxQS_NAME_Sxx_QMASTER_PORT\fP" 1.5i
If set, specifies the TCP port on which
.M xxqs_name_sxx_qmaster 8
is expected to listen for communication requests.
Most installations will use a services map entry for the
service "sge_qmaster" instead to define that port.
.IP "\fBSGE_LONG_QNAMES\fP" 1.5i
Qstat displays queue names up to 30 characters long. If that is
too much or not enough, one can set a custom length with this
variable. The minimum display length is 10 characters. If one does
not know the best display length, one can set SGE_LONG_QNAMES to
-1 and qstat will figure out the best length.
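.PP
The SGE_LONG_QNAMES rules above can be sketched in Python as follows. This is an illustration only and makes several assumptions that the text does not confirm: that values below the minimum are raised to 10, that an unparsable value falls back to the default of 30, and that "the best length" for -1 means the length of the longest queue name. The function name is hypothetical.

```python
import os
from typing import Iterable, Mapping

DEFAULT_WIDTH = 30  # default display length stated in the text
MIN_WIDTH = 10      # minimum display length stated in the text


def queue_name_width(queue_names: Iterable[str],
                     env: Mapping[str, str] = os.environ) -> int:
    """Pick a queue name column width from SGE_LONG_QNAMES."""
    raw = env.get("SGE_LONG_QNAMES")
    if raw is None:
        return DEFAULT_WIDTH
    try:
        requested = int(raw)
    except ValueError:
        # Assumption: an unparsable value is ignored.
        return DEFAULT_WIDTH
    if requested == -1:
        # Assumption: the best length is that of the longest queue name.
        longest = max((len(name) for name in queue_names), default=MIN_WIDTH)
        return max(longest, MIN_WIDTH)
    # Assumption: values below the minimum are raised to it.
    return max(requested, MIN_WIDTH)
```

Passing the environment as a parameter keeps the sketch easy to try out without changing the real environment.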
.ta \w'<xxqs_name_sxx_root>/ 'u
\fI<xxqs_name_sxx_root>/<cell>/common/act_qmaster\fP
xxQS_NAMExx master host file
.ta \w'<xxqs_name_sxx_root>/ 'u
\fI<xxqs_name_sxx_root>/<cell>/common/xxqs_name_sxx_qstat\fP
cluster qstat default options
\fI$HOME/.xxqs_name_sxx_qstat\fR
user qstat default options
.M xxqs_name_sxx_intro 1 ,
.M xxqs_name_sxx_execd 8 ,
.M xxqs_name_sxx_qmaster 8 ,
.M xxqs_name_sxx_shepherd 8 .
.M xxqs_name_sxx_intro 1
for a full statement of rights and permissions.