.TH "slurm.conf" "5" "August 2010" "slurm.conf 2.2" "Slurm configuration file"
.SH "NAME"
slurm.conf \- Slurm configuration file
.SH "DESCRIPTION"
\fB/etc/slurm.conf\fP is an ASCII file which describes general SLURM
configuration information, the nodes to be managed, information about
how those nodes are grouped into partitions, and various scheduling
parameters associated with those partitions. This file should be
consistent across all nodes in the cluster.
You can use the \fBSLURM_CONF\fR environment variable to override the built\-in
location of this file. The SLURM daemons also allow you to override
both the built\-in and environment\-provided location using the "\-f"
option on the command line.
Note that while SLURM daemons create log files and other files as needed,
they treat the lack of parent directories as a fatal error.
This prevents the daemons from running if critical file systems are
not mounted and will minimize the risk of cold\-starting (starting
without preserving jobs).
The contents of the file are case insensitive except for the names of nodes
and partitions. Any text following a "#" in the configuration file is treated
as a comment through the end of that line.
The size of each line in the file is limited to 1024 characters.
Changes to the configuration file take effect upon restart of
SLURM daemons, daemon receipt of the SIGHUP signal, or execution
of the command "scontrol reconfigure" unless otherwise noted.
If a line begins with the word "Include" followed by whitespace
and then a file name, that file will be included inline with the current
configuration file.
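For example, a site might keep its node definitions in a separate file and pull them in with a line such as the following (the file name is illustrative):

# hypothetical path; any readable file may be included
Include /etc/slurm/nodes.conf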
Note on file permissions:
The \fIslurm.conf\fR file must be readable by all users of SLURM, since it
is used by many of the SLURM commands. Other files that are defined
in the \fIslurm.conf\fR file, such as log files and job accounting files,
may need to be created/owned by the "SlurmUser" uid to be successfully
accessed. Use the "chown" and "chmod" commands to set the ownership
and permissions appropriately.
See the section \fBFILE AND DIRECTORY PERMISSIONS\fR for information
about the various files and directories used by SLURM.
The overall configuration parameters available include:
\fBAccountingStorageBackupHost\fR
The name of the backup machine hosting the accounting storage database.
If used with the accounting_storage/slurmdbd plugin, this is where the backup
slurmdbd would be running.
Only used for database type storage plugins, ignored otherwise.
\fBAccountingStorageEnforce\fR
This controls what level of association\-based enforcement to impose
on job submissions. Valid options are any combination of
\fIassociations\fR, \fIlimits\fR, \fIqos\fR, and \fIwckeys\fR, or
\fIall\fR for all things. If limits, qos, or wckeys are set,
associations will automatically be set. In addition, if wckeys is
set, TrackWCKey will automatically be set. By enforcing associations,
no new job is allowed to run unless a corresponding association exists
in the system. If limits are enforced, users can be limited by
association to whatever job size or run time limits are defined. With
qos and/or wckeys enforced, jobs will not be scheduled unless a valid
qos and/or workload characterization key is specified. When
\fBAccountingStorageEnforce\fR is changed, a restart of the slurmctld
daemon is required (not just a "scontrol reconfig").
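For example, a site that wants to require valid associations and enforce their limits might use:

# illustrative setting; any combination of the options above is valid
AccountingStorageEnforce=associations,limits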
\fBAccountingStorageHost\fR
The name of the machine hosting the accounting storage database.
Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStorageHost\fR.
\fBAccountingStorageLoc\fR
The fully qualified file name where accounting records are written
when the \fBAccountingStorageType\fR is "accounting_storage/filetxt"
or else the name of the database where accounting records are stored when the
\fBAccountingStorageType\fR is a database.
Also see \fBDefaultStorageLoc\fR.
\fBAccountingStoragePass\fR
The password used to gain access to the database to store the
accounting data. Only used for database type storage plugins, ignored
otherwise. In the case of SLURM DBD (Database Daemon) with MUNGE
authentication, this can be configured to use a MUNGE daemon
specifically configured to provide authentication between clusters
while the default MUNGE daemon provides authentication within a
cluster. In that case, \fBAccountingStoragePass\fR should specify the
named port to be used for communications with the alternate MUNGE
daemon (e.g. "/var/run/munge/global.socket.2"). The default value is
NULL. Also see \fBDefaultStoragePass\fR.
\fBAccountingStoragePort\fR
The listening port of the accounting storage database server.
Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStoragePort\fR.
\fBAccountingStorageType\fR
The accounting storage mechanism type. Acceptable values at
present include "accounting_storage/filetxt",
"accounting_storage/mysql", "accounting_storage/none",
"accounting_storage/pgsql", and "accounting_storage/slurmdbd". The
"accounting_storage/filetxt" value indicates that accounting records
will be written to the file specified by the
\fBAccountingStorageLoc\fR parameter. The "accounting_storage/mysql"
value indicates that accounting records will be written to a MySQL
database specified by the \fBAccountingStorageLoc\fR parameter. The
"accounting_storage/pgsql" value indicates that accounting records
will be written to a PostgreSQL database specified by the
\fBAccountingStorageLoc\fR parameter. The
"accounting_storage/slurmdbd" value indicates that accounting records
will be written to the SLURM DBD, which manages an underlying MySQL or
PostgreSQL database. See "man slurmdbd" for more information. The
default value is "accounting_storage/none" and indicates that account
records are not maintained. Note: the PostgreSQL plugin is not
complete and should not be used if wanting to use associations. It
will, however, work with basic accounting of jobs and job steps. If
interested in completing it, please email slurm-dev@lists.llnl.gov. Also
see \fBDefaultStorageType\fR.
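A typical accounting setup using the SLURM DBD might look like the following sketch (the host name is illustrative):

# assumes a slurmdbd instance is running on host "dbhost"
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=dbhost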
\fBAccountingStorageUser\fR
The user account for accessing the accounting storage database.
Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStorageUser\fR.
\fBAuthType\fR
The authentication method for communications between SLURM
components.
Acceptable values at present include "auth/none", "auth/authd",
and "auth/munge".
The default value is "auth/munge".
"auth/none" includes the UID in each communication, but it is not verified.
This may be fine for testing purposes, but
\fBdo not use "auth/none" if you desire any security\fR.
"auth/authd" indicates that Brett Chun's authd is to be used (see
"http://www.theether.org/authd/" for more information; note that
authd is no longer actively supported).
"auth/munge" indicates that LLNL's MUNGE is to be used
(this is the best supported authentication mechanism for SLURM,
see "http://munge.googlecode.com/" for more information).
All SLURM daemons and commands must be terminated prior to changing
the value of \fBAuthType\fR and later restarted (SLURM jobs can be
preserved).
\fBBackupAddr\fR
The name that \fBBackupController\fR should be referred to in
establishing a communications path. This name will
be used as an argument to the gethostbyname() function for
identification. For example, "elx0000" might be used to designate
the Ethernet address for node "lx0000".
By default the \fBBackupAddr\fR will be identical in value to
\fBBackupController\fR.
\fBBackupController\fR
The name of the machine where SLURM control functions are to be
executed in the event that \fBControlMachine\fR fails. This node
may also be used as a compute server if so desired. It will come into service
as a controller only upon the failure of ControlMachine and will revert
to a "standby" mode when the ControlMachine becomes available once again.
This should be a node name without the full domain name. I.e., the hostname
returned by the \fIgethostname()\fR function cut at the first dot (e.g. use
"tux001" rather than "tux001.my.com").
While not essential, it is recommended that you specify a backup controller.
See the \fBRELOCATING CONTROLLERS\fR section if you change this.
\fBBatchStartTimeout\fR
The maximum time (in seconds) that a batch job is permitted for
launching before being considered missing and releasing the
allocation. The default value is 10 (seconds). Larger values may be
required if more time is required to execute the \fBProlog\fR, load
user environment variables (for Moab spawned jobs), or if the slurmd
daemon gets paged from memory.
\fBCacheGroups\fR
If set to 1, the slurmd daemon will cache /etc/groups entries.
This can improve performance for highly parallel jobs if NIS servers
are used and unable to respond very quickly.
The default value is 0 to disable caching group data.
\fBCheckpointType\fR
The system\-initiated checkpoint method to be used for user jobs.
The slurmctld daemon must be restarted for a change in \fBCheckpointType\fR
to take effect.
Supported values presently include:
\fBcheckpoint/blcr\fR
Berkeley Lab Checkpoint Restart (BLCR)
\fBcheckpoint/none\fR
no checkpoint support (default)
\fBcheckpoint/ompi\fR
OpenMPI (version 1.3 or higher)
\fBcheckpoint/xlch\fR
XLCH (requires that SlurmUser be root)
\fBClusterName\fR
The name by which this SLURM managed cluster is known in the
accounting database. This is needed to distinguish accounting records
when multiple clusters report to the same database.
\fBCompleteWait\fR
The time, in seconds, given for a job to remain in COMPLETING state
before any additional jobs are scheduled.
If set to zero, pending jobs will be started as soon as possible.
Since a COMPLETING job's resources are released for use by other
jobs as soon as the \fBEpilog\fR completes on each individual node,
this can result in very fragmented resource allocations.
To provide jobs with the minimum response time, a value of zero is
recommended (no waiting).
To minimize fragmentation of resources, a value equal to \fBKillWait\fR
plus two is recommended.
In that case, setting \fBKillWait\fR to a small value may be beneficial.
The default value of \fBCompleteWait\fR is zero seconds.
The value may not exceed 65533.
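For example, with the default \fBKillWait\fR of 30 seconds, a site preferring to minimize fragmentation might set:

# KillWait (30) plus two, per the recommendation above
CompleteWait=32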
\fBControlAddr\fR
The name that \fBControlMachine\fR should be referred to in
establishing a communications path. This name will
be used as an argument to the gethostbyname() function for
identification. For example, "elx0000" might be used to designate
the Ethernet address for node "lx0000".
By default the \fBControlAddr\fR will be identical in value to
\fBControlMachine\fR.
\fBControlMachine\fR
The short hostname of the machine where SLURM control functions are
executed (i.e. the name returned by the command "hostname \-s", use
"tux001" rather than "tux001.my.com").
This value must be specified.
In order to support some high availability architectures, multiple
hostnames may be listed with comma separators and one \fBControlAddr\fR
must be specified. The high availability system must ensure that the
slurmctld daemon is running on only one of these hosts at a time.
See the \fBRELOCATING CONTROLLERS\fR section if you change this.
\fBCryptoType\fR
The cryptographic signature tool to be used in the creation of
job step credentials.
The slurmctld daemon must be restarted for a change in \fBCryptoType\fR
to take effect.
Acceptable values at present include "crypto/munge" and "crypto/openssl".
The default value is "crypto/munge".
\fBDebugFlags\fR
Defines specific subsystems which should provide more detailed event logging.
Multiple subsystems can be specified with comma separators.
Valid subsystems available today (with more to come) include:
\fBBackfill\fR
Backfill scheduler details
\fBBGBlockAlgo\fR
BlueGene block selection details
\fBBGBlockAlgoDeep\fR
BlueGene block selection, more details
\fBBGBlockPick\fR
BlueGene block selection for jobs
\fBBGBlockWires\fR
BlueGene block wiring (switch state details)
\fBCPU_Bind\fR
CPU binding details for jobs and steps
\fBGres\fR
Generic resource details
\fBGang\fR
Gang scheduling details
\fBNO_CONF_HASH\fR
Do not log when the slurm.conf file differs between SLURM daemons
\fBReservation\fR
Advanced reservations
\fBSelectType\fR
Resource selection plugin
\fBSteps\fR
Slurmctld resource allocation for job steps
\fBWiki\fR
Sched/wiki and wiki2 communications
\fBDefMemPerCPU\fR
Default real memory size available per allocated CPU in MegaBytes.
Used to avoid over\-subscribing memory and causing paging.
\fBDefMemPerCPU\fR would generally be used if individual processors
are allocated to jobs (\fBSelectType=select/cons_res\fR).
The default value is 0 (unlimited).
Also see \fBDefMemPerNode\fR and \fBMaxMemPerCPU\fR.
\fBDefMemPerCPU\fR and \fBDefMemPerNode\fR are mutually exclusive.
NOTE: Enforcement of memory limits currently requires enabling of
accounting, which samples memory use on a periodic basis (data need
not be stored, just collected).
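For example, a cluster allocating individual processors might default each CPU to 1 GB of memory (value illustrative):

# assumes consumable\-resource scheduling; 1024 MB per allocated CPU
SelectType=select/cons_res
DefMemPerCPU=1024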
\fBDefMemPerNode\fR
Default real memory size available per allocated node in MegaBytes.
Used to avoid over\-subscribing memory and causing paging.
\fBDefMemPerNode\fR would generally be used if whole nodes
are allocated to jobs (\fBSelectType=select/linear\fR) and
resources are shared (\fBShared=yes\fR or \fBShared=force\fR).
The default value is 0 (unlimited).
Also see \fBDefMemPerCPU\fR and \fBMaxMemPerNode\fR.
\fBDefMemPerCPU\fR and \fBDefMemPerNode\fR are mutually exclusive.
NOTE: Enforcement of memory limits currently requires enabling of
accounting, which samples memory use on a periodic basis (data need
not be stored, just collected).
\fBDefaultStorageHost\fR
The default name of the machine hosting the accounting storage and
job completion databases.
Only used for database type storage plugins and when the
\fBAccountingStorageHost\fR and \fBJobCompHost\fR have not been
explicitly defined.
\fBDefaultStorageLoc\fR
The fully qualified file name where accounting records and/or job
completion records are written when the \fBDefaultStorageType\fR is
"filetxt" or the name of the database where accounting records and/or job
completion records are stored when the \fBDefaultStorageType\fR is a
database.
Also see \fBAccountingStorageLoc\fR and \fBJobCompLoc\fR.
\fBDefaultStoragePass\fR
The password used to gain access to the database to store the
accounting and job completion data.
Only used for database type storage plugins, ignored otherwise.
Also see \fBAccountingStoragePass\fR and \fBJobCompPass\fR.
\fBDefaultStoragePort\fR
The listening port of the accounting storage and/or job completion
database server.
Only used for database type storage plugins, ignored otherwise.
Also see \fBAccountingStoragePort\fR and \fBJobCompPort\fR.
\fBDefaultStorageType\fR
The accounting and job completion storage mechanism type. Acceptable
values at present include "filetxt", "mysql", "none", "pgsql", and
"slurmdbd". The value "filetxt" indicates that records will be
written to a file. The value "mysql" indicates that accounting
records will be written to a MySQL database. The default value is
"none", which means that records are not maintained. The value
"pgsql" indicates that records will be written to a PostgreSQL
database. The value "slurmdbd" indicates that records will be written
to the SLURM DBD, which maintains its own database. See "man slurmdbd"
for more information.
Also see \fBAccountingStorageType\fR and \fBJobCompType\fR.
\fBDefaultStorageUser\fR
The user account for accessing the accounting storage and/or job
completion databases.
Only used for database type storage plugins, ignored otherwise.
Also see \fBAccountingStorageUser\fR and \fBJobCompUser\fR.
\fBDisableRootJobs\fR
If set to "YES" then user root will be prevented from running any jobs.
The default value is "NO", meaning user root will be able to execute jobs.
\fBDisableRootJobs\fR may also be set by partition.
\fBEnforcePartLimits\fR
If set to "YES" then jobs which exceed a partition's size and/or time limits
will be rejected at submission time. If set to "NO" then the job will be
accepted and remain queued until the partition limits are altered.
The default value is "NO".
\fBEpilog\fR
Fully qualified pathname of a script to execute as user root on every
node when a user's job completes (e.g. "/usr/local/slurm/epilog"). This may
be used to purge files, disable user login, etc.
By default there is no epilog.
See \fBProlog and Epilog Scripts\fR for more information.
\fBEpilogMsgTime\fR
The number of microseconds the slurmctld daemon requires to process
an epilog completion message from the slurmd daemons. This parameter can
be used to prevent a burst of epilog completion messages from being sent
at the same time, which should help prevent lost messages and improve
throughput for large jobs.
The default value is 2000 microseconds.
For a 1000 node job, this spreads the epilog completion messages out over
two seconds.
\fBEpilogSlurmctld\fR
Fully qualified pathname of a program for the slurmctld to execute
upon termination of a job allocation (e.g.
"/usr/local/slurm/epilog_controller").
The program executes as SlurmUser, which gives it permission to drain
nodes and requeue the job if a failure occurs or cancel the job if appropriate.
The program can be used to reboot nodes or perform other work to prepare
resources for future jobs.
See \fBProlog and Epilog Scripts\fR for more information.
\fBFastSchedule\fR
Controls how a node's configuration specifications in slurm.conf are used.
If the number of node configuration entries in the configuration file
is significantly lower than the number of nodes, setting FastSchedule to
1 will permit much faster scheduling decisions to be made.
(The scheduler can just check the values in a few configuration records
instead of possibly thousands of node records.)
Note that on systems with hyper\-threading, the processor count
reported by the node will be twice the actual processor count.
Consider which value you want to be used for scheduling purposes.
\fB1\fR (default)
Consider the configuration of each node to be that specified in the
slurm.conf configuration file and any node with less than the
configured resources will be set DOWN.
\fB0\fR
Base scheduling decisions upon the actual configuration of each individual
node except that the node's processor count in SLURM's configuration must
match the actual hardware configuration if \fBSchedulerType=sched/gang\fR
or \fBSelectType=select/cons_res\fR are configured (both of those plugins
maintain resource allocation information using bitmaps for the cores in the
system and must remain static, while the node's memory and disk space can
be established later).
\fB2\fR
Consider the configuration of each node to be that specified in the
slurm.conf configuration file and any node with less than the
configured resources will \fBnot\fR be set DOWN.
This can be useful for testing purposes.
\fBFirstJobId\fR
The job id to be used for the first job submitted to SLURM without a
specific requested value. Job id values generated will be incremented by 1
for each subsequent job. This may be used to provide a meta\-scheduler
with a job id space which is disjoint from the interactive jobs.
The default value is 1.
\fBGetEnvTimeout\fR
Used for Moab scheduled jobs only. Controls how long a job should wait
in seconds for loading the user's environment before attempting to
load it from a cache file. Applies when the srun or sbatch
\fI\-\-get\-user\-env\fR option is used. If set to 0 then always load
the user's environment from the cache file.
The default value is 2 seconds.
\fBGresTypes\fR
A comma delimited list of generic resources to be managed.
These generic resources may have an associated plugin available to provide
additional functionality.
No generic resources are managed by default.
Ensure this parameter is consistent across all nodes in the cluster for
proper operation.
The slurmctld daemon must be restarted for changes to this parameter to become
effective.
\fBGroupUpdateForce\fR
If set to a non\-zero value, then information about which users are members
of groups allowed to use a partition will be updated periodically, even when
there have been no changes to the /etc/group file.
Otherwise group member information will be updated periodically only after the
/etc/group file is updated.
The default value is 0.
Also see the \fBGroupUpdateTime\fR parameter.
\fBGroupUpdateTime\fR
Controls how frequently information about which users are members of groups
allowed to use a partition will be updated.
The time interval is given in seconds with a default value of 600 seconds and
a maximum value of 4095 seconds.
A value of zero will prevent periodic updating of group membership information.
Also see the \fBGroupUpdateForce\fR parameter.
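For example, to refresh group membership every five minutes even when /etc/group is unchanged (values illustrative):

# force periodic updates at a 300 second interval
GroupUpdateForce=1
GroupUpdateTime=300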
\fBHealthCheckInterval\fR
The interval in seconds between executions of \fBHealthCheckProgram\fR.
The default value is zero, which disables execution.
\fBHealthCheckProgram\fR
Fully qualified pathname of a script to execute as user root periodically
on all compute nodes that are not in the NOT_RESPONDING state. This may be
used to verify the node is fully operational and DRAIN the node or send email
if a problem is detected.
Any action to be taken must be explicitly performed by the program
(e.g. execute
"scontrol update NodeName=foo State=drain Reason=tmp_file_system_full").
The interval is controlled using the \fBHealthCheckInterval\fR parameter.
Note that the \fBHealthCheckProgram\fR will be executed at the same time
on all nodes to minimize its impact upon parallel programs.
This program will be killed if it does not terminate normally within
60 seconds.
By default, no program will be executed.
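As a sketch, a site\-provided check script could be run every five minutes (the program path is hypothetical):

# path to a locally written health check script
HealthCheckProgram=/usr/local/sbin/health_check
HealthCheckInterval=300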
\fBInactiveLimit\fR
The interval, in seconds, after which a non\-responsive job allocation
command (e.g. \fBsrun\fR or \fBsalloc\fR) will result in the job being
terminated. If the node on which the command is executed fails or the
command abnormally terminates, this will terminate its job allocation.
This option has no effect upon batch jobs.
When setting a value, take into consideration that a debugger using \fBsrun\fR
to launch an application may leave the \fBsrun\fR command in a stopped state
for extended periods of time.
This limit is ignored for jobs running in partitions with the
\fBRootOnly\fR flag set (the scheduler running as root will be
responsible for the job).
The default value is unlimited (zero) and may not exceed 65533 seconds.
\fBJobAcctGatherType\fR
The job accounting mechanism type.
Acceptable values at present include "jobacct_gather/aix" (for the AIX operating
system), "jobacct_gather/linux" (for the Linux operating system) and "jobacct_gather/none"
(no accounting data collected).
The default value is "jobacct_gather/none".
In order to use the \fBsstat\fR tool, "jobacct_gather/aix" or "jobacct_gather/linux"
must be configured.
\fBJobAcctGatherFrequency\fR
The job accounting sampling interval.
For jobacct_gather/none this parameter is ignored.
For jobacct_gather/aix and jobacct_gather/linux the parameter is a number of
seconds between sampling job state.
The default value is 30 seconds.
A value of zero disables the periodic job sampling and provides accounting
information only on job termination (reducing SLURM interference with the job).
Smaller (non\-zero) values have a greater impact upon job performance, but
a value of 30 seconds is not likely to be noticeable for applications having
less than 10,000 tasks.
Users can override this value on a per job basis using the \fB\-\-acctg\-freq\fR
option when submitting the job.
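For example, to gather accounting data on Linux nodes once per minute (interval illustrative):

# sample job state every 60 seconds
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=60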
\fBJobCheckpointDir\fR
Specifies the default directory for storing or reading job checkpoint
information. The data stored here is only a few thousand bytes per job
and includes information needed to resubmit the job request, not the job's
memory image. The directory must be readable and writable by
\fBSlurmUser\fR, but not writable by regular users. The job memory images
may be in a different location, as specified by the \fB\-\-checkpoint\-dir\fR
option at job submit time or scontrol's \fBImageDir\fR option.
\fBJobCompHost\fR
The name of the machine hosting the job completion database.
Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStorageHost\fR.
\fBJobCompLoc\fR
The fully qualified file name where job completion records are written
when the \fBJobCompType\fR is "jobcomp/filetxt" or the database where
job completion records are stored when the \fBJobCompType\fR is a
database.
Also see \fBDefaultStorageLoc\fR.
\fBJobCompPass\fR
The password used to gain access to the database to store the job
completion data.
Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStoragePass\fR.
\fBJobCompPort\fR
The listening port of the job completion database server.
Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStoragePort\fR.
\fBJobCompType\fR
The job completion logging mechanism type.
Acceptable values at present include "jobcomp/none", "jobcomp/filetxt",
"jobcomp/mysql", "jobcomp/pgsql", and "jobcomp/script".
The default value is "jobcomp/none", which means that upon job completion
the record of the job is purged from the system. If using the accounting
infrastructure, this plugin may not be of interest since the information
here is redundant.
The value "jobcomp/filetxt" indicates that a record of the job should be
written to a text file specified by the \fBJobCompLoc\fR parameter.
The value "jobcomp/mysql" indicates that a record of the job should be
written to a MySQL database specified by the \fBJobCompLoc\fR parameter.
The value "jobcomp/pgsql" indicates that a record of the job should be
written to a PostgreSQL database specified by the \fBJobCompLoc\fR parameter.
The value "jobcomp/script" indicates that a script specified by the
\fBJobCompLoc\fR parameter is to be executed with environment variables
indicating the job information.
\fBJobCompUser\fR
The user account for accessing the job completion database.
Only used for database type storage plugins, ignored otherwise.
Also see \fBDefaultStorageUser\fR.
\fBJobCredentialPrivateKey\fR
Fully qualified pathname of a file containing a private key used for
authentication by SLURM daemons.
This parameter is ignored if \fBCryptoType=crypto/munge\fR.
\fBJobCredentialPublicCertificate\fR
Fully qualified pathname of a file containing a public key used for
authentication by SLURM daemons.
This parameter is ignored if \fBCryptoType=crypto/munge\fR.
\fBJobFileAppend\fR
This option controls what to do if a job's output or error file
exists when the job is started.
If \fBJobFileAppend\fR is set to a value of 1, then append to
the existing file.
By default, any existing file is truncated.
\fBJobRequeue\fR
This option controls what to do by default after a node failure.
If \fBJobRequeue\fR is set to a value of 1, then any batch job running
on the failed node will be requeued for execution on different nodes.
If \fBJobRequeue\fR is set to a value of 0, then any job running
on the failed node will be terminated.
Use the \fBsbatch\fR \fI\-\-no\-requeue\fR or \fI\-\-requeue\fR
option to change the default behavior for individual jobs.
The default value is 1.
\fBJobSubmitPlugins\fR
A comma delimited list of job submission plugins to be used.
The specified plugins will be executed in the order listed.
These are intended to be site\-specific plugins which can be used to set
default job parameters and/or logging events.
Sample plugins available in the distribution include "cnode", "defaults",
"logging", "lua", and "partition".
See the SLURM code in "src/plugins/job_submit" and modify the code to satisfy
your needs.
No job submission plugins are used by default.
\fBKillOnBadExit\fR
If set to 1, the job will be terminated immediately when one of the
processes crashes or aborts. With the default value of 0, if one of
the processes crashes or aborts, the other processes will continue
to run.
\fBKillWait\fR
The interval, in seconds, given to a job's processes between the
SIGTERM and SIGKILL signals upon reaching its time limit.
If the job fails to terminate gracefully in the interval specified,
it will be forcibly terminated.
The default value is 30 seconds.
The value may not exceed 65533.
\fBLicenses\fR
Specification of licenses (or other resources available on all
nodes of the cluster) which can be allocated to jobs.
License names can optionally be followed by an asterisk
and count with a default count of one.
Multiple license names should be comma separated (e.g.
"Licenses=foo*4,bar").
Note that SLURM prevents jobs from being scheduled if their
required license specification is not available.
SLURM does not prevent jobs from using licenses that are
not explicitly listed in the job submission specification.
\fBMailProg\fR
Fully qualified pathname to the program used to send email per user request.
The default value is "/bin/mail".
\fBMaxJobCount\fR
The maximum number of jobs SLURM can have in its active database
at one time. Set the values of \fBMaxJobCount\fR and \fBMinJobAge\fR
to ensure the slurmctld daemon does not exhaust its memory or other
resources. Once this limit is reached, requests to submit additional
jobs will fail. The default value is 10000 jobs. This value may not
be reset via "scontrol reconfig". It only takes effect upon restart
of the slurmctld daemon.
\fBMaxMemPerCPU\fR
Maximum real memory size available per allocated CPU in MegaBytes.
Used to avoid over\-subscribing memory and causing paging.
\fBMaxMemPerCPU\fR would generally be used if individual processors
are allocated to jobs (\fBSelectType=select/cons_res\fR).
The default value is 0 (unlimited).
Also see \fBDefMemPerCPU\fR and \fBMaxMemPerNode\fR.
\fBMaxMemPerCPU\fR and \fBMaxMemPerNode\fR are mutually exclusive.
NOTE: Enforcement of memory limits currently requires enabling of
accounting, which samples memory use on a periodic basis (data need
not be stored, just collected).
\fBMaxMemPerNode\fR
Maximum real memory size available per allocated node in MegaBytes.
Used to avoid over\-subscribing memory and causing paging.
\fBMaxMemPerNode\fR would generally be used if whole nodes
are allocated to jobs (\fBSelectType=select/linear\fR) and
resources are shared (\fBShared=yes\fR or \fBShared=force\fR).
The default value is 0 (unlimited).
Also see \fBDefMemPerNode\fR and \fBMaxMemPerCPU\fR.
\fBMaxMemPerCPU\fR and \fBMaxMemPerNode\fR are mutually exclusive.
NOTE: Enforcement of memory limits currently requires enabling of
accounting, which samples memory use on a periodic basis (data need
not be stored, just collected).
\fBMaxTasksPerNode\fR
Maximum number of tasks SLURM will allow a job step to spawn
on a single node. The default \fBMaxTasksPerNode\fR is 128.
\fBMessageTimeout\fR
Time permitted for a round\-trip communication to complete
in seconds. Default value is 10 seconds. For systems with
shared nodes, the slurmd daemon could be paged out and
necessitate higher values.
\fBMinJobAge\fR
The minimum age of a completed job before its record is purged from
SLURM's active database. Set the values of \fBMaxJobCount\fR and
\fBMinJobAge\fR to ensure the slurmctld daemon does not exhaust
its memory or other resources. The default value is 300 seconds.
A value of zero prevents any job record purging.
May not exceed 65533.
\fBMpiDefault\fR
Identifies the default type of MPI to be used.
Srun may override this configuration parameter in any case.
Currently supported versions include:
\fBnone\fR (default, which works for many other versions of MPI) and
\fBopenmpi\fR.
More information about MPI use is available here
<https://computing.llnl.gov/linux/slurm/mpi_guide.html>.
\fBMpiParams\fR
MPI parameters.
Used to identify ports used by OpenMPI only and the input format is
"ports=12000\-12999" to identify a range of communication ports to be used.
\fBOverTimeLimit\fR
Number of minutes by which a job can exceed its time limit before
being canceled.
The configured job time limit is treated as a \fIsoft\fR limit.
Adding \fBOverTimeLimit\fR to the \fIsoft\fR limit provides a \fIhard\fR
limit, at which point the job is canceled.
This is particularly useful for backfill scheduling, which bases its
decisions upon each job's soft time limit.
The default value is zero.
May not exceed 65533 minutes.
A value of "UNLIMITED" is also supported.
\fBPluginDir\fR
Identifies the places in which to look for SLURM plugins.
This is a colon\-separated list of directories, like the PATH
environment variable.
The default value is "/usr/local/lib/slurm".
\fBPlugStackConfig\fR
Location of the config file for SLURM stackable plugins that use
the Stackable Plugin Architecture for Node job (K)control (SPANK).
This provides support for a highly configurable set of plugins to
be called before and/or after execution of each task spawned as
part of a user's job step. Default location is "plugstack.conf"
in the same directory as the system slurm.conf. For more information
on SPANK plugins, see the \fBspank\fR(8) manual.
\fBPreemptMode\fR
Enables gang scheduling and/or controls the mechanism used to preempt
jobs. When the \fBPreemptType\fR parameter is set to enable
preemption, the \fBPreemptMode\fR selects the mechanism used to
preempt the lower priority jobs. The \fBGANG\fR option is used to
enable gang scheduling independent of whether preemption is enabled
(the \fBPreemptType\fR setting). The \fBGANG\fR option can be
specified in addition to a \fBPreemptMode\fR setting with the two
options comma separated. The \fBSUSPEND\fR option requires that gang
scheduling be enabled (i.e., "PreemptMode=SUSPEND,GANG").
\fBOFF\fR
is the default value and disables job preemption and gang scheduling.
This is the only option compatible with \fBSchedulerType=sched/wiki\fR
or \fBSchedulerType=sched/wiki2\fR (used by Maui and Moab respectively,
which provide their own job preemption functionality).
\fBCANCEL\fR
always cancel the job.
\fBCHECKPOINT\fR
preempts jobs by checkpointing them (if possible) or canceling them.
\fBGANG\fR
enables gang scheduling (time slicing) of jobs in the same partition.
\fBREQUEUE\fR
preempts jobs by requeuing them (if possible) or canceling them.
\fBSUSPEND\fR
preempts jobs by suspending them.
A suspended job will resume execution once the high priority job
preempting it completes.
The \fBSUSPEND\fR option may only be used with the \fBGANG\fR option
(the gang scheduler module performs the job resume operation).
\fBPreemptType\fR
This specifies the plugin used to identify which jobs can be
preempted in order to start a pending job.
Acceptable values include:
\fBpreempt/none\fR
Job preemption is disabled.
This is the default.
\fBpreempt/partition_prio\fR
Job preemption is based upon partition priority.
Jobs in higher priority partitions (queues) may preempt jobs from lower
priority partitions.
\fBpreempt/qos\fR
Job preemption rules are specified by Quality Of Service (QOS) specifications
in the SLURM database.
This is not compatible with \fBPreemptMode=OFF\fR or \fBPreemptMode=SUSPEND\fR
(i.e. preempted jobs must be removed from the resources).
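For example, partition\-priority preemption with suspension of the preempted jobs could be configured as:

# SUSPEND requires gang scheduling, hence the GANG option
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG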
\fBPriorityDecayHalfLife\fR
This controls how long prior resource use is considered in determining
how over\- or under\-serviced an association is (user, bank account and
cluster) in determining job priority. If set to 0 no decay will be applied.
This is helpful if you want to enforce hard time limits per association. If
set to 0, \fBPriorityUsageResetPeriod\fR must be set to some interval.
Applicable only if PriorityType=priority/multifactor.
The unit is a time string (i.e. min, hr:min:00, days\-hr:min:00,
or days\-hr). The default value is 7\-0 (7 days).
\fBPriorityCalcPeriod\fR
The period of time in minutes in which the half-life decay will be
re\-calculated.
Applicable only if PriorityType=priority/multifactor.
The default value is 5 (minutes).
\fBPriorityFavorSmall\fR
Specifies that small jobs should be given preferential scheduling priority.
Applicable only if PriorityType=priority/multifactor.
Supported values are "YES" and "NO". The default value is "NO".
\fBPriorityMaxAge\fR
Specifies the job age which will be given the maximum age factor in computing
priority. For example, a value of 30 minutes would result in all jobs over
30 minutes old getting the same age\-based priority.
Applicable only if PriorityType=priority/multifactor.
The unit is a time string (i.e. min, hr:min:00, days\-hr:min:00,
or days\-hr). The default value is 7\-0 (7 days).
\fBPriorityUsageResetPeriod\fR
At this interval the usage of associations will be reset to 0. This is used
if you want to enforce hard limits of time usage per association. If
PriorityDecayHalfLife is set to 0, no decay will happen and this is the
only way to reset the usage accumulated by running jobs. By default this is
turned off, and it is advised to use the PriorityDecayHalfLife option instead
to avoid a situation where nothing is able to run on your cluster. But if your
schema is set up to only allow certain amounts of time on your system, this is
the way to do it.
Applicable only if PriorityType=priority/multifactor.
\fBNONE\fR
Never clear historic usage. The default value.
\fBNOW\fR
Clear the historic usage now.
Executed at startup and reconfiguration time.
\fBDAILY\fR
Cleared every day at midnight.
\fBWEEKLY\fR
Cleared every week on Sunday at time 00:00.
\fBMONTHLY\fR
Cleared on the first day of each month at time 00:00.
\fBQUARTERLY\fR
Cleared on the first day of each quarter at time 00:00.
\fBYEARLY\fR
Cleared on the first day of each year at time 00:00.
\fBPriorityType\fR
This specifies the plugin to be used in establishing a job's scheduling
priority. Supported values are "priority/basic" (jobs are prioritized
by order of arrival, also suitable for sched/wiki and sched/wiki2) and
"priority/multifactor" (jobs are prioritized based upon size, age,
fair\-share of allocation, etc).
The default value is "priority/basic".
\fBPriorityWeightAge\fR
An integer value that sets the degree to which the queue wait time
component contributes to the job's priority.
Applicable only if PriorityType=priority/multifactor.
The default value is 0.
\fBPriorityWeightFairshare\fR
An integer value that sets the degree to which the fair-share
component contributes to the job's priority.
Applicable only if PriorityType=priority/multifactor.
The default value is 0.
\fBPriorityWeightJobSize\fR
An integer value that sets the degree to which the job size
component contributes to the job's priority.
Applicable only if PriorityType=priority/multifactor.
The default value is 0.
\fBPriorityWeightPartition\fR
An integer value that sets the degree to which the node partition
component contributes to the job's priority.
Applicable only if PriorityType=priority/multifactor.
The default value is 0.
\fBPriorityWeightQOS\fR
An integer value that sets the degree to which the Quality Of Service
component contributes to the job's priority.
Applicable only if PriorityType=priority/multifactor.
The default value is 0.
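A minimal multifactor configuration might look like the following sketch (all weights are illustrative and should be tuned per site):

# enable multifactor priorities with a 7 day usage half\-life
PriorityType=priority/multifactor
PriorityDecayHalfLife=7\-0
PriorityWeightAge=1000
PriorityWeightFairshare=10000
PriorityWeightJobSize=1000
PriorityWeightPartition=1000
PriorityWeightQOS=1000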
\fBPrivateData\fR
This controls what type of information is hidden from regular users.
By default, all information is visible to all users.
User \fBSlurmUser\fR and \fBroot\fR can always view all information.
Multiple values may be specified with a comma separator.
Acceptable values include:
\fBaccounts\fR
(NON-SLURMDBD ACCOUNTING ONLY) prevents users from viewing any account
definitions unless they are coordinators of them.
\fBjobs\fR
prevents users from viewing jobs or job steps belonging
to other users. (NON-SLURMDBD ACCOUNTING ONLY) prevents users from viewing
job records belonging to other users unless they are coordinators of
the association running the job when using sacct.
\fBnodes\fR
prevents users from viewing node state information.
\fBpartitions\fR
prevents users from viewing partition state information.
\fBreservations\fR
prevents regular users from viewing reservations.
\fBusage\fR
(NON-SLURMDBD ACCOUNTING ONLY) prevents users from viewing
usage of any other user. This applies to sreport.
\fBusers\fR
(NON-SLURMDBD ACCOUNTING ONLY) prevents users from viewing
information of any user other than themselves; this also means that users can
only see associations they deal with.
Coordinators can see associations of all users they are coordinator of,
but can only see themselves when listing users.
\fBProctrackType\fR
Identifies the plugin to be used for process tracking.
The slurmd daemon uses this mechanism to identify all processes
which are children of processes it spawns for a user job.
The slurmd daemon must be restarted for a change in ProctrackType
to take effect.
NOTE: "proctrack/linuxproc" and "proctrack/pgid" can fail to
identify all processes associated with a job since processes
can become a child of the init process (when the parent process
terminates) or change their process group.
To reliably track all processes, one of the other mechanisms
utilizing kernel modifications is preferable.
NOTE: "proctrack/linuxproc" is not compatible with "switch/elan".
Acceptable values at present include:
\fBproctrack/aix\fR
which uses an AIX kernel extension and is the default for AIX systems
\fBproctrack/cgroup\fR
which uses Linux cgroups to constrain and track processes.
NOTE: see "man cgroup.conf" for configuration details
\fBproctrack/linuxproc\fR
which uses the Linux process tree and parent process IDs
\fBproctrack/rms\fR
which uses a Quadrics kernel patch and is the default if "SwitchType=switch/elan"
\fBproctrack/sgi_job\fR
which uses SGI's Process Aggregates (PAGG) kernel module,
see \fIhttp://oss.sgi.com/projects/pagg/\fR for more information
\fBproctrack/pgid\fR
which uses process group IDs and is the default for all other systems
\fBProlog\fR
Fully qualified pathname of a program for the slurmd to execute
whenever it is asked to run a job step from a new job allocation (e.g.
"/usr/local/slurm/prolog"). The slurmd executes the script before starting
the first job step. This may be used to purge files, enable user login, etc.
By default there is no prolog. Any configured script is expected to
complete execution quickly (in less time than \fBMessageTimeout\fR).
See \fBProlog and Epilog Scripts\fR for more information.
\fBPrologSlurmctld\fR
Fully qualified pathname of a program for the slurmctld to execute
before granting a new job allocation (e.g.
"/usr/local/slurm/prolog_controller").
The program executes as SlurmUser, which gives it permission to drain
nodes and requeue the job if a failure occurs or cancel the job if appropriate.
The program can be used to reboot nodes or perform other work to prepare
resources for future jobs.
While this program is running, the nodes associated with the job will
have a POWER_UP/CONFIGURING flag set in their state, which can be readily
viewed.
A non\-zero exit code will result in the job being requeued (where possible)
or killed.
See \fBProlog and Epilog Scripts\fR for more information.
\fBPropagatePrioProcess\fR
Controls the scheduling priority (nice value) of user spawned tasks.
\fB0\fR
The tasks will inherit the scheduling priority from the slurm daemon.
This is the default value.
\fB1\fR
The tasks will inherit the scheduling priority of the command used to
submit them (e.g. \fBsrun\fR or \fBsbatch\fR).
Unless the job is submitted by user root, the tasks will have a scheduling
priority no higher than the slurm daemon spawning them.
\fB2\fR
The tasks will inherit the scheduling priority of the command used to
submit them (e.g. \fBsrun\fR or \fBsbatch\fR) with the restriction that
their nice value will always be one higher than the slurm daemon (i.e.
the tasks' scheduling priority will be lower than the slurm daemon).
\fBPropagateResourceLimits\fR
A list of comma separated resource limit names.
The slurmd daemon uses these names to obtain the associated (soft) limit
values from the user's process environment on the submit node.
These limits are then propagated and applied to the jobs that
will run on the compute nodes.
This parameter can be useful when system limits vary among nodes.
Any resource limits that do not appear in the list are not propagated.
However, the user can override this by specifying which resource limits
to propagate with the srun command's "\-\-propagate" option.
If neither of the 'propagate resource limit' parameters is specified, then
the default action is to propagate all limits.
Only one of the parameters, either
\fBPropagateResourceLimits\fR or \fBPropagateResourceLimitsExcept\fR,
may be specified.
The following limit names are supported by SLURM (although some
options may not be supported on some systems):
\fBALL\fR
All limits listed below
\fBNONE\fR
No limits listed below
\fBAS\fR
The maximum address space for a process
\fBCORE\fR
The maximum size of core file
\fBCPU\fR
The maximum amount of CPU time
\fBDATA\fR
The maximum size of a process's data segment
\fBFSIZE\fR
The maximum size of files created
\fBMEMLOCK\fR
The maximum size that may be locked into memory
\fBNOFILE\fR
The maximum number of open files
\fBNPROC\fR
The maximum number of processes available
\fBRSS\fR
The maximum resident set size
\fBSTACK\fR
The maximum stack size
\fBPropagateResourceLimitsExcept\fR
A list of comma separated resource limit names.
By default, all resource limits will be propagated (as described by
the \fBPropagateResourceLimits\fR parameter), except for the limits
appearing in this list. The user can override this by specifying which
resource limits to propagate with the srun command's "\-\-propagate" option.
See \fBPropagateResourceLimits\fR above for a list of valid limit names.
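For example, to propagate every limit except the locked\-memory limit:

# MEMLOCK is one of the limit names listed above
PropagateResourceLimitsExcept=MEMLOCK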
\fBResumeProgram\fR
SLURM supports a mechanism to reduce power consumption on nodes that
remain idle for an extended period of time.
This is typically accomplished by reducing voltage and frequency or powering
the node down.
\fBResumeProgram\fR is the program that will be executed when a node
in power save mode is assigned work to perform.
For reasons of reliability, \fBResumeProgram\fR may execute more than once
for a node when the \fBslurmctld\fR daemon crashes and is restarted.
If \fBResumeProgram\fR is unable to restore a node to service, it should
requeue any job associated with the node and set the node state to DRAIN.
The program executes as \fBSlurmUser\fR.
The argument to the program will be the names of nodes to
be removed from power savings mode (using SLURM's hostlist
expression format).
By default no program is run.
Related configuration options include \fBResumeTimeout\fR, \fBResumeRate\fR,
\fBSuspendRate\fR, \fBSuspendTime\fR, \fBSuspendTimeout\fR, \fBSuspendProgram\fR,
\fBSuspendExcNodes\fR, and \fBSuspendExcParts\fR.
More information is available at the SLURM web site
(https://computing.llnl.gov/linux/slurm/power_save.html).
\fBResumeRate\fR
The rate at which nodes in power save mode are returned to normal
operation by \fBResumeProgram\fR.
The value is the number of nodes per minute and it can be used to prevent
power surges if a large number of nodes in power save mode are
assigned work at the same time (e.g. a large job starts).
A value of zero results in no limits being imposed.
The default value is 300 nodes per minute.
Related configuration options include \fBResumeTimeout\fR, \fBResumeProgram\fR,
\fBSuspendRate\fR, \fBSuspendTime\fR, \fBSuspendTimeout\fR, \fBSuspendProgram\fR,
\fBSuspendExcNodes\fR, and \fBSuspendExcParts\fR.
\fBResumeTimeout\fR
Maximum time permitted (in seconds) between when a node resume request
is issued and when the node is actually available for use.
Nodes which fail to respond in this time frame may be marked DOWN and
the jobs scheduled on the node requeued.
The default value is 60 seconds.
Related configuration options include \fBResumeProgram\fR, \fBResumeRate\fR,
\fBSuspendRate\fR, \fBSuspendTime\fR, \fBSuspendTimeout\fR, \fBSuspendProgram\fR,
\fBSuspendExcNodes\fR and \fBSuspendExcParts\fR.
More information is available at the SLURM web site
(https://computing.llnl.gov/linux/slurm/power_save.html).
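A power saving setup might look like the following sketch; the program paths are hypothetical site\-provided scripts and the rates are illustrative:

# suspend nodes idle for 30 minutes; resume up to 100 nodes per minute
SuspendTime=1800
SuspendProgram=/usr/local/sbin/node_suspend
ResumeProgram=/usr/local/sbin/node_resume
ResumeRate=100
ResumeTimeout=120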
\fBResvOverRun\fR
Describes how long a job already running in a reservation should be
permitted to execute after the end time of the reservation has been
reached.
The time period is specified in minutes and the default value is 0
(kill the job immediately).
The value may not exceed 65533 minutes, although a value of "UNLIMITED"
is supported to permit a job to run indefinitely after its reservation
ends.
\fBReturnToService\fR
Controls when a DOWN node will be returned to service.
The default value is 0.
Supported values include:
\fB0\fR
A node will remain in the DOWN state until a system administrator
explicitly changes its state (even if the slurmd daemon registers
and resumes communications).
\fB1\fR
A DOWN node will become available for use upon registration with a
valid configuration only if it was set DOWN due to being non\-responsive.
If the node was set DOWN for any other reason (low memory, prolog failure,
epilog failure, silently rebooting, etc.), its state will not automatically
be changed.
\fB2\fR
A DOWN node will become available for use upon registration with a
valid configuration. The node could have been set DOWN for any reason.
\fBSallocDefaultCommand\fR
Normally, \fBsalloc\fR(1) will run the user's default shell when
a command to execute is not specified on the \fBsalloc\fR command line.
If \fBSallocDefaultCommand\fR is specified, \fBsalloc\fR will instead
run the configured command. The command is passed to '/bin/sh \-c', so
shell metacharacters are allowed, and commands with multiple arguments
should be quoted. For instance:
SallocDefaultCommand = "$SHELL"
would run the shell in the user's $SHELL environment variable.
Likewise,
SallocDefaultCommand = "xterm \-T Job_$SLURM_JOB_ID"
would run \fBxterm\fR with the title set to the SLURM jobid.
\fBSchedulerParameters\fR
The interpretation of this parameter varies by \fBSchedulerType\fR.
Multiple options may be comma separated.
\fBdefault_queue_depth=#\fR
The default number of jobs to attempt scheduling (i.e. the queue depth) when a
running job completes or other routine actions occur. The full queue will be
tested on a less frequent basis. The default value is 100.
In the case of large clusters (more than 1000 nodes), configuring a relatively
small value may be desirable.
\fBdefer\fR
Setting this option will avoid attempting to schedule each job
individually at job submit time, but defer it until a later time when
scheduling multiple jobs simultaneously may be possible.
This option may improve system responsiveness when large numbers of jobs
(many hundreds) are submitted at the same time, but it will delay the
initiation time of individual jobs. Also see \fBdefault_queue_depth\fR above.
\fBinterval=#\fR
The number of seconds between iterations.
Higher values result in less overhead and better responsiveness.
The default value is 30 seconds.
This option applies only to \fBSchedulerType=sched/backfill\fR.
\fBbf_window=#\fR
The number of minutes into the future to look when considering jobs to schedule.
Higher values result in more overhead and less responsiveness.
The default value is 1440 minutes (one day).
This option applies only to \fBSchedulerType=sched/backfill\fR.
\fBmax_job_bf=#\fR
The maximum number of jobs to attempt backfill scheduling for
(i.e. the queue depth).
Higher values result in more overhead and less responsiveness.
Until an attempt is made to backfill schedule a job, its expected
initiation time value will not be set.
The default value is 50.
In the case of large clusters (more than 1000 nodes) configured with
\fBSelectType=select/cons_res\fR, configuring a relatively small value may be
desirable.
This option applies only to \fBSchedulerType=sched/backfill\fR.
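For example, backfill scheduling with deferred handling of submit bursts and a modest queue depth (values illustrative):

# only attempt the first 50 queued jobs on routine scheduling events
SchedulerType=sched/backfill
SchedulerParameters=default_queue_depth=50,defer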
\fBSchedulerPort\fR
The port number on which slurmctld should listen for connection requests.
This value is only used by the Maui Scheduler (see \fBSchedulerType\fR).
The default value is 7321.
\fBSchedulerRootFilter\fR
Identifies whether or not \fBRootOnly\fR partitions should be filtered from
any external scheduling activities. If set to 0, then \fBRootOnly\fR partitions
are treated like any other partition. If set to 1, then \fBRootOnly\fR
partitions are exempt from any external scheduling activities. The
default value is 1. Currently only used by the built\-in backfill
scheduling module "sched/backfill" (see \fBSchedulerType\fR).
\fBSchedulerTimeSlice\fR
Number of seconds in each time slice when gang scheduling is enabled
(\fBPreemptMode=GANG\fR).
The default value is 30 seconds.
\fBSchedulerType\fR
Identifies the type of scheduler to be used.
Note the \fBslurmctld\fR daemon must be restarted for a change in
scheduler type to become effective (reconfiguring a running daemon has
no effect for this parameter).
The \fBscontrol\fR command can be used to manually change job priorities
if desired.
Acceptable values include:
\fBsched/builtin\fR
for the built\-in FIFO (First In First Out) scheduler.
This is the default.
\fBsched/backfill\fR
for a backfill scheduling module to augment the default FIFO scheduling.
Backfill scheduling will initiate lower\-priority jobs if doing
so does not delay the expected initiation time of any higher
priority job.
Effectiveness of backfill scheduling is dependent upon users specifying
job time limits; otherwise all jobs will have the same time limit and
backfilling is impossible.
Note the documentation for the \fBSchedulerParameters\fR option above.
\fBsched/gang\fR
Defunct option. See \fBPreemptType\fR and \fBPreemptMode\fR options.
\fBsched/hold\fR
to hold all newly arriving jobs if a file "/etc/slurm.hold"
exists, otherwise use the built\-in FIFO scheduler
\fBsched/wiki\fR
for the Wiki interface to the Maui Scheduler
\fBsched/wiki2\fR
for the Wiki interface to the Moab Cluster Suite
\fBSelectType\fR
Identifies the type of resource selection algorithm to be used.
Acceptable values include:
\fBselect/linear\fR
for allocation of entire nodes assuming a
one\-dimensional array of nodes in which sequentially ordered
nodes are preferable.
This is the default value for non\-BlueGene systems.
\fBselect/cons_res\fR
The resources within a node are individually allocated as
consumable resources.
Note that whole nodes can be allocated to jobs for selected
partitions by using the \fIShared=Exclusive\fR option.
See the partition \fBShared\fR parameter for more information.
\fBselect/bluegene\fR
for a three\-dimensional BlueGene system.
The default value is "select/bluegene" for BlueGene systems.
\fBSelectTypeParameters\fR
The permitted values of \fBSelectTypeParameters\fR depend upon the
configured value of \fBSelectType\fR.
\fBSelectType=select/bluegene\fR supports no \fBSelectTypeParameters\fR.
The only supported options for \fBSelectType=select/linear\fR are
\fBCR_ONE_TASK_PER_CORE\fR and
\fBCR_Memory\fR, which treats memory as a consumable resource and
prevents memory over subscription with job preemption or gang scheduling.
The following values are supported for \fBSelectType=select/cons_res\fR
(a configuration sketch follows this list):
\fBCR_CPU\fR
CPUs are consumable resources.
There is no notion of sockets, cores or threads;
do not define those values in the node specification. If these
are defined, unexpected results will happen when hyper\-threading
is enabled; Procs= should be used instead.
On a multi\-core system, each core will be considered a CPU.
On a multi\-core and hyper\-threaded system, each thread will be
considered a CPU.
On single\-core systems, each CPU will be considered a CPU.
\fBCR_CPU_Memory\fR
CPUs and memory are consumable resources.
There is no notion of sockets, cores or threads;
do not define those values in the node specification. If these
are defined, unexpected results will happen when hyper\-threading
is enabled; Procs= should be used instead.
Setting a value for \fBDefMemPerCPU\fR is strongly recommended.
\fBCR_Core\fR
Cores are consumable resources.
On nodes with hyper\-threads, each thread is counted as a CPU to
satisfy a job's resource requirement, but multiple jobs are not
allocated threads on the same core.
\fBCR_Core_Memory\fR
Cores and memory are consumable resources.
On nodes with hyper\-threads, each thread is counted as a CPU to
satisfy a job's resource requirement, but multiple jobs are not
allocated threads on the same core.
Setting a value for \fBDefMemPerCPU\fR is strongly recommended.
\fBCR_ONE_TASK_PER_CORE\fR
Allocate one task per core by default.
Without this option, by default one task will be allocated per
thread on nodes with more than one \fBThreadsPerCore\fR configured.
\fBCR_CORE_DEFAULT_DIST_BLOCK\fR
Allocate cores using block distribution by default.
This default behavior can be overridden by specifying a particular
"\-m" parameter with srun/salloc/sbatch.
Without this option, cores will be allocated cyclically across the sockets.
\fBCR_Socket\fR
Sockets are consumable resources.
On nodes with multiple cores, each core or thread is counted as a CPU
to satisfy a job's resource requirement, but multiple jobs are not
allocated resources on the same socket.
Note that jobs requesting one CPU will only be allocated
that one CPU, but no other job will share the socket.
\fBCR_Socket_Memory\fR
Memory and sockets are consumable resources.
On nodes with multiple cores, each core or thread is counted as a CPU
to satisfy a job's resource requirement, but multiple jobs are not
allocated resources on the same socket.
Note that jobs requesting one CPU will only be allocated
that one CPU, but no other job will share the socket.
Setting a value for \fBDefMemPerCPU\fR is strongly recommended.
\fBCR_Memory\fR
Memory is a consumable resource.
NOTE: This implies \fIShared=YES\fR or \fIShared=FORCE\fR for all partitions.
Setting a value for \fBDefMemPerCPU\fR is strongly recommended.
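As an illustrative sketch (the values are examples only, not recommendations),
a cluster treating cores and memory as consumable resources might use:
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
DefMemPerCPU=1024
With these settings, jobs that do not explicitly request memory would be
limited to 1024 MB per allocated CPU.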
\fBSlurmUser\fR
The name of the user that the \fBslurmctld\fR daemon executes as.
For security purposes, a user other than "root" is recommended.
This user must exist on all nodes of the cluster for authentication
of communications between SLURM components.
The default value is "root".
\fBSlurmdUser\fR
The name of the user that the \fBslurmd\fR daemon executes as.
This user must exist on all nodes of the cluster for authentication
of communications between SLURM components.
The default value is "root".
\fBSlurmctldDebug\fR
The level of detail to provide the \fBslurmctld\fR daemon's logs.
Values from 0 to 9 are legal, with `0' being "quiet" operation and `9'
being insanely verbose.
The default value is 3.
\fBSlurmctldLogFile\fR
Fully qualified pathname of a file into which the \fBslurmctld\fR daemon's
logs are written.
The default value is none (performs logging via syslog).
\fBSlurmctldPidFile\fR
Fully qualified pathname of a file into which the \fBslurmctld\fR daemon
may write its process id. This may be used for automated signal processing.
The default value is "/var/run/slurmctld.pid".
\fBSlurmctldPort\fR
The port number that the SLURM controller, \fBslurmctld\fR, listens
to for work. The default value is SLURMCTLD_PORT as established at system
build time. If none is explicitly specified, it will be set to 6817.
\fBSlurmctldPort\fR may also be configured to support a range of port
numbers in order to accept larger bursts of incoming messages by specifying
two numbers separated by a dash (e.g. \fBSlurmctldPort=6817\-6818\fR).
NOTE: Either the \fBslurmctld\fR and \fBslurmd\fR daemons must not
execute on the same nodes, or the values of \fBSlurmctldPort\fR and
\fBSlurmdPort\fR must be different.
\fBSlurmctldTimeout\fR
The interval, in seconds, that the backup controller waits for the
primary controller to respond before assuming control.
The default value is 120 seconds.
May not exceed 65533.
\fBSlurmdDebug\fR
The level of detail to provide the \fBslurmd\fR daemon's logs.
Values from 0 to 9 are legal, with `0' being "quiet" operation and `9' being
insanely verbose.
The default value is 3.
\fBSlurmdLogFile\fR
Fully qualified pathname of a file into which the \fBslurmd\fR daemon's
logs are written.
The default value is none (performs logging via syslog).
Any "%h" within the name is replaced with the hostname on which the
\fBslurmd\fR is running.
\fBSlurmdPidFile\fR
Fully qualified pathname of a file into which the \fBslurmd\fR daemon may write
its process id. This may be used for automated signal processing.
The default value is "/var/run/slurmd.pid".
\fBSlurmdPort\fR
The port number that the SLURM compute node daemon, \fBslurmd\fR, listens
to for work. The default value is SLURMD_PORT as established at system
build time. If none is explicitly specified, its value will be 6818.
NOTE: Either the slurmctld and slurmd daemons must not execute
on the same nodes, or the values of \fBSlurmctldPort\fR and \fBSlurmdPort\fR
must be different.
\fBSlurmdSpoolDir\fR
Fully qualified pathname of a directory into which the \fBslurmd\fR
daemon's state information and batch job script information are written. This
must be a common pathname for all nodes, but should represent a directory which
is local to each node (reference a local file system). The default value
is "/var/spool/slurmd". \fBNOTE\fR: This directory is also used to store the
shared memory lockfile, and \fBshould not be changed\fR unless the system
is being cleanly restarted. If the location of \fBSlurmdSpoolDir\fR is
changed and \fBslurmd\fR is restarted, the new daemon will attach to a
different shared memory region and lose track of any running jobs.
\fBSlurmdTimeout\fR
The interval, in seconds, that the SLURM controller waits for \fBslurmd\fR
to respond before configuring that node's state to DOWN.
A value of zero indicates the node will not be tested by \fBslurmctld\fR to
confirm the state of \fBslurmd\fR, the node will not be automatically set to
a DOWN state indicating a non\-responsive \fBslurmd\fR, and some other tool
will take responsibility for monitoring the state of each compute node
and its \fBslurmd\fR daemon.
SLURM's hierarchical communication mechanism is used to ping the \fBslurmd\fR
daemons in order to minimize system noise and overhead.
The default value is 300 seconds.
The value may not exceed 65533 seconds.
\fBSlurmSchedLogFile\fR
Fully qualified pathname of the scheduling event logging file.
The syntax of this parameter is the same as for \fBSlurmctldLogFile\fR.
In order to configure scheduler logging, set both the \fBSlurmSchedLogFile\fR
and \fBSlurmSchedLogLevel\fR parameters.
\fBSlurmSchedLogLevel\fR
The initial level of scheduling event logging, similar to the
\fBSlurmctldDebug\fR parameter used to control the initial level of
\fBslurmctld\fR logging.
Valid values for \fBSlurmSchedLogLevel\fR are "0" (scheduler logging
disabled) and "1" (scheduler logging enabled).
If this parameter is omitted, the value defaults to "0" (disabled).
In order to configure scheduler logging, set both the \fBSlurmSchedLogFile\fR
and \fBSlurmSchedLogLevel\fR parameters.
The scheduler logging level can be changed dynamically using \fBscontrol\fR.
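For example, scheduler logging could be enabled as follows (the log
path shown is hypothetical):
SlurmSchedLogFile=/var/log/slurm/sched.log
SlurmSchedLogLevel=1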
\fBSrunEpilog\fR
Fully qualified pathname of an executable to be run by srun following
the completion of a job step. The command line arguments for the
executable will be the command and arguments of the job step. This
configuration parameter may be overridden by srun's \fB\-\-epilog\fR
parameter. Note that while the other "Epilog" executables (e.g.,
TaskEpilog) are run by slurmd on the compute nodes where the tasks are
executed, the \fBSrunEpilog\fR runs on the node where the "srun" is
executing.
\fBSrunProlog\fR
Fully qualified pathname of an executable to be run by srun prior to
the launch of a job step. The command line arguments for the
executable will be the command and arguments of the job step. This
configuration parameter may be overridden by srun's \fB\-\-prolog\fR
parameter. Note that while the other "Prolog" executables (e.g.,
TaskProlog) are run by slurmd on the compute nodes where the tasks are
executed, the \fBSrunProlog\fR runs on the node where the "srun" is
executing.
\fBStateSaveLocation\fR
Fully qualified pathname of a directory into which the SLURM controller,
\fBslurmctld\fR, saves its state (e.g. "/usr/local/slurm/checkpoint").
SLURM state will be saved here to recover from system failures.
\fBSlurmUser\fR must be able to create files in this directory.
If you have a \fBBackupController\fR configured, this location should be
readable and writable by both systems.
Since all running and pending job information is stored here, the use of
a reliable file system (e.g. RAID) is recommended.
The default value is "/tmp".
If any SLURM daemons terminate abnormally, their core files will also be written
into this directory.
\fBSuspendExcNodes\fR
Specifies the nodes which are not to be placed in power save mode, even
if the node remains idle for an extended period of time.
Use SLURM's hostlist expression to identify nodes.
By default no nodes are excluded.
Related configuration options include \fBResumeTimeout\fR, \fBResumeProgram\fR,
\fBResumeRate\fR, \fBSuspendProgram\fR, \fBSuspendRate\fR, \fBSuspendTime\fR,
\fBSuspendTimeout\fR, and \fBSuspendExcParts\fR.
\fBSuspendExcParts\fR
Specifies the partitions whose nodes are not to be placed in power save
mode, even if the node remains idle for an extended period of time.
Multiple partitions can be identified and separated by commas.
By default no nodes are excluded.
Related configuration options include \fBResumeTimeout\fR, \fBResumeProgram\fR,
\fBResumeRate\fR, \fBSuspendProgram\fR, \fBSuspendRate\fR, \fBSuspendTime\fR,
\fBSuspendTimeout\fR, and \fBSuspendExcNodes\fR.
\fBSuspendProgram\fR
\fBSuspendProgram\fR is the program that will be executed when a node
remains idle for an extended period of time.
This program is expected to place the node into some power save mode.
This can be used to reduce the frequency and voltage of a node or
completely power the node off.
The program executes as \fBSlurmUser\fR.
The argument to the program will be the names of the nodes to
be placed into power savings mode (using SLURM's hostlist
expression format).
By default, no program is run.
Related configuration options include \fBResumeTimeout\fR, \fBResumeProgram\fR,
\fBResumeRate\fR, \fBSuspendRate\fR, \fBSuspendTime\fR, \fBSuspendTimeout\fR,
\fBSuspendExcNodes\fR, and \fBSuspendExcParts\fR.
\fBSuspendRate\fR
The rate at which nodes are placed into power save mode by \fBSuspendProgram\fR.
The value is the number of nodes per minute and it can be used to prevent
a large drop in power consumption (e.g. after a large job completes).
A value of zero results in no limits being imposed.
The default value is 60 nodes per minute.
Related configuration options include \fBResumeTimeout\fR, \fBResumeProgram\fR,
\fBResumeRate\fR, \fBSuspendProgram\fR, \fBSuspendTime\fR, \fBSuspendTimeout\fR,
\fBSuspendExcNodes\fR, and \fBSuspendExcParts\fR.
\fBSuspendTime\fR
Nodes which remain idle for this number of seconds will be placed into
power save mode by \fBSuspendProgram\fR.
A value of \-1 disables power save mode and is the default.
Related configuration options include \fBResumeTimeout\fR, \fBResumeProgram\fR,
\fBResumeRate\fR, \fBSuspendProgram\fR, \fBSuspendRate\fR, \fBSuspendTimeout\fR,
\fBSuspendExcNodes\fR, and \fBSuspendExcParts\fR.
\fBSuspendTimeout\fR
Maximum time permitted (in seconds) between when a node suspend request
is issued and when the node is shut down.
At that time the node must be ready for a resume request to be issued
as needed for new work.
The default value is 30 seconds.
Related configuration options include \fBResumeProgram\fR, \fBResumeRate\fR,
\fBResumeTimeout\fR, \fBSuspendRate\fR, \fBSuspendTime\fR, \fBSuspendProgram\fR,
\fBSuspendExcNodes\fR and \fBSuspendExcParts\fR.
More information is available at the SLURM web site
(https://computing.llnl.gov/linux/slurm/power_save.html).
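Putting the power save options together, a minimal sketch (the program
paths and values are hypothetical, not recommendations):
SuspendTime=1800
SuspendRate=20
SuspendTimeout=30
SuspendProgram=/usr/local/sbin/slurm_suspend
ResumeProgram=/usr/local/sbin/slurm_resume
SuspendExcNodes=dev[0\-1]
Nodes idle for 30 minutes would then be suspended, at most 20 per minute,
and dev0 and dev1 would never be suspended.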
\fBSwitchType\fR
Identifies the type of switch or interconnect used for application
communications.
Acceptable values include
"switch/none" for switches not requiring special processing for job launch
or termination (Myrinet, Ethernet, and InfiniBand), and
"switch/elan" for the Quadrics Elan 3 or Elan 4 interconnect.
The default value is "switch/none".
All SLURM daemons, commands and running jobs must be restarted for a
change in \fBSwitchType\fR to take effect.
If running jobs exist at the time \fBslurmctld\fR is restarted with a new
value of \fBSwitchType\fR, records of all jobs in any state may be lost.
\fBTaskEpilog\fR
Fully qualified pathname of a program to be executed as the slurm job's
owner after termination of each task.
See \fBTaskProlog\fR for execution order details.
\fBTaskPlugin\fR
Identifies the type of task launch plugin, typically used to provide
resource management within a node (e.g. pinning tasks to specific
processors).
Acceptable values include
"task/none" for systems requiring no special handling and
"task/affinity" to enable the \-\-cpu_bind and/or \-\-mem_bind
srun options.
The default value is "task/none".
If you use "task/affinity" and encounter problems, it may be due to
the variety of system calls used to implement task affinity on
different operating systems.
If that is the case, you may want to use Portable Linux
Process Affinity (PLPA, see http://www.open-mpi.org/software/plpa),
which is supported by SLURM.
\fBTaskPluginParam\fR
Optional parameters for the task plugin.
Multiple options should be comma separated.
If \fBNone\fR, \fBSockets\fR, \fBCores\fR, \fBThreads\fR,
and/or \fBVerbose\fR are specified, they will override
the \fB\-\-cpu_bind\fR option specified by the user
in the \fBsrun\fR command.
\fBNone\fR, \fBSockets\fR, \fBCores\fR and \fBThreads\fR are mutually
exclusive and since they decrease scheduling flexibility are not generally
recommended (select no more than one of them).
\fBCpusets\fR and \fBSched\fR
are mutually exclusive (select only one of them).
A configuration sketch follows this list.
\fBCores\fR
Always bind to cores.
Overrides user options or automatic binding.
\fBCpusets\fR
Use cpusets to perform task affinity functions.
By default, \fBSched\fR task binding is performed.
\fBNone\fR
Perform no task binding.
Overrides user options or automatic binding.
\fBSched\fR
Use \fIsched_setaffinity\fR or \fIplpa_sched_setaffinity\fR
(if available) to bind tasks to processors.
\fBSockets\fR
Always bind to sockets.
Overrides user options or automatic binding.
\fBThreads\fR
Always bind to threads.
Overrides user options or automatic binding.
\fBVerbose\fR
Verbosely report binding before tasks run.
Overrides user options.
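For example, to bind tasks using sched_setaffinity and report the binding
verbosely (a sketch of one possible combination):
TaskPlugin=task/affinity
TaskPluginParam=Sched,Verbose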
\fBTaskProlog\fR
Fully qualified pathname of a program to be executed as the slurm job's
owner prior to initiation of each task.
Besides the normal environment variables, this has SLURM_TASK_PID
available to identify the process ID of the task being started.
Standard output from this program can be used to control the environment
variables and output for the user program (a sample script follows the
list of commands below).
\fBexport NAME=value\fR
Will set environment variables for the task being spawned.
Everything after the equal sign to the end of the
line will be used as the value for the environment variable.
Exporting of functions is not currently supported.
\fBprint ...\fR
Will cause that line (without the leading "print ")
to be printed to the job's standard output.
\fBunset NAME\fR
Will clear environment variables for the task being spawned.
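A minimal TaskProlog sketch using these commands (the variable names and
message are illustrative):
#!/bin/sh
# Runs as the job's owner before each task; its standard output is
# interpreted by slurmd as described above.
echo "export MY_SCRATCH=/tmp/$SLURM_TASK_PID"
echo "print task $SLURM_TASK_PID starting"
echo "unset UNWANTED_VAR"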
The order of task prolog/epilog execution is as follows:
\fB1. pre_launch()\fR
Function in TaskPlugin
\fB2. TaskProlog\fR
System\-wide per task program defined in slurm.conf
\fB3. user prolog\fR
Job step specific task program defined using
\fBsrun\fR's \fB\-\-task\-prolog\fR option or \fBSLURM_TASK_PROLOG\fR
environment variable
\fB4.\fR Execute the job step's task
\fB5. user epilog\fR
Job step specific task program defined using
\fBsrun\fR's \fB\-\-task\-epilog\fR option or \fBSLURM_TASK_EPILOG\fR
environment variable
\fB6. TaskEpilog\fR
System\-wide per task program defined in slurm.conf
\fB7. post_term()\fR
Function in TaskPlugin
\fBTmpFS\fR
Fully qualified pathname of the file system available to user jobs for
temporary storage. This parameter is used in establishing a node's \fBTmpDisk\fR
space.
The default value is "/tmp".
\fBTopologyPlugin\fR
Identifies the plugin to be used for determining the network topology
and optimizing job allocations to minimize network contention.
See \fBNETWORK TOPOLOGY\fR below for details.
Additional plugins may be provided in the future which gather topology
information directly from the network.
Acceptable values include:
\fBtopology/3d_torus\fR
default for Sun Constellation
systems, best\-fit logic over three\-dimensional topology
\fBtopology/node_rank\fR
default for Cray computers, orders nodes based upon information in the
ALPS database and then performs a best\-fit algorithm over those nodes
\fBtopology/none\fR
default for other systems, best\-fit logic over one\-dimensional topology
\fBtopology/tree\fR
used for a hierarchical network as described in a \fItopology.conf\fR file
\fBTrackWCKey\fR
Boolean yes or no. Used to enable the display and tracking of the Workload
Characterization Key. Must be set to track wckey usage.
\fBTreeWidth\fR
\fBSlurmd\fR daemons use a virtual tree network for communications.
\fBTreeWidth\fR specifies the width of the tree (i.e. the fanout).
The default value is 50, meaning each slurmd daemon can communicate
with up to 50 other slurmd daemons and over 2500 nodes can be contacted
with two message hops.
The default value will work well for most clusters.
Optimal system performance can typically be achieved if \fBTreeWidth\fR
is set to the square root of the number of nodes in the cluster for
systems having no more than 2500 nodes or the cube root for larger
systems.
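For example, on a 900\-node cluster the square root rule suggests
TreeWidth=30, which reaches every slurmd daemon within two message hops
(30 x 30 = 900).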
\fBUnkillableStepProgram\fR
If the processes in a job step are determined to be unkillable for a period
of time specified by the \fBUnkillableStepTimeout\fR variable, the program
specified by \fBUnkillableStepProgram\fR will be executed.
This program can be used to take special actions to clean up the unkillable
processes and/or notify computer administrators.
The program will be run as \fBSlurmdUser\fR (usually "root").
By default no program is run.
\fBUnkillableStepTimeout\fR
The length of time, in seconds, that SLURM will wait before deciding that
processes in a job step are unkillable (after they have been signaled with
SIGKILL) and execute \fBUnkillableStepProgram\fR as described above.
The default timeout value is 60 seconds.
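For example, with a hypothetical notification script and a longer timeout:
UnkillableStepProgram=/usr/local/sbin/notify_unkillable
UnkillableStepTimeout=120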
\fBUsePAM\fR
If set to 1, PAM (Pluggable Authentication Modules for Linux) will be enabled.
PAM is used to establish the upper bounds for resource limits. With PAM support
enabled, local system administrators can dynamically configure system resource
limits. Changing the upper bound of a resource limit will not alter the limits
of running jobs, only jobs started after a change has been made will pick up
the new limits.
The default value is 0 (not to enable PAM support).
Remember that PAM also needs to be configured to support SLURM as a service.
For sites using PAM's directory based configuration option, a configuration
file named \fBslurm\fR should be created. The module\-type, control\-flags, and
module\-path names that should be included in the file are:
auth required pam_localuser.so
auth required pam_shells.so
account required pam_unix.so
account required pam_access.so
session required pam_unix.so
For sites configuring PAM with a general configuration file, the appropriate
lines (see above), where \fBslurm\fR is the service\-name, should be added.
\fBVSizeFactor\fR
Memory specifications in job requests apply to real memory size (also known
as resident set size). It is possible to enforce virtual memory limits for
both jobs and job steps by limiting their virtual memory to some percentage
of their real memory allocation. The \fBVSizeFactor\fR parameter specifies
the job's or job step's virtual memory limit as a percentage of its real
memory limit. For example, if a job's real memory limit is 500MB and
VSizeFactor is set to 101 then the job will be killed if its real memory
exceeds 500MB or its virtual memory exceeds 505MB (101 percent of the
real memory limit).
The default value is 0, which disables enforcement of virtual memory limits.
The value may not exceed 65533 percent.
\fBWaitTime\fR
Specifies how many seconds the srun command should by default wait after
the first task terminates before terminating all remaining tasks. The
"\-\-wait" option on the srun command line overrides this value.
If set to 0, this feature is disabled.
May not exceed 65533 seconds.
The configuration of nodes (or machines) to be managed by SLURM is
also specified in \fB/etc/slurm.conf\fR.
Changes in node configuration (e.g. adding nodes, changing their
processor count, etc.) require restarting the slurmctld daemon.
Only the NodeName must be supplied in the configuration file.
All other node configuration information is optional.
It is advisable to establish baseline node configurations,
especially if the cluster is heterogeneous.
Nodes which register to the system with less than the configured resources
(e.g. too little memory), will be placed in the "DOWN" state to
avoid scheduling jobs on them.
Establishing baseline configurations will also speed SLURM's
scheduling process by permitting it to compare job requirements
against these (relatively few) configuration parameters and
possibly avoid having to check job requirements
against every individual node's configuration.
The resources checked at node registration time are: Procs,
RealMemory and TmpDisk.
While baseline values for each of these can be established
in the configuration file, the actual values upon node
registration are recorded and these actual values may be
used for scheduling purposes (depending upon the value of
\fBFastSchedule\fR in the configuration file).
Default values can be specified with a record in which
"NodeName" is "DEFAULT".
The default entry values will apply only to lines following it in the
configuration file and the default values can be reset multiple times
in the configuration file with multiple entries where "NodeName=DEFAULT".
The "NodeName=" specification must be placed on every line
describing the configuration of nodes.
In fact, it is generally possible and desirable to define the
configurations of all nodes in only a few lines.
This convention permits significant optimization in the scheduling
algorithm.
In order to support the concept of jobs requiring consecutive nodes
on some architectures,
node specifications should be placed in this file in consecutive order.
No single node name may be listed more than once in the configuration
file.
Use "DownNodes=" to record the state of nodes which are temporarily
in a DOWN, DRAIN or FAILING state without altering permanent
configuration information.
A job step's tasks are allocated to nodes in the order the nodes appear
in the configuration file. There is presently no capability within
SLURM to arbitrarily order a job step's tasks.
Multiple node names may be comma separated (e.g. "alpha,beta,gamma")
and/or a simple node range expression may optionally be used to
specify numeric ranges of nodes to avoid building a configuration
file with large numbers of entries.
The node range expression can contain one pair of square brackets
with a sequence of comma separated numbers and/or ranges of numbers
separated by a "\-" (e.g. "linux[0\-64,128]", or "lx[15,18,32\-33]").
Note that the numeric ranges can include one or more leading
zeros to indicate the numeric portion has a fixed number of digits
(e.g. "linux[0000\-1023]").
Up to two numeric ranges can be included in the expression
(e.g. "rack[0\-63]_blade[0\-41]").
If one or more numeric expressions are included, one of them
must be at the end of the name (e.g. "unit[0\-31]rack" is invalid),
but arbitrary names can always be used in a comma separated list.
On BlueGene systems only, the square brackets should contain
pairs of three digit numbers separated by a "x".
These numbers indicate the boundaries of a rectangular prism
(e.g. "bgl[000x144,400x544]").
See BlueGene documentation for more details.
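For example, a single line such as the following (a sketch; the name and
values are illustrative) defines 128 nodes:
NodeName=tux[000\-127] Procs=8 RealMemory=16384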
The node configuration specifies the following information:
\fBNodeName\fR
Name that SLURM uses to refer to a node (or base partition for
BlueGene systems).
Typically this would be the string that "/bin/hostname \-s" returns.
It may also be the fully qualified domain name as returned by "/bin/hostname \-f"
(e.g. "foo1.bar.com"), or any valid domain name associated with the host
through the host database (/etc/hosts) or DNS, depending on the resolver
settings. Note that if the short form of the hostname is not used, it
may prevent use of hostlist expressions (the numeric portion in brackets
must be at the end of the string).
Only short hostname forms are compatible with the
switch/elan and switch/federation plugins at this time.
It may also be an arbitrary string if \fBNodeHostname\fR is specified.
If the \fBNodeName\fR is "DEFAULT", the values specified
with that record will apply to subsequent node specifications
unless explicitly set to other values in that node record or
replaced with a different set of default values.
For architectures in which the node order is significant,
nodes will be considered consecutive in the order defined.
For example, if the configuration for "NodeName=charlie" immediately
follows the configuration for "NodeName=baker" they will be
considered adjacent in the computer.
\fBNodeHostname\fR
Typically this would be the string that "/bin/hostname \-s" returns.
It may also be the fully qualified domain name as returned by "/bin/hostname \-f"
(e.g. "foo1.bar.com"), or any valid domain name associated with the host
through the host database (/etc/hosts) or DNS, depending on the resolver
settings. Note that if the short form of the hostname is not used, it
may prevent use of hostlist expressions (the numeric portion in brackets
must be at the end of the string).
Only short hostname forms are compatible with the
switch/elan and switch/federation plugins at this time.
A node range expression can be used to specify a set of nodes.
If an expression is used, the number of nodes identified by
\fBNodeHostname\fR on a line in the configuration file must
be identical to the number of nodes identified by \fBNodeName\fR.
By default, the \fBNodeHostname\fR will be identical in value to
\fBNodeName\fR.
\fBNodeAddr\fR
Name by which a node should be referred to in establishing
a communications path.
This name will be used as an
argument to the gethostbyname() function for identification.
If a node range expression is used to designate multiple nodes,
they must exactly match the entries in the \fBNodeName\fR
(e.g. "NodeName=lx[0\-7] NodeAddr=elx[0\-7]").
\fBNodeAddr\fR may also contain IP addresses.
By default, the \fBNodeAddr\fR will be identical in value to
\fBNodeName\fR.
\fBCoresPerSocket\fR
Number of cores in a single physical processor socket (e.g. "2").
The CoresPerSocket value describes physical cores, not the
logical number of processors per socket.
\fBNOTE\fR: If you have multi\-core processors, you will likely
need to specify this parameter in order to optimize scheduling.
The default value is 1.
\fBFeature\fR
A comma delimited list of arbitrary strings indicative of some
characteristic associated with the node.
There is no value associated with a feature at this time, a node
either has a feature or it does not.
If desired a feature may contain a numeric component indicating,
for example, processor speed.
By default a node has no features.
Also see \fBGres\fR.
\fBGres\fR
A comma delimited list of generic resources specifications for a node.
Each resource specification consists of a name followed by an optional
colon with a numeric value (default value is one)
(e.g. "Gres=bandwidth:10000,gpus:2").
A suffix of "K", "M" or "G" may be used to multiply the number by 1024,
1048576 or 1073741824 respectively (e.g. "Gres=bandwidth:4G,gpus:4").
By default a node has no generic resources.
Also see \fBFeature\fR.
\fBPort\fR
The port number that the SLURM compute node daemon, \fBslurmd\fR, listens
to for work on this particular node. By default there is a single port number
for all \fBslurmd\fR daemons on all compute nodes as defined by the
\fBSlurmdPort\fR configuration parameter. Use of this option is not generally
recommended except for development or testing purposes.
\fBProcs\fR
Number of logical processors on the node (e.g. "2").
If \fBProcs\fR is omitted, it will be set equal to the product of
\fBSockets\fR, \fBCoresPerSocket\fR, and \fBThreadsPerCore\fR.
The default value is 1.
\fBRealMemory\fR
Size of real memory on the node in MegaBytes (e.g. "2048").
The default value is 1.
\fBReason\fR
Identifies the reason for a node being in state "DOWN", "DRAINED",
"DRAINING", "FAIL" or "FAILING".
Use quotes to enclose a reason having more than one word.
\fBSockets\fR
Number of physical processor sockets/chips on the node (e.g. "2").
If Sockets is omitted, it will be inferred from
\fBProcs\fR, \fBCoresPerSocket\fR, and \fBThreadsPerCore\fR.
\fBNOTE\fR: If you have multi\-core processors, you will likely
need to specify these parameters.
The default value is 1.
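For example, a node with two sockets, four cores per socket and two threads
per core could be described as follows (a sketch; the name is illustrative):
NodeName=node01 Sockets=2 CoresPerSocket=4 ThreadsPerCore=2 Procs=16
Here Procs matches the product 2 x 4 x 2 = 16 and could equally have been
omitted.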
\fBState\fR
State of the node with respect to the initiation of user jobs.
Acceptable values are "DOWN", "DRAIN", "FAIL", "FAILING" and "UNKNOWN".
"DOWN" indicates the node failed and is unavailable to be allocated work.
"DRAIN" indicates the node is unavailable to be allocated work.
"FAIL" indicates the node is expected to fail soon, has
no jobs allocated to it, and will not be allocated
any new jobs.
"FAILING" indicates the node is expected to fail soon, has
one or more jobs allocated to it, but will not be allocated
any new jobs.
"UNKNOWN" indicates the node's state is undefined (BUSY or IDLE),
but will be established when the \fBslurmd\fR daemon on that node
registers.
The default value is "UNKNOWN".
Also see the \fBDownNodes\fR parameter below.
\fBThreadsPerCore\fR
Number of logical threads in a single physical core (e.g. "2").
Note that SLURM can allocate resources to jobs down to the
resolution of a core. If your system is configured with more than
one thread per core, execution of a different job on each thread
is not supported unless you configure \fBSelectTypeParameters=CR_CPU\fR
plus \fBProcs\fR; do not configure \fBSockets\fR, \fBCoresPerSocket\fR or
\fBThreadsPerCore\fR.
A job can execute one task per thread from within one job step or
execute a distinct job step on each of the threads.
Note also that if you are running with more than one thread per core and
using the select/cons_res plugin, you will want to set the SelectTypeParameters
variable to something other than CR_CPU to avoid unexpected results.
The default value is 1.
\fBTmpDisk\fR
Total size of temporary disk storage in \fBTmpFS\fR in MegaBytes
(e.g. "16384"). \fBTmpFS\fR (for "Temporary File System")
identifies the location which jobs should use for temporary storage.
Note this does not indicate the amount of free
space available to the user on the node, only the total file
system size. The system administrator should ensure this file
system is purged as needed so that user jobs have access to
most of this space.
The Prolog and/or Epilog programs (specified in the configuration file)
might be used to ensure the file system is kept clean.
The default value is 0.
\fBWeight\fR
The priority of the node for scheduling purposes.
All things being equal, jobs will be allocated the nodes with
the lowest weight which satisfies their requirements.
For example, a heterogeneous collection of nodes might
be placed into a single partition for greater system
utilization, responsiveness and capability. It would be
preferable to allocate smaller memory nodes rather than larger
memory nodes if either will satisfy a job's requirements.
The units of weight are arbitrary, but larger weights
should be assigned to nodes with more processors, memory,
disk space, higher processor speed, etc.
Note that if a job allocation request can not be satisfied
using the nodes with the lowest weight, the set of nodes
with the next lowest weight is added to the set of nodes
under consideration for use (repeat as needed for higher
weight values). If you absolutely want to minimize the number
of higher weight nodes allocated to a job (at a cost of higher
scheduling overhead), give each node a distinct \fBWeight\fR
value and they will be added to the pool of nodes being
considered for scheduling individually.
The default value is 1.
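For example, to favor small memory nodes over large ones (a sketch; the
names and values are illustrative):
NodeName=small[0\-31] RealMemory=2048 Weight=10
NodeName=big[0\-7] RealMemory=16384 Weight=100
Jobs satisfiable by the small nodes would be allocated there first,
preserving the large memory nodes for jobs that need them.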
The "DownNodes=" configuration permits you to mark certain nodes as in a
DOWN, DRAIN, FAIL, or FAILING state without altering the permanent
configuration information listed under a "NodeName=" specification.
\fBDownNodes\fR
Any node name, or list of node names, from the "NodeName=" specifications.
\fBReason\fR
Identifies the reason for a node being in state "DOWN", "DRAIN",
"FAIL" or "FAILING".
Use quotes to enclose a reason having more than one word.
\fBState\fR
State of the node with respect to the initiation of user jobs.
Acceptable values are "BUSY", "DOWN", "DRAIN", "FAIL",
"FAILING", "IDLE", and "UNKNOWN".
\fBDOWN\fR
Indicates the node failed and is unavailable to be allocated work.
\fBDRAIN\fR
Indicates the node is unavailable to be allocated work.
\fBFAIL\fR
Indicates the node is expected to fail soon, has
no jobs allocated to it, and will not be allocated
any new jobs.
\fBFAILING\fR
Indicates the node is expected to fail soon, has
one or more jobs allocated to it, but will not be allocated
any new jobs.
\fBFUTURE\fR
Indicates the node is defined for future use and need not
exist when the SLURM daemons are started. These nodes can be made available
for use simply by updating the node state using the scontrol command rather
than restarting the slurmctld daemon. After these nodes are made available,
change their \fBState\fR in the slurm.conf file. Until these nodes are made
available, they will not be seen using any SLURM commands or APIs, nor will
any attempt be made to contact them.
\fBUNKNOWN\fR
Indicates the node's state is undefined (BUSY or IDLE),
but will be established when the \fBslurmd\fR daemon on that node
registers.
The default value is "UNKNOWN".
The partition configuration permits you to establish different job
limits or access controls for various groups (or partitions) of nodes.
Nodes may be in more than one partition, making partitions serve
as general purpose queues.
For example one may put the same set of nodes into two different
partitions, each with different constraints (time limit, job sizes,
groups allowed to use the partition, etc.).
Jobs are allocated resources within a single partition.
Default values can be specified with a record in which
"PartitionName" is "DEFAULT".
The default entry values will apply only to lines following it in the
configuration file and the default values can be reset multiple times
in the configuration file with multiple entries where "PartitionName=DEFAULT".
The "PartitionName=" specification must be placed on every line
describing the configuration of partitions.
\fBNOTE:\fR Put all parameters for each partition on a single line.
Each line of partition configuration information should
represent a different partition.
The partition configuration file contains the following information:
\fBAllocNodes\fR
Comma separated list of nodes from which users can execute jobs in the
partition.
Node names may be specified using the node range expression syntax
described above.
The default value is "ALL".
\fBAllowGroups\fR
Comma separated list of group IDs which may execute jobs in the partition.
If at least one group associated with the user attempting to execute the
job is in AllowGroups, he will be permitted to use this partition.
Jobs executed as user root can use any partition without regard to
the value of AllowGroups.
If user root attempts to execute a job as another user (e.g. using
srun's \-\-uid option), this other user must be in one of the groups
identified by AllowGroups for the job to successfully execute.
The default value is "ALL".
\fBNOTE:\fR For performance reasons, SLURM maintains a list of user IDs
allowed to use each partition and this is checked at job submission time.
This list of user IDs is updated when the \fBslurmctld\fR daemon is restarted,
reconfigured (e.g. "scontrol reconfig") or the partition's \fBAllowGroups\fR
value is reset, even if its value is unchanged
(e.g. "scontrol update PartitionName=name AllowGroups=group").
For a user's access to a partition to change, both his group membership must
change and SLURM's internal user ID list must change using one of the methods
described above.
\fBAlternate\fR
Partition name of alternate partition to be used if the state of this partition
is "DRAIN" or "INACTIVE".
\fBDefault\fR
If this keyword is set, jobs submitted without a partition
specification will utilize this partition.
Possible values are "YES" and "NO".
The default value is "NO".
\fBDefaultTime\fR
Run time limit used for jobs that don't specify a value. If not set
then MaxTime will be used.
Format is the same as for MaxTime.
\fBDisableRootJobs\fR
If set to "YES" then user root will be prevented from running any jobs
in this partition.
The default value will be the value of \fBDisableRootJobs\fR set
outside of a partition specification (which is "NO", allowing user
root to execute jobs).
\fBHidden\fR
Specifies if the partition and its jobs are to be hidden by default.
Hidden partitions will by default not be reported by the SLURM APIs or commands.
Possible values are "YES" and "NO".
The default value is "NO".
Note that partitions that a user lacks access to by virtue of the
\fBAllowGroups\fR parameter will also be hidden by default.
\fBMaxNodes\fR
Maximum count of nodes which may be allocated to any single job.
For BlueGene systems this will be a c\-nodes count and will be converted
to a midplane count with a reduction in resolution.
The default value is "UNLIMITED", which is represented internally as \-1.
This limit does not apply to jobs executed by SlurmUser or user root.
\fBMaxTime\fR
Maximum run time limit for jobs.
Format is minutes, minutes:seconds, hours:minutes:seconds,
days\-hours, days\-hours:minutes, days\-hours:minutes:seconds or
"UNLIMITED".
Time resolution is one minute and second values are rounded up to
the next minute.
This limit does not apply to jobs executed by SlurmUser or user root.
\fBMinNodes\fR
Minimum count of nodes which may be allocated to any single job.
For BlueGene systems this will be a c\-nodes count and will be converted
to a midplane count with a reduction in resolution.
The default value is 1.
This limit does not apply to jobs executed by SlurmUser or user root.
\fBNodes\fR
Comma separated list of nodes (or base partitions for BlueGene systems)
which are associated with this partition.
Node names may be specified using the node range expression syntax
described above. A blank list of nodes
(i.e. "Nodes= ") can be used if one wants a partition to exist,
but have no resources (possibly on a temporary basis).
\fBPartitionName\fR
Name by which the partition may be referenced (e.g. "Interactive").
This name can be specified by users when submitting jobs.
If the \fBPartitionName\fR is "DEFAULT", the values specified
with that record will apply to subsequent partition specifications
unless explicitly set to other values in that partition record or
replaced with a different set of default values.
\fBPreemptMode\fR
Mechanism used to preempt jobs from this partition when
\fBPreemptType=preempt/partition_prio\fR is configured.
This partition specific \fBPreemptMode\fR configuration parameter will override
the \fBPreemptMode\fR configuration parameter set for the cluster as a whole.
The cluster\-level \fBPreemptMode\fR must include the GANG option if
\fBPreemptMode\fR is configured to SUSPEND for any partition.
The cluster\-level \fBPreemptMode\fR must not be OFF if \fBPreemptMode\fR
is enabled for any partition.
See the description of the cluster\-level \fBPreemptMode\fR configuration
parameter above for further information.
\fBPriority\fR
Jobs submitted to a higher priority partition will be dispatched
before pending jobs in lower priority partitions and if possible
they will preempt running jobs from lower priority partitions.
Note that a partition's priority takes precedence over a job's
priority.
The value may not exceed 65533.
\fBRootOnly\fR
Specifies if only user ID zero (i.e. user \fIroot\fR) may allocate resources
in this partition. User root may allocate resources for any other user,
but the request must be initiated by user root.
This option can be useful for a partition to be managed by some
external entity (e.g. a higher\-level job manager) and prevents
users from directly using those resources.
Possible values are "YES" and "NO".
The default value is "NO".
\fBShared\fR
Controls the ability of the partition to execute more than one job at a
time on each resource (node, socket or core depending upon the value
of \fBSelectTypeParameters\fR).
If resources are to be shared, avoiding memory over\-subscription
is very important.
\fBSelectTypeParameters\fR should be configured to treat
memory as a consumable resource and the \fB\-\-mem\fR option
should be used for job allocations.
Sharing of resources is typically useful only when using gang scheduling
(\fBPreemptMode=suspend\fR or \fBPreemptMode=kill\fR).
Possible values for \fBShared\fR are "EXCLUSIVE", "FORCE", "YES", and "NO".
The default value is "NO".
For more information see the following web pages:
\fIhttps://computing.llnl.gov/linux/slurm/cons_res.html\fR,
\fIhttps://computing.llnl.gov/linux/slurm/cons_res_share.html\fR,
\fIhttps://computing.llnl.gov/linux/slurm/gang_scheduling.html\fR, and
\fIhttps://computing.llnl.gov/linux/slurm/preempt.html\fR.
\fBEXCLUSIVE\fR
Allocates entire nodes to jobs even with select/cons_res configured.
Jobs that run in partitions with "Shared=EXCLUSIVE" will have
exclusive access to all allocated nodes.
\fBFORCE\fR
Makes all resources in the partition available for sharing
without any means for users to disable it.
May be followed with a colon and maximum number of jobs in
running or suspended state.
For example "Shared=FORCE:4" enables each node, socket or
core to execute up to four jobs at once.
Recommended only for BlueGene systems configured with
small blocks or for systems running
with gang scheduling (\fBSchedulerType=sched/gang\fR).
\fBYES\fR
Makes all resources in the partition available for sharing,
but honors a user's request for dedicated resources.
If \fBSelectType=select/cons_res\fR, then resources will be
over\-subscribed unless explicitly disabled in the job submit
request using the "\-\-exclusive" option.
With \fBSelectType=select/bluegene\fR or \fBSelectType=select/linear\fR,
resources will only be over\-subscribed when explicitly requested
by the user using the "\-\-share" option on job submission.
May be followed with a colon and maximum number of jobs in
running or suspended state.
For example "Shared=YES:4" enables each node, socket or
core to execute up to four jobs at once.
Recommended only for systems running with gang scheduling
(\fBSchedulerType=sched/gang\fR).
\fBNO\fR
Selected resources are allocated to a single job. No resource will be
allocated to more than one job.
\fBState\fR
State of partition or availability for use. Possible values
are "UP", "DOWN", "DRAIN" and "INACTIVE". The default value is "UP".
See also the related "Alternate" keyword.
\fBUP\fR
Designates that new jobs may be queued on the partition, and that
jobs may be allocated nodes and run from the partition.
\fBDOWN\fR
Designates that new jobs may be queued on the partition, but
queued jobs may not be allocated nodes and run from the partition. Jobs
already running on the partition continue to run. The jobs
must be explicitly canceled to force their termination.
\fBDRAIN\fR
Designates that no new jobs may be queued on the partition (job
submission requests will be denied with an error message), but jobs
already queued on the partition may be allocated nodes and run.
See also the "Alternate" partition specification.
\fBINACTIVE\fR
Designates that no new jobs may be queued on the partition,
and jobs already queued may not be allocated nodes and run.
See also the "Alternate" partition specification.
.SH "Prolog and Epilog Scripts"
There are a variety of prolog and epilog program options that
execute with various permissions and at various times.
The four options most likely to be used are:
\fBProlog\fR and \fBEpilog\fR (executed once on each compute node
for each job) plus \fBPrologSlurmctld\fR and \fBEpilogSlurmctld\fR
(executed once on the \fBControlMachine\fR for each job).
NOTE: Standard output and error messages are normally not preserved.
Explicitly write output and error messages to an appropriate location
if you wish to preserve that information.
NOTE: The Prolog script is ONLY run on any individual
node when it first sees a job step from a new allocation; it does not
run the Prolog immediately when an allocation is granted. If no job steps
from an allocation are run on a node, it will never run the Prolog for that
allocation. The Epilog, on the other hand, always runs on every node of an
allocation when the allocation is released.
Information about the job is passed to the script using environment
variables.
Unless otherwise specified, these environment variables are available
to all of the programs. A short example script follows this list of variables.
\fBBASIL_RESERVATION_ID\fR
Basil reservation ID.
Available on Cray XT systems only.
\fBMPIRUN_PARTITION\fR
BlueGene partition name.
Available on BlueGene systems only.
\fBSLURM_JOB_ACCOUNT\fR
Account name used for the job.
Available in \fBPrologSlurmctld\fR and \fBEpilogSlurmctld\fR only.
\fBSLURM_JOB_CONSTRAINTS\fR
Features required to run the job.
Available in \fBPrologSlurmctld\fR and \fBEpilogSlurmctld\fR only.
\fBSLURM_JOB_DERIVED_EC\fR
The highest exit code of all of the job steps.
Available in \fBEpilogSlurmctld\fR only.
\fBSLURM_JOB_EXIT_CODE\fR
The exit code of the job script (or salloc).
Available in \fBEpilogSlurmctld\fR only.
\fBSLURM_JOB_GID\fR
Group ID of the job's owner.
Available in \fBPrologSlurmctld\fR and \fBEpilogSlurmctld\fR only.
\fBSLURM_JOB_GROUP\fR
Group name of the job's owner.
Available in \fBPrologSlurmctld\fR and \fBEpilogSlurmctld\fR only.
\fBSLURM_JOB_ID\fR
Job ID.
\fBSLURM_JOB_NAME\fR
Name of the job.
Available in \fBPrologSlurmctld\fR and \fBEpilogSlurmctld\fR only.
\fBSLURM_JOB_NODELIST\fR
Nodes assigned to job. A SLURM hostlist expression.
"scontrol show hostnames" can be used to convert this to a
list of individual host names.
Available in \fBPrologSlurmctld\fR and \fBEpilogSlurmctld\fR only.
\fBSLURM_JOB_PARTITION\fR
Partition that job runs in.
Available in \fBPrologSlurmctld\fR and \fBEpilogSlurmctld\fR only.
\fBSLURM_JOB_UID\fR
User ID of the job's owner.
\fBSLURM_JOB_USER\fR
User name of the job's owner.
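As an illustrative sketch, a \fBPrologSlurmctld\fR script could record
several of these variables (the log path shown is hypothetical):
#!/bin/sh
# Runs on the ControlMachine when each job is allocated resources.
echo "job for $SLURM_JOB_USER in $SLURM_JOB_PARTITION on $SLURM_JOB_NODELIST" \
  >> /var/log/slurm/job_prolog.log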
.SH "NETWORK TOPOLOGY"
SLURM is able to optimize job allocations to minimize network contention.
Special SLURM logic is used to optimize allocations on systems with a
three\-dimensional interconnect (BlueGene, Sun Constellation, etc.)
and information about configuring those systems is available on
web pages here: <https://computing.llnl.gov/linux/slurm/>.
For a hierarchical network, SLURM needs to have detailed information
about how nodes are configured on the network switches.
Given network topology information, SLURM allocates all of a job's
resources onto a single leaf of the network (if possible) using a best\-fit
algorithm.
Otherwise it will allocate a job's resources onto multiple leaf switches
so as to minimize the use of higher\-level switches.
The \fBTopologyPlugin\fR parameter controls which plugin is used to
collect network topology information.
The only values presently supported are
"topology/3d_torus" (default for IBM BlueGene, Sun Constellation and
Cray XT systems, performs best\-fit logic over three\-dimensional topology),
"topology/none" (default for other systems,
best\-fit logic over one\-dimensional topology),
"topology/tree" (determine the network topology based
upon information contained in a topology.conf file,
see "man topology.conf" for more information).
Future plugins may gather topology information directly from the network.
The topology information is optional.
If not provided, SLURM will perform a best\-fit algorithm assuming the
nodes are in a one\-dimensional array as configured and the communications
cost is related to the node distance in this array.
.SH "RELOCATING CONTROLLERS"
If the cluster's computers used for the primary or backup controller
will be out of service for an extended period of time, it may be
desirable to relocate them.
In order to do so, follow this procedure:
1. Stop the SLURM daemons
2. Modify the slurm.conf file appropriately
3. Distribute the updated slurm.conf file to all nodes
4. Restart the SLURM daemons
There should be no loss of any running or pending jobs.
Ensure that any nodes added to the cluster have the current
slurm.conf file installed.
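As a sketch of this procedure using common tools (scontrol is part of
SLURM; pdcp/pdsh and the init script are one option among many):
scontrol shutdown                         # 1. stop the SLURM daemons
vi /etc/slurm.conf                        # 2. update the controller entries
pdcp \-a /etc/slurm.conf /etc/slurm.conf  # 3. distribute to all nodes
pdsh \-a /etc/init.d/slurm start          # 4. restart the daemons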
\fBCAUTION:\fR If two nodes are simultaneously configured as the
primary controller (two nodes on which \fBControlMachine\fR specify
the local host and the \fBslurmctld\fR daemon is executing on each),
system behavior will be destructive.
If a compute node has an incorrect \fBControlMachine\fR or
\fBBackupController\fR parameter, that node may be rendered
unusable, but no other harm will result.
.SH "EXAMPLE"
# Sample /etc/slurm.conf for dev[0\-25].llnl.gov
BackupController=dev1
Epilog=/usr/local/slurm/epilog
Prolog=/usr/local/slurm/prolog
JobCompType=jobcomp/filetxt
JobCompLoc=/var/log/slurm/jobcomp
PluginDir=/usr/local/lib:/usr/local/slurm/lib
SchedulerType=sched/backfill
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmdSpoolDir=/usr/local/slurm/slurmd.spool
StateSaveLocation=/usr/local/slurm/slurm.state
SwitchType=switch/elan
JobCredentialPrivateKey=/usr/local/slurm/private.key
JobCredentialPublicCertificate=/usr/local/slurm/public.cert
# Node Configurations
NodeName=DEFAULT Procs=2 RealMemory=2000 TmpDisk=64000
NodeName=DEFAULT State=UNKNOWN
NodeName=dev[0\-25] NodeAddr=edev[0\-25] Weight=16
# Update records for specific DOWN nodes
DownNodes=dev20 State=DOWN Reason="power,ETA=Dec25"
# Partition Configurations
PartitionName=DEFAULT MaxTime=30 MaxNodes=10 State=UP
PartitionName=debug Nodes=dev[0\-8,18\-25] Default=YES
PartitionName=batch Nodes=dev[9\-17] MinNodes=4
PartitionName=long Nodes=dev[9\-17] MaxTime=120 AllowGroups=admin
.SH "FILE AND DIRECTORY PERMISSIONS"
There are three classes of files:
Files used by \fBslurmctld\fR must be accessible by user \fBSlurmUser\fR
and accessible by the primary and backup control machines.
Files used by \fBslurmd\fR must be accessible by user root and
accessible from every compute node.
A few files need to be accessible by normal users on all login and
compute nodes.
While many files and directories are listed below, most of them will
not be used with most configurations.
\fBAccountingStorageLoc\fR
If this specifies a file, it must be writable by user \fBSlurmUser\fR.
The file must be accessible by the primary and backup control machines.
It is recommended that the file be readable by all users from login and
compute nodes.
\fBEpilog\fR
Must be executable by user root.
It is recommended that the file be readable by all users.
The file must exist on every compute node.
\fBEpilogSlurmctld\fR
Must be executable by user \fBSlurmUser\fR.
It is recommended that the file be readable by all users.
The file must be accessible by the primary and backup control machines.
\fBHealthCheckProgram\fR
Must be executable by user root.
It is recommended that the file be readable by all users.
The file must exist on every compute node.
\fBJobCheckpointDir\fR
Must be writable by user \fBSlurmUser\fR and no other users.
The file must be accessible by the primary and backup control machines.
\fBJobCompLoc\fR
If this specifies a file, it must be writable by user \fBSlurmUser\fR.
The file must be accessible by the primary and backup control machines.
\fBJobCredentialPrivateKey\fR
Must be readable only by user \fBSlurmUser\fR and writable by no other users.
The file must be accessible by the primary and backup control machines.
\fBJobCredentialPublicCertificate\fR
Readable to all users on all nodes.
Must not be writable by regular users.
\fBMailProg\fR
Must be executable by user \fBSlurmUser\fR.
Must not be writable by regular users.
The file must be accessible by the primary and backup control machines.
\fBProlog\fR
Must be executable by user root.
It is recommended that the file be readable by all users.
The file must exist on every compute node.
\fBPrologSlurmctld\fR
Must be executable by user \fBSlurmUser\fR.
It is recommended that the file be readable by all users.
The file must be accessible by the primary and backup control machines.
\fBResumeProgram\fR
Must be executable by user \fBSlurmUser\fR.
The file must be accessible by the primary and backup control machines.
\fBSallocDefaultCommand\fR
Must be executable by all users.
The file must exist on every login and compute node.
\fBslurm.conf\fR
Readable to all users on all nodes.
Must not be writable by regular users.
\fBSlurmctldLogFile\fR
Must be writable by user \fBSlurmUser\fR.
The file must be accessible by the primary and backup control machines.
\fBSlurmctldPidFile\fR
Must be writable by user root.
Preferably writable and removable by \fBSlurmUser\fR.
The file must be accessible by the primary and backup control machines.
\fBSlurmdLogFile\fR
Must be writable by user root.
A distinct file must exist on each compute node.
\fBSlurmdPidFile\fR
Must be writable by user root.
A distinct file must exist on each compute node.
\fBSlurmdSpoolDir\fR
Must be writable by user root.
A distinct file must exist on each compute node.
\fBSrunEpilog\fR
Must be executable by all users.
The file must exist on every login and compute node.
\fBSrunProlog\fR
Must be executable by all users.
The file must exist on every login and compute node.
\fBStateSaveLocation\fR
Must be writable by user \fBSlurmUser\fR.
The file must be accessible by the primary and backup control machines.
\fBSuspendProgram\fR
Must be executable by user \fBSlurmUser\fR.
The file must be accessible by the primary and backup control machines.
\fBTaskEpilog\fR
Must be executable by all users.
The file must exist on every compute node.
\fBTaskProlog\fR
Must be executable by all users.
The file must exist on every compute node.
\fBUnkillableStepProgram\fR
Must be executable by user \fBSlurmUser\fR.
The file must be accessible by the primary and backup control machines.
.SH "COPYING"
Copyright (C) 2002\-2007 The Regents of the University of California.
Copyright (C) 2008\-2010 Lawrence Livermore National Security.
Portions Copyright (C) 2010 SchedMD <http://www.sched\-md.com>.
Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
CODE\-OCEC\-09\-009. All rights reserved.
This file is part of SLURM, a resource management program.
For details, see <https://computing.llnl.gov/linux/slurm/>.
SLURM is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option)
any later version.
SLURM is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
details.
.SH "SEE ALSO"
\fBbluegene.conf\fR(5), \fBcgroup.conf\fR(5), \fBgethostbyname\fR(3),
\fBgetrlimit\fR(2), \fBgres.conf\fR(5), \fBgroup\fR(5), \fBhostname\fR(1),
\fBscontrol\fR(1), \fBslurmctld\fR(8), \fBslurmd\fR(8),
\fBslurmdbd\fR(8), \fBslurmdbd.conf\fR(5), \fBsrun\fR(1),
\fBspank\fR(8), \fBsyslog\fR(2), \fBtopology.conf\fR(5), \fBwiki.conf\fR(5)