information between major SLURM updates?</a></li>
<li><a href="#health_check">Why doesn't the <i>HealthCheckProgram</i>
execute on DOWN nodes?</a></li>
<li><a href="#batch_lost">What is the meaning of the error
"Batch JobId=# missing from master node, killing it"?</a></li>
<li><a href="#accept_again">What does the message
"srun: error: Unable to accept connection: Resources temporarily unavailable"
indicate?</a></li>
<li><a href="#task_prolog">How could I automatically print a job's
SLURM job ID to its standard output?</a></li>
<li><a href="#moab_start">I run SLURM with the Moab or Maui scheduler.
How can I start a job under SLURM without the scheduler?</a></li>
<li><a href="#orphan_procs">Why are user processes and <i>srun</i>
running even though the job is supposed to be completed?</a></li>
<li><a href="#slurmd_oom">How can I prevent the <i>slurmd</i> and
<i>slurmstepd</i> daemons from being killed when a node's memory
is exhausted?</a></li>
<li><a href="#ubuntu">I see the host of my calling node as 127.0.1.1
instead of the correct IP address. Why is that?</a></li>
<h2>For Users</h2>
<p><a name="comp"><b>1. Why is my job/node in COMPLETING state?</b></a><br>
When a job is terminating, both the job and its nodes enter the COMPLETING state.
<p>Note that SLURM has two configuration parameters that may be used to
automate some of this process.
<i>UnkillableStepProgram</i> specifies a program to execute when
non-killable processes are identified.
<i>UnkillableStepTimeout</i> specifies how long to wait for processes
to terminate.
See "man slurm.conf" for more information about these parameters.</p>
<p><a name="rlimit"><b>2. Why are my resource limits not propagated?</b></a><br>
When the <span class="commandline">srun</span> command executes, it captures the
resource limits in effect at submit time. These limits are propagated to the allocated
nodes before initiating the user's job. The SLURM daemon running on that node then
tries to establish identical resource limits for the job being initiated.
There are several possible reasons for not being able to establish those
resource limits:
<ul>
<li>The hard resource limits applied to SLURM's slurmd daemon are lower
than the user's soft resource limits on the submit host. Typically
the slurmd daemon is initiated by the init daemon with the operating
system default limits. This may be addressed either through use of the
ulimit command in the /etc/sysconfig/slurm file or enabling
<a href="#pam">PAM in SLURM</a>.</li>
<li>The user's hard resource limits on the allocated node are lower than
the same user's soft resource limits on the node from which the
job was submitted. It is recommended that the system administrator
establish uniform hard resource limits for users on all nodes
within a cluster to prevent this from occurring.</li>
</ul>
<p>NOTE: This may produce the error message "Can't propagate RLIMIT_...".
The error message is printed only if the user explicitly specifies that
the resource limit should be propagated or the srun command is running
with verbose logging of actions from the slurmd daemon (e.g. "srun -d6 ...").</p>
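<p>For the first case above, the limits inherited by <i>slurmd</i> can be
raised at daemon start time. A minimal sketch, assuming your init script
sources /etc/sysconfig/slurm before launching <i>slurmd</i> (the values
are only examples):</p>
<pre>
# /etc/sysconfig/slurm: raise the limits slurmd starts with
ulimit -n 8192        # open files
ulimit -l unlimited   # locked memory
</pre>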
SLURM output expecting the old format (e.g. LSF, Maui or Moab).
<p><a name="file_limit"><b>25. What causes the error
"Unable to accept new connection: Too many open files"?</b></a><br>
The srun command automatically increases its open file limit to
the hard limit in order to process all of the standard input and output
connections to the launched tasks. It is recommended that you set the
open file hard limit to 8192 across the cluster.</p>
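<p>One common way to do this, assuming the pam_limits module is in use on
your nodes, is an entry in /etc/security/limits.conf using the value
recommended above:</p>
<pre>
# /etc/security/limits.conf: raise the hard open-file limit for all users
*    hard    nofile    8192
</pre>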
<p><a name="slurmd_log"><b>26. Why does the setting of <i>SlurmdDebug</i>
fail to log job step information at the appropriate level?</b></a><br>
There are two programs involved here. One is <b>slurmd</b>, which is
a persistent daemon running at the desired debug level. The second
program is <b>slurmstepd</b>, which executes the user job and its

slurmdbd you can also query any cluster using the slurmdbd from any
other cluster's nodes.
<p><a name="debug"><b>29. How can I build SLURM with debugging symbols?</b></a><br>
Set your CFLAGS environment variable before building.
You want the "-g" option to produce debugging information and
"-O0" to set the optimization level to zero (off). For example:<br>
CFLAGS="-g -O0" ./configure ...
<p><a name="state_preserve"><b>30. How can I easily preserve drained node
information between major SLURM updates?</b></a><br>
Major SLURM updates generally have changes in the state save files and
communication protocols, so a cold-start (without state) is generally
required. If you have nodes in a DRAIN state and want to preserve that
<p><a name="health_check"><b>31. Why doesn't the <i>HealthCheckProgram</i>
execute on DOWN nodes?</b></a><br>
Hierarchical communications are used for sending this message. If there
are DOWN nodes in the communications hierarchy, messages will need to
be re-routed. This limits SLURM's ability to tightly synchronize the
execution of the <i>HealthCheckProgram</i> across the cluster, which
could adversely impact performance of parallel applications.
The use of CRON or node startup scripts may be better suited to ensure
that <i>HealthCheckProgram</i> gets executed on nodes that are DOWN
in SLURM. If you still want to have SLURM try to execute
<p><a name="batch_lost"><b>32. What is the meaning of the error
"Batch JobId=# missing from master node, killing it"?</b></a><br>
A shell is launched on node zero of a job's allocation to execute
the submitted program. The <i>slurmd</i> daemon executing on each compute
node will periodically report to the <i>slurmctld</i> what programs it
is executing. If a batch program is expected to be running on some
node (i.e. node zero of the job's allocation) and is not found, the
message above will be logged and the job cancelled. This typically is
associated with exhausting memory on the node or some other critical
failure that cannot be recovered from. The equivalent message in
earlier releases of SLURM is
"Master node lost JobId=#, killing it".
<p><a name="accept_again"><b>33. What does the message
"srun: error: Unable to accept connection: Resources temporarily unavailable"
indicate?</b></a><br>
This has been reported on some larger clusters running SUSE Linux when
a user's resource limits are reached. You may need to increase limits
for locked memory and stack size to resolve this problem.
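<p>One way to raise these limits on the affected nodes, assuming pam_limits
is in use, is through /etc/security/limits.conf; the values below are only
illustrative:</p>
<pre>
# /etc/security/limits.conf: raise locked-memory and stack limits
*    hard    memlock    unlimited
*    hard    stack      unlimited
</pre>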
<p><a name="task_prolog"><b>34. How could I automatically print a job's
SLURM job ID to its standard output?</b></a><br>
The configured <i>TaskProlog</i> is the only thing that can write to
the job's standard output or set extra environment variables for a job
or job step. To write to the job's standard output, precede the message
with "print ". To export environment variables, output a line of this
form "export name=value". The example below will print a job's SLURM
job ID and allocated hosts for a batch job only.</p>
<pre>
#!/bin/sh
#
# Sample TaskProlog script that will print a batch job's
# job ID and node list to the job's stdout
#

if [ X"$SLURM_STEP_ID" = "X" -a X"$SLURM_PROCID" = "X"0 ]
then
  echo "print =========================================="
  echo "print SLURM_JOB_ID = $SLURM_JOB_ID"
  echo "print SLURM_NODELIST = $SLURM_NODELIST"
  echo "print =========================================="
fi
</pre>
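<p>The script must be executable and referenced from <i>slurm.conf</i>;
the path below is only an illustration:</p>
<pre>
# slurm.conf excerpt: run the script above as each task is launched
TaskProlog=/etc/slurm/task_prolog.sh
</pre>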
<p><a name="moab_start"><b>35. I run SLURM with the Moab or Maui scheduler.
1156
How can I start a job under SLURM without the scheduler?</b></a></br>
1157
When SLURM is configured to use the Moab or Maui scheduler, all submitted
1158
jobs have their priority initialized to zero, which SLURM treats as a held
1159
job. The job only begins when Moab or Maui decide where and when to start
1160
the job, setting the required node list and setting the job priority to
1161
a non-zero value. To circumvent this, submit your job using a SLURM or
1162
Moab command then manually set its priority to a non-zero value (must be
1163
done by user root). For example:</p>
1165
$ scontrol update jobid=1234 priority=1000000
1167
<p>Note that changes in the configured value of <i>SchedulerType</i> only
1168
take effect when the <i>slurmctld</i> daemon is restarted (reconfiguring
1169
SLURM will not change this parameter. You will also manually need to
1170
modify the priority of every pending job.
1171
When changing to Moab or Maui scheduling, set every job priority to zero.
1172
When changing from Moab or Maui scheduling, set every job priority to a
1173
non-zero value (preferably fairly large, say 1000000).</p>
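<p>A minimal sketch of such a bulk priority update, run as user root
(adjust the priority value to suit your site):</p>
<pre>
# Give every pending job a non-zero priority
for jobid in $(squeue --state=PENDING --noheader --format=%i); do
    scontrol update jobid=$jobid priority=1000000
done
</pre>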
<p><a name="orphan_procs"><b>36. Why are user processes and <i>srun</i>
running even though the job is supposed to be completed?</b></a><br>
SLURM relies upon a configurable process tracking plugin to determine
when all of the processes associated with a job or job step have completed.
Those plugins relying upon a kernel patch can reliably identify every process.
Those plugins dependent upon process group IDs or parent process IDs are not
reliable. See the <i>ProctrackType</i> description in the <i>slurm.conf</i>
man page for details. We rely upon the sgi_job plugin for most systems.</p>
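<p>For example, that plugin would be selected in <i>slurm.conf</i> as shown
below; use whichever kernel-supported plugin is available on your systems:</p>
<pre>
# slurm.conf excerpt: use a reliable, kernel-assisted process tracking plugin
ProctrackType=proctrack/sgi_job
</pre>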
<p><a name="slurmd_oom"><b>37. How can I prevent the <i>slurmd</i> and
<i>slurmstepd</i> daemons from being killed when a node's memory
is exhausted?</b></a><br>
You can set the value of <i>/proc/self/oom_adj</i> for
<i>slurmd</i> and <i>slurmstepd</i> by initiating the <i>slurmd</i>
daemon with the <i>SLURMD_OOM_ADJ</i> and/or <i>SLURMSTEPD_OOM_ADJ</i>
environment variables set to the desired values.
A value of -17 typically will disable killing.</p>
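<p>For example, assuming your init script sources /etc/sysconfig/slurm
before starting <i>slurmd</i>, lines such as the following would do this:</p>
<pre>
# /etc/sysconfig/slurm: keep the OOM killer away from the SLURM daemons
SLURMD_OOM_ADJ=-17
SLURMSTEPD_OOM_ADJ=-17
export SLURMD_OOM_ADJ SLURMSTEPD_OOM_ADJ
</pre>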
<p><a name="ubuntu"><b>38. I see the host of my calling node as 127.0.1.1
instead of the correct IP address. Why is that?</b></a><br>
Some systems by default will put your host in the /etc/hosts file as:</p>
<pre>
127.0.1.1   snowflake.llnl.gov   snowflake
</pre>
<p>This will cause srun and other things to grab 127.0.1.1 as its
address instead of the correct address, making it so the
communication doesn't work. The solution is to either remove this line or
set a different NodeAddr that is known by your other nodes.</p>
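<p>For example, the node's entry in <i>slurm.conf</i> could pin the real
address explicitly; the host name is taken from the example above and the
address is only a placeholder:</p>
<pre>
# slurm.conf excerpt: bind the node name to its routable address
NodeName=snowflake NodeAddr=192.168.1.10
</pre>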
<p class="footer"><a href="#top">top</a></p>
<p style="text-align:center;">Last modified 12 June 2009</p>
<!--#include virtual="footer.txt"-->