<!--#include virtual="header.txt"-->

<h1>SLURM User and Administrator Guide for Cray systems</h1>

<b>NOTE: As of January 2009, the SLURM interface to Cray systems is incomplete.</b>

<h2>User Guide</h2>

<p>This document describes the unique features of SLURM on
Cray systems.
You should be familiar with SLURM's mode of operation on Linux clusters
before studying the relatively few differences in Cray system
operation described in this document.</p>
<p>SLURM's primary mode of operation is designed for use on clusters with
nodes configured in a one-dimensional space.
Minor changes were required for the <i>smap</i> and <i>sview</i> tools
to map nodes in a three-dimensional space.
Some changes are also desirable to optimize job placement in three-dimensional
space.</p>
<p>SLURM has added an interface to Cray's Application Level Placement Scheduler
(ALPS). The ALPS <i>aprun</i> command must be used for task launch rather than
SLURM's <i>srun</i> command. You should create a resource reservation using
SLURM's <i>salloc</i> or <i>sbatch</i> command and execute <i>aprun</i> from
within that allocation.</p>
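<p>For example, you might reserve two nodes with <i>sbatch</i> and launch
tasks with <i>aprun</i> from within the batch script (a minimal sketch; the
script and application names are hypothetical):</p>
<pre>
$ cat my_script.sh
#!/bin/sh
# Launch 8 tasks of the application across the allocated nodes
aprun -n 8 ./my_app

$ sbatch -N2 my_script.sh
</pre>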
<h2>Administrator Guide</h2>

<h3>Cray/ALPS configuration</h3>
<p>Node names must have a three-digit suffix describing their
zero-origin position in the X-, Y- and Z-dimensions respectively (e.g.
"tux000" for X=0, Y=0, Z=0; "tux123" for X=1, Y=2, Z=3).
Rectangular prisms of nodes can be specified in SLURM commands and
configuration files using the system name prefix with the end-points
enclosed in square brackets and separated by an "x".
For example "tux[620x731]" is used to represent the eight nodes in a
block with endpoints at "tux620" and "tux731" (tux620, tux621, tux630,
tux631, tux720, tux721, tux730, tux731).
<b>NOTE:</b> We anticipate that Cray will provide node coordinate
information via the ALPS interface in the future, which may result
in a more flexible node naming convention.</p>
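<p>A hostlist expression of this form can be expanded into individual node
names with <i>scontrol</i> (a sketch; this assumes a SLURM build with the
3D support described below):</p>
<pre>
$ scontrol show hostnames "tux[620x731]"
tux620
tux621
tux630
tux631
tux720
tux721
tux730
tux731
</pre>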
<p>In ALPS, configure each node to be scheduled using SLURM as type
<i>batch</i>.</p>
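<p>On Cray XT systems this designation is typically made with the
<i>xtprocadmin</i> command (a sketch; the node ID is illustrative and the
exact options may vary by Cray software release):</p>
<pre>
# Set the allocation mode of node nid00040 to batch
xtprocadmin -n 40 -k m batch
</pre>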
<h3>SLURM configuration</h3>
<p>Four variables must be defined in the <i>config.h</i> file:
<i>APBASIL_LOC</i> (location of the <i>apbasil</i> command),
<i>HAVE_FRONT_END</i>, <i>HAVE_CRAY_XT</i> and <i>HAVE_3D</i>.
The <i>apbasil</i> command should automatically be found.
If that is not the case, please notify us of its location on your system
and we will add that to the search paths tested at configure time.
The other variable definitions can be set in several different
ways depending upon how SLURM is being built, as shown in the example
after this list:</p>
<ul>
<li>Execute the <i>configure</i> command with the option
<i>--enable-cray-xt</i> <b>OR</b></li>
<li>Execute the <i>rpmbuild</i> command with the option
<i>--with cray_xt</i> <b>OR</b></li>
<li>Add <i>%with_cray_xt 1</i> to your <i>~/.rpmmacros</i> file.</li>
</ul>
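<p>For example, either of the following builds SLURM with Cray XT support
(a sketch; the tarball name is illustrative):</p>
<pre>
# Build from source with the Cray XT options defined
./configure --enable-cray-xt

# Or build RPM packages with Cray XT support
rpmbuild -ta --with cray_xt slurm-*.tar.bz2
</pre>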
<p>One <i>slurmd</i> will be used to run all of the batch jobs on
the system. It is from here that users will execute <i>aprun</i>
commands to launch tasks.
This is specified in the <i>slurm.conf</i> file by using the
<i>NodeName</i> field to identify the compute nodes and both the
<i>NodeAddr</i> and <i>NodeHostname</i> fields to identify the
computer on which <i>slurmd</i> runs (normally some sort of front-end node)
as seen in the examples below.</p>
<p>Next you need to select from two options for the resource selection
plugin (the <i>SelectType</i> option in SLURM's <i>slurm.conf</i> configuration
file), as illustrated after this list:</p>
<ul>
<li><b>select/cons_res</b> - Performs a best-fit algorithm based upon a
one-dimensional space to allocate whole nodes, sockets, or cores to jobs
based upon other configuration parameters.</li>
<li><b>select/linear</b> - Performs a best-fit algorithm based upon a
one-dimensional space to allocate whole nodes to jobs.</li>
</ul>
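<p>For example, the relevant <i>slurm.conf</i> lines might read (a sketch;
the <i>SelectTypeParameters</i> value applies only to the
<i>select/cons_res</i> case):</p>
<pre>
# Allocate whole nodes to jobs
SelectType=select/linear

# Or allocate individual cores to jobs
#SelectType=select/cons_res
#SelectTypeParameters=CR_Core
</pre>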
<p>In order for <i>select/cons_res</i> or <i>select/linear</i> to
allocate resources physically nearby in three-dimensional space, the
nodes must be specified in SLURM's <i>slurm.conf</i> configuration file in
such a fashion that those nearby in <i>slurm.conf</i> (one-dimensional
space) are also nearby in the physical three-dimensional space.
If the nodes in SLURM's <i>slurm.conf</i> configuration
file are defined on one line (e.g. <i>NodeName=tux[000x333]</i>),
SLURM will automatically perform that mapping using a
<a href="http://en.wikipedia.org/wiki/Hilbert_curve">Hilbert curve</a>.
Otherwise you may construct your own node name ordering and list them
one node per line in <i>slurm.conf</i>.
Note that each node must be listed exactly once and consecutive
nodes should be nearby in three-dimensional space.
Also note that each node must be defined individually rather than using
a hostlist expression in order to preserve the ordering (there is no
problem using a hostlist expression in the partition specification after
the nodes have already been defined).
The open source code used by SLURM to generate the Hilbert curve is
included in the distribution at <i>contribs/skilling.c</i> in the event
that you wish to experiment with it to generate your own node ordering.
Two examples of SLURM configuration files are shown below:</p>
<pre>
# slurm.conf for Cray XT system of size 4x4x4
# Parameters removed here
SelectType=select/linear
NodeName=DEFAULT Procs=8 RealMemory=2048 State=Unknown
NodeName=tux[000x333] NodeAddr=front_end NodeHostname=front_end
PartitionName=debug Nodes=tux[000x333] Default=Yes State=UP
</pre>
<pre>
# slurm.conf for Cray XT system of size 2x2x2
# Parameters removed here
SelectType=select/linear
NodeName=DEFAULT Procs=8 RealMemory=2048 State=Unknown
NodeName=tux000 NodeAddr=front_end NodeHostname=front_end
NodeName=tux100 NodeAddr=front_end NodeHostname=front_end
NodeName=tux110 NodeAddr=front_end NodeHostname=front_end
NodeName=tux010 NodeAddr=front_end NodeHostname=front_end
NodeName=tux011 NodeAddr=front_end NodeHostname=front_end
NodeName=tux111 NodeAddr=front_end NodeHostname=front_end
NodeName=tux101 NodeAddr=front_end NodeHostname=front_end
NodeName=tux001 NodeAddr=front_end NodeHostname=front_end
PartitionName=debug Nodes=tux[000x111] Default=Yes State=UP
</pre>
<p>In both of the examples above, the node names output by the
<i>scontrol show nodes</i> command will be ordered as defined (sequentially
along the Hilbert curve or per the ordering in the <i>slurm.conf</i> file)
rather than in numeric order (e.g. "tux001" follows "tux101" rather
than "tux000" in the second example above).
SLURM partitions should contain nodes which are defined sequentially
by that ordering for optimal performance.</p>
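<p>To verify the ordering, you can list the node records in the order SLURM
maintains them (a sketch using standard commands):</p>
<pre>
$ scontrol show nodes | grep NodeName
</pre>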
<p class="footer"><a href="#top">top</a></p>

<p style="text-align:center;">Last modified 9 January 2009</p>

<!--#include virtual="footer.txt"-->