<H4>5.3.1 USER-CUDA package
</H4>
<P>The USER-CUDA package was developed by Christian Trott (Sandia) while
at the Ilmenau University of Technology in Germany. It provides NVIDIA GPU
versions of many pair styles, many fixes, a few computes, and long-range
Coulombics via the PPPM command. It has the following general
features:
<UL><LI>The package is designed to allow an entire LAMMPS calculation, for
many timesteps, to run entirely on the GPU (except for inter-processor
MPI communication), so that atom-based data (e.g. coordinates, forces)
do not have to move back and forth between the CPU and GPU.

<LI>The speed-up from this approach is typically larger when the
number of atoms per GPU is large.

<LI>Data stays on the GPU until a timestep where a non-USER-CUDA fix
or compute is invoked. Whenever a non-GPU operation occurs (fix,
compute, output), data automatically moves back to the CPU as needed.
This may incur a performance penalty, but should otherwise work.

<LI>Neighbor lists are constructed on the GPU.

<LI>The package only supports use of a single MPI task, running on a
single CPU (core), assigned to each GPU.
</UL>
<P>Here is a quick overview of how to use the USER-CUDA package:

<UL><LI>build the library in lib/cuda for your GPU hardware with the desired precision
<LI>include the USER-CUDA package and build LAMMPS
<LI>use the mpirun command to specify 1 MPI task per GPU (on each node)
<LI>enable the USER-CUDA package via the "-c on" command-line switch
<LI>specify the number of GPUs per node
<LI>use USER-CUDA styles in your input script
</UL>
<P>The last two steps can be done using the "-pk cuda" and "-sf cuda"
<A HREF = "Section_start.html#start_7">command-line switches</A>, respectively.
Alternatively, the effect of the "-pk" or "-sf" switch can be duplicated by
adding the <A HREF = "package.html">package cuda</A> or <A HREF = "suffix.html">suffix cuda</A> command,
respectively, to your input script.
<P><B>Required hardware/software:</B>

<P>To use this package, you need one or more NVIDIA GPUs and
the NVIDIA CUDA software installed on your system.

<P>Your NVIDIA GPU needs to support Compute Capability 1.3 or higher. This
list may help you find the Compute Capability of your card:

<P>http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units

<P>Install the NVIDIA CUDA Toolkit (version 3.2 or higher) and the
corresponding GPU drivers. The NVIDIA CUDA SDK is not required, but
we recommend installing it as well, so that you can verify that its sample
projects compile without problems.
<P><B>Building LAMMPS with the USER-CUDA package:</B>

<P>This requires two steps (a,b): build the USER-CUDA library, then build
LAMMPS with the USER-CUDA package.
<P>You can do both of these steps with a single command, using the src/Make.py script,
described in <A HREF = "Section_start.html#start_4">Section 2.4</A> of the manual.
Type "Make.py -h" for help. If run from the src directory, this
command will create src/lmp_cuda using src/MAKE/Makefile.mpi as the
starting Makefile.machine:

<PRE>Make.py -p cuda -cuda mode=single arch=20 -o cuda lib-cuda file mpi
</PRE>
<P>Or you can follow these two (a,b) steps:

<P>(a) Build the USER-CUDA library

<P>The USER-CUDA library is in lammps/lib/cuda. If your <I>CUDA</I> toolkit
is not installed in the default system directory <I>/usr/local/cuda</I>, edit
the file <I>lib/cuda/Makefile.common</I> accordingly.

<P>To build the library with the settings in <I>lib/cuda/Makefile.defaults</I>,
type "make" with no options.
<P>To set options when the library is built, type "make OPTIONS", where
<I>OPTIONS</I> are one or more of the following. The settings will be
written to <I>lib/cuda/Makefile.defaults</I> before the build.
<PRE><I>precision=N</I> to set the precision level
  N = 1 for single precision (default)
  N = 2 for double precision
  N = 3 for positions in double precision
  N = 4 for positions and velocities in double precision
<I>arch=M</I> to set GPU compute capability
  M = 35 for Kepler GPUs
  M = 20 for CC2.0 (GF100/110, e.g. C2050, GTX580, GTX470) (default)
  M = 21 for CC2.1 (GF104/114, e.g. GTX560, GTX460, GTX450)
  M = 13 for CC1.3 (GT200, e.g. C1060, GTX285)
<I>prec_timer=0/1</I> to use hi-precision timers
  0 = do not use them (default)
  1 = use them
  this is usually only useful for Mac machines
<I>dbg=0/1</I> to activate debug mode
  0 = no debug mode (default)
  1 = debug mode
  this is only useful for developers
<I>cufft=1</I> for use of the CUDA FFT library
  0 = no CUFFT support (default)
  in the future other CUDA-enabled FFT libraries might be supported
</PRE>
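<P>As an illustration of the options above, the following dry-run sketch
assembles a "make" command line for a double-precision build on a Kepler GPU
with CUFFT enabled. The chosen values are only an example, and the commands are
printed rather than executed so they can be checked first. A "make clean" is
included because changing options requires a full rebuild of the library.

```shell
# Dry-run sketch: assemble a "make OPTIONS" line for lib/cuda.
# The values below are illustrative choices, not recommendations.
precision=2   # 2 = double precision
arch=35       # 35 = Kepler GPUs
cufft=1       # 1 = use the CUDA FFT library

make_cmd="make precision=$precision arch=$arch cufft=$cufft"

# Print the commands instead of running them.
echo "cd lib/cuda"
echo "make clean"
echo "$make_cmd"
```

<P>Once the printed commands look right, they can be typed by hand in the
lammps directory.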
<P>If the build is successful, it will produce the files liblammpscuda.a and
<P>Note that if you change any of the options (like precision), you need
to re-build the entire library. Do a "make clean" first, then build
the library again.
<P>(b) Build LAMMPS with the USER-CUDA package

<P>No additional compile/link flags are needed in Makefile.machine.
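<P>A rough sketch of step (b) is shown below as a dry run that prints the
commands instead of executing them. It assumes the usual LAMMPS
"make yes-PACKAGE" convention for installing a package, and it assumes "mpi" as
the machine name (matching the src/MAKE/Makefile.mpi file mentioned earlier);
adjust both to your own setup.

```shell
# Dry-run sketch of step (b): install the package, then build LAMMPS.
# "mpi" is an assumed machine name; use the suffix of your own
# src/MAKE/Makefile.machine file.
machine=mpi

step_b="cd lammps/src
make yes-user-cuda
make $machine"

# Print the commands so they can be reviewed before being run by hand.
printf '%s\n' "$step_b"
```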
<P>Note that if you change the USER-CUDA library precision (discussed
above) and rebuild the USER-CUDA library, then you also need to
re-install the USER-CUDA package and re-build LAMMPS, so that all
affected files are re-compiled and linked to the new USER-CUDA
library.
<P><B>Run with the USER-CUDA package from the command line:</B>

<P>The mpirun or mpiexec command sets the total number of MPI tasks used
by LAMMPS (one or multiple per compute node) and the number of MPI
tasks used per node. E.g. the mpirun command in MPICH does this via
its -np and -ppn switches. OpenMPI does the same via -np and -npernode.

<P>When using the USER-CUDA package, you must use exactly one MPI task
per GPU.
<P>You must use the "-c on" <A HREF = "Section_start.html#start_7">command-line
switch</A> to enable the USER-CUDA package.
The "-c on" switch also issues a default <A HREF = "package.html">package cuda 1</A>
command, which sets various USER-CUDA options to default values, as
discussed on the <A HREF = "package.html">package</A> command doc page.
<P>Use the "-sf cuda" <A HREF = "Section_start.html#start_7">command-line switch</A>,
which will automatically append "cuda" to styles that support it. Use
the "-pk cuda Ng" <A HREF = "Section_start.html#start_7">command-line switch</A> to
set Ng = number of GPUs per node to a different value than the default set
by the "-c on" switch (1 GPU), or to change other <A HREF = "package.html">package
cuda</A> options.
<PRE>lmp_machine -c on -sf cuda -pk cuda 1 -in in.script                         # 1 MPI task uses 1 GPU
mpirun -np 2 lmp_machine -c on -sf cuda -pk cuda 2 -in in.script            # 2 MPI tasks use 2 GPUs on a single 16-core (or whatever) node
mpirun -np 24 -ppn 2 lmp_machine -c on -sf cuda -pk cuda 2 -in in.script    # same layout on each of 12 16-core nodes
</PRE>
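<P>The task counts in these examples follow from the rule of one MPI task per
GPU: the total number of tasks is the number of nodes times the number of GPUs
per node. The sketch below works this out for a multi-node run and prints the
resulting launch line without executing it. The node count, executable name,
and input file name are illustrative assumptions.

```shell
# Dry-run sketch: build the launch command for a multi-node USER-CUDA run.
# All values below are illustrative assumptions, not LAMMPS defaults.
nodes=12            # number of compute nodes
gpus_per_node=2     # GPUs available on each node
exe=lmp_machine     # name of your LAMMPS executable
input=in.script     # your input script

# USER-CUDA requires exactly one MPI task per GPU.
ntasks=$((nodes * gpus_per_node))

# -ppn is the MPICH spelling; OpenMPI uses -npernode instead.
cmd="mpirun -np $ntasks -ppn $gpus_per_node $exe -c on -sf cuda -pk cuda $gpus_per_node -in $input"

# Print the command rather than running it, so it can be inspected first.
echo "$cmd"
```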
<P>The syntax for the "-pk" switch is the same as for the "package
cuda" command. See the <A HREF = "package.html">package</A> command doc page for
details, including the default values used for all its options if it
is not specified.
<P>Note that the default for the <A HREF = "package.html">package cuda</A> command is
to set the Newton flag to "off" for both pairwise and bonded
interactions. This typically gives the fastest performance. If the
<A HREF = "newton.html">newton</A> command is used in the input script, it can
override these defaults.
<P><B>Or run with the USER-CUDA package by editing an input script:</B>

<P>The discussion above of the mpirun/mpiexec command and the requirement
of one MPI task per GPU applies here as well.

<P>You must still use the "-c on" <A HREF = "Section_start.html#start_7">command-line
switch</A> to enable the USER-CUDA package.
<P>Use the <A HREF = "suffix.html">suffix cuda</A> command, or explicitly add a
"cuda" suffix to individual styles in your input script, e.g.

<PRE>pair_style lj/cut/cuda 2.5
</PRE>
<P>You only need to use the <A HREF = "package.html">package cuda</A> command if you
wish to change any of its option defaults, including the number of
GPUs/node (default = 1), as set by the "-c on" <A HREF = "Section_start.html#start_7">command-line
switch</A>.
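<P>To make the input-script route concrete, the sketch below prints the two
input-script lines that would stand in for "-pk cuda 2 -sf cuda", followed by a
matching launch line that keeps only "-c on". The GPU count, executable name,
and input file name are illustrative assumptions, and where the "package" line
may appear in an input script is governed by the package command doc page.

```shell
# Sketch: input-script lines that replace the "-pk cuda 2 -sf cuda" switches.
# The "-c on" switch is still required on the command line.
gpus_per_node=2   # illustrative value, not a LAMMPS default

header="package cuda $gpus_per_node
suffix cuda"

# Print the lines so they can be pasted into an input script.
printf '%s\n' "$header"

# Matching launch line (dry run): -pk and -sf are no longer needed.
launch="mpirun -np $gpus_per_node lmp_machine -c on -in in.script"
echo "$launch"
```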
<P><B>Speed-ups to expect:</B>

<P>The performance of a GPU versus a multi-core CPU is a function of your
hardware, which pair style is used, the number of atoms/GPU, and the
precision used on the GPU (double, single, mixed).

<P>See the <A HREF = "http://lammps.sandia.gov/bench.html">Benchmark page</A> of the
LAMMPS web site for performance of the USER-CUDA package on different
hardware.
<P><B>Guidelines for best performance:</B>

<UL><LI>The USER-CUDA package offers more speed-up relative to CPU performance
when the number of atoms per GPU is large, e.g. on the order of tens
or hundreds of thousands.

<LI>As noted above, this package will continue to run a simulation
entirely on the GPU(s) (except for inter-processor MPI communication),
for multiple timesteps, until a CPU calculation is required, either by
a fix or compute that is not GPU-enabled, or until output is performed
(thermo or dump snapshot or restart file). The less often this
occurs, the faster your simulation will run.
</UL>
<P><B>Restrictions:</B>