These are example scripts that can be run with any of
the accelerator packages in LAMMPS:

USER-CUDA, GPU, USER-INTEL, KOKKOS, USER-OMP, OPT

The easiest way to build LAMMPS with these packages
is via the src/Make.py tool described in Section 2.4
of the manual. You can also type "Make.py -h" to see
its options. The easiest way to run these scripts
is by using the appropriate "-sf" and "-pk" command-line
switches, as illustrated in the example run commands below.

Details on the individual accelerator packages
can be found in doc/Section_accelerate.html.

Build LAMMPS with one or more of the accelerator packages

The following command will invoke the src/Make.py tool with one of the
command-lines from the Make.list file:

../../src/Make.py -r Make.list target

target = one or more of the following:
  cpu, omp, opt
  cuda_double, cuda_mixed, cuda_single
  gpu_double, gpu_mixed, gpu_single
  intel_cpu, intel_phi
  kokkos_omp, kokkos_cuda, kokkos_phi

If successful, the build will produce the file lmp_target in this
directory.

Note that in addition to any accelerator packages, these packages also
need to be installed to run all of the example scripts: ASPHERE,
MOLECULE, KSPACE, RIGID.
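
For example, building one of the KOKKOS targets listed above would
look like this; per the lmp_target naming rule, it produces the
lmp_kokkos_omp executable used in the run commands below:

../../src/Make.py -r Make.list kokkos_omp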

These two targets will build a single LAMMPS executable with all the
CPU accelerator packages installed (USER-INTEL for CPU, KOKKOS for
OMP, USER-OMP, OPT) or all the GPU accelerator packages installed
(USER-CUDA, GPU, KOKKOS for CUDA):

target = all_cpu, all_gpu
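
A combined build is invoked the same way, e.g. the following command,
which by the same naming rule should produce lmp_all_cpu:

../../src/Make.py -r Make.list all_cpu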

Note that the Make.py commands in Make.list assume an MPI environment
exists on your machine and use mpicxx as the wrapper compiler with
whatever underlying compiler it wraps by default. If you add "-cc mpi
wrap=g++" or "-cc mpi wrap=icc" after the target, you can choose the
underlying compiler for mpicxx to invoke. E.g.

../../src/Make.py -r Make.list intel_cpu -cc mpi wrap=icc

You should do this for any build that includes the USER-INTEL
package, since it will perform best with the Intel compilers.

Note that for kokkos_cuda, it needs to be "-cc nvcc" instead of "mpi",
since a KOKKOS for CUDA build requires NVIDIA nvcc as the wrapper
compiler.
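
If you do specify the compiler explicitly for that target, the command
would look something like the following sketch, which simply mirrors
the intel_cpu example above; whether an additional wrap= setting is
needed depends on your MPI installation:

../../src/Make.py -r Make.list kokkos_cuda -cc nvcc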

Also note that the Make.py commands in Make.list use the default
FFT support which is via the KISS library. If you want to
build with another FFT library, e.g. FFTW3, then you can add
"-fft fftw3" after the target, e.g.

../../src/Make.py -r Make.list gpu -fft fftw3

For any build with USER-CUDA, GPU, or KOKKOS for CUDA, be sure to set
the arch=XX setting to the appropriate value for the GPUs and CUDA
environment on your system. What is defined in the Make.list file is
arch=21 for older Fermi GPUs. This can be overridden as follows,
e.g. for Kepler GPUs:

../../src/Make.py -r Make.list gpu_double -gpu mode=double arch=35

Running with each of the accelerator packages

All of the input scripts have a default problem size and number of
timesteps:

in.lj = LJ melt with cutoff of 2.5 = 32K atoms for 100 steps
in.lj.5.0 = same with cutoff of 5.0 = 32K atoms for 100 steps
in.phosphate = 11K atoms for 100 steps
in.rhodo = 32K atoms for 100 steps
in.lc = 33K atoms for 100 steps (after 200 steps of equilibration)

These can be reset using the x,y,z and t variables in the command
line. E.g. adding "-v x 2 -v y 2 -v z 4 -v t 1000" to any of the run
commands below would run a 16x larger problem (2x2x4) for 1000 steps.
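
For example, reusing one of the USER-OMP run commands shown below, a
16x larger LJ melt could be run for 1000 steps as:

mpirun -np 4 lmp_omp -sf omp -pk omp 1 -v x 2 -v y 2 -v z 4 -v t 1000 -in in.lj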

Here are example run commands using each of the accelerator packages:

** CPU (no accelerator package)

mpirun -np 4 lmp_cpu -in in.lj

** OPT package

lmp_opt -sf opt < in.lj
mpirun -np 4 lmp_opt -sf opt -in in.lj

** USER-OMP package

lmp_omp -sf omp -pk omp 1 < in.lj
mpirun -np 4 lmp_omp -sf omp -pk omp 1 -in in.lj   # 4 MPI, 1 thread/MPI
mpirun -np 2 lmp_omp -sf omp -pk omp 4 -in in.lj   # 2 MPI, 4 thread/MPI

** GPU package

lmp_gpu_double -sf gpu < in.lj
mpirun -np 8 lmp_gpu_double -sf gpu < in.lj                        # 8 MPI, 8 MPI/GPU
mpirun -np 12 lmp_gpu_double -sf gpu -pk gpu 2 < in.lj             # 12 MPI, 6 MPI/GPU
mpirun -np 4 lmp_gpu_double -sf gpu -pk gpu 2 tpa 8 < in.lj.5.0    # 4 MPI, 2 MPI/GPU

Note that when running in.lj.5.0 (which has a long cutoff) with the
GPU package, the "tpa" setting of the "-pk gpu" switch should be > 1
(e.g. 8) for best performance.

** USER-CUDA package

lmp_machine -c on -sf cuda < in.lj
mpirun -np 1 lmp_machine -c on -sf cuda < in.lj               # 1 MPI, 1 MPI/GPU
mpirun -np 2 lmp_machine -c on -sf cuda -pk cuda 2 < in.lj    # 2 MPI, 1 MPI/GPU

** KOKKOS package for OMP

lmp_kokkos_omp -k on t 1 -sf kk -pk kokkos neigh half < in.lj
mpirun -np 2 lmp_kokkos_omp -k on t 4 -sf kk < in.lj    # 2 MPI, 4 thread/MPI

Note that when running with just 1 thread/MPI, "-pk kokkos neigh half"
was specified to use half neighbor lists, which are faster when running
on a single thread.

** KOKKOS package for CUDA

lmp_kokkos_cuda -k on t 1 -sf kk < in.lj                    # 1 thread, 1 GPU
mpirun -np 2 lmp_kokkos_cuda -k on t 6 g 2 -sf kk < in.lj   # 2 MPI, 6 thread/MPI, 1 MPI/GPU

** KOKKOS package for PHI

mpirun -np 1 lmp_kokkos_phi -k on t 240 -sf kk -in in.lj    # 1 MPI, 240 threads/MPI
mpirun -np 30 lmp_kokkos_phi -k on t 8 -sf kk -in in.lj     # 30 MPI, 8 threads/MPI

** USER-INTEL package for CPU

lmp_intel_cpu -sf intel < in.lj
mpirun -np 4 lmp_intel_cpu -sf intel < in.lj              # 4 MPI
mpirun -np 4 lmp_intel_cpu -sf intel -pk omp 2 < in.lj    # 4 MPI, 2 thread/MPI

** USER-INTEL package for PHI

lmp_intel_phi -sf intel -pk intel 1 omp 16 < in.lc             # 1 MPI, 16 CPU thread/MPI, 1 Phi, 240 Phi thread/MPI
mpirun -np 4 lmp_intel_phi -sf intel -pk intel 1 omp 2 < in.lc # 4 MPI, 2 CPU threads/MPI, 1 Phi, 60 Phi thread/MPI

Note that there is currently no Phi support for pair_style lj/cut in
the USER-INTEL package.