Table of contents:
- How do I run jobs under SLURM?
- Will Open MPI support "srun -n X my_mpi_application"?
- I use SLURM on a cluster with the OpenFabrics network stack. Do I need to do anything special?
| 1. How do I run jobs under SLURM? |
The short answer is just to use mpirun as normal.
The longer answer is that Open MPI supports launching parallel jobs in
two of the three methods that SLURM supports:
- Launching via "salloc ...": supported (older versions of SLURM used "srun -A ...")
- Launching via "sbatch ...": supported (older versions of SLURM used "srun -B ...")
- Launching via "srun -n X my_mpi_application": not supported
Specifically, you need to launch Open MPI's mpirun in an interactive
SLURM allocation (via the salloc command) or you need to submit a
script to SLURM (via the sbatch command). Open MPI does not yet
support "direct" launching of MPI executables via srun.
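Because mpirun has to be started from inside a SLURM allocation, a launch script can check for one before calling mpirun. The following is a minimal sketch and not part of Open MPI: the function name require_slurm_allocation is hypothetical, and it assumes that SLURM exports the SLURM_JOB_ID environment variable inside jobs started via salloc or sbatch.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when running inside a SLURM allocation.
# Assumption: SLURM sets SLURM_JOB_ID for jobs started via salloc or sbatch.
require_slurm_allocation() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "Not inside a SLURM allocation; use salloc or sbatch first." >&2
        return 1
    fi
    echo "Running inside SLURM job ${SLURM_JOB_ID}"
}

# Intended use in a launch script (left commented out so that sourcing
# this sketch has no side effects):
# require_slurm_allocation && mpirun my_mpi_application
```

If the check fails, the script stops before mpirun is invoked, which avoids accidentally starting the job outside of SLURM's control.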
Open MPI automatically obtains both the list of hosts and how many
processes to start on each host from SLURM directly. Hence, it is
unnecessary to specify the --hostfile, --host, or -np options to
mpirun. Open MPI will also use SLURM-native mechanisms to launch
and kill processes (rsh and/or ssh are not required).
For example:
# Allocate a SLURM job with 4 nodes
shell$ salloc -N 4 sh
# Now run an Open MPI job on all the nodes allocated by SLURM
# (Note that you need to specify -np for the 1.0 and 1.1 series;
# the -np value is inferred directly from SLURM starting with the
# v1.2 series)
shell$ mpirun my_mpi_application
|
This will run 4 MPI processes on the nodes that were allocated by
SLURM. Equivalently, you can do this:
# Allocate a SLURM job with 4 nodes and run your MPI application in it
shell$ salloc -N 4 mpirun my_mpi_application
|
Or, if submitting a script:
shell$ cat my_script.sh
#!/bin/sh
mpirun my_mpi_application
shell$ sbatch -N 4 my_script.sh
srun: jobid 1234 submitted
shell$
|
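Since mpirun takes the host list and process counts from SLURM rather than from --hostfile, --host, or -np, it can help to print what the allocation looks like before launching. The following is a minimal sketch, assuming the SLURM environment variables SLURM_JOB_NODELIST, SLURM_NNODES, and SLURM_NTASKS are available inside the job. The function name show_slurm_allocation is hypothetical, and the sketch does not claim to show the mechanism Open MPI uses internally.

```shell
#!/bin/sh
# Hypothetical helper: print the allocation details that SLURM exports.
# Any variable that SLURM did not set is reported as "unset".
show_slurm_allocation() {
    printf 'node list:  %s\n' "${SLURM_JOB_NODELIST:-unset}"
    printf 'node count: %s\n' "${SLURM_NNODES:-unset}"
    printf 'task count: %s\n' "${SLURM_NTASKS:-unset}"
}

# Intended use inside a salloc shell or an sbatch script (commented out
# so that sourcing this sketch has no side effects):
# show_slurm_allocation
# mpirun my_mpi_application
```

Printing these values at the top of a batch script leaves a record in the job output of which nodes the run was given.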
| 2. Will Open MPI support "srun -n X my_mpi_application"? |
Yes, it is on the to-do list, but it has not yet risen high enough
in priority to be implemented.
| 3. I use SLURM on a cluster with the OpenFabrics network stack. Do I need to do anything special? |
Yes. You need to ensure that SLURM sets up the locked memory
limits properly. See the FAQ entry about locked memory and the
FAQ entry with references about SLURM.
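One way to see which locked memory limit a process actually receives is to query it with the shell's ulimit builtin. The following is a minimal sketch under a stated assumption: it treats "unlimited" as the desired setting, which is a common recommendation for OpenFabrics networks but is not spelled out in this entry. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given locked-memory limit
# (as printed by "ulimit -l") is the string "unlimited".
locked_memory_is_unlimited() {
    [ "$1" = "unlimited" ]
}

# Hypothetical helper: report the limit seen by the current shell.
# Assumption: "unlimited" is the setting you want for OpenFabrics.
check_locked_memory() {
    limit=$(ulimit -l)
    if locked_memory_is_unlimited "$limit"; then
        echo "locked memory limit: unlimited"
    else
        echo "locked memory limit is '${limit}', not unlimited" >&2
        return 1
    fi
}
```

Running check_locked_memory from a script that mpirun launches inside the allocation shows the limit that SLURM-launched processes get, which can differ from the limit in your login shell.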