...

  • Slurm can run parallel calculations by allocating a set of nodes specifically for that task.
  • Users running serial jobs with minimal input/output can start their jobs directly from the command line without resorting to batch files (see the sketch after this list).
  • Slurm can also handle batch jobs.
  • Slurm allows users to run interactive command-line or X11 jobs.
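For instance, a serial executable can be launched on a single core straight from the command line (a minimal sketch; ./my_calc is a hypothetical program standing in for your own executable):

Code Block
languagebash
# Hypothetical serial executable run on one core via srun
$ srun -n1 ./my_calc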

 


Node Status

sinfo

sinfo reports the state of partitions and nodes managed by Slurm. Example output:

...

srun can also be invoked outside of a job allocation. In that case, srun requests resources, and when those resources are granted, launches tasks across those resources as a single job and job step.

Load any environment modules before the srun/sbatch/salloc commands. These commands will copy your environment as it is at job submission time.
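For example (a sketch using the rappture module that appears later on this page and the job.mpi script from the sbatch example below; substitute whatever modules and script your job actually needs):

Code Block
languagebash
# Load modules first; the submitted job inherits this environment
$ module load rappture
$ sbatch job.mpi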

srun

You can start a calculation/job directly from the command prompt by using srun. This command submits jobs to the Slurm job submission system and can also be used to start the same command on multiple nodes. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or avoid, and specific node characteristics such as memory and disk space.
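A sketch of such a request (the flags shown are standard Slurm options; ./a.out stands in for your own executable):

Code Block
languagebash
# Ask for 2 nodes and 8 tasks total, with 4 GB of memory per node
$ srun -N 2 -n 8 --mem=4G ./a.out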

...

Code Block
languagebash
$ prun -v a.out
[prun] Master compute host = dn1
[prun] Resource manager = slurm
[prun] Setting env variable: OMPI_MCA_mca_base_component_show_load_errors=0
[prun] Setting env variable: PMIX_MCA_mca_base_component_show_load_errors=0
[prun] Setting env variable: OMPI_MCA_ras=^tm
[prun] Setting env variable: OMPI_MCA_ess=^tm
[prun] Setting env variable: OMPI_MCA_plm=^tm
[prun] Setting env variable: OMPI_MCA_io=romio314
[prun] Launch cmd = mpirun a.out (family=openmpi3)

 Hello, world (4 procs total)
    --> Process #   0 of   4 is alive. -> dn1
    --> Process #   1 of   4 is alive. -> rb2u1
    --> Process #   2 of   4 is alive. -> rb2u2
    --> Process #   3 of   4 is alive. -> rb2u4

...


Interactive Shells

You can allocate an interactive shell on a node for running calculations by hand.

...

Code Block
languagebash
$ srun -n1 --pty bash

To allocate a shell with X11 forwarding (Option 1) so that you can use the rappture GUI:

Code Block
languagebash
$ module load rappture
$ srun -n1 --x11 --pty bash

To allocate a shell with X11 forwarding (Option 2, which may work better than Option 1 depending on the software, but is slightly more involved):

Code Block
languagebash
# With this method, you end up in two subshells underneath your main nanolab login:
# nanolab main login -> salloc subshell on nanolab -> subshell via ssh -Y on the allocated node

# Do not load modules until you have connected to the allocated node, as exemplified below:

$ salloc -n1

# You will see output such as:

salloc: Granted job allocation 18820
salloc: Waiting for resource configuration
salloc: Nodes <nodename> are ready for job

$ ssh -Y <nodename_from_above>

# Now load your module
$ module load rappture 

# Now run your software

# If you forget the -Y option to ssh, you will see an error such as:
xterm: Can't open display:
xterm: DISPLAY is not set

# When exiting, you will have to exit more than once.
# First, exit the node that was allocated to you.
# Then exit the salloc command, which will print output such as:
salloc: Relinquishing job allocation 18820

# Finally, exit nanolab itself.

If you are using Option 1 and did not connect to the head node with X11 forwarding enabled, you will see the following error:

Code Block
languagebash
$ srun -n1 --x11 --pty bash
srun: error: No DISPLAY variable set, cannot setup x11 forwarding.

sbatch Examples

Load any environment modules before using the sbatch command to submit your job.

With sbatch, your script file contains special #SBATCH directives detailing your job requirements. Here is an example:

Code Block
languagebash
#!/bin/bash
 
#SBATCH -J test           # job name
#SBATCH -o job.%j.out     # Name of standard output file (%j expands to jobId)
#SBATCH -N 2              # Number of nodes requested
#SBATCH -n 16             # Total number of mpi tasks requested
#SBATCH -t 01:30:00       # Run time (hh:mm:ss)
 
# Launch MPI-based executable
prun ./a.out

...

Code Block
languagebash
$ sbatch job.mpi
Submitted batch job 339

...


Stopping a Job

scancel

scancel is used to cancel a pending or running job or job step. To do this you need the job ID of the calculation, which can be determined with the squeue command described above. To cancel the job with ID 84, type:

Code Block
languagebash
$ scancel 84

 


If you rerun squeue, you will see that the job is gone.
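For example, listing only your own jobs with the standard -u option (a minimal sketch; the cancelled job should no longer appear):

Code Block
languagebash
# Show only your own jobs; job 84 should no longer be listed
$ squeue -u $USER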