...

  • It can run parallel calculations by allocating a set of nodes specifically for that task.
  • Users running serial jobs with minimal input/output can start the jobs directly from the command line without resorting to batch files (see the sketch after this list).
  • Slurm can handle batch jobs as well.
  • Slurm allows users to run interactive command-line or X11 jobs.
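For example, a short serial calculation can be started directly with srun. This is a minimal sketch; the executable name my_program is a hypothetical placeholder:

Code Block
languagebash
# Launch a single serial task on a compute node (my_program is a placeholder)
$ srun -n1 ./my_program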

 


Node Status

sinfo

sinfo reports the state of partitions and nodes managed by Slurm. Example output:
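The listing below is illustrative only; the partition name, node count, and node list are hypothetical and will differ on your cluster:

Code Block
languagebash
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*      up   infinite      4   idle dn1,rb2u[1-2,4]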

...

Code Block
languagebash
$ prun -v a.out
[prun] Master compute host = dn1
[prun] Resource manager = slurm
[prun] Setting env variable: OMPI_MCA_mca_base_component_show_load_errors=0
[prun] Setting env variable: PMIX_MCA_mca_base_component_show_load_errors=0
[prun] Setting env variable: OMPI_MCA_ras=^tm
[prun] Setting env variable: OMPI_MCA_ess=^tm
[prun] Setting env variable: OMPI_MCA_plm=^tm
[prun] Setting env variable: OMPI_MCA_io=romio314
[prun] Launch cmd = mpirun a.out (family=openmpi3)

 Hello, world (4 procs total)
    --> Process #   0 of   4 is alive. -> dn1
    --> Process #   1 of   4 is alive. -> rb2u1
    --> Process #   2 of   4 is alive. -> rb2u2
    --> Process #   3 of   4 is alive. -> rb2u4

...


Interactive Shells

You can allocate an interactive shell on a node for running calculations by hand.
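A minimal sketch of such an allocation (assuming the default partition is acceptable); the --pty option attaches your terminal to a bash shell on the allocated node:

Code Block
languagebash
# Request one task and attach an interactive bash shell to it
$ srun -n1 --pty bash

# When you are finished, exit the shell to release the allocation
$ exit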

...

To allocate a shell with X11 forwarding (Option 1) so that you can use the rappture GUI:

Code Block
languagebash
$ module load rappture
$ srun -n1 --x11 --pty bash

To allocate a shell with X11 forwarding (Option 2, which may work better than Option 1 depending on the software, but is slightly more complicated to use):

Code Block
languagebash
# With this method, you end up in two subshells underneath your main nanolab login:
# nanolab main login -> salloc subshell on nanolab -> subshell via ssh -Y on the allocated node

# Do not load modules until you have connected to the allocated node, as exemplified below:

$ salloc -n1

# You will see output such as:

salloc: Granted job allocation 18820
salloc: Waiting for resource configuration
salloc: Nodes <nodename> are ready for job

$ ssh -Y <nodename_from_above>

# Now load your module
$ module load rappture 

# Now run your software

# If you forget the -Y option to ssh, you will see an error such as:
xterm: Can't open display:
xterm: DISPLAY is not set

# When you are done, you will have to exit multiple times.
# First, exit the node that was allocated to you.
# Then exit the salloc subshell, which will print output such as:
salloc: Relinquishing job allocation 18820

# Then exit a third time from nanolab itself.

If you did not connect to the head node with X11 forwarding enabled and are using Option 1, you will see the following error:

Code Block
languagebash
$ srun -n1 --x11 --pty bash
srun: error: No DISPLAY variable set, cannot setup x11 forwarding.
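To avoid this, enable X11 forwarding when you first connect to the head node. A minimal sketch; the login name and host address are hypothetical placeholders:

Code Block
languagebash
# Connect to the head node with X11 forwarding enabled (-Y); <username> and the
# host name are placeholders - use the address you normally log in with
$ ssh -Y <username>@nanolab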

...

Code Block
languagebash
$ sbatch job.mpi
Submitted batch job 339
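Once submitted, the job can be monitored with squeue. The listing below is illustrative; the partition, user, node, and timing values are hypothetical:

Code Block
languagebash
$ squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               339    normal  job.mpi   <user>  R       0:12      2 rb2u[1-2]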

...


Stopping a Job

scancel

scancel is used to cancel a pending or running job or job step. To do this, you need the job ID for the calculation, which can be found with the squeue command described above. To cancel the job with ID 84, type:

Code Block
languagebash
$ scancel 84

 


If you rerun squeue, you will see that the job is gone.
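For example, only the header line remains once the job has left the queue:

Code Block
languagebash
$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)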