Each calculation is assigned a JOBID, which can be used to cancel the job if necessary. The PARTITION field shows which partition (node class) the job is running in, as described above in the "sinfo" documentation. The NAME field gives the name of the program being run. The NODELIST field shows which node or nodes each calculation is running on, and the NODES field shows the number of nodes in use for that job.
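
Because the JOBID is the handle needed to cancel a job, it can be useful to pull it out of the squeue listing by program name. The sketch below works on a made-up listing saved to a temporary file. The job IDs, user names, program names, and partition name are invented for illustration, and the column order assumes the default squeue layout.

```shell
# Hypothetical squeue listing (all values invented for illustration).
sample=$(mktemp)
cat > "$sample" <<'EOF'
JOBID PARTITION NAME     USER  ST TIME NODES NODELIST(REASON)
84    batch     myprog   alice R  5:02 1     rb2u4
85    batch     hostname bob   R  0:01 2     rb2u[1-2]
EOF

# Skip the header line (NR > 1) and print the JOBID (column 1) of every
# job whose NAME (column 3) is "hostname".
jobid=$(awk 'NR > 1 && $3 == "hostname" { print $1 }' "$sample")
echo "$jobid"   # prints 85
rm -f "$sample"
```

On a live system the same filter could be fed from squeue directly, for example squeue -h -o "%i %P %j", where -h suppresses the header and %i, %P, and %j select the job ID, partition, and name.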

Starting a Job

srun

You can start a calculation or job directly from the command prompt by using srun. This command submits jobs to the Slurm job submission system and can also be used to start the same command on multiple nodes. srun has a wide variety of options for specifying resource requirements, including minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics such as memory and disk space.

srun Examples

Code Block
languagebash
$ srun -N4 /bin/hostname
rb2u4
dn1
rb2u1
rb2u2
 

In the example above, we use srun to start the command hostname on four nodes in the cluster. The option -N4 tells Slurm to run the job on four nodes of its choice. The output shows the name printed by the hostname command on each node that was used.
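
The resource options that srun accepts can be combined on one command line. The following is a sketch rather than a tested recipe for this cluster: the node names are borrowed from the example output above, the sizes are arbitrary, and run is a hypothetical helper that prints each command and executes it only where that command is installed.

```shell
# Hypothetical helper: print the command, then execute it only if the
# program is available on this machine.
run() {
    echo "+ $*"
    if command -v "$1" >/dev/null 2>&1; then "$@"; fi
}

# Between 2 and 4 nodes (-N min-max), 8 tasks in total (-n).
run srun -N 2-4 -n 8 /bin/hostname

# One node with at least 4096 MB of memory (--mem) and
# 10240 MB of temporary disk space (--tmp).
run srun -N 1 --mem=4096 --tmp=10240 /bin/hostname

# Two specific nodes (-w), named here only as an assumption.
run srun -N 2 -w rb2u1,rb2u2 /bin/hostname

# Any two nodes except dn1 (-x excludes nodes).
run srun -N 2 -x dn1 /bin/hostname
```

On the cluster itself the helper is unnecessary; each srun line can be typed at the prompt as shown after "run".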

Piping Data

With many calculations it is important to redirect data in (<) from an input file and redirect data out (>) to an output file. The program may also take command line options:

Code Block
languagebash
$ program [options] < input.dat > output.dat

One of the nice features of srun is that it preserves this ability to redirect input and output. Just remember that any options placed directly after srun, such as -N, are interpreted by srun, while any options placed after your program name are passed to the program. The redirections are handled by the shell, and srun forwards the input to your program and returns its output.
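
The split between srun options, program options, and redirection can be sketched with a standard tool standing in for the calculation. Here sort plays the role of the program and its -r flag the role of a program option. The file contents are invented, and the fallback to running without srun is an assumption added so the sketch also works on a machine without Slurm.

```shell
# Work in a scratch directory with a small, made-up input file.
workdir=$(mktemp -d)
printf 'b\na\nc\n' > "$workdir/input.dat"

# Use srun when it is installed; otherwise run the program directly.
if command -v srun >/dev/null 2>&1; then
    launcher="srun -N1"
else
    launcher=""
fi

# -N1 is read by srun, -r is read by sort, and the shell sets up
# the < and > redirections before either of them starts.
$launcher sort -r < "$workdir/input.dat" > "$workdir/output.dat"

cat "$workdir/output.dat"
```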

Dealing with Batch Files (-b option for srun)

 

Running Parallel Calculations

 

Interactive Shells

Stopping a Job

scancel

scancel is used to cancel a pending or running job or job step. To do this we need the JOBID of the calculation, which can be determined using the squeue command described above. To cancel the job with JOBID 84, just type:

Code Block
languagebash
$ scancel 84

 

If you rerun squeue, you will see that the job is gone.
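
Cancelling and checking can be wrapped into one step. The function below is a minimal sketch: cancel_and_check is a hypothetical name, and the branch that only prints the command is an assumption added for machines where Slurm is not installed.

```shell
# Cancel a job, then list whatever is left of it in the queue.
cancel_and_check() {
    jobid="$1"
    if ! command -v scancel >/dev/null 2>&1; then
        echo "scancel not found; would run: scancel $jobid"
        return 0
    fi
    scancel "$jobid"
    # -h suppresses the header; -j restricts the listing to this job.
    squeue -h -j "$jobid"
}

cancel_and_check 84
```

A job may remain listed for a short time while it is shutting down, so an entry right after cancelling does not necessarily mean the cancel failed. scancel also accepts -u followed by a user name to cancel all of that user's jobs.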