...

Each calculation is assigned a JOBID, which can be used to cancel the job if necessary. The PARTITION field refers to the node-class partitions described above in the "sinfo" documentation. The NAME field gives the name of the program being used for the calculation. The NODELIST field shows which node(s) each calculation is running on, and the NODES field shows the number of nodes in use for that job.
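
As a rough illustration of how these columns can be used from a script, the sketch below filters squeue-style text by partition and prints the matching JOBIDs. It assumes the default squeue column order (JOBID, PARTITION, NAME, USER, ST, TIME, NODES, NODELIST); the helper name jobids_in_partition and the partition name "normal" are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the JOBID of every job in a given partition.
# Reads squeue-style text on stdin and assumes the default column order,
# where column 1 is JOBID and column 2 is PARTITION.
# NR > 1 skips the header row.
jobids_in_partition() {
    awk -v part="$1" 'NR > 1 && $2 == part { print $1 }'
}

# Typical use on a cluster ("normal" is a placeholder partition name):
#   squeue -u "$USER" | jobids_in_partition normal
```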

Starting a Job

There are several ways to start a job, and they all accomplish essentially the same thing. A quick overview:

sbatch and salloc allocate resources to the job, while srun launches parallel tasks across those resources. When invoked within a job allocation, srun will launch parallel tasks across some or all of the allocated resources. In that case, srun inherits by default the pertinent options of the sbatch or salloc which it runs under. You can then (usually) provide srun different options which will override what it receives by default. Each invocation of srun within a job is known as a job step.

srun can also be invoked outside of a job allocation. In that case, srun requests resources, and when those resources are granted, launches tasks across those resources as a single job and job step.
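
To make the job-step idea concrete, here is a minimal sketch of a batch script that runs two job steps. The resource numbers and the use of hostname as the task are illustrative assumptions, and the launch wrapper is a hypothetical convenience that only prints the command when srun is not installed, so the script can be dry-run off the cluster.

```shell
#!/bin/bash
#SBATCH -N 2              # Allocation: 2 nodes
#SBATCH -n 16             # Allocation: 16 tasks in total

# Hypothetical wrapper: call srun when it is available, otherwise just print
# the command that would have been run (a dry run off the cluster).
launch() {
    if command -v srun >/dev/null 2>&1; then
        srun "$@"
    else
        echo "srun $*"
    fi
}

# Job step 0: no options given, so srun inherits -N 2 -n 16 from the allocation.
launch hostname

# Job step 1: explicit options override the inherited ones for this step only.
launch -N 1 -n 4 hostname
```

Inside a real allocation, each launch call becomes one job step; calling srun directly works the same way once the dry-run fallback is no longer needed.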

srun

You can start a calculation/job directly from the command prompt by using srun. This command submits jobs to the Slurm job submission system and can also be used to start the same command on multiple nodes. srun has a wide variety of options for specifying resource requirements, including minimum and maximum node count, processor count, specific nodes to use or avoid, and specific node characteristics such as memory and disk space.

...

Code Block
languagebash
$ srun -n1 --x11 --pty bash
srun: error: No DISPLAY variable set, cannot setup x11 forwarding.
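
The error above appears when the DISPLAY environment variable is unset in the shell that calls srun, for example after logging in without X11 forwarding. A minimal sketch of a guard follows, assuming you connect to the cluster via SSH; the function name x11_shell is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: start an interactive X11-enabled shell only when
# DISPLAY is set; otherwise explain the likely cause and return non-zero.
x11_shell() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; try reconnecting with 'ssh -X' (or 'ssh -Y')." >&2
        return 1
    fi
    srun -n1 --x11 --pty bash
}
```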

sbatch Examples

With sbatch, your script file contains special #SBATCH directives describing your job requirements. Here is an example:

Code Block
languagebash
#!/bin/bash
 
#SBATCH -J test           # job name
#SBATCH -o job.%j.out     # Name of standard output file (%j expands to the job ID)
#SBATCH -N 2              # Number of nodes requested
#SBATCH -n 16             # Total number of MPI tasks requested
#SBATCH -t 01:30:00       # Run time (hh:mm:ss)
 
# Launch MPI-based executable
prun ./a.out

You would then submit the above batch execution script with the sbatch command:

Code Block
languagebash
$ sbatch job.mpi
Submitted batch job 339
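
If a script needs the job ID afterwards (for example, to pass it to scancel), it can be extracted from the line that sbatch prints. The following is a minimal sketch; job_id_from_sbatch is a hypothetical helper name, and it assumes the output format shown above.

```shell
#!/bin/bash
# Hypothetical helper: pull the numeric job ID out of sbatch's confirmation
# line, e.g. "Submitted batch job 339". Prints nothing if the line does not
# match that format.
job_id_from_sbatch() {
    sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}

# Typical use on a cluster:
#   jobid=$(sbatch job.mpi | job_id_from_sbatch)
#   echo "Job ID: $jobid"
```

sbatch also accepts a --parsable option that prints only the job ID (followed by a semicolon and the cluster name on multi-cluster setups), which avoids the parsing step altogether.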

 

Stopping a Job

scancel

scancel cancels a pending or running job or job step. To use it, you need the JOBID of the calculation, which can be found with the squeue command described above. To cancel the job with ID=84, just type:

...