...

Running Parallel Calculations

Often we need to run parallel calculations that take advantage of several nodes on the cluster. This can be done with a few small modifications to the usual Slurm job submission. Instead of running the calculation directly with srun, we will use srun only to reserve the nodes we need for the calculation. Let's say we want to run a parallel calculation on 4 nodes.

First, we allocate the nodes for the calculation:

Code Block
languagebash
$ srun -n 4 -N 4 --pty bash

The "-n 4" option says we will be running 4 tasks. The "-N 4" option asks Slurm to allocate 4 nodes. And the "--pty bash" option says to give us an interactive bash shell.
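Once the shell starts, you can confirm which nodes Slurm allocated. A quick check (a sketch; the exact node names will depend on your allocation):

```bash
# Inside the interactive shell, Slurm exports the allocation details.
# SLURM_JOB_NODELIST holds the allocated nodes in compressed form.
$ echo $SLURM_JOB_NODELIST

# scontrol can expand the compressed list to one hostname per line.
$ scontrol show hostnames $SLURM_JOB_NODELIST
```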

After typing the above command, you will see a bash shell prompt on one of the compute nodes. You can now run your command in one of several ways, including with the OpenHPC "prun" command, which will execute the "mpirun" command for you. We can see exactly what prun does by passing it the "-v" option:

Code Block
languagebash
$ prun -v a.out
[prun] Master compute host = dn1
[prun] Resource manager = slurm
[prun] Setting env variable: OMPI_MCA_mca_base_component_show_load_errors=0
[prun] Setting env variable: PMIX_MCA_mca_base_component_show_load_errors=0
[prun] Setting env variable: OMPI_MCA_ras=^tm
[prun] Setting env variable: OMPI_MCA_ess=^tm
[prun] Setting env variable: OMPI_MCA_plm=^tm
[prun] Setting env variable: OMPI_MCA_io=romio314
[prun] Launch cmd = mpirun a.out (family=openmpi3)

 Hello, world (4 procs total)
    --> Process #   0 of   4 is alive. -> dn1
    --> Process #   1 of   4 is alive. -> rb2u1
    --> Process #   2 of   4 is alive. -> rb2u2
    --> Process #   3 of   4 is alive. -> rb2u4

 
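If you do not need an interactive shell, the same 4-node run can be submitted as a batch job instead. A minimal sketch of such a job script (the file name job.sh and the a.out executable are placeholders):

```bash
#!/bin/bash
#SBATCH -n 4        # run 4 tasks
#SBATCH -N 4        # spread across 4 nodes

# prun picks up the Slurm allocation and launches mpirun for us
prun a.out
```

Submit it with "sbatch job.sh"; by default, output is written to a slurm-&lt;jobid&gt;.out file in the directory you submitted from.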

Interactive Shells

You can allocate an interactive shell on a node for running calculations by hand.

Load any environment modules you need before running the srun command.

To just allocate a bash shell:

Code Block
languagebash
$ srun -n1 --pty bash

To allocate a shell with X11 forwarding so that you can use the rappture GUI:

Code Block
languagebash
$ module load rappture
$ srun -n1 --x11 --pty bash

If you did not connect to the head node with X11 forwarding enabled, you will see the following error:

Code Block
languagebash
$ srun -n1 --x11 --pty bash
srun: error: No DISPLAY variable set, cannot setup x11 forwarding.

 

Stopping a Job

scancel

scancel is used to cancel a pending or running job or job step. To do this we need the JOB ID for the calculation and the scancel command. The JOB ID can be determined using the squeue command described above. To cancel the job with ID=84, just type:

...