
Overview

All user processes must be run as cluster jobs in the cluster job queuing system, slurm.


Slurm provides an easy and transparent way to streamline calculations on the cluster and make cluster use more efficient. Slurm offers several user-friendly features:

  • It can run parallel calculations by allocating a dedicated set of nodes for the task.
  • Users running serial jobs with minimal input/output can start them directly from the command line without writing batch files.
  • Slurm handles batch jobs as well.
  • Slurm allows users to run interactive command-line or X11 jobs.
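As a rough sketch of these modes of use (the job script name below is hypothetical, and X11 support depends on the cluster's Slurm configuration):

```shell
# Serial job started directly from the command line, no batch file needed:
srun hostname

# Batch job submitted via a job script (myjob.sh is a placeholder name):
sbatch myjob.sh

# Interactive command-line session on a compute node:
srun --pty bash

# Interactive X11 job (requires X11 forwarding to be enabled on the cluster):
srun --x11 xterm
```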

 

Node Status

sinfo

sinfo reports the state of partitions and nodes managed by Slurm. Example output:

$ sinfo
 
PARTITION        AVAIL  TIMELIMIT  NODES  STATE NODELIST
all*                up   infinite      9  down* rb2u[3,5-12]
all*                up   infinite      5   idle dn[1-2],rb2u[1-2,4]
xeon-6136-256G      up   infinite      2   idle dn[1-2]
xeon-e5-2620-32G    up   infinite      9  down* rb2u[3,5-12]
xeon-e5-2620-32G    up   infinite      3   idle rb2u[1-2,4]

In the above, the partition "all" contains all the nodes. There are also partitions for each hardware specification of node; every node is listed both in "all" and in its individual spec-class partition.

The above example shows that nodes dn1 and dn2 are idle – up, with no jobs running. Nodes rb2u3 and rb2u5 through rb2u12 are all down. If a node is allocated to a job, its state will be "alloc". If a node is set to finish its current jobs and accept no new ones in preparation for downtime, its state will be "drain".
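By default sinfo groups its report by partition and state. For a per-node report, or to limit the output to one partition, the standard sinfo flags can be used:

```shell
# One line per node, with detailed (long-format) state information:
sinfo -N -l

# Restrict the report to a single partition, e.g. "all":
sinfo -p all
```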

sview

sview is a graphical user interface to get state information for nodes (and jobs).

Job Status

sview

sview is a graphical user interface for viewing the state of jobs as well as nodes.

squeue

squeue reports the state of jobs or job steps. By default, it reports the running jobs in priority order and then the pending jobs in priority order.

$ squeue
 
JOBID PARTITION  NAME  USER ST  TIME NODES NODELIST(REASON)
65646     batch  chem  mike  R 24:19     2 adev[7-8]
65647     batch   bio  joan  R  0:09     1 adev14
65648     batch  math  phil PD  0:00     6 (Resources)

Each job is given a JOBID, which can be used to cancel the job if necessary. The PARTITION field references the node spec-class partitions described in the "sinfo" documentation above. The NAME field gives the job's name. The ST field gives the job state: R (running) or PD (pending). The NODELIST field shows which nodes each job is running on, and the NODES field shows the number of nodes in use for that job.
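Two common follow-ups to reading the queue are filtering it down to your own jobs and cancelling a job by its JOBID (the JOBID 65646 below is just the example value from the output above):

```shell
# Show only your own jobs:
squeue -u $USER

# Cancel a job using the JOBID shown by squeue:
scancel 65646
```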

 
