Resources and Job Management

Resources and jobs are managed by the SLURM Workload Manager. This section covers:

  1. Available resources

  2. Job management

  3. Accounting

  4. Most commonly used Slurm commands

1. Available Resources

To check the available resources, the user should execute sinfo

$ sinfo

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug        up 2-00:00:00     10  alloc cn[001-008,012-013]
debug        up 2-00:00:00     78   idle cn[009-011,014-058,063-088]
private*     up 3-00:00:00     10  alloc cn[001-008,012-013]
private*     up 3-00:00:00     78   idle cn[009-011,014-058,063-088]
medium       up 2-00:00:00     10  alloc cn[001-008,012-013]
medium       up 2-00:00:00     48   idle cn[009-011,014-058]
short        up 3-00:00:00      4  alloc cn[059-062]
short        up 3-00:00:00     26   idle cn[063-088]
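
If a node-oriented view is preferred, the same information can be listed per node with sinfo's -N (--Node) and -l (--long) options, for example

$ sinfo -N -l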

To see the compute nodes in use and the status of the jobs, the user should execute squeue. On some systems all jobs are shown, while on others the output is limited to the user's own jobs (see the example below).

2. Job Management

Job Submission

The user submits a job to the system by means of a batch script, using the command

$ sbatch <slurm_script_name>
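
On success, sbatch reports the ID assigned to the job, for example (the job ID shown here is illustrative)

Submitted batch job 17551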

The script can have the following form (example using the foss/2021b toolchain):

#!/bin/bash
#SBATCH --time=00:40:00
#SBATCH --account=astro_00
#SBATCH --job-name=JOB_NAME
#SBATCH --output=JOB_NAME_%j.out
#SBATCH --error=JOB_NAME_%j.error
#SBATCH --nodes=32
#SBATCH --ntasks=1024
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-socket=16
#SBATCH --exclusive
#SBATCH --partition=debug

export PMIX_MCA_psec=native

module purge
module load foss/2021b HDF/4.2.15

srun ./code_executable

The script requests 32 nodes (--nodes) and 1024 MPI tasks (--ntasks), with one core per task (--cpus-per-task=1) and 16 tasks per socket (--ntasks-per-socket); assuming dual-socket nodes, this is consistent: 32 nodes × 2 sockets × 16 tasks per socket = 1024 tasks. The compute nodes are used exclusively in this run (--exclusive), and the queue, which in SLURM is called a partition, is debug. The code is executed using srun.

Requesting Specific Compute Nodes

Suppose the user wants to use compute nodes cn012 through cn022 in the debug partition. In that case, he/she adds the line #SBATCH --nodelist=cn[012-022] to the script, as sketched below.
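
A minimal sketch of the corresponding script header could look as follows (time limit and job name are placeholders; cn012 through cn022 correspond to 11 nodes):

#!/bin/bash
#SBATCH --time=00:40:00
#SBATCH --job-name=JOB_NAME
#SBATCH --partition=debug
#SBATCH --nodes=11
#SBATCH --nodelist=cn[012-022]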

Job Information

After submitting the job, the user can check the compute nodes in use and the job status by issuing the command squeue, for example

$ squeue | grep USER_NAME

  JOBID PARTITION     NAME       USER ST       TIME  NODES  NODELIST(REASON)
  16868     debug     job1  USER_NAME  R    5:54:10      1  cn013
  16867     debug     job2  USER_NAME  R    5:54:15      1  cn012
  16866     debug     job3  USER_NAME  R    5:54:21      8  cn[001-008]
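
The same filtering can be done with squeue's own user option (on recent Slurm versions, squeue --me is a shorthand for the calling user's jobs), avoiding the grep:

$ squeue -u USER_NAME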

Further detailed information on the submitted job (e.g., used resources, paths, scripts) can be obtained by executing scontrol show jobid <JOBID>, with <JOBID> being the job ID:

$ scontrol show jobid 17551

JobId=17551 JobName=<JOB NAME>
 UserId=<UserID> GroupId=<GroupID> MCS_label=N/A
 Priority=2484 Nice=0 Account=<ProjID> QOS=normal
 JobState=RUNNING Reason=None Dependency=(null)
 Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
 RunTime=02:07:25 TimeLimit=1-00:00:00 TimeMin=N/A
 SubmitTime=2022-12-01T09:15:43 EligibleTime=2022-12-01T09:15:43
 AccrueTime=2022-12-01T09:15:43
 StartTime=2022-12-01T09:15:43 EndTime=2022-12-02T09:15:43 Deadline=N/A
 SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-12-01T09:15:43
 Partition=debug AllocNode:Sid=mn01:9703
 ReqNodeList=(null) ExcNodeList=(null)
 NodeList=cn[005-006]
 BatchHost=cn005
 NumNodes=2 NumCPUs=72 NumTasks=72 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
 TRES=cpu=72,node=2,billing=72
 Socks/Node=* NtasksPerN:B:S:C=0:0:18:* CoreSpec=*
 MinCPUsNode=1 MinMemoryCPU=4600M MinTmpDiskNode=0
 Features=(null) DelayBoot=00:00:00
 OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
 Command=<PROJECT_PATH>/<USER_FOLDERS>/slurm.sh
 WorkDir=<PROJECT_PATH>/<USER_FOLDERS>
 StdErr=<PROJECT_PATH>/<USER_FOLDERS>/slurm-17551.err
 StdIn=/dev/null
 StdOut=<PROJECT_PATH>/<USER_FOLDERS>/slurm-17551.out
 Power=
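
Individual fields can be picked out of this listing with standard tools, for example (field names as shown above)

$ scontrol show jobid 17551 | grep -E 'JobState|RunTime|NodeList'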

Hold and Release Jobs

Submitted jobs that are not running yet, because they are in a pending state, can be put on hold by using the command

$ scontrol hold <jobid>

The same job can be released using

$ scontrol release <jobid>
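
While a job is on hold, squeue lists it in the pending state (ST column PD), and the reason column typically shows JobHeldUser. A single job can be checked with, for example,

$ squeue -j <jobid>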

3. Accounting

The user can always check the CPU time used by his/her jobs with sacct, for example,

$ sacct --format=JobIdRaw,User,Partition,Submit,Start,Elapsed,AllocCPUS,CPUTime,CPUTimeRaw,MaxRSS,State,NodeList -S 2021-02-01 -E 2021-02-02

JobIDRaw      User  Partition              Submit               Start    Elapsed  AllocCPUS    CPUTime CPUTimeRAW     MaxRSS      State           NodeList
------------ --------- ---------- ------------------- ------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------
2002              USER      debug 2021-02-01T15:42:30 2021-02-01T15:42:30   00:14:17        576 5-17:07:12     493632             COMPLETED     cn[029-044]
2002.batch                        2021-02-01T15:42:30 2021-02-01T15:42:30   00:14:17         36   08:34:12      30852      8792K  COMPLETED           cn029
2002.0                            2021-02-01T15:42:30 2021-02-01T15:42:30   00:14:17        512 5-01:53:04     438784    174720K  COMPLETED     cn[029-044]
2003              USER      debug 2021-02-01T15:44:13 2021-02-01T15:56:47   00:07:43       1152 6-04:09:36     533376             COMPLETED cn[020-027,029+
2003.batch                        2021-02-01T15:56:47 2021-02-01T15:56:47   00:07:43         36   04:37:48      16668     10104K  COMPLETED           cn020
2003.0                            2021-02-01T15:56:47 2021-02-01T15:56:47   00:07:43       1024 5-11:41:52     474112    134972K  COMPLETED cn[020-027,029+
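
To restrict the accounting output to a single job, the -j option can be used, for example (job ID taken from the listing above)

$ sacct -j 2002 --format=JobIDRaw,Elapsed,AllocCPUS,CPUTime,MaxRSS,State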

For more information on the sacct options, execute man sacct at the terminal.

The total computing time consumed by the users of a project, say projID, over a period of time, say from 01.01.2022 through 18.07.2022, is obtained using the command sreport

$ sreport -t Hours cluster AccountUtilizationByUser Accounts=projID start=1/1/22 format=Accounts,Login,Used,Energy

--------------------------------------------------------------------------------
Cluster/Account/User Utilization 2022-01-01T00:00:00 - 2022-07-18T23:59:59
Usage reported in CPU Hours
--------------------------------------------------------------------------------
        Account     Login      Used     Energy
--------------- --------- --------- ----------
         projID              211007    2217368
         projID    user01     4030       45434
         projID    user01      1711      23285
         projID    user01     41505     525459
         projID    user02     58204     542022
         projID    user02    105558    1081168

This shows the computing time (in hours) and the energy (in joules) consumed by each project member, user01 and user02, and by the project as a whole.
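
An explicit end date can be given with end=, in the same format as start=; depending on how the installed Slurm version interprets the end time, the day after the last day of interest may be needed, for example

$ sreport -t Hours cluster AccountUtilizationByUser Accounts=projID start=1/1/22 end=7/19/22 format=Accounts,Login,Used,Energy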

For further information see the user manual using man sreport.

4. Most Commonly Used Slurm Commands

sbatch

Submit a batch script (which can be a bash, Perl, or Python script).

salloc

Request an allocation.

srun

Create a job step within a job

squeue

Query the list of pending and running jobs

scancel

Cancel pending or running jobs, or send signals to processes in running jobs or job steps, e.g., scancel <jobid>

scontrol

Query information about compute nodes and running or recently completed jobs, e.g., scontrol show job <jobid>

sacct

Retrieve accounting information for jobs and job steps

sinfo

Retrieve information about the partitions and node states

sprio

Query job priorities

smap

Graphical display of the state of the partitions and nodes using a curses interface

sattach

Attach to the standard input, output or error of a running job

sstat

Query status information about a running job, as in the example below
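
For example, provided job accounting is enabled on the system, per-step statistics of a running job can be obtained with

$ sstat -j <jobid>.batch --format=JobID,AveCPU,MaxRSS,MaxVMSize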