Quick Start
1. Project Folder
After logging in, the user lands in their home directory. Change to the project directory
cd /mnt/beegfs/projects/PROJECT_ID/USER_NAME
where PROJECT_ID is the project folder and USER_NAME is the user’s username.
2. Working Environment
Check the modules available on the machine with
module av
Load the modules needed to set up the working environment (see the modules section)
module load <module_name>
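For example, one might load the GCC compiler and then verify the result (the exact module names and versions are site-specific; check the module av output):
module load GCC
module list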
3. Software Folder
The software directories are located at
/mnt/beegfs/apps/cn01470x/software
Running
ls /mnt/beegfs/apps/cn01470x/software
displays a list of all software directories
ABINIT FlexiBLAS Julia NCO re2c
AmberTools Flye kim-api ncurses Rust
Anaconda3 FMS LAME ncview SAMtools
ANSYS_CFD fontconfig LAMMPS netCDF ScaFaCoS
ant foss LAPACK netcdf4-python ScaLAPACK
ANTLR freetype libarchive netCDF-C++4 scikit-bio
archspec FriBidi libcerf netCDF-Fortran scikit-build
arpack-ng futile libdap nettle scikit-learn
Arrow GATK libdrm networkx SciPy-bundle
ArviZ GCC libepoxy Ninja SCOTCH
ASE GCCcore libevent NLopt Siesta
ATK GDAL libfabric nodejs SimPEG
at-spi2-atk Gdk-Pixbuf libffi NSPR snakemake
at-spi2-core GEOS libFLAME NSS snappy
attr gettext libgd nsync spglib-python
Autoconf Ghostscript libgeotiff numactl SPOTPY
...
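To find a specific package in this list, the output can be filtered with grep; for instance, using LAMMPS from the list above:
ls /mnt/beegfs/apps/cn01470x/software | grep -i lammps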
4. Compilation and Job Submission
In the project directory, compile the code and submit it for execution using a SLURM script
sbatch <script_name.sh>
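For the compilation step, an MPI code could, for instance, be built with the GCC and OpenMPI toolchain (a minimal sketch; the foss module and the source file name are assumptions for illustration):
module load foss
mpicc -O2 -o code_executable code.c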
An example script for runs with OpenMPI compiled with GCC:
#!/bin/bash
#SBATCH --time=00:40:00
#SBATCH --account=astro_00
#SBATCH --job-name=JOB_NAME
#SBATCH --output=JOB_NAME_%j.out
#SBATCH --error=JOB_NAME_%j.error
#SBATCH --nodes=32
#SBATCH --ntasks=1024
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-socket=16
#SBATCH --exclusive
#SBATCH --partition=debug
export PMIX_MCA_psec=native
srun ./code_executable
In this script we set the number of MPI tasks (ntasks), the number of cores per task (cpus-per-task), and the number of tasks per physical CPU, also referred to as a socket (ntasks-per-socket). The script therefore assigns one MPI task to each core: 1024 tasks spread over 32 nodes give 32 tasks per node, which ntasks-per-socket=16 splits as 16 tasks on each of a node's two sockets. The compute nodes are used exclusively by this run (option exclusive), and the queue, which in SLURM is called a partition, is the debug queue. Finally, the code is executed using srun.
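Before submitting, the script can be validated without placing the job in the queue by using the --test-only option of sbatch, which checks the script and estimates when the job would start:
sbatch --test-only <script_name.sh>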
5. Available Resources and Jobs in the Queue
To see which compute nodes are available, use
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
private* up 3-00:00:00 2 down* cn[076,080]
private* up 3-00:00:00 2 mix cn[025,030]
private* up 3-00:00:00 12 alloc cn[013-018,021,023,026-029]
private* up 3-00:00:00 71 idle cn[002-012,019-020,022,024,031-075,077-079,081-088]
private* up 3-00:00:00 1 down cn001
debug up 2-00:00:00 2 mix cn[025,030]
debug up 2-00:00:00 12 alloc cn[013-018,021,023,026-029]
debug up 2-00:00:00 43 idle cn[002-012,019-020,022,024,031-058]
debug up 2-00:00:00 1 down cn001
short up 3-00:00:00 6 alloc cn[013-018]
short up 3-00:00:00 13 idle cn[002-012,019-020]
short up 3-00:00:00 1 down cn001
medium up 2-00:00:00 2 mix cn[025,030]
medium up 2-00:00:00 6 alloc cn[021,023,026-029]
medium up 2-00:00:00 30 idle cn[022,024,031-058]
To learn the meaning of the states down, mix, alloc, and idle, read the manual page by issuing the command man sinfo.
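The output can also be restricted to a single partition with the -p option, which is convenient when choosing a queue, for example
$ sinfo -p debug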
To check whether a job is queued or running, just execute
$ squeue | grep USER_NAME
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
16868 debug job1 USER_NAME R 5:54:10 1 cn013
16867 debug job2 USER_NAME R 5:54:15 1 cn012
16866 debug job3 USER_NAME R 5:54:21 8 cn[001-008]
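Alternatively, squeue can filter by user directly with its -u option:
$ squeue -u USER_NAME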
6. Consumed CPU Time
The user can always use sacct to see the CPU time used by a job; for example,
$ sacct --format=JobIdRaw,User,Partition,Submit,Start,Elapsed,AllocCPUS,CPUTime,CPUTimeRaw,MaxRSS,State,NodeList -S 2021-02-01 -E 2021-02-02
JobIDRaw User Partition Submit Start Elapsed AllocCPUS CPUTime CPUTimeRAW MaxRSS State NodeList
------------ --------- ---------- ------------------- ------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------
2002 USER debug 2021-02-01T15:42:30 2021-02-01T15:42:30 00:14:17 576 5-17:07:12 493632 COMPLETED cn[029-044]
2002.batch 2021-02-01T15:42:30 2021-02-01T15:42:30 00:14:17 36 08:34:12 30852 8792K COMPLETED cn029
2002.0 2021-02-01T15:42:30 2021-02-01T15:42:30 00:14:17 512 5-01:53:04 438784 174720K COMPLETED cn[029-044]
2003 USER debug 2021-02-01T15:44:13 2021-02-01T15:56:47 00:07:43 1152 6-04:09:36 533376 COMPLETED cn[020-027,029+
2003.batch 2021-02-01T15:56:47 2021-02-01T15:56:47 00:07:43 36 04:37:48 16668 10104K COMPLETED cn020
2003.0 2021-02-01T15:56:47 2021-02-01T15:56:47 00:07:43 1024 5-11:41:52 474112 134972K COMPLETED cn[020-027,029+
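Note that CPUTimeRAW is given in core-seconds, i.e. AllocCPUS multiplied by the elapsed time in seconds: job 2002 above used 576 cores for 857 seconds (00:14:17), giving 493632 core-seconds. A quick way to convert these values to core-hours (a sketch using the same date range as above; -X restricts the output to allocations and -n drops the header):
$ sacct -n -X --format=JobIDRaw,CPUTimeRaw -S 2021-02-01 -E 2021-02-02 | awk '{printf "%s %.2f core-hours\n", $1, $2/3600}'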
For more information on the options of the sacct command, execute at the terminal
man sacct