UPPMAX basics - Submitting jobs
Slurm, sbatch, the job queue
- Problem: 1000 users, 500 nodes, 10k cores
- Need a queue:
x-axis: cores, one thread per core
y-axis: time
- We use the Slurm tool to handle the queue
- Plan your job and put it in the Slurm job batch (sbatch)
- sbatch <flags> <program> or
- sbatch <job script>
- Easiest to schedule single-threaded, short jobs
Left: 4 one-core jobs can run immediately (or a 4-core wide job).
- The jobs are too long to fit on cores 9-13.
Right: A 5-core job has to wait.
- Too long to fit in cores 9-13 and too wide to fit in the last cores.
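A minimal sketch of the two submission forms listed above. The script name my_job.sh and the helper submit are hypothetical; the project ID is the course project used elsewhere on this page. The helper only prints the command by default, so it can be tried on a machine without Slurm.

```shell
#!/bin/bash
# Sketch only: "submit" is a hypothetical helper and my_job.sh a placeholder script.
# It prints the sbatch command and runs it only if DRY_RUN=0 and sbatch exists.
DRY_RUN=${DRY_RUN:-1}

submit() {
    echo "sbatch $*"
    if [ "$DRY_RUN" != "1" ] && command -v sbatch >/dev/null 2>&1; then
        sbatch "$@"
    fi
}

# Form 1: Slurm flags given on the command line
submit -A snic2022-22-50 -p core -n 1 -t 10:00 my_job.sh

# Form 2: flags written inside the script as "#SBATCH" lines
submit my_job.sh
```

Flags given on the command line are handy for quick tests; keeping them in the script makes the job easier to repeat.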
Jobs
- Job = what will happen during booked time
- Described in a Bash script file
- Slurm parameters (flags)
- Load software modules
- (Move around file system)
- Run programs
- (Collect output)
- ... and more
Slurm parameters
- We present the basics here. On Wednesday afternoon we go deeper!
Most important slurm flags
- 1 mandatory setting for jobs:
- Which compute project? (-A)
- 3 settings you really should set:
- Type of queue? (-p)
- How many cores? (-n)
- How long at most? (-t)
- If in doubt:
- -p core
- -n 1
- -t 7-00:00:00
Type of queue -p
- Where should it run?
- several nodes: -p node
- within one node: -p core
- Use a whole node or just part of it?
- 1 node = 20 cores (16 on Bianca & Snowy)
- 1 hour walltime = 20 core hours = rather expensive
- Waste of resources unless you have a parallel program or need all the memory, e.g. 128 GB per node
- Short test jobs (< 1 hour)
- -p devcore (within one node)
- -p devel (across nodes)
- Default value: core
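The cost comparison above is plain multiplication of booked cores by hours. A small sketch, where core_hours is a hypothetical helper and 20 cores per node is the figure quoted above:

```shell
#!/bin/bash
# core_hours CORES HOURS -> prints the number of booked core hours
# (hypothetical helper, for illustration only)
core_hours() {
    local cores=$1 hours=$2
    echo $(( cores * hours ))
}

core_hours 20 1   # whole 20-core node (-p node) for 1 hour: prints 20
core_hours 1 1    # one core (-p core -n 1) for 1 hour: prints 1
```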
How long is the job? -t
- Jobs are killed when the time limit is reached
- Only charged for time used
- -t = time (hh:mm:ss or d-hh:mm:ss)
- 78:00:00 or 3-6:00:00
- Default value: 7-00:00:00
- Always overestimate by ~50%.
- Things sometimes take longer than expected
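To illustrate the 50% rule and the two time formats, here is a sketch that turns an estimated runtime in minutes into a -t value. The helper with_margin is hypothetical and not part of Slurm.

```shell
#!/bin/bash
# with_margin MINUTES -> prints a value for -t with 50% added
# (hypothetical helper; uses integer minutes)
with_margin() {
    local est=$1
    local total=$(( est + est / 2 ))
    local days=$(( total / 1440 ))
    local hours=$(( (total % 1440) / 60 ))
    local mins=$(( total % 60 ))
    if [ "$days" -gt 0 ]; then
        printf '%d-%02d:%02d:00\n' "$days" "$hours" "$mins"
    else
        printf '%02d:%02d:00\n' "$hours" "$mins"
    fi
}

with_margin 120    # 2 h estimate: prints 03:00:00
with_margin 3120   # 52 h estimate: prints 3-06:00:00 (the same as 78:00:00)
```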
Efficient jobs
- Use your booked cores or memory
- (at least 50%)
- Runtime longer than 1 hour
- Combine shorter jobs if possible
- Ask UPPMAX support for help!
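One way to combine shorter jobs is to loop over several small tasks inside a single job script. This is only a sketch: process_sample stands in for a real short-running program, and the sample names and the two-hour limit are made up for illustration.

```shell
#!/bin/bash -l
#SBATCH -A snic2022-22-50   # Project name (course project from this page)
#SBATCH -p core             # Run on cores within one node
#SBATCH -n 1                # One core
#SBATCH -t 02:00:00         # Hypothetical limit covering all tasks together
#SBATCH -J combined_tasks   # Name of the job

# Hypothetical stand-in for a short program
process_sample() {
    echo "processing $1"
}

# Run the short tasks one after another in the same booking
for sample in sample_A sample_B sample_C; do
    process_sample "$sample"
done
```

The tasks share one queue wait and one booking instead of each task waiting in the queue separately.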
Interactive jobs
- Most work is done most effectively as submitted batch jobs, but some tasks, such as development, need responsiveness.
- Interactive jobs are high-priority but limited in -n and -t
- Quickly gives you a job and logs you in to the compute node
- Requires same Slurm parameters as other jobs
- Try it:
$ interactive -A snic2022-22-50 -p core -n 1 -t 10:00
- Which node are you on?
- Log out with Ctrl-D or 'exit'
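To answer "Which node are you on?", a short sketch to run once the interactive session has started. SLURM_JOB_ID is an environment variable that Slurm sets inside a job; the fallback text is only there so the snippet also runs outside a job.

```shell
#!/bin/bash
# Run inside the interactive session
node=$(hostname)
echo "Running on node: $node"

# Set by Slurm inside a job; empty on a login node or a laptop
echo "Slurm job ID: ${SLURM_JOB_ID:-not inside a Slurm job}"
```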
A simple job script template
#!/bin/bash -l
# The first line says this is a Bash script; -l starts a login shell with a clean environment (e.g. no modules loaded and paths reset)
#SBATCH -A snic2022-22-50 # Project name
#SBATCH -p devcore # Queue for short test jobs on cores within one node (as opposed to multiple nodes)
#SBATCH -n 1 # Number of cores
#SBATCH -t 00:10:00 # Ten minutes
#SBATCH -J Template_script # Name of the job
# go to some directory
cd /proj/introtouppmax/completed/
pwd
echo ""
ls -l
echo ""
# load software modules
module load bioinfo-tools
module list
# do something
echo Hello world!
echo ""
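A sketch of how the template might be submitted, assuming it was saved as template.sh (a hypothetical file name). By default Slurm writes the job's output to a file named slurm-<jobid>.out in the directory the job was submitted from. The snippet submits only if sbatch and the script are both present.

```shell
#!/bin/bash
# Default name of the file where Slurm collects a job's output
output_file() {
    echo "slurm-$1.out"
}

script="template.sh"   # hypothetical file name for the script above

if command -v sbatch >/dev/null 2>&1 && [ -f "$script" ]; then
    # --parsable prints just the job ID (plus the cluster name on multi-cluster setups)
    jobid=$(sbatch --parsable "$script")
    echo "Submitted job $jobid, output goes to $(output_file "$jobid")"
else
    echo "Nothing submitted: sbatch or $script is not available here"
fi
```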
Other Slurm tools
- squeue: quick info about jobs in queue
- jobinfo: detailed info about jobs
- finishedjobinfo: summary of finished jobs
- jobstats: efficiency of booked resources
- Slurm at UPPMAX user guide
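A sketch of typical calls to the tools listed above. The flags shown are the commonly used ones as far as we know, so check each tool's help text before relying on them. The job ID is a made-up placeholder, and run_if_available is a hypothetical wrapper that lets the snippet run on machines without the tools.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command only if it exists on this machine
run_if_available() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped (not installed here): $*"
    fi
}

user=${USER:-$(id -un)}
jobid="12345678"   # hypothetical job ID; replace with one of your own

run_if_available squeue -u "$user"             # your jobs in the queue
run_if_available jobinfo -u "$user"            # more detail on your jobs
run_if_available finishedjobinfo -j "$jobid"   # summary of a finished job
run_if_available jobstats -p "$jobid"          # resource-usage plot for a job
```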
More on Wednesday!