Uppmax basics - Submitting jobs

Slurm, sbatch, the job queue

  • Problem: 1000 users, 500 nodes, 10k cores
    • Need a queue:

queue1.png

x-axis: cores, one thread per core
y-axis: time

  • We use Slurm to handle the queue
  • Plan your job and put it in the Slurm job batch (sbatch)
    • sbatch <flags> <program> or
    • sbatch <job script>
  • Easiest to schedule single-threaded, short jobs

queue2.png queue3.png

Left: 4 one-core jobs can run immediately (or a 4-core wide job).

  • The jobs are too long to fit on cores 9-13.

Right: A 5-core job has to wait.

  • Too long to fit in cores 9-13 and too wide to fit in the last cores.

 

Jobs

  • Job = what will happen during booked time
  • Described in a Bash script file
    • Slurm parameters (flags)
    • Load software modules
    • (Move around file system)
    • Run programs
    • (Collect output)
  • ... and more

 

Slurm parameters

  • We present the basics here. On Wednesday afternoon we go deeper!

Most important slurm flags

  • 1 mandatory setting for jobs:
    • Which compute project? (-A)
  • 3 settings you really should set:
    • Type of queue? (-p)
    • How many cores? (-n)
    • How long at most? (-t)
  • If in doubt:
    • -p core
    • -n 1
    • -t 7-00:00:00
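Put together, the "if in doubt" values above form the top of a job script. A minimal sketch (the project name snic2022-22-50 is the course project used elsewhere on this page):

```shell
#!/bin/bash -l
#SBATCH -A snic2022-22-50   # compute project (mandatory)
#SBATCH -p core             # queue type: core
#SBATCH -n 1                # number of cores: 1
#SBATCH -t 7-00:00:00       # time limit: at most 7 days

# #SBATCH lines are ordinary comments to bash: Slurm reads them at
# submission, but the script still runs as a normal bash script.
msg="flags read by Slurm, script run by bash"
echo "$msg"
```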

Type of queue -p

  • Where should it run?
    • several nodes: -p node
    • within one node: -p core
  • Use a whole node or just part of it?
    • 1 node = 20 cores (16 on Bianca & Snowy)
    • 1 hour walltime = 20 core hours = rather expensive
    • Waste of resources unless you have a parallel program or need all the memory, e.g. 128 GB per node
  • Short test jobs (< 1 hour)
    • -p devcore (within one node)
    • -p devel (across nodes)
  • Default value: core

 

How long is the job? -t

  • Jobs are killed when the time limit is reached
  • You are only charged for the time actually used
  • -t = time (hh:mm:ss)
    • 78:00:00 or 3-6:00:00
  • Default value: 7-00:00:00
  • Always overestimate by ~50%
    • things can sometimes take longer
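The two -t forms above (78:00:00 and 3-6:00:00) are equivalent notations. A quick sanity check of the conversion, plus the ~50% margin, in plain bash arithmetic (the 8-hour estimate is just an example):

```shell
# d-hh:mm:ss vs hh:mm:ss: 3-6:00:00 is 3*24 + 6 = 78 hours
days=3; hours=6
total_hours=$(( days * 24 + hours ))
echo "3-6:00:00 = ${total_hours}:00:00"    # prints 78:00:00

# overestimate a measured or guessed runtime by ~50% before setting -t
estimate_hours=8
padded_hours=$(( estimate_hours * 3 / 2 ))
echo "-t ${padded_hours}:00:00"            # prints -t 12:00:00
```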

Efficient jobs

  • Use your booked cores or memory
    • (at least 50%)
  • Runtime longer than 1 hour
    • Combine shorter jobs if possible
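One way to combine short tasks is a loop inside a single job script. A sketch, where my_tool and the sample names are placeholders:

```shell
# Instead of submitting many <1 h jobs, loop over the inputs in one job:
count=0
for sample in sample1 sample2 sample3; do
    echo "processing ${sample}"    # placeholder for: my_tool ${sample}.txt
    count=$(( count + 1 ))
done
echo "ran ${count} tasks in one job"
```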
  • Ask UPPMAX support for help!

 

Interactive jobs

  • Most work runs best as submitted batch jobs, but some tasks, e.g. development, need responsiveness.
  • Interactive jobs are high-priority but limited in -n and -t
  • Quickly gives you a job and logs you in to the compute node
  • Requires same Slurm parameters as other jobs
  • Try it:
$ interactive -A snic2022-22-50 -p core -n 1 -t 10:00
  • Which node are you on?
    • Logout with Ctrl-D or 'exit'
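To answer "which node are you on?" once the interactive shell starts, hostname is the standard check; SLURM_JOB_ID is a variable Slurm sets inside a job (empty on a login node):

```shell
node=$(hostname)               # name of the node you landed on
echo "running on ${node}"
# set inside a Slurm job; falls back to "none" elsewhere:
echo "job id: ${SLURM_JOB_ID:-none}"
```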

 

A simple job script template


#!/bin/bash -l 
# The first line says the script is written in bash; -l starts a login session with a "clean environment", e.g. no modules loaded and paths reset

#SBATCH -A snic2022-22-50  # Project name

#SBATCH -p devcore  # Queue type: short test job, using cores within one node (as opposed to whole nodes)

#SBATCH -n 1  # Number of cores

#SBATCH -t 00:10:00  # Ten minutes

#SBATCH -J Template_script  # Name of the job

# go to some directory

cd /proj/introtouppmax/completed/
pwd
echo ""
ls -l
echo ""
# load software modules

module load bioinfo-tools
module list

# do something

echo "Hello world!"
echo ""

Other Slurm tools

 

More on Wednesday!

Exercise: Submitting a job

  • Copy the script just further up!
  • Put it into a file named “jobtemplate.sh”
  • Make the file executable (chmod)
  • Submit the job:
$ sbatch jobtemplate.sh
  • Note the job id!
  • Check the queue:
$ squeue -u <username>
$ jobinfo -u <username>
  • When it’s done, look for the output file (slurm-<jobid>.out):
$ ls -lrt
  • Check the output file to see if it ran correctly
$ cat <filename>
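If you want to sanity-check the script logic before submitting, you can dry-run it locally with bash: outside Slurm, the #SBATCH lines are ignored as comments, and sbatch itself only exists on the cluster. A sketch:

```shell
# Write a copy of the template and run it directly with bash.
cat > jobtemplate.sh <<'EOF'
#!/bin/bash -l
#SBATCH -A snic2022-22-50
#SBATCH -p devcore
#SBATCH -n 1
#SBATCH -t 00:10:00
echo Hello world!
EOF
chmod +x jobtemplate.sh    # same chmod step as in the exercise
out=$(bash jobtemplate.sh) # on UPPMAX you would run: sbatch jobtemplate.sh
echo "$out"
```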