Machine Learning for Life Sciences

The course provides an introduction to machine learning methods and workflows for life science research. It introduces the full end-to-end machine learning (ML) workflow, from data preprocessing and feature engineering to model training, evaluation, interpretation, and reproducible reporting, with a focus on the analysis of complex, high-dimensional biological data. Participants explore biological datasets using unsupervised methods such as dimensionality reduction and clustering, and build predictive models using supervised approaches including linear and tree-based models. Methods for multi-omics integration, including partial least squares (PLS), are introduced together with specialized modeling settings relevant to life sciences, such as mixed-effects models and survival analysis.

Next course

May 4th - 8th, 2026
Trippelrummet (E10:1307-9), Navet, BMC, Husargatan 3, 751 23 Uppsala

Application

Applications will open soon.
Fill in this form, and we'll notify you when applications open.

Important dates

Application deadline: March 29th, 2026
Confirmation to accepted students: April 1st, 2026
Course dates: May 4th - 8th, 2026

Course content

Overview of the machine learning workflow
Dimensionality reduction methods such as PCA and UMAP
Unsupervised learning and clustering methods
Supervised learning models, including linear and tree-based models
Partial least squares (PLS) for multi-omics integration
Mixed-effects models for the analysis of repeated-measures and longitudinal data
Survival analysis methods for time-to-event data
Model training, evaluation and validation strategies
Model interpretation and explainable machine learning methods

Learning outcomes

After completing the course, participants will be able to:
- Explain the main components of the machine learning workflow and their role in life science research
- Perform data preprocessing and exploratory analysis of high-dimensional biological datasets
- Apply unsupervised learning methods to discover structure and generate biological hypotheses
- Train, evaluate, and compare supervised learning models commonly used in life sciences
- Apply specialized modeling approaches, including mixed-effects models for repeated measures and survival analysis for time-to-event data
- Assess model performance using appropriate evaluation metrics and validation strategies
- Interpret and communicate model results using explainable machine learning techniques
- Apply basic principles of reproducible and FAIR machine learning workflows
- Collaborate in interdisciplinary teams to design, implement, and present an ML-based data analysis

Schedule

A preliminary course schedule can be found here.

Course format

The course is delivered through a combination of online and on-site teaching activities. It includes two preparatory online sessions (ca. 3h) held prior to the on-site module, an intensive five-day on-site meeting in Uppsala, and a concluding online session for project presentations and discussion. Teaching formats include lectures, live coding sessions, hands-on practical exercises, group discussions, and group-based mini-project work. Participants are expected to bring their own laptops for hands-on sessions.

Assessment

Examination consists of active participation in course activities, completion of a group-based mini-project, and a presentation of the mini-project.

Prerequisites

Basic programming skills in R or Python, including working with data frames and running scripts
Prior exposure to basic statistical concepts (e.g. descriptive statistics, linear regression)
Familiarity with data analysis environments such as RStudio or Jupyter Notebooks

No prior experience with machine learning is required.

More on R and Python skills:

Basic syntax and arithmetic, i.e. using the language as a calculator (R: 1 + 2; Python: 1 + 2)
Core data structures: vectors/arrays, matrices, and data frames, including subsetting and basic matrix operations (R: vectors, matrices, data frames; Python: NumPy arrays, pandas DataFrames)
Reading data and managing files (R: read_csv(), relative paths; Python: pandas.read_csv(), relative paths)
Inspecting and summarising data (R: head(), tail(), sum(), min(), max(); Python: head(), tail(), sum(), min(), max())
Handling missing values (R: NA, na.rm = TRUE; Python: NaN, isna())
Writing simple control flow and functions (R: if/else, loops, functions; Python: if/else, loops, functions)
Finding and using documentation (R: help(), ?; Python: help(), docstrings)
Installing and loading/importing external packages (R: install.packages(), library(); Python: pip / conda, import)
Data transformation and manipulation, such as filtering rows, selecting columns and creating new variables (R: tidyverse; Python: pandas)
Creating and interpreting basic plots, including simple customisation such as labels and titles (R: plot(), ggplot2; Python: matplotlib, seaborn)
Basic familiarity with reproducible documents (R: R Markdown / Quarto; Python: Quarto / Jupyter)

Fees

DDLS RS students: free
Academic participants: 3000 SEK
Non-academic participants: 15 000 SEK

The fee includes lunches and coffee.

Please note that NBIS cannot invoice individuals.

Travel info

For travel information and hotel bookings, see the Travel Information page.

Course credits

3 credits

Teaching team

Olga Dethlefsen «olga.dethlefsen@scilifelab.se»
Payam Emami «payam.emami@scilifelab.se»
Eva Freyhult «eva.freyhult@nbis.se»
Miguel Redondo «miguel.angel.redondo@nbis.se»
Julie Lorent «julie.lorent@nbis.se»
Mun-Gwan Hong «mungwan.hong@nbis.se»

Contact us

For questions regarding the course, please contact the course leaders at edu.ml-biostats@nbis.se or olga.dethlefsen@scilifelab.se.

The content of this course is offered under a Creative Commons Attribution-ShareAlike (CC BY-SA) license. All course content is covered by this license unless otherwise noted.