Machine Learning for Life Sciences
The course provides an introduction to machine learning methods and workflows for life science research. It introduces the full end-to-end machine learning (ML) workflow, from data preprocessing and feature engineering to model training, evaluation, interpretation, and reproducible reporting, with a focus on the analysis of complex, high-dimensional biological data. Participants explore biological datasets using unsupervised methods such as dimensionality reduction and clustering, and build predictive models using supervised approaches including linear and tree-based models. Methods for multi-omics integration, including partial least squares (PLS), are introduced together with specialized modeling settings relevant to life sciences, such as mixed-effects models and survival analysis.
Next course
- May 4th - 8th, 2026
- Trippelrummet (E10:1307-9), Navet, BMC, Husargatan 3, 751 23 Uppsala
Application
- Applications will open soon.
- Fill in this form, and we'll notify you when we open the application.
Important dates
- Application deadline: March 29th, 2026
- Confirmation to accepted students: April 1st, 2026
- Course dates: May 4th - 8th, 2026
Course content
- Overview of the machine learning workflow
- Dimensionality reduction methods such as PCA and UMAP
- Unsupervised learning and clustering methods
- Supervised learning models, including tree-based models
- Partial least squares (PLS) for multi-omics integration
- Mixed-effects models for analysis of repeated-measures and longitudinal data
- Survival analysis methods for time-to-event data
- Model training, evaluation and validation strategies
- Model interpretation and explainable machine learning methods
Learning outcomes
After completing the course, participants will be able to:
- Explain the main components of the machine learning workflow and their role in life science research
- Perform data preprocessing and exploratory analysis of high-dimensional biological datasets
- Apply unsupervised learning methods to discover structure and generate biological hypotheses
- Train, evaluate, and compare supervised learning models commonly used in life sciences
- Apply specialized modeling approaches, including mixed-effects models for repeated measures and survival analysis for time-to-event data
- Assess model performance using appropriate evaluation metrics and validation strategies
- Interpret and communicate model results using explainable machine learning techniques
- Apply basic principles of reproducible and FAIR machine learning workflows
- Collaborate in interdisciplinary teams to design, implement, and present an ML-based data analysis
Schedule
A preliminary course schedule can be found here.
Course format
The course is delivered through a combination of online and on-site teaching activities. It includes two preparatory online sessions (ca. 3h) held prior to the on-site module, an intensive five-day on-site meeting in Uppsala, and a concluding online session for project presentations and discussion. Teaching formats include lectures, live coding sessions, hands-on practical exercises, group discussions, and group-based mini-project work. Participants are expected to bring their own laptops for hands-on sessions.
Assessment
Examination consists of active participation in course activities, completion of a group-based mini-project, and a presentation of the mini-project.
Prerequisites
- Basic programming skills in R or Python, including working with data frames and running scripts
- Prior exposure to basic statistical concepts (e.g. descriptive statistics, linear regression)
- Familiarity with data analysis environments such as RStudio or Jupyter Notebooks
No prior experience with machine learning is required.
More on R and Python skills:
- Basic syntax and arithmetic (using the language as a calculator) (R: 1 + 2; Python: 1 + 2)
- Core data structures: vectors/arrays, matrices, and data frames, including subsetting and basic matrix operations (R: vectors, matrices, data frames; Python: NumPy arrays, pandas DataFrames)
- Reading data and managing files (R: read_csv(), relative paths; Python: pandas.read_csv(), relative paths)
- Inspecting and summarising data (R: head(), tail(), sum(), min(), max(); Python: head(), tail(), sum(), min(), max())
- Handling missing values (R: NA, na.rm = TRUE; Python: NaN, isna())
- Writing simple control flow and functions (R: if/else, loops, functions; Python: if/else, loops, functions)
- Finding and using documentation (R: help(), ?; Python: help(), docstrings)
- Installing and loading/importing external packages (R: install.packages(), library(); Python: pip/conda, import)
- Data transformation and manipulation (filtering rows, selecting columns, creating new variables) (R: tidyverse; Python: pandas)
- Creating and interpreting basic plots, including simple customisation (labels, titles) (R: plot(), ggplot2; Python: matplotlib, seaborn)
- Basic familiarity with reproducible documents (R: R Markdown / Quarto; Python: Quarto / Jupyter)
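As a rough self-check for the Python track, the short sketch below combines several of the skills listed above: reading a CSV, inspecting it, handling missing values, filtering rows, creating a new variable, and writing a small function. It is not official course material. The toy data, the column names, and the function name summarise_expression are hypothetical and chosen only for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a CSV file on disk.
# Sample s2 has a missing expression value.
CSV_TEXT = """sample,group,expression
s1,control,2.0
s2,control,
s3,treated,5.0
s4,treated,7.0
"""


def summarise_expression(csv_source):
    """Read a CSV, count missing values, and compute per-group means."""
    # Reading data: pandas.read_csv accepts a path or a file-like object.
    df = pd.read_csv(csv_source)

    # Handling missing values: empty fields are read as NaN.
    n_missing = int(df["expression"].isna().sum())

    # Filtering rows: keep only samples with a measured value.
    complete = df[df["expression"].notna()].copy()

    # Creating a new variable from an existing column.
    complete["log2_expression"] = np.log2(complete["expression"])

    # Summarising: mean expression within each group.
    group_means = complete.groupby("group")["expression"].mean()
    return df, n_missing, group_means


df, n_missing, group_means = summarise_expression(io.StringIO(CSV_TEXT))
print(df.head())
print("Missing values:", n_missing)
print(group_means)
```

Participants who can read this snippet and predict roughly what it prints are likely at the level assumed by the prerequisites; the R track expects the equivalent with read_csv() and the tidyverse.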
Fees
- DDLS RS students: free
- Academic participants: 3000 SEK
- Non-academic participants: 15 000 SEK
The fee includes lunches and coffee.
Please note that NBIS cannot invoice individuals.
Travel info
For travel information and hotel bookings, see the Travel Information page.
Course credits
- 3 credits
Teaching team
- Olga Dethlefsen «olga.dethlefsen@scilifelab.se»
- Payam Emami «payam.emami@scilifelab.se»
- Eva Freyhult «eva.freyhult@nbis.se»
- Miguel Redondo «miguel.angel.redondo@nbis.se»
- Julie Lorent «julie.lorent@nbis.se»
- Mun-Gwan Hong «mungwan.hong@nbis.se»
Contact us
For questions regarding the course, please contact the course leaders at edu.ml-biostats@nbis.se or olga.dethlefsen@scilifelab.se.
This course content is offered under a Creative Commons Attribution-ShareAlike license. Content in this course can be considered under this license unless otherwise noted.