Biostatistics and Machine Learning II

National course for PhD students, researchers, and other employees across Swedish universities who seek to deepen their biostatistical and machine learning skills. Building on the Introduction to Biostatistics and Machine Learning course, this course expands on common life science data analysis methods, including dimensionality reduction techniques beyond PCA, mixed-effects models for analysis of repeated measures, and survival analysis. We will also dive deeper into machine learning, covering additional classification algorithms, ensemble techniques, optimization strategies, and PLS methods for single- and multi-omics data analysis.

Next course

June 9th - 13th, 2025
Trippelrummet (E10:1307-9), Navet, BMC, Husargatan 3, 751 23 Uppsala

Application & Registration of interest

Application is now open here
If you want to be notified when we organise the next course in 2025, fill in this form

Important dates

Application deadline: May 2nd, 2025
Confirmation to accepted students: May 9th, 2025
Course dates: June 9th - 13th, 2025

Course content

Dimensionality reduction beyond PCA
Classification algorithms & ensemble techniques
Machine learning optimization strategies
PLS-based methods for single and multi-omics data analysis
Mixed-effects models for repeated measures, longitudinal studies and nested designs
Survival analysis
Introduction to neural networks

Learning outcomes

Machine Learning Workflow: understand and implement core ML stages in R and Python, covering data preprocessing, model selection, training, and evaluation.
Dimension Reduction: apply advanced techniques such as UMAP and t-SNE for high-dimensional data analysis and understand their relationship to PCA.
Classification Models: implement and tune random forest (RF), support vector machine (SVM), and logistic regression models using grid search for classification tasks.
Ensemble Methods: understand the concepts of bagging, boosting, and stacking, and apply AdaBoost and XGBoost for classification and regression tasks.
PLS Analysis: implement PLS, PLS-DA, and sPLS for single- and multi-omics data, including variable selection.
Mixed Effects Models: apply mixed models to complex biological data, focusing on repeated measures and longitudinal designs.
Survival Analysis: understand censored data, calculate Kaplan-Meier estimates of survival functions, compare survival curves, and perform regression analysis with Cox proportional hazards models, including handling time-dependent covariates and competing risks.
Neural Networks: gain foundational knowledge of CNNs and RNNs, understand the use of LLMs in the life sciences, and apply pre-trained models for cell-type classification and gene expression prediction.
Integration Challenge: synthesize course methods in a final challenge, implementing ML workflows and statistical models on real-world data.

Schedule

The preliminary course schedule can be found here.

Education

In this course we focus on an active learning approach. Teaching is organised in blocks that alternate between mini-lectures, group discussions, live coding sessions, etc.

Entry requirements

Basic knowledge of descriptive statistics, hypothesis testing and linear regression, or having attended the Introduction to Biostatistics and Machine Learning course
Basic R and Python data science skills
BYOL (bring your own laptop)

More on R skills

using R as a calculator
being able to work with vectors and matrices, incl. subsetting and matrix multiplication
reading in data from .csv files, e.g. with read_csv(), and printing the first or last few rows, e.g. with head() and tail()
using built-in summary functions such as sum(), min() or max()
being able to use documentation pages for R functions, e.g. with help() or ?
using if/else statements, writing simple loops and functions
making simple plots (scatter plots, histograms) with both plot() and ggplot()
using the tidyverse packages (e.g. dplyr) for data transformations, e.g. filtering rows, selecting columns, creating new columns etc.
being able to install CRAN packages, e.g. with install.packages()
being familiar with the R Markdown or Quarto format

More on Python skills

familiarity with Python syntax, loops and functions
numerical operations with NumPy for array and matrix computations
data manipulation with pandas
being able to visualise data using matplotlib and seaborn

Selection criteria

Due to limited space, the course can accommodate a maximum of 24 participants. If we receive more applications, participants will be selected based on several criteria, including meeting the entry requirements, motivation to attend the course, and gender and geographical balance.
NBIS prioritises academic participants (students, staff, affiliated researchers) in Sweden. We can accept participants from industry and/or outside Sweden if seats are available and the entry requirements are met.

Fees

3 000 SEK for academic participants

15 000 SEK for non-academic participants

Fees include lunches and coffee.

Please note that NBIS cannot invoice individuals.

Travel info

For travel information and hotel bookings, see the Travel Information page.

Course credits

Upon successful completion of the course, assessed on active participation in all course sessions, we will issue a course certificate.
Please note that we are not able to provide any formal university credits (högskolepoäng). Many universities, however, recognise attendance at our courses and award 1.5 HP, corresponding to 40 hours of study. It is up to participants to clarify and arrange credit transfer with the relevant university department.

Teaching team

Olga Dethlefsen «olga.dethlefsen@nbis.se»
Payam Emami «payam.emami@nbis.se»
Eva Freyhult «eva.freyhult@nbis.se»
Miguel Redondo «miguel.angel.redondo@nbis.se»
Julie Lorent «julie.lorent@nbis.se»
Mun-Gwan Hong «mungwan.hong@nbis.se»

Contact us

edu.ml-biostats@nbis.se
olga.dethlefsen@nbis.se
payam.emami@nbis.se
eva.freyhult@nbis.se

This course content is offered under a Creative Commons Attribution-ShareAlike license. Content in this course can be considered to be under this license unless otherwise noted.