Biostatistics and Machine Learning II
A national course for PhD students, researchers, and other employees at Swedish universities who want to deepen their biostatistical and machine learning skills. Building on the Introduction to Biostatistics and Machine Learning course, this course expands on common life science data analysis methods, including dimensionality reduction techniques beyond PCA, mixed-effects models for the analysis of repeated measures, and survival analysis. We will also dive deeper into machine learning, covering more classification algorithms, ensemble techniques, optimization strategies, and PLS methods for single- and multi-omics data analysis.
Next course
- June 9th - 13th, 2025
- Trippelrummet (E10:1307-9), Navet, BMC, Husargatan 3, 751 23 Uppsala
Application & Registration of interest
- Application is now open here
- If you want to be notified when we organise the next course in 2025, fill in this form
Important dates
- Application deadline: May 2nd, 2025
- Confirmation to accepted students: May 9th, 2025
- Course dates: June 9th - 13th, 2025
Course content
- Dimensionality reduction beyond PCA
- Classification algorithms & ensemble techniques
- Machine learning optimization strategies
- PLS-based methods for single and multi-omics data analysis
- Mixed-effects models for repeated measures, longitudinal studies and nested designs
- Survival analysis
- Introduction to neural networks
Learning outcomes
- Machine Learning Workflow: understand and implement core ML stages in R and Python, covering data preprocessing, model selection, training, and evaluation.
- Dimension Reduction: understand and apply advanced techniques like UMAP and t-SNE for high-dimensional data analysis and understand their relationship to PCA.
- Classification Models: implement and tune RF, SVM, and logistic regression models using grid search for classification tasks.
- Ensemble Methods: understand concepts of bagging, boosting, and stacking, and apply AdaBoost and XGBoost for classification and regression tasks.
- PLS Analysis: implement PLS, PLS-DA, and sPLS for single- and multi-omics data, including variable selection.
- Mixed Effects Models: apply mixed models to complex biological data, focusing on repeated measures and longitudinal designs.
- Survival Analysis: understand censored data, calculate Kaplan-Meier estimators to estimate survival functions, compare survival curves, and perform regression analysis with Cox proportional hazards models, handling time-dependent covariates and competing risks.
- Neural Networks: gain foundational knowledge of CNNs and RNNs; understand LLMs in life sciences and apply pre-trained models for cell-type classification and gene expression prediction.
- Integration Challenge: synthesize course methods in a final challenge, implementing ML workflows and statistical models on real-world data.
Schedule
A preliminary course schedule can be found here.
Education
In this course we focus on an active learning approach. The teaching consists of blocks that alternate between mini-lectures, group discussions, live coding sessions, etc.
Entry requirements
- Having basic knowledge of descriptive statistics, hypothesis testing and linear regression or having attended the Introduction to Biostatistics and Machine Learning course
- Basic R and Python data science skills
- BYOL (bring your own laptop)
More on R skills
- using R as calculator
- being able to work with vectors and matrices, incl. subsetting and matrix multiplication
- reading in data from .csv files, e.g. with read_csv(), printing top few rows or last few rows, e.g. with head() and tail()
- using in-built summary functions such as sum(), min() or max()
- being able to use documentation pages for R functions, e.g. with help() or ?()
- using if else statements, writing simple loops and functions
- making simple plots (scatter plots, histograms), both with plot() and ggplot()
- using tidyverse packages for data transformations, e.g. filtering rows, selecting columns, creating new columns etc.
- being able to install CRAN packages e.g. with install.packages()
- being familiar with R Markdown or Quarto format
More on Python skills
- familiarity with Python syntax, loops, functions
- numerical operations with NumPy for array and matrix computations
- data manipulation with pandas
- being able to visualise data using matplotlib and seaborn
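As a rough self-check of the Python skills listed above, the following minimal sketch combines a function, a loop, a NumPy matrix operation and a few pandas data manipulations. The data frame, its column names and the helper `centre()` are hypothetical and made up purely for illustration; they are not course material. Plotting with matplotlib and seaborn is left out to keep the sketch short.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (invented for illustration only)
df = pd.DataFrame({
    "sample_id": [f"s{i}" for i in range(1, 7)],
    "group": ["control"] * 3 + ["treated"] * 3,
    "expression": [2.0, 4.0, 6.0, 5.0, 7.0, 9.0],
})


def centre(values):
    """Return the values with their mean subtracted (hypothetical helper)."""
    return values - values.mean()


# pandas: create a new column, filter rows, summarise by group
df["centred"] = centre(df["expression"].to_numpy())
treated = df[df["group"] == "treated"]
group_means = df.groupby("group")["expression"].mean()

# NumPy: matrix transpose and matrix multiplication
X = np.array([[1.0, 2.0], [3.0, 4.0]])
XtX = X.T @ X

# Basic syntax: loop over the group summaries
for group, mean in group_means.items():
    print(f"{group}: mean expression = {mean:.2f}")
```

If reading and modifying a snippet like this feels comfortable, the Python entry requirements are most likely met.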
Selection criteria
- Due to limited space, the course can accommodate a maximum of 24 participants. If we receive more applications, participants will be selected based on several criteria, including fulfilment of the entry requirements, motivation to attend the course, and gender and geographical balance.
- NBIS prioritises academic participants (students, staff, affiliated researchers) in Sweden. We can accept participants from industry and/or outside Sweden if we have seats available and the entry requirements are met.
Fees
3000 SEK for academic participants
15 000 SEK for non-academic participants
includes lunches and coffee
Please note NBIS cannot invoice individuals
Travel info
For travel information and hotel bookings, see the Travel Information page.
Course credits
- Upon successful course completion, assessed based on active participation in all course sessions, we will issue a course certificate.
- Please note that we are not able to provide any formal university credits (högskolepoäng). Many universities, however, recognize attendance at our courses and award 1.5 HP, corresponding to 40 hours of study. It is up to participants to clarify and arrange credit transfer with the relevant university department.
Teaching team
- Olga Dethlefsen «olga.dethlefsen@nbis.se»
- Payam Emami «payam.emami@nbis.se»
- Eva Freyhult «eva.freyhult@nbis.se»
- Miguel Redondo «miguel.angel.redondo@nbis.se»
- Julie Lorent «julie.lorent@nbis.se»
- Mun-Gwan Hong «mungwan.hong@nbis.se»
Contact us
- edu.ml-biostats@nbis.se
- olga.dethlefsen@nbis.se
- payam.emami@nbis.se
- eva.freyhult@nbis.se