Biostatistics and Machine Learning II


A national course for PhD students, researchers, and other employees at Swedish universities who want to deepen their biostatistics and machine learning skills. Building on the Introduction to Biostatistics and Machine Learning course, it expands on common life science data analysis methods, including dimensionality reduction techniques beyond PCA, mixed-effects models for the analysis of repeated measures, and survival analysis. We will also dive deeper into machine learning, covering more classification algorithms, ensemble techniques, optimization strategies, and PLS methods for single- and multi-omics data analysis.

 

Next course

  • June 9th - 13th, 2025
  • Trippelrummet (E10:1307-9), Navet, BMC, Husargatan 3, 751 23 Uppsala

 

 Application & Registration of interest
  • Application is now open here
  • If you want to be notified when we organise the next course in 2025, fill in this form

 

 Important dates
  • Application deadline: May 2nd, 2025
  • Confirmation to accepted students: May 9th, 2025
  • Course dates: June 9th - 13th, 2025

 

 Course content
  • Dimensionality reduction beyond PCA
  • Classification algorithms & ensemble techniques
  • Machine learning optimization strategies
  • PLS-based methods for single and multi-omics data analysis
  • Mixed-effects models for repeated measures, longitudinal studies, and nested designs
  • Survival analysis
  • Introduction to neural networks

 

 Learning outcomes
  • Machine Learning Workflow: understand and implement core ML stages in R and Python, covering data preprocessing, model selection, training, and evaluation.
  • Dimension Reduction: understand and apply advanced techniques like UMAP and t-SNE for high-dimensional data analysis and understand their relationship to PCA.
  • Classification Models: implement and tune RF, SVM, and logistic regression models using grid search for classification tasks.
  • Ensemble Methods: understand concepts of bagging, boosting, and stacking, and apply AdaBoost and XGBoost for classification and regression tasks.
  • PLS Analysis: implement PLS, PLS-DA, and sPLS for single- and multi-omics data, including variable selection.
  • Mixed Effects Models: apply mixed models to complex biological data, focusing on repeated measures and longitudinal designs.
  • Survival Analysis: understand censored data, calculate Kaplan-Meier estimators to estimate survival functions, compare survival curves, and perform regression analysis with Cox proportional hazards models, handling time-dependent covariates and competing risks.
  • Neural Networks: gain foundational knowledge of CNNs and RNNs, understand the use of LLMs in life sciences, and apply pre-trained models for cell-type classification and gene expression prediction.
  • Integration Challenge: synthesize course methods in a final challenge, implementing ML workflows and statistical models on real-world data.
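
To give a flavour of the survival analysis outcome above, here is a minimal sketch of the Kaplan-Meier estimator in plain Python. It is not taken from the course material: the function name `kaplan_meier` and the toy data are assumptions made for illustration. In practice a dedicated survival analysis library would normally be used instead.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if the subject was censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    # Number of observed events at each time point
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    # Number of subjects leaving the risk set at each time point (event or censoring)
    leaving = Counter(times)

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Subjects censored at time t still count as at risk at time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical toy data: six subjects, two of them censored (event = 0)
toy_times = [2, 3, 3, 5, 7, 8]
toy_events = [1, 1, 0, 1, 0, 1]

for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t = {t}: S(t) = {s:.3f}")
```

The estimate only drops at times where an event is observed. Censored subjects reduce the number at risk for later time points without lowering the curve themselves.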

 

 Schedule

A preliminary course schedule can be found here.

 

 Education

This course takes an active learning approach. Teaching blocks alternate between mini-lectures, group discussions, live coding sessions, and similar activities.

 

Entry requirements

 

More on R skills

  • using R as calculator
  • being able to work with vectors and matrices, incl. subsetting and matrix multiplication
  • reading in data from .csv files, e.g. with read_csv(), printing top few rows or last few rows, e.g. with head() and tail()
  • using in-built summary functions such as sum(), min() or max()
  • being able to use documentation pages for R functions, e.g. with help() or ?()
  • using if-else statements, writing simple loops and functions
  • making simple plots (scatter plots, histograms), both with plot() and ggplot()
  • using the tidyverse for data transformations, e.g. filtering rows, selecting columns, creating new columns etc.
  • being able to install CRAN packages e.g. with install.packages()
  • being familiar with the R Markdown or Quarto format

More on Python skills

  • familiarity with Python syntax, loops, functions
  • numerical operations with NumPy for array and matrix computations
  • data manipulation with pandas
  • being able to visualise data using matplotlib and seaborn

 

 Selection criteria
  • Due to limited space, the course can accommodate a maximum of 24 participants. If we receive more applications, participants will be selected based on several criteria, including fulfilment of the entry requirements, motivation to attend the course, and gender and geographical balance.
  • NBIS prioritises academic participants (students, staff, affiliated researchers) in Sweden. We can accept participants from industry and/or outside Sweden if seats are available and the entry requirements are met.

 

 Fees

3000 SEK for academic participants 

15 000 SEK for non-academic participants

The fee includes lunches and coffee.

Please note that NBIS cannot invoice individuals.

 

 Travel info

For travel information and hotel bookings, see the Travel Information page.

 

 Course credits
  • Upon successful course completion, assessed based on active participation in all course sessions, we will issue a course certificate.

  • Please note that we are not able to provide any formal university credits (högskolepoäng). Many universities, however, recognise attendance at our courses and award 1.5 HP, corresponding to 40 hours of study. It is up to participants to clarify and arrange credit transfer with the relevant university department.

 

 Teaching team
  • Olga Dethlefsen «olga.dethlefsen@nbis.se»
  • Payam Emami «payam.emami@nbis.se»
  • Eva Freyhult «eva.freyhult@nbis.se»
  • Miguel Redondo «miguel.angel.redondo@nbis.se»
  • Julie Lorent «julie.lorent@nbis.se»
  • Mun-Gwan Hong «mungwan.hong@nbis.se»

 

 Contact us
This course content is offered under a CC attribution share alike license. Content in this course can be considered under this license unless otherwise noted.