Course syllabus
Welcome to this course on the fundamentals of machine learning for natural language processing!
This course is divided into 5 modules. The first four modules, each consisting of three lectures and one assignment, will guide you from simple text encoding up to neural models. The last module, consisting of one lecture and a text seminar, gives an introduction to ethics in machine learning. Each lecture will typically consist of 45-60 minutes of theory, a mini-lab, and finally a presentation/discussion of the mini-lab results during the last 5-15 minutes. The mini-labs are there to anchor newly acquired theoretical knowledge in a real task, but also to incrementally build up a code base for future experiments. Each mini-lab usually involves several design/data choices. Note that there is no expectation of "finishing" a mini-lab during the lecture; please experiment with them outside of class as an exercise. Course examination is done through four assignments and a text seminar.
The course has two tracks for readings. The fundamental track gives you an understanding of the course content while also reviewing basics from earlier courses. The advanced track goes beyond the material covered in the lectures. These extra readings are completely voluntary: they are not required for passing the course and will not be discussed in depth in the lectures. You are, however, welcome to ask questions about them.
Lectures
All lectures will be given both on campus and by video link (by invitation). Each module has its own lecture plan with more detailed descriptions of the lectures, along with reading material, slides, and code.
Module 1: Fundamentals of modelling
In the first part of the course, we will discuss the basics of modelling. The core concepts are: encoding text as vectors, basic classification and regression models, and how to choose parameters for these models. Some themes in this module are: [lecture plan]
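To give a concrete flavour of what "encoding text as vectors" means, here is a minimal sketch using sklearn's CountVectorizer (the sentences are toy examples, not course data):

```python
# A minimal sketch of encoding text as count vectors with sklearn.
# The documents are toy examples for illustration only.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat", "the dog sat", "cats and dogs"]
vec = CountVectorizer()
X = vec.fit_transform(docs)          # sparse (n_docs, vocab_size) matrix

print(vec.get_feature_names_out())   # the learned vocabulary
print(X.toarray())                   # each row: one document as word counts
```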
A visualization of "Alice in Wonderland" using a t-SNE embedding of GloVe vectors. Blobs are coloured by POS tag and scaled in proportion to word frequency. [code]
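For the curious, a pipeline along these lines could be sketched as follows; the GloVe source, the file name "alice.txt", and the plotting details are assumptions here and may differ from the linked code:

```python
# A rough sketch of how a figure like this might be produced (assumptions:
# gensim's downloader for GloVe, NLTK for tokenization and POS tags; the
# linked [code] may do this differently). Requires nltk.download("punkt")
# and nltk.download("averaged_perceptron_tagger").
import gensim.downloader as api
import matplotlib.pyplot as plt
import nltk
from collections import Counter
from sklearn.manifold import TSNE

glove = api.load("glove-wiki-gigaword-50")      # pre-trained GloVe vectors

with open("alice.txt") as f:                    # hypothetical local copy
    tokens = nltk.word_tokenize(f.read().lower())

counts = Counter(tokens)
words = [w for w in counts if w in glove]
tags = dict(nltk.pos_tag(words))                # one tag per word type
                                                # (out of context; fine for a sketch)
coords = TSNE(n_components=2).fit_transform(glove[words])

colours = [hash(tags[w]) % 20 for w in words]   # colour by POS tag
sizes = [counts[w] for w in words]              # size by word frequency
plt.scatter(coords[:, 0], coords[:, 1], c=colours, s=sizes, cmap="tab20")
plt.show()
```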
Module 2: Model Selection & (Un-)Supervised Learning
The main focus of this part of the course is to go deeper into different types of models and ways of learning. We will talk about how to create a model without labelled data and about model parameter sensitivity. Maybe most importantly, we will try out several types of classifiers (nonlinear, structured prediction, etc.) that have been successful in NLP. Some themes in this module are: [lecture plan]
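As a preview of the unlabelled-data theme, here is a minimal sketch of clustering documents without any labels (the documents and cluster count are made up):

```python
# A minimal sketch of learning without labels: k-means over TF-IDF
# document vectors (toy documents, illustrative cluster count).
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "dogs chase cats",
    "stocks fell sharply today",
    "markets rallied after the news",
]
X = TfidfVectorizer().fit_transform(docs)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)   # a cluster index per document, learned without labels
```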
Module 3: Fundamentals of Neural Networks
This part will be about the core components of neural networks. We will talk about designing and training small networks to solve NLP problems such as POS tagging. Some themes in this module are: [lecture plan]
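As a taste of what a small network for a problem like POS tagging can look like, here is a toy sketch of a window-based tag classifier in PyTorch; all sizes and names are illustrative, not the course's reference implementation:

```python
# A toy sketch of a small tagging network in PyTorch: embed a window of
# word ids, flatten, and score the possible tags (all sizes are made up).
import torch
import torch.nn as nn

class TaggerMLP(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden, n_tags, window=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.ff = nn.Sequential(
            nn.Linear(window * emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_tags),
        )

    def forward(self, word_ids):        # word_ids: (batch, window)
        e = self.emb(word_ids)          # (batch, window, emb_dim)
        return self.ff(e.flatten(1))    # (batch, n_tags) tag scores

model = TaggerMLP(vocab_size=10_000, emb_dim=50, hidden=100, n_tags=17)
scores = model(torch.randint(0, 10_000, (8, 3)))   # dummy batch
print(scores.shape)                                # torch.Size([8, 17])
```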
Module 4: Machine Learning Applications
As this course doesn't presuppose any knowledge of machine learning, the first three modules are dedicated to grasping the many fundamental concepts of ML. With a good grasp of the fundamentals, we can now focus more on applications of, primarily, neural models. Some themes in this module are: [lecture plan]
Module 5: Ethics and ML
Teaching modern machine learning without some insight into ethics could be considered unethical. Also, "AI ethics" is increasingly being written about and researched. In this part, you will be given a very brief introduction to ethical philosophy. Focus will be on how to think, not what to think. [lecture plan]
Literature
For the first two parts (half of the course), we will be using the book "An Introduction to Statistical Learning: with Applications in R" as the main literature. It will be abbreviated as ST in the reading instructions. The implementations in the book are written in R, a programming language that is especially popular among statisticians and bioinformaticians. We will, however, keep using Python, as it is much more popular in computational linguistics (among other fields). You can find reimplementations of the book's examples in Python here (I haven't looked at them properly, so I can't vouch for their quality). The book is available for free from the authors' website.

In addition, you will be given several reading items per lecture. These can be found under "literature" for each lecture/assignment. All reading items are split into a fundamentals track and an advanced track. The advanced-track items are marked with "(A)" and are voluntary; the course is designed so that you can pass (G) without having read them. Finally, we will make heavy use of the manuals for the respective software packages used throughout the course.
ST: James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). "An Introduction to Statistical Learning: with Applications in R." Springer, New York. https://statlearning.com/
Intended Learning Outcomes
The course syllabus states five learning outcomes. Here follows a short description of their relation to the course material.
1. apply basic principles of machine learning to natural language data;
The majority of the data we will be working on is natural language. Starting from the first lecture, several ways of encoding this type of data will be discussed, ranging from binary bags-of-words (lectures 1 & 3, assignment 1) to different types of embeddings (lectures 1 & 8+, assignments 3 & 4). Basic methodology like cross-validation also falls under this learning outcome.
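For instance, cross-validating a classifier over a binary bag-of-words encoding could look like this (toy data; the assignments use real corpora):

```python
# A small illustration of two things named above: binary bag-of-words
# features and cross-validation (toy data, not the assignment corpus).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

texts = ["good film", "great acting", "bad plot", "terrible pacing"] * 10
labels = [1, 1, 0, 0] * 10

X = CountVectorizer(binary=True).fit_transform(texts)  # 0/1 word presence
scores = cross_val_score(LogisticRegression(), X, labels, cv=5)
print(scores.mean())    # average held-out accuracy over 5 folds
```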
2. apply probability theory and statistical inference on linguistic data;
Basic n-gram models and word statistics have been used in earlier courses. Here, these concepts will be extended both to vector spaces and to input for probabilistic classifiers (e.g., Naive Bayes). Several lectures, starting with lecture 2, will discuss probabilistic perspectives on both feature and parameter spaces.
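As a small example of the probabilistic-classifier side of this outcome, a Naive Bayes model over word counts might be used like this (toy training data):

```python
# A minimal Naive Bayes example: word counts in, predicted class out
# (toy training data for illustration only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train = ["offer free money now", "meeting at noon",
         "claim your free prize", "agenda for monday"]
y = ["spam", "ham", "spam", "ham"]

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(train), y)
print(clf.predict(vec.transform(["free money meeting"])))
```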
3. use standard software packages for machine learning;
In order to work more with the core functionality of models, the course does not involve too many ML frameworks; avoiding black boxes is important when studying the basics. Several external libraries will nevertheless be introduced, and most modelling will be done in sklearn and PyTorch. This learning outcome also includes code quality: using standard software packages includes writing understandable code with Python and numpy.
4. implement linear models for classification;
This is introduced in lecture three and will be expanded upon throughout the course (including SVMs, feedforward networks, etc.). It is the main theme of assignment 1 and will come back in modified form as smaller parts of the other assignments.
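To make "implement" concrete, here is a bare-bones perceptron in numpy; it is a sketch of the idea, not the required solution for assignment 1:

```python
# A bare-bones linear classifier implemented from scratch: the perceptron
# update rule in numpy (toy, linearly separable data).
import numpy as np

def perceptron(X, y, epochs=10):
    """X: (n, d) feature matrix; y: labels in {-1, +1}. Returns weights."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:   # misclassified (or on the boundary)
                w += yi * xi         # nudge the hyperplane towards xi
    return w

X = np.array([[1, 1], [2, 1], [-1, -1], [-2, -1]], dtype=float)
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
print(np.sign(X @ w))   # predictions should match y
```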
5. design simple neural nets using some standard library.
This is the core of the second half of the course. We will be using PyTorch.
Examination
The course is examined by four assignments (handed in through Studium) and a seminar. To pass the course ("Godkänt", G), you must pass all four assignments and the seminar. To pass with distinction ("Väl godkänt", VG), at least three of the individual assignments must be passed with distinction. All assignments will be distributed as IPython notebooks, which is also the hand-in format in Studium. Note that the course includes additional ungraded mini-labs and exercises, which are not part of the examination.
| | First deadline | Second deadline |
|---|---|---|
| Assignment 1: Sentiment Polarity for Movie Reviews | 22 April | 13 June |
| Assignment 2: Probabilistic Document Classification | 29 April (peer review), 6 May (final) | 13 June |
| Assignment 3: Recurrent Networks for Part-of-speech Tagging | 20 May | 13 June |
| Assignment 4: Gendered Directions in Embeddings | 3 June | 20 June |
| Ethics seminar | | |
If you miss a submission deadline or do not pass an assignment, you can re-submit your work up to the resubmission deadline. Please also take note of our general course assessment and examination policy. If there are special circumstances that make a regular submission impossible, please inform us in good time before the deadline.