The course has an optional project of 3hp. Please submit the project proposal here.

Below are some guidelines about the project.

- The course participants propose a project themselves. There is no restriction on the theme of the project, other than that the project should use reinforcement learning.

- As already stated in the calendar, the deadline for proposing a project is June 5th.

- The due date for the projects is October 4th, 23:59.

- The projects can be done in groups of 1-2. The delivered work should reflect the group size, i.e. groups of 2 should deliver 2 persons x 3hp worth of work.

CONSULTATION HOURS

Discussions will be held over Zoom: https://uu-se.zoom.us/j/69643291145.

If you would like to discuss your project, please sign up using the Doodle links below. Each slot is a 20-minute individual slot with one of the teachers. Note that there are separate links for each week.

W25: https://doodle.com/poll/8hnc8t86hkp9tmyw

W26: https://doodle.com/poll/dems8z4xit57b3r3

ASSOCIATED RESEARCH ARTICLE

- Every project should be connected to at least one research article that uses reinforcement learning or another learning and control technique that can be compared with reinforcement learning. This will help you make meaningful comparisons.

- It is fine to propose a project that you expect may perform worse than the associated article, as long as you can motivate your idea, for instance by a lower computational load.

Additional information:

- When applying RL to a specific application, it may be the case that there is no RL work on the specific problem you consider, but it is unlikely that there is no application to a close-enough problem. You may try searching in alternative, closely related subject areas.

- Another alternative is to find an article that uses some other technique to solve the same or a similar problem. This is what “another learning and control technique” refers to in the text. (Almost every good research article compares its results with those of other articles. Ask yourself: which article would I have compared the results of this project with if it were a research article?)

FEEDBACK/HELP

All the feedback we give will be high-level. There will be no help on details, especially with any implementation issues.

We give feedback on the proposal based on what you have written.

After the proposal, the main feedback mechanism will be the consultation hours. We will announce optional consultation hours you can sign up for in June, August and September.

We may answer some occasional, high-level questions by email, but this type of feedback will be limited.

If there is too much demand for help by email, we will take this as a sign that structured discussions are needed and simply point email writers to the consultation hours instead.

Please make sure that you choose a project you can carry out under the conditions above. If in doubt, think of another project or find a partner. Note that working in groups of two is allowed. Most importantly, please do not count on help by email. With our workloads in the upcoming months, this is something we cannot guarantee.

DELIVERABLES

- Your reasonably commented code for the project

- Project report

As PhD students, you already know what to include in such a report, but here is a short reminder:

- description of the problem and motivation,

- description of the approach you have adopted with a short comparison with other possible approaches (RL and/or otherwise as applicable),

- information about the implementation so that the results are reproducible,

- results and discussions,

- conclusions and suggestions for future work (i.e. what you would have done if you had more resources, such as time or computational resources).

SUGGESTED ARTICLES

- Below are some suggested articles that relate to some of the subjects we're interested in. Nevertheless, you're strongly encouraged to choose an article outside this list. It is best to choose something that relates to a subject or method you're passionate about.

Suggested articles that can be used as a starting point for the course project (list is updated continuously):

- Joshua Achiam, Shankar Sastry, Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning, 2017

- Sebastian Trimpe, Alexander Millane, Simon Doessegger, Raffaello D’Andrea, A Self-Tuning LQR Approach Demonstrated on an Inverted Pendulum, 2014

- Kai Ueltzhöffer, Deep Active Inference, 2017

- Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell, Curiosity-driven Exploration by Self-supervised Prediction, 2017