Homework 2 - Model-free prediction and control
- Due May 17, 2020 by 11:59pm
- Points 25
- Submitting a file upload
- File Types pdf, ipynb, and txt
This assignment covers DS4-DS5. Among other things, you will solve the Taxi and MountainCar problems from HW0.
Additional packages needed:
As in the previous tinkering notebooks and homework, you will need the OpenAI gym package. You will also need the GridWorld environment used in previous notebooks.
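Before opening the notebook, it can help to confirm that the required package is importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and only `gym` is checked because the module name of the GridWorld environment is not given here.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "gym" is the import name of the OpenAI gym package.
missing = missing_packages(["gym"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

The check only looks up the package; it does not import it or create any environment.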
The purpose of the assignment:
1. Work with some fundamental concepts via pen-and-paper exercises.
2. Implement and test methods for learning a policy from experience.
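As an illustration of what "learning a policy from experience" can look like, here is a minimal sketch of tabular Q-learning, one common model-free control method. Whether the notebook asks for this particular method is not stated here. The five-state chain environment, the function names, and all hyperparameter values are assumptions made for this example; it does not use gym or the assignment's environments.

```python
import random

N_STATES = 5       # states 0..4; state 4 is terminal (toy example, not from the assignment)
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state, action):
    """Toy chain dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0, max_steps=200):
    """Learn a table of action values from sampled transitions only."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = epsilon_greedy(q[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best action in the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q = q_learning()
# Greedy action in each non-terminal state.
greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy)
```

In this toy chain the only reward lies at the right end, so the learned greedy policy should prefer moving right in every non-terminal state. The agent never reads the transition rule directly; it only observes the samples returned by `step`.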
The assignment:
Download the notebook here.
The instructions for the assignment are given in the notebook.
Hand-in:
Hand in the notebook with your solutions so that the grader can easily run your code. All code and other solutions should also be handed in as PDF files. If you prefer, you can hand in the code by exporting your notebook to a PDF and write your answers in a separate PDF. (It is much easier to give feedback on your solutions if the code is available in a PDF.)
Passing requirements:
Each task is worth a certain number of points, indicated in the notebook. Depending on the quality of the answer, each question is scored at one of three levels: 0%, 50%, or 100% of the question's points.
To pass this assignment, you must score at least 60% of the total, that is, at least 15 out of 25 points.
The grading will be done through peer review. Instructions for the peer review are given here.
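The scoring rule can be sketched in a few lines of Python. The 60% threshold, the 25-point total, and the three levels come from the text above. The function names and the per-question point values in the usage example are hypothetical, since the actual point values are given only in the notebook.

```python
TOTAL_POINTS = 25
PASS_PERCENT = 60
ALLOWED_LEVELS = (0.0, 0.5, 1.0)  # 0%, 50%, or 100% of a question's points


def total_score(graded):
    """Sum the awarded points for a list of (max_points, level) pairs."""
    total = 0.0
    for max_points, level in graded:
        if level not in ALLOWED_LEVELS:
            raise ValueError(f"level must be one of {ALLOWED_LEVELS}, got {level}")
        total += max_points * level
    return total


def passes(score, total_points=TOTAL_POINTS, pass_percent=PASS_PERCENT):
    """True if the score reaches the passing threshold (15 of 25 by default)."""
    # Integer-style comparison avoids rounding issues with 0.6 * total_points.
    return score * 100 >= pass_percent * total_points


# Hypothetical grading of three questions worth 2, 4, and 3 points.
example = [(2, 1.0), (4, 0.5), (3, 0.0)]
print(total_score(example), passes(total_score(example)))
```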
Questions:
If you have questions, write in the discussion forum or send an e-mail to per.mattsson@it.uu.se
Rubric
Criteria | Ratings | Pts |
---|---|---|
Ex 1.1 | | |
Ex 1.2 | | |
Ex 1.3 | | |
Ex 1.4 | | |
Ex 1.5 | | |
Task 2: Reasoning choice of method | | |
Task 2: Code and plot | | |
Task 2: Positive average reward | | |
Task 2: Average reward at least 7 | | |
Task 3.1 | | |
Task 3.2: Code | | |
Task 3.2: Average reward above -200 | | |
Task 3.2: Discussion of results | | |