L6 Study instructions
Watch DS6. This lecture covers function approximation, for both prediction and control.
DS6 Slides Link
DS6 Video Link
Tinkering Notebook for Lecture 6 (download)
Notebook updated on 2020/05/26: a second variable, "doneEpisode", has been introduced to control loop termination in the SARSA training procedure. Thanks to Gustav Björdal for pointing out this mistake.
Supplementary Material for Tinkering Notebook: RandomWalk_100.pickle (download)
Reading Instructions:
In Lecture 6, David Silver covers function approximation for both prediction (Chapter 9) and control (Chapters 10 and 11), as well as variants with eligibility traces (Chapter 12). Here, the value function or action-value function is approximated by a well-chosen parametrized function of the states (and actions). The most important reading is Chapter 9, since it introduces the main novel aspect: function approximation and the different function classes used in RL. Much of the remaining material in DS6 can be seen almost as a direct extension of what you have already covered, this time implemented with function approximation.
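To make the core idea concrete before you start reading, here is a minimal sketch, not taken from the lecture or the Tinkering Notebook, of linear semi-gradient TD(0) prediction, where v_hat(s, w) = w^T x(s) with one-hot (state-aggregation) features on a small random-walk chain. The environment, variable names, and parameter values are all illustrative assumptions.

```python
import numpy as np

# Illustrative environment: 19 non-terminal states 0..18 on a chain;
# episodes terminate beyond either end (this is NOT the notebook's setup).
n_states = 19
alpha, gamma = 0.05, 1.0

def features(s):
    """One-hot (state-aggregation) feature vector x(s)."""
    x = np.zeros(n_states)
    x[s] = 1.0
    return x

def step(s, rng):
    """Random walk: move left or right; reward +1 only at the right terminal."""
    s_next = s + rng.choice([-1, 1])
    if s_next < 0:
        return None, 0.0      # left terminal, reward 0
    if s_next >= n_states:
        return None, 1.0      # right terminal, reward +1
    return s_next, 0.0

rng = np.random.default_rng(0)
w = np.zeros(n_states)        # v_hat(s, w) = w @ features(s)
for episode in range(2000):
    s = n_states // 2         # start in the middle
    while s is not None:
        s_next, r = step(s, rng)
        v_next = 0.0 if s_next is None else w @ features(s_next)
        delta = r + gamma * v_next - w @ features(s)   # TD error
        w += alpha * delta * features(s)               # semi-gradient TD(0) update
        s = s_next

print(np.round(w, 2))   # estimates should approach the true values (s + 1) / 20
```

With one-hot features this reduces to tabular TD(0); swapping in any other feature map from Section 9.5 (tile coding, polynomials, Fourier basis) changes only the features function.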
- Chapter 9: We suggest you read this chapter almost in its entirety.
- You can skip 9.5.3, 9.5.5, and 9.7 without loss of continuity on a first reading. The remaining feature families from 9.5 are used in the Tinkering Notebook, so it is beneficial to familiarize yourself with them. Nevertheless, note that all of the function approximation methods covered in Chapter 9 are important, both in RL and in general. In particular, Section 9.7 gives a very quick introduction to neural networks, which are used in almost all state-of-the-art implementations (and are also mentioned in DS6).
- You can skip 9.9, 9.10, and 9.11 without loss of continuity.
- Chapter 10: Read 10.1-10.2.
- Chapter 11: Read 11.1-11.3. Sections 11.4-11.7 are optional reading. (Section 11.7 introduces Gradient TD, which appears in the slides.)
- Chapter 12: Read 12.1-12.8, except 12.6. You have already learned about eligibility traces; the chapter now considers them together with function approximation, and it serves as an overview of almost all algorithms you have seen so far. A minimal backward-view sketch follows this list.
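The backward view of Chapter 12 changes surprisingly little relative to the TD(0) sketch above: one adds an eligibility-trace vector that accumulates the feature gradients and is decayed by gamma*lambda at every step. Below is a minimal sketch of semi-gradient TD(lambda) with accumulating traces on the same illustrative random walk; again, the environment and parameter values are assumptions, not the book's pseudocode or the notebook's code.

```python
import numpy as np

# Illustrative setup, identical to the TD(0) sketch above.
n_states, alpha, gamma, lam = 19, 0.05, 1.0, 0.9

def features(s):
    x = np.zeros(n_states)    # one-hot (state-aggregation) features
    x[s] = 1.0
    return x

def step(s, rng):
    s_next = s + rng.choice([-1, 1])
    if s_next < 0:
        return None, 0.0      # left terminal, reward 0
    if s_next >= n_states:
        return None, 1.0      # right terminal, reward +1
    return s_next, 0.0

rng = np.random.default_rng(0)
w = np.zeros(n_states)
for episode in range(2000):
    s = n_states // 2
    z = np.zeros(n_states)                  # eligibility trace, reset each episode
    while s is not None:
        s_next, r = step(s, rng)
        v_next = 0.0 if s_next is None else w @ features(s_next)
        delta = r + gamma * v_next - w @ features(s)
        z = gamma * lam * z + features(s)   # accumulating trace: decay, then add grad v_hat
        w += alpha * delta * z              # backward-view semi-gradient TD(lambda) update
        s = s_next

print(np.round(w, 2))
```

Setting lam = 0 recovers the TD(0) update exactly, which is a quick sanity check when you experiment with the notebook.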
Study Questions:
L6 Q1: Work on Exercise 9.2 and Exercise 9.3.
L6 Q2: Relate the updates on slide 17 (out of 56) and slide 18 (out of 56) in DS6 with the algorithms on pages 202 and 203, respectively.
L6 Q3: Relate the update for forward view linear TD(lambda) on slide 19 (out of 56) in DS6 with Eqn. 12.4.
L6 Q4: Relate the update for backward view linear TD(lambda) on slide 19 (out of 56) in DS6 with the algorithm on page 293.
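For reference while working on Q3 and Q4, these are the standard forms of the forward-view and backward-view updates for a differentiable approximation v_hat(s, w), written in the book's notation; compare them against the slide updates and the equations and pseudocode the questions point to.

```latex
% Forward view: the lambda-return blends all n-step returns,
% and the (offline) update moves w toward it:
G_t^{\lambda} = (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_{t:t+n},
\qquad
\mathbf{w}_{t+1} = \mathbf{w}_t
  + \alpha \left[ G_t^{\lambda} - \hat{v}(S_t, \mathbf{w}_t) \right]
    \nabla \hat{v}(S_t, \mathbf{w}_t).

% Backward view: an eligibility trace accumulates gradients, decayed by
% gamma*lambda, and each one-step TD error is distributed along the trace:
\mathbf{z}_t = \gamma \lambda \mathbf{z}_{t-1} + \nabla \hat{v}(S_t, \mathbf{w}_t),
\qquad
\delta_t = R_{t+1} + \gamma \hat{v}(S_{t+1}, \mathbf{w}_t) - \hat{v}(S_t, \mathbf{w}_t),
\qquad
\mathbf{w}_{t+1} = \mathbf{w}_t + \alpha \delta_t \mathbf{z}_t.
```

In the linear case of the slides, the gradient is simply the feature vector, \nabla \hat{v}(S_t, \mathbf{w}) = \mathbf{x}(S_t).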