L7 Study instructions
Watch DS7. This lecture is on policy gradient methods.
DS7 Slides Link
DS7 Video Link
Tinkering Notebook for Lecture 7
Reading Instructions:
Read Chapter 13; Section 13.6 is optional.
Study Questions:
Slide numbers below refer to the 41-slide deck.
The number of questions may seem large. Policy gradient is a perspective you have not seen before, and the slides use some unfamiliar notation. Most of the study questions, however, are straightforward and aim to make sure that you understand the basic notation.
L7 Q1: Relate the equation for the probability of an action on slide 17 to Eqn. 13.2 (together with Eqn. 13.3).
L7 Q2: Work on Exercise 13.3. Relate Eqn. 13.9 to the score function on slide 17.
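To make the score function concrete, here is a small sketch (NumPy assumed; the function and variable names are hypothetical) of a linear softmax policy in the style of Eqns. 13.2-13.3, together with its eligibility vector x(s,a) - sum_b pi(b|s) x(s,b) from Eqn. 13.9, checked against a finite-difference gradient of ln pi:

```python
import numpy as np

def softmax_policy(theta, features):
    # features: (num_actions, d) matrix of feature vectors x(s, a)
    prefs = features @ theta        # preferences h(s, a, theta) = theta^T x(s, a)
    prefs = prefs - prefs.max()     # shift for numerical stability
    probs = np.exp(prefs)
    return probs / probs.sum()

def score(theta, features, a):
    # Eligibility vector (Eqn. 13.9): grad_theta ln pi(a | s, theta)
    pi = softmax_policy(theta, features)
    return features[a] - pi @ features

# Sanity check against a central finite-difference gradient of ln pi
rng = np.random.default_rng(0)
theta = rng.normal(size=4)
x = rng.normal(size=(3, 4))         # 3 actions, 4 features (toy numbers)
a = 1
g = score(theta, x, a)
eps = 1e-6
num = np.array([(np.log(softmax_policy(theta + eps * e, x)[a])
                 - np.log(softmax_policy(theta - eps * e, x)[a])) / (2 * eps)
                for e in np.eye(4)])
assert np.allclose(g, num, atol=1e-5)
```

The check confirms that the compact expression in Eqn. 13.9 really is the gradient of the log-policy, which is the form that appears in the score function on slide 17.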
L7 Q3: Work on Exercise 13.4. Relate the equations in the exercise to the equation on slide 18. State whether one setting (the book's or the slide's) is a special case of the other.
L7 Q4: Relate the equation for the gradient of J(theta) just before Eqn. 13.8 to the expression on slide 19. State whether one setting (the book's or the slide's) is a special case of the other.
L7 Q5: Relate the algorithm on slide 21 to the algorithm on page 328 for gamma = 1. Be aware of the difference in notation for the return.
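As a companion to the algorithm on page 328, the following is a minimal REINFORCE sketch for gamma = 1 (NumPy assumed; the one-step two-armed bandit and all constants are made up for illustration): each episode contributes the update theta <- theta + alpha * G * grad ln pi(A | S, theta).

```python
import numpy as np

# REINFORCE with gamma = 1 on a toy one-step, two-armed bandit:
# arm 0 always pays reward 1.0, arm 1 always pays 0.0, so the
# return G of an episode is just that single reward.
rng = np.random.default_rng(0)
theta = np.zeros(2)                   # tabular softmax action preferences
alpha = 0.1

def pi(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

for _ in range(2000):
    p = pi(theta)
    a = rng.choice(2, p=p)
    G = 1.0 if a == 0 else 0.0        # return of the one-step episode
    grad_ln_pi = np.eye(2)[a] - p     # score vector for a tabular softmax
    theta += alpha * G * grad_ln_pi   # the REINFORCE update (gamma = 1)

assert pi(theta)[0] > 0.95            # policy has learned to prefer arm 0
```

Because episodes are one step long, the book's gamma^t factor and the sum defining G both collapse away, which is the simplification the question's gamma = 1 condition points at.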
L7 Q6: Relate the update for TD(\lambda) with eligibility traces on slide 34 to the policy update of the episodic algorithm with eligibility traces on page 332 (the algorithm in the box at the bottom of the page) for gamma=1.
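The shape of the eligibility-trace actor update in question (page 332, gamma = 1) can be sketched in two lines: z <- lambda * z + grad ln pi, then theta <- theta + alpha * delta * z. Below, synthetic stand-in values replace the score vectors and TD errors that an actual environment and critic would supply (NumPy assumed; all numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4                                 # parameter dimension (arbitrary)
theta = np.zeros(d)                   # policy parameters
z = np.zeros(d)                       # accumulating eligibility trace
alpha, lam = 0.1, 0.9                 # step size and trace-decay lambda

for _ in range(5):
    grad_ln_pi = rng.normal(size=d)   # stand-in score vector for the taken action
    delta = rng.normal()              # stand-in TD error from the critic
    z = lam * z + grad_ln_pi          # with gamma = 1 the trace decays by lambda alone
    theta += alpha * delta * z        # policy update weighted by the trace

assert z.shape == theta.shape == (d,)
```

The point of comparison with slide 34 is that the same decay-and-accumulate pattern drives both the TD(lambda) value update and this policy update; only the vector being accumulated differs.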
L7 Q7: Work on Exercise 13.1 as preparation for the examples in the Tinkering Notebook.