# Dynamic Programming and Optimal Control

## Course Information

Schedule: Fall 2023, Thursdays 9:00am - 12:15pm
Location: 430 Kravis Hall

Professor: Daniel Russo
Office hours: 12:15-1:00 PM or by appointment

TA: David Cheikhi

Course Number: B9120-001

## Course Description

This course serves as an advanced introduction to dynamic programming and optimal control. The first part of the course covers problem formulation and problem-specific solution ideas arising in canonical control problems.
The second part of the course covers algorithms, treating the foundations of approximate dynamic programming and reinforcement learning alongside exact dynamic programming algorithms.

## Prerequisites

Markov chains; some Python and some linear programming; mathematical maturity (this is a doctoral course).

## Textbooks

*Dynamic Programming and Optimal Control* by Dimitri P. Bertsekas, 4th Edition, Volumes I and II.

# Class notes (and *tentative* schedule)

| # | Date | Topic | Notes | Reading | Extras |
|---|------|-------|-------|---------|--------|
| | | **Part I: Problem formulation and special problem structure** | | | |
| 1 | 9/7 | Intro | [pdf](Notes/1A-Intro.pdf) | Vol 1, Sec 1.2-1.4 | |
| 2 | 9/7 | Optimal stopping | [pdf](Notes/1B-Optimal-Stopping.pdf) | Vol 1, Sec 3.4 | |
| 3 | 9/14 | Inventory control | [pdf](Notes/2A-Inventory-Control.pdf) | Vol 1, Sec 3.2 | |
| 4 | 9/14 | Linear quadratic control | [pdf](Notes/2B-Linear-Quadratic-Control.pdf) | Vol 1, Sec 3.1 | |
| 5 | 9/21 | Imperfect state information; reformulation as a problem with perfect state information; separation principle | [pdf](Notes/3-Partial-Observability.pdf) | Vol 1, Sec 4.1-4.3 | |
| 6 | 9/28 | Discounted infinite horizon objectives | [pdf](Notes/4-discounting.pdf) | Vol 2, Sec 1.4 | |
| 7 | 10/5 | Continuous time, discrete event problems; uniformization | [pdf](Notes/5A-Continuous-Time-Discrete-Event.pdf) | Vol 2, Sec 1.5 | |
| 8 | 10/5 | Gittins index theorem; applications to scheduling and optimal learning | [pdf1](Notes/5B-Gittins.pdf), [pdf2](Notes/5B-priority-policies.pdf), [pdf3](Notes/5B-Tsitsiklis-short-proof.pdf) | | |
| 9 | 10/12 | Gittins index theorem continued | | | |
| | 10/19 | No B-school classes scheduled | | | |
| | | **Part 2: Algorithms and approximations** | | | |
| 10 | 10/26 | Infinite (or indefinite) horizon problems without discounting: average cost objectives & stochastic shortest path problems | [pdf](Notes/7A-IndefiniteHorizon.pdf) | | |
| 11 | 11/2 | Online value iteration | [pdf](Notes/8-RTDP.pdf) | | |
| 12 | 11/9 | Approximate value iteration | [pdf](Notes/9A-ApproximateVI.pdf) | | |
| 13 | 11/9 | Approximation benefits of online value iteration | [pdf](Notes/9B-Approximate-RTDP.pdf) | | |
| 14 | 11/16 | Policy iteration; impact of distribution shift on convergence rate | [pdf](Notes/10-policy-iteration.pdf) | | |
| | 11/23 | No B-school classes scheduled | | | |
| 15 | 11/30 | Policy gradient methods | [pdf](Notes/11-policy-gradient.pdf) | | |
| 16 | Dec 7 | | | | |

Remaining topics for Part 2 are yet to be determined. Some possibilities include:

- Temporal difference learning and parametric value function approximation
- Policy gradient methods and their global convergence; differentiable simulators
- Efficient exploration; optimism in the face of uncertainty
- Problems with many "weakly coupled" components; Lagrangian relaxations

## Evaluation

Homework (90%). Lecture scribing and class participation (10%).

## Homework

We will have a short homework each week. Please write down a precise, rigorous formulation of all word problems. For example, specify the state space, the cost functions at each state, etc.

1. Due September 14. Vol 1, Problems 1.23, 1.24, 3.18. [Scan](HW/hw1_scan.pdf)
2. Due September 21. [pdf](HW/hw2.pdf)
3. Due September 29. See Canvas announcement.
4. Due October 5. [pdf](HW/hw4.pdf)
5. Due October 12. [pdf](HW/hw5.pdf)
6. Due October 26. Vol 2, Problem 1.8. You may assume that the state space of each bandit process is finite.
7. Due November 2. [pdf](HW/hw7.pdf)
8. Due November 17. [pdf](HW/hw8.pdf)
9. Due December 9. [pdf](HW/hw9.pdf)
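Much of the algorithmic material above builds on basic value iteration (discounted objectives in lecture 6; online and approximate variants in Part 2). As a minimal illustrative sketch only, not course code, value iteration for a finite discounted-cost MDP might look like the following; the transition tensor `P`, cost matrix `c`, and the toy two-state instance are all hypothetical:

```python
import numpy as np

def value_iteration(P, c, gamma, tol=1e-8):
    """Value iteration for a finite, discounted-cost MDP.

    P: (A, S, S) array, P[a, s, t] = probability of moving s -> t under action a.
    c: (S, A) array, expected stage cost of playing action a in state s.
    gamma: discount factor in [0, 1).
    Returns the (approximate) optimal cost-to-go J and a greedy policy.
    """
    S, A = c.shape
    J = np.zeros(S)
    while True:
        # Bellman operator: Q(s, a) = c(s, a) + gamma * sum_t P[a, s, t] * J(t)
        Q = c + gamma * np.einsum("ast,t->sa", P, J)
        J_new = Q.min(axis=1)          # (TJ)(s) = min_a Q(s, a)
        if np.max(np.abs(J_new - J)) < tol:
            break
        J = J_new
    return J_new, Q.argmin(axis=1)     # greedy policy w.r.t. the fixed point

# Toy two-state example: action 0 stays put, action 1 swaps states.
P = np.zeros((2, 2, 2))
P[0] = np.eye(2)
P[1] = np.array([[0.0, 1.0], [1.0, 0.0]])
c = np.array([[1.0, 1.5],   # state 0: staying costs 1, switching costs 1.5
              [0.0, 5.0]])  # state 1: staying is free, switching costs 5
J, policy = value_iteration(P, c, gamma=0.5)
```

Since `gamma < 1`, the Bellman operator is a sup-norm contraction, so the loop converges geometrically; in the toy instance the optimal policy is to switch out of state 0 and stay in state 1.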