├── 2019 └── tex │ ├── chapter1_optimization.tex │ ├── chapter2_DP.tex │ ├── chapter3_HJB.tex │ ├── chapter4_indirect.tex │ ├── chapter5_direct.tex │ ├── chapter6_MPC.tex │ ├── combined.tex │ ├── figs │ ├── foc.png │ ├── large_step.png │ ├── linesearch.png │ ├── newtonmethod.png │ ├── optimality.png │ └── small_step.png │ ├── preamble.tex │ ├── references.bib │ └── source │ ├── ch1.tex │ ├── ch2.tex │ ├── ch3.tex │ ├── ch4.tex │ ├── ch5.tex │ ├── ch6.tex │ ├── ch7.tex │ ├── ch8.tex │ └── intro.tex ├── 2020 └── tex │ ├── combined.tex │ ├── figs │ ├── foc.png │ ├── large_step.png │ ├── linesearch.png │ ├── newtonmethod.png │ ├── optimality.png │ └── small_step.png │ ├── preamble.tex │ ├── references.bib │ └── source │ ├── ch1.tex │ ├── ch10.tex │ ├── ch11.tex │ ├── ch12.tex │ ├── ch2.tex │ ├── ch3.tex │ ├── ch4.tex │ ├── ch5.tex │ ├── ch6.tex │ ├── ch7.tex │ ├── ch8.tex │ ├── ch9.tex │ └── intro.tex ├── notes.pdf └── readme.md /2019/tex/chapter1_optimization.tex: -------------------------------------------------------------------------------- 1 | \documentclass[12pt]{article} 2 | \usepackage[utf8]{inputenc} 3 | \usepackage{fullpage} 4 | 5 | \input{preamble.tex} 6 | 7 | \title{AA203: Optimal and Learning-based Control\\ 8 | Course Notes} 9 | \author{James Harrison\thanks{Contact: jharrison@stanford.edu}} 10 | \date{\today} 11 | 12 | \begin{document} 13 | 14 | \maketitle 15 | 16 | \input{source/ch1.tex} 17 | 18 | \bibliography{references} 19 | \bibliographystyle{alpha} 20 | 21 | \end{document} 22 | -------------------------------------------------------------------------------- /2019/tex/chapter2_DP.tex: -------------------------------------------------------------------------------- 1 | \documentclass[12pt]{article} 2 | \usepackage[utf8]{inputenc} 3 | \usepackage{fullpage} 4 | 5 | \input{preamble.tex} 6 | 7 | \title{AA203: Optimal and Learning-based Control\\ 8 | Course Notes} 9 | \author{James Harrison\thanks{Contact: jharrison@stanford.edu}} 10 | \date{\today} 11 | 12 | 13 | \setcounter{section}{1} 14 | \begin{document} 15 | 16 | \maketitle 17 | 18 | \input{source/ch2.tex} 19 | 20 | \bibliography{references} 21 | \bibliographystyle{alpha} 22 | 23 | \end{document} 24 | -------------------------------------------------------------------------------- /2019/tex/chapter3_HJB.tex: -------------------------------------------------------------------------------- 1 | \documentclass[12pt]{article} 2 | \usepackage[utf8]{inputenc} 3 | \usepackage{fullpage} 4 | 5 | \input{preamble.tex} 6 | 7 | \title{AA203: Optimal and Learning-based Control\\ 8 | Course Notes} 9 | \author{James Harrison\thanks{Contact: jharrison@stanford.edu}} 10 | \date{\today} 11 | 12 | 13 | \setcounter{section}{2} 14 | \begin{document} 15 | 16 | \maketitle 17 | 18 | \input{source/ch3.tex} 19 | 20 | \bibliography{references} 21 | \bibliographystyle{alpha} 22 | 23 | \end{document} 24 | -------------------------------------------------------------------------------- /2019/tex/chapter4_indirect.tex: -------------------------------------------------------------------------------- 1 | \documentclass[12pt]{article} 2 | \usepackage[utf8]{inputenc} 3 | \usepackage{fullpage} 4 | 5 | \input{preamble.tex} 6 | 7 | \title{AA203: Optimal and Learning-based Control\\ 8 | Course Notes} 9 | \author{James Harrison\thanks{Contact: jharrison@stanford.edu}} 10 | \date{\today} 11 | 12 | 13 | \setcounter{section}{3} 14 | \begin{document} 15 | 16 | \maketitle 17 | 18 | \input{source/ch4.tex} 19 | 20 | \bibliography{references} 21 | \bibliographystyle{alpha} 
22 | 23 | \end{document} 24 | -------------------------------------------------------------------------------- /2019/tex/chapter5_direct.tex: -------------------------------------------------------------------------------- 1 | \documentclass[12pt]{article} 2 | \usepackage[utf8]{inputenc} 3 | \usepackage{fullpage} 4 | 5 | \input{preamble.tex} 6 | 7 | \title{AA203: Optimal and Learning-based Control\\ 8 | Course Notes} 9 | \author{James Harrison\thanks{Contact: jharrison@stanford.edu}} 10 | \date{\today} 11 | 12 | 13 | \setcounter{section}{4} 14 | \begin{document} 15 | 16 | \maketitle 17 | 18 | \input{source/ch5.tex} 19 | 20 | \bibliography{references} 21 | \bibliographystyle{alpha} 22 | 23 | \end{document} 24 | -------------------------------------------------------------------------------- /2019/tex/chapter6_MPC.tex: -------------------------------------------------------------------------------- 1 | \documentclass[12pt]{article} 2 | \usepackage[utf8]{inputenc} 3 | \usepackage{fullpage} 4 | 5 | \input{preamble.tex} 6 | 7 | \title{AA203: Optimal and Learning-based Control\\ 8 | Course Notes} 9 | \author{James Harrison\thanks{Contact: jharrison@stanford.edu}} 10 | \date{\today} 11 | 12 | 13 | \setcounter{section}{5} 14 | \begin{document} 15 | 16 | \maketitle 17 | 18 | \input{source/ch6.tex} 19 | 20 | \bibliography{references} 21 | \bibliographystyle{alpha} 22 | 23 | \end{document} 24 | -------------------------------------------------------------------------------- /2019/tex/combined.tex: -------------------------------------------------------------------------------- 1 | \documentclass[12pt]{article} 2 | \usepackage[utf8]{inputenc} 3 | \usepackage{fullpage} 4 | 5 | \input{preamble.tex} 6 | 7 | \title{AA203: Optimal and Learning-based Control\\ 8 | Combined Course Notes} 9 | 10 | \author{James Harrison\thanks{Contact: jharrison@stanford.edu}} 11 | \date{\today} 12 | 13 | \begin{document} 14 | 15 | \maketitle 16 | 17 | 18 | \input{source/intro.tex} 19 | 20 | \tableofcontents 21 | 22 | \input{source/ch1.tex} 23 | \input{source/ch2.tex} 24 | \input{source/ch3.tex} 25 | \input{source/ch4.tex} 26 | \input{source/ch5.tex} 27 | \input{source/ch6.tex} 28 | \input{source/ch7.tex} 29 | \input{source/ch8.tex} 30 | 31 | \bibliography{references} 32 | \bibliographystyle{alpha} 33 | 34 | \end{document} 35 | -------------------------------------------------------------------------------- /2019/tex/figs/foc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/StanfordASL/AA203-Notes/f53acbcd87bd1144ac4e8d8d2bc1ad9e6ac2a1db/2019/tex/figs/foc.png -------------------------------------------------------------------------------- /2019/tex/figs/large_step.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/StanfordASL/AA203-Notes/f53acbcd87bd1144ac4e8d8d2bc1ad9e6ac2a1db/2019/tex/figs/large_step.png -------------------------------------------------------------------------------- /2019/tex/figs/linesearch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/StanfordASL/AA203-Notes/f53acbcd87bd1144ac4e8d8d2bc1ad9e6ac2a1db/2019/tex/figs/linesearch.png -------------------------------------------------------------------------------- /2019/tex/figs/newtonmethod.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/StanfordASL/AA203-Notes/f53acbcd87bd1144ac4e8d8d2bc1ad9e6ac2a1db/2019/tex/figs/newtonmethod.png -------------------------------------------------------------------------------- /2019/tex/figs/optimality.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/StanfordASL/AA203-Notes/f53acbcd87bd1144ac4e8d8d2bc1ad9e6ac2a1db/2019/tex/figs/optimality.png -------------------------------------------------------------------------------- /2019/tex/figs/small_step.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/StanfordASL/AA203-Notes/f53acbcd87bd1144ac4e8d8d2bc1ad9e6ac2a1db/2019/tex/figs/small_step.png -------------------------------------------------------------------------------- /2019/tex/preamble.tex: -------------------------------------------------------------------------------- 1 | \usepackage{amsmath} 2 | \usepackage{amsthm} 3 | \usepackage{amsfonts} 4 | \usepackage{bm} 5 | \usepackage{graphicx} 6 | \usepackage{subcaption} 7 | \usepackage{accents} 8 | \usepackage{mathtools} 9 | 10 | \usepackage{algorithm} 11 | \usepackage{algpseudocode} 12 | 13 | \usepackage{url} 14 | 15 | \newtheorem{theorem}{Theorem}[section] 16 | \newtheorem{corollary}[theorem]{Corollary} 17 | \newtheorem{lemma}[theorem]{Lemma} 18 | \newtheorem{remark}[theorem]{Remark} 19 | \newtheorem{definition}[theorem]{Definition} 20 | 21 | % macros 22 | \newcommand{\cost}{c} 23 | \newcommand{\pol}{\pi} 24 | \newcommand{\st}{\bm{x}} 25 | \newcommand{\cst}{\bm{p}} % costate 26 | \newcommand{\stdot}{\dot{\bm{x}}} 27 | \newcommand{\ac}{\bm{u}} 28 | \newcommand{\ob}{\bm{z}} 29 | \newcommand{\ad}{\bm{d}} 30 | 31 | 32 | \newcommand{\f}{f} 33 | \newcommand{\h}{h} %used for measurement model 34 | \newcommand{\J}{J} 35 | 36 | 37 | \newcommand{\w}{\bm{\omega}} %process noise 38 | \newcommand{\wob}{\bm{\nu}} %measurement noise 39 | \newcommand{\W}{\Sigma_{\omega}} %measurement covar 40 | \newcommand{\V}{\Sigma_{\nu}} %measurement noise 41 | \newcommand{\I}{\bm{i}} %information vector 42 | 43 | 44 | \newcommand{\ham}{\mathcal{H}} %information vector 45 | 46 | \newcommand{\R}{\mathbb{R}} 47 | \newcommand{\E}{\mathbb{E}} 48 | \newcommand{\tr}{\text{tr}} 49 | 50 | 51 | \newcommand\munderbar[1]{% 52 | \underaccent{\bar}{#1}} 53 | 54 | \newcommand{\argmin}{\text{argmin}} 55 | \newcommand{\argmax}{\text{argmax}} 56 | 57 | 58 | 59 | 60 | 61 | 62 | -------------------------------------------------------------------------------- /2019/tex/references.bib: -------------------------------------------------------------------------------- 1 | @String { icml = {International Conference on Machine Learning (ICML)} } 2 | @String { colt = {Conference on Learning Theory (COLT)} } 3 | @String { nips = {Neural Information Processing Systems (NIPS)} } 4 | @String { ijrr = {International Journal of Robotics Research} } 5 | @String { isrr = {International Symposium on Robotics Research (ISRR)} } 6 | @String { icra = {{IEEE} International Conference on Robotics and Automation (ICRA)} } 7 | @String { iros = {{IEEE} International Conference on Intelligent Robots and Systems (IROS)} } 8 | @String { humanoids = {{IEEE-RAS} International Conference on Humanoid Robotics (Humanoids)} } 9 | @String { jmlr = {Journal of Machine Learning Research} } 10 | @String { iclr = {International Conference on Learning Representations (ICLR)} } 11 | @String { uai = {Uncertainty in Artificial Intelligence (UAI)} } 12 | 
@String { tpami = {IEEE Transactions on Pattern Analysis \& Machine Intelligence} } 13 | @String { tac = {IEEE Transactions on Automatic Control} } 14 | @String { automatica = {Automatica} } 15 | @String { jfr = {Journal of Field Robotics} } 16 | @String { ar = {Autonomous Robots} } 17 | @String { ijcai = {International Joint Conference on Artificial Intelligence (IJCAI)} } 18 | @String { cvpr = {{IEEE} Conference on Computer Vision and Pattern Recognition (CVPR)} } 19 | @String { eccv = {European Conference on Computer Vision (ECCV)} } 20 | @String { aistats = {Artificial Intelligence and Statistics (AISTATS)} } 21 | @String { acc = {American Control Conference (ACC)} } 22 | @String { cdc = {IEEE Conference on Decision and Control (CDC)} } 23 | @String { nc = {Neural Computation} } 24 | @String { jasa = {Journal of the American Statistical Association} } 25 | @String { wafr = {Workshop on the Algorithmic Foundations of Robotics (WAFR)} } 26 | @String { corl = {Conference on Robot Learning (CoRL)} } 27 | @String { rss = {Robotics: Science and Systems (RSS)} } 28 | @String { jgcd = {AIAA Journal of Guidance, Control, and Dynamics} } 29 | @String { tsc = {IEEE Transactions on Control Systems Technology} } 30 | 31 | 32 | % optimization 33 | 34 | @book{bertsekas2016nonlinear, 35 | title={Nonlinear programming}, 36 | author={Bertsekas, Dimitri P}, 37 | year={2016}, 38 | publisher={Athena Scientific} 39 | } 40 | 41 | @book{bertsimas1997introduction, 42 | title={Introduction to linear optimization}, 43 | author={Bertsimas, Dimitris and Tsitsiklis, John N}, 44 | year={1997}, 45 | publisher={Athena Scientific} 46 | } 47 | 48 | @article{powell2012ai, 49 | title={{AI}, {OR} and control theory: A rosetta stone for stochastic optimization}, 50 | author={Powell, Warren B}, 51 | year={2012} 52 | } 53 | 54 | @book{boyd2004convex, 55 | title={Convex optimization}, 56 | author={Boyd, Stephen and Vandenberghe, Lieven}, 57 | year={2004}, 58 | publisher={Cambridge university press} 59 | } 60 | 61 | @article{kolter2008convex, 62 | title={Convex Optimization Overview}, 63 | journal={CS 229 Lecture Notes}, 64 | author={Zico Kolter}, 65 | year={2008} 66 | } 67 | 68 | % DP 69 | 70 | @book{bertsekas1995dynamic, 71 | title={Dynamic programming and optimal control}, 72 | author={Bertsekas, Dimitri P}, 73 | edition={4}, 74 | number={1}, 75 | year={2012} 76 | } 77 | 78 | @book{anderson2007optimal, 79 | title={Optimal control: linear quadratic methods}, 80 | author={Anderson, Brian DO and Moore, John B}, 81 | year={2007}, 82 | publisher={Courier Corporation} 83 | } 84 | 85 | @inproceedings{todorov2005generalized, 86 | title={A generalized iterative {LQG} method for locally-optimal feedback control of constrained nonlinear stochastic systems}, 87 | author={Todorov, Emanuel and Li, Weiwei}, 88 | booktitle=acc, 89 | year={2005} 90 | } 91 | 92 | @inproceedings{tassa2014control, 93 | title={Control-limited differential dynamic programming}, 94 | author={Tassa, Yuval and Mansard, Nicolas and Todorov, Emo}, 95 | booktitle=icra, 96 | year={2014} 97 | } 98 | 99 | @inproceedings{levine2014learning, 100 | title={Learning complex neural network policies with trajectory optimization}, 101 | author={Levine, Sergey and Koltun, Vladlen}, 102 | booktitle=icml, 103 | year={2014} 104 | } 105 | 106 | @book{mayne1970ddp, 107 | title={Differential Dynamic Programming}, 108 | author={David Jacobson and David Mayne}, 109 | year={1970}, 110 | publisher={Elsevier} 111 | } 112 | 113 | @inproceedings{tassa2012synthesis, 114 | title={Synthesis and 
stabilization of complex behaviors through online trajectory optimization}, 115 | author={Tassa, Yuval and Erez, Tom and Todorov, Emanuel}, 116 | booktitle=iros, 117 | year={2012} 118 | } 119 | 120 | @techreport{liao1992advantages, 121 | title={Advantages of differential dynamic programming over Newton's method for discrete-time optimal control problems}, 122 | author={Liao, Li-zhi and Shoemaker, Christine A}, 123 | year={1992}, 124 | institution={Cornell University} 125 | } 126 | 127 | @inproceedings{xie2017differential, 128 | title={Differential dynamic programming with nonlinear constraints}, 129 | author={Xie, Zhaoming and Liu, C Karen and Hauser, Kris}, 130 | booktitle=icra, 131 | year={2017} 132 | } 133 | 134 | @inproceedings{giftthaler2017projection, 135 | title={A projection approach to equality constrained iterative linear quadratic optimal control}, 136 | author={Giftthaler, Markus and Buchli, Jonas}, 137 | booktitle=humanoids, 138 | year={2017} 139 | } 140 | 141 | @inproceedings{li2004iterative, 142 | title={Iterative linear quadratic regulator design for nonlinear biological movement systems.}, 143 | author={Li, Weiwei and Todorov, Emanuel}, 144 | booktitle={International Conference on Informatics in Control, Automation, and Robotics}, 145 | year={2004} 146 | } 147 | 148 | %HJB 149 | 150 | @article{mitchell2005time, 151 | title={A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games}, 152 | author={Mitchell, Ian M and Bayen, Alexandre M and Tomlin, Claire J}, 153 | journal=tac, 154 | year={2005} 155 | } 156 | 157 | @article{bressan2010noncooperative, 158 | title={Noncooperative differential games. a tutorial}, 159 | year={2010}, 160 | author={Bressan, Alberto} 161 | } 162 | 163 | @book{kirk2012optimal, 164 | title={Optimal control theory: an introduction}, 165 | author={Kirk, Donald E}, 166 | year={2012}, 167 | publisher={Courier Corporation} 168 | } 169 | 170 | % Indirect 171 | 172 | @book{bryson1975applied, 173 | title={Applied Optimal Control: Optimization, Estimation and Control}, 174 | author={Arthur Bryson and Yu-Chi Ho}, 175 | year={1975}, 176 | publisher={CRC Press} 177 | } 178 | 179 | @article{lee1967foundations, 180 | title={Foundations of optimal control theory}, 181 | author={Lee, Ernest Bruce and Markus, Lawrence}, 182 | year={1967}, 183 | publisher={Wiley} 184 | } 185 | 186 | % Direct 187 | 188 | @article{kelly2017transcription, 189 | title={Transcription Methods for Trajectory Optimization: a beginners tutorial}, 190 | author={Kelly, Matthew}, 191 | journal={arXiv:1707.00284}, 192 | year={2017} 193 | } 194 | 195 | @article{kelly2017introduction, 196 | title={An introduction to trajectory optimization: how to do your own direct collocation}, 197 | author={Kelly, Matthew}, 198 | journal={SIAM Review}, 199 | year={2017} 200 | } 201 | 202 | % MPC 203 | 204 | @book{borrelli2017predictive, 205 | title={Predictive control for linear and hybrid systems}, 206 | author={Borrelli, Francesco and Bemporad, Alberto and Morari, Manfred}, 207 | year={2017}, 208 | publisher={Cambridge University Press} 209 | } 210 | 211 | @book{rawlings2017model, 212 | title={Model Predictive Control: Theory, Computation, and Design}, 213 | author={Rawlings, James Blake and Mayne, David Q and Diehl, Moritz}, 214 | year={2017}, 215 | publisher={Nob Hill Publishing} 216 | } -------------------------------------------------------------------------------- /2019/tex/source/ch1.tex: -------------------------------------------------------------------------------- 1 | 
\section{Nonlinear Optimization}

In this section we discuss the generic nonlinear optimization problem that forms the basis for the rest of the material presented in this class. We write the minimization problem as
\begin{equation*}
\begin{aligned}
& \underset{\bm{x} \in \mathcal{X}}{\min}
& & f(\bm{x})
\end{aligned}
\end{equation*}
where $f$ is the cost function, usually assumed twice continuously differentiable, $\bm{x} \in \R^n$ is the optimization variable, and $\mathcal{X} \subset \R^n$ is the constraint set. The special case in which the cost function is linear and the constraint set is specified by linear equations and/or inequalities is \textit{linear optimization}, which we will not discuss.

\subsection{Unconstrained Nonlinear Optimization}

We will first address the unconstrained case, in which $\mathcal{X} = \R^n$. A vector $\bm{x}^*$ is said to be an unconstrained \textit{local minimum} if there exists $\epsilon > 0$ such that $f(\bm{x}^*) \leq f(\bm{x})$ for all $\bm{x} \in \{\bm{x} \mid \|\bm{x} - \bm{x}^*\| \leq \epsilon\}$, and $\bm{x}^*$ is said to be an unconstrained \textit{global minimum} if $f(\bm{x}^*) \leq f(\bm{x})$ for all $\bm{x} \in \R^n$.

\subsubsection{Necessary Conditions for Optimality}

For a differentiable cost function, we can compare the cost of a point to its neighbors by considering a small variation $\Delta \bm{x}$ from $\bm{x}^*$. By using Taylor expansions, this yields a first order cost variation
\begin{equation}
f(\bm{x}^* + \Delta \bm{x}) - f(\bm{x}^*) \approx \nabla f(\bm{x}^*)^T \Delta \bm{x}
\end{equation}
and a second order cost variation
\begin{equation}
f(\bm{x}^* + \Delta \bm{x}) - f(\bm{x}^*) \approx \nabla f(\bm{x}^*)^T \Delta \bm{x} + \frac{1}{2} \Delta \bm{x}^T \nabla^2 f(\bm{x}^*) \Delta \bm{x}.
\end{equation}
Since the first order cost variation must be nonnegative at a local minimum, setting $\Delta \bm{x}$ equal to positive and negative multiples of each unit coordinate vector, we have
\begin{equation}
\frac{\partial f(\bm{x}^*)}{\partial x_i} \geq 0
\end{equation}
where $x_i$ denotes the $i$'th coordinate of $\bm{x}$, and
\begin{equation}
\frac{\partial f(\bm{x}^*)}{\partial x_i} \leq 0
\end{equation}
for all $i$, which is only satisfied by $\nabla f(\bm{x}^*) = 0$. This is referred to as the \textit{first order necessary condition for optimality}. Looking at the second order variation, and noting that $\nabla f(\bm{x}^*) = 0$, we expect
\begin{align}
f(\bm{x}^* + \Delta \bm{x}) - f(\bm{x}^*) \geq 0
\end{align}
and thus
\begin{align}
\Delta \bm{x}^T \nabla^2 f(\bm{x}^*) \Delta \bm{x} \geq 0
\end{align}
which implies $\nabla^2 f(\bm{x}^*)$ is positive semidefinite. This is referred to as the \textit{second order necessary condition for optimality}. Stating these conditions formally,

\begin{theorem}[Necessary Conditions for Optimality (NOC)]
Let $\bm{x}^*$ be an unconstrained local minimum of $f:\R^n \to \R$ and $f \in C^1$ in an open set $S$ containing $\bm{x}^*$. Then,
\begin{equation}
\nabla f(\bm{x}^*) = 0.
\end{equation}
If $f \in C^2$ within $S$, $\nabla^2 f(\bm{x}^*)$ is positive semidefinite.
\end{theorem}

\begin{proof}
See Section 1.1 of \cite{bertsekas2016nonlinear}.
\end{proof}
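For a simple scalar illustration of the gap these necessary conditions leave, consider $f(x) = x^3$ at $x^* = 0$: we have $\nabla f(0) = 0$ and $\nabla^2 f(0) = 0$, which is positive semidefinite, so both necessary conditions hold, yet $x^* = 0$ is not a local minimum because $f(x) < f(0)$ for every $x < 0$. The necessary conditions alone therefore cannot certify a minimum, which motivates the sufficient conditions discussed next.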
\subsubsection{Sufficient Conditions for Optimality}

\begin{figure}[t]
\centering
\includegraphics[width=0.7\linewidth]{figs/foc.png}
\caption{An example of a function for which the necessary conditions of optimality are satisfied but the sufficient conditions are not.}
\label{fig:foc}
\end{figure}

If we strengthen the second order condition to $\nabla^2 f(\bm{x}^*)$ being positive definite, we have sufficient conditions for $\bm{x}^*$ being a local minimum. Why are the second order necessary conditions not sufficient? An example function is given in figure \ref{fig:foc}. Formally,

\begin{theorem}[Sufficient Conditions for Optimality (SOC)]
Let $f:\R^n \to \R$ be $C^2$ in an open set $S$. Suppose a vector $\bm{x}^*$ satisfies the conditions $\nabla f(\bm{x}^*) = 0$ and $\nabla^2 f(\bm{x}^*)$ is positive definite. Then $\bm{x}^*$ is a strict unconstrained local minimum of $f$.
\end{theorem}

The proof is again given in Section 1.1 of \cite{bertsekas2016nonlinear}.
There are several reasons why the optimality conditions are important. In a general nonlinear optimization setting, they can be used to filter candidates for global minima. They can be used for sensitivity analysis, in which the sensitivity of $\bm{x}^*$ to model parameters can be quantified \cite{bertsekas2016nonlinear}; this is common in, e.g., microeconomics. Finally, these conditions often provide the basis for the design and analysis of optimization algorithms.

\subsubsection{Special case: Convex Optimization}

A special case within nonlinear optimization is the set of \textit{convex optimization} problems. A set $S \subset \R^n$ is called \textit{convex} if
\begin{equation}
\alpha \bm{x} + (1 - \alpha) \bm{y} \in S, \quad \forall \bm{x},\bm{y} \in S, \forall \alpha \in [0,1].
\end{equation}
For $S$ convex, a function $f:S\to\R$ is called convex if
\begin{equation}
f(\alpha \bm{x} + (1-\alpha) \bm{y}) \leq \alpha f(\bm{x}) + (1-\alpha) f(\bm{y}), \quad \forall \bm{x},\bm{y} \in S, \forall \alpha \in [0,1].
\label{eq:conv_fun}
\end{equation}
This class of problems has several important characteristics. If $f$ is convex, then
\begin{itemize}
\item A local minimum of $f$ over $S$ is also a global minimum over $S$. If in addition $f$ is strictly convex (the inequality in (\ref{eq:conv_fun}) is strict), there exists at most one global minimum of $f$.
\item If $f \in C^1$ and convex, and the set $S$ is open, $\nabla f(\bm{x}^*) = 0$ is a necessary and sufficient condition for a vector $\bm{x}^* \in S$ to be a global minimum over $S$.
\end{itemize}
Convex optimization problems have several nice properties that make them (usually) computationally efficient to solve, and the first property above gives a certificate of having obtained global optimality that is difficult or impossible to obtain in the general nonlinear optimization setting. For a thorough treatment of convex optimization theory and algorithms, see \cite{boyd2004convex}.

\subsubsection{Computational Methods}

In this subsection we will discuss the class of algorithms known as \textit{gradient methods} for finding local minima in nonlinear optimization problems. These approaches rely (roughly) on following the gradient of the function ``downhill'', toward a minimum.
More concretely, these algorithms rely on taking steps of the form
\begin{equation}
\bm{x}^{k+1} = \bm{x}^k + \alpha^{k} \bm{d}^k
\end{equation}
where if $\nabla f(\bm{x}^k) \neq 0$, $\bm{d}^k$ is chosen so that
\begin{equation}
\nabla f(\bm{x}^k)^T \bm{d}^k < 0
\end{equation}
and $\alpha^k > 0$. Typically, the step size $\alpha^k$ is chosen such that
\begin{equation}
f(\bm{x}^k + \alpha^k \bm{d}^k) < f(\bm{x}^k),
\end{equation}
but generally, the step size and the direction of descent ($\bm{d}^k$) are tuning parameters.

We will look at the general class of descent directions of the form
\begin{equation}
\bm{d}^k = -D^k \nabla f(\bm{x}^k)
\end{equation}
where $D^k > 0$ (note that this guarantees $\nabla f(\bm{x}^k)^T \bm{d}^k < 0$ whenever $\nabla f(\bm{x}^k) \neq 0$).

\paragraph{Steepest descent, $D^k = I$.} The simplest choice of descent direction is directly following the gradient, and ignoring second order function information. In practice, this often leads to slow convergence (figure \ref{fig:graddesc_small}) and possible oscillation (figure \ref{fig:graddesc_large}).

\begin{figure}[!t]
\centering
\begin{subfigure}[b]{0.46\linewidth}
\centering
\includegraphics[width=\textwidth]{figs/small_step.png}
\caption{Steepest descent, small fixed step size.}
\label{fig:graddesc_small}
\end{subfigure}%
\begin{subfigure}[b]{0.46\linewidth}
\centering
\includegraphics[width=\textwidth]{figs/large_step.png}
\caption{Steepest descent, large fixed step size.}
\label{fig:graddesc_large}
\end{subfigure}
\begin{subfigure}[b]{0.46\linewidth}
\centering
\includegraphics[width=\textwidth]{figs/linesearch.png}
\caption{Steepest descent, step size chosen via line search.}
\label{fig:graddesc_line}
\end{subfigure}%
\begin{subfigure}[b]{0.46\linewidth}
\centering
\includegraphics[width=\textwidth]{figs/newtonmethod.png}
\caption{Newton's method. Note that the method converges in one step.}
\label{fig:graddesc_newton}
\end{subfigure}
\caption{Comparison of steepest descent methods with various step sizes, and Newton's method, on the same quadratic cost function.}
\label{fig:gradient_descent}
\end{figure}

\paragraph{Newton's Method, $D^k = (\nabla^2 f(\bm{x}^k))^{-1}$.} The underlying idea of this approach is, at each iteration, to minimize the quadratic approximation of $f$ around $\bm{x}^k$,
\begin{equation}
f^k(\bm{x}) = f(\bm{x}^k) + \nabla f(\bm{x}^k)^T (\bm{x} - \bm{x}^k) + \frac{1}{2} (\bm{x} - \bm{x}^k)^T \nabla^2 f(\bm{x}^k) (\bm{x} - \bm{x}^k).
\end{equation}
Setting the derivative of this to zero, we obtain
\begin{equation}
\nabla f(\bm{x}^k) + \nabla^2 f(\bm{x}^k) (\bm{x} - \bm{x}^k) = 0
\end{equation}
and thus, by setting $\bm{x}^{k+1}$ to be the $\bm{x}$ that satisfies the above, we get the update
\begin{equation}
\bm{x}^{k+1} = \bm{x}^k - (\nabla^2 f(\bm{x}^k))^{-1} \nabla f(\bm{x}^k)
\end{equation}
or more generally,
\begin{equation}
\bm{x}^{k+1} = \bm{x}^k - \alpha (\nabla^2 f(\bm{x}^k))^{-1} \nabla f(\bm{x}^k).
\end{equation}
Note that this update is only valid for $\nabla^2 f(\bm{x}^k) > 0$. When this condition does not hold, $\bm{x}^{k+1}$ is not a minimizer of the second order approximation (as a result of the SOCs). See figure \ref{fig:graddesc_newton} for an example where Newton's method converges in one step, as a result of the cost function being quadratic.
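To make these updates concrete, the following NumPy sketch implements the iteration $\bm{x}^{k+1} = \bm{x}^k + \alpha^k \bm{d}^k$ with a backtracking (Armijo-style) line search, and compares the steepest descent and Newton directions on an ill-conditioned quadratic. The cost function, tolerances, and parameter values here are arbitrary illustrative choices, not taken from the references.
\begin{verbatim}
import numpy as np

# Illustrative quadratic cost f(x) = 0.5 x^T Q x with an ill-conditioned Q.
Q = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x
hess = lambda x: Q

def descent(x0, direction, alpha0=1.0, beta=0.5, sigma=1e-4, iters=50):
    # Generic descent x_{k+1} = x_k + alpha_k d_k with backtracking line search.
    x = x0.copy()
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < 1e-8:       # first order necessary condition met
            break
        d = direction(x, g)                # descent direction, grad(x)^T d < 0
        alpha = alpha0
        while f(x + alpha * d) > f(x) + sigma * alpha * g @ d:
            alpha *= beta                  # backtrack until sufficient decrease
        x = x + alpha * d
    return x

steepest = lambda x, g: -g                            # D^k = I
newton = lambda x, g: -np.linalg.solve(hess(x), g)    # D^k = (Hessian)^{-1}

x0 = np.array([2.0, 1.0])
print(descent(x0, steepest))  # slow progress along the ill-conditioned axis
print(descent(x0, newton))    # converges in one step for a quadratic cost
\end{verbatim}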
\paragraph{Diagonally scaled steepest descent, $D^k = \textrm{diag}(d_1^k, \ldots, d_n^k)$.} We require $d_i^k > 0$ for all $i$. A popular choice is
\begin{equation}
d_i^k = \left( \frac{\partial^2 f(\bm{x}^k)}{\partial x_i^2} \right)^{-1}
\end{equation}
which uses the inverse of a diagonal approximation of the Hessian.

\paragraph{Modified Newton's method, $D^k = (\nabla^2 f(\bm{x}^0))^{-1}$.} Requires $\nabla^2 f(\bm{x}^0) > 0$. For cases in which one expects $\nabla^2 f(\bm{x}^0) \approx \nabla^2 f(\bm{x}^k)$, this avoids having to compute the Hessian at each step.

In addition to choosing the descent direction, there are also a variety of methods to choose the step size $\alpha$. A computationally intensive but efficient (in terms of the number of steps taken) approach is to use a minimization rule of the form
\begin{equation}
\alpha^k = \argmin_{\alpha\geq 0} f(\bm{x}^k + \alpha \bm{d}^k)
\end{equation}
which is usually solved via line search (figure \ref{fig:graddesc_line}). Alternative approaches include a limited minimization rule, in which one constrains $\alpha^k \in [0,s]$ during the line search; simpler approaches such as a constant step size (which may not guarantee convergence); or a diminishing step size schedule. In this last case, schedules are typically chosen such that $\alpha^k \to 0$ as $k \to \infty$, while $\sum_{k=0}^\infty \alpha^k = +\infty$.

\subsection{Constrained Nonlinear Optimization}

In this section we will address the general constrained nonlinear optimization problem,
\begin{equation*}
\begin{aligned}
& \underset{\bm{x} \in \mathcal{X}}{\min}
& & f(\bm{x})
\end{aligned}
\end{equation*}
which may equivalently be written
\begin{equation*}
\begin{aligned}
& \underset{\bm{x}}{\min}
& & f(\bm{x})\\
& \textrm{s.t.} & & \bm{x} \in \mathcal{X}
\end{aligned}
\end{equation*}
where the set $\mathcal{X}$ is usually specified in terms of equality and inequality constraints. To operate within this problem structure, we will develop a set of optimality conditions involving auxiliary variables called \textit{Lagrange multipliers}.

\subsubsection{Equality Constrained Optimization}

We will first look at optimization with equality constraints of the form
\begin{equation*}
\begin{aligned}
& \underset{\bm{x}}{\min}
& & f(\bm{x})\\
& \textrm{s.t.} & & h_i(\bm{x}) = 0, \quad i = 1, \ldots, m
\end{aligned}
\end{equation*}
where $f:\R^n \to \R$, $h_i:\R^n \to \R$ are $C^1$. We will write $\bm{h} = [h_1,\ldots, h_m ]^T$. For a given local minimum $\bm{x}^*$, there exist scalars $\lambda_1, \ldots, \lambda_m$ called Lagrange multipliers such that
\begin{equation}
\nabla f(\bm{x}^*) + \sum^m_{i=1} \lambda_i \nabla h_i(\bm{x}^*) = 0.
\end{equation}
There are several possible interpretations for Lagrange multipliers. First, note that the cost gradient $\nabla f(\bm{x}^*)$ is in the subspace spanned by the constraint gradients at $\bm{x}^*$. Equivalently, $\nabla f(\bm{x}^*)$ is orthogonal to the subspace of first order feasible variations
\begin{equation}
V(\bm{x}^*) = \{\Delta \bm{x} \mid \nabla h_i(\bm{x}^*)^T \Delta \bm{x} = 0, i=1,\ldots,m\}.
\end{equation}
This subspace is the space of variations $\Delta \bm{x}$ for which $\bm{x} = \bm{x}^* + \Delta \bm{x}$ satisfies the constraint $\bm{h}(\bm{x}) = 0$ up to first order. Therefore, at a local minimum, the first order cost variation $\nabla f(\bm{x}^*)^T \Delta \bm{x}$ is zero for all variations $\Delta \bm{x}$ in this space.

Given this informal understanding, we may now precisely state the necessary conditions for optimality in constrained optimization.

\begin{theorem}[NOC for equality constrained optimization]
\label{thm:eq_con_NOC}
Let $\bm{x}^*$ be a local minimum of $f$ subject to $\bm{h}(\bm{x}) = 0$, and assume that the constraint gradients $\nabla h_1(\bm{x}^*),\ldots,\nabla h_m(\bm{x}^*)$ are linearly independent. Then there exists a unique vector $\bm{\lambda}^* = [\lambda_1^*,\ldots,\lambda_m^*]^T$ called a Lagrange multiplier vector, such that
\begin{equation}
\nabla f(\bm{x}^*) + \sum^m_{i=1} \lambda_i^* \nabla h_i(\bm{x}^*) = 0.
\end{equation}
If in addition $f$ and $\bm{h}$ are $C^2$, we have
\begin{equation}
\bm{y}^T (\nabla^2 f(\bm{x}^*) + \sum^m_{i=1} \lambda_i^* \nabla^2 h_i(\bm{x}^*)) \bm{y} \geq 0, \quad \forall \bm{y} \in V(\bm{x}^*)
\end{equation}
where
\begin{equation}
V(\bm{x}^*) = \{\bm{y} \mid \nabla h_i(\bm{x}^*)^T \bm{y} = 0, i=1,\ldots,m\}.
\end{equation}
\end{theorem}

\begin{proof}
See \cite{bertsekas2016nonlinear} Section 3.1.1 and 3.1.2.
\end{proof}

We will sketch two possible proofs for the NOC for equality constrained optimization.

\paragraph{Penalty approach.} This approach relies on adding to the cost function a large penalty term for constraint violation. This is the same approach that will be used in proving the necessary conditions for inequality constrained optimization, and is the basis of a variety of practical numerical algorithms.

\paragraph{Elimination approach.} This approach views the constraints as a system of $m$ equations with $n$ unknowns, for which $m$ variables can be expressed in terms of the remaining $n-m$ variables. This reduces the problem to an unconstrained optimization problem.

Note that in theorem \ref{thm:eq_con_NOC}, we assumed the gradients of the constraint functions were linearly independent. A feasible vector for which this holds is called \textit{regular}. If this condition is violated, a Lagrange multiplier for a local minimum may not exist.

For convenience, we will write the necessary conditions in terms of the Lagrangian function $L:\R^{m+n} \to \R$,
\begin{equation}
L(\bm{x},\bm{\lambda}) = f(\bm{x}) + \sum^m_{i=1} \lambda_i h_i(\bm{x}).
\end{equation}
This function allows the necessary conditions to be succinctly stated as
\begin{align}
\nabla_{\bm{x}} L(\bm{x}^*,\bm{\lambda}^*) &= 0\\
\nabla_{\bm{\lambda}} L(\bm{x}^*,\bm{\lambda}^*) &= 0\\
\bm{y}^T \nabla^2_{\bm{xx}} L(\bm{x}^*,\bm{\lambda}^*) \bm{y} &\geq 0, \quad \forall \bm{y} \in V(\bm{x}^*),
\end{align}
where the first two conditions form a system of $n+m$ equations with $n+m$ unknowns.
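To illustrate these stationarity conditions, the following NumPy sketch solves $\nabla_{\bm{x}} L = 0$, $\nabla_{\bm{\lambda}} L = 0$ for a small equality constrained quadratic program; the problem data are arbitrary illustrative values, and the linear system below is simply the stationarity system written in matrix form.
\begin{verbatim}
import numpy as np

# Illustrative equality constrained QP: min 0.5 x^T Q x + c^T x  s.t.  A x = b.
Q = np.array([[3.0, 0.5], [0.5, 1.0]])    # positive definite cost Hessian
c = np.array([1.0, -2.0])
A = np.array([[1.0, 1.0]])                # single constraint h(x) = A x - b = 0
b = np.array([1.0])
n, m = Q.shape[0], A.shape[0]

# Stationarity of L(x, lam) = f(x) + lam^T (A x - b) gives the linear system
#   [Q  A^T] [ x ]   [-c]
#   [A   0 ] [lam] = [ b].
KKT = np.block([[Q, A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(KKT, np.concatenate([-c, b]))
x_star, lam_star = sol[:n], sol[n:]

print(x_star, lam_star)
print(Q @ x_star + c + A.T @ lam_star)   # gradient of L w.r.t. x, approximately 0
print(A @ x_star - b)                    # constraint residual, approximately 0
\end{verbatim}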
Given this notation, we can state the sufficient conditions.

\begin{theorem}[SOC for equality constrained optimization]
Assume that $f$ and $\bm{h}$ are $C^2$ and let $\bm{x}^* \in \R^n$ and $\bm{\lambda}^* \in \R^m$ satisfy
\begin{align}
\nabla_{\bm{x}} L(\bm{x}^*,\bm{\lambda}^*) &= 0\\
\nabla_{\bm{\lambda}} L(\bm{x}^*,\bm{\lambda}^*) &= 0\\
\bm{y}^T \nabla^2_{\bm{xx}} L(\bm{x}^*,\bm{\lambda}^*) \bm{y} &> 0, \quad \forall \bm{y} \neq 0, \bm{y} \in V(\bm{x}^*).
\end{align}
Then $\bm{x}^*$ is a strict local minimum of $f$ subject to $\bm{h}(\bm{x}) = 0$.
\end{theorem}

\begin{proof}
See \cite{bertsekas2016nonlinear} Section 3.2.
\end{proof}
Note that the SOC does not include regularity of $\bm{x}^*$.

\subsubsection{Inequality Constrained Optimization}

We will now address the general case, including inequality constraints,
\begin{equation*}
\begin{aligned}
& \underset{\bm{x}}{\min}
& & f(\bm{x})\\
& \textrm{s.t.} & & h_i(\bm{x}) = 0, \quad i = 1, \ldots, m\\
& & & g_j(\bm{x}) \leq 0, \quad j = 1, \ldots, r
\end{aligned}
\end{equation*}
where $f,h_i,g_j$ are $C^1$. The key intuition for the case of inequality constraints is based on realizing that for any feasible point, some subset of the constraints will be active (for which $g_j(\bm{x}) = 0$), while the complement of this set will be inactive. We define the active set of inequality constraints, which we denote
\begin{equation}
A(\bm{x}) = \{j \mid g_j(\bm{x}) = 0\}.
\end{equation}
A constraint is active at $\bm{x}$ if it is in $A(\bm{x})$, otherwise it is inactive. Note that if $\bm{x}^*$ is a local minimum of the inequality constrained problem, then $\bm{x}^*$ is a local minimum of the identical problem with the inactive constraints removed. Moreover, at this local minimum, the active constraints may be treated as equality constraints. Thus, if $\bm{x}^*$ is regular, there exist Lagrange multipliers $\lambda_1^*, \ldots, \lambda_m^*$ and $\mu_j^*, j \in A(\bm{x}^*)$ such that
\begin{equation}
\nabla f(\bm{x}^*) + \sum^m_{i=1} \lambda_i^* \nabla h_i(\bm{x}^*) + \sum_{j \in A(\bm{x}^*)} \mu_j^* \nabla g_j(\bm{x}^*)= 0.
\end{equation}
We will define the Lagrangian
\begin{equation}
L(\bm{x},\bm{\lambda}, \bm{\mu}) = f(\bm{x}) + \sum^m_{i=1} \lambda_i h_i(\bm{x}) + \sum_{j =1}^r \mu_j g_j(\bm{x}),
\end{equation}
which we will use to state the necessary and sufficient conditions.

\begin{theorem}[Karush-Kuhn-Tucker NOC]
Let $\bm{x}^*$ be a local minimum for the inequality constrained problem where $f, h_i, g_j$ are $C^1$ and assume $\bm{x}^*$ is regular (equality and active inequality constraint gradients are linearly independent). Then, there exist unique Lagrange multiplier vectors $\bm{\lambda}^*$ and $\bm{\mu}^*$ such that
\begin{align}
\nabla_{\bm{x}} L(\bm{x}^*,\bm{\lambda}^*, \bm{\mu}^*) &= 0\\
\bm{\mu}^* &\geq 0\\
\mu_j^* &= 0, \quad \forall j \notin A(\bm{x}^*).
\end{align}
If in addition $f,\bm{h},\bm{g}$ are $C^2$, we have
\begin{equation}
\bm{y}^T \nabla^2_{\bm{xx}} L(\bm{x}^*,\bm{\lambda}^*, \bm{\mu}^*) \bm{y} \geq 0
\end{equation}
for all $\bm{y}$ such that
\begin{align}
\nabla h_i(\bm{x}^*)^T \bm{y} &=0, \quad i = 1, \ldots, m\\
\nabla g_j(\bm{x}^*)^T \bm{y} &=0, \quad j \in A(\bm{x}^*).
\end{align}
\end{theorem}

\begin{proof}
See \cite{bertsekas2016nonlinear} Section 3.3.1.
\end{proof}
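As a quick numerical check of the KKT conditions, consider the following sketch; the problem, its solution, and the multiplier are illustrative and were worked out analytically, so the code simply verifies stationarity, feasibility, and complementary slackness.
\begin{verbatim}
import numpy as np

# Illustrative problem: min (x1 - 2)^2 + (x2 - 1)^2  s.t.  g(x) = x1 + x2 - 1 <= 0.
# The unconstrained minimizer (2, 1) is infeasible, so the constraint is active;
# minimizing on the constraint surface by hand gives x* = (1, 0) with mu* = 2.
x_star = np.array([1.0, 0.0])
mu_star = 2.0

grad_f = lambda x: np.array([2 * (x[0] - 2), 2 * (x[1] - 1)])
g = lambda x: x[0] + x[1] - 1
grad_g = lambda x: np.array([1.0, 1.0])

print(grad_f(x_star) + mu_star * grad_g(x_star))  # stationarity: approximately 0
print(g(x_star) <= 0)                             # primal feasibility (active: g = 0)
print(mu_star >= 0)                               # dual feasibility
print(mu_star * g(x_star))                        # complementary slackness: 0
\end{verbatim}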
The SOC are obtained similarly to the equality constrained case.

% should add in statement of KKT SOC for completeness

% should we add section on convex optimization from recitation?

\subsection{Further Reading}

In this section we have addressed the necessary and sufficient conditions for constrained and unconstrained nonlinear optimization. This section is based heavily on \cite{bertsekas2016nonlinear}, and we refer the reader to this book for further details. We have avoided discussing linear programming, which is itself a large topic of study, about which many books have been written (we refer the reader to \cite{bertsimas1997introduction} as a good reference on the subject).

Convex optimization has become a powerful and widespread tool in modern optimal control. While we have only addressed it briefly here, \cite{boyd2004convex} offers a fairly comprehensive treatment of the theory and practice of convex optimization. For a succinct overview with a focus on machine learning, we refer the reader to \cite{kolter2008convex}.
-------------------------------------------------------------------------------- /2019/tex/source/ch3.tex: --------------------------------------------------------------------------------
\section{The HJB and HJI Equations}

In this section, we will extend the ideas of dynamic programming to the continuous time setting. Restating the continuous time optimal control problem, we assume dynamics
\begin{equation}
\stdot(t) = \f(\st(t),\ac(t),t)
\end{equation}
and cost
\begin{equation}
\J(\st(0)) = \cost_f(\st(t_f),t_f) + \int_0^{t_f} \cost(\st(\tau),\ac(\tau),\tau) d\tau
\end{equation}
where $t_f$ is fixed.

\subsection{The Principle of Optimality in Continuous Time}

\subsubsection{Hamilton-Jacobi-Bellman}

As in the discrete time principle of optimality, consider the tail problem
\begin{equation}
\J(\st(t),\{\ac(\tau)\}_{\tau=t}^{t_f},t) = \cost_f(\st(t_f),t_f) + \int_t^{t_f} \cost(\st(\tau),\ac(\tau),\tau) d\tau
\end{equation}
where $t\leq t_f$ and $\st(t)$ is an admissible state value. The optimal solution to this tail problem comes from the functional minimization
\begin{equation}
\J^*(\st(t),t) = \min_{\{\ac(\tau)\}_{\tau=t}^{t_f}} \left\{ \cost_f(\st(t_f),t_f) + \int_t^{t_f} \cost(\st(\tau),\ac(\tau),\tau) d\tau\right\}.
\end{equation}
Note, then, that due to the additivity of cost we can split the problem up over time,
\begin{equation}
\J^*(\st(t),t) = \min_{\{\ac(\tau)\}_{\tau=t}^{t_f}} \left\{ \int_t^{t+\Delta t} \cost(\st(\tau),\ac(\tau),\tau) d\tau + \cost_f(\st(t_f),t_f) + \int_{t+\Delta t}^{t_f} \cost(\st(\tau),\ac(\tau),\tau) d\tau\right\}
\end{equation}
which, by applying the principle of optimality to the tail cost, becomes
\begin{equation}
\J^*(\st(t),t) = \min_{\{\ac(\tau)\}_{\tau=t}^{t + \Delta t}} \left\{ \int_t^{t+\Delta t} \cost(\st(\tau),\ac(\tau),\tau) d\tau + \J^*(\st(t + \Delta t), t + \Delta t)\right\}.
\end{equation}
Let $J^*_t(\st(t),t) = \nabla_t J^* (\st(t),t)$ and $J^*_{\st}(\st(t),t) = \nabla_{\st} J^* (\st(t),t)$.
Taylor expanding, we have
\begin{align}
\J^*(\st(t),t) = \min_{\{\ac(\tau)\}_{\tau=t}^{t + \Delta t}} \huge{\{} &\cost(\st(t),\ac(t),t) \Delta t + \J^*(\st(t),t) + (\J_t^*(\st(t),t)) \Delta t \\
&+ (\J_{\st}^*(\st(t),t))^T (\st(t+\Delta t) - \st(t)) + o(\Delta t) \huge{\}} \nonumber
\end{align}
for small $\Delta t$. The first term is a result of Taylor expanding the integral and applying the fundamental theorem of calculus. Note that we can pull $\J^*(\st(t),t)$ out of the minimization, as this quantity will not vary under different choices of future actions. Dividing through by $\Delta t$ and taking the limit $\Delta t \to 0$, we obtain the \textit{Hamilton-Jacobi-Bellman} equation
\begin{equation}
0 = \J^*_t(\st(t),t) + \min_{\ac(t)} \left\{ \cost(\st(t),\ac(t),t) + (\J_{\st}^*(\st(t),t))^T \f(\st(t),\ac(t),t) \right\}
\end{equation}
with terminal condition
\begin{equation}
\J^*(\st(t_f),t_f) = \cost_f(\st(t_f),t_f).
\end{equation}
% need to talk more about what this equation is
For convenience, we will define the Hamiltonian
\begin{equation}
\ham(\st(t),\ac(t),\J^*_{\st},t) \vcentcolon= \cost(\st(t),\ac(t),t) + (\J_{\st}^*(\st(t),t))^T \f(\st(t),\ac(t),t)
\end{equation}
which allows us to compactly write the HJB equation as
\begin{equation}
0 = \J^*_t(\st(t),t) + \min_{\ac(t)} \left\{ \ham(\st(t),\ac(t),\J^*_{\st},t) \right\}.
\end{equation}

The HJB equation is a partial differential equation that the optimal cost-to-go $J^*(\st(t),t)$ must satisfy for all time-state pairs $(\st(t),t)$. The previous informal derivation assumed differentiability of $J^*(\st(t),t)$, which we do not know a priori. This concern is addressed by the following theorem on solutions to the HJB equation.

\begin{theorem}[Sufficiency Theorem]
Suppose $V(\st,t)$ is a solution to the HJB equation, that $V(\st,t)$ is $C^1$ in $\st$ and $t$, and that
\begin{align*}
0 &= V_t(\st,t) + \min_{\ac \in \mathcal{U}} \left\{ \cost(\st,\ac,t) + (V_{\st}(\st,t))^T \f(\st,\ac,t) \right\}\\
V(\st,t_f) &= \cost_f(\st,t_f)\,\, \forall \, \st.
\end{align*}
Suppose also that $\pi^*(\st,t)$ attains the minimum in this equation for all $t$ and $\st$. Let $\{\st^*(t) \mid t \in [t_0, t_f]\}$ be the state trajectory obtained from the given initial condition $\st(t_0)$ when the control trajectory $\ac^*(t) = \pi^*(\st^*(t),t), t \in [t_0, t_f]$ is used. Then $V$ is equal to the optimal cost-to-go function, i.e.,
\begin{equation}
V(\st,t) = J^*(\st,t)\,\, \forall\, \st, t.
\end{equation}
Furthermore, the control trajectory $\{\ac^*(t)\mid t \in [t_0, t_f]\}$ is optimal.
\end{theorem}

\begin{proof}
\cite{bertsekas1995dynamic}, Volume 1, Section 7.2.
\end{proof}
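To give a sense of what solving the HJB equation numerically can look like, the sketch below marches a finite-difference approximation of the value function backward in time for a scalar, control-constrained example. It is a deliberately crude discretization (central differences, no upwinding, no convergence analysis), and the dynamics, cost, grid, and step sizes are arbitrary illustrative choices; careful implementations use level set methods such as those in \cite{mitchell2005time}.
\begin{verbatim}
import numpy as np

# Illustrative scalar problem: xdot = u with |u| <= 1, running cost x^2, zero
# terminal cost.  HJB: 0 = V_t + min_{|u|<=1} { x^2 + V_x u } = V_t + x^2 - |V_x|.
x = np.linspace(-2.0, 2.0, 201)
dx = x[1] - x[0]
dt = 0.2 * dx                # small time step chosen heuristically for stability
t_f = 1.0

V = np.zeros_like(x)         # terminal condition V(x, t_f) = 0
t = t_f
while t > 0:
    Vx = np.gradient(V, dx)               # finite-difference approximation of V_x
    u_star = -np.sign(Vx)                 # minimizing control u* = -sign(V_x)
    V = V + dt * (x**2 + Vx * u_star)     # backward Euler step to V(., t - dt)
    V[0], V[-1] = V[1], V[-2]             # crude boundary handling
    t -= dt

print(V[len(x) // 2])        # approximate cost-to-go from x = 0 at time t = 0
\end{verbatim}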
\subsubsection{Continuous-Time LQR}

As a useful application of the HJB equation, we will derive LQR in continuous time. We aim to minimize
\begin{equation}
\J(\st(0)) = \frac{1}{2} \st^T(t_f) Q_f \st(t_f) + \frac{1}{2} \int_0^{t_f} \left( \st^T(t) Q(t) \st(t) + \ac^T(t) R(t) \ac(t) \right) dt
\end{equation}
subject to dynamics
\begin{equation}
\stdot(t) = A(t) \st(t) + B(t) \ac(t).
\end{equation}
As in discrete LQR, we will assume $Q_f, Q(t)$ are positive semidefinite, and $R(t)$ is positive definite. We will also assume $t_f$ is fixed, and the state and action are unconstrained.

We will write the Hamiltonian,
\begin{equation}
\ham = \frac{1}{2} \st^T(t) Q(t) \st(t) + \frac{1}{2} \ac^T(t) R(t) \ac(t) + \J^*_{\st}(\st(t),t)^T (A(t) \st(t) + B(t) \ac(t))
\end{equation}
which yields necessary optimality conditions
\begin{equation}
0 = \nabla_{\ac} \ham = R(t) \ac(t) + B^T(t) \J^*_{\st}(\st(t),t).
\end{equation}
Since $\nabla_{\ac \ac}^2 \ham = R(t) > 0$, the control that satisfies the necessary conditions is the global minimizer. Rearranging, we have
\begin{equation}
\ac^*(t) = - R^{-1}(t) B^T(t) \J^*_{\st}(\st(t),t)
\end{equation}
which we can plug back into the Hamiltonian to yield
\begin{align}
\ham &= \frac{1}{2} \st^T(t) Q(t) \st(t) + \frac{1}{2} \J^*_{\st}(\st(t),t)^T B(t) R^{-1}(t) B^T(t) \J^*_{\st}(\st(t),t)\\
&\qquad + \J^*_{\st}(\st(t),t)^T A(t) \st(t) - \J^*_{\st}(\st(t),t)^T B(t) R^{-1}(t) B^T(t) \J^*_{\st}(\st(t),t)\nonumber\\
&= \frac{1}{2} \st^T(t) Q(t) \st(t) - \frac{1}{2} \J^*_{\st}(\st(t),t)^T B(t) R^{-1}(t) B^T(t) \J^*_{\st}(\st(t),t) + \J^*_{\st}(\st(t),t)^T A(t) \st(t).
\end{align}
This gives the HJB equation
\begin{align}
0 &= \J_t^*(\st(t),t) + \frac{1}{2} \st^T(t) Q(t) \st(t) - \frac{1}{2} \J^*_{\st}(\st(t),t)^T B(t) R^{-1}(t) B^T(t) \J^*_{\st}(\st(t),t)\\
&\qquad + \J^*_{\st}(\st(t),t)^T A(t) \st(t)\nonumber
\end{align}
with boundary condition
\begin{equation}
\J^*(\st(t_f),t_f) = \frac{1}{2} \st^T(t_f) Q_f \st(t_f).
\end{equation}
It may appear as if we are stuck here, as this form of the HJB doesn't immediately yield $J^*(\st(t),t)$. Armed with the knowledge that the discrete time LQR problem has a quadratic cost-to-go, we will cross our fingers and guess a solution of the form
\begin{equation}
J^*(\st(t),t) = \frac{1}{2} \st^T(t) V(t) \st(t).
\end{equation}
Substituting, we have
\begin{align}
0 &= \frac{1}{2} \st^T(t) \dot{V}(t) \st(t) + \frac{1}{2} \st^T(t) Q(t) \st(t)\\
&\qquad- \frac{1}{2} \st^T(t) V(t) B(t) R^{-1}(t) B^T(t) V(t) \st(t) + \st^T(t) V(t) A(t) \st(t).\nonumber
\end{align}
Note that we can decompose
\begin{equation}
\st^T(t) V(t) A(t) \st(t) = \frac{1}{2} \st^T(t) V(t) A(t) \st(t) + \frac{1}{2} \st^T(t) A^T(t) V(t) \st(t)
\end{equation}
which yields
\begin{align}
0 &= \frac{1}{2} \st^T(t) \left(\dot{V}(t) + Q(t) - V(t) B(t) R^{-1}(t) B^T(t) V(t) + V(t) A(t) + A^T(t) V(t)\right) \st(t).
\end{align}
This equation must hold for all $\st(t)$, so
\begin{equation}
-\dot{V}(t) = Q(t) - V(t) B(t) R^{-1}(t) B^T(t) V(t) + V(t) A(t) + A^T(t) V(t)
\end{equation}
with boundary condition $V(t_f) = Q_f$.

Therefore, the HJB PDE has been reduced to a set of matrix ordinary differential equations (the Riccati equation). This is integrated backwards in time to find the full control policy as a function of time. Once we have found $V(t)$, the control policy is
\begin{equation}
\ac^*(t) = - R^{-1}(t) B^T(t) V(t) \st(t).
\end{equation}
Similarly to the discrete case, the feedback gains tend toward constant values in the limit of the infinite horizon problem, under some technical assumptions.
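The Riccati differential equation above is straightforward to integrate numerically. The following sketch does so for an illustrative time-invariant double integrator using \texttt{scipy.integrate.solve\_ivp}; the system matrices, weights, and horizon are arbitrary choices, and the backward-in-time integration is handled by the change of variables $s = t_f - t$.
\begin{verbatim}
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative time-invariant double integrator with arbitrary weights.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
Qf = np.eye(2)
t_f = 5.0

def riccati_rhs(s, v):
    # With s = t_f - t, dV/ds = -dV/dt = Q - V B R^{-1} B^T V + V A + A^T V.
    V = v.reshape(2, 2)
    dVds = Q - V @ B @ np.linalg.solve(R, B.T) @ V + V @ A + A.T @ V
    return dVds.flatten()

# March forward in s from the terminal condition V(t_f) = Qf.
sol = solve_ivp(riccati_rhs, [0.0, t_f], Qf.flatten())
V0 = sol.y[:, -1].reshape(2, 2)          # V at t = 0 (i.e., s = t_f)
K0 = np.linalg.solve(R, B.T @ V0)        # feedback gain: u*(0) = -K0 x(0)
print(K0)
\end{verbatim}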
\subsection{Differential Games}

We have so far addressed the case in which we aim to solve the optimal control problem for a single agent. We will now consider an adversarial game setting, in which there exists another player that aims to maximally harm the first agent. In particular, we will consider zero-sum games, in which the two agents optimize the same objective in opposite directions. While the differential game setting is not restricted to this case --- agents may have separate cost functions that partially interfere or aid each other --- the zero-sum case lends itself to useful analytical tools.

\subsubsection{Differential Games and Information Patterns}

We consider the two player differential game with dynamics
\begin{equation}
\stdot(t) = \f(\st(t),\ac(t),\ad(t))
\end{equation}
where the first player takes action $\ac(t)$ at time $t$, and the second player takes action $\ad(t)$. The state $\st(t)$ is the joint state of both players. We write the cost as
\begin{equation}
\J(\st(t)) = \cost_f(\st(0)) + \int_{t}^{0} \cost(\st(\tau),\ac(\tau),\ad(\tau)) d\tau
\end{equation}
which the first agent aims to maximize, and the second agent aims to minimize.

To fully specify the differential game, we must specify what each agent knows, and when. This is referred to as the \textit{information pattern} of the game. In addition to capturing the knowledge of the state available to each agent, the information pattern also captures the knowledge each agent has of the other agent's strategy.

% TODO information pattern

\subsubsection{Hamilton-Jacobi-Isaacs}

The key idea in building the multi-agent equivalent of the HJB equation will again be to apply the principle of optimality. We consider the information pattern in which the adversary has access to the instantaneous control action of the first agent, so the cost takes the form
\begin{equation}
\J(\st(t),t) = \min_{\Gamma(\ac)(\cdot)}\,\max_{\ac(\cdot)} \left\{ \int_t^0 \cost(\st(\tau), \ac(\tau),\ad(\tau)) d\tau + \cost_f(\st(0)) \right\}.
\end{equation}
Applying the dynamic programming principle, we have
\begin{equation}
\J(\st(t),t) = \min_{\Gamma(\ac)(\cdot)}\,\max_{\ac(\cdot)} \left\{ \int_t^{t+\Delta t} \cost(\st(\tau), \ac(\tau),\ad(\tau)) d\tau + \J(\st(t+\Delta t),t+\Delta t)\right\}.
\end{equation}
We can take the same strategy as with the informal derivation of the HJB equation, and Taylor expand both terms to yield
\begin{align}
\J(\st(t),t) =& \min_{\Gamma(\ac)(\cdot)}\,\max_{\ac(\cdot)} {\large\{} \cost(\st(t), \ac(t),\ad(t))\Delta t + \J(\st(t),t)\\
&\qquad + (\J_{\st}(\st(t),t))^T \f(\st(t),\ac(t),\ad(t)) \Delta t + \J_t(\st(t),t) \Delta t {\large\}}. \nonumber
\end{align}
Note that we are optimizing over instantaneous actions, and so we are optimizing over finite dimensional quantities as opposed to functions. Subtracting $\J(\st(t),t)$ from both sides, dividing through by $\Delta t$, and taking the limit $\Delta t \to 0$, we get the \textit{Hamilton-Jacobi-Isaacs} (HJI) equation
\begin{equation}
0 = \J_t(\st,t) + \max_{\ac} \min_{\ad} \left\{ \cost(\st,\ac,\ad) + (\J_{\st}(\st,t))^T \f(\st,\ac,\ad) \right\}
\end{equation}
with boundary condition
\begin{equation}
\J(\st,0) = \cost_f(\st).
\end{equation}
Note that we have switched the order of the min/max: because the adversary observes the instantaneous control action, the disturbance is effectively chosen after the control at each instant, which yields the inner minimization over $\ad$.
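To make the inner optimization concrete, consider as a simple illustrative case (not taken from the references) the scalar dynamics $\dot{x} = u + d$ with $|u| \leq u_{\max}$, $|d| \leq d_{\max}$, and no running cost. The Hamiltonian is additive in the two inputs, so the optimizations decouple:
\begin{equation*}
\max_{|u| \leq u_{\max}}\, \min_{|d| \leq d_{\max}} \left\{ \J_x (u + d) \right\} = u_{\max} |\J_x| - d_{\max} |\J_x|,
\end{equation*}
and the HJI equation reduces to $0 = \J_t + (u_{\max} - d_{\max}) |\J_x|$. In particular, the controller can counteract the disturbance only when $u_{\max} > d_{\max}$.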
187 | 188 | \subsubsection{Reachability} 189 | 190 | Differential games have applications in multi-agent modeling (both in the context of autonomous systems engineering and, e.g., economics and operations research). One concrete application in engineering is reachability analysis. In this setting, an agent aims to compute the set of states in which there exists a policy that either avoids a target set or enters a target set, subject to adversarial disturbances. The former case, in which we would like to avoid a target set, is useful for safety verification. If we are able to, even in the worst case, guarantee e.g. collision avoidance, we have guarantees on safety (subject of course to our system assumptions). The latter case is useful for task satisfaction. For example, we would like a quadrotor to reach a set of safe hovering poses, even under adversarial disturbances. Finding the backward reachable set in this case would find all states such that there exists a policy that succeeds in reaching the target set. 191 | 192 | More concretely, the first case aims to find a set 193 | \begin{equation} 194 | \mathcal{A}(t) = \{\bar{\st} : \exists \Gamma(\ac)(\cdot), \forall \ac(\cdot), \stdot = \f(\st,\ac,\ad), \st(t) = \bar{\st}, \st(0) \in \mathcal{T}\} 195 | \end{equation} 196 | where $\mathcal{T}$ is the unsafe set which we aim to avoid. Breaking this down, $\mathcal{A}(t)$ is the set of states at time $t$ such that there exists $\Gamma(\ac)$ that maps action $\ac$ to a disturbance such that, following the dynamics induced by the disturbance and the action sequence, the state is in $\mathcal{T}$ at time $0$ (note that we are considering $t \leq 0$). 197 | 198 | The second case aims to find a set 199 | \begin{equation} 200 | \mathcal{R}(t) = \{\bar{\st} : \forall \Gamma(\ac)(\cdot), \exists \ac(\cdot), \stdot = \f(\st,\ac,\ad), \st(t) = \bar{\st}, \st(0) \in \mathcal{T}\}, 201 | \end{equation} 202 | where in this case $\mathcal{T}$ is the set that we wish to reach. In this setting, we wish to find all states that, no matter what strategy the disturbance takes, there exist control actions that can steer the system to the goal state. Because the disturbance is adversarial (we reason over all adversary strategies), this is an extremely conservative form of safety analysis. 203 | 204 | Computation of the backward reachable set results from solving a differential \textit{game of kind} in which the outcome is Boolean (i.e. whether or not $\st(0) \in \mathcal{T}$). This boolean outcome can be encoded by removing the running cost and choosing a particular form for the final cost. In particular, we can choose a final cost where 205 | \begin{equation} 206 | \st \in \mathcal{T} \iff \cost_f(\st) \leq 0. 207 | \end{equation} 208 | As a result, the agent should aim to maximize $\cost_f$ to avoid $\mathcal{T}$, whereas the disturbance should aim to minimize it. The two settings then take the following forms: 209 | \begin{itemize} 210 | \item Set avoidance: $J(\st,t) = \min_{\Gamma(\ac)} \max_{\ac} \cost_f(\st(0))$ 211 | \item Set reaching: $J(\st,t) = \max_{\Gamma(\ac)} \min_{\ac} \cost_f(\st(0))$ 212 | \end{itemize} 213 | 214 | \paragraph{Sets vs. Tubes.} We have so far considered avoidance or reachability problems for which we care about set membership at time $t=0$. However, for something like collision avoidance, we would like to stay collision free at every time as opposed to a particular time. 
\textit{Backward reachable sets} capture the case in which only set membership at the final time matters, and passing through the target at intermediate times $t<0$ does not. \textit{Backward reachable tubes} capture the entire time duration of the problem. Any state that passes through the target at any time in the problem duration is included. This yields a modified value function of the form
\begin{equation}
\J(\st,t) = \min_{\Gamma(\ac)} \max_{\ac} \min_{\tau \in [t,0]} \cost_f(\st(\tau)).
\end{equation}
If the target set membership holds at any time $\tau'$, then $\min_{\tau\in[t,0]} \cost_f(\st(\tau)) \leq \cost_f(\st(\tau')) \leq 0$.

\subsection{Further Reading}

Our coverage of reachability analysis is based on \cite{mitchell2005time}, which is an important early work in the field, in addition to being a relatively comprehensive treatment of the method. For a review of differential games with a (slight) emphasis on economics and management science, we refer the reader to \cite{bressan2010noncooperative}. For a review of HJB and continuous time LQR, we refer the reader to \cite{bertsekas1995dynamic} and \cite{kirk2012optimal}.

%applications of differential games in OR and econ -- tutorial from OR
-------------------------------------------------------------------------------- /2019/tex/source/ch4.tex: --------------------------------------------------------------------------------
\section{Indirect Methods}

\subsection{Calculus of Variations}

We will begin by restating the optimal control problem. We wish to find an admissible control $\ac^*$ which causes the system
\begin{equation}
\stdot = \f(\st(t),\ac(t),t)
\end{equation}
to follow an \textit{admissible} trajectory $\st^*$ that minimizes the functional
\begin{equation}
\J = \cost_f(\st(t_f),t_f) + \int_{t_0}^{t_f} \cost(\st(t),\ac(t),t) dt.
\end{equation}
To find the minima of functions of a finite number of real numbers, we rely on the first order optimality conditions to find candidate minima, and use higher order derivatives to determine whether a point is a local minimum. Because we are minimizing a function that maps from some $n$ dimensional space to a scalar, candidate points have zero gradient in each of these dimensions. However, in the optimal control problem, we have a cost \textit{functional}, which maps functions to scalars. This is immediately problematic for our first order conditions --- we are required to check the necessary condition at infinitely many points. The necessary notion of optimality conditions for functionals is provided by the calculus of variations.

Concretely, we define a functional $\J$ as a rule of correspondence assigning each function $\st$ in a class $\Omega$ (the domain) to a unique real number. The functional $\J$ is linear if and only if
\begin{equation}
\J(\alpha_1 \st_1 + \alpha_2 \st_2) = \alpha_1 \J(\st_1) + \alpha_2 \J(\st_2)
\end{equation}
for all $\st_1, \st_2, \alpha_1 \st_1 + \alpha_2 \st_2$ in $\Omega$. We must now define a notion of ``closeness'' for functions. Intuitively, two points being close together has an immediate geometric interpretation. We first define the norm of a function. The norm of a function is a rule of correspondence that assigns each $\st \in \Omega$, defined over $t \in [t_0,t_f]$, a real number.
The norm of $\st$, which we denote $\|\st\|$, satisfies: 20 | \begin{enumerate} 21 | \item $\|\st\| \geq 0$, and $\|\st\|=0$ iff $\st(t) = 0$ for all $t \in [t_0,t_f]$ 22 | \item $\|\alpha \st\| = |\alpha| \|\st\|$ for all real numbers $\alpha$ 23 | \item $\|\st_1 + \st_2\| \leq \|\st_1\| + \|\st_2\|$. 24 | \end{enumerate} 25 | To compare the closeness of two functions $\bm{y}, \bm{z}$, we let $\st(t) = \bm{y}(t) - \bm{z}(t)$. Thus, for two identical functions, $\|\st\|$ is zero. Generally, a norm will be small for ``close'' functions, and large for ``far apart'' functions. However, there exist many possible definitions of norms that satisfy the above conditions. 26 | 27 | \subsubsection{Extrema for Functionals} 28 | 29 | A functional $\J$ with domain $\Omega$ has a local minimum at $\st^* \in \Omega$ if there exists an $\epsilon > 0$ such that $\J(\st) \geq \J(\st^*)$ for all $\st \in \Omega$ such that $\|\st - \st^*\| < \epsilon$. Maxima are defined similarly, just with $\J(\st) \leq \J(\st^*)$. 30 | 31 | Analogously to optimization of functions, we define the \textit{increment} of the functional as 32 | \begin{equation} 33 | \Delta \J(\st,\delta \st) \vcentcolon= \J(\st + \delta \st) - \J(\st) 34 | \end{equation} 35 | where $\delta \st(t)$ is the \textit{variation} of $\st(t)$. The increment of a functional can be written as 36 | \begin{equation} 37 | \Delta \J(\st,\delta \st) = \delta \J(\st,\delta \st) + g(\st,\delta \st) \|\delta \st\| 38 | \end{equation} 39 | where $\delta \J$ is linear in $\delta \st$. If 40 | \begin{equation} 41 | \lim_{\|\delta \st \| \to 0} \{g(\st,\delta \st)\} = 0 42 | \end{equation} 43 | then $\J$ is said to be differentiable at $\st$ and $\delta \J$ is the variation of $\J$ at $\st$. 44 | We can now state the \textit{fundamental theorem of the calculus of variations}. 45 | 46 | \begin{theorem}[Fundamental Theorem of CoV] 47 | Let $\st(t)$ be a vector function of $t$ in the class $\Omega$, and $\J(\st)$ be a differentiable functional of $\st$. Assume that the functions in $\Omega$ are not constrained by any boundaries. If $\st^*$ is an extremal, the variation of $\J$ must vanish at $\st^*$, that is $\delta \J(\st^*, \delta \st) = 0$ for all admissible $\delta \st$ (i.e. such that $\st^* + \delta \st \in \Omega$). 48 | \end{theorem} 49 | 50 | \begin{proof} 51 | \cite{kirk2012optimal}, Section 4.1. 52 | \end{proof} 53 | 54 | We will now look at how the calculus of variations may be leveraged to approach practical problems. Let $x$ be a scalar function in $C^1$. We would like to find a function $x^*$ for which the functional 55 | \begin{equation} 56 | \J(x) = \int_{t_0}^{t_f} g(x(t),\dot{x}(t),t) dt 57 | \end{equation} 58 | has a relative extremum. We will assume $g \in C^2$, that $t_0,t_f$ are fixed, and that $x(t_0) = x_0, x(t_f) = x_f$ are fixed. Let $x$ be any curve in $\Omega$, and we will write the variation $\delta\J$ from the increment 59 | \begin{align} 60 | \Delta \J(x,\delta x) &= \J(x + \delta x) - \J(x)\\ 61 | &= \int_{t_0}^{t_f} g(x + \delta x, \dot{x} + \delta \dot{x}, t) dt - \int_{t_0}^{t_f} g(x,\dot{x},t) dt\\ 62 | &= \int_{t_0}^{t_f} g(x + \delta x, \dot{x} + \delta \dot{x}, t) - g(x,\dot{x},t) dt.
63 | \end{align} 64 | Expanding via Taylor series, we get 65 | \begin{equation} 66 | \Delta J(x,\delta x) = \int_{t_0}^{t_f} g(x,\dot{x},t) + \underbrace{\frac{\partial g}{\partial x}}_{g_{x}} (x,\dot{x}, t) \delta x + \underbrace{\frac{\partial g}{\partial \dot{x}}}_{g_{\dot{x}}} (x,\dot{x}, t) \delta \dot{x} + o(\delta x, \delta \dot{x}) - g(x,\dot{x},t) dt 67 | \end{equation} 68 | which yields the variation 69 | \begin{equation} 70 | \delta \J = \int_{t_0}^{t_f} g_{x}(x,\dot{x},t) \delta x + g_{\dot{x}}(x,\dot{x},t)\delta \dot{x} \,\,dt. 71 | \end{equation} 72 | Integrating by parts, we have 73 | \begin{equation} 74 | \delta \J = \int_{t_0}^{t_f} \left[ g_{x}(x,\dot{x},t) - \frac{d}{dt} g_{\dot{x}}(x,\dot{x},t)\right] \delta x \delta t + [g_{\dot{x}}(x,\dot{x},t)\delta x(t)]_{t_0}^{t_f}. 75 | \end{equation} 76 | We have assumed $x(t_0), x(t_f)$ given, and thus $\delta x(t_0) = 0$, $\delta x(t_f) = 0$. Considering an extremal curve, applying the CoV theorem yields 77 | \begin{equation} 78 | \int_{t_0}^{t_f} \left[ g_{x}(x,\dot{x},t) - \frac{d}{dt} g_{\dot{x}}(x,\dot{x},t)\right] \delta x \delta t. 79 | \label{eq:euler_int} 80 | \end{equation} 81 | We can now state the fundamental lemma of CoV. We will state it for vector functions, although our derivation was for the scalar case. 82 | 83 | \begin{lemma}[Fundamental Lemma of CoV] 84 | If a function $h$ is continuous and 85 | \begin{equation} 86 | \int_{t_0}^{t_f} h(t) \delta \st(t) dt = 0 87 | \end{equation} 88 | for every function $\delta \st$ that is continuous in the interval $[t_0,t_f]$, then $h$ must be zero everywhere in the interval $[t_0,t_f]$. 89 | \end{lemma} 90 | 91 | \begin{proof} 92 | \cite{kirk2012optimal}, Section 4.2. 93 | \end{proof} 94 | 95 | Applying the fundamental lemma, we find that a necessary condition for $\st^*$ being an extremal is 96 | \begin{equation} 97 | g_{\st}(\st,\stdot,t) - \frac{d}{dt} g_{\stdot}(\st,\stdot,t) = 0 98 | \end{equation} 99 | for all $t \in [t_0, t_f]$, which is the \textit{Euler equation}. This is a nonlinear, time-varying second-order ordinary differential equation with split boundary conditions (at $\st(t_0)$ and $\st(t_f)$). 100 | 101 | \subsubsection{Generalized Boundary Conditions} 102 | 103 | In the previous subsection, we assumed that $t_0, t_f, \st(t_0), \st(t_f)$ were all given. We will now relax that assumption. In particular, $t_f$ may be fixed or free, and each component of $\st(t_f)$ may be fixed or free. 104 | 105 | We begin by writing the variation around $\st^*$ 106 | \begin{align} 107 | \delta \J &= \left[ g_{\stdot}(\st^*(t_f),\stdot^*(t_f),t_f) \right]^T \delta \st(t_f) + 108 | \left[ g(\st^*(t_f),\stdot^*(t_f),t_f) \right]^T \delta t_f\\ 109 | & \qquad + \int_{t_0}^{t_f} \left[ g_{\st}(\st^*,\stdot^*,t) - \frac{d}{dt} g_{\stdot}(\st^*,\stdot^*,t)\right]^T \delta \st \delta t \nonumber 110 | \end{align} 111 | by using the same integration by parts approach as before. Note that for fixed $t_f$ and $\st(t_f)$, the variations $\delta t_f$ and $\delta \st(t_f)$ vanish, and so we are left with (\ref{eq:euler_int}). Because $\delta t_f$ and $\delta \st(t_f)$ do not vanish in this case, we are left with additional boundary conditions that must be satisfied. 
Note that 112 | \begin{equation} 113 | \delta \st_f = \delta \st(t_f) + \stdot^*(t_f) \delta t_f 114 | \end{equation} 115 | and substituting this, we have 116 | \begin{align} 117 | \delta \J &= \left[ g_{\stdot}(\st^*(t_f),\stdot^*(t_f),t_f) \right]^T \delta \st_f + 118 | \left[ g(\st^*(t_f),\stdot^*(t_f),t_f) - g^T_{\stdot}(\st^*(t_f),\stdot^*(t_f),t_f) \stdot^*(t_f) \right] \delta t_f\\ 119 | & \qquad + \int_{t_0}^{t_f} \left[ g_{\st}(\st^*,\stdot^*,t) - \frac{d}{dt} g_{\stdot}(\st^*,\stdot^*,t)\right] \delta \st \delta t \nonumber. 120 | \end{align} 121 | Stationarity of this variation thus requires 122 | \begin{equation} 123 | g_{\stdot}(\st^*(t_f),\stdot^*(t_f),t_f) = 0 124 | \end{equation} 125 | if $\st_f$ is free, and 126 | \begin{equation} 127 | g(\st^*(t_f),\stdot^*(t_f),t_f) - g^T_{\stdot}(\st^*(t_f),\stdot^*(t_f),t_f) \stdot^*(t_f) = 0 128 | \end{equation} 129 | if $t_f$ is free, in addition to the Euler equation being satisfied. For a complete reference on the boundary conditions associated with a variety of problem specifications, we refer the reader to Section 4.3 of \cite{kirk2012optimal}. 130 | 131 | % TODO add table 132 | 133 | \subsubsection{Constrained Extrema} 134 | 135 | Previously, we have not considered constraints in the variational problem. However, constraints (and in particular, dynamics constraints) are central to most optimal control problems. Let $\bm{w} \in \R^{n+m}$ be a vector function in $C^1$. As previously, we would like to find a function $\bm{w}^*$ for which the functional 136 | \begin{equation} 137 | \J(\bm{w}) = \int_{t_0}^{t_f} g(\bm{w}(t),\dot{\bm{w}}(t),t) dt 138 | \end{equation} 139 | has a relative extremum, although we additionally introduce the constraints 140 | \begin{equation} 141 | \f_i(\bm{w}(t), \dot{\bm{w}}(t),t) = 0, \quad i = 1, \ldots, n. 142 | \end{equation} 143 | We will again assume $g \in C^2$ and that $t_0, \bm{w}(t_0)$ are fixed. Note that as a result of these $n$ constraints, only $m$ of the $n+m$ components of $\bm{w}$ are independent. 144 | 145 | One approach to solving this constrained problem is re-writing the $n$ dependent components of $\bm{w}$ in terms of the $m$ independent components. However, the nonlinearity of the constraints typically makes this infeasible. Instead, we will turn to Lagrange multipliers. We will write our \textit{augmented functional} as 146 | \begin{equation} 147 | \hat{g}(\bm{w}(t),\dot{\bm{w}}(t),\bm{p}(t),t) \vcentcolon = g(\bm{w}(t),\dot{\bm{w}}(t),t) + \bm{p}^T(t) \bm{f}(\bm{w}(t),\dot{\bm{w}}(t),t) 148 | \end{equation} 149 | where $\bm{p}(t)$ are Lagrange multipliers that are functions of time. Based on this, a necessary condition for optimality is 150 | \begin{equation} 151 | \hat{g}_{\bm{w}}(\bm{w}^*(t),\dot{\bm{w}}^*(t),\bm{p}^*(t),t) - \frac{d}{dt} \hat{g}_{\dot{\bm{w}}}(\bm{w}^*(t),\dot{\bm{w}}^*(t),\bm{p}^*(t),t) = 0 152 | \end{equation} 153 | with 154 | \begin{equation} 155 | \bm{f}(\bm{w}^*(t), \dot{\bm{w}}^*(t),t) = 0. 156 | \end{equation} 157 | 158 | 159 | \subsection{Indirect Methods for Optimal Control} 160 | 161 | Having built the foundations of functional optimization via calculus of variations, we will now derive the necessary conditions for optimal control under the assumption that the admissible controls are not bounded. 
The problem, as previously stated, is to find an \textit{admissible control} $\ac^*$ which causes the system 162 | \begin{equation} 163 | \stdot(t) = \f(\st(t),\ac(t),t) 164 | \end{equation} 165 | to follow an \textit{admissible trajectory} $\st^*$ that minimizes the functional 166 | \begin{equation} 167 | \J(\ac) = \cost_f(\st(t_f),t_f) + \int_{t_0}^{t_f} \cost(\st(t),\ac(t),t) dt 168 | \end{equation} 169 | under the assumptions that $\cost_f \in C^2$, the state and control are unconstrained, and $t_0, \st(t_0)$ are fixed. We define the \textit{Hamiltonian} as 170 | \begin{equation} 171 | \ham(\st(t),\ac(t),\cst(t),t) \vcentcolon= \cost(\st(t),\ac(t),t) + \cst^T(t) \f(\st(t),\ac(t),t). 172 | \end{equation} 173 | Then, the necessary conditions are 174 | \begin{align} 175 | \label{eq:nec_ham_conds1} 176 | \stdot^*(t) &= \frac{\partial \ham}{\partial \cst}(\st^*(t),\ac^*(t),\cst^*(t),t)\\ 177 | \dot{\cst}^*(t) &= -\frac{\partial \ham}{\partial \st}(\st^*(t),\ac^*(t),\cst^*(t),t)\\ 178 | 0 &= \frac{\partial \ham}{\partial \ac}(\st^*(t),\ac^*(t),\cst^*(t),t) 179 | \label{eq:nec_ham_conds3} 180 | \end{align} 181 | which must hold for all $t \in [t_0,t_f]$. Additionally, the boundary conditions 182 | \begin{align} 183 | \label{eq:ham_bcs} 184 | [\frac{\partial \cost_f}{\partial \st}(\st^*(t_f),t_f)& - \cst^*(t_f)]^T \delta \st_f\\ 185 | & + [ \ham(\st^*(t_f),\ac^*(t_f), \cst^*(t_f), t_f) + \frac{\partial \cost_f}{\partial t}(\st^*(t_f),t_f)] \delta t_f = 0 \nonumber 186 | \end{align} 187 | must be satisfied. Note that as in the previous section, they are automatically satisfied if the terminal state and time are fixed. Based on these necessary conditions, we have a set of $2n$ \textit{first-order} differential equations (for the state and co-state), and a set of $m$ algebraic equations (control equations). The solution to the state and co-state equations will contain $2n$ constants of integration. To solve for these constants, we use the initial conditions $\st(t_0) = \st_0$ (of which there are $n$), and an additional $n$ (or $n+1$) equations from the boundary conditions. We are left with a two-point boundary value problem, which are considerably more difficult to solve than initial value problems which can just be integrated forward. For a full review of boundary conditions, we again refer the reader to \cite{kirk2012optimal}. 188 | 189 | \subsubsection{Proof of the Necessary Conditions} 190 | 191 | We will now prove the necessary conditions, (\ref{eq:nec_ham_conds1} -- \ref{eq:nec_ham_conds3}), along with the boundary conditions (\ref{eq:ham_bcs}). For simplicity, assume that the terminal cost is zero, and that $t_f, \st(t_f)$ are fixed and given. Consider the augmented cost function 192 | \begin{equation} 193 | \hat{\cost}(\st(t), \stdot(t), \ac(t), \cst(t),t) \vcentcolon= \cost(\st(t),\ac(t),t) + \cst^T(t) [\f(\st(t),\ac(t),t) - \stdot(t)]. 194 | \end{equation} 195 | When the constraint holds, this augmented cost function is exactly equal to the original cost function. The augmented total cost is then 196 | \begin{equation} 197 | \hat{\J}(\ac) = \int_{t_0}^{t_f} \hat{\cost}(\st(t), \stdot(t), \ac(t), \cst(t),t) dt. 
198 | \end{equation} 199 | Applying the fundamental theorem of CoV on an extremal, we have 200 | \begin{align} 201 | 0 = &\delta \hat{\J}(\ac) = \int_{t_0}^{t_f} \left[ \overbrace{\frac{\partial \hat{\cost}}{\partial \st}(\st^*(t), \stdot^*(t), \ac^*(t), \cst^*(t),t)}^{\frac{\partial \cost}{\partial \st}(\st^*(t), \ac^*(t), t) + \frac{\partial \f}{\partial \st}^T(\st^*(t), \ac^*(t), t)\cst^*(t)} - \overbrace{\frac{d}{dt} \frac{\partial \hat{\cost}}{\partial \stdot}(\st^*(t), \stdot^*(t), \ac^*(t), \cst^*(t),t)}^{-\frac{d}{dt}(-\cst^*(t))} \right]^T \delta \st(t) \\ 202 | & + \left[ \frac{\partial \hat{\cost}}{\partial \ac}(\st^*(t), \stdot^*(t), \ac^*(t), \cst^*(t),t)\right]^T \delta \ac(t) + \underbrace{\left[\frac{\partial \hat{\cost}}{\partial \cst}(\st^*(t), \stdot^*(t), \ac^*(t), \cst^*(t),t) \right]^T }_{\f(\st^*(t),\ac^*(t),t) - \stdot^*(t)} \delta \cst(t) dt \nonumber. 203 | \end{align} 204 | Considering each term in sequence, we have: 205 | \begin{itemize} 206 | \item $\f(\st^*(t),\ac^*(t),t) - \stdot^*(t) = 0$ on an extremal. 207 | \item The Lagrange multipliers are arbitrary, so we can select them to make the coefficients of $\delta \st(t)$ equal to zero, giving $\dot{\cst}(t) = -\frac{\partial \cost}{\partial \st}(\st^*(t), \ac^*(t), t) - \frac{\partial \f}{\partial \st}^T(\st^*(t), \ac^*(t), t)\cst^*(t)$. 208 | \item The remaining variation $\delta \ac(t)$ is independent, so its coefficient must be zero, thus $ \frac{\partial \cost}{\partial \ac}(\st^*(t), \ac^*(t), t) + \frac{\partial \f}{\partial \ac}^T(\st^*(t), \ac^*(t), t)\cst^*(t) = 0$. 209 | \end{itemize} 210 | These conditions exactly give the necessary conditions as previously stated, when recast with the Hamiltonian formalism. 211 | 212 | \subsection{Pontryagin's Minimum Principle} 213 | 214 | So far, we have assumed that the admissible controls and states are unconstrained. This assumption is frequently violated for real systems---physical actuators have limits on their realizable outputs, and state constraints may occur due to safety considerations. The control $\ac^*$ causes the functional $\J$ to have a relative minimum if 215 | \begin{equation} 216 | \J(\ac) - \J(\ac^*) = \Delta \J \geq 0 217 | \end{equation} 218 | for all admissible controls ``close'' to $\ac^*$. Letting $\ac = \ac^* + \delta \ac$, the increment can be expressed as 219 | \begin{equation} 220 | \Delta \J(\ac^*,\delta\ac) = \delta \J(\ac^*,\delta \ac) + \text{higher order terms}. 221 | \end{equation} 222 | The variation $\delta \ac$ is arbitrary only if the extremal control is strictly within the boundary for all time in the interval $[t_0,t_f]$. In general, however, an extremal control lies on a boundary during at least subinterval in the interval $[t_0,t_f]$. 223 | As a consequence, admissible control variations $\delta \ac$ exist whose negatives are not admissible. This implies that a necessary condition for $\ac^*$ to minimize $\J$ is $\delta \J(\ac^*, \delta \ac) \geq 0$ for all admissible variations with $\|\delta \ac\|$ small enough. The reason why the equality in the fundamental theorem of CoV (in which we explicitly assumed no constraints) is replaced with an inequality is the presence of the control constraints. This result has an analogue in calculus, where the necessary condition for a scalar function $f$ to have a relative minimum at the end point is that the differential $df \geq 0$. 
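As a minimal illustration of this analogue (a toy scalar example added here for intuition, not drawn from \cite{kirk2012optimal}), consider minimizing $f(x) = x$ over the interval $[0,1]$. The minimum lies at the boundary point $x^* = 0$, where the only admissible perturbations satisfy $dx \geq 0$, and hence $df = f'(x^*)\, dx = dx \geq 0$; we cannot demand $df = 0$ because the perturbation $-dx$ is not admissible. The inequality condition $\delta \J(\ac^*, \delta \ac) \geq 0$ plays exactly this role when the extremal control lies on the boundary of the admissible set.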
224 | 225 | Assuming bounded controls $\ac \in \mathcal{U}$, the necessary optimality conditions are 226 | \begin{align} 227 | \label{eq:nec_pmp_conds1} 228 | \stdot^*(t) &= \frac{\partial \ham}{\partial \cst}(\st^*(t),\ac^*(t),\cst^*(t),t)\\ 229 | \dot{\cst}^*(t) &= -\frac{\partial \ham}{\partial \st}(\st^*(t),\ac^*(t),\cst^*(t),t)\\ 230 | \ham(\st^*(t),\ac^*(t), \cst^*(t), t) &\leq \ham(\st^*(t),\ac(t), \cst^*(t), t) \,\,\forall \ac \in \mathcal{U} 231 | \label{eq:nec_pmp_conds3} 232 | \end{align} 233 | along with the boundary conditions 234 | \begin{align} 235 | \label{eq:ham_bcs} 236 | [\frac{\partial \cost_f}{\partial \st}(\st^*(t_f),t_f)& - \cst^*(t_f)]^T \delta \st_f\\ 237 | & + [ \ham(\st^*(t_f),\ac^*(t_f), \cst^*(t_f), t_f) + \frac{\partial \cost_f}{\partial t}(\st^*(t_f),t_f)] \delta t_f = 0. \nonumber 238 | \end{align} 239 | The control $\ac^*(t)$ causes $\ham(\st^*(t),\ac^*(t),\cst^*(t),t)$ to assume its global minimum. This is a harder condition, in general, to analyze. Finally, we have additional necessary conditions. If the final time is fixed and the Hamiltonian does not explicitly depend on time, 240 | \begin{equation} 241 | \ham(\st^*(t),\ac^*(t),\cst^*(t)) = c \,\, \forall t \in [t_0,t_f] 242 | \end{equation} 243 | and if the final time is free and the Hamiltonian does not depend explicitly on time, 244 | \begin{equation} 245 | \ham(\st^*(t),\ac^*(t),\cst^*(t)) = 0 \,\, \forall t \in [t_0,t_f]. 246 | \end{equation} 247 | Note that in general, uniqueness and existence are not guaranteed in the constrained setting. 248 | 249 | % TODO add example problems 250 | 251 | \subsection{Numerical Aspects of Indirect Optimal Control} 252 | 253 | % Mention how PMP can be leveraged to swap the original optimal control problem into a two-point boundary value problem (TPBVP). 254 | 255 | % - Pros: We considerably decrease the dimension of the problem because, instead of looking for a control in the very large space $L^{\infty}$, we look for a vector (that is $p(0)$) in the smaller space $\mathbb{R}^n$. Still, this problem is hard to solve. 256 | % - Cons: A good guess of $p(0)$ must be available to make the method converge. The mathematical analysis to get further insights concerning such guess might be difficult. 257 | % Show how to go from PMP to TPBVP for the simple case "fixed final time/fixed final point". Follow the slides of lecture 10 and introduce the "shooting function" $S$ describing all the steps. Therefore, introduce the "indirect shooting method" as research of zeros for the shooting function. 258 | % Reproduce/show how to adapt the argument above in the presence of either "free final time" or "final point belonging to some submanifold", or both. For "final point belonging to some submanifold", attentively introduce the relation that relates the adjoint vector at the final time $p(t_f)$ with the kernel of the function $F$ that defines the submanifold $M_f = \{x\in\mathbb{R}^n|F(x)=0\}$. 259 | 260 | \subsection{Further Reading} 261 | 262 | For a practical treatment of indirect methods, we refer the reader to \cite{bryson1975applied}. For a more theoretical treatment, we refer the reader to \cite{lee1967foundations}. 
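As a brief worked illustration of the necessary conditions (\ref{eq:nec_ham_conds1}--\ref{eq:nec_ham_conds3}) (this example is chosen for simplicity and is not drawn from the references above), consider the double integrator $\dot{x}_1 = x_2$, $\dot{x}_2 = u$ with scalar control $u$, cost $\J = \int_{t_0}^{t_f} \frac{1}{2} u^2 \, dt$, and fixed $t_0, t_f$, $x(t_0)$, and $x(t_f)$. The Hamiltonian is
\begin{equation}
\ham = \frac{1}{2} u^2 + p_1 x_2 + p_2 u,
\end{equation}
so the necessary conditions give $\dot{p}_1 = -\partial \ham / \partial x_1 = 0$, $\dot{p}_2 = -\partial \ham / \partial x_2 = -p_1$, and $0 = \partial \ham / \partial u = u + p_2$, i.e. $u^* = -p_2$. Hence $p_1$ is constant, $p_2$ is affine in $t$, and the optimal control $u^*(t) = -p_2(t)$ is affine in time. The four constants of integration (two from the co-state equations and two from integrating the state equations) are pinned down by the four boundary conditions on $x(t_0)$ and $x(t_f)$, which is exactly the two-point boundary value structure discussed above.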
-------------------------------------------------------------------------------- /2019/tex/source/ch5.tex: -------------------------------------------------------------------------------- 1 | \section{Direct Methods for Optimal Control} 2 | 3 | In the previous section we considered indirect methods to optimal control, in which the necessary conditions for optimality were first applied, yielding a two-point boundary value problem that was solved numerically. We will now consider the class of direct methods, in which the optimal control problem is first discretized, and then the resulting discrete optimization problem is solved numerically. 4 | 5 | \subsection{Direct Methods} 6 | 7 | We will write our original continuous optimal control problem, 8 | \begin{equation} 9 | \begin{aligned} 10 | \label{eq:ocp} 11 | & \underset{\ac}{\min} & & \int_0^{t_f} \cost(\st(t),\ac(t),t) dt \\ 12 | & \textrm{s.t.} & & \stdot(t) = \f(\st(t), \ac(t),t), t \in [0, t_f]\\ 13 | & & & \st(0) = \st_0\\ 14 | & & & \st(t_f) \in \mathcal{M}_f\\ 15 | & & & \ac(t) \in \mathcal{U}, t\in [0,t_f] 16 | \end{aligned} 17 | \end{equation} 18 | where $\mathcal{M}_f = \{\st\in \R^n : F(\st) = 0\}$ and where we have, for simplicity, assumed zero terminal cost and $t_0 = 0$. We will use forward Euler discretization of the dynamics. We select a discretization $0 = t_0 < t_1 < \ldots < t_N = t_f$ for the interval $[0,t_f]$, and we will write $\st_{i+1} \approx \st(t), \ac_i \approx \ac(t)$ for $t \in [t_i, t_{i+1}]$, and $\st_0 \approx \st(0)$. Denoting $h_i = t_{i+1} - t_i$, the continuous time optimal control problem is transcibed into the nonlinear constrained optimization problem 19 | \begin{equation} 20 | \begin{aligned} 21 | \label{eq:nlop} 22 | & \underset{\st,\ac}{\min} & & \sum_{i=0}^{N-1} h_i \cost(\st_i,\ac_i,t_i) \\ 23 | & \textrm{s.t.} & & \st_{i+1} = \st_i + h_i \f(\st_i,\ac_i,t_i), i = 0, \ldots, N-1\\ 24 | & & & \st_N \in \mathcal{M}_f\\ 25 | & & & \ac_i \in \mathcal{U}, i = 0, \ldots, N-1 26 | \end{aligned} 27 | \end{equation} 28 | 29 | \subsubsection{Consistency of Time Discretization} 30 | 31 | Having performed this discretization, a reasonable (and important) sanity check on the validity of the direct approach is whether we recover the original problem in the limit of $h_i \to 0$. For simplicity, we will drop the time-dependence of the cost and dynamics. We will write the Lagrangian for (\ref{eq:nlop}) as 32 | \begin{equation} 33 | \mathcal{L} = \sum_{i=0}^{N-1} h_i \cost(\st_i,\ac_i) + \sum_{i=0}^{N-1} \bm{\lambda}_i^T (\st_i + h_i \f(\st_i,\ac_i) - \st_{i+1}). 34 | \end{equation} 35 | Then, the KKT conditions are 36 | \begin{align} 37 | 0 &= h_i \frac{\partial \cost}{\partial \st_i}(\st_i,\ac_i) + \bm{\lambda}_i - \bm{\lambda}_{i-1} + h_i \frac{\partial \f}{\partial \st_i}^T(\st_i,\ac_i) \bm{\lambda}_i\\ 38 | 0 &= h_i \frac{\partial \cost}{\partial \ac_i}(\st_i,\ac_i) + h_i \frac{\partial \f}{\partial \ac_i}^T(\st_i,\ac_i) \bm{\lambda}_i 39 | \end{align} 40 | Rearranging, we have 41 | \begin{align} 42 | \frac{\bm{\lambda}_i - \bm{\lambda}_{i-1}}{h_i} &=- \frac{\partial \f}{\partial \st_i}^T(\st_i,\ac_i) \bm{\lambda}_i- \frac{\partial \cost}{\partial \st_i}(\st_i,\ac_i)\\ 43 | 0 &= \frac{\partial \f}{\partial \ac_i}^T(\st_i,\ac_i) \bm{\lambda}_i + \frac{\partial \cost}{\partial \ac_i}(\st_i,\ac_i). 44 | \end{align} 45 | Let $\cst(t) = \bm{\lambda}_i$ for $t \in [t_i, t_{i+1}], i = 0, \ldots, N-1$ and $p(0) = \lambda_0$. 
Then, the above are direct discretizations of the necessary conditions for (\ref{eq:ocp}), 46 | \begin{align} 47 | \dot{\cst}(t) &= -\frac{\partial \f}{\partial \st}^T(\st(t),\ac(t)) \cst(t) - \frac{\partial \cost}{\partial \st}(\st(t),\ac(t))\\ 48 | 0 &= \frac{\partial \f}{\partial \ac}^T(\st(t),\ac(t)) \cst(t) + \frac{\partial \cost}{\partial \ac}(\st(t),\ac(t)). 49 | \end{align} 50 | 51 | \subsection{Transcription Methods} 52 | 53 | A fundamental choice in the design of numerical algorithms for direct optimization of the discretized optimal control problem is whether to optimize over both the state and action variables (a method known as collocation or simultaneous optimization) or strictly over the action variables (known as shooting). 54 | 55 | \subsubsection{Collocation Methods} 56 | 57 | Collocation methods optimize both the state variables and the control input at a fixed, finite number of times, $t_0, \ldots, t_i, \ldots, t_N$. Moreover, the dynamics constraints are enforced at these points. As such, it is necessary to choose a finite-dimensional representation of the trajectory between these points. This rough outline leaves unspecified a large number of algorithmic design choices. 58 | 59 | First, how are the dynamics constraints enforced? Both derivative and integral constraints exist. The derivative approach enforces that the time derivative of the parameterized trajectory is equal to the given system dynamics. The integral approach relies on integrating the given dynamics and enforcing agreement between this integral and the trajectory parameterization. In these notes, we will focus on the derivative approach. 60 | 61 | Second, a choice of trajectory parameterization is required. We will primarily discuss Hermite-Simpson methods herein, which parameterize each subinterval of the trajectory (in $[t_i, t_{i+1}]$) with a cubic polynomial. Note that the choice of a polynomial results in integral and derivative constraints that are relatively simple to evaluate. However, a wide variety of parameterizations exist. For example, pseudospectral methods represent the entire trajectory as a single high-order polynomial. 62 | 63 | We will now outline the Hermite-Simpson method as one example of direct collocation. Having selected a discretization $0 = t_0 < t_1 < \ldots < t_N = t_f$, we denote $h_i = t_{i+1} - t_i$. In every subinterval $[t_i, t_{i+1}]$, we approximate $\st(t)$ with a cubic polynomial 64 | \begin{equation} 65 | \st(t) = \bm{c}_0^i + \bm{c}_1^i (t - t_i) + \bm{c}_2^i (t-t_i)^2 + \bm{c}_3^i (t - t_i)^3 66 | \end{equation} 67 | which yields the derivative 68 | \begin{equation} 69 | \stdot(t) = \bm{c}_1^i + 2 \bm{c}_2^i (t-t_i) + 3 \bm{c}_3^i (t - t_i)^2.
70 | \end{equation} 71 | Writing $\st_i = \st(t_i), \st_{i+1} = \st(t_{i+1}), \stdot_i = \stdot(t_i), \stdot_{i+1} = \stdot(t_{i+1})$, we may write 72 | \begin{equation} 73 | \begin{bmatrix} 74 | \st_i\\ 75 | \stdot_{i}\\ 76 | \st_{i+1}\\ 77 | \stdot_{i+1} 78 | \end{bmatrix} 79 | = 80 | \begin{bmatrix} 81 | I & 0 & 0 & 0\\ 82 | 0 & I & 0 & 0\\ 83 | I & h_i I & h_i^2 I & h_i^3 I\\ 84 | 0 & I & 2 h_i I & 3 h_i^2 I 85 | \end{bmatrix} 86 | \begin{bmatrix} 87 | \bm{c}_0^i\\ 88 | \bm{c}_1^i\\ 89 | \bm{c}_2^i\\ 90 | \bm{c}_3^i 91 | \end{bmatrix} 92 | \end{equation} 93 | which in turn results in 94 | \begin{equation} 95 | \begin{bmatrix} 96 | \bm{c}_0^i\\ 97 | \bm{c}_1^i\\ 98 | \bm{c}_2^i\\ 99 | \bm{c}_3^i 100 | \end{bmatrix} 101 | = 102 | \begin{bmatrix} 103 | I & 0 & 0 & 0\\ 104 | 0 & I & 0 & 0\\ 105 | -\frac{3}{h_i^2} I & -\frac{2}{h_i} I & \frac{3}{h_i^2} I & -\frac{1}{h_i} I\\ 106 | \frac{2}{h_i^3} I & \frac{1}{h_i^2} I & -\frac{2}{h_i^3} I & \frac{1}{h_i^2} I 107 | \end{bmatrix} 108 | \begin{bmatrix} 109 | \st_i\\ 110 | \stdot_{i}\\ 111 | \st_{i+1}\\ 112 | \stdot_{i+1} 113 | \end{bmatrix}. 114 | \end{equation} 115 | Choosing intermediate times $t_i^c = t_i + \frac{h_i}{2}$ (collocation points), we can define interpolated controls $\ac^c_i = \frac{\ac_i + \ac_{i+1}}{2}$. From the above, substituting $\stdot_i = \f(\st_i,\ac_i,t_i)$ and $\stdot_{i+1} = \f(\st_{i+1},\ac_{i+1},t_{i+1})$ at the knot points, we have 116 | \begin{align} 117 | \st^c_i \vcentcolon=& \st(t_i + \frac{h_i}{2}) = \frac{1}{2} (\st_i + \st_{i+1}) + \frac{h_i}{8} (\f(\st_i,\ac_i,t_i) - \f(\st_{i+1},\ac_{i+1},t_{i+1}))\\ 118 | \stdot^c_i \vcentcolon=& \stdot(t_i + \frac{h_i}{2}) = -\frac{3}{2h_i} (\st_i - \st_{i+1}) - \frac{1}{4} (\f(\st_i,\ac_i,t_i) + \f(\st_{i+1},\ac_{i+1},t_{i+1})). 119 | \end{align} 120 | Thus, we can write our discretized problem as 121 | \begin{equation} 122 | \begin{aligned} 123 | \label{eq:hs_nlp} 124 | & \underset{\ac_{0:N-1},\st_{0:N}}{\min} & & \sum_{i=0}^{N-1} h_i \cost(\st_i,\ac_i,t_i)\\ 125 | & \textrm{s.t.} & & \stdot_i^c - \f(\st^c_i, \ac^c_i,t_i^c) = 0, i = 0, \ldots, N-1\\ 126 | & & & F(\st_N) = 0\\ 127 | & & & \ac_i \in \mathcal{U}, i = 0, \ldots, N-1 128 | \end{aligned} 129 | \end{equation} 130 | 131 | % Describe the generalized procedure which consists of discretizing both the state and the control in time. This is achieved by discretizing the variables with high-order polynomials in each subintervals $[t_i,t_{i+1}]$. 132 | 133 | % Say that in the literature there are lots of such discretizations and the choice relies on the fact that some discretizations fit better than others for specific problem. Moreover, the higher precision you want, the higher order of polynomial you might be obliged to choose. 134 | 135 | % Because of its generality and high precision, it is worth introducing two methods: Hermite polynomials (done in lecture 12) and Runge-Kutta 2 (easily found online). Say something about their stability property: unlike explicit Euler, the numerical solution does not explode with the number of discretization points. 136 | 137 | % It might be worth introducing general explicit Runge-Kutta schemes for any order of polynomials, because they easily generalize Hermite and other collocation methods. For completeness, I would also provide some proof of convergence, like the fact that for $h\rightarrow0$, we have that the difference the solution of the original ODE and its Runge-Kutta approximation goes to zero with a speed $h^p$, where $p$ is the order of the Runge-Kutta scheme (or at least for the second-order scheme. I might help you doing this if needed).
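To make the transcription concrete, the following is a minimal sketch (in Python, using SciPy's SLSQP solver) of the simpler forward-Euler transcription (\ref{eq:nlop}), in which the states and controls are stacked into a single decision vector and the dynamics are imposed as equality constraints; the double-integrator dynamics, horizon, and cost weights are illustrative assumptions rather than anything specified above. The Hermite-Simpson scheme differs only in which defect constraints are enforced.
\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

# Illustrative double integrator: x = (position, velocity), u = acceleration.
n, m, N, h = 2, 1, 20, 0.1
x_init = np.array([1.0, 0.0])

def f(x, u):
    # Continuous-time dynamics x_dot = f(x, u).
    return np.array([x[1], u[0]])

def unpack(z):
    # Decision vector z stacks the states x_0..x_N and the controls u_0..u_{N-1}.
    xs = z[:(N + 1) * n].reshape(N + 1, n)
    us = z[(N + 1) * n:].reshape(N, m)
    return xs, us

def cost(z):
    # Discretized quadratic running cost, summed over the grid.
    xs, us = unpack(z)
    return sum(h * (xs[i] @ xs[i] + 0.1 * us[i] @ us[i]) for i in range(N))

def defects(z):
    # Forward-Euler dynamics constraints x_{i+1} = x_i + h f(x_i, u_i),
    # plus pinning the initial state to the measured x(0).
    xs, us = unpack(z)
    d = [xs[i + 1] - (xs[i] + h * f(xs[i], us[i])) for i in range(N)]
    d.append(xs[0] - x_init)
    return np.concatenate(d)

z0 = np.zeros((N + 1) * n + N * m)
sol = minimize(cost, z0, method="SLSQP",
               constraints=[{"type": "eq", "fun": defects}])
xs_opt, us_opt = unpack(sol.x)
\end{verbatim}
Control bounds $\ac_i \in \mathcal{U}$ and the terminal constraint could be added analogously, via box bounds on the control portion of the decision vector or additional constraint functions.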
138 | 139 | \subsubsection{Shooting Methods} 140 | 141 | Shooting methods solve the discrete optimization problem by optimizing only over the control inputs, and integrating the dynamics forward given these controls. A simple approach to the forward integration is the approach we have discussed above, in which forward Euler integration is used. Single-shooting methods directly optimize the controls for the entire problem. These approaches are fairly efficient for low dimension, short horizon problems, but typically struggle to scale to larger problems. Multiple shooting methods, on the other hand, optimize via shooting over subcomponents of the problem, and enforce agreement between the trajectory segments generated via shooting within each subproblem. These methods are therefore a combination of shooting methods and collocation methods. Generally, numerical solvers for shooting problems will, given an initial action sequence, linearize the trajectory and optimize the objective function with respect to those linearized dynamics to obtain new control inputs. 142 | 143 | % Add comparisons with indirect shooting. More precisely, shootings are more robust but are computational more demanding. On the other hand, indirect shootings are cheaper and converge fast, but they suffer from sensitivity issues (i.e., less robustness) because they are quite hard to correctly initialize. 144 | 145 | 146 | \subsubsection{Sequential Convex Programming} 147 | 148 | Direct optimization of the discretized nonlinear control problem typically results in a non-convex optimization problem, for which finding a good solution may be difficult or impossible. The source of this non-convexity is typically the dynamics (and sometimes the cost function). The key idea of sequential convex programming (SCP) is to iteratively re-linearize the dynamics (and construct a convex approximation of the cost function, if it is non-convex) around a nominal trajectory. 149 | 150 | First, we will assume for this outline that the cost $\cost$ is convex. Let $(\st_0(\cdot),\ac_0(\cdot))$ be a nominal tuple of trajectory and control (which is not necessarily feasible). We linearize the dynamics around this trajectory: 151 | \begin{equation} 152 | \f_1(\st,\ac,t) = \f(\st_0(t),\ac_0(t),t) + \frac{\partial \f}{\partial \st}(\st_0(t), \ac_0(t), t) (\st - \st_0(t)) + \frac{\partial \f}{\partial \ac}(\st_0(t), \ac_0(t), t) (\ac - \ac_0(t)). 153 | \end{equation} 154 | We can then solve the linear optimal control problem (with $k=0$ initially; in general, $\f_{k+1}$ denotes the linearization about the $k$-th trajectory iterate), 155 | \begin{equation} 156 | \begin{aligned} 157 | \label{eq:scp_subproblem} 158 | & {\min} & & \int_{0}^{t_f} \cost(\st(t),\ac(t),t) dt\\ 159 | & \textrm{s.t.} & & \stdot(t) = \f_{k+1}(\st(t),\ac(t),t), t \in [0,t_f]\\ 160 | & & & \st(0) = \st_0\\ 161 | & & & \st(t_f) = \st_f\\ 162 | & & & \ac(t) \in \mathcal{U}, t \in [0,t_f] 163 | \end{aligned} 164 | \end{equation} 165 | where the dynamics are now linear and the cost function is convex. Discretizing this continuous control problem yields a tractable convex optimization problem with dynamics $\st_{i+1} = \st_i + h_i \f_{k+1}(\st_i,\ac_i,t_i), i=0, \ldots, N-1$. We then iterate this procedure, re-linearizing around the most recent solution trajectory, until convergence. 166 | 167 | % [Gu] = http://asl.stanford.edu/wp-content/papercite-data/pdf/Bonalli.Cauligi.Bylard.Pavone.ICRA19.pdf 168 | 169 | % Trust-regions: these (either hard or soft) constraints must be added to prevent bad linearizations.
Also, the size of the trust regions should be adapted during iterations depending on how good the linearization is (follow sections II.B and III.A of [Gu]). 170 | % Put an algorithm that provides SCP. For this, you might copy Algorithm 1 from [Gu]. Explain and justify each line of such algorithm (you can copy what we wrote in [Gu]). 171 | % Spend some words (maybe just a paragraph) to justify the approach. The key fact is to say that when the procedure converges (and to make it converge we may use several numerical tricks, one of them being trust regions), we obtain a local solution (in the sense of the PMP) for the original optimal control problem (I might help you doing this if needed). 172 | 173 | % TODO discuss iLQR/DDP in the context of SCP/transcription methods 174 | 175 | \subsection{Further Reading} 176 | 177 | A broad introduction to direct methods for trajectory optimization is presented in \cite{kelly2017transcription}. This tutorial also features a discussion of trajectory optimization for hybrid systems, which we have not discussed in this section, as well as numerical solver features. For a more comprehensive review of direct methods for trajectory optimization by the same author with an emphasis on collocation methods, see \cite{kelly2017introduction}. -------------------------------------------------------------------------------- /2019/tex/source/ch6.tex: -------------------------------------------------------------------------------- 1 | \section{Model Predictive Control} 2 | 3 | Both direct and indirect methods for open-loop control result in trajectories that must be tracked with an auxiliary controller, if there is any mismatch between the systems model and the true system. This often results in a decoupling of the auxiliary controller from the original optimal control problem, which may result in performance degradation. Alternatively, the auxiliary controller may not be able to take into account other problem considerations such as state or control constraints. In this section, we introduce model predictive control, which applies the ideas from direct methods for trajectory generation online to iteratively replan, and thus results in a closed-loop controller. 4 | 5 | \subsection{Overview of MPC} 6 | 7 | Model predictive control entails solving finite-time optimal control problems in a receding horizon fashion (and thus is also frequently referred to as \textit{receding horizon control}). The rough structure of model predictive control algorithms is 8 | \begin{itemize} 9 | \item At each sampling time $t$, solve an \textit{open-loop} optimal control problem over a finite horizon 10 | \item Apply the generated optimal input signal during the subsequent sampling interval $[t,t+1)$ 11 | \item At the next time step $t+1$, solve the new optimal control problem based on new measurements of the state over a shifted horizon 12 | \end{itemize} 13 | 14 | Consider the problem of regulating to the origin the discrete-time linear time-invariant system 15 | \begin{equation} 16 | \st(t+1) = A \st(t) + B \ac(t) 17 | \end{equation} 18 | for $\st(t) \in \mathbb{R}^n$, $\ac(t) \in \mathbb{R}^m$, subject to constraints $\st(t) \in \mathcal{X}, \ac(t) \in \mathcal{U}, t \geq 0$, where $\mathcal{X}, \mathcal{U}$ are polyhedra. We will assume the full state measurement is available at time $t$. 
19 | Given this, we can state the finite-time optimal control problem solved at each stage, $t$, as 20 | \begin{equation} 21 | \begin{aligned} 22 | \label{eq:mpc_ftocp} 23 | & \underset{\ac_{t\mid t}, \ldots, \ac_{t+N-1\mid t}}{\min} & & \cost_f(\st_{t+N\mid t}) + \sum_{k=0}^{N-1} \cost(\st_{t+k\mid t},\ac_{t+k\mid t}) \\ 24 | & \textrm{s.t.} & & \st_{t+k+1\mid t} = A \st_{t+k\mid t} + B \ac_{t+k\mid t}, \quad k = 0, \ldots, N-1 \\ 25 | & & & \st_{t+k\mid t} \in \mathcal{X}, \quad k = 0, \ldots, N-1 \\ 26 | & & & \ac_{t+k\mid t} \in \mathcal{U}, \quad k = 0, \ldots, N-1 \\ 27 | & & & \st_{t+N\mid t} \in \mathcal{X}_f, \\ 28 | & & & \st_{t\mid t} = \st(t) 29 | \end{aligned} 30 | \end{equation} 31 | for which we write the optimal value as $\J_t^*(\st(t))$. In this problem, $\st_{t+k\mid t}$ and $\ac_{t+k\mid t}$ are the state and action predicted at time $t+k$ from time $t$. Letting $U^*_{t\to t+N\mid t} \vcentcolon= \{\ac^*_{t\mid t}, \ldots, \ac^*_{t+N-1\mid t}\}$ denote the optimal solution, we take $\ac(t) = \ac^*_{t\mid t}(\st(t))$. This optimization problem is then repeated at time $t+1$, based on the new state $\st_{t+1\mid t+1} = \st(t+1)$. Defining the closed-loop control policy as $\pol_t(\st(t)) \vcentcolon= \ac^*_{t\mid t}(\st(t))$, we have the closed-loop dynamics 32 | \begin{equation} 33 | \st(t+1) = A \st(t) + B \pol_t(\st(t)). 34 | \end{equation} 35 | Thus, the central question of this formulation becomes characterizing the behavior of the closed-loop system defined by this iterative re-optimization. As the problem is time-invariant, we can rewrite the closed-loop dynamics as 36 | \begin{equation} 37 | \st(t+1) = A \st(t) + B \pol(\st(t)). 38 | \end{equation} 39 | 40 | The rough structure of the online model predictive control framework is then as follows: 41 | \begin{enumerate} 42 | \item Measure the state $\st(t)$ at every time $t$ 43 | \item Obtain $U^*_0(\st(t))$ by solving the finite-time optimal control problem 44 | \item If $U^*_0(\st(t)) = \emptyset$, then the problem is infeasible; stop 45 | \item Apply the first element $\ac^*_0$ of $U^*_0(\st(t))$ to the system 46 | \item Wait for the new sampling time $t+1$ 47 | \end{enumerate} 48 | This framework leads to two main implementation issues. First, the controller may lead us into a situation where after a few steps the finite-time optimal control problem is infeasible, which we refer to as the \textit{persistent feasibility issue}. Even if the feasibility problem does not occur, the generated control inputs may not lead to trajectories that converge to the origin, which we refer to as the \textit{stability issue}. The key question in the analysis of MPC algorithms is how we may guarantee that our ``short-sighted'' control strategy leads to effective long-term behavior. While one possible approach is directly analyzing the closed-loop dynamics, this is in practice very difficult. Our approach will instead be to derive conditions on the terminal cost function $\cost_f$ and terminal constraint set $\mathcal{X}_f$ so that persistent feasibility and closed-loop stability are guaranteed. 49 | 50 | \subsection{Feasibility} 51 | 52 | Model predictive control simplifies the online control optimization problem by solving a shorter horizon problem, as opposed to solving the full optimal control problem online at each timestep. This myopic optimization leads to the possibility that after several steps, the problem may no longer be feasible.
As such, in this section we will discuss approaches to guarantee so-called \textit{recursive feasibility} and thereby avoid this problem. 53 | 54 | Let 55 | \begin{align} 56 | \mathcal{X}_0 \vcentcolon= \{\st \in \mathcal{X} \mid \exists (\ac_0, \ldots,& \ac_{N-1}) \,\,\text{s.t.}\,\, \st_k \in \mathcal{X}, \ac_k \in \mathcal{U}, k=0,\ldots,N-1,\\ 57 | & \st_N \in \mathcal{X}_f, \,\text{where}\,\, \st_{k+1} = A \st_k + B \ac_k, k = 0, \ldots, N-1 \nonumber 58 | \} 59 | \end{align} 60 | be the set of feasible initial states. Simply, this set is the set of initial states for which a sequence of control inputs exists that causes the final state to satisfy the terminal constraint, while satisfying the state and control constraints along the way. For the autonomous system $\st(t+1) = \phi(\st(t))$ with constraint $\st(t) \in \mathcal{X}$, the one-step controllable set to the set $\mathcal{S}$ is defined as 61 | \begin{equation} 62 | \text{Pre}(\mathcal{S}) \vcentcolon= \{\st \in \mathbb{R}^n : \phi(\st)\in \mathcal{S}\}. 63 | \end{equation} 64 | For the system $\st(t+1) = \phi(\st(t),\ac(t))$ with constraints $\st(t) \in \mathcal{X}, \ac(t) \in \mathcal{U}$, the one-step controllable set to the set $\mathcal{S}$ is defined as 65 | \begin{equation} 66 | \text{Pre}(\mathcal{S}) \vcentcolon= \{\st \in \mathbb{R}^n : \exists \ac \in \mathcal{U} \,\,\text{s.t.}\,\, \phi(\st,\ac)\in \mathcal{S}\}. 67 | \end{equation} 68 | A set $\mathcal{C} \subseteq \mathcal{X}$ is said to be a control invariant set for the system $\st(t+1) = \phi(\st(t),\ac(t))$ with constraints $\st(t) \in \mathcal{X}, \ac(t) \in \mathcal{U}$, if 69 | \begin{equation} 70 | \st(t) \in \mathcal{C} \implies \exists \ac \in \mathcal{U} \,\,\text{s.t.}\,\, \phi(\st(t),\ac) \in \mathcal{C}, \forall t. 71 | \end{equation} 72 | The set $\mathcal{C}_\infty \subseteq \mathcal{X}$ is said to be the maximal control invariant set for the system $\st(t+1) = \phi(\st(t),\ac(t))$ with constraints $\st(t) \in \mathcal{X}, \ac(t) \in \mathcal{U}$, if it is control invariant and contains all control invariant sets contained in $\mathcal{X}$\footnote{Control invariant sets can be computed using the MPT toolbox: \url{www.mpt3.org}}. 73 | 74 | We will now proceed to derive critical results on recursive feasibility for linear dynamical systems. We will define the ``truncated'' feasibility set 75 | \begin{align} 76 | \mathcal{X}_1 \vcentcolon= \{\st \in \mathcal{X} \mid \exists (\ac_1, \ldots,& \ac_{N-1}) \,\,\text{s.t.}\,\, \st_k \in \mathcal{X}, \ac_k \in \mathcal{U}, k=1,\ldots,N-1,\\ 77 | & \st_N \in \mathcal{X}_f, \,\text{where}\,\, \st_{k+1} = A \st_k + B \ac_k, k = 1, \ldots, N-1 \nonumber 78 | \}. 79 | \end{align} 80 | Then, we may state the following result on feasibility. 81 | 82 | \begin{lemma}[Persistent Feasibility] 83 | If the set $\mathcal{X}_1$ is a control invariant set for the system $\st(t+1) = A \st(t) + B \ac(t)$, $\st(t) \in \mathcal{X}, \ac(t) \in \mathcal{U}$, then the MPC law is persistently feasible. 84 | \end{lemma} 85 | 86 | \begin{proof} 87 | Note that 88 | \begin{equation} 89 | \text{Pre}(\mathcal{X}_1) \vcentcolon= \{\st \in \mathbb{R}^n : \exists \ac \in \mathcal{U} \,\,\text{s.t.}\,\, A\st + B \ac\in \mathcal{X}_1\}. 90 | \end{equation} 91 | Since $\mathcal{X}_1$ is control invariant, there exists $\ac \in \mathcal{U}$ such that $A \st + B \ac \in \mathcal{X}_1$ for all $\st \in \mathcal{X}_1$. Thus, $\mathcal{X}_1 \subseteq \text{Pre}(\mathcal{X}_1) \cap \mathcal{X}$.
One may write 92 | \begin{equation} 93 | \mathcal{X}_0 = \{\st_0 \in \mathcal{X} \mid \exists \ac_0 \in \mathcal{U}\,\, \text{s.t.}\,\, A \st_0 + B \ac_0 \in \mathcal{X}_1\} = \text{Pre}(\mathcal{X}_1) \cap \mathcal{X}. 94 | \end{equation} 95 | This then implies $\mathcal{X}_1 \subseteq \mathcal{X}_0$. Choose some $\st_0 \in \mathcal{X}_0$. Let $U^*_0$ be the solution to the finite-time optimization problem, and $\ac^*_0$ be the first control. Let $\st_1 = A \st_0 + B \ac_0^*$. Since $U^*_0$ is feasible, one has $\st_1 \in \mathcal{X}_1$. Since $\mathcal{X}_1 \subseteq \mathcal{X}_0$, $\st_1 \in \mathcal{X}_0$, and hence the next optimization problem is feasible. 96 | \end{proof} 97 | 98 | For $N=1$, we may set $\mathcal{X}_f = \mathcal{X}_1$. If the terminal set is chosen to be control invariant, then the MPC problem will be persistently feasible \textit{independent} of the chosen control objectives and parameters. The system designer may then choose the parameters to affect the system performance. The logical question, then, is how to extend this result to $N>1$, for which we have the following result. 99 | 100 | \begin{theorem}[Persistent Feasibility] 101 | If $\mathcal{X}_f$ is a control invariant set for the system $\st(t+1) = A \st(t) + B \ac(t)$, $\st(t) \in \mathcal{X}, \ac(t) \in \mathcal{U}, t\geq 0$, then the MPC law is persistently feasible. 102 | \end{theorem} 103 | 104 | \begin{proof} 105 | We will begin by defining the ``truncated'' feasibility set at step $N-1$, 106 | \begin{align} 107 | \mathcal{X}_{N-1} \vcentcolon= \{\st_{N-1} \in \mathcal{X} \mid \exists & \ac_{N-1} \,\,\text{s.t.}\,\, \st_{N-1} \in \mathcal{X}, \ac_{N-1} \in \mathcal{U},\\ 108 | & \st_N \in \mathcal{X}_f, \,\text{where}\,\, \st_N = A \st_{N-1} + B \ac_{N-1} \nonumber 109 | \}. 110 | \end{align} 111 | Consider any $\st_{N-1} \in \mathcal{X}_{N-1}$; due to the terminal constraint, we have $A \st_{N-1} + B \ac_{N-1} = \st_N \in \mathcal{X}_f$. Since $\mathcal{X}_f$ is a control invariant set, there exists $\ac \in \mathcal{U}$ such that $\st^+ = A \st_N + B \ac \in \mathcal{X}_f$. This is exactly the membership condition for $\mathcal{X}_{N-1}$, so $\st_N \in \mathcal{X}_{N-1}$; in other words, every $\st_{N-1} \in \mathcal{X}_{N-1}$ admits an admissible control whose successor state remains in $\mathcal{X}_{N-1}$. Thus, $\mathcal{X}_{N-1}$ is control invariant. Repeating this argument, one can recursively show that $\mathcal{X}_{N-2}, \ldots, \mathcal{X}_1$ are control invariant, and the persistent feasibility lemma then applies. 112 | \end{proof} 113 | 114 | Practically, we introduce the terminal set $\mathcal{X}_f$ artificially for the purpose of leading to a sufficient condition for persistent feasibility. We would like to choose it to be large, so that it avoids compromising closed-loop performance. 115 | 116 | \subsection{Stability} 117 | 118 | Persistent feasibility does not guarantee that the closed-loop trajectories converge toward the desired equilibrium point. One of the most popular approaches to guarantee persistent feasibility and stability of the MPC law makes use of a control invariant terminal set $\mathcal{X}_f$ for feasibility, and a terminal cost function $\cost_f(\cdot)$ for stability. To prove stability, we leverage Lyapunov stability theory. 119 | 120 | \begin{theorem}[Lyapunov Stability] 121 | \label{thm:lyap_stability} 122 | Consider the equilibrium point $\st = 0$ for the autonomous system $\st_{k+1} = \f(\st_k)$ (with $\f(0)=0$). Let $\Omega \subset \mathbb{R}^n$ be a closed and bounded set containing the origin.
Let $V:\mathbb{R}^n \to \mathbb{R}$ be a function, continuous at the origin, such that 123 | \begin{align} 124 | & V(0) = 0\, \text{and} \,\,V(\st) > 0, \,\, \forall \st \in \Omega \setminus \{0\} \\ 125 | & V(\st_{k+1}) - V(\st_k) < 0, \,\, \forall \st_k \in \Omega \setminus \{0\}. 126 | \end{align} 127 | Then $\st=0$ is asymptotically stable in $\Omega$. 128 | \end{theorem} 129 | 130 | We will utilize this result to show that with appropriate choices of $\mathcal{X}_f$ and $\cost_f(\cdot)$, $\J_0^*$ is a Lyapunov function for the closed-loop system. 131 | 132 | \begin{theorem}[MPC Stability (for Quadratic Cost)] \label{thm:mpc_stability} 133 | Assume 134 | \begin{enumerate} 135 | \item $Q = Q^T > 0, R = R^T >0, Q_f > 0$ 136 | \item Sets $\mathcal{X}, \mathcal{X}_f$, and $\mathcal{U}$ contain the origin in their interior and are closed 137 | \item $\mathcal{X}_f \subseteq \mathcal{X}$ is control invariant 138 | \item $\min_{\bm{v} \in \mathcal{U}, A \st + B \bm{v} \in \mathcal{X}_f} \left\{ -\cost_f(\st) + \cost(\st,\bm{v}) + \cost_f(A\st + B\bm{v}) \right\} \leq 0, \forall \st \in \mathcal{X}_f$. 139 | \end{enumerate} 140 | Then, the origin of the closed-loop system is asymptotically stable with domain of attraction $\mathcal{X}_0$. 141 | \end{theorem} 142 | 143 | \begin{proof} 144 | Note that via assumption 3, persistent feasibility is guaranteed for any $Q_f, Q, R$. We want to show that $\J_0^*$ is a Lyapunov function for the closed-loop system $\st(t+1) = \f_{cl}(\st(t)) = A\st(t) + B \pol(\st(t))$, with respect to the equilibrium $\f_{cl}(0) = 0$ (the origin is indeed an equilibrium as $0 \in \mathcal{X}, 0 \in \mathcal{U}$, and the cost is positive for any non-zero control sequence). Note also that $\mathcal{X}_0$ is closed and bounded, and $\J_0^*(0)=0$, both by assumption. Note also that $\J^*_0(\st)>0$ for all $\st \in \mathcal{X}_0 \setminus \{0\}$. 145 | 146 | We will now show the decay property. Since the setup is time-invariant, we can study the decay property between $t=0$ and $t=1$. Let $\st(0) \in \mathcal{X}_0$, let $U_0^{[0]} = [\ac^{[0]}_0,\ldots,\ac^{[0]}_{N-1}]$ be the optimal control sequence, and let $[\st(0), \ldots, \st^{[0]}_{N}]$ be the corresponding trajectory. After applying $\ac^{[0]}_0$, one obtains $\st(1) = A \st(0) + B \ac^{[0]}_0$. Now, consider the control sequence $[\ac^{[0]}_1,\ldots,\ac^{[0]}_{N-1}, \bm{v}]$, where $\bm{v}\in \mathcal{U}$, with corresponding state trajectory $[\st(1), \ldots, \st^{[0]}_{N}, A\st^{[0]}_{N} + B \bm{v}]$. Since $\st^{[0]}_{N} \in \mathcal{X}_f$ (by the terminal constraint), and since $\mathcal{X}_f$ is control invariant, 147 | \begin{equation} 148 | \exists \bar{\bm{v}} \in \mathcal{U}\mid A \st^{[0]}_{N} + B \bar{\bm{v}} \in \mathcal{X}_f. 149 | \end{equation} 150 | With such a choice of $\bar{\bm{v}}$, the sequence $[\ac^{[0]}_1,\ldots,\ac^{[0]}_{N-1}, \bar{\bm{v}}]$ is feasible for the MPC optimization problem at time $t=1$. Since this sequence is not necessarily optimal, 151 | \begin{equation} 152 | \J_0^*(\st(1)) \leq \cost_f(A \st^{[0]}_{N} + B \bar{\bm{v}}) + \sum_{k=1}^{N-1} \cost(\st^{[0]}_{k}, \ac^{[0]}_{k}) + \cost(\st^{[0]}_{N}, \bar{\bm{v}}).
153 | \end{equation} 154 | Equivalently, 155 | \begin{equation} 156 | \J_0^*(\st(1)) \leq \cost_f(A \st^{[0]}_{N} + B \bar{\bm{v}}) + \J^*_0(\st(0)) - \cost_f(\st^{[0]}_{N}) - \cost(\st(0), \ac^{[0]}_{0}) + \cost(\st^{[0]}_{N}, \bar{\bm{v}}). 157 | \end{equation} 158 | Since $\st^{[0]}_{N} \in \mathcal{X}_f$, assumption 4 guarantees that we can select $\bar{\bm{v}}$ such that 159 | \begin{equation} 160 | \J_0^*(\st(1)) \leq \J_0^*(\st(0)) - \cost(\st(0), \ac^{[0]}_{0}). 161 | \end{equation} 162 | Since $\cost(\st(0), \ac^{[0]}_{0})>0$ for all $\st(0) \in \mathcal{X}_0 \setminus \{0\}$, 163 | \begin{equation} 164 | \J_0^*(\st(1)) - \J_0^*(\st(0)) < 0. 165 | \end{equation} 166 | The last step is to prove continuity, for which we omit the details and refer the reader to \cite{borrelli2017predictive}. 167 | \end{proof} 168 | 169 | \subsubsection{Choosing $\mathcal{X}_f$ and $Q_f$} 170 | 171 | We will look at two cases. First, we will assume that $A$ is asymptotically stable. Then, we set $\mathcal{X}_f$ as the maximal positive invariant set $\mathcal{O}_\infty$ for the system $\st(t+1) = A \st(t), \st(t) \in \mathcal{X}$. The set $\mathcal{X}_f$ is a control invariant set for the system $\st(t+1) = A \st(t) + B \ac(t)$ as $\ac=0$ is a feasible control. As for stability, $\ac=0$ is feasible and $A \st \in \mathcal{X}_f$ if $\st \in \mathcal{X}_f$, thus assumption 4 of Theorem \ref{thm:mpc_stability} becomes 172 | \begin{equation} 173 | -\st^T Q_f \st + \st^T Q \st + \st^T A^T Q_f A \st \leq 0, \, \forall \st \in \mathcal{X}_f 174 | \end{equation} 175 | which is true since, due to the fact that $A$ is asymptotically stable, 176 | \begin{equation} 177 | \exists Q_f > 0 \mid - Q_f + Q + A^T Q_f A = 0. 178 | \end{equation} 179 | 180 | Next, we will look at the general case. Let $L_\infty$ be the optimal gain for the infinite-horizon LQR controller. Set $\mathcal{X}_f$ as the maximal positive invariant set for the closed-loop system $\st(t+1) = (A + B L_\infty) \st(t)$ (with constraints $\st(t) \in \mathcal{X}, L_\infty \st(t) \in \mathcal{U}$). Then, set $Q_f$ as the solution $Q_\infty$ to the discrete-time algebraic Riccati equation. 181 | 182 | \subsubsection{Explicit MPC} 183 | 184 | In some cases, the MPC law can be pre-computed, which removes the need for online optimization. An important case of this is that of constrained LQR, in which we wish to solve the optimal control problem 185 | \begin{equation} 186 | \begin{aligned} 187 | \label{eq:clqr} 188 | & \underset{\ac_{0}, \ldots, \ac_{N-1}}{\min} & & \st_N^T Q_f \st_N + \sum_{k=0}^{N-1} \st_k^T Q \st_k + \ac_k^T R \ac_k\\ 189 | & \textrm{s.t.} & & \st_{k+1} = A \st_{k} + B \ac_{k}, \quad k = 0, \ldots, N-1 \\ 190 | & & & \st_{k} \in \mathcal{X}, \quad k = 0, \ldots, N-1 \\ 191 | & & & \ac_{k} \in \mathcal{U}, \quad k = 0, \ldots, N-1 \\ 192 | & & & \st_{N} \in \mathcal{X}_f, \\ 193 | & & & \st_{0} = \st. 194 | \end{aligned} 195 | \end{equation} 196 | The solution to the constrained LQR problem is a control $\ac^*$ which is a continuous piecewise affine function on a polyhedral partition of the state space $\mathcal{X}$, that is $\ac^* = \pol(\st)$, where 197 | \begin{equation} 198 | \pol(\st) = L^j \st + \bm{l}^j\,\,\text{if}\,\, H^j\st \leq K^j, \, j = 1, \ldots, N^r. 199 | \end{equation} 200 | Thus, online, one has to locate the cell of the polyhedral partition in which the state $\st$ lies, and then one obtains the optimal control via a look-up table query.
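To tie the pieces of this section together, the following is a minimal sketch (in Python, using the CVXPY modeling package) of the receding-horizon loop applied to a constrained LQR problem of the form above. The double-integrator data, constraint bounds, and horizon are illustrative assumptions, and for brevity the terminal set is taken to be the conservative, control invariant choice $\mathcal{X}_f = \{0\}$ rather than computed as described above.
\begin{verbatim}
import numpy as np
import cvxpy as cp

# Illustrative double-integrator data (assumptions, not specified in the notes).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, N = np.eye(2), 0.1 * np.eye(1), 15

def solve_ftocp(x_init):
    # Finite-time optimal control problem solved at each sampling time t.
    xs = cp.Variable((2, N + 1))
    us = cp.Variable((1, N))
    cost, constraints = 0, [xs[:, 0] == x_init]
    for k in range(N):
        cost += cp.quad_form(xs[:, k], Q) + cp.quad_form(us[:, k], R)
        constraints += [xs[:, k + 1] == A @ xs[:, k] + B @ us[:, k],
                        cp.norm(us[:, k], "inf") <= 0.5,   # input constraint set U
                        cp.norm(xs[:, k], "inf") <= 5.0]   # state constraint set X
    constraints += [xs[:, N] == 0]  # conservative terminal set X_f = {0}
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return us[:, 0].value

# Receding-horizon loop: solve, apply the first input, re-measure, repeat.
x = np.array([1.0, 0.0])
for t in range(40):
    u0 = solve_ftocp(x)
    if u0 is None:
        break  # the subproblem was infeasible at this state
    x = A @ x + B @ u0  # in practice, replaced by a measurement of the true system
\end{verbatim}
Solving this quadratic program at every step is exactly the online loop outlined at the start of this section; explicit MPC instead precomputes the piecewise affine map $\pol(\st)$ offline and replaces the solve with a look-up.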
201 | 202 | \subsection{Further Reading} 203 | 204 | We refer the reader to \cite{borrelli2017predictive} and \cite{rawlings2017model} for two broad and comprehensive treatments of the topic. -------------------------------------------------------------------------------- /2019/tex/source/ch7.tex: -------------------------------------------------------------------------------- 1 | \section{Adaptive Optimal Control} 2 | 3 | 4 | \subsection{Problem Statement} 5 | 6 | % State the adaptive optimal control problem. Lead into information gain. Talk about comparison to passive adaptive control? Talk about MRAC adaptive control? 7 | 8 | % lay out episodic vs non-episodic setting (+ goals of each) 9 | 10 | \subsection{System Identification} 11 | 12 | % lay out the system ID problem, discuss split into sysID/control phases as heuristic for RL problem 13 | 14 | \subsection{Adaptive Control} 15 | 16 | % discuss passive adaptive control 17 | 18 | \subsection{Probing, Planning for Information Gain, and Dual Control} 19 | 20 | % The concept of probing. The idea of planning for information gain. The practical difficulty of optimizing the dual control problem (with examples from Edison Tse papers). Approximate/local methods for dual control? Belief space planning? 21 | 22 | 23 | \subsection{Further Reading} -------------------------------------------------------------------------------- /2019/tex/source/ch8.tex: -------------------------------------------------------------------------------- 1 | \section{Reinforcement Learning} 2 | 3 | \subsection{Model-Based Approaches: Linear Systems} 4 | 5 | \subsubsection{Adaptive LQR} 6 | 7 | % cover classical work on adaptive LQR; exploration/persistent excitation (becker/kumar 85, possibly results from adaptive control books?) 8 | 9 | % survey, briefly, modern work on adaptive LQR. Model-based and model-free? Lots of refs from szepesvari and recht papers. 10 | 11 | \subsubsection{Learning-Based MPC} 12 | 13 | % Tomlin work + others; linear learning-based MPC only. Survey or dig into details? 14 | 15 | \subsubsection{Time-Varying Linear Models} 16 | 17 | % Iterative Learning Control? 18 | 19 | % Episodic fitting of linear models 20 | 21 | \subsection{Model-Based Approaches: Nonlinear Systems} 22 | 23 | % Discuss PILCO, other probabilistic methods like PETS? 24 | 25 | % Discuss predictive models for observations, e.g. visuomotor control 26 | % Should avoid introduction of latent variable models in the class, but can include for the future. Could survey representation learning in MB RL? 27 | 28 | 29 | \subsection{Model-Free Approaches} 30 | 31 | \subsubsection{Temporal Difference Learning} 32 | 33 | % The idea of TD learning, Q learning, DQN. 34 | 35 | 36 | \subsubsection{Policy Gradient} 37 | 38 | % Policy gradient theorem. Deterministic Policy Gradient theorem? 39 | 40 | \subsection{Connections Between Model-Based and Model-Free Methods} 41 | 42 | % Model-based RL via model-learning + model-free RL. Model-based acceleration for model-free RL. Model-free terminal costs/values for finite horizon MPC-style methods. 43 | 44 | \subsubsection{Model-Based Acceleration for Model-Free RL} 45 | 46 | \subsubsection{Model-Free RL for Terminal Costs} 47 | 48 | \subsubsection{LQR via Q-Learning} 49 | 50 | % Bradtke paper from '93. Is this necessary? 51 | 52 | \subsection{Exploration} 53 | 54 | % Epsilon greedy and shortcomings for nonlinear systems. Deep exploration as a concept. 
Practical approaches for exploration in nonlinear systems: Pseudo-count based, Thompson sampling (primarily TS of models), Optimism (risk-seeking LQR?). 55 | 56 | \subsection{Safety} 57 | 58 | \subsubsection{Risk-Sensitive RL} 59 | 60 | % Risk-sensitive RL. 61 | 62 | \subsubsection{Stability Analysis} 63 | 64 | %KL constraints on trajectory shift. Lyapunov methods. 65 | 66 | \subsubsection{Safe Exploration} 67 | 68 | % Extensions of the previous methods. GP-based SafeOpt ideas. 69 | 70 | 71 | \subsection{Further Reading} -------------------------------------------------------------------------------- /2019/tex/source/intro.tex: -------------------------------------------------------------------------------- 1 | \section*{Introduction} 2 | 3 | These notes accompany the newly revised (Spring 2019) version of AA203 at Stanford. The goal of this new course is to present a unified treatment of optimal control and reinforcement learning (RL), with an emphasis on model-based reinforcement learning. The goal of the instructors is to unify the subjects as much as possible, and to concretize connections between these research communities. 4 | 5 | \paragraph{How is this course different from a standard class on Optimal Control?} 6 | First, we will emphasize practical computational tools for real world optimal control problems, such as model predictive control and sequential convex programming. Beyond this, the last third of the course focuses on the case in which an exact model of the system is not available. We will discuss this setting both in the online context (typically referred to as adaptive optimal control) and in the episodic context (the typical setting for reinforcement learning). 7 | 8 | \paragraph{How is this course different from a standard class on Reinforcement Learning?} 9 | Many classes on reinforcement learning focus primarily on the setting of discrete Markov Decision Processes (MDPs), whereas we will focus primarily on continuous MDPs. More importantly, the focus on discrete MDPs leads planning with a known model (which is typically referred to as ``planning'' or ``control'' in RL) to be relatively simple. In this course, we will spend considerably more time focusing on planning with a known model in both continuous and discrete time. Finally, the focus of this course will primarily be on model-based methods. We will touch briefly on model-free methods at the end, and combinations of model-free and model-based approaches. 10 | 11 | \subsection*{A Note on Notation} 12 | 13 | The notation and language used in the control theory and reinforcement learning communities vary substantially, as so we will state all of the notational choices we make in this section. First, optimal control problems are typically stated in terms of minimizing a cost function, whereas reinforcement learning problems aim to maximize a reward. These are mathematically identical statements, where one is simply the negation of the other. Herein, we will use the control theoretic approach of cost minimization. We write $\cost$ for the cost function, $\f$ for the system dynamics, and denote the state and action at time $t$ as $\st_t$ and $\ac_t$ respectively. We write scalars as lower case letters, vectors as bold lower case letters, and matrices as upper case letters. We write a deterministic policy as $\pol(\st)$, and a stochastic policy as $\pol(\ac\mid\st)$. 14 | We write the cost-to-go (negation of the value function) associated with policy $\pol$ at time $t$ and state $\st$ as $\J^\pol_t(\st)$. 
We will also sometimes refer to the cost-to-go as the value, but in these notes we are always referring to the expected sum of future costs. 15 | For an in-depth discussion of the notational and language differences between the artificial intelligence and control theory communities, we refer the reader to \cite{powell2012ai}. 16 | 17 | For notational convenience, we will write the Hessian of a function $f(x)$, evaluated at $x^*$, as $\nabla^2 f(x^*)$. 18 | 19 | \subsection*{Prerequisites} 20 | 21 | While these notes aim to be almost entirely self contained, familiarity with undergraduate level calculus, differential equations, and linear algebra (equivalent to CME 102 and EE 263 at Stanford) are assumed. We will briefly review nonlinear optimization in the first section of these notes, but previous experience with optimization (e.g. EE 364A) will be helpful. Finally, previous experience with machine learning (at the level of CS 229) is beneficial. 22 | 23 | % TODO add acks -------------------------------------------------------------------------------- /2020/tex/combined.tex: -------------------------------------------------------------------------------- 1 | \documentclass[12pt]{book} 2 | \usepackage[utf8]{inputenc} 3 | \usepackage{fullpage} 4 | 5 | \input{preamble.tex} 6 | 7 | \title{Optimal and Learning-Based Control\\ 8 | Course Notes} 9 | 10 | \author{James Harrison\thanks{Contact: jharrison@stanford.edu}} 11 | \date{\today} 12 | 13 | \begin{document} 14 | 15 | \maketitle 16 | 17 | 18 | \input{source/intro.tex} 19 | 20 | \newpage 21 | \tableofcontents 22 | 23 | \newpage 24 | 25 | 26 | \part{Optimization and Optimal Control} 27 | \input{source/ch1.tex} 28 | \input{source/ch2.tex} 29 | \input{source/ch3.tex} 30 | \input{source/ch4.tex} 31 | \input{source/ch5.tex} 32 | \input{source/ch6.tex} 33 | 34 | \part{Adaptive Control and Reinforcement Learning} 35 | \input{source/ch7.tex} 36 | \input{source/ch8.tex} 37 | \input{source/ch9.tex} 38 | \input{source/ch10.tex} 39 | \input{source/ch11.tex} 40 | \input{source/ch12.tex} 41 | % \input{source/ch13.tex} 42 | 43 | % \appendix 44 | % \part{Appendices} 45 | % \input{source/app1.tex} 46 | % \input{source/app2.tex} 47 | 48 | 49 | \bibliography{references} 50 | \bibliographystyle{alpha} 51 | 52 | \end{document} 53 | -------------------------------------------------------------------------------- /2020/tex/figs/foc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/StanfordASL/AA203-Notes/f53acbcd87bd1144ac4e8d8d2bc1ad9e6ac2a1db/2020/tex/figs/foc.png -------------------------------------------------------------------------------- /2020/tex/figs/large_step.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/StanfordASL/AA203-Notes/f53acbcd87bd1144ac4e8d8d2bc1ad9e6ac2a1db/2020/tex/figs/large_step.png -------------------------------------------------------------------------------- /2020/tex/figs/linesearch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/StanfordASL/AA203-Notes/f53acbcd87bd1144ac4e8d8d2bc1ad9e6ac2a1db/2020/tex/figs/linesearch.png -------------------------------------------------------------------------------- /2020/tex/figs/newtonmethod.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/StanfordASL/AA203-Notes/f53acbcd87bd1144ac4e8d8d2bc1ad9e6ac2a1db/2020/tex/figs/newtonmethod.png -------------------------------------------------------------------------------- /2020/tex/figs/optimality.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/StanfordASL/AA203-Notes/f53acbcd87bd1144ac4e8d8d2bc1ad9e6ac2a1db/2020/tex/figs/optimality.png -------------------------------------------------------------------------------- /2020/tex/figs/small_step.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/StanfordASL/AA203-Notes/f53acbcd87bd1144ac4e8d8d2bc1ad9e6ac2a1db/2020/tex/figs/small_step.png -------------------------------------------------------------------------------- /2020/tex/preamble.tex: -------------------------------------------------------------------------------- 1 | \usepackage{amsmath} 2 | \usepackage{amsthm} 3 | \usepackage{amsfonts} 4 | \usepackage{bm} 5 | \usepackage{graphicx} 6 | \usepackage{subcaption} 7 | \usepackage{accents} 8 | \usepackage{mathtools} 9 | 10 | \usepackage{algorithm} 11 | \usepackage{algpseudocode} 12 | 13 | \usepackage{url} 14 | \usepackage{color} 15 | 16 | 17 | \newtheorem{theorem}{Theorem}[section] 18 | \newtheorem{corollary}[theorem]{Corollary} 19 | \newtheorem{lemma}[theorem]{Lemma} 20 | \newtheorem{remark}[theorem]{Remark} 21 | \newtheorem{definition}[theorem]{Definition} 22 | 23 | % macros 24 | \newcommand{\iid}{i.i.d.} 25 | 26 | 27 | \newcommand{\cost}{c} 28 | \newcommand{\pol}{\pi} 29 | \newcommand{\st}{\bm{x}} 30 | \newcommand{\cst}{\bm{p}} % costate 31 | \newcommand{\stdot}{\dot{\bm{x}}} 32 | \newcommand{\ac}{\bm{u}} 33 | \newcommand{\ob}{\bm{y}} 34 | \newcommand{\ad}{\bm{d}} 35 | \newcommand{\param}{\bm{\theta}} %vector of parameters 36 | \newcommand{\hyp}{\bm{y}} 37 | \newcommand{\feat}{\bm{\phi}} 38 | 39 | \newcommand{\stdim}{n} 40 | \newcommand{\acdim}{m} 41 | \newcommand{\obdim}{l} 42 | \newcommand{\datdim}{d} 43 | 44 | 45 | \newcommand{\statespace}{\mathcal{X}} 46 | \newcommand{\actionspace}{\mathcal{U}} 47 | 48 | 49 | \newcommand{\f}{f} 50 | \newcommand{\h}{h} %used for measurement model 51 | \newcommand{\J}{J} 52 | 53 | 54 | \newcommand{\w}{\bm{\omega}} %process noise 55 | \newcommand{\wob}{\bm{\nu}} %measurement noise 56 | \newcommand{\W}{\Sigma_{\omega}} %measurement covar 57 | \newcommand{\V}{\Sigma_{\nu}} %measurement noise 58 | \newcommand{\I}{\bm{i}} %information vector 59 | 60 | 61 | \newcommand{\ham}{\mathcal{H}} %information vector 62 | 63 | \newcommand{\R}{\mathbb{R}} 64 | \newcommand{\E}{\mathbb{E}} 65 | \newcommand{\tr}{\text{tr}} 66 | \newcommand{\N}{\mathcal{N}} 67 | 68 | 69 | \newcommand\munderbar[1]{% 70 | \underaccent{\bar}{#1}} 71 | 72 | \newcommand{\argmin}{\text{argmin}} 73 | \newcommand{\argmax}{\text{argmax}} 74 | 75 | \newcommand\jhtodo[1]{\textcolor{red}{[JH: #1]}} %James 76 | \newcommand{\jhmargin}[2]{{\color{red}#1}\marginpar{\color{red}\raggedright\footnotesize [JH]:#2}} 77 | 78 | \newcommand{\mtmargin}[2]{{\color{blue}#1}\marginpar{\color{blue}\raggedright\footnotesize [MT]:#2}} % matt's comments 79 | -------------------------------------------------------------------------------- /2020/tex/references.bib: -------------------------------------------------------------------------------- 1 | @String { icml = {International Conference on Machine Learning (ICML)} } 2 | @String { colt = {Conference on Learning Theory (COLT)} } 3 | @String { 
nips = {Neural Information Processing Systems (NIPS)} } 4 | @String { ijrr = {International Journal of Robotics Research} } 5 | @String { isrr = {International Symposium on Robotics Research (ISRR)} } 6 | @String { icra = {{IEEE} International Conference on Robotics and Automation (ICRA)} } 7 | @String { iros = {{IEEE} International Conference on Intelligent Robots and Systems (IROS)} } 8 | @String { humanoids = {{IEEE-RAS} International Conference on Humanoid Robotics (Humanoids)} } 9 | @String { jmlr = {Journal of Machine Learning Research} } 10 | @String { iclr = {International Conference on Learning Representations (ICLR)} } 11 | @String { uai = {Uncertainty in Artificial Intelligence (UAI)} } 12 | @String { tpami = {IEEE Transactions on Pattern Analysis \& Machine Intelligence} } 13 | @String { tac = {IEEE Transactions on Automatic Control} } 14 | @String { automatica = {Automatica} } 15 | @String { jfr = {Journal of Field Robotics} } 16 | @String { ar = {Autonomous Robots} } 17 | @String { ijcai = {International Joint Conference on Artificial Intelligence (IJCAI)} } 18 | @String { aaai = {AAAI Conference on Artificial Intelligence} } 19 | @String { cvpr = {{IEEE} Conference on Computer Vision and Pattern Recognition (CVPR)} } 20 | @String { eccv = {European Conference on Computer Vision (ECCV)} } 21 | @String { aistats = {Artificial Intelligence and Statistics (AISTATS)} } 22 | @String { acc = {American Control Conference (ACC)} } 23 | @String { cdc = {IEEE Conference on Decision and Control (CDC)} } 24 | @String { nc = {Neural Computation} } 25 | @String { jasa = {Journal of the American Statistical Association} } 26 | @String { wafr = {Workshop on the Algorithmic Foundations of Robotics (WAFR)} } 27 | @String { corl = {Conference on Robot Learning (CoRL)} } 28 | @String { rss = {Robotics: Science and Systems (RSS)} } 29 | @String { jgcd = {AIAA Journal of Guidance, Control, and Dynamics} } 30 | @String { tsc = {IEEE Transactions on Control Systems Technology} } 31 | 32 | 33 | % planning 34 | 35 | @book{lavalle2006planning, 36 | title={Planning algorithms}, 37 | author={LaValle, Steven M}, 38 | year={2006}, 39 | publisher={Cambridge university press} 40 | } 41 | 42 | % adaptive control 43 | 44 | @book{ioannou2012robust, 45 | title={Robust adaptive control}, 46 | author={Ioannou, Petros A and Sun, Jing}, 47 | year={2012}, 48 | publisher={Courier Corporation} 49 | } 50 | 51 | 52 | % optimization 53 | 54 | @book{bertsekas2016nonlinear, 55 | title={Nonlinear programming}, 56 | author={Bertsekas, Dimitri P}, 57 | year={2016}, 58 | publisher={Athena Scientific} 59 | } 60 | 61 | @book{bertsimas1997introduction, 62 | title={Introduction to linear optimization}, 63 | author={Bertsimas, Dimitris and Tsitsiklis, John N}, 64 | year={1997}, 65 | publisher={Athena Scientific} 66 | } 67 | 68 | @article{powell2012ai, 69 | title={{AI}, {OR} and control theory: A rosetta stone for stochastic optimization}, 70 | author={Powell, Warren B}, 71 | year={2012} 72 | } 73 | 74 | @book{boyd2004convex, 75 | title={Convex optimization}, 76 | author={Boyd, Stephen and Vandenberghe, Lieven}, 77 | year={2004}, 78 | publisher={Cambridge university press} 79 | } 80 | 81 | @article{kolter2008convex, 82 | title={Convex Optimization Overview}, 83 | journal={CS 229 Lecture Notes}, 84 | author={Zico Kolter}, 85 | year={2008} 86 | } 87 | 88 | % DP 89 | 90 | @book{altman1999constrained, 91 | title={Constrained Markov decision processes}, 92 | author={Altman, Eitan}, 93 | volume={7}, 94 | year={1999}, 95 | publisher={CRC 
Press} 96 | } 97 | 98 | @book{bertsekas1995dynamic, 99 | title={Dynamic programming and optimal control}, 100 | author={Bertsekas, Dimitri P}, 101 | edition={4}, 102 | number={1}, 103 | year={2012} 104 | } 105 | 106 | @book{anderson2007optimal, 107 | title={Optimal control: linear quadratic methods}, 108 | author={Anderson, Brian DO and Moore, John B}, 109 | year={2007}, 110 | publisher={Courier Corporation} 111 | } 112 | 113 | @inproceedings{todorov2005generalized, 114 | title={A generalized iterative {LQG} method for locally-optimal feedback control of constrained nonlinear stochastic systems}, 115 | author={Todorov, Emanuel and Li, Weiwei}, 116 | booktitle=acc, 117 | year={2005} 118 | } 119 | 120 | @inproceedings{tassa2014control, 121 | title={Control-limited differential dynamic programming}, 122 | author={Tassa, Yuval and Mansard, Nicolas and Todorov, Emo}, 123 | booktitle=icra, 124 | year={2014} 125 | } 126 | 127 | @inproceedings{levine2014learning, 128 | title={Learning complex neural network policies with trajectory optimization}, 129 | author={Levine, Sergey and Koltun, Vladlen}, 130 | booktitle=icml, 131 | year={2014} 132 | } 133 | 134 | @book{mayne1970ddp, 135 | title={Differential Dynamic Programming}, 136 | author={David Jacobson and David Mayne}, 137 | year={1970}, 138 | publisher={Elsevier} 139 | } 140 | 141 | @inproceedings{tassa2012synthesis, 142 | title={Synthesis and stabilization of complex behaviors through online trajectory optimization}, 143 | author={Tassa, Yuval and Erez, Tom and Todorov, Emanuel}, 144 | booktitle=iros, 145 | year={2012} 146 | } 147 | 148 | @techreport{liao1992advantages, 149 | title={Advantages of differential dynamic programming over Newton's method for discrete-time optimal control problems}, 150 | author={Liao, Li-zhi and Shoemaker, Christine A}, 151 | year={1992}, 152 | institution={Cornell University} 153 | } 154 | 155 | @inproceedings{xie2017differential, 156 | title={Differential dynamic programming with nonlinear constraints}, 157 | author={Xie, Zhaoming and Liu, C Karen and Hauser, Kris}, 158 | booktitle=icra, 159 | year={2017} 160 | } 161 | 162 | @inproceedings{giftthaler2017projection, 163 | title={A projection approach to equality constrained iterative linear quadratic optimal control}, 164 | author={Giftthaler, Markus and Buchli, Jonas}, 165 | booktitle=humanoids, 166 | year={2017} 167 | } 168 | 169 | @inproceedings{li2004iterative, 170 | title={Iterative linear quadratic regulator design for nonlinear biological movement systems.}, 171 | author={Li, Weiwei and Todorov, Emanuel}, 172 | booktitle={International Conference on Informatics in Control, Automation, and Robotics}, 173 | year={2004} 174 | } 175 | 176 | %HJB 177 | 178 | @article{mitchell2005time, 179 | title={A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games}, 180 | author={Mitchell, Ian M and Bayen, Alexandre M and Tomlin, Claire J}, 181 | journal=tac, 182 | year={2005} 183 | } 184 | 185 | @article{bressan2010noncooperative, 186 | title={Noncooperative differential games. 
a tutorial}, 187 | year={2010}, 188 | author={Bressan, Alberto} 189 | } 190 | 191 | @book{kirk2012optimal, 192 | title={Optimal control theory: an introduction}, 193 | author={Kirk, Donald E}, 194 | year={2012}, 195 | publisher={Courier Corporation} 196 | } 197 | 198 | % Indirect 199 | 200 | @book{bryson1975applied, 201 | title={Applied Optimal Control: Optimization, Estimation and Control}, 202 | author={Arthur Bryson and Yu-Chi Ho}, 203 | year={1975}, 204 | publisher={CRC Press} 205 | } 206 | 207 | @article{lee1967foundations, 208 | title={Foundations of optimal control theory}, 209 | author={Lee, Ernest Bruce and Markus, Lawrence}, 210 | year={1967}, 211 | publisher={Wiley} 212 | } 213 | 214 | % Direct 215 | 216 | @article{kelly2017transcription, 217 | title={Transcription Methods for Trajectory Optimization: a beginners tutorial}, 218 | author={Kelly, Matthew}, 219 | journal={arXiv:1707.00284}, 220 | year={2017} 221 | } 222 | 223 | @article{kelly2017introduction, 224 | title={An introduction to trajectory optimization: how to do your own direct collocation}, 225 | author={Kelly, Matthew}, 226 | journal={SIAM Review}, 227 | year={2017} 228 | } 229 | 230 | % MPC 231 | 232 | @book{borrelli2017predictive, 233 | title={Predictive control for linear and hybrid systems}, 234 | author={Borrelli, Francesco and Bemporad, Alberto and Morari, Manfred}, 235 | year={2017}, 236 | publisher={Cambridge University Press} 237 | } 238 | 239 | @book{rawlings2017model, 240 | title={Model Predictive Control: Theory, Computation, and Design}, 241 | author={Rawlings, James Blake and Mayne, David Q and Diehl, Moritz}, 242 | year={2017}, 243 | publisher={Nob Hill Publishing} 244 | } 245 | 246 | % Adaptive Optimal Control 247 | 248 | @article{ljung1999system, 249 | title={System identification}, 250 | author={Ljung, Lennart}, 251 | journal={Wiley Encyclopedia of Electrical and Electronics Engineering}, 252 | year={1999} 253 | } 254 | 255 | @book{aastrom2013adaptive, 256 | title={Adaptive control}, 257 | author={{\AA}str{\"o}m, Karl J and Wittenmark, Bj{\"o}rn}, 258 | year={2013}, 259 | publisher={Courier Corporation} 260 | } 261 | 262 | @article{simon1956dynamic, 263 | title={Dynamic programming under uncertainty with a quadratic criterion function}, 264 | author={Simon, Herbert A}, 265 | journal={Econometrica}, 266 | year={1956} 267 | } 268 | 269 | @article{becker1985adaptive, 270 | title={Adaptive control with the stochastic approximation algorithm: Geometry and convergence}, 271 | author={Becker, Arthur and Kumar, P and Wei, Ching-Zong}, 272 | journal=tac, 273 | year={1985} 274 | } 275 | 276 | @article{abbasi2011regret, 277 | title={Regret bounds for the adaptive control of linear quadratic systems}, 278 | author={Abbasi-Yadkori, Yasin and Szepesv{\'a}ri, Csaba}, 279 | journal=colt, 280 | year={2011} 281 | } 282 | 283 | @article{moldovan2015optimism, 284 | title={Optimism-driven exploration for nonlinear systems}, 285 | author={Moldovan, Teodor Mihai and Levine, Sergey and Jordan, Michael I and Abbeel, Pieter}, 286 | journal=icra, 287 | year={2015} 288 | } 289 | 290 | @article{osband2016generalization, 291 | title={Generalization and Exploration via Randomized Value Functions}, 292 | author={Osband, Ian and Van Roy, Benjamin and Wen, Zheng}, 293 | journal=icml, 294 | year={2016} 295 | } 296 | 297 | @book{zhou1996robust, 298 | title={Robust and optimal control}, 299 | author={Zhou, Kemin and Doyle, John Comstock and Glover, Keith and others}, 300 | year={1996}, 301 | publisher={Prentice Hall} 302 | } 303 | 304 
| % regression 305 | 306 | @book{friedman2008elements, 307 | title={The elements of statistical learning}, 308 | author={Friedman, Jerome and Hastie, Trevor and Tibshirani, Robert}, 309 | year={2008}, 310 | volume={2} 311 | } 312 | 313 | @book{murphy2012machine, 314 | title={Machine learning: a probabilistic perspective}, 315 | author={Murphy, Kevin P}, 316 | year={2012}, 317 | publisher={MIT press} 318 | } 319 | 320 | @inproceedings{rasmussen2003gaussian, 321 | title={Gaussian processes in machine learning}, 322 | author={Rasmussen, Carl Edward}, 323 | year={2003}, 324 | organization={Springer} 325 | } 326 | 327 | @article{petersenmatrix, 328 | title={The Matrix Cookbook}, 329 | author={Petersen, Kaare Brandt and Pedersen, Michael Syskind} 330 | } 331 | 332 | @inproceedings{krizhevsky2012imagenet, 333 | title={Imagenet classification with deep convolutional neural networks}, 334 | author={Krizhevsky, Alex and Sutskever, Ilya and Hinton, Geoffrey E}, 335 | booktitle=nips, 336 | year={2012} 337 | } 338 | 339 | @book{goodfellow2016deep, 340 | title={Deep learning}, 341 | author={Goodfellow, Ian and Bengio, Yoshua and Courville, Aaron}, 342 | year={2016}, 343 | publisher={MIT press} 344 | } 345 | 346 | @book{sutton2018reinforcement, 347 | title={Reinforcement learning: An introduction}, 348 | author={Sutton, Richard S and Barto, Andrew G}, 349 | year={2018}, 350 | publisher={MIT press} 351 | } 352 | 353 | 354 | % model-free 355 | 356 | @article{mnih2015human, 357 | title={Human-level control through deep reinforcement learning}, 358 | author={Mnih, Volodymyr and Kavukcuoglu, Koray and Silver, David and Rusu, Andrei A and Veness, Joel and Bellemare, Marc G and Graves, Alex and Riedmiller, Martin and Fidjeland, Andreas K and Ostrovski, Georg and others}, 359 | journal={Nature}, 360 | year={2015} 361 | } 362 | 363 | @inproceedings{riedmiller2005neural, 364 | title={Neural fitted Q iteration--first experiences with a data efficient neural reinforcement learning method}, 365 | author={Riedmiller, Martin}, 366 | booktitle={European Conference on Machine Learning}, 367 | year={2005} 368 | } 369 | 370 | @inproceedings{van2016deep, 371 | title={Deep reinforcement learning with double q-learning}, 372 | author={Van Hasselt, Hado and Guez, Arthur and Silver, David}, 373 | booktitle=aaai, 374 | year={2016} 375 | } 376 | 377 | @techreport{lin1993reinforcement, 378 | title={Reinforcement learning for robots using neural networks}, 379 | author={Lin, Long-Ji}, 380 | year={1993}, 381 | institution={Carnegie-Mellon University} 382 | } -------------------------------------------------------------------------------- /2020/tex/source/ch1.tex: -------------------------------------------------------------------------------- 1 | \chapter{Nonlinear Optimization} 2 | 3 | In this section we discuss the generic nonlinear optimization problem that forms the basis for the rest of the material presented in this class. We write the minimization problem as 4 | \begin{equation*} 5 | \begin{aligned} 6 | & \underset{\bm{x} \in \mathcal{X}}{\min} 7 | & & f(\bm{x}) 8 | \end{aligned} 9 | \end{equation*} 10 | where $f$ is the cost function, usually assumed twice continuously differentiable, $x \in \R^n$ is the optimization variable, and $\mathcal{X} \subset \R^n$ is the constraint set. The special case in which the cost function is linear and the constraint set is specified by linear equations and/or inequalities is \textit{linear optimization}, which we will not discuss. 
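As a concrete point of reference, such a problem can also be posed and solved numerically with off-the-shelf software. The short sketch below (added for illustration only) uses Python with \texttt{scipy.optimize}; the particular cost function, box constraint set, and reliance on the solver's defaults are assumptions of this example rather than anything prescribed by these notes.
\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

# Illustrative instance: a smooth nonconvex cost over the box X = [-2, 2]^2.
f = lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2 + 0.5 * np.cos(3.0 * x[1])
x0 = np.array([0.0, 0.0])                        # initial guess
res = minimize(f, x0, bounds=[(-2.0, 2.0)] * 2)  # local solver (a bounded quasi-Newton method by default)
print(res.x, res.fun)                            # a candidate local minimum and its cost
\end{verbatim}
A solver of this kind returns a local minimum at best; the optimality conditions developed in the remainder of this chapter are (approximately) what such a solver checks at termination.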
11 | 12 | \section{Unconstrained Nonlinear Optimization} 13 | 14 | We will first address the unconstrained case, in which $\mathcal{X} = \R^n$. A vector $\bm{x}^*$ is said to be an unconstrained \textit{local minimum} if there exists $\epsilon > 0$ such that $f(\bm{x}^*) \leq f(\bm{x})$ for all $\bm{x} \in \{\bm{x} \mid \|\bm{x} - \bm{x}^*\| \leq \epsilon\}$, and $\bm{x}^*$ is said to be an unconstrained \textit{global minimum} if $f(\bm{x}^*) \leq f(\bm{x})$ for all $\bm{x} \in \R^n$. 15 | 16 | \subsection{Necessary Conditions for Optimality} 17 | 18 | For a differentiable cost function, we can compare the cost of a point to its neighbors by considering a small variation $\Delta \bm{x}$ from $\bm{x}^*$. By using Taylor expansions, this yields a first and second order cost variation 19 | \begin{equation} 20 | f(\bm{x}^* + \Delta \bm{x}) - f(\bm{x}^*) \approx \nabla f(\bm{x}^*)^T \Delta \bm{x} + \frac{1}{2} \Delta \bm{x}^T \nabla^2 f(\bm{x}^*) \Delta \bm{x}. 21 | \end{equation} 22 | Since the first order term must be nonnegative for every sufficiently small variation, setting $\Delta \bm{x}$ equal to positive and negative multiples of the $i$'th unit coordinate vector yields 23 | \begin{equation} 24 | \frac{\partial f(\bm{x}^*)}{\partial x_i} \geq 0 25 | \end{equation} 26 | where $x_i$ denotes the $i$'th coordinate of $\bm{x}$, and 27 | \begin{equation} 28 | \frac{\partial f(\bm{x}^*)}{\partial x_i} \leq 0 29 | \end{equation} 30 | for all $i$, which is only satisfied by $\nabla f(\bm{x}^*) = 0$. This is referred to as the \textit{first order necessary condition for optimality}. Looking at the second order variation, and noting that $\nabla f(\bm{x}^*) = 0$, we expect 31 | \begin{align} 32 | f(\bm{x}^* + \Delta \bm{x}) - f(\bm{x}^*) \geq 0 33 | \end{align} 34 | and thus 35 | \begin{align} 36 | \Delta \bm{x}^T \nabla^2 f(\bm{x}^*) \Delta \bm{x} \geq 0 37 | \end{align} 38 | which implies $\nabla^2 f(\bm{x}^*)$ is positive semidefinite. This is referred to as the \textit{second order necessary condition for optimality}. Stating these conditions formally, 39 | 40 | \begin{theorem}[Necessary Conditions for Optimality (NOC)] 41 | Let $\bm{x}^*$ be an unconstrained local minimum of $f:\R^n \to \R$ and $f \in C^1$ in an open set $S$ containing $\bm{x}^*$. Then, 42 | \begin{equation} 43 | \nabla f(\bm{x}^*) = 0. 44 | \end{equation} 45 | If $f \in C^2$ within $S$, $\nabla^2 f(\bm{x}^*)$ is positive semidefinite. 46 | \end{theorem} 47 | 48 | \proof{See Section 1.1 of \cite{bertsekas2016nonlinear}}. 49 | 50 | \subsection{Sufficient Conditions for Optimality} 51 | 52 | 53 | \begin{figure}[t] 54 | \centering 55 | \includegraphics[width=0.7\linewidth]{figs/foc.png} 56 | \caption{An example of a function for which the necessary conditions of optimality are satisfied but the sufficient conditions are not.} 57 | %Should we specify here that $x=0$ is the counterexample where the derivative and hessian are zero but $0$ is not a local min? 58 | \label{fig:foc} 59 | \end{figure} 60 | 61 | 62 | If we strengthen the second order condition to $\nabla^2 f(\bm{x}^*)$ being positive definite, we have the sufficient conditions for $\bm{x}^*$ being a local minimum. Why is the second order necessary condition not sufficient? An example function is given in figure \ref{fig:foc}. Formally, 63 | 64 | \begin{theorem}[Sufficient Conditions for Optimality (SOC)] 65 | Let $f:\R^n \to \R$ be $C^2$ in an open set $S$. Suppose a vector $\bm{x}^*$ satisfies the conditions $\nabla f(\bm{x}^*) = 0$ and $\nabla^2 f(\bm{x}^*)$ is positive definite. Then $\bm{x}^*$ is a strict unconstrained local minimum of $f$.
66 | \end{theorem} 67 | 68 | The proof is again given in Section 1.1 of \cite{bertsekas2016nonlinear}. 69 | There are several reasons why the optimality conditions are important. In a general nonlinear optimization setting, they can be used to filter candidates for global minima. They can be used for sensitivity analysis, in which the sensitivity of $\bm{x}^*$ to model parameters can be quantified \cite{bertsekas2016nonlinear}. This is common in e.g. microeconomics. Finally, these conditions often provide the basis for the design and analysis of optimization algorithms. 70 | 71 | \subsection{Special case: Convex Optimization} 72 | 73 | A special case within nonlinear optimization is the set of \textit{convex optimization} problems. A set $S \subset \R^n$ is called \textit{convex} if 74 | \begin{equation} 75 | \alpha \bm{x} + (1 - \alpha) \bm{y} \in S, \quad \forall \bm{x},\bm{y} \in S, \forall \alpha \in [0,1]. 76 | \end{equation} 77 | For $S$ convex, a function $f:S\to\R$ is called convex if 78 | \begin{equation} 79 | f(\alpha \bm{x} + (1-\alpha) \bm{y}) \leq \alpha f(\bm{x}) + (1-\alpha) f(\bm{y}), \quad \forall \bm{x},\bm{y} \in S, \forall \alpha \in [0,1]. 80 | \label{eq:conv_fun} 81 | \end{equation} 82 | This class of problems has several important characteristics. If $f$ is convex, then 83 | \begin{itemize} 84 | \item A local minimum of $f$ over $S$ is also a global minimum over $S$. If in addition $f$ is strictly convex (the inequality in (\ref{eq:conv_fun}) is strict), there exists at most one global minimum of $f$. 85 | \item If $f \in C^1$ and convex, and the set $S$ is open, $\nabla f(\bm{x}^*) = 0$ is a necessary and sufficient condition for a vector $\bm{x}^* \in S$ to be a global minimum over $S$. 86 | \end{itemize} 87 | Convex optimization problems have several nice properties that make them (usually) computationally efficient to solve, and the first property above gives a certificate of having obtained global optimality that is difficult or impossible to obtain in the general nonlinear optimization setting. For a thorough treatment of convex optimization theory and algorithms, see \cite{boyd2004convex}. 88 | 89 | \subsection{Computational Methods} 90 | 91 | % We should probably say at some point that Convex optimization is nice because all of the following techniques (with the right parameters) can find global minimizers. 92 | 93 | In this subsection we will discuss the class of algorithms known as \textit{gradient methods} for finding local minima in nonlinear optimization problems. These approaches rely (roughly) on following the gradient of the function ``downhill'', toward a minimum. More concretely, these algorithms rely on taking steps of the form 94 | \begin{equation} 95 | \bm{x}^{k+1} = \bm{x}^k + \alpha^{k} \bm{d}^k 96 | \end{equation} 97 | where if $\nabla f(\bm{x}^k) \neq 0$, $\bm{d}^k$ is chosen so that 98 | \begin{equation} 99 | \nabla f(\bm{x}^k)^T \bm{d}^k < 0 100 | \end{equation} 101 | and $\alpha^k > 0$. Typically, the step size $\alpha^k$ is chosen such that 102 | \begin{equation} 103 | f(\bm{x}^k + \alpha^k \bm{d}^k) < f(\bm{x}^k), 104 | \end{equation} 105 | but generally, the step size and the direction of descent ($\bm{d}^k$) are tuning parameters. 106 | 107 | We will look at the general class of descent directions of the form 108 | \begin{equation} 109 | \bm{d}^k = -D^k \nabla f(\bm{x}^k) 110 | \end{equation} 111 | where $D^k > 0$ (note that this guarantees $\nabla f(\bm{x}^k)^T \bm{d}^k < 0$).
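Before specializing the scaling matrix $D^k$, the short sketch below (added purely for illustration, assuming NumPy; the function names, step sizes, and tolerances are choices of this example and not taken from any reference) shows the generic iteration $\bm{x}^{k+1} = \bm{x}^k - \alpha^k D^k \nabla f(\bm{x}^k)$. The standard choices of $D^k$ are discussed in the paragraphs that follow.
\begin{verbatim}
import numpy as np

def descent(grad_f, D_fn, x0, alpha=0.1, iters=200, tol=1e-8):
    """Generic gradient method: x_{k+1} = x_k - alpha * D_k @ grad_f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:   # (approximate) first order necessary condition
            break
        x = x - alpha * D_fn(x) @ g
    return x

# Quadratic example f(x) = 0.5 x^T Q x, with gradient Q x and Hessian Q.
Q = np.array([[10.0, 0.0], [0.0, 1.0]])
grad_f = lambda x: Q @ x
x_sd = descent(grad_f, lambda x: np.eye(2), [1.0, 1.0], alpha=0.1)         # steepest descent
x_nw = descent(grad_f, lambda x: np.linalg.inv(Q), [1.0, 1.0], alpha=1.0)  # Newton direction
\end{verbatim}
For this quadratic cost, the Newton choice of $D^k$ reaches the minimizer in a single step, mirroring figure \ref{fig:graddesc_newton}.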
112 | 113 | \paragraph{Steepest descent, $D^k = I$.} The simplest choice of descent direction is directly following the gradient, and ignoring second order function information. In practice, this often leads to slow convergence (figure \ref{fig:graddesc_small}) and possible oscillation (figure \ref{fig:graddesc_large}). 114 | 115 | \begin{figure}[!t] 116 | \centering 117 | \begin{subfigure}[b]{0.46\linewidth} 118 | \centering 119 | \includegraphics[width=\textwidth]{figs/small_step.png} 120 | \caption{Steepest descent, small fixed step size.} 121 | \label{fig:graddesc_small} 122 | \end{subfigure}% 123 | \begin{subfigure}[b]{0.46\linewidth} 124 | \centering 125 | \includegraphics[width=\textwidth]{figs/large_step.png} 126 | \caption{Steepest descent, large fixed step size.} 127 | \label{fig:graddesc_large} 128 | \end{subfigure} 129 | \begin{subfigure}[b]{0.46\linewidth} 130 | \centering 131 | \includegraphics[width=\textwidth]{figs/linesearch.png} 132 | \caption{Steepest descent, step size chosen via line search.} 133 | \label{fig:graddesc_line} 134 | \end{subfigure}% 135 | \begin{subfigure}[b]{0.46\linewidth} 136 | \centering 137 | \includegraphics[width=\textwidth]{figs/newtonmethod.png} 138 | \caption{Newton's method. Note that the method converges in one step.} 139 | \label{fig:graddesc_newton} 140 | \end{subfigure} 141 | \caption{Comparison of steepest descent methods with various step sizes, and Newton's method, on the same quadratic cost function.} 142 | \label{fig:gradient_descent} 143 | \end{figure} 144 | 145 | \paragraph{Newton's Method, $D^k = (\nabla^2 f(\bm{x}^k))^{-1}$.} The underlying idea of this approach is to, at each iteration, minimize the quadratic approximation of $f$ around $\bm{x}^k$, 146 | \begin{equation} 147 | f^k(\bm{x}) = f(\bm{x}^k) + \nabla f(\bm{x}^k)^T (\bm{x} - \bm{x}^k) + \frac{1}{2} (\bm{x} - \bm{x}^k)^T \nabla^2 f(\bm{x}^k) (\bm{x} - \bm{x}^k). 148 | \end{equation} 149 | Setting the derivative of this to zero, we obtain 150 | \begin{equation} 151 | \nabla f(\bm{x}^k) + \nabla^2 f(\bm{x}^k) (\bm{x} - \bm{x}^k) = 0 152 | \end{equation} 153 | and thus, by setting $\bm{x}^{k+1}$ to be the $\bm{x}$ that satisfies the above, we get the update 154 | \begin{equation} 155 | \bm{x}^{k+1} = \bm{x}^k - (\nabla^2 f(\bm{x}^k))^{-1} \nabla f(\bm{x}^k) 156 | \end{equation} 157 | or more generally, 158 | \begin{equation} 159 | \bm{x}^{k+1} = \bm{x}^k - \alpha (\nabla^2 f(\bm{x}^k))^{-1} \nabla f(\bm{x}^k). 160 | \end{equation} 161 | Note that this update is only valid for $\nabla^2 f(\bm{x}^k) \succ 0$. When this condition doesn't hold, $\bm{x}^{k+1}$ is not a minimizer of the second order approximation (as a result of the SOCs). See figure \ref{fig:graddesc_newton} for an example where Newton's method converges in one step, as a result of the cost function being quadratic. 162 | 163 | \paragraph{Diagonally scaled steepest descent, $D^k = \textrm{diag}(d_1^k, \ldots, d_n^k)$.} We require $d_i^k > 0$ for all $i$. A popular choice is 164 | \begin{equation} 165 | d_i^k = \left( \frac{\partial^2 f(\bm{x}^k)}{\partial x_i^2} \right)^{-1} 166 | \end{equation} 167 | which amounts to a diagonal approximation of the inverse Hessian. 168 | 169 | %{Should we cite the AdaGrad paper here or at least mention the name? Should we mention momentum and ADAM as well?}. 170 | 171 | \paragraph{Modified Newton's method, $D^k = (\nabla^2 f(\bm{x}^0))^{-1}$.} Requires $\nabla^2 f(\bm{x}^0) \succ 0$.
For cases in which one expects $\nabla^2 f(\bm{x}^0) \approx \nabla^2 f(\bm{x}^k)$, this removes having to compute the Hessian at each step. 172 | 173 | In addition to choosing the descent direction, there also exist a variety of methods to choose the step size $\alpha$. A computationally intensive but efficient (in terms of the number of steps taken) approach is the minimization rule 174 | \begin{equation} 175 | \alpha^k = \argmin_{\alpha\geq 0} f(\bm{x}^k + \alpha \bm{d}^k) 176 | \end{equation} 177 | which is usually solved via line search (figure \ref{fig:graddesc_line}). Alternative approaches include a limited minimization rule, in which $\alpha^k$ is constrained to lie in $[0,s]$ during the line search, or simpler approaches such as a constant step size (which may not guarantee convergence), or a diminishing scheduled step size. In this last case, schedules are typically chosen such that $\alpha^k \to 0$ as $k \to \infty$, while $\sum_{k=0}^\infty \alpha^k = +\infty$. 178 | 179 | % TODO we should have a computational complexity discussion for these topics? i.e. first order methods have many cheap iterations, second order methods have fewer, but more computationally intensive iterations. Very large problems or problems where memory is limited $\implies$ 1st order is better, special hessian structure or poor conditioning $\implies$ second order is better. Adagrad and ADAM try to get the benefits of both, etc. Thoughts? 180 | 181 | \section{Constrained Nonlinear Optimization} 182 | 183 | In this section we will address the general constrained nonlinear optimization problem, 184 | \begin{equation*} 185 | \begin{aligned} 186 | & \underset{\bm{x} \in \mathcal{X}}{\min} 187 | & & f(\bm{x}) 188 | \end{aligned} 189 | \end{equation*} 190 | which may equivalently be written 191 | \begin{equation*} 192 | \begin{aligned} 193 | & \underset{\bm{x}}{\min} 194 | & & f(\bm{x})\\ 195 | & \textrm{s.t.} & & \bm{x} \in \mathcal{X} 196 | \end{aligned} 197 | \end{equation*} 198 | where the set $\mathcal{X}$ is usually specified in terms of equality and inequality constraints. To operate within this problem structure, we will develop a set of optimality conditions involving auxiliary variables called \textit{Lagrange multipliers}. 199 | 200 | \subsection{Equality Constrained Optimization} 201 | 202 | We will first look at optimization with equality constraints of the form 203 | \begin{equation*} 204 | \begin{aligned} 205 | & \underset{\bm{x}}{\min} 206 | & & f(\bm{x})\\ 207 | & \textrm{s.t.} & & h_i(\bm{x}) = 0, \quad i = 1, \ldots, m 208 | \end{aligned} 209 | \end{equation*} 210 | where $f:\R^n \to \R$, $h_i:\R^n \to \R$ are $C^1$. We will write $\bm{h} = [h_1,\ldots, h_m ]^T$. For a given local minimum $\bm{x}^*$ (satisfying a regularity condition made precise below), there exist scalars $\lambda_1, \ldots, \lambda_m$ called Lagrange multipliers such that 211 | \begin{equation} 212 | \nabla f(\bm{x}^*) + \sum^m_{i=1} \lambda_i \nabla h_i(\bm{x}^*) = 0. 213 | \end{equation} 214 | There are several possible interpretations for Lagrange multipliers. First, note that the cost gradient $\nabla f(\bm{x}^*)$ is in the subspace spanned by the constraint gradients at $\bm{x}^*$. Equivalently, $\nabla f(\bm{x}^*)$ is orthogonal to the subspace of first order feasible variations 215 | \begin{equation} 216 | V(\bm{x}^*) = \{\Delta \bm{x} \mid \nabla h_i(\bm{x}^*)^T \Delta \bm{x} = 0, i=1,\ldots,m\}.
217 | \end{equation} 218 | This subspace is the space of variations $\Delta \bm{x}$ for which $\bm{x} = \bm{x}^* + \Delta \bm{x}$ satisfies the constraint $\bm{h}(\bm{x}) = 0$ up to first order. Therefore, at a local minimum, the first order cost variation $\nabla f(\bm{x}^*)^T \Delta \bm{x}$ is zero for all variations $\Delta \bm{x}$ in this space. 219 | 220 | Given this informal understanding, we may now precisely state the necessary conditions for optimality in constrained optimization. 221 | 222 | \begin{theorem}[NOC for equality constrained optimization] 223 | \label{thm:eq_con_NOC} 224 | Let $\bm{x}^*$ be a local minimum of $f$ subject to $\bm{h}(\bm{x}) = 0$, and assume that the constraint gradients $\nabla h_1(\bm{x}^*),\ldots,\nabla h_m(\bm{x}^*)$ are linearly independent. Then there exists a unique vector $\bm{\lambda}^* = [\lambda_1^*,\ldots,\lambda_m^*]^T$ called a Lagrange multiplier vector, such that 225 | \begin{equation} 226 | \nabla f(\bm{x}^*) + \sum^m_{i=1} \lambda_i^* \nabla h_i(\bm{x}^*) = 0. 227 | \end{equation} 228 | If in addition $f$ and $\bm{h}$ are $C^2$, we have 229 | \begin{equation} 230 | \bm{y}^T (\nabla^2 f(\bm{x}^*) + \sum^m_{i=1} \lambda_i^* \nabla^2 h_i(\bm{x}^*)) \bm{y} \geq 0, \quad \forall \bm{y} \in V(\bm{x}^*) 231 | \end{equation} 232 | where 233 | \begin{equation} 234 | V(\bm{x}^*) = \{\bm{y} \mid \nabla h_i(\bm{x}^*)^T \bm{y} = 0, i=1,\ldots,m\}. 235 | \end{equation} 236 | \end{theorem} 237 | 238 | \begin{proof} 239 | See \cite{bertsekas2016nonlinear} Section 3.1.1 and 3.1.2. 240 | \end{proof} 241 | 242 | We will sketch two possible proofs for the NOC for equality constrained optimization. 243 | 244 | \paragraph{Penalty approach.} This approach relies on adding to the cost function a large penalty term for constraint violation. This is the same approach that will be used in proving the necessary conditions for inequality constrained optimization, and is the basis of a variety of practical numerical algorithms. 245 | 246 | \paragraph{Elimination approach.} This approach views the constraints as a system of $m$ equations with $n$ unknowns, for which $m$ variables can be expressed in terms of the remaining $n-m$ variables. This reduces the problem to an unconstrained optimization problem. 247 | 248 | Note that in theorem \ref{thm:eq_con_NOC}, we assumed the gradients of the constraint functions were linearly independent. A feasible vector for which this holds is called \textit{regular}. If this condition is violated, a Lagrange multiplier for a local minimum may not exist. 249 | 250 | For convenience, we will write the necessary conditions in terms of the Lagrangian function $L:\R^{m+n} \to \R$, 251 | \begin{equation} 252 | L(\bm{x},\bm{\lambda}) = f(\bm{x}) + \sum^m_{i=1} \lambda_i h_i(\bm{x}). 253 | \end{equation} 254 | This function allows the necessary conditions to be succinctly stated as 255 | \begin{align} 256 | \nabla_{\bm{x}} L(\bm{x}^*,\bm{\lambda}^*) &= 0\\ 257 | \nabla_{\bm{\lambda}} L(\bm{x}^*,\bm{\lambda}^*) &= 0\\ 258 | \bm{y}^T \nabla^2_{\bm{xx}} L(\bm{x}^*,\bm{\lambda}^*) \bm{y} &\geq 0, \quad \forall \bm{y} \in V(\bm{x}^*). 259 | \end{align} 260 | The first two of these form a system of $n+m$ equations with $n+m$ unknowns. Given this notation, we can state the sufficient conditions.
261 | 262 | \begin{theorem}[SOC for equality constrained optimization] 263 | Assume that $f$ and $\bm{h}$ are $C^2$ and let $\bm{x}^* \in \R^n$ and $\bm{\lambda}^* \in \R^m$ satisfy 264 | \begin{align} 265 | \nabla_{\bm{x}} L(\bm{x}^*,\bm{\lambda}^*) &= 0\\ 266 | \nabla_{\bm{\lambda}} L(\bm{x}^*,\bm{\lambda}^*) &= 0\\ 267 | \bm{y}^T \nabla^2_{\bm{xx}} L(\bm{x}^*,\bm{\lambda}^*) \bm{y} &> 0, \quad \forall \bm{y} \neq 0, \bm{y} \in V(\bm{x}^*). 268 | \end{align} 269 | Then $\bm{x}^*$ is a strict local minimum of $f$ subject to $\bm{h}(\bm{x}) = 0$. \end{theorem} 270 | 271 | \begin{proof} 272 | See \cite{bertsekas2016nonlinear} Section 3.2. 273 | \end{proof} 274 | Note that the SOC do not require regularity of $\bm{x}^*$. 275 | 276 | \subsection{Inequality Constrained Optimization} 277 | 278 | We will now address the general case, including inequality constraints, 279 | \begin{equation*} 280 | \begin{aligned} 281 | & \underset{\bm{x}}{\min} 282 | & & f(\bm{x})\\ 283 | & \textrm{s.t.} & & h_i(\bm{x}) = 0, \quad i = 1, \ldots, m\\ 284 | & & & g_j(\bm{x}) \leq 0, \quad j = 1, \ldots, r 285 | \end{aligned} 286 | \end{equation*} 287 | where $f,h_i,g_j$ are $C^1$. The key intuition for the case of inequality constraints is based on realizing that for any feasible point, some subset of the constraints will be active (for which $g_j(\bm{x}) = 0$), while the complement of this set will be inactive. We define the active set of inequality constraints, which we denote 288 | \begin{equation} 289 | A(\bm{x}) = \{j \mid g_j(\bm{x}) = 0\}. 290 | \end{equation} 291 | A constraint is active at $\bm{x}$ if it is in $A(\bm{x})$, otherwise it is inactive. Note that if $\bm{x}^*$ is a local minimum of the inequality constrained problem, then $\bm{x}^*$ is a local minimum of the identical problem with the inactive constraints removed. Moreover, at this local minimum, the constraints may be treated as equality constraints. Thus, if $\bm{x}^*$ is regular, there exist Lagrange multipliers $\lambda_1^*, \ldots, \lambda_m^*$ and $\mu_j^*, j \in A(\bm{x}^*)$ such that 292 | \begin{equation} 293 | \nabla f(\bm{x}^*) + \sum^m_{i=1} \lambda_i^* \nabla h_i(\bm{x}^*) + \sum_{j \in A(\bm{x}^*)} \mu_j^* \nabla g_j(\bm{x}^*)= 0. 294 | \end{equation} 295 | We will define the Lagrangian 296 | \begin{equation} 297 | L(\bm{x},\bm{\lambda}, \bm{\mu}) = f(\bm{x}) + \sum^m_{i=1} \lambda_i h_i(\bm{x}) + \sum_{j =1}^r \mu_j g_j(\bm{x}), 298 | \end{equation} 299 | which we will use to state the necessary and sufficient conditions. 300 | 301 | \begin{theorem}[Karush-Kuhn-Tucker NOC] 302 | Let $\bm{x}^*$ be a local minimum for the inequality constrained problem where $f, h_i, g_j$ are $C^1$ and assume $\bm{x}^*$ is regular (equality and active inequality constraint gradients are linearly independent). Then, there exist unique Lagrange multiplier vectors $\bm{\lambda}^*$ and $\bm{\mu}^*$ such that 303 | \begin{align} 304 | \nabla_{\bm{x}} L(\bm{x}^*,\bm{\lambda}^*, \bm{\mu}^*) &= 0\\ 305 | \bm{\mu}^* &\geq 0\\ 306 | \mu_j^* &= 0, \quad \forall j \notin A(\bm{x}^*). 307 | \end{align} 308 | If in addition $f,\bm{h},\bm{g}$ are $C^2$, we have 309 | \begin{equation} 310 | \bm{y}^T \nabla^2_{\bm{xx}} L(\bm{x}^*,\bm{\lambda}^*, \bm{\mu}^*) \bm{y} \geq 0 311 | \end{equation} 312 | for all $\bm{y}$ such that 313 | \begin{align} 314 | \nabla h_i(\bm{x}^*)^T \bm{y} &=0, \quad i = 1, \ldots, m\\ 315 | \nabla g_j(\bm{x}^*)^T \bm{y} &=0, \quad j \in A(\bm{x}^*). 316 | \end{align} 317 | \end{theorem} 318 | 319 | \begin{proof} 320 | See \cite{bertsekas2016nonlinear} Section 3.3.1.
321 | \end{proof} 322 | 323 | The SOC are obtained similarly to the equality constrained case. 324 | 325 | % should add in statement of KKT SOC for completeness 326 | 327 | % should we add section on convex optimization from recitation? 328 | 329 | \section{Bibliographic Notes} 330 | 331 | In this section we have addressed the necessary and sufficient conditions for constrained and unconstrained nonlinear optimization. This section is based heavily on \cite{bertsekas2016nonlinear}, and we refer the reader to this book for further details. We have avoided discussing linear programming, which is itself a large topic of study, about which many books have been written (we refer the reader to \cite{bertsimas1997introduction} as a good reference on the subject). 332 | 333 | Convex optimization has become a powerful and widespread tool in modern optimal control. While we have only addressed it briefly here, \cite{boyd2004convex} offers a fairly comprehensive treatment of the theory and practice of convex optimization. For a succinct overview with a focus on machine learning, we refer the reader to \cite{kolter2008convex}. 334 | -------------------------------------------------------------------------------- /2020/tex/source/ch10.tex: -------------------------------------------------------------------------------- 1 | \chapter{Adaptive Control and Adaptive Optimal Control} 2 | 3 | 4 | \section{Adaptive Control} 5 | 6 | We will now discuss adaptive control, which (broadly speaking) aims to perform online adaptation of the control policy to improve performance. The formulation of the adaptive control problem is typically not in terms of optimal adaptive control. Typically, the designer of an adaptive control system will place more emphasis on proving the combined stability of the controller and the adaptive process than on minimizing a cost function. While in the system identification setting we were concerned with model identification in particular, work on adaptive control investigates both model adaptation as well as controller adaptation, and combinations. Indeed, adaptive control encompasses a wide variety of techniques. In \cite{aastrom2013adaptive}, the authors define an adaptive controller as ``a controller with adjustable parameters and a mechanism for adjusting the parameters''. Examples of adaptive control strategies include 7 | \begin{itemize} 8 | \item Adaptive pole placement or policy adaptation 9 | \item Iterative learning control (ILC) 10 | \item Gain scheduling 11 | \item Model reference adaptive control (MRAC) 12 | \item Model identification adaptive control (MIAC) 13 | \item Dual control strategies 14 | \end{itemize} 15 | though we will focus primarily on the last three. 16 | 17 | \subsection{Model Reference Adaptive Control (MRAC)} 18 | 19 | We will now introduce model reference adaptive control, which may be interpreted as a combined model-based and model-free adaptive control scheme. While it leverages a model, this model is responsible only for generating a reference output to track, and control adaptation is done directly via updating the policy. 20 | Despite stating our desired adaptive control problem in discrete time, we will describe the MRAC approach in continuous time, which is the more standard setting. 
21 | A model reference adaptive controller is composed of four parts: 22 | \begin{enumerate} 23 | \item A plant containing unknown parameters 24 | \item A \textit{reference model} for compactly specifying the desired output 25 | \item A feedback control law containing adjustable parameters 26 | \item An adaptation mechanism for updating the adjustable parameters 27 | \end{enumerate} 28 | The reference model is used to provide the ideal plant response which the adaptation mechanism should aim to achieve. Choice of reference model is therefore an art, similar to choice of a cost function. However, practically, it is more difficult than choice of a cost function. The designer of a cost function for an optimal control or reinforcement learning system aims to reflect their desired performance characteristics in a cost function that yields a tractable optimization problem. The designer of an MRAC system, on the other hand, must choose a model that achieves good performance. As such, an MRAC system designer is implicitly solving a policy optimization problem to choose a reference model. Practically, choice of these models is simplified by considering linear systems, and thus one may specify performance in terms of relatively simple characteristics such as damping. 29 | 30 | One of the simplest approaches to MRAC is the so-called MIT rule. Let $\hat{\bm{y}}(t)$ denote the reference signal we aim to track, and $\bm{y}(t)$ denote the output of the system as a result of controller $\pol$, parameterized by $\param$. We specify the error $e(t) = \| \hat{\bm{y}}(t) - \bm{y}(t) \|_2$, as well as $J(t) = \frac{1}{2} e(t)^2$. Then, the MIT rule consists of taking gradient updates of the form 31 | \begin{equation} 32 | \dot{\param} = -\gamma \frac{\partial J}{\partial \param} = - \gamma e(t) \frac{\partial e}{\partial \param}(t) 33 | \end{equation} 34 | where $\gamma$ is a gain parameter governing the update rate. This rule may be applied in discrete or continuous time (via an update difference equation or differential equation), and the joint dynamics of the system and the control parameter $\param$ may be analyzed for stability, typically with tools from Lyapunov stability theory. For a more complete discussion on MRAC and design of MRAC systems using Lyapunov theory, we refer the reader to \cite{aastrom2013adaptive}. 35 | 36 | \subsection{Model Identification Adaptive Control (MIAC)} 37 | 38 | Model identification adaptive control is the adaptive control scheme that logically follows from system identification: we will concurrently perform parameter estimation while controlling the system by using our estimate of the parameters. We will refer to our estimate of the model parameters as $\hat{\param}$. MIAC designs a control scheme that takes $\hat{\param}$ as an input in addition to the state $\st$. 39 | 40 | 41 | % TODO need to clarify difference between MIAC and dual control more 42 | 43 | An important distinction within MIAC schemes is between \textit{certainty-equivalent} controllers and so-called \textit{cautious} controllers. Certainty-equivalent approaches, like certainty-equivalence in the LQG setting, have the policy as a function only of a point estimate of the parameters. In the LQG setting, in which the state is estimated by a Kalman filter, it can be shown that certainty-equivalent control performs equivalently to a control scheme incorporating other statistics of the state estimate.
However, this principle does not in general hold, and thus certainty-equivalence in adaptive control is often a design choice to make stability analysis tractable. Cautious controllers, on the other hand, incorporate other information about the estimate of the parameters. For example, in the Bayesian setting, the posterior density of the parameters may be passed to the controller. This approach is known as ``cautious'' as it explicitly factors the uncertainty of the parameter estimate into the control decision, and thus will be more robust when uncertainty is high. However, because we are operating within the passively adaptive control setting, cautious methods will not include the expected uncertainty reduction due to future information, and thus may be overly conservative. 44 | 45 | % examples of both CE and cautious strategies 46 | 47 | \subsection{Linear Quadratic Adaptive Control} 48 | 49 | One of the earliest works on adaptive control in the linear dynamics, quadratic cost setting was that of Simon \cite{simon1956dynamic}, who proposed using certainty-equivalence in the model estimation. His approach, however---to utilize the mean parameter estimates combined with an LQR controller---can converge to incorrect values with non-zero probability \cite{becker1985adaptive,abbasi2011regret}. Thus, it is necessary to augment the control strategy with heuristic approaches to actively explore the environment. A simple example exploration strategy is so-called $\epsilon$-greedy exploration, in which a random action is taken with probability $\epsilon$, and the system otherwise follows the certainty-equivalent optimal controller. While this performs reasonably well in the linear setting (where the definition of ``reasonably well'' may perhaps be debated), it performs poorly in nonlinear MDPs and discrete MDPs due to the highly ``local'' nature of the exploration \cite{moldovan2015optimism,osband2016generalization}. We expand on the problem of exploration in the next section. 50 | 51 | % add more on exploration? Generally need to more deeply discuss this literature. 52 | 53 | 54 | 55 | \section{Probing, Planning for Information Gain, and Dual Control} 56 | 57 | We can augment our state with some statistics of our estimate (which we will write $\hat{\param}$) to yield the hyperstate $\hyp$, and augment our dynamics with the dynamics of the estimation process. Solving the Bellman dynamic programming problem for this hyperstate is dual control. 58 | 59 | The difference between dual control and passive adaptation may seem subtle, but the former approach will result in a control scheme that optimally identifies the parameters for the purposes of control, whereas the latter approach will only passively adapt the parameter estimate, and act with respect to that passive estimate. 60 | 61 | % The concept of probing. The idea of planning for information gain. The practical difficulty of optimizing the dual control problem (with examples from Edison Tse papers). Approximate/local methods for dual control? Belief space planning? 62 | 63 | % planning in information space; belief MDP. Intractability.
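To make the preceding discussion concrete, the sketch below combines the certainty-equivalent adaptive LQR strategy of the previous section with the simplest form of probing: $\epsilon$-greedy random actions. It is an illustration only; the system, noise level, warm-up length, and gains are hypothetical choices of this example, and no claims about convergence or regret are implied (see \cite{becker1985adaptive,abbasi2011regret} for analyses of strategies of this type).
\begin{verbatim}
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])    # unknown to the controller
B_true = np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)
eps, warmup, T = 0.1, 20, 500

x = np.zeros(2)
Z, Xn = [], []                                  # regressors [x; u] and next states
for t in range(T):
    if t < warmup or rng.random() < eps:
        u = rng.normal(size=1)                  # probing / epsilon-greedy action
    else:
        # Certainty-equivalent step: least-squares estimate of [A B], then LQR for it.
        theta = np.linalg.lstsq(np.array(Z), np.array(Xn), rcond=None)[0].T
        A_hat, B_hat = theta[:, :2], theta[:, 2:]
        P = solve_discrete_are(A_hat, B_hat, Q, R)
        K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
        u = -K @ x
    x_next = A_true @ x + B_true @ u + 0.01 * rng.normal(size=2)
    Z.append(np.concatenate([x, u])); Xn.append(x_next)
    x = x_next
\end{verbatim}
In the dual control view sketched above, the probing action would instead be chosen by explicitly valuing the information it is expected to generate, rather than being injected blindly at a fixed rate.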
64 | 65 | \section{Bibliographic Notes} 66 | 67 | % Annaswamy, Sastry, Astrom -------------------------------------------------------------------------------- /2020/tex/source/ch12.tex: -------------------------------------------------------------------------------- 1 | \chapter{Model-based Reinforcement Learning} 2 | 3 | \section{Adaptive and Learning MPC} 4 | 5 | % cover classical work on adaptive LQR; exploration/persistent excitation (becker/kumar 85, possibly results from adaptive control books?) 6 | 7 | % survey, briefly, modern work on adaptive LQR. Model-based and model-free? Lots of refs from szepesvari and recht papers. 8 | 9 | % recht papers have SLS as a prereq 10 | 11 | % include output feedback? Pixels to torques, e.g. 12 | 13 | % Bayesian methods. PETS? Dreamer and related models 14 | 15 | 16 | % Tomlin work + others; linear learning-based MPC only. Survey or dig into details? 17 | % Rosolia work on learning MPC 18 | 19 | % Iterative Learning Control? 20 | 21 | % Episodic fitting of linear models 22 | 23 | 24 | \section{Combining Model and Policy Learning} 25 | 26 | % Discuss PILCO, other probabilistic methods like PETS? 27 | 28 | % Discuss predictive models for observations, e.g. visuomotor control 29 | % Should avoid introduction of latent variable models in the class, but can include for the future. Could survey representation learning in MB RL? 30 | 31 | % optimizing model via backprop through value; how value learning and model learning mesh 32 | 33 | \section{Bibliographic Notes} 34 | 35 | % For learning MPC, Hewing review -------------------------------------------------------------------------------- /2020/tex/source/ch4.tex: -------------------------------------------------------------------------------- 1 | \chapter{Indirect Methods} 2 | 3 | \section{Calculus of Variations} 4 | 5 | We will begin by restating the optimal control problem. We wish to find an admissible control sequence $\ac^*$ which causes the system 6 | \begin{equation} 7 | \stdot = \f(\st(t),\ac(t),t) 8 | \end{equation} 9 | to follow an \textit{admissible} trajectory $\st^*$ that minimizes the functional 10 | \begin{equation} 11 | \J = \cost_f(\st(t_f),t_f) + \int_{t_0}^{t_f} \cost(\st(t),\ac(t),t) dt. 12 | \end{equation} 13 | To find the minima of functions of a finite number of real variables, we rely on the first order optimality conditions to find candidate minima, and use higher order derivatives to determine whether a point is a local minimum. Because we are minimizing a function that maps from some $n$ dimensional space to a scalar, candidate points have zero gradient in each of these dimensions. However, in the optimal control problem, we have a cost \textit{functional}, which maps functions to scalars. This is immediately problematic for our first order conditions --- we would be required to check the necessary condition at infinitely many points. The necessary notion of optimality conditions for functionals is provided by calculus of variations. 14 | 15 | Concretely, we define a functional $\J$ as a rule of correspondence assigning each function $\st$ in a class $\Omega$ (the domain) to a unique real number. The functional $\J$ is linear if and only if 16 | \begin{equation} 17 | \J(\alpha_1 \st_1 + \alpha_2 \st_2) = \alpha_1 \J(\st_1) + \alpha_2 \J(\st_2) 18 | \end{equation} 19 | for all $\st_1, \st_2, \alpha_1 \st_1 + \alpha_2 \st_2$ in $\Omega$. We must now define a notion of ``closeness'' for functions. Intuitively, two points being close together has an immediate geometric interpretation.
We first define the norm of a function. The norm of a function is a rule of correspondence that assigns each $\st \in \Omega$, defined over $t \in [t_0,t_f]$, a real number. The norm of $\st$, which we denote $\|\st\|$, satisfies: 20 | \begin{enumerate} 21 | \item $\|\st\| \geq 0$, and $\|\st\|=0$ iff $\st(t) = 0$ for all $t \in [t_0,t_f]$ 22 | \item $\|\alpha \st\| = |\alpha| \|\st\|$ for all real numbers $\alpha$ 23 | \item $\|\st_1 + \st_2\| \leq \|\st_1\| + \|\st_2\|$. 24 | \end{enumerate} 25 | To compare the closeness of two functions $\bm{y}, \bm{z}$, we let $\st(t) = \bm{y}(t) - \bm{z}(t)$. Thus, for two identical functions, $\|\st\|$ is zero. Generally, a norm will be small for ``close'' functions, and large for ``far apart'' functions. However, there exist many possible definitions of norms that satisfy the above conditions. 26 | 27 | \subsection{Extrema for Functionals} 28 | 29 | A functional $\J$ with domain $\Omega$ has a local minimum at $\st^* \in \Omega$ if there exists an $\epsilon > 0$ such that $\J(\st) \geq \J(\st^*)$ for all $\st \in \Omega$ such that $\|\st - \st^*\| < \epsilon$. Maxima are defined similarly, just with $\J(\st) \leq \J(\st^*)$. 30 | 31 | Analogously to optimization of functions, we define the increment of the functional as 32 | \begin{equation} 33 | \Delta \J(\st,\delta \st) \vcentcolon= \J(\st + \delta \st) - \J(\st) 34 | \end{equation} 35 | where $\delta \st(t)$ is the \textit{variation} of $\st(t)$. The increment of a functional can be written as 36 | \begin{equation} 37 | \Delta \J(\st,\delta \st) = \delta \J(\st,\delta \st) + g(\st,\delta \st) \|\delta \st\| 38 | \end{equation} 39 | where $\delta \J$ is linear in $\delta \st$. If 40 | \begin{equation} 41 | \lim_{\|\delta \st \| \to 0} \{g(\st,\delta \st)\} = 0 42 | \end{equation} 43 | then $\J$ is said to be differentiable at $\st$ and $\delta \J$ is the variation of $\J$ at $\st$. 44 | We can now state the \textit{fundamental theorem of the calculus of variations}. 45 | 46 | \begin{theorem}[Fundamental Theorem of CoV] 47 | Let $\st(t)$ be a vector function of $t$ in the class $\Omega$, and $\J(\st)$ be a differentiable functional of $\st$. Assume that the functions in $\Omega$ are not constrained by any boundaries. If $\st^*$ is an extremal, the variation of $\J$ must vanish at $\st^*$, that is $\delta \J(\st^*, \delta \st) = 0$ for all admissible $\delta \st$ (i.e. such that $\st + \delta \st \in \Omega$). 48 | \end{theorem} 49 | 50 | \begin{proof} 51 | \cite{kirk2012optimal}, Section 4.1. 52 | \end{proof} 53 | 54 | We will now look at how calculus of variations may be leveraged to approach practical problems. Let $x$ be a scalar function in $C^1$. We would like to find a function $x^*$ for which the functional 55 | \begin{equation} 56 | \J(x) = \int_{t_0}^{t_f} g(x(t),\dot{x}(t),t) dt 57 | \end{equation} 58 | has a relative extremum. We will assume $g \in C^2$, that $t_0,t_f$ are fixed, and that the endpoint values $x_0, x_f$ are fixed. Let $x$ be any curve in $\Omega$; we will obtain the variation $\delta\J$ from the increment 59 | \begin{align} 60 | \Delta \J(x,\delta x) &= \J(x + \delta x) - \J(x)\\ 61 | &= \int_{t_0}^{t_f} g(x + \delta x, \dot{x} + \delta \dot{x}, t) dt - \int_{t_0}^{t_f} g(x,\dot{x},t) dt\\ 62 | &= \int_{t_0}^{t_f} g(x + \delta x, \dot{x} + \delta \dot{x}, t) - g(x,\dot{x},t) dt.
63 | \end{align} 64 | Expanding via Taylor series, we get 65 | \begin{equation} 66 | \Delta \J(x,\delta x) = \int_{t_0}^{t_f} g(x,\dot{x},t) + \underbrace{\frac{\partial g}{\partial x}}_{g_{x}} (x,\dot{x}, t) \delta x + \underbrace{\frac{\partial g}{\partial \dot{x}}}_{g_{\dot{x}}} (x,\dot{x}, t) \delta \dot{x} + o(\delta x, \delta \dot{x}) - g(x,\dot{x},t) dt 67 | \end{equation} 68 | which yields the variation 69 | \begin{equation} 70 | \delta \J = \int_{t_0}^{t_f} g_{x}(x,\dot{x},t) \delta x + g_{\dot{x}}(x,\dot{x},t)\delta \dot{x} \,\,dt. 71 | \end{equation} 72 | Integrating by parts, we have 73 | \begin{equation} 74 | \delta \J = \int_{t_0}^{t_f} \left[ g_{x}(x,\dot{x},t) - \frac{d}{dt} g_{\dot{x}}(x,\dot{x},t)\right] \delta x \, dt + [g_{\dot{x}}(x,\dot{x},t)\delta x(t)]_{t_0}^{t_f}. 75 | \end{equation} 76 | We have assumed $x(t_0), x(t_f)$ given, and thus $\delta x(t_0) = 0$, $\delta x(t_f) = 0$. Considering an extremal curve, applying the CoV theorem yields 77 | \begin{equation} 78 | 0 = \int_{t_0}^{t_f} \left[ g_{x}(x,\dot{x},t) - \frac{d}{dt} g_{\dot{x}}(x,\dot{x},t)\right] \delta x \, dt. 79 | \label{eq:euler_int} 80 | \end{equation} 81 | We can now state the fundamental lemma of CoV. We will state it for vector functions, although our derivation was for the scalar case. 82 | 83 | \begin{lemma}[Fundamental Lemma of CoV] 84 | If a function $h$ is continuous and 85 | \begin{equation} 86 | \int_{t_0}^{t_f} h(t) \delta \st(t) dt = 0 87 | \end{equation} 88 | for every function $\delta \st$ that is continuous in the interval $[t_0,t_f]$, then $h$ must be zero everywhere in the interval $[t_0,t_f]$. 89 | \end{lemma} 90 | 91 | \begin{proof} 92 | \cite{kirk2012optimal}, Section 4.2. 93 | \end{proof} 94 | 95 | Applying the fundamental lemma, we find that a necessary condition for $\st^*$ being an extremal is 96 | \begin{equation} 97 | g_{\st}(\st,\stdot,t) - \frac{d}{dt} g_{\stdot}(\st,\stdot,t) = 0 98 | \end{equation} 99 | for all $t \in [t_0, t_f]$, which is the \textit{Euler equation}. This is a nonlinear, time-varying second-order ordinary differential equation with split boundary conditions (at $\st(t_0)$ and $\st(t_f)$). 100 | 101 | \subsection{Generalized Boundary Conditions} 102 | 103 | In the previous subsection, we assumed that $t_0, t_f, \st(t_0), \st(t_f)$ were all given. We will now relax that assumption. In particular, $t_f$ may be fixed or free, and each component of $\st(t_f)$ may be fixed or free. 104 | 105 | We begin by writing the variation around $\st^*$ 106 | \begin{align} 107 | \delta \J &= \left[ g_{\stdot}(\st^*(t_f),\stdot^*(t_f),t_f) \right]^T \delta \st(t_f) + 108 | \left[ g(\st^*(t_f),\stdot^*(t_f),t_f) \right] \delta t_f\\ 109 | & \qquad + \int_{t_0}^{t_f} \left[ g_{\st}(\st^*,\stdot^*,t) - \frac{d}{dt} g_{\stdot}(\st^*,\stdot^*,t)\right]^T \delta \st \, dt \nonumber 110 | \end{align} 111 | by using the same integration by parts approach as before. Note that for fixed $t_f$ and $\st(t_f)$, the variations $\delta t_f$ and $\delta \st(t_f)$ vanish, and so we are left with (\ref{eq:euler_int}). When $t_f$ or $\st(t_f)$ is free, however, $\delta t_f$ and $\delta \st(t_f)$ do not vanish, and we are left with additional boundary conditions that must be satisfied.
Note that 112 | \begin{equation} 113 | \delta \st_f = \delta \st(t_f) + \stdot^*(t_f) \delta t_f 114 | \end{equation} 115 | and substituting this, we have 116 | \begin{align} 117 | \delta \J &= \left[ g_{\stdot}(\st^*(t_f),\stdot^*(t_f),t_f) \right]^T \delta \st_f + 118 | \left[ g(\st^*(t_f),\stdot^*(t_f),t_f) - g^T_{\stdot}(\st^*(t_f),\stdot^*(t_f),t_f) \stdot^*(t_f) \right] \delta t_f\\ 119 | & \qquad + \int_{t_0}^{t_f} \left[ g_{\st}(\st^*,\stdot^*,t) - \frac{d}{dt} g_{\stdot}(\st^*,\stdot^*,t)\right]^T \delta \st \, dt \nonumber. 120 | \end{align} 121 | Stationarity of this variation thus requires 122 | \begin{equation} 123 | g_{\stdot}(\st^*(t_f),\stdot^*(t_f),t_f) = 0 124 | \end{equation} 125 | if $\st_f$ is free, and 126 | \begin{equation} 127 | g(\st^*(t_f),\stdot^*(t_f),t_f) - g^T_{\stdot}(\st^*(t_f),\stdot^*(t_f),t_f) \stdot^*(t_f) = 0 128 | \end{equation} 129 | if $t_f$ is free, in addition to the Euler equation being satisfied. For a complete reference on the boundary conditions associated with a variety of problem specifications, we refer the reader to Section 4.3 of \cite{kirk2012optimal}. 130 | 131 | % TODO add table 132 | 133 | \subsection{Constrained Extrema} 134 | 135 | Previously, we have not considered constraints in the variational problem. However, constraints (and in particular, dynamics constraints) are central to most optimal control problems. Let $\bm{w} \in \R^{n+m}$ be a vector function in $C^1$. As previously, we would like to find a function $\bm{w}^*$ for which the functional 136 | \begin{equation} 137 | \J(\bm{w}) = \int_{t_0}^{t_f} g(\bm{w}(t),\dot{\bm{w}}(t),t) dt 138 | \end{equation} 139 | has a relative extremum, although we additionally introduce the constraints 140 | \begin{equation} 141 | \f_i(\bm{w}(t), \dot{\bm{w}}(t),t) = 0, \quad i = 1, \ldots, n. 142 | \end{equation} 143 | We will again assume $g \in C^2$ and that $t_0, \bm{w}(t_0)$ are fixed. Note that as a result of these $n$ constraints, only $m$ of the $n+m$ components of $\bm{w}$ are independent. 144 | 145 | One approach to solving this constrained problem is re-writing the $n$ dependent components of $\bm{w}$ in terms of the $m$ independent components. However, the nonlinearity of the constraints typically makes this infeasible. Instead, we will turn to Lagrange multipliers. We will write the \textit{augmented integrand} as 146 | \begin{equation} 147 | \hat{g}(\bm{w}(t),\dot{\bm{w}}(t),\bm{p}(t),t) \vcentcolon= g(\bm{w}(t),\dot{\bm{w}}(t),t) + \bm{p}^T(t) \bm{f}(\bm{w}(t),\dot{\bm{w}}(t),t) 148 | \end{equation} 149 | where $\bm{p}(t)$ are Lagrange multipliers that are functions of time. Based on this, a necessary condition for optimality is 150 | \begin{equation} 151 | \hat{g}_{\bm{w}}(\bm{w}^*(t),\dot{\bm{w}}^*(t),\bm{p}^*(t),t) - \frac{d}{dt} \hat{g}_{\dot{\bm{w}}}(\bm{w}^*(t),\dot{\bm{w}}^*(t),\bm{p}^*(t),t) = 0 152 | \end{equation} 153 | with 154 | \begin{equation} 155 | \bm{f}(\bm{w}^*(t), \dot{\bm{w}}^*(t),t) = 0. 156 | \end{equation} 157 | 158 | 159 | \section{Indirect Methods for Optimal Control} 160 | 161 | Having built the foundations of functional optimization via calculus of variations, we will now derive the necessary conditions for optimal control under the assumption that the admissible controls are not bounded.
The problem, as previously stated, is to find an \textit{admissible control} $\ac^*$ which causes the system 162 | \begin{equation} 163 | \stdot(t) = \f(\st(t),\ac(t),t) 164 | \end{equation} 165 | to follow an \textit{admissible trajectory} $\st^*$ that minimizes the functional 166 | \begin{equation} 167 | \J(\ac) = \cost_f(\st(t_f),t_f) + \int_{t_0}^{t_f} \cost(\st(t),\ac(t),t) dt 168 | \end{equation} 169 | under the assumptions that $\cost_f \in C^2$, the state and control are unconstrained, and $t_0, \st(t_0)$ are fixed. We define the \textit{Hamiltonian} as 170 | \begin{equation} 171 | \ham(\st(t),\ac(t),\cst(t),t) \vcentcolon= \cost(\st(t),\ac(t),t) + \cst^T(t) \f(\st(t),\ac(t),t). 172 | \end{equation} 173 | Then, the necessary conditions are 174 | \begin{align} 175 | \label{eq:nec_ham_conds1} 176 | \stdot^*(t) &= \frac{\partial \ham}{\partial \cst}(\st^*(t),\ac^*(t),\cst^*(t),t)\\ 177 | \dot{\cst}^*(t) &= -\frac{\partial \ham}{\partial \st}(\st^*(t),\ac^*(t),\cst^*(t),t)\\ 178 | 0 &= \frac{\partial \ham}{\partial \ac}(\st^*(t),\ac^*(t),\cst^*(t),t) 179 | \label{eq:nec_ham_conds3} 180 | \end{align} 181 | which must hold for all $t \in [t_0,t_f]$. Additionally, the boundary conditions 182 | \begin{align} 183 | \label{eq:ham_bcs} 184 | [\frac{\partial \cost_f}{\partial \st}(\st^*(t_f),t_f)& - \cst^*(t_f)]^T \delta \st_f\\ 185 | & + [ \ham(\st^*(t_f),\ac^*(t_f), \cst^*(t_f), t_f) + \frac{\partial \cost_f}{\partial t}(\st^*(t_f),t_f)] \delta t_f = 0 \nonumber 186 | \end{align} 187 | must be satisfied. Note that as in the previous section, they are automatically satisfied if the terminal state and time are fixed. Based on these necessary conditions, we have a set of $2n$ \textit{first-order} differential equations (for the state and co-state), and a set of $m$ algebraic equations (control equations). The solution to the state and co-state equations will contain $2n$ constants of integration. To solve for these constants, we use the initial conditions $\st(t_0) = \st_0$ (of which there are $n$), and an additional $n$ (or $n+1$) equations from the boundary conditions. We are left with a two-point boundary value problem, which are considerably more difficult to solve than initial value problems which can just be integrated forward. For a full review of boundary conditions, we again refer the reader to \cite{kirk2012optimal}. 188 | 189 | \subsection{Proof of the Necessary Conditions} 190 | 191 | We will now prove the necessary conditions, (\ref{eq:nec_ham_conds1} -- \ref{eq:nec_ham_conds3}), along with the boundary conditions (\ref{eq:ham_bcs}). For simplicity, assume that the terminal cost is zero, and that $t_f, \st(t_f)$ are fixed and given. Consider the augmented cost function 192 | \begin{equation} 193 | \hat{\cost}(\st(t), \stdot(t), \ac(t), \cst(t),t) \vcentcolon= \cost(\st(t),\ac(t),t) + \cst^T(t) [\f(\st(t),\ac(t),t) - \stdot(t)]. 194 | \end{equation} 195 | When the constraint holds, this augmented cost function is exactly equal to the original cost function. The augmented total cost is then 196 | \begin{equation} 197 | \hat{\J}(\ac) = \int_{t_0}^{t_f} \hat{\cost}(\st(t), \stdot(t), \ac(t), \cst(t),t) dt. 
198 | \end{equation} 199 | Applying the fundamental theorem of CoV on an extremal, we have 200 | \begin{align} 201 | 0 = &\delta \hat{\J}(\ac) = \int_{t_0}^{t_f} \left[ \overbrace{\frac{\partial \hat{\cost}}{\partial \st}(\st^*(t), \stdot^*(t), \ac^*(t), \cst^*(t),t)}^{\frac{\partial \cost}{\partial \st}(\st^*(t), \ac^*(t), t) + \frac{\partial \f}{\partial \st}^T(\st^*(t), \ac^*(t), t)\cst^*(t)} - \overbrace{\frac{d}{dt} \frac{\partial \hat{\cost}}{\partial \stdot}(\st^*(t), \stdot^*(t), \ac^*(t), \cst^*(t),t)}^{-\frac{d}{dt}(-\cst^*(t))} \right]^T \delta \st(t) \\ 202 | & + \left[ \frac{\partial \hat{\cost}}{\partial \ac}(\st^*(t), \stdot^*(t), \ac^*(t), \cst^*(t),t)\right]^T \delta \ac(t) + \underbrace{\left[\frac{\partial \hat{\cost}}{\partial \cst}(\st^*(t), \stdot^*(t), \ac^*(t), \cst^*(t),t) \right]^T }_{\f(\st^*(t),\ac^*(t),t) - \stdot^*(t)} \delta \cst(t) dt \nonumber. 203 | \end{align} 204 | Considering each term in sequence, we have: 205 | \begin{itemize} 206 | \item $\f(\st^*(t),\ac^*(t),t) - \stdot^*(t) = 0$ on an extremal. 207 | \item The Lagrange multipliers are arbitrary, so we can select them to make the coefficients of $\delta \st(t)$ equal to zero, giving $\dot{\cst}(t) = -\frac{\partial \cost}{\partial \st}(\st^*(t), \ac^*(t), t) - \frac{\partial \f}{\partial \st}^T(\st^*(t), \ac^*(t), t)\cst^*(t)$. 208 | \item The remaining variation $\delta \ac(t)$ is independent, so its coefficient must be zero, thus $ \frac{\partial \cost}{\partial \ac}(\st^*(t), \ac^*(t), t) + \frac{\partial \f}{\partial \ac}^T(\st^*(t), \ac^*(t), t)\cst^*(t) = 0$. 209 | \end{itemize} 210 | These conditions exactly give the necessary conditions as previously stated, when recast with the Hamiltonian formalism. 211 | 212 | \section{Pontryagin's Minimum Principle} 213 | 214 | So far, we have assumed that the admissible controls and states are unconstrained. This assumption is frequently violated for real systems---physical actuators have limits on their realizable outputs, and state constraints may occur due to safety considerations. The control $\ac^*$ causes the functional $\J$ to have a relative minimum if 215 | \begin{equation} 216 | \J(\ac) - \J(\ac^*) = \Delta \J \geq 0 217 | \end{equation} 218 | for all admissible controls ``close'' to $\ac^*$. Letting $\ac = \ac^* + \delta \ac$, the increment can be expressed as 219 | \begin{equation} 220 | \Delta \J(\ac^*,\delta\ac) = \delta \J(\ac^*,\delta \ac) + \text{higher order terms}. 221 | \end{equation} 222 | The variation $\delta \ac$ is arbitrary only if the extremal control is strictly within the boundary for all time in the interval $[t_0,t_f]$. In general, however, an extremal control lies on a boundary during at least subinterval in the interval $[t_0,t_f]$. 223 | As a consequence, admissible control variations $\delta \ac$ exist whose negatives are not admissible. This implies that a necessary condition for $\ac^*$ to minimize $\J$ is $\delta \J(\ac^*, \delta \ac) \geq 0$ for all admissible variations with $\|\delta \ac\|$ small enough. The reason why the equality in the fundamental theorem of CoV (in which we explicitly assumed no constraints) is replaced with an inequality is the presence of the control constraints. This result has an analogue in calculus, where the necessary condition for a scalar function $f$ to have a relative minimum at the end point is that the differential $df \geq 0$. 
224 | 225 | Assuming bounded controls $\ac \in \mathcal{U}$, the necessary optimality conditions are 226 | \begin{align} 227 | \label{eq:nec_pmp_conds1} 228 | \stdot^*(t) &= \frac{\partial \ham}{\partial \cst}(\st^*(t),\ac^*(t),\cst^*(t),t)\\ 229 | \dot{\cst}^*(t) &= -\frac{\partial \ham}{\partial \st}(\st^*(t),\ac^*(t),\cst^*(t),t)\\ 230 | \ham(\st^*(t),\ac^*(t), \cst^*(t), t) &\leq \ham(\st^*(t),\ac(t), \cst^*(t), t) \,\,\forall \ac \in \mathcal{U} 231 | \label{eq:nec_pmp_conds3} 232 | \end{align} 233 | along with the boundary conditions 234 | \begin{align} 235 | \label{eq:ham_bcs} 236 | [\frac{\partial \cost_f}{\partial \st}(\st^*(t_f),t_f)& - \cst^*(t_f)]^T \delta \st_f\\ 237 | & + [ \ham(\st^*(t_f),\ac^*(t_f), \cst^*(t_f), t_f) + \frac{\partial \cost_f}{\partial t}(\st^*(t_f),t_f)] \delta t_f = 0. \nonumber 238 | \end{align} 239 | The control $\ac^*(t)$ causes $\ham(\st^*(t),\ac^*(t),\cst^*(t),t)$ to assume its global minimum. This is a harder condition, in general, to analyze. Finally, we have additional necessary conditions. If the final time is fixed and the Hamiltonian does not explicitly depend on time, 240 | \begin{equation} 241 | \ham(\st^*(t),\ac^*(t),\cst^*(t)) = c \,\, \forall t \in [t_0,t_f] 242 | \end{equation} 243 | and if the final time is free and the Hamiltonian does not depend explicitly on time, 244 | \begin{equation} 245 | \ham(\st^*(t),\ac^*(t),\cst^*(t)) = 0 \,\, \forall t \in [t_0,t_f]. 246 | \end{equation} 247 | Note that in general, uniqueness and existence are not guaranteed in the constrained setting. 248 | 249 | % TODO add example problems 250 | 251 | \section{Numerical Aspects of Indirect Optimal Control} 252 | 253 | % Mention how PMP can be leveraged to swap the original optimal control problem into a two-point boundary value problem (TPBVP). 254 | 255 | % - Pros: We considerably decrease the dimension of the problem because, instead of looking for a control in the very large space $L^{\infty}$, we look for a vector (that is $p(0)$) in the smaller space $\mathbb{R}^n$. Still, this problem is hard to solve. 256 | % - Cons: A good guess of $p(0)$ must be available to make the method converge. The mathematical analysis to get further insights concerning such guess might be difficult. 257 | % Show how to go from PMP to TPBVP for the simple case "fixed final time/fixed final point". Follow the slides of lecture 10 and introduce the "shooting function" $S$ describing all the steps. Therefore, introduce the "indirect shooting method" as research of zeros for the shooting function. 258 | % Reproduce/show how to adapt the argument above in the presence of either "free final time" or "final point belonging to some submanifold", or both. For "final point belonging to some submanifold", attentively introduce the relation that relates the adjoint vector at the final time $p(t_f)$ with the kernel of the function $F$ that defines the submanifold $M_f = \{x\in\mathbb{R}^n|F(x)=0\}$. 259 | 260 | \section{Bibliographic Notes} 261 | 262 | For a practical treatment of indirect methods, we refer the reader to \cite{bryson1975applied}. For a more theoretical treatment, we refer the reader to \cite{lee1967foundations}. 
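
As a closing illustration of the indirect approach, consider how the two-point boundary value problem arising from the necessary conditions can be solved numerically by shooting on the initial co-state. The following minimal sketch is illustrative only: it assumes scalar dynamics $\dot{x} = u$, cost $\frac{1}{2}\int_{t_0}^{t_f}(x^2 + u^2)\,dt$, fixed endpoints, and SciPy's integrator and root finder; all names and values are chosen for the example.
\begin{verbatim}
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

# Illustrative scalar problem: xdot = u, cost = 0.5*int(x^2 + u^2) dt,
# with x(0) = 1 and x(tf) = 0 fixed. The necessary conditions give
# u* = -p, xdot = -p, pdot = -x (state/co-state equations).
t0, tf, x0, xf = 0.0, 1.0, 1.0, 0.0

def ode(t, z):
    x, p = z
    return [-p, -x]  # state and co-state dynamics under u* = -p

def shooting(p0):
    # integrate forward from a guessed initial co-state and
    # return the mismatch in the terminal state
    sol = solve_ivp(ode, (t0, tf), [x0, p0], rtol=1e-8)
    return sol.y[0, -1] - xf

# find the initial co-state that zeroes the shooting function
p0_star = brentq(shooting, -10.0, 10.0)
sol = solve_ivp(ode, (t0, tf), [x0, p0_star], dense_output=True)
ts = np.linspace(t0, tf, 50)
xs, ps = sol.sol(ts)
us = -ps  # optimal control recovered from the co-state
\end{verbatim}
The unknown is the initial co-state $p(t_0)$; the shooting function integrates the coupled state--co-state system and returns the terminal-state mismatch, whose zero is found by a scalar root-finding routine. This is precisely the reduction from an infinite-dimensional search over controls to a finite-dimensional search over $p(t_0)$, at the price of sensitivity to the initial guess.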
-------------------------------------------------------------------------------- /2020/tex/source/ch5.tex: -------------------------------------------------------------------------------- 1 | \chapter{Direct Methods for Optimal Control} 2 | 3 | In the previous section we considered indirect methods to optimal control, in which the necessary conditions for optimality were first applied, yielding a two-point boundary value problem that was solved numerically. We will now consider the class of direct methods, in which the optimal control problem is first discretized, and then the resulting discrete optimization problem is solved numerically. 4 | 5 | \section{Direct Methods} 6 | 7 | We will write our original continuous optimal control problem, 8 | \begin{equation} 9 | \begin{aligned} 10 | \label{eq:ocp} 11 | & \underset{\ac}{\min} & & \int_0^{t_f} \cost(\st(t),\ac(t),t) dt \\ 12 | & \textrm{s.t.} & & \stdot(t) = \f(\st(t), \ac(t),t), t \in [0, t_f]\\ 13 | & & & \st(0) = \st_0\\ 14 | & & & \st(t_f) \in \mathcal{M}_f\\ 15 | & & & \ac(t) \in \mathcal{U}, t\in [0,t_f] 16 | \end{aligned} 17 | \end{equation} 18 | where $\mathcal{M}_f = \{\st\in \R^n : F(\st) = 0\}$ and where we have, for simplicity, assumed zero terminal cost and $t_0 = 0$. We will use forward Euler discretization of the dynamics. We select a discretization $0 = t_0 < t_1 < \ldots < t_N = t_f$ for the interval $[0,t_f]$, and we will write $\st_{i+1} \approx \st(t), \ac_i \approx \ac(t)$ for $t \in [t_i, t_{i+1}]$, and $\st_0 \approx \st(0)$. Denoting $h_i = t_{i+1} - t_i$, the continuous time optimal control problem is transcibed into the nonlinear constrained optimization problem 19 | \begin{equation} 20 | \begin{aligned} 21 | \label{eq:nlop} 22 | & \underset{\st,\ac}{\min} & & \sum_{i=0}^{N-1} h_i \cost(\st_i,\ac_i,t_i) \\ 23 | & \textrm{s.t.} & & \st_{i+1} = \st_i + h_i \f(\st_i,\ac_i,t_i), i = 0, \ldots, N-1\\ 24 | & & & \st_N \in \mathcal{M}_f\\ 25 | & & & \ac_i \in \mathcal{U}, i = 0, \ldots, N-1 26 | \end{aligned} 27 | \end{equation} 28 | 29 | \subsection{Consistency of Time Discretization} 30 | 31 | Having performed this discretization, a reasonable (and important) sanity check on the validity of the direct approach is whether we recover the original problem in the limit of $h_i \to 0$. For simplicity, we will drop the time-dependence of the cost and dynamics. We will write the Lagrangian for (\ref{eq:nlop}) as 32 | \begin{equation} 33 | \mathcal{L} = \sum_{i=0}^{N-1} h_i \cost(\st_i,\ac_i) + \sum_{i=0}^{N-1} \bm{\lambda}_i^T (\st_i + h_i \f(\st_i,\ac_i) - \st_{i+1}). 34 | \end{equation} 35 | Then, the KKT conditions are 36 | \begin{align} 37 | 0 &= h_i \frac{\partial \cost}{\partial \st_i}(\st_i,\ac_i) + \bm{\lambda}_i - \bm{\lambda}_{i-1} + h_i \frac{\partial \f}{\partial \st_i}^T(\st_i,\ac_i) \bm{\lambda}_i\\ 38 | 0 &= h_i \frac{\partial \cost}{\partial \ac_i}(\st_i,\ac_i) + h_i \frac{\partial \f}{\partial \ac_i}^T(\st_i,\ac_i) \bm{\lambda}_i 39 | \end{align} 40 | Rearranging, we have 41 | \begin{align} 42 | \frac{\bm{\lambda}_i - \bm{\lambda}_{i-1}}{h_i} &=- \frac{\partial \f}{\partial \st_i}^T(\st_i,\ac_i) \bm{\lambda}_i- \frac{\partial \cost}{\partial \st_i}(\st_i,\ac_i)\\ 43 | 0 &= \frac{\partial \f}{\partial \ac_i}^T(\st_i,\ac_i) \bm{\lambda}_i + \frac{\partial \cost}{\partial \ac_i}(\st_i,\ac_i). 44 | \end{align} 45 | Let $\cst(t) = \bm{\lambda}_i$ for $t \in [t_i, t_{i+1}], i = 0, \ldots, N-1$ and $p(0) = \lambda_0$. 
Then, the above are direct discretizations of the necessary conditions for (\ref{eq:ocp}), 46 | \begin{align} 47 | \dot{\cst}(t) &= -\frac{\partial \f}{\partial \st}^T(\st(t),\ac(t)) \cst(t) - \frac{\partial \cost}{\partial \st}(\st(t),\ac(t))\\ 48 | 0 &= \frac{\partial \f}{\partial \ac}^T(\st(t),\ac(t)) \cst(t) + \frac{\partial \cost}{\partial \ac}(\st(t),\ac(t)). 49 | \end{align} 50 | 51 | \section{Transcription Methods} 52 | 53 | A fundamental choice in the design of numerical algorithms for direct optimization of the discretized optimal control problem is whether to optimize over the state and action variables (a method known as collocation or simultaneous optimization) or strictly over the action variables (known as shooting). 54 | 55 | \subsection{Collocation Methods} 56 | 57 | Collocation methods optimize both the state variables and the control input at a fixed, finite number of times, $t_0, \ldots, t_i, \ldots, t_N$. Moreover, the dynamics constraints are enforced at these points. As such, it is necessary to choose a finite-dimensional representation of the trajectory between these points. This rough outline leaves unspecified a large number of algorithmic design choices. 58 | 59 | First, how are the dynamics constraints enforced? Both derivative and integral constraints exist. The derivative approach enforces that the time derivative of the parameterized trajectory is equal to the given system dynamics. The integral approach relies on integrating the given dynamics and enforcing agreement between this and the trajectory parameterization. In these notes, we will focus on the derivative approach. 60 | 61 | Second, a choice of trajectory parameterization is required. We will primarily discuss Hermite-Simpson methods herein, which parameterize each subinterval of the trajectory (in $[t_i, t_{i+1}]$) with a cubic polynomial. Note that the choice of a polynomial results in integral and derivative constraints being relatively simple to evaluate. However, a wide variety of parameterizations exist. For example, pseudospectral methods represent the entire trajectory as a single high-order polynomial. 62 | 63 | We will now outline the Hermite-Simpson method as one example of direct collocation. Having selected a discretization $0 = t_0 < t_1 < \ldots < t_N = t_f$, we denote $h_i = t_{i+1} - t_i$. In every subinterval $[t_i, t_{i+1}]$, we approximate $\st(t)$ with a cubic polynomial 64 | \begin{equation} 65 | \st(t) = \bm{c}_0^i + \bm{c}_1^i (t - t_i) + \bm{c}_2^i (t-t_i)^2 + \bm{c}_3^i (t - t_i)^3 66 | \end{equation} 67 | which yields the derivative 68 | \begin{equation} 69 | \stdot(t) = \bm{c}_1^i + 2 \bm{c}_2^i (t-t_i) + 3 \bm{c}_3^i (t - t_i)^2.
70 | \end{equation} 71 | Writing $\st_i = \st(t_i), \st_{i+1} = \st(t_{i+1}), \stdot_i = \stdot(t_i), \stdot_{i+1} = \stdot(t_{i+1})$, we may write 72 | \begin{equation} 73 | \begin{bmatrix} 74 | \st_i\\ 75 | \stdot_{i}\\ 76 | \st_{i+1}\\ 77 | \stdot_{i+1} 78 | \end{bmatrix} 79 | = 80 | \begin{bmatrix} 81 | I & 0 & 0 & 0\\ 82 | 0 & I & 0 & 0\\ 83 | I & h_i I & h_i^2 I & h_i^3 I\\ 84 | 0 & I & 2 h_i I & 3 h_i^2 I 85 | \end{bmatrix} 86 | \begin{bmatrix} 87 | \bm{c}_0^i\\ 88 | \bm{c}_1^i\\ 89 | \bm{c}_2^i\\ 90 | \bm{c}_3^i 91 | \end{bmatrix} 92 | \end{equation} 93 | which in turn results in 94 | \begin{equation} 95 | \begin{bmatrix} 96 | \bm{c}_0^i\\ 97 | \bm{c}_1^i\\ 98 | \bm{c}_2^i\\ 99 | \bm{c}_3^i 100 | \end{bmatrix} 101 | = 102 | \begin{bmatrix} 103 | I & 0 & 0 & 0\\ 104 | 0 & I & 0 & 0\\ 105 | -\frac{3}{h_i^2} I & -\frac{2}{h_i} I & \frac{3}{h_i^2} I & -\frac{1}{h_i} I\\ 106 | \frac{2}{h_i^3} I & \frac{1}{h_i^2} I & -\frac{2}{h_i^3} I & \frac{1}{h_i^2} I 107 | \end{bmatrix} 108 | \begin{bmatrix} 109 | \st_i\\ 110 | \stdot_{i}\\ 111 | \st_{i+1}\\ 112 | \stdot_{i+1} 113 | \end{bmatrix}. 114 | \end{equation} 115 | Choosing intermediate times $t_i^c = t_i + \frac{h_i}{2}$ (collocation points), we can define interpolated controls $\ac^c_i = \frac{\ac_i + \ac_{i+1}}{2}$. From the above, we have 116 | \begin{align} 117 | \st^c_i \vcentcolon=& \st(t_i + \frac{h_i}{2}) = \frac{1}{2} (\st_i + \st_{i+1}) + \frac{h_i}{8} (\f(\st_i,\ac_i,t_i) - \f(\st_{i+1},\ac_{i+1},t_{i+1}))\\ 118 | \stdot^c_i \vcentcolon=& \stdot(t_i + \frac{h_i}{2}) = -\frac{3}{2h_i} (\st_i - \st_{i+1}) - \frac{1}{4} (\f(\st_i,\ac_i,t_i) + \f(\st_{i+1},\ac_{i+1},t_{i+1})). 119 | \end{align} 120 | Thus, we can write our discretized problem as 121 | \begin{equation} 122 | \begin{aligned} 123 | \label{eq:hs_nlp} 124 | & \underset{\ac_{0:N},\st_{0:N}}{\min} & & \sum_{i=0}^{N-1} h_i \cost(\st_i,\ac_i,t_i)\\ 125 | & \textrm{s.t.} & & \stdot_i^c - \f(\st^c_i, \ac^c_i,t_i^c) = 0, i = 0, \ldots, N-1\\ 126 | & & & F(\st_N) = 0\\ 127 | & & & \ac_i \in \mathcal{U}, i = 0, \ldots, N 128 | \end{aligned} 129 | \end{equation} 130 | 131 | % Describe the generalized procedure which consists of discretizing both the state and the control in time. This is achieved by discretizing the variables with high-order polynomials in each subintervals $[t_i,t_{i+1}]$. 132 | 133 | % Say that in the literature there are lots of such discretizations and the choice relies on the fact that some discretizations fit better than others for specific problem. Moreover, the higher precision you want, the higher order of polynomial you might be obliged to choose. 134 | 135 | % Because of its generality and high precision, it is worth introducing two methods: Hermite polynomials (done in lecture 12) and Runge-Kutta 2 (easily found online). Say something about their stability property: unlike explicit Euler, the numerical solution does not explode with the number of discretization points. 136 | 137 | % It might be worth introducing general explicit Runge-Kutta schemes for any order of polynomials, because they easily generalize Hermite and other collocation methods. For completeness, I would also provide some proof of convergence, like the fact that for $h\rightarrow0$, we have that the difference the solution of the original ODE and its Runge-Kutta approximation goes to zero with a speed $h^p$, where $p$ is the order of the Runge-Kutta scheme (or at least for the second-order scheme. I might help you doing this if needed).
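
To make the transcription concrete, the following minimal sketch sets up the Hermite-Simpson defect constraints above for a double-integrator system with a quadratic control cost, and hands the resulting nonlinear program to a generic solver. It is an illustration only: the system, cost, and the use of NumPy and SciPy's SLSQP method are assumptions made for the example rather than part of the method described above.
\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

n, m, N = 2, 1, 20          # state/control dimensions and number of intervals
tf = 2.0
h = tf / N                  # uniform time step (h_i = h)

def f(x, u):
    # illustrative double-integrator dynamics: x = (position, velocity)
    return np.array([x[1], u[0]])

def unpack(z):
    X = z[: (N + 1) * n].reshape(N + 1, n)
    U = z[(N + 1) * n :].reshape(N + 1, m)
    return X, U

def cost(z):
    _, U = unpack(z)
    return h * np.sum(U**2)

def defects(z):
    # Hermite-Simpson collocation defects plus boundary conditions
    X, U = unpack(z)
    d = []
    for i in range(N):
        fi, fip1 = f(X[i], U[i]), f(X[i + 1], U[i + 1])
        xc = 0.5 * (X[i] + X[i + 1]) + (h / 8.0) * (fi - fip1)
        uc = 0.5 * (U[i] + U[i + 1])
        xdotc = -1.5 / h * (X[i] - X[i + 1]) - 0.25 * (fi + fip1)
        d.append(xdotc - f(xc, uc))
    d.append(X[0] - np.array([0.0, 0.0]))   # start at rest at the origin
    d.append(X[N] - np.array([1.0, 0.0]))   # end at rest at position 1
    return np.concatenate(d)

# straight-line initial guess for the states, zero controls
X_guess = np.linspace([0.0, 0.0], [1.0, 0.0], N + 1)
z0 = np.concatenate([X_guess.ravel(), np.zeros((N + 1) * m)])
res = minimize(cost, z0, method="SLSQP",
               constraints=[{"type": "eq", "fun": defects}],
               options={"maxiter": 500})
X_opt, U_opt = unpack(res.x)
\end{verbatim}
Note that the decision vector stacks the knot-point states and controls, exactly as in the problem above; in practice, purpose-built trajectory optimization or NLP packages are typically used in place of a general-purpose solver.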
138 | 139 | \subsection{Shooting Methods} 140 | 141 | Shooting methods solve the discrete optimization problem via optimizing only over the control inputs, and integrating the dynamics forward given these controls. A simple approach to the forward integration is the approach we have discussed above, in which forward Euler integration is used. Single-shooting methods directly optimize the controls for the entire problem. These approaches are fairly efficient for low dimension, short horizon problems, but typically struggle to scale to larger problems. Multiple shooting methods, on the other hand, optimize via shooting over subcomponents of the problem, and enforce agreement between the trajectory segments generated via shooting within each subproblem. These methods are therefore a combination of shooting methods and collocation methods. Generally, numerical solvers for shooting problems will, given an initial action sequence, linearize the trajectory and optimize the objective function with respect to those linearized dynamics to obtain new control inputs. 142 | 143 | % Add comparisons with indirect shooting. More precisely, shootings are more robust but are computational more demanding. On the other hand, indirect shootings are cheaper and converge fast, but they suffer from sensitivity issues (i.e., less robustness) because they are quite hard to correctly initialize. 144 | 145 | 146 | \subsection{Sequential Convex Programming} 147 | 148 | Direct optimization of the discretized nonlinear control problem typically results in a non-convex optimization problem, for which finding a good solution may be difficult or impossible. The source of this non-convexity is typically the dynamics (and sometimes the cost function). The key idea of sequential convex programming (SCP) is to iterative re-linearize the dynamics (and construct a convex approximation of the cost function, if it is non-convex) around a nominal trajectory. 149 | 150 | First, we will assume for this outline that the cost $\cost$ is convex. Let $(\st_0(\cdot),\ac_0(\cdot))$ be a nominal tuple of trajectory and control (which is not necessarily feasible). We linearize the dynamics around this trajectory: 151 | \begin{equation} 152 | \f_1(\st,\ac,t) = \f(\st_0(t),\ac_0(t),t) + \frac{\partial \f}{\partial \st}(\st_0(t), \ac_0(t), t) (\st - \st_0(t)) + \frac{\partial \f}{\partial \ac}(\st_0(t), \ac_0(t), t) (\ac - \ac_0(t)). 153 | \end{equation} 154 | We can then solve the linear optimal control problem (with $k=0$, initially), 155 | \begin{equation} 156 | \begin{aligned} 157 | \label{eq:ocp} 158 | & {\min} & & \int_{0}^{t_f} \cost(\st(t),\ac(t),t) dt\\ 159 | & \textrm{s.t.} & & \stdot(t) = \f_{k+1}(\st(t),\ac(t),t), t \in [0,t_f]\\ 160 | & & & \st(0) = \st_0\\ 161 | & & & \st(t_f) = \st_f\\ 162 | & & & \ac(t) \in \mathcal{U}, t \in [0,t_f] 163 | \end{aligned} 164 | \end{equation} 165 | where the dynamics are linear and the cost function is quadratic. Discretizing this continuous control problem yields a tractable convex optimization problem with dynamics $\st_{i+1} = \st_i + h_i \f(\st_i,\ac_i,t_i), i=0, \ldots, N-1$. We then iterate this procedure until convergence is achieved with the new trajectory. 166 | 167 | % [Gu] = http://asl.stanford.edu/wp-content/papercite-data/pdf/Bonalli.Cauligi.Bylard.Pavone.ICRA19.pdf 168 | 169 | % Trust-regions: these (either hard or soft) constraints must be added to prevent bad linearizations. 
Also, the size of the trust regions should be adapted during iterations depending on how good the linearization is (follow sections II.B and III.A of [Gu]). 170 | % Put an algorithm that provides SCP. For this, you might copy Algorithm 1 from [Gu]. Explain and justify each line of such algorithm (you can copy what we wrote in [Gu]). 171 | % Spend some words (maybe just a paragraph) to justify the approach. The key fact is to say that when the procedure converges (and to make it converge we may use several numerical tricks, one of them being trust regions), we obtain a local solution (in the sense of the PMP) for the original optimal control problem (I might help you doing this if needed). 172 | 173 | % TODO discuss iLQR/DDP in the context of SCP/transcription methods 174 | 175 | \section{Bibliographic Notes} 176 | 177 | A broad introduction to direct methods for trajectory optimization is presented in \cite{kelly2017transcription}. This tutorial also features a discussion of trajectory optimization for hybrid systems, which we have not discussed in this section, as well as numerical solver features. For a more comprehensive review of direct methods for trajectory optimization by the same author with an emphasis on collocation methods, see \cite{kelly2017introduction}. -------------------------------------------------------------------------------- /2020/tex/source/ch6.tex: -------------------------------------------------------------------------------- 1 | \chapter{Model Predictive Control} 2 | 3 | Both direct and indirect methods for open-loop control result in trajectories that must be tracked with an auxiliary controller, if there is any mismatch between the systems model and the true system. This often results in a decoupling of the auxiliary controller from the original optimal control problem, which may result in performance degradation. Alternatively, the auxiliary controller may not be able to take into account other problem considerations such as state or control constraints. In this section, we introduce model predictive control, which applies the ideas from direct methods for trajectory generation online to iteratively replan, and thus results in a closed-loop controller. 4 | 5 | \section{Overview of MPC} 6 | 7 | Model predictive control entails solving finite-time optimal control problems in a receding horizon fashion (and thus is also frequently referred to as \textit{receding horizon control}). The rough structure of model predictive control algorithms is 8 | \begin{itemize} 9 | \item At each sampling time $t$, solve an \textit{open-loop} optimal control problem over a finite horizon 10 | \item Apply the generated optimal input signal during the subsequent sampling interval $[t,t+1)$ 11 | \item At the next time step $t+1$, solve the new optimal control problem based on new measurements of the state over a shifted horizon 12 | \end{itemize} 13 | 14 | Consider the problem of regulating to the origin the discrete-time linear time-invariant system 15 | \begin{equation} 16 | \st(t+1) = A \st(t) + B \ac(t) 17 | \end{equation} 18 | for $\st(t) \in \mathbb{R}^n$, $\ac(t) \in \mathbb{R}^m$, subject to constraints $\st(t) \in \mathcal{X}, \ac(t) \in \mathcal{U}, t \geq 0$, where $\mathcal{X}, \mathcal{U}$ are polyhedra. We will assume the full state measurement is available at time $t$. 
19 | Given this, we can state the finite-time optimal control problem solved at each stage, $t$, as 20 | \begin{equation} 21 | \begin{aligned} 22 | \label{eq:ocp} 23 | & \underset{\ac_{t\mid t}, \ldots, \ac_{t+N-1\mid t}}{\min} & & \cost_f(\st_{t+N\mid t}) + \sum_{k=0}^{N-1} \cost(\st_{t+k\mid t},\ac_{t+k\mid t}) \\ 24 | & \textrm{s.t.} & & \st_{t+k+1\mid t} = A \st_{t+k\mid t} + B \ac_{t+k\mid t}, \quad k = 0, \ldots, N-1 \\ 25 | & & & \st_{t+k\mid t} \in \mathcal{X}, \quad k = 0, \ldots, N-1 \\ 26 | & & & \ac_{t+k\mid t} \in \mathcal{U}, \quad k = 0, \ldots, N-1 \\ 27 | & & & \st_{t+N\mid t} \in \mathcal{X}_f, \\ 28 | & & & \st_{t\mid t} = \st(t) 29 | \end{aligned} 30 | \end{equation} 31 | for which we write the solution as $\J_t^*(\st(t))$. In this problem, $\st_{t+k\mid t}$ and $\ac_{t+k\mid t}$ are the state and action predicted at time $t+k$ from time $t$. Letting $U^*_{t\to t+N\mid t} \vcentcolon= \{\ac^*_{t\mid t}, \ldots, \ac^*_{t+N-1\mid t}\}$ denote the optimal solution, we take $\ac(t) = \ac^*_{t\mid t}(\st(t))$. This optimization problem is then repeated at time $t+1$, based on the new state $\st_{t+1\mid t+1} = \st(t+1)$. Defining the closed-loop control policy as $\pol_t(\st(t)) \vcentcolon= \ac^*_{t\mid t}(\st(t))$, we have the closed-loop dynamics 32 | \begin{equation} 33 | \st(t+1) = A \st(t) + B \pol_t(\st(t)). 34 | \end{equation} 35 | Thus, the central question of this formulation becomes characterizing the behavior of the closed-loop system defined by this iterative re-optimization. As the problem is time-invariant, we can rewrite the closed-loop dynamics as 36 | \begin{equation} 37 | \st(t+1) = A \st(t) + B \pol(\st(t)). 38 | \end{equation} 39 | 40 | The rough structure of the online model predictive control framework is then as follows: 41 | \begin{enumerate} 42 | \item Measure the state $\st(t)$ at every time $t$ 43 | \item Obtain $U^*_0(\st(t))$ by solving finite-time optimal control problem 44 | \item If $U^*_0(\st(t)) = \emptyset$ then `problem infeasible', stop 45 | \item Apply the first element $\ac^*_0$ of $U^*_0(\st(t))$ to the system 46 | \item Wait for the new sampling time $t+1$ 47 | \end{enumerate} 48 | This framework leads to two main implementation issues. First, the controller may lead us into a situation where after a few steps the finite-time optimal control problem is infeasible, which we refer to as the \textit{persistent feasibility issue}. Even if the feasibility problem does not occur, the generated control inputs may not lead to trajectories that converge to the origin, which we refer to as the \textit{stability issue}. The key question in the analysis of MPC algorithms is how we may guarantee that our ``short-sighted'' control strategy leads to effective long-term behavior. While one possible approach is directly analyzing the closed-loop dynamics, this is in practice very difficult. Our approach will instead be to derive conditions on the terminal function $\cost_f$ and terminal constraint set $\mathcal{X}_f$ so that the persistent feasibility and closed-loop stability are guaranteed. 49 | 50 | \section{Feasibility} 51 | 52 | Model predictive control simplifies the online control optimization problem by solving a shorter horizon problem, as opposed to solving the full optimal control problem online at each timestep. This myopic optimization leads to the possibility that after several steps, the problem may no longer be feasible. 
As such, in this section we will discuss how constraints may be imposed to guarantee so-called \textit{recursive feasibility}, thereby avoiding this problem. 53 | 54 | Let 55 | \begin{align} 56 | \mathcal{X}_0 \vcentcolon= \{\st \in \mathcal{X} \mid \exists (\ac_0, \ldots,& \ac_{N-1}) \,\,\text{s.t.}\,\, \st_k \in \mathcal{X}, \ac_k \in \mathcal{U}, k=0,\ldots,N-1,\\ 57 | & \st_N \in \mathcal{X}_f, \,\text{where}\,\, \st_{k+1} = A \st_k + B \ac_k, k = 0, \ldots, N-1 \nonumber 58 | \} 59 | \end{align} 60 | be the set of feasible initial states. Simply, this set is the set of initial states for which an admissible sequence of control inputs exists that keeps the state in $\mathcal{X}$ and causes the final state to satisfy the terminal constraint. For the autonomous system $\st(t+1) = \phi(\st(t))$ with constraints $\st(t) \in \mathcal{X}$, the one-step controllable set to set $\mathcal{S}$ is defined as 61 | \begin{equation} 62 | \text{Pre}(\mathcal{S}) \vcentcolon= \{\st \in \mathbb{R}^n : \phi(\st)\in \mathcal{S}\}. 63 | \end{equation} 64 | For the system $\st(t+1) = \phi(\st(t),\ac(t))$ with constraints $\st(t) \in \mathcal{X}, \ac(t) \in \mathcal{U}$, the one-step controllable set to set $\mathcal{S}$ is defined as 65 | \begin{equation} 66 | \text{Pre}(\mathcal{S}) \vcentcolon= \{\st \in \mathbb{R}^n : \exists \ac \in \mathcal{U} \,\,\text{s.t.}\,\, \phi(\st,\ac)\in \mathcal{S}\}. 67 | \end{equation} 68 | A set $\mathcal{C} \subseteq \mathcal{X}$ is said to be a control invariant set for the system $\st(t+1) = \phi(\st(t),\ac(t))$ with constraints $\st(t) \in \mathcal{X}, \ac(t) \in \mathcal{U}$, if 69 | \begin{equation} 70 | \st(t) \in \mathcal{C} \implies \exists \ac \in \mathcal{U} \,\,\text{s.t.}\,\, \phi(\st(t),\ac(t)) \in \mathcal{C}, \forall t. 71 | \end{equation} 72 | The set $\mathcal{C}_\infty \subset \mathcal{X}$ is said to be the maximal control invariant set for the system $\st(t+1) = \phi(\st(t),\ac(t))$ with constraints $\st(t) \in \mathcal{X}, \ac(t) \in \mathcal{U}$, if it is control invariant and contains all control invariant sets contained in $\mathcal{X}$\footnote{Control invariant sets can be computed using the MPT toolbox: \url{www.mpt3.org}}. 73 | 74 | We will now proceed to derive critical results on recursive feasibility for linear dynamical systems. We will define the ``truncated'' feasibility set 75 | \begin{align} 76 | \mathcal{X}_1 \vcentcolon= \{\st \in \mathcal{X} \mid \exists (\ac_1, \ldots,& \ac_{N-1}) \,\,\text{s.t.}\,\, \st_k \in \mathcal{X}, \ac_k \in \mathcal{U}, k=1,\ldots,N-1,\\ 77 | & \st_N \in \mathcal{X}_f, \,\text{where}\,\, \st_{k+1} = A \st_k + B \ac_k, k = 1, \ldots, N-1 \nonumber 78 | \}. 79 | \end{align} 80 | Then, we may state the following result on feasibility. 81 | 82 | \begin{lemma}[Persistent Feasibility] 83 | If the set $\mathcal{X}_1$ is a control invariant set for the system $\st(t+1) = A \st(t) + B \ac(t)$, $\st(t) \in \mathcal{X}, \ac(t) \in \mathcal{U}$, then the MPC law is persistently feasible. 84 | \end{lemma} 85 | 86 | \begin{proof} 87 | Note that 88 | \begin{equation} 89 | \text{Pre}(\mathcal{X}_1) \vcentcolon= \{\st \in \mathbb{R}^n : \exists \ac \in \mathcal{U} \,\,\text{s.t.}\,\, A\st + B \ac\in \mathcal{X}_1\}. 90 | \end{equation} 91 | Since $\mathcal{X}_1$ is control invariant, there exists $\ac \in \mathcal{U}$ such that $A \st + B \ac \in \mathcal{X}_1$ for all $\st \in \mathcal{X}_1$. Thus, $\mathcal{X}_1 \subseteq \text{Pre}(\mathcal{X}_1) \cap \mathcal{X}$.
One may write 92 | \begin{equation} 93 | \mathcal{X}_0 = \{\st_0 \in \mathcal{X} \mid \exists \ac_0 \in \mathcal{U}\,\, \text{s.t.}\,\, A \st_0 + B \ac_0 \in \mathcal{X}_1\} = \text{Pre}(\mathcal{X}_1) \cap \mathcal{X}. 94 | \end{equation} 95 | This then implies $\mathcal{X}_1 \subseteq \mathcal{X}_0$. Choose some $\st_0 \in \mathcal{X}_0$. Let $U^*_0$ be the solution to the finite-time optimization problem, and $\ac^*_0$ be the first control. Let $\st_1 = A \st_0 + B \ac_0^*$. Since $U^*_0$ is feasible, one has $\st_1 \in \mathcal{X}_1$. Since $\mathcal{X}_1 \subseteq \mathcal{X}_0$, $\st_1 \in \mathcal{X}_0$, and hence the next optimization problem is feasible. 96 | \end{proof} 97 | 98 | For $N=1$, we may set $\mathcal{X}_f = \mathcal{X}_1$. If the terminal set is chosen to be control invariant, then the MPC problem will be persistently feasible \textit{independent} of the chosen control objectives and parameters. The system designer may then choose the parameters to affect the system performance. The logical question, then, is how to extend this result to $N>1$, for which we have the following result. 99 | 100 | \begin{theorem}[Persistent Feasibility] 101 | If $\mathcal{X}_f$ is a control invariant set for the system $\st(t+1) = A \st(t) + B \ac(t)$, $\st(t) \in \mathcal{X}, \ac(t) \in \mathcal{U}, t\geq 0$, then the MPC law is persistently feasible. 102 | \end{theorem} 103 | 104 | \begin{proof} 105 | We will begin by defining the ``truncated'' feasibility set at step $N-1$, 106 | \begin{align} 107 | \mathcal{X}_{N-1} \vcentcolon= \{\st_{N-1} \in \mathcal{X} \mid \exists & \ac_{N-1} \,\,\text{s.t.}\,\, \st_{N-1} \in \mathcal{X}, \ac_{N-1} \in \mathcal{U},\\ 108 | & \st_N \in \mathcal{X}_f, \,\text{where}\,\, \st_N = A \st_{N-1} + B \ac_{N-1} \nonumber 109 | \}. 110 | \end{align} 111 | Due to the terminal constraint, we have $A \st_{N-1} + B \ac_{N-1} = \st_N \in \mathcal{X}_f$. Since $\mathcal{X}_f$ is a control invariant set, there exists an $\ac \in \mathcal{U}$ such that $\st^+ = A \st_N + B \ac \in \mathcal{X}_f$. This is precisely the requirement that $\st_N \in \mathcal{X}_{N-1}$, i.e., $\mathcal{X}_f \subseteq \mathcal{X}_{N-1}$; hence, from any state in $\mathcal{X}_{N-1}$ there exists an admissible control that keeps the successor state in $\mathcal{X}_{N-1}$. Thus, $\mathcal{X}_{N-1}$ is control invariant. Repeating this argument, one can recursively show that $\mathcal{X}_{N-2}, \ldots, \mathcal{X}_1$ are control invariant, and the persistent feasibility lemma then applies. 112 | \end{proof} 113 | 114 | Practically, we introduce the terminal set $\mathcal{X}_f$ artificially for the purpose of leading to a sufficient condition for persistent feasibility. We would like to choose it to be large, so that it avoids compromising closed-loop performance. 115 | 116 | \section{Stability} 117 | 118 | Persistent feasibility does not guarantee that the closed-loop trajectories converge toward the desired equilibrium point. One of the most popular approaches to guarantee persistent feasibility and stability of the MPC law makes use of a control invariant terminal set $\mathcal{X}_f$ for feasibility, and a terminal function $\cost_f(\cdot)$ for stability. To prove stability, we leverage Lyapunov stability theory. 119 | 120 | \begin{theorem}[Lyapunov Stability] 121 | \label{thm:lyap_stability} 122 | Consider the equilibrium point $\st = 0$ for the autonomous system $\st_{k+1} = \f(\st_k)$ (with $\f(0)=0$). Let $\Omega \subset \mathbb{R}^n$ be a closed and bounded set containing the origin.
Let $V:\mathbb{R}^n \to \mathbb{R}$ be a function, continuous at the origin, such that 123 | \begin{align} 124 | & V(0) = 0\, \text{and} \,\,V(\st) > 0, \,\, \forall \st \in \Omega \setminus \{0\} \\ 125 | & V(\st_{k+1}) - V(\st_k) < 0, \,\, \forall \st_k \in \Omega \setminus \{0\}. 126 | \end{align} 127 | Then $\st=0$ is asymptotically stable in $\Omega$. 128 | \end{theorem} 129 | 130 | We will utilize this result to show that with appropriate choices of $\mathcal{X}_f$ and $\cost_f(\cdot)$, $\J_0^*$ is a Lyapunov function for the closed-loop system. 131 | 132 | \begin{theorem}[MPC Stability (for Quadratic Cost)] \label{thm:mpc_stability} 133 | Assume 134 | \begin{enumerate} 135 | \item $Q = Q^T > 0, R = R^T >0, Q_f > 0$ 136 | \item Sets $\mathcal{X}, \mathcal{X}_f$, and $\mathcal{U}$ contain the origin in their interior and are closed 137 | \item $\mathcal{X}_f \subseteq \mathcal{X}$ is control invariant 138 | \item $\min_{\bm{v} \in \mathcal{U}, A \st + B \bm{v} \in \mathcal{X}_f} \left\{ -\cost_f(\st) + \cost(\st,\bm{v}) + \cost_f(A\st + B\bm{v}) \right\} \leq 0, \forall \st \in \mathcal{X}_f$. 139 | \end{enumerate} 140 | Then, the origin of the closed-loop system is asymptotically stable with domain of attraction $\mathcal{X}_0$. 141 | \end{theorem} 142 | 143 | \begin{proof} 144 | Note that via assumption 3, persistent feasibility is guaranteed for any $Q_f, Q, R$. We want to show that $\J_0^*$ is a Lyapunov function for the closed-loop system $\st(t+1) = \f_{cl}(\st(t)) = A\st(t) + B \pol(\st(t))$, with respect to the equilibrium $\f_{cl}(0) = 0$ (the origin is indeed an equilibrium as $0 \in \mathcal{X}, 0 \in \mathcal{U}$, and the cost is positive for any non-zero control sequence). Note also that $\mathcal{X}_0$ is closed and bounded, and $\J_0^*(0)=0$, both by assumption. Note also that $\J^*_0(\st)>0$ for all $\st \in \mathcal{X}_0 \setminus \{0\}$. 145 | 146 | We will now show the decay property. Since the setup is time-invariant, we can study the decay property between $t=0$ and $t=1$. Let $\st(0) \in \mathcal{X}_0$, let $U_0^{[0]} = [\ac^{[0]}_0,\ldots,\ac^{[0]}_{N-1}]$ be the optimal control sequence, and let $[\st(0), \ldots, \st^{[0]}_{N}]$ be the corresponding trajectory. After applying $\ac^{[0]}_0$, one obtains $\st(1) = A \st(0) + B \ac^{[0]}_0$. Now, consider the control sequence $[\ac^{[0]}_1,\ldots,\ac^{[0]}_{N-1}, \bm{v}]$, where $\bm{v}\in \mathcal{U}$, with corresponding state trajectory $[\st(1), \ldots, \st^{[0]}_{N}, A\st^{[0]}_{N} + B \bm{v}]$. Since $\st^{[0]}_{N} \in \mathcal{X}_f$ (by the terminal constraint), and since $\mathcal{X}_f$ is control invariant, 147 | \begin{equation} 148 | \exists \bar{\bm{v}} \in \mathcal{U}\mid A \st^{[0]}_{N} + B \bar{\bm{v}} \in \mathcal{X}_f. 149 | \end{equation} 150 | With such a choice of $\bar{\bm{v}}$, the sequence $[\ac^{[0]}_1,\ldots,\ac^{[0]}_{N-1}, \bar{\bm{v}}]$ is feasible for the MPC optimization problem at time $t=1$. Since this sequence is not necessarily optimal, 151 | \begin{equation} 152 | \J_0^*(\st(1)) \leq \cost_f(A \st^{[0]}_{N} + B \bar{\bm{v}}) + \sum_{k=1}^{N-1} \cost(\st^{[0]}_{k}, \ac^{[0]}_{k}) + \cost(\st^{[0]}_{N}, \bar{\bm{v}}).
153 | \end{equation} 154 | Equivalently, 155 | \begin{equation} 156 | \J_0^*(\st(1)) \leq \cost_f(A \st^{[0]}_{N} + B \bar{\bm{v}}) + \J^*_0(\st(0)) - \cost_f(\st^{[0]}_{N}) - \cost(\st(0), \ac^{[0]}_{0}) + \cost(\st^{[0]}_{N}, \bar{\bm{v}}) 157 | \end{equation} 158 | Since $\st^{[0]}_{N} \in \mathcal{X}_f$ by assumption, we can select $\bar{\bm{v}}$ such that 159 | \begin{equation} 160 | \J_0^*(\st(1)) \leq \J_0^*(\st(0)) - \cost(\st(0), \ac^{[0]}_{0}). 161 | \end{equation} 162 | Since $\cost(\st(0), \ac^{[0]}_{0})>0$ for all $\st(0) \in \mathcal{X}_0 \setminus \{0\}$, 163 | \begin{equation} 164 | \J_0^*(\st(1)) - \J_0^*(\st(0)) < 0. 165 | \end{equation} 166 | The last step is to prove continuity, for which we omit the details and refer the reader to \cite{borrelli2017predictive}. 167 | \end{proof} 168 | 169 | \subsection{Choosing $\mathcal{X}_f$ and $Q_f$} 170 | 171 | We will look at two cases. First, we will assume that $A$ is asymptotically stable. Then, we set $\mathcal{X}_f$ as the maximally positive invariant set $\mathcal{O}_\infty$ for the system $\st(t+1) = A \st(t), \st(t) \in \mathcal{X}$. The set $\mathcal{X}_f$ is a control invariant set for the system $\st(t+1) = A \st(t) + B \ac(t)$ as $\ac=0$ is a feasible control. As for stability, $\ac=0$ is feasible and $A \st \in \mathcal{X}_f$ if $\st \in \mathcal{X}_f$, thus assumption 4 of Theorem \ref{thm:mpc_stability} becomes 172 | \begin{equation} 173 | -\st^T Q_f \st + \st^T Q \st + \st^T A^T Q_f A \st \leq 0, \, \forall \st \in \mathcal{X}_f 174 | \end{equation} 175 | which is true since, due to the fact that A is asymptotically stable, 176 | \begin{equation} 177 | \exists Q_f > 0 \mid - Q_f + Q + A^T Q_f A = 0 178 | \end{equation} 179 | 180 | Next, we will look at the general case. Let $L_\infty$ be the optimal gain for the infinite-horizon LQR controller. Set $\mathcal{X}_f$ as the maximal positive invariant set for system $\st(t+1) = (A + B L_\infty) \st(t)$ (with constraints $\st(t) \in \mathcal{X}, L_\infty \st(t) \in \mathcal{U}$). Then, set $Q_f$ as the solution $Q_\infty$ to the discrete-time Riccati equation. 181 | 182 | \subsection{Explicit MPC} 183 | 184 | In some cases, the MPC law can be pre-computed, which removes the need for online optimization. An important case of this is that of constrained LQR, in which we wish to solve the optimal control problem 185 | \begin{equation} 186 | \begin{aligned} 187 | \label{eq:ocp} 188 | & \underset{\ac_{0}, \ldots, \ac_{N-1}}{\min} & & \st_N^T Q_f \st_N + \sum_{k=0}^{N-1} \st_k^T Q \st_k + \ac_k^T R \ac_k\\ 189 | & \textrm{s.t.} & & \st_{k+1} = A \st_{k} + B \ac_{k}, \quad k = 0, \ldots, N-1 \\ 190 | & & & \st_{k} \in \mathcal{X}, \quad k = 0, \ldots, N-1 \\ 191 | & & & \ac_{k} \in \mathcal{U}, \quad k = 0, \ldots, N-1 \\ 192 | & & & \st_{N} \in \mathcal{X}_f, \\ 193 | & & & \st_{0} = \st 194 | \end{aligned} 195 | \end{equation} 196 | The solution to the constrained LQR problem is a control $\ac^*$ which is a continuous piecewise affine function on polyhedral partition of the state space $\mathcal{X}$, that is $\ac^* = \pol(\st)$, where 197 | \begin{equation} 198 | \pol(\st) = L^j \st + \bm{l}^j\,\,\text{if}\,\, H^j\st \leq K^j, \, j = 1, \ldots, N^r. 199 | \end{equation} 200 | Thus, online, one has to locate in which cell of the polyhedral partition the state $\st$ lies, and then one obtains the optimal control via a look-up table query. 
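
In contrast to the explicit solution, the standard implicit implementation simply re-solves the constrained finite-time problem online at every sampling time. The following minimal receding-horizon sketch is illustrative only: it assumes a double-integrator model, box constraints standing in for $\mathcal{X}$, $\mathcal{U}$, and $\mathcal{X}_f$, and the CVXPY modeling package; all values are chosen for the example.
\begin{verbatim}
import numpy as np
import cvxpy as cp

# illustrative double-integrator model and weights
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
n, m, N = 2, 1, 10
Q, R, Qf = np.eye(n), 0.1 * np.eye(m), 10.0 * np.eye(n)
x_max, u_max = 5.0, 1.0

def solve_ftocp(x0):
    # finite-time optimal control problem solved at each sampling time
    x = cp.Variable((n, N + 1))
    u = cp.Variable((m, N))
    cost, cons = 0, [x[:, 0] == x0]
    for k in range(N):
        cost += cp.quad_form(x[:, k], Q) + cp.quad_form(u[:, k], R)
        cons += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                 cp.norm(x[:, k], "inf") <= x_max,
                 cp.norm(u[:, k], "inf") <= u_max]
    cost += cp.quad_form(x[:, N], Qf)
    cons += [cp.norm(x[:, N], "inf") <= x_max]  # stand-in for X_f
    cp.Problem(cp.Minimize(cost), cons).solve()
    return u[:, 0].value

# receding-horizon loop: apply the first input, measure, and re-solve
x = np.array([3.0, 0.0])
for t in range(30):
    u0 = solve_ftocp(x)
    x = A @ x + B @ u0
\end{verbatim}
In a complete implementation, $\mathcal{X}_f$ and $Q_f$ would be chosen as discussed above (e.g., as a maximal positive invariant set together with the Riccati solution) rather than as a simple state bound, so that the persistent feasibility and stability guarantees apply.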
201 | 202 | \section{Bibliographic Notes} 203 | 204 | We refer the reader to \cite{borrelli2017predictive} and \cite{rawlings2017model} for two broad and comprehensive treatments of the topic. -------------------------------------------------------------------------------- /2020/tex/source/ch7.tex: -------------------------------------------------------------------------------- 1 | \chapter{Introduction} 2 | 3 | 4 | \section{Learning in Optimal Control} 5 | 6 | % State the adaptive optimal control problem. Lead into information gain. Talk about comparison to passive adaptive control? Talk about MRAC adaptive control? 7 | 8 | % lay out episodic vs non-episodic setting (+ goals of each) 9 | 10 | 11 | % We will consider the finite-time, stochastic, discrete-time adaptive optimal control problem with full state measurement. Thus, we consider the problem 12 | % \begin{equation} 13 | % \begin{aligned} 14 | % \label{eq:ocp} 15 | % & \underset{\ac_{0:N-1}}{\min} & & \E_{\w_{0:N-1}} \left[ \sum_{k=0}^{N-1} \cost(\st_k,\ac_k) \right]\\ 16 | % & \textrm{s.t.} & & \st_{k+1} = \f(\st_k, \ac_k, \w_k; \param),\,\, t = 0, \ldots, N-1\\ 17 | % % & & & \st(0) = \st_0\\ 18 | % % & & & \st(t_f) \in \mathcal{M}_f\\ 19 | % % & & & \ac(t) \in \mathcal{U}, t\in [0,t_f] 20 | % \end{aligned} 21 | % \end{equation} 22 | % where we have dropped the time-dependence of the cost and dynamics, as well as a terminal cost, for simplicity. We have also dropped the control and state constraints. The term $\w_k$ is a stochastic disturbance. Here, $\param$ is a vector of unknown parameters that govern state evolution. For example, one may be unsure of the inertial characteristics of a robotic arm, or the drag coefficient of an aircraft. One difference in the literature is between whether or not a prior distribution is placed on $\param$, and we will discuss both cases. 23 | 24 | % Note that we have defined the adaptive optimal control problem of minimizing expected control cost over \textit{one episode}, which is to say you only interact with the system a single time, for $N$ timesteps, during which $\param$ is fixed. This contrasts with the usual reinforcement learning setting, which we will discuss in the next chapter, for which an agent interacts with the system for multiple episodes (for which $\param$ is fixed for all episodes). Note, however, that this is not a universal definition of reinforcement learning, and is simply a definition we choose to impose to clarify the two settings we discuss. Thus, a fundamental distinction between adaptive optimal control and reinforcement learning is that adaptive optimal control must perform adaptation online, whereas reinforcement learning may follow a policy, without online adaptation, for an entire episode. 25 | 26 | % In this chapter we will discuss several approaches to this problem, both heuristic and representing optimal or near-optimal solutions. The heuristic approaches are practical alternatives to the optimal/near-optimal approaches, which were largely discarded due to mathematical complexity or practical difficulties. We will discuss recent work on these methods, and the current research state-of-the-art. 27 | 28 | 29 | \subsection{What Should we Learn?} 30 | 31 | % From the previous topics addressed in this course, the question of what we should learn may seem surprising. Surely, we should identify the unknown parameters, and then perform optimal control? While that will be the bulk of the discussion in the next two sections, it is not the only possible approach. 
Within both the adaptive control literature and the reinforcement learning literature, large bodies of work (perhaps even the majority of each topics' literature) is focused on direct adaptation of the control policy, and does not attempt to identify the unknown parameters. An example of this in adaptive control is adaptive pole placement. Direct adaptation of a control policy is typically referred to as direct adaptive control in the control literature, and as model-free reinforcement learning in the RL literature. Adaptive control via model identification is typically referred to as indirect adaptive control or model-based RL. 32 | 33 | % What else could we learn? The value function is one quantity appearing regularly in the previous chapters. However, access to the value function is not actionable without model knowledge; we wish to choose some $\ac$ to minimize 34 | % \begin{equation} 35 | % Q(\st,\ac) = \E\left[ \cost(\st,\ac) + \J(\st') \right] 36 | % \end{equation} 37 | % which we refer to as the $Q$ function or the state-action value function. Thus, without access to knowledge of the probability of $\st'$ given $(\st,\ac)$, we can not optimize $Q$. An alternative approach that is common in reinforcement learning is to directly learn the $Q$ function. In discrete control settings (for which there are a finite number of actions), this is reasonable and often quite efficient. Because we must maximize over $Q$, we can simply evaluate each possible action. For a continuous action space, we must either attempt to solve the non-convex optimization problem $\max_{\ac} Q(\st,\ac)$, or we must discretize our action space (and thus exposing ourselves to the curse of dimensionality). There are a handful of special cases in which the action space is continuous but the maximization can be solved efficiently, which we will discuss in the next chapter. 38 | 39 | % % avoid learning, do robust control 40 | % Finally, a question that a control system designer should ask themselves is whether a learning-based or adaptive control scheme is necessary. First, standard feedback control is often sufficient to compensate for small model errors. Moreover, if outer bounds on the unknown parameter are available and achieving near-optimal system performance is not necessary, one may wish to use a robust control strategy as opposed to an adaptive one. Verification of robust control strategies has been a key line of work in control theory in the previous three decades, and many practical approaches exist (primarily for linear systems). We refer the reader to \cite{zhou1996robust} for a treatment of robust control theory (viewed through the lens of optimal control). 41 | 42 | 43 | % We will begin by discussing the broad learning control/adaptive optimal control problem (and briefly mention the approaches that will be discussed in the following chapter). While there are many ways to state this problem (discrete-time versus continuous, with or without full state information, etc.), we will fix the following control setting for the bulk of the following two chapters. 
44 | 45 | 46 | \subsection{Episodes and Data Collection} 47 | 48 | % Outline the reinforcement learning problem setting 49 | 50 | % use temporally stationary value function 51 | 52 | 53 | % One or multiple episodes 54 | 55 | % System identification versus adaptive control versus reinforcement learning 56 | 57 | \section{Bibliographic Notes} -------------------------------------------------------------------------------- /2020/tex/source/ch9.tex: -------------------------------------------------------------------------------- 1 | \chapter{System Identification} 2 | 3 | % % rewrite 4 | % The problem of having an unknown or partially-known model for dynamical systems is not an uncommon one. 5 | % A standard control engineering pipeline, having been given a system to control, is as follows. 6 | % First the engineer will attempt to build a model of the dynamics based on known relations, such as e.g. physics, which relies on certain parameters. 7 | % Then (often because some parameters governing the dynamics can not easily be directly measured), the engineer will choose control inputs to the system and measure the state evolution, and from this, estimate the parameters. 8 | % As a simple example, one can consider identifying a linear model of the form 9 | % \begin{equation} 10 | % \st_{k+1} = A \st_k + B \ac_k + \w_k, 11 | % \end{equation} 12 | % where $A$ (for example) is unknown. Then, after having operated the system from time $k = 0, \ldots, N$, one can choose 13 | % \begin{equation} 14 | % \hat{A} = \argmin_{A}\{\sum_{k=0}^{N-1} \|\st_{k+1} - B \ac_k - A \st_k\|_2^2\} 15 | % \end{equation} 16 | % as a system estimate (note that this can be solved via least squares). 17 | 18 | % This approach raises several questions. 19 | % \begin{itemize} 20 | % \item How much data is required to achieve a good estimate of $A$? Moreover, how can we quantify a ``good estimate'' of $A$? Indeed, in the control setting we are usually indifferent to the parameter estimate, and are primarily concerned with the performance of the resulting controller. 21 | % \item How should we design our system inputs $\ac_{0:N-1}$? The system identification approach typically assumes that an engineer is choosing the inputs\footnote{The choice of optimal inputs to estimate the system under some cost function is typically referred to as the \textit{design of experiments}} and monitoring the system behavior, and so the stability of the unknown system under the choice of inputs is not considered, but what if these assumptions were not satisfied? 22 | % \item What if our system does not fall into the class of models that we are considering? For example, what if we mistakenly believed that our model was linear while in fact it exhibits nonlinear characteristics? 23 | % \end{itemize} 24 | % For the purposes of this class, the most pressing question is how to achieve good performance of the model when the operation of that system is done concurrently with estimation. This differs from the system identification setting in which the estimation task is performed before operation of the system, and the performance of the system during the estimation procedure is not considered. Moreover it also implies that the system identification phase eventually ends before transitioning to the operational phase. Note also that system identification as a discipline covers a wide variety of techniques, much broader than those alluded to here. We refer the reader to \cite{ljung1999system} for a more thorough discussion. 
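
As a minimal numerical illustration of the least-squares identification step for a linear model $\st_{k+1} = A \st_k + B \ac_k + \w_k$ with $B$ known (the system, noise level, and input choice below are assumptions made purely for the example, using NumPy):
\begin{verbatim}
import numpy as np

# illustrative system; A is treated as unknown and estimated from data
rng = np.random.default_rng(0)
n, m, T = 2, 1, 200
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [0.5]])

X = np.zeros((T + 1, n))
U = rng.normal(size=(T, m))      # random (exciting) inputs
for k in range(T):
    w = 0.01 * rng.normal(size=n)
    X[k + 1] = A_true @ X[k] + B @ U[k] + w

# regression: targets x_{k+1} - B u_k, regressors x_k
Y = X[1:] - U @ B.T              # shape (T, n)
Phi = X[:-1]                     # shape (T, n)
A_hat = np.linalg.lstsq(Phi, Y, rcond=None)[0].T
\end{verbatim}
Whether such an estimate is accurate depends on how informative the inputs are, which foreshadows the discussion of persistent excitation below.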
25 | 26 | % TODO add discussion of convergence results 27 | 28 | 29 | \section{Linear System Identification} 30 | 31 | % discuss system ID with and without observations 32 | % without: least squares 33 | 34 | \subsection{Persistent Excitation} 35 | 36 | \subsection{Linear Systems with Observations} 37 | % with: EM 38 | 39 | \section{Nonlinear System Identification} 40 | 41 | % how to learn a model: 42 | % MDPs vs POMDP models 43 | % MDP: low dim or high dim (direct video prediction) 44 | % difficulties of learning high dim model; examples of video prediction 45 | % POMDP: E2C, RCE, Planet 46 | % different loss functions 47 | % reconstruction 48 | % contrastive 49 | % others? 50 | 51 | \section{Bibliographic Notes} 52 | 53 | % astrom book -------------------------------------------------------------------------------- /2020/tex/source/intro.tex: -------------------------------------------------------------------------------- 1 | \section*{Foreword} 2 | 3 | These notes accompany the newly revised (Spring 2019 to current) version of \textit{AA 203: Optimal and Learning-Based Control} at Stanford. The goal of this new course is to present a unified treatment of optimal control and reinforcement learning (RL), with an emphasis on model-based reinforcement learning. The aim of the instructors is to unify these subjects as much as possible and to make the connections between the two research communities concrete. 4 | 5 | \paragraph{How is this course different from a standard class on Optimal Control?} 6 | 7 | First, we will emphasize practical computational tools for real-world optimal control problems, such as model predictive control and sequential convex programming. Beyond this, the last third of the course focuses on the case in which an exact model of the system is not available. We will discuss this setting both in the online context (typically referred to as adaptive optimal control) and in the episodic context (the typical setting for reinforcement learning). 8 | 9 | \paragraph{How is this course different from a standard class on Reinforcement Learning?} 10 | 11 | Many courses on reinforcement learning focus primarily on the setting of discrete Markov Decision Processes (MDPs), whereas we will focus primarily on continuous MDPs. More importantly, the focus on discrete MDPs means that planning with a known model (which is typically referred to as ``planning'' or ``control'' in RL) is relatively simple. In this course, we will spend considerably more time on planning with a known model, in both continuous and discrete time. Finally, the focus of this course will primarily be on model-based methods. We will touch briefly on model-free methods at the end, as well as on combinations of model-free and model-based approaches. 12 | 13 | \subsection*{A Note on Notation} 14 | 15 | The notation and language used in the control theory and reinforcement learning communities vary substantially, and so we will state all of the notational choices we make in this section. First, optimal control problems are typically stated in terms of minimizing a cost function, whereas reinforcement learning problems aim to maximize a reward. These are mathematically equivalent formulations, where one objective is simply the negation of the other. Herein, we will use the control-theoretic convention of cost minimization. We write $\cost$ for the cost function, $\f$ for the system dynamics, and denote the state and action at time $t$ as $\st_t$ and $\ac_t$, respectively.
We write scalars as lowercase letters, vectors as bold lowercase letters, and matrices as uppercase letters. We write a deterministic policy as $\pol(\st)$, and a stochastic policy as $\pol(\ac\mid\st)$. 16 | We write the cost-to-go (the negation of the value function) associated with policy $\pol$ at time $t$ and state $\st$ as $\J^\pol_t(\st)$. We will also sometimes refer to the cost-to-go as the value, but in these notes we always mean the expected sum of future costs. 17 | For an in-depth discussion of the notational and language differences between the artificial intelligence and control theory communities, we refer the reader to \cite{powell2012ai}. 18 | 19 | For notational convenience, we will write the Hessian of a function $f(x)$, evaluated at $x^*$, as $\nabla^2 f(x^*)$. 20 | 21 | \subsection*{Prerequisites} 22 | 23 | While these notes aim to be almost entirely self-contained, familiarity with undergraduate-level calculus, differential equations, and linear algebra (equivalent to CME 102 and EE 263 at Stanford) is assumed. We will briefly review nonlinear optimization in the first section of these notes, but previous experience with optimization (e.g., EE 364A) will be helpful. Finally, previous experience with machine learning (at the level of CS 229) is beneficial. 24 | 25 | \subsection*{Omissions} 26 | 27 | This course (and these notes) aims to cover the content of at least three distinct fields, each with many papers published every year\footnote{We primarily include references to the literature in adaptive control, optimal control, and reinforcement learning, but related work is also published in economics, neuroscience, operations research, and quantitative finance, as well as many other fields and sub-fields.}. As a consequence, we skip over many topics. At present, we avoid covering: 28 | \begin{itemize} 29 | \item \textbf{Motion planning beyond trajectory optimization}, including sampling-based motion planning. For this we refer the reader to the excellent book by LaValle \cite{lavalle2006planning}. 30 | \item \textbf{Lyapunov analysis and stability analysis in adaptive control}. We refer the reader to \cite{aastrom2013adaptive,ioannou2012robust}. 31 | \item \textbf{Imitation learning}. 32 | \end{itemize} 33 | 34 | \subsection*{Acknowledgments} 35 | 36 | We acknowledge the students of the 2019 iteration of AA 203, who pointed out many typos. We also acknowledge the former course assistants of AA 203 who helped in the preparation of the material covered in this class and in its development---in particular, Ed Schmerling, Federico Rossi, Sumeet Singh, and Jonathan Lacotte. -------------------------------------------------------------------------------- /notes.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/StanfordASL/AA203-Notes/f53acbcd87bd1144ac4e8d8d2bc1ad9e6ac2a1db/notes.pdf -------------------------------------------------------------------------------- /readme.md: -------------------------------------------------------------------------------- 1 | # AA203: Optimal and Learning-based Control Course Notes 2 | 3 | This repository contains the in-progress course notes for the Spring 2020 version of AA203 at Stanford. If anything is unclear or incorrect, please raise an issue. 4 | --------------------------------------------------------------------------------