├── .gitignore
├── Examples
│   ├── Lecture 1 - black box solvers.ipynb
│   ├── Lecture 2 - LQ problems with KKT and QP.ipynb
│   ├── Lecture 4 - Newton method to approximate a square root.ipynb
│   ├── Lecture 4 - Optimization Agorithm - Rosenbrock function.ipynb
│   └── Rapid Neural Networks Tutorial.ipynb
├── Homework
│   ├── Series 1
│   │   ├── Exercise_4.ipynb
│   │   ├── quadrotor.png
│   │   ├── quadrotor.py
│   │   └── series1.pdf
│   ├── Series 2
│   │   ├── exercise 3_pendulum.ipynb
│   │   ├── pendulum.png
│   │   ├── pendulum.py
│   │   └── series2.pdf
│   ├── Series 3
│   │   └── series3.pdf
│   └── Series 4
│       ├── Series 4 - Q-learning.ipynb
│       ├── pendulum.png
│       └── pendulum.py
├── Project 1
│   ├── project1.ipynb
│   ├── quadrotor.png
│   └── quadrotor.py
├── Project 2
│   ├── project2.ipynb
│   ├── quadrotor.png
│   └── quadrotor.py
└── README.md
/.gitignore:
--------------------------------------------------------------------------------
1 | .ipynb_checkpoints
2 | __pycache__
3 | *~
4 | .DS_store
5 |
--------------------------------------------------------------------------------
/Homework/Series 1/Exercise_4.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "f6686912-4070-4746-8dc8-4a8405ae9b83",
6 | "metadata": {},
7 | "source": [
8 | "# Exercise 4\n",
9 | "In this exercise, we will control a 2D drone depicted in the figure below
\n",
10 | "
\n",
11 | "\n",
12 | "Consider the following simplified linear model of the drone\n",
13 | "$$\\begin{align}\n",
14 | " x_{n+1} = A\n",
15 | " x_n + B u_n\n",
16 | "\\end{align}$$\n",
17 | "\n",
18 | "where the components of the state $x_n$ correspond to the horizontal position and velocity of the quadrotor, its vertical position and velocity and its orientation and angular velocity. The control vector\n",
19 | "$u_n$ contains the forces produced by the rotors (our control inputs).\n",
20 | "The module ```quadrotor.py``` defines the matrices $A$ and $B$ and code to generate a simulation of the drone."
21 | ]
22 | },
23 | {
24 | "cell_type": "code",
25 | "execution_count": 1,
26 | "id": "e293bd98-ed7a-4957-9212-42fdee7481ff",
27 | "metadata": {},
28 | "outputs": [],
29 | "source": [
30 | "## what we need to do computation and display the drone\n",
31 | "%matplotlib widget\n",
32 | "\n",
33 | "import numpy as np\n",
34 | "import matplotlib.pyplot as plt\n",
35 | "\n",
36 | "import quadrotor"
37 | ]
38 | },
39 | {
40 | "cell_type": "code",
41 | "execution_count": 2,
42 | "id": "2dcecf8d-4dfa-4701-ad91-a81dc5bca79a",
43 | "metadata": {},
44 | "outputs": [
45 | {
46 | "name": "stdout",
47 | "output_type": "stream",
48 | "text": [
49 | "A =\n",
50 | " [[ 1. 0.01 0. 0. 0. 0. ]\n",
51 | " [ 0. 1. 0. 0. -0.0981 0. ]\n",
52 | " [ 0. 0. 1. 0.01 0. 0. ]\n",
53 | " [ 0. 0. 0. 1. 0. 0. ]\n",
54 | " [ 0. 0. 0. 0. 1. 0.01 ]\n",
55 | " [ 0. 0. 0. 0. 0. 1. ]]\n",
56 | "B =\n",
57 | " [[ 0. 0. ]\n",
58 | " [ 0. 0. ]\n",
59 | " [ 0. 0. ]\n",
60 | " [ 0.02 0.02 ]\n",
61 | " [ 0. 0. ]\n",
62 | " [ 0.015 -0.015]]\n"
63 | ]
64 | }
65 | ],
66 | "source": [
67 | "# the matrices A and B are already defined in the quadrotor module\n",
68 | "print(f'A =\\n {quadrotor.A}')\n",
69 | "print(f'B =\\n {quadrotor.B}')"
70 | ]
71 | },
72 | {
73 | "cell_type": "code",
74 | "execution_count": 3,
75 | "id": "05803878-65b0-4981-af8d-81ffea3ac795",
76 | "metadata": {},
77 | "outputs": [
78 | {
79 | "data": {
80 | "text/html": [
81 | ""
257 | ]
258 | },
259 | "metadata": {},
260 | "output_type": "display_data"
261 | }
262 | ],
263 | "source": [
264 | "# we can display the behavior of the drone based on a given control input\n",
265 | "\n",
266 | "# we can also start at x0 = 0 and generate a random control of 0 for 300 time steps\n",
267 | "x0 = np.zeros((6,1))\n",
268 | "u = np.random.rand(2,300)\n",
269 | "\n",
270 | "quadrotor.animate_robot(x0, u, goal=[3,3])"
271 | ]
272 | },
273 | {
274 | "cell_type": "markdown",
275 | "id": "3152fe88-ba4f-4864-90f6-17f9f20c8875",
276 | "metadata": {},
277 | "source": [
278 | "We want to generate a control input will move the drone towards the point $(3,3)$ starting from $(0,0)$. In order to\n",
279 | "do so, we define a cost that penalizes the distance to the goal at each time step while minimizing velocities and the amount of trust needed,\n",
280 | "i.e. we want to solve the following optimal control problem\n",
281 | "$$\\begin{align}\n",
282 | "&\\min_{x_0, u_0, x_1, u_1, \\cdots} \\frac{1}{2}\\sum_{n=0}^{N} (x_n - x_{desired})^T Q (x_n - x_{desired}) + u_n^T R u_n\\\\\n",
283 | "\\textrm{subject to}\\ \\ & x_{n+1} = A x_{n} + B u_n\\\\\n",
284 | "& x_0 = [0,0,0,0,0,0]^T\n",
285 | "\\end{align}$$\n",
286 | "where $x_{desired} = [3,0,3,0,0,0]^T$\n",
287 | "\n",
288 | "1. Write down the KKT conditions for the problem\n",
289 | "2. Write code to solve the problem for N=500 time steps (you will need to find diagonal matrices $Q>0$ and $R>0$ to create a nice movement) by solving the KKT system of linear equations. Do NOT inverse the KKT matrix, instead use the [NumPy solve function](https://numpy.org/doc/stable/reference/generated/numpy.linalg.solve.html) which is numerically more stable and efficient.\n",
290 | "4. Show plots of all the states of the robot as a function of time\n",
291 | "5. Show plots of the optimal control as a function of time"
292 | ]
293 | },
294 | {
295 | "cell_type": "code",
296 | "execution_count": 4,
297 | "id": "049e472c-0b01-42fe-9b61-cb393750aa5c",
298 | "metadata": {},
299 | "outputs": [],
300 | "source": [
301 | "## write down your code here"
302 | ]
303 | }
304 | ],
305 | "metadata": {
306 | "kernelspec": {
307 | "display_name": "Python 3 (ipykernel)",
308 | "language": "python",
309 | "name": "python3"
310 | },
311 | "language_info": {
312 | "codemirror_mode": {
313 | "name": "ipython",
314 | "version": 3
315 | },
316 | "file_extension": ".py",
317 | "mimetype": "text/x-python",
318 | "name": "python",
319 | "nbconvert_exporter": "python",
320 | "pygments_lexer": "ipython3",
321 | "version": "3.12.6"
322 | }
323 | },
324 | "nbformat": 4,
325 | "nbformat_minor": 5
326 | }
327 |
--------------------------------------------------------------------------------
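For reference, here is a minimal sketch of how the KKT system of Exercise 4 can be assembled and solved with `numpy.linalg.solve`. The weights `Q_w` and `R_w` and the stacking order of the decision variable are illustrative assumptions, not values prescribed by the exercise:

```python
# Sketch (not the official solution): LQ optimal control via the KKT system.
# Decision variable z = [u_0, x_1, u_1, x_2, ..., u_{N-1}, x_N].
import numpy as np
import quadrotor

A, B = quadrotor.A, quadrotor.B
nx, nu, N = 6, 2, 500
x_des = np.array([3., 0., 3., 0., 0., 0.])
Q_w = np.diag([10., 1., 10., 1., 1., 1.])  # assumed diagonal weights, to be tuned
R_w = 0.1 * np.eye(nu)                     # assumed control weight, to be tuned

nz = N * (nx + nu)
H = np.zeros((nz, nz))      # Hessian of the quadratic cost
h = np.zeros(nz)            # linear term coming from (x - x_des)^T Q (x - x_des)
G = np.zeros((N * nx, nz))  # dynamics constraints written as G z = g
g = np.zeros(N * nx)

for n in range(N):
    iu = n * (nx + nu)      # index of u_n in z
    ix = iu + nu            # index of x_{n+1} in z
    H[iu:iu + nu, iu:iu + nu] = R_w
    H[ix:ix + nx, ix:ix + nx] = Q_w
    h[ix:ix + nx] = -Q_w @ x_des
    # row block n encodes x_{n+1} - A x_n - B u_n = 0 (x_0 = 0 removes the A x_0 term)
    G[n * nx:(n + 1) * nx, ix:ix + nx] = np.eye(nx)
    G[n * nx:(n + 1) * nx, iu:iu + nu] = -B
    if n > 0:
        ixp = (n - 1) * (nx + nu) + nu  # index of x_n in z
        G[n * nx:(n + 1) * nx, ixp:ixp + nx] = -A

# KKT system [H G^T; G 0] [z; lam] = [-h; g], solved without inverting the matrix
KKT = np.block([[H, G.T], [G, np.zeros((N * nx, N * nx))]])
sol = np.linalg.solve(KKT, np.concatenate([-h, g]))
z, lam = sol[:nz], sol[nz:]
```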
/Homework/Series 1/quadrotor.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/righetti/optlearningcontrol/36a519c50c88667127c50940343b729e737d6e11/Homework/Series 1/quadrotor.png
--------------------------------------------------------------------------------
/Homework/Series 1/quadrotor.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import matplotlib.pyplot as plt
3 |
4 | import matplotlib as mp
5 | from matplotlib.animation import Animation, FuncAnimation
6 | import IPython
7 |
8 | # definition of the matrices and example of control
9 | m = 0.5
10 | I = 0.1
11 | r = 0.15
12 | g = 9.81
13 | dt = 0.01
14 | length = 0.15
15 |
16 | A = np.eye(6)
17 | A[0,1] = dt
18 | A[1,4] = -g * dt
19 | A[2,3] = dt
20 | A[4,5] = dt
21 |
22 | B = np.zeros((6,2))
23 | B[3,0] = dt/m
24 | B[3,1] = dt/m
25 | B[5,0] = length * dt/I
26 | B[5,1] = -length * dt/I
27 |
28 |
29 |
30 | def animate_robot(x0, u, goal):
31 | """
32 | This function makes an animation showing the behavior of the quadrotor
33 | takes as input the result of a simulation (with dt=0.01s)
34 | """
35 |
36 | assert(u.shape[0]==2)
37 | assert(x0.shape[0]==6)
38 | N = u.shape[1] + 1
39 | x = np.zeros((6,N))
40 | x[:,0] = x0[:,0]
41 | for i in range(N-1):
42 | x[:,i+1] = A @ x[:,i] + B @ u[:,i]
43 |
44 | min_dt = 0.1
45 | if(dt < min_dt):
46 | steps = int(min_dt/dt)
47 | use_dt = int(np.round(min_dt * 1000))
48 | else:
49 | steps = 1
50 | use_dt = int(np.round(dt * 1000))
51 |
52 | #what we need to plot
53 | plotx = x[:,::steps]
54 | plotx = plotx[:,:-1]
55 | plotu = u[:,::steps]
56 |
57 | fig = mp.figure.Figure(figsize=[6,6])
58 | mp.backends.backend_agg.FigureCanvasAgg(fig)
59 | ax = fig.add_subplot(111, autoscale_on=False, xlim=[-4,4], ylim=[-4,4])
60 | ax.grid()
61 |
62 | list_of_lines = []
63 |
64 | #create the robot
65 | # the main frame
66 | line, = ax.plot([], [], 'k', lw=6)
67 | list_of_lines.append(line)
68 | # the left propeller
69 | line, = ax.plot([], [], 'b', lw=4)
70 | list_of_lines.append(line)
71 | # the right propeller
72 | line, = ax.plot([], [], 'b', lw=4)
73 | list_of_lines.append(line)
74 | # the left thrust
75 | line, = ax.plot([], [], 'r', lw=1)
76 | list_of_lines.append(line)
77 | # the right thrust
78 | line, = ax.plot([], [], 'r', lw=1)
79 | list_of_lines.append(line)
80 | # the goal
81 | ax.plot([goal[0]], [goal[1]], 'og', lw=2)
82 |
83 | def _animate(i):
84 | for l in list_of_lines: #reset all lines
85 | l.set_data([],[])
86 |
87 | theta = plotx[4,i]
88 | x = plotx[0,i]
89 | y = plotx[2,i]
90 | trans = np.array([[x,x],[y,y]])
91 | rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
92 |
93 | main_frame = np.array([[-length, length], [0,0]])
94 | main_frame = rot @ main_frame + trans
95 |
96 | left_propeller = np.array([[-1.3 * length, -0.7*length], [0.1,0.1]])
97 | left_propeller = rot @ left_propeller + trans
98 |
99 | right_propeller = np.array([[1.3 * length, 0.7*length], [0.1,0.1]])
100 | right_propeller = rot @ right_propeller + trans
101 |
102 | left_thrust = np.array([[length, length], [0.1, 0.1+plotu[0,i]*0.04]])
103 | left_thrust = rot @ left_thrust + trans
104 |
105 |         right_thrust = np.array([[-length, -length], [0.1, 0.1+plotu[1,i]*0.04]])
106 | right_thrust = rot @ right_thrust + trans
107 |
108 | list_of_lines[0].set_data(main_frame[0,:], main_frame[1,:])
109 | list_of_lines[1].set_data(left_propeller[0,:], left_propeller[1,:])
110 | list_of_lines[2].set_data(right_propeller[0,:], right_propeller[1,:])
111 | list_of_lines[3].set_data(left_thrust[0,:], left_thrust[1,:])
112 | list_of_lines[4].set_data(right_thrust[0,:], right_thrust[1,:])
113 |
114 | return list_of_lines
115 |
116 | def _init():
117 | return _animate(0)
118 |
119 |
120 | ani = FuncAnimation(fig, _animate, np.arange(0, len(plotx[0,:])),
121 | interval=use_dt, blit=True, init_func=_init)
122 | plt.close(fig)
123 | plt.close(ani._fig)
124 | IPython.display.display_html(IPython.core.display.HTML(ani.to_html5_video()))
125 |
--------------------------------------------------------------------------------
/Homework/Series 1/series1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/righetti/optlearningcontrol/36a519c50c88667127c50940343b729e737d6e11/Homework/Series 1/series1.pdf
--------------------------------------------------------------------------------
/Homework/Series 2/exercise 3_pendulum.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "aa7bede2-8e8e-4edd-acf1-78f97ea142c3",
6 | "metadata": {},
7 | "source": [
8 | "# Implementation of a SQP for nonlinear optimal control\n",
9 | "The goal of this exercise is to implement a SQP solver to solve a nonlinear optimal control problem.\n",
10 | "\n",
11 | "Consider the pendulum below \n",
12 | "\n",
13 | "
\n",
14 | "\n",
15 | "Assuming $m=l=1$, The dynamics of this pendulum is\n",
16 | "$$\\ddot{\\theta} = u - g \\sin\\theta$$\n",
17 | "which can be discretized with the following dynamics\n",
18 | "$$\\begin{align}\\theta_{n+1} &= \\theta_n + \\Delta t \\omega_n\\\\ \n",
19 | "\\omega_{n+1} &= \\omega_n + \\Delta t (u_n - g \\sin\\theta_n)\\end{align}$$\n",
20 | "where $\\theta_n$ is the angle of the pendulum with respect to the vertical at time step $n$ and $\\omega_n$ its angular velocity. We will use $\\Delta t = 0.01$.\n",
21 | "The pendulum starts at configuration $\\theta_0 = \\omega_0 = 0$, i.e. all the way down with zero velocity and we would like to find\n",
22 | "an optimal control that will bring it up to $\\theta=\\pi$ with zero velocities.\n",
23 | "\n",
24 | "To get the pendulum to do this movement, we write the following optimal control problem\n",
25 | "$$\\begin{align}\n",
26 | "& \\min_{\\theta_n, \\omega_n, u_n} \\sum_{n=0}^{300} 10(\\theta_n - \\pi)^2 + 0.1\\omega_n^2 + 0.1u_n^2\\\\\n",
27 | "\\textrm{subject to}\\ \\ & \\theta_{n+1} = \\theta_n + \\Delta t \\ \\omega_n \\\\\n",
28 | "& \\omega_{n+1} = \\omega_n + \\Delta t\\ (u_n - g \\sin\\theta_n)\\\\\n",
29 | "& \\theta_0 = \\omega_0 = 0\n",
30 | "\\end{align}$$"
31 | ]
32 | },
33 | {
34 | "cell_type": "markdown",
35 | "id": "a79e2ec5-bfd1-45b7-bfd6-eb2db5a3f410",
36 | "metadata": {},
37 | "source": [
38 | "## Question 1: write a SQP solver to solve this problem\n",
39 | "To do so, please follow these steps:\n",
40 | "* Write down the algorithm (in words not in code), i.e. write all the steps you need to take\n",
41 | "* Write (in Latex) the gradient of the running cost at a given guess $\\bar{x} = [\\bar{\\theta}_0, \\bar{\\omega}_0, \\bar{u}_0, \\bar{\\theta}_1, \\bar{\\omega}_1, \\bar{u}_1, \\dots, \\bar{\\theta}_{300}, \\bar{\\omega}_{300}, \\bar{u}_{300}]^T$, i.e. for given values $\\bar{\\theta}_n, \\bar{\\omega}_n, \\bar{u}_n$ and implement a function that computes it\n",
42 | "* Write (in Latex) the Hessian of the running cost at a given guess $\\bar{x}$, i.e. for given values $\\bar{\\theta}_n, \\bar{\\omega}_n, \\bar{u}_n$ and implement a function that computes it\n",
43 | "* Assume that the Hessian of the constraints is 0 (i.e. ignore the second order derivatives of the constraints)\n",
44 | "* Write (in Latex) a linear approximation of the constraints at a given guess $\\bar{x}$ in a form $G(\\bar{x}) \\Delta x = g(\\bar{x})$ (don't forget the constant terms in g) where $\\Delta x$ represents a small variation around $\\bar{x}$ and implement a function that computes both $G$ and $g$.\n",
45 | "* Use these functions to construct the inner linear KKT system that you will solve using Numpy's solve function (this should resemble the KKT system you built in the first homework)\n",
46 | "* Implement a function that computes the amount of constraint violation, i.e. the sum of the absolute values of all the constraints (i.e. assuming constraints of the form $c(x) = 0$ we want to compute $|c(x)|$).\n",
47 | "* Implement a filter linear search to test if a step should be accepted. You will implement the (simplified) filter line search explained below.\n",
48 | "* Terminate the algorithm when you either reached the maximum number of iterations (e.g. 100) or when the KKT optimality conditions $\\nabla_x L$ and $\\nabla_\\lambda L$ are close to 0, e.g. $10^{-4}$.\n",
49 | "\n",
50 | "\n",
51 | "Once you have a solution, make sure to check that it satisfies the constraints! You can also use the function ``pendulum.animate_robot`` to display the pendulum motion. Please answer the following questions:\n",
52 | "1. How many iterations did it take?\n",
53 | "2. Plot the solution (angle, velocity and control)\n",
54 | "3. Plot the amont of constraint violation per iteration of the solver\n",
55 | "4. Plot the cost per iteration of the solver\n",
56 | "5. Plot $\\alpha$ for each iteration of the solver\n",
57 | "\n",
58 | "### (Simple) filter linear search\n",
59 | "Once you have a potential step $p_x$ and associated candidate Lagrange multipliers $p_\\lambda$ (from the ``solve`` of the KKT system), you need to find a step $\\alpha$ to update your guess of the solution $x_{guess}$ and the Lagrange multipliers $\\lambda_{guess}$. We will accept a step that either reduces the amount of constraint violation or reduces the cost.\n",
60 | "\n",
61 | "Let's denote $f(x)$ the cost at $x$ and $|c(x)|$ the amount of constraint violation at $x$. Initialize the variable $f_{best} = \\infty$ and $c_{best}=\\infty$ at the beginning of the SQP. \n",
62 | "\n",
63 | "Then do the following during the line search.\n",
64 | "1. Set $\\rho$ to a number between 0 and 1 (e.g. 0.5) and set $\\alpha = 1$\n",
65 | "2. If $f(x_{guess} + \\alpha p_x) < f_{best}$ then set $f_{best} \\leftarrow f(x_{guess} + \\alpha p_x)$ and accept the step\n",
66 | "\n",
67 | " Or \n",
68 | "\n",
69 | " If $|c(x_{guess} + \\alpha p_x)| < c_{best}$ then set $c_{best} \\leftarrow |c(x_{guess} + \\alpha p_x)|$ and accept the step\n",
70 | "3. If the step was not accepted set $\\alpha \\leftarrow \\rho \\alpha$ and go back to Step 2.\n",
71 | "4. If the step was accepted update the guess $x_{guess} \\leftarrow x_{guess} + \\alpha p_x$ and the Lagrange multipliers $\\lambda_{guess} \\leftarrow (1-\\alpha)\\lambda_{guess} + \\alpha p_{lambda}$"
72 | ]
73 | },
74 | {
75 | "cell_type": "markdown",
76 | "id": "746bf8fa-b7d5-46ee-96d5-87ba3eafd4dd",
77 | "metadata": {},
78 | "source": [
79 | "## Question 2: write a SQP solver with inequality constraints\n",
80 | "Modify your SQP solver in order to enforce the additional constraint $-4 \\leq u_n \\leq 4$. \n",
81 | "\n",
82 | "In this case you will need to use a QP solver instead of the ``solve`` function. Please use the [qpsolvers](https://pypi.org/project/qpsolvers/) library (use ``pip install qpsolvers`` to get the latest version 4.4.0 and use ``cvxopt`` as QP solver). You may access the Lagrange multipliers of the solution following [this example](https://qpsolvers.github.io/qpsolvers/quadratic-programming.html#dual-multipliers).\n",
83 | "\n",
84 | "Update the convergence checks accordingly (using the KKT condition for the nonlinear problem $\\nabla_x L$). Also update the computation of the constraint violation by computing the amount of inequality constraint violation in absolute value (note that it should be zero when the constraint is satisfied).\n",
85 | "\n",
86 | "Once you have a solution, make sure to check that it satisfies the constraints! You can also use the function ``pendulum.animate_robot`` to display the pendulum motion. Please answer the following questions:\n",
87 | "1. How many iterations did it take?\n",
88 | "2. Plot the solution (angle, velocity and control)\n",
89 | "3. Plot the amont of constraint violation per iteration of the solver\n",
90 | "4. Plot the cost per iteration of the solver\n",
91 | "5. Plot $\\alpha$ for each iteration of the solver\n",
92 | "6. Compare this solution with the solution from Question 1. Are there any qualitative differences in the pendulum behavior? Did the solver converge faster or slower?"
93 | ]
94 | },
95 | {
96 | "cell_type": "code",
97 | "execution_count": 1,
98 | "id": "a9a74a3b-0675-468b-a67f-1d5622b80c6d",
99 | "metadata": {},
100 | "outputs": [],
101 | "source": [
102 | "%matplotlib widget\n",
103 | "\n",
104 | "import numpy as np\n",
105 | "import matplotlib.pyplot as plt\n",
106 | "import matplotlib\n",
107 | "import matplotlib.animation as animation\n",
108 | "import IPython\n",
109 | "\n",
110 | "from qpsolvers import solve_qp, Problem, solve_problem\n",
111 | "\n",
112 | "import pendulum"
113 | ]
114 | },
115 | {
116 | "cell_type": "code",
117 | "execution_count": 2,
118 | "id": "31c1b13f-54c0-4e6f-b650-2097d456fa8e",
119 | "metadata": {},
120 | "outputs": [
121 | {
122 | "name": "stdout",
123 | "output_type": "stream",
124 | "text": [
125 | "we use the following dt=0.01\n",
126 | "we use the following g=9.81\n"
127 | ]
128 | },
129 | {
130 | "data": {
131 | "text/html": [
132 | ""
521 | ]
522 | },
523 | "metadata": {},
524 | "output_type": "display_data"
525 | }
526 | ],
527 | "source": [
528 | "# dt is defined here\n",
529 | "print(f'we use the following dt={pendulum.dt}')\n",
530 | "\n",
531 | "# and g here\n",
532 | "print(f'we use the following g={pendulum.g}')\n",
533 | "\n",
534 | "# you can use this animate function to display what the pendulum would do for a given sequence of control\n",
535 | "N = 300\n",
536 | "controls = np.zeros((N,1))\n",
537 | "x_init = np.array([[1.0],[0.]])\n",
538 | "pendulum.animate_robot(x_init, controls.T)"
539 | ]
540 | },
541 | {
542 | "cell_type": "code",
543 | "execution_count": null,
544 | "id": "c23ae715-04d0-4796-9605-507b1310c75e",
545 | "metadata": {},
546 | "outputs": [],
547 | "source": []
548 | }
549 | ],
550 | "metadata": {
551 | "kernelspec": {
552 | "display_name": "Python 3 (ipykernel)",
553 | "language": "python",
554 | "name": "python3"
555 | },
556 | "language_info": {
557 | "codemirror_mode": {
558 | "name": "ipython",
559 | "version": 3
560 | },
561 | "file_extension": ".py",
562 | "mimetype": "text/x-python",
563 | "name": "python",
564 | "nbconvert_exporter": "python",
565 | "pygments_lexer": "ipython3",
566 | "version": "3.12.5"
567 | }
568 | },
569 | "nbformat": 4,
570 | "nbformat_minor": 5
571 | }
572 |
--------------------------------------------------------------------------------
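To make the structure of Question 1 concrete, here is a minimal sketch of the SQP loop with the simplified filter line search described above. The functions `cost`, `constraints`, `cost_grad`, `cost_hess` and `constraint_jac` (returning `G` and `g` such that $G \Delta x = g$) are assumed to be implemented by the reader; the candidate multipliers come straight out of the KKT solve, matching the update rule in the statement:

```python
# Sketch of the SQP outer loop with the simplified filter line search
# (illustrative; all cost/constraint callbacks are assumed to exist).
import numpy as np

def sqp(x, lam, cost, constraints, cost_grad, cost_hess, constraint_jac,
        max_iter=100, tol=1e-4, rho=0.5):
    f_best, c_best = np.inf, np.inf
    for it in range(max_iter):
        grad, hess = cost_grad(x), cost_hess(x)
        G, g = constraint_jac(x)  # linearized constraints: G dx = g
        m = G.shape[0]
        # inner KKT system giving the step p_x and candidate multipliers p_lam
        KKT = np.block([[hess, G.T], [G, np.zeros((m, m))]])
        sol = np.linalg.solve(KKT, np.concatenate([-grad, g]))
        p_x, p_lam = sol[:x.size], sol[x.size:]

        # simplified filter line search: accept if cost OR violation improves
        alpha = 1.0
        while alpha > 1e-10:
            x_try = x + alpha * p_x
            f_try, c_try = cost(x_try), np.sum(np.abs(constraints(x_try)))
            if f_try < f_best:
                f_best = f_try
                break
            if c_try < c_best:
                c_best = c_try
                break
            alpha *= rho
        x = x + alpha * p_x
        lam = (1 - alpha) * lam + alpha * p_lam

        # stop when the KKT conditions are approximately satisfied
        grad_L = cost_grad(x) + constraint_jac(x)[0].T @ lam
        if np.linalg.norm(grad_L) < tol and np.sum(np.abs(constraints(x))) < tol:
            break
    return x, lam
```

For Question 2, the inner KKT solve is replaced by a QP. A hedged sketch of the inner call with qpsolvers, where `G_in` and `h_in` are assumed to encode the bounds $-4 \le \bar{u}_n + \Delta u_n \le 4$ on the step:

```python
# Inner QP replacing numpy.linalg.solve (sketch; G_in, h_in are the reader's
# inequality encoding of the control bounds; hess/grad/G/g as above).
from qpsolvers import Problem, solve_problem

problem = Problem(P=hess, q=grad, G=G_in, h=h_in, A=G, b=g)
solution = solve_problem(problem, solver="cvxopt")
p_x = solution.x    # primal step
p_lam = solution.y  # multipliers of the equality constraints
```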
/Homework/Series 2/pendulum.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/righetti/optlearningcontrol/36a519c50c88667127c50940343b729e737d6e11/Homework/Series 2/pendulum.png
--------------------------------------------------------------------------------
/Homework/Series 2/pendulum.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import matplotlib.pyplot as plt
3 |
4 | import matplotlib
5 | from matplotlib.animation import Animation, FuncAnimation
6 | import IPython
7 |
8 | # definition of the matrices and example of control
9 | g = 9.81
10 | dt = 0.01
11 |
12 | def animate_robot(x0, u):
13 | """
14 | This function makes an animation showing the behavior of the pendulum
15 |     takes as input the result of a simulation (with dt=0.01s)
16 | """
17 |
18 | assert(u.shape[0]==1)
19 | assert(x0.shape[0]==2)
20 | N = u.shape[1] + 1
21 | x = np.zeros((2,N))
22 | x[:,0] = x0[:,0]
23 | for i in range(N-1):
24 | x[0,i+1] = x[0,i] + dt * x[1,i]
25 | x[1,i+1] = x[1,i] + dt * (u[0,i] - g * np.sin(x[0,i]))
26 |
27 | # here we check if we need to down-sample the data for display
28 | #downsampling (we want 100ms DT or higher)
29 | min_dt = 0.1
30 | if(dt < min_dt):
31 | steps = int(min_dt/dt)
32 | use_dt = int(min_dt * 1000)
33 | else:
34 | steps = 1
35 | use_dt = int(dt * 1000)
36 | plotx = x[:,::steps]
37 |
38 | fig = matplotlib.figure.Figure(figsize=[6,6])
39 | matplotlib.backends.backend_agg.FigureCanvasAgg(fig)
40 | ax = fig.add_subplot(111, autoscale_on=False, xlim=[-1.3,1.3], ylim=[-1.3,1.3])
41 | ax.grid()
42 |
43 | list_of_lines = []
44 |
45 |     #create the pendulum
46 | line, = ax.plot([], [], 'k', lw=2)
47 | list_of_lines.append(line)
48 | line, = ax.plot([], [], 'o', lw=2)
49 | list_of_lines.append(line)
50 |
51 | cart_height = 0.25
52 |
53 | def animate(i):
54 | for l in list_of_lines: #reset all lines
55 | l.set_data([],[])
56 |
57 | x_pend = np.sin(plotx[0,i])
58 | y_pend = -np.cos(plotx[0,i])
59 |
60 | list_of_lines[0].set_data([0., x_pend], [0., y_pend])
61 | list_of_lines[1].set_data([x_pend, x_pend], [y_pend, y_pend])
62 |
63 | return list_of_lines
64 |
65 | def init():
66 | return animate(0)
67 |
68 |
69 | ani = FuncAnimation(fig, animate, np.arange(0, len(plotx[0,:])),
70 | interval=use_dt, blit=True, init_func=init)
71 | plt.close(fig)
72 | plt.close(ani._fig)
73 | IPython.display.display_html(IPython.core.display.HTML(ani.to_html5_video()))
74 |
--------------------------------------------------------------------------------
/Homework/Series 2/series2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/righetti/optlearningcontrol/36a519c50c88667127c50940343b729e737d6e11/Homework/Series 2/series2.pdf
--------------------------------------------------------------------------------
/Homework/Series 3/series3.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/righetti/optlearningcontrol/36a519c50c88667127c50940343b729e737d6e11/Homework/Series 3/series3.pdf
--------------------------------------------------------------------------------
/Homework/Series 4/Series 4 - Q-learning.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Q-learning with neural networks"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# a few packages we need to import\n",
17 | "%matplotlib widget\n",
18 | "\n",
19 | "import numpy as np\n",
20 | "import matplotlib.pyplot as plt\n",
21 | "import matplotlib\n",
22 | "import matplotlib.animation as animation\n",
23 | "import IPython \n",
24 | "\n",
25 | "import torch\n",
26 | "\n",
27 | "import pendulum"
28 | ]
29 | },
30 | {
31 | "cell_type": "markdown",
32 | "metadata": {},
33 | "source": [
34 | "The goal of this homework is to implement the Q-learning with a neural network for the Q function to solve the inverted pendulum problem.\n",
35 | "\n",
36 | "
\n",
37 | "\n",
38 | "In the following, we write $x = \\begin{pmatrix} \\theta \\\\ \\dot{\\theta} \\end{pmatrix}$ as the vector of states of the system.\n",
39 | "\n",
40 | "## System dynamics\n",
41 | "* The system dynamics is implemented in the `pendulum.py` function. The dynamics is implemented in `pendulum.step`.\n",
42 | "* The allowed control inputs are $[-5,0,5]$\n",
43 | "\n",
44 | "## Cost function\n",
45 | "The goal is to find a policy that minimizes the following cost\n",
46 | "$$\\min \\sum_{n=0}^N \\alpha^n g(x,u)$$\n",
47 | "where\n",
48 | "$$g(x,v,u) = 0.01*(1-\\cos(x-\\pi))^2 + 0.001* v^2 + 0.00001*u^2$$\n",
49 | "which gives a high cost for states far from $\\pi$ (i.e. far from the inverted position) or states with non zero velocity or high controls\n",
50 | "\n",
51 | "\n",
52 | "\n",
53 | "## Q-learning algorithm to implement\n",
54 | "For each episode:\n",
55 | "* Initialize the episode $x_0 = [0,0]$\n",
56 | "* For each step of the episode:\n",
57 | " * Select $u_n$ using an $\\epsilon$-greedy policy\n",
58 | " * Compute the next state $x_{n+1}$\n",
59 | " * Compute the target $y_n = g(x_n,u_n) + \\alpha \\min_a Q(x_{n+1},a)$\n",
60 | " * Do one SGD step on the neural network parameters to minimize $(Q(x,u) - y_t)^2$\n",
61 | "\n",
62 | "\n",
63 | "## Parameters:\n",
64 | "* Episode length 100 steps\n",
65 | "* Discount factor $\\alpha = 0,99$\n",
66 | "* Learning rate (for SGD) 0.1\n",
67 | "* $\\epsilon = 0.1$\n",
68 | "\n"
69 | ]
70 | },
71 | {
72 | "cell_type": "markdown",
73 | "metadata": {},
74 | "source": [
75 | "## Using PyTorch\n",
76 | "You need to install and use PyTorch for the neural network and do the optimization. \n",
77 | "\n",
78 | "You may want to use the following functions:\n",
79 | "* [`torch.optim.SGD`](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html)\n",
80 | "* [`torch.nn.MSELoss`](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html)\n",
81 | "\n",
82 | "The neural network is given below"
83 | ]
84 | },
85 | {
86 | "cell_type": "code",
87 | "execution_count": 2,
88 | "metadata": {},
89 | "outputs": [],
90 | "source": [
91 | "## we define the neural network to be used for Q-learning\n",
92 | "## 2 hidden layers with 64 nodes\n",
93 | "## 2 inputs (state)\n",
94 | "## 3 outputs for the 3 possible controls\n",
95 | "D_in, H, D_out = 2, 64, 3\n",
96 | "\n",
97 | "q_function = torch.nn.Sequential(\n",
98 | " torch.nn.Linear(D_in, H),\n",
99 | " torch.nn.ReLU(),\n",
100 | " torch.nn.Linear(H, H),\n",
101 | " torch.nn.ReLU(),\n",
102 | " torch.nn.Linear(H, H),\n",
103 | " torch.nn.ReLU(),\n",
104 | " torch.nn.Linear(H, D_out)\n",
105 | ")\n",
106 | "\n",
107 | "## we initialize the network parameters to 0\n",
108 | "for params in q_function.parameters():\n",
109 | " params = torch.zeros_like(params)\n",
110 | "\n",
111 | "\n",
112 | "### possible controls\n",
113 | "possible_controls = np.array([-5.,0.,5.])"
114 | ]
115 | },
116 | {
117 | "cell_type": "markdown",
118 | "metadata": {},
119 | "source": [
120 | "# Questions:\n",
121 | "1. Implement the Q-learning algorithm described above\n",
122 | "2. Test that it works with and without pushes using the code below\n",
123 | "3. Plot the cost per episode (to visualize learning)\n",
124 | "4. Plot the learned value function (in 2D as a function of pendulum position and velocity) as well as the policy.\n",
125 | "5. Describe the algorithm and put the plots in a short report (max 2 pages) and include a video of the pendulum."
126 | ]
127 | },
128 | {
129 | "cell_type": "markdown",
130 | "metadata": {},
131 | "source": [
132 | "## Testing\n",
133 | "You can test your results with the code below which use the Q-function to create a controller send to the `animate_robot` function.\n",
134 | "You can choose to save also a movie of the animation and toggle the animation with a disturbance."
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": 4,
140 | "metadata": {},
141 | "outputs": [
142 | {
143 | "data": {
144 | "text/html": [
145 | ""
309 | ]
310 | },
311 | "metadata": {},
312 | "output_type": "display_data"
313 | }
314 | ],
315 | "source": [
316 | "x0 = np.zeros((2,1))\n",
317 | "def controller(x):\n",
318 | " u_pred = torch.argmin(q_function(torch.as_tensor(x, dtype=torch.float).unsqueeze(0))).item()\n",
319 | " u = possible_controls[u_pred]\n",
320 | " return u\n",
321 | " \n",
322 | "pendulum.animate_robot(x0,controller,push=False, save_movie=True)\n"
323 | ]
324 | },
325 | {
326 | "cell_type": "code",
327 | "execution_count": null,
328 | "metadata": {},
329 | "outputs": [],
330 | "source": []
331 | }
332 | ],
333 | "metadata": {
334 | "kernelspec": {
335 | "display_name": "Python 3 (ipykernel)",
336 | "language": "python",
337 | "name": "python3"
338 | },
339 | "language_info": {
340 | "codemirror_mode": {
341 | "name": "ipython",
342 | "version": 3
343 | },
344 | "file_extension": ".py",
345 | "mimetype": "text/x-python",
346 | "name": "python",
347 | "nbconvert_exporter": "python",
348 | "pygments_lexer": "ipython3",
349 | "version": "3.12.5"
350 | }
351 | },
352 | "nbformat": 4,
353 | "nbformat_minor": 4
354 | }
355 |
--------------------------------------------------------------------------------
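For orientation, a minimal sketch of the training loop described above. `q_function`, `possible_controls` and `pendulum.step` come from this notebook and module; the episode count and the helper `running_cost` are illustrative assumptions:

```python
# Sketch of the Q-learning loop (hyperparameters follow the list above;
# n_episodes is an assumption).
import numpy as np
import torch
import pendulum

optimizer = torch.optim.SGD(q_function.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()
alpha, epsilon, n_episodes, ep_len = 0.99, 0.1, 3000, 100

def running_cost(x, u):
    # g from the statement, with x = (theta, theta_dot)
    return 0.01 * (1 - np.cos(x[0] - np.pi)) ** 2 + 0.001 * x[1] ** 2 + 1e-5 * u ** 2

for ep in range(n_episodes):
    x = np.zeros(2)  # x_0 = [0, 0]
    for n in range(ep_len):
        q_x = q_function(torch.as_tensor(x, dtype=torch.float32))
        # epsilon-greedy selection (argmin because Q approximates a cost-to-go)
        if np.random.rand() < epsilon:
            a = np.random.randint(len(possible_controls))
        else:
            a = int(torch.argmin(q_x).item())
        u = possible_controls[a]
        x_next = pendulum.step(x, u)
        # target y_n = g(x_n, u_n) + alpha * min_a Q(x_{n+1}, a), held fixed
        with torch.no_grad():
            q_next = q_function(torch.as_tensor(x_next, dtype=torch.float32))
            y = running_cost(x, u) + alpha * torch.min(q_next).item()
        # one SGD step on (Q(x_n, u_n) - y_n)^2
        optimizer.zero_grad()
        loss = loss_fn(q_x[a], torch.tensor(float(y)))
        loss.backward()
        optimizer.step()
        x = x_next
```

For question 4, the learned value function $V(x) = \min_a Q(x,a)$ can be evaluated on a grid and shown in 2D, for instance:

```python
# Sketch: plot the learned value function over (theta, omega)
import matplotlib.pyplot as plt

thetas = np.linspace(0., 2. * np.pi, 100)
omegas = np.linspace(-6., 6., 100)
V = np.zeros((100, 100))
with torch.no_grad():
    for i, th in enumerate(thetas):
        for j, om in enumerate(omegas):
            V[j, i] = torch.min(q_function(torch.tensor([th, om], dtype=torch.float32))).item()
plt.pcolormesh(thetas, omegas, V)
plt.xlabel('theta'); plt.ylabel('omega'); plt.colorbar(); plt.show()
```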
/Homework/Series 4/pendulum.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/righetti/optlearningcontrol/36a519c50c88667127c50940343b729e737d6e11/Homework/Series 4/pendulum.png
--------------------------------------------------------------------------------
/Homework/Series 4/pendulum.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import matplotlib.pyplot as plt
3 |
4 | import matplotlib
5 | from matplotlib.animation import Animation, FuncAnimation
6 | import IPython
7 |
8 | # definition of the matrices and example of control
9 | g = 9.81
10 | integration_dt = 0.01
11 | dt = 0.1
12 |
13 | def step(x,u):
14 | """
15 |     this function returns x_n+1 given x and u
16 | """
17 | for i in range(10):
18 | x_next = (x[0] + integration_dt * x[1])%(2*np.pi)
19 | v_next = np.clip(x[1] + integration_dt * (u-g*np.sin(x[0])), -6.,6.)
20 | x = np.array([x_next,v_next])
21 | return x
22 |
23 | def animate_robot(x0, controller, push = False, save_movie = False):
24 | """
25 | This function makes an animation showing the behavior of the pendulum
26 |     takes as input the initial state and a controller - dt is the sampling time (0.1s)
27 | """
28 | N = 100
29 | assert(x0.shape[0]==2)
30 | x = np.zeros((2,N))
31 | x[:,0] = x0[:,0]
32 | for i in range(N-1):
33 | u = controller(x[:,i])
34 | x[:,i+1] = step(x[:,i],u)
35 | if push and (i == 45 or i == 70):
36 | x[1,i+1] = 4*np.pi*(np.random.rand(1) - 0.5)
37 |
38 | # here we check if we need to down-sample the data for display
39 | #downsampling (we want 100ms DT or higher)
40 | min_dt = 0.1
41 | if(dt < min_dt):
42 | steps = int(min_dt/dt)
43 | use_dt = int(min_dt * 1000)
44 | else:
45 | steps = 1
46 | use_dt = int(dt * 1000)
47 | plotx = x[:,::steps]
48 |
49 | fig = matplotlib.figure.Figure(figsize=[6,6])
50 | matplotlib.backends.backend_agg.FigureCanvasAgg(fig)
51 | ax = fig.add_subplot(111, autoscale_on=False, xlim=[-1.3,1.3], ylim=[-1.3,1.3])
52 | ax.grid()
53 |
54 | list_of_lines = []
55 |
56 |     #create the pendulum
57 | line, = ax.plot([], [], 'k', lw=2)
58 | list_of_lines.append(line)
59 | line, = ax.plot([], [], 'o', lw=2)
60 | list_of_lines.append(line)
61 |
62 | cart_height = 0.25
63 |
64 | def animate(i):
65 | for l in list_of_lines: #reset all lines
66 | l.set_data([],[])
67 |
68 | x_pend = np.sin(plotx[0,i])
69 | y_pend = -np.cos(plotx[0,i])
70 |
71 | list_of_lines[0].set_data([0., x_pend], [0., y_pend])
72 | list_of_lines[1].set_data([x_pend, x_pend], [y_pend, y_pend])
73 |
74 | return list_of_lines
75 |
76 | def init():
77 | return animate(0)
78 |
79 |
80 | ani = FuncAnimation(fig, animate, np.arange(0, len(plotx[0,:])),
81 | interval=use_dt, blit=True, init_func=init)
82 | if save_movie:
83 | ani.save('pendulum_movie.mp4')
84 | plt.close(fig)
85 | plt.close(ani._fig)
86 | IPython.display.display_html(IPython.core.display.HTML(ani.to_html5_video()))
87 |
--------------------------------------------------------------------------------
/Project 1/project1.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "6a718a9d-3ffe-4887-9c48-ef4c42c1da7a",
6 | "metadata": {},
7 | "source": [
8 | "# Perform a looping with the quadrotor"
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "id": "b7d0f287-df0f-4340-ac7b-c2262d9910f9",
14 | "metadata": {},
15 | "source": [
16 | "# Goal of the project\n",
17 | "\n",
18 | "The goal of this project is to control a 2D quadrotor to get it to perform acrobatic moves. The controller will be designed using an SQP solver.\n",
19 | "\n",
20 | "Please submit your code as a runnable Jupyter Notebook.\n",
21 | "\n",
22 | "## 2D quadrotor\n",
23 | "\n",
24 | "The quadrotor is depicted in the following figure\n",
25 | "
\n",
26 | "\n",
27 | "\n",
28 | "The quadrotor model is written as\n",
29 | "$$\\begin{align} \n",
30 | "\\dot{p_x} &= v_x\\\\\n",
31 | "m \\dot{v}_x &= - (u_1 + u_2) \\sin \\theta \\\\ \n",
32 | "\\dot{p_y} &= v_y\\\\\n",
33 | "m \\dot{v}_y &= (u_1 + u_2) \\cos \\theta - m g\\\\\n",
34 | "\\dot{\\theta} &= \\omega\\\\\n",
35 | "I \\dot{\\omega} &= r (u_1 - u_2) \\end{align}$$\n",
36 | "where $p_x$ is the horizontal and $p_y$ the vertical positions of the quadrotor and $\\theta$ is its orientation with respect to the horizontal plane. $v_x$ and $v_y$ are the linear velocities and $\\omega$ is the angular velocity of the robot. $u_1$ and $u_2$ are the forces produced by the rotors (our control inputs). $m$ is the quadrotor mass, $I$ its moment of inertia (a scalar), $r$ is the distance from the center of the robot frame to the propellers and $g$ is the gravity constant. To denote the entire state, we will write $x = [p_x, v_x, p_y, v_y, \\theta, \\omega]^T$ - we will also write $u = [u_1, u_2]^T$.\n",
37 | "\n",
38 | "The module ```quadrotor.py``` defines the problem and provides all the useful information about the robot and methods to simulate and animate it as shown below.\n",
39 | "\n",
40 | "You can access the different parameters of the model in the following way:"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 1,
46 | "id": "746660f8-7599-435b-882b-81794d13f791",
47 | "metadata": {},
48 | "outputs": [
49 | {
50 | "name": "stdout",
51 | "output_type": "stream",
52 | "text": [
53 | "Mass = 0.5\n",
54 | "Length = 0.15\n",
55 | "Inertia = 0.1\n",
56 | "Dt = 0.04\n",
57 | "state size = 6\n",
58 | "control size = 2\n"
59 | ]
60 | }
61 | ],
62 | "source": [
63 | "import quadrotor\n",
64 | "\n",
65 | "print(\"Mass =\", quadrotor.MASS)\n",
66 | "print(\"Length =\", quadrotor.LENGTH)\n",
67 | "print(\"Inertia =\", quadrotor.INERTIA)\n",
68 | "print(\"Dt =\", quadrotor.DT)\n",
69 | "print(\"state size =\", quadrotor.DIM_STATE)\n",
70 | "print(\"control size =\", quadrotor.DIM_CONTROL)"
71 | ]
72 | },
73 | {
74 | "cell_type": "markdown",
75 | "id": "d581a04d-cb30-4591-bfcd-6ee093bff8b1",
76 | "metadata": {},
77 | "source": [
78 | "\n",
79 | "\n",
80 | "## Part 1 - Setting up the trajectory Optimization (50 points)\n",
81 | "1. Discretize the system dynamics using the Euler method seen in class - write the time discretization step as $\\Delta t$ (use symbols not numbers for the mass, etc)\n",
82 | "2. We would like the quadrotor to perfom a looping. Find and implement a suitable cost function to perform a looping and add constraint to maintain the thrust of each rotor between $0$ and $10$. Solve the problem using your own implementation of a SQP (levarging your code from Homework 2) with a large horizon to check that you can do a looping.\n",
83 | "3. Show plots of all the states and controls of the robot as a function of time. Describe your design choices (in a concise manner) in the report.\n",
84 | "\n",
85 | "## Part 2 - Model predictive control (MPC) (50 points)\n",
86 | "1. Use the trajectory optimization method from Part I to design a MPC controller and test it using the simulator below. In particular, verify that it can handle perturbations by calling the ```quadrotor.simulate``` function with ```disturbance = True``` (when setting disturbance to True, the simulator will generate a random perturbation every 1 second). Simulate your controller for 10 seconds, plot the state anc control evolution.\n",
87 | "2. Explain your intended design in the report, including the cost function and found control law\n",
88 | "\n",
89 | "The ```quadrotor.simulate``` function takes as an input an initial state, a controller, the number of discrete time steps and a boolean value to indicate the presence of perturbation. The controller has to be a function taking as an input a state and time index and outputting a control vector.\n",
90 | "\n",
91 | "To visualize the trajectory, use the ```quadrotor.animate_robot``` function\n",
92 | "and show the animation (show the plots in your report)."
93 | ]
94 | },
95 | {
96 | "cell_type": "markdown",
97 | "id": "c396e949-32bc-4b2a-af4b-67cbf62872a1",
98 | "metadata": {},
99 | "source": [
100 | "## Bonus (10 points)\n",
101 | "Add a state constraint to perform the looping while maintening a positive altitude. Use the origin as an initial state."
102 | ]
103 | },
104 | {
105 | "cell_type": "code",
106 | "execution_count": 2,
107 | "id": "535b4946-517e-4295-96f6-d5907f55897e",
108 | "metadata": {},
109 | "outputs": [
110 | {
111 | "data": {
112 | "text/html": [
113 | ""
378 | ]
379 | },
380 | "metadata": {},
381 | "output_type": "display_data"
382 | }
383 | ],
384 | "source": [
385 | "import numpy as np\n",
386 | "%matplotlib inline\n",
387 | "import matplotlib.pyplot as plt\n",
388 | "import matplotlib\n",
389 | "import matplotlib.animation as animation\n",
390 | "import IPython\n",
391 | "\n",
392 | "def controller(x, t):\n",
393 | " return np.zeros(2)\n",
394 | "\n",
395 | "x_init = np.array([0, 0, 0., 0 ,0, 0])\n",
396 | "horizon_length = 100\n",
397 | "t, state, u = quadrotor.simulate(x_init, controller, horizon_length, disturbance=True)\n",
398 | "quadrotor.animate_robot(state, u)"
399 | ]
400 | }
401 | ],
402 | "metadata": {
403 | "kernelspec": {
404 | "display_name": "Python 3 (ipykernel)",
405 | "language": "python",
406 | "name": "python3"
407 | },
408 | "language_info": {
409 | "codemirror_mode": {
410 | "name": "ipython",
411 | "version": 3
412 | },
413 | "file_extension": ".py",
414 | "mimetype": "text/x-python",
415 | "name": "python",
416 | "nbconvert_exporter": "python",
417 | "pygments_lexer": "ipython3",
418 | "version": "3.12.5"
419 | }
420 | },
421 | "nbformat": 4,
422 | "nbformat_minor": 5
423 | }
424 |
--------------------------------------------------------------------------------
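As a pointer for Part 2, a minimal sketch of wrapping a Part 1 trajectory optimizer into the controller signature expected by `quadrotor.simulate`; `solve_trajectory` and `MPC_HORIZON` are assumed names for the reader's own solver and re-planning horizon:

```python
# Sketch of a receding-horizon (MPC) controller built on a Part 1 solver
# (solve_trajectory is an assumed, reader-implemented function).
import numpy as np
import quadrotor

MPC_HORIZON = 20  # illustrative short horizon, to be tuned

def mpc_controller(x, t):
    # re-solve the short-horizon problem from the measured state x ...
    x_plan, u_plan = solve_trajectory(x0=x, horizon=MPC_HORIZON)
    # ... and apply only the first control of the plan
    return u_plan[:, 0]

# 10 s of simulation at DT = 0.04 s is 250 steps:
# t, state, u = quadrotor.simulate(x_init, mpc_controller, 250, disturbance=True)
```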
/Project 1/quadrotor.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/righetti/optlearningcontrol/36a519c50c88667127c50940343b729e737d6e11/Project 1/quadrotor.png
--------------------------------------------------------------------------------
/Project 1/quadrotor.py:
--------------------------------------------------------------------------------
1 | """
2 | This class describes a quadrotor model and provides some helper functions
3 | """
4 | import numpy as np
5 | import matplotlib.pyplot as plt
6 |
7 | import matplotlib as mp
8 | import IPython
9 |
10 |
11 |
12 | # constant
13 | MASS = 0.500
14 | INERTIA = 0.1
15 | LENGTH = 0.15
16 | DT = 0.04 # integration step
17 | GRAVITY_CONSTANT = 9.81
18 |
19 | DIM_STATE = 6
20 | DIM_CONTROL = 2
21 |
22 | def next_state(z, u):
23 | """
24 | Inputs:
25 |     z: state of the quadrotor system as a numpy array (px, px dot, py, py dot, theta, omega)
26 | u: 2d control
27 |
28 | Output:
29 |     the new state of the quadrotor as a numpy array
30 | """
31 | x = z[0]
32 | vx = z[1]
33 | y = z[2]
34 | vy = z[3]
35 | theta = z[4]
36 | omega = z[5]
37 |
38 | dydt = np.zeros([DIM_STATE,])
39 | dydt[0] = vx
40 | dydt[1] = (-(u[0] + u[1]) * np.sin(theta)) / MASS
41 | dydt[2] = vy
42 | dydt[3] = ((u[0] + u[1]) * np.cos(theta) - MASS * GRAVITY_CONSTANT) / MASS
43 | dydt[4] = omega
44 | dydt[5] = (LENGTH * (u[0] - u[1])) / INERTIA
45 |
46 | z_next = z + dydt * DT
47 | return z_next
48 |
49 | def simulate(z0, controller, horizon_length, disturbance=False):
50 | """
51 | This function simulates the quadrotor for horizon_length steps from initial state z0
52 |
53 | Inputs:
54 | z0: the initial conditions of the quadrotor as a numpy array (x,vx,y,vy,theta,omega)
55 | controller: a function that takes a state z as argument and index i of the time step and returns a control u
56 | horizon_length: the horizon length
57 |
58 |     disturbance: if True will generate a random push every second during the simulation
59 |
60 | Output:
61 |     t[horizon_length+1] contains the simulation time
62 |     z[6, horizon_length+1] and u[2, horizon_length] containing the time evolution of states and controls
63 | """
64 | t = np.zeros([horizon_length + 1,])
65 | z = np.empty([DIM_STATE, horizon_length + 1])
66 | z[:, 0] = z0
67 | u = np.zeros([DIM_CONTROL, horizon_length])
68 | for i in range(horizon_length):
69 | u[:, i] = controller(z[:, i], i)
70 | z[:, i + 1] = next_state(z[:, i], u[:, i])
71 | if disturbance and np.mod(i, 25) == 0:
72 | dist = np.zeros([DIM_STATE,])
73 | dist[1::2] = np.random.uniform(-1.0, 1, (3,))
74 | z[:, i + 1] += dist
75 | t[i + 1] = t[i] + DT
76 | return t, z, u
77 |
78 | def animate_robot(x, u, dt=0.04):
79 | """
80 | This function makes an animation showing the behavior of the quadrotor
81 | takes as input the result of a simulation (with dt=0.04s)
82 | """
83 |
84 | min_dt = 0.04
85 | if dt < min_dt:
86 | steps = int(min_dt / dt)
87 | use_dt = int(np.round(min_dt * 1000))
88 | else:
89 | steps = 1
90 | use_dt = int(np.round(dt * 1000))
91 |
92 | # what we need to plot
93 | plotx = x[:, ::steps]
94 | plotx = plotx[:, :-1]
95 | plotu = u[:, ::steps]
96 |
97 | fig = mp.figure.Figure(figsize=[8.5, 8.5])
98 | mp.backends.backend_agg.FigureCanvasAgg(fig)
99 | ax = fig.add_subplot(111, autoscale_on=False, xlim=[-4, 4], ylim=[-4, 4])
100 | ax.grid()
101 |
102 | list_of_lines = []
103 |
104 | # create the robot
105 | # the main frame
106 | (line,) = ax.plot([], [], "k", lw=6)
107 | list_of_lines.append(line)
108 | # the left propeller
109 | (line,) = ax.plot([], [], "b", lw=4)
110 | list_of_lines.append(line)
111 | # the right propeller
112 | (line,) = ax.plot([], [], "b", lw=4)
113 | list_of_lines.append(line)
114 | # the left thrust
115 | (line,) = ax.plot([], [], "r", lw=1)
116 | list_of_lines.append(line)
117 | # the right thrust
118 | (line,) = ax.plot([], [], "r", lw=1)
119 | list_of_lines.append(line)
120 |
121 | def _animate(i):
122 | for l in list_of_lines: # reset all lines
123 | l.set_data([], [])
124 |
125 | theta = plotx[4, i]
126 | x = plotx[0, i]
127 | y = plotx[2, i]
128 | trans = np.array([[x, x], [y, y]])
129 | rot = np.array(
130 | [[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
131 | )
132 |
133 | main_frame = np.array([[-LENGTH, LENGTH], [0, 0]])
134 | main_frame = rot @ main_frame + trans
135 |
136 | left_propeller = np.array(
137 | [[-1.3 * LENGTH, -0.7 * LENGTH], [0.1, 0.1]]
138 | )
139 | left_propeller = rot @ left_propeller + trans
140 |
141 | right_propeller = np.array(
142 | [[1.3 * LENGTH, 0.7 * LENGTH], [0.1, 0.1]]
143 | )
144 | right_propeller = rot @ right_propeller + trans
145 |
146 | left_thrust = np.array(
147 | [[LENGTH, LENGTH], [0.1, 0.1 + plotu[0, i] * 0.04]]
148 | )
149 | left_thrust = rot @ left_thrust + trans
150 |
151 | right_thrust = np.array(
152 |             [[-LENGTH, -LENGTH], [0.1, 0.1 + plotu[1, i] * 0.04]]
153 | )
154 | right_thrust = rot @ right_thrust + trans
155 |
156 | list_of_lines[0].set_data(main_frame[0, :], main_frame[1, :])
157 | list_of_lines[1].set_data(left_propeller[0, :], left_propeller[1, :])
158 | list_of_lines[2].set_data(right_propeller[0, :], right_propeller[1, :])
159 | list_of_lines[3].set_data(left_thrust[0, :], left_thrust[1, :])
160 | list_of_lines[4].set_data(right_thrust[0, :], right_thrust[1, :])
161 |
162 | return list_of_lines
163 |
164 | def _init():
165 | return _animate(0)
166 |
167 | ani = mp.animation.FuncAnimation(
168 | fig,
169 | _animate,
170 | np.arange(0, len(plotx[0, :])),
171 | interval=use_dt,
172 | blit=True,
173 | init_func=_init,
174 | )
175 | plt.close(fig)
176 | plt.close(ani._fig)
177 | IPython.display.display_html(IPython.core.display.HTML(ani.to_html5_video()))
178 |
--------------------------------------------------------------------------------
/Project 2/project2.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "bce59773-dee6-41b3-83c6-ff9eca3a724a",
6 | "metadata": {},
7 | "source": [
8 | "# Goal of the project\n",
9 | "\n",
10 | "The goal of this project is to control a 2D quadrotor to reach a target while avoiding obstacles using a learned policy. You wil have to create your own Custom environment using the [stable_baselines3](https://stable-baselines3.readthedocs.io/en/master/) library and train a RL agent using PPO.\n",
11 | "\n",
12 | "## 2D quadrotor\n",
13 | "\n",
14 | "The quadrotor is depicted in the following figure\n",
15 | "
\n",
16 | "\n",
17 | "\n",
18 | "The quadrotor model is written as\n",
19 | "$$\\begin{align} \n",
20 | "\\dot{p_x} &= v_x\\\\\n",
21 | "m \\dot{v}_x &= - (u_1 + u_2) \\sin \\theta \\\\ \n",
22 | "\\dot{p_y} &= v_y\\\\\n",
23 | "m \\dot{v}_y &= (u_1 + u_2) \\cos \\theta - m g\\\\\n",
24 | "\\dot{\\theta} &= \\omega\\\\\n",
25 | "I \\dot{\\omega} &= r (u_1 - u_2) \\end{align}$$\n",
26 | "where $p_x$ is the horizontal and $p_y$ the vertical positions of the quadrotor and $\\theta$ is its orientation with respect to the horizontal plane. $v_x$ and $v_y$ are the linear velocities and $\\omega$ is the angular velocity of the robot. $u_1$ and $u_2$ are the forces produced by the rotors (our control inputs). $m$ is the quadrotor mass, $I$ its moment of inertia (a scalar), $r$ is the distance from the center of the robot frame to the propellers and $g$ is the gravity constant. To denote the entire state, we will write $x = [p_x, v_x, p_y, v_y, \\theta, \\omega]^T$ - we will also write $u = [u_1, u_2]^T$.\n",
27 | "\n",
28 | "The module ```quadrotor.py``` defines the problem and provides all the useful information about the robot and methods to simulate and animate it as shown below.\n",
29 | "\n",
30 | "You can access the different parameters of the model in the following way:"
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": 1,
36 | "id": "33cdeb4b-ec39-4880-add8-b8153ff9bd8a",
37 | "metadata": {},
38 | "outputs": [
39 | {
40 | "name": "stdout",
41 | "output_type": "stream",
42 | "text": [
43 | "Mass = 0.5\n",
44 | "Length = 0.15\n",
45 | "Inertia = 0.1\n",
46 | "Dt = 0.04\n",
47 | "state size = 6\n",
48 | "control size = 2\n"
49 | ]
50 | }
51 | ],
52 | "source": [
53 | "import quadrotor\n",
54 | "\n",
55 | "print(\"Mass =\", quadrotor.MASS)\n",
56 | "print(\"Length =\", quadrotor.LENGTH)\n",
57 | "print(\"Inertia =\", quadrotor.INERTIA)\n",
58 | "print(\"Dt =\", quadrotor.DT)\n",
59 | "print(\"state size =\", quadrotor.DIM_STATE)\n",
60 | "print(\"control size =\", quadrotor.DIM_CONTROL)"
61 | ]
62 | },
63 | {
64 | "attachments": {},
65 | "cell_type": "markdown",
66 | "id": "8c072ae2-dd9e-4518-90c7-d055a0872d30",
67 | "metadata": {},
68 | "source": [
69 | "The goal of this project is to learn a policy that can move the robot from any point to the red dot ($x^{\\star} = [2, 0, 0, 0, 0, 0]^T$) while avoiding thee obstacles. The obstacles are represented by the black circles in the animation. You can check if the drone is in collision with an obstacle using the function ```quadrotor.check_collision```. \n",
70 | "\n",
71 | "## Create a RL environment\n",
72 | "Using the [stable_baselines3](https://stable-baselines3.readthedocs.io/en/master/), create a [custom RL environment](https://stable-baselines3.readthedocs.io/en/master/guide/custom_env.html) environment. You will have to follow the following steps:\n",
73 | "\n",
74 | "1. Implement a step function than contrains the dynamics (you are free to use the ```quadrotor.next_state```) and a reward function. To speed-up the training, make sure to add a gravity compensation term in your dynamics (i.e. the drone should stay in place when the policy outputs zeros).\n",
75 | " The reward should be made of three terms:\n",
76 | " \n",
77 | "- A positive term to incentivize the quadrotor to reach the target. You can start with a reward bounded between 0 and 1, e.g.\n",
78 | " $\\operatorname{exp}(-\\frac{1}{2} (x - x^{\\star})Q(x - x^{\\star}) -\\frac{1}{2} (u - u_{\\text{gravity}})R(u - u_{\\text{gravity}}))$\n",
79 | "\n",
80 | " \n",
81 | "- A large negative penality(e.g. -100) if the robot get out of the following bounds:\n",
82 | "$ p_x \\in [-4, 4], \\quad v_x \\in [-10, 10] , \\quad p_y \\in [-4, 4] , \\quad v_y \\in [-10, 10] , \\quad \\theta \\in [-2 \\pi, 2 \\pi] , \\quad \\omega \\in [-10, 10] $.\n",
83 | "\n",
84 | " - A negative penalty if the robot hits the obstacle, e.g. -1. You should use the ```quadrotor.check_collision```.\n",
85 | "\n",
86 | "Keep in mind that, in RL, the goal is to maximize a reward (and not minimize a cost like in Optimal Control).\n",
87 | "\n",
88 | "2. Implement a reset function that initializes the state randomly. You can sample uniformly between $[-2, 2]$ for $p_x$ and $p_y$ and initialize the other terms to zero. Make sure to reject samples that are colliding with the obstacles using the ```quadrotor.check_collision```.\n",
89 | "\n",
90 | "3. In the step function, stop the environment using ```truncated``` after 200 steps (Here is an [example](https://colab.research.google.com/github/araffin/rl-tutorial-jnrr19/blob/sb3/5_custom_gym_env.ipynb)).\n",
91 | "\n",
92 | "4. In the step function, stop the environment if the drone goes outsite of the provided bounds using ```terminated``` (Here is an [example](https://colab.research.google.com/github/araffin/rl-tutorial-jnrr19/blob/sb3/5_custom_gym_env.ipynb)).\n",
93 | "\n",
94 | "5. Make sure that your environment is well defined using the ```check_env``` function.\n",
95 | " \n",
96 | "## Training a policy with PPO \n",
97 | "Train a policy with PPO and use the learned policy to define a controller. Make sure that you can reach the target while avoiding the obstacles starting from $x_0 = [-2, 0, 0, 0 ,0, 0]$\n",
98 | "\n",
99 | "\n",
100 | "\n",
101 | "Please submit your code (as runnable Jupyter Notebook), a pdf report and an mp4 video. In the report, explain your reward design and provide plots showing the trajectory of the quadrotor. The mp4 video should show the quadrotor animation starting from $x_0 = [-2, 0, 0, 0 ,0, 0]$. You can save your animation in the following way:\n"
102 | ]
103 | },
104 | {
105 | "cell_type": "code",
106 | "execution_count": 2,
107 | "id": "ae585511-54b6-45ec-8431-5c1cce093345",
108 | "metadata": {},
109 | "outputs": [
110 | {
111 | "data": {
112 | "text/html": [
113 | ""
495 | ]
496 | },
497 | "metadata": {},
498 | "output_type": "display_data"
499 | }
500 | ],
501 | "source": [
502 | "import numpy as np\n",
503 | "%matplotlib inline\n",
504 | "import matplotlib.pyplot as plt\n",
505 | "import matplotlib\n",
506 | "import matplotlib.animation as animation\n",
507 | "import IPython\n",
508 | "\n",
509 | "\n",
510 | "def controller(x, t):\n",
511 | " return np.zeros(2)\n",
512 | "\n",
513 | "x_init = np.array([-2, 0, 0., 0 ,0, 0])\n",
514 | "horizon_length = 200\n",
515 | "t, state, u = quadrotor.simulate(x_init, controller, horizon_length)\n",
516 | "quadrotor.animate_robot(state, u, save_mp4=True)"
517 | ]
518 | }
519 | ],
520 | "metadata": {
521 | "kernelspec": {
522 | "display_name": "Python 3 (ipykernel)",
523 | "language": "python",
524 | "name": "python3"
525 | },
526 | "language_info": {
527 | "codemirror_mode": {
528 | "name": "ipython",
529 | "version": 3
530 | },
531 | "file_extension": ".py",
532 | "mimetype": "text/x-python",
533 | "name": "python",
534 | "nbconvert_exporter": "python",
535 | "pygments_lexer": "ipython3",
536 | "version": "3.12.5"
537 | }
538 | },
539 | "nbformat": 4,
540 | "nbformat_minor": 5
541 | }
542 |
--------------------------------------------------------------------------------
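To make the environment requirements concrete, here is a minimal sketch of a custom environment and PPO training. The class name, reward weights and the action scaling around hover are illustrative assumptions, not prescribed by the project:

```python
# Sketch of the custom environment (gymnasium API as expected by
# stable-baselines3); weights and scalings are placeholders to be tuned.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
import quadrotor

class QuadrotorEnv(gym.Env):
    def __init__(self):
        high = np.array([4., 10., 4., 10., 2. * np.pi, 10.], dtype=np.float32)
        self.observation_space = spaces.Box(-high, high, dtype=np.float32)
        self.action_space = spaces.Box(-1., 1., shape=(2,), dtype=np.float32)
        self.x_star = np.array([2., 0., 0., 0., 0., 0.])
        # gravity compensation: hover thrust split over the two rotors
        self.u_hover = 0.5 * quadrotor.MASS * quadrotor.GRAVITY_CONSTANT * np.ones(2)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        while True:  # rejection-sample a collision-free initial position
            self.state = np.zeros(6)
            self.state[0], self.state[2] = self.np_random.uniform(-2., 2., size=2)
            if not quadrotor.check_collision(self.state):
                break
        self.steps = 0
        return self.state.astype(np.float32), {}

    def step(self, action):
        u = self.u_hover + action  # policy output is an offset around hover
        self.state = quadrotor.next_state(self.state, u)
        self.steps += 1
        err = self.state - self.x_star
        reward = float(np.exp(-0.5 * err @ err))  # bounded target term (placeholder Q = I, R = 0)
        terminated = bool(np.any(np.abs(self.state) > self.observation_space.high))
        if terminated:
            reward -= 100.0  # out-of-bounds penalty
        if quadrotor.check_collision(self.state):
            reward -= 1.0    # obstacle penalty
        truncated = self.steps >= 200
        return self.state.astype(np.float32), reward, terminated, truncated, {}

from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env

env = QuadrotorEnv()
check_env(env)  # verify the environment is well defined
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=200_000)
```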
/Project 2/quadrotor.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/righetti/optlearningcontrol/36a519c50c88667127c50940343b729e737d6e11/Project 2/quadrotor.png
--------------------------------------------------------------------------------
/Project 2/quadrotor.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import matplotlib.pyplot as plt
3 | from matplotlib.animation import FFMpegWriter
4 |
5 | import matplotlib as mp
6 | import IPython
7 |
8 |
9 |
10 |
11 | # constant
12 | MASS = 0.500
13 | INERTIA = 0.1
14 | LENGTH = 0.15
15 | DT = 0.04 # integration step
16 | GRAVITY_CONSTANT = 9.81
17 |
18 | DIM_STATE = 6
19 | DIM_CONTROL = 2
20 |
21 | # Obstacle definition
22 | OBSTACLE_CENTERS = [np.array([0, 2.]), np.array([0., 0.]), np.array([0, -2.])]
23 | OBSTACLE_RADIUS = 0.5
24 |
25 | # x-y target
26 | TARGET = [2, 0]
27 |
28 |
29 |
30 | def next_state(z, u):
31 | """
32 | Inputs:
33 |         z: state of the quadrotor system as a numpy array (px, px dot, py, py dot, theta, omega)
34 | u: 2d control
35 |
36 | Output:
37 |         the new state of the quadrotor as a numpy array
38 | """
39 | x = z[0]
40 | vx = z[1]
41 | y = z[2]
42 | vy = z[3]
43 | theta = z[4]
44 | omega = z[5]
45 |
46 | dydt = np.zeros([DIM_STATE,])
47 | dydt[0] = vx
48 | dydt[1] = (-(u[0] + u[1]) * np.sin(theta)) / MASS
49 | dydt[2] = vy
50 | dydt[3] = ((u[0] + u[1]) * np.cos(theta) - MASS * GRAVITY_CONSTANT) / MASS
51 | dydt[4] = omega
52 | dydt[5] = (LENGTH * (u[0] - u[1])) / INERTIA
53 |
54 | z_next = z + dydt * DT
55 | return z_next
56 |
57 |
58 |
59 | def check_collision(state):
60 | """
61 | This function checks if the state is colliding with the obstacles
62 |
63 | Inputs:
64 | state: the current state of the quadrotor as a numpy array (x,vx,y,vy,theta,omega)
65 |
66 |
67 | Output:
68 | a boolean value. True if there is contact and False otherwise.
69 | """
70 | dist1 = (state[0] - OBSTACLE_CENTERS[0][0]) ** 2 + (state[2] - OBSTACLE_CENTERS[0][1]) ** 2
71 | dist2 = (state[0] - OBSTACLE_CENTERS[1][0]) ** 2 + (state[2] - OBSTACLE_CENTERS[1][1]) ** 2
72 | dist3 = (state[0] - OBSTACLE_CENTERS[2][0]) ** 2 + (state[2] - OBSTACLE_CENTERS[2][1]) ** 2
73 |     if dist1 < (OBSTACLE_RADIUS + LENGTH) ** 2:
74 |         return True
75 |     elif dist2 < (OBSTACLE_RADIUS + LENGTH) ** 2:
76 |         return True
77 |     elif dist3 < (OBSTACLE_RADIUS + LENGTH) ** 2:
78 |         return True
79 | else:
80 | return False
81 |
82 |
83 |
84 | def simulate(z0, controller, horizon_length):
85 | """
86 | This function simulates the quadrotor for horizon_length steps from initial state z0
87 |
88 | Inputs:
89 | z0: the initial conditions of the quadrotor as a numpy array (x,vx,y,vy,theta,omega)
90 | controller: a function that takes a state z as argument and index i of the time step and returns a control u
91 | horizon_length: the horizon length
92 |
93 |
94 | Output:
95 |         t[horizon_length+1] contains the simulation time
96 |         z[6, horizon_length+1] and u[2, horizon_length] containing the time evolution of states and controls
97 | """
98 | t = np.zeros([horizon_length + 1,])
99 | z = np.empty([DIM_STATE, horizon_length + 1])
100 | z[:, 0] = z0
101 | u = np.zeros([DIM_CONTROL, horizon_length])
102 | for i in range(horizon_length):
103 | if check_collision(z[:, i]):
104 | print("FAILURE: The robot collided with an obstacle")
105 |
106 | u[:, i] = controller(z[:, i], i)
107 | z[:, i + 1] = next_state(z[:, i], u[:, i])
108 | t[i + 1] = t[i] + DT
109 | return t, z, u
110 |
111 |
112 |
113 | def animate_robot(x, u, dt=0.04, save_mp4=False):
114 | """
115 | This function makes an animation showing the behavior of the quadrotor
116 | takes as input the result of a simulation (with dt=0.04s)
117 | """
118 |
119 | min_dt = 0.04
120 | if dt < min_dt:
121 | steps = int(min_dt / dt)
122 | use_dt = int(np.round(min_dt * 1000))
123 | else:
124 | steps = 1
125 | use_dt = int(np.round(dt * 1000))
126 |
127 | # what we need to plot
128 | plotx = x[:, ::steps]
129 | plotx = plotx[:, :-1]
130 | plotu = u[:, ::steps]
131 |
132 | fig = mp.figure.Figure(figsize=[8.5, 8.5])
133 | mp.backends.backend_agg.FigureCanvasAgg(fig)
134 | ax = fig.add_subplot(111, autoscale_on=False, xlim=[-4, 4], ylim=[-4, 4])
135 | ax.grid()
136 |
137 | list_of_lines = []
138 |
139 | # create the robot
140 | # the main frame
141 | (line,) = ax.plot([], [], "k", lw=6)
142 | list_of_lines.append(line)
143 | # the left propeller
144 | (line,) = ax.plot([], [], "b", lw=4)
145 | list_of_lines.append(line)
146 | # the right propeller
147 | (line,) = ax.plot([], [], "b", lw=4)
148 | list_of_lines.append(line)
149 | # the left thrust
150 | (line,) = ax.plot([], [], "r", lw=1)
151 | list_of_lines.append(line)
152 | # the right thrust
153 | (line,) = ax.plot([], [], "r", lw=1)
154 | list_of_lines.append(line)
155 |
156 | circle1 = plt.Circle((OBSTACLE_CENTERS[0][0], OBSTACLE_CENTERS[0][1]), OBSTACLE_RADIUS, color='k')
157 | circle2 = plt.Circle((OBSTACLE_CENTERS[1][0], OBSTACLE_CENTERS[1][1]), OBSTACLE_RADIUS, color='k')
158 | circle3 = plt.Circle((OBSTACLE_CENTERS[2][0], OBSTACLE_CENTERS[2][1]), OBSTACLE_RADIUS, color='k')
159 | ax.add_patch(circle1)
160 | ax.add_patch(circle2)
161 | ax.add_patch(circle3)
162 |
163 |
164 | circle1 = plt.Circle((TARGET[0], TARGET[1]), 0.04, color='red')
165 | ax.add_patch(circle1)
166 |
167 |
168 | def _animate(i):
169 | for l in list_of_lines: # reset all lines
170 | l.set_data([], [])
171 |
172 | theta = plotx[4, i]
173 | x = plotx[0, i]
174 | y = plotx[2, i]
175 | trans = np.array([[x, x], [y, y]])
176 | rot = np.array(
177 | [[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
178 | )
179 |
180 | main_frame = np.array([[-LENGTH, LENGTH], [0, 0]])
181 | main_frame = rot @ main_frame + trans
182 |
183 | left_propeller = np.array(
184 | [[-1.3 * LENGTH, -0.7 * LENGTH], [0.1, 0.1]]
185 | )
186 | left_propeller = rot @ left_propeller + trans
187 |
188 | right_propeller = np.array(
189 | [[1.3 * LENGTH, 0.7 * LENGTH], [0.1, 0.1]]
190 | )
191 | right_propeller = rot @ right_propeller + trans
192 |
193 | left_thrust = np.array(
194 | [[LENGTH, LENGTH], [0.1, 0.1 + plotu[0, i] * 0.04]]
195 | )
196 | left_thrust = rot @ left_thrust + trans
197 |
198 | right_thrust = np.array(
199 |             [[-LENGTH, -LENGTH], [0.1, 0.1 + plotu[1, i] * 0.04]]
200 | )
201 | right_thrust = rot @ right_thrust + trans
202 |
203 | list_of_lines[0].set_data(main_frame[0, :], main_frame[1, :])
204 | list_of_lines[1].set_data(left_propeller[0, :], left_propeller[1, :])
205 | list_of_lines[2].set_data(right_propeller[0, :], right_propeller[1, :])
206 | list_of_lines[3].set_data(left_thrust[0, :], left_thrust[1, :])
207 | list_of_lines[4].set_data(right_thrust[0, :], right_thrust[1, :])
208 |
209 | return list_of_lines
210 |
211 | def _init():
212 | return _animate(0)
213 |
214 | ani = mp.animation.FuncAnimation(
215 | fig,
216 | _animate,
217 | np.arange(0, len(plotx[0, :])),
218 | interval=use_dt,
219 | blit=True,
220 | init_func=_init,
221 | )
222 |
223 | plt.close(fig)
224 | plt.close(ani._fig)
225 |
226 | if save_mp4:
227 | # Create an instance of FFMpegWriter
228 | writer = FFMpegWriter(fps=int(1/dt))
229 |
230 | # Save the animation
231 | ani.save('./animation.mp4', writer=writer)
232 |
233 | IPython.display.display_html(IPython.core.display.HTML(ani.to_html5_video()))
234 |
235 |
236 |
237 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Reinforcement learning and optimal control for robotics (ROB-GY 6323)
2 |
3 | This repository provides material used for the class **Reinforcement learning and optimal control for robotics** (ROB-GY 6323) taught at New York University by [Ludovic Righetti](https://engineering.nyu.edu/faculty/ludovic-righetti).
4 | You are free to use and copy this material (at your own risk), please reference the material if you use it.
5 |
6 | ## Working with python
7 | We work with Python 3 (the notebooks in this repository were last run with Python 3.12) using numpy, scipy, matplotlib and Jupyter notebooks.
8 |
9 | Conda is a straightforward, multi-platform, easy-to-use python distribution. It can be downloaded [here](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) and instructions to get started are available [here](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html)
10 |
11 | Jupyter is a great way to create notebooks for python. A simple tutorial with Jupyter Lab can be found [here](https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html)
12 |
13 | Python tutorial (the web is full of great tutorials). Here are links to start:
14 | https://docs.python.org/3.8/tutorial/index.html
15 |
16 | Numpy for people coming from Matlab: http://mathesaurus.sourceforge.net/matlab-numpy.html
17 |
18 | Plotting with Python: http://matplotlib.org/users/pyplot_tutorial.html
19 |
20 | ## Installation of minimum package requirements
21 | `conda install -c conda-forge jupyter numpy scipy ipympl matplotlib cvxopt`
22 |
23 | ## Issues / Feedback
24 | We welcome feedback. If you find any issues, errors or have any ideas to improve the material, feel free to [create an issue](https://help.github.com/en/articles/creating-an-issue) and we will try to address it.
25 |
26 |
27 | ## Contributors
28 | The material has been developed by [Ludovic Righetti](https://engineering.nyu.edu/faculty/ludovic-righetti).
--------------------------------------------------------------------------------