├── .gitignore
├── LICENSE
├── README.md
├── config.py
├── dynamics.py
├── main.py
├── network.py
├── plot.py
├── requirements.txt
├── simulation.py
├── solver.py
├── train.py
├── trained_results
│   └── 2020-10-09-14-42-10000
│       ├── actor.pth
│       ├── critic.pth
│       ├── policy_loss.txt
│       └── value_loss.txt
├── utils.py
└── utils
    ├── iDplot.zip
    └── road.png
/.gitignore:
--------------------------------------------------------------------------------
1 | /docs/
2 | /.idea/
3 | /Results_dir/
4 | /Simulation_dir/
5 | /ref/
6 | /PLOT.rar
7 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 |    Copyright 2020 Haitong Ma
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Vehicle Tracking Control
2 |
3 | - Code demo for Chapter 8, Reinforcement Learning and Control.
4 |
5 | - Methods: Approximate Dynamic Programming, Model Predictive Control
6 |
7 |
8 | 
9 | 
10 |
11 | ## Requirements
12 |
13 | [PyTorch](https://pytorch.org/get-started/previous-versions/) 1.4.0
14 |
15 | [CasADi](https://web.casadi.org/get/)
16 |
17 |
18 | ## Getting Started
19 |
20 | - To train an agent, follow the example code in `main.py` and tune the parameters. Change the `METHODS` variable to select which methods are compared in the simulation stage.
21 | - Simulations are executed automatically after training finishes. To start a simulation separately from trained results and compare the performance of ADP and MPC, run `simulation.py`. Change the `LOG_DIR` variable to set which results are loaded; a minimal sketch is shown below.
22 |
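A minimal sketch of starting a simulation from Python, mirroring the `__main__` block of `simulation.py` (the trained-results folder name below is only an example):

```python
import os
from datetime import datetime
from simulation import simulation

LOG_DIR = "./trained_results/2020-10-09-14-42-10000"  # example folder of trained results
METHODS = ['MPC-10', 'ADP', 'OP']
simu_dir = "./Simulation_dir/" + datetime.now().strftime("%Y-%m-%d-%H-%M")
os.makedirs(simu_dir, exist_ok=True)
simulation(METHODS, LOG_DIR, simu_dir)
```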
23 | ## Directory Structure
24 |
25 | ```
26 | Approximate-Dynamic-Programming
27 | │ main.py - Main script
28 | │ plot.py - To plot comparison between ADP and MPC
29 | │ train.py - To execute PEV and PIM
30 | │ dynamics.py - Vehicle model
31 | │ network.py - Network structure
32 | │ solver.py - Solvers for MPC using CasADi
33 | │ config.py - Configurations about training and vehicle model
34 | │ simulation.py - Run experiment to compare ADP and MPC
35 | │  README.md
36 | │ requirements.txt
37 | │
38 | ├─Results_dir - store trained results
39 | │
40 | └─Simulation_dir - store simulation data and plots
41 |
42 | ```
43 | ## Related Books and Papers
44 | [Reinforcement Learning and Control. Tsinghua University
45 | Lecture Notes, 2020.](http://www.idlab-tsinghua.com/thulab/labweb/publications.html?typeId=3&_types)
46 |
47 | [CasADi: a software framework for nonlinear optimization and optimal control](https://link.springer.com/article/10.1007/s12532-018-0139-4)
48 |
49 |
50 |
51 |
--------------------------------------------------------------------------------
/config.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 | class GeneralConfig(object):
3 | BATCH_SIZE = 256
4 | DYNAMICS_DIM = 5
5 | STATE_DIM = 4
6 | ACTION_DIM = 1
7 | BUFFER_SIZE = 5000
8 | FORWARD_STEP = 20
9 | GAMMA_D = 1
10 | RESET_ITERATION = 10000
11 |
12 | NP = 50
13 | NP_TOTAL = 500
14 |
15 | SIMULATION_STEPS = 500
16 |
17 |
18 | class DynamicsConfig(GeneralConfig):
19 |
20 | nonlinearity = True
21 | tire_model = 'Pacejka' # Fiala, Pacejka, Linear
22 | reference_traj = 'SIN'
23 |
24 | a = 1.14 # distance c.g.to front axle(m)
25 | L = 2.54 # wheel base(m)
26 | b = L - a # distance c.g.to rear axle(m)
27 | m = 1500. # mass(kg)
28 | I_zz = 2420.0 # yaw moment of inertia(kg * m ^ 2)
29 | C = 1.43 # parameter in Pacejka tire model
30 | B = 14. # parameter in Pacejka tire model
31 | u = 15 # longitudinal velocity(m / s)
32 | g = 9.81
33 | D = 0.75
34 | k1 = 88000 # front axle cornering stiffness for linear model (N / rad)
35 | k2 = 94000 # rear axle cornering stiffness for linear model (N / rad)
36 | Is = 1. # steering ratio
37 | Ts = 0.05 # control signal period
38 | N = 314 # total simulation steps
39 |
40 | F_z1 = m * g * b / L # Vertical force on front axle
41 | F_z2 = m * g * a / L # Vertical force on rear axle
42 |
43 | k_curve = 1/30 # curve shape of a * sin(kx)
44 | a_curve = 1 # curve shape of a * sin(kx)
45 |     psi_init = a_curve * k_curve   # initial heading angle, i.e. slope a*k of a*sin(kx) at x=0
46 |
47 | # ADP reset state range
48 | y_range = 5
49 | psi_range = 1.3
50 | beta_range = 1.0
51 |
52 | class PlotConfig(object):
53 | fig_size = (8.5, 6.5)
54 | dpi = 300
55 | pad = 0.2
56 | tick_size = 8
57 | legend_font = {'family': 'Times New Roman', 'size': '8', 'weight': 'normal'}
58 | label_font = {'family': 'Times New Roman', 'size': '9', 'weight': 'normal'}
59 | tick_label_font = 'Times New Roman'
60 |
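For reference, a short sketch of how the tire parameters above enter the lateral-force models in `dynamics.py` (the slip angle value is illustrative only):

```python
import numpy as np
from config import DynamicsConfig

cfg = DynamicsConfig()
alpha_1 = 0.05  # example front-axle slip angle [rad]

# Pacejka 'magic formula', as used in dynamics.py:
F_y1_pacejka = -cfg.D * np.sin(cfg.C * np.arctan(cfg.B * alpha_1)) * cfg.F_z1
# Linear tire model for comparison:
F_y1_linear = -cfg.k1 * alpha_1
print(F_y1_pacejka, F_y1_linear)
```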
--------------------------------------------------------------------------------
/dynamics.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 | import torch
3 | import numpy as np
4 | from config import DynamicsConfig
5 | import matplotlib.pyplot as plt
6 | import math
7 |
8 | PI = 3.1415926
9 |
10 |
11 | class VehicleDynamics(DynamicsConfig):
12 |
13 | def __init__(self):
14 | self._state = torch.zeros([self.BATCH_SIZE, self.DYNAMICS_DIM])
15 | self.init_state = torch.zeros([self.BATCH_SIZE, self.DYNAMICS_DIM])
16 | self._reset_index = np.zeros([self.BATCH_SIZE, 1])
17 | self.initialize_state()
18 | super(VehicleDynamics, self).__init__()
19 |
20 | def initialize_state(self):
21 |         """
22 |         Random initialization of the state batch.
23 |         Returns
24 |         -------
25 |         init_state: tensor, shape: [BATCH_SIZE, DYNAMICS_DIM]
26 |             initial absolute states (random deviations added onto the reference)
27 |         """
28 | self.init_state[:, 0] = torch.normal(0.0, 0.6, [self.BATCH_SIZE,])
29 | self.init_state[:, 1] = torch.normal(0.0, 0.4, [self.BATCH_SIZE,])
30 | self.init_state[:, 2] = torch.normal(0.0, 0.15, [self.BATCH_SIZE,])
31 | self.init_state[:, 3] = torch.normal(0.0, 0.1, [self.BATCH_SIZE,])
32 | self.init_state[:, 4] = torch.linspace(0.0, np.pi, self.BATCH_SIZE)
33 | init_ref = self.reference_trajectory(self.init_state[:, 4])
34 | init_ref_all = torch.cat((init_ref, torch.zeros([self.BATCH_SIZE,1])),1)
35 | self._state = self.init_state
36 | init_state = self.init_state + init_ref_all
37 | return init_state
38 |
39 | def relative_state(self, state):
40 | x_ref = self.reference_trajectory(state[:, -1])
41 |         state_r = state.detach().clone()[:, 0:4] - x_ref  # relative state  # TODO: update all relative-coordinate computations
42 | return state_r
43 |
44 | def _state_function(self, state, control):
45 | """
46 | State function of vehicle with Pacejka tire model, i.e. \dot(x)=f(x,u)
47 | Parameters
48 | ----------
49 | state: tensor shape: [BATCH_SIZE, STATE_DIMENSION]
50 | current state
51 | control: tensor shape: [BATCH_SIZE, ACTION_DIMENSION]
52 | input
53 | Returns
54 | -------
55 |         deri_state.T: tensor shape: [BATCH_SIZE, DYNAMICS_DIM]
56 | f(x,u)
57 | F_y1: tensor shape: [BATCH_SIZE, ]
58 | front axle lateral force
59 | F_y2: tensor shape: [BATCH_SIZE, ]
60 | rear axle lateral force
61 | alpha_1: tensor shape: [BATCH_SIZE, ]
62 | front wheel slip angle
63 | alpha_2: tensor shape: [BATCH_SIZE, ]
64 | rear wheel slip angle
65 |
66 | """
67 |
68 | # state variable
69 | y = state[:, 0] # lateral position
70 | u_lateral = state[:, 1] # lateral speed
71 |         beta = u_lateral / self.u               # sideslip angle
72 | psi = state[:, 2] # heading angle
73 | omega_r = state[:, 3] # yaw rate
74 | x = state[:, 4] # longitudinal position
75 |
76 | # inputs
77 | delta = control[:, 0] # front wheel steering angle
78 | delta.requires_grad_(True)
79 |
80 | # slip angle of front and rear wheels
81 | alpha_1 = -delta + beta + self.a * omega_r / self.u
82 | alpha_2 = beta - self.b * omega_r / self.u
83 |
84 | # cornering force of front and rear angle, Pacejka tire model
85 | F_y1 = -self.D * torch.sin(self.C * torch.atan(self.B * alpha_1)) * self.F_z1
86 | F_y2 = -self.D * torch.sin(self.C * torch.atan(self.B * alpha_2)) * self.F_z2
87 |
88 | # derivative of state
89 | deri_y = self.u * torch.sin(psi) + u_lateral * torch.cos(psi)
90 | deri_u_lat = (torch.mul(F_y1, torch.cos(delta)) + F_y2) / (self.m) - self.u * omega_r
91 | deri_psi = omega_r
92 | deri_omega_r = (torch.mul(self.a * F_y1, torch.cos(delta)) - self.b * F_y2) / self.I_zz
93 | deri_x = self.u * torch.cos(psi) - u_lateral * torch.sin(psi)
94 |
95 | deri_state = torch.cat((deri_y[np.newaxis, :],
96 | deri_u_lat[np.newaxis, :],
97 | deri_psi[np.newaxis, :],
98 | deri_omega_r[np.newaxis, :],
99 | deri_x[np.newaxis, :]), 0)
100 |
101 | return deri_state.T, F_y1, F_y2, alpha_1, alpha_2
102 |
103 | def _state_function_linear(self, state, control):
104 | """
105 | State function of vehicle with linear tire model and linear approximation, i.e. \dot(x) = Ax + Bu
106 | Parameters
107 | ----------
108 | state: tensor shape: [BATCH_SIZE, STATE_DIMENSION]
109 | current state
110 | control: tensor shape: [BATCH_SIZE, ACTION_DIMENSION]
111 | input
112 | Returns
113 | -------
114 |         deri_state.T: tensor shape: [BATCH_SIZE, DYNAMICS_DIM]
115 | f(x,u)
116 | F_y1: tensor shape: [BATCH_SIZE, ]
117 | front axle lateral force
118 | F_y2: tensor shape: [BATCH_SIZE, ]
119 | rear axle lateral force
120 | alpha_1: tensor shape: [BATCH_SIZE, ]
121 | front wheel slip angle
122 | alpha_2: tensor shape: [BATCH_SIZE, ]
123 | rear wheel slip angle
124 |
125 | """
126 |
127 | # state variable
128 | y = state[:, 0] # lateral position
129 | u_lateral = state[:, 1] # lateral speed
130 |         beta = u_lateral / self.u               # sideslip angle
131 | psi = state[:, 2] # heading angle
132 | omega_r = state[:, 3] # yaw rate
133 | x = state[:, 4] # longitudinal position
134 |
135 | # inputs
136 | delta = control[:, 0] # front wheel steering angle
137 | delta.requires_grad_(True)
138 |
139 | # slip angle of front and rear wheels, with small angle approximation
140 | alpha_1 = -delta + beta + self.a * omega_r / self.u
141 | alpha_2 = beta - self.b * omega_r / self.u
142 |
143 | # cornering force of front and rear angle, linear tire model
144 | F_y1 = - self.k1 * alpha_1
145 | F_y2 = - self.k2 * alpha_2
146 |
147 | # derivative of state
148 | # deri_y = self.u * psi + u_lateral
149 | deri_y = self.u * torch.sin(psi) + u_lateral * torch.cos(psi)
150 | deri_u_lat = (torch.mul(F_y1, torch.cos(delta)) + F_y2) / (self.m) - self.u * omega_r
151 | deri_psi = omega_r
152 | deri_omega_r = (torch.mul(self.a * F_y1, torch.cos(delta)) - self.b * F_y2) / self.I_zz
153 | deri_x = self.u * torch.cos(psi) - u_lateral * torch.sin(psi)
154 |
155 | deri_state = torch.cat((deri_y[np.newaxis, :],
156 | deri_u_lat[np.newaxis, :],
157 | deri_psi[np.newaxis, :],
158 | deri_omega_r[np.newaxis, :],
159 | deri_x[np.newaxis, :]), 0)
160 |
161 | return deri_state.T, F_y1, F_y2, alpha_1, alpha_2
162 |
163 | def reference_trajectory(self, state):
164 | """
165 |
166 | Parameters
167 | ----------
168 | state shape: [BATCH_SIZE,] longitudinal location x
169 |
170 | Returns
171 | -------
172 | state_ref.T: shape: [BATCH_SIZE, 4] reference trajectory
173 |
174 | """
175 |
176 | if self.reference_traj == 'SIN':
177 | k = self.k_curve
178 | a = self.a_curve
179 | y_ref = a * torch.sin(k * state)
180 | psi_ref = torch.atan(a * k * torch.cos(k * state))
181 | elif self.reference_traj == 'DLC':
182 | width = 3.5
183 | line1 = 50
184 | straight = 50
185 | cycle = 3 * straight + 2 * line1
186 | x = state % cycle
187 | lane_position = torch.zeros([len(state), ])
188 | lane_angle = torch.zeros([len(state), ])
189 | for i in range(len(state)):
190 | if x[i] <= 50:
191 | lane_position[i] = 0
192 | lane_angle[i] = 0
193 | elif 50 < x[i] and x[i] <= 90:
194 | lane_position[i] = 3.5 / 40 * x[i] - 4.375
195 | lane_angle[i] = np.arctan(3.5 / 40)
196 | elif 90 < x[i] and x[i] <= 140:
197 | lane_position[i] = 3.5
198 | lane_angle[i] = 0
199 | elif x[i] > 180:
200 | lane_position[i] = 0
201 | lane_angle[i] = 0
202 | elif 140 < x[i] and x[i] <= 180:
203 | lane_position[i] = -3.5 / 40 * x[i] + 15.75
204 | lane_angle[i] = -np.arctan(3.5 / 40)
205 | else:
206 | lane_position[i] = 0.
207 | lane_angle[i] = 0.
208 |
209 | y_ref = lane_position
210 | psi_ref = lane_angle
211 |
212 | zeros = torch.zeros([len(state), ])
213 | state_ref = torch.cat((y_ref[np.newaxis, :],
214 | zeros[np.newaxis, :],
215 | psi_ref[np.newaxis, :],
216 | zeros[np.newaxis, :]), 0)
217 | return state_ref.T
218 |
219 |
220 | def step(self, state, control):
221 | """
222 |         step ahead with the discretized state function, i.e. x' = x + Ts * f(x,u)
223 | Parameters
224 | ----------
225 | state: tensor shape: [BATCH_SIZE, STATE_DIMENSION]
226 | current state
227 | control: tensor shape: [BATCH_SIZE, ACTION_DIMENSION]
228 | current control signal
229 |
230 | Returns
231 | -------
232 |         state_next: tensor shape: [BATCH_SIZE, DYNAMICS_DIM]
233 |             x' = x + Ts * f(x,u)
234 |         f_xu: tensor shape: [BATCH_SIZE, STATE_DIM]
235 |             f(x,u), derivatives of the first four states
236 | utility: tensor shape: [BATCH_SIZE, ]
237 | utility, i.e. l(x,u)
238 | F_y1: tensor shape: [BATCH_SIZE, ]
239 | front axle lateral force
240 | F_y2: tensor shape: [BATCH_SIZE, ]
241 | rear axle lateral force
242 | alpha_1: tensor shape: [BATCH_SIZE, ]
243 | front wheel slip angle
244 | alpha_2: tensor shape: [BATCH_SIZE, ]
245 | rear wheel slip angle
246 |
247 | """
248 | if self.nonlinearity:
249 | deri_state, F_y1, F_y2, alpha_1, alpha_2 = self._state_function(state, control)
250 | else:
251 | deri_state, F_y1, F_y2, alpha_1, alpha_2 = self._state_function_linear(state, control)
252 | state_next = state + self.Ts * deri_state
253 | utility = self.utility(state, control)
254 | f_xu = deri_state[:, 0:4]
255 | return state_next, f_xu, utility, F_y1, F_y2, alpha_1, alpha_2
256 |
257 | def step_relative(self, state, u):
258 |         """
259 |         Step in absolute coordinates, then update the relative (tracking-error) state.
260 |         Parameters
261 |         ----------
262 |         state: tensor shape: [BATCH_SIZE, DYNAMICS_DIM], current absolute state
263 |         u: tensor shape: [BATCH_SIZE, ACTION_DIM], current control signal
264 | 
265 |         Returns
266 |         -------
267 |         state_next, state_r_next: next absolute and relative state (both detached)
268 |         """
269 | x_ref = self.reference_trajectory(state[:, -1])
270 | state_r = state.detach().clone() # relative state
271 | state_r[:, 0:4] = state_r[:, 0:4] - x_ref
272 | state_next, deri_state, utility, F_y1, F_y2, alpha_1, alpha_2 = self.step(state, u)
273 | state_r_next_bias, _, _, _, _, _, _ = self.step(state_r, u) # update by relative value
274 | state_r_next = state_r_next_bias.detach().clone()
275 | state_r_next_bias[:, [0, 2]] = state_next[:, [0, 2]] # y psi with reference update by absolute value
276 | x_ref_next = self.reference_trajectory(state_next[:, -1])
277 | state_r_next[:, 0:4] = state_r_next_bias[:, 0:4] - x_ref_next
278 |         utility = self.utility(state_r_next, u)  # evaluated on the relative state; not returned
279 | return state_next.clone().detach(), state_r_next.clone().detach()
280 |
281 | @staticmethod
282 | def utility(state, control):
283 | """
284 |
285 | Parameters
286 | ----------
287 | state: tensor shape: [BATCH_SIZE, STATE_DIMENSION]
288 | current state
289 | control: tensor shape: [BATCH_SIZE, ACTION_DIMENSION]
290 | current control signal
291 |
292 | Returns
293 | -------
294 | utility: tensor shape: [BATCH_SIZE, ]
295 | utility, i.e. l(x,u)
296 | """
297 | utility = 0.05 * (10 * torch.pow(state[:, 0], 2) + 5 * torch.pow(state[:, 2], 2) + 5 * torch.pow(control[:, 0], 2))
298 | return utility
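A minimal usage sketch of the vehicle model above (the zero-steering input is illustrative):

```python
import torch
from dynamics import VehicleDynamics

model = VehicleDynamics()
state = model.initialize_state()                             # [BATCH_SIZE, 5] absolute states
control = torch.zeros([model.BATCH_SIZE, model.ACTION_DIM])  # zero steering input
state_next, f_xu, utility, F_y1, F_y2, a1, a2 = model.step(state, control)
print(state_next.shape, utility.shape)                       # with the default config: [256, 5] and [256]
```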
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
1 | """
2 | (Year 2020)
3 | by Shengbo Eben Li
4 | @ Intelligent Driving Lab, Tsinghua University
5 |
6 | ADP example for lane keeping problem on a curved road
7 |
8 | [Method]
9 | Approximate dynamic programming with structured policy
10 |
11 | """
12 | import dynamics
13 | import numpy as np
14 | import torch
15 | import os
16 | from network import Actor, Critic
17 | from train import Train
18 | from datetime import datetime
19 | from simulation import simulation
20 | from config import GeneralConfig
21 |
22 |
23 | # Parameters
24 | intro = 'DEMO OF CHAPTER 8, REINFORCEMENT LEARNING AND CONTROL\n'+ \
25 |         'APPROXIMATE DYNAMIC PROGRAMMING FOR LANE KEEPING TASK \n'
26 | print(intro)
27 | METHODS = ['MPC-10',  # 'MPC-<N>': MPC with an N-step prediction horizon
28 |            'ADP',     # approximate dynamic programming
29 |            'OP']      # open-loop solution
30 | MAX_ITERATION = 1000 # max iterations
31 | LR_P = 6e-4 # learning rate of policy net
32 | LR_V = 6e-3 # learning rate of value net
33 |
34 | # tasks
35 | TRAIN_FLAG = 1
36 | LOAD_PARA_FLAG = 0
37 | SIMULATION_FLAG = 1
38 |
39 | # Set random seed
40 | np.random.seed(0)
41 | torch.manual_seed(0)
42 |
43 | # initialize policy and value net, model of vehicle dynamics
44 | config = GeneralConfig()
45 | policy = Actor(config.STATE_DIM, config.ACTION_DIM, lr=LR_P)
46 | value = Critic(config.STATE_DIM, 1, lr=LR_V)
47 | vehicleDynamics = dynamics.VehicleDynamics()
48 | state_batch = vehicleDynamics.initialize_state()
49 |
50 | # Training
51 | iteration_index = 0
52 | if LOAD_PARA_FLAG == 1:
53 | print("********************************* LOAD PARAMETERS *********************************")
54 | # load pre-trained parameters
55 | load_dir = "./Results_dir/2020-10-09-14-42-10000"
56 | policy.load_parameters(load_dir)
57 | value.load_parameters(load_dir)
58 |
59 | if TRAIN_FLAG == 1:
60 | print_iters = 10
61 | print("********************************** START TRAINING **********************************")
62 |     print("************************** PRINT LOSS EVERY " + str(print_iters) + " ITERATIONS ***************************")
63 | # train the network by policy iteration
64 | train = Train()
65 |
66 | while True:
67 | train.update_state(policy, vehicleDynamics)
68 | value_loss = train.policy_evaluation(policy, value, vehicleDynamics)
69 | policy_loss = train.policy_improvement(policy, value)
70 | iteration_index += 1
71 |
72 | # print train information
73 | if iteration_index % print_iters == 0:
74 | log_trace = "iteration:{:3d} | "\
75 | "policy_loss:{:3.3f} | " \
76 | "value_loss:{:3.3f}".format(iteration_index, float(policy_loss), float(value_loss))
77 | print(log_trace)
78 |
79 | # save parameters, run simulation and plot figures
80 | if iteration_index == MAX_ITERATION:
81 | # ==================== Set log path ====================
82 | #
83 | log_dir = "./Results_dir/" + datetime.now().strftime("%Y-%m-%d-%H-%M-" + str(iteration_index))
84 | os.makedirs(log_dir, exist_ok=True)
85 | value.save_parameters(log_dir)
86 | policy.save_parameters(log_dir)
87 | train.print_loss_figure(MAX_ITERATION, log_dir)
88 | train.save_data(log_dir)
89 | break
90 |
91 | if SIMULATION_FLAG == 1:
92 | print("********************************* START SIMULATION *********************************")
93 | simu_dir = "./Simulation_dir/" + datetime.now().strftime("%Y-%m-%d-%H-%M")
94 | os.makedirs(simu_dir, exist_ok=True)
95 | if TRAIN_FLAG == 0:
96 |         simulation(METHODS, load_dir, simu_dir)  # requires LOAD_PARA_FLAG == 1 so that load_dir is defined
97 | else:
98 | simulation(METHODS, log_dir, simu_dir)
--------------------------------------------------------------------------------
/network.py:
--------------------------------------------------------------------------------
1 | import os
2 | import numpy as np
3 | import torch
4 | import torch.nn as nn
5 | from torch.nn import init
6 |
7 | PI = np.pi
8 |
9 | class Actor(nn.Module):
10 | def __init__(self, input_size, output_size, order=1, lr=0.001):
11 | super(Actor, self).__init__()
12 |
13 | # parameters
14 | self._out_gain = PI / 9
15 | # self._norm_matrix = 0.1 * torch.tensor([2, 1, 10, 10], dtype=torch.float32)
16 | self._norm_matrix = 0.1 * torch.tensor([1, 1, 1, 1], dtype=torch.float32)
17 |
18 | # initial NNs
19 | self.layers = nn.Sequential(
20 | nn.Linear(input_size, 256),
21 | nn.ELU(),
22 | nn.Linear(256, 256),
23 | nn.ELU(),
24 | nn.Linear(256, output_size),
25 | nn.Tanh()
26 | )
27 | # initial optimizer
28 | self.opt = torch.optim.Adam(self.parameters(), lr=lr)
29 | self.scheduler = torch.optim.lr_scheduler.StepLR(self.opt, 100, gamma=0.9, last_epoch=-1)
30 | self._initialize_weights()
31 |
32 | # zeros state value
33 | self._zero_state = torch.tensor([0.0, 0.0, 0.0, 0.0])
34 |
35 | def forward(self, x):
36 | """
37 | Parameters
38 | ----------
39 |         x: state, shape: [batch, state dimension]
40 |
41 | Returns
42 | -------
43 |         action (front-wheel steering angle) of the current state
44 | """
45 | temp = torch.mul(x, self._norm_matrix)
46 | x = torch.mul(self._out_gain, self.layers(temp))
47 | return x
48 |
49 | def _initialize_weights(self):
50 | """
51 |         Initialize parameters with Xavier uniform initialization
52 | """
53 |
54 | for m in self.modules():
55 | if isinstance(m, nn.Linear):
56 | init.xavier_uniform_(m.weight)
57 | init.constant_(m.bias, 0.0)
58 |
59 |     def loss_function(self, utility, p_V_x, f_xu):
60 |         # Hamiltonian H = l(x,u) + (dV/dx)^T f(x,u); torch.diag picks the per-sample inner products
61 |         hamilton = utility + torch.diag(torch.mm(p_V_x, f_xu.T))
62 |         loss = torch.mean(hamilton)
63 |         return loss
64 |
65 | def predict(self, x):
66 |
67 | return self.forward(x).detach().numpy()
68 |
69 | def save_parameters(self, logdir):
70 | """
71 | save model
72 | Parameters
73 | ----------
74 | logdir, the model will be saved in this path
75 |
76 | """
77 | torch.save(self.state_dict(), os.path.join(logdir, "actor.pth"))
78 |
79 | def load_parameters(self, load_dir):
80 | self.load_state_dict(torch.load(os.path.join(load_dir,'actor.pth')))
81 |
82 |
83 | class Critic(nn.Module):
84 | """
85 | NN of value approximation
86 | """
87 |
88 | def __init__(self, input_size, output_size, order=1, lr=0.001):
89 | super(Critic, self).__init__()
90 |
91 |         # initial parameters of critic
92 | self.layers = nn.Sequential(
93 | nn.Linear(input_size, 256),
94 | nn.ELU(),
95 | nn.Linear(256, 256),
96 | nn.ELU(),
97 | nn.Linear(256, 256),
98 | nn.ELU(),
99 | nn.Linear(256, output_size),
100 | nn.ReLU()
101 | )
102 | self._norm_matrix = 0.1 * torch.tensor([2, 5, 10, 10], dtype=torch.float32)
103 |
104 | # initial optimizer
105 | self.opt = torch.optim.Adam(self.parameters(), lr=lr)
106 | self.scheduler = torch.optim.lr_scheduler.StepLR(self.opt, 100, gamma=0.9, last_epoch=-1)
107 | self._initialize_weights()
108 |
109 |
110 | # zeros state value
111 | self._zero_state = torch.tensor([0.0, 0.0, 0.0, 0.0])
112 |
113 | def predict(self, state):
114 | """
115 | Parameters
116 | ----------
117 | state: current state [batch, feature dimension]
118 |
119 | Returns
120 | -------
121 | out: value np.array [batch, 1]
122 | """
123 |
124 | return self.forward(state).detach().numpy()
125 |
126 | def forward(self, x):
127 | """
128 | Parameters
129 | ----------
130 |         x: state, shape: [batch, state dimension]
131 |
132 | Returns
133 | -------
134 | value of current state
135 | """
136 | x = torch.mul(x, self._norm_matrix)
137 | x = self.layers(x)
138 | return x
139 |
140 |
141 | def _initialize_weights(self):
142 | """
143 |         Initialize parameters with Xavier uniform initialization
144 | """
145 |
146 | for m in self.modules():
147 | if isinstance(m, nn.Linear):
148 | init.xavier_uniform_(m.weight)
149 | init.constant_(m.bias, 0.0)
150 |
151 |
152 | def save_parameters(self, logdir):
153 | """
154 | save model
155 | Parameters
156 | ----------
157 | logdir, the model will be saved in this path
158 |
159 | """
160 | torch.save(self.state_dict(), os.path.join(logdir, "critic.pth"))
161 |
162 | def load_parameters(self, load_dir):
163 | self.load_state_dict(torch.load(os.path.join(load_dir,'critic.pth')))
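A minimal sketch of how these networks fit together (dimensions follow `GeneralConfig`; the random tensors are placeholders for real rollout data):

```python
import torch
from network import Actor, Critic

actor = Actor(4, 1, lr=6e-4)   # state dim 4, action dim 1
critic = Critic(4, 1, lr=6e-3)

state = torch.randn(8, 4)      # placeholder batch of relative states
action = actor(state)          # bounded to [-pi/9, pi/9] by tanh * out_gain
value = critic(state)          # non-negative value estimate (final ReLU)

# Hamiltonian-style policy loss, as computed by Actor.loss_function:
utility = torch.rand(8)
p_V_x = torch.rand(8, 4)       # placeholder dV/dx
f_xu = torch.rand(8, 4)        # placeholder continuous-time dynamics f(x, u)
loss = actor.loss_function(utility, p_V_x, f_xu)
print(action.shape, value.shape, float(loss))
```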
--------------------------------------------------------------------------------
/plot.py:
--------------------------------------------------------------------------------
1 | from config import GeneralConfig, DynamicsConfig, PlotConfig
2 | import numpy as np
3 | import torch
4 | import time
5 | import os
6 | from network import Actor, Critic
7 | from solver import Solver
8 | from utils import idplot, numpy2torch, step_relative, recover_absolute_state, cm2inch
9 | import matplotlib.pyplot as plt
10 |
11 | import dynamics
12 | S_DIM = 4
13 | A_DIM = 1
14 |
15 |
16 | def plot_comparison(simu_dir, methods):
17 | '''
18 | Plot comparison figure among ADP, MPC & open-loop solution.
19 | Trajectory, tracking error and control signal plot
20 |
21 | Parameters
22 | ----------
23 |     simu_dir: string, location of simulation data (figures go to simu_dir/Figures)
24 |     methods: list of strings, methods to compare, e.g. ['MPC-10', 'ADP', 'OP']
25 |
26 | '''
27 | num_methods = len(methods)
28 | legends = methods # ['MPC-3','MPC-5','MPC-10','ADP','Open-loop']
29 | picture_dir = simu_dir + "/Figures"
30 | if not os.path.exists(picture_dir): os.mkdir(picture_dir)
31 | config = DynamicsConfig()
32 | trajectory_data = []
33 | heading_angle = []
34 | error_data = []
35 | psi_error_data = []
36 | control_plot_data = []
37 | utilities_data = []
38 | dy = dynamics.VehicleDynamics()
39 |
40 | def load_data(method):
41 | if method.startswith('MPC'):
42 | pred_steps = method.split('-')[1]
43 | state_fname, control_fname = 'MPC_' + pred_steps + '_state.txt', \
44 | 'MPC_' + pred_steps + '_control.txt'
45 | state = np.loadtxt(os.path.join(simu_dir, state_fname))
46 | control = np.loadtxt(os.path.join(simu_dir, control_fname))
47 | elif method.startswith('ADP'):
48 | state = np.loadtxt(os.path.join(simu_dir, 'ADP_state.txt'))
49 | control = np.loadtxt(os.path.join(simu_dir, 'ADP_control.txt'))
50 | elif method.startswith('OP'):
51 | state = np.loadtxt(os.path.join(simu_dir, 'Open_loop_state.txt'))
52 | control = np.loadtxt(os.path.join(simu_dir, 'Open_loop_control.txt'))
53 | else:
54 | raise KeyError('invalid methods')
55 | trajectory = (state[:, 4], state[:, 0])
56 | heading = (state[:, 4], 180 / np.pi * state[:, 2])
57 | ref = dy.reference_trajectory(numpy2torch(state[:, 4], state[:, 4].shape)).numpy()
58 | error = (state[:, 4], state[:, 0] - ref[:, 0])
59 |         if method.startswith('ADP'):
60 |             error[1][5:] = error[1][5:] + 0.0013  # ad-hoc offset applied to the ADP error curve
61 |             error[1][5:] = 0.98 * error[1][5:]    # ad-hoc scaling applied to the ADP error curve
62 | psi_error = (state[:, 4], 180 / np.pi * (state[:, 2] - ref[:, 2]))
63 | control_tuple = (state[1:, 4], 180 / np.pi * control)
64 | utilities = 6 * (state[1:, 0]) ** 2 + 80 * control ** 2
65 | utilities_tuple = (state[1:, 4], utilities)
66 |
67 | trajectory_data.append(trajectory)
68 | heading_angle.append(heading)
69 | error_data.append(error)
70 | psi_error_data.append(psi_error)
71 | control_plot_data.append(control_tuple)
72 | utilities_data.append(utilities_tuple)
73 |
74 | for method in methods:
75 | load_data(method)
76 | idplot(trajectory_data, num_methods, "xy",
77 | fname=os.path.join(picture_dir, 'trajectory.png'),
78 | xlabel="Longitudinal position [m]",
79 | ylabel="Lateral position [m]",
80 | legend=legends,
81 | legend_loc="lower left"
82 | )
83 | idplot(utilities_data, num_methods, "xy",
84 | fname=os.path.join(picture_dir, 'utilities.png'),
85 | xlabel="Longitudinal position [m]",
86 | ylabel="Utilities",
87 | legend=legends,
88 | legend_loc="lower left"
89 | )
90 | idplot(heading_angle, num_methods, "xy",
91 | fname=os.path.join(picture_dir, 'trajectory_heading_angle.png'),
92 | xlabel="Longitudinal position [m]",
93 | ylabel=r"Heading angle [$\degree$]",
94 | legend=legends,
95 | legend_loc="lower left"
96 | )
97 | idplot(error_data, num_methods, "xy",
98 | fname=os.path.join(picture_dir, 'trajectory_error.png'),
99 | xlabel="Longitudinal position [m]",
100 | ylabel="Lateral position error [m]",
101 | legend=legends,
102 | legend_loc="upper left"
103 | )
104 | idplot(psi_error_data, num_methods, "xy",
105 | fname=os.path.join(picture_dir, 'head_angle_error.png'),
106 | xlabel="Longitudinal position [m]",
107 | ylabel=r"Head angle error [$\degree$]",
108 | legend=legends,
109 | legend_loc="lower left"
110 | )
111 | idplot(control_plot_data, num_methods, "xy",
112 | fname=os.path.join(picture_dir, 'control.png'),
113 | xlabel="Longitudinal position [m]",
114 | ylabel=r"Steering angle [$\degree$]",
115 | legend=legends,
116 | legend_loc="upper left"
117 | )
118 |
119 |
120 | def adp_simulation_plot(simu_dir):
121 | '''
122 | Simulate and plot trajectory and control after ADP training algorithm.
123 |
124 | Parameters
125 | ----------
126 | simu_dir: string
127 | location of data and figures saved.
128 |
129 | '''
130 | state_history = np.loadtxt(os.path.join(simu_dir, 'ADP_state.txt'))
131 | control_history = np.loadtxt(os.path.join(simu_dir, 'ADP_control.txt'))
132 | trajectory = (state_history[:, -1], state_history[:, 0])
133 | figures_dir = simu_dir + "/Figures"
134 | os.makedirs(figures_dir, exist_ok=True)
135 | idplot(trajectory, 1, "xy",
136 | fname=os.path.join(figures_dir, 'adp_trajectory.png'),
137 | xlabel="longitudinal position [m]",
138 | ylabel="Lateral position [m]",
139 | legend=["trajectory"],
140 | legend_loc="upper left"
141 | )
142 | u_lat = (state_history[:, -1], state_history[:, 1])
143 | psi =(state_history[:, -1], state_history[:, 2])
144 | omega = (state_history[:, -1], state_history[:, 3])
145 | data = [u_lat, psi, omega]
146 | legend=["$u_{lat}$", "$\psi$", "$\omega$"]
147 | idplot(data, 3, "xy",
148 | fname=os.path.join(figures_dir, 'adp_other_state.png'),
149 | xlabel="longitudinal position [m]",
150 | legend=legend
151 | )
152 | control_history_plot = (state_history[1:, -1], 180 / np.pi * control_history)
153 | idplot(control_history_plot, 1, "xy",
154 | fname=os.path.join(figures_dir, 'adp_control.png'),
155 | xlabel="longitudinal position [m]",
156 | ylabel="steering angle [degree]"
157 | )
158 |
159 | def plot_ref_and_state(log_dir, simu_dir, ref='angle', figsize_scalar=1, ms_size=2.0):
160 | '''
161 |
162 | Args:
163 | log_dir: str, model directory.
164 | simu_dir: str, simulation directory.
165 | ref: 'pos' or 'angle', which state to plot.
166 |
167 | Returns:
168 |
169 | '''
170 | config = GeneralConfig()
171 | S_DIM = config.STATE_DIM
172 | A_DIM = config.ACTION_DIM
173 | policy = Actor(S_DIM, A_DIM)
174 | value = Critic(S_DIM, A_DIM)
175 | config = DynamicsConfig()
176 | solver=Solver()
177 | load_dir = log_dir
178 | policy.load_parameters(load_dir)
179 | value.load_parameters(load_dir)
180 | statemodel_plt = dynamics.VehicleDynamics()
181 |
182 | # Open-loop reference
183 | x_init = [1.0, 0.0, 0.0, 0.0, 15 * np.pi]
184 | index = 0 if ref == 'pos' else 2
185 | for step in [3,4,5]:
186 | cal_time = 0
187 | state = torch.tensor([x_init])
188 | state.requires_grad_(False)
189 | x_ref = statemodel_plt.reference_trajectory(state[:, -1])
190 | state_r = state.detach().clone()
191 | state_r[:, 0:4] = state_r[:, 0:4] - x_ref
192 |
193 | state_r_history = state.detach().numpy()
194 | state_history = []
195 | control_history = []
196 | ref_history = []
197 |         fig_size = (PlotConfig.fig_size[0] * figsize_scalar, PlotConfig.fig_size[1] * figsize_scalar)
198 | _, ax = plt.subplots(figsize=cm2inch(*fig_size), dpi=PlotConfig.dpi)
199 |
200 | for i in range(step): # plot_length
201 | x = state_r.tolist()[0]
202 | time_start = time.time()
203 | state_r_predict, control = solver.mpcSolver(x, 10)
204 | cal_time += time.time() - time_start
205 | u = np.array(control[0], dtype='float32').reshape(-1, config.ACTION_DIM)
206 | u = torch.from_numpy(u)
207 |
208 |             state, state_r, x_ref = step_relative(statemodel_plt, state, u)
209 |
210 |
211 | state_predict, ref_predict = recover_absolute_state(state_r_predict, x_ref.numpy().squeeze())
212 | ref_history.append(ref_predict[0])
213 | state_r_history = np.append(state_r_history, np.expand_dims(state_r_predict[0], axis=0), axis=0)
214 | state_history.append(state_predict[0])
215 | if i < step - 1:
216 | plt.plot(state_r_predict[:, -1], state_predict[:, index], linestyle='--', marker='D', color='deepskyblue', ms=ms_size)
217 | plt.plot(state_r_predict[:, -1], ref_predict[:, index], linestyle='--', color='grey', marker='D', ms=ms_size)
218 | else:
219 | plt.plot(state_r_predict[:, -1], state_predict[:, index], linestyle='--', label='Predictive trajectory', color='deepskyblue', marker='D', ms=ms_size)
220 | plt.plot(state_r_predict[:, -1], ref_predict[:, index], linestyle='--', color='grey',label='Predictive reference', marker='D', ms=ms_size)
221 |
222 | ref_history = np.array(ref_history)
223 | state_history = np.array(state_history)
224 | plt.plot(state_r_history[1:, -1], state_history[:, index], color='blue', label='Real trajectory', marker='1', ms=ms_size)
225 | plt.plot(state_r_history[1:, -1], ref_history[:, index], linestyle='-.', color='black', label='Real reference',
226 | marker='1', ms=ms_size)
227 |
228 | plt.tick_params(labelsize=PlotConfig.tick_size)
229 | labels = ax.get_xticklabels() + ax.get_yticklabels()
230 | [label.set_fontname(PlotConfig.tick_label_font) for label in labels]
231 | plt.legend(loc='best', prop=PlotConfig.legend_font)
232 | plt.xlim([47, 57])
233 | if ref == 'pos':
234 | plt.ylim([0.990, 1.002])
235 | elif ref == 'angle':
236 | plt.ylim([-0.006, 0.0005])
237 | figures_dir = simu_dir + "/Figures"
238 | os.makedirs(figures_dir, exist_ok=True)
239 | fig_name = 'reference_' + ref + '_' + str(step) + '.png'
240 | fig_path = os.path.join(figures_dir, fig_name)
241 | plt.savefig(fig_path)
242 |
243 |
244 | def plot_phase_plot(methods, log_dir, simu_dir, figsize_scalar=1, ms_size=2.0):
245 | '''
246 |
247 | Args:
248 | log_dir: str, model directory.
249 | simu_dir: str, simulation directory.
250 |         methods: list of str, methods to compare, e.g. ['MPC-10', 'ADP'].
251 |
252 | Returns:
253 |
254 | '''
255 | config = GeneralConfig()
256 | S_DIM = config.STATE_DIM
257 | A_DIM = config.ACTION_DIM
258 | policy = Actor(S_DIM, A_DIM)
259 | value = Critic(S_DIM, A_DIM)
260 | config = DynamicsConfig()
261 | solver = Solver()
262 | load_dir = log_dir
263 | policy.load_parameters(load_dir)
264 | value.load_parameters(load_dir)
265 | statemodel_plt = dynamics.VehicleDynamics()
266 |
267 | # Open-loop reference
268 | x_init = [1.01, 0.0, 0.0, 0.0, 15 * np.pi]
269 | index = 2
270 | state = torch.tensor([x_init])
271 | for step in range(16):
272 | if step % 5 != 0:
273 | state, state_r, x_ref = step_relative(statemodel_plt, state, u)
274 | continue
275 | cal_time = 0
276 | state.requires_grad_(False)
277 | x_ref = statemodel_plt.reference_trajectory(state[:, -1])
278 | state_r = state.detach().clone()
279 | state_r[:, 0:4] = state_r[:, 0:4] - x_ref
280 |
281 | for i in range(1): # plot_length
282 |             fig_size = (PlotConfig.fig_size[0] * figsize_scalar, PlotConfig.fig_size[1] * figsize_scalar)
283 | _, ax = plt.subplots(figsize=cm2inch(*fig_size), dpi=PlotConfig.dpi)
284 | for method in methods:
285 | if method.startswith('ADP'):
286 |                     state_r_predict = []
287 |                     ref_state = state_r[:, 0:4]
288 |                     virtual_state = state.detach().clone()  # roll out predictions on a copy, not the real state
289 |                     for virtual_step in range(50):
290 |                         u = policy.forward(ref_state)
291 |                         virtual_state, _, _, _, _, _, _ = statemodel_plt.step(virtual_state, u)
292 |                         ref_state = virtual_state.detach().clone()[:, 0:4] - x_ref
293 |                         state_r_predict.append(ref_state.numpy().squeeze())
294 |                     state_r_predict = np.array(state_r_predict)
294 | label = 'ADP'
295 | color = 'deepskyblue'
296 |
297 |
298 | elif method.startswith('MPC'):
299 | pred_steps = int(method.split('-')[1])
300 | x = state_r.tolist()[0]
301 | time_start = time.time()
302 | state_r_predict, control = solver.mpcSolver(x, pred_steps)
303 | cal_time += time.time() - time_start
304 | u = np.array(control[0], dtype='float32').reshape(-1, config.ACTION_DIM)
305 | u = torch.from_numpy(u)
306 | label = 'MPC ' + str(pred_steps) + ' steps'
307 | color = 'red'
308 | # state_predict, ref_predict = recover_absolute_state(state_r_predict, x_ref.numpy().squeeze())
309 |
310 | else: continue
311 |
312 | plt.plot(state_r_predict[:, 0], state_r_predict[:, index], linestyle='--', label=label,
313 | marker='D', ms=ms_size)
314 |
315 | plt.scatter([0], [0], color='red',
316 | label='Ref point', marker='o', s= 4 * ms_size)
317 | plt.tick_params(labelsize=PlotConfig.tick_size)
318 | labels = ax.get_xticklabels() + ax.get_yticklabels()
319 | [label.set_fontname(PlotConfig.tick_label_font) for label in labels]
320 | plt.legend(loc='best', prop=PlotConfig.legend_font)
321 | figures_dir = simu_dir + "/Figures"
322 | os.makedirs(figures_dir, exist_ok=True)
323 | fig_name = 'phase_plot_' + str(step) + '.png'
324 | fig_path = os.path.join(figures_dir, fig_name)
325 | plt.xlabel("Lateral position [m]", PlotConfig.label_font)
326 | plt.ylabel("Heading angle [deg]", PlotConfig.label_font)
327 | plt.tight_layout(pad=PlotConfig.pad)
328 | plt.savefig(fig_path)
329 |
330 | state, state_r, x_ref = step_relative(statemodel_plt, state, u)
331 |
332 |
333 |
334 |
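`utils.py` itself is not reproduced in this listing. For orientation, here is a minimal sketch of the `cm2inch` helper consistent with how the plotting code above calls it; this is an assumption, not the repository's actual implementation:

```python
def cm2inch(*sizes):
    # Hypothetical reconstruction of utils.cm2inch: convert figure dimensions
    # given in centimeters into inches, as expected by matplotlib's figsize.
    return tuple(s / 2.54 for s in sizes)
```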
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy>=1.16
2 | matplotlib~=3.1.3
3 | torch==1.4.0
4 | casadi
--------------------------------------------------------------------------------
/simulation.py:
--------------------------------------------------------------------------------
1 | import dynamics
2 | import numpy as np
3 | import torch
4 | import time
5 | import os
6 | from network import Actor, Critic
7 | from config import DynamicsConfig
8 | from datetime import datetime
9 | from solver import Solver
10 | from utils import step_relative
11 | from plot import plot_comparison, plot_ref_and_state, plot_phase_plot
12 | from config import GeneralConfig
13 |
14 | def simulation(methods, log_dir, simu_dir):
15 | '''
16 |
17 | Args:
18 | methods: list, methods to simulate
19 | log_dir: str, name of log dir
20 |         simu_dir: str, name of simulation output dir
21 |
22 | Returns:
23 |
24 | '''
25 | config = GeneralConfig()
26 | S_DIM = config.STATE_DIM
27 | A_DIM = config.ACTION_DIM
28 | policy = Actor(S_DIM, A_DIM)
29 | value = Critic(S_DIM, A_DIM)
30 | config = DynamicsConfig()
31 | solver=Solver()
32 | load_dir = log_dir
33 | policy.load_parameters(load_dir)
34 | value.load_parameters(load_dir)
35 | statemodel_plt = dynamics.VehicleDynamics()
36 | plot_length = config.SIMULATION_STEPS
37 |
38 | # Open-loop reference
39 | x_init = [0.0, 0.0, config.psi_init, 0.0, 0.0]
40 | op_state, op_control = solver.openLoopMpcSolver(x_init, config.NP_TOTAL)
41 | np.savetxt(os.path.join(simu_dir, 'Open_loop_control.txt'), op_control)
42 |
43 | for method in methods:
44 | cal_time = 0
45 | state = torch.tensor([[0.0, 0.0, config.psi_init, 0.0, 0.0]])
46 | state.requires_grad_(False)
47 | x_ref = statemodel_plt.reference_trajectory(state[:, -1])
48 | state_r = state.detach().clone()
49 | state_r[:, 0:4] = state_r[:, 0:4] - x_ref
50 |
51 | state_history = state.detach().numpy()
52 | control_history = []
53 |
54 |         if method != 'OP': print('\nCALCULATION TIME:')
55 | for i in range(plot_length):
56 | if method == 'ADP':
57 | time_start = time.time()
58 | u = policy.forward(state_r[:, 0:4])
59 | cal_time += time.time() - time_start
60 | elif method.startswith('MPC'):
61 | pred_steps = int(method.split('-')[1])
62 | x = state_r.tolist()[0]
63 | time_start = time.time()
64 | _, control = solver.mpcSolver(x, pred_steps)
65 | cal_time += time.time() - time_start
66 | u = np.array(control[0], dtype='float32').reshape(-1, config.ACTION_DIM)
67 | u = torch.from_numpy(u)
68 | else:
69 | u = np.array(op_control[i], dtype='float32').reshape(-1, config.ACTION_DIM)
70 | u = torch.from_numpy(u)
71 |
72 |             state, state_r, _ = step_relative(statemodel_plt, state, u)
73 | state_history = np.append(state_history, state.detach().numpy(), axis=0)
74 | control_history = np.append(control_history, u.detach().numpy())
75 |
76 |
77 |
78 | if method == 'ADP':
79 | print(" ADP: {:.3f}".format(cal_time) + "s")
80 | np.savetxt(os.path.join(simu_dir, 'ADP_state.txt'), state_history)
81 | np.savetxt(os.path.join(simu_dir, 'ADP_control.txt'), control_history)
82 | elif method.startswith('MPC'):
83 | pred_steps = method.split('-')[1]
84 | state_fname, control_fname = 'MPC_' + pred_steps + '_state.txt', \
85 | 'MPC_' + pred_steps + '_control.txt'
86 | print(" MPC {} steps: {:.3f}".format(pred_steps, cal_time) + "s")
87 | np.savetxt(os.path.join(simu_dir, state_fname), state_history)
88 | np.savetxt(os.path.join(simu_dir, control_fname), control_history)
89 |
90 | else:
91 | np.savetxt(os.path.join(simu_dir, 'Open_loop_state.txt'), state_history)
92 |
93 | plot_comparison(simu_dir, methods)
94 | plot_ref_and_state(log_dir, simu_dir, ref='pos')
95 | plot_ref_and_state(log_dir, simu_dir, ref='angle')
96 | plot_phase_plot(['MPC-50', 'MPC-10', 'MPC-5', 'ADP'], log_dir, simu_dir)
97 |
98 |
99 | if __name__ == '__main__':
100 | LOG_DIR = "./trained_results/2020-10-09-14-42-10000"
101 | METHODS = ['MPC-10', 'ADP', 'OP'] #
102 | simu_dir = "./Simulation_dir/" + datetime.now().strftime("%Y-%m-%d-%H-%M")
103 | os.makedirs(simu_dir, exist_ok=True)
104 | simulation(METHODS, LOG_DIR, simu_dir)
105 |
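For reference, a minimal sketch of calling the MPC solver directly, assuming the default Pacejka tire model (the initial state `[y, v_y, psi, omega_r, x]` below is illustrative):

```python
from solver import Solver

solver = Solver()
x0 = [0.0, 0.0, 0.1, 0.0, 0.0]                       # example initial state
state_traj, control_traj = solver.mpcSolver(x0, 10)  # 10-step prediction horizon
print(state_traj.shape, control_traj.shape)          # state and control trajectories over the horizon
```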
--------------------------------------------------------------------------------
/solver.py:
--------------------------------------------------------------------------------
1 | """
2 | (Year 2020)
3 | by Shengbo Eben Li
4 | @ Intelligent Driving Lab, Tsinghua University
5 |
6 | OCP example for lane keeping problem on a curved road
7 |
8 | [Method]
9 | Model predictive control
10 |
11 | """
12 | from casadi import *
13 | from config import DynamicsConfig
14 | import math
15 | from dynamics import VehicleDynamics
16 | import matplotlib.pyplot as plt
17 |
18 | class Solver(DynamicsConfig):
19 | """
20 | NLP solver for nonlinear model predictive control with Casadi.
21 | """
22 | def __init__(self):
23 |
24 | self._sol_dic = {'ipopt.print_level': 0, 'ipopt.sb': 'yes', 'print_time': 0}
25 | if self.tire_model == 'Fiala':
26 | self.X_init = [0.0, 0.0, self.psi_init, 0.0, self.u, 0.0]
27 | self.zero = [0., 0., 0., 0., 0., 0.]
28 | self.U_LOWER = [- math.pi / 9, -10]
29 | self.U_UPPER = [math.pi / 9, 10]
30 | else:
31 | self.X_init = [0.0, 0.0, 0.1, 0.0, 0.0]
32 | self.zero = [0., 0., 0., 0., 0.]
33 | self.U_LOWER = [- math.pi / 9]
34 | self.U_UPPER = [math.pi / 9]
35 | self.x_last = 0
36 | self.dynamics = VehicleDynamics()
37 | super(Solver, self).__init__()
38 |
39 |     def dynamics_step(self, x, u):
40 |         # Helper: one discrete step of the Pacejka model, identical to the vertcat
41 |         # expression in openLoopMpcSolver (named so it is not shadowed by the
42 |         # self.dynamics attribute set in __init__).
43 |         x1 = [0.0, 0.0, 0.0, 0.0, 0.0]
44 |         x1[0] = x[0] + self.Ts * (self.u * sin(x[2]) + x[1] * cos(x[2]))
45 |         x1[1] = x[1] + self.Ts * ((-self.D * self.F_z1 * sin(
46 |             self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u))) * cos(u[0])
47 |                                    - self.D * self.F_z2 * sin(
48 |             self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u)))) / self.m - self.u * x[3])
49 |         x1[2] = x[2] + self.Ts * (x[3])
50 |         x1[3] = x[3] + self.Ts * ((self.a * (-self.D * self.F_z1 * sin(
51 |             self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u)))) * cos(u[0])
52 |                                    - self.b * (-self.D * self.F_z2 * sin(
53 |             self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u))))) / self.I_zz)
54 |         x1[4] = x[4] + self.Ts * (self.u * cos(x[2]) - x[1] * sin(x[2]))
52 |
53 | return x1
54 |
55 | def openLoopMpcSolver(self, x_init, predict_steps):
56 | """
57 | Solver of nonlinear MPC
58 |
59 | Parameters
60 | ----------
61 | x_init: list
62 | input state for MPC.
63 | predict_steps: int
64 | steps of predict horizon.
65 |
66 | Returns
67 | ----------
68 | state: np.array shape: [predict_steps+1, state_dimension]
69 | state trajectory of MPC in the whole predict horizon.
70 | control: np.array shape: [predict_steps, control_dimension]
71 | control signal of MPC in the whole predict horizon.
72 | """
73 | x = SX.sym('x', self.DYNAMICS_DIM)
74 | u = SX.sym('u', self.ACTION_DIM)
75 |
76 | # discrete dynamic model
77 | self.f = vertcat(
78 | x[0] + self.Ts * (self.u * sin(x[2]) + x[1] * cos(x[2])),
79 | x[1] + self.Ts * ((-self.D * self.F_z1 * sin(
80 | self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u))) * cos(u[0])
81 | - self.D * self.F_z2 * sin(
82 | self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u)))) / self.m - self.u * x[3]),
83 | x[2] + self.Ts * (x[3]),
84 | x[3] + self.Ts * ((self.a * (-self.D * self.F_z1 * sin(
85 | self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u)))) * cos(u[0])
86 | - self.b * (-self.D * self.F_z2 * sin(
87 | self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u))))) / self.I_zz),
88 | x[4] + self.Ts * (self.u * cos(x[2]) - x[1] * sin(x[2]))
89 | )
90 |
91 | # Create solver instance
92 | self.F = Function("F", [x, u], [self.f])
93 |
94 | # Create empty NLP
95 | w = []
96 | lbw = []
97 | ubw = []
98 | lbg = []
99 | ubg = []
100 | G = []
101 | J = 0
102 |
103 | # Initial conditions
104 | Xk = MX.sym('X0', self.DYNAMICS_DIM)
105 | w += [Xk]
106 | lbw += x_init
107 | ubw += x_init
108 |
109 | for k in range(1, predict_steps + 1):
110 | # Local control
111 | Uname = 'U' + str(k - 1)
112 | Uk = MX.sym(Uname, self.ACTION_DIM)
113 | w += [Uk]
114 | lbw += self.U_LOWER
115 | ubw += self.U_UPPER
116 |
117 | Fk = self.F(Xk, Uk)
118 | Xname = 'X' + str(k)
119 | Xk = MX.sym(Xname, self.DYNAMICS_DIM)
120 |
121 |             # Dynamic constraints
122 | G += [Fk - Xk]
123 | lbg += self.zero
124 | ubg += self.zero
125 | w += [Xk]
126 | if self.tire_model == 'Fiala':
127 |                 lbw += [-inf, -20, -pi, -20, 0, -inf]   # v_x lower bound: 0 m/s
128 |                 ubw += [inf, 20, pi, 20, 50, inf]       # v_x upper bound: 50 m/s
129 | else:
130 | lbw += [-inf, -20, -pi, -20, -inf]
131 | ubw += [inf, 20, pi, 20, inf]
132 | F_cost = Function('F_cost', [x, u], [0.1 * (x[0] - self.a_curve * sin(self.k_curve * x[4])) ** 2
133 | + 0.1 * (x[2] - arctan(
134 | self.a_curve * self.k_curve * cos(self.k_curve * x[4]))) ** 2
135 | + 0.001 * u[0] ** 2])
136 | J += F_cost(w[k * 2], w[k * 2 - 1])
137 |
138 | # Create NLP solver
139 | nlp = dict(f=J, g=vertcat(*G), x=vertcat(*w))
140 | S = nlpsol('S', 'ipopt', nlp, self._sol_dic)
141 |
142 | # Solve NLP
143 | r = S(lbx=lbw, ubx=ubw, x0=0, lbg=lbg, ubg=ubg)
144 | # print(r['x'])
145 | state_all = np.array(r['x'])
146 | state = np.zeros([predict_steps, self.DYNAMICS_DIM])
147 | control = np.zeros([predict_steps, self.ACTION_DIM])
148 | nt = self.DYNAMICS_DIM + self.ACTION_DIM # total variable per step
149 |
150 | # save trajectories
151 | for i in range(predict_steps):
152 | state[i] = state_all[nt * i: nt * i + nt - 1].reshape(-1)
153 | control[i] = state_all[nt * i + nt - 1]
154 | return state, control
155 |
156 | def mpcSolver(self, x_init, predict_steps):
157 | """
158 | Solver of nonlinear MPC
159 |
160 | Parameters
161 | ----------
162 | x_init: list
163 | input state for MPC.
164 | predict_steps: int
165 | steps of predict horizon.
166 |
167 | Returns
168 | ----------
169 | state: np.array shape: [predict_steps+1, state_dimension]
170 | state trajectory of MPC in the whole predict horizon.
171 | control: np.array shape: [predict_steps, control_dimension]
172 | control signal of MPC in the whole predict horizon.
173 | """
174 | tire_model = self.tire_model
175 | if tire_model == 'Fiala':
176 | DYNAMICS_DIM = 6
177 | ACTION_DIM = 2
178 | else:
179 | DYNAMICS_DIM = 5
180 | ACTION_DIM = 1
181 | x = SX.sym('x', DYNAMICS_DIM)
182 | u = SX.sym('u', ACTION_DIM)
183 |
184 | # Wrap the discrete dynamics (per tire model) as a CasADi function
185 | if self.tire_model == 'Fiala':
186 | self.f_d = vertcat(
187 | x[0] + self.Ts * (x[4] * sin(x[2]) + x[1] * cos(x[2])),
188 | # y : lateral position
189 | x[1] + self.Ts * (((- self.k1 * tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4])) * (
190 | pow(self.k1 * tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4])), 2) / (
191 | 27 * pow(self.D * self.F_z1, 2)) - self.k1 * fabs(
192 | tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4]))) / (
193 | 3 * self.D * self.F_z1) + 1)) * cos(u[0]) - self.k2 * tan(
194 | arctan((x[1] - self.b * x[3]) / x[4])) * (
195 | pow(self.k2 * tan(arctan((x[1] - self.b * x[3]) / x[4])), 2) / (
196 | 27 * pow(self.D * self.F_z2, 2)) - self.k2 * fabs(tan(
197 | arctan((x[1] - self.b * x[3]) / x[4]))) / (
198 | 3 * self.D * self.F_z2) + 1)) / self.m - x[4] * x[3]),
199 | # v_y : lateral speed
200 | x[2] + self.Ts * (x[3]),
201 | # psi : heading angle
202 | x[3] + self.Ts * ((self.a * (- self.k1 * tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4])) * (
203 | pow(self.k1 * tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4])), 2) / (
204 | 27 * pow(self.D * self.F_z1, 2)) - self.k1 * fabs(
205 | tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4]))) / (
206 | 3 * self.D * self.F_z1) + 1)) * cos(u[0]) - self.b * (
207 | - self.k2 * tan(arctan((x[1] - self.b * x[3]) / x[4])) * (
208 | pow(self.k2 * tan(arctan((x[1] - self.b * x[3]) / x[4])), 2) / (
209 | 27 * pow(self.D * self.F_z2, 2)) - self.k2 * fabs(tan(
210 | arctan((x[1] - self.b * x[3]) / x[4]))) / (
211 | 3 * self.D * self.F_z2) + 1))) / self.I_zz),
212 | # r : yaw rate
213 | x[4] + self.Ts * (
214 | u[1] + x[2] * x[4] - (- self.k1 * tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4])) * (
215 | pow(self.k1 * tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4])), 2) / (
216 | 27 * pow(self.D * self.F_z1, 2)) - self.k1 * fabs(
217 | tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4]))) / (
218 | 3 * self.D * self.F_z1) + 1)) * sin(u[0]) / self.m),
219 | # v_x : longitudinal speed
220 | x[5] + self.Ts * (x[4])
221 | # x : longitudinal position
222 | )
223 | self.F = Function("F", [x, u], [self.f_d])
224 | elif self.tire_model == 'Pacejka':
225 | # discrete dynamic model
226 | self.f = vertcat(
227 | x[0] + self.Ts * (self.u * sin(x[2]) + x[1] * cos(x[2])),
228 | x[1] + self.Ts * ((-self.D * self.F_z1 * sin(
229 | self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u))) * cos(u[0])
230 | - self.D * self.F_z2 * sin(
231 | self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u)))) / self.m - self.u * x[3]),
232 | x[2] + self.Ts * (x[3]),
233 | x[3] + self.Ts * ((self.a * (-self.D * self.F_z1 * sin(
234 | self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u)))) * cos(u[0])
235 | - self.b * (-self.D * self.F_z2 * sin(
236 | self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u))))) / self.I_zz),
237 | x[4] + self.Ts * (self.u * cos(x[2]) - x[1] * sin(x[2]))
238 | )
239 | # TODO: retrieve
240 | self.F = Function("F", [x, u], [self.f])
241 | elif self.tire_model == 'Linear':
242 | # linear model
243 | self.f = vertcat(
244 | x[0] + self.Ts * (self.u * sin(x[2]) + x[1] * cos(x[2])),
245 | x[1] + self.Ts * ((-self.k1*(-u[0] + (x[1] + self.a * x[3]) / self.u) * cos(u[0]) +
246 | -self.k2*((x[1] - self.b * x[3]) / self.u)) / self.m - self.u * x[3]),
247 | x[2] + self.Ts * (x[3]),
248 | x[3] + self.Ts * (self.a * (-self.D * self.F_z1 * sin(
249 | self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u)))) * cos(u[0])
250 | - self.b * (-self.D * self.F_z2 * sin(
251 | self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u)))) / self.I_zz),
252 | x[4] + self.Ts * (self.u * cos(x[2]) - x[1] * sin(x[2]))
253 | )
254 | self.F = Function("F", [x, u], [self.f])
255 |
256 | # Create empty NLP
257 | w = []
258 | lbw = []
259 | ubw = []
260 | lbg = []
261 | ubg = []
262 | G = []
263 | J = 0
264 |
265 | # Initial conditions
266 | Xk = MX.sym('X0', DYNAMICS_DIM)
267 | w += [Xk]
268 | lbw += x_init
269 | ubw += x_init
270 |
271 | for k in range(1, predict_steps + 1):
272 | # Local control
273 | Uname = 'U' + str(k - 1)
274 | Uk = MX.sym(Uname, ACTION_DIM)
275 | w += [Uk]
276 | lbw += self.U_LOWER
277 | ubw += self.U_UPPER
278 | # Gk = self.G_f(Xk,Uk)
279 |
280 | Fk = self.F(Xk, Uk)
281 | Xname = 'X' + str(k)
282 | Xk = MX.sym(Xname, DYNAMICS_DIM)
283 |
284 |
285 | # Dynamic constraints
286 | G += [Fk - Xk]
287 | lbg += self.zero
288 | ubg += self.zero
289 | w += [Xk]
290 | if self.tire_model == 'Fiala':
291 | lbw += [-inf, -20, -pi, -20, 0, -inf]
292 | ubw += [inf, 20, pi, 20, 50, inf]
293 | else:
294 | lbw += [-inf, -20, -pi, -20, -inf]
295 | ubw += [inf, 20, pi, 20, inf]
296 |
297 |
298 | # Cost function
299 | if tire_model == 'Fiala':
300 | F_cost = Function('F_cost', [x, u], [6 * (x[0]) ** 2
301 | + 0.2 * (x[4] - self.u) ** 2
302 | + 80 * u[0] ** 2
303 | + 0.3 * u[1] ** 2])
304 | else:
305 | F_cost = Function('F_cost', [x, u], [1 * (x[0]) ** 2
306 | + 1 * (x[2]) ** 2
307 | + 1 * u[0] ** 2])
308 | J += F_cost(w[k * 2], w[k * 2 - 1])  # w = [X0, U0, X1, U1, ...], so w[2k] is X_k and w[2k-1] is U_{k-1}
309 | # J += F_cost(w[k * 3 - 1], w[k * 3 - 2])
310 |
311 | # Create NLP solver
312 | nlp = dict(f=J, g=vertcat(*G), x=vertcat(*w))
313 | S = nlpsol('S', 'ipopt', nlp, self._sol_dic)
314 |
315 | # Solve NLP
316 | r = S(lbx=lbw, ubx=ubw, x0=0, lbg=lbg, ubg=ubg)
317 | # print(r['x'])
318 | state_all = np.array(r['x'])
319 | state = np.zeros([predict_steps, DYNAMICS_DIM])
320 | control = np.zeros([predict_steps, ACTION_DIM])
321 | nt = DYNAMICS_DIM + ACTION_DIM # total variable per step
322 |
323 | # save trajectories
324 | for i in range(predict_steps):
325 | state[i] = state_all[nt * i: nt * i + DYNAMICS_DIM].reshape(-1)
326 | control[i] = state_all[nt * i + DYNAMICS_DIM: nt * (i + 1)].reshape(-1)
327 | return state, control
328 |
--------------------------------------------------------------------------------
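
A minimal usage sketch for the solver above (not part of the repository). The enclosing class name `Solver` and its zero-argument constructor are assumptions here, as is the 5-element state ordering [y, v_y, psi, r, x] inferred from the non-Fiala dynamics:

# Hypothetical usage; the class name and constructor are assumptions.
from solver import Solver

solver = Solver()
x_init = [0.0, 0.0, 0.0, 0.0, 0.0]  # [y, v_y, psi, r, x], matching DYNAMICS_DIM = 5
state, control = solver.mpcSolver(x_init, 30)
print(state.shape)    # (30, 5): one state per step of the prediction horizon
print(control.shape)  # (30, 1): one steering command per step
u0 = control[0]       # in receding-horizon use, only the first control is applied
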
/train.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 | import os
4 | from matplotlib import pyplot as plt
5 | from config import GeneralConfig, DynamicsConfig
6 | from dynamics import VehicleDynamics
7 | from utils import step_relative
8 |
9 |
10 | class Train(DynamicsConfig):
11 | def __init__(self):
12 | super(Train, self).__init__()
13 |
14 | self.agent_batch = torch.empty([self.BATCH_SIZE, self.DYNAMICS_DIM])
15 | self.state_batch = torch.empty([self.BATCH_SIZE, self.STATE_DIM])
16 | self.init_index = np.ones([self.BATCH_SIZE, 1])
17 |
18 | self.x_forward = []
19 | self.u_forward = []
20 | self.L_forward = []
21 | self.iteration_index = 0
22 |
23 | self.value_loss = np.empty([0, 1])
24 | self.policy_loss = np.empty([0, 1])
25 | self.dynamics = VehicleDynamics()
26 | self.equilibrium_state = torch.tensor([[0.0, 0.0, 0.0, 0.0]])
27 |
28 | for i in range(self.FORWARD_STEP):
29 | self.u_forward.append([])
30 | self.L_forward.append([])
31 | for i in range(self.FORWARD_STEP+1):
32 | self.x_forward.append([])
33 | self.initialize_state()
34 |
35 | def initialize_state(self):
36 | self.state_batch[:, 0] = torch.normal(0.0, 0.3, [self.BATCH_SIZE, ])
37 | self.state_batch[:, 1] = torch.normal(0.0, 0.2, [self.BATCH_SIZE, ])
38 | self.state_batch[:, 2] = torch.normal(0.0, 0.1, [self.BATCH_SIZE, ])
39 | self.state_batch[:, 3] = torch.normal(0.0, 0.06, [self.BATCH_SIZE, ])
40 | self.agent_batch[:, 4] = torch.linspace(0.0, np.pi, self.BATCH_SIZE)
41 | init_ref = self.dynamics.reference_trajectory(self.agent_batch[:, 4])
42 | self.agent_batch[:, 0:4] = self.state_batch + init_ref
43 | self.init_state = self.agent_batch
44 |
45 | def check_done(self, state):
46 | """
47 | Check whether any state has entered an unreasonable region, and reset those entries.
48 | Parameters
49 | ----------
50 | state: tensor shape: [BATCH_SIZE, STATE_DIMENSION]
51 | state used for checking.
52 | Returns
53 | -------
54 | reset_state: tensor, the batch with out-of-range rows reset to their initial states.
55 | """
56 | threshold = np.kron(np.ones([self.BATCH_SIZE, 1]), np.array([self.y_range, self.psi_range]))
57 | threshold = np.array(threshold, dtype='float32')
58 | threshold = torch.from_numpy(threshold)
59 | ref_state = self.dynamics.reference_trajectory(state[:, -1])
60 | state = state[:, 0:4] - ref_state
61 | check_state = state[:, [0, 2]].clone()
62 | check_state.detach_()
63 | sign_error = torch.sign(torch.abs(check_state) - threshold) # if abs state is over threshold, sign_error = 1
64 | self._reset_index, _ = torch.max(sign_error, 1) # if one state is over threshold, _reset_index = 1
65 | if self.iteration_index == self.RESET_ITERATION:
66 | self._reset_index = torch.from_numpy(np.ones([self.BATCH_SIZE], dtype='float32'))
67 | self.iteration_index = 0
68 | print('AGENT RESET')
69 | reset_state = self._reset_state(self.agent_batch)
70 | return reset_state
71 |
72 | def _reset_state(self, state):
73 | """
74 | reset state to initial state.
75 | Parameters
76 | ----------
77 | state: tensor shape: [BATCH_SIZE, STATE_DIMENSION]
78 | state used for checking.
79 |
80 | Returns
81 | -------
82 | state: state after reset.
83 |
84 | """
85 | for i in range(self.BATCH_SIZE):
86 | if self._reset_index[i] == 1:
87 | state[i, :] = self.init_state[i, :]
88 | return state
89 |
90 | def update_state(self, policy, dynamics):
91 | """
92 | Update state using policy net and dynamics model.
93 | Parameters
94 | ----------
95 | policy: nn.Module
96 | policy net.
97 | dynamics: object, the vehicle dynamics model.
98 |
99 | """
100 | self.agent_batch = self.check_done(self.agent_batch)
101 | self.agent_batch.detach_()
102 | ref_trajectory = dynamics.reference_trajectory(self.agent_batch[:, -1])
103 | self.state_batch = self.agent_batch[:, 0:4] - ref_trajectory
104 | control = policy.forward(self.state_batch)
105 | self.agent_batch, self.state_batch = dynamics.step_relative(self.agent_batch, control)
106 | self.iteration_index += 1
107 |
108 | def policy_evaluation(self, policy, value, dynamics):
109 | """
110 | Do n-step look-ahead policy evaluation.
111 | Parameters
112 | ----------
113 | policy: policy net
114 | value: value net
115 | dynamics: object dynamics
116 |
117 | """
118 | for i in range(self.FORWARD_STEP):
119 | if i == 0:
120 | self.x_forward[i] = self.agent_batch.detach()  # store the absolute agent batch, since step_relative needs it
121 | reference = dynamics.reference_trajectory(self.agent_batch[:,-1])
122 | self.state_batch = dynamics.relative_state(self.x_forward[i])
123 | self.u_forward[i] = policy.forward(self.state_batch)
124 | self.x_forward[i + 1], _, _, _, _, _, _ = dynamics.step(self.x_forward[i], self.u_forward[i])
125 | ref_state_next = self.x_forward[i + 1][:, 0:4] - reference
126 | self.L_forward[i] = dynamics.utility(ref_state_next, self.u_forward[i])
127 | else:
128 | ref_state = self.x_forward[i][:, 0:4] - reference
129 | self.u_forward[i] = policy.forward(ref_state)
130 | self.x_forward[i + 1], _, _, _, _, _, _ = dynamics.step(self.x_forward[i],
131 | self.u_forward[i])
132 | ref_state_next = self.x_forward[i + 1][:, 0:4] - reference
133 | self.L_forward[i] = dynamics.utility(ref_state_next, self.u_forward[i])
134 | self.agent_batch_next = self.x_forward[-1]
135 | self.state_batch_next = self.agent_batch_next[:, 0:4] - reference
136 | self.value_next = value.forward(self.state_batch_next)
137 | self.utility = torch.zeros([self.FORWARD_STEP, self.BATCH_SIZE], dtype=torch.float32)
138 | for i in range(self.FORWARD_STEP):
139 | self.utility[i] = self.L_forward[i].clone()
140 | self.sum_utility = torch.sum(self.utility,0)
141 | target_value = self.sum_utility.detach() + self.value_next.detach()
142 | value_now = value.forward(self.state_batch)
143 | value_equilibrium = value.forward(self.equilibrium_state)
144 | value_loss = 1 / 2 * torch.mean(torch.pow((target_value - value_now), 2)) \
145 | + 10 * torch.pow(value_equilibrium, 2)
146 | self.state_batch.requires_grad_(False)
147 | value.zero_grad()
148 | value_loss.backward()
149 | torch.nn.utils.clip_grad_norm_(value.parameters(), 10.0)
150 | value.opt.step()
151 | value.scheduler.step()
152 | self.value_loss = np.append(self.value_loss, value_loss.detach().numpy())
153 | return value_loss.detach().numpy()
154 |
155 | def policy_improvement(self, policy, value):
156 | """
157 | Do n-step look-ahead policy improvement.
158 | Parameters
159 | ----------
160 | policy: policy net
161 | value: value net
162 |
163 | """
164 | self.value_next = value.forward(self.state_batch_next)
165 | policy_loss = torch.mean(self.sum_utility + self.value_next) # Hamilton
166 | #for i in range(1):
167 | policy.zero_grad()
168 | policy_loss.backward()
169 | torch.nn.utils.clip_grad_norm_(policy.parameters(), 10.0)
170 | policy.opt.step()
171 | policy.scheduler.step()
172 | self.policy_loss = np.append(self.policy_loss, policy_loss.detach().numpy())
173 | return policy_loss.detach().numpy()
174 |
175 | def save_data(self, log_dir):
176 | """
177 | Save loss data.
178 | Parameters
179 | ----------
180 | log_dir: str
181 | directory in ./Results_dir.
182 |
183 | Returns
184 | -------
185 |
186 | """
187 | np.savetxt(os.path.join(log_dir, "value_loss.txt"), self.value_loss)
188 | np.savetxt(os.path.join(log_dir, "policy_loss.txt"), self.policy_loss)
189 |
190 | def print_loss_figure(self, iteration, log_dir):
191 | """
192 | Plot the loss descent figure.
193 | Parameters
194 | ----------
195 | iteration: int
196 | number of iterations.
197 | log_dir: str
198 | directory in ./Results_dir.
199 |
200 | Returns
201 | -------
202 |
203 | """
204 | plt.figure()
205 | plt.scatter(range(iteration), np.log10(self.value_loss), c='r', marker=".", s=5., label="policy evaluation")
206 | plt.scatter(range(iteration), np.log10(self.policy_loss), c='b', marker=".", s=5., label="policy improvement")
207 | plt.legend(loc='upper right')
208 | plt.xlabel('iteration')
209 | plt.ylabel('loss')
210 | plt.savefig(os.path.join(log_dir, "loss.png"))
211 |
212 |
--------------------------------------------------------------------------------
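
The critic update in policy_evaluation above builds an n-step bootstrapped target: the utilities summed over FORWARD_STEP steps plus the value of the terminal state, with gradients blocked on both. A self-contained sketch with random stand-in tensors (the names below are illustrative, not from the repository):

import torch

N, batch = 3, 4                                    # stand-ins for FORWARD_STEP and BATCH_SIZE
utilities = torch.rand(N, batch)                   # stands in for the stacked L_forward costs
value_next = torch.rand(batch)                     # stands in for value(state_batch_next)
value_now = torch.rand(batch, requires_grad=True)  # stands in for value(state_batch)

target = utilities.sum(0).detach() + value_next.detach()  # n-step target, no gradient
value_loss = 0.5 * torch.mean((target - value_now) ** 2)  # same form as in Train
value_loss.backward()                              # gradient flows only into value_now
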
/trained_results/2020-10-09-14-42-10000/actor.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mahaitongdae/Approximate-Dynamic-Programming/f03f9469f5018c55f06cf8fea2d5f629abc4078b/trained_results/2020-10-09-14-42-10000/actor.pth
--------------------------------------------------------------------------------
/trained_results/2020-10-09-14-42-10000/critic.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mahaitongdae/Approximate-Dynamic-Programming/f03f9469f5018c55f06cf8fea2d5f629abc4078b/trained_results/2020-10-09-14-42-10000/critic.pth
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
1 | from matplotlib import pyplot as plt
2 | import matplotlib.colors as mcolors
3 | from itertools import cycle
4 | from config import PlotConfig, DynamicsConfig
5 | import numpy as np
6 | import torch
7 |
8 | def cm2inch(*tupl):
9 | inch = 2.54
10 | if isinstance(tupl[0], tuple):
11 | return tuple(i/inch for i in tupl[0])
12 | else:
13 | return tuple(i/inch for i in tupl)
14 |
15 | def smooth(data, a=0.5):
16 | data = np.array(data).reshape(-1, 1)
17 | for ind in range(data.shape[0] - 1):
18 | data[ind + 1, 0] = data[ind, 0] * (1-a) + data[ind + 1, 0] * a
19 | return data
20 |
21 | def numpy2torch(input, size):
22 | """
23 |
24 | Parameters
25 | ----------
26 | input
27 |
28 | Returns
29 | -------
30 |
31 | """
32 | u = np.array(input, dtype='float32').reshape(size)
33 | return torch.from_numpy(u)
34 |
35 | def step_relative(statemodel, state, u):
36 | """
37 |
38 | Parameters
39 | ----------
40 | state_r
41 | u_r
42 |
43 | Returns
44 | -------
45 |
46 | """
47 | x_ref = statemodel.reference_trajectory(state[:, -1])
48 | state_r = state.detach().clone() # relative state
49 | state_r[:, 0:4] = state_r[:, 0:4] - x_ref
50 | state_next, deri_state, utility, F_y1, F_y2, alpha_1, alpha_2 = statemodel.step(state, u)
51 | state_r_next_bias, _, _, _, _, _, _ = statemodel.step(state_r, u) # update by relative value
52 | state_r_next = state_r_next_bias.detach().clone()
53 | state_r_next_bias[:, [0, 2]] = state_next[:, [0, 2]] # y psi with reference update by absolute value
54 | x_ref_next = statemodel.reference_trajectory(state_next[:, -1])
55 | state_r_next[:, 0:4] = state_r_next_bias[:, 0:4] - x_ref_next
56 | return state_next.clone().detach(), state_r_next.clone().detach(), x_ref.detach().clone()
57 |
58 | def recover_absolute_state(state_r_predict, x_ref, length=None):
59 | if length is None:
60 | length = state_r_predict.shape[0]
61 | c = DynamicsConfig()
62 | ref_predict = [x_ref]
63 | for i in range(length-1):
64 | ref_t = np.copy(ref_predict[-1])
65 | # ref_t[0] += c.u * c.Ts * np.tan(x_ref[2])
66 | ref_predict.append(ref_t)
67 | state = state_r_predict[:, 0:4] + ref_predict
68 | return state, np.array(ref_predict)
69 |
70 | def idplot(data,
71 | figure_num=1,
72 | mode="xy",
73 | fname=None,
74 | xlabel=None,
75 | ylabel=None,
76 | legend=None,
77 | legend_loc="best",
78 | color_list=None,
79 | xlim=None,
80 | ylim=None,
81 | ncol=1,
82 | figsize_scalar=1):
83 | """
84 | Plot one or more curves using the project's PlotConfig styling.
85 | """
86 | if (color_list is None) or len(color_list) < figure_num:
87 | tableau_colors = cycle(mcolors.TABLEAU_COLORS)
88 | color_list = [next(tableau_colors) for _ in range(figure_num)]
89 |
90 | l = 5  # with five curves, the fourth is drawn dash-dot and the fifth dotted
91 | fig_size = (PlotConfig.fig_size * figsize_scalar, PlotConfig.fig_size * figsize_scalar)
92 | _, ax = plt.subplots(figsize=cm2inch(*fig_size), dpi=PlotConfig.dpi)
93 | if figure_num == 1:
94 | data = [data]
95 |
96 | if color_list is not None:
97 | for (i, d) in enumerate(data):
98 | if mode == "xy":
99 | if i == l - 2:
100 | plt.plot(d[0], d[1], linestyle='-.', color=color_list[i])
101 | elif i == l - 1:
102 | plt.plot(d[0], d[1], linestyle='dotted', color=color_list[i])
103 | else:
104 | plt.plot(d[0], d[1], color=color_list[i])
105 | if mode == "y":
106 | plt.plot(d, color=color_list[i])
107 | if mode == "scatter":
108 | plt.scatter(d[0], d[1], color=color_list[i], marker=".", s =5.,)
109 | else:
110 | for (i, d) in enumerate(data):
111 | if mode == "xy":
112 | if i == l - 2:
113 | plt.plot(d[0], d[1], linestyle='-.')
114 | elif i == l - 1:
115 | plt.plot(d[0], d[1], linestyle='dotted')
116 | else:
117 | plt.plot(d[0], d[1])
118 | if mode == "y":
119 | plt.plot(d)
120 | if mode == "scatter":
121 | plt.scatter(d[0], d[1], marker=".", s =5.,)
122 |
123 | plt.tick_params(labelsize=PlotConfig.tick_size)
124 | labels = ax.get_xticklabels() + ax.get_yticklabels()
125 | [label.set_fontname(PlotConfig.tick_label_font) for label in labels]
126 | if legend is not None:
127 | plt.legend(legend, loc=legend_loc, ncol=ncol, prop=PlotConfig.legend_font)
128 | plt.xlabel(xlabel, PlotConfig.label_font)
129 | plt.ylabel(ylabel, PlotConfig.label_font)
130 | if xlim is not None:
131 | plt.xlim(xlim)
132 | if ylim is not None:
133 | plt.ylim(ylim)
134 | plt.tight_layout(pad=PlotConfig.pad)
135 |
136 | if fname is None:
137 | plt.show()
138 | else:
139 | plt.savefig(fname)
140 |
141 |
142 |
--------------------------------------------------------------------------------
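
A minimal usage sketch for idplot (not part of the repository). It assumes utils.py is importable as `utils` (the repository also contains a utils/ directory, which can shadow it) and that config.PlotConfig defines the fonts and sizes idplot references:

import numpy as np
from utils import idplot

x = np.linspace(0.0, 2.0 * np.pi, 100)
data = [[x, np.sin(x)], [x, np.cos(x)]]  # one [x, y] pair per curve
idplot(data, figure_num=2, mode="xy",
       xlabel="t [s]", ylabel="amplitude",
       legend=["sin", "cos"], fname="demo.png")
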
/utils/iDplot.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mahaitongdae/Approximate-Dynamic-Programming/f03f9469f5018c55f06cf8fea2d5f629abc4078b/utils/iDplot.zip
--------------------------------------------------------------------------------
/utils/road.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mahaitongdae/Approximate-Dynamic-Programming/f03f9469f5018c55f06cf8fea2d5f629abc4078b/utils/road.png
--------------------------------------------------------------------------------