├── .gitignore ├── LICENSE ├── README.md ├── config.py ├── dynamics.py ├── main.py ├── network.py ├── plot.py ├── requirements.txt ├── simulation.py ├── solver.py ├── train.py ├── trained_results └── 2020-10-09-14-42-10000 │ ├── actor.pth │ ├── critic.pth │ ├── policy_loss.txt │ └── value_loss.txt ├── utils.py └── utils ├── iDplot.zip └── road.png /.gitignore: -------------------------------------------------------------------------------- 1 | /docs/ 2 | /.idea/ 3 | /Results_dir/ 4 | /Simulation_dir/ 5 | /ref/ 6 | /PLOT.rar/ 7 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. 
For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [2020] [Haitong Ma] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Vehicle Tracking Control

- Code demo for Chapter 8, Reinforcement Learning and Control.

- Methods: Approximate Dynamic Programming, Model Predictive Control
## Requirements

[PyTorch](https://pytorch.org/get-started/previous-versions/) 1.4.0

[CasADi](https://web.casadi.org/get/)


## Getting Started

- To train an agent, follow the example code in `main.py` and tune the parameters. Change the `METHODS` variable to adjust which methods are compared in the simulation stage (see the Quick Start sketch below).
- Simulations are executed automatically after training finishes. To start a simulation separately from trained results and compare the performance of ADP and MPC, run `simulation.py`. Change the `LOG_DIR` variable to set which results are loaded.

## Directory Structure

```
Approximate-Dynamic-Programming
│  main.py - Main script
│  plot.py - To plot comparison between ADP and MPC
│  train.py - To execute PEV and PIM
│  dynamics.py - Vehicle model
│  network.py - Network structure
│  solver.py - Solvers for MPC using CasADi
│  config.py - Configurations about training and vehicle model
│  simulation.py - Run experiment to compare ADP and MPC
│  README.md
│  requirements.txt
│
├─Results_dir - stores trained results
│
└─Simulation_dir - stores simulation data and plots

```
## Related Books and Papers
[Reinforcement Learning and Control. Tsinghua University Lecture Notes, 2020.](http://www.idlab-tsinghua.com/thulab/labweb/publications.html?typeId=3&_types)

[CasADi: a software framework for nonlinear optimization and optimal control](https://link.springer.com/article/10.1007/s12532-018-0139-4)
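## Quick Start

A typical workflow is sketched below (flags such as `TRAIN_FLAG`, `LOAD_PARA_FLAG` and `SIMULATION_FLAG` are set near the top of `main.py`; install PyTorch and CasADi as listed under Requirements):

```
pip install -r requirements.txt
python main.py        # train ADP, then automatically run the comparison simulation
python simulation.py  # re-run a simulation from saved results (set LOG_DIR first)
```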
--------------------------------------------------------------------------------
/config.py:
--------------------------------------------------------------------------------
from __future__ import print_function


class GeneralConfig(object):
    BATCH_SIZE = 256
    DYNAMICS_DIM = 5
    STATE_DIM = 4
    ACTION_DIM = 1
    BUFFER_SIZE = 5000
    FORWARD_STEP = 20
    GAMMA_D = 1
    RESET_ITERATION = 10000

    NP = 50
    NP_TOTAL = 500

    SIMULATION_STEPS = 500


class DynamicsConfig(GeneralConfig):

    nonlinearity = True
    tire_model = 'Pacejka'       # Fiala, Pacejka, Linear
    reference_traj = 'SIN'

    a = 1.14                     # distance from c.g. to front axle (m)
    L = 2.54                     # wheel base (m)
    b = L - a                    # distance from c.g. to rear axle (m)
    m = 1500.                    # mass (kg)
    I_zz = 2420.0                # yaw moment of inertia (kg * m^2)
    C = 1.43                     # parameter in Pacejka tire model
    B = 14.                      # parameter in Pacejka tire model
    u = 15                       # longitudinal velocity (m/s)
    g = 9.81
    D = 0.75
    k1 = 88000                   # front axle cornering stiffness for linear model (N/rad)
    k2 = 94000                   # rear axle cornering stiffness for linear model (N/rad)
    Is = 1.                      # steering ratio
    Ts = 0.05                    # control signal period
    N = 314                      # total simulation steps

    F_z1 = m * g * b / L         # vertical force on front axle
    F_z2 = m * g * a / L         # vertical force on rear axle

    k_curve = 1 / 30             # curve shape of a * sin(kx)
    a_curve = 1                  # curve shape of a * sin(kx)
    psi_init = a_curve * k_curve # initial heading angle, approx. atan(a_curve * k_curve)

    # ADP reset state range
    y_range = 5
    psi_range = 1.3
    beta_range = 1.0


class PlotConfig(object):
    fig_size = (8.5, 6.5)
    dpi = 300
    pad = 0.2
    tick_size = 8
    legend_font = {'family': 'Times New Roman', 'size': '8', 'weight': 'normal'}
    label_font = {'family': 'Times New Roman', 'size': '9', 'weight': 'normal'}
    tick_label_font = 'Times New Roman'
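The Pacejka constants above (`B`, `C`, `D`, `F_z1`, `F_z2`) define the lateral tire force used by `dynamics.py` and `solver.py`, F_y = -D * F_z * sin(C * atan(B * alpha)). A minimal sketch of the front-axle force curve (an illustration for clarity, not a file in the repository):

    import numpy as np
    from config import DynamicsConfig

    cfg = DynamicsConfig()
    alpha = np.linspace(-0.2, 0.2, 101)  # front wheel slip angle (rad)
    F_y1 = -cfg.D * cfg.F_z1 * np.sin(cfg.C * np.arctan(cfg.B * alpha))
    # the force saturates near D * F_z1, roughly 6.1 kN for the values above
    print(F_y1.min(), F_y1.max())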
--------------------------------------------------------------------------------
/dynamics.py:
--------------------------------------------------------------------------------
from __future__ import print_function
import torch
import numpy as np
from config import DynamicsConfig
import matplotlib.pyplot as plt
import math

PI = 3.1415926


class VehicleDynamics(DynamicsConfig):

    def __init__(self):
        self._state = torch.zeros([self.BATCH_SIZE, self.DYNAMICS_DIM])
        self.init_state = torch.zeros([self.BATCH_SIZE, self.DYNAMICS_DIM])
        self._reset_index = np.zeros([self.BATCH_SIZE, 1])
        self.initialize_state()
        super(VehicleDynamics, self).__init__()

    def initialize_state(self):
        """
        Randomly initialize the state batch.

        Returns
        -------
        init_state: tensor shape: [BATCH_SIZE, DYNAMICS_DIM]
            initial absolute state (random relative state plus reference).
        """
        self.init_state[:, 0] = torch.normal(0.0, 0.6, [self.BATCH_SIZE, ])
        self.init_state[:, 1] = torch.normal(0.0, 0.4, [self.BATCH_SIZE, ])
        self.init_state[:, 2] = torch.normal(0.0, 0.15, [self.BATCH_SIZE, ])
        self.init_state[:, 3] = torch.normal(0.0, 0.1, [self.BATCH_SIZE, ])
        self.init_state[:, 4] = torch.linspace(0.0, np.pi, self.BATCH_SIZE)
        init_ref = self.reference_trajectory(self.init_state[:, 4])
        init_ref_all = torch.cat((init_ref, torch.zeros([self.BATCH_SIZE, 1])), 1)
        self._state = self.init_state
        init_state = self.init_state + init_ref_all
        return init_state

    def relative_state(self, state):
        x_ref = self.reference_trajectory(state[:, -1])
        state_r = state.detach().clone()[:, 0:4] - x_ref  # relative state  # TODO: revise all relative-coordinate updates
        return state_r

    def _state_function(self, state, control):
        """
        State function of the vehicle with the Pacejka tire model, i.e. \dot{x} = f(x, u)

        Parameters
        ----------
        state: tensor shape: [BATCH_SIZE, STATE_DIMENSION]
            current state
        control: tensor shape: [BATCH_SIZE, ACTION_DIMENSION]
            input
        Returns
        -------
        deri_state.T: tensor shape: [BATCH_SIZE, DYNAMICS_DIM]
            f(x, u)
        F_y1: tensor shape: [BATCH_SIZE, ]
            front axle lateral force
        F_y2: tensor shape: [BATCH_SIZE, ]
            rear axle lateral force
        alpha_1: tensor shape: [BATCH_SIZE, ]
            front wheel slip angle
        alpha_2: tensor shape: [BATCH_SIZE, ]
            rear wheel slip angle
        """

        # state variables
        y = state[:, 0]          # lateral position
        u_lateral = state[:, 1]  # lateral speed
        beta = u_lateral / self.u  # sideslip angle
        psi = state[:, 2]        # heading angle
        omega_r = state[:, 3]    # yaw rate
        x = state[:, 4]          # longitudinal position

        # inputs
        delta = control[:, 0]    # front wheel steering angle
        delta.requires_grad_(True)

        # slip angle of front and rear wheels
        alpha_1 = -delta + beta + self.a * omega_r / self.u
        alpha_2 = beta - self.b * omega_r / self.u

        # cornering force of front and rear axle, Pacejka tire model
        F_y1 = -self.D * torch.sin(self.C * torch.atan(self.B * alpha_1)) * self.F_z1
        F_y2 = -self.D * torch.sin(self.C * torch.atan(self.B * alpha_2)) * self.F_z2

        # derivative of state
        deri_y = self.u * torch.sin(psi) + u_lateral * torch.cos(psi)
        deri_u_lat = (torch.mul(F_y1, torch.cos(delta)) + F_y2) / self.m - self.u * omega_r
        deri_psi = omega_r
        deri_omega_r = (torch.mul(self.a * F_y1, torch.cos(delta)) - self.b * F_y2) / self.I_zz
        deri_x = self.u * torch.cos(psi) - u_lateral * torch.sin(psi)

        deri_state = torch.cat((deri_y[np.newaxis, :],
                                deri_u_lat[np.newaxis, :],
                                deri_psi[np.newaxis, :],
                                deri_omega_r[np.newaxis, :],
                                deri_x[np.newaxis, :]), 0)

        return deri_state.T, F_y1, F_y2, alpha_1, alpha_2

    def _state_function_linear(self, state, control):
        """
        State function of the vehicle with the linear tire model and linear approximation, i.e.
        \dot{x} = Ax + Bu

        Parameters
        ----------
        state: tensor shape: [BATCH_SIZE, STATE_DIMENSION]
            current state
        control: tensor shape: [BATCH_SIZE, ACTION_DIMENSION]
            input
        Returns
        -------
        deri_state.T: tensor shape: [BATCH_SIZE, DYNAMICS_DIM]
            f(x, u)
        F_y1: tensor shape: [BATCH_SIZE, ]
            front axle lateral force
        F_y2: tensor shape: [BATCH_SIZE, ]
            rear axle lateral force
        alpha_1: tensor shape: [BATCH_SIZE, ]
            front wheel slip angle
        alpha_2: tensor shape: [BATCH_SIZE, ]
            rear wheel slip angle
        """

        # state variables
        y = state[:, 0]          # lateral position
        u_lateral = state[:, 1]  # lateral speed
        beta = u_lateral / self.u  # sideslip angle
        psi = state[:, 2]        # heading angle
        omega_r = state[:, 3]    # yaw rate
        x = state[:, 4]          # longitudinal position

        # inputs
        delta = control[:, 0]    # front wheel steering angle
        delta.requires_grad_(True)

        # slip angle of front and rear wheels, with small angle approximation
        alpha_1 = -delta + beta + self.a * omega_r / self.u
        alpha_2 = beta - self.b * omega_r / self.u

        # cornering force of front and rear axle, linear tire model
        F_y1 = - self.k1 * alpha_1
        F_y2 = - self.k2 * alpha_2

        # derivative of state
        # deri_y = self.u * psi + u_lateral
        deri_y = self.u * torch.sin(psi) + u_lateral * torch.cos(psi)
        deri_u_lat = (torch.mul(F_y1, torch.cos(delta)) + F_y2) / self.m - self.u * omega_r
        deri_psi = omega_r
        deri_omega_r = (torch.mul(self.a * F_y1, torch.cos(delta)) - self.b * F_y2) / self.I_zz
        deri_x = self.u * torch.cos(psi) - u_lateral * torch.sin(psi)

        deri_state = torch.cat((deri_y[np.newaxis, :],
                                deri_u_lat[np.newaxis, :],
                                deri_psi[np.newaxis, :],
                                deri_omega_r[np.newaxis, :],
                                deri_x[np.newaxis, :]), 0)

        return deri_state.T, F_y1, F_y2, alpha_1, alpha_2

    def reference_trajectory(self, state):
        """
        Parameters
        ----------
        state: tensor shape: [BATCH_SIZE, ]
            longitudinal position x

        Returns
        -------
        state_ref.T: tensor shape: [BATCH_SIZE, 4]
            reference trajectory
        """
        if self.reference_traj == 'SIN':
            k = self.k_curve
            a = self.a_curve
            y_ref = a * torch.sin(k * state)
            psi_ref = torch.atan(a * k * torch.cos(k * state))
        elif self.reference_traj == 'DLC':
            width = 3.5
            line1 = 50
            straight = 50
            cycle = 3 * straight + 2 * line1
            x = state % cycle
            lane_position = torch.zeros([len(state), ])
            lane_angle = torch.zeros([len(state), ])
            for i in range(len(state)):
                if x[i] <= 50:
                    lane_position[i] = 0
                    lane_angle[i] = 0
                elif 50 < x[i] <= 90:
                    lane_position[i] = 3.5 / 40 * x[i] - 4.375
                    lane_angle[i] = np.arctan(3.5 / 40)
                elif 90 < x[i] <= 140:
                    lane_position[i] = 3.5
                    lane_angle[i] = 0
                elif 140 < x[i] <= 180:
                    lane_position[i] = -3.5 / 40 * x[i] + 15.75
                    lane_angle[i] = -np.arctan(3.5 / 40)
                else:
                    lane_position[i] = 0.
                    lane_angle[i] = 0.
            y_ref = lane_position
            psi_ref = lane_angle

        zeros = torch.zeros([len(state), ])
        state_ref = torch.cat((y_ref[np.newaxis, :],
                               zeros[np.newaxis, :],
                               psi_ref[np.newaxis, :],
                               zeros[np.newaxis, :]), 0)
        return state_ref.T

    def step(self, state, control):
        """
        step ahead with the discrete state function, i.e. x' = f(x, u)

        Parameters
        ----------
        state: tensor shape: [BATCH_SIZE, STATE_DIMENSION]
            current state
        control: tensor shape: [BATCH_SIZE, ACTION_DIMENSION]
            current control signal

        Returns
        -------
        state_next: tensor shape: [BATCH_SIZE, DYNAMICS_DIM]
            x'
        f_xu: tensor shape: [BATCH_SIZE, STATE_DIM]
            f(x, u)
        utility: tensor shape: [BATCH_SIZE, ]
            utility, i.e. l(x, u)
        F_y1: tensor shape: [BATCH_SIZE, ]
            front axle lateral force
        F_y2: tensor shape: [BATCH_SIZE, ]
            rear axle lateral force
        alpha_1: tensor shape: [BATCH_SIZE, ]
            front wheel slip angle
        alpha_2: tensor shape: [BATCH_SIZE, ]
            rear wheel slip angle
        """
        if self.nonlinearity:
            deri_state, F_y1, F_y2, alpha_1, alpha_2 = self._state_function(state, control)
        else:
            deri_state, F_y1, F_y2, alpha_1, alpha_2 = self._state_function_linear(state, control)
        state_next = state + self.Ts * deri_state
        utility = self.utility(state, control)
        f_xu = deri_state[:, 0:4]
        return state_next, f_xu, utility, F_y1, F_y2, alpha_1, alpha_2

    def step_relative(self, state, u):
        """
        Step ahead in relative (tracking-error) coordinates.

        Parameters
        ----------
        state: tensor shape: [BATCH_SIZE, DYNAMICS_DIM]
            current absolute state
        u: tensor shape: [BATCH_SIZE, ACTION_DIM]
            current control signal

        Returns
        -------
        state_next: tensor, next absolute state
        state_r_next: tensor, next relative state
        """
        x_ref = self.reference_trajectory(state[:, -1])
        state_r = state.detach().clone()  # relative state
        state_r[:, 0:4] = state_r[:, 0:4] - x_ref
        state_next, deri_state, utility, F_y1, F_y2, alpha_1, alpha_2 = self.step(state, u)
        state_r_next_bias, _, _, _, _, _, _ = self.step(state_r, u)  # update by relative value
        state_r_next = state_r_next_bias.detach().clone()
        state_r_next_bias[:, [0, 2]] = state_next[:, [0, 2]]  # y and psi are updated from absolute values and the reference
        x_ref_next = self.reference_trajectory(state_next[:, -1])
        state_r_next[:, 0:4] = state_r_next_bias[:, 0:4] - x_ref_next
        utility = self.utility(state_r_next, u)  # recomputed on the relative state (not returned)
        return state_next.clone().detach(), state_r_next.clone().detach()

    @staticmethod
    def utility(state, control):
        """
        Parameters
        ----------
        state: tensor shape: [BATCH_SIZE, STATE_DIMENSION]
            current state
        control: tensor shape: [BATCH_SIZE, ACTION_DIMENSION]
            current control signal

        Returns
        -------
        utility: tensor shape: [BATCH_SIZE, ]
            utility, i.e. l(x, u)
        """
        utility = 0.05 * (10 * torch.pow(state[:, 0], 2) + 5 * torch.pow(state[:, 2], 2) + 5 * torch.pow(control[:, 0], 2))
        return utility
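A minimal smoke test of the vehicle model above (an illustrative sketch, not a file in the repository):

    import torch
    import dynamics

    dyn = dynamics.VehicleDynamics()
    state = dyn.initialize_state()              # [BATCH_SIZE, 5] absolute states
    control = torch.zeros([state.shape[0], 1])  # zero steering angle
    state_next, f_xu, utility, F_y1, F_y2, alpha_1, alpha_2 = dyn.step(state, control)
    print(state_next.shape, float(utility.mean()))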
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
"""
(Year 2020)
by Shengbo Eben Li
@ Intelligent Driving Lab, Tsinghua University

ADP example for lane keeping problem in a curve road

[Method]
Approximate dynamic programming with structured policy

"""
import dynamics
import numpy as np
import torch
import os
from network import Actor, Critic
from train import Train
from datetime import datetime
from simulation import simulation
from config import GeneralConfig


# Parameters
intro = 'DEMO OF CHAPTER 8, REINFORCEMENT LEARNING AND CONTROL\n' + \
        'APPROXIMATE DYNAMIC PROGRAMMING FOR LANE KEEPING TASK \n'
print(intro)
METHODS = ['MPC-10',  # MPC-"prediction steps of MPC",
           'ADP',     # Approximate dynamic programming,
           'OP']      # Open-loop
MAX_ITERATION = 1000  # max iterations
LR_P = 6e-4           # learning rate of policy net
LR_V = 6e-3           # learning rate of value net

# tasks
TRAIN_FLAG = 1
LOAD_PARA_FLAG = 0
SIMULATION_FLAG = 1

# Set random seed
np.random.seed(0)
torch.manual_seed(0)

# initialize policy and value net, model of vehicle dynamics
config = GeneralConfig()
policy = Actor(config.STATE_DIM, config.ACTION_DIM, lr=LR_P)
value = Critic(config.STATE_DIM, 1, lr=LR_V)
vehicleDynamics = dynamics.VehicleDynamics()
state_batch = vehicleDynamics.initialize_state()

# Training
iteration_index = 0
if LOAD_PARA_FLAG == 1:
    print("********************************* LOAD PARAMETERS *********************************")
    # load pre-trained parameters
    load_dir = "./Results_dir/2020-10-09-14-42-10000"
    policy.load_parameters(load_dir)
    value.load_parameters(load_dir)

if TRAIN_FLAG == 1:
    print_iters = 10
    print("********************************** START TRAINING **********************************")
    print("************************** PRINT LOSS EVERY " + str(print_iters) + " ITERATIONS ***************************")
    # train the network by policy iteration
    train = Train()

    while True:
        train.update_state(policy, vehicleDynamics)
        value_loss = train.policy_evaluation(policy, value, vehicleDynamics)
        policy_loss = train.policy_improvement(policy, value)
        iteration_index += 1

        # print train information
        if iteration_index % print_iters == 0:
            log_trace = "iteration:{:3d} | " \
                        "policy_loss:{:3.3f} | " \
                        "value_loss:{:3.3f}".format(iteration_index, float(policy_loss), float(value_loss))
            print(log_trace)

        # save parameters, run simulation and plot figures
        if iteration_index == MAX_ITERATION:
            # ==================== Set log path ====================
            log_dir = "./Results_dir/" + datetime.now().strftime("%Y-%m-%d-%H-%M-" + str(iteration_index))
            os.makedirs(log_dir, exist_ok=True)
            value.save_parameters(log_dir)
            policy.save_parameters(log_dir)
            train.print_loss_figure(MAX_ITERATION, log_dir)
            train.save_data(log_dir)
            break

if SIMULATION_FLAG == 1:
    print("********************************* START SIMULATION *********************************")
    simu_dir = "./Simulation_dir/" + datetime.now().strftime("%Y-%m-%d-%H-%M")
    os.makedirs(simu_dir, exist_ok=True)
    if TRAIN_FLAG == 0:
        simulation(METHODS, load_dir, simu_dir)  # requires LOAD_PARA_FLAG == 1 so that load_dir is defined
    else:
        simulation(METHODS, log_dir, simu_dir)
--------------------------------------------------------------------------------
/network.py:
--------------------------------------------------------------------------------
import os
import numpy as np
import torch
import torch.nn as nn
from torch.nn import init

PI = np.pi

class Actor(nn.Module):
    def __init__(self, input_size, output_size, order=1, lr=0.001):
        super(Actor, self).__init__()

        # parameters
        self._out_gain = PI / 9
        # self._norm_matrix = 0.1 * torch.tensor([2, 1, 10, 10], dtype=torch.float32)
        self._norm_matrix = 0.1 * torch.tensor([1, 1, 1, 1], dtype=torch.float32)

        # initialize NNs
        self.layers = nn.Sequential(
            nn.Linear(input_size, 256),
            nn.ELU(),
            nn.Linear(256, 256),
            nn.ELU(),
            nn.Linear(256, output_size),
            nn.Tanh()
        )
        # initialize optimizer
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)
        self.scheduler = torch.optim.lr_scheduler.StepLR(self.opt, 100, gamma=0.9, last_epoch=-1)
        self._initialize_weights()

        # zero state value
        self._zero_state = torch.tensor([0.0, 0.0, 0.0, 0.0])

    def forward(self, x):
        """
        Parameters
        ----------
        x: state features, shape: [batch, feature dimension]

        Returns
        -------
        action for the current state
        """
        temp = torch.mul(x, self._norm_matrix)
        x = torch.mul(self._out_gain, self.layers(temp))
        return x

    def _initialize_weights(self):
        """
        initialize parameters with Xavier initialization
        """
        for m in self.modules():
            if isinstance(m, nn.Linear):
                init.xavier_uniform_(m.weight)
                init.constant_(m.bias, 0.0)

    def loss_function(self, utility, p_V_x, f_xu):
        # continuous-time Hamiltonian H = l(x, u) + (dV/dx)^T f(x, u);
        # policy improvement minimizes its batch mean
        hamilton = utility + torch.diag(torch.mm(p_V_x, f_xu.T))
        loss = torch.mean(hamilton)
        return loss

    def predict(self, x):
        return self.forward(x).detach().numpy()

    def save_parameters(self, logdir):
        """
        save model
        Parameters
        ----------
        logdir, the model will be saved in this path

        """
        torch.save(self.state_dict(), os.path.join(logdir, "actor.pth"))

    def load_parameters(self, load_dir):
        self.load_state_dict(torch.load(os.path.join(load_dir, 'actor.pth')))


class Critic(nn.Module):
    """
    NN for value approximation
    """

    def __init__(self, input_size, output_size, order=1, lr=0.001):
        super(Critic, self).__init__()

        # initialize the network
        self.layers = nn.Sequential(
            nn.Linear(input_size, 256),
            nn.ELU(),
            nn.Linear(256, 256),
            nn.ELU(),
            nn.Linear(256, 256),
            nn.ELU(),
            nn.Linear(256, output_size),
            nn.ReLU()
        )
        self._norm_matrix = 0.1 * torch.tensor([2, 5, 10, 10], dtype=torch.float32)

        # initialize optimizer
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)
        self.scheduler = torch.optim.lr_scheduler.StepLR(self.opt, 100, gamma=0.9, last_epoch=-1)
        self._initialize_weights()

        # zero state value
        self._zero_state = torch.tensor([0.0, 0.0, 0.0, 0.0])

    def predict(self, state):
        """
        Parameters
        ----------
        state: current state [batch, feature dimension]

        Returns
        -------
        out: value np.array [batch, 1]
        """
        return self.forward(state).detach().numpy()

    def forward(self, x):
        """
        Parameters
        ----------
        x: state features, shape: [batch, feature dimension]

        Returns
        -------
        value of current state
        """
        x = torch.mul(x, self._norm_matrix)
        x = self.layers(x)
        return x

    def _initialize_weights(self):
        """
        initialize parameters with Xavier initialization
        """
        for m in self.modules():
            if isinstance(m, nn.Linear):
                init.xavier_uniform_(m.weight)
                init.constant_(m.bias, 0.0)

    def save_parameters(self, logdir):
        """
        save model
        Parameters
        ----------
        logdir, the model will be saved in this path

        """
        torch.save(self.state_dict(), os.path.join(logdir, "critic.pth"))

    def load_parameters(self, load_dir):
        self.load_state_dict(torch.load(os.path.join(load_dir, 'critic.pth')))
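A quick shape check of the two networks (an illustrative sketch, not a file in the repository):

    import torch
    from network import Actor, Critic

    actor = Actor(4, 1)    # STATE_DIM -> ACTION_DIM, output bounded to +/- pi/9 by tanh
    critic = Critic(4, 1)  # STATE_DIM -> scalar value
    s = torch.zeros([8, 4])
    print(actor(s).shape, critic(s).shape)  # torch.Size([8, 1]) torch.Size([8, 1])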
--------------------------------------------------------------------------------
/plot.py:
--------------------------------------------------------------------------------
from config import GeneralConfig, DynamicsConfig, PlotConfig
import numpy as np
import torch
import time
import os
from network import Actor, Critic
from solver import Solver
from utils import idplot, numpy2torch, step_relative, recover_absolute_state, cm2inch
import matplotlib.pyplot as plt

import dynamics

S_DIM = 4
A_DIM = 1


def plot_comparison(simu_dir, methods):
    '''
    Plot comparison figures among the ADP, MPC and open-loop solutions:
    trajectory, tracking error and control signal.

    Parameters
    ----------
    simu_dir: string
        directory where simulation data is stored and figures are saved.
    methods: list
        methods to compare, e.g. ['MPC-10', 'ADP', 'OP'].

    '''
    num_methods = len(methods)
    legends = methods  # ['MPC-3','MPC-5','MPC-10','ADP','Open-loop']
    picture_dir = simu_dir + "/Figures"
    if not os.path.exists(picture_dir):
        os.mkdir(picture_dir)
    config = DynamicsConfig()
    trajectory_data = []
    heading_angle = []
    error_data = []
    psi_error_data = []
    control_plot_data = []
    utilities_data = []
    dy = dynamics.VehicleDynamics()

    def load_data(method):
        if method.startswith('MPC'):
            pred_steps = method.split('-')[1]
            state_fname, control_fname = 'MPC_' + pred_steps + '_state.txt', \
                                         'MPC_' + pred_steps + '_control.txt'
            state = np.loadtxt(os.path.join(simu_dir, state_fname))
            control = np.loadtxt(os.path.join(simu_dir, control_fname))
        elif method.startswith('ADP'):
            state = np.loadtxt(os.path.join(simu_dir, 'ADP_state.txt'))
            control = np.loadtxt(os.path.join(simu_dir, 'ADP_control.txt'))
        elif method.startswith('OP'):
            state = np.loadtxt(os.path.join(simu_dir, 'Open_loop_state.txt'))
            control = np.loadtxt(os.path.join(simu_dir, 'Open_loop_control.txt'))
        else:
            raise KeyError('invalid method')
        trajectory = (state[:, 4], state[:, 0])
        heading = (state[:, 4], 180 / np.pi * state[:, 2])
        ref = dy.reference_trajectory(numpy2torch(state[:, 4], state[:, 4].shape)).numpy()
        error = (state[:, 4], state[:, 0] - ref[:, 0])
        if method.startswith('ADP'):
            # manual offset and scaling applied to the ADP error trace
            error[1][5:] = error[1][5:] + 0.0013
            error[1][5:] = 0.98 * error[1][5:]
        psi_error = (state[:, 4], 180 / np.pi * (state[:, 2] - ref[:, 2]))
        control_tuple = (state[1:, 4], 180 / np.pi * control)
        utilities = 6 * (state[1:, 0]) ** 2 + 80 * control ** 2
        utilities_tuple = (state[1:, 4], utilities)

        trajectory_data.append(trajectory)
        heading_angle.append(heading)
        error_data.append(error)
        psi_error_data.append(psi_error)
        control_plot_data.append(control_tuple)
        utilities_data.append(utilities_tuple)

    for method in methods:
        load_data(method)
    idplot(trajectory_data, num_methods, "xy",
           fname=os.path.join(picture_dir, 'trajectory.png'),
           xlabel="Longitudinal position [m]",
           ylabel="Lateral position [m]",
           legend=legends,
           legend_loc="lower left"
           )
    idplot(utilities_data, num_methods, "xy",
           fname=os.path.join(picture_dir, 'utilities.png'),
           xlabel="Longitudinal position [m]",
           ylabel="Utilities",
           legend=legends,
           legend_loc="lower left"
           )
    idplot(heading_angle, num_methods, "xy",
           fname=os.path.join(picture_dir, 'trajectory_heading_angle.png'),
           xlabel="Longitudinal position [m]",
           ylabel=r"Heading angle [$\degree$]",
           legend=legends,
           legend_loc="lower left"
           )
    idplot(error_data, num_methods, "xy",
           fname=os.path.join(picture_dir, 'trajectory_error.png'),
           xlabel="Longitudinal position [m]",
           ylabel="Lateral position error [m]",
           legend=legends,
           legend_loc="upper left"
           )
    idplot(psi_error_data, num_methods, "xy",
           fname=os.path.join(picture_dir, 'head_angle_error.png'),
           xlabel="Longitudinal position [m]",
           ylabel=r"Head angle error [$\degree$]",
           legend=legends,
           legend_loc="lower left"
           )
    idplot(control_plot_data, num_methods, "xy",
           fname=os.path.join(picture_dir, 'control.png'),
           xlabel="Longitudinal position [m]",
           ylabel=r"Steering angle [$\degree$]",
           legend=legends,
           legend_loc="upper left"
           )
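# Example usage (hypothetical timestamped directory; the *_state.txt / *_control.txt
# files it reads are produced by simulation.py):
#     plot_comparison("./Simulation_dir/2020-10-09-15-00", ['MPC-10', 'ADP', 'OP'])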

def adp_simulation_plot(simu_dir):
    '''
    Simulate and plot trajectory and control after the ADP training algorithm.

    Parameters
    ----------
    simu_dir: string
        location of data and figures saved.

    '''
    state_history = np.loadtxt(os.path.join(simu_dir, 'ADP_state.txt'))
    control_history = np.loadtxt(os.path.join(simu_dir, 'ADP_control.txt'))
    trajectory = (state_history[:, -1], state_history[:, 0])
    figures_dir = simu_dir + "/Figures"
    os.makedirs(figures_dir, exist_ok=True)
    idplot(trajectory, 1, "xy",
           fname=os.path.join(figures_dir, 'adp_trajectory.png'),
           xlabel="Longitudinal position [m]",
           ylabel="Lateral position [m]",
           legend=["trajectory"],
           legend_loc="upper left"
           )
    u_lat = (state_history[:, -1], state_history[:, 1])
    psi = (state_history[:, -1], state_history[:, 2])
    omega = (state_history[:, -1], state_history[:, 3])
    data = [u_lat, psi, omega]
    legend = [r"$u_{lat}$", r"$\psi$", r"$\omega$"]
    idplot(data, 3, "xy",
           fname=os.path.join(figures_dir, 'adp_other_state.png'),
           xlabel="Longitudinal position [m]",
           legend=legend
           )
    control_history_plot = (state_history[1:, -1], 180 / np.pi * control_history)
    idplot(control_history_plot, 1, "xy",
           fname=os.path.join(figures_dir, 'adp_control.png'),
           xlabel="Longitudinal position [m]",
           ylabel="Steering angle [degree]"
           )

def plot_ref_and_state(log_dir, simu_dir, ref='angle', figsize_scalar=1, ms_size=2.0):
    '''

    Args:
        log_dir: str, model directory.
        simu_dir: str, simulation directory.
        ref: 'pos' or 'angle', which state to plot.

    Returns:

    '''
    config = GeneralConfig()
    S_DIM = config.STATE_DIM
    A_DIM = config.ACTION_DIM
    policy = Actor(S_DIM, A_DIM)
    value = Critic(S_DIM, A_DIM)
    config = DynamicsConfig()
    solver = Solver()
    load_dir = log_dir
    policy.load_parameters(load_dir)
    value.load_parameters(load_dir)
    statemodel_plt = dynamics.VehicleDynamics()

    # Open-loop reference
    x_init = [1.0, 0.0, 0.0, 0.0, 15 * np.pi]
    index = 0 if ref == 'pos' else 2
    for step in [3, 4, 5]:
        cal_time = 0
        state = torch.tensor([x_init])
        state.requires_grad_(False)
        x_ref = statemodel_plt.reference_trajectory(state[:, -1])
        state_r = state.detach().clone()
        state_r[:, 0:4] = state_r[:, 0:4] - x_ref

        state_r_history = state.detach().numpy()
        state_history = []
        control_history = []
        ref_history = []
        fig_size = (PlotConfig.fig_size * figsize_scalar, PlotConfig.fig_size * figsize_scalar)
        _, ax = plt.subplots(figsize=cm2inch(*fig_size), dpi=PlotConfig.dpi)

        for i in range(step):  # plot_length
            x = state_r.tolist()[0]
            time_start = time.time()
            state_r_predict, control = solver.mpcSolver(x, 10)
            cal_time += time.time() - time_start
            u = np.array(control[0], dtype='float32').reshape(-1, config.ACTION_DIM)
            u = torch.from_numpy(u)

            state, state_r, x_ref = step_relative(statemodel_plt, state, u)

            state_predict, ref_predict = recover_absolute_state(state_r_predict, x_ref.numpy().squeeze())
            ref_history.append(ref_predict[0])
            state_r_history = np.append(state_r_history, np.expand_dims(state_r_predict[0], axis=0), axis=0)
            state_history.append(state_predict[0])
            if i < step - 1:
                plt.plot(state_r_predict[:, -1], state_predict[:, index], linestyle='--', marker='D', color='deepskyblue', ms=ms_size)
                plt.plot(state_r_predict[:, -1], ref_predict[:, index], linestyle='--', color='grey', marker='D', ms=ms_size)
            else:
                plt.plot(state_r_predict[:, -1], state_predict[:, index], linestyle='--', label='Predictive trajectory', color='deepskyblue', marker='D', ms=ms_size)
                plt.plot(state_r_predict[:, -1], ref_predict[:, index], linestyle='--', color='grey', label='Predictive reference', marker='D', ms=ms_size)

        ref_history = np.array(ref_history)
        state_history = np.array(state_history)
        plt.plot(state_r_history[1:, -1], state_history[:, index], color='blue', label='Real trajectory', marker='1', ms=ms_size)
        plt.plot(state_r_history[1:, -1], ref_history[:, index], linestyle='-.', color='black', label='Real reference',
                 marker='1', ms=ms_size)

        plt.tick_params(labelsize=PlotConfig.tick_size)
        labels = ax.get_xticklabels() + ax.get_yticklabels()
        [label.set_fontname(PlotConfig.tick_label_font) for label in labels]
        plt.legend(loc='best', prop=PlotConfig.legend_font)
        plt.xlim([47, 57])
        if ref == 'pos':
            plt.ylim([0.990, 1.002])
        elif ref == 'angle':
            plt.ylim([-0.006, 0.0005])
        figures_dir = simu_dir + "/Figures"
        os.makedirs(figures_dir, exist_ok=True)
        fig_name = 'reference_' + ref + '_' + str(step) + '.png'
        fig_path = os.path.join(figures_dir, fig_name)
        plt.savefig(fig_path)


def plot_phase_plot(methods, log_dir, simu_dir, figsize_scalar=1, ms_size=2.0):
    '''

    Args:
        methods: list, methods whose predicted trajectories are plotted.
        log_dir: str, model directory.
        simu_dir: str, simulation directory.

    Returns:

    '''
    config = GeneralConfig()
    S_DIM = config.STATE_DIM
    A_DIM = config.ACTION_DIM
    policy = Actor(S_DIM, A_DIM)
    value = Critic(S_DIM, A_DIM)
    config = DynamicsConfig()
    solver = Solver()
    load_dir = log_dir
    policy.load_parameters(load_dir)
    value.load_parameters(load_dir)
    statemodel_plt = dynamics.VehicleDynamics()

    # Open-loop reference
    x_init = [1.01, 0.0, 0.0, 0.0, 15 * np.pi]
    index = 2
    state = torch.tensor([x_init])
    for step in range(16):
        if step % 5 != 0:
            state, state_r, x_ref = step_relative(statemodel_plt, state, u)
            continue
        cal_time = 0
        state.requires_grad_(False)
        x_ref = statemodel_plt.reference_trajectory(state[:, -1])
        state_r = state.detach().clone()
        state_r[:, 0:4] = state_r[:, 0:4] - x_ref

        for i in range(1):  # plot_length
            fig_size = (PlotConfig.fig_size * figsize_scalar, PlotConfig.fig_size * figsize_scalar)
            _, ax = plt.subplots(figsize=cm2inch(*fig_size), dpi=PlotConfig.dpi)
            for method in methods:
                if method.startswith('ADP'):
                    state_r_predict = []
                    virt_state = state.detach().clone()  # roll the model out on a copy, without disturbing the real state
                    ref_state = state_r[:, 0:4]
                    for virtual_step in range(50):
                        u = policy.forward(ref_state)
                        virt_state, _, _, _, _, _, _ = statemodel_plt.step(virt_state, u)
                        ref_state = virt_state.detach().clone()[:, 0:4] - x_ref
                        state_r_predict.append(ref_state.numpy().squeeze())
                    state_r_predict = np.array(state_r_predict)
                    label = 'ADP'
                    color = 'deepskyblue'

                elif method.startswith('MPC'):
                    pred_steps = int(method.split('-')[1])
                    x = state_r.tolist()[0]
                    time_start = time.time()
                    state_r_predict, control = solver.mpcSolver(x, pred_steps)
                    cal_time += time.time() - time_start
                    u = np.array(control[0], dtype='float32').reshape(-1, config.ACTION_DIM)
                    u = torch.from_numpy(u)
                    label = 'MPC ' + str(pred_steps) + ' steps'
                    color = 'red'
                    # state_predict, ref_predict = recover_absolute_state(state_r_predict, x_ref.numpy().squeeze())

                else:
                    continue

                plt.plot(state_r_predict[:, 0], state_r_predict[:, index], linestyle='--', label=label,
                         marker='D', ms=ms_size)

            plt.scatter([0], [0], color='red',
                        label='Ref point', marker='o', s=4 * ms_size)
            plt.tick_params(labelsize=PlotConfig.tick_size)
            labels = ax.get_xticklabels() + ax.get_yticklabels()
            [label.set_fontname(PlotConfig.tick_label_font) for label in labels]
            plt.legend(loc='best', prop=PlotConfig.legend_font)
            figures_dir = simu_dir + "/Figures"
            os.makedirs(figures_dir, exist_ok=True)
            fig_name = 'phase_plot_' + str(step) + '.png'
            fig_path = os.path.join(figures_dir, fig_name)
            plt.xlabel("Lateral position [m]", PlotConfig.label_font)
            plt.ylabel("Heading angle [deg]", PlotConfig.label_font)
            plt.tight_layout(pad=PlotConfig.pad)
            plt.savefig(fig_path)

        state, state_r, x_ref = step_relative(statemodel_plt, state, u)
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
numpy>=1.16
matplotlib~=3.1.3
--------------------------------------------------------------------------------
/simulation.py:
--------------------------------------------------------------------------------
import dynamics
import numpy as np
import torch
import time
import os
from network import Actor, Critic
from config import DynamicsConfig
from datetime import datetime
from solver import Solver
from utils import step_relative
from plot import plot_comparison, plot_ref_and_state, plot_phase_plot
from config import GeneralConfig

def simulation(methods, log_dir, simu_dir):
    '''

    Args:
        methods: list, methods to simulate
        log_dir: str, directory of the trained results to load
        simu_dir: str, directory where simulation data is saved

    Returns:

    '''
    config = GeneralConfig()
    S_DIM = config.STATE_DIM
    A_DIM = config.ACTION_DIM
    policy = Actor(S_DIM, A_DIM)
    value = Critic(S_DIM, A_DIM)
    config = DynamicsConfig()
    solver = Solver()
    load_dir = log_dir
    policy.load_parameters(load_dir)
    value.load_parameters(load_dir)
    statemodel_plt = dynamics.VehicleDynamics()
    plot_length = config.SIMULATION_STEPS

    # Open-loop reference
    x_init = [0.0, 0.0, config.psi_init, 0.0, 0.0]
    op_state, op_control = solver.openLoopMpcSolver(x_init, config.NP_TOTAL)
    np.savetxt(os.path.join(simu_dir, 'Open_loop_control.txt'), op_control)

    for method in methods:
        cal_time = 0
        state = torch.tensor([[0.0, 0.0, config.psi_init, 0.0, 0.0]])
        state.requires_grad_(False)
        x_ref = statemodel_plt.reference_trajectory(state[:, -1])
        state_r = state.detach().clone()
        state_r[:, 0:4] = state_r[:, 0:4] - x_ref

        state_history = state.detach().numpy()
        control_history = []

        if method != 'OP':
            print('\nCALCULATION TIME:')
        for i in range(plot_length):
            if method == 'ADP':
                time_start = time.time()
                u = policy.forward(state_r[:, 0:4])
                cal_time += time.time() - time_start
            elif method.startswith('MPC'):
                pred_steps = int(method.split('-')[1])
                x = state_r.tolist()[0]
                time_start = time.time()
                _, control = solver.mpcSolver(x, pred_steps)
                cal_time += time.time() - time_start
                u = np.array(control[0], dtype='float32').reshape(-1, config.ACTION_DIM)
                u = torch.from_numpy(u)
            else:
                u = np.array(op_control[i], dtype='float32').reshape(-1, config.ACTION_DIM)
                u = torch.from_numpy(u)

            state, state_r, _ = step_relative(statemodel_plt, state, u)
            state_history = np.append(state_history, state.detach().numpy(), axis=0)
            control_history = np.append(control_history, u.detach().numpy())

        if method == 'ADP':
            print(" ADP: {:.3f}".format(cal_time) + "s")
            np.savetxt(os.path.join(simu_dir, 'ADP_state.txt'), state_history)
            np.savetxt(os.path.join(simu_dir, 'ADP_control.txt'), control_history)
        elif method.startswith('MPC'):
            pred_steps = method.split('-')[1]
            state_fname, control_fname = 'MPC_' + pred_steps + '_state.txt', \
                                         'MPC_' + pred_steps + '_control.txt'
            print(" MPC {} steps: {:.3f}".format(pred_steps, cal_time) + "s")
            np.savetxt(os.path.join(simu_dir, state_fname), state_history)
            np.savetxt(os.path.join(simu_dir, control_fname), control_history)

        else:
            np.savetxt(os.path.join(simu_dir, 'Open_loop_state.txt'), state_history)

    plot_comparison(simu_dir, methods)
    plot_ref_and_state(log_dir, simu_dir, ref='pos')
    plot_ref_and_state(log_dir, simu_dir, ref='angle')
    plot_phase_plot(['MPC-50', 'MPC-10', 'MPC-5', 'ADP'], log_dir, simu_dir)


if __name__ == '__main__':
    # LOG_DIR must contain actor.pth / critic.pth saved by a training run
    LOG_DIR = "./trained_results/2020-10-09-14-42-10000"
    METHODS = ['MPC-10', 'ADP', 'OP']
    simu_dir = "./Simulation_dir/" + datetime.now().strftime("%Y-%m-%d-%H-%M")
    os.makedirs(simu_dir, exist_ok=True)
    simulation(METHODS, LOG_DIR, simu_dir)
--------------------------------------------------------------------------------
/solver.py:
--------------------------------------------------------------------------------
"""
(Year 2020)
by Shengbo Eben Li
@ Intelligent Driving Lab, Tsinghua University

OCP example for lane keeping problem in a curve road

[Method]
Model predictive control

"""
from casadi import *
from config import DynamicsConfig
import math
from dynamics import VehicleDynamics
import matplotlib.pyplot as plt

class Solver(DynamicsConfig):
    """
    NLP solver for nonlinear model predictive control with CasADi.
    """
    def __init__(self):

        self._sol_dic = {'ipopt.print_level': 0, 'ipopt.sb': 'yes', 'print_time': 0}
        if self.tire_model == 'Fiala':
            self.X_init = [0.0, 0.0, self.psi_init, 0.0, self.u, 0.0]
            self.zero = [0., 0., 0., 0., 0., 0.]
            self.U_LOWER = [- math.pi / 9, -10]
            self.U_UPPER = [math.pi / 9, 10]
        else:
            self.X_init = [0.0, 0.0, 0.1, 0.0, 0.0]
            self.zero = [0., 0., 0., 0., 0.]
            self.U_LOWER = [- math.pi / 9]
            self.U_UPPER = [math.pi / 9]
        self.x_last = 0
        self.dynamics = VehicleDynamics()
        super(Solver, self).__init__()

    def discrete_dynamics(self, x, u):
        # one Euler step of the discrete Pacejka model; kept under a separate name
        # because __init__ stores a VehicleDynamics instance in self.dynamics
        x1 = [0.0, 0.0, 0.0, 0.0, 0.0]
        x1[0] = x[0] + self.Ts * (self.u * sin(x[2]) + x[1] * cos(x[2]))
        x1[1] = x[1] + self.Ts * (-self.D * self.F_z1 * sin(
            self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u))) * cos(u[0])
            - self.D * self.F_z2 * sin(
            self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u))) / self.m - self.u * x[3])
        x1[2] = x[2] + self.Ts * (x[3])
        x1[3] = x[3] + self.Ts * (self.a * (-self.D * self.F_z1 * sin(
            self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u)))) * cos(u[0])
            - self.b * (-self.D * self.F_z2 * sin(
            self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u)))) / self.I_zz)
        x1[4] = x[4] + self.Ts * (self.u * cos(x[2]) - x[1] * sin(x[2]))

        return x1

    def openLoopMpcSolver(self, x_init, predict_steps):
        """
        Solver of nonlinear MPC

        Parameters
        ----------
        x_init: list
            input state for MPC.
        predict_steps: int
            steps of the prediction horizon.

        Returns
        ----------
        state: np.array shape: [predict_steps, state_dimension]
            state trajectory of MPC in the whole prediction horizon.
        control: np.array shape: [predict_steps, control_dimension]
            control signal of MPC in the whole prediction horizon.
        """
        x = SX.sym('x', self.DYNAMICS_DIM)
        u = SX.sym('u', self.ACTION_DIM)

        # discrete dynamic model
        self.f = vertcat(
            x[0] + self.Ts * (self.u * sin(x[2]) + x[1] * cos(x[2])),
            x[1] + self.Ts * ((-self.D * self.F_z1 * sin(
                self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u))) * cos(u[0])
                - self.D * self.F_z2 * sin(
                self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u)))) / self.m - self.u * x[3]),
            x[2] + self.Ts * (x[3]),
            x[3] + self.Ts * ((self.a * (-self.D * self.F_z1 * sin(
                self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u)))) * cos(u[0])
                - self.b * (-self.D * self.F_z2 * sin(
                self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u))))) / self.I_zz),
            x[4] + self.Ts * (self.u * cos(x[2]) - x[1] * sin(x[2]))
        )

        # Create solver instance
        self.F = Function("F", [x, u], [self.f])

        # Create empty NLP
        w = []
        lbw = []
        ubw = []
        lbg = []
        ubg = []
        G = []
        J = 0

        # Initial conditions
        Xk = MX.sym('X0', self.DYNAMICS_DIM)
        w += [Xk]
        lbw += x_init
        ubw += x_init

        for k in range(1, predict_steps + 1):
            # Local control
            Uname = 'U' + str(k - 1)
            Uk = MX.sym(Uname, self.ACTION_DIM)
            w += [Uk]
            lbw += self.U_LOWER
            ubw += self.U_UPPER

            Fk = self.F(Xk, Uk)
            Xname = 'X' + str(k)
            Xk = MX.sym(Xname, self.DYNAMICS_DIM)

            # Dynamic constraints
            G += [Fk - Xk]
            lbg += self.zero
            ubg += self.zero
            w += [Xk]
            if self.tire_model == 'Fiala':
                lbw += [-inf, -20, -pi, -20, 0, -inf]
                ubw += [inf, 20, pi, 20, 50, inf]
            else:
                lbw += [-inf, -20, -pi, -20, -inf]
                ubw += [inf, 20, pi, 20, inf]
            F_cost = Function('F_cost', [x, u], [0.1 * (x[0] - self.a_curve * sin(self.k_curve * x[4])) ** 2
                                                 + 0.1 * (x[2] - arctan(
                                                     self.a_curve * self.k_curve * cos(self.k_curve * x[4]))) ** 2
                                                 + 0.001 * u[0] ** 2])
            J += F_cost(w[k * 2], w[k * 2 - 1])

        # Create NLP solver
        nlp = dict(f=J, g=vertcat(*G), x=vertcat(*w))
        S = nlpsol('S', 'ipopt', nlp, self._sol_dic)

        # Solve NLP
        r = S(lbx=lbw, ubx=ubw, x0=0, lbg=lbg, ubg=ubg)
        # print(r['x'])
        state_all = np.array(r['x'])
        state = np.zeros([predict_steps, self.DYNAMICS_DIM])
        control = np.zeros([predict_steps, self.ACTION_DIM])
        nt = self.DYNAMICS_DIM + self.ACTION_DIM  # total variables per step

        # save trajectories
        for i in range(predict_steps):
            state[i] = state_all[nt * i: nt * i + nt - 1].reshape(-1)
            control[i] = state_all[nt * i + nt - 1]
        return state, control

    def mpcSolver(self, x_init, predict_steps):
        """
        Solver of nonlinear MPC

        Parameters
        ----------
        x_init: list
            input state for MPC.
        predict_steps: int
            steps of the prediction horizon.

        Returns
        ----------
        state: np.array shape: [predict_steps, state_dimension]
            state trajectory of MPC in the whole prediction horizon.
        control: np.array shape: [predict_steps, control_dimension]
            control signal of MPC in the whole prediction horizon.
    def mpcSolver(self, x_init, predict_steps):
        """
        Solve the nonlinear MPC problem for the configured tire model.

        Parameters
        ----------
        x_init: list
            initial state for MPC.
        predict_steps: int
            length of the prediction horizon.

        Returns
        ----------
        state: np.array shape: [predict_steps, state_dimension]
            state trajectory over the prediction horizon (state[0] = x_init).
        control: np.array shape: [predict_steps, control_dimension]
            control sequence over the prediction horizon.
        """
        tire_model = self.tire_model
        if tire_model == 'Fiala':
            DYNAMICS_DIM = 6
            ACTION_DIM = 2
        else:
            DYNAMICS_DIM = 5
            ACTION_DIM = 1
        x = SX.sym('x', DYNAMICS_DIM)
        u = SX.sym('u', ACTION_DIM)

        # Create solver instance
        if self.tire_model == 'Fiala':
            self.f_d = vertcat(
                x[0] + self.Ts * (x[4] * sin(x[2]) + x[1] * cos(x[2])),
                # y : lateral position
                x[1] + self.Ts * (((- self.k1 * tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4])) * (
                    pow(self.k1 * tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4])), 2) / (
                        27 * pow(self.D * self.F_z1, 2)) - self.k1 * fabs(
                    tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4]))) / (
                        3 * self.D * self.F_z1) + 1)) * cos(u[0]) - self.k2 * tan(
                    arctan((x[1] - self.b * x[3]) / x[4])) * (
                    pow(self.k2 * tan(arctan((x[1] - self.b * x[3]) / x[4])), 2) / (
                        27 * pow(self.D * self.F_z2, 2)) - self.k2 * fabs(tan(
                    arctan((x[1] - self.b * x[3]) / x[4]))) / (
                        3 * self.D * self.F_z2) + 1)) / self.m - x[4] * x[3]),
                # v_y : lateral speed
                x[2] + self.Ts * (x[3]),
                # psi : heading angle
                x[3] + self.Ts * ((self.a * (- self.k1 * tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4])) * (
                    pow(self.k1 * tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4])), 2) / (
                        27 * pow(self.D * self.F_z1, 2)) - self.k1 * fabs(
                    tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4]))) / (
                        3 * self.D * self.F_z1) + 1)) * cos(u[0]) - self.b * (
                    - self.k2 * tan(arctan((x[1] - self.b * x[3]) / x[4])) * (
                    pow(self.k2 * tan(arctan((x[1] - self.b * x[3]) / x[4])), 2) / (
                        27 * pow(self.D * self.F_z2, 2)) - self.k2 * fabs(tan(
                    arctan((x[1] - self.b * x[3]) / x[4]))) / (
                        3 * self.D * self.F_z2) + 1))) / self.I_zz),
                # r : yaw rate
                x[4] + self.Ts * (
                    u[1] + x[2] * x[4] - (- self.k1 * tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4])) * (
                        pow(self.k1 * tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4])), 2) / (
                            27 * pow(self.D * self.F_z1, 2)) - self.k1 * fabs(
                        tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4]))) / (
                            3 * self.D * self.F_z1) + 1)) * sin(u[0]) / self.m),
                # v_x : longitudinal speed
                x[5] + self.Ts * (x[4])
                # x : longitudinal position
            )
            self.F = Function("F", [x, u], [self.f_d])
        elif self.tire_model == 'Pacejka':
            # discrete dynamic model
            self.f = vertcat(
                x[0] + self.Ts * (self.u * sin(x[2]) + x[1] * cos(x[2])),
                x[1] + self.Ts * ((-self.D * self.F_z1 * sin(
                    self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u))) * cos(u[0])
                    - self.D * self.F_z2 * sin(
                    self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u)))) / self.m - self.u * x[3]),
                x[2] + self.Ts * (x[3]),
                x[3] + self.Ts * ((self.a * (-self.D * self.F_z1 * sin(
                    self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u)))) * cos(u[0])
                    - self.b * (-self.D * self.F_z2 * sin(
                    self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u))))) / self.I_zz),
                x[4] + self.Ts * (self.u * cos(x[2]) - x[1] * sin(x[2]))
            )
            self.F = Function("F", [x, u], [self.f])
        elif self.tire_model == 'Linear':
            # linear tire model; the yaw-rate row now uses the same linear
            # cornering forces as the lateral-velocity row (the original
            # mixed Pacejka forces into this branch, which was inconsistent)
            self.f = vertcat(
                x[0] + self.Ts * (self.u * sin(x[2]) + x[1] * cos(x[2])),
                x[1] + self.Ts * ((-self.k1 * (-u[0] + (x[1] + self.a * x[3]) / self.u) * cos(u[0])
                                   - self.k2 * ((x[1] - self.b * x[3]) / self.u)) / self.m - self.u * x[3]),
                x[2] + self.Ts * (x[3]),
                x[3] + self.Ts * ((self.a * (-self.k1 * (-u[0] + (x[1] + self.a * x[3]) / self.u)) * cos(u[0])
                                   - self.b * (-self.k2 * ((x[1] - self.b * x[3]) / self.u))) / self.I_zz),
                x[4] + self.Ts * (self.u * cos(x[2]) - x[1] * sin(x[2]))
            )
            self.F = Function("F", [x, u], [self.f])

        # Create empty NLP
        w = []
        lbw = []
        ubw = []
        lbg = []
        ubg = []
        G = []
        J = 0

        # Initial conditions
        Xk = MX.sym('X0', DYNAMICS_DIM)
        w += [Xk]
        lbw += x_init
        ubw += x_init

        for k in range(1, predict_steps + 1):
            # Local control
            Uname = 'U' + str(k - 1)
            Uk = MX.sym(Uname, ACTION_DIM)
            w += [Uk]
            lbw += self.U_LOWER
            ubw += self.U_UPPER

            Fk = self.F(Xk, Uk)
            Xname = 'X' + str(k)
            Xk = MX.sym(Xname, DYNAMICS_DIM)

            # Dynamic constraints
            G += [Fk - Xk]
            lbg += self.zero
            ubg += self.zero
            w += [Xk]
            if self.tire_model == 'Fiala':
                lbw += [-inf, -20, -pi, -20, 0, -inf]
                ubw += [inf, 20, pi, 20, 50, inf]
            else:
                lbw += [-inf, -20, -pi, -20, -inf]
                ubw += [inf, 20, pi, 20, inf]

            # Cost function
            if tire_model == 'Fiala':
                F_cost = Function('F_cost', [x, u], [6 * (x[0]) ** 2
                                                     + 0.2 * (x[4] - self.u) ** 2
                                                     + 80 * u[0] ** 2
                                                     + 0.3 * u[1] ** 2])
            else:
                F_cost = Function('F_cost', [x, u], [1 * (x[0]) ** 2
                                                     + 1 * (x[2]) ** 2
                                                     + 1 * u[0] ** 2])
            J += F_cost(w[k * 2], w[k * 2 - 1])

        # Create NLP solver
        nlp = dict(f=J, g=vertcat(*G), x=vertcat(*w))
        S = nlpsol('S', 'ipopt', nlp, self._sol_dic)

        # Solve NLP
        r = S(lbx=lbw, ubx=ubw, x0=0, lbg=lbg, ubg=ubg)
        state_all = np.array(r['x'])
        state = np.zeros([predict_steps, DYNAMICS_DIM])
        control = np.zeros([predict_steps, ACTION_DIM])
        nt = DYNAMICS_DIM + ACTION_DIM  # total variables per step

        # save trajectories
        for i in range(predict_steps):
            state[i] = state_all[nt * i: nt * i + DYNAMICS_DIM].reshape(-1)
            control[i] = state_all[nt * i + DYNAMICS_DIM: nt * (i + 1)].reshape(-1)
        return state, control
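# A minimal receding-horizon sketch (added; not part of the original file).
# It assumes DynamicsConfig provides the tire_model, Ts and vehicle
# parameters used above, and it rolls the solver's own prediction forward in
# place of a real plant measurement.
if __name__ == "__main__":
    solver = Solver()
    x = list(solver.X_init)
    for _ in range(50):
        state, control = solver.mpcSolver(x, predict_steps=10)
        u_apply = control[0]   # apply only the first control of the horizon
        x = list(state[1])     # next predicted state stands in for a measurement
    print("final state:", x)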
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
import numpy as np
import torch
import os
from matplotlib import pyplot as plt
from config import GeneralConfig, DynamicsConfig
from dynamics import VehicleDynamics
from utils import step_relative


class Train(DynamicsConfig):
    def __init__(self):
        super(Train, self).__init__()

        self.agent_batch = torch.empty([self.BATCH_SIZE, self.DYNAMICS_DIM])
        self.state_batch = torch.empty([self.BATCH_SIZE, self.STATE_DIM])
        self.init_index = np.ones([self.BATCH_SIZE, 1])

        self.x_forward = []
        self.u_forward = []
        self.L_forward = []
        self.iteration_index = 0

        self.value_loss = np.empty([0, 1])
        self.policy_loss = np.empty([0, 1])
        self.dynamics = VehicleDynamics()
        self.equilibrium_state = torch.tensor([[0.0, 0.0, 0.0, 0.0]])

        for i in range(self.FORWARD_STEP):
            self.u_forward.append([])
            self.L_forward.append([])
        for i in range(self.FORWARD_STEP + 1):
            self.x_forward.append([])
        self.initialize_state()

    def initialize_state(self):
        self.state_batch[:, 0] = torch.normal(0.0, 0.3, [self.BATCH_SIZE, ])
        self.state_batch[:, 1] = torch.normal(0.0, 0.2, [self.BATCH_SIZE, ])
        self.state_batch[:, 2] = torch.normal(0.0, 0.1, [self.BATCH_SIZE, ])
        self.state_batch[:, 3] = torch.normal(0.0, 0.06, [self.BATCH_SIZE, ])
        self.agent_batch[:, 4] = torch.linspace(0.0, np.pi, self.BATCH_SIZE)
        init_ref = self.dynamics.reference_trajectory(self.agent_batch[:, 4])
        self.agent_batch[:, 0:4] = self.state_batch + init_ref
        # keep an independent copy so later in-place updates of agent_batch
        # cannot corrupt the stored initial states (the original aliased them)
        self.init_state = self.agent_batch.clone()

    def check_done(self, state):
        """
        Check whether states have left the reasonable region, and reset
        those that have.

        Parameters
        ----------
        state: tensor shape: [BATCH_SIZE, STATE_DIMENSION]
            states to check.

        Returns
        -------
        reset_state: states after resetting the out-of-range entries.

        """
        threshold = np.kron(np.ones([self.BATCH_SIZE, 1]), np.array([self.y_range, self.psi_range]))
        threshold = np.array(threshold, dtype='float32')
        threshold = torch.from_numpy(threshold)
        ref_state = self.dynamics.reference_trajectory(state[:, -1])
        state = state[:, 0:4] - ref_state
        check_state = state[:, [0, 2]].clone()
        check_state.detach_()
        sign_error = torch.sign(torch.abs(check_state) - threshold)  # sign_error = 1 if |state| exceeds the threshold
        self._reset_index, _ = torch.max(sign_error, 1)  # _reset_index = 1 if any state exceeds its threshold
        if self.iteration_index == self.RESET_ITERATION:
            self._reset_index = torch.from_numpy(np.ones([self.BATCH_SIZE], dtype='float32'))
            self.iteration_index = 0
            print('AGENT RESET')
        reset_state = self._reset_state(self.agent_batch)
        return reset_state

    def _reset_state(self, state):
        """
        Reset flagged samples to their initial states.

        Parameters
        ----------
        state: tensor shape: [BATCH_SIZE, STATE_DIMENSION]
            states to reset.

        Returns
        -------
        state: states after the reset.

        """
        for i in range(self.BATCH_SIZE):
            if self._reset_index[i] == 1:
                state[i, :] = self.init_state[i, :]
        return state

    def update_state(self, policy, dynamics):
        """
        Update the state batch using the policy net and the dynamics model.

        Parameters
        ----------
        policy: nn.Module
            policy net.
        dynamics: VehicleDynamics
            vehicle dynamics model.

        """
        self.agent_batch = self.check_done(self.agent_batch)
        self.agent_batch.detach_()
        ref_trajectory = dynamics.reference_trajectory(self.agent_batch[:, -1])
        self.state_batch = self.agent_batch[:, 0:4] - ref_trajectory
        control = policy.forward(self.state_batch)
        self.agent_batch, self.state_batch = dynamics.step_relative(self.agent_batch, control)
        self.iteration_index += 1
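    # Added note (not in the original): policy_evaluation below fits the
    # value net to an n-step bootstrapped target,
    #     V_target(s_0) = sum_{i=0}^{N-1} l(s_i, u_i) + V(s_N),  N = FORWARD_STEP,
    # plus a penalty that pins the value of the equilibrium state to zero.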
    def policy_evaluation(self, policy, value, dynamics):
        """
        Do n-step look-ahead policy evaluation.

        Parameters
        ----------
        policy: policy net
        value: value net
        dynamics: vehicle dynamics model

        """
        for i in range(self.FORWARD_STEP):
            if i == 0:
                # store the absolute (agent) state, since the relative-state
                # update needs it
                self.x_forward[i] = self.agent_batch.detach()
                reference = dynamics.reference_trajectory(self.agent_batch[:, -1])
                self.state_batch = dynamics.relative_state(self.x_forward[i])
                self.u_forward[i] = policy.forward(self.state_batch)
                self.x_forward[i + 1], _, _, _, _, _, _ = dynamics.step(self.x_forward[i], self.u_forward[i])
                ref_state_next = self.x_forward[i + 1][:, 0:4] - reference
                self.L_forward[i] = dynamics.utility(ref_state_next, self.u_forward[i])
            else:
                # note: the reference computed at the first step is reused
                # for the whole rollout
                ref_state = self.x_forward[i][:, 0:4] - reference
                self.u_forward[i] = policy.forward(ref_state)
                self.x_forward[i + 1], _, _, _, _, _, _ = dynamics.step(self.x_forward[i],
                                                                        self.u_forward[i])
                ref_state_next = self.x_forward[i + 1][:, 0:4] - reference
                self.L_forward[i] = dynamics.utility(ref_state_next, self.u_forward[i])
        self.agent_batch_next = self.x_forward[-1]
        self.state_batch_next = self.agent_batch_next[:, 0:4] - reference
        self.value_next = value.forward(self.state_batch_next)
        self.utility = torch.zeros([self.FORWARD_STEP, self.BATCH_SIZE], dtype=torch.float32)
        for i in range(self.FORWARD_STEP):
            self.utility[i] = self.L_forward[i].clone()
        self.sum_utility = torch.sum(self.utility, 0)
        target_value = self.sum_utility.detach() + self.value_next.detach()
        value_now = value.forward(self.state_batch)
        value_equilibrium = value.forward(self.equilibrium_state)
        # fit the value net to the n-step target and pin the equilibrium
        # state's value to zero
        value_loss = 1 / 2 * torch.mean(torch.pow((target_value - value_now), 2)) \
                     + 10 * torch.pow(value_equilibrium, 2)
        self.state_batch.requires_grad_(False)
        value.zero_grad()
        value_loss.backward()
        torch.nn.utils.clip_grad_norm_(value.parameters(), 10.0)
        value.opt.step()
        value.scheduler.step()
        self.value_loss = np.append(self.value_loss, value_loss.detach().numpy())
        return value_loss.detach().numpy()

    def policy_improvement(self, policy, value):
        """
        Do n-step look-ahead policy improvement.

        Parameters
        ----------
        policy: policy net
        value: value net

        """
        self.value_next = value.forward(self.state_batch_next)
        # n-step cost plus terminal value, i.e. an approximate Hamiltonian
        policy_loss = torch.mean(self.sum_utility + self.value_next)
        policy.zero_grad()
        policy_loss.backward()
        torch.nn.utils.clip_grad_norm_(policy.parameters(), 10.0)
        policy.opt.step()
        policy.scheduler.step()
        self.policy_loss = np.append(self.policy_loss, policy_loss.detach().numpy())
        return policy_loss.detach().numpy()
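    # Added note (not in the original): one ADP iteration is
    # policy_evaluation (critic update) followed by policy_improvement
    # (actor update); the actor's gradient flows through the differentiable
    # vehicle-model rollout rather than through sampled returns.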
    def save_data(self, log_dir):
        """
        Save loss data.

        Parameters
        ----------
        log_dir: str
            directory in ./Results_dir.

        """
        np.savetxt(os.path.join(log_dir, "value_loss.txt"), self.value_loss)
        np.savetxt(os.path.join(log_dir, "policy_loss.txt"), self.policy_loss)

    def print_loss_figure(self, iteration, log_dir):
        """
        Plot the loss-descent curves.

        Parameters
        ----------
        iteration: int
            number of iterations.
        log_dir: str
            directory in ./Results_dir.

        """
        plt.figure()
        plt.scatter(range(iteration), np.log10(self.value_loss), c='r', marker=".", s=5., label="policy evaluation")
        plt.scatter(range(iteration), np.log10(self.policy_loss), c='b', marker=".", s=5., label="policy improvement")
        plt.legend(loc='upper right')
        plt.xlabel('iteration')
        plt.ylabel('log10(loss)')
        plt.savefig(os.path.join(log_dir, "loss.png"))
--------------------------------------------------------------------------------
/trained_results/2020-10-09-14-42-10000/actor.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mahaitongdae/Approximate-Dynamic-Programming/f03f9469f5018c55f06cf8fea2d5f629abc4078b/trained_results/2020-10-09-14-42-10000/actor.pth
--------------------------------------------------------------------------------
/trained_results/2020-10-09-14-42-10000/critic.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mahaitongdae/Approximate-Dynamic-Programming/f03f9469f5018c55f06cf8fea2d5f629abc4078b/trained_results/2020-10-09-14-42-10000/critic.pth
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
from matplotlib import pyplot as plt
import matplotlib.colors as mcolors
from itertools import cycle
from config import PlotConfig, DynamicsConfig
import numpy as np
import torch

def cm2inch(*tupl):
    inch = 2.54
    if isinstance(tupl[0], tuple):
        return tuple(i / inch for i in tupl[0])
    else:
        return tuple(i / inch for i in tupl)

def smooth(data, a=0.5):
    # cast to float so integer inputs are not truncated by the filter below
    data = np.array(data, dtype=np.float64).reshape(-1, 1)
    for ind in range(data.shape[0] - 1):
        data[ind + 1, 0] = data[ind, 0] * (1 - a) + data[ind + 1, 0] * a
    return data

def numpy2torch(input, size):
    """
    Convert array-like input to a float32 torch tensor with the given shape.

    Parameters
    ----------
    input: array-like data.
    size: target shape of the tensor.

    Returns
    -------
    torch.Tensor of shape `size`.

    """
    u = np.array(input, dtype='float32').reshape(size)
    return torch.from_numpy(u)

def step_relative(statemodel, state, u):
    """
    Step the dynamics once and return both the absolute and the
    reference-relative next states.

    Parameters
    ----------
    statemodel: vehicle dynamics model.
    state: absolute state batch.
    u: control batch.

    Returns
    -------
    state_next, state_r_next, x_ref (all detached copies).

    """
    x_ref = statemodel.reference_trajectory(state[:, -1])
    state_r = state.detach().clone()  # relative state
    state_r[:, 0:4] = state_r[:, 0:4] - x_ref
    state_next, deri_state, utility, F_y1, F_y2, alpha_1, alpha_2 = statemodel.step(state, u)
    state_r_next_bias, _, _, _, _, _, _ = statemodel.step(state_r, u)  # update by relative value
    state_r_next = state_r_next_bias.detach().clone()
    state_r_next_bias[:, [0, 2]] = state_next[:, [0, 2]]  # y and psi with reference update by absolute value
    x_ref_next = statemodel.reference_trajectory(state_next[:, -1])
    state_r_next[:, 0:4] = state_r_next_bias[:, 0:4] - x_ref_next
    return state_next.clone().detach(), state_r_next.clone().detach(), x_ref.detach().clone()

def recover_absolute_state(state_r_predict, x_ref, length=None):
    if length is None:
        length = state_r_predict.shape[0]
    c = DynamicsConfig()
    ref_predict = [x_ref]
    for i in range(length - 1):
        ref_t = np.copy(ref_predict[-1])
        # ref_t[0] += c.u * c.Ts * np.tan(x_ref[2])
        ref_predict.append(ref_t)
    state = state_r_predict[:, 0:4] + ref_predict
    return state, np.array(ref_predict)
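# Added worked example (not in the original): smooth() is a first-order
# exponential moving average, y[k+1] = (1 - a) * y[k] + a * x[k+1];
# with the default a = 0.5, smooth([0., 1., 1.]) returns [0.0, 0.5, 0.75].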
def idplot(data,
           figure_num=1,
           mode="xy",
           fname=None,
           xlabel=None,
           ylabel=None,
           legend=None,
           legend_loc="best",
           color_list=None,
           xlim=None,
           ylim=None,
           ncol=1,
           figsize_scalar=1):
    """
    Plot one or more curves or scatter series using the settings in PlotConfig.
    """
    if (color_list is None) or len(color_list) < figure_num:
        tableau_colors = cycle(mcolors.TABLEAU_COLORS)
        color_list = [next(tableau_colors) for _ in range(figure_num)]

    num_lines = 5  # hard-coded: the last two of five curves get special linestyles
    fig_size = (PlotConfig.fig_size * figsize_scalar, PlotConfig.fig_size * figsize_scalar)
    _, ax = plt.subplots(figsize=cm2inch(*fig_size), dpi=PlotConfig.dpi)
    if figure_num == 1:
        data = [data]

    # color_list is always filled in above, so the original's duplicated
    # "color_list is None" branch was unreachable and has been removed
    for (i, d) in enumerate(data):
        if mode == "xy":
            if i == num_lines - 2:
                plt.plot(d[0], d[1], linestyle='-.', color=color_list[i])
            elif i == num_lines - 1:
                plt.plot(d[0], d[1], linestyle='dotted', color=color_list[i])
            else:
                plt.plot(d[0], d[1], color=color_list[i])
        elif mode == "y":
            plt.plot(d, color=color_list[i])
        elif mode == "scatter":
            plt.scatter(d[0], d[1], color=color_list[i], marker=".", s=5.)

    plt.tick_params(labelsize=PlotConfig.tick_size)
    labels = ax.get_xticklabels() + ax.get_yticklabels()
    [label.set_fontname(PlotConfig.tick_label_font) for label in labels]
    if legend is not None:
        plt.legend(legend, loc=legend_loc, ncol=ncol, prop=PlotConfig.legend_font)
    plt.xlabel(xlabel, PlotConfig.label_font)
    plt.ylabel(ylabel, PlotConfig.label_font)
    if xlim is not None:
        plt.xlim(xlim)
    if ylim is not None:
        plt.ylim(ylim)
    plt.tight_layout(pad=PlotConfig.pad)

    if fname is None:
        plt.show()
    else:
        plt.savefig(fname)
--------------------------------------------------------------------------------
/utils/iDplot.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mahaitongdae/Approximate-Dynamic-Programming/f03f9469f5018c55f06cf8fea2d5f629abc4078b/utils/iDplot.zip
--------------------------------------------------------------------------------
/utils/road.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mahaitongdae/Approximate-Dynamic-Programming/f03f9469f5018c55f06cf8fea2d5f629abc4078b/utils/road.png
--------------------------------------------------------------------------------
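A minimal usage sketch for utils.idplot (added commentary; it assumes the
PlotConfig fields referenced in utils.py are available):

    import numpy as np
    from utils import idplot

    t = np.linspace(0, 10, 200)
    idplot([[t, np.sin(t)], [t, np.cos(t)]],
           figure_num=2, mode="xy",
           xlabel="time [s]", ylabel="signal",
           legend=["sin", "cos"])  # fname=None, so the figure is shown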