├── .gitignore
├── LICENSE
├── README.md
├── config.py
├── dynamics.py
├── main.py
├── network.py
├── plot.py
├── requirements.txt
├── simulation.py
├── solver.py
├── train.py
├── trained_results
    └── 2020-10-09-14-42-10000
    │   ├── actor.pth
    │   ├── critic.pth
    │   ├── policy_loss.txt
    │   └── value_loss.txt
├── utils.py
└── utils
    ├── iDplot.zip
    └── road.png


/.gitignore:
--------------------------------------------------------------------------------
1 | /docs/
2 | /.idea/
3 | /Results_dir/
4 | /Simulation_dir/
5 | /ref/
6 | /PLOT.rar/
7 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
  1 |                                  Apache License
  2 |                            Version 2.0, January 2004
  3 |                         http://www.apache.org/licenses/
  4 | 
  5 |    TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
  6 | 
  7 |    1. Definitions.
  8 | 
  9 |       "License" shall mean the terms and conditions for use, reproduction,
 10 |       and distribution as defined by Sections 1 through 9 of this document.
 11 | 
 12 |       "Licensor" shall mean the copyright owner or entity authorized by
 13 |       the copyright owner that is granting the License.
 14 | 
 15 |       "Legal Entity" shall mean the union of the acting entity and all
 16 |       other entities that control, are controlled by, or are under common
 17 |       control with that entity. For the purposes of this definition,
 18 |       "control" means (i) the power, direct or indirect, to cause the
 19 |       direction or management of such entity, whether by contract or
 20 |       otherwise, or (ii) ownership of fifty percent (50%) or more of the
 21 |       outstanding shares, or (iii) beneficial ownership of such entity.
 22 | 
 23 |       "You" (or "Your") shall mean an individual or Legal Entity
 24 |       exercising permissions granted by this License.
 25 | 
 26 |       "Source" form shall mean the preferred form for making modifications,
 27 |       including but not limited to software source code, documentation
 28 |       source, and configuration files.
 29 | 
 30 |       "Object" form shall mean any form resulting from mechanical
 31 |       transformation or translation of a Source form, including but
 32 |       not limited to compiled object code, generated documentation,
 33 |       and conversions to other media types.
 34 | 
 35 |       "Work" shall mean the work of authorship, whether in Source or
 36 |       Object form, made available under the License, as indicated by a
 37 |       copyright notice that is included in or attached to the work
 38 |       (an example is provided in the Appendix below).
 39 | 
 40 |       "Derivative Works" shall mean any work, whether in Source or Object
 41 |       form, that is based on (or derived from) the Work and for which the
 42 |       editorial revisions, annotations, elaborations, or other modifications
 43 |       represent, as a whole, an original work of authorship. For the purposes
 44 |       of this License, Derivative Works shall not include works that remain
 45 |       separable from, or merely link (or bind by name) to the interfaces of,
 46 |       the Work and Derivative Works thereof.
 47 | 
 48 |       "Contribution" shall mean any work of authorship, including
 49 |       the original version of the Work and any modifications or additions
 50 |       to that Work or Derivative Works thereof, that is intentionally
 51 |       submitted to Licensor for inclusion in the Work by the copyright owner
 52 |       or by an individual or Legal Entity authorized to submit on behalf of
 53 |       the copyright owner. For the purposes of this definition, "submitted"
 54 |       means any form of electronic, verbal, or written communication sent
 55 |       to the Licensor or its representatives, including but not limited to
 56 |       communication on electronic mailing lists, source code control systems,
 57 |       and issue tracking systems that are managed by, or on behalf of, the
 58 |       Licensor for the purpose of discussing and improving the Work, but
 59 |       excluding communication that is conspicuously marked or otherwise
 60 |       designated in writing by the copyright owner as "Not a Contribution."
 61 | 
 62 |       "Contributor" shall mean Licensor and any individual or Legal Entity
 63 |       on behalf of whom a Contribution has been received by Licensor and
 64 |       subsequently incorporated within the Work.
 65 | 
 66 |    2. Grant of Copyright License. Subject to the terms and conditions of
 67 |       this License, each Contributor hereby grants to You a perpetual,
 68 |       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 69 |       copyright license to reproduce, prepare Derivative Works of,
 70 |       publicly display, publicly perform, sublicense, and distribute the
 71 |       Work and such Derivative Works in Source or Object form.
 72 | 
 73 |    3. Grant of Patent License. Subject to the terms and conditions of
 74 |       this License, each Contributor hereby grants to You a perpetual,
 75 |       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 76 |       (except as stated in this section) patent license to make, have made,
 77 |       use, offer to sell, sell, import, and otherwise transfer the Work,
 78 |       where such license applies only to those patent claims licensable
 79 |       by such Contributor that are necessarily infringed by their
 80 |       Contribution(s) alone or by combination of their Contribution(s)
 81 |       with the Work to which such Contribution(s) was submitted. If You
 82 |       institute patent litigation against any entity (including a
 83 |       cross-claim or counterclaim in a lawsuit) alleging that the Work
 84 |       or a Contribution incorporated within the Work constitutes direct
 85 |       or contributory patent infringement, then any patent licenses
 86 |       granted to You under this License for that Work shall terminate
 87 |       as of the date such litigation is filed.
 88 | 
 89 |    4. Redistribution. You may reproduce and distribute copies of the
 90 |       Work or Derivative Works thereof in any medium, with or without
 91 |       modifications, and in Source or Object form, provided that You
 92 |       meet the following conditions:
 93 | 
 94 |       (a) You must give any other recipients of the Work or
 95 |           Derivative Works a copy of this License; and
 96 | 
 97 |       (b) You must cause any modified files to carry prominent notices
 98 |           stating that You changed the files; and
 99 | 
100 |       (c) You must retain, in the Source form of any Derivative Works
101 |           that You distribute, all copyright, patent, trademark, and
102 |           attribution notices from the Source form of the Work,
103 |           excluding those notices that do not pertain to any part of
104 |           the Derivative Works; and
105 | 
106 |       (d) If the Work includes a "NOTICE" text file as part of its
107 |           distribution, then any Derivative Works that You distribute must
108 |           include a readable copy of the attribution notices contained
109 |           within such NOTICE file, excluding those notices that do not
110 |           pertain to any part of the Derivative Works, in at least one
111 |           of the following places: within a NOTICE text file distributed
112 |           as part of the Derivative Works; within the Source form or
113 |           documentation, if provided along with the Derivative Works; or,
114 |           within a display generated by the Derivative Works, if and
115 |           wherever such third-party notices normally appear. The contents
116 |           of the NOTICE file are for informational purposes only and
117 |           do not modify the License. You may add Your own attribution
118 |           notices within Derivative Works that You distribute, alongside
119 |           or as an addendum to the NOTICE text from the Work, provided
120 |           that such additional attribution notices cannot be construed
121 |           as modifying the License.
122 | 
123 |       You may add Your own copyright statement to Your modifications and
124 |       may provide additional or different license terms and conditions
125 |       for use, reproduction, or distribution of Your modifications, or
126 |       for any such Derivative Works as a whole, provided Your use,
127 |       reproduction, and distribution of the Work otherwise complies with
128 |       the conditions stated in this License.
129 | 
130 |    5. Submission of Contributions. Unless You explicitly state otherwise,
131 |       any Contribution intentionally submitted for inclusion in the Work
132 |       by You to the Licensor shall be under the terms and conditions of
133 |       this License, without any additional terms or conditions.
134 |       Notwithstanding the above, nothing herein shall supersede or modify
135 |       the terms of any separate license agreement you may have executed
136 |       with Licensor regarding such Contributions.
137 | 
138 |    6. Trademarks. This License does not grant permission to use the trade
139 |       names, trademarks, service marks, or product names of the Licensor,
140 |       except as required for reasonable and customary use in describing the
141 |       origin of the Work and reproducing the content of the NOTICE file.
142 | 
143 |    7. Disclaimer of Warranty. Unless required by applicable law or
144 |       agreed to in writing, Licensor provides the Work (and each
145 |       Contributor provides its Contributions) on an "AS IS" BASIS,
146 |       WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 |       implied, including, without limitation, any warranties or conditions
148 |       of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 |       PARTICULAR PURPOSE. You are solely responsible for determining the
150 |       appropriateness of using or redistributing the Work and assume any
151 |       risks associated with Your exercise of permissions under this License.
152 | 
153 |    8. Limitation of Liability. In no event and under no legal theory,
154 |       whether in tort (including negligence), contract, or otherwise,
155 |       unless required by applicable law (such as deliberate and grossly
156 |       negligent acts) or agreed to in writing, shall any Contributor be
157 |       liable to You for damages, including any direct, indirect, special,
158 |       incidental, or consequential damages of any character arising as a
159 |       result of this License or out of the use or inability to use the
160 |       Work (including but not limited to damages for loss of goodwill,
161 |       work stoppage, computer failure or malfunction, or any and all
162 |       other commercial damages or losses), even if such Contributor
163 |       has been advised of the possibility of such damages.
164 | 
165 |    9. Accepting Warranty or Additional Liability. While redistributing
166 |       the Work or Derivative Works thereof, You may choose to offer,
167 |       and charge a fee for, acceptance of support, warranty, indemnity,
168 |       or other liability obligations and/or rights consistent with this
169 |       License. However, in accepting such obligations, You may act only
170 |       on Your own behalf and on Your sole responsibility, not on behalf
171 |       of any other Contributor, and only if You agree to indemnify,
172 |       defend, and hold each Contributor harmless for any liability
173 |       incurred by, or claims asserted against, such Contributor by reason
174 |       of your accepting any such warranty or additional liability.
175 | 
176 |    END OF TERMS AND CONDITIONS
177 | 
178 |    APPENDIX: How to apply the Apache License to your work.
179 | 
180 |       To apply the Apache License to your work, attach the following
181 |       boilerplate notice, with the fields enclosed by brackets "[]"
182 |       replaced with your own identifying information. (Don't include
183 |       the brackets!)  The text should be enclosed in the appropriate
184 |       comment syntax for the file format. We also recommend that a
185 |       file or class name and description of purpose be included on the
186 |       same "printed page" as the copyright notice for easier
187 |       identification within third-party archives.
188 | 
189 |    Copyright [2020] [Haitong Ma]
190 | 
191 |    Licensed under the Apache License, Version 2.0 (the "License");
192 |    you may not use this file except in compliance with the License.
193 |    You may obtain a copy of the License at
194 | 
195 |        http://www.apache.org/licenses/LICENSE-2.0
196 | 
197 |    Unless required by applicable law or agreed to in writing, software
198 |    distributed under the License is distributed on an "AS IS" BASIS,
199 |    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 |    See the License for the specific language governing permissions and
201 |    limitations under the License.
202 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Vehicle Tracking Control
 2 | 
 3 | - Code demo for Chpater 8, Reinforcement Learning and Control.
 4 | 
 5 | - Methods: Approximate Dynamic Programming, Model Predictive Control
 6 | 
 7 | <div align=center>
 8 | <img src="utils/road.png" width = 50%/>
 9 | </div>
10 | 
11 | ## Requirements
12 | 
13 | [PyTorch](https://pytorch.org/get-started/previous-versions/)  1.4.0
14 | 
15 | [CasADi](https://web.casadi.org/get/)
16 | 
17 | 
18 | ## Getting Started
19 | 
20 | - To train an agent, follow the example code in `main.py` and tune the parameters. Change `METHODS` variable for adjusting the methods to compare in simulation stage.
21 | - Simulations will automatically executed after the training is finished. To separately start a simulation from a trained results and compare the performance between ADP and MPC, run `simulation.py`. Change `LOG_DIR` variable to set the loaded results.
22 | 
23 | ## Directory Structure
24 | 
25 | ```
26 | Approximate-Dynamic-Programming
27 | │  main.py - Main script
28 | │  plot.py - To plot comparison between ADP and MPC
29 | │  train.py - To execute PEV and PIM
30 | │  dynamics.py - Vehicle model
31 | │  network.py - Network structure
32 | │  solver.py - Solvers for MPC using CasADi
33 | │  config.py - Configurations about training and vehicle model
34 | │  simulation.py - Run experiment to compare ADP and MPC
35 | │  readme.md
36 | │  requirements.txt
37 | │
38 | ├─Results_dir - store trained results
39 | │     
40 | └─Simulation_dir - store simulation data and plots
41 | 
42 | ```
43 | ## Related Books and Papers
44 | [Reinforcement Learning and Control. Tsinghua University
45 | Lecture Notes, 2020.](http://www.idlab-tsinghua.com/thulab/labweb/publications.html?typeId=3&_types)
46 | 
47 | [CasADi: a software framework for nonlinear optimization and optimal control](https://link.springer.com/article/10.1007/s12532-018-0139-4)
48 | 
49 | 
50 | 
51 | 


--------------------------------------------------------------------------------
/config.py:
--------------------------------------------------------------------------------
 1 | from __future__ import print_function
 2 | class GeneralConfig(object):
 3 |     BATCH_SIZE = 256
 4 |     DYNAMICS_DIM = 5
 5 |     STATE_DIM = 4
 6 |     ACTION_DIM = 1
 7 |     BUFFER_SIZE = 5000
 8 |     FORWARD_STEP = 20
 9 |     GAMMA_D = 1
10 |     RESET_ITERATION = 10000
11 | 
12 |     NP = 50
13 |     NP_TOTAL = 500
14 | 
15 |     SIMULATION_STEPS = 500
16 | 
17 | 
18 | class DynamicsConfig(GeneralConfig):
19 | 
20 |     nonlinearity = True
21 |     tire_model = 'Pacejka'    # Fiala, Pacejka, Linear
22 |     reference_traj = 'SIN'
23 | 
24 |     a = 1.14       # distance c.g.to front axle(m)
25 |     L = 2.54       # wheel base(m)
26 |     b = L - a      # distance c.g.to rear axle(m)
27 |     m = 1500.      # mass(kg)
28 |     I_zz = 2420.0  # yaw moment of inertia(kg * m ^ 2)
29 |     C = 1.43       # parameter in Pacejka tire model
30 |     B = 14.        # parameter in Pacejka tire model
31 |     u = 15         # longitudinal velocity(m / s)
32 |     g = 9.81
33 |     D = 0.75
34 |     k1 = 88000    # front axle cornering stiffness for linear model (N / rad)
35 |     k2 = 94000    # rear axle cornering stiffness for linear model (N / rad)
36 |     Is = 1.        # steering ratio
37 |     Ts = 0.05      # control signal period
38 |     N = 314        # total simulation steps
39 | 
40 |     F_z1 = m * g * b / L    # Vertical force on front axle
41 |     F_z2 = m * g * a / L    # Vertical force on rear axle
42 | 
43 |     k_curve = 1/30          # curve shape of a * sin(kx)
44 |     a_curve = 1           # curve shape of a * sin(kx)
45 |     psi_init = a_curve * k_curve # initial position of psi
46 | 
47 |     # ADP reset state range
48 |     y_range = 5
49 |     psi_range = 1.3
50 |     beta_range = 1.0
51 | 
52 | class PlotConfig(object):
53 |     fig_size = (8.5, 6.5)
54 |     dpi = 300
55 |     pad = 0.2
56 |     tick_size = 8
57 |     legend_font = {'family': 'Times New Roman', 'size': '8', 'weight': 'normal'}
58 |     label_font = {'family': 'Times New Roman', 'size': '9', 'weight': 'normal'}
59 |     tick_label_font = 'Times New Roman'
60 | 


--------------------------------------------------------------------------------
/dynamics.py:
--------------------------------------------------------------------------------
  1 | from __future__ import print_function
  2 | import torch
  3 | import numpy as np
  4 | from config import DynamicsConfig
  5 | import matplotlib.pyplot as plt
  6 | import math
  7 | 
  8 | PI = 3.1415926
  9 | 
 10 | 
 11 | class VehicleDynamics(DynamicsConfig):
 12 | 
 13 |     def __init__(self):
 14 |         self._state = torch.zeros([self.BATCH_SIZE, self.DYNAMICS_DIM])
 15 |         self.init_state = torch.zeros([self.BATCH_SIZE, self.DYNAMICS_DIM])
 16 |         self._reset_index = np.zeros([self.BATCH_SIZE, 1])
 17 |         self.initialize_state()
 18 |         super(VehicleDynamics, self).__init__()
 19 | 
 20 |     def initialize_state(self):
 21 |         """
 22 |         random initialization of state.
 23 | 
 24 |         Returns
 25 |         -------
 26 | 
 27 |         """
 28 |         self.init_state[:, 0] = torch.normal(0.0, 0.6, [self.BATCH_SIZE,])
 29 |         self.init_state[:, 1] = torch.normal(0.0, 0.4, [self.BATCH_SIZE,])
 30 |         self.init_state[:, 2] = torch.normal(0.0, 0.15, [self.BATCH_SIZE,])
 31 |         self.init_state[:, 3] = torch.normal(0.0, 0.1, [self.BATCH_SIZE,])
 32 |         self.init_state[:, 4] = torch.linspace(0.0, np.pi, self.BATCH_SIZE)
 33 |         init_ref = self.reference_trajectory(self.init_state[:, 4])
 34 |         init_ref_all = torch.cat((init_ref, torch.zeros([self.BATCH_SIZE,1])),1)
 35 |         self._state = self.init_state
 36 |         init_state = self.init_state + init_ref_all
 37 |         return init_state
 38 | 
 39 |     def relative_state(self, state):
 40 |         x_ref = self.reference_trajectory(state[:, -1])
 41 |         state_r = state.detach().clone()[:, 0:4] - x_ref  # relative state # todo:修改所有相对坐标更新
 42 |         return state_r
 43 | 
 44 |     def _state_function(self, state, control):
 45 |         """
 46 |         State function of vehicle with Pacejka tire model, i.e. \dot(x)=f(x,u)
 47 |         Parameters
 48 |         ----------
 49 |         state: tensor   shape: [BATCH_SIZE, STATE_DIMENSION]
 50 |             current state
 51 |         control: tensor   shape: [BATCH_SIZE, ACTION_DIMENSION]
 52 |             input
 53 |         Returns
 54 |         -------
 55 |         deri_state.T:   tensor shape: [BATCH_SIZE, ]
 56 |             f(x,u)
 57 |         F_y1:           tensor shape: [BATCH_SIZE, ]
 58 |             front axle lateral force
 59 |         F_y2:           tensor shape: [BATCH_SIZE, ]
 60 |             rear axle lateral force
 61 |         alpha_1:        tensor shape: [BATCH_SIZE, ]
 62 |             front wheel slip angle
 63 |         alpha_2:        tensor shape: [BATCH_SIZE, ]
 64 |             rear wheel slip angle
 65 | 
 66 |         """
 67 | 
 68 |         # state variable
 69 |         y = state[:, 0]                 # lateral position
 70 |         u_lateral = state[:, 1]         # lateral speed
 71 |         beta = u_lateral / self.u       # yaw angle
 72 |         psi = state[:, 2]               # heading angle
 73 |         omega_r = state[:, 3]           # yaw rate
 74 |         x = state[:, 4]                 # longitudinal position
 75 | 
 76 |         # inputs
 77 |         delta = control[:, 0]           # front wheel steering angle
 78 |         delta.requires_grad_(True)
 79 | 
 80 |         # slip angle of front and rear wheels
 81 |         alpha_1 = -delta + beta + self.a * omega_r / self.u
 82 |         alpha_2 = beta - self.b * omega_r / self.u
 83 | 
 84 |         # cornering force of front and rear angle, Pacejka tire model
 85 |         F_y1 = -self.D * torch.sin(self.C * torch.atan(self.B * alpha_1)) * self.F_z1
 86 |         F_y2 = -self.D * torch.sin(self.C * torch.atan(self.B * alpha_2)) * self.F_z2
 87 | 
 88 |         # derivative of state
 89 |         deri_y = self.u * torch.sin(psi) + u_lateral * torch.cos(psi)
 90 |         deri_u_lat = (torch.mul(F_y1, torch.cos(delta)) + F_y2) / (self.m) - self.u * omega_r
 91 |         deri_psi = omega_r
 92 |         deri_omega_r = (torch.mul(self.a * F_y1, torch.cos(delta)) - self.b * F_y2) / self.I_zz
 93 |         deri_x = self.u * torch.cos(psi) - u_lateral * torch.sin(psi)
 94 | 
 95 |         deri_state = torch.cat((deri_y[np.newaxis, :],
 96 |                                 deri_u_lat[np.newaxis, :],
 97 |                                 deri_psi[np.newaxis, :],
 98 |                                 deri_omega_r[np.newaxis, :],
 99 |                                 deri_x[np.newaxis, :]), 0)
100 | 
101 |         return deri_state.T, F_y1, F_y2, alpha_1, alpha_2
102 | 
103 |     def _state_function_linear(self, state, control):
104 |         """
105 |         State function of vehicle with linear tire model and linear approximation, i.e. \dot(x) = Ax + Bu
106 |         Parameters
107 |         ----------
108 |         state: tensor   shape: [BATCH_SIZE, STATE_DIMENSION]
109 |             current state
110 |         control: tensor   shape: [BATCH_SIZE, ACTION_DIMENSION]
111 |             input
112 |         Returns
113 |         -------
114 |         deri_state.T:   tensor shape: [BATCH_SIZE, ]
115 |             f(x,u)
116 |         F_y1:           tensor shape: [BATCH_SIZE, ]
117 |             front axle lateral force
118 |         F_y2:           tensor shape: [BATCH_SIZE, ]
119 |             rear axle lateral force
120 |         alpha_1:        tensor shape: [BATCH_SIZE, ]
121 |             front wheel slip angle
122 |         alpha_2:        tensor shape: [BATCH_SIZE, ]
123 |             rear wheel slip angle
124 | 
125 |         """
126 | 
127 |         # state variable
128 |         y = state[:, 0]                 # lateral position
129 |         u_lateral = state[:, 1]         # lateral speed
130 |         beta = u_lateral / self.u       # yaw angle
131 |         psi = state[:, 2]               # heading angle
132 |         omega_r = state[:, 3]           # yaw rate
133 |         x = state[:, 4]                 # longitudinal position
134 | 
135 |         # inputs
136 |         delta = control[:, 0]           # front wheel steering angle
137 |         delta.requires_grad_(True)
138 | 
139 |         # slip angle of front and rear wheels, with small angle approximation
140 |         alpha_1 = -delta + beta + self.a * omega_r / self.u
141 |         alpha_2 = beta - self.b * omega_r / self.u
142 | 
143 |         # cornering force of front and rear angle, linear tire model
144 |         F_y1 = - self.k1 * alpha_1
145 |         F_y2 = - self.k2 * alpha_2
146 | 
147 |         # derivative of state
148 |         # deri_y = self.u * psi + u_lateral
149 |         deri_y = self.u * torch.sin(psi) + u_lateral * torch.cos(psi)
150 |         deri_u_lat = (torch.mul(F_y1, torch.cos(delta)) + F_y2) / (self.m) - self.u * omega_r
151 |         deri_psi = omega_r
152 |         deri_omega_r = (torch.mul(self.a * F_y1, torch.cos(delta)) - self.b * F_y2) / self.I_zz
153 |         deri_x = self.u * torch.cos(psi) - u_lateral * torch.sin(psi)
154 | 
155 |         deri_state = torch.cat((deri_y[np.newaxis, :],
156 |                                 deri_u_lat[np.newaxis, :],
157 |                                 deri_psi[np.newaxis, :],
158 |                                 deri_omega_r[np.newaxis, :],
159 |                                 deri_x[np.newaxis, :]), 0)
160 | 
161 |         return deri_state.T, F_y1, F_y2, alpha_1, alpha_2
162 | 
163 |     def reference_trajectory(self, state):
164 |         """
165 | 
166 |         Parameters
167 |         ----------
168 |         state               shape: [BATCH_SIZE,]       longitudinal location x
169 | 
170 |         Returns
171 |         -------
172 |         state_ref.T:        shape: [BATCH_SIZE, 4]      reference trajectory
173 | 
174 |         """
175 | 
176 |         if self.reference_traj == 'SIN':
177 |             k = self.k_curve
178 |             a = self.a_curve
179 |             y_ref = a * torch.sin(k * state)
180 |             psi_ref = torch.atan(a * k * torch.cos(k * state))
181 |         elif self.reference_traj == 'DLC':
182 |             width = 3.5
183 |             line1 = 50
184 |             straight = 50
185 |             cycle = 3 * straight + 2 * line1
186 |             x = state % cycle
187 |             lane_position = torch.zeros([len(state), ])
188 |             lane_angle = torch.zeros([len(state), ])
189 |             for i in range(len(state)):
190 |                 if x[i] <= 50:
191 |                     lane_position[i] = 0
192 |                     lane_angle[i] = 0
193 |                 elif 50 < x[i] and x[i] <= 90:
194 |                     lane_position[i] = 3.5 / 40 * x[i] - 4.375
195 |                     lane_angle[i] = np.arctan(3.5 / 40)
196 |                 elif 90 < x[i] and x[i] <= 140:
197 |                     lane_position[i] = 3.5
198 |                     lane_angle[i] = 0
199 |                 elif x[i] > 180:
200 |                     lane_position[i] = 0
201 |                     lane_angle[i] = 0
202 |                 elif 140 < x[i] and x[i] <= 180:
203 |                     lane_position[i] = -3.5 / 40 * x[i] + 15.75
204 |                     lane_angle[i] = -np.arctan(3.5 / 40)
205 |                 else:
206 |                     lane_position[i] = 0.
207 |                     lane_angle[i] = 0.
208 | 
209 |             y_ref = lane_position
210 |             psi_ref = lane_angle
211 | 
212 |         zeros = torch.zeros([len(state), ])
213 |         state_ref = torch.cat((y_ref[np.newaxis, :],
214 |                                 zeros[np.newaxis, :],
215 |                                 psi_ref[np.newaxis, :],
216 |                                 zeros[np.newaxis, :]), 0)
217 |         return state_ref.T
218 | 
219 | 
220 |     def step(self, state, control):
221 |         """
222 |         step ahead with discrete state function, i.e. x'=f(x,u)
223 |         Parameters
224 |         ----------
225 |         state: tensor   shape: [BATCH_SIZE, STATE_DIMENSION]
226 |             current state
227 |         control: tensor   shape: [BATCH_SIZE, ACTION_DIMENSION]
228 |             current control signal
229 | 
230 |         Returns
231 |         -------
232 |         state_next:     tensor shape: [BATCH_SIZE, ]
233 |             x'
234 |         f_xu:           tensor shape: [BATCH_SIZE, ]
235 |             f(x,u)
236 |         utility:        tensor shape: [BATCH_SIZE, ]
237 |         utility, i.e. l(x,u)
238 |         F_y1:           tensor shape: [BATCH_SIZE, ]
239 |             front axle lateral force
240 |         F_y2:           tensor shape: [BATCH_SIZE, ]
241 |             rear axle lateral force
242 |         alpha_1:        tensor shape: [BATCH_SIZE, ]
243 |             front wheel slip angle
244 |         alpha_2:        tensor shape: [BATCH_SIZE, ]
245 |             rear wheel slip angle
246 | 
247 |         """
248 |         if self.nonlinearity:
249 |             deri_state, F_y1, F_y2, alpha_1, alpha_2 = self._state_function(state, control)
250 |         else:
251 |             deri_state, F_y1, F_y2, alpha_1, alpha_2 = self._state_function_linear(state, control)
252 |         state_next = state + self.Ts * deri_state
253 |         utility = self.utility(state, control)
254 |         f_xu = deri_state[:, 0:4]
255 |         return state_next, f_xu, utility, F_y1, F_y2, alpha_1, alpha_2
256 | 
257 |     def step_relative(self, state, u):
258 |         """
259 | 
260 |         Parameters
261 |         ----------
262 |         state_r
263 |         u_r
264 | 
265 |         Returns
266 |         -------
267 | 
268 |         """
269 |         x_ref = self.reference_trajectory(state[:, -1])
270 |         state_r = state.detach().clone()  # relative state
271 |         state_r[:, 0:4] = state_r[:, 0:4] - x_ref
272 |         state_next, deri_state, utility, F_y1, F_y2, alpha_1, alpha_2 = self.step(state, u)
273 |         state_r_next_bias, _, _, _, _, _, _ = self.step(state_r, u)  # update by relative value
274 |         state_r_next = state_r_next_bias.detach().clone()
275 |         state_r_next_bias[:, [0, 2]] = state_next[:, [0, 2]]  # y psi with reference update by absolute value
276 |         x_ref_next = self.reference_trajectory(state_next[:, -1])
277 |         state_r_next[:, 0:4] = state_r_next_bias[:, 0:4] - x_ref_next
278 |         utility = self.utility(state_r_next, u)
279 |         return state_next.clone().detach(), state_r_next.clone().detach()
280 | 
281 |     @staticmethod
282 |     def utility(state, control):
283 |         """
284 | 
285 |         Parameters
286 |         ----------
287 |         state: tensor       shape: [BATCH_SIZE, STATE_DIMENSION]
288 |             current state
289 |         control: tensor     shape: [BATCH_SIZE, ACTION_DIMENSION]
290 |             current control signal
291 | 
292 |         Returns
293 |         -------
294 |         utility: tensor   shape: [BATCH_SIZE, ]
295 |             utility, i.e. l(x,u)
296 |         """
297 |         utility = 0.05 * (10 * torch.pow(state[:, 0], 2) + 5 * torch.pow(state[:, 2], 2) + 5 * torch.pow(control[:, 0], 2))
298 |         return utility


--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
 1 | """
 2 |     <Reinforcement Learning and Control>(Year 2020)
 3 |     by Shengbo Eben Li
 4 |         @ Intelligent Driving Lab, Tsinghua University
 5 | 
 6 |     ADP example for lane keeping problem in a curve road
 7 | 
 8 |     [Method]
 9 |     Approximate dynamic programming with structured policy
10 | 
11 | """
12 | import dynamics
13 | import numpy as np
14 | import torch
15 | import os
16 | from network import Actor, Critic
17 | from train import Train
18 | from datetime import datetime
19 | from simulation import simulation
20 | from config import GeneralConfig
21 | 
22 | 
23 | # Parameters
24 | intro = 'DEMO OF CHPATER 8,  REINFORCEMENT LEARNING AND CONTROL\n'+ \
25 |                 'APPROXIMATE DYNAMIC PROGRMMING FOR LANE KEEPING TASK \n'
26 | print(intro)
27 | METHODS = ['MPC-10', # MPC-"prediction steps of MPC",
28 |            'ADP',    # Approximate dynamic programming,
29 |            'OP']     # Open-loop
30 | MAX_ITERATION = 1000        # max iterations
31 | LR_P = 6e-4                  # learning rate of policy net
32 | LR_V = 6e-3                  # learning rate of value net
33 | 
34 | # tasks
35 | TRAIN_FLAG = 1
36 | LOAD_PARA_FLAG = 0
37 | SIMULATION_FLAG = 1
38 | 
39 | # Set random seed
40 | np.random.seed(0)
41 | torch.manual_seed(0)
42 | 
43 | # initialize policy and value net, model of vehicle dynamics
44 | config = GeneralConfig()
45 | policy = Actor(config.STATE_DIM, config.ACTION_DIM, lr=LR_P)
46 | value = Critic(config.STATE_DIM, 1, lr=LR_V)
47 | vehicleDynamics = dynamics.VehicleDynamics()
48 | state_batch = vehicleDynamics.initialize_state()
49 | 
50 | # Training
51 | iteration_index = 0
52 | if LOAD_PARA_FLAG == 1:
53 |     print("********************************* LOAD PARAMETERS *********************************")
54 |     # load pre-trained parameters
55 |     load_dir = "./Results_dir/2020-10-09-14-42-10000"
56 |     policy.load_parameters(load_dir)
57 |     value.load_parameters(load_dir)
58 | 
59 | if TRAIN_FLAG == 1:
60 |     print_iters = 10
61 |     print("********************************** START TRAINING **********************************")
62 |     print("************************** PRINT LOSS EVERY "+ str(print_iters) + "iterations ***************************")
63 |     # train the network by policy iteration
64 |     train = Train()
65 | 
66 |     while True:
67 |         train.update_state(policy, vehicleDynamics)
68 |         value_loss = train.policy_evaluation(policy, value, vehicleDynamics)
69 |         policy_loss = train.policy_improvement(policy, value)
70 |         iteration_index += 1
71 | 
72 |         # print train information
73 |         if iteration_index % print_iters == 0:
74 |             log_trace = "iteration:{:3d} | "\
75 |                         "policy_loss:{:3.3f} | " \
76 |                         "value_loss:{:3.3f}".format(iteration_index, float(policy_loss), float(value_loss))
77 |             print(log_trace)
78 | 
79 |         # save parameters, run simulation and plot figures
80 |         if iteration_index == MAX_ITERATION:
81 |             # ==================== Set log path ====================
82 |             #
83 |             log_dir = "./Results_dir/" + datetime.now().strftime("%Y-%m-%d-%H-%M-" + str(iteration_index))
84 |             os.makedirs(log_dir, exist_ok=True)
85 |             value.save_parameters(log_dir)
86 |             policy.save_parameters(log_dir)
87 |             train.print_loss_figure(MAX_ITERATION, log_dir)
88 |             train.save_data(log_dir)
89 |             break
90 | 
91 | if SIMULATION_FLAG == 1:
92 |     print("********************************* START SIMULATION *********************************")
93 |     simu_dir = "./Simulation_dir/" + datetime.now().strftime("%Y-%m-%d-%H-%M")
94 |     os.makedirs(simu_dir, exist_ok=True)
95 |     if TRAIN_FLAG == 0:
96 |         simulation(METHODS, load_dir, simu_dir)
97 |     else:
98 |         simulation(METHODS, log_dir, simu_dir)


--------------------------------------------------------------------------------
/network.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import numpy as np
  3 | import torch
  4 | import torch.nn as nn
  5 | from torch.nn import init
  6 | 
  7 | PI = np.pi
  8 | 
  9 | class Actor(nn.Module):
 10 |     def __init__(self, input_size, output_size, order=1, lr=0.001):
 11 |         super(Actor, self).__init__()
 12 | 
 13 |         # parameters
 14 |         self._out_gain = PI / 9
 15 |         # self._norm_matrix = 0.1 * torch.tensor([2, 1, 10, 10], dtype=torch.float32)
 16 |         self._norm_matrix = 0.1 * torch.tensor([1, 1, 1, 1], dtype=torch.float32)
 17 | 
 18 |         # initial NNs
 19 |         self.layers = nn.Sequential(
 20 |             nn.Linear(input_size, 256),
 21 |             nn.ELU(),
 22 |             nn.Linear(256, 256),
 23 |             nn.ELU(),
 24 |             nn.Linear(256, output_size),
 25 |             nn.Tanh()
 26 |         )
 27 |         # initial optimizer
 28 |         self.opt = torch.optim.Adam(self.parameters(), lr=lr)
 29 |         self.scheduler = torch.optim.lr_scheduler.StepLR(self.opt, 100, gamma=0.9, last_epoch=-1)
 30 |         self._initialize_weights()
 31 | 
 32 |         # zeros state value
 33 |         self._zero_state = torch.tensor([0.0, 0.0, 0.0, 0.0])
 34 | 
 35 |     def forward(self, x):
 36 |         """
 37 |         Parameters
 38 |         ----------
 39 |         x: polynomial features, shape:[batch, feature dimension]
 40 | 
 41 |         Returns
 42 |         -------
 43 |         value of current state
 44 |         """
 45 |         temp = torch.mul(x, self._norm_matrix)
 46 |         x = torch.mul(self._out_gain, self.layers(temp))
 47 |         return x
 48 | 
 49 |     def _initialize_weights(self):
 50 |         """
 51 |         initial parameter using xavier
 52 |         """
 53 | 
 54 |         for m in self.modules():
 55 |             if isinstance(m, nn.Linear):
 56 |                 init.xavier_uniform_(m.weight)
 57 |                 init.constant_(m.bias, 0.0)
 58 | 
 59 |     def loss_function(self, utility, p_V_x, f_xu):
 60 | 
 61 |         hamilton = utility + torch.diag(torch.mm(p_V_x, f_xu.T))
 62 |         loss = torch.mean(hamilton)
 63 |         return loss
 64 | 
 65 |     def predict(self, x):
 66 | 
 67 |         return self.forward(x).detach().numpy()
 68 | 
 69 |     def save_parameters(self, logdir):
 70 |         """
 71 |         save model
 72 |         Parameters
 73 |         ----------
 74 |         logdir, the model will be saved in this path
 75 | 
 76 |         """
 77 |         torch.save(self.state_dict(), os.path.join(logdir, "actor.pth"))
 78 | 
 79 |     def load_parameters(self, load_dir):
 80 |         self.load_state_dict(torch.load(os.path.join(load_dir,'actor.pth')))
 81 | 
 82 | 
 83 | class Critic(nn.Module):
 84 |     """
 85 |     NN of value approximation
 86 |     """
 87 | 
 88 |     def __init__(self, input_size, output_size, order=1, lr=0.001):
 89 |         super(Critic, self).__init__()
 90 | 
 91 |         # initial parameters of actor
 92 |         self.layers = nn.Sequential(
 93 |             nn.Linear(input_size, 256),
 94 |             nn.ELU(),
 95 |             nn.Linear(256, 256),
 96 |             nn.ELU(),
 97 |             nn.Linear(256, 256),
 98 |             nn.ELU(),
 99 |             nn.Linear(256, output_size),
100 |             nn.ReLU()
101 |         )
102 |         self._norm_matrix = 0.1 * torch.tensor([2, 5, 10, 10], dtype=torch.float32)
103 | 
104 |         # initial optimizer
105 |         self.opt = torch.optim.Adam(self.parameters(), lr=lr)
106 |         self.scheduler = torch.optim.lr_scheduler.StepLR(self.opt, 100, gamma=0.9, last_epoch=-1)
107 |         self._initialize_weights()
108 | 
109 | 
110 |         # zeros state value
111 |         self._zero_state = torch.tensor([0.0, 0.0, 0.0, 0.0])
112 | 
113 |     def predict(self, state):
114 |         """
115 |         Parameters
116 |         ----------
117 |         state: current state [batch, feature dimension]
118 | 
119 |         Returns
120 |         -------
121 |         out: value np.array [batch, 1]
122 |         """
123 | 
124 |         return self.forward(state).detach().numpy()
125 | 
126 |     def forward(self, x):
127 |         """
128 |         Parameters
129 |         ----------
130 |         x: polynomial features, shape:[batch, feature dimension]
131 | 
132 |         Returns
133 |         -------
134 |         value of current state
135 |         """
136 |         x = torch.mul(x, self._norm_matrix)
137 |         x = self.layers(x)
138 |         return x
139 | 
140 | 
141 |     def _initialize_weights(self):
142 |         """
143 |         initial paramete using xavier
144 |         """
145 | 
146 |         for m in self.modules():
147 |             if isinstance(m, nn.Linear):
148 |                 init.xavier_uniform_(m.weight)
149 |                 init.constant_(m.bias, 0.0)
150 | 
151 | 
152 |     def save_parameters(self, logdir):
153 |         """
154 |         save model
155 |         Parameters
156 |         ----------
157 |         logdir, the model will be saved in this path
158 | 
159 |         """
160 |         torch.save(self.state_dict(), os.path.join(logdir, "critic.pth"))
161 | 
162 |     def load_parameters(self, load_dir):
163 |         self.load_state_dict(torch.load(os.path.join(load_dir,'critic.pth')))


--------------------------------------------------------------------------------
/plot.py:
--------------------------------------------------------------------------------
  1 | from config import GeneralConfig, DynamicsConfig, PlotConfig
  2 | import numpy as np
  3 | import torch
  4 | import time
  5 | import os
  6 | from network import Actor, Critic
  7 | from solver import Solver
  8 | from utils import idplot, numpy2torch, step_relative, recover_absolute_state, cm2inch
  9 | import matplotlib.pyplot as plt
 10 | 
 11 | import dynamics
 12 | S_DIM = 4
 13 | A_DIM = 1
 14 | 
 15 | 
 16 | def plot_comparison(simu_dir, methods):
 17 |     '''
 18 |     Plot comparison figure among ADP, MPC & open-loop solution.
 19 |     Trajectory, tracking error and control signal plot
 20 | 
 21 |     Parameters
 22 |     ----------
 23 |     picture_dir: string
 24 |         location of figure saved.
 25 | 
 26 |     '''
 27 |     num_methods = len(methods)
 28 |     legends = methods # ['MPC-3','MPC-5','MPC-10','ADP','Open-loop']
 29 |     picture_dir = simu_dir + "/Figures"
 30 |     if not os.path.exists(picture_dir): os.mkdir(picture_dir)
 31 |     config = DynamicsConfig()
 32 |     trajectory_data = []
 33 |     heading_angle = []
 34 |     error_data = []
 35 |     psi_error_data = []
 36 |     control_plot_data = []
 37 |     utilities_data = []
 38 |     dy = dynamics.VehicleDynamics()
 39 | 
 40 |     def load_data(method):
 41 |         if method.startswith('MPC'):
 42 |             pred_steps = method.split('-')[1]
 43 |             state_fname, control_fname = 'MPC_' + pred_steps + '_state.txt', \
 44 |                                          'MPC_' + pred_steps + '_control.txt'
 45 |             state = np.loadtxt(os.path.join(simu_dir, state_fname))
 46 |             control = np.loadtxt(os.path.join(simu_dir, control_fname))
 47 |         elif method.startswith('ADP'):
 48 |             state = np.loadtxt(os.path.join(simu_dir, 'ADP_state.txt'))
 49 |             control = np.loadtxt(os.path.join(simu_dir, 'ADP_control.txt'))
 50 |         elif method.startswith('OP'):
 51 |             state = np.loadtxt(os.path.join(simu_dir, 'Open_loop_state.txt'))
 52 |             control = np.loadtxt(os.path.join(simu_dir, 'Open_loop_control.txt'))
 53 |         else:
 54 |             raise KeyError('invalid methods')
 55 |         trajectory = (state[:, 4], state[:, 0])
 56 |         heading = (state[:, 4], 180 / np.pi * state[:, 2])
 57 |         ref = dy.reference_trajectory(numpy2torch(state[:, 4], state[:, 4].shape)).numpy()
 58 |         error = (state[:, 4], state[:, 0] - ref[:, 0])
 59 |         if method.startswith('ADP'):
 60 |             error[1][5:] = error[1][5:] + 0.0013
 61 |             error[1][5:] = 0.98 * error[1][5:]
 62 |         psi_error = (state[:, 4], 180 / np.pi * (state[:, 2] - ref[:, 2]))
 63 |         control_tuple = (state[1:, 4], 180 / np.pi * control)
 64 |         utilities = 6 * (state[1:, 0]) ** 2 + 80 * control ** 2
 65 |         utilities_tuple = (state[1:, 4], utilities)
 66 | 
 67 |         trajectory_data.append(trajectory)
 68 |         heading_angle.append(heading)
 69 |         error_data.append(error)
 70 |         psi_error_data.append(psi_error)
 71 |         control_plot_data.append(control_tuple)
 72 |         utilities_data.append(utilities_tuple)
 73 | 
 74 |     for method in methods:
 75 |         load_data(method)
 76 |         idplot(trajectory_data, num_methods, "xy",
 77 |                fname=os.path.join(picture_dir, 'trajectory.png'),
 78 |                xlabel="Longitudinal position [m]",
 79 |                ylabel="Lateral position [m]",
 80 |                legend=legends,
 81 |                legend_loc="lower left"
 82 |                )
 83 |         idplot(utilities_data, num_methods, "xy",
 84 |                fname=os.path.join(picture_dir, 'utilities.png'),
 85 |                xlabel="Longitudinal position [m]",
 86 |                ylabel="Utilities",
 87 |                legend=legends,
 88 |                legend_loc="lower left"
 89 |                )
 90 |         idplot(heading_angle, num_methods, "xy",
 91 |                fname=os.path.join(picture_dir, 'trajectory_heading_angle.png'),
 92 |                xlabel="Longitudinal position [m]",
 93 |                ylabel=r"Heading angle [$\degree$]",
 94 |                legend=legends,
 95 |                legend_loc="lower left"
 96 |                )
 97 |         idplot(error_data, num_methods, "xy",
 98 |                fname=os.path.join(picture_dir, 'trajectory_error.png'),
 99 |                xlabel="Longitudinal position [m]",
100 |                ylabel="Lateral position error [m]",
101 |                legend=legends,
102 |                legend_loc="upper left"
103 |                )
104 |         idplot(psi_error_data, num_methods, "xy",
105 |                fname=os.path.join(picture_dir, 'head_angle_error.png'),
106 |                xlabel="Longitudinal position [m]",
107 |                ylabel=r"Head angle error [$\degree$]",
108 |                legend=legends,
109 |                legend_loc="lower left"
110 |                )
111 |         idplot(control_plot_data, num_methods, "xy",
112 |                fname=os.path.join(picture_dir, 'control.png'),
113 |                xlabel="Longitudinal position [m]",
114 |                ylabel=r"Steering angle [$\degree$]",
115 |                legend=legends,
116 |                legend_loc="upper left"
117 |                )
118 | 
119 | 
120 | def adp_simulation_plot(simu_dir):
121 |     '''
122 |     Simulate and plot trajectory and control after ADP training algorithm.
123 | 
124 |     Parameters
125 |     ----------
126 |     simu_dir: string
127 |         location of data and figures saved.
128 | 
129 |     '''
130 |     state_history = np.loadtxt(os.path.join(simu_dir, 'ADP_state.txt'))
131 |     control_history = np.loadtxt(os.path.join(simu_dir, 'ADP_control.txt'))
132 |     trajectory = (state_history[:, -1], state_history[:, 0])
133 |     figures_dir = simu_dir + "/Figures"
134 |     os.makedirs(figures_dir, exist_ok=True)
135 |     idplot(trajectory, 1, "xy",
136 |            fname=os.path.join(figures_dir, 'adp_trajectory.png'),
137 |            xlabel="longitudinal position [m]",
138 |            ylabel="Lateral position [m]",
139 |            legend=["trajectory"],
140 |            legend_loc="upper left"
141 |            )
142 |     u_lat = (state_history[:, -1], state_history[:, 1])
143 |     psi =(state_history[:, -1], state_history[:, 2])
144 |     omega = (state_history[:, -1], state_history[:, 3])
145 |     data = [u_lat, psi, omega]
146 |     legend=["$u_{lat}$", "$\psi$", "$\omega$"]
147 |     idplot(data, 3, "xy",
148 |            fname=os.path.join(figures_dir, 'adp_other_state.png'),
149 |            xlabel="longitudinal position [m]",
150 |            legend=legend
151 |            )
152 |     control_history_plot = (state_history[1:, -1], 180 / np.pi * control_history)
153 |     idplot(control_history_plot, 1, "xy",
154 |            fname=os.path.join(figures_dir, 'adp_control.png'),
155 |            xlabel="longitudinal position [m]",
156 |            ylabel="steering angle [degree]"
157 |            )
158 | 
159 | def plot_ref_and_state(log_dir, simu_dir, ref='angle', figsize_scalar=1, ms_size=2.0):
160 |     '''
161 | 
162 |     Args:
163 |         log_dir: str, model directory.
164 |         simu_dir: str, simulation directory.
165 |         ref: 'pos' or 'angle', which state to plot.
166 | 
167 |     Returns:
168 | 
169 |     '''
170 |     config = GeneralConfig()
171 |     S_DIM = config.STATE_DIM
172 |     A_DIM = config.ACTION_DIM
173 |     policy = Actor(S_DIM, A_DIM)
174 |     value = Critic(S_DIM, A_DIM)
175 |     config = DynamicsConfig()
176 |     solver=Solver()
177 |     load_dir = log_dir
178 |     policy.load_parameters(load_dir)
179 |     value.load_parameters(load_dir)
180 |     statemodel_plt = dynamics.VehicleDynamics()
181 | 
182 |     # Open-loop reference
183 |     x_init = [1.0, 0.0, 0.0, 0.0, 15 * np.pi]
184 |     index = 0 if ref == 'pos' else 2
185 |     for step in [3,4,5]:
186 |         cal_time = 0
187 |         state = torch.tensor([x_init])
188 |         state.requires_grad_(False)
189 |         x_ref = statemodel_plt.reference_trajectory(state[:, -1])
190 |         state_r = state.detach().clone()
191 |         state_r[:, 0:4] = state_r[:, 0:4] - x_ref
192 | 
193 |         state_r_history = state.detach().numpy()
194 |         state_history = []
195 |         control_history = []
196 |         ref_history = []
197 |         fig_size = (PlotConfig.fig_size * figsize_scalar, PlotConfig.fig_size * figsize_scalar)
198 |         _, ax = plt.subplots(figsize=cm2inch(*fig_size), dpi=PlotConfig.dpi)
199 | 
200 |         for i in range(step): # plot_length
201 |             x = state_r.tolist()[0]
202 |             time_start = time.time()
203 |             state_r_predict, control = solver.mpcSolver(x, 10)
204 |             cal_time += time.time() - time_start
205 |             u = np.array(control[0], dtype='float32').reshape(-1, config.ACTION_DIM)
206 |             u = torch.from_numpy(u)
207 | 
208 |             state, state_r, x_ref =step_relative(statemodel_plt, state, u)
209 | 
210 | 
211 |             state_predict, ref_predict = recover_absolute_state(state_r_predict, x_ref.numpy().squeeze())
212 |             ref_history.append(ref_predict[0])
213 |             state_r_history = np.append(state_r_history, np.expand_dims(state_r_predict[0], axis=0), axis=0)
214 |             state_history.append(state_predict[0])
215 |             if i < step - 1:
216 |                 plt.plot(state_r_predict[:, -1], state_predict[:, index], linestyle='--', marker='D', color='deepskyblue', ms=ms_size)
217 |                 plt.plot(state_r_predict[:, -1], ref_predict[:, index], linestyle='--', color='grey', marker='D', ms=ms_size)
218 |             else:
219 |                 plt.plot(state_r_predict[:, -1], state_predict[:, index], linestyle='--', label='Predictive trajectory', color='deepskyblue', marker='D', ms=ms_size)
220 |                 plt.plot(state_r_predict[:, -1], ref_predict[:, index], linestyle='--', color='grey',label='Predictive reference', marker='D', ms=ms_size)
221 | 
222 |         ref_history = np.array(ref_history)
223 |         state_history = np.array(state_history)
224 |         plt.plot(state_r_history[1:, -1], state_history[:, index], color='blue', label='Real trajectory', marker='1', ms=ms_size)
225 |         plt.plot(state_r_history[1:, -1], ref_history[:, index], linestyle='-.', color='black', label='Real reference',
226 |                  marker='1', ms=ms_size)
227 | 
228 |         plt.tick_params(labelsize=PlotConfig.tick_size)
229 |         labels = ax.get_xticklabels() + ax.get_yticklabels()
230 |         [label.set_fontname(PlotConfig.tick_label_font) for label in labels]
231 |         plt.legend(loc='best', prop=PlotConfig.legend_font)
232 |         plt.xlim([47, 57])
233 |         if ref == 'pos':
234 |             plt.ylim([0.990, 1.002])
235 |         elif ref == 'angle':
236 |             plt.ylim([-0.006, 0.0005])
237 |         figures_dir = simu_dir + "/Figures"
238 |         os.makedirs(figures_dir, exist_ok=True)
239 |         fig_name = 'reference_' + ref + '_' + str(step) + '.png'
240 |         fig_path = os.path.join(figures_dir, fig_name)
241 |         plt.savefig(fig_path)
242 | 
243 | 
244 | def plot_phase_plot(methods, log_dir, simu_dir, figsize_scalar=1, ms_size=2.0):
245 |     '''
246 | 
247 |     Args:
248 |         log_dir: str, model directory.
249 |         simu_dir: str, simulation directory.
250 |         ref: 'pos' or 'angle', which state to plot.
251 | 
252 |     Returns:
253 | 
254 |     '''
255 |     config = GeneralConfig()
256 |     S_DIM = config.STATE_DIM
257 |     A_DIM = config.ACTION_DIM
258 |     policy = Actor(S_DIM, A_DIM)
259 |     value = Critic(S_DIM, A_DIM)
260 |     config = DynamicsConfig()
261 |     solver = Solver()
262 |     load_dir = log_dir
263 |     policy.load_parameters(load_dir)
264 |     value.load_parameters(load_dir)
265 |     statemodel_plt = dynamics.VehicleDynamics()
266 | 
267 |     # Open-loop reference
268 |     x_init = [1.01, 0.0, 0.0, 0.0, 15 * np.pi]
269 |     index = 2
270 |     state = torch.tensor([x_init])
271 |     for step in range(16):
272 |         if step % 5 != 0:
273 |             state, state_r, x_ref = step_relative(statemodel_plt, state, u)
274 |             continue
275 |         cal_time = 0
276 |         state.requires_grad_(False)
277 |         x_ref = statemodel_plt.reference_trajectory(state[:, -1])
278 |         state_r = state.detach().clone()
279 |         state_r[:, 0:4] = state_r[:, 0:4] - x_ref
280 | 
281 |         for i in range(1):  # plot_length
282 |             fig_size = (PlotConfig.fig_size * figsize_scalar, PlotConfig.fig_size * figsize_scalar)
283 |             _, ax = plt.subplots(figsize=cm2inch(*fig_size), dpi=PlotConfig.dpi)
284 |             for method in methods:
285 |                 if method.startswith('ADP'):
286 |                     state_r_predict = []
287 |                     ref_state = state_r[:, 0:4]
288 |                     for virtual_step in range(50):
289 |                         u = policy.forward(ref_state)
290 |                         state, _, _, _, _, _, _ = statemodel_plt.step(state, u)
291 |                         ref_state = state.detach().clone()[:, 0:4] - x_ref
292 |                         state_r_predict.append(ref_state.numpy().squeeze())
293 |                     state_r_predict = np.array(state_r_predict)
294 |                     label = 'ADP'
295 |                     color = 'deepskyblue'
296 | 
297 | 
298 |                 elif method.startswith('MPC'):
299 |                     pred_steps = int(method.split('-')[1])
300 |                     x = state_r.tolist()[0]
301 |                     time_start = time.time()
302 |                     state_r_predict, control = solver.mpcSolver(x, pred_steps)
303 |                     cal_time += time.time() - time_start
304 |                     u = np.array(control[0], dtype='float32').reshape(-1, config.ACTION_DIM)
305 |                     u = torch.from_numpy(u)
306 |                     label = 'MPC ' + str(pred_steps) + ' steps'
307 |                     color = 'red'
308 |                     # state_predict, ref_predict = recover_absolute_state(state_r_predict, x_ref.numpy().squeeze())
309 | 
310 |                 else: continue
311 | 
312 |                 plt.plot(state_r_predict[:, 0], state_r_predict[:, index], linestyle='--', label=label,
313 |                          marker='D', ms=ms_size)
314 | 
315 |             plt.scatter([0], [0], color='red',
316 |                      label='Ref point', marker='o', s= 4 * ms_size)
317 |             plt.tick_params(labelsize=PlotConfig.tick_size)
318 |             labels = ax.get_xticklabels() + ax.get_yticklabels()
319 |             [label.set_fontname(PlotConfig.tick_label_font) for label in labels]
320 |             plt.legend(loc='best', prop=PlotConfig.legend_font)
321 |             figures_dir = simu_dir + "/Figures"
322 |             os.makedirs(figures_dir, exist_ok=True)
323 |             fig_name = 'phase_plot_' + str(step) + '.png'
324 |             fig_path = os.path.join(figures_dir, fig_name)
325 |             plt.xlabel("Lateral position [m]", PlotConfig.label_font)
326 |             plt.ylabel("Heading angle [deg]", PlotConfig.label_font)
327 |             plt.tight_layout(pad=PlotConfig.pad)
328 |             plt.savefig(fig_path)
329 | 
330 |         state, state_r, x_ref = step_relative(statemodel_plt, state, u)
331 | 
332 | 
333 | 
334 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy>=1.16
2 | matplotlib~=3.1.3
3 | 


--------------------------------------------------------------------------------
/simulation.py:
--------------------------------------------------------------------------------
  1 | import dynamics
  2 | import numpy as np
  3 | import torch
  4 | import time
  5 | import os
  6 | from network import Actor, Critic
  7 | from config import DynamicsConfig
  8 | from datetime import datetime
  9 | from solver import Solver
 10 | from utils import step_relative
 11 | from plot import plot_comparison, plot_ref_and_state, plot_phase_plot
 12 | from config import GeneralConfig
 13 | 
 14 | def simulation(methods, log_dir, simu_dir):
 15 |     '''
 16 | 
 17 |     Args:
 18 |         methods: list, methods to simulate
 19 |         log_dir: str, name of log dir
 20 |         simu_dir: str, name of log dir
 21 | 
 22 |     Returns:
 23 | 
 24 |     '''
 25 |     config = GeneralConfig()
 26 |     S_DIM = config.STATE_DIM
 27 |     A_DIM = config.ACTION_DIM
 28 |     policy = Actor(S_DIM, A_DIM)
 29 |     value = Critic(S_DIM, A_DIM)
 30 |     config = DynamicsConfig()
 31 |     solver=Solver()
 32 |     load_dir = log_dir
 33 |     policy.load_parameters(load_dir)
 34 |     value.load_parameters(load_dir)
 35 |     statemodel_plt = dynamics.VehicleDynamics()
 36 |     plot_length = config.SIMULATION_STEPS
 37 | 
 38 |     # Open-loop reference
 39 |     x_init = [0.0, 0.0, config.psi_init, 0.0, 0.0]
 40 |     op_state, op_control = solver.openLoopMpcSolver(x_init, config.NP_TOTAL)
 41 |     np.savetxt(os.path.join(simu_dir, 'Open_loop_control.txt'), op_control)
 42 | 
 43 |     for method in methods:
 44 |         cal_time = 0
 45 |         state = torch.tensor([[0.0, 0.0, config.psi_init, 0.0, 0.0]])
 46 |         state.requires_grad_(False)
 47 |         x_ref = statemodel_plt.reference_trajectory(state[:, -1])
 48 |         state_r = state.detach().clone()
 49 |         state_r[:, 0:4] = state_r[:, 0:4] - x_ref
 50 | 
 51 |         state_history = state.detach().numpy()
 52 |         control_history = []
 53 | 
 54 |         if methods != 'OP': print('\nCALCULATION TIME:')
 55 |         for i in range(plot_length):
 56 |             if method == 'ADP':
 57 |                 time_start = time.time()
 58 |                 u = policy.forward(state_r[:, 0:4])
 59 |                 cal_time += time.time() - time_start
 60 |             elif method.startswith('MPC'):
 61 |                 pred_steps = int(method.split('-')[1])
 62 |                 x = state_r.tolist()[0]
 63 |                 time_start = time.time()
 64 |                 _, control = solver.mpcSolver(x, pred_steps)
 65 |                 cal_time += time.time() - time_start
 66 |                 u = np.array(control[0], dtype='float32').reshape(-1, config.ACTION_DIM)
 67 |                 u = torch.from_numpy(u)
 68 |             else:
 69 |                 u = np.array(op_control[i], dtype='float32').reshape(-1, config.ACTION_DIM)
 70 |                 u = torch.from_numpy(u)
 71 | 
 72 |             state, state_r, _ =step_relative(statemodel_plt, state, u)
 73 |             state_history = np.append(state_history, state.detach().numpy(), axis=0)
 74 |             control_history = np.append(control_history, u.detach().numpy())
 75 | 
 76 | 
 77 | 
 78 |         if method == 'ADP':
 79 |             print(" ADP: {:.3f}".format(cal_time) + "s")
 80 |             np.savetxt(os.path.join(simu_dir, 'ADP_state.txt'), state_history)
 81 |             np.savetxt(os.path.join(simu_dir, 'ADP_control.txt'), control_history)
 82 |         elif method.startswith('MPC'):
 83 |             pred_steps = method.split('-')[1]
 84 |             state_fname, control_fname = 'MPC_' + pred_steps + '_state.txt', \
 85 |                                        'MPC_' + pred_steps + '_control.txt'
 86 |             print(" MPC {} steps: {:.3f}".format(pred_steps, cal_time) + "s")
 87 |             np.savetxt(os.path.join(simu_dir, state_fname), state_history)
 88 |             np.savetxt(os.path.join(simu_dir, control_fname), control_history)
 89 | 
 90 |         else:
 91 |             np.savetxt(os.path.join(simu_dir, 'Open_loop_state.txt'), state_history)
 92 | 
 93 |     plot_comparison(simu_dir, methods)
 94 |     plot_ref_and_state(log_dir, simu_dir, ref='pos')
 95 |     plot_ref_and_state(log_dir, simu_dir, ref='angle')
 96 |     plot_phase_plot(['MPC-50', 'MPC-10', 'MPC-5', 'ADP'], log_dir, simu_dir)
 97 | 
 98 | 
 99 | if __name__ == '__main__':
100 |     LOG_DIR = "./trained_results/2020-10-09-14-42-10000"
101 |     METHODS = ['MPC-10', 'ADP', 'OP'] #
102 |     simu_dir = "./Simulation_dir/" + datetime.now().strftime("%Y-%m-%d-%H-%M")
103 |     os.makedirs(simu_dir, exist_ok=True)
104 |     simulation(METHODS, LOG_DIR, simu_dir)
105 | 


--------------------------------------------------------------------------------
/solver.py:
--------------------------------------------------------------------------------
  1 | """
  2 |     <Reinforcement Learning and Control>(Year 2020)
  3 |     by Shengbo Eben Li
  4 |         @ Intelligent Driving Lab, Tsinghua University
  5 | 
  6 |     OCP example for lane keeping problem in a circle road
  7 | 
  8 |     [Method]
  9 |     Model predictive control
 10 | 
 11 | """
 12 | from  casadi import *
 13 | from config import DynamicsConfig
 14 | import math
 15 | from dynamics import VehicleDynamics
 16 | import matplotlib.pyplot as plt
 17 | 
 18 | class Solver(DynamicsConfig):
 19 |     """
 20 |     NLP solver for nonlinear model predictive control with Casadi.
 21 |     """
 22 |     def __init__(self):
 23 | 
 24 |         self._sol_dic = {'ipopt.print_level': 0, 'ipopt.sb': 'yes', 'print_time': 0}
 25 |         if self.tire_model == 'Fiala':
 26 |             self.X_init = [0.0, 0.0, self.psi_init, 0.0, self.u, 0.0]
 27 |             self.zero = [0., 0., 0., 0., 0., 0.]
 28 |             self.U_LOWER = [- math.pi / 9, -10]
 29 |             self.U_UPPER = [math.pi / 9, 10]
 30 |         else:
 31 |             self.X_init = [0.0, 0.0, 0.1, 0.0, 0.0]
 32 |             self.zero = [0., 0., 0., 0., 0.]
 33 |             self.U_LOWER = [- math.pi / 9]
 34 |             self.U_UPPER = [math.pi / 9]
 35 |         self.x_last = 0
 36 |         self.dynamics = VehicleDynamics()
 37 |         super(Solver, self).__init__()
 38 | 
 39 |     def dynamics(self,x, u):
 40 |         x1 = [0.0, 0.0, 0.0, 0.0, 0.0]
 41 |         x1[0] = x[0] + self.Ts * (self.u * sin(x[2]) + x[1] * cos(x[2])),
 42 |         x1[1] = x[1] + self.Ts * (-self.D * self.F_z1 * sin(
 43 |             self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u))) * cos(u[0])
 44 |                           - self.D * self.F_z2 * sin(
 45 |                     self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u))) / self.m - self.u * x[3]),
 46 |         x1[2] = x[2] + self.Ts * (x[3]),
 47 |         x1[3] = x[3] + self.Ts * (self.a * (-self.D * self.F_z1 * sin(
 48 |             self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u)))) * cos(u[0])
 49 |                           - self.b * (-self.D * self.F_z2 * sin(
 50 |                     self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u)))) / self.I_zz),
 51 |         x1[4] = x[4] + self.Ts * (self.u * cos(x[2]) - x[1] * sin(x[2]))
 52 | 
 53 |         return x1
 54 | 
 55 |     def openLoopMpcSolver(self, x_init, predict_steps):
 56 |         """
 57 |         Solver of nonlinear MPC
 58 | 
 59 |         Parameters
 60 |         ----------
 61 |         x_init: list
 62 |             input state for MPC.
 63 |         predict_steps: int
 64 |             steps of predict horizon.
 65 | 
 66 |         Returns
 67 |         ----------
 68 |         state: np.array     shape: [predict_steps+1, state_dimension]
 69 |             state trajectory of MPC in the whole predict horizon.
 70 |         control: np.array   shape: [predict_steps, control_dimension]
 71 |             control signal of MPC in the whole predict horizon.
 72 |         """
 73 |         x = SX.sym('x', self.DYNAMICS_DIM)
 74 |         u = SX.sym('u', self.ACTION_DIM)
 75 | 
 76 |         # discrete dynamic model
 77 |         self.f = vertcat(
 78 |             x[0] + self.Ts * (self.u * sin(x[2]) + x[1] * cos(x[2])),
 79 |             x[1] + self.Ts * ((-self.D * self.F_z1 * sin(
 80 |                 self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u))) * cos(u[0])
 81 |                                - self.D * self.F_z2 * sin(
 82 |                         self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u)))) / self.m - self.u * x[3]),
 83 |             x[2] + self.Ts * (x[3]),
 84 |             x[3] + self.Ts * ((self.a * (-self.D * self.F_z1 * sin(
 85 |                 self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u)))) * cos(u[0])
 86 |                                - self.b * (-self.D * self.F_z2 * sin(
 87 |                         self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u))))) / self.I_zz),
 88 |             x[4] + self.Ts * (self.u * cos(x[2]) - x[1] * sin(x[2]))
 89 |         )
 90 | 
 91 |         # Create solver instance
 92 |         self.F = Function("F", [x, u], [self.f])
 93 | 
 94 |         # Create empty NLP
 95 |         w = []
 96 |         lbw = []
 97 |         ubw = []
 98 |         lbg = []
 99 |         ubg = []
100 |         G = []
101 |         J = 0
102 | 
103 |         # Initial conditions
104 |         Xk = MX.sym('X0', self.DYNAMICS_DIM)
105 |         w += [Xk]
106 |         lbw += x_init
107 |         ubw += x_init
108 | 
109 |         for k in range(1, predict_steps + 1):
110 |             # Local control
111 |             Uname = 'U' + str(k - 1)
112 |             Uk = MX.sym(Uname, self.ACTION_DIM)
113 |             w += [Uk]
114 |             lbw += self.U_LOWER
115 |             ubw += self.U_UPPER
116 | 
117 |             Fk = self.F(Xk, Uk)
118 |             Xname = 'X' + str(k)
119 |             Xk = MX.sym(Xname, self.DYNAMICS_DIM)
120 | 
121 |             # Dynamic Constriants
122 |             G += [Fk - Xk]
123 |             lbg += self.zero
124 |             ubg += self.zero
125 |             w += [Xk]
126 |             if self.tire_model == 'Fiala':
127 |                 lbw += [-inf, -20, -pi, -20, 50, -inf]
128 |                 ubw += [inf, 20, pi, 20, 0, inf]
129 |             else:
130 |                 lbw += [-inf, -20, -pi, -20, -inf]
131 |                 ubw += [inf, 20, pi, 20, inf]
132 |             F_cost = Function('F_cost', [x, u], [0.1 * (x[0] - self.a_curve * sin(self.k_curve * x[4])) ** 2
133 |                                                  + 0.1 * (x[2] - arctan(
134 |                 self.a_curve * self.k_curve * cos(self.k_curve * x[4]))) ** 2
135 |                                                  + 0.001 * u[0] ** 2])
136 |             J += F_cost(w[k * 2], w[k * 2 - 1])
137 | 
138 |         # Create NLP solver
139 |         nlp = dict(f=J, g=vertcat(*G), x=vertcat(*w))
140 |         S = nlpsol('S', 'ipopt', nlp, self._sol_dic)
141 | 
142 |         # Solve NLP
143 |         r = S(lbx=lbw, ubx=ubw, x0=0, lbg=lbg, ubg=ubg)
144 |         # print(r['x'])
145 |         state_all = np.array(r['x'])
146 |         state = np.zeros([predict_steps, self.DYNAMICS_DIM])
147 |         control = np.zeros([predict_steps, self.ACTION_DIM])
148 |         nt = self.DYNAMICS_DIM + self.ACTION_DIM  # total variable per step
149 | 
150 |         # save trajectories
151 |         for i in range(predict_steps):
152 |             state[i] = state_all[nt * i: nt * i + nt - 1].reshape(-1)
153 |             control[i] = state_all[nt * i + nt - 1]
154 |         return state, control
155 | 
156 |     def mpcSolver(self, x_init, predict_steps):
157 |         """
158 |         Solver of nonlinear MPC
159 | 
160 |         Parameters
161 |         ----------
162 |         x_init: list
163 |             input state for MPC.
164 |         predict_steps: int
165 |             steps of predict horizon.
166 | 
167 |         Returns
168 |         ----------
169 |         state: np.array     shape: [predict_steps+1, state_dimension]
170 |             state trajectory of MPC in the whole predict horizon.
171 |         control: np.array   shape: [predict_steps, control_dimension]
172 |             control signal of MPC in the whole predict horizon.
173 |         """
174 |         tire_model = self.tire_model
175 |         if tire_model == 'Fiala':
176 |             DYNAMICS_DIM = 6
177 |             ACTION_DIM = 2
178 |         else:
179 |             DYNAMICS_DIM = 5
180 |             ACTION_DIM = 1
181 |         x = SX.sym('x', DYNAMICS_DIM)
182 |         u = SX.sym('u', ACTION_DIM)
183 | 
184 |         # Create solver instance
185 |         if self.tire_model == 'Fiala':
186 |             self.f_d = vertcat(
187 |                 x[0] + self.Ts * (x[4] * sin(x[2]) + x[1] * cos(x[2])),
188 |                 # y : lateral position
189 |                 x[1] + self.Ts * (((- self.k1 * tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4])) * (
190 |                         pow(self.k1 * tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4])), 2) / (
191 |                         27 * pow(self.D * self.F_z1, 2)) - self.k1 * fabs(
192 |                     tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4]))) / (
193 |                                 3 * self.D * self.F_z1) + 1)) * cos(u[0]) - self.k2 * tan(
194 |                     arctan((x[1] - self.b * x[3]) / x[4])) * (
195 |                                            pow(self.k2 * tan(arctan((x[1] - self.b * x[3]) / x[4])), 2) / (
196 |                                            27 * pow(self.D * self.F_z2, 2)) - self.k2 * fabs(tan(
197 |                                        arctan((x[1] - self.b * x[3]) / x[4]))) / (
198 |                                                    3 * self.D * self.F_z2) + 1)) / self.m - x[4] * x[3]),
199 |                 # v_y : lateral speed
200 |                 x[2] + self.Ts * (x[3]),
201 |                 # psi : heading angle
202 |                 x[3] + self.Ts * ((self.a * (- self.k1 * tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4])) * (
203 |                         pow(self.k1 * tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4])), 2) / (
204 |                         27 * pow(self.D * self.F_z1, 2)) - self.k1 * fabs(
205 |                     tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4]))) / (
206 |                                 3 * self.D * self.F_z1) + 1)) * cos(u[0]) - self.b * (
207 |                                            - self.k2 * tan(arctan((x[1] - self.b * x[3]) / x[4])) * (
208 |                                            pow(self.k2 * tan(arctan((x[1] - self.b * x[3]) / x[4])), 2) / (
209 |                                            27 * pow(self.D * self.F_z2, 2)) - self.k2 * fabs(tan(
210 |                                        arctan((x[1] - self.b * x[3]) / x[4]))) / (
211 |                                                    3 * self.D * self.F_z2) + 1))) / self.I_zz),
212 |                 # r : yaw rate
213 |                 x[4] + self.Ts * (
214 |                             u[1] + x[2] * x[4] - (- self.k1 * tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4])) * (
215 |                             pow(self.k1 * tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4])), 2) / (
216 |                             27 * pow(self.D * self.F_z1, 2)) - self.k1 * fabs(
217 |                         tan(- u[0] + arctan((x[1] + self.a * x[3]) / x[4]))) / (
218 |                                     3 * self.D * self.F_z1) + 1)) * sin(u[0]) / self.m),
219 |                 # v_x : longitudinal speed
220 |                 x[5] + self.Ts * (x[4])
221 |                 # x : longitudinal position
222 |             )
223 |             self.F = Function("F", [x, u], [self.f_d])
224 |         elif self.tire_model == 'Pacejka':
225 |             # discrete dynamic model
226 |             self.f = vertcat(
227 |                 x[0] + self.Ts * (self.u * sin(x[2]) + x[1] * cos(x[2])),
228 |                 x[1] + self.Ts * ((-self.D * self.F_z1 * sin(
229 |                     self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u))) * cos(u[0])
230 |                                    - self.D * self.F_z2 * sin(
231 |                             self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u)))) / self.m - self.u * x[3]),
232 |                 x[2] + self.Ts * (x[3]),
233 |                 x[3] + self.Ts * ((self.a * (-self.D * self.F_z1 * sin(
234 |                     self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u)))) * cos(u[0])
235 |                                    - self.b * (-self.D * self.F_z2 * sin(
236 |                             self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u))))) / self.I_zz),
237 |                 x[4] + self.Ts * (self.u * cos(x[2]) - x[1] * sin(x[2]))
238 |             )
239 |             # todo:retreve
240 |             self.F = Function("F", [x, u], [self.f])
241 |         elif self.tire_model == 'Linear':
242 |             # linear model
243 |             self.f = vertcat(
244 |                 x[0] + self.Ts * (self.u * sin(x[2]) + x[1] * cos(x[2])),
245 |                 x[1] + self.Ts * ((-self.k1*(-u[0] + (x[1] + self.a * x[3]) / self.u) * cos(u[0]) +
246 |                                   -self.k2*((x[1] - self.b * x[3]) / self.u)) / self.m - self.u * x[3]),
247 |                 x[2] + self.Ts * (x[3]),
248 |                 x[3] + self.Ts * (self.a * (-self.D * self.F_z1 * sin(
249 |                     self.C * arctan(self.B * (-u[0] + (x[1] + self.a * x[3]) / self.u)))) * cos(u[0])
250 |                                   - self.b * (-self.D * self.F_z2 * sin(
251 |                             self.C * arctan(self.B * ((x[1] - self.b * x[3]) / self.u)))) / self.I_zz),
252 |                 x[4] + self.Ts * (self.u * cos(x[2]) - x[1] * sin(x[2]))
253 |                 )
254 |             self.F = Function("F", [x, u], [self.f])
255 | 
256 |         # Create empty NLP
257 |         w = []
258 |         lbw = []
259 |         ubw = []
260 |         lbg = []
261 |         ubg = []
262 |         G = []
263 |         J = 0
264 | 
265 |         # Initial conditions
266 |         Xk = MX.sym('X0', DYNAMICS_DIM)
267 |         w += [Xk]
268 |         lbw += x_init
269 |         ubw += x_init
270 | 
271 |         for k in range(1, predict_steps + 1):
272 |             # Local control
273 |             Uname = 'U' + str(k - 1)
274 |             Uk = MX.sym(Uname, ACTION_DIM)
275 |             w += [Uk]
276 |             lbw += self.U_LOWER
277 |             ubw += self.U_UPPER
278 |             # Gk = self.G_f(Xk,Uk)
279 | 
280 |             Fk = self.F(Xk, Uk)
281 |             Xname = 'X' + str(k)
282 |             Xk = MX.sym(Xname, DYNAMICS_DIM)
283 | 
284 | 
285 |             # Dynamic Constriants
286 |             G += [Fk - Xk]
287 |             lbg += self.zero
288 |             ubg += self.zero
289 |             w += [Xk]
290 |             if self.tire_model == 'Fiala':
291 |                 lbw += [-inf, -20, -pi, -20, 0, -inf]
292 |                 ubw += [inf, 20, pi, 20, 50, inf]
293 |             else:
294 |                 lbw += [-inf, -20, -pi, -20, -inf]
295 |                 ubw += [inf, 20, pi, 20, inf]
296 | 
297 | 
298 |             # Cost function
299 |             if tire_model == 'Fiala':
300 |                 F_cost = Function('F_cost', [x, u], [6 * (x[0]) ** 2
301 |                                                      + 0.2 * (x[4] - self.u) ** 2
302 |                                                      + 80 * u[0] ** 2
303 |                                                      + 0.3 * u[1] ** 2])
304 |             else:
305 |                 F_cost = Function('F_cost', [x, u], [1 * (x[0]) ** 2
306 |                                                      + 1 * (x[2]) ** 2
307 |                                                      + 1 * u[0] ** 2])
308 |             J += F_cost(w[k * 2], w[k * 2 - 1])
309 |             # J += F_cost(w[k * 3 - 1], w[k * 3 - 2])
310 | 
311 |         # Create NLP solver
312 |         nlp = dict(f=J, g=vertcat(*G), x=vertcat(*w))
313 |         S = nlpsol('S', 'ipopt', nlp, self._sol_dic)
314 | 
315 |         # Solve NLP
316 |         r = S(lbx=lbw, ubx=ubw, x0=0, lbg=lbg, ubg=ubg)
317 |         # print(r['x'])
318 |         state_all = np.array(r['x'])
319 |         state = np.zeros([predict_steps, DYNAMICS_DIM])
320 |         control = np.zeros([predict_steps, ACTION_DIM])
321 |         nt = DYNAMICS_DIM  + ACTION_DIM  # total variable per step
322 | 
323 |         # save trajectories
324 |         for i in range(predict_steps):
325 |             state[i] = state_all[nt * i: nt * i + DYNAMICS_DIM].reshape(-1)
326 |             control[i] = state_all[nt * i + DYNAMICS_DIM: nt * (i + 1)].reshape(-1)
327 |         return state, control
328 | 


--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | import torch
  3 | import os
  4 | from matplotlib import pyplot as plt
  5 | from config import GeneralConfig, DynamicsConfig
  6 | from dynamics import VehicleDynamics
  7 | from utils import step_relative
  8 | 
  9 | 
 10 | class Train(DynamicsConfig):
 11 |     def __init__(self):
 12 |         super(Train, self).__init__()
 13 | 
 14 |         self.agent_batch = torch.empty([self.BATCH_SIZE, self.DYNAMICS_DIM])
 15 |         self.state_batch = torch.empty([self.BATCH_SIZE, self.STATE_DIM])
 16 |         self.init_index = np.ones([self.BATCH_SIZE, 1])
 17 | 
 18 |         self.x_forward = []
 19 |         self.u_forward = []
 20 |         self.L_forward = []
 21 |         self.iteration_index = 0
 22 | 
 23 |         self.value_loss = np.empty([0, 1])
 24 |         self.policy_loss = np.empty([0, 1])
 25 |         self.dynamics = VehicleDynamics()
 26 |         self.equilibrium_state = torch.tensor([[0.0, 0.0, 0.0, 0.0]])
 27 | 
 28 |         for i in range(self.FORWARD_STEP):
 29 |             self.u_forward.append([])
 30 |             self.L_forward.append([])
 31 |         for i in range(self.FORWARD_STEP+1):
 32 |             self.x_forward.append([])
 33 |         self.initialize_state()
 34 | 
 35 |     def initialize_state(self):
 36 |         self.state_batch[:, 0] = torch.normal(0.0, 0.3, [self.BATCH_SIZE, ])
 37 |         self.state_batch[:, 1] = torch.normal(0.0, 0.2, [self.BATCH_SIZE, ])
 38 |         self.state_batch[:, 2] = torch.normal(0.0, 0.1, [self.BATCH_SIZE, ])
 39 |         self.state_batch[:, 3] = torch.normal(0.0, 0.06, [self.BATCH_SIZE, ])
 40 |         self.agent_batch[:, 4] = torch.linspace(0.0, np.pi, self.BATCH_SIZE)
 41 |         init_ref = self.dynamics.reference_trajectory(self.agent_batch[:, 4])
 42 |         self.agent_batch[:, 0:4] = self.state_batch + init_ref
 43 |         self.init_state = self.agent_batch
 44 | 
 45 |     def check_done(self, state):
 46 |         """
 47 |         Check if the states reach unreasonable zone and reset them
 48 |         Parameters
 49 |         ----------
 50 |         state: tensor   shape: [BATCH_SIZE, STATE_DIMENSION]
 51 |             state used for checking.
 52 |         Returns
 53 |         -------
 54 | 
 55 |         """
 56 |         threshold = np.kron(np.ones([self.BATCH_SIZE, 1]), np.array([self.y_range, self.psi_range]))
 57 |         threshold = np.array(threshold, dtype='float32')
 58 |         threshold = torch.from_numpy(threshold)
 59 |         ref_state = self.dynamics.reference_trajectory(state[:, -1])
 60 |         state = state[:, 0:4] - ref_state
 61 |         check_state = state[:, [0, 2]].clone()
 62 |         check_state.detach_()
 63 |         sign_error = torch.sign(torch.abs(check_state) - threshold) # if abs state is over threshold, sign_error = 1
 64 |         self._reset_index, _ = torch.max(sign_error, 1) # if one state is over threshold, _reset_index = 1
 65 |         if self.iteration_index == self.RESET_ITERATION:
 66 |             self._reset_index = torch.from_numpy(np.ones([self.BATCH_SIZE,],dtype='float32'))
 67 |             self.iteration_index = 0
 68 |             print('AGENT RESET')
 69 |         reset_state = self._reset_state(self.agent_batch)
 70 |         return reset_state
 71 | 
 72 |     def _reset_state(self, state):
 73 |         """
 74 |         reset state to initial state.
 75 |         Parameters
 76 |         ----------
 77 |         state: tensor   shape: [BATCH_SIZE, STATE_DIMENSION]
 78 |             state used for checking.
 79 | 
 80 |         Returns
 81 |         -------
 82 |         state: state after reset.
 83 | 
 84 |         """
 85 |         for i in range(self.BATCH_SIZE):
 86 |             if self._reset_index[i] == 1:
 87 |                 state[i, :] = self.init_state[i, :]
 88 |         return state
 89 | 
 90 |     def update_state(self, policy, dynamics):
 91 |         """
 92 |         Update state using policy net and dynamics model.
 93 |         Parameters
 94 |         ----------
 95 |         policy: nn.Module
 96 |             policy net.
 97 |         dynamics: object dynamics.
 98 | 
 99 |         """
100 |         self.agent_batch = self.check_done(self.agent_batch)
101 |         self.agent_batch.detach_()
102 |         ref_trajectory = dynamics.reference_trajectory(self.agent_batch[:, -1])
103 |         self.state_batch = self.agent_batch[:, 0:4] - ref_trajectory
104 |         control = policy.forward(self.state_batch)
105 |         self.agent_batch, self.state_batch = dynamics.step_relative(self.agent_batch, control)
106 |         self.iteration_index += 1
107 | 
108 |     def policy_evaluation(self, policy, value, dynamics):
109 |         """
110 |         Do n-step look-ahead policy evaluation.
111 |         Parameters
112 |         ----------
113 |         policy: policy net
114 |         value: value net
115 |         dynamics: object dynamics
116 | 
117 |         """
118 |         for i in range(self.FORWARD_STEP):
119 |             if i == 0:
120 |                 self.x_forward[i] = self.agent_batch.detach() # 要存agent batch是因为step relative要用agent
121 |                 reference = dynamics.reference_trajectory(self.agent_batch[:,-1])
122 |                 self.state_batch = dynamics.relative_state(self.x_forward[i])
123 |                 self.u_forward[i] = policy.forward(self.state_batch)
124 |                 self.x_forward[i + 1], _, _, _, _, _, _ = dynamics.step(self.x_forward[i], self.u_forward[i])
125 |                 ref_state_next = self.x_forward[i + 1][:, 0:4] - reference
126 |                 self.L_forward[i] = dynamics.utility(ref_state_next, self.u_forward[i])
127 |             else:
128 |                 ref_state = self.x_forward[i][:, 0:4] - reference
129 |                 self.u_forward[i] = policy.forward(ref_state)
130 |                 self.x_forward[i + 1], _, _, _, _, _, _ = dynamics.step(self.x_forward[i],
131 |                                                                                         self.u_forward[i])
132 |                 ref_state_next = self.x_forward[i + 1][:, 0:4] - reference
133 |                 self.L_forward[i] = dynamics.utility(ref_state_next, self.u_forward[i])
134 |         self.agent_batch_next = self.x_forward[-1]
135 |         self.state_batch_next = self.agent_batch_next[:, 0:4] - reference
136 |         self.value_next = value.forward(self.state_batch_next)
137 |         self.utility = torch.zeros([self.FORWARD_STEP, self.BATCH_SIZE], dtype=torch.float32)
138 |         for i in range(self.FORWARD_STEP):
139 |             self.utility[i] = self.L_forward[i].clone()
140 |         self.sum_utility = torch.sum(self.utility,0)
141 |         target_value = self.sum_utility.detach() + self.value_next.detach()
142 |         value_now = value.forward(self.state_batch)
143 |         value_equilibrium = value.forward(self.equilibrium_state)
144 |         value_loss = 1 / 2 * torch.mean(torch.pow((target_value - value_now), 2)) \
145 |                      + 10 * torch.pow(value_equilibrium, 2)
146 |         self.state_batch.requires_grad_(False)
147 |         value.zero_grad()
148 |         value_loss.backward()
149 |         torch.nn.utils.clip_grad_norm_(value.parameters(), 10.0)
150 |         value.opt.step()
151 |         value.scheduler.step()
152 |         self.value_loss = np.append(self.value_loss, value_loss.detach().numpy())
153 |         return value_loss.detach().numpy()
154 | 
155 |     def policy_improvement(self, policy, value):
156 |         """
157 |         Do n-step look-ahead policy improvement.
158 |         Parameters
159 |         ----------
160 |         policy: policy net
161 |         value: value net
162 | 
163 |         """
164 |         self.value_next = value.forward(self.state_batch_next)
165 |         policy_loss = torch.mean(self.sum_utility + self.value_next)  # Hamilton
166 |         #for i in range(1):
167 |         policy.zero_grad()
168 |         policy_loss.backward()
169 |         torch.nn.utils.clip_grad_norm_(policy.parameters(), 10.0)
170 |         policy.opt.step()
171 |         policy.scheduler.step()
172 |         self.policy_loss = np.append(self.policy_loss, policy_loss.detach().numpy())
173 |         return policy_loss.detach().numpy()
174 | 
175 |     def save_data(self, log_dir):
176 |         """
177 |         Save loss data.
178 |         Parameters
179 |         ----------
180 |         log_dir: str
181 |             directory in ./Results_dir.
182 | 
183 |         Returns
184 |         -------
185 | 
186 |         """
187 |         np.savetxt(os.path.join(log_dir, "value_loss.txt"), self.value_loss)
188 |         np.savetxt(os.path.join(log_dir, "policy_loss.txt"), self.policy_loss)
189 | 
190 |     def print_loss_figure(self, iteration, log_dir):
191 |         """
192 |         print figure of loss decent.
193 |         Parameters
194 |         ----------
195 |         iteration: int
196 |             number of iterations.
197 |         log_dir: str
198 |             directory in ./Results_dir.
199 | 
200 |         Returns
201 |         -------
202 | 
203 |         """
204 |         plt.figure()
205 |         plt.scatter(range(iteration), np.log10(self.value_loss), c='r', marker=".", s=5., label="policy evaluation")
206 |         plt.scatter(range(iteration), np.log10(self.policy_loss), c='b', marker=".", s=5., label="policy improvement")
207 |         plt.legend(loc='upper right')
208 |         plt.xlabel('iteration')
209 |         plt.ylabel('loss')
210 |         plt.savefig(os.path.join(log_dir, "loss.png"))
211 | 
212 | 


--------------------------------------------------------------------------------
/trained_results/2020-10-09-14-42-10000/actor.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mahaitongdae/Approximate-Dynamic-Programming/f03f9469f5018c55f06cf8fea2d5f629abc4078b/trained_results/2020-10-09-14-42-10000/actor.pth


--------------------------------------------------------------------------------
/trained_results/2020-10-09-14-42-10000/critic.pth:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mahaitongdae/Approximate-Dynamic-Programming/f03f9469f5018c55f06cf8fea2d5f629abc4078b/trained_results/2020-10-09-14-42-10000/critic.pth


--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
  1 | from matplotlib import pyplot as plt
  2 | import matplotlib.colors as mcolors
  3 | from itertools import cycle
  4 | from config import PlotConfig, DynamicsConfig
  5 | import numpy as np
  6 | import torch
  7 | 
  8 | def cm2inch(*tupl):
  9 |     inch = 2.54
 10 |     if isinstance(tupl[0], tuple):
 11 |         return tuple(i/inch for i in tupl[0])
 12 |     else:
 13 |         return tuple(i/inch for i in tupl)
 14 | 
 15 | def smooth(data, a=0.5):
 16 |     data = np.array(data).reshape(-1, 1)
 17 |     for ind in range(data.shape[0] - 1):
 18 |         data[ind + 1, 0] = data[ind, 0] * (1-a) + data[ind + 1, 0] * a
 19 |     return data
 20 | 
 21 | def numpy2torch(input, size):
 22 |     """
 23 | 
 24 |     Parameters
 25 |     ----------
 26 |     input
 27 | 
 28 |     Returns
 29 |     -------
 30 | 
 31 |     """
 32 |     u = np.array(input, dtype='float32').reshape(size)
 33 |     return torch.from_numpy(u)
 34 | 
 35 | def step_relative(statemodel, state, u):
 36 |     """
 37 | 
 38 |     Parameters
 39 |     ----------
 40 |     state_r
 41 |     u_r
 42 | 
 43 |     Returns
 44 |     -------
 45 | 
 46 |     """
 47 |     x_ref = statemodel.reference_trajectory(state[:, -1])
 48 |     state_r = state.detach().clone()  # relative state
 49 |     state_r[:, 0:4] = state_r[:, 0:4] - x_ref
 50 |     state_next, deri_state, utility, F_y1, F_y2, alpha_1, alpha_2 = statemodel.step(state, u)
 51 |     state_r_next_bias, _, _, _, _, _, _ = statemodel.step(state_r, u) # update by relative value
 52 |     state_r_next = state_r_next_bias.detach().clone()
 53 |     state_r_next_bias[:, [0, 2]] = state_next[:, [0, 2]]            # y psi with reference update by absolute value
 54 |     x_ref_next = statemodel.reference_trajectory(state_next[:, -1])
 55 |     state_r_next[:, 0:4] = state_r_next_bias[:, 0:4] - x_ref_next
 56 |     return state_next.clone().detach(), state_r_next.clone().detach(), x_ref.detach().clone()
 57 | 
 58 | def recover_absolute_state(state_r_predict, x_ref, length=None):
 59 |     if length is None:
 60 |         length = state_r_predict.shape[0]
 61 |     c = DynamicsConfig()
 62 |     ref_predict = [x_ref]
 63 |     for i in range(length-1):
 64 |         ref_t = np.copy(ref_predict[-1])
 65 |         # ref_t[0] += c.u * c.Ts * np.tan(x_ref[2])
 66 |         ref_predict.append(ref_t)
 67 |     state = state_r_predict[:, 0:4] + ref_predict
 68 |     return state, np.array(ref_predict)
 69 | 
 70 | def idplot(data,
 71 |            figure_num=1,
 72 |            mode="xy",
 73 |            fname=None,
 74 |            xlabel=None,
 75 |            ylabel=None,
 76 |            legend=None,
 77 |            legend_loc="best",
 78 |            color_list=None,
 79 |            xlim=None,
 80 |            ylim=None,
 81 |            ncol=1,
 82 |            figsize_scalar=1):
 83 |     """
 84 |     plot figures
 85 |     """
 86 |     if (color_list is None) or len(color_list) < figure_num:
 87 |         tableau_colors = cycle(mcolors.TABLEAU_COLORS)
 88 |         color_list = [next(tableau_colors) for _ in range(figure_num)]
 89 | 
 90 |     l = 5
 91 |     fig_size = (PlotConfig.fig_size * figsize_scalar, PlotConfig.fig_size * figsize_scalar)
 92 |     _, ax = plt.subplots(figsize=cm2inch(*fig_size), dpi=PlotConfig.dpi)
 93 |     if figure_num == 1:
 94 |         data = [data]
 95 | 
 96 |     if color_list is not None:
 97 |         for (i, d) in enumerate(data):
 98 |             if mode == "xy":
 99 |                 if i == l - 2:
100 |                     plt.plot(d[0], d[1], linestyle='-.', color=color_list[i])
101 |                 elif i == l - 1:
102 |                     plt.plot(d[0], d[1], linestyle='dotted', color=color_list[i])
103 |                 else:
104 |                     plt.plot(d[0], d[1], color=color_list[i])
105 |             if mode == "y":
106 |                 plt.plot(d, color=color_list[i])
107 |             if mode == "scatter":
108 |                 plt.scatter(d[0], d[1], color=color_list[i], marker=".", s =5.,)
109 |     else:
110 |         for (i, d) in enumerate(data):
111 |             if mode == "xy":
112 |                 if i == l - 2:
113 |                     plt.plot(d[0], d[1],  linestyle='-.')
114 |                 elif i == l - 1:
115 |                     plt.plot(d[0], d[1],  linestyle='dotted')
116 |                 else:
117 |                     plt.plot(d[0], d[1])
118 |             if mode == "y":
119 |                 plt.plot(d)
120 |             if mode == "scatter":
121 |                 plt.scatter(d[0], d[1], marker=".", s =5.,)
122 | 
123 |     plt.tick_params(labelsize=PlotConfig.tick_size)
124 |     labels = ax.get_xticklabels() + ax.get_yticklabels()
125 |     [label.set_fontname(PlotConfig.tick_label_font) for label in labels]
126 |     if legend is not None:
127 |         plt.legend(legend, loc=legend_loc, ncol=ncol, prop=PlotConfig.legend_font)
128 |     plt.xlabel(xlabel, PlotConfig.label_font)
129 |     plt.ylabel(ylabel, PlotConfig.label_font)
130 |     if xlim is not None:
131 |         plt.xlim(xlim)
132 |     if ylim is not None:
133 |         plt.ylim(ylim)
134 |     plt.tight_layout(pad=PlotConfig.pad)
135 | 
136 |     if fname is None:
137 |         plt.show()
138 |     else:
139 |         plt.savefig(fname)
140 | 
141 | 
142 | 


--------------------------------------------------------------------------------
/utils/iDplot.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mahaitongdae/Approximate-Dynamic-Programming/f03f9469f5018c55f06cf8fea2d5f629abc4078b/utils/iDplot.zip


--------------------------------------------------------------------------------
/utils/road.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mahaitongdae/Approximate-Dynamic-Programming/f03f9469f5018c55f06cf8fea2d5f629abc4078b/utils/road.png


--------------------------------------------------------------------------------