├── img
    ├── screenshot.png
    └── Obstacle Avoidance.gif
├── README.md
├── D3QN_testing_Lidar_grid.py
├── D3QN_training_standard.py
└── D3QN_training_Lidar.py


/img/screenshot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ironteen/Obstacle-Avoidance-in-AirSim/HEAD/img/screenshot.png


--------------------------------------------------------------------------------
/img/Obstacle Avoidance.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ironteen/Obstacle-Avoidance-in-AirSim/HEAD/img/Obstacle Avoidance.gif


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # A Simple Reinforcement Learning Demo for Obstacle Avoidance Using Microsoft AirSim
 2 | 
 3 | This repository contains Python scripts showing how to use [Microsoft AirSim](https://github.com/Microsoft/AirSim) to collect image data from a moving vehicle and then use that data to train the vehicle to avoid obstacles with TensorFlow. The RL algorithm used is D3QN (Double Deep Q-Network with a dueling architecture).
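
As a rough illustration of the two ideas behind D3QN, the dependency-free sketch below combines a state value and per-action advantages into Q-values (the dueling part, `Q = V + (A - mean(A))`) and builds a Double DQN target in which the online network picks the next action and the target network evaluates it. The function names `dueling_q` and `double_dqn_target` and the toy numbers are illustrative assumptions, not code from this repository; the default discount of 0.99 matches `GAMMA` in the training scripts.

```python
def dueling_q(value, advantages):
    """Combine a scalar state value with per-action advantages: Q = V + (A - mean(A))."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + (a - mean_adv) for a in advantages]


def double_dqn_target(reward, terminal, q_next_online, q_next_target, gamma=0.99):
    """Double DQN target: the online net selects the action, the target net scores it."""
    best_action = max(range(len(q_next_online)), key=lambda i: q_next_online[i])
    bootstrap = 0.0 if terminal else gamma * q_next_target[best_action]
    return reward + bootstrap


# Toy numbers, chosen only for illustration.
q_values = dueling_q(1.0, [0.0, 2.0, 4.0])
target_running = double_dqn_target(1.0, False, [0.1, 0.9], [2.0, 3.0])
target_terminal = double_dqn_target(-100.0, True, [0.5, 0.2], [7.0, 8.0])
print(q_values, target_running, target_terminal)
```

Subtracting the mean advantage keeps the value and advantage streams identifiable, which is the same construction the `Deep_Q_Network` class uses for its `readout` tensor.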
 4 | 
 5 | ![screenshot](https://github.com/Ironteen/Obstacle-Avoidance-in-AirSim/blob/master/img/screenshot.png)
 6 | 
 7 | ## Prerequisites
 8 | 
 9 | - [Recommended hardware](https://wiki.unrealengine.com/Recommended_Hardware) for running Unreal Engine 4, which AirSim requires. Although it is possible to build AirSim on OS X and Linux, we found it easiest to use the pre-compiled Windows binaries.
10 | - The map shown above is a simple demo built on the Blocks environment.
11 | - [Python3](https://www.python.org/ftp/python/3.6.3/python-3.6.3-amd64.exe) for 64-bit Windows
12 | - [TensorFlow](https://www.tensorflow.org/install/install_windows). To run TensorFlow on your GPU as we and most people do, you'll need to follow the [directions](https://www.tensorflow.org/install/install_windows) for installing CUDA and CuDNN. We recommend setting aside at least an hour to make sure you do this right.
13 | 
14 | ## Documentation
15 | 
16 | - ```
17 |   D3QN_training_standard.py
18 |   ```
19 | 
20 |   This script is the standard monocular obstacle-avoidance training program. Using only a monocular camera, the vehicle can learn to avoid obstacles.
21 | 
22 | - ```
23 |   D3QN_training_Lidar.py
24 |   ```
25 | 
26 |   Using a monocular camera together with lidar, the vehicle can learn to avoid obstacles more efficiently.
27 | 
28 | - ```
29 |   D3QN_testing_Lidar_grid.py
30 |   ```
31 | 
32 |    Once the vehicle is well trained, you can run this test program. While it runs, the car records the space it has explored in a grid map, which is saved as a .pkl file under `./test_result`.
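
The saved grid map can be inspected after a test run. The sketch below assumes the layout used by `D3QN_testing_Lidar_grid.py`: a 500x500 NumPy array initialised to -1, in which each visited cell is overwritten with `1 / (distance + 0.001)`. The helper name `summarize_grid_map` is hypothetical, and a small stand-in file is written to a temporary directory so the example runs without the simulator.

```python
import os
import pickle
import tempfile

import numpy as np


def summarize_grid_map(path):
    """Return (number of visited cells, (row, col) of the visited cell closest to the goal)."""
    with open(path, "rb") as f:
        grid = pickle.load(f)  # only load pickle files you created yourself
    visited = grid != -1       # unvisited cells keep the initial value -1
    if not visited.any():
        return 0, None
    # Visited cells store 1 / (distance + 0.001), so the largest value is the closest cell.
    flat_index = np.argmax(np.where(visited, grid, -np.inf))
    row, col = np.unravel_index(flat_index, grid.shape)
    return int(visited.sum()), (int(row), int(col))


# Stand-in data in the same format as the test script's output (assumed values).
demo_grid = -1 * np.ones([500, 500])
demo_grid[10, 20] = 1 / (100.0 + 0.001)
demo_grid[12, 24] = 1 / (50.0 + 0.001)
demo_path = os.path.join(tempfile.mkdtemp(), "grid_map_1.pkl")
with open(demo_path, "wb") as f:
    pickle.dump(demo_grid, f)

count, closest_cell = summarize_grid_map(demo_path)
print(count, closest_cell)
```

For a real run, point `summarize_grid_map` at one of the `./test_result/grid_map_<episode>.pkl` files instead of the stand-in.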
33 | 
34 | 
35 | ## Instructions
36 | 
37 | 1. Clone this repository.
38 | 
39 | 2. Open or build a map, set `"SimMode": "Car"` in `settings.json`, and then run it.
40 | 
41 | 3. Choose a training script and modify the destination coordinates, then run
42 | 
43 |    ```
44 |    python D3QN_training_standard.py or python D3QN_training_Lidar.py
45 |    ```
46 | 
47 |    Training takes a long time. Training the car with lidar is more efficient.
48 | 
49 | 4. Once the vehicle is well trained, run
50 | 
51 |    ```
52 |    python D3QN_testing_Lidar_grid.py
53 |    ```
54 | 
55 |    The car is tested without lidar, because lidar is only an auxiliary tool for the training task.
56 | 
57 | 5. This is a simple demo of obstacle avoidance with D3QN; you can swap in DDPG or A3C quite easily.
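
Step 2 can also be scripted. The sketch below builds a minimal AirSim settings document with `SimMode` set to `Car`. The `SettingsVersion` value of 1.2 and the usual location `~/Documents/AirSim/settings.json` are assumptions about your AirSim install, so check them against the version you use. The helper name `write_car_settings` is hypothetical, and the example writes to a temporary directory rather than touching a real configuration.

```python
import json
import os
import tempfile


def write_car_settings(path):
    """Write a minimal settings file that starts AirSim in car mode."""
    settings = {
        "SettingsVersion": 1.2,  # assumed; use the value your AirSim version expects
        "SimMode": "Car",
    }
    with open(path, "w") as f:
        json.dump(settings, f, indent=2)
    return settings


demo_path = os.path.join(tempfile.mkdtemp(), "settings.json")
write_car_settings(demo_path)
with open(demo_path) as f:
    loaded = json.load(f)
print(loaded)
```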
58 | 
59 | ## Results
60 | 
61 | ![Obstacle-Avoidance](https://github.com/Ironteen/Obstacle-Avoidance-in-AirSim/blob/master/img/Obstacle%20Avoidance.gif)
62 | 
63 | Average number of steps per episode:
64 | 
65 | | Method              | Steps |
66 | | ------------------- | ----- |
67 | | D3QN training       | 850   |
68 | | D3QN+Lidar training | 3200  |
69 | | Test without Lidar  | 2185  |
70 | 
71 | Note: An episode ends when the vehicle reaches the destination or collides.
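
The termination rule, together with the reward shaping in `env_feedback` from `D3QN_training_standard.py`, can be condensed into a pure function that needs no simulator. This is a sketch under stated assumptions: the function name `step_feedback` and its return tuple are hypothetical, while the goal coordinates, the 10-unit goal radius, and the reward values are taken from the script.

```python
import math

GOAL = (-130.0, -210.0)  # terminal_position in the scripts
GOAL_RADIUS = 10.0       # the episode counts as a success within this distance


def distance_to_goal(position, goal=GOAL):
    return math.hypot(position[0] - goal[0], position[1] - goal[1])


def step_feedback(position, previous_position, has_collided):
    """Return (reward, reached_goal, episode_over) for one simulation step."""
    distance = distance_to_goal(position)
    reached_goal = distance <= GOAL_RADIUS
    if has_collided:
        reward = -100.0               # collisions are penalised heavily
    elif reached_goal:
        reward = 20.0                 # bonus for arriving at the destination
    elif distance < distance_to_goal(previous_position):
        reward = 20.0 / distance      # small positive reward for getting closer
    else:
        reward = -20.0 / distance     # small penalty for moving away or stalling
    return reward, reached_goal, has_collided or reached_goal


arrived = step_feedback((-130.0, -205.0), (-130.0, -220.0), False)
crashed = step_feedback((0.0, 0.0), (1.0, 1.0), True)
closer = step_feedback((-130.0, -110.0), (-130.0, -100.0), False)
print(arrived, crashed, closer)
```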
72 | 
73 | ## Acknowledgement
74 | 
75 | This repository is heavily inspired by the work of Linhai Xie, Sen Wang, Niki Trigoni, and Andrew Markham:
76 | 
77 | [Monocular-Obstacle-Avoidance](https://github.com/xie9187/Monocular-Obstacle-Avoidance)
78 | 


--------------------------------------------------------------------------------
/D3QN_testing_Lidar_grid.py:
--------------------------------------------------------------------------------
  1 | import tensorflow as tf
  2 | import setup_path
  3 | import airsim
  4 | from collections import deque
  5 | 
  6 | import random
  7 | import numpy as np
  8 | import time
  9 | import os
 10 | import pickle
 11 | # basic setting
 12 | ACTION_NUMS = 13        # number of valid actions
 13 | MAX_EPISODE = 20000
 14 | DEPTH_IMAGE_WIDTH = 256
 15 | DEPTH_IMAGE_HEIGHT = 144
 16 | 
 17 | flatten_len = 9216      # the input shape before full connect layer
 18 | NumBufferFrames = 4     # take the latest 4 frames as input
 19 | 
 20 | def variable_summaries(var):
 21 |     """Attach a lot of summaries to a Tensor (for TensorBoard visualization)."""
 22 |     with tf.name_scope('summaries'):
 23 |         mean = tf.reduce_mean(var)
 24 |     tf.summary.scalar('mean', mean)
 25 |     with tf.name_scope('stddev'):
 26 |         stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
 27 |     tf.summary.scalar('stddev', stddev)
 28 |     tf.summary.scalar('max', tf.reduce_max(var))
 29 |     tf.summary.scalar('min', tf.reduce_min(var))
 30 |     tf.summary.histogram('histogram', var)
 31 | 
 32 | def weight_variable(shape):
 33 |     initial = tf.truncated_normal(shape, stddev=0.01)
 34 |     return tf.Variable(initial, name="weights")
 35 | 
 36 | def bias_variable(shape):
 37 |     initial = tf.constant(0., shape=shape)
 38 |     return tf.Variable(initial, name="bias")
 39 | 
 40 | def conv2d(x, W, stride_h, stride_w):
 41 |     return tf.nn.conv2d(x, W, strides=[1, stride_h, stride_w, 1], padding="SAME")
 42 | 
 43 | class Deep_Q_Network(object):
 44 |     """Dueling deep Q-network: three conv layers feeding separate value and advantage streams."""
 45 |     def __init__(self, sess):
 46 |         # network weights and biases
 47 |         # input 144x256x4
 48 |         with tf.name_scope("Conv1"):
 49 |             W_conv1 = weight_variable([8, 8, NumBufferFrames, 32])
 50 |             variable_summaries(W_conv1)
 51 |             b_conv1 = bias_variable([32])
 52 |         with tf.name_scope("Conv2"):
 53 |             W_conv2 = weight_variable([4, 4, 32, 64])
 54 |             variable_summaries(W_conv2)
 55 |             b_conv2 = bias_variable([64])
 56 |         with tf.name_scope("Conv3"):
 57 |             W_conv3 = weight_variable([3, 3, 64, 64])
 58 |             variable_summaries(W_conv3)
 59 |             b_conv3 = bias_variable([64])
 60 |         with tf.name_scope("Value_Dense"):
 61 |             W_value = weight_variable([flatten_len, 512])
 62 |             variable_summaries(W_value)
 63 |             b_value = bias_variable([512])
 64 |         with tf.name_scope("FCAdv"):
 65 |             W_adv = weight_variable([flatten_len, 512])
 66 |             variable_summaries(W_adv)
 67 |             b_adv = bias_variable([512])
 68 |         with tf.name_scope("FCValueOut"):
 69 |             W_value_out = weight_variable([512, 1])
 70 |             variable_summaries(W_value_out)
 71 |             b_value_out = bias_variable([1])
 72 |         with tf.name_scope("FCAdvOut"):
 73 |             W_adv_out = weight_variable([512, ACTION_NUMS])
 74 |             variable_summaries(W_adv_out)
 75 |             b_adv_out = bias_variable([ACTION_NUMS])
 76 |         # input layer
 77 |         self.state = tf.placeholder("float", [None, DEPTH_IMAGE_HEIGHT, DEPTH_IMAGE_WIDTH, NumBufferFrames])
 78 |         # Conv1 layer
 79 |         h_conv1 = tf.nn.relu(conv2d(self.state, W_conv1, 8, 8) + b_conv1)
 80 |         # Conv2 layer
 81 |         h_conv2 = tf.nn.relu(conv2d(h_conv1, W_conv2, 2, 2) + b_conv2)
 82 |         # Conv3 layer
 83 |         h_conv3 = tf.nn.relu(conv2d(h_conv2, W_conv3, 1, 1) + b_conv3)
 84 |         h_conv3_flat = tf.layers.flatten(h_conv3)
 85 |         # fully connected value stream
 86 |         h_fc_value = tf.nn.relu(tf.matmul(h_conv3_flat, W_value) + b_value)
 87 |         value = tf.matmul(h_fc_value, W_value_out) + b_value_out
 88 |         # fully connected advantage stream
 89 |         h_fc_adv = tf.nn.relu(tf.matmul(h_conv3_flat, W_adv) + b_adv)
 90 |         advantage = tf.matmul(h_fc_adv, W_adv_out) + b_adv_out
 91 |         # Q = value + (adv - advAvg)
 92 |         advAvg = tf.expand_dims(tf.reduce_mean(advantage, axis=1), axis=1)
 93 |         advIdentifiable = tf.subtract(advantage, advAvg)
 94 |         self.readout = tf.add(value, advIdentifiable)
 95 |         # define the cost function
 96 |         self.actions = tf.placeholder("float", [None, ACTION_NUMS])
 97 |         self.y = tf.placeholder("float", [None])
 98 |         self.readout_action = tf.reduce_sum(tf.multiply(self.readout, self.actions), axis=1)
 99 |         self.td_error = tf.square(self.y - self.readout_action)
100 |         self.cost = tf.reduce_mean(self.td_error)
101 |         self.train_step = tf.train.AdamOptimizer(1e-5).minimize(self.cost)
102 | 
103 | def get_image(client,image_type):
104 |     if (image_type == 'Scene'):
105 |         responses = client.simGetImages([airsim.ImageRequest("0", airsim.ImageType.Scene, False, False)])
106 |         response = responses[0]
107 |         img1d = np.frombuffer(response.image_data_uint8, dtype=np.uint8)
108 |         img_rgba = img1d.reshape(response.height, response.width, 4)
109 |         img_rgba = np.flipud(img_rgba)
110 |         observation = img_rgba[:, :, 0:3]
111 |     elif (image_type == 'Segmentation'):
112 |         responses = client.simGetImages([airsim.ImageRequest(0, airsim.ImageType.Segmentation, False, False)])
113 |         response = responses[0]
114 |         img1d = np.frombuffer(response.image_data_uint8, dtype=np.uint8)  # get numpy array
115 |         img_rgba = img1d.reshape(response.height, response.width, 4)  # reshape array to 4 channel image array H X W X 4
116 |         observation = img_rgba[:, :, 0:3]
117 |     elif (image_type == 'DepthPlanner'):
118 |         try:
119 |             responses = client.simGetImages([airsim.ImageRequest(0, airsim.ImageType.DepthPlanner, pixels_as_float=True)])
120 |             response = responses[0]
121 |             img1d = np.array(response.image_data_float, dtype=np.float64)
122 |             img1d = img1d * 3.5 + 30
123 |             img1d[img1d > 255] = 255
124 |             img2d = np.reshape(img1d, (responses[0].height, responses[0].width))
125 |             observation = img2d
126 |         except Exception:
127 |             print('######### Error: I can not get a depth image correctly! #########################' )
128 |             observation = np.ones([DEPTH_IMAGE_HEIGHT,DEPTH_IMAGE_WIDTH])
129 |     else:
130 |         observation = None
131 |     observation_size = np.shape(observation)
132 |     if(len(observation_size)>=2 and observation_size[0]==DEPTH_IMAGE_HEIGHT and observation_size[1]==DEPTH_IMAGE_WIDTH):
133 |         return observation
134 |     else:
135 |         print('######### Error: The depth image shape: ',observation_size)
136 |         return np.ones([DEPTH_IMAGE_HEIGHT,DEPTH_IMAGE_WIDTH])
137 | 
138 | def env_feedback(client,map_grid,latest_distance):
139 |     terminal_position = [-130, -210]
140 |     collision_info = client.simGetCollisionInfo()
141 |     car_position = client.getCarState().kinematics_estimated.position
142 |     distance = np.sqrt((car_position.x_val-terminal_position[0])**2+(car_position.y_val-terminal_position[1])**2)
143 |     if collision_info.has_collided:
144 |         reset = True
145 |     else:
146 |         x_val, y_val = car_position.x_val, car_position.y_val
147 |         x_val, y_val = int(np.abs(x_val * 2)), int(np.abs(y_val * 2))
148 |         map_grid[x_val, y_val] = 1 / (distance + 0.001)
149 |         reset = False
150 |     if (distance <= 10):
151 |         terminal = 1
152 |         reset = True
153 |     else:
154 |         terminal = 0
155 |     if distance<latest_distance:
156 |         latest_distance = distance
157 |     return terminal,reset,distance,map_grid,latest_distance
158 | 
159 | def excute_action(client,car_controls,steer):
160 |     car_controls.steering = steer
161 |     car_speed = client.getCarState().speed
162 |     straight_range = [-0.1,0.1]
163 |     # when swerved wildly, slow down
164 |     BigTurn_range = [-0.6,0.6]
165 |     if(steer>=straight_range[0] and steer<=straight_range[1]):
166 |         car_controls.throttle = car_controls.throttle + 0.1
167 |         car_controls.throttle = 2 if car_controls.throttle>=2 else car_controls.throttle
168 |     elif(steer<=BigTurn_range[0] or steer>=BigTurn_range[1]):
169 |         car_controls.throttle = 0.5
170 |     else:
171 |         car_controls.throttle = 1
172 |     car_controls.throttle = 0 if car_speed>=3 else car_controls.throttle
173 |     return car_controls
174 | 
175 | def print_action(episode,t,car_controls,distance,steer):
176 |     print('Episode:%05d,Step:%05d '%(episode,t),end='')
177 |     if(steer<0):
178 |         print('The car is turning left    , ',end='')
179 |     elif(steer>0):
180 |         print('The car is turning right   , ',end='')
181 |     else:
182 |         print('The car is going straight  , ',end='')
183 |     print('throttle=%.1f,  steer=%.3f, distance=%.2f'%(car_controls.throttle,steer,distance),end='')
184 | 
185 | def testNetwork():
186 |     client = airsim.CarClient()
187 |     client.confirmConnection()
188 |     print('Connected successfully!')
189 |     client.enableApiControl(True)
190 |     car_controls = airsim.CarControls()
191 |     car_controls.throttle = 0.5
192 |     car_controls.steering = 0
193 |     client.reset()
194 |     print('Environment initialized!')
195 |     # gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.7)
196 |     # sess = tf.InteractiveSession(config=tf.ConfigProto(gpu_options=gpu_options))
197 |     sess = tf.InteractiveSession()
198 |     with tf.name_scope("TargetNetwork"):
199 |         Q_net = Deep_Q_Network(sess)
200 |     time.sleep(1)
201 | 
202 |     reward_var = tf.Variable(0., trainable=False)
203 |     tf.summary.scalar('reward', reward_var)
204 |     # define summary
205 |     merged_summary = tf.summary.merge_all()
206 |     summary_writer = tf.summary.FileWriter('./logs', sess.graph)
207 |     # get the first state
208 |     observe_init = get_image(client,'DepthPlanner')
209 |     state_pre = np.stack((observe_init, observe_init, observe_init, observe_init), axis=2)
210 |     # saving and loading networks
211 |     trainables = tf.trainable_variables()
212 |     trainable_saver = tf.train.Saver(trainables)
213 |     sess.run(tf.global_variables_initializer())
214 |     checkpoint = tf.train.get_checkpoint_state("saved_networks/new_model_lidar/")
215 |     print('checkpoint:', checkpoint)
216 |     if checkpoint and checkpoint.model_checkpoint_path:
217 |         trainable_saver.restore(sess, checkpoint.model_checkpoint_path)
218 |         print("Successfully loaded:", checkpoint.model_checkpoint_path)
219 |     else:
220 |         if not os.path.exists("saved_networks/new_model_lidar"):
221 |             os.makedirs("saved_networks/new_model_lidar")
222 |             print('Checkpoint directory did not exist; created it')
223 |         print("Could not find old network weights")
224 |     # start testing
225 |     episode = 1
226 |     print('Number of trainable variables:', len(trainables))
227 |     inner_loop_time_start = time.time()
228 |     terminal_record = 0
229 |     collision_record = 0
230 |     steps_record = []
231 |     map_grid = -1 * np.ones([500, 500])
232 |     while episode < MAX_EPISODE:
233 |         step = 1
234 |         reset = False
235 |         latest_distance = 300
236 |         while not reset:
237 |             # take the latest 4 frames as an input
238 |             observe = get_image(client,'DepthPlanner')
239 |             observe = np.reshape(observe, (DEPTH_IMAGE_HEIGHT, DEPTH_IMAGE_WIDTH, 1))
240 |             state_current = np.append(observe, state_pre[:, :, :(NumBufferFrames - 1)], axis=2)
241 |             terminal,reset,distance,map_grid,latest_distance= env_feedback(client,map_grid,latest_distance)
242 |             # keep the current state as the previous state for the next step
243 |             state_pre = state_current
244 |             # choose the greedy action (no exploration during testing)
245 |             actions = sess.run(Q_net.readout, feed_dict={Q_net.state: [state_current]})
246 |             readout_t = actions[0]
247 |             action_current = np.zeros([ACTION_NUMS])
248 |             # one-hot encode the chosen action
249 |             action_index = np.argmax(readout_t)
250 |             action_current[action_index] = 1
251 |             # Control the agent
252 |             side_num = int(ACTION_NUMS - 1) // 2
253 |             steer = float((action_index - side_num) / side_num)
254 |             inner_loop_time_end = time.time()
255 |             car_controls = excute_action(client,car_controls,steer)
256 |             client.setCarControls(car_controls)
257 |             print_action(episode, step, car_controls, distance, steer)
258 |             print(',inner loop=%.4fs\n'%(inner_loop_time_end-inner_loop_time_start),end='')
259 |             inner_loop_time_start = time.time()
260 |             time.sleep(0.5)
261 |             step = step+1
262 |         steps_record.append(step-1)
263 |         map_path = './test_result/grid_map_%d.pkl' % episode
264 |         if not os.path.exists('./test_result'):
265 |             os.mkdir('./test_result')
266 |         map_file = open(map_path, 'wb')
267 |         pickle.dump(map_grid, map_file)
268 |         map_file.close()
269 |         episode = episode + 1
270 |         if(terminal==1):
271 |             terminal_record +=1
272 |         else:
273 |             collision_record +=1
274 |         print('terminal nums=%d,collision num=%d,average steps is %04d,the shortest distance=%.2f'%(terminal_record,collision_record,int(np.mean(steps_record)),latest_distance))
275 |         client.reset()
276 | 
277 | def main():
278 |     testNetwork()
279 | 
280 | if __name__ == "__main__":
281 |     main()
282 | 


--------------------------------------------------------------------------------
/D3QN_training_standard.py:
--------------------------------------------------------------------------------
  1 | import tensorflow as tf
  2 | import airsim
  3 | from collections import deque
  4 | 
  5 | import random
  6 | import numpy as np
  7 | import time
  8 | import os
  9 | import pickle
 10 | # basic setting
 11 | ACTION_NUMS = 13        # number of valid actions
 12 | GAMMA = 0.99            # decay rate of past observations
 13 | OBSERVE = 50            # time steps to observe before training
 14 | EXPLORE = 20000.        # frames over which to anneal epsilon
 15 | FINAL_EPSILON = 0.0001  # final value of epsilon
 16 | INITIAL_EPSILON = 0.1   # starting value of epsilon
 17 | EPSILON_DECAY_START = 20
 18 | MEMORY_SIZE = 20000     # number of previous transitions to remember
 19 | MINI_BATCH = 32         # size of mini batch
 20 | MAX_EPISODE = 20000
 21 | DEPTH_IMAGE_WIDTH = 256
 22 | DEPTH_IMAGE_HEIGHT = 144
 23 | 
 24 | TAU = 0.001             # Rate to update target network toward primary network
 25 | flatten_len = 9216      # the input shape before full connect layer
 26 | NumBufferFrames = 4     # take the latest 4 frames as input
 27 | 
 28 | def variable_summaries(var):
 29 |     """Attach a lot of summaries to a Tensor (for TensorBoard visualization)."""
 30 |     with tf.name_scope('summaries'):
 31 |         mean = tf.reduce_mean(var)
 32 |     tf.summary.scalar('mean', mean)
 33 |     with tf.name_scope('stddev'):
 34 |         stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
 35 |     tf.summary.scalar('stddev', stddev)
 36 |     tf.summary.scalar('max', tf.reduce_max(var))
 37 |     tf.summary.scalar('min', tf.reduce_min(var))
 38 |     tf.summary.histogram('histogram', var)
 39 | 
 40 | def weight_variable(shape):
 41 |     initial = tf.truncated_normal(shape, stddev=0.01)
 42 |     return tf.Variable(initial, name="weights")
 43 | 
 44 | def bias_variable(shape):
 45 |     initial = tf.constant(0., shape=shape)
 46 |     return tf.Variable(initial, name="bias")
 47 | 
 48 | def conv2d(x, W, stride_h, stride_w):
 49 |     return tf.nn.conv2d(x, W, strides=[1, stride_h, stride_w, 1], padding="SAME")
 50 | 
 51 | class Deep_Q_Network(object):
 52 |     """Dueling deep Q-network: three conv layers feeding separate value and advantage streams."""
 53 |     def __init__(self, sess):
 54 |         # network weights and biases
 55 |         # input 144x256x4
 56 |         with tf.name_scope("Conv1"):
 57 |             W_conv1 = weight_variable([8, 8, NumBufferFrames, 32])
 58 |             variable_summaries(W_conv1)
 59 |             b_conv1 = bias_variable([32])
 60 |         with tf.name_scope("Conv2"):
 61 |             W_conv2 = weight_variable([4, 4, 32, 64])
 62 |             variable_summaries(W_conv2)
 63 |             b_conv2 = bias_variable([64])
 64 |         with tf.name_scope("Conv3"):
 65 |             W_conv3 = weight_variable([3, 3, 64, 64])
 66 |             variable_summaries(W_conv3)
 67 |             b_conv3 = bias_variable([64])
 68 |         with tf.name_scope("Value_Dense"):
 69 |             W_value = weight_variable([flatten_len, 512])
 70 |             variable_summaries(W_value)
 71 |             b_value = bias_variable([512])
 72 |         with tf.name_scope("FCAdv"):
 73 |             W_adv = weight_variable([flatten_len, 512])
 74 |             variable_summaries(W_adv)
 75 |             b_adv = bias_variable([512])
 76 |         with tf.name_scope("FCValueOut"):
 77 |             W_value_out = weight_variable([512, 1])
 78 |             variable_summaries(W_value_out)
 79 |             b_value_out = bias_variable([1])
 80 |         with tf.name_scope("FCAdvOut"):
 81 |             W_adv_out = weight_variable([512, ACTION_NUMS])
 82 |             variable_summaries(W_adv_out)
 83 |             b_adv_out = bias_variable([ACTION_NUMS])
 84 |         # input layer
 85 |         self.state = tf.placeholder("float", [None, DEPTH_IMAGE_HEIGHT, DEPTH_IMAGE_WIDTH, NumBufferFrames])
 86 |         # Conv1 layer
 87 |         h_conv1 = tf.nn.relu(conv2d(self.state, W_conv1, 8, 8) + b_conv1)
 88 |         # Conv2 layer
 89 |         h_conv2 = tf.nn.relu(conv2d(h_conv1, W_conv2, 2, 2) + b_conv2)
 90 |         # Conv3 layer
 91 |         h_conv3 = tf.nn.relu(conv2d(h_conv2, W_conv3, 1, 1) + b_conv3)
 92 |         h_conv3_flat = tf.layers.flatten(h_conv3)
 93 |         # fully connected value stream
 94 |         h_fc_value = tf.nn.relu(tf.matmul(h_conv3_flat, W_value) + b_value)
 95 |         value = tf.matmul(h_fc_value, W_value_out) + b_value_out
 96 |         # fully connected advantage stream
 97 |         h_fc_adv = tf.nn.relu(tf.matmul(h_conv3_flat, W_adv) + b_adv)
 98 |         advantage = tf.matmul(h_fc_adv, W_adv_out) + b_adv_out
 99 |         # Q = value + (adv - advAvg)
100 |         advAvg = tf.expand_dims(tf.reduce_mean(advantage, axis=1), axis=1)
101 |         advIdentifiable = tf.subtract(advantage, advAvg)
102 |         self.readout = tf.add(value, advIdentifiable)
103 |         # define the cost function
104 |         self.actions = tf.placeholder("float", [None, ACTION_NUMS])
105 |         self.y = tf.placeholder("float", [None])
106 |         self.readout_action = tf.reduce_sum(tf.multiply(self.readout, self.actions), axis=1)
107 |         self.td_error = tf.square(self.y - self.readout_action)
108 |         self.cost = tf.reduce_mean(self.td_error)
109 |         self.train_step = tf.train.AdamOptimizer(1e-5).minimize(self.cost)
110 | 
111 | def updateTargetGraph(tfVars, tau):
112 |     total_vars = len(tfVars)
113 |     op_holder = []
114 |     for idx, var in enumerate(tfVars[0:total_vars // 2]):
115 |         op_holder.append(tfVars[idx + total_vars // 2].assign(
116 |             (var.value() * tau) + ((1 - tau) * tfVars[idx + total_vars // 2].value())))
117 |     return op_holder
118 | 
119 | def updateTarget(op_holder, sess):
120 |     for op in op_holder:
121 |         sess.run(op)
122 | 
123 | def get_image(client,image_type):
124 |     if (image_type == 'Scene'):
125 |         responses = client.simGetImages([airsim.ImageRequest("0", airsim.ImageType.Scene, False, False)])
126 |         response = responses[0]
127 |         img1d = np.frombuffer(response.image_data_uint8, dtype=np.uint8)
128 |         img_rgba = img1d.reshape(response.height, response.width, 4)
129 |         img_rgba = np.flipud(img_rgba)
130 |         observation = img_rgba[:, :, 0:3]
131 |     elif (image_type == 'Segmentation'):
132 |         responses = client.simGetImages([airsim.ImageRequest(0, airsim.ImageType.Segmentation, False, False)])
133 |         response = responses[0]
134 |         img1d = np.frombuffer(response.image_data_uint8, dtype=np.uint8)  # get numpy array
135 |         img_rgba = img1d.reshape(response.height, response.width, 4)  # reshape array to 4 channel image array H X W X 4
136 |         observation = img_rgba[:, :, 0:3]
137 |     elif (image_type == 'DepthPlanner'):
138 |         try:
139 |             responses = client.simGetImages([airsim.ImageRequest(0, airsim.ImageType.DepthPlanner, pixels_as_float=True)])
140 |             response = responses[0]
141 |             img1d = np.array(response.image_data_float, dtype=np.float64)
142 |             img1d = img1d * 3.5 + 30
143 |             img1d[img1d > 255] = 255
144 |             img2d = np.reshape(img1d, (responses[0].height, responses[0].width))
145 |             observation = img2d
146 |         except Exception:
147 |             print('######### Error: I can not get a depth image correctly! #########################' )
148 |             observation = np.ones([DEPTH_IMAGE_HEIGHT,DEPTH_IMAGE_WIDTH])
149 |     else:
150 |         observation = None
151 |     observation_size = np.shape(observation)
152 |     if(len(observation_size)>=2 and observation_size[0]==DEPTH_IMAGE_HEIGHT and observation_size[1]==DEPTH_IMAGE_WIDTH):
153 |         return observation
154 |     else:
155 |         print('######### Error: The depth image shape: ',observation_size)
156 |         return np.ones([DEPTH_IMAGE_HEIGHT,DEPTH_IMAGE_WIDTH])
157 | 
158 | def go_back(client,car_controls):
159 |     car_controls.throttle = -0.5
160 |     car_controls.is_manual_gear = True
161 |     car_controls.manual_gear = -1
162 |     car_controls.steering = 0
163 |     client.setCarControls(car_controls)
164 |     time.sleep(2)
165 |     print("############ The car collided, we are taking a step back...... ####################")
166 |     car_controls.is_manual_gear = False  # change back gear to auto
167 |     car_controls.manual_gear = 0
168 |     car_controls.throttle = 0
169 |     car_controls.steering = 0
170 |     car_controls.brake = 1
171 |     client.setCarControls(car_controls)
172 |     time.sleep(1)  # hold the brake so the car comes to a stop
173 |     car_controls.brake = 0  # remove brake
174 |     return car_controls
175 | 
176 | def env_feedback(client,pre_position):
177 |     terminal_position = [-130, -210]
178 |     reset = False
179 |     collision_info = client.simGetCollisionInfo()
180 |     car_position = client.getCarState().kinematics_estimated.position
181 |     distance = np.sqrt((car_position.x_val-terminal_position[0])**2+(car_position.y_val-terminal_position[1])**2)
182 |     current_position = [car_position.x_val,car_position.y_val]
183 |     if collision_info.has_collided:
184 |         reward = -100
185 |         reset = True
186 |     else:
187 |         pre_distance = np.sqrt((pre_position[0]-terminal_position[0])**2+(pre_position[1]-terminal_position[1])**2)
188 |         if(distance<=10):
189 |             reward = 20
190 |         elif(distance<pre_distance):
191 |             reward = 20/distance
192 |         else:
193 |             reward = -20 / distance
194 |     if (distance <= 10):
195 |         terminal = 1
196 |         reset = True
197 |     else:
198 |         terminal = 0
199 |     return current_position,distance,reward, terminal,reset
200 | 
201 | def store_transition(replay_experiences,store_or_read):
202 |     store_path = 'replay_experiences.pkl'
203 |     if(store_or_read=='read'):
204 |         if not os.path.exists(store_path) or os.path.getsize(store_path)==0:
205 |             print('Replay file not found or empty; starting with an empty buffer')
206 |             return replay_experiences
207 |         else:
208 |             store_file = open(store_path,'rb')
209 |             replay_experiences = pickle.load(store_file)
210 |             store_file.close()
211 |             memory_len = len(replay_experiences)
212 |             print('Successfully loaded replay_experiences.pkl, %05d memories'%memory_len)
213 |             return replay_experiences
214 |     elif(store_or_read=='store'):
215 |         store_file = open(store_path, 'wb')
216 |         pickle.dump(replay_experiences, store_file)
217 |         store_file.close()
218 |         return 1
219 |     else:
220 |         return 0
221 | 
222 | def excute_action(client,car_controls,steer):
223 |     car_controls.steering = steer
224 |     car_speed = client.getCarState().speed
225 |     straight_range = [-0.1,0.1]
226 |     # when swerved wildly, slow down
227 |     BigTurn_range = [-0.6,0.6]
228 |     if(steer>=straight_range[0] and steer<=straight_range[1]):
229 |         car_controls.throttle = car_controls.throttle + 0.1
230 |         car_controls.throttle = 2 if car_controls.throttle>=2 else car_controls.throttle
231 |     elif(steer<=BigTurn_range[0] or steer>=BigTurn_range[1]):
232 |         car_controls.throttle = 0.5
233 |     else:
234 |         car_controls.throttle = 1
235 |     car_controls.throttle = 0 if car_speed>=3 else car_controls.throttle
236 |     return car_controls
237 | 
238 | def print_action(episode,t,car_controls,distance,steer,reward):
239 |     print('Episode:%05d,Step:%05d '%(episode,t),end='')
240 |     if(steer<0):
241 |         print('The car is turning left    , ',end='')
242 |     elif(steer>0):
243 |         print('The car is turning right   , ',end='')
244 |     else:
245 |         print('The car is going straight  , ',end='')
246 |     print('throttle=%.1f,  steer=%.3f,  reward=%.2f, distance=%.2f'%(car_controls.throttle,steer,reward,distance),end='')
247 | 
248 | def trainNetwork():
249 |     client = airsim.CarClient()
250 |     client.confirmConnection()
251 |     print('Connected successfully!')
252 |     client.enableApiControl(True)
253 |     car_controls = airsim.CarControls()
254 |     car_controls.throttle = 0.5
255 |     car_controls.steering = 0
256 |     client.reset()
257 |     print('Environment initialized!')
258 |     # gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.7)
259 |     # sess = tf.InteractiveSession(config=tf.ConfigProto(gpu_options=gpu_options))
260 |     sess = tf.InteractiveSession()
261 |     with tf.name_scope("OnlineNetwork"):
262 |         online_net = Deep_Q_Network(sess)
263 |     with tf.name_scope("TargetNetwork"):
264 |         target_net = Deep_Q_Network(sess)
265 |     time.sleep(1)
266 | 
267 |     reward_var = tf.Variable(0., trainable=False)
268 |     tf.summary.scalar('reward', reward_var)
269 |     # define summary
270 |     merged_summary = tf.summary.merge_all()
271 |     summary_writer = tf.summary.FileWriter('./logs', sess.graph)
272 |     # Initialize the buffer
273 |     replay_experiences = deque()
274 |     replay_experiences = store_transition(replay_experiences,'read')
275 |     # get the first state
276 |     observe_init = get_image(client,'DepthPlanner')
277 |     state_pre = np.stack((observe_init, observe_init, observe_init, observe_init), axis=2)
278 |     # saving and loading networks
279 |     trainables = tf.trainable_variables()
280 |     trainable_saver = tf.train.Saver(trainables)
281 |     sess.run(tf.global_variables_initializer())
282 |     checkpoint = tf.train.get_checkpoint_state("saved_networks/new_model/")
283 |     print('checkpoint:', checkpoint)
284 |     if checkpoint and checkpoint.model_checkpoint_path:
285 |         trainable_saver.restore(sess, checkpoint.model_checkpoint_path)
286 |         print("Successfully loaded:", checkpoint.model_checkpoint_path)
287 |     else:
288 |         if not os.path.exists("saved_networks/new_model"):
289 |             os.makedirs("saved_networks/new_model")  # also creates the parent "saved_networks" directory
290 |             print('Checkpoint directory did not exist; created it')
291 |         print("Could not find old network weights")
292 | 
293 |     # start training
294 |     episode = 1
295 |     epsilon = INITIAL_EPSILON
296 |     print('Number of trainable variables:', len(trainables))
297 |     targetOps = updateTargetGraph(trainables, TAU)
298 |     inner_loop_time_start = time.time()
299 |     while episode < MAX_EPISODE:
300 |         reward_episode = 0.
301 |         terminal = 0
302 |         step = 1
303 |         reset = False
304 |         loop_start_time = time.time()
305 |         car_position = client.getCarState().kinematics_estimated.position
306 |         pre_position = [car_position.x_val,car_position.y_val]
307 |         while not reset:
308 |             # take the latest 4 frames as an input
309 |             observe = get_image(client,'DepthPlanner')
310 |             observe = np.reshape(observe, (DEPTH_IMAGE_HEIGHT, DEPTH_IMAGE_WIDTH, 1))
311 |             state_current = np.append(observe, state_pre[:, :, :(NumBufferFrames - 1)], axis=2)
312 |             current_position,distance,reward_current, terminal,reset = env_feedback(client,pre_position)
313 |             # store the experience
314 |             if step-1 > 0:
315 |                 replay_experiences.append((state_pre, action_current, reward_current, state_current, terminal))
316 |                 if len(replay_experiences) > MEMORY_SIZE:
317 |                     replay_experiences.popleft()
318 |             state_pre = state_current
319 |             # choose an action epsilon greedily
320 |             actions = sess.run(online_net.readout, feed_dict={online_net.state: [state_current]})
321 |             readout_t = actions[0]
322 |             action_current = np.zeros([ACTION_NUMS])
323 |             # fill the replay buffer with random actions first
324 |             if len(replay_experiences) <= OBSERVE:
325 |                 action_index = random.randrange(ACTION_NUMS)
326 |                 print('episode=%05d,step=%05d,we are observing the env,the action is random......'%(episode,step))
327 |                 action_current[action_index] = 1
328 |             else:
329 |                 if random.random() <= epsilon:
330 |                     print("----------Random Action----------")
331 |                     action_index = random.randrange(ACTION_NUMS)
332 |                     action_current[action_index] = 1
333 |                 else:
334 |                     action_index = np.argmax(readout_t)
335 |                     action_current[action_index] = 1
336 |             # Control the agent
337 |             side_num = int(ACTION_NUMS - 1) // 2
338 |             steer = float((action_index - side_num) / side_num)
339 |             inner_loop_time_end = time.time()
340 |             car_controls = excute_action(client,car_controls,steer)
341 |             client.setCarControls(car_controls)
342 |             print_action(episode, step, car_controls, distance, steer, reward_current)
343 |             print(',experience len=%05d'%len(replay_experiences),end='')
344 |             print(',inner loop=%.4fs'%(inner_loop_time_end-inner_loop_time_start))
345 |             inner_loop_time_start = time.time()
346 |             time.sleep(0.5)
347 |             pre_position = current_position
348 | 
349 |             if episode > OBSERVE and len(replay_experiences) >= MINI_BATCH:
350 |                 # sample a minibatch to train on (random.sample raises ValueError on a smaller buffer)
351 |                 minibatch = random.sample(replay_experiences, MINI_BATCH)
352 |                 y_batch = []
353 |                 # get the batch variables
354 |                 state_pre_batch = [d[0] for d in minibatch]
355 |                 actions_batch = [d[1] for d in minibatch]
356 |                 rewards_batch = [d[2] for d in minibatch]
357 |                 state_current_batch = [d[3] for d in minibatch]
358 |                 Q1 = online_net.readout.eval(feed_dict={online_net.state: state_current_batch})
359 |                 Q2 = target_net.readout.eval(feed_dict={target_net.state: state_current_batch})
360 |                 for i in range(0, len(minibatch)):
361 |                     terminal_batch = minibatch[i][4]
362 |                     # if terminal, only equals reward
363 |                     if terminal_batch:
364 |                         y_batch.append(rewards_batch[i])
365 |                     else:
366 |                         y_batch.append(rewards_batch[i] + GAMMA * Q2[i, np.argmax(Q1[i])])
367 | 
368 |                 # Update the network with our target values.
369 |                 online_net.train_step.run(feed_dict={online_net.y: y_batch,
370 |                                                      online_net.actions: actions_batch,
371 |                                                      online_net.state: state_pre_batch})
372 |                 updateTarget(targetOps, sess)  # Soft-update the target network toward the online network at rate TAU.
373 | 
374 |             reward_episode = reward_episode + reward_current
375 |             step = step+1
376 |             # scale down epsilon
377 |             if epsilon > FINAL_EPSILON and episode > EPSILON_DECAY_START:
378 |                 epsilon -= (INITIAL_EPSILON - FINAL_EPSILON) / EXPLORE
379 |         # save progress and write summaries every 5 episodes
380 |         if (episode % 5 == 0):
381 |             trainable_saver.save(sess, "saved_networks/new_model/Simply_maze",global_step=episode,write_state=True)
382 |             print('######## The model has been saved successfully after %d episodes ##########' % episode)
383 |             #  write summaries
384 |             summary_str = sess.run(merged_summary, feed_dict={reward_var: reward_episode})
385 |             summary_writer.add_summary(summary_str, episode)
386 |         if(len(replay_experiences)<2500):
387 |             signal_back = store_transition(replay_experiences, 'store')
388 |             if signal_back:
389 |                 print('######## The replay experiences have been saved successfully after %d episodes ##########' % episode)
390 |             else:
391 |                 print('######## Warning: the replay experiences could not be saved after %d episodes ##########' % episode)
392 | 
393 |         loop_end_time = time.time()
394 |         loop_time = loop_end_time - loop_start_time
395 |         print("EPISODE", episode, "/ REWARD", reward_episode, "/ steps ", step, "/ LoopTime:", loop_time)
396 |         episode = episode + 1
397 |         client.reset()
398 | 
399 | def main():
400 |     trainNetwork()
401 | 
402 | if __name__ == "__main__":
403 |     main()
404 | 


--------------------------------------------------------------------------------
/D3QN_training_Lidar.py:
--------------------------------------------------------------------------------
  1 | import tensorflow as tf
  2 | import setup_path
  3 | import airsim
  4 | from collections import deque
  5 | 
  6 | import random
  7 | import numpy as np
  8 | import time
  9 | import os
 10 | import pickle
 11 | # basic setting
 12 | ACTION_NUMS = 13        # number of valid actions
 13 | GAMMA = 0.99            # discount factor for future rewards
 14 | OBSERVE = 50            # time steps to observe before training
 15 | EXPLORE = 20000.        # frames over which to anneal epsilon
 16 | FINAL_EPSILON = 0.0001  # final value of epsilon
 17 | INITIAL_EPSILON = 0.1   # starting value of epsilon
 18 | EPSILON_DECAY_START = 20  # episode after which epsilon starts to decay
 19 | MEMORY_SIZE = 20000     # number of previous transitions to remember
 20 | MINI_BATCH = 32         # size of mini batch
 21 | MAX_EPISODE = 20000
 22 | DEPTH_IMAGE_WIDTH = 256
 23 | DEPTH_IMAGE_HEIGHT = 144
 24 | 
 25 | TAU = 0.001             # Rate to update target network toward primary network
 26 | flatten_len = 9216      # flattened conv output size (9 x 16 x 64) fed to the fully connected layers
 27 | NumBufferFrames = 4     # take the latest 4 frames as input
 28 | 
 29 | def variable_summaries(var):
 30 |     """Attach a lot of summaries to a Tensor (for TensorBoard visualization)."""
 31 |     with tf.name_scope('summaries'):
 32 |         mean = tf.reduce_mean(var)
 33 |     tf.summary.scalar('mean', mean)
 34 |     with tf.name_scope('stddev'):
 35 |         stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
 36 |     tf.summary.scalar('stddev', stddev)
 37 |     tf.summary.scalar('max', tf.reduce_max(var))
 38 |     tf.summary.scalar('min', tf.reduce_min(var))
 39 |     tf.summary.histogram('histogram', var)
 40 | 
 41 | def weight_variable(shape):
 42 |     initial = tf.truncated_normal(shape, stddev=0.01)
 43 |     return tf.Variable(initial, name="weights")
 44 | 
 45 | def bias_variable(shape):
 46 |     initial = tf.constant(0., shape=shape)
 47 |     return tf.Variable(initial, name="bias")
 48 | 
 49 | def conv2d(x, W, stride_h, stride_w):
 50 |     return tf.nn.conv2d(x, W, strides=[1, stride_h, stride_w, 1], padding="SAME")
 51 | 
 52 | class Deep_Q_Network(object):
 53 |     """Dueling Q-network that maps a stack of depth frames to one Q-value per action."""
 54 |     def __init__(self, sess):
 55 |         # network weights and biases
 56 |         # input 144x256x4
 57 |         with tf.name_scope("Conv1"):
 58 |             W_conv1 = weight_variable([8, 8, NumBufferFrames, 32])
 59 |             variable_summaries(W_conv1)
 60 |             b_conv1 = bias_variable([32])
 61 |         with tf.name_scope("Conv2"):
 62 |             W_conv2 = weight_variable([4, 4, 32, 64])
 63 |             variable_summaries(W_conv2)
 64 |             b_conv2 = bias_variable([64])
 65 |         with tf.name_scope("Conv3"):
 66 |             W_conv3 = weight_variable([3, 3, 64, 64])
 67 |             variable_summaries(W_conv3)
 68 |             b_conv3 = bias_variable([64])
 69 |         with tf.name_scope("Value_Dense"):
 70 |             W_value = weight_variable([flatten_len, 512])
 71 |             variable_summaries(W_value)
 72 |             b_value = bias_variable([512])
 73 |         with tf.name_scope("FCAdv"):
 74 |             W_adv = weight_variable([flatten_len, 512])
 75 |             variable_summaries(W_adv)
 76 |             b_adv = bias_variable([512])
 77 |         with tf.name_scope("FCValueOut"):
 78 |             W_value_out = weight_variable([512, 1])
 79 |             variable_summaries(W_value_out)
 80 |             b_value_out = bias_variable([1])
 81 |         with tf.name_scope("FCAdvOut"):
 82 |             W_adv_out = weight_variable([512, ACTION_NUMS])
 83 |             variable_summaries(W_adv_out)
 84 |             b_adv_out = bias_variable([ACTION_NUMS])
 85 |         # input layer
 86 |         self.state = tf.placeholder("float", [None, DEPTH_IMAGE_HEIGHT, DEPTH_IMAGE_WIDTH, NumBufferFrames])
 87 |         # Conv1 layer
 88 |         h_conv1 = tf.nn.relu(conv2d(self.state, W_conv1, 8, 8) + b_conv1)
 89 |         # Conv2 layer
 90 |         h_conv2 = tf.nn.relu(conv2d(h_conv1, W_conv2, 2, 2) + b_conv2)
 91 |         # Conv3 layer
 92 |         h_conv3 = tf.nn.relu(conv2d(h_conv2, W_conv3, 1, 1) + b_conv3)
 93 |         h_conv3_flat = tf.layers.flatten(h_conv3)
 94 |         # FC value stream
 95 |         h_fc_value = tf.nn.relu(tf.matmul(h_conv3_flat, W_value) + b_value)
 96 |         value = tf.matmul(h_fc_value, W_value_out) + b_value_out
 97 |         # FC advantage stream
 98 |         h_fc_adv = tf.nn.relu(tf.matmul(h_conv3_flat, W_adv) + b_adv)
 99 |         advantage = tf.matmul(h_fc_adv, W_adv_out) + b_adv_out
100 |         # Q = value + (adv - advAvg)
101 |         advAvg = tf.expand_dims(tf.reduce_mean(advantage, axis=1), axis=1)
102 |         advIdentifiable = tf.subtract(advantage, advAvg)
103 |         self.readout = tf.add(value, advIdentifiable)
104 |         # define the cost function
105 |         self.actions = tf.placeholder("float", [None, ACTION_NUMS])
106 |         self.y = tf.placeholder("float", [None])
107 |         self.readout_action = tf.reduce_sum(tf.multiply(self.readout, self.actions), axis=1)
108 |         self.td_error = tf.square(self.y - self.readout_action)
109 |         self.cost = tf.reduce_mean(self.td_error)
110 |         self.train_step = tf.train.AdamOptimizer(1e-5).minimize(self.cost)
111 | 
112 | def updateTargetGraph(tfVars, tau):
113 |     total_vars = len(tfVars)
114 |     op_holder = []
115 |     for idx, var in enumerate(tfVars[0:total_vars // 2]):
116 |         op_holder.append(tfVars[idx + total_vars // 2].assign(
117 |             (var.value() * tau) + ((1 - tau) * tfVars[idx + total_vars // 2].value())))
118 |     return op_holder
119 | 
120 | def updateTarget(op_holder, sess):
121 |     for op in op_holder:
122 |         sess.run(op)
123 | 
124 | def get_image(client,image_type):
125 |     if (image_type == 'Scene'):
126 |         responses = client.simGetImages([airsim.ImageRequest("0", airsim.ImageType.Scene, False, False)])
127 |         response = responses[0]
128 |         img1d = np.frombuffer(response.image_data_uint8, dtype=np.uint8)  # np.fromstring is deprecated for binary data
129 |         img_rgba = img1d.reshape(response.height, response.width, 4)
130 |         img_rgba = np.flipud(img_rgba)
131 |         observation = img_rgba[:, :, 0:3]
132 |     elif (image_type == 'Segmentation'):
133 |         responses = client.simGetImages([airsim.ImageRequest(0, airsim.ImageType.Segmentation, False, False)])
134 |         response = responses[0]
135 |         img1d = np.frombuffer(response.image_data_uint8, dtype=np.uint8)  # get numpy array
136 |         img_rgba = img1d.reshape(response.height, response.width, 4)  # reshape array to 4 channel image array H X W X 4
137 |         observation = img_rgba[:, :, 0:3]
138 |     elif (image_type == 'DepthPlanner'):
139 |         try:
140 |             responses = client.simGetImages([airsim.ImageRequest(0, airsim.ImageType.DepthPlanner, pixels_as_float=True)])
141 |             response = responses[0]
142 |             img1d = np.array(response.image_data_float, dtype=np.float64)  # np.float was removed in NumPy 1.24
143 |             img1d = img1d * 3.5 + 30
144 |             img1d[img1d > 255] = 255
145 |             img2d = np.reshape(img1d, (responses[0].height, responses[0].width))
146 |             observation = img2d
147 |         except Exception:
148 |             print('######### Error: could not get a valid depth image! #########################' )
149 |             observation = np.ones([DEPTH_IMAGE_HEIGHT,DEPTH_IMAGE_WIDTH])
150 |     else:
151 |         observation = None
152 |     observation_size = np.shape(observation)
153 |     if(len(observation_size)>=2 and observation_size[0]==DEPTH_IMAGE_HEIGHT and observation_size[1]==DEPTH_IMAGE_WIDTH):
154 |         return observation
155 |     else:
156 |         print('######### Error: unexpected image shape: ',observation_size)
157 |         return np.ones([DEPTH_IMAGE_HEIGHT,DEPTH_IMAGE_WIDTH])
158 | 
159 | def lidar_auxiliary(client,car_controls):
160 |     lidarData = client.getLidarData()
161 |     if (len(lidarData.point_cloud) < 3):
162 |         print("##### No obstacles ahead........ ")
163 |     else:
164 |         points = np.array(lidarData.point_cloud, dtype=np.dtype('f4'))
165 |         points = np.reshape(points, (int(points.shape[0] / 3), 3))
166 |         x_points, y_points = np.array(points[:, 1]),np.array(points[:, 0])
167 |         zeros_index = np.argwhere(x_points >= -0.1)
168 |         if (len(zeros_index) <= 1):
169 |             if(car_controls.steering<=0):
170 |                 car_controls.steering = 0.5
171 |                 print('##### Obstacle ahead, please turn right........')
172 |             else:
173 |                 print('##### Obstacle ahead, operate correctly........')
174 |         elif (len(zeros_index) >= len(x_points) - 1 or y_points[zeros_index[0] - 1] >= y_points[zeros_index[0] + 1]):
175 |             if (car_controls.steering >= 0):
176 |                 car_controls.steering = -0.5
177 |                 print('##### Obstacle ahead, please turn left........')
178 |             else:
179 |                 print('##### Obstacle ahead, operate correctly........')
180 |         elif (y_points[zeros_index[0] - 1] >= y_points[zeros_index[0] + 1]):
181 |             if (car_controls.steering >= 0):
182 |                 car_controls.steering = -0.5
183 |                 print('##### Obstacle ahead, please turn left........')
184 |             else:
185 |                 print('##### Obstacle ahead, operate correctly........')
186 |         else:
187 |             if (car_controls.steering <= 0):
188 |                 car_controls.steering = 0.5
189 |                 print('##### Obstacle ahead, please turn right........')
190 |             else:
191 |                 print('##### Obstacle ahead, operate correctly........')
192 |     return car_controls
193 | 
194 | def env_feedback(client,pre_position):
195 |     terminal_position = [-130, -210]
196 |     reset = False
197 |     collision_info = client.simGetCollisionInfo()
198 |     car_position = client.getCarState().kinematics_estimated.position
199 |     distance = np.sqrt((car_position.x_val-terminal_position[0])**2+(car_position.y_val-terminal_position[1])**2)
200 |     current_position = [car_position.x_val,car_position.y_val]
201 |     if collision_info.has_collided:
202 |         reward = -100
203 |         reset = True
204 |     else:
205 |         pre_distance = np.sqrt((pre_position[0]-terminal_position[0])**2+(pre_position[1]-terminal_position[1])**2)
206 |         if(distance<=10):
207 |             reward = 20
208 |         elif(distance<pre_distance):
209 |             reward = 20/distance
210 |         else:
211 |             reward = -20 / distance
212 |     if (distance <= 10):
213 |         terminal = 1
214 |         reset = True
215 |     else:
216 |         terminal = 0
217 |     return current_position,distance,reward, terminal,reset
218 | 
219 | def store_transition(replay_experiences,store_or_read):
220 |     store_path = 'replay_experiences.pkl'
221 |     if(store_or_read=='read'):
222 |         if not os.path.exists(store_path) or os.path.getsize(store_path)==0:
223 |             print('Replay file not found or empty!')
224 |             return replay_experiences
225 |         else:
226 |             store_file = open(store_path,'rb')
227 |             replay_experiences = pickle.load(store_file)
228 |             store_file.close()
229 |             memory_len = len(replay_experiences)
230 |             print('Successfully loaded replay_experiences.pkl, %05d memory'%memory_len)
231 |             return replay_experiences
232 |     elif(store_or_read=='store'):
233 |         store_file = open(store_path, 'wb')
234 |         pickle.dump(replay_experiences, store_file)
235 |         store_file.close()
236 |         return 1
237 |     else:
238 |         return 0
239 | 
240 | def excute_action(client,car_controls,steer):
241 |     car_controls.steering = steer
242 |     car_speed = client.getCarState().speed
243 |     straight_range = [-0.1,0.1]
244 |     # slow down on sharp turns
245 |     BigTurn_range = [-0.6,0.6]
246 |     if(steer>=straight_range[0] and steer<=straight_range[1]):
247 |         car_controls.throttle = car_controls.throttle + 0.1
248 |         car_controls.throttle = 2 if car_controls.throttle>=2 else car_controls.throttle
249 |     elif(steer<=BigTurn_range[0] or steer>=BigTurn_range[1]):
250 |         car_controls.throttle = 0.5
251 |     else:
252 |         car_controls.throttle = 1
253 |     car_controls.throttle = 0 if car_speed>=3 else car_controls.throttle
254 |     return car_controls
255 | 
256 | def print_action(episode,t,car_controls,distance,steer,reward):
257 |     print('Episode:%05d,Step:%05d '%(episode,t),end='')
258 |     if(steer<0):
259 |         print('The car is turning left    , ',end='')
260 |     elif(steer>0):
261 |         print('The car is turning right   , ',end='')
262 |     else:
263 |         print('The car is going straight  , ',end='')
264 |     print('throttle=%.1f,  steer=%.3f,  reward=%.2f, distance=%.2f'%(car_controls.throttle,steer,reward,distance),end='')
265 | 
266 | def trainNetwork():
267 |     client = airsim.CarClient()
268 |     client.confirmConnection()
269 |     print('Connected successfully!')
270 |     client.enableApiControl(True)
271 |     car_controls = airsim.CarControls()
272 |     car_controls.throttle = 0.5
273 |     car_controls.steering = 0
274 |     client.reset()
275 |     print('Environment initialized!')
276 |     # gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.7)
277 |     # sess = tf.InteractiveSession(config=tf.ConfigProto(gpu_options=gpu_options))
278 |     sess = tf.InteractiveSession()
279 |     with tf.name_scope("OnlineNetwork"):
280 |         online_net = Deep_Q_Network(sess)
281 |     with tf.name_scope("TargetNetwork"):
282 |         target_net = Deep_Q_Network(sess)
283 |     time.sleep(1)
284 | 
285 |     reward_var = tf.Variable(0., trainable=False)
286 |     tf.summary.scalar('reward', reward_var)
287 |     # define summary
288 |     merged_summary = tf.summary.merge_all()
289 |     summary_writer = tf.summary.FileWriter('./logs', sess.graph)
290 |     # Initialize the buffer
291 |     replay_experiences = deque()
292 |     replay_experiences = store_transition(replay_experiences,'read')
293 |     # get the first state
294 |     observe_init = get_image(client,'DepthPlanner')
295 |     state_pre = np.stack((observe_init, observe_init, observe_init, observe_init), axis=2)
296 |     # saving and loading networks
297 |     trainables = tf.trainable_variables()
298 |     trainable_saver = tf.train.Saver(trainables)
299 |     sess.run(tf.global_variables_initializer())
300 |     checkpoint = tf.train.get_checkpoint_state("saved_networks/new_model_lidar/")
301 |     print('checkpoint:', checkpoint)
302 |     if checkpoint and checkpoint.model_checkpoint_path:
303 |         trainable_saver.restore(sess, checkpoint.model_checkpoint_path)
304 |         print("Successfully loaded:", checkpoint.model_checkpoint_path)
305 |     else:
306 |         if not os.path.exists("saved_networks/new_model_lidar"):
307 |             os.makedirs("saved_networks/new_model_lidar")  # also creates the parent "saved_networks" directory
308 |             print('Checkpoint directory did not exist; created it')
309 |         print("Could not find old network weights")
310 | 
311 |     # start training
312 |     episode = 1
313 |     epsilon = INITIAL_EPSILON
314 |     print('Number of trainable variables:', len(trainables))
315 |     targetOps = updateTargetGraph(trainables, TAU)
316 |     inner_loop_time_start = time.time()
317 |     while episode < MAX_EPISODE:
318 |         reward_episode = 0.
319 |         terminal = 0
320 |         step = 1
321 |         reset = False
322 |         loop_start_time = time.time()
323 |         car_position = client.getCarState().kinematics_estimated.position
324 |         pre_position = [car_position.x_val,car_position.y_val]
325 |         while not reset:
326 |             # take the latest 4 frames as an input
327 |             observe = get_image(client,'DepthPlanner')
328 |             observe = np.reshape(observe, (DEPTH_IMAGE_HEIGHT, DEPTH_IMAGE_WIDTH, 1))
329 |             state_current = np.append(observe, state_pre[:, :, :(NumBufferFrames - 1)], axis=2)
330 |             current_position,distance,reward_current, terminal,reset = env_feedback(client,pre_position)
331 |             # store the experience
332 |             if step-1 > 0:
333 |                 replay_experiences.append((state_pre, action_current, reward_current, state_current, terminal))
334 |                 if len(replay_experiences) > MEMORY_SIZE:
335 |                     replay_experiences.popleft()
336 |             state_pre = state_current
337 |             # choose an action epsilon greedily
338 |             actions = sess.run(online_net.readout, feed_dict={online_net.state: [state_current]})
339 |             readout_t = actions[0]
340 |             action_current = np.zeros([ACTION_NUMS])
341 |             # fill the replay buffer with random actions first
342 |             if len(replay_experiences) <= OBSERVE:
343 |                 action_index = random.randrange(ACTION_NUMS)
344 |                 print('episode=%05d,step=%05d,we are observing the env,the action is random......'%(episode,step))
345 |                 action_current[action_index] = 1
346 |             else:
347 |                 if random.random() <= epsilon:
348 |                     print("----------Random Action----------")
349 |                     action_index = random.randrange(ACTION_NUMS)
350 |                     action_current[action_index] = 1
351 |                 else:
352 |                     action_index = np.argmax(readout_t)
353 |                     action_current[action_index] = 1
354 |             # Control the agent
355 |             side_num = int(ACTION_NUMS - 1) // 2
356 |             steer = float((action_index - side_num) / side_num)
357 |             inner_loop_time_end = time.time()
358 |             car_controls = excute_action(client,car_controls,steer)
359 |             car_controls = lidar_auxiliary(client,car_controls)
360 |             client.setCarControls(car_controls)
361 |             print_action(episode, step, car_controls, distance, steer, reward_current)
362 |             print(',experience len=%05d'%len(replay_experiences),end='')
363 |             print(',inner loop=%.4fs'%(inner_loop_time_end-inner_loop_time_start))
364 |             inner_loop_time_start = time.time()
365 |             time.sleep(0.5)
366 |             pre_position = current_position
367 | 
368 |             if episode > OBSERVE and len(replay_experiences) >= MINI_BATCH:
369 |                 # sample a minibatch to train on (random.sample raises ValueError on a smaller buffer)
370 |                 minibatch = random.sample(replay_experiences, MINI_BATCH)
371 |                 y_batch = []
372 |                 # get the batch variables
373 |                 state_pre_batch = [d[0] for d in minibatch]
374 |                 actions_batch = [d[1] for d in minibatch]
375 |                 rewards_batch = [d[2] for d in minibatch]
376 |                 state_current_batch = [d[3] for d in minibatch]
377 |                 Q1 = online_net.readout.eval(feed_dict={online_net.state: state_current_batch})
378 |                 Q2 = target_net.readout.eval(feed_dict={target_net.state: state_current_batch})
379 |                 for i in range(0, len(minibatch)):
380 |                     terminal_batch = minibatch[i][4]
381 |                     # if terminal, only equals reward
382 |                     if terminal_batch:
383 |                         y_batch.append(rewards_batch[i])
384 |                     else:
385 |                         y_batch.append(rewards_batch[i] + GAMMA * Q2[i, np.argmax(Q1[i])])
386 | 
387 |                 # Update the network with our target values.
388 |                 online_net.train_step.run(feed_dict={online_net.y: y_batch,
389 |                                                      online_net.actions: actions_batch,
390 |                                                      online_net.state: state_pre_batch})
391 |                 updateTarget(targetOps, sess)  # Soft-update the target network toward the online network at rate TAU.
392 | 
393 |             reward_episode = reward_episode + reward_current
394 |             step = step+1
395 |             # scale down epsilon
396 |             if epsilon > FINAL_EPSILON and episode > EPSILON_DECAY_START:
397 |                 epsilon -= (INITIAL_EPSILON - FINAL_EPSILON) / EXPLORE
398 |         # save progress and write summaries every 5 episodes
399 |         if (episode % 5 == 0):
400 |             trainable_saver.save(sess, "saved_networks/new_model_lidar/Simply_maze",global_step=episode,write_state=True)
401 |             print('######## The model has been saved successfully after %d episodes ##########' % episode)
402 |             #  write summaries
403 |             summary_str = sess.run(merged_summary, feed_dict={reward_var: reward_episode})
404 |             summary_writer.add_summary(summary_str, episode)
405 |         if(len(replay_experiences)<2500):
406 |             signal_back = store_transition(replay_experiences, 'store')
407 |             if signal_back:
408 |                 print('######## The replay experiences have been saved successfully after %d episodes ##########' % episode)
409 |             else:
410 |                 print('######## Warning: the replay experiences could not be saved after %d episodes ##########' % episode)
411 | 
412 |         loop_end_time = time.time()
413 |         loop_time = loop_end_time - loop_start_time
414 |         print("EPISODE", episode, "/ REWARD", reward_episode, "/ steps ", step, "/ LoopTime:", loop_time)
415 |         episode = episode + 1
416 |         client.reset()
417 | 
418 | def main():
419 |     trainNetwork()
420 | 
421 | if __name__ == "__main__":
422 |     main()
423 | 


--------------------------------------------------------------------------------