├── img
    ├── screenshot.png
    └── Obstacle Avoidance.gif
├── README.md
├── D3QN_testing_Lidar_grid.py
├── D3QN_training_standard.py
└── D3QN_training_Lidar.py


/img/screenshot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ironteen/Obstacle-Avoidance-in-AirSim/HEAD/img/screenshot.png

--------------------------------------------------------------------------------
/img/Obstacle Avoidance.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ironteen/Obstacle-Avoidance-in-AirSim/HEAD/img/Obstacle Avoidance.gif

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# A Simple Reinforcement Learning Demo for Obstacle Avoidance Using Microsoft AirSim

This repository contains Python scripts showing how you can use [Microsoft AirSim](https://github.com/Microsoft/AirSim) to collect image data from a moving vehicle, then use that data to train the vehicle to avoid obstacles in TensorFlow. The RL algorithm we use is D3QN (Double Deep Q-Network with a dueling architecture).

![screenshot](https://github.com/Ironteen/Obstacle-Avoidance-in-AirSim/blob/master/img/screenshot.png)

## Prerequisites

- [Recommended hardware](https://wiki.unrealengine.com/Recommended_Hardware) for running UnrealEngine4, which AirSim requires. Although it is possible to build AirSim on OS X and Linux, we found it easiest to use the pre-compiled Windows binaries.
- The map shown above is a simple demo, which was built on Block.
- [Python3](https://www.python.org/ftp/python/3.6.3/python-3.6.3-amd64.exe) for 64-bit Windows
- [TensorFlow](https://www.tensorflow.org/install/install_windows). To run TensorFlow on your GPU, as we and most people do, you'll need to follow the [directions](https://www.tensorflow.org/install/install_windows) for installing CUDA and CuDNN. We recommend setting aside at least an hour to make sure you do this right.

## Document

- ```
  D3QN_training_standard.py
  ```

  This script is a standard Monocular-Obstacle-Avoidance training program. With only a monocular camera, the moving vehicle can learn to avoid obstacles.

- ```
  D3QN_training_Lidar.py
  ```

  With a monocular camera and lidar, the moving vehicle can learn to avoid obstacles more efficiently.

- ```
  D3QN_testing_Lidar_grid.py
  ```

  Once the vehicle is well trained, you can run this test program. While it runs, the car records the explored space in a grid map, which is saved as a .pkl file in the same path.


## Instructions

1. Clone this repository.

2. Open or build a map, set `"SimMode": "Car"` in `settings.json`, and then run it.

3. Choose a training script and modify the destination coordinates, then run

   ```
   python D3QN_training_standard.py or python D3QN_training_Lidar.py
   ```

   Training takes a long time. If you train your car with lidar, it will be more efficient.

4. When the moving vehicle is trained well, run

   ```
   python D3QN_testing_Lidar_grid.py
   ```

   The car is tested without lidar, because the lidar is only an auxiliary tool during training.

5. This is a simple demo of obstacle avoidance with D3QN; you can replace the algorithm with DDPG or A3C quite easily.
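To make the "D3QN" pieces concrete, here is a minimal, framework-free sketch of three small computations that the training scripts perform: the dueling aggregation, the Double DQN target, and the mapping from a discrete action index to a steering value. The function names (`dueling_q`, `double_dqn_target`, `action_to_steer`) are illustrative and do not exist in the repository; the formulas follow `Deep_Q_Network` and the training loop in `D3QN_training_standard.py`, simplified here to a single state or transition.

```python
GAMMA = 0.99       # discount factor, same value as in the training scripts
ACTION_NUMS = 13   # number of discrete steering actions, same as in the scripts


def dueling_q(value, advantages):
    """Dueling aggregation for one state: Q(a) = V + (A(a) - mean(A))."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + (adv - mean_adv) for adv in advantages]


def double_dqn_target(reward, terminal, q_online_next, q_target_next, gamma=GAMMA):
    """Double DQN target for one transition.

    The online network chooses the next action and the target network
    scores it. On a terminal transition the target is just the reward.
    """
    if terminal:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


def action_to_steer(action_index, n_actions=ACTION_NUMS):
    """Map an action index in [0, n_actions) to a steering value in [-1, 1]."""
    side_num = (n_actions - 1) // 2
    return (action_index - side_num) / side_num


if __name__ == "__main__":
    print(dueling_q(1.0, [0.0, 2.0, 4.0]))  # [-1.0, 1.0, 3.0]
    print(double_dqn_target(1.0, False, [0.1, 0.9], [2.0, 3.0]))
    print(action_to_steer(0), action_to_steer(6), action_to_steer(12))  # -1.0 0.0 1.0
```

With 13 actions, index 6 means "straight", and the other indices spread evenly over left and right turns. Separating action selection (online network) from action evaluation (target network) is what makes the target "double".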

## Results

![Obstacle-Avoidance](https://github.com/Ironteen/Obstacle-Avoidance-in-AirSim/blob/master/img/Obstacle%20Avoidance.gif)

Average number of steps per episode:

| Method              | Steps |
| ------------------- | ----- |
| D3QN training       | 850   |
| D3QN+Lidar training | 3200  |
| Test without Lidar  | 2185  |

Note: An episode ends when the moving vehicle reaches the destination or collides.

## Acknowledgement

This code repository is heavily inspired by the work of Linhai Xie, Sen Wang, Niki Trigoni, and Andrew Markham.

[link](https://github.com/xie9187/Monocular-Obstacle-Avoidance)

--------------------------------------------------------------------------------
/D3QN_testing_Lidar_grid.py:
--------------------------------------------------------------------------------
import tensorflow as tf
import setup_path
import airsim
from collections import deque

import random
import numpy as np
import time
import os
import pickle
# basic setting
ACTION_NUMS = 13 # number of valid actions
MAX_EPISODE = 20000
DEPTH_IMAGE_WIDTH = 256
DEPTH_IMAGE_HEIGHT = 144

flatten_len = 9216 # the input shape before full connect layer
NumBufferFrames = 4 # take the latest 4 frames as input

def variable_summaries(var):
    """Attach a lot of summaries to a Tensor (for TensorBoard visualization)."""
    with tf.name_scope('summaries'):
        mean = tf.reduce_mean(var)
        tf.summary.scalar('mean', mean)
        with tf.name_scope('stddev'):
            stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
        tf.summary.scalar('stddev', stddev)
        tf.summary.scalar('max', tf.reduce_max(var))
        tf.summary.scalar('min', tf.reduce_min(var))
        tf.summary.histogram('histogram', var)

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.01)
    return tf.Variable(initial, name="weights")

def bias_variable(shape):
    initial = tf.constant(0., shape=shape)
    return tf.Variable(initial, name="bias")

def conv2d(x, W, stride_h, stride_w):
    return tf.nn.conv2d(x, W, strides=[1, stride_h, stride_w, 1], padding="SAME")

class Deep_Q_Network(object):
    """Dueling Q-network over a stack of depth frames."""
    def __init__(self, sess):
        # network weights and biases
        # input 144x256x4
        with tf.name_scope("Conv1"):
            W_conv1 = weight_variable([8, 8, NumBufferFrames, 32])
            variable_summaries(W_conv1)
            b_conv1 = bias_variable([32])
        with tf.name_scope("Conv2"):
            W_conv2 = weight_variable([4, 4, 32, 64])
            variable_summaries(W_conv2)
            b_conv2 = bias_variable([64])
        with tf.name_scope("Conv3"):
            W_conv3 = weight_variable([3, 3, 64, 64])
            variable_summaries(W_conv3)
            b_conv3 = bias_variable([64])
        with tf.name_scope("Value_Dense"):
            W_value = weight_variable([flatten_len, 512])
            variable_summaries(W_value)
            b_value = bias_variable([512])
        with tf.name_scope("FCAdv"):
            W_adv = weight_variable([flatten_len, 512])
            variable_summaries(W_adv)
            b_adv = bias_variable([512])
        with tf.name_scope("FCValueOut"):
            W_value_out = weight_variable([512, 1])
            variable_summaries(W_value_out)
            b_value_out = bias_variable([1])
        with tf.name_scope("FCAdvOut"):
            W_adv_out = weight_variable([512, ACTION_NUMS])
            variable_summaries(W_adv_out)
            b_adv_out = bias_variable([ACTION_NUMS])
        # input layer
        self.state = tf.placeholder("float", [None, DEPTH_IMAGE_HEIGHT, DEPTH_IMAGE_WIDTH, NumBufferFrames])
        # Conv1 layer
        h_conv1 = tf.nn.relu(conv2d(self.state, W_conv1, 8, 8) + b_conv1)
        # Conv2 layer
        h_conv2 = tf.nn.relu(conv2d(h_conv1, W_conv2, 2, 2) + b_conv2)
        # Conv3 layer
        h_conv3 = tf.nn.relu(conv2d(h_conv2, W_conv3, 1, 1) + b_conv3)
        h_conv3_flat = tf.layers.flatten(h_conv3)
        # FC ob value layer
        h_fc_value = tf.nn.relu(tf.matmul(h_conv3_flat, W_value) + b_value)
        value = tf.matmul(h_fc_value, W_value_out) + b_value_out
        # FC ob adv layer
        h_fc_adv = tf.nn.relu(tf.matmul(h_conv3_flat, W_adv) + b_adv)
        advantage = tf.matmul(h_fc_adv, W_adv_out) + b_adv_out
        # Q = value + (adv - advAvg)
        advAvg = tf.expand_dims(tf.reduce_mean(advantage, axis=1), axis=1)
        advIdentifiable = tf.subtract(advantage, advAvg)
        self.readout = tf.add(value, advIdentifiable)
        # define the cost function
        self.actions = tf.placeholder("float", [None, ACTION_NUMS])
        self.y = tf.placeholder("float", [None])
        self.readout_action = tf.reduce_sum(tf.multiply(self.readout, self.actions), axis=1)
        self.td_error = tf.square(self.y - self.readout_action)
        self.cost = tf.reduce_mean(self.td_error)
        self.train_step = tf.train.AdamOptimizer(1e-5).minimize(self.cost)

def get_image(client,image_type):
    if (image_type == 'Scene'):
        responses = client.simGetImages([airsim.ImageRequest("0", airsim.ImageType.Scene, False, False)])
        response = responses[0]
        img1d = np.fromstring(response.image_data_uint8, dtype=np.uint8)
        img_rgba = img1d.reshape(response.height, response.width, 4)
        img_rgba = np.flipud(img_rgba)
        observation = img_rgba[:, :, 0:3]
    elif (image_type == 'Segmentation'):
        responses = client.simGetImages([airsim.ImageRequest(0, airsim.ImageType.Segmentation, False, False)])
        response = responses[0]
        img1d = np.fromstring(response.image_data_uint8, dtype=np.uint8) # get numpy array
        img_rgba = img1d.reshape(response.height, response.width, 4) # reshape array to 4 channel image array H X W X 4
        observation = img_rgba[:, :, 0:3]
    elif (image_type == 'DepthPlanner'):
        try:
            responses = client.simGetImages([airsim.ImageRequest(0, airsim.ImageType.DepthPlanner, pixels_as_float=True)])
            response = responses[0]
            # np.float64 is what the old np.float alias resolved to
            img1d = np.array(response.image_data_float, dtype=np.float64)
            img1d = img1d * 3.5 + 30
            img1d[img1d > 255] = 255
            img2d = np.reshape(img1d, (responses[0].height, responses[0].width))
            observation = img2d
        except Exception:
            print('######### Error: I can not get a depth image correctly! #########################' )
            observation = np.ones([DEPTH_IMAGE_HEIGHT,DEPTH_IMAGE_WIDTH])
    else:
        observation = None
    observation_size = np.shape(observation)
    if(observation_size[0]==DEPTH_IMAGE_HEIGHT and observation_size[1]==DEPTH_IMAGE_WIDTH):
        return observation
    else:
        print('######### Error: The depth image shape: ',observation_size)
        return np.ones([DEPTH_IMAGE_HEIGHT,DEPTH_IMAGE_WIDTH])

def env_feedback(client,map_grid,latest_distance):
    terminal_position = [-130, -210]
    collision_info = client.simGetCollisionInfo()
    car_position = client.getCarState().kinematics_estimated.position
    distance = np.sqrt((car_position.x_val-terminal_position[0])**2+(car_position.y_val-terminal_position[1])**2)
    if collision_info.has_collided:
        reset = True
    else:
        x_val, y_val = car_position.x_val, car_position.y_val
        x_val, y_val = int(np.abs(x_val * 2)), int(np.abs(y_val * 2))
        map_grid[x_val, y_val] = 1 / (distance + 0.001)
        reset = False
    if (distance <= 10):
        terminal = 1
        reset = True
    else:
        terminal = 0
    if distance=straight_range[0] and steer<=straight_range[1]):
        car_controls.throttle = car_controls.throttle + 0.1
        car_controls.throttle = 2 if car_controls.throttle>=2 else car_controls.throttle
    elif(steer<=BigTurn_range[0] or steer>=BigTurn_range[1]):
        car_controls.throttle = 0.5
    else:
        car_controls.throttle = 1
    car_controls.throttle = 0 if car_speed>=3 else car_controls.throttle
    return car_controls

def print_action(episode,t,car_controls,distance,steer):
    print('Episode:%05d,Step:%05d '%(episode,t),end='')
    if(steer<0):
        print('The car is turning left , ',end='')
    elif(steer>0):
        print('The car is turning right , ',end='')
    else:
        print('The car is going straight, ',end='')
    print('throttle=%.1f, steer=%.3f, distance=%.2f'%(car_controls.throttle,steer,distance),end='')

def testNetwork():
    client = airsim.CarClient()
    client.confirmConnection()
    print('Connected successfully!')
    client.enableApiControl(True)
    car_controls = airsim.CarControls()
    car_controls.throttle = 0.5
    car_controls.steering = 0
    client.reset()
    print('Environment initialized!')
    # gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.7)
    # sess = tf.InteractiveSession(config=tf.ConfigProto(gpu_options=gpu_options))
    sess = tf.InteractiveSession()
    with tf.name_scope("TargetNetwork"):
        Q_net = Deep_Q_Network(sess)
    time.sleep(1)

    reward_var = tf.Variable(0., trainable=False)
    tf.summary.scalar('reward', reward_var)
    # define summary
    merged_summary = tf.summary.merge_all()
    summary_writer = tf.summary.FileWriter('./logs', sess.graph)
    # get the first state
    observe_init = get_image(client,'DepthPlanner')
    state_pre = np.stack((observe_init, observe_init, observe_init, observe_init), axis=2)
    # saving and loading networks
    trainables = tf.trainable_variables()
    trainable_saver = tf.train.Saver(trainables)
    sess.run(tf.global_variables_initializer())
    checkpoint = tf.train.get_checkpoint_state("saved_networks/new_model_lidar/")
    print('checkpoint:', checkpoint)
    if checkpoint and checkpoint.model_checkpoint_path:
        trainable_saver.restore(sess, checkpoint.model_checkpoint_path)
        print("Successfully loaded:", checkpoint.model_checkpoint_path)
    else:
        if not os.path.exists("saved_networks/new_model_lidar"):
            # makedirs also creates the parent "saved_networks" directory if needed
            os.makedirs("saved_networks/new_model_lidar")
            print('The directory did not exist and was created successfully')
        print("Could not find old network weights")
    # start testing
    episode = 1
    print('Number of trainable variables:', len(trainables))
    inner_loop_time_start = time.time()
    terminal_record = 0
    collision_record = 0
    steps_record = []
    map_grid = -1 * np.ones([500, 500])
    while episode < MAX_EPISODE:
        step = 1
        reset = False
        latest_distance = 300
        while not reset:
            # take the latest 4 frames as an input
            observe = get_image(client,'DepthPlanner')
            observe = np.reshape(observe, (DEPTH_IMAGE_HEIGHT, DEPTH_IMAGE_WIDTH, 1))
            state_current = np.append(observe, state_pre[:, :, :(NumBufferFrames - 1)], axis=2)
            terminal,reset,distance,map_grid,latest_distance= env_feedback(client,map_grid,latest_distance)
            # keep the current frame stack for the next step
            state_pre = state_current
            # choose the greedy action (no exploration during testing)
            actions = sess.run(Q_net.readout, feed_dict={Q_net.state: [state_current]})
            readout_t = actions[0]
            action_current = np.zeros([ACTION_NUMS])
            action_index = np.argmax(readout_t)
            action_current[action_index] = 1
            # Control the agent
            side_num = int(ACTION_NUMS - 1) // 2
            steer = float((action_index - side_num) / side_num)
            inner_loop_time_end = time.time()
            car_controls = excute_action(client,car_controls,steer)
            client.setCarControls(car_controls)
            print_action(episode, step, car_controls, distance, steer)
            print(',inner loop=%.4fs\n'%(inner_loop_time_end-inner_loop_time_start),end='')
            inner_loop_time_start = time.time()
            time.sleep(0.5)
            step = step+1
        steps_record.append(step-1)
        map_path = './test_result/grid_map_%d.pkl' % episode
        if not os.path.exists('./test_result'):
            os.mkdir('./test_result')
        map_file = open(map_path, 'wb')
        pickle.dump(map_grid, map_file)
        map_file.close()
        episode = episode + 1
        if(terminal==1):
            terminal_record +=1
        else:
            collision_record +=1
        print('terminal nums=%d,collision num=%d,average steps is %04d,the shortest distance=%.2f'%(terminal_record,collision_record,int(np.mean(steps_record)),latest_distance))
        client.reset()

def main():
    testNetwork()

if __name__ == "__main__":
    main()

--------------------------------------------------------------------------------
/D3QN_training_standard.py:
--------------------------------------------------------------------------------
import tensorflow as tf
import airsim
from collections import deque

import random
import numpy as np
import time
import os
import pickle
# basic setting
ACTION_NUMS = 13 # number of valid actions
GAMMA = 0.99 # decay rate of past observations
OBSERVE = 50 # time steps to observe before training
EXPLORE = 20000. # frames over which to anneal epsilon
FINAL_EPSILON = 0.0001 # final value of epsilon
INITIAL_EPSILON = 0.1 # starting value of epsilon
EPSILON_DECAY_START = 20
MEMORY_SIZE = 20000 # number of previous transitions to remember
MINI_BATCH = 32 # size of mini batch
MAX_EPISODE = 20000
DEPTH_IMAGE_WIDTH = 256
DEPTH_IMAGE_HEIGHT = 144

TAU = 0.001 # Rate to update target network toward primary network
flatten_len = 9216 # the input shape before full connect layer
NumBufferFrames = 4 # take the latest 4 frames as input

def variable_summaries(var):
    """Attach a lot of summaries to a Tensor (for TensorBoard visualization)."""
    with tf.name_scope('summaries'):
        mean = tf.reduce_mean(var)
        tf.summary.scalar('mean', mean)
        with tf.name_scope('stddev'):
            stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
        tf.summary.scalar('stddev', stddev)
        tf.summary.scalar('max', tf.reduce_max(var))
        tf.summary.scalar('min', tf.reduce_min(var))
        tf.summary.histogram('histogram', var)

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.01)
    return tf.Variable(initial, name="weights")

def bias_variable(shape):
    initial = tf.constant(0., shape=shape)
    return tf.Variable(initial, name="bias")

def conv2d(x, W, stride_h, stride_w):
    return tf.nn.conv2d(x, W, strides=[1, stride_h, stride_w, 1], padding="SAME")

class Deep_Q_Network(object):
    """docstring for ClassName"""
    def __init__(self, sess):
        # network weights and biases
        # input 144x256x4
        with tf.name_scope("Conv1"):
            W_conv1 = weight_variable([8, 8, NumBufferFrames, 32])
            variable_summaries(W_conv1)
            b_conv1 = bias_variable([32])
        with tf.name_scope("Conv2"):
            W_conv2 = weight_variable([4, 4, 32, 64])
            variable_summaries(W_conv2)
            b_conv2 = bias_variable([64])
        with tf.name_scope("Conv3"):
            W_conv3 = weight_variable([3, 3, 64, 64])
            variable_summaries(W_conv3)
            b_conv3 = bias_variable([64])
        with tf.name_scope("Value_Dense"):
            W_value = weight_variable([flatten_len, 512])
            variable_summaries(W_value)
            b_value = bias_variable([512])
        with tf.name_scope("FCAdv"):
            W_adv = weight_variable([flatten_len, 512])
            variable_summaries(W_adv)
            b_adv = bias_variable([512])
        with tf.name_scope("FCValueOut"):
            W_value_out = weight_variable([512, 1])
            variable_summaries(W_value_out)
            b_value_out = bias_variable([1])
        with tf.name_scope("FCAdvOut"):
            W_adv_out = weight_variable([512, ACTION_NUMS])
            variable_summaries(W_adv_out)
            b_adv_out = bias_variable([ACTION_NUMS])
        # input layer
        self.state = tf.placeholder("float", [None, DEPTH_IMAGE_HEIGHT, DEPTH_IMAGE_WIDTH, NumBufferFrames])
        # Conv1 layer
        h_conv1 = tf.nn.relu(conv2d(self.state, W_conv1, 8, 8) + b_conv1)
        # Conv2 layer
        h_conv2 = tf.nn.relu(conv2d(h_conv1, W_conv2, 2, 2) + b_conv2)
        # Conv3 layer
        h_conv3 = tf.nn.relu(conv2d(h_conv2, W_conv3, 1, 1) + b_conv3)
        h_conv3_flat = tf.layers.flatten(h_conv3)
        # FC ob value layer
        h_fc_value = tf.nn.relu(tf.matmul(h_conv3_flat, W_value) + b_value)
        value = tf.matmul(h_fc_value, W_value_out) + b_value_out
        # FC ob adv layer
        h_fc_adv = tf.nn.relu(tf.matmul(h_conv3_flat, W_adv) + b_adv)
        advantage = tf.matmul(h_fc_adv, W_adv_out) + b_adv_out
        # Q = value + (adv - advAvg)
        advAvg = tf.expand_dims(tf.reduce_mean(advantage, axis=1), axis=1)
        advIdentifiable = tf.subtract(advantage, advAvg)
        self.readout = tf.add(value, advIdentifiable)
        # define the cost function
        self.actions = tf.placeholder("float", [None, ACTION_NUMS])
        self.y = tf.placeholder("float", [None])
        self.readout_action = tf.reduce_sum(tf.multiply(self.readout, self.actions), axis=1)
        self.td_error = tf.square(self.y - self.readout_action)
        self.cost = tf.reduce_mean(self.td_error)
        self.train_step = tf.train.AdamOptimizer(1e-5).minimize(self.cost)

def updateTargetGraph(tfVars, tau):
    # The first half of tfVars belongs to the online network, the second half to the target network.
    total_vars = len(tfVars)
    op_holder = []
    for idx, var in enumerate(tfVars[0:total_vars // 2]):
        op_holder.append(tfVars[idx + total_vars // 2].assign(
            (var.value() * tau) + ((1 - tau) * tfVars[idx + total_vars // 2].value())))
    return op_holder

def updateTarget(op_holder, sess):
    for op in op_holder:
        sess.run(op)

def get_image(client,image_type):
    if (image_type == 'Scene'):
        responses = client.simGetImages([airsim.ImageRequest("0", airsim.ImageType.Scene, False, False)])
        response = responses[0]
        img1d = np.fromstring(response.image_data_uint8, dtype=np.uint8)
        img_rgba = img1d.reshape(response.height, response.width, 4)
        img_rgba = np.flipud(img_rgba)
        observation = img_rgba[:, :, 0:3]
    elif (image_type == 'Segmentation'):
        responses = client.simGetImages([airsim.ImageRequest(0, airsim.ImageType.Segmentation, False, False)])
        response = responses[0]
        img1d = np.fromstring(response.image_data_uint8, dtype=np.uint8) # get numpy array
        img_rgba = img1d.reshape(response.height, response.width, 4) # reshape array to 4 channel image array H X W X 4
        observation = img_rgba[:, :, 0:3]
    elif (image_type == 'DepthPlanner'):
        try:
            responses = client.simGetImages([airsim.ImageRequest(0, airsim.ImageType.DepthPlanner, pixels_as_float=True)])
            response = responses[0]
            # np.float64 is what the old np.float alias resolved to
            img1d = np.array(response.image_data_float, dtype=np.float64)
            img1d = img1d * 3.5 + 30
            img1d[img1d > 255] = 255
            img2d = np.reshape(img1d, (responses[0].height, responses[0].width))
            observation = img2d
        except Exception:
            print('######### Error: I can not get a depth image correctly! #########################' )
            observation = np.ones([DEPTH_IMAGE_HEIGHT,DEPTH_IMAGE_WIDTH])
    else:
        observation = None
    observation_size = np.shape(observation)
    if(observation_size[0]==DEPTH_IMAGE_HEIGHT and observation_size[1]==DEPTH_IMAGE_WIDTH):
        return observation
    else:
        print('######### Error: The depth image shape: ',observation_size)
        return np.ones([DEPTH_IMAGE_HEIGHT,DEPTH_IMAGE_WIDTH])

def go_back(client,car_controls):
    car_controls.throttle = -0.5
    car_controls.is_manual_gear = True
    car_controls.manual_gear = -1
    car_controls.steering = 0
    client.setCarControls(car_controls)
    time.sleep(2)
    print("############ The car collided, we are taking a step back...... ####################")
    car_controls.is_manual_gear = False # change back gear to auto
    car_controls.manual_gear = 0
    car_controls.throttle = 0
    car_controls.steering = 0
    car_controls.brake = 1
    client.setCarControls(car_controls)
    time.sleep(1) # let car drive a bit
    car_controls.brake = 0 # remove brake
    return car_controls

def env_feedback(client,pre_position):
    terminal_position = [-130, -210]
    reset = False
    collision_info = client.simGetCollisionInfo()
    car_position = client.getCarState().kinematics_estimated.position
    distance = np.sqrt((car_position.x_val-terminal_position[0])**2+(car_position.y_val-terminal_position[1])**2)
    current_position = [car_position.x_val,car_position.y_val]
    if collision_info.has_collided:
        reward = -100
        reset = True
    else:
        pre_distance = np.sqrt((pre_position[0]-terminal_position[0])**2+(pre_position[1]-terminal_position[1])**2)
        if(distance<=10):
            reward = 20
        elif(distance=straight_range[0] and steer<=straight_range[1]):
        car_controls.throttle = car_controls.throttle + 0.1
        car_controls.throttle = 2 if car_controls.throttle>=2 else car_controls.throttle
    elif(steer<=BigTurn_range[0] or steer>=BigTurn_range[1]):
        car_controls.throttle = 0.5
    else:
        car_controls.throttle = 1
    car_controls.throttle = 0 if car_speed>=3 else car_controls.throttle
    return car_controls

def print_action(episode,t,car_controls,distance,steer,reward):
    print('Episode:%05d,Step:%05d '%(episode,t),end='')
    if(steer<0):
        print('The car is turning left , ',end='')
    elif(steer>0):
        print('The car is turning right , ',end='')
    else:
        print('The car is going straight, ',end='')
    print('throttle=%.1f, steer=%.3f, reward=%.2f, distance=%.2f'%(car_controls.throttle,steer,reward,distance),end='')

def trainNetwork():
    client = airsim.CarClient()
    client.confirmConnection()
    print('Connected successfully!')
    client.enableApiControl(True)
    car_controls = airsim.CarControls()
    car_controls.throttle = 0.5
    car_controls.steering = 0
    client.reset()
    print('Environment initialized!')
    # gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.7)
    # sess = tf.InteractiveSession(config=tf.ConfigProto(gpu_options=gpu_options))
    sess = tf.InteractiveSession()
    with tf.name_scope("OnlineNetwork"):
        online_net = Deep_Q_Network(sess)
    with tf.name_scope("TargetNetwork"):
        target_net = Deep_Q_Network(sess)
    time.sleep(1)

    reward_var = tf.Variable(0., trainable=False)
    tf.summary.scalar('reward', reward_var)
    # define summary
    merged_summary = tf.summary.merge_all()
    summary_writer = tf.summary.FileWriter('./logs', sess.graph)
    # Initialize the buffer
    replay_experiences = deque()
    replay_experiences = store_transition(replay_experiences,'read')
    # get the first state
    observe_init = get_image(client,'DepthPlanner')
    state_pre = np.stack((observe_init, observe_init, observe_init, observe_init), axis=2)
    # saving and loading networks
    trainables = tf.trainable_variables()
    trainable_saver = tf.train.Saver(trainables)
    sess.run(tf.global_variables_initializer())
    checkpoint = tf.train.get_checkpoint_state("saved_networks/new_model/")
    print('checkpoint:', checkpoint)
    if checkpoint and checkpoint.model_checkpoint_path:
        trainable_saver.restore(sess, checkpoint.model_checkpoint_path)
        print("Successfully loaded:", checkpoint.model_checkpoint_path)
    else:
        if not os.path.exists("saved_networks/new_model"):
            # makedirs also creates the parent "saved_networks" directory if needed
            os.makedirs("saved_networks/new_model")
            print('The directory did not exist and was created successfully')
        print("Could not find old network weights")

    # start training
    episode = 1
    epsilon = INITIAL_EPSILON
    print('Number of trainable variables:', len(trainables))
    targetOps = updateTargetGraph(trainables, TAU)
    inner_loop_time_start = time.time()
    while episode < MAX_EPISODE:
        reward_episode = 0.
        terminal = 0
        step = 1
        reset = False
        loop_start_time = time.time()
        car_position = client.getCarState().kinematics_estimated.position
        pre_position = [car_position.x_val,car_position.y_val]
        while not reset:
            # take the latest 4 frames as an input
            observe = get_image(client,'DepthPlanner')
            observe = np.reshape(observe, (DEPTH_IMAGE_HEIGHT, DEPTH_IMAGE_WIDTH, 1))
            state_current = np.append(observe, state_pre[:, :, :(NumBufferFrames - 1)], axis=2)
            current_position,distance,reward_current, terminal,reset = env_feedback(client,pre_position)
            # store the experience
            if step-1 > 0:
                replay_experiences.append((state_pre, action_current, reward_current, state_current, terminal))
                if len(replay_experiences) > MEMORY_SIZE:
                    replay_experiences.popleft()
            state_pre = state_current
            # choose an action epsilon greedily
            actions = sess.run(online_net.readout, feed_dict={online_net.state: [state_current]})
            readout_t = actions[0]
            action_current = np.zeros([ACTION_NUMS])
            # act randomly until the replay buffer holds more than OBSERVE transitions
            if len(replay_experiences) <= OBSERVE:
                action_index = random.randrange(ACTION_NUMS)
                print('episode=%05d,step=%05d,we are observing the env,the action is random......'%(episode,step))
                action_current[action_index] = 1
            else:
                if random.random() <= epsilon:
                    print("----------Random Action----------")
                    action_index = random.randrange(ACTION_NUMS)
                    action_current[action_index] = 1
                else:
                    action_index = np.argmax(readout_t)
                    action_current[action_index] = 1
            # Control the agent
            side_num = int(ACTION_NUMS - 1) // 2
            steer = float((action_index - side_num) / side_num)
            inner_loop_time_end = time.time()
            car_controls = excute_action(client,car_controls,steer)
            client.setCarControls(car_controls)
            print_action(episode, step, car_controls, distance, steer, reward_current)
            print(',experience len=%05d'%len(replay_experiences),end='')
            print(',inner loop=%.4fs'%(inner_loop_time_end-inner_loop_time_start))
            inner_loop_time_start = time.time()
            time.sleep(0.5)
            pre_position = current_position

            if episode > OBSERVE:
                # sample a minibatch to train on
                minibatch = random.sample(replay_experiences, MINI_BATCH)
                y_batch = []
                # get the batch variables
                state_pre_batch = [d[0] for d in minibatch]
                actions_batch = [d[1] for d in minibatch]
                rewards_batch = [d[2] for d in minibatch]
                state_current_batch = [d[3] for d in minibatch]
                Q1 = online_net.readout.eval(feed_dict={online_net.state: state_current_batch})
                Q2 = target_net.readout.eval(feed_dict={target_net.state: state_current_batch})
                for i in range(0, len(minibatch)):
                    terminal_batch = minibatch[i][4]
                    # if terminal, the target is just the reward
                    if terminal_batch:
                        y_batch.append(rewards_batch[i])
                    else:
                        # Double DQN: the online net picks the action, the target net scores it
                        y_batch.append(rewards_batch[i] + GAMMA * Q2[i, np.argmax(Q1[i])])

                # Update the network with our target values.
                online_net.train_step.run(feed_dict={online_net.y: y_batch,
                                                     online_net.actions: actions_batch,
                                                     online_net.state: state_pre_batch})
                updateTarget(targetOps, sess) # Softly move the target network toward the online network (rate TAU).

            reward_episode = reward_episode + reward_current
            step = step+1
            # scale down epsilon
            if epsilon > FINAL_EPSILON and episode > EPSILON_DECAY_START:
                epsilon -= (INITIAL_EPSILON - FINAL_EPSILON) / EXPLORE
        # save progress every 5 episodes and write summaries
        if (episode % 5 == 0):
            trainable_saver.save(sess, "saved_networks/new_model/Simply_maze",global_step=episode,write_state=True)
            print('######## The model has been saved successfully after %d episodes ##########' % episode)
            # write summaries
            summary_str = sess.run(merged_summary, feed_dict={reward_var: reward_episode})
            summary_writer.add_summary(summary_str, episode)
            if(len(replay_experiences)<2500):
                signal_back = store_transition(replay_experiences, 'store')
                if signal_back:
                    print('######## The replay experiences have been saved successfully after %d episodes ##########' % episode)
                else:
                    print('######## Warning: the replay experiences could not be saved after %d episodes ##########' % episode)

        loop_end_time = time.time()
        loop_time = loop_end_time - loop_start_time
        print("EPISODE", episode, "/ REWARD", reward_episode, "/ steps ", step, "/ LoopTime:", loop_time)
        episode = episode + 1
        client.reset()

def main():
    trainNetwork()

if __name__ == "__main__":
    main()

--------------------------------------------------------------------------------
/D3QN_training_Lidar.py:
--------------------------------------------------------------------------------
import tensorflow as tf
import setup_path
import airsim
from collections import deque

import random
import numpy as np
import time
import os
import pickle
# basic setting
ACTION_NUMS = 13  # number of valid actions
GAMMA = 0.99  # decay rate of past observations
OBSERVE = 50  # time steps to observe before training
EXPLORE = 20000.  # frames over which to anneal epsilon
FINAL_EPSILON = 0.0001  # final value of epsilon
INITIAL_EPSILON = 0.1  # starting value of epsilon
EPSILON_DECAY_START = 20  # episode after which epsilon starts to decay
MEMORY_SIZE = 20000  # number of previous transitions to remember
MINI_BATCH = 32  # size of mini batch
MAX_EPISODE = 20000
DEPTH_IMAGE_WIDTH = 256
DEPTH_IMAGE_HEIGHT = 144

TAU = 0.001  # Rate to update target network toward primary network
flatten_len = 9216  # flattened conv output (9 x 16 x 64) fed to the fully connected layers
NumBufferFrames = 4  # take the latest 4 frames as input

def variable_summaries(var):
    """Attach a lot of summaries to a Tensor (for TensorBoard visualization)."""
    with tf.name_scope('summaries'):
        mean = tf.reduce_mean(var)
        tf.summary.scalar('mean', mean)
        with tf.name_scope('stddev'):
            stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
        tf.summary.scalar('stddev', stddev)
        tf.summary.scalar('max', tf.reduce_max(var))
        tf.summary.scalar('min', tf.reduce_min(var))
        tf.summary.histogram('histogram', var)

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.01)
    return tf.Variable(initial, name="weights")

def bias_variable(shape):
    initial = tf.constant(0., shape=shape)
    return tf.Variable(initial, name="bias")

def conv2d(x, W, stride_h, stride_w):
    return tf.nn.conv2d(x, W, strides=[1, stride_h, stride_w, 1], padding="SAME")

class Deep_Q_Network(object):
    """Dueling Q-network: three conv layers feeding separate value and advantage streams."""
    def __init__(self, sess):
        # network weights and biases
        # input 144x256x4
        with tf.name_scope("Conv1"):
            W_conv1 = weight_variable([8, 8, NumBufferFrames, 32])
            variable_summaries(W_conv1)
            b_conv1 = bias_variable([32])
        with tf.name_scope("Conv2"):
            W_conv2 = weight_variable([4, 4, 32, 64])
            variable_summaries(W_conv2)
            b_conv2 = bias_variable([64])
        with tf.name_scope("Conv3"):
            W_conv3 = weight_variable([3, 3, 64, 64])
            variable_summaries(W_conv3)
            b_conv3 = bias_variable([64])
        with tf.name_scope("Value_Dense"):
            W_value = weight_variable([flatten_len, 512])
            variable_summaries(W_value)
            b_value = bias_variable([512])
        with tf.name_scope("FCAdv"):
            W_adv = weight_variable([flatten_len, 512])
            variable_summaries(W_adv)
            b_adv = bias_variable([512])
        with tf.name_scope("FCValueOut"):
            W_value_out = weight_variable([512, 1])
            variable_summaries(W_value_out)
            b_value_out = bias_variable([1])
        with tf.name_scope("FCAdvOut"):
            W_adv_out = weight_variable([512, ACTION_NUMS])
            variable_summaries(W_adv_out)
            b_adv_out = bias_variable([ACTION_NUMS])
        # input layer
        self.state = tf.placeholder("float", [None, DEPTH_IMAGE_HEIGHT, DEPTH_IMAGE_WIDTH, NumBufferFrames])
        # Conv1 layer
        h_conv1 = tf.nn.relu(conv2d(self.state, W_conv1, 8, 8) + b_conv1)
        # Conv2 layer
        h_conv2 = tf.nn.relu(conv2d(h_conv1, W_conv2, 2, 2) + b_conv2)
        # Conv3 layer
        h_conv3 = tf.nn.relu(conv2d(h_conv2, W_conv3, 1, 1) + b_conv3)
        h_conv3_flat = tf.layers.flatten(h_conv3)
        # FC ob value layer
        h_fc_value = tf.nn.relu(tf.matmul(h_conv3_flat, W_value) + b_value)
        value = tf.matmul(h_fc_value, W_value_out) + b_value_out
        # FC ob adv layer
        h_fc_adv = tf.nn.relu(tf.matmul(h_conv3_flat, W_adv) + b_adv)
        advantage = tf.matmul(h_fc_adv, W_adv_out) + b_adv_out
        # Q = value + (adv - advAvg)
        advAvg = tf.expand_dims(tf.reduce_mean(advantage, axis=1), axis=1)
        advIdentifiable = tf.subtract(advantage, advAvg)
        self.readout = tf.add(value, advIdentifiable)
        # define the cost function
        self.actions = tf.placeholder("float", [None, ACTION_NUMS])
        self.y = tf.placeholder("float", [None])
        self.readout_action = tf.reduce_sum(tf.multiply(self.readout, self.actions), axis=1)
        self.td_error = tf.square(self.y -
                                  self.readout_action)
        self.cost = tf.reduce_mean(self.td_error)
        self.train_step = tf.train.AdamOptimizer(1e-5).minimize(self.cost)

def updateTargetGraph(tfVars, tau):
    # The first half of tfVars belongs to the online network, the second half to the target network.
    total_vars = len(tfVars)
    op_holder = []
    for idx, var in enumerate(tfVars[0:total_vars // 2]):
        op_holder.append(tfVars[idx + total_vars // 2].assign(
            (var.value() * tau) + ((1 - tau) * tfVars[idx + total_vars // 2].value())))
    return op_holder

def updateTarget(op_holder, sess):
    for op in op_holder:
        sess.run(op)

def get_image(client,image_type):
    if (image_type == 'Scene'):
        responses = client.simGetImages([airsim.ImageRequest("0", airsim.ImageType.Scene, False, False)])
        response = responses[0]
        img1d = np.frombuffer(response.image_data_uint8, dtype=np.uint8)
        img_rgba = img1d.reshape(response.height, response.width, 4)
        img_rgba = np.flipud(img_rgba)
        observation = img_rgba[:, :, 0:3]
    elif (image_type == 'Segmentation'):
        responses = client.simGetImages([airsim.ImageRequest(0, airsim.ImageType.Segmentation, False, False)])
        response = responses[0]
        img1d = np.frombuffer(response.image_data_uint8, dtype=np.uint8)  # get numpy array
        img_rgba = img1d.reshape(response.height, response.width, 4)  # reshape array to 4 channel image array H X W X 4
        observation = img_rgba[:, :, 0:3]
    elif (image_type == 'DepthPlanner'):
        try:
            responses = client.simGetImages([airsim.ImageRequest(0, airsim.ImageType.DepthPlanner, pixels_as_float=True)])
            response = responses[0]
            img1d = np.array(response.image_data_float, dtype=np.float64)
            img1d = img1d * 3.5 + 30
            img1d[img1d > 255] = 255
            img2d = np.reshape(img1d, (responses[0].height, responses[0].width))
            observation = img2d
        except Exception:
            print('######### Error: could not get a depth image correctly! #########################')
            observation = np.ones([DEPTH_IMAGE_HEIGHT,DEPTH_IMAGE_WIDTH])
    else:
        observation = None
    observation_size = np.shape(observation)
    if(observation_size[0]==DEPTH_IMAGE_HEIGHT and observation_size[1]==DEPTH_IMAGE_WIDTH):
        return observation
    else:
        print('######### Error: The depth image shape: ',observation_size)
        return np.ones([DEPTH_IMAGE_HEIGHT,DEPTH_IMAGE_WIDTH])

def lidar_auxiliary(client,car_controls):
    lidarData = client.getLidarData()
    if (len(lidarData.point_cloud) < 3):
        print("##### No obstacles ahead........ ")
    else:
        points = np.array(lidarData.point_cloud, dtype=np.dtype('f4'))
        points = np.reshape(points, (int(points.shape[0] / 3), 3))
        x_points, y_points = np.array(points[:, 1]),np.array(points[:, 0])
        zeros_index = np.argwhere(x_points >= -0.1)
        if (len(zeros_index) <= 1):
            if(car_controls.steering<=0):
                car_controls.steering = 0.5
                print('##### Obstacle ahead, please turn right........')
            else:
                print('##### Obstacle ahead, operate correctly........')
        elif (len(zeros_index) >= len(x_points) - 1 or y_points[zeros_index[0] - 1] >= y_points[zeros_index[0] + 1]):
            if (car_controls.steering >= 0):
                car_controls.steering = -0.5
                print('##### Obstacle ahead, please turn left........')
            else:
                print('##### Obstacle ahead, operate correctly........')
        elif (y_points[zeros_index[0] - 1] >= y_points[zeros_index[0] + 1]):
            # unreachable: this condition is already covered by the branch above
            if (car_controls.steering >= 0):
                car_controls.steering = -0.5
                print('##### Obstacle ahead, please turn left........')
            else:
                print('##### Obstacle ahead, operate correctly........')
        else:
            if (car_controls.steering <= 0):
                car_controls.steering = 0.5
                print('##### Obstacle ahead, please turn right........')
            else:
                print('##### Obstacle ahead, operate correctly........')
    return car_controls

def env_feedback(client,pre_position):
    terminal_position = [-130, -210]
    reset = False
    collision_info = client.simGetCollisionInfo()
    car_position = client.getCarState().kinematics_estimated.position
    distance = np.sqrt((car_position.x_val-terminal_position[0])**2+(car_position.y_val-terminal_position[1])**2)
    current_position = [car_position.x_val,car_position.y_val]
    if collision_info.has_collided:
        reward = -100
        reset = True
    else:
        pre_distance = np.sqrt((pre_position[0]-terminal_position[0])**2+(pre_position[1]-terminal_position[1])**2)
        if(distance<=10):
            reward = 20
        elif(distance
# [The source is truncated here. The rest of env_feedback() and everything up to the
#  fragment below, presumably including store_transition() and the start of excute_action(), is missing.]
    =straight_range[0] and steer<=straight_range[1]):
        car_controls.throttle = car_controls.throttle + 0.1
        car_controls.throttle = 2 if car_controls.throttle>=2 else car_controls.throttle
    elif(steer<=BigTurn_range[0] or steer>=BigTurn_range[1]):
        car_controls.throttle = 0.5
    else:
        car_controls.throttle = 1
    car_controls.throttle = 0 if car_speed>=3 else car_controls.throttle
    return car_controls

def print_action(episode,t,car_controls,distance,steer,reward):
    print('Episode:%05d,Step:%05d '%(episode,t),end='')
    if(steer<0):
        print('The car is turning left, ',end='')
    elif(steer>0):
        print('The car is turning right, ',end='')
    else:
        print('The car is going straight, ',end='')
    print('throttle=%.1f, steer=%.3f, reward=%.2f, distance=%.2f'%(car_controls.throttle,steer,reward,distance),end='')

def trainNetwork():
    client = airsim.CarClient()
    client.confirmConnection()
    print('Connected successfully!')
    client.enableApiControl(True)
    car_controls = airsim.CarControls()
    car_controls.throttle = 0.5
    car_controls.steering = 0
    client.reset()
    print('Environment initialized!')
    # gpu_options =
    #     tf.GPUOptions(per_process_gpu_memory_fraction=0.7)
    # sess = tf.InteractiveSession(config=tf.ConfigProto(gpu_options=gpu_options))
    sess = tf.InteractiveSession()
    with tf.name_scope("OnlineNetwork"):
        online_net = Deep_Q_Network(sess)
    with tf.name_scope("TargetNetwork"):
        target_net = Deep_Q_Network(sess)
    time.sleep(1)

    reward_var = tf.Variable(0., trainable=False)
    tf.summary.scalar('reward', reward_var)
    # define summary
    merged_summary = tf.summary.merge_all()
    summary_writer = tf.summary.FileWriter('./logs', sess.graph)
    # Initialize the buffer
    replay_experiences = deque()
    replay_experiences = store_transition(replay_experiences,'read')
    # get the first state
    observe_init = get_image(client,'DepthPlanner')
    state_pre = np.stack((observe_init, observe_init, observe_init, observe_init), axis=2)
    # saving and loading networks
    trainables = tf.trainable_variables()
    trainable_saver = tf.train.Saver(trainables)
    sess.run(tf.global_variables_initializer())
    checkpoint = tf.train.get_checkpoint_state("saved_networks/new_model_lidar/")
    print('checkpoint:', checkpoint)
    if checkpoint and checkpoint.model_checkpoint_path:
        trainable_saver.restore(sess, checkpoint.model_checkpoint_path)
        print("Successfully loaded:", checkpoint.model_checkpoint_path)
    else:
        if not os.path.exists("saved_networks/new_model_lidar"):
            # makedirs also creates the parent "saved_networks" directory if it is missing
            os.makedirs("saved_networks/new_model_lidar")
            print('The checkpoint directory did not exist and has been created')
        print("Could not find old network weights")

    # start training
    episode = 1
    epsilon = INITIAL_EPSILON
    print('Number of trainable variables:', len(trainables))
    targetOps = updateTargetGraph(trainables, TAU)
    inner_loop_time_start = time.time()
    while episode < MAX_EPISODE:
        reward_episode = 0.
        terminal = 0
        step = 1
        reset = False
        loop_start_time = time.time()
        car_position = client.getCarState().kinematics_estimated.position
        pre_position = [car_position.x_val,car_position.y_val]
        while not reset:
            # take the latest 4 frames as an input
            observe = get_image(client,'DepthPlanner')
            observe = np.reshape(observe, (DEPTH_IMAGE_HEIGHT, DEPTH_IMAGE_WIDTH, 1))
            state_current = np.append(observe, state_pre[:, :, :(NumBufferFrames - 1)], axis=2)
            current_position,distance,reward_current, terminal,reset = env_feedback(client,pre_position)
            # store the experience
            if step-1 > 0:
                replay_experiences.append((state_pre, action_current, reward_current, state_current, terminal))
                if len(replay_experiences) > MEMORY_SIZE:
                    replay_experiences.popleft()
            state_pre = state_current
            # choose an action epsilon greedily
            actions = sess.run(online_net.readout, feed_dict={online_net.state: [state_current]})
            readout_t = actions[0]
            action_current = np.zeros([ACTION_NUMS])
            # fill the replay experience
            if len(replay_experiences) <= OBSERVE:
                action_index = random.randrange(ACTION_NUMS)
                print('episode=%05d,step=%05d,we are observing the env,the action is random......'%(episode,step))
                action_current[action_index] = 1
            else:
                if random.random() <= epsilon:
                    print("----------Random Action----------")
                    action_index = random.randrange(ACTION_NUMS)
                    action_current[action_index] = 1
                else:
                    action_index = np.argmax(readout_t)
                    action_current[action_index] = 1
            # Control the agent: map the action index 0..ACTION_NUMS-1 to a steering value in [-1, 1]
            side_num = int(ACTION_NUMS - 1) // 2
            steer = float((action_index - side_num) / side_num)
            inner_loop_time_end = time.time()
            car_controls = excute_action(client,car_controls,steer)
            car_controls = lidar_auxiliary(client,car_controls)
            client.setCarControls(car_controls)
            print_action(episode,
                         step, car_controls, distance, steer, reward_current)
            print(',experience len=%05d'%len(replay_experiences),end='')
            print(',inner loop=%.4fs'%(inner_loop_time_end-inner_loop_time_start))
            inner_loop_time_start = time.time()
            time.sleep(0.5)
            pre_position = current_position

            if episode > OBSERVE:
                # sample a minibatch to train on
                minibatch = random.sample(replay_experiences, MINI_BATCH)
                y_batch = []
                # get the batch variables
                state_pre_batch = [d[0] for d in minibatch]
                actions_batch = [d[1] for d in minibatch]
                rewards_batch = [d[2] for d in minibatch]
                state_current_batch = [d[3] for d in minibatch]
                Q1 = online_net.readout.eval(feed_dict={online_net.state: state_current_batch})
                Q2 = target_net.readout.eval(feed_dict={target_net.state: state_current_batch})
                for i in range(0, len(minibatch)):
                    terminal_batch = minibatch[i][4]
                    # if terminal, the target is just the reward
                    if terminal_batch:
                        y_batch.append(rewards_batch[i])
                    else:
                        # Double DQN: the online net (Q1) picks the action, the target net (Q2) evaluates it
                        y_batch.append(rewards_batch[i] + GAMMA * Q2[i, np.argmax(Q1[i])])

                # Update the network with our target values.
                online_net.train_step.run(feed_dict={online_net.y: y_batch,
                                                     online_net.actions: actions_batch,
                                                     online_net.state: state_pre_batch})
                updateTarget(targetOps, sess)  # Soft update: move the target network toward the online network at rate TAU.

            reward_episode = reward_episode + reward_current
            step = step+1
            # scale down epsilon
            if epsilon > FINAL_EPSILON and episode > EPSILON_DECAY_START:
                epsilon -= (INITIAL_EPSILON - FINAL_EPSILON) / EXPLORE
        # save progress every 5 episodes and write summaries
        if (episode % 5 == 0):
            trainable_saver.save(sess, "saved_networks/new_model_lidar/Simply_maze",global_step=episode,write_state=True)
            print('######## The model has been saved successfully after %d episodes ##########' % episode)
            # write summaries
            summary_str = sess.run(merged_summary, feed_dict={reward_var: reward_episode})
            summary_writer.add_summary(summary_str, episode)
            if(len(replay_experiences)<2500):
                signal_back = store_transition(replay_experiences, 'store')
                if signal_back:
                    print('######## The replay experiences have been saved successfully after %d episodes ##########' % episode)
                else:
                    print('######## Warning: the replay experiences could not be saved after %d episodes ##########' % episode)

        loop_end_time = time.time()
        loop_time = loop_end_time - loop_start_time
        print("EPISODE", episode, "/ REWARD", reward_episode, "/ steps ", step, "/ LoopTime:", loop_time)
        episode = episode + 1
        client.reset()

def main():
    trainNetwork()

if __name__ == "__main__":
    main()
--------------------------------------------------------------------------------
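The training loop in D3QN_training_Lidar.py builds its targets with the Double DQN rule and then blends the target network toward the online network at rate TAU. The sketch below restates those two steps in plain Python, without TensorFlow or AirSim, purely as an illustration. `double_dqn_targets` and `soft_update` are hypothetical helper names that do not exist in the repository, and plain lists of floats stand in for the network outputs and weights.

```python
def double_dqn_targets(rewards, terminals, q_online_next, q_target_next, gamma=0.99):
    """Return one training target per transition using the Double DQN rule.

    Hypothetical helper (not part of the repository); lists stand in for tensors.
    """
    targets = []
    for r, done, q_on, q_tg in zip(rewards, terminals, q_online_next, q_target_next):
        if done:
            # Terminal transition: there is no next state to bootstrap from.
            targets.append(r)
        else:
            # The online network picks the action, the target network scores it.
            best = max(range(len(q_on)), key=lambda a: q_on[a])
            targets.append(r + gamma * q_tg[best])
    return targets


def soft_update(target_weights, online_weights, tau=0.001):
    """Blend each target weight toward its online counterpart by a factor tau."""
    return [tau * w_on + (1.0 - tau) * w_tg
            for w_on, w_tg in zip(online_weights, target_weights)]


if __name__ == "__main__":
    # Two transitions: one ordinary step and one collision (terminal).
    print(double_dqn_targets(
        rewards=[1.0, -100.0],
        terminals=[False, True],
        q_online_next=[[0.2, 0.9], [0.5, 0.1]],
        q_target_next=[[3.0, 2.0], [7.0, 8.0]],
        gamma=0.5,
    ))
    print(soft_update([0.0, 2.0], [1.0, 2.0], tau=0.5))
```

With `tau=0.001`, the value of TAU in the script, each call moves the target weights only 0.1% of the way toward the online weights, so the target network changes slowly between training steps.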