├── README.md
├── __init__.py
├── config.ini
├── gym_airsim
│   ├── __init__.py
│   ├── airsim_car_env.py
│   └── car_agent.py
├── images
│   ├── learning_process.PNG
│   ├── metrics.png
│   ├── thumbnail.png
│   └── waypoint_actor.png
├── train.py
├── training_car_a2c_steering_only.ipynb
├── training_car_ppo2_steering_only.ipynb
├── training_steering_only.ipynb
└── utils.py

/README.md:
--------------------------------------------------------------------------------
# AirSim-RL
This project provides experiments with Deep Reinforcement Learning on autonomous vehicles (car and UAV drone) in [AirSim](https://github.com/microsoft/AirSim). The following libraries are used to facilitate training models:
* [OpenAI Gym](https://gym.openai.com/)
* [Keras-RL](https://github.com/keras-rl/keras-rl)
* [Stable Baselines](https://stable-baselines.readthedocs.io/en/master/)

DRL algorithms:
* DQN
* A2C
* PPO2

## Demo
[![Demo Video](images/thumbnail.png)](https://youtu.be/oiGjnpcPB_8)
* In this demo, the Deep Q-Network (DQN) is used. The reward is calculated from the distance between the car and the center of the track, and the episode stops when the car is close to going off track.
* The training input is the stream of RGB image frames captured from the car's front-facing camera.

## How do I set up my environment
* Build AirSim and Unreal Engine 4 (UE4) on Windows (https://github.com/microsoft/AirSim/blob/master/docs/build_windows.md)
* Create a new UE4 project with the [RaceCourse](https://github.com/microsoft/AirSim/wiki/build_FSTDriverless_windows) map
* Create a conda environment with the required packages: Python 3.7, TensorFlow 1.14, Keras 2.2.5, keras-rl 0.4.2, gym 0.17.2 (the listed versions are for reference only)
* Arrange some waypoints along the center of the road and name them "WayPoint[number]" (see the picture below). This list of waypoints is used to approximately locate the road on the map; the car's position and the waypoints are used to calculate the reward during training. As illustrated below, UE4 'Empty Actors' are used as waypoints. These waypoints are invisible while the game is running.
![waypoints](images/waypoint_actor.png)
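For reference, the snippet below is a minimal sketch of how these waypoint actors can be looked up at runtime through the AirSim Python API (it mirrors `_fetchWayPoints` in `gym_airsim/car_agent.py` and assumes an AirSim instance is already running):

```python
import numpy as np
from airsim import CarClient

client = CarClient()
client.confirmConnection()

# "WayPoint.*" is the waypoint_regex from config.ini; it matches the
# Empty Actors named WayPoint0, WayPoint1, ... placed along the track.
names = sorted(client.simListSceneObjects("WayPoint.*"))

# keep only the (x, y) position of each waypoint actor
waypoints = [
    np.array([pose.position.x_val, pose.position.y_val])
    for pose in (client.simGetObjectPose(name) for name in names)
]
print(f"Found {len(waypoints)} waypoints")
```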
## How to train
* First, take a look at the parameters in the config.ini file to understand the main settings, such as the input image resolution and the action space of the agent:
```
# settings related to UE4/airsim
[airsim_settings]
image_height = 144
image_width = 256
image_channels = 3
waypoint_regex = WayPoint.*
track_width = 12

# settings related to training the car agent
[car_agent]
# what is adjusted while driving the car
# 0: change steering only, 1 (not supported yet): change throttle and steering,
# 2 (not supported yet): change throttle, steering, and brake
action_mode = 0
# maximum steering value; steering ranges from left to right within [-1, 1]
# e.g. 0.3 means steering goes from -0.3 (left) to 0.3 (right)
steering_max = 0.3
# the granularity of the steering range, since we use discrete steering values
# e.g. 7 (with max=0.3) produces the discrete steering actions: -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3
steering_granularity = 7
# the car's throttle; fixed for now, a range of values will be added later
fixed_throttle = 0.75
# total number of actions of the car agent; update this accordingly when changing the settings above
actions = 7
```
* Then, run the Jupyter notebook (or train.py)

## TODO
* Expand the action space, try more reward functions
* Add an OpenAI Gym env for the UAV drone

## References
* [Creating a custom OpenAI Gym environment](https://stable-baselines.readthedocs.io/en/master/guide/custom_env.html)
* [Training DQN with keras-rl](https://github.com/keras-rl/keras-rl/blob/master/examples/dqn_atari.py): how to use keras-rl with an Atari game
* https://github.com/Kjell-K/AirGym
* [AWS DeepRacer](https://docs.aws.amazon.com/deepracer/latest/developerguide/what-is-deepracer.html)

--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hoangtranngoc/AirSim-RL/b4700049d8c2c519476d4ee15f2504795f80df9d/__init__.py

--------------------------------------------------------------------------------
/config.ini:
--------------------------------------------------------------------------------
# settings related to UE4/airsim
[airsim_settings]
image_height = 144
image_width = 256
image_channels = 3
waypoint_regex = WayPoint.*
track_width = 12

# settings related to training the car agent
[car_agent]
# what is adjusted while driving the car
# 0: change steering only, 1 (not supported yet): change throttle and steering,
# 2 (not supported yet): change throttle, steering, and brake
action_mode = 0
# maximum steering value; steering ranges from left to right within [-1, 1]
# e.g. 0.3 means steering goes from -0.3 (left) to 0.3 (right)
steering_max = 0.3
# the granularity of the steering range, since we use discrete steering values
# e.g. 7 (with max=0.3) produces the discrete steering actions: -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3
steering_granularity = 7
# the car's throttle; fixed for now, a range of values will be added later
fixed_throttle = 0.75
# total number of actions of the car agent; update this accordingly when changing the settings above
actions = 7

--------------------------------------------------------------------------------
/gym_airsim/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hoangtranngoc/AirSim-RL/b4700049d8c2c519476d4ee15f2504795f80df9d/gym_airsim/__init__.py
--------------------------------------------------------------------------------
/gym_airsim/airsim_car_env.py:
--------------------------------------------------------------------------------
from configparser import ConfigParser
import gym
from gym import spaces
import numpy as np
from numpy.linalg import norm
from .car_agent import CarAgent
from os.path import dirname, abspath, join
import sys
sys.path.append('..')
import utils

class AirSimCarEnv(gym.Env):
    """Custom environment that follows the gym interface."""
    metadata = {'render.modes': ['human']}

    def __init__(self):
        super().__init__()
        # Define the action and observation space;
        # they must be gym.spaces objects.

        config = ConfigParser()
        config.read(join(dirname(dirname(abspath(__file__))), 'config.ini'))

        # Using discrete actions
        # TODO: compute the number of actions from the other settings
        self.action_space = spaces.Discrete(int(config['car_agent']['actions']))

        self.image_height = int(config['airsim_settings']['image_height'])
        self.image_width = int(config['airsim_settings']['image_width'])
        self.image_channels = int(config['airsim_settings']['image_channels'])
        image_shape = (self.image_height, self.image_width, self.image_channels)

        self.track_width = float(config['airsim_settings']['track_width'])

        # Using the camera image as input
        self.observation_space = spaces.Box(low=0, high=255, shape=image_shape, dtype=np.uint8)

        self.car_agent = CarAgent()

    def step(self, action):
        # move the car according to the action
        self.car_agent.move(action)

        # compute the reward
        car_state = self.car_agent.getCarState()
        reward = self._compute_reward(car_state)

        # check whether the episode is done
        car_controls = self.car_agent.getCarControls()
        done = self._isDone(car_state, car_controls, reward)

        # log info
        info = {}
        #info = {"x_pos": x_val, "y_pos": y_val}

        # get the observation
        observation = self.car_agent.observe()

        return observation, reward, done, info

    def reset(self):
        self.car_agent.restart()
        observation = self.car_agent.observe()

        return observation  # reward, done, info can't be included

    def render(self, mode='human'):
        return  # nothing to render

    def close(self):
        self.car_agent.reset()
        return

    def _compute_reward(self, car_state):
        way_point1, way_point2 = self.car_agent.simGet2ClosestWayPoints()
        car_pos = car_state.kinematics_estimated.position
        car_point = np.array([car_pos.x_val, car_pos.y_val])

        # perpendicular distance to the line connecting the 2 closest waypoints,
        # which approximates the distance to the center of the track
        distance_p1_to_p2p3 = lambda p1, p2, p3: abs(np.cross(p2-p3, p3-p1))/norm(p2-p3)
        distance_to_center = distance_p1_to_p2p3(car_point, way_point1, way_point2)

        reward = utils.compute_reward_distance_to_center(distance_to_center, self.track_width)
        return reward

    def _isDone(self, car_state, car_controls, reward):
        if reward < 0:
            return True

        car_pos = car_state.kinematics_estimated.position
        car_point = np.array([car_pos.x_val, car_pos.y_val])
        destination = self.car_agent.simGetWayPoints()[-1]
        distance = norm(car_point - destination)
        if distance < 5:  # stop when within 5 m of the destination
            return True

        return False
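
# --- Illustrative usage sketch (not part of the training scripts) ---
# Assumes UE4 + AirSim are running with the RaceCourse map and the WayPoint
# actors in place; drives one episode with random actions as a smoke test.
if __name__ == "__main__":
    env = AirSimCarEnv()
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()   # random policy, just to exercise the env
        observation, reward, done, info = env.step(action)
        total_reward += reward
    print("episode finished, total reward:", total_reward)
    env.close()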
--------------------------------------------------------------------------------
/gym_airsim/car_agent.py:
--------------------------------------------------------------------------------
from airsim import CarClient, CarControls, ImageRequest, ImageType
from configparser import ConfigParser
import numpy as np
from numpy.linalg import norm
from os.path import dirname, abspath, join

class CarAgent(CarClient):
    def __init__(self):
        # connect to the AirSim simulator
        super().__init__()
        super().confirmConnection()
        super().enableApiControl(True)

        # read the configuration
        config = ConfigParser()
        config.read(join(dirname(dirname(abspath(__file__))), 'config.ini'))

        self.image_height = int(config['airsim_settings']['image_height'])
        self.image_width = int(config['airsim_settings']['image_width'])
        self.image_channels = int(config['airsim_settings']['image_channels'])
        self.image_size = self.image_height * self.image_width * self.image_channels

        self.action_mode = int(config['car_agent']['action_mode'])
        self.throttle = float(config['car_agent']['fixed_throttle'])
        self.steering_granularity = int(config['car_agent']['steering_granularity'])
        steering_max = float(config['car_agent']['steering_max'])
        # evenly spaced discrete steering values from -steering_max to +steering_max
        self.steering_values = np.linspace(-steering_max, steering_max, self.steering_granularity).tolist()

        # fetch waypoints
        waypoint_regex = config['airsim_settings']['waypoint_regex']
        self._fetchWayPoints(waypoint_regex)

    def restart(self):
        super().reset()
        super().enableApiControl(True)

    # get an RGB image from the front camera
    def observe(self):
        size = 0
        # Sometimes simGetImages() returns an unexpected response; if so, try again.
        while size != self.image_size:
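            # request an uncompressed RGB frame (Scene view) from camera 0, the front camera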
            response = super().simGetImages([ImageRequest(0, ImageType.Scene, False, False)])[0]
            img1d_rgb = np.frombuffer(response.image_data_uint8, dtype=np.uint8)
            size = img1d_rgb.size

        img3d_rgb = img1d_rgb.reshape(self.image_height, self.image_width, self.image_channels)
        return img3d_rgb

    def move(self, action):
        car_controls = self._interpret_action(action)
        super().setCarControls(car_controls)

    def _interpret_action(self, action):
        car_controls = CarControls()
        if self.action_mode == 0:    # change steering only, throttle is fixed
            car_controls.throttle = self.throttle
            car_controls.steering = self.steering_values[action]
        elif self.action_mode == 1:  # change both steering and throttle
            raise NotImplementedError("action_mode 1 is not supported yet")
        elif self.action_mode == 2:  # change steering, throttle, and brake
            raise NotImplementedError("action_mode 2 is not supported yet")
        else:
            raise NotImplementedError("unknown action_mode: {}".format(self.action_mode))

        return car_controls

    def _fetchWayPoints(self, waypoint_regex):
        wp_names = super().simListSceneObjects(waypoint_regex)
        wp_names.sort()
        print(wp_names)
        vec2r_to_numpy_array = lambda vec: np.array([vec.x_val, vec.y_val])

        self.waypoints = []
        for wp in wp_names:
            pose = super().simGetObjectPose(wp)
            self.waypoints.append(vec2r_to_numpy_array(pose.position))

        return

    def simGetWayPoints(self):
        return self.waypoints

    def simGet2ClosestWayPoints(self):
        # find the pair of consecutive waypoints with the smallest summed distance to the car
        total_distance = lambda p, p1, p2: norm(p-p1) + norm(p-p2)
        pos = super().simGetVehiclePose().position
        car_point = np.array([pos.x_val, pos.y_val])

        min_dist = 9999999
        min_i = 0
        for i in range(len(self.waypoints)-1):
            dist = total_distance(car_point, self.waypoints[i], self.waypoints[i+1])
            if dist < min_dist:
                min_dist = dist
                min_i = i

        return self.waypoints[min_i], self.waypoints[min_i+1]

--------------------------------------------------------------------------------
/images/learning_process.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hoangtranngoc/AirSim-RL/b4700049d8c2c519476d4ee15f2504795f80df9d/images/learning_process.PNG

--------------------------------------------------------------------------------
/images/metrics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hoangtranngoc/AirSim-RL/b4700049d8c2c519476d4ee15f2504795f80df9d/images/metrics.png

--------------------------------------------------------------------------------
/images/thumbnail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hoangtranngoc/AirSim-RL/b4700049d8c2c519476d4ee15f2504795f80df9d/images/thumbnail.png

--------------------------------------------------------------------------------
/images/waypoint_actor.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hoangtranngoc/AirSim-RL/b4700049d8c2c519476d4ee15f2504795f80df9d/images/waypoint_actor.png

--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
from rl.callbacks import ModelIntervalCheckpoint, FileLogger
from rl.agents.dqn import DQNAgent
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy
from rl.memory import SequentialMemory
from rl.core import Processor
from keras.optimizers import Adam
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, Convolution2D, Permute
import numpy as np
from PIL import Image
from configparser import ConfigParser
import os
from os.path import join, exists

from gym_airsim.airsim_car_env import AirSimCarEnv

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True  # dynamically grow the memory used on the GPU
set_session(tf.Session(config=tf_config))

class AirSimCarProcessor(Processor):
    def process_observation(self, observation):
        assert observation.ndim == 3  # (height, width, channel)
        img = Image.fromarray(observation)
        img = img.resize(INPUT_SHAPE).convert('L')  # resize and convert to grayscale
        processed_observation = np.array(img)
        assert processed_observation.shape == INPUT_SHAPE
        return processed_observation.astype('uint8')  # saves storage in experience memory

    def process_state_batch(self, batch):
        # We could perform this processing step in `process_observation`. In this case, however,
        # we would need to store a `float32` array instead, which is 4x more memory intensive than
        # a `uint8` array. This matters if we store 1M observations.
        processed_batch = batch.astype('float32') / 255.
        return processed_batch

    def process_reward(self, reward):
        return np.clip(reward, -1., 1.)

config = ConfigParser()
config.read('config.ini')
num_actions = int(config['car_agent']['actions'])

WINDOW_LENGTH = 4
INPUT_SHAPE = (84, 84)

env = AirSimCarEnv()
np.random.seed(123)

# Next, we build our model. We use the same model that was described by Mnih et al. (2015).
input_shape = (WINDOW_LENGTH,) + INPUT_SHAPE
model = Sequential()
model.add(Permute((2, 3, 1), input_shape=input_shape))
model.add(Convolution2D(32, (8, 8), strides=(4, 4)))
model.add(Activation('relu'))
model.add(Convolution2D(64, (4, 4), strides=(2, 2)))
model.add(Activation('relu'))
model.add(Convolution2D(64, (3, 3), strides=(1, 1)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dense(num_actions))
model.add(Activation('linear'))
model.summary()


def build_callbacks(env_name):
    log_dir = 'logs'
    if not exists(log_dir):
        os.makedirs(log_dir)

    checkpoint_weights_filename = join(log_dir, 'dqn_' + env_name + '_weights_{step}.h5f')
    log_filename = join(log_dir, 'dqn_{}_log.json'.format(env_name))
    callbacks = [ModelIntervalCheckpoint(checkpoint_weights_filename, interval=25000)]
    callbacks += [FileLogger(log_filename, interval=100)]
    return callbacks

memory = SequentialMemory(limit=50000, window_length=WINDOW_LENGTH)
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1.,
                              value_min=.1, value_test=.05, nb_steps=1000000)
processor = AirSimCarProcessor()

dqn = DQNAgent(model=model, nb_actions=num_actions, policy=policy, memory=memory,
               processor=processor, nb_steps_warmup=50000, gamma=.99, target_model_update=10000,
               train_interval=4, delta_clip=1.)
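
# Compile and train. With the settings above, epsilon is annealed from 1.0 to 0.1
# over the first 1M steps, learning starts after the 50k-step warmup, and the
# target network is refreshed every 10k steps; checkpoints and a JSON log are
# written to ./logs by the callbacks set up below.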
dqn.compile(Adam(lr=.0001), metrics=['mae'])

callbacks = build_callbacks('AirSimCarRL')

dqn.fit(env, nb_steps=2000000,
        visualize=False,
        verbose=2,
        callbacks=callbacks)

--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
def compute_reward_distance_to_center(distance_to_center, track_width):
    '''
    Compute the reward from the car's distance to the center line of the track.

    Parameters:
        distance_to_center (float): perpendicular distance (in meters) from the car
            to the line connecting the two closest waypoints
        track_width (float): width of the track (in meters), taken from config.ini

    Returns:
        float: the reward; 1.0, 0.5 or 0.1 depending on how close the car is to
            the center line, or -1 when the car is likely close to going off track
    '''

    # Calculate 3 markers that are increasingly further away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give a higher reward the closer the car is to the center line, and vice versa
    if distance_to_center <= marker_1:
        reward = 1
    elif distance_to_center <= marker_2:
        reward = 0.5
    elif distance_to_center <= marker_3:
        reward = 0.1
    else:
        reward = -1  # likely close to off track

    return reward
--------------------------------------------------------------------------------
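As a worked example of the reward shaping above (assuming the default `track_width = 12` from config.ini, so the markers sit at 1.2 m, 3.0 m and 6.0 m from the center line):

```python
from utils import compute_reward_distance_to_center

for d in (1.0, 2.0, 5.0, 7.0):
    print(d, compute_reward_distance_to_center(d, track_width=12))
# 1.0 -> 1    (within 10% of the track width from the center)
# 2.0 -> 0.5
# 5.0 -> 0.1
# 7.0 -> -1   (likely off track; the episode is terminated)
```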