├── README.md
├── __init__.py
├── config.ini
├── gym_airsim
│   ├── __init__.py
│   ├── airsim_car_env.py
│   └── car_agent.py
├── images
│   ├── learning_process.PNG
│   ├── metrics.png
│   ├── thumbnail.png
│   └── waypoint_actor.png
├── train.py
├── training_car_a2c_steering_only.ipynb
├── training_car_ppo2_steering_only.ipynb
├── training_steering_only.ipynb
└── utils.py

/README.md:
--------------------------------------------------------------------------------
# AirSim-RL
This project provides experiments with Deep Reinforcement Learning on autonomous vehicles (car and UAV drone) in [AirSim](https://github.com/microsoft/AirSim). The following libraries are used to facilitate training models:
* [OpenAI Gym](https://gym.openai.com/)
* [Keras-RL](https://github.com/keras-rl/keras-rl)
* [Stable Baselines](https://stable-baselines.readthedocs.io/en/master/)

DRL algorithms:
* DQN
* A2C
* PPO2

## Demo
[![Demo Video](images/thumbnail.png)](https://youtu.be/oiGjnpcPB_8)
* In this demo, the Deep Q-Network (DQN) is used. The reward is calculated from the distance between the car and the center of the track, and the episode stops when the car is close to going off track.
* The training input is the stream of RGB image frames captured from the car's front-facing camera.

## How do I set up my environment
* Build AirSim and Unreal Engine 4 (UE4) on Windows (https://github.com/microsoft/AirSim/blob/master/docs/build_windows.md)
* Create a new UE4 project with the [RaceCourse](https://github.com/microsoft/AirSim/wiki/build_FSTDriverless_windows) map
* Create a conda environment with the required packages: Python 3.7, TensorFlow 1.14, Keras 2.2.5, keras-rl 0.4.2, gym 0.17.2 (the listed versions are for reference only)
* Arrange some waypoints along the center of the road and name them "WayPoint[number]" (see the picture below). This list of waypoints is used to approximately locate the road on the map; the car's position and the waypoints are used to calculate the reward during training. As illustrated below, UE4 'Empty Actors' are used as waypoints. These waypoints are invisible while the game is running.
![waypoints](images/waypoint_actor.png)
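For reference, the snippet below is a minimal sketch of how these waypoint actors can be looked up at runtime through the AirSim Python API (it mirrors `_fetchWayPoints` in `gym_airsim/car_agent.py` and assumes an AirSim instance is already running):

```python
import numpy as np
from airsim import CarClient

client = CarClient()
client.confirmConnection()

# "WayPoint.*" is the waypoint_regex from config.ini; it matches the
# Empty Actors named WayPoint0, WayPoint1, ... placed along the track.
names = sorted(client.simListSceneObjects("WayPoint.*"))

# keep only the (x, y) position of each waypoint actor
waypoints = [
    np.array([pose.position.x_val, pose.position.y_val])
    for pose in (client.simGetObjectPose(name) for name in names)
]
print(f"Found {len(waypoints)} waypoints")
```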
## How to train
* First, take a look at the parameters in the config.ini file to understand the main settings, such as the input image resolution and the action space of the agent:
```
# settings related to UE4/airsim
[airsim_settings]
image_height = 144
image_width = 256
image_channels = 3
waypoint_regex = WayPoint.*
track_width = 12

# settings related to training the car agent
[car_agent]
# what is adjusted while driving the car
# 0: change steering only, 1 (not supported yet): change throttle and steering,
# 2 (not supported yet): change throttle, steering, and brake
action_mode = 0
# maximum steering value; steering ranges from left to right within [-1, 1]
# e.g. 0.3 means steering goes from -0.3 (left) to 0.3 (right)
steering_max = 0.3
# the granularity of the steering range, since we use discrete steering values
# e.g. 7 (with max=0.3) produces the discrete steering actions: -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3
steering_granularity = 7
# the car's throttle; fixed for now, a range of values will be added later
fixed_throttle = 0.75
# total number of actions of the car agent; update this accordingly when changing the settings above
actions = 7
```
* Then, run the Jupyter notebook (or train.py)

## TODO
* Expand the action space, try more reward functions
* Add an OpenAI Gym env for the UAV drone

## References
* [Creating a custom OpenAI Gym environment](https://stable-baselines.readthedocs.io/en/master/guide/custom_env.html)
* [Training DQN with keras-rl](https://github.com/keras-rl/keras-rl/blob/master/examples/dqn_atari.py): how to use keras-rl with an Atari game
* https://github.com/Kjell-K/AirGym
* [AWS DeepRacer](https://docs.aws.amazon.com/deepracer/latest/developerguide/what-is-deepracer.html)

--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hoangtranngoc/AirSim-RL/b4700049d8c2c519476d4ee15f2504795f80df9d/__init__.py

--------------------------------------------------------------------------------
/config.ini:
--------------------------------------------------------------------------------
# settings related to UE4/airsim
[airsim_settings]
image_height = 144
image_width = 256
image_channels = 3
waypoint_regex = WayPoint.*
track_width = 12

# settings related to training the car agent
[car_agent]
# what is adjusted while driving the car
# 0: change steering only, 1 (not supported yet): change throttle and steering,
# 2 (not supported yet): change throttle, steering, and brake
action_mode = 0
# maximum steering value; steering ranges from left to right within [-1, 1]
# e.g. 0.3 means steering goes from -0.3 (left) to 0.3 (right)
steering_max = 0.3
# the granularity of the steering range, since we use discrete steering values
# e.g. 7 (with max=0.3) produces the discrete steering actions: -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3
steering_granularity = 7
# the car's throttle; fixed for now, a range of values will be added later
fixed_throttle = 0.75
# total number of actions of the car agent; update this accordingly when changing the settings above
actions = 7

--------------------------------------------------------------------------------
/gym_airsim/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hoangtranngoc/AirSim-RL/b4700049d8c2c519476d4ee15f2504795f80df9d/gym_airsim/__init__.py
--------------------------------------------------------------------------------
/gym_airsim/airsim_car_env.py:
--------------------------------------------------------------------------------
from configparser import ConfigParser
import gym
from gym import spaces
import numpy as np
from numpy.linalg import norm
from .car_agent import CarAgent
from os.path import dirname, abspath, join
import sys
sys.path.append('..')
import utils

class AirSimCarEnv(gym.Env):
    """Custom environment that follows the gym interface."""
    metadata = {'render.modes': ['human']}

    def __init__(self):
        super().__init__()
        # Define the action and observation space;
        # they must be gym.spaces objects.

        config = ConfigParser()
        config.read(join(dirname(dirname(abspath(__file__))), 'config.ini'))

        # Using discrete actions
        # TODO: compute the number of actions from the other settings
        self.action_space = spaces.Discrete(int(config['car_agent']['actions']))

        self.image_height = int(config['airsim_settings']['image_height'])
        self.image_width = int(config['airsim_settings']['image_width'])
        self.image_channels = int(config['airsim_settings']['image_channels'])
        image_shape = (self.image_height, self.image_width, self.image_channels)

        self.track_width = float(config['airsim_settings']['track_width'])

        # Using the camera image as input
        self.observation_space = spaces.Box(low=0, high=255, shape=image_shape, dtype=np.uint8)

        self.car_agent = CarAgent()

    def step(self, action):
        # move the car according to the action
        self.car_agent.move(action)

        # compute the reward
        car_state = self.car_agent.getCarState()
        reward = self._compute_reward(car_state)

        # check whether the episode is done
        car_controls = self.car_agent.getCarControls()
        done = self._isDone(car_state, car_controls, reward)

        # log info
        info = {}
        #info = {"x_pos": x_val, "y_pos": y_val}

        # get the observation
        observation = self.car_agent.observe()

        return observation, reward, done, info

    def reset(self):
        self.car_agent.restart()
        observation = self.car_agent.observe()

        return observation  # reward, done, info can't be included

    def render(self, mode='human'):
        return  # nothing to render

    def close(self):
        self.car_agent.reset()
        return

    def _compute_reward(self, car_state):
        way_point1, way_point2 = self.car_agent.simGet2ClosestWayPoints()
        car_pos = car_state.kinematics_estimated.position
        car_point = np.array([car_pos.x_val, car_pos.y_val])

        # perpendicular distance to the line connecting the 2 closest waypoints,
        # which approximates the distance to the center of the track
        distance_p1_to_p2p3 = lambda p1, p2, p3: abs(np.cross(p2-p3, p3-p1))/norm(p2-p3)
        distance_to_center = distance_p1_to_p2p3(car_point, way_point1, way_point2)

        reward = utils.compute_reward_distance_to_center(distance_to_center, self.track_width)
        return reward

    def _isDone(self, car_state, car_controls, reward):
        if reward < 0:
            return True

        car_pos = car_state.kinematics_estimated.position
        car_point = np.array([car_pos.x_val, car_pos.y_val])
        destination = self.car_agent.simGetWayPoints()[-1]
        distance = norm(car_point - destination)
        if distance < 5:  # stop when within 5 m of the destination
            return True

        return False
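
# --- Illustrative usage sketch (not part of the training scripts) ---
# Assumes UE4 + AirSim are running with the RaceCourse map and the WayPoint
# actors in place; drives one episode with random actions as a smoke test.
if __name__ == "__main__":
    env = AirSimCarEnv()
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()   # random policy, just to exercise the env
        observation, reward, done, info = env.step(action)
        total_reward += reward
    print("episode finished, total reward:", total_reward)
    env.close()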
--------------------------------------------------------------------------------
/gym_airsim/car_agent.py:
--------------------------------------------------------------------------------
from airsim import CarClient, CarControls, ImageRequest, ImageType
from configparser import ConfigParser
import numpy as np
from numpy.linalg import norm
from os.path import dirname, abspath, join

class CarAgent(CarClient):
    def __init__(self):
        # connect to the AirSim simulator
        super().__init__()
        super().confirmConnection()
        super().enableApiControl(True)

        # read the configuration
        config = ConfigParser()
        config.read(join(dirname(dirname(abspath(__file__))), 'config.ini'))

        self.image_height = int(config['airsim_settings']['image_height'])
        self.image_width = int(config['airsim_settings']['image_width'])
        self.image_channels = int(config['airsim_settings']['image_channels'])
        self.image_size = self.image_height * self.image_width * self.image_channels

        self.action_mode = int(config['car_agent']['action_mode'])
        self.throttle = float(config['car_agent']['fixed_throttle'])
        self.steering_granularity = int(config['car_agent']['steering_granularity'])
        steering_max = float(config['car_agent']['steering_max'])
        # evenly spaced discrete steering values from -steering_max to +steering_max
        self.steering_values = np.linspace(-steering_max, steering_max, self.steering_granularity).tolist()

        # fetch waypoints
        waypoint_regex = config['airsim_settings']['waypoint_regex']
        self._fetchWayPoints(waypoint_regex)

    def restart(self):
        super().reset()
        super().enableApiControl(True)

    # get an RGB image from the front camera
    def observe(self):
        size = 0
        # Sometimes simGetImages() returns an unexpected response; if so, try again.
        while size != self.image_size:
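            # request an uncompressed RGB frame (Scene view) from camera 0, the front camera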
            response = super().simGetImages([ImageRequest(0, ImageType.Scene, False, False)])[0]
            img1d_rgb = np.frombuffer(response.image_data_uint8, dtype=np.uint8)
            size = img1d_rgb.size

        img3d_rgb = img1d_rgb.reshape(self.image_height, self.image_width, self.image_channels)
        return img3d_rgb

    def move(self, action):
        car_controls = self._interpret_action(action)
        super().setCarControls(car_controls)

    def _interpret_action(self, action):
        car_controls = CarControls()
        if self.action_mode == 0:    # change steering only, throttle is fixed
            car_controls.throttle = self.throttle
            car_controls.steering = self.steering_values[action]
        elif self.action_mode == 1:  # change both steering and throttle
            raise NotImplementedError("action_mode 1 is not supported yet")
        elif self.action_mode == 2:  # change steering, throttle, and brake
            raise NotImplementedError("action_mode 2 is not supported yet")
        else:
            raise NotImplementedError("unknown action_mode: {}".format(self.action_mode))

        return car_controls

    def _fetchWayPoints(self, waypoint_regex):
        wp_names = super().simListSceneObjects(waypoint_regex)
        wp_names.sort()
        print(wp_names)
        vec2r_to_numpy_array = lambda vec: np.array([vec.x_val, vec.y_val])

        self.waypoints = []
        for wp in wp_names:
            pose = super().simGetObjectPose(wp)
            self.waypoints.append(vec2r_to_numpy_array(pose.position))

        return

    def simGetWayPoints(self):
        return self.waypoints

    def simGet2ClosestWayPoints(self):
        # find the pair of consecutive waypoints with the smallest summed distance to the car
        total_distance = lambda p, p1, p2: norm(p-p1) + norm(p-p2)
        pos = super().simGetVehiclePose().position
        car_point = np.array([pos.x_val, pos.y_val])

        min_dist = 9999999
        min_i = 0
        for i in range(len(self.waypoints)-1):
            dist = total_distance(car_point, self.waypoints[i], self.waypoints[i+1])
            if dist < min_dist:
                min_dist = dist
                min_i = i

        return self.waypoints[min_i], self.waypoints[min_i+1]

--------------------------------------------------------------------------------
/images/learning_process.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hoangtranngoc/AirSim-RL/b4700049d8c2c519476d4ee15f2504795f80df9d/images/learning_process.PNG

--------------------------------------------------------------------------------
/images/metrics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hoangtranngoc/AirSim-RL/b4700049d8c2c519476d4ee15f2504795f80df9d/images/metrics.png

--------------------------------------------------------------------------------
/images/thumbnail.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hoangtranngoc/AirSim-RL/b4700049d8c2c519476d4ee15f2504795f80df9d/images/thumbnail.png

--------------------------------------------------------------------------------
/images/waypoint_actor.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hoangtranngoc/AirSim-RL/b4700049d8c2c519476d4ee15f2504795f80df9d/images/waypoint_actor.png

--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
from rl.callbacks import ModelIntervalCheckpoint, FileLogger
from rl.agents.dqn import DQNAgent
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy
from rl.memory import SequentialMemory
from rl.core import Processor
from keras.optimizers import Adam
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, Convolution2D, Permute
import numpy as np
from PIL import Image
from configparser import ConfigParser
import os
from os.path import join, exists

from gym_airsim.airsim_car_env import AirSimCarEnv

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True  # dynamically grow the memory used on the GPU
set_session(tf.Session(config=tf_config))

class AirSimCarProcessor(Processor):
    def process_observation(self, observation):
        assert observation.ndim == 3  # (height, width, channel)
        img = Image.fromarray(observation)
        img = img.resize(INPUT_SHAPE).convert('L')  # resize and convert to grayscale
        processed_observation = np.array(img)
        assert processed_observation.shape == INPUT_SHAPE
        return processed_observation.astype('uint8')  # saves storage in experience memory

    def process_state_batch(self, batch):
        # We could perform this processing step in `process_observation`. In this case, however,
        # we would need to store a `float32` array instead, which is 4x more memory intensive than
        # a `uint8` array. This matters if we store 1M observations.
        processed_batch = batch.astype('float32') / 255.
        return processed_batch

    def process_reward(self, reward):
        return np.clip(reward, -1., 1.)

config = ConfigParser()
config.read('config.ini')
num_actions = int(config['car_agent']['actions'])

WINDOW_LENGTH = 4
INPUT_SHAPE = (84, 84)

env = AirSimCarEnv()
np.random.seed(123)

# Next, we build our model. We use the same model that was described by Mnih et al. (2015).
input_shape = (WINDOW_LENGTH,) + INPUT_SHAPE
model = Sequential()
model.add(Permute((2, 3, 1), input_shape=input_shape))
model.add(Convolution2D(32, (8, 8), strides=(4, 4)))
model.add(Activation('relu'))
model.add(Convolution2D(64, (4, 4), strides=(2, 2)))
model.add(Activation('relu'))
model.add(Convolution2D(64, (3, 3), strides=(1, 1)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dense(num_actions))
model.add(Activation('linear'))
model.summary()


def build_callbacks(env_name):
    log_dir = 'logs'
    if not exists(log_dir):
        os.makedirs(log_dir)

    checkpoint_weights_filename = join(log_dir, 'dqn_' + env_name + '_weights_{step}.h5f')
    log_filename = join(log_dir, 'dqn_{}_log.json'.format(env_name))
    callbacks = [ModelIntervalCheckpoint(checkpoint_weights_filename, interval=25000)]
    callbacks += [FileLogger(log_filename, interval=100)]
    return callbacks

memory = SequentialMemory(limit=50000, window_length=WINDOW_LENGTH)
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1.,
                              value_min=.1, value_test=.05, nb_steps=1000000)
processor = AirSimCarProcessor()

dqn = DQNAgent(model=model, nb_actions=num_actions, policy=policy, memory=memory,
               processor=processor, nb_steps_warmup=50000, gamma=.99, target_model_update=10000,
               train_interval=4, delta_clip=1.)
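
# Compile and train. With the settings above, epsilon is annealed from 1.0 to 0.1
# over the first 1M steps, learning starts after the 50k-step warmup, and the
# target network is refreshed every 10k steps; checkpoints and a JSON log are
# written to ./logs by the callbacks set up below.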
dqn.compile(Adam(lr=.0001), metrics=['mae'])

callbacks = build_callbacks('AirSimCarRL')

dqn.fit(env, nb_steps=2000000,
        visualize=False,
        verbose=2,
        callbacks=callbacks)

--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
def compute_reward_distance_to_center(distance_to_center, track_width):
    '''
    Compute the reward from the car's distance to the center line of the track.

    Parameters:
        distance_to_center (float): perpendicular distance (in meters) from the car
            to the line connecting the two closest waypoints
        track_width (float): width of the track (in meters), taken from config.ini

    Returns:
        float: the reward; 1.0, 0.5 or 0.1 depending on how close the car is to
            the center line, or -1 when the car is likely close to going off track
    '''

    # Calculate 3 markers that are increasingly further away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give a higher reward the closer the car is to the center line, and vice versa
    if distance_to_center <= marker_1:
        reward = 1
    elif distance_to_center <= marker_2:
        reward = 0.5
    elif distance_to_center <= marker_3:
        reward = 0.1
    else:
        reward = -1  # likely close to off track

    return reward
--------------------------------------------------------------------------------
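As a worked example of the reward shaping above (assuming the default `track_width = 12` from config.ini, so the markers sit at 1.2 m, 3.0 m and 6.0 m from the center line):

```python
from utils import compute_reward_distance_to_center

for d in (1.0, 2.0, 5.0, 7.0):
    print(d, compute_reward_distance_to_center(d, track_width=12))
# 1.0 -> 1    (within 10% of the track width from the center)
# 2.0 -> 0.5
# 5.0 -> 0.1
# 7.0 -> -1   (likely off track; the episode is terminated)
```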