├── .DS_Store ├── LICENSE ├── README.md ├── code ├── .DS_Store ├── SoftAC.py └── train_SAC.py ├── docs └── ENPM690_Phase1_report.pdf └── results ├── .DS_Store ├── Output.png ├── network.001.jpeg ├── track2.png └── train3.png /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/484ab51f262d09653e5fc520f96e35c332be7e70/.DS_Store -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 Rahul karanam 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Autonomous-Drifting-using-deep-Reinforcement-Learning 2 | A Soft Actor-Critic (SAC) based, model-free, off-policy network that controls the steering and throttle of a car while drifting at high speed. 3 | 4 | 5 | - Our model is trained with Soft Actor-Critic 6 | (SAC), which minimizes the error between the anticipated return and its prediction while maximizing entropy, using an off-policy 7 | learning strategy that performs well in continuous action domains. 8 | 9 | 10 | - The control policy is the actor in SAC, while the value network and 11 | Q-networks function as critics.
12 | 13 | 14 | - The basic goal of the actor is to maximize expected reward while 15 | also maximizing entropy (a measure of randomness in the policy, which encourages exploration). 16 | 17 | 18 | # Project Demo 19 | 20 | 21 | [![DEMO](https://github.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/blob/main/results/Output.png)](https://youtu.be/0ne7wq-tK18) 22 | 23 | # Training 24 | [![DEMO](https://github.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/blob/main/results/Output.png)](https://youtu.be/X3My8udx-Ck) 25 | 26 | 27 | ## Map for training 28 | ![](https://github.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/blob/main/results/track2.png) 29 | 30 | ## Basic Demo of Simulator 31 | ![](https://github.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/blob/main/results/train3.png) 32 | 33 | 34 | 35 | ## Environment 36 | 37 | - Ubuntu 20.04 38 | - Conda: package and environment manager 39 | - Python 3.8 40 | - Pytorch 41 | - Pygame 42 | 43 | ## Installation steps for CARLA simulator 44 | 45 | We use CARLA [0.9.5](https://carla.readthedocs.io/en/0.9.5/getting_started/) for our simulation. 46 | 47 | Please download the simulator from this [drive](https://drive.google.com/file/d/1CefYTLF48YKU5sPkQXsCScsG3fRiY0Gv/view?usp=sharing) 48 | 49 | Extract the folder in your Downloads directory. 50 | 51 | If you have a dual GPU setup, enter the following command to make your secondary (NVIDIA) graphics card the primary one. 52 | ``` 53 | export VK_ICD_FILENAMES="/usr/share/vulkan/icd.d/nvidia_icd.json" 54 | ``` 55 | 56 | ``` 57 | # Add these paths to your ~/.bashrc 58 | 59 | export PYTHONPATH=$PYTHONPATH:~/Downloads/CARLA_DRIFT_0.9.5/PythonAPI/carla/dist/carla-0.9.5-py3.5-linux-x86_64.egg 60 | export PYTHONPATH=$PYTHONPATH:~/Downloads/CARLA_DRIFT_0.9.5/PythonAPI/carla/ 61 | 62 | ``` 63 | 64 | 65 | To open and run the simulator, enter the commands below 66 | in a new terminal: 67 | ``` 68 | cd Downloads/CARLA_DRIFT_0.9.5 69 | ./CarlaUE4.sh /Game/Carla/ExportedMaps/simple 70 | ``` 71 | 72 | This will bring up the map. If you want to spawn vehicles and manually control a vehicle in the above 73 | map, enter the commands below. 74 | 75 | ``` 76 | # Open a new terminal 77 | cd Downloads/CARLA_DRIFT_0.9.5/PythonAPI/examples 78 | ./spawn_npc.py 79 | ``` 80 | 81 | This will spawn vehicles in the map. 82 | 83 | To control a vehicle in the environment, enter the commands below. 84 | ``` 85 | # Open a new terminal 86 | cd Downloads/CARLA_DRIFT_0.9.5/PythonAPI/examples 87 | ./manual_control.py 88 | ``` 89 | 90 | 91 | 92 | # TODO 93 | 94 | Collect reference trajectory data for the above map and train the SAC agent on it (a sketch of the training loop is given below).
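## Training loop sketch

The SAC agent in `code/SoftAC.py` already exposes the pieces such a run needs (`select_action`, `replay_buffer.add`, `update`, `save_model`). The sketch below is a minimal outline of how they might be wired together once the environment exists; `env` is a placeholder for the not-yet-implemented CARLA drifting environment and is assumed to follow a gym-style `reset()`/`step()` interface, and the `alpha`, episode, and warm-up values are illustrative assumptions, not tuned settings.

```
# Hypothetical training loop; `env` (the CARLA drifting environment) is still TODO.
import numpy as np
from SoftAC import SAC, path

def train(env, state_size, action_size, num_episodes=200, warmup_steps=1000):
    agent = SAC(state_size, action_size, alpha=0.2)  # alpha value is an assumption
    total_steps = 0
    for episode in range(num_episodes):
        state = np.asarray(env.reset(), dtype=np.float32)
        episode_reward, done = 0.0, False
        while not done:
            # select_action expects a batched state of shape (1, state_size)
            action = agent.select_action(state.reshape(1, -1))
            next_state, reward, done, _ = env.step(action)
            agent.replay_buffer.add(state, action, reward, next_state, float(done))
            state = np.asarray(next_state, dtype=np.float32)
            episode_reward += reward
            total_steps += 1
            if total_steps > warmup_steps:  # learn only once enough transitions are stored
                agent.update()
        print("Episode {}: reward = {:.2f}".format(episode, episode_reward))
        if episode % 50 == 0:
            agent.save_model(path, episode, agent.buffer_size)
    return agent
```

The actions returned by `select_action` are already clipped to the steering and throttle ranges defined in `SoftAC.py`, so the environment only needs to map them onto the CARLA vehicle controls.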
95 | 96 | 97 | 98 | -------------------------------------------------------------------------------- /code/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/484ab51f262d09653e5fc520f96e35c332be7e70/code/.DS_Store -------------------------------------------------------------------------------- /code/SoftAC.py: -------------------------------------------------------------------------------- 1 | # Soft Actor Critic (SAC) agent for Autonomous Drifting using Deep Reinforcement Learning 2 | 3 | 4 | # -*- coding: utf-8 -*- 5 | 6 | 7 | """ 8 | Created on Mon April 11 2022 9 | 10 | @author: Rahul Karanam 11 | 12 | @brief Implementation of Soft Actor Critic (SAC) agent for Autonomous Drifting using Deep Reinforcement Learning. 13 | 14 | I have written code following from the original Paper to implement the SAC algorithm but with slight modifications to make it work for my problem. 15 | 16 | References: https://arxiv.org/abs/1801.01290 17 | """ 18 | 19 | # Importing the libraries 20 | import numpy as np 21 | import os 22 | import torch 23 | import torch.nn as nn 24 | import torch.nn.functional as F 25 | import torch.optim as optim 26 | from torch.distributions import Normal,MultivariateNormal 27 | from tensorboardX import SummaryWriter 28 | from torch.utils.data.sampler import BatchSampler, SubsetRandomSampler 29 | 30 | 31 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 32 | global path 33 | path = "./SAC_Results" 34 | 35 | 36 | class Actor(nn.Module): 37 | """Actor (Policy) Model. 38 | Policy Network 39 | 40 | @brief This class takes in an observation of the environment and returns the action that the actor chooses to execute. 41 | 42 | """ 43 | 44 | def __init__(self, state_size, action_size, fc1_units=512, fc2_units=256): 45 | """Initialize parameters and build model. 46 | Params 47 | ====== 48 | state_size (int): Dimension of each state 49 | action_size (int): Dimension of each action 50 | fc1_units (int): Number of nodes in first hidden layer 51 | fc2_units (int): Number of nodes in second hidden layer 52 | """ 53 | super(Actor, self).__init__() 54 | self.fc1 = nn.Linear(state_size, fc1_units) 55 | self.fc2 = nn.Linear(fc1_units, fc2_units) 56 | self.fc3 = nn.Linear(fc2_units, action_size) 57 | self.log_std_head = nn.Linear(fc2_units,action_size) # log_std head for the policy network 58 | self.min_log_std = -20 # Minimum value of log_std 59 | self.max_log_std = 2 # Maximum value of log_std 60 | 61 | def forward(self, state): 62 | """Build an actor (policy) network that maps states -> actions.""" 63 | x = F.relu(self.fc1(state)) 64 | x = F.relu(self.fc2(x)) 65 | x = self.fc3(x) 66 | x_log_std = self.log_std_head(x) # log_std head for the policy network 67 | x_log_std = torch.clamp(x_log_std, self.min_log_std, self.max_log_std) # Clamp the log_std to be between the min and max values 68 | 69 | return x, x_log_std 70 | 71 | 72 | class Critic(nn.Module): 73 | """Critic (Value) Model. 74 | Value Network 75 | 76 | @brief This class takes in an observation of the environment and returns the value of the state. 77 | 78 | """ 79 | 80 | def __init__(self, state_size, fcs1_units=256, fc2_units=256): 81 | """Initialize parameters and build model. 
82 | Params 83 | ====== 84 | state_size (int): Dimension of each state 85 | fcs1_units (int): Number of nodes in the first hidden layer 86 | fc2_units (int): Number of nodes in the second hidden layer 87 | """ 88 | super(Critic, self).__init__() 89 | self.fcs1 = nn.Linear(state_size, fcs1_units) 90 | self.fc2 = nn.Linear(fcs1_units, fc2_units) 91 | self.fc3 = nn.Linear(fc2_units, 1) 92 | 93 | def forward(self, state): 94 | """Build a critic (value) network that maps states -> state values.""" 95 | x = F.relu(self.fcs1(state)) 96 | x = F.relu(self.fc2(x)) 97 | x = self.fc3(x) 98 | return x 99 | 100 | 101 | class Q_net(nn.Module): 102 | """ Q Network """ 103 | 104 | def __init__(self, state_size, action_size, fc1_units=256): 105 | """Initialize parameters and build model. 106 | Params 107 | ====== 108 | state_size (int): Dimension of each state 109 | action_size (int): Dimension of each action 110 | fc1_units (int): Number of nodes in first hidden layer 111 | """ 112 | super(Q_net, self).__init__() 113 | self.state_size = state_size 114 | self.action_size = action_size 115 | self.fc1 = nn.Linear(state_size+action_size, fc1_units) 116 | self.fc2 = nn.Linear(fc1_units, fc1_units) 117 | self.fc3 = nn.Linear(fc1_units, 1) 118 | 119 | def forward(self, state, action): 120 | """Build a critic (Q) network that maps (state, action) pairs -> Q-values.""" 121 | state = state.reshape(-1,state.shape[-1]) 122 | action = action.reshape(-1,action.shape[-1]) 123 | x = torch.cat((state,action),-1) # Concatenate the state and action 124 | x = F.relu(self.fc1(x)) 125 | x = F.relu(self.fc2(x)) 126 | x = self.fc3(x) 127 | return x 128 | 129 | 130 | 131 | class ReplayBuffer: 132 | """ Replay Buffer 133 | 134 | @brief This class stores experience tuples in the form (state, action, reward, next_state, done). 135 | 136 | It is used to train the agent from stored state transitions. 137 | 138 | """ 139 | 140 | def __init__(self, buffer_size,state_size,action_size): 141 | """Initialize parameters and build model.
142 | Params 143 | ====== 144 | buffer_size (int): Maximum size of the replay buffer 145 | state_size (int): Dimension of each state 146 | action_size (int): Dimension of each action 147 | """ 148 | self.buffer_size = buffer_size 149 | self.state_size = state_size 150 | self.action_size = action_size 151 | self.state_arr = torch.zeros(self.buffer_size,self.state_size).float().to(device) 152 | self.action_arr = torch.zeros(self.buffer_size,self.action_size).float().to(device) 153 | self.reward_arr = torch.zeros(self.buffer_size,1).float().to(device) 154 | self.next_state_arr = torch.zeros(self.buffer_size,self.state_size).float().to(device) 155 | self.done_arr = torch.zeros(self.buffer_size,1).float().to(device) 156 | self.ptr = 0 157 | self.size = 0 # number of transitions currently stored 158 | def add(self,state,action,reward,next_state,done): 159 | """Add a new experience to the replay buffer""" 160 | self.ptr = (self.ptr) % self.buffer_size 161 | state = torch.tensor(state,dtype=torch.float32).to(device) 162 | action = torch.tensor(action,dtype=torch.float32).to(device) 163 | reward = torch.tensor(reward,dtype=torch.float32).to(device) 164 | next_state = torch.tensor(next_state,dtype=torch.float32).to(device) 165 | done = torch.tensor(done,dtype=torch.float32).to(device) 166 | 167 | # Now we add the experience to the replay buffer (i.e. to the individual arrays) 168 | for array,element in zip([self.state_arr,self.action_arr,self.reward_arr,self.next_state_arr,self.done_arr], 169 | [state,action,reward,next_state,done]): 170 | array[self.ptr] = element 171 | self.ptr += 1 172 | self.size = min(self.size + 1, self.buffer_size) # track how many slots are filled 173 | 174 | 175 | def sample(self, batch_size): 176 | """Sample a batch of experiences from the buffer (assumes at least batch_size transitions are stored)""" 177 | ind = np.random.choice(self.size, size=batch_size, replace=False) # sample without replacement, only from filled slots 178 | batch_state, batch_action, batch_reward, batch_next_state, batch_done = self.state_arr[ind], self.action_arr[ind], self.reward_arr[ind], self.next_state_arr[ind], self.done_arr[ind] 179 | 180 | 181 | return batch_state, batch_action, batch_reward, batch_next_state, batch_done
194 | Params 195 | ====== 196 | state_size (int): Dimension of each state 197 | action_size (int): Dimension of each action 198 | lr (float): Learning rate 199 | gamma (float): Discount factor 200 | tau (float): Soft update parameter 201 | alpha (float): Entropy temperature parameter 202 | 203 | batch_size (int): Batch size for training 204 | buffer_size (int): Size of the replay buffer 205 | """ 206 | super(SAC,self).__init__() 207 | self.state_size = state_size 208 | self.action_size = action_size 209 | self.lr = lr 210 | self.gamma = gamma 211 | self.tau = tau 212 | self.alpha = alpha 213 | self.batch_size = batch_size 214 | self.buffer_size = buffer_size 215 | 216 | # Initialize the policy network 217 | self.policy_net = Actor(state_size,action_size).to(device) 218 | 219 | 220 | # Initialize the value network and the target network 221 | self.value_net = Critic(state_size).to(device) 222 | self.target_net = Critic(state_size).to(device) 223 | 224 | # Initialize the Q network 225 | self.q_net_1 = Q_net(state_size,action_size).to(device) 226 | self.q_net_2 = Q_net(state_size,action_size).to(device) 227 | 228 | self.policy_optimizer = optim.Adam(self.policy_net.parameters(),lr=lr) 229 | self.value_optimizer = optim.Adam(self.value_net.parameters(),lr=lr) 230 | self.q_net_1_optimizer = optim.Adam(self.q_net_1.parameters(),lr=lr) 231 | self.q_net_2_optimizer = optim.Adam(self.q_net_2.parameters(),lr=lr) 232 | 233 | # Initialize the replay buffer 234 | self.replay_buffer = ReplayBuffer(buffer_size,state_size,action_size) 235 | self.ptr = 0 236 | self.writer = SummaryWriter("runs/sac") 237 | 238 | self.value_critic_loss = nn.MSELoss() 239 | self.q_net_1_loss = nn.MSELoss() 240 | self.q_net_2_loss = nn.MSELoss() 241 | 242 | for trg_params,src_params in zip(self.target_net.parameters(),self.value_net.parameters()): 243 | trg_params.data.copy_(src_params.data) 244 | 245 | self.steering_range = (-0.8,0.8) # steering range for the car in the CARLA simulator, chosen to allow high-speed drifting without rolling over 246 | self.throttle_range = (0.6,1.0) # throttle range for the car in the CARLA simulator, chosen to keep the speed high while drifting
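    # Reference: update rules implemented in update() below.
    # With a sampled batch (s, a, r, s', d) from the replay buffer, update() builds:
    #   Q target     : y_q = r + gamma * (1 - d) * V_target(s')
    #   Value target : y_v = min(Q1(s, a~), Q2(s, a~)) - log pi(a~|s), with a~ drawn from pi(.|s) in evaluate()
    #   Policy loss  : -(min(Q1, Q2) - log pi).mean()
    # evaluate() applies the tanh-squashing correction: log pi(a|s) = log N(u; mu, sigma) - log(1 - tanh(u)^2 + 1e-6).
    # Note: self.alpha is stored above, but the current update() uses the entropy term with an implicit
    # temperature of 1 (log pi is not scaled by alpha).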
247 | 248 | 249 | 250 | def select_action(self,state): 251 | """Select an action from the current policy""" 252 | state = torch.tensor(state,dtype=torch.float32).to(device) 253 | mu,sigma = self.policy_net(state) 254 | sigma = torch.exp(sigma) 255 | dist_space = Normal(mu,sigma) 256 | z = dist_space.sample() 257 | 258 | steer_action = torch.tanh(z[0,0]).detach().cpu().item() 259 | throttle_action = torch.sigmoid(z[0,1]).detach().cpu().item() 260 | 261 | steer_action = np.clip(steer_action,self.steering_range[0],self.steering_range[1]) 262 | throttle_action = np.clip(throttle_action,self.throttle_range[0],self.throttle_range[1]) 263 | 264 | return np.array([steer_action,throttle_action]) 265 | 266 | 267 | def test(self,state): 268 | """Test the current policy (deterministic: uses the mean action)""" 269 | state = torch.tensor(state,dtype=torch.float32).to(device) 270 | mu,sigma = self.policy_net(state) 271 | 272 | z = mu 273 | 274 | steer_action = torch.tanh(z[0,0]).detach().cpu().item() 275 | throttle_action = torch.sigmoid(z[0,1]).detach().cpu().item() 276 | 277 | steer_action = np.clip(steer_action,self.steering_range[0],self.steering_range[1]) 278 | throttle_action = np.clip(throttle_action,self.throttle_range[0],self.throttle_range[1]) 279 | 280 | return np.array([steer_action,throttle_action]) 281 | 282 | def evaluate(self,state): 283 | "Evaluation of the current model" 284 | batch = state.size()[0] 285 | batch_mu,batch_sigma = self.policy_net(state) 286 | batch_sigma = torch.exp(batch_sigma) 287 | dist_space = Normal(batch_mu,batch_sigma) 288 | noise = Normal(0,1).sample(sample_shape=(batch,self.action_size)) 289 | 290 | action = torch.tanh(batch_mu + batch_sigma*noise.to(device)) 291 | 292 | prob_log = dist_space.log_prob(batch_mu + batch_sigma*noise.to(device)) - torch.log(1-torch.pow(action,2)+1e-6) 293 | 294 | prob_log = prob_log.sum(dim=1,keepdim=True) # joint log-probability of the action: sum over action dimensions 295 | 296 | 297 | 298 | return action,prob_log,noise,batch_mu,batch_sigma 299 | 300 | 301 | def update(self): 302 | "Update the value, Q and policy networks from a sampled batch" 303 | if self.ptr % 500 == 0: 304 | print("Updating the target network") 305 | print("---- Started Training ----") 306 | print("Train - \t{} times".format(self.ptr)) 307 | 308 | self.ptr = 0 309 | 310 | # Sample a batch of transitions 311 | state,action,reward,next_state,done = self.replay_buffer.sample(self.batch_size) 312 | 313 | target_val = self.target_net(next_state) 314 | next_q_val = reward + (1-done)*self.gamma*target_val 315 | 316 | expect_val = self.value_net(state) 317 | expect_q1,expect_q2 = self.q_net_1(state,action),self.q_net_2(state,action) 318 | sample_action,prob_log,noise,batch_mu,batch_sigma = self.evaluate(state) 319 | expect_q = torch.min(self.q_net_1(state,sample_action),self.q_net_2(state,sample_action)) 320 | next_val = expect_q - prob_log 321 | 322 | # Calculate the value loss 323 | # Loss function for the value network 324 | J_V_loss = self.value_critic_loss(expect_val,next_val.detach()).mean() 325 | 326 | # Loss function for the Q networks 327 | Q1_loss = self.q_net_1_loss(expect_q1,next_q_val.detach()).mean() 328 | Q2_loss = self.q_net_2_loss(expect_q2,next_q_val.detach()).mean() 329 | 330 | pi_loss = -(expect_q - prob_log).mean() # Policy Loss 331 | # The above line is adapted from the original paper 332 | 333 | self.writer.add_scalar("Loss/J_V_loss",J_V_loss,self.ptr) 334 | self.writer.add_scalar("Loss/Q1_loss",Q1_loss,self.ptr) 335 |
self.writer.add_scalar("Loss/Q2_loss",Q2_loss,self.ptr) 336 | self.writer.add_scalar("Loss/pi_loss",pi_loss,self.ptr) 337 | 338 | 339 | # Update the networks 340 | # Update the value network 341 | self.value_optimizer.zero_grad() 342 | J_V_loss.backward(retain_graph=True) 343 | nn.utils.clip_grad_norm_(self.value_net.parameters(),0.5) 344 | self.value_optimizer.step() 345 | 346 | # Update the policy network (Q network) 347 | self.q_net_1_optimizer.zero_grad() 348 | Q1_loss.backward(retain_graph=True) 349 | nn.utils.clip_grad_norm_(self.q_net_1.parameters(),0.5) 350 | self.q_net_1_optimizer.step() 351 | 352 | self.q_net_2_optimizer.zero_grad() 353 | Q2_loss.backward(retain_graph=True) 354 | nn.utils.clip_grad_norm_(self.q_net_2.parameters(),0.5) 355 | self.q_net_2_optimizer.step() 356 | 357 | 358 | # Update the policy network (Policy network) 359 | self.policy_optimizer.zero_grad() 360 | pi_loss.backward(retain_graph = True) 361 | nn.utils.clip_grad_norm_(self.policy_net.parameters(),0.5) 362 | self.policy_optimizer.step() 363 | 364 | 365 | # Update the target networks 366 | for trg_params,src_params in zip(self.target_net.parameters(),self.value_net.parameters()): 367 | trg_params.data.copy_(trg_params.data*self.tau + src_params.data*(1-self.tau)) 368 | 369 | self.ptr += 1 370 | 371 | 372 | def save_model(self,path,epoch,buffer_size): 373 | 374 | os.makedirs(path+str(buffer_size),exist_ok=True) 375 | torch.save(self.policy_net.state_dict(),path+str(buffer_size)+"/policy_net_"+str(epoch)+".pth") 376 | torch.save(self.value_net.state_dict(),path+str(buffer_size)+"/value_net_"+str(epoch)+".pth") 377 | torch.save(self.q_net_1.state_dict(),path+str(buffer_size)+"/q_net_1_"+str(epoch)+".pth") 378 | torch.save(self.q_net_2.state_dict(),path+str(buffer_size)+"/q_net_2_"+str(epoch)+".pth") 379 | 380 | print("Model Saved") 381 | 382 | 383 | def load_model(self,path,epoch,buffer_size): 384 | 385 | self.policy_net.load_state_dict(torch.load(path+str(buffer_size)+"/policy_net_"+str(epoch)+".pth")) 386 | self.value_net.load_state_dict(torch.load(path+str(buffer_size)+"/value_net_"+str(epoch)+".pth")) 387 | self.q_net_1.load_state_dict(torch.load(path+str(buffer_size)+"/q_net_1_"+str(epoch)+".pth")) 388 | self.q_net_2.load_state_dict(torch.load(path+str(buffer_size)+"/q_net_2_"+str(epoch)+".pth")) 389 | 390 | print("Model Loaded") 391 | 392 | 393 | def save_buffer(self,path,epoch,buffer_size): 394 | path = path+str(buffer_size)+'/' 395 | self.policy_net.load_state_dict(torch.load(path+"policy_net_"+str(epoch)+".pth")) 396 | self.value_net.load_state_dict(torch.load(path+"value_net_"+str(epoch)+".pth")) 397 | self.q_net_1.load_state_dict(torch.load(path+"q_net_1_"+str(epoch)+".pth")) 398 | self.q_net_2.load_state_dict(torch.load(path+"q_net_2_"+str(epoch)+".pth")) 399 | print("Model Loaded") -------------------------------------------------------------------------------- /code/train_SAC.py: -------------------------------------------------------------------------------- 1 | """ 2 | 3 | # -*- coding: utf-8 -*- 4 | 5 | 6 | 7 | Created on Mon April 11 2022 8 | 9 | @author: Rahul Karanam 10 | @brief Train a SAC agent for the High Speed Driving on CARLA simulator. 
(I'll be training on map 1, which 11 | can be found in the reference folder.) 12 | 13 | 14 | """ 15 | 16 | # Importing the libraries 17 | import sys 18 | from SoftAC import * 19 | import time 20 | import numpy as np 21 | import random 22 | import pygame 23 | import matplotlib.pyplot as plt 24 | import matplotlib.patches as patches 25 | from agents.navigation.basic_agent import BasicAgent 26 | 27 | 28 | if __name__ == "__main__": 29 | 30 | # Check that pygame initialises correctly 31 | print("Working:[1]") 32 | pygame.init() 33 | print("Working:[2]") 34 | pygame.font.init() 35 | print("Working:[3]") 36 | env = environment(traj_num = 1) # `environment` is the CARLA drifting env wrapper, not yet included in this repo (see the TODO in the README) 37 | -------------------------------------------------------------------------------- /docs/ENPM690_Phase1_report.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/484ab51f262d09653e5fc520f96e35c332be7e70/docs/ENPM690_Phase1_report.pdf -------------------------------------------------------------------------------- /results/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/484ab51f262d09653e5fc520f96e35c332be7e70/results/.DS_Store -------------------------------------------------------------------------------- /results/Output.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/484ab51f262d09653e5fc520f96e35c332be7e70/results/Output.png -------------------------------------------------------------------------------- /results/network.001.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/484ab51f262d09653e5fc520f96e35c332be7e70/results/network.001.jpeg -------------------------------------------------------------------------------- /results/track2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/484ab51f262d09653e5fc520f96e35c332be7e70/results/track2.png -------------------------------------------------------------------------------- /results/train3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/484ab51f262d09653e5fc520f96e35c332be7e70/results/train3.png --------------------------------------------------------------------------------