├── .DS_Store ├── LICENSE ├── README.md ├── code ├── .DS_Store ├── SoftAC.py └── train_SAC.py ├── docs └── ENPM690_Phase1_report.pdf └── results ├── .DS_Store ├── Output.png ├── network.001.jpeg ├── track2.png └── train3.png /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/484ab51f262d09653e5fc520f96e35c332be7e70/.DS_Store -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 Rahul karanam 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Autonomous-Drifting-using-deep-Reinforcement-Learning 2 | A Soft Actor-Critic (SAC) based, model-free, off-policy network that controls the steering and throttle of a car while drifting at high speed. 3 | 4 | 5 | - Our model is trained with Soft Actor-Critic 6 | (SAC), which minimizes the error between the anticipated return and its prediction while maximizing entropy, using an off-policy 7 | learning strategy that performs well in continuous action domains. 8 | 9 | 10 | - The control policy is the actor in SAC, while the value network and 11 | Q-networks function as critics.
12 | 13 | 14 | - The basic goal of the actor is to maximize expected reward while 15 | also maximizing entropy (a measure of randomness in the policy, which encourages exploration). 16 | 17 | 18 | # Project Demo 19 | 20 | 21 | [![DEMO](https://github.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/blob/main/results/Output.png)](https://youtu.be/0ne7wq-tK18) 22 | 23 | # Training 24 | [![DEMO](https://github.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/blob/main/results/Output.png)](https://youtu.be/X3My8udx-Ck) 25 | 26 | 27 | ## Map for training 28 | ![](https://github.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/blob/main/results/track2.png) 29 | 30 | ## Basic Demo of Simulator 31 | ![](https://github.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/blob/main/results/train3.png) 32 | 33 | 34 | 35 | ## Environment 36 | 37 | - Ubuntu 20.04 38 | - Conda: package and environment manager 39 | - Python 3.8 40 | - Pytorch 41 | - Pygame 42 | 43 | ## Installation steps for CARLA simulator 44 | 45 | We use CARLA [0.9.5](https://carla.readthedocs.io/en/0.9.5/getting_started/) for our simulation. 46 | 47 | Please download the simulator from this [drive](https://drive.google.com/file/d/1CefYTLF48YKU5sPkQXsCScsG3fRiY0Gv/view?usp=sharing) 48 | 49 | Extract the folder in your Downloads directory. 50 | 51 | If you have a dual GPU setup, enter the following command to make your secondary (NVIDIA) graphics card the primary one. 52 | ``` 53 | export VK_ICD_FILENAMES="/usr/share/vulkan/icd.d/nvidia_icd.json" 54 | ``` 55 | 56 | ``` 57 | # Add these paths to your ~/.bashrc 58 | 59 | export PYTHONPATH=$PYTHONPATH:~/Downloads/CARLA_DRIFT_0.9.5/PythonAPI/carla/dist/carla-0.9.5-py3.5-linux-x86_64.egg 60 | export PYTHONPATH=$PYTHONPATH:~/Downloads/CARLA_DRIFT_0.9.5/PythonAPI/carla/ 61 | 62 | ``` 63 | 64 | 65 | To open and run the simulator, enter the commands below 66 | in a new terminal: 67 | ``` 68 | cd Downloads/CARLA_DRIFT_0.9.5 69 | ./CarlaUE4.sh /Game/Carla/ExportedMaps/simple 70 | ``` 71 | 72 | This will bring up the map. If you want to spawn vehicles and manually control a vehicle in the above 73 | map, enter the commands below. 74 | 75 | ``` 76 | # Open a new terminal 77 | cd Downloads/CARLA_DRIFT_0.9.5/PythonAPI/examples 78 | ./spawn_npc.py 79 | ``` 80 | 81 | This will spawn vehicles in the map. 82 | 83 | To control a vehicle in the environment, enter the commands below. 84 | ``` 85 | # Open a new terminal 86 | cd Downloads/CARLA_DRIFT_0.9.5/PythonAPI/examples 87 | ./manual_control.py 88 | ``` 89 | 90 | 91 | 92 | # TODO 93 | 94 | Collect reference trajectory data for the above map and train the SAC agent on it (a sketch of the training loop is given below).
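## Training loop sketch

The SAC agent in `code/SoftAC.py` already exposes the pieces such a run needs (`select_action`, `replay_buffer.add`, `update`, `save_model`). The sketch below is a minimal outline of how they might be wired together once the environment exists; `env` is a placeholder for the not-yet-implemented CARLA drifting environment and is assumed to follow a gym-style `reset()`/`step()` interface, and the `alpha`, episode, and warm-up values are illustrative assumptions, not tuned settings.

```
# Hypothetical training loop; `env` (the CARLA drifting environment) is still TODO.
import numpy as np
from SoftAC import SAC, path

def train(env, state_size, action_size, num_episodes=200, warmup_steps=1000):
    agent = SAC(state_size, action_size, alpha=0.2)  # alpha value is an assumption
    total_steps = 0
    for episode in range(num_episodes):
        state = np.asarray(env.reset(), dtype=np.float32)
        episode_reward, done = 0.0, False
        while not done:
            # select_action expects a batched state of shape (1, state_size)
            action = agent.select_action(state.reshape(1, -1))
            next_state, reward, done, _ = env.step(action)
            agent.replay_buffer.add(state, action, reward, next_state, float(done))
            state = np.asarray(next_state, dtype=np.float32)
            episode_reward += reward
            total_steps += 1
            if total_steps > warmup_steps:  # learn only once enough transitions are stored
                agent.update()
        print("Episode {}: reward = {:.2f}".format(episode, episode_reward))
        if episode % 50 == 0:
            agent.save_model(path, episode, agent.buffer_size)
    return agent
```

The actions returned by `select_action` are already clipped to the steering and throttle ranges defined in `SoftAC.py`, so the environment only needs to map them onto the CARLA vehicle controls.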
95 | 96 | 97 | 98 | -------------------------------------------------------------------------------- /code/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/484ab51f262d09653e5fc520f96e35c332be7e70/code/.DS_Store -------------------------------------------------------------------------------- /code/SoftAC.py: -------------------------------------------------------------------------------- 1 | # Soft Actor Critic (SAC) agent for Autonomous Drifting using Deep Reinforcement Learning 2 | 3 | 4 | # -*- coding: utf-8 -*- 5 | 6 | 7 | """ 8 | Created on Mon April 11 2022 9 | 10 | @author: Rahul Karanam 11 | 12 | @brief Implementation of Soft Actor Critic (SAC) agent for Autonomous Drifting using Deep Reinforcement Learning. 13 | 14 | I have written code following from the original Paper to implement the SAC algorithm but with slight modifications to make it work for my problem. 15 | 16 | References: https://arxiv.org/abs/1801.01290 17 | """ 18 | 19 | # Importing the libraries 20 | import numpy as np 21 | import os 22 | import torch 23 | import torch.nn as nn 24 | import torch.nn.functional as F 25 | import torch.optim as optim 26 | from torch.distributions import Normal,MultivariateNormal 27 | from tensorboardX import SummaryWriter 28 | from torch.utils.data.sampler import BatchSampler, SubsetRandomSampler 29 | 30 | 31 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 32 | global path 33 | path = "./SAC_Results" 34 | 35 | 36 | class Actor(nn.Module): 37 | """Actor (Policy) Model. 38 | Policy Network 39 | 40 | @brief This class takes in an observation of the environment and returns the action that the actor chooses to execute. 41 | 42 | """ 43 | 44 | def __init__(self, state_size, action_size, fc1_units=512, fc2_units=256): 45 | """Initialize parameters and build model. 46 | Params 47 | ====== 48 | state_size (int): Dimension of each state 49 | action_size (int): Dimension of each action 50 | fc1_units (int): Number of nodes in first hidden layer 51 | fc2_units (int): Number of nodes in second hidden layer 52 | """ 53 | super(Actor, self).__init__() 54 | self.fc1 = nn.Linear(state_size, fc1_units) 55 | self.fc2 = nn.Linear(fc1_units, fc2_units) 56 | self.fc3 = nn.Linear(fc2_units, action_size) 57 | self.log_std_head = nn.Linear(fc2_units,action_size) # log_std head for the policy network 58 | self.min_log_std = -20 # Minimum value of log_std 59 | self.max_log_std = 2 # Maximum value of log_std 60 | 61 | def forward(self, state): 62 | """Build an actor (policy) network that maps states -> actions.""" 63 | x = F.relu(self.fc1(state)) 64 | x = F.relu(self.fc2(x)) 65 | x = self.fc3(x) 66 | x_log_std = self.log_std_head(x) # log_std head for the policy network 67 | x_log_std = torch.clamp(x_log_std, self.min_log_std, self.max_log_std) # Clamp the log_std to be between the min and max values 68 | 69 | return x, x_log_std 70 | 71 | 72 | class Critic(nn.Module): 73 | """Critic (Value) Model. 74 | Value Network 75 | 76 | @brief This class takes in an observation of the environment and returns the value of the state. 77 | 78 | """ 79 | 80 | def __init__(self, state_size, fcs1_units=256, fc2_units=256): 81 | """Initialize parameters and build model. 
82 | Params 83 | ====== 84 | state_size (int): Dimension of each state 85 | fcs1_units (int): Number of nodes in the first hidden layer 86 | fc2_units (int): Number of nodes in the second hidden layer 87 | """ 88 | super(Critic, self).__init__() 89 | self.fcs1 = nn.Linear(state_size, fcs1_units) 90 | self.fc2 = nn.Linear(fcs1_units, fc2_units) 91 | self.fc3 = nn.Linear(fc2_units, 1) 92 | 93 | def forward(self, state): 94 | """Build a critic (value) network that maps states -> state values.""" 95 | x = F.relu(self.fcs1(state)) 96 | x = F.relu(self.fc2(x)) 97 | x = self.fc3(x) 98 | return x 99 | 100 | 101 | class Q_net(nn.Module): 102 | """ Q Network """ 103 | 104 | def __init__(self, state_size, action_size, fc1_units=256): 105 | """Initialize parameters and build model. 106 | Params 107 | ====== 108 | state_size (int): Dimension of each state 109 | action_size (int): Dimension of each action 110 | fc1_units (int): Number of nodes in first hidden layer 111 | """ 112 | super(Q_net, self).__init__() 113 | self.state_size = state_size 114 | self.action_size = action_size 115 | self.fc1 = nn.Linear(state_size+action_size, fc1_units) 116 | self.fc2 = nn.Linear(fc1_units, fc1_units) 117 | self.fc3 = nn.Linear(fc1_units, 1) 118 | 119 | def forward(self, state, action): 120 | """Build a critic (Q) network that maps (state, action) pairs -> Q-values.""" 121 | state = state.reshape(-1,state.shape[-1]) 122 | action = action.reshape(-1,action.shape[-1]) 123 | x = torch.cat((state,action),-1) # Concatenate the state and action 124 | x = F.relu(self.fc1(x)) 125 | x = F.relu(self.fc2(x)) 126 | x = self.fc3(x) 127 | return x 128 | 129 | 130 | 131 | class ReplayBuffer: 132 | """ Replay Buffer 133 | 134 | @brief This class stores experience tuples in the form (state, action, reward, next_state, done). 135 | 136 | It is used to train the agent from stored state transitions. 137 | 138 | """ 139 | 140 | def __init__(self, buffer_size,state_size,action_size): 141 | """Initialize parameters and build model.
142 | Params 143 | ====== 144 | buffer_size (int): Maximum size of the replay buffer 145 | state_size (int): Dimension of each state 146 | action_size (int): Dimension of each action 147 | """ 148 | self.buffer_size = buffer_size 149 | self.state_size = state_size 150 | self.action_size = action_size 151 | self.state_arr = torch.zeros(self.buffer_size,self.state_size).float().to(device) 152 | self.action_arr = torch.zeros(self.buffer_size,self.action_size).float().to(device) 153 | self.reward_arr = torch.zeros(self.buffer_size,1).float().to(device) 154 | self.next_state_arr = torch.zeros(self.buffer_size,self.state_size).float().to(device) 155 | self.done_arr = torch.zeros(self.buffer_size,1).float().to(device) 156 | self.ptr = 0 157 | self.size = 0 # number of transitions currently stored 158 | def add(self,state,action,reward,next_state,done): 159 | """Add a new experience to the replay buffer""" 160 | self.ptr = (self.ptr) % self.buffer_size 161 | state = torch.tensor(state,dtype=torch.float32).to(device) 162 | action = torch.tensor(action,dtype=torch.float32).to(device) 163 | reward = torch.tensor(reward,dtype=torch.float32).to(device) 164 | next_state = torch.tensor(next_state,dtype=torch.float32).to(device) 165 | done = torch.tensor(done,dtype=torch.float32).to(device) 166 | 167 | # Now we add the experience to the replay buffer (i.e. to the individual arrays) 168 | for array,element in zip([self.state_arr,self.action_arr,self.reward_arr,self.next_state_arr,self.done_arr], 169 | [state,action,reward,next_state,done]): 170 | array[self.ptr] = element 171 | self.ptr += 1 172 | self.size = min(self.size + 1, self.buffer_size) # track how many slots are filled 173 | 174 | 175 | def sample(self, batch_size): 176 | """Sample a batch of experiences from the buffer (assumes at least batch_size transitions are stored)""" 177 | ind = np.random.choice(self.size, size=batch_size, replace=False) # sample without replacement, only from filled slots 178 | batch_state, batch_action, batch_reward, batch_next_state, batch_done = self.state_arr[ind], self.action_arr[ind], self.reward_arr[ind], self.next_state_arr[ind], self.done_arr[ind] 179 | 180 | 181 | return batch_state, batch_action, batch_reward, batch_next_state, batch_done
194 | Params 195 | ====== 196 | state_size (int): Dimension of each state 197 | action_size (int): Dimension of each action 198 | lr (float): Learning rate 199 | gamma (float): Discount factor 200 | tau (float): Soft update parameter 201 | alpha (float): Entropy temperature parameter 202 | 203 | batch_size (int): Batch size for training 204 | buffer_size (int): Size of the replay buffer 205 | """ 206 | super(SAC,self).__init__() 207 | self.state_size = state_size 208 | self.action_size = action_size 209 | self.lr = lr 210 | self.gamma = gamma 211 | self.tau = tau 212 | self.alpha = alpha 213 | self.batch_size = batch_size 214 | self.buffer_size = buffer_size 215 | 216 | # Initialize the policy network 217 | self.policy_net = Actor(state_size,action_size).to(device) 218 | 219 | 220 | # Initialize the value network and the target network 221 | self.value_net = Critic(state_size).to(device) 222 | self.target_net = Critic(state_size).to(device) 223 | 224 | # Initialize the Q network 225 | self.q_net_1 = Q_net(state_size,action_size).to(device) 226 | self.q_net_2 = Q_net(state_size,action_size).to(device) 227 | 228 | self.policy_optimizer = optim.Adam(self.policy_net.parameters(),lr=lr) 229 | self.value_optimizer = optim.Adam(self.value_net.parameters(),lr=lr) 230 | self.q_net_1_optimizer = optim.Adam(self.q_net_1.parameters(),lr=lr) 231 | self.q_net_2_optimizer = optim.Adam(self.q_net_2.parameters(),lr=lr) 232 | 233 | # Initialize the replay buffer 234 | self.replay_buffer = ReplayBuffer(buffer_size,state_size,action_size) 235 | self.ptr = 0 236 | self.writer = SummaryWriter("runs/sac") 237 | 238 | self.value_critic_loss = nn.MSELoss() 239 | self.q_net_1_loss = nn.MSELoss() 240 | self.q_net_2_loss = nn.MSELoss() 241 | 242 | for trg_params,src_params in zip(self.target_net.parameters(),self.value_net.parameters()): 243 | trg_params.data.copy_(src_params.data) 244 | 245 | self.steering_range = (-0.8,0.8) # steering range for the car in the CARLA simulator, chosen to allow high-speed drifting without rolling over 246 | self.throttle_range = (0.6,1.0) # throttle range for the car in the CARLA simulator, chosen to keep the speed high while drifting
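    # Reference: update rules implemented in update() below.
    # With a sampled batch (s, a, r, s', d) from the replay buffer, update() builds:
    #   Q target     : y_q = r + gamma * (1 - d) * V_target(s')
    #   Value target : y_v = min(Q1(s, a~), Q2(s, a~)) - log pi(a~|s), with a~ drawn from pi(.|s) in evaluate()
    #   Policy loss  : -(min(Q1, Q2) - log pi).mean()
    # evaluate() applies the tanh-squashing correction: log pi(a|s) = log N(u; mu, sigma) - log(1 - tanh(u)^2 + 1e-6).
    # Note: self.alpha is stored above, but the current update() uses the entropy term with an implicit
    # temperature of 1 (log pi is not scaled by alpha).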
247 | 248 | 249 | 250 | def select_action(self,state): 251 | """Select an action from the current policy""" 252 | state = torch.tensor(state,dtype=torch.float32).to(device) 253 | mu,sigma = self.policy_net(state) 254 | sigma = torch.exp(sigma) 255 | dist_space = Normal(mu,sigma) 256 | z = dist_space.sample() 257 | 258 | steer_action = torch.tanh(z[0,0]).detach().cpu().item() 259 | throttle_action = torch.sigmoid(z[0,1]).detach().cpu().item() 260 | 261 | steer_action = np.clip(steer_action,self.steering_range[0],self.steering_range[1]) 262 | throttle_action = np.clip(throttle_action,self.throttle_range[0],self.throttle_range[1]) 263 | 264 | return np.array([steer_action,throttle_action]) 265 | 266 | 267 | def test(self,state): 268 | """Test the current policy (deterministic: uses the mean action)""" 269 | state = torch.tensor(state,dtype=torch.float32).to(device) 270 | mu,sigma = self.policy_net(state) 271 | 272 | z = mu 273 | 274 | steer_action = torch.tanh(z[0,0]).detach().cpu().item() 275 | throttle_action = torch.sigmoid(z[0,1]).detach().cpu().item() 276 | 277 | steer_action = np.clip(steer_action,self.steering_range[0],self.steering_range[1]) 278 | throttle_action = np.clip(throttle_action,self.throttle_range[0],self.throttle_range[1]) 279 | 280 | return np.array([steer_action,throttle_action]) 281 | 282 | def evaluate(self,state): 283 | "Evaluation of the current model" 284 | batch = state.size()[0] 285 | batch_mu,batch_sigma = self.policy_net(state) 286 | batch_sigma = torch.exp(batch_sigma) 287 | dist_space = Normal(batch_mu,batch_sigma) 288 | noise = Normal(0,1).sample(sample_shape=(batch,self.action_size)) 289 | 290 | action = torch.tanh(batch_mu + batch_sigma*noise.to(device)) 291 | 292 | prob_log = dist_space.log_prob(batch_mu + batch_sigma*noise.to(device)) - torch.log(1-torch.pow(action,2)+1e-6) 293 | 294 | prob_log = prob_log.sum(dim=1,keepdim=True) # joint log-probability of the action: sum over action dimensions 295 | 296 | 297 | 298 | return action,prob_log,noise,batch_mu,batch_sigma 299 | 300 | 301 | def update(self): 302 | "Update the value, Q and policy networks from a sampled batch" 303 | if self.ptr % 500 == 0: 304 | print("Updating the target network") 305 | print("---- Started Training ----") 306 | print("Train - \t{} times".format(self.ptr)) 307 | 308 | self.ptr = 0 309 | 310 | # Sample a batch of transitions 311 | state,action,reward,next_state,done = self.replay_buffer.sample(self.batch_size) 312 | 313 | target_val = self.target_net(next_state) 314 | next_q_val = reward + (1-done)*self.gamma*target_val 315 | 316 | expect_val = self.value_net(state) 317 | expect_q1,expect_q2 = self.q_net_1(state,action),self.q_net_2(state,action) 318 | sample_action,prob_log,noise,batch_mu,batch_sigma = self.evaluate(state) 319 | expect_q = torch.min(self.q_net_1(state,sample_action),self.q_net_2(state,sample_action)) 320 | next_val = expect_q - prob_log 321 | 322 | # Calculate the value loss 323 | # Loss function for the value network 324 | J_V_loss = self.value_critic_loss(expect_val,next_val.detach()).mean() 325 | 326 | # Loss function for the Q networks 327 | Q1_loss = self.q_net_1_loss(expect_q1,next_q_val.detach()).mean() 328 | Q2_loss = self.q_net_2_loss(expect_q2,next_q_val.detach()).mean() 329 | 330 | pi_loss = -(expect_q - prob_log).mean() # Policy Loss 331 | # The above line is adapted from the original paper 332 | 333 | self.writer.add_scalar("Loss/J_V_loss",J_V_loss,self.ptr) 334 | self.writer.add_scalar("Loss/Q1_loss",Q1_loss,self.ptr) 335 |
self.writer.add_scalar("Loss/Q2_loss",Q2_loss,self.ptr) 336 | self.writer.add_scalar("Loss/pi_loss",pi_loss,self.ptr) 337 | 338 | 339 | # Update the networks 340 | # Update the value network 341 | self.value_optimizer.zero_grad() 342 | J_V_loss.backward(retain_graph=True) 343 | nn.utils.clip_grad_norm_(self.value_net.parameters(),0.5) 344 | self.value_optimizer.step() 345 | 346 | # Update the policy network (Q network) 347 | self.q_net_1_optimizer.zero_grad() 348 | Q1_loss.backward(retain_graph=True) 349 | nn.utils.clip_grad_norm_(self.q_net_1.parameters(),0.5) 350 | self.q_net_1_optimizer.step() 351 | 352 | self.q_net_2_optimizer.zero_grad() 353 | Q2_loss.backward(retain_graph=True) 354 | nn.utils.clip_grad_norm_(self.q_net_2.parameters(),0.5) 355 | self.q_net_2_optimizer.step() 356 | 357 | 358 | # Update the policy network (Policy network) 359 | self.policy_optimizer.zero_grad() 360 | pi_loss.backward(retain_graph = True) 361 | nn.utils.clip_grad_norm_(self.policy_net.parameters(),0.5) 362 | self.policy_optimizer.step() 363 | 364 | 365 | # Update the target networks 366 | for trg_params,src_params in zip(self.target_net.parameters(),self.value_net.parameters()): 367 | trg_params.data.copy_(trg_params.data*self.tau + src_params.data*(1-self.tau)) 368 | 369 | self.ptr += 1 370 | 371 | 372 | def save_model(self,path,epoch,buffer_size): 373 | 374 | os.makedirs(path+str(buffer_size),exist_ok=True) 375 | torch.save(self.policy_net.state_dict(),path+str(buffer_size)+"/policy_net_"+str(epoch)+".pth") 376 | torch.save(self.value_net.state_dict(),path+str(buffer_size)+"/value_net_"+str(epoch)+".pth") 377 | torch.save(self.q_net_1.state_dict(),path+str(buffer_size)+"/q_net_1_"+str(epoch)+".pth") 378 | torch.save(self.q_net_2.state_dict(),path+str(buffer_size)+"/q_net_2_"+str(epoch)+".pth") 379 | 380 | print("Model Saved") 381 | 382 | 383 | def load_model(self,path,epoch,buffer_size): 384 | 385 | self.policy_net.load_state_dict(torch.load(path+str(buffer_size)+"/policy_net_"+str(epoch)+".pth")) 386 | self.value_net.load_state_dict(torch.load(path+str(buffer_size)+"/value_net_"+str(epoch)+".pth")) 387 | self.q_net_1.load_state_dict(torch.load(path+str(buffer_size)+"/q_net_1_"+str(epoch)+".pth")) 388 | self.q_net_2.load_state_dict(torch.load(path+str(buffer_size)+"/q_net_2_"+str(epoch)+".pth")) 389 | 390 | print("Model Loaded") 391 | 392 | 393 | def save_buffer(self,path,epoch,buffer_size): 394 | path = path+str(buffer_size)+'/' 395 | self.policy_net.load_state_dict(torch.load(path+"policy_net_"+str(epoch)+".pth")) 396 | self.value_net.load_state_dict(torch.load(path+"value_net_"+str(epoch)+".pth")) 397 | self.q_net_1.load_state_dict(torch.load(path+"q_net_1_"+str(epoch)+".pth")) 398 | self.q_net_2.load_state_dict(torch.load(path+"q_net_2_"+str(epoch)+".pth")) 399 | print("Model Loaded") -------------------------------------------------------------------------------- /code/train_SAC.py: -------------------------------------------------------------------------------- 1 | """ 2 | 3 | # -*- coding: utf-8 -*- 4 | 5 | 6 | 7 | Created on Mon April 11 2022 8 | 9 | @author: Rahul Karanam 10 | @brief Train a SAC agent for the High Speed Driving on CARLA simulator. 
(I'll be training on map 1, which 11 | can be found in the reference folder.) 12 | 13 | 14 | """ 15 | 16 | # Importing the libraries 17 | import sys 18 | from SoftAC import * 19 | import time 20 | import numpy as np 21 | import random 22 | import pygame 23 | import matplotlib.pyplot as plt 24 | import matplotlib.patches as patches 25 | from agents.navigation.basic_agent import BasicAgent 26 | 27 | 28 | if __name__ == "__main__": 29 | 30 | # Check that pygame initialises correctly 31 | print("Working:[1]") 32 | pygame.init() 33 | print("Working:[2]") 34 | pygame.font.init() 35 | print("Working:[3]") 36 | env = environment(traj_num = 1) # `environment` is the CARLA drifting env wrapper, not yet included in this repo (see the TODO in the README) 37 | -------------------------------------------------------------------------------- /docs/ENPM690_Phase1_report.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/484ab51f262d09653e5fc520f96e35c332be7e70/docs/ENPM690_Phase1_report.pdf -------------------------------------------------------------------------------- /results/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/484ab51f262d09653e5fc520f96e35c332be7e70/results/.DS_Store -------------------------------------------------------------------------------- /results/Output.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/484ab51f262d09653e5fc520f96e35c332be7e70/results/Output.png -------------------------------------------------------------------------------- /results/network.001.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/484ab51f262d09653e5fc520f96e35c332be7e70/results/network.001.jpeg -------------------------------------------------------------------------------- /results/track2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/484ab51f262d09653e5fc520f96e35c332be7e70/results/track2.png -------------------------------------------------------------------------------- /results/train3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/karanamrahul/Autonomous-Drifting-using-deep-Reinforcement-Learning/484ab51f262d09653e5fc520f96e35c332be7e70/results/train3.png --------------------------------------------------------------------------------