├── Arpit_Aggarwal_Report.pdf
├── Arpit_Aggarwal_ppt.pdf
├── LICENSE.md
├── README.md
├── Report.pdf
├── code
    ├── __pycache__
    │   ├── convnet.cpython-37.pyc
    │   ├── replay_buffer.cpython-37.pyc
    │   ├── td3.cpython-37.pyc
    │   └── utils.cpython-37.pyc
    ├── convnet.py
    ├── envs
    │   ├── __init__.py
    │   ├── __pycache__
    │   │   ├── __init__.cpython-37.pyc
    │   │   ├── robot_target_tracking_env_interface.cpython-37.pyc
    │   │   └── robot_target_tracking_env_sensors.cpython-37.pyc
    │   ├── convnet.py
    │   ├── robot_target_tracking_env.py
    │   ├── robot_target_tracking_env_interface.py
    │   └── robot_target_tracking_env_sensors.py
    ├── eval.py
    ├── replay_buffer.py
    ├── td3.py
    ├── train.py
    └── utils.py
├── outputs
    ├── 7.png
    ├── case1_1.jpg
    ├── case1_1.png
    ├── case1_2.jpg
    ├── case1_2.png
    ├── case1_3.png
    ├── case1_4.png
    ├── case2_1.jpg
    ├── case2_1.png
    ├── case2_2.jpg
    ├── case2_2.png
    ├── case2_3.png
    ├── case2_4.png
    ├── case3_1.jpg
    ├── case3_1.png
    ├── case3_2.jpg
    ├── case3_2.png
    ├── case3_3.png
    ├── case3_4.png
    ├── case4_1.jpg
    ├── case4_1.png
    ├── case4_2.jpg
    ├── case4_2.png
    ├── case4_3.png
    ├── case4_4.png
    ├── greedy_reward.png
    ├── qlearning_reward.png
    ├── reward.png
    ├── reward_1.png
    ├── reward_sensors_1_targets_1.png
    ├── reward_sensors_1_targets_2.png
    ├── reward_sensors_1_targets_3.png
    ├── reward_sensors_1_targets_4.png
    ├── sample.png
    └── sample1.png
└── scripts
    ├── __pycache__
        └── utils.cpython-37.pyc
    ├── bh_main.py
    ├── ekf_main.py
    ├── qlearning.py
    └── utils.py


/Arpit_Aggarwal_Report.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/Arpit_Aggarwal_Report.pdf


--------------------------------------------------------------------------------
/Arpit_Aggarwal_ppt.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/Arpit_Aggarwal_ppt.pdf


--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2019 Arpit Aggarwal
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
 6 | 
 7 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
 8 | 
 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
10 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Multi-robot Target Tracking
 2 | 
 3 | [![Packagist](https://img.shields.io/packagist/l/doctrine/orm.svg)](LICENSE.md)
 4 | ---
 5 | 
 6 | 
 7 | ### Authors
 8 | Arpit Aggarwal
 9 | Lifeng Zhou
10 | 
11 | 
12 | ### Introduction to the Project
13 | Robot-target tracking is the task of localizing the targets by using mobile robots. The task is to determine an optimal robot trajectory to minimize the uncertainty in the targets position. First, the task was comparing the Extended Kalman Filter(EKF) and Bayesian Histogram(BH) techniques for creating a heatmap of the environment, where the bright region denoted the total uncertainty in the targets position. Next, the task was using RL technique to determine the optimal action to be taken by the robot to minimize the total uncertainty in the heatmap.
14 | 
15 | 
16 | ### Results
17 | 
18 | Actor-Critic Model Training (robots=1, targets=1)
19 | ![](outputs/reward_sensors_1_targets_1.png) 
20 | 
21 | Actor-Critic Model Training (robots=1, targets=2)
22 | ![](outputs/reward_sensors_1_targets_2.png)
23 | 
24 | Actor-Critic Model Training (robots=1, targets=3)
25 | ![](outputs/reward_sensors_1_targets_3.png)
26 | 
27 | Actor-Critic Model Training (robots=1, targets=4)
28 | ![](outputs/reward_sensors_1_targets_4.png)
29 | 
30 | 
31 | ### Software Required
32 | To run the .py files, use Python 3. Standard Python 3 libraries like OpenAI Gym, PyTorch, Numpy, and matplotlib are used.
33 | 
34 | 
35 | ### Credits
36 | The following links were helpful for this project:
37 | 1. https://github.com/ksengin/active-target-localization/
38 | 2. https://ksengin.github.io/active-target-localization/
39 | 


--------------------------------------------------------------------------------
/Report.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/Report.pdf


--------------------------------------------------------------------------------
/code/__pycache__/convnet.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/code/__pycache__/convnet.cpython-37.pyc


--------------------------------------------------------------------------------
/code/__pycache__/replay_buffer.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/code/__pycache__/replay_buffer.cpython-37.pyc


--------------------------------------------------------------------------------
/code/__pycache__/td3.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/code/__pycache__/td3.cpython-37.pyc


--------------------------------------------------------------------------------
/code/__pycache__/utils.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/code/__pycache__/utils.cpython-37.pyc


--------------------------------------------------------------------------------
/code/convnet.py:
--------------------------------------------------------------------------------
 1 | # reference: https://github.com/ksengin/active-target-localization/blob/master/target_localization/models/convnet.py
 2 | import os
 3 | 
 4 | import torch
 5 | import torch.nn as nn
 6 | import torch.nn.init as init
 7 | import torch.nn.functional as F
 8 | import torchvision
 9 | 
10 | 
11 | def _init_weights(module: nn.Module):
12 |     if type(module) == nn.Linear:
13 |         torch.nn.init.xavier_uniform(module.weight)
14 |         module.bias.data.fill_(0.01)
15 | 
16 | 
17 | class ConvNet(nn.Module):
18 |     """
19 |     The ConvNet class defines a convolutional neural net with a desired output dimension.
20 |     """
21 | 
22 |     def __init__(self, out_dim: int = 128, pretrained=True):
23 |         """
24 |         Create a convolutional network.
25 |         """
26 |         super(ConvNet, self).__init__()
27 | 
28 |         self.act_fn = F.relu
29 |         self.model_ft = torchvision.models.resnet18(pretrained=pretrained)
30 |         self.model_ft.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
31 |         num_ftrs = self.model_ft.fc.out_features
32 | 
33 |         self.model_ft.apply(_init_weights)
34 |         self.fc1 = nn.Linear(num_ftrs, num_ftrs)
35 |         self.fc2 = nn.Linear(num_ftrs, num_ftrs)
36 |         self.fc3 = nn.Linear(num_ftrs, out_dim)
37 | 
38 | 
39 |     def forward(self, x: torch.tensor):
40 |         x = self.act_fn(self.model_ft(x))
41 |         x = self.act_fn(self.fc1(x))
42 |         x = self.act_fn(self.fc2(x))
43 |         fc_out = self.fc3(x)
44 |         return fc_out
45 | 


--------------------------------------------------------------------------------
/code/envs/__init__.py:
--------------------------------------------------------------------------------
1 | from envs.robot_target_tracking_env_interface import RobotTargetTrackingEnvInterface
2 | 


--------------------------------------------------------------------------------
/code/envs/__pycache__/__init__.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/code/envs/__pycache__/__init__.cpython-37.pyc


--------------------------------------------------------------------------------
/code/envs/__pycache__/robot_target_tracking_env_interface.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/code/envs/__pycache__/robot_target_tracking_env_interface.cpython-37.pyc


--------------------------------------------------------------------------------
/code/envs/__pycache__/robot_target_tracking_env_sensors.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/code/envs/__pycache__/robot_target_tracking_env_sensors.cpython-37.pyc


--------------------------------------------------------------------------------
/code/envs/convnet.py:
--------------------------------------------------------------------------------
 1 | # reference: https://github.com/ksengin/active-target-localization/blob/master/target_localization/models/convnet.py
 2 | import os
 3 | 
 4 | import torch
 5 | import torch.nn as nn
 6 | import torch.nn.init as init
 7 | import torch.nn.functional as F
 8 | import torchvision
 9 | 
10 | 
11 | def _init_weights(module: nn.Module):
12 |     if type(module) == nn.Linear:
13 |         torch.nn.init.xavier_uniform(module.weight)
14 |         module.bias.data.fill_(0.01)
15 | 
16 | 
17 | class ConvNet(nn.Module):
18 |     """
19 |     The ConvNet class defines a convolutional neural net with a desired output dimension.
20 |     """
21 | 
22 |     def __init__(self, out_dim: int = 128, pretrained=False):
23 |         """
24 |         Create a convolutional network.
25 |         """
26 |         super(ConvNet, self).__init__()
27 | 
28 |         self.act_fn = F.relu
29 |         self.model_ft = torchvision.models.resnet18(pretrained=pretrained)
30 |         self.model_ft.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
31 |         num_ftrs = self.model_ft.fc.out_features
32 | 
33 |         self.model_ft.apply(_init_weights)
34 |         self.fc1 = nn.Linear(num_ftrs, num_ftrs)
35 |         self.fc2 = nn.Linear(num_ftrs, num_ftrs)
36 |         self.fc3 = nn.Linear(num_ftrs, out_dim)
37 | 
38 | 
39 |     def forward(self, x: torch.tensor):
40 |         x = self.act_fn(self.model_ft(x))
41 |         x = self.act_fn(self.fc1(x))
42 |         x = self.act_fn(self.fc2(x))
43 |         fc_out = self.fc3(x)
44 |         return fc_out
45 | 


--------------------------------------------------------------------------------
/code/envs/robot_target_tracking_env.py:
--------------------------------------------------------------------------------
  1 | """
  2 | OpenAI Gym environment with robots and targets
  3 | 
  4 | Environment source code: https://github.com/ksengin/active-target-localization/blob/master/target_localization/envs/tracking_waypoints_env.py
  5 | Modifications: Modified the existing environment with number of robots > 1 and the targets moving case
  6 | """
  7 | 
  8 | 
  9 | # header files
 10 | import os
 11 | import copy
 12 | import numpy as np
 13 | import torch
 14 | import torch.nn as nn
 15 | import torch.nn.functional as F
 16 | import torchvision
 17 | 
 18 | import gym
 19 | from gym import error, spaces
 20 | from gym.utils import seeding
 21 | import matplotlib.pyplot as plt
 22 | import matplotlib.cm as cm
 23 | from matplotlib.patches import Ellipse
 24 | import matplotlib.transforms as transforms
 25 | from math import pi, cos, sin
 26 | from scipy.stats import multivariate_normal
 27 | from convnet import *
 28 | 
 29 | 
 30 | # OpenAI gym environment class
 31 | class RobotTargetTrackingEnv(gym.GoalEnv):
 32 |     
 33 |     def __init__(self):
 34 |         """
 35 |             Init method for the environment
 36 |         """
 37 |         self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
 38 |         self.seed()
 39 |         self.len_workspace = 20
 40 |         self.workspace = np.array([[0, self.len_workspace], [0, self.len_workspace]])
 41 |         self.sigma_meas = 1.0
 42 |         self.time_step = 1
 43 | 
 44 |         self.action_space = spaces.Box(-np.pi, np.pi, shape=(1,), dtype='float32')
 45 |         self.convnet = ConvNet()
 46 | 
 47 | 
 48 |     def env_parametrization(self, num_targets=4, num_sensors=1, target_motion_omegas=None, meas_model='range'):
 49 |         """ 
 50 |             Function for parametrizing the environment
 51 |         """
 52 |         self.x1_list = []
 53 |         self.y1_list = []
 54 |         self.x2_list = []
 55 |         self.y2_list = []
 56 |         self.x3_list = []
 57 |         self.y3_list = []
 58 |         self.x4_list = []
 59 |         self.y4_list = []
 60 |         self.time_step = 1
 61 |         self.num_targets = num_targets
 62 |         self.true_targets_radii = torch.rand(self.num_targets)*5.0+2.0
 63 |         self.true_targets_pos = (torch.rand(self.num_targets, 2)*self.len_workspace)
 64 |         self.initial_true_targets_pos = self.true_targets_pos.clone()
 65 |         self.estimated_targets_mean = self.true_targets_pos.clone()
 66 |         self.estimated_targets_var = torch.zeros(self.num_targets, 2, 2)
 67 |         for index in range(0, self.num_targets):
 68 |             self.estimated_targets_var[index] = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
 69 |         self.target_motion_omegas = torch.rand(self.num_targets)*100.0
 70 | 
 71 |         self.heatmap = torch.zeros(self.len_workspace, self.len_workspace)
 72 |         for index in range(0, self.num_targets):
 73 |             x = np.linspace(0, self.len_workspace, self.len_workspace)
 74 |             y = np.linspace(0, self.len_workspace, self.len_workspace)
 75 |             X, Y = np.meshgrid(x, y)
 76 |             pos = np.empty(X.shape + (2,))
 77 |             pos[:, :, 0] = X; pos[:, :, 1] = Y
 78 |             rv = multivariate_normal(self.estimated_targets_mean[index], self.estimated_targets_var[index])
 79 |             self.heatmap += rv.pdf(pos)
 80 |         image = F.interpolate(self.heatmap, (256, 256), mode='bilinear')
 81 |         true_obs = self.convnet(image).squeeze()
 82 |         #true_obs = self.heatmap.flatten()
 83 | 
 84 |         self.robot_movement_x = []
 85 |         self.robot_movement_y = []
 86 |         self.num_sensors = num_sensors
 87 |         self.sensors_pos = torch.zeros(self.num_sensors, 2)
 88 |         for index in range(0, self.num_sensors):
 89 |             rand_angle = torch.rand(1)*2*np.pi
 90 |             self.sensors_pos[index] = self.true_targets_pos.mean(0)+(torch.sqrt(torch.rand(1)+0.5)*(self.len_workspace/2)*(torch.tensor([torch.cos(rand_angle), torch.sin(rand_angle)])))
 91 |             if(self.sensors_pos[index, 0] >= self.len_workspace):
 92 |                 self.sensors_pos[index, 0] -= (self.sensors_pos[index, 0]-self.len_workspace+1)
 93 |             if(self.sensors_pos[index, 0] <= 0):
 94 |                 self.sensors_pos[index, 0] = (-self.sensors_pos[index, 0]+1)
 95 |             if(self.sensors_pos[index, 1] >= self.len_workspace):
 96 |                 self.sensors_pos[index, 1] -= (self.sensors_pos[index, 1]-self.len_workspace+1)
 97 |             if(self.sensors_pos[index, 1] <= 0):
 98 |                 self.sensors_pos[index, 1] = (-self.sensors_pos[index, 1]+1)
 99 | 
100 |         self.robot_movement_x.append(float(self.sensors_pos[0, 0]))
101 |         self.robot_movement_y.append(float(self.sensors_pos[0, 1]))
102 |         self.x1_list.append(float(self.true_targets_pos[0, 0]))
103 |         self.y1_list.append(float(self.true_targets_pos[0, 1]))
104 |         self.x2_list.append(float(self.true_targets_pos[1, 0]))
105 |         self.y2_list.append(float(self.true_targets_pos[1, 1]))
106 |         self.x3_list.append(float(self.true_targets_pos[2, 0]))
107 |         self.y3_list.append(float(self.true_targets_pos[2, 1]))
108 |         self.x4_list.append(float(self.true_targets_pos[3, 0]))
109 |         self.y4_list.append(float(self.true_targets_pos[3, 1]))
110 | 
111 |         self.meas_model = meas_model
112 |         if self.meas_model == 'bearing':
113 |             self.sigma_meas = 0.2
114 |             self.normal_dist_1d_torch = lambda x, mu, sgm: 1.0 / (np.sqrt(2 * np.pi*sgm**2)) * torch.exp(-0.5 / sgm**2 * (np.pi-torch.abs(torch.abs(x-mu) - np.pi))**2)
115 |         else:
116 |             self.sigma_meas = 1.0
117 |             self.normal_dist_1d_torch = lambda x, mu, sgm: 1.0 / (np.sqrt(2 * np.pi*sgm**2)) * np.exp(-0.5 / sgm**2 * (np.abs(x - mu)**2))
118 | 
119 |         self.state = torch.cat((self.sensors_pos[0], torch.tensor(true_obs).float()))
120 |         self.observation_space = spaces.Box(-np.inf, np.inf, shape=self.state.shape, dtype='float32')
121 | 
122 |     
123 |     def step(self, action, step_size):
124 |         """ 
125 |             Function to update the environment
126 |         """
127 |         self._set_action(action, step_size)
128 | 
129 |         self.time_step = self.time_step + 1
130 |         self.update_true_targets_pos()
131 |         self.update_estimated_targets_pos()
132 | 
133 |         self.robot_movement_x.append(float(self.sensors_pos[0, 0]))
134 |         self.robot_movement_y.append(float(self.sensors_pos[0, 1]))
135 |         self.x1_list.append(float(self.true_targets_pos[0, 0]))
136 |         self.y1_list.append(float(self.true_targets_pos[0, 1]))
137 |         self.x2_list.append(float(self.true_targets_pos[1, 0]))
138 |         self.y2_list.append(float(self.true_targets_pos[1, 1]))
139 |         self.x3_list.append(float(self.true_targets_pos[2, 0]))
140 |         self.y3_list.append(float(self.true_targets_pos[2, 1]))
141 |         self.x4_list.append(float(self.true_targets_pos[3, 0]))
142 |         self.y4_list.append(float(self.true_targets_pos[3, 1]))
143 | 
144 |         self.heatmap = torch.zeros(self.len_workspace, self.len_workspace)
145 |         for index in range(0, self.num_targets):
146 |             x = np.linspace(0, self.len_workspace, self.len_workspace)
147 |             y = np.linspace(0, self.len_workspace, self.len_workspace)
148 |             X, Y = np.meshgrid(x, y)
149 |             pos = np.empty(X.shape + (2,))
150 |             pos[:, :, 0] = X; pos[:, :, 1] = Y
151 |             rv = multivariate_normal(self.estimated_targets_mean[index], self.estimated_targets_var[index])
152 |             self.heatmap += rv.pdf(pos)
153 |         true_obs = self.heatmap.flatten()
154 | 
155 |         done = False
156 |         reward = None
157 |         reward, done = self.compute_reward()        
158 |         if(self.time_step>1000 or float(self.sensors_pos[0, 0])<=0 or float(self.sensors_pos[0, 1])<=0 or float(self.sensors_pos[0, 0]) >= self.len_workspace or float(self.sensors_pos[0, 1])>=self.len_workspace):
159 |             done = True
160 | 
161 |         self.state = torch.cat((self.sensors_pos[0], torch.tensor(true_obs).float()))
162 |         return self.state, reward, done, None, self.estimated_targets_var
163 | 
164 |     
165 |     def reset(self, **kwargs):
166 |         """ 
167 |             Function to reset the environment
168 |         """
169 |         self.x1_list = []
170 |         self.y1_list = []
171 |         self.x2_list = []
172 |         self.y2_list = []
173 |         self.x3_list = []
174 |         self.y3_list = []
175 |         self.x4_list = []
176 |         self.y4_list = []
177 |         self.time_step = 1
178 |         self.true_targets_radii = torch.rand(self.num_targets)*5.0+2.0
179 |         self.true_targets_pos = (torch.rand(self.num_targets, 2)*self.len_workspace)
180 |         self.initial_true_targets_pos = self.true_targets_pos.clone()
181 |         self.estimated_targets_mean = self.true_targets_pos.clone()
182 |         self.estimated_targets_var = torch.zeros(self.num_targets, 2, 2)
183 |         for index in range(0, self.num_targets):
184 |             self.estimated_targets_var[index] = torch.tensor([[1, 0], [0, 1]])
185 |         self.target_motion_omegas = torch.rand(self.num_targets)*100.0
186 | 
187 |         self.robot_movement_x = []
188 |         self.robot_movement_y = []
189 |         self.sensors_pos = torch.zeros(self.num_sensors, 2)
190 |         for index in range(0, self.num_sensors):
191 |             rand_angle = torch.rand(1)*2*np.pi
192 |             self.sensors_pos[index] = self.true_targets_pos.mean(0)+(torch.sqrt(torch.rand(1)+0.5)*(self.len_workspace/2)*(torch.tensor([torch.cos(rand_angle), torch.sin(rand_angle)])))
193 |             if(self.sensors_pos[index, 0] >= self.len_workspace):
194 |                 self.sensors_pos[index, 0] -= (self.sensors_pos[index, 0] - self.len_workspace + 1)
195 |             if(self.sensors_pos[index, 0] <= 0):
196 |                 self.sensors_pos[index, 0] = (-self.sensors_pos[index, 0] + 1)
197 |             if(self.sensors_pos[index, 1] >= self.len_workspace):
198 |                 self.sensors_pos[index, 1] -= (self.sensors_pos[index, 1] - self.len_workspace + 1)
199 |             if(self.sensors_pos[index, 1] <= 0):
200 |                 self.sensors_pos[index, 1] = (-self.sensors_pos[index, 1] + 1)      
201 | 
202 |         self.robot_movement_x.append(float(self.sensors_pos[0, 0]))
203 |         self.robot_movement_y.append(float(self.sensors_pos[0, 1]))
204 |         self.x1_list.append(float(self.true_targets_pos[0, 0]))
205 |         self.y1_list.append(float(self.true_targets_pos[0, 1]))
206 |         self.x2_list.append(float(self.true_targets_pos[1, 0]))
207 |         self.y2_list.append(float(self.true_targets_pos[1, 1]))
208 |         self.x3_list.append(float(self.true_targets_pos[2, 0]))
209 |         self.y3_list.append(float(self.true_targets_pos[2, 1]))
210 |         self.x4_list.append(float(self.true_targets_pos[3, 0]))
211 |         self.y4_list.append(float(self.true_targets_pos[3, 1]))
212 | 
213 |         self.heatmap = torch.zeros(self.len_workspace, self.len_workspace)
214 |         for index in range(0, self.num_targets):
215 |             x = np.linspace(0, self.len_workspace, self.len_workspace)
216 |             y = np.linspace(0, self.len_workspace, self.len_workspace)
217 |             X, Y = np.meshgrid(x, y)
218 |             pos = np.empty(X.shape + (2,))
219 |             pos[:, :, 0] = X; pos[:, :, 1] = Y
220 |             rv = multivariate_normal(self.estimated_targets_mean[index], self.estimated_targets_var[index])
221 |             self.heatmap += rv.pdf(pos)
222 |         true_obs = self.heatmap.flatten()
223 | 
224 |         self.state = torch.cat((self.sensors_pos[0], torch.tensor(true_obs).float()))
225 |         self.observation_space = spaces.Box(-np.inf, np.inf, shape=self.state.shape, dtype='float32')
226 |         return self.state, self.sensors_pos, self.estimated_targets_mean, self.true_targets_radii, self.target_motion_omegas
227 | 
228 |     
229 |     def close(self):
230 |         """ 
231 |             Function to close the environment
232 |         """
233 |         pass
234 | 
235 |             
236 |     def render(self):
237 |         """ 
238 |             Function for rendering the environment
239 |         """
240 |         heatmap = torch.zeros(256, 256)
241 |         for index in range(0, self.num_targets):
242 |             x = np.linspace(0, self.len_workspace, 256)
243 |             y = np.linspace(0, self.len_workspace, 256)
244 |             X, Y = np.meshgrid(x, y)
245 |             pos = np.empty(X.shape + (2,))
246 |             pos[:, :, 0] = X; pos[:, :, 1] = Y
247 |             rv = multivariate_normal(self.estimated_targets_mean[index], self.estimated_targets_var[index])
248 |             heatmap += rv.pdf(pos)
249 |         x = np.linspace(0, self.len_workspace, 256)
250 |         y = np.linspace(0, self.len_workspace, 256)
251 |         X, Y = np.meshgrid(x, y)
252 |         plt.cla()
253 |         plt.title("Time step = " + str(self.time_step))
254 |         plt.xlabel("x")
255 |         plt.ylabel("y")
256 |         plt.xlim(0, self.len_workspace)
257 |         plt.ylim(0, self.len_workspace)
258 |         plt.contourf(X, Y, heatmap, cmap=cm.inferno)
259 |         plt.plot(self.x1_list, self.y1_list, 'b--')
260 |         plt.plot(self.x1_list[len(self.x1_list)-1], self.y1_list[len(self.y1_list)-1], 'o', c='b', marker='*')
261 |         plt.plot(self.x2_list, self.y2_list, 'b--')
262 |         plt.plot(self.x2_list[len(self.x2_list)-1], self.y2_list[len(self.y2_list)-1], 'o', c='b', marker='*')
263 |         plt.plot(self.x3_list, self.y3_list, 'b--')
264 |         plt.plot(self.x3_list[len(self.x3_list)-1], self.y3_list[len(self.y3_list)-1], 'o', c='b', marker='*')
265 |         plt.plot(self.x4_list, self.y4_list, 'b--')
266 |         plt.plot(self.x4_list[len(self.x4_list)-1], self.y4_list[len(self.y4_list)-1], 'o', c='b', marker='*')
267 |         if(len(self.robot_movement_x)<8):
268 |             plt.plot(self.robot_movement_x, self.robot_movement_y, 'r--')
269 |         else:
270 |             plt.plot(self.robot_movement_x[-8:], self.robot_movement_y[-8:], 'r--')
271 |         plt.scatter(self.robot_movement_x[len(self.robot_movement_x)-1], self.robot_movement_y[len(self.robot_movement_y)-1], color='r', marker='D')
272 |         plt.savefig("/home/arpitdec5/Desktop/robot_target_tracking/s2/"+str(self.time_step)+".png")
273 |         #plt.show()
274 | 
275 | 
276 |     # reference: https://matplotlib.org/3.1.0/gallery/statistics/confidence_ellipse.html
277 |     def get_correlated_dataset(self, n, dependency, mu, scale):
278 |         latent = np.random.randn(n, 2)
279 |         dependent = latent.dot(dependency)
280 |         scaled = dependent * scale
281 |         scaled_with_offset = scaled + mu
282 |         return (scaled_with_offset[:, 0], scaled_with_offset[:, 1])
283 | 
284 | 
285 |     def compute_reward(self):
286 |         """ 
287 |             Function for computing the reward
288 |         """
289 |         val = 0.0
290 |         for index in range(0, self.num_targets):
291 |             val += np.log(np.linalg.det(self.estimated_targets_var[index]))
292 |         return -val, False
293 | 
294 | 
295 |     # reference: https://matplotlib.org/3.1.0/gallery/statistics/confidence_ellipse.html
296 |     def confidence_ellipse(self, x, y, ax, n_std=3.0, facecolor='none', **kwargs):
297 |         """
298 |            Inputs:
299 |            x: the x-coordinate datapoints
300 |            y: the y-coordinate datapoints
301 |         """
302 |         if x.size != y.size:
303 |             raise ValueError("x and y must be the same size")
304 | 
305 |         cov = np.cov(x, y)
306 |         pearson = cov[0, 1]/np.sqrt(cov[0, 0] * cov[1, 1])
307 |     
308 |         ell_radius_x = np.sqrt(1 + pearson)
309 |         ell_radius_y = np.sqrt(1 - pearson)
310 |         ellipse = Ellipse((0, 0), width=ell_radius_x * 2, height=ell_radius_y * 2, facecolor=facecolor, **kwargs)
311 |         scale_x = np.sqrt(cov[0, 0]) * n_std
312 |         mean_x = np.mean(x)
313 | 
314 |         scale_y = np.sqrt(cov[1, 1]) * n_std
315 |         mean_y = np.mean(y)
316 | 
317 |         transf = transforms.Affine2D().rotate_deg(45).scale(scale_x, scale_y).translate(mean_x, mean_y)
318 |         ellipse.set_transform(transf + ax.transData)
319 |         return ax.add_patch(ellipse)  
320 | 
321 | 
322 |     def _reset_sim(self, **kwargs):
323 |         raise NotImplementedError()
324 | 
325 | 
326 |     def _get_obs(self):
327 |         """ 
328 |             Get observation function
329 |             Returns the noisy relative measurement depending on the measurement model (bearing or range)
330 |         """
331 |         targets_pos = self.true_targets_pos
332 |         sensor_pos = self.sensors_pos[0]
333 | 
334 |         if self.meas_model == 'bearing':
335 |             true_measurement = torch.atan2(targets_pos[:, 1] - sensor_pos[1], targets_pos[:, 0] - sensor_pos[0])
336 |         elif self.meas_model == 'range':
337 |             true_measurement = torch.norm(targets_pos - sensor_pos, p=2, dim=1)
338 | 
339 |         true_obs = true_measurement + (self.sigma_meas * torch.randn(targets_pos.shape[0]))
340 |         return true_obs
341 | 
342 | 
343 |     def get_estimated_obs(self):
344 |         """ 
345 |             Get observation function
346 |             Returns the noisy relative measurement depending on the measurement model (bearing or range)
347 |         """
348 |         targets_pos = self.estimated_targets_mean
349 |         sensor_pos = self.sensors_pos[0]
350 | 
351 |         if self.meas_model == 'bearing':
352 |             true_measurement = torch.atan2(targets_pos[:, 1] - sensor_pos[1], targets_pos[:, 0] - sensor_pos[0])
353 |         elif self.meas_model == 'range':
354 |             true_measurement = torch.norm(targets_pos - sensor_pos, p=2, dim=1)
355 |         return true_measurement
356 | 
357 | 
358 |     def update_true_targets_pos(self):
359 |         """
360 |             Function to update the true target positions when time step increases (assuming circular target motions)
361 |         """
362 |         for index in range(0, self.num_targets):
363 |             self.true_targets_pos[index] = torch.tensor([self.true_targets_radii[index]*np.cos((self.time_step-1)/float(self.target_motion_omegas[index]))+float(self.initial_true_targets_pos[index, 0])-self.true_targets_radii[index], self.true_targets_radii[index]*np.sin((self.time_step-1)/float(self.target_motion_omegas[index]))+float(self.initial_true_targets_pos[index, 1])])
364 |         
365 | 
366 |     def update_estimated_targets_pos(self):
367 |         """
368 |             Function to update the estimated target positions when time step increases (using ekf)
369 |         """
370 |         true_obs = self._get_obs()
371 |         for index in range(0, self.num_targets):
372 |             z_true = np.zeros((1, 1))
373 |             z_true[0, 0] = true_obs[index]
374 |             q_matrix = 0.2 * np.eye(2)
375 |             x_matrix = np.array([[float(self.estimated_targets_mean[index, 0])], [float(self.estimated_targets_mean[index, 1])]])
376 |             sigma_matrix = self.estimated_targets_var[index].numpy() + q_matrix
377 | 
378 |             z_pred = np.zeros((1, 1))
379 |             h_matrix = np.zeros((1, 2))
380 |             for index1 in range(0, 1):
381 |                 z_pred[index1][0] = np.linalg.norm([[x_matrix[0, 0] - float(self.sensors_pos[0, 0])], [x_matrix[1, 0] - float(self.sensors_pos[0, 1])]], 2)
382 |                 h_matrix[index1, 0] = (-1.0 / z_pred[index1]) * (float(self.sensors_pos[0, 0]) - x_matrix[0, 0])
383 |                 h_matrix[index1, 1] = (-1.0 / z_pred[index1]) * (float(self.sensors_pos[0, 1]) - x_matrix[1, 0])
384 |             
385 |             res = (z_true - z_pred)
386 |             r_matrix = 1.0 * 1.0 * np.eye(1)
387 |             s_matrix = np.matmul(np.matmul(h_matrix, sigma_matrix), h_matrix.T) + r_matrix
388 |             k_matrix = np.matmul(np.matmul(sigma_matrix, h_matrix.T), np.linalg.inv(s_matrix))
389 | 
390 |             x_matrix_tplus1 = x_matrix + (np.matmul(k_matrix, res))
391 |             sigma_matrix_tplus1 = np.matmul(np.matmul((np.eye(2) - np.matmul(k_matrix, h_matrix)), sigma_matrix), (np.eye(2) - np.matmul(k_matrix, h_matrix)).T) + np.matmul(np.matmul(k_matrix, r_matrix), k_matrix.T)
392 |             target_xhat_tplus1 = x_matrix_tplus1[0, 0]
393 |             target_yhat_tplus1 = x_matrix_tplus1[1, 0]
394 |             
395 |             self.estimated_targets_mean[index, 0] = target_xhat_tplus1
396 |             self.estimated_targets_mean[index, 1] = target_yhat_tplus1
397 |             self.estimated_targets_var[index] = torch.tensor(sigma_matrix_tplus1)
398 | 
399 | 
400 |     def _get_true_target_position(self):
401 |         """
402 |             Function to return the true target positions.
403 |         """
404 |         return self.true_targets_pos
405 | 
406 | 
407 |     def _get_estimated_target_position(self):
408 |         """
409 |             Function to return the estimated target positions.
410 |         """
411 |         return self.estimated_targets_pos
412 | 
413 |     
414 |     def get_posterior_map(self):
415 |         return self.posterior_map
416 | 
417 |     
418 |     def _set_action(self, action, step_size):
419 |         """
420 |             Applies the given action to the sensor.
421 |         """
422 |         action = torch.tensor(action).float()
423 |         vector = step_size * torch.tensor([torch.cos(action), torch.sin(action)])
424 |         sensor_pos = self.sensors_pos[0] + vector
425 |         if(sensor_pos[0] >= self.len_workspace):
426 |             sensor_pos[0] = self.sensors_pos[0, 0]
427 |         if(sensor_pos[1] >= self.len_workspace):
428 |             sensor_pos[1] = self.sensors_pos[0, 1]
429 |         self.sensors_pos[0] = sensor_pos   
430 | 
431 |         
432 |     def seed(self, seed=None):
433 |         """
434 |             Function that returns a random seed using OpenAI Gym seeding.
435 |         """
436 |         _, seed = seeding.np_random(seed)
437 |         return [seed]
438 |         
439 |         
440 |     def _is_success(self, achieved_goal, desired_goal):
441 |         raise NotImplementedError()
442 | 
443 |         
444 |     def _sample_goal(self):
445 |         return torch.rand(2)
446 | 
447 |     
448 |     def _env_setup(self, initial_qpos):
449 |         pass
450 | 
451 |     
452 |     def _viewer_setup(self):
453 |         pass
454 | 
455 |     
456 |     def _render_callback(self):
457 |         pass
458 | 
459 |     
460 |     def _step_callback(self):
461 |         pass
462 | 


--------------------------------------------------------------------------------
/code/envs/robot_target_tracking_env_interface.py:
--------------------------------------------------------------------------------
 1 | # header files needed
 2 | from gym import utils
 3 | from envs.robot_target_tracking_env_sensors import RobotTargetTrackingEnv
 4 | import numpy as np
 5 | import torch
 6 | 
 7 | 
 8 | class RobotTargetTrackingEnvInterface(RobotTargetTrackingEnv, utils.EzPickle):
 9 |     def __init__(self):
10 |         RobotTargetTrackingEnv.__init__(self)
11 |         utils.EzPickle.__init__(self)
12 | 


--------------------------------------------------------------------------------
/code/envs/robot_target_tracking_env_sensors.py:
--------------------------------------------------------------------------------
  1 | """
  2 | OpenAI Gym environment with robots and targets
  3 | 
  4 | Environment source code: https://github.com/ksengin/active-target-localization/blob/master/target_localization/envs/tracking_waypoints_env.py
  5 | Modifications: Modified the existing environment with number of robots > 1 and the targets moving case
  6 | """
  7 | 
  8 | 
  9 | # header files
 10 | import os
 11 | import copy
 12 | import numpy as np
 13 | import time
 14 | import torch
 15 | import torch.nn as nn
 16 | import torch.nn.functional as F
 17 | import torchvision
 18 | 
 19 | import gym
 20 | from gym import error, spaces
 21 | from gym.utils import seeding
 22 | import matplotlib.pyplot as plt
 23 | import matplotlib.cm as cm
 24 | from matplotlib.patches import Ellipse
 25 | import matplotlib.transforms as transforms
 26 | from math import pi, cos, sin
 27 | from scipy.stats import multivariate_normal
 28 | from convnet import *
 29 | 
 30 | 
 31 | # OpenAI gym environment class
 32 | class RobotTargetTrackingEnv(gym.GoalEnv):
 33 |     
 34 |     def __init__(self):
 35 |         """
 36 |             Init method for the environment
 37 |         """
 38 |         self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
 39 |         self.seed()
 40 |         self.len_workspace = 20
 41 |         self.workspace = np.array([[0, self.len_workspace], [0, self.len_workspace]])
 42 |         self.sigma_meas = 1.0
 43 |         self.time_step = 1
 44 | 
 45 |         self.action_space = spaces.Box(-np.pi, np.pi, shape=(1,), dtype='float32')
 46 |         self.convnet = ConvNet()
 47 | 
 48 | 
 49 |     def env_parametrization(self, num_targets=2, num_sensors=2, target_motion_omegas=None, meas_model='range', include_cnn=True):
 50 |         """  
 51 |             Function for parametrizing the environment
 52 |         """
 53 |         self.include_cnn = include_cnn
 54 |         self.x1_list = []
 55 |         self.y1_list = []
 56 |         self.x2_list = []
 57 |         self.y2_list = []
 58 |         self.x3_list = []
 59 |         self.y3_list = []
 60 |         self.x4_list = []
 61 |         self.y4_list = []
 62 |         self.time_step = 1
 63 |         self.num_targets = num_targets
 64 |         self.true_targets_radii = torch.rand(self.num_targets)*5.0+2.0
 65 |         self.true_targets_pos = (torch.rand(self.num_targets, 2)*self.len_workspace)
 66 |         self.initial_true_targets_pos = self.true_targets_pos.clone()
 67 |         self.estimated_targets_mean = self.true_targets_pos.clone()
 68 |         self.estimated_targets_var = torch.zeros(self.num_targets, 2, 2)
 69 |         for index in range(0, self.num_targets):
 70 |             self.estimated_targets_var[index] = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
 71 |         self.target_motion_omegas = torch.rand(self.num_targets)*100.0
 72 | 
 73 |         self.heatmap = torch.zeros(self.len_workspace, self.len_workspace)
 74 |         for index in range(0, self.num_targets):
 75 |             x = np.linspace(0, self.len_workspace, self.len_workspace)
 76 |             y = np.linspace(0, self.len_workspace, self.len_workspace)
 77 |             X, Y = np.meshgrid(x, y)
 78 |             pos = np.empty(X.shape + (2,))
 79 |             pos[:, :, 0] = X; pos[:, :, 1] = Y
 80 |             rv = multivariate_normal(self.estimated_targets_mean[index], self.estimated_targets_var[index])
 81 |             self.heatmap += rv.pdf(pos)
 82 |         if include_cnn:
 83 |             image = F.interpolate(self.heatmap.unsqueeze(0).unsqueeze(0), (256, 256), mode='bilinear', align_corners=True)
 84 |             image = image.float()
 85 |             true_obs = self.convnet(image).squeeze()
 86 |         else:  
 87 |             true_obs = self.heatmap.flatten()
 88 | 
 89 |         self.robot_movement_x_1 = []
 90 |         self.robot_movement_y_1 = []
 91 |         self.robot_movement_x_2 = []
 92 |         self.robot_movement_y_2 = []
 93 |         self.robot_movement_x_3 = []
 94 |         self.robot_movement_y_3 = []
 95 |         self.robot_movement_x_4 = []
 96 |         self.robot_movement_y_4 = []
 97 |         self.num_sensors = num_sensors
 98 |         self.sensors_pos = torch.zeros(self.num_sensors, 2)
 99 |         for index in range(0, self.num_sensors):
100 |             rand_angle = torch.rand(1)*2*np.pi
101 |             self.sensors_pos[index] = self.true_targets_pos.mean(0) + (torch.sqrt(torch.rand(1)+0.5)*(self.len_workspace/2)*(torch.tensor([torch.cos(rand_angle), torch.sin(rand_angle)])))
102 |             if(self.sensors_pos[index, 0]>=self.len_workspace):
103 |                 self.sensors_pos[index, 0] -= (self.sensors_pos[index, 0]-self.len_workspace+1)
104 |             if(self.sensors_pos[index, 0]<=0):
105 |                 self.sensors_pos[index, 0] = (-self.sensors_pos[index, 0]+1)
106 |             if(self.sensors_pos[index, 1]>=self.len_workspace):
107 |                 self.sensors_pos[index, 1] -= (self.sensors_pos[index, 1]-self.len_workspace+1)
108 |             if(self.sensors_pos[index, 1]<=0):
109 |                 self.sensors_pos[index, 1] = (-self.sensors_pos[index, 1]+1)
110 | 
111 |         self.robot_movement_x_1.append(float(self.sensors_pos[0, 0]))
112 |         self.robot_movement_y_1.append(float(self.sensors_pos[0, 1]))
113 |         self.robot_movement_x_2.append(float(self.sensors_pos[1, 0]))
114 |         self.robot_movement_y_2.append(float(self.sensors_pos[1, 1]))
115 |         #self.robot_movement_x_3.append(float(self.sensors_pos[2, 0]))
116 |         #self.robot_movement_y_3.append(float(self.sensors_pos[2, 1]))
117 |         #self.robot_movement_x_4.append(float(self.sensors_pos[3, 0]))
118 |         #self.robot_movement_y_4.append(float(self.sensors_pos[3, 1]))
119 |         self.x1_list.append(float(self.true_targets_pos[0, 0]))
120 |         self.y1_list.append(float(self.true_targets_pos[0, 1]))
121 |         self.x2_list.append(float(self.true_targets_pos[1, 0]))
122 |         self.y2_list.append(float(self.true_targets_pos[1, 1]))
123 |         #self.x3_list.append(float(self.true_targets_pos[2, 0]))
124 |         #self.y3_list.append(float(self.true_targets_pos[2, 1]))
125 |         #self.x4_list.append(float(self.true_targets_pos[3, 0]))
126 |         #self.y4_list.append(float(self.true_targets_pos[3, 1]))
127 | 
128 |         self.meas_model = meas_model
129 |         if self.meas_model == 'bearing':
130 |             self.sigma_meas = 0.2
131 |             self.normal_dist_1d_torch = lambda x, mu, sgm: 1.0/(np.sqrt(2*np.pi*sgm**2))*torch.exp(-0.5/sgm**2*(np.pi-torch.abs(torch.abs(x-mu) - np.pi))**2)
132 |         else:
133 |             self.sigma_meas = 1.0
134 |             self.normal_dist_1d_torch = lambda x, mu, sgm: 1.0/(np.sqrt(2*np.pi*sgm**2))*np.exp(-0.5/sgm**2*(np.abs(x-mu)**2))
135 |         self.state = torch.cat((self.sensors_pos[0], self.sensors_pos[1], torch.tensor(true_obs).float()))
136 |         self.observation_space = spaces.Box(-np.inf, np.inf, shape=self.state.shape, dtype='float32')
137 | 
138 |     
139 |     def step(self, action, step_size):
140 |         """ 
141 |             Function to update the environment
142 |         """
143 |         start = time.time()
144 |         self._set_action(action, step_size)
145 | 
146 |         self.time_step = self.time_step + 1
147 |         self.update_true_targets_pos()
148 |         self.update_estimated_targets_pos()
149 | 
150 |         self.robot_movement_x_1.append(float(self.sensors_pos[0, 0]))
151 |         self.robot_movement_y_1.append(float(self.sensors_pos[0, 1]))
152 |         self.robot_movement_x_2.append(float(self.sensors_pos[1, 0]))
153 |         self.robot_movement_y_2.append(float(self.sensors_pos[1, 1]))
154 |         #self.robot_movement_x_3.append(float(self.sensors_pos[2, 0]))
155 |         #self.robot_movement_y_3.append(float(self.sensors_pos[2, 1]))
156 |         #self.robot_movement_x_4.append(float(self.sensors_pos[3, 0]))
157 |         #self.robot_movement_y_4.append(float(self.sensors_pos[3, 1]))
158 |         self.x1_list.append(float(self.true_targets_pos[0, 0]))
159 |         self.y1_list.append(float(self.true_targets_pos[0, 1]))
160 |         self.x2_list.append(float(self.true_targets_pos[1, 0]))
161 |         self.y2_list.append(float(self.true_targets_pos[1, 1]))
162 |         #self.x3_list.append(float(self.true_targets_pos[2, 0]))
163 |         #self.y3_list.append(float(self.true_targets_pos[2, 1]))
164 |         #self.x4_list.append(float(self.true_targets_pos[3, 0]))
165 |         #self.y4_list.append(float(self.true_targets_pos[3, 1]))
166 | 
167 |         self.heatmap = torch.zeros(self.len_workspace, self.len_workspace)
168 |         for index in range(0, self.num_targets):
169 |             x = np.linspace(0, self.len_workspace, self.len_workspace)
170 |             y = np.linspace(0, self.len_workspace, self.len_workspace)
171 |             X, Y = np.meshgrid(x, y)
172 |             pos = np.empty(X.shape + (2,))
173 |             pos[:, :, 0] = X; pos[:, :, 1] = Y
174 |             rv = multivariate_normal(self.estimated_targets_mean[index], self.estimated_targets_var[index])
175 |             self.heatmap += rv.pdf(pos)
176 |         if self.include_cnn:
177 |             image = F.interpolate(self.heatmap.unsqueeze(0).unsqueeze(0), (256, 256), mode='bilinear', align_corners=True)
178 |             image = image.float()
179 |             true_obs = self.convnet(image).squeeze()
180 |         else:
181 |             true_obs = self.heatmap.flatten()
182 | 
183 |         done = False
184 |         reward = None
185 |         reward, done = self.compute_reward()
186 |         for index in range(0, self.num_sensors):
187 |             if(self.time_step>200 or float(self.sensors_pos[index, 0]) <= 0 or float(self.sensors_pos[index, 1]) <= 0 or float(self.sensors_pos[index, 0]) >= self.len_workspace or float(self.sensors_pos[index, 1]) >= self.len_workspace):
188 |                 done = True
189 | 
190 |         self.state = torch.cat((self.sensors_pos[0], self.sensors_pos[1], torch.tensor(true_obs).float()))
191 |         end = time.time() - start
192 |         return self.state, reward, done, None, self.estimated_targets_var
193 | 
194 |     
195 |     def reset(self, **kwargs):
196 |         """ 
197 |             Function to reset the environment
198 |         """
199 |         self.x1_list = []
200 |         self.y1_list = []
201 |         self.x2_list = []
202 |         self.y2_list = []
203 |         self.x3_list = []
204 |         self.y3_list = []
205 |         self.x4_list = []
206 |         self.y4_list = []
207 |         self.time_step = 1
208 |         self.true_targets_radii = torch.rand(self.num_targets)*5.0+2.0
209 |         self.true_targets_pos = (torch.rand(self.num_targets, 2)*self.len_workspace)
210 |         self.initial_true_targets_pos = self.true_targets_pos.clone()
211 |         self.estimated_targets_mean = self.true_targets_pos.clone()
212 |         self.estimated_targets_var = torch.zeros(self.num_targets, 2, 2)
213 |         for index in range(0, self.num_targets):
214 |             self.estimated_targets_var[index] = torch.tensor([[1, 0], [0, 1]])
215 |         self.target_motion_omegas = torch.rand(self.num_targets)*100.0
216 | 
217 |         self.robot_movement_x_1 = []
218 |         self.robot_movement_y_1 = []
219 |         self.robot_movement_x_2 = []
220 |         self.robot_movement_y_2 = []
221 |         self.robot_movement_x_3 = []
222 |         self.robot_movement_y_3 = []
223 |         self.robot_movement_x_4 = []
224 |         self.robot_movement_y_4 = []
225 |         self.sensors_pos = torch.zeros(self.num_sensors, 2)
226 |         for index in range(0, self.num_sensors):
227 |             rand_angle = torch.rand(1)*2*np.pi
228 |             self.sensors_pos[index] = self.true_targets_pos.mean(0)+(torch.sqrt(torch.rand(1)+0.5)*(self.len_workspace/2)*(torch.tensor([torch.cos(rand_angle), torch.sin(rand_angle)])))
229 |             if(self.sensors_pos[index, 0]>=self.len_workspace):
230 |                 self.sensors_pos[index, 0] -= (self.sensors_pos[index, 0]-self.len_workspace+1)
231 |             if(self.sensors_pos[index, 0]<=0):
232 |                 self.sensors_pos[index, 0] = (-self.sensors_pos[index, 0]+1)
233 |             if(self.sensors_pos[index, 1]>=self.len_workspace):
234 |                 self.sensors_pos[index, 1] -= (self.sensors_pos[index, 1]-self.len_workspace+1)
235 |             if(self.sensors_pos[index, 1]<=0):
236 |                 self.sensors_pos[index, 1] = (-self.sensors_pos[index, 1]+1)  
237 | 
238 |         self.robot_movement_x_1.append(float(self.sensors_pos[0, 0]))
239 |         self.robot_movement_y_1.append(float(self.sensors_pos[0, 1]))
240 |         self.robot_movement_x_2.append(float(self.sensors_pos[1, 0]))
241 |         self.robot_movement_y_2.append(float(self.sensors_pos[1, 1]))
242 |         #self.robot_movement_x_3.append(float(self.sensors_pos[2, 0]))
243 |         #self.robot_movement_y_3.append(float(self.sensors_pos[2, 1]))
244 |         #self.robot_movement_x_4.append(float(self.sensors_pos[3, 0]))
245 |         #self.robot_movement_y_4.append(float(self.sensors_pos[3, 1]))
246 |         self.x1_list.append(float(self.true_targets_pos[0, 0]))
247 |         self.y1_list.append(float(self.true_targets_pos[0, 1]))
248 |         self.x2_list.append(float(self.true_targets_pos[1, 0]))
249 |         self.y2_list.append(float(self.true_targets_pos[1, 1]))
250 |         #self.x3_list.append(float(self.true_targets_pos[2, 0]))
251 |         #self.y3_list.append(float(self.true_targets_pos[2, 1]))
252 |         #self.x4_list.append(float(self.true_targets_pos[3, 0]))
253 |         #self.y4_list.append(float(self.true_targets_pos[3, 1]))
254 | 
255 |         self.heatmap = torch.zeros(self.len_workspace, self.len_workspace)
256 |         for index in range(0, self.num_targets):
257 |             x = np.linspace(0, self.len_workspace, self.len_workspace)
258 |             y = np.linspace(0, self.len_workspace, self.len_workspace)
259 |             X, Y = np.meshgrid(x, y)
260 |             pos = np.empty(X.shape + (2,))
261 |             pos[:, :, 0] = X; pos[:, :, 1] = Y
262 |             rv = multivariate_normal(self.estimated_targets_mean[index], self.estimated_targets_var[index])
263 |             self.heatmap += rv.pdf(pos)
264 |         if self.include_cnn:
265 |             image = F.interpolate(self.heatmap.unsqueeze(0).unsqueeze(0), (256, 256), mode='bilinear', align_corners=True)
266 |             image = image.float()
267 |             true_obs = self.convnet(image).squeeze()
268 |         else:
269 |             true_obs = self.heatmap.flatten()
270 | 
271 |         self.state = torch.cat((self.sensors_pos[0], self.sensors_pos[1], torch.tensor(true_obs).float()))
272 |         self.observation_space = spaces.Box(-np.inf, np.inf, shape=self.state.shape, dtype='float32')
273 |         return self.state, self.sensors_pos, self.estimated_targets_mean, self.true_targets_radii, self.target_motion_omegas
274 | 
275 | 
276 |     def set_env(self, sensors_pos, estimated_targets_mean, true_targets_radii, target_motion_omegas):
277 |         self.x1_list = []
278 |         self.y1_list = []
279 |         self.x2_list = []
280 |         self.y2_list = []
281 |         self.x3_list = []
282 |         self.y3_list = []
283 |         self.x4_list = []
284 |         self.y4_list = []
285 |         self.time_step = 1
286 |         self.true_targets_radii = true_targets_radii
287 |         self.true_targets_pos = estimated_targets_mean
288 |         self.initial_true_targets_pos = self.true_targets_pos.clone()
289 |         self.estimated_targets_mean = self.true_targets_pos.clone()
290 |         self.estimated_targets_var = torch.zeros(self.num_targets, 2, 2)
291 |         for index in range(0, self.num_targets):
292 |             self.estimated_targets_var[index] = torch.tensor([[1, 0], [0, 1]])
293 |         self.target_motion_omegas = target_motion_omegas
294 |         
295 |         self.robot_movement_x_1 = []
296 |         self.robot_movement_y_1 = []
297 |         self.robot_movement_x_2 = []
298 |         self.robot_movement_y_2 = []
299 |         self.robot_movement_x_3 = []
300 |         self.robot_movement_y_3 = []
301 |         self.robot_movement_x_4 = []
302 |         self.robot_movement_y_4 = []
303 |         self.sensors_pos = sensors_pos
304 | 
305 |         self.robot_movement_x_1.append(float(self.sensors_pos[0, 0]))
306 |         self.robot_movement_y_1.append(float(self.sensors_pos[0, 1]))
307 |         self.robot_movement_x_2.append(float(self.sensors_pos[1, 0]))
308 |         self.robot_movement_y_2.append(float(self.sensors_pos[1, 1]))
309 |         #self.robot_movement_x_3.append(float(self.sensors_pos[2, 0]))
310 |         #self.robot_movement_y_3.append(float(self.sensors_pos[2, 1]))
311 |         #self.robot_movement_x_4.append(float(self.sensors_pos[3, 0]))
312 |         #self.robot_movement_y_4.append(float(self.sensors_pos[3, 1]))
313 |         self.x1_list.append(float(self.true_targets_pos[0, 0]))
314 |         self.y1_list.append(float(self.true_targets_pos[0, 1]))
315 |         self.x2_list.append(float(self.true_targets_pos[1, 0]))
316 |         self.y2_list.append(float(self.true_targets_pos[1, 1]))
317 |         #self.x3_list.append(float(self.true_targets_pos[2, 0]))
318 |         #self.y3_list.append(float(self.true_targets_pos[2, 1]))
319 |         #self.x4_list.append(float(self.true_targets_pos[3, 0]))
320 |         #self.y4_list.append(float(self.true_targets_pos[3, 1]))
321 | 
322 |         self.heatmap = torch.zeros(self.len_workspace, self.len_workspace)
323 |         for index in range(0, self.num_targets):
324 |             x = np.linspace(0, self.len_workspace, self.len_workspace)
325 |             y = np.linspace(0, self.len_workspace, self.len_workspace)
326 |             X, Y = np.meshgrid(x, y)
327 |             pos = np.empty(X.shape + (2,))
328 |             pos[:, :, 0] = X; pos[:, :, 1] = Y
329 |             rv = multivariate_normal(self.estimated_targets_mean[index], self.estimated_targets_var[index])
330 |             self.heatmap += rv.pdf(pos)
331 |         if self.include_cnn:
332 |             image = F.interpolate(self.heatmap.unsqueeze(0).unsqueeze(0), (256, 256), mode='bilinear', align_corners=True)
333 |             image = image.float()
334 |             true_obs = self.convnet(image).squeeze()
335 |         else:
336 |             true_obs = self.heatmap.flatten()
337 | 
338 |         self.state = torch.cat((self.sensors_pos[0], self.sensors_pos[1], torch.tensor(true_obs).float()))
339 |         self.observation_space = spaces.Box(-np.inf, np.inf, shape=self.state.shape, dtype='float32')
340 |         return self.state, self.sensors_pos, self.estimated_targets_mean, self.true_targets_radii, self.target_motion_omegas
341 | 
342 |     
343 |     def close(self):
344 |         """ 
345 |             Function to close the environment
346 |         """
347 |         pass
348 | 
349 |             
350 |     def render(self):
351 |         """ 
352 |             Function for rendering the environment
353 |         """
354 |         heatmap = torch.zeros(256, 256)
355 |         for index in range(0, self.num_targets):
356 |             x = np.linspace(0, self.len_workspace, 256)
357 |             y = np.linspace(0, self.len_workspace, 256)
358 |             X, Y = np.meshgrid(x, y)
359 |             pos = np.empty(X.shape + (2,))
360 |             pos[:, :, 0] = X; pos[:, :, 1] = Y
361 |             rv = multivariate_normal(self.estimated_targets_mean[index], self.estimated_targets_var[index])
362 |             heatmap += rv.pdf(pos)
363 |         x = np.linspace(0, self.len_workspace, 256)
364 |         y = np.linspace(0, self.len_workspace, 256)
365 |         X, Y = np.meshgrid(x, y)
366 |         plt.cla()
367 |         plt.title("Time step = " + str(self.time_step))
368 |         plt.xlabel("x")
369 |         plt.ylabel("y")
370 |         plt.xlim(0, self.len_workspace)
371 |         plt.ylim(0, self.len_workspace)
372 |         plt.contourf(X, Y, heatmap, cmap=cm.inferno)
373 |         plt.plot(self.x1_list, self.y1_list, 'b--')
374 |         plt.plot(self.x1_list[len(self.x1_list)-1], self.y1_list[len(self.y1_list)-1], 'o', c='b', marker='*')
375 |         plt.plot(self.x2_list, self.y2_list, 'b--')
376 |         plt.plot(self.x2_list[len(self.x2_list)-1], self.y2_list[len(self.y2_list)-1], 'o', c='b', marker='*')
377 |         #plt.plot(self.x3_list, self.y3_list, 'b--')
378 |         #plt.plot(self.x3_list[len(self.x3_list)-1], self.y3_list[len(self.y3_list)-1], 'o', c='b', marker='*')
379 |         #plt.plot(self.x4_list, self.y4_list, 'b--')
380 |         #plt.plot(self.x4_list[len(self.x4_list)-1], self.y4_list[len(self.y4_list)-1], 'o', c='b', marker='*')
381 |         if(len(self.robot_movement_x_1)<8):
382 |             plt.plot(self.robot_movement_x_1, self.robot_movement_y_1, 'r--')
383 |         else:
384 |             plt.plot(self.robot_movement_x_1[-8:], self.robot_movement_y_1[-8:], 'r--')
385 |         plt.scatter(self.robot_movement_x_1[len(self.robot_movement_x_1) - 1], self.robot_movement_y_1[len(self.robot_movement_y_1) - 1], color='r', marker='D')
386 |         if(len(self.robot_movement_x_2)<8):
387 |             plt.plot(self.robot_movement_x_2, self.robot_movement_y_2, 'r--')
388 |         else:
389 |             plt.plot(self.robot_movement_x_2[-8:], self.robot_movement_y_2[-8:], 'r--')
390 |         plt.scatter(self.robot_movement_x_2[len(self.robot_movement_x_2) - 1], self.robot_movement_y_2[len(self.robot_movement_y_2) - 1], color='r', marker='D')
391 |         #if(len(self.robot_movement_x_3) < 8):
392 |         #    plt.plot(self.robot_movement_x_3, self.robot_movement_y_3, 'r--')
393 |         #else:
394 |         #    plt.plot(self.robot_movement_x_3[-8:], self.robot_movement_y_3[-8:], 'r--')
395 |         #plt.scatter(self.robot_movement_x_3[len(self.robot_movement_x_3) - 1], self.robot_movement_y_3[len(self.robot_movement_y_3) - 1], color='r', marker='D')
396 |         #if(len(self.robot_movement_x_4) < 8):
397 |         #    plt.plot(self.robot_movement_x_4, self.robot_movement_y_4, 'r--')
398 |         #else:
399 |         #    plt.plot(self.robot_movement_x_4[-8:], self.robot_movement_y_4[-8:], 'r--')
400 |         #plt.scatter(self.robot_movement_x_4[len(self.robot_movement_x_4) - 1], self.robot_movement_y_4[len(self.robot_movement_y_4) - 1], color='r', marker='D')
401 |         plt.savefig("/home/arpitdec5/Desktop/robot_target_tracking/s2/" + str(self.time_step) + ".png")
402 |         #plt.show()
403 | 
404 | 
405 |     # reference: https://matplotlib.org/3.1.0/gallery/statistics/confidence_ellipse.html
406 |     def get_correlated_dataset(self, n, dependency, mu, scale):
407 |         latent = np.random.randn(n, 2)
408 |         dependent = latent.dot(dependency)
409 |         scaled = dependent * scale
410 |         scaled_with_offset = scaled + mu
411 |         return (scaled_with_offset[:, 0], scaled_with_offset[:, 1])
412 | 
413 | 
414 |     def compute_reward(self):
415 |         """ 
416 |             Function for computing the reward
417 |         """
418 |         val = 0.0
419 |         for index in range(0, self.num_targets):
420 |             val += np.log(np.linalg.det(self.estimated_targets_var[index]))
421 |         return -val, False  
422 | 
423 | 
424 |     def _reset_sim(self, **kwargs):
425 |         raise NotImplementedError()
426 | 
427 | 
428 |     def _get_obs(self):
429 |         """ 
430 |             Get observation function
431 |             Returns the noisy relative measurement depending on the measurement model (bearing or range)
432 |         """
433 |         targets_pos = self.true_targets_pos
434 |         sensor_pos = self.sensors_pos
435 | 
436 |         if self.meas_model=='bearing':
437 |             true_measurement = torch.zeros(self.num_sensors, self.num_targets)
438 |             for index in range(0, self.num_sensors):
439 |                 true_measurement[index] = torch.atan2(targets_pos[:, 1]-sensor_pos[index, 1], targets_pos[:, 0]-sensor_pos[index, 0])
440 |         elif self.meas_model=='range':
441 |             true_measurement = torch.zeros(self.num_sensors, self.num_targets)
442 |             for index in range(0, self.num_sensors):
443 |                 true_measurement[index] = torch.norm(targets_pos-sensor_pos[index], p=2, dim=1)
444 | 
445 |         true_obs = true_measurement+(self.sigma_meas*torch.randn(sensor_pos.shape[0], targets_pos.shape[0]))
446 |         return true_obs
447 | 
448 | 
449 |     def get_estimated_obs(self):
450 |         """ 
451 |             Get observation function
452 |             Returns the noisy relative measurement depending on the measurement model (bearing or range)
453 |         """
454 |         targets_pos = self.estimated_targets_mean
455 |         sensor_pos = self.sensors_pos
456 | 
457 |         if self.meas_model == 'bearing':
458 |             true_measurement = torch.zeros(self.num_sensors, self.num_targets)
459 |             for index in range(0, self.num_sensors):
460 |                 true_measurement[index] = torch.atan2(targets_pos[:, 1]-sensor_pos[index, 1], targets_pos[:, 0]-sensor_pos[index, 0])
461 |         elif self.meas_model == 'range':
462 |             for index in range(0, self.num_sensors):
463 |                 true_measurement[index] = torch.norm(targets_pos-sensor_pos[index], p=2, dim=1)
464 |         return true_measurement
465 | 
466 | 
467 |     def update_true_targets_pos(self):
468 |         """
469 |             Function to update the true target positions when time step increases (assuming circular target motions)
470 |         """
471 |         for index in range(0, self.num_targets):
472 |             self.true_targets_pos[index] = torch.tensor([self.true_targets_radii[index]*np.cos((self.time_step-1)/float(self.target_motion_omegas[index]))+float(self.initial_true_targets_pos[index, 0])-self.true_targets_radii[index], self.true_targets_radii[index]*np.sin((self.time_step-1)/float(self.target_motion_omegas[index]))+float(self.initial_true_targets_pos[index, 1])]) 
473 | 
474 | 
475 |     def update_estimated_targets_pos(self):
476 |         """
477 |             Function to update the estimated target positions when time step increases (using ekf)
478 |         """
479 |         true_obs = self._get_obs()
480 |         for index in range(0, self.num_targets):
481 |             z_true = np.zeros((self.num_sensors, 1))
482 |             for sensor_index in range(0, self.num_sensors):
483 |                 z_true[sensor_index, 0] = true_obs[sensor_index, index]         
484 |             q_matrix = 0.2*np.eye(2)
485 |             x_matrix = np.array([[float(self.estimated_targets_mean[index, 0])], [float(self.estimated_targets_mean[index, 1])]])
486 |             sigma_matrix = self.estimated_targets_var[index].numpy()+q_matrix
487 | 
488 |             z_pred = np.zeros((self.num_sensors, 1))
489 |             h_matrix = np.zeros((self.num_sensors, 2))
490 |             for sensor_index in range(0, self.num_sensors):
491 |                 z_pred[sensor_index, 0] = np.linalg.norm([[x_matrix[0, 0]-float(self.sensors_pos[sensor_index, 0])], [x_matrix[1, 0]-float(self.sensors_pos[sensor_index, 1])]], 2)
492 |                 h_matrix[sensor_index, 0] = (-1.0/z_pred[sensor_index])*(float(self.sensors_pos[sensor_index, 0])-x_matrix[0, 0])
493 |                 h_matrix[sensor_index, 1] = (-1.0/z_pred[sensor_index])*(float(self.sensors_pos[sensor_index, 1])-x_matrix[1, 0])
494 |             
495 |             res = (z_true-z_pred)
496 |             r_matrix = 1.0*1.0*np.eye(self.num_sensors)
497 |             s_matrix = np.matmul(np.matmul(h_matrix, sigma_matrix), h_matrix.T)+r_matrix
498 |             k_matrix = np.matmul(np.matmul(sigma_matrix, h_matrix.T), np.linalg.inv(s_matrix))
499 | 
500 |             x_matrix_tplus1 = x_matrix+(np.matmul(k_matrix, res))
501 |             sigma_matrix_tplus1 = np.matmul(np.matmul((np.eye(2)-np.matmul(k_matrix, h_matrix)), sigma_matrix), (np.eye(2)-np.matmul(k_matrix, h_matrix)).T)+np.matmul(np.matmul(k_matrix, r_matrix), k_matrix.T)
502 |             target_xhat_tplus1 = x_matrix_tplus1[0, 0]
503 |             target_yhat_tplus1 = x_matrix_tplus1[1, 0]
504 |             
505 |             self.estimated_targets_mean[index, 0] = target_xhat_tplus1
506 |             self.estimated_targets_mean[index, 1] = target_yhat_tplus1
507 |             self.estimated_targets_var[index] = torch.tensor(sigma_matrix_tplus1)
508 | 
509 | 
510 |     def _get_true_target_position(self):
511 |         """
512 |             Function to return the true target positions.
513 |         """
514 |         return self.true_targets_pos
515 | 
516 | 
517 |     def _get_estimated_target_position(self):
518 |         """
519 |             Function to return the estimated target positions.
520 |         """
521 |         return self.estimated_targets_mean
522 | 
523 |     
524 |     def get_posterior_map(self):
525 |         return self.posterior_map
526 | 
527 |     
528 |     def _set_action(self, action, step_size):
529 |         """
530 |             Applies the given action to the sensor.
531 |         """
532 |         for index in range(0, self.num_sensors):
533 |             curr_action = torch.tensor(action[index]).float()
534 |             vector = step_size[index]*torch.tensor([torch.cos(curr_action), torch.sin(curr_action)])
535 |             sensor_pos = self.sensors_pos[index]+vector
536 |             if(sensor_pos[0]>=self.len_workspace):
537 |                 sensor_pos[0] = self.sensors_pos[index, 0]
538 |             if(sensor_pos[1]>=self.len_workspace):
539 |                 sensor_pos[1] = self.sensors_pos[index, 1]
540 |             self.sensors_pos[index] = sensor_pos   
541 | 
542 |         
543 |     def seed(self, seed=None):
544 |         """
545 |             Function that returns a random seed using OpenAI Gym seeding.
546 |         """
547 |         _, seed = seeding.np_random(seed)
548 |         return [seed]
549 |         
550 |         
551 |     def _is_success(self, achieved_goal, desired_goal):
552 |         raise NotImplementedError()
553 | 
554 |         
555 |     def _sample_goal(self):
556 |         return torch.rand(2)
557 | 
558 |     
559 |     def _env_setup(self, initial_qpos):
560 |         pass
561 | 
562 |     
563 |     def _viewer_setup(self):
564 |         pass
565 | 
566 |     
567 |     def _render_callback(self):
568 |         pass
569 | 
570 |     
571 |     def _step_callback(self):
572 |         pass
573 | 


--------------------------------------------------------------------------------
/code/eval.py:
--------------------------------------------------------------------------------
  1 | # header files
  2 | import gym
  3 | import math
  4 | import random
  5 | import numpy as np
  6 | import matplotlib
  7 | import matplotlib.pyplot as plt
  8 | from collections import namedtuple
  9 | from itertools import count
 10 | from PIL import Image
 11 | from gym.envs.registration import register
 12 | import logging
 13 | import torch
 14 | import torch.nn as nn
 15 | import torch.optim as optim
 16 | import torch.nn.functional as F
 17 | import torchvision.transforms as T
 18 | from td3 import *
 19 | from replay_buffer import *
 20 | from utils import *
 21 | 
 22 | 
 23 | logger = logging.getLogger(__name__)
 24 | register(
 25 |     id='RobotTargetTrackingEnv-v0',
 26 |     entry_point='envs:RobotTargetTrackingEnvInterface',
 27 | )
 28 | 
 29 | # initialise environment
 30 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 31 | env_name = "RobotTargetTrackingEnv-v0"
 32 | env_1 = gym.make(env_name)
 33 | env_1.env_parametrization()
 34 | env_1.reset()
 35 | env_2 = gym.make(env_name)
 36 | env_2.env_parametrization(include_cnn=False)
 37 | env_2.reset()
 38 | 
 39 | # create TD3 object and load optimal policy
 40 | state_dim = env_1.observation_space.shape[0]
 41 | action_dim = env_1.action_space.shape[0]
 42 | max_action = float(env_1.action_space.high[0])
 43 | policy_1 = TD3(0.0005, state_dim, 4, max_action)
 44 | policy_1.load_actor("/home/arpitdec5/Desktop/robot_target_tracking/", "sample_model_sensors_2_targets_4")
 45 | state_dim = env_2.observation_space.shape[0]
 46 | action_dim = env_2.action_space.shape[0]
 47 | max_action = float(env_2.action_space.high[0])
 48 | policy_2 = TD3(0.0005, state_dim, 4, max_action)
 49 | policy_2.load_actor("/home/arpitdec5/Desktop/robot_target_tracking/", "model_sensors_2_targets_4")
 50 | 
 51 | # eval loop
 52 | greedy_cov = []
 53 | rl1_cov = []
 54 | rl2_cov = []
 55 | step = []
 56 | ratio = []
 57 | for index in range(0, 500):
 58 |     # init environment 1
 59 |     state, sensors, targets, radii, omegas = env_1.reset()
 60 |     state_2, sensors_2, targets_2, radii_2, omegas_2 = env_2.set_env(sensors, targets, radii, omegas)
 61 |     step.append(index+1)
 62 |     print(index)
 63 | 
 64 |     ############################ Greedy Algo #######################################
 65 |     '''
 66 |     # init ekf variables
 67 |     mean_1 = np.asarray([float(targets[0, 0]), float(targets[0, 1])])
 68 |     x_true_1, y_true_1 = float(targets[0, 0]), float(targets[0, 1])
 69 |     var_1 = [[1, 0], [0, 1]]
 70 |     init_pos_1 = [float(targets[0, 0]) - float(radii[0]), float(targets[0, 1])]
 71 |     mean_2 = np.asarray([float(targets[1, 0]), float(targets[1, 1])])
 72 |     x_true_2, y_true_2 = float(targets[1, 0]), float(targets[1, 1])
 73 |     var_2 = [[1, 0], [0, 1]]
 74 |     init_pos_2 = [float(targets[1, 0]) - float(radii[1]), float(targets[1, 1])]
 75 |     mean_3 = np.asarray([float(targets[2, 0]), float(targets[2, 1])])
 76 |     x_true_3, y_true_3 = float(targets[2, 0]), float(targets[2, 1])
 77 |     var_3 = [[1, 0], [0, 1]]
 78 |     init_pos_3 = [float(targets[2, 0]) - float(radii[2]), float(targets[2, 1])]
 79 |     mean_4 = np.asarray([float(targets[3, 0]), float(targets[3, 1])])
 80 |     x_true_4, y_true_4 = float(targets[3, 0]), float(targets[3, 1])
 81 |     var_4 = [[1, 0], [0, 1]]
 82 |     init_pos_4 = [float(targets[3, 0]) - float(radii[3]), float(targets[3, 1])]
 83 |     robots_x = []
 84 |     robots_y = []
 85 |     robots_id = []
 86 |     for index in range(0, len(sensors)):
 87 |         robots_x.append(float(sensors[index, 0]))
 88 |         robots_y.append(float(sensors[index, 1]))
 89 |         robots_id.append(index+1)     
 90 |     map_height = 20
 91 |     map_width = 20
 92 |     action_radius = 1.0
 93 |     robot_movement_x = [robots_x[0]]
 94 |     robot_movement_y = [robots_y[0]]
 95 |     true_target_x_1 = [x_true_1]
 96 |     true_target_y_1 = [y_true_1]
 97 |     prev_target_x_1 = x_true_1
 98 |     prev_target_y_1 = y_true_1
 99 |     true_target_x_2 = [x_true_2]
100 |     true_target_y_2 = [y_true_2]
101 |     prev_target_x_2 = x_true_2
102 |     prev_target_y_2 = y_true_2
103 |     true_target_x_3 = [x_true_3]
104 |     true_target_y_3 = [y_true_3]
105 |     prev_target_x_3 = x_true_3
106 |     prev_target_y_3 = y_true_3
107 |     true_target_x_4 = [x_true_4]
108 |     true_target_y_4 = [y_true_4]
109 |     prev_target_x_4 = x_true_4
110 |     prev_target_y_4 = y_true_4
111 |     prev_robot_x = robots_x[0]
112 |     prev_robot_y = robots_y[0]
113 | 
114 |     # estimate target position after each time step
115 |     avg_val_greedy = 0.0
116 |     i=0
117 |     for t in range(2, 200):
118 |         i += 1
119 | 
120 |         # update target position
121 |         target_x_mean_1, target_y_mean_1, var_1, x_true_1, y_true_1 = extended_kalman_filter(mean_1[0], mean_1[1], var_1, robots_x, robots_y, robots_id, t, init_pos_1[0], init_pos_1[1], 1, float(omegas[0]), float(radii[0]))
122 |         mean_1 = np.asarray([target_x_mean_1, target_y_mean_1])
123 |         target_x_mean_2, target_y_mean_2, var_2, x_true_2, y_true_2 = extended_kalman_filter(mean_2[0], mean_2[1], var_2, robots_x, robots_y, robots_id, t, init_pos_2[0], init_pos_2[1], 1, float(omegas[1]), float(radii[1]))
124 |         mean_2 = np.asarray([target_x_mean_2, target_y_mean_2])
125 |         target_x_mean_3, target_y_mean_3, var_3, x_true_3, y_true_3 = extended_kalman_filter(mean_3[0], mean_3[1], var_3, robots_x, robots_y, robots_id, t, init_pos_3[0], init_pos_3[1], 1, float(omegas[2]), float(radii[2]))
126 |         mean_3 = np.asarray([target_x_mean_3, target_y_mean_3])
127 |         target_x_mean_4, target_y_mean_4, var_4, x_true_4, y_true_4 = extended_kalman_filter(mean_4[0], mean_4[1], var_4, robots_x, robots_y, robots_id, t, init_pos_4[0], init_pos_4[1], 1, float(omegas[3]), float(radii[3]))
128 |         mean_4 = np.asarray([target_x_mean_4, target_y_mean_4])
129 | 
130 |         # update true target position
131 |         true_target_x_1.append(x_true_1)
132 |         true_target_y_1.append(y_true_1)
133 |         true_target_x_2.append(x_true_2)
134 |         true_target_y_2.append(y_true_2)
135 |         true_target_x_3.append(x_true_3)
136 |         true_target_y_3.append(y_true_3)
137 |         true_target_x_4.append(x_true_4)
138 |         true_target_y_4.append(y_true_4)
139 | 
140 |         # add robot position for rendering
141 |         robot_movement_x.append(robots_x[0])
142 |         robot_movement_y.append(robots_y[0])
143 |     
144 |         # plot map
145 |         #render_ekf([target_x_mean_1, target_y_mean_1], [target_x_mean_2, target_y_mean_2], var_1, var_2, t, true_target_x_1, true_target_y_1,  true_target_x_2, true_target_y_2, robot_movement_x, robot_movement_y)
146 | 
147 |         # update robot position
148 |         next_robot_x, next_robot_y, val = update_robot_pos_ekf(robots_x[0], robots_y[0], [target_x_mean_1, target_x_mean_2, target_x_mean_3, target_x_mean_4], [target_y_mean_1, target_y_mean_2, target_y_mean_3, target_y_mean_4], [var_1, var_2, var_3, var_4], [prev_target_x_1, prev_target_x_2, prev_target_x_3, prev_target_x_4], [prev_target_y_1, prev_target_y_2, prev_target_y_3, prev_target_y_4], action_radius, map_height, map_width, t+1, prev_robot_x, prev_robot_y)
149 |         avg_val_greedy += np.linalg.det(var_1) + np.linalg.det(var_2) + np.linalg.det(var_3) + np.linalg.det(var_4)
150 |         prev_robot_x = robots_x[0]
151 |         prev_robot_y = robots_y[0]
152 |         robots_x[0] = next_robot_x
153 |         robots_y[0] = next_robot_y
154 |         prev_target_x_1 = target_x_mean_1
155 |         prev_target_y_1 = target_y_mean_1
156 |         prev_target_x_2 = target_x_mean_2
157 |         prev_target_y_2 = target_y_mean_2
158 |         prev_target_x_3 = target_x_mean_3
159 |         prev_target_y_3 = target_y_mean_3
160 |         prev_target_x_4 = target_x_mean_4
161 |         prev_target_y_4 = target_y_mean_4
162 |     avg_val_greedy = avg_val_greedy/i
163 |     greedy_cov.append(avg_val_greedy)
164 |     '''
165 |     #################################################################################
166 | 
167 | 
168 |     ############################ RL_1 Algo #######################################
169 |     average_cov_rl_1 = 0.0
170 |     cov_rl = 0.0
171 |     i = 0
172 |     target_1 = []
173 |     target_2 = []
174 |     target_3 = []
175 |     target_4 = []
176 |     for iter in range(0, 100):
177 |         i += 1
178 |         action_1, step_size_1, action_2, step_size_2 = policy_1.select_action(state)
179 |         next_state, reward, done, _, var = env_1.step([action_1, action_2], [step_size_1, step_size_2])
180 |         state = next_state
181 |         #env_1.render()
182 |         #env_1.close()
183 | 
184 |         #average_cov_rl_1 += np.linalg.det(var[0])+np.linalg.det(var[1])+np.linalg.det(var[2])+np.linalg.det(var[3])
185 |         average_cov_rl_1 += np.linalg.det(var[0])+np.linalg.det(var[1])
186 |         #average_cov_rl += np.linalg.det(var[0])
187 |         #cov_rl = np.linalg.det(var[0])
188 |         for index in range(0, len(var)):
189 |             if(index==0):
190 |                 target_1.append(np.linalg.det(var[0]))
191 |             elif(index==1):
192 |                 target_2.append(np.linalg.det(var[1]))
193 |             elif(index==2):
194 |                 target_3.append(np.linalg.det(var[2]))
195 |             else:
196 |                 target_4.append(np.linalg.det(var[3]))
197 |         if(done):
198 |             break
199 |     average_cov_rl_1 = average_cov_rl_1/i
200 |     rl1_cov.append(average_cov_rl_1)
201 | 
202 |     ############################ RL_2 Algo #######################################
203 |     average_cov_rl_2 = 0.0
204 |     cov_rl = 0.0
205 |     i = 0
206 |     target_1 = []
207 |     target_2 = []
208 |     target_3 = []
209 |     target_4 = []
210 |     for iter in range(0, 100):
211 |         i += 1
212 |         action_1, step_size_1, action_2, step_size_2 = policy_2.select_action(state_2)
213 |         next_state, reward, done, _, var = env_2.step([action_1, action_2], [step_size_1, step_size_2])
214 |         state_2 = next_state
215 |         #env_2.render()
216 |         #env_2.close()
217 | 
218 |         #average_cov_rl_2 += np.linalg.det(var[0])+np.linalg.det(var[1])+np.linalg.det(var[2])+np.linalg.det(var[3])
219 |         average_cov_rl_2 += np.linalg.det(var[0])+np.linalg.det(var[1])
220 |         #average_cov_rl += np.linalg.det(var[0])
221 |         #cov_rl = np.linalg.det(var[0])
222 |         for index in range(0, len(var)):
223 |             if(index==0):
224 |                 target_1.append(np.linalg.det(var[0]))
225 |             elif(index==1):
226 |                 target_2.append(np.linalg.det(var[1]))
227 |             elif(index==2):
228 |                 target_3.append(np.linalg.det(var[2]))
229 |             else:
230 |                 target_4.append(np.linalg.det(var[3]))
231 |         if(done):
232 |             break
233 |     average_cov_rl_2 = average_cov_rl_2/i
234 |     rl2_cov.append(average_cov_rl_2)
235 |     ratio.append(average_cov_rl_1/average_cov_rl_2)
236 | 
237 | 
238 | # plot curve
239 | plt.cla()
240 | plt.title("Plot for scenario: sensors=2 and targets=4")
241 | plt.xlabel("Data(Ratio of Avg. Determinant of covariance matrix(RL_1/RL_2))")
242 | plt.ylabel("Bin Count")
243 | plt.hist(ratio, density=False, bins=30)
244 | plt.savefig("/home/arpitdec5/Desktop/robot_target_tracking/ratio_episode_curve_sensors_2_targets_4.png")
245 | plt.cla()
246 | plt.title("Plot for scenario: sensors=2 and targets=4")
247 | plt.xlabel("Episodes")
248 | plt.ylabel("Avg. Determinant of covariance matrix")
249 | plt.plot(step, rl2_cov, label='RL_2 Algo')
250 | plt.plot(step, rl1_cov, label='RL_1 Algo')
251 | plt.legend()
252 | plt.savefig("/home/arpitdec5/Desktop/robot_target_tracking/det_episode_curve_sensors_2_targets_4.png")
253 | 


--------------------------------------------------------------------------------
/code/replay_buffer.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import torch
 3 | 
 4 | # References:
 5 | # 1. https://github.com/ksengin/active-target-localization/blob/master/target_localization/util/replay_buffer.py
 6 | 
 7 | 
 8 | class ReplayBuffer(object):
 9 |     def __init__(self, state_dim, action_dim, max_size=100000):
10 |         self.buffer = []
11 |         self.max_size = int(max_size)
12 |         self.state_dim = state_dim
13 |         self.action_dim = action_dim
14 |         self.size = 0
15 |     
16 |     def add(self, transition):
17 |         self.size +=1
18 |         # transiton is tuple of (state, action, reward, next_state, done)
19 |         self.buffer.append(transition)
20 |     
21 |     def sample(self, batch_size):
22 |         if self.size > self.max_size:
23 |             del self.buffer[:self.size//5]
24 |             self.size = len(self.buffer)
25 |         
26 |         indices = np.random.randint(0, len(self.buffer), size=batch_size)
27 | 
28 |         states = torch.zeros((batch_size, self.state_dim))
29 |         actions = torch.zeros((batch_size, self.action_dim))
30 |         next_states = torch.zeros((batch_size, self.state_dim))
31 |         rewards = torch.zeros((batch_size, 1))
32 |         dones = torch.zeros((batch_size, 1))
33 |         
34 |         for t, idx in enumerate(indices):
35 |             s, a, r, s_, d = self.buffer[idx]
36 |             states[t] = s
37 |             actions[t] = a
38 |             rewards[t] = r
39 |             next_states[t] = s_
40 |             dones[t] = torch.tensor(d)
41 |         
42 |         return states, actions, rewards, next_states, dones
43 | 


--------------------------------------------------------------------------------
/code/td3.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.nn as nn
  3 | import torch.nn.functional as F
  4 | import torch.optim as optim
  5 | from itertools import chain
  6 | 
  7 | # Implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3)
  8 | # Paper: https://arxiv.org/abs/1802.09477
  9 | # References:
 10 | # 1. https://github.com/sfujim/TD3/blob/master/TD3.py
 11 | # 2. https://github.com/nikhilbarhate99/TD3-PyTorch-BipedalWalker-v2/blob/master/TD3.py
 12 | 
 13 | 
 14 | class Actor(nn.Module):
 15 |     def __init__(self, state_dim, action_dim, max_action):
 16 |         super(Actor, self).__init__()
 17 |         
 18 |         self.l1 = nn.Linear(state_dim, 1024)
 19 |         self.l2 = nn.Linear(1024, 1024)
 20 |         self.l3 = nn.Linear(1024, action_dim)
 21 |         
 22 |         self.max_action = max_action
 23 |         
 24 |     def forward(self, state):
 25 |         a = F.relu(self.l1(state))
 26 |         a = F.relu(self.l2(a))
 27 |         a = self.l3(a)
 28 |         a[:, 0] = torch.tanh(a[:, 0]) * self.max_action
 29 |         a[:, 1] = torch.sigmoid(a[:, 1])
 30 |         a[:, 2] = torch.tanh(a[:, 2]) * self.max_action
 31 |         a[:, 3] = torch.sigmoid(a[:, 3])
 32 |         return a
 33 |         
 34 | class Critic(nn.Module):
 35 |     def __init__(self, state_dim, action_dim):
 36 |         super(Critic, self).__init__()
 37 |         
 38 |         self.l1 = nn.Linear(state_dim + action_dim, 1024)
 39 |         self.l2 = nn.Linear(1024, 1024)
 40 |         self.l3 = nn.Linear(1024, action_dim)
 41 |         
 42 |     def forward(self, state, action):
 43 |         q = F.relu(self.l1(torch.cat([state, action], 1)))
 44 |         q = F.relu(self.l2(q))
 45 |         q = self.l3(q)
 46 |         return q
 47 |     
 48 | class TD3(object):
 49 |     def __init__(self, lr, state_dim, action_dim, max_action, joint_critic_optim: bool = False):
 50 | 
 51 |         self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
 52 |         
 53 |         self.actor = Actor(state_dim, action_dim, max_action).to(self.device)
 54 |         self.actor_target = Actor(state_dim, action_dim, max_action).to(self.device)
 55 |         self.actor_target.load_state_dict(self.actor.state_dict())
 56 |         self.actor_optimizer = optim.Adam(self.actor.parameters(), lr=lr)
 57 |         
 58 |         self.critic_1 = Critic(state_dim, action_dim).to(self.device)
 59 |         self.critic_1_target = Critic(state_dim, action_dim).to(self.device)
 60 |         self.critic_1_target.load_state_dict(self.critic_1.state_dict())
 61 |         self.critic_1_optimizer = optim.Adam(self.critic_1.parameters(), lr=lr)
 62 |         
 63 |         self.critic_2 = Critic(state_dim, action_dim).to(self.device)
 64 |         self.critic_2_target = Critic(state_dim, action_dim).to(self.device)
 65 |         self.critic_2_target.load_state_dict(self.critic_2.state_dict())
 66 |         self.critic_2_optimizer = optim.Adam(self.critic_2.parameters(), lr=lr)
 67 | 
 68 |         self.joint_critic_optim = joint_critic_optim
 69 |         if joint_critic_optim:
 70 |             self.critic_optim = optim.Adam(chain(self.critic_1.parameters(), self.critic_2.parameters()), lr=lr)
 71 | 
 72 |         self.max_action = max_action
 73 |         self.critic_loss = F.mse_loss
 74 |     
 75 |     def select_action(self, state):
 76 |         state = state.reshape(1, -1).to(self.device)
 77 |         return self.actor(state).cpu().data.flatten()
 78 |     
 79 |     def update(self, replay_buffer, n_iter, batch_size, gamma, tau, policy_noise, policy_delay):
 80 |         
 81 |         for idx in range(n_iter):
 82 |             # Sample a batch of transitions from the replay buffer
 83 |             state, action_, reward, next_state, done = replay_buffer.sample(batch_size)
 84 |             state = state.to(self.device)
 85 |             action = action_.to(self.device)
 86 |             reward = reward.to(self.device)
 87 |             next_state = next_state.to(self.device)
 88 |             done = done.to(self.device)
 89 |             
 90 |             # Select next action according to target policy
 91 |             noise = torch.normal(0, policy_noise, action.shape).to(self.device)
 92 |             next_action = (self.actor_target(next_state) + noise)
 93 |             next_action[:, 0] = next_action[:, 0].clamp(-self.max_action, self.max_action)
 94 |             next_action[:, 2] = next_action[:, 2].clamp(-self.max_action, self.max_action)
 95 |             
 96 |             # Compute target Q-value
 97 |             target_Q1 = self.critic_1_target(next_state, next_action)
 98 |             target_Q2 = self.critic_2_target(next_state, next_action)
 99 |             target_Q = torch.min(target_Q1, target_Q2)
100 |             target_Q = reward + ((1-done) * gamma * target_Q).detach()
101 | 
102 |             # Optimize the critic networks
103 |             if not self.joint_critic_optim:
104 |                 current_Q1 = self.critic_1(state, action)
105 |                 loss_Q1 = self.critic_loss(current_Q1, target_Q)
106 |                 self.critic_1_optimizer.zero_grad()
107 |                 loss_Q1.backward()
108 |                 self.critic_1_optimizer.step()
109 |                 
110 |                 current_Q2 = self.critic_2(state, action)
111 |                 loss_Q2 = self.critic_loss(current_Q2, target_Q)
112 |                 self.critic_2_optimizer.zero_grad()
113 |                 loss_Q2.backward()
114 |                 self.critic_2_optimizer.step()
115 |             
116 |             else: # optimization as in the original implementation
117 |                 current_Q1 = self.critic_1(state, action)
118 |                 current_Q2 = self.critic_2(state, action)
119 |                 loss_Q = self.critic_loss(current_Q1, target_Q) + self.critic_loss(current_Q2, target_Q)
120 |                 self.critic_optim.zero_grad()
121 |                 loss_Q.backward()
122 |                 self.critic_optim.step()
123 | 
124 |             # Delayed policy updates
125 |             if idx % policy_delay == 0:
126 |                 # Compute actor loss
127 |                 actor_loss = -self.critic_1(state, self.actor(state)).mean()
128 |                 
129 |                 # Optimize the actor
130 |                 self.actor_optimizer.zero_grad()
131 |                 actor_loss.backward()
132 |                 self.actor_optimizer.step()
133 |                 
134 |                 # Update the frozen target models
135 |                 for param, target_param in zip(self.actor.parameters(), self.actor_target.parameters()):
136 |                     target_param.data.copy_(tau * target_param.data + (1-tau) * param.data)
137 |                 
138 |                 for param, target_param in zip(self.critic_1.parameters(), self.critic_1_target.parameters()):
139 |                     target_param.data.copy_(tau * target_param.data + (1-tau) * param.data)
140 |                 
141 |                 for param, target_param in zip(self.critic_2.parameters(), self.critic_2_target.parameters()):
142 |                     target_param.data.copy_(tau * target_param.data + (1-tau) * param.data)
143 |                     
144 |                 
145 |     def save(self, directory, name):
146 |         torch.save(self.actor.state_dict(), f'{directory}/{name}_actor.pth')
147 |         torch.save(self.actor_target.state_dict(), f'{directory}/{name}_actor_target.pth')
148 |         
149 |         torch.save(self.critic_1.state_dict(), f'{directory}/{name}_crtic_1.pth')
150 |         torch.save(self.critic_1_target.state_dict(), f'{directory}/{name}_critic_1_target.pth')
151 |         
152 |         torch.save(self.critic_2.state_dict(), f'{directory}/{name}_crtic_2.pth')
153 |         torch.save(self.critic_2_target.state_dict(), f'{directory}/{name}_critic_2_target.pth')
154 |         
155 |     def load(self, directory, name):
156 |         self.load_actor(directory, name)
157 |         
158 |         self.critic_1.load_state_dict(torch.load(f'{directory}/{name}_crtic_1.pth'))
159 |         self.critic_1_target.load_state_dict(torch.load(f'{directory}/{name}_critic_1_target.pth'))
160 |         
161 |         self.critic_2.load_state_dict(torch.load(f'{directory}/{name}_crtic_2.pth'))
162 |         self.critic_2_target.load_state_dict(torch.load(f'{directory}/{name}_critic_2_target.pth'))
163 |         
164 |         
165 |     def load_actor(self, directory, name):
166 |         self.actor.load_state_dict(torch.load(f'{directory}/{name}_actor.pth'))
167 |         self.actor_target.load_state_dict(torch.load(f'{directory}/{name}_actor_target.pth'))
168 | 


--------------------------------------------------------------------------------
/code/train.py:
--------------------------------------------------------------------------------
  1 | # header files
  2 | import gym
  3 | import math
  4 | import random
  5 | import numpy as np
  6 | import matplotlib
  7 | import matplotlib.pyplot as plt
  8 | from collections import namedtuple
  9 | from itertools import count
 10 | from PIL import Image
 11 | from gym.envs.registration import register
 12 | import logging
 13 | import torch
 14 | import torch.nn as nn
 15 | import torch.optim as optim
 16 | import torch.nn.functional as F
 17 | import torchvision.transforms as T
 18 | from td3 import *
 19 | from replay_buffer import *
 20 | 
 21 | 
 22 | logger = logging.getLogger(__name__)
 23 | register(
 24 |     id='RobotTargetTrackingEnv-v0',
 25 |     entry_point='envs:RobotTargetTrackingEnvInterface',
 26 | )
 27 | 
 28 | # initialise environment
 29 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 30 | env_name = "RobotTargetTrackingEnv-v0"
 31 | env = gym.make(env_name)
 32 | env.env_parametrization()
 33 | env.reset()
 34 | 
 35 | # constants
 36 | lr = 0.0005
 37 | epochs = 1000
 38 | iters = 100
 39 | 
 40 | # create TD3 object
 41 | state_dim = env.observation_space.shape[0]
 42 | action_dim = env.action_space.shape[0]
 43 | max_action = float(env.action_space.high[0])
 44 | policy = TD3(lr, state_dim, 4, max_action)
 45 | replay_buffer = ReplayBuffer(state_dim=state_dim, action_dim=4)
 46 | mean_reward, ep_reward = 0, 0
 47 | 
 48 | # training
 49 | e = []
 50 | r = []
 51 | m_e = []
 52 | m_r = []
 53 | g_e = []
 54 | g_r = []
 55 | mean_reward = 0
 56 | for epoch in range(0, epochs):
 57 |     state, _, _, _, _ = env.reset()
 58 |     for iter in range(0, iters):
 59 |         action_1, step_size_1, action_2, step_size_2 = policy.select_action(state) + torch.normal(0, 0.1, size=env.action_space.shape)
 60 |         action_1 = action_1.clamp(env.action_space.low.item(), env.action_space.high.item())
 61 |         action_2 = action_2.clamp(env.action_space.low.item(), env.action_space.high.item())
 62 | 
 63 |         next_state, reward, done, _, _ = env.step([action_1, action_2], [step_size_1, step_size_2])
 64 |         replay_buffer.add((state, torch.tensor([action_1, step_size_1, action_2, step_size_2]), reward, next_state, np.float(done)))
 65 |         state = next_state
 66 | 
 67 |         mean_reward += reward
 68 |         ep_reward += reward
 69 |         if done:
 70 |             break
 71 | 
 72 |     # update policy
 73 |     policy.update(replay_buffer, iter, 100, 0.99, 0.99, 0.2, 2)
 74 | 
 75 |     # save actor-critic models
 76 |     if(epoch>100 and epoch%10==0):
 77 |         policy.save("/home/arpitdec5/Desktop/robot_target_tracking/", "sample_1_model_sensors_2_targets_2")
 78 | 
 79 |     # print reward
 80 |     print()
 81 |     print("Epoch: " + str(epoch))
 82 |     print("Reward: " + str(ep_reward))
 83 |     print()
 84 |     e.append(epoch)
 85 |     r.append(ep_reward)
 86 |     ep_reward = 0
 87 | 
 88 |     # mean reward
 89 |     if(epoch%50==0 and epoch>0):
 90 |         m_e.append(epoch)
 91 |         m_r.append(mean_reward/50.0)
 92 |         mean_reward = 0
 93 | 
 94 | 
 95 | # plot epoch vs reward curve
 96 | plt.xlabel("Episodes")
 97 | plt.ylabel("Reward")
 98 | plt.ylim(0, 1000)
 99 | plt.plot(e, r, c='blue', label='Cumulative Reward')
100 | plt.plot(m_e, m_r, c='orange', label='Mean Reward')
101 | #plt.plot(g_e, g_r, c='red', label='Greedy Algorithm')
102 | plt.legend()
103 | plt.savefig("/home/arpitdec5/Desktop/robot_target_tracking/sample_1_reward_sensors_2_targets_2.png")
104 | 


--------------------------------------------------------------------------------
/code/utils.py:
--------------------------------------------------------------------------------
  1 | # header files needed
  2 | import numpy as np
  3 | import matplotlib.pyplot as plt
  4 | import matplotlib.cm as cm
  5 | from scipy.stats import multivariate_normal
  6 | from matplotlib.patches import Ellipse
  7 | import matplotlib.transforms as transforms
  8 | from math import pi, cos, sin
  9 | from random import random
 10 | import math
 11 | import torch
 12 | 
 13 | 
 14 | def render_ekf(estimated_targets_mean_1, estimated_targets_mean_2, estimated_targets_var_1, estimated_targets_var_2, time_step, x_list_1, y_list_1, x_list_2, y_list_2, robot_movement_x, robot_movement_y):
 15 |     """ 
 16 |         Function for rendering the environment
 17 |     """
 18 |     heatmap = torch.zeros(256, 256)
 19 |     for index in range(0, 1):
 20 |        x = np.linspace(0, 20, 256)
 21 |        y = np.linspace(0, 20, 256)
 22 |        X, Y = np.meshgrid(x, y)
 23 |        pos = np.empty(X.shape + (2,))
 24 |        pos[:, :, 0] = X; pos[:, :, 1] = Y
 25 |        rv_1 = multivariate_normal(estimated_targets_mean_1, estimated_targets_var_1)
 26 |        rv_2 = multivariate_normal(estimated_targets_mean_2, estimated_targets_var_2)
 27 |        heatmap += rv_1.pdf(pos)
 28 |        heatmap += rv_2.pdf(pos)
 29 |     x = np.linspace(0, 20, 256)
 30 |     y = np.linspace(0, 20, 256)
 31 |     X, Y = np.meshgrid(x, y)
 32 |     plt.cla()
 33 |     plt.title("Time step = " + str(time_step))
 34 |     plt.xlabel("x")
 35 |     plt.ylabel("y")
 36 |     plt.xlim(0, 20)
 37 |     plt.ylim(0, 20)
 38 |     plt.contourf(X, Y, heatmap, cmap=cm.inferno)
 39 |     plt.plot(x_list_1, y_list_1, 'b--')
 40 |     plt.plot(x_list_2, y_list_2, 'b--')
 41 |     plt.plot(x_list_1[len(x_list_1) - 1], y_list_1[len(y_list_1) - 1], 'o', c='b', marker='*')
 42 |     plt.plot(x_list_2[len(x_list_2) - 1], y_list_2[len(y_list_2) - 1], 'o', c='b', marker='*')
 43 |     #plt.plot(self.x2_list, self.y2_list, 'b--')
 44 |     #plt.plot(self.x2_list[len(self.x2_list) - 1], self.y2_list[len(self.y2_list) - 1], 'o', c='b', marker='*')
 45 |     #plt.plot(self.x3_list, self.y3_list, 'b--')
 46 |     #plt.plot(self.x3_list[len(self.x3_list) - 1], self.y3_list[len(self.y3_list) - 1], 'o', c='b', marker='*')
 47 |     #plt.plot(self.x4_list, self.y4_list, 'b--')
 48 |     #plt.plot(self.x4_list[len(self.x4_list) - 1], self.y4_list[len(self.y4_list) - 1], 'o', c='b', marker='*')
 49 |     if(len(robot_movement_x) < 8):
 50 |         plt.plot(robot_movement_x, robot_movement_y, 'r--')
 51 |     else:
 52 |         plt.plot(robot_movement_x[-8:], robot_movement_y[-8:], 'r--')
 53 |     plt.scatter(robot_movement_x[len(robot_movement_x) - 1], robot_movement_y[len(robot_movement_y) - 1], color='r', marker='D')
 54 |     plt.savefig("/home/arpitdec5/Desktop/robot_target_tracking/s1/" + str(time_step) + ".png")
 55 |     #plt.show()
 56 | 
 57 | 
 58 | # plot the heatmap
 59 | def render(t, x_mesh, y_mesh, belief_map, x_target, y_target, robot_movement_x, robot_movement_y):
 60 |     """
 61 |         Inputs:
 62 |         t: time step
 63 |         x_mesh: the x-coordinates
 64 |         y_mesh: the y-coordinates
 65 |         belief_map: the map containing the probabilities
 66 |         x_target: the target x-coordinate
 67 |         y_target: the target y-coordinate
 68 |         robot_movement_x: the list of robot paths x-coordinates
 69 |         robot_movement_y: the list of robot paths y-coordinates
 70 |     """
 71 |     plt.cla()
 72 |     plt.title("Stationary robot(Target moving fast)")
 73 |     plt.xlabel("x")
 74 |     plt.ylabel("y")
 75 |     plt.contourf(x_mesh, y_mesh, belief_map, cmap=cm.inferno)
 76 |     plt.plot(x_target, y_target, 'b--', label='true target motion')
 77 |     plt.plot(x_target[len(x_target) - 1], y_target[len(y_target) - 1], 'o', c='b', marker='*')
 78 |     if(len(robot_movement_x) < 8):
 79 |         plt.plot(robot_movement_x, robot_movement_y, 'r--')
 80 |     else:
 81 |         plt.plot(robot_movement_x[-8:], robot_movement_y[-8:], 'r--')
 82 |     plt.scatter(robot_movement_x[len(robot_movement_x) - 1], robot_movement_y[len(robot_movement_y) - 1], color='r', marker='D', label='robot')
 83 |     plt.savefig("/home/arpitdec5/Desktop/robot_target_tracking/s1/" + str(t) + ".png")
 84 |     #plt.show()
 85 | 
 86 | 
 87 | # compute bayesian histogram for 'm' targets and given robot position
 88 | def compute_bayesian_histogram(targets_x_true, targets_y_true, robot_x, robot_y, belief_map_height, belief_map_width, stepsize_map, sigma_bayesian_hist=1.0):
 89 |     """
 90 |         Inputs:
 91 |         targets_x_true: the true position of target x-coordinate
 92 |         targets_y_true: the true position of target y-coordinate
 93 |         robot_x: the position of robot x-coordinate
 94 |         robot_y: the position of robot y-coordinate 
 95 |         belief_map_height: the environment dimensions, height
 96 |         belief_map_width: the environment dimensions, width
 97 |         stepsize_map: equal to 0.1
 98 |         sigma_bayesian_hist: equal to 1
 99 | 
100 |         Outputs:
101 |         bayesian_hist: the belief map of dimensions (belief_map_height, belief_map_width) containing probabilities      
102 |     """
103 |     noise = sigma_bayesian_hist * np.random.randn(1, 1)
104 |     bayesian_hist = np.zeros((belief_map_height, belief_map_width))
105 |     for index in range(0, len(targets_x_true)):
106 |         estimated = np.sqrt((targets_x_true[index] - robot_x)**2 + (targets_y_true[index] - robot_y)**2) + noise[0][0]
107 |         for index1 in range(0, belief_map_height):
108 |             for index2 in range(0, belief_map_width):
109 |                 true = np.sqrt(((index1*stepsize_map) - robot_x)**2 + ((index2*stepsize_map) - robot_y)**2)
110 |                 bayesian_hist[index1, index2] += 1.0 / (np.sqrt(2 * np.pi * sigma_bayesian_hist**2)) * np.exp(-0.5 / sigma_bayesian_hist**2 * (np.abs(true - estimated)**2))
111 |     return bayesian_hist
112 | 
113 | 
114 | # get target estimate
115 | def get_target_position(t, x_true, y_true):
116 |     """
117 |         Inputs:
118 |         t: time step
119 |         x_true: the true position of target x-coordinate
120 |         y_true: the true position of target y-coordinate
121 | 
122 |         Outputs:
123 |         (x_true, y_true): the target position at next time step      
124 |     """
125 |     omega = 33
126 |     x_true = 2*np.cos((t-1) / omega) + 10
127 |     y_true = 2*np.sin((t-1) / omega) + 12
128 |     return (x_true, y_true)
129 | 
130 | 
131 | # reference: https://matplotlib.org/3.1.0/gallery/statistics/confidence_ellipse.html
132 | def confidence_ellipse(x, y, ax, n_std=3.0, facecolor='none', **kwargs):
133 |     """
134 |         Inputs:
135 |         x: the x-coordinate datapoints
136 |         y: the y-coordinate datapoints
137 |     """
138 |     if x.size != y.size:
139 |         raise ValueError("x and y must be the same size")
140 | 
141 |     cov = np.cov(x, y)
142 |     pearson = cov[0, 1]/np.sqrt(cov[0, 0] * cov[1, 1])
143 |     # Using a special case to obtain the eigenvalues of this
144 |     # two-dimensionl dataset.
145 |     ell_radius_x = np.sqrt(1 + pearson)
146 |     ell_radius_y = np.sqrt(1 - pearson)
147 |     ellipse = Ellipse((0, 0),
148 |         width=ell_radius_x * 2,
149 |         height=ell_radius_y * 2,
150 |         facecolor=facecolor,
151 |         **kwargs)
152 | 
153 |     # Calculating the stdandard deviation of x from
154 |     # the squareroot of the variance and multiplying
155 |     # with the given number of standard deviations.
156 |     scale_x = np.sqrt(cov[0, 0]) * n_std
157 |     mean_x = np.mean(x)
158 | 
159 |     # calculating the stdandard deviation of y ...
160 |     scale_y = np.sqrt(cov[1, 1]) * n_std
161 |     mean_y = np.mean(y)
162 | 
163 |     transf = transforms.Affine2D() \
164 |         .rotate_deg(45) \
165 |         .scale(scale_x, scale_y) \
166 |         .translate(mean_x, mean_y)
167 | 
168 |     ellipse.set_transform(transf + ax.transData)
169 |     return ax.add_patch(ellipse)
170 | 
171 | 
172 | # reference: https://matplotlib.org/3.1.0/gallery/statistics/confidence_ellipse.html
173 | def get_correlated_dataset(n, dependency, mu, scale):
174 |     latent = np.random.randn(n, 2)
175 |     dependent = latent.dot(dependency)
176 |     scaled = dependent * scale
177 |     scaled_with_offset = scaled + mu
178 |     # return x and y of the new, correlated dataset
179 |     return scaled_with_offset[:, 0], scaled_with_offset[:, 1]
180 | 
181 | 
182 | # code for the filter
183 | def extended_kalman_filter(target_xhat_t, target_yhat_t, target_sigma_t, robots_x, robots_y, robots_id, t, x, y, flag, omega, radii):
184 |     """
185 |         Inputs:
186 |         target_xhat_t: the estimated target position x-coordinate
187 |         target_yhat_t: the estimated target position y-coordinate
188 |         robots_x: the position of robots x-coordinate
189 |         robots_y: the position of robots y-coordinate
190 |         robots_id: the ids of robots
191 |         t: the time step
192 | 
193 |         Outputs:
194 |         (target_xhat_tplus1, target_yhat_tplus1, sigma_matrix_tplus1, x_true, y_true): the predicted target position      
195 |     """
196 |     # get z_true using true target motion
197 |     sigma_z = 1.0
198 |     x_true = radii*np.cos((t-1)/omega)+x
199 |     if flag:
200 |         y_true = radii*np.sin((t-1)/omega)+y
201 |     else:
202 |         y_true = -radii*np.sin((t-1) / omega)+y
203 |     noise = sigma_z*np.random.randn(1000, 1)
204 |         
205 |     z_true = np.zeros((len(robots_x), 1))
206 |     for index in range(0, len(robots_x)):
207 |         z_true[index][0] = np.linalg.norm([[robots_x[index] - x_true], [robots_y[index] - y_true]], 2) + noise[robots_id[index]]
208 | 
209 |     # filter code
210 |     q_matrix = 0.2 * np.eye(2)
211 |     x_matrix = np.array([[target_xhat_t], [target_yhat_t]])
212 |     sigma_matrix = target_sigma_t + q_matrix
213 |     
214 |     z_pred = np.zeros((len(robots_x), 1))
215 |     h_matrix = np.zeros((len(robots_x), 2))
216 |     for index in range(0, len(robots_x)):
217 |         z_pred[index][0] = np.linalg.norm([[x_matrix[0][0] - robots_x[index]], [x_matrix[1][0] - robots_y[index]]], 2)
218 |         h_matrix[index][0] = (-1.0 / z_pred[index]) * (robots_x[index] - x_matrix[0][0])
219 |         h_matrix[index][1] = (-1.0 / z_pred[index]) * (robots_y[index] - x_matrix[1][0])
220 |         
221 |     res = (z_true - z_pred)
222 |     r_matrix = sigma_z * sigma_z * np.eye(len(robots_x))
223 |     s_matrix = np.matmul(np.matmul(h_matrix, sigma_matrix), h_matrix.T) + r_matrix
224 |     k_matrix = np.matmul(np.matmul(sigma_matrix, h_matrix.T), np.linalg.inv(s_matrix))    
225 | 
226 |     x_matrix_tplus1 = x_matrix + (np.matmul(k_matrix, res))
227 |     sigma_matrix_tplus1 = np.matmul(np.matmul((np.eye(2) - np.matmul(k_matrix, h_matrix)), sigma_matrix), (np.eye(2) - np.matmul(k_matrix, h_matrix)).T) + np.matmul(np.matmul(k_matrix, r_matrix), k_matrix.T)
228 |     target_xhat_tplus1 = x_matrix_tplus1[0][0]
229 |     target_yhat_tplus1 = x_matrix_tplus1[1][0]
230 |     return (target_xhat_tplus1, target_yhat_tplus1, sigma_matrix_tplus1, x_true, y_true)
231 | 
232 | 
233 | # plot gaussian
234 | def plot_gaussian(gauss):
235 |     x, y = np.mgrid[0:25:100j, 0:25:100j]
236 |     z = np.dstack((x, y))
237 |     plt.contourf(x, y, gauss.pdf(z))
238 |     #plt.show()
239 | 
240 | 
241 | # save gaussian
242 | def save_gaussian(gauss, path):
243 |     x, y = np.mgrid[0:25:100j, 0:25:100j]
244 |     z = np.dstack((x, y))
245 |     plt.contourf(x, y, gauss.pdf(z))
246 |     plt.savefig(path)
247 | 
248 | 
249 | # plot confidence ellipse
250 | def plot_ellipse(x, y, mean, x_list, y_list, target_x_mean, target_y_mean, path, robot_x, robot_y, robot_movement_x, robot_movement_y):
251 |     fig, ax_nstd = plt.subplots(figsize=(6, 6))
252 |     ax_nstd.axvline(c='grey', lw=1)
253 |     ax_nstd.axhline(c='grey', lw=1)
254 |     confidence_ellipse(x, y, ax_nstd, n_std=1, edgecolor='firebrick')
255 |     ax_nstd.scatter(mean[0], mean[1], c='b', s=1)
256 |     ax_nstd.legend()
257 |     plt.title("Greedy algorithm using EKF(Target moving slow)")
258 |     plt.xlabel("x")
259 |     plt.ylabel("y")
260 |     plt.xlim(0, 20)
261 |     plt.ylim(0, 20)
262 |     plt.plot(x_list, y_list, 'b--')
263 |     plt.scatter(x_list[len(x_list)-1], y_list[len(y_list)-1], color='b', marker='*')
264 |     plt.scatter([target_x_mean], [target_y_mean], color='b', marker="s")
265 |     if(len(robot_movement_x) < 8):
266 |         plt.plot(robot_movement_x, robot_movement_y, 'r--')
267 |     else:
268 |         plt.plot(robot_movement_x[-8:], robot_movement_y[-8:], 'r--')
269 |     plt.scatter(robot_x, robot_y, color='r', marker='D')    
270 |     plt.plot([robot_x, target_x_mean], [robot_y, target_y_mean], color='r')
271 |     plt.savefig(path)
272 |     plt.cla()
273 |     plt.close()
274 | 
275 | 
276 | def points_in_circle_np(radius, x0=0, y0=0):
277 |     """
278 |         Inputs:
279 |         radius: the radius of the circular region around the current robot position
280 |         x0: the x-coordinate of the robot position
281 |         y0: the y-coordinate of the robot position
282 | 
283 |         Outputs: 
284 |         points: the action set to be used for deciding the robot trajectory      
285 |     """
286 |     thetas = [0.0, 15.0, 30.0, 45.0, 60.0, 75.0, 90.0, 105.0, 120.0, 135.0, 150.0, 165.0, 180.0, 195.0, 210.0, 225.0, 240.0, 255.0, 270.0, 285.0, 300.0, 315.0, 330.0]
287 |     points = []
288 |     for theta in thetas:
289 |         theta = (theta * pi) / 180.0
290 |         points.append((float(x0 + cos(theta) * radius), float(y0 + sin(theta) * radius)))
291 |     return points
292 | 
293 | 
294 | # choose optimal action
295 | def update_robot_pos_ekf(robot_x, robot_y, target_x, target_y, var, prev_target_x, prev_target_y, radius, map_height, map_width, t, prev_robot_x, prev_robot_y):
296 |     """
297 |         Inputs:
298 |         robot_x: the position of robot x-coordinate
299 |         robot_y: the position of robot y-coordinate 
300 |         target_x: the position of target x-coordinate
301 |         target_y: the position of target y-coordinate
302 |         var: the uncertainty in target position
303 |         prev_target_x: the previous position of target x-coordinate
304 |         prev_target_y: the previous position of target y-coordinate
305 |         radius: the radius of the circular region around the current robot position
306 |         map_height: the environment dimensions, height
307 |         map_width: the environment dimensions, width
308 |         t: the time step
309 | 
310 |         Outputs:
311 |         (best_action[0], best_action[1]): updated robot position     
312 |     """
313 |     action_set = points_in_circle_np(radius, robot_x, robot_y)
314 |     alpha_opt = -1000000
315 |     best_action = (robot_x, robot_y)
316 |     for action in action_set:
317 |         curr_robot_x = action[0]
318 |         curr_robot_y = action[1] 
319 |         val = 0
320 |         for index in range(0, len(target_x)):
321 |             if(curr_robot_x >= 1 and curr_robot_x <= (map_height-1) and curr_robot_y >= 1 and curr_robot_y <= (map_width-1)):
322 |                 t_v_i = ((robot_x - target_x[index])*(robot_x - target_x[index])) + ((robot_y - target_y[index])*(robot_y - target_y[index]))
323 |                 t_v_j = ((curr_robot_x - target_x[index])*(curr_robot_x - target_x[index])) + ((curr_robot_y - target_y[index])*(curr_robot_y - target_y[index]))
324 |                 t_m1 = (target_y[index] - robot_y) / (target_x[index] - robot_x + 1e-9)
325 |                 t_m2 = (target_y[index] - curr_robot_y) / (target_x[index] - curr_robot_x + 1e-9)
326 |                 value = np.abs(t_m1*t_m2 + 1)
327 |                 t_angle1 = np.arctan2(t_m1-t_m2, 1+t_m1*t_m2)
328 |                 t_angle2 = np.arctan2(t_m2-t_m1, 1+t_m1*t_m2)
329 |                 t_val = max((t_v_i*t_v_j*np.sin(t_angle1)*np.sin(t_angle1)), (t_v_i*t_v_j*np.sin(t_angle2)*np.sin(t_angle2)))
330 |                 t1_v_i = ((prev_robot_x - prev_target_x[index])*(prev_robot_x - prev_target_x[index])) + ((prev_robot_y - prev_target_y[index])*(prev_robot_y - prev_target_y[index]))
331 |                 t1_v_j = ((robot_x - prev_target_x[index])*(robot_x - prev_target_x[index])) + ((robot_y - prev_target_y[index])*(robot_y - prev_target_y[index]))
332 |                 t1_m1 = (prev_target_y[index] - prev_robot_y) / (prev_target_x[index] - prev_robot_x + 1e-9)
333 |                 t1_m2 = (prev_target_y[index] - robot_y) / (prev_target_x[index] - robot_x + 1e-9)
334 |                 t1_angle1 = np.arctan2(t1_m1-t1_m2, 1+t1_m1*t1_m2)
335 |                 t1_angle2 = np.arctan2(t1_m2-t1_m1, 1+t1_m1*t1_m2)
336 |                 t1_val = max((t1_v_i*t1_v_j*np.sin(t1_angle1)*np.sin(t1_angle1)), (t1_v_i*t1_v_j*np.sin(t1_angle2)*np.sin(t1_angle2)))
337 |                 t2_v_i = ((prev_robot_x - prev_target_x[index])*(prev_robot_x - prev_target_x[index])) + ((prev_robot_y - prev_target_y[index])*(prev_robot_y - prev_target_y[index]))
338 |                 t2_v_j = ((curr_robot_x - prev_target_x[index])*(curr_robot_x - prev_target_x[index])) + ((curr_robot_y - prev_target_y[index])*(curr_robot_y - prev_target_y[index]))
339 |                 t2_m1 = (prev_target_y[index] - prev_robot_y) / (prev_target_x[index] - prev_robot_x + 1e-9)
340 |                 t2_m2 = (prev_target_y[index] - curr_robot_y) / (prev_target_x[index] - curr_robot_x + 1e-9)
341 |                 t2_angle1 = np.arctan2(t2_m1-t2_m2, 1+t2_m1*t2_m2)
342 |                 t2_angle2 = np.arctan2(t2_m2-t2_m1, 1+t2_m1*t2_m2)
343 |                 t2_val = max((t2_v_i*t2_v_j*np.sin(t2_angle1)*np.sin(t2_angle1)), (t2_v_i*t2_v_j*np.sin(t2_angle2)*np.sin(t2_angle2)))
344 |                 val += t1_val + t2_val + t_val
345 |         if(val > alpha_opt):
346 |            alpha_opt = val
347 |            best_action = action
348 |     return (best_action[0], best_action[1], 0)
349 | 
350 | 
351 | # choose optimal action
352 | def update_robot_pos(robot_x, robot_y, target_x, target_y, prev_target_x, prev_target_y, radius, map_height, map_width, prev_robot_x, prev_robot_y):
353 |     """
354 |         Inputs:
355 |         robot_x: the position of robot x-coordinate
356 |         robot_y: the position of robot y-coordinate 
357 |         target_x: the position of target x-coordinate
358 |         target_y: the position of target y-coordinate
359 |         prev_target_x: the previous position of target x-coordinate
360 |         prev_target_y: the previous position of target y-coordinate
361 |         radius: the radius of the circular region around the current robot position
362 |         map_height: the environment dimensions, height
363 |         map_width: the environment dimensions, width
364 | 
365 |         Outputs:
366 |         (best_action[0], best_action[1]): updated robot position
367 |     """
368 |     action_set = points_in_circle_np(radius, robot_x, robot_y)
369 |     alpha_opt = -10000000
370 |     dist_opt = 10000000
371 |     best_action = (0, 0)
372 |     for action in action_set:
373 |         curr_robot_x = action[0]
374 |         curr_robot_y = action[1]
375 |         if(curr_robot_x >= 1 and curr_robot_x < (map_height-1) and curr_robot_y >= 1 and curr_robot_y < (map_width-1)):
376 |             m1 = (prev_target_y - robot_y) / (prev_target_x - robot_x + 1e-8)
377 |             m2 = (target_y - curr_robot_y) / (target_x - curr_robot_x + 1e-8)
378 |             t_v_i = ((robot_x - target_x)*(robot_x - target_x)) + ((robot_y - target_y)*(robot_y - target_y))
379 |             t_v_j = ((curr_robot_x - target_x)*(curr_robot_x - target_x)) + ((curr_robot_y - target_y)*(curr_robot_y - target_y))
380 |             t_m1 = (target_y - robot_y) / (target_x - robot_x + 1e-9)
381 |             t_m2 = (target_y - curr_robot_y) / (target_x - curr_robot_x + 1e-9)
382 |             t_angle1 = np.arctan2(t_m1-t_m2, 1+t_m1*t_m2)
383 |             t_angle2 = np.arctan2(t_m2-t_m1, 1+t_m1*t_m2)
384 |             t_val = max((t_v_i*t_v_j*np.sin(t_angle1)*np.sin(t_angle1)), (t_v_i*t_v_j*np.sin(t_angle2)*np.sin(t_angle2)))
385 |             t1_v_i = ((prev_robot_x - prev_target_x)*(prev_robot_x - prev_target_x)) + ((prev_robot_y - prev_target_y)*(prev_robot_y - prev_target_y))
386 |             t1_v_j = ((robot_x - prev_target_x)*(robot_x - prev_target_x)) + ((robot_y - prev_target_y)*(robot_y - prev_target_y))
387 |             t1_m1 = (prev_target_y - prev_robot_y) / (prev_target_x - prev_robot_x + 1e-9)
388 |             t1_m2 = (prev_target_y - robot_y) / (prev_target_x - robot_x + 1e-9)
389 |             t1_angle1 = np.arctan2(t1_m1-t1_m2, 1+t1_m1*t1_m2)
390 |             t1_angle2 = np.arctan2(t1_m2-t1_m1, 1+t1_m1*t1_m2)
391 |             t1_val = max((t1_v_i*t1_v_j*np.sin(t1_angle1)*np.sin(t1_angle1)), (t1_v_i*t1_v_j*np.sin(t1_angle2)*np.sin(t1_angle2)))
392 |             t2_v_i = ((prev_robot_x - prev_target_x)*(prev_robot_x - prev_target_x)) + ((prev_robot_y - prev_target_y)*(prev_robot_y - prev_target_y))
393 |             t2_v_j = ((curr_robot_x - prev_target_x)*(curr_robot_x - prev_target_x)) + ((curr_robot_y - prev_target_y)*(curr_robot_y - prev_target_y))
394 |             t2_m1 = (prev_target_y - prev_robot_y) / (prev_target_x - prev_robot_x + 1e-9)
395 |             t2_m2 = (prev_target_y - curr_robot_y) / (prev_target_x - curr_robot_x + 1e-9)
396 |             t2_angle1 = np.arctan2(t2_m1-t2_m2, 1+t2_m1*t2_m2)
397 |             t2_angle2 = np.arctan2(t2_m2-t2_m1, 1+t2_m1*t2_m2)
398 |             t2_val = max((t2_v_i*t2_v_j*np.sin(t2_angle1)*np.sin(t2_angle1)), (t2_v_i*t2_v_j*np.sin(t2_angle2)*np.sin(t2_angle2)))
399 |             val = t1_val + t2_val + t_val
400 |             if(val > alpha_opt):
401 |                 alpha_opt = val
402 |                 best_action = action
403 |     return (best_action[0], best_action[1])
404 | 


--------------------------------------------------------------------------------
/outputs/7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/7.png


--------------------------------------------------------------------------------
/outputs/case1_1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case1_1.jpg


--------------------------------------------------------------------------------
/outputs/case1_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case1_1.png


--------------------------------------------------------------------------------
/outputs/case1_2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case1_2.jpg


--------------------------------------------------------------------------------
/outputs/case1_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case1_2.png


--------------------------------------------------------------------------------
/outputs/case1_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case1_3.png


--------------------------------------------------------------------------------
/outputs/case1_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case1_4.png


--------------------------------------------------------------------------------
/outputs/case2_1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case2_1.jpg


--------------------------------------------------------------------------------
/outputs/case2_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case2_1.png


--------------------------------------------------------------------------------
/outputs/case2_2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case2_2.jpg


--------------------------------------------------------------------------------
/outputs/case2_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case2_2.png


--------------------------------------------------------------------------------
/outputs/case2_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case2_3.png


--------------------------------------------------------------------------------
/outputs/case2_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case2_4.png


--------------------------------------------------------------------------------
/outputs/case3_1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case3_1.jpg


--------------------------------------------------------------------------------
/outputs/case3_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case3_1.png


--------------------------------------------------------------------------------
/outputs/case3_2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case3_2.jpg


--------------------------------------------------------------------------------
/outputs/case3_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case3_2.png


--------------------------------------------------------------------------------
/outputs/case3_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case3_3.png


--------------------------------------------------------------------------------
/outputs/case3_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case3_4.png


--------------------------------------------------------------------------------
/outputs/case4_1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case4_1.jpg


--------------------------------------------------------------------------------
/outputs/case4_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case4_1.png


--------------------------------------------------------------------------------
/outputs/case4_2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case4_2.jpg


--------------------------------------------------------------------------------
/outputs/case4_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case4_2.png


--------------------------------------------------------------------------------
/outputs/case4_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case4_3.png


--------------------------------------------------------------------------------
/outputs/case4_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/case4_4.png


--------------------------------------------------------------------------------
/outputs/greedy_reward.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/greedy_reward.png


--------------------------------------------------------------------------------
/outputs/qlearning_reward.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/qlearning_reward.png


--------------------------------------------------------------------------------
/outputs/reward.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/reward.png


--------------------------------------------------------------------------------
/outputs/reward_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/reward_1.png


--------------------------------------------------------------------------------
/outputs/reward_sensors_1_targets_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/reward_sensors_1_targets_1.png


--------------------------------------------------------------------------------
/outputs/reward_sensors_1_targets_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/reward_sensors_1_targets_2.png


--------------------------------------------------------------------------------
/outputs/reward_sensors_1_targets_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/reward_sensors_1_targets_3.png


--------------------------------------------------------------------------------
/outputs/reward_sensors_1_targets_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/reward_sensors_1_targets_4.png


--------------------------------------------------------------------------------
/outputs/sample.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/sample.png


--------------------------------------------------------------------------------
/outputs/sample1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/outputs/sample1.png


--------------------------------------------------------------------------------
/scripts/__pycache__/utils.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arp95/robot_target_tracking/662f2559893866c73717874ad8e2e5cfee78c731/scripts/__pycache__/utils.cpython-37.pyc


--------------------------------------------------------------------------------
/scripts/bh_main.py:
--------------------------------------------------------------------------------
 1 | # header files needed
 2 | import numpy as np
 3 | import matplotlib.pyplot as plt
 4 | import matplotlib.cm as cm
 5 | from scipy.stats import multivariate_normal
 6 | from matplotlib.patches import Ellipse
 7 | from utils import *
 8 | 
 9 | 
10 | # user-defined variables
11 | x_true = 12
12 | y_true = 12
13 | robots_x = [1]
14 | robots_y = [1]
15 | robots_id = [1]
16 | map_height = 20
17 | map_width = 20
18 | stepsize_map = 0.1
19 | sigma_bayesian_hist = 1.0
20 | action_radius = 2
21 | 
22 | # variables
23 | x_mesh, y_mesh = np.mgrid[0:map_height:stepsize_map, 0:map_width:stepsize_map]
24 | belief_map = np.ones((int(map_height / stepsize_map), int(map_width / stepsize_map)))
25 | robot_movement_x = []
26 | robot_movement_y = []
27 | true_target_x = []
28 | true_target_y = []
29 | true_target_x.append(x_true)
30 | true_target_y.append(y_true)
31 | robot_movement_x.append(robots_x[0])
32 | robot_movement_y.append(robots_y[0])
33 | prev_target_x = x_true
34 | prev_target_y = y_true
35 | prev_robot_x = robots_x[0]
36 | prev_robot_y = robots_y[0]
37 | 
38 | # plotting for t=1
39 | bayesian_hist = compute_bayesian_histogram([x_true], [y_true], robots_x[0], robots_y[0], int(belief_map.shape[0]), int(belief_map.shape[1]), stepsize_map, sigma_bayesian_hist)
40 | belief_map = belief_map * bayesian_hist
41 | belief_map = belief_map / belief_map.sum()
42 | render(1, x_mesh, y_mesh, belief_map, true_target_x, true_target_y, robot_movement_x, robot_movement_y)
43 | 
44 | 
45 | # estimate target position after each time step
46 | for t in range(2, 200):
47 | 
48 |     # update target position and add target to list
49 |     x_true, y_true = get_target_position(t, x_true, y_true)
50 |     true_target_x.append(x_true)
51 |     true_target_y.append(y_true)
52 | 
53 |     # compute bayesian histogram and update belief map
54 |     bayesian_hist = compute_bayesian_histogram([x_true], [y_true], robots_x[0], robots_y[0], int(belief_map.shape[0]), int(belief_map.shape[1]), stepsize_map)
55 |     belief_map = belief_map * bayesian_hist
56 |     belief_map = belief_map / belief_map.sum()
57 |     
58 |     # plot map
59 |     render(t, x_mesh, y_mesh, belief_map, true_target_x, true_target_y, robot_movement_x, robot_movement_y)
60 | 
61 |     # update robot position
62 |     #next_robot_x, next_robot_y = update_robot_pos(robots_x[0], robots_y[0], x_true, y_true, prev_target_x, prev_target_y, action_radius, map_height, map_width, prev_robot_x, prev_robot_y)
63 | 
64 |     # add robot position for rendering
65 |     prev_target_x = x_true
66 |     prev_target_y = y_true
67 |     prev_robot_x = robots_x[0]
68 |     prev_robot_y = robots_y[0]
69 |     #robots_x[0] = next_robot_x
70 |     #robots_y[0] = next_robot_y
71 |     robot_movement_x.append(robots_x[0])
72 |     robot_movement_y.append(robots_y[0])
73 | 


--------------------------------------------------------------------------------
/scripts/ekf_main.py:
--------------------------------------------------------------------------------
 1 | # header files needed
 2 | import numpy as np
 3 | import matplotlib.pyplot as plt
 4 | import matplotlib.cm as cm
 5 | from scipy.stats import multivariate_normal
 6 | from matplotlib.patches import Ellipse
 7 | from utils import *
 8 | 
 9 | 
10 | # user-defined variables
11 | mean_1 = np.asarray([10, 10])
12 | x_true_1 = 10
13 | y_true_1 = 10
14 | var_1 = [[1, 0], [0, 1]]
15 | init_pos_1 = [8, 10]
16 | mean_2 = np.asarray([12, 5])
17 | x_true_2 = 12
18 | y_true_2 = 5
19 | var_2 = [[1, 0], [0, 1]]
20 | init_pos_2 = [10, 5]
21 | robots_x = [12.5]
22 | robots_y = [2.5]
23 | robots_id = [1]
24 | map_height = 20
25 | map_width = 20
26 | stepsize_map = 0.1
27 | action_radius = 1
28 | 
29 | # variables
30 | x_g, y_g = np.mgrid[0:map_height:stepsize_map, 0:map_width:stepsize_map]
31 | points = np.dstack((x_g, y_g))
32 | robot_movement_x = [robots_x[0]]
33 | robot_movement_y = [robots_y[0]]
34 | true_target_x_1 = [x_true_1]
35 | true_target_y_1 = [y_true_1]
36 | prev_target_x_1 = x_true_1
37 | prev_target_y_1 = y_true_1
38 | true_target_x_2 = [x_true_2]
39 | true_target_y_2 = [y_true_2]
40 | prev_target_x_2 = x_true_2
41 | prev_target_y_2 = y_true_2
42 | prev_robot_x = robots_x[0]
43 | prev_robot_y = robots_y[0]
44 | 
45 | 
46 | # estimate target position after each time step
47 | net_val = 0
48 | avg_val = 0
49 | for t in range(2, 200):
50 | 
51 |     # update target position
52 |     target_x_mean_1, target_y_mean_1, var_1, x_true_1, y_true_1 = extended_kalman_filter(mean_1[0], mean_1[1], var_1, robots_x, robots_y, robots_id, t, init_pos_1[0], init_pos_1[1], 0)
53 |     mean_1 = np.asarray([target_x_mean_1, target_y_mean_1])
54 |     target_x_mean_2, target_y_mean_2, var_2, x_true_2, y_true_2 = extended_kalman_filter(mean_2[0], mean_2[1], var_2, robots_x, robots_y, robots_id, t, init_pos_2[0], init_pos_2[1], 1)
55 |     mean_2 = np.asarray([target_x_mean_2, target_y_mean_2])
56 | 
57 |     # update true target position
58 |     true_target_x_1.append(x_true_1)
59 |     true_target_y_1.append(y_true_1)
60 |     true_target_x_2.append(x_true_2)
61 |     true_target_y_2.append(y_true_2)
62 | 
63 |     # add robot position for rendering
64 |     robot_movement_x.append(robots_x[0])
65 |     robot_movement_y.append(robots_y[0])
66 | 
67 |     # update belief map
68 |     #gauss = multivariate_normal(mean, var)
69 |     #prob_ekf = gauss.pdf(points)
70 |     #scale = 2, 2
71 |     #x, y = get_correlated_dataset(500, var, (mean[0], mean[1]), scale)
72 |     
73 |     # plot map
74 |     render_ekf([target_x_mean_1, target_y_mean_1], [target_x_mean_2, target_y_mean_2], var_1, var_2, t, true_target_x_1, true_target_y_1,  true_target_x_2, true_target_y_2, robot_movement_x, robot_movement_y)
75 |     #plot_ellipse(x, y, mean, true_target_x, true_target_y, target_x_mean, target_y_mean, "/home/arpitdec5/Desktop/robot_target_tracking/s2/" + str(t) + ".png", robots_x[0], robots_y[0], robot_movement_x, robot_movement_y)
76 | 
77 |     # update robot position
78 |     next_robot_x, next_robot_y, val = update_robot_pos_ekf(robots_x[0], robots_y[0], [target_x_mean_1, target_x_mean_2], [target_y_mean_1, target_y_mean_2], [var_1, var_2], [prev_target_x_1, prev_target_x_2], [prev_target_y_1, prev_target_y_2], action_radius, map_height, map_width, t+1, prev_robot_x, prev_robot_y)
79 |     net_val = np.linalg.det(var_1) + np.linalg.det(var_2)
80 |     avg_val += net_val
81 |     prev_robot_x = robots_x[0]
82 |     prev_robot_y = robots_y[0]
83 |     robots_x[0] = next_robot_x
84 |     robots_y[0] = next_robot_y
85 |     prev_target_x_1 = target_x_mean_1
86 |     prev_target_y_1 = target_y_mean_1
87 |     prev_target_x_2 = target_x_mean_2
88 |     prev_target_y_2 = target_y_mean_2
89 | 
90 | print(net_val)
91 | print(avg_val/200)
92 | print("Done")
93 | 


--------------------------------------------------------------------------------
/scripts/qlearning.py:
--------------------------------------------------------------------------------
  1 | # header files
  2 | import gym
  3 | import math
  4 | import random
  5 | import numpy as np
  6 | import matplotlib
  7 | import matplotlib.pyplot as plt
  8 | from collections import namedtuple
  9 | from itertools import count
 10 | from PIL import Image
 11 | from gym.envs.registration import register
 12 | import logging
 13 | import torch
 14 | import torch.nn as nn
 15 | import torch.optim as optim
 16 | import torch.nn.functional as F
 17 | import torchvision.transforms as T
 18 | from td3 import *
 19 | from replay_buffer import *
 20 | 
 21 | 
 22 | logger = logging.getLogger(__name__)
 23 | register(
 24 |     id='RobotTargetTrackingEnv-v0',
 25 |     entry_point='envs:RobotTargetTrackingEnvInterface',
 26 | )
 27 | 
 28 | # initialise environment
 29 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 30 | env_name = "RobotTargetTrackingEnv-v0"
 31 | env = gym.make(env_name)
 32 | env.env_parametrization()
 33 | env.reset()
 34 | 
 35 | 
 36 | # constants
 37 | LR = 0.1
 38 | DISCOUNT = 0.95
 39 | epsilon = 0.5
 40 | EPOCHS = 1
 41 | START_EPSILON_DECAYING = 1
 42 | END_EPSILON_DECAYING = EPOCHS // 2
 43 | epsilon_decay = epsilon / (END_EPSILON_DECAYING - START_EPSILON_DECAYING)
 44 | 
 45 | # train qlearning
 46 | q_table = np.random.uniform(low=-1, high=1, size=(11, 11, 20, 8))
 47 | epoch_reward = 0
 48 | e = []
 49 | r = []
 50 | 
 51 | def get_state(state):
 52 |     new_state = np.array([0, 0, 0])
 53 |     if(state[0]%2 == 0):
 54 |         new_state[0] = int(state[0] / 2.0)
 55 |     else:
 56 |         new_state[0] = int((state[0] - 1) / 2.0) + 1
 57 | 
 58 |     if(state[1]%2 == 0):
 59 |         new_state[1] = int(state[1] / 2.0)
 60 |     else:
 61 |         new_state[1] = int((state[1] - 1) / 2.0) + 1
 62 | 
 63 |     if(state[2]%3 == 0):
 64 |         new_state[2] = int(state[2] / 3.0)
 65 |     else:
 66 |         new_state[2] = int((state[2] - 1) / 3.0) + 1
 67 |     return new_state  
 68 | 
 69 | 
 70 | for epoch in range(0, EPOCHS):
 71 |     discrete_state = env.reset()
 72 |     state = get_state(discrete_state.numpy())
 73 |     done = False
 74 |     while not done:
 75 |         if(np.random.random() > epsilon):
 76 |             action = np.argmax(q_table[(state[0], state[1], state[2])])
 77 |         else:
 78 |             action = np.random.randint(0, 8)
 79 |         action_index = action
 80 |         if(action == 0):
 81 |             action = [0]
 82 |         elif(action == 1):
 83 |             action = [np.pi / 4]
 84 |         elif(action == 2):
 85 |             action = [np.pi / 2]
 86 |         elif(action == 3):
 87 |             action = [3*np.pi / 4]
 88 |         elif(action == 4):
 89 |             action = [np.pi]
 90 |         elif(action == 5):
 91 |             action = [225*np.pi / 180]
 92 |         elif(action == 6):
 93 |             action = [3*np.pi / 2]
 94 |         else:
 95 |             action = [315*np.pi / 180]
 96 | 
 97 |         new_state, reward, done, _ = env.step(action)
 98 |         epoch_reward += reward
 99 |         new_state = get_state(new_state.numpy())
100 |         
101 |         env.render()
102 | 
103 |         #if(epoch%100 == 0):
104 |         #    env.render()
105 |         
106 |         if not done:
107 |             max_future_q = np.max(q_table[(new_state[0], new_state[1], new_state[2])])
108 |             current_q = q_table[(state[0], state[1], state[2], action_index)]        
109 |             new_q = ((1 - LR) * current_q) + (LR * (reward + DISCOUNT * max_future_q))
110 |             q_table[(state[0], state[1], state[2], action_index)] = new_q
111 |         state = new_state
112 | 
113 |     if(END_EPSILON_DECAYING >= epoch >= START_EPSILON_DECAYING):
114 |         epsilon -= epsilon_decay
115 |     if(epoch%1000 == 0):
116 |         e.append(epoch)
117 |         r.append(epoch_reward)
118 |         print("Epoch Reward:")
119 |         print(epoch_reward)
120 |         print()
121 |     epoch_reward = 0
122 |     env.close()
123 | 
124 | # plot reward curve
125 | plt.xlabel("Episodes")
126 | plt.ylabel("Avg. Reward")
127 | plt.plot(e, r)
128 | plt.savefig("/home/arpitdec5/Desktop/qlearning_reward.png")
129 | 
130 | 
131 | # test qlearning
132 | #discrete_state = env.reset()
133 | #state = get_state(discrete_state.numpy())
134 | #done = False
135 | #while not done:
136 | #    action = np.argmax(q_table[(state[0], state[1], state[2])])
137 | #    action_index = action
138 | #    if(action == 0):
139 | #        action = [0]
140 | #    elif(action == 1):
141 | #        action = [np.pi / 4]
142 | #    elif(action == 2):
143 | #        action = [np.pi / 2]
144 | #    elif(action == 3):
145 | #        action = [3*np.pi / 4]
146 | #    elif(action == 4):
147 | #        action = [np.pi]
148 | #    elif(action == 5):
149 | #        action = [225*np.pi / 180]
150 | #    elif(action == 6):
151 | #        action = [3*np.pi / 2]
152 | #    else:
153 | #        action = [315*np.pi / 180]
154 | #
155 | #    new_state, reward, done, _ = env.step(action)
156 | #    new_state = get_state(new_state.numpy())
157 | #    state = new_state    
158 | #
159 | #    env.render()
160 | #    env.close()
161 | 


--------------------------------------------------------------------------------
/scripts/utils.py:
--------------------------------------------------------------------------------
  1 | # header files needed
  2 | import numpy as np
  3 | import matplotlib.pyplot as plt
  4 | import matplotlib.cm as cm
  5 | from scipy.stats import multivariate_normal
  6 | from matplotlib.patches import Ellipse
  7 | import matplotlib.transforms as transforms
  8 | from math import pi, cos, sin
  9 | from random import random
 10 | import math
 11 | import torch
 12 | 
 13 | 
 14 | def render_ekf(estimated_targets_mean_1, estimated_targets_mean_2, estimated_targets_var_1, estimated_targets_var_2, time_step, x_list_1, y_list_1, x_list_2, y_list_2, robot_movement_x, robot_movement_y):
 15 |     """ 
 16 |         Function for rendering the environment
 17 |     """
 18 |     heatmap = torch.zeros(256, 256)
 19 |     for index in range(0, 1):
 20 |        x = np.linspace(0, 20, 256)
 21 |        y = np.linspace(0, 20, 256)
 22 |        X, Y = np.meshgrid(x, y)
 23 |        pos = np.empty(X.shape + (2,))
 24 |        pos[:, :, 0] = X; pos[:, :, 1] = Y
 25 |        rv_1 = multivariate_normal(estimated_targets_mean_1, estimated_targets_var_1)
 26 |        rv_2 = multivariate_normal(estimated_targets_mean_2, estimated_targets_var_2)
 27 |        heatmap += rv_1.pdf(pos)
 28 |        heatmap += rv_2.pdf(pos)
 29 |     x = np.linspace(0, 20, 256)
 30 |     y = np.linspace(0, 20, 256)
 31 |     X, Y = np.meshgrid(x, y)
 32 |     plt.cla()
 33 |     plt.title("Time step = " + str(time_step))
 34 |     plt.xlabel("x")
 35 |     plt.ylabel("y")
 36 |     plt.xlim(0, 20)
 37 |     plt.ylim(0, 20)
 38 |     plt.contourf(X, Y, heatmap, cmap=cm.inferno)
 39 |     plt.plot(x_list_1, y_list_1, 'b--')
 40 |     plt.plot(x_list_2, y_list_2, 'b--')
 41 |     plt.plot(x_list_1[len(x_list_1) - 1], y_list_1[len(y_list_1) - 1], 'o', c='b', marker='*')
 42 |     plt.plot(x_list_2[len(x_list_2) - 1], y_list_2[len(y_list_2) - 1], 'o', c='b', marker='*')
 43 |     #plt.plot(self.x2_list, self.y2_list, 'b--')
 44 |     #plt.plot(self.x2_list[len(self.x2_list) - 1], self.y2_list[len(self.y2_list) - 1], 'o', c='b', marker='*')
 45 |     #plt.plot(self.x3_list, self.y3_list, 'b--')
 46 |     #plt.plot(self.x3_list[len(self.x3_list) - 1], self.y3_list[len(self.y3_list) - 1], 'o', c='b', marker='*')
 47 |     #plt.plot(self.x4_list, self.y4_list, 'b--')
 48 |     #plt.plot(self.x4_list[len(self.x4_list) - 1], self.y4_list[len(self.y4_list) - 1], 'o', c='b', marker='*')
 49 |     if(len(robot_movement_x) < 8):
 50 |         plt.plot(robot_movement_x, robot_movement_y, 'r--')
 51 |     else:
 52 |         plt.plot(robot_movement_x[-8:], robot_movement_y[-8:], 'r--')
 53 |     plt.scatter(robot_movement_x[len(robot_movement_x) - 1], robot_movement_y[len(robot_movement_y) - 1], color='r', marker='D')
 54 |     plt.savefig("/home/arpitdec5/Desktop/robot_target_tracking/s1/" + str(time_step) + ".png")
 55 |     #plt.show()
 56 | 
 57 | 
 58 | # plot the heatmap
 59 | def render(t, x_mesh, y_mesh, belief_map, x_target, y_target, robot_movement_x, robot_movement_y):
 60 |     """
 61 |         Inputs:
 62 |         t: time step
 63 |         x_mesh: the x-coordinates
 64 |         y_mesh: the y-coordinates
 65 |         belief_map: the map containing the probabilities
 66 |         x_target: the target x-coordinate
 67 |         y_target: the target y-coordinate
 68 |         robot_movement_x: the list of robot paths x-coordinates
 69 |         robot_movement_y: the list of robot paths y-coordinates
 70 |     """
 71 |     plt.cla()
 72 |     plt.title("Stationary robot(Target moving fast)")
 73 |     plt.xlabel("x")
 74 |     plt.ylabel("y")
 75 |     plt.contourf(x_mesh, y_mesh, belief_map, cmap=cm.inferno)
 76 |     plt.plot(x_target, y_target, 'b--', label='true target motion')
 77 |     plt.plot(x_target[len(x_target) - 1], y_target[len(y_target) - 1], 'o', c='b', marker='*')
 78 |     if(len(robot_movement_x) < 8):
 79 |         plt.plot(robot_movement_x, robot_movement_y, 'r--')
 80 |     else:
 81 |         plt.plot(robot_movement_x[-8:], robot_movement_y[-8:], 'r--')
 82 |     plt.scatter(robot_movement_x[len(robot_movement_x) - 1], robot_movement_y[len(robot_movement_y) - 1], color='r', marker='D', label='robot')
 83 |     plt.savefig("/home/arpitdec5/Desktop/robot_target_tracking/s1/" + str(t) + ".png")
 84 |     #plt.show()
 85 | 
 86 | 
 87 | # compute bayesian histogram for 'm' targets and given robot position
 88 | def compute_bayesian_histogram(targets_x_true, targets_y_true, robot_x, robot_y, belief_map_height, belief_map_width, stepsize_map, sigma_bayesian_hist=1.0):
 89 |     """
 90 |         Inputs:
 91 |         targets_x_true: the true position of target x-coordinate
 92 |         targets_y_true: the true position of target y-coordinate
 93 |         robot_x: the position of robot x-coordinate
 94 |         robot_y: the position of robot y-coordinate 
 95 |         belief_map_height: the environment dimensions, height
 96 |         belief_map_width: the environment dimensions, width
 97 |         stepsize_map: equal to 0.1
 98 |         sigma_bayesian_hist: equal to 1
 99 | 
100 |         Outputs:
101 |         bayesian_hist: the belief map of dimensions (belief_map_height, belief_map_width) containing probabilities      
102 |     """
103 |     noise = sigma_bayesian_hist * np.random.randn(1, 1)
104 |     bayesian_hist = np.zeros((belief_map_height, belief_map_width))
105 |     for index in range(0, len(targets_x_true)):
106 |         estimated = np.sqrt((targets_x_true[index] - robot_x)**2 + (targets_y_true[index] - robot_y)**2) + noise[0][0]
107 |         for index1 in range(0, belief_map_height):
108 |             for index2 in range(0, belief_map_width):
109 |                 true = np.sqrt(((index1*stepsize_map) - robot_x)**2 + ((index2*stepsize_map) - robot_y)**2)
110 |                 bayesian_hist[index1, index2] += 1.0 / (np.sqrt(2 * np.pi * sigma_bayesian_hist**2)) * np.exp(-0.5 / sigma_bayesian_hist**2 * (np.abs(true - estimated)**2))
111 |     return bayesian_hist
112 | 
113 | 
114 | # get target estimate
115 | def get_target_position(t, x_true, y_true):
116 |     """
117 |         Inputs:
118 |         t: time step
119 |         x_true: the true position of target x-coordinate
120 |         y_true: the true position of target y-coordinate
121 | 
122 |         Outputs:
123 |         (x_true, y_true): the target position at next time step      
124 |     """
125 |     omega = 33
126 |     x_true = 2*np.cos((t-1) / omega) + 10
127 |     y_true = 2*np.sin((t-1) / omega) + 12
128 |     return (x_true, y_true)
129 | 
130 | 
131 | # reference: https://matplotlib.org/3.1.0/gallery/statistics/confidence_ellipse.html
132 | def confidence_ellipse(x, y, ax, n_std=3.0, facecolor='none', **kwargs):
133 |     """
134 |         Inputs:
135 |         x: the x-coordinate datapoints
136 |         y: the y-coordinate datapoints
137 |     """
138 |     if x.size != y.size:
139 |         raise ValueError("x and y must be the same size")
140 | 
141 |     cov = np.cov(x, y)
142 |     pearson = cov[0, 1]/np.sqrt(cov[0, 0] * cov[1, 1])
143 |     # Using a special case to obtain the eigenvalues of this
144 |     # two-dimensionl dataset.
145 |     ell_radius_x = np.sqrt(1 + pearson)
146 |     ell_radius_y = np.sqrt(1 - pearson)
147 |     ellipse = Ellipse((0, 0),
148 |         width=ell_radius_x * 2,
149 |         height=ell_radius_y * 2,
150 |         facecolor=facecolor,
151 |         **kwargs)
152 | 
153 |     # Calculating the stdandard deviation of x from
154 |     # the squareroot of the variance and multiplying
155 |     # with the given number of standard deviations.
156 |     scale_x = np.sqrt(cov[0, 0]) * n_std
157 |     mean_x = np.mean(x)
158 | 
159 |     # calculating the stdandard deviation of y ...
160 |     scale_y = np.sqrt(cov[1, 1]) * n_std
161 |     mean_y = np.mean(y)
162 | 
163 |     transf = transforms.Affine2D() \
164 |         .rotate_deg(45) \
165 |         .scale(scale_x, scale_y) \
166 |         .translate(mean_x, mean_y)
167 | 
168 |     ellipse.set_transform(transf + ax.transData)
169 |     return ax.add_patch(ellipse)
170 | 
171 | 
172 | # reference: https://matplotlib.org/3.1.0/gallery/statistics/confidence_ellipse.html
173 | def get_correlated_dataset(n, dependency, mu, scale):
174 |     latent = np.random.randn(n, 2)
175 |     dependent = latent.dot(dependency)
176 |     scaled = dependent * scale
177 |     scaled_with_offset = scaled + mu
178 |     # return x and y of the new, correlated dataset
179 |     return scaled_with_offset[:, 0], scaled_with_offset[:, 1]
180 | 
181 | 
182 | # code for the filter
183 | def extended_kalman_filter(target_xhat_t, target_yhat_t, target_sigma_t, robots_x, robots_y, robots_id, t, x, y, flag, omega, radii):
184 |     """
185 |         Inputs:
186 |         target_xhat_t: the estimated target position x-coordinate
187 |         target_yhat_t: the estimated target position y-coordinate
188 |         robots_x: the position of robots x-coordinate
189 |         robots_y: the position of robots y-coordinate
190 |         robots_id: the ids of robots
191 |         t: the time step
192 | 
193 |         Outputs:
194 |         (target_xhat_tplus1, target_yhat_tplus1, sigma_matrix_tplus1, x_true, y_true): the predicted target position      
195 |     """
196 |     # get z_true using true target motion
197 |     sigma_z = 1.0
198 |     x_true = radii*np.cos((t-1)/omega)+x
199 |     if flag:
200 |         y_true = radii*np.sin((t-1)/omega)+y
201 |     else:
202 |         y_true = -radii*np.sin((t-1) / omega)+y
203 |     noise = sigma_z*np.random.randn(1000, 1)
204 |         
205 |     z_true = np.zeros((len(robots_x), 1))
206 |     for index in range(0, len(robots_x)):
207 |         z_true[index][0] = np.linalg.norm([[robots_x[index] - x_true], [robots_y[index] - y_true]], 2) + noise[robots_id[index]]
208 | 
209 |     # filter code
210 |     q_matrix = 0.2 * np.eye(2)
211 |     x_matrix = np.array([[target_xhat_t], [target_yhat_t]])
212 |     sigma_matrix = target_sigma_t + q_matrix
213 |     
214 |     z_pred = np.zeros((len(robots_x), 1))
215 |     h_matrix = np.zeros((len(robots_x), 2))
216 |     for index in range(0, len(robots_x)):
217 |         z_pred[index][0] = np.linalg.norm([[x_matrix[0][0] - robots_x[index]], [x_matrix[1][0] - robots_y[index]]], 2)
218 |         h_matrix[index][0] = (-1.0 / z_pred[index]) * (robots_x[index] - x_matrix[0][0])
219 |         h_matrix[index][1] = (-1.0 / z_pred[index]) * (robots_y[index] - x_matrix[1][0])
220 |         
221 |     res = (z_true - z_pred)
222 |     r_matrix = sigma_z * sigma_z * np.eye(len(robots_x))
223 |     s_matrix = np.matmul(np.matmul(h_matrix, sigma_matrix), h_matrix.T) + r_matrix
224 |     k_matrix = np.matmul(np.matmul(sigma_matrix, h_matrix.T), np.linalg.inv(s_matrix))    
225 | 
226 |     x_matrix_tplus1 = x_matrix + (np.matmul(k_matrix, res))
227 |     sigma_matrix_tplus1 = np.matmul(np.matmul((np.eye(2) - np.matmul(k_matrix, h_matrix)), sigma_matrix), (np.eye(2) - np.matmul(k_matrix, h_matrix)).T) + np.matmul(np.matmul(k_matrix, r_matrix), k_matrix.T)
228 |     target_xhat_tplus1 = x_matrix_tplus1[0][0]
229 |     target_yhat_tplus1 = x_matrix_tplus1[1][0]
230 |     return (target_xhat_tplus1, target_yhat_tplus1, sigma_matrix_tplus1, x_true, y_true)
231 | 
232 | 
233 | # plot gaussian
234 | def plot_gaussian(gauss):
235 |     x, y = np.mgrid[0:25:100j, 0:25:100j]
236 |     z = np.dstack((x, y))
237 |     plt.contourf(x, y, gauss.pdf(z))
238 |     #plt.show()
239 | 
240 | 
241 | # save gaussian
242 | def save_gaussian(gauss, path):
243 |     x, y = np.mgrid[0:25:100j, 0:25:100j]
244 |     z = np.dstack((x, y))
245 |     plt.contourf(x, y, gauss.pdf(z))
246 |     plt.savefig(path)
247 | 
248 | 
249 | # plot confidence ellipse
250 | def plot_ellipse(x, y, mean, x_list, y_list, target_x_mean, target_y_mean, path, robot_x, robot_y, robot_movement_x, robot_movement_y):
251 |     fig, ax_nstd = plt.subplots(figsize=(6, 6))
252 |     ax_nstd.axvline(c='grey', lw=1)
253 |     ax_nstd.axhline(c='grey', lw=1)
254 |     confidence_ellipse(x, y, ax_nstd, n_std=1, edgecolor='firebrick')
255 |     ax_nstd.scatter(mean[0], mean[1], c='b', s=1)
256 |     ax_nstd.legend()
257 |     plt.title("Greedy algorithm using EKF(Target moving slow)")
258 |     plt.xlabel("x")
259 |     plt.ylabel("y")
260 |     plt.xlim(0, 20)
261 |     plt.ylim(0, 20)
262 |     plt.plot(x_list, y_list, 'b--')
263 |     plt.scatter(x_list[len(x_list)-1], y_list[len(y_list)-1], color='b', marker='*')
264 |     plt.scatter([target_x_mean], [target_y_mean], color='b', marker="s")
265 |     if(len(robot_movement_x) < 8):
266 |         plt.plot(robot_movement_x, robot_movement_y, 'r--')
267 |     else:
268 |         plt.plot(robot_movement_x[-8:], robot_movement_y[-8:], 'r--')
269 |     plt.scatter(robot_x, robot_y, color='r', marker='D')    
270 |     plt.plot([robot_x, target_x_mean], [robot_y, target_y_mean], color='r')
271 |     plt.savefig(path)
272 |     plt.cla()
273 |     plt.close()
274 | 
275 | 
276 | def points_in_circle_np(radius, x0=0, y0=0):
277 |     """
278 |         Inputs:
279 |         radius: the radius of the circular region around the current robot position
280 |         x0: the x-coordinate of the robot position
281 |         y0: the y-coordinate of the robot position
282 | 
283 |         Outputs: 
284 |         points: the action set to be used for deciding the robot trajectory      
285 |     """
286 |     thetas = [0.0, 15.0, 30.0, 45.0, 60.0, 75.0, 90.0, 105.0, 120.0, 135.0, 150.0, 165.0, 180.0, 195.0, 210.0, 225.0, 240.0, 255.0, 270.0, 285.0, 300.0, 315.0, 330.0]
287 |     points = []
288 |     for theta in thetas:
289 |         theta = (theta * pi) / 180.0
290 |         points.append((float(x0 + cos(theta) * radius), float(y0 + sin(theta) * radius)))
291 |     return points
292 | 
293 | 
294 | # choose optimal action
295 | def update_robot_pos_ekf(robot_x, robot_y, target_x, target_y, var, prev_target_x, prev_target_y, radius, map_height, map_width, t, prev_robot_x, prev_robot_y):
296 |     """
297 |         Inputs:
298 |         robot_x: the position of robot x-coordinate
299 |         robot_y: the position of robot y-coordinate 
300 |         target_x: the position of target x-coordinate
301 |         target_y: the position of target y-coordinate
302 |         var: the uncertainty in target position
303 |         prev_target_x: the previous position of target x-coordinate
304 |         prev_target_y: the previous position of target y-coordinate
305 |         radius: the radius of the circular region around the current robot position
306 |         map_height: the environment dimensions, height
307 |         map_width: the environment dimensions, width
308 |         t: the time step
309 | 
310 |         Outputs:
311 |         (best_action[0], best_action[1]): updated robot position     
312 |     """
313 |     action_set = points_in_circle_np(radius, robot_x, robot_y)
314 |     alpha_opt = -1000000
315 |     best_action = (robot_x, robot_y)
316 |     for action in action_set:
317 |         curr_robot_x = action[0]
318 |         curr_robot_y = action[1] 
319 |         val = 0
320 |         for index in range(0, len(target_x)):
321 |             if(curr_robot_x >= 1 and curr_robot_x <= (map_height-1) and curr_robot_y >= 1 and curr_robot_y <= (map_width-1)):
322 |                 t_v_i = ((robot_x - target_x[index])*(robot_x - target_x[index])) + ((robot_y - target_y[index])*(robot_y - target_y[index]))
323 |                 t_v_j = ((curr_robot_x - target_x[index])*(curr_robot_x - target_x[index])) + ((curr_robot_y - target_y[index])*(curr_robot_y - target_y[index]))
324 |                 t_m1 = (target_y[index] - robot_y) / (target_x[index] - robot_x + 1e-9)
325 |                 t_m2 = (target_y[index] - curr_robot_y) / (target_x[index] - curr_robot_x + 1e-9)
326 |                 value = np.abs(t_m1*t_m2 + 1)
327 |                 t_angle1 = np.arctan2(t_m1-t_m2, 1+t_m1*t_m2)
328 |                 t_angle2 = np.arctan2(t_m2-t_m1, 1+t_m1*t_m2)
329 |                 t_val = max((t_v_i*t_v_j*np.sin(t_angle1)*np.sin(t_angle1)), (t_v_i*t_v_j*np.sin(t_angle2)*np.sin(t_angle2)))
330 |                 t1_v_i = ((prev_robot_x - prev_target_x[index])*(prev_robot_x - prev_target_x[index])) + ((prev_robot_y - prev_target_y[index])*(prev_robot_y - prev_target_y[index]))
331 |                 t1_v_j = ((robot_x - prev_target_x[index])*(robot_x - prev_target_x[index])) + ((robot_y - prev_target_y[index])*(robot_y - prev_target_y[index]))
332 |                 t1_m1 = (prev_target_y[index] - prev_robot_y) / (prev_target_x[index] - prev_robot_x + 1e-9)
333 |                 t1_m2 = (prev_target_y[index] - robot_y) / (prev_target_x[index] - robot_x + 1e-9)
334 |                 t1_angle1 = np.arctan2(t1_m1-t1_m2, 1+t1_m1*t1_m2)
335 |                 t1_angle2 = np.arctan2(t1_m2-t1_m1, 1+t1_m1*t1_m2)
336 |                 t1_val = max((t1_v_i*t1_v_j*np.sin(t1_angle1)*np.sin(t1_angle1)), (t1_v_i*t1_v_j*np.sin(t1_angle2)*np.sin(t1_angle2)))
337 |                 t2_v_i = ((prev_robot_x - prev_target_x[index])*(prev_robot_x - prev_target_x[index])) + ((prev_robot_y - prev_target_y[index])*(prev_robot_y - prev_target_y[index]))
338 |                 t2_v_j = ((curr_robot_x - prev_target_x[index])*(curr_robot_x - prev_target_x[index])) + ((curr_robot_y - prev_target_y[index])*(curr_robot_y - prev_target_y[index]))
339 |                 t2_m1 = (prev_target_y[index] - prev_robot_y) / (prev_target_x[index] - prev_robot_x + 1e-9)
340 |                 t2_m2 = (prev_target_y[index] - curr_robot_y) / (prev_target_x[index] - curr_robot_x + 1e-9)
341 |                 t2_angle1 = np.arctan2(t2_m1-t2_m2, 1+t2_m1*t2_m2)
342 |                 t2_angle2 = np.arctan2(t2_m2-t2_m1, 1+t2_m1*t2_m2)
343 |                 t2_val = max((t2_v_i*t2_v_j*np.sin(t2_angle1)*np.sin(t2_angle1)), (t2_v_i*t2_v_j*np.sin(t2_angle2)*np.sin(t2_angle2)))
344 |                 val += t1_val + t2_val + t_val
345 |         if(val > alpha_opt):
346 |            alpha_opt = val
347 |            best_action = action
348 |     return (best_action[0], best_action[1], value)
349 | 
350 | 
351 | # choose optimal action
352 | def update_robot_pos(robot_x, robot_y, target_x, target_y, prev_target_x, prev_target_y, radius, map_height, map_width, prev_robot_x, prev_robot_y):
353 |     """
354 |         Inputs:
355 |         robot_x: the position of robot x-coordinate
356 |         robot_y: the position of robot y-coordinate 
357 |         target_x: the position of target x-coordinate
358 |         target_y: the position of target y-coordinate
359 |         prev_target_x: the previous position of target x-coordinate
360 |         prev_target_y: the previous position of target y-coordinate
361 |         radius: the radius of the circular region around the current robot position
362 |         map_height: the environment dimensions, height
363 |         map_width: the environment dimensions, width
364 | 
365 |         Outputs:
366 |         (best_action[0], best_action[1]): updated robot position
367 |     """
368 |     action_set = points_in_circle_np(radius, robot_x, robot_y)
369 |     alpha_opt = -10000000
370 |     dist_opt = 10000000
371 |     best_action = (0, 0)
372 |     for action in action_set:
373 |         curr_robot_x = action[0]
374 |         curr_robot_y = action[1]
375 |         if(curr_robot_x >= 1 and curr_robot_x < (map_height-1) and curr_robot_y >= 1 and curr_robot_y < (map_width-1)):
376 |             m1 = (prev_target_y - robot_y) / (prev_target_x - robot_x + 1e-8)
377 |             m2 = (target_y - curr_robot_y) / (target_x - curr_robot_x + 1e-8)
378 |             t_v_i = ((robot_x - target_x)*(robot_x - target_x)) + ((robot_y - target_y)*(robot_y - target_y))
379 |             t_v_j = ((curr_robot_x - target_x)*(curr_robot_x - target_x)) + ((curr_robot_y - target_y)*(curr_robot_y - target_y))
380 |             t_m1 = (target_y - robot_y) / (target_x - robot_x + 1e-9)
381 |             t_m2 = (target_y - curr_robot_y) / (target_x - curr_robot_x + 1e-9)
382 |             t_angle1 = np.arctan2(t_m1-t_m2, 1+t_m1*t_m2)
383 |             t_angle2 = np.arctan2(t_m2-t_m1, 1+t_m1*t_m2)
384 |             t_val = max((t_v_i*t_v_j*np.sin(t_angle1)*np.sin(t_angle1)), (t_v_i*t_v_j*np.sin(t_angle2)*np.sin(t_angle2)))
385 |             t1_v_i = ((prev_robot_x - prev_target_x)*(prev_robot_x - prev_target_x)) + ((prev_robot_y - prev_target_y)*(prev_robot_y - prev_target_y))
386 |             t1_v_j = ((robot_x - prev_target_x)*(robot_x - prev_target_x)) + ((robot_y - prev_target_y)*(robot_y - prev_target_y))
387 |             t1_m1 = (prev_target_y - prev_robot_y) / (prev_target_x - prev_robot_x + 1e-9)
388 |             t1_m2 = (prev_target_y - robot_y) / (prev_target_x - robot_x + 1e-9)
389 |             t1_angle1 = np.arctan2(t1_m1-t1_m2, 1+t1_m1*t1_m2)
390 |             t1_angle2 = np.arctan2(t1_m2-t1_m1, 1+t1_m1*t1_m2)
391 |             t1_val = max((t1_v_i*t1_v_j*np.sin(t1_angle1)*np.sin(t1_angle1)), (t1_v_i*t1_v_j*np.sin(t1_angle2)*np.sin(t1_angle2)))
392 |             t2_v_i = ((prev_robot_x - prev_target_x)*(prev_robot_x - prev_target_x)) + ((prev_robot_y - prev_target_y)*(prev_robot_y - prev_target_y))
393 |             t2_v_j = ((curr_robot_x - prev_target_x)*(curr_robot_x - prev_target_x)) + ((curr_robot_y - prev_target_y)*(curr_robot_y - prev_target_y))
394 |             t2_m1 = (prev_target_y - prev_robot_y) / (prev_target_x - prev_robot_x + 1e-9)
395 |             t2_m2 = (prev_target_y - curr_robot_y) / (prev_target_x - curr_robot_x + 1e-9)
396 |             t2_angle1 = np.arctan2(t2_m1-t2_m2, 1+t2_m1*t2_m2)
397 |             t2_angle2 = np.arctan2(t2_m2-t2_m1, 1+t2_m1*t2_m2)
398 |             t2_val = max((t2_v_i*t2_v_j*np.sin(t2_angle1)*np.sin(t2_angle1)), (t2_v_i*t2_v_j*np.sin(t2_angle2)*np.sin(t2_angle2)))
399 |             val = t1_val + t2_val + t_val
400 |             if(val > alpha_opt):
401 |                 alpha_opt = val
402 |                 best_action = action
403 |     return (best_action[0], best_action[1])
404 | 


--------------------------------------------------------------------------------