├── GOT-10k
│   └── reports_31
│       └── OTB2015
│           └── SiamFC
│               ├── performance.json
│               ├── precision_plots.png
│               └── success_plots.png
├── README.md
├── config.py
├── img
│   ├── GOT-10k dataset.jpg
│   ├── SiamFusion.png
│   ├── results.png
│   ├── results_for_31.jpg
│   └── save model.jpg
├── pairwise.py
├── siamfc.py
├── test.py
└── train.py
/GOT-10k/reports_31/OTB2015/SiamFC/precision_plots.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arbitularov/SiamFusion/d7e2459cd659096a46f8e45e326961a3f4ebac89/GOT-10k/reports_31/OTB2015/SiamFC/precision_plots.png
--------------------------------------------------------------------------------
/GOT-10k/reports_31/OTB2015/SiamFC/success_plots.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arbitularov/SiamFusion/d7e2459cd659096a46f8e45e326961a3f4ebac89/GOT-10k/reports_31/OTB2015/SiamFC/success_plots.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # SiamFusion PyTorch implementation
2 | ## Introduction
3 | This is my thesis project on visual object tracking.
4 |
5 | **SiamFusion architecture**
6 |
14 | ## How to Run - Training
15 | 1. **Prerequisites:** The project was built with **Python 3.6** and tested on Ubuntu 18.04 and 16.04 with a **GTX 1080 Ti**. It also requires [PyTorch 0.4.1](https://pytorch.org/) and the `got10k` toolkit, plus NumPy, OpenCV and Pillow (see the imports in the scripts). A quick environment check is sketched below.
16 |
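To confirm the environment provides everything the scripts import, a minimal check (just a sketch; the version printed is the one assumed above):

```
import torch, torchvision, cv2, PIL, numpy, got10k   # everything the scripts import
print(torch.__version__, torch.cuda.is_available())  # e.g. 0.4.1 and True on a GPU machine
```
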
17 | 2. Download the **GOT-10k** dataset from http://got-10k.aitestunion.com/downloads and extract it to a folder of your choice; in my case it is `/home/arbi/desktop/GOT-10k`. (Note: data is read at execution time, so if possible extract the dataset to an SSD partition.) A quick loading check is sketched after step 3.
18 |
26 | 3. Download the ImageNet VID dataset from http://bvisionweb1.cs.unc.edu/ILSVRC2017/download-videos-1p39.php and extract it to a folder of your choice (note: data is read at execution time, so if possible extract the dataset to an SSD partition). You can discard the test part of the dataset, since it has no annotations.
27 |
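To verify that the extracted datasets can be read by the `got10k` toolkit that `train.py` relies on, a minimal sketch (the paths are the example paths from step 4 below; replace them with your own):

```
from got10k.datasets import GOT10k, ImageNetVID

seq_dataset = GOT10k('/home/arbi/desktop/GOT-10k', subset='val')              # as in train.py
# seq_dataset = ImageNetVID('/home/arbi/desktop/VID', subset=('train', 'val'))
print(len(seq_dataset), 'sequences')
img_files, anno = seq_dataset[0]   # frame paths and [x, y, w, h] annotations of one sequence
print(img_files[0], anno[0])
```
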
28 | 4. In the **config.py** script, change `root_dir_for_GOT_10k`, `root_dir_for_VID` and `root_dir_for_OTB` to your directories:
29 | ```
30 | root_dir_for_GOT_10k = '/home/arbi/desktop/GOT-10k' <-- change to your directory
31 | root_dir_for_VID = '/home/arbi/desktop/VID' <-- change to your directory
32 | root_dir_for_OTB = '/home/arbi/desktop/OTB2015' <-- change to your directory
33 | ```
34 |
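Optionally, a small sketch (not part of the repo) to check that the three roots exist before launching training:

```
import os
from config import config

for name in ('root_dir_for_GOT_10k', 'root_dir_for_VID', 'root_dir_for_OTB'):
    path = getattr(config, name)
    print(name, path, 'OK' if os.path.isdir(path) else 'MISSING')
```
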
35 | 5. Run the **train.py** script (create the `model` folder first if it does not exist; each epoch saves a checkpoint to `model/model_e<epoch>.pth` and then evaluates it on OTB2015, see the bottom of `train.py`):
36 | ```
37 | python3 train.py
38 | ```
39 |
40 | ## How to Run - Testing
41 | 1. Download the pretrained `model_e31.pth` from [Yandex Disk](https://yadi.sk/d/c-ffSCvtxkdiLw) and put it under `model/model_e31.pth`. A quick single-sequence check is sketched below.
42 |
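Before running the full benchmark, you can sanity-check the downloaded weights on a single sequence. A minimal sketch using `TrackerSiamFC` from `siamfc.py` and the GOT-10k validation split (the sequence index 0 is arbitrary):

```
from PIL import Image
from got10k.datasets import GOT10k
from siamfc import TrackerSiamFC
from config import config

tracker = TrackerSiamFC(net_path='model/model_e31.pth')
img_files, anno = GOT10k(config.root_dir_for_GOT_10k, subset='val')[0]

tracker.init(Image.open(img_files[0]), anno[0])   # ground-truth box of the first frame
for f in img_files[1:]:
    print(tracker.update(Image.open(f)))          # predicted [x, y, w, h] for each frame
```
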
50 | 2. Run the **test.py** script:
51 | ```
52 | python3 test.py
53 | ```
54 |
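Note that `test.py` currently evaluates on a GOT-10k validation set whose path is hard-coded (`/Users/arbi/Desktop/All2`); change it to your own directory. To reproduce the OTB2015 results shown below instead, a sketch based on the block that is commented out in `test.py`:

```
from got10k.experiments import ExperimentOTB
from siamfc import TrackerSiamFC
from config import config

tracker = TrackerSiamFC(net_path='model/model_e31.pth')
experiment = ExperimentOTB(config.root_dir_for_OTB, version=2015,
                           result_dir='dataset/results',
                           report_dir='dataset/reports')
experiment.run(tracker, visualize=True)
experiment.report([tracker.name])
```
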
55 | ## Results - Training
56 | **OTB2015**
57 |
65 | **Results on each epoch**
66 |
74 |
--------------------------------------------------------------------------------
/config.py:
--------------------------------------------------------------------------------
1 |
2 |
3 | class Config(object):
4 |
5 | batch_size = 1
6 | num_workers= 1
7 | epoch_num = 50
8 | show_step = 1
9 |
10 | root_dir_for_GOT_10k = '/Users/arbi/Desktop' # '/Users/arbi/Desktop' '/media/arbi/9132EE0B9756C987/dataset/GOT-10k/full_data' '/home/arbi/desktop/GOT-10k'
11 | root_dir_for_VID = '/home/arbi/desktop/ILSVRC2017_VID' # '/home/arbi/desktop/ILSVRC'
12 | root_dir_for_OTB = '/Users/arbi/Desktop/dataOTB/OTB' #'/Users/arbi/Desktop/dataOTB/OTB' '/media/arbi/9132EE0B9756C987/dataset/OTB2015' '/home/arbi/desktop/data'
13 |
14 | config = Config()
15 |
--------------------------------------------------------------------------------
/img/GOT-10k dataset.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arbitularov/SiamFusion/d7e2459cd659096a46f8e45e326961a3f4ebac89/img/GOT-10k dataset.jpg
--------------------------------------------------------------------------------
/img/SiamFusion.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arbitularov/SiamFusion/d7e2459cd659096a46f8e45e326961a3f4ebac89/img/SiamFusion.png
--------------------------------------------------------------------------------
/img/results.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arbitularov/SiamFusion/d7e2459cd659096a46f8e45e326961a3f4ebac89/img/results.png
--------------------------------------------------------------------------------
/img/results_for_31.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arbitularov/SiamFusion/d7e2459cd659096a46f8e45e326961a3f4ebac89/img/results_for_31.jpg
--------------------------------------------------------------------------------
/img/save model.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arbitularov/SiamFusion/d7e2459cd659096a46f8e45e326961a3f4ebac89/img/save model.jpg
--------------------------------------------------------------------------------
/pairwise.py:
--------------------------------------------------------------------------------
1 | from __future__ import absolute_import, division
2 |
3 | import numpy as np
4 | from collections import namedtuple
5 | from torch.utils.data import Dataset
6 | from torchvision.transforms import Compose, CenterCrop, RandomCrop, ToTensor
7 | from PIL import Image, ImageStat, ImageOps
8 | import cv2
9 | import random
10 |
11 |
12 | class RandomStretch(object):
13 |
14 | def __init__(self, max_stretch=0.05, interpolation='bilinear'):
15 | assert interpolation in ['bilinear', 'bicubic']
16 | self.max_stretch = max_stretch
17 | self.interpolation = interpolation
18 |
19 | def __call__(self, img):
20 | scale = 1.0 + np.random.uniform(
21 | -self.max_stretch, self.max_stretch)
22 | size = np.round(np.array(img.size, float) * scale).astype(int)
23 | if self.interpolation == 'bilinear':
24 | method = Image.BILINEAR
25 | elif self.interpolation == 'bicubic':
26 | method = Image.BICUBIC
27 | return img.resize(tuple(size), method)
28 |
29 |
30 | class Pairwise(Dataset):
31 |
32 | def __init__(self, seq_dataset, **kargs):
33 | super(Pairwise, self).__init__()
34 | self.cfg = self.parse_args(**kargs)
35 |
36 | self.seq_dataset = seq_dataset
37 | self.indices = np.random.permutation(len(seq_dataset))
38 | # augmentation for exemplar and instance images
39 | self.transform_z = Compose([
40 | RandomStretch(max_stretch=0.05),
41 | CenterCrop(self.cfg.instance_sz - 8),
42 | RandomCrop(self.cfg.instance_sz - 2 * 8),
43 | CenterCrop(self.cfg.exemplar_sz),
44 | ToTensor()])
45 | self.transform_x = Compose([
46 | RandomStretch(max_stretch=0.05),
47 | CenterCrop(self.cfg.instance_sz - 8),
48 | RandomCrop(self.cfg.instance_sz - 2 * 8),
49 | ToTensor()])
50 |
51 | def parse_args(self, **kargs):
52 | # default parameters
53 | cfg = {
54 | 'pairs_per_seq': 10,
55 | 'max_dist': 100,
56 | 'exemplar_sz': 127,
57 | 'instance_sz': 255,
58 | 'context': 0.5}
59 |
60 | for key, val in kargs.items():
61 | if key in cfg:
62 | cfg.update({key: val})
63 | return namedtuple('GenericDict', cfg.keys())(**cfg)
64 |
65 | def __getitem__(self, index):
66 | index = self.indices[index % len(self.seq_dataset)]
67 | img_files, anno = self.seq_dataset[index]
68 |
69 | # remove too small objects
70 | valid = anno[:, 2:].prod(axis=1) >= 10
71 | img_files = np.array(img_files)[valid]
72 | anno = anno[valid, :]
73 |
74 | rand_z, rand_x = self._sample_pair(len(img_files))
75 |
76 | exemplar_image = Image.open(img_files[rand_z])
77 | exemplar_img = self._crop_and_resize(exemplar_image, anno[rand_z])
78 | exemplar_image = 255.0 * self.transform_z(exemplar_img)
79 |
80 | exemplar_noise = self.sp_noise(exemplar_img, 0.05)
81 | exemplar_noise = 255.0 * self.transform_z(exemplar_noise)
82 |
83 | instance_image = Image.open(img_files[rand_x])
84 | instance_img = self._crop_and_resize(instance_image, anno[rand_x])
85 | instance_image = 255.0 * self.transform_x(instance_img)
86 |
87 | instance_noise = self.sp_noise(instance_img, 0.05)
88 | instance_noise = 255.0 * self.transform_x(instance_noise)
89 |
90 | return exemplar_image, exemplar_noise, instance_image, instance_noise
91 |
92 | def __len__(self):
93 | return self.cfg.pairs_per_seq * len(self.seq_dataset)
94 |
95 | def sp_noise(self, image, prob):
96 | '''
97 | Add salt and pepper noise to image
98 | prob: Probability of the noise
99 | '''
100 | image = np.array(image)
101 | output = np.zeros(image.shape,np.uint8)
102 | thres = 1 - prob
103 | for i in range(image.shape[0]):
104 | for j in range(image.shape[1]):
105 | rdn = random.random()
106 | if rdn < prob:
107 | output[i][j] = 0
108 | elif rdn > thres:
109 | output[i][j] = 255
110 | else:
111 | output[i][j] = image[i][j]
112 | cv2.imwrite("cv.png", output)
113 | output = Image.fromarray(output)
114 | return output
115 |
116 | def _sample_pair(self, n):
117 | assert n > 0
118 | if n == 1:
119 | return 0, 0
120 | elif n == 2:
121 | return 0, 1
122 | else:
123 | max_dist = min(n - 1, self.cfg.max_dist)
124 | rand_dist = np.random.choice(max_dist) + 1
125 | rand_z = np.random.choice(n - rand_dist)
126 | rand_x = rand_z + rand_dist
127 |
128 | return rand_z, rand_x
129 |
130 | def _crop_and_resize(self, image, box):
131 | # convert box to 0-indexed and center based
132 | box = np.array([
133 | box[0] - 1 + (box[2] - 1) / 2,
134 | box[1] - 1 + (box[3] - 1) / 2,
135 | box[2], box[3]], dtype=np.float32)
136 | center, target_sz = box[:2], box[2:]
137 |
138 | # exemplar and search sizes
139 | context = self.cfg.context * np.sum(target_sz)
140 | z_sz = np.sqrt(np.prod(target_sz + context))
141 | x_sz = z_sz * self.cfg.instance_sz / self.cfg.exemplar_sz
142 |
143 | # convert box to corners (0-indexed)
144 | size = round(x_sz)
145 | corners = np.concatenate((
146 | np.round(center - (size - 1) / 2),
147 | np.round(center - (size - 1) / 2) + size))
148 | corners = np.round(corners).astype(int)
149 |
150 | # pad image if necessary
151 | pads = np.concatenate((
152 | -corners[:2], corners[2:] - image.size))
153 | npad = max(0, int(pads.max()))
154 | if npad > 0:
155 | avg_color = ImageStat.Stat(image).mean
156 | # PIL doesn't support float RGB image
157 | avg_color = tuple(int(round(c)) for c in avg_color)
158 | image = ImageOps.expand(image, border=npad, fill=avg_color)
159 |
160 | # crop image patch
161 | corners = tuple((corners + npad).astype(int))
162 | patch = image.crop(corners)
163 |
164 | # resize to instance_sz
165 | out_size = (self.cfg.instance_sz, self.cfg.instance_sz)
166 | patch = patch.resize(out_size, Image.BILINEAR)
167 | #print("patch",patch)
168 |
169 | return patch
170 |
--------------------------------------------------------------------------------
/siamfc.py:
--------------------------------------------------------------------------------
1 | from __future__ import absolute_import, division
2 |
3 | import torch
4 | import torch.nn as nn
5 | import torch.nn.init as init
6 | import torch.nn.functional as F
7 | import torch.optim as optim
8 | import numpy as np
9 | import cv2
10 | from collections import namedtuple
11 | from torch.optim.lr_scheduler import ExponentialLR
12 |
13 | from got10k.trackers import Tracker
14 |
15 |
16 | class SiamFC(nn.Module):
17 |
18 | def __init__(self):
19 | super(SiamFC, self).__init__()
20 |
21 |
22 | self.feature1 = nn.Sequential(
23 | # conv1
24 | nn.Conv2d(3, 192, 11, 2),
25 | nn.BatchNorm2d(192, eps=1e-6, momentum=0.05),
26 | nn.ReLU(inplace=True),
27 | nn.MaxPool2d(3, 2))
28 |
29 | self.feature2 = nn.Sequential(
30 | # conv1
31 | nn.Conv2d(3, 192, 11, 2),
32 | nn.BatchNorm2d(192, eps=1e-6, momentum=0.05),
33 | nn.ReLU(inplace=True),
34 | nn.MaxPool2d(3, 2))
35 |
36 | self.feature3 = nn.Sequential(
37 | # conv2
38 | nn.Conv2d(192, 256, 3, 1),
39 | nn.Conv2d(256, 256, 3, 1),
40 | nn.BatchNorm2d(256, eps=1e-6, momentum=0.05),
41 | nn.ReLU(inplace=True),
42 | nn.MaxPool2d(3, 2),
43 | # conv3
44 | nn.Conv2d(256, 512, 3, 1),
45 | nn.BatchNorm2d(512, eps=1e-6, momentum=0.05),
46 | nn.ReLU(inplace=True),
47 | nn.Conv2d(512, 512, 3, 1),
48 | nn.BatchNorm2d(512, eps=1e-6, momentum=0.05),
49 | nn.ReLU(inplace=True),
50 | nn.Conv2d(512, 384, 3, 1),
51 |
52 | )
53 |
54 | self._initialize_weights()
55 |
56 | def forward(self, z, z_noise, x, x_noise):
57 |
58 | z = self.feature1(z)
59 | z_noise = self.feature2(z_noise)
60 | z = torch.add(z, z_noise)
61 | z = self.feature3(z)
62 |
63 | x = self.feature1(x)
64 | x_noise = self.feature2(x_noise)
65 | x = torch.add(x, x_noise)
66 | x = self.feature3(x)
67 |
68 | # fast cross correlation
69 | out = self.fast_cross(x, z, "out", 0.1)
70 | #out = torch.add(cross4, out)
71 |
72 | # adjust the scale of responses
73 | out = 0.001 * out + 0.0
74 |
75 | return out
76 |
77 | def _initialize_weights(self):
78 | for m in self.modules():
79 | if isinstance(m, nn.Conv2d):
80 | init.kaiming_normal_(m.weight.data, mode='fan_out',
81 | nonlinearity='relu')
82 | m.bias.data.fill_(0)
83 | elif isinstance(m, nn.BatchNorm2d):
84 | m.weight.data.fill_(1)
85 | m.bias.data.zero_()
86 |
87 | def fast_cross(self, x, z, img_name = None, num = 0.001):
88 | n, c, h, w = x.size()
89 | out = F.conv2d(x.view(1, n * c, h, w), z, groups=n)
90 | out = out.view(n, 1, out.size(-2), out.size(-1))
91 |
92 |         if img_name is not None:
93 | out_img = num * out + 0.0
94 | out_img = np.asarray(out_img[0].permute(1,2,0).detach().cpu())
95 | cv2.imwrite("{}.png".format(img_name), out_img)
96 |
97 | return out
98 |
99 |
100 |
101 | class TrackerSiamFC(Tracker):
102 |
103 | def __init__(self, net_path=None, **kargs):
104 | super(TrackerSiamFC, self).__init__(
105 | name='SiamFC', is_deterministic=True)
106 | self.cfg = self.parse_args(**kargs)
107 |
108 | # setup GPU device if available
109 | self.cuda = torch.cuda.is_available()
110 | self.device = torch.device('cuda:0' if self.cuda else 'cpu')
111 |
112 | # setup model
113 | self.net = SiamFC()
114 | if net_path is not None:
115 | self.net.load_state_dict(torch.load(
116 | net_path, map_location=lambda storage, loc: storage))
117 | self.net = self.net.to(self.device)
118 |
119 | # setup optimizer
120 | self.optimizer = optim.SGD(
121 | self.net.parameters(),
122 | lr=self.cfg.initial_lr,
123 | weight_decay=self.cfg.weight_decay,
124 | momentum=self.cfg.momentum)
125 |
126 | # setup lr scheduler
127 | self.lr_scheduler = ExponentialLR(
128 | self.optimizer, gamma=self.cfg.lr_decay)
129 |
130 | def parse_args(self, **kargs):
131 | # default parameters
132 | cfg = {
133 | # inference parameters
134 | 'exemplar_sz': 127,
135 | 'instance_sz': 255,
136 | 'context': 0.5,
137 | 'scale_num': 3,
138 | 'scale_step': 1.0375,
139 | 'scale_lr': 0.59,
140 | 'scale_penalty': 0.9745,
141 | 'window_influence': 0.176,
142 | 'response_sz': 17,
143 | 'response_up': 16,
144 | 'total_stride': 8,
145 | 'adjust_scale': 0.001,
146 | # train parameters
147 | 'initial_lr': 0.01,
148 | 'lr_decay': 0.8685113737513527,
149 | 'weight_decay': 5e-4,
150 | 'momentum': 0.9,
151 | 'r_pos': 16,
152 | 'r_neg': 0}
153 |
154 | for key, val in kargs.items():
155 | if key in cfg:
156 | cfg.update({key: val})
157 | return namedtuple('GenericDict', cfg.keys())(**cfg)
158 |
159 | def init(self, image, box):
160 | image = np.asarray(image)
161 |
162 | # convert box to 0-indexed and center based [y, x, h, w]
163 | box = np.array([
164 | box[1] - 1 + (box[3] - 1) / 2,
165 | box[0] - 1 + (box[2] - 1) / 2,
166 | box[3], box[2]], dtype=np.float32)
167 | self.center, self.target_sz = box[:2], box[2:]
168 |
169 | # create hanning window
170 | self.upscale_sz = self.cfg.response_up * self.cfg.response_sz
171 | self.hann_window = np.outer(
172 | np.hanning(self.upscale_sz),
173 | np.hanning(self.upscale_sz))
174 | self.hann_window /= self.hann_window.sum()
175 |
176 | # search scale factors
177 | self.scale_factors = self.cfg.scale_step ** np.linspace(
178 | -(self.cfg.scale_num // 2),
179 | self.cfg.scale_num // 2, self.cfg.scale_num)
180 |
181 | # exemplar and search sizes
182 | context = self.cfg.context * np.sum(self.target_sz)
183 | self.z_sz = np.sqrt(np.prod(self.target_sz + context))
184 | self.x_sz = self.z_sz * \
185 | self.cfg.instance_sz / self.cfg.exemplar_sz
186 |
187 | # exemplar image
188 | self.avg_color = np.mean(image, axis=(0, 1))
189 | exemplar_image = self._crop_and_resize(
190 | image, self.center, self.z_sz,
191 | out_size=self.cfg.exemplar_sz,
192 | pad_color=self.avg_color)
193 |
194 | # exemplar features
195 | exemplar_image = torch.from_numpy(exemplar_image).to(
196 | self.device).permute([2, 0, 1]).unsqueeze(0).float()
197 | with torch.set_grad_enabled(False):
198 | self.net.eval()
199 |
200 | z = self.net.feature1(exemplar_image)
201 | z_noise = self.net.feature2(exemplar_image)
202 | z = torch.add(z, z_noise)
203 | self.kernel = self.net.feature3(z)
204 |
205 | def update(self, image):
206 | image = np.asarray(image)
207 |
208 | # search images
209 | instance_images = [self._crop_and_resize(
210 | image, self.center, self.x_sz * f,
211 | out_size=self.cfg.instance_sz,
212 | pad_color=self.avg_color) for f in self.scale_factors]
213 | instance_images = np.stack(instance_images, axis=0)
214 | instance_images = torch.from_numpy(instance_images).to(
215 | self.device).permute([0, 3, 1, 2]).float()
216 |
217 | # responses
218 | with torch.set_grad_enabled(False):
219 | self.net.eval()
220 |
221 | x = self.net.feature1(instance_images)
222 | x_noise = self.net.feature2(instance_images)
223 | x = torch.add(x, x_noise)
224 | instances = self.net.feature3(x)
225 |
226 | responses = F.conv2d(instances, self.kernel) * 0.001
227 | responses = responses.squeeze(1).cpu().numpy()
228 |
229 | # upsample responses and penalize scale changes
230 | responses = np.stack([cv2.resize(
231 | t, (self.upscale_sz, self.upscale_sz),
232 | interpolation=cv2.INTER_CUBIC) for t in responses], axis=0)
233 | responses[:self.cfg.scale_num // 2] *= self.cfg.scale_penalty
234 | responses[self.cfg.scale_num // 2 + 1:] *= self.cfg.scale_penalty
235 |
236 | # peak scale
237 | scale_id = np.argmax(np.amax(responses, axis=(1, 2)))
238 |
239 | # peak location
240 | response = responses[scale_id]
241 | response -= response.min()
242 | response /= response.sum() + 1e-16
243 | response = (1 - self.cfg.window_influence) * response + \
244 | self.cfg.window_influence * self.hann_window
245 | loc = np.unravel_index(response.argmax(), response.shape)
246 |
247 | # locate target center
248 | disp_in_response = np.array(loc) - self.upscale_sz // 2
249 | disp_in_instance = disp_in_response * \
250 | self.cfg.total_stride / self.cfg.response_up
251 | disp_in_image = disp_in_instance * self.x_sz * \
252 | self.scale_factors[scale_id] / self.cfg.instance_sz
253 | self.center += disp_in_image
254 |
255 | # update target size
256 | scale = (1 - self.cfg.scale_lr) * 1.0 + \
257 | self.cfg.scale_lr * self.scale_factors[scale_id]
258 | self.target_sz *= scale
259 | self.z_sz *= scale
260 | self.x_sz *= scale
261 |
262 | # return 1-indexed and left-top based bounding box
263 | box = np.array([
264 | self.center[1] + 1 - (self.target_sz[1] - 1) / 2,
265 | self.center[0] + 1 - (self.target_sz[0] - 1) / 2,
266 | self.target_sz[1], self.target_sz[0]])
267 |
268 | return box
269 |
270 | def step(self, batch, backward=True, update_lr=False):
271 | if backward:
272 | self.net.train()
273 | if update_lr:
274 | self.lr_scheduler.step()
275 | else:
276 | self.net.eval()
277 |
278 | z = batch[0].to(self.device)
279 | z_noise = batch[1].to(self.device)
280 |
281 | x = batch[2].to(self.device)
282 | x_noise = batch[3].to(self.device)
283 |
284 | with torch.set_grad_enabled(backward):
285 | responses = self.net(z, z_noise, x, x_noise)
286 | labels, weights = self._create_labels(responses.size())
287 | loss = F.binary_cross_entropy_with_logits(
288 | responses, labels, weight=weights, size_average=True)
289 |
290 | if backward:
291 | self.optimizer.zero_grad()
292 | loss.backward()
293 | self.optimizer.step()
294 | print(loss.item())
295 | return loss.item()
296 |
297 | def _crop_and_resize(self, image, center, size, out_size, pad_color):
298 | # convert box to corners (0-indexed)
299 | size = round(size)
300 | corners = np.concatenate((
301 | np.round(center - (size - 1) / 2),
302 | np.round(center - (size - 1) / 2) + size))
303 | corners = np.round(corners).astype(int)
304 |
305 | # pad image if necessary
306 | pads = np.concatenate((
307 | -corners[:2], corners[2:] - image.shape[:2]))
308 | npad = max(0, int(pads.max()))
309 | if npad > 0:
310 | image = cv2.copyMakeBorder(
311 | image, npad, npad, npad, npad,
312 | cv2.BORDER_CONSTANT, value=pad_color)
313 |
314 | # crop image patch
315 | corners = (corners + npad).astype(int)
316 | patch = image[corners[0]:corners[2], corners[1]:corners[3]]
317 |
318 | # resize to out_size
319 | patch = cv2.resize(patch, (out_size, out_size))
320 |
321 | return patch
322 |
323 | def _create_labels(self, size):
324 | # skip if same sized labels already created
325 | if hasattr(self, 'labels') and self.labels.size() == size:
326 | return self.labels, self.weights
327 |
328 | def logistic_labels(x, y, r_pos, r_neg):
329 | dist = np.abs(x) + np.abs(y) # block distance
330 | labels = np.where(dist <= r_pos,
331 | np.ones_like(x),
332 | np.where(dist < r_neg,
333 | np.ones_like(x) * 0.5,
334 | np.zeros_like(x)))
335 | return labels
336 |
337 | # distances along x- and y-axis
338 | n, c, h, w = size
339 | x = np.arange(w) - w // 2
340 | y = np.arange(h) - h // 2
341 | x, y = np.meshgrid(x, y)
342 |
343 | # create logistic labels
344 | r_pos = self.cfg.r_pos / self.cfg.total_stride
345 | r_neg = self.cfg.r_neg / self.cfg.total_stride
346 | labels = logistic_labels(x, y, r_pos, r_neg)
347 |
348 | # pos/neg weights
349 | pos_num = np.sum(labels == 1)
350 | neg_num = np.sum(labels == 0)
351 | weights = np.zeros_like(labels)
352 | weights[labels == 1] = 0.5 / pos_num
353 | weights[labels == 0] = 0.5 / neg_num
354 | weights *= pos_num + neg_num
355 |
356 | # repeat to size
357 | labels = labels.reshape((1, 1, h, w))
358 | weights = weights.reshape((1, 1, h, w))
359 | labels = np.tile(labels, (n, c, 1, 1))
360 | weights = np.tile(weights, [n, c, 1, 1])
361 |
362 | # convert to tensors
363 | self.labels = torch.from_numpy(labels).to(self.device).float()
364 | self.weights = torch.from_numpy(weights).to(self.device).float()
365 |
366 | return self.labels, self.weights
367 |
--------------------------------------------------------------------------------
/test.py:
--------------------------------------------------------------------------------
1 | from __future__ import absolute_import
2 |
3 | from got10k.experiments import *
4 |
5 | from siamfc import TrackerSiamFC
6 | from config import config
7 |
8 |
9 | if __name__ == '__main__':
10 |
11 | # setup tracker
12 | net_path = 'model/model_e31.pth'
13 |
14 | tracker_test = TrackerSiamFC(net_path=net_path)
15 | '''experiments = ExperimentOTB(config.root_dir_for_OTB, version=2015,
16 | result_dir='dataset/results',
17 | report_dir='dataset/reports')'''
18 |
19 | experiments = ExperimentGOT10k('/Users/arbi/Desktop/All2', subset='val',
20 | result_dir='GOT/results',
21 | report_dir='GOT/reports')
22 |
23 | # run tracking experiments and report performance
24 | experiments.run(tracker_test, visualize=True)
25 | experiments.report([tracker_test.name])
26 |
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
1 | from __future__ import absolute_import, print_function
2 |
3 | import os
4 | import sys
5 | import torch
6 | from torch.utils.data import DataLoader
7 |
8 | from got10k.datasets import ImageNetVID, GOT10k
9 | from pairwise import Pairwise
10 | from siamfc import TrackerSiamFC
11 | from got10k.experiments import *
12 |
13 | from config import config
14 |
15 | if __name__ == '__main__':
16 |
17 | # setup dataset
18 | name = 'GOT-10k'
19 | assert name in ['VID', 'GOT-10k', 'All']
20 | if name == 'GOT-10k':
21 | seq_dataset = GOT10k(config.root_dir_for_GOT_10k, subset='val')
22 | pair_dataset = Pairwise(seq_dataset)
23 | elif name == 'VID':
24 | seq_dataset = ImageNetVID(config.root_dir_for_VID, subset=('train', 'val'))
25 | pair_dataset = Pairwise(seq_dataset)
26 | elif name == 'All':
27 | seq_got_dataset = GOT10k(config.root_dir_for_GOT_10k, subset='train')
28 | seq_vid_dataset = ImageNetVID(config.root_dir_for_VID, subset=('train', 'val'))
29 | pair_dataset = Pairwise(seq_got_dataset) + Pairwise(seq_vid_dataset)
30 |
31 | print(len(pair_dataset))
32 |
33 | # setup data loader
34 | cuda = torch.cuda.is_available()
35 | loader = DataLoader(pair_dataset,
36 | batch_size = config.batch_size,
37 | shuffle = True,
38 | pin_memory = cuda,
39 | drop_last = True,
40 | num_workers= config.num_workers)
41 |
42 | # setup tracker
43 | tracker = TrackerSiamFC()
44 |
45 | # training loop
46 | for epoch in range(config.epoch_num):
47 | for step, batch in enumerate(loader):
48 |
49 | loss = tracker.step(batch,
50 | backward=True,
51 | update_lr=(step == 0))
52 |
53 | if step % config.show_step == 0:
54 | print('Epoch [{}][{}/{}]: Loss: {:.3f}'.format( epoch + 1,
55 | step + 1,
56 | len(loader),
57 | loss))
58 | sys.stdout.flush()
59 |
60 | # save checkpoint
61 | net_path = os.path.join('model', 'model_e%d.pth' % (epoch + 1))
62 | torch.save(tracker.net.state_dict(), net_path)
63 |
64 | # test on OTB2015 dataset
65 | tracker_test = TrackerSiamFC(net_path=net_path)
66 | experiments = ExperimentOTB(config.root_dir_for_OTB, version=2015,
67 | result_dir='{}_dataset/results_{}'.format(name, epoch + 1),
68 | report_dir='{}_dataset/reports_{}'.format(name, epoch + 1))
69 |
70 | # run tracking experiments and report performance
71 | experiments.run(tracker_test, visualize=False)
72 | experiments.report([tracker_test.name])
73 |
--------------------------------------------------------------------------------