├── GOT-10k
│   └── reports_31
│       └── OTB2015
│           └── SiamFC
│               ├── performance.json
│               ├── precision_plots.png
│               └── success_plots.png
├── README.md
├── config.py
├── img
│   ├── GOT-10k dataset.jpg
│   ├── SiamFusion.png
│   ├── results.png
│   ├── results_for_31.jpg
│   └── save model.jpg
├── pairwise.py
├── siamfc.py
├── test.py
└── train.py

/GOT-10k/reports_31/OTB2015/SiamFC/precision_plots.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arbitularov/SiamFusion/d7e2459cd659096a46f8e45e326961a3f4ebac89/GOT-10k/reports_31/OTB2015/SiamFC/precision_plots.png
--------------------------------------------------------------------------------
/GOT-10k/reports_31/OTB2015/SiamFC/success_plots.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/arbitularov/SiamFusion/d7e2459cd659096a46f8e45e326961a3f4ebac89/GOT-10k/reports_31/OTB2015/SiamFC/success_plots.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # SiamFusion PyTorch implementation
2 | ## Introduction
3 | This is my thesis project on visual object tracking.
4 | 
5 | **SiamFusion architecture**
6 | *(architecture diagram: `img/SiamFusion.png`)*
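The figure summarizes the idea implemented in `siamfc.py` and `pairwise.py`: each crop is fed to two parallel convolutional stems — `feature1` receives the clean crop, while `feature2` receives a salt-and-pepper-noised copy produced by the `Pairwise` dataset during training (at tracking time both stems receive the same crop) — the two feature maps are summed, pushed through the shared deeper block `feature3`, and the exemplar and search embeddings are combined by cross-correlation into a response map. A condensed sketch of that forward pass; the helper name and the random test tensors are illustrative only:

```
import torch
import torch.nn.functional as F

from siamfc import SiamFC

def fused_embed(net, img, img_noise):
    # clean crop through the first stem, noisy copy through the second,
    # element-wise sum, then the shared deeper block
    return net.feature3(net.feature1(img) + net.feature2(img_noise))

net = SiamFC().eval()
z = torch.randn(1, 3, 127, 127)  # exemplar crop (exemplar_sz = 127)
x = torch.randn(1, 3, 255, 255)  # search crop (instance_sz = 255)

with torch.no_grad():
    z_feat = fused_embed(net, z, z)  # same crop to both stems, as in tracking
    x_feat = fused_embed(net, x, x)
    response = 0.001 * F.conv2d(x_feat, z_feat)  # cross-correlation, scaled as in SiamFC.forward

print(response.shape)  # (1, 1, 17, 17) response map
```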
13 | 
14 | ## How to Run - Training
15 | 1. **Prerequisites:** The project was built with **Python 3.6** and tested on Ubuntu 18.04 and 16.04 with a **GTX 1080 Ti**. It requires [PyTorch 0.4.1](https://pytorch.org/); the scripts also import `torchvision`, `cv2` (OpenCV), `PIL` (Pillow), `numpy` and the `got10k` toolkit, so install those as well.
16 | 
17 | 2. Download the **GOT-10k** dataset from http://got-10k.aitestunion.com/downloads and extract it to a folder of your choice; in my case it is `/home/arbi/desktop/GOT-10k`. (Note: data is read at execution time, so if possible extract the dataset to an SSD partition.)
18 | *(illustration: `img/GOT-10k dataset.jpg`)*
25 | 
26 | 3. Download the ImageNet VID dataset from http://bvisionweb1.cs.unc.edu/ILSVRC2017/download-videos-1p39.php and extract it to a folder of your choice. (Note: data is read at execution time, so if possible extract the dataset to an SSD partition.) You can discard the test split of the dataset, since it has no annotations.
27 | 
28 | 4. In **config.py**, change `root_dir_for_GOT_10k`, `root_dir_for_VID` and `root_dir_for_OTB` to your directories:
29 | ```
30 | root_dir_for_GOT_10k = '/home/arbi/desktop/GOT-10k' <-- change to your directory
31 | root_dir_for_VID = '/home/arbi/desktop/VID' <-- change to your directory
32 | root_dir_for_OTB = '/home/arbi/desktop/OTB2015' <-- change to your directory
33 | ```
34 | 
35 | 5. Run the **train.py** script:
36 | ```
37 | python3 train.py
38 | ```
39 | 
40 | ## How to Run - Testing
41 | 1. Download the pretrained `model_e31.pth` from [Yandex Disk](https://yadi.sk/d/c-ffSCvtxkdiLw) and put the file under `model/model_e31.pth`.
43 | *(illustration: `img/save model.jpg`)*
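Note that, as committed, `test.py` evaluates the model on a GOT-10k validation set at a hard-coded path (`/Users/arbi/Desktop/All2`) and keeps the OTB2015 experiment commented out. To reproduce the OTB2015 results below, swap in the `ExperimentOTB` block before running the next step (the result/report directories are taken from the commented-out code in `test.py`):

```
from got10k.experiments import ExperimentOTB

from siamfc import TrackerSiamFC
from config import config

# evaluate the pretrained epoch-31 model on OTB2015
tracker_test = TrackerSiamFC(net_path='model/model_e31.pth')
experiments = ExperimentOTB(config.root_dir_for_OTB, version=2015,
                            result_dir='dataset/results',
                            report_dir='dataset/reports')
experiments.run(tracker_test, visualize=True)
experiments.report([tracker_test.name])
```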
49 | 50 | 2. Run the **test.py** script: 51 | ``` 52 | python3 test.py 53 | ``` 54 | 55 | ## Results - Training 56 | **OTB2015** 57 |
58 | *(OTB2015 precision and success plots: `img/results.png`)*
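The raw numbers behind these plots are written by the got10k toolkit to `GOT-10k/reports_31/OTB2015/SiamFC/performance.json`. A small sketch for inspecting that report; the exact key layout (`'overall'` etc.) is an assumption about the toolkit's output, not guaranteed:

```
import json

# inspect the OTB2015 report produced by the got10k toolkit
with open('GOT-10k/reports_31/OTB2015/SiamFC/performance.json') as f:
    perf = json.load(f)

for tracker, report in perf.items():
    # 'overall' is an assumed key in the report layout
    print(tracker, list(report.get('overall', {}).keys()))
```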
64 | 65 | **Results on each epoch** 66 |
67 | *(per-epoch results: `img/results_for_31.jpg`)*
73 | 74 | -------------------------------------------------------------------------------- /config.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | class Config(object): 4 | 5 | batch_size = 1 6 | num_workers= 1 7 | epoch_num = 50 8 | show_step = 1 9 | 10 | root_dir_for_GOT_10k = '/Users/arbi/Desktop' # '/Users/arbi/Desktop' '/media/arbi/9132EE0B9756C987/dataset/GOT-10k/full_data' '/home/arbi/desktop/GOT-10k' 11 | root_dir_for_VID = '/home/arbi/desktop/ILSVRC2017_VID' # '/home/arbi/desktop/ILSVRC' 12 | root_dir_for_OTB = '/Users/arbi/Desktop/dataOTB/OTB' #'/Users/arbi/Desktop/dataOTB/OTB' '/media/arbi/9132EE0B9756C987/dataset/OTB2015' '/home/arbi/desktop/data' 13 | 14 | config = Config() 15 | -------------------------------------------------------------------------------- /img/GOT-10k dataset.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/arbitularov/SiamFusion/d7e2459cd659096a46f8e45e326961a3f4ebac89/img/GOT-10k dataset.jpg -------------------------------------------------------------------------------- /img/SiamFusion.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/arbitularov/SiamFusion/d7e2459cd659096a46f8e45e326961a3f4ebac89/img/SiamFusion.png -------------------------------------------------------------------------------- /img/results.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/arbitularov/SiamFusion/d7e2459cd659096a46f8e45e326961a3f4ebac89/img/results.png -------------------------------------------------------------------------------- /img/results_for_31.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/arbitularov/SiamFusion/d7e2459cd659096a46f8e45e326961a3f4ebac89/img/results_for_31.jpg -------------------------------------------------------------------------------- /img/save model.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/arbitularov/SiamFusion/d7e2459cd659096a46f8e45e326961a3f4ebac89/img/save model.jpg -------------------------------------------------------------------------------- /pairwise.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import, division 2 | 3 | import numpy as np 4 | from collections import namedtuple 5 | from torch.utils.data import Dataset 6 | from torchvision.transforms import Compose, CenterCrop, RandomCrop, ToTensor 7 | from PIL import Image, ImageStat, ImageOps 8 | import cv2 9 | import random 10 | 11 | 12 | class RandomStretch(object): 13 | 14 | def __init__(self, max_stretch=0.05, interpolation='bilinear'): 15 | assert interpolation in ['bilinear', 'bicubic'] 16 | self.max_stretch = max_stretch 17 | self.interpolation = interpolation 18 | 19 | def __call__(self, img): 20 | scale = 1.0 + np.random.uniform( 21 | -self.max_stretch, self.max_stretch) 22 | size = np.round(np.array(img.size, float) * scale).astype(int) 23 | if self.interpolation == 'bilinear': 24 | method = Image.BILINEAR 25 | elif self.interpolation == 'bicubic': 26 | method = Image.BICUBIC 27 | return img.resize(tuple(size), method) 28 | 29 | 30 | class Pairwise(Dataset): 31 | 32 | def __init__(self, seq_dataset, **kargs): 33 | super(Pairwise, self).__init__() 34 | self.cfg = self.parse_args(**kargs) 35 | 36 
| self.seq_dataset = seq_dataset 37 | self.indices = np.random.permutation(len(seq_dataset)) 38 | # augmentation for exemplar and instance images 39 | self.transform_z = Compose([ 40 | RandomStretch(max_stretch=0.05), 41 | CenterCrop(self.cfg.instance_sz - 8), 42 | RandomCrop(self.cfg.instance_sz - 2 * 8), 43 | CenterCrop(self.cfg.exemplar_sz), 44 | ToTensor()]) 45 | self.transform_x = Compose([ 46 | RandomStretch(max_stretch=0.05), 47 | CenterCrop(self.cfg.instance_sz - 8), 48 | RandomCrop(self.cfg.instance_sz - 2 * 8), 49 | ToTensor()]) 50 | 51 | def parse_args(self, **kargs): 52 | # default parameters 53 | cfg = { 54 | 'pairs_per_seq': 10, 55 | 'max_dist': 100, 56 | 'exemplar_sz': 127, 57 | 'instance_sz': 255, 58 | 'context': 0.5} 59 | 60 | for key, val in kargs.items(): 61 | if key in cfg: 62 | cfg.update({key: val}) 63 | return namedtuple('GenericDict', cfg.keys())(**cfg) 64 | 65 | def __getitem__(self, index): 66 | index = self.indices[index % len(self.seq_dataset)] 67 | img_files, anno = self.seq_dataset[index] 68 | 69 | # remove too small objects 70 | valid = anno[:, 2:].prod(axis=1) >= 10 71 | img_files = np.array(img_files)[valid] 72 | anno = anno[valid, :] 73 | 74 | rand_z, rand_x = self._sample_pair(len(img_files)) 75 | 76 | exemplar_image = Image.open(img_files[rand_z]) 77 | exemplar_img = self._crop_and_resize(exemplar_image, anno[rand_z]) 78 | exemplar_image = 255.0 * self.transform_z(exemplar_img) 79 | 80 | exemplar_noise = self.sp_noise(exemplar_img, 0.05) 81 | exemplar_noise = 255.0 * self.transform_z(exemplar_noise) 82 | 83 | instance_image = Image.open(img_files[rand_x]) 84 | instance_img = self._crop_and_resize(instance_image, anno[rand_x]) 85 | instance_image = 255.0 * self.transform_x(instance_img) 86 | 87 | instance_noise = self.sp_noise(instance_img, 0.05) 88 | instance_noise = 255.0 * self.transform_x(instance_noise) 89 | 90 | return exemplar_image, exemplar_noise, instance_image, instance_noise 91 | 92 | def __len__(self): 93 | return self.cfg.pairs_per_seq * len(self.seq_dataset) 94 | 95 | def sp_noise(self, image, prob): 96 | ''' 97 | Add salt and pepper noise to image 98 | prob: Probability of the noise 99 | ''' 100 | image = np.array(image) 101 | output = np.zeros(image.shape,np.uint8) 102 | thres = 1 - prob 103 | for i in range(image.shape[0]): 104 | for j in range(image.shape[1]): 105 | rdn = random.random() 106 | if rdn < prob: 107 | output[i][j] = 0 108 | elif rdn > thres: 109 | output[i][j] = 255 110 | else: 111 | output[i][j] = image[i][j] 112 | cv2.imwrite("cv.png", output) 113 | output = Image.fromarray(output) 114 | return output 115 | 116 | def _sample_pair(self, n): 117 | assert n > 0 118 | if n == 1: 119 | return 0, 0 120 | elif n == 2: 121 | return 0, 1 122 | else: 123 | max_dist = min(n - 1, self.cfg.max_dist) 124 | rand_dist = np.random.choice(max_dist) + 1 125 | rand_z = np.random.choice(n - rand_dist) 126 | rand_x = rand_z + rand_dist 127 | 128 | return rand_z, rand_x 129 | 130 | def _crop_and_resize(self, image, box): 131 | # convert box to 0-indexed and center based 132 | box = np.array([ 133 | box[0] - 1 + (box[2] - 1) / 2, 134 | box[1] - 1 + (box[3] - 1) / 2, 135 | box[2], box[3]], dtype=np.float32) 136 | center, target_sz = box[:2], box[2:] 137 | 138 | # exemplar and search sizes 139 | context = self.cfg.context * np.sum(target_sz) 140 | z_sz = np.sqrt(np.prod(target_sz + context)) 141 | x_sz = z_sz * self.cfg.instance_sz / self.cfg.exemplar_sz 142 | 143 | # convert box to corners (0-indexed) 144 | size = round(x_sz) 145 | corners = 
np.concatenate(( 146 | np.round(center - (size - 1) / 2), 147 | np.round(center - (size - 1) / 2) + size)) 148 | corners = np.round(corners).astype(int) 149 | 150 | # pad image if necessary 151 | pads = np.concatenate(( 152 | -corners[:2], corners[2:] - image.size)) 153 | npad = max(0, int(pads.max())) 154 | if npad > 0: 155 | avg_color = ImageStat.Stat(image).mean 156 | # PIL doesn't support float RGB image 157 | avg_color = tuple(int(round(c)) for c in avg_color) 158 | image = ImageOps.expand(image, border=npad, fill=avg_color) 159 | 160 | # crop image patch 161 | corners = tuple((corners + npad).astype(int)) 162 | patch = image.crop(corners) 163 | 164 | # resize to instance_sz 165 | out_size = (self.cfg.instance_sz, self.cfg.instance_sz) 166 | patch = patch.resize(out_size, Image.BILINEAR) 167 | #print("patch",patch) 168 | 169 | return patch 170 | -------------------------------------------------------------------------------- /siamfc.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import, division 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.init as init 6 | import torch.nn.functional as F 7 | import torch.optim as optim 8 | import numpy as np 9 | import cv2 10 | from collections import namedtuple 11 | from torch.optim.lr_scheduler import ExponentialLR 12 | 13 | from got10k.trackers import Tracker 14 | 15 | 16 | class SiamFC(nn.Module): 17 | 18 | def __init__(self): 19 | super(SiamFC, self).__init__() 20 | 21 | 22 | self.feature1 = nn.Sequential( 23 | # conv1 24 | nn.Conv2d(3, 192, 11, 2), 25 | nn.BatchNorm2d(192, eps=1e-6, momentum=0.05), 26 | nn.ReLU(inplace=True), 27 | nn.MaxPool2d(3, 2)) 28 | 29 | self.feature2 = nn.Sequential( 30 | # conv1 31 | nn.Conv2d(3, 192, 11, 2), 32 | nn.BatchNorm2d(192, eps=1e-6, momentum=0.05), 33 | nn.ReLU(inplace=True), 34 | nn.MaxPool2d(3, 2)) 35 | 36 | self.feature3 = nn.Sequential( 37 | # conv2 38 | nn.Conv2d(192, 256, 3, 1), 39 | nn.Conv2d(256, 256, 3, 1), 40 | nn.BatchNorm2d(256, eps=1e-6, momentum=0.05), 41 | nn.ReLU(inplace=True), 42 | nn.MaxPool2d(3, 2), 43 | # conv3 44 | nn.Conv2d(256, 512, 3, 1), 45 | nn.BatchNorm2d(512, eps=1e-6, momentum=0.05), 46 | nn.ReLU(inplace=True), 47 | nn.Conv2d(512, 512, 3, 1), 48 | nn.BatchNorm2d(512, eps=1e-6, momentum=0.05), 49 | nn.ReLU(inplace=True), 50 | nn.Conv2d(512, 384, 3, 1), 51 | 52 | ) 53 | 54 | self._initialize_weights() 55 | 56 | def forward(self, z, z_noise, x, x_noise): 57 | 58 | z = self.feature1(z) 59 | z_noise = self.feature2(z_noise) 60 | z = torch.add(z, z_noise) 61 | z = self.feature3(z) 62 | 63 | x = self.feature1(x) 64 | x_noise = self.feature2(x_noise) 65 | x = torch.add(x, x_noise) 66 | x = self.feature3(x) 67 | 68 | # fast cross correlation 69 | out = self.fast_cross(x, z, "out", 0.1) 70 | #out = torch.add(cross4, out) 71 | 72 | # adjust the scale of responses 73 | out = 0.001 * out + 0.0 74 | 75 | return out 76 | 77 | def _initialize_weights(self): 78 | for m in self.modules(): 79 | if isinstance(m, nn.Conv2d): 80 | init.kaiming_normal_(m.weight.data, mode='fan_out', 81 | nonlinearity='relu') 82 | m.bias.data.fill_(0) 83 | elif isinstance(m, nn.BatchNorm2d): 84 | m.weight.data.fill_(1) 85 | m.bias.data.zero_() 86 | 87 | def fast_cross(self, x, z, img_name = None, num = 0.001): 88 | n, c, h, w = x.size() 89 | out = F.conv2d(x.view(1, n * c, h, w), z, groups=n) 90 | out = out.view(n, 1, out.size(-2), out.size(-1)) 91 | 92 | if img_name != None: 93 | out_img = num * out + 0.0 94 | out_img = 
np.asarray(out_img[0].permute(1,2,0).detach().cpu()) 95 | cv2.imwrite("{}.png".format(img_name), out_img) 96 | 97 | return out 98 | 99 | 100 | 101 | class TrackerSiamFC(Tracker): 102 | 103 | def __init__(self, net_path=None, **kargs): 104 | super(TrackerSiamFC, self).__init__( 105 | name='SiamFC', is_deterministic=True) 106 | self.cfg = self.parse_args(**kargs) 107 | 108 | # setup GPU device if available 109 | self.cuda = torch.cuda.is_available() 110 | self.device = torch.device('cuda:0' if self.cuda else 'cpu') 111 | 112 | # setup model 113 | self.net = SiamFC() 114 | if net_path is not None: 115 | self.net.load_state_dict(torch.load( 116 | net_path, map_location=lambda storage, loc: storage)) 117 | self.net = self.net.to(self.device) 118 | 119 | # setup optimizer 120 | self.optimizer = optim.SGD( 121 | self.net.parameters(), 122 | lr=self.cfg.initial_lr, 123 | weight_decay=self.cfg.weight_decay, 124 | momentum=self.cfg.momentum) 125 | 126 | # setup lr scheduler 127 | self.lr_scheduler = ExponentialLR( 128 | self.optimizer, gamma=self.cfg.lr_decay) 129 | 130 | def parse_args(self, **kargs): 131 | # default parameters 132 | cfg = { 133 | # inference parameters 134 | 'exemplar_sz': 127, 135 | 'instance_sz': 255, 136 | 'context': 0.5, 137 | 'scale_num': 3, 138 | 'scale_step': 1.0375, 139 | 'scale_lr': 0.59, 140 | 'scale_penalty': 0.9745, 141 | 'window_influence': 0.176, 142 | 'response_sz': 17, 143 | 'response_up': 16, 144 | 'total_stride': 8, 145 | 'adjust_scale': 0.001, 146 | # train parameters 147 | 'initial_lr': 0.01, 148 | 'lr_decay': 0.8685113737513527, 149 | 'weight_decay': 5e-4, 150 | 'momentum': 0.9, 151 | 'r_pos': 16, 152 | 'r_neg': 0} 153 | 154 | for key, val in kargs.items(): 155 | if key in cfg: 156 | cfg.update({key: val}) 157 | return namedtuple('GenericDict', cfg.keys())(**cfg) 158 | 159 | def init(self, image, box): 160 | image = np.asarray(image) 161 | 162 | # convert box to 0-indexed and center based [y, x, h, w] 163 | box = np.array([ 164 | box[1] - 1 + (box[3] - 1) / 2, 165 | box[0] - 1 + (box[2] - 1) / 2, 166 | box[3], box[2]], dtype=np.float32) 167 | self.center, self.target_sz = box[:2], box[2:] 168 | 169 | # create hanning window 170 | self.upscale_sz = self.cfg.response_up * self.cfg.response_sz 171 | self.hann_window = np.outer( 172 | np.hanning(self.upscale_sz), 173 | np.hanning(self.upscale_sz)) 174 | self.hann_window /= self.hann_window.sum() 175 | 176 | # search scale factors 177 | self.scale_factors = self.cfg.scale_step ** np.linspace( 178 | -(self.cfg.scale_num // 2), 179 | self.cfg.scale_num // 2, self.cfg.scale_num) 180 | 181 | # exemplar and search sizes 182 | context = self.cfg.context * np.sum(self.target_sz) 183 | self.z_sz = np.sqrt(np.prod(self.target_sz + context)) 184 | self.x_sz = self.z_sz * \ 185 | self.cfg.instance_sz / self.cfg.exemplar_sz 186 | 187 | # exemplar image 188 | self.avg_color = np.mean(image, axis=(0, 1)) 189 | exemplar_image = self._crop_and_resize( 190 | image, self.center, self.z_sz, 191 | out_size=self.cfg.exemplar_sz, 192 | pad_color=self.avg_color) 193 | 194 | # exemplar features 195 | exemplar_image = torch.from_numpy(exemplar_image).to( 196 | self.device).permute([2, 0, 1]).unsqueeze(0).float() 197 | with torch.set_grad_enabled(False): 198 | self.net.eval() 199 | 200 | z = self.net.feature1(exemplar_image) 201 | z_noise = self.net.feature2(exemplar_image) 202 | z = torch.add(z, z_noise) 203 | self.kernel = self.net.feature3(z) 204 | 205 | def update(self, image): 206 | image = np.asarray(image) 207 | 208 | # search images 
209 | instance_images = [self._crop_and_resize( 210 | image, self.center, self.x_sz * f, 211 | out_size=self.cfg.instance_sz, 212 | pad_color=self.avg_color) for f in self.scale_factors] 213 | instance_images = np.stack(instance_images, axis=0) 214 | instance_images = torch.from_numpy(instance_images).to( 215 | self.device).permute([0, 3, 1, 2]).float() 216 | 217 | # responses 218 | with torch.set_grad_enabled(False): 219 | self.net.eval() 220 | 221 | x = self.net.feature1(instance_images) 222 | x_noise = self.net.feature2(instance_images) 223 | x = torch.add(x, x_noise) 224 | instances = self.net.feature3(x) 225 | 226 | responses = F.conv2d(instances, self.kernel) * 0.001 227 | responses = responses.squeeze(1).cpu().numpy() 228 | 229 | # upsample responses and penalize scale changes 230 | responses = np.stack([cv2.resize( 231 | t, (self.upscale_sz, self.upscale_sz), 232 | interpolation=cv2.INTER_CUBIC) for t in responses], axis=0) 233 | responses[:self.cfg.scale_num // 2] *= self.cfg.scale_penalty 234 | responses[self.cfg.scale_num // 2 + 1:] *= self.cfg.scale_penalty 235 | 236 | # peak scale 237 | scale_id = np.argmax(np.amax(responses, axis=(1, 2))) 238 | 239 | # peak location 240 | response = responses[scale_id] 241 | response -= response.min() 242 | response /= response.sum() + 1e-16 243 | response = (1 - self.cfg.window_influence) * response + \ 244 | self.cfg.window_influence * self.hann_window 245 | loc = np.unravel_index(response.argmax(), response.shape) 246 | 247 | # locate target center 248 | disp_in_response = np.array(loc) - self.upscale_sz // 2 249 | disp_in_instance = disp_in_response * \ 250 | self.cfg.total_stride / self.cfg.response_up 251 | disp_in_image = disp_in_instance * self.x_sz * \ 252 | self.scale_factors[scale_id] / self.cfg.instance_sz 253 | self.center += disp_in_image 254 | 255 | # update target size 256 | scale = (1 - self.cfg.scale_lr) * 1.0 + \ 257 | self.cfg.scale_lr * self.scale_factors[scale_id] 258 | self.target_sz *= scale 259 | self.z_sz *= scale 260 | self.x_sz *= scale 261 | 262 | # return 1-indexed and left-top based bounding box 263 | box = np.array([ 264 | self.center[1] + 1 - (self.target_sz[1] - 1) / 2, 265 | self.center[0] + 1 - (self.target_sz[0] - 1) / 2, 266 | self.target_sz[1], self.target_sz[0]]) 267 | 268 | return box 269 | 270 | def step(self, batch, backward=True, update_lr=False): 271 | if backward: 272 | self.net.train() 273 | if update_lr: 274 | self.lr_scheduler.step() 275 | else: 276 | self.net.eval() 277 | 278 | z = batch[0].to(self.device) 279 | z_noise = batch[1].to(self.device) 280 | 281 | x = batch[2].to(self.device) 282 | x_noise = batch[3].to(self.device) 283 | 284 | with torch.set_grad_enabled(backward): 285 | responses = self.net(z, z_noise, x, x_noise) 286 | labels, weights = self._create_labels(responses.size()) 287 | loss = F.binary_cross_entropy_with_logits( 288 | responses, labels, weight=weights, size_average=True) 289 | 290 | if backward: 291 | self.optimizer.zero_grad() 292 | loss.backward() 293 | self.optimizer.step() 294 | print(loss.item()) 295 | return loss.item() 296 | 297 | def _crop_and_resize(self, image, center, size, out_size, pad_color): 298 | # convert box to corners (0-indexed) 299 | size = round(size) 300 | corners = np.concatenate(( 301 | np.round(center - (size - 1) / 2), 302 | np.round(center - (size - 1) / 2) + size)) 303 | corners = np.round(corners).astype(int) 304 | 305 | # pad image if necessary 306 | pads = np.concatenate(( 307 | -corners[:2], corners[2:] - image.shape[:2])) 308 | npad = 
max(0, int(pads.max())) 309 | if npad > 0: 310 | image = cv2.copyMakeBorder( 311 | image, npad, npad, npad, npad, 312 | cv2.BORDER_CONSTANT, value=pad_color) 313 | 314 | # crop image patch 315 | corners = (corners + npad).astype(int) 316 | patch = image[corners[0]:corners[2], corners[1]:corners[3]] 317 | 318 | # resize to out_size 319 | patch = cv2.resize(patch, (out_size, out_size)) 320 | 321 | return patch 322 | 323 | def _create_labels(self, size): 324 | # skip if same sized labels already created 325 | if hasattr(self, 'labels') and self.labels.size() == size: 326 | return self.labels, self.weights 327 | 328 | def logistic_labels(x, y, r_pos, r_neg): 329 | dist = np.abs(x) + np.abs(y) # block distance 330 | labels = np.where(dist <= r_pos, 331 | np.ones_like(x), 332 | np.where(dist < r_neg, 333 | np.ones_like(x) * 0.5, 334 | np.zeros_like(x))) 335 | return labels 336 | 337 | # distances along x- and y-axis 338 | n, c, h, w = size 339 | x = np.arange(w) - w // 2 340 | y = np.arange(h) - h // 2 341 | x, y = np.meshgrid(x, y) 342 | 343 | # create logistic labels 344 | r_pos = self.cfg.r_pos / self.cfg.total_stride 345 | r_neg = self.cfg.r_neg / self.cfg.total_stride 346 | labels = logistic_labels(x, y, r_pos, r_neg) 347 | 348 | # pos/neg weights 349 | pos_num = np.sum(labels == 1) 350 | neg_num = np.sum(labels == 0) 351 | weights = np.zeros_like(labels) 352 | weights[labels == 1] = 0.5 / pos_num 353 | weights[labels == 0] = 0.5 / neg_num 354 | weights *= pos_num + neg_num 355 | 356 | # repeat to size 357 | labels = labels.reshape((1, 1, h, w)) 358 | weights = weights.reshape((1, 1, h, w)) 359 | labels = np.tile(labels, (n, c, 1, 1)) 360 | weights = np.tile(weights, [n, c, 1, 1]) 361 | 362 | # convert to tensors 363 | self.labels = torch.from_numpy(labels).to(self.device).float() 364 | self.weights = torch.from_numpy(weights).to(self.device).float() 365 | 366 | return self.labels, self.weights 367 | -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | 3 | from got10k.experiments import * 4 | 5 | from siamfc import TrackerSiamFC 6 | from config import config 7 | 8 | 9 | if __name__ == '__main__': 10 | 11 | # setup tracker 12 | net_path = 'model/model_e31.pth' 13 | 14 | tracker_test = TrackerSiamFC(net_path=net_path) 15 | '''experiments = ExperimentOTB(config.root_dir_for_OTB, version=2015, 16 | result_dir='dataset/results', 17 | report_dir='dataset/reports')''' 18 | 19 | experiments = ExperimentGOT10k('/Users/arbi/Desktop/All2', subset='val', 20 | result_dir='GOT/results', 21 | report_dir='GOT/reports') 22 | 23 | # run tracking experiments and report performance 24 | experiments.run(tracker_test, visualize=True) 25 | experiments.report([tracker_test.name]) 26 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import, print_function 2 | 3 | import os 4 | import sys 5 | import torch 6 | from torch.utils.data import DataLoader 7 | 8 | from got10k.datasets import ImageNetVID, GOT10k 9 | from pairwise import Pairwise 10 | from siamfc import TrackerSiamFC 11 | from got10k.experiments import * 12 | 13 | from config import config 14 | 15 | if __name__ == '__main__': 16 | 17 | # setup dataset 18 | name = 'GOT-10k' 19 | assert name in ['VID', 'GOT-10k', 'All'] 20 | if name == 
'GOT-10k': 21 | seq_dataset = GOT10k(config.root_dir_for_GOT_10k, subset='val') 22 | pair_dataset = Pairwise(seq_dataset) 23 | elif name == 'VID': 24 | seq_dataset = ImageNetVID(config.root_dir_for_VID, subset=('train', 'val')) 25 | pair_dataset = Pairwise(seq_dataset) 26 | elif name == 'All': 27 | seq_got_dataset = GOT10k(config.root_dir_for_GOT_10k, subset='train') 28 | seq_vid_dataset = ImageNetVID(config.root_dir_for_VID, subset=('train', 'val')) 29 | pair_dataset = Pairwise(seq_got_dataset) + Pairwise(seq_vid_dataset) 30 | 31 | print(len(pair_dataset)) 32 | 33 | # setup data loader 34 | cuda = torch.cuda.is_available() 35 | loader = DataLoader(pair_dataset, 36 | batch_size = config.batch_size, 37 | shuffle = True, 38 | pin_memory = cuda, 39 | drop_last = True, 40 | num_workers= config.num_workers) 41 | 42 | # setup tracker 43 | tracker = TrackerSiamFC() 44 | 45 | # training loop 46 | for epoch in range(config.epoch_num): 47 | for step, batch in enumerate(loader): 48 | 49 | loss = tracker.step(batch, 50 | backward=True, 51 | update_lr=(step == 0)) 52 | 53 | if step % config.show_step == 0: 54 | print('Epoch [{}][{}/{}]: Loss: {:.3f}'.format( epoch + 1, 55 | step + 1, 56 | len(loader), 57 | loss)) 58 | sys.stdout.flush() 59 | 60 | # save checkpoint 61 | net_path = os.path.join('model', 'model_e%d.pth' % (epoch + 1)) 62 | torch.save(tracker.net.state_dict(), net_path) 63 | 64 | # test on OTB2015 dataset 65 | tracker_test = TrackerSiamFC(net_path=net_path) 66 | experiments = ExperimentOTB(config.root_dir_for_OTB, version=2015, 67 | result_dir='{}_dataset/results_{}'.format(name, epoch + 1), 68 | report_dir='{}_dataset/reports_{}'.format(name, epoch + 1)) 69 | 70 | # run tracking experiments and report performance 71 | experiments.run(tracker_test, visualize=False) 72 | experiments.report([tracker_test.name]) 73 | --------------------------------------------------------------------------------