├── README.md
├── code
│   ├── __pycache__
│   │   ├── bbox_transform.cpython-36.pyc
│   │   ├── config.cpython-36.pyc
│   │   ├── dataset.cpython-36.pyc
│   │   ├── gen_all_anchors.cpython-36.pyc
│   │   ├── generate_anchors.cpython-36.pyc
│   │   ├── net.cpython-36.pyc
│   │   ├── py_nms.cpython-36.pyc
│   │   ├── run_SiamRPN.cpython-36.pyc
│   │   ├── train.cpython-36.pyc
│   │   ├── util_test.cpython-36.pyc
│   │   └── utils.cpython-36.pyc
│   ├── bbox_transform.py
│   ├── config.py
│   ├── configuration.py
│   ├── dataset.py
│   ├── gen_all_anchors.py
│   ├── generate_anchors.py
│   ├── net.py
│   ├── py_nms.py
│   ├── run_SiamRPN.py
│   ├── test.py
│   ├── train.py
│   ├── util_test.py
│   ├── utils.py
│   ├── vot.py
│   └── vot_SiamRPN.py
├── data
│   └── whole_list.txt
└── ext
    └── roi-align.png

/README.md:
--------------------------------------------------------------------------------
 1 | # DaSiamRPNWithOfflineTraining
 2 | 
 3 | This repository adds an offline training module and a testing module (including distractor awareness and the local-to-global strategy) to the original PyTorch implementation of [DaSiamRPN](https://github.com/foolwood/DaSiamRPN).
 4 | 
 5 | ## Introduction
 6 | 
 7 | **SiamRPN** formulates visual tracking as a joint localization and identification task, initially described in a [CVPR 2018 spotlight paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Li_High_Performance_Visual_CVPR_2018_paper.pdf). (Slides at [CVPR 2018 Spotlight](https://drive.google.com/open?id=1OGIOUqANvYfZjRoQfpiDqhPQtOvPCpdq))
 8 | 
 9 | **DaSiamRPN** improves the performance of SiamRPN by (1) introducing an effective sampling strategy to control the imbalanced sample distribution, (2) designing a novel distractor-aware module to perform incremental learning, and (3) adding a long-term tracking extension. [ECCV 2018](https://arxiv.org/pdf/1808.06048.pdf). (Slides at [VOT-18 Real-time challenge winners talk](https://drive.google.com/open?id=1dsEI2uYHDfELK0CW2xgv7R4QdCs6lwfr))
10 | 
11 | Specifically, for (2), this repository implements the ROI-align technique to perform similarity matching between the search region x and the template z. The idea behind the ROI-align implementation is illustrated in the figure below.
12 | 
13 | 
14 | ## Prerequisites
15 | 
16 | CPU: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
17 | GPU: NVIDIA GTX1060
18 | 
19 | - python3.6
20 | - pytorch == 0.4.0
21 | - numpy
22 | - opencv
23 | - easydict
24 | 
25 | ## Data Preparation
26 | You can prepare your own dataset for training and testing DaSiamRPN. Putting aside the positive and negative pairing for distractor-aware training described in the paper, each training and testing sequence, say Basketball, is organized in a folder "Basketball" that stores "Basketball_gt.txt" and a sub-folder "imgs". Each line of the gt file is a ground-truth bounding box in (x, y, w, h) format. The "imgs" folder holds the frames, named "xxxx.jpg" (e.g. 0001.jpg-9999.jpg).
27 | 
28 | Besides the data, you should also prepare a corresponding list file formatted as in ./data/whole_list.txt, where each row consists of the path of a sequence folder and the total number of frames in that sequence. A concrete example is sketched near the end of this README.
29 | 
30 | ## Training Procedure
31 | `python code/train.py`
32 | 
33 | The model will be saved in ./output/weights/
34 | 
35 | ## Testing Procedure
36 | `python code/test.py`
37 | 
38 | ## Postscript
39 | Currently, this repo remains under construction, meaning that its effectiveness is not guaranteed. Still, readers (myself included) can get some useful insights from it, and that is exactly what really matters.
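To make the Data Preparation section above concrete, here is a sketch of the expected layout. The sequence name, paths, and frame counts below are placeholders; the exact parsing lives in `code/dataset.py` and `code/test.py`.

```
Basketball/
├── Basketball_gt.txt    # one line per frame: "x y w h" (whitespace-separated integers)
└── imgs/
    ├── 0001.jpg
    ├── 0002.jpg
    └── ...
```

The corresponding rows of ./data/whole_list.txt then look like:

```
/path/to/sequences/Basketball 725
/path/to/sequences/Biker 142
```

A minimal sketch of how these files are consumed (mirroring `dataset.py` and `test.py`; again, the paths are placeholders):

```python
import linecache
import numpy as np

# Column 0: sequence folder path, column 1: number of frames in that sequence.
seq_paths = np.genfromtxt('data/whole_list.txt', dtype='S', usecols=0)
num_frames = np.genfromtxt('data/whole_list.txt', dtype=int, usecols=1)

# Ground-truth line 1 of a sequence, parsed as (x, y, w, h).
first_gt = tuple(map(int, linecache.getline('Basketball/Basketball_gt.txt', 1).split()))
```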
However, more is coming in the immediate future, including: (1) the sampling strategy to control the imbalanced sample distribution and (2) other implementation details not specified clearly in the related paper. 40 | 41 | To better this repo, I am looking forward to your suggestion. ^_^ 42 | 43 | -------------------------------------------------------------------------------- /code/__pycache__/bbox_transform.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MathsShen/DaSiamRPNWithOfflineTraining/c9011aefd0551441ef6ab91c465951556cc86e50/code/__pycache__/bbox_transform.cpython-36.pyc -------------------------------------------------------------------------------- /code/__pycache__/config.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MathsShen/DaSiamRPNWithOfflineTraining/c9011aefd0551441ef6ab91c465951556cc86e50/code/__pycache__/config.cpython-36.pyc -------------------------------------------------------------------------------- /code/__pycache__/dataset.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MathsShen/DaSiamRPNWithOfflineTraining/c9011aefd0551441ef6ab91c465951556cc86e50/code/__pycache__/dataset.cpython-36.pyc -------------------------------------------------------------------------------- /code/__pycache__/gen_all_anchors.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MathsShen/DaSiamRPNWithOfflineTraining/c9011aefd0551441ef6ab91c465951556cc86e50/code/__pycache__/gen_all_anchors.cpython-36.pyc -------------------------------------------------------------------------------- /code/__pycache__/generate_anchors.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MathsShen/DaSiamRPNWithOfflineTraining/c9011aefd0551441ef6ab91c465951556cc86e50/code/__pycache__/generate_anchors.cpython-36.pyc -------------------------------------------------------------------------------- /code/__pycache__/net.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MathsShen/DaSiamRPNWithOfflineTraining/c9011aefd0551441ef6ab91c465951556cc86e50/code/__pycache__/net.cpython-36.pyc -------------------------------------------------------------------------------- /code/__pycache__/py_nms.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MathsShen/DaSiamRPNWithOfflineTraining/c9011aefd0551441ef6ab91c465951556cc86e50/code/__pycache__/py_nms.cpython-36.pyc -------------------------------------------------------------------------------- /code/__pycache__/run_SiamRPN.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MathsShen/DaSiamRPNWithOfflineTraining/c9011aefd0551441ef6ab91c465951556cc86e50/code/__pycache__/run_SiamRPN.cpython-36.pyc -------------------------------------------------------------------------------- /code/__pycache__/train.cpython-36.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/MathsShen/DaSiamRPNWithOfflineTraining/c9011aefd0551441ef6ab91c465951556cc86e50/code/__pycache__/train.cpython-36.pyc -------------------------------------------------------------------------------- /code/__pycache__/util_test.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MathsShen/DaSiamRPNWithOfflineTraining/c9011aefd0551441ef6ab91c465951556cc86e50/code/__pycache__/util_test.cpython-36.pyc -------------------------------------------------------------------------------- /code/__pycache__/utils.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MathsShen/DaSiamRPNWithOfflineTraining/c9011aefd0551441ef6ab91c465951556cc86e50/code/__pycache__/utils.cpython-36.pyc -------------------------------------------------------------------------------- /code/bbox_transform.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | import pdb 10 | def bbox_transform(ex_rois, gt_rois): 11 | ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0 12 | ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0 13 | ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths 14 | ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights 15 | 16 | gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0 17 | gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0 18 | gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths 19 | gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights 20 | 21 | targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths 22 | targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights 23 | targets_dw = np.log(gt_widths / ex_widths) 24 | targets_dh = np.log(gt_heights / ex_heights) 25 | 26 | targets = np.vstack( 27 | (targets_dx, targets_dy, targets_dw, targets_dh)).transpose() 28 | return targets 29 | 30 | def bbox_transform_inv(boxes, deltas): 31 | if boxes.shape[0] == 0: 32 | return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype) 33 | 34 | boxes = boxes.astype(deltas.dtype, copy=False) 35 | #pdb.set_trace() 36 | widths = boxes[:, 2] - boxes[:, 0] + 1.0 37 | heights = boxes[:, 3] - boxes[:, 1] + 1.0 38 | ctr_x = boxes[:, 0] + 0.5 * widths 39 | ctr_y = boxes[:, 1] + 0.5 * heights 40 | 41 | dx = deltas[:, 0::4] 42 | dy = deltas[:, 1::4] 43 | dw = deltas[:, 2::4] 44 | dh = deltas[:, 3::4] 45 | 46 | pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis] 47 | pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis] 48 | pred_w = np.exp(dw) * widths[:, np.newaxis] 49 | pred_h = np.exp(dh) * heights[:, np.newaxis] 50 | 51 | pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype) 52 | # x1 53 | pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w 54 | # y1 55 | pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h 56 | # x2 57 | pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w 58 | # y2 59 | pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h 60 | 61 | return pred_boxes 62 | 63 | def clip_boxes(boxes, im_shape): 64 | """ 65 | Clip boxes to image boundaries. 
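    Boxes are laid out as (x1, y1, x2, y2) groups along axis 1 (hence the ::4 slicing below);
    im_shape is (height, width), and coordinates are clamped to [0, width-1] / [0, height-1].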
66 | """ 67 | 68 | # x1 >= 0 69 | boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0) 70 | # y1 >= 0 71 | boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0) 72 | # x2 < im_shape[1] 73 | boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0) 74 | # y2 < im_shape[0] 75 | boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0) 76 | return boxes 77 | -------------------------------------------------------------------------------- /code/config.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | """Fast R-CNN config system. 9 | 10 | This file specifies default config options for Fast R-CNN. You should not 11 | change values in this file. Instead, you should write a config file (in yaml) 12 | and use cfg_from_file(yaml_file) to load it and override the default options. 13 | 14 | Most tools in $ROOT/tools take a --cfg option to specify an override file. 15 | - See tools/{train,test}_net.py for example code that uses cfg_from_file() 16 | - See experiments/cfgs/*.yml for example YAML config override files 17 | """ 18 | 19 | import os 20 | import os.path as osp 21 | import numpy as np 22 | # `pip install easydict` if you don't have it 23 | from easydict import EasyDict as edict 24 | 25 | __C = edict() 26 | # Consumers can get config by: 27 | # from fast_rcnn_config import cfg 28 | cfg = __C 29 | 30 | # 31 | # Training options 32 | # 33 | 34 | __C.TRAIN = edict() 35 | 36 | __C.TRAIN.SNAPSHOT_INFIX = '' 37 | 38 | # Deprecated (inside weights) 39 | __C.TRAIN.BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0) 40 | # IOU >= thresh: positive example 41 | __C.TRAIN.RPN_POSITIVE_OVERLAP = 0.7 42 | # IOU < thresh: negative example 43 | __C.TRAIN.RPN_NEGATIVE_OVERLAP = 0.3 44 | # Deprecated (outside weights) 45 | __C.TRAIN.RPN_BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0) 46 | # Give the positive RPN examples weight of p * 1 / {num positives} 47 | # and give negatives a weight of (1 - p) 48 | # Set to -1.0 to use uniform example weighting 49 | __C.TRAIN.RPN_POSITIVE_WEIGHT = -1.0 50 | __C.TRAIN.LAMBDA = 1.0 51 | 52 | # 53 | # Testing options 54 | # 55 | 56 | __C.TEST = edict() 57 | 58 | # Scales to use during testing (can list multiple scales) 59 | # Each scale is the pixel size of an image's shortest side 60 | __C.TEST.SCALES = (600,) 61 | 62 | # Max pixel size of the longest side of a scaled input image 63 | __C.TEST.MAX_SIZE = 1000 64 | 65 | # Overlap threshold used for non-maximum suppression (suppress boxes with 66 | # IoU >= this threshold) 67 | __C.TEST.NMS = 0.3 68 | 69 | # Experimental: treat the (K+1) units in the cls_score layer as linear 70 | # predictors (trained, eg, with one-vs-rest SVMs). 
71 | __C.TEST.SVM = False 72 | 73 | # Test using bounding-box regressors 74 | __C.TEST.BBOX_REG = True 75 | 76 | # Propose boxes 77 | __C.TEST.HAS_RPN = True 78 | 79 | # Test using these proposals 80 | __C.TEST.PROPOSAL_METHOD = 'gt' 81 | 82 | ## NMS threshold used on RPN proposals 83 | __C.TEST.RPN_NMS_THRESH = 0.7 84 | ## Number of top scoring boxes to keep before apply NMS to RPN proposals 85 | __C.TEST.RPN_PRE_NMS_TOP_N = 6000 86 | ## Number of top scoring boxes to keep after applying NMS to RPN proposals 87 | __C.TEST.RPN_POST_NMS_TOP_N = 300 88 | # Proposal height and width both need to be greater than RPN_MIN_SIZE (at orig image scale) 89 | __C.TEST.RPN_MIN_SIZE = 16 90 | 91 | 92 | # 93 | # MISC 94 | # 95 | 96 | # The mapping from image coordinates to feature map coordinates might cause 97 | # some boxes that are distinct in image space to become identical in feature 98 | # coordinates. If DEDUP_BOXES > 0, then DEDUP_BOXES is used as the scale factor 99 | # for identifying duplicate boxes. 100 | # 1/16 is correct for {Alex,Caffe}Net, VGG_CNN_M_1024, and VGG16 101 | __C.DEDUP_BOXES = 1./16. 102 | 103 | # Pixel mean values (BGR order) as a (1, 1, 3) array 104 | # We use the same pixel mean for all networks even though it's not exactly what 105 | # they were trained with 106 | __C.PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]]) 107 | 108 | # For reproducibility 109 | __C.RNG_SEED = 3 110 | 111 | # A small number that's used many times 112 | __C.EPS = 1e-14 113 | 114 | # Root directory of project 115 | __C.ROOT_DIR = osp.abspath(osp.join(osp.dirname(__file__), '..', '..')) 116 | 117 | # Data directory 118 | __C.DATA_DIR = osp.abspath(osp.join(__C.ROOT_DIR, 'data')) 119 | 120 | # Model directory 121 | __C.MODELS_DIR = osp.abspath(osp.join(__C.ROOT_DIR, 'models', 'caltech_ped')) 122 | 123 | # Name (or path to) the matlab executable 124 | __C.MATLAB = 'matlab' 125 | 126 | # Place outputs under an experiments directory 127 | __C.EXP_DIR = 'default' 128 | 129 | # Use GPU implementation of non-maximum suppression 130 | __C.USE_GPU_NMS = True 131 | 132 | # Default GPU device id 133 | __C.GPU_ID = 0,1 134 | 135 | # added by zhk 136 | # Default code dir 137 | __C.NEW_ROOT_DIR = '/home/code/lishen/py-faster-rcnn' 138 | 139 | 140 | def get_output_dir(imdb, net=None): 141 | """Return the directory where experimental artifacts are placed. 142 | If the directory does not exist, it is created. 143 | 144 | A canonical path is built using the name from an imdb and a network 145 | (if not None). 146 | """ 147 | #outdir = osp.abspath(osp.join(__C.ROOT_DIR, 'output', imdb.name)) 148 | if __C.TRAIN.USE_OHEM: 149 | outdir = osp.abspath(osp.join(__C.NEW_ROOT_DIR, 'output', imdb.name, 'ohem')) 150 | else: 151 | outdir = osp.abspath(osp.join(__C.NEW_ROOT_DIR, 'output', imdb.name)) 152 | if net is not None: 153 | outdir = osp.join(outdir, net.name) 154 | if not os.path.exists(outdir): 155 | os.makedirs(outdir) 156 | return outdir 157 | 158 | def _merge_a_into_b(a, b): 159 | """Merge config dictionary a into config dictionary b, clobbering the 160 | options in b whenever they are also specified in a. 
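    Values are type-checked against the defaults (ndarray values are coerced to the default dtype),
    and nested edicts are merged recursively.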
161 | """ 162 | if type(a) is not edict: 163 | return 164 | 165 | for k, v in a.iteritems(): 166 | # a must specify keys that are in b 167 | if not b.has_key(k): 168 | raise KeyError('{} is not a valid config key'.format(k)) 169 | 170 | # the types must match, too 171 | old_type = type(b[k]) 172 | if old_type is not type(v): 173 | if isinstance(b[k], np.ndarray): 174 | v = np.array(v, dtype=b[k].dtype) 175 | else: 176 | raise ValueError(('Type mismatch ({} vs. {}) ' 177 | 'for config key: {}').format(type(b[k]), 178 | type(v), k)) 179 | 180 | # recursively merge dicts 181 | if type(v) is edict: 182 | try: 183 | _merge_a_into_b(a[k], b[k]) 184 | except: 185 | print('Error under config key: {}'.format(k)) 186 | raise 187 | else: 188 | b[k] = v 189 | 190 | def cfg_from_file(filename): 191 | """Load a config file and merge it into the default options.""" 192 | import yaml 193 | with open(filename, 'r') as f: 194 | yaml_cfg = edict(yaml.load(f)) 195 | 196 | _merge_a_into_b(yaml_cfg, __C) 197 | 198 | def cfg_from_list(cfg_list): 199 | """Set config keys via list (e.g., from command line).""" 200 | from ast import literal_eval 201 | assert len(cfg_list) % 2 == 0 202 | for k, v in zip(cfg_list[0::2], cfg_list[1::2]): 203 | key_list = k.split('.') 204 | d = __C 205 | for subkey in key_list[:-1]: 206 | assert d.has_key(subkey) 207 | d = d[subkey] 208 | subkey = key_list[-1] 209 | assert d.has_key(subkey) 210 | try: 211 | value = literal_eval(v) 212 | except: 213 | # handle the case when v is a string literal 214 | value = v 215 | assert type(value) == type(d[subkey]), \ 216 | 'type {} does not match original type {}'.format( 217 | type(value), type(d[subkey])) 218 | d[subkey] = value 219 | -------------------------------------------------------------------------------- /code/configuration.py: -------------------------------------------------------------------------------- 1 | 2 | class ModelConfig(object): 3 | 4 | def __init__(self): 5 | self.image_format = 'jpeg' 6 | 7 | self.batch_size = 5 8 | self.max_seq_len = 15 9 | 10 | self.image_size = [224, 224] 11 | self.num_image_channels = 3 12 | 13 | self.num_clstm_kernels = 256 #384 for two layers of ConvLSTMs 14 | self.clstm_kernel_size = [3, 3] 15 | self.num_convlstm_layers = 2 16 | # If < 1.0, the dropout keep probability applied to ConvLSTM variables. 17 | self.clstm_dropout_keep_prob = 0.7 18 | 19 | self.pretrained_model_file = None 20 | 21 | self.training_data_tfrecord_path = \ 22 | '/home/lishen/Experiments/CLSTMT/dataset/training_set/TFRecord/training_data.tf_record.soft_gt' 23 | 24 | # Approximate number of values per input shard. Used to ensure sufficient 25 | # mixing between shards in training. 26 | self.values_per_input_shard = 2300 27 | # Minimum number of shards to keep in the input queue. 28 | self.input_queue_capacity_factor = 2 29 | # Number of threads for prefetching SequenceExample protos. 30 | self.num_input_reader_threads = 1 31 | 32 | # Number of threads for image preprocessing. Should be a multiple of 2. 
33 | self.num_preprocess_threads = 1 34 | 35 | self.num_seqs = 115 # total number of domains(training tracking sequences) 36 | 37 | 38 | class TrainingConfig(object): 39 | 40 | def __init__(self): 41 | """Set the default training hyper-parameters.""" 42 | 43 | # Optimizer for training the model 44 | self.optimizer = "SGD" 45 | self.max_epoches = 100 46 | self.learning_rate = 0.001 47 | 48 | 49 | class FinetuningConfig(object): 50 | def __init__(self): 51 | self.learning_rate = 0.01 52 | self.use_domain_specific_finetuned_model = True 53 | 54 | 55 | class TestingConfig(object): 56 | 57 | def __init__(self): 58 | self.root_dir = '/home/lishen/Experiments/CLSTMT' 59 | self.code_root_dir = '/home/code/lishen/dataset' 60 | self.peep_ratio = 3.5 61 | 62 | 63 | class VerificationModelConfig(object): 64 | def __init__(self): 65 | self.pretrained_model_file = "./weights/vgg16_verif.npy" 66 | self.num_boxes_per_batch = None 67 | 68 | 69 | -------------------------------------------------------------------------------- /code/dataset.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import numpy.random as npr 3 | from PIL import Image 4 | from torch.utils.data import Dataset, DataLoader 5 | import torchvision.transforms as transforms 6 | import torch 7 | import os 8 | import os.path as osp 9 | import linecache 10 | import cv2 11 | 12 | 13 | class DaSiamTrainingSet(Dataset): 14 | def __init__(self, transform, z_size, x_size): 15 | self.root_path = "/home/lishen/Experiments/CLSTMT/dataset/test_set/OTB100/" 16 | self.domain2nseq = {} 17 | self.create_domain2nseq(osp.join(self.root_path, "whole_list.txt")) 18 | self.transform = transform 19 | self.z_size = z_size 20 | self.x_size = x_size 21 | 22 | def create_domain2nseq(self, list_fpath): 23 | with open(list_fpath, 'r') as f: 24 | while True: 25 | line = f.readline() 26 | if not line: 27 | break 28 | splits = line.strip().split() 29 | domain_name = splits[0].split('/')[-1] 30 | nseq = int(splits[1]) 31 | self.domain2nseq[domain_name] = nseq 32 | 33 | def __len__(self): 34 | return sum(self.domain2nseq.values()) // len(self.domain2nseq.values()) 35 | 36 | def __getitem__(self, item): 37 | domain_list = list(self.domain2nseq.keys()) 38 | domain_name = npr.choice(domain_list, size=1)[0] 39 | num_frames = self.domain2nseq[domain_name] 40 | 41 | pair_frame_nos = npr.choice(range(1, num_frames+1), size=2, replace=False) 42 | z_frame_no, x_frame_no = min(pair_frame_nos), max(pair_frame_nos) 43 | 44 | domain_dir = osp.join(self.root_path, "sequences", domain_name) 45 | gt_fpath = osp.join(domain_dir, domain_name + '_gt.txt') 46 | z_gt_bbx = tuple(map(int, linecache.getline(gt_fpath, z_frame_no).split())) 47 | x_gt_bbx = tuple(map(int, linecache.getline(gt_fpath, x_frame_no).split())) 48 | 49 | z_frame_img_name = str(z_frame_no).zfill(4) + '.jpg' 50 | x_frame_img_name = str(x_frame_no).zfill(4) + '.jpg' 51 | z_frame = cv2.imread(osp.join(domain_dir, 'imgs', z_frame_img_name)) 52 | x_frame = cv2.imread(osp.join(domain_dir, 'imgs', x_frame_img_name)) 53 | 54 | #print(z_gt_bbx) 55 | z = crop_roi(z_frame, convert_bbx2box(z_gt_bbx)) 56 | z = cv2.resize(z, self.z_size) 57 | 58 | x_gt_box = convert_bbx2box(x_gt_bbx) 59 | sr_box = gen_sr_box(x_frame, x_gt_box) 60 | x = crop_roi(x_frame, sr_box) 61 | x = cv2.resize(x, self.x_size) 62 | 63 | translated_x_gt_box = np.array(trans_coord(sr_box, x_gt_box)) 64 | 65 | sample = { 66 | 'template': self.transform(z), 67 | 'search_region': self.transform(x), 68 | 
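            # ground-truth box of the search-region frame, translated into the coordinate frame of
            # the cropped search region (box format: x1, y1, x2, y2)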
'gt_box': translated_x_gt_box 69 | } 70 | return sample 71 | 72 | 73 | def trans_coord(sr_box, x_gt_box): 74 | return (x_gt_box[0]-sr_box[0], x_gt_box[1]-sr_box[1], x_gt_box[2]-sr_box[0], x_gt_box[3]-sr_box[1]) 75 | 76 | 77 | def gen_sr_box(frame, gt_box): 78 | gt_x1, gt_y1, gt_x2, gt_y2 = gt_box 79 | h, w = gt_y2-gt_y1+1, gt_x2-gt_x1+1 80 | rand_cx = np.random.randint(gt_x1, gt_x2+1) 81 | rand_cy = np.random.randint(gt_y1, gt_y2+1) 82 | 83 | sr_x1, sr_y1, sr_x2, sr_y2 = rand_cx-w, rand_cy-h, rand_cx+w, rand_cy+h 84 | H, W = frame.shape[:2] 85 | return max(0, sr_x1), max(0, sr_y1), min(sr_x2, W-1), min(sr_y2, H-1) 86 | 87 | 88 | def convert_bbx2box(bbx): 89 | x, y, w, h = bbx 90 | return (x, y, x+w-1, y+h-1) 91 | 92 | 93 | def convert_box2bbx(box): 94 | x1, y1, x2, y2 = box 95 | return (x1, y1, x2-x1+1, y2-y1+1) 96 | 97 | 98 | def crop_roi(frame, box): 99 | return frame[box[1]:box[3]+1, box[0]:box[2]+1, :] 100 | 101 | 102 | def IoU(prop, gt): 103 | x1, y1, w1, h1 = map(prop, float) 104 | x2, y2, w2, h2 = map(gt, float) 105 | startx, endx = min(x1, x2), max(x1+w1, x2+w2) 106 | starty, endy = min(y1, y2), max(y1+h1, y2+h2) 107 | width = w1 + w2 - (endx - startx) 108 | height = h1 + h2 - (endy - starty) 109 | if width <= 0 or height <= 0: 110 | return 0 111 | else: 112 | area = width * height 113 | return 1.0*area/(w1*h1+w2*h2-area) 114 | 115 | 116 | def load_data(batch_size, z_size, x_size): 117 | transform = transforms.Compose([ 118 | # convert a PIL.Image instance of value range [0, 255] or an numpy.ndarray of shape (H, W, C) 119 | # into a torch.FloatTensor of shape (C, H, W) with value range (0, 1.0). 120 | transforms.ToTensor(), 121 | ]) 122 | 123 | datasets = { 124 | 'train': DaSiamTrainingSet(transform, z_size, x_size) 125 | } 126 | 127 | dataloaders = {ds: DataLoader(datasets[ds], 128 | batch_size=batch_size, 129 | shuffle=False, 130 | pin_memory=True, 131 | num_workers=8) for ds in datasets} 132 | 133 | return dataloaders 134 | 135 | 136 | if __name__ == "__main__": 137 | da_siam_set = DaSiamTrainingSet(transforms.ToTensor(), (127, 127), (255, 255)) 138 | 139 | domain_list = list(da_siam_set.domain2nseq.keys()) 140 | domain_name = npr.choice(domain_list, size=1)[0] 141 | num_frames = da_siam_set.domain2nseq[domain_name] 142 | 143 | pair_frame_nos = npr.choice(range(num_frames), size=2, replace=False) 144 | z_frame_no, x_frame_no = min(pair_frame_nos), max(pair_frame_nos) 145 | 146 | domain_dir = osp.join(da_siam_set.root_path, "sequences", domain_name) 147 | gt_fpath = osp.join(domain_dir, domain_name + '_gt.txt') 148 | 149 | z_gt_bbx = tuple(map(int, linecache.getline(gt_fpath, z_frame_no).split())) 150 | x_gt_bbx = tuple(map(int, linecache.getline(gt_fpath, x_frame_no).split())) 151 | 152 | z_frame_img_name = str(z_frame_no).zfill(4) + '.jpg' 153 | x_frame_img_name = str(x_frame_no).zfill(4) + '.jpg' 154 | z_frame = cv2.imread(osp.join(domain_dir, 'imgs', z_frame_img_name)) 155 | x_frame = cv2.imread(osp.join(domain_dir, 'imgs', x_frame_img_name)) 156 | 157 | import pdb 158 | pdb.set_trace() 159 | 160 | z = crop_roi(z_frame, convert_bbx2box(z_gt_bbx)) 161 | z = cv2.resize(z, da_siam_set.z_size) 162 | 163 | x_gt_box = convert_bbx2box(x_gt_bbx) 164 | sr_box = gen_sr_box(x_frame, x_gt_box) 165 | x = crop_roi(x_frame, sr_box) 166 | x = cv2.resize(x, da_siam_set.x_size) 167 | 168 | translated_x_gt_box = trans_coord(sr_box, x_gt_box) 169 | 170 | print('DONE.') 171 | 172 | -------------------------------------------------------------------------------- /code/gen_all_anchors.py: 
-------------------------------------------------------------------------------- 1 | import numpy as np 2 | from generate_anchors import generate_anchors 3 | 4 | def generate_all_anchors(cls_output_shape, xs_shape): 5 | anchors = generate_anchors(ratios=[0.33, 0.5, 1, 2, 3], scales=np.array([8, ])) 6 | # anchors are in box format (x1, y1, x2, y2) 7 | 8 | A = anchors.shape[0] 9 | feat_stride = xs_shape[0] // cls_output_shape[0] 10 | 11 | allowed_border = 0 12 | height, width = cls_output_shape 13 | 14 | sr_size = xs_shape 15 | 16 | # 1. Generate proposals from bbox deltas and shifted anchors 17 | shift_x = np.arange(0, width) * feat_stride 18 | shift_y = np.arange(0, height) * feat_stride 19 | shift_x, shift_y = np.meshgrid(shift_x, shift_y) 20 | shifts = np.vstack((shift_x.ravel(), 21 | shift_y.ravel(), 22 | shift_x.ravel(), 23 | shift_y.ravel())).transpose() 24 | 25 | # 2. Add K anochors (1, A, 4) to cell K shifts (K, 1, 4) 26 | # to get shift anchors (K, A, 4) and reshape to (K*A, 4) shifted anchors 27 | K = shifts.shape[0] 28 | all_anchors = (anchors.reshape((1, A, 4))) + shifts.reshape((1, K, 4)).transpose((1, 0, 2)) 29 | 30 | all_anchors = all_anchors.reshape((K*A, 4)) # of shape (5x22x22, 4) 31 | 32 | """ 33 | # total number of anchors == A * height * width, 34 | # where height and width are the size of conv feature map 35 | total_anchors = int(K*A) 36 | 37 | # Only keep anchors inside the image 38 | inds_inside = np.where( 39 | (all_anchors[:, 0] >= -allowed_border) & 40 | (all_anchors[:, 1] >= -allowed_border) & 41 | (all_anchors[:, 2] < sr_size[1] + allowed_border) & 42 | (all_anchors[:, 3] < sr_size[0] + allowed_border) 43 | )[0] 44 | anchors = all_anchors[inds_inside, :] 45 | # after keeping-inside, #anchors drops from 2420 down to 433 46 | """ 47 | 48 | return all_anchors, A # anchors 49 | 50 | if __name__ == '__main__': 51 | sr_shape = (255, 255) 52 | conv_shape = (17, 17) 53 | all_anchors = generate_all_anchors(conv_shape, sr_shape) 54 | print(all_anchors) 55 | 56 | import cv2 57 | img = cv2.imread("../data/SPRING2004B69.jpg") 58 | img = cv2.resize(img, sr_shape) 59 | for anchor in all_anchors: 60 | x1y1x2y2 = tuple(map(int, list(anchor))) 61 | cv2.rectangle(img, x1y1x2y2[:2], x1y1x2y2[2:], 2) 62 | cv2.imwrite("../data/result.jpg", img) 63 | 64 | print("DONE.") 65 | -------------------------------------------------------------------------------- /code/generate_anchors.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Faster R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick and Sean Bell 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | 10 | # Verify that we compute the same anchors as Shaoqing's matlab implementation: 11 | # 12 | # >> load output/rpn_cachedir/faster_rcnn_VOC2007_ZF_stage1_rpn/anchors.mat 13 | # >> anchors 14 | # 15 | # anchors = 16 | # 17 | # -83 -39 100 56 18 | # -175 -87 192 104 19 | # -359 -183 376 200 20 | # -55 -55 72 72 21 | # -119 -119 136 136 22 | # -247 -247 264 264 23 | # -35 -79 52 96 24 | # -79 -167 96 184 25 | # -167 -343 184 360 26 | 27 | #array([[ -83., -39., 100., 56.], 28 | # [-175., -87., 192., 104.], 29 | # [-359., -183., 376., 200.], 30 | # [ -55., -55., 72., 72.], 31 | # [-119., -119., 136., 136.], 32 | # [-247., -247., 264., 264.], 33 | # [ -35., -79., 52., 96.], 34 | # [ -79., -167., 96., 184.], 35 | # 
[-167., -343., 184., 360.]]) 36 | 37 | def generate_anchors(base_size=16, 38 | ratios=[0.5, 1, 2], #aspect ratios = (0.5, 1, 2) 39 | scales=2**np.arange(3, 6)): #scales == array([2^3, 2^4, 2^5]) 40 | """ 41 | Generate anchor (reference) windows by enumerating aspect ratios X 42 | scales wrt a reference (0, 0, 15, 15) window. 43 | """ 44 | 45 | base_anchor = np.array([1, 1, base_size, base_size]) - 1 46 | ratio_anchors = _ratio_enum(base_anchor, ratios) 47 | anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales) 48 | for i in range(ratio_anchors.shape[0])]) 49 | return anchors 50 | 51 | def _whctrs(anchor): 52 | """ 53 | Return width, height, x center, and y center for an anchor (window). 54 | """ 55 | 56 | w = anchor[2] - anchor[0] + 1 57 | h = anchor[3] - anchor[1] + 1 58 | x_ctr = anchor[0] + 0.5 * (w - 1) 59 | y_ctr = anchor[1] + 0.5 * (h - 1) 60 | return w, h, x_ctr, y_ctr 61 | 62 | def _mkanchors(ws, hs, x_ctr, y_ctr): 63 | """ 64 | Given a vector of widths (ws) and heights (hs) around a center 65 | (x_ctr, y_ctr), output a set of anchors (windows). 66 | """ 67 | 68 | ws = ws[:, np.newaxis] 69 | hs = hs[:, np.newaxis] 70 | anchors = np.hstack((x_ctr - 0.5 * (ws - 1), 71 | y_ctr - 0.5 * (hs - 1), 72 | x_ctr + 0.5 * (ws - 1), 73 | y_ctr + 0.5 * (hs - 1))) 74 | return anchors 75 | 76 | def _ratio_enum(anchor, ratios): 77 | """ 78 | Enumerate a set of anchors for each aspect ratio wrt an anchor. 79 | """ 80 | 81 | w, h, x_ctr, y_ctr = _whctrs(anchor) 82 | size = w * h 83 | size_ratios = size / ratios 84 | ws = np.round(np.sqrt(size_ratios)) 85 | hs = np.round(ws * ratios) 86 | anchors = _mkanchors(ws, hs, x_ctr, y_ctr) 87 | return anchors 88 | 89 | def _scale_enum(anchor, scales): 90 | """ 91 | Enumerate a set of anchors for each scale wrt an anchor. 
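    The width and height are multiplied by each scale while the anchor centre is kept fixed.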
92 | """ 93 | 94 | w, h, x_ctr, y_ctr = _whctrs(anchor) 95 | ws = w * scales 96 | hs = h * scales 97 | anchors = _mkanchors(ws, hs, x_ctr, y_ctr) 98 | return anchors 99 | -------------------------------------------------------------------------------- /code/net.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # DaSiamRPN 3 | # Licensed under The MIT License 4 | # Written by Qiang Wang (wangqiang2015 at ia.ac.cn) 5 | # -------------------------------------------------------- 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.functional as F 9 | 10 | 11 | class SiamRPNBIG(nn.Module): 12 | def __init__(self, feat_in=512, feature_out=512, anchor=5): 13 | super(SiamRPNBIG, self).__init__() 14 | self.anchor = anchor 15 | self.feature_out = feature_out 16 | self.featureExtract = nn.Sequential( 17 | nn.Conv2d(3, 192, 11, stride=2), 18 | nn.BatchNorm2d(192), 19 | nn.ReLU(inplace=True), 20 | nn.MaxPool2d(3, stride=2), 21 | 22 | nn.Conv2d(192, 512, 5), 23 | nn.BatchNorm2d(512), 24 | nn.ReLU(inplace=True), 25 | nn.MaxPool2d(3, stride=2), 26 | 27 | nn.Conv2d(512, 768, 3), 28 | nn.BatchNorm2d(768), 29 | nn.ReLU(inplace=True), 30 | 31 | nn.Conv2d(768, 768, 3), 32 | nn.BatchNorm2d(768), 33 | nn.ReLU(inplace=True), 34 | 35 | nn.Conv2d(768, 512, 3), 36 | nn.BatchNorm2d(512), 37 | ) 38 | 39 | self.conv_reg1 = nn.Conv2d(feat_in, feature_out*4*anchor, 3) 40 | self.conv_reg2 = nn.Conv2d(feat_in, feature_out, 3) 41 | self.conv_cls1 = nn.Conv2d(feat_in, feature_out*2*anchor, 3) 42 | self.conv_cls2 = nn.Conv2d(feat_in, feature_out, 3) 43 | self.regress_adjust = nn.Conv2d(4*anchor, 4*anchor, 1) 44 | 45 | #self.additional_conv = nn.Conv2d(512, 512, 6, padding=0) 46 | 47 | self.reg1_kernel = [] 48 | self.cls1_kernel = [] 49 | 50 | def forward(self, x): 51 | x_f = self.featureExtract(x) 52 | #x_ff = self.additional_conv(x_f) # simply for the compatibility of shape matching 53 | 54 | batch_size = x_f.size(0) 55 | reg_conv_output = self.conv_reg2(x_f) 56 | cls_conv_output = self.conv_cls2(x_f) 57 | 58 | cls_corr_list = [] 59 | reg_corr_list = [] 60 | for i_batch in range(batch_size): 61 | i_cls_corr = F.conv2d(torch.unsqueeze(cls_conv_output[i_batch], 0), self.cls1_kernel[i_batch]) 62 | cls_corr_list.append(i_cls_corr) 63 | i_reg_corr = F.conv2d(torch.unsqueeze(reg_conv_output[i_batch], 0), self.reg1_kernel[i_batch]) 64 | i_reg_corr = self.regress_adjust(i_reg_corr) 65 | reg_corr_list.append(i_reg_corr) 66 | 67 | cls_corr = torch.stack(cls_corr_list, dim=0) 68 | cls_corr = torch.squeeze(cls_corr) 69 | reg_corr = torch.stack(reg_corr_list, dim=0) 70 | reg_corr = torch.squeeze(reg_corr) 71 | 72 | """ 73 | # return tensors of shape (17,17,4K) and (17,17,2k), respectively 74 | return self.regress_adjust( 75 | F.conv2d(self.conv_reg2(x_f), self.reg1_kernel) 76 | ), \ 77 | F.conv2d(self.conv_cls2(x_f), self.cls1_kernel) 78 | """ 79 | return reg_corr, cls_corr, x_f # of shape (50, 4K, 17, 17), (50, 2K, 17, 17), (N, 22, 22, 512) ################################### 80 | 81 | 82 | def template(self, z): 83 | z_f = self.featureExtract(z) 84 | reg1_kernel_raw = self.conv_reg1(z_f) 85 | cls1_kernel_raw = self.conv_cls1(z_f) 86 | kernel_size = reg1_kernel_raw.data.size()[-1] 87 | 88 | self.reg1_kernel = reg1_kernel_raw.view(-1, self.anchor*4, self.feature_out, kernel_size, kernel_size)#50, 4K, 512, 4, 4 89 | self.cls1_kernel = cls1_kernel_raw.view(-1, self.anchor*2, self.feature_out, kernel_size, kernel_size)#50, 2K, 
512, 4, 4 90 | 91 | return z_f # of shape (N, 6, 6, 512) ############################################################################################# 92 | -------------------------------------------------------------------------------- /code/py_nms.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | def py_nms(dets, thresh=0.9): 4 | """Python NMS""" # the input is in box format 5 | 6 | x1 = dets[:, 0] 7 | y1 = dets[:, 1] 8 | x2 = dets[:, 2] 9 | y2 = dets[:, 3] 10 | scores = dets[:, 4] 11 | 12 | areas = (x2 - x1 + 1) * (y2 - y1 + 1) 13 | order = scores.argsort()[::-1] 14 | 15 | keep = [] 16 | while order.size > 0: 17 | i = order[0] 18 | keep.append(i) 19 | xx1 = np.maximum(x1[i], x1[order[1:]]) 20 | yy1 = np.maximum(y1[i], y1[order[1:]]) 21 | xx2 = np.minimum(x2[i], x2[order[1:]]) 22 | yy2 = np.minimum(y2[i], y2[order[1:]]) 23 | 24 | w = np.maximum(0.0, xx2 - xx1 + 1) 25 | h = np.maximum(0.0, yy2 - yy1 + 1) 26 | inter = w * h 27 | ovr = inter / (areas[i] + areas[order[1:]] - inter) 28 | 29 | inds = np.where(ovr <= thresh)[0] 30 | order = order[inds + 1] 31 | 32 | return keep 33 | -------------------------------------------------------------------------------- /code/run_SiamRPN.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # DaSiamRPN 3 | # Licensed under The MIT License 4 | # Written by Qiang Wang (wangqiang2015 at ia.ac.cn) 5 | # -------------------------------------------------------- 6 | import numpy as np 7 | from torch.autograd import Variable 8 | import torch.nn.functional as F 9 | 10 | from util_test import convert_box2bbx 11 | from utils import get_subwindow_tracking 12 | import py_nms 13 | from train import generate_all_anchors 14 | 15 | '''DEPRECATED.''' 16 | def generate_anchor(total_stride, scales, ratios, score_size): 17 | anchor_num = len(ratios) * len(scales) # 5 x 1 18 | anchor = np.zeros((anchor_num, 4), dtype=np.float32) 19 | size = total_stride * total_stride 20 | count = 0 21 | for ratio in ratios: 22 | ws = int(np.sqrt(size / ratio)) 23 | hs = int(ws * ratio) 24 | for scale in scales: 25 | wws = ws * scale 26 | hhs = hs * scale 27 | anchor[count, 0] = 0 28 | anchor[count, 1] = 0 29 | anchor[count, 2] = wws 30 | anchor[count, 3] = hhs 31 | count += 1 32 | 33 | anchor = np.tile(anchor, score_size * score_size).reshape((-1, 4)) 34 | ori = - (score_size / 2) * total_stride 35 | xx, yy = np.meshgrid([ori + total_stride * dx for dx in range(score_size)], 36 | [ori + total_stride * dy for dy in range(score_size)]) 37 | xx, yy = np.tile(xx.flatten(), (anchor_num, 1)).flatten(), \ 38 | np.tile(yy.flatten(), (anchor_num, 1)).flatten() 39 | anchor[:, 0], anchor[:, 1] = xx.astype(np.float32), yy.astype(np.float32) 40 | return anchor 41 | 42 | 43 | class TrackerConfig(object): 44 | # These are the default hyper-params for DaSiamRPN 0.3827 45 | 46 | windowing = 'cosine' # to penalize large displacements [cosine/uniform] 47 | 48 | # Params from the network architecture, have to be consistent with the training 49 | exemplar_size = 127 # input z size 50 | instance_size = 255 # input x size (search region), 271 51 | total_stride = 8 52 | score_size = (instance_size-exemplar_size)/total_stride+1 # 'Cuz examplar will be used as a kernel to convolve with instance 53 | delta_score_size = 17 # must be consistent with that of Siamese network, to be automatically linked 54 | 55 | context_amount = 0.5 # context amount for the 
exemplar 56 | ratios = [0.33, 0.5, 1, 2, 3] 57 | scales = [8, ] 58 | basic_anchor_num = len(ratios) * len(scales) 59 | anchors = [] 60 | penalty_k = 0.055 61 | window_influence = 0.42 62 | lr = 0.295 63 | 64 | alpha_i = 1.0 65 | eta = 0.01 66 | alpha_hat = 0.5 67 | num_pts_half_bin = 2 68 | distractor_thresh = 0.5 69 | 70 | 71 | def tracker_eval(net, x_crop, target_pos, target_sz, window, scale_z, p): 72 | delta, score = net(x_crop) # (1, 4K, 17, 17) and (1, 2K, 17, 17) 73 | 74 | delta = delta.permute(1, 2, 3, 0).contiguous().view(4, -1).data.cpu().numpy() 75 | score = F.softmax(score.permute(1, 2, 3, 0).contiguous().view(2, -1), dim=0).data[1, :].cpu().numpy() 76 | 77 | delta[0, :] = delta[0, :] * p.anchors[:, 2] + p.anchors[:, 0] 78 | delta[1, :] = delta[1, :] * p.anchors[:, 3] + p.anchors[:, 1] 79 | delta[2, :] = np.exp(delta[2, :]) * p.anchors[:, 2] 80 | delta[3, :] = np.exp(delta[3, :]) * p.anchors[:, 3] 81 | 82 | ''' 83 | def change(r): 84 | return np.maximum(r, 1./r) 85 | 86 | def sz(w, h): 87 | pad = (w + h) * 0.5 88 | sz2 = (w + pad) * (h + pad) 89 | return np.sqrt(sz2) 90 | 91 | def sz_wh(wh): 92 | pad = (wh[0] + wh[1]) * 0.5 93 | sz2 = (wh[0] + pad) * (wh[1] + pad) 94 | return np.sqrt(sz2) 95 | ''' 96 | 97 | # size penalty 98 | s_c = change(sz(delta[2, :], delta[3, :]) / (sz_wh(target_sz))) # scale penalty 99 | r_c = change((target_sz[0] / target_sz[1]) / (delta[2, :] / delta[3, :])) # ratio penalty 100 | 101 | penalty = np.exp(-(r_c * s_c - 1.) * p.penalty_k) 102 | pscore = penalty * score 103 | 104 | # window float 105 | pscore = pscore * (1 - p.window_influence) + window * p.window_influence 106 | best_pscore_id = np.argmax(pscore) 107 | 108 | target = delta[:, best_pscore_id] / scale_z 109 | target_sz = target_sz / scale_z 110 | lr = penalty[best_pscore_id] * score[best_pscore_id] * p.lr 111 | 112 | res_x = target[0] + target_pos[0] 113 | res_y = target[1] + target_pos[1] 114 | 115 | res_w = target_sz[0] * (1 - lr) + target[2] * lr 116 | res_h = target_sz[1] * (1 - lr) + target[3] * lr 117 | 118 | target_pos = np.array([res_x, res_y]) 119 | target_sz = np.array([res_w, res_h]) 120 | return target_pos, target_sz, score[best_pscore_id] 121 | 122 | 123 | def tracker_eval_distractor_aware(x_crop, target_sz, scale_z, state): 124 | p = state['p'] # tracking config 125 | net = state['net'] 126 | window = state['window'] # cosine window 127 | target_pos = state['target_pos'] 128 | 129 | delta, score, sr_feat = net(x_crop) # of shape (1, 4K, 17, 17), (1, 2K, 17, 17), (1, 22, 22, 512) 130 | 131 | delta = delta.contiguous().view(4, -1).data.cpu().numpy() # (4, K*17*17) 132 | score = F.softmax(score.contiguous().view(2, -1), dim=0).data[1, :].cpu().numpy() # (2, K*17*17) 133 | 134 | delta[0, :] = delta[0, :] * p.anchors[:, 2] + p.anchors[:, 0] # x 135 | delta[1, :] = delta[1, :] * p.anchors[:, 3] + p.anchors[:, 1] # y 136 | delta[2, :] = np.exp(delta[2, :]) * p.anchors[:, 2] # w 137 | delta[3, :] = np.exp(delta[3, :]) * p.anchors[:, 3] # h 138 | 139 | inds_inside = np.where( 140 | (delta[0, :] >= 0) & 141 | (delta[1, :] >= 0) & 142 | (delta[0, :] + delta[2, :] - 1 < p.instance_size) & 143 | (delta[1, :] + delta[3, :] - 1 < p.instance_size) 144 | )[0] 145 | delta = delta[:, inds_inside] 146 | score = score[inds_inside] 147 | 148 | # for i in range(delta.shape[1]): 149 | # print(delta[:, i]) 150 | 151 | '''NMS is performed on delta according to pscore's''' 152 | dets = np.hstack( 153 | (delta.transpose(), score[np.newaxis, :].transpose()) 154 | ) # in bbx format of (x, y, w, h) 155 | 
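    # At this point each row of dets is (x, y, w, h, score); the next two lines convert the w/h
    # columns into bottom-right corners so that py_nms(), which expects (x1, y1, x2, y2, score), applies.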
dets[:, 2] = dets[:, 0] + dets[:, 2] - 1 156 | dets[:, 3] = dets[:, 1] + dets[:, 3] - 1 157 | 158 | nms_indices_kept = py_nms.py_nms(dets, thresh=0.9) # now dets is in box format 159 | # dets_kept = dets[nums_ind_kept] # (N, 4+1) 160 | # print(dets.astype(int)) 161 | 162 | def bilinear_interp(sr_feat, x_f, y_f): 163 | ub = sr_feat.shape[-1]-1 164 | x1, y1 = max(0, min(ub, int(x_f))), max(0, min(ub, int(y_f))) 165 | x2, y2 = max(0, min(ub, int(x_f)+1)), max(0, min(ub, int(y_f)+1)) 166 | #print(f"{x1}, {y1}, {x2}, {y2}") 167 | 168 | fQ11, fQ12, fQ21, fQ22 = sr_feat[:, x1, y1], sr_feat[:, x1, y2], sr_feat[:, x2, y1], sr_feat[:, x2, y2] 169 | fQ11 = fQ11.cpu().detach().numpy() 170 | fQ12 = fQ12.cpu().detach().numpy() 171 | fQ21 = fQ21.cpu().detach().numpy() 172 | fQ22 = fQ22.cpu().detach().numpy() 173 | 174 | ret1 = (y2-y_f)/(y2-y1)*((x2-x_f)/(x2-x1)*fQ11 + (x_f-x1)/(x2-x1)*fQ21) 175 | ret2 = (y_f-y1)/(y2-y1)*((x2-x_f)/(x2-x1)*fQ12 + (x_f-x1)/(x2-x1)*fQ22) 176 | 177 | return ret1+ret2 178 | 179 | def binwise_max_pooling(meshgrid, num_bins, num_pts): 180 | assert meshgrid.shape[0] == meshgrid.shape[1] == num_bins*num_pts 181 | 182 | num_channels = meshgrid.shape[2]-2 183 | pooling_res = np.zeros((num_bins, num_bins, num_channels), dtype=np.float32) 184 | for channel in range(num_channels): 185 | for r in range(num_bins): 186 | for c in range(num_bins): 187 | res_rc = meshgrid[r*num_pts, c*num_pts, 2+channel] 188 | res_rc = max(res_rc, meshgrid[r*num_pts, c*num_pts+1, 2+channel]) 189 | res_rc = max(res_rc, meshgrid[r*num_pts+1, c*num_pts, 2+channel]) 190 | res_rc = max(res_rc, meshgrid[r*num_pts+1, c*num_pts+1, 2+channel]) 191 | pooling_res[r, c, channel] = res_rc 192 | 193 | return pooling_res 194 | 195 | '''Extract phi's of each region proposal using ROI-align''' 196 | W = H = p.instance_size # raw image size 197 | W_ = H_ = sr_feat.shape[2] # size of feature map of search region, expected to be 22 198 | num_props = len(nms_indices_kept) 199 | num_bins = state['template_feat'].shape[-1] # expect state['template_feat'] to be 6 200 | num_pts = p.num_pts_half_bin 201 | num_channels = state['template_feat'].shape[1] # expected to be 512 202 | roi_align_feats = np.empty((num_props, num_bins, num_bins, num_channels), dtype=np.float32) 203 | index2featkept_map = {} # a mapping from the original index to the new index 204 | for i in range(num_props): 205 | nms_index_kept = nms_indices_kept[i] 206 | 207 | x, y, w, h = convert_box2bbx(tuple(dets[nms_index_kept][:4])) 208 | x_, y_ = W_*(x+1)/W-1, H_*(y+1)/H-1 209 | w_, h_ = W_*(x+w)/W-x_, H_*(y+h)/H-y_ #W_*w/W, H_*h/H 210 | 211 | meshgrid = np.empty((num_bins*num_pts, num_bins*num_pts, 2+num_channels)) # `2+num_channels` means (x, y, val) 212 | h_stride = w_/num_bins/(num_pts+1) 213 | v_stride = h_/num_bins/(num_pts+1) 214 | 215 | for r in range(num_bins*num_pts): 216 | for c in range(num_bins*num_pts): 217 | h_delta = (c//num_pts)*((num_pts+1)*h_stride) + ((c%num_pts)+1)*h_stride 218 | v_delta = (r//num_pts)*((num_pts+1)*v_stride) + ((r%num_pts)+1)*v_stride 219 | 220 | meshgrid[r, c, :2] = np.array([x_+h_delta, y_+v_delta]) # can be disabled 221 | 222 | x_f, y_f = x_+h_delta, y_+v_delta 223 | # print(x_f, y_f) 224 | vals = bilinear_interp(sr_feat[0], x_f, y_f) # sr_feat (1, 512, 22, 22) 225 | meshgrid[r, c, 2:] = vals 226 | 227 | roi_align_res = binwise_max_pooling(meshgrid, num_bins, num_pts) # resulting in a tensor of shape (6, 6, 512) 228 | roi_align_feats[i, ...] 
= roi_align_res 229 | index2featkept_map[nms_index_kept] = i 230 | '''After RoI-align, we obtain roi_align_feats, which is a tensor of shape (N, 6, 6, 512)''' 231 | 232 | '''Distractor-aware incremental learning:''' 233 | # 1. Construct a distractor set, saving indices of the original set of proposals before NMS 234 | distractor_index_set = [] 235 | running_idx = nms_indices_kept[0] 236 | running_max = np.sum(state['template_feat'][0].transpose(1, 2, 0) * roi_align_feats[0]) # element-wise multiplication and sum 237 | if running_max > p.distractor_thresh: 238 | distractor_index_set.append(running_idx) 239 | for i in range(1, num_props): 240 | nms_index_kept = nms_indices_kept[i] 241 | curr_val = np.sum(state['template_feat'][0].transpose(1, 2, 0) * roi_align_feats[i]) # element-wise multiplication and sum 242 | if curr_val > running_max: 243 | running_idx = nms_index_kept 244 | running_max = curr_val 245 | if curr_val > p.distractor_thresh: 246 | distractor_index_set.append(nms_index_kept) 247 | distractor_index_set.remove(running_idx) 248 | 249 | # 2. Incremental learning according Eqn. (4) 250 | sum_alpha_i = len(distractor_index_set) * p.alpha_i 251 | running_template = state['acc_beta_phi'] / state['acc_beta'] - state['acc_beta_alpha_phi'] / (state['acc_beta'] * sum_alpha_i) 252 | running_idx = nms_indices_kept[0] 253 | running_max = np.sum(running_template[0].transpose(1, 2, 0) * roi_align_feats[0]) 254 | for i in range(1, num_props): 255 | nms_index_kept = nms_indices_kept[i] 256 | curr_val = np.sum(running_template[0].transpose(1, 2, 0) * roi_align_feats[index2featkept_map[nms_index_kept]]) 257 | if curr_val > running_max: 258 | running_idx = nms_index_kept 259 | running_max = curr_val 260 | 261 | beta_t = p.eta/(1-p.eta) 262 | curr_beta_alpha_phi = np.zeros_like(state['acc_beta_alpha_phi']) 263 | for distractor_index in distractor_index_set: 264 | curr_beta_alpha_phi += p.alpha_i * roi_align_feats[index2featkept_map[distractor_index]].transpose(2, 0, 1)[np.newaxis, ...] 265 | curr_beta_alpha_phi *= p.alpha_hat * beta_t 266 | state['acc_beta_alpha_phi'] += curr_beta_alpha_phi 267 | state['acc_beta'] += beta_t 268 | '''---Distractor-aware incremental learning---''' 269 | 270 | best_pscore_id = running_idx 271 | target = delta[:, best_pscore_id] / scale_z 272 | target_sz = target_sz / scale_z 273 | lr = 0.1 #penalty[best_pscore_id] * score[best_pscore_id] * p.lr 274 | 275 | res_x = target[0] + target_pos[0] 276 | res_y = target[1] + target_pos[1] 277 | 278 | res_w = target_sz[0] * (1 - lr) + target[2] * lr 279 | res_h = target_sz[1] * (1 - lr) + target[3] * lr 280 | 281 | target_pos = np.array([res_x, res_y]) 282 | target_sz = np.array([res_w, res_h]) 283 | return target_pos, target_sz, score[best_pscore_id] 284 | 285 | 286 | def SiamRPN_init(im, target_pos, target_sz, net): 287 | ## target_pos is (cx, cy) 288 | ## target_sz is (w, h) 289 | 290 | state = dict() 291 | p = TrackerConfig() 292 | state['im_h'] = im.shape[0] 293 | state['im_w'] = im.shape[1] 294 | 295 | if ((target_sz[0]*target_sz[1]) / float(state['im_h']*state['im_w'])) < 0.004: 296 | p.instance_size = 287 # small object big search region 297 | else: 298 | p.instance_size = 255 #271 299 | 300 | p.delta_score_size = int((p.instance_size-p.exemplar_size)/p.total_stride+1) # size of the last feature map, expected to be 17 301 | 302 | # all anchors of each aspect ratio and scale at each location are generated. 
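    # Note: generate_all_anchors() (defined in gen_all_anchors.py and imported here via train.py)
    # returns every anchor as an (x1, y1, x2, y2) box on the delta_score_size x delta_score_size grid,
    # five aspect ratios per location, together with the per-location anchor count A.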
303 | p.anchors, _ = generate_all_anchors((p.delta_score_size, p.delta_score_size), 304 | (p.instance_size, p.instance_size)) 305 | # of shape (dropping from 2420 down to 433, 4) 306 | 307 | avg_chans = np.mean(im, axis=(0, 1)) #??????????? 308 | 309 | wc_z = target_sz[0] + p.context_amount * sum(target_sz) # adding some context info 310 | hc_z = target_sz[1] + p.context_amount * sum(target_sz) # adding some context info 311 | s_z = round(np.sqrt(wc_z * hc_z)) 312 | 313 | # initialize the exemplar 314 | z_crop = get_subwindow_tracking(im, target_pos, p.exemplar_size, s_z, avg_chans) 315 | 316 | z = Variable(z_crop.unsqueeze(0)) 317 | template_feat = net.template(z.cuda()) 318 | 319 | if p.windowing == 'cosine': 320 | # return the outer product of two hanning vectors, which is a matrix of the same size as the feature map of search region 321 | window = np.outer(np.hanning(p.delta_score_size), np.hanning(p.delta_score_size)) ############### p.score_size??? 322 | elif p.windowing == 'uniform': 323 | window = np.ones((p.delta_score_size, p.delta_score_size)) ################## p.score_size??? 324 | 325 | # flatten and replicate the cosine window 326 | window = np.tile(window.flatten(), p.basic_anchor_num) 327 | 328 | state['p'] = p 329 | state['net'] = net 330 | state['avg_chans'] = avg_chans 331 | state['window'] = window 332 | state['target_pos'] = target_pos 333 | state['target_sz'] = target_sz 334 | state['score'] = 1.0 335 | 336 | # for distractor-aware incremental learning 337 | template_feat_cpu = template_feat.cpu().detach().numpy() 338 | state['template_feat'] = template_feat_cpu 339 | state['acc_beta_phi'] = template_feat_cpu 340 | state['acc_beta'] = 1.0 341 | state['acc_beta_alpha_phi'] = np.zeros_like(template_feat_cpu) 342 | 343 | return state 344 | 345 | 346 | def SiamRPN_track(state, im): 347 | p = state['p'] # tracking config 348 | net = state['net'] 349 | avg_chans = state['avg_chans'] 350 | window = state['window'] # cosine window 351 | target_pos = state['target_pos'] # cx, cy of target in the previous frame 352 | target_sz = state['target_sz'] # w, h of target in the previous frame 353 | template_feat = state['template_feat'] 354 | 355 | wc_z = target_sz[0] + p.context_amount * sum(target_sz) 356 | hc_z = target_sz[1] + p.context_amount * sum(target_sz) 357 | s_z = np.sqrt(wc_z * hc_z) 358 | scale_z = p.exemplar_size / s_z 359 | 360 | ###'Local to Global': if failure mode is activated then expand d_search; otherwise set d_search to normal 361 | d_search = (p.instance_size - p.exemplar_size) / 2 362 | if state['score'] < 0.3: 363 | d_search *= 2 364 | 365 | pad = d_search / scale_z 366 | s_x = s_z + 2 * pad 367 | 368 | # extract scaled crops for search region x at previous target position 369 | x_crop = Variable(get_subwindow_tracking(im, target_pos, p.instance_size, round(s_x), avg_chans).unsqueeze(0)) 370 | # where the third argument is the model size and the fourth is the orginal size in the raw image. 
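    # The call below decodes proposals from the RPN outputs, prunes them with NMS, compares their
    # RoI-aligned features against the stored template for distractor-aware re-ranking, and maps the
    # winning proposal back to image coordinates; target_sz is passed pre-multiplied by scale_z
    # because it is divided by scale_z again inside.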
371 | 372 | target_pos, target_sz, score = tracker_eval_distractor_aware(x_crop.cuda(), target_sz*scale_z, scale_z, state) 373 | 374 | target_pos[0] = max(0, min(state['im_w'], target_pos[0])) 375 | target_pos[1] = max(0, min(state['im_h'], target_pos[1])) 376 | target_sz[0] = max(10, min(state['im_w'], target_sz[0])) 377 | target_sz[1] = max(10, min(state['im_h'], target_sz[1])) 378 | 379 | state['target_pos'] = target_pos 380 | state['target_sz'] = target_sz 381 | state['score'] = score 382 | return state 383 | -------------------------------------------------------------------------------- /code/test.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import cv2 # imread 3 | import torch 4 | import numpy as np 5 | from os.path import realpath, dirname, join 6 | 7 | from net import SiamRPNBIG 8 | from run_SiamRPN import SiamRPN_init, SiamRPN_track 9 | from utils import get_axis_aligned_bbox, cxy_wh_2_rect 10 | from util_test import * 11 | import linecache 12 | 13 | class Tracker(object): 14 | def __init__(self, 15 | path_seq, 16 | num_frames, 17 | gt_first): 18 | 19 | self.path_seq = path_seq 20 | self.num_frames = num_frames 21 | self.gt_first = gt_first 22 | 23 | # load net 24 | self.net = SiamRPNBIG() 25 | # self.net.load_state_dict(torch.load("./SiamRPNBIG.model")) 26 | self.net.eval().cuda() 27 | 28 | #self.testing_config = testing_config 29 | self.cur_seq_name = os.path.split(path_seq)[1] 30 | self.cur_frame = None 31 | 32 | def on_tracking(self): 33 | # warm up 34 | for i in range(10): 35 | self.net.template(torch.autograd.Variable(torch.FloatTensor(1, 3, 127, 127)).cuda()) 36 | self.net(torch.autograd.Variable(torch.FloatTensor(1, 3, 255, 255)).cuda()) 37 | 38 | i = 1 39 | pred_bbx = self.gt_first 40 | print("{}th frame: {} {} {} {}".format(i, pred_bbx[0],pred_bbx[1], pred_bbx[2], pred_bbx[3])) 41 | cx, cy, w, h = pred_bbx[0]+pred_bbx[2]/2.0, pred_bbx[1]+pred_bbx[3]/2.0, pred_bbx[2], pred_bbx[3] 42 | i += 1 43 | 44 | target_pos, target_sz = np.array([cx, cy]), np.array([w, h]) 45 | im = cv2.imread(self.path_seq + '/imgs/0001.jpg') # HxWxC 46 | state = SiamRPN_init(im, target_pos, target_sz, self.net) # init tracker 47 | 48 | while i <= self.num_frames: 49 | self.index_frame = i 50 | im = cv2.imread(self.path_seq + '/imgs/' + str(i).zfill(4) + '.jpg') 51 | state = SiamRPN_track(state, im) 52 | 53 | # convert cx, cy, w, h into rect 54 | res = cxy_wh_2_rect(state['target_pos'], state['target_sz']) 55 | print(f"{i}th frame: ", res) 56 | i += 1 57 | 58 | 59 | if __name__ == "__main__": 60 | os.environ["CUDA_VISIBLE_DEVICES"] = "1" 61 | print('*****************TEST PHASE********************') 62 | import datetime 63 | testing_date = datetime.datetime.now().strftime('%b-%d-%y_%H:%M:%S') 64 | 65 | seq_list_path = '../data/whole_list.txt' 66 | 67 | seq_path_list = np.genfromtxt(seq_list_path, dtype='S', usecols=0) 68 | num_frames_list = np.genfromtxt(seq_list_path, dtype=int, usecols=1) 69 | if seq_path_list.ndim == 0: 70 | seq_path_list = seq_path_list.reshape(1) 71 | num_frames_list = num_frames_list.reshape(1) 72 | 73 | assert len(seq_path_list) == len(num_frames_list) 74 | total_seqs = len(seq_path_list) 75 | total_frames = sum(num_frames_list) 76 | for seq_index in range(len(seq_path_list)): 77 | path_seq = seq_path_list[seq_index].decode('utf-8') 78 | num_frames = num_frames_list[seq_index] 79 | 80 | seq_name = os.path.split(path_seq)[1] 81 | print(f"\nprocessing Sequence {seq_name} with {num_frames} frames...") 82 | 83 | global 
gt_file_name 84 | gt_file_name = path_seq + '/' + seq_name + "_gt.txt" 85 | 86 | gt_entry = linecache.getline(gt_file_name, 1) 87 | gt_first = parse_gt_entry(gt_entry) 88 | 89 | tracker = Tracker(path_seq=path_seq, 90 | num_frames=num_frames, 91 | gt_first=gt_first) 92 | tracker.on_tracking() 93 | -------------------------------------------------------------------------------- /code/train.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import os 3 | import torch 4 | 5 | from net import SiamRPNBIG 6 | from gen_all_anchors import generate_all_anchors 7 | from bbox_transform import bbox_transform 8 | from config import cfg 9 | 10 | import argparse 11 | import dataset 12 | from tqdm import tqdm 13 | 14 | 15 | def bbox_overlaps(box, gt, phase='iou'): 16 | """ 17 | Compute the overlaps between box and gt(_box) 18 | box: (N, 4) NDArray 19 | gt : (K, 4) NDArray 20 | return: (N, K) NDArray, stores Max(0, intersection/union) or Max(0, intersection/area_box) 21 | """ 22 | # Note that the inputs are in box format: x1, y1, x2, y2 23 | 24 | N = box.shape[0] 25 | K = gt.shape[0] 26 | target_shape = (N, K, 4) 27 | b_box = np.broadcast_to(np.expand_dims(box, axis=1), target_shape) 28 | b_gt = np.broadcast_to(np.expand_dims(gt, axis=0), target_shape) 29 | 30 | iw = (np.minimum(b_box[:, :, 2], b_gt[:, :, 2]) - 31 | np.maximum(b_box[:, :, 0], b_gt[:, :, 0])) 32 | ih = (np.minimum(b_box[:, :, 3], b_gt[:, :, 3]) - 33 | np.maximum(b_box[:, :, 1], b_gt[:, :, 1])) 34 | inter = np.maximum(iw, 0) * np.maximum(ih, 0) 35 | 36 | # Use the broadcast to save some time 37 | area_box = (box[:, 2] - box[:, 0]) * (box[:, 3] - box[:, 1]) 38 | area_gt = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1]) 39 | area_target_shape = (N, K) 40 | b_area_box = np.broadcast_to(np.expand_dims(area_box, axis=1), area_target_shape) 41 | b_area_gt = np.broadcast_to(np.expand_dims(area_gt, axis=0), area_target_shape) 42 | 43 | assert phase == 'iou' or phase == 'ioa' 44 | union = b_area_box + b_area_gt - inter if phase == 'iou' else b_area_box 45 | 46 | overlaps = np.maximum(inter / np.maximum(union, 1), 0) 47 | return overlaps 48 | 49 | 50 | def smooth_l1_loss(bbox_pred, bbox_targets, bbox_inside_weights, bbox_outside_weights, beta=1.0): 51 | box_diff = bbox_pred - bbox_targets 52 | in_box_diff = bbox_inside_weights * box_diff 53 | abs_in_box_diff = torch.abs(in_box_diff) 54 | smooth_l1_sign = (abs_in_box_diff < beta).detach().float() 55 | 56 | in_loss_box = smooth_l1_sign * 0.5 * torch.pow(in_box_diff, 2) / beta + \ 57 | (1-smooth_l1_sign) * (abs_in_box_diff-0.5*beta) 58 | 59 | out_loss_box = bbox_outside_weights * in_loss_box 60 | loss_box = out_loss_box 61 | N = loss_box.size(0) 62 | loss_box = loss_box.view(-1).sum(0) / N 63 | return loss_box 64 | 65 | 66 | def _unmap(data, count, inds, fill=0): 67 | """ Unmap a subset of item (data) back to the original set of items (of size count) """ 68 | if len(data.shape) == 1: 69 | ret = np.empty((count, ), dtype=np.float32) 70 | ret.fill(fill) 71 | ret[inds] = data 72 | else: 73 | ret = np.empty((count, ) + data.shape[1:], dtype=np.float32) 74 | ret.fill(fill) 75 | ret[inds, :] = data 76 | return ret 77 | 78 | 79 | def _compute_targets(ex_rois, gt_rois): 80 | """Compute bounding-box regression targets for an image.""" 81 | assert ex_rois.shape[0] == gt_rois.shape[0] 82 | assert ex_rois.shape[1] == 4 83 | assert gt_rois.shape[1] == 4 #5 84 | 85 | return bbox_transform(ex_rois, gt_rois[:, :4].numpy()).astype(np.float32, copy=False) 86 | 87 
|
88 | def gen_anchor_target(cls_output_shape, xs_shape, gt_boxes):
89 |     """
90 |     Assign anchors to ground-truth targets.
91 |     Produces anchor classification labels and bounding-box regression targets.
92 |     """
93 |     height, width = cls_output_shape
94 |     all_anchors, A = generate_all_anchors(cls_output_shape, xs_shape)
95 |     # Note that anchors are in format (x1, y1, x2, y2)
96 | 
97 |     total_anchors = all_anchors.shape[0]
98 |     inds_inside = np.where(
99 |         (all_anchors[:, 0] >= 0) &
100 |         (all_anchors[:, 1] >= 0) &
101 |         (all_anchors[:, 2] < xs_shape[1]) &
102 |         (all_anchors[:, 3] < xs_shape[0])
103 |     )[0]
104 |     anchors = all_anchors[inds_inside, :]
105 | 
106 |     labels = np.zeros((1, 1, A*height, width))  # placeholder shapes; all four arrays are re-computed below
107 |     bbox_targets = np.zeros((1, 4*A, height, width))
108 |     bbox_inside_weights = np.zeros((1, 4*A, height, width))
109 |     bbox_outside_weights = np.zeros((1, 4*A, height, width))
110 | 
111 |     # label: 1 is positive, 0 is negative, -1 is don't care
112 |     labels = np.empty((len(inds_inside), ), dtype=np.float32)
113 |     labels.fill(-1)
114 | 
115 |     # overlaps between anchors and gt boxes
116 |     # overlaps.shape = (#anchors_inside, #gts)
117 |     overlaps = bbox_overlaps(
118 |         np.ascontiguousarray(anchors, dtype=np.float),
119 |         np.ascontiguousarray(gt_boxes, dtype=np.float))
120 | 
121 |     argmax_overlaps = overlaps.argmax(axis=1)  # of shape (#anchors_inside, )
122 |     max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]  # of shape (#anchors_inside, )
123 | 
124 |     gt_argmax_overlaps = overlaps.argmax(axis=0)  # of shape (#gt, )
125 |     gt_max_overlaps = overlaps[gt_argmax_overlaps, np.arange(overlaps.shape[1])]  # of shape (#gt, )
126 |     gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
127 | 
128 |     labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0  # 0.3
129 |     labels[gt_argmax_overlaps] = 1
130 |     labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1  # 0.7
131 | 
132 |     bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32)
133 |     # _compute_targets() returns a #sifted_anchors-by-4 tensor with each row being (dx, dy, dw, dh),
134 |     # the increment to be learnt by the bbox regressor
135 |     bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])
136 |     bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
137 |     bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)  # RPN_BBOX_INSIDE_WEIGHTS = [1.0, 1.0, 1.0, 1.0]
138 | 
139 |     bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
140 |     if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:  # cfg.TRAIN.RPN_POSITIVE_WEIGHT == -1.0
141 |         # uniform weighting of examples (given non-uniform sampling)
142 |         num_examples = np.sum(labels >= 0)  # num_examples counts the anchors labeled 0 or 1
143 |         positive_weights = np.ones((1, 4)) * 1.0 / num_examples
144 |         negative_weights = np.ones((1, 4)) * 1.0 / num_examples
145 |     else:
146 |         assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
147 |                 (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))
148 |         positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT / np.sum(labels == 1))
149 |         negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) / np.sum(labels == 0))
150 | 
151 |     bbox_outside_weights[labels == 1, :] = positive_weights
152 |     bbox_outside_weights[labels == 0, :] = negative_weights
153 | 
154 |     # map up to original set of anchors
155 |     labels = _unmap(labels, total_anchors, inds_inside, fill=-1)  # labels.shape == (#total_anchors, )
156 |     bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)  # bbox_targets.shape == (#total_anchors, 4)
157 | 
bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0) #bbox_inside_weights.shape == (#total_anchors, 4) 158 | bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0) #bbox_outside_weights.shape == (#total_anchors, 4) 159 | 160 | # labels 161 | labels = labels.reshape((1, height, width, A)).transpose(0, 3, 1, 2) # of shape (1, A, height, width) 162 | labels = labels.reshape((1, 1, A*height, width)) 163 | # bbox_targets 164 | bbox_targets = bbox_targets.reshape((1, height, width, A*4)).transpose(0, 3, 1, 2) # of shape (1, 4*A, height, width) 165 | # bbox_inside_weights 166 | bbox_inside_weights = bbox_inside_weights.reshape((1, height, width, A*4)).transpose(0, 3, 1, 2) # of shape (1, 4*A, height, width) 167 | # bbox_outside_weights 168 | bbox_outside_weights = bbox_outside_weights.reshape((1, height, width, A*4)).transpose(0, 3, 1, 2) # of shape (1, 4*A, height, width) 169 | 170 | return labels, bbox_targets, bbox_inside_weights, bbox_outside_weights, A 171 | 172 | 173 | def parse_args(): 174 | parser = argparse.ArgumentParser() 175 | parser.add_argument('--gpu_id', default=0, help='GPU ID to use, e.g. \'0\'', type=int) 176 | 177 | return parser.parse_args() 178 | 179 | 180 | def load_pretrained_weights(net, weight_file_path): 181 | ori_pretrained_dict = torch.load(weight_file_path) 182 | model_dict = net.state_dict() 183 | #pretrained_dict = {k: v for k, v in ori_pretrained_dict.items() if k in model_dict} 184 | 185 | import collections 186 | pretrained_dict = collections.OrderedDict() 187 | 188 | for k, v in ori_pretrained_dict.items(): 189 | if k in model_dict and k.startswith('featureExtract'): # Only load the modified AlexNet weights 190 | pretrained_dict[k] = v 191 | # print(k) 192 | 193 | model_dict.update(pretrained_dict) 194 | net.load_state_dict(model_dict) 195 | 196 | 197 | if __name__ == '__main__': 198 | torch.backends.cudnn.enabled=False # to temporally remove the issue "illegal access to memory" 199 | 200 | args = parse_args() 201 | gpu_id = args.gpu_id 202 | if gpu_id is None: 203 | DEVICE = torch.device(f'cpu') 204 | else: 205 | DEVICE = torch.device(f'cuda:{gpu_id}') 206 | 207 | z_size = (127, 127) 208 | x_size = (255, 255) 209 | batch_size = num_domains = 50 210 | num_epoches = 100 211 | 212 | loader = dataset.load_data(batch_size, z_size, x_size)['train'] 213 | 214 | net = SiamRPNBIG() 215 | net.train().to(DEVICE) 216 | # load_pretrained_weights(net, "./SiamRPNBIG.model") 217 | optimizer = torch.optim.Adam(net.parameters(), weight_decay=0.001, lr=0.001) 218 | 219 | for i_ep in range(num_epoches): 220 | for i_iter, sample in tqdm(enumerate(loader), total=len(loader)): 221 | zs = sample['template'].to(DEVICE) 222 | xs = sample['search_region'].to(DEVICE) 223 | gt_boxes = sample['gt_box'] #.to(DEVICE) 224 | 225 | optimizer.zero_grad() 226 | 227 | net.template(zs) 228 | reg_output, cls_output, _ = net.forward(xs) # of shape (50, 4*5, 17, 17), (50, 2*5, 17, 17) 229 | 230 | feat_h, feat_w = tuple(cls_output.size()[-2:]) 231 | assert zs.shape[0] == xs.shape[0] == gt_boxes.shape[0] 232 | total_loss = total_cls_loss = total_reg_loss = 0.0 233 | for i in range(zs.shape[0]): 234 | rpn_labels, \ 235 | rpn_bbox_targets, \ 236 | rpn_bbox_inside_weights, \ 237 | rpn_bbox_outside_weights, \ 238 | A \ 239 | = gen_anchor_target(cls_output[i].shape[-2:], xs[i].shape[-2:], gt_boxes[i][np.newaxis, :]) 240 | 241 | #reg_loss_fn = torch.nn.SmoothL1Loss(reduce=False, size_average=False) 242 | reg_loss_fn = smooth_l1_loss 243 | 
reg_loss = reg_loss_fn(reg_output[i], torch.from_numpy(rpn_bbox_targets).to(DEVICE), torch.from_numpy(rpn_bbox_inside_weights).to(DEVICE), torch.from_numpy(rpn_bbox_outside_weights).to(DEVICE)) 244 | 245 | cls_loss_fn = torch.nn.CrossEntropyLoss(reduce=False, size_average=False) 246 | 247 | rpn_labels = rpn_labels.reshape(A, feat_h, feat_w) # from (1, 1, A*17, 17) to (A, 17, 17) 248 | logits = cls_output[i].view(A, 2, feat_h, feat_w) # from (2*A, 17, 17) to (A, 2, 17, 17) 249 | cls_loss = cls_loss_fn(logits, torch.from_numpy(rpn_labels).to(DEVICE).long()) # (A, 17, 17) 250 | 251 | mask = np.ones_like(rpn_labels) 252 | mask[np.where(rpn_labels==-1)] = 0 # mask where we 'don't care' 253 | 254 | #import pdb 255 | #pdb.set_trace() 256 | 257 | mask = torch.from_numpy(mask).to(DEVICE) 258 | cls_loss = torch.sum(cls_loss * mask) / torch.sum(mask) 259 | 260 | #print("{} + l * {} = {}".format(cls_loss, reg_loss, cls_loss+cfg.TRAIN.LAMBDA*reg_loss)) 261 | 262 | total_cls_loss += cls_loss 263 | total_reg_loss += reg_loss 264 | total_loss += cls_loss + cfg.TRAIN.LAMBDA * reg_loss 265 | 266 | total_loss /= batch_size 267 | total_reg_loss /= batch_size 268 | total_cls_loss /= batch_size 269 | print(f"Epoch{i_ep} Iter{i_iter} --- total_loss: {total_loss:.4f}, cls_loss: {total_cls_loss:.4f}, reg_loss: {total_reg_loss:.4f}") 270 | total_loss.backward() 271 | optimizer.step() 272 | 273 | ######## Save the current model 274 | print("Saving model...") 275 | if not os.path.exists("./output/weights"): 276 | os.makedirs("./output/weights") 277 | torch.save(net.state_dict(), f"./output/weights/dasiam_{i_ep}.pkl") 278 | 279 | print("Training completed.") 280 | 281 | -------------------------------------------------------------------------------- /code/util_test.py: -------------------------------------------------------------------------------- 1 | #import matplotlib as mpl 2 | #import matplotlib.cbook as cbook 3 | import os 4 | import cv2 5 | import numpy as np 6 | #import matplotlib.pyplot as plt 7 | 8 | import datetime 9 | 10 | """ 11 | UTIL OF TEST version 1.0 12 | """ 13 | 14 | def convert_box2bbx(box): 15 | x1, y1, x2, y2 = box 16 | return (x1, y1, x2-x1+1, y2-y1+1) 17 | 18 | 19 | def convert_bbx2box(bbx): 20 | x, y, w, h = bbx 21 | return (x, y, x+w-1, y+h-1) 22 | 23 | 24 | def save_pred_bboxes_v2(pred_tuple_list, seq_name, testing_date): 25 | source_path = '/home/code/xuxiaqing/dataset/OTB100/{}/imgs'.format(seq_name) 26 | saving_path = './output/tracking_res/OTB100/{}/{}/'.format(testing_date, seq_name) 27 | 28 | if not os.path.exists(saving_path): 29 | os.makedirs(saving_path) 30 | 31 | list_file = open(os.path.join(saving_path, 'preds.txt'), 'w') 32 | for index, pred_tuple in enumerate(pred_tuple_list): 33 | pred_bbx, score = pred_tuple 34 | raw_img_name = '%s' % (str(index+1).zfill(4)) + '.jpg' 35 | raw_img_path = os.path.join(source_path, raw_img_name) 36 | frame = cv2.imread(raw_img_path) 37 | 38 | left = int(round(pred_bbx[0])) 39 | top = int(round(pred_bbx[1])) 40 | right = int(round(pred_bbx[0] + pred_bbx[2] - 1)) 41 | bottom = int(round(pred_bbx[1] + pred_bbx[3] - 1)) 42 | 43 | ############################################################################## 44 | cv2.rectangle(frame, (left, top), (right, bottom), (255, 255, 0), 2) 45 | cv2.putText(frame, str(score), (left, top), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 255), 1, 8) 46 | # cv2.putText(frame, str(thresh_eps), (right, bottom), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 97, 255), 1, 8) 47 | cv2.imwrite(os.path.join(saving_path, raw_img_name), 
frame) 48 | 49 | entry = str(pred_bbx[0]) + ' ' + str(pred_bbx[1]) + ' ' + str(pred_bbx[2]) + ' ' + str(pred_bbx[3]) 50 | list_file.write(entry + '\n') 51 | 52 | list_file.close() 53 | print('\nPredictions of Seq ' + seq_name + ' saved.') 54 | 55 | def save_pred_bboxes(pred_bbx_list, score_list, seq_name): 56 | assert len(pred_bbx_list) == len(score_list), 'length of lists not equal' 57 | 58 | saving_path = cfg.ROOT_DIR + './output/tracking_res/{}/'.format(cfg.TEST.BENCHMARK_NAME) + seq_name 59 | source_path = '/home/lishen/Experiments/siamese_tracking_net/dataset/test_set/OTB100/' + seq_name + '/imgs' 60 | 61 | if not os.path.exists(saving_path): 62 | os.makedirs(saving_path) 63 | 64 | list_file = open(os.path.join(saving_path, 'preds.txt'), 'w') 65 | for index, pred_bbx in enumerate(pred_bbx_list): 66 | raw_img_name = '%s' % (str(index+1).zfill(4)) + '.jpg' 67 | raw_img_path = os.path.join(source_path, raw_img_name) 68 | frame = cv2.imread(raw_img_path) 69 | 70 | left = int(round(pred_bbx[0])) 71 | top = int(round(pred_bbx[1])) 72 | right = int(round(pred_bbx[0] + pred_bbx[2] - 1)) 73 | bottom = int(round(pred_bbx[1] + pred_bbx[3] - 1)) 74 | 75 | score = score_list[index] 76 | 77 | ############################################################################## 78 | cv2.rectangle(frame, (left, top), (right, bottom), (255, 0, 0), 2) 79 | cv2.putText(frame, '{}'.format(score), (left, top), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1, 8) 80 | cv2.imwrite(os.path.join(saving_path, raw_img_name), frame) 81 | 82 | entry = str(pred_bbx[0]) + ' ' + str(pred_bbx[1]) + ' ' + str(pred_bbx[2]) + ' ' + str(pred_bbx[3]) 83 | list_file.write(entry + '\n') 84 | 85 | list_file.close() 86 | print('\nPredictions of Seq ' + seq_name + ' saved.') 87 | 88 | def save_pred_bboxes_bbxr_exclusive(pred_bbx_list_before_reg, pred_bbx_list, score_list, seq_name): 89 | assert len(pred_bbx_list) == len(score_list), 'length of lists not equal' 90 | assert len(pred_bbx_list) == len(pred_bbx_list_before_reg), 'length of lists not equal' 91 | 92 | saving_path = cfg.ROOT_DIR + '/output/tracking_res/{}/'.format(cfg.TEST.BENCHMARK_NAME) + seq_name 93 | source_path = '/home/lishen/Experiments/siamese_tracking_net/dataset/test_set/Benchmark/' + seq_name + '/imgs' 94 | 95 | if not os.path.exists(saving_path): 96 | os.makedirs(saving_path) 97 | 98 | list_file = open(os.path.join(saving_path, 'preds.txt'), 'w') 99 | for index, pred_bbx in enumerate(pred_bbx_list): 100 | raw_img_name = '%s' % (str(index+1).zfill(4)) + '.jpg' 101 | raw_img_path = os.path.join(source_path, raw_img_name) 102 | frame = cv2.imread(raw_img_path) 103 | 104 | left = int(round(pred_bbx[0])) 105 | top = int(round(pred_bbx[1])) 106 | right = int(round(pred_bbx[0] + pred_bbx[2] - 1)) 107 | bottom = int(round(pred_bbx[1] + pred_bbx[3] - 1)) 108 | 109 | score = score_list[index] 110 | 111 | ## predicted bbx before regression ## 112 | left_before = int(round(pred_bbx_list_before_reg[index][0])) 113 | top_before = int(round(pred_bbx_list_before_reg[index][1])) 114 | right_before = int(round(pred_bbx_list_before_reg[index][0] + pred_bbx_list_before_reg[index][2] - 1)) 115 | bottom_before = int(round(pred_bbx_list_before_reg[index][1] + pred_bbx_list_before_reg[index][3] - 1)) 116 | cv2.rectangle(frame, (left_before, top_before), (right_before, bottom_before), (0, 0, 255), 2) 117 | 118 | ############################################################################## 119 | cv2.rectangle(frame, (left, top), (right, bottom), (255, 0, 0), 2) 120 | cv2.putText(frame, 
'{}'.format(score), (left, top), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1, 8) 121 | cv2.imwrite(os.path.join(saving_path, raw_img_name), frame) 122 | 123 | entry = str(pred_bbx[0]) + ' ' + str(pred_bbx[1]) + ' ' + str(pred_bbx[2]) + ' ' + str(pred_bbx[3]) 124 | list_file.write(entry + '\n') 125 | 126 | list_file.close() 127 | print('\nPredictions of Seq ' + seq_name + ' saved.') 128 | 129 | 130 | def parse_gt_entry(gt_entry): 131 | split_gt_entry = gt_entry.split() #',' 132 | 133 | left = float(split_gt_entry[0]) 134 | top = float(split_gt_entry[1]) 135 | width = float(split_gt_entry[2]) 136 | height = float(split_gt_entry[3]) 137 | return (left, top, width, height) 138 | 139 | 140 | def crop_roi(frame, bbx): 141 | #box = (x1, y1, x2, y2) 142 | box = (int(round(bbx[0])), int(round(bbx[1])), int(round(bbx[0]+bbx[2])), int(round(bbx[1]+bbx[3]))) 143 | return frame[box[1]:box[3], box[0]:box[2], :] 144 | 145 | 146 | def crop_and_save(seq_name, raw_img, idx_frame, samples, type_str): 147 | root_dir = cfg.CODE_ROOT_DIR + '/output/finetuning_data/{}'.format(cfg.TEST.BENCHMARK_NAME) 148 | 149 | tar_dir = root_dir + '/' + seq_name + '/' + str(idx_frame) + '/' + type_str 150 | if not os.path.exists(tar_dir): 151 | os.makedirs(tar_dir) 152 | 153 | for idx in xrange(samples.shape[0]): 154 | bbx_sample = samples[idx, :] 155 | box = (int(round(bbx_sample[0])), int(round(bbx_sample[1])), int(round(bbx_sample[0]+bbx_sample[2]-1)), int(round(bbx_sample[1]+bbx_sample[3]-1))) 156 | patch = raw_img[box[1]:box[3], box[0]:box[2], :] 157 | path_patch = tar_dir + '/' + str(idx+1) + '.jpg' 158 | cv2.imwrite(path_patch, patch) 159 | 160 | 161 | def sub_gen(gt, raw_img_size): 162 | sub_pos_samples = np.zeros((25, 4), dtype=np.float32) 163 | index = 0 164 | 165 | right_img = raw_img_size[1] 166 | bottom_img = raw_img_size[0] 167 | 168 | for dx in np.arange(-2, 3): 169 | for dy in np.arange(-2, 3): 170 | '''determine a new bounding box''' 171 | left = gt[0] + dx 172 | top = gt[1] + dy 173 | width = gt[2] + np.abs(dx) 174 | height = gt[3] + np.abs(dy) 175 | 176 | '''in case it lies beyond the boundary''' 177 | left = min(right_img, max(0, left)) # 0 <= left <= right_img 178 | top = min(bottom_img, max(0, top)) 179 | 180 | right = left + width 181 | right = min(right_img, max(0, right)) 182 | 183 | bottom = top + height 184 | bottom = min(bottom_img, max(0, bottom)) 185 | 186 | width = right - left 187 | height = bottom - top 188 | 189 | sub_pos_samples[index, :] = np.array([[left, top, width, height]]) 190 | index += 1 191 | 192 | return sub_pos_samples 193 | 194 | 195 | def gen_positive_samples(gt, raw_img_size): 196 | '''This function will generate 50 positive samples using pixel-difference type''' 197 | 198 | #Generate the first 25 positives 199 | first_sub_pos_samples = sub_gen(gt, raw_img_size) 200 | 201 | #Generate the second 25 positives 202 | shifted_gt = (gt[0]-1, gt[1]-1, gt[2]+2, gt[3]+2) 203 | second_sub_pos_samples = sub_gen(shifted_gt, raw_img_size) 204 | 205 | pos_samples = np.vstack((first_sub_pos_samples, second_sub_pos_samples)) 206 | return pos_samples 207 | 208 | 209 | def IoU(prop, gt): 210 | x1, y1, w1, h1 = float(prop[0]), float(prop[1]), float(prop[2]), float(prop[3]) 211 | x2, y2, w2, h2 = float(gt[0]), float(gt[1]), float(gt[2]), float(gt[3]) 212 | startx, endx = min(x1, x2), max(x1 + w1, x2 + w2) 213 | starty, endy = min(y1, y2), max(y1 + h1, y2 + h2) 214 | width = w1 + w2 - (endx - startx) 215 | height = h1 + h2 - (endy - starty) 216 | if width <= 0 or height <= 0: 217 | return 0 
218 | else : 219 | area = width * height 220 | return 1.0 * area / (w1*h1 + w2*h2 - area) 221 | 222 | 223 | def post_proc(random_scalar): 224 | return max(-1, min(1, 0.5 * random_scalar)) #restrict it within the interval [-1, 1] 225 | 226 | 227 | def gen_samples_box(sampling_type, 228 | gt, 229 | num_samples, 230 | raw_img_size, 231 | base_scalar=1.05, 232 | trans_fac=0.1, 233 | scale_fac=5, 234 | pos_sampling=True, 235 | pos_thresh=0.7, 236 | neg_thresh=0.3, 237 | iou_thresh_ignored=False): 238 | 239 | H = raw_img_size[0] 240 | W = raw_img_size[1] 241 | 242 | #sample = (cx, cy, w, h), where (cx, cy) is the coodinate of the gt image 243 | sample = np.array([gt[0]+gt[2]/2, gt[1]+gt[3]/2, gt[2], gt[3]], dtype = np.float32) 244 | samples = np.tile(sample, (num_samples, 1)) 245 | 246 | idx = 0 247 | while idx < num_samples: 248 | curr_sample = samples[idx, :].copy() 249 | 250 | if sampling_type == 'gaussian': 251 | lt_increment = trans_fac * round(np.mean(gt[2:4])) * np.array([post_proc(np.random.randn(1,)), post_proc(np.random.randn(1,))]) 252 | curr_sample[:2] = curr_sample[:2] + lt_increment.reshape(2,) 253 | 254 | randn_vec = np.array([post_proc(np.random.randn(1,)), post_proc(np.random.randn(1,))]) 255 | wh_factor = base_scalar ** (scale_fac * randn_vec) 256 | curr_sample[2:] = curr_sample[2:] * wh_factor.reshape(2,) 257 | 258 | elif sampling_type == 'uniform': #uniform distribution within a searching area 2.5 times the size of bbx 259 | sr_ratio = 3.5 #cfg.TEST.UNIFORM_SAMPLING_RANGE_RATIO #twice or 2.5 times??? 260 | 261 | randn_vec = np.array([post_proc(np.random.randn(1,)), post_proc(np.random.randn(1,))]) 262 | wh_factor = base_scalar ** (scale_fac * randn_vec) 263 | curr_sample[2:] = curr_sample[2:] * wh_factor.reshape(2,) 264 | 265 | cx_bound = (curr_sample[0]-curr_sample[2]*(sr_ratio/2), curr_sample[0]+curr_sample[2]*(sr_ratio/2)) 266 | cy_bound = (curr_sample[1]-curr_sample[3]*(sr_ratio/2), curr_sample[1]+curr_sample[3]*(sr_ratio/2)) 267 | cx = (cx_bound[1] - cx_bound[0]) * np.random.random_sample() + cx_bound[0] 268 | cy = (cy_bound[1] - cy_bound[0]) * np.random.random_sample() + cy_bound[0] 269 | 270 | curr_sample[0] = cx 271 | curr_sample[1] = cy 272 | 273 | elif sampling_type == 'whole': #uniform distribution within the whole image 274 | randn_vec = np.array([post_proc(np.random.randn(1,)), post_proc(np.random.randn(1,))]) 275 | wh_factor = base_scalar ** (scale_fac * randn_vec) 276 | curr_sample[2:] = curr_sample[2:] * wh_factor.reshape(2,) 277 | 278 | w = curr_sample[2] 279 | h = curr_sample[3] 280 | curr_sample[0] = (W - w) * np.random.random_sample() + w / 2.0 281 | curr_sample[1] = (H - h) * np.random.random_sample() + h / 2.0 282 | 283 | '''In case that samples experience abrupt scaling variation...''' ########## 284 | curr_sample[2] = max(5, min(W-5, curr_sample[2])) #w max(gt[2]/5.0, min(gt[2]*5.0, curr_sample[2])) 285 | curr_sample[3] = max(5, min(H-5, curr_sample[3])) #h max(gt[3]/5.0, min(gt[2]*5.0, curr_sample[3])) 286 | 287 | half_w, half_h = curr_sample[2]/2.0, curr_sample[3]/2.0 288 | 289 | # bbx_sample = np.array([curr_sample[0]-curr_sample[2]/2, curr_sample[1]-curr_sample[3]/2, curr_sample[2], curr_sample[3]]) 290 | # bbx_sample[0] = max(0, min(W-bbx_sample[2]-1, bbx_sample[0])) 291 | # bbx_sample[1] = max(0, min(H-bbx_sample[3]-1, bbx_sample[1])) 292 | 293 | """The centre coordinate of candidate box should lie within the [half_w, W-half_w-1]x[half_h, H-half_h-1]""" 294 | curr_sample[0] = max(half_w, min(W-half_w-1, curr_sample[0])) 295 | curr_sample[1] = 
max(half_h, min(H-half_h-1, curr_sample[1])) 296 | 297 | x1, y1 = curr_sample[0]-half_w, curr_sample[1]-half_h 298 | x1, y1 = max(0, min(W-1, x1)), max(0, min(H-1, y1)) ### for insurance 299 | x2, y2 = curr_sample[0]+half_w, curr_sample[1]+half_h 300 | x2, y2 = max(0, min(W-1, x2)), max(0, min(H-1, y2)) ### for insurance 301 | box_sample = np.array([x1, y1, x2, y2]) 302 | 303 | if iou_thresh_ignored: # this is exclusive for sampling candidates during online tracking 304 | samples[idx, :] = box_sample 305 | idx += 1 306 | continue 307 | 308 | overlap_ratio = IoU(convert_box2bbx(box_sample), gt) 309 | if overlap_ratio >= pos_thresh and pos_sampling: #if positive sampling is being performed and its overlapping ratio >= 0.7 310 | samples[idx, :] = box_sample 311 | idx += 1 312 | elif overlap_ratio < neg_thresh and not pos_sampling: #if negative sampling is being performed and its overlapping ratio < 0.3 313 | samples[idx, :] = box_sample 314 | idx += 1 315 | 316 | return samples 317 | 318 | 319 | def gen_samples(sampling_type, 320 | gt, 321 | num_samples, 322 | raw_img_size, 323 | base_scalar=1.05, 324 | trans_fac=0.1, 325 | scale_fac=5, 326 | pos_sampling=True, 327 | pos_thresh=0.7, 328 | neg_thresh=0.3, 329 | iou_thresh_ignored=False): 330 | 331 | H = raw_img_size[0] 332 | W = raw_img_size[1] 333 | 334 | #sample = (cx, cy, w, h), where (cx, cy) is the coodinate of the gt image 335 | sample = np.array([gt[0]+gt[2]/2, gt[1]+gt[3]/2, gt[2], gt[3]], dtype = np.float32) 336 | samples = np.tile(sample, (num_samples, 1)) 337 | 338 | idx = 0 339 | while idx < num_samples: 340 | curr_sample = samples[idx, :].copy() 341 | 342 | if sampling_type == 'gaussian': 343 | lt_increment = trans_fac * round(np.mean(gt[2:4])) * np.array([post_proc(np.random.randn(1,)), post_proc(np.random.randn(1,))]) 344 | curr_sample[:2] = curr_sample[:2] + lt_increment.reshape(2,) 345 | 346 | randn_vec = np.array([post_proc(np.random.randn(1,)), post_proc(np.random.randn(1,))]) 347 | wh_factor = base_scalar ** (scale_fac * randn_vec) 348 | curr_sample[2:] = curr_sample[2:] * wh_factor.reshape(2,) 349 | 350 | elif sampling_type == 'uniform': #uniform distribution within a searching area 2.5 times the size of bbx 351 | sr_ratio = 3.5 #cfg.TEST.UNIFORM_SAMPLING_RANGE_RATIO #twice or 2.5 times??? 
352 | 353 | randn_vec = np.array([post_proc(np.random.randn(1,)), post_proc(np.random.randn(1,))]) 354 | wh_factor = base_scalar ** (scale_fac * randn_vec) 355 | curr_sample[2:] = curr_sample[2:] * wh_factor.reshape(2,) 356 | 357 | cx_bound = (curr_sample[0]-curr_sample[2]*(sr_ratio/2), curr_sample[0]+curr_sample[2]*(sr_ratio/2)) 358 | cy_bound = (curr_sample[1]-curr_sample[3]*(sr_ratio/2), curr_sample[1]+curr_sample[3]*(sr_ratio/2)) 359 | cx = (cx_bound[1] - cx_bound[0]) * np.random.random_sample() + cx_bound[0] 360 | cy = (cy_bound[1] - cy_bound[0]) * np.random.random_sample() + cy_bound[0] 361 | 362 | curr_sample[0] = cx 363 | curr_sample[1] = cy 364 | 365 | elif sampling_type == 'whole': #uniform distribution within the whole image 366 | randn_vec = np.array([post_proc(np.random.randn(1,)), post_proc(np.random.randn(1,))]) 367 | wh_factor = base_scalar ** (scale_fac * randn_vec) 368 | curr_sample[2:] = curr_sample[2:] * wh_factor.reshape(2,) 369 | 370 | w = curr_sample[2] 371 | h = curr_sample[3] 372 | curr_sample[0] = (W - w) * np.random.random_sample() + w / 2.0 373 | curr_sample[1] = (H - h) * np.random.random_sample() + h / 2.0 374 | 375 | '''In case that samples experience abrupt scaling variation...''' ########## 376 | curr_sample[2] = max(5, min(W-5, curr_sample[2])) #w max(gt[2]/5.0, min(gt[2]*5.0, curr_sample[2])) 377 | curr_sample[3] = max(5, min(H-5, curr_sample[3])) #h max(gt[3]/5.0, min(gt[2]*5.0, curr_sample[3])) 378 | 379 | half_w, half_h = curr_sample[2]/2.0, curr_sample[3]/2.0 380 | 381 | # bbx_sample = np.array([curr_sample[0]-curr_sample[2]/2, curr_sample[1]-curr_sample[3]/2, curr_sample[2], curr_sample[3]]) 382 | # bbx_sample[0] = max(0, min(W-bbx_sample[2]-1, bbx_sample[0])) 383 | # bbx_sample[1] = max(0, min(H-bbx_sample[3]-1, bbx_sample[1])) 384 | 385 | """The centre coordinate of candidate box should lie within the [half_w, W-half_w-1]x[half_h, H-half_h-1]""" 386 | curr_sample[0] = max(half_w, min(W-half_w-1, curr_sample[0])) 387 | curr_sample[1] = max(half_h, min(H-half_h-1, curr_sample[1])) 388 | 389 | x1, y1 = curr_sample[0]-half_w, curr_sample[1]-half_h 390 | x1, y1 = max(0, min(W-1, x1)), max(0, min(H-1, y1)) ### for insurance 391 | x2, y2 = curr_sample[0]+half_w, curr_sample[1]+half_h 392 | x2, y2 = max(0, min(W-1, x2)), max(0, min(H-1, y2)) ### for insurance 393 | bbx_sample = np.array([x1, y1, x2-x1+1, y2-y1+1]) 394 | 395 | if iou_thresh_ignored: # this is exclusive for sampling candidates during online tracking 396 | samples[idx, :] = bbx_sample 397 | idx += 1 398 | continue 399 | 400 | overlap_ratio = IoU(bbx_sample, gt) 401 | if overlap_ratio >= pos_thresh and pos_sampling: #if positive sampling is being performed and its overlapping ratio >= 0.7 402 | samples[idx, :] = bbx_sample 403 | idx += 1 404 | elif overlap_ratio < neg_thresh and not pos_sampling: #if negative sampling is being performed and its overlapping ratio < 0.3 405 | samples[idx, :] = bbx_sample 406 | idx += 1 407 | 408 | return samples 409 | 410 | 411 | def gen_negative_samples_polar_radius(num_samples, gt, raw_img_size): 412 | """This function will generate num_samples negative samples using polar-radius based method""" 413 | frame_height = raw_img_size[0] 414 | frame_width = raw_img_size[1] 415 | 416 | theta_list = np.linspace(0, 2 * np.pi, 60) 417 | 418 | l_x, t_y, w, h = gt[0], gt[1], gt[2], gt[3] 419 | 420 | r_start = 0.2 * np.sqrt(w ** 2 + h ** 2) 421 | r_end = 0.5 * np.sqrt(w ** 2 + h ** 2) 422 | r_list = np.linspace(r_start, r_end, 10) 423 | 424 | c_x, c_y = l_x+w/2, t_y+h/2 
425 | 426 | sample_cnt = 0 427 | sample_list = np.zeros((0, 4), dtype=np.float32) 428 | 429 | iter_cnt = 0 430 | while sample_cnt < num_samples: 431 | iter_cnt += 1 432 | if iter_cnt > 3: break 433 | 434 | for theta in theta_list: 435 | if sample_cnt >= num_samples: break 436 | 437 | angle_eps = np.pi/9 438 | if np.abs(theta) <= angle_eps \ 439 | or np.abs(theta-np.pi/2) <= angle_eps \ 440 | or np.abs(theta-np.pi) <= angle_eps \ 441 | or np.abs(theta-1.5*np.pi) <= angle_eps \ 442 | or np.abs(theta-2*np.pi) <= angle_eps: continue 443 | 444 | for r in r_list: 445 | if sample_cnt >= num_samples: break 446 | 447 | c_x__, c_y__ = c_x + r * np.cos(theta), c_y - r * np.sin(theta) 448 | if theta >= 0 and theta < np.pi/2: #theta in Region I 449 | h__ = 2.0 * (c_y - c_y__ + h / 2.0) 450 | w__ = 2.0 * (c_x__ - c_x + w / 2.0) 451 | 452 | elif theta >= np.pi/2 and theta < np.pi: #theta in Region II 453 | h__ = 2.0 * (c_y - c_y__ + h / 2.0) 454 | w__ = 2.0 * (c_x - c_x__ + w / 2.0) 455 | 456 | elif theta >= np.pi and theta < 1.5 * np.pi: #theta in Region III 457 | h__ = 2.0 * (c_y__ - c_y + h / 2.0) 458 | w__ = 2.0 * (c_x - c_x__ + w / 2.0) 459 | 460 | else: #theta in Region IV 461 | h__ = 2.0 * (c_y__ - c_y + h / 2.0) 462 | w__ = 2.0 * (c_x__ - c_x + w / 2.0) 463 | 464 | l_x__ = c_x__ - w__ / 2.0 465 | t_y__ = c_y__ - h__ / 2.0 466 | 467 | r_x__ = l_x__ + w__ - 1 468 | b_y__ = t_y__ + h__ - 1 469 | 470 | l_x__ = max(0, l_x__) 471 | t_y__ = max(0, t_y__) 472 | r_x__ = min(r_x__, frame_width - 1) 473 | b_y__ = min(b_y__, frame_height - 1) 474 | 475 | w__ = r_x__ - l_x__ + 1 476 | h__ = b_y__ - t_y__ + 1 477 | 478 | bbx_sample = np.array([l_x__, t_y__, w__, h__]) 479 | overlap_ratio = IoU(bbx_sample, gt) 480 | #print 'overlap_ratio: {}'.format(overlap_ratio) 481 | 482 | if overlap_ratio <= 0.6: 483 | sample_list = np.vstack((sample_list, bbx_sample.reshape(1, 4))) 484 | sample_cnt += 1 485 | 486 | return sample_list 487 | 488 | 489 | def display(frame, saving_path, fname): 490 | saving_dir = saving_root + '/' + saving_path 491 | if not os.path.exists(saving_dir): 492 | os.makedirs(saving_dir) 493 | plt.imsave(saving_dir + '/' + fname, frame) 494 | 495 | #image_file = cbook.get_sample_data(saving_path + '/' + fname) 496 | #image = plt.imread(image_file) 497 | #plt.imshow(image) 498 | #plt.show() 499 | 500 | 501 | def vis_neg_finetuning_data_pool(seq_name, raw_img, idx_frame, neg_samples_gaussian, neg_samples_uniform, neg_samples_whole, neg_samples_polar_radius): 502 | root_dir = cfg.CODE_ROOT_DIR + '/output/finetuning_data/{}/'.format(cfg.TEST.BENCHMARK_NAME) 503 | 504 | tar_dir = root_dir + seq_name + '/' + str(idx_frame) 505 | if not os.path.exists(tar_dir): 506 | os.makedirs(tar_dir) 507 | 508 | for idx in xrange(neg_samples_gaussian.shape[0]): 509 | bbx_sample = neg_samples_gaussian[idx, :] 510 | box = (int(round(bbx_sample[0])), int(round(bbx_sample[1])), int(round(bbx_sample[0]+bbx_sample[2]-1)), int(round(bbx_sample[1]+bbx_sample[3]-1))) 511 | 512 | cv2.rectangle(raw_img, (box[0], box[1]), (box[2], box[3]), (255, 0, 0)) 513 | 514 | for idx in xrange(neg_samples_uniform.shape[0]): 515 | bbx_sample = neg_samples_uniform[idx, :] 516 | box = (int(round(bbx_sample[0])), int(round(bbx_sample[1])), int(round(bbx_sample[0]+bbx_sample[2]-1)), int(round(bbx_sample[1]+bbx_sample[3]-1))) 517 | 518 | cv2.rectangle(raw_img, (box[0], box[1]), (box[2], box[3]), (0, 255, 0)) 519 | 520 | for idx in xrange(neg_samples_whole.shape[0]): 521 | bbx_sample = neg_samples_whole[idx, :] 522 | box = (int(round(bbx_sample[0])), 
int(round(bbx_sample[1])), int(round(bbx_sample[0]+bbx_sample[2]-1)), int(round(bbx_sample[1]+bbx_sample[3]-1))) 523 | 524 | cv2.rectangle(raw_img, (box[0], box[1]), (box[2], box[3]), (0, 0, 255)) 525 | 526 | for idx in xrange(neg_samples_polar_radius.shape[0]): 527 | bbx_sample = neg_samples_polar_radius[idx, :] 528 | box = (int(round(bbx_sample[0])), int(round(bbx_sample[1])), int(round(bbx_sample[0]+bbx_sample[2]-1)), int(round(bbx_sample[1]+bbx_sample[3]-1))) 529 | 530 | cv2.rectangle(raw_img, (box[0], box[1]), (box[2], box[3]), (255, 255, 255)) 531 | 532 | now = datetime.datetime.now() 533 | jpg_name = now.strftime('%Y-%m-%d_%H:%M:%S') + '.jpg' 534 | cv2.imwrite(tar_dir + '/{}'.format(jpg_name), raw_img) 535 | 536 | 537 | def unif_save_visualization(frame_dup, path_seq, index_new_frame, pred_bbx_score, cand_dict_list, index_order): 538 | seq_name = os.path.split(path_seq)[1] 539 | saving_path = seq_name + '/' + str(index_new_frame) 540 | saving_root = cfg.CODE_ROOT_DIR + '/output/experimental/test_phase/{}'.format(cfg.TEST.BENCHMARK_NAME) 541 | saving_dir = saving_root + '/' + saving_path 542 | fname = 'cands.jpg' 543 | 544 | if not os.path.exists(saving_dir): 545 | os.makedirs(saving_dir) 546 | cv2.imwrite(os.path.join(saving_dir, fname), frame_dup) 547 | 548 | if pred_bbx_score >= 0.90: 549 | corr_fobj = open(os.path.join(saving_dir, 'corr.txt'), 'w') 550 | for index in index_order[:20]: 551 | distance = cand_dict_list[index, -1] 552 | prob = cand_dict_list[index, -2] 553 | entry = '{} {} {}'.format(index, prob, distance) 554 | corr_fobj.write(entry + '\n') 555 | corr_fobj.close() 556 | 557 | 558 | def unif_vis_cands_conf_weight(i, index, bbxes_Pk, frame_dup, path_seq, index_new_frame, cand_dict_list, i_dist_prob, i_factor): 559 | bbx_sample = bbxes_Pk[index, :] 560 | box = (int(round(bbx_sample[0])), int(round(bbx_sample[1])), int(round(bbx_sample[0]+bbx_sample[2]-1)), int(round(bbx_sample[1]+bbx_sample[3]-1))) 561 | cv2.rectangle(frame_dup, box[:2], box[2:], (0, 0, 255)) 562 | cv2.putText(frame_dup, '{}'.format(index), box[:2], cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1, cv2.LINE_AA) 563 | 564 | 565 | def gen_config_report(): 566 | tar_fpath = cfg.ROOT_DIR + '/output/tracking_res/{}/{}'.format(cfg.TEST.BENCHMARK_NAME, cfg.TEST.BENCHMARK_NAME) + '_config_rep.txt' 567 | tar_fobj = open(tar_fpath, 'w') 568 | 569 | for key in cfg.keys(): 570 | if key == 'TEST' or key == 'TRAIN': 571 | continue 572 | 573 | info = '__C.' 
+ key + ': ' + str(cfg[key])
574 |     tar_fobj.write(info + '\n')
575 | 
576 |     for key in cfg.TEST.keys():
577 |         info = '__C.TEST.{}: {}'.format(key, cfg.TEST[key])
578 |         tar_fobj.write(info + '\n')
579 | 
580 |     tar_fobj.close()
581 | 
582 | 
583 | def compute_Gaussian2D_prob(x, mu, cov):
584 |     det_cov = cov[0, 0] * cov[1, 1]
585 |     normalizer = 2 * np.pi * (det_cov ** 0.5)
586 | 
587 |     delta = x - mu
588 |     mahalanobis_dis = -0.5 * (delta[0] ** 2 / cov[0, 0] + delta[1] ** 2 / cov[1, 1])
589 | 
590 |     return (1.0 / normalizer) * np.exp(mahalanobis_dis)
591 | 
592 | 
593 | def compute_Laplacian2D_prob(x, mu, b):
594 |     euclidean_dis = np.dot((x - mu), (x - mu)) ** (0.5)
595 |     return 1.0 / (2.0 * b) * np.exp(-1.0 * euclidean_dis / b)
596 | 
597 | 
598 | def determine_displacement(bbx1, bbx2):
599 |     cx1 = bbx1[0] + bbx1[2] / 2.0
600 |     cy1 = bbx1[1] + bbx1[3] / 2.0
601 |     cx2 = bbx2[0] + bbx2[2] / 2.0
602 |     cy2 = bbx2[1] + bbx2[3] / 2.0
603 |     #return np.abs(cx1 - cx2), np.abs(cy1 - cy2)
604 |     return np.sqrt((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2)
605 | 
606 | 
607 | def func_iou(bb, gtbb):
608 |     iou = 0
609 |     iw = min(bb[2],gtbb[2]) - max(bb[0],gtbb[0]) + 1
610 |     ih = min(bb[3],gtbb[3]) - max(bb[1],gtbb[1]) + 1
611 | 
612 |     if iw>0 and ih>0:
613 |         ua = (bb[2]-bb[0]+1)*(bb[3]-bb[1]+1) + (gtbb[2]-gtbb[0]+1)*(gtbb[3]-gtbb[1]+1) - iw*ih
614 |         iou = iw*ih/ua
615 | 
616 |     return iou
617 | 
618 | 
619 | def sample_regions_precompute(rad, nr_ang, stepsize, scales=[0.7071, 1, 1.4142]):
620 |     nr_step = int(rad / stepsize)
621 |     cos_values = np.cos(np.arange(0,2*np.pi,2*np.pi/nr_ang))
622 |     sin_values = np.sin(np.arange(0,2*np.pi,2*np.pi/nr_ang))
623 | 
624 |     dxdys = np.zeros((2,nr_step*nr_ang+1))
625 |     count = 0
626 |     for ir in range(1,nr_step+1):
627 |         offset = stepsize * ir
628 |         for ia in range(1,nr_ang+1):
629 | 
630 |             dx = offset * cos_values[ia-1]
631 |             dy = offset * sin_values[ia-1]
632 |             count += 1
633 |             dxdys[0, count-1] = dx
634 |             dxdys[1, count-1] = dy
635 | 
636 |     samples = np.zeros((4,(nr_ang*nr_step+1)*len(scales)))
637 |     count = 0
638 |     jump = nr_step*nr_ang+1
639 |     for s in scales:
640 |         samples[0:2, count*jump:(count+1)*jump] = dxdys
641 |         samples[2, count*jump:(count+1)*jump] = s
642 |         samples[3, count*jump:(count+1)*jump] = s
643 |         count = count + 1
644 | 
645 |     return samples # dx dy 1*s 1*s
646 | 
647 | 
648 | def sample_regions(x, y, w, h, im_w, im_h, samples_template):
649 |     samples = samples_template.copy()
650 |     samples[0,:] += x
651 |     samples[1,:] += y
652 |     samples[2,:] *= w
653 |     samples[3,:] *= h
654 | 
655 |     samples[2,:] = samples[0,:] + samples[2,:] - 1
656 |     samples[3,:] = samples[1,:] + samples[3,:] - 1
657 |     samples = np.round(samples)
658 | 
659 |     flags = np.logical_and(np.logical_and(np.logical_and(samples[0,:]>0, samples[1,:]>0), samples[2,:]<im_w), samples[3,:]<im_h)
660 |     samples = samples[:, flags]
661 | 
662 |     return samples
663 | 
-------------------------------------------------------------------------------- /code/vot.py: --------------------------------------------------------------------------------
1 | """
2 | \file vot.py
3 | 
4 | @brief Python utility functions for VOT integration
5 | 
6 | @author Luka Cehovin, Alessio Dore
7 | 
8 | @date 2016
9 | 
10 | """
11 | 
12 | import sys
13 | import copy
14 | import collections
15 | 
16 | try:
17 |     import trax
18 |     import trax.server
19 |     TRAX = True
20 | except ImportError:
21 |     TRAX = False
22 | 
23 | Rectangle = collections.namedtuple('Rectangle', ['x', 'y', 'width', 'height'])
24 | Point = collections.namedtuple('Point', ['x', 'y'])
25 | Polygon = collections.namedtuple('Polygon', ['points'])
26 | 
27 | def parse_region(string):
28 |     tokens = map(float, string.split(','))
29 |     if len(tokens) == 4:
30 |         return Rectangle(tokens[0], tokens[1], tokens[2], tokens[3])
31 |     elif len(tokens) % 2 == 0 and len(tokens) > 4:
32 |         return Polygon([Point(tokens[i],tokens[i+1]) for i in xrange(0,len(tokens),2)])
33 |     return None
34 | 
35 | def encode_region(region):
36 |     if isinstance(region, Polygon):
37 |         return ','.join(['{},{}'.format(p.x,p.y) for p in region.points])
38 |     elif isinstance(region, Rectangle):
39 |         return '{},{},{},{}'.format(region.x, region.y, region.width, region.height)
40 |     else:
41 |         return ""
42 | 
43 | def convert_region(region, to):
44 | 
45 |     if to == 'rectangle':
46 | 
47 |         if isinstance(region, Rectangle):
48 |             return copy.copy(region)
49 |         elif isinstance(region, Polygon):
50 |             top = sys.float_info.max
51 |             bottom = sys.float_info.min
52 |             left = sys.float_info.max
53 |             right = sys.float_info.min
54 | 
55 |             for point in region.points:
56 |                 top = min(top,
point.y) 57 | bottom = max(bottom, point.y) 58 | left = min(left, point.x) 59 | right = max(right, point.x) 60 | 61 | return Rectangle(left, top, right - left, bottom - top) 62 | 63 | else: 64 | return None 65 | if to == 'polygon': 66 | 67 | if isinstance(region, Rectangle): 68 | points = [] 69 | points.append((region.x, region.y)) 70 | points.append((region.x + region.width, region.y)) 71 | points.append((region.x + region.width, region.y + region.height)) 72 | points.append((region.x, region.y + region.height)) 73 | return Polygon(points) 74 | 75 | elif isinstance(region, Polygon): 76 | return copy.copy(region) 77 | else: 78 | return None 79 | 80 | return None 81 | 82 | class VOT(object): 83 | """ Base class for Python VOT integration """ 84 | def __init__(self, region_format): 85 | """ Constructor 86 | 87 | Args: 88 | region_format: Region format options 89 | """ 90 | assert(region_format in ['rectangle', 'polygon']) 91 | if TRAX: 92 | options = trax.server.ServerOptions(region_format, trax.image.PATH) 93 | self._trax = trax.server.Server(options) 94 | 95 | request = self._trax.wait() 96 | assert(request.type == 'initialize') 97 | if request.region.type == 'polygon': 98 | self._region = Polygon([Point(x[0], x[1]) for x in request.region.points]) 99 | else: 100 | self._region = Rectangle(request.region.x, request.region.y, request.region.width, request.region.height) 101 | self._image = str(request.image) 102 | self._trax.status(request.region) 103 | else: 104 | self._files = [x.strip('\n') for x in open('images.txt', 'r').readlines()] 105 | self._frame = 0 106 | self._region = convert_region(parse_region(open('region.txt', 'r').readline()), region_format) 107 | self._result = [] 108 | 109 | def region(self): 110 | """ 111 | Send configuration message to the client and receive the initialization 112 | region and the path of the first image 113 | 114 | Returns: 115 | initialization region 116 | """ 117 | 118 | return self._region 119 | 120 | def report(self, region, confidence = 0): 121 | """ 122 | Report the tracking results to the client 123 | 124 | Arguments: 125 | region: region for the frame 126 | """ 127 | assert(isinstance(region, Rectangle) or isinstance(region, Polygon)) 128 | if TRAX: 129 | if isinstance(region, Polygon): 130 | tregion = trax.region.Polygon([(x.x, x.y) for x in region.points]) 131 | else: 132 | tregion = trax.region.Rectangle(region.x, region.y, region.width, region.height) 133 | self._trax.status(tregion, {"confidence" : confidence}) 134 | else: 135 | self._result.append(region) 136 | self._frame += 1 137 | 138 | def frame(self): 139 | """ 140 | Get a frame (image path) from client 141 | 142 | Returns: 143 | absolute path of the image 144 | """ 145 | if TRAX: 146 | if hasattr(self, "_image"): 147 | image = str(self._image) 148 | del self._image 149 | return image 150 | 151 | request = self._trax.wait() 152 | 153 | if request.type == 'frame': 154 | return str(request.image) 155 | else: 156 | return None 157 | 158 | else: 159 | if self._frame >= len(self._files): 160 | return None 161 | return self._files[self._frame] 162 | 163 | def quit(self): 164 | if TRAX: 165 | self._trax.quit() 166 | elif hasattr(self, '_result'): 167 | with open('output.txt', 'w') as f: 168 | for r in self._result: 169 | f.write(encode_region(r)) 170 | f.write('\n') 171 | 172 | def __del__(self): 173 | self.quit() 174 | 175 | -------------------------------------------------------------------------------- /code/vot_SiamRPN.py: 
-------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # DaSiamRPN 3 | # Licensed under The MIT License 4 | # Written by Qiang Wang (wangqiang2015 at ia.ac.cn) 5 | # -------------------------------------------------------- 6 | #!/usr/bin/python 7 | 8 | import vot 9 | from vot import Rectangle 10 | import sys 11 | import cv2 # imread 12 | import torch 13 | import numpy as np 14 | from os.path import realpath, dirname, join 15 | 16 | from net import SiamRPNBIG 17 | from run_SiamRPN import SiamRPN_init, SiamRPN_track 18 | from utils import get_axis_aligned_bbox, cxy_wh_2_rect 19 | 20 | # load net 21 | net_file = join(realpath(dirname(__file__)), 'SiamRPNBIG.model') 22 | net = SiamRPNBIG() 23 | net.load_state_dict(torch.load(net_file)) 24 | net.eval().cuda() 25 | 26 | # warm up 27 | for i in range(10): 28 | net.temple(torch.autograd.Variable(torch.FloatTensor(1, 3, 127, 127)).cuda()) 29 | net(torch.autograd.Variable(torch.FloatTensor(1, 3, 255, 255)).cuda()) 30 | 31 | # start to track 32 | handle = vot.VOT("polygon") 33 | Polygon = handle.region() 34 | cx, cy, w, h = get_axis_aligned_bbox(Polygon) 35 | 36 | image_file = handle.frame() 37 | if not image_file: 38 | sys.exit(0) 39 | 40 | target_pos, target_sz = np.array([cx, cy]), np.array([w, h]) 41 | im = cv2.imread(image_file) # HxWxC 42 | state = SiamRPN_init(im, target_pos, target_sz, net) # init tracker 43 | while True: 44 | image_file = handle.frame() 45 | if not image_file: 46 | break 47 | im = cv2.imread(image_file) # HxWxC 48 | state = SiamRPN_track(state, im) # track 49 | 50 | # convert cx, cy, w, h into rect 51 | res = cxy_wh_2_rect(state['target_pos'], state['target_sz']) 52 | handle.report(Rectangle(res[0], res[1], res[2], res[3])) 53 | 54 | -------------------------------------------------------------------------------- /data/whole_list.txt: -------------------------------------------------------------------------------- 1 | /home/lishen/Experiments/CLSTMT/dataset/test_set/OTB100/sequences/KiteSurf 84 2 | -------------------------------------------------------------------------------- /ext/roi-align.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MathsShen/DaSiamRPNWithOfflineTraining/c9011aefd0551441ef6ab91c465951556cc86e50/ext/roi-align.png --------------------------------------------------------------------------------