├── README.md
├── code
│   ├── __pycache__
│   │   ├── bbox_transform.cpython-36.pyc
│   │   ├── config.cpython-36.pyc
│   │   ├── dataset.cpython-36.pyc
│   │   ├── gen_all_anchors.cpython-36.pyc
│   │   ├── generate_anchors.cpython-36.pyc
│   │   ├── net.cpython-36.pyc
│   │   ├── py_nms.cpython-36.pyc
│   │   ├── run_SiamRPN.cpython-36.pyc
│   │   ├── train.cpython-36.pyc
│   │   ├── util_test.cpython-36.pyc
│   │   └── utils.cpython-36.pyc
│   ├── bbox_transform.py
│   ├── config.py
│   ├── configuration.py
│   ├── dataset.py
│   ├── gen_all_anchors.py
│   ├── generate_anchors.py
│   ├── net.py
│   ├── py_nms.py
│   ├── run_SiamRPN.py
│   ├── test.py
│   ├── train.py
│   ├── util_test.py
│   ├── utils.py
│   ├── vot.py
│   └── vot_SiamRPN.py
├── data
│   └── whole_list.txt
└── ext
    └── roi-align.png
/README.md:
--------------------------------------------------------------------------------
1 | # DaSiamRPNWithOfflineTraining
2 |
3 | This repository adds an offline training module and a testing module (including distractor awareness and the local2global strategy) to the original PyTorch implementation of [DaSiamRPN](https://github.com/foolwood/DaSiamRPN).
4 |
5 | ## Introduction
6 |
7 | **SiamRPN** formulates visual tracking as a joint task of localization and identification, initially described in a [CVPR 2018 spotlight paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Li_High_Performance_Visual_CVPR_2018_paper.pdf). (Slides at [CVPR 2018 Spotlight](https://drive.google.com/open?id=1OGIOUqANvYfZjRoQfpiDqhPQtOvPCpdq))
8 |
9 | **DaSiamRPN** improves the performance of SiamRPN by (1) introducing an effective sampling strategy to control the imbalanced sample distribution, (2) designing a novel distractor-aware module for incremental learning, and (3) adding a long-term tracking extension. [ECCV 2018](https://arxiv.org/pdf/1808.06048.pdf). (Slides at [VOT-18 Real-time challenge winners talk](https://drive.google.com/open?id=1dsEI2uYHDfELK0CW2xgv7R4QdCs6lwfr))
10 |
11 | Specifically, for (2), this repository implements the ROI-Align technique to perform similarity matching between region proposals in the search region x and the template z. The idea behind the ROI-Align implementation is illustrated in the figure below.
12 | ![ROI-Align illustration](ext/roi-align.png)
13 |
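As a rough sketch (not the repository's exact code; see `code/run_SiamRPN.py` for the actual implementation), the matching score between the template feature of z and an ROI-Aligned proposal feature taken from the search-region feature map reduces to an element-wise product followed by a sum:

```python
import numpy as np

# Illustrative shapes only: the template feature from net.template() and one
# ROI-Aligned proposal feature from the search-region feature map, both (6, 6, 512).
template_feat = np.random.rand(6, 6, 512).astype(np.float32)
proposal_feat = np.random.rand(6, 6, 512).astype(np.float32)

# Similarity used to rank proposals (cf. tracker_eval_distractor_aware).
similarity = float(np.sum(template_feat * proposal_feat))
print(similarity)
```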
14 | ## Prerequisites
15 |
16 | - CPU: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
17 | - GPU: NVIDIA GTX1060
18 |
19 | - python3.6
20 | - pytorch == 0.4.0
21 | - numpy
22 | - opencv
23 | - easydict
24 |
25 | ## Data Preparation
26 | You can prepare your own dataset for training and testing DaSiamRPN. Putting aside the positive/negative pairing for distractor-aware training specified in the paper, each training and testing sequence, say Basketball, is organized in a folder "Basketball" that contains a ground-truth file "Basketball_gt.txt" and a sub-folder "imgs". Each line of the ground-truth file is a bounding box in the format (x, y, w, h). The "imgs" folder holds the frames, named "xxxx.jpg" (e.g. 0001.jpg-9999.jpg).
27 |
28 | Besides the data, you should also prepare a sequence list formatted as in ./data/whole_list.txt, where each row consists of the path of a sequence folder and the total number of frames in that sequence. A hypothetical layout and list entry are sketched below.
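For concreteness, a hypothetical sequence folder and the matching `whole_list.txt` rows might look like this (paths and frame counts are illustrative):

```
Basketball/
├── Basketball_gt.txt      # one line per frame: x y w h
└── imgs/
    ├── 0001.jpg
    ├── 0002.jpg
    └── ...
```

```
/path/to/dataset/sequences/Basketball 725
/path/to/dataset/sequences/Bolt 350
```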
29 |
30 | ## Training Procedure
31 | `python code/train.py`
32 |
33 | The model will be saved in ./output/weights/
34 |
35 | ## Testing Procedure
36 | `python code/test.py`
37 |
38 | ## Postscript
38 | Currently, this repo is still under construction, so its effectiveness is not guaranteed. Still, one can get some insights from reading it (myself included), and that is what really matters. More is coming in the near future, including (1) the sampling strategy to control the imbalanced sample distribution and (2) other implementation details not specified clearly in the paper.
40 |
41 | To improve this repo, I am looking forward to your suggestions. ^_^
42 |
43 |
--------------------------------------------------------------------------------
/code/bbox_transform.py:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Fast R-CNN
3 | # Copyright (c) 2015 Microsoft
4 | # Licensed under The MIT License [see LICENSE for details]
5 | # Written by Ross Girshick
6 | # --------------------------------------------------------
7 |
8 | import numpy as np
9 | import pdb
10 | def bbox_transform(ex_rois, gt_rois):
11 | ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
12 | ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
13 | ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths
14 | ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights
15 |
16 | gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
17 | gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
18 | gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths
19 | gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights
20 |
21 | targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths
22 | targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights
23 | targets_dw = np.log(gt_widths / ex_widths)
24 | targets_dh = np.log(gt_heights / ex_heights)
25 |
26 | targets = np.vstack(
27 | (targets_dx, targets_dy, targets_dw, targets_dh)).transpose()
28 | return targets
29 |
30 | def bbox_transform_inv(boxes, deltas):
31 | if boxes.shape[0] == 0:
32 | return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype)
33 |
34 | boxes = boxes.astype(deltas.dtype, copy=False)
35 | #pdb.set_trace()
36 | widths = boxes[:, 2] - boxes[:, 0] + 1.0
37 | heights = boxes[:, 3] - boxes[:, 1] + 1.0
38 | ctr_x = boxes[:, 0] + 0.5 * widths
39 | ctr_y = boxes[:, 1] + 0.5 * heights
40 |
41 | dx = deltas[:, 0::4]
42 | dy = deltas[:, 1::4]
43 | dw = deltas[:, 2::4]
44 | dh = deltas[:, 3::4]
45 |
46 | pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
47 | pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
48 | pred_w = np.exp(dw) * widths[:, np.newaxis]
49 | pred_h = np.exp(dh) * heights[:, np.newaxis]
50 |
51 | pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
52 | # x1
53 | pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
54 | # y1
55 | pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
56 | # x2
57 | pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
58 | # y2
59 | pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
60 |
61 | return pred_boxes
62 |
63 | def clip_boxes(boxes, im_shape):
64 | """
65 | Clip boxes to image boundaries.
66 | """
67 |
68 | # x1 >= 0
69 | boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
70 | # y1 >= 0
71 | boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
72 | # x2 < im_shape[1]
73 | boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
74 | # y2 < im_shape[0]
75 | boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
76 | return boxes
77 |
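# Usage sketch (illustrative values, not part of the original file): identical
# boxes give zero regression targets, and a pure translation only affects (dx, dy).
#   rois = np.array([[0., 0., 10., 10.]])
#   bbox_transform(rois, rois)                            # -> [[0., 0., 0., 0.]]
#   bbox_transform(rois, np.array([[2., 3., 12., 13.]]))  # -> [[2/11, 3/11, 0., 0.]]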
--------------------------------------------------------------------------------
/code/config.py:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Fast R-CNN
3 | # Copyright (c) 2015 Microsoft
4 | # Licensed under The MIT License [see LICENSE for details]
5 | # Written by Ross Girshick
6 | # --------------------------------------------------------
7 |
8 | """Fast R-CNN config system.
9 |
10 | This file specifies default config options for Fast R-CNN. You should not
11 | change values in this file. Instead, you should write a config file (in yaml)
12 | and use cfg_from_file(yaml_file) to load it and override the default options.
13 |
14 | Most tools in $ROOT/tools take a --cfg option to specify an override file.
15 | - See tools/{train,test}_net.py for example code that uses cfg_from_file()
16 | - See experiments/cfgs/*.yml for example YAML config override files
17 | """
18 |
19 | import os
20 | import os.path as osp
21 | import numpy as np
22 | # `pip install easydict` if you don't have it
23 | from easydict import EasyDict as edict
24 |
25 | __C = edict()
26 | # Consumers can get config by:
27 | # from fast_rcnn_config import cfg
28 | cfg = __C
29 |
30 | #
31 | # Training options
32 | #
33 |
34 | __C.TRAIN = edict()
35 |
36 | __C.TRAIN.SNAPSHOT_INFIX = ''
37 |
38 | # Deprecated (inside weights)
39 | __C.TRAIN.BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)
40 | # IOU >= thresh: positive example
41 | __C.TRAIN.RPN_POSITIVE_OVERLAP = 0.7
42 | # IOU < thresh: negative example
43 | __C.TRAIN.RPN_NEGATIVE_OVERLAP = 0.3
44 | # Deprecated (outside weights)
45 | __C.TRAIN.RPN_BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)
46 | # Give the positive RPN examples weight of p * 1 / {num positives}
47 | # and give negatives a weight of (1 - p)
48 | # Set to -1.0 to use uniform example weighting
49 | __C.TRAIN.RPN_POSITIVE_WEIGHT = -1.0
50 | __C.TRAIN.LAMBDA = 1.0
51 |
52 | #
53 | # Testing options
54 | #
55 |
56 | __C.TEST = edict()
57 |
58 | # Scales to use during testing (can list multiple scales)
59 | # Each scale is the pixel size of an image's shortest side
60 | __C.TEST.SCALES = (600,)
61 |
62 | # Max pixel size of the longest side of a scaled input image
63 | __C.TEST.MAX_SIZE = 1000
64 |
65 | # Overlap threshold used for non-maximum suppression (suppress boxes with
66 | # IoU >= this threshold)
67 | __C.TEST.NMS = 0.3
68 |
69 | # Experimental: treat the (K+1) units in the cls_score layer as linear
70 | # predictors (trained, eg, with one-vs-rest SVMs).
71 | __C.TEST.SVM = False
72 |
73 | # Test using bounding-box regressors
74 | __C.TEST.BBOX_REG = True
75 |
76 | # Propose boxes
77 | __C.TEST.HAS_RPN = True
78 |
79 | # Test using these proposals
80 | __C.TEST.PROPOSAL_METHOD = 'gt'
81 |
82 | ## NMS threshold used on RPN proposals
83 | __C.TEST.RPN_NMS_THRESH = 0.7
84 | ## Number of top scoring boxes to keep before apply NMS to RPN proposals
85 | __C.TEST.RPN_PRE_NMS_TOP_N = 6000
86 | ## Number of top scoring boxes to keep after applying NMS to RPN proposals
87 | __C.TEST.RPN_POST_NMS_TOP_N = 300
88 | # Proposal height and width both need to be greater than RPN_MIN_SIZE (at orig image scale)
89 | __C.TEST.RPN_MIN_SIZE = 16
90 |
91 |
92 | #
93 | # MISC
94 | #
95 |
96 | # The mapping from image coordinates to feature map coordinates might cause
97 | # some boxes that are distinct in image space to become identical in feature
98 | # coordinates. If DEDUP_BOXES > 0, then DEDUP_BOXES is used as the scale factor
99 | # for identifying duplicate boxes.
100 | # 1/16 is correct for {Alex,Caffe}Net, VGG_CNN_M_1024, and VGG16
101 | __C.DEDUP_BOXES = 1./16.
102 |
103 | # Pixel mean values (BGR order) as a (1, 1, 3) array
104 | # We use the same pixel mean for all networks even though it's not exactly what
105 | # they were trained with
106 | __C.PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]])
107 |
108 | # For reproducibility
109 | __C.RNG_SEED = 3
110 |
111 | # A small number that's used many times
112 | __C.EPS = 1e-14
113 |
114 | # Root directory of project
115 | __C.ROOT_DIR = osp.abspath(osp.join(osp.dirname(__file__), '..', '..'))
116 |
117 | # Data directory
118 | __C.DATA_DIR = osp.abspath(osp.join(__C.ROOT_DIR, 'data'))
119 |
120 | # Model directory
121 | __C.MODELS_DIR = osp.abspath(osp.join(__C.ROOT_DIR, 'models', 'caltech_ped'))
122 |
123 | # Name (or path to) the matlab executable
124 | __C.MATLAB = 'matlab'
125 |
126 | # Place outputs under an experiments directory
127 | __C.EXP_DIR = 'default'
128 |
129 | # Use GPU implementation of non-maximum suppression
130 | __C.USE_GPU_NMS = True
131 |
132 | # Default GPU device id
133 | __C.GPU_ID = 0,1
134 |
135 | # added by zhk
136 | # Default code dir
137 | __C.NEW_ROOT_DIR = '/home/code/lishen/py-faster-rcnn'
138 |
139 |
140 | def get_output_dir(imdb, net=None):
141 | """Return the directory where experimental artifacts are placed.
142 | If the directory does not exist, it is created.
143 |
144 | A canonical path is built using the name from an imdb and a network
145 | (if not None).
146 | """
147 | #outdir = osp.abspath(osp.join(__C.ROOT_DIR, 'output', imdb.name))
148 | if __C.TRAIN.get('USE_OHEM', False):
149 | outdir = osp.abspath(osp.join(__C.NEW_ROOT_DIR, 'output', imdb.name, 'ohem'))
150 | else:
151 | outdir = osp.abspath(osp.join(__C.NEW_ROOT_DIR, 'output', imdb.name))
152 | if net is not None:
153 | outdir = osp.join(outdir, net.name)
154 | if not os.path.exists(outdir):
155 | os.makedirs(outdir)
156 | return outdir
157 |
158 | def _merge_a_into_b(a, b):
159 | """Merge config dictionary a into config dictionary b, clobbering the
160 | options in b whenever they are also specified in a.
161 | """
162 | if type(a) is not edict:
163 | return
164 |
165 | for k, v in a.items():
166 | # a must specify keys that are in b
167 | if k not in b:
168 | raise KeyError('{} is not a valid config key'.format(k))
169 |
170 | # the types must match, too
171 | old_type = type(b[k])
172 | if old_type is not type(v):
173 | if isinstance(b[k], np.ndarray):
174 | v = np.array(v, dtype=b[k].dtype)
175 | else:
176 | raise ValueError(('Type mismatch ({} vs. {}) '
177 | 'for config key: {}').format(type(b[k]),
178 | type(v), k))
179 |
180 | # recursively merge dicts
181 | if type(v) is edict:
182 | try:
183 | _merge_a_into_b(a[k], b[k])
184 | except:
185 | print('Error under config key: {}'.format(k))
186 | raise
187 | else:
188 | b[k] = v
189 |
190 | def cfg_from_file(filename):
191 | """Load a config file and merge it into the default options."""
192 | import yaml
193 | with open(filename, 'r') as f:
194 | yaml_cfg = edict(yaml.safe_load(f))
195 |
196 | _merge_a_into_b(yaml_cfg, __C)
197 |
198 | def cfg_from_list(cfg_list):
199 | """Set config keys via list (e.g., from command line)."""
200 | from ast import literal_eval
201 | assert len(cfg_list) % 2 == 0
202 | for k, v in zip(cfg_list[0::2], cfg_list[1::2]):
203 | key_list = k.split('.')
204 | d = __C
205 | for subkey in key_list[:-1]:
206 | assert subkey in d
207 | d = d[subkey]
208 | subkey = key_list[-1]
209 | assert subkey in d
210 | try:
211 | value = literal_eval(v)
212 | except:
213 | # handle the case when v is a string literal
214 | value = v
215 | assert type(value) == type(d[subkey]), \
216 | 'type {} does not match original type {}'.format(
217 | type(value), type(d[subkey]))
218 | d[subkey] = value
219 |
--------------------------------------------------------------------------------
/code/configuration.py:
--------------------------------------------------------------------------------
1 |
2 | class ModelConfig(object):
3 |
4 | def __init__(self):
5 | self.image_format = 'jpeg'
6 |
7 | self.batch_size = 5
8 | self.max_seq_len = 15
9 |
10 | self.image_size = [224, 224]
11 | self.num_image_channels = 3
12 |
13 | self.num_clstm_kernels = 256 #384 for two layers of ConvLSTMs
14 | self.clstm_kernel_size = [3, 3]
15 | self.num_convlstm_layers = 2
16 | # If < 1.0, the dropout keep probability applied to ConvLSTM variables.
17 | self.clstm_dropout_keep_prob = 0.7
18 |
19 | self.pretrained_model_file = None
20 |
21 | self.training_data_tfrecord_path = \
22 | '/home/lishen/Experiments/CLSTMT/dataset/training_set/TFRecord/training_data.tf_record.soft_gt'
23 |
24 | # Approximate number of values per input shard. Used to ensure sufficient
25 | # mixing between shards in training.
26 | self.values_per_input_shard = 2300
27 | # Minimum number of shards to keep in the input queue.
28 | self.input_queue_capacity_factor = 2
29 | # Number of threads for prefetching SequenceExample protos.
30 | self.num_input_reader_threads = 1
31 |
32 | # Number of threads for image preprocessing. Should be a multiple of 2.
33 | self.num_preprocess_threads = 1
34 |
35 | self.num_seqs = 115 # total number of domains(training tracking sequences)
36 |
37 |
38 | class TrainingConfig(object):
39 |
40 | def __init__(self):
41 | """Set the default training hyper-parameters."""
42 |
43 | # Optimizer for training the model
44 | self.optimizer = "SGD"
45 | self.max_epoches = 100
46 | self.learning_rate = 0.001
47 |
48 |
49 | class FinetuningConfig(object):
50 | def __init__(self):
51 | self.learning_rate = 0.01
52 | self.use_domain_specific_finetuned_model = True
53 |
54 |
55 | class TestingConfig(object):
56 |
57 | def __init__(self):
58 | self.root_dir = '/home/lishen/Experiments/CLSTMT'
59 | self.code_root_dir = '/home/code/lishen/dataset'
60 | self.peep_ratio = 3.5
61 |
62 |
63 | class VerificationModelConfig(object):
64 | def __init__(self):
65 | self.pretrained_model_file = "./weights/vgg16_verif.npy"
66 | self.num_boxes_per_batch = None
67 |
68 |
69 |
--------------------------------------------------------------------------------
/code/dataset.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import numpy.random as npr
3 | from PIL import Image
4 | from torch.utils.data import Dataset, DataLoader
5 | import torchvision.transforms as transforms
6 | import torch
7 | import os
8 | import os.path as osp
9 | import linecache
10 | import cv2
11 |
12 |
13 | class DaSiamTrainingSet(Dataset):
14 | def __init__(self, transform, z_size, x_size):
15 | self.root_path = "/home/lishen/Experiments/CLSTMT/dataset/test_set/OTB100/"
16 | self.domain2nseq = {}
17 | self.create_domain2nseq(osp.join(self.root_path, "whole_list.txt"))
18 | self.transform = transform
19 | self.z_size = z_size
20 | self.x_size = x_size
21 |
22 | def create_domain2nseq(self, list_fpath):
23 | with open(list_fpath, 'r') as f:
24 | while True:
25 | line = f.readline()
26 | if not line:
27 | break
28 | splits = line.strip().split()
29 | domain_name = splits[0].split('/')[-1]
30 | nseq = int(splits[1])
31 | self.domain2nseq[domain_name] = nseq
32 |
33 | def __len__(self):
34 | return sum(self.domain2nseq.values()) // len(self.domain2nseq.values())
35 |
36 | def __getitem__(self, item):
37 | domain_list = list(self.domain2nseq.keys())
38 | domain_name = npr.choice(domain_list, size=1)[0]
39 | num_frames = self.domain2nseq[domain_name]
40 |
41 | pair_frame_nos = npr.choice(range(1, num_frames+1), size=2, replace=False)
42 | z_frame_no, x_frame_no = min(pair_frame_nos), max(pair_frame_nos)
43 |
44 | domain_dir = osp.join(self.root_path, "sequences", domain_name)
45 | gt_fpath = osp.join(domain_dir, domain_name + '_gt.txt')
46 | z_gt_bbx = tuple(map(int, linecache.getline(gt_fpath, z_frame_no).split()))
47 | x_gt_bbx = tuple(map(int, linecache.getline(gt_fpath, x_frame_no).split()))
48 |
49 | z_frame_img_name = str(z_frame_no).zfill(4) + '.jpg'
50 | x_frame_img_name = str(x_frame_no).zfill(4) + '.jpg'
51 | z_frame = cv2.imread(osp.join(domain_dir, 'imgs', z_frame_img_name))
52 | x_frame = cv2.imread(osp.join(domain_dir, 'imgs', x_frame_img_name))
53 |
54 | #print(z_gt_bbx)
55 | z = crop_roi(z_frame, convert_bbx2box(z_gt_bbx))
56 | z = cv2.resize(z, self.z_size)
57 |
58 | x_gt_box = convert_bbx2box(x_gt_bbx)
59 | sr_box = gen_sr_box(x_frame, x_gt_box)
60 | x = crop_roi(x_frame, sr_box)
61 | x = cv2.resize(x, self.x_size)
62 |
63 | translated_x_gt_box = np.array(trans_coord(sr_box, x_gt_box))
64 |
65 | sample = {
66 | 'template': self.transform(z),
67 | 'search_region': self.transform(x),
68 | 'gt_box': translated_x_gt_box
69 | }
70 | return sample
71 |
72 |
73 | def trans_coord(sr_box, x_gt_box):
74 | return (x_gt_box[0]-sr_box[0], x_gt_box[1]-sr_box[1], x_gt_box[2]-sr_box[0], x_gt_box[3]-sr_box[1])
75 |
76 |
77 | def gen_sr_box(frame, gt_box):
78 | gt_x1, gt_y1, gt_x2, gt_y2 = gt_box
79 | h, w = gt_y2-gt_y1+1, gt_x2-gt_x1+1
80 | rand_cx = np.random.randint(gt_x1, gt_x2+1)
81 | rand_cy = np.random.randint(gt_y1, gt_y2+1)
82 |
83 | sr_x1, sr_y1, sr_x2, sr_y2 = rand_cx-w, rand_cy-h, rand_cx+w, rand_cy+h
84 | H, W = frame.shape[:2]
85 | return max(0, sr_x1), max(0, sr_y1), min(sr_x2, W-1), min(sr_y2, H-1)
86 |
87 |
88 | def convert_bbx2box(bbx):
89 | x, y, w, h = bbx
90 | return (x, y, x+w-1, y+h-1)
91 |
92 |
93 | def convert_box2bbx(box):
94 | x1, y1, x2, y2 = box
95 | return (x1, y1, x2-x1+1, y2-y1+1)
96 |
97 |
98 | def crop_roi(frame, box):
99 | return frame[box[1]:box[3]+1, box[0]:box[2]+1, :]
100 |
101 |
102 | def IoU(prop, gt):
103 | x1, y1, w1, h1 = map(float, prop)
104 | x2, y2, w2, h2 = map(float, gt)
105 | startx, endx = min(x1, x2), max(x1+w1, x2+w2)
106 | starty, endy = min(y1, y2), max(y1+h1, y2+h2)
107 | width = w1 + w2 - (endx - startx)
108 | height = h1 + h2 - (endy - starty)
109 | if width <= 0 or height <= 0:
110 | return 0
111 | else:
112 | area = width * height
113 | return 1.0*area/(w1*h1+w2*h2-area)
114 |
115 |
116 | def load_data(batch_size, z_size, x_size):
117 | transform = transforms.Compose([
118 | # convert a PIL.Image instance with value range [0, 255] or a numpy.ndarray of shape (H, W, C)
119 | # into a torch.FloatTensor of shape (C, H, W) with value range (0, 1.0).
120 | transforms.ToTensor(),
121 | ])
122 |
123 | datasets = {
124 | 'train': DaSiamTrainingSet(transform, z_size, x_size)
125 | }
126 |
127 | dataloaders = {ds: DataLoader(datasets[ds],
128 | batch_size=batch_size,
129 | shuffle=False,
130 | pin_memory=True,
131 | num_workers=8) for ds in datasets}
132 |
133 | return dataloaders
134 |
135 |
136 | if __name__ == "__main__":
137 | da_siam_set = DaSiamTrainingSet(transforms.ToTensor(), (127, 127), (255, 255))
138 |
139 | domain_list = list(da_siam_set.domain2nseq.keys())
140 | domain_name = npr.choice(domain_list, size=1)[0]
141 | num_frames = da_siam_set.domain2nseq[domain_name]
142 |
143 | pair_frame_nos = npr.choice(range(1, num_frames+1), size=2, replace=False)
144 | z_frame_no, x_frame_no = min(pair_frame_nos), max(pair_frame_nos)
145 |
146 | domain_dir = osp.join(da_siam_set.root_path, "sequences", domain_name)
147 | gt_fpath = osp.join(domain_dir, domain_name + '_gt.txt')
148 |
149 | z_gt_bbx = tuple(map(int, linecache.getline(gt_fpath, z_frame_no).split()))
150 | x_gt_bbx = tuple(map(int, linecache.getline(gt_fpath, x_frame_no).split()))
151 |
152 | z_frame_img_name = str(z_frame_no).zfill(4) + '.jpg'
153 | x_frame_img_name = str(x_frame_no).zfill(4) + '.jpg'
154 | z_frame = cv2.imread(osp.join(domain_dir, 'imgs', z_frame_img_name))
155 | x_frame = cv2.imread(osp.join(domain_dir, 'imgs', x_frame_img_name))
156 |
157 | import pdb
158 | pdb.set_trace()
159 |
160 | z = crop_roi(z_frame, convert_bbx2box(z_gt_bbx))
161 | z = cv2.resize(z, da_siam_set.z_size)
162 |
163 | x_gt_box = convert_bbx2box(x_gt_bbx)
164 | sr_box = gen_sr_box(x_frame, x_gt_box)
165 | x = crop_roi(x_frame, sr_box)
166 | x = cv2.resize(x, da_siam_set.x_size)
167 |
168 | translated_x_gt_box = trans_coord(sr_box, x_gt_box)
169 |
170 | print('DONE.')
171 |
172 |
--------------------------------------------------------------------------------
/code/gen_all_anchors.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from generate_anchors import generate_anchors
3 |
4 | def generate_all_anchors(cls_output_shape, xs_shape):
5 | anchors = generate_anchors(ratios=[0.33, 0.5, 1, 2, 3], scales=np.array([8, ]))
6 | # anchors are in box format (x1, y1, x2, y2)
7 |
8 | A = anchors.shape[0]
9 | feat_stride = xs_shape[0] // cls_output_shape[0]
10 |
11 | allowed_border = 0
12 | height, width = cls_output_shape
13 |
14 | sr_size = xs_shape
15 |
16 | # 1. Generate proposals from bbox deltas and shifted anchors
17 | shift_x = np.arange(0, width) * feat_stride
18 | shift_y = np.arange(0, height) * feat_stride
19 | shift_x, shift_y = np.meshgrid(shift_x, shift_y)
20 | shifts = np.vstack((shift_x.ravel(),
21 | shift_y.ravel(),
22 | shift_x.ravel(),
23 | shift_y.ravel())).transpose()
24 |
25 | # 2. Add A anchors (1, A, 4) to the K shifts (K, 1, 4)
26 | # to get shifted anchors (K, A, 4) and reshape to (K*A, 4) shifted anchors
27 | K = shifts.shape[0]
28 | all_anchors = (anchors.reshape((1, A, 4))) + shifts.reshape((1, K, 4)).transpose((1, 0, 2))
29 |
30 | all_anchors = all_anchors.reshape((K*A, 4)) # of shape (5x22x22, 4)
31 |
32 | """
33 | # total number of anchors == A * height * width,
34 | # where height and width are the size of conv feature map
35 | total_anchors = int(K*A)
36 |
37 | # Only keep anchors inside the image
38 | inds_inside = np.where(
39 | (all_anchors[:, 0] >= -allowed_border) &
40 | (all_anchors[:, 1] >= -allowed_border) &
41 | (all_anchors[:, 2] < sr_size[1] + allowed_border) &
42 | (all_anchors[:, 3] < sr_size[0] + allowed_border)
43 | )[0]
44 | anchors = all_anchors[inds_inside, :]
45 | # after keeping-inside, #anchors drops from 2420 down to 433
46 | """
47 |
48 | return all_anchors, A # anchors
49 |
50 | if __name__ == '__main__':
51 | sr_shape = (255, 255)
52 | conv_shape = (17, 17)
53 | all_anchors, _ = generate_all_anchors(conv_shape, sr_shape)
54 | print(all_anchors)
55 |
56 | import cv2
57 | img = cv2.imread("../data/SPRING2004B69.jpg")
58 | img = cv2.resize(img, sr_shape)
59 | for anchor in all_anchors:
60 | x1y1x2y2 = tuple(map(int, list(anchor)))
61 | cv2.rectangle(img, x1y1x2y2[:2], x1y1x2y2[2:], (0, 255, 0), 2)
62 | cv2.imwrite("../data/result.jpg", img)
63 |
64 | print("DONE.")
65 |
--------------------------------------------------------------------------------
/code/generate_anchors.py:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Faster R-CNN
3 | # Copyright (c) 2015 Microsoft
4 | # Licensed under The MIT License [see LICENSE for details]
5 | # Written by Ross Girshick and Sean Bell
6 | # --------------------------------------------------------
7 |
8 | import numpy as np
9 |
10 | # Verify that we compute the same anchors as Shaoqing's matlab implementation:
11 | #
12 | # >> load output/rpn_cachedir/faster_rcnn_VOC2007_ZF_stage1_rpn/anchors.mat
13 | # >> anchors
14 | #
15 | # anchors =
16 | #
17 | # -83 -39 100 56
18 | # -175 -87 192 104
19 | # -359 -183 376 200
20 | # -55 -55 72 72
21 | # -119 -119 136 136
22 | # -247 -247 264 264
23 | # -35 -79 52 96
24 | # -79 -167 96 184
25 | # -167 -343 184 360
26 |
27 | #array([[ -83., -39., 100., 56.],
28 | # [-175., -87., 192., 104.],
29 | # [-359., -183., 376., 200.],
30 | # [ -55., -55., 72., 72.],
31 | # [-119., -119., 136., 136.],
32 | # [-247., -247., 264., 264.],
33 | # [ -35., -79., 52., 96.],
34 | # [ -79., -167., 96., 184.],
35 | # [-167., -343., 184., 360.]])
36 |
37 | def generate_anchors(base_size=16,
38 | ratios=[0.5, 1, 2], #aspect ratios = (0.5, 1, 2)
39 | scales=2**np.arange(3, 6)): #scales == array([2^3, 2^4, 2^5])
40 | """
41 | Generate anchor (reference) windows by enumerating aspect ratios X
42 | scales wrt a reference (0, 0, 15, 15) window.
43 | """
44 |
45 | base_anchor = np.array([1, 1, base_size, base_size]) - 1
46 | ratio_anchors = _ratio_enum(base_anchor, ratios)
47 | anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
48 | for i in range(ratio_anchors.shape[0])])
49 | return anchors
50 |
51 | def _whctrs(anchor):
52 | """
53 | Return width, height, x center, and y center for an anchor (window).
54 | """
55 |
56 | w = anchor[2] - anchor[0] + 1
57 | h = anchor[3] - anchor[1] + 1
58 | x_ctr = anchor[0] + 0.5 * (w - 1)
59 | y_ctr = anchor[1] + 0.5 * (h - 1)
60 | return w, h, x_ctr, y_ctr
61 |
62 | def _mkanchors(ws, hs, x_ctr, y_ctr):
63 | """
64 | Given a vector of widths (ws) and heights (hs) around a center
65 | (x_ctr, y_ctr), output a set of anchors (windows).
66 | """
67 |
68 | ws = ws[:, np.newaxis]
69 | hs = hs[:, np.newaxis]
70 | anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
71 | y_ctr - 0.5 * (hs - 1),
72 | x_ctr + 0.5 * (ws - 1),
73 | y_ctr + 0.5 * (hs - 1)))
74 | return anchors
75 |
76 | def _ratio_enum(anchor, ratios):
77 | """
78 | Enumerate a set of anchors for each aspect ratio wrt an anchor.
79 | """
80 |
81 | w, h, x_ctr, y_ctr = _whctrs(anchor)
82 | size = w * h
83 | size_ratios = size / ratios
84 | ws = np.round(np.sqrt(size_ratios))
85 | hs = np.round(ws * ratios)
86 | anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
87 | return anchors
88 |
89 | def _scale_enum(anchor, scales):
90 | """
91 | Enumerate a set of anchors for each scale wrt an anchor.
92 | """
93 |
94 | w, h, x_ctr, y_ctr = _whctrs(anchor)
95 | ws = w * scales
96 | hs = h * scales
97 | anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
98 | return anchors
99 |
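if __name__ == '__main__':
    # Quick sanity check (not part of the original file): the printed anchors
    # should match the MATLAB reference values listed in the comment at the top.
    print(generate_anchors())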
--------------------------------------------------------------------------------
/code/net.py:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # DaSiamRPN
3 | # Licensed under The MIT License
4 | # Written by Qiang Wang (wangqiang2015 at ia.ac.cn)
5 | # --------------------------------------------------------
6 | import torch
7 | import torch.nn as nn
8 | import torch.nn.functional as F
9 |
10 |
11 | class SiamRPNBIG(nn.Module):
12 | def __init__(self, feat_in=512, feature_out=512, anchor=5):
13 | super(SiamRPNBIG, self).__init__()
14 | self.anchor = anchor
15 | self.feature_out = feature_out
16 | self.featureExtract = nn.Sequential(
17 | nn.Conv2d(3, 192, 11, stride=2),
18 | nn.BatchNorm2d(192),
19 | nn.ReLU(inplace=True),
20 | nn.MaxPool2d(3, stride=2),
21 |
22 | nn.Conv2d(192, 512, 5),
23 | nn.BatchNorm2d(512),
24 | nn.ReLU(inplace=True),
25 | nn.MaxPool2d(3, stride=2),
26 |
27 | nn.Conv2d(512, 768, 3),
28 | nn.BatchNorm2d(768),
29 | nn.ReLU(inplace=True),
30 |
31 | nn.Conv2d(768, 768, 3),
32 | nn.BatchNorm2d(768),
33 | nn.ReLU(inplace=True),
34 |
35 | nn.Conv2d(768, 512, 3),
36 | nn.BatchNorm2d(512),
37 | )
38 |
39 | self.conv_reg1 = nn.Conv2d(feat_in, feature_out*4*anchor, 3)
40 | self.conv_reg2 = nn.Conv2d(feat_in, feature_out, 3)
41 | self.conv_cls1 = nn.Conv2d(feat_in, feature_out*2*anchor, 3)
42 | self.conv_cls2 = nn.Conv2d(feat_in, feature_out, 3)
43 | self.regress_adjust = nn.Conv2d(4*anchor, 4*anchor, 1)
44 |
45 | #self.additional_conv = nn.Conv2d(512, 512, 6, padding=0)
46 |
47 | self.reg1_kernel = []
48 | self.cls1_kernel = []
49 |
50 | def forward(self, x):
51 | x_f = self.featureExtract(x)
52 | #x_ff = self.additional_conv(x_f) # simply for the compatibility of shape matching
53 |
54 | batch_size = x_f.size(0)
55 | reg_conv_output = self.conv_reg2(x_f)
56 | cls_conv_output = self.conv_cls2(x_f)
57 |
58 | cls_corr_list = []
59 | reg_corr_list = []
60 | for i_batch in range(batch_size):
61 | i_cls_corr = F.conv2d(torch.unsqueeze(cls_conv_output[i_batch], 0), self.cls1_kernel[i_batch])
62 | cls_corr_list.append(i_cls_corr)
63 | i_reg_corr = F.conv2d(torch.unsqueeze(reg_conv_output[i_batch], 0), self.reg1_kernel[i_batch])
64 | i_reg_corr = self.regress_adjust(i_reg_corr)
65 | reg_corr_list.append(i_reg_corr)
66 |
67 | cls_corr = torch.stack(cls_corr_list, dim=0)
68 | cls_corr = torch.squeeze(cls_corr)
69 | reg_corr = torch.stack(reg_corr_list, dim=0)
70 | reg_corr = torch.squeeze(reg_corr)
71 |
72 | """
73 | # return tensors of shape (17,17,4K) and (17,17,2k), respectively
74 | return self.regress_adjust(
75 | F.conv2d(self.conv_reg2(x_f), self.reg1_kernel)
76 | ), \
77 | F.conv2d(self.conv_cls2(x_f), self.cls1_kernel)
78 | """
79 | return reg_corr, cls_corr, x_f # of shape (50, 4K, 17, 17), (50, 2K, 17, 17), (N, 22, 22, 512) ###################################
80 |
81 |
82 | def template(self, z):
83 | z_f = self.featureExtract(z)
84 | reg1_kernel_raw = self.conv_reg1(z_f)
85 | cls1_kernel_raw = self.conv_cls1(z_f)
86 | kernel_size = reg1_kernel_raw.data.size()[-1]
87 |
88 | self.reg1_kernel = reg1_kernel_raw.view(-1, self.anchor*4, self.feature_out, kernel_size, kernel_size)#50, 4K, 512, 4, 4
89 | self.cls1_kernel = cls1_kernel_raw.view(-1, self.anchor*2, self.feature_out, kernel_size, kernel_size)#50, 2K, 512, 4, 4
90 |
91 | return z_f # of shape (N, 6, 6, 512) #############################################################################################
92 |
--------------------------------------------------------------------------------
/code/py_nms.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 | def py_nms(dets, thresh=0.9):
4 | """Python NMS""" # the input is in box format
5 |
6 | x1 = dets[:, 0]
7 | y1 = dets[:, 1]
8 | x2 = dets[:, 2]
9 | y2 = dets[:, 3]
10 | scores = dets[:, 4]
11 |
12 | areas = (x2 - x1 + 1) * (y2 - y1 + 1)
13 | order = scores.argsort()[::-1]
14 |
15 | keep = []
16 | while order.size > 0:
17 | i = order[0]
18 | keep.append(i)
19 | xx1 = np.maximum(x1[i], x1[order[1:]])
20 | yy1 = np.maximum(y1[i], y1[order[1:]])
21 | xx2 = np.minimum(x2[i], x2[order[1:]])
22 | yy2 = np.minimum(y2[i], y2[order[1:]])
23 |
24 | w = np.maximum(0.0, xx2 - xx1 + 1)
25 | h = np.maximum(0.0, yy2 - yy1 + 1)
26 | inter = w * h
27 | ovr = inter / (areas[i] + areas[order[1:]] - inter)
28 |
29 | inds = np.where(ovr <= thresh)[0]
30 | order = order[inds + 1]
31 |
32 | return keep
33 |
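# Minimal usage sketch (illustrative values); each row of dets is (x1, y1, x2, y2, score):
#   dets = np.array([[ 10.,  10.,  50.,  50., 0.9],
#                    [ 12.,  12.,  52.,  52., 0.8],
#                    [100., 100., 150., 150., 0.7]])
#   py_nms(dets, thresh=0.5)  # -> [0, 2]; the second box is suppressed by the first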
--------------------------------------------------------------------------------
/code/run_SiamRPN.py:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # DaSiamRPN
3 | # Licensed under The MIT License
4 | # Written by Qiang Wang (wangqiang2015 at ia.ac.cn)
5 | # --------------------------------------------------------
6 | import numpy as np
7 | from torch.autograd import Variable
8 | import torch.nn.functional as F
9 |
10 | from util_test import convert_box2bbx
11 | from utils import get_subwindow_tracking
12 | import py_nms
13 | from train import generate_all_anchors
14 |
15 | '''DEPRECATED.'''
16 | def generate_anchor(total_stride, scales, ratios, score_size):
17 | anchor_num = len(ratios) * len(scales) # 5 x 1
18 | anchor = np.zeros((anchor_num, 4), dtype=np.float32)
19 | size = total_stride * total_stride
20 | count = 0
21 | for ratio in ratios:
22 | ws = int(np.sqrt(size / ratio))
23 | hs = int(ws * ratio)
24 | for scale in scales:
25 | wws = ws * scale
26 | hhs = hs * scale
27 | anchor[count, 0] = 0
28 | anchor[count, 1] = 0
29 | anchor[count, 2] = wws
30 | anchor[count, 3] = hhs
31 | count += 1
32 |
33 | anchor = np.tile(anchor, score_size * score_size).reshape((-1, 4))
34 | ori = - (score_size / 2) * total_stride
35 | xx, yy = np.meshgrid([ori + total_stride * dx for dx in range(score_size)],
36 | [ori + total_stride * dy for dy in range(score_size)])
37 | xx, yy = np.tile(xx.flatten(), (anchor_num, 1)).flatten(), \
38 | np.tile(yy.flatten(), (anchor_num, 1)).flatten()
39 | anchor[:, 0], anchor[:, 1] = xx.astype(np.float32), yy.astype(np.float32)
40 | return anchor
41 |
42 |
43 | class TrackerConfig(object):
44 | # These are the default hyper-params for DaSiamRPN 0.3827
45 |
46 | windowing = 'cosine' # to penalize large displacements [cosine/uniform]
47 |
48 | # Params from the network architecture, have to be consistent with the training
49 | exemplar_size = 127 # input z size
50 | instance_size = 255 # input x size (search region), 271
51 | total_stride = 8
52 | score_size = (instance_size-exemplar_size)/total_stride+1 # because the exemplar is used as a kernel to convolve with the instance
53 | delta_score_size = 17 # must be consistent with that of Siamese network, to be automatically linked
54 |
55 | context_amount = 0.5 # context amount for the exemplar
56 | ratios = [0.33, 0.5, 1, 2, 3]
57 | scales = [8, ]
58 | basic_anchor_num = len(ratios) * len(scales)
59 | anchors = []
60 | penalty_k = 0.055
61 | window_influence = 0.42
62 | lr = 0.295
63 |
64 | alpha_i = 1.0
65 | eta = 0.01
66 | alpha_hat = 0.5
67 | num_pts_half_bin = 2
68 | distractor_thresh = 0.5
69 |
70 |
71 | def tracker_eval(net, x_crop, target_pos, target_sz, window, scale_z, p):
72 | delta, score = net(x_crop) # (1, 4K, 17, 17) and (1, 2K, 17, 17)
73 |
74 | delta = delta.permute(1, 2, 3, 0).contiguous().view(4, -1).data.cpu().numpy()
75 | score = F.softmax(score.permute(1, 2, 3, 0).contiguous().view(2, -1), dim=0).data[1, :].cpu().numpy()
76 |
77 | delta[0, :] = delta[0, :] * p.anchors[:, 2] + p.anchors[:, 0]
78 | delta[1, :] = delta[1, :] * p.anchors[:, 3] + p.anchors[:, 1]
79 | delta[2, :] = np.exp(delta[2, :]) * p.anchors[:, 2]
80 | delta[3, :] = np.exp(delta[3, :]) * p.anchors[:, 3]
81 |
82 | '''
83 | def change(r):
84 | return np.maximum(r, 1./r)
85 |
86 | def sz(w, h):
87 | pad = (w + h) * 0.5
88 | sz2 = (w + pad) * (h + pad)
89 | return np.sqrt(sz2)
90 |
91 | def sz_wh(wh):
92 | pad = (wh[0] + wh[1]) * 0.5
93 | sz2 = (wh[0] + pad) * (wh[1] + pad)
94 | return np.sqrt(sz2)
95 | '''
96 |
97 | # size penalty
98 | s_c = change(sz(delta[2, :], delta[3, :]) / (sz_wh(target_sz))) # scale penalty
99 | r_c = change((target_sz[0] / target_sz[1]) / (delta[2, :] / delta[3, :])) # ratio penalty
100 |
101 | penalty = np.exp(-(r_c * s_c - 1.) * p.penalty_k)
102 | pscore = penalty * score
103 |
104 | # window float
105 | pscore = pscore * (1 - p.window_influence) + window * p.window_influence
106 | best_pscore_id = np.argmax(pscore)
107 |
108 | target = delta[:, best_pscore_id] / scale_z
109 | target_sz = target_sz / scale_z
110 | lr = penalty[best_pscore_id] * score[best_pscore_id] * p.lr
111 |
112 | res_x = target[0] + target_pos[0]
113 | res_y = target[1] + target_pos[1]
114 |
115 | res_w = target_sz[0] * (1 - lr) + target[2] * lr
116 | res_h = target_sz[1] * (1 - lr) + target[3] * lr
117 |
118 | target_pos = np.array([res_x, res_y])
119 | target_sz = np.array([res_w, res_h])
120 | return target_pos, target_sz, score[best_pscore_id]
121 |
122 |
123 | def tracker_eval_distractor_aware(x_crop, target_sz, scale_z, state):
124 | p = state['p'] # tracking config
125 | net = state['net']
126 | window = state['window'] # cosine window
127 | target_pos = state['target_pos']
128 |
129 | delta, score, sr_feat = net(x_crop) # of shape (1, 4K, 17, 17), (1, 2K, 17, 17), (1, 22, 22, 512)
130 |
131 | delta = delta.contiguous().view(4, -1).data.cpu().numpy() # (4, K*17*17)
132 | score = F.softmax(score.contiguous().view(2, -1), dim=0).data[1, :].cpu().numpy() # (2, K*17*17)
133 |
134 | delta[0, :] = delta[0, :] * p.anchors[:, 2] + p.anchors[:, 0] # x
135 | delta[1, :] = delta[1, :] * p.anchors[:, 3] + p.anchors[:, 1] # y
136 | delta[2, :] = np.exp(delta[2, :]) * p.anchors[:, 2] # w
137 | delta[3, :] = np.exp(delta[3, :]) * p.anchors[:, 3] # h
138 |
139 | inds_inside = np.where(
140 | (delta[0, :] >= 0) &
141 | (delta[1, :] >= 0) &
142 | (delta[0, :] + delta[2, :] - 1 < p.instance_size) &
143 | (delta[1, :] + delta[3, :] - 1 < p.instance_size)
144 | )[0]
145 | delta = delta[:, inds_inside]
146 | score = score[inds_inside]
147 |
148 | # for i in range(delta.shape[1]):
149 | # print(delta[:, i])
150 |
151 | '''NMS is performed on delta according to pscore's'''
152 | dets = np.hstack(
153 | (delta.transpose(), score[np.newaxis, :].transpose())
154 | ) # in bbx format of (x, y, w, h)
155 | dets[:, 2] = dets[:, 0] + dets[:, 2] - 1
156 | dets[:, 3] = dets[:, 1] + dets[:, 3] - 1
157 |
158 | nms_indices_kept = py_nms.py_nms(dets, thresh=0.9) # now dets is in box format
159 | # dets_kept = dets[nums_ind_kept] # (N, 4+1)
160 | # print(dets.astype(int))
161 |
162 | def bilinear_interp(sr_feat, x_f, y_f):
163 | ub = sr_feat.shape[-1]-1
164 | x1, y1 = max(0, min(ub, int(x_f))), max(0, min(ub, int(y_f)))
165 | x2, y2 = max(0, min(ub, int(x_f)+1)), max(0, min(ub, int(y_f)+1))
166 | #print(f"{x1}, {y1}, {x2}, {y2}")
167 |
168 | fQ11, fQ12, fQ21, fQ22 = sr_feat[:, x1, y1], sr_feat[:, x1, y2], sr_feat[:, x2, y1], sr_feat[:, x2, y2]
169 | fQ11 = fQ11.cpu().detach().numpy()
170 | fQ12 = fQ12.cpu().detach().numpy()
171 | fQ21 = fQ21.cpu().detach().numpy()
172 | fQ22 = fQ22.cpu().detach().numpy()
173 |
174 | ret1 = (y2-y_f)/(y2-y1)*((x2-x_f)/(x2-x1)*fQ11 + (x_f-x1)/(x2-x1)*fQ21)
175 | ret2 = (y_f-y1)/(y2-y1)*((x2-x_f)/(x2-x1)*fQ12 + (x_f-x1)/(x2-x1)*fQ22)
176 |
177 | return ret1+ret2
178 |
179 | def binwise_max_pooling(meshgrid, num_bins, num_pts):
180 | assert meshgrid.shape[0] == meshgrid.shape[1] == num_bins*num_pts
181 |
182 | num_channels = meshgrid.shape[2]-2
183 | pooling_res = np.zeros((num_bins, num_bins, num_channels), dtype=np.float32)
184 | for channel in range(num_channels):
185 | for r in range(num_bins):
186 | for c in range(num_bins):
187 | res_rc = meshgrid[r*num_pts, c*num_pts, 2+channel]
188 | res_rc = max(res_rc, meshgrid[r*num_pts, c*num_pts+1, 2+channel])
189 | res_rc = max(res_rc, meshgrid[r*num_pts+1, c*num_pts, 2+channel])
190 | res_rc = max(res_rc, meshgrid[r*num_pts+1, c*num_pts+1, 2+channel])
191 | pooling_res[r, c, channel] = res_rc
192 |
193 | return pooling_res
194 |
195 | '''Extract phi's of each region proposal using ROI-align'''
196 | W = H = p.instance_size # raw image size
197 | W_ = H_ = sr_feat.shape[2] # size of feature map of search region, expected to be 22
198 | num_props = len(nms_indices_kept)
199 | num_bins = state['template_feat'].shape[-1] # expect state['template_feat'] to be 6
200 | num_pts = p.num_pts_half_bin
201 | num_channels = state['template_feat'].shape[1] # expected to be 512
202 | roi_align_feats = np.empty((num_props, num_bins, num_bins, num_channels), dtype=np.float32)
203 | index2featkept_map = {} # a mapping from the original index to the new index
204 | for i in range(num_props):
205 | nms_index_kept = nms_indices_kept[i]
206 |
207 | x, y, w, h = convert_box2bbx(tuple(dets[nms_index_kept][:4]))
208 | x_, y_ = W_*(x+1)/W-1, H_*(y+1)/H-1
209 | w_, h_ = W_*(x+w)/W-x_, H_*(y+h)/H-y_ #W_*w/W, H_*h/H
210 |
211 | meshgrid = np.empty((num_bins*num_pts, num_bins*num_pts, 2+num_channels)) # `2+num_channels` means (x, y, val)
212 | h_stride = w_/num_bins/(num_pts+1)
213 | v_stride = h_/num_bins/(num_pts+1)
214 |
215 | for r in range(num_bins*num_pts):
216 | for c in range(num_bins*num_pts):
217 | h_delta = (c//num_pts)*((num_pts+1)*h_stride) + ((c%num_pts)+1)*h_stride
218 | v_delta = (r//num_pts)*((num_pts+1)*v_stride) + ((r%num_pts)+1)*v_stride
219 |
220 | meshgrid[r, c, :2] = np.array([x_+h_delta, y_+v_delta]) # can be disabled
221 |
222 | x_f, y_f = x_+h_delta, y_+v_delta
223 | # print(x_f, y_f)
224 | vals = bilinear_interp(sr_feat[0], x_f, y_f) # sr_feat (1, 512, 22, 22)
225 | meshgrid[r, c, 2:] = vals
226 |
227 | roi_align_res = binwise_max_pooling(meshgrid, num_bins, num_pts) # resulting in a tensor of shape (6, 6, 512)
228 | roi_align_feats[i, ...] = roi_align_res
229 | index2featkept_map[nms_index_kept] = i
230 | '''After RoI-align, we obtain roi_align_feats, which is a tensor of shape (N, 6, 6, 512)'''
231 |
232 | '''Distractor-aware incremental learning:'''
233 | # 1. Construct a distractor set, saving indices of the original set of proposals before NMS
234 | distractor_index_set = []
235 | running_idx = nms_indices_kept[0]
236 | running_max = np.sum(state['template_feat'][0].transpose(1, 2, 0) * roi_align_feats[0]) # element-wise multiplication and sum
237 | if running_max > p.distractor_thresh:
238 | distractor_index_set.append(running_idx)
239 | for i in range(1, num_props):
240 | nms_index_kept = nms_indices_kept[i]
241 | curr_val = np.sum(state['template_feat'][0].transpose(1, 2, 0) * roi_align_feats[i]) # element-wise multiplication and sum
242 | if curr_val > running_max:
243 | running_idx = nms_index_kept
244 | running_max = curr_val
245 | if curr_val > p.distractor_thresh:
246 | distractor_index_set.append(nms_index_kept)
247 | distractor_index_set.remove(running_idx)
248 |
249 | # 2. Incremental learning according to Eqn. (4)
250 | sum_alpha_i = len(distractor_index_set) * p.alpha_i
251 | running_template = state['acc_beta_phi'] / state['acc_beta'] - state['acc_beta_alpha_phi'] / (state['acc_beta'] * sum_alpha_i)
252 | running_idx = nms_indices_kept[0]
253 | running_max = np.sum(running_template[0].transpose(1, 2, 0) * roi_align_feats[0])
254 | for i in range(1, num_props):
255 | nms_index_kept = nms_indices_kept[i]
256 | curr_val = np.sum(running_template[0].transpose(1, 2, 0) * roi_align_feats[index2featkept_map[nms_index_kept]])
257 | if curr_val > running_max:
258 | running_idx = nms_index_kept
259 | running_max = curr_val
260 |
261 | beta_t = p.eta/(1-p.eta)
262 | curr_beta_alpha_phi = np.zeros_like(state['acc_beta_alpha_phi'])
263 | for distractor_index in distractor_index_set:
264 | curr_beta_alpha_phi += p.alpha_i * roi_align_feats[index2featkept_map[distractor_index]].transpose(2, 0, 1)[np.newaxis, ...]
265 | curr_beta_alpha_phi *= p.alpha_hat * beta_t
266 | state['acc_beta_alpha_phi'] += curr_beta_alpha_phi
267 | state['acc_beta'] += beta_t
268 | '''---Distractor-aware incremental learning---'''
269 |
270 | best_pscore_id = running_idx
271 | target = delta[:, best_pscore_id] / scale_z
272 | target_sz = target_sz / scale_z
273 | lr = 0.1 #penalty[best_pscore_id] * score[best_pscore_id] * p.lr
274 |
275 | res_x = target[0] + target_pos[0]
276 | res_y = target[1] + target_pos[1]
277 |
278 | res_w = target_sz[0] * (1 - lr) + target[2] * lr
279 | res_h = target_sz[1] * (1 - lr) + target[3] * lr
280 |
281 | target_pos = np.array([res_x, res_y])
282 | target_sz = np.array([res_w, res_h])
283 | return target_pos, target_sz, score[best_pscore_id]
284 |
285 |
286 | def SiamRPN_init(im, target_pos, target_sz, net):
287 | ## target_pos is (cx, cy)
288 | ## target_sz is (w, h)
289 |
290 | state = dict()
291 | p = TrackerConfig()
292 | state['im_h'] = im.shape[0]
293 | state['im_w'] = im.shape[1]
294 |
295 | if ((target_sz[0]*target_sz[1]) / float(state['im_h']*state['im_w'])) < 0.004:
296 | p.instance_size = 287 # small object big search region
297 | else:
298 | p.instance_size = 255 #271
299 |
300 | p.delta_score_size = int((p.instance_size-p.exemplar_size)/p.total_stride+1) # size of the last feature map, expected to be 17
301 |
302 | # all anchors of each aspect ratio and scale at each location are generated.
303 | p.anchors, _ = generate_all_anchors((p.delta_score_size, p.delta_score_size),
304 | (p.instance_size, p.instance_size))
305 | # of shape (dropping from 2420 down to 433, 4)
306 |
307 | avg_chans = np.mean(im, axis=(0, 1)) # per-channel mean of the frame, passed to get_subwindow_tracking
308 |
309 | wc_z = target_sz[0] + p.context_amount * sum(target_sz) # adding some context info
310 | hc_z = target_sz[1] + p.context_amount * sum(target_sz) # adding some context info
311 | s_z = round(np.sqrt(wc_z * hc_z))
312 |
313 | # initialize the exemplar
314 | z_crop = get_subwindow_tracking(im, target_pos, p.exemplar_size, s_z, avg_chans)
315 |
316 | z = Variable(z_crop.unsqueeze(0))
317 | template_feat = net.template(z.cuda())
318 |
319 | if p.windowing == 'cosine':
320 | # return the outer product of two hanning vectors, which is a matrix of the same size as the feature map of search region
321 | window = np.outer(np.hanning(p.delta_score_size), np.hanning(p.delta_score_size)) ############### p.score_size???
322 | elif p.windowing == 'uniform':
323 | window = np.ones((p.delta_score_size, p.delta_score_size)) ################## p.score_size???
324 |
325 | # flatten and replicate the cosine window
326 | window = np.tile(window.flatten(), p.basic_anchor_num)
327 |
328 | state['p'] = p
329 | state['net'] = net
330 | state['avg_chans'] = avg_chans
331 | state['window'] = window
332 | state['target_pos'] = target_pos
333 | state['target_sz'] = target_sz
334 | state['score'] = 1.0
335 |
336 | # for distractor-aware incremental learning
337 | template_feat_cpu = template_feat.cpu().detach().numpy()
338 | state['template_feat'] = template_feat_cpu
339 | state['acc_beta_phi'] = template_feat_cpu
340 | state['acc_beta'] = 1.0
341 | state['acc_beta_alpha_phi'] = np.zeros_like(template_feat_cpu)
342 |
343 | return state
344 |
345 |
346 | def SiamRPN_track(state, im):
347 | p = state['p'] # tracking config
348 | net = state['net']
349 | avg_chans = state['avg_chans']
350 | window = state['window'] # cosine window
351 | target_pos = state['target_pos'] # cx, cy of target in the previous frame
352 | target_sz = state['target_sz'] # w, h of target in the previous frame
353 | template_feat = state['template_feat']
354 |
355 | wc_z = target_sz[0] + p.context_amount * sum(target_sz)
356 | hc_z = target_sz[1] + p.context_amount * sum(target_sz)
357 | s_z = np.sqrt(wc_z * hc_z)
358 | scale_z = p.exemplar_size / s_z
359 |
360 | ###'Local to Global': if failure mode is activated then expand d_search; otherwise set d_search to normal
361 | d_search = (p.instance_size - p.exemplar_size) / 2
362 | if state['score'] < 0.3:
363 | d_search *= 2
364 |
365 | pad = d_search / scale_z
366 | s_x = s_z + 2 * pad
367 |
368 | # extract scaled crops for search region x at previous target position
369 | x_crop = Variable(get_subwindow_tracking(im, target_pos, p.instance_size, round(s_x), avg_chans).unsqueeze(0))
370 | # where the third argument is the model input size and the fourth is the original size in the raw image.
371 |
372 | target_pos, target_sz, score = tracker_eval_distractor_aware(x_crop.cuda(), target_sz*scale_z, scale_z, state)
373 |
374 | target_pos[0] = max(0, min(state['im_w'], target_pos[0]))
375 | target_pos[1] = max(0, min(state['im_h'], target_pos[1]))
376 | target_sz[0] = max(10, min(state['im_w'], target_sz[0]))
377 | target_sz[1] = max(10, min(state['im_h'], target_sz[1]))
378 |
379 | state['target_pos'] = target_pos
380 | state['target_sz'] = target_sz
381 | state['score'] = score
382 | return state
383 |
--------------------------------------------------------------------------------
/code/test.py:
--------------------------------------------------------------------------------
1 | import os, sys
2 | import cv2 # imread
3 | import torch
4 | import numpy as np
5 | from os.path import realpath, dirname, join
6 |
7 | from net import SiamRPNBIG
8 | from run_SiamRPN import SiamRPN_init, SiamRPN_track
9 | from utils import get_axis_aligned_bbox, cxy_wh_2_rect
10 | from util_test import *
11 | import linecache
12 |
13 | class Tracker(object):
14 | def __init__(self,
15 | path_seq,
16 | num_frames,
17 | gt_first):
18 |
19 | self.path_seq = path_seq
20 | self.num_frames = num_frames
21 | self.gt_first = gt_first
22 |
23 | # load net
24 | self.net = SiamRPNBIG()
25 | # self.net.load_state_dict(torch.load("./SiamRPNBIG.model"))
26 | self.net.eval().cuda()
27 |
28 | #self.testing_config = testing_config
29 | self.cur_seq_name = os.path.split(path_seq)[1]
30 | self.cur_frame = None
31 |
32 | def on_tracking(self):
33 | # warm up
34 | for i in range(10):
35 | self.net.template(torch.autograd.Variable(torch.FloatTensor(1, 3, 127, 127)).cuda())
36 | self.net(torch.autograd.Variable(torch.FloatTensor(1, 3, 255, 255)).cuda())
37 |
38 | i = 1
39 | pred_bbx = self.gt_first
40 | print("{}th frame: {} {} {} {}".format(i, pred_bbx[0],pred_bbx[1], pred_bbx[2], pred_bbx[3]))
41 | cx, cy, w, h = pred_bbx[0]+pred_bbx[2]/2.0, pred_bbx[1]+pred_bbx[3]/2.0, pred_bbx[2], pred_bbx[3]
42 | i += 1
43 |
44 | target_pos, target_sz = np.array([cx, cy]), np.array([w, h])
45 | im = cv2.imread(self.path_seq + '/imgs/0001.jpg') # HxWxC
46 | state = SiamRPN_init(im, target_pos, target_sz, self.net) # init tracker
47 |
48 | while i <= self.num_frames:
49 | self.index_frame = i
50 | im = cv2.imread(self.path_seq + '/imgs/' + str(i).zfill(4) + '.jpg')
51 | state = SiamRPN_track(state, im)
52 |
53 | # convert cx, cy, w, h into rect
54 | res = cxy_wh_2_rect(state['target_pos'], state['target_sz'])
55 | print(f"{i}th frame: ", res)
56 | i += 1
57 |
58 |
59 | if __name__ == "__main__":
60 | os.environ["CUDA_VISIBLE_DEVICES"] = "1"
61 | print('*****************TEST PHASE********************')
62 | import datetime
63 | testing_date = datetime.datetime.now().strftime('%b-%d-%y_%H:%M:%S')
64 |
65 | seq_list_path = '../data/whole_list.txt'
66 |
67 | seq_path_list = np.genfromtxt(seq_list_path, dtype='S', usecols=0)
68 | num_frames_list = np.genfromtxt(seq_list_path, dtype=int, usecols=1)
69 | if seq_path_list.ndim == 0:
70 | seq_path_list = seq_path_list.reshape(1)
71 | num_frames_list = num_frames_list.reshape(1)
72 |
73 | assert len(seq_path_list) == len(num_frames_list)
74 | total_seqs = len(seq_path_list)
75 | total_frames = sum(num_frames_list)
76 | for seq_index in range(len(seq_path_list)):
77 | path_seq = seq_path_list[seq_index].decode('utf-8')
78 | num_frames = num_frames_list[seq_index]
79 |
80 | seq_name = os.path.split(path_seq)[1]
81 | print(f"\nprocessing Sequence {seq_name} with {num_frames} frames...")
82 |
83 | global gt_file_name
84 | gt_file_name = path_seq + '/' + seq_name + "_gt.txt"
85 |
86 | gt_entry = linecache.getline(gt_file_name, 1)
87 | gt_first = parse_gt_entry(gt_entry)
88 |
89 | tracker = Tracker(path_seq=path_seq,
90 | num_frames=num_frames,
91 | gt_first=gt_first)
92 | tracker.on_tracking()
93 |
--------------------------------------------------------------------------------
/code/train.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import os
3 | import torch
4 |
5 | from net import SiamRPNBIG
6 | from gen_all_anchors import generate_all_anchors
7 | from bbox_transform import bbox_transform
8 | from config import cfg
9 |
10 | import argparse
11 | import dataset
12 | from tqdm import tqdm
13 |
14 |
15 | def bbox_overlaps(box, gt, phase='iou'):
16 | """
17 | Compute the overlaps between box and gt(_box)
18 | box: (N, 4) NDArray
19 | gt : (K, 4) NDArray
20 | return: (N, K) NDArray, stores Max(0, intersection/union) or Max(0, intersection/area_box)
21 | """
22 | # Note that the inputs are in box format: x1, y1, x2, y2
23 |
24 | N = box.shape[0]
25 | K = gt.shape[0]
26 | target_shape = (N, K, 4)
27 | b_box = np.broadcast_to(np.expand_dims(box, axis=1), target_shape)
28 | b_gt = np.broadcast_to(np.expand_dims(gt, axis=0), target_shape)
29 |
30 | iw = (np.minimum(b_box[:, :, 2], b_gt[:, :, 2]) -
31 | np.maximum(b_box[:, :, 0], b_gt[:, :, 0]))
32 | ih = (np.minimum(b_box[:, :, 3], b_gt[:, :, 3]) -
33 | np.maximum(b_box[:, :, 1], b_gt[:, :, 1]))
34 | inter = np.maximum(iw, 0) * np.maximum(ih, 0)
35 |
36 | # Use the broadcast to save some time
37 | area_box = (box[:, 2] - box[:, 0]) * (box[:, 3] - box[:, 1])
38 | area_gt = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
39 | area_target_shape = (N, K)
40 | b_area_box = np.broadcast_to(np.expand_dims(area_box, axis=1), area_target_shape)
41 | b_area_gt = np.broadcast_to(np.expand_dims(area_gt, axis=0), area_target_shape)
42 |
43 | assert phase == 'iou' or phase == 'ioa'
44 | union = b_area_box + b_area_gt - inter if phase == 'iou' else b_area_box
45 |
46 | overlaps = np.maximum(inter / np.maximum(union, 1), 0)
47 | return overlaps
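# Illustrative usage (not from the original file; values follow directly from the
# definition above, which uses x2 - x1 without the +1 convention):
#   >>> box = np.array([[0., 0., 10., 10.]])   # one candidate box, (x1, y1, x2, y2)
#   >>> gt  = np.array([[5., 5., 15., 15.]])   # one ground-truth box
#   >>> bbox_overlaps(box, gt)                 # inter = 25, union = 175
#   array([[0.14285714]])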
48 |
49 |
50 | def smooth_l1_loss(bbox_pred, bbox_targets, bbox_inside_weights, bbox_outside_weights, beta=1.0):
51 | box_diff = bbox_pred - bbox_targets
52 | in_box_diff = bbox_inside_weights * box_diff
53 | abs_in_box_diff = torch.abs(in_box_diff)
54 | smooth_l1_sign = (abs_in_box_diff < beta).detach().float()
55 |
56 | in_loss_box = smooth_l1_sign * 0.5 * torch.pow(in_box_diff, 2) / beta + \
57 | (1-smooth_l1_sign) * (abs_in_box_diff-0.5*beta)
58 |
59 | out_loss_box = bbox_outside_weights * in_loss_box
60 | loss_box = out_loss_box
61 | N = loss_box.size(0)
62 | loss_box = loss_box.view(-1).sum(0) / N
63 | return loss_box
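# Sanity check on the formula above (illustrative, with beta = 1): the elementwise loss
# is 0.5 * d**2 for |d| < 1 and |d| - 0.5 otherwise, so a residual d = 0.5 contributes
# 0.125 and d = 2.0 contributes 1.5 before the inside/outside weights are applied.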
64 |
65 |
66 | def _unmap(data, count, inds, fill=0):
67 | """ Unmap a subset of item (data) back to the original set of items (of size count) """
68 | if len(data.shape) == 1:
69 | ret = np.empty((count, ), dtype=np.float32)
70 | ret.fill(fill)
71 | ret[inds] = data
72 | else:
73 | ret = np.empty((count, ) + data.shape[1:], dtype=np.float32)
74 | ret.fill(fill)
75 | ret[inds, :] = data
76 | return ret
77 |
78 |
79 | def _compute_targets(ex_rois, gt_rois):
80 | """Compute bounding-box regression targets for an image."""
81 | assert ex_rois.shape[0] == gt_rois.shape[0]
82 | assert ex_rois.shape[1] == 4
83 | assert gt_rois.shape[1] == 4 #5
84 |
85 | return bbox_transform(ex_rois, gt_rois[:, :4].numpy()).astype(np.float32, copy=False)
86 |
87 |
88 | def gen_anchor_target(cls_output_shape, xs_shape, gt_boxes):
89 | """
90 | Assign anchors to ground-truth targets.
91 | Produces anchor classification labels and bounding-box regression targets.
92 | """
93 | height, width = cls_output_shape
94 | all_anchors, A = generate_all_anchors(cls_output_shape, xs_shape)
95 | # Note that anchors are in format (x1, y1, x2, y2)
96 |
97 | total_anchors = all_anchors.shape[0]
98 | inds_inside = np.where(
99 | (all_anchors[:, 0] >= 0) &
100 | (all_anchors[:, 1] >= 0) &
101 | (all_anchors[:, 2] < xs_shape[1]) &
102 | (all_anchors[:, 3] < xs_shape[0])
103 | )[0]
104 | anchors = all_anchors[inds_inside, :]
105 |
106 | labels = np.zeros((1, 1, A*height, width))
107 | bbox_targets = np.zeros((1, 4*A, height, width))
108 | bbox_inside_weights = np.zeros((1, 4*A, height, width))
109 | bbox_outside_weights = np.zeros((1, 4*A, height, width))
110 |
111 | # label: 1 is positive, 0 is negative, -1 is don't care
112 | labels = np.empty((len(inds_inside), ), dtype=np.float32)
113 | labels.fill(-1)
114 |
115 | # overlaps between anchors and gt boxes
116 | # overlaps.shape = (#anchors_inside, #gts)
117 | overlaps = bbox_overlaps(
118 | np.ascontiguousarray(anchors, dtype=np.float),
119 | np.ascontiguousarray(gt_boxes, dtype=np.float))
120 |
121 | argmax_overlaps = overlaps.argmax(axis=1) # of shape (#anchors_inside, )
122 | max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps] # of shape (#anchors_inside, )
123 |
124 | gt_argmax_overlaps = overlaps.argmax(axis=0) # of shape (#gt, )
125 | gt_max_overlaps = overlaps[gt_argmax_overlaps, np.arange(overlaps.shape[1])] # of shape (#gt, )
126 | gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
127 |
128 | labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0 # 0.3
129 | labels[gt_argmax_overlaps] = 1
130 | labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1 # 0.7
131 |
132 | bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32)
133 | # _compute_targets() returns a (#sifted_anchors, 4) array with each row being (dx, dy, dw, dh),
134 | # the regression increments to be learnt by the bbox regressor
135 | bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])
136 | bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
137 | bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS) #RPN_BBOX_INSIDE_WEIGHTS = [1.0, 1.0, 1.0, 1.0]
138 |
139 | bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
140 | if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0: #cfg.TRAIN.RPN_POSITIVE_WEIGHT == -1.0
141 | # uniform weighting of examples (given non-uniform sampling)
142 | num_examples = np.sum(labels >= 0) # number of anchors labeled 0 or 1, i.e. not "don't care"
143 | positive_weights = np.ones((1, 4)) * 1.0 / num_examples
144 | negative_weights = np.ones((1, 4)) * 1.0 / num_examples
145 | else:
146 | assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
147 | (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))
148 | positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT / np.sum(labels == 1))
149 | negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) / np.sum(labels == 0))
150 |
151 | bbox_outside_weights[labels == 1, :] = positive_weights
152 | bbox_outside_weights[labels == 0, :] = negative_weights
153 |
154 | # map up to original set of anchors
155 | labels = _unmap(labels, total_anchors, inds_inside, fill=-1) #labels.shape == (#total_anchors, )
156 | bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0) #bbox_targets.shape == (#total_anchors, 4)
157 | bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0) #bbox_inside_weights.shape == (#total_anchors, 4)
158 | bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0) #bbox_outside_weights.shape == (#total_anchors, 4)
159 |
160 | # labels
161 | labels = labels.reshape((1, height, width, A)).transpose(0, 3, 1, 2) # of shape (1, A, height, width)
162 | labels = labels.reshape((1, 1, A*height, width))
163 | # bbox_targets
164 | bbox_targets = bbox_targets.reshape((1, height, width, A*4)).transpose(0, 3, 1, 2) # of shape (1, 4*A, height, width)
165 | # bbox_inside_weights
166 | bbox_inside_weights = bbox_inside_weights.reshape((1, height, width, A*4)).transpose(0, 3, 1, 2) # of shape (1, 4*A, height, width)
167 | # bbox_outside_weights
168 | bbox_outside_weights = bbox_outside_weights.reshape((1, height, width, A*4)).transpose(0, 3, 1, 2) # of shape (1, 4*A, height, width)
169 |
170 | return labels, bbox_targets, bbox_inside_weights, bbox_outside_weights, A
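# Shape summary (illustrative, derived from the reshapes above): for a 17x17 score map
# with A anchors per location, the returned arrays are
#   labels               -> (1, 1, A*17, 17)   values in {1, 0, -1 ("don't care")}
#   bbox_targets         -> (1, 4*A, 17, 17)
#   bbox_inside_weights  -> (1, 4*A, 17, 17)
#   bbox_outside_weights -> (1, 4*A, 17, 17)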
171 |
172 |
173 | def parse_args():
174 | parser = argparse.ArgumentParser()
175 | parser.add_argument('--gpu_id', default=0, help='GPU ID to use, e.g. \'0\'', type=int)
176 |
177 | return parser.parse_args()
178 |
179 |
180 | def load_pretrained_weights(net, weight_file_path):
181 | ori_pretrained_dict = torch.load(weight_file_path)
182 | model_dict = net.state_dict()
183 | #pretrained_dict = {k: v for k, v in ori_pretrained_dict.items() if k in model_dict}
184 |
185 | import collections
186 | pretrained_dict = collections.OrderedDict()
187 |
188 | for k, v in ori_pretrained_dict.items():
189 | if k in model_dict and k.startswith('featureExtract'): # Only load the modified AlexNet weights
190 | pretrained_dict[k] = v
191 | # print(k)
192 |
193 | model_dict.update(pretrained_dict)
194 | net.load_state_dict(model_dict)
195 |
196 |
197 | if __name__ == '__main__':
198 | torch.backends.cudnn.enabled = False # temporarily disable cuDNN to work around an "illegal access to memory" error
199 |
200 | args = parse_args()
201 | gpu_id = args.gpu_id
202 | if gpu_id is None:
203 | DEVICE = torch.device('cpu')
204 | else:
205 | DEVICE = torch.device(f'cuda:{gpu_id}')
206 |
207 | z_size = (127, 127)
208 | x_size = (255, 255)
209 | batch_size = num_domains = 50
210 | num_epoches = 100
211 |
212 | loader = dataset.load_data(batch_size, z_size, x_size)['train']
213 |
214 | net = SiamRPNBIG()
215 | net.train().to(DEVICE)
216 | # load_pretrained_weights(net, "./SiamRPNBIG.model")
217 | optimizer = torch.optim.Adam(net.parameters(), weight_decay=0.001, lr=0.001)
218 |
219 | for i_ep in range(num_epoches):
220 | for i_iter, sample in tqdm(enumerate(loader), total=len(loader)):
221 | zs = sample['template'].to(DEVICE)
222 | xs = sample['search_region'].to(DEVICE)
223 | gt_boxes = sample['gt_box'] #.to(DEVICE)
224 |
225 | optimizer.zero_grad()
226 |
227 | net.template(zs)
228 | reg_output, cls_output, _ = net.forward(xs) # of shape (50, 4*5, 17, 17), (50, 2*5, 17, 17)
229 |
230 | feat_h, feat_w = tuple(cls_output.size()[-2:])
231 | assert zs.shape[0] == xs.shape[0] == gt_boxes.shape[0]
232 | total_loss = total_cls_loss = total_reg_loss = 0.0
233 | for i in range(zs.shape[0]):
234 | rpn_labels, \
235 | rpn_bbox_targets, \
236 | rpn_bbox_inside_weights, \
237 | rpn_bbox_outside_weights, \
238 | A \
239 | = gen_anchor_target(cls_output[i].shape[-2:], xs[i].shape[-2:], gt_boxes[i][np.newaxis, :])
240 |
241 | #reg_loss_fn = torch.nn.SmoothL1Loss(reduce=False, size_average=False)
242 | reg_loss_fn = smooth_l1_loss
243 | reg_loss = reg_loss_fn(reg_output[i], torch.from_numpy(rpn_bbox_targets).to(DEVICE), torch.from_numpy(rpn_bbox_inside_weights).to(DEVICE), torch.from_numpy(rpn_bbox_outside_weights).to(DEVICE))
244 |
245 | cls_loss_fn = torch.nn.CrossEntropyLoss(reduce=False, size_average=False, ignore_index=-1) # ignore "don't care" anchors (label -1), which are otherwise out-of-range class indices
246 |
247 | rpn_labels = rpn_labels.reshape(A, feat_h, feat_w) # from (1, 1, A*17, 17) to (A, 17, 17)
248 | logits = cls_output[i].view(A, 2, feat_h, feat_w) # from (2*A, 17, 17) to (A, 2, 17, 17)
249 | cls_loss = cls_loss_fn(logits, torch.from_numpy(rpn_labels).to(DEVICE).long()) # (A, 17, 17)
250 |
251 | mask = np.ones_like(rpn_labels)
252 | mask[np.where(rpn_labels==-1)] = 0 # mask where we 'don't care'
253 |
254 | #import pdb
255 | #pdb.set_trace()
256 |
257 | mask = torch.from_numpy(mask).to(DEVICE)
258 | cls_loss = torch.sum(cls_loss * mask) / torch.sum(mask)
259 |
260 | #print("{} + l * {} = {}".format(cls_loss, reg_loss, cls_loss+cfg.TRAIN.LAMBDA*reg_loss))
261 |
262 | total_cls_loss += cls_loss
263 | total_reg_loss += reg_loss
264 | total_loss += cls_loss + cfg.TRAIN.LAMBDA * reg_loss
265 |
266 | total_loss /= batch_size
267 | total_reg_loss /= batch_size
268 | total_cls_loss /= batch_size
269 | print(f"Epoch{i_ep} Iter{i_iter} --- total_loss: {total_loss:.4f}, cls_loss: {total_cls_loss:.4f}, reg_loss: {total_reg_loss:.4f}")
270 | total_loss.backward()
271 | optimizer.step()
272 |
273 | ######## Save the current model
274 | print("Saving model...")
275 | if not os.path.exists("./output/weights"):
276 | os.makedirs("./output/weights")
277 | torch.save(net.state_dict(), f"./output/weights/dasiam_{i_ep}.pkl")
278 |
279 | print("Training completed.")
280 |
281 |
--------------------------------------------------------------------------------
/code/util_test.py:
--------------------------------------------------------------------------------
1 | #import matplotlib as mpl
2 | #import matplotlib.cbook as cbook
3 | import os
4 | import cv2
5 | import numpy as np
6 | #import matplotlib.pyplot as plt
7 |
8 | import datetime
9 | from config import cfg  # needed for the cfg.ROOT_DIR, cfg.CODE_ROOT_DIR and cfg.TEST.* references below
10 | """
11 | Utility functions for the test phase, version 1.0
12 | """
13 |
14 | def convert_box2bbx(box):
15 | x1, y1, x2, y2 = box
16 | return (x1, y1, x2-x1+1, y2-y1+1)
17 |
18 |
19 | def convert_bbx2box(bbx):
20 | x, y, w, h = bbx
21 | return (x, y, x+w-1, y+h-1)
22 |
23 |
24 | def save_pred_bboxes_v2(pred_tuple_list, seq_name, testing_date):
25 | source_path = '/home/code/xuxiaqing/dataset/OTB100/{}/imgs'.format(seq_name)
26 | saving_path = './output/tracking_res/OTB100/{}/{}/'.format(testing_date, seq_name)
27 |
28 | if not os.path.exists(saving_path):
29 | os.makedirs(saving_path)
30 |
31 | list_file = open(os.path.join(saving_path, 'preds.txt'), 'w')
32 | for index, pred_tuple in enumerate(pred_tuple_list):
33 | pred_bbx, score = pred_tuple
34 | raw_img_name = '%s' % (str(index+1).zfill(4)) + '.jpg'
35 | raw_img_path = os.path.join(source_path, raw_img_name)
36 | frame = cv2.imread(raw_img_path)
37 |
38 | left = int(round(pred_bbx[0]))
39 | top = int(round(pred_bbx[1]))
40 | right = int(round(pred_bbx[0] + pred_bbx[2] - 1))
41 | bottom = int(round(pred_bbx[1] + pred_bbx[3] - 1))
42 |
43 | ##############################################################################
44 | cv2.rectangle(frame, (left, top), (right, bottom), (255, 255, 0), 2)
45 | cv2.putText(frame, str(score), (left, top), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 255), 1, 8)
46 | # cv2.putText(frame, str(thresh_eps), (right, bottom), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 97, 255), 1, 8)
47 | cv2.imwrite(os.path.join(saving_path, raw_img_name), frame)
48 |
49 | entry = str(pred_bbx[0]) + ' ' + str(pred_bbx[1]) + ' ' + str(pred_bbx[2]) + ' ' + str(pred_bbx[3])
50 | list_file.write(entry + '\n')
51 |
52 | list_file.close()
53 | print('\nPredictions of Seq ' + seq_name + ' saved.')
54 |
55 | def save_pred_bboxes(pred_bbx_list, score_list, seq_name):
56 | assert len(pred_bbx_list) == len(score_list), 'length of lists not equal'
57 |
58 | saving_path = cfg.ROOT_DIR + '/output/tracking_res/{}/'.format(cfg.TEST.BENCHMARK_NAME) + seq_name
59 | source_path = '/home/lishen/Experiments/siamese_tracking_net/dataset/test_set/OTB100/' + seq_name + '/imgs'
60 |
61 | if not os.path.exists(saving_path):
62 | os.makedirs(saving_path)
63 |
64 | list_file = open(os.path.join(saving_path, 'preds.txt'), 'w')
65 | for index, pred_bbx in enumerate(pred_bbx_list):
66 | raw_img_name = '%s' % (str(index+1).zfill(4)) + '.jpg'
67 | raw_img_path = os.path.join(source_path, raw_img_name)
68 | frame = cv2.imread(raw_img_path)
69 |
70 | left = int(round(pred_bbx[0]))
71 | top = int(round(pred_bbx[1]))
72 | right = int(round(pred_bbx[0] + pred_bbx[2] - 1))
73 | bottom = int(round(pred_bbx[1] + pred_bbx[3] - 1))
74 |
75 | score = score_list[index]
76 |
77 | ##############################################################################
78 | cv2.rectangle(frame, (left, top), (right, bottom), (255, 0, 0), 2)
79 | cv2.putText(frame, '{}'.format(score), (left, top), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1, 8)
80 | cv2.imwrite(os.path.join(saving_path, raw_img_name), frame)
81 |
82 | entry = str(pred_bbx[0]) + ' ' + str(pred_bbx[1]) + ' ' + str(pred_bbx[2]) + ' ' + str(pred_bbx[3])
83 | list_file.write(entry + '\n')
84 |
85 | list_file.close()
86 | print('\nPredictions of Seq ' + seq_name + ' saved.')
87 |
88 | def save_pred_bboxes_bbxr_exclusive(pred_bbx_list_before_reg, pred_bbx_list, score_list, seq_name):
89 | assert len(pred_bbx_list) == len(score_list), 'length of lists not equal'
90 | assert len(pred_bbx_list) == len(pred_bbx_list_before_reg), 'length of lists not equal'
91 |
92 | saving_path = cfg.ROOT_DIR + '/output/tracking_res/{}/'.format(cfg.TEST.BENCHMARK_NAME) + seq_name
93 | source_path = '/home/lishen/Experiments/siamese_tracking_net/dataset/test_set/Benchmark/' + seq_name + '/imgs'
94 |
95 | if not os.path.exists(saving_path):
96 | os.makedirs(saving_path)
97 |
98 | list_file = open(os.path.join(saving_path, 'preds.txt'), 'w')
99 | for index, pred_bbx in enumerate(pred_bbx_list):
100 | raw_img_name = '%s' % (str(index+1).zfill(4)) + '.jpg'
101 | raw_img_path = os.path.join(source_path, raw_img_name)
102 | frame = cv2.imread(raw_img_path)
103 |
104 | left = int(round(pred_bbx[0]))
105 | top = int(round(pred_bbx[1]))
106 | right = int(round(pred_bbx[0] + pred_bbx[2] - 1))
107 | bottom = int(round(pred_bbx[1] + pred_bbx[3] - 1))
108 |
109 | score = score_list[index]
110 |
111 | ## predicted bbx before regression ##
112 | left_before = int(round(pred_bbx_list_before_reg[index][0]))
113 | top_before = int(round(pred_bbx_list_before_reg[index][1]))
114 | right_before = int(round(pred_bbx_list_before_reg[index][0] + pred_bbx_list_before_reg[index][2] - 1))
115 | bottom_before = int(round(pred_bbx_list_before_reg[index][1] + pred_bbx_list_before_reg[index][3] - 1))
116 | cv2.rectangle(frame, (left_before, top_before), (right_before, bottom_before), (0, 0, 255), 2)
117 |
118 | ##############################################################################
119 | cv2.rectangle(frame, (left, top), (right, bottom), (255, 0, 0), 2)
120 | cv2.putText(frame, '{}'.format(score), (left, top), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1, 8)
121 | cv2.imwrite(os.path.join(saving_path, raw_img_name), frame)
122 |
123 | entry = str(pred_bbx[0]) + ' ' + str(pred_bbx[1]) + ' ' + str(pred_bbx[2]) + ' ' + str(pred_bbx[3])
124 | list_file.write(entry + '\n')
125 |
126 | list_file.close()
127 | print('\nPredictions of Seq ' + seq_name + ' saved.')
128 |
129 |
130 | def parse_gt_entry(gt_entry):
131 | split_gt_entry = gt_entry.split() #','
132 |
133 | left = float(split_gt_entry[0])
134 | top = float(split_gt_entry[1])
135 | width = float(split_gt_entry[2])
136 | height = float(split_gt_entry[3])
137 | return (left, top, width, height)
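# Illustrative example (hypothetical values): a whitespace-separated groundtruth line
# such as "198 214 34 81" parses to (198.0, 214.0, 34.0, 81.0), i.e. (x, y, w, h).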
138 |
139 |
140 | def crop_roi(frame, bbx):
141 | #box = (x1, y1, x2, y2)
142 | box = (int(round(bbx[0])), int(round(bbx[1])), int(round(bbx[0]+bbx[2])), int(round(bbx[1]+bbx[3])))
143 | return frame[box[1]:box[3], box[0]:box[2], :]
144 |
145 |
146 | def crop_and_save(seq_name, raw_img, idx_frame, samples, type_str):
147 | root_dir = cfg.CODE_ROOT_DIR + '/output/finetuning_data/{}'.format(cfg.TEST.BENCHMARK_NAME)
148 |
149 | tar_dir = root_dir + '/' + seq_name + '/' + str(idx_frame) + '/' + type_str
150 | if not os.path.exists(tar_dir):
151 | os.makedirs(tar_dir)
152 |
153 | for idx in range(samples.shape[0]):
154 | bbx_sample = samples[idx, :]
155 | box = (int(round(bbx_sample[0])), int(round(bbx_sample[1])), int(round(bbx_sample[0]+bbx_sample[2]-1)), int(round(bbx_sample[1]+bbx_sample[3]-1)))
156 | patch = raw_img[box[1]:box[3], box[0]:box[2], :]
157 | path_patch = tar_dir + '/' + str(idx+1) + '.jpg'
158 | cv2.imwrite(path_patch, patch)
159 |
160 |
161 | def sub_gen(gt, raw_img_size):
162 | sub_pos_samples = np.zeros((25, 4), dtype=np.float32)
163 | index = 0
164 |
165 | right_img = raw_img_size[1]
166 | bottom_img = raw_img_size[0]
167 |
168 | for dx in np.arange(-2, 3):
169 | for dy in np.arange(-2, 3):
170 | '''determine a new bounding box'''
171 | left = gt[0] + dx
172 | top = gt[1] + dy
173 | width = gt[2] + np.abs(dx)
174 | height = gt[3] + np.abs(dy)
175 |
176 | '''in case it lies beyond the boundary'''
177 | left = min(right_img, max(0, left)) # 0 <= left <= right_img
178 | top = min(bottom_img, max(0, top))
179 |
180 | right = left + width
181 | right = min(right_img, max(0, right))
182 |
183 | bottom = top + height
184 | bottom = min(bottom_img, max(0, bottom))
185 |
186 | width = right - left
187 | height = bottom - top
188 |
189 | sub_pos_samples[index, :] = np.array([[left, top, width, height]])
190 | index += 1
191 |
192 | return sub_pos_samples
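# Illustrative note: the double loop above enumerates the 5x5 grid of integer offsets
# (dx, dy) in [-2, 2] x [-2, 2], so sub_gen returns exactly 25 perturbed boxes.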
193 |
194 |
195 | def gen_positive_samples(gt, raw_img_size):
196 | '''Generate 50 positive samples by applying pixel-level shifts to the ground-truth box'''
197 |
198 | #Generate the first 25 positives
199 | first_sub_pos_samples = sub_gen(gt, raw_img_size)
200 |
201 | #Generate the second 25 positives
202 | shifted_gt = (gt[0]-1, gt[1]-1, gt[2]+2, gt[3]+2)
203 | second_sub_pos_samples = sub_gen(shifted_gt, raw_img_size)
204 |
205 | pos_samples = np.vstack((first_sub_pos_samples, second_sub_pos_samples))
206 | return pos_samples
207 |
208 |
209 | def IoU(prop, gt):
210 | x1, y1, w1, h1 = float(prop[0]), float(prop[1]), float(prop[2]), float(prop[3])
211 | x2, y2, w2, h2 = float(gt[0]), float(gt[1]), float(gt[2]), float(gt[3])
212 | startx, endx = min(x1, x2), max(x1 + w1, x2 + w2)
213 | starty, endy = min(y1, y2), max(y1 + h1, y2 + h2)
214 | width = w1 + w2 - (endx - startx)
215 | height = h1 + h2 - (endy - starty)
216 | if width <= 0 or height <= 0:
217 | return 0
218 | else :
219 | area = width * height
220 | return 1.0 * area / (w1*h1 + w2*h2 - area)
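# Worked example (illustrative): IoU((0, 0, 10, 10), (5, 5, 10, 10)) = 25 / 175 ≈ 0.143,
# since the two 10x10 boxes in (x, y, w, h) format overlap on a 5x5 region.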
221 |
222 |
223 | def post_proc(random_scalar):
224 | return max(-1, min(1, 0.5 * random_scalar)) #restrict it within the interval [-1, 1]
225 |
226 |
227 | def gen_samples_box(sampling_type,
228 | gt,
229 | num_samples,
230 | raw_img_size,
231 | base_scalar=1.05,
232 | trans_fac=0.1,
233 | scale_fac=5,
234 | pos_sampling=True,
235 | pos_thresh=0.7,
236 | neg_thresh=0.3,
237 | iou_thresh_ignored=False):
238 |
239 | H = raw_img_size[0]
240 | W = raw_img_size[1]
241 |
242 | #sample = (cx, cy, w, h), where (cx, cy) is the centre coordinate of the gt box
243 | sample = np.array([gt[0]+gt[2]/2, gt[1]+gt[3]/2, gt[2], gt[3]], dtype = np.float32)
244 | samples = np.tile(sample, (num_samples, 1))
245 |
246 | idx = 0
247 | while idx < num_samples:
248 | curr_sample = samples[idx, :].copy()
249 |
250 | if sampling_type == 'gaussian':
251 | lt_increment = trans_fac * round(np.mean(gt[2:4])) * np.array([post_proc(np.random.randn(1,)), post_proc(np.random.randn(1,))])
252 | curr_sample[:2] = curr_sample[:2] + lt_increment.reshape(2,)
253 |
254 | randn_vec = np.array([post_proc(np.random.randn(1,)), post_proc(np.random.randn(1,))])
255 | wh_factor = base_scalar ** (scale_fac * randn_vec)
256 | curr_sample[2:] = curr_sample[2:] * wh_factor.reshape(2,)
257 |
258 | elif sampling_type == 'uniform': #uniform distribution within a search area sr_ratio times the size of the bbx
259 | sr_ratio = 3.5 #cfg.TEST.UNIFORM_SAMPLING_RANGE_RATIO #twice or 2.5 times???
260 |
261 | randn_vec = np.array([post_proc(np.random.randn(1,)), post_proc(np.random.randn(1,))])
262 | wh_factor = base_scalar ** (scale_fac * randn_vec)
263 | curr_sample[2:] = curr_sample[2:] * wh_factor.reshape(2,)
264 |
265 | cx_bound = (curr_sample[0]-curr_sample[2]*(sr_ratio/2), curr_sample[0]+curr_sample[2]*(sr_ratio/2))
266 | cy_bound = (curr_sample[1]-curr_sample[3]*(sr_ratio/2), curr_sample[1]+curr_sample[3]*(sr_ratio/2))
267 | cx = (cx_bound[1] - cx_bound[0]) * np.random.random_sample() + cx_bound[0]
268 | cy = (cy_bound[1] - cy_bound[0]) * np.random.random_sample() + cy_bound[0]
269 |
270 | curr_sample[0] = cx
271 | curr_sample[1] = cy
272 |
273 | elif sampling_type == 'whole': #uniform distribution within the whole image
274 | randn_vec = np.array([post_proc(np.random.randn(1,)), post_proc(np.random.randn(1,))])
275 | wh_factor = base_scalar ** (scale_fac * randn_vec)
276 | curr_sample[2:] = curr_sample[2:] * wh_factor.reshape(2,)
277 |
278 | w = curr_sample[2]
279 | h = curr_sample[3]
280 | curr_sample[0] = (W - w) * np.random.random_sample() + w / 2.0
281 | curr_sample[1] = (H - h) * np.random.random_sample() + h / 2.0
282 |
283 | '''In case that samples experience abrupt scaling variation...''' ##########
284 | curr_sample[2] = max(5, min(W-5, curr_sample[2])) #w max(gt[2]/5.0, min(gt[2]*5.0, curr_sample[2]))
285 | curr_sample[3] = max(5, min(H-5, curr_sample[3])) #h max(gt[3]/5.0, min(gt[2]*5.0, curr_sample[3]))
286 |
287 | half_w, half_h = curr_sample[2]/2.0, curr_sample[3]/2.0
288 |
289 | # bbx_sample = np.array([curr_sample[0]-curr_sample[2]/2, curr_sample[1]-curr_sample[3]/2, curr_sample[2], curr_sample[3]])
290 | # bbx_sample[0] = max(0, min(W-bbx_sample[2]-1, bbx_sample[0]))
291 | # bbx_sample[1] = max(0, min(H-bbx_sample[3]-1, bbx_sample[1]))
292 |
293 | """The centre coordinate of candidate box should lie within the [half_w, W-half_w-1]x[half_h, H-half_h-1]"""
294 | curr_sample[0] = max(half_w, min(W-half_w-1, curr_sample[0]))
295 | curr_sample[1] = max(half_h, min(H-half_h-1, curr_sample[1]))
296 |
297 | x1, y1 = curr_sample[0]-half_w, curr_sample[1]-half_h
298 | x1, y1 = max(0, min(W-1, x1)), max(0, min(H-1, y1)) ### for insurance
299 | x2, y2 = curr_sample[0]+half_w, curr_sample[1]+half_h
300 | x2, y2 = max(0, min(W-1, x2)), max(0, min(H-1, y2)) ### for insurance
301 | box_sample = np.array([x1, y1, x2, y2])
302 |
303 | if iou_thresh_ignored: # this is exclusive for sampling candidates during online tracking
304 | samples[idx, :] = box_sample
305 | idx += 1
306 | continue
307 |
308 | overlap_ratio = IoU(convert_box2bbx(box_sample), gt)
309 | if overlap_ratio >= pos_thresh and pos_sampling: #if positive sampling is being performed and its overlapping ratio >= 0.7
310 | samples[idx, :] = box_sample
311 | idx += 1
312 | elif overlap_ratio < neg_thresh and not pos_sampling: #if negative sampling is being performed and its overlapping ratio < 0.3
313 | samples[idx, :] = box_sample
314 | idx += 1
315 |
316 | return samples
317 |
318 |
319 | def gen_samples(sampling_type,
320 | gt,
321 | num_samples,
322 | raw_img_size,
323 | base_scalar=1.05,
324 | trans_fac=0.1,
325 | scale_fac=5,
326 | pos_sampling=True,
327 | pos_thresh=0.7,
328 | neg_thresh=0.3,
329 | iou_thresh_ignored=False):
330 |
331 | H = raw_img_size[0]
332 | W = raw_img_size[1]
333 |
334 | #sample = (cx, cy, w, h), where (cx, cy) is the centre coordinate of the gt box
335 | sample = np.array([gt[0]+gt[2]/2, gt[1]+gt[3]/2, gt[2], gt[3]], dtype = np.float32)
336 | samples = np.tile(sample, (num_samples, 1))
337 |
338 | idx = 0
339 | while idx < num_samples:
340 | curr_sample = samples[idx, :].copy()
341 |
342 | if sampling_type == 'gaussian':
343 | lt_increment = trans_fac * round(np.mean(gt[2:4])) * np.array([post_proc(np.random.randn(1,)), post_proc(np.random.randn(1,))])
344 | curr_sample[:2] = curr_sample[:2] + lt_increment.reshape(2,)
345 |
346 | randn_vec = np.array([post_proc(np.random.randn(1,)), post_proc(np.random.randn(1,))])
347 | wh_factor = base_scalar ** (scale_fac * randn_vec)
348 | curr_sample[2:] = curr_sample[2:] * wh_factor.reshape(2,)
349 |
350 | elif sampling_type == 'uniform': #uniform distribution within a search area sr_ratio times the size of the bbx
351 | sr_ratio = 3.5 #cfg.TEST.UNIFORM_SAMPLING_RANGE_RATIO #twice or 2.5 times???
352 |
353 | randn_vec = np.array([post_proc(np.random.randn(1,)), post_proc(np.random.randn(1,))])
354 | wh_factor = base_scalar ** (scale_fac * randn_vec)
355 | curr_sample[2:] = curr_sample[2:] * wh_factor.reshape(2,)
356 |
357 | cx_bound = (curr_sample[0]-curr_sample[2]*(sr_ratio/2), curr_sample[0]+curr_sample[2]*(sr_ratio/2))
358 | cy_bound = (curr_sample[1]-curr_sample[3]*(sr_ratio/2), curr_sample[1]+curr_sample[3]*(sr_ratio/2))
359 | cx = (cx_bound[1] - cx_bound[0]) * np.random.random_sample() + cx_bound[0]
360 | cy = (cy_bound[1] - cy_bound[0]) * np.random.random_sample() + cy_bound[0]
361 |
362 | curr_sample[0] = cx
363 | curr_sample[1] = cy
364 |
365 | elif sampling_type == 'whole': #uniform distribution within the whole image
366 | randn_vec = np.array([post_proc(np.random.randn(1,)), post_proc(np.random.randn(1,))])
367 | wh_factor = base_scalar ** (scale_fac * randn_vec)
368 | curr_sample[2:] = curr_sample[2:] * wh_factor.reshape(2,)
369 |
370 | w = curr_sample[2]
371 | h = curr_sample[3]
372 | curr_sample[0] = (W - w) * np.random.random_sample() + w / 2.0
373 | curr_sample[1] = (H - h) * np.random.random_sample() + h / 2.0
374 |
375 | '''In case that samples experience abrupt scaling variation...''' ##########
376 | curr_sample[2] = max(5, min(W-5, curr_sample[2])) #w max(gt[2]/5.0, min(gt[2]*5.0, curr_sample[2]))
377 | curr_sample[3] = max(5, min(H-5, curr_sample[3])) #h max(gt[3]/5.0, min(gt[2]*5.0, curr_sample[3]))
378 |
379 | half_w, half_h = curr_sample[2]/2.0, curr_sample[3]/2.0
380 |
381 | # bbx_sample = np.array([curr_sample[0]-curr_sample[2]/2, curr_sample[1]-curr_sample[3]/2, curr_sample[2], curr_sample[3]])
382 | # bbx_sample[0] = max(0, min(W-bbx_sample[2]-1, bbx_sample[0]))
383 | # bbx_sample[1] = max(0, min(H-bbx_sample[3]-1, bbx_sample[1]))
384 |
385 | """The centre coordinate of candidate box should lie within the [half_w, W-half_w-1]x[half_h, H-half_h-1]"""
386 | curr_sample[0] = max(half_w, min(W-half_w-1, curr_sample[0]))
387 | curr_sample[1] = max(half_h, min(H-half_h-1, curr_sample[1]))
388 |
389 | x1, y1 = curr_sample[0]-half_w, curr_sample[1]-half_h
390 | x1, y1 = max(0, min(W-1, x1)), max(0, min(H-1, y1)) ### for insurance
391 | x2, y2 = curr_sample[0]+half_w, curr_sample[1]+half_h
392 | x2, y2 = max(0, min(W-1, x2)), max(0, min(H-1, y2)) ### for insurance
393 | bbx_sample = np.array([x1, y1, x2-x1+1, y2-y1+1])
394 |
395 | if iou_thresh_ignored: # this is exclusive for sampling candidates during online tracking
396 | samples[idx, :] = bbx_sample
397 | idx += 1
398 | continue
399 |
400 | overlap_ratio = IoU(bbx_sample, gt)
401 | if overlap_ratio >= pos_thresh and pos_sampling: #if positive sampling is being performed and its overlapping ratio >= 0.7
402 | samples[idx, :] = bbx_sample
403 | idx += 1
404 | elif overlap_ratio < neg_thresh and not pos_sampling: #if negative sampling is being performed and its overlapping ratio < 0.3
405 | samples[idx, :] = bbx_sample
406 | idx += 1
407 |
408 | return samples
409 |
410 |
411 | def gen_negative_samples_polar_radius(num_samples, gt, raw_img_size):
412 | """This function will generate num_samples negative samples using polar-radius based method"""
413 | frame_height = raw_img_size[0]
414 | frame_width = raw_img_size[1]
415 |
416 | theta_list = np.linspace(0, 2 * np.pi, 60)
417 |
418 | l_x, t_y, w, h = gt[0], gt[1], gt[2], gt[3]
419 |
420 | r_start = 0.2 * np.sqrt(w ** 2 + h ** 2)
421 | r_end = 0.5 * np.sqrt(w ** 2 + h ** 2)
422 | r_list = np.linspace(r_start, r_end, 10)
423 |
424 | c_x, c_y = l_x+w/2, t_y+h/2
425 |
426 | sample_cnt = 0
427 | sample_list = np.zeros((0, 4), dtype=np.float32)
428 |
429 | iter_cnt = 0
430 | while sample_cnt < num_samples:
431 | iter_cnt += 1
432 | if iter_cnt > 3: break
433 |
434 | for theta in theta_list:
435 | if sample_cnt >= num_samples: break
436 |
437 | angle_eps = np.pi/9
438 | if np.abs(theta) <= angle_eps \
439 | or np.abs(theta-np.pi/2) <= angle_eps \
440 | or np.abs(theta-np.pi) <= angle_eps \
441 | or np.abs(theta-1.5*np.pi) <= angle_eps \
442 | or np.abs(theta-2*np.pi) <= angle_eps: continue
443 |
444 | for r in r_list:
445 | if sample_cnt >= num_samples: break
446 |
447 | c_x__, c_y__ = c_x + r * np.cos(theta), c_y - r * np.sin(theta)
448 | if theta >= 0 and theta < np.pi/2: #theta in Region I
449 | h__ = 2.0 * (c_y - c_y__ + h / 2.0)
450 | w__ = 2.0 * (c_x__ - c_x + w / 2.0)
451 |
452 | elif theta >= np.pi/2 and theta < np.pi: #theta in Region II
453 | h__ = 2.0 * (c_y - c_y__ + h / 2.0)
454 | w__ = 2.0 * (c_x - c_x__ + w / 2.0)
455 |
456 | elif theta >= np.pi and theta < 1.5 * np.pi: #theta in Region III
457 | h__ = 2.0 * (c_y__ - c_y + h / 2.0)
458 | w__ = 2.0 * (c_x - c_x__ + w / 2.0)
459 |
460 | else: #theta in Region IV
461 | h__ = 2.0 * (c_y__ - c_y + h / 2.0)
462 | w__ = 2.0 * (c_x__ - c_x + w / 2.0)
463 |
464 | l_x__ = c_x__ - w__ / 2.0
465 | t_y__ = c_y__ - h__ / 2.0
466 |
467 | r_x__ = l_x__ + w__ - 1
468 | b_y__ = t_y__ + h__ - 1
469 |
470 | l_x__ = max(0, l_x__)
471 | t_y__ = max(0, t_y__)
472 | r_x__ = min(r_x__, frame_width - 1)
473 | b_y__ = min(b_y__, frame_height - 1)
474 |
475 | w__ = r_x__ - l_x__ + 1
476 | h__ = b_y__ - t_y__ + 1
477 |
478 | bbx_sample = np.array([l_x__, t_y__, w__, h__])
479 | overlap_ratio = IoU(bbx_sample, gt)
480 | #print 'overlap_ratio: {}'.format(overlap_ratio)
481 |
482 | if overlap_ratio <= 0.6:
483 | sample_list = np.vstack((sample_list, bbx_sample.reshape(1, 4)))
484 | sample_cnt += 1
485 |
486 | return sample_list
487 |
488 |
489 | def display(frame, saving_path, fname):
490 | saving_dir = saving_root + '/' + saving_path # NOTE: saving_root is not defined in this module, and plt requires the matplotlib imports commented out above
491 | if not os.path.exists(saving_dir):
492 | os.makedirs(saving_dir)
493 | plt.imsave(saving_dir + '/' + fname, frame)
494 |
495 | #image_file = cbook.get_sample_data(saving_path + '/' + fname)
496 | #image = plt.imread(image_file)
497 | #plt.imshow(image)
498 | #plt.show()
499 |
500 |
501 | def vis_neg_finetuning_data_pool(seq_name, raw_img, idx_frame, neg_samples_gaussian, neg_samples_uniform, neg_samples_whole, neg_samples_polar_radius):
502 | root_dir = cfg.CODE_ROOT_DIR + '/output/finetuning_data/{}/'.format(cfg.TEST.BENCHMARK_NAME)
503 |
504 | tar_dir = root_dir + seq_name + '/' + str(idx_frame)
505 | if not os.path.exists(tar_dir):
506 | os.makedirs(tar_dir)
507 |
508 | for idx in range(neg_samples_gaussian.shape[0]):
509 | bbx_sample = neg_samples_gaussian[idx, :]
510 | box = (int(round(bbx_sample[0])), int(round(bbx_sample[1])), int(round(bbx_sample[0]+bbx_sample[2]-1)), int(round(bbx_sample[1]+bbx_sample[3]-1)))
511 |
512 | cv2.rectangle(raw_img, (box[0], box[1]), (box[2], box[3]), (255, 0, 0))
513 |
514 | for idx in range(neg_samples_uniform.shape[0]):
515 | bbx_sample = neg_samples_uniform[idx, :]
516 | box = (int(round(bbx_sample[0])), int(round(bbx_sample[1])), int(round(bbx_sample[0]+bbx_sample[2]-1)), int(round(bbx_sample[1]+bbx_sample[3]-1)))
517 |
518 | cv2.rectangle(raw_img, (box[0], box[1]), (box[2], box[3]), (0, 255, 0))
519 |
520 | for idx in range(neg_samples_whole.shape[0]):
521 | bbx_sample = neg_samples_whole[idx, :]
522 | box = (int(round(bbx_sample[0])), int(round(bbx_sample[1])), int(round(bbx_sample[0]+bbx_sample[2]-1)), int(round(bbx_sample[1]+bbx_sample[3]-1)))
523 |
524 | cv2.rectangle(raw_img, (box[0], box[1]), (box[2], box[3]), (0, 0, 255))
525 |
526 | for idx in range(neg_samples_polar_radius.shape[0]):
527 | bbx_sample = neg_samples_polar_radius[idx, :]
528 | box = (int(round(bbx_sample[0])), int(round(bbx_sample[1])), int(round(bbx_sample[0]+bbx_sample[2]-1)), int(round(bbx_sample[1]+bbx_sample[3]-1)))
529 |
530 | cv2.rectangle(raw_img, (box[0], box[1]), (box[2], box[3]), (255, 255, 255))
531 |
532 | now = datetime.datetime.now()
533 | jpg_name = now.strftime('%Y-%m-%d_%H:%M:%S') + '.jpg'
534 | cv2.imwrite(tar_dir + '/{}'.format(jpg_name), raw_img)
535 |
536 |
537 | def unif_save_visualization(frame_dup, path_seq, index_new_frame, pred_bbx_score, cand_dict_list, index_order):
538 | seq_name = os.path.split(path_seq)[1]
539 | saving_path = seq_name + '/' + str(index_new_frame)
540 | saving_root = cfg.CODE_ROOT_DIR + '/output/experimental/test_phase/{}'.format(cfg.TEST.BENCHMARK_NAME)
541 | saving_dir = saving_root + '/' + saving_path
542 | fname = 'cands.jpg'
543 |
544 | if not os.path.exists(saving_dir):
545 | os.makedirs(saving_dir)
546 | cv2.imwrite(os.path.join(saving_dir, fname), frame_dup)
547 |
548 | if pred_bbx_score >= 0.90:
549 | corr_fobj = open(os.path.join(saving_dir, 'corr.txt'), 'w')
550 | for index in index_order[:20]:
551 | distance = cand_dict_list[index, -1]
552 | prob = cand_dict_list[index, -2]
553 | entry = '{} {} {}'.format(index, prob, distance)
554 | corr_fobj.write(entry + '\n')
555 | corr_fobj.close()
556 |
557 |
558 | def unif_vis_cands_conf_weight(i, index, bbxes_Pk, frame_dup, path_seq, index_new_frame, cand_dict_list, i_dist_prob, i_factor):
559 | bbx_sample = bbxes_Pk[index, :]
560 | box = (int(round(bbx_sample[0])), int(round(bbx_sample[1])), int(round(bbx_sample[0]+bbx_sample[2]-1)), int(round(bbx_sample[1]+bbx_sample[3]-1)))
561 | cv2.rectangle(frame_dup, box[:2], box[2:], (0, 0, 255))
562 | cv2.putText(frame_dup, '{}'.format(index), box[:2], cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1, cv2.LINE_AA)
563 |
564 |
565 | def gen_config_report():
566 | tar_fpath = cfg.ROOT_DIR + '/output/tracking_res/{}/{}'.format(cfg.TEST.BENCHMARK_NAME, cfg.TEST.BENCHMARK_NAME) + '_config_rep.txt'
567 | tar_fobj = open(tar_fpath, 'w')
568 |
569 | for key in cfg.keys():
570 | if key == 'TEST' or key == 'TRAIN':
571 | continue
572 |
573 | info = '__C.' + key + ': ' + cfg[key]
574 | tar_fobj.write(info + '\n')
575 |
576 | for key in cfg.TEST.keys():
577 | info = '__C.TEST.{}: {}'.format(key, cfg.TEST[key])
578 | tar_fobj.write(info + '\n')
579 |
580 | tar_fobj.close()
581 |
582 |
583 | def compute_Gaussian2D_prob(x, mu, cov):
584 | det_cov = cov[0, 0] * cov[1, 1]
585 | normalizer = 2 * np.pi * (det_cov ** 0.5)
586 |
587 | delta = x - mu
588 | mahalanobis_dis = -0.5 * (delta[0] ** 2 / cov[0, 0] + delta[1] ** 2 / cov[1, 1])
589 |
590 | return (1.0 / normalizer) * np.exp(mahalanobis_dis)
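# Note (illustrative, restating the code above): this evaluates an axis-aligned 2D Gaussian density,
#   p(x) = exp(-0.5 * (dx**2 / cov[0,0] + dy**2 / cov[1,1])) / (2 * pi * sqrt(cov[0,0] * cov[1,1]))
# with (dx, dy) = x - mu; off-diagonal covariance terms are ignored.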
591 |
592 |
593 | def compute_Laplacian2D_prob(x, mu, b):
594 | euclidean_dis = np.dot((x - mu), (x - mu)) ** (0.5)
595 | return 1.0 / (2.0 * b) * np.exp(-1.0 * euclidean_dis / b)
596 |
597 |
598 | def determine_displacement(bbx1, bbx2):
599 | cx1 = bbx1[0] + bbx1[2] / 2.0
600 | cy1 = bbx1[1] + bbx1[3] / 2.0
601 | cx2 = bbx2[0] + bbx2[2] / 2.0
602 | cy2 = bbx2[1] + bbx2[3] / 2.0
603 | #return np.abs(cx1 - cx2), np.abs(cy1 - cy2)
604 | return np.sqrt((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2)
605 |
606 |
607 | def func_iou(bb, gtbb):
608 | iou = 0
609 | iw = min(bb[2],gtbb[2]) - max(bb[0],gtbb[0]) + 1
610 | ih = min(bb[3],gtbb[3]) - max(bb[1],gtbb[1]) + 1
611 |
612 | if iw>0 and ih>0:
613 | ua = (bb[2]-bb[0]+1)*(bb[3]-bb[1]+1) + (gtbb[2]-gtbb[0]+1)*(gtbb[3]-gtbb[1]+1) - iw*ih
614 | iou = iw*ih/ua;
615 |
616 | return iou
617 |
618 |
619 | def sample_regions_precompute(rad, nr_ang, stepsize, scales=[0.7071, 1, 1.4142]):
620 | nr_step = int(rad / stepsize)
621 | cos_values = np.cos(np.arange(0,2*np.pi,2*np.pi/nr_ang))
622 | sin_values = np.sin(np.arange(0,2*np.pi,2*np.pi/nr_ang))
623 |
624 | dxdys = np.zeros((2,nr_step*nr_ang+1))
625 | count = 0
626 | for ir in range(1,nr_step+1):
627 | offset = stepsize * ir
628 | for ia in range(1,nr_ang+1):
629 |
630 | dx = offset * cos_values[ia-1]
631 | dy = offset * sin_values[ia-1]
632 | count += 1
633 | dxdys[0, count-1] = dx
634 | dxdys[1, count-1] = dy
635 |
636 | samples = np.zeros((4,(nr_ang*nr_step+1)*len(scales)))
637 | count = 0
638 | jump = nr_step*nr_ang+1
639 | for s in scales:
640 | samples[0:2, count*jump:(count+1)*jump] = dxdys
641 | samples[2, count*jump:(count+1)*jump] = s;
642 | samples[3, count*jump:(count+1)*jump] = s;
643 | count = count + 1
644 |
645 | return samples # dx dy 1*s 1*s
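# Illustrative note: the returned array has shape (4, (nr_ang*nr_step + 1) * len(scales));
# each column is (dx, dy, s, s), i.e. a centre offset plus a width/height scale factor.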
646 |
647 |
648 | def sample_regions(x, y, w, h, im_w, im_h, samples_template):
649 | samples = samples_template.copy()
650 | samples[0,:] += x
651 | samples[1,:] += y
652 | samples[2,:] *= w
653 | samples[3,:] *= h
654 |
655 | samples[2,:] = samples[0,:] + samples[2,:] - 1
656 | samples[3,:] = samples[1,:] + samples[3,:] - 1
657 | samples = np.round(samples)
658 |
659 | flags = np.logical_and(np.logical_and(np.logical_and(samples[0,:]>0, samples[1,:]>0), samples[2,:]
--------------------------------------------------------------------------------
/code/vot.py:
--------------------------------------------------------------------------------
32 | return Polygon([Point(tokens[i],tokens[i+1]) for i in range(0,len(tokens),2)])
33 | return None
34 |
35 | def encode_region(region):
36 | if isinstance(region, Polygon):
37 | return ','.join(['{},{}'.format(p.x,p.y) for p in region.points])
38 | elif isinstance(region, Rectangle):
39 | return '{},{},{},{}'.format(region.x, region.y, region.width, region.height)
40 | else:
41 | return ""
42 |
43 | def convert_region(region, to):
44 |
45 | if to == 'rectangle':
46 |
47 | if isinstance(region, Rectangle):
48 | return copy.copy(region)
49 | elif isinstance(region, Polygon):
50 | top = sys.float_info.max
51 | bottom = sys.float_info.min
52 | left = sys.float_info.max
53 | right = sys.float_info.min
54 |
55 | for point in region.points:
56 | top = min(top, point.y)
57 | bottom = max(bottom, point.y)
58 | left = min(left, point.x)
59 | right = max(right, point.x)
60 |
61 | return Rectangle(left, top, right - left, bottom - top)
62 |
63 | else:
64 | return None
65 | if to == 'polygon':
66 |
67 | if isinstance(region, Rectangle):
68 | points = []
69 | points.append((region.x, region.y))
70 | points.append((region.x + region.width, region.y))
71 | points.append((region.x + region.width, region.y + region.height))
72 | points.append((region.x, region.y + region.height))
73 | return Polygon(points)
74 |
75 | elif isinstance(region, Polygon):
76 | return copy.copy(region)
77 | else:
78 | return None
79 |
80 | return None
81 |
82 | class VOT(object):
83 | """ Base class for Python VOT integration """
84 | def __init__(self, region_format):
85 | """ Constructor
86 |
87 | Args:
88 | region_format: Region format options
89 | """
90 | assert(region_format in ['rectangle', 'polygon'])
91 | if TRAX:
92 | options = trax.server.ServerOptions(region_format, trax.image.PATH)
93 | self._trax = trax.server.Server(options)
94 |
95 | request = self._trax.wait()
96 | assert(request.type == 'initialize')
97 | if request.region.type == 'polygon':
98 | self._region = Polygon([Point(x[0], x[1]) for x in request.region.points])
99 | else:
100 | self._region = Rectangle(request.region.x, request.region.y, request.region.width, request.region.height)
101 | self._image = str(request.image)
102 | self._trax.status(request.region)
103 | else:
104 | self._files = [x.strip('\n') for x in open('images.txt', 'r').readlines()]
105 | self._frame = 0
106 | self._region = convert_region(parse_region(open('region.txt', 'r').readline()), region_format)
107 | self._result = []
108 |
109 | def region(self):
110 | """
111 | Send configuration message to the client and receive the initialization
112 | region and the path of the first image
113 |
114 | Returns:
115 | initialization region
116 | """
117 |
118 | return self._region
119 |
120 | def report(self, region, confidence = 0):
121 | """
122 | Report the tracking results to the client
123 |
124 | Arguments:
125 | region: region for the frame
126 | """
127 | assert(isinstance(region, Rectangle) or isinstance(region, Polygon))
128 | if TRAX:
129 | if isinstance(region, Polygon):
130 | tregion = trax.region.Polygon([(x.x, x.y) for x in region.points])
131 | else:
132 | tregion = trax.region.Rectangle(region.x, region.y, region.width, region.height)
133 | self._trax.status(tregion, {"confidence" : confidence})
134 | else:
135 | self._result.append(region)
136 | self._frame += 1
137 |
138 | def frame(self):
139 | """
140 | Get a frame (image path) from client
141 |
142 | Returns:
143 | absolute path of the image
144 | """
145 | if TRAX:
146 | if hasattr(self, "_image"):
147 | image = str(self._image)
148 | del self._image
149 | return image
150 |
151 | request = self._trax.wait()
152 |
153 | if request.type == 'frame':
154 | return str(request.image)
155 | else:
156 | return None
157 |
158 | else:
159 | if self._frame >= len(self._files):
160 | return None
161 | return self._files[self._frame]
162 |
163 | def quit(self):
164 | if TRAX:
165 | self._trax.quit()
166 | elif hasattr(self, '_result'):
167 | with open('output.txt', 'w') as f:
168 | for r in self._result:
169 | f.write(encode_region(r))
170 | f.write('\n')
171 |
172 | def __del__(self):
173 | self.quit()
174 |
175 |
--------------------------------------------------------------------------------
/code/vot_SiamRPN.py:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # DaSiamRPN
3 | # Licensed under The MIT License
4 | # Written by Qiang Wang (wangqiang2015 at ia.ac.cn)
5 | # --------------------------------------------------------
6 | #!/usr/bin/python
7 |
8 | import vot
9 | from vot import Rectangle
10 | import sys
11 | import cv2 # imread
12 | import torch
13 | import numpy as np
14 | from os.path import realpath, dirname, join
15 |
16 | from net import SiamRPNBIG
17 | from run_SiamRPN import SiamRPN_init, SiamRPN_track
18 | from utils import get_axis_aligned_bbox, cxy_wh_2_rect
19 |
20 | # load net
21 | net_file = join(realpath(dirname(__file__)), 'SiamRPNBIG.model')
22 | net = SiamRPNBIG()
23 | net.load_state_dict(torch.load(net_file))
24 | net.eval().cuda()
25 |
26 | # warm up
27 | for i in range(10):
28 | net.temple(torch.autograd.Variable(torch.FloatTensor(1, 3, 127, 127)).cuda())
29 | net(torch.autograd.Variable(torch.FloatTensor(1, 3, 255, 255)).cuda())
30 |
31 | # start to track
32 | handle = vot.VOT("polygon")
33 | Polygon = handle.region()
34 | cx, cy, w, h = get_axis_aligned_bbox(Polygon)
35 |
36 | image_file = handle.frame()
37 | if not image_file:
38 | sys.exit(0)
39 |
40 | target_pos, target_sz = np.array([cx, cy]), np.array([w, h])
41 | im = cv2.imread(image_file) # HxWxC
42 | state = SiamRPN_init(im, target_pos, target_sz, net) # init tracker
43 | while True:
44 | image_file = handle.frame()
45 | if not image_file:
46 | break
47 | im = cv2.imread(image_file) # HxWxC
48 | state = SiamRPN_track(state, im) # track
49 |
50 | # convert cx, cy, w, h into rect
51 | res = cxy_wh_2_rect(state['target_pos'], state['target_sz'])
52 | handle.report(Rectangle(res[0], res[1], res[2], res[3]))
53 |
54 |
--------------------------------------------------------------------------------
/data/whole_list.txt:
--------------------------------------------------------------------------------
1 | /home/lishen/Experiments/CLSTMT/dataset/test_set/OTB100/sequences/KiteSurf 84
2 |
--------------------------------------------------------------------------------
/ext/roi-align.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MathsShen/DaSiamRPNWithOfflineTraining/c9011aefd0551441ef6ab91c465951556cc86e50/ext/roi-align.png
--------------------------------------------------------------------------------