├── LICENSE ├── README.md ├── lib ├── __init__.py ├── config.py ├── dataloader │ ├── __init__.py │ ├── augmentations.py │ └── dataloader.py ├── model │ ├── __init__.py │ ├── anchors.py │ ├── aploss.py │ └── model.py └── util │ ├── __init__.py │ ├── calc_iou.py │ ├── coco_eval.py │ ├── pascal_voc_eval.py │ ├── utils.py │ └── voc_eval.py ├── test.py ├── test.sh ├── train.py └── train.sh /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 CKA 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # AP-loss 2 | The implementation of “[Towards accurate one-stage object detection with AP-loss](https://arxiv.org/abs/1904.06373)”. 3 | 4 | ### Requirements 5 | - Python 2.7 6 | - PyTorch 1.3+ 7 | - Cuda 8 | 9 | ### Installation 10 | 1. Clone this repo 11 | ``` 12 | git clone https://github.com/cccorn/AP-loss.git 13 | cd AP-loss 14 | ``` 15 | 2. Install the python packages: 16 | ``` 17 | pip install pycocotools 18 | pip install opencv-python 19 | ``` 20 | 3. Create directories: 21 | ``` 22 | mkdir data models results 23 | ``` 24 | 4. Prepare Data. You can use 25 | ``` 26 | ln -s $YOUR_PATH_TO_coco data/coco 27 | ln -s $YOUR_PATH_TO_VOCdevkit data/voc 28 | ``` 29 | The directories should be arranged like: 30 | ``` 31 | ├── data 32 | │ ├── coco 33 | │ │ ├── annotations 34 | │ │ ├── images 35 | │ │ │ ├── train2017 36 | │ │ │ ├── val2017 37 | │ │ │ ├── test-dev2017 38 | │ ├── voc 39 | │ │ ├── VOC2007 40 | │ │ ├── VOC2012 41 | ``` 42 | 5. Prepare the pre-trained models and put them in `models` like: 43 | ``` 44 | ├── models 45 | │ ├── resnet50-pytorch.pth 46 | | ├── resnet101-pytorch.pth 47 | ``` 48 | We use the ResNet-50 and ResNet-101 pre-trained models which are converted from [here](https://github.com/KaimingHe/deep-residual-networks). We also provide the converted pre-trained models at [this link](https://1drv.ms/u/s!AgPNhBALXYVSa1pQCFJNNk6JgaA?e=PqhsWD). 49 | 50 | ### Training 51 | 52 | ``` 53 | bash train.sh 54 | ``` 55 | You can modify the configurations in `lib/config.py` to change the gpu_ids, network depth, image size, etc. 
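For example, to try a quick single-GPU run with the ResNet-50 backbone, the relevant lines in `lib/config.py` could look like the following (illustrative values only, not the settings used in the paper):
```
# lib/config.py (excerpt)
gpu_ids=[0]          # use only the first GPU
batch_size=4         # smaller batch to fit a single GPU
train_img_size=512   # training resolution
depth=50             # ResNet-50 backbone instead of ResNet-101
```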
56 | 57 | ### Testing 58 | 59 | ``` 60 | bash test.sh 61 | ``` 62 | 63 | ### Note 64 | 65 | We release the AP-loss implementation in PyTorch instead of MXNet because of an engineering [issue](https://github.com/apache/incubator-mxnet/issues/8884): the Python custom operator in MXNet does not run in parallel across multiple GPUs. Implementing AP-loss in PyTorch is therefore more practical and trains faster. 66 | 67 | ### Acknowledgements 68 | 69 | - Many thanks to the PyTorch implementation of RetinaNet at [pytorch-retinanet](https://github.com/yhenon/pytorch-retinanet). 70 | 71 | ### Citation 72 | 73 | If you find this repository useful in your research, please consider citing: 74 | ``` 75 | @inproceedings{chen2019towards, 76 | title={Towards accurate one-stage object detection with ap-loss}, 77 | author={Chen, Kean and Li, Jianguo and Lin, Weiyao and See, John and Wang, Ji and Duan, Lingyu and Chen, Zhibo and He, Changwei and Zou, Junni}, 78 | booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, 79 | pages={5119--5127}, 80 | year={2019} 81 | } 82 | ``` 83 | -------------------------------------------------------------------------------- /lib/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /lib/config.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | gpu_ids=[0,1] 4 | batch_size=8 5 | 6 | lr=0.001 7 | 8 | warmup=True 9 | warmup_step=500 10 | warmup_factor=0.33333333 11 | 12 | train_img_size=512 13 | test_img_size=[500,833] #(size of the shorter side, maximum size of the longer side) 14 | 15 | anchor_ratios=np.array([0.5,1.0,2.0]) 16 | anchor_scales=np.array([2**0,2**(1.0/2.0)]) 17 | num_anchors=len(anchor_ratios)*len(anchor_scales) 18 | 19 | pixel_mean = np.array([[[102.9801, 115.9465, 122.7717]]]) 20 | 21 | dataset_coco={'dataset':'coco', 'path':'data/coco', 'train_set':'train2017', 'test_set':'val2017', 'epochs':100, 'lr_step':[60,80]} 22 | dataset_voc={'dataset':'voc', 'path':'data/voc', 'train_set':'2007_trainval+2012_trainval', 'test_set':'2007_test', 'epochs':160, 'lr_step':[110,140]} 23 | 24 | depth=101 25 | -------------------------------------------------------------------------------- /lib/dataloader/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /lib/dataloader/augmentations.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | import random 4 | from .. import config 5 | import torch 6 | 7 | def intersect(box_a, box_b): 8 | max_xy = np.minimum(box_a[:, 2:], box_b[2:]) 9 | min_xy = np.maximum(box_a[:, :2], box_b[:2]) 10 | inter = np.clip((max_xy - min_xy +1), a_min=0, a_max=np.inf) 11 | return inter[:, 0] * inter[:, 1] 12 | 13 | 14 | def jaccard_numpy(box_a, box_b): 15 | """Compute the jaccard overlap of two sets of boxes. The jaccard overlap 16 | is simply the intersection over union of two boxes.
17 | Args: 18 | box_a: Multiple bounding boxes, Shape: [num_boxes,4] 19 | box_b: Single bounding box, Shape: [4] 20 | Return: 21 | jaccard overlap: Shape: [box_a.shape[0], box_a.shape[1]] 22 | """ 23 | inter = intersect(box_a, box_b) 24 | area_a = ((box_a[:, 2]-box_a[:, 0]+1) * 25 | (box_a[:, 3]-box_a[:, 1]+1)) # [A,B] 26 | area_b = ((box_b[2]-box_b[0]+1) * 27 | (box_b[3]-box_b[1]+1)) # [A,B] 28 | union = area_a + area_b - inter 29 | return inter / union # [A,B] 30 | 31 | 32 | class Compose(object): 33 | """Composes several augmentations together. 34 | Args: 35 | transforms (List[Transform]): list of transforms to compose. 36 | Example: 37 | >>> augmentations.Compose([ 38 | >>> transforms.CenterCrop(10), 39 | >>> transforms.ToTensor(), 40 | >>> ]) 41 | """ 42 | 43 | def __init__(self, transforms): 44 | self.transforms = transforms 45 | 46 | def __call__(self, img, boxes=None, labels=None): 47 | for t in self.transforms: 48 | img, boxes, labels = t(img, boxes, labels) 49 | return img, boxes, labels 50 | 51 | 52 | class ConvertFromInts(object): 53 | def __call__(self, image, boxes=None, labels=None): 54 | return image.astype(np.float32), boxes, labels 55 | 56 | 57 | class SubtractMeans(object): 58 | def __init__(self, mean): 59 | self.mean = np.array(mean, dtype=np.float32) 60 | 61 | def __call__(self, image, boxes=None, labels=None): 62 | image = image.astype(np.float32) 63 | image -= self.mean 64 | return image.astype(np.float32), boxes, labels 65 | 66 | 67 | class Resize(object): 68 | def __init__(self, size=300): 69 | self.size = size 70 | self.interp_mode = ( 71 | cv2.INTER_LINEAR, 72 | cv2.INTER_AREA, 73 | cv2.INTER_NEAREST, 74 | cv2.INTER_CUBIC, 75 | cv2.INTER_LANCZOS4) 76 | 77 | def __call__(self, image, boxes=None, labels=None): 78 | 79 | interpolation_mode=random.choice(self.interp_mode) 80 | rows,cols,cns=image.shape 81 | image = cv2.resize(image, (self.size, 82 | self.size), interpolation=interpolation_mode) 83 | rows2,cols2,cns=image.shape 84 | boxes_wh=boxes[:,2:4]-boxes[:,0:2]+1.0 85 | boxes[:,0:2]=boxes[:,0:2]/np.array([cols,rows])*np.array([cols2,rows2]) 86 | boxes[:,2:4]=boxes[:,0:2]+boxes_wh/np.array([cols,rows])*np.array([cols2,rows2])-1.0 87 | return image, boxes, labels 88 | 89 | 90 | class RandomSaturation(object): 91 | def __init__(self, lower=0.5, upper=1.5): 92 | self.lower = lower 93 | self.upper = upper 94 | assert self.upper >= self.lower, "contrast upper must be >= lower." 95 | assert self.lower >= 0, "contrast lower must be non-negative." 
96 | 97 | def __call__(self, image, boxes=None, labels=None): 98 | if random.randint(0,1): 99 | image[:, :, 1] *= random.uniform(self.lower, self.upper) 100 | 101 | return image, boxes, labels 102 | 103 | 104 | class RandomHue(object): 105 | def __init__(self, delta=18.0): 106 | assert delta >= 0.0 and delta <= 360.0 107 | self.delta = delta 108 | 109 | def __call__(self, image, boxes=None, labels=None): 110 | if random.randint(0,1): 111 | image[:, :, 0] += random.uniform(-self.delta, self.delta) 112 | image[:, :, 0][image[:, :, 0] > 360.0] -= 360.0 113 | image[:, :, 0][image[:, :, 0] < 0.0] += 360.0 114 | return image, boxes, labels 115 | 116 | 117 | class RandomLightingNoise(object): 118 | def __init__(self, std=0.1): 119 | self.eigval = np.array([55.46, 4.794, 1.148]) 120 | self.eigvec = np.array([[-0.5675, 0.7192, 0.4009], 121 | [-0.5808, -0.0045, -0.8140], 122 | [-0.5836, -0.6948, 0.4203]]) 123 | self.std=std 124 | 125 | def __call__(self, image, boxes=None, labels=None): 126 | if random.randint(0,1): 127 | alpha = np.array([random.normalvariate(0, self.std) for _ in range(3)]) 128 | rgb = np.dot(self.eigvec * alpha, self.eigval) 129 | image += rgb 130 | return image, boxes, labels 131 | 132 | 133 | class ConvertColor(object): 134 | def __init__(self, current='BGR', transform='HSV'): 135 | self.transform = transform 136 | self.current = current 137 | 138 | def __call__(self, image, boxes=None, labels=None): 139 | if self.current == 'BGR' and self.transform == 'HSV': 140 | image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV) 141 | elif self.current == 'HSV' and self.transform == 'BGR': 142 | image = cv2.cvtColor(image, cv2.COLOR_HSV2BGR) 143 | else: 144 | raise NotImplementedError 145 | return image, boxes, labels 146 | 147 | 148 | class RandomContrast(object): 149 | def __init__(self, lower=0.5, upper=1.5): 150 | self.lower = lower 151 | self.upper = upper 152 | assert self.upper >= self.lower, "contrast upper must be >= lower." 153 | assert self.lower >= 0, "contrast lower must be non-negative." 154 | 155 | # expects float image 156 | def __call__(self, image, boxes=None, labels=None): 157 | if random.randint(0,1): 158 | alpha = random.uniform(self.lower, self.upper) 159 | image *= alpha 160 | return image, boxes, labels 161 | 162 | 163 | class RandomBrightness(object): 164 | def __init__(self, delta=32): 165 | assert delta >= 0.0 166 | assert delta <= 255.0 167 | self.delta = delta 168 | 169 | def __call__(self, image, boxes=None, labels=None): 170 | if random.randint(0,1): 171 | delta = random.uniform(-self.delta, self.delta) 172 | image += delta 173 | return image, boxes, labels 174 | 175 | class RandomSampleCrop(object): 176 | """Crop 177 | Arguments: 178 | img (Image): the image being input during training 179 | boxes (Tensor): the original bounding boxes in pt form 180 | labels (Tensor): the class labels for each bbox 181 | mode (float tuple): the min and max jaccard overlaps 182 | Return: 183 | (img, boxes, classes) 184 | img (Image): the cropped image 185 | boxes (Tensor): the adjusted bounding boxes in pt form 186 | labels (Tensor): the class labels for each bbox 187 | """ 188 | def __init__(self): 189 | self.sample_options = ( 190 | # using entire original input image 191 | None, 192 | # sample a patch s.t. 
MIN jaccard w/ obj in .1,.3,.7,.9 193 | (0.1, None), 194 | (0.3, None), 195 | (0.7, None), 196 | (0.9, None), 197 | # randomly sample a patch 198 | (None, None), 199 | ) 200 | 201 | def __call__(self, image, boxes=None, labels=None): 202 | height, width, _ = image.shape 203 | while True: 204 | # randomly choose a mode 205 | mode = random.choice(self.sample_options) 206 | if mode is None: 207 | return image, boxes, labels 208 | 209 | min_iou, max_iou = mode 210 | if min_iou is None: 211 | min_iou = float('-inf') 212 | if max_iou is None: 213 | max_iou = float('inf') 214 | 215 | # max trials (50) 216 | for _ in range(50): 217 | current_image = image 218 | 219 | w = random.uniform(0.3 * width, width) 220 | h = random.uniform(0.3 * height, height) 221 | 222 | # aspect ratio constraint b/t .5 & 2 223 | if h / w < 0.5 or h / w > 2 or w<1 or h<1: 224 | continue 225 | 226 | left = random.uniform(0,width - w) 227 | top = random.uniform(0,height - h) 228 | 229 | # convert to integer rect x1,y1,x2,y2 230 | rect = np.array([int(left), int(top), int(left+w)-1, int(top+h)-1]) 231 | 232 | # calculate IoU (jaccard overlap) b/t the cropped rect and gt boxes 233 | overlap = jaccard_numpy(boxes, rect) 234 | 235 | # is the min/max overlap constraint satisfied? if not, try again 236 | #if overlap.min() < min_iou and max_iou < overlap.max(): 237 | # continue 238 | if overlap.max()<min_iou or overlap.min()>max_iou: 239 | continue 240 | 241 | # cut the crop from the image 242 | current_image = current_image[rect[1]:rect[3]+1, rect[0]:rect[2]+1, 243 | :] 244 | 245 | # keep overlap with gt box IF center in sampled patch 246 | centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0 247 | 248 | # mask in all gt boxes whose centers are to the right of and below the crop's top-left corner 249 | m1 = (rect[0] < centers[:, 0]) * (rect[1] < centers[:, 1]) 250 | 251 | # mask in all gt boxes whose centers are to the left of and above the crop's bottom-right corner 252 | m2 = (rect[2] > centers[:, 0]) * (rect[3] > centers[:, 1]) 253 | 254 | # keep boxes where both m1 and m2 hold 255 | mask = m1 * m2 256 | 257 | # have any valid boxes?
try again if not 258 | if not mask.any(): 259 | continue 260 | 261 | # take only matching gt boxes 262 | current_boxes = boxes[mask, :].copy() 263 | 264 | # take only matching gt labels 265 | current_labels = labels[mask] 266 | 267 | # should we use the box left and top corner or the crop's 268 | current_boxes[:, :2] = np.maximum(current_boxes[:, :2], 269 | rect[:2]) 270 | # adjust to crop (by substracting crop's left,top) 271 | current_boxes[:, :2] -= rect[:2] 272 | 273 | current_boxes[:, 2:] = np.minimum(current_boxes[:, 2:], 274 | rect[2:]) 275 | # adjust to crop (by substracting crop's left,top) 276 | current_boxes[:, 2:] -= rect[:2] 277 | 278 | return current_image, current_boxes, current_labels 279 | 280 | 281 | class Expand(object): 282 | def __init__(self, mean): 283 | self.mean = mean 284 | 285 | def __call__(self, image, boxes, labels): 286 | if random.randint(0,1): 287 | return image, boxes, labels 288 | 289 | height, width, depth = image.shape 290 | ratio = random.uniform(1, 4) 291 | left = random.uniform(0, int(width*ratio) - width) 292 | top = random.uniform(0, int(height*ratio) - height) 293 | 294 | expand_image = np.zeros( 295 | (int(height*ratio), int(width*ratio), depth), 296 | dtype=image.dtype) 297 | expand_image[:, :, :] = self.mean 298 | expand_image[int(top):int(top + height), 299 | int(left):int(left + width)] = image 300 | image = expand_image 301 | 302 | boxes = boxes.copy() 303 | boxes[:, :2] += (int(left), int(top)) 304 | boxes[:, 2:] += (int(left), int(top)) 305 | 306 | return image, boxes, labels 307 | 308 | 309 | class RandomMirror(object): 310 | def __call__(self, image, boxes, classes): 311 | _, width, _ = image.shape 312 | if random.randint(0,1): 313 | image = image[:, ::-1] 314 | boxes = boxes.copy() 315 | boxes[:, 0::2] = width - boxes[:, 2::-2]-1 316 | return image, boxes, classes 317 | 318 | 319 | class PhotometricDistort(object): 320 | def __init__(self): 321 | self.pd = [ 322 | RandomContrast(), 323 | ConvertColor(transform='HSV'), 324 | RandomSaturation(), 325 | RandomHue(), 326 | ConvertColor(current='HSV', transform='BGR'), 327 | RandomContrast() 328 | ] 329 | self.rand_brightness = RandomBrightness() 330 | self.rand_lighting = RandomLightingNoise() 331 | 332 | def __call__(self, image, boxes, labels): 333 | im = image.copy() 334 | im, boxes, labels = self.rand_brightness(im, boxes, labels) 335 | if random.randint(0,1): 336 | distort = Compose(self.pd[:-1]) 337 | else: 338 | distort = Compose(self.pd[1:]) 339 | im, boxes, labels = distort(im, boxes, labels) 340 | im, boxes, labels = self.rand_lighting(im, boxes, labels) 341 | return im, boxes, labels 342 | 343 | class Augmentation(object): 344 | def __init__(self): 345 | self.mean = config.pixel_mean 346 | self.size = config.train_img_size 347 | self.augment = Compose([ 348 | ConvertFromInts(), 349 | PhotometricDistort(), 350 | Expand(self.mean), 351 | RandomSampleCrop(), 352 | RandomMirror(), 353 | Resize(self.size), 354 | SubtractMeans(self.mean) 355 | ]) 356 | 357 | def __call__(self, sample): 358 | img=sample['img'] 359 | annot=sample['annot'] 360 | boxes=annot[:,:4].copy() 361 | labels=annot[:,4:].copy() 362 | img1,boxes1,labels1=self.augment(img, boxes, labels) 363 | return {'img':torch.from_numpy(img1),'annot':torch.from_numpy(np.hstack([boxes1,labels1]))} 364 | -------------------------------------------------------------------------------- /lib/dataloader/dataloader.py: -------------------------------------------------------------------------------- 1 | from __future__ import 
print_function, division 2 | import os 3 | import torch 4 | import numpy as np 5 | import random 6 | 7 | from torch.utils.data import Dataset, DataLoader 8 | from torchvision import transforms, utils 9 | from torch.utils.data.sampler import Sampler 10 | 11 | from pycocotools.coco import COCO 12 | from ..util.pascal_voc_eval import voc_eval 13 | import cv2 14 | from .. import config 15 | 16 | from augmentations import Augmentation 17 | 18 | class CocoDataset(Dataset): 19 | """Coco dataset.""" 20 | 21 | def __init__(self, root_dir, set_name=['train2017'], transform=None): 22 | """ 23 | Args: 24 | root_dir (string): COCO directory. 25 | transform (callable, optional): Optional transform to be applied 26 | on a sample. 27 | """ 28 | self.root_dir = root_dir 29 | self.set_name = set_name 30 | self.transform = transform 31 | 32 | self.coco={} 33 | for set_name_ii in self.set_name: 34 | prefix_ii='instances' if 'test' not in set_name_ii else 'image_info' 35 | self.coco[set_name_ii] = COCO(os.path.join(self.root_dir, 'annotations', prefix_ii + '_' + set_name_ii + '.json')) 36 | 37 | self.image_ids = [] 38 | for set_name_ii in self.set_name: 39 | self.image_ids.extend([[set_name_ii,ids] for ids in self.coco[set_name_ii].getImgIds()]) 40 | 41 | self.load_classes() 42 | 43 | def load_classes(self): 44 | # load class names (name -> label) 45 | set_name_ii=self.set_name[0] 46 | categories = self.coco[set_name_ii].loadCats(self.coco[set_name_ii].getCatIds()) 47 | categories.sort(key=lambda x: x['id']) 48 | 49 | self.classes = {} 50 | self.coco_labels = {} 51 | self.coco_labels_inverse = {} 52 | for c in categories: 53 | self.coco_labels[len(self.classes)] = c['id'] 54 | self.coco_labels_inverse[c['id']] = len(self.classes) 55 | self.classes[c['name']] = len(self.classes) 56 | 57 | # also load the reverse (label -> name) 58 | self.labels = {} 59 | for key, value in self.classes.items(): 60 | self.labels[value] = key 61 | 62 | def __len__(self): 63 | return len(self.image_ids) 64 | 65 | def __getitem__(self, idx): 66 | 67 | img = self.load_image(idx) 68 | annot = self.load_annotations(idx) 69 | sample = {'img': img, 'annot': annot} 70 | if self.transform: 71 | sample = self.transform(sample) 72 | 73 | return sample 74 | 75 | def load_image(self, image_index): 76 | image_info_info = self.image_ids[image_index] 77 | image_info = self.coco[image_info_info[0]].loadImgs(image_info_info[1])[0] 78 | path = os.path.join(self.root_dir, 'images', image_info_info[0], image_info['file_name']) 79 | 80 | img = cv2.imread(path,cv2.IMREAD_COLOR) 81 | 82 | return img.astype(np.float32) 83 | 84 | def load_annotations(self, image_index): 85 | 86 | image_info_info = self.image_ids[image_index] 87 | annotations_ids = self.coco[image_info_info[0]].getAnnIds(imgIds=image_info_info[1], iscrowd=False) 88 | 89 | loaded_img=self.coco[image_info_info[0]].loadImgs(image_info_info[1]) 90 | 91 | width=loaded_img[0]['width'] 92 | height=loaded_img[0]['height'] 93 | 94 | valid_boxes=[] 95 | coco_annotations = self.coco[image_info_info[0]].loadAnns(annotations_ids) 96 | for idx, a in enumerate(coco_annotations): 97 | 98 | x1,y1=a['bbox'][0],a['bbox'][1] 99 | x2=x1+np.maximum(0.,a['bbox'][2]-1.) 100 | y2=y1+np.maximum(0.,a['bbox'][3]-1.) 
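# clip the corner coordinates to the image bounds: x in [0, width-1], y in [0, height-1]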
101 | 102 | x1=np.minimum(width-1.,np.maximum(0.,x1)) 103 | y1=np.minimum(height-1.,np.maximum(0.,y1)) 104 | x2=np.minimum(width-1.,np.maximum(0.,x2)) 105 | y2=np.minimum(height-1.,np.maximum(0.,y2)) 106 | 107 | label=self.coco_label_to_label(a['category_id']) 108 | 109 | if a['area']>0 and x2>x1 and y2>y1: 110 | valid_boxes.append([x1,y1,x2,y2,label]) 111 | 112 | gt_boxes=np.zeros((len(valid_boxes),5),dtype=np.float32) 113 | for ii,jj in enumerate(valid_boxes): 114 | gt_boxes[ii,:]=jj 115 | 116 | return gt_boxes 117 | 118 | def coco_label_to_label(self, coco_label): 119 | return self.coco_labels_inverse[coco_label] 120 | 121 | def label_to_coco_label(self, label): 122 | return self.coco_labels[label] 123 | 124 | def image_aspect_ratio(self, image_index): 125 | image_info_info = self.image_ids[image_index] 126 | image = self.coco[image_info_info[0]].loadImgs(image_info_info[1])[0] 127 | return float(image['width']) / float(image['height']) 128 | 129 | def num_classes(self): 130 | return 80 131 | 132 | def num_gt(self, image_index): 133 | gt_boxes=self.load_annotations(image_index) 134 | return len(gt_boxes) 135 | 136 | class VocDataset(Dataset): 137 | """Voc dataset.""" 138 | 139 | def __init__(self, root_dir, set_name=['2007_trainval'], transform=None): 140 | """ 141 | Args: 142 | root_dir (string): VOC directory. 143 | transform (callable, optional): Optional transform to be applied 144 | on a sample. 145 | """ 146 | self.set_name = set_name 147 | self.devkit_path = root_dir 148 | self.transform = transform 149 | 150 | self.classes = ['__background__', # always index 0 151 | 'aeroplane', 'bicycle', 'bird', 'boat', 152 | 'bottle', 'bus', 'car', 'cat', 'chair', 153 | 'cow', 'diningtable', 'dog', 'horse', 154 | 'motorbike', 'person', 'pottedplant', 155 | 'sheep', 'sofa', 'train', 'tvmonitor'] 156 | 157 | 158 | self.image_ids = [] 159 | for set_name_ii in self.set_name: 160 | self.image_ids.extend([[set_name_ii,ids] for ids in self.load_image_set_index(set_name_ii)]) 161 | 162 | self.num_images = len(self.image_ids) 163 | print('num_images:' , self.num_images) 164 | 165 | self.config = {'comp_id': 'comp4', 166 | 'use_diff': False, 167 | 'min_size': 2} 168 | 169 | self.image_size=[] 170 | for ii in range(len(self.image_ids)): 171 | height,width,_=self.load_image(ii).shape 172 | self.image_size.append([height,width]) 173 | 174 | def load_image_set_index(self, image_set): 175 | """ 176 | find out which indexes correspond to given image set (train or val) 177 | :return: 178 | """ 179 | year,image_set=image_set.split('_') 180 | data_path=os.path.join(self.devkit_path,'VOC'+year) 181 | image_set_index_file = os.path.join(data_path, 'ImageSets', 'Main', image_set + '.txt') 182 | with open(image_set_index_file) as f: 183 | image_set_index = [x.strip() for x in f.readlines()] 184 | return image_set_index 185 | 186 | def __len__(self): 187 | return len(self.image_ids) 188 | 189 | def __getitem__(self, idx): 190 | 191 | img = self.load_image(idx) 192 | if len(self.set_name)==1 and self.set_name[0]=='2012_test': 193 | annot=np.zeros((0,5),dtype=np.float32) 194 | else: 195 | annot = self.load_annotations(idx) 196 | sample = {'img': img, 'annot': annot} 197 | if self.transform: 198 | sample = self.transform(sample) 199 | 200 | return sample 201 | 202 | def load_image(self, image_index): 203 | 204 | image_set, index = self.image_ids[image_index] 205 | year,image_set=image_set.split('_') 206 | data_path=os.path.join(self.devkit_path,'VOC'+year) 207 | 208 | image_path = os.path.join(data_path, 'JPEGImages', 
index + '.jpg') 209 | img = cv2.imread(image_path,cv2.IMREAD_COLOR) 210 | 211 | return img.astype(np.float32) 212 | 213 | def load_annotations(self, image_index): 214 | """ 215 | for a given index, load image and bounding boxes info from XML file 216 | :param index: index of a specific image 217 | :return: record['boxes', 'gt_classes', 'gt_overlaps', 'flipped'] 218 | """ 219 | import xml.etree.ElementTree as ET 220 | 221 | image_set, index = self.image_ids[image_index] 222 | year,image_set=image_set.split('_') 223 | data_path=os.path.join(self.devkit_path,'VOC'+year) 224 | 225 | height,width=self.image_size[image_index] 226 | 227 | filename = os.path.join(data_path, 'Annotations', index + '.xml') 228 | tree = ET.parse(filename) 229 | objs = tree.findall('object') 230 | if not self.config['use_diff']: 231 | non_diff_objs = [obj for obj in objs if int(obj.find('difficult').text) == 0] 232 | objs = non_diff_objs 233 | num_objs = len(objs) 234 | 235 | valid_boxes=[] 236 | class_to_index = dict(zip(self.classes, range(len(self.classes)))) 237 | # Load object bounding boxes into a data frame. 238 | for ix, obj in enumerate(objs): 239 | bbox = obj.find('bndbox') 240 | # Make pixel indexes 0-based 241 | x1 = float(bbox.find('xmin').text) - 1 242 | y1 = float(bbox.find('ymin').text) - 1 243 | x2 = float(bbox.find('xmax').text) - 1 244 | y2 = float(bbox.find('ymax').text) - 1 245 | 246 | x1=np.minimum(width-1.,np.maximum(0.,x1)) 247 | y1=np.minimum(height-1.,np.maximum(0.,y1)) 248 | x2=np.minimum(width-1.,np.maximum(0.,x2)) 249 | y2=np.minimum(height-1.,np.maximum(0.,y2)) 250 | 251 | cls = class_to_index[obj.find('name').text.lower().strip()] 252 | if x2>x1 and y2>y1: 253 | valid_boxes.append([x1,y1,x2,y2,cls-1]) 254 | 255 | gt_boxes=np.zeros((len(valid_boxes),5),dtype=np.float32) 256 | for ii,jj in enumerate(valid_boxes): 257 | gt_boxes[ii,:]=jj 258 | 259 | return gt_boxes 260 | 261 | def evaluate_detections(self, detections): 262 | """ 263 | top level evaluations 264 | :param detections: result matrix, [bbox, confidence] 265 | :return: None 266 | """ 267 | # make all these folders for results 268 | image_set=self.set_name[0] 269 | year,image_set=image_set.split('_') 270 | 271 | year_folder = os.path.join('results', 'VOC' + year) 272 | if not os.path.exists(year_folder): 273 | os.mkdir(year_folder) 274 | res_file_folder = os.path.join('results', 'VOC' + year, 'Main') 275 | if not os.path.exists(res_file_folder): 276 | os.mkdir(res_file_folder) 277 | 278 | self.write_pascal_results(detections) 279 | self.do_python_eval() 280 | 281 | def get_result_file_template(self): 282 | """ 283 | this is a template 284 | VOCdevkit/results/VOC2007/Main/_det_test_aeroplane.txt 285 | :return: a string template 286 | """ 287 | image_set=self.set_name[0] 288 | year,image_set=image_set.split('_') 289 | 290 | res_file_folder = os.path.join('results', 'VOC' + year, 'Main') 291 | comp_id = self.config['comp_id'] 292 | filename = comp_id + '_det_' + image_set + '_{:s}.txt' 293 | path = os.path.join(res_file_folder, filename) 294 | return path 295 | 296 | def write_pascal_results(self, all_boxes): 297 | """ 298 | write results files in pascal devkit path 299 | :param all_boxes: boxes to be processed [bbox, confidence] 300 | :return: None 301 | """ 302 | for cls_ind, cls in enumerate(self.classes): 303 | if cls == '__background__': 304 | continue 305 | print('Writing {} VOC results file'.format(cls)) 306 | filename = self.get_result_file_template().format(cls) 307 | with open(filename, 'wt') as f: 308 | for im_ind, 
set_index in enumerate(self.image_ids): 309 | _,index=set_index 310 | dets = all_boxes[cls_ind][im_ind] 311 | if len(dets) == 0: 312 | continue 313 | # the VOCdevkit expects 1-based indices 314 | for k in range(dets.shape[0]): 315 | f.write('{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'. 316 | format(index, dets[k, -1], 317 | dets[k, 0] + 1, dets[k, 1] + 1, dets[k, 2] + 1, dets[k, 3] + 1)) 318 | 319 | def do_python_eval(self): 320 | """ 321 | python evaluation wrapper 322 | :return: None 323 | """ 324 | image_set=self.set_name[0] 325 | year,image_set=image_set.split('_') 326 | data_path=os.path.join(self.devkit_path,'VOC'+year) 327 | 328 | annopath = os.path.join(data_path, 'Annotations', '{0!s}.xml') 329 | imageset_file = os.path.join(data_path, 'ImageSets', 'Main', image_set + '.txt') 330 | 331 | aps1 = [] 332 | # The PASCAL VOC metric changed in 2010 333 | use_07_metric = True if int(year) < 2010 else False 334 | print('VOC07 metric? ' + ('Y' if use_07_metric else 'No')) 335 | for cls_ind, cls in enumerate(self.classes): 336 | if cls == '__background__': 337 | continue 338 | filename = self.get_result_file_template().format(cls) 339 | rec, prec, ap = voc_eval(filename, annopath, imageset_file, cls, 340 | ovthresh=0.5, use_07_metric=use_07_metric) 341 | aps1 += [ap] 342 | print('AP for {} = {:.4f}'.format(cls, ap)) 343 | 344 | print('Mean AP = {:.4f}'.format(np.mean(aps1))) 345 | 346 | def image_aspect_ratio(self, image_index): 347 | 348 | height,width=self.image_size[image_index] 349 | 350 | return float(width) / float(height) 351 | 352 | def num_classes(self): 353 | return 20 354 | 355 | def num_gt(self, image_index): 356 | gt_boxes=self.load_annotations(image_index) 357 | return len(gt_boxes) 358 | 359 | 360 | def collater(data): 361 | 362 | imgs = [s['img'] for s in data] 363 | annots = [s['annot'] for s in data] 364 | 365 | widths = [int(s.shape[0]) for s in imgs] 366 | heights = [int(s.shape[1]) for s in imgs] 367 | batch_size = len(imgs) 368 | 369 | max_width = np.array(widths).max() 370 | max_height = np.array(heights).max() 371 | 372 | padded_imgs = torch.zeros(batch_size, max_width, max_height, 3) 373 | 374 | for i in range(batch_size): 375 | img = imgs[i] 376 | padded_imgs[i, :int(img.shape[0]), :int(img.shape[1]), :] = img 377 | 378 | max_num_annots = max(annot.shape[0] for annot in annots) 379 | 380 | if max_num_annots > 0: 381 | 382 | annot_padded = torch.ones((len(annots), max_num_annots, 5)) * -1 383 | 384 | if max_num_annots > 0: 385 | for idx, annot in enumerate(annots): 386 | if annot.shape[0] > 0: 387 | annot_padded[idx, :annot.shape[0], :] = annot 388 | else: 389 | annot_padded = torch.ones((len(annots), 1, 5)) * -1 390 | 391 | 392 | padded_imgs = padded_imgs.permute(0, 3, 1, 2) 393 | 394 | return {'img': padded_imgs, 'annot': annot_padded} 395 | 396 | class Resizer(object): 397 | """Convert ndarrays in sample to Tensors.""" 398 | 399 | def __call__(self, sample): 400 | 401 | min_side=config.test_img_size[0] 402 | max_side=config.test_img_size[1] 403 | 404 | image, annots = sample['img'], sample['annot'] 405 | 406 | rows, cols, cns = image.shape 407 | 408 | smallest_side = min(rows, cols) 409 | 410 | # rescale the image so the smallest side is min_side 411 | scale = float(min_side) / float(smallest_side) 412 | 413 | # check if the largest side is now greater than max_side, which can happen 414 | # when images have a large aspect ratio 415 | largest_side = max(rows, cols) 416 | 417 | if largest_side * scale > max_side: 418 | scale = float(max_side) / float(largest_side) 
419 | 420 | # resize the image with the computed scale 421 | image = cv2.resize(image, None, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR) 422 | rows2, cols2, cns = image.shape 423 | 424 | pad_w = 32 - rows2%32 425 | pad_h = 32 - cols2%32 426 | 427 | if pad_w==32: 428 | pad_w=0 429 | if pad_h==32: 430 | pad_h=0 431 | 432 | new_image = np.zeros((rows2 + pad_w, cols2 + pad_h, cns)).astype(np.float32) 433 | new_image[:rows2, :cols2, :] = image.astype(np.float32) 434 | 435 | annots_wh = annots[:,2:4]-annots[:,0:2]+1.0 436 | annots[:,0:2] = annots[:,0:2]/np.array([cols,rows])*np.array([cols2,rows2]) 437 | annots[:,2:4]=annots[:,0:2]+annots_wh/np.array([cols,rows])*np.array([cols2,rows2])-1.0 438 | 439 | return {'img': torch.from_numpy(new_image), 'annot': torch.from_numpy(annots), 'scale': scale, 'im_info': torch.tensor([[rows2,cols2]])} 440 | 441 | 442 | class Normalizer(object): 443 | 444 | def __init__(self): 445 | self.mean = np.array([[[102.9801, 115.9465, 122.7717]]]) 446 | self.std = np.array([[[0.229, 0.224, 0.225]]]) 447 | 448 | def __call__(self, sample): 449 | 450 | image, annots = sample['img'], sample['annot'] 451 | 452 | return {'img':((image.astype(np.float32)-self.mean)), 'annot': annots} 453 | 454 | 455 | class AspectRatioBasedSampler(Sampler): 456 | 457 | def __init__(self, data_source, batch_size): 458 | self.data_source = data_source 459 | self.batch_size = batch_size 460 | self.groups = self.group_images() 461 | 462 | def __iter__(self): 463 | self.groups = self.group_images() 464 | for group in self.groups: 465 | yield group 466 | 467 | def __len__(self): 468 | return (len(self.data_source) + self.batch_size - 1) // self.batch_size 469 | 470 | def group_images(self): 471 | 472 | img_filter=True 473 | if img_filter: 474 | valid_img=[] 475 | for ii in range(len(self.data_source)): 476 | if self.data_source.num_gt(ii)>0: 477 | valid_img.append(ii) 478 | else: 479 | valid_img=range(len(self.data_source)) 480 | 481 | print('Shuffle') 482 | print('images_num:'+str(len(valid_img))) 483 | 484 | aspect_ratio_grouping=False 485 | if aspect_ratio_grouping: 486 | aspect_ratios=[self.data_source.image_aspect_ratio(ii) for ii in valid_img] 487 | aspect_ratios=np.array(aspect_ratios) 488 | g1=(aspect_ratios>=1) 489 | g2=np.logical_not(g1) 490 | g1_inds=np.where(g1)[0] 491 | g2_inds=np.where(g2)[0] 492 | 493 | pad_g1=self.batch_size-len(g1_inds)%self.batch_size 494 | pad_g2=self.batch_size-len(g2_inds)%self.batch_size 495 | if pad_g1==self.batch_size: 496 | pad_g1=0 497 | if pad_g2==self.batch_size: 498 | pad_g2=0 499 | g1_inds=np.hstack([g1_inds,g1_inds[:pad_g1]]) 500 | g2_inds=np.hstack([g2_inds,g2_inds[:pad_g2]]) 501 | random.shuffle(g1_inds) 502 | random.shuffle(g2_inds) 503 | inds=np.hstack((g1_inds,g2_inds)) 504 | 505 | inds=np.reshape(inds[:],(-1,self.batch_size)) 506 | row_perm=np.arange(inds.shape[0]) 507 | random.shuffle(row_perm) 508 | inds=np.reshape(inds[row_perm,:],(-1,)) 509 | 510 | else: 511 | inds=np.arange(len(valid_img)) 512 | random.shuffle(inds) 513 | pad=self.batch_size-len(inds)%self.batch_size 514 | if pad==self.batch_size: 515 | pad=0 516 | inds=np.hstack([inds,inds[:pad]]) 517 | random.shuffle(inds) 518 | 519 | return [[valid_img[inds[x]] for x in range(i,i+self.batch_size)] for i in range(0,len(inds),self.batch_size)] 520 | -------------------------------------------------------------------------------- /lib/model/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | 
-------------------------------------------------------------------------------- /lib/model/anchors.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import torch.nn as nn 4 | from .. import config 5 | 6 | class Anchors(nn.Module): 7 | def __init__(self, pyramid_levels=None, strides=None, sizes=None, ratios=None, scales=None): 8 | super(Anchors, self).__init__() 9 | 10 | if pyramid_levels is None: 11 | self.pyramid_levels = [3, 4, 5, 6, 7] 12 | if strides is None: 13 | self.strides = [2 ** x for x in self.pyramid_levels] 14 | 15 | feat_stride=[8,16,32,64,128] 16 | self.base_anchors = [generate_anchors(base_size=feat_stride[i], ratios=config.anchor_ratios, scales=config.anchor_scales*4) for i in range(5)] 17 | 18 | def forward(self, image): 19 | 20 | image_shape = image.shape[2:] 21 | image_shape = np.array(image_shape) 22 | image_shapes = [(image_shape + 2 ** x - 1) // (2 ** x) for x in self.pyramid_levels] 23 | 24 | # compute anchors over all pyramid levels 25 | all_anchors = [] 26 | 27 | for idx, p in enumerate(self.pyramid_levels): 28 | anchors = self.base_anchors[idx] 29 | shifted_anchors = shift(image_shapes[idx], self.strides[idx], anchors) 30 | all_anchors.append(torch.from_numpy(np.expand_dims(shifted_anchors,axis=0).astype(np.float64)).cuda()) 31 | 32 | return all_anchors 33 | 34 | 35 | def shift(shape, stride, anchors): 36 | shift_x = (np.arange(0, shape[1])) * stride 37 | shift_y = (np.arange(0, shape[0])) * stride 38 | 39 | shift_x, shift_y = np.meshgrid(shift_x, shift_y) 40 | 41 | shifts = np.vstack(( 42 | shift_x.ravel(), shift_y.ravel(), 43 | shift_x.ravel(), shift_y.ravel() 44 | )).transpose() 45 | 46 | # add A anchors (1, A, 4) to 47 | # cell K shifts (K, 1, 4) to get 48 | # shift anchors (K, A, 4) 49 | # reshape to (K*A, 4) shifted anchors 50 | A = anchors.shape[0] 51 | K = shifts.shape[0] 52 | all_anchors = (anchors.reshape((1, A, 4)) + shifts.reshape((1, K, 4)).transpose((1, 0, 2))) 53 | all_anchors = all_anchors.reshape((K * A, 4)) 54 | 55 | return all_anchors 56 | 57 | def generate_anchors(base_size, ratios, scales): 58 | """ 59 | Generate anchor (reference) windows by enumerating aspect ratios X 60 | scales wrt a reference (0, 0, 15, 15) window. 61 | """ 62 | 63 | base_anchor = np.array([1, 1, base_size, base_size]) - 1 64 | ratio_anchors = _ratio_enum(base_anchor, ratios) 65 | anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales) 66 | for i in range(ratio_anchors.shape[0])]) 67 | return anchors 68 | 69 | 70 | def _whctrs(anchor): 71 | """ 72 | Return width, height, x center, and y center for an anchor (window). 73 | """ 74 | 75 | w = anchor[2] - anchor[0] + 1 76 | h = anchor[3] - anchor[1] + 1 77 | x_ctr = anchor[0] + 0.5 * (w - 1) 78 | y_ctr = anchor[1] + 0.5 * (h - 1) 79 | return w, h, x_ctr, y_ctr 80 | 81 | 82 | def _mkanchors(ws, hs, x_ctr, y_ctr): 83 | """ 84 | Given a vector of widths (ws) and heights (hs) around a center 85 | (x_ctr, y_ctr), output a set of anchors (windows). 86 | """ 87 | 88 | ws = ws[:, np.newaxis] 89 | hs = hs[:, np.newaxis] 90 | anchors = np.hstack((x_ctr - 0.5 * (ws - 1), 91 | y_ctr - 0.5 * (hs - 1), 92 | x_ctr + 0.5 * (ws - 1), 93 | y_ctr + 0.5 * (hs - 1))) 94 | return anchors 95 | 96 | 97 | def _ratio_enum(anchor, ratios): 98 | """ 99 | Enumerate a set of anchors for each aspect ratio wrt an anchor. 
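Ratios are interpreted as height / width; each ratio keeps the base anchor's area fixed, with width = sqrt(area / ratio) and height = width * ratio.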
100 | """ 101 | 102 | w, h, x_ctr, y_ctr = _whctrs(anchor) 103 | size = w * h 104 | size_ratios = size / ratios 105 | ws = (np.sqrt(size_ratios)) 106 | hs = (ws * ratios) 107 | anchors = _mkanchors(ws, hs, x_ctr, y_ctr) 108 | return anchors 109 | 110 | 111 | def _scale_enum(anchor, scales): 112 | """ 113 | Enumerate a set of anchors for each scale wrt an anchor. 114 | """ 115 | 116 | w, h, x_ctr, y_ctr = _whctrs(anchor) 117 | ws = w * scales 118 | hs = h * scales 119 | anchors = _mkanchors(ws, hs, x_ctr, y_ctr) 120 | return anchors 121 | -------------------------------------------------------------------------------- /lib/model/aploss.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import torch.nn as nn 4 | from .. import config 5 | from ..util.calc_iou import calc_iou 6 | 7 | class APLoss(torch.autograd.Function): 8 | @staticmethod 9 | def forward(ctx, classifications, regressions, anchors, annotations): 10 | 11 | batch_size = classifications.shape[0] 12 | regression_losses = [] 13 | 14 | regression_grads=torch.zeros(regressions.shape).cuda() 15 | p_num=torch.zeros(1).cuda() 16 | labels_b=[] 17 | 18 | anchor = anchors[0, :, :].type(torch.cuda.FloatTensor) 19 | 20 | anchor_widths = anchor[:, 2] - anchor[:, 0]+1.0 21 | anchor_heights = anchor[:, 3] - anchor[:, 1]+1.0 22 | anchor_ctr_x = anchor[:, 0] + 0.5 * (anchor_widths-1.0) 23 | anchor_ctr_y = anchor[:, 1] + 0.5 * (anchor_heights-1.0) 24 | 25 | for j in range(batch_size): 26 | 27 | classification = classifications[j, :, :] 28 | regression = regressions[j, :, :] 29 | 30 | bbox_annotation = annotations[j, :, :] 31 | bbox_annotation = bbox_annotation[bbox_annotation[:, 4] != -1] 32 | 33 | if bbox_annotation.shape[0] == 0: 34 | regression_losses.append(torch.tensor(0).float().cuda()) 35 | labels_b.append(torch.zeros(classification.shape).cuda()) 36 | continue 37 | 38 | IoU = calc_iou(anchors[0, :, :], bbox_annotation[:, :4]) # num_anchors x num_annotations 39 | 40 | IoU_max, IoU_argmax = torch.max(IoU, dim=1) # num_anchors x 1 41 | 42 | # compute the loss for classification 43 | targets = torch.ones(classification.shape) * -1 44 | targets = targets.cuda() 45 | 46 | ###### 47 | gt_IoU_max, gt_IoU_argmax = torch.max(IoU, dim=0) 48 | gt_IoU_argmax=torch.where(IoU==gt_IoU_max)[0] 49 | positive_indices = torch.ge(torch.zeros(IoU_max.shape).cuda(),1) 50 | positive_indices[gt_IoU_argmax.long()] = True 51 | ###### 52 | 53 | positive_indices = positive_indices | torch.ge(IoU_max, 0.5) 54 | negative_indices = torch.lt(IoU_max, 0.4) 55 | 56 | p_num+=positive_indices.sum() 57 | 58 | assigned_annotations = bbox_annotation[IoU_argmax, :] 59 | 60 | targets[negative_indices, :] = 0 61 | targets[positive_indices, :] = 0 62 | targets[positive_indices, assigned_annotations[positive_indices, 4].long()] = 1 63 | labels_b.append(targets) 64 | 65 | # compute the loss for regression 66 | if positive_indices.sum() > 0: 67 | 68 | assigned_annotations = assigned_annotations[positive_indices, :] 69 | 70 | anchor_widths_pi = anchor_widths[positive_indices] 71 | anchor_heights_pi = anchor_heights[positive_indices] 72 | anchor_ctr_x_pi = anchor_ctr_x[positive_indices] 73 | anchor_ctr_y_pi = anchor_ctr_y[positive_indices] 74 | 75 | gt_widths = assigned_annotations[:, 2] - assigned_annotations[:, 0]+1.0 76 | gt_heights = assigned_annotations[:, 3] - assigned_annotations[:, 1]+1.0 77 | gt_ctr_x = assigned_annotations[:, 0] + 0.5 * (gt_widths-1.0) 78 | gt_ctr_y = assigned_annotations[:, 1] + 0.5 
* (gt_heights-1.0) 79 | 80 | # clip widths to 1 81 | gt_widths = torch.clamp(gt_widths, min=1) 82 | gt_heights = torch.clamp(gt_heights, min=1) 83 | 84 | targets_dx = (gt_ctr_x - anchor_ctr_x_pi) / anchor_widths_pi 85 | targets_dy = (gt_ctr_y - anchor_ctr_y_pi) / anchor_heights_pi 86 | targets_dw = torch.log(gt_widths / anchor_widths_pi) 87 | targets_dh = torch.log(gt_heights / anchor_heights_pi) 88 | 89 | targets2 = torch.stack((targets_dx, targets_dy, targets_dw, targets_dh)) 90 | targets2 = targets2.t() 91 | 92 | targets2 = targets2/torch.Tensor([[0.1, 0.1, 0.2, 0.2]]).cuda() 93 | 94 | #negative_indices = ~ positive_indices 95 | 96 | regression_diff = regression[positive_indices, :]-targets2 97 | regression_diff_abs= torch.abs(regression_diff) 98 | 99 | regression_loss = torch.where( 100 | torch.le(regression_diff_abs, 1.0 / 1.0), 101 | 0.5 * 1.0 * torch.pow(regression_diff_abs, 2), 102 | regression_diff_abs - 0.5 / 1.0 103 | ) 104 | regression_losses.append(regression_loss.sum()) 105 | 106 | 107 | regression_grad=torch.where( 108 | torch.le(regression_diff_abs,1.0/1.0), 109 | 1.0*regression_diff, 110 | torch.sign(regression_diff)) 111 | regression_grads[j,positive_indices,:]=regression_grad 112 | 113 | else: 114 | regression_losses.append(torch.tensor(0).float().cuda()) 115 | 116 | p_num=torch.clamp(p_num,min=1) 117 | regression_grads/=(4*p_num) 118 | 119 | ########################AP-LOSS########################## 120 | labels_b=torch.stack(labels_b) 121 | classification_grads,classification_losses=AP_loss(classifications,labels_b) 122 | ######################################################### 123 | 124 | ctx.save_for_backward(classification_grads,regression_grads) 125 | return classification_losses, torch.stack(regression_losses).sum(dim=0, keepdim=True)/p_num 126 | 127 | @staticmethod 128 | def backward(ctx, out_grad1, out_grad2): 129 | g1,g2=ctx.saved_tensors 130 | return g1*out_grad1,g2*out_grad2,None,None 131 | 132 | 133 | def AP_loss(logits,targets): 134 | 135 | delta=1.0 136 | 137 | grad=torch.zeros(logits.shape).cuda() 138 | metric=torch.zeros(1).cuda() 139 | 140 | if torch.max(targets)<=0: 141 | return grad, metric 142 | 143 | labels_p=(targets==1) 144 | fg_logits=logits[labels_p] 145 | threshold_logit=torch.min(fg_logits)-delta 146 | 147 | ######## Ignore those negative j that satisfy (L_{ij}=0 for all positive i), to accelerate the AP-loss computation. 
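# (a negative anchor whose score is more than delta below every positive score has a clamped pairwise term of exactly zero, so dropping it changes neither the loss nor the gradient)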
148 | valid_labels_n=((targets==0)&(logits>=threshold_logit)) 149 | valid_bg_logits=logits[valid_labels_n] 150 | valid_bg_grad=torch.zeros(len(valid_bg_logits)).cuda() 151 | ######## 152 | 153 | fg_num=len(fg_logits) 154 | prec=torch.zeros(fg_num).cuda() 155 | order=torch.argsort(fg_logits) 156 | max_prec=0 157 | 158 | for ii in order: 159 | tmp1=fg_logits-fg_logits[ii] 160 | tmp1=torch.clamp(tmp1/(2*delta)+0.5,min=0,max=1) 161 | tmp2=valid_bg_logits-fg_logits[ii] 162 | tmp2=torch.clamp(tmp2/(2*delta)+0.5,min=0,max=1) 163 | a=torch.sum(tmp1)+0.5 164 | b=torch.sum(tmp2) 165 | tmp2/=(a+b) 166 | current_prec=a/(a+b) 167 | if (max_prec<=current_prec): 168 | max_prec=current_prec 169 | else: 170 | tmp2*=((1-max_prec)/(1-current_prec)) 171 | valid_bg_grad+=tmp2 172 | prec[ii]=max_prec 173 | 174 | grad[valid_labels_n]=valid_bg_grad 175 | grad[labels_p]=-(1-prec) 176 | 177 | fg_num=max(fg_num,1) 178 | 179 | grad /= (fg_num) 180 | 181 | metric=torch.sum(prec,dim=0,keepdim=True)/fg_num 182 | 183 | return grad, 1-metric 184 | -------------------------------------------------------------------------------- /lib/model/model.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import torch 3 | import math 4 | import time 5 | from ..util.utils import BasicBlock, Bottleneck, BBoxTransform, ClipBoxes 6 | from anchors import Anchors 7 | from aploss import APLoss 8 | 9 | from .. import config 10 | import torchvision 11 | 12 | class PyramidFeatures(nn.Module): 13 | def __init__(self, C3_size, C4_size, C5_size, feature_size=256): 14 | super(PyramidFeatures, self).__init__() 15 | 16 | # upsample C5 to get P5 from the FPN paper 17 | self.P5_1 = nn.Conv2d(C5_size, feature_size, kernel_size=1, stride=1, padding=0) 18 | self.P5_upsampled = nn.Upsample(scale_factor=2, mode='nearest') 19 | self.P5_2 = nn.Conv2d(feature_size, feature_size, kernel_size=3, stride=1, padding=1) 20 | 21 | # add P5 elementwise to C4 22 | self.P4_1 = nn.Conv2d(C4_size, feature_size, kernel_size=1, stride=1, padding=0) 23 | self.P4_upsampled = nn.Upsample(scale_factor=2, mode='nearest') 24 | self.P4_2 = nn.Conv2d(feature_size, feature_size, kernel_size=3, stride=1, padding=1) 25 | 26 | # add P4 elementwise to C3 27 | self.P3_1 = nn.Conv2d(C3_size, feature_size, kernel_size=1, stride=1, padding=0) 28 | self.P3_2 = nn.Conv2d(feature_size, feature_size, kernel_size=3, stride=1, padding=1) 29 | 30 | # "P6 is obtained via a 3x3 stride-2 conv on C5" 31 | self.P6 = nn.Conv2d(C5_size, feature_size, kernel_size=3, stride=2, padding=1) 32 | 33 | # "P7 is computed by applying ReLU followed by a 3x3 stride-2 conv on P6" 34 | self.P7_1 = nn.ReLU() 35 | self.P7_2 = nn.Conv2d(feature_size, feature_size, kernel_size=3, stride=2, padding=1) 36 | 37 | def forward(self, inputs): 38 | 39 | C3, C4, C5 = inputs 40 | 41 | P5_x = self.P5_1(C5) 42 | P5_upsampled_x = self.P5_upsampled(P5_x) 43 | P5_x = self.P5_2(P5_x) 44 | 45 | P4_x = self.P4_1(C4) 46 | P4_x = P5_upsampled_x + P4_x 47 | P4_upsampled_x = self.P4_upsampled(P4_x) 48 | P4_x = self.P4_2(P4_x) 49 | 50 | P3_x = self.P3_1(C3) 51 | P3_x = P3_x + P4_upsampled_x 52 | P3_x = self.P3_2(P3_x) 53 | 54 | P6_x = self.P6(C5) 55 | 56 | P7_x = self.P7_1(P6_x) 57 | P7_x = self.P7_2(P7_x) 58 | 59 | return [P3_x, P4_x, P5_x, P6_x, P7_x] 60 | 61 | 62 | class RegressionModel(nn.Module): 63 | def __init__(self, num_features_in, num_anchors=9, feature_size=256): 64 | super(RegressionModel, self).__init__() 65 | 66 | self.conv1 = nn.Conv2d(num_features_in, 
feature_size, kernel_size=3, padding=1) 67 | self.act1 = nn.ReLU() 68 | 69 | self.conv2 = nn.Conv2d(feature_size, feature_size, kernel_size=3, padding=1) 70 | self.act2 = nn.ReLU() 71 | 72 | self.conv3 = nn.Conv2d(feature_size, feature_size, kernel_size=3, padding=1) 73 | self.act3 = nn.ReLU() 74 | 75 | self.conv4 = nn.Conv2d(feature_size, feature_size, kernel_size=3, padding=1) 76 | self.act4 = nn.ReLU() 77 | 78 | self.output = nn.Conv2d(feature_size, num_anchors*4, kernel_size=3, padding=1) 79 | 80 | def forward(self, x): 81 | 82 | out = self.conv1(x) 83 | out = self.act1(out) 84 | 85 | out = self.conv2(out) 86 | out = self.act2(out) 87 | 88 | out = self.conv3(out) 89 | out = self.act3(out) 90 | 91 | out = self.conv4(out) 92 | out = self.act4(out) 93 | 94 | out = self.output(out) 95 | 96 | # out is B x C x W x H, with C = 4*num_anchors 97 | out = out.permute(0, 2, 3, 1) 98 | 99 | return out.contiguous().view(out.shape[0], -1, 4) 100 | 101 | class ClassificationModel(nn.Module): 102 | def __init__(self, num_features_in, num_anchors=9, num_classes=80, prior=0.01, feature_size=256): 103 | super(ClassificationModel, self).__init__() 104 | 105 | self.num_classes = num_classes 106 | self.num_anchors = num_anchors 107 | 108 | self.conv1 = nn.Conv2d(num_features_in, feature_size, kernel_size=3, padding=1) 109 | self.act1 = nn.ReLU() 110 | 111 | self.conv2 = nn.Conv2d(feature_size, feature_size, kernel_size=3, padding=1) 112 | self.act2 = nn.ReLU() 113 | 114 | self.conv3 = nn.Conv2d(feature_size, feature_size, kernel_size=3, padding=1) 115 | self.act3 = nn.ReLU() 116 | 117 | self.conv4 = nn.Conv2d(feature_size, feature_size, kernel_size=3, padding=1) 118 | self.act4 = nn.ReLU() 119 | 120 | self.output = nn.Conv2d(feature_size, num_anchors*num_classes, kernel_size=3, padding=1) 121 | #self.output_act = nn.Sigmoid() 122 | 123 | def forward(self, x): 124 | 125 | out = self.conv1(x) 126 | out = self.act1(out) 127 | 128 | out = self.conv2(out) 129 | out = self.act2(out) 130 | 131 | out = self.conv3(out) 132 | out = self.act3(out) 133 | 134 | out = self.conv4(out) 135 | out = self.act4(out) 136 | 137 | out = self.output(out) 138 | #out = self.output_act(out) 139 | 140 | # out is B x C x W x H, with C = n_classes + n_anchors 141 | out1 = out.permute(0, 2, 3, 1) 142 | 143 | batch_size, width, height, channels = out1.shape 144 | 145 | out2 = out1.view(batch_size, width, height, self.num_anchors, self.num_classes) 146 | 147 | return out2.contiguous().view(x.shape[0], -1, self.num_classes) 148 | 149 | class ResNet(nn.Module): 150 | 151 | def __init__(self, num_classes, block, layers, conv1_bias): 152 | self.inplanes = 64 153 | super(ResNet, self).__init__() 154 | self.num_classes=num_classes 155 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=conv1_bias) 156 | 157 | for ii in self.conv1.parameters(): 158 | ii.requires_grad=False 159 | 160 | self.bn1 = nn.BatchNorm2d(64) 161 | self.relu = nn.ReLU(inplace=True) 162 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 163 | self.layer1 = self._make_layer(block, 64, layers[0]) 164 | 165 | for ii in self.layer1.parameters(): 166 | ii.requires_grad=False 167 | 168 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2) 169 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2) 170 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2) 171 | 172 | if block == BasicBlock: 173 | fpn_sizes = [self.layer2[layers[1]-1].conv2.out_channels, self.layer3[layers[2]-1].conv2.out_channels, 
self.layer4[layers[3]-1].conv2.out_channels] 174 | elif block == Bottleneck: 175 | fpn_sizes = [self.layer2[layers[1]-1].conv3.out_channels, self.layer3[layers[2]-1].conv3.out_channels, self.layer4[layers[3]-1].conv3.out_channels] 176 | 177 | self.fpn = PyramidFeatures(fpn_sizes[0], fpn_sizes[1], fpn_sizes[2]) 178 | 179 | self.regressionModel = RegressionModel(256, num_anchors=config.num_anchors) 180 | self.classificationModel = ClassificationModel(256, num_classes=num_classes, num_anchors=config.num_anchors) 181 | 182 | self.anchors = Anchors() 183 | 184 | self.regressBoxes = BBoxTransform() 185 | 186 | self.clipBoxes = ClipBoxes() 187 | 188 | self.aploss= APLoss() 189 | 190 | for m in self.modules(): 191 | if isinstance(m, nn.Conv2d): 192 | m.weight.data.normal_(0,0.01) 193 | if m.bias is not None: 194 | m.bias.data.fill_(0) 195 | elif isinstance(m, nn.BatchNorm2d): 196 | m.weight.data.fill_(1) 197 | m.bias.data.zero_() 198 | 199 | prior = 0.01 200 | 201 | #self.classificationModel.output.weight.data.fill_(0) 202 | self.classificationModel.output.weight.data.normal_(0,0.01) 203 | self.classificationModel.output.bias.data.fill_(-math.log((1.0-prior)/prior)) 204 | 205 | #self.regressionModel.output.weight.data.fill_(0) 206 | self.regressionModel.output.weight.data.normal_(0,0.001) 207 | self.regressionModel.output.bias.data.fill_(0) 208 | 209 | self.freeze_bn() 210 | 211 | def _make_layer(self, block, planes, blocks, stride=1): 212 | downsample = None 213 | if stride != 1 or self.inplanes != planes * block.expansion: 214 | downsample = nn.Sequential( 215 | nn.Conv2d(self.inplanes, planes * block.expansion, 216 | kernel_size=1, stride=stride, bias=False), 217 | nn.BatchNorm2d(planes * block.expansion), 218 | ) 219 | 220 | layers = [] 221 | layers.append(block(self.inplanes, planes, stride, downsample)) 222 | self.inplanes = planes * block.expansion 223 | for i in range(1, blocks): 224 | layers.append(block(self.inplanes, planes)) 225 | 226 | return nn.Sequential(*layers) 227 | 228 | def freeze_bn(self): 229 | '''Freeze BatchNorm layers.''' 230 | for layer in self.modules(): 231 | if isinstance(layer, nn.BatchNorm2d): 232 | layer.eval() 233 | for ii in layer.parameters(): 234 | ii.requires_grad=False 235 | 236 | def forward(self, inputs): 237 | 238 | if self.training: 239 | img_batch, annotations = inputs 240 | else: 241 | img_batch, im_info = inputs 242 | 243 | x = self.conv1(img_batch) 244 | x = self.bn1(x) 245 | x = self.relu(x) 246 | x = self.maxpool(x) 247 | 248 | x1 = self.layer1(x) 249 | x2 = self.layer2(x1) 250 | x3 = self.layer3(x2) 251 | x4 = self.layer4(x3) 252 | 253 | features = self.fpn([x2, x3, x4]) 254 | 255 | regressions = [self.regressionModel(feature) for feature in features] 256 | 257 | classifications = [self.classificationModel(feature) for feature in features] 258 | 259 | anchors = self.anchors(img_batch) 260 | 261 | if self.training: 262 | regressions=torch.cat(regressions, dim=1) 263 | classifications=torch.cat(classifications, dim=1) 264 | anchors=torch.cat(anchors, dim=1) 265 | return self.aploss.apply(classifications, regressions, anchors, annotations) 266 | else: 267 | box_all=[] 268 | cls_all=[] 269 | score_all=[] 270 | for ii, anchor in enumerate(anchors): 271 | 272 | classification=classifications[ii].view(-1) 273 | regression=regressions[ii] 274 | 275 | classification=torch.sigmoid(classification) 276 | 277 | ###filter 278 | num_topk=min(1000,classification.size(0)) 279 | ordered_score,ordered_idx=classification.sort(descending=True) 280 | 
ordered_score=ordered_score[:num_topk] 281 | ordered_idx=ordered_idx[:num_topk] 282 | 283 | if ii<4: 284 | score_th=0.01 285 | else: 286 | score_th=0 287 | keep_idx=(ordered_score>score_th) 288 | ordered_score=ordered_score[keep_idx] 289 | ordered_idx=ordered_idx[keep_idx] 290 | 291 | anchor_idx = ordered_idx // self.num_classes 292 | cls_idx = ordered_idx % self.num_classes 293 | 294 | transformed_anchor = self.regressBoxes(anchor[:,anchor_idx,:], regression[:,anchor_idx,:]) 295 | transformed_anchor = self.clipBoxes(transformed_anchor, im_info[0]) 296 | 297 | box_all.append(transformed_anchor[0]) 298 | cls_all.append(cls_idx) 299 | score_all.append(ordered_score) 300 | 301 | box_all=torch.cat(box_all,dim=0) 302 | cls_all=torch.cat(cls_all,dim=0) 303 | score_all=torch.cat(score_all,dim=0) 304 | 305 | keep=torchvision.ops.boxes.batched_nms(box_all,score_all,cls_all,0.5) 306 | keep=keep[:100] 307 | return [score_all[keep], cls_all[keep], box_all[keep, :]] 308 | 309 | def resnet50(num_classes, pretrained=False): 310 | 311 | model = ResNet(num_classes, Bottleneck, [3, 4, 6, 3], conv1_bias=True) 312 | if pretrained: 313 | model.load_state_dict(torch.load('models/resnet50-pytorch.pth'), strict=False) 314 | return model 315 | 316 | def resnet101(num_classes, pretrained=False): 317 | 318 | model = ResNet(num_classes, Bottleneck, [3, 4, 23, 3], conv1_bias=False) 319 | if pretrained: 320 | model.load_state_dict(torch.load('models/resnet101-pytorch.pth'), strict=False) 321 | return model 322 | -------------------------------------------------------------------------------- /lib/util/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /lib/util/calc_iou.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import torch.nn as nn 4 | 5 | def calc_iou(a, b): 6 | 7 | a=a.type(torch.cuda.DoubleTensor) 8 | b=b.type(torch.cuda.DoubleTensor) 9 | 10 | area = (b[:, 2] - b[:, 0]+1) * (b[:, 3] - b[:, 1]+1) 11 | 12 | iw = torch.min(torch.unsqueeze(a[:, 2], dim=1), b[:, 2]) - torch.max(torch.unsqueeze(a[:, 0], 1), b[:, 0])+1 13 | ih = torch.min(torch.unsqueeze(a[:, 3], dim=1), b[:, 3]) - torch.max(torch.unsqueeze(a[:, 1], 1), b[:, 1])+1 14 | 15 | iw = torch.clamp(iw, min=0) 16 | ih = torch.clamp(ih, min=0) 17 | 18 | ua = torch.unsqueeze((a[:, 2] - a[:, 0]+1) * (a[:, 3] - a[:, 1]+1), dim=1) + area - iw * ih 19 | 20 | #ua = torch.clamp(ua, min=1e-8) 21 | 22 | intersection = iw * ih 23 | 24 | IoU = intersection / ua 25 | 26 | return IoU 27 | -------------------------------------------------------------------------------- /lib/util/coco_eval.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | from pycocotools.coco import COCO 4 | from pycocotools.cocoeval import COCOeval 5 | 6 | import numpy as np 7 | import json 8 | import os 9 | 10 | import torch 11 | 12 | def evaluate_coco(dataset, model, threshold=0.01): 13 | 14 | model.eval() 15 | 16 | with torch.no_grad(): 17 | 18 | # start collecting results 19 | results = [] 20 | image_ids = [] 21 | 22 | for index in range(len(dataset)): 23 | data = dataset[index] 24 | scale = data['scale'] 25 | 26 | # run network 27 | scores, labels, boxes = model([data['img'].permute(2, 0, 1).cuda().float().unsqueeze(dim=0),data['im_info'].cuda()]) 28 | scores = scores.cpu() 29 | labels = labels.cpu() 30 | 
boxes = boxes.cpu() 31 | 32 | # correct boxes for image scale 33 | boxes[:,2]=boxes[:,2]-boxes[:,0]+1 34 | boxes[:,3]=boxes[:,3]-boxes[:,1]+1 35 | boxes /= scale 36 | 37 | if boxes.shape[0] > 0: 38 | 39 | # compute predicted labels and scores 40 | for box_id in range(boxes.shape[0]): 41 | score = float(scores[box_id]) 42 | label = int(labels[box_id]) 43 | box = boxes[box_id, :] 44 | 45 | # append detection for each positively labeled class 46 | image_result = { 47 | 'image_id' : dataset.image_ids[index][1], 48 | 'category_id' : dataset.label_to_coco_label(label), 49 | 'score' : float(score), 50 | 'bbox' : box.tolist(), 51 | } 52 | 53 | # append detection to results 54 | results.append(image_result) 55 | 56 | # append image to list of processed images 57 | image_ids.append(dataset.image_ids[index][1]) 58 | 59 | # print progress 60 | print('{}/{}'.format(index, len(dataset)), end='\r') 61 | 62 | # write output 63 | json.dump(results, open('./results/{}_bbox_results.json'.format(dataset.set_name[0]), 'w'), indent=4) 64 | 65 | if 'test' in dataset.set_name[0]: 66 | return 67 | 68 | # load results in COCO evaluation tool 69 | coco_true = dataset.coco 70 | coco_true = coco_true[list(coco_true.keys())[0]] 71 | coco_pred = coco_true.loadRes('./results/{}_bbox_results.json'.format(dataset.set_name[0])) 72 | 73 | # run COCO evaluation 74 | coco_eval = COCOeval(coco_true, coco_pred, 'bbox') 75 | coco_eval.params.imgIds = image_ids 76 | coco_eval.evaluate() 77 | coco_eval.accumulate() 78 | coco_eval.summarize() 79 | 80 | return 81 | -------------------------------------------------------------------------------- /lib/util/pascal_voc_eval.py: -------------------------------------------------------------------------------- 1 | # Licensed to the Apache Software Foundation (ASF) under one 2 | # or more contributor license agreements. See the NOTICE file 3 | # distributed with this work for additional information 4 | # regarding copyright ownership. The ASF licenses this file 5 | # to you under the Apache License, Version 2.0 (the 6 | # "License"); you may not use this file except in compliance 7 | # with the License. You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, 12 | # software distributed under the License is distributed on an 13 | # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 14 | # KIND, either express or implied. See the License for the 15 | # specific language governing permissions and limitations 16 | # under the License. 
17 | 18 | 19 | import numpy as np 20 | import os 21 | import cPickle 22 | 23 | 24 | def parse_voc_rec(filename): 25 | """ 26 | parse pascal voc record into a dictionary 27 | :param filename: xml file path 28 | :return: list of dict 29 | """ 30 | import xml.etree.ElementTree as ET 31 | tree = ET.parse(filename) 32 | objects = [] 33 | for obj in tree.findall('object'): 34 | obj_dict = dict() 35 | obj_dict['name'] = obj.find('name').text 36 | obj_dict['difficult'] = int(obj.find('difficult').text) 37 | bbox = obj.find('bndbox') 38 | obj_dict['bbox'] = [int(float(bbox.find('xmin').text)), 39 | int(float(bbox.find('ymin').text)), 40 | int(float(bbox.find('xmax').text)), 41 | int(float(bbox.find('ymax').text))] 42 | objects.append(obj_dict) 43 | return objects 44 | 45 | 46 | def voc_ap(rec, prec, use_07_metric=False): 47 | """ 48 | average precision calculations 49 | [precision integrated to recall] 50 | :param rec: recall 51 | :param prec: precision 52 | :param use_07_metric: 2007 metric is 11-recall-point based AP 53 | :return: average precision 54 | """ 55 | if use_07_metric: 56 | ap = 0. 57 | for t in np.arange(0., 1.1, 0.1): 58 | if np.sum(rec >= t) == 0: 59 | p = 0 60 | else: 61 | p = np.max(prec[rec >= t]) 62 | ap += p / 11. 63 | else: 64 | # append sentinel values at both ends 65 | mrec = np.concatenate(([0.], rec, [1.])) 66 | mpre = np.concatenate(([0.], prec, [0.])) 67 | 68 | # compute precision integration ladder 69 | for i in range(mpre.size - 1, 0, -1): 70 | mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i]) 71 | 72 | # look for recall value changes 73 | i = np.where(mrec[1:] != mrec[:-1])[0] 74 | 75 | # sum (\delta recall) * prec 76 | ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) 77 | return ap 78 | 79 | 80 | def voc_eval(detpath, annopath, imageset_file, classname, ovthresh=0.5, use_07_metric=False): 81 | """ 82 | pascal voc evaluation 83 | :param detpath: detection results detpath.format(classname) 84 | :param annopath: annotations annopath.format(classname) 85 | :param imageset_file: text file containing list of images 86 | :param classname: category name 87 | :param annocache: caching annotations 88 | :param ovthresh: overlap threshold 89 | :param use_07_metric: whether to use voc07's 11 point ap computation 90 | :return: rec, prec, ap 91 | """ 92 | with open(imageset_file, 'r') as f: 93 | lines = f.readlines() 94 | image_filenames = [x.strip() for x in lines] 95 | 96 | # load annotations from cache 97 | recs = {} 98 | for ind, image_filename in enumerate(image_filenames): 99 | recs[image_filename] = parse_voc_rec(annopath.format(image_filename)) 100 | 101 | # extract objects in :param classname: 102 | class_recs = {} 103 | npos = 0 104 | for image_filename in image_filenames: 105 | objects = [obj for obj in recs[image_filename] if obj['name'] == classname] 106 | bbox = np.array([x['bbox'] for x in objects]) 107 | difficult = np.array([x['difficult'] for x in objects]).astype(np.bool) 108 | det = [False] * len(objects) # stand for detected 109 | npos = npos + sum(~difficult) 110 | class_recs[image_filename] = {'bbox': bbox, 111 | 'difficult': difficult, 112 | 'det': det} 113 | 114 | # read detections 115 | detfile = detpath.format(classname) 116 | with open(detfile, 'r') as f: 117 | lines = f.readlines() 118 | 119 | splitlines = [x.strip().split(' ') for x in lines] 120 | image_ids = [x[0] for x in splitlines] 121 | confidence = np.array([float(x[1]) for x in splitlines]) 122 | bbox = np.array([[float(z) for z in x[2:]] for x in splitlines]) 123 | 124 | # sort by confidence 
125 | if bbox.shape[0] > 0: 126 | sorted_inds = np.argsort(-confidence) 127 | sorted_scores = np.sort(-confidence) 128 | bbox = bbox[sorted_inds, :] 129 | image_ids = [image_ids[x] for x in sorted_inds] 130 | 131 | # go down detections and mark true positives and false positives 132 | nd = len(image_ids) 133 | tp = np.zeros(nd) 134 | fp = np.zeros(nd) 135 | for d in range(nd): 136 | r = class_recs[image_ids[d]] 137 | bb = bbox[d, :].astype(float) 138 | ovmax = -np.inf 139 | bbgt = r['bbox'].astype(float) 140 | 141 | if bbgt.size > 0: 142 | # compute overlaps 143 | # intersection 144 | ixmin = np.maximum(bbgt[:, 0], bb[0]) 145 | iymin = np.maximum(bbgt[:, 1], bb[1]) 146 | ixmax = np.minimum(bbgt[:, 2], bb[2]) 147 | iymax = np.minimum(bbgt[:, 3], bb[3]) 148 | iw = np.maximum(ixmax - ixmin + 1., 0.) 149 | ih = np.maximum(iymax - iymin + 1., 0.) 150 | inters = iw * ih 151 | 152 | # union 153 | uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) + 154 | (bbgt[:, 2] - bbgt[:, 0] + 1.) * 155 | (bbgt[:, 3] - bbgt[:, 1] + 1.) - inters) 156 | 157 | overlaps = inters / uni 158 | ovmax = np.max(overlaps) 159 | jmax = np.argmax(overlaps) 160 | 161 | if ovmax > ovthresh: 162 | if not r['difficult'][jmax]: 163 | if not r['det'][jmax]: 164 | tp[d] = 1. 165 | r['det'][jmax] = 1 166 | else: 167 | fp[d] = 1. 168 | else: 169 | fp[d] = 1. 170 | 171 | # compute precision recall 172 | fp = np.cumsum(fp) 173 | tp = np.cumsum(tp) 174 | rec = tp / float(npos) 175 | # avoid division by zero in case first detection matches a difficult ground truth 176 | prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps) 177 | ap = voc_ap(rec, prec, use_07_metric) 178 | 179 | return rec, prec, ap 180 | -------------------------------------------------------------------------------- /lib/util/utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import numpy as np 4 | 5 | def conv3x3(in_planes, out_planes, stride=1): 6 | """3x3 convolution with padding""" 7 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 8 | padding=1, bias=False) 9 | 10 | class BasicBlock(nn.Module): 11 | expansion = 1 12 | 13 | def __init__(self, inplanes, planes, stride=1, downsample=None): 14 | super(BasicBlock, self).__init__() 15 | self.conv1 = conv3x3(inplanes, planes, stride) 16 | self.bn1 = nn.BatchNorm2d(planes) 17 | self.relu = nn.ReLU(inplace=True) 18 | self.conv2 = conv3x3(planes, planes) 19 | self.bn2 = nn.BatchNorm2d(planes) 20 | self.downsample = downsample 21 | self.stride = stride 22 | 23 | def forward(self, x): 24 | residual = x 25 | 26 | out = self.conv1(x) 27 | out = self.bn1(out) 28 | out = self.relu(out) 29 | 30 | out = self.conv2(out) 31 | out = self.bn2(out) 32 | 33 | if self.downsample is not None: 34 | residual = self.downsample(x) 35 | 36 | out += residual 37 | out = self.relu(out) 38 | 39 | return out 40 | 41 | 42 | class Bottleneck(nn.Module): 43 | expansion = 4 44 | 45 | def __init__(self, inplanes, planes, stride=1, downsample=None): 46 | super(Bottleneck, self).__init__() 47 | #self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) 48 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, stride=stride, bias=False) 49 | self.bn1 = nn.BatchNorm2d(planes) 50 | #self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, 51 | # padding=1, bias=False) 52 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, padding=1, bias=False) 53 | self.bn2 = nn.BatchNorm2d(planes) 54 | self.conv3 = 
nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False) 55 | self.bn3 = nn.BatchNorm2d(planes * 4) 56 | self.relu = nn.ReLU(inplace=True) 57 | self.downsample = downsample 58 | self.stride = stride 59 | 60 | def forward(self, x): 61 | residual = x 62 | 63 | out = self.conv1(x) 64 | out = self.bn1(out) 65 | out = self.relu(out) 66 | 67 | out = self.conv2(out) 68 | out = self.bn2(out) 69 | out = self.relu(out) 70 | 71 | out = self.conv3(out) 72 | out = self.bn3(out) 73 | 74 | if self.downsample is not None: 75 | residual = self.downsample(x) 76 | 77 | out += residual 78 | out = self.relu(out) 79 | 80 | return out 81 | 82 | class BBoxTransform(nn.Module): 83 | 84 | def __init__(self, mean=None, std=None): 85 | super(BBoxTransform, self).__init__() 86 | if mean is None: 87 | self.mean = torch.from_numpy(np.array([0, 0, 0, 0]).astype(np.float32)).cuda() 88 | else: 89 | self.mean = mean 90 | if std is None: 91 | self.std = torch.from_numpy(np.array([0.1, 0.1, 0.2, 0.2]).astype(np.float32)).cuda() 92 | else: 93 | self.std = std 94 | 95 | def forward(self, boxes, deltas): 96 | 97 | widths = boxes[:, :, 2] - boxes[:, :, 0] + 1.0 98 | heights = boxes[:, :, 3] - boxes[:, :, 1] + 1.0 99 | ctr_x = boxes[:, :, 0] + 0.5 * (widths - 1.0) 100 | ctr_y = boxes[:, :, 1] + 0.5 * (heights -1.0) 101 | 102 | dx = deltas[:, :, 0] * self.std[0] + self.mean[0] 103 | dy = deltas[:, :, 1] * self.std[1] + self.mean[1] 104 | dw = deltas[:, :, 2] * self.std[2] + self.mean[2] 105 | dh = deltas[:, :, 3] * self.std[3] + self.mean[3] 106 | 107 | pred_ctr_x = ctr_x + dx * widths 108 | pred_ctr_y = ctr_y + dy * heights 109 | pred_w = torch.exp(dw) * widths 110 | pred_h = torch.exp(dh) * heights 111 | 112 | pred_boxes_x1 = pred_ctr_x - 0.5 * torch.clamp(pred_w-1.0,min=0) 113 | pred_boxes_y1 = pred_ctr_y - 0.5 * torch.clamp(pred_h-1.0,min=0) 114 | pred_boxes_x2 = pred_ctr_x + 0.5 * torch.clamp(pred_w-1.0,min=0) 115 | pred_boxes_y2 = pred_ctr_y + 0.5 * torch.clamp(pred_h-1.0,min=0) 116 | 117 | pred_boxes = torch.stack([pred_boxes_x1, pred_boxes_y1, pred_boxes_x2, pred_boxes_y2], dim=2) 118 | 119 | return pred_boxes 120 | 121 | 122 | class ClipBoxes(nn.Module): 123 | 124 | def __init__(self, width=None, height=None): 125 | super(ClipBoxes, self).__init__() 126 | 127 | def forward(self, boxes, img_shape): 128 | 129 | height,width=img_shape 130 | 131 | boxes[:,:,0]=torch.clamp(boxes[:,:,0],min=0,max=width-1) 132 | boxes[:,:,1]=torch.clamp(boxes[:,:,1],min=0,max=height-1) 133 | boxes[:,:,2]=torch.clamp(boxes[:,:,2],min=0,max=width-1) 134 | boxes[:,:,3]=torch.clamp(boxes[:,:,3],min=0,max=height-1) 135 | 136 | return boxes 137 | -------------------------------------------------------------------------------- /lib/util/voc_eval.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import numpy as np 4 | import json 5 | import os 6 | 7 | import torch 8 | 9 | def evaluate_voc(dataset, model, threshold=0.01): 10 | 11 | model.eval() 12 | 13 | with torch.no_grad(): 14 | 15 | all_boxes = [[[] for _ in xrange(len(dataset))] for _ in xrange(21)] 16 | 17 | for index in range(len(dataset)): 18 | data = dataset[index] 19 | scale = data['scale'] 20 | 21 | # run network 22 | scores, labels, boxes = model([data['img'].permute(2, 0, 1).cuda().float().unsqueeze(dim=0),data['im_info'].cuda()]) 23 | scores = scores.cpu() 24 | labels = labels.cpu() 25 | boxes = boxes.cpu() 26 | 27 | # correct boxes for image scale 28 | boxes[:,2]=boxes[:,2]-boxes[:,0]+1 29 | 
boxes[:,3]=boxes[:,3]-boxes[:,1]+1 30 | boxes /= scale 31 | boxes[:,2]=boxes[:,2]+boxes[:,0]-1 32 | boxes[:,3]=boxes[:,3]+boxes[:,1]-1 33 | 34 | for j in range(1, 21): 35 | indexes=np.where(labels==j-1)[0] 36 | cls_scores=scores[indexes,np.newaxis] 37 | cls_boxes=boxes[indexes,:] 38 | cls_dets=np.hstack((cls_boxes,cls_scores)) 39 | all_boxes[j][index] = cls_dets[:,:] 40 | 41 | # print progress 42 | print('{}/{}'.format(index, len(dataset)), end='\r') 43 | 44 | dataset.evaluate_detections(all_boxes) 45 | 46 | return 47 | -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | import time 2 | import argparse 3 | import collections 4 | 5 | import numpy as np 6 | 7 | import torch 8 | import torch.nn as nn 9 | import torch.optim as optim 10 | from torch.optim import lr_scheduler 11 | from torch.autograd import Variable 12 | from torchvision import datasets, models, transforms 13 | import torchvision 14 | 15 | from lib.model import model 16 | from lib.dataloader.dataloader import CocoDataset, VocDataset, Resizer, Normalizer 17 | from torch.utils.data import Dataset, DataLoader 18 | 19 | from lib.util import coco_eval 20 | from lib.util import voc_eval 21 | from lib import config 22 | 23 | def main(args=None): 24 | 25 | parser = argparse.ArgumentParser(description='Simple testing script for testing a RetinaNet network.') 26 | 27 | parser.add_argument('--dataset',type=str) 28 | parser.add_argument('--test_epoch',type=int,default=0) 29 | 30 | parser = parser.parse_args(args) 31 | 32 | if parser.dataset=='coco': 33 | config.dataset=config.dataset_coco 34 | elif parser.dataset=='voc': 35 | config.dataset=config.dataset_voc 36 | 37 | set_name=[iset for iset in config.dataset['test_set'].split('+')] 38 | if config.dataset['dataset']=='coco': 39 | dataset_val = CocoDataset(config.dataset['path'], set_name=set_name, transform=transforms.Compose([Normalizer(), Resizer()])) 40 | elif config.dataset['dataset']=='voc': 41 | dataset_val = VocDataset(config.dataset['path'], set_name=set_name, transform=transforms.Compose([Normalizer(), Resizer()])) 42 | else: 43 | raise ValueError('Not implemented.') 44 | 45 | if config.depth == 50: 46 | retinanet = model.resnet50(num_classes=dataset_val.num_classes(), pretrained=True) 47 | elif config.depth == 101: 48 | retinanet = model.resnet101(num_classes=dataset_val.num_classes(), pretrained=True) 49 | else: 50 | raise ValueError('Not implemented.') 51 | 52 | use_gpu=True 53 | if use_gpu: 54 | retinanet = retinanet.cuda() 55 | 56 | retinanet = torch.nn.DataParallel(module=retinanet,device_ids=[config.gpu_ids[0]]).cuda() 57 | 58 | retinanet.load_state_dict(torch.load('models/'+config.dataset['dataset']+'_retinanet_'+str(parser.test_epoch)+'.pt',map_location=lambda storage, loc: storage.cuda())) 59 | 60 | retinanet.training = False 61 | 62 | retinanet.eval() 63 | retinanet.module.freeze_bn() 64 | 65 | if config.dataset['dataset']=='coco': 66 | coco_eval.evaluate_coco(dataset_val, retinanet) 67 | elif config.dataset['dataset']=='voc': 68 | voc_eval.evaluate_voc(dataset_val, retinanet) 69 | else: 70 | raise ValueError('Not implemented.') 71 | 72 | if __name__ == '__main__': 73 | with torch.cuda.device(config.gpu_ids[0]): 74 | main() 75 | -------------------------------------------------------------------------------- /test.sh: -------------------------------------------------------------------------------- 1 | python test.py --dataset coco --test_epoch 99 2 
| -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import time 2 | import argparse 3 | import collections 4 | 5 | import numpy as np 6 | 7 | import torch 8 | import torch.nn as nn 9 | import torch.optim as optim 10 | from torch.optim import lr_scheduler 11 | from torch.autograd import Variable 12 | from torchvision import datasets, models, transforms 13 | import torchvision 14 | 15 | from lib.model import model 16 | from lib.dataloader.dataloader import CocoDataset, VocDataset, collater, AspectRatioBasedSampler, Augmentation 17 | from torch.utils.data import Dataset, DataLoader 18 | 19 | from lib import config 20 | 21 | 22 | print('CUDA available: {}'.format(torch.cuda.is_available())) 23 | 24 | def main(args=None): 25 | 26 | parser = argparse.ArgumentParser(description='Simple training script for training a RetinaNet network.') 27 | 28 | parser.add_argument('--dataset',type=str) 29 | parser.add_argument('--resume',type=bool, default=False) 30 | parser.add_argument('--resume_epoch',type=int, default=-1) 31 | 32 | parser = parser.parse_args(args) 33 | 34 | if parser.dataset=='coco': 35 | config.dataset=config.dataset_coco 36 | elif parser.dataset=='voc': 37 | config.dataset=config.dataset_voc 38 | 39 | set_name=[iset for iset in config.dataset['train_set'].split('+')] 40 | # Create the data loaders 41 | if config.dataset['dataset'] == 'coco': 42 | dataset_train = CocoDataset(config.dataset['path'], set_name=set_name, transform=Augmentation()) 43 | elif config.dataset['dataset'] == 'voc': 44 | dataset_train = VocDataset(config.dataset['path'], set_name=set_name, transform=Augmentation()) 45 | else: 46 | raise ValueError('Not implemented.') 47 | 48 | sampler = AspectRatioBasedSampler(dataset_train, batch_size=config.batch_size*len(config.gpu_ids)) 49 | dataloader_train = DataLoader(dataset_train, num_workers=len(config.gpu_ids), collate_fn=collater, batch_sampler=sampler) 50 | 51 | # Create the model 52 | if config.depth == 50: 53 | retinanet = model.resnet50(num_classes=dataset_train.num_classes(), pretrained=True) 54 | elif config.depth == 101: 55 | retinanet = model.resnet101(num_classes=dataset_train.num_classes(), pretrained=True) 56 | else: 57 | raise ValueError('Not implemented') 58 | 59 | use_gpu = True 60 | 61 | if use_gpu: 62 | retinanet = retinanet.cuda() 63 | 64 | retinanet = torch.nn.DataParallel(module=retinanet,device_ids=config.gpu_ids).cuda() 65 | 66 | retinanet.training = True 67 | 68 | optimizer = optim.SGD(retinanet.parameters(), lr=config.lr, momentum=0.9, weight_decay=1e-4) 69 | 70 | scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=config.dataset['lr_step'], gamma=0.1) 71 | 72 | warmup=config.warmup 73 | begin_epoch=0 74 | if parser.resume==True: 75 | retinanet.load_state_dict(torch.load('./models/'+config.dataset['dataset']+'_retinanet_'+str(parser.resume_epoch)+'.pt')) 76 | begin_epoch=parser.resume_epoch+1 77 | for jj in range(begin_epoch): 78 | scheduler.step() 79 | 80 | cls_loss_hist = collections.deque(maxlen=300) 81 | reg_loss_hist = collections.deque(maxlen=300) 82 | tic_hist = collections.deque(maxlen=100) 83 | 84 | retinanet.train() 85 | retinanet.module.freeze_bn() 86 | 87 | print('Num training images: {}'.format(len(dataset_train))) 88 | 89 | for epoch_num in range(begin_epoch,config.dataset['epochs']): 90 | 91 | retinanet.train() 92 | retinanet.module.freeze_bn() 93 | 94 | tic=time.time() 95 | for iter_num, data in 
enumerate(dataloader_train): 96 | 97 | optimizer.zero_grad() 98 | 99 | classification_loss, regression_loss = retinanet([data['img'].cuda().float(), data['annot']]) 100 | 101 | classification_loss = classification_loss.mean() 102 | regression_loss = regression_loss.mean() 103 | 104 | loss = classification_loss + regression_loss 105 | 106 | if bool(loss == 0): 107 | continue 108 | 109 | loss.backward() 110 | 111 | if warmup and optimizer._step_count<=config.warmup_step: 112 | init_lr=config.lr 113 | warmup_lr=init_lr*config.warmup_factor + optimizer._step_count/float(config.warmup_step)*(init_lr*(1-config.warmup_factor)) 114 | for ii_ in optimizer.param_groups: 115 | ii_['lr']=warmup_lr 116 | 117 | optimizer.step() 118 | 119 | tic_hist.append(time.time()-tic) 120 | tic=time.time() 121 | speed=(config.batch_size*len(config.gpu_ids)*len(tic_hist))/(np.sum(tic_hist)) 122 | cls_loss_hist.append(float(classification_loss)) 123 | reg_loss_hist.append(float(regression_loss)) 124 | print('Epoch: {} | Iteration: {} | Classification loss: avg: {:1.5f}, cur: {:1.5f} | Regression loss: avg: {:1.5f}, cur: {:1.5f} | Speed: {:1.5f} images per second'.format(epoch_num, iter_num, np.mean(cls_loss_hist), float(classification_loss), np.mean(reg_loss_hist), float(regression_loss), speed)) 125 | 126 | del classification_loss 127 | del regression_loss 128 | 129 | scheduler.step() 130 | 131 | torch.save(retinanet.state_dict(), 'models/{}_retinanet_{}.pt'.format(config.dataset['dataset'], epoch_num)) 132 | 133 | retinanet.eval() 134 | 135 | torch.save(retinanet.state_dict(), 'models/model_final.pt') 136 | 137 | if __name__ == '__main__': 138 | with torch.cuda.device(config.gpu_ids[0]): 139 | main() 140 | -------------------------------------------------------------------------------- /train.sh: -------------------------------------------------------------------------------- 1 | python train.py --dataset coco 2 | --------------------------------------------------------------------------------
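
The listing below is an illustrative sketch, not a file in the repository: it shows one way to run a checkpoint saved by `train.py` on a single image, mirroring how `test.py` builds, wraps and loads the detector and how `lib/util/coco_eval.py` calls it in eval mode (a list of `[image batch, im_info]`, returning scores, class indices and boxes after NMS). The helper name `detect_single_image`, the paths, the `num_classes=80` default (COCO) and the `im_info` layout are assumptions for this example, and resizing to `config.test_img_size` is omitted for brevity.

```
# Illustrative sketch only; names and paths below are assumptions, not repository code.
import cv2
import numpy as np
import torch

from lib import config
from lib.model import model

def detect_single_image(image_path, checkpoint_path, num_classes=80):
    # Build the backbone depth configured in lib/config.py and wrap it the same
    # way test.py does, so the saved state_dict keys ("module.*") match.
    if config.depth == 50:
        retinanet = model.resnet50(num_classes=num_classes, pretrained=False)
    else:
        retinanet = model.resnet101(num_classes=num_classes, pretrained=False)
    retinanet = torch.nn.DataParallel(retinanet, device_ids=[config.gpu_ids[0]]).cuda()
    retinanet.load_state_dict(torch.load(checkpoint_path, map_location=lambda storage, loc: storage.cuda()))
    retinanet.eval()
    retinanet.module.freeze_bn()

    # Subtract the BGR pixel means used during training (config.pixel_mean).
    # Resizing to the configured test scale is omitted here for brevity.
    img = cv2.imread(image_path).astype(np.float32) - config.pixel_mean
    img_batch = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).float()
    # im_info carries (height, width) per image; the model reads im_info[0]
    # when clipping boxes, so a (1, 2) tensor is assumed here.
    im_info = torch.tensor([[img.shape[0], img.shape[1]]], dtype=torch.float32)

    with torch.no_grad():
        # In eval mode the forward pass returns [scores, class indices, boxes]
        # after per-level top-k filtering and batched NMS.
        scores, labels, boxes = retinanet([img_batch.cuda(), im_info.cuda()])
    return scores.cpu(), labels.cpu(), boxes.cpu()
```

As in `test.py`, the sketch keeps the `DataParallel` wrapper when loading so the checkpoint's `module.*` keys resolve without renaming; calling it under `with torch.cuda.device(config.gpu_ids[0]):` would match the scripts' device handling if `gpu_ids[0]` is not 0.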