├── LICENSE ├── README.md ├── lib ├── __init__.py ├── config.py ├── dataloader │ ├── __init__.py │ ├── augmentations.py │ └── dataloader.py ├── model │ ├── __init__.py │ ├── anchors.py │ ├── aploss.py │ └── model.py └── util │ ├── __init__.py │ ├── calc_iou.py │ ├── coco_eval.py │ ├── pascal_voc_eval.py │ ├── utils.py │ └── voc_eval.py ├── test.py ├── test.sh ├── train.py └── train.sh /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 CKA 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # AP-loss 2 | The implementation of “[Towards accurate one-stage object detection with AP-loss](https://arxiv.org/abs/1904.06373)”. 3 | 4 | ### Requirements 5 | - Python 2.7 6 | - PyTorch 1.3+ 7 | - Cuda 8 | 9 | ### Installation 10 | 1. Clone this repo 11 | ``` 12 | git clone https://github.com/cccorn/AP-loss.git 13 | cd AP-loss 14 | ``` 15 | 2. Install the python packages: 16 | ``` 17 | pip install pycocotools 18 | pip install opencv-python 19 | ``` 20 | 3. Create directories: 21 | ``` 22 | mkdir data models results 23 | ``` 24 | 4. Prepare Data. You can use 25 | ``` 26 | ln -s $YOUR_PATH_TO_coco data/coco 27 | ln -s $YOUR_PATH_TO_VOCdevkit data/voc 28 | ``` 29 | The directories should be arranged like: 30 | ``` 31 | ├── data 32 | │ ├── coco 33 | │ │ ├── annotations 34 | │ │ ├── images 35 | │ │ │ ├── train2017 36 | │ │ │ ├── val2017 37 | │ │ │ ├── test-dev2017 38 | │ ├── voc 39 | │ │ ├── VOC2007 40 | │ │ ├── VOC2012 41 | ``` 42 | 5. Prepare the pre-trained models and put them in `models` like: 43 | ``` 44 | ├── models 45 | │ ├── resnet50-pytorch.pth 46 | | ├── resnet101-pytorch.pth 47 | ``` 48 | We use the ResNet-50 and ResNet-101 pre-trained models which are converted from [here](https://github.com/KaimingHe/deep-residual-networks). We also provide the converted pre-trained models at [this link](https://1drv.ms/u/s!AgPNhBALXYVSa1pQCFJNNk6JgaA?e=PqhsWD). 49 | 50 | ### Training 51 | 52 | ``` 53 | bash train.sh 54 | ``` 55 | You can modify the configurations in `lib/config.py` to change the gpu_ids, network depth, image size, etc. 
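For example, to try a quick single-GPU run with the ResNet-50 backbone, the relevant lines in `lib/config.py` could look like the following (illustrative values only, not the settings used in the paper):
```
# lib/config.py (excerpt)
gpu_ids=[0]          # use only the first GPU
batch_size=4         # smaller batch to fit a single GPU
train_img_size=512   # training resolution
depth=50             # ResNet-50 backbone instead of ResNet-101
```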
56 | 57 | ### Testing 58 | 59 | ``` 60 | bash test.sh 61 | ``` 62 | 63 | ### Note 64 | 65 | We release the AP-loss implementation in PyTorch instead of MXNet because of an engineering [issue](https://github.com/apache/incubator-mxnet/issues/8884): the Python custom operator in MXNet does not run in parallel across multiple GPUs. Implementing AP-loss in PyTorch is therefore more practical and trains faster. 66 | 67 | ### Acknowledgements 68 | 69 | - Many thanks to the PyTorch implementation of RetinaNet at [pytorch-retinanet](https://github.com/yhenon/pytorch-retinanet). 70 | 71 | ### Citation 72 | 73 | If you find this repository useful in your research, please consider citing: 74 | ``` 75 | @inproceedings{chen2019towards, 76 | title={Towards accurate one-stage object detection with ap-loss}, 77 | author={Chen, Kean and Li, Jianguo and Lin, Weiyao and See, John and Wang, Ji and Duan, Lingyu and Chen, Zhibo and He, Changwei and Zou, Junni}, 78 | booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, 79 | pages={5119--5127}, 80 | year={2019} 81 | } 82 | ``` 83 | -------------------------------------------------------------------------------- /lib/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /lib/config.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | gpu_ids=[0,1] 4 | batch_size=8 5 | 6 | lr=0.001 7 | 8 | warmup=True 9 | warmup_step=500 10 | warmup_factor=0.33333333 11 | 12 | train_img_size=512 13 | test_img_size=[500,833] #(size of the shorter side, maximum size of the longer side) 14 | 15 | anchor_ratios=np.array([0.5,1.0,2.0]) 16 | anchor_scales=np.array([2**0,2**(1.0/2.0)]) 17 | num_anchors=len(anchor_ratios)*len(anchor_scales) 18 | 19 | pixel_mean = np.array([[[102.9801, 115.9465, 122.7717]]]) 20 | 21 | dataset_coco={'dataset':'coco', 'path':'data/coco', 'train_set':'train2017', 'test_set':'val2017', 'epochs':100, 'lr_step':[60,80]} 22 | dataset_voc={'dataset':'voc', 'path':'data/voc', 'train_set':'2007_trainval+2012_trainval', 'test_set':'2007_test', 'epochs':160, 'lr_step':[110,140]} 23 | 24 | depth=101 25 | -------------------------------------------------------------------------------- /lib/dataloader/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /lib/dataloader/augmentations.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | import random 4 | from .. import config 5 | import torch 6 | 7 | def intersect(box_a, box_b): 8 | max_xy = np.minimum(box_a[:, 2:], box_b[2:]) 9 | min_xy = np.maximum(box_a[:, :2], box_b[:2]) 10 | inter = np.clip((max_xy - min_xy +1), a_min=0, a_max=np.inf) 11 | return inter[:, 0] * inter[:, 1] 12 | 13 | 14 | def jaccard_numpy(box_a, box_b): 15 | """Compute the jaccard overlap of two sets of boxes. The jaccard overlap 16 | is simply the intersection over union of two boxes.
17 | Args: 18 | box_a: Multiple bounding boxes, Shape: [num_boxes,4] 19 | box_b: Single bounding box, Shape: [4] 20 | Return: 21 | jaccard overlap: Shape: [box_a.shape[0], box_a.shape[1]] 22 | """ 23 | inter = intersect(box_a, box_b) 24 | area_a = ((box_a[:, 2]-box_a[:, 0]+1) * 25 | (box_a[:, 3]-box_a[:, 1]+1)) # [A,B] 26 | area_b = ((box_b[2]-box_b[0]+1) * 27 | (box_b[3]-box_b[1]+1)) # [A,B] 28 | union = area_a + area_b - inter 29 | return inter / union # [A,B] 30 | 31 | 32 | class Compose(object): 33 | """Composes several augmentations together. 34 | Args: 35 | transforms (List[Transform]): list of transforms to compose. 36 | Example: 37 | >>> augmentations.Compose([ 38 | >>> transforms.CenterCrop(10), 39 | >>> transforms.ToTensor(), 40 | >>> ]) 41 | """ 42 | 43 | def __init__(self, transforms): 44 | self.transforms = transforms 45 | 46 | def __call__(self, img, boxes=None, labels=None): 47 | for t in self.transforms: 48 | img, boxes, labels = t(img, boxes, labels) 49 | return img, boxes, labels 50 | 51 | 52 | class ConvertFromInts(object): 53 | def __call__(self, image, boxes=None, labels=None): 54 | return image.astype(np.float32), boxes, labels 55 | 56 | 57 | class SubtractMeans(object): 58 | def __init__(self, mean): 59 | self.mean = np.array(mean, dtype=np.float32) 60 | 61 | def __call__(self, image, boxes=None, labels=None): 62 | image = image.astype(np.float32) 63 | image -= self.mean 64 | return image.astype(np.float32), boxes, labels 65 | 66 | 67 | class Resize(object): 68 | def __init__(self, size=300): 69 | self.size = size 70 | self.interp_mode = ( 71 | cv2.INTER_LINEAR, 72 | cv2.INTER_AREA, 73 | cv2.INTER_NEAREST, 74 | cv2.INTER_CUBIC, 75 | cv2.INTER_LANCZOS4) 76 | 77 | def __call__(self, image, boxes=None, labels=None): 78 | 79 | interpolation_mode=random.choice(self.interp_mode) 80 | rows,cols,cns=image.shape 81 | image = cv2.resize(image, (self.size, 82 | self.size), interpolation=interpolation_mode) 83 | rows2,cols2,cns=image.shape 84 | boxes_wh=boxes[:,2:4]-boxes[:,0:2]+1.0 85 | boxes[:,0:2]=boxes[:,0:2]/np.array([cols,rows])*np.array([cols2,rows2]) 86 | boxes[:,2:4]=boxes[:,0:2]+boxes_wh/np.array([cols,rows])*np.array([cols2,rows2])-1.0 87 | return image, boxes, labels 88 | 89 | 90 | class RandomSaturation(object): 91 | def __init__(self, lower=0.5, upper=1.5): 92 | self.lower = lower 93 | self.upper = upper 94 | assert self.upper >= self.lower, "contrast upper must be >= lower." 95 | assert self.lower >= 0, "contrast lower must be non-negative." 
96 | 97 | def __call__(self, image, boxes=None, labels=None): 98 | if random.randint(0,1): 99 | image[:, :, 1] *= random.uniform(self.lower, self.upper) 100 | 101 | return image, boxes, labels 102 | 103 | 104 | class RandomHue(object): 105 | def __init__(self, delta=18.0): 106 | assert delta >= 0.0 and delta <= 360.0 107 | self.delta = delta 108 | 109 | def __call__(self, image, boxes=None, labels=None): 110 | if random.randint(0,1): 111 | image[:, :, 0] += random.uniform(-self.delta, self.delta) 112 | image[:, :, 0][image[:, :, 0] > 360.0] -= 360.0 113 | image[:, :, 0][image[:, :, 0] < 0.0] += 360.0 114 | return image, boxes, labels 115 | 116 | 117 | class RandomLightingNoise(object): 118 | def __init__(self, std=0.1): 119 | self.eigval = np.array([55.46, 4.794, 1.148]) 120 | self.eigvec = np.array([[-0.5675, 0.7192, 0.4009], 121 | [-0.5808, -0.0045, -0.8140], 122 | [-0.5836, -0.6948, 0.4203]]) 123 | self.std=std 124 | 125 | def __call__(self, image, boxes=None, labels=None): 126 | if random.randint(0,1): 127 | alpha = np.array([random.normalvariate(0, self.std) for _ in range(3)]) 128 | rgb = np.dot(self.eigvec * alpha, self.eigval) 129 | image += rgb 130 | return image, boxes, labels 131 | 132 | 133 | class ConvertColor(object): 134 | def __init__(self, current='BGR', transform='HSV'): 135 | self.transform = transform 136 | self.current = current 137 | 138 | def __call__(self, image, boxes=None, labels=None): 139 | if self.current == 'BGR' and self.transform == 'HSV': 140 | image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV) 141 | elif self.current == 'HSV' and self.transform == 'BGR': 142 | image = cv2.cvtColor(image, cv2.COLOR_HSV2BGR) 143 | else: 144 | raise NotImplementedError 145 | return image, boxes, labels 146 | 147 | 148 | class RandomContrast(object): 149 | def __init__(self, lower=0.5, upper=1.5): 150 | self.lower = lower 151 | self.upper = upper 152 | assert self.upper >= self.lower, "contrast upper must be >= lower." 153 | assert self.lower >= 0, "contrast lower must be non-negative." 154 | 155 | # expects float image 156 | def __call__(self, image, boxes=None, labels=None): 157 | if random.randint(0,1): 158 | alpha = random.uniform(self.lower, self.upper) 159 | image *= alpha 160 | return image, boxes, labels 161 | 162 | 163 | class RandomBrightness(object): 164 | def __init__(self, delta=32): 165 | assert delta >= 0.0 166 | assert delta <= 255.0 167 | self.delta = delta 168 | 169 | def __call__(self, image, boxes=None, labels=None): 170 | if random.randint(0,1): 171 | delta = random.uniform(-self.delta, self.delta) 172 | image += delta 173 | return image, boxes, labels 174 | 175 | class RandomSampleCrop(object): 176 | """Crop 177 | Arguments: 178 | img (Image): the image being input during training 179 | boxes (Tensor): the original bounding boxes in pt form 180 | labels (Tensor): the class labels for each bbox 181 | mode (float tuple): the min and max jaccard overlaps 182 | Return: 183 | (img, boxes, classes) 184 | img (Image): the cropped image 185 | boxes (Tensor): the adjusted bounding boxes in pt form 186 | labels (Tensor): the class labels for each bbox 187 | """ 188 | def __init__(self): 189 | self.sample_options = ( 190 | # using entire original input image 191 | None, 192 | # sample a patch s.t. 
MIN jaccard w/ obj in .1,.3,.7,.9 193 | (0.1, None), 194 | (0.3, None), 195 | (0.7, None), 196 | (0.9, None), 197 | # randomly sample a patch 198 | (None, None), 199 | ) 200 | 201 | def __call__(self, image, boxes=None, labels=None): 202 | height, width, _ = image.shape 203 | while True: 204 | # randomly choose a mode 205 | mode = random.choice(self.sample_options) 206 | if mode is None: 207 | return image, boxes, labels 208 | 209 | min_iou, max_iou = mode 210 | if min_iou is None: 211 | min_iou = float('-inf') 212 | if max_iou is None: 213 | max_iou = float('inf') 214 | 215 | # max trials (50) 216 | for _ in range(50): 217 | current_image = image 218 | 219 | w = random.uniform(0.3 * width, width) 220 | h = random.uniform(0.3 * height, height) 221 | 222 | # aspect ratio constraint b/t .5 & 2 223 | if h / w < 0.5 or h / w > 2 or w<1 or h<1: 224 | continue 225 | 226 | left = random.uniform(0,width - w) 227 | top = random.uniform(0,height - h) 228 | 229 | # convert to integer rect x1,y1,x2,y2 230 | rect = np.array([int(left), int(top), int(left+w)-1, int(top+h)-1]) 231 | 232 | # calculate IoU (jaccard overlap) b/t the cropped rect and gt boxes 233 | overlap = jaccard_numpy(boxes, rect) 234 | 235 | # is the min/max overlap constraint satisfied? if not, try again 236 | #if overlap.min() < min_iou and max_iou < overlap.max(): 237 | # continue 238 | if overlap.max()<min_iou or overlap.min()>max_iou: 239 | continue 240 | 241 | # cut the crop from the image 242 | current_image = current_image[rect[1]:rect[3]+1, rect[0]:rect[2]+1, 243 | :] 244 | 245 | # keep overlap with gt box IF center in sampled patch 246 | centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0 247 | 248 | # mask in all gt boxes whose centers are to the right of and below the crop's top-left corner 249 | m1 = (rect[0] < centers[:, 0]) * (rect[1] < centers[:, 1]) 250 | 251 | # mask in all gt boxes whose centers are to the left of and above the crop's bottom-right corner 252 | m2 = (rect[2] > centers[:, 0]) * (rect[3] > centers[:, 1]) 253 | 254 | # keep boxes where both m1 and m2 hold 255 | mask = m1 * m2 256 | 257 | # have any valid boxes?
try again if not 258 | if not mask.any(): 259 | continue 260 | 261 | # take only matching gt boxes 262 | current_boxes = boxes[mask, :].copy() 263 | 264 | # take only matching gt labels 265 | current_labels = labels[mask] 266 | 267 | # should we use the box left and top corner or the crop's 268 | current_boxes[:, :2] = np.maximum(current_boxes[:, :2], 269 | rect[:2]) 270 | # adjust to crop (by substracting crop's left,top) 271 | current_boxes[:, :2] -= rect[:2] 272 | 273 | current_boxes[:, 2:] = np.minimum(current_boxes[:, 2:], 274 | rect[2:]) 275 | # adjust to crop (by substracting crop's left,top) 276 | current_boxes[:, 2:] -= rect[:2] 277 | 278 | return current_image, current_boxes, current_labels 279 | 280 | 281 | class Expand(object): 282 | def __init__(self, mean): 283 | self.mean = mean 284 | 285 | def __call__(self, image, boxes, labels): 286 | if random.randint(0,1): 287 | return image, boxes, labels 288 | 289 | height, width, depth = image.shape 290 | ratio = random.uniform(1, 4) 291 | left = random.uniform(0, int(width*ratio) - width) 292 | top = random.uniform(0, int(height*ratio) - height) 293 | 294 | expand_image = np.zeros( 295 | (int(height*ratio), int(width*ratio), depth), 296 | dtype=image.dtype) 297 | expand_image[:, :, :] = self.mean 298 | expand_image[int(top):int(top + height), 299 | int(left):int(left + width)] = image 300 | image = expand_image 301 | 302 | boxes = boxes.copy() 303 | boxes[:, :2] += (int(left), int(top)) 304 | boxes[:, 2:] += (int(left), int(top)) 305 | 306 | return image, boxes, labels 307 | 308 | 309 | class RandomMirror(object): 310 | def __call__(self, image, boxes, classes): 311 | _, width, _ = image.shape 312 | if random.randint(0,1): 313 | image = image[:, ::-1] 314 | boxes = boxes.copy() 315 | boxes[:, 0::2] = width - boxes[:, 2::-2]-1 316 | return image, boxes, classes 317 | 318 | 319 | class PhotometricDistort(object): 320 | def __init__(self): 321 | self.pd = [ 322 | RandomContrast(), 323 | ConvertColor(transform='HSV'), 324 | RandomSaturation(), 325 | RandomHue(), 326 | ConvertColor(current='HSV', transform='BGR'), 327 | RandomContrast() 328 | ] 329 | self.rand_brightness = RandomBrightness() 330 | self.rand_lighting = RandomLightingNoise() 331 | 332 | def __call__(self, image, boxes, labels): 333 | im = image.copy() 334 | im, boxes, labels = self.rand_brightness(im, boxes, labels) 335 | if random.randint(0,1): 336 | distort = Compose(self.pd[:-1]) 337 | else: 338 | distort = Compose(self.pd[1:]) 339 | im, boxes, labels = distort(im, boxes, labels) 340 | im, boxes, labels = self.rand_lighting(im, boxes, labels) 341 | return im, boxes, labels 342 | 343 | class Augmentation(object): 344 | def __init__(self): 345 | self.mean = config.pixel_mean 346 | self.size = config.train_img_size 347 | self.augment = Compose([ 348 | ConvertFromInts(), 349 | PhotometricDistort(), 350 | Expand(self.mean), 351 | RandomSampleCrop(), 352 | RandomMirror(), 353 | Resize(self.size), 354 | SubtractMeans(self.mean) 355 | ]) 356 | 357 | def __call__(self, sample): 358 | img=sample['img'] 359 | annot=sample['annot'] 360 | boxes=annot[:,:4].copy() 361 | labels=annot[:,4:].copy() 362 | img1,boxes1,labels1=self.augment(img, boxes, labels) 363 | return {'img':torch.from_numpy(img1),'annot':torch.from_numpy(np.hstack([boxes1,labels1]))} 364 | -------------------------------------------------------------------------------- /lib/dataloader/dataloader.py: -------------------------------------------------------------------------------- 1 | from __future__ import 
print_function, division 2 | import os 3 | import torch 4 | import numpy as np 5 | import random 6 | 7 | from torch.utils.data import Dataset, DataLoader 8 | from torchvision import transforms, utils 9 | from torch.utils.data.sampler import Sampler 10 | 11 | from pycocotools.coco import COCO 12 | from ..util.pascal_voc_eval import voc_eval 13 | import cv2 14 | from .. import config 15 | 16 | from augmentations import Augmentation 17 | 18 | class CocoDataset(Dataset): 19 | """Coco dataset.""" 20 | 21 | def __init__(self, root_dir, set_name=['train2017'], transform=None): 22 | """ 23 | Args: 24 | root_dir (string): COCO directory. 25 | transform (callable, optional): Optional transform to be applied 26 | on a sample. 27 | """ 28 | self.root_dir = root_dir 29 | self.set_name = set_name 30 | self.transform = transform 31 | 32 | self.coco={} 33 | for set_name_ii in self.set_name: 34 | prefix_ii='instances' if 'test' not in set_name_ii else 'image_info' 35 | self.coco[set_name_ii] = COCO(os.path.join(self.root_dir, 'annotations', prefix_ii + '_' + set_name_ii + '.json')) 36 | 37 | self.image_ids = [] 38 | for set_name_ii in self.set_name: 39 | self.image_ids.extend([[set_name_ii,ids] for ids in self.coco[set_name_ii].getImgIds()]) 40 | 41 | self.load_classes() 42 | 43 | def load_classes(self): 44 | # load class names (name -> label) 45 | set_name_ii=self.set_name[0] 46 | categories = self.coco[set_name_ii].loadCats(self.coco[set_name_ii].getCatIds()) 47 | categories.sort(key=lambda x: x['id']) 48 | 49 | self.classes = {} 50 | self.coco_labels = {} 51 | self.coco_labels_inverse = {} 52 | for c in categories: 53 | self.coco_labels[len(self.classes)] = c['id'] 54 | self.coco_labels_inverse[c['id']] = len(self.classes) 55 | self.classes[c['name']] = len(self.classes) 56 | 57 | # also load the reverse (label -> name) 58 | self.labels = {} 59 | for key, value in self.classes.items(): 60 | self.labels[value] = key 61 | 62 | def __len__(self): 63 | return len(self.image_ids) 64 | 65 | def __getitem__(self, idx): 66 | 67 | img = self.load_image(idx) 68 | annot = self.load_annotations(idx) 69 | sample = {'img': img, 'annot': annot} 70 | if self.transform: 71 | sample = self.transform(sample) 72 | 73 | return sample 74 | 75 | def load_image(self, image_index): 76 | image_info_info = self.image_ids[image_index] 77 | image_info = self.coco[image_info_info[0]].loadImgs(image_info_info[1])[0] 78 | path = os.path.join(self.root_dir, 'images', image_info_info[0], image_info['file_name']) 79 | 80 | img = cv2.imread(path,cv2.IMREAD_COLOR) 81 | 82 | return img.astype(np.float32) 83 | 84 | def load_annotations(self, image_index): 85 | 86 | image_info_info = self.image_ids[image_index] 87 | annotations_ids = self.coco[image_info_info[0]].getAnnIds(imgIds=image_info_info[1], iscrowd=False) 88 | 89 | loaded_img=self.coco[image_info_info[0]].loadImgs(image_info_info[1]) 90 | 91 | width=loaded_img[0]['width'] 92 | height=loaded_img[0]['height'] 93 | 94 | valid_boxes=[] 95 | coco_annotations = self.coco[image_info_info[0]].loadAnns(annotations_ids) 96 | for idx, a in enumerate(coco_annotations): 97 | 98 | x1,y1=a['bbox'][0],a['bbox'][1] 99 | x2=x1+np.maximum(0.,a['bbox'][2]-1.) 100 | y2=y1+np.maximum(0.,a['bbox'][3]-1.) 
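# clip the corner coordinates to the image bounds: x in [0, width-1], y in [0, height-1]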
101 | 102 | x1=np.minimum(width-1.,np.maximum(0.,x1)) 103 | y1=np.minimum(height-1.,np.maximum(0.,y1)) 104 | x2=np.minimum(width-1.,np.maximum(0.,x2)) 105 | y2=np.minimum(height-1.,np.maximum(0.,y2)) 106 | 107 | label=self.coco_label_to_label(a['category_id']) 108 | 109 | if a['area']>0 and x2>x1 and y2>y1: 110 | valid_boxes.append([x1,y1,x2,y2,label]) 111 | 112 | gt_boxes=np.zeros((len(valid_boxes),5),dtype=np.float32) 113 | for ii,jj in enumerate(valid_boxes): 114 | gt_boxes[ii,:]=jj 115 | 116 | return gt_boxes 117 | 118 | def coco_label_to_label(self, coco_label): 119 | return self.coco_labels_inverse[coco_label] 120 | 121 | def label_to_coco_label(self, label): 122 | return self.coco_labels[label] 123 | 124 | def image_aspect_ratio(self, image_index): 125 | image_info_info = self.image_ids[image_index] 126 | image = self.coco[image_info_info[0]].loadImgs(image_info_info[1])[0] 127 | return float(image['width']) / float(image['height']) 128 | 129 | def num_classes(self): 130 | return 80 131 | 132 | def num_gt(self, image_index): 133 | gt_boxes=self.load_annotations(image_index) 134 | return len(gt_boxes) 135 | 136 | class VocDataset(Dataset): 137 | """Voc dataset.""" 138 | 139 | def __init__(self, root_dir, set_name=['2007_trainval'], transform=None): 140 | """ 141 | Args: 142 | root_dir (string): VOC directory. 143 | transform (callable, optional): Optional transform to be applied 144 | on a sample. 145 | """ 146 | self.set_name = set_name 147 | self.devkit_path = root_dir 148 | self.transform = transform 149 | 150 | self.classes = ['__background__', # always index 0 151 | 'aeroplane', 'bicycle', 'bird', 'boat', 152 | 'bottle', 'bus', 'car', 'cat', 'chair', 153 | 'cow', 'diningtable', 'dog', 'horse', 154 | 'motorbike', 'person', 'pottedplant', 155 | 'sheep', 'sofa', 'train', 'tvmonitor'] 156 | 157 | 158 | self.image_ids = [] 159 | for set_name_ii in self.set_name: 160 | self.image_ids.extend([[set_name_ii,ids] for ids in self.load_image_set_index(set_name_ii)]) 161 | 162 | self.num_images = len(self.image_ids) 163 | print('num_images:' , self.num_images) 164 | 165 | self.config = {'comp_id': 'comp4', 166 | 'use_diff': False, 167 | 'min_size': 2} 168 | 169 | self.image_size=[] 170 | for ii in range(len(self.image_ids)): 171 | height,width,_=self.load_image(ii).shape 172 | self.image_size.append([height,width]) 173 | 174 | def load_image_set_index(self, image_set): 175 | """ 176 | find out which indexes correspond to given image set (train or val) 177 | :return: 178 | """ 179 | year,image_set=image_set.split('_') 180 | data_path=os.path.join(self.devkit_path,'VOC'+year) 181 | image_set_index_file = os.path.join(data_path, 'ImageSets', 'Main', image_set + '.txt') 182 | with open(image_set_index_file) as f: 183 | image_set_index = [x.strip() for x in f.readlines()] 184 | return image_set_index 185 | 186 | def __len__(self): 187 | return len(self.image_ids) 188 | 189 | def __getitem__(self, idx): 190 | 191 | img = self.load_image(idx) 192 | if len(self.set_name)==1 and self.set_name[0]=='2012_test': 193 | annot=np.zeros((0,5),dtype=np.float32) 194 | else: 195 | annot = self.load_annotations(idx) 196 | sample = {'img': img, 'annot': annot} 197 | if self.transform: 198 | sample = self.transform(sample) 199 | 200 | return sample 201 | 202 | def load_image(self, image_index): 203 | 204 | image_set, index = self.image_ids[image_index] 205 | year,image_set=image_set.split('_') 206 | data_path=os.path.join(self.devkit_path,'VOC'+year) 207 | 208 | image_path = os.path.join(data_path, 'JPEGImages', 
index + '.jpg') 209 | img = cv2.imread(image_path,cv2.IMREAD_COLOR) 210 | 211 | return img.astype(np.float32) 212 | 213 | def load_annotations(self, image_index): 214 | """ 215 | for a given index, load image and bounding boxes info from XML file 216 | :param index: index of a specific image 217 | :return: record['boxes', 'gt_classes', 'gt_overlaps', 'flipped'] 218 | """ 219 | import xml.etree.ElementTree as ET 220 | 221 | image_set, index = self.image_ids[image_index] 222 | year,image_set=image_set.split('_') 223 | data_path=os.path.join(self.devkit_path,'VOC'+year) 224 | 225 | height,width=self.image_size[image_index] 226 | 227 | filename = os.path.join(data_path, 'Annotations', index + '.xml') 228 | tree = ET.parse(filename) 229 | objs = tree.findall('object') 230 | if not self.config['use_diff']: 231 | non_diff_objs = [obj for obj in objs if int(obj.find('difficult').text) == 0] 232 | objs = non_diff_objs 233 | num_objs = len(objs) 234 | 235 | valid_boxes=[] 236 | class_to_index = dict(zip(self.classes, range(len(self.classes)))) 237 | # Load object bounding boxes into a data frame. 238 | for ix, obj in enumerate(objs): 239 | bbox = obj.find('bndbox') 240 | # Make pixel indexes 0-based 241 | x1 = float(bbox.find('xmin').text) - 1 242 | y1 = float(bbox.find('ymin').text) - 1 243 | x2 = float(bbox.find('xmax').text) - 1 244 | y2 = float(bbox.find('ymax').text) - 1 245 | 246 | x1=np.minimum(width-1.,np.maximum(0.,x1)) 247 | y1=np.minimum(height-1.,np.maximum(0.,y1)) 248 | x2=np.minimum(width-1.,np.maximum(0.,x2)) 249 | y2=np.minimum(height-1.,np.maximum(0.,y2)) 250 | 251 | cls = class_to_index[obj.find('name').text.lower().strip()] 252 | if x2>x1 and y2>y1: 253 | valid_boxes.append([x1,y1,x2,y2,cls-1]) 254 | 255 | gt_boxes=np.zeros((len(valid_boxes),5),dtype=np.float32) 256 | for ii,jj in enumerate(valid_boxes): 257 | gt_boxes[ii,:]=jj 258 | 259 | return gt_boxes 260 | 261 | def evaluate_detections(self, detections): 262 | """ 263 | top level evaluations 264 | :param detections: result matrix, [bbox, confidence] 265 | :return: None 266 | """ 267 | # make all these folders for results 268 | image_set=self.set_name[0] 269 | year,image_set=image_set.split('_') 270 | 271 | year_folder = os.path.join('results', 'VOC' + year) 272 | if not os.path.exists(year_folder): 273 | os.mkdir(year_folder) 274 | res_file_folder = os.path.join('results', 'VOC' + year, 'Main') 275 | if not os.path.exists(res_file_folder): 276 | os.mkdir(res_file_folder) 277 | 278 | self.write_pascal_results(detections) 279 | self.do_python_eval() 280 | 281 | def get_result_file_template(self): 282 | """ 283 | this is a template 284 | VOCdevkit/results/VOC2007/Main/_det_test_aeroplane.txt 285 | :return: a string template 286 | """ 287 | image_set=self.set_name[0] 288 | year,image_set=image_set.split('_') 289 | 290 | res_file_folder = os.path.join('results', 'VOC' + year, 'Main') 291 | comp_id = self.config['comp_id'] 292 | filename = comp_id + '_det_' + image_set + '_{:s}.txt' 293 | path = os.path.join(res_file_folder, filename) 294 | return path 295 | 296 | def write_pascal_results(self, all_boxes): 297 | """ 298 | write results files in pascal devkit path 299 | :param all_boxes: boxes to be processed [bbox, confidence] 300 | :return: None 301 | """ 302 | for cls_ind, cls in enumerate(self.classes): 303 | if cls == '__background__': 304 | continue 305 | print('Writing {} VOC results file'.format(cls)) 306 | filename = self.get_result_file_template().format(cls) 307 | with open(filename, 'wt') as f: 308 | for im_ind, 
set_index in enumerate(self.image_ids): 309 | _,index=set_index 310 | dets = all_boxes[cls_ind][im_ind] 311 | if len(dets) == 0: 312 | continue 313 | # the VOCdevkit expects 1-based indices 314 | for k in range(dets.shape[0]): 315 | f.write('{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'. 316 | format(index, dets[k, -1], 317 | dets[k, 0] + 1, dets[k, 1] + 1, dets[k, 2] + 1, dets[k, 3] + 1)) 318 | 319 | def do_python_eval(self): 320 | """ 321 | python evaluation wrapper 322 | :return: None 323 | """ 324 | image_set=self.set_name[0] 325 | year,image_set=image_set.split('_') 326 | data_path=os.path.join(self.devkit_path,'VOC'+year) 327 | 328 | annopath = os.path.join(data_path, 'Annotations', '{0!s}.xml') 329 | imageset_file = os.path.join(data_path, 'ImageSets', 'Main', image_set + '.txt') 330 | 331 | aps1 = [] 332 | # The PASCAL VOC metric changed in 2010 333 | use_07_metric = True if int(year) < 2010 else False 334 | print('VOC07 metric? ' + ('Y' if use_07_metric else 'No')) 335 | for cls_ind, cls in enumerate(self.classes): 336 | if cls == '__background__': 337 | continue 338 | filename = self.get_result_file_template().format(cls) 339 | rec, prec, ap = voc_eval(filename, annopath, imageset_file, cls, 340 | ovthresh=0.5, use_07_metric=use_07_metric) 341 | aps1 += [ap] 342 | print('AP for {} = {:.4f}'.format(cls, ap)) 343 | 344 | print('Mean AP = {:.4f}'.format(np.mean(aps1))) 345 | 346 | def image_aspect_ratio(self, image_index): 347 | 348 | height,width=self.image_size[image_index] 349 | 350 | return float(width) / float(height) 351 | 352 | def num_classes(self): 353 | return 20 354 | 355 | def num_gt(self, image_index): 356 | gt_boxes=self.load_annotations(image_index) 357 | return len(gt_boxes) 358 | 359 | 360 | def collater(data): 361 | 362 | imgs = [s['img'] for s in data] 363 | annots = [s['annot'] for s in data] 364 | 365 | widths = [int(s.shape[0]) for s in imgs] 366 | heights = [int(s.shape[1]) for s in imgs] 367 | batch_size = len(imgs) 368 | 369 | max_width = np.array(widths).max() 370 | max_height = np.array(heights).max() 371 | 372 | padded_imgs = torch.zeros(batch_size, max_width, max_height, 3) 373 | 374 | for i in range(batch_size): 375 | img = imgs[i] 376 | padded_imgs[i, :int(img.shape[0]), :int(img.shape[1]), :] = img 377 | 378 | max_num_annots = max(annot.shape[0] for annot in annots) 379 | 380 | if max_num_annots > 0: 381 | 382 | annot_padded = torch.ones((len(annots), max_num_annots, 5)) * -1 383 | 384 | if max_num_annots > 0: 385 | for idx, annot in enumerate(annots): 386 | if annot.shape[0] > 0: 387 | annot_padded[idx, :annot.shape[0], :] = annot 388 | else: 389 | annot_padded = torch.ones((len(annots), 1, 5)) * -1 390 | 391 | 392 | padded_imgs = padded_imgs.permute(0, 3, 1, 2) 393 | 394 | return {'img': padded_imgs, 'annot': annot_padded} 395 | 396 | class Resizer(object): 397 | """Convert ndarrays in sample to Tensors.""" 398 | 399 | def __call__(self, sample): 400 | 401 | min_side=config.test_img_size[0] 402 | max_side=config.test_img_size[1] 403 | 404 | image, annots = sample['img'], sample['annot'] 405 | 406 | rows, cols, cns = image.shape 407 | 408 | smallest_side = min(rows, cols) 409 | 410 | # rescale the image so the smallest side is min_side 411 | scale = float(min_side) / float(smallest_side) 412 | 413 | # check if the largest side is now greater than max_side, which can happen 414 | # when images have a large aspect ratio 415 | largest_side = max(rows, cols) 416 | 417 | if largest_side * scale > max_side: 418 | scale = float(max_side) / float(largest_side) 
419 | 420 | # resize the image with the computed scale 421 | image = cv2.resize(image, None, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR) 422 | rows2, cols2, cns = image.shape 423 | 424 | pad_w = 32 - rows2%32 425 | pad_h = 32 - cols2%32 426 | 427 | if pad_w==32: 428 | pad_w=0 429 | if pad_h==32: 430 | pad_h=0 431 | 432 | new_image = np.zeros((rows2 + pad_w, cols2 + pad_h, cns)).astype(np.float32) 433 | new_image[:rows2, :cols2, :] = image.astype(np.float32) 434 | 435 | annots_wh = annots[:,2:4]-annots[:,0:2]+1.0 436 | annots[:,0:2] = annots[:,0:2]/np.array([cols,rows])*np.array([cols2,rows2]) 437 | annots[:,2:4]=annots[:,0:2]+annots_wh/np.array([cols,rows])*np.array([cols2,rows2])-1.0 438 | 439 | return {'img': torch.from_numpy(new_image), 'annot': torch.from_numpy(annots), 'scale': scale, 'im_info': torch.tensor([[rows2,cols2]])} 440 | 441 | 442 | class Normalizer(object): 443 | 444 | def __init__(self): 445 | self.mean = np.array([[[102.9801, 115.9465, 122.7717]]]) 446 | self.std = np.array([[[0.229, 0.224, 0.225]]]) 447 | 448 | def __call__(self, sample): 449 | 450 | image, annots = sample['img'], sample['annot'] 451 | 452 | return {'img':((image.astype(np.float32)-self.mean)), 'annot': annots} 453 | 454 | 455 | class AspectRatioBasedSampler(Sampler): 456 | 457 | def __init__(self, data_source, batch_size): 458 | self.data_source = data_source 459 | self.batch_size = batch_size 460 | self.groups = self.group_images() 461 | 462 | def __iter__(self): 463 | self.groups = self.group_images() 464 | for group in self.groups: 465 | yield group 466 | 467 | def __len__(self): 468 | return (len(self.data_source) + self.batch_size - 1) // self.batch_size 469 | 470 | def group_images(self): 471 | 472 | img_filter=True 473 | if img_filter: 474 | valid_img=[] 475 | for ii in range(len(self.data_source)): 476 | if self.data_source.num_gt(ii)>0: 477 | valid_img.append(ii) 478 | else: 479 | valid_img=range(len(self.data_source)) 480 | 481 | print('Shuffle') 482 | print('images_num:'+str(len(valid_img))) 483 | 484 | aspect_ratio_grouping=False 485 | if aspect_ratio_grouping: 486 | aspect_ratios=[self.data_source.image_aspect_ratio(ii) for ii in valid_img] 487 | aspect_ratios=np.array(aspect_ratios) 488 | g1=(aspect_ratios>=1) 489 | g2=np.logical_not(g1) 490 | g1_inds=np.where(g1)[0] 491 | g2_inds=np.where(g2)[0] 492 | 493 | pad_g1=self.batch_size-len(g1_inds)%self.batch_size 494 | pad_g2=self.batch_size-len(g2_inds)%self.batch_size 495 | if pad_g1==self.batch_size: 496 | pad_g1=0 497 | if pad_g2==self.batch_size: 498 | pad_g2=0 499 | g1_inds=np.hstack([g1_inds,g1_inds[:pad_g1]]) 500 | g2_inds=np.hstack([g2_inds,g2_inds[:pad_g2]]) 501 | random.shuffle(g1_inds) 502 | random.shuffle(g2_inds) 503 | inds=np.hstack((g1_inds,g2_inds)) 504 | 505 | inds=np.reshape(inds[:],(-1,self.batch_size)) 506 | row_perm=np.arange(inds.shape[0]) 507 | random.shuffle(row_perm) 508 | inds=np.reshape(inds[row_perm,:],(-1,)) 509 | 510 | else: 511 | inds=np.arange(len(valid_img)) 512 | random.shuffle(inds) 513 | pad=self.batch_size-len(inds)%self.batch_size 514 | if pad==self.batch_size: 515 | pad=0 516 | inds=np.hstack([inds,inds[:pad]]) 517 | random.shuffle(inds) 518 | 519 | return [[valid_img[inds[x]] for x in range(i,i+self.batch_size)] for i in range(0,len(inds),self.batch_size)] 520 | -------------------------------------------------------------------------------- /lib/model/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | 
-------------------------------------------------------------------------------- /lib/model/anchors.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import torch.nn as nn 4 | from .. import config 5 | 6 | class Anchors(nn.Module): 7 | def __init__(self, pyramid_levels=None, strides=None, sizes=None, ratios=None, scales=None): 8 | super(Anchors, self).__init__() 9 | 10 | if pyramid_levels is None: 11 | self.pyramid_levels = [3, 4, 5, 6, 7] 12 | if strides is None: 13 | self.strides = [2 ** x for x in self.pyramid_levels] 14 | 15 | feat_stride=[8,16,32,64,128] 16 | self.base_anchors = [generate_anchors(base_size=feat_stride[i], ratios=config.anchor_ratios, scales=config.anchor_scales*4) for i in range(5)] 17 | 18 | def forward(self, image): 19 | 20 | image_shape = image.shape[2:] 21 | image_shape = np.array(image_shape) 22 | image_shapes = [(image_shape + 2 ** x - 1) // (2 ** x) for x in self.pyramid_levels] 23 | 24 | # compute anchors over all pyramid levels 25 | all_anchors = [] 26 | 27 | for idx, p in enumerate(self.pyramid_levels): 28 | anchors = self.base_anchors[idx] 29 | shifted_anchors = shift(image_shapes[idx], self.strides[idx], anchors) 30 | all_anchors.append(torch.from_numpy(np.expand_dims(shifted_anchors,axis=0).astype(np.float64)).cuda()) 31 | 32 | return all_anchors 33 | 34 | 35 | def shift(shape, stride, anchors): 36 | shift_x = (np.arange(0, shape[1])) * stride 37 | shift_y = (np.arange(0, shape[0])) * stride 38 | 39 | shift_x, shift_y = np.meshgrid(shift_x, shift_y) 40 | 41 | shifts = np.vstack(( 42 | shift_x.ravel(), shift_y.ravel(), 43 | shift_x.ravel(), shift_y.ravel() 44 | )).transpose() 45 | 46 | # add A anchors (1, A, 4) to 47 | # cell K shifts (K, 1, 4) to get 48 | # shift anchors (K, A, 4) 49 | # reshape to (K*A, 4) shifted anchors 50 | A = anchors.shape[0] 51 | K = shifts.shape[0] 52 | all_anchors = (anchors.reshape((1, A, 4)) + shifts.reshape((1, K, 4)).transpose((1, 0, 2))) 53 | all_anchors = all_anchors.reshape((K * A, 4)) 54 | 55 | return all_anchors 56 | 57 | def generate_anchors(base_size, ratios, scales): 58 | """ 59 | Generate anchor (reference) windows by enumerating aspect ratios X 60 | scales wrt a reference (0, 0, 15, 15) window. 61 | """ 62 | 63 | base_anchor = np.array([1, 1, base_size, base_size]) - 1 64 | ratio_anchors = _ratio_enum(base_anchor, ratios) 65 | anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales) 66 | for i in range(ratio_anchors.shape[0])]) 67 | return anchors 68 | 69 | 70 | def _whctrs(anchor): 71 | """ 72 | Return width, height, x center, and y center for an anchor (window). 73 | """ 74 | 75 | w = anchor[2] - anchor[0] + 1 76 | h = anchor[3] - anchor[1] + 1 77 | x_ctr = anchor[0] + 0.5 * (w - 1) 78 | y_ctr = anchor[1] + 0.5 * (h - 1) 79 | return w, h, x_ctr, y_ctr 80 | 81 | 82 | def _mkanchors(ws, hs, x_ctr, y_ctr): 83 | """ 84 | Given a vector of widths (ws) and heights (hs) around a center 85 | (x_ctr, y_ctr), output a set of anchors (windows). 86 | """ 87 | 88 | ws = ws[:, np.newaxis] 89 | hs = hs[:, np.newaxis] 90 | anchors = np.hstack((x_ctr - 0.5 * (ws - 1), 91 | y_ctr - 0.5 * (hs - 1), 92 | x_ctr + 0.5 * (ws - 1), 93 | y_ctr + 0.5 * (hs - 1))) 94 | return anchors 95 | 96 | 97 | def _ratio_enum(anchor, ratios): 98 | """ 99 | Enumerate a set of anchors for each aspect ratio wrt an anchor. 
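Ratios are interpreted as height / width; each ratio keeps the base anchor's area fixed, with width = sqrt(area / ratio) and height = width * ratio.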
100 | """ 101 | 102 | w, h, x_ctr, y_ctr = _whctrs(anchor) 103 | size = w * h 104 | size_ratios = size / ratios 105 | ws = (np.sqrt(size_ratios)) 106 | hs = (ws * ratios) 107 | anchors = _mkanchors(ws, hs, x_ctr, y_ctr) 108 | return anchors 109 | 110 | 111 | def _scale_enum(anchor, scales): 112 | """ 113 | Enumerate a set of anchors for each scale wrt an anchor. 114 | """ 115 | 116 | w, h, x_ctr, y_ctr = _whctrs(anchor) 117 | ws = w * scales 118 | hs = h * scales 119 | anchors = _mkanchors(ws, hs, x_ctr, y_ctr) 120 | return anchors 121 | -------------------------------------------------------------------------------- /lib/model/aploss.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import torch.nn as nn 4 | from .. import config 5 | from ..util.calc_iou import calc_iou 6 | 7 | class APLoss(torch.autograd.Function): 8 | @staticmethod 9 | def forward(ctx, classifications, regressions, anchors, annotations): 10 | 11 | batch_size = classifications.shape[0] 12 | regression_losses = [] 13 | 14 | regression_grads=torch.zeros(regressions.shape).cuda() 15 | p_num=torch.zeros(1).cuda() 16 | labels_b=[] 17 | 18 | anchor = anchors[0, :, :].type(torch.cuda.FloatTensor) 19 | 20 | anchor_widths = anchor[:, 2] - anchor[:, 0]+1.0 21 | anchor_heights = anchor[:, 3] - anchor[:, 1]+1.0 22 | anchor_ctr_x = anchor[:, 0] + 0.5 * (anchor_widths-1.0) 23 | anchor_ctr_y = anchor[:, 1] + 0.5 * (anchor_heights-1.0) 24 | 25 | for j in range(batch_size): 26 | 27 | classification = classifications[j, :, :] 28 | regression = regressions[j, :, :] 29 | 30 | bbox_annotation = annotations[j, :, :] 31 | bbox_annotation = bbox_annotation[bbox_annotation[:, 4] != -1] 32 | 33 | if bbox_annotation.shape[0] == 0: 34 | regression_losses.append(torch.tensor(0).float().cuda()) 35 | labels_b.append(torch.zeros(classification.shape).cuda()) 36 | continue 37 | 38 | IoU = calc_iou(anchors[0, :, :], bbox_annotation[:, :4]) # num_anchors x num_annotations 39 | 40 | IoU_max, IoU_argmax = torch.max(IoU, dim=1) # num_anchors x 1 41 | 42 | # compute the loss for classification 43 | targets = torch.ones(classification.shape) * -1 44 | targets = targets.cuda() 45 | 46 | ###### 47 | gt_IoU_max, gt_IoU_argmax = torch.max(IoU, dim=0) 48 | gt_IoU_argmax=torch.where(IoU==gt_IoU_max)[0] 49 | positive_indices = torch.ge(torch.zeros(IoU_max.shape).cuda(),1) 50 | positive_indices[gt_IoU_argmax.long()] = True 51 | ###### 52 | 53 | positive_indices = positive_indices | torch.ge(IoU_max, 0.5) 54 | negative_indices = torch.lt(IoU_max, 0.4) 55 | 56 | p_num+=positive_indices.sum() 57 | 58 | assigned_annotations = bbox_annotation[IoU_argmax, :] 59 | 60 | targets[negative_indices, :] = 0 61 | targets[positive_indices, :] = 0 62 | targets[positive_indices, assigned_annotations[positive_indices, 4].long()] = 1 63 | labels_b.append(targets) 64 | 65 | # compute the loss for regression 66 | if positive_indices.sum() > 0: 67 | 68 | assigned_annotations = assigned_annotations[positive_indices, :] 69 | 70 | anchor_widths_pi = anchor_widths[positive_indices] 71 | anchor_heights_pi = anchor_heights[positive_indices] 72 | anchor_ctr_x_pi = anchor_ctr_x[positive_indices] 73 | anchor_ctr_y_pi = anchor_ctr_y[positive_indices] 74 | 75 | gt_widths = assigned_annotations[:, 2] - assigned_annotations[:, 0]+1.0 76 | gt_heights = assigned_annotations[:, 3] - assigned_annotations[:, 1]+1.0 77 | gt_ctr_x = assigned_annotations[:, 0] + 0.5 * (gt_widths-1.0) 78 | gt_ctr_y = assigned_annotations[:, 1] + 0.5 
* (gt_heights-1.0) 79 | 80 | # clip widths to 1 81 | gt_widths = torch.clamp(gt_widths, min=1) 82 | gt_heights = torch.clamp(gt_heights, min=1) 83 | 84 | targets_dx = (gt_ctr_x - anchor_ctr_x_pi) / anchor_widths_pi 85 | targets_dy = (gt_ctr_y - anchor_ctr_y_pi) / anchor_heights_pi 86 | targets_dw = torch.log(gt_widths / anchor_widths_pi) 87 | targets_dh = torch.log(gt_heights / anchor_heights_pi) 88 | 89 | targets2 = torch.stack((targets_dx, targets_dy, targets_dw, targets_dh)) 90 | targets2 = targets2.t() 91 | 92 | targets2 = targets2/torch.Tensor([[0.1, 0.1, 0.2, 0.2]]).cuda() 93 | 94 | #negative_indices = ~ positive_indices 95 | 96 | regression_diff = regression[positive_indices, :]-targets2 97 | regression_diff_abs= torch.abs(regression_diff) 98 | 99 | regression_loss = torch.where( 100 | torch.le(regression_diff_abs, 1.0 / 1.0), 101 | 0.5 * 1.0 * torch.pow(regression_diff_abs, 2), 102 | regression_diff_abs - 0.5 / 1.0 103 | ) 104 | regression_losses.append(regression_loss.sum()) 105 | 106 | 107 | regression_grad=torch.where( 108 | torch.le(regression_diff_abs,1.0/1.0), 109 | 1.0*regression_diff, 110 | torch.sign(regression_diff)) 111 | regression_grads[j,positive_indices,:]=regression_grad 112 | 113 | else: 114 | regression_losses.append(torch.tensor(0).float().cuda()) 115 | 116 | p_num=torch.clamp(p_num,min=1) 117 | regression_grads/=(4*p_num) 118 | 119 | ########################AP-LOSS########################## 120 | labels_b=torch.stack(labels_b) 121 | classification_grads,classification_losses=AP_loss(classifications,labels_b) 122 | ######################################################### 123 | 124 | ctx.save_for_backward(classification_grads,regression_grads) 125 | return classification_losses, torch.stack(regression_losses).sum(dim=0, keepdim=True)/p_num 126 | 127 | @staticmethod 128 | def backward(ctx, out_grad1, out_grad2): 129 | g1,g2=ctx.saved_tensors 130 | return g1*out_grad1,g2*out_grad2,None,None 131 | 132 | 133 | def AP_loss(logits,targets): 134 | 135 | delta=1.0 136 | 137 | grad=torch.zeros(logits.shape).cuda() 138 | metric=torch.zeros(1).cuda() 139 | 140 | if torch.max(targets)<=0: 141 | return grad, metric 142 | 143 | labels_p=(targets==1) 144 | fg_logits=logits[labels_p] 145 | threshold_logit=torch.min(fg_logits)-delta 146 | 147 | ######## Ignore those negative j that satisfy (L_{ij}=0 for all positive i), to accelerate the AP-loss computation. 
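# (a negative anchor whose score is more than delta below every positive score has a clamped pairwise term of exactly zero, so dropping it changes neither the loss nor the gradient)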
148 | valid_labels_n=((targets==0)&(logits>=threshold_logit)) 149 | valid_bg_logits=logits[valid_labels_n] 150 | valid_bg_grad=torch.zeros(len(valid_bg_logits)).cuda() 151 | ######## 152 | 153 | fg_num=len(fg_logits) 154 | prec=torch.zeros(fg_num).cuda() 155 | order=torch.argsort(fg_logits) 156 | max_prec=0 157 | 158 | for ii in order: 159 | tmp1=fg_logits-fg_logits[ii] 160 | tmp1=torch.clamp(tmp1/(2*delta)+0.5,min=0,max=1) 161 | tmp2=valid_bg_logits-fg_logits[ii] 162 | tmp2=torch.clamp(tmp2/(2*delta)+0.5,min=0,max=1) 163 | a=torch.sum(tmp1)+0.5 164 | b=torch.sum(tmp2) 165 | tmp2/=(a+b) 166 | current_prec=a/(a+b) 167 | if (max_prec<=current_prec): 168 | max_prec=current_prec 169 | else: 170 | tmp2*=((1-max_prec)/(1-current_prec)) 171 | valid_bg_grad+=tmp2 172 | prec[ii]=max_prec 173 | 174 | grad[valid_labels_n]=valid_bg_grad 175 | grad[labels_p]=-(1-prec) 176 | 177 | fg_num=max(fg_num,1) 178 | 179 | grad /= (fg_num) 180 | 181 | metric=torch.sum(prec,dim=0,keepdim=True)/fg_num 182 | 183 | return grad, 1-metric 184 | -------------------------------------------------------------------------------- /lib/model/model.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import torch 3 | import math 4 | import time 5 | from ..util.utils import BasicBlock, Bottleneck, BBoxTransform, ClipBoxes 6 | from anchors import Anchors 7 | from aploss import APLoss 8 | 9 | from .. import config 10 | import torchvision 11 | 12 | class PyramidFeatures(nn.Module): 13 | def __init__(self, C3_size, C4_size, C5_size, feature_size=256): 14 | super(PyramidFeatures, self).__init__() 15 | 16 | # upsample C5 to get P5 from the FPN paper 17 | self.P5_1 = nn.Conv2d(C5_size, feature_size, kernel_size=1, stride=1, padding=0) 18 | self.P5_upsampled = nn.Upsample(scale_factor=2, mode='nearest') 19 | self.P5_2 = nn.Conv2d(feature_size, feature_size, kernel_size=3, stride=1, padding=1) 20 | 21 | # add P5 elementwise to C4 22 | self.P4_1 = nn.Conv2d(C4_size, feature_size, kernel_size=1, stride=1, padding=0) 23 | self.P4_upsampled = nn.Upsample(scale_factor=2, mode='nearest') 24 | self.P4_2 = nn.Conv2d(feature_size, feature_size, kernel_size=3, stride=1, padding=1) 25 | 26 | # add P4 elementwise to C3 27 | self.P3_1 = nn.Conv2d(C3_size, feature_size, kernel_size=1, stride=1, padding=0) 28 | self.P3_2 = nn.Conv2d(feature_size, feature_size, kernel_size=3, stride=1, padding=1) 29 | 30 | # "P6 is obtained via a 3x3 stride-2 conv on C5" 31 | self.P6 = nn.Conv2d(C5_size, feature_size, kernel_size=3, stride=2, padding=1) 32 | 33 | # "P7 is computed by applying ReLU followed by a 3x3 stride-2 conv on P6" 34 | self.P7_1 = nn.ReLU() 35 | self.P7_2 = nn.Conv2d(feature_size, feature_size, kernel_size=3, stride=2, padding=1) 36 | 37 | def forward(self, inputs): 38 | 39 | C3, C4, C5 = inputs 40 | 41 | P5_x = self.P5_1(C5) 42 | P5_upsampled_x = self.P5_upsampled(P5_x) 43 | P5_x = self.P5_2(P5_x) 44 | 45 | P4_x = self.P4_1(C4) 46 | P4_x = P5_upsampled_x + P4_x 47 | P4_upsampled_x = self.P4_upsampled(P4_x) 48 | P4_x = self.P4_2(P4_x) 49 | 50 | P3_x = self.P3_1(C3) 51 | P3_x = P3_x + P4_upsampled_x 52 | P3_x = self.P3_2(P3_x) 53 | 54 | P6_x = self.P6(C5) 55 | 56 | P7_x = self.P7_1(P6_x) 57 | P7_x = self.P7_2(P7_x) 58 | 59 | return [P3_x, P4_x, P5_x, P6_x, P7_x] 60 | 61 | 62 | class RegressionModel(nn.Module): 63 | def __init__(self, num_features_in, num_anchors=9, feature_size=256): 64 | super(RegressionModel, self).__init__() 65 | 66 | self.conv1 = nn.Conv2d(num_features_in, 
feature_size, kernel_size=3, padding=1) 67 | self.act1 = nn.ReLU() 68 | 69 | self.conv2 = nn.Conv2d(feature_size, feature_size, kernel_size=3, padding=1) 70 | self.act2 = nn.ReLU() 71 | 72 | self.conv3 = nn.Conv2d(feature_size, feature_size, kernel_size=3, padding=1) 73 | self.act3 = nn.ReLU() 74 | 75 | self.conv4 = nn.Conv2d(feature_size, feature_size, kernel_size=3, padding=1) 76 | self.act4 = nn.ReLU() 77 | 78 | self.output = nn.Conv2d(feature_size, num_anchors*4, kernel_size=3, padding=1) 79 | 80 | def forward(self, x): 81 | 82 | out = self.conv1(x) 83 | out = self.act1(out) 84 | 85 | out = self.conv2(out) 86 | out = self.act2(out) 87 | 88 | out = self.conv3(out) 89 | out = self.act3(out) 90 | 91 | out = self.conv4(out) 92 | out = self.act4(out) 93 | 94 | out = self.output(out) 95 | 96 | # out is B x C x W x H, with C = 4*num_anchors 97 | out = out.permute(0, 2, 3, 1) 98 | 99 | return out.contiguous().view(out.shape[0], -1, 4) 100 | 101 | class ClassificationModel(nn.Module): 102 | def __init__(self, num_features_in, num_anchors=9, num_classes=80, prior=0.01, feature_size=256): 103 | super(ClassificationModel, self).__init__() 104 | 105 | self.num_classes = num_classes 106 | self.num_anchors = num_anchors 107 | 108 | self.conv1 = nn.Conv2d(num_features_in, feature_size, kernel_size=3, padding=1) 109 | self.act1 = nn.ReLU() 110 | 111 | self.conv2 = nn.Conv2d(feature_size, feature_size, kernel_size=3, padding=1) 112 | self.act2 = nn.ReLU() 113 | 114 | self.conv3 = nn.Conv2d(feature_size, feature_size, kernel_size=3, padding=1) 115 | self.act3 = nn.ReLU() 116 | 117 | self.conv4 = nn.Conv2d(feature_size, feature_size, kernel_size=3, padding=1) 118 | self.act4 = nn.ReLU() 119 | 120 | self.output = nn.Conv2d(feature_size, num_anchors*num_classes, kernel_size=3, padding=1) 121 | #self.output_act = nn.Sigmoid() 122 | 123 | def forward(self, x): 124 | 125 | out = self.conv1(x) 126 | out = self.act1(out) 127 | 128 | out = self.conv2(out) 129 | out = self.act2(out) 130 | 131 | out = self.conv3(out) 132 | out = self.act3(out) 133 | 134 | out = self.conv4(out) 135 | out = self.act4(out) 136 | 137 | out = self.output(out) 138 | #out = self.output_act(out) 139 | 140 | # out is B x C x W x H, with C = n_classes + n_anchors 141 | out1 = out.permute(0, 2, 3, 1) 142 | 143 | batch_size, width, height, channels = out1.shape 144 | 145 | out2 = out1.view(batch_size, width, height, self.num_anchors, self.num_classes) 146 | 147 | return out2.contiguous().view(x.shape[0], -1, self.num_classes) 148 | 149 | class ResNet(nn.Module): 150 | 151 | def __init__(self, num_classes, block, layers, conv1_bias): 152 | self.inplanes = 64 153 | super(ResNet, self).__init__() 154 | self.num_classes=num_classes 155 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=conv1_bias) 156 | 157 | for ii in self.conv1.parameters(): 158 | ii.requires_grad=False 159 | 160 | self.bn1 = nn.BatchNorm2d(64) 161 | self.relu = nn.ReLU(inplace=True) 162 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 163 | self.layer1 = self._make_layer(block, 64, layers[0]) 164 | 165 | for ii in self.layer1.parameters(): 166 | ii.requires_grad=False 167 | 168 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2) 169 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2) 170 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2) 171 | 172 | if block == BasicBlock: 173 | fpn_sizes = [self.layer2[layers[1]-1].conv2.out_channels, self.layer3[layers[2]-1].conv2.out_channels, 
self.layer4[layers[3]-1].conv2.out_channels] 174 | elif block == Bottleneck: 175 | fpn_sizes = [self.layer2[layers[1]-1].conv3.out_channels, self.layer3[layers[2]-1].conv3.out_channels, self.layer4[layers[3]-1].conv3.out_channels] 176 | 177 | self.fpn = PyramidFeatures(fpn_sizes[0], fpn_sizes[1], fpn_sizes[2]) 178 | 179 | self.regressionModel = RegressionModel(256, num_anchors=config.num_anchors) 180 | self.classificationModel = ClassificationModel(256, num_classes=num_classes, num_anchors=config.num_anchors) 181 | 182 | self.anchors = Anchors() 183 | 184 | self.regressBoxes = BBoxTransform() 185 | 186 | self.clipBoxes = ClipBoxes() 187 | 188 | self.aploss= APLoss() 189 | 190 | for m in self.modules(): 191 | if isinstance(m, nn.Conv2d): 192 | m.weight.data.normal_(0,0.01) 193 | if m.bias is not None: 194 | m.bias.data.fill_(0) 195 | elif isinstance(m, nn.BatchNorm2d): 196 | m.weight.data.fill_(1) 197 | m.bias.data.zero_() 198 | 199 | prior = 0.01 200 | 201 | #self.classificationModel.output.weight.data.fill_(0) 202 | self.classificationModel.output.weight.data.normal_(0,0.01) 203 | self.classificationModel.output.bias.data.fill_(-math.log((1.0-prior)/prior)) 204 | 205 | #self.regressionModel.output.weight.data.fill_(0) 206 | self.regressionModel.output.weight.data.normal_(0,0.001) 207 | self.regressionModel.output.bias.data.fill_(0) 208 | 209 | self.freeze_bn() 210 | 211 | def _make_layer(self, block, planes, blocks, stride=1): 212 | downsample = None 213 | if stride != 1 or self.inplanes != planes * block.expansion: 214 | downsample = nn.Sequential( 215 | nn.Conv2d(self.inplanes, planes * block.expansion, 216 | kernel_size=1, stride=stride, bias=False), 217 | nn.BatchNorm2d(planes * block.expansion), 218 | ) 219 | 220 | layers = [] 221 | layers.append(block(self.inplanes, planes, stride, downsample)) 222 | self.inplanes = planes * block.expansion 223 | for i in range(1, blocks): 224 | layers.append(block(self.inplanes, planes)) 225 | 226 | return nn.Sequential(*layers) 227 | 228 | def freeze_bn(self): 229 | '''Freeze BatchNorm layers.''' 230 | for layer in self.modules(): 231 | if isinstance(layer, nn.BatchNorm2d): 232 | layer.eval() 233 | for ii in layer.parameters(): 234 | ii.requires_grad=False 235 | 236 | def forward(self, inputs): 237 | 238 | if self.training: 239 | img_batch, annotations = inputs 240 | else: 241 | img_batch, im_info = inputs 242 | 243 | x = self.conv1(img_batch) 244 | x = self.bn1(x) 245 | x = self.relu(x) 246 | x = self.maxpool(x) 247 | 248 | x1 = self.layer1(x) 249 | x2 = self.layer2(x1) 250 | x3 = self.layer3(x2) 251 | x4 = self.layer4(x3) 252 | 253 | features = self.fpn([x2, x3, x4]) 254 | 255 | regressions = [self.regressionModel(feature) for feature in features] 256 | 257 | classifications = [self.classificationModel(feature) for feature in features] 258 | 259 | anchors = self.anchors(img_batch) 260 | 261 | if self.training: 262 | regressions=torch.cat(regressions, dim=1) 263 | classifications=torch.cat(classifications, dim=1) 264 | anchors=torch.cat(anchors, dim=1) 265 | return self.aploss.apply(classifications, regressions, anchors, annotations) 266 | else: 267 | box_all=[] 268 | cls_all=[] 269 | score_all=[] 270 | for ii, anchor in enumerate(anchors): 271 | 272 | classification=classifications[ii].view(-1) 273 | regression=regressions[ii] 274 | 275 | classification=torch.sigmoid(classification) 276 | 277 | ###filter 278 | num_topk=min(1000,classification.size(0)) 279 | ordered_score,ordered_idx=classification.sort(descending=True) 280 | 
ordered_score=ordered_score[:num_topk] 281 | ordered_idx=ordered_idx[:num_topk] 282 | 283 | if ii<4: 284 | score_th=0.01 285 | else: 286 | score_th=0 287 | keep_idx=(ordered_score>score_th) 288 | ordered_score=ordered_score[keep_idx] 289 | ordered_idx=ordered_idx[keep_idx] 290 | 291 | anchor_idx = ordered_idx // self.num_classes 292 | cls_idx = ordered_idx % self.num_classes 293 | 294 | transformed_anchor = self.regressBoxes(anchor[:,anchor_idx,:], regression[:,anchor_idx,:]) 295 | transformed_anchor = self.clipBoxes(transformed_anchor, im_info[0]) 296 | 297 | box_all.append(transformed_anchor[0]) 298 | cls_all.append(cls_idx) 299 | score_all.append(ordered_score) 300 | 301 | box_all=torch.cat(box_all,dim=0) 302 | cls_all=torch.cat(cls_all,dim=0) 303 | score_all=torch.cat(score_all,dim=0) 304 | 305 | keep=torchvision.ops.boxes.batched_nms(box_all,score_all,cls_all,0.5) 306 | keep=keep[:100] 307 | return [score_all[keep], cls_all[keep], box_all[keep, :]] 308 | 309 | def resnet50(num_classes, pretrained=False): 310 | 311 | model = ResNet(num_classes, Bottleneck, [3, 4, 6, 3], conv1_bias=True) 312 | if pretrained: 313 | model.load_state_dict(torch.load('models/resnet50-pytorch.pth'), strict=False) 314 | return model 315 | 316 | def resnet101(num_classes, pretrained=False): 317 | 318 | model = ResNet(num_classes, Bottleneck, [3, 4, 23, 3], conv1_bias=False) 319 | if pretrained: 320 | model.load_state_dict(torch.load('models/resnet101-pytorch.pth'), strict=False) 321 | return model 322 | -------------------------------------------------------------------------------- /lib/util/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /lib/util/calc_iou.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import torch.nn as nn 4 | 5 | def calc_iou(a, b): 6 | 7 | a=a.type(torch.cuda.DoubleTensor) 8 | b=b.type(torch.cuda.DoubleTensor) 9 | 10 | area = (b[:, 2] - b[:, 0]+1) * (b[:, 3] - b[:, 1]+1) 11 | 12 | iw = torch.min(torch.unsqueeze(a[:, 2], dim=1), b[:, 2]) - torch.max(torch.unsqueeze(a[:, 0], 1), b[:, 0])+1 13 | ih = torch.min(torch.unsqueeze(a[:, 3], dim=1), b[:, 3]) - torch.max(torch.unsqueeze(a[:, 1], 1), b[:, 1])+1 14 | 15 | iw = torch.clamp(iw, min=0) 16 | ih = torch.clamp(ih, min=0) 17 | 18 | ua = torch.unsqueeze((a[:, 2] - a[:, 0]+1) * (a[:, 3] - a[:, 1]+1), dim=1) + area - iw * ih 19 | 20 | #ua = torch.clamp(ua, min=1e-8) 21 | 22 | intersection = iw * ih 23 | 24 | IoU = intersection / ua 25 | 26 | return IoU 27 | -------------------------------------------------------------------------------- /lib/util/coco_eval.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | from pycocotools.coco import COCO 4 | from pycocotools.cocoeval import COCOeval 5 | 6 | import numpy as np 7 | import json 8 | import os 9 | 10 | import torch 11 | 12 | def evaluate_coco(dataset, model, threshold=0.01): 13 | 14 | model.eval() 15 | 16 | with torch.no_grad(): 17 | 18 | # start collecting results 19 | results = [] 20 | image_ids = [] 21 | 22 | for index in range(len(dataset)): 23 | data = dataset[index] 24 | scale = data['scale'] 25 | 26 | # run network 27 | scores, labels, boxes = model([data['img'].permute(2, 0, 1).cuda().float().unsqueeze(dim=0),data['im_info'].cuda()]) 28 | scores = scores.cpu() 29 | labels = labels.cpu() 30 | 
boxes = boxes.cpu() 31 | 32 | # correct boxes for image scale 33 | boxes[:,2]=boxes[:,2]-boxes[:,0]+1 34 | boxes[:,3]=boxes[:,3]-boxes[:,1]+1 35 | boxes /= scale 36 | 37 | if boxes.shape[0] > 0: 38 | 39 | # compute predicted labels and scores 40 | for box_id in range(boxes.shape[0]): 41 | score = float(scores[box_id]) 42 | label = int(labels[box_id]) 43 | box = boxes[box_id, :] 44 | 45 | # append detection for each positively labeled class 46 | image_result = { 47 | 'image_id' : dataset.image_ids[index][1], 48 | 'category_id' : dataset.label_to_coco_label(label), 49 | 'score' : float(score), 50 | 'bbox' : box.tolist(), 51 | } 52 | 53 | # append detection to results 54 | results.append(image_result) 55 | 56 | # append image to list of processed images 57 | image_ids.append(dataset.image_ids[index][1]) 58 | 59 | # print progress 60 | print('{}/{}'.format(index, len(dataset)), end='\r') 61 | 62 | # write output 63 | json.dump(results, open('./results/{}_bbox_results.json'.format(dataset.set_name[0]), 'w'), indent=4) 64 | 65 | if 'test' in dataset.set_name[0]: 66 | return 67 | 68 | # load results in COCO evaluation tool 69 | coco_true = dataset.coco 70 | coco_true = coco_true[list(coco_true.keys())[0]] 71 | coco_pred = coco_true.loadRes('./results/{}_bbox_results.json'.format(dataset.set_name[0])) 72 | 73 | # run COCO evaluation 74 | coco_eval = COCOeval(coco_true, coco_pred, 'bbox') 75 | coco_eval.params.imgIds = image_ids 76 | coco_eval.evaluate() 77 | coco_eval.accumulate() 78 | coco_eval.summarize() 79 | 80 | return 81 | -------------------------------------------------------------------------------- /lib/util/pascal_voc_eval.py: -------------------------------------------------------------------------------- 1 | # Licensed to the Apache Software Foundation (ASF) under one 2 | # or more contributor license agreements. See the NOTICE file 3 | # distributed with this work for additional information 4 | # regarding copyright ownership. The ASF licenses this file 5 | # to you under the Apache License, Version 2.0 (the 6 | # "License"); you may not use this file except in compliance 7 | # with the License. You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, 12 | # software distributed under the License is distributed on an 13 | # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 14 | # KIND, either express or implied. See the License for the 15 | # specific language governing permissions and limitations 16 | # under the License. 
17 | 18 | 19 | import numpy as np 20 | import os 21 | import cPickle 22 | 23 | 24 | def parse_voc_rec(filename): 25 | """ 26 | parse pascal voc record into a dictionary 27 | :param filename: xml file path 28 | :return: list of dict 29 | """ 30 | import xml.etree.ElementTree as ET 31 | tree = ET.parse(filename) 32 | objects = [] 33 | for obj in tree.findall('object'): 34 | obj_dict = dict() 35 | obj_dict['name'] = obj.find('name').text 36 | obj_dict['difficult'] = int(obj.find('difficult').text) 37 | bbox = obj.find('bndbox') 38 | obj_dict['bbox'] = [int(float(bbox.find('xmin').text)), 39 | int(float(bbox.find('ymin').text)), 40 | int(float(bbox.find('xmax').text)), 41 | int(float(bbox.find('ymax').text))] 42 | objects.append(obj_dict) 43 | return objects 44 | 45 | 46 | def voc_ap(rec, prec, use_07_metric=False): 47 | """ 48 | average precision calculations 49 | [precision integrated to recall] 50 | :param rec: recall 51 | :param prec: precision 52 | :param use_07_metric: 2007 metric is 11-recall-point based AP 53 | :return: average precision 54 | """ 55 | if use_07_metric: 56 | ap = 0. 57 | for t in np.arange(0., 1.1, 0.1): 58 | if np.sum(rec >= t) == 0: 59 | p = 0 60 | else: 61 | p = np.max(prec[rec >= t]) 62 | ap += p / 11. 63 | else: 64 | # append sentinel values at both ends 65 | mrec = np.concatenate(([0.], rec, [1.])) 66 | mpre = np.concatenate(([0.], prec, [0.])) 67 | 68 | # compute precision integration ladder 69 | for i in range(mpre.size - 1, 0, -1): 70 | mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i]) 71 | 72 | # look for recall value changes 73 | i = np.where(mrec[1:] != mrec[:-1])[0] 74 | 75 | # sum (\delta recall) * prec 76 | ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) 77 | return ap 78 | 79 | 80 | def voc_eval(detpath, annopath, imageset_file, classname, ovthresh=0.5, use_07_metric=False): 81 | """ 82 | pascal voc evaluation 83 | :param detpath: detection results detpath.format(classname) 84 | :param annopath: annotations annopath.format(classname) 85 | :param imageset_file: text file containing list of images 86 | :param classname: category name 87 | :param annocache: caching annotations 88 | :param ovthresh: overlap threshold 89 | :param use_07_metric: whether to use voc07's 11 point ap computation 90 | :return: rec, prec, ap 91 | """ 92 | with open(imageset_file, 'r') as f: 93 | lines = f.readlines() 94 | image_filenames = [x.strip() for x in lines] 95 | 96 | # load annotations from cache 97 | recs = {} 98 | for ind, image_filename in enumerate(image_filenames): 99 | recs[image_filename] = parse_voc_rec(annopath.format(image_filename)) 100 | 101 | # extract objects in :param classname: 102 | class_recs = {} 103 | npos = 0 104 | for image_filename in image_filenames: 105 | objects = [obj for obj in recs[image_filename] if obj['name'] == classname] 106 | bbox = np.array([x['bbox'] for x in objects]) 107 | difficult = np.array([x['difficult'] for x in objects]).astype(np.bool) 108 | det = [False] * len(objects) # stand for detected 109 | npos = npos + sum(~difficult) 110 | class_recs[image_filename] = {'bbox': bbox, 111 | 'difficult': difficult, 112 | 'det': det} 113 | 114 | # read detections 115 | detfile = detpath.format(classname) 116 | with open(detfile, 'r') as f: 117 | lines = f.readlines() 118 | 119 | splitlines = [x.strip().split(' ') for x in lines] 120 | image_ids = [x[0] for x in splitlines] 121 | confidence = np.array([float(x[1]) for x in splitlines]) 122 | bbox = np.array([[float(z) for z in x[2:]] for x in splitlines]) 123 | 124 | # sort by confidence 
125 | if bbox.shape[0] > 0: 126 | sorted_inds = np.argsort(-confidence) 127 | sorted_scores = np.sort(-confidence) 128 | bbox = bbox[sorted_inds, :] 129 | image_ids = [image_ids[x] for x in sorted_inds] 130 | 131 | # go down detections and mark true positives and false positives 132 | nd = len(image_ids) 133 | tp = np.zeros(nd) 134 | fp = np.zeros(nd) 135 | for d in range(nd): 136 | r = class_recs[image_ids[d]] 137 | bb = bbox[d, :].astype(float) 138 | ovmax = -np.inf 139 | bbgt = r['bbox'].astype(float) 140 | 141 | if bbgt.size > 0: 142 | # compute overlaps 143 | # intersection 144 | ixmin = np.maximum(bbgt[:, 0], bb[0]) 145 | iymin = np.maximum(bbgt[:, 1], bb[1]) 146 | ixmax = np.minimum(bbgt[:, 2], bb[2]) 147 | iymax = np.minimum(bbgt[:, 3], bb[3]) 148 | iw = np.maximum(ixmax - ixmin + 1., 0.) 149 | ih = np.maximum(iymax - iymin + 1., 0.) 150 | inters = iw * ih 151 | 152 | # union 153 | uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) + 154 | (bbgt[:, 2] - bbgt[:, 0] + 1.) * 155 | (bbgt[:, 3] - bbgt[:, 1] + 1.) - inters) 156 | 157 | overlaps = inters / uni 158 | ovmax = np.max(overlaps) 159 | jmax = np.argmax(overlaps) 160 | 161 | if ovmax > ovthresh: 162 | if not r['difficult'][jmax]: 163 | if not r['det'][jmax]: 164 | tp[d] = 1. 165 | r['det'][jmax] = 1 166 | else: 167 | fp[d] = 1. 168 | else: 169 | fp[d] = 1. 170 | 171 | # compute precision recall 172 | fp = np.cumsum(fp) 173 | tp = np.cumsum(tp) 174 | rec = tp / float(npos) 175 | # avoid division by zero in case first detection matches a difficult ground truth 176 | prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps) 177 | ap = voc_ap(rec, prec, use_07_metric) 178 | 179 | return rec, prec, ap 180 | -------------------------------------------------------------------------------- /lib/util/utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import numpy as np 4 | 5 | def conv3x3(in_planes, out_planes, stride=1): 6 | """3x3 convolution with padding""" 7 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 8 | padding=1, bias=False) 9 | 10 | class BasicBlock(nn.Module): 11 | expansion = 1 12 | 13 | def __init__(self, inplanes, planes, stride=1, downsample=None): 14 | super(BasicBlock, self).__init__() 15 | self.conv1 = conv3x3(inplanes, planes, stride) 16 | self.bn1 = nn.BatchNorm2d(planes) 17 | self.relu = nn.ReLU(inplace=True) 18 | self.conv2 = conv3x3(planes, planes) 19 | self.bn2 = nn.BatchNorm2d(planes) 20 | self.downsample = downsample 21 | self.stride = stride 22 | 23 | def forward(self, x): 24 | residual = x 25 | 26 | out = self.conv1(x) 27 | out = self.bn1(out) 28 | out = self.relu(out) 29 | 30 | out = self.conv2(out) 31 | out = self.bn2(out) 32 | 33 | if self.downsample is not None: 34 | residual = self.downsample(x) 35 | 36 | out += residual 37 | out = self.relu(out) 38 | 39 | return out 40 | 41 | 42 | class Bottleneck(nn.Module): 43 | expansion = 4 44 | 45 | def __init__(self, inplanes, planes, stride=1, downsample=None): 46 | super(Bottleneck, self).__init__() 47 | #self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) 48 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, stride=stride, bias=False) 49 | self.bn1 = nn.BatchNorm2d(planes) 50 | #self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, 51 | # padding=1, bias=False) 52 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, padding=1, bias=False) 53 | self.bn2 = nn.BatchNorm2d(planes) 54 | self.conv3 = 
nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False) 55 | self.bn3 = nn.BatchNorm2d(planes * 4) 56 | self.relu = nn.ReLU(inplace=True) 57 | self.downsample = downsample 58 | self.stride = stride 59 | 60 | def forward(self, x): 61 | residual = x 62 | 63 | out = self.conv1(x) 64 | out = self.bn1(out) 65 | out = self.relu(out) 66 | 67 | out = self.conv2(out) 68 | out = self.bn2(out) 69 | out = self.relu(out) 70 | 71 | out = self.conv3(out) 72 | out = self.bn3(out) 73 | 74 | if self.downsample is not None: 75 | residual = self.downsample(x) 76 | 77 | out += residual 78 | out = self.relu(out) 79 | 80 | return out 81 | 82 | class BBoxTransform(nn.Module): 83 | 84 | def __init__(self, mean=None, std=None): 85 | super(BBoxTransform, self).__init__() 86 | if mean is None: 87 | self.mean = torch.from_numpy(np.array([0, 0, 0, 0]).astype(np.float32)).cuda() 88 | else: 89 | self.mean = mean 90 | if std is None: 91 | self.std = torch.from_numpy(np.array([0.1, 0.1, 0.2, 0.2]).astype(np.float32)).cuda() 92 | else: 93 | self.std = std 94 | 95 | def forward(self, boxes, deltas): 96 | 97 | widths = boxes[:, :, 2] - boxes[:, :, 0] + 1.0 98 | heights = boxes[:, :, 3] - boxes[:, :, 1] + 1.0 99 | ctr_x = boxes[:, :, 0] + 0.5 * (widths - 1.0) 100 | ctr_y = boxes[:, :, 1] + 0.5 * (heights -1.0) 101 | 102 | dx = deltas[:, :, 0] * self.std[0] + self.mean[0] 103 | dy = deltas[:, :, 1] * self.std[1] + self.mean[1] 104 | dw = deltas[:, :, 2] * self.std[2] + self.mean[2] 105 | dh = deltas[:, :, 3] * self.std[3] + self.mean[3] 106 | 107 | pred_ctr_x = ctr_x + dx * widths 108 | pred_ctr_y = ctr_y + dy * heights 109 | pred_w = torch.exp(dw) * widths 110 | pred_h = torch.exp(dh) * heights 111 | 112 | pred_boxes_x1 = pred_ctr_x - 0.5 * torch.clamp(pred_w-1.0,min=0) 113 | pred_boxes_y1 = pred_ctr_y - 0.5 * torch.clamp(pred_h-1.0,min=0) 114 | pred_boxes_x2 = pred_ctr_x + 0.5 * torch.clamp(pred_w-1.0,min=0) 115 | pred_boxes_y2 = pred_ctr_y + 0.5 * torch.clamp(pred_h-1.0,min=0) 116 | 117 | pred_boxes = torch.stack([pred_boxes_x1, pred_boxes_y1, pred_boxes_x2, pred_boxes_y2], dim=2) 118 | 119 | return pred_boxes 120 | 121 | 122 | class ClipBoxes(nn.Module): 123 | 124 | def __init__(self, width=None, height=None): 125 | super(ClipBoxes, self).__init__() 126 | 127 | def forward(self, boxes, img_shape): 128 | 129 | height,width=img_shape 130 | 131 | boxes[:,:,0]=torch.clamp(boxes[:,:,0],min=0,max=width-1) 132 | boxes[:,:,1]=torch.clamp(boxes[:,:,1],min=0,max=height-1) 133 | boxes[:,:,2]=torch.clamp(boxes[:,:,2],min=0,max=width-1) 134 | boxes[:,:,3]=torch.clamp(boxes[:,:,3],min=0,max=height-1) 135 | 136 | return boxes 137 | -------------------------------------------------------------------------------- /lib/util/voc_eval.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | import numpy as np 4 | import json 5 | import os 6 | 7 | import torch 8 | 9 | def evaluate_voc(dataset, model, threshold=0.01): 10 | 11 | model.eval() 12 | 13 | with torch.no_grad(): 14 | 15 | all_boxes = [[[] for _ in xrange(len(dataset))] for _ in xrange(21)] 16 | 17 | for index in range(len(dataset)): 18 | data = dataset[index] 19 | scale = data['scale'] 20 | 21 | # run network 22 | scores, labels, boxes = model([data['img'].permute(2, 0, 1).cuda().float().unsqueeze(dim=0),data['im_info'].cuda()]) 23 | scores = scores.cpu() 24 | labels = labels.cpu() 25 | boxes = boxes.cpu() 26 | 27 | # correct boxes for image scale 28 | boxes[:,2]=boxes[:,2]-boxes[:,0]+1 29 | 
boxes[:,3]=boxes[:,3]-boxes[:,1]+1 30 | boxes /= scale 31 | boxes[:,2]=boxes[:,2]+boxes[:,0]-1 32 | boxes[:,3]=boxes[:,3]+boxes[:,1]-1 33 | 34 | for j in range(1, 21): 35 | indexes=np.where(labels==j-1)[0] 36 | cls_scores=scores[indexes,np.newaxis] 37 | cls_boxes=boxes[indexes,:] 38 | cls_dets=np.hstack((cls_boxes,cls_scores)) 39 | all_boxes[j][index] = cls_dets[:,:] 40 | 41 | # print progress 42 | print('{}/{}'.format(index, len(dataset)), end='\r') 43 | 44 | dataset.evaluate_detections(all_boxes) 45 | 46 | return 47 | -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | import time 2 | import argparse 3 | import collections 4 | 5 | import numpy as np 6 | 7 | import torch 8 | import torch.nn as nn 9 | import torch.optim as optim 10 | from torch.optim import lr_scheduler 11 | from torch.autograd import Variable 12 | from torchvision import datasets, models, transforms 13 | import torchvision 14 | 15 | from lib.model import model 16 | from lib.dataloader.dataloader import CocoDataset, VocDataset, Resizer, Normalizer 17 | from torch.utils.data import Dataset, DataLoader 18 | 19 | from lib.util import coco_eval 20 | from lib.util import voc_eval 21 | from lib import config 22 | 23 | def main(args=None): 24 | 25 | parser = argparse.ArgumentParser(description='Simple testing script for testing a RetinaNet network.') 26 | 27 | parser.add_argument('--dataset',type=str) 28 | parser.add_argument('--test_epoch',type=int,default=0) 29 | 30 | parser = parser.parse_args(args) 31 | 32 | if parser.dataset=='coco': 33 | config.dataset=config.dataset_coco 34 | elif parser.dataset=='voc': 35 | config.dataset=config.dataset_voc 36 | 37 | set_name=[iset for iset in config.dataset['test_set'].split('+')] 38 | if config.dataset['dataset']=='coco': 39 | dataset_val = CocoDataset(config.dataset['path'], set_name=set_name, transform=transforms.Compose([Normalizer(), Resizer()])) 40 | elif config.dataset['dataset']=='voc': 41 | dataset_val = VocDataset(config.dataset['path'], set_name=set_name, transform=transforms.Compose([Normalizer(), Resizer()])) 42 | else: 43 | raise ValueError('Not implemented.') 44 | 45 | if config.depth == 50: 46 | retinanet = model.resnet50(num_classes=dataset_val.num_classes(), pretrained=True) 47 | elif config.depth == 101: 48 | retinanet = model.resnet101(num_classes=dataset_val.num_classes(), pretrained=True) 49 | else: 50 | raise ValueError('Not implemented.') 51 | 52 | use_gpu=True 53 | if use_gpu: 54 | retinanet = retinanet.cuda() 55 | 56 | retinanet = torch.nn.DataParallel(module=retinanet,device_ids=[config.gpu_ids[0]]).cuda() 57 | 58 | retinanet.load_state_dict(torch.load('models/'+config.dataset['dataset']+'_retinanet_'+str(parser.test_epoch)+'.pt',map_location=lambda storage, loc: storage.cuda())) 59 | 60 | retinanet.training = False 61 | 62 | retinanet.eval() 63 | retinanet.module.freeze_bn() 64 | 65 | if config.dataset['dataset']=='coco': 66 | coco_eval.evaluate_coco(dataset_val, retinanet) 67 | elif config.dataset['dataset']=='voc': 68 | voc_eval.evaluate_voc(dataset_val, retinanet) 69 | else: 70 | raise ValueError('Not implemented.') 71 | 72 | if __name__ == '__main__': 73 | with torch.cuda.device(config.gpu_ids[0]): 74 | main() 75 | -------------------------------------------------------------------------------- /test.sh: -------------------------------------------------------------------------------- 1 | python test.py --dataset coco --test_epoch 99 2 
| -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import time 2 | import argparse 3 | import collections 4 | 5 | import numpy as np 6 | 7 | import torch 8 | import torch.nn as nn 9 | import torch.optim as optim 10 | from torch.optim import lr_scheduler 11 | from torch.autograd import Variable 12 | from torchvision import datasets, models, transforms 13 | import torchvision 14 | 15 | from lib.model import model 16 | from lib.dataloader.dataloader import CocoDataset, VocDataset, collater, AspectRatioBasedSampler, Augmentation 17 | from torch.utils.data import Dataset, DataLoader 18 | 19 | from lib import config 20 | 21 | 22 | print('CUDA available: {}'.format(torch.cuda.is_available())) 23 | 24 | def main(args=None): 25 | 26 | parser = argparse.ArgumentParser(description='Simple training script for training a RetinaNet network.') 27 | 28 | parser.add_argument('--dataset',type=str) 29 | parser.add_argument('--resume',type=bool, default=False) 30 | parser.add_argument('--resume_epoch',type=int, default=-1) 31 | 32 | parser = parser.parse_args(args) 33 | 34 | if parser.dataset=='coco': 35 | config.dataset=config.dataset_coco 36 | elif parser.dataset=='voc': 37 | config.dataset=config.dataset_voc 38 | 39 | set_name=[iset for iset in config.dataset['train_set'].split('+')] 40 | # Create the data loaders 41 | if config.dataset['dataset'] == 'coco': 42 | dataset_train = CocoDataset(config.dataset['path'], set_name=set_name, transform=Augmentation()) 43 | elif config.dataset['dataset'] == 'voc': 44 | dataset_train = VocDataset(config.dataset['path'], set_name=set_name, transform=Augmentation()) 45 | else: 46 | raise ValueError('Not implemented.') 47 | 48 | sampler = AspectRatioBasedSampler(dataset_train, batch_size=config.batch_size*len(config.gpu_ids)) 49 | dataloader_train = DataLoader(dataset_train, num_workers=len(config.gpu_ids), collate_fn=collater, batch_sampler=sampler) 50 | 51 | # Create the model 52 | if config.depth == 50: 53 | retinanet = model.resnet50(num_classes=dataset_train.num_classes(), pretrained=True) 54 | elif config.depth == 101: 55 | retinanet = model.resnet101(num_classes=dataset_train.num_classes(), pretrained=True) 56 | else: 57 | raise ValueError('Not implemented') 58 | 59 | use_gpu = True 60 | 61 | if use_gpu: 62 | retinanet = retinanet.cuda() 63 | 64 | retinanet = torch.nn.DataParallel(module=retinanet,device_ids=config.gpu_ids).cuda() 65 | 66 | retinanet.training = True 67 | 68 | optimizer = optim.SGD(retinanet.parameters(), lr=config.lr, momentum=0.9, weight_decay=1e-4) 69 | 70 | scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=config.dataset['lr_step'], gamma=0.1) 71 | 72 | warmup=config.warmup 73 | begin_epoch=0 74 | if parser.resume==True: 75 | retinanet.load_state_dict(torch.load('./models/'+config.dataset['dataset']+'_retinanet_'+str(parser.resume_epoch)+'.pt')) 76 | begin_epoch=parser.resume_epoch+1 77 | for jj in range(begin_epoch): 78 | scheduler.step() 79 | 80 | cls_loss_hist = collections.deque(maxlen=300) 81 | reg_loss_hist = collections.deque(maxlen=300) 82 | tic_hist = collections.deque(maxlen=100) 83 | 84 | retinanet.train() 85 | retinanet.module.freeze_bn() 86 | 87 | print('Num training images: {}'.format(len(dataset_train))) 88 | 89 | for epoch_num in range(begin_epoch,config.dataset['epochs']): 90 | 91 | retinanet.train() 92 | retinanet.module.freeze_bn() 93 | 94 | tic=time.time() 95 | for iter_num, data in 
enumerate(dataloader_train): 96 | 97 | optimizer.zero_grad() 98 | 99 | classification_loss, regression_loss = retinanet([data['img'].cuda().float(), data['annot']]) 100 | 101 | classification_loss = classification_loss.mean() 102 | regression_loss = regression_loss.mean() 103 | 104 | loss = classification_loss + regression_loss 105 | 106 | if bool(loss == 0): 107 | continue 108 | 109 | loss.backward() 110 | 111 | if warmup and optimizer._step_count<=config.warmup_step: 112 | init_lr=config.lr 113 | warmup_lr=init_lr*config.warmup_factor + optimizer._step_count/float(config.warmup_step)*(init_lr*(1-config.warmup_factor)) 114 | for ii_ in optimizer.param_groups: 115 | ii_['lr']=warmup_lr 116 | 117 | optimizer.step() 118 | 119 | tic_hist.append(time.time()-tic) 120 | tic=time.time() 121 | speed=(config.batch_size*len(config.gpu_ids)*len(tic_hist))/(np.sum(tic_hist)) 122 | cls_loss_hist.append(float(classification_loss)) 123 | reg_loss_hist.append(float(regression_loss)) 124 | print('Epoch: {} | Iteration: {} | Classification loss: avg: {:1.5f}, cur: {:1.5f} | Regression loss: avg: {:1.5f}, cur: {:1.5f} | Speed: {:1.5f} images per second'.format(epoch_num, iter_num, np.mean(cls_loss_hist), float(classification_loss), np.mean(reg_loss_hist), float(regression_loss), speed)) 125 | 126 | del classification_loss 127 | del regression_loss 128 | 129 | scheduler.step() 130 | 131 | torch.save(retinanet.state_dict(), 'models/{}_retinanet_{}.pt'.format(config.dataset['dataset'], epoch_num)) 132 | 133 | retinanet.eval() 134 | 135 | torch.save(retinanet.state_dict(), 'models/model_final.pt') 136 | 137 | if __name__ == '__main__': 138 | with torch.cuda.device(config.gpu_ids[0]): 139 | main() 140 | -------------------------------------------------------------------------------- /train.sh: -------------------------------------------------------------------------------- 1 | python train.py --dataset coco 2 | --------------------------------------------------------------------------------
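
The listing below is an illustrative sketch, not a file in the repository: it shows one way to run a checkpoint saved by `train.py` on a single image, mirroring how `test.py` builds, wraps and loads the detector and how `lib/util/coco_eval.py` calls it in eval mode (a list of `[image batch, im_info]`, returning scores, class indices and boxes after NMS). The helper name `detect_single_image`, the paths, the `num_classes=80` default (COCO) and the `im_info` layout are assumptions for this example, and resizing to `config.test_img_size` is omitted for brevity.

```
# Illustrative sketch only; names and paths below are assumptions, not repository code.
import cv2
import numpy as np
import torch

from lib import config
from lib.model import model

def detect_single_image(image_path, checkpoint_path, num_classes=80):
    # Build the backbone depth configured in lib/config.py and wrap it the same
    # way test.py does, so the saved state_dict keys ("module.*") match.
    if config.depth == 50:
        retinanet = model.resnet50(num_classes=num_classes, pretrained=False)
    else:
        retinanet = model.resnet101(num_classes=num_classes, pretrained=False)
    retinanet = torch.nn.DataParallel(retinanet, device_ids=[config.gpu_ids[0]]).cuda()
    retinanet.load_state_dict(torch.load(checkpoint_path, map_location=lambda storage, loc: storage.cuda()))
    retinanet.eval()
    retinanet.module.freeze_bn()

    # Subtract the BGR pixel means used during training (config.pixel_mean).
    # Resizing to the configured test scale is omitted here for brevity.
    img = cv2.imread(image_path).astype(np.float32) - config.pixel_mean
    img_batch = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).float()
    # im_info carries (height, width) per image; the model reads im_info[0]
    # when clipping boxes, so a (1, 2) tensor is assumed here.
    im_info = torch.tensor([[img.shape[0], img.shape[1]]], dtype=torch.float32)

    with torch.no_grad():
        # In eval mode the forward pass returns [scores, class indices, boxes]
        # after per-level top-k filtering and batched NMS.
        scores, labels, boxes = retinanet([img_batch.cuda(), im_info.cuda()])
    return scores.cpu(), labels.cpu(), boxes.cpu()
```

As in `test.py`, the sketch keeps the `DataParallel` wrapper when loading so the checkpoint's `module.*` keys resolve without renaming; calling it under `with torch.cuda.device(config.gpu_ids[0]):` would match the scripts' device handling if `gpu_ids[0]` is not 0.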